Patchwork slub: fix slab_pad_check()

login
register
mail settings
Submitter Christoph Lameter
Date Sept. 3, 2009, 7:24 p.m.
Message ID <alpine.DEB.1.10.0909031414310.29881@V090114053VZO-1>
Download mbox | patch
Permalink /patch/32902/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Paul E. McKenney - Sept. 3, 2009, 5:44 p.m.
On Thu, Sep 03, 2009 at 02:24:17PM -0500, Christoph Lameter wrote:
> On Thu, 3 Sep 2009, Eric Dumazet wrote:
> 
> > Point is we cannot deal with RCU quietness before disposing the slab cache,
> > (if SLAB_DESTROY_BY_RCU was set on the cache) since this disposing *will*
> > make call_rcu() calls when a full slab is freed/purged.
> 
> There is no need to do call_rcu calls for frees at that point since
> objects are no longer in use. We could simply disable SLAB_DESTROY_BY_RCU
> for the final clearing of caches.

Suppose we have the following sequence of events:

1.	CPU 0 is running a task that is using the slab cache.

	This CPU does kmem_cache_free(), which happens to free up
	some memory to the system.  Because SLAB_DESTROY_BY_RCU is
	set, an RCU callback is posted to do the actual freeing.

	Please note that this RCU callback is internal to the slab,
	so that the slab user cannot be aware of it.  In fact, the
	slab user isn't doing any call_rcu()s whatever.

2.	CPU 0 discovers that the slab cache can now be destroyed.

	It determines that there are no users, and has guaranteed
	that there will be no future users.  So it knows that it
	can safely do kmem_cache_destroy().

3.	In absence of rcu_barrier(), kmem_cache_destroy() would
	immediately tear down the slab data structures.

4.	At the end of the next grace period, the RCU callback posted
	(again, internally by the slab cache) is invoked.  It has a
	coronary due to the slab data structures having already been
	freed, and (worse yet) possibly reallocated for other uses.

Hence the need for the rcu_barrier() when tearing down SLAB_DESTROY_BY_RCU
slab caches.

> > And when RCU grace period is elapsed, the callback *will* need access to
> > the cache we want to dismantle. Better to not have kfreed()/poisoned it...
> 
> But going through the RCU period is pointless since no user of the cache
> remains.

Which is irrelevant.  The outstanding RCU callback was posted by the
slab cache itself, -not- by the user of the slab cache.

> > I believe you mix two RCU uses here.
> >
> > 1) The one we all know, is use normal caches (!SLAB_DESTROY_BY_RCU)
> > (or kmalloc()), and use call_rcu(... kfree_something)
> >
> >    In this case, you are 100% right that the subsystem itself has
> >    to call rcu_barrier() (or respect whatever self-synchro) itself,
> >    before calling kmem_cache_destroy()
> >
> > 2) The SLAB_DESTROY_BY_RCU one.
> >
> >    Part of cache dismantle needs to call rcu_barrier() itself.
> >    Caller doesnt have to use rcu_barrier(). It would be a waste of time,
> >    as kmem_cache_destroy() will refill rcu wait queues with its own stuff.
> 
> The dismantling does not need RCU since there are no operations on the
> objects in progress. So simply switch DESTROY_BY_RCU off for close.

Unless I am missing something, this patch re-introduces the bug that
the rcu_barrier() was added to prevent.  So, in absence of a better
explanation of what I am missing:

NACK.

							Thanx, Paul

> ---
>  mm/slub.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2009-09-03 10:14:51.000000000 -0500
> +++ linux-2.6/mm/slub.c	2009-09-03 10:18:32.000000000 -0500
> @@ -2594,9 +2594,9 @@ static inline int kmem_cache_close(struc
>   */
>  void kmem_cache_destroy(struct kmem_cache *s)
>  {
> -	if (s->flags & SLAB_DESTROY_BY_RCU)
> -		rcu_barrier();
>  	down_write(&slub_lock);
> +	/* Stop deferring frees so that we can immediately free structures */
> +	s->flags &= ~SLAB_DESTROY_BY_RCU;
>  	s->refcount--;
>  	if (!s->refcount) {
>  		list_del(&s->list);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Sept. 3, 2009, 5:59 p.m.
Christoph Lameter a écrit :
> On Thu, 3 Sep 2009, Eric Dumazet wrote:
> 
>> Point is we cannot deal with RCU quietness before disposing the slab cache,
>> (if SLAB_DESTROY_BY_RCU was set on the cache) since this disposing *will*
>> make call_rcu() calls when a full slab is freed/purged.
> 
> There is no need to do call_rcu calls for frees at that point since
> objects are no longer in use. We could simply disable SLAB_DESTROY_BY_RCU
> for the final clearing of caches.
> 
>> And when RCU grace period is elapsed, the callback *will* need access to
>> the cache we want to dismantle. Better to not have kfreed()/poisoned it...
> 
> But going through the RCU period is pointless since no user of the cache
> remains.
> 
>> I believe you mix two RCU uses here.
>>
>> 1) The one we all know, is use normal caches (!SLAB_DESTROY_BY_RCU)
>> (or kmalloc()), and use call_rcu(... kfree_something)
>>
>>    In this case, you are 100% right that the subsystem itself has
>>    to call rcu_barrier() (or respect whatever self-synchro) itself,
>>    before calling kmem_cache_destroy()
>>
>> 2) The SLAB_DESTROY_BY_RCU one.
>>
>>    Part of cache dismantle needs to call rcu_barrier() itself.
>>    Caller doesnt have to use rcu_barrier(). It would be a waste of time,
>>    as kmem_cache_destroy() will refill rcu wait queues with its own stuff.
> 
> The dismantling does not need RCU since there are no operations on the
> objects in progress. So simply switch DESTROY_BY_RCU off for close.
> 
> 
> ---
>  mm/slub.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2009-09-03 10:14:51.000000000 -0500
> +++ linux-2.6/mm/slub.c	2009-09-03 10:18:32.000000000 -0500
> @@ -2594,9 +2594,9 @@ static inline int kmem_cache_close(struc
>   */
>  void kmem_cache_destroy(struct kmem_cache *s)
>  {
> -	if (s->flags & SLAB_DESTROY_BY_RCU)
> -		rcu_barrier();
>  	down_write(&slub_lock);
> +	/* Stop deferring frees so that we can immediately free structures */
> +	s->flags &= ~SLAB_DESTROY_BY_RCU;
>  	s->refcount--;
>  	if (!s->refcount) {
>  		list_del(&s->list);

It seems very smart, but needs review of all callers to make sure no slabs
are waiting for final freeing in call_rcu queue on some cpu.

I suspect most of them will then have to use rcu_barrier() before calling
kmem_cache_destroy(), so why not factorizing code in one place ?

net/dccp/ipv6.c:1145:   .slab_flags        = SLAB_DESTROY_BY_RCU,
net/dccp/ipv4.c:941:    .slab_flags             = SLAB_DESTROY_BY_RCU,
net/ipv4/udp.c:1593:    .slab_flags        = SLAB_DESTROY_BY_RCU,
net/ipv4/udplite.c:54:  .slab_flags        = SLAB_DESTROY_BY_RCU,
net/ipv4/tcp_ipv4.c:2446:       .slab_flags             = SLAB_DESTROY_BY_RCU,
net/ipv4/udp.c.orig:1587:       .slab_flags        = SLAB_DESTROY_BY_RCU,
net/ipv6/udp.c:1274:    .slab_flags        = SLAB_DESTROY_BY_RCU,
net/ipv6/udplite.c:52:  .slab_flags        = SLAB_DESTROY_BY_RCU,
net/ipv6/tcp_ipv6.c:2085:       .slab_flags             = SLAB_DESTROY_BY_RCU,
net/netfilter/nf_conntrack_core.c:1269:                                         0, SLAB_DESTROY_BY_RCU, NULL);

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pekka Enberg - Sept. 3, 2009, 7 p.m.
On Thu, Sep 3, 2009 at 8:59 PM, Eric Dumazet<eric.dumazet@gmail.com> wrote:
> Christoph Lameter a écrit :
>> On Thu, 3 Sep 2009, Eric Dumazet wrote:
>>
>>> Point is we cannot deal with RCU quietness before disposing the slab cache,
>>> (if SLAB_DESTROY_BY_RCU was set on the cache) since this disposing *will*
>>> make call_rcu() calls when a full slab is freed/purged.
>>
>> There is no need to do call_rcu calls for frees at that point since
>> objects are no longer in use. We could simply disable SLAB_DESTROY_BY_RCU
>> for the final clearing of caches.
>>
>>> And when RCU grace period is elapsed, the callback *will* need access to
>>> the cache we want to dismantle. Better to not have kfreed()/poisoned it...
>>
>> But going through the RCU period is pointless since no user of the cache
>> remains.
>>
>>> I believe you mix two RCU uses here.
>>>
>>> 1) The one we all know, is use normal caches (!SLAB_DESTROY_BY_RCU)
>>> (or kmalloc()), and use call_rcu(... kfree_something)
>>>
>>>    In this case, you are 100% right that the subsystem itself has
>>>    to call rcu_barrier() (or respect whatever self-synchro) itself,
>>>    before calling kmem_cache_destroy()
>>>
>>> 2) The SLAB_DESTROY_BY_RCU one.
>>>
>>>    Part of cache dismantle needs to call rcu_barrier() itself.
>>>    Caller doesnt have to use rcu_barrier(). It would be a waste of time,
>>>    as kmem_cache_destroy() will refill rcu wait queues with its own stuff.
>>
>> The dismantling does not need RCU since there are no operations on the
>> objects in progress. So simply switch DESTROY_BY_RCU off for close.
>>
>>
>> ---
>>  mm/slub.c |    4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6/mm/slub.c
>> ===================================================================
>> --- linux-2.6.orig/mm/slub.c  2009-09-03 10:14:51.000000000 -0500
>> +++ linux-2.6/mm/slub.c       2009-09-03 10:18:32.000000000 -0500
>> @@ -2594,9 +2594,9 @@ static inline int kmem_cache_close(struc
>>   */
>>  void kmem_cache_destroy(struct kmem_cache *s)
>>  {
>> -     if (s->flags & SLAB_DESTROY_BY_RCU)
>> -             rcu_barrier();
>>       down_write(&slub_lock);
>> +     /* Stop deferring frees so that we can immediately free structures */
>> +     s->flags &= ~SLAB_DESTROY_BY_RCU;
>>       s->refcount--;
>>       if (!s->refcount) {
>>               list_del(&s->list);
>
> It seems very smart, but needs review of all callers to make sure no slabs
> are waiting for final freeing in call_rcu queue on some cpu.
>
> I suspect most of them will then have to use rcu_barrier() before calling
> kmem_cache_destroy(), so why not factorizing code in one place ?

[snip]

Can someone please explain what's the upside in Christoph's approach?
Performance? Correctness? Something else entirely? We're looking at a
tested bug fix here and I don't understand why I shouldn't just go
ahead and merge it. Hmm?

                        Pekka
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Sept. 3, 2009, 7:24 p.m.
On Thu, 3 Sep 2009, Eric Dumazet wrote:

> Point is we cannot deal with RCU quietness before disposing the slab cache,
> (if SLAB_DESTROY_BY_RCU was set on the cache) since this disposing *will*
> make call_rcu() calls when a full slab is freed/purged.

There is no need to do call_rcu calls for frees at that point since
objects are no longer in use. We could simply disable SLAB_DESTROY_BY_RCU
for the final clearing of caches.

> And when RCU grace period is elapsed, the callback *will* need access to
> the cache we want to dismantle. Better to not have kfreed()/poisoned it...

But going through the RCU period is pointless since no user of the cache
remains.

> I believe you mix two RCU uses here.
>
> 1) The one we all know, is use normal caches (!SLAB_DESTROY_BY_RCU)
> (or kmalloc()), and use call_rcu(... kfree_something)
>
>    In this case, you are 100% right that the subsystem itself has
>    to call rcu_barrier() (or respect whatever self-synchro) itself,
>    before calling kmem_cache_destroy()
>
> 2) The SLAB_DESTROY_BY_RCU one.
>
>    Part of cache dismantle needs to call rcu_barrier() itself.
>    Caller doesnt have to use rcu_barrier(). It would be a waste of time,
>    as kmem_cache_destroy() will refill rcu wait queues with its own stuff.

The dismantling does not need RCU since there are no operations on the
objects in progress. So simply switch DESTROY_BY_RCU off for close.


---
 mm/slub.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney - Sept. 3, 2009, 10:03 p.m.
On Thu, Sep 03, 2009 at 05:43:12PM -0500, Christoph Lameter wrote:
> On Thu, 3 Sep 2009, Paul E. McKenney wrote:
> 
> > 2.	CPU 0 discovers that the slab cache can now be destroyed.
> >
> > 	It determines that there are no users, and has guaranteed
> > 	that there will be no future users.  So it knows that it
> > 	can safely do kmem_cache_destroy().
> >
> > 3.	In absence of rcu_barrier(), kmem_cache_destroy() would
> > 	immediately tear down the slab data structures.
> 
> Of course. This has been discussed before.
> 
> You need to ensure that no objects are in use before destroying a slab. In
> case of DESTROY_BY_RCU you must ensure that there are no potential
> readers. So use a suitable rcu barrier or something else like a
> synchronize_rcu...

If by "you must ensure" you mean "kmem_cache_destroy must ensure", then
we are in complete agreement.  Otherwise, not a chance.

> > > But going through the RCU period is pointless since no user of the cache
> > > remains.
> >
> > Which is irrelevant.  The outstanding RCU callback was posted by the
> > slab cache itself, -not- by the user of the slab cache.
> 
> There will be no rcu callbacks generated at kmem_cache_destroy with the
> patch I posted.

That is true.  However, there well might be RCU callbacks generated by
kmem_cache_free() that have not yet been invoked.  Since kmem_cache_free()
generated them, it is ridiculous to insist that the user account for them.
That responsibility must instead fall on kmem_cache_destroy().

> > > The dismantling does not need RCU since there are no operations on the
> > > objects in progress. So simply switch DESTROY_BY_RCU off for close.
> >
> > Unless I am missing something, this patch re-introduces the bug that
> > the rcu_barrier() was added to prevent.  So, in absence of a better
> > explanation of what I am missing:
> 
> The "fix" was ill advised. Slab users must ensure that no objects are in
> use before destroying a slab. Only the slab users know how the objects
> are being used. The slab allocator itself cannot know how to ensure that
> there are no pending references. Putting a rcu_barrier in there creates an
> inconsistency in the operation of kmem_cache_destroy() and an expectation
> of functionality that the function cannot provide.

If by "must ensure that no objects are in use", you mean "must have
no further references to them", then we are in agreement.  And in my
scenario above, it is not the -user- who later references the memory,
but rather the slab code itself.

Put the rcu_barrier() in kmem_cache_free().  Please.

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Sept. 3, 2009, 10:08 p.m.
Christoph Lameter a écrit :
> On Thu, 3 Sep 2009, Paul E. McKenney wrote:
> 
>> 2.	CPU 0 discovers that the slab cache can now be destroyed.
>>
>> 	It determines that there are no users, and has guaranteed
>> 	that there will be no future users.  So it knows that it
>> 	can safely do kmem_cache_destroy().
>>
>> 3.	In absence of rcu_barrier(), kmem_cache_destroy() would
>> 	immediately tear down the slab data structures.
> 
> Of course. This has been discussed before.
> 
> You need to ensure that no objects are in use before destroying a slab. In
> case of DESTROY_BY_RCU you must ensure that there are no potential
> readers. So use a suitable rcu barrier or something else like a
> synchronize_rcu...
> 
>>> But going through the RCU period is pointless since no user of the cache
>>> remains.
>> Which is irrelevant.  The outstanding RCU callback was posted by the
>> slab cache itself, -not- by the user of the slab cache.
> 
> There will be no rcu callbacks generated at kmem_cache_destroy with the
> patch I posted.
> 
>>> The dismantling does not need RCU since there are no operations on the
>>> objects in progress. So simply switch DESTROY_BY_RCU off for close.
>> Unless I am missing something, this patch re-introduces the bug that
>> the rcu_barrier() was added to prevent.  So, in absence of a better
>> explanation of what I am missing:
> 
> The "fix" was ill advised. Slab users must ensure that no objects are in
> use before destroying a slab. Only the slab users know how the objects
> are being used. The slab allocator itself cannot know how to ensure that
> there are no pending references. Putting a rcu_barrier in there creates an
> inconsistency in the operation of kmem_cache_destroy() and an expectation
> of functionality that the function cannot provide.
> 



Problem is not _objects_ Christoph, but _slabs_, and your patch is not working.

Its true that when User calls kmem_cache_destroy(), all _objects_ were previously freed.
This is mandatory, with or withou SLAB_DESTROY_BY_RCU thing

Problem is that slub has some internal state, including some to-be-freed _slabs_,
that User have no control at all on it.

User cannot "know" slabs are freed, inuse, or whatever state in cache or call_rcu queues.

Face it, SLAB_DESTROY_BY_RCU is internal affair (to slub/slab/... allocators)

We absolutely need a rcu_barrier() somewhere, believe it or not. You can argue that it should
be done *before*, but it gives no speedup, only potential bugs.

Only case User should do its rcu_barrier() itself is if it knows some call_rcu() are pending
and are delaying _objects_ freeing (typical !SLAB_DESTROY_RCU usage in RCU algos).

I dont even understand why you care so much about kmem_cache_destroy(SLAB_DESTROY_BY_RCU),
given that almost nobody use it. We took almost one month to find out what the bug was in first
place...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Sept. 3, 2009, 10:17 p.m.
Eric Dumazet a écrit :
> 
> 
> 
> Problem is not _objects_ Christoph, but _slabs_, and your patch is not working.
> 
> Its true that when User calls kmem_cache_destroy(), all _objects_ were previously freed.
> This is mandatory, with or withou SLAB_DESTROY_BY_RCU thing
> 
> Problem is that slub has some internal state, including some to-be-freed _slabs_,
> that User have no control at all on it.
> 
> User cannot "know" slabs are freed, inuse, or whatever state in cache or call_rcu queues.
> 
> Face it, SLAB_DESTROY_BY_RCU is internal affair (to slub/slab/... allocators)
> 
> We absolutely need a rcu_barrier() somewhere, believe it or not. You can argue that it should
> be done *before*, but it gives no speedup, only potential bugs.
> 
> Only case User should do its rcu_barrier() itself is if it knows some call_rcu() are pending
> and are delaying _objects_ freeing (typical !SLAB_DESTROY_RCU usage in RCU algos).
> 
> I dont even understand why you care so much about kmem_cache_destroy(SLAB_DESTROY_BY_RCU),
> given that almost nobody use it. We took almost one month to find out what the bug was in first
> place...


So maybe the safest thing would be to include the rcu_barrier() to insure all objects where freed

And another one for SLAB_DESTROY_BY_RCU to make sure slabs where freed

void kmem_cache_destroy(struct kmem_cache *s)
{
	/*
	 * Make sure no objects are waiting in call_rcu queues to be freed
	 */
	rcu_barrier();

	down_write(&slub_lock);
	s->refcount--;
	if (!s->refcount) {
                list_del(&s->list);
                up_write(&slub_lock);
                if (kmem_cache_close(s)) {
                        printk(KERN_ERR "SLUB %s: %s called for cache that "
                                "still has objects.\n", s->name, __func__);
                        dump_stack();
                }
		/*
		 * Make sure no slabs are waiting in call_rcu queues to be freed
		 */
                if (s->flags & SLAB_DESTROY_BY_RCU)
                        rcu_barrier();
                sysfs_slab_remove(s);
        } else
                up_write(&slub_lock);
}
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Sept. 3, 2009, 10:43 p.m.
On Thu, 3 Sep 2009, Paul E. McKenney wrote:

> 2.	CPU 0 discovers that the slab cache can now be destroyed.
>
> 	It determines that there are no users, and has guaranteed
> 	that there will be no future users.  So it knows that it
> 	can safely do kmem_cache_destroy().
>
> 3.	In absence of rcu_barrier(), kmem_cache_destroy() would
> 	immediately tear down the slab data structures.

Of course. This has been discussed before.

You need to ensure that no objects are in use before destroying a slab. In
case of DESTROY_BY_RCU you must ensure that there are no potential
readers. So use a suitable rcu barrier or something else like a
synchronize_rcu...

> > But going through the RCU period is pointless since no user of the cache
> > remains.
>
> Which is irrelevant.  The outstanding RCU callback was posted by the
> slab cache itself, -not- by the user of the slab cache.

There will be no rcu callbacks generated at kmem_cache_destroy with the
patch I posted.

> > The dismantling does not need RCU since there are no operations on the
> > objects in progress. So simply switch DESTROY_BY_RCU off for close.
>
> Unless I am missing something, this patch re-introduces the bug that
> the rcu_barrier() was added to prevent.  So, in absence of a better
> explanation of what I am missing:

The "fix" was ill advised. Slab users must ensure that no objects are in
use before destroying a slab. Only the slab users know how the objects
are being used. The slab allocator itself cannot know how to ensure that
there are no pending references. Putting a rcu_barrier in there creates an
inconsistency in the operation of kmem_cache_destroy() and an expectation
of functionality that the function cannot provide.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Sept. 3, 2009, 10:44 p.m.
On Thu, 3 Sep 2009, Eric Dumazet wrote:

> It seems very smart, but needs review of all callers to make sure no slabs
> are waiting for final freeing in call_rcu queue on some cpu.

Yes. Again this is the first time we encounter a situation where a
DESTROY_BY_RCU slab has to be destroyed. So the review is quite short.

> I suspect most of them will then have to use rcu_barrier() before calling
> kmem_cache_destroy(), so why not factorizing code in one place ?

There are different forms of RCU which require different forms of
barriers. Its best to leave that up to the user. Again the user must make
sure that no objects are in use before a slab is destroyed. For
SLAB_DESTROY_BY_RCU this means that there are no potential outstanding
reads on the structure. You may need an rcu_barrier() to accomplish that.

Slight variations in the use of RCU could require different method. Better
reduce the entanglement of slabs to RCU to a mininum possible.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney - Sept. 3, 2009, 11:17 p.m.
On Thu, Sep 03, 2009 at 05:44:54PM -0500, Christoph Lameter wrote:
> On Thu, 3 Sep 2009, Eric Dumazet wrote:
> 
> > It seems very smart, but needs review of all callers to make sure no slabs
> > are waiting for final freeing in call_rcu queue on some cpu.
> 
> Yes. Again this is the first time we encounter a situation where a
> DESTROY_BY_RCU slab has to be destroyed. So the review is quite short.
> 
> > I suspect most of them will then have to use rcu_barrier() before calling
> > kmem_cache_destroy(), so why not factorizing code in one place ?
> 
> There are different forms of RCU which require different forms of
> barriers. Its best to leave that up to the user. Again the user must make
> sure that no objects are in use before a slab is destroyed. For
> SLAB_DESTROY_BY_RCU this means that there are no potential outstanding
> reads on the structure. You may need an rcu_barrier() to accomplish that.
> 
> Slight variations in the use of RCU could require different method. Better
> reduce the entanglement of slabs to RCU to a mininum possible.

If it were the user of the slab who was invoking some variant of
call_rcu(), then I would agree with you.

However, call_rcu() is instead being invoked by the slab itself in the
case of SLAB_DESTROY_BY_RCU, so that there is no variation in usage.
Requiring that the user call rcu_barrier() is asking for subtle bugs.
Therefore, the best approach is to have kmem_cache_destroy() handle
the RCU cleanup, given that this cleanup is for actions taken by
kmem_cache_free(), not by the user.

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Sept. 4, 2009, 3:33 p.m.
On Thu, 3 Sep 2009, Paul E. McKenney wrote:

> > You need to ensure that no objects are in use before destroying a slab. In
> > case of DESTROY_BY_RCU you must ensure that there are no potential
> > readers. So use a suitable rcu barrier or something else like a
> > synchronize_rcu...
>
> If by "you must ensure" you mean "kmem_cache_destroy must ensure", then
> we are in complete agreement.  Otherwise, not a chance.

Well then we are going down to a crappy interface and mixing rcu with slab
semantics. More bugs to follow in the future.

> If by "must ensure that no objects are in use", you mean "must have
> no further references to them", then we are in agreement.  And in my
> scenario above, it is not the -user- who later references the memory,
> but rather the slab code itself.

The slab code itself is not referencing the later memory with my patch. It
can only be the user.

> Put the rcu_barrier() in kmem_cache_free().  Please.

Guess we are doing this ... Sigh. Are you going to add support other rcu
versions to slab as well as time permits and as the need arises? Should we
add you as a maintainer? ;-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Sept. 4, 2009, 3:38 p.m.
On Fri, 4 Sep 2009, Eric Dumazet wrote:

> Problem is not _objects_ Christoph, but _slabs_, and your patch is not working.

Why?

> Its true that when User calls kmem_cache_destroy(), all _objects_ were
> previously freed. This is mandatory, with or withou SLAB_DESTROY_BY_RCU
> thing

Right.

> Problem is that slub has some internal state, including some to-be-freed _slabs_,
> that User have no control at all on it.

Those are going to be freed without calls to rcu with my patch. The only
reason for earlier rcu frees are user calls to kfree.

> Face it, SLAB_DESTROY_BY_RCU is internal affair (to slub/slab/... allocators)

Nope the user must follow RCU guidelines when using objects.

> We absolutely need a rcu_barrier() somewhere, believe it or not. You can
> argue that it should be done *before*, but it gives no speedup, only
> potential bugs.

I never said that you do not need an rcu_barrier() for this particular
situation? Why suggest such a thing?

The insertion of rcu stuff in the slab code will lead to future bugs since
now the slab logic is tied to the semantics of a particular rcu
implementatin.

> Only case User should do its rcu_barrier() itself is if it knows some
> call_rcu() are pending and are delaying _objects_ freeing (typical
> !SLAB_DESTROY_RCU usage in RCU algos).

Ok then the user already has to deal with the barriers. The API is
inconsistent if you put this into kmem_cache_destroy.

> I dont even understand why you care so much about
> kmem_cache_destroy(SLAB_DESTROY_BY_RCU), given that almost nobody use
> it. We took almost one month to find out what the bug was in first
> place...

This is already the second bug on this issue. Given the complexity of rcu
it is to be experted that inserting more RCU semantics into the slab
allocators is likely to cause future chains of new features and
bugs in slab allocators.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Sept. 4, 2009, 3:39 p.m.
On Fri, 4 Sep 2009, Eric Dumazet wrote:

> And another one for SLAB_DESTROY_BY_RCU to make sure slabs where freed

Now its two rcu_barriers(). What I feared comes true.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Sept. 4, 2009, 3:42 p.m.
On Thu, 3 Sep 2009, Paul E. McKenney wrote:

> If it were the user of the slab who was invoking some variant of
> call_rcu(), then I would agree with you.

The user already has to deal with it as explained by Eric.

> However, call_rcu() is instead being invoked by the slab itself in the
> case of SLAB_DESTROY_BY_RCU, so that there is no variation in usage.
> Requiring that the user call rcu_barrier() is asking for subtle bugs.
> Therefore, the best approach is to have kmem_cache_destroy() handle
> the RCU cleanup, given that this cleanup is for actions taken by
> kmem_cache_free(), not by the user.

The user already has to properly handle barriers and rcu logic in order to
use objects handled with RCU properly. Moreover the user even has to check
that the object is not suddenly checked under it. Its already complex.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney - Sept. 4, 2009, 8:42 p.m.
On Fri, Sep 04, 2009 at 12:17:34AM +0200, Eric Dumazet wrote:
> Eric Dumazet a écrit :
> > 
> > 
> > 
> > Problem is not _objects_ Christoph, but _slabs_, and your patch is not working.
> > 
> > Its true that when User calls kmem_cache_destroy(), all _objects_ were previously freed.
> > This is mandatory, with or withou SLAB_DESTROY_BY_RCU thing
> > 
> > Problem is that slub has some internal state, including some to-be-freed _slabs_,
> > that User have no control at all on it.
> > 
> > User cannot "know" slabs are freed, inuse, or whatever state in cache or call_rcu queues.
> > 
> > Face it, SLAB_DESTROY_BY_RCU is internal affair (to slub/slab/... allocators)
> > 
> > We absolutely need a rcu_barrier() somewhere, believe it or not. You can argue that it should
> > be done *before*, but it gives no speedup, only potential bugs.
> > 
> > Only case User should do its rcu_barrier() itself is if it knows some call_rcu() are pending
> > and are delaying _objects_ freeing (typical !SLAB_DESTROY_RCU usage in RCU algos).
> > 
> > I dont even understand why you care so much about kmem_cache_destroy(SLAB_DESTROY_BY_RCU),
> > given that almost nobody use it. We took almost one month to find out what the bug was in first
> > place...
> 
> 
> So maybe the safest thing would be to include the rcu_barrier() to
> insure all objects where freed

I argue that the above is the user's responsibility.  That said, I don't
see why the user would pass a SLAB_DESTROY_BY_RCU object to call_rcu().
So I would want to see an example of this before inflicting a pair
of rcu_barrier() calls on kmem_cache_destroy().

> And another one for SLAB_DESTROY_BY_RCU to make sure slabs where freed

This last is I believe kmem_cache's responsibility.

							Thanx, Paul

> void kmem_cache_destroy(struct kmem_cache *s)
> {
> 	/*
> 	 * Make sure no objects are waiting in call_rcu queues to be freed
> 	 */
> 	rcu_barrier();
> 
> 	down_write(&slub_lock);
> 	s->refcount--;
> 	if (!s->refcount) {
>                 list_del(&s->list);
>                 up_write(&slub_lock);
>                 if (kmem_cache_close(s)) {
>                         printk(KERN_ERR "SLUB %s: %s called for cache that "
>                                 "still has objects.\n", s->name, __func__);
>                         dump_stack();
>                 }
> 		/*
> 		 * Make sure no slabs are waiting in call_rcu queues to be freed
> 		 */
>                 if (s->flags & SLAB_DESTROY_BY_RCU)
>                         rcu_barrier();
>                 sysfs_slab_remove(s);
>         } else
>                 up_write(&slub_lock);
> }
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney - Sept. 4, 2009, 8:43 p.m.
On Fri, Sep 04, 2009 at 11:42:17AM -0400, Christoph Lameter wrote:
> On Thu, 3 Sep 2009, Paul E. McKenney wrote:
> 
> > If it were the user of the slab who was invoking some variant of
> > call_rcu(), then I would agree with you.
> 
> The user already has to deal with it as explained by Eric.

I didn't read his email that way.  Perhaps I misinterpreted it.

> > However, call_rcu() is instead being invoked by the slab itself in the
> > case of SLAB_DESTROY_BY_RCU, so that there is no variation in usage.
> > Requiring that the user call rcu_barrier() is asking for subtle bugs.
> > Therefore, the best approach is to have kmem_cache_destroy() handle
> > the RCU cleanup, given that this cleanup is for actions taken by
> > kmem_cache_free(), not by the user.
> 
> The user already has to properly handle barriers and rcu logic in order to
> use objects handled with RCU properly. Moreover the user even has to check
> that the object is not suddenly checked under it. Its already complex.

mm/slab.c has had RCU calls since 2.6.9, so this is not new.

> Guess we are doing this ... Sigh. Are you going to add support other rcu
> versions to slab as well as time permits and as the need arises? Should
> we add you as a maintainer? ;-)

I don't see any point in adding anything resembling SLAB_DESTROY_BY_RCU_BH,
SLAB_DESTROY_BY_RCU_SCHED, or SLAB_DESTROY_BY_SRCU unless and until
someone needs it.  And I most especially don't see the point of adding
(say) SLAB_DESTROY_BY_RCU_BH_SCHED until someone convinces me of the
need for it.  I would prefer to put the energy into further streamlining
the underlying RCU implementation, maybe someday collapsing RCU-BH back
into RCU.  ;-)

We have gotten along fine with only SLAB_DESTROY_BY_RCU for almost
five years, so I think we are plenty fine with what we have.  So, as
you say, "as the need arises".

I don't see any more need to add me as maintainer of slab and friends
than of btrfs, netfilter, selinux, decnet, afs, wireless, or any of a
number of other subsystems that use RCU.

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Sept. 8, 2009, 7:57 p.m.
On Fri, 4 Sep 2009, Paul E. McKenney wrote:

> We have gotten along fine with only SLAB_DESTROY_BY_RCU for almost
> five years, so I think we are plenty fine with what we have.  So, as
> you say, "as the need arises".

These were the glory years where SLAB_DESTROY_BY_RCU was only used for
anonymous vmas. Now Eric has picked it up for the net subsystem. You may
see the RCU use proliferate.

The kmem_cache_destroy rcu barriers did not matter until
SLAB_DESTROY_BY_RCU spread.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney - Sept. 8, 2009, 10:20 p.m.
On Tue, Sep 08, 2009 at 03:57:04PM -0400, Christoph Lameter wrote:
> On Fri, 4 Sep 2009, Paul E. McKenney wrote:
> 
> > We have gotten along fine with only SLAB_DESTROY_BY_RCU for almost
> > five years, so I think we are plenty fine with what we have.  So, as
> > you say, "as the need arises".
> 
> These were the glory years where SLAB_DESTROY_BY_RCU was only used for
> anonymous vmas. Now Eric has picked it up for the net subsystem. You may
> see the RCU use proliferate.
> 
> The kmem_cache_destroy rcu barriers did not matter until
> SLAB_DESTROY_BY_RCU spread.

Certainly it is true that increased use of RCU has resulted in new
requirements, which have in turn led to any number of changes over
the years.

Are you saying that people have already asked you for additional
variants of SLAB_DESTROY_BY_RCU?  If so, please don't keep them a secret!
Otherwise, experience indicates that it is best to wait for the new uses,
because they usually aren't quite what one might expect.

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Sept. 8, 2009, 10:41 p.m.
On Tue, 8 Sep 2009, Paul E. McKenney wrote:

> Are you saying that people have already asked you for additional
> variants of SLAB_DESTROY_BY_RCU?  If so, please don't keep them a secret!
> Otherwise, experience indicates that it is best to wait for the new uses,
> because they usually aren't quite what one might expect.

No direct request but I have seen the network developers discover these
features and their caching benefits over the last year. It is likely that
they will try to push it into more components of the net subsystem.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney - Sept. 8, 2009, 10:59 p.m.
On Tue, Sep 08, 2009 at 06:41:14PM -0400, Christoph Lameter wrote:
> On Tue, 8 Sep 2009, Paul E. McKenney wrote:
> 
> > Are you saying that people have already asked you for additional
> > variants of SLAB_DESTROY_BY_RCU?  If so, please don't keep them a secret!
> > Otherwise, experience indicates that it is best to wait for the new uses,
> > because they usually aren't quite what one might expect.
> 
> No direct request but I have seen the network developers discover these
> features and their caching benefits over the last year. It is likely that
> they will try to push it into more components of the net subsystem.

So if they push it far enough, they might well decide that they need
a SLAB_DESTROY_BY_RCU_BH, for example.  Looks like seven bits left,
so unless I am missing something, should not be a huge problem should
this need arise.

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Sept. 9, 2009, 2:04 p.m.
On Tue, 8 Sep 2009, Paul E. McKenney wrote:

> > No direct request but I have seen the network developers discover these
> > features and their caching benefits over the last year. It is likely that
> > they will try to push it into more components of the net subsystem.
>
> So if they push it far enough, they might well decide that they need
> a SLAB_DESTROY_BY_RCU_BH, for example.  Looks like seven bits left,
> so unless I am missing something, should not be a huge problem should
> this need arise.

I'd rather have the call_rcu in the slabs replaced by a function that
can be set by the user. Then we can remove all rcu barriers from the code
and the slabs could be used with any type of rcu functionality.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney - Sept. 9, 2009, 2:42 p.m.
On Wed, Sep 09, 2009 at 10:04:20AM -0400, Christoph Lameter wrote:
> On Tue, 8 Sep 2009, Paul E. McKenney wrote:
> 
> > > No direct request but I have seen the network developers discover these
> > > features and their caching benefits over the last year. It is likely that
> > > they will try to push it into more components of the net subsystem.
> >
> > So if they push it far enough, they might well decide that they need
> > a SLAB_DESTROY_BY_RCU_BH, for example.  Looks like seven bits left,
> > so unless I am missing something, should not be a huge problem should
> > this need arise.
> 
> I'd rather have the call_rcu in the slabs replaced by a function that
> can be set by the user. Then we can remove all rcu barriers from the code
> and the slabs could be used with any type of rcu functionality.

If the embedded guys are OK with an additional pointer in the slab data
structure, I have no objection to this approach.  I am assuming that we
would use the usual ops-style structure full of pointers to functions
in order to avoid a pair of extra pointers.

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter - Sept. 9, 2009, 2:53 p.m.
On Wed, 9 Sep 2009, Paul E. McKenney wrote:

> If the embedded guys are OK with an additional pointer in the slab data
> structure, I have no objection to this approach.  I am assuming that we
> would use the usual ops-style structure full of pointers to functions
> in order to avoid a pair of extra pointers.

This would also allow us to get rid of the ctor pointer in
kmem_cache_create(). We can put it into the same structure.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney - Sept. 9, 2009, 3:09 p.m.
On Wed, Sep 09, 2009 at 10:53:22AM -0400, Christoph Lameter wrote:
> On Wed, 9 Sep 2009, Paul E. McKenney wrote:
> 
> > If the embedded guys are OK with an additional pointer in the slab data
> > structure, I have no objection to this approach.  I am assuming that we
> > would use the usual ops-style structure full of pointers to functions
> > in order to avoid a pair of extra pointers.
> 
> This would also allow us to get rid of the ctor pointer in
> kmem_cache_create(). We can put it into the same structure.

Sounds good -- that would result in no increase in size.  Could even
add a dtor if people want it.  ;-)  (Sorry, couldn't resist...)

						Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2009-09-03 10:14:51.000000000 -0500
+++ linux-2.6/mm/slub.c	2009-09-03 10:18:32.000000000 -0500
@@ -2594,9 +2594,9 @@  static inline int kmem_cache_close(struc
  */
 void kmem_cache_destroy(struct kmem_cache *s)
 {
-	if (s->flags & SLAB_DESTROY_BY_RCU)
-		rcu_barrier();
 	down_write(&slub_lock);
+	/* Stop deferring frees so that we can immediately free structures */
+	s->flags &= ~SLAB_DESTROY_BY_RCU;
 	s->refcount--;
 	if (!s->refcount) {
 		list_del(&s->list);