Message ID:   20150615155256.18824.42651.stgit@devil
State:        Not Applicable, archived
Delegated to: David Miller
On Mon, 15 Jun 2015, Jesper Dangaard Brouer wrote:

> +	for (i = 0; i < size; i++) {
> +		void *object = p[i];
> +
> +		if (unlikely(!object))
> +			continue; // HOW ABOUT BUG_ON()???

Sure BUG_ON would be fitting here.

> +
> +		page = virt_to_head_page(object);
> +		BUG_ON(s != page->slab_cache); /* Check if valid slab page */

This is the check if the slab page belongs to the slab cache we are
interested in.

> +
> +		if (c->page == page) {
> +			/* Fastpath: local CPU free */
> +			set_freepointer(s, object, c->freelist);
> +			c->freelist = object;
> +		} else {
> +			c->tid = next_tid(c->tid);

tids are only useful for the fastpath. No need to fiddle around with them
for the slowpath.

> +			local_irq_enable();
> +			/* Slowpath: overhead locked cmpxchg_double_slab */

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/15/2015 08:52 AM, Jesper Dangaard Brouer wrote:
> This implements SLUB specific kmem_cache_free_bulk(). SLUB allocator
> now both have bulk alloc and free implemented.
>
> Play nice and reenable local IRQs while calling slowpath.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>
> [... full patch quoted ...]

So if the idea is to batch the freeing, maybe you should look at doing the
freeing in two passes. The first would be to free all those buffers that
share their page with the percpu slab. Then you could just free everything
else in the second pass after you have re-enabled IRQs.
- Alex
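Alex's two-pass idea could look roughly like the sketch below. This is a
self-contained userspace model, not mm/slub.c code: `struct obj`,
`struct cpu_slab`, and `slow_free()` are illustrative stand-ins for the real
page/freelist machinery. Pass 1 corresponds to the cheap local-page frees
done with IRQs off; pass 2 corresponds to slowpath frees after IRQs are
re-enabled:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: each object records its "page"; the per-CPU slab has one
 * active page and a LIFO freelist threaded through the objects. */
struct obj { struct obj *next; int page; };

struct cpu_slab { int page; struct obj *freelist; };

static int slow_frees; /* objects that took the slowpath */

static void slow_free(struct obj *o)
{
	(void)o;
	slow_frees++;
}

/* Two-pass bulk free: pass 1 handles objects on the per-CPU page
 * (freelist push, IRQs conceptually off); pass 2 sends everything
 * else to the slowpath with IRQs conceptually back on. */
static void free_bulk_two_pass(struct cpu_slab *c, size_t n, struct obj **p)
{
	/* pass 1: local-page objects -> freelist push */
	for (size_t i = 0; i < n; i++) {
		if (p[i] && p[i]->page == c->page) {
			p[i]->next = c->freelist;
			c->freelist = p[i];
			p[i] = NULL; /* consumed */
		}
	}
	/* pass 2: remaining objects via the (locked) slowpath */
	for (size_t i = 0; i < n; i++)
		if (p[i])
			slow_free(p[i]);
}
```

The point of the split is that the IRQs-off region covers only the fast
pass, instead of toggling IRQs once per slowpath object inside the loop.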
On Mon, Jun 15, 2015 at 05:52:56PM +0200, Jesper Dangaard Brouer wrote:
> This implements SLUB specific kmem_cache_free_bulk(). SLUB allocator
> now both have bulk alloc and free implemented.
>
> [... patch quoted ...]
>
> +		page = virt_to_head_page(object);
> +		BUG_ON(s != page->slab_cache); /* Check if valid slab page */

You need to use cache_from_obj() to support kmemcg accounting.
And, slab_free_hook() should be called before the free.

Thanks.
On Mon, Jun 15, 2015 at 05:52:56PM +0200, Jesper Dangaard Brouer wrote:
> [... patch quoted ...]
>
> +		} else {
> +			c->tid = next_tid(c->tid);
> +			local_irq_enable();
> +			/* Slowpath: overhead locked cmpxchg_double_slab */
> +			__slab_free(s, page, object, _RET_IP_);
> +			local_irq_disable();
> +			c = this_cpu_ptr(s->cpu_slab);

SLUB's free path doesn't need IRQ management in many cases, although it
uses cmpxchg_double_slab. Is this really better than just calling
__kmem_cache_free_bulk()?

Thanks.
On Mon, 15 Jun 2015 11:34:44 -0500 (CDT)
Christoph Lameter <cl@linux.com> wrote:

> On Mon, 15 Jun 2015, Jesper Dangaard Brouer wrote:
>
> > +	for (i = 0; i < size; i++) {
> > +		void *object = p[i];
> > +
> > +		if (unlikely(!object))
> > +			continue; // HOW ABOUT BUG_ON()???
>
> Sure BUG_ON would be fitting here.

Okay, will do in V2.

> > +		page = virt_to_head_page(object);
> > +		BUG_ON(s != page->slab_cache); /* Check if valid slab page */
>
> This is the check if the slab page belongs to the slab cache we are
> interested in.

Is this appropriate to keep on this fastpath? (I copied the check from one
of your earlier patches.)

> > +		if (c->page == page) {
> > +			/* Fastpath: local CPU free */
> > +			set_freepointer(s, object, c->freelist);
> > +			c->freelist = object;
> > +		} else {
> > +			c->tid = next_tid(c->tid);
>
> tids are only useful for the fastpath. No need to fiddle around with them
> for the slowpath.

Okay, understood.

> > +			local_irq_enable();
> > +			/* Slowpath: overhead locked cmpxchg_double_slab */
On Tue, 16 Jun 2015 16:28:06 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> Is this really better than just calling __kmem_cache_free_bulk()?
Yes, as can be seen by cover-letter, but my cover-letter does not seem
to have reached mm-list.
Measurements for the entire patchset:
Bulk - Fallback bulking - fastpath-bulking
1 - 47 cycles(tsc) 11.921 ns - 45 cycles(tsc) 11.461 ns improved 4.3%
2 - 46 cycles(tsc) 11.649 ns - 28 cycles(tsc) 7.023 ns improved 39.1%
3 - 46 cycles(tsc) 11.550 ns - 22 cycles(tsc) 5.671 ns improved 52.2%
4 - 45 cycles(tsc) 11.398 ns - 19 cycles(tsc) 4.967 ns improved 57.8%
8 - 45 cycles(tsc) 11.303 ns - 17 cycles(tsc) 4.298 ns improved 62.2%
16 - 44 cycles(tsc) 11.221 ns - 17 cycles(tsc) 4.423 ns improved 61.4%
30 - 75 cycles(tsc) 18.894 ns - 57 cycles(tsc) 14.497 ns improved 24.0%
32 - 73 cycles(tsc) 18.491 ns - 56 cycles(tsc) 14.227 ns improved 23.3%
34 - 75 cycles(tsc) 18.962 ns - 58 cycles(tsc) 14.638 ns improved 22.7%
48 - 80 cycles(tsc) 20.049 ns - 64 cycles(tsc) 16.247 ns improved 20.0%
64 - 87 cycles(tsc) 21.929 ns - 74 cycles(tsc) 18.598 ns improved 14.9%
128 - 98 cycles(tsc) 24.511 ns - 89 cycles(tsc) 22.295 ns improved 9.2%
158 - 101 cycles(tsc) 25.389 ns - 93 cycles(tsc) 23.390 ns improved 7.9%
250 - 104 cycles(tsc) 26.170 ns - 100 cycles(tsc) 25.112 ns improved 3.8%
I'll do a compare against the previous patch, and post the results.
On Tue, 16 Jun 2015 10:21:10 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Tue, 16 Jun 2015 16:28:06 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>
> > Is this really better than just calling __kmem_cache_free_bulk()?
>
> Yes, as can be seen by cover-letter, but my cover-letter does not seem
> to have reached mm-list.
>
> [... measurements for the entire patchset quoted ...]
>
> I'll do a compare against the previous patch, and post the results.
Compare against previous patch:

Run: previous-patch - this patch
  1 - 49 cycles(tsc) 12.378 ns - 43 cycles(tsc) 10.775 ns improved 12.2%
  2 - 37 cycles(tsc)  9.297 ns - 26 cycles(tsc)  6.652 ns improved 29.7%
  3 - 33 cycles(tsc)  8.348 ns - 21 cycles(tsc)  5.347 ns improved 36.4%
  4 - 31 cycles(tsc)  7.930 ns - 18 cycles(tsc)  4.669 ns improved 41.9%
  8 - 30 cycles(tsc)  7.693 ns - 17 cycles(tsc)  4.404 ns improved 43.3%
 16 - 32 cycles(tsc)  8.059 ns - 17 cycles(tsc)  4.493 ns improved 46.9%
 30 - 65 cycles(tsc) 16.345 ns - 59 cycles(tsc) 14.858 ns improved  9.2%
 32 - 64 cycles(tsc) 16.170 ns - 56 cycles(tsc) 14.074 ns improved 12.5%
 34 - 66 cycles(tsc) 16.645 ns - 55 cycles(tsc) 13.882 ns improved 16.7%
 48 - 78 cycles(tsc) 19.581 ns - 65 cycles(tsc) 16.266 ns improved 16.7%
 64 - 81 cycles(tsc) 20.428 ns - 77 cycles(tsc) 19.432 ns improved  4.9%
128 - 92 cycles(tsc) 23.030 ns - 78 cycles(tsc) 19.650 ns improved 15.2%
158 - 94 cycles(tsc) 23.581 ns - 83 cycles(tsc) 20.953 ns improved 11.7%
250 - 96 cycles(tsc) 24.175 ns - 93 cycles(tsc) 23.444 ns improved  3.1%

This mostly amortizes the less-heavy non-locked cmpxchg_double used on the
fastpath. As I demonstrated earlier, this fastpath cmpxchg is approx 38% of
the fastpath cost. Thus, yes, doing bulk free this way is faster ;-)
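As a back-of-the-envelope illustration of the amortization argument — a toy
userspace model, not kernel code, where the counters merely stand in for the
measured per-operation costs: a per-object kmem_cache_free() fastpath pays
one this_cpu_cmpxchg_double() per object, while the bulk fastpath pays a
plain freelist store per object plus a single tid update per batch:

```c
#include <assert.h>
#include <stddef.h>

/* Toy counters standing in for real per-operation costs. */
static int cmpxchg_ops;  /* this_cpu_cmpxchg_double(), ~9 cycles each */
static int store_ops;    /* plain freelist pointer store */
static int tid_updates;  /* next_tid() bump */

/* Per-object free: the fastpath commits freelist+tid with one cmpxchg. */
static void model_free_one(void)
{
	cmpxchg_ops++;
}

/* Bulk free fastpath: a plain store per object, one tid bump per batch. */
static void model_free_bulk(size_t n)
{
	for (size_t i = 0; i < n; i++)
		store_ops++;
	tid_updates++;
}
```

For a batch of 16 this trades 16 cmpxchg operations for 16 plain stores and
one tid update, which matches the shape of the improvements in the table
above.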
On Tue, 16 Jun 2015 16:23:28 +0900
Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:

> On Mon, Jun 15, 2015 at 05:52:56PM +0200, Jesper Dangaard Brouer wrote:
> > [... patch quoted ...]
> >
> > +		page = virt_to_head_page(object);
> > +		BUG_ON(s != page->slab_cache); /* Check if valid slab page */
>
> You need to use cache_from_obj() to support kmemcg accounting.
> And, slab_free_hook() should be called before the free.

Okay, but Christoph chose to not support kmem_cache_debug() in patch 2/7.

Should we/I try to add kmem cache debugging support?

If adding these, then I would also need to add those on the alloc path...
2015-06-16 18:20 GMT+09:00 Jesper Dangaard Brouer <brouer@redhat.com>:
> On Tue, 16 Jun 2015 16:23:28 +0900
> Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>
>> [... patch quoted ...]
>>
>> You need to use cache_from_obj() to support kmemcg accounting.
>> And, slab_free_hook() should be called before the free.
>
> Okay, but Christoph chose to not support kmem_cache_debug() in patch 2/7.
>
> Should we/I try to add kmem cache debugging support?

kmem_cache_debug() checks for the slab's internal debugging features.
slab_free_hook() and the others I mentioned relate to external debugging
features such as kasan and kmemleak. So, even if a debugged kmem_cache
isn't supported by the bulk API, the external debugging features should
still be supported.

> If adding these, then I would also need to add those on alloc path...

Yes, please.

Thanks.
2015-06-16 17:57 GMT+09:00 Jesper Dangaard Brouer <brouer@redhat.com>:
> On Tue, 16 Jun 2015 10:21:10 +0200
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
>> On Tue, 16 Jun 2015 16:28:06 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>>
>> > Is this really better than just calling __kmem_cache_free_bulk()?
>>
>> Yes, as can be seen by cover-letter, but my cover-letter does not seem
>> to have reached mm-list.
>>
>> [... measurements for the entire patchset quoted ...]
>>
>> I'll do a compare against the previous patch, and post the results.
>
> Compare against previous patch:
>
> Run: previous-patch - this patch
>  1 - 49 cycles(tsc) 12.378 ns - 43 cycles(tsc) 10.775 ns improved 12.2%
>  2 - 37 cycles(tsc)  9.297 ns - 26 cycles(tsc)  6.652 ns improved 29.7%
>  3 - 33 cycles(tsc)  8.348 ns - 21 cycles(tsc)  5.347 ns improved 36.4%
>  4 - 31 cycles(tsc)  7.930 ns - 18 cycles(tsc)  4.669 ns improved 41.9%
>  8 - 30 cycles(tsc)  7.693 ns - 17 cycles(tsc)  4.404 ns improved 43.3%
> 16 - 32 cycles(tsc)  8.059 ns - 17 cycles(tsc)  4.493 ns improved 46.9%

So, in your test, most objects may come from one or two slabs, and your
algorithm is well optimized for this case. But is this workload the normal
case? If most objects come from many different slabs, the bulk free API
enables/disables interrupts very often, so I guess it works worse than
just calling __kmem_cache_free_bulk(). Could you test this case?

Thanks.
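Joonsoo's concern can be made concrete with a toy model of the proposed
loop's control flow (a userspace sketch with illustrative names, not
mm/slub.c code): the number of IRQ enable/disable transitions grows with
the number of objects that miss the per-CPU slab page:

```c
#include <assert.h>
#include <stddef.h>

static int irq_enables; /* count local_irq_enable() transitions */

static void irq_disable(void) { }
static void irq_enable(void)  { irq_enables++; }

/* Mirrors the control flow of the proposed kmem_cache_free_bulk():
 * hit[i] != 0 means object i is on the per-CPU page (fastpath, IRQs
 * stay off); a miss re-enables IRQs around the slowpath free. */
static void bulk_free_model(size_t n, const int *hit)
{
	irq_disable();
	for (size_t i = 0; i < n; i++) {
		if (hit[i]) {
			/* fastpath: freelist push under IRQs-off */
		} else {
			irq_enable();
			/* __slab_free() slowpath would run here */
			irq_disable();
		}
	}
	irq_enable();
}
```

With all objects on the local page the batch pays a single disable/enable
pair; with every object missing, the loop pays one pair per object plus
the final enable, which is the pathological case Joonsoo asks about.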
On Tue, 16 Jun 2015, Joonsoo Kim wrote:

> > If adding these, then I would also need to add those on alloc path...
>
> Yes, please.

Let's fall back to the generic implementation for any of these things. We
need to focus on maximum performance in these functions. The more special
cases we have to handle, the more all of this gets compromised.
On Tue, 16 Jun 2015, Joonsoo Kim wrote:

> So, in your test, most of objects may come from one or two slabs and your
> algorithm is well optimized for this case. But, is this workload normal case?

It is normal if the objects were bulk allocated because SLUB ensures that
all objects are first allocated from one page before moving to another.

> If most of objects comes from many different slabs, bulk free API does
> enabling/disabling interrupt very much so I guess it work worse than
> just calling __kmem_cache_free_bulk(). Could you test this case?

In case of SLAB this would be an issue since the queueing mechanism
destroys spatial locality. This is much less an issue for SLUB.
On Tue, 16 Jun 2015 10:10:25 -0500 (CDT)
Christoph Lameter <cl@linux.com> wrote:

> On Tue, 16 Jun 2015, Joonsoo Kim wrote:
>
> > So, in your test, most of objects may come from one or two slabs and your
> > algorithm is well optimized for this case. But, is this workload normal case?
>
> It is normal if the objects were bulk allocated because SLUB ensures that
> all objects are first allocated from one page before moving to another.

Yes, exactly. Maybe SLAB is different? If so, then we can handle that in
the SLAB specific bulk implementation.

> > If most of objects comes from many different slabs, bulk free API does
> > enabling/disabling interrupt very much so I guess it work worse than
> > just calling __kmem_cache_free_bulk(). Could you test this case?
>
> In case of SLAB this would be an issue since the queueing mechanism
> destroys spatial locality. This is much less an issue for SLUB.

I think Kim is worried about the cost of the enable/disable calls when the
slowpath gets called. But it is not a problem, because the cost of
local_irq_{disable,enable} is very low (total cost 7 cycles).

It is very important that everybody realizes that the save+restore variant
is very expensive; this is key:

 CPU: i7-4790K CPU @ 4.00GHz
 * local_irq_{disable,enable}:  7 cycles(tsc) - 1.821 ns
 * local_irq_{save,restore}  : 37 cycles(tsc) - 9.443 ns

Even if EVERY object needs to call the slowpath/__slab_free(), it will be
faster than calling the fallback, because I've demonstrated that the call
to this_cpu_cmpxchg_double() costs 9 cycles.
On Tue, 16 Jun 2015, Jesper Dangaard Brouer wrote:

> It is very important that everybody realizes that the save+restore
> variant is very expensive, this is key:
>
>  CPU: i7-4790K CPU @ 4.00GHz
>  * local_irq_{disable,enable}:  7 cycles(tsc) - 1.821 ns
>  * local_irq_{save,restore}  : 37 cycles(tsc) - 9.443 ns
>
> Even if EVERY object need to call slowpath/__slab_free() it will be
> faster than calling the fallback. Because I've demonstrated the call
> this_cpu_cmpxchg_double() costs 9 cycles.

But the cmpxchg also stores a value. You need to add the cost of the store
to the cycles.
diff --git a/mm/slub.c b/mm/slub.c
index 98d0e6f73ec1..cc4f870677bb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2752,7 +2752,37 @@ EXPORT_SYMBOL(kmem_cache_free);
 
 void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
 {
-	__kmem_cache_free_bulk(s, size, p);
+	struct kmem_cache_cpu *c;
+	struct page *page;
+	int i;
+
+	local_irq_disable();
+	c = this_cpu_ptr(s->cpu_slab);
+
+	for (i = 0; i < size; i++) {
+		void *object = p[i];
+
+		if (unlikely(!object))
+			continue; // HOW ABOUT BUG_ON()???
+
+		page = virt_to_head_page(object);
+		BUG_ON(s != page->slab_cache); /* Check if valid slab page */
+
+		if (c->page == page) {
+			/* Fastpath: local CPU free */
+			set_freepointer(s, object, c->freelist);
+			c->freelist = object;
+		} else {
+			c->tid = next_tid(c->tid);
+			local_irq_enable();
+			/* Slowpath: overhead locked cmpxchg_double_slab */
+			__slab_free(s, page, object, _RET_IP_);
+			local_irq_disable();
+			c = this_cpu_ptr(s->cpu_slab);
+		}
+	}
+	c->tid = next_tid(c->tid);
+	local_irq_enable();
 }
 EXPORT_SYMBOL(kmem_cache_free_bulk);
This implements SLUB specific kmem_cache_free_bulk(). The SLUB allocator
now has both bulk alloc and free implemented.

Play nice and reenable local IRQs while calling slowpath.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 mm/slub.c |   32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)