Message ID: 20150604103159.4744.75870.stgit@ivy
State: RFC, archived
Delegated to: David Miller
On Thu, 2015-06-04 at 12:31 +0200, Jesper Dangaard Brouer wrote:
> This patch improves performance of the SLUB allocator fastpath by 38% by
> avoiding the call to this_cpu_cmpxchg_double() for NO-PREEMPT kernels.
>
> Reviewers please point out why this change is wrong, as such a large
> improvement should not be possible ;-)

I am not sure if anyone already answered, but the cmpxchg_double()
is needed to avoid the ABA problem.

This is the whole point of using tid _and_ freelist.

Preemption is not the only thing that could happen here; think of
interrupts.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 04 Jun 2015 19:37:57 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2015-06-04 at 12:31 +0200, Jesper Dangaard Brouer wrote:
> > This patch improves performance of the SLUB allocator fastpath by 38% by
> > avoiding the call to this_cpu_cmpxchg_double() for NO-PREEMPT kernels.
> >
> > Reviewers please point out why this change is wrong, as such a large
> > improvement should not be possible ;-)
>
> I am not sure if anyone already answered, but the cmpxchg_double()
> is needed to avoid the ABA problem.
>
> This is the whole point of using tid _and_ freelist.
>
> Preemption is not the only thing that could happen here; think of
> interrupts.

Yes, I sort of already knew this. My real question is if disabling
local interrupts is enough to avoid this? And, does local irq
disabling also stop preemption?

Questions relate to this patch:
 http://ozlabs.org/~akpm/mmots/broken-out/slub-bulk-alloc-extract-objects-from-the-per-cpu-slab.patch
On Mon, 8 Jun 2015, Jesper Dangaard Brouer wrote:

> My real question is if disabling local interrupts is enough to avoid
> this?

Yes, the initial release of slub used interrupt disable in the fast
paths.

> And, does local irq disabling also stop preemption?

Of course.
On Mon, 8 Jun 2015 04:39:38 -0500 (CDT)
Christoph Lameter <cl@linux.com> wrote:

> On Mon, 8 Jun 2015, Jesper Dangaard Brouer wrote:
>
> > My real question is if disabling local interrupts is enough to avoid this?
>
> Yes the initial release of slub used interrupt disable in the fast paths.

Thanks for the confirmation. For this code path we would need the
save/restore variant, which is more expensive than the local
cmpxchg16b. In case of bulking, we should be able to use the less
expensive local_irq_{disable,enable}.

Cost of local IRQ toggling (CPU E5-2695):
 * local_irq_{disable,enable}: 7 cycles(tsc) - 2.861 ns
 * local_irq_{save,restore} : 37 cycles(tsc) - 14.846 ns

p.s. I'm back working on the bulking API...

> > And, does local irq disabling also stop preemption?
>
> Of course.

Thanks for confirming this.
diff --git a/mm/slub.c b/mm/slub.c
index 54c0876..b31991f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2489,13 +2489,32 @@ redo:
 	 * against code executing on this cpu *not* from access by
 	 * other cpus.
 	 */
-	if (unlikely(!this_cpu_cmpxchg_double(
-			s->cpu_slab->freelist, s->cpu_slab->tid,
-			object, tid,
-			next_object, next_tid(tid)))) {
-
-		note_cmpxchg_failure("slab_alloc", s, tid);
-		goto redo;
+	if (IS_ENABLED(CONFIG_PREEMPT)) {
+		if (unlikely(!this_cpu_cmpxchg_double(
+				s->cpu_slab->freelist, s->cpu_slab->tid,
+				object, tid,
+				next_object, next_tid(tid)))) {
+
+			note_cmpxchg_failure("slab_alloc", s, tid);
+			goto redo;
+		}
+	} else {
+		// HACK - On a NON-PREEMPT cmpxchg is not necessary(?)
+		__this_cpu_write(s->cpu_slab->tid, next_tid(tid));
+		__this_cpu_write(s->cpu_slab->freelist, next_object);
+		/*
+		 * Q: What happens in-case called from interrupt handler?
+		 *
+		 * If we need to disable (local) IRQs then most of the
+		 * saving is lost. E.g. the local_irq_{save,restore}
+		 * is too costly.
+		 *
+		 * Saved (alloc+free): 18 cycles - 7.157ns
+		 *
+		 * Cost of (CPU E5-2695):
+		 * local_irq_{disable,enable}: 7 cycles(tsc) - 2.861 ns
+		 * local_irq_{save,restore} : 37 cycles(tsc) - 14.846 ns
+		 */
 	}
 	prefetch_freepointer(s, next_object);
 	stat(s, ALLOC_FASTPATH);
@@ -2726,14 +2745,21 @@ redo:

 	if (likely(page == c->page)) {
 		set_freepointer(s, object, c->freelist);

-		if (unlikely(!this_cpu_cmpxchg_double(
-				s->cpu_slab->freelist, s->cpu_slab->tid,
-				c->freelist, tid,
-				object, next_tid(tid)))) {
+		if (IS_ENABLED(CONFIG_PREEMPT)) {
+			if (unlikely(!this_cpu_cmpxchg_double(
+					s->cpu_slab->freelist, s->cpu_slab->tid,
+					c->freelist, tid,
+					object, next_tid(tid)))) {

-			note_cmpxchg_failure("slab_free", s, tid);
-			goto redo;
+				note_cmpxchg_failure("slab_free", s, tid);
+				goto redo;
+			}
+		} else {
+			// HACK - On a NON-PREEMPT cmpxchg is not necessary(?)
+			__this_cpu_write(s->cpu_slab->tid, next_tid(tid));
+			__this_cpu_write(s->cpu_slab->freelist, object);
 		}
+
 		stat(s, FREE_FASTPATH);
 	} else
 		__slab_free(s, page, x, addr);