From patchwork Wed Apr 30 13:57:07 2014
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 344163
Message-ID: <53610133.3070908@linux.vnet.ibm.com>
Date: Wed, 30 Apr 2014 10:57:07 -0300
From: Adhemerval Zanella
To: "GNU C. Library"
Subject: [PATCH 1/2] Single thread optimization for malloc atomics

This patch adds a single-thread optimization for malloc's atomic usage:
first check whether the process is single-threaded (ST) and, if so, use
plain load/store instead of atomic instructions.

It is a respin of my initial attempt to add an ST optimization on PowerPC.
The malloc optimization is now arch-agnostic and relies on the
SINGLE_THREAD_P macro already defined in GLIBC.  x86 would probably like
an atomic.h refactor to avoid the double check of
'__local_multiple_threads', but that is not the aim of this patch.

Tested on powerpc32, powerpc64, and x86_64.

---

	* malloc/malloc.c (MALLOC_ATOMIC_OR): New macro: optimize atomic
	for single thread.
	(MALLOC_ATOMIC_AND): Likewise.
	(MALLOC_ATOMIC_CAS_VAL_ACQ): Likewise.
	(MALLOC_ATOMIC_CAS_VAL_REL): Likewise.
	(clear_fastchunks): Use optimized single thread macro.
	(set_fastchunks): Likewise.
	(_int_malloc): Likewise.
	(_int_free): Likewise.

---

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 1120d4d..bb0aa82 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -231,6 +231,7 @@
 #include
 #include
+#include   /* For SINGLE_THREAD_P macro */

 /* For uintptr_t.  */
 #include
@@ -243,6 +244,57 @@
 /*
+  Single-thread lock optimization: atomic primitives first check the number of
+  threads and avoid atomic instructions for single-thread case.
+ */
+#define MALLOC_ATOMIC_OR(mem, mask) \
+  ({ \
+    if (!SINGLE_THREAD_P) \
+      catomic_or (mem, mask); \
+    else \
+      *mem |= mask; \
+  })
+
+#define MALLOC_ATOMIC_AND(mem, mask) \
+  ({ \
+    if (!SINGLE_THREAD_P) \
+      catomic_and (mem, mask); \
+    else \
+      *mem &= mask; \
+  })
+
+#define MALLOC_ATOMIC_CAS_VAL_ACQ(mem, newval, oldval) \
+  ({ \
+    __typeof (*(mem)) __tmp; \
+    __typeof (mem) __memp = (mem); \
+    if (!SINGLE_THREAD_P) \
+      __tmp = catomic_compare_and_exchange_val_acq (mem, newval, oldval); \
+    else \
+      { \
+	__tmp = *__memp; \
+	if (__tmp == oldval) \
+	  *__memp = newval; \
+      } \
+    __tmp; \
+  })
+
+#define MALLOC_ATOMIC_CAS_VAL_REL(mem, newval, oldval) \
+  ({ \
+    __typeof (*(mem)) __tmp; \
+    __typeof (mem) __memp = (mem); \
+    if (!SINGLE_THREAD_P) \
+      __tmp = catomic_compare_and_exchange_val_rel (mem, newval, oldval); \
+    else \
+      { \
+	__tmp = *__memp; \
+	if (__tmp == oldval) \
+	  *__memp = newval; \
+      } \
+    __tmp; \
+  })
+
+
+/*
   Debugging:

   Because freed chunks may be overwritten with bookkeeping fields, this
@@ -1632,8 +1684,8 @@ typedef struct malloc_chunk *mfastbinptr;
 #define FASTCHUNKS_BIT (1U)

 #define have_fastchunks(M) (((M)->flags & FASTCHUNKS_BIT) == 0)
-#define clear_fastchunks(M) catomic_or (&(M)->flags, FASTCHUNKS_BIT)
-#define set_fastchunks(M) catomic_and (&(M)->flags, ~FASTCHUNKS_BIT)
+#define clear_fastchunks(M) MALLOC_ATOMIC_OR (&(M)->flags, FASTCHUNKS_BIT)
+#define set_fastchunks(M) MALLOC_ATOMIC_AND (&(M)->flags, ~FASTCHUNKS_BIT)

 /*
   NONCONTIGUOUS_BIT indicates that MORECORE does not return contiguous
@@ -3334,7 +3386,7 @@ _int_malloc (mstate av, size_t bytes)
           if (victim == NULL)
             break;
         }
-      while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
+      while ((pp = MALLOC_ATOMIC_CAS_VAL_ACQ (fb, victim->fd, victim))
             != victim);
       if (victim != 0)
         {
@@ -3903,7 +3955,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
 	old_idx = fastbin_index(chunksize(old));
 	p->fd = old2 = old;
       }
-    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2);
+    while ((old = MALLOC_ATOMIC_CAS_VAL_REL (fb, p, old2)) != old2);
     if (have_lock && old != NULL && __builtin_expect (old_idx != idx, 0))
       {