From patchwork Thu Sep 21 18:43:20 2017
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 817150
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: libc-alpha@sourceware.org, dj@redhat.com
Cc: nd
Subject: [PATCH v2][malloc] Use relaxed atomics for malloc have_fastchunks
Date: Thu, 21 Sep 2017 18:43:20 +0000
Currently free typically uses 2 atomic operations per call.  The have_fastchunks flag indicates whether there are recently freed blocks in the fastbins.  It is purely an optimization to avoid calling malloc_consolidate too often and to avoid the overhead of walking all fastbins even when they are all empty during a sequence of allocations.  However, using catomic_or to update the flag is completely unnecessary, since it can be changed into a simple boolean and accessed with relaxed atomics.
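To make the idea concrete, here is a minimal standalone sketch of the pattern, written against C11 <stdatomic.h> rather than glibc's internal atomic_store_relaxed/atomic_load_relaxed macros; struct arena, note_fastbin_insert and maybe_consolidate are made-up names for illustration only.  Because the flag is just a hint, a stale value at worst causes one extra or one skipped consolidation pass, which is why no ordering stronger than relaxed is needed:

/* Illustrative only -- not glibc code.  An approximate "there may be work
   to do" flag maintained with relaxed atomics.  */
#include <stdatomic.h>
#include <stdbool.h>

struct arena
{
  atomic_bool have_fastchunks;   /* Hint: the fastbins may be non-empty.  */
  /* ... rest of the arena state ... */
};

/* Called where a chunk is placed into a fastbin (the free fast path).  */
static void
note_fastbin_insert (struct arena *av)
{
  atomic_store_explicit (&av->have_fastchunks, true, memory_order_relaxed);
}

/* Called on allocation paths that may want to consolidate fastbin chunks.  */
static void
maybe_consolidate (struct arena *av, void (*consolidate) (struct arena *))
{
  if (atomic_load_explicit (&av->have_fastchunks, memory_order_relaxed))
    {
      /* The patch clears the flag early in malloc_consolidate, again with a
         plain relaxed store, before walking the fastbins under the arena
         lock.  */
      atomic_store_explicit (&av->have_fastchunks, false, memory_order_relaxed);
      consolidate (av);
    }
}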
There is no change in multi-threaded behaviour, since the flag is already approximate: it may be set when no fast bin holds any blocks, or it may be clear when there are free blocks that could be consolidated.  Performance of malloc/free improves by 27% on a simple benchmark on AArch64 (both single- and multi-threaded).  The number of load/store exclusive instructions is reduced by 33% (a small standalone comparison after the patch below illustrates why).  Bench-malloc-thread speeds up by ~3% in all cases.

Passes GLIBC tests.  OK to commit?

ChangeLog:
2017-09-21  Wilco Dijkstra

	* malloc/malloc.c (FASTCHUNKS_BIT): Remove.
	(have_fastchunks): Remove.
	(clear_fastchunks): Remove.
	(set_fastchunks): Remove.
	(malloc_state): Add have_fastchunks.
	(malloc_init_state): Use have_fastchunks.
	(do_check_malloc_state): Remove incorrect invariant checks.
	(_int_malloc): Use have_fastchunks.
	(_int_free): Likewise.
	(malloc_consolidate): Likewise.

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 1c2a0b05b78c84cea60ee998108180d51b1f1ddf..082c2b927727bff441cf48744265628d0bc40add 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1604,27 +1604,6 @@ typedef struct malloc_chunk *mfastbinptr;
 #define FASTBIN_CONSOLIDATION_THRESHOLD  (65536UL)
 
 /*
-   Since the lowest 2 bits in max_fast don't matter in size comparisons,
-   they are used as flags.
- */
-
-/*
-   FASTCHUNKS_BIT held in max_fast indicates that there are probably
-   some fastbin chunks.  It is set true on entering a chunk into any
-   fastbin, and cleared only in malloc_consolidate.
-
-   The truth value is inverted so that have_fastchunks will be true
-   upon startup (since statics are zero-filled), simplifying
-   initialization checks.
- */
-
-#define FASTCHUNKS_BIT        (1U)
-
-#define have_fastchunks(M)     (((M)->flags & FASTCHUNKS_BIT) == 0)
-#define clear_fastchunks(M)    catomic_or (&(M)->flags, FASTCHUNKS_BIT)
-#define set_fastchunks(M)      catomic_and (&(M)->flags, ~FASTCHUNKS_BIT)
-
-/*
    NONCONTIGUOUS_BIT indicates that MORECORE does not return contiguous
    regions.  Otherwise, contiguity is exploited in merging together,
    when possible, results from consecutive MORECORE calls.
@@ -1672,6 +1651,17 @@ get_max_fast (void)
    ----------- Internal state representation and initialization -----------
  */
 
+/*
+   have_fastchunks indicates that there are probably some fastbin chunks.
+   It is set true on entering a chunk into any fastbin, and cleared early in
+   malloc_consolidate.  The value is approximate since it may be set when there
+   are no fastbin chunks, or it may be clear even if there are fastbin chunks
+   available.  Given its sole purpose is to reduce the number of redundant
+   calls to malloc_consolidate, it does not affect correctness.  As a result
+   we can safely use relaxed atomic accesses.
+ */
+
+
 struct malloc_state
 {
   /* Serialize access.  */
@@ -1680,6 +1670,9 @@ struct malloc_state
   /* Flags (formerly in max_fast).  */
   int flags;
 
+  /* Set if the fastbin chunks contain recently inserted free blocks.  */
+  bool have_fastchunks;
+
   /* Fastbins */
   mfastbinptr fastbinsY[NFASTBINS];
 
@@ -1823,7 +1816,7 @@ malloc_init_state (mstate av)
   set_noncontiguous (av);
   if (av == &main_arena)
     set_max_fast (DEFAULT_MXFAST);
-  av->flags |= FASTCHUNKS_BIT;
+  atomic_store_relaxed (&av->have_fastchunks, false);
 
   av->top = initial_top (av);
 }
@@ -2179,11 +2172,6 @@ do_check_malloc_state (mstate av)
         }
     }
 
-  if (total != 0)
-    assert (have_fastchunks (av));
-  else if (!have_fastchunks (av))
-    assert (total == 0);
-
   /* check normal bins */
   for (i = 1; i < NBINS; ++i)
     {
@@ -3650,7 +3638,7 @@ _int_malloc (mstate av, size_t bytes)
   else
     {
       idx = largebin_index (nb);
-      if (have_fastchunks (av))
+      if (atomic_load_relaxed (&av->have_fastchunks))
         malloc_consolidate (av);
     }
 
@@ -4058,7 +4046,7 @@ _int_malloc (mstate av, size_t bytes)
 
       /* When we are using atomic ops to free fast chunks we can get
          here for all block sizes.  */
-      else if (have_fastchunks (av))
+      else if (atomic_load_relaxed (&av->have_fastchunks))
         {
           malloc_consolidate (av);
           /* restore original bin index */
@@ -4163,7 +4151,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
      free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);
 
-    set_fastchunks(av);
+    atomic_store_relaxed (&av->have_fastchunks, true);
     unsigned int idx = fastbin_index(size);
     fb = &fastbin (av, idx);
 
@@ -4291,7 +4279,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
     */
 
     if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) {
-      if (have_fastchunks(av))
+      if (atomic_load_relaxed (&av->have_fastchunks))
 	malloc_consolidate(av);
 
       if (av == &main_arena) {
@@ -4360,7 +4348,7 @@ static void malloc_consolidate(mstate av)
   */
 
   if (get_max_fast () != 0) {
-    clear_fastchunks(av);
+    atomic_store_relaxed (&av->have_fastchunks, false);
 
     unsorted_bin = unsorted_chunks(av);
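For reference only (not part of the patch): the drop in load/store-exclusive instructions mentioned above can be seen from a standalone comparison of the two update schemes.  The names below are made up for the example, and the exact code generation depends on the compiler and target options; on AArch64 without LSE atomics, GCC typically expands the atomic OR into an LDXR/STXR retry loop, while the relaxed store compiles to a single plain store.

/* Illustrative only -- not glibc code.  Compare e.g. "gcc -O2 -S" output on
   AArch64 for the two functions below.  */
#include <stdatomic.h>
#include <stdbool.h>

#define FASTCHUNKS_BIT 1U

static atomic_uint flags;            /* Stands in for the old av->flags word.  */
static atomic_bool have_fastchunks;  /* Stands in for the new av->have_fastchunks.  */

/* Old scheme: clearing the hint is an atomic bit operation on the flags
   word, i.e. a read-modify-write, as catomic_or did.  */
void
clear_fastchunks_old (void)
{
  atomic_fetch_or_explicit (&flags, FASTCHUNKS_BIT, memory_order_relaxed);
}

/* New scheme: clearing the hint is a relaxed store to a boolean; no
   read-modify-write is required.  */
void
clear_fastchunks_new (void)
{
  atomic_store_explicit (&have_fastchunks, false, memory_order_relaxed);
}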