From patchwork Mon Apr 14 14:51:23 2014
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 338981
Message-ID: <534BF5EB.4000608@linux.vnet.ibm.com>
Date: Mon, 14 Apr 2014 11:51:23 -0300
From: Adhemerval Zanella
To: "GNU C. Library"
CC: "Steven J. Munroe"
Subject: PowerPC: Sync pthread_once with default implementation

This patch removes the arch-specific powerpc implementation of
pthread_once and uses the default Linux one instead. Although the
current powerpc implementation already contains the memory barriers
required for correct initialization, the default implementation performs
better on newer chips.

I checked the default implementation with some other tests on powerpc
and everything looks ok. The only nit that was puzzling me was the code
difference I saw when checking whether a load-acquire instruction
sequence would yield better performance than a relaxed load plus an
acquire memory fence. From Torvald Riegel's comment:

> Okay. If lwsync is preferable for an acquire load, it might be best to
> check the GCC atomics implementation because I believe it's following
> http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html, which suggests
> isync for acquire loads.

I rechecked the ISA documentation and some literature, and the only
restriction on using 'lwsync' as an acquire barrier is that it should
not be used on 'Write Through Required and Caching Inhibited' storage
locations, which AFAIK are used only in restricted kernel areas. In
fact, loads/stores with reservation do not work on memory with such
attributes, so for general userland lock mechanisms 'lwsync' is safe to
use as an acquire memory fence. GCC itself translates C11
atomic_thread_fence (memory_order_acquire) to 'lwsync', so a relaxed
load plus an acquire memory fence will be translated to 'ld; lwsync'.
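
As an illustration only (this is not part of the patch), the two forms
being compared can be written in C11 roughly as follows. The helper
names are made up for this sketch, and the instruction sequences named
in the comments are the ones discussed in this message, not something
checked against every compiler version.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: a single load with acquire ordering.  The
   mappings page cited above suggests an isync-based sequence for this
   form on POWER.  */
static inline int
load_acquire (atomic_int *p)
{
  return atomic_load_explicit (p, memory_order_acquire);
}

/* Hypothetical helper: a relaxed load followed by an acquire fence.
   Per the discussion above, GCC emits 'lwsync' for the fence, giving
   'ld; lwsync'.  */
static inline int
load_relaxed_then_fence (atomic_int *p)
{
  int v = atomic_load_explicit (p, memory_order_relaxed);
  atomic_thread_fence (memory_order_acquire);
  return v;
}

/* Single-threaded sanity check: both forms return the stored value.
   It says nothing about ordering, which needs a multithreaded test.  */
void
demo_acquire_loads (void)
{
  atomic_int flag = 1;
  assert (load_acquire (&flag) == 1);
  assert (load_relaxed_then_fence (&flag) == 1);
}
```

Both helpers give the caller acquire semantics for the loaded value;
they differ only in which barrier sequence the compiler may pick.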

Using 'ld; cmp; bc; isync' for acquire loads has two advantages: 1) it
is stricter, and thus safer (although still not really required); 2) it
performs much better on some chips (POWER6), while showing very similar
performance on other POWER platforms. I will see whether it is worth
changing the current code for acquire load/release store, at least for
POWER. The only chip where I saw a big improvement from doing so is
POWER6. Also, this testcase is single-threaded, and I think it would be
worth extending it to check contention in multithreaded cases as well.

---

	* nptl/sysdeps/unix/sysv/linux/powerpc/pthread_once.c: Remove file.

---

diff --git a/nptl/sysdeps/unix/sysv/linux/powerpc/pthread_once.c b/nptl/sysdeps/unix/sysv/linux/powerpc/pthread_once.c
deleted file mode 100644
index e925299..0000000
--- a/nptl/sysdeps/unix/sysv/linux/powerpc/pthread_once.c
+++ /dev/null
@@ -1,110 +0,0 @@
-/* Copyright (C) 2003-2014 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Paul Mackerras , 2003.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   .  */
-
-#include "pthreadP.h"
-#include
-
-
-unsigned long int __fork_generation attribute_hidden;
-
-
-static void
-clear_once_control (void *arg)
-{
-  pthread_once_t *once_control = (pthread_once_t *) arg;
-
-  __asm __volatile (__lll_rel_instr);
-  *once_control = 0;
-  lll_futex_wake (once_control, INT_MAX, LLL_PRIVATE);
-}
-
-
-int
-__pthread_once (pthread_once_t *once_control, void (*init_routine) (void))
-{
-  for (;;)
-    {
-      int oldval;
-      int newval;
-      int tmp;
-
-      /* Pseudo code:
-         newval = __fork_generation | 1;
-         oldval = *once_control;
-         if ((oldval & 2) == 0)
-           *once_control = newval;
-         Do this atomically with an acquire barrier.
-      */
-      newval = __fork_generation | 1;
-      __asm __volatile ("1: lwarx %0,0,%3" MUTEX_HINT_ACQ "\n"
-                        " andi. %1,%0,2\n"
-                        " bne 2f\n"
-                        " stwcx. %4,0,%3\n"
-                        " bne 1b\n"
-                        "2: " __lll_acq_instr
-                        : "=&r" (oldval), "=&r" (tmp), "=m" (*once_control)
-                        : "r" (once_control), "r" (newval), "m" (*once_control)
-                        : "cr0");
-
-      /* Check if the initializer has already been done.  */
-      if ((oldval & 2) != 0)
-        return 0;
-
-      /* Check if another thread already runs the initializer.  */
-      if ((oldval & 1) == 0)
-        break;
-
-      /* Check whether the initializer execution was interrupted by a fork.  */
-      if (oldval != newval)
-        break;
-
-      /* Same generation, some other thread was faster. Wait.  */
-      lll_futex_wait (once_control, oldval, LLL_PRIVATE);
-    }
-
-
-  /* This thread is the first here.  Do the initialization.
-     Register a cleanup handler so that in case the thread gets
-     interrupted the initialization can be restarted.  */
-  pthread_cleanup_push (clear_once_control, once_control);
-
-  init_routine ();
-
-  pthread_cleanup_pop (0);
-
-
-  /* Add one to *once_control to take the bottom 2 bits from 01 to 10.
-     A release barrier is needed to ensure memory written by init_routine
-     is seen in other threads before *once_control changes.  */
-  int tmp;
-  __asm __volatile (__lll_rel_instr "\n"
-                    "1: lwarx %0,0,%2" MUTEX_HINT_REL "\n"
-                    " addi %0,%0,1\n"
-                    " stwcx. %0,0,%2\n"
-                    " bne- 1b"
-                    : "=&b" (tmp), "=m" (*once_control)
-                    : "r" (once_control), "m" (*once_control)
-                    : "cr0");
-
-  /* Wake up all other threads.  */
-  lll_futex_wake (once_control, INT_MAX, LLL_PRIVATE);
-
-  return 0;
-}
-weak_alias (__pthread_once, pthread_once)
-hidden_def (__pthread_once)
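
For readers who do not read PowerPC assembly, the two-bit state machine
in the removed file (01 = initializer running, 10 = done) can be
sketched in portable C11 roughly as below. This is a simplified
illustration under stated assumptions, not the glibc default
implementation: it ignores __fork_generation and cancellation cleanup,
it spins where the real code sleeps on a futex, and every name in it is
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define ONCE_IN_PROGRESS 1   /* Bit 0: an initializer is running.  */
#define ONCE_DONE        2   /* Bit 1: initialization has finished.  */

/* Hypothetical sketch of the once logic using C11 atomics.  */
static int
once_sketch (atomic_int *once_control, void (*init_routine) (void))
{
  for (;;)
    {
      int oldval = atomic_load_explicit (once_control,
                                         memory_order_acquire);

      /* Initialization already finished; the acquire load makes the
         initializer's writes visible to this thread.  */
      if ((oldval & ONCE_DONE) != 0)
        return 0;

      /* Another thread is initializing.  Spin here; the removed code
         waits on a futex instead.  */
      if ((oldval & ONCE_IN_PROGRESS) != 0)
        continue;

      /* Try to claim the initialization.  A failed (or spuriously
         failed) exchange just retries the loop.  */
      int expected = oldval;
      if (atomic_compare_exchange_weak_explicit (once_control, &expected,
                                                 ONCE_IN_PROGRESS,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
        break;
    }

  init_routine ();

  /* Add one to move the bottom two bits from 01 to 10, with release
     ordering so the initializer's writes are published first.  */
  atomic_fetch_add_explicit (once_control, 1, memory_order_release);
  return 0;
}

/* Demo state, for illustration only.  */
static int init_calls;

static void
count_init (void)
{
  init_calls++;
}

/* Calls once_sketch twice on one control word and returns how many
   times the initializer actually ran.  */
int
once_sketch_demo (void)
{
  atomic_int control = 0;
  init_calls = 0;
  once_sketch (&control, count_init);
  once_sketch (&control, count_init);
  return init_calls;
}
```

The demo is single-threaded, so it only exercises the claim and
fast-return paths; the waiting path would need a multithreaded test.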