From patchwork Tue Dec 14 13:35:50 2010
From: Richard Henderson
Date: Tue, 14 Dec 2010 05:35:50 -0800
To: GCC Patches
CC: torvald@se.inf.tu-dresden.de
Subject: [trans-mem] reduce contention caused by transaction id
Message-ID: <4D0772B6.5070407@redhat.com>

Having to ping-pong a cacheline between threads upon creation of any
transaction is quite wasteful. We can essentially eliminate this
contention by allocating a block of 2**N IDs to the thread at once.
Here I use 2**16, which means that we have 2**48 such blocks to
allocate between threads. That ought to be enough that even a badly
behaved application that continuously creates new threads cannot
quickly exhaust the supply.

Idea courtesy of Torvald Riegel.

r~

	* beginend.cc (GTM::gtm_transaction::begin_transaction): Allocate
	blocks of TIDs per thread.
	* config/generic/tls.h (struct gtm_thread): Add local_tid member.
	(setup_gtm_thr): Return the thread structure.
	* config/x86/tls.h (setup_gtm_thr): Likewise.

Index: beginend.cc
===================================================================
--- beginend.cc	(revision 167789)
+++ beginend.cc	(working copy)
@@ -91,11 +91,13 @@
 uint32_t
 GTM::gtm_transaction::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
 {
+  static const _ITM_transactionId_t tid_block_size = 1 << 16;
+
   gtm_transaction *tx;
   gtm_dispatch *disp;
   uint32_t ret;
 
-  setup_gtm_thr ();
+  gtm_thread *thr = setup_gtm_thr ();
 
   tx = new gtm_transaction;
 
@@ -103,13 +105,25 @@
   tx->prev = gtm_tx();
   if (tx->prev)
     tx->nesting = tx->prev->nesting + 1;
+
+  // As long as we have not exhausted a previously allocated block of TIDs,
+  // we can avoid an atomic operation on a shared cacheline.
+  if (thr->local_tid & (tid_block_size - 1))
+    tx->id = thr->local_tid++;
+  else
+    {
 #ifdef HAVE_64BIT_SYNC_BUILTINS
-  tx->id = __sync_add_and_fetch (&global_tid, 1);
+      tx->id = __sync_add_and_fetch (&global_tid, tid_block_size);
+      thr->local_tid = tx->id + 1;
 #else
-  pthread_mutex_lock (&global_tid_lock);
-  tx->id = ++global_tid;
-  pthread_mutex_unlock (&global_tid_lock);
+      pthread_mutex_lock (&global_tid_lock);
+      global_tid += tid_block_size;
+      tx->id = global_tid;
+      thr->local_tid = tx->id + 1;
+      pthread_mutex_unlock (&global_tid_lock);
 #endif
+    }
+
   tx->jb = *jb;
 
   set_gtm_tx (tx);

Index: config/x86/tls.h
===================================================================
--- config/x86/tls.h	(revision 167790)
+++ config/x86/tls.h	(working copy)
@@ -65,10 +65,15 @@
   return r;
 }
 
-static inline void setup_gtm_thr(void)
+static inline struct gtm_thread *setup_gtm_thr(void)
 {
-  if (gtm_thr() == NULL)
-    asm volatile (SEG_WRITE(10) : : "r"(&_gtm_thr));
+  gtm_thread *thr = gtm_thr();
+  if (thr == NULL)
+    {
+      thr = &_gtm_thr;
+      asm volatile (SEG_WRITE(10) : : "r"(thr));
+    }
+  return thr;
 }
 
 static inline struct gtm_transaction * gtm_tx(void)

Index: config/generic/tls.h
===================================================================
--- config/generic/tls.h	(revision 167789)
+++ config/generic/tls.h	(working copy)
@@ -50,6 +50,12 @@
   void *free_tx[MAX_FREE_TX];
   unsigned free_tx_idx, free_tx_count;
 
+  // In order to reduce cacheline contention on global_tid during
+  // beginTransaction, we allocate a block of 2**N ids to the thread
+  // all at once. This number is the next value to be allocated from
+  // the block, or 0 % 2**N if no such block is allocated.
+  _ITM_transactionId_t local_tid;
+
   // The value returned by _ITM_getThreadnum to identify this thread.
   // ??? At present, this is densely allocated beginning with 1 and
   // we don't bother filling in this value until it is requested.
@@ -67,7 +73,7 @@
 
 #ifndef HAVE_ARCH_GTM_THREAD
 // If the target does not provide optimized access to the thread-local
 // data, simply access the TLS variable defined above.
-static inline void setup_gtm_thr() { }
+static inline gtm_thread *setup_gtm_thr() { return &_gtm_thr; }
 static inline gtm_thread *gtm_thr() { return &_gtm_thr; }
 #endif
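
For anyone who wants the scheme in isolation, here is a minimal
standalone sketch of the block-allocation idea. The names
(next_transaction_id, tid_block_size, local_tid, global_tid) and the
use of C++11 <atomic> and thread_local are illustrative assumptions
only; the patch itself works on libitm's gtm_thread structure and uses
the __sync builtins (or a mutex where 64-bit sync builtins are
unavailable).

// Minimal sketch of per-thread TID block allocation (hypothetical names;
// C++11 atomics stand in for the __sync builtins used by the patch).
#include <atomic>
#include <cstdint>

static const uint64_t tid_block_size = 1ULL << 16;

// Shared counter, advanced by a whole block at a time.
static std::atomic<uint64_t> global_tid (0);

// Next id to hand out from this thread's current block; any multiple of
// tid_block_size (including the initial 0) means "block exhausted".
static thread_local uint64_t local_tid = 0;

static uint64_t
next_transaction_id ()
{
  // Fast path: ids remain in the previously allocated block, so no
  // shared cacheline is touched at all.
  if (local_tid & (tid_block_size - 1))
    return local_tid++;

  // Slow path: reserve a fresh block of tid_block_size ids with a single
  // atomic add. The result is the first id of the new block; the rest
  // are served from local_tid on the fast path above.
  uint64_t id = global_tid.fetch_add (tid_block_size) + tid_block_size;
  local_tid = id + 1;
  return id;
}

A thread touches the shared counter's cacheline only once per 2**16
ids; with a 64-bit id space that leaves 2**48 blocks, so even an
application that constantly creates short-lived threads, each wasting
the unused remainder of its block, cannot quickly exhaust the supply.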