From patchwork Tue Mar 5 16:19:12 2013
X-Patchwork-Submitter: Michel Lespinasse
X-Patchwork-Id: 225077
Date: Tue, 5 Mar 2013 08:19:12 -0800
From: Michel Lespinasse
To: Lai Jiangshan
Cc: Lai Jiangshan, linux-doc@vger.kernel.org, peterz@infradead.org,
 fweisbec@gmail.com, linux-kernel@vger.kernel.org, mingo@kernel.org,
 linux-arch@vger.kernel.org, linux@arm.linux.org.uk,
 xiaoguangrong@linux.vnet.ibm.com, wangyun@linux.vnet.ibm.com,
 paulmck@linux.vnet.ibm.com, nikunj@linux.vnet.ibm.com,
 linux-pm@vger.kernel.org, rusty@rustcorp.com.au, rostedt@goodmis.org,
 rjw@sisk.pl, namhyung@kernel.org, tglx@linutronix.de,
 linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
 Oleg Nesterov, vincent.guittot@linaro.org, sbw@mit.edu,
 "Srivatsa S. Bhat", tj@kernel.org, akpm@linux-foundation.org,
 linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH V2] lglock: add read-preference local-global rwlock
Message-ID: <20130305161912.GA30756@google.com>
References: <512D0D67.9010609@linux.vnet.ibm.com>
 <512E7879.20109@linux.vnet.ibm.com> <5130E8E2.50206@cn.fujitsu.com>
 <20130301182854.GA3631@redhat.com> <5131FB4C.7070408@cn.fujitsu.com>
 <20130302172003.GC29769@redhat.com> <51360ED1.3030104@cn.fujitsu.com>
In-Reply-To: <51360ED1.3030104@cn.fujitsu.com>
List-Id: Linux on PowerPC Developers Mail List

Hi Lai,

Just a few comments about your v2 proposal. Hopefully you'll catch these
before you send out v3 :)

- I would prefer reader_refcnt to be unsigned int instead of unsigned long

- I would like some comment to indicate that lgrwlocks don't have
  reader-writer fairness and are thus somewhat discouraged
  (people could use plain lglock if they don't need reader preference,
  though even that use (as brlock) is discouraged already :)

- I don't think FALLBACK_BASE is necessary (you already mentioned you'd
  drop it)

- I prefer using the fallback_rwlock's dep_map for lockdep tracking.
  I feel this is more natural since we want the lgrwlock to behave as
  the rwlock, not as the lglock.

- I prefer to avoid return statements in the middle of functions when
  it's easy to do so.

Attached is my current version (based on an earlier version of your code).
You don't have to take it as is but I feel it makes for a more concrete
suggestion :)

Thanks,

----------------------------8<-------------------------------------------
lglock: add read-preference lgrwlock

Current lglock may be used as a fair rwlock; however sometimes a
read-preference rwlock is preferred. One such use case recently came
up for get_cpu_online_atomic().

This change adds a new lgrwlock with the following properties:
- high performance read side, using only cpu-local structures when there
  is no write side to contend with;
- correctness guarantees similar to rwlock_t: recursive readers are allowed
  and the lock's read side is not ordered vs other locks;
- low performance write side (comparable to lglocks' global side).

The implementation relies on the following principles:
- reader_refcnt is a local lock count; it indicates how many recursive
  read locks are taken using the local lglock;
- lglock is used by readers for local locking; it must be acquired
  before reader_refcnt becomes nonzero and released after reader_refcnt
  goes back to zero;
- fallback_rwlock is used by readers for global locking; it is acquired
  when reader_refcnt is zero and the trylock fails on lglock;
- writers take both the lglock write side and the fallback_rwlock, thus
  making sure to exclude both local and global readers.

Thanks to Srivatsa S. Bhat for proposing a lock with these requirements
and Lai Jiangshan for proposing this algorithm as an lglock extension.

Signed-off-by: Michel Lespinasse

---
 include/linux/lglock.h | 46 +++++++++++++++++++++++++++++++++++++++
 kernel/lglock.c        | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+)

diff --git a/include/linux/lglock.h b/include/linux/lglock.h
index 0d24e932db0b..8b59084935d5 100644
--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@ -67,4 +67,50 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
 void lg_global_lock(struct lglock *lg);
 void lg_global_unlock(struct lglock *lg);
 
+/*
+ * lglock may be used as a read write spinlock if desired (though this is
+ * not encouraged as the write side scales badly on high CPU count machines).
+ * It has reader/writer fairness when used that way.
+ *
+ * However, sometimes it is desired to have an unfair rwlock instead, with
+ * reentrant readers that don't need to be ordered vs other locks, comparable
+ * to rwlock_t. lgrwlock implements such semantics.
+ */
+struct lgrwlock {
+	unsigned int __percpu *reader_refcnt;
+	struct lglock lglock;
+	rwlock_t fallback_rwlock;
+};
+
+#define __DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static DEFINE_PER_CPU(unsigned int, name ## _refcnt);		\
+	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
+	= __ARCH_SPIN_LOCK_UNLOCKED;
+
+#define __LGRWLOCK_INIT(name) {						\
+	.reader_refcnt = &name ## _refcnt,				\
+	.lglock = { .lock = &name ## _lock },				\
+	.fallback_rwlock = __RW_LOCK_UNLOCKED(name.fallback_rwlock)	\
+}
+
+#define DEFINE_LGRWLOCK(name)						\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+#define DEFINE_STATIC_LGRWLOCK(name)					\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
+{
+	lg_lock_init(&lgrw->lglock, name);
+}
+
+void lg_read_lock(struct lgrwlock *lgrw);
+void lg_read_unlock(struct lgrwlock *lgrw);
+void lg_write_lock(struct lgrwlock *lgrw);
+void lg_write_unlock(struct lgrwlock *lgrw);
+void __lg_read_write_lock(struct lgrwlock *lgrw);
+void __lg_read_write_unlock(struct lgrwlock *lgrw);
+
 #endif
diff --git a/kernel/lglock.c b/kernel/lglock.c
index 86ae2aebf004..e78a7c95dbfd 100644
--- a/kernel/lglock.c
+++ b/kernel/lglock.c
@@ -87,3 +87,61 @@ void lg_global_unlock(struct lglock *lg)
 	preempt_enable();
 }
 EXPORT_SYMBOL(lg_global_unlock);
+
+void lg_read_lock(struct lgrwlock *lgrw)
+{
+	preempt_disable();
+
+	if (__this_cpu_read(*lgrw->reader_refcnt) ||
+	    arch_spin_trylock(this_cpu_ptr(lgrw->lglock.lock))) {
+		__this_cpu_inc(*lgrw->reader_refcnt);
+		rwlock_acquire_read(&lgrw->fallback_rwlock.dep_map,
+				    0, 0, _RET_IP_);
+	} else {
+		read_lock(&lgrw->fallback_rwlock);
+	}
+}
+EXPORT_SYMBOL(lg_read_lock);
+
+void lg_read_unlock(struct lgrwlock *lgrw)
+{
+	if (likely(__this_cpu_read(*lgrw->reader_refcnt))) {
+		rwlock_release(&lgrw->fallback_rwlock.dep_map,
+			       1, _RET_IP_);
+		if (!__this_cpu_dec_return(*lgrw->reader_refcnt))
+			arch_spin_unlock(this_cpu_ptr(lgrw->lglock.lock));
+	} else {
+		read_unlock(&lgrw->fallback_rwlock);
+	}
+
+	preempt_enable();
+}
+EXPORT_SYMBOL(lg_read_unlock);
+
+void lg_write_lock(struct lgrwlock *lgrw)
+{
+	lg_global_lock(&lgrw->lglock);
+	write_lock(&lgrw->fallback_rwlock);
+}
+EXPORT_SYMBOL(lg_write_lock);
+
+void lg_write_unlock(struct lgrwlock *lgrw)
+{
+	write_unlock(&lgrw->fallback_rwlock);
+	lg_global_unlock(&lgrw->lglock);
+}
+EXPORT_SYMBOL(lg_write_unlock);
+
+void __lg_read_write_lock(struct lgrwlock *lgrw)
+{
+	lg_write_lock(lgrw);
+	__this_cpu_write(*lgrw->reader_refcnt, 1);
+}
+EXPORT_SYMBOL(__lg_read_write_lock);
+
+void __lg_read_write_unlock(struct lgrwlock *lgrw)
+{
+	__this_cpu_write(*lgrw->reader_refcnt, 0);
+	lg_write_unlock(lgrw);
+}
+EXPORT_SYMBOL(__lg_read_write_unlock);
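
For anyone who wants to poke at the algorithm outside the kernel, here is a
rough userspace sketch of the locking scheme in the patch above. It is only an
illustration under assumptions and not part of the patch: the model_* names,
MODEL_NR_CPUS, the explicit cpu argument and the boolean return value are all
hypothetical, the per-CPU arch spinlocks and the rwlock_t are replaced by C11
atomics, and lockdep annotations and preemption control are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4 /* assumption: number of simulated CPUs */

/* Userspace stand-in for struct lgrwlock; all names here are hypothetical. */
struct model_lgrwlock {
	unsigned int reader_refcnt[MODEL_NR_CPUS]; /* models the per-CPU reader_refcnt */
	atomic_bool local_lock[MODEL_NR_CPUS];     /* models the per-CPU arch_spinlock_t */
	atomic_int fallback;                       /* models rwlock_t: -1 = writer, n >= 0 = n readers */
};

void model_init(struct model_lgrwlock *l)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		l->reader_refcnt[cpu] = 0;
		atomic_init(&l->local_lock[cpu], false);
	}
	atomic_init(&l->fallback, 0);
}

bool model_local_trylock(struct model_lgrwlock *l, int cpu)
{
	/* exchange returns the previous value: false means we took the lock */
	return !atomic_exchange(&l->local_lock[cpu], true);
}

void model_local_unlock(struct model_lgrwlock *l, int cpu)
{
	atomic_store(&l->local_lock[cpu], false);
}

/*
 * Read side. Returns true when the CPU-local fast path was used; the return
 * value exists only to make the model observable. Each cpu index must be
 * used by one thread at a time, mirroring preempt_disable() in the patch.
 */
bool model_read_lock(struct model_lgrwlock *l, int cpu)
{
	if (l->reader_refcnt[cpu] || model_local_trylock(l, cpu)) {
		l->reader_refcnt[cpu]++;
		return true;
	}
	/* Local lock is held by a writer: spin for a read hold on the fallback. */
	int n = atomic_load(&l->fallback);
	for (;;) {
		if (n < 0)
			n = atomic_load(&l->fallback);
		else if (atomic_compare_exchange_weak(&l->fallback, &n, n + 1))
			return false;
	}
}

void model_read_unlock(struct model_lgrwlock *l, int cpu)
{
	if (l->reader_refcnt[cpu]) {
		/* Fast-path hold: drop the local lock with the last recursion level. */
		if (--l->reader_refcnt[cpu] == 0)
			model_local_unlock(l, cpu);
	} else {
		/* Fallback hold: release one read reference. */
		assert(atomic_load(&l->fallback) > 0);
		atomic_fetch_sub(&l->fallback, 1);
	}
}

void model_write_lock(struct model_lgrwlock *l)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		while (!model_local_trylock(l, cpu))
			; /* spin until local readers on this CPU are gone */
	int expected = 0;
	while (!atomic_compare_exchange_weak(&l->fallback, &expected, -1))
		expected = 0; /* spin until fallback readers are gone */
}

void model_write_unlock(struct model_lgrwlock *l)
{
	atomic_store(&l->fallback, 0);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_local_unlock(l, cpu);
}
```

In this model a reader on a given cpu index only touches that index's counter
and flag unless the flag is already taken, which is the property the read-side
fast path in the patch is after. The writer walks every local lock before
taking the fallback, the same order as lg_write_lock().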