From patchwork Tue Mar 5 16:19:12 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
X-Patchwork-Submitter: Michel Lespinasse
X-Patchwork-Id: 225076
X-Patchwork-Delegate: davem@davemloft.net
Date: Tue, 5 Mar 2013 08:19:12 -0800
From: Michel Lespinasse
To: Lai Jiangshan
Cc: Oleg Nesterov, "Srivatsa S. Bhat", Lai Jiangshan,
 linux-doc@vger.kernel.org, peterz@infradead.org, fweisbec@gmail.com,
 linux-kernel@vger.kernel.org, namhyung@kernel.org, mingo@kernel.org,
 linux-arch@vger.kernel.org, linux@arm.linux.org.uk,
 xiaoguangrong@linux.vnet.ibm.com, wangyun@linux.vnet.ibm.com,
 paulmck@linux.vnet.ibm.com, nikunj@linux.vnet.ibm.com,
 linux-pm@vger.kernel.org, rusty@rustcorp.com.au, rostedt@goodmis.org,
 rjw@sisk.pl, vincent.guittot@linaro.org, tglx@linutronix.de,
 linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
 sbw@mit.edu, tj@kernel.org, akpm@linux-foundation.org,
 linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH V2] lglock: add read-preference local-global rwlock
Message-ID: <20130305161912.GA30756@google.com>
References: <512D0D67.9010609@linux.vnet.ibm.com>
 <512E7879.20109@linux.vnet.ibm.com> <5130E8E2.50206@cn.fujitsu.com>
 <20130301182854.GA3631@redhat.com> <5131FB4C.7070408@cn.fujitsu.com>
 <20130302172003.GC29769@redhat.com> <51360ED1.3030104@cn.fujitsu.com>
In-Reply-To: <51360ED1.3030104@cn.fujitsu.com>
X-Mailing-List: netdev@vger.kernel.org

Hi Lai,

Just a few comments about your v2 proposal. Hopefully you'll catch
these before you send out v3 :)

- I would prefer reader_refcnt to be unsigned int instead of unsigned long

- I would like some comment to indicate that lgrwlocks don't have
  reader-writer fairness and are thus somewhat discouraged (people
  could use plain lglock if they don't need reader preference, though
  even that use (as brlock) is discouraged already :)

- I don't think FALLBACK_BASE is necessary (you already mentioned you'd
  drop it)

- I prefer using the fallback_rwlock's dep_map for lockdep tracking.
  I feel this is more natural since we want the lgrwlock to behave as
  the rwlock, not as the lglock.

- I prefer to avoid return statements in the middle of functions when
  it's easy to do so.

Attached is my current version (based on an earlier version of your code).
You don't have to take it as is but I feel it makes for a more concrete
suggestion :)

Thanks,

----------------------------8<-------------------------------------------
lglock: add read-preference lgrwlock

Current lglock may be used as a fair rwlock; however sometimes a
read-preference rwlock is preferred. One such use case recently came
up for get_cpu_online_atomic().

This change adds a new lgrwlock with the following properties:
- high performance read side, using only cpu-local structures when there
  is no write side to contend with;
- correctness guarantees similar to rwlock_t: recursive readers are allowed
  and the lock's read side is not ordered vs other locks;
- low performance write side (comparable to lglocks' global side).

The implementation relies on the following principles:
- reader_refcnt is a local lock count; it indicates how many recursive
  read locks are taken using the local lglock;
- lglock is used by readers for local locking; it must be acquired
  before reader_refcnt becomes nonzero and released after reader_refcnt
  goes back to zero;
- fallback_rwlock is used by readers for global locking; it is acquired
  when reader_refcnt is zero and the trylock fails on lglock;
- writers take both the lglock write side and the fallback_rwlock, thus
  making sure to exclude both local and global readers.

Thanks to Srivatsa S. Bhat for proposing a lock with these requirements
and Lai Jiangshan for proposing this algorithm as an lglock extension.

Signed-off-by: Michel Lespinasse

---
 include/linux/lglock.h | 46 +++++++++++++++++++++++++++++++++++++++
 kernel/lglock.c        | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+)

diff --git a/include/linux/lglock.h b/include/linux/lglock.h
index 0d24e932db0b..8b59084935d5 100644
--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@ -67,4 +67,50 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
 void lg_global_lock(struct lglock *lg);
 void lg_global_unlock(struct lglock *lg);
 
+/*
+ * lglock may be used as a read write spinlock if desired (though this is
+ * not encouraged as the write side scales badly on high CPU count machines).
+ * It has reader/writer fairness when used that way.
+ *
+ * However, sometimes it is desired to have an unfair rwlock instead, with
+ * reentrant readers that don't need to be ordered vs other locks, comparable
+ * to rwlock_t. lgrwlock implements such semantics.
+ */
+struct lgrwlock {
+	unsigned int __percpu *reader_refcnt;
+	struct lglock lglock;
+	rwlock_t fallback_rwlock;
+};
+
+#define __DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static DEFINE_PER_CPU(unsigned int, name ## _refcnt);		\
+	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
+	= __ARCH_SPIN_LOCK_UNLOCKED;
+
+#define __LGRWLOCK_INIT(name) {						\
+	.reader_refcnt = &name ## _refcnt,				\
+	.lglock = { .lock = &name ## _lock },				\
+	.fallback_rwlock = __RW_LOCK_UNLOCKED(name.fallback_rwlock)	\
+}
+
+#define DEFINE_LGRWLOCK(name)						\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+#define DEFINE_STATIC_LGRWLOCK(name)					\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
+{
+	lg_lock_init(&lgrw->lglock, name);
+}
+
+void lg_read_lock(struct lgrwlock *lgrw);
+void lg_read_unlock(struct lgrwlock *lgrw);
+void lg_write_lock(struct lgrwlock *lgrw);
+void lg_write_unlock(struct lgrwlock *lgrw);
+void __lg_read_write_lock(struct lgrwlock *lgrw);
+void __lg_read_write_unlock(struct lgrwlock *lgrw);
+
 #endif
diff --git a/kernel/lglock.c b/kernel/lglock.c
index 86ae2aebf004..e78a7c95dbfd 100644
--- a/kernel/lglock.c
+++ b/kernel/lglock.c
@@ -87,3 +87,61 @@ void lg_global_unlock(struct lglock *lg)
 	preempt_enable();
 }
 EXPORT_SYMBOL(lg_global_unlock);
+
+void lg_read_lock(struct lgrwlock *lgrw)
+{
+	preempt_disable();
+
+	if (__this_cpu_read(*lgrw->reader_refcnt) ||
+	    arch_spin_trylock(this_cpu_ptr(lgrw->lglock.lock))) {
+		__this_cpu_inc(*lgrw->reader_refcnt);
+		rwlock_acquire_read(&lgrw->fallback_rwlock.dep_map,
+				    0, 0, _RET_IP_);
+	} else {
+		read_lock(&lgrw->fallback_rwlock);
+	}
+}
+EXPORT_SYMBOL(lg_read_lock);
+
+void lg_read_unlock(struct lgrwlock *lgrw)
+{
+	if (likely(__this_cpu_read(*lgrw->reader_refcnt))) {
+		rwlock_release(&lgrw->fallback_rwlock.dep_map,
+			       1, _RET_IP_);
+		if (!__this_cpu_dec_return(*lgrw->reader_refcnt))
+			arch_spin_unlock(this_cpu_ptr(lgrw->lglock.lock));
+	} else {
+		read_unlock(&lgrw->fallback_rwlock);
+	}
+
+	preempt_enable();
+}
+EXPORT_SYMBOL(lg_read_unlock);
+
+void lg_write_lock(struct lgrwlock *lgrw)
+{
+	lg_global_lock(&lgrw->lglock);
+	write_lock(&lgrw->fallback_rwlock);
+}
+EXPORT_SYMBOL(lg_write_lock);
+
+void lg_write_unlock(struct lgrwlock *lgrw)
+{
+	write_unlock(&lgrw->fallback_rwlock);
+	lg_global_unlock(&lgrw->lglock);
+}
+EXPORT_SYMBOL(lg_write_unlock);
+
+void __lg_read_write_lock(struct lgrwlock *lgrw)
+{
+	lg_write_lock(lgrw);
+	__this_cpu_write(*lgrw->reader_refcnt, 1);
+}
+EXPORT_SYMBOL(__lg_read_write_lock);
+
+void __lg_read_write_unlock(struct lgrwlock *lgrw)
+{
+	__this_cpu_write(*lgrw->reader_refcnt, 0);
+	lg_write_unlock(lgrw);
+}
+EXPORT_SYMBOL(__lg_read_write_unlock);
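
For anyone who wants to poke at the algorithm outside the kernel, here is a
rough userspace model of the read and write paths in the patch above. It is
only a sketch under several assumptions: MODEL_NR_CPUS, struct model_lgrwlock
and the model_* functions are made-up names, the per-CPU arch_spinlock_t is
replaced by a C11 atomic_flag, rwlock_t by a small atomic counter, and the
caller passes its "CPU" index explicitly and must be the only thread using
that index (standing in for preempt_disable()). Lockdep annotations are left
out.

```c
/* Userspace model of the lgrwlock paths; a sketch, not kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

struct model_lgrwlock {
	unsigned int reader_refcnt[MODEL_NR_CPUS]; /* touched only by its own "CPU" */
	atomic_flag local_lock[MODEL_NR_CPUS];     /* stands in for arch_spinlock_t */
	atomic_int fallback;                       /* stands in for rwlock_t:
						     -1 = writer, N >= 0 = N readers */
};

static void model_init(struct model_lgrwlock *l)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		l->reader_refcnt[cpu] = 0;
		atomic_flag_clear(&l->local_lock[cpu]);
	}
	atomic_init(&l->fallback, 0);
}

/* Returns true when the per-CPU fast path was used, false for the fallback. */
static bool model_read_lock(struct model_lgrwlock *l, int cpu)
{
	/* Recursive reader, or the local trylock succeeded: stay local. */
	if (l->reader_refcnt[cpu] ||
	    !atomic_flag_test_and_set(&l->local_lock[cpu])) {
		l->reader_refcnt[cpu]++;
		return true;
	}
	/* Local lock is held by a writer: spin for a fallback reader slot. */
	for (;;) {
		int v = atomic_load(&l->fallback);

		if (v >= 0 &&
		    atomic_compare_exchange_weak(&l->fallback, &v, v + 1))
			return false;
	}
}

static void model_read_unlock(struct model_lgrwlock *l, int cpu)
{
	if (l->reader_refcnt[cpu]) {
		/* Drop the local lock only when the last recursion ends. */
		if (--l->reader_refcnt[cpu] == 0)
			atomic_flag_clear(&l->local_lock[cpu]);
	} else {
		int prev = atomic_fetch_sub(&l->fallback, 1);

		assert(prev > 0);	/* must have held a fallback read lock */
	}
}

static void model_write_lock(struct model_lgrwlock *l)
{
	/* Exclude local readers on every "CPU" ... */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		while (atomic_flag_test_and_set(&l->local_lock[cpu]))
			;
	/* ... then exclude fallback readers. */
	for (;;) {
		int expected = 0;

		if (atomic_compare_exchange_weak(&l->fallback, &expected, -1))
			break;
	}
}

static void model_write_unlock(struct model_lgrwlock *l)
{
	atomic_store(&l->fallback, 0);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		atomic_flag_clear(&l->local_lock[cpu]);
}
```

In this model a second model_read_lock() on the same index returns true
without touching the flag again, which mirrors the reader_refcnt check in
lg_read_lock(). A reader that finds its local flag already taken returns
false after bumping the fallback counter, and the matching
model_read_unlock() sees a zero refcnt and releases the fallback instead.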