From patchwork Tue Oct 20 05:03:44 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 36431 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id 8CCE9B7B79 for ; Tue, 20 Oct 2009 16:04:19 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753340AbZJTFDo (ORCPT ); Tue, 20 Oct 2009 01:03:44 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753247AbZJTFDn (ORCPT ); Tue, 20 Oct 2009 01:03:43 -0400 Received: from gw1.cosmosbay.com ([212.99.114.194]:52883 "EHLO gw1.cosmosbay.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753274AbZJTFDl (ORCPT ); Tue, 20 Oct 2009 01:03:41 -0400 Received: from [127.0.0.1] (localhost [127.0.0.1]) by gw1.cosmosbay.com (8.13.7/8.13.7) with ESMTP id n9K53hgb025185; Tue, 20 Oct 2009 07:03:43 +0200 Message-ID: <4ADD44B0.8030204@gmail.com> Date: Tue, 20 Oct 2009 07:03:44 +0200 From: Eric Dumazet User-Agent: Thunderbird 2.0.0.23 (Windows/20090812) MIME-Version: 1.0 To: David Miller CC: netdev@vger.kernel.org Subject: [PATCH net-next-2.6] net: Introduce dev_get_by_index_rcu() References: <4ADD3794.8030906@gmail.com> <20091019.212018.79580287.davem@davemloft.net> <4ADD3B5A.1080905@gmail.com> <20091019.212855.179405364.davem@davemloft.net> In-Reply-To: <20091019.212855.179405364.davem@davemloft.net> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [0.0.0.0]); Tue, 20 Oct 2009 07:03:43 +0200 (CEST) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org David Miller a écrit : > From: Eric Dumazet > Date: Tue, 20 Oct 2009 06:23:54 +0200 > >> I wonder if the whole thing could use RCU somehow, since some >> workloads hit this dev_base_lock rwlock pretty hard... > > True, but for now we'll put your fix in :-) [PATCH net-next-2.6] net: Introduce dev_get_by_index_rcu() Some workloads hit dev_base_lock rwlock pretty hard. We can use RCU lookups to avoid touching this rwlock. netdevices are already freed after a RCU grace period, so this patch adds no penalty at device dismantle time. Signed-off-by: Eric Dumazet --- include/linux/netdevice.h | 1 net/core/dev.c | 40 ++++++++++++++++++++++++++++++------ 2 files changed, 35 insertions(+), 6 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 8380009..4eda680 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1127,6 +1127,7 @@ extern void netdev_resync_ops(struct net_device *dev); extern int call_netdevice_notifiers(unsigned long val, struct net_device *dev); extern struct net_device *dev_get_by_index(struct net *net, int ifindex); extern struct net_device *__dev_get_by_index(struct net *net, int ifindex); +extern struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex); extern int dev_restart(struct net_device *dev); #ifdef CONFIG_NETPOLL_TRAP extern int netpoll_trap(void); diff --git a/net/core/dev.c b/net/core/dev.c index 28b0b9e..cb011b7 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -217,12 +217,15 @@ static int list_netdevice(struct net_device *dev) write_lock_bh(&dev_base_lock); list_add_tail(&dev->dev_list, &net->dev_base_head); hlist_add_head(&dev->name_hlist, dev_name_hash(net, dev->name)); - hlist_add_head(&dev->index_hlist, dev_index_hash(net, dev->ifindex)); + hlist_add_head_rcu(&dev->index_hlist, + dev_index_hash(net, dev->ifindex)); write_unlock_bh(&dev_base_lock); return 0; } -/* Device list removal */ +/* Device list removal + * caller must respect a RCU grace period before freeing/reusing dev + */ static void unlist_netdevice(struct net_device *dev) { ASSERT_RTNL(); @@ -231,7 +234,7 @@ static void unlist_netdevice(struct net_device *dev) write_lock_bh(&dev_base_lock); list_del(&dev->dev_list); hlist_del(&dev->name_hlist); - hlist_del(&dev->index_hlist); + hlist_del_rcu(&dev->index_hlist); write_unlock_bh(&dev_base_lock); } @@ -649,6 +652,31 @@ struct net_device *__dev_get_by_index(struct net *net, int ifindex) } EXPORT_SYMBOL(__dev_get_by_index); +/** + * dev_get_by_index_rcu - find a device by its ifindex + * @net: the applicable net namespace + * @ifindex: index of device + * + * Search for an interface by index. Returns %NULL if the device + * is not found or a pointer to the device. The device has not + * had its reference counter increased so the caller must be careful + * about locking. The caller must hold RCU lock. + */ + +struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex) +{ + struct hlist_node *p; + struct net_device *dev; + struct hlist_head *head = dev_index_hash(net, ifindex); + + hlist_for_each_entry_rcu(dev, p, head, index_hlist) + if (dev->ifindex == ifindex) + return dev; + + return NULL; +} +EXPORT_SYMBOL(dev_get_by_index_rcu); + /** * dev_get_by_index - find a device by its ifindex @@ -665,11 +693,11 @@ struct net_device *dev_get_by_index(struct net *net, int ifindex) { struct net_device *dev; - read_lock(&dev_base_lock); - dev = __dev_get_by_index(net, ifindex); + rcu_read_lock(); + dev = dev_get_by_index_rcu(net, ifindex); if (dev) dev_hold(dev); - read_unlock(&dev_base_lock); + rcu_read_unlock(); return dev; } EXPORT_SYMBOL(dev_get_by_index);