diff mbox

dev: use name hash for dev_seq_ops.

Message ID 1318000849-2531-1-git-send-email-mmaruseac@ixiacom.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Mihai Maruseac Oct. 7, 2011, 3:20 p.m. UTC
Instead of using the dev->next chain and trying to resync at each call to
dev_seq_start, use this hash and store bucket number and bucket offset in
seq->private field.

Tests revealed the following results for ifconfig > /dev/null
	* 1000 interfaces:
		* 0.114s without patch
		* 0.020s with patch
	* 3000 interfaces:
		* 0.489s without patch
		* 0.048s with patch
	* 5000 interfaces:
		* 1.363s without patch
		* 0.131s with patch

As one can notice the improvement is of 1 order of magnitude.

Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com>
---
 net/core/dev.c |   73 +++++++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 56 insertions(+), 17 deletions(-)

Comments

stephen hemminger Oct. 7, 2011, 4:24 p.m. UTC | #1
On Fri,  7 Oct 2011 18:20:49 +0300
Mihai Maruseac <mihai.maruseac@gmail.com> wrote:

> Instead of using the dev->next chain and trying to resync at each call to
> dev_seq_start, use this hash and store bucket number and bucket offset in
> seq->private field.
> 
> Tests revealed the following results for ifconfig > /dev/null
> 	* 1000 interfaces:
> 		* 0.114s without patch
> 		* 0.020s with patch
> 	* 3000 interfaces:
> 		* 0.489s without patch
> 		* 0.048s with patch
> 	* 5000 interfaces:
> 		* 1.363s without patch
> 		* 0.131s with patch
> 
> As one can notice the improvement is of 1 order of magnitude.

Good idea,
This will change the ordering of entries in /proc which may upset
some program, not a critical flaw but worth noting.

Rather than recording the bucket and offset of last entry, another
alternative would be to just record the ifindex.

Also ifconfig is considered deprecated and replaced by ip commands
for general use.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mihai Maruseac Oct. 8, 2011, 7:22 a.m. UTC | #2
On Fri, Oct 7, 2011 at 7:24 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> On Fri,  7 Oct 2011 18:20:49 +0300
> Mihai Maruseac <mihai.maruseac@gmail.com> wrote:
>
>> Instead of using the dev->next chain and trying to resync at each call to
>> dev_seq_start, use this hash and store bucket number and bucket offset in
>> seq->private field.
>>
>> Tests revealed the following results for ifconfig > /dev/null
>>       * 1000 interfaces:
>>               * 0.114s without patch
>>               * 0.020s with patch
>>       * 3000 interfaces:
>>               * 0.489s without patch
>>               * 0.048s with patch
>>       * 5000 interfaces:
>>               * 1.363s without patch
>>               * 0.131s with patch
>>
>> As one can notice the improvement is of 1 order of magnitude.
>
> Good idea,
> This will change the ordering of entries in /proc which may upset
> some program, not a critical flaw but worth noting.
>
> Rather than recording the bucket and offset of last entry, another
> alternative would be to just record the ifindex.
>
> Also ifconfig is considered deprecated and replaced by ip commands
> for general use.
>

Thanks,
This is a patch from a series of improvements to both the ifconfig and
ip commands. The ip part will come later, after being properly
implemented and tested.
Mihai Maruseac Oct. 10, 2011, 8:43 a.m. UTC | #3
On Fri, Oct 7, 2011 at 7:24 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> On Fri,  7 Oct 2011 18:20:49 +0300
> Mihai Maruseac <mihai.maruseac@gmail.com> wrote:
>
>> Instead of using the dev->next chain and trying to resync at each call to
>> dev_seq_start, use this hash and store bucket number and bucket offset in
>> seq->private field.
>>
>> As one can notice the improvement is of 1 order of magnitude.
>
> Good idea,
> This will change the ordering of entries in /proc which may upset
> some program, not a critical flaw but worth noting.
>
> Rather than recording the bucket and offset of last entry, another
> alternative would be to just record the ifindex.
>

I tried to record the ifindex but I think that using it and
dev_get_by_index can result in an infinite loop or a NULL
dereferrence. If a device is removed and ifindex points to it we'll
get a NULL from dev_get_by_index. Checking for NULL and calling again
dev_get_by_index will end in an infinite loop at the end of the hlist.

Augmenting the structure to also contain the number of indexes when
the seq_file is opened returns to the current situation with two ints.
Also, it is more prone to bugs caused by device removal while
printing.
stephen hemminger Oct. 11, 2011, 5:55 a.m. UTC | #4
On Mon, 10 Oct 2011 11:43:20 +0300
Mihai Maruseac <mihai.maruseac@gmail.com> wrote:

> On Fri, Oct 7, 2011 at 7:24 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> > On Fri,  7 Oct 2011 18:20:49 +0300
> > Mihai Maruseac <mihai.maruseac@gmail.com> wrote:
> >
> >> Instead of using the dev->next chain and trying to resync at each call to
> >> dev_seq_start, use this hash and store bucket number and bucket offset in
> >> seq->private field.
> >>
> >> As one can notice the improvement is of 1 order of magnitude.
> >
> > Good idea,
> > This will change the ordering of entries in /proc which may upset
> > some program, not a critical flaw but worth noting.
> >
> > Rather than recording the bucket and offset of last entry, another
> > alternative would be to just record the ifindex.
> >
> 
> I tried to record the ifindex but I think that using it and
> dev_get_by_index can result in an infinite loop or a NULL
> dereferrence. If a device is removed and ifindex points to it we'll
> get a NULL from dev_get_by_index. Checking for NULL and calling again
> dev_get_by_index will end in an infinite loop at the end of the hlist.
> 
> Augmenting the structure to also contain the number of indexes when
> the seq_file is opened returns to the current situation with two ints.
> Also, it is more prone to bugs caused by device removal while
> printing.

If ifindex has been deleted, the code should fall back to delivering
the next offset. That means for the rare case it would have the old
behavior of linear searching.  There is similar code already to
deal with /proc/net/route; it continues from the last address.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mihai Maruseac Oct. 12, 2011, 10:08 a.m. UTC | #5
On Mon, 2011-10-10 at 22:55 -0700, Stephen Hemminger wrote:
> If ifindex has been deleted, the code should fall back to delivering
> the next offset. That means for the rare case it would have the old
> behavior of linear searching.  There is similar code already to
> deal with /proc/net/route; it continues from the last address.

We rewrote the patch to use the ifindex. Since we are at the beginning,
the patch was sent in another thread. We'll learn how to properly send
the mail.
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index bf49a47..2d0f126 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4034,6 +4034,51 @@  static int dev_ifconf(struct net *net, char __user *arg)
 }
 
 #ifdef CONFIG_PROC_FS
+
+struct dev_iter_state {
+	struct seq_net_private p;
+	int bucket;
+	int offset;
+};
+
+static inline struct net_device *dev_from_same_bucket(struct seq_file *seq)
+{
+	struct net_device *dev = NULL;
+	struct net *net = seq_file_net(seq);
+	struct dev_iter_state *state = seq->private;
+	struct hlist_node *p;
+	struct hlist_head *h;
+	int count;
+
+	h = &net->dev_name_head[state->bucket];
+	count = 0;
+	hlist_for_each_entry_rcu(dev, p, h, name_hlist) {
+		if (count++ == state->offset) {
+			state->offset = count;
+			return dev;
+		}
+	}
+
+	return NULL;
+}
+
+static inline struct net_device *dev_from_new_bucket(struct seq_file *seq)
+{
+	struct dev_iter_state *state = seq->private;
+	struct net_device *dev = NULL;
+
+	while (state->bucket < NETDEV_HASHENTRIES) {
+		dev = dev_from_same_bucket(seq);
+		if (dev)
+			return dev;
+
+		state->bucket++;
+		state->offset = 0;
+	}
+
+	return NULL;
+}
+
 /*
  *	This is invoked by the /proc filesystem handler to display a device
  *	in detail.
@@ -4041,33 +4086,27 @@  static int dev_ifconf(struct net *net, char __user *arg)
 void *dev_seq_start(struct seq_file *seq, loff_t *pos)
 	__acquires(RCU)
 {
-	struct net *net = seq_file_net(seq);
-	loff_t off;
-	struct net_device *dev;
-
 	rcu_read_lock();
 	if (!*pos)
 		return SEQ_START_TOKEN;
 
-	off = 1;
-	for_each_netdev_rcu(net, dev)
-		if (off++ == *pos)
-			return dev;
-
-	return NULL;
+	return dev_from_new_bucket(seq);
 }
 
 void *dev_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-	struct net_device *dev = v;
+	struct net_device *dev = NULL;
+
+	++*pos;
 
 	if (v == SEQ_START_TOKEN)
-		dev = first_net_device_rcu(seq_file_net(seq));
-	else
-		dev = next_net_device_rcu(dev);
+		return dev_from_new_bucket(seq);
 
-	++*pos;
-	return dev;
+	dev = dev_from_same_bucket(seq);
+	if (dev)
+		return dev;
+
+	return dev_from_new_bucket(seq);
 }
 
 void dev_seq_stop(struct seq_file *seq, void *v)
@@ -4166,7 +4205,7 @@  static const struct seq_operations dev_seq_ops = {
 static int dev_seq_open(struct inode *inode, struct file *file)
 {
 	return seq_open_net(inode, file, &dev_seq_ops,
-			    sizeof(struct seq_net_private));
+			    sizeof(struct dev_iter_state));
 }
 
 static const struct file_operations dev_seq_fops = {