Patchwork [1/2] net: Allow to create links with given ifindex

Submitter Eric Dumazet
Date Aug. 2, 2012, 10:28 a.m.
Message ID <1343903310.9299.184.camel@edumazet-glaptop>
Permalink /patch/174716/
State RFC
Delegated to: David Miller

Comments

Eric Dumazet - Aug. 2, 2012, 10:28 a.m.
On Tue, 2012-07-31 at 04:58 -0700, Eric W. Biederman wrote:

> Making lo the particularly interesting case.


BTW, I noticed in my benchmarks, that once I remove the contention on
dst refcnt (using a percpu cache of dsts), I have a strange performance
cost accessing net->loopback_dev->ifindex in ip_route_output_key.

Strange because I see no false sharing on this ifindex location for
loopback device.

So we probably can save some cycles adding a net->loopback_ifindex
to remove one dereference.

If ifindex are per network space, I guess we'll need to change
arp_hashfn() or else we'll use some slots more than others.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman - Aug. 2, 2012, 11:09 a.m.
Eric Dumazet <eric.dumazet@gmail.com> writes:

> On Tue, 2012-07-31 at 04:58 -0700, Eric W. Biederman wrote:
>
>> Making lo the particularly interesting case.
>
>
> BTW, I noticed in my benchmarks, that once I remove the contention on
> dst refcnt (using a percpu cache of dsts), I have a strange performance
> cost accessing net->loopback_dev->ifindex in ip_route_output_key.
>
> Strange because I see no false sharing on this ifindex location for
> loopback device.
>
> So we probably can save some cycles adding a net->loopback_ifindex
> to remove one dereference.

I am going to let Pavel tackle the actual work because only migration
really cares and he is working on migration right now.

But assuming we merge the per network namespace ifindex counter we
can change net->loopback_dev->ifindex to LOOPBACK_IFINDEX and
define "#define LOOPBACK_IFINDEX 1"

Certainly that works in the initial network namespace today and might be
worth testing.
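
As a rough userspace illustration of the two access patterns discussed above (a sketch only, not kernel code): `struct fake_net` and `struct fake_net_device` are hypothetical stand-ins for the real kernel structures, and the value 1 for `LOOPBACK_IFINDEX` is the one proposed in this message. It only holds if loopback is the first device registered in every namespace.

```c
/* Proposed constant from the message above; assumes lo always gets ifindex 1. */
#define LOOPBACK_IFINDEX 1

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct fake_net_device {
	int ifindex;
};

struct fake_net {
	struct fake_net_device *loopback_dev;
};

/* Current pattern: two dependent loads (net -> loopback_dev -> ifindex). */
static int loopback_ifindex_via_deref(const struct fake_net *net)
{
	return net->loopback_dev->ifindex;
}

/* Proposed pattern: a compile-time constant, no memory access at all. */
static int loopback_ifindex_const(void)
{
	return LOOPBACK_IFINDEX;
}
```

Both helpers return the same value as long as the loopback device really holds ifindex 1 in the namespace in question, which is the condition the per-namespace ifindex counter would have to guarantee.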

> If ifindex are per network space, I guess we'll need to change
> arp_hashfn() or else we'll use some slots more than others.

Darn.  I hate being right about there being a few places to fix
up.

ndisc_hashfn also has the same limitation.

Eric

> diff --git a/include/net/arp.h b/include/net/arp.h
> index 7f7df93..37aac58 100644
> --- a/include/net/arp.h
> +++ b/include/net/arp.h
> @@ -10,7 +10,7 @@ extern struct neigh_table arp_tbl;
>  
>  static inline u32 arp_hashfn(u32 key, const struct net_device *dev, u32 hash_rnd)
>  {
> -	u32 val = key ^ dev->ifindex;
> +	u32 val = key ^ (u32)(unsigned long)dev;
>  
>  	return val * hash_rnd;
>  }
David Miller - Aug. 2, 2012, 11:26 p.m.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 02 Aug 2012 12:28:30 +0200

> Strange because I see no false sharing on this ifindex location for
> loopback device.

Are you sure netdev->rx_dropped isn't being incremented?  That appears
as if it would land on the same cache line as netdev->ifindex.
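
One way to check this kind of suspicion is to compare field offsets against the cache line size. The sketch below is illustrative only: `struct fake_netdev` is a made-up layout, not the real `struct net_device`, the 64-byte line size is an assumption (typical for x86-64), and the check assumes the structure itself starts on a cache line boundary.

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines; the real value is architecture-specific. */
#define CACHE_LINE_BYTES 64

/* Hypothetical layout, NOT the real struct net_device. */
struct fake_netdev {
	char name[16];
	int ifindex;              /* read-mostly field */
	unsigned long rx_dropped; /* counter written on every drop */
};

/*
 * Return nonzero when two byte offsets fall in the same cache line,
 * assuming the enclosing object is cache-line aligned.
 */
static int same_cache_line(size_t off_a, size_t off_b)
{
	return off_a / CACHE_LINE_BYTES == off_b / CACHE_LINE_BYTES;
}
```

Calling `same_cache_line(offsetof(struct fake_netdev, ifindex), offsetof(struct fake_netdev, rx_dropped))` reports whether a writer of the counter would invalidate the line that readers of `ifindex` depend on. On a real kernel build, a tool such as `pahole` gives the actual offsets.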

David Miller - Aug. 2, 2012, 11:37 p.m.
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Thu, 02 Aug 2012 04:09:39 -0700

> Eric Dumazet <eric.dumazet@gmail.com> writes:
> 
>> If ifindex are per network space, I guess we'll need to change
>> arp_hashfn() or else we'll use some slots more than others.
> 
> Darn.  I hate being right about there being a few places to fix
> up.
> 
> ndisc_hashfn also has the same limitation.

And netlabel's interface hashing as well.

LLC works with ifindex hashing and is not namespace aware.  It
should therefore be limited to &init_net and is therefore OK.  Likewise
for the CAN code.

Patch

diff --git a/include/net/arp.h b/include/net/arp.h
index 7f7df93..37aac58 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -10,7 +10,7 @@  extern struct neigh_table arp_tbl;
 
 static inline u32 arp_hashfn(u32 key, const struct net_device *dev, u32 hash_rnd)
 {
-	u32 val = key ^ dev->ifindex;
+	u32 val = key ^ (u32)(unsigned long)dev;
 
 	return val * hash_rnd;
 }
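
The effect of the change can be modelled in userspace. This is a sketch under stated assumptions, not the kernel implementation: `struct fake_dev`, `lo_in_ns`, `hash_by_ifindex()` and `hash_by_pointer()` are hypothetical names, `uintptr_t` stands in for the kernel's `unsigned long` cast, and the two array entries stand for the loopback devices of two namespaces that both hold ifindex 1.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct net_device. */
struct fake_dev {
	int ifindex;
};

/* Two loopback devices in different namespaces, both with ifindex 1. */
static struct fake_dev lo_in_ns[2] = { { 1 }, { 1 } };

/* Model of the old arp_hashfn(): mixes the key with the ifindex. */
static uint32_t hash_by_ifindex(uint32_t key, int ifindex, uint32_t hash_rnd)
{
	uint32_t val = key ^ (uint32_t)ifindex;

	return val * hash_rnd;
}

/* Model of the patched arp_hashfn(): mixes the key with the low 32 bits
 * of the device address, which is unique per live device. */
static uint32_t hash_by_pointer(uint32_t key, const void *dev, uint32_t hash_rnd)
{
	uint32_t val = key ^ (uint32_t)(uintptr_t)dev;

	return val * hash_rnd;
}
```

With the same key and the same `hash_rnd`, `hash_by_ifindex()` returns identical values for both devices, so their entries always share a slot. `hash_by_pointer()` returns different values whenever `hash_rnd` is odd, because multiplication by an odd constant is a bijection modulo 2^32 and the two addresses differ in their low bits.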