Message ID | 1283986576.2428.21.camel@edumazet-laptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, Sep 8, 2010 at 3:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Use cmpxchg() to get rid of spinlocks in inet_add_protocol() and > friends. > > inet_protos[] & inet6_protos[] are moved to read_mostly section > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> What's the benefit to this? It's hard to imagine that add/deleting protocols is highly contended. On the other hand, a simple spinlock is very easy to look at and verify correct, while this takes a little more thought. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Jesse Gross <jesse@nicira.com> Date: Wed, 8 Sep 2010 16:49:27 -0700 > On Wed, Sep 8, 2010 at 3:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> Use cmpxchg() to get rid of spinlocks in inet_add_protocol() and >> friends. >> >> inet_protos[] & inet6_protos[] are moved to read_mostly section >> >> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > What's the benefit to this? It's hard to imagine that add/deleting > protocols is highly contended. On the other hand, a simple spinlock > is very easy to look at and verify correct, while this takes a little > more thought. Smaller data section is the benefit, thus less cache pressure there. It's not about contention at all. We've already applied other similar changes from Eric doing exactly this kind of transformation and it's not all that non-trivial to me, frankly. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 09 Sep 2010 00:56:16 +0200 > Use cmpxchg() to get rid of spinlocks in inet_add_protocol() and > friends. > > inet_protos[] & inet6_protos[] are moved to read_mostly section > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Applied, thanks Eric. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 08 septembre 2010 à 16:49 -0700, Jesse Gross a écrit : > On Wed, Sep 8, 2010 at 3:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > Use cmpxchg() to get rid of spinlocks in inet_add_protocol() and > > friends. > > > > inet_protos[] & inet6_protos[] are moved to read_mostly section > > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > What's the benefit to this? It's hard to imagine that add/deleting > protocols is highly contended. On the other hand, a simple spinlock > is very easy to look at and verify correct, while this takes a little > more thought. Good question. Did you know that _bh() was not necessary for example at this point ? Its a leftover of an RCU conversion, done many years ago. It _was_ working of course, but it really hurts my mind. We could use stop_machine or other brute force thing too ;) Before patch : # size net/ipv4/protocol.o net/ipv6/protocol.o text data bss dec hex filename 340 0 2052 2392 958 net/ipv4/protocol.o 342 0 2052 2394 95a net/ipv6/protocol.o After patch : # size net/ipv4/protocol.o net/ipv6/protocol.o text data bss dec hex filename 220 2048 0 2268 8dc net/ipv4/protocol.o 222 2048 0 2270 8de net/ipv6/protocol.o 45 lines removed, 240 bytes of text removed, I found this interesting. Removing the ____cacheline_aligned_in_smp is also good, because the .data alignement, is reduced and linker is not force to propagate this alignement to upper .o files, with many holes. If you take a look at most patches flying around, this one at least took me to think a bit ;) Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2010-09-09 06:30, David Miller wrote: > From: Jesse Gross <jesse@nicira.com> > Date: Wed, 8 Sep 2010 16:49:27 -0700 > >> On Wed, Sep 8, 2010 at 3:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >>> Use cmpxchg() to get rid of spinlocks in inet_add_protocol() and >>> friends. >>> >>> inet_protos[] & inet6_protos[] are moved to read_mostly section >>> >>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> >> >> What's the benefit to this? It's hard to imagine that add/deleting >> protocols is highly contended. On the other hand, a simple spinlock >> is very easy to look at and verify correct, while this takes a little >> more thought. > > Smaller data section is the benefit, thus less cache pressure there. > Of course, each case is different, but generally I understand Jesse's concern. Smaller data section argument shouldn't be enough. There is some reason we do it in C not asm, and there is some cost of unreadable code too. Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 09 septembre 2010 à 06:42 +0000, Jarek Poplawski a écrit : > Of course, each case is different, but generally I understand Jesse's > concern. Smaller data section argument shouldn't be enough. There is > some reason we do it in C not asm, and there is some cost of > unreadable code too. Sure, but is cmpxchg() really not readable for you ? ;) Now it is available for all arches, you can bet it is going to be adopted. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 09, 2010 at 08:55:45AM +0200, Eric Dumazet wrote: > Le jeudi 09 septembre 2010 ?? 06:42 +0000, Jarek Poplawski a écrit : > > Of course, each case is different, but generally I understand Jesse's > > concern. Smaller data section argument shouldn't be enough. There is > > some reason we do it in C not asm, and there is some cost of > > unreadable code too. > > Sure, but is cmpxchg() really not readable for you ? ;) Just like for Jesse: "this takes a little more thought". ;-) Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 09, 2010 at 07:09:26AM +0000, Jarek Poplawski wrote: > On Thu, Sep 09, 2010 at 08:55:45AM +0200, Eric Dumazet wrote: > > Le jeudi 09 septembre 2010 ?? 06:42 +0000, Jarek Poplawski a écrit : > > > Of course, each case is different, but generally I understand Jesse's > > > concern. Smaller data section argument shouldn't be enough. There is > > > some reason we do it in C not asm, and there is some cost of > > > unreadable code too. > > > > Sure, but is cmpxchg() really not readable for you ? ;) > > Just like for Jesse: "this takes a little more thought". ;-) But more exactly, it's not about this one line, but the time you need when reading this after some time to tell: yes, this place is properly locked for sure. Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 09, 2010 at 07:24:53AM +0000, Jarek Poplawski wrote: > On Thu, Sep 09, 2010 at 07:09:26AM +0000, Jarek Poplawski wrote: > > On Thu, Sep 09, 2010 at 08:55:45AM +0200, Eric Dumazet wrote: > > > Le jeudi 09 septembre 2010 ?? 06:42 +0000, Jarek Poplawski a écrit : > > > > Of course, each case is different, but generally I understand Jesse's > > > > concern. Smaller data section argument shouldn't be enough. There is > > > > some reason we do it in C not asm, and there is some cost of > > > > unreadable code too. > > > > > > Sure, but is cmpxchg() really not readable for you ? ;) > > > > Just like for Jesse: "this takes a little more thought". ;-) > > But more exactly, it's not about this one line, but the time you need > when reading this after some time to tell: yes, this place is properly > locked for sure. And most exactly, without "after some time" I couldn't tell this either. Is seems some barriers might have been changed, but I might be wrong. Anyway, it would be nice to have some comment on that in the changelog. Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 09 septembre 2010 à 09:08 +0000, Jarek Poplawski a écrit : > xactly, without "after some time" I couldn't tell this > either. Is seems some barriers might have been changed, but I might > be wrong. Anyway, it would be nice to have some comment on that in > the changelog. Jarek, cmpxchg() is a very strong primitive, as documented in Documentation/memory-barriers.txt : <begin of quote> Any atomic operation that modifies some state in memory and returns information about the state (old or new) implies an SMP-conditional general memory barrier (smp_mb()) on each side of the actual operation (with the exception of explicit lock operations, described later). These include: xchg(); cmpxchg(); ... These are used for such things as implementing LOCK-class and UNLOCK-class operations and adjusting reference counters towards object destruction, and as such the implicit memory barrier effects are necessary. <end of quote> Its not needed to document each cmpxchg() use, because of these strong requirements. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 09, 2010 at 11:23:57AM +0200, Eric Dumazet wrote: > Le jeudi 09 septembre 2010 ?? 09:08 +0000, Jarek Poplawski a écrit : > > xactly, without "after some time" I couldn't tell this > > either. Is seems some barriers might have been changed, but I might > > be wrong. Anyway, it would be nice to have some comment on that in > > the changelog. > > Jarek, cmpxchg() is a very strong primitive, as documented > in Documentation/memory-barriers.txt : > > <begin of quote> > > Any atomic operation that modifies some state in memory and returns information > about the state (old or new) implies an SMP-conditional general memory barrier > (smp_mb()) on each side of the actual operation (with the exception of > explicit lock operations, described later). These include: > > xchg(); > cmpxchg(); > ... > These are used for such things as implementing LOCK-class and UNLOCK-class > operations and adjusting reference counters towards object destruction, and as > such the implicit memory barrier effects are necessary. > > <end of quote> > > Its not needed to document each cmpxchg() use, because of these > strong requirements. OK! I should remember this until "after some time". ;-) Thanks for the explanation, Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 09, 2010 at 11:23:57AM +0200, Eric Dumazet wrote: > Le jeudi 09 septembre 2010 ?? 09:08 +0000, Jarek Poplawski a écrit : > > xactly, without "after some time" I couldn't tell this > > either. Is seems some barriers might have been changed, but I might > > be wrong. Anyway, it would be nice to have some comment on that in > > the changelog. > > Jarek, cmpxchg() is a very strong primitive, as documented > in Documentation/memory-barriers.txt : ... > Its not needed to document each cmpxchg() use, because of these > strong requirements. > Btw, I wonder if for readability and debuging it shouldn't be made generic as rcu_cmpxchg_pointer() etc.? Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 09 septembre 2010 à 09:59 +0000, Jarek Poplawski a écrit : > Btw, I wonder if for readability and debuging it shouldn't be made > generic as rcu_cmpxchg_pointer() etc.? There are a lot of things to do in this area (with proper __rcu sparse annotations, this can be not very readable ...) /** * rcu_cmpxchg - atomic compare and exchange, SMP & rcu safe * @p: pointer to value * @old: old value * @new: new value * * Equivalent to : * ATOMIC_BEGIN * ret = *p; * if (ret == old) * rcu_assign_pointer(*p, new); * ATOMIC_END * return ret; * * cmpxpchg() contains full memory barriers, so can be used * in rcu write side without additional smp_wmb() barrier */ #define rcu_cmpxchg(p, old, new) cmpxchg(p, old, new) const struct net_protocol __rcu * inet_protos[MAX_INET_PROTOS] __read_mostly; ... return !rcu_cmpxchg(&inet_protos[hash], NULL, (const struct net_protocol __force __rcu *)prot) ? 0 : -1; ... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 09, 2010 at 02:57:38PM +0200, Eric Dumazet wrote: > Le jeudi 09 septembre 2010 ?? 09:59 +0000, Jarek Poplawski a écrit : > > > Btw, I wonder if for readability and debuging it shouldn't be made > > generic as rcu_cmpxchg_pointer() etc.? > > There are a lot of things to do in this area (with proper __rcu sparse > annotations, this can be not very readable ...) Well, this is more than I could dream ;-) The main point to me was autodocumentation in use, similarly to rcu_assign_pointer(). But, of course, with the description below it's much better. Thanks, Jarek P. > > /** > * rcu_cmpxchg - atomic compare and exchange, SMP & rcu safe > * @p: pointer to value > * @old: old value > * @new: new value > * > * Equivalent to : > * ATOMIC_BEGIN > * ret = *p; > * if (ret == old) > * rcu_assign_pointer(*p, new); > * ATOMIC_END > * return ret; > * > * cmpxpchg() contains full memory barriers, so can be used > * in rcu write side without additional smp_wmb() barrier > */ > #define rcu_cmpxchg(p, old, new) cmpxchg(p, old, new) > > > const struct net_protocol __rcu * > inet_protos[MAX_INET_PROTOS] __read_mostly; > > ... > > return !rcu_cmpxchg(&inet_protos[hash], > NULL, > (const struct net_protocol __force __rcu *)prot) ? > 0 : -1; > > ... > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c index f2d2973..65699c2 100644 --- a/net/ipv4/protocol.c +++ b/net/ipv4/protocol.c @@ -28,8 +28,7 @@ #include <linux/spinlock.h> #include <net/protocol.h> -const struct net_protocol *inet_protos[MAX_INET_PROTOS] ____cacheline_aligned_in_smp; -static DEFINE_SPINLOCK(inet_proto_lock); +const struct net_protocol *inet_protos[MAX_INET_PROTOS] __read_mostly; /* * Add a protocol handler to the hash tables @@ -37,20 +36,9 @@ static DEFINE_SPINLOCK(inet_proto_lock); int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol) { - int hash, ret; + int hash = protocol & (MAX_INET_PROTOS - 1); - hash = protocol & (MAX_INET_PROTOS - 1); - - spin_lock_bh(&inet_proto_lock); - if (inet_protos[hash]) { - ret = -1; - } else { - inet_protos[hash] = prot; - ret = 0; - } - spin_unlock_bh(&inet_proto_lock); - - return ret; + return !cmpxchg(&inet_protos[hash], NULL, prot) ? 0 : -1; } EXPORT_SYMBOL(inet_add_protocol); @@ -60,18 +48,9 @@ EXPORT_SYMBOL(inet_add_protocol); int inet_del_protocol(const struct net_protocol *prot, unsigned char protocol) { - int hash, ret; - - hash = protocol & (MAX_INET_PROTOS - 1); + int ret, hash = protocol & (MAX_INET_PROTOS - 1); - spin_lock_bh(&inet_proto_lock); - if (inet_protos[hash] == prot) { - inet_protos[hash] = NULL; - ret = 0; - } else { - ret = -1; - } - spin_unlock_bh(&inet_proto_lock); + ret = (cmpxchg(&inet_protos[hash], prot, NULL) == prot) ? 0 : -1; synchronize_net(); diff --git a/net/ipv6/protocol.c b/net/ipv6/protocol.c index 1fa3468..9bb936a 100644 --- a/net/ipv6/protocol.c +++ b/net/ipv6/protocol.c @@ -25,28 +25,14 @@ #include <linux/spinlock.h> #include <net/protocol.h> -const struct inet6_protocol *inet6_protos[MAX_INET_PROTOS]; -static DEFINE_SPINLOCK(inet6_proto_lock); - +const struct inet6_protocol *inet6_protos[MAX_INET_PROTOS] __read_mostly; int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char protocol) { - int ret, hash = protocol & (MAX_INET_PROTOS - 1); - - spin_lock_bh(&inet6_proto_lock); - - if (inet6_protos[hash]) { - ret = -1; - } else { - inet6_protos[hash] = prot; - ret = 0; - } - - spin_unlock_bh(&inet6_proto_lock); + int hash = protocol & (MAX_INET_PROTOS - 1); - return ret; + return !cmpxchg(&inet6_protos[hash], NULL, prot) ? 0 : -1; } - EXPORT_SYMBOL(inet6_add_protocol); /* @@ -57,20 +43,10 @@ int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char protocol { int ret, hash = protocol & (MAX_INET_PROTOS - 1); - spin_lock_bh(&inet6_proto_lock); - - if (inet6_protos[hash] != prot) { - ret = -1; - } else { - inet6_protos[hash] = NULL; - ret = 0; - } - - spin_unlock_bh(&inet6_proto_lock); + ret = (cmpxchg(&inet6_protos[hash], prot, NULL) == prot) ? 0 : -1; synchronize_net(); return ret; } - EXPORT_SYMBOL(inet6_del_protocol);
Use cmpxchg() to get rid of spinlocks in inet_add_protocol() and friends. inet_protos[] & inet6_protos[] are moved to read_mostly section Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- net/ipv4/protocol.c | 31 +++++-------------------------- net/ipv6/protocol.c | 32 ++++---------------------------- 2 files changed, 9 insertions(+), 54 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html