Message ID | 1276656324.19249.39.camel@edumazet-laptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Jun 2010 04:45:24 +0200

> [PATCH net-next-2.6] inetpeer: do not use zero refcnt for freed entries
>
> Followup of commit aa1039e73cc2 (inetpeer: RCU conversion)
>
> Unused inet_peer entries have a null refcnt.
>
> Using atomic_inc_not_zero() in rcu lookups is not going to work for
> them, and slow path is taken.
>
> Fix this using -1 marker instead of 0 for deleted entries.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday, 15 June 2010 at 21:47 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 16 Jun 2010 04:45:24 +0200
>
> > [PATCH net-next-2.6] inetpeer: do not use zero refcnt for freed entries
> >
> > Followup of commit aa1039e73cc2 (inetpeer: RCU conversion)
> >
> > Unused inet_peer entries have a null refcnt.
> >
> > Using atomic_inc_not_zero() in rcu lookups is not going to work for
> > them, and slow path is taken.
> >
> > Fix this using -1 marker instead of 0 for deleted entries.
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Applied, thanks Eric.

Thanks

With 65537 peers and a DDoS frag attack, I now get the following
profiling results:

-----------------------------------------------------------------------------------------
   PerfTop:    1024 irqs/sec  kernel:100.0%  exact:  0.0% [1000Hz cycles],  (all, cpu: 0)
-----------------------------------------------------------------------------------------

             samples  pcnt function                  DSO
             _______ _____ _________________________

             7722.00 65.6% inet_frag_find
             1355.00 11.5% ip4_frag_match
              494.00  4.2% __lock_acquire
              260.00  2.2% inet_getpeer
              243.00  2.1% ip_route_input_common
              151.00  1.3% lock_release
              142.00  1.2% mark_lock
              126.00  1.1% lock_acquire
              104.00  0.9% __kmalloc
               86.00  0.7% skb_put

Just to show what could be the next steps ;)
On Wed, Jun 16, 2010 at 04:45:24AM +0200, Eric Dumazet wrote:
> On Tuesday, 15 June 2010 at 14:25 -0700, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Tue, 15 Jun 2010 20:23:14 +0200
> >
> > > inetpeer currently uses an AVL tree protected by an rwlock.
> > >
> > > It's possible to make most lookups use RCU
> > ...
> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> >
> > Applied, nice work Eric.
>
> Thanks David !
>
> Re-reading patch I realize refcnt is expected to be 0 for unused entries
> (obviously), so we should use a different marker for 'about to be freed'
> ones.
>
> Thanks
>
> [PATCH net-next-2.6] inetpeer: do not use zero refcnt for freed entries
>
> Followup of commit aa1039e73cc2 (inetpeer: RCU conversion)
>
> Unused inet_peer entries have a null refcnt.
>
> Using atomic_inc_not_zero() in rcu lookups is not going to work for
> them, and slow path is taken.
>
> Fix this using -1 marker instead of 0 for deleted entries.

Based on this patch, looks good to me! (I don't see lookup_rcu_bh()
and friends in the trees I have at hand.)

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/ipv4/inetpeer.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
> index 58fbc7e..39a14ba 100644
> --- a/net/ipv4/inetpeer.c
> +++ b/net/ipv4/inetpeer.c
> @@ -187,7 +187,12 @@ static struct inet_peer *lookup_rcu_bh(__be32 daddr)
>
>  	while (u != peer_avl_empty) {
>  		if (daddr == u->v4daddr) {
> -			if (unlikely(!atomic_inc_not_zero(&u->refcnt)))
> +			/* Before taking a reference, check if this entry was
> +			 * deleted, unlink_from_pool() sets refcnt=-1 to make
> +			 * distinction between an unused entry (refcnt=0) and
> +			 * a freed one.
> +			 */
> +			if (unlikely(!atomic_add_unless(&u->refcnt, 1, -1)))
>  				u = NULL;
>  			return u;
>  		}
> @@ -322,8 +327,9 @@ static void unlink_from_pool(struct inet_peer *p)
>  	 * in cleanup() function to prevent sudden disappearing. If we can
>  	 * atomically (because of lockless readers) take this last reference,
>  	 * it's safe to remove the node and free it later.
> +	 * We use refcnt=-1 to alert lockless readers this entry is deleted.
>  	 */
> -	if (atomic_cmpxchg(&p->refcnt, 1, 0) == 1) {
> +	if (atomic_cmpxchg(&p->refcnt, 1, -1) == 1) {
>  		struct inet_peer **stack[PEER_MAXDEPTH];
>  		struct inet_peer ***stackptr, ***delp;
>  		if (lookup(p->v4daddr, stack) != p)
>
diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 58fbc7e..39a14ba 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -187,7 +187,12 @@ static struct inet_peer *lookup_rcu_bh(__be32 daddr)
 
 	while (u != peer_avl_empty) {
 		if (daddr == u->v4daddr) {
-			if (unlikely(!atomic_inc_not_zero(&u->refcnt)))
+			/* Before taking a reference, check if this entry was
+			 * deleted, unlink_from_pool() sets refcnt=-1 to make
+			 * distinction between an unused entry (refcnt=0) and
+			 * a freed one.
+			 */
+			if (unlikely(!atomic_add_unless(&u->refcnt, 1, -1)))
 				u = NULL;
 			return u;
 		}
@@ -322,8 +327,9 @@ static void unlink_from_pool(struct inet_peer *p)
 	 * in cleanup() function to prevent sudden disappearing. If we can
 	 * atomically (because of lockless readers) take this last reference,
 	 * it's safe to remove the node and free it later.
+	 * We use refcnt=-1 to alert lockless readers this entry is deleted.
 	 */
-	if (atomic_cmpxchg(&p->refcnt, 1, 0) == 1) {
+	if (atomic_cmpxchg(&p->refcnt, 1, -1) == 1) {
 		struct inet_peer **stack[PEER_MAXDEPTH];
 		struct inet_peer ***stackptr, ***delp;
 		if (lookup(p->v4daddr, stack) != p)
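
The refcnt states the patch distinguishes (0 = unused, -1 = deleted, >0 = in use) can be tried outside the kernel. What follows is only a minimal userspace sketch using C11 <stdatomic.h>, not the kernel code: add_unless() is a stand-in written here to mimic atomic_add_unless(), and the names REF_DELETED, peer_get_old(), peer_get_new(), peer_mark_deleted() and refcnt_demo() are hypothetical, introduced for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REF_DELETED (-1)	/* hypothetical name for the patch's -1 marker */

/*
 * Userspace stand-in for atomic_add_unless(v, a, u):
 * add a to *v unless *v == u; return true if the add happened.
 */
static inline bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Reader before the patch: inc-not-zero, refuses refcnt == 0. */
static inline bool peer_get_old(atomic_int *refcnt)
{
	return add_unless(refcnt, 1, 0);
}

/* Reader after the patch: refuses only the deleted marker. */
static inline bool peer_get_new(atomic_int *refcnt)
{
	return add_unless(refcnt, 1, REF_DELETED);
}

/*
 * Writer side, modelled on atomic_cmpxchg(&p->refcnt, 1, -1) == 1:
 * succeeds only if the caller holds the last reference.
 */
static inline bool peer_mark_deleted(atomic_int *refcnt)
{
	int expected = 1;

	return atomic_compare_exchange_strong(refcnt, &expected, REF_DELETED);
}

/* Walk one entry through unused -> in use -> deleted. */
static inline void refcnt_demo(void)
{
	atomic_int refcnt = 0;			/* unused entry */

	assert(!peer_get_old(&refcnt));		/* old check: falls to slow path */
	assert(peer_get_new(&refcnt));		/* new check: 0 -> 1 */
	assert(peer_mark_deleted(&refcnt));	/* last reference: 1 -> -1 */
	assert(!peer_get_new(&refcnt));		/* deleted entry is refused */
}
```

Calling refcnt_demo() from a test program exercises the whole sequence. The point of the sketch is that an unused entry (refcnt 0) is refused by the inc-not-zero check but accepted once the deleted state has its own value.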