
[net-next-2.6] inetpeer: do not use zero refcnt for freed entries

Message ID 1276656324.19249.39.camel@edumazet-laptop
State Accepted, archived
Delegated to: David Miller

Commit Message

Eric Dumazet June 16, 2010, 2:45 a.m. UTC
On Tuesday, June 15, 2010 at 14:25 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 15 Jun 2010 20:23:14 +0200
> 
> > inetpeer currently uses an AVL tree protected by an rwlock.
> > 
> > It's possible to make most lookups use RCU
>  ...
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Applied, nice work Eric.

Thanks David !

Re-reading the patch, I realize refcnt is expected to be 0 for unused
entries (obviously), so we should use a different marker for 'about to be
freed' ones.

Thanks

[PATCH net-next-2.6] inetpeer: do not use zero refcnt for freed entries

Followup of commit aa1039e73cc2 (inetpeer: RCU conversion)

Unused inet_peer entries have a null refcnt.

Using atomic_inc_not_zero() in rcu lookups is not going to work for
them, and slow path is taken.

Fix this using -1 marker instead of 0 for deleted entries.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/inetpeer.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller June 16, 2010, 4:47 a.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Jun 2010 04:45:24 +0200

> [PATCH net-next-2.6] inetpeer: do not use zero refcnt for freed entries
> 
> Followup of commit aa1039e73cc2 (inetpeer: RCU conversion)
> 
> Unused inet_peer entries have a null refcnt.
> 
> Using atomic_inc_not_zero() in rcu lookups is not going to work for
> them, and slow path is taken.
> 
> Fix this using -1 marker instead of 0 for deleted entries.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.
Eric Dumazet June 16, 2010, 8:56 a.m. UTC | #2
On Tuesday, June 15, 2010 at 21:47 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 16 Jun 2010 04:45:24 +0200
> 
> > [PATCH net-next-2.6] inetpeer: do not use zero refcnt for freed entries
> > 
> > Followup of commit aa1039e73cc2 (inetpeer: RCU conversion)
> > 
> > Unused inet_peer entries have a null refcnt.
> > 
> > Using atomic_inc_not_zero() in rcu lookups is not going to work for
> > them, and slow path is taken.
> > 
> > Fix this using -1 marker instead of 0 for deleted entries.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Applied, thanks Eric.

Thanks

With 65537 peers and a DDoS fragment attack, I now get the following
profiling results:

-----------------------------------------------------------------------------------------
   PerfTop:    1024 irqs/sec  kernel:100.0%  exact:  0.0% [1000Hz cycles],  (all, cpu: 0)
-----------------------------------------------------------------------------------------

             samples  pcnt function                  DSO
             _______ _____ _________________________ 

             7722.00 65.6% inet_frag_find            
             1355.00 11.5% ip4_frag_match            
              494.00  4.2% __lock_acquire            
              260.00  2.2% inet_getpeer              
              243.00  2.1% ip_route_input_common     
              151.00  1.3% lock_release              
              142.00  1.2% mark_lock                 
              126.00  1.1% lock_acquire              
              104.00  0.9% __kmalloc                 
               86.00  0.7% skb_put                   


Just to show what could be the next steps ;)





Paul E. McKenney June 16, 2010, 6:12 p.m. UTC | #3
On Wed, Jun 16, 2010 at 04:45:24AM +0200, Eric Dumazet wrote:
> On Tuesday, June 15, 2010 at 14:25 -0700, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Tue, 15 Jun 2010 20:23:14 +0200
> > 
> > > inetpeer currently uses an AVL tree protected by an rwlock.
> > > 
> > > It's possible to make most lookups use RCU
> >  ...
> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > 
> > Applied, nice work Eric.
> 
> Thanks David !
> 
> Re-reading the patch, I realize refcnt is expected to be 0 for unused
> entries (obviously), so we should use a different marker for 'about to be
> freed' ones.
> 
> Thanks
> 
> [PATCH net-next-2.6] inetpeer: do not use zero refcnt for freed entries
> 
> Followup of commit aa1039e73cc2 (inetpeer: RCU conversion)
> 
> Unused inet_peer entries have a null refcnt.
> 
> Using atomic_inc_not_zero() in rcu lookups is not going to work for
> them, and slow path is taken.
> 
> Fix this using -1 marker instead of 0 for deleted entries.

Based on this patch, looks good to me!  (I don't see lookup_rcu_bh() and
friends in the trees I have at hand.)

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/ipv4/inetpeer.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
> index 58fbc7e..39a14ba 100644
> --- a/net/ipv4/inetpeer.c
> +++ b/net/ipv4/inetpeer.c
> @@ -187,7 +187,12 @@ static struct inet_peer *lookup_rcu_bh(__be32 daddr)
> 
>  	while (u != peer_avl_empty) {
>  		if (daddr == u->v4daddr) {
> -			if (unlikely(!atomic_inc_not_zero(&u->refcnt)))
> +			/* Before taking a reference, check if this entry was
> +			 * deleted, unlink_from_pool() sets refcnt=-1 to make
> +			 * distinction between an unused entry (refcnt=0) and
> +			 * a freed one.
> +			 */
> +			if (unlikely(!atomic_add_unless(&u->refcnt, 1, -1)))
>  				u = NULL;
>  			return u;
>  		}
> @@ -322,8 +327,9 @@ static void unlink_from_pool(struct inet_peer *p)
>  	 * in cleanup() function to prevent sudden disappearing.  If we can
>  	 * atomically (because of lockless readers) take this last reference,
>  	 * it's safe to remove the node and free it later.
> +	 * We use refcnt=-1 to alert lockless readers this entry is deleted.
>  	 */
> -	if (atomic_cmpxchg(&p->refcnt, 1, 0) == 1) {
> +	if (atomic_cmpxchg(&p->refcnt, 1, -1) == 1) {
>  		struct inet_peer **stack[PEER_MAXDEPTH];
>  		struct inet_peer ***stackptr, ***delp;
>  		if (lookup(p->v4daddr, stack) != p)
> 
> 

Patch

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 58fbc7e..39a14ba 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -187,7 +187,12 @@  static struct inet_peer *lookup_rcu_bh(__be32 daddr)
 
 	while (u != peer_avl_empty) {
 		if (daddr == u->v4daddr) {
-			if (unlikely(!atomic_inc_not_zero(&u->refcnt)))
+			/* Before taking a reference, check if this entry was
+			 * deleted, unlink_from_pool() sets refcnt=-1 to make
+			 * distinction between an unused entry (refcnt=0) and
+			 * a freed one.
+			 */
+			if (unlikely(!atomic_add_unless(&u->refcnt, 1, -1)))
 				u = NULL;
 			return u;
 		}
@@ -322,8 +327,9 @@  static void unlink_from_pool(struct inet_peer *p)
 	 * in cleanup() function to prevent sudden disappearing.  If we can
 	 * atomically (because of lockless readers) take this last reference,
 	 * it's safe to remove the node and free it later.
+	 * We use refcnt=-1 to alert lockless readers this entry is deleted.
 	 */
-	if (atomic_cmpxchg(&p->refcnt, 1, 0) == 1) {
+	if (atomic_cmpxchg(&p->refcnt, 1, -1) == 1) {
 		struct inet_peer **stack[PEER_MAXDEPTH];
 		struct inet_peer ***stackptr, ***delp;
 		if (lookup(p->v4daddr, stack) != p)
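
The refcnt convention introduced by this patch (0 = unused but still
usable, -1 = deleted) can be illustrated outside the kernel. The following
is a minimal userspace sketch using C11 atomics, not the kernel code
itself: add_unless() is an assumed stand-in for the kernel's
atomic_add_unless(), and try_get_old(), try_get_new(), try_mark_deleted()
and inetpeer_refcnt_demo() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in (assumption) for the kernel's atomic_add_unless():
 * add `a` to *v unless *v == u; return true if the add was performed.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, `old` is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Pre-patch reader: atomic_inc_not_zero() acts like add_unless(v, 1, 0),
 * so it also refuses unused entries (refcnt == 0).
 */
static bool try_get_old(atomic_int *refcnt)
{
	return add_unless(refcnt, 1, 0);
}

/* Post-patch reader: refuse only entries marked deleted (refcnt == -1). */
static bool try_get_new(atomic_int *refcnt)
{
	return add_unless(refcnt, 1, -1);
}

/* Post-patch deleter: succeeds only if we hold the last reference. */
static bool try_mark_deleted(atomic_int *refcnt)
{
	int expected = 1;

	return atomic_compare_exchange_strong(refcnt, &expected, -1);
}

/* Hypothetical self-check walking through the three refcnt states. */
int inetpeer_refcnt_demo(void)
{
	atomic_int unused, busy, last;

	atomic_init(&unused, 0);
	atomic_init(&busy, 2);
	atomic_init(&last, 1);

	/* Unused entry: the old test fails, the new one takes a reference. */
	assert(!try_get_old(&unused));
	assert(atomic_load(&unused) == 0);
	assert(try_get_new(&unused));
	assert(atomic_load(&unused) == 1);

	/* Entry with other users: deletion is refused, refcnt untouched. */
	assert(!try_mark_deleted(&busy));
	assert(atomic_load(&busy) == 2);

	/* Last reference: entry is marked deleted, readers now back off. */
	assert(try_mark_deleted(&last));
	assert(atomic_load(&last) == -1);
	assert(!try_get_new(&last));
	assert(atomic_load(&last) == -1);

	return 0;
}
```

With the old test, a lockless reader that finds an unused entry gets no
reference, which corresponds to the slow path described in the commit
message. With -1 as the deletion marker, only entries that the deleter has
actually claimed are rejected.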