Patchwork net: Fix __inet_inherit_port() to correctly increment bsockets and num_owners

login
register
mail settings
Submitter Eric Dumazet
Date Nov. 26, 2010, 10:47 a.m.
Message ID <1290768477.2855.97.camel@edumazet-laptop>
Download mbox | patch
Permalink /patch/73164/
State Superseded
Delegated to: David Miller
Headers show

Comments

Eric Dumazet - Nov. 26, 2010, 10:47 a.m.
From: Nagendra Tomar <tomer_iisc@yahoo.com>

Le vendredi 26 novembre 2010 à 01:40 -0800, Nagendra Tomar a écrit :
> 
> --- On Fri, 26/11/10, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > 
> > OK so you'll have to make a proof, because current code
> > seems to work ;)
> > 
> > 
> 
> ok, so I printed hashinfo->bsockets and tb->num_owners inside
> __inet_put_port() and I could see both of them to be -ve. All we need
> to do is establish and terminate a connection. I used netcat for that.
> 
> The only place 'bsockets' and 'num_owners' are used is
> inet_csk_get_port() and the only effect that they might have is on the
> choice of the port to be used for binding. 
> 'bsockets' is used as a hint to stop searching for a free port (and
> instead share an already used port) when we know that all the ports
> could be used up.
> 'num_owners' is used to find the port which is least shared (to
> balance the 'owners' list) in case we need to share a port.
> 
> Since both of these are used as optimizations (in the bind path), they
> do not affect correctness and hence the code works even with these
> values not being updated correctly.

Hmm, thanks for clarification.

bsockets / mnum_owners iscount is indeed an 'optimization' problem.

Problem is your patch is not applicable to current tree.

In order to submit it to stable team, you should first post a patch for
next/current kernel (net-next-2.6 tree).

David will decide if its net-2.6 material or not.

You could add in your changelog the problem comes from commit 
a9d8f9110d7e953c (inet: Allowing more than 64k connections and heavily
optimize bind(0)), included in 2.6.30, to ease stable team work.

On current tree your patch would be :



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nagendra Tomar - Nov. 26, 2010, 11:01 a.m.
--- On Fri, 26/11/10, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> 
> Problem is your patch is not applicable to current tree.
> 
> In order to submit it to stable team, you should first post
> a patch for
> next/current kernel (net-next-2.6 tree).
> 

Thanks, Erik.
I'd made the patch against 2.6.36.1 which is the latest stable kernel per kernel.org. I thought that was the right kernel version to make a patch against.
I do not use git. Shall I make a patch against linux-next as it appears in kernel.org.


> David will decide if its net-2.6 material or not.
> 
> You could add in your changelog the problem comes from
> commit 
> a9d8f9110d7e953c (inet: Allowing more than 64k connections
> and heavily
> optimize bind(0)), included in 2.6.30, to ease stable team
> work.
> 
> On current tree your patch would be :
> 
> diff --git a/net/ipv4/inet_hashtables.c
> b/net/ipv4/inet_hashtables.c
> index 1b344f3..3c0369a 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -133,8 +133,7 @@ int __inet_inherit_port(struct sock
> *sk, struct sock *child)
>             
> }
>          }
>      }
> -    sk_add_bind_node(child,
> &tb->owners);
> -    inet_csk(child)->icsk_bind_hash =
> tb;
> +    inet_bind_hash(child, tb, port);
>      spin_unlock(&head->lock);
>  
>      return 0;
> 
> 
> 

Thanks,
Tomar


      
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Evgeniy Polyakov - Nov. 26, 2010, 11:07 a.m.
Hi.

On Fri, Nov 26, 2010 at 11:47:57AM +0100, Eric Dumazet (eric.dumazet@gmail.com) wrote:
> > ok, so I printed hashinfo->bsockets and tb->num_owners inside
> > __inet_put_port() and I could see both of them to be -ve. All we need
> > to do is establish and terminate a connection. I used netcat for that.
> > 
> > The only place 'bsockets' and 'num_owners' are used is
> > inet_csk_get_port() and the only effect that they might have is on the
> > choice of the port to be used for binding. 
> > 'bsockets' is used as a hint to stop searching for a free port (and
> > instead share an already used port) when we know that all the ports
> > could be used up.
> > 'num_owners' is used to find the port which is least shared (to
> > balance the 'owners' list) in case we need to share a port.
> > 
> > Since both of these are used as optimizations (in the bind path), they
> > do not affect correctness and hence the code works even with these
> > values not being updated correctly.
> 
> Hmm, thanks for clarification.
> 
> bsockets / mnum_owners iscount is indeed an 'optimization' problem.
> 
> Problem is your patch is not applicable to current tree.
> 
> In order to submit it to stable team, you should first post a patch for
> next/current kernel (net-next-2.6 tree).
> 
> David will decide if its net-2.6 material or not.
> 
> You could add in your changelog the problem comes from commit 
> a9d8f9110d7e953c (inet: Allowing more than 64k connections and heavily
> optimize bind(0)), included in 2.6.30, to ease stable team work.

Frankly I did not find how those optimizations made a bug as well as
what is this bug about from given description, but I'm glad it is resolved now :)
Nagendra Tomar - Nov. 26, 2010, 11:20 a.m.
--- On Fri, 26/11/10, Evgeniy Polyakov <zbr@ioremap.net> wrote:

> Frankly I did not find how those optimizations made a bug
> as well as
> what is this bug about from given description, but I'm glad
> it is resolved now :)
> 

I'm not sure of what all went into the "optimization" patch, but the bug is not due to the optimization per-se. As my original post says, it is due to the 'bsockets' and 'num_owners' not being incremented inside __inet_inherit_port(), where it should have been, as __inet_put_port() decrements them on port release, which causes the imbalance.

Thanks,
Tomar


      
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Evgeniy Polyakov - Nov. 26, 2010, 3:56 p.m.
On Fri, Nov 26, 2010 at 03:20:31AM -0800, Nagendra Tomar (tomer_iisc@yahoo.com) wrote:
> > Frankly I did not find how those optimizations made a bug
> > as well as
> > what is this bug about from given description, but I'm glad
> > it is resolved now :)
> > 
> 
> I'm not sure of what all went into the "optimization" patch, but the bug is not due to the optimization per-se. As my original post says, it is due to the 'bsockets' and 'num_owners' not being incremented inside __inet_inherit_port(), where it should have been, as __inet_put_port() decrements them on port release, which causes the imbalance.

Argh, I see, thanks a lot for explanation!

Patch

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 1b344f3..3c0369a 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -133,8 +133,7 @@  int __inet_inherit_port(struct sock *sk, struct sock *child)
 			}
 		}
 	}
-	sk_add_bind_node(child, &tb->owners);
-	inet_csk(child)->icsk_bind_hash = tb;
+	inet_bind_hash(child, tb, port);
 	spin_unlock(&head->lock);
 
 	return 0;