Patchwork virt-manager broken by bind(0) in net-next.

Submitter Evgeniy Polyakov
Date Jan. 31, 2009, 9:31 a.m.
Message ID <20090131093123.GA28151@ioremap.net>
Permalink /patch/21406/
State Changes Requested
Delegated to: David Miller

Comments

Evgeniy Polyakov - Jan. 31, 2009, 9:31 a.m.
On Sat, Jan 31, 2009 at 10:17:44AM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> > getaddrinfo() returns a list of addresses, and IPv6 was the first one IIRC.
> > Previously it bailed out, but with my change it will try again with no
> > reason for doing so. With the patch I sent based on Eric's observation
> > things should be fine.
> 
> Problem is your patch is wrong, Evgeniy, please think about it a little bit more
> and resubmit it. 

No, the patch should be OK. The part that moves bsockets around was
added because of your complaint that it is written into a read-mostly
cache line. It is not a fix and has nothing to do with the problem at all.

> Take the time to run this $0.02 program, before and after your upcoming fix :

It is not a fix but an enhancement, which really has nothing to do with
the bug in question :)
The fix is to return an error if socket binding does not use the heuristic.

> offset of bsockets being 0x18 or 0x20 gives the same result: bad, because it is in the
> same cache line as ehash, ehash_locks, ehash_size, ehash_locks_mask,
> bhash, bhash_size, unless your CPU is a Pentium.

Attached patch makes the change; I'm curious if it ever makes any
difference in the benchmarks.
Eric Dumazet - Jan. 31, 2009, 9:49 a.m.
Evgeniy Polyakov a écrit :
> On Sat, Jan 31, 2009 at 10:17:44AM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
>>> getaddrinfo() returns a list of addresses, and IPv6 was the first one IIRC.
>>> Previously it bailed out, but with my change it will try again with no
>>> reason for doing so. With the patch I sent based on Eric's observation
>>> things should be fine.
>> Problem is your patch is wrong, Evgeniy, please think about it a little bit more
>> and resubmit it. 
> 
> No, the patch should be OK. The part that moves bsockets around was
> added because of your complaint that it is written into a read-mostly
> cache line. It is not a fix and has nothing to do with the problem at all.
> 
>> Take the time to run this $0.02 program, before and after your upcoming fix :
> 
> It is not a fix but an enhancement, which really has nothing to do with
> the bug in question :)
> The fix is to return an error if socket binding does not use the heuristic.
> 
>> offset of bsockets being 0x18 or 0x20 gives the same result: bad, because it is in the
>> same cache line as ehash, ehash_locks, ehash_size, ehash_locks_mask,
>> bhash, bhash_size, unless your CPU is a Pentium.
> 
> Attached patch makes the change; I'm curious if it ever makes any
> difference in the benchmarks.
> 
>> Also, I suggest you change bsockets to something more appropriate, eg a
>> percpu counter.
> 
> I thought of that first, but found that looping over every CPU and
> summing the total number of allocated/freed sockets would have noticeably
> bigger overhead than a loosely maintained count of sockets.
> 
> For reference: this patch has nothing to do with the bug we discuss here;
> the proper patch (without the need to move bsockets around) was sent
> earlier, which forces the port selection codepath to return an error when the new
> selection heuristic is not used.
> 
> --- ./include/net/inet_hashtables.h.orig	2009-01-31 12:27:41.000000000 +0300
> +++ ./include/net/inet_hashtables.h	2009-01-31 12:28:15.000000000 +0300
> @@ -134,7 +134,6 @@
>  	struct inet_bind_hashbucket	*bhash;
>  
>  	unsigned int			bhash_size;
> -	int				bsockets;
>  
>  	struct kmem_cache		*bind_bucket_cachep;
>  
> @@ -150,6 +149,8 @@
>  	 */
>  	struct inet_listen_hashbucket	listening_hash[INET_LHTABLE_SIZE]
>  					____cacheline_aligned_in_smp;
> +	
> +	int				bsockets ____cacheline_aligned_in_smp;
>  
>  };
>  
> 


It appears you are always right, I have nothing to say then.

Stupid I am.

I vote for a plain revert of your initial patch, since you are unaware
of the performance problems it introduces. Then again, probably nobody cares
about my complaints, so don't worry.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Evgeniy Polyakov - Jan. 31, 2009, 9:56 a.m.
On Sat, Jan 31, 2009 at 10:49:00AM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> It appears you are always right, I have nothing to say then.
> 
> Stupid I am.
> 
> I vote for a plain revert of your initial patch, since you are unaware
> of the performance problems it introduces. Then again, probably nobody cares
> about my complaints, so don't worry.

Eric, do not take it so personally :) After all, it is only a matter of
how we enjoy the process and have fun with the development.

Really, I appreciate your work and help, and likely this
misunderstanding happened because of a bad mix of the original bug and
this performance implication. The original bug really has nothing to do with what
we discuss here. And while the performance problem with bound socket
creation may be visible, I did not observe it, while the idea
implemented with this approach shows up clearly in the graph I posted.
So I vote with both hands for further improving it by moving things around so
that there would be no unneeded cache flushes during updates of this
field.
Eric Dumazet - Jan. 31, 2009, 10:17 a.m.
Evgeniy Polyakov a écrit :
> On Sat, Jan 31, 2009 at 10:49:00AM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
>> It appears you are always right, I have nothing to say then.
>>
>> Stupid I am.
>>
>> I vote for a plain revert of your initial patch, since you are unaware
>> of the performance problems it introduces. Then again, probably nobody cares
>> about my complaints, so don't worry.
> 
> Eric, do not take it so personally :) After all, it is only a matter of
> how we enjoy the process and have fun with the development.
> 
> Really, I appreciate your work and help, and likely this
> misunderstanding happened because of a bad mix of the original bug and
> this performance implication. The original bug really has nothing to do with what
> we discuss here. And while the performance problem with bound socket
> creation may be visible, I did not observe it, while the idea
> implemented with this approach shows up clearly in the graph I posted.
> So I vote with both hands for further improving it by moving things around so
> that there would be no unneeded cache flushes during updates of this
> field.
> 

OK OK, as I said, don't worry, it was not a strong feeling from me, only
a little bit upset, that's all.

We only need to know if the *fix* solves Stephen's problem.

About the performance effects of careful variable placement and the percpu counter
strategy, you might consult this as an example:

http://lkml.indiana.edu/hypermail/linux/kernel/0812.1/01624.html

Now, with these patches applied, try to see the effect of your new bsockets field
on a network workload doing a lot of socket bind()/unbind() calls...

With current kernels, you probably won't notice because of hot inode/dcache
cache lines, but it might change eventually...


Evgeniy Polyakov - Feb. 1, 2009, 12:42 p.m.
Hi Eric.

On Sat, Jan 31, 2009 at 11:17:15AM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> We only need to know if the *fix* solves Stephen's problem.
> 
> About the performance effects of careful variable placement and the percpu counter
> strategy, you might consult this as an example:
> 
> http://lkml.indiana.edu/hypermail/linux/kernel/0812.1/01624.html

Impressive, but to be 100% fair, it is not only because of the cache line
issues :)

> Now, with these patches applied, try to see the effect of your new bsockets field
> on a network workload doing a lot of socket bind()/unbind() calls...
> 
> With current kernels, you probably won't notice because of hot inode/dcache
> cache lines, but it might change eventually...

David applied the patch which fixed the problem, so we can return to the
cache line issues. What do you think about the last version, where the
bsockets field was placed at the very end of the structure with the
____cacheline_aligned_in_smp attribute?

Patch

> Also, I suggest you change bsockets to something more appropriate, eg a
> percpu counter.

I thought of that first, but found that looping over every CPU and
summing the total number of allocated/freed sockets would have noticeably
bigger overhead than a loosely maintained count of sockets.

For reference: this patch has nothing to do with the bug we discuss here;
the proper patch (without the need to move bsockets around) was sent
earlier, which forces the port selection codepath to return an error when the new
selection heuristic is not used.

--- ./include/net/inet_hashtables.h.orig	2009-01-31 12:27:41.000000000 +0300
+++ ./include/net/inet_hashtables.h	2009-01-31 12:28:15.000000000 +0300
@@ -134,7 +134,6 @@ 
 	struct inet_bind_hashbucket	*bhash;
 
 	unsigned int			bhash_size;
-	int				bsockets;
 
 	struct kmem_cache		*bind_bucket_cachep;
 
@@ -150,6 +149,8 @@ 
 	 */
 	struct inet_listen_hashbucket	listening_hash[INET_LHTABLE_SIZE]
 					____cacheline_aligned_in_smp;
+	
+	int				bsockets ____cacheline_aligned_in_smp;
 
 };