diff mbox

[net-next] tcp: tcp_sendmsg() page recycling

Message ID 1323018317.2762.143.camel@edumazet-laptop
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Dec. 4, 2011, 5:05 p.m. UTC
If our TCP_PAGE(sk) is not shared (page_count() == 1), we can set page
offset to 0.

This permits better filling of the pages on small to medium tcp writes.

"tbench 16" results on my dev server (2x4x2 machine) :

Before : 3072 MB/s
After  : 3146 MB/s  (2.4 % gain)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/tcp.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Joe Perches Dec. 4, 2011, 5:20 p.m. UTC | #1
On Sun, 2011-12-04 at 18:05 +0100, Eric Dumazet wrote:
> If our TCP_PAGE(sk) is not shared (page_count() == 1), we can set page
> offset to 0.
[]
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
[]
> @@ -1009,7 +1009,12 @@ new_segment:
>  				int merge = 0;
>  				int i = skb_shinfo(skb)->nr_frags;
>  				struct page *page = TCP_PAGE(sk);
> -				int off = TCP_OFF(sk);
> +				int off;
> +
> +				if (page && page_count(page) == 1)
> +					TCP_OFF(sk) = 0;
> +
> +				off = TCP_OFF(sk);

This might be clearer and take one less indirection as

				if (page && page_count(page) == 1) {
					TCP_OFF(sk) = 0;
					off = 0;
				} else {
					off = 0;
				}

or maybe

				if (page && page_count(page) == 1)
					off = TCP_OFF(sk) = 0;
				else
					off = 0;

And maybe the TCP_OFF and TCP_PAGE macros should be removed.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 4, 2011, 5:36 p.m. UTC | #2
Le dimanche 04 décembre 2011 à 18:05 +0100, Eric Dumazet a écrit :
> If our TCP_PAGE(sk) is not shared (page_count() == 1), we can set page
> offset to 0.
> 
> This permits better filling of the pages on small to medium tcp writes.
> 
> "tbench 16" results on my dev server (2x4x2 machine) :
> 
> Before : 3072 MB/s
> After  : 3146 MB/s  (2.4 % gain)

By the way current linux tree on the same machine gets 2957 MB/s

So the combination of select_size() patch and this one gives
a 6.4 % gain.

Definitely worth the pain, isnt it ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 4, 2011, 6:14 p.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 04 Dec 2011 18:36:52 +0100

> Definitely worth the pain, isnt it ?

As long as such gains aren't lost in the noise when talking over
real devices.

I'm not a big fan of tbench over loopback numbers, frankly :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 4, 2011, 6:21 p.m. UTC | #4
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 04 Dec 2011 18:05:17 +0100

> If our TCP_PAGE(sk) is not shared (page_count() == 1), we can set page
> offset to 0.
> 
> This permits better filling of the pages on small to medium tcp writes.
> 
> "tbench 16" results on my dev server (2x4x2 machine) :
> 
> Before : 3072 MB/s
> After  : 3146 MB/s  (2.4 % gain)
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas de Pesloüan Dec. 4, 2011, 6:27 p.m. UTC | #5
Le 04/12/2011 18:20, Joe Perches a écrit :
> On Sun, 2011-12-04 at 18:05 +0100, Eric Dumazet wrote:
>> If our TCP_PAGE(sk) is not shared (page_count() == 1), we can set page
>> offset to 0.
> []
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> []
>> @@ -1009,7 +1009,12 @@ new_segment:
>>   				int merge = 0;
>>   				int i = skb_shinfo(skb)->nr_frags;
>>   				struct page *page = TCP_PAGE(sk);
>> -				int off = TCP_OFF(sk);
>> +				int off;
>> +
>> +				if (page&&  page_count(page) == 1)
>> +					TCP_OFF(sk) = 0;
>> +
>> +				off = TCP_OFF(sk);
>
> This might be clearer and take one less indirection as
>
> 				if (page&&  page_count(page) == 1) {
> 					TCP_OFF(sk) = 0;
> 					off = 0;
> 				} else {
> 					off = 0;
> 				}
>
> or maybe
>
> 				if (page&&  page_count(page) == 1)
> 					off = TCP_OFF(sk) = 0;
> 				else
> 					off = 0;

Well, those two fragments don't look equivalent to the original one, unless TCP_OFF(sk) happens to 
be 0 before the test. But if this is true, then the whole thing seems useless. Do I miss something?

For as far as I understand, the following should be equivalent to Eric's original, but I don't 
consider it better than the original.

  				if (page&&  page_count(page) == 1)
  					off = TCP_OFF(sk) = 0;
  				else
  					off = TCP_OFF(sk);

	Nicolas.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joe Perches Dec. 4, 2011, 6:54 p.m. UTC | #6
On Sun, 2011-12-04 at 19:27 +0100, Nicolas de Pesloüan wrote:
> Well, those two fragments don't look equivalent to the original one, unless TCP_OFF(sk) happens to 
> be 0 before the test. But if this is true, then the whole thing seems useless. Do I miss something?

Nope, you're completely correct.  It's just me not typing correctly.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 45156be..a09fe25 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1009,7 +1009,12 @@  new_segment:
 				int merge = 0;
 				int i = skb_shinfo(skb)->nr_frags;
 				struct page *page = TCP_PAGE(sk);
-				int off = TCP_OFF(sk);
+				int off;
+
+				if (page && page_count(page) == 1)
+					TCP_OFF(sk) = 0;
+
+				off = TCP_OFF(sk);
 
 				if (skb_can_coalesce(skb, i, page, off) &&
 				    off != PAGE_SIZE) {