REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

Message ID 20140208133744.GA20512@glanzmann.de
State RFC, archived
Delegated to: David Miller

Commit Message

Thomas Glanzmann Feb. 8, 2014, 1:37 p.m. UTC
Hello Eric,

> > tcp corking kills iSCSI performance

> Here is the combined patch, could you test it?

the patch did not apply, so I edited it by hand. Here is the resulting
patch:


-- cut here --

It fixes my case, but if you look at the round trip time it is not even
close to what it used to be. So while this fixes my problem I'm still for
disabling it by default.

https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-14:36:25.png

Cheers,
        Thomas
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Feb. 8, 2014, 1:53 p.m. UTC | #1
On Sat, 2014-02-08 at 14:37 +0100, Thomas Glanzmann wrote:
> Hello Eric,

> 
> It fixes my case, but if you look at the round trip time it is not even
> close to what it used to be. So while this fixes my problem I'm still for
> disabling it by default.
> 
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-14:36:25.png

Very nice.

Now we have to check your NIC and how TX completion is performed.

What is your NIC model and driver ?


--
Thomas Glanzmann Feb. 8, 2014, 1:58 p.m. UTC | #2
Hello Eric,

> What is your NIC model and driver?

I have four Intel Corporation I350 Gigabit Network Connection (rev 01).

(node-62) [~/work/linux-2.6] lspci -v | pbot
http://pbot.rmdir.de/rgu6yHMBDVQpflMmbcJACg
(node-62) [~/work/linux-2.6] ip a s | pbot
http://pbot.rmdir.de/xJjRT8u-ekC6mrWgl09ZtQ
(node-62) [~/work/linux-2.6] dmesg | pbot
http://pbot.rmdir.de/MigrSPtxGmp0fI1CRgXsHw

I use 802.3ad link aggregation with a layer 2 hash across two network
cards connected to one switch.

I'm running:
Linux node-62 3.14.0-rc1+ #23 SMP Sat Feb 8 14:27:47 CET 2014 x86_64 GNU/Linux

Driver: igb

If you need remote access to the machine let me know.

Cheers,
        Thomas
--
Eric Dumazet Feb. 8, 2014, 2:09 p.m. UTC | #3
On Sat, 2014-02-08 at 14:37 +0100, Thomas Glanzmann wrote:

> 
> It fixes my case, but if you look at the round trip time it is not even
> close to what it used to be. So while this fixes my problem I'm still for
> disabling it by default.
> 
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2

This pcap was taken on which host ?

10.101.99.5 or  10.101.0.13 ?




--
Thomas Glanzmann Feb. 8, 2014, 2:12 p.m. UTC | #4
Hello Eric,

[RESEND: dropped CC accidentally]

> 10.101.99.5 or 10.101.0.13?

10.101.99.5 (iSCSI Target)

tcpdump -i bond0.101 -s 0 -w /tmp/tcp_auto_corking_on_patched.pcap host esx-03.v101.campusvl.de

Cheers,
        Thomas
--
Thomas Glanzmann Feb. 17, 2014, 2:08 p.m. UTC | #5
Hello Eric,
may I submit your latest patch upstream? Or do you plan on doing that
yourself?

Cheers,
        Thomas
--
Eric Dumazet Feb. 17, 2014, 3:26 p.m. UTC | #6
On Mon, 2014-02-17 at 15:08 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> may I submit your latest patch upstream? Or do you plan on doing that
> yourself?

Unfortunately you did not have good results with MSG_MORE applied to
the page fragments.

I think I'll submit only the part dealing with the metadata.

Then later we might take care of the pages themselves.

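As background for the MSG_MORE discussion above, here is a minimal
userspace sketch of the flag, not the kernel iSCSI target code the patch
touches. It assumes a Linux stream socket; send_all() and send_pdu() are
hypothetical helper names introduced only for illustration. The header
("metadata") is sent with MSG_MORE so the stack may hold it back and
coalesce it with the payload that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send all of buf, retrying on short writes. */
static int send_all(int fd, const void *buf, size_t len, int flags)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = send(fd, p, len, flags);

		if (n < 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: the header goes out with MSG_MORE (more data is
 * coming), the payload without it, so the last call lets the stack
 * transmit what has been queued.
 */
static int send_pdu(int fd, const void *hdr, size_t hdr_len,
		    const void *data, size_t data_len)
{
	assert(hdr != NULL && data != NULL);

	if (send_all(fd, hdr, hdr_len, MSG_MORE) < 0)
		return -1;
	return send_all(fd, data, data_len, 0);
}
```

The receiver sees one contiguous byte stream either way; MSG_MORE only
affects when segments are pushed out, not what they contain.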

--
Thomas Glanzmann Feb. 17, 2014, 3:32 p.m. UTC | #7
Hello Eric,

> Unfortunately you did not have good results with MSG_MORE applied
> to the page fragments.

I agree. We should submit only the patch from this message:

Message-ID: <1391886759.10160.114.camel@edumazet-glaptop2.roam.corp.google.com>
http://mid.gmane.org/1391886759.10160.114.camel@edumazet-glaptop2.roam.corp.google.com

> I think I'll submit the part only dealing with the metadata.

May I submit the patch or do you want to do it yourself?

Cheers,
        Thomas
--
Eric Dumazet Feb. 17, 2014, 3:46 p.m. UTC | #8
On Mon, 2014-02-17 at 16:32 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> > Unfortunately you did not have good results with MSG_MORE applied
> > to the page fragments.
> 
> I agree. We should submit only the patch from this message:
> 
> Message-ID: <1391886759.10160.114.camel@edumazet-glaptop2.roam.corp.google.com>
> http://mid.gmane.org/1391886759.10160.114.camel@edumazet-glaptop2.roam.corp.google.com
> 
> > I think I'll submit the part only dealing with the metadata.
> 
> May I submit the patch or do you want to do it yourself?

I'll do it tomorrow : Today is President's Day in the US, and I am
spending the day with my family.

Thanks !


--
Thomas Glanzmann Feb. 17, 2014, 3:46 p.m. UTC | #9
Hello Eric,

> I'll do it tomorrow : Today is President's Day in the US, and I am
> spending the day with my family.

thank you. Enjoy your day.

Cheers,
        Thomas
--
Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 03d26b8..40d1958 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -698,7 +698,8 @@  static void tcp_tsq_handler(struct sock *sk)
 	if ((1 << sk->sk_state) &
 	    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
 	     TCPF_CLOSE_WAIT  | TCPF_LAST_ACK))
-		tcp_write_xmit(sk, tcp_current_mss(sk), 0, 0, GFP_ATOMIC);
+		tcp_write_xmit(sk, tcp_current_mss(sk), tcp_sk(sk)->nonagle,
+			       0, GFP_ATOMIC);
 }
 /*
  * One tasklet per cpu tries to send more skbs.
@@ -1904,7 +1905,16 @@  static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 
 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
-			break;
+			/* It is possible TX completion already happened
+			 * before we set TSQ_THROTTLED, so we must
+			 * test again the condition.
+			 * We abuse smp_mb__after_clear_bit() because
+			 * there is no smp_mb__after_set_bit() yet
+			 */
+			smp_mb__after_clear_bit();
+			if (atomic_read(&sk->sk_wmem_alloc) > limit)
+				break;
+
 		}
 
 		limit = mss_now;
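
The second hunk follows a "set flag, barrier, re-test" pattern. A minimal
userspace sketch of that idea, using C11 atomics rather than the kernel
primitives, might look like the following. struct txq, THROTTLED and
must_wait() are hypothetical names that stand in for the socket state,
TSQ_THROTTLED and the check inside tcp_write_xmit(); this is an
illustration under those assumptions, not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define THROTTLED 1u	/* hypothetical flag bit, stands in for TSQ_THROTTLED */

struct txq {			/* hypothetical stand-in for the socket state */
	atomic_uint in_flight;	/* bytes queued but not yet completed */
	atomic_uint flags;
};

/* Returns true if the sender must stop and wait for a completion. */
static bool must_wait(struct txq *q, unsigned int limit)
{
	assert(q != NULL);

	if (atomic_load(&q->in_flight) <= limit)
		return false;

	/* Tell the completion path that we want to be woken up. */
	atomic_fetch_or(&q->flags, THROTTLED);

	/*
	 * Order the flag store before the re-read below. In C11 the
	 * seq_cst fetch_or already orders this; the explicit fence only
	 * mirrors the barrier the patch adds after set_bit().
	 */
	atomic_thread_fence(memory_order_seq_cst);

	/*
	 * A completion may have run before the flag became visible and
	 * so skipped the wakeup. Re-test: if the queue has drained in the
	 * meantime, keep sending instead of waiting for a wakeup that
	 * will never come.
	 */
	return atomic_load(&q->in_flight) > limit;
}
```

Without the second load, a sender that loses this race would stall until
some unrelated event restarts transmission.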