REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

Message ID 20140208213608.GA24328@glanzmann.de
State RFC, archived
Delegated to: David Miller

Commit Message

Thomas Glanzmann Feb. 8, 2014, 9:36 p.m. UTC
Hello Eric,

> I was simply thinking about something like :
> (might need further changes, but I guess this should solve your case)

thank you for your patch. It did not apply on top of Linux tip, so I put
in the changes manually and fixed up another call to tx_data that you
forgot in your initial patch to make it apply.

I gave it another run, can you confirm that it now behaves better?

https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched_tcp_more.pcap.bz2

And look at that round-trip graph: it is perfect. Also, the filesystem is
now created in 3 seconds instead of 4.

https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-22:34:57.png

Nab, would you consider this patch for upstream? Would you take it if I
clean it up?

Cheers,
        Thomas

PS: I'm asleep for the next 8 hours.

More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Feb. 9, 2014, 12:15 a.m. UTC | #1
On Sat, 2014-02-08 at 22:36 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> > I was simply thinking about something like :
> > (might need further changes, but I guess this should solve your case)
> 
> thank you for your patch. It did not apply on top of Linux tip, so I put
> in the changes manually and fixed up another call to tx_data that you
> forgot in your initial patch to make it apply.
> 
> I gave it another run, can you confirm that it now behaves better?
> 
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched_tcp_more.pcap.bz2
> 
> And look at that round-trip graph: it is perfect. Also, the filesystem is
> now created in 3 seconds instead of 4.

Yes, this is much better : 2 frames per request/response, instead of 4.

13:32:04.665367 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 384:432, ack 2529, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.665483 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 2529:3089, ack 432, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.665642 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 432:480, ack 3089, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.665756 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 3089:3649, ack 480, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.665933 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 480:528, ack 3649, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.666046 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 3649:4209, ack 528, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.666214 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 528:576, ack 4209, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.666333 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 4209:4769, ack 576, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.666678 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 576:624, ack 4769, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.666790 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 4769:5329, ack 624, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.666983 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 624:672, ack 5329, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.667097 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 5329:5889, ack 672, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.667280 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 672:720, ack 5889, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.667324 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 5889:6449, ack 720, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.667500 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 720:768, ack 6449, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.667540 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 6449:7009, ack 768, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
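
The drop from 4 frames to 2 in the trace above comes from sending the PDU header with MSG_MORE, so the stack can hold it until the payload is queued. The following userspace sketch shows the same pattern on an ordinary TCP socket. It is only an illustration: `send_pdu()` and `demo_roundtrip()` are hypothetical helpers, not part of the iSCSI target code, and the 48-byte header plus 512-byte payload are chosen merely to add up to the 560-byte responses seen in the capture. Whether the two writes actually leave in one segment depends on the TCP stack and its settings; the code checks only that all bytes arrive.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send a header followed by an optional payload on a TCP socket.
 * MSG_MORE on the header tells the stack that more data follows, so it may
 * hold the header back and transmit it together with the payload.
 * Returns the total number of bytes queued, or -1 on a short or failed send.
 */
static ssize_t send_pdu(int fd, const void *hdr, size_t hdr_len,
			const void *data, size_t data_len)
{
	if (send(fd, hdr, hdr_len, data_len ? MSG_MORE : 0) != (ssize_t)hdr_len)
		return -1;
	if (data_len && send(fd, data, data_len, 0) != (ssize_t)data_len)
		return -1;
	return (ssize_t)(hdr_len + data_len);
}

/*
 * Loopback self-check (hypothetical helper): connect two TCP sockets on
 * 127.0.0.1, push one 48-byte header plus 512 bytes of data through
 * send_pdu(), and return the number of bytes the peer received
 * (-1 on setup or send failure).
 */
static ssize_t demo_roundtrip(void)
{
	char hdr[48] = { 0 }, data[512] = { 0 }, buf[1024];
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	ssize_t sent = -1, got = 0, n;
	int lfd, cfd = -1, sfd = -1;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
		goto out;

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	sfd = accept(lfd, NULL, NULL);
	if (sfd < 0)
		goto out;

	sent = send_pdu(sfd, hdr, sizeof(hdr), data, sizeof(data));

	/* Drain the receiving side until every queued byte has arrived. */
	while (sent > 0 && got < sent) {
		n = recv(cfd, buf, sizeof(buf), 0);
		if (n <= 0)
			break;
		got += n;
	}
out:
	if (sfd >= 0)
		close(sfd);
	if (cfd >= 0)
		close(cfd);
	close(lfd);
	return sent < 0 ? -1 : got;
}
```

Calling `demo_roundtrip()` while capturing on the loopback interface with tcpdump is one way to compare the segment count with and without MSG_MORE on the header.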


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Glanzmann Feb. 9, 2014, 7:45 a.m. UTC | #2
Hello Eric,

> Yes, this is much better : 2 frames per request/response, instead of 4.

perfect. I sent the patch to the iSCSI target list in your name, since you
did the work, and added my own Signed-off-by. I hope that is how this is
handled. Or should I have put my name on the From line and mentioned in
the patch description that you did the heavy lifting?

Cheers,
        Thomas
Patch

diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index e655b04..0eb9681 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -1168,7 +1168,7 @@  send_data:
 		iov_count = cmd->iov_misc_count;
 	}
 
-	tx_sent = tx_data(conn, &iov[0], iov_count, tx_size);
+	tx_sent = tx_data(conn, &iov[0], iov_count, tx_size, 0);
 	if (tx_size != tx_sent) {
 		if (tx_sent == -EAGAIN) {
 			pr_err("tx_data() returned -EAGAIN\n");
@@ -1199,7 +1199,8 @@  send_hdr:
 	iov.iov_base = cmd->pdu;
 	iov.iov_len = tx_hdr_size;
 
-	tx_sent = tx_data(conn, &iov, 1, tx_hdr_size);
+	data_len = cmd->tx_size - tx_hdr_size - cmd->padding;
+	tx_sent = tx_data(conn, &iov, 1, tx_hdr_size, data_len ? MSG_MORE : 0);
 	if (tx_hdr_size != tx_sent) {
 		if (tx_sent == -EAGAIN) {
 			pr_err("tx_data() returned -EAGAIN\n");
@@ -1208,7 +1209,6 @@  send_hdr:
 		return -1;
 	}
 
-	data_len = cmd->tx_size - tx_hdr_size - cmd->padding;
 	/*
 	 * Set iov_off used by padding and data digest tx_data() calls below
 	 * in order to determine proper offset into cmd->iov_data[]
@@ -1252,7 +1252,8 @@  send_padding:
 	if (cmd->padding) {
 		struct kvec *iov_p = &cmd->iov_data[iov_off++];
 
-		tx_sent = tx_data(conn, iov_p, 1, cmd->padding);
+		tx_sent = tx_data(conn, iov_p, 1, cmd->padding,
+			          conn->conn_ops->DataDigest ? MSG_MORE : 0);
 		if (cmd->padding != tx_sent) {
 			if (tx_sent == -EAGAIN) {
 				pr_err("tx_data() returned -EAGAIN\n");
@@ -1266,7 +1267,7 @@  send_datacrc:
 	if (conn->conn_ops->DataDigest) {
 		struct kvec *iov_d = &cmd->iov_data[iov_off];
 
-		tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN);
+		tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN, 0);
 		if (ISCSI_CRC_LEN != tx_sent) {
 			if (tx_sent == -EAGAIN) {
 				pr_err("tx_data() returned -EAGAIN\n");
@@ -1352,11 +1353,12 @@  static int iscsit_do_rx_data(
 
 static int iscsit_do_tx_data(
 	struct iscsi_conn *conn,
-	struct iscsi_data_count *count)
+	struct iscsi_data_count *count,
+	int flags)
 {
 	int data = count->data_length, total_tx = 0, tx_loop = 0, iov_len;
 	struct kvec *iov_p;
-	struct msghdr msg;
+	struct msghdr msg = { .msg_flags = flags };
 
 	if (!conn || !conn->sock || !conn->conn_ops)
 		return -1;
@@ -1366,8 +1369,6 @@  static int iscsit_do_tx_data(
 		return -1;
 	}
 
-	memset(&msg, 0, sizeof(struct msghdr));
-
 	iov_p = count->iov;
 	iov_len = count->iov_count;
 
@@ -1411,7 +1412,8 @@  int tx_data(
 	struct iscsi_conn *conn,
 	struct kvec *iov,
 	int iov_count,
-	int data)
+	int data,
+	int flags)
 {
 	struct iscsi_data_count c;
 
@@ -1424,7 +1426,7 @@  int tx_data(
 	c.data_length = data;
 	c.type = ISCSI_TX_DATA;
 
-	return iscsit_do_tx_data(conn, &c);
+	return iscsit_do_tx_data(conn, &c, flags);
 }
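
The flag passed at the header, padding, and data-digest tx_data() call sites in the patch above follows one rule: set MSG_MORE whenever another piece of the same PDU is still to be sent. The sketch below restates that rule as a standalone userspace function. It is illustrative only: `struct pdu_layout`, `enum pdu_part`, and `pdu_send_flags()` are hypothetical names that do not exist in the kernel, and the sketch covers only the three call sites that the patch touches in the send_hdr/send_padding/send_datacrc path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>	/* MSG_MORE (Linux) */

/* Hypothetical description of one outgoing PDU; not a kernel structure. */
struct pdu_layout {
	size_t data_len;	/* payload bytes that follow the header */
	size_t padding;		/* pad bytes that follow the payload */
	bool data_digest;	/* true if a data digest trails the PDU */
};

/* The pieces sent through tx_data() in the patched sendpage path. */
enum pdu_part { PART_HEADER, PART_PADDING, PART_DIGEST };

/*
 * Return the send flags for one piece of the PDU, mirroring the patch:
 *   header  -> MSG_MORE if payload follows (data_len != 0)
 *   padding -> MSG_MORE if a data digest follows
 *   digest  -> 0, it is always the last piece
 */
static int pdu_send_flags(const struct pdu_layout *p, enum pdu_part part)
{
	switch (part) {
	case PART_HEADER:
		return p->data_len ? MSG_MORE : 0;
	case PART_PADDING:
		return p->data_digest ? MSG_MORE : 0;
	case PART_DIGEST:
	default:
		return 0;
	}
}
```

For example, a PDU with 512 bytes of data and no digest yields MSG_MORE for the header and 0 for the padding, so the final piece is never held back waiting for more data.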