Message ID | 1357498815.6919.570.camel@edumazet-glaptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Sun, Jan 06, 2013 at 11:00:15AM -0800, Eric Dumazet wrote: > On Sun, 2013-01-06 at 10:51 -0800, Eric Dumazet wrote: > > > > > > (sd->len is usually 4096, which is expected, but sd->total_len value is > > > huge in your case, so we always set the flag in fs/splice.c) > > > > I am testing : > > > > if (sd->len < sd->total_len && pipe->nrbufs > 1) > > more |= MSG_SENDPAGE_NOTLAST; > > > > Yes, this should fix the problem : > > If there is no following buffer in the pipe, we should not set NOTLAST. > > diff --git a/fs/splice.c b/fs/splice.c > index 8890604..6909d89 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -696,8 +696,10 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe, > return -EINVAL; > > more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; > - if (sd->len < sd->total_len) > + > + if (sd->len < sd->total_len && pipe->nrbufs > 1) > more |= MSG_SENDPAGE_NOTLAST; > + > return file->f_op->sendpage(file, buf->page, buf->offset, > sd->len, &pos, more); > } OK it works like a charm here now ! I can't break it anymore, so it looks like you finally got it ! I noticed that the data rate was higher when the loopback's MTU is exactly a multiple of 4096 (making the 64k choice optimal) while I would have assumed that in order to efficiently splice TCP segments, we'd need to have some space for IP/TCP headers and n*4k for the payload. I also got the transfer freezes again a few times when starting tcpdump on the server, but this is not 100% reproducible I'm afraid. So I'll bring this back when I manage to get some analysable pattern. The spliced transfer through all the chain haproxy works fine again at 10gig with your fix. The issue is closed for me. Feel free to add my Tested-By if you want. Thank you Eric :-) Willy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2013-01-06 at 20:34 +0100, Willy Tarreau wrote: > OK it works like a charm here now ! I can't break it anymore, so it > looks like you finally got it ! > > I noticed that the data rate was higher when the loopback's MTU > is exactly a multiple of 4096 (making the 64k choice optimal) > while I would have assumed that in order to efficiently splice > TCP segments, we'd need to have some space for IP/TCP headers > and n*4k for the payload. > > I also got the transfer freezes again a few times when starting > tcpdump on the server, but this is not 100% reproducible I'm afraid. > So I'll bring this back when I manage to get some analysable pattern. > > The spliced transfer through all the chain haproxy works fine again > at 10gig with your fix. The issue is closed for me. Feel free to add > my Tested-By if you want. > Good to know ! What is the max speed you get now ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jan 06, 2013 at 11:39:31AM -0800, Eric Dumazet wrote: > On Sun, 2013-01-06 at 20:34 +0100, Willy Tarreau wrote: > > > OK it works like a charm here now ! I can't break it anymore, so it > > looks like you finally got it ! > > > > I noticed that the data rate was higher when the loopback's MTU > > is exactly a multiple of 4096 (making the 64k choice optimal) > > while I would have assumed that in order to efficiently splice > > TCP segments, we'd need to have some space for IP/TCP headers > > and n*4k for the payload. > > > > I also got the transfer freezes again a few times when starting > > tcpdump on the server, but this is not 100% reproducible I'm afraid. > > So I'll bring this back when I manage to get some analysable pattern. > > > > The spliced transfer through all the chain haproxy works fine again > > at 10gig with your fix. The issue is closed for me. Feel free to add > > my Tested-By if you want. > > > > Good to know ! > > What is the max speed you get now ? Line rate with 1500 MTU and LRO enabled : # time eth1(ikb ipk okb opk) eth2(ikb ipk okb opk) 1357060023 19933.3 41527.7 9355538.2 62167.7 9757888.1 808701.1 19400.3 40417.7 1357060024 26124.1 54425.5 9290064.9 48804.4 9778294.0 810210.0 18068.8 37643.3 1357060025 27015.2 56281.1 9296115.3 46868.8 9797125.9 811271.1 8790.1 18312.2 1357060026 27556.0 57408.8 9291701.4 46805.5 9805371.6 811410.0 3494.8 7280.0 1357060027 27577.0 57452.2 9293606.8 46804.4 9806122.3 811314.4 2558.7 5330.0 1357060028 27476.1 57242.2 9296885.4 46830.0 9794537.3 810527.7 2516.1 5242.2 ^^^^^^^ ^^^^^^^ kbps out kbps in eth1=facing the client eth2=facing the server Top reports the following usage : Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 31.7%id, 0.0%wa, 0.0%hi, 68.3%si, 0.0%st Cpu1 : 1.0%us, 37.3%sy, 0.0%ni, 61.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st (IRQ bound to cpu 0, haproxy to cpu 1) This is a core2duo 2.66 GHz and the myris are 1st generation. BTW I was very happy to see that the LRO->GRO conversion patches in 3.8-rc2 don't affect byte rate anymore (just a minor CPU usage increase but this is not critical here), now I won't complain about it being slower anymore, you won :-) With the GRO patches backported, still at 1500 MTU but with GRO now : Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 28.7%id, 0.0%wa, 0.0%hi, 71.3%si, 0.0%st Cpu1 : 0.0%us, 37.6%sy, 0.0%ni, 62.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st # time eth1(ikb ipk okb opk) eth2(ikb ipk okb opk) 1357058637 18319.3 38165.5 9401736.3 65159.9 9761613.4 808963.3 19403.6 40424.4 1357058638 20009.8 41687.7 9400903.7 62706.6 9770555.8 809522.2 18696.5 38951.1 1357058639 25439.5 52999.9 9301635.3 50267.7 9773666.7 809721.1 19174.1 39946.6 1357058640 26808.2 55850.0 9298301.4 46876.6 9790470.1 810843.3 12408.7 25851.1 1357058641 27110.9 56481.1 9297009.2 46832.2 9803308.4 811339.9 5692.5 11859.9 1357058642 27411.1 57106.6 9291419.2 46796.6 9806846.5 811378.8 2804.4 5842.2 This kernel is getting really good :-) Cheers, Willy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>> "Willy" == Willy Tarreau <w@1wt.eu> writes: Willy> On Sun, Jan 06, 2013 at 11:00:15AM -0800, Eric Dumazet wrote: >> On Sun, 2013-01-06 at 10:51 -0800, Eric Dumazet wrote: >> > > >> > > (sd->len is usually 4096, which is expected, but sd->total_len value is >> > > huge in your case, so we always set the flag in fs/splice.c) >> > >> > I am testing : >> > >> > if (sd->len < sd->total_len && pipe->nrbufs > 1) >> > more |= MSG_SENDPAGE_NOTLAST; >> > >> >> Yes, this should fix the problem : >> >> If there is no following buffer in the pipe, we should not set NOTLAST. >> >> diff --git a/fs/splice.c b/fs/splice.c >> index 8890604..6909d89 100644 >> --- a/fs/splice.c >> +++ b/fs/splice.c >> @@ -696,8 +696,10 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe, >> return -EINVAL; >> >> more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; >> - if (sd->len < sd->total_len) >> + >> + if (sd->len < sd->total_len && pipe->nrbufs > 1) >> more |= MSG_SENDPAGE_NOTLAST; >> + >> return file->f_op->sendpage(file, buf->page, buf->offset, sd-> len, &pos, more); >> } Willy> OK it works like a charm here now ! I can't break it anymore, so it Willy> looks like you finally got it ! It's still broken, there's no comments in the code to explain all this magic to mere mortals! *grin* John -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jan 06, 2013 at 04:49:35PM -0500, John Stoffel wrote: > >>>>> "Willy" == Willy Tarreau <w@1wt.eu> writes: > > Willy> On Sun, Jan 06, 2013 at 11:00:15AM -0800, Eric Dumazet wrote: > >> On Sun, 2013-01-06 at 10:51 -0800, Eric Dumazet wrote: > >> > > > >> > > (sd->len is usually 4096, which is expected, but sd->total_len value is > >> > > huge in your case, so we always set the flag in fs/splice.c) > >> > > >> > I am testing : > >> > > >> > if (sd->len < sd->total_len && pipe->nrbufs > 1) > >> > more |= MSG_SENDPAGE_NOTLAST; > >> > > >> > >> Yes, this should fix the problem : > >> > >> If there is no following buffer in the pipe, we should not set NOTLAST. > >> > >> diff --git a/fs/splice.c b/fs/splice.c > >> index 8890604..6909d89 100644 > >> --- a/fs/splice.c > >> +++ b/fs/splice.c > >> @@ -696,8 +696,10 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe, > >> return -EINVAL; > >> > >> more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; > >> - if (sd->len < sd->total_len) > >> + > >> + if (sd->len < sd->total_len && pipe->nrbufs > 1) > >> more |= MSG_SENDPAGE_NOTLAST; > >> + > >> return file->f_op->sendpage(file, buf->page, buf->offset, > sd-> len, &pos, more); > >> } > > Willy> OK it works like a charm here now ! I can't break it anymore, so it > Willy> looks like you finally got it ! > > It's still broken, there's no comments in the code to explain all this > magic to mere mortals! *grin* I would generally agree, but when Eric fixes such a thing, he generally goes with lengthy details in the commit message. Willy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>> "Willy" == Willy Tarreau <w@1wt.eu> writes: Willy> On Sun, Jan 06, 2013 at 04:49:35PM -0500, John Stoffel wrote: >> >>>>> "Willy" == Willy Tarreau <w@1wt.eu> writes: >> Willy> On Sun, Jan 06, 2013 at 11:00:15AM -0800, Eric Dumazet wrote: >> >> On Sun, 2013-01-06 at 10:51 -0800, Eric Dumazet wrote: >> >> > > >> >> > > (sd->len is usually 4096, which is expected, but sd->total_len value is >> >> > > huge in your case, so we always set the flag in fs/splice.c) >> >> > >> >> > I am testing : >> >> > >> >> > if (sd->len < sd->total_len && pipe->nrbufs > 1) >> >> > more |= MSG_SENDPAGE_NOTLAST; >> >> > >> >> >> >> Yes, this should fix the problem : >> >> >> >> If there is no following buffer in the pipe, we should not set NOTLAST. >> >> >> >> diff --git a/fs/splice.c b/fs/splice.c >> >> index 8890604..6909d89 100644 >> >> --- a/fs/splice.c >> >> +++ b/fs/splice.c >> >> @@ -696,8 +696,10 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe, >> >> return -EINVAL; >> >> >> >> more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; >> >> - if (sd->len < sd->total_len) >> >> + >> >> + if (sd->len < sd->total_len && pipe->nrbufs > 1) >> >> more |= MSG_SENDPAGE_NOTLAST; >> >> + >> >> return file->f_op->sendpage(file, buf->page, buf->offset, sd-> len, &pos, more); >> >> } >> Willy> OK it works like a charm here now ! I can't break it anymore, so it Willy> looks like you finally got it ! >> >> It's still broken, there's no comments in the code to explain all this >> magic to mere mortals! *grin* Willy> I would generally agree, but when Eric fixes such a thing, he Willy> generally goes with lengthy details in the commit message. I'm sure he will too, I just wanted to nudge him because while I sorta followed this discussion, I see lots of pain down the road if the code wasn't updated with some nice big fat comments. Great job finding this code and testing, testing, testing. John -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/splice.c b/fs/splice.c index 8890604..6909d89 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -696,8 +696,10 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe, return -EINVAL; more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; - if (sd->len < sd->total_len) + + if (sd->len < sd->total_len && pipe->nrbufs > 1) more |= MSG_SENDPAGE_NOTLAST; + return file->f_op->sendpage(file, buf->page, buf->offset, sd->len, &pos, more); }