Patchwork 2.6.30-rc deadline scheduler performance regression for iozone over NFS

Submitter Trond Myklebust
Date May 14, 2009, 2:33 p.m.
Message ID <1242311620.6560.14.camel@heimdal.trondhjem.org>
Permalink /patch/27214/
State Not Applicable
Delegated to: David Miller

Comments

Trond Myklebust - May 14, 2009, 2:33 p.m.
On Thu, 2009-05-14 at 09:34 -0400, Jeff Moyer wrote:
> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> 
> > On Wed, 2009-05-13 at 15:29 -0400, Jeff Moyer wrote:
> >> Hi, netdev folks.  The summary here is:
> >> 
> >> A patch added in the 2.6.30 development cycle caused a performance
> >> regression in my NFS iozone testing.  The patch in question is the
> >> following:
> >> 
> >> commit 47a14ef1af48c696b214ac168f056ddc79793d0e
> >> Author: Olga Kornievskaia <aglo@citi.umich.edu>
> >> Date:   Tue Oct 21 14:13:47 2008 -0400
> >> 
> >>     svcrpc: take advantage of tcp autotuning
> >>  
> >> which is also quoted below.  Using 8 nfsd threads, a single client doing
> >> 2GB of streaming read I/O goes from 107590 KB/s under 2.6.29 to 65558
> >> KB/s under 2.6.30-rc4.  I also see more run to run variation under
> >> 2.6.30-rc4 using the deadline I/O scheduler on the server.  That
> >> variation disappears (as does the performance regression) when reverting
> >> the above commit.
> >
> > It looks to me as if we've got a bug in the svc_tcp_has_wspace() helper
> > function. I can see no reason why we should stop processing new incoming
> > RPC requests just because the send buffer happens to be 2/3 full. If we
> > see that we have space for another reply, then we should just go for it.
> > OTOH, we do want to ensure that the SOCK_NOSPACE flag remains set, so
> > that the TCP layer knows that we're congested, and that we'd like it to
> > increase the send window size, please.
> >
> > Could you therefore please see if the following (untested) patch helps?
> 
> I'm seeing slightly better results with the patch:
> 
> 71548
> 75987
> 71557
> 87432
> 83538
> 
> But that's still not up to the speeds we saw under 2.6.29.  The packet
> capture for one run can be found here:
>   http://people.redhat.com/jmoyer/trond.pcap.bz2
> 
> Cheers,
> Jeff

Yes. Something is very wrong there...

See for instance frame 1195, where the client finishes sending a whole
series of READ requests, and we go into a flurry of ACKs passing
backwards and forwards, but no data. It looks as if the NFS server isn't
processing anything, probably because the window size falls afoul of the
svc_tcp_has_wspace() check...

Does something like this help?

Cheers
  Trond
---------------------------------------------------------------------
From 85e3f5860a9063d193bdb45516b3d3d347b87301 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Thu, 14 May 2009 10:33:07 -0400
Subject: [PATCH] SUNRPC: Always allow the NFS server to process at least one request

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 net/sunrpc/svcsock.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)
Jeff Moyer - May 14, 2009, 2:38 p.m.
Trond Myklebust <trond.myklebust@fys.uio.no> writes:

> Does something like this help?

Is this in addition to the previous patch or instead of it?

Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Moyer - May 14, 2009, 3 p.m.
Trond Myklebust <trond.myklebust@fys.uio.no> writes:

> Does something like this help?

Sorry for the previous, stupid question.  I applied the patch in
addition to the last one and here are the results:

70327
71561
68760
69199
65324

A packet capture for this run is available here:
  http://people.redhat.com/jmoyer/trond2.pcap.bz2

Any more ideas?  ;)

-Jeff
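
To put the figures reported in this thread side by side, the arithmetic can be sketched in a few lines of C. This is only an illustration: the thread gives the per-run results as bare numbers, so KB/s is assumed by analogy with the 2.6.29 figure, and the names `baseline_2_6_29`, `runs_first_patch`, `runs_both_patches`, `mean` and `percent_of` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the thread. The unit of the per-run lists is
 * assumed to be KB/s, matching the 2.6.29 baseline figure. */
const double baseline_2_6_29 = 107590.0;
const double runs_first_patch[] = { 71548, 75987, 71557, 87432, 83538 };
const double runs_both_patches[] = { 70327, 71561, 68760, 69199, 65324 };

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
	assert(n > 0);
	double sum = 0.0;
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Express value as a percentage of baseline. */
double percent_of(double value, double baseline)
{
	return 100.0 * value / baseline;
}
```

Under these assumptions, the five runs with the first patch average about 78012, roughly 73% of the 2.6.29 figure, and the five runs with both patches average about 69034, roughly 64%. The unpatched 2.6.30-rc4 result of 65558 KB/s is about 61% of it.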

Patch

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 8962355..4837442 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -972,9 +972,16 @@  static int svc_tcp_has_wspace(struct svc_xprt *xprt)
 {
 	struct svc_sock *svsk =	container_of(xprt, struct svc_sock, sk_xprt);
 	struct svc_serv	*serv = svsk->sk_xprt.xpt_server;
+	int reserved;
 	int required;
 
-	required = (atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg) * 2;
+	reserved = atomic_read(&xprt->xpt_reserved);
+	/* Always allow the server to process at least one request, whether
+	 * or not the TCP window is large enough
+	 */
+	if (reserved == 0)
+		return 1;
+	required = (reserved + serv->sv_max_mesg) << 1;
 	if (sk_stream_wspace(svsk->sk_sk) < required)
 		goto out_nospace;
 	return 1;
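
The behavioural difference introduced by the hunk above can be sketched outside the kernel. What follows is a minimal userspace model, not the kernel code: `struct model_xprt` and its fields are hypothetical stand-ins for `xpt_reserved`, `sv_max_mesg` and the value returned by `sk_stream_wspace()`, and the `out_nospace` path is reduced to returning false.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state svc_tcp_has_wspace() consults. */
struct model_xprt {
	int reserved;  /* models atomic_read(&xprt->xpt_reserved) */
	int max_mesg;  /* models serv->sv_max_mesg */
	int wspace;    /* models sk_stream_wspace(svsk->sk_sk) */
};

/* The check as it stood before the patch: always demand room for
 * twice (reserved + one maximum-sized message). */
bool has_wspace_before(const struct model_xprt *x)
{
	int required = (x->reserved + x->max_mesg) * 2;

	return x->wspace >= required;
}

/* The check after the patch: if nothing is reserved yet, let one
 * request through regardless of the available send space. */
bool has_wspace_after(const struct model_xprt *x)
{
	int required;

	if (x->reserved == 0)
		return true;
	required = (x->reserved + x->max_mesg) << 1;
	return x->wspace >= required;
}
```

In this model, a transport with no reserved space and less free send-buffer space than two maximum-sized messages is refused by `has_wspace_before()` but accepted by `has_wspace_after()`. Once any reply space is reserved, the two checks agree.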