| Message ID | 20090416213122.GB5894@dhcp-1-124.tlv.redhat.com |
| --- | --- |
| State | Not Applicable, archived |
| Delegated to: | David Miller |
On Fri, Apr 17, 2009 at 12:31:22AM +0300, Michael S. Tsirkin wrote:
> Hi,
> I have a simple test that sends 10K packets out of a tap device. Average time
> needed to send a packet has gone up from 2.6.29 to 2.6.30-rc1.
>
> 2.6.30-rc1:
>
> # sh runsend
> time per packet: 7570 ns
>
> 2.6.29:
>
> # git checkout v2.6.29 -- drivers/net/tun.c
> # make modules modules_install
> # rmmod tun
> # sh runsend
> time per packet: 6337 ns
>
> I note that before 2.6.29, all tun skbs would typically be linear,
> while in 2.6.30-rc1, skbs for packet size > 1 page would be paged.
> And I found this comment by Rusty (it appears in the comment for
> commit f42157cb568c1eb02eca7df4da67553a9edae24a):
>
>     My original version of this patch always allocate paged skbs for big
>     packets.  But that made performance drop from 8.4 seconds to 8.8
>     seconds on 1G lguest->Host TCP xmit.  So now we only do that as a
>     fallback.
>
> So just for fun, I did this:
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 37a5a04..1234d6b 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -520,7 +518,6 @@ static inline struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
>  	int err;
>
>  	/* Under a page? Don't bother with paged skb. */
> -	if (prepad + len < PAGE_SIZE)
>  		linear = len;
>
>  	skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
>
> This makes all skbs linear in tun. And now:
>
> 2.6.30-rc1 made linear:
> # sh runsend
> time per packet: 6611 ns
>
> Two points of interest here:
> - It seems that linear skbs are generally faster.
>   Would it make sense to make tun try to use linear skbs again,
>   as it did before 2.6.29?
>
> - The new code seems to introduce some measurable overhead.
>   My understanding is that its main motivation is memory
>   accounting - would it make sense to create a faster code path
>   for the default case where accounting is disabled?
Continuing with the investigation, commenting out atomic_inc_not_zero and
atomic_dec_and_test in tun_get/tun_put gets us back most of the rest of the
performance:

# sh runsend
time per packet: 6461 ns

I was wondering whether the socket reference counting, which is done anyway,
can be reused in some way. Ideas?
On Fri, Apr 17, 2009 at 12:31:22AM +0300, Michael S. Tsirkin wrote:
> Hi,
> I have a simple test that sends 10K packets out of a tap device. Average time
> needed to send a packet has gone up from 2.6.29 to 2.6.30-rc1.
>
> 2.6.30-rc1:
>
> # sh runsend
> time per packet: 7570 ns
>
> 2.6.29:
>
> # git checkout v2.6.29 -- drivers/net/tun.c
> # make modules modules_install
> # rmmod tun
> # sh runsend
> time per packet: 6337 ns
>
> I note that before 2.6.29, all tun skbs would typically be linear,
> while in 2.6.30-rc1, skbs for packet size > 1 page would be paged.
> And I found this comment by Rusty (it appears in the comment for
> commit f42157cb568c1eb02eca7df4da67553a9edae24a):

Again this should already be fixed in the latest net-2.6.

Thanks,
On Fri, Apr 17, 2009 at 01:15:05AM +0300, Michael S. Tsirkin wrote:
>
> Continuing with the investigation, commenting out
> atomic_inc_not_zero and atomic_dec_and_test in tun_get/tun_put
> gets us back most of the rest of the performance:

I'll try to think of a way to kill these ref counts, once I get my
other patch fixed :)

Cheers,
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 37a5a04..1234d6b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -520,7 +518,6 @@ static inline struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
 	int err;
 
 	/* Under a page? Don't bother with paged skb. */
-	if (prepad + len < PAGE_SIZE)
 		linear = len;
 
 	skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,