[iproute2] Re: HTB accuracy for high speed

Message ID 20090528211258.GA3658@ami.dom.local
State RFC, archived
Delegated to: David Miller

Commit Message

Jarek Poplawski May 28, 2009, 9:12 p.m. UTC
On Thu, May 28, 2009 at 07:13:40PM +0100, Antonio Almeida wrote:
> On Sat, May 23, 2009 at 8:32 AM, Jarek Poplawski wrote:
> > Actually, of these two I was more interested in an iproute2 that
> > better fits the kernel version. :-( (It should be enough to have at
> > least tc compiled properly, I guess.)
> I installed iproute2-ss090115 with the new patch but the results are
> the same for my test scenario. HTB keeps sending 620Mbit/s when I
> configure its ceil to 555Mbit/s, with 800-byte packets.
> 
> > Btw.: if at any point you think this testing is too disturbing to you
> > etc., feel free to stop this or delay in time as you like.
> I'm working on this, don't worry. Since I have a traffic
> generator/analyser, I can test any modification you make.
> Feel free to ask.
> 
> I've been looking inside the htb source code. The granularity problem
> could be in the use of qdisc_rate_table or near it.

Yes, but according to my assessment there should be "only" 50Mbit
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Antonio Almeida May 29, 2009, 5:02 p.m. UTC | #1
On Thu, May 28, 2009 at 10:12 PM, Jarek Poplawski wrote:
> Yes, but according to my assessment there should be "only" 50Mbit
> difference for this rate/packet size. Anyway, could you try a testing
> patch below, which should add some granularity to this rate table?
>
> Thanks,
> Jarek P.
> ---
>
>  include/net/pkt_sched.h |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index e37fe31..f0faf03 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -42,8 +42,8 @@ typedef u64   psched_time_t;
>  typedef long   psched_tdiff_t;
>
>  /* Avoid doing 64 bit divide by 1000 */
> -#define PSCHED_US2NS(x)                        ((s64)(x) << 10)
> -#define PSCHED_NS2US(x)                        ((x) >> 10)
> +#define PSCHED_US2NS(x)                        ((s64)(x) << 6)
> +#define PSCHED_NS2US(x)                        ((x) >> 6)
>
>  #define PSCHED_TICKS_PER_SEC           PSCHED_NS2US(NSEC_PER_SEC)
>  #define PSCHED_PASTPERFECT             0

It's better! This patch makes HTB more accurate. Here are some values.
Note that these are boundary values, so, e.g., any HTB configuration
between 377000Kbit and 400000Kbit would fall in the same step, close
to 397977Kbit.
This test was run under the same conditions: generating 950Mbit/s of
unidirectional TCP traffic with 800-byte packets.

leaf class ceil    leaf class sent rate (tc -s values)
376000Kbit         375379Kbit
--
377000Kbit         397977Kbit
400000Kbit         397973Kbit
--
401000Kbit         425199Kbit
426000Kbit         425199Kbit
--
427000Kbit         456389Kbit
457000Kbit         456409Kbit
--
458000Kbit         490111Kbit
492000Kbit         490138Kbit
--
493000Kbit         531957Kbit
533000Kbit         532078Kbit
--
534000Kbit         581835Kbit
581000Kbit         581820Kbit
--
582000Kbit         637809Kbit
640000Kbit         637709Kbit
--
641000Kbit         710526Kbit
711000Kbit         710553Kbit
--
712000Kbit         795921Kbit
800000Kbit         795901Kbit
--
801000Kbit         912706Kbit
914000Kbit         912782Kbit
--
915000Kbit         --
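
The step boundaries in this table are consistent with a per-packet transmit time that gets truncated to whole microseconds somewhere in the rate table calculation. The sketch below is a simplified model under that assumption, not code taken from tc or the kernel; the function name effective_rate_bps is made up for illustration, and 1 Kbit is taken as 1000 bit.

```c
#include <stdint.h>

/*
 * Hypothetical model (an assumption, not code from tc or the kernel):
 * the time needed to send one packet at the configured ceil is
 * truncated to whole microseconds, and packets are then paced at that
 * truncated interval.
 *
 * Returns the resulting rate in bit/s, or 0 when the ceil is 0 or the
 * truncated time is 0 (the model then imposes no limit at all).
 */
static uint64_t effective_rate_bps(uint64_t ceil_bps, uint32_t pkt_bytes)
{
	uint64_t bits_x_usec = (uint64_t)pkt_bytes * 8 * 1000000ULL;
	uint64_t xmit_usec;

	if (ceil_bps == 0)
		return 0;

	xmit_usec = bits_x_usec / ceil_bps;	/* truncating divide */
	if (xmit_usec == 0)
		return 0;

	return bits_x_usec / xmit_usec;
}
```

For 800-byte packets this model returns 400000000 bit/s for both a 377000Kbit and a 400000Kbit ceil (16 us per packet) and 426666666 bit/s for a 401000Kbit ceil (15 us per packet), which is close to the measured 397977Kbit and 425199Kbit.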


Here are more values for an HTB ceil configuration of 555Mbit/s, varying the packet size:

800 bytes:
class htb 1:108 parent 1:10 leaf 108: prio 7 quantum 1514 rate
555000Kbit ceil 555000Kbit burst 70901b/8 mpu 0b overhead 0b cburst
70901b/8 mpu 0b overhead 0b level 0
 Sent 18731000768 bytes 23531408 pkt (dropped 15715520, overlimits 0 requeues 0)
 rate 581832Kbit 91368pps backlog 0b 110p requeues 0
 lended: 23531298 borrowed: 0 giants: 0
 tokens: -16091 ctokens: -16091


850 bytes:
class htb 1:108 parent 1:10 leaf 108: prio 7 quantum 1514 rate
555000Kbit ceil 555000Kbit burst 70901b/8 mpu 0b overhead 0b cburst
70901b/8 mpu 0b overhead 0b level 0
 Sent 30556163150 bytes 37645600 pkt (dropped 25746491, overlimits 0 requeues 0)
 rate 565509Kbit 83556pps backlog 0b 15p requeues 0
 lended: 37645585 borrowed: 0 giants: 0
 tokens: -16010 ctokens: -16010


950 bytes:
class htb 1:108 parent 1:10 leaf 108: prio 7 quantum 1514 rate
555000Kbit ceil 555000Kbit burst 70901b/8 mpu 0b overhead 0b cburst
70901b/8 mpu 0b overhead 0b level 0
 Sent 51363059854 bytes 60954074 pkt (dropped 40474346, overlimits 0 requeues 0)
 rate 598925Kbit 83555pps backlog 0b 112p requeues 0
 lended: 60953962 borrowed: 0 giants: 0
 tokens: 12446 ctokens: 12446
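
As a quick sanity check on the three outputs above, the rate and pps counters can be divided to get the average packet size the class actually saw. This is only arithmetic on the printed numbers; the helper name implied_pkt_bytes is hypothetical, and it assumes that Kbit in this output means 1000 bit.

```c
#include <stdint.h>

/*
 * Average bytes per packet implied by the "rate" and "pps" counters
 * printed by tc -s. Assumes 1 Kbit = 1000 bit, which is an assumption
 * about the units used in this output. Returns 0 if pps is 0.
 */
static uint32_t implied_pkt_bytes(uint64_t rate_kbit, uint64_t pps)
{
	if (pps == 0)
		return 0;

	/* Kbit/s -> bit/s -> byte/s -> bytes per packet */
	return (uint32_t)(rate_kbit * 1000 / 8 / pps);
}
```

With the values above this gives 796, 846 and 896 bytes. The first two sit 4 bytes under the stated 800 and 850; the third is nearer 900 than 950, so that label may be worth double-checking.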

I'm using
# tc -V
tc utility, iproute2-ss090115

and keeping tso and gso off:
# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off

# ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
stephen hemminger May 29, 2009, 5:28 p.m. UTC | #2
On Fri, 29 May 2009 18:02:39 +0100
Antonio Almeida <vexwek@gmail.com> wrote:

> On Thu, May 28, 2009 at 10:12 PM, Jarek Poplawski wrote:
> > Yes, but according to my assessment there should be "only" 50Mbit
> > difference for this rate/packet size. Anyway, could you try a testing
> > patch below, which should add some granularity to this rate table?
> >
> > Thanks,
> > Jarek P.
> > ---
> >
> >  include/net/pkt_sched.h |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> > index e37fe31..f0faf03 100644
> > --- a/include/net/pkt_sched.h
> > +++ b/include/net/pkt_sched.h
> > @@ -42,8 +42,8 @@ typedef u64   psched_time_t;
> >  typedef long   psched_tdiff_t;
> >
> >  /* Avoid doing 64 bit divide by 1000 */
> > -#define PSCHED_US2NS(x)                        ((s64)(x) << 10)
> > -#define PSCHED_NS2US(x)                        ((x) >> 10)
> > +#define PSCHED_US2NS(x)                        ((s64)(x) << 6)
> > +#define PSCHED_NS2US(x)                        ((x) >> 6)
> >
> >  #define PSCHED_TICKS_PER_SEC           PSCHED_NS2US(NSEC_PER_SEC)
> >  #define PSCHED_PASTPERFECT             0
> 
> It's better! This patch gives more accuracy to HTB. Here some values:
> Note that these are boundary values, so, e.g., any HTB configuration
> between 377000Kbit and 400000Kbit would fall in the same step - close
> to 397977Kbit.
> This test was made over the same conditions: generating 950Mbit/s of
> unidirectional tcp traffic of 800 bytes packets long.

You really need to get a better box than the dual core AMD.
There is only millisecond (or worse with HZ=100) resolution possible because
there is no working TSC on that hardware.
Jarek Poplawski May 29, 2009, 7:46 p.m. UTC | #3
On Fri, May 29, 2009 at 06:02:39PM +0100, Antonio Almeida wrote:
> On Thu, May 28, 2009 at 10:12 PM, Jarek Poplawski wrote:
> > Yes, but according to my assessment there should be "only" 50Mbit
> > difference for this rate/packet size. Anyway, could you try a testing
> > patch below, which should add some granularity to this rate table?
> >
> > Thanks,
> > Jarek P.
> > ---
> >
> >  include/net/pkt_sched.h |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> > index e37fe31..f0faf03 100644
> > --- a/include/net/pkt_sched.h
> > +++ b/include/net/pkt_sched.h
> > @@ -42,8 +42,8 @@ typedef u64   psched_time_t;
> >  typedef long   psched_tdiff_t;
> >
> >  /* Avoid doing 64 bit divide by 1000 */
> > -#define PSCHED_US2NS(x)                        ((s64)(x) << 10)
> > -#define PSCHED_NS2US(x)                        ((x) >> 10)
> > +#define PSCHED_US2NS(x)                        ((s64)(x) << 6)
> > +#define PSCHED_NS2US(x)                        ((x) >> 6)
> >
> >  #define PSCHED_TICKS_PER_SEC           PSCHED_NS2US(NSEC_PER_SEC)
> >  #define PSCHED_PASTPERFECT             0
> 
> It's better! This patch gives more accuracy to HTB. Here some values:
> Note that these are boundary values, so, e.g., any HTB configuration
> between 377000Kbit and 400000Kbit would fall in the same step - close
> to 397977Kbit.

Good news! So it seems the only cause of this inaccuracy is too
coarse a granularity, but I still have to check this. Alas, something
more than this patch is needed, because it probably breaks other
things like hfsc.

Thanks,
Jarek P.

Jarek Poplawski May 29, 2009, 7:58 p.m. UTC | #4
On Fri, May 29, 2009 at 10:28:45AM -0700, Stephen Hemminger wrote:
> On Fri, 29 May 2009 18:02:39 +0100
> Antonio Almeida <vexwek@gmail.com> wrote:
> 
> > On Thu, May 28, 2009 at 10:12 PM, Jarek Poplawski wrote:
> > > Yes, but according to my assessment there should be "only" 50Mbit
> > > difference for this rate/packet size. Anyway, could you try a testing
> > > patch below, which should add some granularity to this rate table?
> > >
> > > Thanks,
> > > Jarek P.
> > > ---
> > >
> > >  include/net/pkt_sched.h |    4 ++--
> > >  1 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> > > index e37fe31..f0faf03 100644
> > > --- a/include/net/pkt_sched.h
> > > +++ b/include/net/pkt_sched.h
> > > @@ -42,8 +42,8 @@ typedef u64   psched_time_t;
> > >  typedef long   psched_tdiff_t;
> > >
> > >  /* Avoid doing 64 bit divide by 1000 */
> > > -#define PSCHED_US2NS(x)                        ((s64)(x) << 10)
> > > -#define PSCHED_NS2US(x)                        ((x) >> 10)
> > > +#define PSCHED_US2NS(x)                        ((s64)(x) << 6)
> > > +#define PSCHED_NS2US(x)                        ((x) >> 6)
> > >
> > >  #define PSCHED_TICKS_PER_SEC           PSCHED_NS2US(NSEC_PER_SEC)
> > >  #define PSCHED_PASTPERFECT             0
> > 
> > It's better! This patch gives more accuracy to HTB. Here some values:
> > Note that these are boundary values, so, e.g., any HTB configuration
> > between 377000Kbit and 400000Kbit would fall in the same step - close
> > to 397977Kbit.
> > This test was made over the same conditions: generating 950Mbit/s of
> > unidirectional tcp traffic of 800 bytes packets long.
> 
> You really need to get a better box than the dual core AMD.
> There is only millisecond (or worse with HZ=100) resolution possible because
> there is no working TSC on that hardware.

I think this could cause problems with peak rates, but IMHO there is
no reason for htb to miss per-second (4s) estimates measured against
the same clock. Plus, it mostly confirms the theoretical limits of the
currently used rate tables vs. usecond time/ticket accounting.

Jarek P.
stephen hemminger May 29, 2009, 8:49 p.m. UTC | #5
On Fri, 29 May 2009 21:46:43 +0200
Jarek Poplawski <jarkao2@gmail.com> wrote:

> On Fri, May 29, 2009 at 06:02:39PM +0100, Antonio Almeida wrote:
> > On Thu, May 28, 2009 at 10:12 PM, Jarek Poplawski wrote:
> > > Yes, but according to my assessment there should be "only" 50Mbit
> > > difference for this rate/packet size. Anyway, could you try a testing
> > > patch below, which should add some granularity to this rate table?
> > >
> > > Thanks,
> > > Jarek P.
> > > ---
> > >
> > >  include/net/pkt_sched.h |    4 ++--
> > >  1 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> > > index e37fe31..f0faf03 100644
> > > --- a/include/net/pkt_sched.h
> > > +++ b/include/net/pkt_sched.h
> > > @@ -42,8 +42,8 @@ typedef u64   psched_time_t;
> > >  typedef long   psched_tdiff_t;
> > >
> > >  /* Avoid doing 64 bit divide by 1000 */
> > > -#define PSCHED_US2NS(x)                        ((s64)(x) << 10)
> > > -#define PSCHED_NS2US(x)                        ((x) >> 10)
> > > +#define PSCHED_US2NS(x)                        ((s64)(x) << 6)
> > > +#define PSCHED_NS2US(x)                        ((x) >> 6)
> > >
> > >  #define PSCHED_TICKS_PER_SEC           PSCHED_NS2US(NSEC_PER_SEC)
> > >  #define PSCHED_PASTPERFECT             0
> > 
> > It's better! This patch gives more accuracy to HTB. Here some values:
> > Note that these are boundary values, so, e.g., any HTB configuration
> > between 377000Kbit and 400000Kbit would fall in the same step - close
> > to 397977Kbit.
> 
> Good news! So it seems there are no other reasons of this inaccuracy
> than too coarse granularity, but I have to check this yet. Alas there
> is needed something more than this patch, because it probably breaks
> other things like hfsc.
> 
> Thanks,
> Jarek P.
> 

Why would it break hfsc, if it isn't already broken?
Jarek Poplawski May 29, 2009, 8:59 p.m. UTC | #6
On Fri, May 29, 2009 at 01:49:56PM -0700, Stephen Hemminger wrote:
> On Fri, 29 May 2009 21:46:43 +0200
> Jarek Poplawski <jarkao2@gmail.com> wrote:
...
> > > >  /* Avoid doing 64 bit divide by 1000 */
> > > > -#define PSCHED_US2NS(x)                        ((s64)(x) << 10)
> > > > -#define PSCHED_NS2US(x)                        ((x) >> 10)
> > > > +#define PSCHED_US2NS(x)                        ((s64)(x) << 6)
> > > > +#define PSCHED_NS2US(x)                        ((x) >> 6)
> > > >
> > > >  #define PSCHED_TICKS_PER_SEC           PSCHED_NS2US(NSEC_PER_SEC)
> > > >  #define PSCHED_PASTPERFECT             0
> > > 
> > > It's better! This patch gives more accuracy to HTB. Here some values:
> > > Note that these are boundary values, so, e.g., any HTB configuration
> > > between 377000Kbit and 400000Kbit would fall in the same step - close
> > > to 397977Kbit.
> > 
> > Good news! So it seems there are no other reasons of this inaccuracy
> > than too coarse granularity, but I have to check this yet. Alas there
> > is needed something more than this patch, because it probably breaks
> > other things like hfsc.
> > 
> > Thanks,
> > Jarek P.
> > 
> 
> Why would it break hfsc, if it isn't already broken.

I might be wrong but e.g. these usecs could be one reason:

/* convert d (us) into dx (psched us) */
static u64
d2dx(u32 d)
{
        u64 dx;

        dx = ((u64)d * PSCHED_TICKS_PER_SEC);
        dx += USEC_PER_SEC - 1;
        do_div(dx, USEC_PER_SEC);
        return dx;
}

And maybe these shifts need some adjustment:
m = (sm * PSCHED_TICKS_PER_SEC) >> SM_SHIFT;

Jarek P.
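
To see how the d2dx() helper quoted above responds to the changed shift, here is a user-space rendition of it. Passing the shift as a parameter is a hypothetical arrangement for comparison only (the kernel uses the PSCHED_TICKS_PER_SEC macro), and do_div() is replaced by a plain 64-bit division.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define USEC_PER_SEC 1000000ULL

/*
 * User-space sketch of d2dx(): convert d (microseconds) into psched
 * ticks, rounding up. The shift argument stands in for the shift used
 * by PSCHED_NS2US(); it is a parameter here only so that both variants
 * can be compared side by side.
 */
static uint64_t d2dx_with_shift(uint32_t d_usec, unsigned int shift)
{
	uint64_t ticks_per_sec = NSEC_PER_SEC >> shift;
	uint64_t dx = (uint64_t)d_usec * ticks_per_sec;

	dx += USEC_PER_SEC - 1;		/* round up, as the original does */
	return dx / USEC_PER_SEC;
}
```

A delay of 1000 us comes out as 977 ticks with a shift of 10 and 15625 ticks with a shift of 6. The helper itself scales with the tick rate; whether other hfsc code assumes that a tick is roughly one microsecond is the open question here, and this sketch does not settle it.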
Patch

difference for this rate/packet size. Anyway, could you try a testing
patch below, which should add some granularity to this rate table?

Thanks,
Jarek P.
---

 include/net/pkt_sched.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index e37fe31..f0faf03 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -42,8 +42,8 @@  typedef u64	psched_time_t;
 typedef long	psched_tdiff_t;
 
 /* Avoid doing 64 bit divide by 1000 */
-#define PSCHED_US2NS(x)			((s64)(x) << 10)
-#define PSCHED_NS2US(x)			((x) >> 10)
+#define PSCHED_US2NS(x)			((s64)(x) << 6)
+#define PSCHED_NS2US(x)			((x) >> 6)
 
 #define PSCHED_TICKS_PER_SEC		PSCHED_NS2US(NSEC_PER_SEC)
 #define PSCHED_PASTPERFECT		0
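
For reference, the effect of the shift on tick granularity can be worked out directly. The sketch below is a stand-alone calculation, not kernel code; the function names are invented for illustration and NSEC_PER_SEC is redefined locally.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Ticks per second for a given PSCHED_NS2US() shift. */
static uint64_t psched_ticks_per_sec(unsigned int shift)
{
	return NSEC_PER_SEC >> shift;
}

/* Length of one tick in nanoseconds for the same shift. */
static uint64_t psched_tick_ns(unsigned int shift)
{
	return 1ULL << shift;
}
```

A shift of 10 gives 976562 ticks per second (1024 ns per tick), while a shift of 6 gives 15625000 ticks per second (64 ns per tick), i.e. a 16 times finer tick.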