
[RFC] pkt_sched: gen_estimator: Dont report fake rate estimators

Message ID 4AC4FE07.5070204@gmail.com
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Oct. 1, 2009, 7:07 p.m. UTC
We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
is running.

# tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0

User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
one (because no estimator is active)

After this patch, tc command output is :
$ tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/gen_stats.h |   16 +++++++++++++---
 net/core/gen_estimator.c  |    9 +++++----
 net/core/gen_stats.c      |    9 ++++++---
 net/sched/act_police.c    |    2 +-
 4 files changed, 25 insertions(+), 11 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Oct. 1, 2009, 7:37 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 01 Oct 2009 21:07:51 +0200

> We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
> is running.
> 
> # tc -s -d qdisc
> qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>  Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
>  rate 0bit 0pps backlog 0b 0p requeues 0
> 
> User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
> one (because no estimator is active)
> 
> After this patch, tc command output is :
> $ tc -s -d qdisc
> qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>  Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

I'm generally fine with this idea.

The new behavior is certainly more intuitive even to me :-)

Unless there are other objections I'm ok with this and I'll apply
your final version when I start taking changes for net-next-2.6
(which is probably right after -rc3 is released).
Jarek Poplawski Oct. 1, 2009, 9:05 p.m. UTC | #2
David Miller wrote, On 10/01/2009 09:37 PM:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 01 Oct 2009 21:07:51 +0200
> 
>> We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
>> is running.
>>
>> # tc -s -d qdisc
>> qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>>  Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
>>  rate 0bit 0pps backlog 0b 0p requeues 0
>>
>> User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
>> one (because no estimator is active)
>>
>> After this patch, tc command output is :
>> $ tc -s -d qdisc
>> qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>>  Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> I'm generally fine with this idea.
> 
> The new behavior is certainly more intuitive even to me :-)
> 
> Unless there are other objections I'm ok with this and I'll apply

Since you ask... I wonder about this whole int plus quite a bit of
struct unreadability for one flag only. Maybe it could be queried
on qdisc level (with a flag if necessary), and additional parameter
of gnet_stats_copy_rate_est()? (Qdiscs should have no problem with
setting this param for their classes too.)


Jarek P.
David Miller Oct. 1, 2009, 9:14 p.m. UTC | #3
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 01 Oct 2009 23:05:53 +0200

> Since you ask... I wonder about this whole int plus quite a bit of
> struct unreadability for one flag only. Maybe it could be queried
> on qdisc level (with a flag if necessary), and additional parameter
> of gnet_stats_copy_rate_est()? (Qdiscs should have no problem with
> setting this param for their classes too.)

Certainly, that's another approach to this problem.

But logically, just like we wouldn't emit a block of RED scheduler
data to 'tc' unless RED is actually configured, it seems consistent to
not emit estimator data when no estimator is even there.
Jarek Poplawski Oct. 1, 2009, 9:21 p.m. UTC | #4
David Miller wrote, On 10/01/2009 11:14 PM:

> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 01 Oct 2009 23:05:53 +0200
> 
>> Since you ask... I wonder about this whole int plus quite a bit of
>> struct unreadability for one flag only. Maybe it could be queried
>> on qdisc level (with a flag if necessary), and additional parameter
>> of gnet_stats_copy_rate_est()? (Qdiscs should have no problem with
>> setting this param for their classes too.)
> 
> Certainly, that's another approach to this problem.
> 
> But logically, just like we wouldn't emit a block of RED scheduler
> data to 'tc' unless RED is actually configured, it seems consistent to
> not emit estimator data when no estimator is even there.

Sure! I've exaggerated with this additional parameter. ;-)

Jarek P.
Jarek Poplawski Oct. 2, 2009, 7:08 a.m. UTC | #5
On 01-10-2009 23:21, Jarek Poplawski wrote:
> David Miller wrote, On 10/01/2009 11:14 PM:
> 
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Thu, 01 Oct 2009 23:05:53 +0200
>>
>>> Since you ask... I wonder about this whole int plus quite a bit of
>>> struct unreadability for one flag only. Maybe it could be queried
>>> on qdisc level (with a flag if necessary), and additional parameter
>>> of gnet_stats_copy_rate_est()? (Qdiscs should have no problem with
>>> setting this param for their classes too.)
>> Certainly, that's another approach to this problem.
>>
>> But logically, just like we wouldn't emit a block of RED scheduler
>> data to 'tc' unless RED is actually configured, it seems consistent to
>> not emit estimator data when no estimator is even there.
> 
> Sure! I've exaggerated with this additional parameter. ;-)

To make my point clare: why not something like this?:

static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
                         u32 pid, u32 seq, u16 flags, int event)
{
	...
	if (gnet_stats_copy_basic(&d, &q->bstats) < 0 ||
	    (gen_estimator_active(&q->bstats, &q->rate_est) &&
	     gnet_stats_copy_rate_est(&d, &q->rate_est) < 0) ||
	    gnet_stats_copy_queue(&d, &q->qstats) < 0)
		goto nla_put_failure;

BTW, I'm not sure we need to change the user-visible API for this.
(Is it really expected to work after updating gen_stats.h only in
iproute?)

Jarek P.
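
The condition proposed above relies on C short-circuit evaluation: when no
estimator is active, the copy call is never reached, so nothing is emitted
and no error is raised. Below is a minimal user-space sketch of that control
flow. Every mock_* name is a hypothetical stand-in invented for illustration,
not a kernel function, and the bodies only count calls.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a qdisc; not the kernel's struct Qdisc. */
struct mock_qdisc {
	bool est_running;      /* what gen_estimator_active() would report */
	int  rate_est_emitted; /* number of rate-estimate elements written */
};

/* Stand-ins for the basic and queue copies; they always succeed here. */
static int mock_copy_basic(struct mock_qdisc *q) { (void)q; return 0; }
static int mock_copy_queue(struct mock_qdisc *q) { (void)q; return 0; }

/* Stand-in for the rate-estimate copy; records that it was called. */
static int mock_copy_rate_est(struct mock_qdisc *q)
{
	q->rate_est_emitted++;
	return 0;
}

static bool mock_estimator_active(const struct mock_qdisc *q)
{
	return q->est_running;
}

/* Same shape as the proposed condition: the rate estimate is copied
 * only when the estimator is active, and only a failed copy aborts. */
static int mock_fill_qdisc(struct mock_qdisc *q)
{
	if (mock_copy_basic(q) < 0 ||
	    (mock_estimator_active(q) && mock_copy_rate_est(q) < 0) ||
	    mock_copy_queue(q) < 0)
		return -1;
	return 0;
}
```

Calling mock_fill_qdisc() on a qdisc with est_running set to false leaves
rate_est_emitted at 0. With est_running set to true it becomes 1. Both calls
return 0.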
Eric Dumazet Oct. 2, 2009, 7:12 a.m. UTC | #6
Jarek Poplawski wrote:

> To make my point clare: why not something like this?:
> 
> static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
>                          u32 pid, u32 seq, u16 flags, int event)
> {
> 	...
> 	if (gnet_stats_copy_basic(&d, &q->bstats) < 0 ||
> 	    (gen_estimator_active(&q->bstats, &q->rate_est) &&
>              gnet_stats_copy_rate_est(&d, &q->rate_est) < 0) ||
>             gnet_stats_copy_queue(&d, &q->qstats) < 0)
>                 goto nla_put_failure;
> 
> BTW, I'm not sure we need to change the user-visible API for this.
> (Is it really expected to work after updating gen_stats.h only in
> iproute?)
> 

That would be better indeed. Do you want to work on it, or should I do it?

Thanks
Jarek Poplawski Oct. 2, 2009, 7:17 a.m. UTC | #7
On Fri, Oct 02, 2009 at 09:12:57AM +0200, Eric Dumazet wrote:
> Jarek Poplawski wrote:
> 
> > To make my point clare: why not something like this?:
> > 
> > static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
> >                          u32 pid, u32 seq, u16 flags, int event)
> > {
> > 	...
> > 	if (gnet_stats_copy_basic(&d, &q->bstats) < 0 ||
> > 	    (gen_estimator_active(&q->bstats, &q->rate_est) &&
> >              gnet_stats_copy_rate_est(&d, &q->rate_est) < 0) ||
> >             gnet_stats_copy_queue(&d, &q->qstats) < 0)
> >                 goto nla_put_failure;
> > 
> > BTW, I'm not sure we need to change the user-visible API for this.
> > (Is it really expected to work after updating gen_stats.h only in
> > iproute?)
> > 
> 
> That would be better indeed. Do you want to work on it, or should I do it?

I'd like you to work on it.

Thanks,
Jarek P.
Jarek Poplawski Oct. 2, 2009, 7:32 a.m. UTC | #8
On Fri, Oct 02, 2009 at 07:08:19AM +0000, Jarek Poplawski wrote:
> On 01-10-2009 23:21, Jarek Poplawski wrote:
...
> To make my point clare: [...]

Am I clair? ;-)

Jarek P.

Patch

diff --git a/include/linux/gen_stats.h b/include/linux/gen_stats.h
index 710e901..7678ded 100644
--- a/include/linux/gen_stats.h
+++ b/include/linux/gen_stats.h
@@ -30,17 +30,27 @@  struct gnet_stats_basic_packed
 } __attribute__ ((packed));
 
 /**
- * struct gnet_stats_rate_est - rate estimator
+ * struct gnet_stats_user_rate_est - rate estimator
  * @bps: current byte rate
  * @pps: current packet rate
  */
-struct gnet_stats_rate_est
-{
+struct gnet_stats_user_rate_est {
 	__u32	bps;
 	__u32	pps;
 };
 
 /**
+ * struct gnet_stats_rate_est - rate estimator with flags
+ * @est: current byte/packet rate
+ * @flags: set to one if estimation is valid
+ */
+struct gnet_stats_rate_est {
+	struct gnet_stats_user_rate_est	est;
+	int				flags;
+};
+#define RATE_EST_VALID 1
+
+/**
  * struct gnet_stats_queue - queuing statistics
  * @qlen: queue length
  * @backlog: backlog size of queue
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 493775f..5ba9d90 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -129,12 +129,13 @@  static void est_timer(unsigned long arg)
 		brate = (nbytes - e->last_bytes)<<(7 - idx);
 		e->last_bytes = nbytes;
 		e->avbps += (brate >> e->ewma_log) - (e->avbps >> e->ewma_log);
-		e->rate_est->bps = (e->avbps+0xF)>>5;
+		e->rate_est->est.bps = (e->avbps+0xF)>>5;
 
 		rate = (npackets - e->last_packets)<<(12 - idx);
 		e->last_packets = npackets;
 		e->avpps += (rate >> e->ewma_log) - (e->avpps >> e->ewma_log);
-		e->rate_est->pps = (e->avpps+0x1FF)>>10;
+		e->rate_est->est.pps = (e->avpps+0x1FF)>>10;
+		e->rate_est->flags |= RATE_EST_VALID;
 skip:
 		read_unlock(&est_lock);
 		spin_unlock(e->stats_lock);
@@ -227,9 +228,9 @@  int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 	est->stats_lock = stats_lock;
 	est->ewma_log = parm->ewma_log;
 	est->last_bytes = bstats->bytes;
-	est->avbps = rate_est->bps<<5;
+	est->avbps = rate_est->est.bps<<5;
 	est->last_packets = bstats->packets;
-	est->avpps = rate_est->pps<<10;
+	est->avpps = rate_est->est.pps<<10;
 
 	if (!elist[idx].timer.function) {
 		INIT_LIST_HEAD(&elist[idx].list);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 8569310..b6f723c 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -138,13 +138,16 @@  gnet_stats_copy_basic(struct gnet_dump *d, struct gnet_stats_basic_packed *b)
 int
 gnet_stats_copy_rate_est(struct gnet_dump *d, struct gnet_stats_rate_est *r)
 {
+	if (!(r->flags & RATE_EST_VALID))
+		return 0;
+
 	if (d->compat_tc_stats) {
-		d->tc_stats.bps = r->bps;
-		d->tc_stats.pps = r->pps;
+		d->tc_stats.bps = r->est.bps;
+		d->tc_stats.pps = r->est.pps;
 	}
 
 	if (d->tail)
-		return gnet_stats_copy(d, TCA_STATS_RATE_EST, r, sizeof(*r));
+		return gnet_stats_copy(d, TCA_STATS_RATE_EST, &r->est, sizeof(r->est));
 
 	return 0;
 }
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 723964c..ba01081 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -292,7 +292,7 @@  static int tcf_act_police(struct sk_buff *skb, struct tc_action *a,
 	police->tcf_bstats.packets++;
 
 	if (police->tcfp_ewma_rate &&
-	    police->tcf_rate_est.bps >= police->tcfp_ewma_rate) {
+	    police->tcf_rate_est.est.bps >= police->tcfp_ewma_rate) {
 		police->tcf_qstats.overlimits++;
 		if (police->tcf_action == TC_ACT_SHOT)
 			police->tcf_qstats.drops++;
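
The flag-based gating in the patch above can be tried outside the kernel.
Below is a rough user-space sketch. It copies the arithmetic from the
est_timer() hunk and the RATE_EST_VALID check from the
gnet_stats_copy_rate_est() hunk. The struct and function names
(user_rate_est, rate_est, mock_estimator, mock_est_step, mock_emit_rate_est)
are hypothetical, and locking and timers are left out. Unlike the kernel
function, mock_emit_rate_est() returns the number of elements emitted so the
effect is visible.

```c
#include <stdint.h>

#define RATE_EST_VALID 1

/* Layout sent to user space (mirrors gnet_stats_user_rate_est in the patch). */
struct user_rate_est {
	uint32_t bps;
	uint32_t pps;
};

/* Kernel-side wrapper with the validity flag (mirrors the patched struct). */
struct rate_est {
	struct user_rate_est est;
	int flags;
};

/* Hypothetical estimator state, reduced to the fields the hunk touches. */
struct mock_estimator {
	uint32_t last_bytes, avbps;
	uint32_t last_packets, avpps;
	unsigned int ewma_log;
	struct rate_est *rate_est;
};

/* One estimator tick, using the same shifts as the est_timer() hunk. */
static void mock_est_step(struct mock_estimator *e, int idx,
			  uint32_t nbytes, uint32_t npackets)
{
	uint32_t brate = (nbytes - e->last_bytes) << (7 - idx);

	e->last_bytes = nbytes;
	e->avbps += (brate >> e->ewma_log) - (e->avbps >> e->ewma_log);
	e->rate_est->est.bps = (e->avbps + 0xF) >> 5;

	uint32_t rate = (npackets - e->last_packets) << (12 - idx);

	e->last_packets = npackets;
	e->avpps += (rate >> e->ewma_log) - (e->avpps >> e->ewma_log);
	e->rate_est->est.pps = (e->avpps + 0x1FF) >> 10;
	e->rate_est->flags |= RATE_EST_VALID; /* estimate is now real */
}

/* Emit the user-visible part only if an estimator has produced a value.
 * Returns 1 if *out was filled, 0 if the element was skipped. */
static int mock_emit_rate_est(const struct rate_est *r,
			      struct user_rate_est *out)
{
	if (!(r->flags & RATE_EST_VALID))
		return 0;
	*out = r->est;
	return 1;
}
```

Before the first call to mock_est_step(), mock_emit_rate_est() skips the
element. After a step it copies only the est member, so the flag itself never
reaches user space.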