diff mbox

[0/6] Add qdisc->ops->peek() support.

Message ID 20081016220905.GA2747@ami.dom.local
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Jarek Poplawski Oct. 16, 2008, 10:09 p.m. UTC
Patrick McHardy wrote, On 10/16/2008 02:38 PM:

> Jarek Poplawski wrote:
...
>> PS: after this patchset only netem_enqueue() needs qdisc->requeue(),
>> but I hope this won't take too long.
> 
> Assuming work-conserving qdiscs are used with netem, the currently
> code will always send out a reorder packet immediately. This behaviour
> is trivial to implement without ->requeue. The problematic case is
> non-work-conserving inner qdiscs, but that doesn't seem important
> at all since you'd usually add it as parent of netem, which still
> works.

How about something like this (example only)?

Jarek P.
---

 include/net/sch_generic.h |    1 +
 net/sched/sch_netem.c     |    7 ++++++-
 2 files changed, 7 insertions(+), 1 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Patrick McHardy Oct. 17, 2008, 12:33 p.m. UTC | #1
Jarek Poplawski wrote:
> Patrick McHardy wrote, On 10/16/2008 02:38 PM:
> 
>> Jarek Poplawski wrote:
> ...
>>> PS: after this patchset only netem_enqueue() needs qdisc->requeue(),
>>> but I hope this won't take too long.
>> Assuming work-conserving qdiscs are used with netem, the currently
>> code will always send out a reorder packet immediately. This behaviour
>> is trivial to implement without ->requeue. The problematic case is
>> non-work-conserving inner qdiscs, but that doesn't seem important
>> at all since you'd usually add it as parent of netem, which still
>> works.
> 
> How about something like this (example only)?
> @@ -233,7 +233,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  		 */
>  		cb->time_to_send = psched_get_time();
>  		q->counter = 0;
> -		ret = q->qdisc->ops->requeue(skb, q->qdisc);
> +		q->qdisc->flags |= TCQ_F_REQUEUE;
> +		ret = qdisc_equeue(skb, q->qdisc);
> +		q->qdisc->flags &= ~TCQ_F_REQUEUE;

Well, the inner qdisc would still need to logic to order packets
apprioriately. Its probably not that hard, but as I said, I don't
think its necessary at all. It only makes a difference with a
non-work-conserving inner qdisc, but a lot of the functionality of
netem requires the inner tfifo anyways and rate-limiting is usually
done on top of netem. So I would suggest so either hard-wire the
tfifo qdisc or at least make the assumption that inner qdiscs are
work-conserving.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Oct. 17, 2008, 1:03 p.m. UTC | #2
On Fri, Oct 17, 2008 at 02:33:23PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> Patrick McHardy wrote, On 10/16/2008 02:38 PM:
>>
>>> Jarek Poplawski wrote:
>> ...
>>>> PS: after this patchset only netem_enqueue() needs qdisc->requeue(),
>>>> but I hope this won't take too long.
>>> Assuming work-conserving qdiscs are used with netem, the currently
>>> code will always send out a reorder packet immediately. This behaviour
>>> is trivial to implement without ->requeue. The problematic case is
>>> non-work-conserving inner qdiscs, but that doesn't seem important
>>> at all since you'd usually add it as parent of netem, which still
>>> works.
>>
>> How about something like this (example only)?
>> @@ -233,7 +233,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>>  		 */
>>  		cb->time_to_send = psched_get_time();
>>  		q->counter = 0;
>> -		ret = q->qdisc->ops->requeue(skb, q->qdisc);
>> +		q->qdisc->flags |= TCQ_F_REQUEUE;
>> +		ret = qdisc_equeue(skb, q->qdisc);
>> +		q->qdisc->flags &= ~TCQ_F_REQUEUE;
>
> Well, the inner qdisc would still need to logic to order packets
> apprioriately.

I'm not sure I was understood: the idea is to do something like
in this example in tfifo_enqueue() in all leaf qdiscs like fifo
etc. too, so to redirect their ->enqueue() to their ->requeue()
which usually is qdisc_requeue() (or to it directly if needed).

> Its probably not that hard, but as I said, I don't
> think its necessary at all. It only makes a difference with a
> non-work-conserving inner qdisc, but a lot of the functionality of
> netem requires the inner tfifo anyways and rate-limiting is usually
> done on top of netem. So I would suggest so either hard-wire the
> tfifo qdisc or at least make the assumption that inner qdiscs are
> work-conserving.

Of course, I can do it like this, but wouldn't it break backward
compatibility for some users?

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Oct. 17, 2008, 2:12 p.m. UTC | #3
Jarek Poplawski wrote:
> On Fri, Oct 17, 2008 at 02:33:23PM +0200, Patrick McHardy wrote:
>>> @@ -233,7 +233,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>>>  		 */
>>>  		cb->time_to_send = psched_get_time();
>>>  		q->counter = 0;
>>> -		ret = q->qdisc->ops->requeue(skb, q->qdisc);
>>> +		q->qdisc->flags |= TCQ_F_REQUEUE;
>>> +		ret = qdisc_equeue(skb, q->qdisc);
>>> +		q->qdisc->flags &= ~TCQ_F_REQUEUE;
>> Well, the inner qdisc would still need to logic to order packets
>> apprioriately.
> 
> I'm not sure I was understood: the idea is to do something like
> in this example in tfifo_enqueue() in all leaf qdiscs like fifo
> etc. too, so to redirect their ->enqueue() to their ->requeue()
> which usually is qdisc_requeue() (or to it directly if needed).

Yes, I misunderstood this, I though the intention was to get
rid of requeue entirely.

>> Its probably not that hard, but as I said, I don't
>> think its necessary at all. It only makes a difference with a
>> non-work-conserving inner qdisc, but a lot of the functionality of
>> netem requires the inner tfifo anyways and rate-limiting is usually
>> done on top of netem. So I would suggest so either hard-wire the
>> tfifo qdisc or at least make the assumption that inner qdiscs are
>> work-conserving.
> 
> Of course, I can do it like this, but wouldn't it break backward
> compatibility for some users?

Some general thoughts ...

We've never had any systematic checks for useful and non-useful
combination of qdiscs, which is causing a lot of these complications.
Think of all the multiq work that was required to make it work
properly with non-work-conserving qdiscs - while at the same time,
using a non-work-conserving qdisc (which require a global view)
defeats basically all of the benefits.

So it would be really useful to come up with a systematic definition
of valid combinations instead of trying handling lots of purely
theoretical case that don't make sense. One more example - all the
qdiscs implement ->drop(), yet its only needed by CBQ and it doesn't
make any sense at all to use lets say HFSC as child of CBQ.

About this specific case - yes, it would break compatibility for
users using f.i. TBF as child of netem. But if you look at the
netem_enqueue() function, it in fact assumes that the inner qdisc
is a tfifo, so we'd be breaking an already broken case. We can
of course be nice and warn about it for a few releases, but I believe
there is some real potential for simplification that makes it
worth it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Oct. 17, 2008, 8:45 p.m. UTC | #4
On Fri, Oct 17, 2008 at 04:12:03PM +0200, Patrick McHardy wrote:
...
> Some general thoughts ...
>
> We've never had any systematic checks for useful and non-useful
> combination of qdiscs, which is causing a lot of these complications.
> Think of all the multiq work that was required to make it work
> properly with non-work-conserving qdiscs - while at the same time,
> using a non-work-conserving qdisc (which require a global view)
> defeats basically all of the benefits.
>
> So it would be really useful to come up with a systematic definition
> of valid combinations instead of trying handling lots of purely
> theoretical case that don't make sense. One more example - all the
> qdiscs implement ->drop(), yet its only needed by CBQ and it doesn't
> make any sense at all to use lets say HFSC as child of CBQ.
>
> About this specific case - yes, it would break compatibility for
> users using f.i. TBF as child of netem. But if you look at the
> netem_enqueue() function, it in fact assumes that the inner qdisc
> is a tfifo, so we'd be breaking an already broken case. We can
> of course be nice and warn about it for a few releases, but I believe
> there is some real potential for simplification that makes it
> worth it.

I'm not sure this is all right: at least until there is not too much
problems with some code (like with requeuing). We probably should try
before this official ways like feature-removal-schedule.txt, and/or
maybe some CONFIG_XXX_DEPRECATED things. But I don't persist with this.

BTW, I'm not sure if I'm expected to redo any patches in this thread.
(Probably some things like this teql_peek() could be removed with
->requeue() killing.)

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 21, 2008, 11:43 p.m. UTC | #5
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Fri, 17 Oct 2008 22:45:43 +0200

> BTW, I'm not sure if I'm expected to redo any patches in this thread.
> (Probably some things like this teql_peek() could be removed with
> ->requeue() killing.)

In any event, you should resubmit these when I open the net-next-2.6
tree up in about a week or so.

It looks like Patrick is in agreement about these changes and they
look mostly fine to me too, FWIW.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Oct. 22, 2008, 4:01 p.m. UTC | #6
David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Fri, 17 Oct 2008 22:45:43 +0200
> 
>> BTW, I'm not sure if I'm expected to redo any patches in this thread.
>> (Probably some things like this teql_peek() could be removed with
>> ->requeue() killing.)
> 
> In any event, you should resubmit these when I open the net-next-2.6
> tree up in about a week or so.
> 
> It looks like Patrick is in agreement about these changes and they
> look mostly fine to me too, FWIW.

I think I lost the overview :) I'll review them again once the complete
batch is resent.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Oct. 22, 2008, 4:04 p.m. UTC | #7
Jarek Poplawski wrote:
> On Fri, Oct 17, 2008 at 04:12:03PM +0200, Patrick McHardy wrote:
> ...
>> Some general thoughts ...
>>
>> We've never had any systematic checks for useful and non-useful
>> combination of qdiscs, which is causing a lot of these complications.
>> Think of all the multiq work that was required to make it work
>> properly with non-work-conserving qdiscs - while at the same time,
>> using a non-work-conserving qdisc (which require a global view)
>> defeats basically all of the benefits.
>>
>> So it would be really useful to come up with a systematic definition
>> of valid combinations instead of trying handling lots of purely
>> theoretical case that don't make sense. One more example - all the
>> qdiscs implement ->drop(), yet its only needed by CBQ and it doesn't
>> make any sense at all to use lets say HFSC as child of CBQ.
>>
>> About this specific case - yes, it would break compatibility for
>> users using f.i. TBF as child of netem. But if you look at the
>> netem_enqueue() function, it in fact assumes that the inner qdisc
>> is a tfifo, so we'd be breaking an already broken case. We can
>> of course be nice and warn about it for a few releases, but I believe
>> there is some real potential for simplification that makes it
>> worth it.
> 
> I'm not sure this is all right: at least until there is not too much
> problems with some code (like with requeuing).  We probably should try
> before this official ways like feature-removal-schedule.txt, and/or
> maybe some CONFIG_XXX_DEPRECATED things. But I don't persist with this.

Sure. Nobody reads feature-removal-schedule.txt though, so we should
also have a runtime warning :)

> BTW, I'm not sure if I'm expected to redo any patches in this thread.
> (Probably some things like this teql_peek() could be removed with
> ->requeue() killing.)

Just do the changes you think are right, I'm not expecting you to
do anything I suggested :)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9dcb5bf..9157766 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -45,6 +45,7 @@  struct Qdisc
 #define TCQ_F_BUILTIN	1
 #define TCQ_F_THROTTLED	2
 #define TCQ_F_INGRESS	4
+#define TCQ_F_REQUEUE	8
 	int			padded;
 	struct Qdisc_ops	*ops;
 	struct qdisc_size_table	*stab;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 3080bd6..6ac8efc 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -233,7 +233,9 @@  static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		 */
 		cb->time_to_send = psched_get_time();
 		q->counter = 0;
-		ret = q->qdisc->ops->requeue(skb, q->qdisc);
+		q->qdisc->flags |= TCQ_F_REQUEUE;
+		ret = qdisc_equeue(skb, q->qdisc);
+		q->qdisc->flags &= ~TCQ_F_REQUEUE;
 	}
 
 	if (likely(ret == NET_XMIT_SUCCESS)) {
@@ -478,6 +480,9 @@  static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 	psched_time_t tnext = netem_skb_cb(nskb)->time_to_send;
 	struct sk_buff *skb;
 
+	if (unlikely(sch->flags & TCQ_F_REQUEUE))
+		return qdisc_requeue(nskb, sch);
+
 	if (likely(skb_queue_len(list) < q->limit)) {
 		/* Optimize for add at tail */
 		if (likely(skb_queue_empty(list) || tnext >= q->oldest)) {