diff mbox

[RFC,2/3] tc: deprecate TC_ACT_QUEUED

Message ID 1429644476-8914-3-git-send-email-ast@plumgrid.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Alexei Starovoitov April 21, 2015, 7:27 p.m. UTC
TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
Get rid of redundant checks in all qdiscs.
Instead do it once.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/uapi/linux/pkt_cls.h |    2 +-
 net/sched/sch_api.c          |    2 ++
 net/sched/sch_atm.c          |    1 -
 net/sched/sch_cbq.c          |    1 -
 net/sched/sch_choke.c        |    1 -
 net/sched/sch_drr.c          |    1 -
 net/sched/sch_dsmark.c       |    1 -
 net/sched/sch_fq_codel.c     |    1 -
 net/sched/sch_hfsc.c         |    1 -
 net/sched/sch_htb.c          |    1 -
 net/sched/sch_ingress.c      |    1 -
 net/sched/sch_multiq.c       |    1 -
 net/sched/sch_prio.c         |    1 -
 net/sched/sch_qfq.c          |    1 -
 net/sched/sch_sfb.c          |    1 -
 net/sched/sch_sfq.c          |    1 -
 16 files changed, 3 insertions(+), 15 deletions(-)

Comments

Cong Wang April 22, 2015, 5:02 a.m. UTC | #1
On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
> Get rid of redundant checks in all qdiscs.
> Instead do it once.

The current code can be easily extended, while your code not.
I don't see the need of this change.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 22, 2015, 7:43 a.m. UTC | #2
On 04/22/2015 07:02 AM, Cong Wang wrote:
> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
>> Get rid of redundant checks in all qdiscs.
>> Instead do it once.
>
> The current code can be easily extended, while your code not.

So, do you plan to extend it ?

The alias has been there since pre-git times, so more than
10 years now, where no-one felt the need to extended the
semantics of it ...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov April 22, 2015, 10:22 p.m. UTC | #3
On 4/21/15 10:02 PM, Cong Wang wrote:
> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
>> Get rid of redundant checks in all qdiscs.
>> Instead do it once.
>
> The current code can be easily extended, while your code not.
> I don't see the need of this change.

well, iproute2 doesn't use TC_ACT_QUEUED action at all and
TC_ACT_STOLEN is used by mirred. All in-tree qdiscs alias them.
If you're saying that some future actions together with
some future qdiscs may take advantage of that, then why they didn't
use it over the last 10 years?
Having both that do the same thing is only confusing.
I think having one value to indicate 'stolen' condition makes TC
code easier to understand.
Jamal, what's your take on this?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang April 22, 2015, 11:39 p.m. UTC | #4
On Wed, Apr 22, 2015 at 3:22 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 4/21/15 10:02 PM, Cong Wang wrote:
>>
>> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com>
>> wrote:
>>>
>>> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
>>> Get rid of redundant checks in all qdiscs.
>>> Instead do it once.
>>
>>
>> The current code can be easily extended, while your code not.
>> I don't see the need of this change.
>
>
> well, iproute2 doesn't use TC_ACT_QUEUED action at all and
> TC_ACT_STOLEN is used by mirred. All in-tree qdiscs alias them.
> If you're saying that some future actions together with
> some future qdiscs may take advantage of that, then why they didn't
> use it over the last 10 years?
> Having both that do the same thing is only confusing.
> I think having one value to indicate 'stolen' condition makes TC
> code easier to understand.

Then remove it, I am all for this. ;)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov April 23, 2015, 2:46 a.m. UTC | #5
On 4/22/15 4:39 PM, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 3:22 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> On 4/21/15 10:02 PM, Cong Wang wrote:
>>>
>>> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com>
>>> wrote:
>>>>
>>>> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
>>>> Get rid of redundant checks in all qdiscs.
>>>> Instead do it once.
>>>
>>>
>>> The current code can be easily extended, while your code not.
>>> I don't see the need of this change.
>>
>>
>> well, iproute2 doesn't use TC_ACT_QUEUED action at all and
>> TC_ACT_STOLEN is used by mirred. All in-tree qdiscs alias them.
>> If you're saying that some future actions together with
>> some future qdiscs may take advantage of that, then why they didn't
>> use it over the last 10 years?
>> Having both that do the same thing is only confusing.
>> I think having one value to indicate 'stolen' condition makes TC
>> code easier to understand.
>
> Then remove it, I am all for this. ;)

TC_ACT_QUEUED cannot be removed.
Only deprecated with backwards compatibility the way this patch did it.
That should have been obvious.

The other two threads degenerated into non-technical comments.

Anyway, this set was RFC to answer my main question whether I should
continue with tc cleanup or stop right here. I got my answer.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 23, 2015, 7:13 a.m. UTC | #6
On 04/23/2015 04:46 AM, Alexei Starovoitov wrote:
...
> The other two threads degenerated into non-technical comments.

Yep. :-/

> Anyway, this set was RFC to answer my main question whether I should
> continue with tc cleanup or stop right here. I got my answer.

I think it's worth proceeding with this, it's long overdue.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang April 23, 2015, 6:12 p.m. UTC | #7
On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>
> TC_ACT_QUEUED cannot be removed.
> Only deprecated with backwards compatibility the way this patch did it.
> That should have been obvious.
>

It is at least the third time I have to repeat that: we really don't care
about out-of-tree modules.

Everyone MUST read Documentation/stable_api_nonsense.txt.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 23, 2015, 6:21 p.m. UTC | #8
On 04/23/2015 08:12 PM, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
...
>> TC_ACT_QUEUED cannot be removed.
>> Only deprecated with backwards compatibility the way this patch did it.
>> That should have been obvious.
>
> It is at least the third time I have to repeat that: we really don't care
> about out-of-tree modules.
>
> Everyone MUST read Documentation/stable_api_nonsense.txt.

Seriously, what are you talking about ???

$ git grep -n TC_ACT_QUEUED include/uapi/
include/uapi/linux/pkt_cls.h:105:#define TC_ACT_QUEUED          5

This is *UAPI*, no-one is even remotely talking about out-of-tree
kernel modules.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang April 23, 2015, 6:30 p.m. UTC | #9
On Thu, Apr 23, 2015 at 11:21 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 04/23/2015 08:12 PM, Cong Wang wrote:
>>
>> On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov <ast@plumgrid.com>
>> wrote:
>
> ...
>>>
>>> TC_ACT_QUEUED cannot be removed.
>>> Only deprecated with backwards compatibility the way this patch did it.
>>> That should have been obvious.
>>
>>
>> It is at least the third time I have to repeat that: we really don't care
>> about out-of-tree modules.
>>
>> Everyone MUST read Documentation/stable_api_nonsense.txt.
>
>
> Seriously, what are you talking about ???
>
> $ git grep -n TC_ACT_QUEUED include/uapi/
> include/uapi/linux/pkt_cls.h:105:#define TC_ACT_QUEUED          5
>
> This is *UAPI*, no-one is even remotely talking about out-of-tree
> kernel modules.

I am not stupid. You should figure out we can just leave its definition
by removing it, leaving the default as stolen.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim April 23, 2015, 8:45 p.m. UTC | #10
On 04/22/15 18:22, Alexei Starovoitov wrote:
> On 4/21/15 10:02 PM, Cong Wang wrote:
>> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov
>> <ast@plumgrid.com> wrote:
>>> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
>>> Get rid of redundant checks in all qdiscs.
>>> Instead do it once.
>>
>> The current code can be easily extended, while your code not.
>> I don't see the need of this change.
>
> well, iproute2 doesn't use TC_ACT_QUEUED action at all and
> TC_ACT_STOLEN is used by mirred. All in-tree qdiscs alias them.
> If you're saying that some future actions together with
> some future qdiscs may take advantage of that, then why they didn't
> use it over the last 10 years?
> Having both that do the same thing is only confusing.
> I think having one value to indicate 'stolen' condition makes TC
> code easier to understand.
> Jamal, what's your take on this?


Sorry - I am on travel so not very helpful/responsive.
Several things:

1) the _XMIT semantics are useful on the egress side because in fact
we do have queues and they can be attached to qdiscs etc.
The TC_ACT_XXX codes were _intentional_ since ingress works as a
classifier shell. The fact that qdiscs dealt with these codes directly
allows for specialized handling. Moving them to a generic function
seems to defeat that purpose. So I am siding with Cong on this.
You probably have some other motivation for doing this which is not 
clear yet? Are you planning to queue things on ingress?

2) the ACT_QUEUED vs STOLEN was supposed to have semantics of something
that was stolen (eg redirection should definetely have been returning
STOLEN not QUEUED); something that queues for later re-injection
(with any/all metadata) was intended to use QUEUED. I believe netfilter
may have  followed suit and introduced similar codes (so it would be
interesting to see how they use them). In any case, it is clear it is
not being used - but i want to point the reason it was there
(which may still be useful; problem is TheLinuxWay is cutnpaste where
someone repeats the code that exists).

cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal April 23, 2015, 9:20 p.m. UTC | #11
Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> 2) the ACT_QUEUED vs STOLEN was supposed to have semantics of something
> that was stolen (eg redirection should definetely have been returning
> STOLEN not QUEUED); something that queues for later re-injection
> (with any/all metadata) was intended to use QUEUED. I believe netfilter
> may have  followed suit and introduced similar codes (so it would be
> interesting to see how they use them).

Hooks (Targets) don't queue themselves, i.e. NF_QUEUE tells the
netfilter core that the skb is to be handed off to nf_queue machinery,
while NF_STOLEN is the more obvious "don't touch this skb ever again".
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov April 23, 2015, 10:13 p.m. UTC | #12
On 4/23/15 1:45 PM, Jamal Hadi Salim wrote:
>
> 1) the _XMIT semantics are useful on the egress side because in fact
> we do have queues and they can be attached to qdiscs etc.
> The TC_ACT_XXX codes were _intentional_ since ingress works as a
> classifier shell.

then it is worse mess than I thought :(
Why call it _qdisc_ then? and have special and convoluted handling for
it in qdisc_create, qdisc_graft and other places?

 > Are you planning to queue things on ingress?

I thought that was the whole purpose of ingress qdisc.
why then we have dev->ingress_queue?

If queueing was never a goal, may be we should kill ingress qdisc
and replace it with a simple shim that only does cls/act.
The code overall will get much simpler and faster.

Feels like falling into rabbit hole.

> The fact that qdiscs dealt with these codes directly
> allows for specialized handling. Moving them to a generic function
> seems to defeat that purpose. So I am siding with Cong on this.

that's not what patch 1 is doing. It is still doing specialized
handling... but in light of what you said above, it looks like much
bigger cleanup is needed. We'll continue arguing when I refactor
this set and resubmit when net-next reopens.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang April 23, 2015, 10:33 p.m. UTC | #13
On Thu, Apr 23, 2015 at 3:13 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 4/23/15 1:45 PM, Jamal Hadi Salim wrote:
>>
>>
>> 1) the _XMIT semantics are useful on the egress side because in fact
>> we do have queues and they can be attached to qdiscs etc.
>> The TC_ACT_XXX codes were _intentional_ since ingress works as a
>> classifier shell.
>
>
> then it is worse mess than I thought :(
> Why call it _qdisc_ then? and have special and convoluted handling for
> it in qdisc_create, qdisc_graft and other places?

It needs to glue into qdisc layer, unify the abstractions.

>
>> Are you planning to queue things on ingress?
>
> I thought that was the whole purpose of ingress qdisc.

There is no queue for ingress qdisc.

> why then we have dev->ingress_queue?

Glue layer, qdisc ties too much with txq, which is what I want to solve.

We have struct netdev_rx_queue too, but look at it, there is nothing
but some sysfs stuffs needed by RPS.

>
> If queueing was never a goal, may be we should kill ingress qdisc
> and replace it with a simple shim that only does cls/act.
> The code overall will get much simpler and faster.
>

As I said in response to your patch for skb->data, we need to align
ingress with egress, rather than making more differences, this includes
qdisc's too. We do have per-cpu queues for ingress, at least in theory
we have a queue for ingress to use.

Of course, RX is naturally different from TX, which is the root cause
why we have so many differences here.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim April 23, 2015, 10:51 p.m. UTC | #14
On 04/23/15 18:13, Alexei Starovoitov wrote:
> On 4/23/15 1:45 PM, Jamal Hadi Salim wrote:

> then it is worse mess than I thought :(
> Why call it _qdisc_ then? and have special and convoluted handling for
> it in qdisc_create, qdisc_graft and other places?

Convinience for tooling and using existing abstractions is the
historical. You can attach qdiscs to netdevs.
You can then use all sorts of well understood tools.

>  > Are you planning to queue things on ingress?
>
> I thought that was the whole purpose of ingress qdisc.
> why then we have dev->ingress_queue?
>

So you are planning to add queues? If you are that is a different
discussion (and the use case needs some clarity).

> If queueing was never a goal, may be we should kill ingress qdisc
> and replace it with a simple shim that only does cls/act.
> The code overall will get much simpler and faster.
>

Attaching to the device was considered "simpler" based on the tooling
thought process. On whether we should queue on ingress - it was not
considered useful if the packets were going to the host i.e we want to
proceed to completion likely queueing to some socket layer further
upstream.
For packets being forwarded we already had egress qdiscs which had
queues so it didnt seem to make sense to enqueue on ingress for such
use cases.
There was a use case later where multiple ingress ports had to be shared
and ifb was born - which is pseudo temporary enqueueing on ingress.


> Feels like falling into rabbit hole.
>
>> The fact that qdiscs dealt with these codes directly
>> allows for specialized handling. Moving them to a generic function
>> seems to defeat that purpose. So I am siding with Cong on this.
>
> that's not what patch 1 is doing. It is still doing specialized
> handling... but in light of what you said above, it looks like much
> bigger cleanup is needed. We'll continue arguing when I refactor
> this set and resubmit when net-next reopens.
>

Sorry - i was refereing to the patch where you agregated things after
the qdisc invokes a classifier.

cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov April 24, 2015, 12:59 a.m. UTC | #15
On 4/23/15 3:51 PM, Jamal Hadi Salim wrote:
>
> So you are planning to add queues? If you are that is a different
> discussion (and the use case needs some clarity).

nope. I wasn't planning to do that.

 > For packets being forwarded we already had egress qdiscs which had
 > queues so it didnt seem to make sense to enqueue on ingress for such
 > use cases.
 > There was a use case later where multiple ingress ports had to be
 > shared
 > and ifb was born - which is pseudo temporary enqueueing on ingress.

agree. imo ifb approach is more flexible, since it has full hierarchy
of qdiscs. As you're saying above and from the old ifb logs I thought
that ifb is _temporary_ and long term plan is to use ingress_queue,
but looks like this is not the case. Also not too long ago we decided
that we don't want another ingress qdisc. Therefore I think it makes
sense to simplify the code and boost performance.
I'm not proposing to change tooling, of course.
 From iproute2 point of view we'll still have ingress qdisc.
Right now we're pointlessly allocating memory for it and for
ingress_queue, whereas we only need to call cls/act.
I'm proposing to kill them and used tcf_proto in net_device instead.
Right now to reach cls in critical path on ingress we do:
rxq = skb->dev->ingress_queue
sch = rxq->qdisc
sch->enqueue
sch->filter_list
with a bunch of 'if' conditions and useless memory accesses in-between.
If we add 'ingress_filter_list' directly to net_device,
it will be just:
skb->dev->ingress_filter_list
which is huge performance boost. Code size will shrink as well.
iproute2 and all existing tools will work as-is. Looks as win-win to me.

>>> The fact that qdiscs dealt with these codes directly
>>> allows for specialized handling. Moving them to a generic function
>>> seems to defeat that purpose. So I am siding with Cong on this.
>>
>> that's not what patch 1 is doing. It is still doing specialized
>> handling... but in light of what you said above, it looks like much
>> bigger cleanup is needed. We'll continue arguing when I refactor
>> this set and resubmit when net-next reopens.
>
> Sorry - i was refereing to the patch where you agregated things after
> the qdisc invokes a classifier.

hmm. There I also didn't convert all qdiscs into single helper.
Only those that have exactly the same logic after tc_classify.
All other qdiscs have custom handling.
No worries, it's hard to review things while traveling. Been there :)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang April 24, 2015, 3:37 a.m. UTC | #16
On Thu, Apr 23, 2015 at 5:59 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 4/23/15 3:51 PM, Jamal Hadi Salim wrote:
>>
>>
>> So you are planning to add queues? If you are that is a different
>> discussion (and the use case needs some clarity).
>
>
> nope. I wasn't planning to do that.
>
>> For packets being forwarded we already had egress qdiscs which had
>> queues so it didnt seem to make sense to enqueue on ingress for such
>> use cases.
>> There was a use case later where multiple ingress ports had to be
>> shared
>> and ifb was born - which is pseudo temporary enqueueing on ingress.
>
> agree. imo ifb approach is more flexible, since it has full hierarchy
> of qdiscs. As you're saying above and from the old ifb logs I thought
> that ifb is _temporary_ and long term plan is to use ingress_queue,
> but looks like this is not the case. Also not too long ago we decided
> that we don't want another ingress qdisc. Therefore I think it makes
> sense to simplify the code and boost performance.

Which performance problem did you see? Did you hit the spinlock
bottleneck? That spinlock should be eliminated before you
try to do more, and it can be replaced by RCU read lock after
John's work (I am still waiting for his patch btw).


> I'm not proposing to change tooling, of course.
> From iproute2 point of view we'll still have ingress qdisc.
> Right now we're pointlessly allocating memory for it and for
> ingress_queue, whereas we only need to call cls/act.
> I'm proposing to kill them and used tcf_proto in net_device instead.
> Right now to reach cls in critical path on ingress we do:
> rxq = skb->dev->ingress_queue
> sch = rxq->qdisc
> sch->enqueue
> sch->filter_list
> with a bunch of 'if' conditions and useless memory accesses in-between.
> If we add 'ingress_filter_list' directly to net_device,
> it will be just:
> skb->dev->ingress_filter_list
> which is huge performance boost. Code size will shrink as well.
> iproute2 and all existing tools will work as-is. Looks as win-win to me.

Nope, it breaks Qdisc abstraction, filters never attach to netdevice
directly. You should stop, since you really don't understand the code.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 24, 2015, 8:12 a.m. UTC | #17
On 04/24/2015 05:37 AM, Cong Wang wrote:
> On Thu, Apr 23, 2015 at 5:59 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> On 4/23/15 3:51 PM, Jamal Hadi Salim wrote:
...
>> agree. imo ifb approach is more flexible, since it has full hierarchy
>> of qdiscs. As you're saying above and from the old ifb logs I thought
>> that ifb is _temporary_ and long term plan is to use ingress_queue,
>> but looks like this is not the case. Also not too long ago we decided
>> that we don't want another ingress qdisc. Therefore I think it makes
>> sense to simplify the code and boost performance.
>
> Which performance problem did you see? Did you hit the spinlock
> bottleneck? That spinlock should be eliminated before you
> try to do more  [...]

Those are already the next steps for improving performance after
that, by attempting to get rid of these additional indirections
and shrinking code.

>> I'm not proposing to change tooling, of course.
>>  From iproute2 point of view we'll still have ingress qdisc.
>> Right now we're pointlessly allocating memory for it and for
>> ingress_queue, whereas we only need to call cls/act.
>> I'm proposing to kill them and used tcf_proto in net_device instead.
>> Right now to reach cls in critical path on ingress we do:
>> rxq = skb->dev->ingress_queue
>> sch = rxq->qdisc
>> sch->enqueue
>> sch->filter_list
>> with a bunch of 'if' conditions and useless memory accesses in-between.
>> If we add 'ingress_filter_list' directly to net_device,
>> it will be just:
>> skb->dev->ingress_filter_list
>> which is huge performance boost. Code size will shrink as well.
>> iproute2 and all existing tools will work as-is. Looks as win-win to me.
>
> Nope, it breaks Qdisc abstraction, filters never attach to netdevice
> directly. You should stop, since you really don't understand the code.

Would be great if you make netdev a better place, stop such immature
comments, and keep it technical as everyone else.

The point here is to try thinking outside the box, so lets see how
the result in the end looks like, then we can discuss.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim April 27, 2015, 12:31 p.m. UTC | #18
On 04/23/15 20:59, Alexei Starovoitov wrote:
> On 4/23/15 3:51 PM, Jamal Hadi Salim wrote:
>>
>> So you are planning to add queues? If you are that is a different
>> discussion (and the use case needs some clarity).
>
> nope. I wasn't planning to do that.
>

Then i would say, lets just keep the naming convention. Maybe
better documentation is needed.

>  > For packets being forwarded we already had egress qdiscs which had
>  > queues so it didnt seem to make sense to enqueue on ingress for such
>  > use cases.
>  > There was a use case later where multiple ingress ports had to be
>  > shared
>  > and ifb was born - which is pseudo temporary enqueueing on ingress.
>
> agree. imo ifb approach is more flexible, since it has full hierarchy
> of qdiscs. As you're saying above and from the old ifb logs I thought
> that ifb is _temporary_ and long term plan is to use ingress_queue,
> but looks like this is not the case.

ifb represented a real use case which questioned the lack
of ingress queue. But do note, that problem was solved. And in
engineering we just move on unless a compelling reason makes us
rethink.
IMO, the original use case for ifb no longer requires it but the
internets and the googles havent caught up with it yet. Refer to
my netdev01 preso/paper where i talk about "sharing".
However, ifb addressed another issue. Instead of having per port/netdev
policies we can now have per-aggregate-port-group policies. While this
comes at a small cost of redirecting packets, that penalty is only paid
for by people interested in defining such policies.
This in itself is useful.

>Also not too long ago we decided
> that we don't want another ingress qdisc. Therefore I think it makes
> sense to simplify the code and boost performance.
> I'm not proposing to change tooling, of course.
>  From iproute2 point of view we'll still have ingress qdisc.
> Right now we're pointlessly allocating memory for it and for
> ingress_queue, whereas we only need to call cls/act.
> I'm proposing to kill them and used tcf_proto in net_device instead.
> Right now to reach cls in critical path on ingress we do:
> rxq = skb->dev->ingress_queue
> sch = rxq->qdisc
> sch->enqueue
> sch->filter_list
> with a bunch of 'if' conditions and useless memory accesses in-between.
> If we add 'ingress_filter_list' directly to net_device,
> it will be just:
> skb->dev->ingress_filter_list
> which is huge performance boost. Code size will shrink as well.
> iproute2 and all existing tools will work as-is. Looks as win-win to me.
>

I hear you; any extra cycle we can remove from the equation is useful.
But do note: in this case, it comes down to the thin line between
usability and  performance. Do take a closer look at the tooling
interfaces from user space on ingress qdisc. The serialization,
the ops already provided for manipulating filters etc. Those are
for free if you use the qdisc abstraction.
If you can still keep that and get rid of the opcodes you mention
above then it is win-win. My feeling is it is a lot more challenging.

>>>> The fact that qdiscs dealt with these codes directly
>>>> allows for specialized handling. Moving them to a generic function
>>>> seems to defeat that purpose. So I am siding with Cong on this.
>>>
>>> that's not what patch 1 is doing. It is still doing specialized
>>> handling... but in light of what you said above, it looks like much
>>> bigger cleanup is needed. We'll continue arguing when I refactor
>>> this set and resubmit when net-next reopens.
>>
>> Sorry - i was refereing to the patch where you agregated things after
>> the qdisc invokes a classifier.
>
> hmm. There I also didn't convert all qdiscs into single helper.
> Only those that have exactly the same logic after tc_classify.
> All other qdiscs have custom handling.
> No worries, it's hard to review things while traveling. Been there :)
>

I am sorry again - will look when i get out of travel mode.

cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index bf08e76bf505..208e5ed5256c 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -102,7 +102,7 @@  enum {
 #define TC_ACT_SHOT		2
 #define TC_ACT_PIPE		3
 #define TC_ACT_STOLEN		4
-#define TC_ACT_QUEUED		5
+#define TC_ACT_QUEUED		5 /* deprecated. same as TC_ACT_STOLEN */
 #define TC_ACT_REPEAT		6
 #define TC_ACT_JUMP		0x10000000
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ad9eed70bc8f..f7950327bb22 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1820,6 +1820,8 @@  int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
 #ifdef CONFIG_NET_CLS_ACT
 			if (err != TC_ACT_RECLASSIFY && skb->tc_verd)
 				skb->tc_verd = SET_TC_VERD(skb->tc_verd, 0);
+			if (err == TC_ACT_QUEUED)
+				return TC_ACT_STOLEN;
 #endif
 			return err;
 		}
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index e3e2cc5fd068..7ef0bf4bdce6 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -396,7 +396,6 @@  done:
 		/*@@@ looks good ... but it's not supposed to work :-) */
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			kfree_skb(skb);
 			return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index beeb75f80fdb..eca4725e3273 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -258,7 +258,6 @@  cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 			goto fallback;
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index c009eb9045ce..a3bc7cf151d3 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -212,7 +212,6 @@  static bool choke_classify(struct sk_buff *skb,
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_STOLEN:
-		case TC_ACT_QUEUED:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
 			return false;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 338706092c27..1051c5d4e85b 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -335,7 +335,6 @@  static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 66700a6116aa..ce9d4123cbbe 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -236,7 +236,6 @@  static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 		switch (result) {
 #ifdef CONFIG_NET_CLS_ACT
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			kfree_skb(skb);
 			return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 1e52decb7b59..4d00ece3243d 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -104,7 +104,6 @@  static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_STOLEN:
-		case TC_ACT_QUEUED:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
 			return 0;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index e6c7416d0332..3438f9b4fd8b 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1168,7 +1168,6 @@  hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index f1acb0f60dc3..32c1f081bed9 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -232,7 +232,6 @@  static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
 	while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index e68f4a5dbeba..63cbd1e066bd 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -69,7 +69,6 @@  static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	case TC_ACT_SHOT:
 		qdisc_qstats_drop(sch);
 	case TC_ACT_STOLEN:
-	case TC_ACT_QUEUED:
 		result = NET_XMIT_DROP;
 		kfree_skb(skb);
 		break;
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 42dd218871e0..dc2a708a772c 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -50,7 +50,6 @@  multiq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 #ifdef CONFIG_NET_CLS_ACT
 	switch (err) {
 	case TC_ACT_STOLEN:
-	case TC_ACT_QUEUED:
 		*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 	case TC_ACT_SHOT:
 		return NULL;
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 8e5cd34aaa74..de4825a9d0aa 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -46,7 +46,6 @@  prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 #ifdef CONFIG_NET_CLS_ACT
 		switch (err) {
 		case TC_ACT_STOLEN:
-		case TC_ACT_QUEUED:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
 			return NULL;
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 3ec7e88a43ca..9d1e43ea0ccb 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -723,7 +723,6 @@  static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 5819dd82630d..61e18108d0d3 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -264,7 +264,6 @@  static bool sfb_classify(struct sk_buff *skb, struct tcf_proto *fl,
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_STOLEN:
-		case TC_ACT_QUEUED:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
 			return false;
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index b877140beda5..b508ff655da3 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -207,7 +207,6 @@  static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_STOLEN:
-		case TC_ACT_QUEUED:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
 			return 0;