diff mbox

[2/2] net: sched: convert qdisc linked list to hashtable

Message ID alpine.LNX.2.00.1608121523020.22028@cbobk.fhfr.pm
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Jiri Kosina Aug. 12, 2016, 1:53 p.m. UTC
On Fri, 12 Aug 2016, Daniel Borkmann wrote:

> This results in below panic. Tested reverting this patch and it fixes
> the panic.
> 
> Did you test this also with ingress or clsact qdisc (just try adding
> it to lo dev for example) ?

Hi Daniel,

thanks for the report. Hmm, I am pretty sure clsact worked for me, but 
I'll recheck.

> What happens is the following in qdisc_match_from_root():
> 
> [  995.422187] XXX qdisc:ffff88025e4fc800 queue:ffff880262759000
> dev:ffff880261cc2000 handle:ffff0000
> [  995.422200] XXX qdisc:ffffffff81cf8100 queue:ffffffff81cf8240 dev:
> (null) handle:ffff0000
> 
> I believe this is due to dev_ingress_queue_create() assigning the
> global noop_qdisc instance as qdisc_sleeping, which later qdisc_lookup()
> uses for qdisc_match_from_root().
> 
> But everything that uses things like noop_qdisc cannot work with the
> new qdisc_match_from_root(), because qdisc_dev(root) will always trigger
> NULL pointer dereference there. Reason is because the dev is always
> NULL for noop, it's a singleton, see noop_qdisc and noop_netdev_queue
> in sch_generic.c.
> 
> Now how to fix it? Creating separate noop instances each time it's set
> would be quite a waste of memory. Even fuglier would be to hack a static
> net device struct into sch_generic.c and let noop_netdev_queue point there
> to get to the hash table. Or we just not use qdisc_dev().

How about we actually extend a little bit the TCQ_F_BUILTIN special case 
test in qdisc_match_from_root()?

After the change, the only way how qdisc_dev() could be NULL should be a 
TCQ_F_BUILTIN case, right?

I was thinking about something like the patch below (the reasong being 
that ->dev would be NULL only in cases of singletonish qdiscs) ... 
wouldn't that also fix the issue you're seeing? Have to think it through a 
little bit more ..




Thanks!

Comments

Daniel Borkmann Aug. 12, 2016, 2:26 p.m. UTC | #1
On 08/12/2016 03:53 PM, Jiri Kosina wrote:
> On Fri, 12 Aug 2016, Daniel Borkmann wrote:
>
>> This results in below panic. Tested reverting this patch and it fixes
>> the panic.
>>
>> Did you test this also with ingress or clsact qdisc (just try adding
>> it to lo dev for example) ?
>
> Hi Daniel,
>
> thanks for the report. Hmm, I am pretty sure clsact worked for me, but
> I'll recheck.
>
>> What happens is the following in qdisc_match_from_root():
>>
>> [  995.422187] XXX qdisc:ffff88025e4fc800 queue:ffff880262759000
>> dev:ffff880261cc2000 handle:ffff0000
>> [  995.422200] XXX qdisc:ffffffff81cf8100 queue:ffffffff81cf8240 dev:
>> (null) handle:ffff0000
>>
>> I believe this is due to dev_ingress_queue_create() assigning the
>> global noop_qdisc instance as qdisc_sleeping, which later qdisc_lookup()
>> uses for qdisc_match_from_root().
>>
>> But everything that uses things like noop_qdisc cannot work with the
>> new qdisc_match_from_root(), because qdisc_dev(root) will always trigger
>> NULL pointer dereference there. Reason is because the dev is always
>> NULL for noop, it's a singleton, see noop_qdisc and noop_netdev_queue
>> in sch_generic.c.
>>
>> Now how to fix it? Creating separate noop instances each time it's set
>> would be quite a waste of memory. Even fuglier would be to hack a static
>> net device struct into sch_generic.c and let noop_netdev_queue point there
>> to get to the hash table. Or we just not use qdisc_dev().
>
> How about we actually extend a little bit the TCQ_F_BUILTIN special case
> test in qdisc_match_from_root()?
>
> After the change, the only way how qdisc_dev() could be NULL should be a
> TCQ_F_BUILTIN case, right?
>
> I was thinking about something like the patch below (the reasong being
> that ->dev would be NULL only in cases of singletonish qdiscs) ...
> wouldn't that also fix the issue you're seeing? Have to think it through a
> little bit more ..

Ahh, so this has the same effect as previously observed with the other fix.

Perhaps it's just a dumping issue, but to the below clsact, there shouldn't
be pfifo_fast instances appearing.

# tc qdisc show dev wlp2s0b1
qdisc mq 0: root
qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
# tc qdisc add dev wlp2s0b1 clsact
# tc qdisc show dev wlp2s0b1
qdisc mq 0: root
qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc clsact ffff: parent ffff:fff1
qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index 25aada7..1c9faed 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -260,6 +260,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
>   {
>   	struct Qdisc *q;
>
> +	if (!qdisc_dev(root))
> +		return (root->handle == handle ? root : NULL);
> +
>   	if (!(root->flags & TCQ_F_BUILTIN) &&
>   	    root->handle == handle)
>   		return root;
>
>
> Thanks!
>
Jiri Kosina Aug. 12, 2016, 2:36 p.m. UTC | #2
On Fri, 12 Aug 2016, Daniel Borkmann wrote:

> > I was thinking about something like the patch below (the reasong being 
> > that ->dev would be NULL only in cases of singletonish qdiscs) ... 
> > wouldn't that also fix the issue you're seeing? Have to think it 
> > through a little bit more ..
> 
> Ahh, so this has the same effect as previously observed with the other fix.

Thanks a lot for confirming that this fixes the panic. I still have to 
think a little bit more about this though.

> Perhaps it's just a dumping issue, but to the below clsact, there shouldn't
> be pfifo_fast instances appearing.
> 
> # tc qdisc show dev wlp2s0b1
> qdisc mq 0: root
> qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> # tc qdisc add dev wlp2s0b1 clsact
> # tc qdisc show dev wlp2s0b1
> qdisc mq 0: root
> qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc clsact ffff: parent ffff:fff1
> qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

Hmm, no immediate idea where those are coming from, we'll have to figure 
it out. The mq device used here has 4 queues, right?
Daniel Borkmann Aug. 12, 2016, 2:44 p.m. UTC | #3
On 08/12/2016 04:36 PM, Jiri Kosina wrote:
> On Fri, 12 Aug 2016, Daniel Borkmann wrote:
>
>>> I was thinking about something like the patch below (the reasong being
>>> that ->dev would be NULL only in cases of singletonish qdiscs) ...
>>> wouldn't that also fix the issue you're seeing? Have to think it
>>> through a little bit more ..
>>
>> Ahh, so this has the same effect as previously observed with the other fix.
>
> Thanks a lot for confirming that this fixes the panic. I still have to
> think a little bit more about this though.
>
>> Perhaps it's just a dumping issue, but to the below clsact, there shouldn't
>> be pfifo_fast instances appearing.
>>
>> # tc qdisc show dev wlp2s0b1
>> qdisc mq 0: root
>> qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> # tc qdisc add dev wlp2s0b1 clsact
>> # tc qdisc show dev wlp2s0b1
>> qdisc mq 0: root
>> qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> qdisc clsact ffff: parent ffff:fff1
>> qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>
> Hmm, no immediate idea where those are coming from, we'll have to figure
> it out. The mq device used here has 4 queues, right?

Yes, the first tc qdisc show is after boot and how it should normally
look like, so 4 tx queues.

# ls /sys/class/net/wlp2s0b1/queues/
rx-0  tx-0  tx-1  tx-2  tx-3

When adding clsact, only the 'qdisc clsact' line should be extra. Given
the extra pfifo_fast ones look the same as above, I would suspect a htab
dumping issue, perhaps. I can debug a bit later tonight on this.
Cong Wang Aug. 14, 2016, 6:19 a.m. UTC | #4
On Fri, Aug 12, 2016 at 6:53 AM, Jiri Kosina <jikos@kernel.org> wrote:
> On Fri, 12 Aug 2016, Daniel Borkmann wrote:
>
>> This results in below panic. Tested reverting this patch and it fixes
>> the panic.
>>
>> Did you test this also with ingress or clsact qdisc (just try adding
>> it to lo dev for example) ?
>
> Hi Daniel,
>
> thanks for the report. Hmm, I am pretty sure clsact worked for me, but
> I'll recheck.
>
>> What happens is the following in qdisc_match_from_root():
>>
>> [  995.422187] XXX qdisc:ffff88025e4fc800 queue:ffff880262759000
>> dev:ffff880261cc2000 handle:ffff0000
>> [  995.422200] XXX qdisc:ffffffff81cf8100 queue:ffffffff81cf8240 dev:
>> (null) handle:ffff0000
>>
>> I believe this is due to dev_ingress_queue_create() assigning the
>> global noop_qdisc instance as qdisc_sleeping, which later qdisc_lookup()
>> uses for qdisc_match_from_root().
>>
>> But everything that uses things like noop_qdisc cannot work with the
>> new qdisc_match_from_root(), because qdisc_dev(root) will always trigger
>> NULL pointer dereference there. Reason is because the dev is always
>> NULL for noop, it's a singleton, see noop_qdisc and noop_netdev_queue
>> in sch_generic.c.
>>
>> Now how to fix it? Creating separate noop instances each time it's set
>> would be quite a waste of memory. Even fuglier would be to hack a static
>> net device struct into sch_generic.c and let noop_netdev_queue point there
>> to get to the hash table. Or we just not use qdisc_dev().
>
> How about we actually extend a little bit the TCQ_F_BUILTIN special case
> test in qdisc_match_from_root()?
>
> After the change, the only way how qdisc_dev() could be NULL should be a
> TCQ_F_BUILTIN case, right?
>
> I was thinking about something like the patch below (the reasong being
> that ->dev would be NULL only in cases of singletonish qdiscs) ...
> wouldn't that also fix the issue you're seeing? Have to think it through a
> little bit more ..

I think this is probably why we never show noop qdisc in dump. So I think
we should relax the singleton rule for noop_qdisc, to save some code
for noop_qdisc case and also for dumping noop_qdisc.

I will try to work on a patch tomorrow.
Jiri Kosina Aug. 15, 2016, 11:27 p.m. UTC | #5
On Sat, 13 Aug 2016, Cong Wang wrote:

> > How about we actually extend a little bit the TCQ_F_BUILTIN special case
> > test in qdisc_match_from_root()?
> >
> > After the change, the only way how qdisc_dev() could be NULL should be a
> > TCQ_F_BUILTIN case, right?
> >
> > I was thinking about something like the patch below (the reasong being
> > that ->dev would be NULL only in cases of singletonish qdiscs) ...
> > wouldn't that also fix the issue you're seeing? Have to think it through a
> > little bit more ..
> 
> I think this is probably why we never show noop qdisc in dump. 

Well, partially. A lot of 'default' qdiscs are omitted in a not really 
uniform and deterministic way. That's actually the primary point of this 
whole effort -- to get rid of the hidden qdiscs entirely.

> So I think we should relax the singleton rule for noop_qdisc, to save 
> some code for noop_qdisc case and also for dumping noop_qdisc.

Completely moving away from singleton qdiscs is one of the possibilities, 
but OTOH I think that my special-casing of !qdisc_dev(root) in 
qdisc_match_from_root() is correct handling of singletons. I've been 
completely off the grid for the past three days, but I plan to submit this 
as a proper followup fix tomorrow if noone has any objections.

> I will try to work on a patch tomorrow.

What still needs to be looked into are the duplicate clsact entries for 
multiqueue.

Thanks,
diff mbox

Patch

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 25aada7..1c9faed 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -260,6 +260,9 @@  static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
 {
 	struct Qdisc *q;
 
+	if (!qdisc_dev(root))
+		return (root->handle == handle ? root : NULL);
+
 	if (!(root->flags & TCQ_F_BUILTIN) &&
 	    root->handle == handle)
 		return root;