diff mbox

[net] net/sched: Fix use of wild pointer in mq_destroy() when qdisc_alloc fail

Message ID OF7A403C69.2C35464E-ON48257D7B.002E24AA-48257D7B.002F2A62@zte.com.cn
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

wang.bo116@zte.com.cn Oct. 24, 2014, 8:34 a.m. UTC
Hello:
	In mq_destroy() we should set pointer priv->qdiscs to null after free it.
	When attach_default_qdiscs -> qdisc_create_dflt -> mq_init -> qdisc_create_dflt fail -> qdisc_alloc fail,
mq_destroy() will called twice, the first time called in mq_init, and the second time called by qdisc_destroy -> mq_destroy,
if priv->qdiscs not set null after free, the second time to go into mq_destroy() will use wild pointer, becasuse if(!priv->qdiscs) not work.

The problem happend in my machine when ifconfig alloc memory failed:

ifconfig: page allocation failure. order:0, mode:0xd0, oom_adj:0
[<c0211a00>] (unwind_backtrace+0x0/0xd4) from [<c060dc14>] (dump_stack+0x18/0x1c)
[<c060dc14>] (dump_stack+0x18/0x1c) from [<c02a64f0>] (__alloc_pages_nodemask+0x910/0x9dc)
[<c02a64f0>] (__alloc_pages_nodemask+0x910/0x9dc) from [<c02cf0b4>] (cache_alloc_refill+0x364/0x788)
[<c02cf0b4>] (cache_alloc_refill+0x364/0x788) from [<c02cf7f4>] (__kmalloc+0x134/0x1e8)
[<c02cf7f4>] (__kmalloc+0x134/0x1e8) from [<c054b540>] (qdisc_alloc+0x24/0xbc)
[<c054b540>] (qdisc_alloc+0x24/0xbc) from [<c054b5f8>] (qdisc_create_dflt+0x20/0x60)
[<c054b5f8>] (qdisc_create_dflt+0x20/0x60) from [<c054c008>] (mq_init+0x8c/0xf4)
[<c054c008>] (mq_init+0x8c/0xf4) from [<c054b61c>] (qdisc_create_dflt+0x44/0x60)
[<c054b61c>] (qdisc_create_dflt+0x44/0x60) from [<c054b7b4>] (dev_activate+0xac/0x150)
[<c054b7b4>] (dev_activate+0xac/0x150) from [<c053a298>] (dev_open+0xf0/0x120)
[<c053a298>] (dev_open+0xf0/0x120) from [<c0539e08>] (dev_change_flags+0x94/0x164)
[<c0539e08>] (dev_change_flags+0x94/0x164) from [<c05804d8>] (devinet_ioctl+0x300/0x684)
[<c05804d8>] (devinet_ioctl+0x300/0x684) from [<c0581a4c>] (inet_ioctl+0xd0/0x104)
[<c0581a4c>] (inet_ioctl+0xd0/0x104) from [<c0526d0c>] (sock_ioctl+0x200/0x250)
[<c0526d0c>] (sock_ioctl+0x200/0x250) from [<c02e2010>] (vfs_ioctl+0x34/0xb4)
[<c02e2010>] (vfs_ioctl+0x34/0xb4) from [<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8)
[<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8) from [<c02e2c18>] (sys_ioctl+0x40/0x64)
[<c02e2c18>] (sys_ioctl+0x40/0x64) from [<c0209a60>] (ret_fast_syscall+0x0/0x38)

Unable to handle kernel paging request at virtual address 6b6b6b73
pgd = c1e70000
[6b6b6b73] *pgd=00000000
Internal error: Oops: 15 [#1] PREEMPT
last sysfs file:
Modules linked in:
CPU: 0    Tainted: G        W   (2.6.32.61-EMBSYS-CGEL-4.03.20.P3.F0.B5MAXCNF #2)
PC is at qdisc_destroy+0xc/0xb4
LR is at mq_destroy+0x34/0x60
pc : [<c054b084>]    lr : [<c054bf50>]    psr: 20000213
sp : c191bd80  ip : c191bd98  fp : c191bd94
r10: 00000000  r9 : c191be70  r8 : c1bff40c
r7 : c1c2e000  r6 : c1f3e140  r5 : 00000000  r4 : c1f3e0a0
r3 : f2266ea0  r2 : 00000000  r1 : c1f3e0cc  r0 : 6b6b6b6b
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 12c5387d  Table: 01e70019  DAC: 55555555
Process ifconfig (pid: 391, stack limit = 0xc191a2e8)
Stack: (0xc191bd80 to 0xc191c000)
[<c054b084>] (qdisc_destroy+0xc/0xb4) from [<c054bf50>] (mq_destroy+0x34/0x60)
[<c054bf50>] (mq_destroy+0x34/0x60) from [<c054b0ec>] (qdisc_destroy+0x74/0xb4)
[<c054b0ec>] (qdisc_destroy+0x74/0xb4) from [<c054b62c>] (qdisc_create_dflt+0x54/0x60)
[<c054b62c>] (qdisc_create_dflt+0x54/0x60) from [<c054b7b4>] (dev_activate+0xac/0x150)
[<c054b7b4>] (dev_activate+0xac/0x150) from [<c053a298>] (dev_open+0xf0/0x120)
[<c053a298>] (dev_open+0xf0/0x120) from [<c0539e08>] (dev_change_flags+0x94/0x164)
[<c0539e08>] (dev_change_flags+0x94/0x164) from [<c05804d8>] (devinet_ioctl+0x300/0x684)
[<c05804d8>] (devinet_ioctl+0x300/0x684) from [<c0581a4c>] (inet_ioctl+0xd0/0x104)
[<c0581a4c>] (inet_ioctl+0xd0/0x104) from [<c0526d0c>] (sock_ioctl+0x200/0x250)
[<c0526d0c>] (sock_ioctl+0x200/0x250) from [<c02e2010>] (vfs_ioctl+0x34/0xb4)
[<c02e2010>] (vfs_ioctl+0x34/0xb4) from [<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8)
[<c02e2b6c>] (do_vfs_ioctl+0x56c/0x5d8) from [<c02e2c18>] (sys_ioctl+0x40/0x64)
[<c02e2c18>] (sys_ioctl+0x40/0x64) from [<c0209a60>] (ret_fast_syscall+0x0/0x38)
Code: e89da8f0 e1a0c00d e92dd830 e24cb004 (e5903008)
---[ end trace 8e66b5118c0bea77 ]---
Kernel panic - not syncing: Fatal exception

--------------------------------------------------------------------------------

This patch  fix this problem, base on linux 3.18-rc-1:

Signed-off-by: Wang Bo <wang.bo116@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

John Fastabend Oct. 24, 2014, 5:49 p.m. UTC | #1
On 10/24/2014 01:34 AM, wang.bo116@zte.com.cn wrote:
>

[...]

>
> --------------------------------------------------------------------------------
>
> This patch  fix this problem, base on linux 3.18-rc-1:
>
> Signed-off-by: Wang Bo <wang.bo116@zte.com.cn>
> Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
> diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
> index 42f72f1..a0c90e7 100755
> --- a/net/sched/sch_mq.c
> +++ b/net/sched/sch_mq.c
> @@ -33,6 +33,7 @@ static void mq_destroy(struct Qdisc *sch)
>   	for (ntx = 0; ntx < dev->num_tx_queues && priv->qdiscs[ntx]; ntx++)
>   		qdisc_destroy(priv->qdiscs[ntx]);
>   	kfree(priv->qdiscs);
> +	priv->qdiscs = NULL;
>   }
>
>   static int mq_init(struct Qdisc *sch, struct nlattr *opt)
>

Acked-by: John Fastabend <john.r.fastabend@intel.com>

Patch looks fine, another way to fix this would be drop the
mq_destroy() call in the error path. I'm not convinced one
is any better than the other but maybe some other folks have
opinions, it seems a bit wrong to call mq_destroy twice so in
that sense it may be a bit nicer to drop the mq_destroy().

Also same bug in mqprio do you want to submit a patch for
that qdisc as well? Otherwise I can.

Thanks!
Cong Wang Oct. 24, 2014, 6:13 p.m. UTC | #2
On Fri, Oct 24, 2014 at 10:49 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Patch looks fine, another way to fix this would be drop the
> mq_destroy() call in the error path. I'm not convinced one
> is any better than the other but maybe some other folks have
> opinions, it seems a bit wrong to call mq_destroy twice so in
> that sense it may be a bit nicer to drop the mq_destroy().

Dropping mq_destroy() in error path is indeed better,
because upper layer does cleanup intentionally.
Look at what other qdisc's do. :)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Oct. 24, 2014, 6:58 p.m. UTC | #3
On Fri, Oct 24, 2014 at 11:13 AM, Cong Wang <cwang@twopensource.com> wrote:
> On Fri, Oct 24, 2014 at 10:49 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>>
>> Patch looks fine, another way to fix this would be drop the
>> mq_destroy() call in the error path. I'm not convinced one
>> is any better than the other but maybe some other folks have
>> opinions, it seems a bit wrong to call mq_destroy twice so in
>> that sense it may be a bit nicer to drop the mq_destroy().
>
> Dropping mq_destroy() in error path is indeed better,
> because upper layer does cleanup intentionally.
> Look at what other qdisc's do. :)


Anyway, I am fixing other qdisc's now, will leave mq and
mq_prio to you.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Oct. 24, 2014, 7:14 p.m. UTC | #4
On Fri, Oct 24, 2014 at 11:13:56AM -0700, Cong Wang wrote:
> On Fri, Oct 24, 2014 at 10:49 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > Patch looks fine, another way to fix this would be drop the
> > mq_destroy() call in the error path. I'm not convinced one
> > is any better than the other but maybe some other folks have
> > opinions, it seems a bit wrong to call mq_destroy twice so in
> > that sense it may be a bit nicer to drop the mq_destroy().
> 
> Dropping mq_destroy() in error path is indeed better,
> because upper layer does cleanup intentionally.
> Look at what other qdisc's do. :)

I would argue that the qdisc_destroy() call in qdisc_create_dflt()
is wrong, it should instead free the qdisc and release the module
reference manually as done in qdisc_create().

qdisc_destroy() should only be called for fully initialized qdiscs.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Oct. 24, 2014, 8:52 p.m. UTC | #5
On Fri, Oct 24, 2014 at 12:14 PM, Patrick McHardy <kaber@trash.net> wrote:
> On Fri, Oct 24, 2014 at 11:13:56AM -0700, Cong Wang wrote:
>> On Fri, Oct 24, 2014 at 10:49 AM, John Fastabend
>> <john.fastabend@gmail.com> wrote:
>> >
>> > Patch looks fine, another way to fix this would be drop the
>> > mq_destroy() call in the error path. I'm not convinced one
>> > is any better than the other but maybe some other folks have
>> > opinions, it seems a bit wrong to call mq_destroy twice so in
>> > that sense it may be a bit nicer to drop the mq_destroy().
>>
>> Dropping mq_destroy() in error path is indeed better,
>> because upper layer does cleanup intentionally.
>> Look at what other qdisc's do. :)
>
> I would argue that the qdisc_destroy() call in qdisc_create_dflt()
> is wrong, it should instead free the qdisc and release the module
> reference manually as done in qdisc_create().
>
> qdisc_destroy() should only be called for fully initialized qdiscs.

Probably, but at least ->destroy() should be called, looking at
those calling qdisc_watchdog_init(), they are supposed to call
qdisc_watchdog_cancel() when >init() fails after that.

->destroy() is supposed to be able to clean up even partially
initialized qdisc's. So, for qdisc_create_dflt() we should probably
just call ->destroy().

Reading the code again, seems it is inconsistent with qdisc_create(),
where ->destroy() is skipped when ->init() fails. Hmm, we have
a bigger problem here.

I am working on a patch now. Thanks for pointing this out.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Oct. 24, 2014, 9:45 p.m. UTC | #6
On Fri, Oct 24, 2014 at 01:52:00PM -0700, Cong Wang wrote:
> On Fri, Oct 24, 2014 at 12:14 PM, Patrick McHardy <kaber@trash.net> wrote:
> > On Fri, Oct 24, 2014 at 11:13:56AM -0700, Cong Wang wrote:
> >> On Fri, Oct 24, 2014 at 10:49 AM, John Fastabend
> >> <john.fastabend@gmail.com> wrote:
> >> >
> >> > Patch looks fine, another way to fix this would be drop the
> >> > mq_destroy() call in the error path. I'm not convinced one
> >> > is any better than the other but maybe some other folks have
> >> > opinions, it seems a bit wrong to call mq_destroy twice so in
> >> > that sense it may be a bit nicer to drop the mq_destroy().
> >>
> >> Dropping mq_destroy() in error path is indeed better,
> >> because upper layer does cleanup intentionally.
> >> Look at what other qdisc's do. :)
> >
> > I would argue that the qdisc_destroy() call in qdisc_create_dflt()
> > is wrong, it should instead free the qdisc and release the module
> > reference manually as done in qdisc_create().
> >
> > qdisc_destroy() should only be called for fully initialized qdiscs.
> 
> Probably, but at least ->destroy() should be called, looking at
> those calling qdisc_watchdog_init(), they are supposed to call
> qdisc_watchdog_cancel() when >init() fails after that.

In which cases does it actually fail after that? Usually this is
called once initialization is complete.

> ->destroy() is supposed to be able to clean up even partially
> initialized qdisc's. So, for qdisc_create_dflt() we should probably
> just call ->destroy().

No, why do you think that? ->init() is supposed to clean up after
itself if it fails, both qdisc_destroy() and ->destroy are only
supposed to be called after ->init() succeeded.

Its simply symetrical, as everywhere else in the kernel. If a sub-init
funtion fails, it should clean up and return an error. We don't
destroy things we've never successfully initialized, they're supposed
to clean up after themselves.

> Reading the code again, seems it is inconsistent with qdisc_create(),
> where ->destroy() is skipped when ->init() fails. Hmm, we have
> a bigger problem here.
> 
> I am working on a patch now. Thanks for pointing this out.

No, that is not inconsistent, it is consistent with what we're doing
everywhere else in the kernel, what the qdisc layer has always done
and how qdiscs expect it.

Where are you seeing a problem now? Its a simple fix:
qdisc_create_dflt() should behave similar to qdisc_create().

If any ->init() functions than don't properly clean up on error, that's
an additional bug.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Oct. 24, 2014, 10:17 p.m. UTC | #7
On Fri, Oct 24, 2014 at 2:45 PM, Patrick McHardy <kaber@trash.net> wrote:
> On Fri, Oct 24, 2014 at 01:52:00PM -0700, Cong Wang wrote:
>> On Fri, Oct 24, 2014 at 12:14 PM, Patrick McHardy <kaber@trash.net> wrote:
>> > On Fri, Oct 24, 2014 at 11:13:56AM -0700, Cong Wang wrote:
>> >> On Fri, Oct 24, 2014 at 10:49 AM, John Fastabend
>> >> <john.fastabend@gmail.com> wrote:
>> >> >
>> >> > Patch looks fine, another way to fix this would be drop the
>> >> > mq_destroy() call in the error path. I'm not convinced one
>> >> > is any better than the other but maybe some other folks have
>> >> > opinions, it seems a bit wrong to call mq_destroy twice so in
>> >> > that sense it may be a bit nicer to drop the mq_destroy().
>> >>
>> >> Dropping mq_destroy() in error path is indeed better,
>> >> because upper layer does cleanup intentionally.
>> >> Look at what other qdisc's do. :)
>> >
>> > I would argue that the qdisc_destroy() call in qdisc_create_dflt()
>> > is wrong, it should instead free the qdisc and release the module
>> > reference manually as done in qdisc_create().
>> >
>> > qdisc_destroy() should only be called for fully initialized qdiscs.
>>
>> Probably, but at least ->destroy() should be called, looking at
>> those calling qdisc_watchdog_init(), they are supposed to call
>> qdisc_watchdog_cancel() when >init() fails after that.
>
> In which cases does it actually fail after that? Usually this is
> called once initialization is complete.

How about tbf_change() in tbf_init()? If tbf_change() fails,
watchdog is still there if we don't call ->destroy(). Yes,
I know the timer is started, the point is we do miss something
clean up, even trivial.

tbf is not the only one who calls xxx_change() in xxx_init().

>
>> ->destroy() is supposed to be able to clean up even partially
>> initialized qdisc's. So, for qdisc_create_dflt() we should probably
>> just call ->destroy().
>
> No, why do you think that? ->init() is supposed to clean up after
> itself if it fails, both qdisc_destroy() and ->destroy are only
> supposed to be called after ->init() succeeded.


Why? This would end up with calling xxx_destroy() in every
failure path in ->init(). Moving it up to caller makes the code cleaner
and smaller.

>
> Its simply symetrical, as everywhere else in the kernel. If a sub-init
> funtion fails, it should clean up and return an error. We don't
> destroy things we've never successfully initialized, they're supposed
> to clean up after themselves.

Most (if not all) ->destroy() are able to clean partially initialized qdisc,
I don't see why it  could be a problem here.

We don't have to keep with other kernel subsystem as long as it makes
sense, net_sched subsystem is pretty much self-contained.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Oct. 25, 2014, 12:33 a.m. UTC | #8
On Fri, Oct 24, 2014 at 03:17:41PM -0700, Cong Wang wrote:
> On Fri, Oct 24, 2014 at 2:45 PM, Patrick McHardy <kaber@trash.net> wrote:
> >> >
> >> > I would argue that the qdisc_destroy() call in qdisc_create_dflt()
> >> > is wrong, it should instead free the qdisc and release the module
> >> > reference manually as done in qdisc_create().
> >> >
> >> > qdisc_destroy() should only be called for fully initialized qdiscs.
> >>
> >> Probably, but at least ->destroy() should be called, looking at
> >> those calling qdisc_watchdog_init(), they are supposed to call
> >> qdisc_watchdog_cancel() when >init() fails after that.
> >
> > In which cases does it actually fail after that? Usually this is
> > called once initialization is complete.
> 
> How about tbf_change() in tbf_init()? If tbf_change() fails,
> watchdog is still there if we don't call ->destroy(). Yes,
> I know the timer is started, the point is we do miss something
> clean up, even trivial.
> 
> tbf is not the only one who calls xxx_change() in xxx_init().

Then these are bugs. On failure we exit with the same state as we
entered, there's nothing new about that.

This in fact is *no* bug though since qdisc_watchdog_init() merely
initializes the timer and doesn't require cleanup.

> > Its simply symetrical, as everywhere else in the kernel. If a sub-init
> > funtion fails, it should clean up and return an error. We don't
> > destroy things we've never successfully initialized, they're supposed
> > to clean up after themselves.
> 
> Most (if not all) ->destroy() are able to clean partially initialized qdisc,
> I don't see why it  could be a problem here.
> 
> We don't have to keep with other kernel subsystem as long as it makes
> sense, net_sched subsystem is pretty much self-contained.

Its about having a sane API.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang Oct. 25, 2014, 12:57 a.m. UTC | #9
On Fri, Oct 24, 2014 at 5:33 PM, Patrick McHardy <kaber@trash.net> wrote:
>
> Its about having a sane API.

I don't see why calling ->destroy() on failure is not sane in qdisc case.
I never want to argue general case.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Oct. 25, 2014, 1:33 a.m. UTC | #10
On Fri, Oct 24, 2014 at 05:57:59PM -0700, Cong Wang wrote:
> On Fri, Oct 24, 2014 at 5:33 PM, Patrick McHardy <kaber@trash.net> wrote:
> >
> > Its about having a sane API.
> 
> I don't see why calling ->destroy() on failure is not sane in qdisc case.
> I never want to argue general case.

Because it makes things more complicated. You need to keep track of what
was actually initialized since you can't assume a consistent state in
->destroy() anymore. If ->init() fails, it knows where it failed,
->destroy() can't know that.

Look at htb_destroy() for an example. It starts with

	cancel_work_sync(&q->work);

Was that actually initialized and can be cancled? You don't know.
Next comes

	qdisc_watchdog_cancel(&q->watchdog);

Same here, if the error happened before it was initialized, crash.

These are just the first two lines. You get the problem.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 42f72f1..a0c90e7 100755
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -33,6 +33,7 @@  static void mq_destroy(struct Qdisc *sch)
 	for (ntx = 0; ntx < dev->num_tx_queues && priv->qdiscs[ntx]; ntx++)
 		qdisc_destroy(priv->qdiscs[ntx]);
 	kfree(priv->qdiscs);
+	priv->qdiscs = NULL;
 }

 static int mq_init(struct Qdisc *sch, struct nlattr *opt)