diff mbox series

[net] sch_htb: fix crash on init failure

Message ID 1504029487-7085-1-git-send-email-nikolay@cumulusnetworks.com
State Superseded, archived
Delegated to: David Miller
Headers show
Series [net] sch_htb: fix crash on init failure | expand

Commit Message

Nikolay Aleksandrov Aug. 29, 2017, 5:58 p.m. UTC
The commit below added a call to the ->destroy() callback for all qdiscs
which failed in their ->init(), but some were not prepared for such
change and can't handle partially initialized qdisc. HTB is one of them
and if any error occurs before the qdisc watchdog timer and qdisc work are
initialized then we can hit either a null ptr deref (timer->base) when
canceling in ->destroy or lockdep error info about trying to register
a non-static key and a stack dump. So to fix these two move the watchdog
timer and workqueue init before anything that can err out.
To reproduce userspace needs to send broken htb qdisc create request,
tested with a modified tc (q_htb.c).

Trace log:
[ 2710.897602] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2710.897977] IP: hrtimer_active+0x17/0x8a
[ 2710.898174] PGD 58fab067
[ 2710.898175] P4D 58fab067
[ 2710.898353] PUD 586c0067
[ 2710.898531] PMD 0
[ 2710.898710]
[ 2710.899045] Oops: 0000 [#1] SMP
[ 2710.899232] Modules linked in:
[ 2710.899419] CPU: 1 PID: 950 Comm: tc Not tainted 4.13.0-rc6+ #54
[ 2710.899646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 2710.900035] task: ffff880059ed2700 task.stack: ffff88005ad4c000
[ 2710.900262] RIP: 0010:hrtimer_active+0x17/0x8a
[ 2710.900467] RSP: 0018:ffff88005ad4f960 EFLAGS: 00010246
[ 2710.900684] RAX: 0000000000000000 RBX: ffff88003701e298 RCX: 0000000000000000
[ 2710.900933] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003701e298
[ 2710.901177] RBP: ffff88005ad4f980 R08: 0000000000000001 R09: 0000000000000001
[ 2710.901419] R10: ffff88005ad4f800 R11: 0000000000000400 R12: 0000000000000000
[ 2710.901663] R13: ffff88003701e298 R14: ffffffff822a4540 R15: ffff88005ad4fac0
[ 2710.901907] FS:  00007f2f5e90f740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000
[ 2710.902277] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2710.902500] CR2: 0000000000000000 CR3: 0000000058ca3000 CR4: 00000000000406e0
[ 2710.902744] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2710.902977] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2710.903180] Call Trace:
[ 2710.903332]  hrtimer_try_to_cancel+0x1a/0x93
[ 2710.903504]  hrtimer_cancel+0x15/0x20
[ 2710.903667]  qdisc_watchdog_cancel+0x12/0x14
[ 2710.903866]  htb_destroy+0x2e/0xf7
[ 2710.904097]  qdisc_create+0x377/0x3fd
[ 2710.904330]  tc_modify_qdisc+0x4d2/0x4fd
[ 2710.904511]  rtnetlink_rcv_msg+0x188/0x197
[ 2710.904682]  ? rcu_read_unlock+0x3e/0x5f
[ 2710.904849]  ? rtnl_newlink+0x729/0x729
[ 2710.905017]  netlink_rcv_skb+0x6c/0xce
[ 2710.905183]  rtnetlink_rcv+0x23/0x2a
[ 2710.905345]  netlink_unicast+0x103/0x181
[ 2710.905511]  netlink_sendmsg+0x326/0x337
[ 2710.905679]  sock_sendmsg_nosec+0x14/0x3f
[ 2710.905847]  sock_sendmsg+0x29/0x2e
[ 2710.906010]  ___sys_sendmsg+0x209/0x28b
[ 2710.906176]  ? do_raw_spin_unlock+0xcd/0xf8
[ 2710.906346]  ? _raw_spin_unlock+0x27/0x31
[ 2710.906514]  ? __handle_mm_fault+0x651/0xdb1
[ 2710.906685]  ? check_chain_key+0xb0/0xfd
[ 2710.906855]  __sys_sendmsg+0x45/0x63
[ 2710.907018]  ? __sys_sendmsg+0x45/0x63
[ 2710.907185]  SyS_sendmsg+0x19/0x1b
[ 2710.907344]  entry_SYSCALL_64_fastpath+0x23/0xc2

Note that probably this bug goes further back because the default qdisc
handling always calls ->destroy on init failure too.

Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
Always calling qdisc destroy on init failure in the default qdisc handling
was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
one as fixes tag.

There're more fixes to come, some are much easier to trigger without
modifications to tc.

 net/sched/sch_htb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Eric Dumazet Aug. 29, 2017, 6:09 p.m. UTC | #1
On Tue, 2017-08-29 at 20:58 +0300, Nikolay Aleksandrov wrote:
> The commit below added a call to the ->destroy() callback for all qdiscs
> which failed in their ->init(), but some were not prepared for such
> change and can't handle partially initialized qdisc. HTB is one of them
> and if any error occurs before the qdisc watchdog timer and qdisc work are
> initialized then we can hit either a null ptr deref (timer->base) when
> canceling in ->destroy or lockdep error info about trying to register
> a non-static key and a stack dump. So to fix these two move the watchdog
> timer and workqueue init before anything that can err out.
> To reproduce userspace needs to send broken htb qdisc create request,
> tested with a modified tc (q_htb.c).

> Note that probably this bug goes further back because the default qdisc
> handling always calls ->destroy on init failure too.
> 
> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
> Always calling qdisc destroy on init failure in the default qdisc handling
> was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
> one as fixes tag.

Well, we probably need to audit init/destroy not only in net/sched, but
other parts of networking stack.

What about the qdisc_skb_head_init(&q->direct_queue) call ?

I am surprised you do not crash in __skb_queue_purge(&q->direct_queue);
Cong Wang Aug. 29, 2017, 6:13 p.m. UTC | #2
On Tue, Aug 29, 2017 at 11:09 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2017-08-29 at 20:58 +0300, Nikolay Aleksandrov wrote:
>> The commit below added a call to the ->destroy() callback for all qdiscs
>> which failed in their ->init(), but some were not prepared for such
>> change and can't handle partially initialized qdisc. HTB is one of them
>> and if any error occurs before the qdisc watchdog timer and qdisc work are
>> initialized then we can hit either a null ptr deref (timer->base) when
>> canceling in ->destroy or lockdep error info about trying to register
>> a non-static key and a stack dump. So to fix these two move the watchdog
>> timer and workqueue init before anything that can err out.
>> To reproduce userspace needs to send broken htb qdisc create request,
>> tested with a modified tc (q_htb.c).
>
>> Note that probably this bug goes further back because the default qdisc
>> handling always calls ->destroy on init failure too.
>>
>> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> ---
>> Always calling qdisc destroy on init failure in the default qdisc handling
>> was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
>> one as fixes tag.
>
> Well, we probably need to audit init/destroy not only in net/sched, but
> other parts of networking stack.
>
> What about the qdisc_skb_head_init(&q->direct_queue) call ?

It just zero the pointers:

static inline void qdisc_skb_head_init(struct qdisc_skb_head *qh)
{
        qh->head = NULL;
        qh->tail = NULL;
        qh->qlen = 0;
}

And qdisc is already kzalloc()'ed.
Eric Dumazet Aug. 29, 2017, 6:14 p.m. UTC | #3
On Tue, 2017-08-29 at 11:09 -0700, Eric Dumazet wrote:
> On Tue, 2017-08-29 at 20:58 +0300, Nikolay Aleksandrov wrote:
> > The commit below added a call to the ->destroy() callback for all qdiscs
> > which failed in their ->init(), but some were not prepared for such
> > change and can't handle partially initialized qdisc. HTB is one of them
> > and if any error occurs before the qdisc watchdog timer and qdisc work are
> > initialized then we can hit either a null ptr deref (timer->base) when
> > canceling in ->destroy or lockdep error info about trying to register
> > a non-static key and a stack dump. So to fix these two move the watchdog
> > timer and workqueue init before anything that can err out.
> > To reproduce userspace needs to send broken htb qdisc create request,
> > tested with a modified tc (q_htb.c).
> 
> > Note that probably this bug goes further back because the default qdisc
> > handling always calls ->destroy on init failure too.
> > 
> > Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
> > Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> > ---
> > Always calling qdisc destroy on init failure in the default qdisc handling
> > was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
> > one as fixes tag.
> 
> Well, we probably need to audit init/destroy not only in net/sched, but
> other parts of networking stack.
> 
> What about the qdisc_skb_head_init(&q->direct_queue) call ?
> 
> I am surprised you do not crash in __skb_queue_purge(&q->direct_queue);

Oh, this is because skb_peek() is happy if queue->next is NULL.
Nikolay Aleksandrov Aug. 29, 2017, 6:19 p.m. UTC | #4
On 29/08/17 21:09, Eric Dumazet wrote:
> On Tue, 2017-08-29 at 20:58 +0300, Nikolay Aleksandrov wrote:
>> The commit below added a call to the ->destroy() callback for all qdiscs
>> which failed in their ->init(), but some were not prepared for such
>> change and can't handle partially initialized qdisc. HTB is one of them
>> and if any error occurs before the qdisc watchdog timer and qdisc work are
>> initialized then we can hit either a null ptr deref (timer->base) when
>> canceling in ->destroy or lockdep error info about trying to register
>> a non-static key and a stack dump. So to fix these two move the watchdog
>> timer and workqueue init before anything that can err out.
>> To reproduce userspace needs to send broken htb qdisc create request,
>> tested with a modified tc (q_htb.c).
> 
>> Note that probably this bug goes further back because the default qdisc
>> handling always calls ->destroy on init failure too.
>>
>> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> ---
>> Always calling qdisc destroy on init failure in the default qdisc handling
>> was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
>> one as fixes tag.
> 
> Well, we probably need to audit init/destroy not only in net/sched, but
> other parts of networking stack.
> 

I'm not sure I follow, I hit this while working on a net/sched/ patch and had to error
out in the init() function.

> What about the qdisc_skb_head_init(&q->direct_queue) call ?
> 
> I am surprised you do not crash in __skb_queue_purge(&q->direct_queue);
> 

Hm, do you mean in __qdisc_reset_queue() ? 
I have only tried/seen the crash happen on qdisc add.

A much simpler and easier bug is sch_multiq (ethX non-multiq):
$ tc qdisc add dev ethX root multiq
(error EOPNOTSUPP + double free of ->queues due to free in init())
and e.g. $ ip l add dumdum type dummy
to see a crash due to the corrupted memory.
Cong Wang Aug. 29, 2017, 6:20 p.m. UTC | #5
On Tue, Aug 29, 2017 at 10:58 AM, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
> The commit below added a call to the ->destroy() callback for all qdiscs
> which failed in their ->init(), but some were not prepared for such
> change and can't handle partially initialized qdisc. HTB is one of them
> and if any error occurs before the qdisc watchdog timer and qdisc work are
> initialized then we can hit either a null ptr deref (timer->base) when
> canceling in ->destroy or lockdep error info about trying to register
> a non-static key and a stack dump. So to fix these two move the watchdog
> timer and workqueue init before anything that can err out.
> To reproduce userspace needs to send broken htb qdisc create request,
> tested with a modified tc (q_htb.c).
>
> Trace log:
> [ 2710.897602] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 2710.897977] IP: hrtimer_active+0x17/0x8a
> [ 2710.898174] PGD 58fab067
> [ 2710.898175] P4D 58fab067
> [ 2710.898353] PUD 586c0067
> [ 2710.898531] PMD 0
> [ 2710.898710]
> [ 2710.899045] Oops: 0000 [#1] SMP
> [ 2710.899232] Modules linked in:
> [ 2710.899419] CPU: 1 PID: 950 Comm: tc Not tainted 4.13.0-rc6+ #54
> [ 2710.899646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [ 2710.900035] task: ffff880059ed2700 task.stack: ffff88005ad4c000
> [ 2710.900262] RIP: 0010:hrtimer_active+0x17/0x8a
> [ 2710.900467] RSP: 0018:ffff88005ad4f960 EFLAGS: 00010246
> [ 2710.900684] RAX: 0000000000000000 RBX: ffff88003701e298 RCX: 0000000000000000
> [ 2710.900933] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003701e298
> [ 2710.901177] RBP: ffff88005ad4f980 R08: 0000000000000001 R09: 0000000000000001
> [ 2710.901419] R10: ffff88005ad4f800 R11: 0000000000000400 R12: 0000000000000000
> [ 2710.901663] R13: ffff88003701e298 R14: ffffffff822a4540 R15: ffff88005ad4fac0
> [ 2710.901907] FS:  00007f2f5e90f740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000
> [ 2710.902277] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2710.902500] CR2: 0000000000000000 CR3: 0000000058ca3000 CR4: 00000000000406e0
> [ 2710.902744] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2710.902977] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2710.903180] Call Trace:
> [ 2710.903332]  hrtimer_try_to_cancel+0x1a/0x93
> [ 2710.903504]  hrtimer_cancel+0x15/0x20
> [ 2710.903667]  qdisc_watchdog_cancel+0x12/0x14
> [ 2710.903866]  htb_destroy+0x2e/0xf7
> [ 2710.904097]  qdisc_create+0x377/0x3fd
> [ 2710.904330]  tc_modify_qdisc+0x4d2/0x4fd
> [ 2710.904511]  rtnetlink_rcv_msg+0x188/0x197
> [ 2710.904682]  ? rcu_read_unlock+0x3e/0x5f
> [ 2710.904849]  ? rtnl_newlink+0x729/0x729
> [ 2710.905017]  netlink_rcv_skb+0x6c/0xce
> [ 2710.905183]  rtnetlink_rcv+0x23/0x2a
> [ 2710.905345]  netlink_unicast+0x103/0x181
> [ 2710.905511]  netlink_sendmsg+0x326/0x337
> [ 2710.905679]  sock_sendmsg_nosec+0x14/0x3f
> [ 2710.905847]  sock_sendmsg+0x29/0x2e
> [ 2710.906010]  ___sys_sendmsg+0x209/0x28b
> [ 2710.906176]  ? do_raw_spin_unlock+0xcd/0xf8
> [ 2710.906346]  ? _raw_spin_unlock+0x27/0x31
> [ 2710.906514]  ? __handle_mm_fault+0x651/0xdb1
> [ 2710.906685]  ? check_chain_key+0xb0/0xfd
> [ 2710.906855]  __sys_sendmsg+0x45/0x63
> [ 2710.907018]  ? __sys_sendmsg+0x45/0x63
> [ 2710.907185]  SyS_sendmsg+0x19/0x1b
> [ 2710.907344]  entry_SYSCALL_64_fastpath+0x23/0xc2
>
> Note that probably this bug goes further back because the default qdisc
> handling always calls ->destroy on init failure too.
>
> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
> Always calling qdisc destroy on init failure in the default qdisc handling
> was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
> one as fixes tag.
>
> There're more fixes to come, some are much easier to trigger without
> modifications to tc.

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Eric Dumazet Aug. 29, 2017, 6:23 p.m. UTC | #6
On Tue, 2017-08-29 at 11:13 -0700, Cong Wang wrote:
> On Tue, Aug 29, 2017 at 11:09 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2017-08-29 at 20:58 +0300, Nikolay Aleksandrov wrote:
> >> The commit below added a call to the ->destroy() callback for all qdiscs
> >> which failed in their ->init(), but some were not prepared for such
> >> change and can't handle partially initialized qdisc. HTB is one of them
> >> and if any error occurs before the qdisc watchdog timer and qdisc work are
> >> initialized then we can hit either a null ptr deref (timer->base) when
> >> canceling in ->destroy or lockdep error info about trying to register
> >> a non-static key and a stack dump. So to fix these two move the watchdog
> >> timer and workqueue init before anything that can err out.
> >> To reproduce userspace needs to send broken htb qdisc create request,
> >> tested with a modified tc (q_htb.c).
> >
> >> Note that probably this bug goes further back because the default qdisc
> >> handling always calls ->destroy on init failure too.
> >>
> >> Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
> >> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >> ---
> >> Always calling qdisc destroy on init failure in the default qdisc handling
> >> was added in commit 0fbbeb1ba43b. I'm not sure if I should include that
> >> one as fixes tag.
> >
> > Well, we probably need to audit init/destroy not only in net/sched, but
> > other parts of networking stack.
> >
> > What about the qdisc_skb_head_init(&q->direct_queue) call ?
> 
> It just zero the pointers:
> 
> static inline void qdisc_skb_head_init(struct qdisc_skb_head *qh)
> {
>         qh->head = NULL;
>         qh->tail = NULL;
>         qh->qlen = 0;
> }
> 
> And qdisc is already kzalloc()'ed.

Yeah, I was looking at some old tree, not having the
__qdisc_reset_queue() yet.
David Miller Aug. 29, 2017, 11:06 p.m. UTC | #7
I expect that you will resubmit all of these similar fixes as a patch
series after you have sorted everything out.

Correct?
Nikolay Aleksandrov Aug. 29, 2017, 11:11 p.m. UTC | #8
On 30.08.2017 02:06, David Miller wrote:
> 
> I expect that you will resubmit all of these similar fixes as a patch
> series after you have sorted everything out.
> 
> Correct?
> 

Yes, I will. There are a few more places that need fixing, I'll resubmit
them all as a set.
diff mbox series

Patch

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 5d65ec5207e9..5bf5177b2bd3 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1017,6 +1017,9 @@  static int htb_init(struct Qdisc *sch, struct nlattr *opt)
 	int err;
 	int i;
 
+	qdisc_watchdog_init(&q->watchdog, sch);
+	INIT_WORK(&q->work, htb_work_func);
+
 	if (!opt)
 		return -EINVAL;
 
@@ -1041,8 +1044,6 @@  static int htb_init(struct Qdisc *sch, struct nlattr *opt)
 	for (i = 0; i < TC_HTB_NUMPRIO; i++)
 		INIT_LIST_HEAD(q->drops + i);
 
-	qdisc_watchdog_init(&q->watchdog, sch);
-	INIT_WORK(&q->work, htb_work_func);
 	qdisc_skb_head_init(&q->direct_queue);
 
 	if (tb[TCA_HTB_DIRECT_QLEN])