
netfilter:bridge: Hold bridge dev for fake_rtable to avoid the dangling pointer

Message ID 20190402125609.30313-1-rdong.ge@gmail.com
State Awaiting Upstream
Delegated to: David Miller
Series netfilter:bridge: Hold bridge dev for fake_rtable to avoid the dangling pointer

Commit Message

Rundong Ge April 2, 2019, 12:56 p.m. UTC
Problem:
When bridge-nf-call-iptables is enabled, skb_dst(skb) of packets that are
in the nfqueue may be a dangling pointer if the user deletes the bridge.
This is because br_nf_pre_routing_finish sets the dst pointer of packets
passing through it to br->fake_rtable, but the br struct is freed without
checking whether these skbs still reference it.
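To make the lifetime problem concrete, here is a minimal userspace model (not kernel code). All struct and function names are hypothetical stand-ins. The one fact carried over from the description is that br->fake_rtable is embedded in the bridge structure, so freeing the bridge ends the dst's lifetime no matter who still points at it. A `freed` flag replaces the real free so the stale pointer can be observed without undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: model_dst ~ struct dst_entry,
 * model_bridge ~ struct net_bridge, model_skb ~ struct sk_buff. */
struct model_dst {
	int refcnt;                    /* models dst->__refcnt */
};

struct model_bridge {
	bool freed;                    /* flag instead of a real free, so a stale
	                                  pointer can be detected without UB */
	struct model_dst fake_rtable;  /* embedded in the bridge, like br->fake_rtable */
};

struct model_skb {
	struct model_dst *dst;         /* models the pointer kept in skb->_skb_refdst */
	bool noref;                    /* models the SKB_DST_NOREF bit */
};

/* Models skb_dst_set_noref(): records the dst but takes no reference. */
static void model_dst_set_noref(struct model_skb *skb, struct model_dst *dst)
{
	skb->dst = dst;
	skb->noref = true;
}

/* Models deleting the bridge: nothing checks for skbs that still
 * point at the embedded fake_rtable. */
static void model_bridge_delete(struct model_bridge *br)
{
	assert(!br->freed);
	br->freed = true;
}

/* True when the skb's dst points into a bridge that no longer exists. */
static bool model_dst_is_dangling(const struct model_skb *skb,
				  const struct model_bridge *br)
{
	return skb->dst == &br->fake_rtable && br->freed;
}
```

In this model the skb's pointer stays unchanged across the delete, and nothing in the delete path consults it; that is the gap the patch tries to close.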

User impact:
A kernel panic may happen when the user deletes the bridge while traffic
continuously goes through the nfqueue.
Here is a panic on my device, which runs kernel v3.10.

general protection fault: 0000 [#1] SMP
task: ffff880158418000 ti: ffff88011aeec000 task.ti: ffff88011aeec000
RIP: 0010:[<ffffffff8133a83f>] [<ffffffff8133a83f>] __percpu_counter_add+0xf/0x70
RSP: 0000:ffff88017fc83e20 EFLAGS: 00010206
RAX: ffff88011aeeffd8 RBX: ff0b900200000080 RCX: ffff88017fc901a0
RDX: 0000000000000020 RSI: ffffffffffffffff RDI: ff0b900200000080
RBP: ffff88017fc83e38 R08: ffff88015b5b1100 R09: ffff88017fc901a0
R10: 0000000000000000 R11: ffff88017fc83da0 R12: 0000000bfd80400a
R13: ffffffffffffffff R14: 0000000000000000 R15: ffff88017fc901c0
FS: 00007fcfe17d2700(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa3fbdf0ec0 CR3: 0000000159eba000 CR4: 00000000003407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88015b5b1100 0000000bfd80400a ff0b900200000000 ffff88017fc83e60
ffffffff8157be3a ffffffff81a3a580 000000000000000a 0000000000000000
ffff88017fc83e70 ffffffff8157c0be ffff88017fc83ed0 ffffffff8113977d
Call Trace:
<IRQ>
[<ffffffff8157be3a>] dst_destroy+0xfa/0x120
[<ffffffff8157c0be>] dst_destroy_rcu+0xe/0x20
[<ffffffff8113977d>] rcu_process_callbacks+0x1dd/0x550
[<ffffffff8108f2cf>] __do_softirq+0xef/0x280
[<ffffffff816b1adc>] call_softirq+0x1c/0x30
[<ffffffff8102d365>] do_softirq+0x65/0xa0
[<ffffffff8108f665>] irq_exit+0x115/0x120
[<ffffffff816b2755>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff816b0c9d>] apic_timer_interrupt+0x6d/0x80
<EOI>
[<ffffffff816b016b>] ? sysret_audit+0x17/0x21
RIP [<ffffffff8133a83f>] __percpu_counter_add+0xf/0x70
RSP <ffff88017fc83e20>

Solution:
Hold the bridge dev until no dst references to its fake_rtable remain.

Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
---
 net/bridge/br_if.c              |  3 +++
 net/bridge/br_netfilter_hooks.c |  3 ++-
 net/bridge/br_netfilter_ipv6.c  |  3 ++-
 net/bridge/br_nf_core.c         |  1 +
 net/core/dst.c                  | 13 ++++++++++++-
 5 files changed, 20 insertions(+), 3 deletions(-)

Comments

Pablo Neira Ayuso April 3, 2019, 5:44 p.m. UTC | #1
On Tue, Apr 02, 2019 at 12:56:09PM +0000, Rundong Ge wrote:
> Problem:
> When bridge-nf-call-iptables is enabled, skb_dst(skb) of packets that
> in the nfqueue may be a dangling pointer if user delete the bridge.
> Because packets go through the br_nf_pre_routing_finish will set the dst
> pointer to the br->fake_rtable. But the br struct will be freed
> without the reference check for these skbs.
> 
> User impact:
> Kernel panic may happen when user delete the bridge if there are
> continuous traffics go through the nfqueue.
> Here is a panic in my device which using kernel v3.10.

This kernel is _very old_.

Could you provide the steps to reproduce this issue?

Holding the device doesn't seem the way to go to me; we have a
netdevice_notifier in nfnetlink_queue that drops packets for an interface
that is gone. We also drop packets whenever a hook is gone.

So I wonder if this is still a problem in mainline kernels.

Rundong Ge April 9, 2019, 6:33 a.m. UTC | #2
Hi Pablo

I've tested on mainline v5.0. The dangling pointer access of fake_rtable
still exists.

My environment is: client0--box--client1.
The box runs Ubuntu with a v5.0 kernel, with the br_netfilter and
nfnetlink_queue modules loaded.

Reproduce steps:
1. Create a bridge on the box.
2. echo 1 >/proc/sys/net/bridge/bridge-nf-call-iptables
3. Add a netfilter hook function that queues the packets to nfqueue number 0.
   The hook point must be between <NF_BR_PRE_ROUTING,NF_BR_PRI_BRNF> and
   <NF_BR_FORWARD,NF_BR_PRI_BRNF - 1>.
4. Add a userspace process "nfqueue_rcv" that continuously reads packets
   from queue 0 and sets the verdict "NF_ACCEPT" on them.
5. Continuously ping client1 from client0.
6. Send "Ctrl + Z" to pause "nfqueue_rcv" to simulate queue congestion.
7. Use "ifconfig br0 down && brctl delbr br0" to delete the bridge.
8. At this point the _skb_refdst of the skbs in the nfqueue becomes a
   dangling pointer. If we send "fg" to resume "nfqueue_rcv", the kernel
   may try to access the freed memory.
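Step 3's verdict and placement constraint can be written down as a small userspace check. The constant values are reproduced from the kernel uapi headers (linux/netfilter.h and linux/netfilter_bridge.h) so the sketch builds without them; verify them against your tree. `queue_verdict()` and `in_vulnerable_window()` are hypothetical helpers that encode the window stated in step 3 for the forwarded path only, and treating both boundaries as exclusive is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Reproduced from the uapi headers so this builds standalone. */
#define NF_QUEUE            3
#define NF_VERDICT_QMASK    0xffff0000
#define NF_VERDICT_QBITS    16
#define NF_QUEUE_NR(x)      ((((x) << NF_VERDICT_QBITS) & NF_VERDICT_QMASK) | NF_QUEUE)

#define NF_BR_PRE_ROUTING   0
#define NF_BR_FORWARD       2
#define NF_BR_PRI_BRNF      0

/* Hypothetical helper: the verdict a step-3 hook returns to send every
 * packet to the given queue number. */
static unsigned int queue_verdict(unsigned int queue_num)
{
	assert(queue_num <= 0xffff);   /* queue number lives in the upper 16 bits */
	return NF_QUEUE_NR(queue_num);
}

/* Hypothetical helper: true if a bridge hook at (hooknum, priority) sees
 * forwarded skbs after br_nf_pre_routing_finish() pointed their dst at the
 * bridge's fake_rtable, but before br_nf_forward_ip() set state.out to the
 * bridge device. Boundaries are treated as exclusive (an assumption). */
static bool in_vulnerable_window(int hooknum, int priority)
{
	if (hooknum == NF_BR_PRE_ROUTING)
		return priority > NF_BR_PRI_BRNF;
	if (hooknum == NF_BR_FORWARD)
		return priority < NF_BR_PRI_BRNF - 1;
	return false;
}
```

In an actual kernel module, the step-3 hook would be a struct nf_hook_ops with .pf = NFPROTO_BRIDGE, a .hooknum/.priority pair for which the check above holds (for example NF_BR_PRE_ROUTING and NF_BR_PRI_BRNF + 1), and a hook function that returns NF_QUEUE_NR(0).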

Debug log:
I added debug logs in "netdev_freemem" and "dst_release" to demonstrate
the freed memory access. As the log shows, "dst_release" accessed the
bridge's fake_rtable after the bridge had been freed.

Apr  8 22:25:14 raydon kernel: [62139.005062] netdev_freemem name:br0, fake_rtable:000000009d76cef0
Apr  8 22:25:21 raydon kernel: [62145.967133] dst_release dst:000000009d76cef0 dst->dev->name: řKU¡TH
Apr  8 22:25:21 raydon kernel: [62145.967154] dst_release dst:000000009d76cef0 dst->dev->name: řKU¡TH
Apr  8 22:25:21 raydon kernel: [62145.967180] dst_release dst:000000009d76cef0 dst->dev->name: řKU¡TH
Apr  8 22:25:21 raydon kernel: [62145.967197] dst_release dst:000000009d76cef0 dst->dev->name: řKU¡TH

The hook point must be after <NF_BR_PRE_ROUTING,NF_BR_PRI_BRNF> because
skbs start referencing the bridge's fake_rtable in
"br_nf_pre_routing_finish", which runs from the hook at
<NF_BR_PRE_ROUTING,NF_BR_PRI_BRNF>.

It must be before <NF_BR_FORWARD,NF_BR_PRI_BRNF - 1> because
"br_nf_forward_ip" sets state.out to the bridge dev. After that hook
point, "nfqnl_dev_drop", triggered by the bridge's NETDEV_DOWN event, can
flush the queued skbs before the bridge's memory is freed, because
state.out now matches the bridge's dev.

So the root cause is that "nfqnl_dev_drop" does not flush skbs queued
between <NF_BR_PRE_ROUTING,NF_BR_PRI_BRNF> and <NF_BR_FORWARD,
NF_BR_PRI_BRNF - 1>.
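The matching gap described above can be modeled in userspace. This is a simplified sketch loosely based on dev_cmp() in net/netfilter/nfnetlink_queue.c, which nfqnl_dev_drop() uses to decide which queued entries to flush; the struct, field names and ifindex values are hypothetical. The second function is an illustrative extension only, not the author's follow-up patch: it additionally matches entries whose dst still refers to the departing device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a queued packet.
 * An ifindex of 0 means "no device". */
struct model_entry {
	int in_ifindex;       /* models entry->state.in */
	int out_ifindex;      /* models entry->state.out */
	int physin_ifindex;   /* models the br_netfilter physindev */
	int physout_ifindex;  /* models the br_netfilter physoutdev */
	int dst_dev_ifindex;  /* models skb_dst(skb)->dev (the bridge for fake_rtable) */
};

/* Loosely modeled on dev_cmp(): match only on in/out and physin/physout. */
static bool model_dev_cmp(const struct model_entry *e, int ifindex)
{
	assert(ifindex > 0);
	return e->physin_ifindex == ifindex || e->physout_ifindex == ifindex ||
	       e->in_ifindex == ifindex || e->out_ifindex == ifindex;
}

/* Hypothetical extension: also flush entries whose dst still points at
 * the device that is going away. */
static bool model_dev_cmp_with_dst(const struct model_entry *e, int ifindex)
{
	return model_dev_cmp(e, ifindex) || e->dst_dev_ifindex == ifindex;
}
```

With hypothetical ifindexes eth0 = 2 and br0 = 10, an entry queued in the window (state.in is the port, state.out is unset, dst is the bridge's fake_rtable) is not matched by `model_dev_cmp(e, 10)`, while an entry queued after br_nf_forward_ip (state.out is the bridge) is.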

As you mentioned, holding the bridge dev for these skbs is not a proper
solution. I will send another patch that lets "nfqnl_dev_drop" flush
these skbs.

Thanks
Rundong


Rundong Ge April 18, 2019, 9:58 a.m. UTC | #3
friendly ping


Patch

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 41f0a69..21948bd 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -384,6 +384,9 @@  void br_dev_delete(struct net_device *dev, struct list_head *head)
 	cancel_delayed_work_sync(&br->gc_work);
 
 	br_sysfs_delbr(br->dev);
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+	dst_release(&br->fake_rtable.dst);
+#endif
 	unregister_netdevice_queue(br->dev, head);
 }
 
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 22afa56..3683f0f 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -401,7 +401,8 @@  static int br_nf_pre_routing_finish(struct net *net, struct sock *sk, struct sk_
 			kfree_skb(skb);
 			return 0;
 		}
-		skb_dst_set_noref(skb, &rt->dst);
+		skb_dst_set(skb, &rt->dst);
+		dst_hold(&rt->dst);
 	}
 
 	skb->dev = nf_bridge->physindev;
diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
index e88d664..425b11a 100644
--- a/net/bridge/br_netfilter_ipv6.c
+++ b/net/bridge/br_netfilter_ipv6.c
@@ -201,7 +201,8 @@  static int br_nf_pre_routing_finish_ipv6(struct net *net, struct sock *sk, struc
 			kfree_skb(skb);
 			return 0;
 		}
-		skb_dst_set_noref(skb, &rt->dst);
+		skb_dst_set(skb, &rt->dst);
+		dst_hold(&rt->dst);
 	}
 
 	skb->dev = nf_bridge->physindev;
diff --git a/net/bridge/br_nf_core.c b/net/bridge/br_nf_core.c
index 8e2d7cf..6543c3c 100644
--- a/net/bridge/br_nf_core.c
+++ b/net/bridge/br_nf_core.c
@@ -81,6 +81,7 @@  void br_netfilter_rtable_init(struct net_bridge *br)
 	dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
 	rt->dst.flags	= DST_NOXFRM | DST_FAKE_RTABLE;
 	rt->dst.ops = &fake_dst_ops;
+	dev_hold(br->dev);
 }
 
 int __init br_nf_core_init(void)
diff --git a/net/core/dst.c b/net/core/dst.c
index a263309..0e6f2a2 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -186,13 +186,24 @@  void dst_release(struct dst_entry *dst)
 {
 	if (dst) {
 		int newrefcnt;
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+		unsigned short fakertable = dst->flags & DST_FAKE_RTABLE;
+#endif
 
 		newrefcnt = atomic_dec_return(&dst->__refcnt);
 		if (unlikely(newrefcnt < 0))
 			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
 					     __func__, dst, newrefcnt);
-		if (!newrefcnt)
+		if (!newrefcnt) {
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+			if (fakertable) {
+				if (dst->dev)
+					dev_put(dst->dev);
+				return;
+			}
+#endif
 			call_rcu(&dst->rcu_head, dst_destroy_rcu);
+		}
 	}
 }
 EXPORT_SYMBOL(dst_release);
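As a rough sketch of the lifetime chain this patch tries to set up (skb holds fake_rtable, fake_rtable holds the bridge net_device), here is a userspace model. All names are hypothetical. It simplifies the kernel: unregistering a device waits for its reference count to drop rather than freeing on the last put, and the model collapses both into a `freed` flag. Non-fake dsts, which the patched dst_release() still hands to call_rcu(), are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bridge's net_device. */
struct model_dev {
	int refcnt;   /* models dev_hold()/dev_put() accounting */
	bool freed;   /* set when the last device reference goes away */
};

/* Hypothetical stand-in for br->fake_rtable.dst. */
struct model_dst {
	int refcnt;              /* models dst->__refcnt */
	bool fake_rtable;        /* models DST_FAKE_RTABLE */
	struct model_dev *dev;   /* models dst->dev */
};

static void model_dev_hold(struct model_dev *dev)
{
	dev->refcnt++;
}

static void model_dev_put(struct model_dev *dev)
{
	assert(dev->refcnt > 0);
	if (--dev->refcnt == 0)
		dev->freed = true;
}

/* Models br_netfilter_rtable_init(): initial dst refcount of 1, plus the
 * dev_hold(br->dev) this patch adds. */
static void model_rtable_init(struct model_dst *rt, struct model_dev *br_dev)
{
	rt->refcnt = 1;
	rt->fake_rtable = true;
	rt->dev = br_dev;
	model_dev_hold(br_dev);
}

/* Models skb_dst_set() + dst_hold() in br_nf_pre_routing_finish(). */
static void model_dst_hold(struct model_dst *dst)
{
	dst->refcnt++;
}

/* Models the patched dst_release(): a fake rtable is never RCU-freed;
 * dropping its last reference drops the bridge device reference instead. */
static void model_dst_release(struct model_dst *dst)
{
	int newrefcnt = --dst->refcnt;

	assert(newrefcnt >= 0);
	if (newrefcnt == 0 && dst->fake_rtable && dst->dev)
		model_dev_put(dst->dev);
}
```

In a scenario with two queued skbs, the bridge's memory stays valid after br_dev_delete() and unregistration until the second skb releases its dst. The same model shows the trade-off: the bridge cannot go away while skbs sit in a stalled queue, which is why the thread moves toward flushing those skbs in nfqnl_dev_drop instead.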