[SRU,Bionic] ip: frags: fix crash in ip_do_fragment()
diff mbox series

Message ID 20190904174049.27082-1-cascardo@canonical.com
State New
Headers show
Series
  • [SRU,Bionic] ip: frags: fix crash in ip_do_fragment()
Related show

Commit Message

Thadeu Lima de Souza Cascardo Sept. 4, 2019, 5:40 p.m. UTC
From: Taehee Yoo <ap420073@gmail.com>

BugLink: https://bugs.launchpad.net/bugs/1842447

commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream

A kernel crash occurrs when defragmented packet is fragmented
in ip_do_fragment().
In defragment routine, skb_orphan() is called and
skb->ip_defrag_offset is set. but skb->sk and
skb->ip_defrag_offset are same union member. so that
frag->sk is not NULL.
Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
defragmented packet is fragmented.

test commands:
   %iptables -t nat -I POSTROUTING -j MASQUERADE
   %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000

splat looks like:
[  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
[  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
[  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
[  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
[  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
[  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
[  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
[  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
[  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
[  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
[  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
[  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
[  261.198158] Call Trace:
[  261.199018]  ? dst_output+0x180/0x180
[  261.205011]  ? save_trace+0x300/0x300
[  261.209018]  ? ip_copy_metadata+0xb00/0xb00
[  261.213034]  ? sched_clock_local+0xd4/0x140
[  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
[  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
[  261.227014]  ? find_held_lock+0x39/0x1c0
[  261.233008]  ip_finish_output+0x51d/0xb50
[  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
[  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
[  261.250152]  ? rcu_is_watching+0x77/0x120
[  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
[  261.261033]  ? nf_hook_slow+0xb1/0x160
[  261.265007]  ip_output+0x1c7/0x710
[  261.269005]  ? ip_mc_output+0x13f0/0x13f0
[  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
[  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
[  261.282996]  ? nf_hook_slow+0xb1/0x160
[  261.287007]  raw_sendmsg+0x21f9/0x4420
[  261.291008]  ? dst_output+0x180/0x180
[  261.297003]  ? sched_clock_cpu+0x126/0x170
[  261.301003]  ? find_held_lock+0x39/0x1c0
[  261.306155]  ? stop_critical_timings+0x420/0x420
[  261.311004]  ? check_flags.part.36+0x450/0x450
[  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.326142]  ? cyc2ns_read_end+0x10/0x10
[  261.330139]  ? raw_bind+0x280/0x280
[  261.334138]  ? sched_clock_cpu+0x126/0x170
[  261.338995]  ? check_flags.part.36+0x450/0x450
[  261.342991]  ? __lock_acquire+0x4500/0x4500
[  261.348994]  ? inet_sendmsg+0x11c/0x500
[  261.352989]  ? dst_output+0x180/0x180
[  261.357012]  inet_sendmsg+0x11c/0x500
[ ... ]

v2:
 - clear skb->sk at reassembly routine.(Eric Dumarzet)

Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(backported from commit 08fb833b40e361ce927c64d40e348af96996d9eb)

[cascardo:
The backport here misses the hunk at net/ipv6/netfilter/nf_conntrack_reasm.c.
This one has been changed by our commit (net: IP6 defrag: use rbtrees in
nf_conntrack_reasm.c), and calls inet_frag_reasm_finish, which resets sk to
NULL.
]

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
---
 net/ipv4/ip_fragment.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Mauricio Oliveira Sept. 4, 2019, 6:10 p.m. UTC | #1
On Wed, Sep 4, 2019 at 2:41 PM Thadeu Lima de Souza Cascardo
<cascardo@canonical.com> wrote:
>
> From: Taehee Yoo <ap420073@gmail.com>
>
> BugLink: https://bugs.launchpad.net/bugs/1842447
>
> commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream
>
> A kernel crash occurrs when defragmented packet is fragmented
> in ip_do_fragment().
> In defragment routine, skb_orphan() is called and
> skb->ip_defrag_offset is set. but skb->sk and
> skb->ip_defrag_offset are same union member. so that
> frag->sk is not NULL.
> Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
> defragmented packet is fragmented.
>
> test commands:
>    %iptables -t nat -I POSTROUTING -j MASQUERADE
>    %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
>
> splat looks like:
> [  261.069429] kernel BUG at net/ipv4/ip_output.c:636!

Thanks for sending this one out!

This is probably the issue responsible for a late NAK to a bnx2x patch recently,
per the line/function, although with a different stack trace in the
forwarding path.

I'll ask that reporter to check this patch too, but response may not
happen in a timely manner.

cheers,
Mauricio

> [  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
> [  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
> [  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
> [  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
> [  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
> [  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
> [  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
> [  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
> [  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
> [  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
> [  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
> [  261.198158] Call Trace:
> [  261.199018]  ? dst_output+0x180/0x180
> [  261.205011]  ? save_trace+0x300/0x300
> [  261.209018]  ? ip_copy_metadata+0xb00/0xb00
> [  261.213034]  ? sched_clock_local+0xd4/0x140
> [  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
> [  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
> [  261.227014]  ? find_held_lock+0x39/0x1c0
> [  261.233008]  ip_finish_output+0x51d/0xb50
> [  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
> [  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
> [  261.250152]  ? rcu_is_watching+0x77/0x120
> [  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
> [  261.261033]  ? nf_hook_slow+0xb1/0x160
> [  261.265007]  ip_output+0x1c7/0x710
> [  261.269005]  ? ip_mc_output+0x13f0/0x13f0
> [  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
> [  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
> [  261.282996]  ? nf_hook_slow+0xb1/0x160
> [  261.287007]  raw_sendmsg+0x21f9/0x4420
> [  261.291008]  ? dst_output+0x180/0x180
> [  261.297003]  ? sched_clock_cpu+0x126/0x170
> [  261.301003]  ? find_held_lock+0x39/0x1c0
> [  261.306155]  ? stop_critical_timings+0x420/0x420
> [  261.311004]  ? check_flags.part.36+0x450/0x450
> [  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
> [  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
> [  261.326142]  ? cyc2ns_read_end+0x10/0x10
> [  261.330139]  ? raw_bind+0x280/0x280
> [  261.334138]  ? sched_clock_cpu+0x126/0x170
> [  261.338995]  ? check_flags.part.36+0x450/0x450
> [  261.342991]  ? __lock_acquire+0x4500/0x4500
> [  261.348994]  ? inet_sendmsg+0x11c/0x500
> [  261.352989]  ? dst_output+0x180/0x180
> [  261.357012]  inet_sendmsg+0x11c/0x500
> [ ... ]
>
> v2:
>  - clear skb->sk at reassembly routine.(Eric Dumarzet)
>
> Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit 08fb833b40e361ce927c64d40e348af96996d9eb)
>
> [cascardo:
> The backport here misses the hunk at net/ipv6/netfilter/nf_conntrack_reasm.c.
> This one has been changed by our commit (net: IP6 defrag: use rbtrees in
> nf_conntrack_reasm.c), and calls inet_frag_reasm_finish, which resets sk to
> NULL.
> ]
>
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> ---
>  net/ipv4/ip_fragment.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 73ca84d880b9..098c6dcff305 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -606,6 +606,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
>                         nextp = &fp->next;
>                         fp->prev = NULL;
>                         memset(&fp->rbnode, 0, sizeof(fp->rbnode));
> +                       fp->sk = NULL;
>                         head->data_len += fp->len;
>                         head->len += fp->len;
>                         if (head->ip_summed != fp->ip_summed)
> --
> 2.20.1
>
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Khalid Elmously Sept. 4, 2019, 7:39 p.m. UTC | #2
On 2019-09-04 14:40:49 , Thadeu Lima de Souza Cascardo wrote:
> From: Taehee Yoo <ap420073@gmail.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1842447
> 
> commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream
> 
> A kernel crash occurrs when defragmented packet is fragmented
> in ip_do_fragment().
> In defragment routine, skb_orphan() is called and
> skb->ip_defrag_offset is set. but skb->sk and
> skb->ip_defrag_offset are same union member. so that
> frag->sk is not NULL.
> Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
> defragmented packet is fragmented.
> 
> test commands:
>    %iptables -t nat -I POSTROUTING -j MASQUERADE
>    %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
> 
> splat looks like:
> [  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
> [  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
> [  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
> [  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
> [  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
> [  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
> [  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
> [  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
> [  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
> [  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
> [  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
> [  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
> [  261.198158] Call Trace:
> [  261.199018]  ? dst_output+0x180/0x180
> [  261.205011]  ? save_trace+0x300/0x300
> [  261.209018]  ? ip_copy_metadata+0xb00/0xb00
> [  261.213034]  ? sched_clock_local+0xd4/0x140
> [  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
> [  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
> [  261.227014]  ? find_held_lock+0x39/0x1c0
> [  261.233008]  ip_finish_output+0x51d/0xb50
> [  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
> [  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
> [  261.250152]  ? rcu_is_watching+0x77/0x120
> [  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
> [  261.261033]  ? nf_hook_slow+0xb1/0x160
> [  261.265007]  ip_output+0x1c7/0x710
> [  261.269005]  ? ip_mc_output+0x13f0/0x13f0
> [  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
> [  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
> [  261.282996]  ? nf_hook_slow+0xb1/0x160
> [  261.287007]  raw_sendmsg+0x21f9/0x4420
> [  261.291008]  ? dst_output+0x180/0x180
> [  261.297003]  ? sched_clock_cpu+0x126/0x170
> [  261.301003]  ? find_held_lock+0x39/0x1c0
> [  261.306155]  ? stop_critical_timings+0x420/0x420
> [  261.311004]  ? check_flags.part.36+0x450/0x450
> [  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
> [  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
> [  261.326142]  ? cyc2ns_read_end+0x10/0x10
> [  261.330139]  ? raw_bind+0x280/0x280
> [  261.334138]  ? sched_clock_cpu+0x126/0x170
> [  261.338995]  ? check_flags.part.36+0x450/0x450
> [  261.342991]  ? __lock_acquire+0x4500/0x4500
> [  261.348994]  ? inet_sendmsg+0x11c/0x500
> [  261.352989]  ? dst_output+0x180/0x180
> [  261.357012]  inet_sendmsg+0x11c/0x500
> [ ... ]
> 
> v2:
>  - clear skb->sk at reassembly routine.(Eric Dumarzet)
> 
> Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit 08fb833b40e361ce927c64d40e348af96996d9eb)
> 
> [cascardo:
> The backport here misses the hunk at net/ipv6/netfilter/nf_conntrack_reasm.c.
> This one has been changed by our commit (net: IP6 defrag: use rbtrees in
> nf_conntrack_reasm.c), and calls inet_frag_reasm_finish, which resets sk to
> NULL.
> ]
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> ---
>  net/ipv4/ip_fragment.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 73ca84d880b9..098c6dcff305 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -606,6 +606,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
>  			nextp = &fp->next;
>  			fp->prev = NULL;
>  			memset(&fp->rbnode, 0, sizeof(fp->rbnode));
> +			fp->sk = NULL;
>  			head->data_len += fp->len;
>  			head->len += fp->len;
>  			if (head->ip_summed != fp->ip_summed)

Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Connor Kuehl Sept. 4, 2019, 7:54 p.m. UTC | #3
On 9/4/19 10:40 AM, Thadeu Lima de Souza Cascardo wrote:
> From: Taehee Yoo <ap420073@gmail.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1842447
> 
> commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream
> 
> A kernel crash occurrs when defragmented packet is fragmented
> in ip_do_fragment().
> In defragment routine, skb_orphan() is called and
> skb->ip_defrag_offset is set. but skb->sk and
> skb->ip_defrag_offset are same union member. so that
> frag->sk is not NULL.
> Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
> defragmented packet is fragmented.
> 
> test commands:
>     %iptables -t nat -I POSTROUTING -j MASQUERADE
>     %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
> 
> splat looks like:
> [  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
> [  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
> [  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
> [  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
> [  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
> [  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
> [  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
> [  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
> [  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
> [  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
> [  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
> [  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
> [  261.198158] Call Trace:
> [  261.199018]  ? dst_output+0x180/0x180
> [  261.205011]  ? save_trace+0x300/0x300
> [  261.209018]  ? ip_copy_metadata+0xb00/0xb00
> [  261.213034]  ? sched_clock_local+0xd4/0x140
> [  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
> [  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
> [  261.227014]  ? find_held_lock+0x39/0x1c0
> [  261.233008]  ip_finish_output+0x51d/0xb50
> [  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
> [  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
> [  261.250152]  ? rcu_is_watching+0x77/0x120
> [  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
> [  261.261033]  ? nf_hook_slow+0xb1/0x160
> [  261.265007]  ip_output+0x1c7/0x710
> [  261.269005]  ? ip_mc_output+0x13f0/0x13f0
> [  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
> [  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
> [  261.282996]  ? nf_hook_slow+0xb1/0x160
> [  261.287007]  raw_sendmsg+0x21f9/0x4420
> [  261.291008]  ? dst_output+0x180/0x180
> [  261.297003]  ? sched_clock_cpu+0x126/0x170
> [  261.301003]  ? find_held_lock+0x39/0x1c0
> [  261.306155]  ? stop_critical_timings+0x420/0x420
> [  261.311004]  ? check_flags.part.36+0x450/0x450
> [  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
> [  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
> [  261.326142]  ? cyc2ns_read_end+0x10/0x10
> [  261.330139]  ? raw_bind+0x280/0x280
> [  261.334138]  ? sched_clock_cpu+0x126/0x170
> [  261.338995]  ? check_flags.part.36+0x450/0x450
> [  261.342991]  ? __lock_acquire+0x4500/0x4500
> [  261.348994]  ? inet_sendmsg+0x11c/0x500
> [  261.352989]  ? dst_output+0x180/0x180
> [  261.357012]  inet_sendmsg+0x11c/0x500
> [ ... ]
> 
> v2:
>   - clear skb->sk at reassembly routine.(Eric Dumarzet)
> 
> Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit 08fb833b40e361ce927c64d40e348af96996d9eb)
> 
> [cascardo:
> The backport here misses the hunk at net/ipv6/netfilter/nf_conntrack_reasm.c.
> This one has been changed by our commit (net: IP6 defrag: use rbtrees in
> nf_conntrack_reasm.c), and calls inet_frag_reasm_finish, which resets sk to
> NULL.
> ]
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

Acked-by: Connor Kuehl <connor.kuehl@canonical.com>

> ---
>   net/ipv4/ip_fragment.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 73ca84d880b9..098c6dcff305 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -606,6 +606,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
>   			nextp = &fp->next;
>   			fp->prev = NULL;
>   			memset(&fp->rbnode, 0, sizeof(fp->rbnode));
> +			fp->sk = NULL;
>   			head->data_len += fp->len;
>   			head->len += fp->len;
>   			if (head->ip_summed != fp->ip_summed)
>
Tyler Hicks Sept. 4, 2019, 8:10 p.m. UTC | #4
On 2019-09-04 14:40:49, Thadeu Lima de Souza Cascardo wrote:
> From: Taehee Yoo <ap420073@gmail.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1842447
> 
> commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream
> 
> A kernel crash occurrs when defragmented packet is fragmented
> in ip_do_fragment().
> In defragment routine, skb_orphan() is called and
> skb->ip_defrag_offset is set. but skb->sk and
> skb->ip_defrag_offset are same union member. so that
> frag->sk is not NULL.
> Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
> defragmented packet is fragmented.
> 
> test commands:
>    %iptables -t nat -I POSTROUTING -j MASQUERADE
>    %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
> 
> splat looks like:
> [  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
> [  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
> [  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
> [  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
> [  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
> [  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
> [  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
> [  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
> [  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
> [  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
> [  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
> [  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
> [  261.198158] Call Trace:
> [  261.199018]  ? dst_output+0x180/0x180
> [  261.205011]  ? save_trace+0x300/0x300
> [  261.209018]  ? ip_copy_metadata+0xb00/0xb00
> [  261.213034]  ? sched_clock_local+0xd4/0x140
> [  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
> [  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
> [  261.227014]  ? find_held_lock+0x39/0x1c0
> [  261.233008]  ip_finish_output+0x51d/0xb50
> [  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
> [  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
> [  261.250152]  ? rcu_is_watching+0x77/0x120
> [  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
> [  261.261033]  ? nf_hook_slow+0xb1/0x160
> [  261.265007]  ip_output+0x1c7/0x710
> [  261.269005]  ? ip_mc_output+0x13f0/0x13f0
> [  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
> [  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
> [  261.282996]  ? nf_hook_slow+0xb1/0x160
> [  261.287007]  raw_sendmsg+0x21f9/0x4420
> [  261.291008]  ? dst_output+0x180/0x180
> [  261.297003]  ? sched_clock_cpu+0x126/0x170
> [  261.301003]  ? find_held_lock+0x39/0x1c0
> [  261.306155]  ? stop_critical_timings+0x420/0x420
> [  261.311004]  ? check_flags.part.36+0x450/0x450
> [  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
> [  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
> [  261.326142]  ? cyc2ns_read_end+0x10/0x10
> [  261.330139]  ? raw_bind+0x280/0x280
> [  261.334138]  ? sched_clock_cpu+0x126/0x170
> [  261.338995]  ? check_flags.part.36+0x450/0x450
> [  261.342991]  ? __lock_acquire+0x4500/0x4500
> [  261.348994]  ? inet_sendmsg+0x11c/0x500
> [  261.352989]  ? dst_output+0x180/0x180
> [  261.357012]  inet_sendmsg+0x11c/0x500
> [ ... ]
> 
> v2:
>  - clear skb->sk at reassembly routine.(Eric Dumarzet)
> 
> Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit 08fb833b40e361ce927c64d40e348af96996d9eb)
> 
> [cascardo:
> The backport here misses the hunk at net/ipv6/netfilter/nf_conntrack_reasm.c.
> This one has been changed by our commit (net: IP6 defrag: use rbtrees in
> nf_conntrack_reasm.c), and calls inet_frag_reasm_finish, which resets sk to
> NULL.
> ]
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

The Fixes line above confused me because I know that we don't have
upstream commit fa0f527358bd ("ip: use rb trees for IP frag queue.") in
Bionic. However, Bionic commit 4faadabc060a ("net: IP defrag:
encapsulate rbtree defrag code into callable functions") brings back the
"harmful" parts of upstream commit fa0f527358bd which is why this fix is
needed.

I agree that the call to inet_frag_reasm_finish() handles the
reassignment for the IPv6 case.

Acked-by: Tyler Hicks <tyhicks@canonical.com>

Tyler

> ---
>  net/ipv4/ip_fragment.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 73ca84d880b9..098c6dcff305 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -606,6 +606,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
>  			nextp = &fp->next;
>  			fp->prev = NULL;
>  			memset(&fp->rbnode, 0, sizeof(fp->rbnode));
> +			fp->sk = NULL;
>  			head->data_len += fp->len;
>  			head->len += fp->len;
>  			if (head->ip_summed != fp->ip_summed)
> -- 
> 2.20.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Paolo Pisati Sept. 5, 2019, 8:28 a.m. UTC | #5
On Wed, Sep 04, 2019 at 02:40:49PM -0300, Thadeu Lima de Souza Cascardo wrote:
> From: Taehee Yoo <ap420073@gmail.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1842447
> 
> commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream
> 
> A kernel crash occurrs when defragmented packet is fragmented
> in ip_do_fragment().
> In defragment routine, skb_orphan() is called and
> skb->ip_defrag_offset is set. but skb->sk and
> skb->ip_defrag_offset are same union member. so that
> frag->sk is not NULL.
> Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
> defragmented packet is fragmented.
> 
> test commands:
>    %iptables -t nat -I POSTROUTING -j MASQUERADE
>    %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
> 
> splat looks like:
> [  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
> [  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
> [  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
> [  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
> [  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
> [  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
> [  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
> [  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
> [  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
> [  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
> [  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
> [  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
> [  261.198158] Call Trace:
> [  261.199018]  ? dst_output+0x180/0x180
> [  261.205011]  ? save_trace+0x300/0x300
> [  261.209018]  ? ip_copy_metadata+0xb00/0xb00
> [  261.213034]  ? sched_clock_local+0xd4/0x140
> [  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
> [  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
> [  261.227014]  ? find_held_lock+0x39/0x1c0
> [  261.233008]  ip_finish_output+0x51d/0xb50
> [  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
> [  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
> [  261.250152]  ? rcu_is_watching+0x77/0x120
> [  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
> [  261.261033]  ? nf_hook_slow+0xb1/0x160
> [  261.265007]  ip_output+0x1c7/0x710
> [  261.269005]  ? ip_mc_output+0x13f0/0x13f0
> [  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
> [  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
> [  261.282996]  ? nf_hook_slow+0xb1/0x160
> [  261.287007]  raw_sendmsg+0x21f9/0x4420
> [  261.291008]  ? dst_output+0x180/0x180
> [  261.297003]  ? sched_clock_cpu+0x126/0x170
> [  261.301003]  ? find_held_lock+0x39/0x1c0
> [  261.306155]  ? stop_critical_timings+0x420/0x420
> [  261.311004]  ? check_flags.part.36+0x450/0x450
> [  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
> [  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
> [  261.326142]  ? cyc2ns_read_end+0x10/0x10
> [  261.330139]  ? raw_bind+0x280/0x280
> [  261.334138]  ? sched_clock_cpu+0x126/0x170
> [  261.338995]  ? check_flags.part.36+0x450/0x450
> [  261.342991]  ? __lock_acquire+0x4500/0x4500
> [  261.348994]  ? inet_sendmsg+0x11c/0x500
> [  261.352989]  ? dst_output+0x180/0x180
> [  261.357012]  inet_sendmsg+0x11c/0x500
> [ ... ]
> 
> v2:
>  - clear skb->sk at reassembly routine.(Eric Dumarzet)
> 
> Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit 08fb833b40e361ce927c64d40e348af96996d9eb)
> 
> [cascardo:
> The backport here misses the hunk at net/ipv6/netfilter/nf_conntrack_reasm.c.
> This one has been changed by our commit (net: IP6 defrag: use rbtrees in
> nf_conntrack_reasm.c), and calls inet_frag_reasm_finish, which resets sk to
> NULL.
> ]
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

Acked-by: Paolo Pisati <paolo.pisati@canonical.com>

Patch
diff mbox series

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 73ca84d880b9..098c6dcff305 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -606,6 +606,7 @@  static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
 			nextp = &fp->next;
 			fp->prev = NULL;
 			memset(&fp->rbnode, 0, sizeof(fp->rbnode));
+			fp->sk = NULL;
 			head->data_len += fp->len;
 			head->len += fp->len;
 			if (head->ip_summed != fp->ip_summed)