Message ID | 1454094116-38878-1-git-send-email-joe@ovn.org |
---|---|
State | Accepted, archived |
Headers | show |
On Fri, Jan 29, 2016 at 11:01 AM, Joe Stringer <joe@ovn.org> wrote: > When the linux stack is an endpoint connected to OVS which is performing > IP fragmentation via conntrack actions, it's possible to hit a kernel > BUG. The following upstream commit fixes the issue inside ip_defrag(). > For the backport, we provide this inside ip_defrag() for kernels that we > currently backport that function, and also provide just the bugfix for > newer kernels, so we can continue to use upstream functionality as much > as possible. > > Upstream commit: > Later parts of the stack (including fragmentation) expect that there is > never a socket attached to frag in a frag_list, however this invariant > was not enforced on all defrag paths. This could lead to the > BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the > end of this commit message. > > While the call could be added to openvswitch to fix this particular > error, the head and tail of the frags list are already orphaned > indirectly inside ip_defrag(), so it seems like the remaining fragments > should all be orphaned in all circumstances. > > kernel BUG at net/ipv4/ip_output.c:586! > [...] > Call Trace: > <IRQ> > [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch] > [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch] > [<ffffffff81667830>] ? dst_discard_out+0x20/0x20 > [<ffffffff81667810>] ? dst_ifdown+0x80/0x80 > [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch] > [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210 > [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 > [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack] > [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 > [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch] > [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch] > [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 > [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch] > [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch] > [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 > [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch] > [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch] > [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch] > [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch] > [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 > [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 > [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 > [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch] > [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch] > [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch] > [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0 > [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0 > [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930 > [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930 > [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90 > [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220 > [<ffffffff81661060>] dev_queue_xmit+0x10/0x20 > [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220 > [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590 > [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590 > [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590 > [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0 > [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0 > [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80 > [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340 > [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190 > [<ffffffff816ab4c0>] ip_output+0x70/0x110 > [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80 > [<ffffffff816aa9f9>] ip_local_out+0x39/0x70 > [<ffffffff816abf89>] ip_send_skb+0x19/0x40 > [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40 > [<ffffffff816df21a>] icmp_push_reply+0xea/0x120 > [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230 > [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50 > [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 > [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70 > [<ffffffff816dfa06>] icmp_echo+0x36/0x70 > [<ffffffff816e0d11>] icmp_rcv+0x271/0x450 > [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0 > [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0 > [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0 > [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560 > [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560 > [<ffffffff816a5453>] ip_rcv+0x283/0x3e0 > [<ffffffff810b6302>] ? match_held_lock+0x192/0x200 > [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40 > [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0 > [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230 > [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90 > [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60 > [<ffffffff8165e678>] process_backlog+0x78/0x230 > [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230 > [<ffffffff8165e355>] net_rx_action+0x155/0x400 > [<ffffffff8106b48c>] __do_softirq+0xcc/0x420 > [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590 > [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30 > <EOI> > [<ffffffff8106b88e>] do_softirq+0x4e/0x60 > [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0 > [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590 > [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0 > [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0 > [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0 > [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80 > [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340 > [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190 > [<ffffffff816ab4c0>] ip_output+0x70/0x110 > [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80 > [<ffffffff816aa9f9>] ip_local_out+0x39/0x70 > [<ffffffff816abf89>] ip_send_skb+0x19/0x40 > [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40 > [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30 > [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 > [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0 > [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 > [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0 > [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0 > [<ffffffff8163e398>] sock_sendmsg+0x38/0x50 > [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270 > [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320 > [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40 > [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460 > [<ffffffff81204886>] ? __fget_light+0x66/0x90 > [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80 > [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20 > [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f > Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f > 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 > RIP [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0 > RSP <ffff88006d603170> > > Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") > Signed-off-by: Joe Stringer <joe@ovn.org> > Signed-off-by: David S. Miller <davem@davemloft.net> > > Upstream: 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()") > Signed-off-by: Joe Stringer <joe@ovn.org> Looks good. Acked-by: Pravin B Shelar <pshelar@ovn.org>
On 29 January 2016 at 13:10, pravin shelar <pshelar@ovn.org> wrote: > On Fri, Jan 29, 2016 at 11:01 AM, Joe Stringer <joe@ovn.org> wrote: >> When the linux stack is an endpoint connected to OVS which is performing >> IP fragmentation via conntrack actions, it's possible to hit a kernel >> BUG. The following upstream commit fixes the issue inside ip_defrag(). >> For the backport, we provide this inside ip_defrag() for kernels that we >> currently backport that function, and also provide just the bugfix for >> newer kernels, so we can continue to use upstream functionality as much >> as possible. >> >> Upstream commit: >> Later parts of the stack (including fragmentation) expect that there is >> never a socket attached to frag in a frag_list, however this invariant >> was not enforced on all defrag paths. This could lead to the >> BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the >> end of this commit message. >> >> While the call could be added to openvswitch to fix this particular >> error, the head and tail of the frags list are already orphaned >> indirectly inside ip_defrag(), so it seems like the remaining fragments >> should all be orphaned in all circumstances. >> >> kernel BUG at net/ipv4/ip_output.c:586! >> [...] >> Call Trace: >> <IRQ> >> [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch] >> [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch] >> [<ffffffff81667830>] ? dst_discard_out+0x20/0x20 >> [<ffffffff81667810>] ? dst_ifdown+0x80/0x80 >> [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch] >> [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210 >> [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 >> [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack] >> [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 >> [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch] >> [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch] >> [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 >> [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch] >> [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch] >> [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 >> [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch] >> [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch] >> [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch] >> [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch] >> [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 >> [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 >> [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 >> [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch] >> [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch] >> [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch] >> [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0 >> [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0 >> [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930 >> [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930 >> [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90 >> [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220 >> [<ffffffff81661060>] dev_queue_xmit+0x10/0x20 >> [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220 >> [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590 >> [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590 >> [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590 >> [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0 >> [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0 >> [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80 >> [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340 >> [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190 >> [<ffffffff816ab4c0>] ip_output+0x70/0x110 >> [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80 >> [<ffffffff816aa9f9>] ip_local_out+0x39/0x70 >> [<ffffffff816abf89>] ip_send_skb+0x19/0x40 >> [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40 >> [<ffffffff816df21a>] icmp_push_reply+0xea/0x120 >> [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230 >> [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50 >> [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 >> [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70 >> [<ffffffff816dfa06>] icmp_echo+0x36/0x70 >> [<ffffffff816e0d11>] icmp_rcv+0x271/0x450 >> [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0 >> [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0 >> [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0 >> [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560 >> [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560 >> [<ffffffff816a5453>] ip_rcv+0x283/0x3e0 >> [<ffffffff810b6302>] ? match_held_lock+0x192/0x200 >> [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40 >> [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0 >> [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230 >> [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90 >> [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60 >> [<ffffffff8165e678>] process_backlog+0x78/0x230 >> [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230 >> [<ffffffff8165e355>] net_rx_action+0x155/0x400 >> [<ffffffff8106b48c>] __do_softirq+0xcc/0x420 >> [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590 >> [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30 >> <EOI> >> [<ffffffff8106b88e>] do_softirq+0x4e/0x60 >> [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0 >> [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590 >> [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0 >> [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0 >> [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0 >> [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80 >> [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340 >> [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190 >> [<ffffffff816ab4c0>] ip_output+0x70/0x110 >> [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80 >> [<ffffffff816aa9f9>] ip_local_out+0x39/0x70 >> [<ffffffff816abf89>] ip_send_skb+0x19/0x40 >> [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40 >> [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30 >> [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 >> [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0 >> [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 >> [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0 >> [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0 >> [<ffffffff8163e398>] sock_sendmsg+0x38/0x50 >> [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270 >> [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320 >> [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40 >> [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460 >> [<ffffffff81204886>] ? __fget_light+0x66/0x90 >> [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80 >> [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20 >> [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f >> Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f >> 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 >> RIP [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0 >> RSP <ffff88006d603170> >> >> Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") >> Signed-off-by: Joe Stringer <joe@ovn.org> >> Signed-off-by: David S. Miller <davem@davemloft.net> >> >> Upstream: 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()") >> Signed-off-by: Joe Stringer <joe@ovn.org> > > Looks good. > > Acked-by: Pravin B Shelar <pshelar@ovn.org> Thanks, applied to master and branch-2.5.
diff --git a/datapath/linux/compat/include/net/ip.h b/datapath/linux/compat/include/net/ip.h index 63a3f0924e9d..2e33664147a1 100644 --- a/datapath/linux/compat/include/net/ip.h +++ b/datapath/linux/compat/include/net/ip.h @@ -122,6 +122,19 @@ int rpl_ip_defrag(struct sk_buff *skb, u32 user); int __init rpl_ipfrag_init(void); void rpl_ipfrag_fini(void); #else /* OVS_FRAGMENT_BACKPORT */ + +/* We have no good way to detect the presence of upstream commit 8282f27449bf + * ("inet: frag: Always orphan skbs inside ip_defrag()"), but it should be + * always included in kernels 4.5+. */ +#if LINUX_VERSION_CODE < KERNEL_VERSION(4,5,0) +static inline int rpl_ip_defrag(struct sk_buff *skb, u32 user) +{ + skb_orphan(skb); + ip_defrag(skb, user); +} +#define ip_defrag rpl_ip_defrag +#endif + static inline int rpl_ipfrag_init(void) { return 0; } static inline void rpl_ipfrag_fini(void) { } #endif /* OVS_FRAGMENT_BACKPORT */ diff --git a/datapath/linux/compat/ip_fragment.c b/datapath/linux/compat/ip_fragment.c index 418134928114..42514e42815f 100644 --- a/datapath/linux/compat/ip_fragment.c +++ b/datapath/linux/compat/ip_fragment.c @@ -682,6 +682,7 @@ int rpl_ip_defrag(struct sk_buff *skb, u32 user) struct ipq *qp; IP_INC_STATS_BH(net, IPSTATS_MIB_REASMREQDS); + skb_orphan(skb); /* Lookup (or create) queue header */ qp = ip_find(net, ip_hdr(skb), user, vif);