From patchwork Fri Jan 29 19:01:56 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Stringer X-Patchwork-Id: 575802 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from archives.nicira.com (li376-54.members.linode.com [96.126.127.54]) by ozlabs.org (Postfix) with ESMTP id D495E140328 for ; Sat, 30 Jan 2016 06:02:12 +1100 (AEDT) Received: from archives.nicira.com (localhost [127.0.0.1]) by archives.nicira.com (Postfix) with ESMTP id 0D8F610DB0; Fri, 29 Jan 2016 11:02:12 -0800 (PST) X-Original-To: dev@openvswitch.org Delivered-To: dev@openvswitch.org Received: from mx3v3.cudamail.com (mx3.cudamail.com [64.34.241.5]) by archives.nicira.com (Postfix) with ESMTPS id 47C8010535 for ; Fri, 29 Jan 2016 11:02:11 -0800 (PST) Received: from bar4.cudamail.com (localhost [127.0.0.1]) by mx3v3.cudamail.com (Postfix) with ESMTPS id 7492116185B for ; Fri, 29 Jan 2016 12:02:10 -0700 (MST) X-ASG-Debug-ID: 1454094129-03dc217d9ac2ed60001-byXFYA Received: from mx1-pf1.cudamail.com ([192.168.24.1]) by bar4.cudamail.com with ESMTP id C5BxtDeBjsgBHZfg (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Fri, 29 Jan 2016 12:02:09 -0700 (MST) X-Barracuda-Envelope-From: joe@ovn.org X-Barracuda-RBL-Trusted-Forwarder: 192.168.24.1 Received: from unknown (HELO relay3-d.mail.gandi.net) (217.70.183.195) by mx1-pf1.cudamail.com with ESMTPS (DHE-RSA-AES256-SHA encrypted); 29 Jan 2016 19:02:08 -0000 Received-SPF: pass (mx1-pf1.cudamail.com: SPF record at ovn.org designates 217.70.183.195 as permitted sender) X-Barracuda-Apparent-Source-IP: 217.70.183.195 X-Barracuda-RBL-IP: 217.70.183.195 Received: from mfilter23-d.gandi.net (mfilter23-d.gandi.net [217.70.178.151]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id 01C5BA80D5; Fri, 29 Jan 2016 20:02:06 +0100 (CET) X-Virus-Scanned: Debian amavisd-new at mfilter23-d.gandi.net Received: from relay3-d.mail.gandi.net ([IPv6:::ffff:217.70.183.195]) by mfilter23-d.gandi.net (mfilter23-d.gandi.net [::ffff:10.0.15.180]) (amavisd-new, port 10024) with ESMTP id I3I7iTmlYyBG; Fri, 29 Jan 2016 20:02:04 +0100 (CET) X-Originating-IP: 208.91.1.34 Received: from localhost.localdomain (unknown [208.91.1.34]) (Authenticated sender: joe@ovn.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 53397A80E5; Fri, 29 Jan 2016 20:02:02 +0100 (CET) X-CudaMail-Envelope-Sender: joe@ovn.org From: Joe Stringer To: dev@openvswitch.org X-CudaMail-Whitelist-To: dev@openvswitch.org X-CudaMail-MID: CM-E1-128052557 X-CudaMail-DTE: 012916 X-CudaMail-Originating-IP: 217.70.183.195 Date: Fri, 29 Jan 2016 11:01:56 -0800 X-ASG-Orig-Subj: [##CM-E1-128052557##][PATCH] datapath: inet: frag: Always orphan skbs inside ip_defrag(). Message-Id: <1454094116-38878-1-git-send-email-joe@ovn.org> X-Mailer: git-send-email 2.1.4 X-Barracuda-Connect: UNKNOWN[192.168.24.1] X-Barracuda-Start-Time: 1454094129 X-Barracuda-Encrypted: DHE-RSA-AES256-SHA X-Barracuda-URL: https://web.cudamail.com:443/cgi-mod/mark.cgi X-ASG-Whitelist: Header =?UTF-8?B?eFwtY3VkYW1haWxcLXdoaXRlbGlzdFwtdG8=?= X-Virus-Scanned: by bsmtpd at cudamail.com X-Barracuda-BRTS-Status: 1 Subject: [ovs-dev] [PATCH] datapath: inet: frag: Always orphan skbs inside ip_defrag(). X-BeenThere: dev@openvswitch.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: dev-bounces@openvswitch.org Sender: "dev" When the linux stack is an endpoint connected to OVS which is performing IP fragmentation via conntrack actions, it's possible to hit a kernel BUG. The following upstream commit fixes the issue inside ip_defrag(). For the backport, we provide this inside ip_defrag() for kernels that we currently backport that function, and also provide just the bugfix for newer kernels, so we can continue to use upstream functionality as much as possible. Upstream commit: Later parts of the stack (including fragmentation) expect that there is never a socket attached to frag in a frag_list, however this invariant was not enforced on all defrag paths. This could lead to the BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the end of this commit message. While the call could be added to openvswitch to fix this particular error, the head and tail of the frags list are already orphaned indirectly inside ip_defrag(), so it seems like the remaining fragments should all be orphaned in all circumstances. kernel BUG at net/ipv4/ip_output.c:586! [...] Call Trace: [] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch] [] ovs_fragment+0xcc/0x214 [openvswitch] [] ? dst_discard_out+0x20/0x20 [] ? dst_ifdown+0x80/0x80 [] ? find_bucket.isra.2+0x62/0x70 [openvswitch] [] ? mod_timer_pending+0x65/0x210 [] ? __lock_acquire+0x3db/0x1b90 [] ? nf_conntrack_in+0x252/0x500 [nf_conntrack] [] ? __lock_is_held+0x54/0x70 [] do_output.isra.29+0xe3/0x1b0 [openvswitch] [] do_execute_actions+0xe11/0x11f0 [openvswitch] [] ? __lock_is_held+0x54/0x70 [] ovs_execute_actions+0x32/0xd0 [openvswitch] [] ovs_dp_process_packet+0x85/0x140 [openvswitch] [] ? __lock_is_held+0x54/0x70 [] ovs_execute_actions+0xb2/0xd0 [openvswitch] [] ovs_dp_process_packet+0x85/0x140 [openvswitch] [] ? ovs_ct_get_labels+0x49/0x80 [openvswitch] [] ovs_vport_receive+0x5d/0xa0 [openvswitch] [] ? __lock_acquire+0x3db/0x1b90 [] ? __lock_acquire+0x3db/0x1b90 [] ? __lock_acquire+0x3db/0x1b90 [] ? internal_dev_xmit+0x5/0x140 [openvswitch] [] internal_dev_xmit+0x6c/0x140 [openvswitch] [] ? internal_dev_xmit+0x5/0x140 [openvswitch] [] dev_hard_start_xmit+0x2b9/0x5e0 [] ? netif_skb_features+0xd1/0x1f0 [] __dev_queue_xmit+0x800/0x930 [] ? __dev_queue_xmit+0x50/0x930 [] ? mark_held_locks+0x71/0x90 [] ? neigh_resolve_output+0x106/0x220 [] dev_queue_xmit+0x10/0x20 [] neigh_resolve_output+0x178/0x220 [] ? ip_finish_output2+0x1ff/0x590 [] ip_finish_output2+0x1ff/0x590 [] ? ip_finish_output2+0x7e/0x590 [] ip_do_fragment+0x831/0x8a0 [] ? ip_copy_metadata+0x1b0/0x1b0 [] ip_fragment.constprop.49+0x43/0x80 [] ip_finish_output+0x17c/0x340 [] ? nf_hook_slow+0xe4/0x190 [] ip_output+0x70/0x110 [] ? ip_fragment.constprop.49+0x80/0x80 [] ip_local_out+0x39/0x70 [] ip_send_skb+0x19/0x40 [] ip_push_pending_frames+0x33/0x40 [] icmp_push_reply+0xea/0x120 [] icmp_reply.constprop.23+0x1ed/0x230 [] icmp_echo.part.21+0x4e/0x50 [] ? __lock_is_held+0x54/0x70 [] ? rcu_read_lock_held+0x5e/0x70 [] icmp_echo+0x36/0x70 [] icmp_rcv+0x271/0x450 [] ip_local_deliver_finish+0x127/0x3a0 [] ? ip_local_deliver_finish+0x41/0x3a0 [] ip_local_deliver+0x60/0xd0 [] ? ip_rcv_finish+0x560/0x560 [] ip_rcv_finish+0xdd/0x560 [] ip_rcv+0x283/0x3e0 [] ? match_held_lock+0x192/0x200 [] ? inet_del_offload+0x40/0x40 [] __netif_receive_skb_core+0x392/0xae0 [] ? process_backlog+0x8e/0x230 [] ? mark_held_locks+0x71/0x90 [] __netif_receive_skb+0x18/0x60 [] process_backlog+0x78/0x230 [] ? process_backlog+0xdd/0x230 [] net_rx_action+0x155/0x400 [] __do_softirq+0xcc/0x420 [] ? ip_finish_output2+0x217/0x590 [] do_softirq_own_stack+0x1c/0x30 [] do_softirq+0x4e/0x60 [] __local_bh_enable_ip+0xa8/0xb0 [] ip_finish_output2+0x240/0x590 [] ? ip_do_fragment+0x831/0x8a0 [] ip_do_fragment+0x831/0x8a0 [] ? ip_copy_metadata+0x1b0/0x1b0 [] ip_fragment.constprop.49+0x43/0x80 [] ip_finish_output+0x17c/0x340 [] ? nf_hook_slow+0xe4/0x190 [] ip_output+0x70/0x110 [] ? ip_fragment.constprop.49+0x80/0x80 [] ip_local_out+0x39/0x70 [] ip_send_skb+0x19/0x40 [] ip_push_pending_frames+0x33/0x40 [] raw_sendmsg+0x7d3/0xc30 [] ? __lock_acquire+0x3db/0x1b90 [] ? inet_sendmsg+0xc7/0x1d0 [] ? __lock_is_held+0x54/0x70 [] inet_sendmsg+0x10a/0x1d0 [] ? inet_sendmsg+0x5/0x1d0 [] sock_sendmsg+0x38/0x50 [] ___sys_sendmsg+0x25f/0x270 [] ? handle_mm_fault+0x8dd/0x1320 [] ? _raw_spin_unlock+0x27/0x40 [] ? __do_page_fault+0x1e2/0x460 [] ? __fget_light+0x66/0x90 [] __sys_sendmsg+0x42/0x80 [] SyS_sendmsg+0x12/0x20 [] entry_SYSCALL_64_fastpath+0x12/0x6f Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 RIP [] ip_do_fragment+0x892/0x8a0 RSP Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Joe Stringer Signed-off-by: David S. Miller Upstream: 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()") Signed-off-by: Joe Stringer Acked-by: Pravin B Shelar --- datapath/linux/compat/include/net/ip.h | 13 +++++++++++++ datapath/linux/compat/ip_fragment.c | 1 + 2 files changed, 14 insertions(+) diff --git a/datapath/linux/compat/include/net/ip.h b/datapath/linux/compat/include/net/ip.h index 63a3f0924e9d..2e33664147a1 100644 --- a/datapath/linux/compat/include/net/ip.h +++ b/datapath/linux/compat/include/net/ip.h @@ -122,6 +122,19 @@ int rpl_ip_defrag(struct sk_buff *skb, u32 user); int __init rpl_ipfrag_init(void); void rpl_ipfrag_fini(void); #else /* OVS_FRAGMENT_BACKPORT */ + +/* We have no good way to detect the presence of upstream commit 8282f27449bf + * ("inet: frag: Always orphan skbs inside ip_defrag()"), but it should be + * always included in kernels 4.5+. */ +#if LINUX_VERSION_CODE < KERNEL_VERSION(4,5,0) +static inline int rpl_ip_defrag(struct sk_buff *skb, u32 user) +{ + skb_orphan(skb); + ip_defrag(skb, user); +} +#define ip_defrag rpl_ip_defrag +#endif + static inline int rpl_ipfrag_init(void) { return 0; } static inline void rpl_ipfrag_fini(void) { } #endif /* OVS_FRAGMENT_BACKPORT */ diff --git a/datapath/linux/compat/ip_fragment.c b/datapath/linux/compat/ip_fragment.c index 418134928114..42514e42815f 100644 --- a/datapath/linux/compat/ip_fragment.c +++ b/datapath/linux/compat/ip_fragment.c @@ -682,6 +682,7 @@ int rpl_ip_defrag(struct sk_buff *skb, u32 user) struct ipq *qp; IP_INC_STATS_BH(net, IPSTATS_MIB_REASMREQDS); + skb_orphan(skb); /* Lookup (or create) queue header */ qp = ip_find(net, ip_hdr(skb), user, vif);