From patchwork Fri Jan 26 02:09:54 2018
X-Patchwork-Submitter: Li RongQing
X-Patchwork-Id: 866125
X-Patchwork-Delegate: davem@davemloft.net
From: "Li,Rongqing"
To: Eric Dumazet, "netdev@vger.kernel.org"
CC: "edumazet@google.com"
Subject: Reply: [PATCH] net: clean the sk_frag.page of new cloned socket
Date: Fri, 26 Jan 2018 02:09:54 +0000
Message-ID: <2AD939572F25A448A3AE3CAEA61328C23694645C@BC-MAIL-MBX12.internal.baidu.com>
References: <1516882089-28575-1-git-send-email-lirongqing@baidu.com> <1516884245.3715.48.camel@gmail.com>
In-Reply-To: <1516884245.3715.48.camel@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

> > 	if (newsk->sk_prot->sockets_allocated)
> > 		sk_sockets_allocated_inc(newsk);
>
> Good catch.
>
> I suspect this was discovered by some syzkaller/syzbot run ?
>

No. I am seeing a panic: a page is on both task.task_frag.page and the buddy
free list at the same time. That should not happen. The page's lru.next and
lru.prev hold the list poison values 0xdead000000100100 and 0xdead000000200200,
so when a page is later allocated from the buddy allocator, the system panics
in __list_del(), called from __rmqueue():

  #0 [ffff881fff0c3850] machine_kexec at ffffffff8103cca8
  #1 [ffff881fff0c38a0] crash_kexec at ffffffff810c2443
  #2 [ffff881fff0c3968] oops_end at ffffffff816cae70
  #3 [ffff881fff0c3990] die at ffffffff810063eb
  #4 [ffff881fff0c39c0] do_general_protection at ffffffff816ca7ce
  #5 [ffff881fff0c39f0] general_protection at ffffffff816ca0d8
     [exception RIP: __rmqueue+120]
     RIP: ffffffff8113a918  RSP: ffff881fff0c3aa0  RFLAGS: 00010046
     RAX: ffff88207ffd8018  RBX: 0000000000000003  RCX: 0000000000000003
     RDX: 0000000000000001  RSI: ffffea006f4cf620  RDI: dead000000200200
     RBP: ffff881fff0c3b00  R8: ffff88207ffd8018  R9: 0000000000000000
     R10: dead000000100100  R11: ffffea007ecc6480  R12: ffffea006f4cf600
     R13: 0000000000000000  R14: 0000000000000003  R15: ffff88207ffd7e80
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  #6 [ffff881fff0c3b08] get_page_from_freelist at ffffffff8113ce71
  #7 [ffff881fff0c3be0] __alloc_pages_nodemask at ffffffff8113d15f
  #8 [ffff881fff0c3d10] __alloc_page_frag at ffffffff815e2362
  #9 [ffff881fff0c3d40] __netdev_alloc_frag at ffffffff815e241b
 #10 [ffff881fff0c3d58] __alloc_rx_skb at ffffffff815e2f91
 #11 [ffff881fff0c3d78] __netdev_alloc_skb at ffffffff815e300b
 #12 [ffff881fff0c3d90] ixgbe_clean_rx_irq at ffffffffa003a98f [ixgbe]
 #13 [ffff881fff0c3df8] ixgbe_poll at ffffffffa003c233 [ixgbe]
 #14 [ffff881fff0c3e70] net_rx_action at ffffffff815f2f09
 #15 [ffff881fff0c3ec8] __do_softirq at ffffffff81064867
 #16 [ffff881fff0c3f38] call_softirq at ffffffff816d3a9c
 #17 [ffff881fff0c3f50] do_softirq at ffffffff81004e65
 #18 [ffff881fff0c3f68] irq_exit at ffffffff81064b7d
 #19 [ffff881fff0c3f78] do_IRQ at ffffffff816d4428

The page info is shown below (some fields removed):

crash> struct page ffffea006f4cf600 -x
struct page {
  flags = 0x2fffff00004000,
  mapping = 0x0,
  {
    {
      counters = 0x2ffffffff,
      {
        {
          _mapcount = {
            counter = 0xffffffff
          },
          {
            inuse = 0xffff,
            objects = 0x7fff,
            frozen = 0x1
          },
          units = 0xffffffff
        },
        _count = {
          counter = 0x2
        }
      }
    }
  },
  {
    lru = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
  },
  ...
}
crash>

The page ffffea006f4cf600 is also referenced from another task's
task_frag.page, and that task's backtrace is below:

crash> task 8683 | grep ffffea006f4cf600 -A3
    page = 0xffffea006f4cf600,
    offset = 32768,
    size = 32768
  },
crash>
crash> bt 8683
PID: 8683  TASK: ffff881faa088000  CPU: 10  COMMAND: "mynode"
  #0 [ffff881fff145e78] crash_nmi_callback at ffffffff81031712
  #1 [ffff881fff145e88] nmi_handle at ffffffff816cafe9
  #2 [ffff881fff145ec8] do_nmi at ffffffff816cb0f0
  #3 [ffff881fff145ef0] end_repeat_nmi at ffffffff816ca4a1
     [exception RIP: _raw_spin_lock_irqsave+62]
     RIP: ffffffff816c9a9e  RSP: ffff881fa992b990  RFLAGS: 00000002
     RAX: 0000000000004358  RBX: ffff88207ffd7e80  RCX: 0000000000004358
     RDX: 0000000000004356  RSI: 0000000000000246  RDI: ffff88207ffd7ee8
     RBP: ffff881fa992b990  R8: 0000000000000000  R9: 00000000019a16e6
     R10: 0000000000004d24  R11: 0000000000004000  R12: 0000000000000242
     R13: 0000000000004d24  R14: 0000000000000001  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- ---
  #4 [ffff881fa992b990] _raw_spin_lock_irqsave at ffffffff816c9a9e
  #5 [ffff881fa992b998] get_page_from_freelist at ffffffff8113ce5f
  #6 [ffff881fa992ba70] __alloc_pages_nodemask at ffffffff8113d15f
  #7 [ffff881fa992bba0] alloc_pages_current at ffffffff8117ab29
  #8 [ffff881fa992bbe8] sk_page_frag_refill at ffffffff815dd310
  #9 [ffff881fa992bc18] tcp_sendmsg at ffffffff8163e4f3
 #10 [ffff881fa992bcd8] inet_sendmsg at ffffffff81668434
 #11 [ffff881fa992bd08] sock_sendmsg at ffffffff815d9719
 #12 [ffff881fa992be58] SYSC_sendto at ffffffff815d9c81
 #13 [ffff881fa992bf70] sys_sendto at ffffffff815da6ae
 #14 [ffff881fa992bf80] system_call_fastpath at ffffffff816d2189
     RIP: 00007f5bfe1d804b  RSP: 00007f5bfa63b3b0  RFLAGS: 00000206
     RAX: 000000000000002c  RBX: ffffffff816d2189  RCX: 00007f5bfa63b420
     RDX: 0000000000002000  RSI: 000000000c096000  RDI: 0000000000000040
     RBP: 0000000000000000  R8: 0000000000000000  R9: 0000000000000000
     R10: 0000000000000000  R11: 0000000000000246  R12: ffffffff815da6ae
     R13: ffff881fa992bf78  R14: 000000000000a552  R15: 0000000000000016
     ORIG_RAX: 000000000000002c  CS: 0033  SS: 002b
crash>

My kernel is 3.10. I have not found the root cause yet; I am guessing at all
kinds of possibilities.

> I would rather move that in tcp_disconnect() that only fuzzers use, instead
> of doing this on every clone and slowing down normal users.
>

Do you mean we should fix it like below?

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f08eebe60446..44f8320610ab 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2431,6 +2431,12 @@ int tcp_disconnect(struct sock *sk, int flags)
 
 	WARN_ON(inet->inet_num && !icsk->icsk_bind_hash);
 
+	if (sk->sk_frag.page) {
+		put_page(sk->sk_frag.page);
+		sk->sk_frag.page = NULL;
+	}
+
 	sk->sk_error_report(sk);
 	return err;
 }