From patchwork Fri Jan 11 18:07:33 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: zzoru
X-Patchwork-Id: 1023718
X-Patchwork-Delegate: davem@davemloft.net
To: davem@davemloft.net, ktkhai@virtuozzo.com, avagin@virtuozzo.com,
 dsahern@gmail.com, nicolas.dichtel@6wind.com, tyhicks@canonical.com,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org
From: zzoru
Cc: syzkaller@googlegroups.com
Subject: net/core: BUG in copy_net_ns()
Date: Sat, 12 Jan 2019 03:07:33 +0900
X-Mailing-List: netdev@vger.kernel.org

net/core: BUG in copy_net_ns() (net_namespace.c)

Hello,
I've got the following error report while fuzzing the kernel with syzkaller.

On commit 1bdbe227492075d058e37cb3d400e6468d0095b5

Syzkaller hit 'WARNING in __alloc_pages_slowpath' bug.

syz-executor561 (17453) used greatest stack depth: 25056 bytes left
WARNING: CPU: 0 PID: 692 at mm/page_alloc.c:4415 __alloc_pages_slowpath+0x1cb1/0x2220 mm/page_alloc.c:4386
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 692 Comm: kswapd0 Not tainted 5.0.0-rc1+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace:  __dump_stack lib/dump_stack.c:77 [inline]  dump_stack+0xca/0x13e lib/dump_stack.c:113  panic+0x278/0x5bf kernel/panic.c:214  __warn.cold.10+0x20/0x45 kernel/panic.c:571  report_bug+0x246/0x2d0 lib/bug.c:186  fixup_bug arch/x86/kernel/traps.c:178 [inline]  do_error_trap+0x123/0x1e0 arch/x86/kernel/traps.c:271  do_invalid_op+0x31/0x40 arch/x86/kernel/traps.c:290  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973 RIP: 0010:__alloc_pages_slowpath+0x1cb1/0x2220 mm/page_alloc.c:4415 Code: 8b 84 24 a8 00 00 00 e9 ea f1 ff ff 85 d2 0f 85 0b 01 00 00 48 c7 c7 c0 5e 55 84 e8 79 f8 23 02 e9 86 f9 ff ff 44 8b 74 24 0c <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 8b 54 24 18 48 c1 ea 03 80 RSP: 0018:ffff8880683fedb8 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1ffff1100d07fda4 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88807ffdd528 RBP: dffffc0000000000 R08: 0000000000000000 R09: 000000000000067a R10: 0000000000000000 R11: ffff88807ffdc487 R12: 0000000000000000 R13: ffff8880683ff010 R14: 0000000000415a00 R15: ffff8880683ff010  __alloc_pages_nodemask+0x521/0x5f0 mm/page_alloc.c:4555  __alloc_pages include/linux/gfp.h:473 [inline]  __alloc_pages_node include/linux/gfp.h:486 [inline]  kmem_getpages mm/slab.c:1398 [inline]  cache_grow_begin+0x95/0x300 mm/slab.c:2666  fallback_alloc+0x1ce/0x270 mm/slab.c:3208  __do_cache_alloc mm/slab.c:3345 [inline]  slab_alloc mm/slab.c:3373 [inline]  kmem_cache_alloc+0x286/0x2f0 mm/slab.c:3541  create_object+0x83/0x880 mm/kmemleak.c:578  kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]  slab_post_alloc_hook mm/slab.h:442 [inline]  slab_alloc mm/slab.c:3381 [inline]  kmem_cache_alloc+0x18f/0x2f0 mm/slab.c:3541  mempool_alloc+0x13e/0x340 mm/mempool.c:385  bio_alloc_bioset+0x36f/0x5d0 block/bio.c:489  bio_alloc include/linux/bio.h:393 [inline]  
submit_bh_wbc.isra.57+0x128/0x680 fs/buffer.c:3061  __block_write_full_page+0x6e8/0xcd0 fs/buffer.c:1765  block_write_full_page+0x202/0x250 fs/buffer.c:2955  pageout mm/vmscan.c:865 [inline]  shrink_page_list+0x220f/0x3800 mm/vmscan.c:1383  shrink_inactive_list+0x3c2/0xaa0 mm/vmscan.c:1961  shrink_list mm/vmscan.c:2273 [inline]  shrink_node_memcg.constprop.83+0x4bf/0x10e0 mm/vmscan.c:2538  shrink_node+0x162/0xd10 mm/vmscan.c:2753  kswapd_shrink_node mm/vmscan.c:3516 [inline]  balance_pgdat+0x47f/0xc00 mm/vmscan.c:3674  kswapd+0x57c/0xde0 mm/vmscan.c:3929  kthread+0x347/0x410 kernel/kthread.c:246  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 Dumping ftrace buffer:    (ftrace buffer empty) Kernel Offset: disabled Rebooting in 86400 seconds.. Syzkaller reproducer: # {Threaded:false Collide:false Repeat:true RepeatTimes:0 Procs:8 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 EnableTun:false UseTmpDir:true EnableCgroups:false EnableNetdev:true ResetNet:false HandleSegv:false Repro:false Trace:false} unshare(0x40000000) C reproducer: // autogenerated by syzkaller (https://github.com/google/syzkaller) #define _GNU_SOURCE #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include unsigned long long procid; static void sleep_ms(uint64_t ms) {   usleep(ms * 1000); } static uint64_t current_time_ms(void) {   struct timespec ts;   if (clock_gettime(CLOCK_MONOTONIC, &ts))     exit(1);   return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000; } static void use_temporary_dir(void) {   char tmpdir_template[] = "./syzkaller.XXXXXX";   char* tmpdir = mkdtemp(tmpdir_template);   if (!tmpdir)     exit(1);   if (chmod(tmpdir, 0777))     exit(1);   if 
(chdir(tmpdir))     exit(1); } static bool write_file(const char* file, const char* what, ...) {   char buf[1024];   va_list args;   va_start(args, what);   vsnprintf(buf, sizeof(buf), what, args);   va_end(args);   buf[sizeof(buf) - 1] = 0;   int len = strlen(buf);   int fd = open(file, O_WRONLY | O_CLOEXEC);   if (fd == -1)     return false;   if (write(fd, buf, len) != len) {     int err = errno;     close(fd);     errno = err;     return false;   }   close(fd);   return true; } static struct {   char* pos;   int nesting;   struct nlattr* nested[8];   char buf[1024]; } nlmsg; static void netlink_init(int typ, int flags, const void* data, int size) {   memset(&nlmsg, 0, sizeof(nlmsg));   struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg.buf;   hdr->nlmsg_type = typ;   hdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;   memcpy(hdr + 1, data, size);   nlmsg.pos = (char*)(hdr + 1) + NLMSG_ALIGN(size); } static void netlink_attr(int typ, const void* data, int size) {   struct nlattr* attr = (struct nlattr*)nlmsg.pos;   attr->nla_len = sizeof(*attr) + size;   attr->nla_type = typ;   memcpy(attr + 1, data, size);   nlmsg.pos += NLMSG_ALIGN(attr->nla_len); } static void netlink_nest(int typ) {   struct nlattr* attr = (struct nlattr*)nlmsg.pos;   attr->nla_type = typ;   nlmsg.pos += sizeof(*attr);   nlmsg.nested[nlmsg.nesting++] = attr; } static void netlink_done(void) {   struct nlattr* attr = nlmsg.nested[--nlmsg.nesting];   attr->nla_len = nlmsg.pos - (char*)attr; } static int netlink_send(int sock) {   if (nlmsg.pos > nlmsg.buf + sizeof(nlmsg.buf) || nlmsg.nesting)     exit(1);   struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg.buf;   hdr->nlmsg_len = nlmsg.pos - nlmsg.buf;   struct sockaddr_nl addr;   memset(&addr, 0, sizeof(addr));   addr.nl_family = AF_NETLINK;   unsigned n = sendto(sock, nlmsg.buf, hdr->nlmsg_len, 0,                       (struct sockaddr*)&addr, sizeof(addr));   if (n != hdr->nlmsg_len)     exit(1);   n = recv(sock, nlmsg.buf, sizeof(nlmsg.buf), 
0);   if (n < sizeof(struct nlmsghdr) + sizeof(struct nlmsgerr))     exit(1);   if (hdr->nlmsg_type != NLMSG_ERROR)     exit(1);   return -((struct nlmsgerr*)(hdr + 1))->error; } static void netlink_add_device_impl(const char* type, const char* name) {   struct ifinfomsg hdr;   memset(&hdr, 0, sizeof(hdr));   netlink_init(RTM_NEWLINK, NLM_F_EXCL | NLM_F_CREATE, &hdr, sizeof(hdr));   if (name)     netlink_attr(IFLA_IFNAME, name, strlen(name));   netlink_nest(IFLA_LINKINFO);   netlink_attr(IFLA_INFO_KIND, type, strlen(type)); } static void netlink_add_device(int sock, const char* type, const char* name) {   netlink_add_device_impl(type, name);   netlink_done();   int err = netlink_send(sock);   (void)err; } static void netlink_add_veth(int sock, const char* name, const char* peer) {   netlink_add_device_impl("veth", name);   netlink_nest(IFLA_INFO_DATA);   netlink_nest(VETH_INFO_PEER);   nlmsg.pos += sizeof(struct ifinfomsg);   netlink_attr(IFLA_IFNAME, peer, strlen(peer));   netlink_done();   netlink_done();   netlink_done();   int err = netlink_send(sock);   (void)err; } static void netlink_add_hsr(int sock, const char* name, const char* slave1,                             const char* slave2) {   netlink_add_device_impl("hsr", name);   netlink_nest(IFLA_INFO_DATA);   int ifindex1 = if_nametoindex(slave1);   netlink_attr(IFLA_HSR_SLAVE1, &ifindex1, sizeof(ifindex1));   int ifindex2 = if_nametoindex(slave2);   netlink_attr(IFLA_HSR_SLAVE2, &ifindex2, sizeof(ifindex2));   netlink_done();   netlink_done();   int err = netlink_send(sock);   (void)err; } static void netlink_device_change(int sock, const char* name, bool up,                                   const char* master, const void* mac,                                   int macsize) {   struct ifinfomsg hdr;   memset(&hdr, 0, sizeof(hdr));   if (up)     hdr.ifi_flags = hdr.ifi_change = IFF_UP;   netlink_init(RTM_NEWLINK, 0, &hdr, sizeof(hdr));   netlink_attr(IFLA_IFNAME, name, strlen(name));   if (master) {     
    int ifindex = if_nametoindex(master);
    netlink_attr(IFLA_MASTER, &ifindex, sizeof(ifindex));
  }
  if (macsize)
    netlink_attr(IFLA_ADDRESS, mac, macsize);
  int err = netlink_send(sock);
  (void)err;
}

static int netlink_add_addr(int sock, const char* dev, const void* addr,
                            int addrsize)
{
  struct ifaddrmsg hdr;
  memset(&hdr, 0, sizeof(hdr));
  hdr.ifa_family = addrsize == 4 ? AF_INET : AF_INET6;
  hdr.ifa_prefixlen = addrsize == 4 ? 24 : 120;
  hdr.ifa_scope = RT_SCOPE_UNIVERSE;
  hdr.ifa_index = if_nametoindex(dev);
  netlink_init(RTM_NEWADDR, NLM_F_CREATE | NLM_F_REPLACE, &hdr, sizeof(hdr));
  netlink_attr(IFA_LOCAL, addr, addrsize);
  netlink_attr(IFA_ADDRESS, addr, addrsize);
  return netlink_send(sock);
}

static void netlink_add_addr4(int sock, const char* dev, const char* addr)
{
  struct in_addr in_addr;
  inet_pton(AF_INET, addr, &in_addr);
  int err = netlink_add_addr(sock, dev, &in_addr, sizeof(in_addr));
  (void)err;
}

static void netlink_add_addr6(int sock, const char* dev, const char* addr)
{
  struct in6_addr in6_addr;
  inet_pton(AF_INET6, addr, &in6_addr);
  int err = netlink_add_addr(sock, dev, &in6_addr, sizeof(in6_addr));
  (void)err;
}

#define DEV_IPV4 "172.20.20.%d"
#define DEV_IPV6 "fe80::%02hx"
#define DEV_MAC 0x00aaaaaaaaaa
static void initialize_netdevices(void)
{
  char netdevsim[16];
  sprintf(netdevsim, "netdevsim%d", (int)procid);
  struct {
    const char* type;
    const char* dev;
  } devtypes[] = {
      {"ip6gretap", "ip6gretap0"}, {"bridge", "bridge0"},
      {"vcan", "vcan0"},           {"bond", "bond0"},
      {"team", "team0"},           {"dummy", "dummy0"},
      {"nlmon", "nlmon0"},         {"caif", "caif0"},
      {"batadv", "batadv0"},       {"vxcan", "vxcan1"},
      {"netdevsim", netdevsim},    {"veth", 0},
  };
  const char* devmasters[] = {"bridge", "bond", "team"};
  struct {
    const char* name;
    int macsize;
    bool noipv6;
  } devices[] = {
      {"lo", ETH_ALEN},
      {"sit0", 0},
      {"bridge0", ETH_ALEN},
      {"vcan0", 0, true},
      {"tunl0", 0},
      {"gre0", 0},
      {"gretap0", ETH_ALEN},
      {"ip_vti0", 0},
      {"ip6_vti0", 0},
      {"ip6tnl0", 0},
      {"ip6gre0", 0},
      {"ip6gretap0", ETH_ALEN},
      {"erspan0", ETH_ALEN},
      {"bond0", ETH_ALEN},
      {"veth0", ETH_ALEN},
      {"veth1", ETH_ALEN},
      {"team0", ETH_ALEN},
      {"veth0_to_bridge", ETH_ALEN},
      {"veth1_to_bridge", ETH_ALEN},
      {"veth0_to_bond", ETH_ALEN},
      {"veth1_to_bond", ETH_ALEN},
      {"veth0_to_team", ETH_ALEN},
      {"veth1_to_team", ETH_ALEN},
      {"veth0_to_hsr", ETH_ALEN},
      {"veth1_to_hsr", ETH_ALEN},
      {"hsr0", 0},
      {"dummy0", ETH_ALEN},
      {"nlmon0", 0},
      {"vxcan1", 0, true},
      {"caif0", ETH_ALEN},
      {"batadv0", ETH_ALEN},
      {netdevsim, ETH_ALEN},
  };
  int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
  if (sock == -1)
    exit(1);
  unsigned i;
  for (i = 0; i < sizeof(devtypes) / sizeof(devtypes[0]); i++)
    netlink_add_device(sock, devtypes[i].type, devtypes[i].dev);
  for (i = 0; i < sizeof(devmasters) / (sizeof(devmasters[0])); i++) {
    char master[32], slave0[32], veth0[32], slave1[32], veth1[32];
    sprintf(slave0, "%s_slave_0", devmasters[i]);
    sprintf(veth0, "veth0_to_%s", devmasters[i]);
    netlink_add_veth(sock, slave0, veth0);
    sprintf(slave1, "%s_slave_1", devmasters[i]);
    sprintf(veth1, "veth1_to_%s", devmasters[i]);
    netlink_add_veth(sock, slave1, veth1);
    sprintf(master, "%s0", devmasters[i]);
    netlink_device_change(sock, slave0, false, master, 0, 0);
    netlink_device_change(sock, slave1, false, master, 0, 0);
  }
  netlink_device_change(sock, "bridge_slave_0", true, 0, 0, 0);
  netlink_device_change(sock, "bridge_slave_1", true, 0, 0, 0);
  netlink_add_veth(sock, "hsr_slave_0", "veth0_to_hsr");
  netlink_add_veth(sock, "hsr_slave_1", "veth1_to_hsr");
  netlink_add_hsr(sock, "hsr0", "hsr_slave_0", "hsr_slave_1");
  netlink_device_change(sock, "hsr_slave_0", true, 0, 0, 0);
  netlink_device_change(sock, "hsr_slave_1", true, 0, 0, 0);
  for (i = 0; i < sizeof(devices) / (sizeof(devices[0])); i++) {
    char addr[32];
    sprintf(addr, DEV_IPV4, i + 10);
    netlink_add_addr4(sock, devices[i].name, addr);
    if (!devices[i].noipv6) {
      sprintf(addr, DEV_IPV6, i + 10);
      netlink_add_addr6(sock, devices[i].name, addr);
    }
    uint64_t macaddr = DEV_MAC + ((i + 10ull) << 40);
    netlink_device_change(sock, devices[i].name, true, 0, &macaddr,
                          devices[i].macsize);
  }
  close(sock);
}

static void initialize_netdevices_init(void)
{
  int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
  if (sock == -1)
    exit(1);
  struct {
    const char* type;
    int macsize;
    bool noipv6;
    bool noup;
  } devtypes[] = {
      {"nr", 7, true}, {"rose", 5, true, true},
  };
  unsigned i;
  for (i = 0; i < sizeof(devtypes) / sizeof(devtypes[0]); i++) {
    char dev[32], addr[32];
    sprintf(dev, "%s%d", devtypes[i].type, (int)procid);
    sprintf(addr, "172.30.%d.%d", i, (int)procid + 1);
    netlink_add_addr4(sock, dev, addr);
    if (!devtypes[i].noipv6) {
      sprintf(addr, "fe88::%02hx:%02hx", i, (int)procid + 1);
      netlink_add_addr6(sock, dev, addr);
    }
    int macsize = devtypes[i].macsize;
    uint64_t macaddr = 0xbbbbbb +
                       ((unsigned long long)i << (8 * (macsize - 2))) +
                       (procid << (8 * (macsize - 1)));
    netlink_device_change(sock, dev, !devtypes[i].noup, 0, &macaddr, macsize);
  }
  close(sock);
}

static void setup_common()
{
  if (mount(0, "/sys/fs/fuse/connections", "fusectl", 0, 0)) {
  }
}

static void loop();

static void sandbox_common()
{
  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
  setpgrp();
  setsid();
  struct rlimit rlim;
  rlim.rlim_cur = rlim.rlim_max = 200 << 20;
  setrlimit(RLIMIT_AS, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 32 << 20;
  setrlimit(RLIMIT_MEMLOCK, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 136 << 20;
  setrlimit(RLIMIT_FSIZE, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 1 << 20;
  setrlimit(RLIMIT_STACK, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 0;
  setrlimit(RLIMIT_CORE, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 256;
  setrlimit(RLIMIT_NOFILE, &rlim);
  if (unshare(CLONE_NEWNS)) {
  }
  if (unshare(CLONE_NEWIPC)) {
  }
  if (unshare(0x02000000)) {
  }
  if (unshare(CLONE_NEWUTS)) {
  }
  if (unshare(CLONE_SYSVSEM)) {
  }
  typedef struct {
    const char* name;
    const char* value;
  } sysctl_t;
  static const sysctl_t sysctls[] = {
      {"/proc/sys/kernel/shmmax", "16777216"},
      {"/proc/sys/kernel/shmall", "536870912"},
      {"/proc/sys/kernel/shmmni", "1024"},
      {"/proc/sys/kernel/msgmax", "8192"},
      {"/proc/sys/kernel/msgmni", "1024"},
      {"/proc/sys/kernel/msgmnb", "1024"},
      {"/proc/sys/kernel/sem", "1024 1048576 500 1024"},
  };
  unsigned i;
  for (i = 0; i < sizeof(sysctls) / sizeof(sysctls[0]); i++)
    write_file(sysctls[i].name, sysctls[i].value);
}

int wait_for_loop(int pid)
{
  if (pid < 0)
    exit(1);
  int status = 0;
  while (waitpid(-1, &status, __WALL) != pid) {
  }
  return WEXITSTATUS(status);
}

static int do_sandbox_none(void)
{
  if (unshare(CLONE_NEWPID)) {
  }
  int pid = fork();
  if (pid != 0)
    return wait_for_loop(pid);
  setup_common();
  sandbox_common();
  initialize_netdevices_init();
  if (unshare(CLONE_NEWNET)) {
  }
  initialize_netdevices();
  loop();
  exit(1);
}

#define FS_IOC_SETFLAGS _IOW('f', 2, long)
static void remove_dir(const char* dir)
{
  DIR* dp;
  struct dirent* ep;
  int iter = 0;
retry:
  while (umount2(dir, MNT_DETACH) == 0) {
  }
  dp = opendir(dir);
  if (dp == NULL) {
    if (errno == EMFILE) {
      exit(1);
    }
    exit(1);
  }
  while ((ep = readdir(dp))) {
    if (strcmp(ep->d_name, ".") == 0 || strcmp(ep->d_name, "..") == 0)
      continue;
    char filename[FILENAME_MAX];
    snprintf(filename, sizeof(filename), "%s/%s", dir, ep->d_name);
    while (umount2(filename, MNT_DETACH) == 0) {
    }
    struct stat st;
    if (lstat(filename, &st))
      exit(1);
    if (S_ISDIR(st.st_mode)) {
      remove_dir(filename);
      continue;
    }
    int i;
    for (i = 0;; i++) {
      if (unlink(filename) == 0)
        break;
      if (errno == EPERM) {
        int fd = open(filename, O_RDONLY);
        if (fd != -1) {
          long flags = 0;
          if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0)
            close(fd);
          continue;
        }
      }
      if (errno == EROFS) {
        break;
      }
      if (errno != EBUSY || i > 100)
        exit(1);
      if (umount2(filename, MNT_DETACH))
        exit(1);
    }
  }
  closedir(dp);
  int i;
  for (i = 0;; i++) {
    if (rmdir(dir) == 0)
      break;
    if (i < 100) {
      if (errno == EPERM) {
        int fd = open(dir, O_RDONLY);
        if (fd != -1) {
          long flags = 0;
          if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0)
            close(fd);
          continue;
        }
      }
      if (errno == EROFS) {
        break;
      }
      if (errno == EBUSY) {
        if (umount2(dir, MNT_DETACH))
          exit(1);
        continue;
      }
      if (errno == ENOTEMPTY) {
        if (iter < 100) {
          iter++;
          goto retry;
        }
      }
    }
    exit(1);
  }
}

static void kill_and_wait(int pid, int* status)
{
  kill(-pid, SIGKILL);
  kill(pid, SIGKILL);
  int i;
  for (i = 0; i < 100; i++) {
    if (waitpid(-1, status, WNOHANG | __WALL) == pid)
      return;
    usleep(1000);
  }
  DIR* dir = opendir("/sys/fs/fuse/connections");
  if (dir) {
    for (;;) {
      struct dirent* ent = readdir(dir);
      if (!ent)
        break;
      if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
        continue;
      char abort[300];
      snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
               ent->d_name);
      int fd = open(abort, O_WRONLY);
      if (fd == -1) {
        continue;
      }
      if (write(fd, abort, 1) < 0) {
      }
      close(fd);
    }
    closedir(dir);
  } else {
  }
  while (waitpid(-1, status, __WALL) != pid) {
  }
}

#define SYZ_HAVE_SETUP_TEST 1
static void setup_test()
{
  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
  setpgrp();
}

#define SYZ_HAVE_RESET_TEST 1
static void reset_test()
{
  int fd;
  for (fd = 3; fd < 30; fd++)
    close(fd);
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
  int iter;
  for (iter = 0;; iter++) {
    char cwdbuf[32];
    sprintf(cwdbuf, "./%d", iter);
    if (mkdir(cwdbuf, 0777))
      exit(1);
    int pid = fork();
    if (pid < 0)
      exit(1);
    if (pid == 0) {
      if (chdir(cwdbuf))
        exit(1);
      setup_test();
      execute_one();
      reset_test();
      exit(0);
    }
    int status = 0;
    uint64_t start = current_time_ms();
    for (;;) {
      if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
        break;
      sleep_ms(1);
      if (current_time_ms() - start < 5 * 1000)
        continue;
      kill_and_wait(pid, &status);
      break;
    }
    remove_dir(cwdbuf);
  }
}

void execute_one(void)
{
  syscall(__NR_unshare, 0x40000000);
}
int main(void)
{
  syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
  for (procid = 0; procid < 8; procid++) {
    if (fork() == 0) {
      use_temporary_dir();
      do_sandbox_none();
    }
  }
  sleep(1000000);
  return 0;
}

I reviewed the kernel code and found a bug: net_drop_ns() does not call
net_free() when refcount_dec_and_test() returns zero. Alternatively, when
rv = down_read_killable(&pernet_ops_rwsem) is negative, there is no need
to call refcount_dec_and_test() at all.

https://github.com/torvalds/linux/commit/5ba049a5cc8e24a1643df75bbf65b4efa070fa74#diff-9312644e2968a45510bacdd2b2872ad2

(I can't reproduce this bug on v4.15, or on
1bdbe227492075d058e37cb3d400e6468d0095b5 with my patch applied, because
the earlier kernel version doesn't have this bug.)
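
To make the reference-count reasoning above concrete, here is a small
userspace sketch in C. It is not kernel code: struct fake_net,
fake_dec_and_test(), fake_net_drop() and frees_after_one_drop() are
hypothetical names invented for illustration, and C11 atomics stand in
for the kernel's refcount_t. The only behavior it assumes is that a
dec-and-test helper returns true when the count reaches zero, so a drop
routine shaped like net_drop_ns() frees the object only when the caller
held the last reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net: only a passive refcount. */
struct fake_net {
  atomic_int passive;
};

/* Number of fake_net objects released so far (demo bookkeeping only). */
static int fake_net_freed;

/* Userspace analogue of a dec-and-test helper: drop one reference and
 * return true only if that was the last one. */
static bool fake_dec_and_test(atomic_int* r)
{
  int old = atomic_fetch_sub(r, 1);
  assert(old > 0); /* dropping a reference nobody holds is a bug */
  return old == 1;
}

/* Allocate an object that starts with a single reference. */
static struct fake_net* fake_net_alloc(void)
{
  struct fake_net* n = malloc(sizeof(*n));
  if (n)
    atomic_init(&n->passive, 1);
  return n;
}

/* Same shape as net_drop_ns(): free only when the count hits zero. */
static void fake_net_drop(struct fake_net* n)
{
  if (n && fake_dec_and_test(&n->passive)) {
    free(n);
    fake_net_freed++;
  }
}

/* Take `extra` additional references, drop once, and report how many
 * objects that single drop freed (0 or 1). Returns -1 if malloc fails. */
static int frees_after_one_drop(int extra)
{
  struct fake_net* n = fake_net_alloc();
  if (!n)
    return -1;
  for (int i = 0; i < extra; i++)
    atomic_fetch_add(&n->passive, 1);
  int before = fake_net_freed;
  fake_net_drop(n);
  int freed = fake_net_freed - before;
  /* Release the remaining references so the demo itself does not leak. */
  for (int i = 0; i < extra; i++)
    fake_net_drop(n);
  return freed;
}
```

With these definitions, frees_after_one_drop(0) returns 1 (the single
reference was the last one, so the object is freed), while
frees_after_one_drop(1) returns 0 (another holder remains, so one drop
releases nothing). Whether copy_net_ns() can actually reach its error
path holding more than one passive reference is not something this
sketch shows.
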

This bug can lead to a memory leak or a denial of service (DoS).
I made a patch for it; it simply reverts to the code from before that commit.

Also, sorry for my encrypted mails.

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index b02fb19df2cc..9de0ade14956 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -431,15 +431,18 @@ struct net *copy_net_ns(unsigned long flags,
        get_user_ns(user_ns);

        rv = down_read_killable(&pernet_ops_rwsem);
-       if (rv < 0)
-               goto put_userns;
+       if (rv < 0){
+        net_free(net);
+        dec_net_namespaces(ucounts);
+        put_user_ns(user_ns);
+        return ERR_PTR(rv);
+    }

        rv = setup_net(net, user_ns);

        up_read(&pernet_ops_rwsem);

        if (rv < 0) {
-put_userns:
                put_user_ns(user_ns);
                net_drop_ns(net);
 dec_ucounts: