From patchwork Tue Feb 11 17:43:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Henrique Cerri X-Patchwork-Id: 1236400 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 48H9BR6R91z9sRN; Wed, 12 Feb 2020 04:44:15 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1j1ZZz-00055f-PD; Tue, 11 Feb 2020 17:44:11 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1j1ZZx-00054d-CQ for kernel-team@lists.ubuntu.com; Tue, 11 Feb 2020 17:44:09 +0000 Received: from mail-qv1-f70.google.com ([209.85.219.70]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1j1ZZw-0003ci-WC for kernel-team@lists.ubuntu.com; Tue, 11 Feb 2020 17:44:09 +0000 Received: by mail-qv1-f70.google.com with SMTP id cn2so7677380qvb.1 for ; Tue, 11 Feb 2020 09:44:08 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=L+Ul0Y0O+lP/5zbG/654jCB/7DBP4e2w3/yBwgPYdU0=; b=NBY//th53LuZHDgxlKEd7e9sm/HDZ7AYBbn4O3KePh1s/HKL569RD0kfHJP74A/k6o 6TLK7nFEjiMwhK6zL5FmMbRIFa34/VCECGDsG9SsmXx43sJXrp926/+N7cZ6eORBzkrF BOhTjoQDlAeYCgL0jvYb7M6Nomn1GKPIb8BoXyAsK3/QHMjqy5rBFLbj5L83my70HvQ5 Z7h947JQeP5B+4povyOs+bWU/LwcP+IY8nWyLJhkxvQYCM105owgbCCdlm0tc8k+GbaH ++ygnLC3rFaBE97KMkXQmSoBvDTUoaIrvYaB1K5p2LCe/hpXtKMFMvGNj49uwty3L/Na A8tA== X-Gm-Message-State: APjAAAXscU7oK53Nd3f7Mv/V+wLEPNCWthaD+7aGqK8FQluwlF2lJqnk tgfXKiCl/6fSYMGyTLKv1N6rFNkqBObIxJRI3noHg6yGDfQTvlSL6lRrqihPwhUbL+sJt29N6i7 jgqmCR3MLPo2iULRtKGVmMC/2JsASSlqwjRpHJ3vR X-Received: by 2002:ac8:3f88:: with SMTP id d8mr3497558qtk.382.1581443047635; Tue, 11 Feb 2020 09:44:07 -0800 (PST) X-Google-Smtp-Source: APXvYqytChUGlIMctOSlzb7BagMma8LZhvBCbKnQA2jukFBML6qooj1P+LnOsopkl0FgFqfkNFu3kQ== X-Received: by 2002:ac8:3f88:: with SMTP id d8mr3497535qtk.382.1581443047274; Tue, 11 Feb 2020 09:44:07 -0800 (PST) Received: from gallifrey.lan ([2804:14c:4e6:34d:c976:c9a5:3dcb:863]) by smtp.gmail.com with ESMTPSA id w9sm2343433qka.71.2020.02.11.09.44.05 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Feb 2020 09:44:06 -0800 (PST) From: Marcelo Henrique Cerri To: kernel-team@lists.ubuntu.com Subject: [bionic:linux][PATCH 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock Date: Tue, 11 Feb 2020 14:43:59 -0300 Message-Id: <20200211174359.12230-3-marcelo.cerri@canonical.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200211174359.12230-1-marcelo.cerri@canonical.com> References: <20200211174359.12230-1-marcelo.cerri@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: David Hildenbrand BugLink: https://bugs.launchpad.net/bugs/1862312 There seem to be some problems as result of 30467e0b3be ("mm, hotplug: fix concurrent memory hot-add deadlock"), which tried to fix a possible lock inversion reported and discussed in [1] due to the two locks a) device_lock() b) mem_hotplug_lock While add_memory() first takes b), followed by a) during bus_probe_device(), onlining of memory from user space first took a), followed by b), exposing a possible deadlock. In [1], and it was decided to not make use of device_hotplug_lock, but rather to enforce a locking order. The problems I spotted related to this: 1. Memory block device attributes: While .state first calls mem_hotplug_begin() and the calls device_online() - which takes device_lock() - .online does no longer call mem_hotplug_begin(), so effectively calls online_pages() without mem_hotplug_lock. 2. device_online() should be called under device_hotplug_lock, however onlining memory during add_memory() does not take care of that. In addition, I think there is also something wrong about the locking in 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages() without locks. This was introduced after 30467e0b3be. And skimming over the code, I assume it could need some more care in regards to locking (e.g. device_online() called without device_hotplug_lock. This will be addressed in the following patches. Now that we hold the device_hotplug_lock when - adding memory (e.g. via add_memory()/add_memory_resource()) - removing memory (e.g. via remove_memory()) - device_online()/device_offline() We can move mem_hotplug_lock usage back into online_pages()/offline_pages(). Why is mem_hotplug_lock still needed? Essentially to make get_online_mems()/put_online_mems() be very fast (relying on device_hotplug_lock would be very slow), and to serialize against addition of memory that does not create memory block devices (hmm). [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/ 2015-February/065324.html This patch is partly based on a patch by Vitaly Kuznetsov. Link: http://lkml.kernel.org/r/20180925091457.28651-4-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Pavel Tatashin Reviewed-by: Rashmica Gupta Reviewed-by: Oscar Salvador Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Greg Kroah-Hartman Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Rashmica Gupta Cc: Michael Neuling Cc: Balbir Singh Cc: Kate Stewart Cc: Thomas Gleixner Cc: Philippe Ombredanne Cc: Pavel Tatashin Cc: Vlastimil Babka Cc: Dan Williams Cc: Oscar Salvador Cc: YASUAKI ISHIMATSU Cc: Mathieu Malaterre Cc: John Allen Cc: Jonathan Corbet Cc: Joonsoo Kim Cc: Nathan Fontenot Cc: Rafael J. Wysocki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds (backported from commit 381eab4a6ee81266f8dddc62e57376c7e584e5b8) [marcelo.cerri@canonical.com: Fixed context in online_pages()] Signed-off-by: Marcelo Henrique Cerri --- drivers/base/memory.c | 13 +------------ mm/memory_hotplug.c | 28 ++++++++++++++++++++-------- 2 files changed, 21 insertions(+), 20 deletions(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index fe1557aa9b10..a3b8aebb95cb 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -224,7 +224,6 @@ static bool pages_correctly_reserved(unsigned long start_pfn) /* * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is * OK to have direct references to sparsemem variables in here. - * Must already be protected by mem_hotplug_begin(). */ static int memory_block_action(unsigned long phys_index, unsigned long action, int online_type) @@ -290,7 +289,6 @@ static int memory_subsys_online(struct device *dev) if (mem->online_type < 0) mem->online_type = MMOP_ONLINE_KEEP; - /* Already under protection of mem_hotplug_begin() */ ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE); /* clear online_type */ @@ -337,19 +335,11 @@ store_mem_state(struct device *dev, goto err; } - /* - * Memory hotplug needs to hold mem_hotplug_begin() for probe to find - * the correct memory block to online before doing device_online(dev), - * which will take dev->mutex. Take the lock early to prevent an - * inversion, memory_subsys_online() callbacks will be implemented by - * assuming it's already protected. - */ - mem_hotplug_begin(); - switch (online_type) { case MMOP_ONLINE_KERNEL: case MMOP_ONLINE_MOVABLE: case MMOP_ONLINE_KEEP: + /* mem->online_type is protected by device_hotplug_lock */ mem->online_type = online_type; ret = device_online(&mem->dev); break; @@ -360,7 +350,6 @@ store_mem_state(struct device *dev, ret = -EINVAL; /* should never happen */ } - mem_hotplug_done(); err: unlock_device_hotplug(); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 6f9e895ce21a..7dd8b84ea849 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -881,7 +881,6 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid, return zone; } -/* Must be protected by mem_hotplug_begin() or a device_lock */ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type) { unsigned long flags; @@ -892,6 +891,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ int ret; struct memory_notify arg; + mem_hotplug_begin(); + nid = pfn_to_nid(pfn); /* associate pfn range with the zone */ zone = move_pfn_range(online_type, nid, pfn, nr_pages); @@ -950,6 +951,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ if (onlined_pages) memory_notify(MEM_ONLINE, &arg); + mem_hotplug_done(); return 0; failed_addition: @@ -957,6 +959,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ (unsigned long long) pfn << PAGE_SHIFT, (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); memory_notify(MEM_CANCEL_ONLINE, &arg); + mem_hotplug_done(); return ret; } #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */ @@ -1167,20 +1170,20 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online) /* create new memmap entry */ firmware_map_add_hotplug(start, start + size, "System RAM"); + /* device_online() will take the lock when calling online_pages() */ + mem_hotplug_done(); + /* online pages if requested */ if (online) walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL, online_memory_block); - goto out; - + return ret; error: /* rollback pgdat allocation and others */ if (new_pgdat && pgdat) rollback_node_hotadd(nid, pgdat); memblock_remove(start, size); - -out: mem_hotplug_done(); return ret; } @@ -1621,10 +1624,16 @@ static int __ref __offline_pages(unsigned long start_pfn, return -EINVAL; if (!IS_ALIGNED(end_pfn, pageblock_nr_pages)) return -EINVAL; + + mem_hotplug_begin(); + /* This makes hotplug much easier...and readable. we assume this for now. .*/ - if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end)) + if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, + &valid_end)) { + mem_hotplug_done(); return -EINVAL; + } zone = page_zone(pfn_to_page(valid_start)); node = zone_to_nid(zone); @@ -1633,8 +1642,10 @@ static int __ref __offline_pages(unsigned long start_pfn, /* set above range as isolated */ ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, true); - if (ret) + if (ret) { + mem_hotplug_done(); return ret; + } arg.start_pfn = start_pfn; arg.nr_pages = nr_pages; @@ -1705,6 +1716,7 @@ static int __ref __offline_pages(unsigned long start_pfn, writeback_set_ratelimit(); memory_notify(MEM_OFFLINE, &arg); + mem_hotplug_done(); return 0; failed_removal: @@ -1714,10 +1726,10 @@ static int __ref __offline_pages(unsigned long start_pfn, memory_notify(MEM_CANCEL_OFFLINE, &arg); /* pushback to free area */ undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); + mem_hotplug_done(); return ret; } -/* Must be protected by mem_hotplug_begin() or a device_lock */ int offline_pages(unsigned long start_pfn, unsigned long nr_pages) { return __offline_pages(start_pfn, start_pfn + nr_pages);