diff mbox series

[SRU,J,1/2] device-dax: Fix duplicate 'hmem' device registration

Message ID 20230922170910.12844-2-michael.reed@canonical.com
State New
Headers show
Series Duplicate Device_dax ids Created and hence Probing is Failing | expand

Commit Message

Michael Reed Sept. 22, 2023, 5:09 p.m. UTC
From: Dan Williams <dan.j.williams@intel.com>

BugLink: https://bugs.launchpad.net/bugs/2028158

So called "soft-reserved" memory is an EFI conventional memory range
with the EFI_MEMORY_SP attribute set. That attribute indicates that the
memory is not part of the platform general purpose memory pool and may
want some consideration from the system administrator about whether to
keep that memory set aside for dedicated access through device-dax (map
a device file), or assigned to the page allocator as another general
purpose memory node target.

Absent an ACPI HMAT table the default device-dax registration creates
coarse grained devices that are delineated by EFI Memory Map entries.
With the HMAT the devices are delineated by the finer grained ranges
associated with the proximity domain of the memory target. I.e. the HMAT
describes the properties of performance differentiated memory and each
unique performance description results in a unique target proximity
domain where each memory proximity domain has an associated SRAT entry
that delineates the address range.

The intent was that SRAT-defined device-dax instances are registered
first. Then any left-over address range with the EFI_MEMORY_SP
attribute, but not covered by the SRAT, would have a coarse grained
device-dax instance established. However, the scheme to detect what
ranges are left to be assigned to a device was buggy and resulted in
multiple overlapping device-dax instances. Fix this by using explicit
tracking for which ranges have been handled.

Now, this new approach may leave memory stranded in the presence of
broken platform firmware that fails to fully describe all EFI_MEMORY_SP
ranges in the HMAT. That requires a deeper fix if it becomes a problem
in practice.

Reported-by: "Tallam Mahendra Kumar" <tallam.mahendra.kumar@intel.com>
Reported-by: Mustafa Hajeer <mustafa.hajeer@intel.com>
Debugged-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166890823379.4183293.15333502171004313377.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
(backported from commit 472faf72b33d80aa8e7a99c9410c1a23d3bf0cd8)
[Michael Reed - Simple adjustment for location of patch]
Signed-off-by: Michael Reed <Michael.Reed@canonical.com>
---
 drivers/dax/hmem/device.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)
diff mbox series

Patch

diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index acf31cc1dbcc..d0f897c7f52a 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -8,6 +8,13 @@ 
 static bool nohmem;
 module_param_named(disable, nohmem, bool, 0444);
 
+static struct resource hmem_active = {
+	.name = "HMEM devices",
+	.start = 0,
+	.end = -1,
+	.flags = IORESOURCE_MEM,
+};
+
 void hmem_register_device(int target_nid, struct resource *r)
 {
 	/* define a clean / non-busy resource for the platform device */
@@ -41,6 +48,12 @@  void hmem_register_device(int target_nid, struct resource *r)
 		goto out_pdev;
 	}
 
+	if (!__request_region(&hmem_active, res.start, resource_size(&res),
+			      dev_name(&pdev->dev), 0)) {
+		dev_dbg(&pdev->dev, "hmem range %pr already active\n", &res);
+		goto out_active;
+	}
+
 	pdev->dev.numa_node = numa_map_to_online_node(target_nid);
 	info = (struct memregion_info) {
 		.target_node = target_nid,
@@ -66,22 +79,15 @@  void hmem_register_device(int target_nid, struct resource *r)
 	return;
 
 out_resource:
-	put_device(&pdev->dev);
+	__release_region(&hmem_active, res.start, resource_size(&res));
+out_active:
+	platform_device_put(pdev);
 out_pdev:
 	memregion_free(id);
 }
 
 static __init int hmem_register_one(struct resource *res, void *data)
 {
-	/*
-	 * If the resource is not a top-level resource it was already
-	 * assigned to a device by the HMAT parsing.
-	 */
-	if (res->parent != &iomem_resource) {
-		pr_info("HMEM: skip %pr, already claimed\n", res);
-		return 0;
-	}
-
 	hmem_register_device(phys_to_target_node(res->start), res);
 
 	return 0;