diff mbox series

[v2] PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology

Message ID 1642560329-5012-1-git-send-email-longli@linuxonhyperv.com
State New
Headers show
Series [v2] PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology | expand

Commit Message

Long Li Jan. 19, 2022, 2:45 a.m. UTC
From: Long Li <longli@microsoft.com>

When kernel boots with a NUMA topology with some NUMA nodes offline, the PCI
driver should only set an online NUMA node on the device. This can happen
during KDUMP where some NUMA nodes are not made online by the KDUMP kernel.

This patch also fixes the case where kernel is booting with "numa=off".

Signed-off-by: Long Li <longli@microsoft.com>

Change from v1:
Use numa_map_to_online_node() to assign a node to device (suggested by
Michael Kelly <mikelley@microsoft.com>)

---
 drivers/pci/controller/pci-hyperv.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Michael Kelley (LINUX) Jan. 19, 2022, 4:40 a.m. UTC | #1
From: longli@linuxonhyperv.com <longli@linuxonhyperv.com> Sent: Tuesday, January 18, 2022 6:45 PM
> 
> When kernel boots with a NUMA topology with some NUMA nodes offline, the PCI
> driver should only set an online NUMA node on the device. This can happen
> during KDUMP where some NUMA nodes are not made online by the KDUMP kernel.
> 
> This patch also fixes the case where kernel is booting with "numa=off".
> 
> Signed-off-by: Long Li <longli@microsoft.com>

It seems like adding a "Fixes:" tag would be appropriate.

> 
> Change from v1:
> Use numa_map_to_online_node() to assign a node to device (suggested by
> Michael Kelly <mikelley@microsoft.com>)
> 
> ---
>  drivers/pci/controller/pci-hyperv.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 6c9efeefae1b..c7519add6f13 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2130,7 +2130,15 @@ static void hv_pci_assign_numa_node(struct
> hv_pcibus_device *hbus)
>  			continue;
> 
>  		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
> -			set_dev_node(&dev->dev, hv_dev->desc.virtual_numa_node);
> +			/*
> +			 * The kernel may boot with some NUMA nodes offline
> +			 * (e.g. in a KDUMP kernel) or with NUMA disabled via
> +			 * "numa=off". In those cases, adjust the host provided
> +			 * NUMA node to a valid NUMA node used by the kernel.
> +			 */
> +			set_dev_node(&dev->dev,
> +				     numa_map_to_online_node(
> +					     hv_dev->desc.virtual_numa_node));

Double-check me, but I think this approach has a flaw in that
numa_map_to_online_node() doesn't check the input node for being
out-of-range.  The call to node_online() uses the input node to index into
the node_states bitmap (of type nodemask_t), and could go off the end of
the bitmap if the input node isn't validated to be less than MAX_NUMNODES
(or nr_node_ids).

Michael

> 
>  		put_pcichild(hv_dev);
>  	}
> --
> 2.25.1
Long Li Jan. 19, 2022, 5:56 a.m. UTC | #2
> Subject: RE: [Patch v2] PCI: hv: Fix NUMA node assignment when kernel boots
> with custom NUMA topology
> 
> From: longli@linuxonhyperv.com <longli@linuxonhyperv.com> Sent: Tuesday,
> January 18, 2022 6:45 PM
> >
> > When kernel boots with a NUMA topology with some NUMA nodes offline,
> > the PCI driver should only set an online NUMA node on the device. This
> > can happen during KDUMP where some NUMA nodes are not made online by
> the KDUMP kernel.
> >
> > This patch also fixes the case where kernel is booting with "numa=off".
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> 
> It seems like adding a "Fixes:" tag would be appropriate.

Okay, sending v3 with "Fixes:".

> 
> >
> > Change from v1:
> > Use numa_map_to_online_node() to assign a node to device (suggested by
> > Michael Kelly <mikelley@microsoft.com>)
> >
> > ---
> >  drivers/pci/controller/pci-hyperv.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c
> > b/drivers/pci/controller/pci-hyperv.c
> > index 6c9efeefae1b..c7519add6f13 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -2130,7 +2130,15 @@ static void hv_pci_assign_numa_node(struct
> > hv_pcibus_device *hbus)
> >  			continue;
> >
> >  		if (hv_dev->desc.flags &
> HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
> > -			set_dev_node(&dev->dev, hv_dev-
> >desc.virtual_numa_node);
> > +			/*
> > +			 * The kernel may boot with some NUMA nodes offline
> > +			 * (e.g. in a KDUMP kernel) or with NUMA disabled via
> > +			 * "numa=off". In those cases, adjust the host provided
> > +			 * NUMA node to a valid NUMA node used by the kernel.
> > +			 */
> > +			set_dev_node(&dev->dev,
> > +				     numa_map_to_online_node(
> > +					     hv_dev->desc.virtual_numa_node));
> 
> Double-check me, but I think this approach has a flaw in that
> numa_map_to_online_node() doesn't check the input node for being out-of-
> range.  The call to node_online() uses the input node to index into the
> node_states bitmap (of type nodemask_t), and could go off the end of the
> bitmap if the input node isn't validated to be less than MAX_NUMNODES (or
> nr_node_ids).

Indeed, your concern is correct when "numa=off" is set.

> 
> Michael
> 
> >
> >  		put_pcichild(hv_dev);
> >  	}
> > --
> > 2.25.1
diff mbox series

Patch

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 6c9efeefae1b..c7519add6f13 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2130,7 +2130,15 @@  static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
 			continue;
 
 		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
-			set_dev_node(&dev->dev, hv_dev->desc.virtual_numa_node);
+			/*
+			 * The kernel may boot with some NUMA nodes offline
+			 * (e.g. in a KDUMP kernel) or with NUMA disabled via
+			 * "numa=off". In those cases, adjust the host provided
+			 * NUMA node to a valid NUMA node used by the kernel.
+			 */
+			set_dev_node(&dev->dev,
+				     numa_map_to_online_node(
+					     hv_dev->desc.virtual_numa_node));
 
 		put_pcichild(hv_dev);
 	}