[RFC] npu2: Add nvlink2 interconnect information

Message ID 20180815082624.18683-1-aik@ozlabs.ru
State New
Headers show
Series
  • [RFC] npu2: Add nvlink2 interconnect information
Related show

Checks

Context Check Description
snowpatch_ozlabs/make_check success Test make_check on branch master
snowpatch_ozlabs/apply_patch success master/apply_patch Successfully applied

Commit Message

Alexey Kardashevskiy Aug. 15, 2018, 8:26 a.m.
GPUs on Redbud and Sequoia platforms are interconnected between each
other in groups of 2 or 3 GPUs. The problem with that is if we decide
to pass one of GPUs in a group to the userspace (and potentially a guest),
we need to make sure that interconnectd link does not get enabled.

The GPU firmware provides a way to disable links on a GPU. However we
want to disable only links to other GPUs which are not in the same guest
so we need a map of what nvlink is connected to what.

This adds an "ibm,nvlinks" property to every GPU in a "GPUn" slot
with phandles to peer GPUs and NPU PHB, the index in the property is GPU's
link number.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/npu2.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

Comments

Andrew Donnellan Aug. 16, 2018, 5:38 a.m. | #1
On 15/08/18 18:26, Alexey Kardashevskiy wrote:
> GPUs on Redbud and Sequoia platforms are interconnected between each
> other in groups of 2 or 3 GPUs. The problem with that is if we decide
> to pass one of GPUs in a group to the userspace (and potentially a guest),
> we need to make sure that interconnectd link does not get enabled.
> 
> The GPU firmware provides a way to disable links on a GPU. However we
> want to disable only links to other GPUs which are not in the same guest
> so we need a map of what nvlink is connected to what.
> 
> This adds an "ibm,nvlinks" property to every GPU in a "GPUn" slot
> with phandles to peer GPUs and NPU PHB, the index in the property is GPU's
> link number.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Would we be able to use a more descriptive name like 
"ibm,nvlink-peer-devices" or something?
Alexey Kardashevskiy Aug. 16, 2018, 7:08 a.m. | #2
On 16/08/2018 15:38, Andrew Donnellan wrote:
> On 15/08/18 18:26, Alexey Kardashevskiy wrote:
>> GPUs on Redbud and Sequoia platforms are interconnected between each
>> other in groups of 2 or 3 GPUs. The problem with that is if we decide
>> to pass one of GPUs in a group to the userspace (and potentially a
>> guest),
>> we need to make sure that interconnectd link does not get enabled.
>>
>> The GPU firmware provides a way to disable links on a GPU. However we
>> want to disable only links to other GPUs which are not in the same guest
>> so we need a map of what nvlink is connected to what.
>>
>> This adds an "ibm,nvlinks" property to every GPU in a "GPUn" slot
>> with phandles to peer GPUs and NPU PHB, the index in the property is
>> GPU's
>> link number.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Would we be able to use a more descriptive name like
> "ibm,nvlink-peer-devices" or something?


"ibm,nvlink-peers" may be (shorter) but yes, good point.
Reza Arbab Aug. 16, 2018, 6:35 p.m. | #3
On Wed, Aug 15, 2018 at 06:26:24PM +1000, Alexey Kardashevskiy wrote:
>GPUs on Redbud and Sequoia platforms are interconnected between each
>other in groups of 2 or 3 GPUs. The problem with that is if we decide
>to pass one of GPUs in a group to the userspace (and potentially a guest),
>we need to make sure that interconnectd link does not get enabled.
>
>The GPU firmware provides a way to disable links on a GPU. However we
>want to disable only links to other GPUs which are not in the same guest
>so we need a map of what nvlink is connected to what.
>
>This adds an "ibm,nvlinks" property to every GPU in a "GPUn" slot
>with phandles to peer GPUs and NPU PHB, the index in the property is GPU's
>link number.

Seems to work just fine on the Redbud I tried.

You'll probably get asked to add this new property to 
doc/device-tree/nvlink.rst.

Patch

diff --git a/hw/npu2.c b/hw/npu2.c
index 2b35405..0871a85 100644
--- a/hw/npu2.c
+++ b/hw/npu2.c
@@ -732,6 +732,91 @@  static void npu2_phb_fixup_scominit(struct dt_node *dn, int links_per_gpu)
 	xscom_write_mask(gcid, 0x50114c0, val, mask);
 }
 
+static int gpu_slot_to_num(const char *slot)
+{
+	char *p = NULL;
+	int ret;
+
+	if (!slot)
+		return -1;
+
+	if (memcmp(slot, "GPU", 3))
+		return -1;
+
+	ret = strtol(slot + 3, &p, 10);
+	if (*p || p == slot)
+		return -1;
+
+	return ret;
+}
+
+static void npu2_phb_nvlink_dt(struct phb *npuphb, int links_per_gpu)
+{
+	struct dt_node *g[3] = { 0 }; /* Current maximum is 3 GPUs per 1 NPU */
+	const int max_gpus = 6 / links_per_gpu;
+	struct npu2 *npu2_phb = phb_to_npu2_nvlink(npuphb);
+	const u32 npuph = npuphb->dt_node->phandle;
+	int i, gpuid, first = max_gpus, last = 0;
+
+	/* Find the indexes of GPUs connected to this NPU */
+	for (i = 0; i < npu2_phb->total_devices; ++i) {
+		gpuid = gpu_slot_to_num(npu2_phb->devices[i].nvlink.slot_label);
+		if (gpuid < 0)
+			continue;
+		if (gpuid > last)
+			last = gpuid;
+		if (gpuid < first)
+			first = gpuid;
+	}
+
+	/* Either no "GPUx" slots found or they are not consecutive, abort */
+	if (!last || last + 1 - first > max_gpus)
+		return;
+
+	/* Collect GPU device nodes, sorted by an index from "GPUn" */
+	for (i = 0; i < npu2_phb->total_devices; ++i) {
+		gpuid = gpu_slot_to_num(npu2_phb->devices[i].nvlink.slot_label);
+		g[gpuid - first] = npu2_phb->devices[i].nvlink.pd->dn;
+	}
+
+	/*
+	 * Store interconnect phandles in the device tree.
+	 * The mapping is from Witherspoon_Design_Workbook_v1.7_19June2018.pdf,
+	 * pages 39 (Sequoia), 40 (Redbud):
+	 * Figure 16: NVLink wiring diagram for planar with 6 GPUs - SXM2 connector
+	 * Figure 17: NVLink wiring diagram for planar with 4 GPUs - SXM2 connector
+	 */
+	switch (last + 1 - first) {
+	case 2: /* Redbud */
+		dt_add_property_cells(g[0], "ibm,nvlinks",
+				      g[1]->phandle, npuph,
+				      g[1]->phandle, npuph,
+				      g[1]->phandle, npuph);
+		dt_add_property_cells(g[1], "ibm,nvlinks",
+				      g[0]->phandle, npuph,
+				      g[0]->phandle, npuph,
+				      g[0]->phandle, npuph);
+		break;
+	case 3: /* Sequoia */
+		dt_add_property_cells(g[0], "ibm,nvlinks",
+				      g[1]->phandle, npuph,
+				      g[2]->phandle, g[2]->phandle,
+				      g[1]->phandle, npuph);
+		dt_add_property_cells(g[1], "ibm,nvlinks",
+				      g[0]->phandle, npuph,
+				      g[2]->phandle, g[2]->phandle,
+				      g[0]->phandle, npuph);
+		dt_add_property_cells(g[2], "ibm,nvlinks",
+				      g[1]->phandle, g[0]->phandle,
+				      g[1]->phandle, npuph,
+				      g[0]->phandle, npuph);
+		break;
+	default:
+		prlog(PR_NOTICE, "Failed to detect the exact platform\n");
+		break;
+	}
+}
+
 static void npu2_phb_final_fixup(struct phb *phb)
 {
 	int links_per_gpu = 0;
@@ -746,6 +831,8 @@  static void npu2_phb_final_fixup(struct phb *phb)
 	pci_walk_dev(phb, NULL, npu2_links_per_gpu, &links_per_gpu);
 	dt_for_each_compatible(dt_root, np, "ibm,power9-npu")
 		npu2_phb_fixup_scominit(np, links_per_gpu);
+
+	npu2_phb_nvlink_dt(phb, links_per_gpu);
 }
 
 static void npu2_init_ioda_cache(struct npu2 *p)