diff mbox series

[11/13] hw/npu2: Fix OpenCAPI PE assignment

Message ID 87ffa1801b51c2f13e39128c57b50596be50215d.1544597914.git-series.andrew.donnellan@au1.ibm.com
State Superseded
Headers show
Series Support OpenCAPI and NVLink devices on same NPU on Witherspoon | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success master/apply_patch Successfully applied
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master

Commit Message

Andrew Donnellan Dec. 12, 2018, 6:58 a.m. UTC
When we support mixing NVLink and OpenCAPI devices on the same NPU, we're
going to have to share the same range of 16 PE numbers between NVLink and
OpenCAPI PHBs.

For OpenCAPI devices, PE assignment is only significant for determining
which System Interrupt Log register is used for a particular brick - unlike
NVLink, it doesn't play any role in determining how links are fenced.

Split the PE range into a lower half which is used for NVLink, and an upper
half that is used for OpenCAPI, with a fixed PE number assigned per brick.

As the PE assignment for OpenCAPI devices is fixed, set the PE once
during device init and then ignore calls to the set_pe() operation.

Suggested-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
---
 hw/npu2-opencapi.c | 75 +++++++++++++++++++++--------------------------
 include/npu2.h     | 21 +++++++++++--
 2 files changed, 52 insertions(+), 44 deletions(-)

Comments

Frederic Barrat Jan. 8, 2019, 1:19 p.m. UTC | #1
Le 12/12/2018 à 07:58, Andrew Donnellan a écrit :
> When we support mixing NVLink and OpenCAPI devices on the same NPU, we're
> going to have to share the same range of 16 PE numbers between NVLink and
> OpenCAPI PHBs.
> 
> For OpenCAPI devices, PE assignment is only significant for determining
> which System Interrupt Log register is used for a particular brick - unlike
> NVLink, it doesn't play any role in determining how links are fenced.
> 
> Split the PE range into a lower half which is used for NVLink, and an upper
> half that is used for OpenCAPI, with a fixed PE number assigned per brick.
> 
> As the PE assignment for OpenCAPI devices is fixed, set the PE once
> during device init and then ignore calls to the set_pe() operation.
> 
> Suggested-by: Frederic Barrat <fbarrat@linux.ibm.com>
> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> ---
>   hw/npu2-opencapi.c | 75 +++++++++++++++++++++--------------------------
>   include/npu2.h     | 21 +++++++++++--
>   2 files changed, 52 insertions(+), 44 deletions(-)
> 
> diff --git a/hw/npu2-opencapi.c b/hw/npu2-opencapi.c
> index d79727ad4199..a57f556b102d 100644
> --- a/hw/npu2-opencapi.c
> +++ b/hw/npu2-opencapi.c
> @@ -482,6 +482,20 @@ static void brick_config(uint32_t gcid, uint32_t scom_base, int index)
>   	enable_pb_snooping(gcid, scom_base, index);
>   }
> 
> +/* Procedure 13.1.3.4 - Brick to PE Mapping */
> +static void pe_config(struct npu2_dev *dev)
> +{
> +	/* We currently use a fixed PE assignment per brick */
> +	uint64_t val, reg;
> +	val = NPU2_MISC_BRICK_BDF2PE_MAP_ENABLE;
> +	val = SETFIELD(NPU2_MISC_BRICK_BDF2PE_MAP_PE, val, NPU2_OCAPI_PE(dev));
> +	val = SETFIELD(NPU2_MISC_BRICK_BDF2PE_MAP_BDF, val, 0);
> +	reg = NPU2_REG_OFFSET(NPU2_STACK_MISC, NPU2_BLOCK_MISC,
> +			      NPU2_MISC_BRICK0_BDF2PE_MAP0 +
> +			      (dev->brick_index * 0x18));
> +	npu2_write(dev->npu, reg, val);
> +}
> +

I can see that you don't populate the pdf2pe_cache of the NPU. I could 
have sworn that I saw a patchset to remove it, since it's not used, but 
I can't find it on the skiboot list. In the meantime, shouldn't we keep 
it, at least for consistency?


> @@ -1424,7 +1408,13 @@ static void setup_device(struct npu2_dev *dev)
>   	dt_add_property_cells(dn_phb, "ibm,links", 1);
>   	dt_add_property(dn_phb, "ibm,mmio-window", mm_win, sizeof(mm_win));
>   	dt_add_property_cells(dn_phb, "ibm,phb-diag-data-size", 0);
> +
> +	/*
> +	 * We ignore whatever PE numbers Linux tries to set, so we just
> +	 * advertise enough that Linux won't complain
> +	 */
>   	dt_add_property_cells(dn_phb, "ibm,opal-num-pes", NPU2_MAX_PE_NUM);
> +	dt_add_property_cells(dn_phb, "ibm,opal-reserved-pe", NPU2_RESERVED_PE_NUM);

I'm missing the subtlety that does...

   Fred


> 
>   	dt_add_property_cells(dn_phb, "ranges", 0x02000000,
>   			      hi32(mm_win[0]), lo32(mm_win[0]),
> @@ -1504,6 +1494,9 @@ int npu2_opencapi_init_npu(struct npu2 *npu)
>   		/* Procedure 13.1.3.1 - Select OCAPI vs NVLink */
>   		brick_config(npu->chip_id, npu->xscom_base, dev->brick_index);
> 
> +		/* Procedure 13.1.3.4 - Brick to PE Mapping */
> +		pe_config(dev);
> +
>   		/* Procedure 13.1.3.5 - Transaction Layer Configuration */
>   		tl_config(npu->chip_id, npu->xscom_base, dev->brick_index);
> 
> diff --git a/include/npu2.h b/include/npu2.h
> index 8f4747006980..c7b20f19fde1 100644
> --- a/include/npu2.h
> +++ b/include/npu2.h
> @@ -45,9 +45,24 @@
>   					  dev->npu->chip_id, dev->brick_index, ## a)
> 
> 
> -/* Number of PEs supported */
> -#define NPU2_MAX_PE_NUM		16
> -#define NPU2_RESERVED_PE_NUM	15
> +/*
> + * Number of PEs supported
> + *
> + * The NPU supports PE numbers from 0-15. At present, we only assign a maximum
> + * of 1 PE per brick.
> + *
> + * NVLink devices are currently exposed to Linux underneath a single virtual
> + * PHB. Therefore, we give NVLink half the available PEs, which is enough for
> + * 6 bricks plus 1 reserved PE.
> + *
> + * For OpenCAPI, the BDF-to-PE registers are used exclusively for mapping
> + * bricks to System Interrupt Log registers (the BDF component of those
> + * registers is ignored). Currently, we allocate a fixed PE based on the brick
> + * index in the upper half of the PE namespace.
> + */
> +#define NPU2_MAX_PE_NUM		8
> +#define NPU2_RESERVED_PE_NUM	7
> +#define NPU2_OCAPI_PE(ndev) ((ndev)->brick_index + NPU2_MAX_PE_NUM)
> 
>   #define NPU2_LINKS_PER_CHIP 6
>
Andrew Donnellan Jan. 9, 2019, 7:26 a.m. UTC | #2
On 9/1/19 12:19 am, Frederic Barrat wrote:
>> +/* Procedure 13.1.3.4 - Brick to PE Mapping */
>> +static void pe_config(struct npu2_dev *dev)
>> +{
>> +    /* We currently use a fixed PE assignment per brick */
>> +    uint64_t val, reg;
>> +    val = NPU2_MISC_BRICK_BDF2PE_MAP_ENABLE;
>> +    val = SETFIELD(NPU2_MISC_BRICK_BDF2PE_MAP_PE, val, 
>> NPU2_OCAPI_PE(dev));
>> +    val = SETFIELD(NPU2_MISC_BRICK_BDF2PE_MAP_BDF, val, 0);
>> +    reg = NPU2_REG_OFFSET(NPU2_STACK_MISC, NPU2_BLOCK_MISC,
>> +                  NPU2_MISC_BRICK0_BDF2PE_MAP0 +
>> +                  (dev->brick_index * 0x18));
>> +    npu2_write(dev->npu, reg, val);
>> +}
>> +
> 
> I can see that you don't populate the pdf2pe_cache of the NPU. I could 
> have sworn that I saw a patchset to remove it, since it's not used, but 
> I can't find it on the skiboot list. In the meantime, shouldn't we keep 
> it, at least for consistency?

IIRC, Reza was working on some patches to do that which he's shared 
internally but I don't think he's posted to the list yet.

Reza: if you feel like posting that I will happily rebase on top.

> 
> 
>> @@ -1424,7 +1408,13 @@ static void setup_device(struct npu2_dev *dev)
>>       dt_add_property_cells(dn_phb, "ibm,links", 1);
>>       dt_add_property(dn_phb, "ibm,mmio-window", mm_win, sizeof(mm_win));
>>       dt_add_property_cells(dn_phb, "ibm,phb-diag-data-size", 0);
>> +
>> +    /*
>> +     * We ignore whatever PE numbers Linux tries to set, so we just
>> +     * advertise enough that Linux won't complain
>> +     */
>>       dt_add_property_cells(dn_phb, "ibm,opal-num-pes", NPU2_MAX_PE_NUM);
>> +    dt_add_property_cells(dn_phb, "ibm,opal-reserved-pe", 
>> NPU2_RESERVED_PE_NUM);
> 
> I'm missing the subtlety that does...

Really for consistency with NVLink... without it the reserved PE 
defaults to 0. I'm *fairly* sure it doesn't really matter one way or the 
other.

> 
>    Fred
> 
> 
>>
>>       dt_add_property_cells(dn_phb, "ranges", 0x02000000,
>>                     hi32(mm_win[0]), lo32(mm_win[0]),
>> @@ -1504,6 +1494,9 @@ int npu2_opencapi_init_npu(struct npu2 *npu)
>>           /* Procedure 13.1.3.1 - Select OCAPI vs NVLink */
>>           brick_config(npu->chip_id, npu->xscom_base, dev->brick_index);
>>
>> +        /* Procedure 13.1.3.4 - Brick to PE Mapping */
>> +        pe_config(dev);
>> +
>>           /* Procedure 13.1.3.5 - Transaction Layer Configuration */
>>           tl_config(npu->chip_id, npu->xscom_base, dev->brick_index);
>>
>> diff --git a/include/npu2.h b/include/npu2.h
>> index 8f4747006980..c7b20f19fde1 100644
>> --- a/include/npu2.h
>> +++ b/include/npu2.h
>> @@ -45,9 +45,24 @@
>>                         dev->npu->chip_id, dev->brick_index, ## a)
>>
>>
>> -/* Number of PEs supported */
>> -#define NPU2_MAX_PE_NUM        16
>> -#define NPU2_RESERVED_PE_NUM    15
>> +/*
>> + * Number of PEs supported
>> + *
>> + * The NPU supports PE numbers from 0-15. At present, we only assign 
>> a maximum
>> + * of 1 PE per brick.
>> + *
>> + * NVLink devices are currently exposed to Linux underneath a single 
>> virtual
>> + * PHB. Therefore, we give NVLink half the available PEs, which is 
>> enough for
>> + * 6 bricks plus 1 reserved PE.
>> + *
>> + * For OpenCAPI, the BDF-to-PE registers are used exclusively for 
>> mapping
>> + * bricks to System Interrupt Log registers (the BDF component of those
>> + * registers is ignored). Currently, we allocate a fixed PE based on 
>> the brick
>> + * index in the upper half of the PE namespace.
>> + */
>> +#define NPU2_MAX_PE_NUM        8
>> +#define NPU2_RESERVED_PE_NUM    7
>> +#define NPU2_OCAPI_PE(ndev) ((ndev)->brick_index + NPU2_MAX_PE_NUM)
>>
>>   #define NPU2_LINKS_PER_CHIP 6
>>
Reza Arbab Jan. 9, 2019, 3:53 p.m. UTC | #3
On Wed, Jan 09, 2019 at 06:26:11PM +1100, Andrew Donnellan wrote:
>On 9/1/19 12:19 am, Frederic Barrat wrote:
>>I can see that you don't populate the pdf2pe_cache of the NPU. I could 
>>have sworn that I saw a patchset to remove it, since it's not used, 
>>but I can't find it on the skiboot list. In the meantime, shouldn't we 
>>keep it, at least for consistency?
>
>IIRC, Reza was working on some patches to do that which he's shared 
>internally but I don't think he's posted to the list yet.
>
>Reza: if you feel like posting that I will happily rebase on top.

Sure. I've been amassing a series of trivial npu2 fixes. Since you 
volunteered to rebase, I guess we might as well try to get them in now.
diff mbox series

Patch

diff --git a/hw/npu2-opencapi.c b/hw/npu2-opencapi.c
index d79727ad4199..a57f556b102d 100644
--- a/hw/npu2-opencapi.c
+++ b/hw/npu2-opencapi.c
@@ -482,6 +482,20 @@  static void brick_config(uint32_t gcid, uint32_t scom_base, int index)
 	enable_pb_snooping(gcid, scom_base, index);
 }
 
+/* Procedure 13.1.3.4 - Brick to PE Mapping */
+static void pe_config(struct npu2_dev *dev)
+{
+	/* We currently use a fixed PE assignment per brick */
+	uint64_t val, reg;
+	val = NPU2_MISC_BRICK_BDF2PE_MAP_ENABLE;
+	val = SETFIELD(NPU2_MISC_BRICK_BDF2PE_MAP_PE, val, NPU2_OCAPI_PE(dev));
+	val = SETFIELD(NPU2_MISC_BRICK_BDF2PE_MAP_BDF, val, 0);
+	reg = NPU2_REG_OFFSET(NPU2_STACK_MISC, NPU2_BLOCK_MISC,
+			      NPU2_MISC_BRICK0_BDF2PE_MAP0 +
+			      (dev->brick_index * 0x18));
+	npu2_write(dev->npu, reg, val);
+}
+
 /* Procedure 13.1.3.5 - TL Configuration */
 static void tl_config(uint32_t gcid, uint32_t scom_base, uint64_t index)
 {
@@ -1214,48 +1228,18 @@  static int64_t npu2_opencapi_ioda_reset(struct phb __unused *phb,
 	return OPAL_SUCCESS;
 }
 
-static int64_t npu2_opencapi_set_pe(struct phb *phb,
-				    uint64_t pe_num,
-				    uint64_t bdfn,
-				    uint8_t bcompare,
-				    uint8_t dcompare,
-				    uint8_t fcompare,
-				    uint8_t action)
+static int64_t npu2_opencapi_set_pe(struct phb __unused *phb,
+				    uint64_t __unused pe_num,
+				    uint64_t __unused bdfn,
+				    uint8_t __unused bcompare,
+				    uint8_t __unused dcompare,
+				    uint8_t __unused fcompare,
+				    uint8_t __unused action)
 {
-	struct npu2 *p;
-	struct npu2_dev *dev;
-	uint64_t reg, val, pe_bdfn;
-
-	/* Sanity check */
-	if (action != OPAL_MAP_PE && action != OPAL_UNMAP_PE)
-		return OPAL_PARAMETER;
-	if (pe_num >= NPU2_MAX_PE_NUM)
-		return OPAL_PARAMETER;
-	if (bdfn >> 8)
-		return OPAL_PARAMETER;
-	if (bcompare != OpalPciBusAll ||
-	    dcompare != OPAL_COMPARE_RID_DEVICE_NUMBER ||
-	    fcompare != OPAL_COMPARE_RID_FUNCTION_NUMBER)
-		return OPAL_UNSUPPORTED;
-
-	/* Get the NPU2 device */
-	dev = phb_to_npu2_dev_ocapi(phb);
-	if (!dev)
-		return OPAL_PARAMETER;
-
-	p = dev->npu;
-
-	pe_bdfn = dev->bdfn;
-
-	val = NPU2_MISC_BRICK_BDF2PE_MAP_ENABLE;
-	val = SETFIELD(NPU2_MISC_BRICK_BDF2PE_MAP_PE, val, pe_num);
-	val = SETFIELD(NPU2_MISC_BRICK_BDF2PE_MAP_BDF, val, pe_bdfn);
-	reg = NPU2_REG_OFFSET(NPU2_STACK_MISC, NPU2_BLOCK_MISC,
-			      NPU2_MISC_BRICK0_BDF2PE_MAP0 +
-			      (dev->brick_index * 0x18));
-	p->bdf2pe_cache[dev->brick_index] = val;
-	npu2_write(p, reg, val);
-
+	/*
+	 * Ignored on OpenCAPI - we use fixed PE assignments. May need
+	 * addressing when we support dual-link devices.
+	 */
 	return OPAL_SUCCESS;
 }
 
@@ -1424,7 +1408,13 @@  static void setup_device(struct npu2_dev *dev)
 	dt_add_property_cells(dn_phb, "ibm,links", 1);
 	dt_add_property(dn_phb, "ibm,mmio-window", mm_win, sizeof(mm_win));
 	dt_add_property_cells(dn_phb, "ibm,phb-diag-data-size", 0);
+
+	/*
+	 * We ignore whatever PE numbers Linux tries to set, so we just
+	 * advertise enough that Linux won't complain
+	 */
 	dt_add_property_cells(dn_phb, "ibm,opal-num-pes", NPU2_MAX_PE_NUM);
+	dt_add_property_cells(dn_phb, "ibm,opal-reserved-pe", NPU2_RESERVED_PE_NUM);
 
 	dt_add_property_cells(dn_phb, "ranges", 0x02000000,
 			      hi32(mm_win[0]), lo32(mm_win[0]),
@@ -1504,6 +1494,9 @@  int npu2_opencapi_init_npu(struct npu2 *npu)
 		/* Procedure 13.1.3.1 - Select OCAPI vs NVLink */
 		brick_config(npu->chip_id, npu->xscom_base, dev->brick_index);
 
+		/* Procedure 13.1.3.4 - Brick to PE Mapping */
+		pe_config(dev);
+
 		/* Procedure 13.1.3.5 - Transaction Layer Configuration */
 		tl_config(npu->chip_id, npu->xscom_base, dev->brick_index);
 
diff --git a/include/npu2.h b/include/npu2.h
index 8f4747006980..c7b20f19fde1 100644
--- a/include/npu2.h
+++ b/include/npu2.h
@@ -45,9 +45,24 @@ 
 					  dev->npu->chip_id, dev->brick_index, ## a)
 
 
-/* Number of PEs supported */
-#define NPU2_MAX_PE_NUM		16
-#define NPU2_RESERVED_PE_NUM	15
+/*
+ * Number of PEs supported
+ *
+ * The NPU supports PE numbers from 0-15. At present, we only assign a maximum
+ * of 1 PE per brick.
+ *
+ * NVLink devices are currently exposed to Linux underneath a single virtual
+ * PHB. Therefore, we give NVLink half the available PEs, which is enough for
+ * 6 bricks plus 1 reserved PE.
+ *
+ * For OpenCAPI, the BDF-to-PE registers are used exclusively for mapping
+ * bricks to System Interrupt Log registers (the BDF component of those
+ * registers is ignored). Currently, we allocate a fixed PE based on the brick
+ * index in the upper half of the PE namespace.
+ */
+#define NPU2_MAX_PE_NUM		8
+#define NPU2_RESERVED_PE_NUM	7
+#define NPU2_OCAPI_PE(ndev) ((ndev)->brick_index + NPU2_MAX_PE_NUM)
 
 #define NPU2_LINKS_PER_CHIP 6