diff mbox series

[20/23] cxl/port: Introduce a port driver

Message ID 20211120000250.1663391-21-ben.widawsky@intel.com
State New
Headers show
Series Add drivers for CXL ports and mem devices | expand

Commit Message

Ben Widawsky Nov. 20, 2021, 12:02 a.m. UTC
The CXL port driver is responsible for managing the decoder resources
contained within the port. It will also provide APIs that other drivers
will consume for managing these resources.

There are 4 types of ports in a system:
1. Platform port. This is a non-programmable entity. Such a port is
   named rootX. It is enumerated by cxl_acpi in an ACPI based system.
2. Hostbridge port. This ports register access is defined in a platform
   specific way (CHBS for ACPI platforms). It has n downstream ports,
   each of which are known as CXL 2.0 root ports. Once the platform
   specific mechanism to get the offset to the registers is obtained it
   operates just like other CXL components. The enumeration of this
   component is started by cxl_acpi and completed by cxl_port.
3. Switch port. A switch port is similar to a hostbridge port except
   register access is defined in the CXL specification in a platform
   agnostic way. The downstream ports for a switch are simply known as
   downstream ports. The enumeration of these are entirely contained
   within cxl_port.
4. Endpoint port. Endpoint ports are similar to switch ports with the
   exception that they have no downstream ports, only the underlying
   media on the device. The enumeration of these are started by cxl_pci,
   and completed by cxl_port.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>

---
Changes since RFCv2:
- Reword commit message tense (Dan)
- Reword commit message
- Drop SOFTDEP since it's not needed yet (Dan)
- Add CONFIG_CXL_PORT (Dan)
- s/CXL_DECODER_F_EN/CXL_DECODER_F_ENABLE (Dan)
- rename cxl_hdm_decoder_ functions to "to_" (Dan)
- remove useless inline (Dan)
- Check endpoint decoder based on dport list instead of driver id (Dan)
- Use range instead of resource per dependent patch change
- Use clever union packing for target list (Dan)
- Only check NULL from devm_cxl_iomap_block (Dan)
- Properly parent the created cxl decoders
- Move bus rescanning from cxl_acpi to here (Dan)
- Remove references to "CFMWS" in core (Dan)
- Use macro to help keep within 80 character lines
---
 .../driver-api/cxl/memory-devices.rst         |   5 +
 drivers/cxl/Kconfig                           |  22 ++
 drivers/cxl/Makefile                          |   2 +
 drivers/cxl/core/bus.c                        |  67 ++++
 drivers/cxl/core/regs.c                       |   6 +-
 drivers/cxl/cxl.h                             |  34 +-
 drivers/cxl/port.c                            | 323 ++++++++++++++++++
 7 files changed, 450 insertions(+), 9 deletions(-)
 create mode 100644 drivers/cxl/port.c

Comments

kernel test robot Nov. 20, 2021, 3:14 a.m. UTC | #1
Hi Ben,

I love your patch! Yet something to improve:

[auto build test ERROR on 53989fad1286e652ea3655ae3367ba698da8d2ff]

url:    https://github.com/0day-ci/linux/commits/Ben-Widawsky/Add-drivers-for-CXL-ports-and-mem-devices/20211120-080513
base:   53989fad1286e652ea3655ae3367ba698da8d2ff
config: alpha-randconfig-r026-20211118 (attached as .config)
compiler: alpha-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/8ff43502e84dd4fa1296a131cb0cc82146389db4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ben-Widawsky/Add-drivers-for-CXL-ports-and-mem-devices/20211120-080513
        git checkout 8ff43502e84dd4fa1296a131cb0cc82146389db4
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=alpha 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/cxl/port.c: In function 'get_decoder_size':
>> drivers/cxl/port.c:105:16: error: implicit declaration of function 'ioread64_hi_lo' [-Werror=implicit-function-declaration]
     105 |         return ioread64_hi_lo(hdm_decoder +
         |                ^~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/ioread64_hi_lo +105 drivers/cxl/port.c

    97	
    98	static u64 get_decoder_size(void __iomem *hdm_decoder, int n)
    99	{
   100		u32 ctrl = readl(hdm_decoder + CXL_HDM_DECODER0_CTRL_OFFSET(n));
   101	
   102		if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED)
   103			return 0;
   104	
 > 105		return ioread64_hi_lo(hdm_decoder +
   106				      CXL_HDM_DECODER0_SIZE_LOW_OFFSET(n));
   107	}
   108	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
kernel test robot Nov. 20, 2021, 5:38 a.m. UTC | #2
Hi Ben,

I love your patch! Yet something to improve:

[auto build test ERROR on 53989fad1286e652ea3655ae3367ba698da8d2ff]

url:    https://github.com/0day-ci/linux/commits/Ben-Widawsky/Add-drivers-for-CXL-ports-and-mem-devices/20211120-080513
base:   53989fad1286e652ea3655ae3367ba698da8d2ff
config: arm-randconfig-r034-20211119 (attached as .config)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/8ff43502e84dd4fa1296a131cb0cc82146389db4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ben-Widawsky/Add-drivers-for-CXL-ports-and-mem-devices/20211120-080513
        git checkout 8ff43502e84dd4fa1296a131cb0cc82146389db4
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/cxl/port.c:105:9: error: implicit declaration of function 'ioread64_hi_lo' [-Werror,-Wimplicit-function-declaration]
           return ioread64_hi_lo(hdm_decoder +
                  ^
   drivers/cxl/port.c:191:11: error: implicit declaration of function 'ioread64_hi_lo' [-Werror,-Wimplicit-function-declaration]
                           base = ioread64_hi_lo(decoderN(BASE_LOW_OFFSET, i));
                                  ^
   2 errors generated.


vim +/ioread64_hi_lo +105 drivers/cxl/port.c

    97	
    98	static u64 get_decoder_size(void __iomem *hdm_decoder, int n)
    99	{
   100		u32 ctrl = readl(hdm_decoder + CXL_HDM_DECODER0_CTRL_OFFSET(n));
   101	
   102		if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED)
   103			return 0;
   104	
 > 105		return ioread64_hi_lo(hdm_decoder +
   106				      CXL_HDM_DECODER0_SIZE_LOW_OFFSET(n));
   107	}
   108	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Jonathan Cameron Nov. 22, 2021, 5:41 p.m. UTC | #3
On Fri, 19 Nov 2021 16:02:47 -0800
Ben Widawsky <ben.widawsky@intel.com> wrote:

> The CXL port driver is responsible for managing the decoder resources
> contained within the port. It will also provide APIs that other drivers
> will consume for managing these resources.
> 
> There are 4 types of ports in a system:
> 1. Platform port. This is a non-programmable entity. Such a port is
>    named rootX. It is enumerated by cxl_acpi in an ACPI based system.
> 2. Hostbridge port. This ports register access is defined in a platform

port's 

>    specific way (CHBS for ACPI platforms). It has n downstream ports,
>    each of which are known as CXL 2.0 root ports. Once the platform
>    specific mechanism to get the offset to the registers is obtained it
>    operates just like other CXL components. The enumeration of this
>    component is started by cxl_acpi and completed by cxl_port.
> 3. Switch port. A switch port is similar to a hostbridge port except
>    register access is defined in the CXL specification in a platform
>    agnostic way. The downstream ports for a switch are simply known as
>    downstream ports. The enumeration of these are entirely contained
>    within cxl_port.
> 4. Endpoint port. Endpoint ports are similar to switch ports with the
>    exception that they have no downstream ports, only the underlying
>    media on the device. The enumeration of these are started by cxl_pci,
>    and completed by cxl_port.
> 
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
A few comments inline including what looks to me memory on the stack which has
gone out of scope when it's accessed.

Jonathan

> 
> ---
> Changes since RFCv2:
> - Reword commit message tense (Dan)
> - Reword commit message
> - Drop SOFTDEP since it's not needed yet (Dan)
> - Add CONFIG_CXL_PORT (Dan)
> - s/CXL_DECODER_F_EN/CXL_DECODER_F_ENABLE (Dan)
> - rename cxl_hdm_decoder_ functions to "to_" (Dan)
> - remove useless inline (Dan)
> - Check endpoint decoder based on dport list instead of driver id (Dan)
> - Use range instead of resource per dependent patch change
> - Use clever union packing for target list (Dan)
> - Only check NULL from devm_cxl_iomap_block (Dan)
> - Properly parent the created cxl decoders
> - Move bus rescanning from cxl_acpi to here (Dan)
> - Remove references to "CFMWS" in core (Dan)
> - Use macro to help keep within 80 character lines
> ---
>  .../driver-api/cxl/memory-devices.rst         |   5 +
>  drivers/cxl/Kconfig                           |  22 ++
>  drivers/cxl/Makefile                          |   2 +
>  drivers/cxl/core/bus.c                        |  67 ++++
>  drivers/cxl/core/regs.c                       |   6 +-
>  drivers/cxl/cxl.h                             |  34 +-
>  drivers/cxl/port.c                            | 323 ++++++++++++++++++
>  7 files changed, 450 insertions(+), 9 deletions(-)
>  create mode 100644 drivers/cxl/port.c
> 
> diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> index 3b8f41395f6b..fbf0393cdddc 100644
> --- a/Documentation/driver-api/cxl/memory-devices.rst
> +++ b/Documentation/driver-api/cxl/memory-devices.rst
> @@ -28,6 +28,11 @@ CXL Memory Device
>  .. kernel-doc:: drivers/cxl/pci.c
>     :internal:
>  
> +CXL Port
> +--------
> +.. kernel-doc:: drivers/cxl/port.c
> +   :doc: cxl port
> +
>  CXL Core
>  --------
>  .. kernel-doc:: drivers/cxl/cxl.h
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index ef05e96f8f97..3aeb33bba5a3 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -77,4 +77,26 @@ config CXL_PMEM
>  	  provisioning the persistent memory capacity of CXL memory expanders.
>  
>  	  If unsure say 'm'.
> +
> +config CXL_MEM
> +	tristate "CXL.mem: Memory Devices"
> +	select CXL_PORT
> +	depends on CXL_PCI
> +	default CXL_BUS
> +	help
> +	  The CXL.mem protocol allows a device to act as a provider of "System
> +	  RAM" and/or "Persistent Memory" that is fully coherent as if the
> +	  memory was attached to the typical CPU memory controller.  This is
> +	  known as HDM "Host-managed Device Memory".
> +
> +	  Say 'y/m' to enable a driver that will attach to CXL.mem devices for
> +	  memory expansion and control of HDM. See Chapter 9.13 in the CXL 2.0
> +	  specification for a detailed description of HDM.
> +
> +	  If unsure say 'm'.
> +
> +
> +config CXL_PORT
> +	tristate
> +
>  endif
> diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> index cf07ae6cea17..56fcac2323cb 100644
> --- a/drivers/cxl/Makefile
> +++ b/drivers/cxl/Makefile
> @@ -3,7 +3,9 @@ obj-$(CONFIG_CXL_BUS) += core/
>  obj-$(CONFIG_CXL_PCI) += cxl_pci.o
>  obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
>  obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
> +obj-$(CONFIG_CXL_PORT) += cxl_port.o
>  
>  cxl_pci-y := pci.o
>  cxl_acpi-y := acpi.o
>  cxl_pmem-y := pmem.o
> +cxl_port-y := port.o
> diff --git a/drivers/cxl/core/bus.c b/drivers/cxl/core/bus.c
> index 9e0d7d5d9298..46a06cfe79bd 100644
> --- a/drivers/cxl/core/bus.c
> +++ b/drivers/cxl/core/bus.c
> @@ -31,6 +31,8 @@ static DECLARE_RWSEM(root_port_sem);
>  
>  static struct device *cxl_topology_host;
>  
> +static bool is_cxl_decoder(struct device *dev);
> +
>  int cxl_register_topology_host(struct device *host)
>  {
>  	down_write(&topology_host_sem);
> @@ -75,6 +77,45 @@ static void put_cxl_topology_host(struct device *dev)
>  	up_read(&topology_host_sem);
>  }
>  
> +static int decoder_match(struct device *dev, void *data)
> +{
> +	struct resource *theirs = (struct resource *)data;
> +	struct cxl_decoder *cxld;
> +
> +	if (!is_cxl_decoder(dev))
> +		return 0;
> +
> +	cxld = to_cxl_decoder(dev);
> +	if (theirs->start <= cxld->decoder_range.start &&
> +	    theirs->end >= cxld->decoder_range.end)
> +		return 1;
> +
> +	return 0;
> +}
> +
> +static struct cxl_decoder *cxl_find_root_decoder(resource_size_t base,
> +						 resource_size_t size)
> +{
> +	struct cxl_decoder *cxld = NULL;
> +	struct device *cxldd;
> +	struct device *host;
> +	struct resource res = (struct resource){
> +		.start = base,
> +		.end = base + size - 1,
> +	};
> +
> +	host = get_cxl_topology_host();
> +	if (!host)
> +		return NULL;
> +
> +	cxldd = device_find_child(host, &res, decoder_match);
> +	if (cxldd)
> +		cxld = to_cxl_decoder(cxldd);
> +
> +	put_cxl_topology_host(host);
> +	return cxld;
> +}
> +
>  static ssize_t devtype_show(struct device *dev, struct device_attribute *attr,
>  			    char *buf)
>  {
> @@ -280,6 +321,11 @@ bool is_root_decoder(struct device *dev)
>  }
>  EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
>  
> +static bool is_cxl_decoder(struct device *dev)
> +{
> +	return dev->type->release == cxl_decoder_release;
> +}
> +
>  struct cxl_decoder *to_cxl_decoder(struct device *dev)
>  {
>  	if (dev_WARN_ONCE(dev, dev->type->release != cxl_decoder_release,
> @@ -327,6 +373,7 @@ struct cxl_port *to_cxl_port(struct device *dev)
>  		return NULL;
>  	return container_of(dev, struct cxl_port, dev);
>  }
> +EXPORT_SYMBOL_NS_GPL(to_cxl_port, CXL);
>  
>  struct cxl_dport *cxl_get_root_dport(struct device *dev)
>  {
> @@ -735,6 +782,24 @@ EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, CXL);
>  
>  static void cxld_unregister(void *dev)
>  {
> +	struct cxl_decoder *plat_decoder, *cxld = to_cxl_decoder(dev);
> +	resource_size_t base, size;
> +
> +	if (is_root_decoder(dev)) {
> +		device_unregister(dev);
> +		return;
> +	}
> +
> +	base = cxld->decoder_range.start;
> +	size = range_len(&cxld->decoder_range);
> +
> +	if (size) {
> +		plat_decoder = cxl_find_root_decoder(base, size);
> +		if (plat_decoder)
> +			__release_region(&plat_decoder->platform_res, base,
> +					 size);
> +	}
> +
>  	device_unregister(dev);
>  }
>  
> @@ -789,6 +854,8 @@ static int cxl_device_id(struct device *dev)
>  		return CXL_DEVICE_NVDIMM_BRIDGE;
>  	if (dev->type == &cxl_nvdimm_type)
>  		return CXL_DEVICE_NVDIMM;
> +	if (dev->type == &cxl_port_type)
> +		return CXL_DEVICE_PORT;
>  	return 0;
>  }
>  
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index 41a0245867ea..f191b0c995a7 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -159,9 +159,8 @@ void cxl_probe_device_regs(struct device *dev, void __iomem *base,
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_probe_device_regs, CXL);
>  
> -static void __iomem *devm_cxl_iomap_block(struct device *dev,
> -					  resource_size_t addr,
> -					  resource_size_t length)
> +void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> +				   resource_size_t length)
>  {
>  	void __iomem *ret_val;
>  	struct resource *res;
> @@ -180,6 +179,7 @@ static void __iomem *devm_cxl_iomap_block(struct device *dev,
>  
>  	return ret_val;
>  }
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_iomap_block, CXL);
>  
>  int cxl_map_component_regs(struct pci_dev *pdev,
>  			   struct cxl_component_regs *regs,
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 3962a5e6a950..24fa16157d5e 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -17,6 +17,9 @@
>   * (port-driver, region-driver, nvdimm object-drivers... etc).
>   */
>  
> +/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
> +#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
> +
>  /* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers*/
>  #define CXL_CM_OFFSET 0x1000
>  #define CXL_CM_CAP_HDR_OFFSET 0x0
> @@ -36,11 +39,22 @@
>  #define CXL_HDM_DECODER_CAP_OFFSET 0x0
>  #define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
>  #define   CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
> -#define CXL_HDM_DECODER0_BASE_LOW_OFFSET 0x10
> -#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET 0x14
> -#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET 0x18
> -#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET 0x1c
> -#define CXL_HDM_DECODER0_CTRL_OFFSET 0x20
> +#define   CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
> +#define   CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
> +#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
> +#define   CXL_HDM_DECODER_ENABLE BIT(1)
> +#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
> +#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
> +#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
> +#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
> +#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
> +#define   CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
> +#define   CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
> +#define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
> +#define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
> +#define   CXL_HDM_DECODER0_CTRL_TYPE BIT(12)
> +#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
> +#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
>  
>  static inline int cxl_hdm_decoder_count(u32 cap_hdr)
>  {
> @@ -148,6 +162,8 @@ int cxl_map_device_regs(struct pci_dev *pdev,
>  enum cxl_regloc_type;
>  int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
>  		      struct cxl_register_map *map);
> +void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> +				   resource_size_t length);
>  
>  #define CXL_RESOURCE_NONE ((resource_size_t) -1)
>  #define CXL_TARGET_STRLEN 20
> @@ -165,7 +181,8 @@ void cxl_unregister_topology_host(struct device *host);
>  #define CXL_DECODER_F_TYPE2 BIT(2)
>  #define CXL_DECODER_F_TYPE3 BIT(3)
>  #define CXL_DECODER_F_LOCK  BIT(4)
> -#define CXL_DECODER_F_MASK  GENMASK(4, 0)
> +#define CXL_DECODER_F_ENABLE    BIT(5)
> +#define CXL_DECODER_F_MASK  GENMASK(5, 0)
>  
>  enum cxl_decoder_type {
>         CXL_DECODER_ACCELERATOR = 2,
> @@ -255,6 +272,8 @@ struct cxl_walk_context {
>   * @dports: cxl_dport instances referenced by decoders
>   * @decoder_ida: allocator for decoder ids
>   * @component_reg_phys: component register capability base address (optional)
> + * @rescan_work: worker object for bus rescans after port additions
> + * @data: opaque data with driver specific usage
>   */
>  struct cxl_port {
>  	struct device dev;
> @@ -263,6 +282,8 @@ struct cxl_port {
>  	struct list_head dports;
>  	struct ida decoder_ida;
>  	resource_size_t component_reg_phys;
> +	struct work_struct rescan_work;
> +	void *data;
>  };
>  
>  /**
> @@ -325,6 +346,7 @@ void cxl_driver_unregister(struct cxl_driver *cxl_drv);
>  
>  #define CXL_DEVICE_NVDIMM_BRIDGE	1
>  #define CXL_DEVICE_NVDIMM		2
> +#define CXL_DEVICE_PORT			3
>  
>  #define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*")
>  #define CXL_MODALIAS_FMT "cxl:t%d"
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> new file mode 100644
> index 000000000000..3c03131517af
> --- /dev/null
> +++ b/drivers/cxl/port.c
> @@ -0,0 +1,323 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +
> +#include "cxlmem.h"
> +
> +/**
> + * DOC: cxl port
> + *
> + * The port driver implements the set of functionality needed to allow full
> + * decoder enumeration and routing. A CXL port is an abstraction of a CXL
> + * component that implements some amount of CXL decoding of CXL.mem traffic.
> + * As of the CXL 2.0 spec, this includes:
> + *
> + *	.. list-table:: CXL Components w/ Ports
> + *		:widths: 25 25 50
> + *		:header-rows: 1
> + *
> + *		* - component
> + *		  - upstream
> + *		  - downstream
> + *		* - Hostbridge
> + *		  - ACPI0016
> + *		  - root port
> + *		* - Switch
> + *		  - Switch Upstream Port
> + *		  - Switch Downstream Port
> + *		* - Endpoint
> + *		  - Endpoint Port
> + *		  - N/A
> + *
> + * The primary service this driver provides is enumerating HDM decoders and
> + * presenting APIs to other drivers to utilize the decoders.
> + */
> +
> +static struct workqueue_struct *cxl_port_wq;
> +
> +struct cxl_port_data {
> +	struct cxl_component_regs regs;
> +
> +	struct port_caps {
> +		unsigned int count;
> +		unsigned int tc;
> +		unsigned int interleave11_8;
> +		unsigned int interleave14_12;
> +	} caps;
> +};
> +
> +static inline int to_interleave_granularity(u32 ctrl)
> +{
> +	int val = FIELD_GET(CXL_HDM_DECODER0_CTRL_IG_MASK, ctrl);
> +
> +	return 256 << val;
> +}
> +
> +static inline int to_interleave_ways(u32 ctrl)
> +{
> +	int val = FIELD_GET(CXL_HDM_DECODER0_CTRL_IW_MASK, ctrl);
> +
> +	return 1 << val;
> +}
> +
> +static void get_caps(struct cxl_port *port, struct cxl_port_data *cpd)
> +{
> +	void __iomem *hdm_decoder = cpd->regs.hdm_decoder;
> +	struct port_caps *caps = &cpd->caps;
> +	u32 hdm_cap;
> +
> +	hdm_cap = readl(hdm_decoder + CXL_HDM_DECODER_CAP_OFFSET);
> +
> +	caps->count = cxl_hdm_decoder_count(hdm_cap);
> +	caps->tc = FIELD_GET(CXL_HDM_DECODER_TARGET_COUNT_MASK, hdm_cap);
> +	caps->interleave11_8 =
> +		FIELD_GET(CXL_HDM_DECODER_INTERLEAVE_11_8, hdm_cap);
> +	caps->interleave14_12 =
> +		FIELD_GET(CXL_HDM_DECODER_INTERLEAVE_14_12, hdm_cap);
> +}
> +
> +static int map_regs(struct cxl_port *port, void __iomem *crb,
> +		    struct cxl_port_data *cpd)
> +{
> +	struct cxl_register_map map;
> +	struct cxl_component_reg_map *comp_map = &map.component_map;
> +
> +	cxl_probe_component_regs(&port->dev, crb, comp_map);
> +	if (!comp_map->hdm_decoder.valid) {
> +		dev_err(&port->dev, "HDM decoder registers invalid\n");
> +		return -ENXIO;
> +	}
> +
> +	cpd->regs.hdm_decoder = crb + comp_map->hdm_decoder.offset;
> +
> +	return 0;
> +}
> +
> +static u64 get_decoder_size(void __iomem *hdm_decoder, int n)
> +{
> +	u32 ctrl = readl(hdm_decoder + CXL_HDM_DECODER0_CTRL_OFFSET(n));
> +
> +	if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED)
> +		return 0;
> +
> +	return ioread64_hi_lo(hdm_decoder +
> +			      CXL_HDM_DECODER0_SIZE_LOW_OFFSET(n));
> +}
> +
> +static bool is_endpoint_port(struct cxl_port *port)
> +{
> +	/* Endpoints can't be ports... yet! */
> +	return false;
> +}
> +
> +static void rescan_ports(struct work_struct *work)
> +{
> +	if (bus_rescan_devices(&cxl_bus_type))
> +		pr_warn("Failed to rescan\n");
> +}
> +
> +/* Minor layering violation */
> +static int dvsec_range_used(struct cxl_port *port)
> +{
> +	struct cxl_endpoint_dvsec_info *info;
> +	int i, ret = 0;
> +
> +	if (!is_endpoint_port(port))
> +		return 0;
> +
> +	info = port->data;
> +	for (i = 0; i < info->ranges; i++)
> +		if (info->range[i].size)
> +			ret |= 1 << i;
> +
> +	return ret;
> +}
> +
> +static int enumerate_hdm_decoders(struct cxl_port *port,
> +				  struct cxl_port_data *portdata)
> +{
> +	void __iomem *hdm_decoder = portdata->regs.hdm_decoder;
> +	bool global_enable;
> +	u32 global_ctrl;
> +	int i = 0;
> +
> +	global_ctrl = readl(hdm_decoder + CXL_HDM_DECODER_CTRL_OFFSET);
> +	global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
> +	if (!global_enable) {
> +		i = dvsec_range_used(port);
> +		if (i) {
> +			dev_err(&port->dev,
> +				"Couldn't add port because device is using DVSEC range registers\n");
> +			return -EBUSY;
> +		}
> +	}
> +
> +	for (i = 0; i < portdata->caps.count; i++) {
> +		int rc, target_count = portdata->caps.tc;
> +		struct cxl_decoder *cxld;
> +		int *target_map = NULL;
> +		u64 size;
> +
> +		if (is_endpoint_port(port))
> +			target_count = 0;
> +
> +		cxld = cxl_decoder_alloc(port, target_count);
> +		if (IS_ERR(cxld)) {
> +			dev_warn(&port->dev,
> +				 "Failed to allocate the decoder\n");
> +			return PTR_ERR(cxld);
> +		}
> +
> +		cxld->target_type = CXL_DECODER_EXPANDER;
> +		cxld->interleave_ways = 1;
> +		cxld->interleave_granularity = 0;
> +
> +		size = get_decoder_size(hdm_decoder, i);
> +		if (size != 0) {
> +#define decoderN(reg, n) hdm_decoder + CXL_HDM_DECODER0_##reg(n)

Perhaps this block in the if (size != 0) would be more readable if broken out
to a utility function?
As normal I'm not keen on the macro magic if we can avoid it.


> +			int temp[CXL_DECODER_MAX_INTERLEAVE];
> +			u64 base;
> +			u32 ctrl;
> +			int j;
> +			union {
> +				u64 value;
> +				char target_id[8];
> +			} target_list;

I thought we tried to avoid this union usage in kernel because of the whole
thing about c compilers being able to do what they like with it...

> +
> +			target_map = temp;

This is set to a variable that goes out of scope whilst target_map is still in
use.

> +			ctrl = readl(decoderN(CTRL_OFFSET, i));
> +			base = ioread64_hi_lo(decoderN(BASE_LOW_OFFSET, i));
> +
> +			cxld->decoder_range = (struct range){
> +				.start = base,
> +				.end = base + size - 1
> +			};
> +
> +			cxld->flags = CXL_DECODER_F_ENABLE;
> +			cxld->interleave_ways = to_interleave_ways(ctrl);
> +			cxld->interleave_granularity =
> +				to_interleave_granularity(ctrl);
> +
> +			if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl) == 0)
> +				cxld->target_type = CXL_DECODER_ACCELERATOR;
> +
> +			target_list.value = ioread64_hi_lo(decoderN(TL_LOW, i));
> +			for (j = 0; j < cxld->interleave_ways; j++)
> +				target_map[j] = target_list.target_id[j];
> +#undef decoderN
> +		}
> +
> +		rc = cxl_decoder_add_locked(cxld, target_map);
> +		if (rc)
> +			put_device(&cxld->dev);
> +		else
> +			rc = cxl_decoder_autoremove(&port->dev, cxld);
> +		if (rc)
> +			dev_err(&port->dev, "Failed to add decoder\n");

If that fails on the autoremove registration (unlikely) this message
will be rather confusing - as the add was fine...

This nest of carrying on when we have an error is getting ever deeper...

> +		else
> +			dev_dbg(&cxld->dev, "Added to port %s\n",
> +				dev_name(&port->dev));
> +	}
> +
> +	/*
> +	 * Turn on global enable now since DVSEC ranges aren't being used and
> +	 * we'll eventually want the decoder enabled.
> +	 */
> +	if (!global_enable) {
> +		dev_dbg(&port->dev, "Enabling HDM decode\n");
> +		writel(global_ctrl | CXL_HDM_DECODER_ENABLE, hdm_decoder + CXL_HDM_DECODER_CTRL_OFFSET);
> +	}
> +
> +	return 0;
> +}
> +
Ben Widawsky Nov. 22, 2021, 11:38 p.m. UTC | #4
On 21-11-22 17:41:32, Jonathan Cameron wrote:
> On Fri, 19 Nov 2021 16:02:47 -0800
> Ben Widawsky <ben.widawsky@intel.com> wrote:
> 
> > The CXL port driver is responsible for managing the decoder resources
> > contained within the port. It will also provide APIs that other drivers
> > will consume for managing these resources.
> > 
> > There are 4 types of ports in a system:
> > 1. Platform port. This is a non-programmable entity. Such a port is
> >    named rootX. It is enumerated by cxl_acpi in an ACPI based system.
> > 2. Hostbridge port. This ports register access is defined in a platform
> 
> port's 
> 
> >    specific way (CHBS for ACPI platforms). It has n downstream ports,
> >    each of which are known as CXL 2.0 root ports. Once the platform
> >    specific mechanism to get the offset to the registers is obtained it
> >    operates just like other CXL components. The enumeration of this
> >    component is started by cxl_acpi and completed by cxl_port.
> > 3. Switch port. A switch port is similar to a hostbridge port except
> >    register access is defined in the CXL specification in a platform
> >    agnostic way. The downstream ports for a switch are simply known as
> >    downstream ports. The enumeration of these are entirely contained
> >    within cxl_port.
> > 4. Endpoint port. Endpoint ports are similar to switch ports with the
> >    exception that they have no downstream ports, only the underlying
> >    media on the device. The enumeration of these are started by cxl_pci,
> >    and completed by cxl_port.
> > 
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> A few comments inline including what looks to me memory on the stack which has
> gone out of scope when it's accessed.
> 
> Jonathan
> 
> > 
> > ---
> > Changes since RFCv2:
> > - Reword commit message tense (Dan)
> > - Reword commit message
> > - Drop SOFTDEP since it's not needed yet (Dan)
> > - Add CONFIG_CXL_PORT (Dan)
> > - s/CXL_DECODER_F_EN/CXL_DECODER_F_ENABLE (Dan)
> > - rename cxl_hdm_decoder_ functions to "to_" (Dan)
> > - remove useless inline (Dan)
> > - Check endpoint decoder based on dport list instead of driver id (Dan)
> > - Use range instead of resource per dependent patch change
> > - Use clever union packing for target list (Dan)
> > - Only check NULL from devm_cxl_iomap_block (Dan)
> > - Properly parent the created cxl decoders
> > - Move bus rescanning from cxl_acpi to here (Dan)
> > - Remove references to "CFMWS" in core (Dan)
> > - Use macro to help keep within 80 character lines
> > ---
> >  .../driver-api/cxl/memory-devices.rst         |   5 +
> >  drivers/cxl/Kconfig                           |  22 ++
> >  drivers/cxl/Makefile                          |   2 +
> >  drivers/cxl/core/bus.c                        |  67 ++++
> >  drivers/cxl/core/regs.c                       |   6 +-
> >  drivers/cxl/cxl.h                             |  34 +-
> >  drivers/cxl/port.c                            | 323 ++++++++++++++++++
> >  7 files changed, 450 insertions(+), 9 deletions(-)
> >  create mode 100644 drivers/cxl/port.c
> > 
> > diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> > index 3b8f41395f6b..fbf0393cdddc 100644
> > --- a/Documentation/driver-api/cxl/memory-devices.rst
> > +++ b/Documentation/driver-api/cxl/memory-devices.rst
> > @@ -28,6 +28,11 @@ CXL Memory Device
> >  .. kernel-doc:: drivers/cxl/pci.c
> >     :internal:
> >  
> > +CXL Port
> > +--------
> > +.. kernel-doc:: drivers/cxl/port.c
> > +   :doc: cxl port
> > +
> >  CXL Core
> >  --------
> >  .. kernel-doc:: drivers/cxl/cxl.h
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index ef05e96f8f97..3aeb33bba5a3 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -77,4 +77,26 @@ config CXL_PMEM
> >  	  provisioning the persistent memory capacity of CXL memory expanders.
> >  
> >  	  If unsure say 'm'.
> > +
> > +config CXL_MEM
> > +	tristate "CXL.mem: Memory Devices"
> > +	select CXL_PORT
> > +	depends on CXL_PCI
> > +	default CXL_BUS
> > +	help
> > +	  The CXL.mem protocol allows a device to act as a provider of "System
> > +	  RAM" and/or "Persistent Memory" that is fully coherent as if the
> > +	  memory was attached to the typical CPU memory controller.  This is
> > +	  known as HDM "Host-managed Device Memory".
> > +
> > +	  Say 'y/m' to enable a driver that will attach to CXL.mem devices for
> > +	  memory expansion and control of HDM. See Chapter 9.13 in the CXL 2.0
> > +	  specification for a detailed description of HDM.
> > +
> > +	  If unsure say 'm'.
> > +
> > +
> > +config CXL_PORT
> > +	tristate
> > +
> >  endif
> > diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
> > index cf07ae6cea17..56fcac2323cb 100644
> > --- a/drivers/cxl/Makefile
> > +++ b/drivers/cxl/Makefile
> > @@ -3,7 +3,9 @@ obj-$(CONFIG_CXL_BUS) += core/
> >  obj-$(CONFIG_CXL_PCI) += cxl_pci.o
> >  obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
> >  obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
> > +obj-$(CONFIG_CXL_PORT) += cxl_port.o
> >  
> >  cxl_pci-y := pci.o
> >  cxl_acpi-y := acpi.o
> >  cxl_pmem-y := pmem.o
> > +cxl_port-y := port.o
> > diff --git a/drivers/cxl/core/bus.c b/drivers/cxl/core/bus.c
> > index 9e0d7d5d9298..46a06cfe79bd 100644
> > --- a/drivers/cxl/core/bus.c
> > +++ b/drivers/cxl/core/bus.c
> > @@ -31,6 +31,8 @@ static DECLARE_RWSEM(root_port_sem);
> >  
> >  static struct device *cxl_topology_host;
> >  
> > +static bool is_cxl_decoder(struct device *dev);
> > +
> >  int cxl_register_topology_host(struct device *host)
> >  {
> >  	down_write(&topology_host_sem);
> > @@ -75,6 +77,45 @@ static void put_cxl_topology_host(struct device *dev)
> >  	up_read(&topology_host_sem);
> >  }
> >  
> > +static int decoder_match(struct device *dev, void *data)
> > +{
> > +	struct resource *theirs = (struct resource *)data;
> > +	struct cxl_decoder *cxld;
> > +
> > +	if (!is_cxl_decoder(dev))
> > +		return 0;
> > +
> > +	cxld = to_cxl_decoder(dev);
> > +	if (theirs->start <= cxld->decoder_range.start &&
> > +	    theirs->end >= cxld->decoder_range.end)
> > +		return 1;
> > +
> > +	return 0;
> > +}
> > +
> > +static struct cxl_decoder *cxl_find_root_decoder(resource_size_t base,
> > +						 resource_size_t size)
> > +{
> > +	struct cxl_decoder *cxld = NULL;
> > +	struct device *cxldd;
> > +	struct device *host;
> > +	struct resource res = (struct resource){
> > +		.start = base,
> > +		.end = base + size - 1,
> > +	};
> > +
> > +	host = get_cxl_topology_host();
> > +	if (!host)
> > +		return NULL;
> > +
> > +	cxldd = device_find_child(host, &res, decoder_match);
> > +	if (cxldd)
> > +		cxld = to_cxl_decoder(cxldd);
> > +
> > +	put_cxl_topology_host(host);
> > +	return cxld;
> > +}
> > +
> >  static ssize_t devtype_show(struct device *dev, struct device_attribute *attr,
> >  			    char *buf)
> >  {
> > @@ -280,6 +321,11 @@ bool is_root_decoder(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
> >  
> > +static bool is_cxl_decoder(struct device *dev)
> > +{
> > +	return dev->type->release == cxl_decoder_release;
> > +}
> > +
> >  struct cxl_decoder *to_cxl_decoder(struct device *dev)
> >  {
> >  	if (dev_WARN_ONCE(dev, dev->type->release != cxl_decoder_release,
> > @@ -327,6 +373,7 @@ struct cxl_port *to_cxl_port(struct device *dev)
> >  		return NULL;
> >  	return container_of(dev, struct cxl_port, dev);
> >  }
> > +EXPORT_SYMBOL_NS_GPL(to_cxl_port, CXL);
> >  
> >  struct cxl_dport *cxl_get_root_dport(struct device *dev)
> >  {
> > @@ -735,6 +782,24 @@ EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, CXL);
> >  
> >  static void cxld_unregister(void *dev)
> >  {
> > +	struct cxl_decoder *plat_decoder, *cxld = to_cxl_decoder(dev);
> > +	resource_size_t base, size;
> > +
> > +	if (is_root_decoder(dev)) {
> > +		device_unregister(dev);
> > +		return;
> > +	}
> > +
> > +	base = cxld->decoder_range.start;
> > +	size = range_len(&cxld->decoder_range);
> > +
> > +	if (size) {
> > +		plat_decoder = cxl_find_root_decoder(base, size);
> > +		if (plat_decoder)
> > +			__release_region(&plat_decoder->platform_res, base,
> > +					 size);
> > +	}
> > +
> >  	device_unregister(dev);
> >  }
> >  
> > @@ -789,6 +854,8 @@ static int cxl_device_id(struct device *dev)
> >  		return CXL_DEVICE_NVDIMM_BRIDGE;
> >  	if (dev->type == &cxl_nvdimm_type)
> >  		return CXL_DEVICE_NVDIMM;
> > +	if (dev->type == &cxl_port_type)
> > +		return CXL_DEVICE_PORT;
> >  	return 0;
> >  }
> >  
> > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > index 41a0245867ea..f191b0c995a7 100644
> > --- a/drivers/cxl/core/regs.c
> > +++ b/drivers/cxl/core/regs.c
> > @@ -159,9 +159,8 @@ void cxl_probe_device_regs(struct device *dev, void __iomem *base,
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_probe_device_regs, CXL);
> >  
> > -static void __iomem *devm_cxl_iomap_block(struct device *dev,
> > -					  resource_size_t addr,
> > -					  resource_size_t length)
> > +void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> > +				   resource_size_t length)
> >  {
> >  	void __iomem *ret_val;
> >  	struct resource *res;
> > @@ -180,6 +179,7 @@ static void __iomem *devm_cxl_iomap_block(struct device *dev,
> >  
> >  	return ret_val;
> >  }
> > +EXPORT_SYMBOL_NS_GPL(devm_cxl_iomap_block, CXL);
> >  
> >  int cxl_map_component_regs(struct pci_dev *pdev,
> >  			   struct cxl_component_regs *regs,
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index 3962a5e6a950..24fa16157d5e 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -17,6 +17,9 @@
> >   * (port-driver, region-driver, nvdimm object-drivers... etc).
> >   */
> >  
> > +/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
> > +#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
> > +
> >  /* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers*/
> >  #define CXL_CM_OFFSET 0x1000
> >  #define CXL_CM_CAP_HDR_OFFSET 0x0
> > @@ -36,11 +39,22 @@
> >  #define CXL_HDM_DECODER_CAP_OFFSET 0x0
> >  #define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
> >  #define   CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
> > -#define CXL_HDM_DECODER0_BASE_LOW_OFFSET 0x10
> > -#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET 0x14
> > -#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET 0x18
> > -#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET 0x1c
> > -#define CXL_HDM_DECODER0_CTRL_OFFSET 0x20
> > +#define   CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
> > +#define   CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
> > +#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
> > +#define   CXL_HDM_DECODER_ENABLE BIT(1)
> > +#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
> > +#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
> > +#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
> > +#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
> > +#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
> > +#define   CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
> > +#define   CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
> > +#define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
> > +#define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
> > +#define   CXL_HDM_DECODER0_CTRL_TYPE BIT(12)
> > +#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
> > +#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
> >  
> >  static inline int cxl_hdm_decoder_count(u32 cap_hdr)
> >  {
> > @@ -148,6 +162,8 @@ int cxl_map_device_regs(struct pci_dev *pdev,
> >  enum cxl_regloc_type;
> >  int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
> >  		      struct cxl_register_map *map);
> > +void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> > +				   resource_size_t length);
> >  
> >  #define CXL_RESOURCE_NONE ((resource_size_t) -1)
> >  #define CXL_TARGET_STRLEN 20
> > @@ -165,7 +181,8 @@ void cxl_unregister_topology_host(struct device *host);
> >  #define CXL_DECODER_F_TYPE2 BIT(2)
> >  #define CXL_DECODER_F_TYPE3 BIT(3)
> >  #define CXL_DECODER_F_LOCK  BIT(4)
> > -#define CXL_DECODER_F_MASK  GENMASK(4, 0)
> > +#define CXL_DECODER_F_ENABLE    BIT(5)
> > +#define CXL_DECODER_F_MASK  GENMASK(5, 0)
> >  
> >  enum cxl_decoder_type {
> >         CXL_DECODER_ACCELERATOR = 2,
> > @@ -255,6 +272,8 @@ struct cxl_walk_context {
> >   * @dports: cxl_dport instances referenced by decoders
> >   * @decoder_ida: allocator for decoder ids
> >   * @component_reg_phys: component register capability base address (optional)
> > + * @rescan_work: worker object for bus rescans after port additions
> > + * @data: opaque data with driver specific usage
> >   */
> >  struct cxl_port {
> >  	struct device dev;
> > @@ -263,6 +282,8 @@ struct cxl_port {
> >  	struct list_head dports;
> >  	struct ida decoder_ida;
> >  	resource_size_t component_reg_phys;
> > +	struct work_struct rescan_work;
> > +	void *data;
> >  };
> >  
> >  /**
> > @@ -325,6 +346,7 @@ void cxl_driver_unregister(struct cxl_driver *cxl_drv);
> >  
> >  #define CXL_DEVICE_NVDIMM_BRIDGE	1
> >  #define CXL_DEVICE_NVDIMM		2
> > +#define CXL_DEVICE_PORT			3
> >  
> >  #define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*")
> >  #define CXL_MODALIAS_FMT "cxl:t%d"
> > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> > new file mode 100644
> > index 000000000000..3c03131517af
> > --- /dev/null
> > +++ b/drivers/cxl/port.c
> > @@ -0,0 +1,323 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> > +#include <linux/device.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +
> > +#include "cxlmem.h"
> > +
> > +/**
> > + * DOC: cxl port
> > + *
> > + * The port driver implements the set of functionality needed to allow full
> > + * decoder enumeration and routing. A CXL port is an abstraction of a CXL
> > + * component that implements some amount of CXL decoding of CXL.mem traffic.
> > + * As of the CXL 2.0 spec, this includes:
> > + *
> > + *	.. list-table:: CXL Components w/ Ports
> > + *		:widths: 25 25 50
> > + *		:header-rows: 1
> > + *
> > + *		* - component
> > + *		  - upstream
> > + *		  - downstream
> > + *		* - Hostbridge
> > + *		  - ACPI0016
> > + *		  - root port
> > + *		* - Switch
> > + *		  - Switch Upstream Port
> > + *		  - Switch Downstream Port
> > + *		* - Endpoint
> > + *		  - Endpoint Port
> > + *		  - N/A
> > + *
> > + * The primary service this driver provides is enumerating HDM decoders and
> > + * presenting APIs to other drivers to utilize the decoders.
> > + */
> > +
> > +static struct workqueue_struct *cxl_port_wq;
> > +
> > +struct cxl_port_data {
> > +	struct cxl_component_regs regs;
> > +
> > +	struct port_caps {
> > +		unsigned int count;
> > +		unsigned int tc;
> > +		unsigned int interleave11_8;
> > +		unsigned int interleave14_12;
> > +	} caps;
> > +};
> > +
> > +static inline int to_interleave_granularity(u32 ctrl)
> > +{
> > +	int val = FIELD_GET(CXL_HDM_DECODER0_CTRL_IG_MASK, ctrl);
> > +
> > +	return 256 << val;
> > +}
> > +
> > +static inline int to_interleave_ways(u32 ctrl)
> > +{
> > +	int val = FIELD_GET(CXL_HDM_DECODER0_CTRL_IW_MASK, ctrl);
> > +
> > +	return 1 << val;
> > +}
> > +
> > +static void get_caps(struct cxl_port *port, struct cxl_port_data *cpd)
> > +{
> > +	void __iomem *hdm_decoder = cpd->regs.hdm_decoder;
> > +	struct port_caps *caps = &cpd->caps;
> > +	u32 hdm_cap;
> > +
> > +	hdm_cap = readl(hdm_decoder + CXL_HDM_DECODER_CAP_OFFSET);
> > +
> > +	caps->count = cxl_hdm_decoder_count(hdm_cap);
> > +	caps->tc = FIELD_GET(CXL_HDM_DECODER_TARGET_COUNT_MASK, hdm_cap);
> > +	caps->interleave11_8 =
> > +		FIELD_GET(CXL_HDM_DECODER_INTERLEAVE_11_8, hdm_cap);
> > +	caps->interleave14_12 =
> > +		FIELD_GET(CXL_HDM_DECODER_INTERLEAVE_14_12, hdm_cap);
> > +}
> > +
> > +static int map_regs(struct cxl_port *port, void __iomem *crb,
> > +		    struct cxl_port_data *cpd)
> > +{
> > +	struct cxl_register_map map;
> > +	struct cxl_component_reg_map *comp_map = &map.component_map;
> > +
> > +	cxl_probe_component_regs(&port->dev, crb, comp_map);
> > +	if (!comp_map->hdm_decoder.valid) {
> > +		dev_err(&port->dev, "HDM decoder registers invalid\n");
> > +		return -ENXIO;
> > +	}
> > +
> > +	cpd->regs.hdm_decoder = crb + comp_map->hdm_decoder.offset;
> > +
> > +	return 0;
> > +}
> > +
> > +static u64 get_decoder_size(void __iomem *hdm_decoder, int n)
> > +{
> > +	u32 ctrl = readl(hdm_decoder + CXL_HDM_DECODER0_CTRL_OFFSET(n));
> > +
> > +	if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED)
> > +		return 0;
> > +
> > +	return ioread64_hi_lo(hdm_decoder +
> > +			      CXL_HDM_DECODER0_SIZE_LOW_OFFSET(n));
> > +}
> > +
> > +static bool is_endpoint_port(struct cxl_port *port)
> > +{
> > +	/* Endpoints can't be ports... yet! */
> > +	return false;
> > +}
> > +
> > +static void rescan_ports(struct work_struct *work)
> > +{
> > +	if (bus_rescan_devices(&cxl_bus_type))
> > +		pr_warn("Failed to rescan\n");
> > +}
> > +
> > +/* Minor layering violation */
> > +static int dvsec_range_used(struct cxl_port *port)
> > +{
> > +	struct cxl_endpoint_dvsec_info *info;
> > +	int i, ret = 0;
> > +
> > +	if (!is_endpoint_port(port))
> > +		return 0;
> > +
> > +	info = port->data;
> > +	for (i = 0; i < info->ranges; i++)
> > +		if (info->range[i].size)
> > +			ret |= 1 << i;
> > +
> > +	return ret;
> > +}
> > +
> > +static int enumerate_hdm_decoders(struct cxl_port *port,
> > +				  struct cxl_port_data *portdata)
> > +{
> > +	void __iomem *hdm_decoder = portdata->regs.hdm_decoder;
> > +	bool global_enable;
> > +	u32 global_ctrl;
> > +	int i = 0;
> > +
> > +	global_ctrl = readl(hdm_decoder + CXL_HDM_DECODER_CTRL_OFFSET);
> > +	global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
> > +	if (!global_enable) {
> > +		i = dvsec_range_used(port);
> > +		if (i) {
> > +			dev_err(&port->dev,
> > +				"Couldn't add port because device is using DVSEC range registers\n");
> > +			return -EBUSY;
> > +		}
> > +	}
> > +
> > +	for (i = 0; i < portdata->caps.count; i++) {
> > +		int rc, target_count = portdata->caps.tc;
> > +		struct cxl_decoder *cxld;
> > +		int *target_map = NULL;
> > +		u64 size;
> > +
> > +		if (is_endpoint_port(port))
> > +			target_count = 0;
> > +
> > +		cxld = cxl_decoder_alloc(port, target_count);
> > +		if (IS_ERR(cxld)) {
> > +			dev_warn(&port->dev,
> > +				 "Failed to allocate the decoder\n");
> > +			return PTR_ERR(cxld);
> > +		}
> > +
> > +		cxld->target_type = CXL_DECODER_EXPANDER;
> > +		cxld->interleave_ways = 1;
> > +		cxld->interleave_granularity = 0;
> > +
> > +		size = get_decoder_size(hdm_decoder, i);
> > +		if (size != 0) {
> > +#define decoderN(reg, n) hdm_decoder + CXL_HDM_DECODER0_##reg(n)
> 
> Perhaps this block in the if (size != 0) would be more readable if broken out
> to a utility function?

I don't get this comment, there is already get_decoder_size(). Can you please
elaborate?

> As normal I'm not keen on the macro magic if we can avoid it.
> 

Yeah - just trying to not have to deal with wrapping long lines.

> 
> > +			int temp[CXL_DECODER_MAX_INTERLEAVE];
> > +			u64 base;
> > +			u32 ctrl;
> > +			int j;
> > +			union {
> > +				u64 value;
> > +				char target_id[8];
> > +			} target_list;
> 
> I thought we tried to avoid this union usage in kernel because of the whole
> thing about c compilers being able to do what they like with it...
> 

I wasn't aware of the restriction. I can change it back if it's required. It
does look a lot nicer this way. Is there a reference to this issue somewhere?

> > +
> > +			target_map = temp;
> 
> This is set to a variable that goes out of scope whilst target_map is still in
> use.
> 

Yikes. I'm pretty surprised the compiler didn't catch this.

> > +			ctrl = readl(decoderN(CTRL_OFFSET, i));
> > +			base = ioread64_hi_lo(decoderN(BASE_LOW_OFFSET, i));
> > +
> > +			cxld->decoder_range = (struct range){
> > +				.start = base,
> > +				.end = base + size - 1
> > +			};
> > +
> > +			cxld->flags = CXL_DECODER_F_ENABLE;
> > +			cxld->interleave_ways = to_interleave_ways(ctrl);
> > +			cxld->interleave_granularity =
> > +				to_interleave_granularity(ctrl);
> > +
> > +			if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl) == 0)
> > +				cxld->target_type = CXL_DECODER_ACCELERATOR;
> > +
> > +			target_list.value = ioread64_hi_lo(decoderN(TL_LOW, i));
> > +			for (j = 0; j < cxld->interleave_ways; j++)
> > +				target_map[j] = target_list.target_id[j];
> > +#undef decoderN
> > +		}
> > +
> > +		rc = cxl_decoder_add_locked(cxld, target_map);
> > +		if (rc)
> > +			put_device(&cxld->dev);
> > +		else
> > +			rc = cxl_decoder_autoremove(&port->dev, cxld);
> > +		if (rc)
> > +			dev_err(&port->dev, "Failed to add decoder\n");
> 
> If that fails on the autoremove registration (unlikely) this message
> will be rather confusing - as the add was fine...
> 
> This nest of carrying on when we have an error is getting ever deeper...
> 

Yeah, this is not great. I will clean it up.

Thanks.

> > +		else
> > +			dev_dbg(&cxld->dev, "Added to port %s\n",
> > +				dev_name(&port->dev));
> > +	}
> > +
> > +	/*
> > +	 * Turn on global enable now since DVSEC ranges aren't being used and
> > +	 * we'll eventually want the decoder enabled.
> > +	 */
> > +	if (!global_enable) {
> > +		dev_dbg(&port->dev, "Enabling HDM decode\n");
> > +		writel(global_ctrl | CXL_HDM_DECODER_ENABLE, hdm_decoder + CXL_HDM_DECODER_CTRL_OFFSET);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
Jonathan Cameron Nov. 23, 2021, 11:38 a.m. UTC | #5
On Mon, 22 Nov 2021 15:38:20 -0800
Ben Widawsky <ben.widawsky@intel.com> wrote:

...

> > > +static int enumerate_hdm_decoders(struct cxl_port *port,
> > > +				  struct cxl_port_data *portdata)
> > > +{
> > > +	void __iomem *hdm_decoder = portdata->regs.hdm_decoder;
> > > +	bool global_enable;
> > > +	u32 global_ctrl;
> > > +	int i = 0;
> > > +
> > > +	global_ctrl = readl(hdm_decoder + CXL_HDM_DECODER_CTRL_OFFSET);
> > > +	global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
> > > +	if (!global_enable) {
> > > +		i = dvsec_range_used(port);
> > > +		if (i) {
> > > +			dev_err(&port->dev,
> > > +				"Couldn't add port because device is using DVSEC range registers\n");
> > > +			return -EBUSY;
> > > +		}
> > > +	}
> > > +
> > > +	for (i = 0; i < portdata->caps.count; i++) {
> > > +		int rc, target_count = portdata->caps.tc;
> > > +		struct cxl_decoder *cxld;
> > > +		int *target_map = NULL;
> > > +		u64 size;
> > > +
> > > +		if (is_endpoint_port(port))
> > > +			target_count = 0;
> > > +
> > > +		cxld = cxl_decoder_alloc(port, target_count);
> > > +		if (IS_ERR(cxld)) {
> > > +			dev_warn(&port->dev,
> > > +				 "Failed to allocate the decoder\n");
> > > +			return PTR_ERR(cxld);
> > > +		}
> > > +
> > > +		cxld->target_type = CXL_DECODER_EXPANDER;
> > > +		cxld->interleave_ways = 1;
> > > +		cxld->interleave_granularity = 0;
> > > +
> > > +		size = get_decoder_size(hdm_decoder, i);
> > > +		if (size != 0) {
> > > +#define decoderN(reg, n) hdm_decoder + CXL_HDM_DECODER0_##reg(n)  
> > 
> > Perhaps this block in the if (size != 0) would be more readable if broken out
> > to a utility function?  
> 
> I don't get this comment, there is already get_decoder_size(). Can you please
> elaborate?

Sure.  Just talking about having something like

		if (size != 0)
			init_decoder() // or something better named

as an alternative to this deep nesting. 

> 
> > As normal I'm not keen on the macro magic if we can avoid it.
> >   
> 
> Yeah - just trying to not have to deal with wrapping long lines.
> 
> >   
> > > +			int temp[CXL_DECODER_MAX_INTERLEAVE];
> > > +			u64 base;
> > > +			u32 ctrl;
> > > +			int j;
> > > +			union {
> > > +				u64 value;
> > > +				char target_id[8];
> > > +			} target_list;  
> > 
> > I thought we tried to avoid this union usage in kernel because of the whole
> > thing about c compilers being able to do what they like with it...
> >   
> 
> I wasn't aware of the restriction. I can change it back if it's required. It
> does look a lot nicer this way. Is there a reference to this issue somewhere?

Hmm. Seems I was out of date on this.  There is a mess in the c99 standard that
contradicts itself on whether you can do this or not.

https://davmac.wordpress.com/2010/02/26/c99-revisited/

The pull request for a patch form Andy got a Linus response...

https://lore.kernel.org/all/CAJZ5v0jq45atkapwSjJ4DkHhB1bfOA-Sh1TiA3dPXwKyFTBheA@mail.gmail.com/


> 
> > > +
> > > +			target_map = temp;  
> > 
> > This is set to a variable that goes out of scope whilst target_map is still in
> > use.
> >   
> 
> Yikes. I'm pretty surprised the compiler didn't catch this.
> 
> > > +			ctrl = readl(decoderN(CTRL_OFFSET, i));
> > > +			base = ioread64_hi_lo(decoderN(BASE_LOW_OFFSET, i));
> > > +
> > > +			cxld->decoder_range = (struct range){
> > > +				.start = base,
> > > +				.end = base + size - 1
> > > +			};
> > > +
> > > +			cxld->flags = CXL_DECODER_F_ENABLE;
> > > +			cxld->interleave_ways = to_interleave_ways(ctrl);
> > > +			cxld->interleave_granularity =
> > > +				to_interleave_granularity(ctrl);
> > > +
> > > +			if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl) == 0)
> > > +				cxld->target_type = CXL_DECODER_ACCELERATOR;
> > > +
> > > +			target_list.value = ioread64_hi_lo(decoderN(TL_LOW, i));
> > > +			for (j = 0; j < cxld->interleave_ways; j++)
> > > +				target_map[j] = target_list.target_id[j];
> > > +#undef decoderN
> > > +		}
> > > +
> > > +		rc = cxl_decoder_add_locked(cxld, target_map);
> > > +		if (rc)
> > > +			put_device(&cxld->dev);
> > > +		else
> > > +			rc = cxl_decoder_autoremove(&port->dev, cxld);
> > > +		if (rc)
> > > +			dev_err(&port->dev, "Failed to add decoder\n");  
> > 
> > If that fails on the autoremove registration (unlikely) this message
> > will be rather confusing - as the add was fine...
> > 
> > This nest of carrying on when we have an error is getting ever deeper...
> >   
> 
> Yeah, this is not great. I will clean it up.
> 
> Thanks.
> 
> > > +		else
> > > +			dev_dbg(&cxld->dev, "Added to port %s\n",
> > > +				dev_name(&port->dev));
> > > +	}
> > > +
> > > +	/*
> > > +	 * Turn on global enable now since DVSEC ranges aren't being used and
> > > +	 * we'll eventually want the decoder enabled.
> > > +	 */
> > > +	if (!global_enable) {
> > > +		dev_dbg(&port->dev, "Enabling HDM decode\n");
> > > +		writel(global_ctrl | CXL_HDM_DECODER_ENABLE, hdm_decoder + CXL_HDM_DECODER_CTRL_OFFSET);
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
Ben Widawsky Nov. 23, 2021, 4:14 p.m. UTC | #6
On 21-11-23 11:38:23, Jonathan Cameron wrote:
> On Mon, 22 Nov 2021 15:38:20 -0800
> Ben Widawsky <ben.widawsky@intel.com> wrote:
> 
> ...
> 
> > > > +static int enumerate_hdm_decoders(struct cxl_port *port,
> > > > +				  struct cxl_port_data *portdata)
> > > > +{
> > > > +	void __iomem *hdm_decoder = portdata->regs.hdm_decoder;
> > > > +	bool global_enable;
> > > > +	u32 global_ctrl;
> > > > +	int i = 0;
> > > > +
> > > > +	global_ctrl = readl(hdm_decoder + CXL_HDM_DECODER_CTRL_OFFSET);
> > > > +	global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
> > > > +	if (!global_enable) {
> > > > +		i = dvsec_range_used(port);
> > > > +		if (i) {
> > > > +			dev_err(&port->dev,
> > > > +				"Couldn't add port because device is using DVSEC range registers\n");
> > > > +			return -EBUSY;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	for (i = 0; i < portdata->caps.count; i++) {
> > > > +		int rc, target_count = portdata->caps.tc;
> > > > +		struct cxl_decoder *cxld;
> > > > +		int *target_map = NULL;
> > > > +		u64 size;
> > > > +
> > > > +		if (is_endpoint_port(port))
> > > > +			target_count = 0;
> > > > +
> > > > +		cxld = cxl_decoder_alloc(port, target_count);
> > > > +		if (IS_ERR(cxld)) {
> > > > +			dev_warn(&port->dev,
> > > > +				 "Failed to allocate the decoder\n");
> > > > +			return PTR_ERR(cxld);
> > > > +		}
> > > > +
> > > > +		cxld->target_type = CXL_DECODER_EXPANDER;
> > > > +		cxld->interleave_ways = 1;
> > > > +		cxld->interleave_granularity = 0;
> > > > +
> > > > +		size = get_decoder_size(hdm_decoder, i);
> > > > +		if (size != 0) {
> > > > +#define decoderN(reg, n) hdm_decoder + CXL_HDM_DECODER0_##reg(n)  
> > > 
> > > Perhaps this block in the if (size != 0) would be more readable if broken out
> > > to a utility function?  
> > 
> > I don't get this comment, there is already get_decoder_size(). Can you please
> > elaborate?
> 
> Sure.  Just talking about having something like
> 
> 		if (size != 0)
> 			init_decoder() // or something better named
> 
> as an alternative to this deep nesting. 
> 

Sounds good. I can combine it with the similar initialization done in cxl_acpi.

> > 
> > > As normal I'm not keen on the macro magic if we can avoid it.
> > >   
> > 
> > Yeah - just trying to not have to deal with wrapping long lines.
> > 
> > >   
> > > > +			int temp[CXL_DECODER_MAX_INTERLEAVE];
> > > > +			u64 base;
> > > > +			u32 ctrl;
> > > > +			int j;
> > > > +			union {
> > > > +				u64 value;
> > > > +				char target_id[8];
> > > > +			} target_list;  
> > > 
> > > I thought we tried to avoid this union usage in kernel because of the whole
> > > thing about c compilers being able to do what they like with it...
> > >   
> > 
> > I wasn't aware of the restriction. I can change it back if it's required. It
> > does look a lot nicer this way. Is there a reference to this issue somewhere?
> 
> Hmm. Seems I was out of date on this.  There is a mess in the c99 standard that
> contradicts itself on whether you can do this or not.
> 
> https://davmac.wordpress.com/2010/02/26/c99-revisited/

Thanks for the link.

> 
> The pull request for a patch form Andy got a Linus response...
> 
> https://lore.kernel.org/all/CAJZ5v0jq45atkapwSjJ4DkHhB1bfOA-Sh1TiA3dPXwKyFTBheA@mail.gmail.com/
> 

That was a fun read :-)

I'll defer to Dan on this. This was actually his code that he suggested in
review of the RFC.

> 
> > 
> > > > +
> > > > +			target_map = temp;  
> > > 
> > > This is set to a variable that goes out of scope whilst target_map is still in
> > > use.
> > >   
> > 
> > Yikes. I'm pretty surprised the compiler didn't catch this.
> > 
> > > > +			ctrl = readl(decoderN(CTRL_OFFSET, i));
> > > > +			base = ioread64_hi_lo(decoderN(BASE_LOW_OFFSET, i));
> > > > +
> > > > +			cxld->decoder_range = (struct range){
> > > > +				.start = base,
> > > > +				.end = base + size - 1
> > > > +			};
> > > > +
> > > > +			cxld->flags = CXL_DECODER_F_ENABLE;
> > > > +			cxld->interleave_ways = to_interleave_ways(ctrl);
> > > > +			cxld->interleave_granularity =
> > > > +				to_interleave_granularity(ctrl);
> > > > +
> > > > +			if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl) == 0)
> > > > +				cxld->target_type = CXL_DECODER_ACCELERATOR;
> > > > +
> > > > +			target_list.value = ioread64_hi_lo(decoderN(TL_LOW, i));
> > > > +			for (j = 0; j < cxld->interleave_ways; j++)
> > > > +				target_map[j] = target_list.target_id[j];
> > > > +#undef decoderN
> > > > +		}
> > > > +
> > > > +		rc = cxl_decoder_add_locked(cxld, target_map);
> > > > +		if (rc)
> > > > +			put_device(&cxld->dev);
> > > > +		else
> > > > +			rc = cxl_decoder_autoremove(&port->dev, cxld);
> > > > +		if (rc)
> > > > +			dev_err(&port->dev, "Failed to add decoder\n");  
> > > 
> > > If that fails on the autoremove registration (unlikely) this message
> > > will be rather confusing - as the add was fine...
> > > 
> > > This nest of carrying on when we have an error is getting ever deeper...
> > >   
> > 
> > Yeah, this is not great. I will clean it up.
> > 
> > Thanks.
> > 
> > > > +		else
> > > > +			dev_dbg(&cxld->dev, "Added to port %s\n",
> > > > +				dev_name(&port->dev));
> > > > +	}
> > > > +
> > > > +	/*
> > > > +	 * Turn on global enable now since DVSEC ranges aren't being used and
> > > > +	 * we'll eventually want the decoder enabled.
> > > > +	 */
> > > > +	if (!global_enable) {
> > > > +		dev_dbg(&port->dev, "Enabling HDM decode\n");
> > > > +		writel(global_ctrl | CXL_HDM_DECODER_ENABLE, hdm_decoder + CXL_HDM_DECODER_CTRL_OFFSET);
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +  
>
Bjorn Helgaas Nov. 23, 2021, 6:21 p.m. UTC | #7
On Fri, Nov 19, 2021 at 04:02:47PM -0800, Ben Widawsky wrote:
> The CXL port driver is responsible for managing the decoder resources
> contained within the port. It will also provide APIs that other drivers
> will consume for managing these resources.
> 
> There are 4 types of ports in a system:
> 1. Platform port. This is a non-programmable entity. Such a port is
>    named rootX. It is enumerated by cxl_acpi in an ACPI based system.

Can you mention the ACPI source (static table, namespace PNP ID) here?

> 2. Hostbridge port. 

Is "hostbridge" styled as a single word in the spec?  I've only seen
"host bridge" elsewhere.

>    This ports register access is defined in a platform
>    specific way (CHBS for ACPI platforms). 

s/This ports/This port's/

This doesn't really make sense, though.  Are you saying the register
access *mechanism* is platform specific?  Or merely that the
enumeration method (ACPI table, ACPI namespace, DT, etc) is
platform-specific?

I assume "CHBS" is an ACPI static table?

>    It has n downstream ports,
>    each of which are known as CXL 2.0 root ports.

This sounds like a "host bridge port *contains* these root ports."
That doesn't sound right.

>    Once the platform
>    specific mechanism to get the offset to the registers is obtained it
>    operates just like other CXL components. The enumeration of this
>    component is started by cxl_acpi and completed by cxl_port.

> 3. Switch port. A switch port is similar to a hostbridge port except
>    register access is defined in the CXL specification in a platform
>    agnostic way. The downstream ports for a switch are simply known as
>    downstream ports. The enumeration of these are entirely contained
>    within cxl_port.

In PCIe, "Downstream Port" includes both Root Ports and Switch
Downstream Ports.  It sounds like that's not the case for CXL?

Well, except above you say that a Host Bridge Port has N Downstream
Ports, so I guess "Downstream Port" probably includes both Host Bridge
Ports and Switch Downstream Ports.

Maybe you should just omit the "The downstream ports for a switch are
simply known as downstream ports" sentence.

> 4. Endpoint port. Endpoint ports are similar to switch ports with the
>    exception that they have no downstream ports, only the underlying
>    media on the device. The enumeration of these are started by cxl_pci,
>    and completed by cxl_port.

Does CXL use an "Upstream Port" concept similar to PCIe?  In PCIe,
"Upstream Port" includes both Switch Upstream Ports and the Upstream
Port on an Endpoint.

I hope this driver is not modeled on the PCI portdrv.  IMO, that was a
design error, and the "port service drivers" (PME, hotplug, AER, etc)
should be directly integrated into the PCI core instead of pretending
to be independent drivers.

> --- a/Documentation/driver-api/cxl/memory-devices.rst
> +++ b/Documentation/driver-api/cxl/memory-devices.rst
> @@ -28,6 +28,11 @@ CXL Memory Device
>  .. kernel-doc:: drivers/cxl/pci.c
>     :internal:
>  
> +CXL Port
> +--------
> +.. kernel-doc:: drivers/cxl/port.c
> +   :doc: cxl port

s/cxl port/CXL Port/ ?  I don't know exactly how this gets rendered by
ReST.

>  CXL Core
>  --------
>  .. kernel-doc:: drivers/cxl/cxl.h
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index ef05e96f8f97..3aeb33bba5a3 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -77,4 +77,26 @@ config CXL_PMEM
>  	  provisioning the persistent memory capacity of CXL memory expanders.
>  
>  	  If unsure say 'm'.
> +
> +config CXL_MEM
> +	tristate "CXL.mem: Memory Devices"
> +	select CXL_PORT
> +	depends on CXL_PCI
> +	default CXL_BUS
> +	help
> +	  The CXL.mem protocol allows a device to act as a provider of "System
> +	  RAM" and/or "Persistent Memory" that is fully coherent as if the
> +	  memory was attached to the typical CPU memory controller.  This is
> +	  known as HDM "Host-managed Device Memory".

s/was attached/were attached/

> +	  Say 'y/m' to enable a driver that will attach to CXL.mem devices for
> +	  memory expansion and control of HDM. See Chapter 9.13 in the CXL 2.0
> +	  specification for a detailed description of HDM.
> +
> +	  If unsure say 'm'.
> +
> +

Spurious blank line.

> +config CXL_PORT
> +	tristate
> +
>  endif

> --- /dev/null
> +++ b/drivers/cxl/port.c
> @@ -0,0 +1,323 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +
> +#include "cxlmem.h"
> +
> +/**
> + * DOC: cxl port

s/cxl port/CXL port/ (I'm assuming this should match usage below)
or maybe "CXL Port" both places to match typical PCIe spec usage?

Capitalization is a useful hint that this term is defined by a spec.

> + *
> + * The port driver implements the set of functionality needed to allow full
> + * decoder enumeration and routing. A CXL port is an abstraction of a CXL
> + * component that implements some amount of CXL decoding of CXL.mem traffic.
> + * As of the CXL 2.0 spec, this includes:
> + *
> + *	.. list-table:: CXL Components w/ Ports
> + *		:widths: 25 25 50
> + *		:header-rows: 1
> + *
> + *		* - component
> + *		  - upstream
> + *		  - downstream
> + *		* - Hostbridge

s/Hostbridge/Host bridge/

> + *		  - ACPI0016
> + *		  - root port

s/root port/Root Port/ to match Switch Ports below (and spec usage).

> + *		* - Switch
> + *		  - Switch Upstream Port
> + *		  - Switch Downstream Port
> + *		* - Endpoint
> + *		  - Endpoint Port
> + *		  - N/A

What does "N/A" mean here?  Is it telling us something useful?

> +static void rescan_ports(struct work_struct *work)
> +{
> +	if (bus_rescan_devices(&cxl_bus_type))
> +		pr_warn("Failed to rescan\n");

Needs some context.  A bare "Failed to rescan" in the dmesg log
without a clue about who emitted it is really annoying.

Maybe you defined pr_fmt() somewhere; I couldn't find it easily.

Bjorn
Ben Widawsky Nov. 23, 2021, 10:03 p.m. UTC | #8
On 21-11-23 12:21:28, Bjorn Helgaas wrote:
> On Fri, Nov 19, 2021 at 04:02:47PM -0800, Ben Widawsky wrote:
> > The CXL port driver is responsible for managing the decoder resources
> > contained within the port. It will also provide APIs that other drivers
> > will consume for managing these resources.
> > 
> > There are 4 types of ports in a system:
> > 1. Platform port. This is a non-programmable entity. Such a port is
> >    named rootX. It is enumerated by cxl_acpi in an ACPI based system.
> 
> Can you mention the ACPI source (static table, namespace PNP ID) here?

Done.

> 
> > 2. Hostbridge port. 
> 
> Is "hostbridge" styled as a single word in the spec?  I've only seen
> "host bridge" elsewhere.
> 

2 words. I'm sadly inconsistent with this particular word. CXL spec capitalizes
it.

> >    This ports register access is defined in a platform
> >    specific way (CHBS for ACPI platforms). 
> 
> s/This ports/This port's/
> 
> This doesn't really make sense, though.  Are you saying the register
> access *mechanism* is platform specific?  Or merely that the
> enumeration method (ACPI table, ACPI namespace, DT, etc) is
> platform-specific?
> 
> I assume "CHBS" is an ACPI static table?
> 

The registers are spec defined. The base address of those registers is defined
in a platform specific manner. Enumeration is a better word. CHBS is an ACPI
static table, yes.

> >    It has n downstream ports,
> >    each of which are known as CXL 2.0 root ports.
> 
> This sounds like a "host bridge port *contains* these root ports."
> That doesn't sound right.
> 

What sounds better? A CXL 2.0 Root Port is CXL capabilities on top of the PCIe
component which has the PCI_EXP_TYPE_ROOT_PORT cap. In my mental model, a host
bridge does contain the root ports. Perhaps I am wrong about that?

> >    Once the platform
> >    specific mechanism to get the offset to the registers is obtained it
> >    operates just like other CXL components. The enumeration of this
> >    component is started by cxl_acpi and completed by cxl_port.
> 
> > 3. Switch port. A switch port is similar to a hostbridge port except
> >    register access is defined in the CXL specification in a platform
> >    agnostic way. The downstream ports for a switch are simply known as
> >    downstream ports. The enumeration of these are entirely contained
> >    within cxl_port.
> 
> In PCIe, "Downstream Port" includes both Root Ports and Switch
> Downstream Ports.  It sounds like that's not the case for CXL?
> 

In CXL 2.0, it's fairly close to true that switch downstream ports and root
ports are equivalent. From today's driver perspective they are equivalent. Root
ports have some capabilities which switch downstream ports do not.

> Well, except above you say that a Host Bridge Port has N Downstream
> Ports, so I guess "Downstream Port" probably includes both Host Bridge
> Ports and Switch Downstream Ports.

Yes, in that case I meant a port which is downstream - confusing indeed.

> 
> Maybe you should just omit the "The downstream ports for a switch are
> simply known as downstream ports" sentence.
> 

Sounds good.

> > 4. Endpoint port. Endpoint ports are similar to switch ports with the
> >    exception that they have no downstream ports, only the underlying
> >    media on the device. The enumeration of these are started by cxl_pci,
> >    and completed by cxl_port.
> 
> Does CXL use an "Upstream Port" concept similar to PCIe?  In PCIe,
> "Upstream Port" includes both Switch Upstream Ports and the Upstream
> Port on an Endpoint.

Not really. Endpoints aren't known as ports in the spec and they have a decent
amount of divergence from upstream ports. The main area where they do converge
is in the memory decoding capabilities. In fact, it might come to a point where
we find adding an endpoint as a port does not make sense, but for now it does.

> 
> I hope this driver is not modeled on the PCI portdrv.  IMO, that was a
> design error, and the "port service drivers" (PME, hotplug, AER, etc)
> should be directly integrated into the PCI core instead of pretending
> to be independent drivers.

I'd like to understand a bit about why you think it was a design error. The port
driver is intended to be a port services driver, however I see the services
provided as quite different than the ones you mention. The primary service
cxl_port will provide from here on out is the ability to manage HDM decoder
resources for a port. Other independent drivers that want to use HDM decoder
resources would rely on the port driver for this.

It could be in CXL core just the same, but I don't see too much of a benefit
since the code would be almost identical. One nice aspect of having the port
driver outside of CXL core is it would allow CXL devices which do not need port
services (type-1 and probably type-2) to not need to load the port driver. We do
not have examples of such devices today but there's good evidence they are being
built. Whether or not they will even want CXL core is another topic up for
debate however.

I admit Dan and I did discuss putting this in either its own port driver, or
within core as a port driver. My inclination is, less is more in CXL core; but
perhaps you will be able to change my mind.

> 
> > --- a/Documentation/driver-api/cxl/memory-devices.rst
> > +++ b/Documentation/driver-api/cxl/memory-devices.rst
> > @@ -28,6 +28,11 @@ CXL Memory Device
> >  .. kernel-doc:: drivers/cxl/pci.c
> >     :internal:
> >  
> > +CXL Port
> > +--------
> > +.. kernel-doc:: drivers/cxl/port.c
> > +   :doc: cxl port
> 
> s/cxl port/CXL Port/ ?  I don't know exactly how this gets rendered by
> ReST.

I believe this is the specific tag as specified in the .c file. I can capitalize
it, but we haven't done this for other tags.

> 
> >  CXL Core
> >  --------
> >  .. kernel-doc:: drivers/cxl/cxl.h
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index ef05e96f8f97..3aeb33bba5a3 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -77,4 +77,26 @@ config CXL_PMEM
> >  	  provisioning the persistent memory capacity of CXL memory expanders.
> >  
> >  	  If unsure say 'm'.
> > +
> > +config CXL_MEM
> > +	tristate "CXL.mem: Memory Devices"
> > +	select CXL_PORT
> > +	depends on CXL_PCI
> > +	default CXL_BUS
> > +	help
> > +	  The CXL.mem protocol allows a device to act as a provider of "System
> > +	  RAM" and/or "Persistent Memory" that is fully coherent as if the
> > +	  memory was attached to the typical CPU memory controller.  This is
> > +	  known as HDM "Host-managed Device Memory".
> 
> s/was attached/were attached/
> 
> > +	  Say 'y/m' to enable a driver that will attach to CXL.mem devices for
> > +	  memory expansion and control of HDM. See Chapter 9.13 in the CXL 2.0
> > +	  specification for a detailed description of HDM.
> > +
> > +	  If unsure say 'm'.
> > +
> > +
> 
> Spurious blank line.
> 
> > +config CXL_PORT
> > +	tristate
> > +
> >  endif
> 
> > --- /dev/null
> > +++ b/drivers/cxl/port.c
> > @@ -0,0 +1,323 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> > +#include <linux/device.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +
> > +#include "cxlmem.h"
> > +
> > +/**
> > + * DOC: cxl port
> 
> s/cxl port/CXL port/ (I'm assuming this should match usage below)
> or maybe "CXL Port" both places to match typical PCIe spec usage?
> 
> Capitalization is a useful hint that this term is defined by a spec.
> 

As above, I don't mind changing this at all, but this would be inconsistent with
the other tags we have defined.

> > + *
> > + * The port driver implements the set of functionality needed to allow full
> > + * decoder enumeration and routing. A CXL port is an abstraction of a CXL
> > + * component that implements some amount of CXL decoding of CXL.mem traffic.
> > + * As of the CXL 2.0 spec, this includes:
> > + *
> > + *	.. list-table:: CXL Components w/ Ports
> > + *		:widths: 25 25 50
> > + *		:header-rows: 1
> > + *
> > + *		* - component
> > + *		  - upstream
> > + *		  - downstream
> > + *		* - Hostbridge
> 
> s/Hostbridge/Host bridge/
> 
> > + *		  - ACPI0016
> > + *		  - root port
> 
> s/root port/Root Port/ to match Switch Ports below (and spec usage).
> 
> > + *		* - Switch
> > + *		  - Switch Upstream Port
> > + *		  - Switch Downstream Port
> > + *		* - Endpoint
> > + *		  - Endpoint Port
> > + *		  - N/A
> 
> What does "N/A" mean here?  Is it telling us something useful?

This gets rendered into a table that looks like the following:


| component  | upstream             | downstream             |
| ---------  | --------             | ----------             |
| Hostbridge | ACPI0016             | Root Port              |
| Switch     | Switch Upstream Port | Switch Downstream Port |
| Endpoint   | Endpoint Port        | N/A                    |


> 
> > +static void rescan_ports(struct work_struct *work)
> > +{
> > +	if (bus_rescan_devices(&cxl_bus_type))
> > +		pr_warn("Failed to rescan\n");
> 
> Needs some context.  A bare "Failed to rescan" in the dmesg log
> without a clue about who emitted it is really annoying.
> 
> Maybe you defined pr_fmt() somewhere; I couldn't find it easily.
> 

I actually didn't know about pr_fmt() trick, so thanks for that. I'll improve
this message to be more useful and contextual.

> Bjorn
Dan Williams Nov. 23, 2021, 10:36 p.m. UTC | #9
On Tue, Nov 23, 2021 at 2:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
[..]
> > I hope this driver is not modeled on the PCI portdrv.  IMO, that was a
> > design error, and the "port service drivers" (PME, hotplug, AER, etc)
> > should be directly integrated into the PCI core instead of pretending
> > to be independent drivers.
>
> I'd like to understand a bit about why you think it was a design error. The port
> driver is intended to be a port services driver, however I see the services
> provided as quite different than the ones you mention. The primary service
> cxl_port will provide from here on out is the ability to manage HDM decoder
> resources for a port. Other independent drivers that want to use HDM decoder
> resources would rely on the port driver for this.
>
> It could be in CXL core just the same, but I don't see too much of a benefit
> since the code would be almost identical. One nice aspect of having the port
> driver outside of CXL core is it would allow CXL devices which do not need port
> services (type-1 and probably type-2) to not need to load the port driver. We do
> not have examples of such devices today but there's good evidence they are being
> built. Whether or not they will even want CXL core is another topic up for
> debate however.
>
> I admit Dan and I did discuss putting this in either its own port driver, or
> within core as a port driver. My inclination is, less is more in CXL core; but
> perhaps you will be able to change my mind.

No, I don't think Bjorn is talking about whether the driver is
integrated into cxl_core.ko vs its own cxl_port.ko. IIUC this goes
back to the original contention about have /sys/bus/cxl at all:

https://lore.kernel.org/r/CAPcyv4iu8D-hJoujLXw8a4myS7trOE1FcUhESLB_imGMECVfrg@mail.gmail.com

Unlike pcieportdrv where the functionality is bounded to a single
device instance with relatively simpler hierarchical coordination of
the memory space and services. CXL interleaving is both a foreign
concept to the PCIE core and an awkward fit for the pci_bus_type
device model. CXL uses the cxl_bus_type and bus drivers to coordinate
CXL operations that have cross PCI device implications.

Outside of that I think the auxiliary device driver model, of which
the PCIE portdrv model is an open-coded form, is a useful construct
for separation of concerns. That said, I do want to hear more about
what trouble it has caused though to make sure that CXL does not trip
over the same issues longer term.

[..]
> > > +static void rescan_ports(struct work_struct *work)
> > > +{
> > > +   if (bus_rescan_devices(&cxl_bus_type))
> > > +           pr_warn("Failed to rescan\n");
> >
> > Needs some context.  A bare "Failed to rescan" in the dmesg log
> > without a clue about who emitted it is really annoying.
> >
> > Maybe you defined pr_fmt() somewhere; I couldn't find it easily.
> >
>
> I actually didn't know about pr_fmt() trick, so thanks for that. I'll improve
> this message to be more useful and contextual.

I am skeptical that this implementation of the workqueue properly
synchronizes with workqueue shutdown concerns, but I have not had a
chance to dig in too deep on this patchset.

Regardless, it is not worth reporting a rescan failure, because those
are to be expected here. The rescan attempts to rescan when a
constraint changes, there is no guarantee that all constraints are met
just because one constraint changed, so rescan failures
(device_attach() failures) are not interesting to report. The
individual driver probe failure reporting is sufficient.
Ben Widawsky Nov. 23, 2021, 11:38 p.m. UTC | #10
On 21-11-23 14:36:32, Dan Williams wrote:
> On Tue, Nov 23, 2021 at 2:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> [..]
> > > I hope this driver is not modeled on the PCI portdrv.  IMO, that was a
> > > design error, and the "port service drivers" (PME, hotplug, AER, etc)
> > > should be directly integrated into the PCI core instead of pretending
> > > to be independent drivers.
> >
> > I'd like to understand a bit about why you think it was a design error. The port
> > driver is intended to be a port services driver, however I see the services
> > provided as quite different than the ones you mention. The primary service
> > cxl_port will provide from here on out is the ability to manage HDM decoder
> > resources for a port. Other independent drivers that want to use HDM decoder
> > resources would rely on the port driver for this.
> >
> > It could be in CXL core just the same, but I don't see too much of a benefit
> > since the code would be almost identical. One nice aspect of having the port
> > driver outside of CXL core is it would allow CXL devices which do not need port
> > services (type-1 and probably type-2) to not need to load the port driver. We do
> > not have examples of such devices today but there's good evidence they are being
> > built. Whether or not they will even want CXL core is another topic up for
> > debate however.
> >
> > I admit Dan and I did discuss putting this in either its own port driver, or
> > within core as a port driver. My inclination is, less is more in CXL core; but
> > perhaps you will be able to change my mind.
> 
> No, I don't think Bjorn is talking about whether the driver is
> integrated into cxl_core.ko vs its own cxl_port.ko. IIUC this goes
> back to the original contention about have /sys/bus/cxl at all:
> 
> https://lore.kernel.org/r/CAPcyv4iu8D-hJoujLXw8a4myS7trOE1FcUhESLB_imGMECVfrg@mail.gmail.com
> 
> Unlike pcieportdrv where the functionality is bounded to a single
> device instance with relatively simpler hierarchical coordination of
> the memory space and services. CXL interleaving is both a foreign
> concept to the PCIE core and an awkward fit for the pci_bus_type
> device model. CXL uses the cxl_bus_type and bus drivers to coordinate
> CXL operations that have cross PCI device implications.
> 
> Outside of that I think the auxiliary device driver model, of which
> the PCIE portdrv model is an open-coded form, is a useful construct
> for separation of concerns. That said, I do want to hear more about
> what trouble it has caused though to make sure that CXL does not trip
> over the same issues longer term.
> 
> [..]
> > > > +static void rescan_ports(struct work_struct *work)
> > > > +{
> > > > +   if (bus_rescan_devices(&cxl_bus_type))
> > > > +           pr_warn("Failed to rescan\n");
> > >
> > > Needs some context.  A bare "Failed to rescan" in the dmesg log
> > > without a clue about who emitted it is really annoying.
> > >
> > > Maybe you defined pr_fmt() somewhere; I couldn't find it easily.
> > >
> >
> > I actually didn't know about pr_fmt() trick, so thanks for that. I'll improve
> > this message to be more useful and contextual.
> 
> I am skeptical that this implementation of the workqueue properly
> synchronizes with workqueue shutdown concerns, but I have not had a
> chance to dig in too deep on this patchset.

Yeah, we discussed this. Please do check that. I'm happy to send out v2 with all
of Jonathan's fixes first, so you don't have to duplicate effort with what he
has already uncovered.

> 
> Regardless, it is not worth reporting a rescan failure, because those
> are to be expected here. The rescan attempts to rescan when a
> constraint changes, there is no guarantee that all constraints are met
> just because one constraint changed, so rescan failures
> (device_attach() failures) are not interesting to report. The
> individual driver probe failure reporting is sufficient.

Agreed.
Bjorn Helgaas Nov. 23, 2021, 11:55 p.m. UTC | #11
[+cc Christoph, since he has opinions about portdrv;
Greg, Rafael, since they have good opinions about sysfs structure]

On Tue, Nov 23, 2021 at 02:36:32PM -0800, Dan Williams wrote:
> On Tue, Nov 23, 2021 at 2:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> [..]
> > > I hope this driver is not modeled on the PCI portdrv.  IMO, that
> > > was a design error, and the "port service drivers" (PME,
> > > hotplug, AER, etc) should be directly integrated into the PCI
> > > core instead of pretending to be independent drivers.
> >
> > I'd like to understand a bit about why you think it was a design
> > error. The port driver is intended to be a port services driver,
> > however I see the services provided as quite different than the
> > ones you mention. The primary service cxl_port will provide from
> > here on out is the ability to manage HDM decoder resources for a
> > port. Other independent drivers that want to use HDM decoder
> > resources would rely on the port driver for this.
> >
> > It could be in CXL core just the same, but I don't see too much of
> > a benefit since the code would be almost identical. One nice
> > aspect of having the port driver outside of CXL core is it would
> > allow CXL devices which do not need port services (type-1 and
> > probably type-2) to not need to load the port driver. We do not
> > have examples of such devices today but there's good evidence they
> > are being built. Whether or not they will even want CXL core is
> > another topic up for debate however.
> >
> > I admit Dan and I did discuss putting this in either its own port
> > driver, or within core as a port driver. My inclination is, less
> > is more in CXL core; but perhaps you will be able to change my
> > mind.
> 
> No, I don't think Bjorn is talking about whether the driver is
> integrated into cxl_core.ko vs its own cxl_port.ko. IIUC this goes
> back to the original contention about have /sys/bus/cxl at all:
> 
> https://lore.kernel.org/r/CAPcyv4iu8D-hJoujLXw8a4myS7trOE1FcUhESLB_imGMECVfrg@mail.gmail.com

That question was about whether we want the same physical device to be
represented both in the /sys/bus/pci hierarchy and the /sys/bus/cxl
hierarchy.  That still seems a little weird to me, but I don't know
enough about the CXL constraints to really object to it.

My question here is more about whether you're going to use something
like the pcie_port_service_register() model for supporting multiple
drivers attached to the same physical device.

The PCIe portdrv creates a /sys/bus/pci_express hierarchy parallel to
the /sys/bus/pci hierarchy.  The pci_express hierarchy has a "device"
for every service (hotplug, AER, DPC, PME, etc) (see
pcie_device_init()).  This device creation is quite complicated and
depends on whether the Port advertises a Capability, whether the
platform has granted control to the OS, whether support is compiled
in, etc.

I think that was a mistake because these hotplug, AER, DPC, PME
"devices" are not independent things.  They are optional features that
can be added to a variety of devices, and those devices might have
their own drivers.  For example, we want to write drivers for
vendor-specific functionality like PMUs in switches, but we can't do
that cleanly because portdrv claims them.

The portdrv features are fully specified by the PCIe spec, so nothing
is vendor-specific.  They share interrupts.  They share power state.
They cannot be reset independently.  They are not addressable entities
in the usual bus/device/function model.  They can't be removed or
added like the underlying device can.  I wasn't there when they were
designed, but from reading the spec, it seems like they were designed
as optional features of a device, not as separate devices themselves.

> Unlike pcieportdrv where the functionality is bounded to a single
> device instance with relatively simpler hierarchical coordination of
> the memory space and services. CXL interleaving is both a foreign
> concept to the PCIE core and an awkward fit for the pci_bus_type
> device model. CXL uses the cxl_bus_type and bus drivers to
> coordinate CXL operations that have cross PCI device implications.
> 
> Outside of that I think the auxiliary device driver model, of which
> the PCIE portdrv model is an open-coded form, is a useful construct
> for separation of concerns. That said, I do want to hear more about
> what trouble it has caused though to make sure that CXL does not
> trip over the same issues longer term.

Bjorn
Dan Williams Nov. 24, 2021, 12:40 a.m. UTC | #12
On Tue, Nov 23, 2021 at 3:56 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Christoph, since he has opinions about portdrv;
> Greg, Rafael, since they have good opinions about sysfs structure]
>
> On Tue, Nov 23, 2021 at 02:36:32PM -0800, Dan Williams wrote:
> > On Tue, Nov 23, 2021 at 2:03 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > [..]
> > > > I hope this driver is not modeled on the PCI portdrv.  IMO, that
> > > > was a design error, and the "port service drivers" (PME,
> > > > hotplug, AER, etc) should be directly integrated into the PCI
> > > > core instead of pretending to be independent drivers.
> > >
> > > I'd like to understand a bit about why you think it was a design
> > > error. The port driver is intended to be a port services driver,
> > > however I see the services provided as quite different than the
> > > ones you mention. The primary service cxl_port will provide from
> > > here on out is the ability to manage HDM decoder resources for a
> > > port. Other independent drivers that want to use HDM decoder
> > > resources would rely on the port driver for this.
> > >
> > > It could be in CXL core just the same, but I don't see too much of
> > > a benefit since the code would be almost identical. One nice
> > > aspect of having the port driver outside of CXL core is it would
> > > allow CXL devices which do not need port services (type-1 and
> > > probably type-2) to not need to load the port driver. We do not
> > > have examples of such devices today but there's good evidence they
> > > are being built. Whether or not they will even want CXL core is
> > > another topic up for debate however.
> > >
> > > I admit Dan and I did discuss putting this in either its own port
> > > driver, or within core as a port driver. My inclination is, less
> > > is more in CXL core; but perhaps you will be able to change my
> > > mind.
> >
> > No, I don't think Bjorn is talking about whether the driver is
> > integrated into cxl_core.ko vs its own cxl_port.ko. IIUC this goes
> > back to the original contention about have /sys/bus/cxl at all:
> >
> > https://lore.kernel.org/r/CAPcyv4iu8D-hJoujLXw8a4myS7trOE1FcUhESLB_imGMECVfrg@mail.gmail.com
>
> That question was about whether we want the same physical device to be
> represented both in the /sys/bus/pci hierarchy and the /sys/bus/cxl
> hierarchy.  That still seems a little weird to me, but I don't know
> enough about the CXL constraints to really object to it.
>
> My question here is more about whether you're going to use something
> like the pcie_port_service_register() model for supporting multiple
> drivers attached to the same physical device.
>
> The PCIe portdrv creates a /sys/bus/pci_express hierarchy parallel to
> the /sys/bus/pci hierarchy.  The pci_express hierarchy has a "device"
> for every service (hotplug, AER, DPC, PME, etc) (see
> pcie_device_init()).  This device creation is quite complicated and
> depends on whether the Port advertises a Capability, whether the
> platform has granted control to the OS, whether support is compiled
> in, etc.
>
> I think that was a mistake because these hotplug, AER, DPC, PME
> "devices" are not independent things.  They are optional features that
> can be added to a variety of devices, and those devices might have
> their own drivers.  For example, we want to write drivers for
> vendor-specific functionality like PMUs in switches, but we can't do
> that cleanly because portdrv claims them.
>
> The portdrv features are fully specified by the PCIe spec, so nothing
> is vendor-specific.  They share interrupts.  They share power state.
> They cannot be reset independently.  They are not addressable entities
> in the usual bus/device/function model.  They can't be removed or
> added like the underlying device can.  I wasn't there when they were
> designed, but from reading the spec, it seems like they were designed
> as optional features of a device, not as separate devices themselves.

Let me ask a clarifying question coming from the other direction that
resulted in the creation of the auxiliary bus architecture. Some
background. RDMA is a protocol that may run on top of Ethernet.
Consider the case where you have multiple generations of Ethernet
adapter devices, but they all support common RDMA functionality. You
only have the one PCI device to attach a unique Ethernet driver. What
is an idiomatic way to deploy a module that automatically loads and
attaches to the exported common functionality across adapters that
otherwise have a unique native driver for the hardware device?

Another example, the Native PCIe Enclosure Management (NPEM)
specification defines a handful of registers that can appear anywhere
in the PCIe hierarchy. How can you write a common driver that is
generically applicable to any given NPEM instance?

The auxiliary bus answer to those questions is to register an
auxiliary device to be driven by a common auxiliary driver across
hard-device generations from the same vendor or even across vendors.

For your example about a PCIe port PMU driver it ultimately requires
something to enumerate that capability and a library of code to
exercise that functionality. What is a more natural fit than a Linux
"device" and a Linux driver to coordinate attaching enabling code to a
standalone hardware capability?

PCIe portdrv may be awkward because there was never a real native
driver for the device to begin with and it all could have handled by
'pcie_portdriver' directly without registering more Linux devices, but
assigning self contained features to Linux devices otherwise seems a
useful idiom to me.

As for CXL, there is no analog of the PCIe portdrv pattern of just
having a device act as a multiplexer of features to other Linux
devices.
Christoph Hellwig Nov. 24, 2021, 6:33 a.m. UTC | #13
On Tue, Nov 23, 2021 at 04:40:06PM -0800, Dan Williams wrote:
> Let me ask a clarifying question coming from the other direction that
> resulted in the creation of the auxiliary bus architecture. Some
> background. RDMA is a protocol that may run on top of Ethernet.

No, RDMA is a concept.  Linux supports 2 and a half RDMA protocols
that run over ethernet (RoCE v1 and v2 and iWarp).

> Consider the case where you have multiple generations of Ethernet
> adapter devices, but they all support common RDMA functionality. You
> only have the one PCI device to attach a unique Ethernet driver. What
> is an idiomatic way to deploy a module that automatically loads and
> attaches to the exported common functionality across adapters that
> otherwise have a unique native driver for the hardware device?

The whole aux bus drama is mostly because the intel design for these
is really fucked up.  All the sane HCAs do not use this model.  All
this attchment crap really should not be there.

> Another example, the Native PCIe Enclosure Management (NPEM)
> specification defines a handful of registers that can appear anywhere
> in the PCIe hierarchy. How can you write a common driver that is
> generically applicable to any given NPEM instance?

Another totally messed up spec.  But then pretty much everything coming
from the PCIe SIG in terms of interface tends to be really, really
broken lately.
Dan Williams Nov. 24, 2021, 7:17 a.m. UTC | #14
On Tue, Nov 23, 2021 at 10:33 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Tue, Nov 23, 2021 at 04:40:06PM -0800, Dan Williams wrote:
> > Let me ask a clarifying question coming from the other direction that
> > resulted in the creation of the auxiliary bus architecture. Some
> > background. RDMA is a protocol that may run on top of Ethernet.
>
> No, RDMA is a concept.  Linux supports 2 and a half RDMA protocols
> that run over ethernet (RoCE v1 and v2 and iWarp).

Yes, I was being too coarse, point taken. However, I don't think that
changes the observation that multiple vendors are using aux bus to
share a feature driver across multiple base Ethernet drivers.

>
> > Consider the case where you have multiple generations of Ethernet
> > adapter devices, but they all support common RDMA functionality. You
> > only have the one PCI device to attach a unique Ethernet driver. What
> > is an idiomatic way to deploy a module that automatically loads and
> > attaches to the exported common functionality across adapters that
> > otherwise have a unique native driver for the hardware device?
>
> The whole aux bus drama is mostly because the intel design for these
> is really fucked up.  All the sane HCAs do not use this model.  All
> this attchment crap really should not be there.

I am missing the counter proposal in both Bjorn's and your distaste
for aux bus and PCIe portdrv?

> > Another example, the Native PCIe Enclosure Management (NPEM)
> > specification defines a handful of registers that can appear anywhere
> > in the PCIe hierarchy. How can you write a common driver that is
> > generically applicable to any given NPEM instance?
>
> Another totally messed up spec.  But then pretty much everything coming
> from the PCIe SIG in terms of interface tends to be really, really
> broken lately.

DVSEC and DOE is more of the same in terms of composing add-on
features into devices. Hardware vendors want to mix multiple hard-IPs
into a single device, aux bus is one response. Topology specific buses
like /sys/bus/cxl are another.

This CXL port driver is offering enumeration, link management, and
memory decode setup services to the rest of the topology. I see it as
similar to management protocol services offered by libsas.
Christoph Hellwig Nov. 24, 2021, 7:28 a.m. UTC | #15
On Tue, Nov 23, 2021 at 11:17:55PM -0800, Dan Williams wrote:
> I am missing the counter proposal in both Bjorn's and your distaste
> for aux bus and PCIe portdrv?

Given that I've only brought in in the last mail I have no idea what
the original proposal even is.
gregkh@linuxfoundation.org Nov. 24, 2021, 7:33 a.m. UTC | #16
On Wed, Nov 24, 2021 at 08:28:24AM +0100, Christoph Hellwig wrote:
> On Tue, Nov 23, 2021 at 11:17:55PM -0800, Dan Williams wrote:
> > I am missing the counter proposal in both Bjorn's and your distaste
> > for aux bus and PCIe portdrv?
> 
> Given that I've only brought in in the last mail I have no idea what
> the original proposal even is.

Neither do I :(
Dan Williams Nov. 24, 2021, 7:54 a.m. UTC | #17
On Tue, Nov 23, 2021 at 11:33 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, Nov 24, 2021 at 08:28:24AM +0100, Christoph Hellwig wrote:
> > On Tue, Nov 23, 2021 at 11:17:55PM -0800, Dan Williams wrote:
> > > I am missing the counter proposal in both Bjorn's and your distaste
> > > for aux bus and PCIe portdrv?
> >
> > Given that I've only brought in in the last mail I have no idea what
> > the original proposal even is.
>
> Neither do I :(

To be clear I am also trying to get to the root of Bjorn's concern.

The proposal in $SUBJECT is to build on / treat a CXL topology as a
Linux device topology on /sys/bus/cxl that references devices on
/sys/bus/platform (CXL ACPI topology root and Host Bridges) and
/sys/bus/pci (CXL Switches and Endpoints). This CXL port device
topology has already been shipping for a few kernel cycles. What is on
the table now is a driver for CXL port devices (a logical Linux
construct). The driver handles discovery of "component registers"
either by ACPI table or PCI DVSEC and offers services to proxision CXL
regions. CXL 'regions' are also proposed as Linux devices that
represent an active CXL memory range which can interleave multiple
endpoints across multiple switches and host bridges.
gregkh@linuxfoundation.org Nov. 24, 2021, 8:21 a.m. UTC | #18
On Tue, Nov 23, 2021 at 11:54:03PM -0800, Dan Williams wrote:
> On Tue, Nov 23, 2021 at 11:33 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Nov 24, 2021 at 08:28:24AM +0100, Christoph Hellwig wrote:
> > > On Tue, Nov 23, 2021 at 11:17:55PM -0800, Dan Williams wrote:
> > > > I am missing the counter proposal in both Bjorn's and your distaste
> > > > for aux bus and PCIe portdrv?
> > >
> > > Given that I've only brought in in the last mail I have no idea what
> > > the original proposal even is.
> >
> > Neither do I :(
> 
> To be clear I am also trying to get to the root of Bjorn's concern.
> 
> The proposal in $SUBJECT is to build on / treat a CXL topology as a
> Linux device topology on /sys/bus/cxl that references devices on
> /sys/bus/platform (CXL ACPI topology root and Host Bridges) and
> /sys/bus/pci (CXL Switches and Endpoints).

Wait, I am confused.

A bus lists devices that are on that bus.  It does not list devices that
are on other busses.

Now a device in a bus list can have a parent be on different types of
busses, as the parent device does not matter, but the device itself can
not be of different types.

So I do not understand what you are describing here at all.  Do you have
an example output of sysfs that shows this situation?

> This CXL port device topology has already been shipping for a few
> kernel cycles.

So it's always been broken?  :)

> What is on
> the table now is a driver for CXL port devices (a logical Linux
> construct). The driver handles discovery of "component registers"
> either by ACPI table or PCI DVSEC and offers services to proxision CXL
> regions.

So a normal bus controller device that creates new devices of a bus
type, right?  What is special about that?

> CXL 'regions' are also proposed as Linux devices that
> represent an active CXL memory range which can interleave multiple
> endpoints across multiple switches and host bridges.

As long as these 'devices' have drivers and do not mess with the
resources of any other device in the system, I do not understand the
problem here.

Or is the issue that you are again trying to carve up "real" devices
into tiny devices that then somehow need to be aware of the resources
being touched by different drivers at the same time?

I'm still confused, sorry.

greg k-h
Dan Williams Nov. 24, 2021, 6:24 p.m. UTC | #19
On Wed, Nov 24, 2021 at 12:22 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Nov 23, 2021 at 11:54:03PM -0800, Dan Williams wrote:
> > On Tue, Nov 23, 2021 at 11:33 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Wed, Nov 24, 2021 at 08:28:24AM +0100, Christoph Hellwig wrote:
> > > > On Tue, Nov 23, 2021 at 11:17:55PM -0800, Dan Williams wrote:
> > > > > I am missing the counter proposal in both Bjorn's and your distaste
> > > > > for aux bus and PCIe portdrv?
> > > >
> > > > Given that I've only brought in in the last mail I have no idea what
> > > > the original proposal even is.
> > >
> > > Neither do I :(
> >
> > To be clear I am also trying to get to the root of Bjorn's concern.
> >
> > The proposal in $SUBJECT is to build on / treat a CXL topology as a
> > Linux device topology on /sys/bus/cxl that references devices on
> > /sys/bus/platform (CXL ACPI topology root and Host Bridges) and
> > /sys/bus/pci (CXL Switches and Endpoints).
>
> Wait, I am confused.
>
> A bus lists devices that are on that bus.  It does not list devices that
> are on other busses.
>
> Now a device in a bus list can have a parent be on different types of
> busses, as the parent device does not matter, but the device itself can
> not be of different types.
>
> So I do not understand what you are describing here at all.  Do you have
> an example output of sysfs that shows this situation?

Commit 40ba17afdfab ("cxl/acpi: Introduce cxl_decoder objects")

...has an example of the devices registered on the CXL bus by the
cxl_acpi driver.

> > This CXL port device topology has already been shipping for a few
> > kernel cycles.
>
> So it's always been broken?  :)

Kidding aside, CXL has different moving pieces than PCI and the Linux
device-driver model is how we are organizing that complexity. CXL is
also symbiotically attached to PCI as it borrows the enumeration
mechanism while building up a parallel CXL.mem universe to live
alongside PCI.config and PCI.mmio. The CXL subsystem is similar to MD
/ DM where virtual devices are built up from other devices.

> > What is on
> > the table now is a driver for CXL port devices (a logical Linux
> > construct). The driver handles discovery of "component registers"
> > either by ACPI table or PCI DVSEC and offers services to proxision CXL
> > regions.
>
> So a normal bus controller device that creates new devices of a bus
> type, right?  What is special about that?

Yes, which is the root of my confusion about Bjorn's concern.

> > CXL 'regions' are also proposed as Linux devices that
> > represent an active CXL memory range which can interleave multiple
> > endpoints across multiple switches and host bridges.
>
> As long as these 'devices' have drivers and do not mess with the
> resources of any other device in the system, I do not understand the
> problem here.

Correct, these drivers manage CXL.mem resources and leave PCI.mmio
resource management to PCI.

> Or is the issue that you are again trying to carve up "real" devices
> into tiny devices that then somehow need to be aware of the resources
> being touched by different drivers at the same time?

No, there's no multiplexing of resources across devices / drivers that
requires cross-driver awareness.

> I'm still confused, sorry.

No worries, appreciate the attention to make sure this implementation
is idiomatic and avoids any architectural dragons.
Bjorn Helgaas Nov. 24, 2021, 9:31 p.m. UTC | #20
On Tue, Nov 23, 2021 at 02:03:15PM -0800, Ben Widawsky wrote:
> On 21-11-23 12:21:28, Bjorn Helgaas wrote:
> > On Fri, Nov 19, 2021 at 04:02:47PM -0800, Ben Widawsky wrote:

> > > 2. Hostbridge port. 
> > > ...
> > >    It has n downstream ports,
> > >    each of which are known as CXL 2.0 root ports.
> > 
> > This sounds like a "host bridge port *contains* these root ports."
> > That doesn't sound right.
> 
> What sounds better? A CXL 2.0 Root Port is CXL capabilities on top
> of the PCIe component which has the PCI_EXP_TYPE_ROOT_PORT cap. In
> my mental model, a host bridge does contain the root ports. Perhaps
> I am wrong about that?

"A host bridge contains the root ports" makes sense to me.
"A host bridge *port* contains root ports" does not.

The PCIe spec would describe this as a "Root Complex may support one
or more [Root] Ports" (see PCIe r5.0, sec 1.3.1).

In PCIe, a "Port" is "an interface between a component and a PCI
Express Link."  It doesn't contain other Ports.

Sounds like CXL is the same here, and using the same terminology
(assuming that's what the CXL spec does) will reduce confusion.

> > > 3. Switch port. A switch port is similar to a hostbridge port except
> > >    register access is defined in the CXL specification in a platform
> > >    agnostic way. The downstream ports for a switch are simply known as
> > >    downstream ports. The enumeration of these are entirely contained
> > >    within cxl_port.
> > 
> > In PCIe, "Downstream Port" includes both Root Ports and Switch
> > Downstream Ports.  It sounds like that's not the case for CXL?
> 
> In CXL 2.0, it's fairly close to true that switch downstream ports
> and root ports are equivalent. From today's driver perspective they
> are equivalent. Root ports have some capabilities which switch
> downstream ports do not.

Same as PCIe.

> > > 4. Endpoint port. Endpoint ports are similar to switch ports with the
> > >    exception that they have no downstream ports, only the underlying
> > >    media on the device. The enumeration of these are started by cxl_pci,
> > >    and completed by cxl_port.
> > 
> > Does CXL use an "Upstream Port" concept similar to PCIe?  In PCIe,
> > "Upstream Port" includes both Switch Upstream Ports and the Upstream
> > Port on an Endpoint.
> 
> Not really. Endpoints aren't known as ports in the spec and they
> have a decent amount of divergence from upstream ports. The main
> area where they do converge is in the memory decoding capabilities.
> In fact, it might come to a point where we find adding an endpoint
> as a port does not make sense, but for now it does.

Since a PCIe "Port" refers to the interface between a component and a
Link, PCIe Endpoints have Upstream Ports just like switches do.  I'm
guessing CXL is the same.

The part about "endpoint ports have no downstream ports" is what
doesn't read well to me because ports don't have other ports.

This section is about the four types of ports in a system.  I'm
trying to match those up with spec terms, either PCIe or CXL.  It
sounds like you intend bullet 3 to include both Switch Upstream Ports
and Switch Downstream Ports, and bullet 4 to be only Endpoint Ports
(which are Upstream Ports).

> > I hope this driver is not modeled on the PCI portdrv.  IMO, that
> > was a design error, and the "port service drivers" (PME, hotplug,
> > AER, etc) should be directly integrated into the PCI core instead
> > of pretending to be independent drivers.
> 
> I'd like to understand a bit about why you think it was a design
> error. The port driver is intended to be a port services driver,
> however I see the services provided as quite different than the ones
> you mention. The primary service cxl_port will provide from here on
> out is the ability to manage HDM decoder resources for a port. Other
> independent drivers that want to use HDM decoder resources would
> rely on the port driver for this.

I'll continue this part in the later part of the thread.

> > > + *		* - Switch
> > > + *		  - Switch Upstream Port
> > > + *		  - Switch Downstream Port
> > > + *		* - Endpoint
> > > + *		  - Endpoint Port
> > > + *		  - N/A
> > 
> > What does "N/A" mean here?  Is it telling us something useful?
> 
> This gets rendered into a table that looks like the following:
> 
> | component  | upstream             | downstream             |
> | ---------  | --------             | ----------             |
> | Hostbridge | ACPI0016             | Root Port              |
> | Switch     | Switch Upstream Port | Switch Downstream Port |
> | Endpoint   | Endpoint Port        | N/A                    |

Makes sense, thanks.  I didn't know how to read the ReST and thought
this was just a list, where N/A wouldn't make much sense.

Bjorn
Bjorn Helgaas Dec. 2, 2021, 9:24 p.m. UTC | #21
On Tue, Nov 23, 2021 at 11:17:55PM -0800, Dan Williams wrote:
> On Tue, Nov 23, 2021 at 10:33 PM Christoph Hellwig <hch@lst.de> wrote:
> > On Tue, Nov 23, 2021 at 04:40:06PM -0800, Dan Williams wrote:
> > > Let me ask a clarifying question coming from the other direction that
> > > resulted in the creation of the auxiliary bus architecture. Some
> > > background. RDMA is a protocol that may run on top of Ethernet.
> >
> > No, RDMA is a concept.  Linux supports 2 and a half RDMA protocols
> > that run over ethernet (RoCE v1 and v2 and iWarp).
> 
> Yes, I was being too coarse, point taken. However, I don't think that
> changes the observation that multiple vendors are using aux bus to
> share a feature driver across multiple base Ethernet drivers.
> 
> > > Consider the case where you have multiple generations of Ethernet
> > > adapter devices, but they all support common RDMA functionality. You
> > > only have the one PCI device to attach a unique Ethernet driver. What
> > > is an idiomatic way to deploy a module that automatically loads and
> > > attaches to the exported common functionality across adapters that
> > > otherwise have a unique native driver for the hardware device?
> >
> > The whole aux bus drama is mostly because the intel design for these
> > is really fucked up.  All the sane HCAs do not use this model.  All
> > this attchment crap really should not be there.
> 
> I am missing the counter proposal in both Bjorn's and your distaste
> for aux bus and PCIe portdrv?

For the case of PCIe portdrv, the functionality involved is Power
Management Events (PME), Advanced Error Reporting (AER), PCIe native
hotplug, Downstream Port Containment (DPC), and Bandwidth
Notifications.

Currently each has a separate "port service driver" with .probe(),
.remove(), .suspend(), .resume(), etc.

The services share interrupt vectors.  It's quite complicated to set
them up, and it has to be done in the portdrv, not in the individual
drivers.

They also share power state (D0, D3hot, etc).  

In my mind these are not separate devices from the underlying PCI
device, and I don't think splitting the support into "service drivers"
made things better.  I think it would be simpler if these were just
added to pci_init_capabilities() like other optional pieces of PCI
functionality.

Sysfs looks like this:

  /sys/devices/pci0000:00/0000:00:1c.0/                       # Root Port
  /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/  # AER "device"
  /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie010/  # BW notif

  /sys/bus/pci/devices/0000:00:1c.0 -> ../../../devices/pci0000:00/0000:00:1c.0/
  /sys/bus/pci_express/devices/0000:00:1c.0:pcie002 -> ../../../devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/

The "pcie002" names (hex for PCIE_PORT_SERVICE_AER, etc.) are
unintelligible.  I don't know why we have a separate
/sys/bus/pci_express hierarchy.

IIUC, CXL devices will be enumerated by the usual PCI enumeration, so
there will be a struct pci_dev for them, and they will appear under
/sys/devices/pci*/.

They will have the usual PCI Power Management, MSI, AER, DPC, and
similar Capabilites, so the PCI core will manage them.

CXL devices have lots of fancy additional features.  Does that merit
making a separate struct device and a separate sysfs hierarchy for
them?  I don't know.

> > > Another example, the Native PCIe Enclosure Management (NPEM)
> > > specification defines a handful of registers that can appear anywhere
> > > in the PCIe hierarchy. How can you write a common driver that is
> > > generically applicable to any given NPEM instance?
> >
> > Another totally messed up spec.  But then pretty much everything coming
> > from the PCIe SIG in terms of interface tends to be really, really
> > broken lately.

Hotplug is more central to PCI than NPEM is, but NPEM is a little bit
like PCIe native hotplug in concept: hotplug has a few registers that
control downstream indicators, interlock, and power controller; NPEM
has registers that control downstream indicators.

Both are prescribed by the PCIe spec and presumably designed to work
alongside the usual device-specific drivers for bridges, SSDs, etc.

I would at least explore the idea of doing common support by
integrating NPEM into the PCI core.  There would have to be some hook
for the enclosure-specific bits, but I think it's fair for the details
of sending commands and polling for command completed to be part of
the PCI core.

> DVSEC and DOE is more of the same in terms of composing add-on
> features into devices. Hardware vendors want to mix multiple hard-IPs
> into a single device, aux bus is one response. Topology specific buses
> like /sys/bus/cxl are another.

VSEC and DVSEC are pretty much wild cards since the PCIe spec says
nothing about what registers they may contain or how they should work.

DOE *is* specified by PCIe, at least in terms of the data transfer
protocol (interrupt usage, read/write mailbox, etc).  I think that,
and the fact that it's not specific to CXL, means we need some kind of
PCI core interface to do the transfers.

> This CXL port driver is offering enumeration, link management, and
> memory decode setup services to the rest of the topology. I see it as
> similar to management protocol services offered by libsas.

Bjorn
Dan Williams Dec. 3, 2021, 1:38 a.m. UTC | #22
[ add Stuart for NPEM feedback ]

On Thu, Dec 2, 2021 at 1:24 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Tue, Nov 23, 2021 at 11:17:55PM -0800, Dan Williams wrote:
> > On Tue, Nov 23, 2021 at 10:33 PM Christoph Hellwig <hch@lst.de> wrote:
> > > On Tue, Nov 23, 2021 at 04:40:06PM -0800, Dan Williams wrote:
> > > > Let me ask a clarifying question coming from the other direction that
> > > > resulted in the creation of the auxiliary bus architecture. Some
> > > > background. RDMA is a protocol that may run on top of Ethernet.
> > >
> > > No, RDMA is a concept.  Linux supports 2 and a half RDMA protocols
> > > that run over ethernet (RoCE v1 and v2 and iWarp).
> >
> > Yes, I was being too coarse, point taken. However, I don't think that
> > changes the observation that multiple vendors are using aux bus to
> > share a feature driver across multiple base Ethernet drivers.
> >
> > > > Consider the case where you have multiple generations of Ethernet
> > > > adapter devices, but they all support common RDMA functionality. You
> > > > only have the one PCI device to attach a unique Ethernet driver. What
> > > > is an idiomatic way to deploy a module that automatically loads and
> > > > attaches to the exported common functionality across adapters that
> > > > otherwise have a unique native driver for the hardware device?
> > >
> > > The whole aux bus drama is mostly because the intel design for these
> > > is really fucked up.  All the sane HCAs do not use this model.  All
> > > this attchment crap really should not be there.
> >
> > I am missing the counter proposal in both Bjorn's and your distaste
> > for aux bus and PCIe portdrv?
>
> For the case of PCIe portdrv, the functionality involved is Power
> Management Events (PME), Advanced Error Reporting (AER), PCIe native
> hotplug, Downstream Port Containment (DPC), and Bandwidth
> Notifications.
>
> Currently each has a separate "port service driver" with .probe(),
> .remove(), .suspend(), .resume(), etc.
>
> The services share interrupt vectors.  It's quite complicated to set
> them up, and it has to be done in the portdrv, not in the individual
> drivers.
>
> They also share power state (D0, D3hot, etc).
>
> In my mind these are not separate devices from the underlying PCI
> device, and I don't think splitting the support into "service drivers"
> made things better.  I think it would be simpler if these were just
> added to pci_init_capabilities() like other optional pieces of PCI
> functionality.
>
> Sysfs looks like this:
>
>   /sys/devices/pci0000:00/0000:00:1c.0/                       # Root Port
>   /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/  # AER "device"
>   /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie010/  # BW notif
>
>   /sys/bus/pci/devices/0000:00:1c.0 -> ../../../devices/pci0000:00/0000:00:1c.0/
>   /sys/bus/pci_express/devices/0000:00:1c.0:pcie002 -> ../../../devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/
>
> The "pcie002" names (hex for PCIE_PORT_SERVICE_AER, etc.) are
> unintelligible.  I don't know why we have a separate
> /sys/bus/pci_express hierarchy.
>
> IIUC, CXL devices will be enumerated by the usual PCI enumeration, so
> there will be a struct pci_dev for them, and they will appear under
> /sys/devices/pci*/.
>
> They will have the usual PCI Power Management, MSI, AER, DPC, and
> similar Capabilites, so the PCI core will manage them.
>
> CXL devices have lots of fancy additional features.  Does that merit
> making a separate struct device and a separate sysfs hierarchy for
> them?  I don't know.

Thank you, this framing really helps.

So, if I look at this through the lens of the "can this just be
handled by pci_init_capabilities()?" guardband, where the capability
is device-scoped and shares resources that a driver for the same
device would use, then yes, PCIe port services do not merit a separate
bus.

CXL memory expansion is distinctly not in that category. CXL is a
distinct specification (i.e. unlike PCIe port services which are
literally in the PCI Sig purview), distinct transport/data layer
(effectively a different physical connection on the wire), and is
composable in ways that PCIe is not.

For example, if you trigger FLR on a PCI device you would expect the
entirety of pci_init_capabilities() needs to be saved / restored, CXL
state is not affected by FLR.

The separate link layer for CXL means that mere device visibility is
not sufficient to determine CXL operation. Whereas with a PCI driver
if you can see the device you know that the device is ready to be the
target of config, io, and mmio cycles, CXL.mem operation may not be
available when the device is visible to enumeration.

The composability of CXL places the highest demand for an independent
'struct bus_type' and registering CXL devices for their corresponding
PCIe devices. The bus is a rendezvous for all the moving pieces needed
to validate and provision a CXL memory region that may span multiple
endpoints, switches and host bridges. An action like reprogramming
memory decode of an endpoint needs to be coordinated across the entire
topology.

The fact that the PCI core remains blissfully unaware of all these
interdependencies is a feature because CXL is effectively a super-set
of PCIe for fast-path CXL.mem operation, even though it is derivative
for enumeration and slow-path manageability.

So I hope you see CXL's need to create some PCIe-symbiotic objects in
order to compose CXL objects (like interleaved memory regions) that
have no PCIe analog.

> > > > Another example, the Native PCIe Enclosure Management (NPEM)
> > > > specification defines a handful of registers that can appear anywhere
> > > > in the PCIe hierarchy. How can you write a common driver that is
> > > > generically applicable to any given NPEM instance?
> > >
> > > Another totally messed up spec.  But then pretty much everything coming
> > > from the PCIe SIG in terms of interface tends to be really, really
> > > broken lately.
>
> Hotplug is more central to PCI than NPEM is, but NPEM is a little bit
> like PCIe native hotplug in concept: hotplug has a few registers that
> control downstream indicators, interlock, and power controller; NPEM
> has registers that control downstream indicators.
>
> Both are prescribed by the PCIe spec and presumably designed to work
> alongside the usual device-specific drivers for bridges, SSDs, etc.
>
> I would at least explore the idea of doing common support by
> integrating NPEM into the PCI core.  There would have to be some hook
> for the enclosure-specific bits, but I think it's fair for the details
> of sending commands and polling for command completed to be part of
> the PCI core.

The problem with NPEM compared to hotplug LED signaling is that it has
the strange property that an NPEM higher in the topology might
override one lower in the topology. This is to support things like
NVME enclosures where the NVME device itself may have an NPEM
instance, but that instance is of course not suitable for signaling
that the device is not present. So, instead the system may be designed
to have an NPEM in the upstream switch port and disable the NPEM
instances in the device. Platform firmware decides which NPEM is in
use.

It also has the "fun" property of additionally being overridden by ACPI.

Stuart, have a look at collapsing the auxiliary-device approach into
pci_init_capabilities() and whether that can still coordinate with the
enclosure code.

One of the nice properties of the auxiliary organization is well
defined module boundaries. Will NPEM in the PCI core now force
CONFIG_ENCLOSURE_SERVICES to be built-in? That seems an unwanted side
effect to me.

> > DVSEC and DOE is more of the same in terms of composing add-on
> > features into devices. Hardware vendors want to mix multiple hard-IPs
> > into a single device, aux bus is one response. Topology specific buses
> > like /sys/bus/cxl are another.
>
> VSEC and DVSEC are pretty much wild cards since the PCIe spec says
> nothing about what registers they may contain or how they should work.
>
> DOE *is* specified by PCIe, at least in terms of the data transfer
> protocol (interrupt usage, read/write mailbox, etc).  I think that,
> and the fact that it's not specific to CXL, means we need some kind of
> PCI core interface to do the transfers.

DOE transfer code belongs in drivers/pci/ period, but does it really
need to be in drivers/pci/built-in.a for everyone regardless of
whether it is being used or not?
Bjorn Helgaas Dec. 3, 2021, 10:03 p.m. UTC | #23
On Thu, Dec 02, 2021 at 05:38:17PM -0800, Dan Williams wrote:
> [ add Stuart for NPEM feedback ]
> 
> On Thu, Dec 2, 2021 at 1:24 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, Nov 23, 2021 at 11:17:55PM -0800, Dan Williams wrote:
> > > On Tue, Nov 23, 2021 at 10:33 PM Christoph Hellwig <hch@lst.de> wrote:
> > > > On Tue, Nov 23, 2021 at 04:40:06PM -0800, Dan Williams wrote:
> > > > > Let me ask a clarifying question coming from the other direction that
> > > > > resulted in the creation of the auxiliary bus architecture. Some
> > > > > background. RDMA is a protocol that may run on top of Ethernet.
> > > >
> > > > No, RDMA is a concept.  Linux supports 2 and a half RDMA protocols
> > > > that run over ethernet (RoCE v1 and v2 and iWarp).
> > >
> > > Yes, I was being too coarse, point taken. However, I don't think that
> > > changes the observation that multiple vendors are using aux bus to
> > > share a feature driver across multiple base Ethernet drivers.
> > >
> > > > > Consider the case where you have multiple generations of Ethernet
> > > > > adapter devices, but they all support common RDMA functionality. You
> > > > > only have the one PCI device to attach a unique Ethernet driver. What
> > > > > is an idiomatic way to deploy a module that automatically loads and
> > > > > attaches to the exported common functionality across adapters that
> > > > > otherwise have a unique native driver for the hardware device?
> > > >
> > > > The whole aux bus drama is mostly because the intel design for these
> > > > is really fucked up.  All the sane HCAs do not use this model.  All
> > > > this attchment crap really should not be there.
> > >
> > > I am missing the counter proposal in both Bjorn's and your distaste
> > > for aux bus and PCIe portdrv?
> >
> > For the case of PCIe portdrv, the functionality involved is Power
> > Management Events (PME), Advanced Error Reporting (AER), PCIe native
> > hotplug, Downstream Port Containment (DPC), and Bandwidth
> > Notifications.
> >
> > Currently each has a separate "port service driver" with .probe(),
> > .remove(), .suspend(), .resume(), etc.
> >
> > The services share interrupt vectors.  It's quite complicated to set
> > them up, and it has to be done in the portdrv, not in the individual
> > drivers.
> >
> > They also share power state (D0, D3hot, etc).
> >
> > In my mind these are not separate devices from the underlying PCI
> > device, and I don't think splitting the support into "service drivers"
> > made things better.  I think it would be simpler if these were just
> > added to pci_init_capabilities() like other optional pieces of PCI
> > functionality.
> >
> > Sysfs looks like this:
> >
> >   /sys/devices/pci0000:00/0000:00:1c.0/                       # Root Port
> >   /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/  # AER "device"
> >   /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie010/  # BW notif
> >
> >   /sys/bus/pci/devices/0000:00:1c.0 -> ../../../devices/pci0000:00/0000:00:1c.0/
> >   /sys/bus/pci_express/devices/0000:00:1c.0:pcie002 -> ../../../devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/
> >
> > The "pcie002" names (hex for PCIE_PORT_SERVICE_AER, etc.) are
> > unintelligible.  I don't know why we have a separate
> > /sys/bus/pci_express hierarchy.
> >
> > IIUC, CXL devices will be enumerated by the usual PCI enumeration, so
> > there will be a struct pci_dev for them, and they will appear under
> > /sys/devices/pci*/.
> >
> > They will have the usual PCI Power Management, MSI, AER, DPC, and
> > similar Capabilites, so the PCI core will manage them.
> >
> > CXL devices have lots of fancy additional features.  Does that merit
> > making a separate struct device and a separate sysfs hierarchy for
> > them?  I don't know.
> 
> Thank you, this framing really helps.
> 
> So, if I look at this through the lens of the "can this just be
> handled by pci_init_capabilities()?" guardband, where the capability
> is device-scoped and shares resources that a driver for the same
> device would use, then yes, PCIe port services do not merit a separate
> bus.
> 
> CXL memory expansion is distinctly not in that category. CXL is a
> distinct specification (i.e. unlike PCIe port services which are
> literally in the PCI Sig purview), distinct transport/data layer
> (effectively a different physical connection on the wire), and is
> composable in ways that PCIe is not.
> 
> For example, if you trigger FLR on a PCI device you would expect the
> entirety of pci_init_capabilities() needs to be saved / restored, CXL
> state is not affected by FLR.
> 
> The separate link layer for CXL means that mere device visibility is
> not sufficient to determine CXL operation. Whereas with a PCI driver
> if you can see the device you know that the device is ready to be the
> target of config, io, and mmio cycles, 

Not strictly true.  A PCI device must respond to config transactions
within a short time after reset, but it does not respond to IO or MEM
transactions until a driver sets PCI_COMMAND_IO or PCI_COMMAND_MEMORY.

> ... CXL.mem operation may not be available when the device is
> visible to enumeration.

> The composability of CXL places the highest demand for an independent
> 'struct bus_type' and registering CXL devices for their corresponding
> PCIe devices. The bus is a rendezvous for all the moving pieces needed
> to validate and provision a CXL memory region that may span multiple
> endpoints, switches and host bridges. An action like reprogramming
> memory decode of an endpoint needs to be coordinated across the entire
> topology.

I guess software uses the CXL DVSEC to distinguish a CXL device from a
PCIe device, at least for 00.0.  Not sure about non-Dev 0 Func 0
devices; maybe this implies other functions in the same device are
part of the same CXL device?

A CXL device may comprise several PCIe devices, and I think they may
have non-CXL Capabilities, so I assume you need a struct pci_dev for
each PCIe device?

Looks like a single CXL DVSEC controls the entire CXL device
(including several PCIe devices), so I assume you want some kind of
struct cxl_dev to represent that aggregation?

> The fact that the PCI core remains blissfully unaware of all these
> interdependencies is a feature because CXL is effectively a super-set
> of PCIe for fast-path CXL.mem operation, even though it is derivative
> for enumeration and slow-path manageability.

I don't know how blissfully unaware PCI can be.  Can a user remove a
PCIe device that's part of a CXL device via sysfs?  Is the PCIe device
available for drivers to claim?  Do we need coordination between
cxl_driver_register() and pci_register_driver()?  Does CXL impose new
constraints on PCI power management?

> So I hope you see CXL's need to create some PCIe-symbiotic objects in
> order to compose CXL objects (like interleaved memory regions) that
> have no PCIe analog.

> > Hotplug is more central to PCI than NPEM is, but NPEM is a little bit
> > like PCIe native hotplug in concept: hotplug has a few registers that
> > control downstream indicators, interlock, and power controller; NPEM
> > has registers that control downstream indicators.
> >
> > Both are prescribed by the PCIe spec and presumably designed to work
> > alongside the usual device-specific drivers for bridges, SSDs, etc.
> >
> > I would at least explore the idea of doing common support by
> > integrating NPEM into the PCI core.  There would have to be some hook
> > for the enclosure-specific bits, but I think it's fair for the details
> > of sending commands and polling for command completed to be part of
> > the PCI core.
> 
> The problem with NPEM compared to hotplug LED signaling is that it has
> the strange property that an NPEM higher in the topology might
> override one lower in the topology. This is to support things like
> NVME enclosures where the NVME device itself may have an NPEM
> instance, but that instance is of course not suitable for signaling
> that the device is not present.

I would envision the PCI core providing only a bare-bones "send this
command and wait for it to complete" sort of interface.  Everything
else, including deciding which device to use, would go elsewhere.

> So, instead the system may be designed to have an NPEM in the
> upstream switch port and disable the NPEM instances in the device.
> Platform firmware decides which NPEM is in use.

Since you didn't mention a firmware interface to communicate which
NPEM is in use, I assume firmware does this by preventing other
devices from advertising NPEM support?

> It also has the "fun" property of additionally being overridden by ACPI.
> ...
> One of the nice properties of the auxiliary organization is well
> defined module boundaries. Will NPEM in the PCI core now force
> CONFIG_ENCLOSURE_SERVICES to be built-in? That seems an unwanted side
> effect to me.

If the PCI core provides only the mechanism, and the smarts of NPEM
are in something analogous to drivers/scsi/ses.c, I don't think it
would force CONFIG_ENCLOSURE_SERVICES to be built-in.

> > DOE *is* specified by PCIe, at least in terms of the data transfer
> > protocol (interrupt usage, read/write mailbox, etc).  I think that,
> > and the fact that it's not specific to CXL, means we need some kind of
> > PCI core interface to do the transfers.
> 
> DOE transfer code belongs in drivers/pci/ period, but does it really
> need to be in drivers/pci/built-in.a for everyone regardless of
> whether it is being used or not?

I think my opinion would depend on how small the DOE transfer code
could be made and how much it would complicate things to make it a
module.  And also how we could enforce whatever mutual exclusion we
need to make it safe.

Bjorn
Dan Williams Dec. 4, 2021, 1:24 a.m. UTC | #24
On Fri, Dec 3, 2021 at 2:03 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Dec 02, 2021 at 05:38:17PM -0800, Dan Williams wrote:
> > [ add Stuart for NPEM feedback ]
> >
> > On Thu, Dec 2, 2021 at 1:24 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Tue, Nov 23, 2021 at 11:17:55PM -0800, Dan Williams wrote:
> > > > On Tue, Nov 23, 2021 at 10:33 PM Christoph Hellwig <hch@lst.de> wrote:
> > > > > On Tue, Nov 23, 2021 at 04:40:06PM -0800, Dan Williams wrote:
> > > > > > Let me ask a clarifying question coming from the other direction that
> > > > > > resulted in the creation of the auxiliary bus architecture. Some
> > > > > > background. RDMA is a protocol that may run on top of Ethernet.
> > > > >
> > > > > No, RDMA is a concept.  Linux supports 2 and a half RDMA protocols
> > > > > that run over ethernet (RoCE v1 and v2 and iWarp).
> > > >
> > > > Yes, I was being too coarse, point taken. However, I don't think that
> > > > changes the observation that multiple vendors are using aux bus to
> > > > share a feature driver across multiple base Ethernet drivers.
> > > >
> > > > > > Consider the case where you have multiple generations of Ethernet
> > > > > > adapter devices, but they all support common RDMA functionality. You
> > > > > > only have the one PCI device to attach a unique Ethernet driver. What
> > > > > > is an idiomatic way to deploy a module that automatically loads and
> > > > > > attaches to the exported common functionality across adapters that
> > > > > > otherwise have a unique native driver for the hardware device?
> > > > >
> > > > > The whole aux bus drama is mostly because the intel design for these
> > > > > is really fucked up.  All the sane HCAs do not use this model.  All
> > > > > this attchment crap really should not be there.
> > > >
> > > > I am missing the counter proposal in both Bjorn's and your distaste
> > > > for aux bus and PCIe portdrv?
> > >
> > > For the case of PCIe portdrv, the functionality involved is Power
> > > Management Events (PME), Advanced Error Reporting (AER), PCIe native
> > > hotplug, Downstream Port Containment (DPC), and Bandwidth
> > > Notifications.
> > >
> > > Currently each has a separate "port service driver" with .probe(),
> > > .remove(), .suspend(), .resume(), etc.
> > >
> > > The services share interrupt vectors.  It's quite complicated to set
> > > them up, and it has to be done in the portdrv, not in the individual
> > > drivers.
> > >
> > > They also share power state (D0, D3hot, etc).
> > >
> > > In my mind these are not separate devices from the underlying PCI
> > > device, and I don't think splitting the support into "service drivers"
> > > made things better.  I think it would be simpler if these were just
> > > added to pci_init_capabilities() like other optional pieces of PCI
> > > functionality.
> > >
> > > Sysfs looks like this:
> > >
> > >   /sys/devices/pci0000:00/0000:00:1c.0/                       # Root Port
> > >   /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/  # AER "device"
> > >   /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie010/  # BW notif
> > >
> > >   /sys/bus/pci/devices/0000:00:1c.0 -> ../../../devices/pci0000:00/0000:00:1c.0/
> > >   /sys/bus/pci_express/devices/0000:00:1c.0:pcie002 -> ../../../devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/
> > >
> > > The "pcie002" names (hex for PCIE_PORT_SERVICE_AER, etc.) are
> > > unintelligible.  I don't know why we have a separate
> > > /sys/bus/pci_express hierarchy.
> > >
> > > IIUC, CXL devices will be enumerated by the usual PCI enumeration, so
> > > there will be a struct pci_dev for them, and they will appear under
> > > /sys/devices/pci*/.
> > >
> > > They will have the usual PCI Power Management, MSI, AER, DPC, and
> > > similar Capabilites, so the PCI core will manage them.
> > >
> > > CXL devices have lots of fancy additional features.  Does that merit
> > > making a separate struct device and a separate sysfs hierarchy for
> > > them?  I don't know.
> >
> > Thank you, this framing really helps.
> >
> > So, if I look at this through the lens of the "can this just be
> > handled by pci_init_capabilities()?" guardband, where the capability
> > is device-scoped and shares resources that a driver for the same
> > device would use, then yes, PCIe port services do not merit a separate
> > bus.
> >
> > CXL memory expansion is distinctly not in that category. CXL is a
> > distinct specification (i.e. unlike PCIe port services which are
> > literally in the PCI Sig purview), distinct transport/data layer
> > (effectively a different physical connection on the wire), and is
> > composable in ways that PCIe is not.
> >
> > For example, if you trigger FLR on a PCI device you would expect the
> > entirety of pci_init_capabilities() needs to be saved / restored, CXL
> > state is not affected by FLR.
> >
> > The separate link layer for CXL means that mere device visibility is
> > not sufficient to determine CXL operation. Whereas with a PCI driver
> > if you can see the device you know that the device is ready to be the
> > target of config, io, and mmio cycles,
>
> Not strictly true.  A PCI device must respond to config transactions
> within a short time after reset, but it does not respond to IO or MEM
> transactions until a driver sets PCI_COMMAND_IO or PCI_COMMAND_MEMORY.

Right, what I was attempting to convey is that it's not like CXL.mem.
While flipping a bit on the device turns on PCI.mmio target support,
there's additional polling and status checking to be done after the
device is enabled to be a target for CXL.mem. I.e. the CXL.mem
configuration is done via PCI.mmio* (*for CXL 2.0 devices) and only
after the device negotiates a CXL link beyond the base PCIe link. It
is also the case that the device may not be ready for CXL.mem until
all the devices that compose the same range are available as well.

>
> > ... CXL.mem operation may not be available when the device is
> > visible to enumeration.
>
> > The composability of CXL places the highest demand for an independent
> > 'struct bus_type' and registering CXL devices for their corresponding
> > PCIe devices. The bus is a rendezvous for all the moving pieces needed
> > to validate and provision a CXL memory region that may span multiple
> > endpoints, switches and host bridges. An action like reprogramming
> > memory decode of an endpoint needs to be coordinated across the entire
> > topology.
>
> I guess software uses the CXL DVSEC to distinguish a CXL device from a
> PCIe device, at least for 00.0.

driver/cxl/pci.c attaches to: "PCI_DEVICE_CLASS((PCI_CLASS_MEMORY_CXL
<< 8 | CXL_MEMORY_PROGIF), ~0)"

I am not aware of any restriction for that class code to appear at function0?

> Not sure about non-Dev 0 Func 0
> devices; maybe this implies other functions in the same device are
> part of the same CXL device?

DVSEC is really only telling us the layout of the MMIO register space.
A CXL Memory Device (CXL.mem endpoint) implements so-called "device"
registers and "component" registers. The "device" registers control
things that a traditional PCI driver would control like a mailbox
interface and device local status registers. The "component" registers
are what mediate access to the CXL.mem decode space.

It is somewhat confusing because CXL 1.1 devices used the DVSEC
configuration registers directly for CXL.mem configuration, but CXL
2.0 ditched that organization. The Linux driver is only targeting CXL
2.0+ devices as platform firmware owns setting up CXL 1.1 memory
expanders. As far as Linux is concerned CXL 1.1 looks like DDR i.e.
it's just address space set up by the BIOS and populated into EFI and
ACPI tables. CXL 1.1 is also implementing a 1:1 device-to-memory-range
association. CXL 2.0 allows for N:1 devices-to-memory-range and
dynamic configuration of that by the OS. CXL 1.1 platform firmware
locks out the OS from making configuration changes.

> A CXL device may comprise several PCIe devices, and I think they may
> have non-CXL Capabilities, so I assume you need a struct pci_dev for
> each PCIe device?
>
> Looks like a single CXL DVSEC controls the entire CXL device
> (including several PCIe devices), so I assume you want some kind of
> struct cxl_dev to represent that aggregation?

We have 3 classes of a 'cxl_dev' in drivers/cxl:

"mem" devices (PCIe endpoints)
"port" devices (PCIe Upstream Switch Ports, PCIe Root Ports, ACPI0016
Host Bridge devices, and ACPI0017 CXL Root devices)
"region" devices (aggregate devices composed of multiple mem devices).

The "mem" driver is tasked with enumerating all "ports" in the path
between itself and the ACPI0017 root and validating that a CXL link is
up at each hop before enumerating "region" devices.

>
> > The fact that the PCI core remains blissfully unaware of all these
> > interdependencies is a feature because CXL is effectively a super-set
> > of PCIe for fast-path CXL.mem operation, even though it is derivative
> > for enumeration and slow-path manageability.
>
> I don't know how blissfully unaware PCI can be.  Can a user remove a
> PCIe device that's part of a CXL device via sysfs?

Yes. If that PCIe device is an endpoint then the corresponding "mem"
driver will get a ->remove() event because it is registered as a child
of that parent PCIe device. That in turn will tear down the relevant
part of the CXL port topology.

However, a gap (deliberate design choice?) in the PCI hotplug
implementation I currently see is an ability for an endpoint PCI
driver to dynamically deny hotplug requests based on the state of the
device. pci_ignore_hotplug() seems inadequate for the job of making
sure that someone first disables all participation of a "mem" device
in all regions before asking for its physical removal. However, if
someone force removes a device the driver and CXL subsystem will do
the right thing and go fail that memory region for all users. I'm
working with other DAX developers on a range based memory_failure()
api for this case.

> Is the PCIe device available for drivers to claim?

drivers/cxl/pci.c *is* a native PCI driver for CXL memory expanders.
You might be thinking of CXL accelerator devices, where the CXL.cache
and CXL.mem capabilities are incremental capabilities for the
accelerator. So, for example, a GPU with CXL.mem capabilities, would
be mostly ignored by the drivers/cxl/ stack by default. Only if that
device+driver wanted to emulate a generic memory expander and share
its memory space with the host might it instantiate mem, port, and
region objects. Otherwise CXL for accelerators is mostly a transparent
capability that the OS may not need to manage. Rather than copy data
back and forth between the host a CXL enabled GPU's driver can just
assign pointers to its local memory space and trust that it is
coherent. That's a gross oversimplification, but I want to convey that
there are CXL devices like accelerators that are the responsibility of
each accelerator PCI driver, vs CXL memory expander devices which are
generic providers of "System RAM" and "Persistent Memory" resources
and need the "mem", "port", "region" schema.

> Do we need coordination between cxl_driver_register() and pci_register_driver()?

They do not collide, and I think this goes back to the concern about
whether drivers/cxl/ is scanning for all CXL DVSECs. No, it only cares
about the CXL DVSEC in CXL memory expander endpoints with the
aforementioned class code, and the CXL DVSEC in every upstream switch
port in the ancestry up to the CXL root device (ACPI0017). CXL
accelerator drivers will always use pci_register_driver() and can
decide to register their DVSEC with the CXL core, or keep it private
to themselves. I imagine a GPU accelerator might register a "mem"
device if it needs to get CXL.mem decode set up after a hotplug event,
but if it's only using CXL.cache, or if its memory space was
established by platform firmware then drivers/cxl/ is not involved.

> Does CXL impose new constraints on PCI power management?

Recall that CXL is built such that it could be supported by a legacy
operating system where platform firmware owns the setup of devices,
just like DDR memory does not need a driver. This is where CXL 1.1
played until CXL 2.0 added so much configuration complexity (hotplug,
interleave, persistent memory) that it started to need OS help. The
Linux PCIe PM code will not notice a difference, but behind the scenes
the device, if it is offering coherent memory to the CPU, will be
coordinating with the CPU like it was part of the CPU package and not
a discrete device. I do not see new PM software enabling required in
my reading of "Section 10.0 Power Management" of the CXL
specification.

> > So I hope you see CXL's need to create some PCIe-symbiotic objects in
> > order to compose CXL objects (like interleaved memory regions) that
> > have no PCIe analog.
>
> > > Hotplug is more central to PCI than NPEM is, but NPEM is a little bit
> > > like PCIe native hotplug in concept: hotplug has a few registers that
> > > control downstream indicators, interlock, and power controller; NPEM
> > > has registers that control downstream indicators.
> > >
> > > Both are prescribed by the PCIe spec and presumably designed to work
> > > alongside the usual device-specific drivers for bridges, SSDs, etc.
> > >
> > > I would at least explore the idea of doing common support by
> > > integrating NPEM into the PCI core.  There would have to be some hook
> > > for the enclosure-specific bits, but I think it's fair for the details
> > > of sending commands and polling for command completed to be part of
> > > the PCI core.
> >
> > The problem with NPEM compared to hotplug LED signaling is that it has
> > the strange property that an NPEM higher in the topology might
> > override one lower in the topology. This is to support things like
> > NVME enclosures where the NVME device itself may have an NPEM
> > instance, but that instance is of course not suitable for signaling
> > that the device is not present.
>
> I would envision the PCI core providing only a bare-bones "send this
> command and wait for it to complete" sort of interface.  Everything
> else, including deciding which device to use, would go elsewhere.
>
> > So, instead the system may be designed to have an NPEM in the
> > upstream switch port and disable the NPEM instances in the device.
> > Platform firmware decides which NPEM is in use.
>
> Since you didn't mention a firmware interface to communicate which
> NPEM is in use, I assume firmware does this by preventing other
> devices from advertising NPEM support?

That's also my assumption. If the OS sees a disabled NPEM in the
topology it just needs to assume firmware knew what it was doing when
it disabled it. I wish NPEM was better specified than "trust firmware
to do the right thing via an ambiguous signal".

>
> > It also has the "fun" property of additionally being overridden by ACPI.
> > ...
> > One of the nice properties of the auxiliary organization is well
> > defined module boundaries. Will NPEM in the PCI core now force
> > CONFIG_ENCLOSURE_SERVICES to be built-in? That seems an unwanted side
> > effect to me.
>
> If the PCI core provides only the mechanism, and the smarts of NPEM
> are in something analogous to drivers/scsi/ses.c, I don't think it
> would force CONFIG_ENCLOSURE_SERVICES to be built-in.

What is that dynamic thing that glues CONFIG_ENCLOSURE_SERVICES to the
PCI core that also does not require statically linking that glue to
every driver that wants to talk to NPEM? I don't mind that the base
hardware access mechanism library is in the PCI core, but what does
NVME and the CXL memory expander driver register to get NPEM service
and associate their block / memory device with an enclosure slot? To
me that glue for these one-off ancillary features is what aux-bus is
all about, but I'm open to it not being aux-bus in the end. Maybe
Stuart has a proposal here?

>
> > > DOE *is* specified by PCIe, at least in terms of the data transfer
> > > protocol (interrupt usage, read/write mailbox, etc).  I think that,
> > > and the fact that it's not specific to CXL, means we need some kind of
> > > PCI core interface to do the transfers.
> >
> > DOE transfer code belongs in drivers/pci/ period, but does it really
> > need to be in drivers/pci/built-in.a for everyone regardless of
> > whether it is being used or not?
>
> I think my opinion would depend on how small the DOE transfer code
> could be made and how much it would complicate things to make it a
> module.  And also how we could enforce whatever mutual exclusion we
> need to make it safe.

At least for the mutual exclusion aspect I'm thinking of typical
request_region() style exclusion where the aux-driver claims the
configuration address register range.
Bjorn Helgaas Dec. 7, 2021, 2:56 a.m. UTC | #25
On Fri, Dec 03, 2021 at 05:24:34PM -0800, Dan Williams wrote:
> On Fri, Dec 3, 2021 at 2:03 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, Dec 02, 2021 at 05:38:17PM -0800, Dan Williams wrote:

I'm cutting out a bunch of details, not because they're unimportant,
but because I don't know enough yet for them to make sense to me.

> > I guess software uses the CXL DVSEC to distinguish a CXL device
> > from a PCIe device, at least for 00.0.
> 
> driver/cxl/pci.c attaches to: "PCI_DEVICE_CLASS((PCI_CLASS_MEMORY_CXL
> << 8 | CXL_MEMORY_PROGIF), ~0)"
> 
> I am not aware of any restriction for that class code to appear at
> function0?

Maybe it's not an actual restriction; I'm reading CXL r2.0, sec 8.1.3,
where it says:

  The PCIe configuration space of Device 0, Function 0 shall include
  the CXL PCI Express Designated Vendor-Specific Extended Capability
  (DVSEC) as shown in Figure 126. The capability, status and control
  fields in Device 0, Function 0 DVSEC control the CXL functionality
  of the entire CXL device.
  ...
  Software may use the presence of this DVSEC to differentiate between
  a CXL device and a PCIe device. As such, a standard PCIe device must
  not expose this DVSEC.

Sections 9.11.5 and 9.12.2 also talk about looking for CXL DVSEC on
dev 0, func 0 to identify CXL devices.

> > Not sure about non-Dev 0 Func 0 devices; maybe this implies other
> > functions in the same device are part of the same CXL device?
> 
> DVSEC is really only telling us the layout of the MMIO register
> space. ...

And DVSEC also apparently tells us that "this is a CXL device, not
just an ordinary PCIe device"?  It's not clear to me how you identify
other PCIe functions that are also part of the same CXL device.

> > > The fact that the PCI core remains blissfully unaware of all these
> > > interdependencies is a feature because CXL is effectively a super-set
> > > of PCIe for fast-path CXL.mem operation, even though it is derivative
> > > for enumeration and slow-path manageability.
> >
> > I don't know how blissfully unaware PCI can be.  Can a user remove a
> > PCIe device that's part of a CXL device via sysfs?
> 
> Yes. If that PCIe device is an endpoint then the corresponding "mem"
> driver will get a ->remove() event because it is registered as a child
> of that parent PCIe device. That in turn will tear down the relevant
> part of the CXL port topology.

The CXL device may include several PCIe devices.  "mem" is a CXL
driver that's bound to one of them (?)  Is that what you mean by "mem"
being a "child of the the parent PCIe device"?

> However, a gap (deliberate design choice?) in the PCI hotplug
> implementation I currently see is an ability for an endpoint PCI
> driver to dynamically deny hotplug requests based on the state of the
> device. ...

PCI allows surprise remove, so drivers generally can't deny hot
unplugs.  PCIe *does* provide for an Electromechanical Interlock (see
PCI_EXP_SLTCTL_EIC), but we don't currently support it.

> > Is the PCIe device available for drivers to claim?
> 
> drivers/cxl/pci.c *is* a native PCI driver for CXL memory expanders.
> You might be thinking of CXL accelerator devices, where the CXL.cache
> and CXL.mem capabilities are incremental capabilities for the
> accelerator.  ...

No, I'm not nearly sophisticated enough to be thinking of specific
types of CXL things :)

> > Do we need coordination between cxl_driver_register() and
> > pci_register_driver()?
> 
> They do not collide, and I think this goes back to the concern about
> whether drivers/cxl/ is scanning for all CXL DVSECs. ...

Sorry, I don't remember what this concern was, and I don't know why
they don't collide.  I *would* know that if I knew that the set of
things cxl_driver_register() binds to doesn't intersect the set of
pci_devs, but it's not clear to me what those things are.

The PCI core enumerates devices by doing config reads of the Vendor ID
for every possible bus and device number.  It allocs a pci_dev for
each device it finds, and those are the things pci_register_driver()
binds drivers to based on Vendor ID, Device ID, etc.

How does CXL enumeration work?  Do you envision it being integrated
into PCI enumeration?  Does it build a list/tree/etc of cxl_devs?

cxl_driver_register() associates a driver with something.  What
exactly is the thing the driver is associated with?  A pci_dev?  A
cxl_dev?  Some kind of aggregate CXL device composed of several PCIe
devices?

I expected cxl_driver.probe() to take a "struct cxl_dev *" or similar,
but it takes a "struct device *".  I'm trying to apply my knowledge of
how other buses work in Linux, but obviously it's not working very
well.

> > Does CXL impose new constraints on PCI power management?
> 
> Recall that CXL is built such that it could be supported by a legacy
> operating system where platform firmware owns the setup of devices,
> just like DDR memory does not need a driver. This is where CXL 1.1
> played until CXL 2.0 added so much configuration complexity (hotplug,
> interleave, persistent memory) that it started to need OS help. The
> Linux PCIe PM code will not notice a difference, but behind the scenes
> the device, if it is offering coherent memory to the CPU, will be
> coordinating with the CPU like it was part of the CPU package and not
> a discrete device. I do not see new PM software enabling required in
> my reading of "Section 10.0 Power Management" of the CXL
> specification.

So if Linux PM decides to suspend a PCIe device that's part of a CXL
device and put it in D3hot, this is not a problem for CXL?  I guess if
a CXL driver binds to the PCIe device, it can control what PM will do.
But I thought CXL drivers would bind to a CXL thing, not a PCIe thing.

I see lots of mentions of LTR in sec 10, which really scares me
because I'm pretty confident that Linux handling of LTR is broken, and
I have no idea how to fix it.

> > > So, instead the system may be designed to have an NPEM in the
> > > upstream switch port and disable the NPEM instances in the device.
> > > Platform firmware decides which NPEM is in use.
> >
> > Since you didn't mention a firmware interface to communicate which
> > NPEM is in use, I assume firmware does this by preventing other
> > devices from advertising NPEM support?
> 
> That's also my assumption. If the OS sees a disabled NPEM in the
> topology it just needs to assume firmware knew what it was doing when
> it disabled it. I wish NPEM was better specified than "trust firmware
> to do the right thing via an ambiguous signal".

If we enumerate a device with a capability that is disabled, we
normally don't treat that as a signal that it cannot be enabled.
There are lots of enable bits in PCI capabilities, and I don't know of
any cases where Linux assumes "Enable == 0" means "don't use this
feature."  Absent some negotiation like _OSC or some documented
restriction, e.g., in the PCI Firmware spec, Linux normally just
enables features when it decides to use them.

Bjorn
Dan Williams Dec. 7, 2021, 4:48 a.m. UTC | #26
On Mon, Dec 6, 2021 at 6:56 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Dec 03, 2021 at 05:24:34PM -0800, Dan Williams wrote:
> > On Fri, Dec 3, 2021 at 2:03 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Thu, Dec 02, 2021 at 05:38:17PM -0800, Dan Williams wrote:
>
> I'm cutting out a bunch of details, not because they're unimportant,
> but because I don't know enough yet for them to make sense to me.
>
> > > I guess software uses the CXL DVSEC to distinguish a CXL device
> > > from a PCIe device, at least for 00.0.
> >
> > driver/cxl/pci.c attaches to: "PCI_DEVICE_CLASS((PCI_CLASS_MEMORY_CXL
> > << 8 | CXL_MEMORY_PROGIF), ~0)"
> >
> > I am not aware of any restriction for that class code to appear at
> > function0?
>
> Maybe it's not an actual restriction; I'm reading CXL r2.0, sec 8.1.3,
> where it says:
>
>   The PCIe configuration space of Device 0, Function 0 shall include
>   the CXL PCI Express Designated Vendor-Specific Extended Capability
>   (DVSEC) as shown in Figure 126. The capability, status and control
>   fields in Device 0, Function 0 DVSEC control the CXL functionality
>   of the entire CXL device.
>   ...
>   Software may use the presence of this DVSEC to differentiate between
>   a CXL device and a PCIe device. As such, a standard PCIe device must
>   not expose this DVSEC.
>
> Sections 9.11.5 and 9.12.2 also talk about looking for CXL DVSEC on
> dev 0, func 0 to identify CXL devices.

Ah, I did not internalize that when reading because if the DVSEC shows
up on other functions on the same device it should "just work" as far
as Linux is concerned, but I guess it simplifies implementations to
constrain where the capability appears.

> > > Not sure about non-Dev 0 Func 0 devices; maybe this implies other
> > > functions in the same device are part of the same CXL device?
> >
> > DVSEC is really only telling us the layout of the MMIO register
> > space. ...
>
> And DVSEC also apparently tells us that "this is a CXL device, not
> just an ordinary PCIe device"?  It's not clear to me how you identify
> other PCIe functions that are also part of the same CXL device.

I have not encountered any flows where the driver would care if it was
enabling another instance of the CXL DVSEC on the same device.

>
> > > > The fact that the PCI core remains blissfully unaware of all these
> > > > interdependencies is a feature because CXL is effectively a super-set
> > > > of PCIe for fast-path CXL.mem operation, even though it is derivative
> > > > for enumeration and slow-path manageability.
> > >
> > > I don't know how blissfully unaware PCI can be.  Can a user remove a
> > > PCIe device that's part of a CXL device via sysfs?
> >
> > Yes. If that PCIe device is an endpoint then the corresponding "mem"
> > driver will get a ->remove() event because it is registered as a child
> > of that parent PCIe device. That in turn will tear down the relevant
> > part of the CXL port topology.
>
> The CXL device may include several PCIe devices.  "mem" is a CXL
> driver that's bound to one of them (?)  Is that what you mean by "mem"
> being a "child of the the parent PCIe device"?

Right, cxl_pci_probe() does device_add() of a "mem" device on the
cxl_bus_type, and cxl_pci_remove() does device_del() of that same
child device. Just like xhci_pci_probe() arranges for device_add() of
a child USB device on the usb_dev_type.

>
> > However, a gap (deliberate design choice?) in the PCI hotplug
> > implementation I currently see is an ability for an endpoint PCI
> > driver to dynamically deny hotplug requests based on the state of the
> > device. ...
>
> PCI allows surprise remove, so drivers generally can't deny hot
> unplugs.  PCIe *does* provide for an Electromechanical Interlock (see
> PCI_EXP_SLTCTL_EIC), but we don't currently support it.

Ok, I guess people will just need to be careful then.

> > > Is the PCIe device available for drivers to claim?
> >
> > drivers/cxl/pci.c *is* a native PCI driver for CXL memory expanders.
> > You might be thinking of CXL accelerator devices, where the CXL.cache
> > and CXL.mem capabilities are incremental capabilities for the
> > accelerator.  ...
>
> No, I'm not nearly sophisticated enough to be thinking of specific
> types of CXL things :)

Ah, apologies I was reading too much into your line of questions.

>
> > > Do we need coordination between cxl_driver_register() and
> > > pci_register_driver()?
> >
> > They do not collide, and I think this goes back to the concern about
> > whether drivers/cxl/ is scanning for all CXL DVSECs. ...
>
> Sorry, I don't remember what this concern was, and I don't know why
> they don't collide.  I *would* know that if I knew that the set of
> things cxl_driver_register() binds to doesn't intersect the set of
> pci_devs, but it's not clear to me what those things are.

cxl_driver_register() registers a driver on the cxl_bus_type, it can
not bind to devices on the pci_bus_type.

> The PCI core enumerates devices by doing config reads of the Vendor ID
> for every possible bus and device number.  It allocs a pci_dev for
> each device it finds, and those are the things pci_register_driver()
> binds drivers to based on Vendor ID, Device ID, etc.
>
> How does CXL enumeration work?  Do you envision it being integrated
> into PCI enumeration?  Does it build a list/tree/etc of cxl_devs?

The cxl_pci driver, native PCI driver, attaches to endpoints that emit
the CXL Memory Device class code. That driver walks up the PCI
topology and adds a cxl_port device for each CXL DVSEC it finds in
each PCIE Upstream Port up to the hostbridge. Additionally there is a
cxl_acpi driver that enumerates platform CXL root resources and is
called the 'root' cxl_port. Once root-to-endpoint connectivity is
established then the endpoint is considered ready for CXL.mem
operation.

> cxl_driver_register() associates a driver with something.  What
> exactly is the thing the driver is associated with?  A pci_dev?  A
> cxl_dev?  Some kind of aggregate CXL device composed of several PCIe
> devices?

The aggregate device is a  cxl_region composed of 1 or more endpoint
(registered by cxl_pci_probe()) devices. Like a RAID array it
aggregates multiple devices to produce a new memory range.

> I expected cxl_driver.probe() to take a "struct cxl_dev *" or similar,
> but it takes a "struct device *".  I'm trying to apply my knowledge of
> how other buses work in Linux, but obviously it's not working very
> well.

There's no common attributes between "mem" "port" and "region" devices
so the drivers just do container_of() on the device and skip defining
a generic 'struct cxl_dev'.

> > > Does CXL impose new constraints on PCI power management?
> >
> > Recall that CXL is built such that it could be supported by a legacy
> > operating system where platform firmware owns the setup of devices,
> > just like DDR memory does not need a driver. This is where CXL 1.1
> > played until CXL 2.0 added so much configuration complexity (hotplug,
> > interleave, persistent memory) that it started to need OS help. The
> > Linux PCIe PM code will not notice a difference, but behind the scenes
> > the device, if it is offering coherent memory to the CPU, will be
> > coordinating with the CPU like it was part of the CPU package and not
> > a discrete device. I do not see new PM software enabling required in
> > my reading of "Section 10.0 Power Management" of the CXL
> > specification.
>
> So if Linux PM decides to suspend a PCIe device that's part of a CXL
> device and put it in D3hot, this is not a problem for CXL?  I guess if
> a CXL driver binds to the PCIe device, it can control what PM will do.
> But I thought CXL drivers would bind to a CXL thing, not a PCIe thing.

Recall that this starts with a native PCI driver for endpoints, that
driver can coordinate PM for its children on the CXL bus if it is
needed.

> I see lots of mentions of LTR in sec 10, which really scares me
> because I'm pretty confident that Linux handling of LTR is broken, and
> I have no idea how to fix it.

Ok, I will keep that in mind.

> > > > So, instead the system may be designed to have an NPEM in the
> > > > upstream switch port and disable the NPEM instances in the device.
> > > > Platform firmware decides which NPEM is in use.
> > >
> > > Since you didn't mention a firmware interface to communicate which
> > > NPEM is in use, I assume firmware does this by preventing other
> > > devices from advertising NPEM support?
> >
> > That's also my assumption. If the OS sees a disabled NPEM in the
> > topology it just needs to assume firmware knew what it was doing when
> > it disabled it. I wish NPEM was better specified than "trust firmware
> > to do the right thing via an ambiguous signal".
>
> If we enumerate a device with a capability that is disabled, we
> normally don't treat that as a signal that it cannot be enabled.
> There are lots of enable bits in PCI capabilities, and I don't know of
> any cases where Linux assumes "Enable == 0" means "don't use this
> feature."  Absent some negotiation like _OSC or some documented
> restriction, e.g., in the PCI Firmware spec, Linux normally just
> enables features when it decides to use them.

If the LEDs are not routed to a given NPEM instance, no amount of
enabling can get that instance operational.
diff mbox series

Patch

diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
index 3b8f41395f6b..fbf0393cdddc 100644
--- a/Documentation/driver-api/cxl/memory-devices.rst
+++ b/Documentation/driver-api/cxl/memory-devices.rst
@@ -28,6 +28,11 @@  CXL Memory Device
 .. kernel-doc:: drivers/cxl/pci.c
    :internal:
 
+CXL Port
+--------
+.. kernel-doc:: drivers/cxl/port.c
+   :doc: cxl port
+
 CXL Core
 --------
 .. kernel-doc:: drivers/cxl/cxl.h
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index ef05e96f8f97..3aeb33bba5a3 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -77,4 +77,26 @@  config CXL_PMEM
 	  provisioning the persistent memory capacity of CXL memory expanders.
 
 	  If unsure say 'm'.
+
+config CXL_MEM
+	tristate "CXL.mem: Memory Devices"
+	select CXL_PORT
+	depends on CXL_PCI
+	default CXL_BUS
+	help
+	  The CXL.mem protocol allows a device to act as a provider of "System
+	  RAM" and/or "Persistent Memory" that is fully coherent as if the
+	  memory was attached to the typical CPU memory controller.  This is
+	  known as HDM "Host-managed Device Memory".
+
+	  Say 'y/m' to enable a driver that will attach to CXL.mem devices for
+	  memory expansion and control of HDM. See Chapter 9.13 in the CXL 2.0
+	  specification for a detailed description of HDM.
+
+	  If unsure say 'm'.
+
+
+config CXL_PORT
+	tristate
+
 endif
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index cf07ae6cea17..56fcac2323cb 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -3,7 +3,9 @@  obj-$(CONFIG_CXL_BUS) += core/
 obj-$(CONFIG_CXL_PCI) += cxl_pci.o
 obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
 obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
+obj-$(CONFIG_CXL_PORT) += cxl_port.o
 
 cxl_pci-y := pci.o
 cxl_acpi-y := acpi.o
 cxl_pmem-y := pmem.o
+cxl_port-y := port.o
diff --git a/drivers/cxl/core/bus.c b/drivers/cxl/core/bus.c
index 9e0d7d5d9298..46a06cfe79bd 100644
--- a/drivers/cxl/core/bus.c
+++ b/drivers/cxl/core/bus.c
@@ -31,6 +31,8 @@  static DECLARE_RWSEM(root_port_sem);
 
 static struct device *cxl_topology_host;
 
+static bool is_cxl_decoder(struct device *dev);
+
 int cxl_register_topology_host(struct device *host)
 {
 	down_write(&topology_host_sem);
@@ -75,6 +77,45 @@  static void put_cxl_topology_host(struct device *dev)
 	up_read(&topology_host_sem);
 }
 
+static int decoder_match(struct device *dev, void *data)
+{
+	struct resource *theirs = (struct resource *)data;
+	struct cxl_decoder *cxld;
+
+	if (!is_cxl_decoder(dev))
+		return 0;
+
+	cxld = to_cxl_decoder(dev);
+	if (theirs->start <= cxld->decoder_range.start &&
+	    theirs->end >= cxld->decoder_range.end)
+		return 1;
+
+	return 0;
+}
+
+static struct cxl_decoder *cxl_find_root_decoder(resource_size_t base,
+						 resource_size_t size)
+{
+	struct cxl_decoder *cxld = NULL;
+	struct device *cxldd;
+	struct device *host;
+	struct resource res = (struct resource){
+		.start = base,
+		.end = base + size - 1,
+	};
+
+	host = get_cxl_topology_host();
+	if (!host)
+		return NULL;
+
+	cxldd = device_find_child(host, &res, decoder_match);
+	if (cxldd)
+		cxld = to_cxl_decoder(cxldd);
+
+	put_cxl_topology_host(host);
+	return cxld;
+}
+
 static ssize_t devtype_show(struct device *dev, struct device_attribute *attr,
 			    char *buf)
 {
@@ -280,6 +321,11 @@  bool is_root_decoder(struct device *dev)
 }
 EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
 
+static bool is_cxl_decoder(struct device *dev)
+{
+	return dev->type->release == cxl_decoder_release;
+}
+
 struct cxl_decoder *to_cxl_decoder(struct device *dev)
 {
 	if (dev_WARN_ONCE(dev, dev->type->release != cxl_decoder_release,
@@ -327,6 +373,7 @@  struct cxl_port *to_cxl_port(struct device *dev)
 		return NULL;
 	return container_of(dev, struct cxl_port, dev);
 }
+EXPORT_SYMBOL_NS_GPL(to_cxl_port, CXL);
 
 struct cxl_dport *cxl_get_root_dport(struct device *dev)
 {
@@ -735,6 +782,24 @@  EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, CXL);
 
 static void cxld_unregister(void *dev)
 {
+	struct cxl_decoder *plat_decoder, *cxld = to_cxl_decoder(dev);
+	resource_size_t base, size;
+
+	if (is_root_decoder(dev)) {
+		device_unregister(dev);
+		return;
+	}
+
+	base = cxld->decoder_range.start;
+	size = range_len(&cxld->decoder_range);
+
+	if (size) {
+		plat_decoder = cxl_find_root_decoder(base, size);
+		if (plat_decoder)
+			__release_region(&plat_decoder->platform_res, base,
+					 size);
+	}
+
 	device_unregister(dev);
 }
 
@@ -789,6 +854,8 @@  static int cxl_device_id(struct device *dev)
 		return CXL_DEVICE_NVDIMM_BRIDGE;
 	if (dev->type == &cxl_nvdimm_type)
 		return CXL_DEVICE_NVDIMM;
+	if (dev->type == &cxl_port_type)
+		return CXL_DEVICE_PORT;
 	return 0;
 }
 
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 41a0245867ea..f191b0c995a7 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -159,9 +159,8 @@  void cxl_probe_device_regs(struct device *dev, void __iomem *base,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_probe_device_regs, CXL);
 
-static void __iomem *devm_cxl_iomap_block(struct device *dev,
-					  resource_size_t addr,
-					  resource_size_t length)
+void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
+				   resource_size_t length)
 {
 	void __iomem *ret_val;
 	struct resource *res;
@@ -180,6 +179,7 @@  static void __iomem *devm_cxl_iomap_block(struct device *dev,
 
 	return ret_val;
 }
+EXPORT_SYMBOL_NS_GPL(devm_cxl_iomap_block, CXL);
 
 int cxl_map_component_regs(struct pci_dev *pdev,
 			   struct cxl_component_regs *regs,
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 3962a5e6a950..24fa16157d5e 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -17,6 +17,9 @@ 
  * (port-driver, region-driver, nvdimm object-drivers... etc).
  */
 
+/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
+#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
+
 /* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers*/
 #define CXL_CM_OFFSET 0x1000
 #define CXL_CM_CAP_HDR_OFFSET 0x0
@@ -36,11 +39,22 @@ 
 #define CXL_HDM_DECODER_CAP_OFFSET 0x0
 #define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
 #define   CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
-#define CXL_HDM_DECODER0_BASE_LOW_OFFSET 0x10
-#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET 0x14
-#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET 0x18
-#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET 0x1c
-#define CXL_HDM_DECODER0_CTRL_OFFSET 0x20
+#define   CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
+#define   CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
+#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
+#define   CXL_HDM_DECODER_ENABLE BIT(1)
+#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
+#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
+#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
+#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
+#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
+#define   CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
+#define   CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
+#define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
+#define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
+#define   CXL_HDM_DECODER0_CTRL_TYPE BIT(12)
+#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
+#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
 
 static inline int cxl_hdm_decoder_count(u32 cap_hdr)
 {
@@ -148,6 +162,8 @@  int cxl_map_device_regs(struct pci_dev *pdev,
 enum cxl_regloc_type;
 int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
 		      struct cxl_register_map *map);
+void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
+				   resource_size_t length);
 
 #define CXL_RESOURCE_NONE ((resource_size_t) -1)
 #define CXL_TARGET_STRLEN 20
@@ -165,7 +181,8 @@  void cxl_unregister_topology_host(struct device *host);
 #define CXL_DECODER_F_TYPE2 BIT(2)
 #define CXL_DECODER_F_TYPE3 BIT(3)
 #define CXL_DECODER_F_LOCK  BIT(4)
-#define CXL_DECODER_F_MASK  GENMASK(4, 0)
+#define CXL_DECODER_F_ENABLE    BIT(5)
+#define CXL_DECODER_F_MASK  GENMASK(5, 0)
 
 enum cxl_decoder_type {
        CXL_DECODER_ACCELERATOR = 2,
@@ -255,6 +272,8 @@  struct cxl_walk_context {
  * @dports: cxl_dport instances referenced by decoders
  * @decoder_ida: allocator for decoder ids
  * @component_reg_phys: component register capability base address (optional)
+ * @rescan_work: worker object for bus rescans after port additions
+ * @data: opaque data with driver specific usage
  */
 struct cxl_port {
 	struct device dev;
@@ -263,6 +282,8 @@  struct cxl_port {
 	struct list_head dports;
 	struct ida decoder_ida;
 	resource_size_t component_reg_phys;
+	struct work_struct rescan_work;
+	void *data;
 };
 
 /**
@@ -325,6 +346,7 @@  void cxl_driver_unregister(struct cxl_driver *cxl_drv);
 
 #define CXL_DEVICE_NVDIMM_BRIDGE	1
 #define CXL_DEVICE_NVDIMM		2
+#define CXL_DEVICE_PORT			3
 
 #define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*")
 #define CXL_MODALIAS_FMT "cxl:t%d"
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
new file mode 100644
index 000000000000..3c03131517af
--- /dev/null
+++ b/drivers/cxl/port.c
@@ -0,0 +1,323 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+#include "cxlmem.h"
+
+/**
+ * DOC: cxl port
+ *
+ * The port driver implements the set of functionality needed to allow full
+ * decoder enumeration and routing. A CXL port is an abstraction of a CXL
+ * component that implements some amount of CXL decoding of CXL.mem traffic.
+ * As of the CXL 2.0 spec, this includes:
+ *
+ *	.. list-table:: CXL Components w/ Ports
+ *		:widths: 25 25 50
+ *		:header-rows: 1
+ *
+ *		* - component
+ *		  - upstream
+ *		  - downstream
+ *		* - Hostbridge
+ *		  - ACPI0016
+ *		  - root port
+ *		* - Switch
+ *		  - Switch Upstream Port
+ *		  - Switch Downstream Port
+ *		* - Endpoint
+ *		  - Endpoint Port
+ *		  - N/A
+ *
+ * The primary service this driver provides is enumerating HDM decoders and
+ * presenting APIs to other drivers to utilize the decoders.
+ */
+
+static struct workqueue_struct *cxl_port_wq;
+
+struct cxl_port_data {
+	struct cxl_component_regs regs;
+
+	struct port_caps {
+		unsigned int count;
+		unsigned int tc;
+		unsigned int interleave11_8;
+		unsigned int interleave14_12;
+	} caps;
+};
+
+static inline int to_interleave_granularity(u32 ctrl)
+{
+	int val = FIELD_GET(CXL_HDM_DECODER0_CTRL_IG_MASK, ctrl);
+
+	return 256 << val;
+}
+
+static inline int to_interleave_ways(u32 ctrl)
+{
+	int val = FIELD_GET(CXL_HDM_DECODER0_CTRL_IW_MASK, ctrl);
+
+	return 1 << val;
+}
+
+static void get_caps(struct cxl_port *port, struct cxl_port_data *cpd)
+{
+	void __iomem *hdm_decoder = cpd->regs.hdm_decoder;
+	struct port_caps *caps = &cpd->caps;
+	u32 hdm_cap;
+
+	hdm_cap = readl(hdm_decoder + CXL_HDM_DECODER_CAP_OFFSET);
+
+	caps->count = cxl_hdm_decoder_count(hdm_cap);
+	caps->tc = FIELD_GET(CXL_HDM_DECODER_TARGET_COUNT_MASK, hdm_cap);
+	caps->interleave11_8 =
+		FIELD_GET(CXL_HDM_DECODER_INTERLEAVE_11_8, hdm_cap);
+	caps->interleave14_12 =
+		FIELD_GET(CXL_HDM_DECODER_INTERLEAVE_14_12, hdm_cap);
+}
+
+static int map_regs(struct cxl_port *port, void __iomem *crb,
+		    struct cxl_port_data *cpd)
+{
+	struct cxl_register_map map;
+	struct cxl_component_reg_map *comp_map = &map.component_map;
+
+	cxl_probe_component_regs(&port->dev, crb, comp_map);
+	if (!comp_map->hdm_decoder.valid) {
+		dev_err(&port->dev, "HDM decoder registers invalid\n");
+		return -ENXIO;
+	}
+
+	cpd->regs.hdm_decoder = crb + comp_map->hdm_decoder.offset;
+
+	return 0;
+}
+
+static u64 get_decoder_size(void __iomem *hdm_decoder, int n)
+{
+	u32 ctrl = readl(hdm_decoder + CXL_HDM_DECODER0_CTRL_OFFSET(n));
+
+	if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED)
+		return 0;
+
+	return ioread64_hi_lo(hdm_decoder +
+			      CXL_HDM_DECODER0_SIZE_LOW_OFFSET(n));
+}
+
+static bool is_endpoint_port(struct cxl_port *port)
+{
+	/* Endpoints can't be ports... yet! */
+	return false;
+}
+
+static void rescan_ports(struct work_struct *work)
+{
+	if (bus_rescan_devices(&cxl_bus_type))
+		pr_warn("Failed to rescan\n");
+}
+
+/* Minor layering violation */
+static int dvsec_range_used(struct cxl_port *port)
+{
+	struct cxl_endpoint_dvsec_info *info;
+	int i, ret = 0;
+
+	if (!is_endpoint_port(port))
+		return 0;
+
+	info = port->data;
+	for (i = 0; i < info->ranges; i++)
+		if (info->range[i].size)
+			ret |= 1 << i;
+
+	return ret;
+}
+
+static int enumerate_hdm_decoders(struct cxl_port *port,
+				  struct cxl_port_data *portdata)
+{
+	void __iomem *hdm_decoder = portdata->regs.hdm_decoder;
+	bool global_enable;
+	u32 global_ctrl;
+	int i = 0;
+
+	global_ctrl = readl(hdm_decoder + CXL_HDM_DECODER_CTRL_OFFSET);
+	global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
+	if (!global_enable) {
+		i = dvsec_range_used(port);
+		if (i) {
+			dev_err(&port->dev,
+				"Couldn't add port because device is using DVSEC range registers\n");
+			return -EBUSY;
+		}
+	}
+
+	for (i = 0; i < portdata->caps.count; i++) {
+		int rc, target_count = portdata->caps.tc;
+		struct cxl_decoder *cxld;
+		int *target_map = NULL;
+		u64 size;
+
+		if (is_endpoint_port(port))
+			target_count = 0;
+
+		cxld = cxl_decoder_alloc(port, target_count);
+		if (IS_ERR(cxld)) {
+			dev_warn(&port->dev,
+				 "Failed to allocate the decoder\n");
+			return PTR_ERR(cxld);
+		}
+
+		cxld->target_type = CXL_DECODER_EXPANDER;
+		cxld->interleave_ways = 1;
+		cxld->interleave_granularity = 0;
+
+		size = get_decoder_size(hdm_decoder, i);
+		if (size != 0) {
+#define decoderN(reg, n) hdm_decoder + CXL_HDM_DECODER0_##reg(n)
+			int temp[CXL_DECODER_MAX_INTERLEAVE];
+			u64 base;
+			u32 ctrl;
+			int j;
+			union {
+				u64 value;
+				char target_id[8];
+			} target_list;
+
+			target_map = temp;
+			ctrl = readl(decoderN(CTRL_OFFSET, i));
+			base = ioread64_hi_lo(decoderN(BASE_LOW_OFFSET, i));
+
+			cxld->decoder_range = (struct range){
+				.start = base,
+				.end = base + size - 1
+			};
+
+			cxld->flags = CXL_DECODER_F_ENABLE;
+			cxld->interleave_ways = to_interleave_ways(ctrl);
+			cxld->interleave_granularity =
+				to_interleave_granularity(ctrl);
+
+			if (FIELD_GET(CXL_HDM_DECODER0_CTRL_TYPE, ctrl) == 0)
+				cxld->target_type = CXL_DECODER_ACCELERATOR;
+
+			target_list.value = ioread64_hi_lo(decoderN(TL_LOW, i));
+			for (j = 0; j < cxld->interleave_ways; j++)
+				target_map[j] = target_list.target_id[j];
+#undef decoderN
+		}
+
+		rc = cxl_decoder_add_locked(cxld, target_map);
+		if (rc)
+			put_device(&cxld->dev);
+		else
+			rc = cxl_decoder_autoremove(&port->dev, cxld);
+		if (rc)
+			dev_err(&port->dev, "Failed to add decoder\n");
+		else
+			dev_dbg(&cxld->dev, "Added to port %s\n",
+				dev_name(&port->dev));
+	}
+
+	/*
+	 * Turn on global enable now since DVSEC ranges aren't being used and
+	 * we'll eventually want the decoder enabled.
+	 */
+	if (!global_enable) {
+		dev_dbg(&port->dev, "Enabling HDM decode\n");
+		writel(global_ctrl | CXL_HDM_DECODER_ENABLE, hdm_decoder + CXL_HDM_DECODER_CTRL_OFFSET);
+	}
+
+	return 0;
+}
+
+static int cxl_port_probe(struct device *dev)
+{
+	struct cxl_port *port = to_cxl_port(dev);
+	struct cxl_port_data *portdata;
+	void __iomem *crb;
+	int rc;
+
+	if (port->component_reg_phys == CXL_RESOURCE_NONE)
+		return 0;
+
+	portdata = devm_kzalloc(dev, sizeof(*portdata), GFP_KERNEL);
+	if (!portdata)
+		return -ENOMEM;
+
+	crb = devm_cxl_iomap_block(&port->dev, port->component_reg_phys,
+				   CXL_COMPONENT_REG_BLOCK_SIZE);
+	if (!crb) {
+		dev_err(&port->dev, "No component registers mapped\n");
+		return -ENXIO;
+	}
+
+	rc = map_regs(port, crb, portdata);
+	if (rc)
+		return rc;
+
+	get_caps(port, portdata);
+	if (portdata->caps.count == 0) {
+		dev_err(&port->dev, "Spec violation. Caps invalid\n");
+		return -ENXIO;
+	}
+
+	rc = enumerate_hdm_decoders(port, portdata);
+	if (rc) {
+		dev_err(&port->dev, "Couldn't enumerate decoders (%d)\n", rc);
+		return rc;
+	}
+
+	/*
+	 * Bus rescan is done in a workqueue so that we can do so with the
+	 * device lock dropped.
+	 *
+	 * Why do we need to rescan? There is a race between cxl_acpi and
+	 * cxl_mem (which depends on cxl_pci). cxl_mem will only create a port
+	 * if it can establish a path up to a root port, which is enumerated by
+	 * a platform specific driver (ie. cxl_acpi) and bound by this driver.
+	 * While cxl_acpi could do the rescan, it makes sense to be here as
+	 * other platform drivers might require the same functionality.
+	 */
+	INIT_WORK(&port->rescan_work, rescan_ports);
+	queue_work(cxl_port_wq, &port->rescan_work);
+
+	return 0;
+}
+
+static struct cxl_driver cxl_port_driver = {
+	.name = "cxl_port",
+	.probe = cxl_port_probe,
+	.id = CXL_DEVICE_PORT,
+};
+
+static __init int cxl_port_init(void)
+{
+	int rc;
+
+	cxl_port_wq = alloc_ordered_workqueue("cxl_port", 0);
+	if (!cxl_port_wq)
+		return -ENOMEM;
+
+	rc = cxl_driver_register(&cxl_port_driver);
+	if (rc) {
+		destroy_workqueue(cxl_port_wq);
+		return rc;
+	}
+
+	return 0;
+}
+
+static __exit void cxl_port_exit(void)
+{
+	destroy_workqueue(cxl_port_wq);
+	cxl_driver_unregister(&cxl_port_driver);
+}
+
+module_init(cxl_port_init);
+module_exit(cxl_port_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_IMPORT_NS(CXL);
+MODULE_ALIAS_CXL(CXL_DEVICE_PORT);