diff mbox series

[v4,3/4] PCI: Create new reset method to force SBR for CXL

Message ID 20240409160256.94184-4-dave.jiang@intel.com
State New
Headers show
Series PCI: Add Secondary Bus Reset (SBR) support for CXL | expand

Commit Message

Dave Jiang April 9, 2024, 4:01 p.m. UTC
CXL spec r3.1 8.1.5.2
By default Secondary Bus Reset (SBR) is masked for CXL ports. Introduce a
new PCI reset method "cxl_bus" to force SBR on CXL ports by setting
the unmask SBR bit in the CXL DVSEC port control register before performing
the bus reset and restore the original value of the bit post reset. The
new reset method allows the user to intentionally perform SBR on a CXL
device without needing to set the "Unmask SBR" bit via a user tool.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v4:
- Use pci_dev_lock guard() on bridge. (Dan)
---
 drivers/pci/pci.c   | 39 +++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h |  2 +-
 2 files changed, 40 insertions(+), 1 deletion(-)

Comments

Dan Williams April 26, 2024, 7:46 p.m. UTC | #1
Dave Jiang wrote:
> CXL spec r3.1 8.1.5.2
> By default Secondary Bus Reset (SBR) is masked for CXL ports. Introduce a
> new PCI reset method "cxl_bus" to force SBR on CXL ports by setting
> the unmask SBR bit in the CXL DVSEC port control register before performing
> the bus reset and restore the original value of the bit post reset. The
> new reset method allows the user to intentionally perform SBR on a CXL
> device without needing to set the "Unmask SBR" bit via a user tool.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> v4:
> - Use pci_dev_lock guard() on bridge. (Dan)
> ---
>  drivers/pci/pci.c   | 39 +++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h |  2 +-
>  2 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 570b00fe10f7..3b4f7b369287 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4982,6 +4982,44 @@ static int pci_reset_bus_function(struct pci_dev *dev, bool probe)
>  	return pci_parent_bus_reset(dev, probe);
>  }
>  
> +static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
> +{
> +	struct pci_dev *bridge;
> +	u16 dvsec, reg, val;
> +	int rc;
> +
> +	bridge = pci_upstream_bridge(dev);
> +	if (!bridge)
> +		return -ENOTTY;
> +
> +	dvsec = cxl_port_dvsec(bridge);
> +	if (!dvsec)
> +		return -ENOTTY;
> +
> +	if (probe)
> +		return 0;
> +
> +	guard(pci_dev)(bridge);

My concern here is this sets up a pci_reset_function() locking order of:

    pci_dev_lock(@dev)
        pci_dev_lock(pci_upstream_bridge(@dev))

Compare that to pci_reset_bus() that does:

    pci_dev_lock(pci_upstream_bridge(@dev))
        pci_dev_lock(@dev)

This also highlights that the pci_dev_lock() performed by
pci_reset_function() has long been insufficient for the
pci_reset_bus_function() method case that could race userspace when
pci_reset_secondary_bus() is manipulating the bridge control register.

So, if the goal of the lock is to prevent userspace from clobbering the
kernel's read-modify-write cycles of @dev's parent bridge, then the lock
needs to be held over both the cxl_reset_bus_function() and the
pci_reset_bus_function() cases, and it needs to be taken in
upstream-bridge => endpoint order.
Lukas Wunner April 27, 2024, 6:19 a.m. UTC | #2
On Fri, Apr 26, 2024 at 12:46:45PM -0700, Dan Williams wrote:
> This also highlights that the pci_dev_lock() performed by
> pci_reset_function() has long been insufficient for the
> pci_reset_bus_function() method case that could race userspace when
> pci_reset_secondary_bus() is manipulating the bridge control register.
> 
> So, if the goal of the lock is to prevent userspace from clobbering the
> kernel's read-modify-write cycles of @dev's parent bridge, then the lock
> needs to be held over both the cxl_reset_bus_function() and the
> pci_reset_bus_function() cases, and it needs to be taken in
> upstream-bridge => endpoint order.

No, the device lock is taken to prevent the driver from unbinding.
It has nothing to do with protecting RMW of parent bridge registers.

Here's Christoph Hellwig's explanation why he introduced acquisition
of the device lock in the PCI reset code paths:

https://lore.kernel.org/all/20200325104018.GA30853@lst.de/

TL;DR:  The PCI core calls the driver's ->reset_prepare and ->reset_done
callbacks and the driver needs to be held in place for that.

Thanks,

Lukas
Dan Williams April 27, 2024, 5:07 p.m. UTC | #3
Lukas Wunner wrote:
> On Fri, Apr 26, 2024 at 12:46:45PM -0700, Dan Williams wrote:
> > This also highlights that the pci_dev_lock() performed by
> > pci_reset_function() has long been insufficient for the
> > pci_reset_bus_function() method case that could race userspace when
> > pci_reset_secondary_bus() is manipulating the bridge control register.
> > 
> > So, if the goal of the lock is to prevent userspace from clobbering the
> > kernel's read-modify-write cycles of @dev's parent bridge, then the lock
> > needs to be held over both the cxl_reset_bus_function() and the
> > pci_reset_bus_function() cases, and it needs to be taken in
> > upstream-bridge => endpoint order.
> 
> No, the device lock is taken to prevent the driver from unbinding.
> It has nothing to do with protecting RMW of parent bridge registers.
> 
> Here's Christoph Hellwig's explanation why he introduced acquisition
> of the device lock in the PCI reset code paths:
> 
> https://lore.kernel.org/all/20200325104018.GA30853@lst.de/
> 
> TL;DR:  The PCI core calls the driver's ->reset_prepare and ->reset_done
> callbacks and the driver needs to be held in place for that.

Yes, that's what device_lock() is for, *but* that's not what
pci_cfg_access_lock() is for.

So when I say that pci_dev_lock() is to protect read-modify-write
configuration cycle access of an endpoint to be reset, I am talking
about pci_cfg_access_lock() not device_lock(). For better or worse
pci_dev_lock() does both, and maybe device_lock() should be open-coded
separately.

So there is a discrepancy when it comes to protecting manipulation of
secondary bus reset control registers of a bridge. The
pci_cfg_access_lock() protection is applied to the bridge in the
pci_reset_bus() case, but not the pci_reset_bus_function() case.
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 570b00fe10f7..3b4f7b369287 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4982,6 +4982,44 @@  static int pci_reset_bus_function(struct pci_dev *dev, bool probe)
 	return pci_parent_bus_reset(dev, probe);
 }
 
+static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
+{
+	struct pci_dev *bridge;
+	u16 dvsec, reg, val;
+	int rc;
+
+	bridge = pci_upstream_bridge(dev);
+	if (!bridge)
+		return -ENOTTY;
+
+	dvsec = cxl_port_dvsec(bridge);
+	if (!dvsec)
+		return -ENOTTY;
+
+	if (probe)
+		return 0;
+
+	guard(pci_dev)(bridge);
+	rc = pci_read_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL, &reg);
+	if (rc)
+		return -ENOTTY;
+
+	if (!(reg & PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR)) {
+		val = reg | PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR;
+		pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL,
+				      val);
+	} else {
+		val = reg;
+	}
+
+	rc = pci_reset_bus_function(dev, probe);
+
+	if (reg != val)
+		pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL, reg);
+
+	return rc;
+}
+
 void pci_dev_lock(struct pci_dev *dev)
 {
 	/* block PM suspend, driver probe, etc. */
@@ -5066,6 +5104,7 @@  static const struct pci_reset_fn_method pci_reset_fn_methods[] = {
 	{ pci_af_flr, .name = "af_flr" },
 	{ pci_pm_reset, .name = "pm" },
 	{ pci_reset_bus_function, .name = "bus" },
+	{ cxl_reset_bus_function, .name = "cxl_bus" },
 };
 
 static ssize_t reset_method_show(struct device *dev,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 16493426a04f..235f37715a43 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -51,7 +51,7 @@ 
 			       PCI_STATUS_PARITY)
 
 /* Number of reset methods used in pci_reset_fn_methods array in pci.c */
-#define PCI_NUM_RESET_METHODS 7
+#define PCI_NUM_RESET_METHODS 8
 
 #define PCI_RESET_PROBE		true
 #define PCI_RESET_DO_RESET	false