[v7,7/7] PCI: qcom: Add OPP support to scale performance state of power domain

Message ID 20240223-opp_support-v7-7-10b4363d7e71@quicinc.com
State New
Series PCI: qcom: Add support for OPP

Commit Message

Krishna chaitanya chundru Feb. 23, 2024, 2:48 p.m. UTC
QCOM Resource Power Manager-hardened (RPMh) is a hardware block which
maintains hardware state of a regulator by performing max aggregation of
the requests made by all of the clients.
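
For illustration only, a minimal standalone sketch of the max-aggregation
rule (the block keeps the resource at the highest state any client has
voted for; all names below are hypothetical, not the RPMh driver API):

        /* Hypothetical sketch of RPMh-style max aggregation. */
        #include <stdio.h>
        #include <stddef.h>

        static unsigned int rpmh_max_aggregate(const unsigned int *votes,
                                               size_t n)
        {
                unsigned int state = 0;         /* lowest performance state */
                size_t i;

                for (i = 0; i < n; i++)
                        if (votes[i] > state)
                                state = votes[i];
                return state;                   /* resource follows the max */
        }

        int main(void)
        {
                unsigned int votes[] = { 1, 3, 2 };     /* three clients */

                /* Prints 3: the highest request wins. */
                printf("%u\n", rpmh_max_aggregate(votes, 3));
                return 0;
        }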

PCIe controller can operate on different RPMh performance state of power
domain based up on the speed of the link. And this performance state varies
from target to target.

It is manadate to scale the performance state based up on the PCIe speed
link operates so that SoC can run under optimum power conditions.

Add Operating Performance Points(OPP) support to vote for RPMh state based
upon the speed link is operating.

OPP can handle ICC bw voting also, so move ICC bw voting through OPP
framework if OPP entries are present.

In PCIe certain speeds like GEN1x2 & GEN2x1 or GEN3x2 & GEN4x1 use
same bw and frequency and thus the OPP entry, so use frequency based
search to reduce number of entries in the OPP table.
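
For illustration, the per-lane data rates behind this (the Mb/s values
follow PCIE_SPEED2MBS_ENC() in drivers/pci/pci.h) show why the
speed-times-width products collide; this standalone sketch is not driver
code:

        /* Why equal-bandwidth link configs can share one OPP entry. */
        #include <stdio.h>

        /* Per-lane data rate in Mb/s after encoding overhead. */
        static unsigned long lane_mbps(int gen)
        {
                switch (gen) {
                case 1: return 2000;    /* 2.5 GT/s, 8b/10b */
                case 2: return 4000;    /* 5.0 GT/s, 8b/10b */
                case 3: return 7877;    /* 8.0 GT/s, 128b/130b */
                case 4: return 15754;   /* 16.0 GT/s, 128b/130b */
                default: return 0;
                }
        }

        int main(void)
        {
                /* GEN1 x2 == GEN2 x1 and GEN3 x2 == GEN4 x1 */
                printf("GEN1 x2 = %lu\n", lane_mbps(1) * 2);    /* 4000 */
                printf("GEN2 x1 = %lu\n", lane_mbps(2) * 1);    /* 4000 */
                printf("GEN3 x2 = %lu\n", lane_mbps(3) * 2);    /* 15754 */
                printf("GEN4 x1 = %lu\n", lane_mbps(4) * 1);    /* 15754 */
                return 0;
        }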

Don't initialize ICC if OPP is supported.

Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
---
 drivers/pci/controller/dwc/pcie-qcom.c | 75 +++++++++++++++++++++++++++-------
 1 file changed, 61 insertions(+), 14 deletions(-)

Comments

Bjorn Helgaas Feb. 27, 2024, 11:36 p.m. UTC | #1
On Fri, Feb 23, 2024 at 08:18:04PM +0530, Krishna chaitanya chundru wrote:
> QCOM Resource Power Manager-hardened (RPMh) is a hardware block which
> maintains hardware state of a regulator by performing max aggregation of
> the requests made by all of the clients.
> 
> PCIe controller can operate on different RPMh performance state of power
> domain based up on the speed of the link. And this performance state varies
> from target to target.

s/up on/on/ (or "upon" if you prefer) (also below)

I understand changing the performance state based on the link speed,
but I don't understand the variation from target to target.  Do you
mean just that the link speed may vary based on the rates supported by
the downstream device?

> It is manadate to scale the performance state based up on the PCIe speed
> link operates so that SoC can run under optimum power conditions.

It sounds like it's more power efficient, but not actually
*mandatory*.  Maybe something like this?

  The SoC can be more power efficient if we scale the performance
  state based on the aggregate PCIe link speed.

> Add Operating Performance Points(OPP) support to vote for RPMh state based
> upon the speed link is operating.

Space before open paren, e.g., "Points (OPP)".

"... based on the link speed."

> OPP can handle ICC bw voting also, so move ICC bw voting through OPP
> framework if OPP entries are present.
> 
> In PCIe certain speeds like GEN1x2 & GEN2x1 or GEN3x2 & GEN4x1 use
> same bw and frequency and thus the OPP entry, so use frequency based
> search to reduce number of entries in the OPP table.

GEN1x2, GEN2x1, etc are not "speeds".  I would say:

  Different link configurations may share the same aggregate speed,
  e.g., a 2.5 GT/s x2 link and a 5.0 GT/s x1 link have the same speed
  and share the same OPP entry.

> Don't initialize ICC if OPP is supported.

Because?  Maybe this should say something about OPP including the ICC
voting?

> +		ret = icc_set_bw(pcie->icc_mem, 0, width * QCOM_PCIE_LINK_SPEED_TO_BW(speed));

Wrap to fit in 80 columns.
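
One possible wrapping, keeping the continuation aligned with the open
paren (illustrative only):

        ret = icc_set_bw(pcie->icc_mem, 0,
                         width * QCOM_PCIE_LINK_SPEED_TO_BW(speed));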

> +	 * Use highest OPP here if the OPP table is present. At the end of the probe(),
> +	 * OPP will be updated using qcom_pcie_icc_opp_update().

Wrap to fit in 80 columns.

> +	/* Skip ICC init if OPP is supported as ICC bw vote is handled by OPP framework */

Wrap to fit in 80 columns.
Bjorn Helgaas Feb. 27, 2024, 11:45 p.m. UTC | #2
On Tue, Feb 27, 2024 at 05:36:38PM -0600, Bjorn Helgaas wrote:
> On Fri, Feb 23, 2024 at 08:18:04PM +0530, Krishna chaitanya chundru wrote:
> > QCOM Resource Power Manager-hardened (RPMh) is a hardware block which
> > maintains hardware state of a regulator by performing max aggregation of
> > the requests made by all of the clients.

> > It is manadate to scale the performance state based up on the PCIe speed
> > link operates so that SoC can run under optimum power conditions.
> 
> It sounds like it's more power efficient, but not actually
> *mandatory*.  Maybe something like this?
> 
>   The SoC can be more power efficient if we scale the performance
>   state based on the aggregate PCIe link speed.

Actually, maybe it would be better to say "aggregate PCIe link
bandwidth", because we use "speed" elsewhere (PCIE_SPEED2MBS_ENC(),
etc) to refer specifically to the data rate independent of the width.

> > Add Operating Performance Points(OPP) support to vote for RPMh state based
> > upon the speed link is operating.
> 
> "... based on the link speed."

"... based on the aggregate link bandwidth."

> > In PCIe certain speeds like GEN1x2 & GEN2x1 or GEN3x2 & GEN4x1 use
> > same bw and frequency and thus the OPP entry, so use frequency based
> > search to reduce number of entries in the OPP table.
> 
> GEN1x2, GEN2x1, etc are not "speeds".  I would say:
> 
>   Different link configurations may share the same aggregate speed,
>   e.g., a 2.5 GT/s x2 link and a 5.0 GT/s x1 link have the same speed
>   and share the same OPP entry.

  Different link configurations may share the same aggregate
  bandwidth, e.g., a 2.5 GT/s x2 link and a 5.0 GT/s x1 link
  have the same bandwidth and share the same OPP entry.
Krishna chaitanya chundru Feb. 28, 2024, 6:48 a.m. UTC | #3
On 2/28/2024 5:15 AM, Bjorn Helgaas wrote:
> On Tue, Feb 27, 2024 at 05:36:38PM -0600, Bjorn Helgaas wrote:
>> On Fri, Feb 23, 2024 at 08:18:04PM +0530, Krishna chaitanya chundru wrote:
>>> QCOM Resource Power Manager-hardened (RPMh) is a hardware block which
>>> maintains hardware state of a regulator by performing max aggregation of
>>> the requests made by all of the clients.
> 
>>> It is manadate to scale the performance state based up on the PCIe speed
>>> link operates so that SoC can run under optimum power conditions.
>>
>> It sounds like it's more power efficient, but not actually
>> *mandatory*.  Maybe something like this?
>>
>>    The SoC can be more power efficient if we scale the performance
>>    state based on the aggregate PCIe link speed.
> 
> Actually, maybe it would be better to say "aggregate PCIe link
> bandwidth", because we use "speed" elsewhere (PCIE_SPEED2MBS_ENC(),
> etc) to refer specifically to the data rate independent of the width.
> 
>>> Add Operating Performance Points(OPP) support to vote for RPMh state based
>>> upon the speed link is operating.
>>
>> "... based on the link speed."
> 
> "... based on the aggregate link bandwidth."
> 
>>> In PCIe certain speeds like GEN1x2 & GEN2x1 or GEN3x2 & GEN4x1 use
>>> same bw and frequency and thus the OPP entry, so use frequency based
>>> search to reduce number of entries in the OPP table.
>>
>> GEN1x2, GEN2x1, etc are not "speeds".  I would say:
>>
>>    Different link configurations may share the same aggregate speed,
>>    e.g., a 2.5 GT/s x2 link and a 5.0 GT/s x1 link have the same speed
>>    and share the same OPP entry.
> 
>    Different link configurations may share the same aggregate
>    bandwidth, e.g., a 2.5 GT/s x2 link and a 5.0 GT/s x1 link
>    have the same bandwidth and share the same OPP entry.
- I will update the commit message as suggested in my next series.

- Krishna Chaitanya.

Patch

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 088ebd2e5865..c608bec8b9cb 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -22,6 +22,7 @@ 
 #include <linux/of.h>
 #include <linux/of_gpio.h>
 #include <linux/pci.h>
+#include <linux/pm_opp.h>
 #include <linux/pm_runtime.h>
 #include <linux/platform_device.h>
 #include <linux/phy/pcie.h>
@@ -244,6 +245,7 @@  struct qcom_pcie {
 	const struct qcom_pcie_cfg *cfg;
 	struct dentry *debugfs;
 	bool suspended;
+	bool opp_supported;
 };
 
 #define to_qcom_pcie(x)		dev_get_drvdata((x)->dev)
@@ -1404,16 +1406,14 @@  static int qcom_pcie_icc_init(struct qcom_pcie *pcie)
 	return 0;
 }
 
-static void qcom_pcie_icc_update(struct qcom_pcie *pcie)
+static void qcom_pcie_icc_opp_update(struct qcom_pcie *pcie)
 {
 	struct dw_pcie *pci = pcie->pci;
-	u32 offset, status;
+	u32 offset, status, freq;
+	struct dev_pm_opp *opp;
 	int speed, width;
 	int ret;
 
-	if (!pcie->icc_mem)
-		return;
-
 	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
 	status = readw(pci->dbi_base + offset + PCI_EXP_LNKSTA);
 
@@ -1424,11 +1424,26 @@  static void qcom_pcie_icc_update(struct qcom_pcie *pcie)
 	speed = FIELD_GET(PCI_EXP_LNKSTA_CLS, status);
 	width = FIELD_GET(PCI_EXP_LNKSTA_NLW, status);
 
-	ret = icc_set_bw(pcie->icc_mem, 0, width * QCOM_PCIE_LINK_SPEED_TO_BW(speed));
-	if (ret) {
-		dev_err(pci->dev, "failed to set interconnect bandwidth: %d\n",
-			ret);
+	if (pcie->opp_supported) {
+		freq = PCIE_MBS2FREQ(pcie_link_speed[speed]);
+
+		opp = dev_pm_opp_find_freq_exact(pci->dev, freq * width, true);
+		if (!IS_ERR(opp)) {
+			ret = dev_pm_opp_set_opp(pci->dev, opp);
+			if (ret)
+				dev_err(pci->dev, "Failed to set opp: freq %ld ret %d\n",
+					dev_pm_opp_get_freq(opp), ret);
+			dev_pm_opp_put(opp);
+		}
+	} else {
+		ret = icc_set_bw(pcie->icc_mem, 0, width * QCOM_PCIE_LINK_SPEED_TO_BW(speed));
+		if (ret) {
+			dev_err(pci->dev, "failed to set interconnect bandwidth for pcie-mem: %d\n",
+				ret);
+		}
 	}
+
+	return;
 }
 
 static int qcom_pcie_link_transition_count(struct seq_file *s, void *data)
@@ -1471,8 +1486,10 @@  static void qcom_pcie_init_debugfs(struct qcom_pcie *pcie)
 static int qcom_pcie_probe(struct platform_device *pdev)
 {
 	const struct qcom_pcie_cfg *pcie_cfg;
+	unsigned long max_freq = INT_MAX;
 	struct device *dev = &pdev->dev;
 	struct qcom_pcie *pcie;
+	struct dev_pm_opp *opp;
 	struct dw_pcie_rp *pp;
 	struct resource *res;
 	struct dw_pcie *pci;
@@ -1539,9 +1556,36 @@  static int qcom_pcie_probe(struct platform_device *pdev)
 		goto err_pm_runtime_put;
 	}
 
-	ret = qcom_pcie_icc_init(pcie);
-	if (ret)
+	 /* OPP table is optional */
+	ret = devm_pm_opp_of_add_table(dev);
+	if (ret && ret != -ENODEV) {
+		dev_err_probe(dev, ret, "Failed to add OPP table\n");
 		goto err_pm_runtime_put;
+	}
+
+	/*
+	 * Use highest OPP here if the OPP table is present. At the end of the probe(),
+	 * OPP will be updated using qcom_pcie_icc_opp_update().
+	 */
+	if (ret != -ENODEV) {
+		opp = dev_pm_opp_find_freq_floor(dev, &max_freq);
+		if (!IS_ERR(opp)) {
+			ret = dev_pm_opp_set_opp(dev, opp);
+			if (ret)
+				dev_err_probe(pci->dev, ret,
+					      "Failed to set opp: freq %ld\n",
+					      dev_pm_opp_get_freq(opp));
+			dev_pm_opp_put(opp);
+		}
+		pcie->opp_supported = true;
+	}
+
+	/* Skip ICC init if OPP is supported as ICC bw vote is handled by OPP framework */
+	if (!pcie->opp_supported) {
+		ret = qcom_pcie_icc_init(pcie);
+		if (ret)
+			goto err_pm_runtime_put;
+	}
 
 	ret = pcie->cfg->ops->get_resources(pcie);
 	if (ret)
@@ -1561,7 +1605,7 @@  static int qcom_pcie_probe(struct platform_device *pdev)
 		goto err_phy_exit;
 	}
 
-	qcom_pcie_icc_update(pcie);
+	qcom_pcie_icc_opp_update(pcie);
 
 	if (pcie->mhi)
 		qcom_pcie_init_debugfs(pcie);
@@ -1612,7 +1656,7 @@  static int qcom_pcie_suspend_noirq(struct device *dev)
 		pcie->suspended = true;
 	}
 
-	/* Remove cpu path vote after all the register access is done */
+	/* Remove CPU path vote after all the register access is done */
 	ret = icc_disable(pcie->icc_cpu);
 	if (ret) {
 		dev_err(dev, "failed to disable icc path of cpu-pcie: %d\n", ret);
@@ -1624,6 +1668,9 @@  static int qcom_pcie_suspend_noirq(struct device *dev)
 		return ret;
 	}
 
+	if (pcie->opp_supported)
+		dev_pm_opp_set_opp(pcie->pci->dev, NULL);
+
 	return 0;
 }
 
@@ -1646,7 +1693,7 @@  static int qcom_pcie_resume_noirq(struct device *dev)
 		pcie->suspended = false;
 	}
 
-	qcom_pcie_icc_update(pcie);
+	qcom_pcie_icc_opp_update(pcie);
 
 	return 0;
 }