[kernel] powerpc/pci/of: Parse unassigned resources

Message ID 20190614025916.123589-1-aik@ozlabs.ru
State Superseded

Checks

Context                          Check    Description
snowpatch_ozlabs/checkpatch      success  total: 0 errors, 0 warnings, 0 checks, 26 lines checked
snowpatch_ozlabs/build-pmac32    success  Build succeeded
snowpatch_ozlabs/build-ppc64e    success  Build succeeded
snowpatch_ozlabs/build-ppc64be   success  Build succeeded
snowpatch_ozlabs/build-ppc64le   success  Build succeeded
snowpatch_ozlabs/apply_patch     success  Successfully applied on branch next (a3bf9fbdad600b1e4335dd90979f8d6072e4f602)

Commit Message

Alexey Kardashevskiy June 14, 2019, 2:59 a.m. UTC
The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing
which is basically reading "assigned-addresses" of every PCI device.
However if the property is missing or zero sized, then there is
no fallback of any kind and the PCI resources remain undiscovered, i.e.
pdev->resource[] array is empty.

This adds a fallback which parses the "reg" property in pretty much the same
way except it marks resources as "unset" which later makes Linux assign
those resources with proper addresses.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

This is an attempt to boot Linux directly under QEMU without SLOF/RTAS;
the aim is to use petitboot instead and let the guest kernel configure
devices.

QEMU does not allocate resources, it creates correct "reg" and zero length
"assigned-addresses" (which is probably a bug on its own) which is
normally populated by SLOF later but not during this exercise.

---
 arch/powerpc/kernel/pci_of_scan.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
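
The fallback described in the commit message can be sketched outside the
kernel roughly as follows. This is only a user-space model, not the kernel
code: struct fake_prop, struct fake_node, fake_get_property() and
select_addrs() are hypothetical names made up for illustration, and the
early return on a missing "assigned-addresses" is left out, as comment #1
in this thread says it should be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device-tree property. */
struct fake_prop {
	const char *name;
	const void *value;
	int len;
};

/* Hypothetical stand-in for a device node holding a few properties. */
struct fake_node {
	const struct fake_prop *props;
	size_t nprops;
};

/* Loose model of a property lookup: value or NULL, length via lenp. */
static const void *fake_get_property(const struct fake_node *node,
				     const char *name, int *lenp)
{
	size_t i;

	assert(node && name && lenp);
	*lenp = 0;
	for (i = 0; i < node->nprops; i++) {
		if (strcmp(node->props[i].name, name) == 0) {
			*lenp = node->props[i].len;
			return node->props[i].value;
		}
	}
	return NULL;
}

/*
 * Pick the property to parse. "assigned-addresses" wins when it is present
 * and non-empty; otherwise fall back to "reg" and report via *unset that
 * the resulting resources still need addresses assigned.
 */
static const void *select_addrs(const struct fake_node *node, int *lenp,
				bool *unset)
{
	const void *addrs = fake_get_property(node, "assigned-addresses", lenp);

	*unset = false;
	if (!addrs || !*lenp) {
		addrs = fake_get_property(node, "reg", lenp);
		if (!addrs || !*lenp)
			return NULL;
		*unset = true;
	}
	return addrs;
}
```

In the patch itself the "unset" result is what later turns into
IORESOURCE_UNSET on each parsed resource.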

Comments

Alexey Kardashevskiy June 14, 2019, 3:18 a.m. UTC | #1
On 14/06/2019 12:59, Alexey Kardashevskiy wrote:
> The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing
> which is basically reading "assigned-addresses" of every PCI device.
> However if the property is missing or zero sized, then there is
> no fallback of any kind and the PCI resources remain undiscovered, i.e.
> pdev->resource[] array is empty.
> 
> This adds a fallback which parses the "reg" property in pretty much the same
> way except it marks resources as "unset" which later makes Linux assign
> those resources with proper addresses.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> This is an attempt to boot Linux directly under QEMU without SLOF/RTAS;
> the aim is to use petitboot instead and let the guest kernel configure
> devices.
> 
> QEMU does not allocate resources, it creates correct "reg" and zero length
> "assigned-addresses" (which is probably a bug on its own) which is
> normally populated by SLOF later but not during this exercise.
> 
> ---
>  arch/powerpc/kernel/pci_of_scan.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
> index 64ad92016b63..cfe6ec3c6aaf 100644
> --- a/arch/powerpc/kernel/pci_of_scan.c
> +++ b/arch/powerpc/kernel/pci_of_scan.c
> @@ -82,10 +82,18 @@ static void of_pci_parse_addrs(struct device_node *node, struct pci_dev *dev)
>  	const __be32 *addrs;
>  	u32 i;
>  	int proplen;
> +	bool unset = false;
>  
>  	addrs = of_get_property(node, "assigned-addresses", &proplen);
>  	if (!addrs)
>  		return;


Ah. Of course, these 2 lines above should go, my bad. I'll repost if
there are no other (and bigger) problems with this.



> +	if (!addrs || !proplen) {
> +		addrs = of_get_property(node, "reg", &proplen);
> +		if (!addrs || !proplen)
> +			return;
> +		unset = true;
> +	}
> +
>  	pr_debug("    parse addresses (%d bytes) @ %p\n", proplen, addrs);
>  	for (; proplen >= 20; proplen -= 20, addrs += 5) {
>  		flags = pci_parse_of_flags(of_read_number(addrs, 1), 0);
> @@ -110,6 +118,8 @@ static void of_pci_parse_addrs(struct device_node *node, struct pci_dev *dev)
>  			continue;
>  		}
>  		res->flags = flags;
> +		if (unset)
> +			res->flags |= IORESOURCE_UNSET;
>  		res->name = pci_name(dev);
>  		region.start = base;
>  		region.end = base + size - 1;
>
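
For readers unfamiliar with the property layout: the loop in the hunk above
consumes 20 bytes per entry, i.e. five big-endian 32-bit cells (phys.hi,
phys.mid, phys.lo, size.hi, size.lo). Below is a minimal user-space sketch
of that walk. The names struct bar_entry, read_cell(), read_cells2() and
parse_entries() are hypothetical, and the sketch only tests the space-code
bits of phys.hi rather than reproducing pci_parse_of_flags().

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRY_CELLS 5                    /* phys.hi/mid/lo + size.hi/lo */
#define ENTRY_BYTES (ENTRY_CELLS * 4)
#define SPACE_MASK  0x03000000u          /* 0 here means config space */

/* Hypothetical holder for one decoded entry. */
struct bar_entry {
	uint32_t phys_hi;                /* space code, flags, register offset */
	uint64_t base;
	uint64_t size;
};

/* Read one big-endian 32-bit cell from raw property bytes. */
static uint32_t read_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read two consecutive cells as one 64-bit value. */
static uint64_t read_cells2(const uint8_t *p)
{
	return ((uint64_t)read_cell(p) << 32) | read_cell(p + 4);
}

/*
 * Walk the property 20 bytes at a time, skipping config-space and
 * zero-sized entries. Returns the number of entries stored in out[].
 */
static size_t parse_entries(const uint8_t *prop, int proplen,
			    struct bar_entry *out, size_t max)
{
	size_t n = 0;

	for (; proplen >= ENTRY_BYTES && n < max;
	     proplen -= ENTRY_BYTES, prop += ENTRY_BYTES) {
		uint32_t hi = read_cell(prop);
		uint64_t size = read_cells2(prop + 12);

		if (!(hi & SPACE_MASK) || !size)
			continue;
		out[n].phys_hi = hi;
		out[n].base = read_cells2(prop + 4);
		out[n].size = size;
		n++;
	}
	return n;
}
```

A "reg" property normally starts with a zero-sized config-space entry, which
is why the sketch (like the kernel loop) has to skip entries rather than
stop at the first one that does not describe a BAR.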

Sam Bobroff June 18, 2019, 4:02 a.m. UTC | #2
On Fri, Jun 14, 2019 at 01:18:28PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 14/06/2019 12:59, Alexey Kardashevskiy wrote:
> > The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing
> > which is basically reading "assigned-addresses" of every PCI device.
> > However if the property is missing or zero sized, then there is
> > no fallback of any kind and the PCI resources remain undiscovered, i.e.
> > pdev->resource[] array is empty.
> > 
> > This adds a fallback which parses the "reg" property in pretty much the same
> > way except it marks resources as "unset" which later makes Linux assign
> > those resources with proper addresses.
> > 
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> > 
> > This is an attempt to boot Linux directly under QEMU without SLOF/RTAS;
> > the aim is to use petitboot instead and let the guest kernel configure
> > devices.
> > 
> > QEMU does not allocate resources, it creates correct "reg" and zero length
> > "assigned-addresses" (which is probably a bug on its own) which is
> > normally populated by SLOF later but not during this exercise.

Hi Alexey,

This patch (fixed, as you point out below) also seems to improve hotplug
on pSeries.

Currently, the PCI hotplug driver for pSeries (rpaphp) uses generic PCI
scanning to add hot plugged devices, rather than slot power control,
because the slot power control method doesn't work.

AFAIK one of the reasons that slot power control doesn't work is that
the "assigned-addresses" property isn't populated by QEMU during hotplug,
so I tested this patch on a guest that has been modified to use that method.

In combination with a QEMU change to prevent PCI_PROBE_ONLY being set
(necessary to allow pcibios_finish_adding_to_bus() to do resource
allocation -- I assume you are using a similar change), I was able to
successfully hot plug a few devices!

So this change seems to be a step in the right direction.

(I also tested it with an unmodified guest, and it doesn't seem to harm
hotplugging via generic PCI scanning.)

One nit: I think that calling the variable "unset" is a bit confusing.
What about calling it "aa_missing" or something like that?

Cheers,
Sam.

> > ---
> >  arch/powerpc/kernel/pci_of_scan.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
> > index 64ad92016b63..cfe6ec3c6aaf 100644
> > --- a/arch/powerpc/kernel/pci_of_scan.c
> > +++ b/arch/powerpc/kernel/pci_of_scan.c
> > @@ -82,10 +82,18 @@ static void of_pci_parse_addrs(struct device_node *node, struct pci_dev *dev)
> >  	const __be32 *addrs;
> >  	u32 i;
> >  	int proplen;
> > +	bool unset = false;
> >  
> >  	addrs = of_get_property(node, "assigned-addresses", &proplen);
> >  	if (!addrs)
> >  		return;
> 
> 
> Ah. Of course, these 2 lines above should go, my bad. I'll repost if
> there are no other (and bigger) problems with this.
> 
> 
> 
> > +	if (!addrs || !proplen) {
> > +		addrs = of_get_property(node, "reg", &proplen);
> > +		if (!addrs || !proplen)
> > +			return;
> > +		unset = true;
> > +	}
> > +
> >  	pr_debug("    parse addresses (%d bytes) @ %p\n", proplen, addrs);
> >  	for (; proplen >= 20; proplen -= 20, addrs += 5) {
> >  		flags = pci_parse_of_flags(of_read_number(addrs, 1), 0);
> > @@ -110,6 +118,8 @@ static void of_pci_parse_addrs(struct device_node *node, struct pci_dev *dev)
> >  			continue;
> >  		}
> >  		res->flags = flags;
> > +		if (unset)
> > +			res->flags |= IORESOURCE_UNSET;
> >  		res->name = pci_name(dev);
> >  		region.start = base;
> >  		region.end = base + size - 1;
> > 
> 
> -- 
> Alexey
>

Michael Ellerman June 18, 2019, 12:15 p.m. UTC | #3
Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing
> which is basically reading "assigned-addresses" of every PCI device.
> However if the property is missing or zero sized, then there is
> no fallback of any kind and the PCI resources remain undiscovered, i.e.
> pdev->resource[] array is empty.
>
> This adds a fallback which parses the "reg" property in pretty much the same
> way except it marks resources as "unset" which later makes Linux assign
> those resources with proper addresses.

What happens under PowerVM is the big question.

ie. if we see such a device under PowerVM and then do our own assignment
what happens?

cheers

Benjamin Herrenschmidt June 18, 2019, 12:29 p.m. UTC | #4
On Tue, 2019-06-18 at 22:15 +1000, Michael Ellerman wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> > The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing
> > which is basically reading "assigned-addresses" of every PCI device.
> > However if the property is missing or zero sized, then there is
> > no fallback of any kind and the PCI resources remain undiscovered, i.e.
> > pdev->resource[] array is empty.
> > 
> > This adds a fallback which parses the "reg" property in pretty much the same
> > way except it marks resources as "unset" which later makes Linux assign
> > those resources with proper addresses.
> 
> What happens under PowerVM is the big question.
> 
> ie. if we see such a device under PowerVM and then do our own assignment
> what happens?

May or may not work ... EEH will probably be b0rked, but then it
shouldn't happen.

Basically PowerVM itself doesn't do anything special with PCI. It maps
a whole PHB (or virtual PHB) into the guest and doesn't care much
beyond that for MMIOs.

What you see in Linux getting in the way is RTAS. It's the one
assigning BAR values etc. within that region set up by the HV, but
RTAS is running in the guest, so from the HV perspective it's all the
same really.

So if such a device did exist, RTAS would lose track but it would still
work from a HW/HV perspective. RTAS-driven services such as EEH would
probably fail though.

But in practice this shouldn't happen because RTAS will set
"assigned-addresses" on everything.

Cheers,
Ben.

Alexey Kardashevskiy June 19, 2019, 1:20 a.m. UTC | #5
On 18/06/2019 22:15, Michael Ellerman wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing
>> which is basically reading "assigned-addresses" of every PCI device.
>> However if the property is missing or zero sized, then there is
>> no fallback of any kind and the PCI resources remain undiscovered, i.e.
>> pdev->resource[] array is empty.
>>
>> This adds a fallback which parses the "reg" property in pretty much the same
>> way except it marks resources as "unset" which later makes Linux assign
>> those resources with proper addresses.
> 
> What happens under PowerVM is the big question.
> 
> ie. if we see such a device under PowerVM and then do our own assignment
> what happens?

I'd be surprised not to see at least one "assigned-addresses" under
PowerVM, and a single assigned BAR is enough to keep the old behavior.

I guess I could make it depend on "linux,pci-probe-only" (which I will
need for this to work anyway), if that helps, should I?
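
One possible reading of this suggestion, sketched as a small decision
function: keep the old behavior whenever "assigned-addresses" has any
content, and take the "reg" fallback only when probe-only mode is off, since
in probe-only mode the kernel is not expected to reassign resources. The
enum and choose_source() are hypothetical, and the two lengths simply stand
in for the property sizes; this is not what the posted patch does.

```c
#include <stdbool.h>

/* Hypothetical outcome of the property selection. */
enum addr_source {
	ADDR_NONE,       /* nothing to parse, resources stay undiscovered */
	ADDR_ASSIGNED,   /* parse "assigned-addresses" (old behavior) */
	ADDR_REG_UNSET   /* parse "reg" and mark resources as unset */
};

/*
 * assigned_len and reg_len are the property sizes in bytes (0 when the
 * property is missing or empty); probe_only models the probe-only flag.
 */
static enum addr_source choose_source(int assigned_len, int reg_len,
				      bool probe_only)
{
	if (assigned_len > 0)
		return ADDR_ASSIGNED;
	if (probe_only)
		return ADDR_NONE;   /* kernel must not assign addresses itself */
	if (reg_len > 0)
		return ADDR_REG_UNSET;
	return ADDR_NONE;
}
```

With this gating a firmware-configured guest that sets probe-only would
never reach the fallback, which is the case the PowerVM question above is
concerned with.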
Patch

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 64ad92016b63..cfe6ec3c6aaf 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -82,10 +82,18 @@  static void of_pci_parse_addrs(struct device_node *node, struct pci_dev *dev)
 	const __be32 *addrs;
 	u32 i;
 	int proplen;
+	bool unset = false;
 
 	addrs = of_get_property(node, "assigned-addresses", &proplen);
 	if (!addrs)
 		return;
+	if (!addrs || !proplen) {
+		addrs = of_get_property(node, "reg", &proplen);
+		if (!addrs || !proplen)
+			return;
+		unset = true;
+	}
+
 	pr_debug("    parse addresses (%d bytes) @ %p\n", proplen, addrs);
 	for (; proplen >= 20; proplen -= 20, addrs += 5) {
 		flags = pci_parse_of_flags(of_read_number(addrs, 1), 0);
@@ -110,6 +118,8 @@  static void of_pci_parse_addrs(struct device_node *node, struct pci_dev *dev)
 			continue;
 		}
 		res->flags = flags;
+		if (unset)
+			res->flags |= IORESOURCE_UNSET;
 		res->name = pci_name(dev);
 		region.start = base;
 		region.end = base + size - 1;