diff mbox series

powerpc/pci: Add config option for using OF 'reg' for PCI domain

Message ID 20220504175718.29011-1-pali@kernel.org (mailing list archive)
State Changes Requested
Headers show
Series powerpc/pci: Add config option for using OF 'reg' for PCI domain | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 7 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 24 jobs.
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.

Commit Message

Pali Rohár May 4, 2022, 5:57 p.m. UTC
Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on
device-tree properties"), powerpc kernel always fallback to PCI domain
assignment from OF / Device Tree 'reg' property of the PCI controller.

PCI code for other Linux architectures use increasing assignment of the PCI
domain for individual controllers (assign the first free number), like it
was also for powerpc prior mentioned commit.

Upgrading powerpc kernels from LTS 4.4 version (which does not contain
mentioned commit) to new LTS versions brings a regression in domain
assignment.

Fix this issue by introducing a new option CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG
When this options is disabled then powerpc kernel would assign PCI domains
in the similar way like it is doing kernel for other architectures and also
how it was done prior that commit.

Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
Signed-off-by: Pali Rohár <pali@kernel.org>
---
 arch/powerpc/Kconfig             | 10 ++++++++++
 arch/powerpc/kernel/pci-common.c |  4 ++--
 2 files changed, 12 insertions(+), 2 deletions(-)

Comments

Christophe Leroy May 5, 2022, 7:16 a.m. UTC | #1
Le 04/05/2022 à 19:57, Pali Rohár a écrit :
> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on
> device-tree properties"), powerpc kernel always fallback to PCI domain
> assignment from OF / Device Tree 'reg' property of the PCI controller.
> 
> PCI code for other Linux architectures use increasing assignment of the PCI
> domain for individual controllers (assign the first free number), like it
> was also for powerpc prior mentioned commit.
> 
> Upgrading powerpc kernels from LTS 4.4 version (which does not contain
> mentioned commit) to new LTS versions brings a regression in domain
> assignment.

Can you elaborate why it is a regression ?

That commit says 'no functionnal changes', I'm having hard time 
understanding how a nochange can be a regression.

Usually we don't commit regressions to mainline ...


> 
> Fix this issue by introducing a new option CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG
> When this options is disabled then powerpc kernel would assign PCI domains
> in the similar way like it is doing kernel for other architectures and also
> how it was done prior that commit.

You don't define CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG on by default, it 
means this commit will change the behaviour. Is that expected ?

Is that really worth a user selectable option ? Is the user able to 
decide what he needs ?

> 
> Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")

Is that really a fix ? What is the problem really ?

> Signed-off-by: Pali Rohár <pali@kernel.org>
> ---
>   arch/powerpc/Kconfig             | 10 ++++++++++
>   arch/powerpc/kernel/pci-common.c |  4 ++--
>   2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 174edabb74fa..4dd3e3acddda 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -375,6 +375,16 @@ config PPC_OF_PLATFORM_PCI
>   	depends on PCI
>   	depends on PPC64 # not supported on 32 bits yet
>   
> +config PPC_PCI_DOMAIN_FROM_OF_REG
> +	bool "Use OF reg property for PCI domain"
> +	depends on PCI

Should it depend on PPC_OF_PLATFORM_PCI instead ?

> +	help
> +	  By default PCI domain for host bridge during its registration is
> +	  chosen as the lowest unused PCI domain number.
> +
> +	  When this option is enabled then PCI domain is determined from
> +	  the OF / Device Tree 'reg' property.
> +
>   config ARCH_SUPPORTS_UPROBES
>   	def_bool y
>   
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index 8bc9cf62cd93..8cb6fc5302ae 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -74,7 +74,6 @@ void __init set_pci_dma_ops(const struct dma_map_ops *dma_ops)
>   static int get_phb_number(struct device_node *dn)
>   {
>   	int ret, phb_id = -1;
> -	u32 prop_32;
>   	u64 prop;
>   
>   	/*
> @@ -83,7 +82,8 @@ static int get_phb_number(struct device_node *dn)
>   	 * reading "ibm,opal-phbid", only present in OPAL environment.
>   	 */
>   	ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);

This looks like very specific, it is not reflected in the commit log.

> -	if (ret) {
> +	if (ret && IS_ENABLED(CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG)) {
> +		u32 prop_32;
>   		ret = of_property_read_u32_index(dn, "reg", 1, &prop_32);
>   		prop = prop_32;
>   	}
Pali Rohár May 5, 2022, 9:31 a.m. UTC | #2
Hello!

On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
> > Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on
> > device-tree properties"), powerpc kernel always fallback to PCI domain
> > assignment from OF / Device Tree 'reg' property of the PCI controller.
> > 
> > PCI code for other Linux architectures use increasing assignment of the PCI
> > domain for individual controllers (assign the first free number), like it
> > was also for powerpc prior mentioned commit.
> > 
> > Upgrading powerpc kernels from LTS 4.4 version (which does not contain
> > mentioned commit) to new LTS versions brings a regression in domain
> > assignment.
> 
> Can you elaborate why it is a regression ?
> 
> That commit says 'no functionnal changes', I'm having hard time 
> understanding how a nochange can be a regression.

It is not 'no functional change'. That commit completely changed PCI
domain assignment in a way that is incompatible with other architectures
and also incompatible with the way how it was done prior that commit.

For example, prior that commit on P2020 RDB board were PCI domains 0, 1 and 2.

$ lspci
0000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
0000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0001:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
0001:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
0002:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
0002:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter

After that commit on P2020 RDB board are PCI domains 0x8000, 0x9000 and 0xa000.

$ lspci
8000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
8000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
9000:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
9000:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
a000:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
a000:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter

It is somehow strange that PCI domains are not indexed one by one and
also that there is no domain 0.

With my patch when CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG is not set, then
previous behavior used and PCI domains are again 0, 1 and 2.

> Usually we don't commit regressions to mainline ...
> 
> 
> > 
> > Fix this issue by introducing a new option CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG
> > When this options is disabled then powerpc kernel would assign PCI domains
> > in the similar way like it is doing kernel for other architectures and also
> > how it was done prior that commit.
> 
> You don't define CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG on by default, it 
> means this commit will change the behaviour. Is that expected ?
> 
> Is that really worth a user selectable option ? Is the user able to 
> decide what he needs ?

Well, I hope that maintainers of that code answer to these questions.

In any case, I think that it could be a user selectable option as in
that commit is explained that in some situation is makes sense to do
PCI domain numbering based on DT reg.

But as I pointed above, upgrading from 4.4 TLS kernel to some new TLS
kernel brings above regression, so I think that there should be a way to
disable this behavior.

In my opinion, for people who are upgrading from 4.4 TLS kernel, this
option should be turned off by default (= do not change behavior). For
people who want same behaviour on powerpc as on other platforms, also it
should be turned off by default.

> > 
> > Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
> 
> Is that really a fix ? What is the problem really ?

Problem is that PCI domains were changed in a way that is not compatible
neither with version prior that commit and neither with how other linux
platforms assign PCI domains for controllers.

> > Signed-off-by: Pali Rohár <pali@kernel.org>
> > ---
> >   arch/powerpc/Kconfig             | 10 ++++++++++
> >   arch/powerpc/kernel/pci-common.c |  4 ++--
> >   2 files changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 174edabb74fa..4dd3e3acddda 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -375,6 +375,16 @@ config PPC_OF_PLATFORM_PCI
> >   	depends on PCI
> >   	depends on PPC64 # not supported on 32 bits yet
> >   
> > +config PPC_PCI_DOMAIN_FROM_OF_REG
> > +	bool "Use OF reg property for PCI domain"
> > +	depends on PCI
> 
> Should it depend on PPC_OF_PLATFORM_PCI instead ?

No, PPC_OF_PLATFORM_PCI has line "depends on PPC64 # not supported on 32
bits yet". But it is already used also for e.g. P2020 which is 32-bit
platform.

> > +	help
> > +	  By default PCI domain for host bridge during its registration is
> > +	  chosen as the lowest unused PCI domain number.
> > +
> > +	  When this option is enabled then PCI domain is determined from
> > +	  the OF / Device Tree 'reg' property.
> > +
> >   config ARCH_SUPPORTS_UPROBES
> >   	def_bool y
> >   
> > diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> > index 8bc9cf62cd93..8cb6fc5302ae 100644
> > --- a/arch/powerpc/kernel/pci-common.c
> > +++ b/arch/powerpc/kernel/pci-common.c
> > @@ -74,7 +74,6 @@ void __init set_pci_dma_ops(const struct dma_map_ops *dma_ops)
> >   static int get_phb_number(struct device_node *dn)
> >   {
> >   	int ret, phb_id = -1;
> > -	u32 prop_32;
> >   	u64 prop;
> >   
> >   	/*
> > @@ -83,7 +82,8 @@ static int get_phb_number(struct device_node *dn)
> >   	 * reading "ibm,opal-phbid", only present in OPAL environment.
> >   	 */
> >   	ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);
> 
> This looks like very specific, it is not reflected in the commit log.

I have not changed nor touched this "ibm,opal-phbid" setting. And it was
not also touched in that mentioned patch. I see that no DTS file in
kernel use this option (so probably only DTS files supplied by
bootloader use it). So I thought that there is not reason to mention in
commit message.

But if you think so, I can add some info to commit message about it.

> > -	if (ret) {
> > +	if (ret && IS_ENABLED(CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG)) {
> > +		u32 prop_32;
> >   		ret = of_property_read_u32_index(dn, "reg", 1, &prop_32);
> >   		prop = prop_32;
> >   	}
Tyrel Datwyler May 5, 2022, 10:10 p.m. UTC | #3
On 5/5/22 02:31, Pali Rohár wrote:
> Hello!
> 
> On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
>> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
>>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on
>>> device-tree properties"), powerpc kernel always fallback to PCI domain
>>> assignment from OF / Device Tree 'reg' property of the PCI controller.
>>>
>>> PCI code for other Linux architectures use increasing assignment of the PCI
>>> domain for individual controllers (assign the first free number), like it
>>> was also for powerpc prior mentioned commit.
>>>
>>> Upgrading powerpc kernels from LTS 4.4 version (which does not contain
>>> mentioned commit) to new LTS versions brings a regression in domain
>>> assignment.
>>
>> Can you elaborate why it is a regression ?
>>63a72284b159
>> That commit says 'no functionnal changes', I'm having hard time 
>> understanding how a nochange can be a regression.
> 
> It is not 'no functional change'. That commit completely changed PCI
> domain assignment in a way that is incompatible with other architectures
> and also incompatible with the way how it was done prior that commit.

I agree that the "no functional change" statement is incorrect. However, for
most powerpc platforms it ended up being simply a cosmetic behavior change. As
far as I can tell there is nothing requiring domain ids to increase montonically
from zero or that each architecture is required to use the same domain numbering
scheme. Its hard to call this a true regression unless it actually broke
something. The commit in question has been in the kernel since 4.8 which was
released over 5 1/2 years ago.

With all that said looking closer at the code in question I think it is fair to
assume that the author only intended this change for powernv and pseries
platforms and not every powerpc platform. That change was done to make
persistent naming easier to manage in userspace. Your change defaults back to
the old behavior which will now break both powernv and pseries platforms with
regard to hotplugging and persistent naming.

We could properly limit it to powernv and pseries by using ibm,fw-phb-id instead
of reg property in the look up that follows a failed ibm,opal-phbid lookup. I
think this is acceptable as long as no other powerpc platforms have started
using this behavior for persistent naming.

-Tyrel

> For example, prior that commit on P2020 RDB board were PCI domains 0, 1 and 2.
> 
> $ lspci
> 0000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> 0000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> 0001:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> 0001:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> 0002:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> 0002:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> 
> After that commit on P2020 RDB board are PCI domains 0x8000, 0x9000 and 0xa000.
> 
> $ lspci
> 8000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> 8000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> 9000:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> 9000:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> a000:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> a000:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> 
> It is somehow strange that PCI domains are not indexed one by one and
> also that there is no domain 0
> 
> With my patch when CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG is not set, then
> previous behavior used and PCI domains are again 0, 1 and 2.
> 
>> Usually we don't commit regressions to mainline ...
>>
>>
>>>
>>> Fix this issue by introducing a new option CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG
>>> When this options is disabled then powerpc kernel would assign PCI domains
>>> in the similar way like it is doing kernel for other architectures and also
>>> how it was done prior that commit.
>>
>> You don't define CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG on by default, it 
>> means this commit will change the behaviour. Is that expected ?
>>
>> Is that really worth a user selectable option ? Is the user able to 
>> decide what he needs ?
> 
> Well, I hope that maintainers of that code answer to these questions.
> 
> In any case, I think that it could be a user selectable option as in
> that commit is explained that in some situation is makes sense to do
> PCI domain numbering based on DT reg.
> 
> But as I pointed above, upgrading from 4.4 TLS kernel to some new TLS
> kernel brings above regression, so I think that there should be a way to
> disable this behavior.
> 
> In my opinion, for people who are upgrading from 4.4 TLS kernel, this
> option should be turned off by default (= do not change behavior). For
> people who want same behaviour on powerpc as on other platforms, also it
> should be turned off by default.
> 
>>>
>>> Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
>>
>> Is that really a fix ? What is the problem really ?
> 
> Problem is that PCI domains were changed in a way that is not compatible
> neither with version prior that commit and neither with how other linux
> platforms assign PCI domains for controllers.
> 
>>> Signed-off-by: Pali Rohár <pali@kernel.org>
>>> ---
>>>   arch/powerpc/Kconfig             | 10 ++++++++++
>>>   arch/powerpc/kernel/pci-common.c |  4 ++--
>>>   2 files changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> index 174edabb74fa..4dd3e3acddda 100644
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -375,6 +375,16 @@ config PPC_OF_PLATFORM_PCI
>>>   	depends on PCI
>>>   	depends on PPC64 # not supported on 32 bits yet
>>>   
>>> +config PPC_PCI_DOMAIN_FROM_OF_REG
>>> +	bool "Use OF reg property for PCI domain"
>>> +	depends on PCI
>>
>> Should it depend on PPC_OF_PLATFORM_PCI instead ?
> 
> No, PPC_OF_PLATFORM_PCI has line "depends on PPC64 # not supported on 32
> bits yet". But it is already used also for e.g. P2020 which is 32-bit
> platform.
> 
>>> +	help
>>> +	  By default PCI domain for host bridge during its registration is
>>> +	  chosen as the lowest unused PCI domain number.
>>> +
>>> +	  When this option is enabled then PCI domain is determined from
>>> +	  the OF / Device Tree 'reg' property.
>>> +
>>>   config ARCH_SUPPORTS_UPROBES
>>>   	def_bool y
>>>   
>>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>>> index 8bc9cf62cd93..8cb6fc5302ae 100644
>>> --- a/arch/powerpc/kernel/pci-common.c
>>> +++ b/arch/powerpc/kernel/pci-common.c
>>> @@ -74,7 +74,6 @@ void __init set_pci_dma_ops(const struct dma_map_ops *dma_ops)
>>>   static int get_phb_number(struct device_node *dn)
>>>   {
>>>   	int ret, phb_id = -1;
>>> -	u32 prop_32;
>>>   	u64 prop;
>>>   
>>>   	/*
>>> @@ -83,7 +82,8 @@ static int get_phb_number(struct device_node *dn)
>>>   	 * reading "ibm,opal-phbid", only present in OPAL environment.
>>>   	 */
>>>   	ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);
>>
>> This looks like very specific, it is not reflected in the commit log.
> 
> I have not changed nor touched this "ibm,opal-phbid" setting. And it was
> not also touched in that mentioned patch. I see that no DTS file in
> kernel use this option (so probably only DTS files supplied by
> bootloader use it). So I thought that there is not reason to mention in
> commit message.
> 
> But if you think so, I can add some info to commit message about it.
> 
>>> -	if (ret) {
>>> +	if (ret && IS_ENABLED(CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG)) {
>>> +		u32 prop_32;
>>>   		ret = of_property_read_u32_index(dn, "reg", 1, &prop_32);
>>>   		prop = prop_32;
>>>   	}
Pali Rohár May 5, 2022, 10:33 p.m. UTC | #4
On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote:
> On 5/5/22 02:31, Pali Rohár wrote:
> > Hello!
> > 
> > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
> >> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
> >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on
> >>> device-tree properties"), powerpc kernel always fallback to PCI domain
> >>> assignment from OF / Device Tree 'reg' property of the PCI controller.
> >>>
> >>> PCI code for other Linux architectures use increasing assignment of the PCI
> >>> domain for individual controllers (assign the first free number), like it
> >>> was also for powerpc prior mentioned commit.
> >>>
> >>> Upgrading powerpc kernels from LTS 4.4 version (which does not contain
> >>> mentioned commit) to new LTS versions brings a regression in domain
> >>> assignment.
> >>
> >> Can you elaborate why it is a regression ?
> >>63a72284b159
> >> That commit says 'no functionnal changes', I'm having hard time 
> >> understanding how a nochange can be a regression.
> > 
> > It is not 'no functional change'. That commit completely changed PCI
> > domain assignment in a way that is incompatible with other architectures
> > and also incompatible with the way how it was done prior that commit.
> 
> I agree that the "no functional change" statement is incorrect. However, for
> most powerpc platforms it ended up being simply a cosmetic behavior change. As
> far as I can tell there is nothing requiring domain ids to increase montonically
> from zero or that each architecture is required to use the same domain numbering
> scheme.

That is truth. But it looks really suspicious why domains are not
assigned monotonically. Some scripts / applications are using PCI
location (domain:bus:dev:func) for remembering PCI device and domain
change can cause issue for config files. And some (older) applications
expects existence of domain zero. In systems without hot plug support
with small number of domains (e.g. 3) it means that there are always
domains 0, 1 and 2.

> Its hard to call this a true regression unless it actually broke
> something. The commit in question has been in the kernel since 4.8 which was
> released over 5 1/2 years ago.

I agree, it really depends on how you look at it.

The important is that lot of people are using LTS versions and are doing
upgrades when LTS support is dropped. Which for 4.4 now happened. So not
all smaller or "cosmetic" changes could be detected until longer LTS
period pass.

> With all that said looking closer at the code in question I think it is fair to
> assume that the author only intended this change for powernv and pseries
> platforms and not every powerpc platform. That change was done to make
> persistent naming easier to manage in userspace.

I agree that this behavior change may be useful in some situations and I
do not object this need.

> Your change defaults back to
> the old behavior which will now break both powernv and pseries platforms with
> regard to hotplugging and persistent naming.

I was aware of it, that change could cause issues. And that is why I
added config option for choosing behavior. So users would be able to
choose what they need.

> We could properly limit it to powernv and pseries by using ibm,fw-phb-id instead
> of reg property in the look up that follows a failed ibm,opal-phbid lookup. I
> think this is acceptable as long as no other powerpc platforms have started
> using this behavior for persistent naming.

And what about setting that new config option to enabled by default for
those series?

Or is there issue with introduction of the new config option?

One of the point is that it is really a good idea to have similar/same
behavior for all linux platforms. And if it cannot be enabled by default
(for backward compatibility) add at least some option, so new platforms
can start using it or users can decide to switch behavior.

> -Tyrel
> 
> > For example, prior that commit on P2020 RDB board were PCI domains 0, 1 and 2.
> > 
> > $ lspci
> > 0000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > 0000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> > 0001:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > 0001:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> > 0002:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > 0002:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> > 
> > After that commit on P2020 RDB board are PCI domains 0x8000, 0x9000 and 0xa000.
> > 
> > $ lspci
> > 8000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > 8000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> > 9000:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > 9000:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> > a000:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > a000:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> > 
> > It is somehow strange that PCI domains are not indexed one by one and
> > also that there is no domain 0
> > 
> > With my patch when CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG is not set, then
> > previous behavior used and PCI domains are again 0, 1 and 2.
> > 
> >> Usually we don't commit regressions to mainline ...
> >>
> >>
> >>>
> >>> Fix this issue by introducing a new option CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG
> >>> When this options is disabled then powerpc kernel would assign PCI domains
> >>> in the similar way like it is doing kernel for other architectures and also
> >>> how it was done prior that commit.
> >>
> >> You don't define CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG on by default, it 
> >> means this commit will change the behaviour. Is that expected ?
> >>
> >> Is that really worth a user selectable option ? Is the user able to 
> >> decide what he needs ?
> > 
> > Well, I hope that maintainers of that code answer to these questions.
> > 
> > In any case, I think that it could be a user selectable option as in
> > that commit is explained that in some situation is makes sense to do
> > PCI domain numbering based on DT reg.
> > 
> > But as I pointed above, upgrading from 4.4 TLS kernel to some new TLS
> > kernel brings above regression, so I think that there should be a way to
> > disable this behavior.
> > 
> > In my opinion, for people who are upgrading from 4.4 TLS kernel, this
> > option should be turned off by default (= do not change behavior). For
> > people who want same behaviour on powerpc as on other platforms, also it
> > should be turned off by default.
> > 
> >>>
> >>> Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
> >>
> >> Is that really a fix ? What is the problem really ?
> > 
> > Problem is that PCI domains were changed in a way that is not compatible
> > neither with version prior that commit and neither with how other linux
> > platforms assign PCI domains for controllers.
> > 
> >>> Signed-off-by: Pali Rohár <pali@kernel.org>
> >>> ---
> >>>   arch/powerpc/Kconfig             | 10 ++++++++++
> >>>   arch/powerpc/kernel/pci-common.c |  4 ++--
> >>>   2 files changed, 12 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> >>> index 174edabb74fa..4dd3e3acddda 100644
> >>> --- a/arch/powerpc/Kconfig
> >>> +++ b/arch/powerpc/Kconfig
> >>> @@ -375,6 +375,16 @@ config PPC_OF_PLATFORM_PCI
> >>>   	depends on PCI
> >>>   	depends on PPC64 # not supported on 32 bits yet
> >>>   
> >>> +config PPC_PCI_DOMAIN_FROM_OF_REG
> >>> +	bool "Use OF reg property for PCI domain"
> >>> +	depends on PCI
> >>
> >> Should it depend on PPC_OF_PLATFORM_PCI instead ?
> > 
> > No, PPC_OF_PLATFORM_PCI has line "depends on PPC64 # not supported on 32
> > bits yet". But it is already used also for e.g. P2020 which is 32-bit
> > platform.
> > 
> >>> +	help
> >>> +	  By default PCI domain for host bridge during its registration is
> >>> +	  chosen as the lowest unused PCI domain number.
> >>> +
> >>> +	  When this option is enabled then PCI domain is determined from
> >>> +	  the OF / Device Tree 'reg' property.
> >>> +
> >>>   config ARCH_SUPPORTS_UPROBES
> >>>   	def_bool y
> >>>   
> >>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> >>> index 8bc9cf62cd93..8cb6fc5302ae 100644
> >>> --- a/arch/powerpc/kernel/pci-common.c
> >>> +++ b/arch/powerpc/kernel/pci-common.c
> >>> @@ -74,7 +74,6 @@ void __init set_pci_dma_ops(const struct dma_map_ops *dma_ops)
> >>>   static int get_phb_number(struct device_node *dn)
> >>>   {
> >>>   	int ret, phb_id = -1;
> >>> -	u32 prop_32;
> >>>   	u64 prop;
> >>>   
> >>>   	/*
> >>> @@ -83,7 +82,8 @@ static int get_phb_number(struct device_node *dn)
> >>>   	 * reading "ibm,opal-phbid", only present in OPAL environment.
> >>>   	 */
> >>>   	ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);
> >>
> >> This looks like very specific, it is not reflected in the commit log.
> > 
> > I have not changed nor touched this "ibm,opal-phbid" setting. And it was
> > not also touched in that mentioned patch. I see that no DTS file in
> > kernel use this option (so probably only DTS files supplied by
> > bootloader use it). So I thought that there is not reason to mention in
> > commit message.
> > 
> > But if you think so, I can add some info to commit message about it.
> > 
> >>> -	if (ret) {
> >>> +	if (ret && IS_ENABLED(CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG)) {
> >>> +		u32 prop_32;
> >>>   		ret = of_property_read_u32_index(dn, "reg", 1, &prop_32);
> >>>   		prop = prop_32;
> >>>   	}
>
Pali Rohár May 24, 2022, 9:17 a.m. UTC | #5
On Friday 06 May 2022 00:33:02 Pali Rohár wrote:
> On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote:
> > On 5/5/22 02:31, Pali Rohár wrote:
> > > Hello!
> > > 
> > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
> > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
> > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on
> > >>> device-tree properties"), powerpc kernel always fallback to PCI domain
> > >>> assignment from OF / Device Tree 'reg' property of the PCI controller.
> > >>>
> > >>> PCI code for other Linux architectures use increasing assignment of the PCI
> > >>> domain for individual controllers (assign the first free number), like it
> > >>> was also for powerpc prior mentioned commit.
> > >>>
> > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not contain
> > >>> mentioned commit) to new LTS versions brings a regression in domain
> > >>> assignment.
> > >>
> > >> Can you elaborate why it is a regression ?
> > >>63a72284b159
> > >> That commit says 'no functionnal changes', I'm having hard time 
> > >> understanding how a nochange can be a regression.
> > > 
> > > It is not 'no functional change'. That commit completely changed PCI
> > > domain assignment in a way that is incompatible with other architectures
> > > and also incompatible with the way how it was done prior that commit.
> > 
> > I agree that the "no functional change" statement is incorrect. However, for
> > most powerpc platforms it ended up being simply a cosmetic behavior change. As
> > far as I can tell there is nothing requiring domain ids to increase montonically
> > from zero or that each architecture is required to use the same domain numbering
> > scheme.
> 
> That is truth. But it looks really suspicious why domains are not
> assigned monotonically. Some scripts / applications are using PCI
> location (domain:bus:dev:func) for remembering PCI device and domain
> change can cause issue for config files. And some (older) applications
> expects existence of domain zero. In systems without hot plug support
> with small number of domains (e.g. 3) it means that there are always
> domains 0, 1 and 2.
> 
> > Its hard to call this a true regression unless it actually broke
> > something. The commit in question has been in the kernel since 4.8 which was
> > released over 5 1/2 years ago.
> 
> I agree, it really depends on how you look at it.
> 
> The important is that lot of people are using LTS versions and are doing
> upgrades when LTS support is dropped. Which for 4.4 now happened. So not
> all smaller or "cosmetic" changes could be detected until longer LTS
> period pass.
> 
> > With all that said looking closer at the code in question I think it is fair to
> > assume that the author only intended this change for powernv and pseries
> > platforms and not every powerpc platform. That change was done to make
> > persistent naming easier to manage in userspace.
> 
> I agree that this behavior change may be useful in some situations and I
> do not object this need.
> 
> > Your change defaults back to
> > the old behavior which will now break both powernv and pseries platforms with
> > regard to hotplugging and persistent naming.
> 
> I was aware of it, that change could cause issues. And that is why I
> added config option for choosing behavior. So users would be able to
> choose what they need.
> 
> > We could properly limit it to powernv and pseries by using ibm,fw-phb-id instead
> > of reg property in the look up that follows a failed ibm,opal-phbid lookup. I
> > think this is acceptable as long as no other powerpc platforms have started
> > using this behavior for persistent naming.
> 
> And what about setting that new config option to enabled by default for
> those series?
> 
> Or is there issue with introduction of the new config option?

PING? Any opinion?

> One of the point is that it is really a good idea to have similar/same
> behavior for all linux platforms. And if it cannot be enabled by default
> (for backward compatibility) add at least some option, so new platforms
> can start using it or users can decide to switch behavior.
> 
> > -Tyrel
> > 
> > > For example, prior that commit on P2020 RDB board were PCI domains 0, 1 and 2.
> > > 
> > > $ lspci
> > > 0000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > 0000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> > > 0001:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > 0001:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> > > 0002:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > 0002:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> > > 
> > > After that commit on P2020 RDB board are PCI domains 0x8000, 0x9000 and 0xa000.
> > > 
> > > $ lspci
> > > 8000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > 8000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> > > 9000:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > 9000:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> > > a000:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > a000:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> > > 
> > > It is somehow strange that PCI domains are not indexed one by one and
> > > also that there is no domain 0
> > > 
> > > With my patch when CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG is not set, then
> > > previous behavior used and PCI domains are again 0, 1 and 2.
> > > 
> > >> Usually we don't commit regressions to mainline ...
> > >>
> > >>
> > >>>
> > >>> Fix this issue by introducing a new option CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG
> > >>> When this options is disabled then powerpc kernel would assign PCI domains
> > >>> in the similar way like it is doing kernel for other architectures and also
> > >>> how it was done prior that commit.
> > >>
> > >> You don't define CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG on by default, it 
> > >> means this commit will change the behaviour. Is that expected ?
> > >>
> > >> Is that really worth a user selectable option ? Is the user able to 
> > >> decide what he needs ?
> > > 
> > > Well, I hope that maintainers of that code answer to these questions.
> > > 
> > > In any case, I think that it could be a user selectable option as in
> > > that commit is explained that in some situation is makes sense to do
> > > PCI domain numbering based on DT reg.
> > > 
> > > But as I pointed above, upgrading from 4.4 TLS kernel to some new TLS
> > > kernel brings above regression, so I think that there should be a way to
> > > disable this behavior.
> > > 
> > > In my opinion, for people who are upgrading from 4.4 TLS kernel, this
> > > option should be turned off by default (= do not change behavior). For
> > > people who want same behaviour on powerpc as on other platforms, also it
> > > should be turned off by default.
> > > 
> > >>>
> > >>> Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
> > >>
> > >> Is that really a fix ? What is the problem really ?
> > > 
> > > Problem is that PCI domains were changed in a way that is not compatible
> > > neither with version prior that commit and neither with how other linux
> > > platforms assign PCI domains for controllers.
> > > 
> > >>> Signed-off-by: Pali Rohár <pali@kernel.org>
> > >>> ---
> > >>>   arch/powerpc/Kconfig             | 10 ++++++++++
> > >>>   arch/powerpc/kernel/pci-common.c |  4 ++--
> > >>>   2 files changed, 12 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > >>> index 174edabb74fa..4dd3e3acddda 100644
> > >>> --- a/arch/powerpc/Kconfig
> > >>> +++ b/arch/powerpc/Kconfig
> > >>> @@ -375,6 +375,16 @@ config PPC_OF_PLATFORM_PCI
> > >>>   	depends on PCI
> > >>>   	depends on PPC64 # not supported on 32 bits yet
> > >>>   
> > >>> +config PPC_PCI_DOMAIN_FROM_OF_REG
> > >>> +	bool "Use OF reg property for PCI domain"
> > >>> +	depends on PCI
> > >>
> > >> Should it depend on PPC_OF_PLATFORM_PCI instead ?
> > > 
> > > No, PPC_OF_PLATFORM_PCI has line "depends on PPC64 # not supported on 32
> > > bits yet". But it is already used also for e.g. P2020 which is 32-bit
> > > platform.
> > > 
> > >>> +	help
> > >>> +	  By default PCI domain for host bridge during its registration is
> > >>> +	  chosen as the lowest unused PCI domain number.
> > >>> +
> > >>> +	  When this option is enabled then PCI domain is determined from
> > >>> +	  the OF / Device Tree 'reg' property.
> > >>> +
> > >>>   config ARCH_SUPPORTS_UPROBES
> > >>>   	def_bool y
> > >>>   
> > >>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> > >>> index 8bc9cf62cd93..8cb6fc5302ae 100644
> > >>> --- a/arch/powerpc/kernel/pci-common.c
> > >>> +++ b/arch/powerpc/kernel/pci-common.c
> > >>> @@ -74,7 +74,6 @@ void __init set_pci_dma_ops(const struct dma_map_ops *dma_ops)
> > >>>   static int get_phb_number(struct device_node *dn)
> > >>>   {
> > >>>   	int ret, phb_id = -1;
> > >>> -	u32 prop_32;
> > >>>   	u64 prop;
> > >>>   
> > >>>   	/*
> > >>> @@ -83,7 +82,8 @@ static int get_phb_number(struct device_node *dn)
> > >>>   	 * reading "ibm,opal-phbid", only present in OPAL environment.
> > >>>   	 */
> > >>>   	ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);
> > >>
> > >> This looks like very specific, it is not reflected in the commit log.
> > > 
> > > I have not changed nor touched this "ibm,opal-phbid" setting. And it was
> > > not also touched in that mentioned patch. I see that no DTS file in
> > > kernel use this option (so probably only DTS files supplied by
> > > bootloader use it). So I thought that there is not reason to mention in
> > > commit message.
> > > 
> > > But if you think so, I can add some info to commit message about it.
> > > 
> > >>> -	if (ret) {
> > >>> +	if (ret && IS_ENABLED(CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG)) {
> > >>> +		u32 prop_32;
> > >>>   		ret = of_property_read_u32_index(dn, "reg", 1, &prop_32);
> > >>>   		prop = prop_32;
> > >>>   	}
> >
Pali Rohár June 9, 2022, 10:19 a.m. UTC | #6
On Tuesday 24 May 2022 11:17:56 Pali Rohár wrote:
> On Friday 06 May 2022 00:33:02 Pali Rohár wrote:
> > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote:
> > > On 5/5/22 02:31, Pali Rohár wrote:
> > > > Hello!
> > > > 
> > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
> > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
> > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on
> > > >>> device-tree properties"), powerpc kernel always fallback to PCI domain
> > > >>> assignment from OF / Device Tree 'reg' property of the PCI controller.
> > > >>>
> > > >>> PCI code for other Linux architectures use increasing assignment of the PCI
> > > >>> domain for individual controllers (assign the first free number), like it
> > > >>> was also for powerpc prior mentioned commit.
> > > >>>
> > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not contain
> > > >>> mentioned commit) to new LTS versions brings a regression in domain
> > > >>> assignment.
> > > >>
> > > >> Can you elaborate why it is a regression ?
> > > >>63a72284b159
> > > >> That commit says 'no functionnal changes', I'm having hard time 
> > > >> understanding how a nochange can be a regression.
> > > > 
> > > > It is not 'no functional change'. That commit completely changed PCI
> > > > domain assignment in a way that is incompatible with other architectures
> > > > and also incompatible with the way how it was done prior that commit.
> > > 
> > > I agree that the "no functional change" statement is incorrect. However, for
> > > most powerpc platforms it ended up being simply a cosmetic behavior change. As
> > > far as I can tell there is nothing requiring domain ids to increase montonically
> > > from zero or that each architecture is required to use the same domain numbering
> > > scheme.
> > 
> > That is truth. But it looks really suspicious why domains are not
> > assigned monotonically. Some scripts / applications are using PCI
> > location (domain:bus:dev:func) for remembering PCI device and domain
> > change can cause issue for config files. And some (older) applications
> > expects existence of domain zero. In systems without hot plug support
> > with small number of domains (e.g. 3) it means that there are always
> > domains 0, 1 and 2.
> > 
> > > Its hard to call this a true regression unless it actually broke
> > > something. The commit in question has been in the kernel since 4.8 which was
> > > released over 5 1/2 years ago.
> > 
> > I agree, it really depends on how you look at it.
> > 
> > The important is that lot of people are using LTS versions and are doing
> > upgrades when LTS support is dropped. Which for 4.4 now happened. So not
> > all smaller or "cosmetic" changes could be detected until longer LTS
> > period pass.
> > 
> > > With all that said looking closer at the code in question I think it is fair to
> > > assume that the author only intended this change for powernv and pseries
> > > platforms and not every powerpc platform. That change was done to make
> > > persistent naming easier to manage in userspace.
> > 
> > I agree that this behavior change may be useful in some situations and I
> > do not object this need.
> > 
> > > Your change defaults back to
> > > the old behavior which will now break both powernv and pseries platforms with
> > > regard to hotplugging and persistent naming.
> > 
> > I was aware of it, that change could cause issues. And that is why I
> > added config option for choosing behavior. So users would be able to
> > choose what they need.
> > 
> > > We could properly limit it to powernv and pseries by using ibm,fw-phb-id instead
> > > of reg property in the look up that follows a failed ibm,opal-phbid lookup. I
> > > think this is acceptable as long as no other powerpc platforms have started
> > > using this behavior for persistent naming.
> > 
> > And what about setting that new config option to enabled by default for
> > those series?
> > 
> > Or is there issue with introduction of the new config option?
> 
> PING? Any opinion?

PING?

> > One of the point is that it is really a good idea to have similar/same
> > behavior for all linux platforms. And if it cannot be enabled by default
> > (for backward compatibility) add at least some option, so new platforms
> > can start using it or users can decide to switch behavior.
> > 
> > > -Tyrel
> > > 
> > > > For example, prior that commit on P2020 RDB board were PCI domains 0, 1 and 2.
> > > > 
> > > > $ lspci
> > > > 0000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > 0000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> > > > 0001:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > 0001:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> > > > 0002:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > 0002:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> > > > 
> > > > After that commit on P2020 RDB board are PCI domains 0x8000, 0x9000 and 0xa000.
> > > > 
> > > > $ lspci
> > > > 8000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > 8000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> > > > 9000:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > 9000:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> > > > a000:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > a000:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> > > > 
> > > > It is somehow strange that PCI domains are not indexed one by one and
> > > > also that there is no domain 0
> > > > 
> > > > With my patch when CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG is not set, then
> > > > previous behavior used and PCI domains are again 0, 1 and 2.
> > > > 
> > > >> Usually we don't commit regressions to mainline ...
> > > >>
> > > >>
> > > >>>
> > > >>> Fix this issue by introducing a new option CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG
> > > >>> When this options is disabled then powerpc kernel would assign PCI domains
> > > >>> in the similar way like it is doing kernel for other architectures and also
> > > >>> how it was done prior that commit.
> > > >>
> > > >> You don't define CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG on by default, it 
> > > >> means this commit will change the behaviour. Is that expected ?
> > > >>
> > > >> Is that really worth a user selectable option ? Is the user able to 
> > > >> decide what he needs ?
> > > > 
> > > > Well, I hope that maintainers of that code answer to these questions.
> > > > 
> > > > In any case, I think that it could be a user selectable option as in
> > > > that commit is explained that in some situation is makes sense to do
> > > > PCI domain numbering based on DT reg.
> > > > 
> > > > But as I pointed above, upgrading from 4.4 TLS kernel to some new TLS
> > > > kernel brings above regression, so I think that there should be a way to
> > > > disable this behavior.
> > > > 
> > > > In my opinion, for people who are upgrading from 4.4 TLS kernel, this
> > > > option should be turned off by default (= do not change behavior). For
> > > > people who want same behaviour on powerpc as on other platforms, also it
> > > > should be turned off by default.
> > > > 
> > > >>>
> > > >>> Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
> > > >>
> > > >> Is that really a fix ? What is the problem really ?
> > > > 
> > > > Problem is that PCI domains were changed in a way that is not compatible
> > > > neither with version prior that commit and neither with how other linux
> > > > platforms assign PCI domains for controllers.
> > > > 
> > > >>> Signed-off-by: Pali Rohár <pali@kernel.org>
> > > >>> ---
> > > >>>   arch/powerpc/Kconfig             | 10 ++++++++++
> > > >>>   arch/powerpc/kernel/pci-common.c |  4 ++--
> > > >>>   2 files changed, 12 insertions(+), 2 deletions(-)
> > > >>>
> > > >>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > > >>> index 174edabb74fa..4dd3e3acddda 100644
> > > >>> --- a/arch/powerpc/Kconfig
> > > >>> +++ b/arch/powerpc/Kconfig
> > > >>> @@ -375,6 +375,16 @@ config PPC_OF_PLATFORM_PCI
> > > >>>   	depends on PCI
> > > >>>   	depends on PPC64 # not supported on 32 bits yet
> > > >>>   
> > > >>> +config PPC_PCI_DOMAIN_FROM_OF_REG
> > > >>> +	bool "Use OF reg property for PCI domain"
> > > >>> +	depends on PCI
> > > >>
> > > >> Should it depend on PPC_OF_PLATFORM_PCI instead ?
> > > > 
> > > > No, PPC_OF_PLATFORM_PCI has line "depends on PPC64 # not supported on 32
> > > > bits yet". But it is already used also for e.g. P2020 which is 32-bit
> > > > platform.
> > > > 
> > > >>> +	help
> > > >>> +	  By default PCI domain for host bridge during its registration is
> > > >>> +	  chosen as the lowest unused PCI domain number.
> > > >>> +
> > > >>> +	  When this option is enabled then PCI domain is determined from
> > > >>> +	  the OF / Device Tree 'reg' property.
> > > >>> +
> > > >>>   config ARCH_SUPPORTS_UPROBES
> > > >>>   	def_bool y
> > > >>>   
> > > >>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> > > >>> index 8bc9cf62cd93..8cb6fc5302ae 100644
> > > >>> --- a/arch/powerpc/kernel/pci-common.c
> > > >>> +++ b/arch/powerpc/kernel/pci-common.c
> > > >>> @@ -74,7 +74,6 @@ void __init set_pci_dma_ops(const struct dma_map_ops *dma_ops)
> > > >>>   static int get_phb_number(struct device_node *dn)
> > > >>>   {
> > > >>>   	int ret, phb_id = -1;
> > > >>> -	u32 prop_32;
> > > >>>   	u64 prop;
> > > >>>   
> > > >>>   	/*
> > > >>> @@ -83,7 +82,8 @@ static int get_phb_number(struct device_node *dn)
> > > >>>   	 * reading "ibm,opal-phbid", only present in OPAL environment.
> > > >>>   	 */
> > > >>>   	ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);
> > > >>
> > > >> This looks like very specific, it is not reflected in the commit log.
> > > > 
> > > > I have not changed nor touched this "ibm,opal-phbid" setting. And it was
> > > > not also touched in that mentioned patch. I see that no DTS file in
> > > > kernel use this option (so probably only DTS files supplied by
> > > > bootloader use it). So I thought that there is not reason to mention in
> > > > commit message.
> > > > 
> > > > But if you think so, I can add some info to commit message about it.
> > > > 
> > > >>> -	if (ret) {
> > > >>> +	if (ret && IS_ENABLED(CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG)) {
> > > >>> +		u32 prop_32;
> > > >>>   		ret = of_property_read_u32_index(dn, "reg", 1, &prop_32);
> > > >>>   		prop = prop_32;
> > > >>>   	}
> > >
Bjorn Helgaas June 9, 2022, 4:22 p.m. UTC | #7
[+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), thread:
https://lore.kernel.org/r/20220504175718.29011-1-pali@kernel.org]

On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote:
> On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote:
> > On 5/5/22 02:31, Pali Rohár wrote:
> > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
> > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
> > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB
> > >>> number based on device-tree properties"), powerpc kernel
> > >>> always fallback to PCI domain assignment from OF / Device Tree
> > >>> 'reg' property of the PCI controller.
> > >>>
> > >>> PCI code for other Linux architectures use increasing
> > >>> assignment of the PCI domain for individual controllers
> > >>> (assign the first free number), like it was also for powerpc
> > >>> prior mentioned commit.
> > >>>
> > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not
> > >>> contain mentioned commit) to new LTS versions brings a
> > >>> regression in domain assignment.
> > >>
> > >> Can you elaborate why it is a regression ?
> > >> 63a72284b159 That commit says 'no functionnal changes', I'm
> > >> having hard time understanding how a nochange can be a
> > >> regression.
> > > 
> > > It is not 'no functional change'. That commit completely changed
> > > PCI domain assignment in a way that is incompatible with other
> > > architectures and also incompatible with the way how it was done
> > > prior that commit.
> > 
> > I agree that the "no functional change" statement is incorrect.
> > However, for most powerpc platforms it ended up being simply a
> > cosmetic behavior change. As far as I can tell there is nothing
> > requiring domain ids to increase montonically from zero or that
> > each architecture is required to use the same domain numbering
> > scheme.
> 
> That is truth. But it looks really suspicious why domains are not
> assigned monotonically. Some scripts / applications are using PCI
> location (domain:bus:dev:func) for remembering PCI device and domain
> change can cause issue for config files. And some (older) applications
> expects existence of domain zero. In systems without hot plug support
> with small number of domains (e.g. 3) it means that there are always
> domains 0, 1 and 2.
> 
> > Its hard to call this a true regression unless it actually broke
> > something. The commit in question has been in the kernel since 4.8
> > which was released over 5 1/2 years ago.
> 
> I agree, it really depends on how you look at it.
> 
> The important is that lot of people are using LTS versions and are
> doing upgrades when LTS support is dropped. Which for 4.4 now
> happened. So not all smaller or "cosmetic" changes could be detected
> until longer LTS period pass.
> 
> > With all that said looking closer at the code in question I think
> > it is fair to assume that the author only intended this change for
> > powernv and pseries platforms and not every powerpc platform. That
> > change was done to make persistent naming easier to manage in
> > userspace.
> 
> I agree that this behavior change may be useful in some situations
> and I do not object this need.
> 
> > Your change defaults back to the old behavior which will now break
> > both powernv and pseries platforms with regard to hotplugging and
> > persistent naming.
> 
> I was aware of it, that change could cause issues. And that is why I
> added config option for choosing behavior. So users would be able to
> choose what they need.
> 
> > We could properly limit it to powernv and pseries by using
> > ibm,fw-phb-id instead of reg property in the look up that follows
> > a failed ibm,opal-phbid lookup. I think this is acceptable as long
> > as no other powerpc platforms have started using this behavior for
> > persistent naming.
> 
> And what about setting that new config option to enabled by default
> for those series?
> 
> Or is there issue with introduction of the new config option?
> 
> One of the point is that it is really a good idea to have
> similar/same behavior for all linux platforms. And if it cannot be
> enabled by default (for backward compatibility) add at least some
> option, so new platforms can start using it or users can decide to
> switch behavior.

This is a powerpc thing so I'm just kibbitzing a little.

This basically looks like a new config option to selectively revert
63a72284b159.  That seems hard to maintain and doesn't seem like
something that needs to be baked into the kernel at compile-time.

The 63a72284b159 commit log says persistent NIC names are tied to PCI
domain/bus/dev/fn addresses, which seems like something we should
discourage because we can't predict PCI addresses in general.  I
assume other platforms typically use udev with MAC addresses or
something?

> > > For example, prior that commit on P2020 RDB board were PCI
> > > domains 0, 1 and 2.
> > > 
> > > $ lspci
> > > 0000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > 0000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> > > 0001:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > 0001:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> > > 0002:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > 0002:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> > > 
> > > After that commit on P2020 RDB board are PCI domains 0x8000,
> > > 0x9000 and 0xa000.
> > > 
> > > $ lspci
> > > 8000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > 8000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> > > 9000:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > 9000:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> > > a000:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > a000:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> > > 
> > > It is somehow strange that PCI domains are not indexed one by one and
> > > also that there is no domain 0
> > > 
> > > With my patch when CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG is not set, then
> > > previous behavior used and PCI domains are again 0, 1 and 2.
> > > 
> > >> Usually we don't commit regressions to mainline ...
> > >>
> > >>> Fix this issue by introducing a new option
> > >>> CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG When this options is
> > >>> disabled then powerpc kernel would assign PCI domains in the
> > >>> similar way like it is doing kernel for other architectures
> > >>> and also how it was done prior that commit.
> > >>
> > >> You don't define CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG on by
> > >> default, it means this commit will change the behaviour. Is
> > >> that expected ?
> > >>
> > >> Is that really worth a user selectable option ? Is the user
> > >> able to decide what he needs ?
> > > 
> > > Well, I hope that maintainers of that code answer to these
> > > questions.
> > > 
> > > In any case, I think that it could be a user selectable option
> > > as in that commit is explained that in some situation is makes
> > > sense to do PCI domain numbering based on DT reg.
> > > 
> > > But as I pointed above, upgrading from 4.4 TLS kernel to some
> > > new TLS kernel brings above regression, so I think that there
> > > should be a way to disable this behavior.
> > > 
> > > In my opinion, for people who are upgrading from 4.4 TLS kernel,
> > > this option should be turned off by default (= do not change
> > > behavior). For people who want same behaviour on powerpc as on
> > > other platforms, also it should be turned off by default.
> > > 
> > >>>
> > >>> Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
> > >>
> > >> Is that really a fix ? What is the problem really ?
> > > 
> > > Problem is that PCI domains were changed in a way that is not
> > > compatible neither with version prior that commit and neither
> > > with how other linux platforms assign PCI domains for
> > > controllers.
> > > 
> > >>> Signed-off-by: Pali Rohár <pali@kernel.org>
> > >>> ---
> > >>>   arch/powerpc/Kconfig             | 10 ++++++++++
> > >>>   arch/powerpc/kernel/pci-common.c |  4 ++--
> > >>>   2 files changed, 12 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > >>> index 174edabb74fa..4dd3e3acddda 100644
> > >>> --- a/arch/powerpc/Kconfig
> > >>> +++ b/arch/powerpc/Kconfig
> > >>> @@ -375,6 +375,16 @@ config PPC_OF_PLATFORM_PCI
> > >>>   	depends on PCI
> > >>>   	depends on PPC64 # not supported on 32 bits yet
> > >>>   
> > >>> +config PPC_PCI_DOMAIN_FROM_OF_REG
> > >>> +	bool "Use OF reg property for PCI domain"
> > >>> +	depends on PCI
> > >>
> > >> Should it depend on PPC_OF_PLATFORM_PCI instead ?
> > > 
> > > No, PPC_OF_PLATFORM_PCI has line "depends on PPC64 # not supported on 32
> > > bits yet". But it is already used also for e.g. P2020 which is 32-bit
> > > platform.
> > > 
> > >>> +	help
> > >>> +	  By default PCI domain for host bridge during its registration is
> > >>> +	  chosen as the lowest unused PCI domain number.
> > >>> +
> > >>> +	  When this option is enabled then PCI domain is determined from
> > >>> +	  the OF / Device Tree 'reg' property.
> > >>> +
> > >>>   config ARCH_SUPPORTS_UPROBES
> > >>>   	def_bool y
> > >>>   
> > >>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> > >>> index 8bc9cf62cd93..8cb6fc5302ae 100644
> > >>> --- a/arch/powerpc/kernel/pci-common.c
> > >>> +++ b/arch/powerpc/kernel/pci-common.c
> > >>> @@ -74,7 +74,6 @@ void __init set_pci_dma_ops(const struct dma_map_ops *dma_ops)
> > >>>   static int get_phb_number(struct device_node *dn)
> > >>>   {
> > >>>   	int ret, phb_id = -1;
> > >>> -	u32 prop_32;
> > >>>   	u64 prop;
> > >>>   
> > >>>   	/*
> > >>> @@ -83,7 +82,8 @@ static int get_phb_number(struct device_node *dn)
> > >>>   	 * reading "ibm,opal-phbid", only present in OPAL environment.
> > >>>   	 */
> > >>>   	ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);
> > >>
> > >> This looks like very specific, it is not reflected in the commit log.
> > > 
> > > I have not changed nor touched this "ibm,opal-phbid" setting. And it was
> > > not also touched in that mentioned patch. I see that no DTS file in
> > > kernel use this option (so probably only DTS files supplied by
> > > bootloader use it). So I thought that there is not reason to mention in
> > > commit message.
> > > 
> > > But if you think so, I can add some info to commit message about it.
> > > 
> > >>> -	if (ret) {
> > >>> +	if (ret && IS_ENABLED(CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG)) {
> > >>> +		u32 prop_32;
> > >>>   		ret = of_property_read_u32_index(dn, "reg", 1, &prop_32);
> > >>>   		prop = prop_32;
> > >>>   	}
> >
Pali Rohár June 9, 2022, 4:27 p.m. UTC | #8
On Thursday 09 June 2022 11:22:55 Bjorn Helgaas wrote:
> [+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), thread:
> https://lore.kernel.org/r/20220504175718.29011-1-pali@kernel.org]
> 
> On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote:
> > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote:
> > > On 5/5/22 02:31, Pali Rohár wrote:
> > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
> > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
> > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB
> > > >>> number based on device-tree properties"), powerpc kernel
> > > >>> always fallback to PCI domain assignment from OF / Device Tree
> > > >>> 'reg' property of the PCI controller.
> > > >>>
> > > >>> PCI code for other Linux architectures use increasing
> > > >>> assignment of the PCI domain for individual controllers
> > > >>> (assign the first free number), like it was also for powerpc
> > > >>> prior mentioned commit.
> > > >>>
> > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not
> > > >>> contain mentioned commit) to new LTS versions brings a
> > > >>> regression in domain assignment.
> > > >>
> > > >> Can you elaborate why it is a regression ?
> > > >> 63a72284b159 That commit says 'no functionnal changes', I'm
> > > >> having hard time understanding how a nochange can be a
> > > >> regression.
> > > > 
> > > > It is not 'no functional change'. That commit completely changed
> > > > PCI domain assignment in a way that is incompatible with other
> > > > architectures and also incompatible with the way how it was done
> > > > prior that commit.
> > > 
> > > I agree that the "no functional change" statement is incorrect.
> > > However, for most powerpc platforms it ended up being simply a
> > > cosmetic behavior change. As far as I can tell there is nothing
> > > requiring domain ids to increase montonically from zero or that
> > > each architecture is required to use the same domain numbering
> > > scheme.
> > 
> > That is truth. But it looks really suspicious why domains are not
> > assigned monotonically. Some scripts / applications are using PCI
> > location (domain:bus:dev:func) for remembering PCI device and domain
> > change can cause issue for config files. And some (older) applications
> > expects existence of domain zero. In systems without hot plug support
> > with small number of domains (e.g. 3) it means that there are always
> > domains 0, 1 and 2.
> > 
> > > Its hard to call this a true regression unless it actually broke
> > > something. The commit in question has been in the kernel since 4.8
> > > which was released over 5 1/2 years ago.
> > 
> > I agree, it really depends on how you look at it.
> > 
> > The important is that lot of people are using LTS versions and are
> > doing upgrades when LTS support is dropped. Which for 4.4 now
> > happened. So not all smaller or "cosmetic" changes could be detected
> > until longer LTS period pass.
> > 
> > > With all that said looking closer at the code in question I think
> > > it is fair to assume that the author only intended this change for
> > > powernv and pseries platforms and not every powerpc platform. That
> > > change was done to make persistent naming easier to manage in
> > > userspace.
> > 
> > I agree that this behavior change may be useful in some situations
> > and I do not object this need.
> > 
> > > Your change defaults back to the old behavior which will now break
> > > both powernv and pseries platforms with regard to hotplugging and
> > > persistent naming.
> > 
> > I was aware of it, that change could cause issues. And that is why I
> > added config option for choosing behavior. So users would be able to
> > choose what they need.
> > 
> > > We could properly limit it to powernv and pseries by using
> > > ibm,fw-phb-id instead of reg property in the look up that follows
> > > a failed ibm,opal-phbid lookup. I think this is acceptable as long
> > > as no other powerpc platforms have started using this behavior for
> > > persistent naming.
> > 
> > And what about setting that new config option to enabled by default
> > for those series?
> > 
> > Or is there issue with introduction of the new config option?
> > 
> > One of the point is that it is really a good idea to have
> > similar/same behavior for all linux platforms. And if it cannot be
> > enabled by default (for backward compatibility) add at least some
> > option, so new platforms can start using it or users can decide to
> > switch behavior.
> 
> This is a powerpc thing so I'm just kibbitzing a little.
> 
> This basically looks like a new config option to selectively revert
> 63a72284b159.  That seems hard to maintain and doesn't seem like
> something that needs to be baked into the kernel at compile-time.
> 
> The 63a72284b159 commit log says persistent NIC names are tied to PCI
> domain/bus/dev/fn addresses, which seems like something we should
> discourage because we can't predict PCI addresses in general.  I
> assume other platforms typically use udev with MAC addresses or
> something?

This is not about ethernet NIC cards only. But affects also WiFi cards
(which registers phy dev, not netdev) and also all other PCIe cards
which do not have to be network-based. Hence MAC address or udev does
not play role there.

> > > > For example, prior that commit on P2020 RDB board were PCI
> > > > domains 0, 1 and 2.
> > > > 
> > > > $ lspci
> > > > 0000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > 0000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> > > > 0001:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > 0001:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> > > > 0002:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > 0002:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> > > > 
> > > > After that commit on P2020 RDB board are PCI domains 0x8000,
> > > > 0x9000 and 0xa000.
> > > > 
> > > > $ lspci
> > > > 8000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > 8000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> > > > 9000:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > 9000:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
> > > > a000:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> > > > a000:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
> > > > 
> > > > It is somehow strange that PCI domains are not indexed one by one and
> > > > also that there is no domain 0
> > > > 
> > > > With my patch when CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG is not set, then
> > > > previous behavior used and PCI domains are again 0, 1 and 2.
> > > > 
> > > >> Usually we don't commit regressions to mainline ...
> > > >>
> > > >>> Fix this issue by introducing a new option
> > > >>> CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG When this options is
> > > >>> disabled then powerpc kernel would assign PCI domains in the
> > > >>> similar way like it is doing kernel for other architectures
> > > >>> and also how it was done prior that commit.
> > > >>
> > > >> You don't define CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG on by
> > > >> default, it means this commit will change the behaviour. Is
> > > >> that expected ?
> > > >>
> > > >> Is that really worth a user selectable option ? Is the user
> > > >> able to decide what he needs ?
> > > > 
> > > > Well, I hope that maintainers of that code answer to these
> > > > questions.
> > > > 
> > > > In any case, I think that it could be a user selectable option
> > > > as in that commit is explained that in some situation is makes
> > > > sense to do PCI domain numbering based on DT reg.
> > > > 
> > > > But as I pointed above, upgrading from 4.4 TLS kernel to some
> > > > new TLS kernel brings above regression, so I think that there
> > > > should be a way to disable this behavior.
> > > > 
> > > > In my opinion, for people who are upgrading from 4.4 TLS kernel,
> > > > this option should be turned off by default (= do not change
> > > > behavior). For people who want same behaviour on powerpc as on
> > > > other platforms, also it should be turned off by default.
> > > > 
> > > >>>
> > > >>> Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
> > > >>
> > > >> Is that really a fix ? What is the problem really ?
> > > > 
> > > > Problem is that PCI domains were changed in a way that is not
> > > > compatible neither with version prior that commit and neither
> > > > with how other linux platforms assign PCI domains for
> > > > controllers.
> > > > 
> > > >>> Signed-off-by: Pali Rohár <pali@kernel.org>
> > > >>> ---
> > > >>>   arch/powerpc/Kconfig             | 10 ++++++++++
> > > >>>   arch/powerpc/kernel/pci-common.c |  4 ++--
> > > >>>   2 files changed, 12 insertions(+), 2 deletions(-)
> > > >>>
> > > >>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > > >>> index 174edabb74fa..4dd3e3acddda 100644
> > > >>> --- a/arch/powerpc/Kconfig
> > > >>> +++ b/arch/powerpc/Kconfig
> > > >>> @@ -375,6 +375,16 @@ config PPC_OF_PLATFORM_PCI
> > > >>>   	depends on PCI
> > > >>>   	depends on PPC64 # not supported on 32 bits yet
> > > >>>   
> > > >>> +config PPC_PCI_DOMAIN_FROM_OF_REG
> > > >>> +	bool "Use OF reg property for PCI domain"
> > > >>> +	depends on PCI
> > > >>
> > > >> Should it depend on PPC_OF_PLATFORM_PCI instead ?
> > > > 
> > > > No, PPC_OF_PLATFORM_PCI has line "depends on PPC64 # not supported on 32
> > > > bits yet". But it is already used also for e.g. P2020 which is 32-bit
> > > > platform.
> > > > 
> > > >>> +	help
> > > >>> +	  By default PCI domain for host bridge during its registration is
> > > >>> +	  chosen as the lowest unused PCI domain number.
> > > >>> +
> > > >>> +	  When this option is enabled then PCI domain is determined from
> > > >>> +	  the OF / Device Tree 'reg' property.
> > > >>> +
> > > >>>   config ARCH_SUPPORTS_UPROBES
> > > >>>   	def_bool y
> > > >>>   
> > > >>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> > > >>> index 8bc9cf62cd93..8cb6fc5302ae 100644
> > > >>> --- a/arch/powerpc/kernel/pci-common.c
> > > >>> +++ b/arch/powerpc/kernel/pci-common.c
> > > >>> @@ -74,7 +74,6 @@ void __init set_pci_dma_ops(const struct dma_map_ops *dma_ops)
> > > >>>   static int get_phb_number(struct device_node *dn)
> > > >>>   {
> > > >>>   	int ret, phb_id = -1;
> > > >>> -	u32 prop_32;
> > > >>>   	u64 prop;
> > > >>>   
> > > >>>   	/*
> > > >>> @@ -83,7 +82,8 @@ static int get_phb_number(struct device_node *dn)
> > > >>>   	 * reading "ibm,opal-phbid", only present in OPAL environment.
> > > >>>   	 */
> > > >>>   	ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);
> > > >>
> > > >> This looks like very specific, it is not reflected in the commit log.
> > > > 
> > > > I have not changed nor touched this "ibm,opal-phbid" setting. And it was
> > > > not also touched in that mentioned patch. I see that no DTS file in
> > > > kernel use this option (so probably only DTS files supplied by
> > > > bootloader use it). So I thought that there is not reason to mention in
> > > > commit message.
> > > > 
> > > > But if you think so, I can add some info to commit message about it.
> > > > 
> > > >>> -	if (ret) {
> > > >>> +	if (ret && IS_ENABLED(CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG)) {
> > > >>> +		u32 prop_32;
> > > >>>   		ret = of_property_read_u32_index(dn, "reg", 1, &prop_32);
> > > >>>   		prop = prop_32;
> > > >>>   	}
> > >
Bjorn Helgaas June 9, 2022, 5:10 p.m. UTC | #9
On Thu, Jun 09, 2022 at 06:27:25PM +0200, Pali Rohár wrote:
> On Thursday 09 June 2022 11:22:55 Bjorn Helgaas wrote:
> > [+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), thread:
> > https://lore.kernel.org/r/20220504175718.29011-1-pali@kernel.org]
> > 
> > On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote:
> > > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote:
> > > > On 5/5/22 02:31, Pali Rohár wrote:
> > > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
> > > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
> > > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB
> > > > >>> number based on device-tree properties"), powerpc kernel
> > > > >>> always fallback to PCI domain assignment from OF / Device Tree
> > > > >>> 'reg' property of the PCI controller.
> > > > >>>
> > > > >>> PCI code for other Linux architectures use increasing
> > > > >>> assignment of the PCI domain for individual controllers
> > > > >>> (assign the first free number), like it was also for powerpc
> > > > >>> prior mentioned commit.
> > > > >>>
> > > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not
> > > > >>> contain mentioned commit) to new LTS versions brings a
> > > > >>> regression in domain assignment.
> > > > >>
> > > > >> Can you elaborate why it is a regression ?
> > > > >> 63a72284b159 That commit says 'no functionnal changes', I'm
> > > > >> having hard time understanding how a nochange can be a
> > > > >> regression.
> > > > > 
> > > > > It is not 'no functional change'. That commit completely changed
> > > > > PCI domain assignment in a way that is incompatible with other
> > > > > architectures and also incompatible with the way how it was done
> > > > > prior that commit.
> > > > 
> > > > I agree that the "no functional change" statement is incorrect.
> > > > However, for most powerpc platforms it ended up being simply a
> > > > cosmetic behavior change. As far as I can tell there is nothing
> > > > requiring domain ids to increase montonically from zero or that
> > > > each architecture is required to use the same domain numbering
> > > > scheme.
> > > 
> > > That is truth. But it looks really suspicious why domains are not
> > > assigned monotonically. Some scripts / applications are using PCI
> > > location (domain:bus:dev:func) for remembering PCI device and domain
> > > change can cause issue for config files. And some (older) applications
> > > expects existence of domain zero. In systems without hot plug support
> > > with small number of domains (e.g. 3) it means that there are always
> > > domains 0, 1 and 2.
> > > 
> > > > Its hard to call this a true regression unless it actually broke
> > > > something. The commit in question has been in the kernel since 4.8
> > > > which was released over 5 1/2 years ago.
> > > 
> > > I agree, it really depends on how you look at it.
> > > 
> > > The important is that lot of people are using LTS versions and are
> > > doing upgrades when LTS support is dropped. Which for 4.4 now
> > > happened. So not all smaller or "cosmetic" changes could be detected
> > > until longer LTS period pass.
> > > 
> > > > With all that said looking closer at the code in question I think
> > > > it is fair to assume that the author only intended this change for
> > > > powernv and pseries platforms and not every powerpc platform. That
> > > > change was done to make persistent naming easier to manage in
> > > > userspace.
> > > 
> > > I agree that this behavior change may be useful in some situations
> > > and I do not object this need.
> > > 
> > > > Your change defaults back to the old behavior which will now break
> > > > both powernv and pseries platforms with regard to hotplugging and
> > > > persistent naming.
> > > 
> > > I was aware of it, that change could cause issues. And that is why I
> > > added config option for choosing behavior. So users would be able to
> > > choose what they need.
> > > 
> > > > We could properly limit it to powernv and pseries by using
> > > > ibm,fw-phb-id instead of reg property in the look up that follows
> > > > a failed ibm,opal-phbid lookup. I think this is acceptable as long
> > > > as no other powerpc platforms have started using this behavior for
> > > > persistent naming.
> > > 
> > > And what about setting that new config option to enabled by default
> > > for those series?
> > > 
> > > Or is there issue with introduction of the new config option?
> > > 
> > > One of the point is that it is really a good idea to have
> > > similar/same behavior for all linux platforms. And if it cannot be
> > > enabled by default (for backward compatibility) add at least some
> > > option, so new platforms can start using it or users can decide to
> > > switch behavior.
> > 
> > This is a powerpc thing so I'm just kibbitzing a little.
> > 
> > This basically looks like a new config option to selectively revert
> > 63a72284b159.  That seems hard to maintain and doesn't seem like
> > something that needs to be baked into the kernel at compile-time.
> > 
> > The 63a72284b159 commit log says persistent NIC names are tied to PCI
> > domain/bus/dev/fn addresses, which seems like something we should
> > discourage because we can't predict PCI addresses in general.  I
> > assume other platforms typically use udev with MAC addresses or
> > something?
> 
> This is not about ethernet NIC cards only. But affects also WiFi cards
> (which registers phy dev, not netdev) and also all other PCIe cards
> which do not have to be network-based. Hence MAC address or udev does
> not play role there.

What persistent naming mechanism do other platforms use in those
cases?

I forgot to ask before about the actual regression here.  The commit
log says domain numbers are different, but I don't know the connection
from there to something failing.  I assume there's some script or
config file that depends on specific domain numbers?  And that
dependency is (hopefully) powerpc-specific?

Bjorn
Pali Rohár June 9, 2022, 6:05 p.m. UTC | #10
On Thursday 09 June 2022 12:10:22 Bjorn Helgaas wrote:
> On Thu, Jun 09, 2022 at 06:27:25PM +0200, Pali Rohár wrote:
> > On Thursday 09 June 2022 11:22:55 Bjorn Helgaas wrote:
> > > [+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), thread:
> > > https://lore.kernel.org/r/20220504175718.29011-1-pali@kernel.org]
> > > 
> > > On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote:
> > > > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote:
> > > > > On 5/5/22 02:31, Pali Rohár wrote:
> > > > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
> > > > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
> > > > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB
> > > > > >>> number based on device-tree properties"), powerpc kernel
> > > > > >>> always fallback to PCI domain assignment from OF / Device Tree
> > > > > >>> 'reg' property of the PCI controller.
> > > > > >>>
> > > > > >>> PCI code for other Linux architectures use increasing
> > > > > >>> assignment of the PCI domain for individual controllers
> > > > > >>> (assign the first free number), like it was also for powerpc
> > > > > >>> prior mentioned commit.
> > > > > >>>
> > > > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not
> > > > > >>> contain mentioned commit) to new LTS versions brings a
> > > > > >>> regression in domain assignment.
> > > > > >>
> > > > > >> Can you elaborate why it is a regression ?
> > > > > >> 63a72284b159 That commit says 'no functionnal changes', I'm
> > > > > >> having hard time understanding how a nochange can be a
> > > > > >> regression.
> > > > > > 
> > > > > > It is not 'no functional change'. That commit completely changed
> > > > > > PCI domain assignment in a way that is incompatible with other
> > > > > > architectures and also incompatible with the way how it was done
> > > > > > prior that commit.
> > > > > 
> > > > > I agree that the "no functional change" statement is incorrect.
> > > > > However, for most powerpc platforms it ended up being simply a
> > > > > cosmetic behavior change. As far as I can tell there is nothing
> > > > > requiring domain ids to increase montonically from zero or that
> > > > > each architecture is required to use the same domain numbering
> > > > > scheme.
> > > > 
> > > > That is truth. But it looks really suspicious why domains are not
> > > > assigned monotonically. Some scripts / applications are using PCI
> > > > location (domain:bus:dev:func) for remembering PCI device and domain
> > > > change can cause issue for config files. And some (older) applications
> > > > expects existence of domain zero. In systems without hot plug support
> > > > with small number of domains (e.g. 3) it means that there are always
> > > > domains 0, 1 and 2.
> > > > 
> > > > > Its hard to call this a true regression unless it actually broke
> > > > > something. The commit in question has been in the kernel since 4.8
> > > > > which was released over 5 1/2 years ago.
> > > > 
> > > > I agree, it really depends on how you look at it.
> > > > 
> > > > The important is that lot of people are using LTS versions and are
> > > > doing upgrades when LTS support is dropped. Which for 4.4 now
> > > > happened. So not all smaller or "cosmetic" changes could be detected
> > > > until longer LTS period pass.
> > > > 
> > > > > With all that said looking closer at the code in question I think
> > > > > it is fair to assume that the author only intended this change for
> > > > > powernv and pseries platforms and not every powerpc platform. That
> > > > > change was done to make persistent naming easier to manage in
> > > > > userspace.
> > > > 
> > > > I agree that this behavior change may be useful in some situations
> > > > and I do not object this need.
> > > > 
> > > > > Your change defaults back to the old behavior which will now break
> > > > > both powernv and pseries platforms with regard to hotplugging and
> > > > > persistent naming.
> > > > 
> > > > I was aware of it, that change could cause issues. And that is why I
> > > > added config option for choosing behavior. So users would be able to
> > > > choose what they need.
> > > > 
> > > > > We could properly limit it to powernv and pseries by using
> > > > > ibm,fw-phb-id instead of reg property in the look up that follows
> > > > > a failed ibm,opal-phbid lookup. I think this is acceptable as long
> > > > > as no other powerpc platforms have started using this behavior for
> > > > > persistent naming.
> > > > 
> > > > And what about setting that new config option to enabled by default
> > > > for those series?
> > > > 
> > > > Or is there issue with introduction of the new config option?
> > > > 
> > > > One of the point is that it is really a good idea to have
> > > > similar/same behavior for all linux platforms. And if it cannot be
> > > > enabled by default (for backward compatibility) add at least some
> > > > option, so new platforms can start using it or users can decide to
> > > > switch behavior.
> > > 
> > > This is a powerpc thing so I'm just kibbitzing a little.
> > > 
> > > This basically looks like a new config option to selectively revert
> > > 63a72284b159.  That seems hard to maintain and doesn't seem like
> > > something that needs to be baked into the kernel at compile-time.
> > > 
> > > The 63a72284b159 commit log says persistent NIC names are tied to PCI
> > > domain/bus/dev/fn addresses, which seems like something we should
> > > discourage because we can't predict PCI addresses in general.  I
> > > assume other platforms typically use udev with MAC addresses or
> > > something?
> > 
> > This is not about ethernet NIC cards only. But affects also WiFi cards
> > (which registers phy dev, not netdev) and also all other PCIe cards
> > which do not have to be network-based. Hence MAC address or udev does
> > not play role there.
> 
> What persistent naming mechanism do other platforms use in those
> cases?

For example sysfs path which contains domain/bus/dev/fn numbers. And
these numbers were changed in that mentioned commit.

> I forgot to ask before about the actual regression here.  The commit
> log says domain numbers are different, but I don't know the connection
> from there to something failing.  I assume there's some script or
> config file that depends on specific domain numbers?  And that
> dependency is (hopefully) powerpc-specific?
> 
> Bjorn

You assume correct. For example this is the way how OpenWRT handles PCI
devices (but not only OpenWRT). This OpenWRT case is not
powerpc-specific but generic to all architectures. This is just one
example.
Bjorn Helgaas June 9, 2022, 7:34 p.m. UTC | #11
On Thu, Jun 09, 2022 at 08:05:26PM +0200, Pali Rohár wrote:
> On Thursday 09 June 2022 12:10:22 Bjorn Helgaas wrote:
> > On Thu, Jun 09, 2022 at 06:27:25PM +0200, Pali Rohár wrote:
> > > On Thursday 09 June 2022 11:22:55 Bjorn Helgaas wrote:
> > > > [+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), thread:
> > > > https://lore.kernel.org/r/20220504175718.29011-1-pali@kernel.org]
> > > > 
> > > > On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote:
> > > > > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote:
> > > > > > On 5/5/22 02:31, Pali Rohár wrote:
> > > > > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
> > > > > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
> > > > > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB
> > > > > > >>> number based on device-tree properties"), powerpc kernel
> > > > > > >>> always fallback to PCI domain assignment from OF / Device Tree
> > > > > > >>> 'reg' property of the PCI controller.
> > > > > > >>>
> > > > > > >>> PCI code for other Linux architectures use increasing
> > > > > > >>> assignment of the PCI domain for individual controllers
> > > > > > >>> (assign the first free number), like it was also for powerpc
> > > > > > >>> prior mentioned commit.
> > > > > > >>>
> > > > > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not
> > > > > > >>> contain mentioned commit) to new LTS versions brings a
> > > > > > >>> regression in domain assignment.
> > > > > > >>
> > > > > > >> Can you elaborate why it is a regression ?
> > > > > > >> 63a72284b159 That commit says 'no functionnal changes', I'm
> > > > > > >> having hard time understanding how a nochange can be a
> > > > > > >> regression.
> > > > > > > 
> > > > > > > It is not 'no functional change'. That commit completely changed
> > > > > > > PCI domain assignment in a way that is incompatible with other
> > > > > > > architectures and also incompatible with the way how it was done
> > > > > > > prior that commit.
> > > > > > 
> > > > > > I agree that the "no functional change" statement is incorrect.
> > > > > > However, for most powerpc platforms it ended up being simply a
> > > > > > cosmetic behavior change. As far as I can tell there is nothing
> > > > > > requiring domain ids to increase montonically from zero or that
> > > > > > each architecture is required to use the same domain numbering
> > > > > > scheme.
> > > > > 
> > > > > That is truth. But it looks really suspicious why domains are not
> > > > > assigned monotonically. Some scripts / applications are using PCI
> > > > > location (domain:bus:dev:func) for remembering PCI device and domain
> > > > > change can cause issue for config files. And some (older) applications
> > > > > expects existence of domain zero. In systems without hot plug support
> > > > > with small number of domains (e.g. 3) it means that there are always
> > > > > domains 0, 1 and 2.
> > > > > 
> > > > > > Its hard to call this a true regression unless it actually broke
> > > > > > something. The commit in question has been in the kernel since 4.8
> > > > > > which was released over 5 1/2 years ago.
> > > > > 
> > > > > I agree, it really depends on how you look at it.
> > > > > 
> > > > > The important is that lot of people are using LTS versions and are
> > > > > doing upgrades when LTS support is dropped. Which for 4.4 now
> > > > > happened. So not all smaller or "cosmetic" changes could be detected
> > > > > until longer LTS period pass.
> > > > > 
> > > > > > With all that said looking closer at the code in question I think
> > > > > > it is fair to assume that the author only intended this change for
> > > > > > powernv and pseries platforms and not every powerpc platform. That
> > > > > > change was done to make persistent naming easier to manage in
> > > > > > userspace.
> > > > > 
> > > > > I agree that this behavior change may be useful in some situations
> > > > > and I do not object this need.
> > > > > 
> > > > > > Your change defaults back to the old behavior which will now break
> > > > > > both powernv and pseries platforms with regard to hotplugging and
> > > > > > persistent naming.
> > > > > 
> > > > > I was aware of it, that change could cause issues. And that is why I
> > > > > added config option for choosing behavior. So users would be able to
> > > > > choose what they need.
> > > > > 
> > > > > > We could properly limit it to powernv and pseries by using
> > > > > > ibm,fw-phb-id instead of reg property in the look up that follows
> > > > > > a failed ibm,opal-phbid lookup. I think this is acceptable as long
> > > > > > as no other powerpc platforms have started using this behavior for
> > > > > > persistent naming.
> > > > > 
> > > > > And what about setting that new config option to enabled by default
> > > > > for those series?
> > > > > 
> > > > > Or is there issue with introduction of the new config option?
> > > > > 
> > > > > One of the point is that it is really a good idea to have
> > > > > similar/same behavior for all linux platforms. And if it cannot be
> > > > > enabled by default (for backward compatibility) add at least some
> > > > > option, so new platforms can start using it or users can decide to
> > > > > switch behavior.
> > > > 
> > > > This is a powerpc thing so I'm just kibbitzing a little.
> > > > 
> > > > This basically looks like a new config option to selectively revert
> > > > 63a72284b159.  That seems hard to maintain and doesn't seem like
> > > > something that needs to be baked into the kernel at compile-time.
> > > > 
> > > > The 63a72284b159 commit log says persistent NIC names are tied to PCI
> > > > domain/bus/dev/fn addresses, which seems like something we should
> > > > discourage because we can't predict PCI addresses in general.  I
> > > > assume other platforms typically use udev with MAC addresses or
> > > > something?
> > > 
> > > This is not about ethernet NIC cards only. But affects also WiFi cards
> > > (which registers phy dev, not netdev) and also all other PCIe cards
> > > which do not have to be network-based. Hence MAC address or udev does
> > > not play role there.
> > 
> > What persistent naming mechanism do other platforms use in those
> > cases?
> 
> For example sysfs path which contains domain/bus/dev/fn numbers. And
> these numbers were changed in that mentioned commit.
> 
> > I forgot to ask before about the actual regression here.  The commit
> > log says domain numbers are different, but I don't know the connection
> > from there to something failing.  I assume there's some script or
> > config file that depends on specific domain numbers?  And that
> > dependency is (hopefully) powerpc-specific?
> 
> You assume correct. For example this is the way how OpenWRT handles PCI
> devices (but not only OpenWRT). This OpenWRT case is not
> powerpc-specific but generic to all architectures. This is just one
> example.

So basically everybody uses D/b/d/f for persistent names.  That's ...
well, somewhat stable for things soldered down or in a motherboard
slot, but a terrible idea for things that can be hot-plugged.

Even for more core things, it's possible for firmware to change bus
numbering between boots.  For example, if a complicated hierarchy is
cold-plugged into one slot, firmware is likely to assign different bus
numbers on the next boot to make room for it.  Obviously this can also
happen as a hot-add, and Linux needs the flexibility to do similar
renumbering then, although we don't support it yet.

It looks like 63a72284b159 was intended to make domain numbers *more*
consistent, so it's ironic that this actually broke something by
changing domain numbers.  Maybe there's a way to limit the scope of
63a72284b159 so it avoids the breakage.  I don't know enough about the
powerpc landscape to even guess at how.

Bjorn
Pali Rohár June 9, 2022, 7:41 p.m. UTC | #12
On Thursday 09 June 2022 14:34:51 Bjorn Helgaas wrote:
> On Thu, Jun 09, 2022 at 08:05:26PM +0200, Pali Rohár wrote:
> > On Thursday 09 June 2022 12:10:22 Bjorn Helgaas wrote:
> > > On Thu, Jun 09, 2022 at 06:27:25PM +0200, Pali Rohár wrote:
> > > > On Thursday 09 June 2022 11:22:55 Bjorn Helgaas wrote:
> > > > > [+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), thread:
> > > > > https://lore.kernel.org/r/20220504175718.29011-1-pali@kernel.org]
> > > > > 
> > > > > On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote:
> > > > > > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote:
> > > > > > > On 5/5/22 02:31, Pali Rohár wrote:
> > > > > > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
> > > > > > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
> > > > > > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB
> > > > > > > >>> number based on device-tree properties"), powerpc kernel
> > > > > > > >>> always fallback to PCI domain assignment from OF / Device Tree
> > > > > > > >>> 'reg' property of the PCI controller.
> > > > > > > >>>
> > > > > > > >>> PCI code for other Linux architectures use increasing
> > > > > > > >>> assignment of the PCI domain for individual controllers
> > > > > > > >>> (assign the first free number), like it was also for powerpc
> > > > > > > >>> prior mentioned commit.
> > > > > > > >>>
> > > > > > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not
> > > > > > > >>> contain mentioned commit) to new LTS versions brings a
> > > > > > > >>> regression in domain assignment.
> > > > > > > >>
> > > > > > > >> Can you elaborate why it is a regression ?
> > > > > > > >> 63a72284b159 That commit says 'no functionnal changes', I'm
> > > > > > > >> having hard time understanding how a nochange can be a
> > > > > > > >> regression.
> > > > > > > > 
> > > > > > > > It is not 'no functional change'. That commit completely changed
> > > > > > > > PCI domain assignment in a way that is incompatible with other
> > > > > > > > architectures and also incompatible with the way how it was done
> > > > > > > > prior that commit.
> > > > > > > 
> > > > > > > I agree that the "no functional change" statement is incorrect.
> > > > > > > However, for most powerpc platforms it ended up being simply a
> > > > > > > cosmetic behavior change. As far as I can tell there is nothing
> > > > > > > requiring domain ids to increase montonically from zero or that
> > > > > > > each architecture is required to use the same domain numbering
> > > > > > > scheme.
> > > > > > 
> > > > > > That is truth. But it looks really suspicious why domains are not
> > > > > > assigned monotonically. Some scripts / applications are using PCI
> > > > > > location (domain:bus:dev:func) for remembering PCI device and domain
> > > > > > change can cause issue for config files. And some (older) applications
> > > > > > expects existence of domain zero. In systems without hot plug support
> > > > > > with small number of domains (e.g. 3) it means that there are always
> > > > > > domains 0, 1 and 2.
> > > > > > 
> > > > > > > Its hard to call this a true regression unless it actually broke
> > > > > > > something. The commit in question has been in the kernel since 4.8
> > > > > > > which was released over 5 1/2 years ago.
> > > > > > 
> > > > > > I agree, it really depends on how you look at it.
> > > > > > 
> > > > > > The important is that lot of people are using LTS versions and are
> > > > > > doing upgrades when LTS support is dropped. Which for 4.4 now
> > > > > > happened. So not all smaller or "cosmetic" changes could be detected
> > > > > > until longer LTS period pass.
> > > > > > 
> > > > > > > With all that said looking closer at the code in question I think
> > > > > > > it is fair to assume that the author only intended this change for
> > > > > > > powernv and pseries platforms and not every powerpc platform. That
> > > > > > > change was done to make persistent naming easier to manage in
> > > > > > > userspace.
> > > > > > 
> > > > > > I agree that this behavior change may be useful in some situations
> > > > > > and I do not object this need.
> > > > > > 
> > > > > > > Your change defaults back to the old behavior which will now break
> > > > > > > both powernv and pseries platforms with regard to hotplugging and
> > > > > > > persistent naming.
> > > > > > 
> > > > > > I was aware of it, that change could cause issues. And that is why I
> > > > > > added config option for choosing behavior. So users would be able to
> > > > > > choose what they need.
> > > > > > 
> > > > > > > We could properly limit it to powernv and pseries by using
> > > > > > > ibm,fw-phb-id instead of reg property in the look up that follows
> > > > > > > a failed ibm,opal-phbid lookup. I think this is acceptable as long
> > > > > > > as no other powerpc platforms have started using this behavior for
> > > > > > > persistent naming.
> > > > > > 
> > > > > > And what about setting that new config option to enabled by default
> > > > > > for those series?
> > > > > > 
> > > > > > Or is there issue with introduction of the new config option?
> > > > > > 
> > > > > > One of the point is that it is really a good idea to have
> > > > > > similar/same behavior for all linux platforms. And if it cannot be
> > > > > > enabled by default (for backward compatibility) add at least some
> > > > > > option, so new platforms can start using it or users can decide to
> > > > > > switch behavior.
> > > > > 
> > > > > This is a powerpc thing so I'm just kibbitzing a little.
> > > > > 
> > > > > This basically looks like a new config option to selectively revert
> > > > > 63a72284b159.  That seems hard to maintain and doesn't seem like
> > > > > something that needs to be baked into the kernel at compile-time.
> > > > > 
> > > > > The 63a72284b159 commit log says persistent NIC names are tied to PCI
> > > > > domain/bus/dev/fn addresses, which seems like something we should
> > > > > discourage because we can't predict PCI addresses in general.  I
> > > > > assume other platforms typically use udev with MAC addresses or
> > > > > something?
> > > > 
> > > > This is not about ethernet NIC cards only. But affects also WiFi cards
> > > > (which registers phy dev, not netdev) and also all other PCIe cards
> > > > which do not have to be network-based. Hence MAC address or udev does
> > > > not play role there.
> > > 
> > > What persistent naming mechanism do other platforms use in those
> > > cases?
> > 
> > For example sysfs path which contains domain/bus/dev/fn numbers. And
> > these numbers were changed in that mentioned commit.
> > 
> > > I forgot to ask before about the actual regression here.  The commit
> > > log says domain numbers are different, but I don't know the connection
> > > from there to something failing.  I assume there's some script or
> > > config file that depends on specific domain numbers?  And that
> > > dependency is (hopefully) powerpc-specific?
> > 
> > You assume correct. For example this is the way how OpenWRT handles PCI
> > devices (but not only OpenWRT). This OpenWRT case is not
> > powerpc-specific but generic to all architectures. This is just one
> > example.
> 
> So basically everybody uses D/b/d/f for persistent names.  That's ...
> well, somewhat stable for things soldered down or in a motherboard
> slot, but a terrible idea for things that can be hot-plugged.

I agree!

> Even for more core things, it's possible for firmware to change bus
> numbering between boots.  For example, if a complicated hierarchy is
> cold-plugged into one slot, firmware is likely to assign different bus
> numbers on the next boot to make room for it.  Obviously this can also
> happen as a hot-add, and Linux needs the flexibility to do similar
> renumbering then, although we don't support it yet.

Yes.

But in soldered design where you have just 3 devices without hotplug or
any complicated topology then all those issues do not apply.

That is why I wrote that in some scenario makes sense new behavior (e.g.
big hot plug system or system with complicated topology with many
bridges) and in some other scenario makes sense old behavior (e.g.
simple soldered PCIe devices on board). And that is why I proposed
config option, so everybody can choose what is better and better fits
for chosen design.

> It looks like 63a72284b159 was intended to make domain numbers *more*
> consistent, so it's ironic that this actually broke something by
> changing domain numbers.  Maybe there's a way to limit the scope of
> 63a72284b159 so it avoids the breakage.  I don't know enough about the
> powerpc landscape to even guess at how.
> 
> Bjorn
Guilherme G. Piccoli June 9, 2022, 8:21 p.m. UTC | #13
First of all, thanks for looping me Bjorn! Much appreciated.
I'm also CCing Ben and Gavin, that know a lot of PPC PCI stuff.


On 09/06/2022 16:34, Bjorn Helgaas wrote:
> [...]
>>>>>>>>>> Upgrading powerpc kernels from LTS 4.4 version (which does not
>>>>>>>>>> contain mentioned commit) to new LTS versions brings a
>>>>>>>>>> regression in domain assignment.
>>>>>>>>>
>>>>>>>>> Can you elaborate why it is a regression ?
>>>>>>>>> 63a72284b159 That commit says 'no functionnal changes', I'm
>>>>>>>>> having hard time understanding how a nochange can be a
>>>>>>>>> regression.
>>>>>>>>
>>>>>>>> It is not 'no functional change'. That commit completely changed
>>>>>>>> PCI domain assignment in a way that is incompatible with other
>>>>>>>> architectures and also incompatible with the way how it was done
>>>>>>>> prior that commit.
>>>>>>>
>>>>>>> I agree that the "no functional change" statement is incorrect.
>>>>>>> However, for most powerpc platforms it ended up being simply a
>>>>>>> cosmetic behavior change. As far as I can tell there is nothing
>>>>>>> requiring domain ids to increase montonically from zero or that
>>>>>>> each architecture is required to use the same domain numbering
>>>>>>> scheme.

Strongly agree here in both points: first, this was not a "no functional
change" thing, and I apologize for adding this in the commit message.
What I meant is that: despite changing the numbering, (as Tyrel said)
nothing should require increasing monotonic mutable PCI domains. At
least, I'm not aware of such requirement in any spec or even in the
kernel and adjacent tooling.


>>>>>> [...]
>>>>>>> We could properly limit it to powernv and pseries by using
>>>>>>> ibm,fw-phb-id instead of reg property in the look up that follows
>>>>>>> a failed ibm,opal-phbid lookup. I think this is acceptable as long
>>>>>>> as no other powerpc platforms have started using this behavior for
>>>>>>> persistent naming.
>>>>>>
>>>>>> And what about setting that new config option to enabled by default
>>>>>> for those series?

I don't remember all the details about PPC dt, but it should already be
restricted to pseries/powernv, right? At least, the commit has a comment:

/* If not pseries nor powernv, or if fixed PHB numbering tried to add
 * the same PHB number twice, then fallback to dynamic PHB numbering.*/

If this is *not* restricted to these 2 platforms, I agree with Pali's
approach, although I'd consider the correct is to keep the persistent
domain scheme for both pnv and pseries, as it's working like this for 5
years and counting, and this *does* prevent a lot of issues with PCI
hotplugging in PPC servers.


>> [...]
>>> I forgot to ask before about the actual regression here.  The commit
>>> log says domain numbers are different, but I don't know the connection
>>> from there to something failing.  I assume there's some script or
>>> config file that depends on specific domain numbers?  And that
>>> dependency is (hopefully) powerpc-specific?
>>
>> You assume correct. For example this is the way how OpenWRT handles PCI
>> devices (but not only OpenWRT). This OpenWRT case is not
>> powerpc-specific but generic to all architectures. This is just one
>> example.
> 
> So basically everybody uses D/b/d/f for persistent names.  That's ...
> well, somewhat stable for things soldered down or in a motherboard
> slot, but a terrible idea for things that can be hot-plugged.
> 
> Even for more core things, it's possible for firmware to change bus
> numbering between boots.  For example, if a complicated hierarchy is
> cold-plugged into one slot, firmware is likely to assign different bus
> numbers on the next boot to make room for it.  Obviously this can also
> happen as a hot-add, and Linux needs the flexibility to do similar
> renumbering then, although we don't support it yet.
> 
> It looks like 63a72284b159 was intended to make domain numbers *more*
> consistent, so it's ironic that this actually broke something by
> changing domain numbers.  Maybe there's a way to limit the scope of
> 63a72284b159 so it avoids the breakage.  I don't know enough about the
> powerpc landscape to even guess at how.

I don't considereit breaks the userspace since this is definitely no
stable ABI (or if it is, I'm not aware and my apologies). If scripts
rely on that, they are doing the wrong thing it seems.

With that said, I'm definitely not against improving the situation with
Pali's KConfig - just think that we somehow should keep the persistent
behavior for powernv and pseries.
Hopefully PPC folks has more to say on that!
Cheers,


Guilherme
Tyrel Datwyler June 9, 2022, 8:50 p.m. UTC | #14
On 6/9/22 13:21, Guilherme G. Piccoli wrote:
> First of all, thanks for looping me Bjorn! Much appreciated.
> I'm also CCing Ben and Gavin, that know a lot of PPC PCI stuff.
> 
> 
> On 09/06/2022 16:34, Bjorn Helgaas wrote:
>> [...]
>>>>>>>>>>> Upgrading powerpc kernels from LTS 4.4 version (which does not
>>>>>>>>>>> contain mentioned commit) to new LTS versions brings a
>>>>>>>>>>> regression in domain assignment.
>>>>>>>>>>
>>>>>>>>>> Can you elaborate why it is a regression ?
>>>>>>>>>> 63a72284b159 That commit says 'no functionnal changes', I'm
>>>>>>>>>> having hard time understanding how a nochange can be a
>>>>>>>>>> regression.
>>>>>>>>>
>>>>>>>>> It is not 'no functional change'. That commit completely changed
>>>>>>>>> PCI domain assignment in a way that is incompatible with other
>>>>>>>>> architectures and also incompatible with the way how it was done
>>>>>>>>> prior that commit.
>>>>>>>>
>>>>>>>> I agree that the "no functional change" statement is incorrect.
>>>>>>>> However, for most powerpc platforms it ended up being simply a
>>>>>>>> cosmetic behavior change. As far as I can tell there is nothing
>>>>>>>> requiring domain ids to increase montonically from zero or that
>>>>>>>> each architecture is required to use the same domain numbering
>>>>>>>> scheme.
> 
> Strongly agree here in both points: first, this was not a "no functional
> change" thing, and I apologize for adding this in the commit message.
> What I meant is that: despite changing the numbering, (as Tyrel said)
> nothing should require increasing monotonic mutable PCI domains. At
> least, I'm not aware of such requirement in any spec or even in the
> kernel and adjacent tooling.
> 
> 
>>>>>>> [...]
>>>>>>>> We could properly limit it to powernv and pseries by using
>>>>>>>> ibm,fw-phb-id instead of reg property in the look up that follows
>>>>>>>> a failed ibm,opal-phbid lookup. I think this is acceptable as long
>>>>>>>> as no other powerpc platforms have started using this behavior for
>>>>>>>> persistent naming.
>>>>>>>
>>>>>>> And what about setting that new config option to enabled by default
>>>>>>> for those series?
> 
> I don't remember all the details about PPC dt, but it should already be
> restricted to pseries/powernv, right? At least, the commit has a comment:
> 
> /* If not pseries nor powernv, or if fixed PHB numbering tried to add
>  * the same PHB number twice, then fallback to dynamic PHB numbering.*/
> 
> If this is *not* restricted to these 2 platforms, I agree with Pali's
> approach, although I'd consider the correct is to keep the persistent
> domain scheme for both pnv and pseries, as it's working like this for 5
> years and counting, and this *does* prevent a lot of issues with PCI
> hotplugging in PPC servers.

I mentioned this in a previous post, but it is clear the Author's intent was for
this only to apply to powernv and pseries platforms. However, it only really
checks for powernv, and if that fails it does a read the reg property for the
domain which works for and PPC platform. If we really only want this on powernv
and pseries and revert all other PPC platforms back we can fix this with a
pseries check instead of a config property. Using ibm,fw-phb-id instead of reg
property if ibm,opal-phbid lookup fails does the trick.

-Tyrel

> 
> 
>>> [...]
>>>> I forgot to ask before about the actual regression here.  The commit
>>>> log says domain numbers are different, but I don't know the connection
>>>> from there to something failing.  I assume there's some script or
>>>> config file that depends on specific domain numbers?  And that
>>>> dependency is (hopefully) powerpc-specific?
>>>
>>> You assume correct. For example this is the way how OpenWRT handles PCI
>>> devices (but not only OpenWRT). This OpenWRT case is not
>>> powerpc-specific but generic to all architectures. This is just one
>>> example.
>>
>> So basically everybody uses D/b/d/f for persistent names.  That's ...
>> well, somewhat stable for things soldered down or in a motherboard
>> slot, but a terrible idea for things that can be hot-plugged.
>>
>> Even for more core things, it's possible for firmware to change bus
>> numbering between boots.  For example, if a complicated hierarchy is
>> cold-plugged into one slot, firmware is likely to assign different bus
>> numbers on the next boot to make room for it.  Obviously this can also
>> happen as a hot-add, and Linux needs the flexibility to do similar
>> renumbering then, although we don't support it yet.
>>
>> It looks like 63a72284b159 was intended to make domain numbers *more*
>> consistent, so it's ironic that this actually broke something by
>> changing domain numbers.  Maybe there's a way to limit the scope of
>> 63a72284b159 so it avoids the breakage.  I don't know enough about the
>> powerpc landscape to even guess at how.
> 
> I don't considereit breaks the userspace since this is definitely no
> stable ABI (or if it is, I'm not aware and my apologies). If scripts
> rely on that, they are doing the wrong thing it seems.
> 
> With that said, I'm definitely not against improving the situation with
> Pali's KConfig - just think that we somehow should keep the persistent
> behavior for powernv and pseries.
> Hopefully PPC folks has more to say on that!
> Cheers,
> 
> 
> Guilherme
Michael Ellerman June 10, 2022, 7:33 a.m. UTC | #15
Pali Rohár <pali@kernel.org> writes:
> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on
> device-tree properties"), powerpc kernel always fallback to PCI domain
> assignment from OF / Device Tree 'reg' property of the PCI controller.
>
> PCI code for other Linux architectures use increasing assignment of the PCI
> domain for individual controllers (assign the first free number), like it
> was also for powerpc prior mentioned commit.
>
> Upgrading powerpc kernels from LTS 4.4 version (which does not contain
> mentioned commit) to new LTS versions brings a regression in domain
> assignment.

I'm sorry this broke your system. But I don't really consider it a
regression, the kernel provides no guarantee about the PCI domain
numbering across LTS releases.

Prior to the change the numbering was just based on the order the PHBs
were discovered in the device tree, which is not robust. A cosmetic
refactor of the device tree source could cause PHBs to be discovered in
a different order.

Similarly a change in firmware PCI discovery or device tree generation
could cause the numbering to change.

If you have scripts that are looking for certain devices they can use
the vendor/device fields in sysfs to find the actual devices they want,
not just whatever happens to be at 0000:01:00.0.

> Fix this issue by introducing a new option CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG
> When this options is disabled then powerpc kernel would assign PCI domains
> in the similar way like it is doing kernel for other architectures and also
> how it was done prior that commit.

I really don't want a config option for that.

There is a device tree property "linux,pci-domain", described in
Documentation/devicetree/bindings/pci/pci.txt.

Can you try adding that to your device tree and updating
get_phb_number() to look for it?

cheers
Pali Rohár June 10, 2022, 7:35 a.m. UTC | #16
On Friday 10 June 2022 17:33:32 Michael Ellerman wrote:
> If you have scripts that are looking for certain devices they can use
> the vendor/device fields in sysfs to find the actual devices they want,
> not just whatever happens to be at 0000:01:00.0.

This does not work if you have more cards with same vendor+device ids in system.
Pali Rohár July 6, 2022, 10:23 a.m. UTC | #17
On Friday 10 June 2022 17:33:32 Michael Ellerman wrote:
> Pali Rohár <pali@kernel.org> writes:
> > Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on
> > device-tree properties"), powerpc kernel always fallback to PCI domain
> > assignment from OF / Device Tree 'reg' property of the PCI controller.
> >
> > PCI code for other Linux architectures use increasing assignment of the PCI
> > domain for individual controllers (assign the first free number), like it
> > was also for powerpc prior mentioned commit.
> >
> > Upgrading powerpc kernels from LTS 4.4 version (which does not contain
> > mentioned commit) to new LTS versions brings a regression in domain
> > assignment.
> 
> I'm sorry this broke your system. But I don't really consider it a
> regression, the kernel provides no guarantee about the PCI domain
> numbering across LTS releases.
> 
> Prior to the change the numbering was just based on the order the PHBs
> were discovered in the device tree, which is not robust. A cosmetic
> refactor of the device tree source could cause PHBs to be discovered in
> a different order.
> 
> Similarly a change in firmware PCI discovery or device tree generation
> could cause the numbering to change.
> 
> If you have scripts that are looking for certain devices they can use
> the vendor/device fields in sysfs to find the actual devices they want,
> not just whatever happens to be at 0000:01:00.0.
> 
> > Fix this issue by introducing a new option CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG
> > When this options is disabled then powerpc kernel would assign PCI domains
> > in the similar way like it is doing kernel for other architectures and also
> > how it was done prior that commit.
> 
> I really don't want a config option for that.
> 
> There is a device tree property "linux,pci-domain", described in
> Documentation/devicetree/bindings/pci/pci.txt.
> 
> Can you try adding that to your device tree and updating
> get_phb_number() to look for it?
> 
> cheers

I sent another proposal in V2, now also with "linux,pci-domain" support.
diff mbox series

Patch

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 174edabb74fa..4dd3e3acddda 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -375,6 +375,16 @@  config PPC_OF_PLATFORM_PCI
 	depends on PCI
 	depends on PPC64 # not supported on 32 bits yet
 
+config PPC_PCI_DOMAIN_FROM_OF_REG
+	bool "Use OF reg property for PCI domain"
+	depends on PCI
+	help
+	  By default PCI domain for host bridge during its registration is
+	  chosen as the lowest unused PCI domain number.
+
+	  When this option is enabled then PCI domain is determined from
+	  the OF / Device Tree 'reg' property.
+
 config ARCH_SUPPORTS_UPROBES
 	def_bool y
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 8bc9cf62cd93..8cb6fc5302ae 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -74,7 +74,6 @@  void __init set_pci_dma_ops(const struct dma_map_ops *dma_ops)
 static int get_phb_number(struct device_node *dn)
 {
 	int ret, phb_id = -1;
-	u32 prop_32;
 	u64 prop;
 
 	/*
@@ -83,7 +82,8 @@  static int get_phb_number(struct device_node *dn)
 	 * reading "ibm,opal-phbid", only present in OPAL environment.
 	 */
 	ret = of_property_read_u64(dn, "ibm,opal-phbid", &prop);
-	if (ret) {
+	if (ret && IS_ENABLED(CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG)) {
+		u32 prop_32;
 		ret = of_property_read_u32_index(dn, "reg", 1, &prop_32);
 		prop = prop_32;
 	}