diff mbox series

PCI/sysfs: Fix race in pci sysfs creation

Message ID 1702093576-30405-1-git-send-email-ssengar@linux.microsoft.com
State New
Headers show
Series PCI/sysfs: Fix race in pci sysfs creation | expand

Commit Message

Saurabh Singh Sengar Dec. 9, 2023, 3:46 a.m. UTC
Currently there is a race in calling pci_create_resource_files function
from two different therads, first therad is triggered by pci_sysfs_init
from the late initcall where as the second thread is initiated by
pci_bus_add_devices from the respective PCI drivers probe.

The synchronization between these threads relies on the sysfs_initialized
flag. However, in pci_sysfs_init, sysfs_initialized is set right before
calling pci_create_resource_files which is wrong as it can create race
condition with pci_bus_add_devices threads. Fix this by setting
sysfs_initialized flag at the end of pci_sysfs_init and direecly call the
pci_create_resource_files function from it.

There can be an additional case where driver probe is so delayed that
pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init.
In such cases, attempting to access already existing sysfs resources is
unnecessary. Fix this by adding a check for sysfs attributes and return
if they are already allocated.

In both cases, the consequence will be the removal of sysfs resources that
were appropriately allocated by pci_sysfs_init following the warning below.

[    3.376688] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0001-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0'
[    3.385103] CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure #53~20.04.1-Ubuntu
[    3.389585] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[    3.394663] Workqueue: events_unbound async_run_entry_fn
[    3.397687] Call Trace:
[    3.399312]  <TASK>
[    3.400780]  dump_stack_lvl+0x38/0x4d
[    3.402998]  dump_stack+0x10/0x16
[    3.406050]  sysfs_warn_dup.cold+0x17/0x2b
[    3.408476]  sysfs_add_file_mode_ns+0x17b/0x190
[    3.411072]  sysfs_create_bin_file+0x64/0x90
[    3.413514]  pci_create_attr+0xc7/0x260
[    3.415827]  pci_create_resource_files+0x6f/0x150
[    3.418455]  pci_create_sysfs_dev_files+0x18/0x30
[    3.421136]  pci_bus_add_device+0x30/0x70
[    3.423512]  pci_bus_add_devices+0x31/0x70
[    3.425958]  hv_pci_probe+0x4ce/0x640
[    3.428106]  vmbus_probe+0x67/0x90
[    3.430121]  really_probe.part.0+0xcb/0x380
[    3.432516]  really_probe+0x40/0x80
[    3.434581]  __driver_probe_device+0xe8/0x140
[    3.437119]  driver_probe_device+0x23/0xb0
[    3.439504]  __driver_attach_async_helper+0x31/0x90
[    3.442296]  async_run_entry_fn+0x33/0x120
[    3.444666]  process_one_work+0x225/0x3d0
[    3.447043]  worker_thread+0x4d/0x3e0
[    3.449233]  ? process_one_work+0x3d0/0x3d0
[    3.451632]  kthread+0x12a/0x150
[    3.453583]  ? set_kthread_struct+0x50/0x50
[    3.456103]  ret_from_fork+0x22/0x30

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
There has been earlier attempts to fix this problem, below are the patches
for reference of these attempts.
1. https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/T/#u
2. https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew.tq-group.com/

Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515

 drivers/pci/pci-sysfs.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Alexander Stein Dec. 12, 2023, 7:19 a.m. UTC | #1
Hi Saurabh,

thanks for the patch.

Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar:
> Currently there is a race in calling pci_create_resource_files function
> from two different therads, first therad is triggered by pci_sysfs_init
> from the late initcall where as the second thread is initiated by
> pci_bus_add_devices from the respective PCI drivers probe.
> 
> The synchronization between these threads relies on the sysfs_initialized
> flag. However, in pci_sysfs_init, sysfs_initialized is set right before
> calling pci_create_resource_files which is wrong as it can create race
> condition with pci_bus_add_devices threads. Fix this by setting
> sysfs_initialized flag at the end of pci_sysfs_init and direecly call the

Small typo here: direecly -> directly

> pci_create_resource_files function from it.
> 
> There can be an additional case where driver probe is so delayed that
> pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init.
> In such cases, attempting to access already existing sysfs resources is
> unnecessary. Fix this by adding a check for sysfs attributes and return
> if they are already allocated.
> 
> In both cases, the consequence will be the removal of sysfs resources that
> were appropriately allocated by pci_sysfs_init following the warning below.

I'm not sure if this is the way to go. Unfortunately I can't trigger this 
error on my imx6 platform at the moment (apparently timing is off).
But reading [1] again, the most expressive way is that pci_bus_add_devices() 
needs to wait until pci_sysfs_init() has passed.

Best regards,
Alexander

[1] https://lore.kernel.org/lkml/a1cca367-52b6-a6b1-fb01-890cad39fd29@suse.com/

> 
> [    3.376688] sysfs: cannot create duplicate filename
> '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-00
> 01-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0' [    3.385103]
> CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure
> #53~20.04.1-Ubuntu [    3.389585] Hardware name: Microsoft Corporation
> Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018 [    3.394663]
> Workqueue: events_unbound async_run_entry_fn
> [    3.397687] Call Trace:
> [    3.399312]  <TASK>
> [    3.400780]  dump_stack_lvl+0x38/0x4d
> [    3.402998]  dump_stack+0x10/0x16
> [    3.406050]  sysfs_warn_dup.cold+0x17/0x2b
> [    3.408476]  sysfs_add_file_mode_ns+0x17b/0x190
> [    3.411072]  sysfs_create_bin_file+0x64/0x90
> [    3.413514]  pci_create_attr+0xc7/0x260
> [    3.415827]  pci_create_resource_files+0x6f/0x150
> [    3.418455]  pci_create_sysfs_dev_files+0x18/0x30
> [    3.421136]  pci_bus_add_device+0x30/0x70
> [    3.423512]  pci_bus_add_devices+0x31/0x70
> [    3.425958]  hv_pci_probe+0x4ce/0x640
> [    3.428106]  vmbus_probe+0x67/0x90
> [    3.430121]  really_probe.part.0+0xcb/0x380
> [    3.432516]  really_probe+0x40/0x80
> [    3.434581]  __driver_probe_device+0xe8/0x140
> [    3.437119]  driver_probe_device+0x23/0xb0
> [    3.439504]  __driver_attach_async_helper+0x31/0x90
> [    3.442296]  async_run_entry_fn+0x33/0x120
> [    3.444666]  process_one_work+0x225/0x3d0
> [    3.447043]  worker_thread+0x4d/0x3e0
> [    3.449233]  ? process_one_work+0x3d0/0x3d0
> [    3.451632]  kthread+0x12a/0x150
> [    3.453583]  ? set_kthread_struct+0x50/0x50
> [    3.456103]  ret_from_fork+0x22/0x30
> 
> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> ---
> There has been earlier attempts to fix this problem, below are the patches
> for reference of these attempts.
> 1.
> https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@
> ew.tq-group.com/T/#u 2.
> https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew.
> tq-group.com/
> 
> Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515
> 
>  drivers/pci/pci-sysfs.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index f2909ae93f2f..a31f6f2cf309 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1230,6 +1230,10 @@ static int pci_create_resource_files(struct pci_dev
> *pdev) if (!pci_resource_len(pdev, i))
>  			continue;
> 
> +		/* Check if resource already allocated and proceed no 
further */
> +		if (pdev->res_attr[i] || pdev->res_attr_wc[i])
> +			return 0;
> +
>  		retval = pci_create_attr(pdev, i, 0);
>  		/* for prefetchable resources, create a WC mappable file 
*/
>  		if (!retval && arch_can_pci_mmap_wc() &&
> @@ -1411,9 +1415,8 @@ static int __init pci_sysfs_init(void)
>  	struct pci_bus *pbus = NULL;
>  	int retval;
> 
> -	sysfs_initialized = 1;
>  	for_each_pci_dev(pdev) {
> -		retval = pci_create_sysfs_dev_files(pdev);
> +		retval = pci_create_resource_files(pdev);
>  		if (retval) {
>  			pci_dev_put(pdev);
>  			return retval;
> @@ -1423,6 +1426,8 @@ static int __init pci_sysfs_init(void)
>  	while ((pbus = pci_find_next_bus(pbus)))
>  		pci_create_legacy_files(pbus);
> 
> +	sysfs_initialized = 1;
> +
>  	return 0;
>  }
>  late_initcall(pci_sysfs_init);
Saurabh Singh Sengar Dec. 12, 2023, 8:21 a.m. UTC | #2
On Tue, Dec 12, 2023 at 08:19:11AM +0100, Alexander Stein wrote:
> Hi Saurabh,
> 
> thanks for the patch.
> 
> Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar:
> > Currently there is a race in calling pci_create_resource_files function
> > from two different therads, first therad is triggered by pci_sysfs_init
> > from the late initcall where as the second thread is initiated by
> > pci_bus_add_devices from the respective PCI drivers probe.
> > 
> > The synchronization between these threads relies on the sysfs_initialized
> > flag. However, in pci_sysfs_init, sysfs_initialized is set right before
> > calling pci_create_resource_files which is wrong as it can create race
> > condition with pci_bus_add_devices threads. Fix this by setting
> > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the
> 
> Small typo here: direecly -> directly

thanks

> 
> > pci_create_resource_files function from it.
> > 
> > There can be an additional case where driver probe is so delayed that
> > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init.
> > In such cases, attempting to access already existing sysfs resources is
> > unnecessary. Fix this by adding a check for sysfs attributes and return
> > if they are already allocated.
> > 
> > In both cases, the consequence will be the removal of sysfs resources that
> > were appropriately allocated by pci_sysfs_init following the warning below.
> 
> I'm not sure if this is the way to go. Unfortunately I can't trigger this 
> error on my imx6 platform at the moment (apparently timing is off).

The first case in the commit message is the issue which motivated me to write
this patch. The additional case I am explaining in the commit message is not
happening for me as well, I have hacked my driver to add a big sleep (10 second)
before pci_bus_add_devices to create this scenario. Probably you can try the
same as well.

The check added for "resource already allocated" is for this additional case only.

> But reading [1] again, the most expressive way is that pci_bus_add_devices() 
> needs to wait until pci_sysfs_init() has passed.

For the first case I agree with you. This patch is doing exactly the same by moving
sysfs_initialized flag setting at the end of pci_sysfs_init function.
Is there anything I might be ovelooking ?

- Saurabh


> 
> Best regards,
> Alexander
> 
> [1] https://lore.kernel.org/lkml/a1cca367-52b6-a6b1-fb01-890cad39fd29@suse.com/
> 
> > 
> > [    3.376688] sysfs: cannot create duplicate filename
> > '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-00
> > 01-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0' [    3.385103]
> > CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure
> > #53~20.04.1-Ubuntu [    3.389585] Hardware name: Microsoft Corporation
> > Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018 [    3.394663]
> > Workqueue: events_unbound async_run_entry_fn
> > [    3.397687] Call Trace:
> > [    3.399312]  <TASK>
> > [    3.400780]  dump_stack_lvl+0x38/0x4d
> > [    3.402998]  dump_stack+0x10/0x16
> > [    3.406050]  sysfs_warn_dup.cold+0x17/0x2b
> > [    3.408476]  sysfs_add_file_mode_ns+0x17b/0x190
> > [    3.411072]  sysfs_create_bin_file+0x64/0x90
> > [    3.413514]  pci_create_attr+0xc7/0x260
> > [    3.415827]  pci_create_resource_files+0x6f/0x150
> > [    3.418455]  pci_create_sysfs_dev_files+0x18/0x30
> > [    3.421136]  pci_bus_add_device+0x30/0x70
> > [    3.423512]  pci_bus_add_devices+0x31/0x70
> > [    3.425958]  hv_pci_probe+0x4ce/0x640
> > [    3.428106]  vmbus_probe+0x67/0x90
> > [    3.430121]  really_probe.part.0+0xcb/0x380
> > [    3.432516]  really_probe+0x40/0x80
> > [    3.434581]  __driver_probe_device+0xe8/0x140
> > [    3.437119]  driver_probe_device+0x23/0xb0
> > [    3.439504]  __driver_attach_async_helper+0x31/0x90
> > [    3.442296]  async_run_entry_fn+0x33/0x120
> > [    3.444666]  process_one_work+0x225/0x3d0
> > [    3.447043]  worker_thread+0x4d/0x3e0
> > [    3.449233]  ? process_one_work+0x3d0/0x3d0
> > [    3.451632]  kthread+0x12a/0x150
> > [    3.453583]  ? set_kthread_struct+0x50/0x50
> > [    3.456103]  ret_from_fork+0x22/0x30
> > 
> > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > ---
> > There has been earlier attempts to fix this problem, below are the patches
> > for reference of these attempts.
> > 1.
> > https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@
> > ew.tq-group.com/T/#u 2.
> > https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew.
> > tq-group.com/
> > 
> > Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515
> > 
> >  drivers/pci/pci-sysfs.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index f2909ae93f2f..a31f6f2cf309 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -1230,6 +1230,10 @@ static int pci_create_resource_files(struct pci_dev
> > *pdev) if (!pci_resource_len(pdev, i))
> >  			continue;
> > 
> > +		/* Check if resource already allocated and proceed no 
> further */
> > +		if (pdev->res_attr[i] || pdev->res_attr_wc[i])
> > +			return 0;
> > +
> >  		retval = pci_create_attr(pdev, i, 0);
> >  		/* for prefetchable resources, create a WC mappable file 
> */
> >  		if (!retval && arch_can_pci_mmap_wc() &&
> > @@ -1411,9 +1415,8 @@ static int __init pci_sysfs_init(void)
> >  	struct pci_bus *pbus = NULL;
> >  	int retval;
> > 
> > -	sysfs_initialized = 1;
> >  	for_each_pci_dev(pdev) {
> > -		retval = pci_create_sysfs_dev_files(pdev);
> > +		retval = pci_create_resource_files(pdev);
> >  		if (retval) {
> >  			pci_dev_put(pdev);
> >  			return retval;
> > @@ -1423,6 +1426,8 @@ static int __init pci_sysfs_init(void)
> >  	while ((pbus = pci_find_next_bus(pbus)))
> >  		pci_create_legacy_files(pbus);
> > 
> > +	sysfs_initialized = 1;
> > +
> >  	return 0;
> >  }
> >  late_initcall(pci_sysfs_init);
> 
> 
> -- 
> TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
> Amtsgericht München, HRB 105018
> Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
> http://www.tq-group.com/
>
Saurabh Singh Sengar Dec. 12, 2023, 8:28 a.m. UTC | #3
On Tue, Dec 12, 2023 at 08:19:11AM +0100, Alexander Stein wrote:
> Hi Saurabh,
> 
> thanks for the patch.
> 
> Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar:
> > Currently there is a race in calling pci_create_resource_files function
> > from two different therads, first therad is triggered by pci_sysfs_init
> > from the late initcall where as the second thread is initiated by
> > pci_bus_add_devices from the respective PCI drivers probe.
> > 
> > The synchronization between these threads relies on the sysfs_initialized
> > flag. However, in pci_sysfs_init, sysfs_initialized is set right before
> > calling pci_create_resource_files which is wrong as it can create race
> > condition with pci_bus_add_devices threads. Fix this by setting
> > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the
> 
> Small typo here: direecly -> directly
> 
> > pci_create_resource_files function from it.
> > 
> > There can be an additional case where driver probe is so delayed that
> > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init.
> > In such cases, attempting to access already existing sysfs resources is
> > unnecessary. Fix this by adding a check for sysfs attributes and return
> > if they are already allocated.
> > 
> > In both cases, the consequence will be the removal of sysfs resources that
> > were appropriately allocated by pci_sysfs_init following the warning below.
> 
> I'm not sure if this is the way to go. Unfortunately I can't trigger this 
> error on my imx6 platform at the moment (apparently timing is off).
> But reading [1] again, the most expressive way is that pci_bus_add_devices() 
> needs to wait until pci_sysfs_init() has passed.

(I correct my self a bit in my earlier reply)
The problem with waiting is that sysfs entries will be created by pci_sysfs_init
already and when pci_bus_add_devices try to create it will observe that the
entries are already existing and in such case PCI code will remove the sysfs
entries created by pci_sysfs_init. Resulting system will be having no sysfs
entries.

- Saurabh
Saurabh Singh Sengar Jan. 4, 2024, 5:38 a.m. UTC | #4
On Tue, Dec 12, 2023 at 12:28:05AM -0800, Saurabh Singh Sengar wrote:
> On Tue, Dec 12, 2023 at 08:19:11AM +0100, Alexander Stein wrote:
> > Hi Saurabh,
> > 
> > thanks for the patch.
> > 
> > Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar:
> > > Currently there is a race in calling pci_create_resource_files function
> > > from two different therads, first therad is triggered by pci_sysfs_init
> > > from the late initcall where as the second thread is initiated by
> > > pci_bus_add_devices from the respective PCI drivers probe.
> > > 
> > > The synchronization between these threads relies on the sysfs_initialized
> > > flag. However, in pci_sysfs_init, sysfs_initialized is set right before
> > > calling pci_create_resource_files which is wrong as it can create race
> > > condition with pci_bus_add_devices threads. Fix this by setting
> > > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the
> > 
> > Small typo here: direecly -> directly
> > 
> > > pci_create_resource_files function from it.
> > > 
> > > There can be an additional case where driver probe is so delayed that
> > > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init.
> > > In such cases, attempting to access already existing sysfs resources is
> > > unnecessary. Fix this by adding a check for sysfs attributes and return
> > > if they are already allocated.
> > > 
> > > In both cases, the consequence will be the removal of sysfs resources that
> > > were appropriately allocated by pci_sysfs_init following the warning below.
> > 
> > I'm not sure if this is the way to go. Unfortunately I can't trigger this 
> > error on my imx6 platform at the moment (apparently timing is off).
> > But reading [1] again, the most expressive way is that pci_bus_add_devices() 
> > needs to wait until pci_sysfs_init() has passed.
> 
> (I correct my self a bit in my earlier reply)
> The problem with waiting is that sysfs entries will be created by pci_sysfs_init
> already and when pci_bus_add_devices try to create it will observe that the
> entries are already existing and in such case PCI code will remove the sysfs
> entries created by pci_sysfs_init. Resulting system will be having no sysfs
> entries.


Hi Alexander,
Have you got time to check this ? Please let me know if you think there is any
concern left with this patch.

- Saurabh
Saurabh Singh Sengar Jan. 20, 2024, 6:41 a.m. UTC | #5
On Wed, Jan 03, 2024 at 09:38:03PM -0800, Saurabh Singh Sengar wrote:
> On Tue, Dec 12, 2023 at 12:28:05AM -0800, Saurabh Singh Sengar wrote:
> > On Tue, Dec 12, 2023 at 08:19:11AM +0100, Alexander Stein wrote:
> > > Hi Saurabh,
> > > 
> > > thanks for the patch.
> > > 
> > > Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar:
> > > > Currently there is a race in calling pci_create_resource_files function
> > > > from two different therads, first therad is triggered by pci_sysfs_init
> > > > from the late initcall where as the second thread is initiated by
> > > > pci_bus_add_devices from the respective PCI drivers probe.
> > > > 
> > > > The synchronization between these threads relies on the sysfs_initialized
> > > > flag. However, in pci_sysfs_init, sysfs_initialized is set right before
> > > > calling pci_create_resource_files which is wrong as it can create race
> > > > condition with pci_bus_add_devices threads. Fix this by setting
> > > > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the
> > > 
> > > Small typo here: direecly -> directly
> > > 
> > > > pci_create_resource_files function from it.
> > > > 
> > > > There can be an additional case where driver probe is so delayed that
> > > > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init.
> > > > In such cases, attempting to access already existing sysfs resources is
> > > > unnecessary. Fix this by adding a check for sysfs attributes and return
> > > > if they are already allocated.
> > > > 
> > > > In both cases, the consequence will be the removal of sysfs resources that
> > > > were appropriately allocated by pci_sysfs_init following the warning below.
> > > 
> > > I'm not sure if this is the way to go. Unfortunately I can't trigger this 
> > > error on my imx6 platform at the moment (apparently timing is off).
> > > But reading [1] again, the most expressive way is that pci_bus_add_devices() 
> > > needs to wait until pci_sysfs_init() has passed.
> > 
> > (I correct my self a bit in my earlier reply)
> > The problem with waiting is that sysfs entries will be created by pci_sysfs_init
> > already and when pci_bus_add_devices try to create it will observe that the
> > entries are already existing and in such case PCI code will remove the sysfs
> > entries created by pci_sysfs_init. Resulting system will be having no sysfs
> > entries.
> 
> 
> Hi Alexander,
> Have you got time to check this ? Please let me know if you think there is any
> concern left with this patch.

Hi PCI Maintainers,

If there is no objection, can we take this patch in ?


Regards,
Saurabh
Saurabh Singh Sengar Feb. 2, 2024, 7:17 a.m. UTC | #6
On Fri, Jan 19, 2024 at 10:41:56PM -0800, Saurabh Singh Sengar wrote:
> On Wed, Jan 03, 2024 at 09:38:03PM -0800, Saurabh Singh Sengar wrote:
> > On Tue, Dec 12, 2023 at 12:28:05AM -0800, Saurabh Singh Sengar wrote:
> > > On Tue, Dec 12, 2023 at 08:19:11AM +0100, Alexander Stein wrote:
> > > > Hi Saurabh,
> > > > 
> > > > thanks for the patch.
> > > > 
> > > > Am Samstag, 9. Dezember 2023, 04:46:16 CET schrieb Saurabh Sengar:
> > > > > Currently there is a race in calling pci_create_resource_files function
> > > > > from two different therads, first therad is triggered by pci_sysfs_init
> > > > > from the late initcall where as the second thread is initiated by
> > > > > pci_bus_add_devices from the respective PCI drivers probe.
> > > > > 
> > > > > The synchronization between these threads relies on the sysfs_initialized
> > > > > flag. However, in pci_sysfs_init, sysfs_initialized is set right before
> > > > > calling pci_create_resource_files which is wrong as it can create race
> > > > > condition with pci_bus_add_devices threads. Fix this by setting
> > > > > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the
> > > > 
> > > > Small typo here: direecly -> directly
> > > > 
> > > > > pci_create_resource_files function from it.
> > > > > 
> > > > > There can be an additional case where driver probe is so delayed that
> > > > > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init.
> > > > > In such cases, attempting to access already existing sysfs resources is
> > > > > unnecessary. Fix this by adding a check for sysfs attributes and return
> > > > > if they are already allocated.
> > > > > 
> > > > > In both cases, the consequence will be the removal of sysfs resources that
> > > > > were appropriately allocated by pci_sysfs_init following the warning below.
> > > > 
> > > > I'm not sure if this is the way to go. Unfortunately I can't trigger this 
> > > > error on my imx6 platform at the moment (apparently timing is off).
> > > > But reading [1] again, the most expressive way is that pci_bus_add_devices() 
> > > > needs to wait until pci_sysfs_init() has passed.
> > > 
> > > (I correct my self a bit in my earlier reply)
> > > The problem with waiting is that sysfs entries will be created by pci_sysfs_init
> > > already and when pci_bus_add_devices try to create it will observe that the
> > > entries are already existing and in such case PCI code will remove the sysfs
> > > entries created by pci_sysfs_init. Resulting system will be having no sysfs
> > > entries.
> > 
> > 
> > Hi Alexander,
> > Have you got time to check this ? Please let me know if you think there is any
> > concern left with this patch.
> 
> Hi PCI Maintainers,
> 
> If there is no objection, can we take this patch in ?
> 

ping

> 
> Regards,
> Saurabh
> 
> 
> 
>
Bjorn Helgaas Feb. 6, 2024, 10:07 p.m. UTC | #7
[+cc Krzysztof]

On Fri, Dec 08, 2023 at 07:46:16PM -0800, Saurabh Sengar wrote:
> Currently there is a race in calling pci_create_resource_files function
> from two different therads, first therad is triggered by pci_sysfs_init
> from the late initcall where as the second thread is initiated by
> pci_bus_add_devices from the respective PCI drivers probe.
> 
> The synchronization between these threads relies on the sysfs_initialized
> flag. However, in pci_sysfs_init, sysfs_initialized is set right before
> calling pci_create_resource_files which is wrong as it can create race
> condition with pci_bus_add_devices threads. Fix this by setting
> sysfs_initialized flag at the end of pci_sysfs_init and direecly call the
> pci_create_resource_files function from it.
> 
> There can be an additional case where driver probe is so delayed that
> pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init.
> In such cases, attempting to access already existing sysfs resources is
> unnecessary. Fix this by adding a check for sysfs attributes and return
> if they are already allocated.
> 
> In both cases, the consequence will be the removal of sysfs resources that
> were appropriately allocated by pci_sysfs_init following the warning below.
> 
> [    3.376688] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0001-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0'
> [    3.385103] CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure #53~20.04.1-Ubuntu
> [    3.389585] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
> [    3.394663] Workqueue: events_unbound async_run_entry_fn
> [    3.397687] Call Trace:
> [    3.399312]  <TASK>
> [    3.400780]  dump_stack_lvl+0x38/0x4d
> [    3.402998]  dump_stack+0x10/0x16
> [    3.406050]  sysfs_warn_dup.cold+0x17/0x2b
> [    3.408476]  sysfs_add_file_mode_ns+0x17b/0x190
> [    3.411072]  sysfs_create_bin_file+0x64/0x90
> [    3.413514]  pci_create_attr+0xc7/0x260
> [    3.415827]  pci_create_resource_files+0x6f/0x150
> [    3.418455]  pci_create_sysfs_dev_files+0x18/0x30
> [    3.421136]  pci_bus_add_device+0x30/0x70
> [    3.423512]  pci_bus_add_devices+0x31/0x70
> [    3.425958]  hv_pci_probe+0x4ce/0x640
> [    3.428106]  vmbus_probe+0x67/0x90
> [    3.430121]  really_probe.part.0+0xcb/0x380
> [    3.432516]  really_probe+0x40/0x80
> [    3.434581]  __driver_probe_device+0xe8/0x140
> [    3.437119]  driver_probe_device+0x23/0xb0
> [    3.439504]  __driver_attach_async_helper+0x31/0x90
> [    3.442296]  async_run_entry_fn+0x33/0x120
> [    3.444666]  process_one_work+0x225/0x3d0
> [    3.447043]  worker_thread+0x4d/0x3e0
> [    3.449233]  ? process_one_work+0x3d0/0x3d0
> [    3.451632]  kthread+0x12a/0x150
> [    3.453583]  ? set_kthread_struct+0x50/0x50
> [    3.456103]  ret_from_fork+0x22/0x30
> 
> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> ---
> There has been earlier attempts to fix this problem, below are the patches
> for reference of these attempts.
> 1. https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/T/#u
> 2. https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew.tq-group.com/
> 
> Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515
> 
>  drivers/pci/pci-sysfs.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index f2909ae93f2f..a31f6f2cf309 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1230,6 +1230,10 @@ static int pci_create_resource_files(struct pci_dev *pdev)
>  		if (!pci_resource_len(pdev, i))
>  			continue;
>  
> +		/* Check if resource already allocated and proceed no further */
> +		if (pdev->res_attr[i] || pdev->res_attr_wc[i])
> +			return 0;
> +
>  		retval = pci_create_attr(pdev, i, 0);
>  		/* for prefetchable resources, create a WC mappable file */
>  		if (!retval && arch_can_pci_mmap_wc() &&
> @@ -1411,9 +1415,8 @@ static int __init pci_sysfs_init(void)
>  	struct pci_bus *pbus = NULL;
>  	int retval;
>  
> -	sysfs_initialized = 1;
>  	for_each_pci_dev(pdev) {
> -		retval = pci_create_sysfs_dev_files(pdev);
> +		retval = pci_create_resource_files(pdev);
>  		if (retval) {
>  			pci_dev_put(pdev);
>  			return retval;
> @@ -1423,6 +1426,8 @@ static int __init pci_sysfs_init(void)
>  	while ((pbus = pci_find_next_bus(pbus)))
>  		pci_create_legacy_files(pbus);
>  
> +	sysfs_initialized = 1;
> +
>  	return 0;
>  }
>  late_initcall(pci_sysfs_init);

Sorry for the delay in looking at this.  Consider the following
sequence where thread A is executing pci_sysfs_init() at the same time
as thread B enumerates and adds device X:

  Thread A:

    pci_sysfs_init
      for_each_pci_dev(pdev) {                  # device X not included
        pci_create_resource_files(pdev);
      }

  Thread B:

    pci_bus_add_device                          # add device X
      pci_create_sysfs_dev_files
        if (!sysfs_initialized)                 # sysfs_initialized still zero
          return -EACCES;
        pci_create_resource_files(pdev);        # not executed

  Thread A:

    while ((pbus = pci_find_next_bus(pbus)))
      pci_create_legacy_files(pbus);

    sysfs_initialized = 1;

Doesn't this have a similar race where instead of the duplicate
filename from having two threads try to create the resource files,
neither thread creates them and device X ends up with no resource
files at all?

Krzysztof has done a ton of work to convert these files to static
attributes, where the device model prevents most of these races:

  506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes")
  d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute")
  f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute")
  527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute")
  e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute")

and he even posted a series to do the same for the resource files:

  https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/

I can't remember why we didn't apply that at the time, and it no
longer applies cleanly, but I think that's the direction we should go.

Bjorn
Saurabh Singh Sengar Feb. 7, 2024, 4:30 p.m. UTC | #8
On Tue, Feb 06, 2024 at 04:07:15PM -0600, Bjorn Helgaas wrote:
> [+cc Krzysztof]
> 
> On Fri, Dec 08, 2023 at 07:46:16PM -0800, Saurabh Sengar wrote:
> > Currently there is a race in calling pci_create_resource_files function
> > from two different therads, first therad is triggered by pci_sysfs_init
> > from the late initcall where as the second thread is initiated by
> > pci_bus_add_devices from the respective PCI drivers probe.
> > 
> > The synchronization between these threads relies on the sysfs_initialized
> > flag. However, in pci_sysfs_init, sysfs_initialized is set right before
> > calling pci_create_resource_files which is wrong as it can create race
> > condition with pci_bus_add_devices threads. Fix this by setting
> > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the
> > pci_create_resource_files function from it.
> > 
> > There can be an additional case where driver probe is so delayed that
> > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init.
> > In such cases, attempting to access already existing sysfs resources is
> > unnecessary. Fix this by adding a check for sysfs attributes and return
> > if they are already allocated.
> > 
> > In both cases, the consequence will be the removal of sysfs resources that
> > were appropriately allocated by pci_sysfs_init following the warning below.
> > 
> > [    3.376688] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0001-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0'
> > [    3.385103] CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure #53~20.04.1-Ubuntu
> > [    3.389585] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
> > [    3.394663] Workqueue: events_unbound async_run_entry_fn
> > [    3.397687] Call Trace:
> > [    3.399312]  <TASK>
> > [    3.400780]  dump_stack_lvl+0x38/0x4d
> > [    3.402998]  dump_stack+0x10/0x16
> > [    3.406050]  sysfs_warn_dup.cold+0x17/0x2b
> > [    3.408476]  sysfs_add_file_mode_ns+0x17b/0x190
> > [    3.411072]  sysfs_create_bin_file+0x64/0x90
> > [    3.413514]  pci_create_attr+0xc7/0x260
> > [    3.415827]  pci_create_resource_files+0x6f/0x150
> > [    3.418455]  pci_create_sysfs_dev_files+0x18/0x30
> > [    3.421136]  pci_bus_add_device+0x30/0x70
> > [    3.423512]  pci_bus_add_devices+0x31/0x70
> > [    3.425958]  hv_pci_probe+0x4ce/0x640
> > [    3.428106]  vmbus_probe+0x67/0x90
> > [    3.430121]  really_probe.part.0+0xcb/0x380
> > [    3.432516]  really_probe+0x40/0x80
> > [    3.434581]  __driver_probe_device+0xe8/0x140
> > [    3.437119]  driver_probe_device+0x23/0xb0
> > [    3.439504]  __driver_attach_async_helper+0x31/0x90
> > [    3.442296]  async_run_entry_fn+0x33/0x120
> > [    3.444666]  process_one_work+0x225/0x3d0
> > [    3.447043]  worker_thread+0x4d/0x3e0
> > [    3.449233]  ? process_one_work+0x3d0/0x3d0
> > [    3.451632]  kthread+0x12a/0x150
> > [    3.453583]  ? set_kthread_struct+0x50/0x50
> > [    3.456103]  ret_from_fork+0x22/0x30
> > 
> > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > ---
> > There has been earlier attempts to fix this problem, below are the patches
> > for reference of these attempts.
> > 1. https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/T/#u
> > 2. https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew.tq-group.com/
> > 
> > Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515
> > 
> >  drivers/pci/pci-sysfs.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index f2909ae93f2f..a31f6f2cf309 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -1230,6 +1230,10 @@ static int pci_create_resource_files(struct pci_dev *pdev)
> >  		if (!pci_resource_len(pdev, i))
> >  			continue;
> >  
> > +		/* Check if resource already allocated and proceed no further */
> > +		if (pdev->res_attr[i] || pdev->res_attr_wc[i])
> > +			return 0;
> > +
> >  		retval = pci_create_attr(pdev, i, 0);
> >  		/* for prefetchable resources, create a WC mappable file */
> >  		if (!retval && arch_can_pci_mmap_wc() &&
> > @@ -1411,9 +1415,8 @@ static int __init pci_sysfs_init(void)
> >  	struct pci_bus *pbus = NULL;
> >  	int retval;
> >  
> > -	sysfs_initialized = 1;
> >  	for_each_pci_dev(pdev) {
> > -		retval = pci_create_sysfs_dev_files(pdev);
> > +		retval = pci_create_resource_files(pdev);
> >  		if (retval) {
> >  			pci_dev_put(pdev);
> >  			return retval;
> > @@ -1423,6 +1426,8 @@ static int __init pci_sysfs_init(void)
> >  	while ((pbus = pci_find_next_bus(pbus)))
> >  		pci_create_legacy_files(pbus);
> >  
> > +	sysfs_initialized = 1;
> > +
> >  	return 0;
> >  }
> >  late_initcall(pci_sysfs_init);
> 
> Sorry for the delay in looking at this.  Consider the following
> sequence where thread A is executing pci_sysfs_init() at the same time
> as thread B enumerates and adds device X:
> 
>   Thread A:
> 
>     pci_sysfs_init
>       for_each_pci_dev(pdev) {                  # device X not included
>         pci_create_resource_files(pdev);
>       }
> 
>   Thread B:
> 
>     pci_bus_add_device                          # add device X
>       pci_create_sysfs_dev_files
>         if (!sysfs_initialized)                 # sysfs_initialized still zero
>           return -EACCES;
>         pci_create_resource_files(pdev);        # not executed
> 
>   Thread A:
> 
>     while ((pbus = pci_find_next_bus(pbus)))
>       pci_create_legacy_files(pbus);
> 
>     sysfs_initialized = 1;
> 
> Doesn't this have a similar race where instead of the duplicate
> filename from having two threads try to create the resource files,
> neither thread creates them and device X ends up with no resource
> files at all?
> 
> Krzysztof has done a ton of work to convert these files to static
> attributes, where the device model prevents most of these races:
> 
>   506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes")
>   d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute")
>   f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute")
>   527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute")
>   e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute")
> 
> and he even posted a series to do the same for the resource files:
> 
>   https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/
> 
> I can't remember why we didn't apply that at the time, and it no
> longer applies cleanly, but I think that's the direction we should go.
> 
> Bjorn


Thanks for you review.

Krzysztof,

Please inform me if there's existing feedback explaining why this
series hasn't been merged yet. I am willing to further improve it
if necessary.

- Saurabh
Saurabh Singh Sengar Feb. 27, 2024, 5:14 p.m. UTC | #9
On Wed, Feb 07, 2024 at 08:30:27AM -0800, Saurabh Singh Sengar wrote:
> On Tue, Feb 06, 2024 at 04:07:15PM -0600, Bjorn Helgaas wrote:
> > [+cc Krzysztof]
> > 
> > On Fri, Dec 08, 2023 at 07:46:16PM -0800, Saurabh Sengar wrote:
> > > Currently there is a race in calling pci_create_resource_files function
> > > from two different therads, first therad is triggered by pci_sysfs_init
> > > from the late initcall where as the second thread is initiated by
> > > pci_bus_add_devices from the respective PCI drivers probe.
> > > 
> > > The synchronization between these threads relies on the sysfs_initialized
> > > flag. However, in pci_sysfs_init, sysfs_initialized is set right before
> > > calling pci_create_resource_files which is wrong as it can create race
> > > condition with pci_bus_add_devices threads. Fix this by setting
> > > sysfs_initialized flag at the end of pci_sysfs_init and direecly call the
> > > pci_create_resource_files function from it.
> > > 
> > > There can be an additional case where driver probe is so delayed that
> > > pci_bus_add_devices is called after the sysfs is created by pci_sysfs_init.
> > > In such cases, attempting to access already existing sysfs resources is
> > > unnecessary. Fix this by adding a check for sysfs attributes and return
> > > if they are already allocated.
> > > 
> > > In both cases, the consequence will be the removal of sysfs resources that
> > > were appropriately allocated by pci_sysfs_init following the warning below.
> > > 
> > > [    3.376688] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0001-0000-3130-444531454238/pci0001:00/0001:00:00.0/resource0'
> > > [    3.385103] CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.0-1046-azure #53~20.04.1-Ubuntu
> > > [    3.389585] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
> > > [    3.394663] Workqueue: events_unbound async_run_entry_fn
> > > [    3.397687] Call Trace:
> > > [    3.399312]  <TASK>
> > > [    3.400780]  dump_stack_lvl+0x38/0x4d
> > > [    3.402998]  dump_stack+0x10/0x16
> > > [    3.406050]  sysfs_warn_dup.cold+0x17/0x2b
> > > [    3.408476]  sysfs_add_file_mode_ns+0x17b/0x190
> > > [    3.411072]  sysfs_create_bin_file+0x64/0x90
> > > [    3.413514]  pci_create_attr+0xc7/0x260
> > > [    3.415827]  pci_create_resource_files+0x6f/0x150
> > > [    3.418455]  pci_create_sysfs_dev_files+0x18/0x30
> > > [    3.421136]  pci_bus_add_device+0x30/0x70
> > > [    3.423512]  pci_bus_add_devices+0x31/0x70
> > > [    3.425958]  hv_pci_probe+0x4ce/0x640
> > > [    3.428106]  vmbus_probe+0x67/0x90
> > > [    3.430121]  really_probe.part.0+0xcb/0x380
> > > [    3.432516]  really_probe+0x40/0x80
> > > [    3.434581]  __driver_probe_device+0xe8/0x140
> > > [    3.437119]  driver_probe_device+0x23/0xb0
> > > [    3.439504]  __driver_attach_async_helper+0x31/0x90
> > > [    3.442296]  async_run_entry_fn+0x33/0x120
> > > [    3.444666]  process_one_work+0x225/0x3d0
> > > [    3.447043]  worker_thread+0x4d/0x3e0
> > > [    3.449233]  ? process_one_work+0x3d0/0x3d0
> > > [    3.451632]  kthread+0x12a/0x150
> > > [    3.453583]  ? set_kthread_struct+0x50/0x50
> > > [    3.456103]  ret_from_fork+0x22/0x30
> > > 
> > > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > > ---
> > > There has been earlier attempts to fix this problem, below are the patches
> > > for reference of these attempts.
> > > 1. https://lore.kernel.org/linux-pci/20230316103036.1837869-1-alexander.stein@ew.tq-group.com/T/#u
> > > 2. https://lwn.net/ml/linux-kernel/20230316091540.494366-1-alexander.stein@ew.tq-group.com/
> > > 
> > > Bug details: https://bugzilla.kernel.org/show_bug.cgi?id=215515
> > > 
> > >  drivers/pci/pci-sysfs.c | 9 +++++++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > > index f2909ae93f2f..a31f6f2cf309 100644
> > > --- a/drivers/pci/pci-sysfs.c
> > > +++ b/drivers/pci/pci-sysfs.c
> > > @@ -1230,6 +1230,10 @@ static int pci_create_resource_files(struct pci_dev *pdev)
> > >  		if (!pci_resource_len(pdev, i))
> > >  			continue;
> > >  
> > > +		/* Check if resource already allocated and proceed no further */
> > > +		if (pdev->res_attr[i] || pdev->res_attr_wc[i])
> > > +			return 0;
> > > +
> > >  		retval = pci_create_attr(pdev, i, 0);
> > >  		/* for prefetchable resources, create a WC mappable file */
> > >  		if (!retval && arch_can_pci_mmap_wc() &&
> > > @@ -1411,9 +1415,8 @@ static int __init pci_sysfs_init(void)
> > >  	struct pci_bus *pbus = NULL;
> > >  	int retval;
> > >  
> > > -	sysfs_initialized = 1;
> > >  	for_each_pci_dev(pdev) {
> > > -		retval = pci_create_sysfs_dev_files(pdev);
> > > +		retval = pci_create_resource_files(pdev);
> > >  		if (retval) {
> > >  			pci_dev_put(pdev);
> > >  			return retval;
> > > @@ -1423,6 +1426,8 @@ static int __init pci_sysfs_init(void)
> > >  	while ((pbus = pci_find_next_bus(pbus)))
> > >  		pci_create_legacy_files(pbus);
> > >  
> > > +	sysfs_initialized = 1;
> > > +
> > >  	return 0;
> > >  }
> > >  late_initcall(pci_sysfs_init);
> > 
> > Sorry for the delay in looking at this.  Consider the following
> > sequence where thread A is executing pci_sysfs_init() at the same time
> > as thread B enumerates and adds device X:
> > 
> >   Thread A:
> > 
> >     pci_sysfs_init
> >       for_each_pci_dev(pdev) {                  # device X not included
> >         pci_create_resource_files(pdev);
> >       }
> > 
> >   Thread B:
> > 
> >     pci_bus_add_device                          # add device X
> >       pci_create_sysfs_dev_files
> >         if (!sysfs_initialized)                 # sysfs_initialized still zero
> >           return -EACCES;
> >         pci_create_resource_files(pdev);        # not executed
> > 
> >   Thread A:
> > 
> >     while ((pbus = pci_find_next_bus(pbus)))
> >       pci_create_legacy_files(pbus);
> > 
> >     sysfs_initialized = 1;
> > 
> > Doesn't this have a similar race where instead of the duplicate
> > filename from having two threads try to create the resource files,
> > neither thread creates them and device X ends up with no resource
> > files at all?
> > 
> > Krzysztof has done a ton of work to convert these files to static
> > attributes, where the device model prevents most of these races:
> > 
> >   506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes")
> >   d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute")
> >   f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute")
> >   527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute")
> >   e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute")
> > 
> > and he even posted a series to do the same for the resource files:
> > 
> >   https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/
> > 
> > I can't remember why we didn't apply that at the time, and it no
> > longer applies cleanly, but I think that's the direction we should go.
> > 
> > Bjorn
> 
> 
> Thanks for you review.
> 
> Krzysztof,
> 
> Please inform me if there's existing feedback explaining why this
> series hasn't been merged yet. I am willing to further improve it
> if necessary.

Krzysztof Wilczyński,

Let us know your opinion so that we can move ahead in fixing this
long pending bug.

- Saurabh
> 
> - Saurabh
Bjorn Helgaas Feb. 28, 2024, 3:22 p.m. UTC | #10
On Tue, Feb 27, 2024 at 09:14:58AM -0800, Saurabh Singh Sengar wrote:
> On Wed, Feb 07, 2024 at 08:30:27AM -0800, Saurabh Singh Sengar wrote:
> > On Tue, Feb 06, 2024 at 04:07:15PM -0600, Bjorn Helgaas wrote:
> > > On Fri, Dec 08, 2023 at 07:46:16PM -0800, Saurabh Sengar wrote:
> > > > Currently there is a race in calling pci_create_resource_files function
> > > > from two different therads, first therad is triggered by pci_sysfs_init
> > > > from the late initcall where as the second thread is initiated by
> > > > pci_bus_add_devices from the respective PCI drivers probe.
> ...

> > > Krzysztof has done a ton of work to convert these files to static
> > > attributes, where the device model prevents most of these races:
> > > 
> > >   506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes")
> > >   d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute")
> > >   f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute")
> > >   527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute")
> > >   e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute")
> > > 
> > > and he even posted a series to do the same for the resource files:
> > > 
> > >   https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/
> > > 
> > > I can't remember why we didn't apply that at the time, and it no
> > > longer applies cleanly, but I think that's the direction we should go.
> > 
> > Thanks for you review.
> > 
> > Please inform me if there's existing feedback explaining why this
> > series hasn't been merged yet. I am willing to further improve it
> > if necessary.
> 
> Let us know your opinion so that we can move ahead in fixing this
> long pending bug.

There's no feedback on the mailing list (I checked the link above), so
the way forward is to update the series so it applies cleanly again
and post it as a v3.

There's no need to wait for Krzysztof to refresh it, and if you have
time to do it, it would be very welcomed!  The best base would be
v6.8-rc1.

Bjorn
Krzysztof Wilczyński Feb. 28, 2024, 5:22 p.m. UTC | #11
Hello,

Sorry for late reply.

[...]
> > > > Krzysztof has done a ton of work to convert these files to static
> > > > attributes, where the device model prevents most of these races:
> > > > 
> > > >   506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes")
> > > >   d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute")
> > > >   f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute")
> > > >   527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute")
> > > >   e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute")
> > > > 
> > > > and he even posted a series to do the same for the resource files:
> > > > 
> > > >   https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/
> > > > 
> > > > I can't remember why we didn't apply that at the time, and it no
> > > > longer applies cleanly, but I think that's the direction we should go.
> > > 
> > > Thanks for you review.
> > > 
> > > Please inform me if there's existing feedback explaining why this
> > > series hasn't been merged yet. I am willing to further improve it
> > > if necessary.
> > 
> > Let us know your opinion so that we can move ahead in fixing this
> > long pending bug.

I really thought you were asking me about your patch.  So, I didn't reply
since Bjorn added his review.

> There's no feedback on the mailing list (I checked the link above), so
> the way forward is to update the series so it applies cleanly again
> and post it as a v3.

Start with a review, if you have some time.  Perhaps we can make it better
before sending another revision.

There are two areas which this series decided not to tackle initially:

  - Support for the Alpha platform
  - Support for legacy PCI platforms

Feel free to have a look at the above.  Perhaps you will have some ideas on how
to best convert both of these to use static attributes, so that we could convert
everything at the same time.

> There's no need to wait for Krzysztof to refresh it, and if you have
> time to do it, it would be very welcomed!  The best base would be
> v6.8-rc1.

That I can do, perhaps this coming weekend.  Or even sooner when I find some
time this week.

	Krzysztof
Saurabh Singh Sengar Feb. 28, 2024, 6:16 p.m. UTC | #12
On Thu, Feb 29, 2024 at 02:22:55AM +0900, Krzysztof Wilczyński wrote:
> Hello,
> 
> Sorry for late reply.
> 
> [...]
> > > > > Krzysztof has done a ton of work to convert these files to static
> > > > > attributes, where the device model prevents most of these races:
> > > > > 
> > > > >   506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes")
> > > > >   d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute")
> > > > >   f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute")
> > > > >   527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute")
> > > > >   e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute")
> > > > > 
> > > > > and he even posted a series to do the same for the resource files:
> > > > > 
> > > > >   https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/
> > > > > 
> > > > > I can't remember why we didn't apply that at the time, and it no
> > > > > longer applies cleanly, but I think that's the direction we should go.
> > > > 
> > > > Thanks for you review.
> > > > 
> > > > Please inform me if there's existing feedback explaining why this
> > > > series hasn't been merged yet. I am willing to further improve it
> > > > if necessary.
> > > 
> > > Let us know your opinion so that we can move ahead in fixing this
> > > long pending bug.
> 
> I really thought you were asking me about your patch.  So, I didn't reply
> since Bjorn added his review.
> 
> > There's no feedback on the mailing list (I checked the link above), so
> > the way forward is to update the series so it applies cleanly again
> > and post it as a v3.
> 
> Start with a review, if you have some time.  Perhaps we can make it better
> before sending another revision.
> 
> There are two areas which this series decided not to tackle initially:
> 
>   - Support for the Alpha platform
>   - Support for legacy PCI platforms
> 
> Feel free to have a look at the above.  Perhaps you will have some ideas on how
> to best convert both of these to use static attributes, so that we could convert
> everything at the same time.
> 
> > There's no need to wait for Krzysztof to refresh it, and if you have
> > time to do it, it would be very welcomed!  The best base would be
> > v6.8-rc1.
> 
> That I can do, perhaps this coming weekend.  Or even sooner when I find some
> time this week.
> 
> 	Krzysztof

Thank you ! I will look forward to it. I am happy to review and offer
contributions if required.

- Saurabh
Lukas Wunner March 2, 2024, 8:57 a.m. UTC | #13
On Tue, Feb 06, 2024 at 04:07:15PM -0600, Bjorn Helgaas wrote:
> Krzysztof has done a ton of work to convert these files to static
> attributes, where the device model prevents most of these races:
> 
>   506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to static attributes")
>   d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute")
>   f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute")
>   527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute")
>   e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute")
> 
> and he even posted a series to do the same for the resource files:
> 
>   https://lore.kernel.org/linux-pci/20210910202623.2293708-1-kw@linux.com/
> 
> I can't remember why we didn't apply that at the time, and it no
> longer applies cleanly, but I think that's the direction we should go.

When I brought up resource sysfs files in October, Bjorn said:

    I think the reason pci_sysfs_init() exists in the first place is
    because those resources may be assigned after pci_device_add(), and
    (my memory is hazy here) it seems like changing the size of binary
    attributes is hard, which might fit with the
    pci_remove_resource_files() and pci_create_resource_files() in the
    resource##n##_resize_store() macro

    https://lore.kernel.org/all/20231019200110.GA1410324@bhelgaas/

I'm wondering in how far Krzysztof's above-mentioned patches
address the issue of late-appearing resources?

In the meantime I've learned of the existence of sysfs_update_group().
It would seem to me that if resources such as the ROM appear late,
we should just call sysfs_update_group() to make them show up in sysfs
(or correct the size of their sysfs files).

But that requires that we identify the places where resources
are unhidden.  Do we know where this happens?

Thanks,

Lukas
Saurabh Singh Sengar April 15, 2024, 6:15 p.m. UTC | #14
> -----Original Message-----
> From: Krzysztof Wilczyński <kwilczynski@kernel.org>
> Sent: Wednesday, February 28, 2024 10:53 PM
> To: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Saurabh Singh Sengar <ssengar@linux.microsoft.com>;
> bhelgaas@google.com; linux-pci@vger.kernel.org; linux-
> kernel@vger.kernel.org; alexander.stein@ew.tq-group.com; Dexuan Cui
> <decui@microsoft.com>
> Subject: [EXTERNAL] Re: [PATCH] PCI/sysfs: Fix race in pci sysfs creation
> 
> [You don't often get email from kwilczynski@kernel.org. Learn why this is
> important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> Hello,
> 
> Sorry for late reply.
> 
> [...]
> > > > > Krzysztof has done a ton of work to convert these files to
> > > > > static attributes, where the device model prevents most of these
> races:
> > > > >
> > > > >   506140f9c06b ("PCI/sysfs: Convert "index", "acpi_index", "label" to
> static attributes")
> > > > >   d93f8399053d ("PCI/sysfs: Convert "vpd" to static attribute")
> > > > >   f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute")
> > > > >   527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute")
> > > > >   e1d3f3268b0e ("PCI/sysfs: Convert "config" to static
> > > > > attribute")
> > > > >
> > > > > and he even posted a series to do the same for the resource files:
> > > > >
> > > > >
> > > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%
> > > > > 2Flore.kernel.org%2Flinux-pci%2F20210910202623.2293708-1-
> kw%40li
> > > > >
> nux.com%2F&data=05%7C02%7Cssengar%40microsoft.com%7C99b036f685e
> 4
> > > > >
> 448ddb5408dc3881e998%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0
> %
> > > > >
> 7C638447377886194494%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjA
> wMDA
> > > > >
> iLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sda
> > > > >
> ta=TsOJsR8CQGaOrJVnw0BPm0QGL%2FAUQ0GCTuzgrN8FX%2BQ%3D&reserve
> d=0
> > > > >
> > > > > I can't remember why we didn't apply that at the time, and it no
> > > > > longer applies cleanly, but I think that's the direction we should go.
> > > >
> > > > Thanks for you review.
> > > >
> > > > Please inform me if there's existing feedback explaining why this
> > > > series hasn't been merged yet. I am willing to further improve it
> > > > if necessary.
> > >
> > > Let us know your opinion so that we can move ahead in fixing this
> > > long pending bug.
> 
> I really thought you were asking me about your patch.  So, I didn't reply
> since Bjorn added his review.
> 
> > There's no feedback on the mailing list (I checked the link above), so
> > the way forward is to update the series so it applies cleanly again
> > and post it as a v3.
> 
> Start with a review, if you have some time.  Perhaps we can make it better
> before sending another revision.
> 
> There are two areas which this series decided not to tackle initially:
> 
>   - Support for the Alpha platform
>   - Support for legacy PCI platforms
> 
> Feel free to have a look at the above.  Perhaps you will have some ideas on
> how to best convert both of these to use static attributes, so that we could
> convert everything at the same time.
> 
> > There's no need to wait for Krzysztof to refresh it, and if you have
> > time to do it, it would be very welcomed!  The best base would be
> > v6.8-rc1.
> 
> That I can do, perhaps this coming weekend.  Or even sooner when I find
> some time this week.
> 
>         Krzysztof

Krzysztof,
Are you still planning to send the revised version for it ?
diff mbox series

Patch

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index f2909ae93f2f..a31f6f2cf309 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1230,6 +1230,10 @@  static int pci_create_resource_files(struct pci_dev *pdev)
 		if (!pci_resource_len(pdev, i))
 			continue;
 
+		/* Check if resource already allocated and proceed no further */
+		if (pdev->res_attr[i] || pdev->res_attr_wc[i])
+			return 0;
+
 		retval = pci_create_attr(pdev, i, 0);
 		/* for prefetchable resources, create a WC mappable file */
 		if (!retval && arch_can_pci_mmap_wc() &&
@@ -1411,9 +1415,8 @@  static int __init pci_sysfs_init(void)
 	struct pci_bus *pbus = NULL;
 	int retval;
 
-	sysfs_initialized = 1;
 	for_each_pci_dev(pdev) {
-		retval = pci_create_sysfs_dev_files(pdev);
+		retval = pci_create_resource_files(pdev);
 		if (retval) {
 			pci_dev_put(pdev);
 			return retval;
@@ -1423,6 +1426,8 @@  static int __init pci_sysfs_init(void)
 	while ((pbus = pci_find_next_bus(pbus)))
 		pci_create_legacy_files(pbus);
 
+	sysfs_initialized = 1;
+
 	return 0;
 }
 late_initcall(pci_sysfs_init);