diff mbox series

[v9,2/2] PCI: Implement custom llseek for sysfs resource entries

Message ID 20230925084013.309399-2-valesini@yandex-team.ru
State New
Headers show
Series [v9,1/2] kernfs: sysfs: support custom llseek method for sysfs entries | expand

Commit Message

Valentin Sinitsyn Sept. 25, 2023, 8:40 a.m. UTC
Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable
sysfs entries have started to receive their f_mapping from the iomem
pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs
(and procfs) as well as in /dev/[k]mem.

This resulted in a userspace-visible regression:

1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0)
2. Use lseek(fd, 0, SEEK_END) to determine its size

Expected result: a PCI region size is returned.
Actual result: 0 is returned.

The reason is that PCI resource files residing in sysfs use
generic_file_llseek(), which relies on f_mapping->host inode to get the
file size. As f_mapping is now redefined, f_mapping->host points to an
anonymous zero-sized iomem_inode which has nothing to do with sysfs file
in question.

Implement a custom llseek method for sysfs PCI resources, which is
almost the same as proc_bus_pci_lseek() used for procfs entries.

This makes sysfs and procfs entries consistent with regards to seeking,
but also introduces userspace-visible changes to seeking PCI resources
in sysfs:

- SEEK_DATA and SEEK_HOLE are no longer supported;
- Seeking past the end of the file is prohibited while previously
  offsets up to MAX_NON_LFS were accepted (reading from these offsets
  was always invalid).

Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem")
Cc: stable@vger.kernel.org
Signed-off-by: Valentine Sinitsyn <valesini@yandex-team.ru>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pci-sysfs.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

Comments

Greg KH Oct. 5, 2023, 11:41 a.m. UTC | #1
On Mon, Sep 25, 2023 at 11:40:13AM +0300, Valentine Sinitsyn wrote:
> Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable
> sysfs entries have started to receive their f_mapping from the iomem
> pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs
> (and procfs) as well as in /dev/[k]mem.
> 
> This resulted in a userspace-visible regression:
> 
> 1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0)
> 2. Use lseek(fd, 0, SEEK_END) to determine its size
> 
> Expected result: a PCI region size is returned.
> Actual result: 0 is returned.
> 
> The reason is that PCI resource files residing in sysfs use
> generic_file_llseek(), which relies on f_mapping->host inode to get the
> file size. As f_mapping is now redefined, f_mapping->host points to an
> anonymous zero-sized iomem_inode which has nothing to do with sysfs file
> in question.
> 
> Implement a custom llseek method for sysfs PCI resources, which is
> almost the same as proc_bus_pci_lseek() used for procfs entries.
> 
> This makes sysfs and procfs entries consistent with regards to seeking,
> but also introduces userspace-visible changes to seeking PCI resources
> in sysfs:
> 
> - SEEK_DATA and SEEK_HOLE are no longer supported;
> - Seeking past the end of the file is prohibited while previously
>   offsets up to MAX_NON_LFS were accepted (reading from these offsets
>   was always invalid).
> 
> Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem")
> Cc: stable@vger.kernel.org
> Signed-off-by: Valentine Sinitsyn <valesini@yandex-team.ru>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/pci-sysfs.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)

I'll take these now, for 6.7-rc1, but not mark them as fixes or cc:
stable as this is a new functionality, the code has never worked for
lseek on these files so it's not like anything was broken :)

thanks,

greg k-h
Dan Williams Oct. 5, 2023, 6:16 p.m. UTC | #2
Valentine Sinitsyn wrote:
> Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable
> sysfs entries have started to receive their f_mapping from the iomem
> pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs
> (and procfs) as well as in /dev/[k]mem.
> 
> This resulted in a userspace-visible regression:
> 
> 1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0)
> 2. Use lseek(fd, 0, SEEK_END) to determine its size
> 
> Expected result: a PCI region size is returned.
> Actual result: 0 is returned.
> 
> The reason is that PCI resource files residing in sysfs use
> generic_file_llseek(), which relies on f_mapping->host inode to get the
> file size. As f_mapping is now redefined, f_mapping->host points to an
> anonymous zero-sized iomem_inode which has nothing to do with sysfs file
> in question.
> 
> Implement a custom llseek method for sysfs PCI resources, which is
> almost the same as proc_bus_pci_lseek() used for procfs entries.
> 
> This makes sysfs and procfs entries consistent with regards to seeking,
> but also introduces userspace-visible changes to seeking PCI resources
> in sysfs:
> 
> - SEEK_DATA and SEEK_HOLE are no longer supported;
> - Seeking past the end of the file is prohibited while previously
>   offsets up to MAX_NON_LFS were accepted (reading from these offsets
>   was always invalid).
> 
> Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem")
> Cc: stable@vger.kernel.org
> Signed-off-by: Valentine Sinitsyn <valesini@yandex-team.ru>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Thanks for doing this reorganization and the follow-up, looks good to me:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Valentin Sinitsyn Oct. 8, 2023, 9:51 p.m. UTC | #3
On 05.10.2023 16:41, Greg Kroah-Hartman wrote:
> On Mon, Sep 25, 2023 at 11:40:13AM +0300, Valentine Sinitsyn wrote:
>> Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable
>> sysfs entries have started to receive their f_mapping from the iomem
>> pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs
>> (and procfs) as well as in /dev/[k]mem.
>>
>> This resulted in a userspace-visible regression:
>>
>> 1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0)
>> 2. Use lseek(fd, 0, SEEK_END) to determine its size
>>
>> Expected result: a PCI region size is returned.
>> Actual result: 0 is returned.
>>
>> The reason is that PCI resource files residing in sysfs use
>> generic_file_llseek(), which relies on f_mapping->host inode to get the
>> file size. As f_mapping is now redefined, f_mapping->host points to an
>> anonymous zero-sized iomem_inode which has nothing to do with sysfs file
>> in question.
>>
>> Implement a custom llseek method for sysfs PCI resources, which is
>> almost the same as proc_bus_pci_lseek() used for procfs entries.
>>
>> This makes sysfs and procfs entries consistent with regards to seeking,
>> but also introduces userspace-visible changes to seeking PCI resources
>> in sysfs:
>>
>> - SEEK_DATA and SEEK_HOLE are no longer supported;
>> - Seeking past the end of the file is prohibited while previously
>>    offsets up to MAX_NON_LFS were accepted (reading from these offsets
>>    was always invalid).
>>
>> Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Valentine Sinitsyn <valesini@yandex-team.ru>
>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>> ---
>>   drivers/pci/pci-sysfs.c | 26 +++++++++++++++++++++++++-
>>   1 file changed, 25 insertions(+), 1 deletion(-)
> 
> I'll take these now, for 6.7-rc1, but not mark them as fixes or cc:
Thanks, appreciated.

> stable as this is a new functionality, the code has never worked for
> lseek on these files so it's not like anything was broken :)
In fact, lseek() on PCI resource files in sysfs was broken since commit 
636b21b50152. That was the reason why I started to investigate the 
issue: one of our applications stopped working after a kernel update.

I'm not hundred percent sure if it belongs to stable, but it does fix a 
user-visible regression.

Best,
Valentin
diff mbox series

Patch

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index d9eede2dbc0e..97c9c62d5e3e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -835,6 +835,19 @@  static const struct attribute_group pci_dev_config_attr_group = {
 	.is_bin_visible = pci_dev_config_attr_is_visible,
 };
 
+/*
+ * llseek operation for mmappable PCI resources.
+ * May be left unused if the arch doesn't provide them.
+ */
+static __maybe_unused loff_t
+pci_llseek_resource(struct file *filep,
+		    struct kobject *kobj __always_unused,
+		    struct bin_attribute *attr,
+		    loff_t offset, int whence)
+{
+	return fixed_size_llseek(filep, offset, whence, attr->size);
+}
+
 #ifdef HAVE_PCI_LEGACY
 /**
  * pci_read_legacy_io - read byte(s) from legacy I/O port space
@@ -967,6 +980,8 @@  void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_io->attr.mode = 0600;
 	b->legacy_io->read = pci_read_legacy_io;
 	b->legacy_io->write = pci_write_legacy_io;
+	/* See pci_create_attr() for motivation */
+	b->legacy_io->llseek = pci_llseek_resource;
 	b->legacy_io->mmap = pci_mmap_legacy_io;
 	b->legacy_io->f_mapping = iomem_get_mapping;
 	pci_adjust_legacy_attr(b, pci_mmap_io);
@@ -981,6 +996,8 @@  void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_mem->size = 1024*1024;
 	b->legacy_mem->attr.mode = 0600;
 	b->legacy_mem->mmap = pci_mmap_legacy_mem;
+	/* See pci_create_attr() for motivation */
+	b->legacy_mem->llseek = pci_llseek_resource;
 	b->legacy_mem->f_mapping = iomem_get_mapping;
 	pci_adjust_legacy_attr(b, pci_mmap_mem);
 	error = device_create_bin_file(&b->dev, b->legacy_mem);
@@ -1199,8 +1216,15 @@  static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
 			res_attr->mmap = pci_mmap_resource_uc;
 		}
 	}
-	if (res_attr->mmap)
+	if (res_attr->mmap) {
 		res_attr->f_mapping = iomem_get_mapping;
+		/*
+		 * generic_file_llseek() consults f_mapping->host to determine
+		 * the file size. As iomem_inode knows nothing about the
+		 * attribute, it's not going to work, so override it as well.
+		 */
+		res_attr->llseek = pci_llseek_resource;
+	}
 	res_attr->attr.name = res_attr_name;
 	res_attr->attr.mode = 0600;
 	res_attr->size = pci_resource_len(pdev, num);