Message ID | 20230925084013.309399-2-valesini@yandex-team.ru |
---|---|
State | New |
Headers | show |
Series | [v9,1/2] kernfs: sysfs: support custom llseek method for sysfs entries | expand |
On Mon, Sep 25, 2023 at 11:40:13AM +0300, Valentine Sinitsyn wrote: > Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable > sysfs entries have started to receive their f_mapping from the iomem > pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs > (and procfs) as well as in /dev/[k]mem. > > This resulted in a userspace-visible regression: > > 1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0) > 2. Use lseek(fd, 0, SEEK_END) to determine its size > > Expected result: a PCI region size is returned. > Actual result: 0 is returned. > > The reason is that PCI resource files residing in sysfs use > generic_file_llseek(), which relies on f_mapping->host inode to get the > file size. As f_mapping is now redefined, f_mapping->host points to an > anonymous zero-sized iomem_inode which has nothing to do with sysfs file > in question. > > Implement a custom llseek method for sysfs PCI resources, which is > almost the same as proc_bus_pci_lseek() used for procfs entries. > > This makes sysfs and procfs entries consistent with regards to seeking, > but also introduces userspace-visible changes to seeking PCI resources > in sysfs: > > - SEEK_DATA and SEEK_HOLE are no longer supported; > - Seeking past the end of the file is prohibited while previously > offsets up to MAX_NON_LFS were accepted (reading from these offsets > was always invalid). > > Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem") > Cc: stable@vger.kernel.org > Signed-off-by: Valentine Sinitsyn <valesini@yandex-team.ru> > Acked-by: Bjorn Helgaas <bhelgaas@google.com> > --- > drivers/pci/pci-sysfs.c | 26 +++++++++++++++++++++++++- > 1 file changed, 25 insertions(+), 1 deletion(-) I'll take these now, for 6.7-rc1, but not mark them as fixes or cc: stable as this is a new functionality, the code has never worked for lseek on these files so it's not like anything was broken :) thanks, greg k-h
Valentine Sinitsyn wrote: > Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable > sysfs entries have started to receive their f_mapping from the iomem > pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs > (and procfs) as well as in /dev/[k]mem. > > This resulted in a userspace-visible regression: > > 1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0) > 2. Use lseek(fd, 0, SEEK_END) to determine its size > > Expected result: a PCI region size is returned. > Actual result: 0 is returned. > > The reason is that PCI resource files residing in sysfs use > generic_file_llseek(), which relies on f_mapping->host inode to get the > file size. As f_mapping is now redefined, f_mapping->host points to an > anonymous zero-sized iomem_inode which has nothing to do with sysfs file > in question. > > Implement a custom llseek method for sysfs PCI resources, which is > almost the same as proc_bus_pci_lseek() used for procfs entries. > > This makes sysfs and procfs entries consistent with regards to seeking, > but also introduces userspace-visible changes to seeking PCI resources > in sysfs: > > - SEEK_DATA and SEEK_HOLE are no longer supported; > - Seeking past the end of the file is prohibited while previously > offsets up to MAX_NON_LFS were accepted (reading from these offsets > was always invalid). > > Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem") > Cc: stable@vger.kernel.org > Signed-off-by: Valentine Sinitsyn <valesini@yandex-team.ru> > Acked-by: Bjorn Helgaas <bhelgaas@google.com> Thanks for doing this reorganization and the follow-up, looks good to me: Reviewed-by: Dan Williams <dan.j.williams@intel.com>
On 05.10.2023 16:41, Greg Kroah-Hartman wrote: > On Mon, Sep 25, 2023 at 11:40:13AM +0300, Valentine Sinitsyn wrote: >> Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable >> sysfs entries have started to receive their f_mapping from the iomem >> pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs >> (and procfs) as well as in /dev/[k]mem. >> >> This resulted in a userspace-visible regression: >> >> 1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0) >> 2. Use lseek(fd, 0, SEEK_END) to determine its size >> >> Expected result: a PCI region size is returned. >> Actual result: 0 is returned. >> >> The reason is that PCI resource files residing in sysfs use >> generic_file_llseek(), which relies on f_mapping->host inode to get the >> file size. As f_mapping is now redefined, f_mapping->host points to an >> anonymous zero-sized iomem_inode which has nothing to do with sysfs file >> in question. >> >> Implement a custom llseek method for sysfs PCI resources, which is >> almost the same as proc_bus_pci_lseek() used for procfs entries. >> >> This makes sysfs and procfs entries consistent with regards to seeking, >> but also introduces userspace-visible changes to seeking PCI resources >> in sysfs: >> >> - SEEK_DATA and SEEK_HOLE are no longer supported; >> - Seeking past the end of the file is prohibited while previously >> offsets up to MAX_NON_LFS were accepted (reading from these offsets >> was always invalid). >> >> Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem") >> Cc: stable@vger.kernel.org >> Signed-off-by: Valentine Sinitsyn <valesini@yandex-team.ru> >> Acked-by: Bjorn Helgaas <bhelgaas@google.com> >> --- >> drivers/pci/pci-sysfs.c | 26 +++++++++++++++++++++++++- >> 1 file changed, 25 insertions(+), 1 deletion(-) > > I'll take these now, for 6.7-rc1, but not mark them as fixes or cc: Thanks, appreciated. > stable as this is a new functionality, the code has never worked for > lseek on these files so it's not like anything was broken :) In fact, lseek() on PCI resource files in sysfs was broken since commit 636b21b50152. That was the reason why I started to investigate the issue: one of our applications stopped working after a kernel update. I'm not hundred percent sure if it belongs to stable, but it does fix a user-visible regression. Best, Valentin
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index d9eede2dbc0e..97c9c62d5e3e 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -835,6 +835,19 @@ static const struct attribute_group pci_dev_config_attr_group = { .is_bin_visible = pci_dev_config_attr_is_visible, }; +/* + * llseek operation for mmappable PCI resources. + * May be left unused if the arch doesn't provide them. + */ +static __maybe_unused loff_t +pci_llseek_resource(struct file *filep, + struct kobject *kobj __always_unused, + struct bin_attribute *attr, + loff_t offset, int whence) +{ + return fixed_size_llseek(filep, offset, whence, attr->size); +} + #ifdef HAVE_PCI_LEGACY /** * pci_read_legacy_io - read byte(s) from legacy I/O port space @@ -967,6 +980,8 @@ void pci_create_legacy_files(struct pci_bus *b) b->legacy_io->attr.mode = 0600; b->legacy_io->read = pci_read_legacy_io; b->legacy_io->write = pci_write_legacy_io; + /* See pci_create_attr() for motivation */ + b->legacy_io->llseek = pci_llseek_resource; b->legacy_io->mmap = pci_mmap_legacy_io; b->legacy_io->f_mapping = iomem_get_mapping; pci_adjust_legacy_attr(b, pci_mmap_io); @@ -981,6 +996,8 @@ void pci_create_legacy_files(struct pci_bus *b) b->legacy_mem->size = 1024*1024; b->legacy_mem->attr.mode = 0600; b->legacy_mem->mmap = pci_mmap_legacy_mem; + /* See pci_create_attr() for motivation */ + b->legacy_mem->llseek = pci_llseek_resource; b->legacy_mem->f_mapping = iomem_get_mapping; pci_adjust_legacy_attr(b, pci_mmap_mem); error = device_create_bin_file(&b->dev, b->legacy_mem); @@ -1199,8 +1216,15 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine) res_attr->mmap = pci_mmap_resource_uc; } } - if (res_attr->mmap) + if (res_attr->mmap) { res_attr->f_mapping = iomem_get_mapping; + /* + * generic_file_llseek() consults f_mapping->host to determine + * the file size. As iomem_inode knows nothing about the + * attribute, it's not going to work, so override it as well. + */ + res_attr->llseek = pci_llseek_resource; + } res_attr->attr.name = res_attr_name; res_attr->attr.mode = 0600; res_attr->size = pci_resource_len(pdev, num);