
pci: call _cond_resched() after pci_bus_write_config

Message ID 20211013125542.759696-1-imagedong@tencent.com
State New

Commit Message

Menglong Dong Oct. 13, 2021, 12:55 p.m. UTC
From: Menglong Dong <imagedong@tencent.com>

While the system is running in KVM, pci config writing for virtio devices
may cost long time(about 1-2ms), as it causes VM-exit. During
__pci_bus_assign_resources(), pci_setup_bridge, which can do pci config
writing up to 10 times, can be called many times without any
_cond_resched(). So __pci_bus_assign_resources can cause 25+ms scheduling
latency with !CONFIG_PREEMPT.

To solve this problem, call _cond_resched() after pci config writing.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
 drivers/pci/access.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Bjorn Helgaas Oct. 13, 2021, 7 p.m. UTC | #1
Match previous subject lines (use "git log --oneline
drivers/pci/access.c" to see them).

On Wed, Oct 13, 2021 at 08:55:42PM +0800, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
> 
> While the system is running in KVM, pci config writing for virtio devices
> may cost long time(about 1-2ms), as it causes VM-exit. During
> __pci_bus_assign_resources(), pci_setup_bridge, which can do pci config
> writing up to 10 times, can be called many times without any
> _cond_resched(). So __pci_bus_assign_resources can cause 25+ms scheduling
> latency with !CONFIG_PREEMPT.
> 
> To solve this problem, call _cond_resched() after pci config writing.

s/pci/PCI/ above.
Add space before "(".
Add "()" after function names consistently (some have it, some don't).

What exactly is the problem?  I expect __pci_bus_assign_resources() to
be used mostly during boot-time enumeration.  How much of a problem is
the latency at that point?  Why is this particularly a problem in the
KVM environment?  Or is it also a problem on bare metal?

Are there other config write paths that should have a similar change?

_cond_resched() only appears here:

  $ git grep "\<_cond_resched\>"
  include/linux/sched.h:static __always_inline int _cond_resched(void)
  include/linux/sched.h:static inline int _cond_resched(void)
  include/linux/sched.h:static inline int _cond_resched(void) { return 0; }
  include/linux/sched.h:  _cond_resched();

so I don't believe PCI is so special that this needs to be the only
other use.  Maybe a different resched interface is more appropriate?

> Signed-off-by: Menglong Dong <imagedong@tencent.com>
> ---
>  drivers/pci/access.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> index 46935695cfb9..babed43702df 100644
> --- a/drivers/pci/access.c
> +++ b/drivers/pci/access.c
> @@ -57,6 +57,7 @@ int noinline pci_bus_write_config_##size \
>  	pci_lock_config(flags);						\
>  	res = bus->ops->write(bus, devfn, pos, len, value);		\
>  	pci_unlock_config(flags);					\
> +	_cond_resched();						\
>  	return res;							\
>  }
>  
> -- 
> 2.27.0
>

Menglong Dong Oct. 14, 2021, 2:34 a.m. UTC | #2
Hello,

On Thu, Oct 14, 2021 at 3:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
[...]
>
> s/pci/PCI/ above.
> Add space before "(".
> Add "()" after function names consistently (some have it, some don't).

Thanks, got it!

>
> What exactly is the problem?  I expect __pci_bus_assign_resources() to
> be used mostly during boot-time enumeration.  How much of a problem is
> the latency at that point?  Why is this particularly a problem in the
> KVM environment?  Or is it also a problem on bare metal?
>

In fact, this is a problem on KVM when hotplugging virtual devices. These
devices are initialized in a workqueue, as can be seen from the call
stack:

62485   62485   kworker/u8:0    pcibios_resource_survey_bus
        b'pcibios_resource_survey_bus+0x1 [kernel]'
        b'acpiphp_check_bridge.part.13+0x11c [kernel]'
        b'acpiphp_hotplug_notify+0x14b [kernel]'
        b'acpi_device_hotplug+0xe0 [kernel]'
        b'acpi_hotplug_work_fn+0x1e [kernel]'
        b'process_one_work+0x19f [kernel]'
        b'worker_thread+0x37 [kernel]'
        b'kthread+0x117 [kernel]'
        b'ret_from_fork+0x24 [kernel]'

__pci_bus_assign_resources() is called on this path too. However, as the
device is virtual and emulated by QEMU, its PCI config space can't be written
directly. Each PCI config write makes KVM exit guest mode so that QEMU on the
host can process the write. Therefore, the kworker in the guest blocks the
CPU until all of the writes are done.

It isn't a problem on bare metal.

The latency varies between machines. With a 4-core CPU and 2G of memory, a
single PCI config write can take 1-2ms, and __pci_bus_assign_resources()
can take up to 30ms.
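
As a rough illustration of how these per-write costs add up to the reported
figure, the sketch below simply multiplies them out. The helper name and the
number of pci_setup_bridge() calls are assumptions made for the example; only
the 1-2ms per write and the limit of 10 writes per call come from this thread.

```c
/* Back-of-the-envelope estimate only; this is not a measurement. */

/*
 * Hypothetical helper: time spent without a scheduling point, in
 * microseconds, if every config write traps to the host.
 */
static long config_write_latency_us(int bridge_calls, int writes_per_call,
				    long us_per_write)
{
	return (long)bridge_calls * writes_per_call * us_per_write;
}
```

With 2 calls (an assumed count), 10 writes per call and 1500us per write,
this gives 30000us, i.e. 30ms, which is in the range reported above.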


> Are there other config write paths that should have a similar change?
>
> _cond_resched() only appears here:
>
>   $ git grep "\<_cond_resched\>"
>   include/linux/sched.h:static __always_inline int _cond_resched(void)
>   include/linux/sched.h:static inline int _cond_resched(void)
>   include/linux/sched.h:static inline int _cond_resched(void) { return 0; }
>   include/linux/sched.h:  _cond_resched();
>
> so I don't believe PCI is so special that this needs to be the only
> other use.  Maybe a different resched interface is more appropriate?

It seems _cond_resched() is no longer used directly. cond_resched()
should be used here instead.

Thanks!
Menglong Dong
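
The direction reached in the reply above, calling cond_resched() rather than
_cond_resched() once the config lock has been released, can be modelled in
userspace roughly as follows. This is only a sketch: the lock helpers,
cond_resched() and mock_write() are empty stand-ins rather than the kernel's
implementations, resched_calls and mock_setup_bridge() are hypothetical names,
and the real accessor takes a struct pci_bus and IRQ flags. It also assumes
that every caller is allowed to sleep, which this thread does not establish.

```c
/* Userspace mock only; nothing here is the kernel's real implementation. */
#include <assert.h>

static int resched_calls;	/* hypothetical counter, for illustration */

/* Hypothetical stand-ins for the kernel helpers named in the patch. */
static void pci_lock_config(void)   { }
static void pci_unlock_config(void) { }
static int  cond_resched(void)      { resched_calls++; return 0; }

/* Stand-in for bus->ops->write(); it always succeeds. */
static int mock_write(int devfn, int pos, int len, unsigned int value)
{
	(void)devfn; (void)pos; (void)len; (void)value;
	return 0;
}

/* Same shape as the patched macro, with cond_resched() swapped in. */
#define PCI_OP_WRITE(size, type, len)					\
static int pci_bus_write_config_##size(int devfn, int pos, type value)	\
{									\
	int res;							\
									\
	pci_lock_config();						\
	res = mock_write(devfn, pos, len, value);			\
	pci_unlock_config();						\
	cond_resched();		/* only after the lock is dropped */	\
	return res;							\
}

PCI_OP_WRITE(byte, unsigned char, 1)
PCI_OP_WRITE(word, unsigned short, 2)
PCI_OP_WRITE(dword, unsigned int, 4)

/*
 * Models one bridge setup issuing 10 config writes (the upper bound
 * quoted in the commit message) and returns how many scheduling
 * points it passed through.
 */
static int mock_setup_bridge(void)
{
	int i, before = resched_calls;

	for (i = 0; i < 10; i++) {
		int res;

		switch (i % 3) {
		case 0:
			res = pci_bus_write_config_byte(0, i, 0xff);
			break;
		case 1:
			res = pci_bus_write_config_word(0, i, 0xffff);
			break;
		default:
			res = pci_bus_write_config_dword(0, i, 0xffffffffu);
			break;
		}
		assert(res == 0);	/* mock_write() never fails */
	}
	return resched_calls - before;
}
```

Calling mock_setup_bridge() once returns 10, one scheduling point per config
write, so in this model no run of writes goes without a chance to reschedule.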

Patch

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 46935695cfb9..babed43702df 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -57,6 +57,7 @@  int noinline pci_bus_write_config_##size \
 	pci_lock_config(flags);						\
 	res = bus->ops->write(bus, devfn, pos, len, value);		\
 	pci_unlock_config(flags);					\
+	_cond_resched();						\
 	return res;							\
 }