Patchwork pci: Avoid unnecessary calls to work_on_cpu

login
register
mail settings
Submitter Alexander Duyck
Date June 24, 2013, 8:05 p.m.
Message ID <20130624195942.40795.27292.stgit@ahduyck-cp1.jf.intel.com>
Download mbox | patch
Permalink /patch/253953/
State Changes Requested
Headers show

Comments

Alexander Duyck - June 24, 2013, 8:05 p.m.
This patch is meant to address the fact that we are making unnecessary calls
to work_on_cpu.  To resolve this I have added a check to see if the current
node is the correct node for the device before we decide to assign the probe
task to another CPU.

The advantages to this approach is that we can avoid reentrant calls to
work_on_cpu.  In addition we should not make any calls to setup the work
remotely in the case of a single node system that has NUMA enabled.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

This patch is based off of work I submitted in an earlier patch that I never
heard back on.  The change was originally submitted in:
  pci: Avoid reentrant calls to work_on_cpu

I'm not sure what ever happened with that patch, however after reviewing it
some myself I decided I could do without the change to the comments since they
were unneeded.  As such I am resubmitting this as a much simpler patch that
only adds the line of code needed to avoid calling work_on_cpu for every call
to probe on an NUMA node specific device.

 drivers/pci/pci-driver.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - July 5, 2013, 11:36 p.m.
[+cc Rusty]

On Mon, Jun 24, 2013 at 2:05 PM, Alexander Duyck
<alexander.h.duyck@intel.com> wrote:
> This patch is meant to address the fact that we are making unnecessary calls
> to work_on_cpu.  To resolve this I have added a check to see if the current
> node is the correct node for the device before we decide to assign the probe
> task to another CPU.
>
> The advantages to this approach is that we can avoid reentrant calls to
> work_on_cpu.  In addition we should not make any calls to setup the work
> remotely in the case of a single node system that has NUMA enabled.

The description above makes it sound like this is just a minor
performance enhancement, but I think the real reason you want this is
to resolve the lockdep warning mentioned at [1].  That thread is long
and confusing, so I'd like to see a bugzilla that distills out the
useful details, and a synopsis in this changelog.

[1] https://lkml.kernel.org/r/1368498506-25857-7-git-send-email-yinghai@kernel.org

> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>
> This patch is based off of work I submitted in an earlier patch that I never
> heard back on.  The change was originally submitted in:
>   pci: Avoid reentrant calls to work_on_cpu
>
> I'm not sure what ever happened with that patch, however after reviewing it
> some myself I decided I could do without the change to the comments since they
> were unneeded.  As such I am resubmitting this as a much simpler patch that
> only adds the line of code needed to avoid calling work_on_cpu for every call
> to probe on an NUMA node specific device.
>
>  drivers/pci/pci-driver.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 79277fb..7d81713 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -282,7 +282,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>            its local memory on the right node without any need to
>            change it. */
>         node = dev_to_node(&dev->dev);
> -       if (node >= 0) {
> +       if ((node >= 0) && (node != numa_node_id())) {
>                 int cpu;
>
>                 get_online_cpus();

I think it's theoretically unsafe to use numa_node_id() while
preemption is enabled.

It seems a little strange to me that this "run the driver probe method
on the correct node" code is in PCI.  I would think this behavior
would be desirable for *all* bus types, not just PCI, so maybe it
would make sense to do this up in device_attach() or somewhere
similar.

But Rusty added this (in 873392ca51), and he knows way more about this
stuff than I do.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt - July 6, 2013, 12:29 a.m.
On Fri, 2013-07-05 at 17:36 -0600, Bjorn Helgaas wrote:
> It seems a little strange to me that this "run the driver probe method
> on the correct node" code is in PCI.  I would think this behavior
> would be desirable for *all* bus types, not just PCI, so maybe it
> would make sense to do this up in device_attach() or somewhere
> similar.
> 
> But Rusty added this (in 873392ca51), and he knows way more about this
> stuff than I do.

I tend to agree... I can see this being useful on some of our non-PCI
devices on power as well in fact.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rusty Russell - July 8, 2013, 1:44 a.m.
Bjorn Helgaas <bhelgaas@google.com> writes:
> [+cc Rusty]
>
> On Mon, Jun 24, 2013 at 2:05 PM, Alexander Duyck
> <alexander.h.duyck@intel.com> wrote:
>> This patch is meant to address the fact that we are making unnecessary calls
>> to work_on_cpu.  To resolve this I have added a check to see if the current
>> node is the correct node for the device before we decide to assign the probe
>> task to another CPU.
>>
>> The advantages to this approach is that we can avoid reentrant calls to
>> work_on_cpu.  In addition we should not make any calls to setup the work
>> remotely in the case of a single node system that has NUMA enabled.
>
> The description above makes it sound like this is just a minor
> performance enhancement, but I think the real reason you want this is
> to resolve the lockdep warning mentioned at [1].  That thread is long
> and confusing, so I'd like to see a bugzilla that distills out the
> useful details, and a synopsis in this changelog.
>
> [1] https://lkml.kernel.org/r/1368498506-25857-7-git-send-email-yinghai@kernel.org
>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> ---
>>
>> This patch is based off of work I submitted in an earlier patch that I never
>> heard back on.  The change was originally submitted in:
>>   pci: Avoid reentrant calls to work_on_cpu
>>
>> I'm not sure what ever happened with that patch, however after reviewing it
>> some myself I decided I could do without the change to the comments since they
>> were unneeded.  As such I am resubmitting this as a much simpler patch that
>> only adds the line of code needed to avoid calling work_on_cpu for every call
>> to probe on an NUMA node specific device.
>>
>>  drivers/pci/pci-driver.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index 79277fb..7d81713 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -282,7 +282,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>>            its local memory on the right node without any need to
>>            change it. */
>>         node = dev_to_node(&dev->dev);
>> -       if (node >= 0) {
>> +       if ((node >= 0) && (node != numa_node_id())) {
>>                 int cpu;
>>
>>                 get_online_cpus();
>
> I think it's theoretically unsafe to use numa_node_id() while
> preemption is enabled.
>
> It seems a little strange to me that this "run the driver probe method
> on the correct node" code is in PCI.  I would think this behavior
> would be desirable for *all* bus types, not just PCI, so maybe it
> would make sense to do this up in device_attach() or somewhere
> similar.
>
> But Rusty added this (in 873392ca51), and he knows way more about this
> stuff than I do.

Actually, I just stopped the code from playing cpumask games, which is
what it used to do.

You want Andi and/or Greg KH:

commit d42c69972b853fd33a26c8c7405624be41a22136
Author: Andi Kleen <ak@suse.de>
Date:   Wed Jul 6 19:56:03 2005 +0200

    [PATCH] PCI: Run PCI driver initialization on local node
    
    Run PCI driver initialization on local node
    
    Instead of adding messy kmalloc_node()s everywhere run the
    PCI driver probe on the node local to the device.
    
    This would not have helped for IDE, but should for
    other more clean drivers that do more initialization in probe().
    It won't help for drivers that do most of the work
    on first open (like many network drivers)
    
    Signed-off-by: Andi Kleen <ak@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andi Kleen - July 8, 2013, 4:37 a.m.
> > But Rusty added this (in 873392ca51), and he knows way more about this
> > stuff than I do.
> 
> Actually, I just stopped the code from playing cpumask games, which is
> what it used to do.

You're right the numa_node_id() check ptimization is not 100% safe on preempt 
kernels and should be probably removed. Also I agree it would probably
make sense to move it up to the generic device layer (although I'm not
sure that other bus types really care that much about NUMA locality)

None of it seems to be a fatal problem though, so it boils down to
"if someone cares enough to write a patch"

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 79277fb..7d81713 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -282,7 +282,7 @@  static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 	   its local memory on the right node without any need to
 	   change it. */
 	node = dev_to_node(&dev->dev);
-	if (node >= 0) {
+	if ((node >= 0) && (node != numa_node_id())) {
 		int cpu;
 
 		get_online_cpus();