diff mbox

[1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()

Message ID 1379382464-7920-2-git-send-email-vfalico@redhat.com
State Changes Requested
Headers show

Commit Message

Veaceslav Falico Sept. 17, 2013, 1:47 a.m. UTC
Before trying to kobject_init_and_add(), we add a reference to pdev via
pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
don't return it back - even on out_unroll.

Fix this by adding pci_dev_put(pdev) before going to unrolling section.

CC: Bjorn Helgaas <bhelgaas@google.com>
CC: linux-pci@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/pci/msi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Sept. 25, 2013, 9:08 p.m. UTC | #1
[+cc Neil (he added this code in da8d1c8ba4), Greg]

On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
> Before trying to kobject_init_and_add(), we add a reference to pdev via
> pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
> don't return it back - even on out_unroll.
>
> Fix this by adding pci_dev_put(pdev) before going to unrolling section.
>
> CC: Bjorn Helgaas <bhelgaas@google.com>
> CC: linux-pci@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
>  drivers/pci/msi.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index d5f90d6..14bf578 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>                 pci_dev_get(pdev);
>                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
>                                      "%u", entry->irq);
> -               if (ret)
> +               if (ret) {
> +                       pci_dev_put(pdev);
>                         goto out_unroll;
> +               }
>
>                 count++;
>         }

I don't understand why this code does the pci_dev_get() in the first
place.  The pdev->msi_list of msi_desc structs is private to the
pci_dev, and even without bumping the refcount, there should be no way
for the pci_dev to be freed before the msi_desc.

I also don't understand this nearby code (the same pattern appears in
free_msi_irqs()):

    out_unroll:
        list_for_each_entry(entry, &pdev->msi_list, list) {
                if (!count)
                        break;
                kobject_del(&entry->kobj);
                kobject_put(&entry->kobj);
                count--;
        }

Why do we call kobject_del() here?  The kobject_put() will call
kobject_del() anyway, so it looks redundant.
Documentation/kobject.txt says kobject_del() must be called explicitly
to break a circular reference, but I don't think we have that here.

Also, I think it is incorrect that free_msi_irqs() does this:

                if (entry->kobj.parent) {
                        kobject_del(&entry->kobj);
                        kobject_put(&entry->kobj);
                }

                list_del(&entry->list);
                kfree(entry);

I think the "kfree(entry)" should be in msi_kobj_release() instead.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Sept. 25, 2013, 9:30 p.m. UTC | #2
On Wed, Sep 25, 2013 at 3:08 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Also, I think it is incorrect that free_msi_irqs() does this:
>
>                 if (entry->kobj.parent) {
>                         kobject_del(&entry->kobj);
>                         kobject_put(&entry->kobj);
>                 }
>
>                 list_del(&entry->list);
>                 kfree(entry);
>
> I think the "kfree(entry)" should be in msi_kobj_release() instead.

Oh, I see you fixed this in patch 3/3.  I hadn't read that far yet :)

Did you find these problems by inspection, or is there some easy way
to trigger bad behavior?  Just wondering if this is some way I can
reproduce the problem.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Veaceslav Falico Sept. 25, 2013, 10:09 p.m. UTC | #3
On Wed, Sep 25, 2013 at 03:30:14PM -0600, Bjorn Helgaas wrote:
>On Wed, Sep 25, 2013 at 3:08 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> Also, I think it is incorrect that free_msi_irqs() does this:
>>
>>                 if (entry->kobj.parent) {
>>                         kobject_del(&entry->kobj);
>>                         kobject_put(&entry->kobj);
>>                 }
>>
>>                 list_del(&entry->list);
>>                 kfree(entry);
>>
>> I think the "kfree(entry)" should be in msi_kobj_release() instead.
>
>Oh, I see you fixed this in patch 3/3.  I hadn't read that far yet :)
>
>Did you find these problems by inspection, or is there some easy way
>to trigger bad behavior?  Just wondering if this is some way I can
>reproduce the problem.

Hi,

I've first found it by building with CONFIG_DEBUG_KOBJECT_RELEASE and
CONFIG_DEBUG_OBJECTS - it shows that it's freeing an object in an active
state (I'm just running insmod/rmmod tg3 concurently, but I guess it's
reproducible with any driver that uses msi/x).

Without CONFIG_DEBUG_OBJECTS it's also reproducible, and without
CONFIG_DEBUG_KOBJECT_RELEASE it's really hard to reproduce, but still
reproducible (I've hit it with tg3 as a slave of bonding and concurently
running rmmod bonding/ifup bond0 - though it's really hard). It just panics
in kobject_put(), iirc.

So, with those CONFIG_DEBUG_* it's easily triggerable, without - quite
hard.

Hope that helps.

p.s. I'll adress your other comments tomorrow already, it's quite late here
and I don't want to do something stupid now :).

Thanks a lot!

>
>Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman Sept. 25, 2013, 11:23 p.m. UTC | #4
On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
> [+cc Neil (he added this code in da8d1c8ba4), Greg]
> 
> On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
> > Before trying to kobject_init_and_add(), we add a reference to pdev via
> > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
> > don't return it back - even on out_unroll.
> >
> > Fix this by adding pci_dev_put(pdev) before going to unrolling section.
> >
> > CC: Bjorn Helgaas <bhelgaas@google.com>
> > CC: linux-pci@vger.kernel.org
> > CC: linux-kernel@vger.kernel.org
> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> > ---
> >  drivers/pci/msi.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index d5f90d6..14bf578 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
> >                 pci_dev_get(pdev);
> >                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
> >                                      "%u", entry->irq);
> > -               if (ret)
> > +               if (ret) {
> > +                       pci_dev_put(pdev);
> >                         goto out_unroll;
> > +               }
> >
> >                 count++;
> >         }
> 
> I don't understand why this code does the pci_dev_get() in the first
> place.  The pdev->msi_list of msi_desc structs is private to the
> pci_dev, and even without bumping the refcount, there should be no way
> for the pci_dev to be freed before the msi_desc.
> 
Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
people didn't try to remove the device prior to freeing all their interrupts
(i.e I didn't want a broken driver to go through its remove routine without
freeing all its irqs).  That might have been the wrong thing to do, but thats
what bubbles to the front of my head when looking at this.

> I also don't understand this nearby code (the same pattern appears in
> free_msi_irqs()):
> 
>     out_unroll:
>         list_for_each_entry(entry, &pdev->msi_list, list) {
>                 if (!count)
>                         break;
>                 kobject_del(&entry->kobj);
>                 kobject_put(&entry->kobj);
>                 count--;
>         }
> 
> Why do we call kobject_del() here?  The kobject_put() will call
> kobject_del() anyway, so it looks redundant.
> Documentation/kobject.txt says kobject_del() must be called explicitly
> to break a circular reference, but I don't think we have that here.
> 
 I think thats exactly why I did it, because of the documentation.  I agree
however, it does look redundant.  Harmless, but redundant.

> Also, I think it is incorrect that free_msi_irqs() does this:
> 
>                 if (entry->kobj.parent) {
>                         kobject_del(&entry->kobj);
>                         kobject_put(&entry->kobj);
>                 }
> 
>                 list_del(&entry->list);
>                 kfree(entry);
> 
> I think the "kfree(entry)" should be in msi_kobj_release() instead.
> 
As far as this change goes, I think it looks ok

Acked-by: Neil Horman <nhorman@tuxdriver.com>


> Bjorn
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Sept. 25, 2013, 11:35 p.m. UTC | #5
On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
>> [+cc Neil (he added this code in da8d1c8ba4), Greg]
>>
>> On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
>> > Before trying to kobject_init_and_add(), we add a reference to pdev via
>> > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
>> > don't return it back - even on out_unroll.
>> >
>> > Fix this by adding pci_dev_put(pdev) before going to unrolling section.
>> >
>> > CC: Bjorn Helgaas <bhelgaas@google.com>
>> > CC: linux-pci@vger.kernel.org
>> > CC: linux-kernel@vger.kernel.org
>> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>> > ---
>> >  drivers/pci/msi.c | 4 +++-
>> >  1 file changed, 3 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> > index d5f90d6..14bf578 100644
>> > --- a/drivers/pci/msi.c
>> > +++ b/drivers/pci/msi.c
>> > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>> >                 pci_dev_get(pdev);
>> >                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
>> >                                      "%u", entry->irq);
>> > -               if (ret)
>> > +               if (ret) {
>> > +                       pci_dev_put(pdev);
>> >                         goto out_unroll;
>> > +               }
>> >
>> >                 count++;
>> >         }
>>
>> I don't understand why this code does the pci_dev_get() in the first
>> place.  The pdev->msi_list of msi_desc structs is private to the
>> pci_dev, and even without bumping the refcount, there should be no way
>> for the pci_dev to be freed before the msi_desc.
>>
> Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
> people didn't try to remove the device prior to freeing all their interrupts
> (i.e I didn't want a broken driver to go through its remove routine without
> freeing all its irqs).  That might have been the wrong thing to do, but thats
> what bubbles to the front of my head when looking at this.

That sounds plausible, but I think I'd rather deal with that by having
the PCI core remove logic free all the interrupts.  I *think* that's
already in place, i.e., pci_free_resources() calls
msi_remove_pci_irq_vectors().  So I propose that we remove the
pci_dev_get()/put() unless we come up with a more compelling reason
for it.

>> I also don't understand this nearby code (the same pattern appears in
>> free_msi_irqs()):
>>
>>     out_unroll:
>>         list_for_each_entry(entry, &pdev->msi_list, list) {
>>                 if (!count)
>>                         break;
>>                 kobject_del(&entry->kobj);
>>                 kobject_put(&entry->kobj);
>>                 count--;
>>         }
>>
>> Why do we call kobject_del() here?  The kobject_put() will call
>> kobject_del() anyway, so it looks redundant.
>> Documentation/kobject.txt says kobject_del() must be called explicitly
>> to break a circular reference, but I don't think we have that here.
>>
>  I think thats exactly why I did it, because of the documentation.  I agree
> however, it does look redundant.  Harmless, but redundant.

OK, thanks.  I think we should remove it on the grounds that it's not
needed and removing it will make this code look more similar to other
callers of kobject_init_and_add(), which means bugs will have fewer
places to hide.

Thanks, Neil!

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Veaceslav Falico Sept. 26, 2013, 9:27 a.m. UTC | #6
On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>> On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
>>> [+cc Neil (he added this code in da8d1c8ba4), Greg]
>>>
>>> On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
>>> > Before trying to kobject_init_and_add(), we add a reference to pdev via
>>> > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
>>> > don't return it back - even on out_unroll.
>>> >
>>> > Fix this by adding pci_dev_put(pdev) before going to unrolling section.
>>> >
>>> > CC: Bjorn Helgaas <bhelgaas@google.com>
>>> > CC: linux-pci@vger.kernel.org
>>> > CC: linux-kernel@vger.kernel.org
>>> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>>> > ---
>>> >  drivers/pci/msi.c | 4 +++-
>>> >  1 file changed, 3 insertions(+), 1 deletion(-)
>>> >
>>> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>>> > index d5f90d6..14bf578 100644
>>> > --- a/drivers/pci/msi.c
>>> > +++ b/drivers/pci/msi.c
>>> > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>>> >                 pci_dev_get(pdev);
>>> >                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
>>> >                                      "%u", entry->irq);
>>> > -               if (ret)
>>> > +               if (ret) {
>>> > +                       pci_dev_put(pdev);
>>> >                         goto out_unroll;
>>> > +               }
>>> >
>>> >                 count++;
>>> >         }
>>>
>>> I don't understand why this code does the pci_dev_get() in the first
>>> place.  The pdev->msi_list of msi_desc structs is private to the
>>> pci_dev, and even without bumping the refcount, there should be no way
>>> for the pci_dev to be freed before the msi_desc.
>>>
>> Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
>> people didn't try to remove the device prior to freeing all their interrupts
>> (i.e I didn't want a broken driver to go through its remove routine without
>> freeing all its irqs).  That might have been the wrong thing to do, but thats
>> what bubbles to the front of my head when looking at this.
>
>That sounds plausible, but I think I'd rather deal with that by having
>the PCI core remove logic free all the interrupts.  I *think* that's
>already in place, i.e., pci_free_resources() calls
>msi_remove_pci_irq_vectors().  So I propose that we remove the
>pci_dev_get()/put() unless we come up with a more compelling reason
>for it.
>
>>> I also don't understand this nearby code (the same pattern appears in
>>> free_msi_irqs()):
>>>
>>>     out_unroll:
>>>         list_for_each_entry(entry, &pdev->msi_list, list) {
>>>                 if (!count)
>>>                         break;
>>>                 kobject_del(&entry->kobj);
>>>                 kobject_put(&entry->kobj);
>>>                 count--;
>>>         }
>>>
>>> Why do we call kobject_del() here?  The kobject_put() will call
>>> kobject_del() anyway, so it looks redundant.
>>> Documentation/kobject.txt says kobject_del() must be called explicitly
>>> to break a circular reference, but I don't think we have that here.
>>>
>>  I think thats exactly why I did it, because of the documentation.  I agree
>> however, it does look redundant.  Harmless, but redundant.
>
>OK, thanks.  I think we should remove it on the grounds that it's not
>needed and removing it will make this code look more similar to other
>callers of kobject_init_and_add(), which means bugs will have fewer
>places to hide.
Hi,

Sounds great, I'll add this as new patches and send this in a separate
patchset with the unregister msi_kset in free_msi_irqs patch, so that it
won't depend on the fix (both removing kobject_del() and
pci_dev_get/put()).

Thank you!

>
>Thanks, Neil!
>
>Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Veaceslav Falico Sept. 26, 2013, 12:25 p.m. UTC | #7
On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>> On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
>>> [+cc Neil (he added this code in da8d1c8ba4), Greg]
>>>
>>> On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
>>> > Before trying to kobject_init_and_add(), we add a reference to pdev via
>>> > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
>>> > don't return it back - even on out_unroll.
>>> >
>>> > Fix this by adding pci_dev_put(pdev) before going to unrolling section.
>>> >
>>> > CC: Bjorn Helgaas <bhelgaas@google.com>
>>> > CC: linux-pci@vger.kernel.org
>>> > CC: linux-kernel@vger.kernel.org
>>> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>>> > ---
>>> >  drivers/pci/msi.c | 4 +++-
>>> >  1 file changed, 3 insertions(+), 1 deletion(-)
>>> >
>>> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>>> > index d5f90d6..14bf578 100644
>>> > --- a/drivers/pci/msi.c
>>> > +++ b/drivers/pci/msi.c
>>> > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>>> >                 pci_dev_get(pdev);
>>> >                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
>>> >                                      "%u", entry->irq);
>>> > -               if (ret)
>>> > +               if (ret) {
>>> > +                       pci_dev_put(pdev);
>>> >                         goto out_unroll;
>>> > +               }
>>> >
>>> >                 count++;
>>> >         }
>>>
>>> I don't understand why this code does the pci_dev_get() in the first
>>> place.  The pdev->msi_list of msi_desc structs is private to the
>>> pci_dev, and even without bumping the refcount, there should be no way
>>> for the pci_dev to be freed before the msi_desc.
>>>
>> Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
>> people didn't try to remove the device prior to freeing all their interrupts
>> (i.e I didn't want a broken driver to go through its remove routine without
>> freeing all its irqs).  That might have been the wrong thing to do, but thats
>> what bubbles to the front of my head when looking at this.
>
>That sounds plausible, but I think I'd rather deal with that by having
>the PCI core remove logic free all the interrupts.  I *think* that's
>already in place, i.e., pci_free_resources() calls
>msi_remove_pci_irq_vectors().  So I propose that we remove the
>pci_dev_get()/put() unless we come up with a more compelling reason
>for it.

As an update - I've found an interesting case why exactly that
kobject_del() might be needed:

in kobject_del() it removes instantly the link to kset - via
kobj_kset_leave(), so that our kset remains without links and, thus, might
be instantly removed.

So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly
without any links (i.e. other kobjects) and, when we call kset_unregister()
- it exits instantly (if it's not being hold somewhere elsewhere...).

Without it, kset_unregister() will wait till all the kobjects will be gone.

Now, the fun part starts - if we quickly call pci_disable_msi() and,
afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is
still there, waiting to unregister, and the sysfs dir is still active.

It's used, for example, in tg3_open/tg3_close, which are ndo_open/close,
and are called on enslave/deslave in bonding.

What I get:
[   60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0()
[   60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs'

I'll take a deeper look at the issue, though any feedback/advise is
welcome. And I'll hold on with the patchset that removes pci_dev_get/put
and kobject_del.


>
>>> I also don't understand this nearby code (the same pattern appears in
>>> free_msi_irqs()):
>>>
>>>     out_unroll:
>>>         list_for_each_entry(entry, &pdev->msi_list, list) {
>>>                 if (!count)
>>>                         break;
>>>                 kobject_del(&entry->kobj);
>>>                 kobject_put(&entry->kobj);
>>>                 count--;
>>>         }
>>>
>>> Why do we call kobject_del() here?  The kobject_put() will call
>>> kobject_del() anyway, so it looks redundant.
>>> Documentation/kobject.txt says kobject_del() must be called explicitly
>>> to break a circular reference, but I don't think we have that here.
>>>
>>  I think thats exactly why I did it, because of the documentation.  I agree
>> however, it does look redundant.  Harmless, but redundant.
>
>OK, thanks.  I think we should remove it on the grounds that it's not
>needed and removing it will make this code look more similar to other
>callers of kobject_init_and_add(), which means bugs will have fewer
>places to hide.
>
>Thanks, Neil!
>
>Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Veaceslav Falico Sept. 26, 2013, 2:07 p.m. UTC | #8
On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote:
>On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
>>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>>>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
>>>>[+cc Neil (he added this code in da8d1c8ba4), Greg]
>>>>
>>>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
>>>>> Before trying to kobject_init_and_add(), we add a reference to pdev via
>>>>> pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
>>>>> don't return it back - even on out_unroll.
>>>>>
>>>>> Fix this by adding pci_dev_put(pdev) before going to unrolling section.
>>>>>
>>>>> CC: Bjorn Helgaas <bhelgaas@google.com>
>>>>> CC: linux-pci@vger.kernel.org
>>>>> CC: linux-kernel@vger.kernel.org
>>>>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>>>>> ---
>>>>>  drivers/pci/msi.c | 4 +++-
>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>>>>> index d5f90d6..14bf578 100644
>>>>> --- a/drivers/pci/msi.c
>>>>> +++ b/drivers/pci/msi.c
>>>>> @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>>>>>                 pci_dev_get(pdev);
>>>>>                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
>>>>>                                      "%u", entry->irq);
>>>>> -               if (ret)
>>>>> +               if (ret) {
>>>>> +                       pci_dev_put(pdev);
>>>>>                         goto out_unroll;
>>>>> +               }
>>>>>
>>>>>                 count++;
>>>>>         }
>>>>
>>>>I don't understand why this code does the pci_dev_get() in the first
>>>>place.  The pdev->msi_list of msi_desc structs is private to the
>>>>pci_dev, and even without bumping the refcount, there should be no way
>>>>for the pci_dev to be freed before the msi_desc.
>>>>
>>>Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
>>>people didn't try to remove the device prior to freeing all their interrupts
>>>(i.e I didn't want a broken driver to go through its remove routine without
>>>freeing all its irqs).  That might have been the wrong thing to do, but thats
>>>what bubbles to the front of my head when looking at this.
>>
>>That sounds plausible, but I think I'd rather deal with that by having
>>the PCI core remove logic free all the interrupts.  I *think* that's
>>already in place, i.e., pci_free_resources() calls
>>msi_remove_pci_irq_vectors().  So I propose that we remove the
>>pci_dev_get()/put() unless we come up with a more compelling reason
>>for it.
>
>As an update - I've found an interesting case why exactly that
>kobject_del() might be needed:
>
>in kobject_del() it removes instantly the link to kset - via
>kobj_kset_leave(), so that our kset remains without links and, thus, might
>be instantly removed.
>
>So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly
>without any links (i.e. other kobjects) and, when we call kset_unregister()
>- it exits instantly (if it's not being hold somewhere elsewhere...).
>
>Without it, kset_unregister() will wait till all the kobjects will be gone.
>
>Now, the fun part starts - if we quickly call pci_disable_msi() and,
>afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is
>still there, waiting to unregister, and the sysfs dir is still active.
>
>It's used, for example, in tg3_open/tg3_close, which are ndo_open/close,
>and are called on enslave/deslave in bonding.
>
>What I get:
>[   60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0()
>[   60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs'
>
>I'll take a deeper look at the issue, though any feedback/advise is
>welcome. And I'll hold on with the patchset that removes pci_dev_get/put
>and kobject_del.

Ok, this is only reproducible with CONFIG_DEBUG_KOBJECT_RELEASE, with or
without removing kobject_del() (though it's harder to reproduce). I could
not trigger it without CONFIG_DEBUG_KOBJECT_RELEASE, even with constantly
poking /sys/class/net/tg3_device/device/msi_irqs/* . Though it's,
obviously, possible, with and without cleanup and my previous bugfix.

I'll then send now the cleanup, however this theoretical issue was, is and
won't be fixed by it :-/. And I don't know how can we possible fix it
without something like kobject_put_wait().

>
>
>>
>>>>I also don't understand this nearby code (the same pattern appears in
>>>>free_msi_irqs()):
>>>>
>>>>    out_unroll:
>>>>        list_for_each_entry(entry, &pdev->msi_list, list) {
>>>>                if (!count)
>>>>                        break;
>>>>                kobject_del(&entry->kobj);
>>>>                kobject_put(&entry->kobj);
>>>>                count--;
>>>>        }
>>>>
>>>>Why do we call kobject_del() here?  The kobject_put() will call
>>>>kobject_del() anyway, so it looks redundant.
>>>>Documentation/kobject.txt says kobject_del() must be called explicitly
>>>>to break a circular reference, but I don't think we have that here.
>>>>
>>> I think thats exactly why I did it, because of the documentation.  I agree
>>>however, it does look redundant.  Harmless, but redundant.
>>
>>OK, thanks.  I think we should remove it on the grounds that it's not
>>needed and removing it will make this code look more similar to other
>>callers of kobject_init_and_add(), which means bugs will have fewer
>>places to hide.
>>
>>Thanks, Neil!
>>
>>Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman Sept. 26, 2013, 2:40 p.m. UTC | #9
On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote:
> On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
> >On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> >>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
> >>>[+cc Neil (he added this code in da8d1c8ba4), Greg]
> >>>
> >>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
> >>>> Before trying to kobject_init_and_add(), we add a reference to pdev via
> >>>> pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we
> >>>> don't return it back - even on out_unroll.
> >>>>
> >>>> Fix this by adding pci_dev_put(pdev) before going to unrolling section.
> >>>>
> >>>> CC: Bjorn Helgaas <bhelgaas@google.com>
> >>>> CC: linux-pci@vger.kernel.org
> >>>> CC: linux-kernel@vger.kernel.org
> >>>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> >>>> ---
> >>>>  drivers/pci/msi.c | 4 +++-
> >>>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> >>>> index d5f90d6..14bf578 100644
> >>>> --- a/drivers/pci/msi.c
> >>>> +++ b/drivers/pci/msi.c
> >>>> @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
> >>>>                 pci_dev_get(pdev);
> >>>>                 ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
> >>>>                                      "%u", entry->irq);
> >>>> -               if (ret)
> >>>> +               if (ret) {
> >>>> +                       pci_dev_put(pdev);
> >>>>                         goto out_unroll;
> >>>> +               }
> >>>>
> >>>>                 count++;
> >>>>         }
> >>>
> >>>I don't understand why this code does the pci_dev_get() in the first
> >>>place.  The pdev->msi_list of msi_desc structs is private to the
> >>>pci_dev, and even without bumping the refcount, there should be no way
> >>>for the pci_dev to be freed before the msi_desc.
> >>>
> >>Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that
> >>people didn't try to remove the device prior to freeing all their interrupts
> >>(i.e I didn't want a broken driver to go through its remove routine without
> >>freeing all its irqs).  That might have been the wrong thing to do, but thats
> >>what bubbles to the front of my head when looking at this.
> >
> >That sounds plausible, but I think I'd rather deal with that by having
> >the PCI core remove logic free all the interrupts.  I *think* that's
> >already in place, i.e., pci_free_resources() calls
> >msi_remove_pci_irq_vectors().  So I propose that we remove the
> >pci_dev_get()/put() unless we come up with a more compelling reason
> >for it.
> 
> As an update - I've found an interesting case why exactly that
> kobject_del() might be needed:
> 
> in kobject_del() it removes instantly the link to kset - via
> kobj_kset_leave(), so that our kset remains without links and, thus, might
> be instantly removed.
> 
> So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly
> without any links (i.e. other kobjects) and, when we call kset_unregister()
> - it exits instantly (if it's not being hold somewhere elsewhere...).
> 
> Without it, kset_unregister() will wait till all the kobjects will be gone.
> 
> Now, the fun part starts - if we quickly call pci_disable_msi() and,
> afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is
> still there, waiting to unregister, and the sysfs dir is still active.
> 
> It's used, for example, in tg3_open/tg3_close, which are ndo_open/close,
> and are called on enslave/deslave in bonding.
> 
> What I get:
> [   60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0()
> [   60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs'
> 
> I'll take a deeper look at the issue, though any feedback/advise is
> welcome. And I'll hold on with the patchset that removes pci_dev_get/put
> and kobject_del.
> 
> 
The origional post may offer some guidance here:
https://lkml.org/lkml/2011/9/29/220

In particular the v3 update I think is relevant.
Neil

> >
> >>>I also don't understand this nearby code (the same pattern appears in
> >>>free_msi_irqs()):
> >>>
> >>>    out_unroll:
> >>>        list_for_each_entry(entry, &pdev->msi_list, list) {
> >>>                if (!count)
> >>>                        break;
> >>>                kobject_del(&entry->kobj);
> >>>                kobject_put(&entry->kobj);
> >>>                count--;
> >>>        }
> >>>
> >>>Why do we call kobject_del() here?  The kobject_put() will call
> >>>kobject_del() anyway, so it looks redundant.
> >>>Documentation/kobject.txt says kobject_del() must be called explicitly
> >>>to break a circular reference, but I don't think we have that here.
> >>>
> >> I think thats exactly why I did it, because of the documentation.  I agree
> >>however, it does look redundant.  Harmless, but redundant.
> >
> >OK, thanks.  I think we should remove it on the grounds that it's not
> >needed and removing it will make this code look more similar to other
> >callers of kobject_init_and_add(), which means bugs will have fewer
> >places to hide.
> >
> >Thanks, Neil!
> >
> >Bjorn
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Sept. 26, 2013, 10:16 p.m. UTC | #10
[+cc Russell]

On Thu, Sep 26, 2013 at 04:07:51PM +0200, Veaceslav Falico wrote:
> On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote:
> >On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
> >>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> >>>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
> >>>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
...
> >As an update - I've found an interesting case why exactly that
> >kobject_del() might be needed:
> >
> >in kobject_del() it removes instantly the link to kset - via
> >kobj_kset_leave(), so that our kset remains without links and, thus, might
> >be instantly removed.
> >
> >So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly
> >without any links (i.e. other kobjects) and, when we call kset_unregister()
> >- it exits instantly (if it's not being hold somewhere elsewhere...).
> >
> >Without it, kset_unregister() will wait till all the kobjects will be gone.

I don't see any waiting in kset_unregister(); all it does is a
kobject_put().

> >Now, the fun part starts - if we quickly call pci_disable_msi() and,
> >afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is
> >still there, waiting to unregister, and the sysfs dir is still active.
> >
> >It's used, for example, in tg3_open/tg3_close, which are ndo_open/close,
> >and are called on enslave/deslave in bonding.
> >
> >What I get:
> >[   60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0()
> >[   60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs'
> >
> >I'll take a deeper look at the issue, though any feedback/advise is
> >welcome. And I'll hold on with the patchset that removes pci_dev_get/put
> >and kobject_del.
> 
> Ok, this is only reproducible with CONFIG_DEBUG_KOBJECT_RELEASE, with or
> without removing kobject_del() (though it's harder to reproduce). I could
> not trigger it without CONFIG_DEBUG_KOBJECT_RELEASE, even with constantly
> poking /sys/class/net/tg3_device/device/msi_irqs/* . Though it's,
> obviously, possible, with and without cleanup and my previous bugfix.
> 
> I'll then send now the cleanup, however this theoretical issue was, is and
> won't be fixed by it :-/. And I don't know how can we possible fix it
> without something like kobject_put_wait().

I still think we should remove the kobject_del().  We don't want to
make a race harder to hit; we want to remove it completely.  What we
really want is to make races *easier* to hit so we can find them,
which seems to be the point of KOBJECT_RELEASE :)

That said, I think I see why you see the warning in this case.
You're calling pci_enable_msi(), pci_disable_msi(), pci_enable_msi()
as in this call chain:

  pci_enable_msi
    msi_capability_init
      populate_msi_sysfs
        dev->msi_kset = kset_create_and_add("msi_irqs", ...)
        list_for_each_entry(entry, &dev->msi_list, ...)
          kobj->kset = dev->msi_kset
          kobject_init_and_add(kobj, &msi_irq_ktype, NULL, ...)
            kobject_add_internal
              kobject_get(&kobj->kset->kobj)    # dev->msi_kset

  pci_disable_msi
    free_msi_irqs
      list_for_each_entry_safe(entry, ..., &dev->msi_list, ...)
        kobject_put(&entry->kobj)
          kobject_release
            ... <delayed> ...
              kobject_cleanup
                kobject_del
                  kobj_kset_leave
                    kset_put(kobj->kset)        # dev->msi_kset
                      kobject_put	# happens AFTER pci_enable_msi() below
                t->release
        list_del(&entry->list)
    kset_unregister(dev->msi_kset)
      kobject_put
        kobject_release
          ... <delayed> ...
            kobject_cleanup		# happens AFTER pci_enable_msi() below
    dev->msi_kset = NULL

  pci_enable_msi
    msi_capability_init
      populate_msi_sysfs
        dev->msi_kset = kset_create_and_add("msi_irqs", ...)
          "sysfs: cannot create duplicate filename .../msi_irqs"

When we kset_create_and_add("msi_irqs") the second time, the delayed
kobject_cleanups() for the msi_desc entries and the msi_kset have
not yet occurred, so the msi_desc entries still hold references to
the msi_kset, etc.

I'm not sure if this is a design problem in the way PCI manages
msi_kset and msi_desc entries, or if there's something wrong in
the way KOBJECT_RELEASE is implemented.  I could imagine changing
it so the bulk of kobject_cleanup(), including the sysfs cleanup,
is executed immediately when the last reference is dropped, but
the kobj_type->release() function itself is delayed.

Calling kobject_del() explicitly sort of side-steps this problem
by doing the sysfs cleanup before the last put.  But it is quite
subtle, and it feels error-prone to rely on that.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Veaceslav Falico Sept. 26, 2013, 11:05 p.m. UTC | #11
On Thu, Sep 26, 2013 at 04:16:13PM -0600, Bjorn Helgaas wrote:
>[+cc Russell]
>
>On Thu, Sep 26, 2013 at 04:07:51PM +0200, Veaceslav Falico wrote:
>> On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote:
>> >On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote:
>> >>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>> >>>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote:
>> >>>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote:
>...
>> >As an update - I've found an interesting case why exactly that
>> >kobject_del() might be needed:
>> >
>> >in kobject_del() it removes instantly the link to kset - via
>> >kobj_kset_leave(), so that our kset remains without links and, thus, might
>> >be instantly removed.
>> >
>> >So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly
>> >without any links (i.e. other kobjects) and, when we call kset_unregister()
>> >- it exits instantly (if it's not being hold somewhere elsewhere...).
>> >
>> >Without it, kset_unregister() will wait till all the kobjects will be gone.
>
>I don't see any waiting in kset_unregister(); all it does is a
>kobject_put().

Sorry, didn't word it correctly - my thought was that kset_unregister()
(which is, basicly, kobject_put()) will NOT unregister it instantly without
kobject_del() in place, because the 'child' kobjects are still tied to this
kset.

>
>> >Now, the fun part starts - if we quickly call pci_disable_msi() and,
>> >afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is
>> >still there, waiting to unregister, and the sysfs dir is still active.
>> >
>> >It's used, for example, in tg3_open/tg3_close, which are ndo_open/close,
>> >and are called on enslave/deslave in bonding.
>> >
>> >What I get:
>> >[   60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0()
>> >[   60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs'
>> >
>> >I'll take a deeper look at the issue, though any feedback/advise is
>> >welcome. And I'll hold on with the patchset that removes pci_dev_get/put
>> >and kobject_del.
>>
>> Ok, this is only reproducible with CONFIG_DEBUG_KOBJECT_RELEASE, with or
>> without removing kobject_del() (though it's harder to reproduce). I could
>> not trigger it without CONFIG_DEBUG_KOBJECT_RELEASE, even with constantly
>> poking /sys/class/net/tg3_device/device/msi_irqs/* . Though it's,
>> obviously, possible, with and without cleanup and my previous bugfix.
>>
>> I'll then send now the cleanup, however this theoretical issue was, is and
>> won't be fixed by it :-/. And I don't know how can we possible fix it
>> without something like kobject_put_wait().
>
>I still think we should remove the kobject_del().  We don't want to
>make a race harder to hit; we want to remove it completely.  What we
>really want is to make races *easier* to hit so we can find them,
>which seems to be the point of KOBJECT_RELEASE :)

Agree, that's what I've done in the v2 patchset :).

>
>That said, I think I see why you see the warning in this case.
>You're calling pci_enable_msi(), pci_disable_msi(), pci_enable_msi()
>as in this call chain:
>
>  pci_enable_msi
>    msi_capability_init
>      populate_msi_sysfs
>        dev->msi_kset = kset_create_and_add("msi_irqs", ...)
>        list_for_each_entry(entry, &dev->msi_list, ...)
>          kobj->kset = dev->msi_kset
>          kobject_init_and_add(kobj, &msi_irq_ktype, NULL, ...)
>            kobject_add_internal
>              kobject_get(&kobj->kset->kobj)    # dev->msi_kset
>
>  pci_disable_msi
>    free_msi_irqs
>      list_for_each_entry_safe(entry, ..., &dev->msi_list, ...)
>        kobject_put(&entry->kobj)
>          kobject_release
>            ... <delayed> ...
>              kobject_cleanup
>                kobject_del
>                  kobj_kset_leave
>                    kset_put(kobj->kset)        # dev->msi_kset
>                      kobject_put	# happens AFTER pci_enable_msi() below
>                t->release
>        list_del(&entry->list)
>    kset_unregister(dev->msi_kset)
>      kobject_put
>        kobject_release
>          ... <delayed> ...
>            kobject_cleanup		# happens AFTER pci_enable_msi() below
>    dev->msi_kset = NULL
>
>  pci_enable_msi
>    msi_capability_init
>      populate_msi_sysfs
>        dev->msi_kset = kset_create_and_add("msi_irqs", ...)
>          "sysfs: cannot create duplicate filename .../msi_irqs"
>
>When we kset_create_and_add("msi_irqs") the second time, the delayed
>kobject_cleanups() for the msi_desc entries and the msi_kset have
>not yet occurred, so the msi_desc entries still hold references to
>the msi_kset, etc.

Exactly!

>
>I'm not sure if this is a design problem in the way PCI manages
>msi_kset and msi_desc entries, or if there's something wrong in
>the way KOBJECT_RELEASE is implemented.  I could imagine changing
>it so the bulk of kobject_cleanup(), including the sysfs cleanup,
>is executed immediately when the last reference is dropped, but
>the kobj_type->release() function itself is delayed.

Yep, could be one of the possibilities - and it actually resembles what
kobject_del() does :). But anyway, I agree that we shouldn't leave
kobject_del(), cause it only hides the real problem, and not completely (if
there are some other users who kobject_get(kset/kobject) - we're in
trouble).

>
>Calling kobject_del() explicitly sort of side-steps this problem
>by doing the sysfs cleanup before the last put.  But it is quite
>subtle, and it feels error-prone to rely on that.

So, in my v2 patchset, I've removed the kobject_del(). It's really hard to
trigger the bug without KOBJECT_DEBUG_RELEASE, still.

>
>Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index d5f90d6..14bf578 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -534,8 +534,10 @@  static int populate_msi_sysfs(struct pci_dev *pdev)
 		pci_dev_get(pdev);
 		ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
 				     "%u", entry->irq);
-		if (ret)
+		if (ret) {
+			pci_dev_put(pdev);
 			goto out_unroll;
+		}
 
 		count++;
 	}