Message ID | 1379382464-7920-2-git-send-email-vfalico@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
[+cc Neil (he added this code in da8d1c8ba4), Greg] On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote: > Before trying to kobject_init_and_add(), we add a reference to pdev via > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we > don't return it back - even on out_unroll. > > Fix this by adding pci_dev_put(pdev) before going to unrolling section. > > CC: Bjorn Helgaas <bhelgaas@google.com> > CC: linux-pci@vger.kernel.org > CC: linux-kernel@vger.kernel.org > Signed-off-by: Veaceslav Falico <vfalico@redhat.com> > --- > drivers/pci/msi.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > index d5f90d6..14bf578 100644 > --- a/drivers/pci/msi.c > +++ b/drivers/pci/msi.c > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev) > pci_dev_get(pdev); > ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL, > "%u", entry->irq); > - if (ret) > + if (ret) { > + pci_dev_put(pdev); > goto out_unroll; > + } > > count++; > } I don't understand why this code does the pci_dev_get() in the first place. The pdev->msi_list of msi_desc structs is private to the pci_dev, and even without bumping the refcount, there should be no way for the pci_dev to be freed before the msi_desc. I also don't understand this nearby code (the same pattern appears in free_msi_irqs()): out_unroll: list_for_each_entry(entry, &pdev->msi_list, list) { if (!count) break; kobject_del(&entry->kobj); kobject_put(&entry->kobj); count--; } Why do we call kobject_del() here? The kobject_put() will call kobject_del() anyway, so it looks redundant. Documentation/kobject.txt says kobject_del() must be called explicitly to break a circular reference, but I don't think we have that here. Also, I think it is incorrect that free_msi_irqs() does this: if (entry->kobj.parent) { kobject_del(&entry->kobj); kobject_put(&entry->kobj); } list_del(&entry->list); kfree(entry); I think the "kfree(entry)" should be in msi_kobj_release() instead. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 25, 2013 at 3:08 PM, Bjorn Helgaas <bhelgaas@google.com> wrote: > Also, I think it is incorrect that free_msi_irqs() does this: > > if (entry->kobj.parent) { > kobject_del(&entry->kobj); > kobject_put(&entry->kobj); > } > > list_del(&entry->list); > kfree(entry); > > I think the "kfree(entry)" should be in msi_kobj_release() instead. Oh, I see you fixed this in patch 3/3. I hadn't read that far yet :) Did you find these problems by inspection, or is there some easy way to trigger bad behavior? Just wondering if this is some way I can reproduce the problem. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 25, 2013 at 03:30:14PM -0600, Bjorn Helgaas wrote: >On Wed, Sep 25, 2013 at 3:08 PM, Bjorn Helgaas <bhelgaas@google.com> wrote: >> Also, I think it is incorrect that free_msi_irqs() does this: >> >> if (entry->kobj.parent) { >> kobject_del(&entry->kobj); >> kobject_put(&entry->kobj); >> } >> >> list_del(&entry->list); >> kfree(entry); >> >> I think the "kfree(entry)" should be in msi_kobj_release() instead. > >Oh, I see you fixed this in patch 3/3. I hadn't read that far yet :) > >Did you find these problems by inspection, or is there some easy way >to trigger bad behavior? Just wondering if this is some way I can >reproduce the problem. Hi, I've first found it by building with CONFIG_DEBUG_KOBJECT_RELEASE and CONFIG_DEBUG_OBJECTS - it shows that it's freeing an object in an active state (I'm just running insmod/rmmod tg3 concurently, but I guess it's reproducible with any driver that uses msi/x). Without CONFIG_DEBUG_OBJECTS it's also reproducible, and without CONFIG_DEBUG_KOBJECT_RELEASE it's really hard to reproduce, but still reproducible (I've hit it with tg3 as a slave of bonding and concurently running rmmod bonding/ifup bond0 - though it's really hard). It just panics in kobject_put(), iirc. So, with those CONFIG_DEBUG_* it's easily triggerable, without - quite hard. Hope that helps. p.s. I'll adress your other comments tomorrow already, it's quite late here and I don't want to do something stupid now :). Thanks a lot! > >Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote: > [+cc Neil (he added this code in da8d1c8ba4), Greg] > > On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote: > > Before trying to kobject_init_and_add(), we add a reference to pdev via > > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we > > don't return it back - even on out_unroll. > > > > Fix this by adding pci_dev_put(pdev) before going to unrolling section. > > > > CC: Bjorn Helgaas <bhelgaas@google.com> > > CC: linux-pci@vger.kernel.org > > CC: linux-kernel@vger.kernel.org > > Signed-off-by: Veaceslav Falico <vfalico@redhat.com> > > --- > > drivers/pci/msi.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > > index d5f90d6..14bf578 100644 > > --- a/drivers/pci/msi.c > > +++ b/drivers/pci/msi.c > > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev) > > pci_dev_get(pdev); > > ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL, > > "%u", entry->irq); > > - if (ret) > > + if (ret) { > > + pci_dev_put(pdev); > > goto out_unroll; > > + } > > > > count++; > > } > > I don't understand why this code does the pci_dev_get() in the first > place. The pdev->msi_list of msi_desc structs is private to the > pci_dev, and even without bumping the refcount, there should be no way > for the pci_dev to be freed before the msi_desc. > Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that people didn't try to remove the device prior to freeing all their interrupts (i.e I didn't want a broken driver to go through its remove routine without freeing all its irqs). That might have been the wrong thing to do, but thats what bubbles to the front of my head when looking at this. > I also don't understand this nearby code (the same pattern appears in > free_msi_irqs()): > > out_unroll: > list_for_each_entry(entry, &pdev->msi_list, list) { > if (!count) > break; > kobject_del(&entry->kobj); > kobject_put(&entry->kobj); > count--; > } > > Why do we call kobject_del() here? The kobject_put() will call > kobject_del() anyway, so it looks redundant. > Documentation/kobject.txt says kobject_del() must be called explicitly > to break a circular reference, but I don't think we have that here. > I think thats exactly why I did it, because of the documentation. I agree however, it does look redundant. Harmless, but redundant. > Also, I think it is incorrect that free_msi_irqs() does this: > > if (entry->kobj.parent) { > kobject_del(&entry->kobj); > kobject_put(&entry->kobj); > } > > list_del(&entry->list); > kfree(entry); > > I think the "kfree(entry)" should be in msi_kobj_release() instead. > As far as this change goes, I think it looks ok Acked-by: Neil Horman <nhorman@tuxdriver.com> > Bjorn > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote: > On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote: >> [+cc Neil (he added this code in da8d1c8ba4), Greg] >> >> On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote: >> > Before trying to kobject_init_and_add(), we add a reference to pdev via >> > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we >> > don't return it back - even on out_unroll. >> > >> > Fix this by adding pci_dev_put(pdev) before going to unrolling section. >> > >> > CC: Bjorn Helgaas <bhelgaas@google.com> >> > CC: linux-pci@vger.kernel.org >> > CC: linux-kernel@vger.kernel.org >> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com> >> > --- >> > drivers/pci/msi.c | 4 +++- >> > 1 file changed, 3 insertions(+), 1 deletion(-) >> > >> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c >> > index d5f90d6..14bf578 100644 >> > --- a/drivers/pci/msi.c >> > +++ b/drivers/pci/msi.c >> > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev) >> > pci_dev_get(pdev); >> > ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL, >> > "%u", entry->irq); >> > - if (ret) >> > + if (ret) { >> > + pci_dev_put(pdev); >> > goto out_unroll; >> > + } >> > >> > count++; >> > } >> >> I don't understand why this code does the pci_dev_get() in the first >> place. The pdev->msi_list of msi_desc structs is private to the >> pci_dev, and even without bumping the refcount, there should be no way >> for the pci_dev to be freed before the msi_desc. >> > Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that > people didn't try to remove the device prior to freeing all their interrupts > (i.e I didn't want a broken driver to go through its remove routine without > freeing all its irqs). That might have been the wrong thing to do, but thats > what bubbles to the front of my head when looking at this. That sounds plausible, but I think I'd rather deal with that by having the PCI core remove logic free all the interrupts. I *think* that's already in place, i.e., pci_free_resources() calls msi_remove_pci_irq_vectors(). So I propose that we remove the pci_dev_get()/put() unless we come up with a more compelling reason for it. >> I also don't understand this nearby code (the same pattern appears in >> free_msi_irqs()): >> >> out_unroll: >> list_for_each_entry(entry, &pdev->msi_list, list) { >> if (!count) >> break; >> kobject_del(&entry->kobj); >> kobject_put(&entry->kobj); >> count--; >> } >> >> Why do we call kobject_del() here? The kobject_put() will call >> kobject_del() anyway, so it looks redundant. >> Documentation/kobject.txt says kobject_del() must be called explicitly >> to break a circular reference, but I don't think we have that here. >> > I think thats exactly why I did it, because of the documentation. I agree > however, it does look redundant. Harmless, but redundant. OK, thanks. I think we should remove it on the grounds that it's not needed and removing it will make this code look more similar to other callers of kobject_init_and_add(), which means bugs will have fewer places to hide. Thanks, Neil! Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote: >On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote: >> On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote: >>> [+cc Neil (he added this code in da8d1c8ba4), Greg] >>> >>> On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote: >>> > Before trying to kobject_init_and_add(), we add a reference to pdev via >>> > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we >>> > don't return it back - even on out_unroll. >>> > >>> > Fix this by adding pci_dev_put(pdev) before going to unrolling section. >>> > >>> > CC: Bjorn Helgaas <bhelgaas@google.com> >>> > CC: linux-pci@vger.kernel.org >>> > CC: linux-kernel@vger.kernel.org >>> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com> >>> > --- >>> > drivers/pci/msi.c | 4 +++- >>> > 1 file changed, 3 insertions(+), 1 deletion(-) >>> > >>> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c >>> > index d5f90d6..14bf578 100644 >>> > --- a/drivers/pci/msi.c >>> > +++ b/drivers/pci/msi.c >>> > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev) >>> > pci_dev_get(pdev); >>> > ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL, >>> > "%u", entry->irq); >>> > - if (ret) >>> > + if (ret) { >>> > + pci_dev_put(pdev); >>> > goto out_unroll; >>> > + } >>> > >>> > count++; >>> > } >>> >>> I don't understand why this code does the pci_dev_get() in the first >>> place. The pdev->msi_list of msi_desc structs is private to the >>> pci_dev, and even without bumping the refcount, there should be no way >>> for the pci_dev to be freed before the msi_desc. >>> >> Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that >> people didn't try to remove the device prior to freeing all their interrupts >> (i.e I didn't want a broken driver to go through its remove routine without >> freeing all its irqs). That might have been the wrong thing to do, but thats >> what bubbles to the front of my head when looking at this. > >That sounds plausible, but I think I'd rather deal with that by having >the PCI core remove logic free all the interrupts. I *think* that's >already in place, i.e., pci_free_resources() calls >msi_remove_pci_irq_vectors(). So I propose that we remove the >pci_dev_get()/put() unless we come up with a more compelling reason >for it. > >>> I also don't understand this nearby code (the same pattern appears in >>> free_msi_irqs()): >>> >>> out_unroll: >>> list_for_each_entry(entry, &pdev->msi_list, list) { >>> if (!count) >>> break; >>> kobject_del(&entry->kobj); >>> kobject_put(&entry->kobj); >>> count--; >>> } >>> >>> Why do we call kobject_del() here? The kobject_put() will call >>> kobject_del() anyway, so it looks redundant. >>> Documentation/kobject.txt says kobject_del() must be called explicitly >>> to break a circular reference, but I don't think we have that here. >>> >> I think thats exactly why I did it, because of the documentation. I agree >> however, it does look redundant. Harmless, but redundant. > >OK, thanks. I think we should remove it on the grounds that it's not >needed and removing it will make this code look more similar to other >callers of kobject_init_and_add(), which means bugs will have fewer >places to hide. Hi, Sounds great, I'll add this as new patches and send this in a separate patchset with the unregister msi_kset in free_msi_irqs patch, so that it won't depend on the fix (both removing kobject_del() and pci_dev_get/put()). Thank you! > >Thanks, Neil! > >Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote: >On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote: >> On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote: >>> [+cc Neil (he added this code in da8d1c8ba4), Greg] >>> >>> On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote: >>> > Before trying to kobject_init_and_add(), we add a reference to pdev via >>> > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we >>> > don't return it back - even on out_unroll. >>> > >>> > Fix this by adding pci_dev_put(pdev) before going to unrolling section. >>> > >>> > CC: Bjorn Helgaas <bhelgaas@google.com> >>> > CC: linux-pci@vger.kernel.org >>> > CC: linux-kernel@vger.kernel.org >>> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com> >>> > --- >>> > drivers/pci/msi.c | 4 +++- >>> > 1 file changed, 3 insertions(+), 1 deletion(-) >>> > >>> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c >>> > index d5f90d6..14bf578 100644 >>> > --- a/drivers/pci/msi.c >>> > +++ b/drivers/pci/msi.c >>> > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev) >>> > pci_dev_get(pdev); >>> > ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL, >>> > "%u", entry->irq); >>> > - if (ret) >>> > + if (ret) { >>> > + pci_dev_put(pdev); >>> > goto out_unroll; >>> > + } >>> > >>> > count++; >>> > } >>> >>> I don't understand why this code does the pci_dev_get() in the first >>> place. The pdev->msi_list of msi_desc structs is private to the >>> pci_dev, and even without bumping the refcount, there should be no way >>> for the pci_dev to be freed before the msi_desc. >>> >> Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that >> people didn't try to remove the device prior to freeing all their interrupts >> (i.e I didn't want a broken driver to go through its remove routine without >> freeing all its irqs). That might have been the wrong thing to do, but thats >> what bubbles to the front of my head when looking at this. > >That sounds plausible, but I think I'd rather deal with that by having >the PCI core remove logic free all the interrupts. I *think* that's >already in place, i.e., pci_free_resources() calls >msi_remove_pci_irq_vectors(). So I propose that we remove the >pci_dev_get()/put() unless we come up with a more compelling reason >for it. As an update - I've found an interesting case why exactly that kobject_del() might be needed: in kobject_del() it removes instantly the link to kset - via kobj_kset_leave(), so that our kset remains without links and, thus, might be instantly removed. So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly without any links (i.e. other kobjects) and, when we call kset_unregister() - it exits instantly (if it's not being hold somewhere elsewhere...). Without it, kset_unregister() will wait till all the kobjects will be gone. Now, the fun part starts - if we quickly call pci_disable_msi() and, afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is still there, waiting to unregister, and the sysfs dir is still active. It's used, for example, in tg3_open/tg3_close, which are ndo_open/close, and are called on enslave/deslave in bonding. What I get: [ 60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0() [ 60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs' I'll take a deeper look at the issue, though any feedback/advise is welcome. And I'll hold on with the patchset that removes pci_dev_get/put and kobject_del. > >>> I also don't understand this nearby code (the same pattern appears in >>> free_msi_irqs()): >>> >>> out_unroll: >>> list_for_each_entry(entry, &pdev->msi_list, list) { >>> if (!count) >>> break; >>> kobject_del(&entry->kobj); >>> kobject_put(&entry->kobj); >>> count--; >>> } >>> >>> Why do we call kobject_del() here? The kobject_put() will call >>> kobject_del() anyway, so it looks redundant. >>> Documentation/kobject.txt says kobject_del() must be called explicitly >>> to break a circular reference, but I don't think we have that here. >>> >> I think thats exactly why I did it, because of the documentation. I agree >> however, it does look redundant. Harmless, but redundant. > >OK, thanks. I think we should remove it on the grounds that it's not >needed and removing it will make this code look more similar to other >callers of kobject_init_and_add(), which means bugs will have fewer >places to hide. > >Thanks, Neil! > >Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote: >On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote: >>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote: >>>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote: >>>>[+cc Neil (he added this code in da8d1c8ba4), Greg] >>>> >>>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote: >>>>> Before trying to kobject_init_and_add(), we add a reference to pdev via >>>>> pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we >>>>> don't return it back - even on out_unroll. >>>>> >>>>> Fix this by adding pci_dev_put(pdev) before going to unrolling section. >>>>> >>>>> CC: Bjorn Helgaas <bhelgaas@google.com> >>>>> CC: linux-pci@vger.kernel.org >>>>> CC: linux-kernel@vger.kernel.org >>>>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> >>>>> --- >>>>> drivers/pci/msi.c | 4 +++- >>>>> 1 file changed, 3 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c >>>>> index d5f90d6..14bf578 100644 >>>>> --- a/drivers/pci/msi.c >>>>> +++ b/drivers/pci/msi.c >>>>> @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev) >>>>> pci_dev_get(pdev); >>>>> ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL, >>>>> "%u", entry->irq); >>>>> - if (ret) >>>>> + if (ret) { >>>>> + pci_dev_put(pdev); >>>>> goto out_unroll; >>>>> + } >>>>> >>>>> count++; >>>>> } >>>> >>>>I don't understand why this code does the pci_dev_get() in the first >>>>place. The pdev->msi_list of msi_desc structs is private to the >>>>pci_dev, and even without bumping the refcount, there should be no way >>>>for the pci_dev to be freed before the msi_desc. >>>> >>>Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that >>>people didn't try to remove the device prior to freeing all their interrupts >>>(i.e I didn't want a broken driver to go through its remove routine without >>>freeing all its irqs). That might have been the wrong thing to do, but thats >>>what bubbles to the front of my head when looking at this. >> >>That sounds plausible, but I think I'd rather deal with that by having >>the PCI core remove logic free all the interrupts. I *think* that's >>already in place, i.e., pci_free_resources() calls >>msi_remove_pci_irq_vectors(). So I propose that we remove the >>pci_dev_get()/put() unless we come up with a more compelling reason >>for it. > >As an update - I've found an interesting case why exactly that >kobject_del() might be needed: > >in kobject_del() it removes instantly the link to kset - via >kobj_kset_leave(), so that our kset remains without links and, thus, might >be instantly removed. > >So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly >without any links (i.e. other kobjects) and, when we call kset_unregister() >- it exits instantly (if it's not being hold somewhere elsewhere...). > >Without it, kset_unregister() will wait till all the kobjects will be gone. > >Now, the fun part starts - if we quickly call pci_disable_msi() and, >afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is >still there, waiting to unregister, and the sysfs dir is still active. > >It's used, for example, in tg3_open/tg3_close, which are ndo_open/close, >and are called on enslave/deslave in bonding. > >What I get: >[ 60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0() >[ 60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs' > >I'll take a deeper look at the issue, though any feedback/advise is >welcome. And I'll hold on with the patchset that removes pci_dev_get/put >and kobject_del. Ok, this is only reproducible with CONFIG_DEBUG_KOBJECT_RELEASE, with or without removing kobject_del() (though it's harder to reproduce). I could not trigger it without CONFIG_DEBUG_KOBJECT_RELEASE, even with constantly poking /sys/class/net/tg3_device/device/msi_irqs/* . Though it's, obviously, possible, with and without cleanup and my previous bugfix. I'll then send now the cleanup, however this theoretical issue was, is and won't be fixed by it :-/. And I don't know how can we possible fix it without something like kobject_put_wait(). > > >> >>>>I also don't understand this nearby code (the same pattern appears in >>>>free_msi_irqs()): >>>> >>>> out_unroll: >>>> list_for_each_entry(entry, &pdev->msi_list, list) { >>>> if (!count) >>>> break; >>>> kobject_del(&entry->kobj); >>>> kobject_put(&entry->kobj); >>>> count--; >>>> } >>>> >>>>Why do we call kobject_del() here? The kobject_put() will call >>>>kobject_del() anyway, so it looks redundant. >>>>Documentation/kobject.txt says kobject_del() must be called explicitly >>>>to break a circular reference, but I don't think we have that here. >>>> >>> I think thats exactly why I did it, because of the documentation. I agree >>>however, it does look redundant. Harmless, but redundant. >> >>OK, thanks. I think we should remove it on the grounds that it's not >>needed and removing it will make this code look more similar to other >>callers of kobject_init_and_add(), which means bugs will have fewer >>places to hide. >> >>Thanks, Neil! >> >>Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote: > On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote: > >On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote: > >>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote: > >>>[+cc Neil (he added this code in da8d1c8ba4), Greg] > >>> > >>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote: > >>>> Before trying to kobject_init_and_add(), we add a reference to pdev via > >>>> pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we > >>>> don't return it back - even on out_unroll. > >>>> > >>>> Fix this by adding pci_dev_put(pdev) before going to unrolling section. > >>>> > >>>> CC: Bjorn Helgaas <bhelgaas@google.com> > >>>> CC: linux-pci@vger.kernel.org > >>>> CC: linux-kernel@vger.kernel.org > >>>> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> > >>>> --- > >>>> drivers/pci/msi.c | 4 +++- > >>>> 1 file changed, 3 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > >>>> index d5f90d6..14bf578 100644 > >>>> --- a/drivers/pci/msi.c > >>>> +++ b/drivers/pci/msi.c > >>>> @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev) > >>>> pci_dev_get(pdev); > >>>> ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL, > >>>> "%u", entry->irq); > >>>> - if (ret) > >>>> + if (ret) { > >>>> + pci_dev_put(pdev); > >>>> goto out_unroll; > >>>> + } > >>>> > >>>> count++; > >>>> } > >>> > >>>I don't understand why this code does the pci_dev_get() in the first > >>>place. The pdev->msi_list of msi_desc structs is private to the > >>>pci_dev, and even without bumping the refcount, there should be no way > >>>for the pci_dev to be freed before the msi_desc. > >>> > >>Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that > >>people didn't try to remove the device prior to freeing all their interrupts > >>(i.e I didn't want a broken driver to go through its remove routine without > >>freeing all its irqs). That might have been the wrong thing to do, but thats > >>what bubbles to the front of my head when looking at this. > > > >That sounds plausible, but I think I'd rather deal with that by having > >the PCI core remove logic free all the interrupts. I *think* that's > >already in place, i.e., pci_free_resources() calls > >msi_remove_pci_irq_vectors(). So I propose that we remove the > >pci_dev_get()/put() unless we come up with a more compelling reason > >for it. > > As an update - I've found an interesting case why exactly that > kobject_del() might be needed: > > in kobject_del() it removes instantly the link to kset - via > kobj_kset_leave(), so that our kset remains without links and, thus, might > be instantly removed. > > So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly > without any links (i.e. other kobjects) and, when we call kset_unregister() > - it exits instantly (if it's not being hold somewhere elsewhere...). > > Without it, kset_unregister() will wait till all the kobjects will be gone. > > Now, the fun part starts - if we quickly call pci_disable_msi() and, > afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is > still there, waiting to unregister, and the sysfs dir is still active. > > It's used, for example, in tg3_open/tg3_close, which are ndo_open/close, > and are called on enslave/deslave in bonding. > > What I get: > [ 60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0() > [ 60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs' > > I'll take a deeper look at the issue, though any feedback/advise is > welcome. And I'll hold on with the patchset that removes pci_dev_get/put > and kobject_del. > > The origional post may offer some guidance here: https://lkml.org/lkml/2011/9/29/220 In particular the v3 update I think is relevant. Neil > > > >>>I also don't understand this nearby code (the same pattern appears in > >>>free_msi_irqs()): > >>> > >>> out_unroll: > >>> list_for_each_entry(entry, &pdev->msi_list, list) { > >>> if (!count) > >>> break; > >>> kobject_del(&entry->kobj); > >>> kobject_put(&entry->kobj); > >>> count--; > >>> } > >>> > >>>Why do we call kobject_del() here? The kobject_put() will call > >>>kobject_del() anyway, so it looks redundant. > >>>Documentation/kobject.txt says kobject_del() must be called explicitly > >>>to break a circular reference, but I don't think we have that here. > >>> > >> I think thats exactly why I did it, because of the documentation. I agree > >>however, it does look redundant. Harmless, but redundant. > > > >OK, thanks. I think we should remove it on the grounds that it's not > >needed and removing it will make this code look more similar to other > >callers of kobject_init_and_add(), which means bugs will have fewer > >places to hide. > > > >Thanks, Neil! > > > >Bjorn > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[+cc Russell] On Thu, Sep 26, 2013 at 04:07:51PM +0200, Veaceslav Falico wrote: > On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote: > >On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote: > >>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote: > >>>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote: > >>>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote: ... > >As an update - I've found an interesting case why exactly that > >kobject_del() might be needed: > > > >in kobject_del() it removes instantly the link to kset - via > >kobj_kset_leave(), so that our kset remains without links and, thus, might > >be instantly removed. > > > >So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly > >without any links (i.e. other kobjects) and, when we call kset_unregister() > >- it exits instantly (if it's not being hold somewhere elsewhere...). > > > >Without it, kset_unregister() will wait till all the kobjects will be gone. I don't see any waiting in kset_unregister(); all it does is a kobject_put(). > >Now, the fun part starts - if we quickly call pci_disable_msi() and, > >afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is > >still there, waiting to unregister, and the sysfs dir is still active. > > > >It's used, for example, in tg3_open/tg3_close, which are ndo_open/close, > >and are called on enslave/deslave in bonding. > > > >What I get: > >[ 60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0() > >[ 60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs' > > > >I'll take a deeper look at the issue, though any feedback/advise is > >welcome. And I'll hold on with the patchset that removes pci_dev_get/put > >and kobject_del. > > Ok, this is only reproducible with CONFIG_DEBUG_KOBJECT_RELEASE, with or > without removing kobject_del() (though it's harder to reproduce). I could > not trigger it without CONFIG_DEBUG_KOBJECT_RELEASE, even with constantly > poking /sys/class/net/tg3_device/device/msi_irqs/* . Though it's, > obviously, possible, with and without cleanup and my previous bugfix. > > I'll then send now the cleanup, however this theoretical issue was, is and > won't be fixed by it :-/. And I don't know how can we possible fix it > without something like kobject_put_wait(). I still think we should remove the kobject_del(). We don't want to make a race harder to hit; we want to remove it completely. What we really want is to make races *easier* to hit so we can find them, which seems to be the point of KOBJECT_RELEASE :) That said, I think I see why you see the warning in this case. You're calling pci_enable_msi(), pci_disable_msi(), pci_enable_msi() as in this call chain: pci_enable_msi msi_capability_init populate_msi_sysfs dev->msi_kset = kset_create_and_add("msi_irqs", ...) list_for_each_entry(entry, &dev->msi_list, ...) kobj->kset = dev->msi_kset kobject_init_and_add(kobj, &msi_irq_ktype, NULL, ...) kobject_add_internal kobject_get(&kobj->kset->kobj) # dev->msi_kset pci_disable_msi free_msi_irqs list_for_each_entry_safe(entry, ..., &dev->msi_list, ...) kobject_put(&entry->kobj) kobject_release ... <delayed> ... kobject_cleanup kobject_del kobj_kset_leave kset_put(kobj->kset) # dev->msi_kset kobject_put # happens AFTER pci_enable_msi() below t->release list_del(&entry->list) kset_unregister(dev->msi_kset) kobject_put kobject_release ... <delayed> ... kobject_cleanup # happens AFTER pci_enable_msi() below dev->msi_kset = NULL pci_enable_msi msi_capability_init populate_msi_sysfs dev->msi_kset = kset_create_and_add("msi_irqs", ...) "sysfs: cannot create duplicate filename .../msi_irqs" When we kset_create_and_add("msi_irqs") the second time, the delayed kobject_cleanups() for the msi_desc entries and the msi_kset have not yet occurred, so the msi_desc entries still hold references to the msi_kset, etc. I'm not sure if this is a design problem in the way PCI manages msi_kset and msi_desc entries, or if there's something wrong in the way KOBJECT_RELEASE is implemented. I could imagine changing it so the bulk of kobject_cleanup(), including the sysfs cleanup, is executed immediately when the last reference is dropped, but the kobj_type->release() function itself is delayed. Calling kobject_del() explicitly sort of side-steps this problem by doing the sysfs cleanup before the last put. But it is quite subtle, and it feels error-prone to rely on that. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 26, 2013 at 04:16:13PM -0600, Bjorn Helgaas wrote: >[+cc Russell] > >On Thu, Sep 26, 2013 at 04:07:51PM +0200, Veaceslav Falico wrote: >> On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote: >> >On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote: >> >>On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman <nhorman@tuxdriver.com> wrote: >> >>>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote: >> >>>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico <vfalico@redhat.com> wrote: >... >> >As an update - I've found an interesting case why exactly that >> >kobject_del() might be needed: >> > >> >in kobject_del() it removes instantly the link to kset - via >> >kobj_kset_leave(), so that our kset remains without links and, thus, might >> >be instantly removed. >> > >> >So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly >> >without any links (i.e. other kobjects) and, when we call kset_unregister() >> >- it exits instantly (if it's not being hold somewhere elsewhere...). >> > >> >Without it, kset_unregister() will wait till all the kobjects will be gone. > >I don't see any waiting in kset_unregister(); all it does is a >kobject_put(). Sorry, didn't word it correctly - my thought was that kset_unregister() (which is, basicly, kobject_put()) will NOT unregister it instantly without kobject_del() in place, because the 'child' kobjects are still tied to this kset. > >> >Now, the fun part starts - if we quickly call pci_disable_msi() and, >> >afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is >> >still there, waiting to unregister, and the sysfs dir is still active. >> > >> >It's used, for example, in tg3_open/tg3_close, which are ndo_open/close, >> >and are called on enslave/deslave in bonding. >> > >> >What I get: >> >[ 60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 sysfs_add_one+0xbb/0xe0() >> >[ 60.458350] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1c.5/0000:3f:00.0/msi_irqs' >> > >> >I'll take a deeper look at the issue, though any feedback/advise is >> >welcome. And I'll hold on with the patchset that removes pci_dev_get/put >> >and kobject_del. >> >> Ok, this is only reproducible with CONFIG_DEBUG_KOBJECT_RELEASE, with or >> without removing kobject_del() (though it's harder to reproduce). I could >> not trigger it without CONFIG_DEBUG_KOBJECT_RELEASE, even with constantly >> poking /sys/class/net/tg3_device/device/msi_irqs/* . Though it's, >> obviously, possible, with and without cleanup and my previous bugfix. >> >> I'll then send now the cleanup, however this theoretical issue was, is and >> won't be fixed by it :-/. And I don't know how can we possible fix it >> without something like kobject_put_wait(). > >I still think we should remove the kobject_del(). We don't want to >make a race harder to hit; we want to remove it completely. What we >really want is to make races *easier* to hit so we can find them, >which seems to be the point of KOBJECT_RELEASE :) Agree, that's what I've done in the v2 patchset :). > >That said, I think I see why you see the warning in this case. >You're calling pci_enable_msi(), pci_disable_msi(), pci_enable_msi() >as in this call chain: > > pci_enable_msi > msi_capability_init > populate_msi_sysfs > dev->msi_kset = kset_create_and_add("msi_irqs", ...) > list_for_each_entry(entry, &dev->msi_list, ...) > kobj->kset = dev->msi_kset > kobject_init_and_add(kobj, &msi_irq_ktype, NULL, ...) > kobject_add_internal > kobject_get(&kobj->kset->kobj) # dev->msi_kset > > pci_disable_msi > free_msi_irqs > list_for_each_entry_safe(entry, ..., &dev->msi_list, ...) > kobject_put(&entry->kobj) > kobject_release > ... <delayed> ... > kobject_cleanup > kobject_del > kobj_kset_leave > kset_put(kobj->kset) # dev->msi_kset > kobject_put # happens AFTER pci_enable_msi() below > t->release > list_del(&entry->list) > kset_unregister(dev->msi_kset) > kobject_put > kobject_release > ... <delayed> ... > kobject_cleanup # happens AFTER pci_enable_msi() below > dev->msi_kset = NULL > > pci_enable_msi > msi_capability_init > populate_msi_sysfs > dev->msi_kset = kset_create_and_add("msi_irqs", ...) > "sysfs: cannot create duplicate filename .../msi_irqs" > >When we kset_create_and_add("msi_irqs") the second time, the delayed >kobject_cleanups() for the msi_desc entries and the msi_kset have >not yet occurred, so the msi_desc entries still hold references to >the msi_kset, etc. Exactly! > >I'm not sure if this is a design problem in the way PCI manages >msi_kset and msi_desc entries, or if there's something wrong in >the way KOBJECT_RELEASE is implemented. I could imagine changing >it so the bulk of kobject_cleanup(), including the sysfs cleanup, >is executed immediately when the last reference is dropped, but >the kobj_type->release() function itself is delayed. Yep, could be one of the possibilities - and it actually resembles what kobject_del() does :). But anyway, I agree that we shouldn't leave kobject_del(), cause it only hides the real problem, and not completely (if there are some other users who kobject_get(kset/kobject) - we're in trouble). > >Calling kobject_del() explicitly sort of side-steps this problem >by doing the sysfs cleanup before the last put. But it is quite >subtle, and it feels error-prone to rely on that. So, in my v2 patchset, I've removed the kobject_del(). It's really hard to trigger the bug without KOBJECT_DEBUG_RELEASE, still. > >Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index d5f90d6..14bf578 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev) pci_dev_get(pdev); ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL, "%u", entry->irq); - if (ret) + if (ret) { + pci_dev_put(pdev); goto out_unroll; + } count++; }
Before trying to kobject_init_and_add(), we add a reference to pdev via pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we don't return it back - even on out_unroll. Fix this by adding pci_dev_put(pdev) before going to unrolling section. CC: Bjorn Helgaas <bhelgaas@google.com> CC: linux-pci@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Veaceslav Falico <vfalico@redhat.com> --- drivers/pci/msi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)