Patchwork [v3,1/3] msi: free msi_desc entry only after we've released the kobject

login
register
mail settings
Submitter Veaceslav Falico
Date Oct. 29, 2013, 10:30 a.m.
Message ID <1383042632-7102-2-git-send-email-vfalico@redhat.com>
Download mbox | patch
Permalink /patch/286782/
State Not Applicable
Headers show

Comments

Veaceslav Falico - Oct. 29, 2013, 10:30 a.m.
Currently, we first do kobject_put(&entry->kobj) and the kfree(entry),
however kobject_put() doesn't guarantee us that it was the last reference
and that the kobj isn't used currently by someone else, so after we
kfree(entry) with the struct kobject - other users will begin using the
freed memory, instead of the actual kobject.

Fix this by using the kobject->release callback, which is called last when
the kobject is indeed not used and is cleaned up - it's msi_kobj_release(),
which can do the kfree(entry) safely (kobject_put/cleanup doesn't use the
kobj itself after ->release() was called, so we're safe).

In case we've failed to create the sysfs directories - just kfree()
it - cause we don't have the kobjects attached.

Also, remove the same functionality from populate_msi_sysfs(), cause on
failure we anyway call free_msi_irqs(), which will take care of all the
kobjects properly.

And add the forgotten pci_dev_put(pdev) in case of failure to register the
kobject in populate_msi_sysfs().

CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Neil Horman <nhorman@tuxdriver.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: linux-pci@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/pci/msi.c | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)
Neil Horman - Oct. 29, 2013, 11:32 a.m.
On Tue, Oct 29, 2013 at 11:30:30AM +0100, Veaceslav Falico wrote:
> Currently, we first do kobject_put(&entry->kobj) and the kfree(entry),
> however kobject_put() doesn't guarantee us that it was the last reference
> and that the kobj isn't used currently by someone else, so after we
> kfree(entry) with the struct kobject - other users will begin using the
> freed memory, instead of the actual kobject.
> 
> Fix this by using the kobject->release callback, which is called last when
> the kobject is indeed not used and is cleaned up - it's msi_kobj_release(),
> which can do the kfree(entry) safely (kobject_put/cleanup doesn't use the
> kobj itself after ->release() was called, so we're safe).
> 
> In case we've failed to create the sysfs directories - just kfree()
> it - cause we don't have the kobjects attached.
> 
> Also, remove the same functionality from populate_msi_sysfs(), cause on
> failure we anyway call free_msi_irqs(), which will take care of all the
> kobjects properly.
> 
> And add the forgotten pci_dev_put(pdev) in case of failure to register the
> kobject in populate_msi_sysfs().
> 
> CC: Bjorn Helgaas <bhelgaas@google.com>
> CC: Neil Horman <nhorman@tuxdriver.com>
> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> CC: linux-pci@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds - Oct. 29, 2013, 4:34 p.m.
On Tue, Oct 29, 2013 at 3:30 AM, Veaceslav Falico <vfalico@redhat.com> wrote:
>                 /*
>                  * Its possible that we get into this path
>                  * When populate_msi_sysfs fails, which means the entries
>                  * were not registered with sysfs.  In that case don't
> -                * unregister them.
> +                * unregister them, and just free. Otherwise the
> +                * kobject->release will take care of freeing the entry via
> +                * msi_kobj_release().
>                  */
>                 if (entry->kobj.parent) {
>                         kobject_del(&entry->kobj);
>                         kobject_put(&entry->kobj);
> +               } else {
> +                       kfree(entry);
>                 }
> -
> -               list_del(&entry->list);
> -               kfree(entry);

So this code sequence still makes me very unhappy.

Why does not just a simple unconditional

        kobject_del(&entry->kobj);
        kobject_put(&entry->kobj);

work for the "not registered with sysfs" case? And if the sysfs code
really gets confused, why not

        if (entry->kobj.parent)
                kobject_del(&entry->kobj);
        kobject_put(&entry->kobj);

(btw, looking at the sysfs code, this looks *very* suspicious in
sysfs_remove_dir():

        struct sysfs_dirent *sd = kobj->sd;

        spin_lock(&sysfs_assoc_lock);
        kobj->sd = NULL;
        spin_unlock(&sysfs_assoc_lock);

and I would suggest that "sd = kobj->sd" should be done under the
lock, because otherwise the lock is kind of pointless..)

Greg?

                Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Veaceslav Falico - Oct. 29, 2013, 4:47 p.m.
On Tue, Oct 29, 2013 at 09:34:28AM -0700, Linus Torvalds wrote:
>On Tue, Oct 29, 2013 at 3:30 AM, Veaceslav Falico <vfalico@redhat.com> wrote:
>>                 /*
>>                  * Its possible that we get into this path
>>                  * When populate_msi_sysfs fails, which means the entries
>>                  * were not registered with sysfs.  In that case don't
>> -                * unregister them.
>> +                * unregister them, and just free. Otherwise the
>> +                * kobject->release will take care of freeing the entry via
>> +                * msi_kobj_release().
>>                  */
>>                 if (entry->kobj.parent) {
>>                         kobject_del(&entry->kobj);
>>                         kobject_put(&entry->kobj);
>> +               } else {
>> +                       kfree(entry);
>>                 }
>> -
>> -               list_del(&entry->list);
>> -               kfree(entry);
>
>So this code sequence still makes me very unhappy.
>
>Why does not just a simple unconditional
>
>        kobject_del(&entry->kobj);
>        kobject_put(&entry->kobj);
>
>work for the "not registered with sysfs" case? And if the sysfs code
>really gets confused, why not
>
>        if (entry->kobj.parent)
>                kobject_del(&entry->kobj);
>        kobject_put(&entry->kobj);

It was fixed this way in 424eb39 ("PCI: msi: fix imbalanced refcount of msi
irq sysfs objects"). kobject_put() still failed in case it wasn't
registered with sysfs earlier. populate_msi_sysfs() creates an kobject for
each entry in msi_list, and we have no idea (on fallback) up to which entry
was it already registered, on which it failed and which entries are still
not kobject_init_and_add()ed.

>
>(btw, looking at the sysfs code, this looks *very* suspicious in
>sysfs_remove_dir():
>
>        struct sysfs_dirent *sd = kobj->sd;
>
>        spin_lock(&sysfs_assoc_lock);
>        kobj->sd = NULL;
>        spin_unlock(&sysfs_assoc_lock);
>
>and I would suggest that "sd = kobj->sd" should be done under the
>lock, because otherwise the lock is kind of pointless..)
>
>Greg?
>
>                Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH - Oct. 29, 2013, 9:49 p.m.
On Tue, Oct 29, 2013 at 09:34:28AM -0700, Linus Torvalds wrote:
> On Tue, Oct 29, 2013 at 3:30 AM, Veaceslav Falico <vfalico@redhat.com> wrote:
> >                 /*
> >                  * Its possible that we get into this path
> >                  * When populate_msi_sysfs fails, which means the entries
> >                  * were not registered with sysfs.  In that case don't
> > -                * unregister them.
> > +                * unregister them, and just free. Otherwise the
> > +                * kobject->release will take care of freeing the entry via
> > +                * msi_kobj_release().
> >                  */
> >                 if (entry->kobj.parent) {
> >                         kobject_del(&entry->kobj);
> >                         kobject_put(&entry->kobj);
> > +               } else {
> > +                       kfree(entry);
> >                 }
> > -
> > -               list_del(&entry->list);
> > -               kfree(entry);
> 
> So this code sequence still makes me very unhappy.
> 
> Why does not just a simple unconditional
> 
>         kobject_del(&entry->kobj);
>         kobject_put(&entry->kobj);
> 
> work for the "not registered with sysfs" case? And if the sysfs code
> really gets confused, why not
> 
>         if (entry->kobj.parent)
>                 kobject_del(&entry->kobj);
>         kobject_put(&entry->kobj);
> 
> (btw, looking at the sysfs code, this looks *very* suspicious in
> sysfs_remove_dir():
> 
>         struct sysfs_dirent *sd = kobj->sd;
> 
>         spin_lock(&sysfs_assoc_lock);
>         kobj->sd = NULL;
>         spin_unlock(&sysfs_assoc_lock);
> 
> and I would suggest that "sd = kobj->sd" should be done under the
> lock, because otherwise the lock is kind of pointless..)
> 
> Greg?

That is really odd, but I guess it works as-is because no one ever calls
that function on the same kobject at the same time.  I don't know what
that is trying to do.  There has been some work by Tejun in this area
for linux-next, but that lock and logic is still there, I'll look into
fixing that up...

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - Nov. 25, 2013, 11:12 p.m.
On Tue, Oct 29, 2013 at 11:30:30AM +0100, Veaceslav Falico wrote:
> Currently, we first do kobject_put(&entry->kobj) and the kfree(entry),
> however kobject_put() doesn't guarantee us that it was the last reference
> and that the kobj isn't used currently by someone else, so after we
> kfree(entry) with the struct kobject - other users will begin using the
> freed memory, instead of the actual kobject.
> 
> Fix this by using the kobject->release callback, which is called last when
> the kobject is indeed not used and is cleaned up - it's msi_kobj_release(),
> which can do the kfree(entry) safely (kobject_put/cleanup doesn't use the
> kobj itself after ->release() was called, so we're safe).
> 
> In case we've failed to create the sysfs directories - just kfree()
> it - cause we don't have the kobjects attached.
> 
> Also, remove the same functionality from populate_msi_sysfs(), cause on
> failure we anyway call free_msi_irqs(), which will take care of all the
> kobjects properly.
> 
> And add the forgotten pci_dev_put(pdev) in case of failure to register the
> kobject in populate_msi_sysfs().
> 
> CC: Bjorn Helgaas <bhelgaas@google.com>
> CC: Neil Horman <nhorman@tuxdriver.com>
> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> CC: linux-pci@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>

I'm still hoping that Greg (or somebody else; Greg already posted the bulk
of the work) will finish up the conversion to attributes, and that Neil
will confirm that works with no changes to irqbalance.  So I'm going to
drop these patches for now.  Poke me if we need to revive them.

Bjorn

> ---
>  drivers/pci/msi.c | 31 ++++++++++++-------------------
>  1 file changed, 12 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index d5f90d6..5d70f49 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -374,19 +374,22 @@ static void free_msi_irqs(struct pci_dev *dev)
>  				iounmap(entry->mask_base);
>  		}
>  
> +		list_del(&entry->list);
> +
>  		/*
>  		 * Its possible that we get into this path
>  		 * When populate_msi_sysfs fails, which means the entries
>  		 * were not registered with sysfs.  In that case don't
> -		 * unregister them.
> +		 * unregister them, and just free. Otherwise the
> +		 * kobject->release will take care of freeing the entry via
> +		 * msi_kobj_release().
>  		 */
>  		if (entry->kobj.parent) {
>  			kobject_del(&entry->kobj);
>  			kobject_put(&entry->kobj);
> +		} else {
> +			kfree(entry);
>  		}
> -
> -		list_del(&entry->list);
> -		kfree(entry);
>  	}
>  }
>  
> @@ -509,6 +512,7 @@ static void msi_kobj_release(struct kobject *kobj)
>  	struct msi_desc *entry = to_msi_desc(kobj);
>  
>  	pci_dev_put(entry->dev);
> +	kfree(entry);
>  }
>  
>  static struct kobj_type msi_irq_ktype = {
> @@ -522,7 +526,6 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>  	struct msi_desc *entry;
>  	struct kobject *kobj;
>  	int ret;
> -	int count = 0;
>  
>  	pdev->msi_kset = kset_create_and_add("msi_irqs", NULL, &pdev->dev.kobj);
>  	if (!pdev->msi_kset)
> @@ -534,23 +537,13 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>  		pci_dev_get(pdev);
>  		ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
>  				     "%u", entry->irq);
> -		if (ret)
> -			goto out_unroll;
> -
> -		count++;
> +		if (ret) {
> +			pci_dev_put(pdev);
> +			return ret;
> +		}
>  	}
>  
>  	return 0;
> -
> -out_unroll:
> -	list_for_each_entry(entry, &pdev->msi_list, list) {
> -		if (!count)
> -			break;
> -		kobject_del(&entry->kobj);
> -		kobject_put(&entry->kobj);
> -		count--;
> -	}
> -	return ret;
>  }
>  
>  /**
> -- 
> 1.8.4
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH - Nov. 25, 2013, 11:29 p.m.
On Mon, Nov 25, 2013 at 04:12:48PM -0700, Bjorn Helgaas wrote:
> On Tue, Oct 29, 2013 at 11:30:30AM +0100, Veaceslav Falico wrote:
> > Currently, we first do kobject_put(&entry->kobj) and the kfree(entry),
> > however kobject_put() doesn't guarantee us that it was the last reference
> > and that the kobj isn't used currently by someone else, so after we
> > kfree(entry) with the struct kobject - other users will begin using the
> > freed memory, instead of the actual kobject.
> > 
> > Fix this by using the kobject->release callback, which is called last when
> > the kobject is indeed not used and is cleaned up - it's msi_kobj_release(),
> > which can do the kfree(entry) safely (kobject_put/cleanup doesn't use the
> > kobj itself after ->release() was called, so we're safe).
> > 
> > In case we've failed to create the sysfs directories - just kfree()
> > it - cause we don't have the kobjects attached.
> > 
> > Also, remove the same functionality from populate_msi_sysfs(), cause on
> > failure we anyway call free_msi_irqs(), which will take care of all the
> > kobjects properly.
> > 
> > And add the forgotten pci_dev_put(pdev) in case of failure to register the
> > kobject in populate_msi_sysfs().
> > 
> > CC: Bjorn Helgaas <bhelgaas@google.com>
> > CC: Neil Horman <nhorman@tuxdriver.com>
> > CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > CC: linux-pci@vger.kernel.org
> > CC: linux-kernel@vger.kernel.org
> > Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> 
> I'm still hoping that Greg (or somebody else; Greg already posted the bulk
> of the work) will finish up the conversion to attributes, and that Neil
> will confirm that works with no changes to irqbalance.  So I'm going to
> drop these patches for now.  Poke me if we need to revive them.

Ah, sorry about that.  I'll redo my series, dropping the existing format
and using the attribute-only code.  Give me a day or so, thanks for the
reminder.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman - Nov. 26, 2013, 12:47 p.m.
On Mon, Nov 25, 2013 at 03:29:36PM -0800, Greg Kroah-Hartman wrote:
> On Mon, Nov 25, 2013 at 04:12:48PM -0700, Bjorn Helgaas wrote:
> > On Tue, Oct 29, 2013 at 11:30:30AM +0100, Veaceslav Falico wrote:
> > > Currently, we first do kobject_put(&entry->kobj) and the kfree(entry),
> > > however kobject_put() doesn't guarantee us that it was the last reference
> > > and that the kobj isn't used currently by someone else, so after we
> > > kfree(entry) with the struct kobject - other users will begin using the
> > > freed memory, instead of the actual kobject.
> > > 
> > > Fix this by using the kobject->release callback, which is called last when
> > > the kobject is indeed not used and is cleaned up - it's msi_kobj_release(),
> > > which can do the kfree(entry) safely (kobject_put/cleanup doesn't use the
> > > kobj itself after ->release() was called, so we're safe).
> > > 
> > > In case we've failed to create the sysfs directories - just kfree()
> > > it - cause we don't have the kobjects attached.
> > > 
> > > Also, remove the same functionality from populate_msi_sysfs(), cause on
> > > failure we anyway call free_msi_irqs(), which will take care of all the
> > > kobjects properly.
> > > 
> > > And add the forgotten pci_dev_put(pdev) in case of failure to register the
> > > kobject in populate_msi_sysfs().
> > > 
> > > CC: Bjorn Helgaas <bhelgaas@google.com>
> > > CC: Neil Horman <nhorman@tuxdriver.com>
> > > CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > CC: linux-pci@vger.kernel.org
> > > CC: linux-kernel@vger.kernel.org
> > > Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> > 
> > I'm still hoping that Greg (or somebody else; Greg already posted the bulk
> > of the work) will finish up the conversion to attributes, and that Neil
> > will confirm that works with no changes to irqbalance.  So I'm going to
> > drop these patches for now.  Poke me if we need to revive them.
> 
> Ah, sorry about that.  I'll redo my series, dropping the existing format
> and using the attribute-only code.  Give me a day or so, thanks for the
> reminder.
> 
> greg k-h
> 
Thank you greg, I'll make sure Irqbalance doesn't choke on the new format as
soon as you post them.

Neil

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index d5f90d6..5d70f49 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -374,19 +374,22 @@  static void free_msi_irqs(struct pci_dev *dev)
 				iounmap(entry->mask_base);
 		}
 
+		list_del(&entry->list);
+
 		/*
 		 * Its possible that we get into this path
 		 * When populate_msi_sysfs fails, which means the entries
 		 * were not registered with sysfs.  In that case don't
-		 * unregister them.
+		 * unregister them, and just free. Otherwise the
+		 * kobject->release will take care of freeing the entry via
+		 * msi_kobj_release().
 		 */
 		if (entry->kobj.parent) {
 			kobject_del(&entry->kobj);
 			kobject_put(&entry->kobj);
+		} else {
+			kfree(entry);
 		}
-
-		list_del(&entry->list);
-		kfree(entry);
 	}
 }
 
@@ -509,6 +512,7 @@  static void msi_kobj_release(struct kobject *kobj)
 	struct msi_desc *entry = to_msi_desc(kobj);
 
 	pci_dev_put(entry->dev);
+	kfree(entry);
 }
 
 static struct kobj_type msi_irq_ktype = {
@@ -522,7 +526,6 @@  static int populate_msi_sysfs(struct pci_dev *pdev)
 	struct msi_desc *entry;
 	struct kobject *kobj;
 	int ret;
-	int count = 0;
 
 	pdev->msi_kset = kset_create_and_add("msi_irqs", NULL, &pdev->dev.kobj);
 	if (!pdev->msi_kset)
@@ -534,23 +537,13 @@  static int populate_msi_sysfs(struct pci_dev *pdev)
 		pci_dev_get(pdev);
 		ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
 				     "%u", entry->irq);
-		if (ret)
-			goto out_unroll;
-
-		count++;
+		if (ret) {
+			pci_dev_put(pdev);
+			return ret;
+		}
 	}
 
 	return 0;
-
-out_unroll:
-	list_for_each_entry(entry, &pdev->msi_list, list) {
-		if (!count)
-			break;
-		kobject_del(&entry->kobj);
-		kobject_put(&entry->kobj);
-		count--;
-	}
-	return ret;
 }
 
 /**