diff mbox series

[v4,2/2] of: dynamic: Synchronize of_changeset_destroy() with the devlink removals

Message ID 20240306085007.169771-3-herve.codina@bootlin.com
State Superseded
Headers show
Series Synchronize DT overlay removal with devlink removals | expand

Checks

Context Check Description
robh/checkpatch warning total: 0 errors, 1 warnings, 19 lines checked
robh/patch-applied fail build log

Commit Message

Herve Codina March 6, 2024, 8:50 a.m. UTC
In the following sequence:
  1) of_platform_depopulate()
  2) of_overlay_remove()

During the step 1, devices are destroyed and devlinks are removed.
During the step 2, OF nodes are destroyed but
__of_changeset_entry_destroy() can raise warnings related to missing
of_node_put():
  ERROR: memory leak, expected refcount 1 instead of 2 ...

Indeed, during the devlink removals performed at step 1, the removal
itself releasing the device (and the attached of_node) is done by a job
queued in a workqueue and so, it is done asynchronously with respect to
function calls.
When the warning is present, of_node_put() will be called but wrongly
too late from the workqueue job.

In order to be sure that any ongoing devlink removals are done before
the of_node destruction, synchronize the of_changeset_destroy() with the
devlink removals.

Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
Cc: stable@vger.kernel.org
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
---
 drivers/of/dynamic.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Nuno Sá March 6, 2024, 9:21 a.m. UTC | #1
On Wed, 2024-03-06 at 09:50 +0100, Herve Codina wrote:
> In the following sequence:
>   1) of_platform_depopulate()
>   2) of_overlay_remove()
> 
> During the step 1, devices are destroyed and devlinks are removed.
> During the step 2, OF nodes are destroyed but
> __of_changeset_entry_destroy() can raise warnings related to missing
> of_node_put():
>   ERROR: memory leak, expected refcount 1 instead of 2 ...
> 
> Indeed, during the devlink removals performed at step 1, the removal
> itself releasing the device (and the attached of_node) is done by a job
> queued in a workqueue and so, it is done asynchronously with respect to
> function calls.
> When the warning is present, of_node_put() will be called but wrongly
> too late from the workqueue job.
> 
> In order to be sure that any ongoing devlink removals are done before
> the of_node destruction, synchronize the of_changeset_destroy() with the
> devlink removals.
> 
> Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> Cc: stable@vger.kernel.org
> Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> ---

LGTM

Reviewed-by: Nuno Sa <nuno.sa@analog.com>

>  drivers/of/dynamic.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 3bf27052832f..169e2a9ae22f 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -9,6 +9,7 @@
>  
>  #define pr_fmt(fmt)	"OF: " fmt
>  
> +#include <linux/device.h>
>  #include <linux/of.h>
>  #include <linux/spinlock.h>
>  #include <linux/slab.h>
> @@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs)
>  {
>  	struct of_changeset_entry *ce, *cen;
>  
> +	/*
> +	 * Wait for any ongoing device link removals before destroying some
> of
> +	 * nodes.
> +	 */
> +	device_link_wait_removal();
> +
>  	list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
>  		__of_changeset_entry_destroy(ce);
>  }
Luca Ceresoli March 6, 2024, 11:07 a.m. UTC | #2
On Wed,  6 Mar 2024 09:50:03 +0100
Herve Codina <herve.codina@bootlin.com> wrote:

> In the following sequence:
>   1) of_platform_depopulate()
>   2) of_overlay_remove()
> 
> During the step 1, devices are destroyed and devlinks are removed.
> During the step 2, OF nodes are destroyed but
> __of_changeset_entry_destroy() can raise warnings related to missing
> of_node_put():
>   ERROR: memory leak, expected refcount 1 instead of 2 ...
> 
> Indeed, during the devlink removals performed at step 1, the removal
> itself releasing the device (and the attached of_node) is done by a job
> queued in a workqueue and so, it is done asynchronously with respect to
> function calls.
> When the warning is present, of_node_put() will be called but wrongly
> too late from the workqueue job.
> 
> In order to be sure that any ongoing devlink removals are done before
> the of_node destruction, synchronize the of_changeset_destroy() with the
> devlink removals.
> 
> Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> Cc: stable@vger.kernel.org
> Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> ---
>  drivers/of/dynamic.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 3bf27052832f..169e2a9ae22f 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -9,6 +9,7 @@
>  
>  #define pr_fmt(fmt)	"OF: " fmt
>  
> +#include <linux/device.h>
>  #include <linux/of.h>
>  #include <linux/spinlock.h>
>  #include <linux/slab.h>
> @@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs)
>  {
>  	struct of_changeset_entry *ce, *cen;
>  
> +	/*
> +	 * Wait for any ongoing device link removals before destroying some of
> +	 * nodes.
> +	 */
> +	device_link_wait_removal();

Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

And no problem appeared in my tests due to the removed unlock/lock
around device_link_wait_removal().

Luca
Saravana Kannan March 6, 2024, 9:35 p.m. UTC | #3
On Wed, Mar 6, 2024 at 12:51 AM Herve Codina <herve.codina@bootlin.com> wrote:
>
> In the following sequence:
>   1) of_platform_depopulate()
>   2) of_overlay_remove()
>
> During the step 1, devices are destroyed and devlinks are removed.
> During the step 2, OF nodes are destroyed but
> __of_changeset_entry_destroy() can raise warnings related to missing
> of_node_put():
>   ERROR: memory leak, expected refcount 1 instead of 2 ...
>
> Indeed, during the devlink removals performed at step 1, the removal
> itself releasing the device (and the attached of_node) is done by a job
> queued in a workqueue and so, it is done asynchronously with respect to
> function calls.
> When the warning is present, of_node_put() will be called but wrongly
> too late from the workqueue job.
>
> In order to be sure that any ongoing devlink removals are done before
> the of_node destruction, synchronize the of_changeset_destroy() with the
> devlink removals.
>
> Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> Cc: stable@vger.kernel.org
> Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> ---
>  drivers/of/dynamic.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 3bf27052832f..169e2a9ae22f 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -9,6 +9,7 @@
>
>  #define pr_fmt(fmt)    "OF: " fmt
>
> +#include <linux/device.h>
>  #include <linux/of.h>
>  #include <linux/spinlock.h>
>  #include <linux/slab.h>
> @@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs)
>  {
>         struct of_changeset_entry *ce, *cen;
>
> +       /*
> +        * Wait for any ongoing device link removals before destroying some of
> +        * nodes.
> +        */

Not going to ask you to revise this patch just for this, but this
comment isn't very useful. It's telling what you are doing. Not why.
And the function name is already clear on what you are doing.

Maybe something like this would be better at describing the "why"?
Free free to reword it.

When a device is deleted, the device links to/from it are also queued
for deletion. Until these device links are freed, the devices
themselves aren't freed. If the device being deleted is due to an
overlay change, this device might be holding a reference to a device
node that will be freed. So, wait until all already pending device
links are deleted before freeing a device node. This ensures we don't
free any device node that has a non-zero reference count.

Reviewed-by: Saravana Kannan <saravanak@google.com>

-Saravana

> +       device_link_wait_removal();
> +
>         list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
>                 __of_changeset_entry_destroy(ce);
>  }
> --
> 2.43.0
>
diff mbox series

Patch

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 3bf27052832f..169e2a9ae22f 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -9,6 +9,7 @@ 
 
 #define pr_fmt(fmt)	"OF: " fmt
 
+#include <linux/device.h>
 #include <linux/of.h>
 #include <linux/spinlock.h>
 #include <linux/slab.h>
@@ -667,6 +668,12 @@  void of_changeset_destroy(struct of_changeset *ocs)
 {
 	struct of_changeset_entry *ce, *cen;
 
+	/*
+	 * Wait for any ongoing device link removals before destroying some of
+	 * nodes.
+	 */
+	device_link_wait_removal();
+
 	list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
 		__of_changeset_entry_destroy(ce);
 }