Message ID | 20240229083953.607569-3-herve.codina@bootlin.com |
---|---|
State | Superseded |
Headers | show |
Series | Synchronize DT overlay removal with devlink removals | expand |
Context | Check | Description |
---|---|---|
robh/checkpatch | warning | total: 0 errors, 1 warnings, 21 lines checked |
robh/patch-applied | fail | build log |
On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote: > In the following sequence: > 1) of_platform_depopulate() > 2) of_overlay_remove() > > During the step 1, devices are destroyed and devlinks are removed. > During the step 2, OF nodes are destroyed but > __of_changeset_entry_destroy() can raise warnings related to missing > of_node_put(): > ERROR: memory leak, expected refcount 1 instead of 2 ... > > Indeed, during the devlink removals performed at step 1, the removal > itself releasing the device (and the attached of_node) is done by a job > queued in a workqueue and so, it is done asynchronously with respect to > function calls. > When the warning is present, of_node_put() will be called but wrongly > too late from the workqueue job. > > In order to be sure that any ongoing devlink removals are done before > the of_node destruction, synchronize the of_overlay_remove() with the > devlink removals. > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") > Cc: stable@vger.kernel.org > Signed-off-by: Herve Codina <herve.codina@bootlin.com> > --- > drivers/of/overlay.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > index 2ae7e9d24a64..99659ae9fb28 100644 > --- a/drivers/of/overlay.c > +++ b/drivers/of/overlay.c > @@ -853,6 +853,14 @@ static void free_overlay_changeset(struct > overlay_changeset *ovcs) > { > int i; > > + /* > + * Wait for any ongoing device link removals before removing some of > + * nodes. Drop the global lock while waiting > + */ > + mutex_unlock(&of_mutex); > + device_link_wait_removal(); > + mutex_lock(&of_mutex); I'm still not convinced we need to drop the lock. What happens if someone else grabs the lock while we are in device_link_wait_removal()? Can we guarantee that we can't screw things badly? The question is, do you have a system/use case where you can really see the deadlock happening? Until I see one, I'm very skeptical about this. And if we have one, I'm not really sure this is also the right solution for it. - Nuno Sá
On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote: > In the following sequence: > 1) of_platform_depopulate() > 2) of_overlay_remove() > > During the step 1, devices are destroyed and devlinks are removed. > During the step 2, OF nodes are destroyed but > __of_changeset_entry_destroy() can raise warnings related to missing > of_node_put(): > ERROR: memory leak, expected refcount 1 instead of 2 ... > > Indeed, during the devlink removals performed at step 1, the removal > itself releasing the device (and the attached of_node) is done by a job > queued in a workqueue and so, it is done asynchronously with respect to > function calls. > When the warning is present, of_node_put() will be called but wrongly > too late from the workqueue job. > > In order to be sure that any ongoing devlink removals are done before > the of_node destruction, synchronize the of_overlay_remove() with the > devlink removals. > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") > Cc: stable@vger.kernel.org > Signed-off-by: Herve Codina <herve.codina@bootlin.com> > --- > drivers/of/overlay.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > index 2ae7e9d24a64..99659ae9fb28 100644 > --- a/drivers/of/overlay.c > +++ b/drivers/of/overlay.c In the cover, you mention device.h inclusion but I'm not seeing it? This is clearly up to the DT maintainers to decide but, IMHO, I would very much prefer to see fwnode.h included in here rather than directly device.h (so yeah, renaming the function to fwnode_*). But yeah, I might be biased by own series :) - Nuno Sá
On Thu, 29 Feb 2024 10:50:21 +0100 Nuno Sá <noname.nuno@gmail.com> wrote: > On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote: > > In the following sequence: > > 1) of_platform_depopulate() > > 2) of_overlay_remove() > > > > During the step 1, devices are destroyed and devlinks are removed. > > During the step 2, OF nodes are destroyed but > > __of_changeset_entry_destroy() can raise warnings related to missing > > of_node_put(): > > ERROR: memory leak, expected refcount 1 instead of 2 ... > > > > Indeed, during the devlink removals performed at step 1, the removal > > itself releasing the device (and the attached of_node) is done by a job > > queued in a workqueue and so, it is done asynchronously with respect to > > function calls. > > When the warning is present, of_node_put() will be called but wrongly > > too late from the workqueue job. > > > > In order to be sure that any ongoing devlink removals are done before > > the of_node destruction, synchronize the of_overlay_remove() with the > > devlink removals. > > > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") > > Cc: stable@vger.kernel.org > > Signed-off-by: Herve Codina <herve.codina@bootlin.com> > > --- > > drivers/of/overlay.c | 9 ++++++++- > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > index 2ae7e9d24a64..99659ae9fb28 100644 > > --- a/drivers/of/overlay.c > > +++ b/drivers/of/overlay.c > > In the cover, you mention device.h inclusion but I'm not seeing it? This is > clearly up to the DT maintainers to decide but, IMHO, I would very much prefer > to see fwnode.h included in here rather than directly device.h (so yeah, > renaming the function to fwnode_*). But yeah, I might be biased by own series :) > Damned. I missed device.h in this patch. Without this one, the patch do not compile :( A fixup commit I missed to squash before sending. A v3 is planned to add this device.h. Nuno, do you prefer I wait few days before sending this v3 waiting for more replies or I send it right now and you re-do your comment on the v3 ? I would really prefer to send it now :) Sorry about my mistake. Best regards, Hervé
On Thu, 2024-02-29 at 11:14 +0100, Herve Codina wrote: > On Thu, 29 Feb 2024 10:50:21 +0100 > Nuno Sá <noname.nuno@gmail.com> wrote: > > > On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote: > > > In the following sequence: > > > 1) of_platform_depopulate() > > > 2) of_overlay_remove() > > > > > > During the step 1, devices are destroyed and devlinks are removed. > > > During the step 2, OF nodes are destroyed but > > > __of_changeset_entry_destroy() can raise warnings related to missing > > > of_node_put(): > > > ERROR: memory leak, expected refcount 1 instead of 2 ... > > > > > > Indeed, during the devlink removals performed at step 1, the removal > > > itself releasing the device (and the attached of_node) is done by a job > > > queued in a workqueue and so, it is done asynchronously with respect to > > > function calls. > > > When the warning is present, of_node_put() will be called but wrongly > > > too late from the workqueue job. > > > > > > In order to be sure that any ongoing devlink removals are done before > > > the of_node destruction, synchronize the of_overlay_remove() with the > > > devlink removals. > > > > > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Herve Codina <herve.codina@bootlin.com> > > > --- > > > drivers/of/overlay.c | 9 ++++++++- > > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > > index 2ae7e9d24a64..99659ae9fb28 100644 > > > --- a/drivers/of/overlay.c > > > +++ b/drivers/of/overlay.c > > > > In the cover, you mention device.h inclusion but I'm not seeing it? This is > > clearly up to the DT maintainers to decide but, IMHO, I would very much > > prefer > > to see fwnode.h included in here rather than directly device.h (so yeah, > > renaming the function to fwnode_*). But yeah, I might be biased by own > > series :) > > > > Damned. I missed device.h in this patch. > Without this one, the patch do not compile :( > > A fixup commit I missed to squash before sending. > > A v3 is planned to add this device.h. > > Nuno, do you prefer I wait few days before sending this v3 waiting for more > replies > or I send it right now and you re-do your comment on the v3 ? > > I would really prefer to send it now :) > Typically maintainers don't like much of re-spinning versions too fast. That said, up to you :). I can copy paste my comments in v3. - Nuno Sá
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index 2ae7e9d24a64..99659ae9fb28 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -853,6 +853,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs) { int i; + /* + * Wait for any ongoing device link removals before removing some of + * nodes. Drop the global lock while waiting + */ + mutex_unlock(&of_mutex); + device_link_wait_removal(); + mutex_lock(&of_mutex); + if (ovcs->cset.entries.next) of_changeset_destroy(&ovcs->cset); @@ -862,7 +870,6 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs) ovcs->id = 0; } - for (i = 0; i < ovcs->count; i++) { of_node_put(ovcs->fragments[i].target); of_node_put(ovcs->fragments[i].overlay);
In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove() During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ... Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job. In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_overlay_remove() with the devlink removals. Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina <herve.codina@bootlin.com> --- drivers/of/overlay.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)