diff mbox series

ptp: free ptp clock properly

Message ID 20200304175350.GB267906@xps-13
State Rejected
Delegated to: David Miller
Headers show
Series ptp: free ptp clock properly | expand

Commit Message

Andrea Righi March 4, 2020, 5:53 p.m. UTC
There is a bug in ptp_clock_unregister() where ptp_clock_release() can
free up resources needed by posix_clock_unregister() to properly destroy
a related sysfs device.

Fix this by calling posix_clock_unregister() in ptp_clock_release().

See also:
commit 75718584cb3c ("ptp: free ptp device pin descriptors properly").

BugLink: https://bugs.launchpad.net/bugs/1864754
Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
---
 drivers/ptp/ptp_clock.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Vladis Dronov March 4, 2020, 8:11 p.m. UTC | #1
Hello, Andrea, all,

----- Original Message -----
> From: "Andrea Righi" <andrea.righi@canonical.com>
> Subject: [PATCH] ptp: free ptp clock properly
> 
> There is a bug in ptp_clock_unregister() where ptp_clock_release() can
> free up resources needed by posix_clock_unregister() to properly destroy
> a related sysfs device.
> 
> Fix this by calling posix_clock_unregister() in ptp_clock_release().

Honestly, this does not seem right. The calls at PTP clock release are:

ptp_clock_unregister() -> posix_clock_unregister() -> cdev_device_del() ->
-> ... bla ... -> ptp_clock_release()

So, it looks like with this patch both posix_clock_unregister() and
ptp_clock_release() are not called at all. And it looks like the "fix" is
not removing PTP clock's cdev, i.e. leaking it and related sysfs resources.

I would guess that a kernel in question (5.3.0-40-generic) has the commit
a33121e5487b but does not have the commit 75718584cb3c, which should be
exactly fixing a docking station disconnect crash. Could you please,
check this?

Why? We have 2 crash call traces. 1) the launchpad bug 2) the email which
led to the commit 75718584cb3c creation (see Link:).

Aaaaand they are identical starting from device_release_driver_internal()
and almost to the top.

> See also:
> commit 75718584cb3c ("ptp: free ptp device pin descriptors properly").
> 
> BugLink: https://bugs.launchpad.net/bugs/1864754
> Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and
> cdev")
> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
> ---
>  drivers/ptp/ptp_clock.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
> index ac1f2bf9e888..12951023d0c6 100644
> --- a/drivers/ptp/ptp_clock.c
> +++ b/drivers/ptp/ptp_clock.c
> @@ -171,6 +171,7 @@ static void ptp_clock_release(struct device *dev)
>  	struct ptp_clock *ptp = container_of(dev, struct ptp_clock, dev);
>  
>  	ptp_cleanup_pin_groups(ptp);
> +	posix_clock_unregister(&ptp->clock);
>  	mutex_destroy(&ptp->tsevq_mux);
>  	mutex_destroy(&ptp->pincfg_mux);
>  	ida_simple_remove(&ptp_clocks_map, ptp->index);
> @@ -303,8 +304,6 @@ int ptp_clock_unregister(struct ptp_clock *ptp)
>  	if (ptp->pps_source)
>  		pps_unregister_source(ptp->pps_source);
>  
> -	posix_clock_unregister(&ptp->clock);
> -
>  	return 0;
>  }
>  EXPORT_SYMBOL(ptp_clock_unregister);
> --
> 2.25.0

Best regards,
Vladis Dronov | Red Hat, Inc. | The Core Kernel | Senior Software Engineer
Andrea Righi March 5, 2020, 7:36 a.m. UTC | #2
On Wed, Mar 04, 2020 at 03:11:44PM -0500, Vladis Dronov wrote:
> Hello, Andrea, all,
> 
> ----- Original Message -----
> > From: "Andrea Righi" <andrea.righi@canonical.com>
> > Subject: [PATCH] ptp: free ptp clock properly
> > 
> > There is a bug in ptp_clock_unregister() where ptp_clock_release() can
> > free up resources needed by posix_clock_unregister() to properly destroy
> > a related sysfs device.
> > 
> > Fix this by calling posix_clock_unregister() in ptp_clock_release().
> 
> Honestly, this does not seem right. The calls at PTP clock release are:
> 
> ptp_clock_unregister() -> posix_clock_unregister() -> cdev_device_del() ->
> -> ... bla ... -> ptp_clock_release()
> 
> So, it looks like with this patch both posix_clock_unregister() and
> ptp_clock_release() are not called at all. And it looks like the "fix" is
> not removing PTP clock's cdev, i.e. leaking it and related sysfs resources.

That's absolutely right, thanks for the clarification!

With my "fix" we don't see the the kernel oops anymore, but we're
leaking resources, so definitely not a valid fix.

> 
> I would guess that a kernel in question (5.3.0-40-generic) has the commit
> a33121e5487b but does not have the commit 75718584cb3c, which should be
> exactly fixing a docking station disconnect crash. Could you please,
> check this?

Unfortunately the kernel in question already has 75718584cb3c:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/commit/?h=hwe&id=c71b774732f997ef38ed7bd62e73891a01f2bbfe

It looks like there's something else that can free up too early the
resources required by posix_clock_unregister() to destroy the related
sysfs files.

Maybe what we really need to call from ptp_clock_release() is
pps_unregister_source()? Something like this:

From: Andrea Righi <andrea.righi@canonical.com>
Subject: [PATCH] ptp: free ptp clock properly

There is a bug in ptp_clock_unregister() where ptp_clock_release() can
free up resources needed by posix_clock_unregister() to properly destroy
a related sysfs device.

Fix this by calling pps_unregister_source() in ptp_clock_release().

See also:
commit 75718584cb3c ("ptp: free ptp device pin descriptors properly").

BugLink: https://bugs.launchpad.net/bugs/1864754
Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
---
 drivers/ptp/ptp_clock.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index ac1f2bf9e888..468286ef61ad 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -170,6 +170,9 @@ static void ptp_clock_release(struct device *dev)
 {
 	struct ptp_clock *ptp = container_of(dev, struct ptp_clock, dev);
 
+	/* Release the clock's resources. */
+	if (ptp->pps_source)
+		pps_unregister_source(ptp->pps_source);
 	ptp_cleanup_pin_groups(ptp);
 	mutex_destroy(&ptp->tsevq_mux);
 	mutex_destroy(&ptp->pincfg_mux);
@@ -298,11 +301,6 @@ int ptp_clock_unregister(struct ptp_clock *ptp)
 		kthread_cancel_delayed_work_sync(&ptp->aux_work);
 		kthread_destroy_worker(ptp->kworker);
 	}
-
-	/* Release the clock's resources. */
-	if (ptp->pps_source)
-		pps_unregister_source(ptp->pps_source);
-
 	posix_clock_unregister(&ptp->clock);
 
 	return 0;
Vladis Dronov March 5, 2020, 10:47 a.m. UTC | #3
Hello, Andrea, all,

> > I would guess that a kernel in question (5.3.0-40-generic) has the commit
> > a33121e5487b but does not have the commit 75718584cb3c, which should be
> > exactly fixing a docking station disconnect crash. Could you please,
> > check this?
> 
> Unfortunately the kernel in question already has 75718584cb3c:
> https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/commit/?h=hwe&id=c71b774732f997ef38ed7bd62e73891a01f2bbfe
> 
> It looks like there's something else that can free up too early the
> resources required by posix_clock_unregister() to destroy the related
> sysfs files.
> 
> Maybe what we really need to call from ptp_clock_release() is
> pps_unregister_source()? Something like this:

Err... I believe, "Maybe" is not a good enough reason to accept a kernel patch.
Probably, there should be something supporting this statement.

Best regards,
Vladis Dronov | Red Hat, Inc. | The Core Kernel | Senior Software Engineer
Andrea Righi March 5, 2020, 10:58 a.m. UTC | #4
On Thu, Mar 05, 2020 at 05:47:34AM -0500, Vladis Dronov wrote:
> Hello, Andrea, all,
> 
> > > I would guess that a kernel in question (5.3.0-40-generic) has the commit
> > > a33121e5487b but does not have the commit 75718584cb3c, which should be
> > > exactly fixing a docking station disconnect crash. Could you please,
> > > check this?
> > 
> > Unfortunately the kernel in question already has 75718584cb3c:
> > https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/commit/?h=hwe&id=c71b774732f997ef38ed7bd62e73891a01f2bbfe
> > 
> > It looks like there's something else that can free up too early the
> > resources required by posix_clock_unregister() to destroy the related
> > sysfs files.
> > 
> > Maybe what we really need to call from ptp_clock_release() is
> > pps_unregister_source()? Something like this:
> 
> Err... I believe, "Maybe" is not a good enough reason to accept a kernel patch.
> Probably, there should be something supporting this statement.

Indeed. :) I've asked the original bug reporter to repeat the test with
this patch. Let's see if we can still reproduce the failure...

Thanks,
-Andrea
diff mbox series

Patch

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index ac1f2bf9e888..12951023d0c6 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -171,6 +171,7 @@  static void ptp_clock_release(struct device *dev)
 	struct ptp_clock *ptp = container_of(dev, struct ptp_clock, dev);
 
 	ptp_cleanup_pin_groups(ptp);
+	posix_clock_unregister(&ptp->clock);
 	mutex_destroy(&ptp->tsevq_mux);
 	mutex_destroy(&ptp->pincfg_mux);
 	ida_simple_remove(&ptp_clocks_map, ptp->index);
@@ -303,8 +304,6 @@  int ptp_clock_unregister(struct ptp_clock *ptp)
 	if (ptp->pps_source)
 		pps_unregister_source(ptp->pps_source);
 
-	posix_clock_unregister(&ptp->clock);
-
 	return 0;
 }
 EXPORT_SYMBOL(ptp_clock_unregister);