diff mbox series

net: cpts: Condition WARN_ON on PTP_1588_CLOCK

Message ID 20200416085627.1882-1-clay@daemons.net
State Changes Requested
Delegated to: David Miller
Headers show
Series net: cpts: Condition WARN_ON on PTP_1588_CLOCK | expand

Commit Message

Clay McClure April 16, 2020, 8:56 a.m. UTC
CPTS_MOD merely implies PTP_1588_CLOCK; it is possible to build cpts
without PTP clock support. In that case, ptp_clock_register() returns
NULL and we should not WARN_ON(cpts->clock) when downing the interface.
The ptp_*() functions are stubbed without PTP_1588_CLOCK, so it's safe
to pass them a null pointer.

Signed-off-by: Clay McClure <clay@daemons.net>
---
 drivers/net/ethernet/ti/cpts.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Grygorii Strashko April 16, 2020, 11:11 a.m. UTC | #1
On 16/04/2020 11:56, Clay McClure wrote:
> CPTS_MOD merely implies PTP_1588_CLOCK; it is possible to build cpts
> without PTP clock support. In that case, ptp_clock_register() returns
> NULL and we should not WARN_ON(cpts->clock) when downing the interface.
> The ptp_*() functions are stubbed without PTP_1588_CLOCK, so it's safe
> to pass them a null pointer.

Could you explain the purpose of the exercise (Enabling CPTS with PTP_1588_CLOCK disabled), pls?

> 
> Signed-off-by: Clay McClure <clay@daemons.net>
> ---
>   drivers/net/ethernet/ti/cpts.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
> index fd214f8730a9..daf4505f4a70 100644
> --- a/drivers/net/ethernet/ti/cpts.c
> +++ b/drivers/net/ethernet/ti/cpts.c
> @@ -646,7 +646,7 @@ EXPORT_SYMBOL_GPL(cpts_register);
>   
>   void cpts_unregister(struct cpts *cpts)
>   {
> -	if (WARN_ON(!cpts->clock))
> +	if (IS_REACHABLE(PTP_1588_CLOCK) && WARN_ON(!cpts->clock))
>   		return;
>   
>   	ptp_clock_unregister(cpts->clock);
>
Clay McClure April 20, 2020, 9:36 a.m. UTC | #2
On Thu, Apr 16, 2020 at 02:11:45PM +0300, Grygorii Strashko wrote:

Grygorii,

> > CPTS_MOD merely implies PTP_1588_CLOCK; it is possible to build cpts
> > without PTP clock support. In that case, ptp_clock_register() returns
> > NULL and we should not WARN_ON(cpts->clock) when downing the interface.
> > The ptp_*() functions are stubbed without PTP_1588_CLOCK, so it's safe
> > to pass them a null pointer.
> 
> Could you explain the purpose of the exercise (Enabling CPTS with
> PTP_1588_CLOCK disabled), pls?

Hardware timestamping with a free-running PHC _almost_ works without
PTP_1588_CLOCK, but since PHC rollover is handled by the PTP kworker
in this driver the timestamps end up not being monotonic.

And of course the moment you want to syntonize/synchronize the PHC with
another clock (say, CLOCK_REALTIME), you'll need a PTP clock device. So
you're right, there's not much point in building CPTS_MOD without
PTP_1588_CLOCK.

Given that, I wonder why all the Ethernet drivers seem to just `imply`
PTP_1588_CLOCK, rather than `depends on` it?

In any case, I was surprised to get a warning during `ifdown` but not
during `ifup`. What do you think of this change, which prints an error
like this during `ifup` if PTP_1588_CLOCK is not enabled:

[    6.192707] 000: cpsw 4a100000.ethernet: error registering cpts device

--- 
diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 10ad706dda53..70b15039cd37 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -462,8 +462,8 @@ int cpts_register(struct cpts *cpts)
        timecounter_init(&cpts->tc, &cpts->cc, ktime_get_real_ns());
 
        cpts->clock = ptp_clock_register(&cpts->info, cpts->dev);
-       if (IS_ERR(cpts->clock)) {
-               err = PTR_ERR(cpts->clock);
+       if (IS_ERR_OR_NULL(cpts->clock)) {
+               err = cpts->clock ? PTR_ERR(cpts->clock) : -EOPNOTSUPP;
                cpts->clock = NULL;
                goto err_ptp;
        }

-- 
Clay
Arnd Bergmann April 20, 2020, 2:38 p.m. UTC | #3
On Mon, Apr 20, 2020 at 11:38 AM Clay McClure <clay@daemons.net> wrote:
> On Thu, Apr 16, 2020 at 02:11:45PM +0300, Grygorii Strashko wrote:
>
> > > CPTS_MOD merely implies PTP_1588_CLOCK; it is possible to build cpts
> > > without PTP clock support. In that case, ptp_clock_register() returns
> > > NULL and we should not WARN_ON(cpts->clock) when downing the interface.
> > > The ptp_*() functions are stubbed without PTP_1588_CLOCK, so it's safe
> > > to pass them a null pointer.
> >
> > Could you explain the purpose of the exercise (Enabling CPTS with
> > PTP_1588_CLOCK disabled), pls?
>
> Hardware timestamping with a free-running PHC _almost_ works without
> PTP_1588_CLOCK, but since PHC rollover is handled by the PTP kworker
> in this driver the timestamps end up not being monotonic.
>
> And of course the moment you want to syntonize/synchronize the PHC with
> another clock (say, CLOCK_REALTIME), you'll need a PTP clock device. So
> you're right, there's not much point in building CPTS_MOD without
> PTP_1588_CLOCK.
>
> Given that, I wonder why all the Ethernet drivers seem to just `imply`
> PTP_1588_CLOCK, rather than `depends on` it?

I suspect we should move all of them back. This was an early user
of 'imply', but the meaning of that keyword has now changed
in the latest Kconfig.

> diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
> index 10ad706dda53..70b15039cd37 100644
> --- a/drivers/net/ethernet/ti/cpts.c
> +++ b/drivers/net/ethernet/ti/cpts.c
> @@ -462,8 +462,8 @@ int cpts_register(struct cpts *cpts)
>         timecounter_init(&cpts->tc, &cpts->cc, ktime_get_real_ns());
>
>         cpts->clock = ptp_clock_register(&cpts->info, cpts->dev);
> -       if (IS_ERR(cpts->clock)) {
> -               err = PTR_ERR(cpts->clock);
> +       if (IS_ERR_OR_NULL(cpts->clock)) {
> +               err = cpts->clock ? PTR_ERR(cpts->clock) : -EOPNOTSUPP;
>                 cpts->clock = NULL;
>                 goto err_ptp;

Something else is wrong if you need IS_ERR_OR_NULL(). Any
kernel interface should either return an negative error code when
something goes wrong, or should return NULL for all errors, but
not mix the two.

        Arnd
Richard Cochran April 20, 2020, 5 p.m. UTC | #4
On Mon, Apr 20, 2020 at 04:38:32PM +0200, Arnd Bergmann wrote:
> 
> I suspect we should move all of them back. This was an early user
> of 'imply', but the meaning of that keyword has now changed
> in the latest Kconfig.

Can you please explain the justification for changing the meaning?

It was a big PITA for me to support this in the first place, and now
we are back to square one?

> Something else is wrong if you need IS_ERR_OR_NULL(). Any
> kernel interface should either return an negative error code when
> something goes wrong, or should return NULL for all errors, but
> not mix the two.

On the contrary, this is exactly what the whole "imply" thing
demanded.

d1cbfd771ce82 (Nicolas Pitre       2016-11-11 172) #if IS_REACHABLE(CONFIG_PTP_1588_CLOCK)
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 173) 
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 174) /**
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 175)  * ptp_clock_register() - register a PTP hardware clock driver
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 176)  *
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 177)  * @info:   Structure describing the new clock.
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 178)  * @parent: Pointer to the parent device of the new clock.
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 179)  *
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 180)  * Returns a valid pointer on success or PTR_ERR on failure.  If PHC
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 181)  * support is missing at the configuration level, this function
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 182)  * returns NULL, and drivers are expected to gracefully handle that
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 183)  * case separately.
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 184)  */
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 185) 
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 186) extern struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
d1cbfd771ce82 (Nicolas Pitre       2016-11-11 187) 					    struct device *parent);

Thanks,
Richard
Arnd Bergmann April 20, 2020, 6:57 p.m. UTC | #5
On Mon, Apr 20, 2020 at 7:00 PM Richard Cochran
<richardcochran@gmail.com> wrote:
>
> On Mon, Apr 20, 2020 at 04:38:32PM +0200, Arnd Bergmann wrote:
> >
> > I suspect we should move all of them back. This was an early user
> > of 'imply', but the meaning of that keyword has now changed
> > in the latest Kconfig.
>
> Can you please explain the justification for changing the meaning?
>
> It was a big PITA for me to support this in the first place, and now
> we are back to square one?

I don't understand it either. Apparently it didn't always do what users
expected, though the new definition seems less useful to me, as it
only changes the default.

> > Something else is wrong if you need IS_ERR_OR_NULL(). Any
> > kernel interface should either return an negative error code when
> > something goes wrong, or should return NULL for all errors, but
> > not mix the two.
>
> On the contrary, this is exactly what the whole "imply" thing
> demanded.
>
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 172) #if IS_REACHABLE(CONFIG_PTP_1588_CLOCK)
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 173)
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 174) /**
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 175)  * ptp_clock_register() - register a PTP hardware clock driver
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 176)  *
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 177)  * @info:   Structure describing the new clock.
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 178)  * @parent: Pointer to the parent device of the new clock.
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 179)  *
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 180)  * Returns a valid pointer on success or PTR_ERR on failure.  If PHC
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 181)  * support is missing at the configuration level, this function
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 182)  * returns NULL, and drivers are expected to gracefully handle that
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 183)  * case separately.
> d1cbfd771ce82 (Nicolas Pitre       2016-11-11 184)  */

The key here is "gracefully". The second patch from Clay just turns NULL into
 -EOPNOTSUPP and treats the compile-time condition into a runtime error.
I don't see a point in allowing the driver to compile if it just always returns
an error. The two callers then go on to print a message for any error
and just keep going.

      Arnd
Richard Cochran April 20, 2020, 9:18 p.m. UTC | #6
On Mon, Apr 20, 2020 at 08:57:05PM +0200, Arnd Bergmann wrote:
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 172) #if IS_REACHABLE(CONFIG_PTP_1588_CLOCK)
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 173)
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 174) /**
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 175)  * ptp_clock_register() - register a PTP hardware clock driver
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 176)  *
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 177)  * @info:   Structure describing the new clock.
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 178)  * @parent: Pointer to the parent device of the new clock.
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 179)  *
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 180)  * Returns a valid pointer on success or PTR_ERR on failure.  If PHC
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 181)  * support is missing at the configuration level, this function
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 182)  * returns NULL, and drivers are expected to gracefully handle that
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 183)  * case separately.
> > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 184)  */
> 
> The key here is "gracefully". The second patch from Clay just turns NULL into
>  -EOPNOTSUPP and treats the compile-time condition into a runtime error.

You are talking about the cpts driver, no?

I'm worried about ptp_clock_register(), because it does return NULL if
IS_REACHABLE(CONFIG_PTP_1588_CLOCK), and this is the "correct"
behavior ever since November 2016.

If somebody wants to change that stub to return EOPNOTSUPP, then fine,
but please have them audit the callers and submit a patch series.

Thanks,
Richard
Arnd Bergmann April 20, 2020, 9:21 p.m. UTC | #7
On Mon, Apr 20, 2020 at 11:18 PM Richard Cochran
<richardcochran@gmail.com> wrote:
> On Mon, Apr 20, 2020 at 08:57:05PM +0200, Arnd Bergmann wrote:
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 172) #if IS_REACHABLE(CONFIG_PTP_1588_CLOCK)
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 173)
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 174) /**
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 175)  * ptp_clock_register() - register a PTP hardware clock driver
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 176)  *
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 177)  * @info:   Structure describing the new clock.
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 178)  * @parent: Pointer to the parent device of the new clock.
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 179)  *
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 180)  * Returns a valid pointer on success or PTR_ERR on failure.  If PHC
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 181)  * support is missing at the configuration level, this function
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 182)  * returns NULL, and drivers are expected to gracefully handle that
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 183)  * case separately.
> > > d1cbfd771ce82 (Nicolas Pitre       2016-11-11 184)  */
> >
> > The key here is "gracefully". The second patch from Clay just turns NULL into
> >  -EOPNOTSUPP and treats the compile-time condition into a runtime error.
>
> You are talking about the cpts driver, no?
>
> I'm worried about ptp_clock_register(), because it does return NULL if
> IS_REACHABLE(CONFIG_PTP_1588_CLOCK), and this is the "correct"
> behavior ever since November 2016.
>
> If somebody wants to change that stub to return EOPNOTSUPP, then fine,
> but please have them audit the callers and submit a patch series.

It's not great, but we have other interfaces like this that can return NULL for
success when the subsystem is disabled. The problem is when there is
a mismatch between the caller treating NULL as failure when it is meant to
be "successful lack of object returned".

       Arnd
Richard Cochran April 20, 2020, 9:34 p.m. UTC | #8
On Mon, Apr 20, 2020 at 11:21:20PM +0200, Arnd Bergmann wrote:
> It's not great, but we have other interfaces like this that can return NULL for
> success when the subsystem is disabled. The problem is when there is
> a mismatch between the caller treating NULL as failure when it is meant to
> be "successful lack of object returned".

Yeah, that should be fixed.

To be clear, do you all see a need to change the stubbed version of
ptp_clock_register() or not?

Thanks,
Richard
Arnd Bergmann April 20, 2020, 9:42 p.m. UTC | #9
On Mon, Apr 20, 2020 at 11:34 PM Richard Cochran
<richardcochran@gmail.com> wrote:
>
> On Mon, Apr 20, 2020 at 11:21:20PM +0200, Arnd Bergmann wrote:
> > It's not great, but we have other interfaces like this that can return NULL for
> > success when the subsystem is disabled. The problem is when there is
> > a mismatch between the caller treating NULL as failure when it is meant to
> > be "successful lack of object returned".
>
> Yeah, that should be fixed.
>
> To be clear, do you all see a need to change the stubbed version of
> ptp_clock_register() or not?

No, if the NULL return is only meant to mean "nothing wrong, keep going
wihtout an object", that's fine with me. It does occasionally confuse driver
writers (as seen here), so it's not a great interface, but there is no general
solution to make it better.

     Arnd
Grygorii Strashko April 22, 2020, 11:16 a.m. UTC | #10
Hi

On 21/04/2020 00:42, Arnd Bergmann wrote:
> On Mon, Apr 20, 2020 at 11:34 PM Richard Cochran
> <richardcochran@gmail.com> wrote:
>>
>> On Mon, Apr 20, 2020 at 11:21:20PM +0200, Arnd Bergmann wrote:
>>> It's not great, but we have other interfaces like this that can return NULL for
>>> success when the subsystem is disabled. The problem is when there is
>>> a mismatch between the caller treating NULL as failure when it is meant to
>>> be "successful lack of object returned".
>>
>> Yeah, that should be fixed.
>>
>> To be clear, do you all see a need to change the stubbed version of
>> ptp_clock_register() or not?
> 
> No, if the NULL return is only meant to mean "nothing wrong, keep going
> wihtout an object", that's fine with me. It does occasionally confuse driver
> writers (as seen here), so it's not a great interface, but there is no general
> solution to make it better.

As per commit
commit d1cbfd771ce8297fa11e89f315392de6056a2181
Author: Nicolas Pitre <nicolas.pitre@linaro.org>
Date:   Fri Nov 11 00:10:07 2016 -0500

     ptp_clock: Allow for it to be optional
     
     In order to break the hard dependency between the PTP clock subsystem and
     ethernet drivers capable of being clock providers, this patch provides
     simple PTP stub functions to allow linkage of those drivers into the
     kernel even when the PTP subsystem is configured out. Drivers must be
     ready to accept NULL from ptp_clock_register() in that case.
     
     And to make it possible for PTP to be configured out, the select statement
     in those driver's Kconfig menu entries is converted to the new "imply"
     statement. This way the PTP subsystem may have Kconfig dependencies of
     its own, such as POSIX_TIMERS, without having to make those ethernet
     drivers unavailable if POSIX timers are cconfigured out. And when support
     for POSIX timers is selected again then the default config option for PTP
     clock support will automatically be adjusted accordingly.


the idea of using "imply" is to keep networking enabled even if PTP is configured out
and this exactly what happens with cpts driver.
Another question is that CPTS completely nonfunctional in this case and it was never
expected that somebody will even try to use/run such configuration (except for random build purposes).
Clay McClure April 26, 2020, 2:41 a.m. UTC | #11
On Wed, Apr 22, 2020 at 02:16:11PM +0300, Grygorii Strashko wrote:
> 
> On 21/04/2020 00:42, Arnd Bergmann wrote:
> >
> > On Mon, Apr 20, 2020 at 11:34 PM Richard Cochran
> > >
> > > To be clear, do you all see a need to change the stubbed version of
> > > ptp_clock_register() or not?
> > 
> > No, if the NULL return is only meant to mean "nothing wrong, keep going
> > wihtout an object", that's fine with me. It does occasionally confuse driver
> > writers (as seen here), so it's not a great interface, but there is no general
> > solution to make it better.

That's why in my first patch I condition the WARN_ON() on PTP_1588_CLOCK,
since without that the null pointer here is not an error:

	void cpts_unregister(struct cpts *cpts)                                         
	{                                                                               
		if (WARN_ON(!cpts->clock))                                              
			return;                                                         

Grygorii's question (paraphrasing: "why would you ever do that?")
prompted my second patch, making the broken configuration obvious by
emitting an error during `ifup`, instead of just a warning during
`ifdown`.

But I think Grygorii is on to something here:

> Another question is that CPTS completely nonfunctional in this case and
> it was never expected that somebody will even try to use/run such
> configuration (except for random build purposes).

So, let's not allow it. In my view, commit d1cbfd771ce8 ("ptp_clock:
Allow for it to be optional") went a bit too far, and converted even
clearly PTP-specific modules from `select` to `imply` PTP_1588_CLOCK,
which is what made this broken configuration possible. I suggest
reverting that change, just for the PTP-specific modules under
drivers/net/ethernet.

I audited all drivers that call `ptp_clock_register()`; it looks like
these should `select` (instead of merely `imply`) PTP_1588_CLOCK:

    NET_DSA_MV88E6XXX_PTP
    NET_DSA_SJA1105_PTP
    MACB_USE_HWSTAMP
    CAVIUM_PTP
    TI_CPTS_MOD
    PTP_1588_CLOCK_IXP46X

Note how they all reference PTP or timestamping in their name, which is
a clue that they depend on PTP_1588_CLOCK.

I have a patch for this, but first, a procedural question: does mailing
list etiquette dictate that I reply to this thread with the new patch,
or does it begin a new thread?
diff mbox series

Patch

diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index fd214f8730a9..daf4505f4a70 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -646,7 +646,7 @@  EXPORT_SYMBOL_GPL(cpts_register);
 
 void cpts_unregister(struct cpts *cpts)
 {
-	if (WARN_ON(!cpts->clock))
+	if (IS_REACHABLE(PTP_1588_CLOCK) && WARN_ON(!cpts->clock))
 		return;
 
 	ptp_clock_unregister(cpts->clock);