diff mbox series

time: Fix get_ticks being non-monotonic

Message ID 20200901195548.542737-1-seanga2@gmail.com
State Superseded
Delegated to: Tom Rini
Headers show
Series time: Fix get_ticks being non-monotonic | expand

Commit Message

Sean Anderson Sept. 1, 2020, 7:55 p.m. UTC
get_ticks does not always succeed. Sometimes it can be called before the
timer has been initialized. If it does, it returns a negative errno.
This causes the timer to appear non-monotonic, because the value will
become much smaller after the timer is initialized.

No users of get_ticks which I checked handle errors of this kind. Further,
functions like tick_to_time mangle the result of get_ticks, making it very
unlikely that one could check for an error without suggesting a patch such
as this one.

This patch changes get_ticks to always return 0 when there is an error.
0 is the least unsigned integer, ensuring get_ticks appears monotonic. This
has the side effect of time apparently not passing until the timer is
initialized. However, without this patch, time does not pass anyway,
because the error value is likely to be the same.

Fixes: c8a7ba9e6a5
Signed-off-by: Sean Anderson <seanga2@gmail.com>
---

 lib/time.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Simon Glass Sept. 7, 2020, 1:43 a.m. UTC | #1
Hi Sean,

On Tue, 1 Sep 2020 at 13:56, Sean Anderson <seanga2@gmail.com> wrote:
>
> get_ticks does not always succeed. Sometimes it can be called before the
> timer has been initialized. If it does, it returns a negative errno.
> This causes the timer to appear non-monotonic, because the value will
> become much smaller after the timer is initialized.
>
> No users of get_ticks which I checked handle errors of this kind. Further,
> functions like tick_to_time mangle the result of get_ticks, making it very
> unlikely that one could check for an error without suggesting a patch such
> as this one.
>
> This patch changes get_ticks to always return 0 when there is an error.
> 0 is the least unsigned integer, ensuring get_ticks appears monotonic. This
> has the side effect of time apparently not passing until the timer is
> initialized. However, without this patch, time does not pass anyway,
> because the error value is likely to be the same.
>
> Fixes: c8a7ba9e6a5
> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> ---
>
>  lib/time.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Would it be better to panic so people can fix the bug?

Regards,
Simon
Sean Anderson Sept. 7, 2020, 2:02 a.m. UTC | #2
On 9/6/20 9:43 PM, Simon Glass wrote:
> Hi Sean,
> 
> On Tue, 1 Sep 2020 at 13:56, Sean Anderson <seanga2@gmail.com> wrote:
>>
>> get_ticks does not always succeed. Sometimes it can be called before the
>> timer has been initialized. If it does, it returns a negative errno.
>> This causes the timer to appear non-monotonic, because the value will
>> become much smaller after the timer is initialized.
>>
>> No users of get_ticks which I checked handle errors of this kind. Further,
>> functions like tick_to_time mangle the result of get_ticks, making it very
>> unlikely that one could check for an error without suggesting a patch such
>> as this one.
>>
>> This patch changes get_ticks to always return 0 when there is an error.
>> 0 is the least unsigned integer, ensuring get_ticks appears monotonic. This
>> has the side effect of time apparently not passing until the timer is
>> initialized. However, without this patch, time does not pass anyway,
>> because the error value is likely to be the same.
>>
>> Fixes: c8a7ba9e6a5
>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>> ---
>>
>>  lib/time.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Would it be better to panic so people can fix the bug?

I thought this was expected behavior. It's only a bug if you do
something like udelay before any timers are created. We just can't
report errors through get_ticks, because its users assume that it always
returns a time of some kind.

--Sean
Simon Glass Sept. 7, 2020, 1:57 p.m. UTC | #3
Hi Sean,

On Sun, 6 Sep 2020 at 20:02, Sean Anderson <seanga2@gmail.com> wrote:
>
> On 9/6/20 9:43 PM, Simon Glass wrote:
> > Hi Sean,
> >
> > On Tue, 1 Sep 2020 at 13:56, Sean Anderson <seanga2@gmail.com> wrote:
> >>
> >> get_ticks does not always succeed. Sometimes it can be called before the
> >> timer has been initialized. If it does, it returns a negative errno.
> >> This causes the timer to appear non-monotonic, because the value will
> >> become much smaller after the timer is initialized.
> >>
> >> No users of get_ticks which I checked handle errors of this kind. Further,
> >> functions like tick_to_time mangle the result of get_ticks, making it very
> >> unlikely that one could check for an error without suggesting a patch such
> >> as this one.
> >>
> >> This patch changes get_ticks to always return 0 when there is an error.
> >> 0 is the least unsigned integer, ensuring get_ticks appears monotonic. This
> >> has the side effect of time apparently not passing until the timer is
> >> initialized. However, without this patch, time does not pass anyway,
> >> because the error value is likely to be the same.
> >>
> >> Fixes: c8a7ba9e6a5
> >> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> >> ---
> >>
> >>  lib/time.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > Would it be better to panic so people can fix the bug?
>
> I thought this was expected behavior. It's only a bug if you do
> something like udelay before any timers are created. We just can't
> report errors through get_ticks, because its users assume that it always
> returns a time of some kind.

I think it indicates a bug. If you use a device before it is ready you
don't really know what it will do. I worry that this patch is just
going to cause confusion, since the behaviour depends on when you call
it. If we panic, people can figure out why the timer is being inited
too late, or being used too early.

Regards,
Simon
Sean Anderson Sept. 7, 2020, 3:51 p.m. UTC | #4
On 9/7/20 9:57 AM, Simon Glass wrote:
> Hi Sean,
> 
> On Sun, 6 Sep 2020 at 20:02, Sean Anderson <seanga2@gmail.com> wrote:
>>
>> On 9/6/20 9:43 PM, Simon Glass wrote:
>>> Hi Sean,
>>>
>>> On Tue, 1 Sep 2020 at 13:56, Sean Anderson <seanga2@gmail.com> wrote:
>>>>
>>>> get_ticks does not always succeed. Sometimes it can be called before the
>>>> timer has been initialized. If it does, it returns a negative errno.
>>>> This causes the timer to appear non-monotonic, because the value will
>>>> become much smaller after the timer is initialized.
>>>>
>>>> No users of get_ticks which I checked handle errors of this kind. Further,
>>>> functions like tick_to_time mangle the result of get_ticks, making it very
>>>> unlikely that one could check for an error without suggesting a patch such
>>>> as this one.
>>>>
>>>> This patch changes get_ticks to always return 0 when there is an error.
>>>> 0 is the least unsigned integer, ensuring get_ticks appears monotonic. This
>>>> has the side effect of time apparently not passing until the timer is
>>>> initialized. However, without this patch, time does not pass anyway,
>>>> because the error value is likely to be the same.
>>>>
>>>> Fixes: c8a7ba9e6a5
>>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>>>> ---
>>>>
>>>>  lib/time.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> Would it be better to panic so people can fix the bug?
>>
>> I thought this was expected behavior. It's only a bug if you do
>> something like udelay before any timers are created. We just can't
>> report errors through get_ticks, because its users assume that it always
>> returns a time of some kind.
> 
> I think it indicates a bug. If you use a device before it is ready you
> don't really know what it will do. I worry that this patch is just
> going to cause confusion, since the behaviour depends on when you call
> it. If we panic, people can figure out why the timer is being inited
> too late, or being used too early.

Hm, maybe. I don't think it's as clear cut as "us[ing] a device before
it is ready," because get_ticks tries to initialize the timer if it
isn't already initialized. Unless someone else does it first, the first
call to get_ticks will always be before the timer is initialized.

The specific problem I ran into was that after relocation, the watchdog
may be initialized before the timer. This occurs on RISC-V because
without [1] a timer only exists after arch_early_init_r. So, for the
first few calls to watchdog_reset there is no timer.
 
The second return could probably be turned into a panic. I checked, and
all current timer drivers always succeed in getting the time (except for
the RISC-V timer, which is fixed in [1]), so the only way for
timer_get_count to fail is if timer_ops.get_count doesn't exist. That is
almost certainly an error on the driver author's part, so I think
panicking there is the only reasonable option.

(Does get_count even need to have a return value? I think it's
reasonable to always expect the timer to return a value.)

--Sean

[1] https://patchwork.ozlabs.org/project/uboot/list/?series=198797
Simon Glass Sept. 8, 2020, 11:56 p.m. UTC | #5
Hi Sean,

On Mon, 7 Sep 2020 at 09:51, Sean Anderson <seanga2@gmail.com> wrote:
>
> On 9/7/20 9:57 AM, Simon Glass wrote:
> > Hi Sean,
> >
> > On Sun, 6 Sep 2020 at 20:02, Sean Anderson <seanga2@gmail.com> wrote:
> >>
> >> On 9/6/20 9:43 PM, Simon Glass wrote:
> >>> Hi Sean,
> >>>
> >>> On Tue, 1 Sep 2020 at 13:56, Sean Anderson <seanga2@gmail.com> wrote:
> >>>>
> >>>> get_ticks does not always succeed. Sometimes it can be called before the
> >>>> timer has been initialized. If it does, it returns a negative errno.
> >>>> This causes the timer to appear non-monotonic, because the value will
> >>>> become much smaller after the timer is initialized.
> >>>>
> >>>> No users of get_ticks which I checked handle errors of this kind. Further,
> >>>> functions like tick_to_time mangle the result of get_ticks, making it very
> >>>> unlikely that one could check for an error without suggesting a patch such
> >>>> as this one.
> >>>>
> >>>> This patch changes get_ticks to always return 0 when there is an error.
> >>>> 0 is the least unsigned integer, ensuring get_ticks appears monotonic. This
> >>>> has the side effect of time apparently not passing until the timer is
> >>>> initialized. However, without this patch, time does not pass anyway,
> >>>> because the error value is likely to be the same.
> >>>>
> >>>> Fixes: c8a7ba9e6a5
> >>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> >>>> ---
> >>>>
> >>>>  lib/time.c | 4 ++--
> >>>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> Would it be better to panic so people can fix the bug?
> >>
> >> I thought this was expected behavior. It's only a bug if you do
> >> something like udelay before any timers are created. We just can't
> >> report errors through get_ticks, because its users assume that it always
> >> returns a time of some kind.
> >
> > I think it indicates a bug. If you use a device before it is ready you
> > don't really know what it will do. I worry that this patch is just
> > going to cause confusion, since the behaviour depends on when you call
> > it. If we panic, people can figure out why the timer is being inited
> > too late, or being used too early.
>
> Hm, maybe. I don't think it's as clear cut as "us[ing] a device before
> it is ready," because get_ticks tries to initialize the timer if it
> isn't already initialized. Unless someone else does it first, the first
> call to get_ticks will always be before the timer is initialized.
>
> The specific problem I ran into was that after relocation, the watchdog
> may be initialized before the timer. This occurs on RISC-V because
> without [1] a timer only exists after arch_early_init_r. So, for the
> first few calls to watchdog_reset there is no timer.
>
> The second return could probably be turned into a panic. I checked, and
> all current timer drivers always succeed in getting the time (except for
> the RISC-V timer, which is fixed in [1]), so the only way for
> timer_get_count to fail is if timer_ops.get_count doesn't exist. That is
> almost certainly an error on the driver author's part, so I think
> panicking there is the only reasonable option.

OK good, let's do that and update docs in timer.h

>
> (Does get_count even need to have a return value? I think it's
> reasonable to always expect the timer to return a value.)

I saw your patch, seems OK.

Regards,
Simon

>
> --Sean
>
> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=198797
Sean Anderson Sept. 8, 2020, 11:59 p.m. UTC | #6
On 9/8/20 7:56 PM, Simon Glass wrote:
> Hi Sean,
> 
> On Mon, 7 Sep 2020 at 09:51, Sean Anderson <seanga2@gmail.com> wrote:
>>
>> On 9/7/20 9:57 AM, Simon Glass wrote:
>>> Hi Sean,
>>>
>>> On Sun, 6 Sep 2020 at 20:02, Sean Anderson <seanga2@gmail.com> wrote:
>>>>
>>>> On 9/6/20 9:43 PM, Simon Glass wrote:
>>>>> Hi Sean,
>>>>>
>>>>> On Tue, 1 Sep 2020 at 13:56, Sean Anderson <seanga2@gmail.com> wrote:
>>>>>>
>>>>>> get_ticks does not always succeed. Sometimes it can be called before the
>>>>>> timer has been initialized. If it does, it returns a negative errno.
>>>>>> This causes the timer to appear non-monotonic, because the value will
>>>>>> become much smaller after the timer is initialized.
>>>>>>
>>>>>> No users of get_ticks which I checked handle errors of this kind. Further,
>>>>>> functions like tick_to_time mangle the result of get_ticks, making it very
>>>>>> unlikely that one could check for an error without suggesting a patch such
>>>>>> as this one.
>>>>>>
>>>>>> This patch changes get_ticks to always return 0 when there is an error.
>>>>>> 0 is the least unsigned integer, ensuring get_ticks appears monotonic. This
>>>>>> has the side effect of time apparently not passing until the timer is
>>>>>> initialized. However, without this patch, time does not pass anyway,
>>>>>> because the error value is likely to be the same.
>>>>>>
>>>>>> Fixes: c8a7ba9e6a5
>>>>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>>>>>> ---
>>>>>>
>>>>>>  lib/time.c | 4 ++--
>>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> Would it be better to panic so people can fix the bug?
>>>>
>>>> I thought this was expected behavior. It's only a bug if you do
>>>> something like udelay before any timers are created. We just can't
>>>> report errors through get_ticks, because its users assume that it always
>>>> returns a time of some kind.
>>>
>>> I think it indicates a bug. If you use a device before it is ready you
>>> don't really know what it will do. I worry that this patch is just
>>> going to cause confusion, since the behaviour depends on when you call
>>> it. If we panic, people can figure out why the timer is being inited
>>> too late, or being used too early.
>>
>> Hm, maybe. I don't think it's as clear cut as "us[ing] a device before
>> it is ready," because get_ticks tries to initialize the timer if it
>> isn't already initialized. Unless someone else does it first, the first
>> call to get_ticks will always be before the timer is initialized.
>>
>> The specific problem I ran into was that after relocation, the watchdog
>> may be initialized before the timer. This occurs on RISC-V because
>> without [1] a timer only exists after arch_early_init_r. So, for the
>> first few calls to watchdog_reset there is no timer.
>>
>> The second return could probably be turned into a panic. I checked, and
>> all current timer drivers always succeed in getting the time (except for
>> the RISC-V timer, which is fixed in [1]), so the only way for
>> timer_get_count to fail is if timer_ops.get_count doesn't exist. That is
>> almost certainly an error on the driver author's part, so I think
>> panicking there is the only reasonable option.
> 
> OK good, let's do that and update docs in timer.h

That being to panic both times, or just panic the second time?

--Sean
Simon Glass Sept. 9, 2020, 12:01 a.m. UTC | #7
Hi Sean,

On Tue, 8 Sep 2020 at 17:59, Sean Anderson <seanga2@gmail.com> wrote:
>
> On 9/8/20 7:56 PM, Simon Glass wrote:
> > Hi Sean,
> >
> > On Mon, 7 Sep 2020 at 09:51, Sean Anderson <seanga2@gmail.com> wrote:
> >>
> >> On 9/7/20 9:57 AM, Simon Glass wrote:
> >>> Hi Sean,
> >>>
> >>> On Sun, 6 Sep 2020 at 20:02, Sean Anderson <seanga2@gmail.com> wrote:
> >>>>
> >>>> On 9/6/20 9:43 PM, Simon Glass wrote:
> >>>>> Hi Sean,
> >>>>>
> >>>>> On Tue, 1 Sep 2020 at 13:56, Sean Anderson <seanga2@gmail.com> wrote:
> >>>>>>
> >>>>>> get_ticks does not always succeed. Sometimes it can be called before the
> >>>>>> timer has been initialized. If it does, it returns a negative errno.
> >>>>>> This causes the timer to appear non-monotonic, because the value will
> >>>>>> become much smaller after the timer is initialized.
> >>>>>>
> >>>>>> No users of get_ticks which I checked handle errors of this kind. Further,
> >>>>>> functions like tick_to_time mangle the result of get_ticks, making it very
> >>>>>> unlikely that one could check for an error without suggesting a patch such
> >>>>>> as this one.
> >>>>>>
> >>>>>> This patch changes get_ticks to always return 0 when there is an error.
> >>>>>> 0 is the least unsigned integer, ensuring get_ticks appears monotonic. This
> >>>>>> has the side effect of time apparently not passing until the timer is
> >>>>>> initialized. However, without this patch, time does not pass anyway,
> >>>>>> because the error value is likely to be the same.
> >>>>>>
> >>>>>> Fixes: c8a7ba9e6a5
> >>>>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> >>>>>> ---
> >>>>>>
> >>>>>>  lib/time.c | 4 ++--
> >>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> Would it be better to panic so people can fix the bug?
> >>>>
> >>>> I thought this was expected behavior. It's only a bug if you do
> >>>> something like udelay before any timers are created. We just can't
> >>>> report errors through get_ticks, because its users assume that it always
> >>>> returns a time of some kind.
> >>>
> >>> I think it indicates a bug. If you use a device before it is ready you
> >>> don't really know what it will do. I worry that this patch is just
> >>> going to cause confusion, since the behaviour depends on when you call
> >>> it. If we panic, people can figure out why the timer is being inited
> >>> too late, or being used too early.
> >>
> >> Hm, maybe. I don't think it's as clear cut as "us[ing] a device before
> >> it is ready," because get_ticks tries to initialize the timer if it
> >> isn't already initialized. Unless someone else does it first, the first
> >> call to get_ticks will always be before the timer is initialized.
> >>
> >> The specific problem I ran into was that after relocation, the watchdog
> >> may be initialized before the timer. This occurs on RISC-V because
> >> without [1] a timer only exists after arch_early_init_r. So, for the
> >> first few calls to watchdog_reset there is no timer.
> >>
> >> The second return could probably be turned into a panic. I checked, and
> >> all current timer drivers always succeed in getting the time (except for
> >> the RISC-V timer, which is fixed in [1]), so the only way for
> >> timer_get_count to fail is if timer_ops.get_count doesn't exist. That is
> >> almost certainly an error on the driver author's part, so I think
> >> panicking there is the only reasonable option.
> >
> > OK good, let's do that and update docs in timer.h
>
> That being to panic both times, or just panic the second time?

Well I like a panic if the call is invalid, ie. in both cases.

Regards,
Simon
Sean Anderson Sept. 9, 2020, 12:02 a.m. UTC | #8
On 9/8/20 8:01 PM, Simon Glass wrote:
> Hi Sean,
> 
> On Tue, 8 Sep 2020 at 17:59, Sean Anderson <seanga2@gmail.com> wrote:
>>
>> On 9/8/20 7:56 PM, Simon Glass wrote:
>>> Hi Sean,
>>>
>>> On Mon, 7 Sep 2020 at 09:51, Sean Anderson <seanga2@gmail.com> wrote:
>>>>
>>>> On 9/7/20 9:57 AM, Simon Glass wrote:
>>>>> Hi Sean,
>>>>>
>>>>> On Sun, 6 Sep 2020 at 20:02, Sean Anderson <seanga2@gmail.com> wrote:
>>>>>>
>>>>>> On 9/6/20 9:43 PM, Simon Glass wrote:
>>>>>>> Hi Sean,
>>>>>>>
>>>>>>> On Tue, 1 Sep 2020 at 13:56, Sean Anderson <seanga2@gmail.com> wrote:
>>>>>>>>
>>>>>>>> get_ticks does not always succeed. Sometimes it can be called before the
>>>>>>>> timer has been initialized. If it does, it returns a negative errno.
>>>>>>>> This causes the timer to appear non-monotonic, because the value will
>>>>>>>> become much smaller after the timer is initialized.
>>>>>>>>
>>>>>>>> No users of get_ticks which I checked handle errors of this kind. Further,
>>>>>>>> functions like tick_to_time mangle the result of get_ticks, making it very
>>>>>>>> unlikely that one could check for an error without suggesting a patch such
>>>>>>>> as this one.
>>>>>>>>
>>>>>>>> This patch changes get_ticks to always return 0 when there is an error.
>>>>>>>> 0 is the least unsigned integer, ensuring get_ticks appears monotonic. This
>>>>>>>> has the side effect of time apparently not passing until the timer is
>>>>>>>> initialized. However, without this patch, time does not pass anyway,
>>>>>>>> because the error value is likely to be the same.
>>>>>>>>
>>>>>>>> Fixes: c8a7ba9e6a5
>>>>>>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>>>>>>>> ---
>>>>>>>>
>>>>>>>>  lib/time.c | 4 ++--
>>>>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> Would it be better to panic so people can fix the bug?
>>>>>>
>>>>>> I thought this was expected behavior. It's only a bug if you do
>>>>>> something like udelay before any timers are created. We just can't
>>>>>> report errors through get_ticks, because its users assume that it always
>>>>>> returns a time of some kind.
>>>>>
>>>>> I think it indicates a bug. If you use a device before it is ready you
>>>>> don't really know what it will do. I worry that this patch is just
>>>>> going to cause confusion, since the behaviour depends on when you call
>>>>> it. If we panic, people can figure out why the timer is being inited
>>>>> too late, or being used too early.
>>>>
>>>> Hm, maybe. I don't think it's as clear cut as "us[ing] a device before
>>>> it is ready," because get_ticks tries to initialize the timer if it
>>>> isn't already initialized. Unless someone else does it first, the first
>>>> call to get_ticks will always be before the timer is initialized.
>>>>
>>>> The specific problem I ran into was that after relocation, the watchdog
>>>> may be initialized before the timer. This occurs on RISC-V because
>>>> without [1] a timer only exists after arch_early_init_r. So, for the
>>>> first few calls to watchdog_reset there is no timer.
>>>>
>>>> The second return could probably be turned into a panic. I checked, and
>>>> all current timer drivers always succeed in getting the time (except for
>>>> the RISC-V timer, which is fixed in [1]), so the only way for
>>>> timer_get_count to fail is if timer_ops.get_count doesn't exist. That is
>>>> almost certainly an error on the driver author's part, so I think
>>>> panicking there is the only reasonable option.
>>>
>>> OK good, let's do that and update docs in timer.h
>>
>> That being to panic both times, or just panic the second time?
> 
> Well I like a panic if the call is invalid, ie. in both cases.

Ok, sounds good to me.

--Sean
diff mbox series

Patch

diff --git a/lib/time.c b/lib/time.c
index 47f8c84327..45750b6d36 100644
--- a/lib/time.c
+++ b/lib/time.c
@@ -91,13 +91,13 @@  uint64_t notrace get_ticks(void)
 
 		ret = dm_timer_init();
 		if (ret)
-			return ret;
+			return 0;
 #endif
 	}
 
 	ret = timer_get_count(gd->timer, &count);
 	if (ret)
-		return ret;
+		return 0;
 
 	return count;
 }