rtc: report time-retrieval errors when updating alarm
diff mbox series

Message ID 20180521164222.149896-1-briannorris@chromium.org
State Rejected
Headers show
Series
  • rtc: report time-retrieval errors when updating alarm
Related show

Commit Message

Brian Norris May 21, 2018, 4:42 p.m. UTC
__rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium).
When it does, we don't report the error, but instead calculate a
1-second alarm based on the potentially-garbage 'tm' (in practice,
__rtc_read_time() zeroes out the time first, so it's likely to still be
all 0).

Let's propagate the error instead.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---
 drivers/rtc/interface.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Alexandre Belloni Oct. 21, 2019, 4:11 p.m. UTC | #1
Hello Brian,

On 21/05/2018 09:42:22-0700, Brian Norris wrote:
> __rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium).
> When it does, we don't report the error, but instead calculate a
> 1-second alarm based on the potentially-garbage 'tm' (in practice,
> __rtc_read_time() zeroes out the time first, so it's likely to still be
> all 0).
> 
> Let's propagate the error instead.
> 

I submitted
https://lore.kernel.org/linux-rtc/20191021155631.3342-2-alexandre.belloni@bootlin.com/T/#u
to solve this after looking at all the implication checking the return
value of __rtc_read_time had.

> Signed-off-by: Brian Norris <briannorris@chromium.org>
> ---
>  drivers/rtc/interface.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
> index 7cbdc9228dd5..a4bdd8b5fe2e 100644
> --- a/drivers/rtc/interface.c
> +++ b/drivers/rtc/interface.c
> @@ -555,7 +555,9 @@ int rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled)
>  		struct rtc_time tm;
>  		ktime_t now, onesec;
>  
> -		__rtc_read_time(rtc, &tm);
> +		err = __rtc_read_time(rtc, &tm);
> +		if (err)
> +			goto out;
>  		onesec = ktime_set(1, 0);
>  		now = rtc_tm_to_ktime(tm);
>  		rtc->uie_rtctimer.node.expires = ktime_add(now, onesec);
> -- 
> 2.17.0.441.gb46fe60e1d-goog
>
Brian Norris Oct. 21, 2019, 5:20 p.m. UTC | #2
Hi Alexandre!

On Mon, Oct 21, 2019 at 9:11 AM Alexandre Belloni
<alexandre.belloni@bootlin.com> wrote:
> On 21/05/2018 09:42:22-0700, Brian Norris wrote:
> > __rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium).
> > When it does, we don't report the error, but instead calculate a
> > 1-second alarm based on the potentially-garbage 'tm' (in practice,
> > __rtc_read_time() zeroes out the time first, so it's likely to still be
> > all 0).
> >
> > Let's propagate the error instead.
> >
>
> I submitted
> https://lore.kernel.org/linux-rtc/20191021155631.3342-2-alexandre.belloni@bootlin.com/T/#u
> to solve this after looking at all the implication checking the return
> value of __rtc_read_time had.

Only about a year and a half late, nice! Fortunately we have a decent
(albeit time-consuming) process for rebasing our downstream patches in
Chrome OS kernels...

Brian
Alexandre Belloni Oct. 21, 2019, 5:48 p.m. UTC | #3
On 21/10/2019 10:20:08-0700, Brian Norris wrote:
> Hi Alexandre!
> 
> On Mon, Oct 21, 2019 at 9:11 AM Alexandre Belloni
> <alexandre.belloni@bootlin.com> wrote:
> > On 21/05/2018 09:42:22-0700, Brian Norris wrote:
> > > __rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium).
> > > When it does, we don't report the error, but instead calculate a
> > > 1-second alarm based on the potentially-garbage 'tm' (in practice,
> > > __rtc_read_time() zeroes out the time first, so it's likely to still be
> > > all 0).
> > >
> > > Let's propagate the error instead.
> > >
> >
> > I submitted
> > https://lore.kernel.org/linux-rtc/20191021155631.3342-2-alexandre.belloni@bootlin.com/T/#u
> > to solve this after looking at all the implication checking the return
> > value of __rtc_read_time had.
> 
> Only about a year and a half late, nice!

I know, right? :) The fact is that this is not a common issue or at
least, I didn't have any report that this was causing real problems in
the field. So it ended up being one of the longest standing patch in
patchwork...

>Fortunately we have a decent
> (albeit time-consuming) process for rebasing our downstream patches in
> Chrome OS kernels...
> 

Do you need that patch backported to LTS kernels?
Brian Norris Oct. 21, 2019, 6:08 p.m. UTC | #4
On Mon, Oct 21, 2019 at 10:48 AM Alexandre Belloni
<alexandre.belloni@bootlin.com> wrote:
> On 21/10/2019 10:20:08-0700, Brian Norris wrote:
> > Hi Alexandre!
> >
> > On Mon, Oct 21, 2019 at 9:11 AM Alexandre Belloni
> > <alexandre.belloni@bootlin.com> wrote:
> > > On 21/05/2018 09:42:22-0700, Brian Norris wrote:
> > > > __rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium).
> > > > When it does, we don't report the error, but instead calculate a
> > > > 1-second alarm based on the potentially-garbage 'tm' (in practice,
> > > > __rtc_read_time() zeroes out the time first, so it's likely to still be
> > > > all 0).
> > > >
> > > > Let's propagate the error instead.
> > > >
> > >
> > > I submitted
> > > https://lore.kernel.org/linux-rtc/20191021155631.3342-2-alexandre.belloni@bootlin.com/T/#u
> > > to solve this after looking at all the implication checking the return
> > > value of __rtc_read_time had.
> >
> > Only about a year and a half late, nice!
>
> I know, right? :) The fact is that this is not a common issue or at
> least, I didn't have any report that this was causing real problems in
> the field. So it ended up being one of the longest standing patch in
> patchwork...

I suppose I could have put specific examples in here: the Rockchip
RK3399-based Gru family of Chromebooks
(arch/arm64/boot/dts/rockchip/rk3399-gru-{kevin,bob,scarlet}*.dts) use
the Chrome OS EC-based RTC (drivers/rtc/rtc-cros-ec.c), and the EC
protocol is prone to occasional errors. So we definitely have a
concrete case where this problem can be tickled. I guess I was too
terse in summarizing that as "if the RTC uses an unreliable medium"?

As for the actual symptoms: this was part of a variety of problems
that resulted in seeing interrupt storms from our EC/RTC when running
`hwclock -r`. I believe there were other patches that were more
critical to resolving the worst symptoms, but this error was noticed
along the way. If you care to read more, you can see our downstream
kernel patches here, when we first handled this problem:

https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1067442
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1066984
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1069546

Unfortunately, the bug links are private (they were dealing with
partner/factory issues), so you can only glean the implicit
information from the code. And since this was over a year ago, my
memory is a little fuzzy on what exactly the source of the interrupt
storm was...

> >Fortunately we have a decent
> > (albeit time-consuming) process for rebasing our downstream patches in
> > Chrome OS kernels...
> >
>
> Do you need that patch backported to LTS kernels?

Eh, I dunno. If anything that'll just cause us merge troubles (but not
too much) on our Chrome OS kernels, which already carry my patch. But
if there are any non-Chrome-OS users of these Chromebooks (there are)
that are seeing this problem (I'm not sure), they might appreciate it.

By the way, I wonder if your patch actually deserves a "Reported-by".
I suppose I also left off Jeffy as the reporter, but it would be:

Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Reported-by: Brian Norris <briannorris@chromium.org>

Brian

Patch
diff mbox series

diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
index 7cbdc9228dd5..a4bdd8b5fe2e 100644
--- a/drivers/rtc/interface.c
+++ b/drivers/rtc/interface.c
@@ -555,7 +555,9 @@  int rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled)
 		struct rtc_time tm;
 		ktime_t now, onesec;
 
-		__rtc_read_time(rtc, &tm);
+		err = __rtc_read_time(rtc, &tm);
+		if (err)
+			goto out;
 		onesec = ktime_set(1, 0);
 		now = rtc_tm_to_ktime(tm);
 		rtc->uie_rtctimer.node.expires = ktime_add(now, onesec);