Message ID | 20180521164222.149896-1-briannorris@chromium.org |
---|---|
State | Rejected |
Headers | show |
Series | rtc: report time-retrieval errors when updating alarm | expand |
Hello Brian, On 21/05/2018 09:42:22-0700, Brian Norris wrote: > __rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium). > When it does, we don't report the error, but instead calculate a > 1-second alarm based on the potentially-garbage 'tm' (in practice, > __rtc_read_time() zeroes out the time first, so it's likely to still be > all 0). > > Let's propagate the error instead. > I submitted https://lore.kernel.org/linux-rtc/20191021155631.3342-2-alexandre.belloni@bootlin.com/T/#u to solve this after looking at all the implication checking the return value of __rtc_read_time had. > Signed-off-by: Brian Norris <briannorris@chromium.org> > --- > drivers/rtc/interface.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c > index 7cbdc9228dd5..a4bdd8b5fe2e 100644 > --- a/drivers/rtc/interface.c > +++ b/drivers/rtc/interface.c > @@ -555,7 +555,9 @@ int rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled) > struct rtc_time tm; > ktime_t now, onesec; > > - __rtc_read_time(rtc, &tm); > + err = __rtc_read_time(rtc, &tm); > + if (err) > + goto out; > onesec = ktime_set(1, 0); > now = rtc_tm_to_ktime(tm); > rtc->uie_rtctimer.node.expires = ktime_add(now, onesec); > -- > 2.17.0.441.gb46fe60e1d-goog >
Hi Alexandre! On Mon, Oct 21, 2019 at 9:11 AM Alexandre Belloni <alexandre.belloni@bootlin.com> wrote: > On 21/05/2018 09:42:22-0700, Brian Norris wrote: > > __rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium). > > When it does, we don't report the error, but instead calculate a > > 1-second alarm based on the potentially-garbage 'tm' (in practice, > > __rtc_read_time() zeroes out the time first, so it's likely to still be > > all 0). > > > > Let's propagate the error instead. > > > > I submitted > https://lore.kernel.org/linux-rtc/20191021155631.3342-2-alexandre.belloni@bootlin.com/T/#u > to solve this after looking at all the implication checking the return > value of __rtc_read_time had. Only about a year and a half late, nice! Fortunately we have a decent (albeit time-consuming) process for rebasing our downstream patches in Chrome OS kernels... Brian
On 21/10/2019 10:20:08-0700, Brian Norris wrote: > Hi Alexandre! > > On Mon, Oct 21, 2019 at 9:11 AM Alexandre Belloni > <alexandre.belloni@bootlin.com> wrote: > > On 21/05/2018 09:42:22-0700, Brian Norris wrote: > > > __rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium). > > > When it does, we don't report the error, but instead calculate a > > > 1-second alarm based on the potentially-garbage 'tm' (in practice, > > > __rtc_read_time() zeroes out the time first, so it's likely to still be > > > all 0). > > > > > > Let's propagate the error instead. > > > > > > > I submitted > > https://lore.kernel.org/linux-rtc/20191021155631.3342-2-alexandre.belloni@bootlin.com/T/#u > > to solve this after looking at all the implication checking the return > > value of __rtc_read_time had. > > Only about a year and a half late, nice! I know, right? :) The fact is that this is not a common issue or at least, I didn't have any report that this was causing real problems in the field. So it ended up being one of the longest standing patch in patchwork... >Fortunately we have a decent > (albeit time-consuming) process for rebasing our downstream patches in > Chrome OS kernels... > Do you need that patch backported to LTS kernels?
On Mon, Oct 21, 2019 at 10:48 AM Alexandre Belloni <alexandre.belloni@bootlin.com> wrote: > On 21/10/2019 10:20:08-0700, Brian Norris wrote: > > Hi Alexandre! > > > > On Mon, Oct 21, 2019 at 9:11 AM Alexandre Belloni > > <alexandre.belloni@bootlin.com> wrote: > > > On 21/05/2018 09:42:22-0700, Brian Norris wrote: > > > > __rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium). > > > > When it does, we don't report the error, but instead calculate a > > > > 1-second alarm based on the potentially-garbage 'tm' (in practice, > > > > __rtc_read_time() zeroes out the time first, so it's likely to still be > > > > all 0). > > > > > > > > Let's propagate the error instead. > > > > > > > > > > I submitted > > > https://lore.kernel.org/linux-rtc/20191021155631.3342-2-alexandre.belloni@bootlin.com/T/#u > > > to solve this after looking at all the implication checking the return > > > value of __rtc_read_time had. > > > > Only about a year and a half late, nice! > > I know, right? :) The fact is that this is not a common issue or at > least, I didn't have any report that this was causing real problems in > the field. So it ended up being one of the longest standing patch in > patchwork... I suppose I could have put specific examples in here: the Rockchip RK3399-based Gru family of Chromebooks (arch/arm64/boot/dts/rockchip/rk3399-gru-{kevin,bob,scarlet}*.dts) use the Chrome OS EC-based RTC (drivers/rtc/rtc-cros-ec.c), and the EC protocol is prone to occasional errors. So we definitely have a concrete case where this problem can be tickled. I guess I was too terse in summarizing that as "if the RTC uses an unreliable medium"? As for the actual symptoms: this was part of a variety of problems that resulted in seeing interrupt storms from our EC/RTC when running `hwclock -r`. I believe there were other patches that were more critical to resolving the worst symptoms, but this error was noticed along the way. If you care to read more, you can see our downstream kernel patches here, when we first handled this problem: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1067442 https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1066984 https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1069546 Unfortunately, the bug links are private (they were dealing with partner/factory issues), so you can only glean the implicit information from the code. And since this was over a year ago, my memory is a little fuzzy on what exactly the source of the interrupt storm was... > >Fortunately we have a decent > > (albeit time-consuming) process for rebasing our downstream patches in > > Chrome OS kernels... > > > > Do you need that patch backported to LTS kernels? Eh, I dunno. If anything that'll just cause us merge troubles (but not too much) on our Chrome OS kernels, which already carry my patch. But if there are any non-Chrome-OS users of these Chromebooks (there are) that are seeing this problem (I'm not sure), they might appreciate it. By the way, I wonder if your patch actually deserves a "Reported-by". I suppose I also left off Jeffy as the reporter, but it would be: Reported-by: Jeffy Chen <jeffy.chen@rock-chips.com> Reported-by: Brian Norris <briannorris@chromium.org> Brian
diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c index 7cbdc9228dd5..a4bdd8b5fe2e 100644 --- a/drivers/rtc/interface.c +++ b/drivers/rtc/interface.c @@ -555,7 +555,9 @@ int rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled) struct rtc_time tm; ktime_t now, onesec; - __rtc_read_time(rtc, &tm); + err = __rtc_read_time(rtc, &tm); + if (err) + goto out; onesec = ktime_set(1, 0); now = rtc_tm_to_ktime(tm); rtc->uie_rtctimer.node.expires = ktime_add(now, onesec);
__rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium). When it does, we don't report the error, but instead calculate a 1-second alarm based on the potentially-garbage 'tm' (in practice, __rtc_read_time() zeroes out the time first, so it's likely to still be all 0). Let's propagate the error instead. Signed-off-by: Brian Norris <briannorris@chromium.org> --- drivers/rtc/interface.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)