Message ID | 87tur3fx7w.fsf@nanos.tec.linutronix.de |
---|---|
State | Not Applicable |
Series | [V2] rtc: mc146818: Detect and handle broken RTCs |
Thomas Gleixner <tglx@linutronix.de> writes:

> The recent fix for handling the UIP bit unearthed another issue in the RTC
> code. If the RTC is advertised but the readout is straight 0xFF because
> it's not available, the old code just proceeded with crappy values, but the
> new code hangs because it waits for the UIP bit to become low.
>
> Add a sanity check in the RTC CMOS probe function which reads the RTC_VALID
> register (Register D) which should have bit 0-6 cleared. If that's not the
> case then fail to register the CMOS.
>
> Add the same check to mc146818_get_time(), warn once when the condition
> is true and invalidate the rtc_time data.

In case it is helpful: on my hardware this patch triggers a warning
(attached below). Without it the rtc messages look like this:

[ 2.783386] rtc_cmos 00:01: RTC can wake from S4
[ 2.784302] rtc_cmos 00:01: registered as rtc0
[ 2.785036] rtc_cmos 00:01: setting system clock to 2021-01-31T10:13:40 UTC (1612088020)
[ 2.785713] rtc_cmos 00:01: alarms up to one month, y3k, 114 bytes nvram, hpet irqs

Dirk

[ 7.258410] ------------[ cut here ]------------
[ 7.258414] WARNING: CPU: 2 PID: 0 at drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x2b/0x1e5
[ 7.258420] Modules linked in: iwlmvm(+) mac80211 iwlwifi sdhci_pci amdgpu(+) drm_ttm_helper cfg80211 ttm cqhci gpu_sched sdhci ccp thinkpad_acpi(+) rng_core nvram tpm_tis(+) tpm_tis_core wmi tpm pinctrl_amd
[ 7.258432] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 5.11.0-rc5-next-20210129-x86_64 #180
[ 7.258434] Hardware name: LENOVO 20U50008GE/20U50008GE, BIOS R19ET26W (1.10 ) 06/22/2020
[ 7.258435] RIP: 0010:mc146818_get_time+0x2b/0x1e5
[ 7.258437] Code: 56 41 55 45 31 ed 41 54 55 53 48 89 fb 48 c7 c7 bc d9 eb 82 e8 26 d8 36 00 bf 0d 00 00 00 48 89 c5 e8 6d d1 8f ff a8 7f 74 24 <0f> 0b 48 c7 c7 bc d9 eb 82 48 89 ee e8 bc d6 36 00 b0 ff b9 24 00
[ 7.258438] RSP: 0018:ffffc9000022cef0 EFLAGS: 00010002
[ 7.258440] RAX: 0000000000000031 RBX: ffffc9000022cf24 RCX: 0000000000000000
[ 7.258441] RDX: 0000000000000001 RSI: ffff888105607000 RDI: 000000000000000d
[ 7.258441] RBP: 0000000000000046 R08: ffffc9000022cf24 R09: 0000000000000000
[ 7.258442] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888105607000
[ 7.258443] R13: 0000000000000000 R14: ffffc9000022cfa4 R15: 0000000000000000
[ 7.258444] FS: 0000000000000000(0000) GS:ffff88840ec80000(0000) knlGS:0000000000000000
[ 7.258445] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.258446] CR2: 00007f2ed26c4160 CR3: 000000000480a000 CR4: 0000000000350ee0
[ 7.258447] Call Trace:
[ 7.258449] <IRQ>
[ 7.258450] hpet_rtc_interrupt+0xd3/0x1a3
[ 7.258454] __handle_irq_event_percpu+0x6b/0x12e
[ 7.258457] handle_irq_event_percpu+0x2c/0x6f
[ 7.258459] handle_irq_event+0x23/0x43
[ 7.258461] handle_edge_irq+0x9e/0xbb
[ 7.258463] asm_call_irq_on_stack+0x12/0x20
[ 7.258467] </IRQ>
[ 7.258467] common_interrupt+0x9a/0x123
[ 7.258470] asm_common_interrupt+0x1e/0x40
[ 7.258472] RIP: 0010:cpuidle_enter_state+0x13e/0x1fe
[ 7.258475] Code: 49 89 c4 e8 bd fd ff ff 31 ff e8 3e 80 92 ff 45 84 ff 74 12 9c 58 0f ba e0 09 73 03 0f 0b fa 31 ff e8 13 16 96 ff fb 45 85 f6 <0f> 88 97 00 00 00 49 63 d6 4c 2b 24 24 48 6b ca 68 48 6b c2 30 4c
[ 7.258476] RSP: 0018:ffffc90000167eb0 EFLAGS: 00000206
[ 7.258477] RAX: ffff88840eca8240 RBX: ffff888101e0d400 RCX: 00000001b0a24b16
[ 7.258478] RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000000
[ 7.258478] RBP: 0000000000000003 R08: 00000000ffffffff R09: 0000000000000000
[ 7.258479] R10: ffff88810083c4a8 R11: 0000000000000000 R12: 00000001b0a24b48
[ 7.258480] R13: ffffffff8299cc60 R14: 0000000000000003 R15: 0000000000000000
[ 7.258482] cpuidle_enter+0x2b/0x37
[ 7.258483] do_idle+0x126/0x184
[ 7.258485] cpu_startup_entry+0x18/0x1a
[ 7.258486] secondary_startup_64_no_verify+0xb0/0xbb
[ 7.258489] ---[ end trace 9da59c3696ed99d8 ]---

> Reported-by: Mickaël Salaün <mic@digikod.net>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Mickaël Salaün <mic@linux.microsoft.com>
> ---
> V2: Fixed the sizeof() as spotted by Mickaël
> ---
>  drivers/rtc/rtc-cmos.c         |    8 ++++++++
>  drivers/rtc/rtc-mc146818-lib.c |    7 +++++++
>  2 files changed, 15 insertions(+)
>
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -805,6 +805,14 @@ cmos_do_probe(struct device *dev, struct
>
>  	spin_lock_irq(&rtc_lock);
>
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if ((CMOS_READ(RTC_VALID) & 0x7f) != 0) {
> +		spin_unlock_irq(&rtc_lock);
> +		dev_warn(dev, "not accessible\n");
> +		retval = -ENXIO;
> +		goto cleanup1;
> +	}
> +
>  	if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) {
>  		/* force periodic irq to CMOS reset default of 1024Hz;
>  		 *
> --- a/drivers/rtc/rtc-mc146818-lib.c
> +++ b/drivers/rtc/rtc-mc146818-lib.c
> @@ -21,6 +21,13 @@ unsigned int mc146818_get_time(struct rt
>
>  again:
>  	spin_lock_irqsave(&rtc_lock, flags);
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) {
> +		spin_unlock_irqrestore(&rtc_lock, flags);
> +		memset(time, 0xff, sizeof(*time));
> +		return 0;
> +	}
> +
>  	/*
>  	 * Check whether there is an update in progress during which the
>  	 * readout is unspecified. The maximum update time is ~2ms. Poll
Hi!

"Me too":

> --- a/drivers/rtc/rtc-mc146818-lib.c
> +++ b/drivers/rtc/rtc-mc146818-lib.c
> @@ -21,6 +21,13 @@ unsigned int mc146818_get_time(struct rt
>
>  again:
>  	spin_lock_irqsave(&rtc_lock, flags);
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) {
> +		spin_unlock_irqrestore(&rtc_lock, flags);
> +		memset(time, 0xff, sizeof(*time));
> +		return 0;
> +	}
> +

... triggers here on a different box (Xiaomi mi notebook air 12.5):

[ 3.524002] ------------[ cut here ]------------
[ 3.528317] WARNING: CPU: 3 PID: 273 at drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x1b6/0x210
[ 3.532558] CPU: 3 PID: 273 Comm: udevadm Not tainted 5.11.0-rc6 #760
[ 3.536748] Hardware name: Timi TM1612/TM1612, BIOS A04 08/06/2016
[ 3.540947] RIP: 0010:mc146818_get_time+0x1b6/0x210
[ 3.545103] Code: 76 0b 0f b6 d0 83 ea 13 6b d2 64 01 d5 83 fd 45 89 6b 14 7f 06 83 c5 64 89 6b 14 41 83 ed 01 b8 02 00 00 00 44 89 6b 10 eb 39 <0f> 0b 48 c7 c7 b4 e0 9e 82 48 89 ee e8 29 6b 34 00 48 c7 03 ff ff
[ 3.549883] RSP: 0000:ffffc900012efe30 EFLAGS: 00010002
[ 3.554387] RAX: 0000000000000081 RBX: ffffc900012efe64 RCX: 000000000005b8d7
[ 3.558867] RDX: 0000000000000001 RSI: ffff8881000aa000 RDI: 000000000000000d
[ 3.563333] RBP: 0000000000000046 R08: 0000000000000004 R09: fffffffe5e075ac6
[ 3.567748] iwlwifi 0000:02:00.0: Applying debug destination EXTERNAL_DRAM
[ 3.567822] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 3.567827] R13: ffffc900012efedc R14: 0000000000000008 R15: ffff888100051200
[ 3.577223] FS: 0000000000000000(0000) GS:ffff88816ad80000(0000) knlGS:0000000000000000
[ 3.579870] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.581947] CR2: 00007fface455e28 CR3: 0000000103244005 CR4: 00000000003706a0
[ 3.583836] Call Trace:
[ 3.585699] hpet_rtc_interrupt+0x1af/0x220
[ 3.587585] __handle_irq_event_percpu+0x5a/0xc0
[ 3.589230] handle_irq_event_percpu+0x1b/0x50
[ 3.590673] handle_irq_event+0x22/0x40
[ 3.592107] handle_edge_irq+0x6b/0x190
[ 3.593545] common_interrupt+0x67/0x130
[ 3.594983] ? asm_common_interrupt+0x8/0x40
[ 3.596432] asm_common_interrupt+0x1e/0x40
[ 3.597618] RIP: 0033:0x7ffaceac9b31
[ 3.598794] Code: 48 83 fe 0a 0f 87 f5 fe ff ff be 41 ff ff 6f 48 29 d6 48 89 04 f1 e9 e4 fe ff ff 48 85 ff 74 79 49 8b 44 24 60 48 85 c0 74 04 <48> 01 78 08 49 8b 44 24 58 48 85 c0 74 04 48 01 78 08 49 8b 44 24
[ 3.600048] RSP: 002b:00007ffc12303b00 EFLAGS: 00010202
[ 3.601343] RAX: 00007fface455e20 RBX: 000000006ffffdff RCX: 00007fface80c040
[ 3.602587] RDX: 0000000000000000 RSI: 0000000000000029 RDI: 00007fface451000
[ 3.603809] RBP: 00007ffc12303c50 R08: 000000006fffffff R09: 00000000effffef5
[ 3.605015] R10: 0000000070000022 R11: 0000000000000032 R12: 00007fface80c000
[ 3.606223] R13: 000000006ffffeff R14: 000000006ffffe35 R15: 00007ffc12303ce0
[ 3.607421] ---[ end trace 5922ddf43b0f7b83 ]---
[ 3.608692] hpet: Lost 3 RTC interrupts
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -805,6 +805,14 @@ cmos_do_probe(struct device *dev, struct
 
 	spin_lock_irq(&rtc_lock);
 
+	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
+	if ((CMOS_READ(RTC_VALID) & 0x7f) != 0) {
+		spin_unlock_irq(&rtc_lock);
+		dev_warn(dev, "not accessible\n");
+		retval = -ENXIO;
+		goto cleanup1;
+	}
+
 	if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) {
 		/* force periodic irq to CMOS reset default of 1024Hz;
 		 *
--- a/drivers/rtc/rtc-mc146818-lib.c
+++ b/drivers/rtc/rtc-mc146818-lib.c
@@ -21,6 +21,13 @@ unsigned int mc146818_get_time(struct rt
 
 again:
 	spin_lock_irqsave(&rtc_lock, flags);
+	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
+	if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) {
+		spin_unlock_irqrestore(&rtc_lock, flags);
+		memset(time, 0xff, sizeof(*time));
+		return 0;
+	}
+
 	/*
 	 * Check whether there is an update in progress during which the
 	 * readout is unspecified. The maximum update time is ~2ms. Poll
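Both hunks apply the same test: read Register D (`RTC_VALID`, index 0x0D) and require bits 0-6 to be zero; bit 7, which the MC146818 datasheet calls VRT, is masked off by `& 0x7f`. The hedged user-space sketch below models the two outcomes: the probe refusing the device, and the `mc146818_get_time()` fallback that fills the result with 0xff bytes. `fake_rtc_time`, `rtc_accessible()`, `get_time_sketch()` and the `read_*` functions are hypothetical; the real code holds `rtc_lock` and uses `CMOS_READ()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RTC_VALID 0x0D /* Register D index */

/* Hypothetical, trimmed-down stand-in for the kernel's struct rtc_time. */
struct fake_rtc_time {
	int tm_sec, tm_min, tm_hour, tm_mday, tm_mon, tm_year;
};

/* Hypothetical stand-in for CMOS_READ(). */
typedef uint8_t (*cmos_read_fn)(uint8_t reg);

/* Mirrors the patch's condition: bits 0-6 of Register D must read as 0. */
static bool rtc_accessible(cmos_read_fn read)
{
	assert(read != NULL);
	return (read(RTC_VALID) & 0x7f) == 0;
}

/* Mirrors the get_time() fallback: on a bad Register D, fill the result
 * with 0xff bytes so the returned time is invalidated. */
static bool get_time_sketch(cmos_read_fn read, struct fake_rtc_time *t)
{
	if (!rtc_accessible(read)) {
		memset(t, 0xff, sizeof(*t));
		return false;
	}
	/* A real driver would now poll UIP and read the time registers. */
	memset(t, 0, sizeof(*t));
	return true;
}

/* Missing RTC: every register reads 0xFF, so 0xFF & 0x7f == 0x7f. */
static uint8_t read_absent(uint8_t reg)
{
	(void)reg;
	return 0xFF;
}

/* Present RTC: Register D reads 0x80 (only bit 7 set), which passes. */
static uint8_t read_present(uint8_t reg)
{
	return reg == RTC_VALID ? 0x80 : 0x00;
}
```

On two's-complement machines each `int` field filled with 0xff bytes reads back as -1, which is how the sketch stands in for the patch's "invalidate the rtc_time data".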