diff mbox series

[V2] rtc: mc146818: Detect and handle broken RTCs

Message ID 87tur3fx7w.fsf@nanos.tec.linutronix.de
State Not Applicable
Headers show
Series [V2] rtc: mc146818: Detect and handle broken RTCs | expand

Commit Message

Thomas Gleixner Jan. 26, 2021, 5:02 p.m. UTC
The recent fix for handling the UIP bit unearthed another issue in the RTC
code. If the RTC is advertised but the readout is straight 0xFF because
it's not available, the old code just proceeded with crappy values, but the
new code hangs because it waits for the UIP bit to become low.

Add a sanity check in the RTC CMOS probe function which reads the RTC_VALID
register (Register D) which should have bit 0-6 cleared. If that's not the
case then fail to register the CMOS.

Add the same check to mc146818_get_time(), warn once when the condition
is true and invalidate the rtc_time data.

Reported-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mickaël Salaün <mic@linux.microsoft.com>
---
V2: Fixed the sizeof() as spotted by Mickaël
---
 drivers/rtc/rtc-cmos.c         |    8 ++++++++
 drivers/rtc/rtc-mc146818-lib.c |    7 +++++++
 2 files changed, 15 insertions(+)

Comments

Dirk Gouders Jan. 31, 2021, 10:59 a.m. UTC | #1
Thomas Gleixner <tglx@linutronix.de> writes:

> The recent fix for handling the UIP bit unearthed another issue in the RTC
> code. If the RTC is advertised but the readout is straight 0xFF because
> it's not available, the old code just proceeded with crappy values, but the
> new code hangs because it waits for the UIP bit to become low.
>
> Add a sanity check in the RTC CMOS probe function which reads the RTC_VALID
> register (Register D) which should have bit 0-6 cleared. If that's not the
> case then fail to register the CMOS.
>
> Add the same check to mc146818_get_time(), warn once when the condition
> is true and invalidate the rtc_time data.

In case it is helpful: on my hardware this patch triggers a warning
(attached below).

Without it the rtc messages look like this:

[    2.783386] rtc_cmos 00:01: RTC can wake from S4
[    2.784302] rtc_cmos 00:01: registered as rtc0
[    2.785036] rtc_cmos 00:01: setting system clock to 2021-01-31T10:13:40 UTC (1612088020)
[    2.785713] rtc_cmos 00:01: alarms up to one month, y3k, 114 bytes nvram, hpet irqs

Dirk

[    7.258410] ------------[ cut here ]------------
[    7.258414] WARNING: CPU: 2 PID: 0 at drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x2b/0x1e5
[    7.258420] Modules linked in: iwlmvm(+) mac80211 iwlwifi sdhci_pci amdgpu(+) drm_ttm_helper cfg80211 ttm cqhci gpu_sched sdhci ccp thinkpad_acpi(+) rng_core nvram tpm_tis(+) tpm_tis_core wmi tpm pinctrl_amd
[    7.258432] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W         5.11.0-rc5-next-20210129-x86_64 #180
[    7.258434] Hardware name: LENOVO 20U50008GE/20U50008GE, BIOS R19ET26W (1.10 ) 06/22/2020
[    7.258435] RIP: 0010:mc146818_get_time+0x2b/0x1e5
[    7.258437] Code: 56 41 55 45 31 ed 41 54 55 53 48 89 fb 48 c7 c7 bc d9 eb 82 e8 26 d8 36 00 bf 0d 00 00 00 48 89 c5 e8 6d d1 8f ff a8 7f 74 24 <0f> 0b 48 c7 c7 bc d9 eb 82 48 89 ee e8 bc d6 36 00 b0 ff b9 24 00
[    7.258438] RSP: 0018:ffffc9000022cef0 EFLAGS: 00010002
[    7.258440] RAX: 0000000000000031 RBX: ffffc9000022cf24 RCX: 0000000000000000
[    7.258441] RDX: 0000000000000001 RSI: ffff888105607000 RDI: 000000000000000d
[    7.258441] RBP: 0000000000000046 R08: ffffc9000022cf24 R09: 0000000000000000
[    7.258442] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888105607000
[    7.258443] R13: 0000000000000000 R14: ffffc9000022cfa4 R15: 0000000000000000
[    7.258444] FS:  0000000000000000(0000) GS:ffff88840ec80000(0000) knlGS:0000000000000000
[    7.258445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.258446] CR2: 00007f2ed26c4160 CR3: 000000000480a000 CR4: 0000000000350ee0
[    7.258447] Call Trace:
[    7.258449]  <IRQ>
[    7.258450]  hpet_rtc_interrupt+0xd3/0x1a3
[    7.258454]  __handle_irq_event_percpu+0x6b/0x12e
[    7.258457]  handle_irq_event_percpu+0x2c/0x6f
[    7.258459]  handle_irq_event+0x23/0x43
[    7.258461]  handle_edge_irq+0x9e/0xbb
[    7.258463]  asm_call_irq_on_stack+0x12/0x20
[    7.258467]  </IRQ>
[    7.258467]  common_interrupt+0x9a/0x123
[    7.258470]  asm_common_interrupt+0x1e/0x40
[    7.258472] RIP: 0010:cpuidle_enter_state+0x13e/0x1fe
[    7.258475] Code: 49 89 c4 e8 bd fd ff ff 31 ff e8 3e 80 92 ff 45 84 ff 74 12 9c 58 0f ba e0 09 73 03 0f 0b fa 31 ff e8 13 16 96 ff fb 45 85 f6 <0f> 88 97 00 00 00 49 63 d6 4c 2b 24 24 48 6b ca 68 48 6b c2 30 4c
[    7.258476] RSP: 0018:ffffc90000167eb0 EFLAGS: 00000206
[    7.258477] RAX: ffff88840eca8240 RBX: ffff888101e0d400 RCX: 00000001b0a24b16
[    7.258478] RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000000
[    7.258478] RBP: 0000000000000003 R08: 00000000ffffffff R09: 0000000000000000
[    7.258479] R10: ffff88810083c4a8 R11: 0000000000000000 R12: 00000001b0a24b48
[    7.258480] R13: ffffffff8299cc60 R14: 0000000000000003 R15: 0000000000000000
[    7.258482]  cpuidle_enter+0x2b/0x37
[    7.258483]  do_idle+0x126/0x184
[    7.258485]  cpu_startup_entry+0x18/0x1a
[    7.258486]  secondary_startup_64_no_verify+0xb0/0xbb
[    7.258489] ---[ end trace 9da59c3696ed99d8 ]---


> Reported-by: Mickaël Salaün <mic@digikod.net>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Mickaël Salaün <mic@linux.microsoft.com>
> ---
> V2: Fixed the sizeof() as spotted by Mickaël
> ---
>  drivers/rtc/rtc-cmos.c         |    8 ++++++++
>  drivers/rtc/rtc-mc146818-lib.c |    7 +++++++
>  2 files changed, 15 insertions(+)
>
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -805,6 +805,14 @@ cmos_do_probe(struct device *dev, struct
>  
>  	spin_lock_irq(&rtc_lock);
>  
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if ((CMOS_READ(RTC_VALID) & 0x7f) != 0) {
> +		spin_unlock_irq(&rtc_lock);
> +		dev_warn(dev, "not accessible\n");
> +		retval = -ENXIO;
> +		goto cleanup1;
> +	}
> +
>  	if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) {
>  		/* force periodic irq to CMOS reset default of 1024Hz;
>  		 *
> --- a/drivers/rtc/rtc-mc146818-lib.c
> +++ b/drivers/rtc/rtc-mc146818-lib.c
> @@ -21,6 +21,13 @@ unsigned int mc146818_get_time(struct rt
>  
>  again:
>  	spin_lock_irqsave(&rtc_lock, flags);
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) {
> +		spin_unlock_irqrestore(&rtc_lock, flags);
> +		memset(time, 0xff, sizeof(*time));
> +		return 0;
> +	}
> +
>  	/*
>  	 * Check whether there is an update in progress during which the
>  	 * readout is unspecified. The maximum update time is ~2ms. Poll
Serge Belyshev Feb. 1, 2021, 1:53 p.m. UTC | #2
Hi!  "Me too":

> --- a/drivers/rtc/rtc-mc146818-lib.c
> +++ b/drivers/rtc/rtc-mc146818-lib.c
> @@ -21,6 +21,13 @@ unsigned int mc146818_get_time(struct rt
>  
>  again:
>  	spin_lock_irqsave(&rtc_lock, flags);
> +	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
> +	if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) {
> +		spin_unlock_irqrestore(&rtc_lock, flags);
> +		memset(time, 0xff, sizeof(*time));
> +		return 0;
> +	}
> +

... triggers here on a different box (Xiaomi mi notebook air 12.5):

[    3.524002] ------------[ cut here ]------------
[    3.528317] WARNING: CPU: 3 PID: 273 at drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x1b6/0x210
[    3.532558] CPU: 3 PID: 273 Comm: udevadm Not tainted 5.11.0-rc6 #760
[    3.536748] Hardware name: Timi TM1612/TM1612, BIOS A04 08/06/2016
[    3.540947] RIP: 0010:mc146818_get_time+0x1b6/0x210
[    3.545103] Code: 76 0b 0f b6 d0 83 ea 13 6b d2 64 01 d5 83 fd 45 89 6b 14 7f 06 83 c5 64 89 6b 14 41 83 ed 01 b8 02 00 00 00 44 89 6b 10 eb 39 <0f> 0b 48 c7 c7 b4 e0 9e 82 48 89 ee e8 29 6b 34 00 48 c7 03 ff ff
[    3.549883] RSP: 0000:ffffc900012efe30 EFLAGS: 00010002
[    3.554387] RAX: 0000000000000081 RBX: ffffc900012efe64 RCX: 000000000005b8d7
[    3.558867] RDX: 0000000000000001 RSI: ffff8881000aa000 RDI: 000000000000000d
[    3.563333] RBP: 0000000000000046 R08: 0000000000000004 R09: fffffffe5e075ac6
[    3.567748] iwlwifi 0000:02:00.0: Applying debug destination EXTERNAL_DRAM
[    3.567822] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    3.567827] R13: ffffc900012efedc R14: 0000000000000008 R15: ffff888100051200
[    3.577223] FS:  0000000000000000(0000) GS:ffff88816ad80000(0000) knlGS:0000000000000000
[    3.579870] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.581947] CR2: 00007fface455e28 CR3: 0000000103244005 CR4: 00000000003706a0
[    3.583836] Call Trace:
[    3.585699]  hpet_rtc_interrupt+0x1af/0x220
[    3.587585]  __handle_irq_event_percpu+0x5a/0xc0
[    3.589230]  handle_irq_event_percpu+0x1b/0x50
[    3.590673]  handle_irq_event+0x22/0x40
[    3.592107]  handle_edge_irq+0x6b/0x190
[    3.593545]  common_interrupt+0x67/0x130
[    3.594983]  ? asm_common_interrupt+0x8/0x40
[    3.596432]  asm_common_interrupt+0x1e/0x40
[    3.597618] RIP: 0033:0x7ffaceac9b31
[    3.598794] Code: 48 83 fe 0a 0f 87 f5 fe ff ff be 41 ff ff 6f 48 29 d6 48 89 04 f1 e9 e4 fe ff ff 48 85 ff 74 79 49 8b 44 24 60 48 85 c0 74 04 <48> 01 78 08 49 8b 44 24 58 48 85 c0 74 04 48 01 78 08 49 8b 44 24
[    3.600048] RSP: 002b:00007ffc12303b00 EFLAGS: 00010202
[    3.601343] RAX: 00007fface455e20 RBX: 000000006ffffdff RCX: 00007fface80c040
[    3.602587] RDX: 0000000000000000 RSI: 0000000000000029 RDI: 00007fface451000
[    3.603809] RBP: 00007ffc12303c50 R08: 000000006fffffff R09: 00000000effffef5
[    3.605015] R10: 0000000070000022 R11: 0000000000000032 R12: 00007fface80c000
[    3.606223] R13: 000000006ffffeff R14: 000000006ffffe35 R15: 00007ffc12303ce0
[    3.607421] ---[ end trace 5922ddf43b0f7b83 ]---
[    3.608692] hpet: Lost 3 RTC interrupts
diff mbox series

Patch

--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -805,6 +805,14 @@  cmos_do_probe(struct device *dev, struct
 
 	spin_lock_irq(&rtc_lock);
 
+	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
+	if ((CMOS_READ(RTC_VALID) & 0x7f) != 0) {
+		spin_unlock_irq(&rtc_lock);
+		dev_warn(dev, "not accessible\n");
+		retval = -ENXIO;
+		goto cleanup1;
+	}
+
 	if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) {
 		/* force periodic irq to CMOS reset default of 1024Hz;
 		 *
--- a/drivers/rtc/rtc-mc146818-lib.c
+++ b/drivers/rtc/rtc-mc146818-lib.c
@@ -21,6 +21,13 @@  unsigned int mc146818_get_time(struct rt
 
 again:
 	spin_lock_irqsave(&rtc_lock, flags);
+	/* Ensure that the RTC is accessible. Bit 0-6 must be 0! */
+	if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x7f) != 0)) {
+		spin_unlock_irqrestore(&rtc_lock, flags);
+		memset(time, 0xff, sizeof(*time));
+		return 0;
+	}
+
 	/*
 	 * Check whether there is an update in progress during which the
 	 * readout is unspecified. The maximum update time is ~2ms. Poll