Message ID | 4F8CFC12.6050700@linaro.org |
---|---|
State | Superseded |
Headers | show |
On 12-04-17 01:13 AM, John Stultz wrote: > On 04/16/2012 07:30 PM, Mark Lord wrote: .. >> And.. on some of the systems I'm testing on, the BIOS setup has >> the RTC Alarm "enabled", which means "under BIOS control", >> as opposed to "disabled" which means "under software control". >> >> It's the "under BIOS control" systems that the above patch breaks. .. > Thanks for the extra info. Although I'm still a little perplexed why that's causing trouble. > When "under BIOS control" is the RTC unusable by the kernel? Will any access cause problems? Or just > the extra disable path? Not totally clear yet, but I have noticed failures just reading the RTC alarm setting when it is "under BIOS control", so it does seem to cause issues. I've further isolated it down to one specific revision of motherboard here. The later rev of the same mobo doesn't exhibit the problem (I have both revs). Interesting, that. But if this board does it, then there probably are others out there which could also be affected. > On a hunch, I wonder if your tripping over the alarmtimer initialization issue that was recently fixed. > Have you also seen this issue w/ 3.4-rc2+ ? These boxes aren't ready for 3.4-xx yet. :) > I guess I'm curious why you're hitting the rtc_alarm_disable if you're not using the alarm. If you > use the following diff, can you provide the resulting stack traces? Yeah, excellent suggestion. I'm very curious who calls this and when. > diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c > index eb415bd..4c98ee5 100644 > --- a/drivers/rtc/interface.c > +++ b/drivers/rtc/interface.c > @@ -786,7 +786,8 @@ static void rtc_alarm_disable(struct rtc_device *rtc) > if (!rtc->ops || !rtc->ops->alarm_irq_enable) > return; > > - rtc->ops->alarm_irq_enable(rtc->dev.parent, false); > + //rtc->ops->alarm_irq_enable(rtc->dev.parent, false); > + dump_stack(); > } .. > Then un-comment/re-add the alarm_irq_enable() call above, and try the following, to see if the > behavior changes? Then re-add each line one by one to see if you can isolate where things go wrong > in the cmos code? > > diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c > index 7d5f56e..c500bce 100644 > --- a/drivers/rtc/rtc-cmos.c > +++ b/drivers/rtc/rtc-cmos.c > @@ -318,9 +318,9 @@ static void cmos_irq_disable(struct cmos_rtc *cmos, unsigned char mask) > rtc_control = CMOS_READ(RTC_CONTROL); > rtc_control&= ~mask; > CMOS_WRITE(rtc_control, RTC_CONTROL); > - hpet_mask_rtc_irq_bit(mask); > + //hpet_mask_rtc_irq_bit(mask); > > - cmos_checkintr(cmos, rtc_control); > + //cmos_checkintr(cmos, rtc_control); .. Ack. Will do. Probably not for 12-24 hours though. Cheers
On 12-04-17 01:13 AM, John Stultz wrote: .. > - rtc->ops->alarm_irq_enable(rtc->dev.parent, false); > + //rtc->ops->alarm_irq_enable(rtc->dev.parent, false); > + dump_stack(); .. Okay, the call into here is coming from a "hwclock -w -u" line in the system suspend script. Since that command isn't touching the hardware Alarm, then neither should the Linux kernel. Yet it is touching it. > CMOS_WRITE(rtc_control, RTC_CONTROL); > - hpet_mask_rtc_irq_bit(mask); > + //hpet_mask_rtc_irq_bit(mask); > > - cmos_checkintr(cmos, rtc_control); > + //cmos_checkintr(cmos, rtc_control); ... The problem still occurs (lockup on suspend) with both lines above commented out. Note that it's not 100% in any case, more like 8/10, indicating a possible strong race condition somewhere. I think all that should be done here, is to change the kernel to NOT enable/disable the Alarm unless told to do so by an explicit userspace action. Eg. writing to /sys/../wakealarm and/or /proc/acpi/alarm. If userspace leaves the alarm alone, then so should the kernel when possible. That's the old behaviour before the new alarm_irq_enable() stuff. Cheers
On 12-04-17 04:11 PM, Mark Lord wrote: > On 12-04-17 01:13 AM, John Stultz wrote: > .. >> - rtc->ops->alarm_irq_enable(rtc->dev.parent, false); >> + //rtc->ops->alarm_irq_enable(rtc->dev.parent, false); >> + dump_stack(); > .. > > Okay, the call into here is coming from a "hwclock -w -u" line > in the system suspend script. Forgot the stack dump: Pid: 4353, comm: hwclock Tainted: P O 3.3.2 #5 Call Trace: [<ffffffff8123cfcd>] ? rtc_timer_remove+0x66/0xb2 [<ffffffff8103d2ff>] ? should_resched+0x5/0x23 [<ffffffff8123d21b>] ? rtc_update_irq_enable+0xd0/0x108 [<ffffffff812dd582>] ? __mutex_lock_common.isra.5+0x3b/0x166 [<ffffffff8123e058>] ? rtc_dev_ioctl+0x36d/0x468 [<ffffffff8101b78a>] ? do_page_fault+0x264/0x2ce [<ffffffff81027650>] ? timespec_add_safe+0x33/0x63 [<ffffffff810077a8>] ? read_tsc+0x5/0x14 [<ffffffff810483fa>] ? timekeeping_get_ns+0xd/0x2a [<ffffffff810ab492>] ? do_vfs_ioctl+0x45a/0x49c [<ffffffff810abb8e>] ? poll_select_copy_remaining+0xdb/0xfb [<ffffffff810ab511>] ? sys_ioctl+0x3d/0x60 [<ffffffff812df222>] ? system_call_fastpath+0x16/0x1b > Since that command isn't touching the hardware Alarm, > then neither should the Linux kernel. Yet it is touching it. > >> CMOS_WRITE(rtc_control, RTC_CONTROL); >> - hpet_mask_rtc_irq_bit(mask); >> + //hpet_mask_rtc_irq_bit(mask); >> >> - cmos_checkintr(cmos, rtc_control); >> + //cmos_checkintr(cmos, rtc_control); > ... > > The problem still occurs (lockup on suspend) > with both lines above commented out. > > Note that it's not 100% in any case, more like 8/10, > indicating a possible strong race condition somewhere. > > I think all that should be done here, is to change the kernel > to NOT enable/disable the Alarm unless told to do so by > an explicit userspace action. Eg. writing to /sys/../wakealarm > and/or /proc/acpi/alarm. > > If userspace leaves the alarm alone, then so should the kernel when possible. > That's the old behaviour before the new alarm_irq_enable() stuff. > > Cheers
diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c index eb415bd..4c98ee5 100644 --- a/drivers/rtc/interface.c +++ b/drivers/rtc/interface.c @@ -786,7 +786,8 @@ static void rtc_alarm_disable(struct rtc_device *rtc) if (!rtc->ops || !rtc->ops->alarm_irq_enable) return; - rtc->ops->alarm_irq_enable(rtc->dev.parent, false); + //rtc->ops->alarm_irq_enable(rtc->dev.parent, false); + dump_stack(); } /** Then un-comment/re-add the alarm_irq_enable() call above, and try the following, to see if the behavior changes? Then re-add each line one by one to see if you can isolate where things go wrong in the cmos code? diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index 7d5f56e..c500bce 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -318,9 +318,9 @@ static void cmos_irq_disable(struct cmos_rtc *cmos, unsigned char mask) rtc_control = CMOS_READ(RTC_CONTROL); rtc_control&= ~mask; CMOS_WRITE(rtc_control, RTC_CONTROL); - hpet_mask_rtc_irq_bit(mask); + //hpet_mask_rtc_irq_bit(mask); - cmos_checkintr(cmos, rtc_control); + //cmos_checkintr(cmos, rtc_control); } static int cmos_set_alarm(struct device *dev, struct rtc_wkalrm *t)