From patchwork Tue Apr 17 23:02:11 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 153332
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from mail-vx0-f184.google.com (mail-vx0-f184.google.com [209.85.220.184])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "Google Internet Authority" (not verified))
 by ozlabs.org (Postfix) with ESMTPS id C65A2B7073
 for ; Wed, 18 Apr 2012 09:02:56 +1000 (EST)
Received: by vcge1 with SMTP id e1sf6374390vcg.11
 for ; Tue, 17 Apr 2012 16:02:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlegroups.com; s=beta;
 h=x-beenthere:received-spf:message-id:date:from:user-agent
 :mime-version:to:cc:subject:references:in-reply-to:x-content-scanned
 :x-cbid:x-original-sender:x-original-authentication-results:reply-to
 :precedence:mailing-list:list-id:x-google-group-id:list-post
 :list-help:list-archive:sender:list-subscribe:list-unsubscribe
 :content-type;
 bh=a/JxI4HWKtkeeEOLi1sg85VKlLa2he22myjt2uP8RWI=;
 b=eTx+vvRzXyXbEk7ftQTKnMYhNsTK4qQvZf+tu2GZ9Evyp5coNrJ6aRnWM7+J2MZjHa
 /mg1Ha/0m9p4uMsBiEklVzuxz/sYbhoM/Swu/k6tT2j4/b0j0DQv0GnGKFYvf/37RwlP
 5VdrfMjf3qinHOoIjPVJ7qYuFzQyCe41gq2i4=
Received: by 10.50.183.201 with SMTP id eo9mr19074igc.3.1334703774042;
 Tue, 17 Apr 2012 16:02:54 -0700 (PDT)
X-BeenThere: rtc-linux@googlegroups.com
Received: by 10.231.27.25 with SMTP id g25ls293843ibc.1.gmail;
 Tue, 17 Apr 2012 16:02:53 -0700 (PDT)
Received: by 10.42.142.202 with SMTP id t10mr41823icu.4.1334703773656;
 Tue, 17 Apr 2012 16:02:53 -0700 (PDT)
Received: by 10.42.142.202 with SMTP id t10mr41821icu.4.1334703773638;
 Tue, 17 Apr 2012 16:02:53 -0700 (PDT)
Received: from e33.co.us.ibm.com (e33.co.us.ibm.com. [32.97.110.151])
 by gmr-mx.google.com with ESMTPS id hq2si7626611igc.3.2012.04.17.16.02.53
 (version=TLSv1/SSLv3 cipher=OTHER);
 Tue, 17 Apr 2012 16:02:53 -0700 (PDT)
Received-SPF: neutral (google.com: 32.97.110.151 is neither permitted nor denied by best guess record for domain of john.stultz@linaro.org) client-ip=32.97.110.151;
Received: from /spool/local by e33.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted
 for from ; Tue, 17 Apr 2012 17:02:51 -0600
Received: from d01dlp03.pok.ibm.com (9.56.224.17) by e33.co.us.ibm.com (192.168.1.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;
 Tue, 17 Apr 2012 17:02:17 -0600
Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236])
 by d01dlp03.pok.ibm.com (Postfix) with ESMTP id 185BFC9006C
 for ; Tue, 17 Apr 2012 19:02:15 -0400 (EDT)
Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64])
 by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q3HN2Gcc036066
 for ; Tue, 17 Apr 2012 19:02:16 -0400
Received: from d01av04.pok.ibm.com (loopback [127.0.0.1])
 by d01av04.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q3HN2F2N025502
 for ; Tue, 17 Apr 2012 19:02:16 -0400
Received: from [9.48.85.214] (sig-9-48-85-214.mts.ibm.com [9.48.85.214])
 by d01av04.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id q3HN2BBB025166;
 Tue, 17 Apr 2012 19:02:13 -0400
Message-ID: <4F8DF673.8050605@linaro.org>
Date: Tue, 17 Apr 2012 16:02:11 -0700
From: John Stultz
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1
MIME-Version: 1.0
To: Mark Lord
CC: richard -rw- weinberger , Linux Kernel , rtc-linux@googlegroups.com, Alessandro Zummo , Greg Kroah-Hartman , stable@vger.kernel.org, Rabin Vincent
Subject: [rtc-linux] Re: [REGRESSION] rtc/interface.c: kills suspend-to-ram
References: <4F8BA1C1.4030804@teksavvy.com> <4F8C24E5.4020703@teksavvy.com> <4F8C3DDF.8030103@teksavvy.com>
 <4F8C415C.80806@teksavvy.com> <4F8C76EB.20709@linaro.org> <4F8C926D.2040503@linaro.org> <4F8CD5D3.8060006@teksavvy.com> <4F8CFC12.6050700@linaro.org> <4F8DCE74.2020906@teksavvy.com>
In-Reply-To: <4F8DCE74.2020906@teksavvy.com>
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 12041723-2398-0000-0000-000005E622C6
X-Original-Sender: john.stultz@linaro.org
X-Original-Authentication-Results: gmr-mx.google.com; spf=neutral (google.com: 32.97.110.151 is neither permitted nor denied by best guess record for domain of john.stultz@linaro.org) smtp.mail=john.stultz@linaro.org
Reply-To: rtc-linux@googlegroups.com
Precedence: list
Mailing-list: list rtc-linux@googlegroups.com; contact rtc-linux+owners@googlegroups.com
List-ID:
X-Google-Group-Id: 712029733259
List-Post: ,
List-Help: ,
List-Archive:
Sender: rtc-linux@googlegroups.com
List-Subscribe: ,
List-Unsubscribe: ,

On 04/17/2012 01:11 PM, Mark Lord wrote:
> On 12-04-17 01:13 AM, John Stultz wrote:
> ..
>> - rtc->ops->alarm_irq_enable(rtc->dev.parent, false);
>> + //rtc->ops->alarm_irq_enable(rtc->dev.parent, false);
>> + dump_stack();
> ..
>
> Okay, the call into here is coming from a "hwclock -w -u" line
> in the system suspend script.
>
> Since that command isn't touching the hardware Alarm,
> then neither should the Linux kernel. Yet it is touching it.
> Pid: 4353, comm: hwclock Tainted: P O 3.3.2 #5
> Call Trace:
> [] ? rtc_timer_remove+0x66/0xb2
> [] ? should_resched+0x5/0x23
> [] ? rtc_update_irq_enable+0xd0/0x108
> [] ? __mutex_lock_common.isra.5+0x3b/0x166
> [] ? rtc_dev_ioctl+0x36d/0x468
> [] ? do_page_fault+0x264/0x2ce
> [] ? timespec_add_safe+0x33/0x63
> [] ? read_tsc+0x5/0x14
> [] ? timekeeping_get_ns+0xd/0x2a
> [] ? do_vfs_ioctl+0x45a/0x49c
> [] ? poll_select_copy_remaining+0xdb/0xfb
> [] ? sys_ioctl+0x3d/0x60
> [] ? system_call_fastpath+0x16/0x1b
>
Thanks again for testing and sending the backtrace in the other mail
(pasted above). Unfortunately, I'm not sure the assessment above is correct.
If you strace hwclock -w -u you'll see:
...
ioctl(3, PHN_SET_REGS or RTC_UIE_ON, 0) = 0
...
ioctl(3, PHN_NOT_OH or RTC_UIE_OFF, 0) = 0
...

This is the UIE alarm being turned on and then back off.

The UIE mode has been emulated using the AIE alarm since ~2.6.38. So
technically the kernel is touching the hardware alarm, and has been for
a while.

The recent difference is that previously we'd drop the soft-timer and
then it would be possible we'd get one final hardware alarm which we'd
(ideally) ignore. But that could cause problems with systems waking up
immediately after suspend/poweroff. So now when we drop the soft-timer,
if there are no other soft-timers pending, we also turn off the hardware
alarm. That is why we see rtc_timer_remove() being called from the ioctl.

>
>> CMOS_WRITE(rtc_control, RTC_CONTROL);
>> - hpet_mask_rtc_irq_bit(mask);
>> + //hpet_mask_rtc_irq_bit(mask);
>>
>> - cmos_checkintr(cmos, rtc_control);
>> + //cmos_checkintr(cmos, rtc_control);
> ...
>
> The problem still occurs (lockup on suspend)
> with both lines above commented out.
>
> Note that it's not 100% in any case, more like 8/10,
> indicating a possible strong race condition somewhere.

Thanks again for the testing! I'm still a little baffled about what is
going on. As you said in your other mail, it seems to only affect
certain versions of the same hardware, so it's likely a BIOS issue. Even
so, it could be that the rtc-cmos code is just missing something.

> I think all that should be done here, is to change the kernel
> to NOT enable/disable the Alarm unless told to do so by
> an explicit userspace action. Eg. writing to /sys/../wakealarm
> and/or /proc/acpi/alarm.
>
> If userspace leaves the alarm alone, then so should the kernel when possible.
> That's the old behaviour before the new alarm_irq_enable() stuff.
>
Well, I'd really like to better understand what is going wrong in this
case. Disabling the alarm shouldn't cause suspend problems, even if it
was redundant.
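
As a rough illustration of the behaviour change described above, here is a small userspace model. It is only a sketch, not the actual drivers/rtc/interface.c code: struct model_rtc, model_timer_enqueue() and model_timer_remove() are made-up names, and the real code keeps a timerqueue rather than a simple counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an RTC device: counts queued soft-timers
 * and tracks whether the hardware alarm irq is currently enabled. */
struct model_rtc {
	size_t pending;		/* soft-timers still queued */
	bool alarm_irq_on;	/* state of the hardware alarm irq */
};

/* Queue a soft-timer (e.g. the one backing emulated UIE mode);
 * in this model that always arms the hardware alarm. */
static void model_timer_enqueue(struct model_rtc *rtc)
{
	rtc->pending++;
	rtc->alarm_irq_on = true;
}

/* Drop a soft-timer; if none remain, also turn the hardware alarm off.
 * The old behaviour would have left alarm_irq_on untouched here. */
static void model_timer_remove(struct model_rtc *rtc)
{
	if (rtc->pending == 0)
		return;
	rtc->pending--;
	if (rtc->pending == 0)
		rtc->alarm_irq_on = false;
}
```

In this model, an RTC_UIE_ON followed by RTC_UIE_OFF with nothing else queued corresponds to one enqueue and one remove, which ends with the alarm irq disabled.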
So if we can better understand the mechanics of the issue, we can better
work around it.

If you could, would you mind booting an unmodified kernel with "nohpet"
to see if this is HPET related?

Then, please run with the debug patch below on an otherwise unmodified
kernel, then send me the complete dmesg.

Finally, I don't think you sent me your .config, would you mind sending
that as well?

Thanks so much again!
-john

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 7d5f56e..44740b8 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -303,9 +303,12 @@ static void cmos_irq_enable(struct cmos_rtc *cmos, unsigned char mask)
 	 */
 	rtc_control = CMOS_READ(RTC_CONTROL);
 	cmos_checkintr(cmos, rtc_control);
+	printk("cmos_irq_enable: Read: 0x%02x Mask: 0x%02x ", (int)rtc_control, (int)mask);

 	rtc_control |= mask;
 	CMOS_WRITE(rtc_control, RTC_CONTROL);
+	printk("wrote: 0x%02x\n", (int)rtc_control);
+
 	hpet_set_rtc_irq_bit(mask);

 	cmos_checkintr(cmos, rtc_control);
@@ -316,8 +319,10 @@ static void cmos_irq_disable(struct cmos_rtc *cmos, unsigned char mask)
 	unsigned char rtc_control;

 	rtc_control = CMOS_READ(RTC_CONTROL);
+	printk("cmos_irq_disable: Read: 0x%02x Mask: 0x%02x ", (int)rtc_control, (int)mask);
 	rtc_control &= ~mask;
 	CMOS_WRITE(rtc_control, RTC_CONTROL);
+	printk("wrote: 0x%02x\n", (int)rtc_control);
 	hpet_mask_rtc_irq_bit(mask);

 	cmos_checkintr(cmos, rtc_control);
@@ -554,6 +559,9 @@ static irqreturn_t cmos_interrupt(int irq, void *p)
 	 */
 	irqstat = CMOS_READ(RTC_INTR_FLAGS);
 	rtc_control = CMOS_READ(RTC_CONTROL);
+
+	printk("cmos_interrupt: irqstat: 0x%02x control: 0x%02x\n", irqstat, rtc_control);
+
 	if (is_hpet_enabled())
 		irqstat = (unsigned long)irq & 0xF0;
 	irqstat &= (rtc_control & RTC_IRQMASK) | RTC_IRQF;
@@ -671,6 +679,13 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 	spin_lock_irq(&rtc_lock);
+
+
+	printk("cmos_do_probe: irqstat: 0x%02x control: 0x%02x valid: 0x%02x\n",
+		(int)CMOS_READ(RTC_INTR_FLAGS),
+		(int)CMOS_READ(RTC_CONTROL),
+		(int)CMOS_READ(RTC_VALID));
+
 	/* force periodic irq to CMOS reset default of 1024Hz;
 	 *
 	 * REVISIT it's been reported that at least one x86_64 ALI mobo