From patchwork Wed Dec 16 07:57:56 2009
X-Patchwork-Submitter: Xiaotian Feng
X-Patchwork-Id: 41241
Date: Wed, 16 Dec 2009 15:57:56 +0800
Message-ID: <7b6bb4a50912152357m75aea5dfl6fe063d716517baf@mail.gmail.com>
Subject: Re: [Next] CPU Hotplug test failures on powerpc
From: Xiaotian Feng
To: Peter Zijlstra
Cc: linux-kernel, Linux/PPC Development, linux-next@vger.kernel.org, Ingo Molnar
List-Id: Linux on PowerPC Developers Mail List

On Wed, Dec 16, 2009 at 3:18 PM, Peter Zijlstra wrote:
> On Wed, 2009-12-16 at 12:24 +0530, Sachin Sant wrote:
>> Xiaotian Feng wrote:
>> > On Wed, Dec 16, 2009 at 2:41 PM, Sachin Sant wrote:
>> >> Xiaotian Feng wrote:
>> >>> Does this testcase hotplug cpu 0 off?
>> >>
>> >> No, I don't think so. It skips cpu0 during the online/offline
>> >> process.
>> >
>> > Then how could this happen? Looks like cpu 0 is offline...
>> > 0:mon> <4>IRQ 17 affinity broken off cpu 0
>> > <4>IRQ 18 affinity broken off cpu 0
>> > <4>IRQ 19 affinity broken off cpu 0
>> > <4>IRQ 264 affinity broken off cpu 0
>> > <4>cpu 0 (hwid 0) Ready to die...
>> > <7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
>>
>> Sorry, I was looking at only one script. Looking more closely
>> at the test, there are 6 different sub-tests. The rest of the
>> tests do seem to hotplug CPU 0.
>
> Ooh, cute, so you can actually hotplug cpu 0.. no wonder that didn't get
> exposed on x86.
>
> Still, the only time cpu_active_mask should not be equal to
> cpu_online_mask is when we're in the middle of a hotplug: we clear
> active early and set it late, but it's all done under the hotplug mutex,
> so we can have at most 1 cpu of difference with the online mask.

Could the following be possible?
We know there are cpu 0 and cpu 1:

    offline cpu1 -> done
    offline cpu0 -> false

Consider this in the cpu_down code:

int __ref cpu_down(unsigned int cpu)
{
	...
	set_cpu_active(cpu, false);	/* here, we set cpu 0 to inactive */
	synchronize_sched();

	err = _cpu_down(cpu, 0);
out:
	...
}

Then in the _cpu_down code:

static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
{
	...
	if (num_online_cpus() == 1)	/* if we're trying to offline cpu0,
					 * num_online_cpus() will be 1 */
		return -EBUSY;		/* after returning back to cpu_down,
					 * we never set cpu 0 back to active */

	if (!cpu_online(cpu))
		return -EINVAL;

	if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
		return -ENOMEM;
	...
}

Then cpu 0 is not active but still online, and then we try to offline
cpu 1, ...

This could not be exposed on x86 because x86 does not have
/sys/devices/system/cpu0/online. I guess the following patch fixes this
bug.

> Unless of course, I messed up, which appears to be rather likely given
> these problems ;-)

---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 291ac58..21ddace 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -199,14 +199,18 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 		.hcpu = hcpu,
 	};
 
-	if (num_online_cpus() == 1)
+	if (num_online_cpus() == 1) {
+		set_cpu_active(cpu, true);
 		return -EBUSY;
+	}
 
 	if (!cpu_online(cpu))
 		return -EINVAL;
 
-	if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
+	if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL)) {
+		set_cpu_active(cpu, true);
 		return -ENOMEM;
+	}
 
 	cpu_hotplug_begin();
 	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,