[{"id":3675638,"web_url":"http://patchwork.ozlabs.org/comment/3675638/","msgid":"<C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>","list_archive_url":null,"date":"2026-04-10T03:42:07","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":86030,"url":"http://patchwork.ozlabs.org/api/people/86030/","name":"Ani Sinha","email":"anisinha@redhat.com"},"content":"> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n> \n> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n> This caused a lock imbalance where the vCPU thread would loop back to\n> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n\nI am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance? \n\n> \n> The issue manifests as boot hangs on PowerPC pseries machines with multiple\n> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and\n> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held\n> the BQL, preventing other operations from proceeding.\n> \n> The fix has two parts:\n> \n> 1. In kvm_cpu_exec() (kvm-all.c):\n>   Release the BQL before returning EXCP_HLT in the early return path,\n>   matching the behavior of the normal execution path where bql_unlock()\n>   is called before entering the main KVM execution loop.\n> \n> 2. 
In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>   Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>   loop expects to hold the BQL when calling kvm_cpu_exec() again.\n> \n> This ensures proper BQL lock/unlock pairing:\n> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n> - Next iteration has BQL held as expected\n> \n> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n> changes required to support KVM VM file descriptor change\") which\n> refactored kvm_irqchip_create() and changed the initialization timing,\n> exposing this lock imbalance issue.\n> \n> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file descriptor change\")\n\nI do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.\n\nPersonally I think the core issue is somewhere else. 
I am not convinced this is the proper fix.\n\n> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n> ---\n> accel/kvm/kvm-accel-ops.c | 4 ++++\n> accel/kvm/kvm-all.c       | 1 +\n> 2 files changed, 5 insertions(+)\n> \n> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c\n> index 6d9140e549..d684fd0840 100644\n> --- a/accel/kvm/kvm-accel-ops.c\n> +++ b/accel/kvm/kvm-accel-ops.c\n> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)\n> \n>         if (cpu_can_run(cpu)) {\n>             r = kvm_cpu_exec(cpu);\n> +            if (r == EXCP_HLT) {\n> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */\n> +                bql_lock();\n> +            }\n>             if (r == EXCP_DEBUG) {\n>                 cpu_handle_guest_debug(cpu);\n>             }\n> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n> index 774499d34f..00b8018664 100644\n> --- a/accel/kvm/kvm-all.c\n> +++ b/accel/kvm/kvm-all.c\n> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>     trace_kvm_cpu_exec();\n> \n>     if (kvm_arch_process_async_events(cpu)) {\n> +        bql_unlock();\n>         return EXCP_HLT;\n>     }\n> \n> -- \n> 2.52.0\n>","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=fkU6SuEv;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org 
(lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsN1G1MkVz1y2d\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 13:42:56 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB2lL-000626-KY; Thu, 09 Apr 2026 23:42:31 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wB2lK-00061i-B9\n for qemu-ppc@nongnu.org; Thu, 09 Apr 2026 23:42:30 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.129.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wB2lI-0002Ac-4j\n for qemu-ppc@nongnu.org; Thu, 09 Apr 2026 23:42:30 -0400","from mail-pj1-f71.google.com (mail-pj1-f71.google.com\n [209.85.216.71]) by relay.mimecast.com with ESMTP with STARTTLS\n (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n us-mta-479-hKoLIQqfOS-g8DImJZhPCw-1; Thu, 09 Apr 2026 23:42:24 -0400","by mail-pj1-f71.google.com with SMTP id\n 98e67ed59e1d1-35da4795b3cso3340089a91.2\n for <qemu-ppc@nongnu.org>; Thu, 09 Apr 2026 20:42:24 -0700 (PDT)","from smtpclient.apple ([122.163.114.34])\n by smtp.gmail.com with ESMTPSA id\n 41be03b00d2f7-c79216fd6edsm948082a12.3.2026.04.09.20.42.19\n (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\n Thu, 09 Apr 2026 20:42:22 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1775792546;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n to:to:cc:cc:mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n 
in-reply-to:in-reply-to:references:references;\n bh=PBoQRKphN7kBHRhIFL+zlHSIiXI8V41ETp1efn7SRTQ=;\n b=fkU6SuEvXioaiKF80hxNs/jgq2tERmpfNnTDQ858pbD3NIsQwNptEEQhDZGBL/23xC4bRU\n jzP6q/zB7+13eIDHtJG26tNKrhZMHMl6mbU1FR89gLW7b9qMO+1u4GFOtsq7tpWlf4vgOU\n SMHFr/CxeI/4sPuWVD4LoCb4t+H0ssM=","X-MC-Unique":"hKoLIQqfOS-g8DImJZhPCw-1","X-Mimecast-MFC-AGG-ID":"hKoLIQqfOS-g8DImJZhPCw_1775792544","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1775792544; x=1776397344;\n h=to:references:message-id:content-transfer-encoding:cc:date\n :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state\n :from:to:cc:subject:date:message-id:reply-to;\n bh=NwfDZpCsHi9F5cF5ZWuT52Xnv7ONvtyTnR6haZym7L8=;\n b=E3xCpS0xr9qH2CDhQkzDXc2bIxSehJAxan39eFXbWgLrm+6g+Urv7YqCe+hHFmQDqK\n 6jttMQTS4iFO53dpiNODEQIv956HSAQBcRuK92eDalpyFdfU58ITdFtGq3D/gjC6new5\n WIp7OwmEWLpaAXvs8HepK5YbYyfx6WZeAB7LC6liY2hQdn6fFUOHvR8WvKpXziNzop+m\n GUIcUiEgAL9AWy6o+ejesKOOa//zTyfGpSLOgV9I8ToTEXZtWYVHswplBTtIkGdRzbf/\n IQ3UgCLPho2UwgBEqRbSbpw4I9Ue9fpvWPD5U4w87ip+8Si9d+M1VLuofmNI5e5+4/ya\n 7XjQ==","X-Forwarded-Encrypted":"i=1;\n AJvYcCVztjoz1zUTU0F72M3UexrA6nKrsrhm4QMzfDNjlRBL8d24wJKED7QPpRTB/99lmwUtCvEBt6t1Tw==@nongnu.org","X-Gm-Message-State":"AOJu0YyqN1Lm3A5YlBcsZ/32ax2QyMjvdMBNkIzrXq0kGR1bOb2fW4jf\n /Z4pvCh2EqpDzmtNLZoZu25gv5liAMH5C4V3Dcpnz2AJH44u1A8OL/9gx4bsAFLTXZkewKr3cn4\n +CpoD8laKb8yp/pF0zzHwEN8C2m9q7rvERCBfojql7foYJcJHCP4asg==","X-Gm-Gg":"AeBDievejzeF2/aplRdoUeL9XgIv1MbT0U6Kkx6GY2PRKHU+38dYPx0fw1RMh1jAdhB\n NGiPpcT5movl9VKr7OjcuZKfpvnnhqxGtO+6J4AcSdOKAjShwpHTLIUyBCITeM84Pd0EzuxJzfP\n h0eJUtrDCGrWjJaTZOW20jmRfAb+WvULQaFlgGuwuAvpgXO/N7WDVyZ3sD6lb49CzLK9XusKozP\n qswlLJ8gIJTZUVS8REMufb2UZTWP5fTkA53RbWg3CD8Gudt6v6FXR0gXpW74bxN54CWjKjc0ZmO\n 3zf1pbWXnNgAN6BmiUJF4rJv34WBke0Uv5oQUkrjsimi0EPxWEsgTTZ7GI6+oADdnrC9qluZ1vU\n Ed911trJZPBOno3xk6jeFx6xI8GS3ssTQuwCteqEhrgmf4uDI05rDaCLIePbJsAci3NOZBe/RJR\n 0=","X-Received":["by 2002:a05:6a20:9389:b0:398:7a23:2779 with SMTP id\n 
adf61e73a8af0-39fe408bfa5mr1777443637.52.1775792543543;\n Thu, 09 Apr 2026 20:42:23 -0700 (PDT)","by 2002:a05:6a20:9389:b0:398:7a23:2779 with SMTP id\n adf61e73a8af0-39fe408bfa5mr1777409637.52.1775792542980;\n Thu, 09 Apr 2026 20:42:22 -0700 (PDT)"],"Mime-Version":"1.0 (Mac OS X Mail 16.0 \\(3864.500.181\\))","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","From":"Ani Sinha <anisinha@redhat.com>","In-Reply-To":"<20260409161042.55281-1-harshpb@linux.ibm.com>","Date":"Fri, 10 Apr 2026 09:12:07 +0530","Cc":"qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","Message-Id":"<C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>","To":"Harsh Prateek Bora <harshpb@linux.ibm.com>","X-Mailer":"Apple Mail (2.3864.500.181)","X-Mimecast-Spam-Score":"0","X-Mimecast-MFC-PROC-ID":"4rLJ0dvTwp3Y8Uq316TWbo6A8WwHYYARPiQ1yMNndz0_1775792544","X-Mimecast-Originator":"redhat.com","Content-Type":"text/plain;\n\tcharset=utf-8","Content-Transfer-Encoding":"quoted-printable","Received-SPF":"pass client-ip=170.10.129.124;\n envelope-from=anisinha@redhat.com;\n helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"-25","X-Spam_score":"-2.6","X-Spam_bar":"--","X-Spam_report":"(-2.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.54,\n DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no 
action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675662,"web_url":"http://patchwork.ozlabs.org/comment/3675662/","msgid":"<4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>","list_archive_url":null,"date":"2026-04-10T05:25:53","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":85411,"url":"http://patchwork.ozlabs.org/api/people/85411/","name":"Harsh Prateek Bora","email":"harshpb@linux.ibm.com"},"content":"Hi Ani,\n\nOn 10/04/26 9:12 am, Ani Sinha wrote:\n> \n> \n>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>\n>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n>> This caused a lock imbalance where the vCPU thread would loop back to\n>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n> \n> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?\n\nThe issue is not that kvm_cpu_exec() doesn't expect the caller to hold \nthe BQL - it does. 
The problem is that kvm_cpu_exec() has inconsistent \nBQL handling across its return paths.\n\nNormal execution path:\n\nint kvm_cpu_exec(CPUState *cpu)\n{\n     // BQL held on entry (from caller)\n\n     if (kvm_arch_process_async_events(cpu)) {\n         return EXCP_HLT;  // ← Returns with BQL STILL HELD\n     }\n\n     bql_unlock();  // ← Normal path unlocks here\n     // ... KVM execution loop ...\n     bql_lock();    // ← Re-acquires before returning\n     return ret;\n}\n\nThe lock imbalance:\n\nWhen kvm_arch_process_async_events() returns true, the function returns \nEXCP_HLT before the bql_unlock() call.\nThis means the early return path keeps the BQL held, while the normal \nexecution path releases and re-acquires it.\nThe caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() \nagain, but now the BQL is already held from the previous iteration\nThis creates a situation where the BQL is never released between \niterations when EXCP_HLT is returned.\n\nWhy this matters:\nOn PowerPC pseries with halted secondary vCPUs (start-powered-off=true), \nthese vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each \niteration accumulates BQL holds, preventing other threads (including CPU \n0) from making progress.\n\n> \n>>\n>> The issue manifests as boot hangs on PowerPC pseries machines with multiple\n>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and\n>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held\n>> the BQL, preventing other operations from proceeding.\n>>\n>> The fix has two parts:\n>>\n>> 1. In kvm_cpu_exec() (kvm-all.c):\n>>    Release the BQL before returning EXCP_HLT in the early return path,\n>>    matching the behavior of the normal execution path where bql_unlock()\n>>    is called before entering the main KVM execution loop.\n>>\n>> 2. 
In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>>    loop expects to hold the BQL when calling kvm_cpu_exec() again.\n>>\n>> This ensures proper BQL lock/unlock pairing:\n>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n>> - Next iteration has BQL held as expected\n>>\n>> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n>> changes required to support KVM VM file descriptor change\") which\n>> refactored kvm_irqchip_create() and changed the initialization timing,\n>> exposing this lock imbalance issue.\n>>\n>> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file descriptor change\")\n> \n> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.\n> \n> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.\n\nRegarding commit 98884e0cc1:\n\nReverting the kvm_irqchip_create refactoring makes the problem go away. \nThis commit may have changed timing that exposed the issue, but the root \ncause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). 
We can \neither:\n\nRemove the \"Fixes:\" tag entirely, or\nAdd a note that this is a pre-existing issue exposed by timing changes\nThe core fix (releasing BQL before returning EXCP_HLT) is correct and \naddresses the actual deadlock mechanism.\n\n> \n>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n>> ---\n>> accel/kvm/kvm-accel-ops.c | 4 ++++\n>> accel/kvm/kvm-all.c       | 1 +\n>> 2 files changed, 5 insertions(+)\n>>\n>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c\n>> index 6d9140e549..d684fd0840 100644\n>> --- a/accel/kvm/kvm-accel-ops.c\n>> +++ b/accel/kvm/kvm-accel-ops.c\n>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)\n>>\n>>          if (cpu_can_run(cpu)) {\n>>              r = kvm_cpu_exec(cpu);\n>> +            if (r == EXCP_HLT) {\n>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */\n>> +                bql_lock();\n>> +            }\n>>              if (r == EXCP_DEBUG) {\n>>                  cpu_handle_guest_debug(cpu);\n>>              }\n>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n>> index 774499d34f..00b8018664 100644\n>> --- a/accel/kvm/kvm-all.c\n>> +++ b/accel/kvm/kvm-all.c\n>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>>      trace_kvm_cpu_exec();\n>>\n>>      if (kvm_arch_process_async_events(cpu)) {\n>> +        bql_unlock();\n>>          return EXCP_HLT;\n>>      }\n>>\n>> -- \n>> 2.52.0\n>>\n>","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256\n header.s=pp1 header.b=XZvkAQlM;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender 
SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsQJt1VCnz1y2d\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 15:26:36 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB4Ng-0007Uy-0M; Fri, 10 Apr 2026 01:26:12 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wB4Nb-0007UC-Lg; Fri, 10 Apr 2026 01:26:07 -0400","from mx0a-001b2d01.pphosted.com ([148.163.156.1])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wB4NZ-0006Ww-A2; Fri, 10 Apr 2026 01:26:07 -0400","from pps.filterd (m0353729.ppops.net [127.0.0.1])\n by mx0a-001b2d01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id\n 639NMjGe2315079; Fri, 10 Apr 2026 05:26:01 GMT","from ppma23.wdc07v.mail.ibm.com\n (5d.69.3da9.ip4.static.sl-reverse.com [169.61.105.93])\n by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4dcn2g8ems-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Fri, 10 Apr 2026 05:26:01 +0000 (GMT)","from pps.filterd (ppma23.wdc07v.mail.ibm.com [127.0.0.1])\n by ppma23.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id\n 63A2VEkX013882;\n Fri, 10 Apr 2026 05:26:00 GMT","from smtprelay02.wdc07v.mail.ibm.com ([172.16.1.69])\n by ppma23.wdc07v.mail.ibm.com (PPS) with ESMTPS id 4dcmf4eped-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n 
Fri, 10 Apr 2026 05:26:00 +0000","from smtpav03.dal12v.mail.ibm.com (smtpav03.dal12v.mail.ibm.com\n [10.241.53.102])\n by smtprelay02.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id\n 63A5Pw8M24838878\n (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);\n Fri, 10 Apr 2026 05:25:58 GMT","from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id 43DC058056;\n Fri, 10 Apr 2026 05:25:58 +0000 (GMT)","from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id 8C4215805A;\n Fri, 10 Apr 2026 05:25:55 +0000 (GMT)","from [9.124.212.60] (unknown [9.124.212.60])\n by smtpav03.dal12v.mail.ibm.com (Postfix) with ESMTP;\n Fri, 10 Apr 2026 05:25:55 +0000 (GMT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc\n :content-transfer-encoding:content-type:date:from:in-reply-to\n :message-id:mime-version:references:subject:to; s=pp1; bh=Dk5iov\n uqyQlNW6p1kUmMfoBh43avLJBdH6SFpfMUKKg=; b=XZvkAQlM7y3cogsDKKm43m\n VQtwY7Yo5UcW4dWGmC3Tilms93pT0SAXNkYypqde11DiVAPHS9+/dBnqK4E0VxVe\n /8NKqHohHzzpT08zYMSmEgKqrqWYYsoKyuXatjgwBPC++z0Is9EQSx+4f69rN7sK\n /3m99KkL4eEyZFh8wdJHz8h6I1ixt3HUBLe53I7S7GlOtOGxsoJfkwZyZDKcMSEZ\n lAMof4CxL05HHdCoy3xj63NcIB2Mpx3W9JYuXyu120BvemLGKEGMEQ1BzmzMfar0\n kh1L6/DrB6/eRrifA/4i9lcNRjg/q/jBGF+kcehhraHSYByjzs6gYrT6iNAgpoHw\n ==","Message-ID":"<4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>","Date":"Fri, 10 Apr 2026 10:55:53 +0530","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","Content-Language":"en-GB","To":"Ani Sinha <anisinha@redhat.com>","Cc":"qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n 
<C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>","From":"Harsh Prateek Bora <harshpb@linux.ibm.com>","In-Reply-To":"<C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>","Content-Type":"text/plain; charset=UTF-8; format=flowed","Content-Transfer-Encoding":"8bit","X-TM-AS-GCONF":"00","X-Proofpoint-Reinject":"loops=2 maxloops=12","X-Authority-Analysis":"v=2.4 cv=FKArAeos c=1 sm=1 tr=0 ts=69d889e9 cx=c_pps\n a=3Bg1Hr4SwmMryq2xdFQyZA==:117 a=3Bg1Hr4SwmMryq2xdFQyZA==:17\n a=IkcTkHD0fZMA:10 a=A5OVakUREuEA:10 a=f7IdgyKtn90A:10\n a=VkNPw1HP01LnGYTKEx00:22 a=RnoormkPH1_aCDwRdu11:22 a=uAbxVGIbfxUO_5tXvNgY:22\n a=VnNF1IyMAAAA:8 a=w4t-TaBqUuFs90xPVssA:9 a=3ZKOabzyN94A:10 a=QEXdDO2ut3YA:10\n a=O8hF6Hzn-FEA:10","X-Proofpoint-ORIG-GUID":"IBlTKh2k6U6dyEcgjtp32SmolDJfxxnE","X-Proofpoint-Spam-Details-Enc":"AW1haW4tMjYwNDEwMDA0NiBTYWx0ZWRfX++9UpGhiXo6U\n P58J3JyCoX1Rabk8/E4rv59LNEwFZ+f7LLKFpJNmqYovIoB01f+MDC9GbPPLzRuNwZumBfBvqli\n gabQridAxLfNMXZpkV0VYq6S71E+sUAT44oeECsmJcs2csr6LRPxmRHziXYj/qh5d9fkFA5Eea8\n GXIZ16CjaMMG80m0aunaEMGJaEJj0i9mojhogLe9GdHiuF+C3PWjM+CAwssPVe06e2ok34qGlko\n xp0Ym2SkJATvm5UF2gC5MGVn2G6k7LPFLNnJT8/rL6alOtXjC3c4NJ/MUYJZvBZT7hmr3ogwifS\n HW9cm6NbVzAeNe14O5tnjNYm4IL6OrlsEkJb6lLcK0h4jXX3vJdRPb+FRjZl0H4que+D4idrXxI\n 8YXG6NTVzMpaymT9zX+bJgKVLSJCdKONis1cZlNo78TFZM9w+Edb9DAeWiL9sfFwwhP6wmoD9/R\n ZgtnnFvV4CfzecsaXeA==","X-Proofpoint-GUID":"Vpdr0Co6l27B53CpIggg0OQEJN8M0qFt","X-Proofpoint-Virus-Version":"vendor=baseguard\n engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49\n definitions=2026-04-10_01,2026-04-09_02,2025-10-01_01","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n malwarescore=0 clxscore=1015 lowpriorityscore=0 adultscore=0 bulkscore=0\n suspectscore=0 priorityscore=1501 impostorscore=0 spamscore=0 phishscore=0\n classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0\n reason=mlx scancount=1 engine=8.22.0-2604010000 definitions=main-2604100046","Received-SPF":"pass 
client-ip=148.163.156.1;\n envelope-from=harshpb@linux.ibm.com;\n helo=mx0a-001b2d01.pphosted.com","X-Spam_score_int":"-26","X-Spam_score":"-2.7","X-Spam_bar":"--","X-Spam_report":"(-2.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7,\n RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675676,"web_url":"http://patchwork.ozlabs.org/comment/3675676/","msgid":"<451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>","list_archive_url":null,"date":"2026-04-10T06:35:32","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":86030,"url":"http://patchwork.ozlabs.org/api/people/86030/","name":"Ani Sinha","email":"anisinha@redhat.com"},"content":"> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n> \n> Hi Ani,\n> \n> On 10/04/26 9:12 am, Ani Sinha wrote:\n>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>> \n>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>>> returning true, it was returning before releasing 
the BQL (Big QEMU Lock).\n>>> This caused a lock imbalance where the vCPU thread would loop back to\n>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?\n> \n> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.\n> \n> Normal execution path:\n> \n> int kvm_cpu_exec(CPUState *cpu)\n> {\n>    // BQL held on entry (from caller)\n> \n>    if (kvm_arch_process_async_events(cpu)) {\n>        return EXCP_HLT;  // ← Returns with BQL STILL HELD\n>    }\n> \n>    bql_unlock();  // ← Normal path unlocks here\n>    // ... KVM execution loop ...\n>    bql_lock();    // ← Re-acquires before returning\n>    return ret;\n> }\n\nYes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.\n\nSo if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.\n\n\n> \n> The lock imbalance:\n> \n> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.\n\nWhy should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.\n\n> This means the early return path keeps the BQL held,\n\nThis would be the correct thing to do.\n\n> while the normal execution path releases and re-acquires it.\n\nBecause the functions it calls after unlocking requires bql to be unlocked. 
Since it had to unlock it, it locks it again before returning.\n\n> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration\n> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.\n> \n> Why this matters:\n> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.\n\nThis seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?\n\n> \n>>> \n>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple\n>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and\n>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held\n>>> the BQL, preventing other operations from proceeding.\n>>> \n>>> The fix has two parts:\n>>> \n>>> 1. In kvm_cpu_exec() (kvm-all.c):\n>>>   Release the BQL before returning EXCP_HLT in the early return path,\n>>>   matching the behavior of the normal execution path where bql_unlock()\n>>>   is called before entering the main KVM execution loop.\n>>> \n>>> 2. 
In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>>>   Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>>>   loop expects to hold the BQL when calling kvm_cpu_exec() again.\n>>> \n>>> This ensures proper BQL lock/unlock pairing:\n>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n>>> - Next iteration has BQL held as expected\n>>> \n>>> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n>>> changes required to support KVM VM file descriptor change\") which\n>>> refactored kvm_irqchip_create() and changed the initialization timing,\n>>> exposing this lock imbalance issue.\n>>> \n>>> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file descriptor change\")\n>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.\n>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.\n> \n> Regarding commit 98884e0cc1:\n> \n> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). 
We can either:\n> \n> Remove the \"Fixes:\" tag entirely, or\n\nLets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.\n\n> Add a note that this is a pre-existing issue exposed by timing changes\n> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.\n> \n>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n>>> ---\n>>> accel/kvm/kvm-accel-ops.c | 4 ++++\n>>> accel/kvm/kvm-all.c       | 1 +\n>>> 2 files changed, 5 insertions(+)\n>>> \n>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c\n>>> index 6d9140e549..d684fd0840 100644\n>>> --- a/accel/kvm/kvm-accel-ops.c\n>>> +++ b/accel/kvm/kvm-accel-ops.c\n>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)\n>>> \n>>>         if (cpu_can_run(cpu)) {\n>>>             r = kvm_cpu_exec(cpu);\n>>> +            if (r == EXCP_HLT) {\n>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */\n>>> +                bql_lock();\n>>> +            }\n>>>             if (r == EXCP_DEBUG) {\n>>>                 cpu_handle_guest_debug(cpu);\n>>>             }\n>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n>>> index 774499d34f..00b8018664 100644\n>>> --- a/accel/kvm/kvm-all.c\n>>> +++ b/accel/kvm/kvm-all.c\n>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>>>     trace_kvm_cpu_exec();\n>>> \n>>>     if (kvm_arch_process_async_events(cpu)) {\n>>> +        bql_unlock();\n>>>         return EXCP_HLT;\n>>>     }\n>>> \n>>> -- \n>>> 2.52.0","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) 
header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=iVNLFn4u;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsRs86Xlnz1yGb\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 16:36:10 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB5TB-0004K6-86; Fri, 10 Apr 2026 02:35:59 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wB5T9-0004Ja-Tn\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 02:35:55 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.133.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wB5T7-0000Bw-Fi\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 02:35:55 -0400","from mail-pf1-f197.google.com (mail-pf1-f197.google.com\n [209.85.210.197]) by relay.mimecast.com with ESMTP with STARTTLS\n (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n us-mta-372-H14E6qzDPeu02N7fmIN9vQ-1; Fri, 10 Apr 2026 02:35:49 -0400","by mail-pf1-f197.google.com with SMTP id\n d2e1a72fcca58-82c83bd48afso917024b3a.3\n for <qemu-ppc@nongnu.org>; Thu, 09 Apr 2026 23:35:49 -0700 (PDT)","from smtpclient.apple ([122.163.114.34])\n by smtp.gmail.com with ESMTPSA id\n 41be03b00d2f7-c79218fc64csm1394428a12.11.2026.04.09.23.35.44\n (version=TLS1_2 
cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\n Thu, 09 Apr 2026 23:35:47 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1775802951;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n to:to:cc:cc:mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=Ffs4c6Vnk9DfpY4ug6L8Rv+31wi69rvtvWsqjYVdBiM=;\n b=iVNLFn4uXXqbEbRUzQpzj+wLwUZF5LpsO2k9dUFCwNkcZavNS0lOTT1yD+2KzeN6GxvPDQ\n Id9z0LqWWbaIogRjiQcjXiRHOo42oqfNaQGXYeJz+KNK8lA5Yg3kVoq8HRs3EnAl5yvwuI\n UJBC0W5EZuilfo2f+feMFz3IGhTa7uY=","X-MC-Unique":"H14E6qzDPeu02N7fmIN9vQ-1","X-Mimecast-MFC-AGG-ID":"H14E6qzDPeu02N7fmIN9vQ_1775802949","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1775802948; x=1776407748;\n h=to:references:message-id:content-transfer-encoding:cc:date\n :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state\n :from:to:cc:subject:date:message-id:reply-to;\n bh=me1HUP1kyvRB60+ew96olFdIdRAU4EYzbu0s3mn49GU=;\n b=rh6PT7j7bLybqGrpsCIAeaEgsGoyY1Mnrpx0phLA3pHvtOSKNd1sbPWrsD2nH8x4tc\n MdLG6canwQa6QFmOs3uGu5TYHJdRrepT3uOARP9Nlmh3ZsJZ5aMZlXHr2Qm8iB1BE45v\n ljoC31951vfFx4+Ufilb1V4gJRiC9f5y9RrNZA5KLmt7wjpjnBuLRUOixZpPfVkVKwI4\n Vr3hktBOWH1qx0KcX+NYroeQ2T1tr0gBz6QvL7wE7b7khW8rC9Gg1TPF0QlKIYbWlzur\n e8s/9NGtbXnysug6v0usZZHu3gMK5d//c/rGVBlCMwvUO6beQ4VJJCAUFjgJTNysv3fm\n bGEQ==","X-Forwarded-Encrypted":"i=1;\n AJvYcCWndVt/vxz9j8i8XfcmYRv60X8q+mcHRjrmbmwAQaVoyrzYg6wABIysIPZK/D95rwjbxj24XtIAsw==@nongnu.org","X-Gm-Message-State":"AOJu0YwH2DI0kZmxNo68Hy0rvs47eQg8TPalvixP9qWh3McICefPW3lZ\n lDZmDfGtiwZh7MB+IKosGTGMiUefVMOEvd39ypW6iYODV+6whX8GCJH5zFYavJI2yStGWhHDQxK\n kjBxn4dmWvC6OPCS/6HeNiJ4A2uAR/QW18nIKfekGvLn383mpGbJA4Q==","X-Gm-Gg":"AeBDievQHRzWfoZG6kQ7lS+Z+a+O5J+ysPVSUHPfiyDUnbitbQEllUZtwW0+k1zMZ6k\n 8ad2AVs5lFUU075S7tY7U8qjOdzFYIE/jMbm/fC2AdkhteLH+9OPhmJdVsMBEM+JGmoHNV5MIFW\n 
/O3gsobMUyGJ/IwrJQircsmxW6R7k+bX1VSbeIaaxi3r9fzhLD/aOqfn/TFAhF2vqMQi7QGQN1N\n LHeKJAWTb6S+4wOao6I54fRMEuU0xfCwVsT2d26IHHod8+A7MWVHvuxE9zp5ERZmEWFKsY0ssCa\n 21lL/hN6FDW8mzSlwI+66UMv/svsMyG+ZIloSvd4WifKI5ruIBExsNmxit+9QY+wu0lU0rDLeeU\n q9fenUUGSiJOOjhbzN6inu0+v4ftETichPCZou3HtwSnVQh5L644S0D32bN0uGwu37aV9tOx+xd\n 8=","X-Received":["by 2002:a05:6a00:369b:b0:82c:2468:a163 with SMTP id\n d2e1a72fcca58-82f0c22190cmr2176782b3a.34.1775802948424;\n Thu, 09 Apr 2026 23:35:48 -0700 (PDT)","by 2002:a05:6a00:369b:b0:82c:2468:a163 with SMTP id\n d2e1a72fcca58-82f0c22190cmr2176744b3a.34.1775802947823;\n Thu, 09 Apr 2026 23:35:47 -0700 (PDT)"],"Mime-Version":"1.0 (Mac OS X Mail 16.0 \\(3864.500.181\\))","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","From":"Ani Sinha <anisinha@redhat.com>","In-Reply-To":"<4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>","Date":"Fri, 10 Apr 2026 12:05:32 +0530","Cc":"qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","Message-Id":"<451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n <4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>","To":"Harsh Prateek Bora <harshpb@linux.ibm.com>","X-Mailer":"Apple Mail (2.3864.500.181)","X-Mimecast-Spam-Score":"0","X-Mimecast-MFC-PROC-ID":"EP8wNB2i84q269MzT357RPyyoLRdSpyHDazWsvrjii0_1775802949","X-Mimecast-Originator":"redhat.com","Content-Type":"text/plain;\n\tcharset=utf-8","Content-Transfer-Encoding":"quoted-printable","Received-SPF":"pass client-ip=170.10.133.124;\n envelope-from=anisinha@redhat.com;\n helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"-25","X-Spam_score":"-2.6","X-Spam_bar":"--","X-Spam_report":"(-2.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.54,\n DKIM_SIGNED=0.1, 
DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675691,"web_url":"http://patchwork.ozlabs.org/comment/3675691/","msgid":"<4fee7176e93e91a75e39ef141db2675f@linux.ibm.com>","list_archive_url":null,"date":"2026-04-10T07:16:00","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":90528,"url":"http://patchwork.ozlabs.org/api/people/90528/","name":"Misbah Anjum N","email":"misanjum@linux.ibm.com"},"content":"Hi,\nI've tested the patch on PowerPC pseries machine and it resolves the \nboot hang issue seen on ppc when booting KVM guest with >1 smp value.\n\nTest Environment:\n- Host Arch: ppc64le\n- Host and Guest OS: Fedora 42\n- Machine Type: pseries with KVM acceleration\n- QEMU: Latest master with this patch applied\n\nTest Results:\nAll the following SMP topologies now boot successfully:\n\nSingle and simple multi-CPU:\n- -smp 1\n- -smp 2\n- -smp 4\n- -smp 32\n\nVarious socket/core/thread combinations (8 vCPUs):\n- -smp 8,sockets=8,cores=1,threads=1\n- -smp 
8,sockets=1,cores=8,threads=1\n- -smp 8,sockets=1,cores=1,threads=8\n- -smp 8,sockets=2,cores=4,threads=1\n- -smp 8,sockets=1,cores=4,threads=2\n- -smp 8,sockets=2,cores=1,threads=4\n- -smp 8,sockets=2,cores=2,threads=2\n\nHigher vCPU count:\n- -smp 16,sockets=2,cores=4,threads=2\n- -smp 32,sockets=1,cores=8,threads=4\n\nTested-by: Misbah Anjum N <misanjum@linux.ibm.com>\n\nThanks,\nMisbah Anjum N <misanjum@linux.ibm.com>\n\n\nOn 2026-04-09 21:40, Harsh Prateek Bora wrote:\n> When kvm_cpu_exec() returns EXCP_HLT due to \n> kvm_arch_process_async_events()\n> returning true, it was returning before releasing the BQL (Big QEMU \n> Lock).\n> This caused a lock imbalance where the vCPU thread would loop back to\n> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n> \n> The issue manifests as boot hangs on PowerPC pseries machines with \n> multiple\n> vCPUs, where secondary vCPUs with start-powered-off=true remain halted \n> and\n> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration \n> held\n> the BQL, preventing other operations from proceeding.\n> \n> The fix has two parts:\n> \n> 1. In kvm_cpu_exec() (kvm-all.c):\n>    Release the BQL before returning EXCP_HLT in the early return path,\n>    matching the behavior of the normal execution path where \n> bql_unlock()\n>    is called before entering the main KVM execution loop.\n> \n> 2. 
In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>    loop expects to hold the BQL when calling kvm_cpu_exec() again.\n> \n> This ensures proper BQL lock/unlock pairing:\n> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n> - Next iteration has BQL held as expected\n> \n> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n> changes required to support KVM VM file descriptor change\") which\n> refactored kvm_irqchip_create() and changed the initialization timing,\n> exposing this lock imbalance issue.\n> \n> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM\n> file descriptor change\")\n> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n> ---\n>  accel/kvm/kvm-accel-ops.c | 4 ++++\n>  accel/kvm/kvm-all.c       | 1 +\n>  2 files changed, 5 insertions(+)\n> \n> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c\n> index 6d9140e549..d684fd0840 100644\n> --- a/accel/kvm/kvm-accel-ops.c\n> +++ b/accel/kvm/kvm-accel-ops.c\n> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)\n> \n>          if (cpu_can_run(cpu)) {\n>              r = kvm_cpu_exec(cpu);\n> +            if (r == EXCP_HLT) {\n> +                /* kvm_cpu_exec() released BQL, re-acquire for next\n> iteration */\n> +                bql_lock();\n> +            }\n>              if (r == EXCP_DEBUG) {\n>                  cpu_handle_guest_debug(cpu);\n>              }\n> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n> index 774499d34f..00b8018664 100644\n> --- a/accel/kvm/kvm-all.c\n> +++ b/accel/kvm/kvm-all.c\n> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>      trace_kvm_cpu_exec();\n> \n>   
   if (kvm_arch_process_async_events(cpu)) {\n> +        bql_unlock();\n>          return EXCP_HLT;\n>      }","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256\n header.s=pp1 header.b=Sgrm+jrR;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsSlk5Smjz1yCv\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 17:16:32 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB667-0002KF-DI; Fri, 10 Apr 2026 03:16:11 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <misanjum@linux.ibm.com>)\n id 1wB666-0002K1-2c; Fri, 10 Apr 2026 03:16:10 -0400","from mx0b-001b2d01.pphosted.com ([148.163.158.5])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <misanjum@linux.ibm.com>)\n id 1wB664-0004hx-Aj; Fri, 10 Apr 2026 03:16:09 -0400","from pps.filterd (m0356516.ppops.net [127.0.0.1])\n by mx0a-001b2d01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id\n 639LDaWH2326338; Fri, 10 Apr 2026 07:16:05 GMT","from ppma11.dal12v.mail.ibm.com\n (db.9e.1632.ip4.static.sl-reverse.com [50.22.158.219])\n by 
mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4dcn2kqqj2-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Fri, 10 Apr 2026 07:16:04 +0000 (GMT)","from pps.filterd (ppma11.dal12v.mail.ibm.com [127.0.0.1])\n by ppma11.dal12v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id\n 63A3ch4X014378;\n Fri, 10 Apr 2026 07:16:03 GMT","from smtprelay01.wdc07v.mail.ibm.com ([172.16.1.68])\n by ppma11.dal12v.mail.ibm.com (PPS) with ESMTPS id 4dcmg4y0dw-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Fri, 10 Apr 2026 07:16:03 +0000","from smtpav05.wdc07v.mail.ibm.com (smtpav05.wdc07v.mail.ibm.com\n [10.39.53.232])\n by smtprelay01.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id\n 63A7G2YF65798452\n (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);\n Fri, 10 Apr 2026 07:16:02 GMT","from smtpav05.wdc07v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id EFC2C58043;\n Fri, 10 Apr 2026 07:16:01 +0000 (GMT)","from smtpav05.wdc07v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id 3D8A35805F;\n Fri, 10 Apr 2026 07:16:01 +0000 (GMT)","from ltc.linux.ibm.com (unknown [9.5.196.140])\n by smtpav05.wdc07v.mail.ibm.com (Postfix) with ESMTP;\n Fri, 10 Apr 2026 07:16:01 +0000 (GMT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc\n :content-transfer-encoding:content-type:date:from:in-reply-to\n :message-id:mime-version:references:subject:to; s=pp1; bh=NnRxgs\n 8uCVhsjkfC8QTCVCXzpNWTDgDGCkUn4+e7VUI=; b=Sgrm+jrRxMrcbnc2AsW0KR\n u/lI2M2vRvmzv/6jyaCLijxOmZI2+QAd2z08nhWOhbH97rwjG1YuEscOE4qoIWwR\n C+gCCjrRQ1Ui+cxg4unxCBxeBI+XbDRUHUsIhckTYkOTiDYjfqqbEiPUmhZ4nF5P\n DERLB74Pa7GPBk8YdUAuaXvftFzDs+jxOr/or+oCJOIPpW74ZoqG6La6Qh/x0vyb\n FF5Su3efLZvVdZqNw6eBzUKZa4A0LHrKY4vggr03H87emiAe+vowiGJxOBmjv+gv\n aA1YK96uT8fai0fnwraZI7IVcB92VXqMd/S3OCLTBwuf7O3UAQqSP1vLIyOcmfgA\n ==","MIME-Version":"1.0","Date":"Fri, 10 Apr 2026 12:46:00 +0530","From":"Misbah 
Anjum N <misanjum@linux.ibm.com>","To":"Harsh Prateek Bora <harshpb@linux.ibm.com>, Anisinha\n <anisinha@redhat.com>, Pbonzini <pbonzini@redhat.com>","Cc":"qemu-devel@nongnu.org, qemu-ppc@nongnu.org, npiggin@gmail.com,\n gautam@linux.ibm.com, peter.maydell@linaro.org","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","In-Reply-To":"<20260409161042.55281-1-harshpb@linux.ibm.com>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>","Message-ID":"<4fee7176e93e91a75e39ef141db2675f@linux.ibm.com>","X-Sender":"misanjum@linux.ibm.com","Organization":"IBM","Content-Type":"text/plain; charset=US-ASCII;\n format=flowed","Content-Transfer-Encoding":"7bit","X-TM-AS-GCONF":"00","X-Proofpoint-Reinject":"loops=2 maxloops=12","X-Proofpoint-Spam-Details-Enc":"AW1haW4tMjYwNDEwMDA2MSBTYWx0ZWRfX6jKPEGteKutG\n I53l9toAbqBbiI3Lg/zZM7B/q6GS+L0RAr44va7H0ry/fWw/iqytPuBiZhOnB/YvZIZowuI/iuK\n iZHAzFvu/cGWjQ5yLscJ1M97qJ5OLxeOtbqu1rL2qX5VWZC1FKMTCvr4wGuXDUci147UZOp4JFe\n 5mlOhFlgzdfpFLip1ROJ3BkmTfhjXp8e2p6JJVBXnnYWiETiz61MidNJ2xtF0LnVwvlJ+By2f9N\n Rh6zyL70M315y+uLO0L9vqNvll6LH4u8OhV1Wcmm1TrTRtoYVN1vwMfFK7lHXRjARvQAqlTZndz\n MTetl0Bnd9rjfEhX2+WLmQkZbeZJ30KqQqJ0qqchePGvQ3+laAsIpz2WdqumlRzwYMOrcIS+KEQ\n jUZXq9oqI+i3I/lZ0pyNTI/4QlLU68H4GKlWLnTOML6EjTsM96N2MJUHKn7dFAXcuW045JL2c4R\n /eA6Ps5FGNApxHymy5g==","X-Proofpoint-ORIG-GUID":"XbzVHOuZhd_9uTrAtaKohDaaRlicFaB5","X-Authority-Analysis":"v=2.4 cv=e9k2j6p/ c=1 sm=1 tr=0 ts=69d8a3b4 cx=c_pps\n a=aDMHemPKRhS1OARIsFnwRA==:117 a=aDMHemPKRhS1OARIsFnwRA==:17\n a=kj9zAlcOel0A:10 a=A5OVakUREuEA:10 a=VkNPw1HP01LnGYTKEx00:22\n a=RnoormkPH1_aCDwRdu11:22 a=Y2IxJ9c9Rs8Kov3niI8_:22 a=VnNF1IyMAAAA:8\n a=9oSHA60ZoXLfwu9LI_gA:9 a=CjuIK1q_8ugA:10 a=O8hF6Hzn-FEA:10","X-Proofpoint-GUID":"GETwo9QV28Wz_Sxq8bUNjdJgXVbjrhjM","X-Proofpoint-Virus-Version":"vendor=baseguard\n engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49\n 
definitions=2026-04-10_02,2026-04-09_02,2025-10-01_01","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n clxscore=1011 impostorscore=0 malwarescore=0 suspectscore=0 spamscore=0\n bulkscore=0 adultscore=0 priorityscore=1501 phishscore=0 lowpriorityscore=0\n classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0\n reason=mlx scancount=1 engine=8.22.0-2604010000 definitions=main-2604100061","Received-SPF":"pass client-ip=148.163.158.5;\n envelope-from=misanjum@linux.ibm.com; helo=mx0b-001b2d01.pphosted.com","X-Spam_score_int":"-26","X-Spam_score":"-2.7","X-Spam_bar":"--","X-Spam_report":"(-2.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7,\n RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675721,"web_url":"http://patchwork.ozlabs.org/comment/3675721/","msgid":"<8B87F317-5C9C-4CAE-AA50-2B5A12864B76@redhat.com>","list_archive_url":null,"date":"2026-04-10T08:15:02","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n 
kvm_cpu_exec","submitter":{"id":86030,"url":"http://patchwork.ozlabs.org/api/people/86030/","name":"Ani Sinha","email":"anisinha@redhat.com"},"content":"> On 10 Apr 2026, at 12:05 PM, Ani Sinha <anisinha@redhat.com> wrote:\n> \n> \n> \n>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>> \n>> Hi Ani,\n>> \n>> On 10/04/26 9:12 am, Ani Sinha wrote:\n>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>> \n>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n>>>> This caused a lock imbalance where the vCPU thread would loop back to\n>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?\n>> \n>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.\n>> \n>> Normal execution path:\n>> \n>> int kvm_cpu_exec(CPUState *cpu)\n>> {\n>>   // BQL held on entry (from caller)\n>> \n>>   if (kvm_arch_process_async_events(cpu)) {\n>>       return EXCP_HLT;  // ← Returns with BQL STILL HELD\n>>   }\n>> \n>>   bql_unlock();  // ← Normal path unlocks here\n>>   // ... KVM execution loop ...\n>>   bql_lock();    // ← Re-acquires before returning\n>>   return ret;\n>> }\n> \n> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.\n> \n> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. 
This makes the locking and unlocking symmetric.\n> \n> \n>> \n>> The lock imbalance:\n>> \n>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.\n> \n> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.\n> \n>> This means the early return path keeps the BQL held,\n> \n> This would be the correct thing to do.\n> \n>> while the normal execution path releases and re-acquires it.\n> \n> Because the functions it calls after unlocking requires bql to be unlocked.\n\nAnother likely reason is that we cannot have large atomic regions for performance reasons. So a general principle is to acquire the lock narrowly only for specific regions of the code that requires atomicity.","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=iECP9iUR;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsV4H5w1bz1yCv\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 18:15:59 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB71S-0000G7-3T; Fri, 10 Apr 2026 04:15:26 
-0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wB71Q-0000Fi-W8\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 04:15:25 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.133.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wB71O-0000Yc-V4\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 04:15:24 -0400","from mail-pg1-f200.google.com (mail-pg1-f200.google.com\n [209.85.215.200]) by relay.mimecast.com with ESMTP with STARTTLS\n (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n us-mta-68-NjNNqMBhOhO1OYc99d01WA-1; Fri, 10 Apr 2026 04:15:19 -0400","by mail-pg1-f200.google.com with SMTP id\n 41be03b00d2f7-c70ea91bfe1so1038041a12.1\n for <qemu-ppc@nongnu.org>; Fri, 10 Apr 2026 01:15:19 -0700 (PDT)","from smtpclient.apple ([122.163.114.34])\n by smtp.gmail.com with ESMTPSA id\n d2e1a72fcca58-82f0c4e23d9sm2248944b3a.46.2026.04.10.01.15.14\n (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\n Fri, 10 Apr 2026 01:15:17 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1775808921;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n to:to:cc:cc:mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=szSW13HtPLHFIdx0PlVxdduTI7q/B+6Syp4uvC8tOec=;\n b=iECP9iUR3+st5YD+HBE4raXezwTnIAutLcFj7lXJnGlZlQ4apsqmxD4QuuufRjxy+JEHMs\n dnHQeXFRbBYZAh99FisH90wixHgoKYq1GhvT6JUMDkMaMkakoG8QBG3i9OL4oU71+eGaAp\n B4ZJ+OzwAsjBjUYGnxtmzjGhvxaMLxY=","X-MC-Unique":"NjNNqMBhOhO1OYc99d01WA-1","X-Mimecast-MFC-AGG-ID":"NjNNqMBhOhO1OYc99d01WA_1775808918","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1775808918; x=1776413718;\n 
h=to:references:message-id:content-transfer-encoding:cc:date\n :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state\n :from:to:cc:subject:date:message-id:reply-to;\n bh=tfTs/yau8cHNI4C5o+BGrWbtz6MfsiSDhARV/1OFHwc=;\n b=fozI+GCNjEAjJ/Tl3O4yuS3EceFTqS6lDyNXLI6SClU7jDkNrnU3BFpxpUdOjahslx\n +q4YNXzzECGhfZIH1jLcbpP/3D33NHOKGVW9IGWa0vp5dcbVzRfGdXJSHta0yGWPGDFj\n DHdCDsjfDaHsZyeKMXyBw8oZnfY+jEvszQEk47ZWYHyXeGjKnGtiRehLh1HoPC/zseHK\n PzZ2uww+Ri2Ry8AK/4MxFliTx8fxZ10O+E7PvY70809JSG5NsmEdHkuluIoaOykG/bJI\n cGAaFHr/7jXOfp4Z65z9QBYBYcEovcJywSbd53x8CLRVwKJd+dbIDU/Nm4t7Hdm/lDPF\n a7Ag==","X-Forwarded-Encrypted":"i=1;\n AJvYcCW717cWleOosD0OOHwITwwhCYFiAwflSfUElynzyxZBVAhmeFa0pkbjQEbsF3G3f7+goxWjapB0Ug==@nongnu.org","X-Gm-Message-State":"AOJu0YxmXaxouMRNj5OGNTKj173O3hGhZ2LKlQxVPRh/OzZX8y03bwse\n ue4cR0hmWkybck12qEs1pX20XfPqkBk/1q8XanMlqF0Q7SEwZPDf+hu5c9Mi/nWlxlwiCsg2OU+\n JBLXvmKCMrdOInnuoFaSD9bQ5YwkQu9qy9UIEhiBug6Fy3PpmY7a3sQ==","X-Gm-Gg":"AeBDies4oFNg3h/zgFEqivoIidI/csbHc+EooUozvoxGbx9o+8gKGxQ5T26A7LO6//M\n bamBZU6a43Alu8aJ/EAIQXwiICfbr0jUzKVx3DLLkbWTKGAGrOHHx1QUEyUYwwsBLMirX8w7ICW\n Ic4jpKQpS2K1QWKhYPSV2eiszvHvkh6OXxPrzvHyhNWj4BbckrLN3Z/ahLWRSJljO93sIyCUeA9\n XVz3mr68QZRAgc0MQPUkFFskhmKJ2jvzXugyGgIYZ5870MPjpKJww7gsfTuSL1A9Y9s7mZPfbMo\n TsZf6PILTOKpIL/eyAntFn8N6UaV1ulXD+yQ0q/S59KKRPBkPfKfcCdUqZE0F8uSCzt/MJyCimo\n vcdAMhwUZ7G0+KCeL97rYl5inWakESeams6TfFzPeAOzpfGGdY05/y/LTTp3OipHYxZPZga0EMr\n Q=","X-Received":["by 2002:a05:6a00:369b:b0:82d:603f:7ab5 with SMTP id\n d2e1a72fcca58-82f0c221b7dmr2627650b3a.35.1775808918227;\n Fri, 10 Apr 2026 01:15:18 -0700 (PDT)","by 2002:a05:6a00:369b:b0:82d:603f:7ab5 with SMTP id\n d2e1a72fcca58-82f0c221b7dmr2627609b3a.35.1775808917726;\n Fri, 10 Apr 2026 01:15:17 -0700 (PDT)"],"Mime-Version":"1.0 (Mac OS X Mail 16.0 \\(3864.500.181\\))","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","From":"Ani Sinha 
<anisinha@redhat.com>","In-Reply-To":"<451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>","Date":"Fri, 10 Apr 2026 13:45:02 +0530","Cc":"qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","Message-Id":"<8B87F317-5C9C-4CAE-AA50-2B5A12864B76@redhat.com>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n <4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>\n <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>","To":"Harsh Prateek Bora <harshpb@linux.ibm.com>","X-Mailer":"Apple Mail (2.3864.500.181)","X-Mimecast-Spam-Score":"0","X-Mimecast-MFC-PROC-ID":"3UAyNUsXEHUc-0GkwrbtyFmgCR0mXgmEizUmf01wKmI_1775808918","X-Mimecast-Originator":"redhat.com","Content-Type":"text/plain;\n\tcharset=utf-8","Content-Transfer-Encoding":"quoted-printable","Received-SPF":"pass client-ip=170.10.133.124;\n envelope-from=anisinha@redhat.com;\n helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"-25","X-Spam_score":"-2.6","X-Spam_bar":"--","X-Spam_report":"(-2.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.54,\n DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n 
<mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675724,"web_url":"http://patchwork.ozlabs.org/comment/3675724/","msgid":"<424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>","list_archive_url":null,"date":"2026-04-10T08:18:08","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":85411,"url":"http://patchwork.ozlabs.org/api/people/85411/","name":"Harsh Prateek Bora","email":"harshpb@linux.ibm.com"},"content":"On 10/04/26 12:05 pm, Ani Sinha wrote:\n> \n> \n>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>\n>> Hi Ani,\n>>\n>> On 10/04/26 9:12 am, Ani Sinha wrote:\n>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>\n>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n>>>> This caused a lock imbalance where the vCPU thread would loop back to\n>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?\n>>\n>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. 
The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.\n>>\n>> Normal execution path:\n>>\n>> int kvm_cpu_exec(CPUState *cpu)\n>> {\n>>     // BQL held on entry (from caller)\n>>\n>>     if (kvm_arch_process_async_events(cpu)) {\n>>         return EXCP_HLT;  // ← Returns with BQL STILL HELD\n>>     }\n>>\n>>     bql_unlock();  // ← Normal path unlocks here\n>>     // ... KVM execution loop ...\n>>     bql_lock();    // ← Re-acquires before returning\n>>     return ret;\n>> }\n> \n> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.\n> \n> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.\n> \n> \n>>\n>> The lock imbalance:\n>>\n>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.\n> \n> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.\n> \n>> This means the early return path keeps the BQL held,\n> \n> This would be the correct thing to do.\n> \n>> while the normal execution path releases and re-acquires it.\n> \n> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.\n\nIt had to unlock it for the same reason - to give others a chance to \nlock. 
We need to handle failure/exception cases for the same purpose as \nwell.\n\n> \n>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration\n>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.\n>>\n>> Why this matters:\n>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.\n> \n> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?\n\nKVM vCPUs need to enter the kernel to handle the halted state and \ntherefore can run. On spapr, it is handled via start-cpu rtas call for \nwhich the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to \nbe able to proceed before that stage is reached, but it hangs while \ntrying to acquire bql_lock in qemu_default_main() whereas secondary vcpu \nis spinning with BQL held returning EXCP_HLT. This is causing deadlock.\n\n> \n>>\n>>>>\n>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple\n>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and\n>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held\n>>>> the BQL, preventing other operations from proceeding.\n>>>>\n>>>> The fix has two parts:\n>>>>\n>>>> 1. In kvm_cpu_exec() (kvm-all.c):\n>>>>    Release the BQL before returning EXCP_HLT in the early return path,\n>>>>    matching the behavior of the normal execution path where bql_unlock()\n>>>>    is called before entering the main KVM execution loop.\n>>>>\n>>>> 2. 
In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>>>>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>>>>    loop expects to hold the BQL when calling kvm_cpu_exec() again.\n>>>>\n>>>> This ensures proper BQL lock/unlock pairing:\n>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n>>>> - Next iteration has BQL held as expected\n>>>>\n>>>> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n>>>> changes required to support KVM VM file descriptor change\") which\n>>>> refactored kvm_irqchip_create() and changed the initialization timing,\n>>>> exposing this lock imbalance issue.\n>>>>\n>>>> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file descriptor change\")\n>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.\n>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.\n>>\n>> Regarding commit 98884e0cc1:\n>>\n>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). 
We can either:\n>>\n>> Remove the \"Fixes:\" tag entirely, or\n> \n> Lets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.\n> \n>> Add a note that this is a pre-existing issue exposed by timing changes\n>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.\n>>\n>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n>>>> ---\n>>>> accel/kvm/kvm-accel-ops.c | 4 ++++\n>>>> accel/kvm/kvm-all.c       | 1 +\n>>>> 2 files changed, 5 insertions(+)\n>>>>\n>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c\n>>>> index 6d9140e549..d684fd0840 100644\n>>>> --- a/accel/kvm/kvm-accel-ops.c\n>>>> +++ b/accel/kvm/kvm-accel-ops.c\n>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)\n>>>>\n>>>>          if (cpu_can_run(cpu)) {\n>>>>              r = kvm_cpu_exec(cpu);\n>>>> +            if (r == EXCP_HLT) {\n>>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */\n>>>> +                bql_lock();\n>>>> +            }\n>>>>              if (r == EXCP_DEBUG) {\n>>>>                  cpu_handle_guest_debug(cpu);\n>>>>              }\n>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n>>>> index 774499d34f..00b8018664 100644\n>>>> --- a/accel/kvm/kvm-all.c\n>>>> +++ b/accel/kvm/kvm-all.c\n>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>>>>      trace_kvm_cpu_exec();\n>>>>\n>>>>      if (kvm_arch_process_async_events(cpu)) {\n>>>> +        bql_unlock();\n>>>>          return EXCP_HLT;\n>>>>      }\n>>>>\n>>>> -- \n>>>> 2.52.0\n> \n> 
\n>","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256\n header.s=pp1 header.b=Yv5pJBCJ;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsV786t16z1yGS\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 18:18:28 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB74H-0001EF-Sg; Fri, 10 Apr 2026 04:18:21 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wB74H-0001Dq-0N; Fri, 10 Apr 2026 04:18:21 -0400","from mx0a-001b2d01.pphosted.com ([148.163.156.1])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wB74E-0000s1-V1; Fri, 10 Apr 2026 04:18:20 -0400","from pps.filterd (m0353729.ppops.net [127.0.0.1])\n by mx0a-001b2d01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id\n 639KHlYA2162309; Fri, 10 Apr 2026 08:18:16 GMT","from ppma21.wdc07v.mail.ibm.com\n (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91])\n by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4dcn2g9067-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 
bits=256 verify=NOT);\n Fri, 10 Apr 2026 08:18:15 +0000 (GMT)","from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1])\n by ppma21.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id\n 63A30wnc030102;\n Fri, 10 Apr 2026 08:18:14 GMT","from smtprelay02.wdc07v.mail.ibm.com ([172.16.1.69])\n by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 4dcme7q5kw-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Fri, 10 Apr 2026 08:18:14 +0000","from smtpav03.dal12v.mail.ibm.com (smtpav03.dal12v.mail.ibm.com\n [10.241.53.102])\n by smtprelay02.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id\n 63A8IDH122151812\n (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);\n Fri, 10 Apr 2026 08:18:13 GMT","from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id F1AF858060;\n Fri, 10 Apr 2026 08:18:12 +0000 (GMT)","from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id 3DDD758056;\n Fri, 10 Apr 2026 08:18:10 +0000 (GMT)","from [9.124.212.60] (unknown [9.124.212.60])\n by smtpav03.dal12v.mail.ibm.com (Postfix) with ESMTP;\n Fri, 10 Apr 2026 08:18:09 +0000 (GMT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc\n :content-transfer-encoding:content-type:date:from:in-reply-to\n :message-id:mime-version:references:subject:to; s=pp1; bh=hdO8+n\n 4suFybdBAwaxNVR3b5hvXVaCTpXvsUvrrA0h0=; b=Yv5pJBCJx/dD509sESChvM\n GXo8TRESlxPPLXvRQoIHX6u5GsmJYUdx3xHCk20642YKwmJztHEcOv+OMN8yFFb1\n TvgZYD21xhbZrFPDvClkEUv0OSfG1WFQvucdSVSIUxd01BMhA0paxe+njdFheC1s\n s07ppIhgElPfK8fKejQIZqa/TMjTMYaweyKa740VmOi9E1rypsYVISjeu5tkHsvc\n 3P/F+jRWtvq2LRYgFKUp11Bq8sFS91w2eMpfYgwa3ckOiM/GvlihurWiyJyLxpcv\n R8aNVoDmcH/LD2e4ct02J8udUJi7CvIVNKu9cUnkcxtL6k55FPYY9+3pHoYyBrDw\n ==","Message-ID":"<424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>","Date":"Fri, 10 Apr 2026 13:48:08 +0530","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH for 
11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","Content-Language":"en-GB","To":"Ani Sinha <anisinha@redhat.com>","Cc":"qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n <4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>\n <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>","From":"Harsh Prateek Bora <harshpb@linux.ibm.com>","In-Reply-To":"<451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>","Content-Type":"text/plain; charset=UTF-8; format=flowed","Content-Transfer-Encoding":"8bit","X-TM-AS-GCONF":"00","X-Proofpoint-Reinject":"loops=2 maxloops=12","X-Authority-Analysis":"v=2.4 cv=FKArAeos c=1 sm=1 tr=0 ts=69d8b248 cx=c_pps\n a=GFwsV6G8L6GxiO2Y/PsHdQ==:117 a=GFwsV6G8L6GxiO2Y/PsHdQ==:17\n a=IkcTkHD0fZMA:10 a=A5OVakUREuEA:10 a=f7IdgyKtn90A:10\n a=VkNPw1HP01LnGYTKEx00:22 a=RnoormkPH1_aCDwRdu11:22 a=uAbxVGIbfxUO_5tXvNgY:22\n a=VnNF1IyMAAAA:8 a=4Vvs_npgFDSUEtlqYM4A:9 a=3ZKOabzyN94A:10 a=QEXdDO2ut3YA:10\n a=O8hF6Hzn-FEA:10","X-Proofpoint-ORIG-GUID":"K65FOcmf5VoFE-lbOJLJqh07Uw0k2X4u","X-Proofpoint-Spam-Details-Enc":"AW1haW4tMjYwNDEwMDA3NiBTYWx0ZWRfXzSBySWoZW9PO\n wX+oyP6HtZg3UB9jY8uhE26R8SrEufOA3qihIwkTQ04FayBKdeCHRTMAU/L+RUvUVJqs0ydsTLQ\n aMmHAWJkKySwljXyMGRRhRJAXPg9OUS6Zvk6OXoaZUopg0ohlVBJeC4/sClZdjZGGjIarJgeavd\n YDRA2TQnCXQn7VBoqqcYrrnatBsVIyE0xJDAPBJh0M5oRv4oWInhlX40sbNcZ7YEHSxjeRvqdvv\n xqsMbBbwavrvhncXOM3Pp12oOqtkHZewtpYJ2ldksFDr507EmExSEjRtL7CWQu/t8CWKIJkRXTO\n 8Ib6kdMb/mQUXvnOz4WtsBgmzPwCKCTov4+QtSoU0gMgdG9iCj6gv03OzY9nZq2ML6Xf+rZPHiJ\n jaNG/x7v6n17MhDBHx1f7YrH2Fh1mz3bUls67uZe2QZJMbdtQVJHQLC0rtlFIcz3asunrEJYFJY\n YD3RSBuzcLmp0BWADdQ==","X-Proofpoint-GUID":"K9zOq34lySPhy_-586nQu2S7aftVUI45","X-Proofpoint-Virus-Version":"vendor=baseguard\n 
engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49\n definitions=2026-04-10_02,2026-04-09_02,2025-10-01_01","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n malwarescore=0 clxscore=1015 lowpriorityscore=0 adultscore=0 bulkscore=0\n suspectscore=0 priorityscore=1501 impostorscore=0 spamscore=0 phishscore=0\n classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0\n reason=mlx scancount=1 engine=8.22.0-2604010000 definitions=main-2604100076","Received-SPF":"pass client-ip=148.163.156.1;\n envelope-from=harshpb@linux.ibm.com;\n helo=mx0a-001b2d01.pphosted.com","X-Spam_score_int":"-26","X-Spam_score":"-2.7","X-Spam_bar":"--","X-Spam_report":"(-2.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7,\n RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675731,"web_url":"http://patchwork.ozlabs.org/comment/3675731/","msgid":"<69167029-AE49-4BB3-9A5C-D16E51E5D40F@redhat.com>","list_archive_url":null,"date":"2026-04-10T08:29:39","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock 
imbalance in\n kvm_cpu_exec","submitter":{"id":86030,"url":"http://patchwork.ozlabs.org/api/people/86030/","name":"Ani Sinha","email":"anisinha@redhat.com"},"content":"> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n> \n> \n> \n> On 10/04/26 12:05 pm, Ani Sinha wrote:\n>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>> \n>>> Hi Ani,\n>>> \n>>> On 10/04/26 9:12 am, Ani Sinha wrote:\n>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>> \n>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n>>>>> This caused a lock imbalance where the vCPU thread would loop back to\n>>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?\n>>> \n>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.\n>>> \n>>> Normal execution path:\n>>> \n>>> int kvm_cpu_exec(CPUState *cpu)\n>>> {\n>>>    // BQL held on entry (from caller)\n>>> \n>>>    if (kvm_arch_process_async_events(cpu)) {\n>>>        return EXCP_HLT;  // ← Returns with BQL STILL HELD\n>>>    }\n>>> \n>>>    bql_unlock();  // ← Normal path unlocks here\n>>>    // ... KVM execution loop ...\n>>>    bql_lock();    // ← Re-acquires before returning\n>>>    return ret;\n>>> }\n>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. 
This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.\n>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.\n>>> \n>>> The lock imbalance:\n>>> \n>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.\n>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.\n>>> This means the early return path keeps the BQL held,\n>> This would be the correct thing to do.\n>>> while the normal execution path releases and re-acquires it.\n>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.\n> \n> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.\n\nBut by unlocking and returning you are breaking the semantics of the function and introducing imbalance.\n\n> \n>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration\n>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.\n>>> \n>>> Why this matters:\n>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.\n>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? 
Then the main cpu does a qemu_cpu_kick() to make them active again at some point?\n> \n> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. This is causing deadlock.\n\nWithout looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. I think that needs fixing. \n\n> \n>>> \n>>>>> \n>>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple\n>>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and\n>>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held\n>>>>> the BQL, preventing other operations from proceeding.\n>>>>> \n>>>>> The fix has two parts:\n>>>>> \n>>>>> 1. In kvm_cpu_exec() (kvm-all.c):\n>>>>>   Release the BQL before returning EXCP_HLT in the early return path,\n>>>>>   matching the behavior of the normal execution path where bql_unlock()\n>>>>>   is called before entering the main KVM execution loop.\n>>>>> \n>>>>> 2. 
In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>>>>>   Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>>>>>   loop expects to hold the BQL when calling kvm_cpu_exec() again.\n>>>>> \n>>>>> This ensures proper BQL lock/unlock pairing:\n>>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n>>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n>>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n>>>>> - Next iteration has BQL held as expected\n>>>>> \n>>>>> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n>>>>> changes required to support KVM VM file descriptor change\") which\n>>>>> refactored kvm_irqchip_create() and changed the initialization timing,\n>>>>> exposing this lock imbalance issue.\n>>>>> \n>>>>> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file descriptor change\")\n>>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.\n>>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.\n>>> \n>>> Regarding commit 98884e0cc1:\n>>> \n>>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). 
We can either:\n>>> \n>>> Remove the \"Fixes:\" tag entirely, or\n>> Lets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.\n>>> Add a note that this is a pre-existing issue exposed by timing changes\n>>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.\n>>> \n>>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n>>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n>>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n>>>>> ---\n>>>>> accel/kvm/kvm-accel-ops.c | 4 ++++\n>>>>> accel/kvm/kvm-all.c       | 1 +\n>>>>> 2 files changed, 5 insertions(+)\n>>>>> \n>>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c\n>>>>> index 6d9140e549..d684fd0840 100644\n>>>>> --- a/accel/kvm/kvm-accel-ops.c\n>>>>> +++ b/accel/kvm/kvm-accel-ops.c\n>>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)\n>>>>> \n>>>>>         if (cpu_can_run(cpu)) {\n>>>>>             r = kvm_cpu_exec(cpu);\n>>>>> +            if (r == EXCP_HLT) {\n>>>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */\n>>>>> +                bql_lock();\n>>>>> +            }\n>>>>>             if (r == EXCP_DEBUG) {\n>>>>>                 cpu_handle_guest_debug(cpu);\n>>>>>             }\n>>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n>>>>> index 774499d34f..00b8018664 100644\n>>>>> --- a/accel/kvm/kvm-all.c\n>>>>> +++ b/accel/kvm/kvm-all.c\n>>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>>>>>     trace_kvm_cpu_exec();\n>>>>> \n>>>>>     if (kvm_arch_process_async_events(cpu)) {\n>>>>> +        bql_unlock();\n>>>>>         return EXCP_HLT;\n>>>>>     }\n>>>>> \n>>>>> -- \n>>>>> 
2.52.0","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=jJaIX8ij;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsVNk6vFkz1yGS\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 18:30:13 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB7Fb-0007Zq-3y; Fri, 10 Apr 2026 04:30:03 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wB7Fa-0007U2-2E\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 04:30:02 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.133.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wB7FX-0005Mg-G4\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 04:30:01 -0400","from mail-pf1-f197.google.com (mail-pf1-f197.google.com\n [209.85.210.197]) by relay.mimecast.com with ESMTP with STARTTLS\n (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n us-mta-570-6wuePsvDOjCN1DwcMIFkTQ-1; Fri, 10 Apr 2026 04:29:56 -0400","by mail-pf1-f197.google.com with SMTP id\n 
d2e1a72fcca58-82f07cc6590so660651b3a.3\n for <qemu-ppc@nongnu.org>; Fri, 10 Apr 2026 01:29:56 -0700 (PDT)","from smtpclient.apple ([122.163.114.34])\n by smtp.gmail.com with ESMTPSA id\n d2e1a72fcca58-82f0c343b16sm1974340b3a.23.2026.04.10.01.29.51\n (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\n Fri, 10 Apr 2026 01:29:54 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1775809798;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n to:to:cc:cc:mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=lMyHd2TruVi8lpv1tvQDeEt0z4/J9e8j3HYjH+ORa6c=;\n b=jJaIX8ij9IaBJrDbtiYlvQM6pSXhp54ylDLvITR4a7+lMTFnPrC9arxajVyKUX+SdB8sev\n ZSmG/pLle0sSjZSu+Ic1lNYTBSWJZtXxjp1smFHX4GdzQrwY/7lnmt8S/WxAT6ArUtAE6V\n RmoJ1S++pGZUcNqqhuSFfIQAhJ5jMdg=","X-MC-Unique":"6wuePsvDOjCN1DwcMIFkTQ-1","X-Mimecast-MFC-AGG-ID":"6wuePsvDOjCN1DwcMIFkTQ_1775809796","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1775809796; x=1776414596;\n h=to:references:message-id:content-transfer-encoding:cc:date\n :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state\n :from:to:cc:subject:date:message-id:reply-to;\n bh=PAl6Drf07LiwM8kPAV4FwH12BT3JzkTnu9KAQX5IAqo=;\n b=ebPxvaRaTVd0p1eQ+6dVHiZGjTBBNNCXRVGN/cty5oN2vokRBkkPfDHyibDHiixciQ\n IqNTC7B1smaIMaOdaKaTo27XWNZU+GacAUM1cOJTaOPzONvMQmCvakbDxokBfR7/JSlX\n ITSTpXZGCF19y70zMIHiHWwThl3TOkJp0WK7LAgfJRMFD7wDzBlljimBi+a2qi7xLjEt\n p8473lnGkxcimld3GXzvU1EOGFjSO70ZfZ7/DGaf6IjOYiZKnlFsnLMOxDo8SN22r4aG\n YcJY05qokI5yeVif7ttutr84W2BBrlsYgkkKajI860N3hrqMsiaRc1N7Fyd/9AFrySkd\n xEiw==","X-Forwarded-Encrypted":"i=1;\n AJvYcCXla9MnPDcyG8SR5gwh4N7ZbPOm7nRmzoGxGKFDey/czRSv8pFEuvGv7bWh2iswz6sl0lvJ2W0E2g==@nongnu.org","X-Gm-Message-State":"AOJu0YxDmyIvupHGAmy9Pn9xfGOsIIf3A+PIfL6+GLrQTD3FBwOLPmXj\n 
y1zGNiX+qYmxexDob0VEV9jFsz/ZZQEhX8TYe6tmrsiOR/Bb4nCBNMHzRBs4f9v24Mpyg8rkpii\n 8KCtzW8CkLJZogifXbtboP3oTiORpXHXLs91eNun6TuxobZNT6zUPQQ==","X-Gm-Gg":"AeBDieuFjPIwVSawfcXfO2L6kG26xtt9eABmYfFQDQZ9Phk6zF3n0dYwI1WtLTnLREo\n ymU7URw1LtiV7+hKia2glsXpO8hI42vOOH3xzwIzuvmFai3EtP5/2B6l3fsfA0gLrm9Rb3aznDp\n DQeCHUqOs/3AgPq52r7cLjjkZlEzPCS8cUXVTqx0m0MEFEe7VJobJVI4eR58PdQTpV5ioyhhBn8\n pw0PGmokwdo6c9BMTCHnXpRLEq4ZEDD/QpdNYq3E7WK3AEbDYsYrvtnXbLvKLLzf2/MJVI4PEni\n B+N4rN3mDVqofrQAgUJjOpIoEIQVPRRiyzpB+eFEZ6Qx0g4gj7jQyperIOIVPhI1n7ZYiKUzJun\n kagUEgcxKT6OHR9g2nrgZ3uJAr2LABpXffAJuKfqxsHccEeRJfDyyxjWknFR1y92sje5dP0Ic1S\n E=","X-Received":["by 2002:a05:6a00:929e:b0:82c:2468:a164 with SMTP id\n d2e1a72fcca58-82f0c3a37f1mr2940788b3a.41.1775809795566;\n Fri, 10 Apr 2026 01:29:55 -0700 (PDT)","by 2002:a05:6a00:929e:b0:82c:2468:a164 with SMTP id\n d2e1a72fcca58-82f0c3a37f1mr2940765b3a.41.1775809795007;\n Fri, 10 Apr 2026 01:29:55 -0700 (PDT)"],"Mime-Version":"1.0 (Mac OS X Mail 16.0 \\(3864.500.181\\))","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","From":"Ani Sinha <anisinha@redhat.com>","In-Reply-To":"<424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>","Date":"Fri, 10 Apr 2026 13:59:39 +0530","Cc":"qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","Message-Id":"<69167029-AE49-4BB3-9A5C-D16E51E5D40F@redhat.com>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n <4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>\n <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>\n <424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>","To":"Harsh Prateek Bora <harshpb@linux.ibm.com>","X-Mailer":"Apple Mail 
(2.3864.500.181)","X-Mimecast-Spam-Score":"0","X-Mimecast-MFC-PROC-ID":"QX2IPDObQX9iLciq-dhBNKUWJ5TTBFB3dqWNgcejW8M_1775809796","X-Mimecast-Originator":"redhat.com","Content-Type":"text/plain;\n\tcharset=utf-8","Content-Transfer-Encoding":"quoted-printable","Received-SPF":"pass client-ip=170.10.133.124;\n envelope-from=anisinha@redhat.com;\n helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"-25","X-Spam_score":"-2.6","X-Spam_bar":"--","X-Spam_report":"(-2.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.54,\n DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675741,"web_url":"http://patchwork.ozlabs.org/comment/3675741/","msgid":"<07f76b99-ff79-4480-af02-c43e2779d179@linux.ibm.com>","list_archive_url":null,"date":"2026-04-10T09:01:50","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":85411,"url":"http://patchwork.ozlabs.org/api/people/85411/","name":"Harsh Prateek Bora","email":"harshpb@linux.ibm.com"},"content":"On 10/04/26 1:59 pm, Ani Sinha wrote:\n> \n> 
\n>> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>\n>>\n>>\n>> On 10/04/26 12:05 pm, Ani Sinha wrote:\n>>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>\n>>>> Hi Ani,\n>>>>\n>>>> On 10/04/26 9:12 am, Ani Sinha wrote:\n>>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>>>\n>>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n>>>>>> This caused a lock imbalance where the vCPU thread would loop back to\n>>>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n>>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?\n>>>>\n>>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.\n>>>>\n>>>> Normal execution path:\n>>>>\n>>>> int kvm_cpu_exec(CPUState *cpu)\n>>>> {\n>>>>     // BQL held on entry (from caller)\n>>>>\n>>>>     if (kvm_arch_process_async_events(cpu)) {\n>>>>         return EXCP_HLT;  // ← Returns with BQL STILL HELD\n>>>>     }\n>>>>\n>>>>     bql_unlock();  // ← Normal path unlocks here\n>>>>     // ... KVM execution loop ...\n>>>>     bql_lock();    // ← Re-acquires before returning\n>>>>     return ret;\n>>>> }\n>>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.\n>>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. 
This makes the locking and unlocking symmetric.\n>>>>\n>>>> The lock imbalance:\n>>>>\n>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.\n>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.\n>>>> This means the early return path keeps the BQL held,\n>>> This would be the correct thing to do.\n>>>> while the normal execution path releases and re-acquires it.\n>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.\n>>\n>> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.\n> \n> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.\n\nI think it is better to unlock bql early in failure cases.\nEven Otherwise, it would become a bql_unlock followed by a bql_lock in \nthe caller for EXCP_HLT, which might look a bit odd as well.\n\nPaolo, suggestions?\n\n> \n>>\n>>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration\n>>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.\n>>>>\n>>>> Why this matters:\n>>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.\n>>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? 
Then the main cpu does a qemu_cpu_kick() to make them active again at some point?\n>>\n>> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. This is causing deadlock.\n> \n> Without looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. I think that needs fixing.\n\nI did explain the race condition observed above. Other architectures may \nhave different timing that masks the issue.\n\n> \n>>\n>>>>\n>>>>>>\n>>>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple\n>>>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and\n>>>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held\n>>>>>> the BQL, preventing other operations from proceeding.\n>>>>>>\n>>>>>> The fix has two parts:\n>>>>>>\n>>>>>> 1. In kvm_cpu_exec() (kvm-all.c):\n>>>>>>    Release the BQL before returning EXCP_HLT in the early return path,\n>>>>>>    matching the behavior of the normal execution path where bql_unlock()\n>>>>>>    is called before entering the main KVM execution loop.\n>>>>>>\n>>>>>> 2. 
In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>>>>>>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>>>>>>    loop expects to hold the BQL when calling kvm_cpu_exec() again.\n>>>>>>\n>>>>>> This ensures proper BQL lock/unlock pairing:\n>>>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n>>>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n>>>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n>>>>>> - Next iteration has BQL held as expected\n>>>>>>\n>>>>>> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n>>>>>> changes required to support KVM VM file descriptor change\") which\n>>>>>> refactored kvm_irqchip_create() and changed the initialization timing,\n>>>>>> exposing this lock imbalance issue.\n>>>>>>\n>>>>>> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file descriptor change\")\n>>>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.\n>>>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.\n>>>>\n>>>> Regarding commit 98884e0cc1:\n>>>>\n>>>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). 
We can either:\n>>>>\n>>>> Remove the \"Fixes:\" tag entirely, or\n>>> Lets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.\n>>>> Add a note that this is a pre-existing issue exposed by timing changes\n>>>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.\n>>>>\n>>>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n>>>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n>>>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n>>>>>> ---\n>>>>>> accel/kvm/kvm-accel-ops.c | 4 ++++\n>>>>>> accel/kvm/kvm-all.c       | 1 +\n>>>>>> 2 files changed, 5 insertions(+)\n>>>>>>\n>>>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c\n>>>>>> index 6d9140e549..d684fd0840 100644\n>>>>>> --- a/accel/kvm/kvm-accel-ops.c\n>>>>>> +++ b/accel/kvm/kvm-accel-ops.c\n>>>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)\n>>>>>>\n>>>>>>          if (cpu_can_run(cpu)) {\n>>>>>>              r = kvm_cpu_exec(cpu);\n>>>>>> +            if (r == EXCP_HLT) {\n>>>>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */\n>>>>>> +                bql_lock();\n>>>>>> +            }\n>>>>>>              if (r == EXCP_DEBUG) {\n>>>>>>                  cpu_handle_guest_debug(cpu);\n>>>>>>              }\n>>>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n>>>>>> index 774499d34f..00b8018664 100644\n>>>>>> --- a/accel/kvm/kvm-all.c\n>>>>>> +++ b/accel/kvm/kvm-all.c\n>>>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>>>>>>      trace_kvm_cpu_exec();\n>>>>>>\n>>>>>>      if (kvm_arch_process_async_events(cpu)) {\n>>>>>> +        bql_unlock();\n>>>>>>          return EXCP_HLT;\n>>>>>>      }\n>>>>>>\n>>>>>> -- \n>>>>>> 2.52.0\n> 
\n>","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256\n header.s=pp1 header.b=Gvxg1/Ul;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsW5n1ryDz1yGS\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 19:02:21 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB7ka-00076V-OI; Fri, 10 Apr 2026 05:02:04 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wB7kX-00075Q-1t; Fri, 10 Apr 2026 05:02:01 -0400","from mx0b-001b2d01.pphosted.com ([148.163.158.5])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wB7kU-0004cr-Tp; Fri, 10 Apr 2026 05:02:00 -0400","from pps.filterd (m0356516.ppops.net [127.0.0.1])\n by mx0a-001b2d01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id\n 63A1I3P62326341; Fri, 10 Apr 2026 09:01:57 GMT","from ppma12.dal12v.mail.ibm.com\n (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220])\n by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4dcn2kr1wy-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 
bits=256 verify=NOT);\n Fri, 10 Apr 2026 09:01:56 +0000 (GMT)","from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1])\n by ppma12.dal12v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id\n 63A8oFPH026723;\n Fri, 10 Apr 2026 09:01:55 GMT","from smtprelay04.dal12v.mail.ibm.com ([172.16.1.6])\n by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 4dcmg877j7-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Fri, 10 Apr 2026 09:01:55 +0000","from smtpav03.dal12v.mail.ibm.com (smtpav03.dal12v.mail.ibm.com\n [10.241.53.102])\n by smtprelay04.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id\n 63A91sgd49873246\n (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);\n Fri, 10 Apr 2026 09:01:54 GMT","from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id BD3515803F;\n Fri, 10 Apr 2026 09:01:54 +0000 (GMT)","from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id 2466058064;\n Fri, 10 Apr 2026 09:01:52 +0000 (GMT)","from [9.124.212.60] (unknown [9.124.212.60])\n by smtpav03.dal12v.mail.ibm.com (Postfix) with ESMTP;\n Fri, 10 Apr 2026 09:01:51 +0000 (GMT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc\n :content-transfer-encoding:content-type:date:from:in-reply-to\n :message-id:mime-version:references:subject:to; s=pp1; bh=D5zME0\n wZAjsLSMZgAbyrPE1ijWC5L8zIrBhwg2MfKv4=; b=Gvxg1/UlOfys+hFSMHOJ53\n YMRKRnlOIuuY2q6rgwXu0fearNvSLUxR4ZMf4J9yuWyOFHExqsGspACW3G/P6m0G\n KAhstNGccq/8yz/r2LpicDh6IQGZ+BIUphae1fZMzZMfKMUXZLSEzKsgZqhaOaQ4\n jSMdECIGDMop3eXiNrjTztfwNioeQ76noXBp+0DTsncfcKb+QCpeWV+kxRtSebee\n oiYFRjAElrsavNIe5356r7fAWj7bqYJaUNhDQLhhSMZpJp6tyImPTCPszT234iwt\n xGddpTtzM+opoX/USu9ccUDOvsrqxrBXEDoBUVD6r+GItqFfkCMNe2f3vsA6u7mg\n ==","Message-ID":"<07f76b99-ff79-4480-af02-c43e2779d179@linux.ibm.com>","Date":"Fri, 10 Apr 2026 14:31:50 +0530","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH for 
11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","Content-Language":"en-GB","To":"Ani Sinha <anisinha@redhat.com>","Cc":"qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n <4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>\n <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>\n <424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>\n <69167029-AE49-4BB3-9A5C-D16E51E5D40F@redhat.com>","From":"Harsh Prateek Bora <harshpb@linux.ibm.com>","In-Reply-To":"<69167029-AE49-4BB3-9A5C-D16E51E5D40F@redhat.com>","Content-Type":"text/plain; charset=UTF-8; format=flowed","Content-Transfer-Encoding":"8bit","X-TM-AS-GCONF":"00","X-Proofpoint-Reinject":"loops=2 maxloops=12","X-Proofpoint-Spam-Details-Enc":"AW1haW4tMjYwNDEwMDA4MSBTYWx0ZWRfX2D9MKk2Opd4U\n tfVpS8PWnBohwKGQfvMtQ0at7a74Zs9ONrOk2jUrnCuFl8AA4r12ZJLxO12nB+ibgwUx5OaI8MB\n QWmM9dTAOMQU4mbBYEZLZ6mhyuLxs/fJRJZm5ctuSkDBJkE7Yvm3C0q81TknFNyvcaRNqv4zbWM\n ih7qZClsHl4WbGFcaqN+LwFR3+BCD2KtAeiAYQRBgANXlvrQa/QmdAtWvirP1nG3nMwEJV4nq5I\n Z/5UWANZnb3qOVh5cJcyJoQ91vDb/EFIb+MY2lmRfc8e6pG4goevnJ/9XSNOiXXZUOGYqoqDRqK\n Quj0/J6NsCQSOr3lmmyroSLuYJWe9fKXrkyISB2XneK4bCSd2AyKUgfhrbtzD6li6oTt8ssLLsC\n qIWvsZJZ7Kc7r81tmS91WUy2BEkFZl9JsT/KKHsS3NH8MD5+hWnHOw7k2kSkqf6sce2J45cgNtO\n 35pZUxuBfM5/Ms8Rt1Q==","X-Proofpoint-ORIG-GUID":"aNlYwsPhaxL-N2WbxTUYekEUY5uwaCSS","X-Authority-Analysis":"v=2.4 cv=e9k2j6p/ c=1 sm=1 tr=0 ts=69d8bc84 cx=c_pps\n a=bLidbwmWQ0KltjZqbj+ezA==:117 a=bLidbwmWQ0KltjZqbj+ezA==:17\n a=IkcTkHD0fZMA:10 a=A5OVakUREuEA:10 a=f7IdgyKtn90A:10\n a=VkNPw1HP01LnGYTKEx00:22 a=RnoormkPH1_aCDwRdu11:22 a=Y2IxJ9c9Rs8Kov3niI8_:22\n a=VnNF1IyMAAAA:8 a=dmSnYZTVpx9dE7S3pykA:9 a=3ZKOabzyN94A:10 a=QEXdDO2ut3YA:10\n 
a=O8hF6Hzn-FEA:10","X-Proofpoint-GUID":"9QXI_HEhhMxl9ASMP--XJl4h7VMkKePC","X-Proofpoint-Virus-Version":"vendor=baseguard\n engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49\n definitions=2026-04-10_02,2026-04-09_02,2025-10-01_01","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n clxscore=1015 impostorscore=0 malwarescore=0 suspectscore=0 spamscore=0\n bulkscore=0 adultscore=0 priorityscore=1501 phishscore=0 lowpriorityscore=0\n classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0\n reason=mlx scancount=1 engine=8.22.0-2604010000 definitions=main-2604100081","Received-SPF":"pass client-ip=148.163.158.5;\n envelope-from=harshpb@linux.ibm.com;\n helo=mx0b-001b2d01.pphosted.com","X-Spam_score_int":"-26","X-Spam_score":"-2.7","X-Spam_bar":"--","X-Spam_report":"(-2.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7,\n RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n 
<mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675750,"web_url":"http://patchwork.ozlabs.org/comment/3675750/","msgid":"<34DB9C06-ED15-42BF-A7BB-AF1BBC170ADA@redhat.com>","list_archive_url":null,"date":"2026-04-10T09:31:16","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":86030,"url":"http://patchwork.ozlabs.org/api/people/86030/","name":"Ani Sinha","email":"anisinha@redhat.com"},"content":"> On 10 Apr 2026, at 2:31 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n> \n> \n> \n> On 10/04/26 1:59 pm, Ani Sinha wrote:\n>>> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>> \n>>> \n>>> \n>>> On 10/04/26 12:05 pm, Ani Sinha wrote:\n>>>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>> \n>>>>> Hi Ani,\n>>>>> \n>>>>> On 10/04/26 9:12 am, Ani Sinha wrote:\n>>>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>>>> \n>>>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>>>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n>>>>>>> This caused a lock imbalance where the vCPU thread would loop back to\n>>>>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n>>>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?\n>>>>> \n>>>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. 
The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.\n>>>>> \n>>>>> Normal execution path:\n>>>>> \n>>>>> int kvm_cpu_exec(CPUState *cpu)\n>>>>> {\n>>>>>    // BQL held on entry (from caller)\n>>>>> \n>>>>>    if (kvm_arch_process_async_events(cpu)) {\n>>>>>        return EXCP_HLT;  // ← Returns with BQL STILL HELD\n>>>>>    }\n>>>>> \n>>>>>    bql_unlock();  // ← Normal path unlocks here\n>>>>>    // ... KVM execution loop ...\n>>>>>    bql_lock();    // ← Re-acquires before returning\n>>>>>    return ret;\n>>>>> }\n>>>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.\n>>>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.\n>>>>> \n>>>>> The lock imbalance:\n>>>>> \n>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.\n>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.\n>>>>> This means the early return path keeps the BQL held,\n>>>> This would be the correct thing to do.\n>>>>> while the normal execution path releases and re-acquires it.\n>>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.\n>>> \n>>> It had to unlock it for the same reason - to give others a chance to lock. 
We need to handle failure/exception cases for the same purpose as well.\n>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.\n> \n> I think it is better to unlock bql early in failure cases.\n> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.\n\nBut you are doing exactly that across a function call. \n\n> \n> Paolo, suggestions?\n> \n>>> \n>>>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration\n>>>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.\n>>>>> \n>>>>> Why this matters:\n>>>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.\n>>>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?\n>>> \n>>> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. This is causing deadlock.\n>> Without looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. 
I think that needs fixing.\n> \n> I did explain the race condition observed above.\n\nSo why not fix it properly?\n\n> Other architectures may have different timing that masks the issue.\n> \n>>> \n>>>>> \n>>>>>>> \n>>>>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple\n>>>>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and\n>>>>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held\n>>>>>>> the BQL, preventing other operations from proceeding.\n>>>>>>> \n>>>>>>> The fix has two parts:\n>>>>>>> \n>>>>>>> 1. In kvm_cpu_exec() (kvm-all.c):\n>>>>>>>   Release the BQL before returning EXCP_HLT in the early return path,\n>>>>>>>   matching the behavior of the normal execution path where bql_unlock()\n>>>>>>>   is called before entering the main KVM execution loop.\n>>>>>>> \n>>>>>>> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>>>>>>>   Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>>>>>>>   loop expects to hold the BQL when calling kvm_cpu_exec() again.\n>>>>>>> \n>>>>>>> This ensures proper BQL lock/unlock pairing:\n>>>>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n>>>>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n>>>>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n>>>>>>> - Next iteration has BQL held as expected\n>>>>>>> \n>>>>>>> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n>>>>>>> changes required to support KVM VM file descriptor change\") which\n>>>>>>> refactored kvm_irqchip_create() and changed the initialization timing,\n>>>>>>> exposing this lock imbalance issue.\n>>>>>>> \n>>>>>>> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file descriptor change\")\n>>>>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. 
Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.\n>>>>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.\n>>>>> \n>>>>> Regarding commit 98884e0cc1:\n>>>>> \n>>>>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). We can either:\n>>>>> \n>>>>> Remove the \"Fixes:\" tag entirely, or\n>>>> Lets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.\n>>>>> Add a note that this is a pre-existing issue exposed by timing changes\n>>>>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.\n>>>>> \n>>>>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n>>>>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n>>>>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n>>>>>>> ---\n>>>>>>> accel/kvm/kvm-accel-ops.c | 4 ++++\n>>>>>>> accel/kvm/kvm-all.c       | 1 +\n>>>>>>> 2 files changed, 5 insertions(+)\n>>>>>>> \n>>>>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c\n>>>>>>> index 6d9140e549..d684fd0840 100644\n>>>>>>> --- a/accel/kvm/kvm-accel-ops.c\n>>>>>>> +++ b/accel/kvm/kvm-accel-ops.c\n>>>>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)\n>>>>>>> \n>>>>>>>         if (cpu_can_run(cpu)) {\n>>>>>>>             r = kvm_cpu_exec(cpu);\n>>>>>>> +            if (r == EXCP_HLT) {\n>>>>>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */\n>>>>>>> +                bql_lock();\n>>>>>>> +            }\n>>>>>>>             if (r == EXCP_DEBUG) {\n>>>>>>>                 cpu_handle_guest_debug(cpu);\n>>>>>>>             }\n>>>>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n>>>>>>> index 
774499d34f..00b8018664 100644\n>>>>>>> --- a/accel/kvm/kvm-all.c\n>>>>>>> +++ b/accel/kvm/kvm-all.c\n>>>>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>>>>>>>     trace_kvm_cpu_exec();\n>>>>>>> \n>>>>>>>     if (kvm_arch_process_async_events(cpu)) {\n>>>>>>> +        bql_unlock();\n>>>>>>>         return EXCP_HLT;\n>>>>>>>     }\n>>>>>>> \n>>>>>>> -- \n>>>>>>> 2.52.0","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=YndzwC9x;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsWm71PvYz1yGS\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 19:32:05 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB8DH-0007TL-Oj; Fri, 10 Apr 2026 05:31:44 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wB8DF-0007T2-Mm\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 05:31:41 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.129.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 
1wB8DD-0000oP-CV\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 05:31:41 -0400","from mail-pg1-f199.google.com (mail-pg1-f199.google.com\n [209.85.215.199]) by relay.mimecast.com with ESMTP with STARTTLS\n (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n us-mta-393-2YUiPuYxP1eN3rhXQXkd2A-1; Fri, 10 Apr 2026 05:31:35 -0400","by mail-pg1-f199.google.com with SMTP id\n 41be03b00d2f7-c6e24ee93a6so915293a12.0\n for <qemu-ppc@nongnu.org>; Fri, 10 Apr 2026 02:31:35 -0700 (PDT)","from smtpclient.apple ([122.163.114.34])\n by smtp.gmail.com with ESMTPSA id\n 41be03b00d2f7-c79219c4a97sm2028613a12.19.2026.04.10.02.31.28\n (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\n Fri, 10 Apr 2026 02:31:31 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1775813497;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n to:to:cc:cc:mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=Cex6anxbau+zvfsfbcTFm5H6GrkiG5hoxeE4BC6A4Kk=;\n b=YndzwC9xHW2tu4N8UQc0AhVwkRg/r5XHiqOPn7YVmsTnt4o0g8Y8WwW/wFby9VO/+PwYsT\n TUEVNOribP8QGWptEmxQ02o7mAL4c+pFdP7lGr3ldz0ic9LcEU9VdrALcb2VTqGJrHqqDs\n 6qe7SGRxz/dfOnmmoqBBdzkrPMgUwlw=","X-MC-Unique":"2YUiPuYxP1eN3rhXQXkd2A-1","X-Mimecast-MFC-AGG-ID":"2YUiPuYxP1eN3rhXQXkd2A_1775813494","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1775813494; x=1776418294;\n h=to:references:message-id:content-transfer-encoding:cc:date\n :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state\n :from:to:cc:subject:date:message-id:reply-to;\n bh=ptZZvVPuYkfpijxT4Yx1WCcrzvMcHlP0a8QMGdjJ8Es=;\n b=LujheanRtDCRKq3T4zzXQmXB4WXwm9DDRjch0NCdwK+xqKbETh21riyJli6EdSZExx\n tIzzfQFjeDtuk4v2UHRV6ezfE9Bv8LWHh2EtIHguFdCMiwrqmM/xyWemJKjXLpNMEYO8\n Y91HI1ia0o9KwRAS+7zyjWrraMt+GaPAPl/SrT7S2QkALA92NcOvwPI7UREvWE1PQfal\n 
9JhjMwuVnd7CUnymnW5EwvU80rCa+ONad1GLMiL+UYqwesyosPrS9GcOvWaGycLfyWuw\n PdzJvOCT9z/ki3bAFdwrsiuDJpGwmw30if9lVM7njxzlVeuaa7dCQHedH9/EEupb+QIn\n D4IA==","X-Forwarded-Encrypted":"i=1;\n AJvYcCW0i0hpRK39iByeyJO+Lz+84in8rwk+97kcFM9viNxXs2AH5NRbmHarMAnGYJYVYY/+SSCC8wkiZA==@nongnu.org","X-Gm-Message-State":"AOJu0Yy+SSlhLPtAHABo1xTnvcTjYxd2Mf21CLpqm6HU5Ktcg7NzZyHu\n HJBpNBFsimssBB0fxZVVMwXg0zZGH2XJe9uYOR47rS34njPxR7U2nL5N8yOC0wQSDhjMWg13YHA\n jbH3SRjEYX2PwxnM8xjVbd2q7m4r3br+/SnhUm1sIACcWp4Rn5weukg==","X-Gm-Gg":"AeBDievlB4cDkDSi0AtNr0FaoMTh9EC6gEUZCQeFvx2taxj0Sd+K+UQ6Gd9KeFT6Knv\n kCo1LRk6GSS4yt4BUsoG7m3DoHNXridlYcQQ+V907DoEN1ZKEi+FA+ziyJGn5EtFEslP79ex+0W\n I5Yeh1gM0yh7xff4q9xHLCdDYTGx+aUlNfTeI8HMbDRKsS3nM1WAlpdLjoqwoBL90rASGkRIoS6\n tMP41pBVzJCCxhop9drtBwCCxZzKCGvZ/gnlLlUHvP/UygBgKkqbweVfTIaZNgPFnVQpwKli+NK\n VNsPc+WDXRBeJWAAehTtP0PomASKt/giqZaMdThBojoF9rH5xucz3syj5t8hvFk4DK115fQy73q\n Krh7npkzUN6Z4pWg6zeX5ZGtbhEYZgwZMV7i7Y7U+kvpI/s41Nz/GrahTbIPZGX3dYDz4anBZcQ\n U=","X-Received":["by 2002:a05:6a20:3ca8:b0:39b:ff20:f4e5 with SMTP id\n adf61e73a8af0-39fc9309e37mr7358357637.12.1775813494221;\n Fri, 10 Apr 2026 02:31:34 -0700 (PDT)","by 2002:a05:6a20:3ca8:b0:39b:ff20:f4e5 with SMTP id\n adf61e73a8af0-39fc9309e37mr7358331637.12.1775813493491;\n Fri, 10 Apr 2026 02:31:33 -0700 (PDT)"],"Mime-Version":"1.0 (Mac OS X Mail 16.0 \\(3864.500.181\\))","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","From":"Ani Sinha <anisinha@redhat.com>","In-Reply-To":"<07f76b99-ff79-4480-af02-c43e2779d179@linux.ibm.com>","Date":"Fri, 10 Apr 2026 15:01:16 +0530","Cc":"qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","Message-Id":"<34DB9C06-ED15-42BF-A7BB-AF1BBC170ADA@redhat.com>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n 
<4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>\n <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>\n <424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>\n <69167029-AE49-4BB3-9A5C-D16E51E5D40F@redhat.com>\n <07f76b99-ff79-4480-af02-c43e2779d179@linux.ibm.com>","To":"Harsh Prateek Bora <harshpb@linux.ibm.com>","X-Mailer":"Apple Mail (2.3864.500.181)","X-Mimecast-Spam-Score":"0","X-Mimecast-MFC-PROC-ID":"Uh4_GklsufXDH6cbMPbmMvKu4sALYIjWvV1HnniWhFw_1775813494","X-Mimecast-Originator":"redhat.com","Content-Type":"text/plain;\n\tcharset=utf-8","Content-Transfer-Encoding":"quoted-printable","Received-SPF":"pass client-ip=170.10.129.124;\n envelope-from=anisinha@redhat.com;\n helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"-25","X-Spam_score":"-2.6","X-Spam_bar":"--","X-Spam_report":"(-2.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.54,\n DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n 
<mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675767,"web_url":"http://patchwork.ozlabs.org/comment/3675767/","msgid":"<852c5863-8f56-473d-ad92-a56a80f3ce71@linux.ibm.com>","list_archive_url":null,"date":"2026-04-10T10:02:03","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":85411,"url":"http://patchwork.ozlabs.org/api/people/85411/","name":"Harsh Prateek Bora","email":"harshpb@linux.ibm.com"},"content":"On 10/04/26 3:01 pm, Ani Sinha wrote:\n> \n> \n>> On 10 Apr 2026, at 2:31 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>\n>>\n>>\n>> On 10/04/26 1:59 pm, Ani Sinha wrote:\n>>>> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>\n>>>>\n>>>>\n>>>> On 10/04/26 12:05 pm, Ani Sinha wrote:\n>>>>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>>>\n>>>>>> Hi Ani,\n>>>>>>\n>>>>>> On 10/04/26 9:12 am, Ani Sinha wrote:\n>>>>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>>>>>\n>>>>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>>>>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n>>>>>>>> This caused a lock imbalance where the vCPU thread would loop back to\n>>>>>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n>>>>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?\n>>>>>>\n>>>>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. 
The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.\n>>>>>>\n>>>>>> Normal execution path:\n>>>>>>\n>>>>>> int kvm_cpu_exec(CPUState *cpu)\n>>>>>> {\n>>>>>>     // BQL held on entry (from caller)\n>>>>>>\n>>>>>>     if (kvm_arch_process_async_events(cpu)) {\n>>>>>>         return EXCP_HLT;  // ← Returns with BQL STILL HELD\n>>>>>>     }\n>>>>>>\n>>>>>>     bql_unlock();  // ← Normal path unlocks here\n>>>>>>     // ... KVM execution loop ...\n>>>>>>     bql_lock();    // ← Re-acquires before returning\n>>>>>>     return ret;\n>>>>>> }\n>>>>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.\n>>>>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.\n>>>>>>\n>>>>>> The lock imbalance:\n>>>>>>\n>>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.\n>>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.\n>>>>>> This means the early return path keeps the BQL held,\n>>>>> This would be the correct thing to do.\n>>>>>> while the normal execution path releases and re-acquires it.\n>>>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.\n>>>>\n>>>> It had to unlock it for the same reason - to give others a chance to lock. 
We need to handle failure/exception cases for the same purpose as well.\n>>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.\n>>\n>> I think it is better to unlock bql early in failure cases.\n>> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.\n> \n> But you are doing exactly that across a function call.\n\nI am fine with either/or maintainer's choice.\n\n> \n>>\n>> Paolo, suggestions?\n>>\n>>>>\n>>>>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration\n>>>>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.\n>>>>>>\n>>>>>> Why this matters:\n>>>>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.\n>>>>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?\n>>>>\n>>>> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. This is causing deadlock.\n>>> Without looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. 
I think that needs fixing.\n>>\n>> I did explain the race condition observed above.\n> \n> So why not fix it properly?\n\nI think the suggested fix is appropriate for this scenario.\nKeeping BQL held forever in a loop for EXCP_HLT case isnt the right \nthing to do.\n\n> \n>> Other architectures may have different timing that masks the issue.\n>>\n>>>>\n>>>>>>\n>>>>>>>>\n>>>>>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple\n>>>>>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and\n>>>>>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held\n>>>>>>>> the BQL, preventing other operations from proceeding.\n>>>>>>>>\n>>>>>>>> The fix has two parts:\n>>>>>>>>\n>>>>>>>> 1. In kvm_cpu_exec() (kvm-all.c):\n>>>>>>>>    Release the BQL before returning EXCP_HLT in the early return path,\n>>>>>>>>    matching the behavior of the normal execution path where bql_unlock()\n>>>>>>>>    is called before entering the main KVM execution loop.\n>>>>>>>>\n>>>>>>>> 2. 
In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>>>>>>>>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>>>>>>>>    loop expects to hold the BQL when calling kvm_cpu_exec() again.\n>>>>>>>>\n>>>>>>>> This ensures proper BQL lock/unlock pairing:\n>>>>>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n>>>>>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n>>>>>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n>>>>>>>> - Next iteration has BQL held as expected\n>>>>>>>>\n>>>>>>>> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n>>>>>>>> changes required to support KVM VM file descriptor change\") which\n>>>>>>>> refactored kvm_irqchip_create() and changed the initialization timing,\n>>>>>>>> exposing this lock imbalance issue.\n>>>>>>>>\n>>>>>>>> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file descriptor change\")\n>>>>>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.\n>>>>>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.\n>>>>>>\n>>>>>> Regarding commit 98884e0cc1:\n>>>>>>\n>>>>>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). 
We can either:\n>>>>>>\n>>>>>> Remove the \"Fixes:\" tag entirely, or\n>>>>> Lets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.\n>>>>>> Add a note that this is a pre-existing issue exposed by timing changes\n>>>>>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.\n>>>>>>\n>>>>>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n>>>>>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n>>>>>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n>>>>>>>> ---\n>>>>>>>> accel/kvm/kvm-accel-ops.c | 4 ++++\n>>>>>>>> accel/kvm/kvm-all.c       | 1 +\n>>>>>>>> 2 files changed, 5 insertions(+)\n>>>>>>>>\n>>>>>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c\n>>>>>>>> index 6d9140e549..d684fd0840 100644\n>>>>>>>> --- a/accel/kvm/kvm-accel-ops.c\n>>>>>>>> +++ b/accel/kvm/kvm-accel-ops.c\n>>>>>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)\n>>>>>>>>\n>>>>>>>>          if (cpu_can_run(cpu)) {\n>>>>>>>>              r = kvm_cpu_exec(cpu);\n>>>>>>>> +            if (r == EXCP_HLT) {\n>>>>>>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */\n>>>>>>>> +                bql_lock();\n>>>>>>>> +            }\n>>>>>>>>              if (r == EXCP_DEBUG) {\n>>>>>>>>                  cpu_handle_guest_debug(cpu);\n>>>>>>>>              }\n>>>>>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n>>>>>>>> index 774499d34f..00b8018664 100644\n>>>>>>>> --- a/accel/kvm/kvm-all.c\n>>>>>>>> +++ b/accel/kvm/kvm-all.c\n>>>>>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>>>>>>>>      trace_kvm_cpu_exec();\n>>>>>>>>\n>>>>>>>>      if (kvm_arch_process_async_events(cpu)) {\n>>>>>>>> +        bql_unlock();\n>>>>>>>>          return EXCP_HLT;\n>>>>>>>>      }\n>>>>>>>>\n>>>>>>>> -- \n>>>>>>>> 2.52.0\n> 
\n>","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256\n header.s=pp1 header.b=ecP6yYXg;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsXRd1PsCz1yGS\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 20:02:51 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB8hA-00064p-0f; Fri, 10 Apr 2026 06:02:39 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wB8h2-00063Q-8l; Fri, 10 Apr 2026 06:02:29 -0400","from mx0a-001b2d01.pphosted.com ([148.163.156.1])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wB8gt-0000Mz-Tv; Fri, 10 Apr 2026 06:02:25 -0400","from pps.filterd (m0360083.ppops.net [127.0.0.1])\n by mx0a-001b2d01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id\n 63A2PQJq2592704; Fri, 10 Apr 2026 10:02:11 GMT","from ppma12.dal12v.mail.ibm.com\n (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220])\n by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4dcn2ehcc3-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 
bits=256 verify=NOT);\n Fri, 10 Apr 2026 10:02:10 +0000 (GMT)","from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1])\n by ppma12.dal12v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id\n 63A8UpWu026661;\n Fri, 10 Apr 2026 10:02:09 GMT","from smtprelay04.dal12v.mail.ibm.com ([172.16.1.6])\n by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 4dcmg87djj-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Fri, 10 Apr 2026 10:02:09 +0000","from smtpav01.wdc07v.mail.ibm.com (smtpav01.wdc07v.mail.ibm.com\n [10.39.53.228])\n by smtprelay04.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id\n 63AA28cf29229602\n (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);\n Fri, 10 Apr 2026 10:02:08 GMT","from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id 2326F58066;\n Fri, 10 Apr 2026 10:02:08 +0000 (GMT)","from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id 3F38C58055;\n Fri, 10 Apr 2026 10:02:05 +0000 (GMT)","from [9.124.212.238] (unknown [9.124.212.238])\n by smtpav01.wdc07v.mail.ibm.com (Postfix) with ESMTP;\n Fri, 10 Apr 2026 10:02:04 +0000 (GMT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc\n :content-transfer-encoding:content-type:date:from:in-reply-to\n :message-id:mime-version:references:subject:to; s=pp1; bh=wgS69K\n u+w8Lfzd1Ui3J71zR0yx7I5Sie+M7fdcFnTzQ=; b=ecP6yYXgK7OAkaUbJyrdQB\n LrOhBypONmXt26CO9AO1ROR0ppVjn26O+Mc20gPqWU9YYZXKMXDTI9IwoYzrfN/Z\n VwRc8v7MUutE5YC1aRIz1EDoX6wAUtbzAqrbWc7vW40G43Gd29QMIx0lgCzwUe3r\n wW0v8JXnZRYVVsJ7bbCBgow4QasIMpJWJxhhlmVt/BPD4ZQU/RDcCU8hVAZtZauu\n 83CrOToKtJ2l/sYWyGKqEiZQNPIsO9LNdaURKLfW75JYkeYYGe8xnKP/xF02BFkQ\n 1Eavj0LllwTBd4Zm33mvo2ML1ch6hmKbjXtlsJz8f6z3iHIOdjDE7B7b1A2rw9hA\n ==","Message-ID":"<852c5863-8f56-473d-ad92-a56a80f3ce71@linux.ibm.com>","Date":"Fri, 10 Apr 2026 15:32:03 +0530","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH for 
11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","Content-Language":"en-GB","To":"Ani Sinha <anisinha@redhat.com>","Cc":"qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n <4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>\n <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>\n <424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>\n <69167029-AE49-4BB3-9A5C-D16E51E5D40F@redhat.com>\n <07f76b99-ff79-4480-af02-c43e2779d179@linux.ibm.com>\n <34DB9C06-ED15-42BF-A7BB-AF1BBC170ADA@redhat.com>","From":"Harsh Prateek Bora <harshpb@linux.ibm.com>","In-Reply-To":"<34DB9C06-ED15-42BF-A7BB-AF1BBC170ADA@redhat.com>","Content-Type":"text/plain; charset=UTF-8; format=flowed","Content-Transfer-Encoding":"8bit","X-TM-AS-GCONF":"00","X-Proofpoint-Reinject":"loops=2 maxloops=12","X-Proofpoint-ORIG-GUID":"vagoCx_FyNWv3Hb4jTCK_dlOZJTcoDSw","X-Proofpoint-Spam-Details-Enc":"AW1haW4tMjYwNDEwMDA5MSBTYWx0ZWRfX69D1KmR9mWna\n JAW1SIU9dPltkBe2E2n4GELzPp/2tfodTh/v7g124YLFnYR+JjTUB4NNqqFIp20mTh4dAL3Xksl\n iH7kLUO6bNn0iSiq+5bBRL/5EdJ5IRkoZ7mjGWwuHJSh/zrzMgI+hAQFtbIQ9oZ1fApA92TLcrL\n GCa4d18bgUd0GdbNUw7L0OkgW3yJUFa9aIdFUlY1hDR3X3gUdtlFC9Tvoye3CLYQhw0ZjchBnKf\n SMH5rkYkwuD3vzEGtGd77lOH7H3CBKEjMPFv1OID88Ko8iSMRRN5+I0FwJKZQ1tvrHxGRuPayr3\n V3wZ+SQx218UD0VjOLu7QYzWEfacUpxDSWtmGENoH8gLiaY1ypZSA05pxlzCxzSazf46qQYkHxZ\n tt/gjqr8gtbBplRJhfj+CJj0E8yVRDNiBkXUzWyeshiyRbz2RFS1ufP/6Ppwvg1Mtfq4erAYaKJ\n BIMkfq40o7GYvt2mvOg==","X-Authority-Analysis":"v=2.4 cv=Cfw4Irrl c=1 sm=1 tr=0 ts=69d8caa2 cx=c_pps\n a=bLidbwmWQ0KltjZqbj+ezA==:117 a=bLidbwmWQ0KltjZqbj+ezA==:17\n a=IkcTkHD0fZMA:10 a=A5OVakUREuEA:10 a=f7IdgyKtn90A:10\n a=VkNPw1HP01LnGYTKEx00:22 a=RnoormkPH1_aCDwRdu11:22 a=iQ6ETzBq9ecOQQE5vZCe:22\n a=VnNF1IyMAAAA:8 
a=tSrD7f4uqpc6jKVmcS8A:9 a=3ZKOabzyN94A:10 a=QEXdDO2ut3YA:10\n a=O8hF6Hzn-FEA:10","X-Proofpoint-GUID":"43nQ6WNteqXZWkNfP5dF_ThGRFvaTMi5","X-Proofpoint-Virus-Version":"vendor=baseguard\n engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49\n definitions=2026-04-10_03,2026-04-09_02,2025-10-01_01","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n malwarescore=0 phishscore=0 clxscore=1015 adultscore=0 suspectscore=0\n priorityscore=1501 impostorscore=0 bulkscore=0 spamscore=0 lowpriorityscore=0\n classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0\n reason=mlx scancount=1 engine=8.22.0-2604010000 definitions=main-2604100091","Received-SPF":"pass client-ip=148.163.156.1;\n envelope-from=harshpb@linux.ibm.com;\n helo=mx0a-001b2d01.pphosted.com","X-Spam_score_int":"-26","X-Spam_score":"-2.7","X-Spam_bar":"--","X-Spam_report":"(-2.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7,\n RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n 
<mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675772,"web_url":"http://patchwork.ozlabs.org/comment/3675772/","msgid":"<89EDA797-28E3-46A1-A464-8AFE9FB534BD@redhat.com>","list_archive_url":null,"date":"2026-04-10T10:05:13","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":86030,"url":"http://patchwork.ozlabs.org/api/people/86030/","name":"Ani Sinha","email":"anisinha@redhat.com"},"content":"> On 10 Apr 2026, at 3:32 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n> \n> \n> \n> On 10/04/26 3:01 pm, Ani Sinha wrote:\n>>> On 10 Apr 2026, at 2:31 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>> \n>>> \n>>> \n>>> On 10/04/26 1:59 pm, Ani Sinha wrote:\n>>>>> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>> \n>>>>> \n>>>>> \n>>>>> On 10/04/26 12:05 pm, Ani Sinha wrote:\n>>>>>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>>>> \n>>>>>>> Hi Ani,\n>>>>>>> \n>>>>>>> On 10/04/26 9:12 am, Ani Sinha wrote:\n>>>>>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>>>>>> \n>>>>>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>>>>>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n>>>>>>>>> This caused a lock imbalance where the vCPU thread would loop back to\n>>>>>>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n>>>>>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?\n>>>>>>> \n>>>>>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. 
The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.\n>>>>>>> \n>>>>>>> Normal execution path:\n>>>>>>> \n>>>>>>> int kvm_cpu_exec(CPUState *cpu)\n>>>>>>> {\n>>>>>>>    // BQL held on entry (from caller)\n>>>>>>> \n>>>>>>>    if (kvm_arch_process_async_events(cpu)) {\n>>>>>>>        return EXCP_HLT;  // ← Returns with BQL STILL HELD\n>>>>>>>    }\n>>>>>>> \n>>>>>>>    bql_unlock();  // ← Normal path unlocks here\n>>>>>>>    // ... KVM execution loop ...\n>>>>>>>    bql_lock();    // ← Re-acquires before returning\n>>>>>>>    return ret;\n>>>>>>> }\n>>>>>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.\n>>>>>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.\n>>>>>>> \n>>>>>>> The lock imbalance:\n>>>>>>> \n>>>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.\n>>>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.\n>>>>>>> This means the early return path keeps the BQL held,\n>>>>>> This would be the correct thing to do.\n>>>>>>> while the normal execution path releases and re-acquires it.\n>>>>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.\n>>>>> \n>>>>> It had to unlock it for the same reason - to give others a chance to lock. 
We need to handle failure/exception cases for the same purpose as well.\n>>>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.\n>>> \n>>> I think it is better to unlock bql early in failure cases.\n>>> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.\n>> But you are doing exactly that across a function call.\n> \n> I am fine with either/or maintainer's choice.\n> \n>>> \n>>> Paolo, suggestions?\n>>> \n>>>>> \n>>>>>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration\n>>>>>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.\n>>>>>>> \n>>>>>>> Why this matters:\n>>>>>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.\n>>>>>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?\n>>>>> \n>>>>> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. This is causing deadlock.\n>>>> Without looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. 
I think that needs fixing.\n>>> \n>>> I did explain the race condition observed above.\n>> So why not fix it properly?\n> \n> I think the suggested fix is appropriate for this scenario.\n> Keeping BQL held forever in a loop for EXCP_HLT case isnt the right thing to do.\n\nIMHO that loop is a symptom of a deeper architectural issue. The fix proposed here feels more like a incorrect hack.","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=adImUoMz;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsXWK1kCMz1yGb\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 20:06:05 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB8k6-0007Ii-2f; Fri, 10 Apr 2026 06:05:38 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wB8k5-0007IG-6n\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 06:05:37 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.129.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from 
<anisinha@redhat.com>)\n id 1wB8k2-0001Om-O1\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 06:05:36 -0400","from mail-pf1-f198.google.com (mail-pf1-f198.google.com\n [209.85.210.198]) by relay.mimecast.com with ESMTP with STARTTLS\n (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n us-mta-687-MmUTIAVUOiGsuX9vHREbog-1; Fri, 10 Apr 2026 06:05:32 -0400","by mail-pf1-f198.google.com with SMTP id\n d2e1a72fcca58-82a88a2704fso906341b3a.0\n for <qemu-ppc@nongnu.org>; Fri, 10 Apr 2026 03:05:31 -0700 (PDT)","from smtpclient.apple ([122.163.114.34])\n by smtp.gmail.com with ESMTPSA id\n d2e1a72fcca58-82f0c4e3d41sm3109682b3a.48.2026.04.10.03.05.26\n (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\n Fri, 10 Apr 2026 03:05:29 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1775815533;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n to:to:cc:cc:mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=Ns6BkFzMJb7r5L+NrAMpfmv4N1efC9/3djz7uP3kZy4=;\n b=adImUoMzPWfXtgGkNkshBh57wLDwk7jeyIOJrXCLH0pjVD4PonmaeVqpduA+QAkclLy1Qg\n CO+eMANmvmciTryo7J1NqxozPFFO6FYpObCHrjaQHhZaCW6I9165bHwx0sFHiYkOkO9NUo\n xLz93MIAct09+6xC6vORljB3tfFDkgM=","X-MC-Unique":"MmUTIAVUOiGsuX9vHREbog-1","X-Mimecast-MFC-AGG-ID":"MmUTIAVUOiGsuX9vHREbog_1775815531","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1775815531; x=1776420331;\n h=to:references:message-id:content-transfer-encoding:cc:date\n :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state\n :from:to:cc:subject:date:message-id:reply-to;\n bh=PLQxTuoOSmu7wueIRZEqqREP+YovMsVy7W3Hn70m5Ls=;\n b=GVZ99yjCz9t3jFm6pDDckYuy6RhJ7hKkhKvXz7fT+Cr3wE794F+FDen/VBAuqz/+qb\n 93FE9JreQMRSZMqj/eu6A+vUPNgqIeUjK+vRVq3HYs5X2L80nX2Fapwkp+m16RWwC8by\n 
grgygOZLqleSA8KM7BhFyGm3iqwFLFy0OHez6uw+eqsRaN9bgm+5IAkbb13R8OLUDB78\n DYDIgcpnQf529B9uctBCmsv4tXPy4zjKpD3D5mYUs03nPlyo+i03h2au22zEO2QqEV/b\n rCl0nufUYDYNQxvZxT8q0ApSTb6IUybSYNuhs52lOP6qEJgT0/UC4o+dR5TR7vUpLWI8\n m1Aw==","X-Forwarded-Encrypted":"i=1;\n AJvYcCXLqU/xRXouP0lItZXLSi7S6S6Wk+RQX+fM1tVBHu62P6eIqOspCozfKXxrD5ooXkhwfMhbp86Ntw==@nongnu.org","X-Gm-Message-State":"AOJu0YzwJNtXjswU/VgeYzZ0WMaAY8MWLceFbtp3BU21OS4ATVuA9quG\n ACXuy1+5KhQ4LjRL3OPqp8ENk3xRTUNKvNGXCJEzCHe8J2i/OKyWl8LFFoESWrGbLto/Nh84vWd\n /6/DykAkw5MmIntIa5mvdWon2Gcj18Yy9gvV8spqzJhtznvVatcV05g==","X-Gm-Gg":"AeBDieuHwuVBJwszsKxrJbVaHxpzE+vII/6jQHzQXzwnJrwi6mBDuReGm1uqe1RtoMh\n 08U3T4VbWm+qM0vQy49U8M2noWVh0ZceBcHJ026XEfpkke7hesR7qaTYF5KAyxrzOvQ8Ug1DD4A\n SpgLYhzGk55apQRIX/fua8wyq7h9uw3ChsBtdmQh8gbXOr07CXRA53nWzqo4EgDX/15AQE1Olgu\n fGSGOfP2eyKH2B1CPke4FgIhvYvXKgkeLMMN8pP5F8SmRvfk2NDnHqDg79wGsXcWEbYd/peH0Ik\n jmiupVKs8gf6uuuNkdzrc/maxRsvNkZXYXIkwnwy+vPUu8s++Fhi93BezCMfGUAjBFx4/oGfklp\n 52WT7rdMaqgUkFxomI5NQdjAWAt/aWoKGf0KTwExAj21B0B13hicp//UB79Fm4PMTOQtra4mlFf\n Y=","X-Received":["by 2002:a05:6a00:2e08:b0:82c:c390:ad77 with SMTP id\n d2e1a72fcca58-82dd8a1dbb5mr6256998b3a.7.1775815530790;\n Fri, 10 Apr 2026 03:05:30 -0700 (PDT)","by 2002:a05:6a00:2e08:b0:82c:c390:ad77 with SMTP id\n d2e1a72fcca58-82dd8a1dbb5mr6256960b3a.7.1775815530087;\n Fri, 10 Apr 2026 03:05:30 -0700 (PDT)"],"Mime-Version":"1.0 (Mac OS X Mail 16.0 \\(3864.500.181\\))","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","From":"Ani Sinha <anisinha@redhat.com>","In-Reply-To":"<852c5863-8f56-473d-ad92-a56a80f3ce71@linux.ibm.com>","Date":"Fri, 10 Apr 2026 15:35:13 +0530","Cc":"qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell 
<peter.maydell@linaro.org>","Message-Id":"<89EDA797-28E3-46A1-A464-8AFE9FB534BD@redhat.com>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n <4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>\n <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>\n <424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>\n <69167029-AE49-4BB3-9A5C-D16E51E5D40F@redhat.com>\n <07f76b99-ff79-4480-af02-c43e2779d179@linux.ibm.com>\n <34DB9C06-ED15-42BF-A7BB-AF1BBC170ADA@redhat.com>\n <852c5863-8f56-473d-ad92-a56a80f3ce71@linux.ibm.com>","To":"Harsh Prateek Bora <harshpb@linux.ibm.com>","X-Mailer":"Apple Mail (2.3864.500.181)","X-Mimecast-Spam-Score":"0","X-Mimecast-MFC-PROC-ID":"ZjXP1juPjHNLR5Yu1H22qDBvXdKAm-Ae1BiFBNLY-QA_1775815531","X-Mimecast-Originator":"redhat.com","Content-Type":"text/plain;\n\tcharset=utf-8","Content-Transfer-Encoding":"quoted-printable","Received-SPF":"pass client-ip=170.10.129.124;\n envelope-from=anisinha@redhat.com;\n helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"-25","X-Spam_score":"-2.6","X-Spam_bar":"--","X-Spam_report":"(-2.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.54,\n DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n 
<mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675778,"web_url":"http://patchwork.ozlabs.org/comment/3675778/","msgid":"<c56cf0e3-6d95-40a3-b00f-0b95c8137b14@linux.ibm.com>","list_archive_url":null,"date":"2026-04-10T10:16:51","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":85411,"url":"http://patchwork.ozlabs.org/api/people/85411/","name":"Harsh Prateek Bora","email":"harshpb@linux.ibm.com"},"content":"On 10/04/26 3:35 pm, Ani Sinha wrote:\n> \n> \n>> On 10 Apr 2026, at 3:32 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>\n>>\n>>\n>> On 10/04/26 3:01 pm, Ani Sinha wrote:\n>>>> On 10 Apr 2026, at 2:31 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>\n>>>>\n>>>>\n>>>> On 10/04/26 1:59 pm, Ani Sinha wrote:\n>>>>>> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>>>\n>>>>>>\n>>>>>>\n>>>>>> On 10/04/26 12:05 pm, Ani Sinha wrote:\n>>>>>>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>>>>>\n>>>>>>>> Hi Ani,\n>>>>>>>>\n>>>>>>>> On 10/04/26 9:12 am, Ani Sinha wrote:\n>>>>>>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n>>>>>>>>>>\n>>>>>>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>>>>>>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n>>>>>>>>>> This caused a lock imbalance where the vCPU thread would loop back to\n>>>>>>>>>> kvm_cpu_exec() while still holding the 
BQL, leading to deadlocks.\n>>>>>>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?\n>>>>>>>>\n>>>>>>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.\n>>>>>>>>\n>>>>>>>> Normal execution path:\n>>>>>>>>\n>>>>>>>> int kvm_cpu_exec(CPUState *cpu)\n>>>>>>>> {\n>>>>>>>>     // BQL held on entry (from caller)\n>>>>>>>>\n>>>>>>>>     if (kvm_arch_process_async_events(cpu)) {\n>>>>>>>>         return EXCP_HLT;  // ← Returns with BQL STILL HELD\n>>>>>>>>     }\n>>>>>>>>\n>>>>>>>>     bql_unlock();  // ← Normal path unlocks here\n>>>>>>>>     // ... KVM execution loop ...\n>>>>>>>>     bql_lock();    // ← Re-acquires before returning\n>>>>>>>>     return ret;\n>>>>>>>> }\n>>>>>>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.\n>>>>>>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.\n>>>>>>>>\n>>>>>>>> The lock imbalance:\n>>>>>>>>\n>>>>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.\n>>>>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.\n>>>>>>>> This means the early return path keeps the BQL held,\n>>>>>>> This would be the correct thing to do.\n>>>>>>>> while the normal execution path releases and re-acquires it.\n>>>>>>> Because the functions it calls after unlocking requires bql to be unlocked. 
Since it had to unlock it, it locks it again before returning.\n>>>>>>\n>>>>>> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.\n>>>>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.\n>>>>\n>>>> I think it is better to unlock bql early in failure cases.\n>>>> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.\n>>> But you are doing exactly that across a function call.\n>>\n>> I am fine with either/or maintainer's choice.\n>>\n>>>>\n>>>> Paolo, suggestions?\n>>>>\n>>>>>>\n>>>>>>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration\n>>>>>>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.\n>>>>>>>>\n>>>>>>>> Why this matters:\n>>>>>>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.\n>>>>>>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?\n>>>>>>\n>>>>>> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. 
This is causing deadlock.\n>>>>> Without looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. I think that needs fixing.\n>>>>\n>>>> I did explain the race condition observed above.\n>>> So why not fix it properly?\n>>\n>> I think the suggested fix is appropriate for this scenario.\n>> Keeping BQL held forever in a loop for EXCP_HLT case isnt the right thing to do.\n> \n> IMHO that loop is a symptom of a deeper architectural issue. The fix proposed here feels more like a incorrect hack.\n\nI do not think so. Irrespective of the issue, there's no reason to keep \nholding BQL forever in a loop like that for EXCP_HLT case.\n\n\n>","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256\n header.s=pp1 header.b=L7lfqWuq;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsXmf2cPTz1yGS\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 20:17:36 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wB8vB-0002w3-Rl; Fri, 10 Apr 2026 06:17:06 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from 
<harshpb@linux.ibm.com>)\n id 1wB8v9-0002vF-8z; Fri, 10 Apr 2026 06:17:04 -0400","from mx0b-001b2d01.pphosted.com ([148.163.158.5])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wB8v7-0003mM-9R; Fri, 10 Apr 2026 06:17:02 -0400","from pps.filterd (m0360072.ppops.net [127.0.0.1])\n by mx0a-001b2d01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id\n 63A1iO3m2298432; Fri, 10 Apr 2026 10:16:59 GMT","from ppma12.dal12v.mail.ibm.com\n (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220])\n by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4dcn2g8bad-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Fri, 10 Apr 2026 10:16:58 +0000 (GMT)","from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1])\n by ppma12.dal12v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id\n 63A8ZA4h026646;\n Fri, 10 Apr 2026 10:16:57 GMT","from smtprelay01.wdc07v.mail.ibm.com ([172.16.1.68])\n by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 4dcmg87f7w-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Fri, 10 Apr 2026 10:16:57 +0000","from smtpav01.wdc07v.mail.ibm.com (smtpav01.wdc07v.mail.ibm.com\n [10.39.53.228])\n by smtprelay01.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id\n 63AAGu6c983980\n (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);\n Fri, 10 Apr 2026 10:16:56 GMT","from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id 1A78358066;\n Fri, 10 Apr 2026 10:16:56 +0000 (GMT)","from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id 5723F58055;\n Fri, 10 Apr 2026 10:16:53 +0000 (GMT)","from [9.124.212.238] (unknown [9.124.212.238])\n by smtpav01.wdc07v.mail.ibm.com (Postfix) with ESMTP;\n Fri, 10 Apr 2026 10:16:53 +0000 (GMT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc\n 
:content-transfer-encoding:content-type:date:from:in-reply-to\n :message-id:mime-version:references:subject:to; s=pp1; bh=/UHNg6\n cyNaixJPFd2I27xfAXRNvI652l2wGrfTl3TxY=; b=L7lfqWuq9MFePKZlIFT0EH\n FWgC3KoO2yOa+Koo3jlwdUbQTSTg2NPeL9vzROu6BWQg15Ts1eKqNneG+6yE5wEM\n 50s0KOGkh/849yoACIixFwCBFWf6g4yx9stKMc/Ew0pT6cLI7OXa+P5dRmOMJFnP\n hpO1aoNqjTan+9ONptM0Z28JRalIhp0YxjYWou9MzblPvO/2e9nixpSHKSGEPwXW\n FuTVhJUyoWUEkJBhTcFWHJTGsoicEhA8I3iwuuQPVvAUoG3syjKkDKr8wufQVAPx\n oVSKtkWd8uktozLmW1OJOKNlYm946TDl1eCV9miUxCK0XiSxWjz+l3td8/mhfq9w\n ==","Message-ID":"<c56cf0e3-6d95-40a3-b00f-0b95c8137b14@linux.ibm.com>","Date":"Fri, 10 Apr 2026 15:46:51 +0530","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","Content-Language":"en-GB","To":"Ani Sinha <anisinha@redhat.com>","Cc":"qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n <4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>\n <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>\n <424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>\n <69167029-AE49-4BB3-9A5C-D16E51E5D40F@redhat.com>\n <07f76b99-ff79-4480-af02-c43e2779d179@linux.ibm.com>\n <34DB9C06-ED15-42BF-A7BB-AF1BBC170ADA@redhat.com>\n <852c5863-8f56-473d-ad92-a56a80f3ce71@linux.ibm.com>\n <89EDA797-28E3-46A1-A464-8AFE9FB534BD@redhat.com>","From":"Harsh Prateek Bora <harshpb@linux.ibm.com>","In-Reply-To":"<89EDA797-28E3-46A1-A464-8AFE9FB534BD@redhat.com>","Content-Type":"text/plain; charset=UTF-8; format=flowed","Content-Transfer-Encoding":"8bit","X-TM-AS-GCONF":"00","X-Proofpoint-Reinject":"loops=2 maxloops=12","X-Authority-Analysis":"v=2.4 cv=KeridwYD c=1 sm=1 tr=0 ts=69d8ce1a cx=c_pps\n a=bLidbwmWQ0KltjZqbj+ezA==:117 
a=bLidbwmWQ0KltjZqbj+ezA==:17\n a=IkcTkHD0fZMA:10 a=A5OVakUREuEA:10 a=f7IdgyKtn90A:10\n a=VkNPw1HP01LnGYTKEx00:22 a=RnoormkPH1_aCDwRdu11:22 a=RzCfie-kr_QcCd8fBx8p:22\n a=VnNF1IyMAAAA:8 a=tNryw6az7MBK7zL6OWoA:9 a=3ZKOabzyN94A:10 a=QEXdDO2ut3YA:10\n a=O8hF6Hzn-FEA:10","X-Proofpoint-ORIG-GUID":"QmVloEOwyCRIS9kpvTr2hO6csjjL_ZOm","X-Proofpoint-GUID":"G3kji3VUuDmS5CQuV4wAyHOAzGG1eLfu","X-Proofpoint-Spam-Details-Enc":"AW1haW4tMjYwNDEwMDA5MSBTYWx0ZWRfX68YkYsVVNi7x\n EhC2jLsxQudt80LEUtA/ZLVa+zHZWbCVqg2tvPtgLau0rLFBxNvXGzRGsxoum08+celQowcZC11\n CWYfzQfYVI+6agKWiPo4jCeBE+VvOnBg/5LikysrV1EuOnrADkxV2QNHOAEYCsFH/Ie0Cl1ulzP\n q5XRAPBxYqHDq9impchXjWYJYQKyrB6DZUWVPxHVzQ81FKiNvHMX3RjlNnB+lYlyYE2v1qzHeN4\n jIuPEWD7ARq/7ZaroFixjVTZaQps5DbTmWPSNtyri1HnKl3bKj+4u36Io9v++PUCqOppcaMkJLb\n GloEXXLLbLdtxjUi3hnoXeuvbevyNyJsUrOkB4t6BpFyBkkibveRB0c2373NM+e/aC1JybosCPy\n olzuKD/ADJGydfCRZvOxM6Lhq83a+1ik2wQtfRF7VKpQWFnSe5mibnvph1fywkt4ClQKNJF+kGW\n oblv2GOVT7dPEcOQfAA==","X-Proofpoint-Virus-Version":"vendor=baseguard\n engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49\n definitions=2026-04-10_03,2026-04-09_02,2025-10-01_01","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n suspectscore=0 malwarescore=0 lowpriorityscore=0 adultscore=0 impostorscore=0\n clxscore=1015 phishscore=0 priorityscore=1501 spamscore=0 bulkscore=0\n classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0\n reason=mlx scancount=1 engine=8.22.0-2604010000 definitions=main-2604100091","Received-SPF":"pass client-ip=148.163.158.5;\n envelope-from=harshpb@linux.ibm.com;\n helo=mx0b-001b2d01.pphosted.com","X-Spam_score_int":"-26","X-Spam_score":"-2.7","X-Spam_bar":"--","X-Spam_report":"(-2.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7,\n RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_NONE=0.001, SPF_PASS=-0.001 
autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675873,"web_url":"http://patchwork.ozlabs.org/comment/3675873/","msgid":"<c49859ce-7d98-04f7-69f8-d241e4148278@eik.bme.hu>","list_archive_url":null,"date":"2026-04-10T13:04:11","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":16148,"url":"http://patchwork.ozlabs.org/api/people/16148/","name":"BALATON Zoltan","email":"balaton@eik.bme.hu"},"content":"On Fri, 10 Apr 2026, Harsh Prateek Bora wrote:\n>>>>> The lock imbalance:\n>>>>> \n>>>>> When kvm_arch_process_async_events() returns true, the function returns \n>>>>> EXCP_HLT before the bql_unlock() call.\n>>>> Why should it unlock it before returning? In fact it’s opposite. If the \n>>>> function had unlocked bql, it should lock it again before returning.\n>>>>> This means the early return path keeps the BQL held,\n>>>> This would be the correct thing to do.\n>>>>> while the normal execution path releases and re-acquires it.\n>>>> Because the functions it calls after unlocking requires bql to be \n>>>> unlocked. Since it had to unlock it, it locks it again before returning.\n>>> \n>>> It had to unlock it for the same reason - to give others a chance to lock. 
\n>>> We need to handle failure/exception cases for the same purpose as well.\n>> \n>> But by unlocking and returning you are breaking the semantics of the \n>> function and introducing imbalance.\n>\n> I think it is better to unlock bql early in failure cases.\n> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the \n> caller for EXCP_HLT, which might look a bit odd as well.\n\nSo why not do\n\nbql_unlock()\n/* comment explaining why this is needed */\nbql_lock()\nreturn EXCP_HLT;\n\nThat should keep the function return locked and fix the problem without \nhaving to hack the caller for this case.\n\nOn Fri, 10 Apr 2026, Ani Sinha wrote:\n> >>> I did explain the race condition observed above.\n> >> So why not fix it properly?\n> > \n> > I think the suggested fix is appropriate for this scenario.\n> > Keeping BQL held forever in a loop for EXCP_HLT case isnt the right thing to do.\n> \n> IMHO that loop is a symptom of a deeper architectural issue. The fix proposed here feels more like a incorrect hack.\n\nEven if this is not a complete fix of an underlying problem but only a \nwork around maybe it's safer to do this just before a release than trying \nto rewrite something deeper now.\n\nRegards,\nBALATON Zoltan","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":"legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)","Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fscTQ4c5Mz1y2d\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 
Apr 2026 23:04:40 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wBBX4-00006s-Aq; Fri, 10 Apr 2026 09:04:22 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <balaton@eik.bme.hu>)\n id 1wBBX3-00006W-1B; Fri, 10 Apr 2026 09:04:21 -0400","from zero.eik.bme.hu ([152.66.115.2])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <balaton@eik.bme.hu>)\n id 1wBBX0-0003v9-RC; Fri, 10 Apr 2026 09:04:20 -0400","from localhost (localhost [127.0.0.1])\n by zero.eik.bme.hu (Postfix) with ESMTP id 6BB50596A22;\n Fri, 10 Apr 2026 15:04:13 +0200 (CEST)","from zero.eik.bme.hu ([127.0.0.1])\n by localhost (zero.eik.bme.hu [127.0.0.1]) (amavis, port 10028) with ESMTP\n id iK-s5ewEpxdG; Fri, 10 Apr 2026 15:04:11 +0200 (CEST)","by zero.eik.bme.hu (Postfix, from userid 432)\n id 5CA885969F6; Fri, 10 Apr 2026 15:04:11 +0200 (CEST)","from localhost (localhost [127.0.0.1])\n by zero.eik.bme.hu (Postfix) with ESMTP id 5A7235969F2;\n Fri, 10 Apr 2026 15:04:11 +0200 (CEST)"],"X-Virus-Scanned":"amavis at eik.bme.hu","Date":"Fri, 10 Apr 2026 15:04:11 +0200 (CEST)","From":"BALATON Zoltan <balaton@eik.bme.hu>","To":"Harsh Prateek Bora <harshpb@linux.ibm.com>","cc":"Ani Sinha <anisinha@redhat.com>, qemu-devel <qemu-devel@nongnu.org>,\n qemu-ppc@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,\n npiggin@gmail.com, misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","In-Reply-To":"<07f76b99-ff79-4480-af02-c43e2779d179@linux.ibm.com>","Message-ID":"<c49859ce-7d98-04f7-69f8-d241e4148278@eik.bme.hu>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n 
<4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>\n <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>\n <424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>\n <69167029-AE49-4BB3-9A5C-D16E51E5D40F@redhat.com>\n <07f76b99-ff79-4480-af02-c43e2779d179@linux.ibm.com>","MIME-Version":"1.0","Content-Type":"multipart/mixed;\n BOUNDARY=\"3866299591-1068443836-1775826072=:71846\"","Content-ID":"<1ddd52c2-d7f1-0ae5-3e36-79d1f4dbd8ec@eik.bme.hu>","Received-SPF":"pass client-ip=152.66.115.2; envelope-from=balaton@eik.bme.hu;\n helo=zero.eik.bme.hu","X-Spam_score_int":"-18","X-Spam_score":"-1.9","X-Spam_bar":"-","X-Spam_report":"(-1.9 / 5.0 requ) BAYES_00=-1.9,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675882,"web_url":"http://patchwork.ozlabs.org/comment/3675882/","msgid":"<6816BC33-C3A0-4E89-8F85-C6ECE1108409@redhat.com>","list_archive_url":null,"date":"2026-04-10T13:37:56","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":86030,"url":"http://patchwork.ozlabs.org/api/people/86030/","name":"Ani Sinha","email":"anisinha@redhat.com"},"content":"> On 10 Apr 2026, at 6:34 PM, BALATON Zoltan 
<balaton@eik.bme.hu> wrote:\n> \n> On Fri, 10 Apr 2026, Harsh Prateek Bora wrote:\n>>>>>> The lock imbalance:\n>>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.\n>>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.\n>>>>>> This means the early return path keeps the BQL held,\n>>>>> This would be the correct thing to do.\n>>>>>> while the normal execution path releases and re-acquires it.\n>>>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.\n>>>> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.\n>>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.\n>> \n>> I think it is better to unlock bql early in failure cases.\n>> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.\n> \n> So why not do\n> \n> bql_unlock()\n> /* comment explaining why this is needed */\n> bql_lock()\n> return EXCP_HLT;\n\nIf we go down this path, what would be wrong with\n\nbql_unlock()\nsleep(100);\nbql_lock();\n\nThis would make the window even larger for the other thread to make progress … How about sleeping for 1 min? 10 min? \n\n\n> \n> That should keep the function return locked and fix the problem without having to hack the caller for this case.\n> \n> On Fri, 10 Apr 2026, Ani Sinha wrote:\n>> >>> I did explain the race condition observed above.\n>> >> So why not fix it properly?\n>> > > I think the suggested fix is appropriate for this scenario.\n>> > Keeping BQL held forever in a loop for EXCP_HLT case isnt the right thing to do.\n>> IMHO that loop is a symptom of a deeper architectural issue. 
The fix proposed here feels more like a incorrect hack.\n> \n> Even if this is not a complete fix of an underlying problem but only a work around maybe it's safer to do this just before a release than trying to rewrite something deeper now.\n> \n> Regards,\n> BALATON Zoltan","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=hcPmuWZQ;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsdDp6R68z1yGb\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 10 Apr 2026 23:38:49 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wBC3x-0002hD-Sg; Fri, 10 Apr 2026 09:38:21 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wBC3w-0002fe-Az\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 09:38:20 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.129.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wBC3u-0005Ty-B9\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 09:38:20 -0400","from mail-pf1-f200.google.com 
(mail-pf1-f200.google.com\n [209.85.210.200]) by relay.mimecast.com with ESMTP with STARTTLS\n (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n us-mta-125-NClxlLN9PP-JLVbpzVuQ6w-1; Fri, 10 Apr 2026 09:38:14 -0400","by mail-pf1-f200.google.com with SMTP id\n d2e1a72fcca58-82c613194caso1157241b3a.1\n for <qemu-ppc@nongnu.org>; Fri, 10 Apr 2026 06:38:14 -0700 (PDT)","from smtpclient.apple ([122.163.114.34])\n by smtp.gmail.com with ESMTPSA id\n d2e1a72fcca58-82f0c50d24bsm3858685b3a.57.2026.04.10.06.38.08\n (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\n Fri, 10 Apr 2026 06:38:11 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1775828296;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n to:to:cc:cc:mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=HjSe4jwDdl3aV+P6vHrHrEGyPUqVbgrk4bHFv9eCr9k=;\n b=hcPmuWZQmrfr3SdOXhx0DG8uuRjj6lriS+x1+3TGp+5JBB9UcYQ5k+ENQsdK55rA5gXOB5\n slfk9cJ4voaMT3NmTIDoJ+7sPlvi8xJlmyxSAOpEz1Zr3/DnqA0xCC6rlGLBN3bl4NGiZH\n ZSv7LsMuthv2xviRoY6E7yuAoO34wYU=","X-MC-Unique":"NClxlLN9PP-JLVbpzVuQ6w-1","X-Mimecast-MFC-AGG-ID":"NClxlLN9PP-JLVbpzVuQ6w_1775828293","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1775828293; x=1776433093;\n h=to:references:message-id:content-transfer-encoding:cc:date\n :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state\n :from:to:cc:subject:date:message-id:reply-to;\n bh=90GlQ02GA+w1eGs62STQFXkrZqT2kakYwYUQM44UiDs=;\n b=YUvlSHPnIeG7WWxfpl9lZKDYg/4W7Wn+FRkFTkmTMFSusdYNeFdVJ421Mf7oLzVHtD\n Y1ZQWQj8g0N9S2mTtXcvZF5z4ScjOJpKVz/lPiJtDonDQNlsy3z9AOcfWO+5KoYMIqCC\n +HAfu3IN8dm0TfKvT5gzjB3xmI+D67Wlpmdx4XoPlb1+PAk74uz9m7w54Pb9nzD8nXIT\n 0LzsiyVqut2ADJoHRBU+u1z9BLv6hCI0qsL4N/iAYHb56/qxIjjbjzM7Zu+O+7WTT/Bc\n 
F4McW8hh9DcpXhANWhsfgprCabYfcV6fECo3eW9Y6iXs9++pZrJOzDQBzvgtwS4fXNx0\n ujMg==","X-Forwarded-Encrypted":"i=1;\n AJvYcCWnWoca56exCJvElqEQM9ixFpnybfL4OOkFkdpz7zlcbspca9PBJ8vnRY1AGuKvm2zLZ06ZzD+sDg==@nongnu.org","X-Gm-Message-State":"AOJu0YxC1Mn7yYGZotM9F8SuTPqdqN0YAKYNaxtoHop+NC2PjWUE5DZE\n zO+eTkDUTNFAAfA6ZnXNOquOJXp/KGjf9Z4nom5NVpcmwZMvdToqwaz9QYdUMDXXWrqvdf7n5Ta\n vl/inQOeoIFZNfLk2kJ0sqJZQ8ZG/oT7X4unZSZg+V4icfvA1AjgdBg==","X-Gm-Gg":"AeBDievkrXOhThH8bixTPWC/GUbtBSmWs8RFyuu1weIKrshWCm6rE3X7xrJNcF2+Cad\n nbFIKPwMNN9Dd6xdLszqSAX2ved8Sjl9Rc/os1/XMzSJrILDPpp44kgJ91QcGvDrMkQ/XfQOvYG\n L2oVUVqgvP4IyLtNA/EU39heqIn3PRIM5nf64f+Qm67IOaTJOzGXwvCS7tHsBxggrZKM3n4FNw/\n 4E3Mf/fehzDz4LAQxyMFGY4nmVN6DreORffWKn+FwFiKb4sExqAqUt7N39o492QaGTGMZSc2ToE\n iJmzYL8hMHlgzQEtMzmo5AIv2AXhj2E8EUAXoeaOi+JnA6eefUA1TmD8I41FR9uOEizdvmOdHSx\n ZgB0PxXBAr/0LSa/l0frh1Bo5CbPTsIBc94H69taMkSbjuzyqAqEwYv22lvDSSvH4O4wpD5tA/g\n 8=","X-Received":["by 2002:a05:6a00:aa85:b0:81f:852b:a925 with SMTP id\n d2e1a72fcca58-82f0c1cc121mr3580966b3a.1.1775828293089;\n Fri, 10 Apr 2026 06:38:13 -0700 (PDT)","by 2002:a05:6a00:aa85:b0:81f:852b:a925 with SMTP id\n d2e1a72fcca58-82f0c1cc121mr3580943b3a.1.1775828292578;\n Fri, 10 Apr 2026 06:38:12 -0700 (PDT)"],"Mime-Version":"1.0 (Mac OS X Mail 16.0 \\(3864.500.181\\))","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","From":"Ani Sinha <anisinha@redhat.com>","In-Reply-To":"<c49859ce-7d98-04f7-69f8-d241e4148278@eik.bme.hu>","Date":"Fri, 10 Apr 2026 19:07:56 +0530","Cc":"Harsh Prateek Bora <harshpb@linux.ibm.com>,\n qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","Message-Id":"<6816BC33-C3A0-4E89-8F85-C6ECE1108409@redhat.com>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n 
<4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>\n <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>\n <424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>\n <69167029-AE49-4BB3-9A5C-D16E51E5D40F@redhat.com>\n <07f76b99-ff79-4480-af02-c43e2779d179@linux.ibm.com>\n <c49859ce-7d98-04f7-69f8-d241e4148278@eik.bme.hu>","To":"BALATON Zoltan <balaton@eik.bme.hu>","X-Mailer":"Apple Mail (2.3864.500.181)","X-Mimecast-Spam-Score":"0","X-Mimecast-MFC-PROC-ID":"HNsbXIfpRePcRN4VcQOWJTXQAu3d2nv_VhbOMqa-KUA_1775828293","X-Mimecast-Originator":"redhat.com","Content-Type":"text/plain;\n\tcharset=utf-8","Content-Transfer-Encoding":"quoted-printable","Received-SPF":"pass client-ip=170.10.129.124;\n envelope-from=anisinha@redhat.com;\n helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"-25","X-Spam_score":"-2.6","X-Spam_bar":"--","X-Spam_report":"(-2.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.54,\n DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n 
<mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3675922,"web_url":"http://patchwork.ozlabs.org/comment/3675922/","msgid":"<2302cd01-b2ed-f0f2-93aa-8fd6c4836a7f@eik.bme.hu>","list_archive_url":null,"date":"2026-04-10T15:07:52","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":16148,"url":"http://patchwork.ozlabs.org/api/people/16148/","name":"BALATON Zoltan","email":"balaton@eik.bme.hu"},"content":"On Fri, 10 Apr 2026, Ani Sinha wrote:\n>> On 10 Apr 2026, at 6:34 PM, BALATON Zoltan <balaton@eik.bme.hu> wrote:\n>> On Fri, 10 Apr 2026, Harsh Prateek Bora wrote:\n>>>>>>> The lock imbalance:\n>>>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.\n>>>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.\n>>>>>>> This means the early return path keeps the BQL held,\n>>>>>> This would be the correct thing to do.\n>>>>>>> while the normal execution path releases and re-acquires it.\n>>>>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.\n>>>>> It had to unlock it for the same reason - to give others a chance to lock. 
We need to handle failure/exception cases for the same purpose as well.\n>>>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.\n>>>\n>>> I think it is better to unlock bql early in failure cases.\n>>> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.\n>>\n>> So why not do\n>>\n>> bql_unlock()\n>> /* comment explaining why this is needed */\n>> bql_lock()\n>> return EXCP_HLT;\n>\n> If we go down this path, what would be wrong with\n>\n> bql_unlock()\n> sleep(100);\n> bql_lock();\n>\n> This would make the window even larger for the other thread to make progress … How about sleeping for 1 min? 10 min?\n\nIf the reason to unlock is to give a chance to other waiting threads to \nacquire the lock then no need to sleep long as we'll wait in bql_lock if \nanother thread got the lock so less than a second sleep or maybe just a \nfew milliseconds should be plenty of time for other threads to get a \nchance to run and avoid a deadlock.\n\nI understand that you say that maybe there could be a better fix elsewhere \nbut we don't seem to fully understand the issue and trying a bigger \nrewrite now is more likely to break something than this hack that should \ngive sufficient workaround for the release at least.\n\nBut I'm not involved in this just tried to at least resolve the deadlock \nbetween you if I can't resolve the deadlock in BQL. 
:-)\n\nRegards,\nBALATON Zoltan","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":"legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)","Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fsgD11tBYz1yGS\n\tfor <incoming@patchwork.ozlabs.org>; Sat, 11 Apr 2026 01:08:15 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wBDSl-0004g8-ER; Fri, 10 Apr 2026 11:08:03 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <balaton@eik.bme.hu>)\n id 1wBDSi-0004ew-9p; Fri, 10 Apr 2026 11:08:00 -0400","from zero.eik.bme.hu ([2001:738:2001:2001::2001])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <balaton@eik.bme.hu>)\n id 1wBDSf-0003Hl-U2; Fri, 10 Apr 2026 11:08:00 -0400","from localhost (localhost [127.0.0.1])\n by zero.eik.bme.hu (Postfix) with ESMTP id 57144596A2A;\n Fri, 10 Apr 2026 17:07:54 +0200 (CEST)","from zero.eik.bme.hu ([127.0.0.1])\n by localhost (zero.eik.bme.hu [127.0.0.1]) (amavis, port 10028) with ESMTP\n id PPxvSXg4-Ulq; Fri, 10 Apr 2026 17:07:52 +0200 (CEST)","by zero.eik.bme.hu (Postfix, from userid 432)\n id 3D83E596A22; Fri, 10 Apr 2026 17:07:52 +0200 (CEST)","from localhost (localhost [127.0.0.1])\n by zero.eik.bme.hu (Postfix) with ESMTP id 3B457596A1E;\n Fri, 10 Apr 2026 
17:07:52 +0200 (CEST)"],"X-Virus-Scanned":"amavis at eik.bme.hu","Date":"Fri, 10 Apr 2026 17:07:52 +0200 (CEST)","From":"BALATON Zoltan <balaton@eik.bme.hu>","To":"Ani Sinha <anisinha@redhat.com>","cc":"Harsh Prateek Bora <harshpb@linux.ibm.com>,\n qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","In-Reply-To":"<6816BC33-C3A0-4E89-8F85-C6ECE1108409@redhat.com>","Message-ID":"<2302cd01-b2ed-f0f2-93aa-8fd6c4836a7f@eik.bme.hu>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <C0822D91-E199-4FEB-B1AA-28652D0F3453@redhat.com>\n <4b3044b1-4ea0-4f3a-8a0a-04e09d071a15@linux.ibm.com>\n <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>\n <424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com>\n <69167029-AE49-4BB3-9A5C-D16E51E5D40F@redhat.com>\n <07f76b99-ff79-4480-af02-c43e2779d179@linux.ibm.com>\n <c49859ce-7d98-04f7-69f8-d241e4148278@eik.bme.hu>\n <6816BC33-C3A0-4E89-8F85-C6ECE1108409@redhat.com>","MIME-Version":"1.0","Content-Type":"multipart/mixed;\n boundary=\"3866299591-1384376333-1775833672=:7256\"","Received-SPF":"pass client-ip=2001:738:2001:2001::2001;\n envelope-from=balaton@eik.bme.hu; helo=zero.eik.bme.hu","X-Spam_score_int":"-18","X-Spam_score":"-1.9","X-Spam_bar":"-","X-Spam_report":"(-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001,\n SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n 
<mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3676000,"web_url":"http://patchwork.ozlabs.org/comment/3676000/","msgid":"<87qzomznto.fsf@suse.de>","list_archive_url":null,"date":"2026-04-10T18:12:51","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":85343,"url":"http://patchwork.ozlabs.org/api/people/85343/","name":"Fabiano Rosas","email":"farosas@suse.de"},"content":"Harsh Prateek Bora <harshpb@linux.ibm.com> writes:\n\n> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n> This caused a lock imbalance where the vCPU thread would loop back to\n> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n>\n> The issue manifests as boot hangs on PowerPC pseries machines with multiple\n> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and\n> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held\n> the BQL, preventing other operations from proceeding.\n>\n\nAFAIU, with halted=1, the thread should be waiting at the\nqemu_process_cpu_events() qemu_cond_wait invocation which will release\nthe BQL during the wait.\n\nWhat is your irqchip setting? on/off/split? Aren't you just hitting the\nearly return at do_kvm_irqchip_create()? 
The refactoring from commit\n98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file\ndescriptor change\") made it so a few lines that would have been skipped\nare now executed.\n\nIn this case, probably kvm_halt_in_kernel_allowed=true is making\ncpu_thread_is_idle() return false and skip the wait at\nqemu_process_cpu_events().\n\n> The fix has two parts:\n>\n> 1. In kvm_cpu_exec() (kvm-all.c):\n>    Release the BQL before returning EXCP_HLT in the early return path,\n>    matching the behavior of the normal execution path where bql_unlock()\n>    is called before entering the main KVM execution loop.\n>\n> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>    loop expects to hold the BQL when calling kvm_cpu_exec() again.\n>\n> This ensures proper BQL lock/unlock pairing:\n> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n> - Next iteration has BQL held as expected\n>\n> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n> changes required to support KVM VM file descriptor change\") which\n> refactored kvm_irqchip_create() and changed the initialization timing,\n> exposing this lock imbalance issue.\n>\n> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file descriptor change\")\n> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n> ---\n>  accel/kvm/kvm-accel-ops.c | 4 ++++\n>  accel/kvm/kvm-all.c       | 1 +\n>  2 files changed, 5 insertions(+)\n>\n> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c\n> index 6d9140e549..d684fd0840 100644\n> --- a/accel/kvm/kvm-accel-ops.c\n> +++ b/accel/kvm/kvm-accel-ops.c\n> @@ -52,6 +52,10 @@ static void 
*kvm_vcpu_thread_fn(void *arg)\n>  \n>          if (cpu_can_run(cpu)) {\n>              r = kvm_cpu_exec(cpu);\n> +            if (r == EXCP_HLT) {\n> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */\n> +                bql_lock();\n> +            }\n>              if (r == EXCP_DEBUG) {\n>                  cpu_handle_guest_debug(cpu);\n>              }\n> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n> index 774499d34f..00b8018664 100644\n> --- a/accel/kvm/kvm-all.c\n> +++ b/accel/kvm/kvm-all.c\n> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>      trace_kvm_cpu_exec();\n>  \n>      if (kvm_arch_process_async_events(cpu)) {\n> +        bql_unlock();\n>          return EXCP_HLT;\n>      }","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=suse.de header.i=@suse.de header.a=rsa-sha256\n header.s=susede2_rsa header.b=AqehyiAT;\n\tdkim=pass header.d=suse.de header.i=@suse.de header.a=ed25519-sha256\n header.s=susede2_ed25519 header.b=825lTzho;\n\tdkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de\n header.a=rsa-sha256 header.s=susede2_rsa header.b=Wsx7Cg0i;\n\tdkim=neutral header.d=suse.de header.i=@suse.de header.a=ed25519-sha256\n header.s=susede2_ed25519 header.b=Ldrwdy99;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)","smtp-out2.suse.de;\n dkim=pass header.d=suse.de header.s=susede2_rsa header.b=Wsx7Cg0i;\n dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=Ldrwdy99"],"Received":["from lists.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 
with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fslKk4D5yz1yGb\n\tfor <incoming@patchwork.ozlabs.org>; Sat, 11 Apr 2026 04:13:28 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wBGLl-000707-LO; Fri, 10 Apr 2026 14:13:01 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <farosas@suse.de>) id 1wBGLk-0006zY-44\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 14:13:00 -0400","from smtp-out2.suse.de ([2a07:de40:b251:101:10:150:64:2])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)\n (Exim 4.90_1) (envelope-from <farosas@suse.de>) id 1wBGLi-00065s-9p\n for qemu-ppc@nongnu.org; Fri, 10 Apr 2026 14:12:59 -0400","from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org\n [IPv6:2a07:de40:b281:104:10:150:64:97])\n (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest\n SHA256)\n (No client certificate requested)\n by smtp-out2.suse.de (Postfix) with ESMTPS id C4FF65BD39;\n Fri, 10 Apr 2026 18:12:54 +0000 (UTC)","from imap1.dmz-prg2.suse.org (localhost [127.0.0.1])\n (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest\n SHA256)\n (No client certificate requested)\n by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 4C9084A0B2;\n Fri, 10 Apr 2026 18:12:54 +0000 (UTC)","from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167])\n by imap1.dmz-prg2.suse.org with ESMTPSA id mfnfBKY92WlMRAAAD6G6ig\n (envelope-from <farosas@suse.de>); Fri, 10 Apr 2026 18:12:54 +0000"],"DKIM-Signature":["v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_rsa;\n t=1775844775;\n 
h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n in-reply-to:in-reply-to:references:references;\n bh=Sq5mrRmxNo2gmVRRcSHuDwbjArE3r3RaPPpBn5fIRyQ=;\n b=AqehyiATV2nVX8A76iXsl2tVJj/d2ZeAsxdTpfpmrX0s5N+AJro5PXOVeAEclozrL2fO6F\n OD+eXChYkY0Yhkkhmhf7POK6Uj53tda+5nlW8sSjlKmWq2CUZLtkbjzDzRIfIKVekZJPR5\n Iw7ZOaSBcDu5Mk/20u8AOYEnPkhXGiQ=","v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_ed25519; t=1775844775;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n in-reply-to:in-reply-to:references:references;\n bh=Sq5mrRmxNo2gmVRRcSHuDwbjArE3r3RaPPpBn5fIRyQ=;\n b=825lTzhoSjXi+2/lrQ86qhbPXd1D/vGYyWgaGBaMkL2cGSruEOiUKfsYXwOtRV3KMDDT+U\n eQyeHvuI5S7faSBg==","v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_rsa;\n t=1775844774;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n in-reply-to:in-reply-to:references:references;\n bh=Sq5mrRmxNo2gmVRRcSHuDwbjArE3r3RaPPpBn5fIRyQ=;\n b=Wsx7Cg0i7WAfnbyZHS50BnlZYI7whtbrFAUctsE28Hvw2OHiG9CHz+fgQL2p3h6B+COZcj\n 25vuoYJJ9ENs23+mwvf+Njho08AVfvDYCB9Lduk5QHYc/Cegn9F4oA/fqcACy/mSyU/j6A\n eFPtP66AN4rT0p7tBN4j8oZqVlkyaQs=","v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_ed25519; t=1775844774;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n in-reply-to:in-reply-to:references:references;\n bh=Sq5mrRmxNo2gmVRRcSHuDwbjArE3r3RaPPpBn5fIRyQ=;\n b=Ldrwdy99OPsU47vmPQs81WiH8WFA60qbLReiXqvrZWBobBewkeVIlC/JuTP2K7BlRZMFo9\n tLzBftFYwEeIN+Bg=="],"From":"Fabiano Rosas <farosas@suse.de>","To":"Harsh Prateek Bora <harshpb@linux.ibm.com>, qemu-devel@nongnu.org,\n qemu-ppc@nongnu.org","Cc":"anisinha@redhat.com, pbonzini@redhat.com, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com, 
peter.maydell@linaro.org","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","In-Reply-To":"<20260409161042.55281-1-harshpb@linux.ibm.com>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>","Date":"Fri, 10 Apr 2026 15:12:51 -0300","Message-ID":"<87qzomznto.fsf@suse.de>","MIME-Version":"1.0","Content-Type":"text/plain","X-Spamd-Result":"default: False [-4.51 / 50.00]; BAYES_HAM(-3.00)[100.00%];\n NEURAL_HAM_LONG(-1.00)[-1.000];\n R_DKIM_ALLOW(-0.20)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519];\n NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain];\n MX_GOOD(-0.01)[]; FREEMAIL_ENVRCPT(0.00)[gmail.com];\n FUZZY_RATELIMITED(0.00)[rspamd.com]; ARC_NA(0.00)[];\n RCVD_VIA_SMTP_AUTH(0.00)[]; MIME_TRACE(0.00)[0:+];\n MISSING_XM_UA(0.00)[]; RCVD_TLS_ALL(0.00)[];\n MID_RHS_MATCH_FROM(0.00)[]; TO_DN_SOME(0.00)[];\n FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[];\n FREEMAIL_CC(0.00)[redhat.com,gmail.com,linux.ibm.com,linaro.org];\n RCPT_COUNT_SEVEN(0.00)[9]; RCVD_COUNT_TWO(0.00)[2];\n TO_MATCH_ENVRCPT_ALL(0.00)[];\n DBL_BLOCKED_OPENRESOLVER(0.00)[imap1.dmz-prg2.suse.org:helo,imap1.dmz-prg2.suse.org:rdns,suse.de:dkim,suse.de:mid];\n DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519];\n DKIM_TRACE(0.00)[suse.de:+]","X-Rspamd-Action":"no action","X-Spam-Score":"-4.51","X-Rspamd-Server":"rspamd1.dmz-prg2.suse.org","X-Rspamd-Queue-Id":"C4FF65BD39","Received-SPF":"pass client-ip=2a07:de40:b251:101:10:150:64:2;\n envelope-from=farosas@suse.de; helo=smtp-out2.suse.de","X-Spam_score_int":"-20","X-Spam_score":"-2.1","X-Spam_bar":"--","X-Spam_report":"(-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001,\n SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no 
action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3676470,"web_url":"http://patchwork.ozlabs.org/comment/3676470/","msgid":"<d4ca5f6a-c810-407b-b089-839e0a7d6d06@linux.ibm.com>","list_archive_url":null,"date":"2026-04-13T05:44:20","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":85411,"url":"http://patchwork.ozlabs.org/api/people/85411/","name":"Harsh Prateek Bora","email":"harshpb@linux.ibm.com"},"content":"Hi Fabiano, Balaton,\n\nThanks for pitching in and the suggestions.\n\nOn 10/04/26 11:42 pm, Fabiano Rosas wrote:\n> Harsh Prateek Bora <harshpb@linux.ibm.com> writes:\n> \n>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n>> This caused a lock imbalance where the vCPU thread would loop back to\n>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n>>\n>> The issue manifests as boot hangs on PowerPC pseries machines with multiple\n>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and\n>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. 
Each iteration held\n>> the BQL, preventing other operations from proceeding.\n>>\n> \n> AFAIU, with halted=1, the thread should be waiting at the\n> qemu_process_cpu_events() qemu_cond_wait invocation which will release\n> the BQL during the wait.\n> \n> What is your irqchip setting? on/off/split? Aren't you just hitting the\n> early return at do_kvm_irqchip_create()? The refactoring from commit\n> 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file\n> descriptor change\") made it so a few lines that would have been skipped\n> are now executed.\n> \n> In this case, probably kvm_halt_in_kernel_allowed=true is making\n> cpu_thread_is_idle() return false and skip the wait at\n> qemu_process_cpu_events().\n\nYou caught it right, the early return changed the behaviour with the\ncommit mentioned. I just tried with below change which retains the\nbehaviour prior to the commit mentioned and it works as expected (no \nmore BQL deadlocks).\n\nAni, would you like to post the patch fixing the behaviour change\nintroduced with commit 98884e0cc1 or I can send below patch if looks fine?\n\ndiff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\nindex 774499d34f..a388c00d71 100644\n--- a/accel/kvm/kvm-all.c\n+++ b/accel/kvm/kvm-all.c\n@@ -2575,7 +2575,7 @@ void kvm_irqchip_set_qemuirq_gsi(KVMState *s, \nqemu_irq irq, int gsi)\n      g_hash_table_insert(s->gsimap, irq, GINT_TO_POINTER(gsi));\n  }\n\n-static void do_kvm_irqchip_create(KVMState *s)\n+static int do_kvm_irqchip_create(KVMState *s)\n  {\n      int ret;\n      if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) {\n@@ -2587,7 +2587,7 @@ static void do_kvm_irqchip_create(KVMState *s)\n              exit(1);\n          }\n      } else {\n-        return;\n+        return -1;\n      }\n\n      if (kvm_check_extension(s, KVM_CAP_IRQFD) <= 0) {\n@@ -2610,13 +2610,17 @@ static void do_kvm_irqchip_create(KVMState *s)\n          fprintf(stderr, \"Create kernel irqchip failed: %s\\n\", \nstrerror(-ret));\n          
exit(1);\n      }\n+    return 0;\n  }\n\n  static void kvm_irqchip_create(KVMState *s)\n  {\n      assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);\n-\n-    do_kvm_irqchip_create(s);\n+    int ret = 0;\n+\n+    ret = do_kvm_irqchip_create(s);\n+    if (ret < 0)\n+        return;\n      kvm_kernel_irqchip = true;\n      /* If we have an in-kernel IRQ chip then we must have asynchronous\n       * interrupt delivery (though the reverse is not necessarily true)\n\n\n> \n>> The fix has two parts:\n>>\n>> 1. In kvm_cpu_exec() (kvm-all.c):\n>>     Release the BQL before returning EXCP_HLT in the early return path,\n>>     matching the behavior of the normal execution path where bql_unlock()\n>>     is called before entering the main KVM execution loop.\n>>\n>> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>>     Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>>     loop expects to hold the BQL when calling kvm_cpu_exec() again.\n>>\n>> This ensures proper BQL lock/unlock pairing:\n>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n>> - Next iteration has BQL held as expected\n>>\n>> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n>> changes required to support KVM VM file descriptor change\") which\n>> refactored kvm_irqchip_create() and changed the initialization timing,\n>> exposing this lock imbalance issue.\n>>\n>> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file descriptor change\")\n>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n>> ---\n>>   accel/kvm/kvm-accel-ops.c | 4 ++++\n>>   accel/kvm/kvm-all.c       | 1 +\n>>   2 files changed, 5 insertions(+)\n>>\n>> diff --git a/accel/kvm/kvm-accel-ops.c 
b/accel/kvm/kvm-accel-ops.c\n>> index 6d9140e549..d684fd0840 100644\n>> --- a/accel/kvm/kvm-accel-ops.c\n>> +++ b/accel/kvm/kvm-accel-ops.c\n>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)\n>>   \n>>           if (cpu_can_run(cpu)) {\n>>               r = kvm_cpu_exec(cpu);\n>> +            if (r == EXCP_HLT) {\n>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */\n>> +                bql_lock();\n>> +            }\n>>               if (r == EXCP_DEBUG) {\n>>                   cpu_handle_guest_debug(cpu);\n>>               }\n>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n>> index 774499d34f..00b8018664 100644\n>> --- a/accel/kvm/kvm-all.c\n>> +++ b/accel/kvm/kvm-all.c\n>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>>       trace_kvm_cpu_exec();\n>>   \n>>       if (kvm_arch_process_async_events(cpu)) {\n>> +        bql_unlock();\n>>           return EXCP_HLT;\n>>       }","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256\n header.s=pp1 header.b=ezl538jK;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists1p.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fvGZv0KXhz1y2d\n\tfor <incoming@patchwork.ozlabs.org>; Mon, 13 Apr 2026 15:45:09 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists1p.gnu.org with esmtp 
(Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wCA6I-0004oX-9p; Mon, 13 Apr 2026 01:44:46 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wCA6G-0004nk-9R; Mon, 13 Apr 2026 01:44:44 -0400","from mx0a-001b2d01.pphosted.com ([148.163.156.1])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wCA6E-0003IU-1d; Mon, 13 Apr 2026 01:44:43 -0400","from pps.filterd (m0360083.ppops.net [127.0.0.1])\n by mx0a-001b2d01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id\n 63CKfhbD3344222; Mon, 13 Apr 2026 05:44:28 GMT","from ppma22.wdc07v.mail.ibm.com\n (5c.69.3da9.ip4.static.sl-reverse.com [169.61.105.92])\n by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4dfdt3p6h8-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Mon, 13 Apr 2026 05:44:27 +0000 (GMT)","from pps.filterd (ppma22.wdc07v.mail.ibm.com [127.0.0.1])\n by ppma22.wdc07v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id\n 63D2d1b7031240;\n Mon, 13 Apr 2026 05:44:26 GMT","from smtprelay06.dal12v.mail.ibm.com ([172.16.1.8])\n by ppma22.wdc07v.mail.ibm.com (PPS) with ESMTPS id 4dg10y3xbt-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Mon, 13 Apr 2026 05:44:26 +0000","from smtpav01.dal12v.mail.ibm.com (smtpav01.dal12v.mail.ibm.com\n [10.241.53.100])\n by smtprelay06.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id\n 63D5iPXM50790772\n (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);\n Mon, 13 Apr 2026 05:44:25 GMT","from smtpav01.dal12v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id 19F0258058;\n Mon, 13 Apr 2026 05:44:25 +0000 (GMT)","from smtpav01.dal12v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id E910958059;\n Mon, 13 Apr 2026 05:44:21 
+0000 (GMT)","from [9.124.214.170] (unknown [9.124.214.170])\n by smtpav01.dal12v.mail.ibm.com (Postfix) with ESMTP;\n Mon, 13 Apr 2026 05:44:21 +0000 (GMT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc\n :content-transfer-encoding:content-type:date:from:in-reply-to\n :message-id:mime-version:references:subject:to; s=pp1; bh=jfRqSa\n XLxujPZJrH37YGpEURwColteq3LuWBnyLJU/w=; b=ezl538jKqjRW7OWnpJNSZ9\n t45IZV1dTvkJSFEwrq4FdNsJcJZCPwP30Y1t3Y31CN35C7R8MjvHILtKqDKM1EFp\n lH1Q6VKzbR1bpTHKjE8XjA0y07JxGGyxfkLMw1QafqsSAW1ORDScz4oopQdH7u0a\n PpAZ4ejBz4xaxRwESR1/mGcboMBDV96Lq00+UucXHA0QK2xEcqi92icyV/g+0tE6\n 3FhmaMlpOwY2x54FESp1IcUR2ee9ZtoWHsMhv1mRmG/eJ25rBuMccswc8xrzgCPt\n Jev7jZQwGoIvU1+mjpYDoSA0H6tSXL8uq3vIamBPriUACN8wsoX1SZ3sdcI1B2gA\n ==","Message-ID":"<d4ca5f6a-c810-407b-b089-839e0a7d6d06@linux.ibm.com>","Date":"Mon, 13 Apr 2026 11:14:20 +0530","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","Content-Language":"en-GB","To":"Fabiano Rosas <farosas@suse.de>, balaton@eik.bme.hu, anisinha@redhat.com,\n qemu-devel@nongnu.org, qemu-ppc@nongnu.org","Cc":"pbonzini@redhat.com, npiggin@gmail.com, misanjum@linux.ibm.com,\n gautam@linux.ibm.com, peter.maydell@linaro.org","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <87qzomznto.fsf@suse.de>","From":"Harsh Prateek Bora <harshpb@linux.ibm.com>","In-Reply-To":"<87qzomznto.fsf@suse.de>","Content-Type":"text/plain; charset=UTF-8; format=flowed","Content-Transfer-Encoding":"7bit","X-TM-AS-GCONF":"00","X-Proofpoint-Reinject":"loops=2 maxloops=12","X-Proofpoint-ORIG-GUID":"aKQ2Uc-c-qCGH0JP2Td9R-LcFprhSmCB","X-Proofpoint-Spam-Details-Enc":"AW1haW4tMjYwNDEzMDA0OCBTYWx0ZWRfX534yr1tSlO+k\n qRr6TzHrtP+NU+13UEVHGKoxtlhj2DoXuO0FKckqoBq5wjIY2M61VxsclWoqL5bDh7jNT2SRrZv\n IGQGmpFBzMFr3LGCySEmIWQAx1Ja7BK+m4IaFXwYhxfIsh8gTX62H1mHeUjmhJKkxBWOuqZhbfh\n 
XSlIvJrKvo+sgNa0im2vk9CiWJa268p50MPu/gIjs+4k0kCKABT5y8JwrRn8wwT52xY88HgMX7x\n q4RAv1LaMDIgMND6V93OXffkKgZOVVOKH9Yppy3TGwlGDxSenf07MCTJgjRCrVgNscGzXMnzXPp\n pOauYIO1qXRVAlIlC9+6qFUBCrZCnuSbPlQx9vdSG5mboXfeUlPx8D2ln+kJsHTMZHBS73PD8fQ\n JDmyGhjMxotsCPOn6xTnH+I8NviazeeO/sY21IfFxPAqdkypU1/EsBAMq8pUK0nVir0aCtOHMSb\n 9VtFFgKoephf5oGxgzw==","X-Proofpoint-GUID":"pyYhuyNNb68s5KOVGq8Sc56tZ915Yvcc","X-Authority-Analysis":"v=2.4 cv=WpEb99fv c=1 sm=1 tr=0 ts=69dc82bb cx=c_pps\n a=5BHTudwdYE3Te8bg5FgnPg==:117 a=5BHTudwdYE3Te8bg5FgnPg==:17\n a=IkcTkHD0fZMA:10 a=A5OVakUREuEA:10 a=f7IdgyKtn90A:10\n a=VkNPw1HP01LnGYTKEx00:22 a=RnoormkPH1_aCDwRdu11:22 a=iQ6ETzBq9ecOQQE5vZCe:22\n a=VnNF1IyMAAAA:8 a=mAafe1QSjR4qlSFeF0IA:9 a=QEXdDO2ut3YA:10 a=O8hF6Hzn-FEA:10","X-Proofpoint-Virus-Version":"vendor=baseguard\n engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49\n definitions=2026-04-13_01,2026-04-09_02,2025-10-01_01","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n priorityscore=1501 phishscore=0 bulkscore=0 adultscore=0 spamscore=0\n malwarescore=0 clxscore=1015 lowpriorityscore=0 suspectscore=0\n impostorscore=0 classifier=typeunknown authscore=0 authtc= authcc=\n route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2604010000\n definitions=main-2604130048","Received-SPF":"pass client-ip=148.163.156.1;\n envelope-from=harshpb@linux.ibm.com;\n helo=mx0a-001b2d01.pphosted.com","X-Spam_score_int":"-26","X-Spam_score":"-2.7","X-Spam_bar":"--","X-Spam_report":"(-2.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7,\n RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no 
action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3676501,"web_url":"http://patchwork.ozlabs.org/comment/3676501/","msgid":"<A44F57CD-3292-4715-BE26-98DD15D40613@redhat.com>","list_archive_url":null,"date":"2026-04-13T07:13:37","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":86030,"url":"http://patchwork.ozlabs.org/api/people/86030/","name":"Ani Sinha","email":"anisinha@redhat.com"},"content":"> On 13 Apr 2026, at 11:14 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:\n> \n> Hi Fabiano, Balaton,\n> \n> Thanks for pitching in and the suggestions.\n> \n> On 10/04/26 11:42 pm, Fabiano Rosas wrote:\n>> Harsh Prateek Bora <harshpb@linux.ibm.com> writes:\n>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()\n>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).\n>>> This caused a lock imbalance where the vCPU thread would loop back to\n>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.\n>>> \n>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple\n>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and\n>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. 
Each iteration held\n>>> the BQL, preventing other operations from proceeding.\n>>> \n>> AFAIU, with halted=1, the thread should be waiting at the\n>> qemu_process_cpu_events() qemu_cond_wait invocation which will release\n>> the BQL during the wait.\n>> What is your irqchip setting? on/off/split? Aren't you just hitting the\n>> early return at do_kvm_irqchip_create()? The refactoring from commit\n>> 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file\n>> descriptor change\") made it so a few lines that would have been skipped\n>> are now executed.\n>> In this case, probably kvm_halt_in_kernel_allowed=true is making\n>> cpu_thread_is_idle() return false and skip the wait at\n>> qemu_process_cpu_events().\n> \n> You caught it right, the early return changed the behaviour with the\n> commit mentioned. I just tried with below change which retains the\n> behaviour prior to the commit mentioned and it works as expected (no more BQL deadlocks).\n> \n\nThis makes more sense. Btw if you had followed through with the suggestion I made here\nhttps://yhbt.net/lore/all/1014F925-4992-459B-B5B4-E6CCAC7FBC02@redhat.com/\n\nIt would have pointed straight at this change and then figuring out the rest would have been straightforward.\n\n> Ani, would you like to post the patch fixing the behaviour change\n> introduced with commit 98884e0cc1 or I can send below patch if looks fine?\n\nI will post something soon.\n\n\n> \n> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n> index 774499d34f..a388c00d71 100644\n> --- a/accel/kvm/kvm-all.c\n> +++ b/accel/kvm/kvm-all.c\n> @@ -2575,7 +2575,7 @@ void kvm_irqchip_set_qemuirq_gsi(KVMState *s, qemu_irq irq, int gsi)\n>     g_hash_table_insert(s->gsimap, irq, GINT_TO_POINTER(gsi));\n> }\n> \n> -static void do_kvm_irqchip_create(KVMState *s)\n> +static int do_kvm_irqchip_create(KVMState *s)\n> {\n>     int ret;\n>     if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) {\n> @@ -2587,7 +2587,7 @@ static void 
do_kvm_irqchip_create(KVMState *s)\n>             exit(1);\n>         }\n>     } else {\n> -        return;\n> +        return -1;\n>     }\n> \n>     if (kvm_check_extension(s, KVM_CAP_IRQFD) <= 0) {\n> @@ -2610,13 +2610,17 @@ static void do_kvm_irqchip_create(KVMState *s)\n>         fprintf(stderr, \"Create kernel irqchip failed: %s\\n\", strerror(-ret));\n>         exit(1);\n>     }\n> +    return 0;\n> }\n> \n> static void kvm_irqchip_create(KVMState *s)\n> {\n>     assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);\n> -\n> -    do_kvm_irqchip_create(s);\n> +    int ret = 0;\n> +\n> +    ret = do_kvm_irqchip_create(s);\n> +    if (ret < 0)\n> +        return;\n>     kvm_kernel_irqchip = true;\n>     /* If we have an in-kernel IRQ chip then we must have asynchronous\n>      * interrupt delivery (though the reverse is not necessarily true)\n> \n> \n>>> The fix has two parts:\n>>> \n>>> 1. In kvm_cpu_exec() (kvm-all.c):\n>>>    Release the BQL before returning EXCP_HLT in the early return path,\n>>>    matching the behavior of the normal execution path where bql_unlock()\n>>>    is called before entering the main KVM execution loop.\n>>> \n>>> 2. 
In kvm_vcpu_thread_fn() (kvm-accel-ops.c):\n>>>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the\n>>>    loop expects to hold the BQL when calling kvm_cpu_exec() again.\n>>> \n>>> This ensures proper BQL lock/unlock pairing:\n>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()\n>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)\n>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned\n>>> - Next iteration has BQL held as expected\n>>> \n>>> This is a regression introduced by commit 98884e0cc1 (\"accel/kvm: add\n>>> changes required to support KVM VM file descriptor change\") which\n>>> refactored kvm_irqchip_create() and changed the initialization timing,\n>>> exposing this lock imbalance issue.\n>>> \n>>> Fixes: 98884e0cc1 (\"accel/kvm: add changes required to support KVM VM file descriptor change\")\n>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>\n>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>\n>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>\n>>> ---\n>>>  accel/kvm/kvm-accel-ops.c | 4 ++++\n>>>  accel/kvm/kvm-all.c       | 1 +\n>>>  2 files changed, 5 insertions(+)\n>>> \n>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c\n>>> index 6d9140e549..d684fd0840 100644\n>>> --- a/accel/kvm/kvm-accel-ops.c\n>>> +++ b/accel/kvm/kvm-accel-ops.c\n>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)\n>>>            if (cpu_can_run(cpu)) {\n>>>              r = kvm_cpu_exec(cpu);\n>>> +            if (r == EXCP_HLT) {\n>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */\n>>> +                bql_lock();\n>>> +            }\n>>>              if (r == EXCP_DEBUG) {\n>>>                  cpu_handle_guest_debug(cpu);\n>>>              }\n>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c\n>>> index 774499d34f..00b8018664 100644\n>>> --- a/accel/kvm/kvm-all.c\n>>> +++ b/accel/kvm/kvm-all.c\n>>> @@ -3439,6 
+3439,7 @@ int kvm_cpu_exec(CPUState *cpu)\n>>>      trace_kvm_cpu_exec();\n>>>        if (kvm_arch_process_async_events(cpu)) {\n>>> +        bql_unlock();\n>>>          return EXCP_HLT;\n>>>      }","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=NRT3asjg;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists1p.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fvJYl4tV6z1yDF\n\tfor <incoming@patchwork.ozlabs.org>; Mon, 13 Apr 2026 17:14:17 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists1p.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wCBUh-0002zA-6q; Mon, 13 Apr 2026 03:14:03 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wCBUf-0002yc-Cy\n for qemu-ppc@nongnu.org; Mon, 13 Apr 2026 03:14:01 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.133.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <anisinha@redhat.com>)\n id 1wCBUd-0000M0-1O\n for qemu-ppc@nongnu.org; Mon, 13 Apr 2026 03:14:01 -0400","from mail-pl1-f197.google.com (mail-pl1-f197.google.com\n [209.85.214.197]) by relay.mimecast.com 
with ESMTP with STARTTLS\n (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n us-mta-264-RjIRqEDdPDelRxn1i86Jxg-1; Mon, 13 Apr 2026 03:13:55 -0400","by mail-pl1-f197.google.com with SMTP id\n d9443c01a7336-2b24af7ca99so58283535ad.1\n for <qemu-ppc@nongnu.org>; Mon, 13 Apr 2026 00:13:55 -0700 (PDT)","from smtpclient.apple ([122.163.114.34])\n by smtp.gmail.com with ESMTPSA id\n d9443c01a7336-2b2d4dd7faasm105091465ad.26.2026.04.13.00.13.49\n (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\n Mon, 13 Apr 2026 00:13:52 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1776064436;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n to:to:cc:cc:mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=rFSA+NM0PhYWL1O+fiZ8HPRFJe14R+WUeGt8sSodKHI=;\n b=NRT3asjgI2W8C5tbZPJdDPBad0a8L8R3YyEwiiak8jsjAmnoLsnk2KE26mBxgePjAMSBmX\n apzh/XrLIGd9HXOm1nPP1BMWeY9Pt8MQHG2XHbpIRNgeSolPnnkR+ASOsqlSaCwj8vjcId\n e8+uUAS26Rf/IuyelIwJO+3dxRYl4oU=","X-MC-Unique":"RjIRqEDdPDelRxn1i86Jxg-1","X-Mimecast-MFC-AGG-ID":"RjIRqEDdPDelRxn1i86Jxg_1776064435","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1776064434; x=1776669234;\n h=to:references:message-id:content-transfer-encoding:cc:date\n :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state\n :from:to:cc:subject:date:message-id:reply-to;\n bh=5iAu39pTkAEFIrDGoZu079165o4bxNu8OxZMxd1GIwU=;\n b=k9tFiznI8bP8sdfK80e9Mnn1eahmbN5qqr0KgcI2CwT4UjOxegxygKHZORQGakXQV4\n 1Ln7fVX5GvBgI++11nCjygxqhWA0+lAuIt3VceUl1ZV5yfVsMYVdXmxi+auD1vRbFYlm\n 49pTPOowMut8YWHwHaRJ4eTubB7+MAjNkP+Yn87JJMF7DRX8ShpDdexu5tA5sngokxe6\n D+LSypsNlkV6Yibjnjh/PfQB+8qHv8jc8sP5FNR0jBaRRUTXHwdywdlX/OipnfPYM5Zp\n GzSGZE0Ck8vgQ3aWDVbFLFzdZuWW9jjYTEJctVW+NZDFAIuQZkr7j2ur4yrB57jP6nuV\n wqvw==","X-Forwarded-Encrypted":"i=1;\n 
AFNElJ8mHSkLna+zbR6JosXDG63updottz0nmLwZhPBEge36qmrHqnlVUo8cgyr2x7MT5QDTKTbXctMRug==@nongnu.org","X-Gm-Message-State":"AOJu0Yy8eV8uyeaN4IksvMDUBVyNNV4ythH+OMpJOD0cSZOjs+Z4UWNP\n EPC/6zHVstYCB4tBCsX2dZL3aFTCsek41uzdtUMrOREqhlCBD+Iq8F3eBv51WRQ7M74W6VzYECZ\n WzHXnm6pU0LGtUBkRV+0LplKZqVBSlAUFMUAs5jFUcYweLYXBO92TlA==","X-Gm-Gg":"AeBDieuOEZ4CRuNsEdjxgU9D8HIh/2BqxthHI6SigGCZdDdTFAgSpkJbLk7TYqo3A+/\n MZ1S+tv4e9LEIss1p7n2UrjXyjmhfcL7HxPeFFOYpynBdKv2Ic0qGt+nuAEf19tX3pWPcf7pP6M\n QdoP5hNpunLSbUXjizG6hRkcXm8gQY/1asGs5bxK6aW2GAey3iKio5V5PLUda+Nko04pAf85aYk\n 9SBkFsb//SvrxfNQaIffYB6duVRettHygVUeleju70aGCPrh5z5+L08I9ns+nndPERqyTjaPLm7\n JZ8FKc2HkgVxaQam5umjeMYdSxZ7t6aOizg9r47QZzUz3F/e6j5uxzjFbAyzYvtTA1y9oamjno0\n mrxG368c+x9WNKFlfmkKS8eToIrIUk8Pikc7GWHrRZrG2SUgXaoj9JynE2jEfW895A6Gb3anC2+\n c=","X-Received":["by 2002:a17:902:7041:b0:2b4:65ab:57cd with SMTP id\n d9443c01a7336-2b465ab5ba4mr3859515ad.36.1776064434382;\n Mon, 13 Apr 2026 00:13:54 -0700 (PDT)","by 2002:a17:902:7041:b0:2b4:65ab:57cd with SMTP id\n d9443c01a7336-2b465ab5ba4mr3859245ad.36.1776064433399;\n Mon, 13 Apr 2026 00:13:53 -0700 (PDT)"],"Mime-Version":"1.0 (Mac OS X Mail 16.0 \\(3864.500.181\\))","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","From":"Ani Sinha <anisinha@redhat.com>","In-Reply-To":"<d4ca5f6a-c810-407b-b089-839e0a7d6d06@linux.ibm.com>","Date":"Mon, 13 Apr 2026 12:43:37 +0530","Cc":"Fabiano Rosas <farosas@suse.de>, balaton@eik.bme.hu,\n qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","Message-Id":"<A44F57CD-3292-4715-BE26-98DD15D40613@redhat.com>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <87qzomznto.fsf@suse.de> <d4ca5f6a-c810-407b-b089-839e0a7d6d06@linux.ibm.com>","To":"Harsh Prateek Bora <harshpb@linux.ibm.com>","X-Mailer":"Apple Mail 
(2.3864.500.181)","X-Mimecast-Spam-Score":"0","X-Mimecast-MFC-PROC-ID":"qA_TDeXmDOvxMqhTg-hGo2U8QcqIaQZ_Qf5EcVFyJ48_1776064435","X-Mimecast-Originator":"redhat.com","Content-Type":"text/plain;\n\tcharset=utf-8","Content-Transfer-Encoding":"quoted-printable","Received-SPF":"pass client-ip=170.10.133.124;\n envelope-from=anisinha@redhat.com;\n helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"-25","X-Spam_score":"-2.6","X-Spam_bar":"--","X-Spam_report":"(-2.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.54,\n DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3676513,"web_url":"http://patchwork.ozlabs.org/comment/3676513/","msgid":"<33b54326-f6f6-4589-9fa8-bc315a9be2eb@linux.ibm.com>","list_archive_url":null,"date":"2026-04-13T07:39:48","subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","submitter":{"id":85411,"url":"http://patchwork.ozlabs.org/api/people/85411/","name":"Harsh Prateek Bora","email":"harshpb@linux.ibm.com"},"content":"On 13/04/26 12:43 pm, Ani Sinha wrote:\n>> 
Ani, would you like to post the patch fixing the behaviour change\n>> introduced with commit 98884e0cc1 or I can send below patch if looks fine?\n> I will post something soon.\n\nThanks, Ani. Let's get it fixed in -rc4!","headers":{"Return-Path":"<qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256\n header.s=pp1 header.b=UCY2Xm5w;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists1p.gnu.org;\n envelope-from=qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fvK7c40L8z1y2d\n\tfor <incoming@patchwork.ozlabs.org>; Mon, 13 Apr 2026 17:40:12 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists1p.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-ppc-bounces@nongnu.org>)\n\tid 1wCBtv-0002l5-PR; Mon, 13 Apr 2026 03:40:07 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wCBtq-0002Fe-I8; Mon, 13 Apr 2026 03:40:02 -0400","from mx0a-001b2d01.pphosted.com ([148.163.156.1])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <harshpb@linux.ibm.com>)\n id 1wCBto-0006oz-Mt; Mon, 13 Apr 2026 03:40:02 -0400","from pps.filterd (m0353729.ppops.net [127.0.0.1])\n by mx0a-001b2d01.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id\n 63CJF39V2897733; Mon, 13 Apr 2026 
07:39:55 GMT","from ppma11.dal12v.mail.ibm.com\n (db.9e.1632.ip4.static.sl-reverse.com [50.22.158.219])\n by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4dfdyqpgt6-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Mon, 13 Apr 2026 07:39:55 +0000 (GMT)","from pps.filterd (ppma11.dal12v.mail.ibm.com [127.0.0.1])\n by ppma11.dal12v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id\n 63D55vdu025761;\n Mon, 13 Apr 2026 07:39:54 GMT","from smtprelay02.dal12v.mail.ibm.com ([172.16.1.4])\n by ppma11.dal12v.mail.ibm.com (PPS) with ESMTPS id 4dg3b1bwsk-1\n (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n Mon, 13 Apr 2026 07:39:54 +0000","from smtpav06.wdc07v.mail.ibm.com (smtpav06.wdc07v.mail.ibm.com\n [10.39.53.233])\n by smtprelay02.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id\n 63D7drH48848076\n (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);\n Mon, 13 Apr 2026 07:39:53 GMT","from smtpav06.wdc07v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id 260985803F;\n Mon, 13 Apr 2026 07:39:53 +0000 (GMT)","from smtpav06.wdc07v.mail.ibm.com (unknown [127.0.0.1])\n by IMSVA (Postfix) with ESMTP id CEEBB58054;\n Mon, 13 Apr 2026 07:39:49 +0000 (GMT)","from [9.123.0.169] (unknown [9.123.0.169])\n by smtpav06.wdc07v.mail.ibm.com (Postfix) with ESMTP;\n Mon, 13 Apr 2026 07:39:49 +0000 (GMT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc\n :content-transfer-encoding:content-type:date:from:in-reply-to\n :message-id:mime-version:references:subject:to; s=pp1; bh=+Gckv2\n M9p1mNTsqe3ii1JrV33Rbla4jfy+2WAtjSvQo=; b=UCY2Xm5wPQwqsS6kNWfLat\n HMvFveluT+sKZ/2eFBYwln2OIf5JnmA7VsWAWtUqZL6/Q/EtY4K+L4TmXkVWx2Lp\n NsMlAXM/hz/rGsbDF9eBsQNVOVyQ54RLG+RIMfaJwQG9X/mBcwX4joUGt1pH17LA\n wwpwz4aq8/m8DF0gSk+5ZBxhQBzP7/P4q/Lv8/t26yeJ6wCEb+dxd6uFkJEsM+a7\n pb46BgwZVh0pX+40x1jQN71qrcKOEbF9GWMqXPkNf9p1esQIsZCmZJUGu4GpqlMy\n 
M4MQ8MpSMoz9niKseZU41XUZe+iFC7zrCxt8PkwD/1Dye1T/9Q/+M9GUOO6kbIBg\n ==","Message-ID":"<33b54326-f6f6-4589-9fa8-bc315a9be2eb@linux.ibm.com>","Date":"Mon, 13 Apr 2026 13:09:48 +0530","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in\n kvm_cpu_exec","Content-Language":"en-GB","To":"Ani Sinha <anisinha@redhat.com>","Cc":"Fabiano Rosas <farosas@suse.de>, balaton@eik.bme.hu,\n qemu-devel <qemu-devel@nongnu.org>, qemu-ppc@nongnu.org,\n Paolo Bonzini <pbonzini@redhat.com>, npiggin@gmail.com,\n misanjum@linux.ibm.com, gautam@linux.ibm.com,\n Peter Maydell <peter.maydell@linaro.org>","References":"<20260409161042.55281-1-harshpb@linux.ibm.com>\n <87qzomznto.fsf@suse.de> <d4ca5f6a-c810-407b-b089-839e0a7d6d06@linux.ibm.com>\n <A44F57CD-3292-4715-BE26-98DD15D40613@redhat.com>","From":"Harsh Prateek Bora <harshpb@linux.ibm.com>","In-Reply-To":"<A44F57CD-3292-4715-BE26-98DD15D40613@redhat.com>","Content-Type":"text/plain; charset=UTF-8; format=flowed","Content-Transfer-Encoding":"7bit","X-TM-AS-GCONF":"00","X-Proofpoint-Reinject":"loops=2 maxloops=12","X-Authority-Analysis":"v=2.4 cv=ErTiaycA c=1 sm=1 tr=0 ts=69dc9dcb cx=c_pps\n a=aDMHemPKRhS1OARIsFnwRA==:117 a=aDMHemPKRhS1OARIsFnwRA==:17\n a=IkcTkHD0fZMA:10 a=A5OVakUREuEA:10 a=f7IdgyKtn90A:10\n a=VkNPw1HP01LnGYTKEx00:22 a=RnoormkPH1_aCDwRdu11:22 a=uAbxVGIbfxUO_5tXvNgY:22\n a=aHC7gpCOpxZIYopdUIsA:9 a=QEXdDO2ut3YA:10","X-Proofpoint-ORIG-GUID":"gOE9PviYbO3wlO3xAiUDLF6Cr7DV9rav","X-Proofpoint-GUID":"9GeYidcRu52NQr-TjExo6576xoH_kxdt","X-Proofpoint-Spam-Details-Enc":"AW1haW4tMjYwNDEzMDA2OSBTYWx0ZWRfX+o2c/QdqN+gT\n J2GJF0uTvBgbWX1Lz9kqPq2gkct/aCHCFj5k+j06Eg85vyVf4ks5WUw5EcxoIsl5F0BLGCg2sKb\n o6MGsPUIA/akdU8fjUGkR6a68YudC897tOGcsJUyMNRZ064P5Yts5BMEw3S9aHwu7P7RTtJ8NfR\n +0dmtDB/p4lvMo5vWOggnqOnRbcpK7O7Odml3oVFdGxBNLW6phMXTKPbb+OMGXdysiERBZYONLz\n mRGYdbSuxNRuIbl2cVxITmm/wNkuDe1rPy4Z4qYqRKx0SR2dOv6rvZ8N0JtCSaOzNF4huJc+GVB\n 
ctT1RBKoxADispGIHb754abFrdQvIuxivJTYrkyOdpmMKv/pXwS9vIJbfhCw1BuT49BtAH1kfvp\n okVHaPyk0ol6w+fbTVO0C3Y5nksL7q/rRuYiQ3a6zMGAEfxtlwq56STxQ3efkCi0zCZDHBMzOp+\n Ei2SNHa/Pf493unxNVA==","X-Proofpoint-Virus-Version":"vendor=baseguard\n engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49\n definitions=2026-04-13_02,2026-04-09_02,2025-10-01_01","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n malwarescore=0 adultscore=0 suspectscore=0 bulkscore=0 clxscore=1015\n priorityscore=1501 lowpriorityscore=0 phishscore=0 spamscore=0\n impostorscore=0 classifier=typeunknown authscore=0 authtc= authcc=\n route=outbound adjust=0 reason=mlx scancount=1 engine=8.22.0-2604010000\n definitions=main-2604130069","Received-SPF":"pass client-ip=148.163.156.1;\n envelope-from=harshpb@linux.ibm.com;\n helo=mx0a-001b2d01.pphosted.com","X-Spam_score_int":"-26","X-Spam_score":"-2.7","X-Spam_bar":"--","X-Spam_report":"(-2.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7,\n RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,\n SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-ppc@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<qemu-ppc.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-ppc>","List-Post":"<mailto:qemu-ppc@nongnu.org>","List-Help":"<mailto:qemu-ppc-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-ppc>,\n <mailto:qemu-ppc-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}}]