{"id":2220388,"url":"http://patchwork.ozlabs.org/api/1.1/covers/2220388/?format=json","web_url":"http://patchwork.ozlabs.org/project/ubuntu-kernel/cover/20260407063459.106734-1-acelan.kao@canonical.com/","project":{"id":15,"url":"http://patchwork.ozlabs.org/api/1.1/projects/15/?format=json","name":"Ubuntu Kernel","link_name":"ubuntu-kernel","list_id":"kernel-team.lists.ubuntu.com","list_email":"kernel-team@lists.ubuntu.com","web_url":null,"scm_url":null,"webscm_url":null},"msgid":"<20260407063459.106734-1-acelan.kao@canonical.com>","date":"2026-04-07T06:34:58","name":"[SRU,Q,0/1] System hangs during stress-ng stack test","submitter":{"id":2976,"url":"http://patchwork.ozlabs.org/api/1.1/people/2976/?format=json","name":"AceLan Kao","email":"acelan.kao@canonical.com"},"mbox":"http://patchwork.ozlabs.org/project/ubuntu-kernel/cover/20260407063459.106734-1-acelan.kao@canonical.com/mbox/","series":[{"id":498922,"url":"http://patchwork.ozlabs.org/api/1.1/series/498922/?format=json","web_url":"http://patchwork.ozlabs.org/project/ubuntu-kernel/list/?series=498922","date":"2026-04-07T06:34:58","name":"System hangs during stress-ng stack test","version":1,"mbox":"http://patchwork.ozlabs.org/series/498922/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/covers/2220388/comments/","headers":{"Return-Path":"<kernel-team-bounces@lists.ubuntu.com>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256\n header.s=20251104 header.b=CIDI+tWB;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com\n (client-ip=185.125.189.65; helo=lists.ubuntu.com;\n envelope-from=kernel-team-bounces@lists.ubuntu.com;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fqbzZ2xP7z1xy1\n\tfor <incoming@patchwork.ozlabs.org>; Tue, 07 Apr 2026 16:35:21 +1000 (AEST)","from localhost ([127.0.0.1] helo=lists.ubuntu.com)\n\tby lists.ubuntu.com with esmtp (Exim 4.86_2)\n\t(envelope-from <kernel-team-bounces@lists.ubuntu.com>)\n\tid 1wA01m-0007TL-AG; Tue, 07 Apr 2026 06:35:10 +0000","from mail-pl1-f179.google.com ([209.85.214.179])\n by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)\n (Exim 4.86_2) (envelope-from <acelan@gmail.com>) id 1wA01l-0007TD-Bc\n for kernel-team@lists.ubuntu.com; Tue, 07 Apr 2026 06:35:09 +0000","by mail-pl1-f179.google.com with SMTP id\n d9443c01a7336-2aae4816912so29447185ad.2\n for <kernel-team@lists.ubuntu.com>; Mon, 06 Apr 2026 23:35:08 -0700 (PDT)","from localhost ([2001:67c:1562:8007::aac:4468])\n by smtp.gmail.com with ESMTPSA id\n d9443c01a7336-2b2749a440csm156289765ad.65.2026.04.06.23.35.03\n for <kernel-team@lists.ubuntu.com>\n (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);\n Mon, 06 Apr 2026 23:35:05 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=gmail.com; s=20251104; t=1775543706; x=1776148506; darn=lists.ubuntu.com;\n h=content-transfer-encoding:mime-version:message-id:date:subject:to\n :from:sender:from:to:cc:subject:date:message-id:reply-to;\n bh=Ql7VL3SPwb26j6dAm5CyrNa7yhFi/r0903FsgyVTlII=;\n b=CIDI+tWBJFF81k0n+Yx+Liz2R2RxVNUcWmWhaT81vFwp8q284ODCdbArDdPWPvyTRF\n VYoXGk328vI/UXBL+ZUJQe50oHqbHHyxKZe1SJd3239r37aCx2M1o5Whi7Qft+XlAoRK\n /pY6Hh85IzgFoXebJEZfKKDZhlITOypX/KEcbZe44EAk4x1EVd59jkjwu0FXDwHGLrd3\n 68m7L8ialL8orXEvFQ1+UJUdenQKRYBm7FhrF6aE8pZjO1taijMKNdc1yu73yywNZzku\n Jc8MN9b//dXckn6DsNGy6DCvwnSNJjNRLH+XeaVIBqyXKAzKJ3jKC4uoQW5duK9UMTLf\n /svw==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1775543706; x=1776148506;\n h=content-transfer-encoding:mime-version:message-id:date:subject:to\n :from:sender:x-gm-gg:x-gm-message-state:from:to:cc:subject:date\n :message-id:reply-to;\n bh=Ql7VL3SPwb26j6dAm5CyrNa7yhFi/r0903FsgyVTlII=;\n b=sXE07iRxIDiYh5oQwhUeyZbxAZH4QJEwqdbRb7PXIBTf7vsWS2QHW1rvVc6xSSQW4q\n 6IsBHafr7pK0x//ZG7iG1LXdH/8xfjpp2sELjJ0STNUV+Wae0nfGi1giqz1RAPQjA7F6\n FDv7h3Yhu4AnWWKcMSMwdYfXhNRAdTny0/ZAwV410vl9ZmTlc9mByfSyajCc76qguwlf\n OXQawpR0IdC9GsTruekEbWXaUoLkYTQgbHx3fEVXeV4A4Q/CxHl9dZ/kem5S33aWOCVt\n TbrVUlzwGYXyBk4ZLuoY3fT+Kh1etpbDhcAB2XYoTru93z3el1WLY3M15Mmv7Sa06lLr\n 8yng==","X-Gm-Message-State":"AOJu0YyXEEmVz5+/B5GwK0dTMfhw1lIilEyhlLMrYkDfWD2h6d32NIHG\n RwavHDXeDlAEnnrfBd1d6LGadpA/WKux7whAmlLUGdJvEBQXO67DgqFL+1oI0uBo","X-Gm-Gg":"AeBDievIFtcDTpkvgmGDyR7LBR6gW7ck8KAztw0UpkhjARJJqRUlD7VsT4cNiQkdkQ2\n J3YJaXM3SV15Jw5+NSP3WSbfvJfDGjKffAlPTM0fRlLeIYvk47zXSeHBCDCmSRnBr3VlV/iiN5V\n hF77S85IFVehxNq3a7WKBKOrjK2J2unGBPzOtxazt6JQz073eeOIKdp8Z0Y3QsXN/aQFE7aQmMd\n X/U5PrBxaemVbRIDan8lu4FXf3eueyIVlka03XgfS09StKYemBlenXluGl7aVrNdhuDv3pLrcsO\n 0vRzp/QtOoD/upzBVr3+tDqRfRkPsNtNDMYGg4i7A0jufex9RRell3H4sLTC+YHS5dLwF4sz42H\n VxBmpDpneelwJ3kgGYOW84BJhP1Vl/p+ijb0SQtJRbhDA++uHRzWEmaQfNqygpyHfkhnFnG49gU\n BHV+7wBQ==","X-Received":"by 2002:a17:902:e748:b0:2b2:50bd:83b3 with SMTP id\n d9443c01a7336-2b281706f12mr155887925ad.10.1775543706340;\n Mon, 06 Apr 2026 23:35:06 -0700 (PDT)","From":"AceLan Kao <acelan.kao@canonical.com>","To":"kernel-team@lists.ubuntu.com","Subject":"[SRU][Q][PATCH 0/1] System hangs during stress-ng stack test","Date":"Tue,  7 Apr 2026 14:34:58 +0800","Message-ID":"<20260407063459.106734-1-acelan.kao@canonical.com>","X-Mailer":"git-send-email 2.53.0","MIME-Version":"1.0","Received-SPF":"pass client-ip=209.85.214.179; envelope-from=acelan@gmail.com;\n helo=mail-pl1-f179.google.com","X-BeenThere":"kernel-team@lists.ubuntu.com","X-Mailman-Version":"2.1.20","Precedence":"list","List-Id":"Kernel team discussions <kernel-team.lists.ubuntu.com>","List-Unsubscribe":"<https://lists.ubuntu.com/mailman/options/kernel-team>,\n <mailto:kernel-team-request@lists.ubuntu.com?subject=unsubscribe>","List-Archive":"<https://lists.ubuntu.com/archives/kernel-team>","List-Post":"<mailto:kernel-team@lists.ubuntu.com>","List-Help":"<mailto:kernel-team-request@lists.ubuntu.com?subject=help>","List-Subscribe":"<https://lists.ubuntu.com/mailman/listinfo/kernel-team>,\n <mailto:kernel-team-request@lists.ubuntu.com?subject=subscribe>","Content-Type":"text/plain; charset=\"utf-8\"","Content-Transfer-Encoding":"base64","Errors-To":"kernel-team-bounces@lists.ubuntu.com","Sender":"\"kernel-team\" <kernel-team-bounces@lists.ubuntu.com>"},"content":"From: \"Chia-Lin Kao (AceLan)\" <acelan.kao@canonical.com>\n\nBugLink: https://bugs.launchpad.net/bugs/2137755\n\n[Impact]\nstress-ng memory stress test fails with stack stressor timeout on Dell\nsystems (CID: 202511-38062) running kernel 6.17.0-1007-oem. The stack\nstressor, which creates heavy memory pressure and swap activity,\nconsistently times out after running for the expected duration.\n\nThe issue occurs because the swap allocator uses an incorrect index when\nretrying swap cache reclaim after encountering a race condition. During\nheavy memory pressure (such as generated by the stack stressor), the\nallocator reclaims cached swap slots while scanning. If it finds a folio\nthat's already removed from the swap cache due to a race, it retries - but\nthe retry uses the wrong index, which can lead to:\n1. Reclaiming irrelevant swap folios instead of the intended ones\n2. Inefficient swap reclaim behavior under memory pressure\n3. Performance degradation that causes stress tests to timeout\n\nAffected hardware: Dell systems (CID: 202511-38062) with high core count\nand memory configurations\nFailure rate: 100% (2/2 test runs failed)\n\n[Fix]\nUpstream commit a733d8de7f1cc (\"mm, swap: fix swap cache index error when\nretrying reclaim\") fixes the swap cache index handling.\n\nThe fix makes two key changes:\n1. Makes the `entry` variable const to prevent incorrect reassignment\n2. Uses `folio->swap` directly when updating the offset after retrying,\ninstead of using the stale `entry` variable\n\nThis ensures that when the allocator retries after a race condition, it\nuses the correct swap cache index from the locked folio, preventing reclaim\nof irrelevant folios.\n\nThe patch is upstream in mainline kernel v6.18 and reviewed by multiple\nmemory management maintainers.\n\nLink: https://lkml.kernel.org/r/20250916160100.31545-4-ryncsn@gmail.com\nFixes: fae859550531 (\"mm, swap: avoid reclaiming irrelevant swap cache\")\n\n[Test Plan]\nOn affected Dell systems (CID: 202511-38062) or similar systems with high\ncore count and memory:\n\n1. Install kernel with the fix\n\n2. Run the stress test:\n   ```\n   # Run stress-ng with stack stressor\n   stress-ng --aggressive --verify --oom-avoid-bytes 10% --timeout 920 --stack 8\n   ```\n\n3. Monitor the test execution:\n   - The test should complete within the expected 920 second timeout\n   - Check that stress-ng reports \"successful run completed\" for the stack\n     stressor\n\nWithout the patch:\n- stress-ng stack stressor times out and is forcefully terminated\n- System may hang if the stress-ng process fails to be killed\n\nWith the patch:\n- stress-ng stack stressor completes within timeout period\n\n4. Optionally verify swap activity during the test:\n   ```\n   # Monitor swap usage\n   watch -n 1 'free -h && cat /proc/swaps'\n   ```\n   Swap should be actively used and reclaimed without unusual delays.\n\n[Where problems could occur]\nThe changes affect the swap file subsystem's reclaim logic in mm/swapfile.c,\nspecifically the __try_to_reclaim_swap() function.\n\nIf the fix introduces incorrect behavior:\n\n1. **Incorrect folio identification**: If `folio->swap` doesn't properly\nreflect the current state after locking, the code might still reclaim the\nwrong folio. However, this is unlikely since the folio is locked and the\nswap entry is validated before use.\n\n2. **Performance regression**: The change from using a cached `entry` value\nto dereferencing `folio->swap` multiple times could theoretically impact\nperformance. However, this should be negligible as the additional\ndereferences only occur in the retry path (race condition case) which is not\nthe common case.\n\n3. **Const qualifier issues**: Making `entry` const prevents reassignment.\nIf there were other code paths that relied on reassigning `entry` (not\nvisible in the upstream patch), compilation would fail. However, the\nupstream kernel has this change merged and tested.\n\n4. **Backport conflicts**: The backport required manual resolution because\nthe target branch still has an `address_space` variable that was removed\nupstream. If the resolution was incorrect, swap cache lookups could fail.\nHowever, the resolution preserves the `address_space` variable while\napplying the const qualifier and folio->swap usage as intended.\n\nThe impact is limited to swap reclaim behavior under memory pressure. The\nfix makes the code more correct by ensuring the right swap slots are\nreclaimed during races, which should improve rather than degrade stability.\n\n\nKairui Song (1):\n  mm, swap: fix swap cache index error when retrying reclaim\n\n mm/swapfile.c | 8 ++++----\n 1 file changed, 4 insertions(+), 4 deletions(-)"}