From patchwork Mon Oct 7 16:35:48 2019
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 1172902
From: Andrea Righi
To: kernel-team@lists.ubuntu.com
Subject: [SRU][X/B/D/E] [PATCH 0/1] PM / hibernate: fix potential memory corruption
Date: Mon, 7 Oct 2019 18:35:48 +0200
Message-Id: <20191007163550.20548-1-andrea.righi@canonical.com>
X-Mailer: git-send-email 2.20.1
List-Id: Kernel team discussions

BugLink: https://bugs.launchpad.net/bugs/1847118

[Impact]

A caching bug in the hibernation code can lead to memory corruption on
resume.

The hibernation code represents all the allocated pages in memory (pfn)
using a list of extents. Inside each extent it uses a radix tree, and each
node in the tree contains a bitmap. This structure is used to save the
memory image to disk.
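
To make the layout above concrete, here is a minimal userspace sketch of
such a structure. It is only an illustration under stated assumptions: the
names (bm_extent, bm_node, bm_find_node, bm_set, bm_test) and the sizes are
hypothetical, and a flat array of leaf nodes stands in for the radix tree.
It is not the code in kernel/power/snapshot.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_NODE 4
#define BITS_PER_NODE  (WORDS_PER_NODE * 64UL)  /* pfns covered by one node */

/* Leaf node: a small bitmap, one bit per pfn (hypothetical size). */
struct bm_node {
    uint64_t bits[WORDS_PER_NODE];
};

/* One extent: a contiguous pfn range [start_pfn, end_pfn). */
struct bm_extent {
    unsigned long start_pfn;
    unsigned long end_pfn;
    struct bm_node *nodes;      /* flat array standing in for the radix tree */
    struct bm_extent *next;     /* next extent in the list */
};

/* Uncached lookup: walk the extent list, then index the leaf node. */
static struct bm_node *bm_find_node(struct bm_extent *head, unsigned long pfn,
                                    unsigned int *bit)
{
    assert(bit != NULL);
    for (struct bm_extent *e = head; e; e = e->next) {
        if (pfn >= e->start_pfn && pfn < e->end_pfn) {
            unsigned long off = pfn - e->start_pfn;

            *bit = (unsigned int)(off % BITS_PER_NODE);
            return &e->nodes[off / BITS_PER_NODE];
        }
    }
    return NULL;                /* pfn not covered by any extent */
}

/* Mark a pfn; returns false if no extent covers it. */
static bool bm_set(struct bm_extent *head, unsigned long pfn)
{
    unsigned int bit;
    struct bm_node *n = bm_find_node(head, pfn, &bit);

    if (!n)
        return false;
    n->bits[bit / 64] |= UINT64_C(1) << (bit % 64);
    return true;
}

/* Test whether a pfn is marked; uncovered pfns read as unmarked. */
static bool bm_test(struct bm_extent *head, unsigned long pfn)
{
    unsigned int bit;
    struct bm_node *n = bm_find_node(head, pfn, &bit);

    return n && ((n->bits[bit / 64] >> (bit % 64)) & 1);
}
```

Every lookup in this sketch walks the extent list from the head, which is
the cost a cache of the previous position is meant to avoid.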

To speed up lookups in this structure the kernel caches the position of
the previous lookup in the form (current_extent, current_node). However, if
two consecutive lookups are distant enough from each other, the extent can
change while the kernel still uses the cached node (current_node),
accessing the wrong bitmap and ending up saving the wrong pfns to disk.

[Test Case]

The bug has been reproduced in Xenial and Bionic by trying to hibernate a
large instance with a lot of RAM (100GB+).

We also wrote a custom kernel module to better isolate the code that
triggers the problem:

https://code.launchpad.net/~arighi/+git/mybitmap

This module has exactly the same code as the hibernation code, but it can
be used as a fast test case to reproduce the problem without actually
triggering a real hibernation/resume cycle.

[Fix]

This bug can be fixed by properly invalidating the cached pair (extent,
node) when the next lookup falls in a different extent or a different node.

[Regression Potential]

The fix has been sent to the LKML for review/feedback
(https://lkml.org/lkml/2019/9/25/393). We have not received any feedback so
far, but the bug is pretty clear and the fix is well tested on the affected
platforms. Moreover, the code is isolated to the hibernation area, so the
overall regression potential is minimal.

----------------------------------------------------------------
Andy Whitcroft (1):
  PM / hibernate: memory_bm_find_bit -- tighten node optimisation

 kernel/power/snapshot.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Acked-by: Colin Ian King
Acked-by: Thadeu Lima de Souza Cascardo
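
For reference, the stale-cache scenario described under [Impact] and [Fix]
can be modelled in userspace roughly as follows. This is a hedged sketch,
not the actual patch: the structures, the find_leaf() helper and its
check_extent flag are hypothetical, and it assumes the cached node is keyed
by an extent-relative offset, which is one way the described behaviour can
arise. With check_extent set to false the cached node is reused whenever
the relative offset matches, even if the extent has changed. With it set to
true the cached node is only reused within the same extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LEAF_BITS 64UL              /* pfns covered by one leaf (hypothetical) */

struct leaf { uint64_t bits; };

struct ext {
    unsigned long start_pfn, end_pfn;   /* covers [start_pfn, end_pfn) */
    struct leaf *leaves;                /* stands in for the radix tree */
    struct ext *next;
};

struct cached_bm {
    struct ext *extents;
    struct ext *cur_ext;            /* cached extent of the previous lookup */
    struct leaf *cur_leaf;          /* cached node of the previous lookup */
    unsigned long cur_leaf_off;     /* extent-relative first pfn of cur_leaf */
};

/*
 * Find the leaf holding pfn, using the cached position when possible.
 * check_extent == false models the bug; true models the tightened check.
 */
static struct leaf *find_leaf(struct cached_bm *bm, unsigned long pfn,
                              bool check_extent, unsigned int *bit)
{
    struct ext *e = bm->cur_ext;

    assert(bit != NULL);

    /* Fast path: the cached extent still covers pfn. Otherwise search. */
    if (!e || pfn < e->start_pfn || pfn >= e->end_pfn) {
        for (e = bm->extents; e; e = e->next)
            if (pfn >= e->start_pfn && pfn < e->end_pfn)
                break;
        if (!e)
            return NULL;
    }

    unsigned long off = pfn - e->start_pfn;
    unsigned long leaf_off = off - off % LEAF_BITS;
    struct leaf *l = bm->cur_leaf;
    bool same_extent = (e == bm->cur_ext);

    /* Reuse the cached leaf only if the cache is judged valid. */
    if (!l || leaf_off != bm->cur_leaf_off || (check_extent && !same_extent))
        l = &e->leaves[off / LEAF_BITS];    /* cache miss: do the real walk */

    bm->cur_ext = e;
    bm->cur_leaf = l;
    bm->cur_leaf_off = leaf_off;
    *bit = (unsigned int)(off % LEAF_BITS);
    return l;
}
```

In this model, looking up a pfn in one extent and then a pfn at the same
relative offset in another extent returns the first extent's leaf unless
the extent is also compared, which corresponds to accessing the wrong
bitmap.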