From: Andrea Righi
To: kernel-team@lists.ubuntu.com
Subject: [SRU][F][PATCH 0/5] kvm: properly tear down PV features on hibernate
Date: Thu, 20 May 2021 15:36:06 +0200

[Impact]
In LP: #1918694 we applied a fix and a workaround to solve the hibernation issues on c5.18xlarge instances.
The workaround was in the form of a SAUCE patch:

"UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"

It looks like we can replace this workaround with a proper fix by applying this patch:

http://next.patchew.org/Linux/20210414123544.1060604-1-vkuznets@redhat.com/

This is required because various PV features (Async PF, PV EOI, steal time) work through memory shared with the hypervisor. When we restore from hibernation, we must properly tear down all these features to make sure the hypervisor doesn't write to stale locations after we jump to the previously hibernated kernel. For this reason it is safe to apply this patch set to all the generic kernels as well, not just to AWS.

[Test plan]
This can be easily tested on AWS (but it should be reproducible by hibernating any KVM instance with multiple CPUs). Create a c5.18xlarge instance, run the memory stress test script (the same script that we use to stress test hibernation), trigger the hibernate event, then trigger the resume event. Repeat a couple of times and the problem is very likely to happen.

[Fix]
On the AWS kernel, replace "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot" with:

http://next.patchew.org/Linux/20210414123544.1060604-1-vkuznets@redhat.com/

For the other kernels, simply apply this patch set.

The fix has been tested extensively in the AWS infrastructure with positive results.

[Regression potential]
The new code introduced by the fix can also be executed when a CPU is put offline, so regressions could potentially show up in KVM CPU hot-plugging.

Acked-by: Guilherme G. Piccoli
Acked-by: Tim Gardner
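To make the [Impact] reasoning concrete (the hypervisor keeps writing to shared areas registered by the restore kernel even after the jump into the hibernated image), here is a toy userspace model in C. It is a sketch under stated assumptions, not the kernel code from this series: guest RAM is a 256-byte array, all vCPUs are collapsed into one, and every name and address (pv_enable, pv_disable, hypervisor_tick, simulate_resume, 0x10..0x60) is hypothetical. In a real KVM guest, registration is an MSR write (for example MSR_KVM_STEAL_TIME) and teardown clears that MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model, not kernel code: guest RAM is a small byte array and the
 * "hypervisor" remembers one registered guest-physical address (GPA) per
 * PV feature. All names and addresses below are hypothetical. */
#define GUEST_RAM_SIZE 256u
#define PV_DISABLED UINT32_MAX /* marker: feature not registered */

enum pv_feature { PV_ASYNC_PF, PV_EOI, PV_STEAL_TIME, PV_NR_FEATURES };

struct guest {
    uint8_t ram[GUEST_RAM_SIZE];
    uint32_t registered_gpa[PV_NR_FEATURES]; /* hypervisor-side state */
};

/* Register a shared area (a real guest writes the GPA to a KVM MSR). */
static void pv_enable(struct guest *g, enum pv_feature f, uint32_t gpa)
{
    assert(gpa < GUEST_RAM_SIZE);
    g->registered_gpa[f] = gpa;
}

/* Tear the feature down (a real guest clears the corresponding MSR). */
static void pv_disable(struct guest *g, enum pv_feature f)
{
    g->registered_gpa[f] = PV_DISABLED;
}

/* The hypervisor updates every registered area behind the guest's back. */
static void hypervisor_tick(struct guest *g)
{
    for (int f = 0; f < PV_NR_FEATURES; f++)
        if (g->registered_gpa[f] != PV_DISABLED)
            g->ram[g->registered_gpa[f]] = 0xAA;
}

/* Returns true if ordinary data in the resumed image was overwritten. */
static bool simulate_resume(bool teardown_before_jump)
{
    struct guest g;
    uint8_t image[GUEST_RAM_SIZE];

    /* 1. The freshly booted (restore) kernel registers its PV areas. */
    for (int f = 0; f < PV_NR_FEATURES; f++)
        g.registered_gpa[f] = PV_DISABLED;
    pv_enable(&g, PV_ASYNC_PF, 0x10);
    pv_enable(&g, PV_EOI, 0x20);
    pv_enable(&g, PV_STEAL_TIME, 0x30);

    /* 2. It copies the hibernation image over guest RAM; in the image,
     *    0x10/0x20/0x30 hold unrelated kernel data. */
    memset(image, 0x55, sizeof image);
    memcpy(g.ram, image, sizeof g.ram);

    /* 3. The teardown described in [Impact]: disable every PV feature. */
    if (teardown_before_jump)
        for (int f = 0; f < PV_NR_FEATURES; f++)
            pv_disable(&g, (enum pv_feature)f);

    /* 4. Jump to the hibernated kernel; the hypervisor may run before the
     *    resumed kernel has re-registered anything. */
    hypervisor_tick(&g);

    /* 5. The resumed kernel registers its own areas and carries on. */
    pv_enable(&g, PV_ASYNC_PF, 0x40);
    pv_enable(&g, PV_EOI, 0x50);
    pv_enable(&g, PV_STEAL_TIME, 0x60);
    hypervisor_tick(&g);

    /* Any change at the stale addresses is memory corruption. */
    return g.ram[0x10] != image[0x10] || g.ram[0x20] != image[0x20] ||
           g.ram[0x30] != image[0x30];
}
```

In this model, simulate_resume(false) reports that the stale addresses were overwritten, while simulate_resume(true) leaves them intact. A real guest keeps separate registrations per CPU, which the single-vCPU model deliberately ignores.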