From patchwork Wed Aug 17 08:51:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gerald Yang X-Patchwork-Id: 1667149 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=sHP1ghJl; dkim-atps=neutral Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4M71wy5VcDz1ygk for ; Wed, 17 Aug 2022 18:52:18 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1oOEmO-0000KY-NJ; Wed, 17 Aug 2022 08:52:00 +0000 Received: from smtp-relay-internal-0.internal ([10.131.114.225] helo=smtp-relay-internal-0.canonical.com) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1oOEmM-0000K7-Q1 for kernel-team@lists.ubuntu.com; Wed, 17 Aug 2022 08:51:58 +0000 Received: from mail-pj1-f71.google.com (mail-pj1-f71.google.com [209.85.216.71]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-0.canonical.com (Postfix) with ESMTPS id 48D2A3FB94 for ; Wed, 17 Aug 2022 08:51:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=canonical.com; s=20210705; t=1660726318; bh=K7oShyIKAW5G4C0Mk7I17kAougHj7nTj6HhUY/Oe+6U=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=sHP1ghJl9CXQbFYGdbGwz6JFBriyMb6uVy18v/ocJTnOlQ8HXwD+P+mZop42ceGnz yVMCjx9KAtL7M6v+RClfDfbog8lzOi7fpM4Jt95ItGcMshXEhPmYeX0QfcZN/Cjr4q lDH1R7th9wlOC3NtimL4Huc1LocS/ETvnMLK5oxpLV07UxWeuJ/0uC/CZMam8WtPdT ksBdBzd2NUHdjNgdpqQfZJQgjX5Sz7pOKkZXAZ0m324CVYjmlImW8VJXyXnkoNTDNI 4cZPDqlMLADHvMDRh1SGR4lyw4ThODyJJsGoc1JkR3NcxqTegxXE9AwuVicgTpZj4L 2piaXTg+MqKgg== Received: by mail-pj1-f71.google.com with SMTP id o18-20020a17090aac1200b001f3252af009so853311pjq.7 for ; Wed, 17 Aug 2022 01:51:58 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:to :from:x-gm-message-state:from:to:cc; bh=K7oShyIKAW5G4C0Mk7I17kAougHj7nTj6HhUY/Oe+6U=; b=aaOrryqe35gNj5joPEayXtaw4iUMggoTenKlGM/m1co8aZKx40dLnZuJyh4GlKO6HQ t0MistuiYgCOwrktggwLH7dXTYK51QAMQk3pRcBL4NfwOyPd9lUflq8PcbYIPydCPX7d 4F1+0A8SPWNEcTqBo1/6jhPlgfc7q99NOnDR1q1hdZ4aGl3L3j855qauEq9wYF8+HzPc XNyCHwg9mc/+tQGtQs2YUvv7cKqOExrJFqVSJXgNFB3EXrAjBx4koWVIU4ubt0floGYa Z2FrCsrV4G3d5KDu3x/P/M2g6v/hintUWKqGHOjiX33W55U0EkPe8w0vcHrzPHwCMhQv JJDw== X-Gm-Message-State: ACgBeo1YsF9MpW0an8cYc/sq2kSOy8lG8xcBLauPYKfEXJjiJZhY2tUJ FOJTZKTtd8FIKtcU1SCILou2La30vMYDfVPzRRb/JD1ebYdY+qPz9zMg+wRv34S9rW0tGWu0A0K Bulwue6kAYKj6IPMyflm/9Dd7HssCORHRUIXqpTapXA== X-Received: by 2002:a17:90b:3511:b0:1f7:3c52:4b98 with SMTP id ls17-20020a17090b351100b001f73c524b98mr2630946pjb.17.1660726315848; Wed, 17 Aug 2022 01:51:55 -0700 (PDT) X-Google-Smtp-Source: AA6agR5GdyqJUlU3R6r0vvph1rv7JS8n3Y7BpW5n24vClTr7EYjhUSyHcDJIcf/DfnnqEOPHQ7zcXw== X-Received: by 2002:a17:90b:3511:b0:1f7:3c52:4b98 with SMTP id ls17-20020a17090b351100b001f73c524b98mr2630922pjb.17.1660726315233; Wed, 17 Aug 2022 01:51:55 -0700 (PDT) Received: from localhost.localdomain (220-135-31-21.hinet-ip.hinet.net. 
[220.135.31.21]) by smtp.gmail.com with ESMTPSA id s90-20020a17090a69e300b001f522180d46sm1001033pjj.8.2022.08.17.01.51.54 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Aug 2022 01:51:54 -0700 (PDT)
From: Gerald Yang
To: kernel-team@lists.ubuntu.com
Subject: [SRU][jammy/linux-aws][kinetic/linux-aws][PATCH 00/20] UBUNTU: SAUCE: PM: Hibernate: Enable Hibernation for Xen Based Instance Types
Date: Wed, 17 Aug 2022 16:51:26 +0800
Message-Id: <20220817085150.2078055-1-gerald.yang@canonical.com>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
X-BeenThere: kernel-team@lists.ubuntu.com
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Kernel team discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: kernel-team-bounces@lists.ubuntu.com
Sender: "kernel-team"

BugLink: https://bugs.launchpad.net/bugs/1968062

SRU Justification:

[Impact]

Hibernation currently fails for all AWS Xen instance types
(c3/c4/i3/m3/m4/r3/r4/t2) with Jammy 5.15 and Kinetic 5.19 linux-aws
kernels.

When attempting to hibernate, the system gets stuck in
sync_inodes_one_sb() when processing the rootfs, fails to hibernate,
and shuts down. When you start the instance, it starts fresh, and does
not resume from the incomplete hibernation image. Networking is also
broken, and you cannot ssh in.

Upon review of the jammy/linux-aws git log, it appears that the kernel
is missing AWS hibernation enablement patches entirely. These need to
be included to get hibernation working.

[Fix]

Hibernation currently works on the Amazon Linux 2 5.15 Kernel:

https://github.com/amazonlinux/linux/tree/amazon-5.15.y/mainline

After careful review of the amazon-5.15.y/mainline branch, we have
found the below set of patches authored by Amazon AWS Hibernation team
to be minimally sufficient to get hibernation working on both Jammy
5.15 and Kinetic 5.19.

xen: Restore xen-pirqs on resume from hibernation
xen-netfront: call netif_device_attach on resume
xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
xen: restore pirqs on resume from hibernation.
block: xen-blkfront: consider new dom0 features on restore
x86: tsc: avoid system instability in hibernation
xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
Revert "xen: dont fiddle with event channel masking in suspend/resume"
PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
x86/xen: close event channels for PIRQs in system core suspend callback
xen/events: add xen_shutdown_pirqs helper function
x86/xen: save and restore steal clock
xen/time: introduce xen_{save,restore}_steal_clock
xen-netfront: add callbacks for PM suspend and hibernation support
xen-blkfront: add callbacks for PM suspend and hibernation
x86/xen: add system core suspend and resume callbacks
x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume
xenbus: add freeze/thaw/restore callbacks support
xen/manage: introduce helper function to know the on-going suspend mode
xen/manage: keep track of the on-going suspend mode

These patches will be carried as SAUCE patches, and their subjects
marked with "UBUNTU: SAUCE [aws]". Their upstream is the Amazon
Hibernation team, with the repo being the Amazon Linux 2 kernel repo.

[Testcase]

1. Log into Amazon EC2.
2. Select Launch Instance.
3. Under Instance Type, select any from (c3/c4/i3/m3/m4/r3/r4/t2).
   I suggest t2.medium.
4. Select the "Ubuntu 22.04 LTS HVM (SSD type)" AMI in the quicklaunch
   pane.
5. Select your SSH keypair.
6. In storage, select 20 GB. Go to the advanced tab, and set
   Encrypted: Yes.
7. Under Advanced Settings for the instance, set "Stop - Hibernate" to
   Enable.
8. Create the Instance. SSH in.
9. Wait 5 minutes for hibinit-agent to create /swap-hibinit swapfile
   and configure grub.
10. Start a screen session. Echo some text and then detach with
    Ctrl-a d.
11. Log out from instance.
12. In EC2, select "Instance State" > "Hibernate".
13. Wait 30 seconds to one minute. The state will go from "Stopping" to
    "Stopped".
14. Start the instance again.
15. SSH in.
16. Attempt to resume screen session with "screen -r".

If you are not able to ssh into the instance, hibernation has failed.
If ssh works and the screen session is still running, hibernation was
successful.

Alternatively, the CPC team can run their Hibernation testsuite over
Jammy and Kinetic.

We have built test kernels for Jammy and Kinetic with the patches, and
they are available in the PPA below:

https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/aws-hibernate-test

If you hibernate and resume with the test kernels, hibernation is
successful.

[Where problems could occur]

We are adding a significant amount of code to the Xen subsystem, spread
across many commits. This code has not been mainlined, and is instead
maintained out of tree by the Amazon AWS Hibernation team.

The changes target hibernation, block devices, and clock devices,
specific to those used on AWS Xen instances. Most of these patches have
been applied to Xenial, Bionic, Focal and other series for a long time,
but some patches are new for 5.15 onward.

The changes will only target linux-aws to limit regression risk to AWS
users, and any regressions will be limited to users of Xen based
instance types (c3/c4/i3/m3/m4/r3/r4/t2), covering both Xen 4.2 and
Xen 4.11.

If a regression were to occur, the instance would likely fail to
hibernate, and at worst, write an incomplete hibernation image to the
swapfile. The kernel will see this on start, and instead of resuming
from the hibernation image, will start fresh. It is unlikely to cause
any filesystem corruption on the rootfs, but any in-progress
computations at the time of hibernation could be lost.

The current broken behaviour breaks networking, and users would have to
power cycle the instance a few times before they can ssh in again.

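As an optional aid for step 9 of the testcase, the wait for hibinit-agent could be replaced by an explicit check. The following is only a minimal sketch and not part of the patch series: it assumes that the swapfile shows up in /proc/swaps under the path /swap-hibinit, and that "configure grub" results in the standard resume= and resume_offset= kernel parameters appearing on the kernel command line after the next boot. The helper names swapfile_active and resume_params are hypothetical.

```python
def swapfile_active(proc_swaps_text, path="/swap-hibinit"):
    """Return True if `path` is listed as an active swap area.

    `proc_swaps_text` is the content of /proc/swaps; the first line
    is a column header and is skipped.
    """
    for line in proc_swaps_text.splitlines()[1:]:
        fields = line.split()
        if fields and fields[0] == path:
            return True
    return False


def resume_params(cmdline_text):
    """Extract resume= and resume_offset= from a kernel command line.

    Assumption: these are the parameters written into the grub
    configuration; the cover letter only says grub is configured.
    """
    params = {}
    for token in cmdline_text.split():
        key, sep, value = token.partition("=")
        if sep and key in ("resume", "resume_offset"):
            params[key] = value
    return params
```

On the instance, the two helpers would be fed the contents of /proc/swaps and /proc/cmdline respectively. A missing swapfile or an empty result from resume_params suggests that the agent has not finished yet, so hibernating at that point would not be a meaningful test of the kernel patches.
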
Aleksei Besogonov (1):
  PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA

Anchal Agarwal (4):
  x86/xen: Introduce new function to map HYPERVISOR_shared_info on
    Resume
  Revert "xen: dont fiddle with event channel masking in
    suspend/resume"
  xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
  xen: Restore xen-pirqs on resume from hibernation

Eduardo Valentin (2):
  x86: tsc: avoid system instability in hibernation
  block: xen-blkfront: consider new dom0 features on restore

Frank van der Linden (3):
  xen: restore pirqs on resume from hibernation.
  xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
  xen-netfront: call netif_device_attach on resume

Munehisa Kamata (10):
  xen/manage: keep track of the on-going suspend mode
  xen/manage: introduce helper function to know the on-going suspend
    mode
  xenbus: add freeze/thaw/restore callbacks support
  x86/xen: add system core suspend and resume callbacks
  xen-blkfront: add callbacks for PM suspend and hibernation
  xen-netfront: add callbacks for PM suspend and hibernation support
  xen/time: introduce xen_{save,restore}_steal_clock
  x86/xen: save and restore steal clock
  xen/events: add xen_shutdown_pirqs helper function
  x86/xen: close event channels for PIRQs in system core suspend
    callback

 arch/x86/kernel/tsc.c             |  29 ++++++
 arch/x86/xen/enlighten_hvm.c      |   8 ++
 arch/x86/xen/suspend.c            |  67 +++++++++++++
 arch/x86/xen/time.c               |   3 +
 arch/x86/xen/xen-ops.h            |   2 +
 drivers/block/xen-blkfront.c      | 161 ++++++++++++++++++++++++++++--
 drivers/net/xen-netfront.c        | 104 ++++++++++++++++++-
 drivers/xen/events/events_base.c  |  30 +++++-
 drivers/xen/manage.c              |  73 ++++++++++++++
 drivers/xen/time.c                |  29 +++++-
 drivers/xen/xenbus/xenbus_probe.c |  99 +++++++++++++++---
 include/linux/irq.h               |   2 +
 include/linux/sched/clock.h       |   5 +
 include/xen/events.h              |   2 +
 include/xen/xen-ops.h             |   8 ++
 include/xen/xenbus.h              |   3 +
 kernel/irq/chip.c                 |   4 +-
 kernel/power/user.c               |   4 +
 kernel/sched/clock.c              |   4 +-
 19 files changed, 604 insertions(+), 33 deletions(-)

Acked-by: Tim Gardner
Acked-by: Marcelo Henrique Cerri