From patchwork Wed Nov 27 08:00:10 2019
X-Patchwork-Submitter: You-Sheng Yang
X-Patchwork-Id: 1201387
From: You-Sheng Yang <vicamo.yang@canonical.com>
To: kernel-team@lists.ubuntu.com
Subject: [SRU][D/OEM-OSP1-B][PATCH v2 02/20] perf/x86/kvm: Avoid unnecessary work in guest filtering
Date: Wed, 27 Nov 2019 16:00:10 +0800
Message-Id: <20191127080028.200261-3-vicamo.yang@canonical.com>
X-Mailer: git-send-email 2.24.0
In-Reply-To: <20191127080028.200261-1-vicamo.yang@canonical.com>
References: <20191127080028.200261-1-vicamo.yang@canonical.com>

From: Andi Kleen

KVM added a workaround for PEBS events leaking into guests with commit:

  26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.")

This uses the VT entry/exit list to add an extra disable of the
PEBS_ENABLE MSR.

Intel also added a fix for this issue to microcode updates on
Haswell/Broadwell/Skylake.

It turns out using the MSR entry/exit list makes VM exits
significantly slower. The list is only needed for disabling PEBS,
because the GLOBAL_CTRL change gets optimized by KVM into changing
the VMCS.

Check for the microcode updates that contain the fix for leaking
PEBS, and disable the extra entry/exit list entry for PEBS_ENABLE.
In addition, always clear the GLOBAL_CTRL bits for the PEBS counters
while running in the guest, which is enough to ensure they never fire
on the wrong side of the host/guest transition.

With the patch applied, the overhead of VM exits with the filtering
active is reduced from 8% to 4%.

The microcode fix has already been merged into the updates for future
platforms, so this patch is a one-off; a quirk is used here to detect
fixed microcode. Older platforms that have neither the microcode fix
nor a matching quirk entry still require the extra disable of the
PEBS_ENABLE MSR.

Signed-off-by: Andi Kleen
Signed-off-by: Kan Liang
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: David Ahern
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: bp@alien8.de
Link: https://lkml.kernel.org/r/1549319013-4522-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar
(cherry picked from commit 9b545c04abd4f7246a3bde040efde587abebb23c)
Signed-off-by: You-Sheng Yang <vicamo.yang@canonical.com>
---
 arch/x86/events/intel/core.c | 74 +++++++++++++++++++++++++++++++-----
 arch/x86/events/intel/ds.c   |  2 +
 arch/x86/events/perf_event.h | 15 ++++----
 3 files changed, 75 insertions(+), 16 deletions(-)
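For reviewers who have not used the x86_cpu_desc machinery before, here
is a minimal, self-contained sketch of the matching semantics behind
x86_cpu_has_min_microcode_rev(). It is an illustration with made-up
names and a hypothetical table entry, not the kernel implementation:

#include <stdio.h>

/* Simplified stand-in for struct x86_cpu_desc / INTEL_CPU_DESC(). */
struct cpu_desc {
        unsigned char family;    /* 6 for every part in isolation_ucodes[] */
        unsigned char model;
        unsigned char stepping;
        unsigned int  min_ucode; /* first microcode revision with the fix */
};

/*
 * Mirrors the semantics of x86_cpu_has_min_microcode_rev(): the running
 * CPU must match a table entry exactly on family/model/stepping AND run
 * a microcode revision >= that entry's minimum.  An all-zero terminator
 * ends the table, so unlisted parts report 0 and the caller keeps
 * pebs_no_isolation set.
 */
static int has_min_microcode_rev(const struct cpu_desc *table,
                                 unsigned char family, unsigned char model,
                                 unsigned char stepping, unsigned int ucode)
{
        const struct cpu_desc *d;

        for (d = table; d->family | d->model; d++) {
                if (d->family == family && d->model == model &&
                    d->stepping == stepping)
                        return ucode >= d->min_ucode;
        }
        return 0; /* unknown part: assume the erratum is still present */
}

int main(void)
{
        /* Hypothetical entry: family 6, model 0x9e, stepping 9, fix in 0x4e. */
        static const struct cpu_desc table[] = {
                { 6, 0x9e, 9, 0x0000004e },
                { 0, 0, 0, 0 },
        };

        /* Prints "0 1": rev 0x48 keeps the workaround, 0x4e drops it. */
        printf("%d %d\n",
               has_min_microcode_rev(table, 6, 0x9e, 9, 0x48),
               has_min_microcode_rev(table, 6, 0x9e, 9, 0x4e));
        return 0;
}

Matching is exact on stepping because microcode revisions are tracked
per stepping; the fix for one stepping of a model can ship under a
different revision number than for another, which is why the
isolation_ucodes[] table in the diff below lists each stepping
separately.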
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4978f1d1bc60..4c64fba70832 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -18,6 +18,7 @@
 #include <asm/hardirq.h>
 #include <asm/intel-family.h>
 #include <asm/apic.h>
+#include <asm/cpu_device_id.h>
 
 #include "../perf_event.h"
 
@@ -3250,16 +3251,27 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
 	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
 	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
 	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
-	/*
-	 * If PMU counter has PEBS enabled it is not enough to disable counter
-	 * on a guest entry since PEBS memory write can overshoot guest entry
-	 * and corrupt guest memory. Disabling PEBS solves the problem.
-	 */
-	arr[1].msr = MSR_IA32_PEBS_ENABLE;
-	arr[1].host = cpuc->pebs_enabled;
-	arr[1].guest = 0;
+	if (x86_pmu.flags & PMU_FL_PEBS_ALL)
+		arr[0].guest &= ~cpuc->pebs_enabled;
+	else
+		arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
+	*nr = 1;
+
+	if (x86_pmu.pebs && x86_pmu.pebs_no_isolation) {
+		/*
+		 * If PMU counter has PEBS enabled it is not enough to
+		 * disable counter on a guest entry since PEBS memory
+		 * write can overshoot guest entry and corrupt guest
+		 * memory. Disabling PEBS solves the problem.
+		 *
+		 * Don't do this if the CPU already enforces it.
+		 */
+		arr[1].msr = MSR_IA32_PEBS_ENABLE;
+		arr[1].host = cpuc->pebs_enabled;
+		arr[1].guest = 0;
+		*nr = 2;
+	}
 
-	*nr = 2;
 	return arr;
 }
 
@@ -3833,6 +3845,47 @@ static __init void intel_clovertown_quirk(void)
 	x86_pmu.pebs_constraints = NULL;
 }
 
+static const struct x86_cpu_desc isolation_ucodes[] = {
+	INTEL_CPU_DESC(INTEL_FAM6_HASWELL_CORE,		 3, 0x0000001f),
+	INTEL_CPU_DESC(INTEL_FAM6_HASWELL_ULT,		 1, 0x0000001e),
+	INTEL_CPU_DESC(INTEL_FAM6_HASWELL_GT3E,		 1, 0x00000015),
+	INTEL_CPU_DESC(INTEL_FAM6_HASWELL_X,		 2, 0x00000037),
+	INTEL_CPU_DESC(INTEL_FAM6_HASWELL_X,		 4, 0x0000000a),
+	INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_CORE,	 4, 0x00000023),
+	INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_GT3E,	 1, 0x00000014),
+	INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_XEON_D,	 2, 0x00000010),
+	INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_XEON_D,	 3, 0x07000009),
+	INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_XEON_D,	 4, 0x0f000009),
+	INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_XEON_D,	 5, 0x0e000002),
+	INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_X,		 2, 0x0b000014),
+	INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_X,		 3, 0x00000021),
+	INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_X,		 4, 0x00000000),
+	INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_MOBILE,	 3, 0x0000007c),
+	INTEL_CPU_DESC(INTEL_FAM6_SKYLAKE_DESKTOP,	 3, 0x0000007c),
+	INTEL_CPU_DESC(INTEL_FAM6_KABYLAKE_DESKTOP,	 9, 0x0000004e),
+	INTEL_CPU_DESC(INTEL_FAM6_KABYLAKE_MOBILE,	 9, 0x0000004e),
+	INTEL_CPU_DESC(INTEL_FAM6_KABYLAKE_MOBILE,	10, 0x0000004e),
+	INTEL_CPU_DESC(INTEL_FAM6_KABYLAKE_MOBILE,	11, 0x0000004e),
+	INTEL_CPU_DESC(INTEL_FAM6_KABYLAKE_MOBILE,	12, 0x0000004e),
+	INTEL_CPU_DESC(INTEL_FAM6_KABYLAKE_DESKTOP,	10, 0x0000004e),
+	INTEL_CPU_DESC(INTEL_FAM6_KABYLAKE_DESKTOP,	11, 0x0000004e),
+	INTEL_CPU_DESC(INTEL_FAM6_KABYLAKE_DESKTOP,	12, 0x0000004e),
+	INTEL_CPU_DESC(INTEL_FAM6_KABYLAKE_DESKTOP,	13, 0x0000004e),
+	{}
+};
+
+static void intel_check_pebs_isolation(void)
+{
+	x86_pmu.pebs_no_isolation = !x86_cpu_has_min_microcode_rev(isolation_ucodes);
+}
+
+static __init void intel_pebs_isolation_quirk(void)
+{
+	WARN_ON_ONCE(x86_pmu.check_microcode);
+	x86_pmu.check_microcode = intel_check_pebs_isolation;
+	intel_check_pebs_isolation();
+}
+
 static int intel_snb_pebs_broken(int cpu)
 {
 	u32 rev = UINT_MAX; /* default to broken for unknown models */
@@ -4536,6 +4589,7 @@ __init int intel_pmu_init(void)
 	case INTEL_FAM6_HASWELL_ULT:
 	case INTEL_FAM6_HASWELL_GT3E:
 		x86_add_quirk(intel_ht_bug);
+		x86_add_quirk(intel_pebs_isolation_quirk);
 		x86_pmu.late_ack = true;
 		memcpy(hw_cache_event_ids, hsw_hw_cache_event_ids, sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, hsw_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
@@ -4567,6 +4621,7 @@ __init int intel_pmu_init(void)
 	case INTEL_FAM6_BROADWELL_XEON_D:
 	case INTEL_FAM6_BROADWELL_GT3E:
 	case INTEL_FAM6_BROADWELL_X:
+		x86_add_quirk(intel_pebs_isolation_quirk);
 		x86_pmu.late_ack = true;
 		memcpy(hw_cache_event_ids, hsw_hw_cache_event_ids, sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, hsw_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
@@ -4629,6 +4684,7 @@ __init int intel_pmu_init(void)
 	case INTEL_FAM6_SKYLAKE_X:
 	case INTEL_FAM6_KABYLAKE_MOBILE:
 	case INTEL_FAM6_KABYLAKE_DESKTOP:
+		x86_add_quirk(intel_pebs_isolation_quirk);
 		x86_pmu.late_ack = true;
 		memcpy(hw_cache_event_ids, skl_hw_cache_event_ids, sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, skl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 18cb68d3e003..b71adf603b86 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1628,6 +1628,8 @@ void __init intel_ds_init(void)
 	x86_pmu.bts = boot_cpu_has(X86_FEATURE_BTS);
 	x86_pmu.pebs = boot_cpu_has(X86_FEATURE_PEBS);
 	x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;
+	if (x86_pmu.version <= 4)
+		x86_pmu.pebs_no_isolation = 1;
 	if (x86_pmu.pebs) {
 		char pebs_type = x86_pmu.intel_cap.pebs_trap ? '+' : '-';
 		int format = x86_pmu.intel_cap.pebs_format;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index b68ab65454ff..1e98a42b560a 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -606,13 +606,14 @@ struct x86_pmu {
 	/*
 	 * Intel DebugStore bits
 	 */
-	unsigned int	bts			:1,
-			bts_active		:1,
-			pebs			:1,
-			pebs_active		:1,
-			pebs_broken		:1,
-			pebs_prec_dist		:1,
-			pebs_no_tlb		:1;
+	unsigned int	bts			:1,
+			bts_active		:1,
+			pebs			:1,
+			pebs_active		:1,
+			pebs_broken		:1,
+			pebs_prec_dist		:1,
+			pebs_no_tlb		:1,
+			pebs_no_isolation	:1;
 	int		pebs_record_size;
 	int		pebs_buffer_size;
 	void		(*drain_pebs)(struct pt_regs *regs);
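As background for why dropping the PEBS_ENABLE entry helps: on the
VM-entry path KVM consumes the array returned by intel_guest_get_msrs()
and decides, per entry, whether the host/guest switch can be expressed
through VMCS controls or must go onto the atomic MSR load/store lists
that the CPU walks on every VM entry/exit. The toy model below captures
that decision; it is a standalone illustration with made-up MSR values,
not the KVM implementation:

#include <stdio.h>
#include <stdint.h>

#define MSR_CORE_PERF_GLOBAL_CTRL	0x38f
#define MSR_IA32_PEBS_ENABLE		0x3f1

struct guest_switch_msr {
        unsigned int msr;
        uint64_t     host, guest;
};

/*
 * Models the per-entry choice KVM makes: GLOBAL_CTRL is special-cased
 * onto the cheap VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL VMCS
 * controls, while any other MSR lands on the atomic MSR lists, whose
 * per-exit walk is the overhead this patch avoids.
 */
static void classify(const struct guest_switch_msr *msrs, int nr)
{
        for (int i = 0; i < nr; i++) {
                if (msrs[i].host == msrs[i].guest)
                        printf("msr 0x%x: no switch needed\n", msrs[i].msr);
                else if (msrs[i].msr == MSR_CORE_PERF_GLOBAL_CTRL)
                        printf("msr 0x%x: via VMCS controls (cheap)\n",
                               msrs[i].msr);
                else
                        printf("msr 0x%x: via atomic MSR list (slow)\n",
                               msrs[i].msr);
        }
}

int main(void)
{
        /* Fixed microcode: intel_guest_get_msrs() returns one entry. */
        struct guest_switch_msr fixed[] = {
                { MSR_CORE_PERF_GLOBAL_CTRL, 0x3, 0x1 },
        };
        /* Unfixed microcode: PEBS_ENABLE still rides the slow list. */
        struct guest_switch_msr unfixed[] = {
                { MSR_CORE_PERF_GLOBAL_CTRL, 0x3, 0x1 },
                { MSR_IA32_PEBS_ENABLE,      0x2, 0x0 },
        };

        classify(fixed, 1);
        classify(unfixed, 2);
        return 0;
}

With fixed microcode the single GLOBAL_CTRL entry is folded into the
VMCS and the atomic lists stay empty, which is where the reduction in
VM-exit overhead from 8% to 4% comes from.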