Message ID: 20170724190757.11278-17-brijesh.singh@amd.com (mailing list archive)
State:      Not Applicable
On Mon, Jul 24, 2017 at 02:07:56PM -0500, Brijesh Singh wrote:
> Some KVM specific MSR's (steal-time, asyncpf, avic_eio) allocates per-CPU

MSRs

> variable at compile time and share its physical address with hypervisor.

That sentence needs changing - the MSRs don't allocate - for them gets
allocated.

> It presents a challege when SEV is active in guest OS, when SEV is active,
> the guest memory is encrypted with guest key hence hypervisor will not
> able to modify the guest memory. When SEV is active, we need to clear the
> encryption attribute (aka C-bit) of shared physical addresses so that both
> guest and hypervisor can access the data.

This whole paragraph needs rewriting.

> To solve this problem, I have tried these three options:
>
> 1) Convert the static per-CPU to dynamic per-CPU allocation and when SEV
> is detected clear the C-bit from the page table. But while doing so I
> found that per-CPU dynamic allocator was not ready when kvm_guest_cpu_init
> was called.
>
> 2) Since the C-bit works on PAGE_SIZE hence add some extra padding to
> 'struct kvm-steal-time' to make it PAGE_SIZE and then at runtime

"to make it PAGE_SIZE"?

I know what it means but it reads strange and needs more restraint when
rewriting it. :)

> clear the encryption attribute of the full PAGE. The downside of this -
> we need to modify structure which may break the compatibility.
>
> 3) Define a new per-CPU section (.data..percpu.hv_shared) which will be
> used to hold the compile time shared per-CPU variables. When SEV is
> detected we map this section without C-bit.
>
> This patch implements #3.

From Documentation/process/submitting-patches.rst:

 "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
  instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
  to do frotz", as if you are giving orders to the codebase to change
  its behaviour."

Also, never say "This patch" in a commit message of a patch. It is
tautologically useless.

> It introduces a new DEFINE_PER_CPU_HV_SHAHRED

There's no DEFINE_PER_CPU_HV_SHAHRED. Typo.

> macro to create a compile time per-CPU variable. When SEV is detected we
> clear the C-bit from the shared per-CPU variable.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kernel/kvm.c             | 46 ++++++++++++++++++++++++++++++++++++---
>  include/asm-generic/vmlinux.lds.h |  3 +++
>  include/linux/percpu-defs.h       | 12 ++++++++++
>  3 files changed, 58 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 71c17a5..1f6fec8 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg)
>
>  early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
>
> -static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
> -static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
> +static DEFINE_PER_CPU_HV_SHARED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
> +static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, steal_time) __aligned(64);
>  static int has_steal_clock = 0;
>
>  /*
> @@ -303,7 +303,7 @@ static void kvm_register_steal_time(void)
>  		cpu, (unsigned long long) slow_virt_to_phys(st));
>  }
>
> -static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
> +static DEFINE_PER_CPU_HV_SHARED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
>
>  static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
>  {
> @@ -319,11 +319,51 @@ static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
>  		apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
>  }
>
> +/* NOTE: function is marked as __ref because it is used by __init functions */

No need for that comment.

What should you look into is why do you need to call the early versions:

" * producing a warning (of course, no warning does not mean code is
 * correct, so optimally document why the __ref is needed and why it's OK)."

And we do have the normal set_memory_decrypted() etc helpers so why
aren't we using those?

If you need to use the early ones too, then you probably need to
differentiate this in the callers by passing a "bool early", which calls
the proper flavor.

> +static int __ref kvm_map_hv_shared_decrypted(void)
> +{
> +	static int once, ret;
> +	int cpu;
> +
> +	if (once)
> +		return ret;

So this function gets called per-CPU but you need to do this ugly "once"
thing - i.e., global function called in a per-CPU context.

Why can't you do that mapping only on the current CPU and then
when that function is called on the next CPU, it will do the same thing
on that next CPU?

> +	/*
> +	 * Iterate through all possible CPU's and clear the C-bit from
> +	 * percpu variables.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		struct kvm_vcpu_pv_apf_data *apf;
> +		unsigned long pa;
> +
> +		apf = &per_cpu(apf_reason, cpu);
> +		pa = slow_virt_to_phys(apf);
> +		sme_early_decrypt(pa & PAGE_MASK, PAGE_SIZE);
> +		ret = early_set_memory_decrypted(pa, PAGE_SIZE);
> +		if (ret)
> +			break;
> +	}
> +
> +	once = 1;
> +	return ret;
> +}
> +
>  static void kvm_guest_cpu_init(void)
>  {
>  	if (!kvm_para_available())
>  		return;
>
> +	/*
> +	 * When SEV is active, map the shared percpu as unencrypted so that

		... map the shared percpu area unencrypted ...

> +	 * both guest and hypervsior can access the data.
> +	 */
> +	if (sev_active()) {
> +		if (kvm_map_hv_shared_decrypted()) {
> +			printk(KERN_ERR "Failed to map percpu as unencrypted\n");
> +			return;
> +		}
> +	}
> +
>  	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
>  		u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index da0be9a..52854cf 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -783,6 +783,9 @@
>  	. = ALIGN(cacheline);						\
>  	*(.data..percpu)						\
>  	*(.data..percpu..shared_aligned)				\
> +	. = ALIGN(PAGE_SIZE);						\
> +	*(.data..percpu..hv_shared)					\
> +	. = ALIGN(PAGE_SIZE);						\
>  	VMLINUX_SYMBOL(__per_cpu_end) = .;

Yeah, no, you can't do that. That's adding this section unconditionally
on *every* arch. You need to do some ifdeffery like it is done at the
beginning of that file and have this only on the arch which supports SEV.
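[A minimal sketch of the "bool early" dispatch suggested above. The helper
name kvm_decrypt_shared_page() is hypothetical, and the signatures follow
the ones used in this thread: early_set_memory_decrypted() taking a
physical address and size, set_memory_decrypted() taking a virtual address
and a page count.]

/*
 * Sketch only, not the patch's code: let callers pick the proper
 * flavor via a "bool early" argument, per the review above. __ref is
 * kept because sme_early_decrypt() lives in an init section.
 */
static int __ref kvm_decrypt_shared_page(void *ptr, bool early)
{
	unsigned long pa = slow_virt_to_phys(ptr);

	if (early) {
		/* Early boot: decrypt the contents in place, then clear the C-bit. */
		sme_early_decrypt(pa & PAGE_MASK, PAGE_SIZE);
		return early_set_memory_decrypted(pa, PAGE_SIZE);
	}

	/* After boot: the regular helper does its own TLB/cache maintenance. */
	return set_memory_decrypted((unsigned long)ptr, 1);
}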
Hi Boris,

On 08/29/2017 05:22 AM, Borislav Petkov wrote:

[...]

> On Mon, Jul 24, 2017 at 02:07:56PM -0500, Brijesh Singh wrote:
>> Some KVM specific MSR's (steal-time, asyncpf, avic_eio) allocates per-CPU
>
> MSRs
>
>> variable at compile time and share its physical address with hypervisor.
>
> That sentence needs changing - the MSRs don't allocate - for them gets
> allocated.
>
>> It presents a challege when SEV is active in guest OS, when SEV is active,
>> the guest memory is encrypted with guest key hence hypervisor will not
>> able to modify the guest memory. When SEV is active, we need to clear the
>> encryption attribute (aka C-bit) of shared physical addresses so that both
>> guest and hypervisor can access the data.
>
> This whole paragraph needs rewriting.

I will improve the commit message in next rev.

[...]

>> +/* NOTE: function is marked as __ref because it is used by __init functions */
>
> No need for that comment.
>
> What should you look into is why do you need to call the early versions:
>
> " * producing a warning (of course, no warning does not mean code is
>  * correct, so optimally document why the __ref is needed and why it's OK)."
>
> And we do have the normal set_memory_decrypted() etc helpers so why
> aren't we using those?

Since kvm_guest_init() is called early in the boot process, we will not be
able to use the set_memory_decrypted() function. IIRC, if we try calling
set_memory_decrypted() early then we will hit a BUG_ON [1] -- mainly when
it tries to flush the caches.

[1] http://elixir.free-electrons.com/linux/latest/source/arch/x86/mm/pageattr.c#L167

> If you need to use the early ones too, then you probably need to
> differentiate this in the callers by passing a "bool early", which calls
> the proper flavor.

Sure, I can rearrange the code to make it more readable and use a "bool
early" parameter to differentiate it.

>> +static int __ref kvm_map_hv_shared_decrypted(void)
>> +{
>> +	static int once, ret;
>> +	int cpu;
>> +
>> +	if (once)
>> +		return ret;
>
> So this function gets called per-CPU but you need to do this ugly "once"
> thing - i.e., global function called in a per-CPU context.
>
> Why can't you do that mapping only on the current CPU and then
> when that function is called on the next CPU, it will do the same thing
> on that next CPU?

Yes, it can be done, but I remember running into issues during CPU
hotplug. The patch uses early_set_memory_decrypted() -- which calls
kernel_physical_mapping_init() to split the large pages into smaller ones.
IIRC, the API did not work after the system had successfully booted. After
the system is booted we must use set_memory_decrypted().

I was trying to avoid mixing early and no-early set_memory_decrypted() but if
feedback is: use early_set_memory_decrypted() only if its required otherwise
use set_memory_decrypted() then I can improve the logic in next rev. thanks

[...]

>> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
>> index da0be9a..52854cf 100644
>> --- a/include/asm-generic/vmlinux.lds.h
>> +++ b/include/asm-generic/vmlinux.lds.h
>> @@ -783,6 +783,9 @@
>>  	. = ALIGN(cacheline);						\
>>  	*(.data..percpu)						\
>>  	*(.data..percpu..shared_aligned)				\
>> +	. = ALIGN(PAGE_SIZE);						\
>> +	*(.data..percpu..hv_shared)					\
>> +	. = ALIGN(PAGE_SIZE);						\
>>  	VMLINUX_SYMBOL(__per_cpu_end) = .;
>
> Yeah, no, you can't do that. That's adding this section unconditionally
> on *every* arch.
> You need to do some ifdeffery like it is done at the
> beginning of that file and have this only on the arch which supports SEV.

Will do. thanks

-Brijesh
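[One possible shape for that ifdeffery, as a sketch: the
CONFIG_AMD_MEM_ENCRYPT guard and the PERCPU_HV_SHARED_SECTION macro name
are assumptions; only the section name comes from the patch. The percpu
input-section list would then reference the macro instead of emitting the
section unconditionally on every arch.]

/* Sketch only: keep the section out of the layout on non-SEV arches. */
#ifdef CONFIG_AMD_MEM_ENCRYPT
#define PERCPU_HV_SHARED_SECTION					\
	. = ALIGN(PAGE_SIZE);						\
	*(.data..percpu..hv_shared)					\
	. = ALIGN(PAGE_SIZE);
#else
#define PERCPU_HV_SHARED_SECTION
#endif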
On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
> I was trying to avoid mixing early and no-early set_memory_decrypted() but if
> feedback is: use early_set_memory_decrypted() only if its required otherwise
> use set_memory_decrypted() then I can improve the logic in next rev. thanks

Yes, I think you should use the early versions when you're, well,
*early* :-) But get rid of that for_each_possible_cpu() and do it only
on the current CPU, as this is a per-CPU path anyway. If you need to
do it on *every* CPU and very early, then you need a separate function
which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.

Thx.
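[kvm.c of this era already overrides smp_prepare_boot_cpu(); a sketch of
the suggested placement follows. The exact surrounding calls are
illustrative and may differ in the kernel version at hand.]

/* Sketch: do the one-time decryption in the pre-SMP boot path instead
 * of latching with a static 'once' inside the per-CPU init path. */
static void __init kvm_smp_prepare_boot_cpu(void)
{
	if (sev_active())
		kvm_map_hv_shared_decrypted();	/* runs exactly once, pre-SMP */

	kvm_guest_cpu_init();
	native_smp_prepare_boot_cpu();
	kvm_spinlock_init();
}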
Hi Boris,

On 08/30/2017 12:46 PM, Borislav Petkov wrote:
> On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
>> I was trying to avoid mixing early and no-early set_memory_decrypted() but if
>> feedback is: use early_set_memory_decrypted() only if its required otherwise
>> use set_memory_decrypted() then I can improve the logic in next rev. thanks
>
> Yes, I think you should use the early versions when you're, well,
> *early* :-) But get rid of that for_each_possible_cpu() and do it only
> on the current CPU, as this is a per-CPU path anyway. If you need to
> do it on *every* CPU and very early, then you need a separate function
> which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.

I am trying to implement your feedback and now remember why I choose to
use early_set_memory_decrypted() and for_each_possible_cpu loop. These
percpu variables are static. Hence before clearing the C-bit we must
perform the in-place decryption so that original assignment is preserved
after we change the C-bit. Tom's SME patch [1] added sme_early_decrypt()
-- which can be used to perform the in-place decryption but we do not have
similar routine for non-early cases. In order to address your feedback,
we have to add similar functions. So far, we have not seen the need for
having such functions except this cases. The approach we have right now
works just fine and not sure if its worth adding new functions.

Thoughts ?

[1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of
memory

-Brijesh
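[To spell out the ordering constraint Brijesh describes: the initializer
of a static percpu variable is written through an encrypted mapping, so
flipping the C-bit alone would leave ciphertext behind the new mapping.
The two calls below are the ones the patch already makes; the comments
are editorial.]

/* 1) Rewrite the page's contents without encryption, preserving the
 *    compile-time initial values ... */
sme_early_decrypt(pa & PAGE_MASK, PAGE_SIZE);
/* 2) ... and only then clear the C-bit in the page tables, so both
 *    guest and hypervisor read the same plaintext. */
ret = early_set_memory_decrypted(pa, PAGE_SIZE);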
On Fri, Sep 1, 2017 at 3:52 PM, Brijesh Singh <brijesh.singh@amd.com> wrote:
> Hi Boris,
>
> On 08/30/2017 12:46 PM, Borislav Petkov wrote:
>>
>> On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
>>>
>>> I was trying to avoid mixing early and no-early set_memory_decrypted()
>>> but if
>>> feedback is: use early_set_memory_decrypted() only if its required
>>> otherwise
>>> use set_memory_decrypted() then I can improve the logic in next rev.
>>> thanks
>>
>> Yes, I think you should use the early versions when you're, well,
>> *early* :-) But get rid of that for_each_possible_cpu() and do it only
>> on the current CPU, as this is a per-CPU path anyway. If you need to
>> do it on *every* CPU and very early, then you need a separate function
>> which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.
>>
>
> I am trying to implement your feedback and now remember why I choose to
> use early_set_memory_decrypted() and for_each_possible_cpu loop. These
> percpu variables are static. Hence before clearing the C-bit we must
> perform the in-place decryption so that original assignment is preserved
> after we change the C-bit. Tom's SME patch [1] added sme_early_decrypt()
> -- which can be used to perform the in-place decryption but we do not have
> similar routine for non-early cases. In order to address your feedback,
> we have to add similar functions. So far, we have not seen the need for
> having such functions except this cases. The approach we have right now
> works just fine and not sure if its worth adding new functions.
>
> Thoughts ?
>
> [1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of
> memory

Shouldn't this be called DEFINE_PER_CPU_UNENCRYPTED? ISTM the "HV
shared" bit is incidental.
On 9/1/17 10:21 PM, Andy Lutomirski wrote:
> On Fri, Sep 1, 2017 at 3:52 PM, Brijesh Singh <brijesh.singh@amd.com> wrote:
>> Hi Boris,
>>
>> On 08/30/2017 12:46 PM, Borislav Petkov wrote:
>>> On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
>>>> I was trying to avoid mixing early and no-early set_memory_decrypted()
>>>> but if
>>>> feedback is: use early_set_memory_decrypted() only if its required
>>>> otherwise
>>>> use set_memory_decrypted() then I can improve the logic in next rev.
>>>> thanks
>>>
>>> Yes, I think you should use the early versions when you're, well,
>>> *early* :-) But get rid of that for_each_possible_cpu() and do it only
>>> on the current CPU, as this is a per-CPU path anyway. If you need to
>>> do it on *every* CPU and very early, then you need a separate function
>>> which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.
>>>
>> I am trying to implement your feedback and now remember why I choose to
>> use early_set_memory_decrypted() and for_each_possible_cpu loop. These
>> percpu variables are static. Hence before clearing the C-bit we must
>> perform the in-place decryption so that original assignment is preserved
>> after we change the C-bit. Tom's SME patch [1] added sme_early_decrypt()
>> -- which can be used to perform the in-place decryption but we do not have
>> similar routine for non-early cases. In order to address your feedback,
>> we have to add similar functions. So far, we have not seen the need for
>> having such functions except this cases. The approach we have right now
>> works just fine and not sure if its worth adding new functions.
>>
>> Thoughts ?
>>
>> [1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of
>> memory
>
> Shouldn't this be called DEFINE_PER_CPU_UNENCRYPTED? ISTM the "HV
> shared" bit is incidental.

Thanks for the suggestion, we could call it DEFINE_PER_CPU_UNENCRYPTED.
I will use it in next rev.

-Brijesh
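[A sketch of the rename: it simply mirrors the patch's
DEFINE_PER_CPU_HV_SHARED definitions under the name Andy suggests; the
"..unencrypted" section name is an assumption, not merged code.]

#define DECLARE_PER_CPU_UNENCRYPTED(type, name)				\
	DECLARE_PER_CPU_SECTION(type, name, "..unencrypted")

#define DEFINE_PER_CPU_UNENCRYPTED(type, name)				\
	DEFINE_PER_CPU_SECTION(type, name, "..unencrypted")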
On Fri, Sep 01, 2017 at 05:52:13PM -0500, Brijesh Singh wrote:
> So far, we have not seen the need for having such functions except
> this cases. The approach we have right now works just fine and not
> sure if its worth adding new functions.

Then put the call to kvm_map_hv_shared_decrypted() into
kvm_smp_prepare_boot_cpu() to denote that you're executing this whole
stuff only once during guest init.

Now you're doing additional jumping-through-hoops with that once static
var just so you can force something which needs to execute only once but
gets called in a per-CPU path.

See what I mean?

> Thoughts ?
>
> [1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of memory

Add

[core]
	abbrev = 12

to the core section of your .gitconfig.
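[For context: core.abbrev controls the length of git's abbreviated hashes,
so commit citations come out as '<12-char sha> ("subject")'. A one-off
equivalent, using the thread's own truncated hash as the example:]

# same effect as the .gitconfig snippet above
git config --global core.abbrev 12
# the usual form for citing a commit in a kernel commit message
git log -1 --pretty='%h ("%s")' 7f8b7e7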
On 9/4/17 12:05 PM, Borislav Petkov wrote:
> On Fri, Sep 01, 2017 at 05:52:13PM -0500, Brijesh Singh wrote:
>> So far, we have not seen the need for having such functions except
>> this cases. The approach we have right now works just fine and not
>> sure if its worth adding new functions.
>
> Then put the call to kvm_map_hv_shared_decrypted() into
> kvm_smp_prepare_boot_cpu() to denote that you're executing this whole
> stuff only once during guest init.
>
> Now you're doing additional jumping-through-hoops with that once static
> var just so you can force something which needs to execute only once but
> gets called in a per-CPU path.
>
> See what I mean?

Yes, I see your point. I will address this issue in next rev.

-Brijesh
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 71c17a5..1f6fec8 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg)
 
 early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
 
-static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
-static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
+static DEFINE_PER_CPU_HV_SHARED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
+static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, steal_time) __aligned(64);
 static int has_steal_clock = 0;
 
 /*
@@ -303,7 +303,7 @@ static void kvm_register_steal_time(void)
 		cpu, (unsigned long long) slow_virt_to_phys(st));
 }
 
-static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
+static DEFINE_PER_CPU_HV_SHARED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
 
 static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 {
@@ -319,11 +319,51 @@ static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 	apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
 }
 
+/* NOTE: function is marked as __ref because it is used by __init functions */
+static int __ref kvm_map_hv_shared_decrypted(void)
+{
+	static int once, ret;
+	int cpu;
+
+	if (once)
+		return ret;
+
+	/*
+	 * Iterate through all possible CPU's and clear the C-bit from
+	 * percpu variables.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct kvm_vcpu_pv_apf_data *apf;
+		unsigned long pa;
+
+		apf = &per_cpu(apf_reason, cpu);
+		pa = slow_virt_to_phys(apf);
+		sme_early_decrypt(pa & PAGE_MASK, PAGE_SIZE);
+		ret = early_set_memory_decrypted(pa, PAGE_SIZE);
+		if (ret)
+			break;
+	}
+
+	once = 1;
+	return ret;
+}
+
 static void kvm_guest_cpu_init(void)
 {
 	if (!kvm_para_available())
 		return;
 
+	/*
+	 * When SEV is active, map the shared percpu as unencrypted so that
+	 * both guest and hypervsior can access the data.
+	 */
+	if (sev_active()) {
+		if (kvm_map_hv_shared_decrypted()) {
+			printk(KERN_ERR "Failed to map percpu as unencrypted\n");
+			return;
+		}
+	}
+
 	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
 		u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
 
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index da0be9a..52854cf 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -783,6 +783,9 @@
 	. = ALIGN(cacheline);						\
 	*(.data..percpu)						\
 	*(.data..percpu..shared_aligned)				\
+	. = ALIGN(PAGE_SIZE);						\
+	*(.data..percpu..hv_shared)					\
+	. = ALIGN(PAGE_SIZE);						\
 	VMLINUX_SYMBOL(__per_cpu_end) = .;
 
 /**
diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h
index 8f16299..f74b0c3 100644
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -173,6 +173,18 @@
 	DEFINE_PER_CPU_SECTION(type, name, "..read_mostly")
 
 /*
+ * Declaration/definition used for per-CPU variables that must be shared
+ * between hypervisor and guest OS.
+ */
+#ifdef CONFIG_VIRTUALIZATION
+#define DECLARE_PER_CPU_HV_SHARED(type, name)				\
+	DECLARE_PER_CPU_SECTION(type, name, "..hv_shared")
+
+#define DEFINE_PER_CPU_HV_SHARED(type, name)				\
+	DEFINE_PER_CPU_SECTION(type, name, "..hv_shared")
+#endif
+
+/*
  * Intermodule exports for per-CPU variables. sparse forgets about
  * address space across EXPORT_SYMBOL(), change EXPORT_SYMBOL() to
  * noop if __CHECKER__.
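[A minimal usage sketch of the new macro. The variable and function names
are illustrative, patterned on how kvm.c registers steal_time with the
hypervisor; the MSR write mirrors kvm_register_steal_time().]

/* Illustrative only -- not part of the patch. */
static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, example_st) __aligned(64);

static void example_register_steal_time(void)
{
	/* The address handed to the host points into .data..percpu..hv_shared,
	 * so kvm_map_hv_shared_decrypted() can clear its C-bit under SEV. */
	u64 pa = slow_virt_to_phys(this_cpu_ptr(&example_st));

	wrmsrl(MSR_KVM_STEAL_TIME, pa | KVM_MSR_ENABLED);
}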
Some KVM specific MSR's (steal-time, asyncpf, avic_eio) allocates per-CPU
variable at compile time and share its physical address with hypervisor.
It presents a challege when SEV is active in guest OS, when SEV is active,
the guest memory is encrypted with guest key hence hypervisor will not
able to modify the guest memory. When SEV is active, we need to clear the
encryption attribute (aka C-bit) of shared physical addresses so that both
guest and hypervisor can access the data.

To solve this problem, I have tried these three options:

1) Convert the static per-CPU to dynamic per-CPU allocation and when SEV
is detected clear the C-bit from the page table. But while doing so I
found that per-CPU dynamic allocator was not ready when kvm_guest_cpu_init
was called.

2) Since the C-bit works on PAGE_SIZE hence add some extra padding to
'struct kvm-steal-time' to make it PAGE_SIZE and then at runtime clear
the encryption attribute of the full PAGE. The downside of this - we
need to modify structure which may break the compatibility.

3) Define a new per-CPU section (.data..percpu.hv_shared) which will be
used to hold the compile time shared per-CPU variables. When SEV is
detected we map this section without C-bit.

This patch implements #3. It introduces a new DEFINE_PER_CPU_HV_SHAHRED
macro to create a compile time per-CPU variable. When SEV is detected we
clear the C-bit from the shared per-CPU variable.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/kvm.c             | 46 ++++++++++++++++++++++++++++++++++++---
 include/asm-generic/vmlinux.lds.h |  3 +++
 include/linux/percpu-defs.h       | 12 ++++++++++
 3 files changed, 58 insertions(+), 3 deletions(-)