diff mbox

[18/26] KVM: PPC: KVM PV guest stubs

Message ID 1277508314-915-19-git-send-email-agraf@suse.de (mailing list archive)
State Not Applicable
Headers show

Commit Message

Alexander Graf June 25, 2010, 11:25 p.m. UTC
We will soon start and replace instructions from the text section with
other, paravirtualized versions. To ease the readability of those patches
I split out the generic looping and magic page mapping code out.

This patch still only contains stubs. But at least it loops through the
text section :).

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/kvm.c |   59 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 59 insertions(+), 0 deletions(-)

Comments

Avi Kivity June 27, 2010, 8:28 a.m. UTC | #1
On 06/26/2010 02:25 AM, Alexander Graf wrote:
> We will soon start and replace instructions from the text section with
> other, paravirtualized versions. To ease the readability of those patches
> I split out the generic looping and magic page mapping code out.
>
> This patch still only contains stubs. But at least it loops through the
> text section :).
>
>
> +
> +static void kvm_check_ins(u32 *inst)
> +{
> +	u32 _inst = *inst;
> +	u32 inst_no_rt = _inst&  ~KVM_MASK_RT;
> +	u32 inst_rt = _inst&  KVM_MASK_RT;
> +
> +	switch (inst_no_rt) {
> +	}
> +
> +	switch (_inst) {
> +	}
> +
> +	flush_icache_range((ulong)inst, (ulong)inst + 4);
> +}
>    

Shouldn't we flush only if we patched something?

> +
> +static void kvm_use_magic_page(void)
> +{
> +	u32 *p;
> +	u32 *start, *end;
> +
> +	/* Tell the host to map the magic page to -4096 on all CPUs */
> +
> +	on_each_cpu(kvm_map_magic_page, NULL, 1);
> +
> +	/* Now loop through all code and find instructions */
> +
> +	start = (void*)_stext;
> +	end = (void*)_etext;
> +
> +	for (p = start; p<  end; p++)
> +		kvm_check_ins(p);
> +}
> +
>    

Or, flush the entire thing here.
Alexander Graf June 27, 2010, 9:47 a.m. UTC | #2
Am 27.06.2010 um 10:28 schrieb Avi Kivity <avi@redhat.com>:

> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>> We will soon start and replace instructions from the text section  
>> with
>> other, paravirtualized versions. To ease the readability of those  
>> patches
>> I split out the generic looping and magic page mapping code out.
>>
>> This patch still only contains stubs. But at least it loops through  
>> the
>> text section :).
>>
>>
>> +
>> +static void kvm_check_ins(u32 *inst)
>> +{
>> +    u32 _inst = *inst;
>> +    u32 inst_no_rt = _inst&  ~KVM_MASK_RT;
>> +    u32 inst_rt = _inst&  KVM_MASK_RT;
>> +
>> +    switch (inst_no_rt) {
>> +    }
>> +
>> +    switch (_inst) {
>> +    }
>> +
>> +    flush_icache_range((ulong)inst, (ulong)inst + 4);
>> +}
>>
>
> Shouldn't we flush only if we patched something?

We introduce the patching in the next patches. This is only a  
preparation stub.

>
>> +
>> +static void kvm_use_magic_page(void)
>> +{
>> +    u32 *p;
>> +    u32 *start, *end;
>> +
>> +    /* Tell the host to map the magic page to -4096 on all CPUs */
>> +
>> +    on_each_cpu(kvm_map_magic_page, NULL, 1);
>> +
>> +    /* Now loop through all code and find instructions */
>> +
>> +    start = (void*)_stext;
>> +    end = (void*)_etext;
>> +
>> +    for (p = start; p<  end; p++)
>> +        kvm_check_ins(p);
>> +}
>> +
>>
>
> Or, flush the entire thing here.

I did that at first. It breaks. During the patching we may take  
interrupts (pahe faults for example) that contain just patched  
instructions. And really, hell breaks loose if we don't flush it  
immediately :). I was hoping at first a 32 bit replace would be atomic  
in cache, but the cpu tried to execute invalid instructions, so it  
must have gotten some intermediate state.

Alex
Avi Kivity June 27, 2010, 10:16 a.m. UTC | #3
On 06/27/2010 12:47 PM, Alexander Graf wrote:
>
> Am 27.06.2010 um 10:28 schrieb Avi Kivity <avi@redhat.com>:
>
>> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>>> We will soon start and replace instructions from the text section with
>>> other, paravirtualized versions. To ease the readability of those 
>>> patches
>>> I split out the generic looping and magic page mapping code out.
>>>
>>> This patch still only contains stubs. But at least it loops through the
>>> text section :).
>>>
>>>
>>> +
>>> +static void kvm_check_ins(u32 *inst)
>>> +{
>>> +    u32 _inst = *inst;
>>> +    u32 inst_no_rt = _inst&  ~KVM_MASK_RT;
>>> +    u32 inst_rt = _inst&  KVM_MASK_RT;
>>> +
>>> +    switch (inst_no_rt) {
>>> +    }
>>> +
>>> +    switch (_inst) {
>>> +    }
>>> +
>>> +    flush_icache_range((ulong)inst, (ulong)inst + 4);
>>> +}
>>>
>>
>> Shouldn't we flush only if we patched something?
>
> We introduce the patching in the next patches. This is only a 
> preparation stub.

Well, unless I missed something, this remains unconditional after all 
the patches.

A helper patch(pc, replacement) could patch and flush in one go.

>
>>
>>> +
>>> +static void kvm_use_magic_page(void)
>>> +{
>>> +    u32 *p;
>>> +    u32 *start, *end;
>>> +
>>> +    /* Tell the host to map the magic page to -4096 on all CPUs */
>>> +
>>> +    on_each_cpu(kvm_map_magic_page, NULL, 1);
>>> +
>>> +    /* Now loop through all code and find instructions */
>>> +
>>> +    start = (void*)_stext;
>>> +    end = (void*)_etext;
>>> +
>>> +    for (p = start; p<  end; p++)
>>> +        kvm_check_ins(p);
>>> +}
>>> +
>>>
>>
>> Or, flush the entire thing here.
>
> I did that at first. It breaks. During the patching we may take 
> interrupts (pahe faults for example) that contain just patched 
> instructions. And really, hell breaks loose if we don't flush it 
> immediately :). I was hoping at first a 32 bit replace would be atomic 
> in cache, but the cpu tried to execute invalid instructions, so it 
> must have gotten some intermediate state.

Surprising.  Maybe you need a flush after writing to the out-of-line code?
Alexander Graf June 27, 2010, 10:38 a.m. UTC | #4
Am 27.06.2010 um 12:16 schrieb Avi Kivity <avi@redhat.com>:

> On 06/27/2010 12:47 PM, Alexander Graf wrote:
>>
>> Am 27.06.2010 um 10:28 schrieb Avi Kivity <avi@redhat.com>:
>>
>>> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>>>> We will soon start and replace instructions from the text section  
>>>> with
>>>> other, paravirtualized versions. To ease the readability of those  
>>>> patches
>>>> I split out the generic looping and magic page mapping code out.
>>>>
>>>> This patch still only contains stubs. But at least it loops  
>>>> through the
>>>> text section :).
>>>>
>>>>
>>>> +
>>>> +static void kvm_check_ins(u32 *inst)
>>>> +{
>>>> +    u32 _inst = *inst;
>>>> +    u32 inst_no_rt = _inst&  ~KVM_MASK_RT;
>>>> +    u32 inst_rt = _inst&  KVM_MASK_RT;
>>>> +
>>>> +    switch (inst_no_rt) {
>>>> +    }
>>>> +
>>>> +    switch (_inst) {
>>>> +    }
>>>> +
>>>> +    flush_icache_range((ulong)inst, (ulong)inst + 4);
>>>> +}
>>>>
>>>
>>> Shouldn't we flush only if we patched something?
>>
>> We introduce the patching in the next patches. This is only a  
>> preparation stub.
>
> Well, unless I missed something, this remains unconditional after  
> all the patches.
>
> A helper patch(pc, replacement) could patch and flush in one go.

Oh I see what you mean. While not necessary, it would save a few  
cycles on guest bootup.

>
>>
>>>
>>>> +
>>>> +static void kvm_use_magic_page(void)
>>>> +{
>>>> +    u32 *p;
>>>> +    u32 *start, *end;
>>>> +
>>>> +    /* Tell the host to map the magic page to -4096 on all CPUs */
>>>> +
>>>> +    on_each_cpu(kvm_map_magic_page, NULL, 1);
>>>> +
>>>> +    /* Now loop through all code and find instructions */
>>>> +
>>>> +    start = (void*)_stext;
>>>> +    end = (void*)_etext;
>>>> +
>>>> +    for (p = start; p<  end; p++)
>>>> +        kvm_check_ins(p);
>>>> +}
>>>> +
>>>>
>>>
>>> Or, flush the entire thing here.
>>
>> I did that at first. It breaks. During the patching we may take  
>> interrupts (pahe faults for example) that contain just patched  
>> instructions. And really, hell breaks loose if we don't flush it  
>> immediately :). I was hoping at first a 32 bit replace would be  
>> atomic in cache, but the cpu tried to execute invalid instructions,  
>> so it must have gotten some intermediate state.
>
> Surprising.  Maybe you need a flush after writing to the out-of-line  
> code?

I do that too now :). Better flush too often that too rarely. It's not  
_that_ expensive after all.

Alex
Benjamin Herrenschmidt June 27, 2010, 10:04 p.m. UTC | #5
On Sun, 2010-06-27 at 11:47 +0200, Alexander Graf wrote:
> I did that at first. It breaks. During the patching we may take  
> interrupts (pahe faults for example) that contain just patched  
> instructions. And really, hell breaks loose if we don't flush it  
> immediately :). I was hoping at first a 32 bit replace would be
> atomic  
> in cache, but the cpu tried to execute invalid instructions, so it  
> must have gotten some intermediate state.

A 32-bit aligned store -is- atomic. The other threads/cpu will see
either the old or the new instruction, nothing in between.

Cheers,
Ben.
Matt Evans June 28, 2010, 4:39 a.m. UTC | #6
Howdy Alex!

Alexander Graf wrote:
> We will soon start and replace instructions from the text section with
> other, paravirtualized versions. To ease the readability of those patches
> I split out the generic looping and magic page mapping code out.
> 
> This patch still only contains stubs. But at least it loops through the
> text section :).
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---
>  arch/powerpc/kernel/kvm.c |   59 +++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 59 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
> index 2d8dd73..d873bc6 100644
> --- a/arch/powerpc/kernel/kvm.c
> +++ b/arch/powerpc/kernel/kvm.c
> @@ -32,3 +32,62 @@
>  #define KVM_MAGIC_PAGE		(-4096L)
>  #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
>  
> +static bool kvm_patching_worked = true;
> +
> +static void kvm_map_magic_page(void *data)
> +{
> +	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
> +		       KVM_MAGIC_PAGE,  /* Physical Address */
> +		       KVM_MAGIC_PAGE); /* Effective Address */
> +}
> +
> +static void kvm_check_ins(u32 *inst)
> +{
> +	u32 _inst = *inst;
> +	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
> +	u32 inst_rt = _inst & KVM_MASK_RT;
> +
> +	switch (inst_no_rt) {
> +	}
> +
> +	switch (_inst) {
> +	}
> +
> +	flush_icache_range((ulong)inst, (ulong)inst + 4);
> +}
> +
> +static void kvm_use_magic_page(void)
> +{
> +	u32 *p;
> +	u32 *start, *end;
> +
> +	/* Tell the host to map the magic page to -4096 on all CPUs */
> +
> +	on_each_cpu(kvm_map_magic_page, NULL, 1);
> +
> +	/* Now loop through all code and find instructions */
> +
> +	start = (void*)_stext;
> +	end = (void*)_etext;
> +
> +	for (p = start; p < end; p++)
> +		kvm_check_ins(p);
> +}

Could you do something similar in module_finalize() to patch loaded modules' .text sections?

> +
> +static int __init kvm_guest_init(void)
> +{
> +	char *p;
> +
> +	if (!kvm_para_available())
> +		return 0;
> +
> +	if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
> +		kvm_use_magic_page();
> +
> +	printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
> +			 kvm_patching_worked ? "worked" : "failed");
> +
> +	return 0;
> +}
> +
> +postcore_initcall(kvm_guest_init);

Cheers,


Matt
Alexander Graf June 28, 2010, 6:33 a.m. UTC | #7
On 28.06.2010, at 06:39, Matt Evans wrote:

> Howdy Alex!
> 
> Alexander Graf wrote:
>> We will soon start and replace instructions from the text section with
>> other, paravirtualized versions. To ease the readability of those patches
>> I split out the generic looping and magic page mapping code out.
>> 
>> This patch still only contains stubs. But at least it loops through the
>> text section :).
>> 
>> Signed-off-by: Alexander Graf <agraf@suse.de>
>> ---
>> arch/powerpc/kernel/kvm.c |   59 +++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 59 insertions(+), 0 deletions(-)
>> 
>> diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
>> index 2d8dd73..d873bc6 100644
>> --- a/arch/powerpc/kernel/kvm.c
>> +++ b/arch/powerpc/kernel/kvm.c
>> @@ -32,3 +32,62 @@
>> #define KVM_MAGIC_PAGE		(-4096L)
>> #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
>> 
>> +static bool kvm_patching_worked = true;
>> +
>> +static void kvm_map_magic_page(void *data)
>> +{
>> +	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
>> +		       KVM_MAGIC_PAGE,  /* Physical Address */
>> +		       KVM_MAGIC_PAGE); /* Effective Address */
>> +}
>> +
>> +static void kvm_check_ins(u32 *inst)
>> +{
>> +	u32 _inst = *inst;
>> +	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
>> +	u32 inst_rt = _inst & KVM_MASK_RT;
>> +
>> +	switch (inst_no_rt) {
>> +	}
>> +
>> +	switch (_inst) {
>> +	}
>> +
>> +	flush_icache_range((ulong)inst, (ulong)inst + 4);
>> +}
>> +
>> +static void kvm_use_magic_page(void)
>> +{
>> +	u32 *p;
>> +	u32 *start, *end;
>> +
>> +	/* Tell the host to map the magic page to -4096 on all CPUs */
>> +
>> +	on_each_cpu(kvm_map_magic_page, NULL, 1);
>> +
>> +	/* Now loop through all code and find instructions */
>> +
>> +	start = (void*)_stext;
>> +	end = (void*)_etext;
>> +
>> +	for (p = start; p < end; p++)
>> +		kvm_check_ins(p);
>> +}
> 
> Could you do something similar in module_finalize() to patch loaded modules' .text sections?

I could, but do we need it? I objdump -d | grep'ed all my modules and didn't find any need to do so.


Alex
Avi Kivity June 28, 2010, 8:15 a.m. UTC | #8
On 06/28/2010 09:33 AM, Alexander Graf wrote:
>
>> Could you do something similar in module_finalize() to patch loaded modules' .text sections?
>>      
> I could, but do we need it? I objdump -d | grep'ed all my modules and didn't find any need to do so.
>    

You mean even kvm.ko doesn't use privileged instructions?
Alexander Graf June 28, 2010, 8:23 a.m. UTC | #9
On 28.06.2010, at 10:15, Avi Kivity wrote:

> On 06/28/2010 09:33 AM, Alexander Graf wrote:
>> 
>>> Could you do something similar in module_finalize() to patch loaded modules' .text sections?
>>>     
>> I could, but do we need it? I objdump -d | grep'ed all my modules and didn't find any need to do so.
>>   
> 
> You mean even kvm.ko doesn't use privileged instructions?

It does, but I don't think it's worth speeding those up. There are only a couple. Most of the privileged instructions in PPC KVM are statically compiled into the kernel because we need to guarantee they're in the RMO (first 8MB for the PS3).

Even with the magic page in use, trapping instructions still works exactly as before, so we're only talking about a speed difference.


Alex
Avi Kivity June 28, 2010, 8:33 a.m. UTC | #10
On 06/28/2010 11:23 AM, Alexander Graf wrote:
>> You mean even kvm.ko doesn't use privileged instructions?
>>      
> It does, but I don't think it's worth speeding those up. There are only a couple. Most of the privileged instructions in PPC KVM are statically compiled into the kernel because we need to guarantee they're in the RMO (first 8MB for the PS3).
>
> Even with the magic page in use, trapping instructions still works exactly as before, so we're only talking about a speed difference.
>
>    

Yeah, that also answers my question re pv->nonpv transition.
diff mbox

Patch

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 2d8dd73..d873bc6 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -32,3 +32,62 @@ 
 #define KVM_MAGIC_PAGE		(-4096L)
 #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
 
+static bool kvm_patching_worked = true;
+
+static void kvm_map_magic_page(void *data)
+{
+	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
+		       KVM_MAGIC_PAGE,  /* Physical Address */
+		       KVM_MAGIC_PAGE); /* Effective Address */
+}
+
+static void kvm_check_ins(u32 *inst)
+{
+	u32 _inst = *inst;
+	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
+	u32 inst_rt = _inst & KVM_MASK_RT;
+
+	switch (inst_no_rt) {
+	}
+
+	switch (_inst) {
+	}
+
+	flush_icache_range((ulong)inst, (ulong)inst + 4);
+}
+
+static void kvm_use_magic_page(void)
+{
+	u32 *p;
+	u32 *start, *end;
+
+	/* Tell the host to map the magic page to -4096 on all CPUs */
+
+	on_each_cpu(kvm_map_magic_page, NULL, 1);
+
+	/* Now loop through all code and find instructions */
+
+	start = (void*)_stext;
+	end = (void*)_etext;
+
+	for (p = start; p < end; p++)
+		kvm_check_ins(p);
+}
+
+static int __init kvm_guest_init(void)
+{
+	char *p;
+
+	if (!kvm_para_available())
+		return 0;
+
+	if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
+		kvm_use_magic_page();
+
+	printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
+			 kvm_patching_worked ? "worked" : "failed");
+
+	return 0;
+}
+
+postcore_initcall(kvm_guest_init);