Patchwork [lucid] fix smp kvm guests on AMD

login
register
mail settings
Submitter Serge Hallyn
Date March 2, 2011, 9:57 p.m.
Message ID <20110302215713.GA27999@peq.hallyn.com>
Download mbox | patch
Permalink /patch/85284/
State New
Headers show

Comments

Serge Hallyn - March 2, 2011, 9:57 p.m.
> Serge - Can we get this as 3 different patches, each with an
> appropriate 'backported from' or 'cherry-picked from' SHA1 ID.

Attached to this email.  (If you prefer 3 separate inline emails
pls let me know - I'm still getting familiar with the preferences
here).  The result compiled fine.

thanks,
-serge
Stefan Bader - March 7, 2011, 6:04 p.m.
On 03/02/2011 10:57 PM, Serge E. Hallyn wrote:
>> Serge - Can we get this as 3 different patches, each with an
>> appropriate 'backported from' or 'cherry-picked from' SHA1 ID.
> 
> Attached to this email.  (If you prefer 3 separate inline emails
> pls let me know - I'm still getting familiar with the preferences
> here).  The result compiled fine.
> 
> thanks,
> -serge
> 

I looked through the patches and have a few questions/remarks:

Patch #2 seems to work around some changes missing from

commit 759379dd68c2885d1fafa433083d4487e710a685
Author: Zachary Amsden <zamsden@redhat.com>
Date:   Thu Aug 19 22:07:25 2010 -1000

    KVM: x86: Add helper functions for time computation

that patch replaces
 	ktime_get_ts(&ts);
 	monotonic_to_bootbased(&ts);
	[timespec_to_ns(&ts);]
by
	kernel_ns = get_kernel_ns()

which is all done within the section of disabled interrupts, so I probably would
move the assignment of kernel_ns up before local_irq_restore().

Also in patch#2, the upstream patch does this:

/* With all the info we got, fill in the values */
vcpu->hv_clock.tsc_timestamp = tsc_timestamp;

This seems to be missing in the backport.

And finally in patch#3 last_guest_tsc is set right at the beginning but the
upstream patch had that later in the section that was assigning last_kernel_ns.

@@ -1095,6 +1095,7 @@ static int kvm_write_guest_time(struct kvm_vcpu *v)
        vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
        vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
        vcpu->last_kernel_ns = kernel_ns;
+       vcpu->last_guest_tsc = tsc_timestamp;
        vcpu->hv_clock.flags = 0;

One not directly related note: when checking whether this could have effects on
the lucid-ec2 branch, I noted that the xen code also has some scale_delta in its
code. It seems separated enough to be unaffected. But then maybe we actually
want to look closer there. Maybe there is the same problem in that code (I did
not look closer, yet).

-Stefan
Serge Hallyn - March 7, 2011, 7:57 p.m.
Quoting Stefan Bader (stefan.bader@canonical.com):

Thanks for the review, Stefan.  Great catches.

> I looked through the patches and have a few questions/remarks:
> 
> Patch #2 seems to work around some changes missing from
> 
> commit 759379dd68c2885d1fafa433083d4487e710a685
> Author: Zachary Amsden <zamsden@redhat.com>
> Date:   Thu Aug 19 22:07:25 2010 -1000
> 
>     KVM: x86: Add helper functions for time computation
> 
> that patch replaces
>  	ktime_get_ts(&ts);
>  	monotonic_to_bootbased(&ts);
> 	[timespec_to_ns(&ts);]
> by
> 	kernel_ns = get_kernel_ns()
> 
> which is all done within the section of disabled interrupts, so I probably would
> move the assignment of kernel_ns up before local_irq_restore().

Sounds good.

> Also in patch#2, the upstream patch does this:
> 
> /* With all the info we got, fill in the values */
> vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
> 
> This seems to be missing in the backport.

Yes, I missed that one, and clearly it's necessary.

> And finally in patch#3 last_guest_tsc is set right at the beginning but the
> upstream patch had that later in the section that was assigning last_kernel_ns.
> 
> @@ -1095,6 +1095,7 @@ static int kvm_write_guest_time(struct kvm_vcpu *v)
>         vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
>         vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
>         vcpu->last_kernel_ns = kernel_ns;
> +       vcpu->last_guest_tsc = tsc_timestamp;
>         vcpu->hv_clock.flags = 0;

Feh, that's how the first version of my patch did it.  Messed up in
the split.

Are you planning to just fix these on your end?

> One not directly related note: when checking whether this could have effects on
> the lucid-ec2 branch, I noted that the xen code also has some scale_delta in its
> code. It seems separated enough to be unaffected. But then maybe we actually
> want to look closer there. Maybe there is the same problem in that code (I did
> not look closer, yet).

I'd worry about doing that without reports from xen users.

thanks,
-serge
Stefan Bader - March 8, 2011, 2:22 p.m.
On 03/07/2011 08:57 PM, Serge E. Hallyn wrote:
> Quoting Stefan Bader (stefan.bader@canonical.com):
> 
> Thanks for the review, Stefan.  Great catches.
> 
>> I looked through the patches and have a few questions/remarks:
>>
>> Patch #2 seems to work around some changes missing from
>>
>> commit 759379dd68c2885d1fafa433083d4487e710a685
>> Author: Zachary Amsden <zamsden@redhat.com>
>> Date:   Thu Aug 19 22:07:25 2010 -1000
>>
>>     KVM: x86: Add helper functions for time computation
>>
>> that patch replaces
>>  	ktime_get_ts(&ts);
>>  	monotonic_to_bootbased(&ts);
>> 	[timespec_to_ns(&ts);]
>> by
>> 	kernel_ns = get_kernel_ns()
>>
>> which is all done within the section of disabled interrupts, so I probably would
>> move the assignment of kernel_ns up before local_irq_restore().
> 
> Sounds good.
> 
>> Also in patch#2, the upstream patch does this:
>>
>> /* With all the info we got, fill in the values */
>> vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
>>
>> This seems to be missing in the backport.
> 
> Yes, I missed that one, and clearly it's necessary.
> 
>> And finally in patch#3 last_guest_tsc is set right at the beginning but the
>> upstream patch had that later in the section that was assigning last_kernel_ns.
>>
>> @@ -1095,6 +1095,7 @@ static int kvm_write_guest_time(struct kvm_vcpu *v)
>>         vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
>>         vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
>>         vcpu->last_kernel_ns = kernel_ns;
>> +       vcpu->last_guest_tsc = tsc_timestamp;
>>         vcpu->hv_clock.flags = 0;
> 
> Feh, that's how the first version of my patch did it.  Messed up in
> the split.
> 
> Are you planning to just fix these on your end?
> 
I would appreciate if you could update the patchset and re-submit it to this thread.

>> One not directly related note: when checking whether this could have effects on
>> the lucid-ec2 branch, I noted that the xen code also has some scale_delta in its
>> code. It seems separated enough to be unaffected. But then maybe we actually
>> want to look closer there. Maybe there is the same problem in that code (I did
>> not look closer, yet).
> 
> I'd worry about doing that without reports from xen users.
>
True. There are some reports about wrong process times but IIRC those were seen
in ec2 and bare metal and maybe get fixed by that big upstream scheduler change.

-Stefan

> thanks,
> -serge
Stefan Bader - March 9, 2011, 3:41 p.m.
On 03/09/2011 02:31 PM, Serge E. Hallyn wrote:
> Quoting Stefan Bader (stefan.bader@canonical.com):
>> I would appreciate if you could update the patchset and re-submit it to this thread.
> 
> attached.
> 
> -serge

Looks right to me now. There is a slight error in the description section which
I probably should have spotted before. Though that also can be corrected when
applying. Normally each patch should have a BugLink line which points to the bug
report (BugLink: http://bugs.launchpad.net/bugs/#).

Also we may think of presenting that backport to the KVM maintainers. It sounds
like there could be interest in having that in the upstream stable tree as well.

Acked-by: Stefan Bader <stefan.bader@canonical.com>

Patch

From 00a79dd53bf86de570e7449a5839da10e5b4be45 Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <serge.hallyn@canonical.com>
Date: Wed, 2 Mar 2011 20:07:48 +0000
Subject: [PATCH 3/3] KVM: x86: Fix kvmclock bug

Backport of commit 28e4639adf0c9f26f6bb56149b7ab547bf33bb95

If preempted after kvmclock values are updated, but before hardware
virtualization is entered, the last tsc time as read by the guest is
never set.  It underflows the next time kvmclock is updated if there
has not yet been a successful entry / exit into hardware virt.

Fix this by simply setting last_tsc to the newly read tsc value so
that any computed nsec advance of kvmclock is nulled.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
 arch/x86/kvm/x86.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ffd70eb..205d977 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -654,6 +654,7 @@  static void kvm_write_guest_time(struct kvm_vcpu *v)
 	monotonic_to_bootbased(&ts);
 	local_irq_restore(flags);
 	kernel_ns = timespec_to_ns(&ts);
+	vcpu->last_guest_tsc = tsc_timestamp;
 
 	/*
 	 * Time as measured by the TSC may go backwards when resetting the base
-- 
1.7.0.4