Patchwork [v2,2/2] kvm: migrate vPMU state

Submitter Gleb Natapov
Date July 28, 2013, 2:24 p.m.
Message ID <20130728142425.GK11772@redhat.com>
Permalink /patch/262591/
State New

Comments

Gleb Natapov - July 28, 2013, 2:24 p.m.
On Sun, Jul 28, 2013 at 04:07:37PM +0200, Paolo Bonzini wrote:
> Il 28/07/2013 15:54, Gleb Natapov ha scritto:
> > On Sun, Jul 28, 2013 at 03:51:25PM +0200, Paolo Bonzini wrote:
> >> Il 28/07/2013 14:57, Gleb Natapov ha scritto:
> >>>> @@ -1114,6 +1135,33 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> >>>>              kvm_msr_entry_set(&msrs[n++], MSR_KVM_STEAL_TIME,
> >>>>                                env->steal_time_msr);
> >>>>          }
> >>>> +        if (has_msr_architectural_pmu) {
> >>>> +            /* Stop the counter.  */
> >>>> +            kvm_msr_entry_set(&msrs[n++], MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> >>>> +            kvm_msr_entry_set(&msrs[n++], MSR_CORE_PERF_GLOBAL_CTRL, 0);
> >>>> +
> >>> Why is this needed?
> >>
> >> In v1 it was in the commit message.  I'll fix it up before applying:
> >>
> >>> Second, to avoid any possible side effects during the setting of MSRs
> >>> I stop the PMU while setting the counters and event selector MSRs.
> >>> Stopping the PMU snapshots the counters and ensures that no strange
> >>> races can happen if the counters were saved close to their overflow
> >>> value.
> >>
> > Since the vCPU is not running, counters should not count anyway.
> 
> Does the perf event distinguish KVM_RUN from any other activity in the
> vCPU thread (in which this code runs)?  It seemed unsafe to me to change
> the overflow status and the performance counter value while the counter
> could be running, since the counter value could affect the overflow
> status.  Maybe I was being paranoid?
> 
KVM disables HW counters when outside of guest mode (otherwise the result
would be useless), so I do not see how the problem you describe can
happen. On the other hand, PMU emulation assumes that a counter is disabled
while MSR_IA32_PERFCTR0 is written, since a write to MSR_IA32_PERFCTR0 does
not reprogram perf events, so we need to either disable/enable counters to
write MSR_IA32_PERFCTR0 or have this patch in the kernel:



--
			Gleb.
Paolo Bonzini - Aug. 1, 2013, 1:03 p.m.
> KVM disables HW counters when outside of guest mode (otherwise the result
> would be useless), so I do not see how the problem you describe can
> happen.

Yes, you're right.

> On the other hand, PMU emulation assumes that a counter is disabled
> while MSR_IA32_PERFCTR0 is written, since a write to MSR_IA32_PERFCTR0 does
> not reprogram perf events, so we need to either disable/enable counters to
> write MSR_IA32_PERFCTR0 or have this patch in the kernel:
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 5c4f631..bf14e42 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -412,6 +412,8 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			if (!msr_info->host_initiated)
>  				data = (s64)(s32)data;
>  			pmc->counter += data - read_pmc(pmc);
> +			if (msr_info->host_initiated)
> +				reprogram_gp_counter(pmc, pmc->eventsel);
>  			return 0;
>  		} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
>  			if (data == pmc->eventsel)

Why do you need "if (msr_info->host_initiated)"?  I could not find any
hint in the manual that the overflow counter will still use the value
of the counter that was programmed first.

If we need to do it always, I agree it's better to modify the QEMU
patch and not disable/enable the counters.  But if we need to restrict
it to host-initiated writes, I would rather have the QEMU patch as I
posted it.  So far we have always had fewer side effects from host_initiated,
not more, and I think it's a good rule of thumb.

Paolo
Gleb Natapov - Aug. 1, 2013, 1:12 p.m.
On Thu, Aug 01, 2013 at 03:03:12PM +0200, Paolo Bonzini wrote:
> > KVM disables HW counters when outside of guest mode (otherwise the result
> > would be useless), so I do not see how the problem you describe can
> > happen.
> 
> Yes, you're right.
> 
> > On the other hand, PMU emulation assumes that a counter is disabled
> > while MSR_IA32_PERFCTR0 is written, since a write to MSR_IA32_PERFCTR0 does
> > not reprogram perf events, so we need to either disable/enable counters to
> > write MSR_IA32_PERFCTR0 or have this patch in the kernel:
> > 
> > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > index 5c4f631..bf14e42 100644
> > --- a/arch/x86/kvm/pmu.c
> > +++ b/arch/x86/kvm/pmu.c
> > @@ -412,6 +412,8 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  			if (!msr_info->host_initiated)
> >  				data = (s64)(s32)data;
> >  			pmc->counter += data - read_pmc(pmc);
> > +			if (msr_info->host_initiated)
> > +				reprogram_gp_counter(pmc, pmc->eventsel);
> >  			return 0;
> >  		} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
> >  			if (data == pmc->eventsel)
> 
> Why do you need "if (msr_info->host_initiated)"?  I could not find any
> hint in the manual that the overflow counter will still use the value
> of the counter that was programmed first.
> 
Not sure I understand. What does "overflow counter will still use the value
of the counter that was programmed first" mean?

Strictly speaking we do not need "if (msr_info->host_initiated)" here;
there is no harm in calling reprogram_gp_counter() unconditionally.
But the spec says in no uncertain terms that the counter should be disabled
before writing into the MSR, which means that reprogram_gp_counter() will be
called again when the guest enables the counter later. The invocation here
is therefore redundant for guest writes, and since this happens a lot during
profiling, avoiding the call to reprogram_gp_counter() is a win.

> If we need to do it always, I agree it's better to modify the QEMU
> patch and not disable/enable the counters.  But if we need to restrict
> it to host-initiated writes, I would rather have the QEMU patch as I
> posted it.  So far we have always had fewer side effects from host_initiated,
> not more, and I think it's a good rule of thumb.
> 
I am OK with your patch; it is a little bit unfortunate that userspace
has to care about such low-level details though.

--
			Gleb.
Paolo Bonzini - Aug. 1, 2013, 1:48 p.m.
On Aug 01 2013, Gleb Natapov wrote:
> On Thu, Aug 01, 2013 at 03:03:12PM +0200, Paolo Bonzini wrote:
> > > KVM disables HW counters when outside of guest mode (otherwise the result
> > > would be useless), so I do not see how the problem you describe can
> > > happen.
> > 
> > Yes, you're right.
> > 
> > > On the other hand, PMU emulation assumes that a counter is disabled
> > > while MSR_IA32_PERFCTR0 is written, since a write to MSR_IA32_PERFCTR0 does
> > > not reprogram perf events, so we need to either disable/enable counters to
> > > write MSR_IA32_PERFCTR0 or have this patch in the kernel:
> > > 
> > > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > > index 5c4f631..bf14e42 100644
> > > --- a/arch/x86/kvm/pmu.c
> > > +++ b/arch/x86/kvm/pmu.c
> > > @@ -412,6 +412,8 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > >  			if (!msr_info->host_initiated)
> > >  				data = (s64)(s32)data;
> > >  			pmc->counter += data - read_pmc(pmc);
> > > +			if (msr_info->host_initiated)
> > > +				reprogram_gp_counter(pmc, pmc->eventsel);
> > >  			return 0;
> > >  		} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
> > >  			if (data == pmc->eventsel)
> > 
> > Why do you need "if (msr_info->host_initiated)"?  I could not find any
> > hint in the manual that the overflow counter will still use the value
> > of the counter that was programmed first.
> > 
> Not sure I understand. What does "overflow counter will still use the value
> of the counter that was programmed first" mean?
> 
> The spec says in no uncertain terms that the counter should be disabled
> before writing into the MSR, which means that reprogram_gp_counter() will be
> called again when the guest enables the counter later,

Yeah, I found it now.

> I am OK with your patch; it is a little bit unfortunate that userspace
> has to care about such low-level details though.

Is it a Reviewed-by?

Paolo
Gleb Natapov - Aug. 1, 2013, 1:50 p.m.
On Thu, Aug 01, 2013 at 03:48:29PM +0200, Paolo Bonzini wrote:
>  On Aug 01 2013, Gleb Natapov wrote:
> > On Thu, Aug 01, 2013 at 03:03:12PM +0200, Paolo Bonzini wrote:
> > > > KVM disables HW counters when outside of guest mode (otherwise the result
> > > > would be useless), so I do not see how the problem you describe can
> > > > happen.
> > > 
> > > Yes, you're right.
> > > 
> > > > On the other hand, PMU emulation assumes that a counter is disabled
> > > > while MSR_IA32_PERFCTR0 is written, since a write to MSR_IA32_PERFCTR0 does
> > > > not reprogram perf events, so we need to either disable/enable counters to
> > > > write MSR_IA32_PERFCTR0 or have this patch in the kernel:
> > > > 
> > > > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > > > index 5c4f631..bf14e42 100644
> > > > --- a/arch/x86/kvm/pmu.c
> > > > +++ b/arch/x86/kvm/pmu.c
> > > > @@ -412,6 +412,8 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > >  			if (!msr_info->host_initiated)
> > > >  				data = (s64)(s32)data;
> > > >  			pmc->counter += data - read_pmc(pmc);
> > > > +			if (msr_info->host_initiated)
> > > > +				reprogram_gp_counter(pmc, pmc->eventsel);
> > > >  			return 0;
> > > >  		} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
> > > >  			if (data == pmc->eventsel)
> > > 
> > > Why do you need "if (msr_info->host_initiated)"?  I could not find any
> > > hint in the manual that the overflow counter will still use the value
> > > of the counter that was programmed first.
> > > 
> > Not sure I understand. What does "overflow counter will still use the value
> > of the counter that was programmed first" mean?
> > 
> > The spec says in no uncertain terms that the counter should be disabled
> > before writing into the MSR, which means that reprogram_gp_counter() will be
> > called again when the guest enables the counter later,
> 
> Yeah, I found it now.
> 
> > I am OK with your patch; it is a little bit unfortunate that userspace
> > has to care about such low-level details though.
> 
> Is it a Reviewed-by?
>
here is one :)
 
Reviewed-by: Gleb Natapov <gleb@redhat.com>

--
			Gleb.
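
For readers following the thread, the QEMU-side ordering that was agreed on (stop the PMU, write the counters and event selectors, then bring the control MSRs back) can be sketched roughly as below. This is only an illustration under stated assumptions: the toy_* types and the toy_build_pmu_restore() helper are hypothetical and are not QEMU code, and the final step of restoring the saved control values is assumed rather than shown in the quoted hunk. The MSR indices are the architectural ones from the Intel SDM.

```c
#include <stddef.h>
#include <stdint.h>

/* Architectural MSR indices (Intel SDM). */
#define MSR_IA32_PERFCTR0            0x0c1
#define MSR_P6_EVNTSEL0              0x186
#define MSR_CORE_PERF_FIXED_CTR_CTRL 0x38d
#define MSR_CORE_PERF_GLOBAL_CTRL    0x38f

#define TOY_MAX_GP 8  /* hypothetical cap on general-purpose counters */

/* Hypothetical stand-ins; not QEMU's or the kernel's real structures. */
struct toy_msr_entry { uint32_t index; uint64_t data; };

struct toy_pmu_state {
    uint64_t fixed_ctr_ctrl;
    uint64_t global_ctrl;
    unsigned num_gp;
    uint64_t gp_counters[TOY_MAX_GP];
    uint64_t gp_evtsel[TOY_MAX_GP];
};

/*
 * Fill 'out' with MSR writes in an order that keeps the PMU stopped while
 * counters and event selectors are loaded. Returns the number of entries,
 * or 0 if 'out' is too small or the state is invalid.
 */
static size_t toy_build_pmu_restore(const struct toy_pmu_state *s,
                                    struct toy_msr_entry *out, size_t cap)
{
    size_t n = 0;

    if (s->num_gp > TOY_MAX_GP || cap < 4 + 2 * (size_t)s->num_gp)
        return 0;

    /* Step 1: stop the counters, as in the quoted QEMU hunk. */
    out[n++] = (struct toy_msr_entry){ MSR_CORE_PERF_FIXED_CTR_CTRL, 0 };
    out[n++] = (struct toy_msr_entry){ MSR_CORE_PERF_GLOBAL_CTRL, 0 };

    /* Step 2: load counter values and event selectors while stopped. */
    for (unsigned i = 0; i < s->num_gp; i++) {
        out[n++] = (struct toy_msr_entry){ MSR_IA32_PERFCTR0 + i,
                                           s->gp_counters[i] };
        out[n++] = (struct toy_msr_entry){ MSR_P6_EVNTSEL0 + i,
                                           s->gp_evtsel[i] };
    }

    /* Step 3 (assumed): restore the saved control values last. */
    out[n++] = (struct toy_msr_entry){ MSR_CORE_PERF_FIXED_CTR_CTRL,
                                       s->fixed_ctr_ctrl };
    out[n++] = (struct toy_msr_entry){ MSR_CORE_PERF_GLOBAL_CTRL,
                                       s->global_ctrl };
    return n;
}
```

With two general-purpose counters the helper emits eight entries: two zeroing writes, four counter/selector writes, and two restoring writes. Writing the control MSRs last is what makes the kernel reprogram the backing perf events after the counter values are in place.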

Patch

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 5c4f631..bf14e42 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -412,6 +412,8 @@  int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			if (!msr_info->host_initiated)
 				data = (s64)(s32)data;
 			pmc->counter += data - read_pmc(pmc);
+			if (msr_info->host_initiated)
+				reprogram_gp_counter(pmc, pmc->eventsel);
 			return 0;
 		} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
 			if (data == pmc->eventsel)
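
The behaviour of the hunk above can be illustrated with a small userspace model. This is a sketch under assumptions, not kernel code: struct toy_pmc, toy_reprogram() and toy_write_counter() are hypothetical stand-ins, toy_reprogram() only counts calls where reprogram_gp_counter() would rebuild the perf event, and the model assumes no backing event is running, so the write simply sets the counter value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a general-purpose counter; not the kernel's struct kvm_pmc. */
struct toy_pmc {
    uint64_t counter;      /* value the guest would read back */
    unsigned reprogrammed; /* how often the backing event was rebuilt */
};

/* Hypothetical placeholder for reprogram_gp_counter(): only counts calls. */
static void toy_reprogram(struct toy_pmc *pmc)
{
    pmc->reprogrammed++;
}

/* Models the counter-write branch touched by the hunk above. */
static void toy_write_counter(struct toy_pmc *pmc, uint64_t data,
                              bool host_initiated)
{
    /* Guest writes are sign-extended from 32 bits; host writes are not. */
    if (!host_initiated)
        data = (uint64_t)(int64_t)(int32_t)data;

    /* Simplification: nothing is counting, so the write sets the value. */
    pmc->counter = data;

    /* A guest must re-enable the counter later, which reprograms it anyway;
     * a host-initiated restore has no such later step, so do it here. */
    if (host_initiated)
        toy_reprogram(pmc);
}

/* Small demonstration of the two paths. */
static void toy_demo(void)
{
    struct toy_pmc pmc = {0};

    toy_write_counter(&pmc, 0xffffffffu, false);  /* guest write */
    assert(pmc.counter == UINT64_MAX && pmc.reprogrammed == 0);

    toy_write_counter(&pmc, 0xffffffffu, true);   /* host (userspace) write */
    assert(pmc.counter == 0xffffffffu && pmc.reprogrammed == 1);
}
```

The guest path skips the reprogram call because, per the discussion, it would be redundant and guest counter writes are frequent during profiling; the host path performs it because userspace restoring state has no later enable step to trigger it.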