
[v2] KVM: PPC: Book3S HV: Sanitise vcpu registers in nested path

Message ID 20210406214645.3315819-1-farosas@linux.ibm.com
State New
Series [v2] KVM: PPC: Book3S HV: Sanitise vcpu registers in nested path

Commit Message

Fabiano Rosas April 6, 2021, 9:46 p.m. UTC
As one of the arguments of the H_ENTER_NESTED hypercall, the nested
hypervisor (L1) prepares a structure containing the values of various
hypervisor-privileged registers with which it wants the nested guest
(L2) to run. Since the nested HV runs in supervisor mode it needs the
host to write to these registers.

To stop a nested HV manipulating this mechanism and using a nested
guest as a proxy to access a facility that has been made unavailable
to it, we have a routine that sanitises the values of the HV registers
before copying them into the nested guest's vcpu struct.

However, when coming out of the guest the values are copied back into
L1 memory as they were, which means that any sanitisation we did
during guest entry will be exposed to L1 after H_ENTER_NESTED returns.

This patch alters this sanitisation to have effect on the vcpu->arch
registers directly before entering and after exiting the guest,
leaving the structure that is copied back into L1 unchanged (except
when we really want L1 to access the value, e.g. the Cause bits of
HFSCR).
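The entry/return masking described above can be illustrated with a standalone sketch. The mask value and helper names below are made up for illustration; the real masks and the vcpu layout live in the kernel headers:

```c
#include <stdint.h>

/* Hypothetical mask value for illustration only; the real
 * HFSCR_INTR_CAUSE field is defined in the kernel headers. */
#define HFSCR_INTR_CAUSE 0xff00000000000000ULL

/*
 * Entry: L2 may only enable a facility that L1 itself has been
 * granted; the interrupt cause field passes through untouched.
 */
static uint64_t entry_hfscr(uint64_t l2_hfscr, uint64_t l1_hfscr)
{
	return l2_hfscr & (HFSCR_INTR_CAUSE | l1_hfscr);
}

/*
 * Return: give L1 back its own facility bits exactly as it requested
 * them, exposing only the cause bits produced by the guest run.
 */
static uint64_t return_hfscr(uint64_t l2_hfscr, uint64_t vcpu_hfscr)
{
	return (~HFSCR_INTR_CAUSE & l2_hfscr) |
	       (HFSCR_INTR_CAUSE & vcpu_hfscr);
}
```

With this pair, any facility bits the host masked off at entry never appear in the state copied back to L1.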

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
I'm taking another shot at fixing this locally without resorting to
more complex things such as error handling and feature
advertisement/negotiation.

Changes since v1:

- made the change more generic; it no longer applies only to hfscr;
- sanitisation is now done directly on the vcpu struct, l2_hv is left unchanged;

v1:

https://lkml.kernel.org/r/20210305231055.2913892-1-farosas@linux.ibm.com
---
 arch/powerpc/kvm/book3s_hv_nested.c | 33 +++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 7 deletions(-)

Comments

Nicholas Piggin April 7, 2021, 10:01 a.m. UTC | #1
Excerpts from Fabiano Rosas's message of April 7, 2021 7:46 am:
> As one of the arguments of the H_ENTER_NESTED hypercall, the nested
> hypervisor (L1) prepares a structure containing the values of various
> hypervisor-privileged registers with which it wants the nested guest
> (L2) to run. Since the nested HV runs in supervisor mode it needs the
> host to write to these registers.
> 
> To stop a nested HV manipulating this mechanism and using a nested
> guest as a proxy to access a facility that has been made unavailable
> to it, we have a routine that sanitises the values of the HV registers
> before copying them into the nested guest's vcpu struct.
> 
> However, when coming out of the guest the values are copied as they
> were back into L1 memory, which means that any sanitisation we did
> during guest entry will be exposed to L1 after H_ENTER_NESTED returns.
> 
> This patch alters this sanitisation to have effect on the vcpu->arch
> registers directly before entering and after exiting the guest,
> leaving the structure that is copied back into L1 unchanged (except
> when we really want L1 to access the value, e.g. the Cause bits of
> HFSCR).
> 
> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>

I like this direction better. Now "sanitise" may be the wrong word 
because you aren't cleaning the data in place any more but copying a 
cleaned version across.

Which is fine, but I wouldn't call it sanitise. Actually I would
prefer it be done as part of the copy rather than copying
first and then applying this (explained below in the code).

> ---
> I'm taking another shot at fixing this locally without resorting to
> more complex things such as error handling and feature
> advertisement/negotiation.

That's okay, I think those are orthogonal problems. This won't help if
a nested HV tries to use some HFSCR feature it believes should be
available on a (say) POWER10 processor but got secretly masked away.
But also such negotiation doesn't help with trying to minimise L0 HV
data accessible to guests.

(As before a guest can easily find out many of these things if it is
determined to, but that does not mean I'm strongly against what you
are doing here)

> 
> Changes since v1:
> 
> - made the change more generic, not only applies to hfscr anymore;
> - sanitisation is now done directly on the vcpu struct, l2_hv is left unchanged;
> 
> v1:
> 
> https://lkml.kernel.org/r/20210305231055.2913892-1-farosas@linux.ibm.com
> ---
>  arch/powerpc/kvm/book3s_hv_nested.c | 33 +++++++++++++++++++++++------
>  1 file changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
> index 0cd0e7aad588..a60fccb2c4f2 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -132,21 +132,37 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
>  	}
>  }
>  
> -static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
> +static void sanitise_vcpu_entry_state(struct kvm_vcpu *vcpu,
> +				      const struct hv_guest_state *l2_hv,
> +				      const struct hv_guest_state *l1_hv)
>  {
>  	/*
>  	 * Don't let L1 enable features for L2 which we've disabled for L1,
>  	 * but preserve the interrupt cause field.
>  	 */
> -	hr->hfscr &= (HFSCR_INTR_CAUSE | vcpu->arch.hfscr);
> +	vcpu->arch.hfscr = l2_hv->hfscr & (HFSCR_INTR_CAUSE | l1_hv->hfscr);
>  
>  	/* Don't let data address watchpoint match in hypervisor state */
> -	hr->dawrx0 &= ~DAWRX_HYP;
> -	hr->dawrx1 &= ~DAWRX_HYP;
> +	vcpu->arch.dawrx0 = l2_hv->dawrx0 & ~DAWRX_HYP;
> +	vcpu->arch.dawrx1 = l2_hv->dawrx1 & ~DAWRX_HYP;
>  
>  	/* Don't let completed instruction address breakpt match in HV state */
> -	if ((hr->ciabr & CIABR_PRIV) == CIABR_PRIV_HYPER)
> -		hr->ciabr &= ~CIABR_PRIV;
> +	if ((l2_hv->ciabr & CIABR_PRIV) == CIABR_PRIV_HYPER)
> +		vcpu->arch.ciabr = l2_hv->ciabr & ~CIABR_PRIV;
> +}
> +
> +
> +/*
> + * During sanitise_vcpu_entry_state() we might have used bits from L1
> + * state to restrict what the L2 state is allowed to be. Since L1 is
> + * not allowed to read the HV registers, do not include these
> + * modifications in the return state.
> + */
> +static void sanitise_vcpu_return_state(struct kvm_vcpu *vcpu,
> +				       const struct hv_guest_state *l2_hv)
> +{
> +	vcpu->arch.hfscr = ((~HFSCR_INTR_CAUSE & l2_hv->hfscr) |
> +			(HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
>  }
>  
>  static void restore_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
> @@ -324,9 +340,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>  	mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
>  		LPCR_LPES | LPCR_MER;
>  	lpcr = (vc->lpcr & ~mask) | (l2_hv.lpcr & mask);
> -	sanitise_hv_regs(vcpu, &l2_hv);
>  	restore_hv_regs(vcpu, &l2_hv);
>  
> +	sanitise_vcpu_entry_state(vcpu, &l2_hv, &saved_l1_hv);

So instead of doing this, can we just have one function that does
load_hv_regs_for_l2()?

> +
>  	vcpu->arch.ret = RESUME_GUEST;
>  	vcpu->arch.trap = 0;
>  	do {
> @@ -338,6 +355,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>  		r = kvmhv_run_single_vcpu(vcpu, hdec_exp, lpcr);
>  	} while (is_kvmppc_resume_guest(r));
>  
> +	sanitise_vcpu_return_state(vcpu, &l2_hv);

And this could be done in save_hv_return_state().

I think?

Question about HFSCR. Is it possible for some interrupt cause bit
to reach the nested hypervisor for a facility that we thought we had
enabled but was secretly masked off? I.e., do we have to filter
HFSCR causes according to the facilities we secretly disabled?

Thanks,
Nick
Fabiano Rosas April 7, 2021, 2:27 p.m. UTC | #2
Nicholas Piggin <npiggin@gmail.com> writes:

<snip>

>>  static void restore_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
>> @@ -324,9 +340,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>>  	mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
>>  		LPCR_LPES | LPCR_MER;
>>  	lpcr = (vc->lpcr & ~mask) | (l2_hv.lpcr & mask);
>> -	sanitise_hv_regs(vcpu, &l2_hv);
>>  	restore_hv_regs(vcpu, &l2_hv);
>>  
>> +	sanitise_vcpu_entry_state(vcpu, &l2_hv, &saved_l1_hv);
>
> So instead of doing this, can we just have one function that does
> load_hv_regs_for_l2()?

Yes. I would go even further and fold everything into a load_l2_state()
that takes care of hv and non-hv. The top level here could easily be:

  save_l1_state();
  load_l2_state();
  
  do {
     kvmhv_run_single_vcpu();
  } while();
  
  save_l2_state();
  restore_l1_state();

I'll send a v3 with the change you suggested and then perhaps a small
refactoring on top of it. Let's see how it turns out.
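A consolidated entry-side helper along the lines Nick suggests might look roughly like this. This is a sketch only: the function name matches his suggestion, but the struct, field set, and mask values are simplified stand-ins, not the kernel's actual hv_guest_state or register definitions:

```c
#include <stdint.h>

/* Hypothetical, simplified mask values for illustration. */
#define HFSCR_INTR_CAUSE 0xff00000000000000ULL
#define DAWRX_HYP        0x0000000000000010ULL
#define CIABR_PRIV       0x3ULL
#define CIABR_PRIV_HYPER 0x3ULL

/* Simplified stand-in for the relevant vcpu/hv_guest_state fields. */
struct hv_state {
	uint64_t hfscr, dawrx0, dawrx1, ciabr;
};

/*
 * Copy L2's requested HV register values into the vcpu state,
 * applying the restrictions in one pass instead of copying first
 * and sanitising afterwards.
 */
static void load_hv_regs_for_l2(struct hv_state *vcpu,
				const struct hv_state *l2_hv,
				const struct hv_state *l1_hv)
{
	/* L2 only gets facilities that L1 has; cause field untouched. */
	vcpu->hfscr = l2_hv->hfscr & (HFSCR_INTR_CAUSE | l1_hv->hfscr);

	/* No data address watchpoint matches in hypervisor state. */
	vcpu->dawrx0 = l2_hv->dawrx0 & ~DAWRX_HYP;
	vcpu->dawrx1 = l2_hv->dawrx1 & ~DAWRX_HYP;

	/* No instruction address breakpoint matches in HV state. */
	vcpu->ciabr = l2_hv->ciabr;
	if ((vcpu->ciabr & CIABR_PRIV) == CIABR_PRIV_HYPER)
		vcpu->ciabr &= ~CIABR_PRIV;
}
```

Since l2_hv is never written, the structure copied back to L1 at exit keeps exactly the values L1 supplied, which is the point of the v2 rework.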

>
>> +
>>  	vcpu->arch.ret = RESUME_GUEST;
>>  	vcpu->arch.trap = 0;
>>  	do {
>> @@ -338,6 +355,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>>  		r = kvmhv_run_single_vcpu(vcpu, hdec_exp, lpcr);
>>  	} while (is_kvmppc_resume_guest(r));
>>  
>> +	sanitise_vcpu_return_state(vcpu, &l2_hv);
>
> And this could be done in save_hv_return_state().
>
> I think?
>
> Question about HFSCR. Is it possible for some interrupt cause bit
> reaching the nested hypervisor for a bit that we thought we had
> enabled but was secretly masked off? I.e., do we have to filter
> HFSCR causes according to the facilities we secretly disabled?

Yes, we're copying the Cause bits unmodified. Currently it makes no
difference because L1 only checks for doorbells and everything else
leads to injecting a program interrupt into L2.

What I think is the correct thing to do is to only return into L1 with
the Cause bits pertaining to the facilities it has disabled (if L1 state
has a bit set but L2 state has not).
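The condition being described - a facility bit set in L1's HFSCR but absent from what L2 was granted - could be tested with a small predicate like this (hypothetical helper, not in the kernel tree; treats facilities as plain bit positions for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if L1 enabled facility bit 'fac' but it was withheld from L2,
 * i.e. L1 is the right place to handle the resulting
 * facility-unavailable cause.
 */
static bool l1_should_see_cause(uint64_t l1_hfscr, uint64_t l2_hfscr,
				int fac)
{
	uint64_t bit = 1ULL << fac;

	return (l1_hfscr & bit) && !(l2_hfscr & bit);
}
```

Causes for facilities that fail this test (because L0 withheld them from L1 too) would instead take the FAC_UNAVAIL-style path described below.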

For the facilities L0 has disabled in L1, we should handle them as if
L1 had tried to use the facility itself: instead of returning from
H_ENTER_NESTED into L1, do whatever we currently do under
BOOK3S_INTERRUPT_H_FAC_UNAVAIL for non-nested guests, which would
eventually mean injecting a program interrupt into L1. Because we're
not L2's hypervisor - L1 is - there is not much we'd want to do in L0
in terms of emulating the facility.

Does that make sense?

>
> Thanks,
> Nick

Patch

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 0cd0e7aad588..a60fccb2c4f2 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -132,21 +132,37 @@  static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
 	}
 }
 
-static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
+static void sanitise_vcpu_entry_state(struct kvm_vcpu *vcpu,
+				      const struct hv_guest_state *l2_hv,
+				      const struct hv_guest_state *l1_hv)
 {
 	/*
 	 * Don't let L1 enable features for L2 which we've disabled for L1,
 	 * but preserve the interrupt cause field.
 	 */
-	hr->hfscr &= (HFSCR_INTR_CAUSE | vcpu->arch.hfscr);
+	vcpu->arch.hfscr = l2_hv->hfscr & (HFSCR_INTR_CAUSE | l1_hv->hfscr);
 
 	/* Don't let data address watchpoint match in hypervisor state */
-	hr->dawrx0 &= ~DAWRX_HYP;
-	hr->dawrx1 &= ~DAWRX_HYP;
+	vcpu->arch.dawrx0 = l2_hv->dawrx0 & ~DAWRX_HYP;
+	vcpu->arch.dawrx1 = l2_hv->dawrx1 & ~DAWRX_HYP;
 
 	/* Don't let completed instruction address breakpt match in HV state */
-	if ((hr->ciabr & CIABR_PRIV) == CIABR_PRIV_HYPER)
-		hr->ciabr &= ~CIABR_PRIV;
+	if ((l2_hv->ciabr & CIABR_PRIV) == CIABR_PRIV_HYPER)
+		vcpu->arch.ciabr = l2_hv->ciabr & ~CIABR_PRIV;
+}
+
+
+/*
+ * During sanitise_vcpu_entry_state() we might have used bits from L1
+ * state to restrict what the L2 state is allowed to be. Since L1 is
+ * not allowed to read the HV registers, do not include these
+ * modifications in the return state.
+ */
+static void sanitise_vcpu_return_state(struct kvm_vcpu *vcpu,
+				       const struct hv_guest_state *l2_hv)
+{
+	vcpu->arch.hfscr = ((~HFSCR_INTR_CAUSE & l2_hv->hfscr) |
+			(HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
 }
 
 static void restore_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
@@ -324,9 +340,10 @@  long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
 		LPCR_LPES | LPCR_MER;
 	lpcr = (vc->lpcr & ~mask) | (l2_hv.lpcr & mask);
-	sanitise_hv_regs(vcpu, &l2_hv);
 	restore_hv_regs(vcpu, &l2_hv);
 
+	sanitise_vcpu_entry_state(vcpu, &l2_hv, &saved_l1_hv);
+
 	vcpu->arch.ret = RESUME_GUEST;
 	vcpu->arch.trap = 0;
 	do {
@@ -338,6 +355,8 @@  long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 		r = kvmhv_run_single_vcpu(vcpu, hdec_exp, lpcr);
 	} while (is_kvmppc_resume_guest(r));
 
+	sanitise_vcpu_return_state(vcpu, &l2_hv);
+
 	/* save L2 state for return */
 	l2_regs = vcpu->arch.regs;
 	l2_regs.msr = vcpu->arch.shregs.msr;