[02/23] KVM: PPC: Book3S HV: Increment mmu_notifier_seq when modifying radix pte rc bits

Message ID 20190826062109.7573-3-sjitindarsingh@gmail.com
State Deferred
Series KVM: PPC: Book3S HV: Support for nested HPT guests

Commit Message

Suraj Jitindar Singh Aug. 26, 2019, 6:20 a.m. UTC
The kvm mmu_notifier_seq is used to communicate that a page mapping has
changed to code which is using that information in constructing a different
but dependent page mapping. For example, when constructing a mapping for a
nested guest, it is used to detect when the guest mapping has changed, which
would render the nested guest mapping invalid.

When running nested guests it is important that the rc bits are kept in
sync between the two ptes on the host in which they exist: the pte for the
guest, and the pte for the nested guest. This is done when inserting the
nested pte in __kvmhv_nested_page_fault_radix() by reducing the rc bits
being set in the nested pte to those already set in the guest pte. When
setting the bits in the nested pte in response to an interrupt in
kvmhv_handle_nested_set_rc_radix(), the same bits are also set in the
guest pte, and if that fails the bits are not set in the nested pte.
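
The reduction of rc bits described above can be sketched as a simple mask
operation. This is a minimal userspace illustration, not the kernel code: the
SKETCH_* constants and sketch_reduce_rc() are hypothetical names, and the bit
positions are chosen for the example only.

```c
#include <stdint.h>

/* Hypothetical bit positions for illustration; not the kernel's definitions. */
#define SKETCH_PAGE_ACCESSED (UINT64_C(1) << 8)
#define SKETCH_PAGE_DIRTY    (UINT64_C(1) << 7)
#define SKETCH_RC_MASK       (SKETCH_PAGE_ACCESSED | SKETCH_PAGE_DIRTY)

/*
 * Return nested_pte with its rc bits limited to those already set in
 * guest_pte. All non-rc bits of nested_pte are passed through unchanged.
 */
static uint64_t sketch_reduce_rc(uint64_t nested_pte, uint64_t guest_pte)
{
	uint64_t allowed = guest_pte & SKETCH_RC_MASK;

	return (nested_pte & ~SKETCH_RC_MASK) | (nested_pte & allowed);
}
```

With this shape, a nested pte can never carry a referenced or changed bit that
the guest pte lacks at the moment the guest pte is read.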

When the host wants to remove rc bits from the guest pte in
kvm_radix_test_clear_dirty(), it first removes them from the guest pte
and then from any corresponding nested ptes which map the same guest
page. This leaves a window in which the rc bits could get out of sync
between the two ptes: they might have been seen as set in the guest pte
and set in the nested pte on that basis, while the host is in the
process of removing those rc bits, leading to an inconsistency.

In kvm_radix_test_clear_dirty() the mmu_lock spin lock is held across
removing the rc bits from the guest and nested pte, and the same is done
across updating the rc bits in the guest and nested pte in
kvmhv_handle_nested_set_rc_radix() and so there is no window for them to
get out of sync in this case. However when constructing the pte in
__kvmhv_nested_page_fault_radix() we drop the mmu_lock spin lock between
reading the guest pte and inserting the nested pte, presenting a window
for them to get out of sync. The rc bits could have been observed as
set in the guest pte and set in the nested pte accordingly, but in the
meantime the rc bits in the guest pte could have been cleared. Since the
nested pte wasn't yet inserted, there is no way for
kvm_radix_test_clear_dirty() to clear them there, and so an
inconsistency can arise.

To avoid the possibility of the rc bits getting out of sync, increment
the mmu_notifier_seq in kvm_radix_test_clear_dirty() under the mmu_lock
when clearing rc bits. This means that when inserting the nested pte in
__kvmhv_nested_page_fault_radix() we will bail out and retry when we see
that the mmu_seq differs, indicating that the guest pte has changed.
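
The snapshot-and-retry idea can be modelled in a few lines of userspace C.
This is only a sketch under stated assumptions: struct sketch_kvm and the
sketch_* functions are hypothetical stand-ins, locking is omitted (each
function is assumed to run with the equivalent of mmu_lock held), and the
dirty bit position is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_DIRTY (UINT64_C(1) << 7) /* hypothetical bit position */

/* Simplified stand-in for the state protected by the mmu_lock. */
struct sketch_kvm {
	unsigned long mmu_seq;	/* models kvm->mmu_notifier_seq */
	uint64_t guest_pte;
	uint64_t nested_pte;	/* 0 means "not inserted yet" */
};

/* Models the clear path: drop the dirty bit and bump the sequence count. */
static bool sketch_test_clear_dirty(struct sketch_kvm *kvm)
{
	bool was_dirty = (kvm->guest_pte & SKETCH_PAGE_DIRTY) != 0;

	if (was_dirty) {
		kvm->guest_pte &= ~SKETCH_PAGE_DIRTY;
		kvm->nested_pte &= ~SKETCH_PAGE_DIRTY;
		kvm->mmu_seq++;	/* tell in-flight faults the pte changed */
	}
	return was_dirty;
}

/* Models the insert step: refuse if the sequence moved since the snapshot. */
static bool sketch_insert_nested(struct sketch_kvm *kvm, uint64_t pte,
				 unsigned long seq_snapshot)
{
	if (kvm->mmu_seq != seq_snapshot)
		return false;	/* caller must re-read the guest pte and retry */
	kvm->nested_pte = pte;
	return true;
}
```

A fault handler in this model snapshots mmu_seq and the guest pte, does its
work without the lock, then calls sketch_insert_nested(). If a clear ran in
between, the insert is refused and the stale dirty bit never reaches the
nested pte.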

Fixes: ae59a7e1945b ("KVM: PPC: Book3S HV: Keep rc bits in shadow pgtable in sync with host")

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 2d415c36a61d..310d8dde9a48 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1044,6 +1044,8 @@  static int kvm_radix_test_clear_dirty(struct kvm *kvm,
 		kvmhv_update_nest_rmap_rc_list(kvm, rmapp, _PAGE_DIRTY, 0,
 					       old & PTE_RPN_MASK,
 					       1UL << shift);
+		/* Notify anyone trying to map the page that it has changed */
+		kvm->mmu_notifier_seq++;
 	return ret;