
[v2] KVM: ppc64: Enable ring-based dirty memory tracking on ppc64: enable config options and implement relevant functions

Message ID 20230717071208.1134783-1-kconsul@linux.vnet.ibm.com
State Changes Requested
Series [v2] KVM: ppc64: Enable ring-based dirty memory tracking on ppc64: enable config options and implement relevant functions

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 23 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.

Commit Message

Kautuk Consul July 17, 2023, 7:12 a.m. UTC
- Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL, as ppc64 is weakly
  ordered.
- Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP because
  kvmppc_xive_native_set_attr() is called in the context of an ioctl
  syscall and, when setting the KVM_DEV_XIVE_EQ_SYNC attribute, calls
  kvmppc_xive_native_eq_sync(), which calls mark_page_dirty() when
  there is no running vcpu. Implement
  kvm_arch_allow_write_without_running_vcpu() to always return true so
  that mark_page_dirty_in_slot() marks the page dirty in
  memslot->dirty_bitmap in this case.
- Define KVM_DIRTY_LOG_PAGE_OFFSET, the page offset within the vcpu
  mmap area at which userspace maps the dirty ring.
- Implement kvm_arch_mmu_enable_log_dirty_pt_masked(), which the
  generic KVM code requires.
- Add a check to kvmppc_vcpu_run_hv() for whether the dirty ring is
  soft full.
- Implement kvm_arch_flush_remote_tlbs_memslot() to support the
  CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT config option.
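The acquire/release requirement behind the first item can be illustrated with a minimal, single-threaded userspace sketch of the dirty ring protocol documented for KVM_CAP_DIRTY_LOG_RING_ACQ_REL. It assumes a hypothetical 8-entry ring; struct dirty_gfn and the DIRTY_GFN_F_* macros are local stand-ins for the uapi struct kvm_dirty_gfn and KVM_DIRTY_GFN_F_DIRTY/KVM_DIRTY_GFN_F_RESET, and ring_push()/ring_harvest() are hypothetical names, not KVM or QEMU functions. On a weakly ordered CPU such as ppc64, the release store that publishes an entry paired with the acquire load that observes it is what guarantees the reader sees the slot/offset written before the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical local mirror of the uapi struct kvm_dirty_gfn and its
 * flag bits, redefined so the sketch is self-contained. */
#define DIRTY_GFN_F_DIRTY (1u << 0)
#define DIRTY_GFN_F_RESET (1u << 1)

struct dirty_gfn {
	_Atomic uint32_t flags;
	uint32_t slot;
	uint64_t offset;
};

#define RING_SIZE 8u /* hypothetical; must be a power of two */
static struct dirty_gfn ring[RING_SIZE];

/* Producer (the kernel's role): fill in slot/offset, then publish the
 * entry by storing DIRTY with release semantics. */
static void ring_push(uint32_t index, uint32_t slot, uint64_t offset)
{
	struct dirty_gfn *e = &ring[index & (RING_SIZE - 1)];

	/* The slot must not still hold an unharvested entry; in KVM the
	 * soft-full exit is intended to stop the vCPU before that happens. */
	assert(!(atomic_load_explicit(&e->flags, memory_order_relaxed) &
		 DIRTY_GFN_F_DIRTY));
	e->slot = slot;
	e->offset = offset;
	atomic_store_explicit(&e->flags, DIRTY_GFN_F_DIRTY, memory_order_release);
}

/* Consumer (userspace's role): load flags with acquire semantics before
 * reading slot/offset, then mark the entry harvested by storing RESET
 * with release semantics. Returns the number of offsets copied to out[]. */
static unsigned ring_harvest(uint32_t *fetch_index, uint64_t *out, unsigned max)
{
	unsigned n = 0;

	while (n < max) {
		struct dirty_gfn *e = &ring[*fetch_index & (RING_SIZE - 1)];
		uint32_t flags = atomic_load_explicit(&e->flags, memory_order_acquire);

		if (!(flags & DIRTY_GFN_F_DIRTY))
			break;
		out[n++] = e->offset;
		atomic_store_explicit(&e->flags, DIRTY_GFN_F_RESET, memory_order_release);
		(*fetch_index)++;
	}
	return n;
}
```

Pushing three entries and then calling ring_harvest() collects all three and leaves them with only the RESET bit set; with the real interface, userspace would then issue KVM_RESET_DIRTY_RINGS so the kernel can reclaim those entries.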

Test Results
============
Live migration testing showed an improvement of around 150-180 ms in
overall migration time with this patch (20694 ms vs. 20873 ms total
time in the runs below).

Bare Metal P9 testing with patch:
--------------------------------
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 20694 ms
downtime: 73 ms
setup: 23 ms
transferred ram: 2604370 kbytes
throughput: 1033.55 mbps
remaining ram: 0 kbytes
total ram: 16777216 kbytes
duplicate: 3555398 pages
skipped: 0 pages
normal: 642026 pages
normal bytes: 2568104 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 32455
precopy ram: 2581549 kbytes
downtime ram: 22820 kbytes

Bare Metal P9 testing without patch:
-----------------------------------
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 20873 ms
downtime: 62 ms
setup: 19 ms
transferred ram: 2612900 kbytes
throughput: 1027.83 mbps
remaining ram: 0 kbytes
total ram: 16777216 kbytes
duplicate: 3553329 pages
skipped: 0 pages
normal: 644159 pages
normal bytes: 2576636 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 88297
precopy ram: 2603645 kbytes
downtime ram: 9254 kbytes
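The headline figure follows from the two runs above. A small sketch with the numbers copied from the outputs (struct migrate_run and the helper names are hypothetical) makes the arithmetic explicit: total time drops by 179 ms, about 0.86%, while downtime rises by 11 ms (73 ms vs. 62 ms) and the dirty sync count falls from 4 to 3.

```c
#include <assert.h>

/* Figures copied from the two "info migrate" outputs above. */
struct migrate_run {
	long total_ms;
	long downtime_ms;
	long dirty_syncs;
};

static const struct migrate_run with_patch = { 20694, 73, 3 };
static const struct migrate_run without_patch = { 20873, 62, 4 };

/* Milliseconds of total migration time saved by the patched run
 * (negative if it was slower). */
static long total_time_saved_ms(const struct migrate_run *with,
				const struct migrate_run *without)
{
	return without->total_ms - with->total_ms;
}

/* The same saving as a percentage of the unpatched total time. */
static double total_time_saved_pct(const struct migrate_run *with,
				   const struct migrate_run *without)
{
	assert(without->total_ms > 0);
	return 100.0 * (double)total_time_saved_ms(with, without) /
	       (double)without->total_ms;
}
```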

Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
---
 Documentation/virt/kvm/api.rst      |  2 +-
 arch/powerpc/include/uapi/asm/kvm.h |  2 ++
 arch/powerpc/kvm/Kconfig            |  2 ++
 arch/powerpc/kvm/book3s.c           | 46 +++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv.c        |  3 ++
 include/linux/kvm_dirty_ring.h      |  5 ++++
 virt/kvm/dirty_ring.c               |  1 +
 7 files changed, 60 insertions(+), 1 deletion(-)

Comments

Aneesh Kumar K V July 17, 2023, 8:06 a.m. UTC | #1
Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:

> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index c0ddd3035462..84c180ccd178 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8114,7 +8114,7 @@ regardless of what has actually been exposed through the CPUID leaf.
>  8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>  ----------------------------------------------------------
>  
> -:Architectures: x86, arm64
> +:Architectures: x86, arm64, ppc64
>  :Parameters: args[0] - size of the dirty log ring
>  
>  KVM is capable of tracking dirty memory using ring buffers that are
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index 9f18fa090f1f..f722309ed7fb 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -33,6 +33,8 @@
>  /* Not always available, but if it is, this is the correct offset.  */
>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>  
> +#define KVM_DIRTY_LOG_PAGE_OFFSET 64
> +
>  struct kvm_regs {
>  	__u64 pc;
>  	__u64 cr;
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index 902611954200..c93354ec3bd5 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -26,6 +26,8 @@ config KVM
>  	select IRQ_BYPASS_MANAGER
>  	select HAVE_KVM_IRQ_BYPASS
>  	select INTERVAL_TREE
> +	select HAVE_KVM_DIRTY_RING_ACQ_REL
> +	select NEED_KVM_DIRTY_RING_WITH_BITMAP
>  
>  config KVM_BOOK3S_HANDLER
>  	bool
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 686d8d9eda3e..01aa4fe2c424 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -32,6 +32,7 @@
>  #include <asm/mmu_context.h>
>  #include <asm/page.h>
>  #include <asm/xive.h>
> +#include <asm/book3s/64/radix.h>
>  
>  #include "book3s.h"
>  #include "trace.h"
> @@ -1070,6 +1071,51 @@ int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
>  
>  #endif /* CONFIG_KVM_XICS */
>  
> +/*
> + * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
> + * dirty pages.
> + *
> + * It write protects selected pages to enable dirty logging for them.
> + */
> +void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
> +					     struct kvm_memory_slot *slot,
> +					     gfn_t gfn_offset,
> +					     unsigned long mask)
> +{
> +	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
> +	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
> +	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
> +
> +	while (start < end) {
> +		pte_t *ptep;
> +		unsigned int shift;
> +
> +		ptep = find_kvm_secondary_pte(kvm, start, &shift);
> +
> +		if (radix_enabled())
> +			__radix_pte_update(ptep, _PAGE_WRITE, 0);
> +		else
> +			*ptep = __pte(pte_val(*ptep) & ~(_PAGE_WRITE));
> +
> +		start += PAGE_SIZE;
> +	}
>


I am not sure about that. You are walking partition scoped table here
and you are checking for hypervisor translation mode and doing pte
updates. That doesn't look correct.

-aneesh
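Separately from the radix/hash question raised above, the GFN arithmetic at the top of the quoted hunk can be reproduced in userspace. This is a minimal sketch, assuming GCC/Clang's __builtin_ctzl()/__builtin_clzl() as stand-ins for the kernel's __ffs()/__fls(); gfn_t is redefined locally and the helper names are hypothetical. It contrasts the range form used in the hunk, which also covers any clear bits between the lowest and highest set bits, with visiting only the GFNs whose bit is set. Both __ffs() and __fls() are undefined for a zero mask, hence the asserts.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t; /* local stand-in for the kernel's gfn_t */

/* Userspace equivalent of __ffs(): index of the lowest set bit. */
static unsigned lowest_bit(unsigned long mask)
{
	assert(mask != 0);
	return (unsigned)__builtin_ctzl(mask);
}

/* Userspace equivalent of __fls(): index of the highest set bit. */
static unsigned highest_bit(unsigned long mask)
{
	assert(mask != 0);
	return (unsigned)(sizeof(mask) * 8 - 1 - (unsigned)__builtin_clzl(mask));
}

/* Range form, as in the hunk: the inclusive [first, last] GFN span that
 * covers every set bit. The hunk then shifts by PAGE_SHIFT and uses an
 * exclusive end (last + 1). */
static void mask_to_gfn_range(gfn_t base_gfn, gfn_t gfn_offset,
			      unsigned long mask, gfn_t *first, gfn_t *last)
{
	gfn_t base = base_gfn + gfn_offset;

	*first = base + lowest_bit(mask);
	*last = base + highest_bit(mask);
}

/* Per-bit form: emit only the GFNs whose bit is set in mask.
 * Returns how many GFNs were written to out[]. */
static unsigned mask_to_gfns(gfn_t base_gfn, gfn_t gfn_offset,
			     unsigned long mask, gfn_t *out)
{
	unsigned n = 0;

	while (mask) {
		out[n++] = base_gfn + gfn_offset + lowest_bit(mask);
		mask &= mask - 1; /* clear the lowest set bit */
	}
	return n;
}
```

For example, with base_gfn 1000, gfn_offset 64 and mask 0x45 (bits 0, 2 and 6), the range form spans GFNs 1064 to 1070, seven pages, while the per-bit form visits only 1064, 1066 and 1070.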

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c0ddd3035462..84c180ccd178 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8114,7 +8114,7 @@  regardless of what has actually been exposed through the CPUID leaf.
 8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 ----------------------------------------------------------
 
-:Architectures: x86, arm64
+:Architectures: x86, arm64, ppc64
 :Parameters: args[0] - size of the dirty log ring
 
 KVM is capable of tracking dirty memory using ring buffers that are
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 9f18fa090f1f..f722309ed7fb 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -33,6 +33,8 @@ 
 /* Not always available, but if it is, this is the correct offset.  */
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
+
 struct kvm_regs {
 	__u64 pc;
 	__u64 cr;
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 902611954200..c93354ec3bd5 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -26,6 +26,8 @@  config KVM
 	select IRQ_BYPASS_MANAGER
 	select HAVE_KVM_IRQ_BYPASS
 	select INTERVAL_TREE
+	select HAVE_KVM_DIRTY_RING_ACQ_REL
+	select NEED_KVM_DIRTY_RING_WITH_BITMAP
 
 config KVM_BOOK3S_HANDLER
 	bool
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 686d8d9eda3e..01aa4fe2c424 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -32,6 +32,7 @@ 
 #include <asm/mmu_context.h>
 #include <asm/page.h>
 #include <asm/xive.h>
+#include <asm/book3s/64/radix.h>
 
 #include "book3s.h"
 #include "trace.h"
@@ -1070,6 +1071,51 @@  int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
 
 #endif /* CONFIG_KVM_XICS */
 
+/*
+ * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
+ * dirty pages.
+ *
+ * It write protects selected pages to enable dirty logging for them.
+ */
+void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
+					     struct kvm_memory_slot *slot,
+					     gfn_t gfn_offset,
+					     unsigned long mask)
+{
+	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
+	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
+	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
+
+	while (start < end) {
+		pte_t *ptep;
+		unsigned int shift;
+
+		ptep = find_kvm_secondary_pte(kvm, start, &shift);
+
+		if (radix_enabled())
+			__radix_pte_update(ptep, _PAGE_WRITE, 0);
+		else
+			*ptep = __pte(pte_val(*ptep) & ~(_PAGE_WRITE));
+
+		start += PAGE_SIZE;
+	}
+}
+
+#ifdef CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP
+bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
+{
+	return true;
+}
+#endif
+
+#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
+void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
+					const struct kvm_memory_slot *memslot)
+{
+	kvm_flush_remote_tlbs(kvm);
+}
+#endif
+
 static int kvmppc_book3s_init(void)
 {
 	int r;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 130bafdb1430..1d1264ea72c4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4804,6 +4804,9 @@  static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 		return -EINTR;
 	}
 
+	if (kvm_dirty_ring_check_request(vcpu))
+		return 0;
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	/*
 	 * Don't allow entry with a suspended transaction, because
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 4862c98d80d3..a00301059da5 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -69,6 +69,11 @@  static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 {
 }
 
+static inline bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 int kvm_cpu_dirty_log_size(void);
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index c1cd7dfe4a90..982ee7e1072f 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -209,6 +209,7 @@  bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
 
 	return false;
 }
+EXPORT_SYMBOL_GPL(kvm_dirty_ring_check_request);
 
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
 {