Patchwork [3/3] pseries: Correctly create ibm,segment-page-sizes property

Submitter David Gibson
Date Sept. 30, 2011, 7:50 a.m.
Message ID <1317369040-30437-4-git-send-email-david@gibson.dropbear.id.au>
Permalink /patch/117054/
State New

Comments

David Gibson - Sept. 30, 2011, 7:50 a.m.
Current versions of the PowerPC architecture require and fully define
4kB and 16MB page sizes.  Other pagesizes (e.g. 64kB, 1MB) are
permitted and are often supported, but the exact encodings used to set
them up can vary from chip to chip.

The supported pagesizes and required encodings are advertised to the
OS via the ibm,segment-page-sizes property in the device tree.
Currently we do not put this property in our device tree, so guests
are restricted to the architected 4kB and 16MB pagesizes.

The base sizes are all that we implement in TCG; with KVM, however,
the guest can use anything supported by the host as long as the guest's
base memory is backed by pages at least as large.  Furthermore, in
order to use any extended page sizes, the guest needs to know the
correct encodings for the host.

This patch, therefore, reads the host's pagesize information, filters
it based on the pagesize backing RAM, and passes it into the guest.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr.c           |  127 ++++++++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/kvm.c     |   43 +++++++++++++++++
 target-ppc/kvm_ppc.h |    6 ++
 3 files changed, 176 insertions(+), 0 deletions(-)
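
For reference, the property's value is a flat array of 32-bit cells, one record per supported segment (base) page size: base page shift, SLB encoding, number of actual page sizes, then one (page shift, hash PTE encoding) pair per actual size; this is the layout create_page_sizes_prop() below walks.  A minimal sketch of the architected default set (4kB and 16MB, mirroring the patch's def_page_sizes table; the array name here is purely illustrative):

#include <stdint.h>

/* Sketch only: ibm,segment-page-sizes cells for the architected
 * baseline.  In the device tree the cells are stored big-endian,
 * which is why the patch byte-swaps with be32_to_cpu()/cpu_to_be32()
 * while filtering. */
static const uint32_t example_segment_page_sizes[] = {
    /* 4kB base segment */
    0x0c,        /* base page shift: 12 -> 4kB */
    0x00,        /* SLB encoding: no L/LP bits */
    0x01,        /* one actual page size follows */
    0x0c, 0x00,  /* 4kB pages, hash PTE encoding 0 */

    /* 16MB base segment */
    0x18,        /* base page shift: 24 -> 16MB */
    0x100,       /* SLB encoding: large-page L bit (0x100) */
    0x01,        /* one actual page size follows */
    0x18, 0x00,  /* 16MB pages, hash PTE encoding 0 */
};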
Alexander Graf - Oct. 7, 2011, 7:20 a.m.
On 30.09.2011, at 09:50, David Gibson wrote:

> Current versions of the PowerPC architecture require and fully define
> 4kB and 16MB page sizes.  Other pagesizes (e.g. 64kB, 1MB) are
> permitted and are often supported, but the exact encodings used to set
> them up can vary from chip to chip.
> 
> The supported pagesizes and required encodings are advertised to the
> OS via the ibm,segment-page-sizes property in the device tree.
> Currently we do not put this property in our device tree, so guests
> are restricted to the architected 4kB and 16MB pagesizes.
> 
> The base sizes are all that we implement in TCG; with KVM, however,
> the guest can use anything supported by the host as long as the guest's
> base memory is backed by pages at least as large.  Furthermore, in
> order to use any extended page sizes, the guest needs to know the
> correct encodings for the host.
> 
> This patch, therefore, reads the host's pagesize information, filters
> it based on the pagesize backing RAM, and passes it into the guest.
> 
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/spapr.c           |  127 ++++++++++++++++++++++++++++++++++++++++++++++++++
> target-ppc/kvm.c     |   43 +++++++++++++++++
> target-ppc/kvm_ppc.h |    6 ++
> 3 files changed, 176 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/spapr.c b/hw/spapr.c
> index 8089d83..72b6c6a 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -24,6 +24,8 @@
>  * THE SOFTWARE.
>  *
>  */
> +#include <sys/vfs.h>
> +
> #include "sysemu.h"
> #include "hw.h"
> #include "elf.h"
> @@ -88,6 +90,122 @@ qemu_irq spapr_allocate_irq(uint32_t hint, uint32_t *irq_num)
>     return qirq;
> }
> 
> +#define HUGETLBFS_MAGIC       0x958458f6
> +
> +static long getrampagesize(void)
> +{
> +    struct statfs fs;
> +    int ret;
> +
> +    if (!mem_path) {
> +        /* guest RAM is backed by normal anonymous pages */
> +        return getpagesize();
> +    }
> +
> +    do {
> +        ret = statfs(mem_path, &fs);
> +    } while (ret != 0 && errno == EINTR);
> +
> +    if (ret != 0) {
> +        fprintf(stderr, "Couldn't statfs() memory path: %s\n",
> +                strerror(errno));
> +        exit(1);
> +    }
> +
> +    if (fs.f_type != HUGETLBFS_MAGIC) {
> +        /* Explicit mempath, but it's ordinary pages */
> +        return getpagesize();
> +    }
> +
> +    /* It's hugepage, return the huge page size */
> +    return fs.f_bsize;
> +}

Would this function compile and work on win32 hosts? If not, it should probably go to kvm.c.

> +
> +static size_t create_page_sizes_prop(uint32_t *prop, size_t maxsize)
> +{
> +    int cells;
> +    target_ulong ram_page_size = getrampagesize();
> +    int i, j;
> +
> +    if (!kvm_enabled()) {
> +        /* For the supported CPUs in emulation, we support just 4k and
> +         * 16MB pages, with the usual encodings.  This is the default
> +         * set the guest will assume if we don't specify anything */
> +        return 0;
> +    }
> +
> +    cells = kvmppc_read_segment_page_sizes(prop, maxsize / sizeof(uint32_t));

Shouldn't we rather be asking the kvm kernel module to tell us its supported segment sizes? Just because the host doesn't support 256MB page size doesn't mean we can't expose it to the guest, right? Depending on the KVM mode of course.

For HV we would pass through the hardware ones. For PR we could pretty much support anything since we're shadowing the htab. But there it'd be a win too, since we would get fewer page table entries and could potentially also back things with huge pages.

Also, this depends heavily on the guest CPU architecture. For 970, we can't support anything but 4k and 16MB (and even that one is crap). For p7, things are a lot more flexible. But we need to make sure that what we tell the guest is actually possible to do on the particular CPU we're emulating / virtualizing.


Alex
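
Whichever source the page size list ultimately comes from, the patch as posted only advertises sizes that the RAM backing can satisfy: an entry is kept only if (1UL << shift) <= ram_page_size, applied to the segment base size and to each actual page size.  A minimal sketch of that predicate and its effect (the helper name is illustrative, not part of the patch):

#include <stdbool.h>
#include <stdint.h>

/* Sketch: a page size is usable by the guest only if guest RAM is
 * backed by host pages at least that large, matching the filter in
 * create_page_sizes_prop(). */
static bool page_size_usable(uint32_t shift, unsigned long ram_page_size)
{
    return (1UL << shift) <= ram_page_size;
}

/* e.g. with guest RAM backed by 64kB host pages (ram_page_size == 0x10000):
 *   page_size_usable(12, 0x10000) -> true    4kB entries kept
 *   page_size_usable(16, 0x10000) -> true    64kB entries kept
 *   page_size_usable(24, 0x10000) -> false   16MB entries dropped
 */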

Patch

diff --git a/hw/spapr.c b/hw/spapr.c
index 8089d83..72b6c6a 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -24,6 +24,8 @@ 
  * THE SOFTWARE.
  *
  */
+#include <sys/vfs.h>
+
 #include "sysemu.h"
 #include "hw.h"
 #include "elf.h"
@@ -88,6 +90,122 @@  qemu_irq spapr_allocate_irq(uint32_t hint, uint32_t *irq_num)
     return qirq;
 }
 
+#define HUGETLBFS_MAGIC       0x958458f6
+
+static long getrampagesize(void)
+{
+    struct statfs fs;
+    int ret;
+
+    if (!mem_path) {
+        /* guest RAM is backed by normal anonymous pages */
+        return getpagesize();
+    }
+
+    do {
+        ret = statfs(mem_path, &fs);
+    } while (ret != 0 && errno == EINTR);
+
+    if (ret != 0) {
+        fprintf(stderr, "Couldn't statfs() memory path: %s\n",
+                strerror(errno));
+        exit(1);
+    }
+
+    if (fs.f_type != HUGETLBFS_MAGIC) {
+        /* Explicit mempath, but it's ordinary pages */
+        return getpagesize();
+    }
+
+    /* It's hugepage, return the huge page size */
+    return fs.f_bsize;
+}
+
+static size_t create_page_sizes_prop(uint32_t *prop, size_t maxsize)
+{
+    int cells;
+    target_ulong ram_page_size = getrampagesize();
+    int i, j;
+
+    if (!kvm_enabled()) {
+        /* For the supported CPUs in emulation, we support just 4k and
+         * 16MB pages, with the usual encodings.  This is the default
+         * set the guest will assume if we don't specify anything */
+        return 0;
+    }
+
+    cells = kvmppc_read_segment_page_sizes(prop, maxsize / sizeof(uint32_t));
+    if (cells < 0) {
+        fprintf(stderr, "Error reading host's "
+                "ibm,segment-page-sizes property\n");
+        exit(1);
+    }
+
+    if (cells == 0) {
+        /* Host specifies no pagesizes, so use the architected ones */
+        uint32_t def_page_sizes[] = {0xc, 0x0, 0x1, 0xc, 0x0, /* 4kB */
+                                     0x18, 0x100, 0x1, 0x18, 0x0, }; /* 16MB */
+
+        assert(maxsize >= sizeof(def_page_sizes));
+
+        memcpy(prop, def_page_sizes, sizeof(def_page_sizes));
+        cells = sizeof(def_page_sizes) / sizeof(def_page_sizes[0]);
+    }
+
+    /* Filter based on pagesize backing RAM */
+    i = j = 0;
+    while (i < cells) {
+        uint32_t baseshift, slbenc, numsizes, k, n;
+
+        if ((i + 3) >= cells) {
+            fprintf(stderr, "Malformed ibm,segment-page-sizes on host\n");
+            exit(1);
+        }
+
+        baseshift = be32_to_cpu(prop[i++]);
+        slbenc = be32_to_cpu(prop[i++]);
+        numsizes = be32_to_cpu(prop[i++]);
+
+        if ((i + numsizes*2) >= cells) {
+            fprintf(stderr, "Malformed ibm,segment-page-sizes on host\n");
+            exit(1);
+        }
+
+        /* Too big, skip */
+        if ((1UL << baseshift) > ram_page_size) {
+            i += numsizes*2;
+            continue;
+        }
+
+        n = 0;
+        for (k = 0; k < numsizes; k++) {
+            uint32_t shift = be32_to_cpu(prop[i + k*2]);
+
+            if ((1UL << shift) <= ram_page_size) {
+                n++;
+            }
+        }
+
+        prop[j++] = cpu_to_be32(baseshift);
+        prop[j++] = cpu_to_be32(slbenc);
+        prop[j++] = cpu_to_be32(n);
+
+        for (k = 0; k < numsizes; k++) {
+            uint32_t shift = be32_to_cpu(prop[i++]);
+            uint32_t hashenc = be32_to_cpu(prop[i++]);
+
+            if ((1UL << shift) <= ram_page_size) {
+                prop[j++] = cpu_to_be32(shift);
+                prop[j++] = cpu_to_be32(hashenc);
+            }
+        }
+    }
+
+    assert(i == cells);
+
+    return j * sizeof(uint32_t);
+}
+
 static void *spapr_create_fdt_skel(const char *cpu_model,
                                    target_phys_addr_t rma_size,
                                    target_phys_addr_t initrd_base,
@@ -189,6 +307,8 @@  static void *spapr_create_fdt_skel(const char *cpu_model,
             kvmppc_read_int_cpu_dt("clock-frequency") : 1000000000;
         uint32_t vmx = kvm_enabled() ? kvmppc_read_int_cpu_dt("ibm,vmx") : 0;
         uint32_t dfp = kvm_enabled() ? kvmppc_read_int_cpu_dt("ibm,dfp") : 0;
+        uint32_t page_sizes_prop[15];
+        size_t page_sizes_prop_size;
 
         if ((index % smt) != 0) {
             continue;
@@ -251,6 +371,13 @@  static void *spapr_create_fdt_skel(const char *cpu_model,
             _FDT((fdt_property_cell(fdt, "ibm,dfp", dfp)));
         }
 
+        page_sizes_prop_size = create_page_sizes_prop(page_sizes_prop,
+                                                      sizeof(page_sizes_prop));
+        if (page_sizes_prop_size) {
+            _FDT((fdt_property(fdt, "ibm,segment-page-sizes",
+                               page_sizes_prop, page_sizes_prop_size)));
+        }
+
         _FDT((fdt_end_node(fdt)));
     }
 
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index db2326d..b399845 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -689,6 +689,49 @@  uint64_t kvmppc_read_int_cpu_dt(const char *propname)
     return 0;
 }
 
+/* Read the host CPU node's ibm,segment-page-sizes property into prop
+ * (at most maxcells 32-bit cells).  Returns the number of cells read,
+ * 0 if the property is absent (architected defaults apply), or -1 on
+ * error (including when the supplied buffer is too small) */
+int kvmppc_read_segment_page_sizes(uint32_t *prop, int maxcells)
+{
+    char buf[PATH_MAX];
+    FILE *f;
+    int ncells;
+
+    if (kvmppc_find_cpu_dt(buf, sizeof(buf))) {
+        return -1;
+    }
+
+    strncat(buf, "/ibm,segment-page-sizes", sizeof(buf) - strlen(buf) - 1);
+
+    f = fopen(buf, "rb");
+    if (!f) {
+        if (errno == ENOENT) {
+            /* If missing, assume defaults */
+            return 0;
+        }
+        return -1;
+    }
+
+    ncells = fread(prop, sizeof(uint32_t), maxcells, f);
+    if (ncells == maxcells) {
+        uint32_t tmp;
+        int n;
+
+        n = fread(&tmp, sizeof(tmp), 1, f);
+        if ((n != 0) || !feof(f)) {
+            fclose(f);
+            /* Not enough space provided for the result */
+            return -1;
+        }
+    }
+
+    fclose(f);
+
+    return ncells;
+}
+
 int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len)
 {
     uint32_t *hc = (uint32_t*)buf;
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 0b9a58a..14fbaa6 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -15,6 +15,7 @@  void kvmppc_init(void);
 
 uint32_t kvmppc_get_tbfreq(void);
 uint64_t kvmppc_read_int_cpu_dt(const char *propname);
+int kvmppc_read_segment_page_sizes(uint32_t *prop, int maxcells);
 int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len);
 int kvmppc_set_interrupt(CPUState *env, int irq, int level);
 void kvmppc_set_papr(CPUState *env);
@@ -35,6 +36,11 @@  static inline uint64_t kvmppc_read_int_cpu_dt(const char *propname)
     return 0;
 }
 
+static inline int kvmppc_read_segment_page_sizes(uint32_t *prop, int maxcells)
+{
+    return -1;
+}
+
 static inline int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len)
 {
     return -1;