PPC: tell the guest about the time base frequency

Submitter Alexander Graf
Date Jan. 8, 2010, 5:43 p.m.
Message ID <1262972592-7317-1-git-send-email-agraf@suse.de>
Permalink /patch/42525/
State New

Comments

Alexander Graf - Jan. 8, 2010, 5:43 p.m.
Our guest systems need to know by how much the timebase increases every second,
so there usually is a "timebase-frequency" property in the cpu leaf of the
device tree.

This property is missing in OpenBIOS, as is the "clock-frequency" property that
tells the guest how fast the CPU is. FWIW that one is only used for
/proc/cpuinfo though.

With qemu, Linux's fallback timebase speed and qemu's internal timebase speed
match up. With KVM, that is no longer true. The guest is running at the same
timebase speed as the host.

This leads to massive timing problems. On my test machine, a "sleep 2" takes
about 14 seconds with KVM enabled.

This patch exports the timebase and clock frequencies to OpenBIOS, so it can
then put them into the device tree. I'll push the OpenBIOS change with the
NewWorld patch set, once that's either been reviewed or applied.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 hw/ppc.h             |    2 +
 hw/ppc_newworld.c    |   11 ++++++++
 hw/ppc_oldworld.c    |   11 ++++++++
 target-ppc/kvm.c     |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++
 target-ppc/kvm_ppc.h |    3 ++
 5 files changed, 97 insertions(+), 0 deletions(-)
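
The "sleep 2" symptom described in the commit message follows from simple arithmetic: a guest that counts ticks using an assumed frequency waits too long by the ratio of the assumed to the actual timebase frequency. A minimal sketch of that arithmetic is below. The helper name and the frequencies in the usage note are hypothetical, chosen only to reproduce a factor of 7; the thread does not state the real values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wall-clock seconds that actually pass when a guest waits for
 * `requested_sec` seconds' worth of ticks, counting with `assumed_hz`
 * while the timebase really advances at `actual_hz`.
 * Hypothetical helper for illustration, not part of the patch.
 */
static uint64_t real_sleep_seconds(uint64_t requested_sec,
                                   uint64_t assumed_hz,
                                   uint64_t actual_hz)
{
    assert(actual_hz != 0);
    uint64_t ticks_to_wait = requested_sec * assumed_hz;
    return ticks_to_wait / actual_hz;
}
```

With an assumed 700 MHz timebase and an actual 100 MHz timebase (both made up), `real_sleep_seconds(2, 700000000, 100000000)` yields 14.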
Blue Swirl - Jan. 8, 2010, 6:04 p.m.
On Fri, Jan 8, 2010 at 5:43 PM, Alexander Graf <agraf@suse.de> wrote:
> Our guest systems need to know by how much the timebase increases every second,
> so there usually is a "timebase-frequency" property in the cpu leaf of the
> device tree.
>
> This property is missing in OpenBIOS, as is the "clock-frequency" property that
> tells the guest how fast the CPU is. FWIW that one is only used for
> /proc/cpuinfo though.
>
> With qemu, Linux's fallback timebase speed and qemu's internal timebase speed
> match up. With KVM, that is no longer true. The guest is running at the same
> timebase speed as the host.
>
> This leads to massive timing problems. On my test machine, a "sleep 2" takes
> about 14 seconds with KVM enabled.
>
> This patch exports the timebase and clock frequencies to OpenBIOS, so it can
> then put them into the device tree. I'll push the OpenBIOS change with the
> NewWorld patch set, once that's either been reviewed or applied.

IIRC copying the host CPU frequency to guest was rejected earlier for x86.
Alexander Graf - Jan. 8, 2010, 6:07 p.m.
On 08.01.2010, at 19:04, Blue Swirl wrote:

> On Fri, Jan 8, 2010 at 5:43 PM, Alexander Graf <agraf@suse.de> wrote:
>> Our guest systems need to know by how much the timebase increases every second,
>> so there usually is a "timebase-frequency" property in the cpu leaf of the
>> device tree.
>> 
>> This property is missing in OpenBIOS, as is the "clock-frequency" property that
>> tells the guest how fast the CPU is. FWIW that one is only used for
>> /proc/cpuinfo though.
>> 
>> With qemu, Linux's fallback timebase speed and qemu's internal timebase speed
>> match up. With KVM, that is no longer true. The guest is running at the same
>> timebase speed as the host.
>> 
>> This leads to massive timing problems. On my test machine, a "sleep 2" takes
>> about 14 seconds with KVM enabled.
>> 
>> This patch exports the timebase and clock frequencies to OpenBIOS, so it can
>> then put them into the device tree. I'll push the OpenBIOS change with the
>> NewWorld patch set, once that's either been reviewed or applied.
> 
> IIRC copying the host CPU frequency to guest was rejected earlier for x86.

Well IIRC x86 Linux tries to find out the cpu frequency itself.
PPC Linux doesn't - it completely relies on entries in the device tree.

Alex
Blue Swirl - Jan. 8, 2010, 6:22 p.m.
On Fri, Jan 8, 2010 at 6:07 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 08.01.2010, at 19:04, Blue Swirl wrote:
>
>> On Fri, Jan 8, 2010 at 5:43 PM, Alexander Graf <agraf@suse.de> wrote:
>>> Our guest systems need to know by how much the timebase increases every second,
>>> so there usually is a "timebase-frequency" property in the cpu leaf of the
>>> device tree.
>>>
>>> This property is missing in OpenBIOS, as is the "clock-frequency" property that
>>> tells the guest how fast the CPU is. FWIW that one is only used for
>>> /proc/cpuinfo though.
>>>
>>> With qemu, Linux's fallback timebase speed and qemu's internal timebase speed
>>> match up. With KVM, that is no longer true. The guest is running at the same
>>> timebase speed as the host.
>>>
>>> This leads to massive timing problems. On my test machine, a "sleep 2" takes
>>> about 14 seconds with KVM enabled.
>>>
>>> This patch exports the timebase and clock frequencies to OpenBIOS, so it can
>>> then put them into the device tree. I'll push the OpenBIOS change with the
>>> NewWorld patch set, once that's either been reviewed or applied.
>>
>> IIRC copying the host CPU frequency to guest was rejected earlier for x86.
>
> Well IIRC x86 Linux tries to find out the cpu frequency itself.
> PPC Linux doesn't - it completely relies on entries in the device tree.

The frequency could be a parameter for the -cpu flag, like -cpu
970fx,frequency=1000000000.
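
A rough sketch of how such a suffix could be parsed is shown below. The `frequency=` option is only the proposal made in this message, not an existing QEMU option as far as this thread shows, and `parse_cpu_frequency` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look for ",frequency=<n>" in a -cpu argument such
 * as "970fx,frequency=1000000000". Returns 0 and stores the value on
 * success, -1 if the option is absent, malformed, or larger than 32 bits.
 */
static int parse_cpu_frequency(const char *cpu_arg, uint32_t *freq_hz)
{
    static const char key[] = ",frequency=";
    const char *p;
    char *end;
    unsigned long long v;

    assert(cpu_arg != NULL && freq_hz != NULL);

    p = strstr(cpu_arg, key);
    if (!p) {
        return -1;
    }
    p += sizeof(key) - 1;   /* skip past the key itself */

    errno = 0;
    v = strtoull(p, &end, 10);
    if (end == p || errno != 0 || v > UINT32_MAX) {
        return -1;
    }
    if (*end != '\0' && *end != ',') {
        return -1;          /* trailing garbage after the number */
    }
    *freq_hz = (uint32_t)v;
    return 0;
}
```

A caller would fall back to the static or host-derived default whenever this returns -1.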
Alexander Graf - Jan. 8, 2010, 6:27 p.m.
On 08.01.2010, at 19:22, Blue Swirl wrote:

> On Fri, Jan 8, 2010 at 6:07 PM, Alexander Graf <agraf@suse.de> wrote:
>> 
>> On 08.01.2010, at 19:04, Blue Swirl wrote:
>> 
>>> On Fri, Jan 8, 2010 at 5:43 PM, Alexander Graf <agraf@suse.de> wrote:
>>>> Our guest systems need to know by how much the timebase increases every second,
>>>> so there usually is a "timebase-frequency" property in the cpu leaf of the
>>>> device tree.
>>>> 
>>>> This property is missing in OpenBIOS, as is the "clock-frequency" property that
>>>> tells the guest how fast the CPU is. FWIW that one is only used for
>>>> /proc/cpuinfo though.
>>>> 
>>>> With qemu, Linux's fallback timebase speed and qemu's internal timebase speed
>>>> match up. With KVM, that is no longer true. The guest is running at the same
>>>> timebase speed as the host.
>>>> 
>>>> This leads to massive timing problems. On my test machine, a "sleep 2" takes
>>>> about 14 seconds with KVM enabled.
>>>> 
>>>> This patch exports the timebase and clock frequencies to OpenBIOS, so it can
>>>> then put them into the device tree. I'll push the OpenBIOS change with the
>>>> NewWorld patch set, once that's either been reviewed or applied.
>>> 
>>> IIRC copying the host CPU frequency to guest was rejected earlier for x86.
>> 
>> Well IIRC x86 Linux tries to find out the cpu frequency itself.
>> PPC Linux doesn't - it completely relies on entries in the device tree.
> 
> The frequency could be a parameter for the -cpu flag, like -cpu
> 970fx,frequency=1000000000.

We could implement that as an override to the current static code path.
For KVM it doesn't make any sense, as you really want to see the host frequency in the guest here. That's what users expect. Period.

Alex
Blue Swirl - Jan. 8, 2010, 6:29 p.m.
On Fri, Jan 8, 2010 at 6:27 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 08.01.2010, at 19:22, Blue Swirl wrote:
>
>> On Fri, Jan 8, 2010 at 6:07 PM, Alexander Graf <agraf@suse.de> wrote:
>>>
>>> On 08.01.2010, at 19:04, Blue Swirl wrote:
>>>
>>>> On Fri, Jan 8, 2010 at 5:43 PM, Alexander Graf <agraf@suse.de> wrote:
>>>>> Our guest systems need to know by how much the timebase increases every second,
>>>>> so there usually is a "timebase-frequency" property in the cpu leaf of the
>>>>> device tree.
>>>>>
>>>>> This property is missing in OpenBIOS, as is the "clock-frequency" property that
>>>>> tells the guest how fast the CPU is. FWIW that one is only used for
>>>>> /proc/cpuinfo though.
>>>>>
>>>>> With qemu, Linux's fallback timebase speed and qemu's internal timebase speed
>>>>> match up. With KVM, that is no longer true. The guest is running at the same
>>>>> timebase speed as the host.
>>>>>
>>>>> This leads to massive timing problems. On my test machine, a "sleep 2" takes
>>>>> about 14 seconds with KVM enabled.
>>>>>
>>>>> This patch exports the timebase and clock frequencies to OpenBIOS, so it can
>>>>> then put them into the device tree. I'll push the OpenBIOS change with the
>>>>> NewWorld patch set, once that's either been reviewed or applied.
>>>>
>>>> IIRC copying the host CPU frequency to guest was rejected earlier for x86.
>>>
>>> Well IIRC x86 Linux tries to find out the cpu frequency itself.
>>> PPC Linux doesn't - it completely relies on entries in the device tree.
>>
>> The frequency could be a parameter for the -cpu flag, like -cpu
>> 970fx,frequency=1000000000.
>
> We could implement that as an override to the current static code path.
> For KVM it doesn't make any sense, as you really want to see the host frequency in the guest here. That's what users expect. Period.

Even with 10 guests? What about migration?
Alexander Graf - Jan. 8, 2010, 6:37 p.m.
On 08.01.2010, at 19:29, Blue Swirl wrote:

> On Fri, Jan 8, 2010 at 6:27 PM, Alexander Graf <agraf@suse.de> wrote:
>> 
>> On 08.01.2010, at 19:22, Blue Swirl wrote:
>> 
>>> On Fri, Jan 8, 2010 at 6:07 PM, Alexander Graf <agraf@suse.de> wrote:
>>>> 
>>>> On 08.01.2010, at 19:04, Blue Swirl wrote:
>>>> 
>>>>> On Fri, Jan 8, 2010 at 5:43 PM, Alexander Graf <agraf@suse.de> wrote:
>>>>>> Our guest systems need to know by how much the timebase increases every second,
>>>>>> so there usually is a "timebase-frequency" property in the cpu leaf of the
>>>>>> device tree.
>>>>>> 
>>>>>> This property is missing in OpenBIOS, as is the "clock-frequency" property that
>>>>>> tells the guest how fast the CPU is. FWIW that one is only used for
>>>>>> /proc/cpuinfo though.
>>>>>> 
>>>>>> With qemu, Linux's fallback timebase speed and qemu's internal timebase speed
>>>>>> match up. With KVM, that is no longer true. The guest is running at the same
>>>>>> timebase speed as the host.
>>>>>> 
>>>>>> This leads to massive timing problems. On my test machine, a "sleep 2" takes
>>>>>> about 14 seconds with KVM enabled.
>>>>>> 
>>>>>> This patch exports the timebase and clock frequencies to OpenBIOS, so it can
>>>>>> then put them into the device tree. I'll push the OpenBIOS change with the
>>>>>> NewWorld patch set, once that's either been reviewed or applied.
>>>>> 
>>>>> IIRC copying the host CPU frequency to guest was rejected earlier for x86.
>>>> 
>>>> Well IIRC x86 Linux tries to find out the cpu frequency itself.
>>>> PPC Linux doesn't - it completely relies on entries in the device tree.
>>> 
>>> The frequency could be a parameter for the -cpu flag, like -cpu
>>> 970fx,frequency=1000000000.
>> 
>> We could implement that as an override to the current static code path.
>> For KVM it doesn't make any sense, as you really want to see the host frequency in the guest here. That's what users expect. Period.
> 
> Even with 10 guests? What about migration?

Boot up a hypervisor of your choice on any random x86 box. Yes, that's exactly what you get.

cpuinfo wrt migration is broken. Since the timebase isn't trappable on ppc, you can also expect things to blow up completely when migrating a ppc VM.

That being said, the only chance of ever getting migration to work properly on ppc would be to notify the guest that the timer frequency just changed, at which point it would have to reload it, rebase all timers off the new frequency and try to paper over everything as much as it can.

Either way, that's completely orthogonal to the question of exposing the host cpu frequency to the guest.

I think we should do what users expect. I agree that having a -cpu flag to set the exposed frequency is a good idea. However, the default shouldn't be hardcoded but determined as cleverly as possible.

If you'd like to implement the -cpu flag, you're more than welcome to. For now I'm fixing bugs to get ppc64 working properly in qemu, and there are far worse problems out there than the lack of a way to set the cpu frequency with -cpu that I'd rather spend my time on.

Alex
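
The "rebase all timers off the new frequency" step mentioned above amounts to rescaling tick counts so that they represent the same duration at a different frequency. A minimal sketch, assuming both frequencies fit in 32 bits; `rescale_ticks` is a hypothetical name and not something the patch or the guest kernel provides:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a tick count measured at old_hz into the equivalent count at
 * new_hz, preserving the duration it represents. Whole seconds and the
 * sub-second remainder are scaled separately so the intermediate product
 * stays within 64 bits as long as both frequencies fit in 32 bits.
 */
static uint64_t rescale_ticks(uint64_t ticks, uint64_t old_hz, uint64_t new_hz)
{
    assert(old_hz != 0);
    uint64_t secs = ticks / old_hz;
    uint64_t rem = ticks % old_hz;
    return secs * new_hz + (rem * new_hz) / old_hz;
}
```

This covers only the arithmetic; notifying the guest and updating its pending timers consistently is the hard part discussed in the message.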

Patch

diff --git a/hw/ppc.h b/hw/ppc.h
index b9a12a1..864516f 100644
--- a/hw/ppc.h
+++ b/hw/ppc.h
@@ -46,5 +46,7 @@  enum {
 #define FW_CFG_PPC_WIDTH	(FW_CFG_ARCH_LOCAL + 0x00)
 #define FW_CFG_PPC_HEIGHT	(FW_CFG_ARCH_LOCAL + 0x01)
 #define FW_CFG_PPC_DEPTH	(FW_CFG_ARCH_LOCAL + 0x02)
+#define FW_CFG_PPC_TBFREQ	(FW_CFG_ARCH_LOCAL + 0x03)
+#define FW_CFG_PPC_CPUFREQ	(FW_CFG_ARCH_LOCAL + 0x04)
 
 #define PPC_SERIAL_MM_BAUDBASE 399193
diff --git a/hw/ppc_newworld.c b/hw/ppc_newworld.c
index d66860b..0ed7a39 100644
--- a/hw/ppc_newworld.c
+++ b/hw/ppc_newworld.c
@@ -40,6 +40,7 @@ 
 #include "loader.h"
 #include "elf.h"
 #include "kvm.h"
+#include "kvm_ppc.h"
 
 #define MAX_IDE_BUS 2
 #define VGA_BIOS_SIZE 65536
@@ -389,6 +390,16 @@  static void ppc_core99_init (ram_addr_t ram_size,
     fw_cfg_add_i16(fw_cfg, FW_CFG_PPC_HEIGHT, graphic_height);
     fw_cfg_add_i16(fw_cfg, FW_CFG_PPC_DEPTH, graphic_depth);
 
+    if (kvm_enabled()) {
+#ifdef CONFIG_KVM
+        fw_cfg_add_i32(fw_cfg, FW_CFG_PPC_TBFREQ, kvmppc_get_tbfreq());
+        fw_cfg_add_i32(fw_cfg, FW_CFG_PPC_CPUFREQ, kvmppc_get_cpufreq());
+#endif
+    } else {
+        fw_cfg_add_i32(fw_cfg, FW_CFG_PPC_TBFREQ, get_ticks_per_sec());
+        fw_cfg_add_i32(fw_cfg, FW_CFG_PPC_CPUFREQ, 1500 * 1000 * 1000);
+    }
+
     qemu_register_boot_set(fw_cfg_boot_set, fw_cfg);
 }
 
diff --git a/hw/ppc_oldworld.c b/hw/ppc_oldworld.c
index 7ccc6a1..feb8b6a 100644
--- a/hw/ppc_oldworld.c
+++ b/hw/ppc_oldworld.c
@@ -40,6 +40,7 @@ 
 #include "loader.h"
 #include "elf.h"
 #include "kvm.h"
+#include "kvm_ppc.h"
 
 #define MAX_IDE_BUS 2
 #define VGA_BIOS_SIZE 65536
@@ -401,6 +402,16 @@  static void ppc_heathrow_init (ram_addr_t ram_size,
     fw_cfg_add_i16(fw_cfg, FW_CFG_PPC_HEIGHT, graphic_height);
     fw_cfg_add_i16(fw_cfg, FW_CFG_PPC_DEPTH, graphic_depth);
 
+    if (kvm_enabled()) {
+#ifdef CONFIG_KVM
+        fw_cfg_add_i32(fw_cfg, FW_CFG_PPC_TBFREQ, kvmppc_get_tbfreq());
+        fw_cfg_add_i32(fw_cfg, FW_CFG_PPC_CPUFREQ, kvmppc_get_cpufreq());
+#endif
+    } else {
+        fw_cfg_add_i32(fw_cfg, FW_CFG_PPC_TBFREQ, get_ticks_per_sec());
+        fw_cfg_add_i32(fw_cfg, FW_CFG_PPC_CPUFREQ, 500 * 1000 * 1000);
+    }
+
     qemu_register_boot_set(fw_cfg_boot_set, fw_cfg);
 }
 
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 0424a78..2e1c897 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -252,3 +252,73 @@  int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run)
     return ret;
 }
 
+static int read_cpuinfo(const char *field, char *value, int len)
+{
+    FILE *f;
+    int ret = -1;
+    int field_len = strlen(field);
+    char line[512];
+
+    f = fopen("/proc/cpuinfo", "r");
+    if (!f) {
+        return -1;
+    }
+
+    while (fgets(line, sizeof(line), f)) {
+        if (!strncmp(line, field, field_len)) {
+            strncpy(value, line, len - 1);
+            value[len - 1] = '\0';
+            ret = 0;
+            break;
+        }
+    }
+
+    fclose(f);
+
+    return ret;
+}
+
+uint32_t kvmppc_get_cpufreq(void)
+{
+    char line[512];
+    char *ns, *ns2;
+    uint32_t retval = 1500 * 1000 * 1000;
+
+    if (read_cpuinfo("clock", line, sizeof(line))) {
+        return retval;
+    }
+
+    if (!(ns = strchr(line, ':'))) {
+        return retval;
+    }
+
+    ns++;
+
+    if (!(ns2 = strchr(ns, '.'))) {
+        return retval;
+    }
+
+    retval = (uint32_t)atoi(ns) * 1000 * 1000;
+    return retval;
+}
+
+uint32_t kvmppc_get_tbfreq(void)
+{
+    char line[512];
+    char *ns;
+    uint32_t retval = get_ticks_per_sec();
+
+    if (read_cpuinfo("timebase", line, sizeof(line))) {
+        return retval;
+    }
+
+    if (!(ns = strchr(line, ':'))) {
+        return retval;
+    }
+
+    ns++;
+
+    retval = atoi(ns);
+    return retval;
+}
+
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 3792ef7..7f504df 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -14,4 +14,7 @@  void kvmppc_fdt_update(void *fdt);
 int kvmppc_read_host_property(const char *node_path, const char *prop,
                                      void *val, size_t len);
 
+uint32_t kvmppc_get_cpufreq(void);
+uint32_t kvmppc_get_tbfreq(void);
+
 #endif /* __KVM_PPC_H__ */
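
The two getters in the patch extract a number from a "/proc/cpuinfo" line with strchr() and atoi(), which cannot report a parse failure. A stricter variant could look roughly like the sketch below. `parse_cpuinfo_u32` is a hypothetical helper, and the sample lines in the usage note are illustrative, assuming the "name : value" layout that the patch's own parsing implies.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the integer after the first ':' in a "/proc/cpuinfo"-style line.
 * Returns `fallback` if the line has no colon, no digits after it, or a
 * value that does not fit in 32 bits. Parsing stops at the first
 * non-digit, so "2500.000000MHz" yields 2500.
 */
static uint32_t parse_cpuinfo_u32(const char *line, uint32_t fallback)
{
    const char *colon;
    char *end;
    unsigned long long v;

    assert(line != NULL);

    colon = strchr(line, ':');
    if (!colon) {
        return fallback;
    }

    errno = 0;
    v = strtoull(colon + 1, &end, 10);   /* skips leading whitespace */
    if (end == colon + 1 || errno != 0 || v > UINT32_MAX) {
        return fallback;
    }
    return (uint32_t)v;
}
```

For example, `parse_cpuinfo_u32("timebase\t: 512000000\n", 0)` gives 512000000, and a "clock" line in MHz would still need multiplying by 1000 * 1000 as the patch does.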