[2/8] hvf: Move common code out

Message ID 20201126215017.41156-3-agraf@csgraf.de
State New
Series hvf: Implement Apple Silicon Support

Commit Message

Alexander Graf Nov. 26, 2020, 9:50 p.m. UTC
Until now, Hypervisor.framework has only been available on x86_64 systems.
With Apple Silicon shipping now, it extends its reach to aarch64. To
prepare for support for multiple architectures, let's move common code out
into its own accel directory.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
---
 MAINTAINERS                 |   9 +-
 accel/hvf/hvf-all.c         |  56 +++++
 accel/hvf/hvf-cpus.c        | 468 ++++++++++++++++++++++++++++++++++++
 accel/hvf/meson.build       |   7 +
 accel/meson.build           |   1 +
 include/sysemu/hvf_int.h    |  69 ++++++
 target/i386/hvf/hvf-cpus.c  | 131 ----------
 target/i386/hvf/hvf-cpus.h  |  25 --
 target/i386/hvf/hvf-i386.h  |  48 +---
 target/i386/hvf/hvf.c       | 360 +--------------------------
 target/i386/hvf/meson.build |   1 -
 target/i386/hvf/x86hvf.c    |  11 +-
 target/i386/hvf/x86hvf.h    |   2 -
 13 files changed, 619 insertions(+), 569 deletions(-)
 create mode 100644 accel/hvf/hvf-all.c
 create mode 100644 accel/hvf/hvf-cpus.c
 create mode 100644 accel/hvf/meson.build
 create mode 100644 include/sysemu/hvf_int.h
 delete mode 100644 target/i386/hvf/hvf-cpus.c
 delete mode 100644 target/i386/hvf/hvf-cpus.h

Comments

Roman Bolshakov Nov. 27, 2020, 8 p.m. UTC | #1
On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote:
> Until now, Hypervisor.framework has only been available on x86_64 systems.
> With Apple Silicon shipping now, it extends its reach to aarch64. To
> prepare for support for multiple architectures, let's move common code out
> into its own accel directory.
> 
> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> ---
>  MAINTAINERS                 |   9 +-
>  accel/hvf/hvf-all.c         |  56 +++++
>  accel/hvf/hvf-cpus.c        | 468 ++++++++++++++++++++++++++++++++++++
>  accel/hvf/meson.build       |   7 +
>  accel/meson.build           |   1 +
>  include/sysemu/hvf_int.h    |  69 ++++++
>  target/i386/hvf/hvf-cpus.c  | 131 ----------
>  target/i386/hvf/hvf-cpus.h  |  25 --
>  target/i386/hvf/hvf-i386.h  |  48 +---
>  target/i386/hvf/hvf.c       | 360 +--------------------------
>  target/i386/hvf/meson.build |   1 -
>  target/i386/hvf/x86hvf.c    |  11 +-
>  target/i386/hvf/x86hvf.h    |   2 -
>  13 files changed, 619 insertions(+), 569 deletions(-)
>  create mode 100644 accel/hvf/hvf-all.c
>  create mode 100644 accel/hvf/hvf-cpus.c
>  create mode 100644 accel/hvf/meson.build
>  create mode 100644 include/sysemu/hvf_int.h
>  delete mode 100644 target/i386/hvf/hvf-cpus.c
>  delete mode 100644 target/i386/hvf/hvf-cpus.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 68bc160f41..ca4b6d9279 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com>
>  M: Roman Bolshakov <r.bolshakov@yadro.com>
>  W: https://wiki.qemu.org/Features/HVF
>  S: Maintained
> -F: accel/stubs/hvf-stub.c

There was a patch for that in the RFC series from Claudio.

>  F: target/i386/hvf/
> +
> +HVF
> +M: Cameron Esfahani <dirty@apple.com>
> +M: Roman Bolshakov <r.bolshakov@yadro.com>
> +W: https://wiki.qemu.org/Features/HVF
> +S: Maintained
> +F: accel/hvf/
>  F: include/sysemu/hvf.h
> +F: include/sysemu/hvf_int.h
>  
>  WHPX CPUs
>  M: Sunil Muthuswamy <sunilmut@microsoft.com>
> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
> new file mode 100644
> index 0000000000..47d77a472a
> --- /dev/null
> +++ b/accel/hvf/hvf-all.c
> @@ -0,0 +1,56 @@
> +/*
> + * QEMU Hypervisor.framework support
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Contributions after 2012-01-13 are licensed under the terms of the
> + * GNU GPL, version 2 or (at your option) any later version.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "qemu/error-report.h"
> +#include "sysemu/hvf.h"
> +#include "sysemu/hvf_int.h"
> +#include "sysemu/runstate.h"
> +
> +#include "qemu/main-loop.h"
> +#include "sysemu/accel.h"
> +
> +#include <Hypervisor/Hypervisor.h>
> +
> +bool hvf_allowed;
> +HVFState *hvf_state;
> +
> +void assert_hvf_ok(hv_return_t ret)
> +{
> +    if (ret == HV_SUCCESS) {
> +        return;
> +    }
> +
> +    switch (ret) {
> +    case HV_ERROR:
> +        error_report("Error: HV_ERROR");
> +        break;
> +    case HV_BUSY:
> +        error_report("Error: HV_BUSY");
> +        break;
> +    case HV_BAD_ARGUMENT:
> +        error_report("Error: HV_BAD_ARGUMENT");
> +        break;
> +    case HV_NO_RESOURCES:
> +        error_report("Error: HV_NO_RESOURCES");
> +        break;
> +    case HV_NO_DEVICE:
> +        error_report("Error: HV_NO_DEVICE");
> +        break;
> +    case HV_UNSUPPORTED:
> +        error_report("Error: HV_UNSUPPORTED");
> +        break;
> +    default:
> +        error_report("Unknown Error");
> +    }
> +
> +    abort();
> +}
> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
> new file mode 100644
> index 0000000000..f9bb5502b7
> --- /dev/null
> +++ b/accel/hvf/hvf-cpus.c
> @@ -0,0 +1,468 @@
> +/*
> + * Copyright 2008 IBM Corporation
> + *           2008 Red Hat, Inc.
> + * Copyright 2011 Intel Corporation
> + * Copyright 2016 Veertu, Inc.
> + * Copyright 2017 The Android Open Source Project
> + *
> + * QEMU Hypervisor.framework support
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + *
> + * This file contain code under public domain from the hvdos project:
> + * https://github.com/mist64/hvdos
> + *
> + * Parts Copyright (c) 2011 NetApp, Inc.
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "qemu/main-loop.h"
> +#include "exec/address-spaces.h"
> +#include "exec/exec-all.h"
> +#include "sysemu/cpus.h"
> +#include "sysemu/hvf.h"
> +#include "sysemu/hvf_int.h"
> +#include "sysemu/runstate.h"
> +#include "qemu/guest-random.h"
> +
> +#include <Hypervisor/Hypervisor.h>
> +
> +/* Memory slots */
> +
> +struct mac_slot {
> +    int present;
> +    uint64_t size;
> +    uint64_t gpa_start;
> +    uint64_t gva;
> +};
> +
> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
> +{
> +    hvf_slot *slot;
> +    int x;
> +    for (x = 0; x < hvf_state->num_slots; ++x) {
> +        slot = &hvf_state->slots[x];
> +        if (slot->size && start < (slot->start + slot->size) &&
> +            (start + size) > slot->start) {
> +            return slot;
> +        }
> +    }
> +    return NULL;
> +}
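
For readers following the slot lookup above: the condition is the usual
intersection test for half-open ranges, with empty slots skipped. Below is a
minimal standalone sketch of that test only. The names demo_slot and
demo_overlaps are hypothetical stand-ins for illustration and are not part
of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for hvf_slot, reduced to the two fields the test reads. */
struct demo_slot {
    uint64_t start;
    uint64_t size;
};

/*
 * Half-open ranges [start, start + size) and [slot->start, slot->start +
 * slot->size) intersect exactly when each one begins before the other ends.
 * A slot with size 0 is treated as unused and never matches.
 */
static bool demo_overlaps(const struct demo_slot *slot,
                          uint64_t start, uint64_t size)
{
    return slot->size &&
           start < slot->start + slot->size &&
           start + size > slot->start;
}
```

With a slot covering 0x1000..0x1fff, a query that straddles 0x1000 matches,
while a query that ends exactly at 0x1000 or begins exactly at 0x2000 does
not.
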
> +
> +struct mac_slot mac_slots[32];
> +
> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
> +{
> +    struct mac_slot *macslot;
> +    hv_return_t ret;
> +
> +    macslot = &mac_slots[slot->slot_id];
> +
> +    if (macslot->present) {
> +        if (macslot->size != slot->size) {
> +            macslot->present = 0;
> +            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
> +            assert_hvf_ok(ret);
> +        }
> +    }
> +
> +    if (!slot->size) {
> +        return 0;
> +    }
> +
> +    macslot->present = 1;
> +    macslot->gpa_start = slot->start;
> +    macslot->size = slot->size;
> +    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
> +    assert_hvf_ok(ret);
> +    return 0;
> +}
> +
> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
> +{
> +    hvf_slot *mem;
> +    MemoryRegion *area = section->mr;
> +    bool writeable = !area->readonly && !area->rom_device;
> +    hv_memory_flags_t flags;
> +
> +    if (!memory_region_is_ram(area)) {
> +        if (writeable) {
> +            return;
> +        } else if (!memory_region_is_romd(area)) {
> +            /*
> +             * If the memory device is not in romd_mode, then we actually want
> +             * to remove the hvf memory slot so all accesses will trap.
> +             */
> +             add = false;
> +        }
> +    }
> +
> +    mem = hvf_find_overlap_slot(
> +            section->offset_within_address_space,
> +            int128_get64(section->size));
> +
> +    if (mem && add) {
> +        if (mem->size == int128_get64(section->size) &&
> +            mem->start == section->offset_within_address_space &&
> +            mem->mem == (memory_region_get_ram_ptr(area) +
> +            section->offset_within_region)) {
> +            return; /* Same region was attempted to register, go away. */
> +        }
> +    }
> +
> +    /* Region needs to be reset. set the size to 0 and remap it. */
> +    if (mem) {
> +        mem->size = 0;
> +        if (do_hvf_set_memory(mem, 0)) {
> +            error_report("Failed to reset overlapping slot");
> +            abort();
> +        }
> +    }
> +
> +    if (!add) {
> +        return;
> +    }
> +
> +    if (area->readonly ||
> +        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
> +        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
> +    } else {
> +        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
> +    }
> +
> +    /* Now make a new slot. */
> +    int x;
> +
> +    for (x = 0; x < hvf_state->num_slots; ++x) {
> +        mem = &hvf_state->slots[x];
> +        if (!mem->size) {
> +            break;
> +        }
> +    }
> +
> +    if (x == hvf_state->num_slots) {
> +        error_report("No free slots");
> +        abort();
> +    }
> +
> +    mem->size = int128_get64(section->size);
> +    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
> +    mem->start = section->offset_within_address_space;
> +    mem->region = area;
> +
> +    if (do_hvf_set_memory(mem, flags)) {
> +        error_report("Error registering new memory slot");
> +        abort();
> +    }
> +}
> +
> +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
> +{
> +    hvf_slot *slot;
> +
> +    slot = hvf_find_overlap_slot(
> +            section->offset_within_address_space,
> +            int128_get64(section->size));
> +
> +    /* protect region against writes; begin tracking it */
> +    if (on) {
> +        slot->flags |= HVF_SLOT_LOG;
> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
> +                      HV_MEMORY_READ);
> +    /* stop tracking region*/
> +    } else {
> +        slot->flags &= ~HVF_SLOT_LOG;
> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
> +                      HV_MEMORY_READ | HV_MEMORY_WRITE);
> +    }
> +}
> +
> +static void hvf_log_start(MemoryListener *listener,
> +                          MemoryRegionSection *section, int old, int new)
> +{
> +    if (old != 0) {
> +        return;
> +    }
> +
> +    hvf_set_dirty_tracking(section, 1);
> +}
> +
> +static void hvf_log_stop(MemoryListener *listener,
> +                         MemoryRegionSection *section, int old, int new)
> +{
> +    if (new != 0) {
> +        return;
> +    }
> +
> +    hvf_set_dirty_tracking(section, 0);
> +}
> +
> +static void hvf_log_sync(MemoryListener *listener,
> +                         MemoryRegionSection *section)
> +{
> +    /*
> +     * sync of dirty pages is handled elsewhere; just make sure we keep
> +     * tracking the region.
> +     */
> +    hvf_set_dirty_tracking(section, 1);
> +}
> +
> +static void hvf_region_add(MemoryListener *listener,
> +                           MemoryRegionSection *section)
> +{
> +    hvf_set_phys_mem(section, true);
> +}
> +
> +static void hvf_region_del(MemoryListener *listener,
> +                           MemoryRegionSection *section)
> +{
> +    hvf_set_phys_mem(section, false);
> +}
> +
> +static MemoryListener hvf_memory_listener = {
> +    .priority = 10,
> +    .region_add = hvf_region_add,
> +    .region_del = hvf_region_del,
> +    .log_start = hvf_log_start,
> +    .log_stop = hvf_log_stop,
> +    .log_sync = hvf_log_sync,
> +};
> +
> +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    if (!cpu->vcpu_dirty) {
> +        hvf_get_registers(cpu);
> +        cpu->vcpu_dirty = true;
> +    }
> +}
> +
> +static void hvf_cpu_synchronize_state(CPUState *cpu)
> +{
> +    if (!cpu->vcpu_dirty) {
> +        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
> +    }
> +}
> +
> +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
> +                                              run_on_cpu_data arg)
> +{
> +    hvf_put_registers(cpu);
> +    cpu->vcpu_dirty = false;
> +}
> +
> +static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
> +{
> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
> +}
> +
> +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
> +                                             run_on_cpu_data arg)
> +{
> +    hvf_put_registers(cpu);
> +    cpu->vcpu_dirty = false;
> +}
> +
> +static void hvf_cpu_synchronize_post_init(CPUState *cpu)
> +{
> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
> +}
> +
> +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
> +                                              run_on_cpu_data arg)
> +{
> +    cpu->vcpu_dirty = true;
> +}
> +
> +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
> +{
> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
> +}
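
The four synchronize hooks above all revolve around cpu->vcpu_dirty. As a
rough illustration of that protocol, here is a standalone sketch that leaves
out run_on_cpu() and the real register accessors. The struct demo_cpu and
the demo_* functions are hypothetical and exist only for this example; the
integer fields stand in for the register state on each side.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CPUState; only the dirty-flag protocol is modelled. */
struct demo_cpu {
    bool vcpu_dirty;   /* true: QEMU's copy of the registers is authoritative */
    int hw_regs;       /* stand-in for state held by the hypervisor */
    int qemu_regs;     /* stand-in for QEMU's cached copy */
    int fetches;       /* counts hypervisor -> QEMU copies */
};

/* Mirrors synchronize_state: fetch at most once, then mark QEMU's copy dirty. */
static void demo_sync_state(struct demo_cpu *cpu)
{
    if (!cpu->vcpu_dirty) {
        cpu->qemu_regs = cpu->hw_regs;
        cpu->fetches++;
        cpu->vcpu_dirty = true;
    }
}

/* Mirrors synchronize_post_reset/post_init: push QEMU's copy, clear the flag. */
static void demo_sync_post_reset(struct demo_cpu *cpu)
{
    cpu->hw_regs = cpu->qemu_regs;
    cpu->vcpu_dirty = false;
}

/* Mirrors synchronize_pre_loadvm: the incoming state will overwrite QEMU's copy. */
static void demo_sync_pre_loadvm(struct demo_cpu *cpu)
{
    cpu->vcpu_dirty = true;
}
```

Calling demo_sync_state() twice in a row copies the state only once, which
is the point of the flag: repeated synchronize_state calls are cheap until
the state is pushed back.
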
> +
> +static void hvf_vcpu_destroy(CPUState *cpu)
> +{
> +    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
> +    assert_hvf_ok(ret);
> +
> +    hvf_arch_vcpu_destroy(cpu);
> +}
> +
> +static void dummy_signal(int sig)
> +{
> +}
> +
> +static int hvf_init_vcpu(CPUState *cpu)
> +{
> +    int r;
> +
> +    /* init cpu signals */
> +    sigset_t set;
> +    struct sigaction sigact;
> +
> +    memset(&sigact, 0, sizeof(sigact));
> +    sigact.sa_handler = dummy_signal;
> +    sigaction(SIG_IPI, &sigact, NULL);
> +
> +    pthread_sigmask(SIG_BLOCK, NULL, &set);
> +    sigdelset(&set, SIG_IPI);
> +
> +#ifdef __aarch64__
> +    r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL);
> +#else
> +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
> +#endif

I think the first __aarch64__ bit fits better in the arm part of the series.

> +    cpu->vcpu_dirty = 1;
> +    assert_hvf_ok(r);
> +
> +    return hvf_arch_init_vcpu(cpu);
> +}
> +
> +/*
> + * The HVF-specific vCPU thread function. This one should only run when the host
> + * CPU supports the VMX "unrestricted guest" feature.
> + */
> +static void *hvf_cpu_thread_fn(void *arg)
> +{
> +    CPUState *cpu = arg;
> +
> +    int r;
> +
> +    assert(hvf_enabled());
> +
> +    rcu_register_thread();
> +
> +    qemu_mutex_lock_iothread();
> +    qemu_thread_get_self(cpu->thread);
> +
> +    cpu->thread_id = qemu_get_thread_id();
> +    cpu->can_do_io = 1;
> +    current_cpu = cpu;
> +
> +    hvf_init_vcpu(cpu);
> +
> +    /* signal CPU creation */
> +    cpu_thread_signal_created(cpu);
> +    qemu_guest_random_seed_thread_part2(cpu->random_seed);
> +
> +    do {
> +        if (cpu_can_run(cpu)) {
> +            r = hvf_vcpu_exec(cpu);
> +            if (r == EXCP_DEBUG) {
> +                cpu_handle_guest_debug(cpu);
> +            }
> +        }
> +        qemu_wait_io_event(cpu);
> +    } while (!cpu->unplug || cpu_can_run(cpu));
> +
> +    hvf_vcpu_destroy(cpu);
> +    cpu_thread_signal_destroyed(cpu);
> +    qemu_mutex_unlock_iothread();
> +    rcu_unregister_thread();
> +    return NULL;
> +}
> +
> +static void hvf_start_vcpu_thread(CPUState *cpu)
> +{
> +    char thread_name[VCPU_THREAD_NAME_SIZE];
> +
> +    /*
> +     * HVF currently does not support TCG, and only runs in
> +     * unrestricted-guest mode.
> +     */
> +    assert(hvf_enabled());
> +
> +    cpu->thread = g_malloc0(sizeof(QemuThread));
> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
> +    qemu_cond_init(cpu->halt_cond);
> +
> +    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
> +             cpu->cpu_index);
> +    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
> +                       cpu, QEMU_THREAD_JOINABLE);
> +}
> +
> +static const CpusAccel hvf_cpus = {
> +    .create_vcpu_thread = hvf_start_vcpu_thread,
> +
> +    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
> +    .synchronize_post_init = hvf_cpu_synchronize_post_init,
> +    .synchronize_state = hvf_cpu_synchronize_state,
> +    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
> +};
> +
> +static int hvf_accel_init(MachineState *ms)
> +{
> +    int x;
> +    hv_return_t ret;
> +    HVFState *s;
> +
> +    ret = hv_vm_create(HV_VM_DEFAULT);
> +    assert_hvf_ok(ret);
> +
> +    s = g_new0(HVFState, 1);
> +
> +    s->num_slots = 32;
> +    for (x = 0; x < s->num_slots; ++x) {
> +        s->slots[x].size = 0;
> +        s->slots[x].slot_id = x;
> +    }
> +
> +    hvf_state = s;
> +    memory_listener_register(&hvf_memory_listener, &address_space_memory);
> +    cpus_register_accel(&hvf_cpus);
> +    return 0;
> +}
> +
> +static void hvf_accel_class_init(ObjectClass *oc, void *data)
> +{
> +    AccelClass *ac = ACCEL_CLASS(oc);
> +    ac->name = "HVF";
> +    ac->init_machine = hvf_accel_init;
> +    ac->allowed = &hvf_allowed;
> +}
> +
> +static const TypeInfo hvf_accel_type = {
> +    .name = TYPE_HVF_ACCEL,
> +    .parent = TYPE_ACCEL,
> +    .class_init = hvf_accel_class_init,
> +};
> +
> +static void hvf_type_init(void)
> +{
> +    type_register_static(&hvf_accel_type);
> +}
> +
> +type_init(hvf_type_init);
> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
> new file mode 100644
> index 0000000000..dfd6b68dc7
> --- /dev/null
> +++ b/accel/hvf/meson.build
> @@ -0,0 +1,7 @@
> +hvf_ss = ss.source_set()
> +hvf_ss.add(files(
> +  'hvf-all.c',
> +  'hvf-cpus.c',
> +))
> +
> +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
> diff --git a/accel/meson.build b/accel/meson.build
> index b26cca227a..6de12ce5d5 100644
> --- a/accel/meson.build
> +++ b/accel/meson.build
> @@ -1,5 +1,6 @@
>  softmmu_ss.add(files('accel.c'))
>  
> +subdir('hvf')
>  subdir('qtest')
>  subdir('kvm')
>  subdir('tcg')
> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
> new file mode 100644
> index 0000000000..de9bad23a8
> --- /dev/null
> +++ b/include/sysemu/hvf_int.h
> @@ -0,0 +1,69 @@
> +/*
> + * QEMU Hypervisor.framework (HVF) support
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +/* header to be included in HVF-specific code */
> +
> +#ifndef HVF_INT_H
> +#define HVF_INT_H
> +
> +#include <Hypervisor/Hypervisor.h>
> +
> +#define HVF_MAX_VCPU 0x10
> +
> +extern struct hvf_state hvf_global;
> +
> +struct hvf_vm {
> +    int id;
> +    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
> +};
> +
> +struct hvf_state {
> +    uint32_t version;
> +    struct hvf_vm *vm;
> +    uint64_t mem_quota;
> +};
> +
> +/* hvf_slot flags */
> +#define HVF_SLOT_LOG (1 << 0)
> +
> +typedef struct hvf_slot {
> +    uint64_t start;
> +    uint64_t size;
> +    uint8_t *mem;
> +    int slot_id;
> +    uint32_t flags;
> +    MemoryRegion *region;
> +} hvf_slot;
> +
> +typedef struct hvf_vcpu_caps {
> +    uint64_t vmx_cap_pinbased;
> +    uint64_t vmx_cap_procbased;
> +    uint64_t vmx_cap_procbased2;
> +    uint64_t vmx_cap_entry;
> +    uint64_t vmx_cap_exit;
> +    uint64_t vmx_cap_preemption_timer;
> +} hvf_vcpu_caps;
> +
> +struct HVFState {
> +    AccelState parent;
> +    hvf_slot slots[32];
> +    int num_slots;
> +
> +    hvf_vcpu_caps *hvf_caps;
> +};
> +extern HVFState *hvf_state;
> +
> +void assert_hvf_ok(hv_return_t ret);
> +int hvf_get_registers(CPUState *cpu);
> +int hvf_put_registers(CPUState *cpu);
> +int hvf_arch_init_vcpu(CPUState *cpu);
> +void hvf_arch_vcpu_destroy(CPUState *cpu);
> +int hvf_vcpu_exec(CPUState *cpu);
> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
> +
> +#endif
> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
> deleted file mode 100644
> index 817b3d7452..0000000000
> --- a/target/i386/hvf/hvf-cpus.c
> +++ /dev/null
> @@ -1,131 +0,0 @@
> -/*
> - * Copyright 2008 IBM Corporation
> - *           2008 Red Hat, Inc.
> - * Copyright 2011 Intel Corporation
> - * Copyright 2016 Veertu, Inc.
> - * Copyright 2017 The Android Open Source Project
> - *
> - * QEMU Hypervisor.framework support
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of version 2 of the GNU General Public
> - * License as published by the Free Software Foundation.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> - * General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, see <http://www.gnu.org/licenses/>.
> - *
> - * This file contain code under public domain from the hvdos project:
> - * https://github.com/mist64/hvdos
> - *
> - * Parts Copyright (c) 2011 NetApp, Inc.
> - * All rights reserved.
> - *
> - * Redistribution and use in source and binary forms, with or without
> - * modification, are permitted provided that the following conditions
> - * are met:
> - * 1. Redistributions of source code must retain the above copyright
> - *    notice, this list of conditions and the following disclaimer.
> - * 2. Redistributions in binary form must reproduce the above copyright
> - *    notice, this list of conditions and the following disclaimer in the
> - *    documentation and/or other materials provided with the distribution.
> - *
> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> - * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> - * SUCH DAMAGE.
> - */
> -
> -#include "qemu/osdep.h"
> -#include "qemu/error-report.h"
> -#include "qemu/main-loop.h"
> -#include "sysemu/hvf.h"
> -#include "sysemu/runstate.h"
> -#include "target/i386/cpu.h"
> -#include "qemu/guest-random.h"
> -
> -#include "hvf-cpus.h"
> -
> -/*
> - * The HVF-specific vCPU thread function. This one should only run when the host
> - * CPU supports the VMX "unrestricted guest" feature.
> - */
> -static void *hvf_cpu_thread_fn(void *arg)
> -{
> -    CPUState *cpu = arg;
> -
> -    int r;
> -
> -    assert(hvf_enabled());
> -
> -    rcu_register_thread();
> -
> -    qemu_mutex_lock_iothread();
> -    qemu_thread_get_self(cpu->thread);
> -
> -    cpu->thread_id = qemu_get_thread_id();
> -    cpu->can_do_io = 1;
> -    current_cpu = cpu;
> -
> -    hvf_init_vcpu(cpu);
> -
> -    /* signal CPU creation */
> -    cpu_thread_signal_created(cpu);
> -    qemu_guest_random_seed_thread_part2(cpu->random_seed);
> -
> -    do {
> -        if (cpu_can_run(cpu)) {
> -            r = hvf_vcpu_exec(cpu);
> -            if (r == EXCP_DEBUG) {
> -                cpu_handle_guest_debug(cpu);
> -            }
> -        }
> -        qemu_wait_io_event(cpu);
> -    } while (!cpu->unplug || cpu_can_run(cpu));
> -
> -    hvf_vcpu_destroy(cpu);
> -    cpu_thread_signal_destroyed(cpu);
> -    qemu_mutex_unlock_iothread();
> -    rcu_unregister_thread();
> -    return NULL;
> -}
> -
> -static void hvf_start_vcpu_thread(CPUState *cpu)
> -{
> -    char thread_name[VCPU_THREAD_NAME_SIZE];
> -
> -    /*
> -     * HVF currently does not support TCG, and only runs in
> -     * unrestricted-guest mode.
> -     */
> -    assert(hvf_enabled());
> -
> -    cpu->thread = g_malloc0(sizeof(QemuThread));
> -    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
> -    qemu_cond_init(cpu->halt_cond);
> -
> -    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
> -             cpu->cpu_index);
> -    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
> -                       cpu, QEMU_THREAD_JOINABLE);
> -}
> -
> -const CpusAccel hvf_cpus = {
> -    .create_vcpu_thread = hvf_start_vcpu_thread,
> -
> -    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
> -    .synchronize_post_init = hvf_cpu_synchronize_post_init,
> -    .synchronize_state = hvf_cpu_synchronize_state,
> -    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
> -};
> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h
> deleted file mode 100644
> index ced31b82c0..0000000000
> --- a/target/i386/hvf/hvf-cpus.h
> +++ /dev/null
> @@ -1,25 +0,0 @@
> -/*
> - * Accelerator CPUS Interface
> - *
> - * Copyright 2020 SUSE LLC
> - *
> - * This work is licensed under the terms of the GNU GPL, version 2 or later.
> - * See the COPYING file in the top-level directory.
> - */
> -
> -#ifndef HVF_CPUS_H
> -#define HVF_CPUS_H
> -
> -#include "sysemu/cpus.h"
> -
> -extern const CpusAccel hvf_cpus;
> -
> -int hvf_init_vcpu(CPUState *);
> -int hvf_vcpu_exec(CPUState *);
> -void hvf_cpu_synchronize_state(CPUState *);
> -void hvf_cpu_synchronize_post_reset(CPUState *);
> -void hvf_cpu_synchronize_post_init(CPUState *);
> -void hvf_cpu_synchronize_pre_loadvm(CPUState *);
> -void hvf_vcpu_destroy(CPUState *);
> -
> -#endif /* HVF_CPUS_H */
> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
> index e0edffd077..6d56f8f6bb 100644
> --- a/target/i386/hvf/hvf-i386.h
> +++ b/target/i386/hvf/hvf-i386.h
> @@ -18,57 +18,11 @@
>  
>  #include "sysemu/accel.h"
>  #include "sysemu/hvf.h"
> +#include "sysemu/hvf_int.h"
>  #include "cpu.h"
>  #include "x86.h"
>  
> -#define HVF_MAX_VCPU 0x10
> -
> -extern struct hvf_state hvf_global;
> -
> -struct hvf_vm {
> -    int id;
> -    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
> -};
> -
> -struct hvf_state {
> -    uint32_t version;
> -    struct hvf_vm *vm;
> -    uint64_t mem_quota;
> -};
> -
> -/* hvf_slot flags */
> -#define HVF_SLOT_LOG (1 << 0)
> -
> -typedef struct hvf_slot {
> -    uint64_t start;
> -    uint64_t size;
> -    uint8_t *mem;
> -    int slot_id;
> -    uint32_t flags;
> -    MemoryRegion *region;
> -} hvf_slot;
> -
> -typedef struct hvf_vcpu_caps {
> -    uint64_t vmx_cap_pinbased;
> -    uint64_t vmx_cap_procbased;
> -    uint64_t vmx_cap_procbased2;
> -    uint64_t vmx_cap_entry;
> -    uint64_t vmx_cap_exit;
> -    uint64_t vmx_cap_preemption_timer;
> -} hvf_vcpu_caps;
> -
> -struct HVFState {
> -    AccelState parent;
> -    hvf_slot slots[32];
> -    int num_slots;
> -
> -    hvf_vcpu_caps *hvf_caps;
> -};
> -extern HVFState *hvf_state;
> -
> -void hvf_set_phys_mem(MemoryRegionSection *, bool);
>  void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>  
>  #ifdef NEED_CPU_H
>  /* Functions exported to host specific mode */
> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
> index ed9356565c..8b96ecd619 100644
> --- a/target/i386/hvf/hvf.c
> +++ b/target/i386/hvf/hvf.c
> @@ -51,6 +51,7 @@
>  #include "qemu/error-report.h"
>  
>  #include "sysemu/hvf.h"
> +#include "sysemu/hvf_int.h"
>  #include "sysemu/runstate.h"
>  #include "hvf-i386.h"
>  #include "vmcs.h"
> @@ -72,171 +73,6 @@
>  #include "sysemu/accel.h"
>  #include "target/i386/cpu.h"
>  
> -#include "hvf-cpus.h"
> -
> -HVFState *hvf_state;
> -
> -static void assert_hvf_ok(hv_return_t ret)
> -{
> -    if (ret == HV_SUCCESS) {
> -        return;
> -    }
> -
> -    switch (ret) {
> -    case HV_ERROR:
> -        error_report("Error: HV_ERROR");
> -        break;
> -    case HV_BUSY:
> -        error_report("Error: HV_BUSY");
> -        break;
> -    case HV_BAD_ARGUMENT:
> -        error_report("Error: HV_BAD_ARGUMENT");
> -        break;
> -    case HV_NO_RESOURCES:
> -        error_report("Error: HV_NO_RESOURCES");
> -        break;
> -    case HV_NO_DEVICE:
> -        error_report("Error: HV_NO_DEVICE");
> -        break;
> -    case HV_UNSUPPORTED:
> -        error_report("Error: HV_UNSUPPORTED");
> -        break;
> -    default:
> -        error_report("Unknown Error");
> -    }
> -
> -    abort();
> -}
> -
> -/* Memory slots */
> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
> -{
> -    hvf_slot *slot;
> -    int x;
> -    for (x = 0; x < hvf_state->num_slots; ++x) {
> -        slot = &hvf_state->slots[x];
> -        if (slot->size && start < (slot->start + slot->size) &&
> -            (start + size) > slot->start) {
> -            return slot;
> -        }
> -    }
> -    return NULL;
> -}
> -
> -struct mac_slot {
> -    int present;
> -    uint64_t size;
> -    uint64_t gpa_start;
> -    uint64_t gva;
> -};
> -
> -struct mac_slot mac_slots[32];
> -
> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
> -{
> -    struct mac_slot *macslot;
> -    hv_return_t ret;
> -
> -    macslot = &mac_slots[slot->slot_id];
> -
> -    if (macslot->present) {
> -        if (macslot->size != slot->size) {
> -            macslot->present = 0;
> -            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
> -            assert_hvf_ok(ret);
> -        }
> -    }
> -
> -    if (!slot->size) {
> -        return 0;
> -    }
> -
> -    macslot->present = 1;
> -    macslot->gpa_start = slot->start;
> -    macslot->size = slot->size;
> -    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
> -    assert_hvf_ok(ret);
> -    return 0;
> -}
> -
> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
> -{
> -    hvf_slot *mem;
> -    MemoryRegion *area = section->mr;
> -    bool writeable = !area->readonly && !area->rom_device;
> -    hv_memory_flags_t flags;
> -
> -    if (!memory_region_is_ram(area)) {
> -        if (writeable) {
> -            return;
> -        } else if (!memory_region_is_romd(area)) {
> -            /*
> -             * If the memory device is not in romd_mode, then we actually want
> -             * to remove the hvf memory slot so all accesses will trap.
> -             */
> -             add = false;
> -        }
> -    }
> -
> -    mem = hvf_find_overlap_slot(
> -            section->offset_within_address_space,
> -            int128_get64(section->size));
> -
> -    if (mem && add) {
> -        if (mem->size == int128_get64(section->size) &&
> -            mem->start == section->offset_within_address_space &&
> -            mem->mem == (memory_region_get_ram_ptr(area) +
> -            section->offset_within_region)) {
> -            return; /* Same region was attempted to register, go away. */
> -        }
> -    }
> -
> -    /* Region needs to be reset. set the size to 0 and remap it. */
> -    if (mem) {
> -        mem->size = 0;
> -        if (do_hvf_set_memory(mem, 0)) {
> -            error_report("Failed to reset overlapping slot");
> -            abort();
> -        }
> -    }
> -
> -    if (!add) {
> -        return;
> -    }
> -
> -    if (area->readonly ||
> -        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
> -        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
> -    } else {
> -        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
> -    }
> -
> -    /* Now make a new slot. */
> -    int x;
> -
> -    for (x = 0; x < hvf_state->num_slots; ++x) {
> -        mem = &hvf_state->slots[x];
> -        if (!mem->size) {
> -            break;
> -        }
> -    }
> -
> -    if (x == hvf_state->num_slots) {
> -        error_report("No free slots");
> -        abort();
> -    }
> -
> -    mem->size = int128_get64(section->size);
> -    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
> -    mem->start = section->offset_within_address_space;
> -    mem->region = area;
> -
> -    if (do_hvf_set_memory(mem, flags)) {
> -        error_report("Error registering new memory slot");
> -        abort();
> -    }
> -}
> -
>  void vmx_update_tpr(CPUState *cpu)
>  {
>      /* TODO: need integrate APIC handling */
> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
>      }
>  }
>  
> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
> -{
> -    if (!cpu->vcpu_dirty) {
> -        hvf_get_registers(cpu);
> -        cpu->vcpu_dirty = true;
> -    }
> -}
> -
> -void hvf_cpu_synchronize_state(CPUState *cpu)
> -{
> -    if (!cpu->vcpu_dirty) {
> -        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
> -    }
> -}
> -
> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
> -                                              run_on_cpu_data arg)
> -{
> -    hvf_put_registers(cpu);
> -    cpu->vcpu_dirty = false;
> -}
> -
> -void hvf_cpu_synchronize_post_reset(CPUState *cpu)
> -{
> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
> -}
> -
> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
> -                                             run_on_cpu_data arg)
> -{
> -    hvf_put_registers(cpu);
> -    cpu->vcpu_dirty = false;
> -}
> -
> -void hvf_cpu_synchronize_post_init(CPUState *cpu)
> -{
> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
> -}
> -
> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
> -                                              run_on_cpu_data arg)
> -{
> -    cpu->vcpu_dirty = true;
> -}
> -
> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
> -{
> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
> -}
> -
>  static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
>  {
>      int read, write;
> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
>      return false;
>  }
>  
> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
> -{
> -    hvf_slot *slot;
> -
> -    slot = hvf_find_overlap_slot(
> -            section->offset_within_address_space,
> -            int128_get64(section->size));
> -
> -    /* protect region against writes; begin tracking it */
> -    if (on) {
> -        slot->flags |= HVF_SLOT_LOG;
> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
> -                      HV_MEMORY_READ);
> -    /* stop tracking region*/
> -    } else {
> -        slot->flags &= ~HVF_SLOT_LOG;
> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
> -                      HV_MEMORY_READ | HV_MEMORY_WRITE);
> -    }
> -}
> -
> -static void hvf_log_start(MemoryListener *listener,
> -                          MemoryRegionSection *section, int old, int new)
> -{
> -    if (old != 0) {
> -        return;
> -    }
> -
> -    hvf_set_dirty_tracking(section, 1);
> -}
> -
> -static void hvf_log_stop(MemoryListener *listener,
> -                         MemoryRegionSection *section, int old, int new)
> -{
> -    if (new != 0) {
> -        return;
> -    }
> -
> -    hvf_set_dirty_tracking(section, 0);
> -}
> -
> -static void hvf_log_sync(MemoryListener *listener,
> -                         MemoryRegionSection *section)
> -{
> -    /*
> -     * sync of dirty pages is handled elsewhere; just make sure we keep
> -     * tracking the region.
> -     */
> -    hvf_set_dirty_tracking(section, 1);
> -}
> -
> -static void hvf_region_add(MemoryListener *listener,
> -                           MemoryRegionSection *section)
> -{
> -    hvf_set_phys_mem(section, true);
> -}
> -
> -static void hvf_region_del(MemoryListener *listener,
> -                           MemoryRegionSection *section)
> -{
> -    hvf_set_phys_mem(section, false);
> -}
> -
> -static MemoryListener hvf_memory_listener = {
> -    .priority = 10,
> -    .region_add = hvf_region_add,
> -    .region_del = hvf_region_del,
> -    .log_start = hvf_log_start,
> -    .log_stop = hvf_log_stop,
> -    .log_sync = hvf_log_sync,
> -};
> -
> -void hvf_vcpu_destroy(CPUState *cpu)
> +void hvf_arch_vcpu_destroy(CPUState *cpu)
>  {
>      X86CPU *x86_cpu = X86_CPU(cpu);
>      CPUX86State *env = &x86_cpu->env;
>  
> -    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
>      g_free(env->hvf_mmio_buf);
> -    assert_hvf_ok(ret);
> -}
> -
> -static void dummy_signal(int sig)
> -{
>  }
>  
> -int hvf_init_vcpu(CPUState *cpu)
> +int hvf_arch_init_vcpu(CPUState *cpu)
>  {
>  
>      X86CPU *x86cpu = X86_CPU(cpu);
>      CPUX86State *env = &x86cpu->env;
> -    int r;
> -
> -    /* init cpu signals */
> -    sigset_t set;
> -    struct sigaction sigact;
> -
> -    memset(&sigact, 0, sizeof(sigact));
> -    sigact.sa_handler = dummy_signal;
> -    sigaction(SIG_IPI, &sigact, NULL);
> -
> -    pthread_sigmask(SIG_BLOCK, NULL, &set);
> -    sigdelset(&set, SIG_IPI);
>  
>      init_emu();
>      init_decoder();
> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu)
>      hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1);
>      env->hvf_mmio_buf = g_new(char, 4096);
>  
> -    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
> -    cpu->vcpu_dirty = 1;
> -    assert_hvf_ok(r);
> -
>      if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
>          &hvf_state->hvf_caps->vmx_cap_pinbased)) {
>          abort();
> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu)
>  
>      return ret;
>  }
> -
> -bool hvf_allowed;
> -
> -static int hvf_accel_init(MachineState *ms)
> -{
> -    int x;
> -    hv_return_t ret;
> -    HVFState *s;
> -
> -    ret = hv_vm_create(HV_VM_DEFAULT);
> -    assert_hvf_ok(ret);
> -
> -    s = g_new0(HVFState, 1);
> - 
> -    s->num_slots = 32;
> -    for (x = 0; x < s->num_slots; ++x) {
> -        s->slots[x].size = 0;
> -        s->slots[x].slot_id = x;
> -    }
> -  
> -    hvf_state = s;
> -    memory_listener_register(&hvf_memory_listener, &address_space_memory);
> -    cpus_register_accel(&hvf_cpus);
> -    return 0;
> -}
> -
> -static void hvf_accel_class_init(ObjectClass *oc, void *data)
> -{
> -    AccelClass *ac = ACCEL_CLASS(oc);
> -    ac->name = "HVF";
> -    ac->init_machine = hvf_accel_init;
> -    ac->allowed = &hvf_allowed;
> -}
> -
> -static const TypeInfo hvf_accel_type = {
> -    .name = TYPE_HVF_ACCEL,
> -    .parent = TYPE_ACCEL,
> -    .class_init = hvf_accel_class_init,
> -};
> -
> -static void hvf_type_init(void)
> -{
> -    type_register_static(&hvf_accel_type);
> -}
> -
> -type_init(hvf_type_init);
> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
> index 409c9a3f14..c8a43717ee 100644
> --- a/target/i386/hvf/meson.build
> +++ b/target/i386/hvf/meson.build
> @@ -1,6 +1,5 @@
>  i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
>    'hvf.c',
> -  'hvf-cpus.c',
>    'x86.c',
>    'x86_cpuid.c',
>    'x86_decode.c',
> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
> index bbec412b6c..89b8e9d87a 100644
> --- a/target/i386/hvf/x86hvf.c
> +++ b/target/i386/hvf/x86hvf.c
> @@ -20,6 +20,9 @@
>  #include "qemu/osdep.h"
>  
>  #include "qemu-common.h"
> +#include "sysemu/hvf.h"
> +#include "sysemu/hvf_int.h"
> +#include "sysemu/hw_accel.h"
>  #include "x86hvf.h"
>  #include "vmx.h"
>  #include "vmcs.h"
> @@ -32,8 +35,6 @@
>  #include <Hypervisor/hv.h>
>  #include <Hypervisor/hv_vmx.h>
>  
> -#include "hvf-cpus.h"
> -
>  void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
>                       SegmentCache *qseg, bool is_tr)
>  {
> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state)
>      env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
>  
>      if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
> -        hvf_cpu_synchronize_state(cpu_state);
> +        cpu_synchronize_state(cpu_state);
>          do_cpu_init(cpu);
>      }
>  
> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state)
>          cpu_state->halted = 0;
>      }
>      if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
> -        hvf_cpu_synchronize_state(cpu_state);
> +        cpu_synchronize_state(cpu_state);
>          do_cpu_sipi(cpu);
>      }
>      if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
>          cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
> -        hvf_cpu_synchronize_state(cpu_state);
> +        cpu_synchronize_state(cpu_state);

The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should
be a separate patch. They follow the cpu/accel cleanups Claudio was doing
over the summer.
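
To illustrate why calling the generic helper is the cleaner form, here is
a minimal, self-contained sketch of the dispatch pattern. It is not QEMU's
actual code: the CPUState and CpusAccel types below are simplified,
hypothetical stand-ins, and the regs_fetched counter exists only for
illustration. Only the names hvf_cpu_synchronize_state, cpu_synchronize_state,
cpus_register_accel and hvf_cpus are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for QEMU's CPUState. */
typedef struct CPUState {
    bool vcpu_dirty;   /* true once QEMU's copy of the registers is current */
    int regs_fetched;  /* illustration only: counts register fetches */
} CPUState;

/* Hypothetical, simplified stand-in for the CpusAccel hook table. */
typedef struct CpusAccel {
    void (*synchronize_state)(CPUState *cpu);
} CpusAccel;

static const CpusAccel *cpus_accel;

/* Called once by the accelerator during its init. */
static void cpus_register_accel(const CpusAccel *ca)
{
    assert(ca != NULL);
    cpus_accel = ca;
}

/* Generic entry point: target code calls this, never the accel hook. */
static void cpu_synchronize_state(CPUState *cpu)
{
    if (cpus_accel && cpus_accel->synchronize_state) {
        cpus_accel->synchronize_state(cpu);
    }
}

/* Accel-private hook; it can be static because nothing else names it. */
static void hvf_cpu_synchronize_state(CPUState *cpu)
{
    if (!cpu->vcpu_dirty) {
        cpu->regs_fetched++;      /* stands in for hvf_get_registers() */
        cpu->vcpu_dirty = true;
    }
}

static const CpusAccel hvf_cpus = {
    .synchronize_state = hvf_cpu_synchronize_state,
};
```

With this shape, target code such as hvf_process_events() needs no
HVF-specific prototype, which is what lets the hvf_cpu_*() functions
become static in the common file.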

Philippe raised the idea that the patch might go ahead of the ARM-specific
part (which might involve some discussions) and I agree with that.

Some sync between Claudio's series (CC'd him) and this patch might be needed.

Thanks,
Roman

>          apic_handle_tpr_access_report(cpu->apic_state, env->eip,
>                                        env->tpr_access_type);
>      }
> diff --git a/target/i386/hvf/x86hvf.h b/target/i386/hvf/x86hvf.h
> index 635ab0f34e..99ed8d608d 100644
> --- a/target/i386/hvf/x86hvf.h
> +++ b/target/i386/hvf/x86hvf.h
> @@ -21,8 +21,6 @@
>  #include "x86_descr.h"
>  
>  int hvf_process_events(CPUState *);
> -int hvf_put_registers(CPUState *);
> -int hvf_get_registers(CPUState *);
>  bool hvf_inject_interrupts(CPUState *);
>  void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
>                       SegmentCache *qseg, bool is_tr);
> -- 
> 2.24.3 (Apple Git-128)
> 
> 
>
Alexander Graf Nov. 27, 2020, 9:55 p.m. UTC | #2
On 27.11.20 21:00, Roman Bolshakov wrote:
> On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote:
>> Until now, Hypervisor.framework has only been available on x86_64 systems.
>> With Apple Silicon shipping now, it extends its reach to aarch64. To
>> prepare for support for multiple architectures, let's move common code out
>> into its own accel directory.
>>
>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>> ---
>>   MAINTAINERS                 |   9 +-
>>   accel/hvf/hvf-all.c         |  56 +++++
>>   accel/hvf/hvf-cpus.c        | 468 ++++++++++++++++++++++++++++++++++++
>>   accel/hvf/meson.build       |   7 +
>>   accel/meson.build           |   1 +
>>   include/sysemu/hvf_int.h    |  69 ++++++
>>   target/i386/hvf/hvf-cpus.c  | 131 ----------
>>   target/i386/hvf/hvf-cpus.h  |  25 --
>>   target/i386/hvf/hvf-i386.h  |  48 +---
>>   target/i386/hvf/hvf.c       | 360 +--------------------------
>>   target/i386/hvf/meson.build |   1 -
>>   target/i386/hvf/x86hvf.c    |  11 +-
>>   target/i386/hvf/x86hvf.h    |   2 -
>>   13 files changed, 619 insertions(+), 569 deletions(-)
>>   create mode 100644 accel/hvf/hvf-all.c
>>   create mode 100644 accel/hvf/hvf-cpus.c
>>   create mode 100644 accel/hvf/meson.build
>>   create mode 100644 include/sysemu/hvf_int.h
>>   delete mode 100644 target/i386/hvf/hvf-cpus.c
>>   delete mode 100644 target/i386/hvf/hvf-cpus.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 68bc160f41..ca4b6d9279 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com>
>>   M: Roman Bolshakov <r.bolshakov@yadro.com>
>>   W: https://wiki.qemu.org/Features/HVF
>>   S: Maintained
>> -F: accel/stubs/hvf-stub.c
> There was a patch for that in the RFC series from Claudio.


Yeah, I'm not worried about this hunk :).


>
>>   F: target/i386/hvf/
>> +
>> +HVF
>> +M: Cameron Esfahani <dirty@apple.com>
>> +M: Roman Bolshakov <r.bolshakov@yadro.com>
>> +W: https://wiki.qemu.org/Features/HVF
>> +S: Maintained
>> +F: accel/hvf/
>>   F: include/sysemu/hvf.h
>> +F: include/sysemu/hvf_int.h
>>   
>>   WHPX CPUs
>>   M: Sunil Muthuswamy <sunilmut@microsoft.com>
>> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
>> new file mode 100644
>> index 0000000000..47d77a472a
>> --- /dev/null
>> +++ b/accel/hvf/hvf-all.c
>> @@ -0,0 +1,56 @@
>> +/*
>> + * QEMU Hypervisor.framework support
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Contributions after 2012-01-13 are licensed under the terms of the
>> + * GNU GPL, version 2 or (at your option) any later version.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu-common.h"
>> +#include "qemu/error-report.h"
>> +#include "sysemu/hvf.h"
>> +#include "sysemu/hvf_int.h"
>> +#include "sysemu/runstate.h"
>> +
>> +#include "qemu/main-loop.h"
>> +#include "sysemu/accel.h"
>> +
>> +#include <Hypervisor/Hypervisor.h>
>> +
>> +bool hvf_allowed;
>> +HVFState *hvf_state;
>> +
>> +void assert_hvf_ok(hv_return_t ret)
>> +{
>> +    if (ret == HV_SUCCESS) {
>> +        return;
>> +    }
>> +
>> +    switch (ret) {
>> +    case HV_ERROR:
>> +        error_report("Error: HV_ERROR");
>> +        break;
>> +    case HV_BUSY:
>> +        error_report("Error: HV_BUSY");
>> +        break;
>> +    case HV_BAD_ARGUMENT:
>> +        error_report("Error: HV_BAD_ARGUMENT");
>> +        break;
>> +    case HV_NO_RESOURCES:
>> +        error_report("Error: HV_NO_RESOURCES");
>> +        break;
>> +    case HV_NO_DEVICE:
>> +        error_report("Error: HV_NO_DEVICE");
>> +        break;
>> +    case HV_UNSUPPORTED:
>> +        error_report("Error: HV_UNSUPPORTED");
>> +        break;
>> +    default:
>> +        error_report("Unknown Error");
>> +    }
>> +
>> +    abort();
>> +}
>> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
>> new file mode 100644
>> index 0000000000..f9bb5502b7
>> --- /dev/null
>> +++ b/accel/hvf/hvf-cpus.c
>> @@ -0,0 +1,468 @@
>> +/*
>> + * Copyright 2008 IBM Corporation
>> + *           2008 Red Hat, Inc.
>> + * Copyright 2011 Intel Corporation
>> + * Copyright 2016 Veertu, Inc.
>> + * Copyright 2017 The Android Open Source Project
>> + *
>> + * QEMU Hypervisor.framework support
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of version 2 of the GNU General Public
>> + * License as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * This file contain code under public domain from the hvdos project:
>> + * https://github.com/mist64/hvdos
>> + *
>> + * Parts Copyright (c) 2011 NetApp, Inc.
>> + * All rights reserved.
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> + * are met:
>> + * 1. Redistributions of source code must retain the above copyright
>> + *    notice, this list of conditions and the following disclaimer.
>> + * 2. Redistributions in binary form must reproduce the above copyright
>> + *    notice, this list of conditions and the following disclaimer in the
>> + *    documentation and/or other materials provided with the distribution.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>> + * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
>> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>> + * SUCH DAMAGE.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/main-loop.h"
>> +#include "exec/address-spaces.h"
>> +#include "exec/exec-all.h"
>> +#include "sysemu/cpus.h"
>> +#include "sysemu/hvf.h"
>> +#include "sysemu/hvf_int.h"
>> +#include "sysemu/runstate.h"
>> +#include "qemu/guest-random.h"
>> +
>> +#include <Hypervisor/Hypervisor.h>
>> +
>> +/* Memory slots */
>> +
>> +struct mac_slot {
>> +    int present;
>> +    uint64_t size;
>> +    uint64_t gpa_start;
>> +    uint64_t gva;
>> +};
>> +
>> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>> +{
>> +    hvf_slot *slot;
>> +    int x;
>> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>> +        slot = &hvf_state->slots[x];
>> +        if (slot->size && start < (slot->start + slot->size) &&
>> +            (start + size) > slot->start) {
>> +            return slot;
>> +        }
>> +    }
>> +    return NULL;
>> +}
>> +
>> +struct mac_slot mac_slots[32];
>> +
>> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
>> +{
>> +    struct mac_slot *macslot;
>> +    hv_return_t ret;
>> +
>> +    macslot = &mac_slots[slot->slot_id];
>> +
>> +    if (macslot->present) {
>> +        if (macslot->size != slot->size) {
>> +            macslot->present = 0;
>> +            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
>> +            assert_hvf_ok(ret);
>> +        }
>> +    }
>> +
>> +    if (!slot->size) {
>> +        return 0;
>> +    }
>> +
>> +    macslot->present = 1;
>> +    macslot->gpa_start = slot->start;
>> +    macslot->size = slot->size;
>> +    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
>> +    assert_hvf_ok(ret);
>> +    return 0;
>> +}
>> +
>> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>> +{
>> +    hvf_slot *mem;
>> +    MemoryRegion *area = section->mr;
>> +    bool writeable = !area->readonly && !area->rom_device;
>> +    hv_memory_flags_t flags;
>> +
>> +    if (!memory_region_is_ram(area)) {
>> +        if (writeable) {
>> +            return;
>> +        } else if (!memory_region_is_romd(area)) {
>> +            /*
>> +             * If the memory device is not in romd_mode, then we actually want
>> +             * to remove the hvf memory slot so all accesses will trap.
>> +             */
>> +             add = false;
>> +        }
>> +    }
>> +
>> +    mem = hvf_find_overlap_slot(
>> +            section->offset_within_address_space,
>> +            int128_get64(section->size));
>> +
>> +    if (mem && add) {
>> +        if (mem->size == int128_get64(section->size) &&
>> +            mem->start == section->offset_within_address_space &&
>> +            mem->mem == (memory_region_get_ram_ptr(area) +
>> +            section->offset_within_region)) {
>> +            return; /* Same region was attempted to register, go away. */
>> +        }
>> +    }
>> +
>> +    /* Region needs to be reset. set the size to 0 and remap it. */
>> +    if (mem) {
>> +        mem->size = 0;
>> +        if (do_hvf_set_memory(mem, 0)) {
>> +            error_report("Failed to reset overlapping slot");
>> +            abort();
>> +        }
>> +    }
>> +
>> +    if (!add) {
>> +        return;
>> +    }
>> +
>> +    if (area->readonly ||
>> +        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
>> +        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>> +    } else {
>> +        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
>> +    }
>> +
>> +    /* Now make a new slot. */
>> +    int x;
>> +
>> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>> +        mem = &hvf_state->slots[x];
>> +        if (!mem->size) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (x == hvf_state->num_slots) {
>> +        error_report("No free slots");
>> +        abort();
>> +    }
>> +
>> +    mem->size = int128_get64(section->size);
>> +    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
>> +    mem->start = section->offset_within_address_space;
>> +    mem->region = area;
>> +
>> +    if (do_hvf_set_memory(mem, flags)) {
>> +        error_report("Error registering new memory slot");
>> +        abort();
>> +    }
>> +}
>> +
>> +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
>> +{
>> +    hvf_slot *slot;
>> +
>> +    slot = hvf_find_overlap_slot(
>> +            section->offset_within_address_space,
>> +            int128_get64(section->size));
>> +
>> +    /* protect region against writes; begin tracking it */
>> +    if (on) {
>> +        slot->flags |= HVF_SLOT_LOG;
>> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>> +                      HV_MEMORY_READ);
>> +    /* stop tracking region*/
>> +    } else {
>> +        slot->flags &= ~HVF_SLOT_LOG;
>> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>> +                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>> +    }
>> +}
>> +
>> +static void hvf_log_start(MemoryListener *listener,
>> +                          MemoryRegionSection *section, int old, int new)
>> +{
>> +    if (old != 0) {
>> +        return;
>> +    }
>> +
>> +    hvf_set_dirty_tracking(section, 1);
>> +}
>> +
>> +static void hvf_log_stop(MemoryListener *listener,
>> +                         MemoryRegionSection *section, int old, int new)
>> +{
>> +    if (new != 0) {
>> +        return;
>> +    }
>> +
>> +    hvf_set_dirty_tracking(section, 0);
>> +}
>> +
>> +static void hvf_log_sync(MemoryListener *listener,
>> +                         MemoryRegionSection *section)
>> +{
>> +    /*
>> +     * sync of dirty pages is handled elsewhere; just make sure we keep
>> +     * tracking the region.
>> +     */
>> +    hvf_set_dirty_tracking(section, 1);
>> +}
>> +
>> +static void hvf_region_add(MemoryListener *listener,
>> +                           MemoryRegionSection *section)
>> +{
>> +    hvf_set_phys_mem(section, true);
>> +}
>> +
>> +static void hvf_region_del(MemoryListener *listener,
>> +                           MemoryRegionSection *section)
>> +{
>> +    hvf_set_phys_mem(section, false);
>> +}
>> +
>> +static MemoryListener hvf_memory_listener = {
>> +    .priority = 10,
>> +    .region_add = hvf_region_add,
>> +    .region_del = hvf_region_del,
>> +    .log_start = hvf_log_start,
>> +    .log_stop = hvf_log_stop,
>> +    .log_sync = hvf_log_sync,
>> +};
>> +
>> +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>> +{
>> +    if (!cpu->vcpu_dirty) {
>> +        hvf_get_registers(cpu);
>> +        cpu->vcpu_dirty = true;
>> +    }
>> +}
>> +
>> +static void hvf_cpu_synchronize_state(CPUState *cpu)
>> +{
>> +    if (!cpu->vcpu_dirty) {
>> +        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
>> +    }
>> +}
>> +
>> +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>> +                                              run_on_cpu_data arg)
>> +{
>> +    hvf_put_registers(cpu);
>> +    cpu->vcpu_dirty = false;
>> +}
>> +
>> +static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>> +{
>> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
>> +}
>> +
>> +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>> +                                             run_on_cpu_data arg)
>> +{
>> +    hvf_put_registers(cpu);
>> +    cpu->vcpu_dirty = false;
>> +}
>> +
>> +static void hvf_cpu_synchronize_post_init(CPUState *cpu)
>> +{
>> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
>> +}
>> +
>> +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>> +                                              run_on_cpu_data arg)
>> +{
>> +    cpu->vcpu_dirty = true;
>> +}
>> +
>> +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>> +{
>> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
>> +}
>> +
>> +static void hvf_vcpu_destroy(CPUState *cpu)
>> +{
>> +    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
>> +    assert_hvf_ok(ret);
>> +
>> +    hvf_arch_vcpu_destroy(cpu);
>> +}
>> +
>> +static void dummy_signal(int sig)
>> +{
>> +}
>> +
>> +static int hvf_init_vcpu(CPUState *cpu)
>> +{
>> +    int r;
>> +
>> +    /* init cpu signals */
>> +    sigset_t set;
>> +    struct sigaction sigact;
>> +
>> +    memset(&sigact, 0, sizeof(sigact));
>> +    sigact.sa_handler = dummy_signal;
>> +    sigaction(SIG_IPI, &sigact, NULL);
>> +
>> +    pthread_sigmask(SIG_BLOCK, NULL, &set);
>> +    sigdelset(&set, SIG_IPI);
>> +
>> +#ifdef __aarch64__
>> +    r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL);
>> +#else
>> +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>> +#endif
> I think the first __aarch64__ bit fits better to arm part of the series.


Oops. Thanks for catching it! Yes, absolutely. It should be part of the 
ARM enablement.


>
>> +    cpu->vcpu_dirty = 1;
>> +    assert_hvf_ok(r);
>> +
>> +    return hvf_arch_init_vcpu(cpu);
>> +}
>> +
>> +/*
>> + * The HVF-specific vCPU thread function. This one should only run when the host
>> + * CPU supports the VMX "unrestricted guest" feature.
>> + */
>> +static void *hvf_cpu_thread_fn(void *arg)
>> +{
>> +    CPUState *cpu = arg;
>> +
>> +    int r;
>> +
>> +    assert(hvf_enabled());
>> +
>> +    rcu_register_thread();
>> +
>> +    qemu_mutex_lock_iothread();
>> +    qemu_thread_get_self(cpu->thread);
>> +
>> +    cpu->thread_id = qemu_get_thread_id();
>> +    cpu->can_do_io = 1;
>> +    current_cpu = cpu;
>> +
>> +    hvf_init_vcpu(cpu);
>> +
>> +    /* signal CPU creation */
>> +    cpu_thread_signal_created(cpu);
>> +    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>> +
>> +    do {
>> +        if (cpu_can_run(cpu)) {
>> +            r = hvf_vcpu_exec(cpu);
>> +            if (r == EXCP_DEBUG) {
>> +                cpu_handle_guest_debug(cpu);
>> +            }
>> +        }
>> +        qemu_wait_io_event(cpu);
>> +    } while (!cpu->unplug || cpu_can_run(cpu));
>> +
>> +    hvf_vcpu_destroy(cpu);
>> +    cpu_thread_signal_destroyed(cpu);
>> +    qemu_mutex_unlock_iothread();
>> +    rcu_unregister_thread();
>> +    return NULL;
>> +}
>> +
>> +static void hvf_start_vcpu_thread(CPUState *cpu)
>> +{
>> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>> +
>> +    /*
>> +     * HVF currently does not support TCG, and only runs in
>> +     * unrestricted-guest mode.
>> +     */
>> +    assert(hvf_enabled());
>> +
>> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>> +    qemu_cond_init(cpu->halt_cond);
>> +
>> +    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>> +             cpu->cpu_index);
>> +    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
>> +                       cpu, QEMU_THREAD_JOINABLE);
>> +}
>> +
>> +static const CpusAccel hvf_cpus = {
>> +    .create_vcpu_thread = hvf_start_vcpu_thread,
>> +
>> +    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>> +    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>> +    .synchronize_state = hvf_cpu_synchronize_state,
>> +    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>> +};
>> +
>> +static int hvf_accel_init(MachineState *ms)
>> +{
>> +    int x;
>> +    hv_return_t ret;
>> +    HVFState *s;
>> +
>> +    ret = hv_vm_create(HV_VM_DEFAULT);
>> +    assert_hvf_ok(ret);
>> +
>> +    s = g_new0(HVFState, 1);
>> +
>> +    s->num_slots = 32;
>> +    for (x = 0; x < s->num_slots; ++x) {
>> +        s->slots[x].size = 0;
>> +        s->slots[x].slot_id = x;
>> +    }
>> +
>> +    hvf_state = s;
>> +    memory_listener_register(&hvf_memory_listener, &address_space_memory);
>> +    cpus_register_accel(&hvf_cpus);
>> +    return 0;
>> +}
>> +
>> +static void hvf_accel_class_init(ObjectClass *oc, void *data)
>> +{
>> +    AccelClass *ac = ACCEL_CLASS(oc);
>> +    ac->name = "HVF";
>> +    ac->init_machine = hvf_accel_init;
>> +    ac->allowed = &hvf_allowed;
>> +}
>> +
>> +static const TypeInfo hvf_accel_type = {
>> +    .name = TYPE_HVF_ACCEL,
>> +    .parent = TYPE_ACCEL,
>> +    .class_init = hvf_accel_class_init,
>> +};
>> +
>> +static void hvf_type_init(void)
>> +{
>> +    type_register_static(&hvf_accel_type);
>> +}
>> +
>> +type_init(hvf_type_init);
>> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
>> new file mode 100644
>> index 0000000000..dfd6b68dc7
>> --- /dev/null
>> +++ b/accel/hvf/meson.build
>> @@ -0,0 +1,7 @@
>> +hvf_ss = ss.source_set()
>> +hvf_ss.add(files(
>> +  'hvf-all.c',
>> +  'hvf-cpus.c',
>> +))
>> +
>> +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
>> diff --git a/accel/meson.build b/accel/meson.build
>> index b26cca227a..6de12ce5d5 100644
>> --- a/accel/meson.build
>> +++ b/accel/meson.build
>> @@ -1,5 +1,6 @@
>>   softmmu_ss.add(files('accel.c'))
>>   
>> +subdir('hvf')
>>   subdir('qtest')
>>   subdir('kvm')
>>   subdir('tcg')
>> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
>> new file mode 100644
>> index 0000000000..de9bad23a8
>> --- /dev/null
>> +++ b/include/sysemu/hvf_int.h
>> @@ -0,0 +1,69 @@
>> +/*
>> + * QEMU Hypervisor.framework (HVF) support
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +/* header to be included in HVF-specific code */
>> +
>> +#ifndef HVF_INT_H
>> +#define HVF_INT_H
>> +
>> +#include <Hypervisor/Hypervisor.h>
>> +
>> +#define HVF_MAX_VCPU 0x10
>> +
>> +extern struct hvf_state hvf_global;
>> +
>> +struct hvf_vm {
>> +    int id;
>> +    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>> +};
>> +
>> +struct hvf_state {
>> +    uint32_t version;
>> +    struct hvf_vm *vm;
>> +    uint64_t mem_quota;
>> +};
>> +
>> +/* hvf_slot flags */
>> +#define HVF_SLOT_LOG (1 << 0)
>> +
>> +typedef struct hvf_slot {
>> +    uint64_t start;
>> +    uint64_t size;
>> +    uint8_t *mem;
>> +    int slot_id;
>> +    uint32_t flags;
>> +    MemoryRegion *region;
>> +} hvf_slot;
>> +
>> +typedef struct hvf_vcpu_caps {
>> +    uint64_t vmx_cap_pinbased;
>> +    uint64_t vmx_cap_procbased;
>> +    uint64_t vmx_cap_procbased2;
>> +    uint64_t vmx_cap_entry;
>> +    uint64_t vmx_cap_exit;
>> +    uint64_t vmx_cap_preemption_timer;
>> +} hvf_vcpu_caps;
>> +
>> +struct HVFState {
>> +    AccelState parent;
>> +    hvf_slot slots[32];
>> +    int num_slots;
>> +
>> +    hvf_vcpu_caps *hvf_caps;
>> +};
>> +extern HVFState *hvf_state;
>> +
>> +void assert_hvf_ok(hv_return_t ret);
>> +int hvf_get_registers(CPUState *cpu);
>> +int hvf_put_registers(CPUState *cpu);
>> +int hvf_arch_init_vcpu(CPUState *cpu);
>> +void hvf_arch_vcpu_destroy(CPUState *cpu);
>> +int hvf_vcpu_exec(CPUState *cpu);
>> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>> +
>> +#endif
>> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
>> deleted file mode 100644
>> index 817b3d7452..0000000000
>> --- a/target/i386/hvf/hvf-cpus.c
>> +++ /dev/null
>> @@ -1,131 +0,0 @@
>> -/*
>> - * Copyright 2008 IBM Corporation
>> - *           2008 Red Hat, Inc.
>> - * Copyright 2011 Intel Corporation
>> - * Copyright 2016 Veertu, Inc.
>> - * Copyright 2017 The Android Open Source Project
>> - *
>> - * QEMU Hypervisor.framework support
>> - *
>> - * This program is free software; you can redistribute it and/or
>> - * modify it under the terms of version 2 of the GNU General Public
>> - * License as published by the Free Software Foundation.
>> - *
>> - * This program is distributed in the hope that it will be useful,
>> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> - * General Public License for more details.
>> - *
>> - * You should have received a copy of the GNU General Public License
>> - * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> - *
>> - * This file contain code under public domain from the hvdos project:
>> - * https://github.com/mist64/hvdos
>> - *
>> - * Parts Copyright (c) 2011 NetApp, Inc.
>> - * All rights reserved.
>> - *
>> - * Redistribution and use in source and binary forms, with or without
>> - * modification, are permitted provided that the following conditions
>> - * are met:
>> - * 1. Redistributions of source code must retain the above copyright
>> - *    notice, this list of conditions and the following disclaimer.
>> - * 2. Redistributions in binary form must reproduce the above copyright
>> - *    notice, this list of conditions and the following disclaimer in the
>> - *    documentation and/or other materials provided with the distribution.
>> - *
>> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>> - * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
>> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>> - * SUCH DAMAGE.
>> - */
>> -
>> -#include "qemu/osdep.h"
>> -#include "qemu/error-report.h"
>> -#include "qemu/main-loop.h"
>> -#include "sysemu/hvf.h"
>> -#include "sysemu/runstate.h"
>> -#include "target/i386/cpu.h"
>> -#include "qemu/guest-random.h"
>> -
>> -#include "hvf-cpus.h"
>> -
>> -/*
>> - * The HVF-specific vCPU thread function. This one should only run when the host
>> - * CPU supports the VMX "unrestricted guest" feature.
>> - */
>> -static void *hvf_cpu_thread_fn(void *arg)
>> -{
>> -    CPUState *cpu = arg;
>> -
>> -    int r;
>> -
>> -    assert(hvf_enabled());
>> -
>> -    rcu_register_thread();
>> -
>> -    qemu_mutex_lock_iothread();
>> -    qemu_thread_get_self(cpu->thread);
>> -
>> -    cpu->thread_id = qemu_get_thread_id();
>> -    cpu->can_do_io = 1;
>> -    current_cpu = cpu;
>> -
>> -    hvf_init_vcpu(cpu);
>> -
>> -    /* signal CPU creation */
>> -    cpu_thread_signal_created(cpu);
>> -    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>> -
>> -    do {
>> -        if (cpu_can_run(cpu)) {
>> -            r = hvf_vcpu_exec(cpu);
>> -            if (r == EXCP_DEBUG) {
>> -                cpu_handle_guest_debug(cpu);
>> -            }
>> -        }
>> -        qemu_wait_io_event(cpu);
>> -    } while (!cpu->unplug || cpu_can_run(cpu));
>> -
>> -    hvf_vcpu_destroy(cpu);
>> -    cpu_thread_signal_destroyed(cpu);
>> -    qemu_mutex_unlock_iothread();
>> -    rcu_unregister_thread();
>> -    return NULL;
>> -}
>> -
>> -static void hvf_start_vcpu_thread(CPUState *cpu)
>> -{
>> -    char thread_name[VCPU_THREAD_NAME_SIZE];
>> -
>> -    /*
>> -     * HVF currently does not support TCG, and only runs in
>> -     * unrestricted-guest mode.
>> -     */
>> -    assert(hvf_enabled());
>> -
>> -    cpu->thread = g_malloc0(sizeof(QemuThread));
>> -    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>> -    qemu_cond_init(cpu->halt_cond);
>> -
>> -    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>> -             cpu->cpu_index);
>> -    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
>> -                       cpu, QEMU_THREAD_JOINABLE);
>> -}
>> -
>> -const CpusAccel hvf_cpus = {
>> -    .create_vcpu_thread = hvf_start_vcpu_thread,
>> -
>> -    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>> -    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>> -    .synchronize_state = hvf_cpu_synchronize_state,
>> -    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>> -};
>> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h
>> deleted file mode 100644
>> index ced31b82c0..0000000000
>> --- a/target/i386/hvf/hvf-cpus.h
>> +++ /dev/null
>> @@ -1,25 +0,0 @@
>> -/*
>> - * Accelerator CPUS Interface
>> - *
>> - * Copyright 2020 SUSE LLC
>> - *
>> - * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> - * See the COPYING file in the top-level directory.
>> - */
>> -
>> -#ifndef HVF_CPUS_H
>> -#define HVF_CPUS_H
>> -
>> -#include "sysemu/cpus.h"
>> -
>> -extern const CpusAccel hvf_cpus;
>> -
>> -int hvf_init_vcpu(CPUState *);
>> -int hvf_vcpu_exec(CPUState *);
>> -void hvf_cpu_synchronize_state(CPUState *);
>> -void hvf_cpu_synchronize_post_reset(CPUState *);
>> -void hvf_cpu_synchronize_post_init(CPUState *);
>> -void hvf_cpu_synchronize_pre_loadvm(CPUState *);
>> -void hvf_vcpu_destroy(CPUState *);
>> -
>> -#endif /* HVF_CPUS_H */
>> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
>> index e0edffd077..6d56f8f6bb 100644
>> --- a/target/i386/hvf/hvf-i386.h
>> +++ b/target/i386/hvf/hvf-i386.h
>> @@ -18,57 +18,11 @@
>>   
>>   #include "sysemu/accel.h"
>>   #include "sysemu/hvf.h"
>> +#include "sysemu/hvf_int.h"
>>   #include "cpu.h"
>>   #include "x86.h"
>>   
>> -#define HVF_MAX_VCPU 0x10
>> -
>> -extern struct hvf_state hvf_global;
>> -
>> -struct hvf_vm {
>> -    int id;
>> -    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>> -};
>> -
>> -struct hvf_state {
>> -    uint32_t version;
>> -    struct hvf_vm *vm;
>> -    uint64_t mem_quota;
>> -};
>> -
>> -/* hvf_slot flags */
>> -#define HVF_SLOT_LOG (1 << 0)
>> -
>> -typedef struct hvf_slot {
>> -    uint64_t start;
>> -    uint64_t size;
>> -    uint8_t *mem;
>> -    int slot_id;
>> -    uint32_t flags;
>> -    MemoryRegion *region;
>> -} hvf_slot;
>> -
>> -typedef struct hvf_vcpu_caps {
>> -    uint64_t vmx_cap_pinbased;
>> -    uint64_t vmx_cap_procbased;
>> -    uint64_t vmx_cap_procbased2;
>> -    uint64_t vmx_cap_entry;
>> -    uint64_t vmx_cap_exit;
>> -    uint64_t vmx_cap_preemption_timer;
>> -} hvf_vcpu_caps;
>> -
>> -struct HVFState {
>> -    AccelState parent;
>> -    hvf_slot slots[32];
>> -    int num_slots;
>> -
>> -    hvf_vcpu_caps *hvf_caps;
>> -};
>> -extern HVFState *hvf_state;
>> -
>> -void hvf_set_phys_mem(MemoryRegionSection *, bool);
>>   void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
>> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>>   
>>   #ifdef NEED_CPU_H
>>   /* Functions exported to host specific mode */
>> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
>> index ed9356565c..8b96ecd619 100644
>> --- a/target/i386/hvf/hvf.c
>> +++ b/target/i386/hvf/hvf.c
>> @@ -51,6 +51,7 @@
>>   #include "qemu/error-report.h"
>>   
>>   #include "sysemu/hvf.h"
>> +#include "sysemu/hvf_int.h"
>>   #include "sysemu/runstate.h"
>>   #include "hvf-i386.h"
>>   #include "vmcs.h"
>> @@ -72,171 +73,6 @@
>>   #include "sysemu/accel.h"
>>   #include "target/i386/cpu.h"
>>   
>> -#include "hvf-cpus.h"
>> -
>> -HVFState *hvf_state;
>> -
>> -static void assert_hvf_ok(hv_return_t ret)
>> -{
>> -    if (ret == HV_SUCCESS) {
>> -        return;
>> -    }
>> -
>> -    switch (ret) {
>> -    case HV_ERROR:
>> -        error_report("Error: HV_ERROR");
>> -        break;
>> -    case HV_BUSY:
>> -        error_report("Error: HV_BUSY");
>> -        break;
>> -    case HV_BAD_ARGUMENT:
>> -        error_report("Error: HV_BAD_ARGUMENT");
>> -        break;
>> -    case HV_NO_RESOURCES:
>> -        error_report("Error: HV_NO_RESOURCES");
>> -        break;
>> -    case HV_NO_DEVICE:
>> -        error_report("Error: HV_NO_DEVICE");
>> -        break;
>> -    case HV_UNSUPPORTED:
>> -        error_report("Error: HV_UNSUPPORTED");
>> -        break;
>> -    default:
>> -        error_report("Unknown Error");
>> -    }
>> -
>> -    abort();
>> -}
>> -
>> -/* Memory slots */
>> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>> -{
>> -    hvf_slot *slot;
>> -    int x;
>> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>> -        slot = &hvf_state->slots[x];
>> -        if (slot->size && start < (slot->start + slot->size) &&
>> -            (start + size) > slot->start) {
>> -            return slot;
>> -        }
>> -    }
>> -    return NULL;
>> -}
>> -
>> -struct mac_slot {
>> -    int present;
>> -    uint64_t size;
>> -    uint64_t gpa_start;
>> -    uint64_t gva;
>> -};
>> -
>> -struct mac_slot mac_slots[32];
>> -
>> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
>> -{
>> -    struct mac_slot *macslot;
>> -    hv_return_t ret;
>> -
>> -    macslot = &mac_slots[slot->slot_id];
>> -
>> -    if (macslot->present) {
>> -        if (macslot->size != slot->size) {
>> -            macslot->present = 0;
>> -            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
>> -            assert_hvf_ok(ret);
>> -        }
>> -    }
>> -
>> -    if (!slot->size) {
>> -        return 0;
>> -    }
>> -
>> -    macslot->present = 1;
>> -    macslot->gpa_start = slot->start;
>> -    macslot->size = slot->size;
>> -    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
>> -    assert_hvf_ok(ret);
>> -    return 0;
>> -}
>> -
>> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>> -{
>> -    hvf_slot *mem;
>> -    MemoryRegion *area = section->mr;
>> -    bool writeable = !area->readonly && !area->rom_device;
>> -    hv_memory_flags_t flags;
>> -
>> -    if (!memory_region_is_ram(area)) {
>> -        if (writeable) {
>> -            return;
>> -        } else if (!memory_region_is_romd(area)) {
>> -            /*
>> -             * If the memory device is not in romd_mode, then we actually want
>> -             * to remove the hvf memory slot so all accesses will trap.
>> -             */
>> -             add = false;
>> -        }
>> -    }
>> -
>> -    mem = hvf_find_overlap_slot(
>> -            section->offset_within_address_space,
>> -            int128_get64(section->size));
>> -
>> -    if (mem && add) {
>> -        if (mem->size == int128_get64(section->size) &&
>> -            mem->start == section->offset_within_address_space &&
>> -            mem->mem == (memory_region_get_ram_ptr(area) +
>> -            section->offset_within_region)) {
>> -            return; /* Same region was attempted to register, go away. */
>> -        }
>> -    }
>> -
>> -    /* Region needs to be reset. set the size to 0 and remap it. */
>> -    if (mem) {
>> -        mem->size = 0;
>> -        if (do_hvf_set_memory(mem, 0)) {
>> -            error_report("Failed to reset overlapping slot");
>> -            abort();
>> -        }
>> -    }
>> -
>> -    if (!add) {
>> -        return;
>> -    }
>> -
>> -    if (area->readonly ||
>> -        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
>> -        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>> -    } else {
>> -        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
>> -    }
>> -
>> -    /* Now make a new slot. */
>> -    int x;
>> -
>> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>> -        mem = &hvf_state->slots[x];
>> -        if (!mem->size) {
>> -            break;
>> -        }
>> -    }
>> -
>> -    if (x == hvf_state->num_slots) {
>> -        error_report("No free slots");
>> -        abort();
>> -    }
>> -
>> -    mem->size = int128_get64(section->size);
>> -    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
>> -    mem->start = section->offset_within_address_space;
>> -    mem->region = area;
>> -
>> -    if (do_hvf_set_memory(mem, flags)) {
>> -        error_report("Error registering new memory slot");
>> -        abort();
>> -    }
>> -}
>> -
>>   void vmx_update_tpr(CPUState *cpu)
>>   {
>>       /* TODO: need integrate APIC handling */
>> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
>>       }
>>   }
>>   
>> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>> -{
>> -    if (!cpu->vcpu_dirty) {
>> -        hvf_get_registers(cpu);
>> -        cpu->vcpu_dirty = true;
>> -    }
>> -}
>> -
>> -void hvf_cpu_synchronize_state(CPUState *cpu)
>> -{
>> -    if (!cpu->vcpu_dirty) {
>> -        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
>> -    }
>> -}
>> -
>> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>> -                                              run_on_cpu_data arg)
>> -{
>> -    hvf_put_registers(cpu);
>> -    cpu->vcpu_dirty = false;
>> -}
>> -
>> -void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>> -{
>> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
>> -}
>> -
>> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>> -                                             run_on_cpu_data arg)
>> -{
>> -    hvf_put_registers(cpu);
>> -    cpu->vcpu_dirty = false;
>> -}
>> -
>> -void hvf_cpu_synchronize_post_init(CPUState *cpu)
>> -{
>> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
>> -}
>> -
>> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>> -                                              run_on_cpu_data arg)
>> -{
>> -    cpu->vcpu_dirty = true;
>> -}
>> -
>> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>> -{
>> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
>> -}
>> -
>>   static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
>>   {
>>       int read, write;
>> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
>>       return false;
>>   }
>>   
>> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
>> -{
>> -    hvf_slot *slot;
>> -
>> -    slot = hvf_find_overlap_slot(
>> -            section->offset_within_address_space,
>> -            int128_get64(section->size));
>> -
>> -    /* protect region against writes; begin tracking it */
>> -    if (on) {
>> -        slot->flags |= HVF_SLOT_LOG;
>> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>> -                      HV_MEMORY_READ);
>> -    /* stop tracking region*/
>> -    } else {
>> -        slot->flags &= ~HVF_SLOT_LOG;
>> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>> -                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>> -    }
>> -}
>> -
>> -static void hvf_log_start(MemoryListener *listener,
>> -                          MemoryRegionSection *section, int old, int new)
>> -{
>> -    if (old != 0) {
>> -        return;
>> -    }
>> -
>> -    hvf_set_dirty_tracking(section, 1);
>> -}
>> -
>> -static void hvf_log_stop(MemoryListener *listener,
>> -                         MemoryRegionSection *section, int old, int new)
>> -{
>> -    if (new != 0) {
>> -        return;
>> -    }
>> -
>> -    hvf_set_dirty_tracking(section, 0);
>> -}
>> -
>> -static void hvf_log_sync(MemoryListener *listener,
>> -                         MemoryRegionSection *section)
>> -{
>> -    /*
>> -     * sync of dirty pages is handled elsewhere; just make sure we keep
>> -     * tracking the region.
>> -     */
>> -    hvf_set_dirty_tracking(section, 1);
>> -}
>> -
>> -static void hvf_region_add(MemoryListener *listener,
>> -                           MemoryRegionSection *section)
>> -{
>> -    hvf_set_phys_mem(section, true);
>> -}
>> -
>> -static void hvf_region_del(MemoryListener *listener,
>> -                           MemoryRegionSection *section)
>> -{
>> -    hvf_set_phys_mem(section, false);
>> -}
>> -
>> -static MemoryListener hvf_memory_listener = {
>> -    .priority = 10,
>> -    .region_add = hvf_region_add,
>> -    .region_del = hvf_region_del,
>> -    .log_start = hvf_log_start,
>> -    .log_stop = hvf_log_stop,
>> -    .log_sync = hvf_log_sync,
>> -};
>> -
>> -void hvf_vcpu_destroy(CPUState *cpu)
>> +void hvf_arch_vcpu_destroy(CPUState *cpu)
>>   {
>>       X86CPU *x86_cpu = X86_CPU(cpu);
>>       CPUX86State *env = &x86_cpu->env;
>>   
>> -    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
>>       g_free(env->hvf_mmio_buf);
>> -    assert_hvf_ok(ret);
>> -}
>> -
>> -static void dummy_signal(int sig)
>> -{
>>   }
>>   
>> -int hvf_init_vcpu(CPUState *cpu)
>> +int hvf_arch_init_vcpu(CPUState *cpu)
>>   {
>>   
>>       X86CPU *x86cpu = X86_CPU(cpu);
>>       CPUX86State *env = &x86cpu->env;
>> -    int r;
>> -
>> -    /* init cpu signals */
>> -    sigset_t set;
>> -    struct sigaction sigact;
>> -
>> -    memset(&sigact, 0, sizeof(sigact));
>> -    sigact.sa_handler = dummy_signal;
>> -    sigaction(SIG_IPI, &sigact, NULL);
>> -
>> -    pthread_sigmask(SIG_BLOCK, NULL, &set);
>> -    sigdelset(&set, SIG_IPI);
>>   
>>       init_emu();
>>       init_decoder();
>> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu)
>>       hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1);
>>       env->hvf_mmio_buf = g_new(char, 4096);
>>   
>> -    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>> -    cpu->vcpu_dirty = 1;
>> -    assert_hvf_ok(r);
>> -
>>       if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
>>           &hvf_state->hvf_caps->vmx_cap_pinbased)) {
>>           abort();
>> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu)
>>   
>>       return ret;
>>   }
>> -
>> -bool hvf_allowed;
>> -
>> -static int hvf_accel_init(MachineState *ms)
>> -{
>> -    int x;
>> -    hv_return_t ret;
>> -    HVFState *s;
>> -
>> -    ret = hv_vm_create(HV_VM_DEFAULT);
>> -    assert_hvf_ok(ret);
>> -
>> -    s = g_new0(HVFState, 1);
>> -
>> -    s->num_slots = 32;
>> -    for (x = 0; x < s->num_slots; ++x) {
>> -        s->slots[x].size = 0;
>> -        s->slots[x].slot_id = x;
>> -    }
>> -
>> -    hvf_state = s;
>> -    memory_listener_register(&hvf_memory_listener, &address_space_memory);
>> -    cpus_register_accel(&hvf_cpus);
>> -    return 0;
>> -}
>> -
>> -static void hvf_accel_class_init(ObjectClass *oc, void *data)
>> -{
>> -    AccelClass *ac = ACCEL_CLASS(oc);
>> -    ac->name = "HVF";
>> -    ac->init_machine = hvf_accel_init;
>> -    ac->allowed = &hvf_allowed;
>> -}
>> -
>> -static const TypeInfo hvf_accel_type = {
>> -    .name = TYPE_HVF_ACCEL,
>> -    .parent = TYPE_ACCEL,
>> -    .class_init = hvf_accel_class_init,
>> -};
>> -
>> -static void hvf_type_init(void)
>> -{
>> -    type_register_static(&hvf_accel_type);
>> -}
>> -
>> -type_init(hvf_type_init);
>> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
>> index 409c9a3f14..c8a43717ee 100644
>> --- a/target/i386/hvf/meson.build
>> +++ b/target/i386/hvf/meson.build
>> @@ -1,6 +1,5 @@
>>   i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
>>     'hvf.c',
>> -  'hvf-cpus.c',
>>     'x86.c',
>>     'x86_cpuid.c',
>>     'x86_decode.c',
>> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
>> index bbec412b6c..89b8e9d87a 100644
>> --- a/target/i386/hvf/x86hvf.c
>> +++ b/target/i386/hvf/x86hvf.c
>> @@ -20,6 +20,9 @@
>>   #include "qemu/osdep.h"
>>   
>>   #include "qemu-common.h"
>> +#include "sysemu/hvf.h"
>> +#include "sysemu/hvf_int.h"
>> +#include "sysemu/hw_accel.h"
>>   #include "x86hvf.h"
>>   #include "vmx.h"
>>   #include "vmcs.h"
>> @@ -32,8 +35,6 @@
>>   #include <Hypervisor/hv.h>
>>   #include <Hypervisor/hv_vmx.h>
>>   
>> -#include "hvf-cpus.h"
>> -
>>   void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
>>                        SegmentCache *qseg, bool is_tr)
>>   {
>> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state)
>>       env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
>>   
>>       if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
>> -        hvf_cpu_synchronize_state(cpu_state);
>> +        cpu_synchronize_state(cpu_state);
>>           do_cpu_init(cpu);
>>       }
>>   
>> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state)
>>           cpu_state->halted = 0;
>>       }
>>       if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
>> -        hvf_cpu_synchronize_state(cpu_state);
>> +        cpu_synchronize_state(cpu_state);
>>           do_cpu_sipi(cpu);
>>       }
>>       if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
>>           cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
>> -        hvf_cpu_synchronize_state(cpu_state);
>> +        cpu_synchronize_state(cpu_state);
> The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should
> be a separate patch. They follow the cpu/accel cleanups Claudio was doing
> this summer.


The only reason they're in here is because we no longer have access to 
the hvf_ functions from the file. I am perfectly happy to rebase the 
patch on top of Claudio's if his goes in first. I'm sure it'll be 
trivial for him to rebase on top of this too if my series goes in first.
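
To illustrate why the generic call is enough: a minimal, self-contained
sketch of the dispatch, assuming the hvf_ helpers become private to
accel/hvf/ and are only reachable through the registered ops table. The
types below are simplified stand-ins, not the real QEMU definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for QEMU's CPUState and CpusAccel. */
typedef struct CPUState {
    bool vcpu_dirty;
} CPUState;

typedef struct CpusAccel {
    void (*synchronize_state)(CPUState *cpu);
} CpusAccel;

static const CpusAccel *cpus_accel;

/* Accelerator-neutral registration, called once at accel init. */
void cpus_register_accel(const CpusAccel *ops)
{
    assert(ops != NULL);
    cpus_accel = ops;
}

/* Generic entry point that target code can call. */
void cpu_synchronize_state(CPUState *cpu)
{
    if (cpus_accel && cpus_accel->synchronize_state) {
        cpus_accel->synchronize_state(cpu);
    }
}

/* Accelerator-private hook: static, so target code cannot name it. */
static void hvf_cpu_synchronize_state(CPUState *cpu)
{
    if (!cpu->vcpu_dirty) {
        /* The real code would fetch the registers from the vCPU here. */
        cpu->vcpu_dirty = true;
    }
}

const CpusAccel hvf_cpus = {
    .synchronize_state = hvf_cpu_synchronize_state,
};
```

With this shape, x86hvf.c only needs cpu_synchronize_state() and never has
to see the hvf_-prefixed helper.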


>
> Philippe raised the idea that the patch might go ahead of the ARM-specific
> part (which might involve some discussions) and I agree with that.
>
> Some sync between Claudio's series (CC'd him) and the patch might be needed.


I would prefer not to hold back because of the sync. Claudio's cleanup 
is trivial enough to adjust for if it gets merged ahead of this.


Alex
Frank Yang Nov. 27, 2020, 11:30 p.m. UTC | #3
Hi all,

+Peter Collingbourne <pcc@google.com>

I'm a developer on the Android Emulator, which is in a fork of QEMU.

Peter and I have been working on an HVF Apple Silicon backend with an eye
toward Android guests.

We have already gotten things to the point of basically switching to Android
userspace (logcat/shell and graphics are available, at least).

Our strategy so far has been to import logic from the KVM implementation
and hook into QEMU's software devices that were previously assumed to work
only with TCG, or that have KVM-specific paths.

Thanks to Alexander for the tip on the 36-bit address space limitation, btw;
our way of addressing this is to still allow highmem but not place the PCI
high MMIO window so high.
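
As a rough illustration of the constraint (not code from our patches), a
guest-physical window has to end at or below 2^36 to be mappable. The helper
name and the addresses in the usage note are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the non-empty range [base, base + size) lies below 2^ipa_bits. */
bool range_fits_ipa(uint64_t base, uint64_t size, unsigned ipa_bits)
{
    assert(ipa_bits > 0 && ipa_bits < 64);
    uint64_t limit = UINT64_C(1) << ipa_bits;

    /* Compare against limit - base so that base + size cannot overflow. */
    return size != 0 && base < limit && size <= limit - base;
}
```

For instance, range_fits_ipa(0x8000000000, 0x1000, 36) is false because a
window based at 512 GiB lies above the 64 GiB limit, while a window that
ends exactly at 2^36 still passes.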

Also, note we have a sleep/signal based mechanism to deal with WFx, which
might be worth looking into in Alexander's implementation as well:

https://android-review.googlesource.com/c/platform/external/qemu/+/1512551

Patches so far, FYI:

https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1
https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3
https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3
https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3
https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3

https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a
https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b
https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01
https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228
https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102
https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6

Peter has also noticed that extra steps are needed on M1s for TCG to
work, as it involves JIT:

https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9

We'd appreciate any feedback/comments :)

Best,

Frank

On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf <agraf@csgraf.de> wrote:

>
> On 27.11.20 21:00, Roman Bolshakov wrote:
> > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote:
> >> Until now, Hypervisor.framework has only been available on x86_64 systems.
> >> With Apple Silicon shipping now, it extends its reach to aarch64. To
> >> prepare for support for multiple architectures, let's move common code out
> >> into its own accel directory.
> >>
> >> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> >> ---
> >>   MAINTAINERS                 |   9 +-
> >>   accel/hvf/hvf-all.c         |  56 +++++
> >>   accel/hvf/hvf-cpus.c        | 468 ++++++++++++++++++++++++++++++++++++
> >>   accel/hvf/meson.build       |   7 +
> >>   accel/meson.build           |   1 +
> >>   include/sysemu/hvf_int.h    |  69 ++++++
> >>   target/i386/hvf/hvf-cpus.c  | 131 ----------
> >>   target/i386/hvf/hvf-cpus.h  |  25 --
> >>   target/i386/hvf/hvf-i386.h  |  48 +---
> >>   target/i386/hvf/hvf.c       | 360 +--------------------------
> >>   target/i386/hvf/meson.build |   1 -
> >>   target/i386/hvf/x86hvf.c    |  11 +-
> >>   target/i386/hvf/x86hvf.h    |   2 -
> >>   13 files changed, 619 insertions(+), 569 deletions(-)
> >>   create mode 100644 accel/hvf/hvf-all.c
> >>   create mode 100644 accel/hvf/hvf-cpus.c
> >>   create mode 100644 accel/hvf/meson.build
> >>   create mode 100644 include/sysemu/hvf_int.h
> >>   delete mode 100644 target/i386/hvf/hvf-cpus.c
> >>   delete mode 100644 target/i386/hvf/hvf-cpus.h
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 68bc160f41..ca4b6d9279 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com>
> >>   M: Roman Bolshakov <r.bolshakov@yadro.com>
> >>   W: https://wiki.qemu.org/Features/HVF
> >>   S: Maintained
> >> -F: accel/stubs/hvf-stub.c
> > There was a patch for that in the RFC series from Claudio.
>
>
> Yeah, I'm not worried about this hunk :).
>
>
> >
> >>   F: target/i386/hvf/
> >> +
> >> +HVF
> >> +M: Cameron Esfahani <dirty@apple.com>
> >> +M: Roman Bolshakov <r.bolshakov@yadro.com>
> >> +W: https://wiki.qemu.org/Features/HVF
> >> +S: Maintained
> >> +F: accel/hvf/
> >>   F: include/sysemu/hvf.h
> >> +F: include/sysemu/hvf_int.h
> >>
> >>   WHPX CPUs
> >>   M: Sunil Muthuswamy <sunilmut@microsoft.com>
> >> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
> >> new file mode 100644
> >> index 0000000000..47d77a472a
> >> --- /dev/null
> >> +++ b/accel/hvf/hvf-all.c
> >> @@ -0,0 +1,56 @@
> >> +/*
> >> + * QEMU Hypervisor.framework support
> >> + *
> >> + * This work is licensed under the terms of the GNU GPL, version 2. See
> >> + * the COPYING file in the top-level directory.
> >> + *
> >> + * Contributions after 2012-01-13 are licensed under the terms of the
> >> + * GNU GPL, version 2 or (at your option) any later version.
> >> + */
> >> +
> >> +#include "qemu/osdep.h"
> >> +#include "qemu-common.h"
> >> +#include "qemu/error-report.h"
> >> +#include "sysemu/hvf.h"
> >> +#include "sysemu/hvf_int.h"
> >> +#include "sysemu/runstate.h"
> >> +
> >> +#include "qemu/main-loop.h"
> >> +#include "sysemu/accel.h"
> >> +
> >> +#include <Hypervisor/Hypervisor.h>
> >> +
> >> +bool hvf_allowed;
> >> +HVFState *hvf_state;
> >> +
> >> +void assert_hvf_ok(hv_return_t ret)
> >> +{
> >> +    if (ret == HV_SUCCESS) {
> >> +        return;
> >> +    }
> >> +
> >> +    switch (ret) {
> >> +    case HV_ERROR:
> >> +        error_report("Error: HV_ERROR");
> >> +        break;
> >> +    case HV_BUSY:
> >> +        error_report("Error: HV_BUSY");
> >> +        break;
> >> +    case HV_BAD_ARGUMENT:
> >> +        error_report("Error: HV_BAD_ARGUMENT");
> >> +        break;
> >> +    case HV_NO_RESOURCES:
> >> +        error_report("Error: HV_NO_RESOURCES");
> >> +        break;
> >> +    case HV_NO_DEVICE:
> >> +        error_report("Error: HV_NO_DEVICE");
> >> +        break;
> >> +    case HV_UNSUPPORTED:
> >> +        error_report("Error: HV_UNSUPPORTED");
> >> +        break;
> >> +    default:
> >> +        error_report("Unknown Error");
> >> +    }
> >> +
> >> +    abort();
> >> +}
> >> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
> >> new file mode 100644
> >> index 0000000000..f9bb5502b7
> >> --- /dev/null
> >> +++ b/accel/hvf/hvf-cpus.c
> >> @@ -0,0 +1,468 @@
> >> +/*
> >> + * Copyright 2008 IBM Corporation
> >> + *           2008 Red Hat, Inc.
> >> + * Copyright 2011 Intel Corporation
> >> + * Copyright 2016 Veertu, Inc.
> >> + * Copyright 2017 The Android Open Source Project
> >> + *
> >> + * QEMU Hypervisor.framework support
> >> + *
> >> + * This program is free software; you can redistribute it and/or
> >> + * modify it under the terms of version 2 of the GNU General Public
> >> + * License as published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> >> + * General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> >> + *
> >> + * This file contain code under public domain from the hvdos project:
> >> + * https://github.com/mist64/hvdos
> >> + *
> >> + * Parts Copyright (c) 2011 NetApp, Inc.
> >> + * All rights reserved.
> >> + *
> >> + * Redistribution and use in source and binary forms, with or without
> >> + * modification, are permitted provided that the following conditions
> >> + * are met:
> >> + * 1. Redistributions of source code must retain the above copyright
> >> + *    notice, this list of conditions and the following disclaimer.
> >> + * 2. Redistributions in binary form must reproduce the above copyright
> >> + *    notice, this list of conditions and the following disclaimer in the
> >> + *    documentation and/or other materials provided with the distribution.
> >> + *
> >> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
> >> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> >> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> >> + * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
> >> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> >> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> >> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> >> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> >> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> >> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> >> + * SUCH DAMAGE.
> >> + */
> >> +
> >> +#include "qemu/osdep.h"
> >> +#include "qemu/error-report.h"
> >> +#include "qemu/main-loop.h"
> >> +#include "exec/address-spaces.h"
> >> +#include "exec/exec-all.h"
> >> +#include "sysemu/cpus.h"
> >> +#include "sysemu/hvf.h"
> >> +#include "sysemu/hvf_int.h"
> >> +#include "sysemu/runstate.h"
> >> +#include "qemu/guest-random.h"
> >> +
> >> +#include <Hypervisor/Hypervisor.h>
> >> +
> >> +/* Memory slots */
> >> +
> >> +struct mac_slot {
> >> +    int present;
> >> +    uint64_t size;
> >> +    uint64_t gpa_start;
> >> +    uint64_t gva;
> >> +};
> >> +
> >> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
> >> +{
> >> +    hvf_slot *slot;
> >> +    int x;
> >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
> >> +        slot = &hvf_state->slots[x];
> >> +        if (slot->size && start < (slot->start + slot->size) &&
> >> +            (start + size) > slot->start) {
> >> +            return slot;
> >> +        }
> >> +    }
> >> +    return NULL;
> >> +}
> >> +
> >> +struct mac_slot mac_slots[32];
> >> +
> >> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
> >> +{
> >> +    struct mac_slot *macslot;
> >> +    hv_return_t ret;
> >> +
> >> +    macslot = &mac_slots[slot->slot_id];
> >> +
> >> +    if (macslot->present) {
> >> +        if (macslot->size != slot->size) {
> >> +            macslot->present = 0;
> >> +            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
> >> +            assert_hvf_ok(ret);
> >> +        }
> >> +    }
> >> +
> >> +    if (!slot->size) {
> >> +        return 0;
> >> +    }
> >> +
> >> +    macslot->present = 1;
> >> +    macslot->gpa_start = slot->start;
> >> +    macslot->size = slot->size;
> >> +    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
> >> +    assert_hvf_ok(ret);
> >> +    return 0;
> >> +}
> >> +
> >> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
> >> +{
> >> +    hvf_slot *mem;
> >> +    MemoryRegion *area = section->mr;
> >> +    bool writeable = !area->readonly && !area->rom_device;
> >> +    hv_memory_flags_t flags;
> >> +
> >> +    if (!memory_region_is_ram(area)) {
> >> +        if (writeable) {
> >> +            return;
> >> +        } else if (!memory_region_is_romd(area)) {
> >> +            /*
> >> +             * If the memory device is not in romd_mode, then we actually want
> >> +             * to remove the hvf memory slot so all accesses will trap.
> >> +             */
> >> +             add = false;
> >> +        }
> >> +    }
> >> +
> >> +    mem = hvf_find_overlap_slot(
> >> +            section->offset_within_address_space,
> >> +            int128_get64(section->size));
> >> +
> >> +    if (mem && add) {
> >> +        if (mem->size == int128_get64(section->size) &&
> >> +            mem->start == section->offset_within_address_space &&
> >> +            mem->mem == (memory_region_get_ram_ptr(area) +
> >> +            section->offset_within_region)) {
> >> +            return; /* Same region was attempted to register, go away. */
> >> +        }
> >> +    }
> >> +
> >> +    /* Region needs to be reset. set the size to 0 and remap it. */
> >> +    if (mem) {
> >> +        mem->size = 0;
> >> +        if (do_hvf_set_memory(mem, 0)) {
> >> +            error_report("Failed to reset overlapping slot");
> >> +            abort();
> >> +        }
> >> +    }
> >> +
> >> +    if (!add) {
> >> +        return;
> >> +    }
> >> +
> >> +    if (area->readonly ||
> >> +        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
> >> +        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
> >> +    } else {
> >> +        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
> >> +    }
> >> +
> >> +    /* Now make a new slot. */
> >> +    int x;
> >> +
> >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
> >> +        mem = &hvf_state->slots[x];
> >> +        if (!mem->size) {
> >> +            break;
> >> +        }
> >> +    }
> >> +
> >> +    if (x == hvf_state->num_slots) {
> >> +        error_report("No free slots");
> >> +        abort();
> >> +    }
> >> +
> >> +    mem->size = int128_get64(section->size);
> >> +    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
> >> +    mem->start = section->offset_within_address_space;
> >> +    mem->region = area;
> >> +
> >> +    if (do_hvf_set_memory(mem, flags)) {
> >> +        error_report("Error registering new memory slot");
> >> +        abort();
> >> +    }
> >> +}
> >> +
> >> +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
> >> +{
> >> +    hvf_slot *slot;
> >> +
> >> +    slot = hvf_find_overlap_slot(
> >> +            section->offset_within_address_space,
> >> +            int128_get64(section->size));
> >> +
> >> +    /* protect region against writes; begin tracking it */
> >> +    if (on) {
> >> +        slot->flags |= HVF_SLOT_LOG;
> >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
> >> +                      HV_MEMORY_READ);
> >> +    /* stop tracking region*/
> >> +    } else {
> >> +        slot->flags &= ~HVF_SLOT_LOG;
> >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
> >> +                      HV_MEMORY_READ | HV_MEMORY_WRITE);
> >> +    }
> >> +}
> >> +
> >> +static void hvf_log_start(MemoryListener *listener,
> >> +                          MemoryRegionSection *section, int old, int new)
> >> +{
> >> +    if (old != 0) {
> >> +        return;
> >> +    }
> >> +
> >> +    hvf_set_dirty_tracking(section, 1);
> >> +}
> >> +
> >> +static void hvf_log_stop(MemoryListener *listener,
> >> +                         MemoryRegionSection *section, int old, int new)
> >> +{
> >> +    if (new != 0) {
> >> +        return;
> >> +    }
> >> +
> >> +    hvf_set_dirty_tracking(section, 0);
> >> +}
> >> +
> >> +static void hvf_log_sync(MemoryListener *listener,
> >> +                         MemoryRegionSection *section)
> >> +{
> >> +    /*
> >> +     * sync of dirty pages is handled elsewhere; just make sure we keep
> >> +     * tracking the region.
> >> +     */
> >> +    hvf_set_dirty_tracking(section, 1);
> >> +}
> >> +
> >> +static void hvf_region_add(MemoryListener *listener,
> >> +                           MemoryRegionSection *section)
> >> +{
> >> +    hvf_set_phys_mem(section, true);
> >> +}
> >> +
> >> +static void hvf_region_del(MemoryListener *listener,
> >> +                           MemoryRegionSection *section)
> >> +{
> >> +    hvf_set_phys_mem(section, false);
> >> +}
> >> +
> >> +static MemoryListener hvf_memory_listener = {
> >> +    .priority = 10,
> >> +    .region_add = hvf_region_add,
> >> +    .region_del = hvf_region_del,
> >> +    .log_start = hvf_log_start,
> >> +    .log_stop = hvf_log_stop,
> >> +    .log_sync = hvf_log_sync,
> >> +};
> >> +
> >> +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
> >> +{
> >> +    if (!cpu->vcpu_dirty) {
> >> +        hvf_get_registers(cpu);
> >> +        cpu->vcpu_dirty = true;
> >> +    }
> >> +}
> >> +
> >> +static void hvf_cpu_synchronize_state(CPUState *cpu)
> >> +{
> >> +    if (!cpu->vcpu_dirty) {
> >> +        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
> >> +    }
> >> +}
> >> +
> >> +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
> >> +                                              run_on_cpu_data arg)
> >> +{
> >> +    hvf_put_registers(cpu);
> >> +    cpu->vcpu_dirty = false;
> >> +}
> >> +
> >> +static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
> >> +{
> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
> >> +}
> >> +
> >> +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
> >> +                                             run_on_cpu_data arg)
> >> +{
> >> +    hvf_put_registers(cpu);
> >> +    cpu->vcpu_dirty = false;
> >> +}
> >> +
> >> +static void hvf_cpu_synchronize_post_init(CPUState *cpu)
> >> +{
> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
> >> +}
> >> +
> >> +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
> >> +                                              run_on_cpu_data arg)
> >> +{
> >> +    cpu->vcpu_dirty = true;
> >> +}
> >> +
> >> +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
> >> +{
> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
> >> +}
> >> +
> >> +static void hvf_vcpu_destroy(CPUState *cpu)
> >> +{
> >> +    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
> >> +    assert_hvf_ok(ret);
> >> +
> >> +    hvf_arch_vcpu_destroy(cpu);
> >> +}
> >> +
> >> +static void dummy_signal(int sig)
> >> +{
> >> +}
> >> +
> >> +static int hvf_init_vcpu(CPUState *cpu)
> >> +{
> >> +    int r;
> >> +
> >> +    /* init cpu signals */
> >> +    sigset_t set;
> >> +    struct sigaction sigact;
> >> +
> >> +    memset(&sigact, 0, sizeof(sigact));
> >> +    sigact.sa_handler = dummy_signal;
> >> +    sigaction(SIG_IPI, &sigact, NULL);
> >> +
> >> +    pthread_sigmask(SIG_BLOCK, NULL, &set);
> >> +    sigdelset(&set, SIG_IPI);
> >> +
> >> +#ifdef __aarch64__
> >> +    r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL);
> >> +#else
> >> +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
> >> +#endif
> > I think the first __aarch64__ bit fits better in the ARM part of the series.
>
>
> Oops. Thanks for catching it! Yes, absolutely. It should be part of the
> ARM enablement.
>
>
> >
> >> +    cpu->vcpu_dirty = 1;
> >> +    assert_hvf_ok(r);
> >> +
> >> +    return hvf_arch_init_vcpu(cpu);
> >> +}
> >> +
> >> +/*
> >> + * The HVF-specific vCPU thread function. This one should only run when the host
> >> + * CPU supports the VMX "unrestricted guest" feature.
> >> + */
> >> +static void *hvf_cpu_thread_fn(void *arg)
> >> +{
> >> +    CPUState *cpu = arg;
> >> +
> >> +    int r;
> >> +
> >> +    assert(hvf_enabled());
> >> +
> >> +    rcu_register_thread();
> >> +
> >> +    qemu_mutex_lock_iothread();
> >> +    qemu_thread_get_self(cpu->thread);
> >> +
> >> +    cpu->thread_id = qemu_get_thread_id();
> >> +    cpu->can_do_io = 1;
> >> +    current_cpu = cpu;
> >> +
> >> +    hvf_init_vcpu(cpu);
> >> +
> >> +    /* signal CPU creation */
> >> +    cpu_thread_signal_created(cpu);
> >> +    qemu_guest_random_seed_thread_part2(cpu->random_seed);
> >> +
> >> +    do {
> >> +        if (cpu_can_run(cpu)) {
> >> +            r = hvf_vcpu_exec(cpu);
> >> +            if (r == EXCP_DEBUG) {
> >> +                cpu_handle_guest_debug(cpu);
> >> +            }
> >> +        }
> >> +        qemu_wait_io_event(cpu);
> >> +    } while (!cpu->unplug || cpu_can_run(cpu));
> >> +
> >> +    hvf_vcpu_destroy(cpu);
> >> +    cpu_thread_signal_destroyed(cpu);
> >> +    qemu_mutex_unlock_iothread();
> >> +    rcu_unregister_thread();
> >> +    return NULL;
> >> +}
> >> +
> >> +static void hvf_start_vcpu_thread(CPUState *cpu)
> >> +{
> >> +    char thread_name[VCPU_THREAD_NAME_SIZE];
> >> +
> >> +    /*
> >> +     * HVF currently does not support TCG, and only runs in
> >> +     * unrestricted-guest mode.
> >> +     */
> >> +    assert(hvf_enabled());
> >> +
> >> +    cpu->thread = g_malloc0(sizeof(QemuThread));
> >> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
> >> +    qemu_cond_init(cpu->halt_cond);
> >> +
> >> +    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
> >> +             cpu->cpu_index);
> >> +    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
> >> +                       cpu, QEMU_THREAD_JOINABLE);
> >> +}
> >> +
> >> +static const CpusAccel hvf_cpus = {
> >> +    .create_vcpu_thread = hvf_start_vcpu_thread,
> >> +
> >> +    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
> >> +    .synchronize_post_init = hvf_cpu_synchronize_post_init,
> >> +    .synchronize_state = hvf_cpu_synchronize_state,
> >> +    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
> >> +};
> >> +
> >> +static int hvf_accel_init(MachineState *ms)
> >> +{
> >> +    int x;
> >> +    hv_return_t ret;
> >> +    HVFState *s;
> >> +
> >> +    ret = hv_vm_create(HV_VM_DEFAULT);
> >> +    assert_hvf_ok(ret);
> >> +
> >> +    s = g_new0(HVFState, 1);
> >> +
> >> +    s->num_slots = 32;
> >> +    for (x = 0; x < s->num_slots; ++x) {
> >> +        s->slots[x].size = 0;
> >> +        s->slots[x].slot_id = x;
> >> +    }
> >> +
> >> +    hvf_state = s;
> >> +    memory_listener_register(&hvf_memory_listener, &address_space_memory);
> >> +    cpus_register_accel(&hvf_cpus);
> >> +    return 0;
> >> +}
> >> +
> >> +static void hvf_accel_class_init(ObjectClass *oc, void *data)
> >> +{
> >> +    AccelClass *ac = ACCEL_CLASS(oc);
> >> +    ac->name = "HVF";
> >> +    ac->init_machine = hvf_accel_init;
> >> +    ac->allowed = &hvf_allowed;
> >> +}
> >> +
> >> +static const TypeInfo hvf_accel_type = {
> >> +    .name = TYPE_HVF_ACCEL,
> >> +    .parent = TYPE_ACCEL,
> >> +    .class_init = hvf_accel_class_init,
> >> +};
> >> +
> >> +static void hvf_type_init(void)
> >> +{
> >> +    type_register_static(&hvf_accel_type);
> >> +}
> >> +
> >> +type_init(hvf_type_init);
> >> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
> >> new file mode 100644
> >> index 0000000000..dfd6b68dc7
> >> --- /dev/null
> >> +++ b/accel/hvf/meson.build
> >> @@ -0,0 +1,7 @@
> >> +hvf_ss = ss.source_set()
> >> +hvf_ss.add(files(
> >> +  'hvf-all.c',
> >> +  'hvf-cpus.c',
> >> +))
> >> +
> >> +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
> >> diff --git a/accel/meson.build b/accel/meson.build
> >> index b26cca227a..6de12ce5d5 100644
> >> --- a/accel/meson.build
> >> +++ b/accel/meson.build
> >> @@ -1,5 +1,6 @@
> >>   softmmu_ss.add(files('accel.c'))
> >>
> >> +subdir('hvf')
> >>   subdir('qtest')
> >>   subdir('kvm')
> >>   subdir('tcg')
> >> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
> >> new file mode 100644
> >> index 0000000000..de9bad23a8
> >> --- /dev/null
> >> +++ b/include/sysemu/hvf_int.h
> >> @@ -0,0 +1,69 @@
> >> +/*
> >> + * QEMU Hypervisor.framework (HVF) support
> >> + *
> >> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >> + * See the COPYING file in the top-level directory.
> >> + *
> >> + */
> >> +
> >> +/* header to be included in HVF-specific code */
> >> +
> >> +#ifndef HVF_INT_H
> >> +#define HVF_INT_H
> >> +
> >> +#include <Hypervisor/Hypervisor.h>
> >> +
> >> +#define HVF_MAX_VCPU 0x10
> >> +
> >> +extern struct hvf_state hvf_global;
> >> +
> >> +struct hvf_vm {
> >> +    int id;
> >> +    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
> >> +};
> >> +
> >> +struct hvf_state {
> >> +    uint32_t version;
> >> +    struct hvf_vm *vm;
> >> +    uint64_t mem_quota;
> >> +};
> >> +
> >> +/* hvf_slot flags */
> >> +#define HVF_SLOT_LOG (1 << 0)
> >> +
> >> +typedef struct hvf_slot {
> >> +    uint64_t start;
> >> +    uint64_t size;
> >> +    uint8_t *mem;
> >> +    int slot_id;
> >> +    uint32_t flags;
> >> +    MemoryRegion *region;
> >> +} hvf_slot;
> >> +
> >> +typedef struct hvf_vcpu_caps {
> >> +    uint64_t vmx_cap_pinbased;
> >> +    uint64_t vmx_cap_procbased;
> >> +    uint64_t vmx_cap_procbased2;
> >> +    uint64_t vmx_cap_entry;
> >> +    uint64_t vmx_cap_exit;
> >> +    uint64_t vmx_cap_preemption_timer;
> >> +} hvf_vcpu_caps;
> >> +
> >> +struct HVFState {
> >> +    AccelState parent;
> >> +    hvf_slot slots[32];
> >> +    int num_slots;
> >> +
> >> +    hvf_vcpu_caps *hvf_caps;
> >> +};
> >> +extern HVFState *hvf_state;
> >> +
> >> +void assert_hvf_ok(hv_return_t ret);
> >> +int hvf_get_registers(CPUState *cpu);
> >> +int hvf_put_registers(CPUState *cpu);
> >> +int hvf_arch_init_vcpu(CPUState *cpu);
> >> +void hvf_arch_vcpu_destroy(CPUState *cpu);
> >> +int hvf_vcpu_exec(CPUState *cpu);
> >> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
> >> +
> >> +#endif
> >> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
> >> deleted file mode 100644
> >> index 817b3d7452..0000000000
> >> --- a/target/i386/hvf/hvf-cpus.c
> >> +++ /dev/null
> >> @@ -1,131 +0,0 @@
> >> -/*
> >> - * Copyright 2008 IBM Corporation
> >> - *           2008 Red Hat, Inc.
> >> - * Copyright 2011 Intel Corporation
> >> - * Copyright 2016 Veertu, Inc.
> >> - * Copyright 2017 The Android Open Source Project
> >> - *
> >> - * QEMU Hypervisor.framework support
> >> - *
> >> - * This program is free software; you can redistribute it and/or
> >> - * modify it under the terms of version 2 of the GNU General Public
> >> - * License as published by the Free Software Foundation.
> >> - *
> >> - * This program is distributed in the hope that it will be useful,
> >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> >> - * General Public License for more details.
> >> - *
> >> - * You should have received a copy of the GNU General Public License
> >> - * along with this program; if not, see <http://www.gnu.org/licenses/>.
> >> - *
> >> - * This file contain code under public domain from the hvdos project:
> >> - * https://github.com/mist64/hvdos
> >> - *
> >> - * Parts Copyright (c) 2011 NetApp, Inc.
> >> - * All rights reserved.
> >> - *
> >> - * Redistribution and use in source and binary forms, with or without
> >> - * modification, are permitted provided that the following conditions
> >> - * are met:
> >> - * 1. Redistributions of source code must retain the above copyright
> >> - *    notice, this list of conditions and the following disclaimer.
> >> - * 2. Redistributions in binary form must reproduce the above copyright
> >> - *    notice, this list of conditions and the following disclaimer in the
> >> - *    documentation and/or other materials provided with the distribution.
> >> - *
> >> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
> >> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> >> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> >> - * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
> >> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> >> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> >> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> >> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> >> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> >> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> >> - * SUCH DAMAGE.
> >> - */
> >> -
> >> -#include "qemu/osdep.h"
> >> -#include "qemu/error-report.h"
> >> -#include "qemu/main-loop.h"
> >> -#include "sysemu/hvf.h"
> >> -#include "sysemu/runstate.h"
> >> -#include "target/i386/cpu.h"
> >> -#include "qemu/guest-random.h"
> >> -
> >> -#include "hvf-cpus.h"
> >> -
> >> -/*
> >> - * The HVF-specific vCPU thread function. This one should only run when the host
> >> - * CPU supports the VMX "unrestricted guest" feature.
> >> - */
> >> -static void *hvf_cpu_thread_fn(void *arg)
> >> -{
> >> -    CPUState *cpu = arg;
> >> -
> >> -    int r;
> >> -
> >> -    assert(hvf_enabled());
> >> -
> >> -    rcu_register_thread();
> >> -
> >> -    qemu_mutex_lock_iothread();
> >> -    qemu_thread_get_self(cpu->thread);
> >> -
> >> -    cpu->thread_id = qemu_get_thread_id();
> >> -    cpu->can_do_io = 1;
> >> -    current_cpu = cpu;
> >> -
> >> -    hvf_init_vcpu(cpu);
> >> -
> >> -    /* signal CPU creation */
> >> -    cpu_thread_signal_created(cpu);
> >> -    qemu_guest_random_seed_thread_part2(cpu->random_seed);
> >> -
> >> -    do {
> >> -        if (cpu_can_run(cpu)) {
> >> -            r = hvf_vcpu_exec(cpu);
> >> -            if (r == EXCP_DEBUG) {
> >> -                cpu_handle_guest_debug(cpu);
> >> -            }
> >> -        }
> >> -        qemu_wait_io_event(cpu);
> >> -    } while (!cpu->unplug || cpu_can_run(cpu));
> >> -
> >> -    hvf_vcpu_destroy(cpu);
> >> -    cpu_thread_signal_destroyed(cpu);
> >> -    qemu_mutex_unlock_iothread();
> >> -    rcu_unregister_thread();
> >> -    return NULL;
> >> -}
> >> -
> >> -static void hvf_start_vcpu_thread(CPUState *cpu)
> >> -{
> >> -    char thread_name[VCPU_THREAD_NAME_SIZE];
> >> -
> >> -    /*
> >> -     * HVF currently does not support TCG, and only runs in
> >> -     * unrestricted-guest mode.
> >> -     */
> >> -    assert(hvf_enabled());
> >> -
> >> -    cpu->thread = g_malloc0(sizeof(QemuThread));
> >> -    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
> >> -    qemu_cond_init(cpu->halt_cond);
> >> -
> >> -    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
> >> -             cpu->cpu_index);
> >> -    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
> >> -                       cpu, QEMU_THREAD_JOINABLE);
> >> -}
> >> -
> >> -const CpusAccel hvf_cpus = {
> >> -    .create_vcpu_thread = hvf_start_vcpu_thread,
> >> -
> >> -    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
> >> -    .synchronize_post_init = hvf_cpu_synchronize_post_init,
> >> -    .synchronize_state = hvf_cpu_synchronize_state,
> >> -    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
> >> -};
> >> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h
> >> deleted file mode 100644
> >> index ced31b82c0..0000000000
> >> --- a/target/i386/hvf/hvf-cpus.h
> >> +++ /dev/null
> >> @@ -1,25 +0,0 @@
> >> -/*
> >> - * Accelerator CPUS Interface
> >> - *
> >> - * Copyright 2020 SUSE LLC
> >> - *
> >> - * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >> - * See the COPYING file in the top-level directory.
> >> - */
> >> -
> >> -#ifndef HVF_CPUS_H
> >> -#define HVF_CPUS_H
> >> -
> >> -#include "sysemu/cpus.h"
> >> -
> >> -extern const CpusAccel hvf_cpus;
> >> -
> >> -int hvf_init_vcpu(CPUState *);
> >> -int hvf_vcpu_exec(CPUState *);
> >> -void hvf_cpu_synchronize_state(CPUState *);
> >> -void hvf_cpu_synchronize_post_reset(CPUState *);
> >> -void hvf_cpu_synchronize_post_init(CPUState *);
> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *);
> >> -void hvf_vcpu_destroy(CPUState *);
> >> -
> >> -#endif /* HVF_CPUS_H */
> >> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
> >> index e0edffd077..6d56f8f6bb 100644
> >> --- a/target/i386/hvf/hvf-i386.h
> >> +++ b/target/i386/hvf/hvf-i386.h
> >> @@ -18,57 +18,11 @@
> >>
> >>   #include "sysemu/accel.h"
> >>   #include "sysemu/hvf.h"
> >> +#include "sysemu/hvf_int.h"
> >>   #include "cpu.h"
> >>   #include "x86.h"
> >>
> >> -#define HVF_MAX_VCPU 0x10
> >> -
> >> -extern struct hvf_state hvf_global;
> >> -
> >> -struct hvf_vm {
> >> -    int id;
> >> -    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
> >> -};
> >> -
> >> -struct hvf_state {
> >> -    uint32_t version;
> >> -    struct hvf_vm *vm;
> >> -    uint64_t mem_quota;
> >> -};
> >> -
> >> -/* hvf_slot flags */
> >> -#define HVF_SLOT_LOG (1 << 0)
> >> -
> >> -typedef struct hvf_slot {
> >> -    uint64_t start;
> >> -    uint64_t size;
> >> -    uint8_t *mem;
> >> -    int slot_id;
> >> -    uint32_t flags;
> >> -    MemoryRegion *region;
> >> -} hvf_slot;
> >> -
> >> -typedef struct hvf_vcpu_caps {
> >> -    uint64_t vmx_cap_pinbased;
> >> -    uint64_t vmx_cap_procbased;
> >> -    uint64_t vmx_cap_procbased2;
> >> -    uint64_t vmx_cap_entry;
> >> -    uint64_t vmx_cap_exit;
> >> -    uint64_t vmx_cap_preemption_timer;
> >> -} hvf_vcpu_caps;
> >> -
> >> -struct HVFState {
> >> -    AccelState parent;
> >> -    hvf_slot slots[32];
> >> -    int num_slots;
> >> -
> >> -    hvf_vcpu_caps *hvf_caps;
> >> -};
> >> -extern HVFState *hvf_state;
> >> -
> >> -void hvf_set_phys_mem(MemoryRegionSection *, bool);
> >>   void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
> >> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
> >>
> >>   #ifdef NEED_CPU_H
> >>   /* Functions exported to host specific mode */
> >> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
> >> index ed9356565c..8b96ecd619 100644
> >> --- a/target/i386/hvf/hvf.c
> >> +++ b/target/i386/hvf/hvf.c
> >> @@ -51,6 +51,7 @@
> >>   #include "qemu/error-report.h"
> >>
> >>   #include "sysemu/hvf.h"
> >> +#include "sysemu/hvf_int.h"
> >>   #include "sysemu/runstate.h"
> >>   #include "hvf-i386.h"
> >>   #include "vmcs.h"
> >> @@ -72,171 +73,6 @@
> >>   #include "sysemu/accel.h"
> >>   #include "target/i386/cpu.h"
> >>
> >> -#include "hvf-cpus.h"
> >> -
> >> -HVFState *hvf_state;
> >> -
> >> -static void assert_hvf_ok(hv_return_t ret)
> >> -{
> >> -    if (ret == HV_SUCCESS) {
> >> -        return;
> >> -    }
> >> -
> >> -    switch (ret) {
> >> -    case HV_ERROR:
> >> -        error_report("Error: HV_ERROR");
> >> -        break;
> >> -    case HV_BUSY:
> >> -        error_report("Error: HV_BUSY");
> >> -        break;
> >> -    case HV_BAD_ARGUMENT:
> >> -        error_report("Error: HV_BAD_ARGUMENT");
> >> -        break;
> >> -    case HV_NO_RESOURCES:
> >> -        error_report("Error: HV_NO_RESOURCES");
> >> -        break;
> >> -    case HV_NO_DEVICE:
> >> -        error_report("Error: HV_NO_DEVICE");
> >> -        break;
> >> -    case HV_UNSUPPORTED:
> >> -        error_report("Error: HV_UNSUPPORTED");
> >> -        break;
> >> -    default:
> >> -        error_report("Unknown Error");
> >> -    }
> >> -
> >> -    abort();
> >> -}
> >> -
> >> -/* Memory slots */
> >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
> >> -{
> >> -    hvf_slot *slot;
> >> -    int x;
> >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
> >> -        slot = &hvf_state->slots[x];
> >> -        if (slot->size && start < (slot->start + slot->size) &&
> >> -            (start + size) > slot->start) {
> >> -            return slot;
> >> -        }
> >> -    }
> >> -    return NULL;
> >> -}
> >> -
> >> -struct mac_slot {
> >> -    int present;
> >> -    uint64_t size;
> >> -    uint64_t gpa_start;
> >> -    uint64_t gva;
> >> -};
> >> -
> >> -struct mac_slot mac_slots[32];
> >> -
> >> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
> >> -{
> >> -    struct mac_slot *macslot;
> >> -    hv_return_t ret;
> >> -
> >> -    macslot = &mac_slots[slot->slot_id];
> >> -
> >> -    if (macslot->present) {
> >> -        if (macslot->size != slot->size) {
> >> -            macslot->present = 0;
> >> -            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
> >> -            assert_hvf_ok(ret);
> >> -        }
> >> -    }
> >> -
> >> -    if (!slot->size) {
> >> -        return 0;
> >> -    }
> >> -
> >> -    macslot->present = 1;
> >> -    macslot->gpa_start = slot->start;
> >> -    macslot->size = slot->size;
> >> -    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
> >> -    assert_hvf_ok(ret);
> >> -    return 0;
> >> -}
> >> -
> >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
> >> -{
> >> -    hvf_slot *mem;
> >> -    MemoryRegion *area = section->mr;
> >> -    bool writeable = !area->readonly && !area->rom_device;
> >> -    hv_memory_flags_t flags;
> >> -
> >> -    if (!memory_region_is_ram(area)) {
> >> -        if (writeable) {
> >> -            return;
> >> -        } else if (!memory_region_is_romd(area)) {
> >> -            /*
> >> -             * If the memory device is not in romd_mode, then we actually want
> >> -             * to remove the hvf memory slot so all accesses will trap.
> >> -             */
> >> -             add = false;
> >> -        }
> >> -    }
> >> -
> >> -    mem = hvf_find_overlap_slot(
> >> -            section->offset_within_address_space,
> >> -            int128_get64(section->size));
> >> -
> >> -    if (mem && add) {
> >> -        if (mem->size == int128_get64(section->size) &&
> >> -            mem->start == section->offset_within_address_space &&
> >> -            mem->mem == (memory_region_get_ram_ptr(area) +
> >> -            section->offset_within_region)) {
> >> -            return; /* Same region was attempted to register, go away. */
> >> -        }
> >> -    }
> >> -
> >> -    /* Region needs to be reset. set the size to 0 and remap it. */
> >> -    if (mem) {
> >> -        mem->size = 0;
> >> -        if (do_hvf_set_memory(mem, 0)) {
> >> -            error_report("Failed to reset overlapping slot");
> >> -            abort();
> >> -        }
> >> -    }
> >> -
> >> -    if (!add) {
> >> -        return;
> >> -    }
> >> -
> >> -    if (area->readonly ||
> >> -        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
> >> -        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
> >> -    } else {
> >> -        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
> >> -    }
> >> -
> >> -    /* Now make a new slot. */
> >> -    int x;
> >> -
> >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
> >> -        mem = &hvf_state->slots[x];
> >> -        if (!mem->size) {
> >> -            break;
> >> -        }
> >> -    }
> >> -
> >> -    if (x == hvf_state->num_slots) {
> >> -        error_report("No free slots");
> >> -        abort();
> >> -    }
> >> -
> >> -    mem->size = int128_get64(section->size);
> >> -    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
> >> -    mem->start = section->offset_within_address_space;
> >> -    mem->region = area;
> >> -
> >> -    if (do_hvf_set_memory(mem, flags)) {
> >> -        error_report("Error registering new memory slot");
> >> -        abort();
> >> -    }
> >> -}
> >> -
> >>   void vmx_update_tpr(CPUState *cpu)
> >>   {
> >>       /* TODO: need integrate APIC handling */
> >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
> >>       }
> >>   }
> >>
> >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
> >> -{
> >> -    if (!cpu->vcpu_dirty) {
> >> -        hvf_get_registers(cpu);
> >> -        cpu->vcpu_dirty = true;
> >> -    }
> >> -}
> >> -
> >> -void hvf_cpu_synchronize_state(CPUState *cpu)
> >> -{
> >> -    if (!cpu->vcpu_dirty) {
> >> -        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
> >> -    }
> >> -}
> >> -
> >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
> >> -                                              run_on_cpu_data arg)
> >> -{
> >> -    hvf_put_registers(cpu);
> >> -    cpu->vcpu_dirty = false;
> >> -}
> >> -
> >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu)
> >> -{
> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
> >> -}
> >> -
> >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
> >> -                                             run_on_cpu_data arg)
> >> -{
> >> -    hvf_put_registers(cpu);
> >> -    cpu->vcpu_dirty = false;
> >> -}
> >> -
> >> -void hvf_cpu_synchronize_post_init(CPUState *cpu)
> >> -{
> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
> >> -}
> >> -
> >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
> >> -                                              run_on_cpu_data arg)
> >> -{
> >> -    cpu->vcpu_dirty = true;
> >> -}
> >> -
> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
> >> -{
> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
> >> -}
> >> -
> >>   static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
> >>   {
> >>       int read, write;
> >> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
> >>       return false;
> >>   }
> >>
> >> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool
> on)
> >> -{
> >> -    hvf_slot *slot;
> >> -
> >> -    slot = hvf_find_overlap_slot(
> >> -            section->offset_within_address_space,
> >> -            int128_get64(section->size));
> >> -
> >> -    /* protect region against writes; begin tracking it */
> >> -    if (on) {
> >> -        slot->flags |= HVF_SLOT_LOG;
> >> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
> >> -                      HV_MEMORY_READ);
> >> -    /* stop tracking region*/
> >> -    } else {
> >> -        slot->flags &= ~HVF_SLOT_LOG;
> >> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
> >> -                      HV_MEMORY_READ | HV_MEMORY_WRITE);
> >> -    }
> >> -}
> >> -
> >> -static void hvf_log_start(MemoryListener *listener,
> >> -                          MemoryRegionSection *section, int old, int
> new)
> >> -{
> >> -    if (old != 0) {
> >> -        return;
> >> -    }
> >> -
> >> -    hvf_set_dirty_tracking(section, 1);
> >> -}
> >> -
> >> -static void hvf_log_stop(MemoryListener *listener,
> >> -                         MemoryRegionSection *section, int old, int
> new)
> >> -{
> >> -    if (new != 0) {
> >> -        return;
> >> -    }
> >> -
> >> -    hvf_set_dirty_tracking(section, 0);
> >> -}
> >> -
> >> -static void hvf_log_sync(MemoryListener *listener,
> >> -                         MemoryRegionSection *section)
> >> -{
> >> -    /*
> >> -     * sync of dirty pages is handled elsewhere; just make sure we keep
> >> -     * tracking the region.
> >> -     */
> >> -    hvf_set_dirty_tracking(section, 1);
> >> -}
> >> -
> >> -static void hvf_region_add(MemoryListener *listener,
> >> -                           MemoryRegionSection *section)
> >> -{
> >> -    hvf_set_phys_mem(section, true);
> >> -}
> >> -
> >> -static void hvf_region_del(MemoryListener *listener,
> >> -                           MemoryRegionSection *section)
> >> -{
> >> -    hvf_set_phys_mem(section, false);
> >> -}
> >> -
> >> -static MemoryListener hvf_memory_listener = {
> >> -    .priority = 10,
> >> -    .region_add = hvf_region_add,
> >> -    .region_del = hvf_region_del,
> >> -    .log_start = hvf_log_start,
> >> -    .log_stop = hvf_log_stop,
> >> -    .log_sync = hvf_log_sync,
> >> -};
> >> -
> >> -void hvf_vcpu_destroy(CPUState *cpu)
> >> +void hvf_arch_vcpu_destroy(CPUState *cpu)
> >>   {
> >>       X86CPU *x86_cpu = X86_CPU(cpu);
> >>       CPUX86State *env = &x86_cpu->env;
> >>
> >> -    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
> >>       g_free(env->hvf_mmio_buf);
> >> -    assert_hvf_ok(ret);
> >> -}
> >> -
> >> -static void dummy_signal(int sig)
> >> -{
> >>   }
> >>
> >> -int hvf_init_vcpu(CPUState *cpu)
> >> +int hvf_arch_init_vcpu(CPUState *cpu)
> >>   {
> >>
> >>       X86CPU *x86cpu = X86_CPU(cpu);
> >>       CPUX86State *env = &x86cpu->env;
> >> -    int r;
> >> -
> >> -    /* init cpu signals */
> >> -    sigset_t set;
> >> -    struct sigaction sigact;
> >> -
> >> -    memset(&sigact, 0, sizeof(sigact));
> >> -    sigact.sa_handler = dummy_signal;
> >> -    sigaction(SIG_IPI, &sigact, NULL);
> >> -
> >> -    pthread_sigmask(SIG_BLOCK, NULL, &set);
> >> -    sigdelset(&set, SIG_IPI);
> >>
> >>       init_emu();
> >>       init_decoder();
> >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu)
> >>       hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1);
> >>       env->hvf_mmio_buf = g_new(char, 4096);
> >>
> >> -    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
> >> -    cpu->vcpu_dirty = 1;
> >> -    assert_hvf_ok(r);
> >> -
> >>       if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
> >>           &hvf_state->hvf_caps->vmx_cap_pinbased)) {
> >>           abort();
> >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu)
> >>
> >>       return ret;
> >>   }
> >> -
> >> -bool hvf_allowed;
> >> -
> >> -static int hvf_accel_init(MachineState *ms)
> >> -{
> >> -    int x;
> >> -    hv_return_t ret;
> >> -    HVFState *s;
> >> -
> >> -    ret = hv_vm_create(HV_VM_DEFAULT);
> >> -    assert_hvf_ok(ret);
> >> -
> >> -    s = g_new0(HVFState, 1);
> >> -
> >> -    s->num_slots = 32;
> >> -    for (x = 0; x < s->num_slots; ++x) {
> >> -        s->slots[x].size = 0;
> >> -        s->slots[x].slot_id = x;
> >> -    }
> >> -
> >> -    hvf_state = s;
> >> -    memory_listener_register(&hvf_memory_listener,
> &address_space_memory);
> >> -    cpus_register_accel(&hvf_cpus);
> >> -    return 0;
> >> -}
> >> -
> >> -static void hvf_accel_class_init(ObjectClass *oc, void *data)
> >> -{
> >> -    AccelClass *ac = ACCEL_CLASS(oc);
> >> -    ac->name = "HVF";
> >> -    ac->init_machine = hvf_accel_init;
> >> -    ac->allowed = &hvf_allowed;
> >> -}
> >> -
> >> -static const TypeInfo hvf_accel_type = {
> >> -    .name = TYPE_HVF_ACCEL,
> >> -    .parent = TYPE_ACCEL,
> >> -    .class_init = hvf_accel_class_init,
> >> -};
> >> -
> >> -static void hvf_type_init(void)
> >> -{
> >> -    type_register_static(&hvf_accel_type);
> >> -}
> >> -
> >> -type_init(hvf_type_init);
> >> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
> >> index 409c9a3f14..c8a43717ee 100644
> >> --- a/target/i386/hvf/meson.build
> >> +++ b/target/i386/hvf/meson.build
> >> @@ -1,6 +1,5 @@
> >>   i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
> >>     'hvf.c',
> >> -  'hvf-cpus.c',
> >>     'x86.c',
> >>     'x86_cpuid.c',
> >>     'x86_decode.c',
> >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
> >> index bbec412b6c..89b8e9d87a 100644
> >> --- a/target/i386/hvf/x86hvf.c
> >> +++ b/target/i386/hvf/x86hvf.c
> >> @@ -20,6 +20,9 @@
> >>   #include "qemu/osdep.h"
> >>
> >>   #include "qemu-common.h"
> >> +#include "sysemu/hvf.h"
> >> +#include "sysemu/hvf_int.h"
> >> +#include "sysemu/hw_accel.h"
> >>   #include "x86hvf.h"
> >>   #include "vmx.h"
> >>   #include "vmcs.h"
> >> @@ -32,8 +35,6 @@
> >>   #include <Hypervisor/hv.h>
> >>   #include <Hypervisor/hv_vmx.h>
> >>
> >> -#include "hvf-cpus.h"
> >> -
> >>   void hvf_set_segment(struct CPUState *cpu, struct vmx_segment
> *vmx_seg,
> >>                        SegmentCache *qseg, bool is_tr)
> >>   {
> >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state)
> >>       env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
> >>
> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
> >> -        hvf_cpu_synchronize_state(cpu_state);
> >> +        cpu_synchronize_state(cpu_state);
> >>           do_cpu_init(cpu);
> >>       }
> >>
> >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state)
> >>           cpu_state->halted = 0;
> >>       }
> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
> >> -        hvf_cpu_synchronize_state(cpu_state);
> >> +        cpu_synchronize_state(cpu_state);
> >>           do_cpu_sipi(cpu);
> >>       }
> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
> >>           cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
> >> -        hvf_cpu_synchronize_state(cpu_state);
> >> +        cpu_synchronize_state(cpu_state);
> > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should
> > be a separate patch. They follow the cpu/accel cleanups Claudio was doing
> > over the summer.
>
>
> The only reason they're in here is because we no longer have access to
> the hvf_ functions from the file. I am perfectly happy to rebase the
> patch on top of Claudio's if his goes in first. I'm sure it'll be
> trivial for him to rebase on top of this too if my series goes in first.
>
>
> >
> > Philippe raised the idea that the patch might go ahead of ARM-specific
> > part (which might involve some discussions) and I agree with that.
> >
> > Some sync between Claudio's series (CC'd him) and the patch might be needed.
>
>
> I would prefer not to hold back because of the sync. Claudio's cleanup
> is trivial enough to adjust for if it gets merged ahead of this.
>
>
> Alex
>
>
>
>
Frank Yang Nov. 30, 2020, 8:15 p.m. UTC | #4
Update: We're not quite sure how to compare the CNTV_CVAL and CNTVCT. But
the high CPU usage seems to be mitigated by having a poll interval (like
KVM does) in handling WFI:

https://android-review.googlesource.com/c/platform/external/qemu/+/1512501

This is loosely inspired by
https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766
which does seem to specify a poll interval.

It would be cool if we could have a lightweight way to enter sleep and
restart the vcpus precisely when CVAL passes, though.
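
For the "sleep until CVAL passes" idea, the arithmetic might look roughly like
the sketch below. It assumes CNTV_CVAL and CNTVCT are read in the same
virtual-counter timebase (which is exactly the part we are unsure about under
HVF), that the counter frequency fits in 32 bits, and it ignores counter
wraparound. ns_until_cval() is a hypothetical helper, not something in either
tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert the distance between the timer compare value and the current
 * counter value into nanoseconds. Returns 0 if the deadline has already
 * passed and saturates at UINT64_MAX for very distant deadlines.
 */
static uint64_t ns_until_cval(uint64_t cval, uint64_t cntvct, uint64_t freq_hz)
{
    const uint64_t ns_per_sec = 1000000000ULL;
    uint64_t delta, secs, rem;

    /* Assumption: a sane, 32-bit counter frequency (e.g. 24 MHz). */
    assert(freq_hz != 0 && freq_hz <= UINT32_MAX);

    if (cval <= cntvct) {
        return 0; /* deadline already reached: wake up immediately */
    }

    delta = cval - cntvct;
    secs = delta / freq_hz;
    rem = delta % freq_hz;

    /* Saturate instead of overflowing when the deadline is far away. */
    if (secs > UINT64_MAX / ns_per_sec - 1) {
        return UINT64_MAX;
    }

    /* rem < freq_hz <= 2^32, so rem * ns_per_sec cannot overflow 64 bits. */
    return secs * ns_per_sec + rem * ns_per_sec / freq_hz;
}
```

The result could be used as the timeout of a blocking wait that is also
interruptible by a signal, so the vCPU wakes either at the deadline or when
another thread kicks it.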

Frank


On Fri, Nov 27, 2020 at 3:30 PM Frank Yang <lfy@google.com> wrote:

> Hi all,
>
> +Peter Collingbourne <pcc@google.com>
>
> I'm a developer on the Android Emulator, which is in a fork of QEMU.
>
> Peter and I have been working on an HVF Apple Silicon backend with an eye
> toward Android guests.
>
> We have gotten things to basically switch to Android userspace already
> (logcat/shell and graphics available at least)
>
> Our strategy so far has been to import logic from the KVM implementation
> and hook into QEMU's software devices that were previously assumed to work
> only with TCG, or that have KVM-specific paths.
>
> Thanks to Alexander for the tip on the 36-bit address space limitation
> btw; our way of addressing this is to still allow highmem but not put pci
> high mmio so high.
>
> Also, note we have a sleep/signal based mechanism to deal with WFx, which
> might be worth looking into in Alexander's implementation as well:
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551
>
> Patches so far, FYI:
>
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3
>
>
> https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a
>
> https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b
>
> https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01
>
> https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228
>
> https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102
>
> https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6
>
> Peter's also noticed that there are extra steps needed on M1s to allow
> TCG to work, as it involves JIT:
>
>
> https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9
>
> We'd appreciate any feedback/comments :)
>
> Best,
>
> Frank
>
> On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf <agraf@csgraf.de> wrote:
>
>>
>> On 27.11.20 21:00, Roman Bolshakov wrote:
>> > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote:
>> >> Until now, Hypervisor.framework has only been available on x86_64
>> systems.
>> >> With Apple Silicon shipping now, it extends its reach to aarch64. To
>> >> prepare for support for multiple architectures, let's move common code
>> out
>> >> into its own accel directory.
>> >>
>> >> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>> >> ---
>> >>   MAINTAINERS                 |   9 +-
>> >>   accel/hvf/hvf-all.c         |  56 +++++
>> >>   accel/hvf/hvf-cpus.c        | 468
>> ++++++++++++++++++++++++++++++++++++
>> >>   accel/hvf/meson.build       |   7 +
>> >>   accel/meson.build           |   1 +
>> >>   include/sysemu/hvf_int.h    |  69 ++++++
>> >>   target/i386/hvf/hvf-cpus.c  | 131 ----------
>> >>   target/i386/hvf/hvf-cpus.h  |  25 --
>> >>   target/i386/hvf/hvf-i386.h  |  48 +---
>> >>   target/i386/hvf/hvf.c       | 360 +--------------------------
>> >>   target/i386/hvf/meson.build |   1 -
>> >>   target/i386/hvf/x86hvf.c    |  11 +-
>> >>   target/i386/hvf/x86hvf.h    |   2 -
>> >>   13 files changed, 619 insertions(+), 569 deletions(-)
>> >>   create mode 100644 accel/hvf/hvf-all.c
>> >>   create mode 100644 accel/hvf/hvf-cpus.c
>> >>   create mode 100644 accel/hvf/meson.build
>> >>   create mode 100644 include/sysemu/hvf_int.h
>> >>   delete mode 100644 target/i386/hvf/hvf-cpus.c
>> >>   delete mode 100644 target/i386/hvf/hvf-cpus.h
>> >>
>> >> diff --git a/MAINTAINERS b/MAINTAINERS
>> >> index 68bc160f41..ca4b6d9279 100644
>> >> --- a/MAINTAINERS
>> >> +++ b/MAINTAINERS
>> >> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com>
>> >>   M: Roman Bolshakov <r.bolshakov@yadro.com>
>> >>   W: https://wiki.qemu.org/Features/HVF
>> >>   S: Maintained
>> >> -F: accel/stubs/hvf-stub.c
>> > There was a patch for that in the RFC series from Claudio.
>>
>>
>> Yeah, I'm not worried about this hunk :).
>>
>>
>> >
>> >>   F: target/i386/hvf/
>> >> +
>> >> +HVF
>> >> +M: Cameron Esfahani <dirty@apple.com>
>> >> +M: Roman Bolshakov <r.bolshakov@yadro.com>
>> >> +W: https://wiki.qemu.org/Features/HVF
>> >> +S: Maintained
>> >> +F: accel/hvf/
>> >>   F: include/sysemu/hvf.h
>> >> +F: include/sysemu/hvf_int.h
>> >>
>> >>   WHPX CPUs
>> >>   M: Sunil Muthuswamy <sunilmut@microsoft.com>
>> >> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
>> >> new file mode 100644
>> >> index 0000000000..47d77a472a
>> >> --- /dev/null
>> >> +++ b/accel/hvf/hvf-all.c
>> >> @@ -0,0 +1,56 @@
>> >> +/*
>> >> + * QEMU Hypervisor.framework support
>> >> + *
>> >> + * This work is licensed under the terms of the GNU GPL, version 2.
>> See
>> >> + * the COPYING file in the top-level directory.
>> >> + *
>> >> + * Contributions after 2012-01-13 are licensed under the terms of the
>> >> + * GNU GPL, version 2 or (at your option) any later version.
>> >> + */
>> >> +
>> >> +#include "qemu/osdep.h"
>> >> +#include "qemu-common.h"
>> >> +#include "qemu/error-report.h"
>> >> +#include "sysemu/hvf.h"
>> >> +#include "sysemu/hvf_int.h"
>> >> +#include "sysemu/runstate.h"
>> >> +
>> >> +#include "qemu/main-loop.h"
>> >> +#include "sysemu/accel.h"
>> >> +
>> >> +#include <Hypervisor/Hypervisor.h>
>> >> +
>> >> +bool hvf_allowed;
>> >> +HVFState *hvf_state;
>> >> +
>> >> +void assert_hvf_ok(hv_return_t ret)
>> >> +{
>> >> +    if (ret == HV_SUCCESS) {
>> >> +        return;
>> >> +    }
>> >> +
>> >> +    switch (ret) {
>> >> +    case HV_ERROR:
>> >> +        error_report("Error: HV_ERROR");
>> >> +        break;
>> >> +    case HV_BUSY:
>> >> +        error_report("Error: HV_BUSY");
>> >> +        break;
>> >> +    case HV_BAD_ARGUMENT:
>> >> +        error_report("Error: HV_BAD_ARGUMENT");
>> >> +        break;
>> >> +    case HV_NO_RESOURCES:
>> >> +        error_report("Error: HV_NO_RESOURCES");
>> >> +        break;
>> >> +    case HV_NO_DEVICE:
>> >> +        error_report("Error: HV_NO_DEVICE");
>> >> +        break;
>> >> +    case HV_UNSUPPORTED:
>> >> +        error_report("Error: HV_UNSUPPORTED");
>> >> +        break;
>> >> +    default:
>> >> +        error_report("Unknown Error");
>> >> +    }
>> >> +
>> >> +    abort();
>> >> +}
>> >> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
>> >> new file mode 100644
>> >> index 0000000000..f9bb5502b7
>> >> --- /dev/null
>> >> +++ b/accel/hvf/hvf-cpus.c
>> >> @@ -0,0 +1,468 @@
>> >> +/*
>> >> + * Copyright 2008 IBM Corporation
>> >> + *           2008 Red Hat, Inc.
>> >> + * Copyright 2011 Intel Corporation
>> >> + * Copyright 2016 Veertu, Inc.
>> >> + * Copyright 2017 The Android Open Source Project
>> >> + *
>> >> + * QEMU Hypervisor.framework support
>> >> + *
>> >> + * This program is free software; you can redistribute it and/or
>> >> + * modify it under the terms of version 2 of the GNU General Public
>> >> + * License as published by the Free Software Foundation.
>> >> + *
>> >> + * This program is distributed in the hope that it will be useful,
>> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> >> + * General Public License for more details.
>> >> + *
>> >> + * You should have received a copy of the GNU General Public License
>> >> + * along with this program; if not, see <http://www.gnu.org/licenses/
>> >.
>> >> + *
>> >> + * This file contain code under public domain from the hvdos project:
>> >> + * https://github.com/mist64/hvdos
>> >> + *
>> >> + * Parts Copyright (c) 2011 NetApp, Inc.
>> >> + * All rights reserved.
>> >> + *
>> >> + * Redistribution and use in source and binary forms, with or without
>> >> + * modification, are permitted provided that the following conditions
>> >> + * are met:
>> >> + * 1. Redistributions of source code must retain the above copyright
>> >> + *    notice, this list of conditions and the following disclaimer.
>> >> + * 2. Redistributions in binary form must reproduce the above
>> copyright
>> >> + *    notice, this list of conditions and the following disclaimer in
>> the
>> >> + *    documentation and/or other materials provided with the
>> distribution.
>> >> + *
>> >> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>> >> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
>> THE
>> >> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
>> PURPOSE
>> >> + * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE
>> LIABLE
>> >> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>> CONSEQUENTIAL
>> >> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
>> GOODS
>> >> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
>> INTERRUPTION)
>> >> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
>> CONTRACT, STRICT
>> >> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
>> ANY WAY
>> >> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
>> POSSIBILITY OF
>> >> + * SUCH DAMAGE.
>> >> + */
>> >> +
>> >> +#include "qemu/osdep.h"
>> >> +#include "qemu/error-report.h"
>> >> +#include "qemu/main-loop.h"
>> >> +#include "exec/address-spaces.h"
>> >> +#include "exec/exec-all.h"
>> >> +#include "sysemu/cpus.h"
>> >> +#include "sysemu/hvf.h"
>> >> +#include "sysemu/hvf_int.h"
>> >> +#include "sysemu/runstate.h"
>> >> +#include "qemu/guest-random.h"
>> >> +
>> >> +#include <Hypervisor/Hypervisor.h>
>> >> +
>> >> +/* Memory slots */
>> >> +
>> >> +struct mac_slot {
>> >> +    int present;
>> >> +    uint64_t size;
>> >> +    uint64_t gpa_start;
>> >> +    uint64_t gva;
>> >> +};
>> >> +
>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>> >> +{
>> >> +    hvf_slot *slot;
>> >> +    int x;
>> >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>> >> +        slot = &hvf_state->slots[x];
>> >> +        if (slot->size && start < (slot->start + slot->size) &&
>> >> +            (start + size) > slot->start) {
>> >> +            return slot;
>> >> +        }
>> >> +    }
>> >> +    return NULL;
>> >> +}
>> >> +
>> >> +struct mac_slot mac_slots[32];
>> >> +
>> >> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
>> >> +{
>> >> +    struct mac_slot *macslot;
>> >> +    hv_return_t ret;
>> >> +
>> >> +    macslot = &mac_slots[slot->slot_id];
>> >> +
>> >> +    if (macslot->present) {
>> >> +        if (macslot->size != slot->size) {
>> >> +            macslot->present = 0;
>> >> +            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
>> >> +            assert_hvf_ok(ret);
>> >> +        }
>> >> +    }
>> >> +
>> >> +    if (!slot->size) {
>> >> +        return 0;
>> >> +    }
>> >> +
>> >> +    macslot->present = 1;
>> >> +    macslot->gpa_start = slot->start;
>> >> +    macslot->size = slot->size;
>> >> +    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
>> >> +    assert_hvf_ok(ret);
>> >> +    return 0;
>> >> +}
>> >> +
>> >> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>> >> +{
>> >> +    hvf_slot *mem;
>> >> +    MemoryRegion *area = section->mr;
>> >> +    bool writeable = !area->readonly && !area->rom_device;
>> >> +    hv_memory_flags_t flags;
>> >> +
>> >> +    if (!memory_region_is_ram(area)) {
>> >> +        if (writeable) {
>> >> +            return;
>> >> +        } else if (!memory_region_is_romd(area)) {
>> >> +            /*
>> >> +             * If the memory device is not in romd_mode, then we
>> actually want
>> >> +             * to remove the hvf memory slot so all accesses will
>> trap.
>> >> +             */
>> >> +             add = false;
>> >> +        }
>> >> +    }
>> >> +
>> >> +    mem = hvf_find_overlap_slot(
>> >> +            section->offset_within_address_space,
>> >> +            int128_get64(section->size));
>> >> +
>> >> +    if (mem && add) {
>> >> +        if (mem->size == int128_get64(section->size) &&
>> >> +            mem->start == section->offset_within_address_space &&
>> >> +            mem->mem == (memory_region_get_ram_ptr(area) +
>> >> +            section->offset_within_region)) {
>> >> +            return; /* Same region was attempted to register, go
>> away. */
>> >> +        }
>> >> +    }
>> >> +
>> >> +    /* Region needs to be reset. set the size to 0 and remap it. */
>> >> +    if (mem) {
>> >> +        mem->size = 0;
>> >> +        if (do_hvf_set_memory(mem, 0)) {
>> >> +            error_report("Failed to reset overlapping slot");
>> >> +            abort();
>> >> +        }
>> >> +    }
>> >> +
>> >> +    if (!add) {
>> >> +        return;
>> >> +    }
>> >> +
>> >> +    if (area->readonly ||
>> >> +        (!memory_region_is_ram(area) && memory_region_is_romd(area)))
>> {
>> >> +        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>> >> +    } else {
>> >> +        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
>> >> +    }
>> >> +
>> >> +    /* Now make a new slot. */
>> >> +    int x;
>> >> +
>> >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>> >> +        mem = &hvf_state->slots[x];
>> >> +        if (!mem->size) {
>> >> +            break;
>> >> +        }
>> >> +    }
>> >> +
>> >> +    if (x == hvf_state->num_slots) {
>> >> +        error_report("No free slots");
>> >> +        abort();
>> >> +    }
>> >> +
>> >> +    mem->size = int128_get64(section->size);
>> >> +    mem->mem = memory_region_get_ram_ptr(area) +
>> section->offset_within_region;
>> >> +    mem->start = section->offset_within_address_space;
>> >> +    mem->region = area;
>> >> +
>> >> +    if (do_hvf_set_memory(mem, flags)) {
>> >> +        error_report("Error registering new memory slot");
>> >> +        abort();
>> >> +    }
>> >> +}
>> >> +
>> >> +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool
>> on)
>> >> +{
>> >> +    hvf_slot *slot;
>> >> +
>> >> +    slot = hvf_find_overlap_slot(
>> >> +            section->offset_within_address_space,
>> >> +            int128_get64(section->size));
>> >> +
>> >> +    /* protect region against writes; begin tracking it */
>> >> +    if (on) {
>> >> +        slot->flags |= HVF_SLOT_LOG;
>> >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>> >> +                      HV_MEMORY_READ);
>> >> +    /* stop tracking region*/
>> >> +    } else {
>> >> +        slot->flags &= ~HVF_SLOT_LOG;
>> >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>> >> +                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>> >> +    }
>> >> +}
>> >> +
>> >> +static void hvf_log_start(MemoryListener *listener,
>> >> +                          MemoryRegionSection *section, int old, int
>> new)
>> >> +{
>> >> +    if (old != 0) {
>> >> +        return;
>> >> +    }
>> >> +
>> >> +    hvf_set_dirty_tracking(section, 1);
>> >> +}
>> >> +
>> >> +static void hvf_log_stop(MemoryListener *listener,
>> >> +                         MemoryRegionSection *section, int old, int
>> new)
>> >> +{
>> >> +    if (new != 0) {
>> >> +        return;
>> >> +    }
>> >> +
>> >> +    hvf_set_dirty_tracking(section, 0);
>> >> +}
>> >> +
>> >> +static void hvf_log_sync(MemoryListener *listener,
>> >> +                         MemoryRegionSection *section)
>> >> +{
>> >> +    /*
>> >> +     * sync of dirty pages is handled elsewhere; just make sure we
>> keep
>> >> +     * tracking the region.
>> >> +     */
>> >> +    hvf_set_dirty_tracking(section, 1);
>> >> +}
>> >> +
>> >> +static void hvf_region_add(MemoryListener *listener,
>> >> +                           MemoryRegionSection *section)
>> >> +{
>> >> +    hvf_set_phys_mem(section, true);
>> >> +}
>> >> +
>> >> +static void hvf_region_del(MemoryListener *listener,
>> >> +                           MemoryRegionSection *section)
>> >> +{
>> >> +    hvf_set_phys_mem(section, false);
>> >> +}
>> >> +
>> >> +static MemoryListener hvf_memory_listener = {
>> >> +    .priority = 10,
>> >> +    .region_add = hvf_region_add,
>> >> +    .region_del = hvf_region_del,
>> >> +    .log_start = hvf_log_start,
>> >> +    .log_stop = hvf_log_stop,
>> >> +    .log_sync = hvf_log_sync,
>> >> +};
>> >> +
>> >> +static void do_hvf_cpu_synchronize_state(CPUState *cpu,
>> run_on_cpu_data arg)
>> >> +{
>> >> +    if (!cpu->vcpu_dirty) {
>> >> +        hvf_get_registers(cpu);
>> >> +        cpu->vcpu_dirty = true;
>> >> +    }
>> >> +}
>> >> +
>> >> +static void hvf_cpu_synchronize_state(CPUState *cpu)
>> >> +{
>> >> +    if (!cpu->vcpu_dirty) {
>> >> +        run_on_cpu(cpu, do_hvf_cpu_synchronize_state,
>> RUN_ON_CPU_NULL);
>> >> +    }
>> >> +}
>> >> +
>> >> +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>> >> +                                              run_on_cpu_data arg)
>> >> +{
>> >> +    hvf_put_registers(cpu);
>> >> +    cpu->vcpu_dirty = false;
>> >> +}
>> >> +
>> >> +static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>> >> +{
>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset,
>> RUN_ON_CPU_NULL);
>> >> +}
>> >> +
>> >> +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>> >> +                                             run_on_cpu_data arg)
>> >> +{
>> >> +    hvf_put_registers(cpu);
>> >> +    cpu->vcpu_dirty = false;
>> >> +}
>> >> +
>> >> +static void hvf_cpu_synchronize_post_init(CPUState *cpu)
>> >> +{
>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init,
>> RUN_ON_CPU_NULL);
>> >> +}
>> >> +
>> >> +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>> >> +                                              run_on_cpu_data arg)
>> >> +{
>> >> +    cpu->vcpu_dirty = true;
>> >> +}
>> >> +
>> >> +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>> >> +{
>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm,
>> RUN_ON_CPU_NULL);
>> >> +}
>> >> +
>> >> +static void hvf_vcpu_destroy(CPUState *cpu)
>> >> +{
>> >> +    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
>> >> +    assert_hvf_ok(ret);
>> >> +
>> >> +    hvf_arch_vcpu_destroy(cpu);
>> >> +}
>> >> +
>> >> +static void dummy_signal(int sig)
>> >> +{
>> >> +}
>> >> +
>> >> +static int hvf_init_vcpu(CPUState *cpu)
>> >> +{
>> >> +    int r;
>> >> +
>> >> +    /* init cpu signals */
>> >> +    sigset_t set;
>> >> +    struct sigaction sigact;
>> >> +
>> >> +    memset(&sigact, 0, sizeof(sigact));
>> >> +    sigact.sa_handler = dummy_signal;
>> >> +    sigaction(SIG_IPI, &sigact, NULL);
>> >> +
>> >> +    pthread_sigmask(SIG_BLOCK, NULL, &set);
>> >> +    sigdelset(&set, SIG_IPI);
>> >> +
>> >> +#ifdef __aarch64__
>> >> +    r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t
>> **)&cpu->hvf_exit, NULL);
>> >> +#else
>> >> +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>> >> +#endif
>> > I think the first __aarch64__ bit fits better to arm part of the series.
>>
>>
>> Oops. Thanks for catching it! Yes, absolutely. It should be part of the
>> ARM enablement.
>>
>>
>> >
>> >> +    cpu->vcpu_dirty = 1;
>> >> +    assert_hvf_ok(r);
>> >> +
>> >> +    return hvf_arch_init_vcpu(cpu);
>> >> +}
>> >> +
>> >> +/*
>> >> + * The HVF-specific vCPU thread function. This one should only run
>> when the host
>> >> + * CPU supports the VMX "unrestricted guest" feature.
>> >> + */
>> >> +static void *hvf_cpu_thread_fn(void *arg)
>> >> +{
>> >> +    CPUState *cpu = arg;
>> >> +
>> >> +    int r;
>> >> +
>> >> +    assert(hvf_enabled());
>> >> +
>> >> +    rcu_register_thread();
>> >> +
>> >> +    qemu_mutex_lock_iothread();
>> >> +    qemu_thread_get_self(cpu->thread);
>> >> +
>> >> +    cpu->thread_id = qemu_get_thread_id();
>> >> +    cpu->can_do_io = 1;
>> >> +    current_cpu = cpu;
>> >> +
>> >> +    hvf_init_vcpu(cpu);
>> >> +
>> >> +    /* signal CPU creation */
>> >> +    cpu_thread_signal_created(cpu);
>> >> +    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>> >> +
>> >> +    do {
>> >> +        if (cpu_can_run(cpu)) {
>> >> +            r = hvf_vcpu_exec(cpu);
>> >> +            if (r == EXCP_DEBUG) {
>> >> +                cpu_handle_guest_debug(cpu);
>> >> +            }
>> >> +        }
>> >> +        qemu_wait_io_event(cpu);
>> >> +    } while (!cpu->unplug || cpu_can_run(cpu));
>> >> +
>> >> +    hvf_vcpu_destroy(cpu);
>> >> +    cpu_thread_signal_destroyed(cpu);
>> >> +    qemu_mutex_unlock_iothread();
>> >> +    rcu_unregister_thread();
>> >> +    return NULL;
>> >> +}
>> >> +
>> >> +static void hvf_start_vcpu_thread(CPUState *cpu)
>> >> +{
>> >> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>> >> +
>> >> +    /*
>> >> +     * HVF currently does not support TCG, and only runs in
>> >> +     * unrestricted-guest mode.
>> >> +     */
>> >> +    assert(hvf_enabled());
>> >> +
>> >> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>> >> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>> >> +    qemu_cond_init(cpu->halt_cond);
>> >> +
>> >> +    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>> >> +             cpu->cpu_index);
>> >> +    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
>> >> +                       cpu, QEMU_THREAD_JOINABLE);
>> >> +}
>> >> +
>> >> +static const CpusAccel hvf_cpus = {
>> >> +    .create_vcpu_thread = hvf_start_vcpu_thread,
>> >> +
>> >> +    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>> >> +    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>> >> +    .synchronize_state = hvf_cpu_synchronize_state,
>> >> +    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>> >> +};
>> >> +
>> >> +static int hvf_accel_init(MachineState *ms)
>> >> +{
>> >> +    int x;
>> >> +    hv_return_t ret;
>> >> +    HVFState *s;
>> >> +
>> >> +    ret = hv_vm_create(HV_VM_DEFAULT);
>> >> +    assert_hvf_ok(ret);
>> >> +
>> >> +    s = g_new0(HVFState, 1);
>> >> +
>> >> +    s->num_slots = 32;
>> >> +    for (x = 0; x < s->num_slots; ++x) {
>> >> +        s->slots[x].size = 0;
>> >> +        s->slots[x].slot_id = x;
>> >> +    }
>> >> +
>> >> +    hvf_state = s;
>> >> +    memory_listener_register(&hvf_memory_listener,
>> &address_space_memory);
>> >> +    cpus_register_accel(&hvf_cpus);
>> >> +    return 0;
>> >> +}
>> >> +
>> >> +static void hvf_accel_class_init(ObjectClass *oc, void *data)
>> >> +{
>> >> +    AccelClass *ac = ACCEL_CLASS(oc);
>> >> +    ac->name = "HVF";
>> >> +    ac->init_machine = hvf_accel_init;
>> >> +    ac->allowed = &hvf_allowed;
>> >> +}
>> >> +
>> >> +static const TypeInfo hvf_accel_type = {
>> >> +    .name = TYPE_HVF_ACCEL,
>> >> +    .parent = TYPE_ACCEL,
>> >> +    .class_init = hvf_accel_class_init,
>> >> +};
>> >> +
>> >> +static void hvf_type_init(void)
>> >> +{
>> >> +    type_register_static(&hvf_accel_type);
>> >> +}
>> >> +
>> >> +type_init(hvf_type_init);
>> >> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
>> >> new file mode 100644
>> >> index 0000000000..dfd6b68dc7
>> >> --- /dev/null
>> >> +++ b/accel/hvf/meson.build
>> >> @@ -0,0 +1,7 @@
>> >> +hvf_ss = ss.source_set()
>> >> +hvf_ss.add(files(
>> >> +  'hvf-all.c',
>> >> +  'hvf-cpus.c',
>> >> +))
>> >> +
>> >> +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
>> >> diff --git a/accel/meson.build b/accel/meson.build
>> >> index b26cca227a..6de12ce5d5 100644
>> >> --- a/accel/meson.build
>> >> +++ b/accel/meson.build
>> >> @@ -1,5 +1,6 @@
>> >>   softmmu_ss.add(files('accel.c'))
>> >>
>> >> +subdir('hvf')
>> >>   subdir('qtest')
>> >>   subdir('kvm')
>> >>   subdir('tcg')
>> >> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
>> >> new file mode 100644
>> >> index 0000000000..de9bad23a8
>> >> --- /dev/null
>> >> +++ b/include/sysemu/hvf_int.h
>> >> @@ -0,0 +1,69 @@
>> >> +/*
>> >> + * QEMU Hypervisor.framework (HVF) support
>> >> + *
>> >> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> >> + * See the COPYING file in the top-level directory.
>> >> + *
>> >> + */
>> >> +
>> >> +/* header to be included in HVF-specific code */
>> >> +
>> >> +#ifndef HVF_INT_H
>> >> +#define HVF_INT_H
>> >> +
>> >> +#include <Hypervisor/Hypervisor.h>
>> >> +
>> >> +#define HVF_MAX_VCPU 0x10
>> >> +
>> >> +extern struct hvf_state hvf_global;
>> >> +
>> >> +struct hvf_vm {
>> >> +    int id;
>> >> +    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>> >> +};
>> >> +
>> >> +struct hvf_state {
>> >> +    uint32_t version;
>> >> +    struct hvf_vm *vm;
>> >> +    uint64_t mem_quota;
>> >> +};
>> >> +
>> >> +/* hvf_slot flags */
>> >> +#define HVF_SLOT_LOG (1 << 0)
>> >> +
>> >> +typedef struct hvf_slot {
>> >> +    uint64_t start;
>> >> +    uint64_t size;
>> >> +    uint8_t *mem;
>> >> +    int slot_id;
>> >> +    uint32_t flags;
>> >> +    MemoryRegion *region;
>> >> +} hvf_slot;
>> >> +
>> >> +typedef struct hvf_vcpu_caps {
>> >> +    uint64_t vmx_cap_pinbased;
>> >> +    uint64_t vmx_cap_procbased;
>> >> +    uint64_t vmx_cap_procbased2;
>> >> +    uint64_t vmx_cap_entry;
>> >> +    uint64_t vmx_cap_exit;
>> >> +    uint64_t vmx_cap_preemption_timer;
>> >> +} hvf_vcpu_caps;
>> >> +
>> >> +struct HVFState {
>> >> +    AccelState parent;
>> >> +    hvf_slot slots[32];
>> >> +    int num_slots;
>> >> +
>> >> +    hvf_vcpu_caps *hvf_caps;
>> >> +};
>> >> +extern HVFState *hvf_state;
>> >> +
>> >> +void assert_hvf_ok(hv_return_t ret);
>> >> +int hvf_get_registers(CPUState *cpu);
>> >> +int hvf_put_registers(CPUState *cpu);
>> >> +int hvf_arch_init_vcpu(CPUState *cpu);
>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu);
>> >> +int hvf_vcpu_exec(CPUState *cpu);
>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>> >> +
>> >> +#endif
>> >> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
>> >> deleted file mode 100644
>> >> index 817b3d7452..0000000000
>> >> --- a/target/i386/hvf/hvf-cpus.c
>> >> +++ /dev/null
>> >> @@ -1,131 +0,0 @@
>> >> -/*
>> >> - * Copyright 2008 IBM Corporation
>> >> - *           2008 Red Hat, Inc.
>> >> - * Copyright 2011 Intel Corporation
>> >> - * Copyright 2016 Veertu, Inc.
>> >> - * Copyright 2017 The Android Open Source Project
>> >> - *
>> >> - * QEMU Hypervisor.framework support
>> >> - *
>> >> - * This program is free software; you can redistribute it and/or
>> >> - * modify it under the terms of version 2 of the GNU General Public
>> >> - * License as published by the Free Software Foundation.
>> >> - *
>> >> - * This program is distributed in the hope that it will be useful,
>> >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> >> - * General Public License for more details.
>> >> - *
>> >> - * You should have received a copy of the GNU General Public License
>> >> - * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> >> - *
>> >> - * This file contain code under public domain from the hvdos project:
>> >> - * https://github.com/mist64/hvdos
>> >> - *
>> >> - * Parts Copyright (c) 2011 NetApp, Inc.
>> >> - * All rights reserved.
>> >> - *
>> >> - * Redistribution and use in source and binary forms, with or without
>> >> - * modification, are permitted provided that the following conditions
>> >> - * are met:
>> >> - * 1. Redistributions of source code must retain the above copyright
>> >> - *    notice, this list of conditions and the following disclaimer.
>> >> - * 2. Redistributions in binary form must reproduce the above copyright
>> >> - *    notice, this list of conditions and the following disclaimer in the
>> >> - *    documentation and/or other materials provided with the distribution.
>> >> - *
>> >> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>> >> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>> >> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>> >> - * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
>> >> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>> >> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>> >> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>> >> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>> >> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>> >> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>> >> - * SUCH DAMAGE.
>> >> - */
>> >> -
>> >> -#include "qemu/osdep.h"
>> >> -#include "qemu/error-report.h"
>> >> -#include "qemu/main-loop.h"
>> >> -#include "sysemu/hvf.h"
>> >> -#include "sysemu/runstate.h"
>> >> -#include "target/i386/cpu.h"
>> >> -#include "qemu/guest-random.h"
>> >> -
>> >> -#include "hvf-cpus.h"
>> >> -
>> >> -/*
>> >> - * The HVF-specific vCPU thread function. This one should only run when the host
>> >> - * CPU supports the VMX "unrestricted guest" feature.
>> >> - */
>> >> -static void *hvf_cpu_thread_fn(void *arg)
>> >> -{
>> >> -    CPUState *cpu = arg;
>> >> -
>> >> -    int r;
>> >> -
>> >> -    assert(hvf_enabled());
>> >> -
>> >> -    rcu_register_thread();
>> >> -
>> >> -    qemu_mutex_lock_iothread();
>> >> -    qemu_thread_get_self(cpu->thread);
>> >> -
>> >> -    cpu->thread_id = qemu_get_thread_id();
>> >> -    cpu->can_do_io = 1;
>> >> -    current_cpu = cpu;
>> >> -
>> >> -    hvf_init_vcpu(cpu);
>> >> -
>> >> -    /* signal CPU creation */
>> >> -    cpu_thread_signal_created(cpu);
>> >> -    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>> >> -
>> >> -    do {
>> >> -        if (cpu_can_run(cpu)) {
>> >> -            r = hvf_vcpu_exec(cpu);
>> >> -            if (r == EXCP_DEBUG) {
>> >> -                cpu_handle_guest_debug(cpu);
>> >> -            }
>> >> -        }
>> >> -        qemu_wait_io_event(cpu);
>> >> -    } while (!cpu->unplug || cpu_can_run(cpu));
>> >> -
>> >> -    hvf_vcpu_destroy(cpu);
>> >> -    cpu_thread_signal_destroyed(cpu);
>> >> -    qemu_mutex_unlock_iothread();
>> >> -    rcu_unregister_thread();
>> >> -    return NULL;
>> >> -}
>> >> -
>> >> -static void hvf_start_vcpu_thread(CPUState *cpu)
>> >> -{
>> >> -    char thread_name[VCPU_THREAD_NAME_SIZE];
>> >> -
>> >> -    /*
>> >> -     * HVF currently does not support TCG, and only runs in
>> >> -     * unrestricted-guest mode.
>> >> -     */
>> >> -    assert(hvf_enabled());
>> >> -
>> >> -    cpu->thread = g_malloc0(sizeof(QemuThread));
>> >> -    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>> >> -    qemu_cond_init(cpu->halt_cond);
>> >> -
>> >> -    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>> >> -             cpu->cpu_index);
>> >> -    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
>> >> -                       cpu, QEMU_THREAD_JOINABLE);
>> >> -}
>> >> -
>> >> -const CpusAccel hvf_cpus = {
>> >> -    .create_vcpu_thread = hvf_start_vcpu_thread,
>> >> -
>> >> -    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>> >> -    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>> >> -    .synchronize_state = hvf_cpu_synchronize_state,
>> >> -    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>> >> -};
>> >> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h
>> >> deleted file mode 100644
>> >> index ced31b82c0..0000000000
>> >> --- a/target/i386/hvf/hvf-cpus.h
>> >> +++ /dev/null
>> >> @@ -1,25 +0,0 @@
>> >> -/*
>> >> - * Accelerator CPUS Interface
>> >> - *
>> >> - * Copyright 2020 SUSE LLC
>> >> - *
>> >> - * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> >> - * See the COPYING file in the top-level directory.
>> >> - */
>> >> -
>> >> -#ifndef HVF_CPUS_H
>> >> -#define HVF_CPUS_H
>> >> -
>> >> -#include "sysemu/cpus.h"
>> >> -
>> >> -extern const CpusAccel hvf_cpus;
>> >> -
>> >> -int hvf_init_vcpu(CPUState *);
>> >> -int hvf_vcpu_exec(CPUState *);
>> >> -void hvf_cpu_synchronize_state(CPUState *);
>> >> -void hvf_cpu_synchronize_post_reset(CPUState *);
>> >> -void hvf_cpu_synchronize_post_init(CPUState *);
>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *);
>> >> -void hvf_vcpu_destroy(CPUState *);
>> >> -
>> >> -#endif /* HVF_CPUS_H */
>> >> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
>> >> index e0edffd077..6d56f8f6bb 100644
>> >> --- a/target/i386/hvf/hvf-i386.h
>> >> +++ b/target/i386/hvf/hvf-i386.h
>> >> @@ -18,57 +18,11 @@
>> >>
>> >>   #include "sysemu/accel.h"
>> >>   #include "sysemu/hvf.h"
>> >> +#include "sysemu/hvf_int.h"
>> >>   #include "cpu.h"
>> >>   #include "x86.h"
>> >>
>> >> -#define HVF_MAX_VCPU 0x10
>> >> -
>> >> -extern struct hvf_state hvf_global;
>> >> -
>> >> -struct hvf_vm {
>> >> -    int id;
>> >> -    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>> >> -};
>> >> -
>> >> -struct hvf_state {
>> >> -    uint32_t version;
>> >> -    struct hvf_vm *vm;
>> >> -    uint64_t mem_quota;
>> >> -};
>> >> -
>> >> -/* hvf_slot flags */
>> >> -#define HVF_SLOT_LOG (1 << 0)
>> >> -
>> >> -typedef struct hvf_slot {
>> >> -    uint64_t start;
>> >> -    uint64_t size;
>> >> -    uint8_t *mem;
>> >> -    int slot_id;
>> >> -    uint32_t flags;
>> >> -    MemoryRegion *region;
>> >> -} hvf_slot;
>> >> -
>> >> -typedef struct hvf_vcpu_caps {
>> >> -    uint64_t vmx_cap_pinbased;
>> >> -    uint64_t vmx_cap_procbased;
>> >> -    uint64_t vmx_cap_procbased2;
>> >> -    uint64_t vmx_cap_entry;
>> >> -    uint64_t vmx_cap_exit;
>> >> -    uint64_t vmx_cap_preemption_timer;
>> >> -} hvf_vcpu_caps;
>> >> -
>> >> -struct HVFState {
>> >> -    AccelState parent;
>> >> -    hvf_slot slots[32];
>> >> -    int num_slots;
>> >> -
>> >> -    hvf_vcpu_caps *hvf_caps;
>> >> -};
>> >> -extern HVFState *hvf_state;
>> >> -
>> >> -void hvf_set_phys_mem(MemoryRegionSection *, bool);
>> >>   void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>> >>
>> >>   #ifdef NEED_CPU_H
>> >>   /* Functions exported to host specific mode */
>> >> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
>> >> index ed9356565c..8b96ecd619 100644
>> >> --- a/target/i386/hvf/hvf.c
>> >> +++ b/target/i386/hvf/hvf.c
>> >> @@ -51,6 +51,7 @@
>> >>   #include "qemu/error-report.h"
>> >>
>> >>   #include "sysemu/hvf.h"
>> >> +#include "sysemu/hvf_int.h"
>> >>   #include "sysemu/runstate.h"
>> >>   #include "hvf-i386.h"
>> >>   #include "vmcs.h"
>> >> @@ -72,171 +73,6 @@
>> >>   #include "sysemu/accel.h"
>> >>   #include "target/i386/cpu.h"
>> >>
>> >> -#include "hvf-cpus.h"
>> >> -
>> >> -HVFState *hvf_state;
>> >> -
>> >> -static void assert_hvf_ok(hv_return_t ret)
>> >> -{
>> >> -    if (ret == HV_SUCCESS) {
>> >> -        return;
>> >> -    }
>> >> -
>> >> -    switch (ret) {
>> >> -    case HV_ERROR:
>> >> -        error_report("Error: HV_ERROR");
>> >> -        break;
>> >> -    case HV_BUSY:
>> >> -        error_report("Error: HV_BUSY");
>> >> -        break;
>> >> -    case HV_BAD_ARGUMENT:
>> >> -        error_report("Error: HV_BAD_ARGUMENT");
>> >> -        break;
>> >> -    case HV_NO_RESOURCES:
>> >> -        error_report("Error: HV_NO_RESOURCES");
>> >> -        break;
>> >> -    case HV_NO_DEVICE:
>> >> -        error_report("Error: HV_NO_DEVICE");
>> >> -        break;
>> >> -    case HV_UNSUPPORTED:
>> >> -        error_report("Error: HV_UNSUPPORTED");
>> >> -        break;
>> >> -    default:
>> >> -        error_report("Unknown Error");
>> >> -    }
>> >> -
>> >> -    abort();
>> >> -}
>> >> -
>> >> -/* Memory slots */
>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>> >> -{
>> >> -    hvf_slot *slot;
>> >> -    int x;
>> >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>> >> -        slot = &hvf_state->slots[x];
>> >> -        if (slot->size && start < (slot->start + slot->size) &&
>> >> -            (start + size) > slot->start) {
>> >> -            return slot;
>> >> -        }
>> >> -    }
>> >> -    return NULL;
>> >> -}
>> >> -
>> >> -struct mac_slot {
>> >> -    int present;
>> >> -    uint64_t size;
>> >> -    uint64_t gpa_start;
>> >> -    uint64_t gva;
>> >> -};
>> >> -
>> >> -struct mac_slot mac_slots[32];
>> >> -
>> >> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
>> >> -{
>> >> -    struct mac_slot *macslot;
>> >> -    hv_return_t ret;
>> >> -
>> >> -    macslot = &mac_slots[slot->slot_id];
>> >> -
>> >> -    if (macslot->present) {
>> >> -        if (macslot->size != slot->size) {
>> >> -            macslot->present = 0;
>> >> -            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
>> >> -            assert_hvf_ok(ret);
>> >> -        }
>> >> -    }
>> >> -
>> >> -    if (!slot->size) {
>> >> -        return 0;
>> >> -    }
>> >> -
>> >> -    macslot->present = 1;
>> >> -    macslot->gpa_start = slot->start;
>> >> -    macslot->size = slot->size;
>> >> -    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
>> >> -    assert_hvf_ok(ret);
>> >> -    return 0;
>> >> -}
>> >> -
>> >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>> >> -{
>> >> -    hvf_slot *mem;
>> >> -    MemoryRegion *area = section->mr;
>> >> -    bool writeable = !area->readonly && !area->rom_device;
>> >> -    hv_memory_flags_t flags;
>> >> -
>> >> -    if (!memory_region_is_ram(area)) {
>> >> -        if (writeable) {
>> >> -            return;
>> >> -        } else if (!memory_region_is_romd(area)) {
>> >> -            /*
>> >> -             * If the memory device is not in romd_mode, then we actually want
>> >> -             * to remove the hvf memory slot so all accesses will trap.
>> >> -             */
>> >> -             add = false;
>> >> -        }
>> >> -    }
>> >> -
>> >> -    mem = hvf_find_overlap_slot(
>> >> -            section->offset_within_address_space,
>> >> -            int128_get64(section->size));
>> >> -
>> >> -    if (mem && add) {
>> >> -        if (mem->size == int128_get64(section->size) &&
>> >> -            mem->start == section->offset_within_address_space &&
>> >> -            mem->mem == (memory_region_get_ram_ptr(area) +
>> >> -            section->offset_within_region)) {
>> >> -            return; /* Same region was attempted to register, go away. */
>> >> -        }
>> >> -    }
>> >> -
>> >> -    /* Region needs to be reset. set the size to 0 and remap it. */
>> >> -    if (mem) {
>> >> -        mem->size = 0;
>> >> -        if (do_hvf_set_memory(mem, 0)) {
>> >> -            error_report("Failed to reset overlapping slot");
>> >> -            abort();
>> >> -        }
>> >> -    }
>> >> -
>> >> -    if (!add) {
>> >> -        return;
>> >> -    }
>> >> -
>> >> -    if (area->readonly ||
>> >> -        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
>> >> -        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>> >> -    } else {
>> >> -        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
>> >> -    }
>> >> -
>> >> -    /* Now make a new slot. */
>> >> -    int x;
>> >> -
>> >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>> >> -        mem = &hvf_state->slots[x];
>> >> -        if (!mem->size) {
>> >> -            break;
>> >> -        }
>> >> -    }
>> >> -
>> >> -    if (x == hvf_state->num_slots) {
>> >> -        error_report("No free slots");
>> >> -        abort();
>> >> -    }
>> >> -
>> >> -    mem->size = int128_get64(section->size);
>> >> -    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
>> >> -    mem->start = section->offset_within_address_space;
>> >> -    mem->region = area;
>> >> -
>> >> -    if (do_hvf_set_memory(mem, flags)) {
>> >> -        error_report("Error registering new memory slot");
>> >> -        abort();
>> >> -    }
>> >> -}
>> >> -
>> >>   void vmx_update_tpr(CPUState *cpu)
>> >>   {
>> >>       /* TODO: need integrate APIC handling */
>> >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
>> >>       }
>> >>   }
>> >>
>> >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>> >> -{
>> >> -    if (!cpu->vcpu_dirty) {
>> >> -        hvf_get_registers(cpu);
>> >> -        cpu->vcpu_dirty = true;
>> >> -    }
>> >> -}
>> >> -
>> >> -void hvf_cpu_synchronize_state(CPUState *cpu)
>> >> -{
>> >> -    if (!cpu->vcpu_dirty) {
>> >> -        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
>> >> -    }
>> >> -}
>> >> -
>> >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>> >> -                                              run_on_cpu_data arg)
>> >> -{
>> >> -    hvf_put_registers(cpu);
>> >> -    cpu->vcpu_dirty = false;
>> >> -}
>> >> -
>> >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>> >> -{
>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
>> >> -}
>> >> -
>> >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>> >> -                                             run_on_cpu_data arg)
>> >> -{
>> >> -    hvf_put_registers(cpu);
>> >> -    cpu->vcpu_dirty = false;
>> >> -}
>> >> -
>> >> -void hvf_cpu_synchronize_post_init(CPUState *cpu)
>> >> -{
>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
>> >> -}
>> >> -
>> >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>> >> -                                              run_on_cpu_data arg)
>> >> -{
>> >> -    cpu->vcpu_dirty = true;
>> >> -}
>> >> -
>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>> >> -{
>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
>> >> -}
>> >> -
>> >>   static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
>> >>   {
>> >>       int read, write;
>> >> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
>> >>       return false;
>> >>   }
>> >>
>> >> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
>> >> -{
>> >> -    hvf_slot *slot;
>> >> -
>> >> -    slot = hvf_find_overlap_slot(
>> >> -            section->offset_within_address_space,
>> >> -            int128_get64(section->size));
>> >> -
>> >> -    /* protect region against writes; begin tracking it */
>> >> -    if (on) {
>> >> -        slot->flags |= HVF_SLOT_LOG;
>> >> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>> >> -                      HV_MEMORY_READ);
>> >> -    /* stop tracking region*/
>> >> -    } else {
>> >> -        slot->flags &= ~HVF_SLOT_LOG;
>> >> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>> >> -                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>> >> -    }
>> >> -}
>> >> -
>> >> -static void hvf_log_start(MemoryListener *listener,
>> >> -                          MemoryRegionSection *section, int old, int new)
>> >> -{
>> >> -    if (old != 0) {
>> >> -        return;
>> >> -    }
>> >> -
>> >> -    hvf_set_dirty_tracking(section, 1);
>> >> -}
>> >> -
>> >> -static void hvf_log_stop(MemoryListener *listener,
>> >> -                         MemoryRegionSection *section, int old, int new)
>> >> -{
>> >> -    if (new != 0) {
>> >> -        return;
>> >> -    }
>> >> -
>> >> -    hvf_set_dirty_tracking(section, 0);
>> >> -}
>> >> -
>> >> -static void hvf_log_sync(MemoryListener *listener,
>> >> -                         MemoryRegionSection *section)
>> >> -{
>> >> -    /*
>> >> -     * sync of dirty pages is handled elsewhere; just make sure we keep
>> >> -     * tracking the region.
>> >> -     */
>> >> -    hvf_set_dirty_tracking(section, 1);
>> >> -}
>> >> -
>> >> -static void hvf_region_add(MemoryListener *listener,
>> >> -                           MemoryRegionSection *section)
>> >> -{
>> >> -    hvf_set_phys_mem(section, true);
>> >> -}
>> >> -
>> >> -static void hvf_region_del(MemoryListener *listener,
>> >> -                           MemoryRegionSection *section)
>> >> -{
>> >> -    hvf_set_phys_mem(section, false);
>> >> -}
>> >> -
>> >> -static MemoryListener hvf_memory_listener = {
>> >> -    .priority = 10,
>> >> -    .region_add = hvf_region_add,
>> >> -    .region_del = hvf_region_del,
>> >> -    .log_start = hvf_log_start,
>> >> -    .log_stop = hvf_log_stop,
>> >> -    .log_sync = hvf_log_sync,
>> >> -};
>> >> -
>> >> -void hvf_vcpu_destroy(CPUState *cpu)
>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu)
>> >>   {
>> >>       X86CPU *x86_cpu = X86_CPU(cpu);
>> >>       CPUX86State *env = &x86_cpu->env;
>> >>
>> >> -    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
>> >>       g_free(env->hvf_mmio_buf);
>> >> -    assert_hvf_ok(ret);
>> >> -}
>> >> -
>> >> -static void dummy_signal(int sig)
>> >> -{
>> >>   }
>> >>
>> >> -int hvf_init_vcpu(CPUState *cpu)
>> >> +int hvf_arch_init_vcpu(CPUState *cpu)
>> >>   {
>> >>
>> >>       X86CPU *x86cpu = X86_CPU(cpu);
>> >>       CPUX86State *env = &x86cpu->env;
>> >> -    int r;
>> >> -
>> >> -    /* init cpu signals */
>> >> -    sigset_t set;
>> >> -    struct sigaction sigact;
>> >> -
>> >> -    memset(&sigact, 0, sizeof(sigact));
>> >> -    sigact.sa_handler = dummy_signal;
>> >> -    sigaction(SIG_IPI, &sigact, NULL);
>> >> -
>> >> -    pthread_sigmask(SIG_BLOCK, NULL, &set);
>> >> -    sigdelset(&set, SIG_IPI);
>> >>
>> >>       init_emu();
>> >>       init_decoder();
>> >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu)
>> >>       hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1);
>> >>       env->hvf_mmio_buf = g_new(char, 4096);
>> >>
>> >> -    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>> >> -    cpu->vcpu_dirty = 1;
>> >> -    assert_hvf_ok(r);
>> >> -
>> >>       if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
>> >>           &hvf_state->hvf_caps->vmx_cap_pinbased)) {
>> >>           abort();
>> >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu)
>> >>
>> >>       return ret;
>> >>   }
>> >> -
>> >> -bool hvf_allowed;
>> >> -
>> >> -static int hvf_accel_init(MachineState *ms)
>> >> -{
>> >> -    int x;
>> >> -    hv_return_t ret;
>> >> -    HVFState *s;
>> >> -
>> >> -    ret = hv_vm_create(HV_VM_DEFAULT);
>> >> -    assert_hvf_ok(ret);
>> >> -
>> >> -    s = g_new0(HVFState, 1);
>> >> -
>> >> -    s->num_slots = 32;
>> >> -    for (x = 0; x < s->num_slots; ++x) {
>> >> -        s->slots[x].size = 0;
>> >> -        s->slots[x].slot_id = x;
>> >> -    }
>> >> -
>> >> -    hvf_state = s;
>> >> -    memory_listener_register(&hvf_memory_listener, &address_space_memory);
>> >> -    cpus_register_accel(&hvf_cpus);
>> >> -    return 0;
>> >> -}
>> >> -
>> >> -static void hvf_accel_class_init(ObjectClass *oc, void *data)
>> >> -{
>> >> -    AccelClass *ac = ACCEL_CLASS(oc);
>> >> -    ac->name = "HVF";
>> >> -    ac->init_machine = hvf_accel_init;
>> >> -    ac->allowed = &hvf_allowed;
>> >> -}
>> >> -
>> >> -static const TypeInfo hvf_accel_type = {
>> >> -    .name = TYPE_HVF_ACCEL,
>> >> -    .parent = TYPE_ACCEL,
>> >> -    .class_init = hvf_accel_class_init,
>> >> -};
>> >> -
>> >> -static void hvf_type_init(void)
>> >> -{
>> >> -    type_register_static(&hvf_accel_type);
>> >> -}
>> >> -
>> >> -type_init(hvf_type_init);
>> >> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
>> >> index 409c9a3f14..c8a43717ee 100644
>> >> --- a/target/i386/hvf/meson.build
>> >> +++ b/target/i386/hvf/meson.build
>> >> @@ -1,6 +1,5 @@
>> >>   i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
>> >>     'hvf.c',
>> >> -  'hvf-cpus.c',
>> >>     'x86.c',
>> >>     'x86_cpuid.c',
>> >>     'x86_decode.c',
>> >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
>> >> index bbec412b6c..89b8e9d87a 100644
>> >> --- a/target/i386/hvf/x86hvf.c
>> >> +++ b/target/i386/hvf/x86hvf.c
>> >> @@ -20,6 +20,9 @@
>> >>   #include "qemu/osdep.h"
>> >>
>> >>   #include "qemu-common.h"
>> >> +#include "sysemu/hvf.h"
>> >> +#include "sysemu/hvf_int.h"
>> >> +#include "sysemu/hw_accel.h"
>> >>   #include "x86hvf.h"
>> >>   #include "vmx.h"
>> >>   #include "vmcs.h"
>> >> @@ -32,8 +35,6 @@
>> >>   #include <Hypervisor/hv.h>
>> >>   #include <Hypervisor/hv_vmx.h>
>> >>
>> >> -#include "hvf-cpus.h"
>> >> -
>> >>   void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
>> >>                        SegmentCache *qseg, bool is_tr)
>> >>   {
>> >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state)
>> >>       env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
>> >>
>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
>> >> -        hvf_cpu_synchronize_state(cpu_state);
>> >> +        cpu_synchronize_state(cpu_state);
>> >>           do_cpu_init(cpu);
>> >>       }
>> >>
>> >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state)
>> >>           cpu_state->halted = 0;
>> >>       }
>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
>> >> -        hvf_cpu_synchronize_state(cpu_state);
>> >> +        cpu_synchronize_state(cpu_state);
>> >>           do_cpu_sipi(cpu);
>> >>       }
>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
>> >>           cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
>> >> -        hvf_cpu_synchronize_state(cpu_state);
>> >> +        cpu_synchronize_state(cpu_state);
>> > The changes from hvf_cpu_*() to cpu_*() are a cleanup and perhaps should
>> > be a separate patch. They follow the cpu/accel cleanups Claudio was doing
>> > over the summer.
>>
>>
>> The only reason they're in here is because we no longer have access to
>> the hvf_ functions from the file. I am perfectly happy to rebase the
>> patch on top of Claudio's if his goes in first. I'm sure it'll be
>> trivial for him to rebase on top of this too if my series goes in first.
>>
>>
>> >
>> > Philippe raised the idea that the patch might go ahead of the ARM-specific
>> > part (which might involve some discussions) and I agree with that.
>> >
>> > Some sync between Claudio's series (CC'd him) and the patch might be needed.
>>
>>
>> I would prefer not to hold back because of the sync. Claudio's cleanup
>> is trivial enough to adjust for if it gets merged ahead of this.
>>
>>
>> Alex
>>
>>
>>
>>
Alexander Graf Nov. 30, 2020, 8:33 p.m. UTC | #5
Hi Frank,

Thanks for the update :). Your previous email nudged me in the right
direction. I had previously implemented WFI through the internal timer
framework, which performed way worse.

Along the way, I stumbled over a few issues though. For starters, the 
signal mask for SIG_IPI was not set correctly, so while pselect() would 
exit, the signal would never get delivered to the thread! For a fix, 
check out

https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
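
To illustrate the mask problem in isolation (this is not the fix from the linked patch), here is a minimal standalone sketch of the POSIX behaviour involved. It assumes SIGUSR1 as a stand-in for SIG_IPI, and demo_wait_for_ipi() and demo_ipi_handler() are hypothetical names. A blocked signal stays pending; its handler only runs during pselect() if the mask handed to pselect() leaves it unblocked. Querying the mask and calling sigdelset() on the copy has no effect unless that copy is actually used.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

/* Assumption: SIGUSR1 stands in for QEMU's SIG_IPI in this demo. */
#define DEMO_SIG_IPI SIGUSR1

static volatile sig_atomic_t demo_ipi_seen;

static void demo_ipi_handler(int sig)
{
    (void)sig;
    demo_ipi_seen = 1;
}

/*
 * Block DEMO_SIG_IPI, make one instance pending, then wait in pselect().
 * If unblock_in_wait is non-zero, the mask passed to pselect() has the
 * signal removed; otherwise it stays blocked during the wait as well.
 * Returns 1 if the handler ran during the wait, 0 otherwise.
 */
static int demo_wait_for_ipi(int unblock_in_wait)
{
    struct sigaction sa;
    sigset_t blocked, wait_mask, old_mask;
    struct timespec timeout = { 0, 50 * 1000 * 1000 }; /* 50 ms */
    int seen;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_ipi_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(DEMO_SIG_IPI, &sa, NULL);

    /* Block the signal so it can only be delivered inside pselect(). */
    sigemptyset(&blocked);
    sigaddset(&blocked, DEMO_SIG_IPI);
    sigprocmask(SIG_BLOCK, &blocked, &old_mask);

    /* Query the current mask; optionally drop the signal from the copy. */
    sigprocmask(SIG_BLOCK, NULL, &wait_mask);
    if (unblock_in_wait) {
        sigdelset(&wait_mask, DEMO_SIG_IPI);
    }

    demo_ipi_seen = 0;
    raise(DEMO_SIG_IPI); /* stays pending while the signal is blocked */

    /* The handler runs here only if wait_mask leaves the signal unblocked. */
    pselect(0, NULL, NULL, NULL, &timeout, &wait_mask);
    seen = demo_ipi_seen;

    /* Restore the caller's mask; a still-pending signal may be delivered now. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return seen;
}
```

Calling demo_wait_for_ipi(1) lets the handler run inside the wait, while demo_wait_for_ipi(0) leaves the signal pending until the original mask is restored.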

Please also have a look at my latest stab at WFI emulation. It doesn't 
handle WFE (that's only relevant in overcommitted scenarios). But it 
does handle WFI and even does something similar to hlt polling, albeit 
not with an adaptive threshold.
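
For readers unfamiliar with the idea, the following is a rough sketch of WFI handling with a fixed (non-adaptive) polling window. It is not taken from the patch: demo_wfi(), demo_now_ns() and the irq_pending flag are hypothetical, and nanosleep() merely stands in for whatever blocking wait a real implementation would use. The vCPU first spins briefly checking for a pending interrupt, and only blocks if nothing arrived within the window.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

enum demo_wfi_result {
    DEMO_WFI_WOKE_WHILE_POLLING, /* interrupt seen during the poll window */
    DEMO_WFI_SLEPT,              /* poll window expired, fell back to sleep */
};

/* Monotonic time in nanoseconds. */
static int64_t demo_now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical WFI handler: poll irq_pending for up to poll_ns, then
 * give up the CPU for sleep_ns if no interrupt showed up.
 */
static enum demo_wfi_result demo_wfi(atomic_bool *irq_pending,
                                     int64_t poll_ns, int64_t sleep_ns)
{
    int64_t deadline = demo_now_ns() + poll_ns;
    struct timespec ts;

    /* Check at least once, then keep polling until the window closes. */
    do {
        if (atomic_load(irq_pending)) {
            return DEMO_WFI_WOKE_WHILE_POLLING;
        }
    } while (demo_now_ns() < deadline);

    /* Assumption: a real vCPU would block on a signal or timer here. */
    ts.tv_sec = sleep_ns / 1000000000LL;
    ts.tv_nsec = sleep_ns % 1000000000LL;
    nanosleep(&ts, NULL);
    return DEMO_WFI_SLEPT;
}
```

The trade-off is the usual one: a longer poll window reduces wake-up latency for interrupts that arrive soon after WFI, at the cost of burning host CPU while the guest is idle.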

Also, is there a particular reason you're working on this super 
interesting and useful code in a random downstream fork of QEMU? 
Wouldn't it be more helpful to contribute to the upstream code base instead?


Alex


On 30.11.20 21:15, Frank Yang wrote:
> Update: We're not quite sure how to compare the CNTV_CVAL and CNTVCT. 
> But the high CPU usage seems to be mitigated by having a poll interval 
> (like KVM does) in handling WFI:
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512501
>
> This is loosely inspired by
> https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766
> which does seem to specify a poll interval.
>
> It would be cool if we could have a lightweight way to enter sleep and 
> restart the vcpus precisely when CVAL passes, though.
>
> Frank
>
>
> On Fri, Nov 27, 2020 at 3:30 PM Frank Yang <lfy@google.com> wrote:
>
>     Hi all,
>
>     +Peter Collingbourne <pcc@google.com>
>
>     I'm a developer on the Android Emulator, which is in a fork of QEMU.
>
>     Peter and I have been working on an HVF Apple Silicon backend with
>     an eye toward Android guests.
>
>     We have gotten things to basically switch to Android userspace
>     already (logcat/shell and graphics available at least)
>
>     Our strategy so far has been to import logic from the KVM
>     implementation and hook into QEMU's software devices
>     that were previously assumed to work only with TCG, or that
>     have KVM-specific paths.
>
>     Thanks to Alexander for the tip on the 36-bit address space
>     limitation btw; our way of addressing this is to still allow
>     highmem but not put pci high mmio so high.
>
>     Also, note we have a sleep/signal based mechanism to deal with
>     WFx, which might be worth looking into in Alexander's
>     implementation as well:
>
>     https://android-review.googlesource.com/c/platform/external/qemu/+/1512551
>
>     Patches so far, FYI:
>
>     https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1
>     https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3
>     https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3
>     https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3
>     https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3
>
>     https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a
>     https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b
>     https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01
>     https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228
>     https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102
>     https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6
>
>     Peter's also noticed that there are extra steps needed for M1's to
>     allow TCG to work, as it involves JIT:
>
>     https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9
>
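For context on the JIT remark above: my assumption (not stated in the mail) is that the extra step is the per-thread write-protection toggle that macOS on Apple Silicon applies to MAP_JIT memory. A minimal sketch of that pattern follows. `jit_write_enable` and `jit_copy` are hypothetical helper names, not QEMU functions; on other hosts the toggle is a no-op.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: make JIT memory writable (1) or executable (0)
 * for the calling thread. Only Apple Silicon macOS needs the toggle;
 * elsewhere this does nothing.
 */
static void jit_write_enable(int writable)
{
#if defined(__APPLE__) && defined(__aarch64__)
    /* Argument 1 turns write protection on, 0 turns it off. */
    pthread_jit_write_protect_np(!writable);
#else
    (void)writable;
#endif
}

/*
 * Hypothetical helper: copy generated code into a JIT buffer.
 * A real JIT would also have to map the buffer with MAP_JIT and
 * flush the instruction cache before executing it; both are
 * omitted from this sketch.
 */
static void jit_copy(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    jit_write_enable(1);
    memcpy(dst, src, len);
    jit_write_enable(0);
}
```

The point of the wrapper is that every write to the code buffer is bracketed by the toggle, so the thread never holds write and execute permission at the same time.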
>     We'd appreciate any feedback/comments :)
>
>     Best,
>
>     Frank
>
>     On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf <agraf@csgraf.de> wrote:
>
>
>         On 27.11.20 21:00, Roman Bolshakov wrote:
>         > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote:
>         >> Until now, Hypervisor.framework has only been available on x86_64 systems.
>         >> With Apple Silicon shipping now, it extends its reach to aarch64. To
>         >> prepare for support for multiple architectures, let's move common code out
>         >> into its own accel directory.
>         >>
>         >> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>         >> ---
>         >>   MAINTAINERS                 |   9 +-
>         >>   accel/hvf/hvf-all.c         |  56 +++++
>         >>   accel/hvf/hvf-cpus.c        | 468 ++++++++++++++++++++++++++++++++++++
>         >>   accel/hvf/meson.build       |   7 +
>         >>   accel/meson.build           |   1 +
>         >>   include/sysemu/hvf_int.h    |  69 ++++++
>         >>   target/i386/hvf/hvf-cpus.c  | 131 ----------
>         >>   target/i386/hvf/hvf-cpus.h  |  25 --
>         >>   target/i386/hvf/hvf-i386.h  |  48 +---
>         >>   target/i386/hvf/hvf.c       | 360 +--------------------------
>         >>   target/i386/hvf/meson.build |   1 -
>         >>   target/i386/hvf/x86hvf.c    |  11 +-
>         >>   target/i386/hvf/x86hvf.h    |   2 -
>         >>   13 files changed, 619 insertions(+), 569 deletions(-)
>         >>   create mode 100644 accel/hvf/hvf-all.c
>         >>   create mode 100644 accel/hvf/hvf-cpus.c
>         >>   create mode 100644 accel/hvf/meson.build
>         >>   create mode 100644 include/sysemu/hvf_int.h
>         >>   delete mode 100644 target/i386/hvf/hvf-cpus.c
>         >>   delete mode 100644 target/i386/hvf/hvf-cpus.h
>         >>
>         >> diff --git a/MAINTAINERS b/MAINTAINERS
>         >> index 68bc160f41..ca4b6d9279 100644
>         >> --- a/MAINTAINERS
>         >> +++ b/MAINTAINERS
>         >> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com>
>         >>   M: Roman Bolshakov <r.bolshakov@yadro.com>
>         >>   W: https://wiki.qemu.org/Features/HVF
>         >>   S: Maintained
>         >> -F: accel/stubs/hvf-stub.c
>         > There was a patch for that in the RFC series from Claudio.
>
>
>         Yeah, I'm not worried about this hunk :).
>
>
>         >
>         >>   F: target/i386/hvf/
>         >> +
>         >> +HVF
>         >> +M: Cameron Esfahani <dirty@apple.com>
>         >> +M: Roman Bolshakov <r.bolshakov@yadro.com>
>         >> +W: https://wiki.qemu.org/Features/HVF
>         >> +S: Maintained
>         >> +F: accel/hvf/
>         >>   F: include/sysemu/hvf.h
>         >> +F: include/sysemu/hvf_int.h
>         >>
>         >>   WHPX CPUs
>         >>   M: Sunil Muthuswamy <sunilmut@microsoft.com>
>         >> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
>         >> new file mode 100644
>         >> index 0000000000..47d77a472a
>         >> --- /dev/null
>         >> +++ b/accel/hvf/hvf-all.c
>         >> @@ -0,0 +1,56 @@
>         >> +/*
>         >> + * QEMU Hypervisor.framework support
>         >> + *
>         >> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>         >> + * the COPYING file in the top-level directory.
>         >> + *
>         >> + * Contributions after 2012-01-13 are licensed under the terms of the
>         >> + * GNU GPL, version 2 or (at your option) any later version.
>         >> + */
>         >> +
>         >> +#include "qemu/osdep.h"
>         >> +#include "qemu-common.h"
>         >> +#include "qemu/error-report.h"
>         >> +#include "sysemu/hvf.h"
>         >> +#include "sysemu/hvf_int.h"
>         >> +#include "sysemu/runstate.h"
>         >> +
>         >> +#include "qemu/main-loop.h"
>         >> +#include "sysemu/accel.h"
>         >> +
>         >> +#include <Hypervisor/Hypervisor.h>
>         >> +
>         >> +bool hvf_allowed;
>         >> +HVFState *hvf_state;
>         >> +
>         >> +void assert_hvf_ok(hv_return_t ret)
>         >> +{
>         >> +    if (ret == HV_SUCCESS) {
>         >> +        return;
>         >> +    }
>         >> +
>         >> +    switch (ret) {
>         >> +    case HV_ERROR:
>         >> +        error_report("Error: HV_ERROR");
>         >> +        break;
>         >> +    case HV_BUSY:
>         >> +        error_report("Error: HV_BUSY");
>         >> +        break;
>         >> +    case HV_BAD_ARGUMENT:
>         >> +        error_report("Error: HV_BAD_ARGUMENT");
>         >> +        break;
>         >> +    case HV_NO_RESOURCES:
>         >> +        error_report("Error: HV_NO_RESOURCES");
>         >> +        break;
>         >> +    case HV_NO_DEVICE:
>         >> +        error_report("Error: HV_NO_DEVICE");
>         >> +        break;
>         >> +    case HV_UNSUPPORTED:
>         >> +        error_report("Error: HV_UNSUPPORTED");
>         >> +        break;
>         >> +    default:
>         >> +        error_report("Unknown Error");
>         >> +    }
>         >> +
>         >> +    abort();
>         >> +}
>         >> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
>         >> new file mode 100644
>         >> index 0000000000..f9bb5502b7
>         >> --- /dev/null
>         >> +++ b/accel/hvf/hvf-cpus.c
>         >> @@ -0,0 +1,468 @@
>         >> +/*
>         >> + * Copyright 2008 IBM Corporation
>         >> + *           2008 Red Hat, Inc.
>         >> + * Copyright 2011 Intel Corporation
>         >> + * Copyright 2016 Veertu, Inc.
>         >> + * Copyright 2017 The Android Open Source Project
>         >> + *
>         >> + * QEMU Hypervisor.framework support
>         >> + *
>         >> + * This program is free software; you can redistribute it and/or
>         >> + * modify it under the terms of version 2 of the GNU General Public
>         >> + * License as published by the Free Software Foundation.
>         >> + *
>         >> + * This program is distributed in the hope that it will be useful,
>         >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>         >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>         >> + * General Public License for more details.
>         >> + *
>         >> + * You should have received a copy of the GNU General Public License
>         >> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>         >> + *
>         >> + * This file contain code under public domain from the hvdos project:
>         >> + * https://github.com/mist64/hvdos
>         >> + *
>         >> + * Parts Copyright (c) 2011 NetApp, Inc.
>         >> + * All rights reserved.
>         >> + *
>         >> + * Redistribution and use in source and binary forms, with or without
>         >> + * modification, are permitted provided that the following conditions
>         >> + * are met:
>         >> + * 1. Redistributions of source code must retain the above copyright
>         >> + *    notice, this list of conditions and the following disclaimer.
>         >> + * 2. Redistributions in binary form must reproduce the above copyright
>         >> + *    notice, this list of conditions and the following disclaimer in the
>         >> + *    documentation and/or other materials provided with the distribution.
>         >> + *
>         >> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>         >> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>         >> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>         >> + * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
>         >> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>         >> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>         >> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>         >> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>         >> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>         >> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>         >> + * SUCH DAMAGE.
>         >> + */
>         >> +
>         >> +#include "qemu/osdep.h"
>         >> +#include "qemu/error-report.h"
>         >> +#include "qemu/main-loop.h"
>         >> +#include "exec/address-spaces.h"
>         >> +#include "exec/exec-all.h"
>         >> +#include "sysemu/cpus.h"
>         >> +#include "sysemu/hvf.h"
>         >> +#include "sysemu/hvf_int.h"
>         >> +#include "sysemu/runstate.h"
>         >> +#include "qemu/guest-random.h"
>         >> +
>         >> +#include <Hypervisor/Hypervisor.h>
>         >> +
>         >> +/* Memory slots */
>         >> +
>         >> +struct mac_slot {
>         >> +    int present;
>         >> +    uint64_t size;
>         >> +    uint64_t gpa_start;
>         >> +    uint64_t gva;
>         >> +};
>         >> +
>         >> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>         >> +{
>         >> +    hvf_slot *slot;
>         >> +    int x;
>         >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>         >> +        slot = &hvf_state->slots[x];
>         >> +        if (slot->size && start < (slot->start + slot->size) &&
>         >> +            (start + size) > slot->start) {
>         >> +            return slot;
>         >> +        }
>         >> +    }
>         >> +    return NULL;
>         >> +}
>         >> +
>         >> +struct mac_slot mac_slots[32];
>         >> +
>         >> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
>         >> +{
>         >> +    struct mac_slot *macslot;
>         >> +    hv_return_t ret;
>         >> +
>         >> +    macslot = &mac_slots[slot->slot_id];
>         >> +
>         >> +    if (macslot->present) {
>         >> +        if (macslot->size != slot->size) {
>         >> +            macslot->present = 0;
>         >> +            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
>         >> +            assert_hvf_ok(ret);
>         >> +        }
>         >> +    }
>         >> +
>         >> +    if (!slot->size) {
>         >> +        return 0;
>         >> +    }
>         >> +
>         >> +    macslot->present = 1;
>         >> +    macslot->gpa_start = slot->start;
>         >> +    macslot->size = slot->size;
>         >> +    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
>         >> +    assert_hvf_ok(ret);
>         >> +    return 0;
>         >> +}
>         >> +
>         >> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>         >> +{
>         >> +    hvf_slot *mem;
>         >> +    MemoryRegion *area = section->mr;
>         >> +    bool writeable = !area->readonly && !area->rom_device;
>         >> +    hv_memory_flags_t flags;
>         >> +
>         >> +    if (!memory_region_is_ram(area)) {
>         >> +        if (writeable) {
>         >> +            return;
>         >> +        } else if (!memory_region_is_romd(area)) {
>         >> +            /*
>         >> +             * If the memory device is not in romd_mode, then we actually want
>         >> +             * to remove the hvf memory slot so all accesses will trap.
>         >> +             */
>         >> +             add = false;
>         >> +        }
>         >> +    }
>         >> +
>         >> +    mem = hvf_find_overlap_slot(
>         >> +            section->offset_within_address_space,
>         >> +            int128_get64(section->size));
>         >> +
>         >> +    if (mem && add) {
>         >> +        if (mem->size == int128_get64(section->size) &&
>         >> +            mem->start == section->offset_within_address_space &&
>         >> +            mem->mem == (memory_region_get_ram_ptr(area) +
>         >> +            section->offset_within_region)) {
>         >> +            return; /* Same region was attempted to register, go away. */
>         >> +        }
>         >> +    }
>         >> +
>         >> +    /* Region needs to be reset. set the size to 0 and remap it. */
>         >> +    if (mem) {
>         >> +        mem->size = 0;
>         >> +        if (do_hvf_set_memory(mem, 0)) {
>         >> +            error_report("Failed to reset overlapping slot");
>         >> +            abort();
>         >> +        }
>         >> +    }
>         >> +
>         >> +    if (!add) {
>         >> +        return;
>         >> +    }
>         >> +
>         >> +    if (area->readonly ||
>         >> +        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
>         >> +        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>         >> +    } else {
>         >> +        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
>         >> +    }
>         >> +
>         >> +    /* Now make a new slot. */
>         >> +    int x;
>         >> +
>         >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>         >> +        mem = &hvf_state->slots[x];
>         >> +        if (!mem->size) {
>         >> +            break;
>         >> +        }
>         >> +    }
>         >> +
>         >> +    if (x == hvf_state->num_slots) {
>         >> +        error_report("No free slots");
>         >> +        abort();
>         >> +    }
>         >> +
>         >> +    mem->size = int128_get64(section->size);
>         >> +    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
>         >> +    mem->start = section->offset_within_address_space;
>         >> +    mem->region = area;
>         >> +
>         >> +    if (do_hvf_set_memory(mem, flags)) {
>         >> +        error_report("Error registering new memory slot");
>         >> +        abort();
>         >> +    }
>         >> +}
>         >> +
>         >> +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
>         >> +{
>         >> +    hvf_slot *slot;
>         >> +
>         >> +    slot = hvf_find_overlap_slot(
>         >> +            section->offset_within_address_space,
>         >> +            int128_get64(section->size));
>         >> +
>         >> +    /* protect region against writes; begin tracking it */
>         >> +    if (on) {
>         >> +        slot->flags |= HVF_SLOT_LOG;
>         >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>         >> +                      HV_MEMORY_READ);
>         >> +    /* stop tracking region*/
>         >> +    } else {
>         >> +        slot->flags &= ~HVF_SLOT_LOG;
>         >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>         >> +                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>         >> +    }
>         >> +}
>         >> +
>         >> +static void hvf_log_start(MemoryListener *listener,
>         >> +                          MemoryRegionSection *section, int old, int new)
>         >> +{
>         >> +    if (old != 0) {
>         >> +        return;
>         >> +    }
>         >> +
>         >> +    hvf_set_dirty_tracking(section, 1);
>         >> +}
>         >> +
>         >> +static void hvf_log_stop(MemoryListener *listener,
>         >> +                         MemoryRegionSection *section, int old, int new)
>         >> +{
>         >> +    if (new != 0) {
>         >> +        return;
>         >> +    }
>         >> +
>         >> +    hvf_set_dirty_tracking(section, 0);
>         >> +}
>         >> +
>         >> +static void hvf_log_sync(MemoryListener *listener,
>         >> +                         MemoryRegionSection *section)
>         >> +{
>         >> +    /*
>         >> +     * sync of dirty pages is handled elsewhere; just make sure we keep
>         >> +     * tracking the region.
>         >> +     */
>         >> +    hvf_set_dirty_tracking(section, 1);
>         >> +}
>         >> +
>         >> +static void hvf_region_add(MemoryListener *listener,
>         >> +                           MemoryRegionSection *section)
>         >> +{
>         >> +    hvf_set_phys_mem(section, true);
>         >> +}
>         >> +
>         >> +static void hvf_region_del(MemoryListener *listener,
>         >> +                           MemoryRegionSection *section)
>         >> +{
>         >> +    hvf_set_phys_mem(section, false);
>         >> +}
>         >> +
>         >> +static MemoryListener hvf_memory_listener = {
>         >> +    .priority = 10,
>         >> +    .region_add = hvf_region_add,
>         >> +    .region_del = hvf_region_del,
>         >> +    .log_start = hvf_log_start,
>         >> +    .log_stop = hvf_log_stop,
>         >> +    .log_sync = hvf_log_sync,
>         >> +};
>         >> +
>         >> +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>         >> +{
>         >> +    if (!cpu->vcpu_dirty) {
>         >> +        hvf_get_registers(cpu);
>         >> +        cpu->vcpu_dirty = true;
>         >> +    }
>         >> +}
>         >> +
>         >> +static void hvf_cpu_synchronize_state(CPUState *cpu)
>         >> +{
>         >> +    if (!cpu->vcpu_dirty) {
>         >> +        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
>         >> +    }
>         >> +}
>         >> +
>         >> +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>         >> +                                              run_on_cpu_data arg)
>         >> +{
>         >> +    hvf_put_registers(cpu);
>         >> +    cpu->vcpu_dirty = false;
>         >> +}
>         >> +
>         >> +static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>         >> +{
>         >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
>         >> +}
>         >> +
>         >> +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>         >> +                                             run_on_cpu_data arg)
>         >> +{
>         >> +    hvf_put_registers(cpu);
>         >> +    cpu->vcpu_dirty = false;
>         >> +}
>         >> +
>         >> +static void hvf_cpu_synchronize_post_init(CPUState *cpu)
>         >> +{
>         >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
>         >> +}
>         >> +
>         >> +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>         >> +                                              run_on_cpu_data arg)
>         >> +{
>         >> +    cpu->vcpu_dirty = true;
>         >> +}
>         >> +
>         >> +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>         >> +{
>         >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
>         >> +}
>         >> +
>         >> +static void hvf_vcpu_destroy(CPUState *cpu)
>         >> +{
>         >> +    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
>         >> +    assert_hvf_ok(ret);
>         >> +
>         >> +    hvf_arch_vcpu_destroy(cpu);
>         >> +}
>         >> +
>         >> +static void dummy_signal(int sig)
>         >> +{
>         >> +}
>         >> +
>         >> +static int hvf_init_vcpu(CPUState *cpu)
>         >> +{
>         >> +    int r;
>         >> +
>         >> +    /* init cpu signals */
>         >> +    sigset_t set;
>         >> +    struct sigaction sigact;
>         >> +
>         >> +    memset(&sigact, 0, sizeof(sigact));
>         >> +    sigact.sa_handler = dummy_signal;
>         >> +    sigaction(SIG_IPI, &sigact, NULL);
>         >> +
>         >> +    pthread_sigmask(SIG_BLOCK, NULL, &set);
>         >> +    sigdelset(&set, SIG_IPI);
>         >> +
>         >> +#ifdef __aarch64__
>         >> +    r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL);
>         >> +#else
>         >> +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>         >> +#endif
>         > I think the first __aarch64__ bit fits better in the Arm part of the series.
>
>
>         Oops. Thanks for catching it! Yes, absolutely. It should be part of the
>         ARM enablement.
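To illustrate the split agreed on above: one possible shape is for the common code to call a per-architecture hook for vCPU creation instead of carrying the `#ifdef`. This is only a sketch with stand-in types. `FakeCPU`, `hvf_arch_vcpu_create` and `hvf_init_vcpu_common` are hypothetical names, not part of this patch or of Hypervisor.framework, and the stub does not call the framework at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's CPUState; only the fields this sketch needs. */
typedef struct FakeCPU {
    uint64_t hvf_fd;
    int vcpu_dirty;
} FakeCPU;

/*
 * Hypothetical arch hook. Each target directory would provide its own
 * version wrapping the hv_vcpu_create() signature of that architecture.
 * This stub just pretends the framework handed back handle 1.
 */
static int hvf_arch_vcpu_create(FakeCPU *cpu)
{
    cpu->hvf_fd = 1;
    return 0;
}

/* Common code: no architecture conditionals left in this function. */
static int hvf_init_vcpu_common(FakeCPU *cpu)
{
    int r;

    assert(cpu != NULL);
    r = hvf_arch_vcpu_create(cpu);
    if (r != 0) {
        return r;
    }
    /* Register state has not been pushed to the vCPU yet. */
    cpu->vcpu_dirty = 1;
    return 0;
}
```

Whether a dedicated hook is worth it, or the call simply moves into the existing `hvf_arch_init_vcpu()`, is a design choice this sketch does not settle.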
>
>
>         >
>         >> +    cpu->vcpu_dirty = 1;
>         >> +    assert_hvf_ok(r);
>         >> +
>         >> +    return hvf_arch_init_vcpu(cpu);
>         >> +}
>         >> +
>         >> +/*
>         >> + * The HVF-specific vCPU thread function. This one should only run when the host
>         >> + * CPU supports the VMX "unrestricted guest" feature.
>         >> + */
>         >> +static void *hvf_cpu_thread_fn(void *arg)
>         >> +{
>         >> +    CPUState *cpu = arg;
>         >> +
>         >> +    int r;
>         >> +
>         >> +    assert(hvf_enabled());
>         >> +
>         >> +    rcu_register_thread();
>         >> +
>         >> +    qemu_mutex_lock_iothread();
>         >> +    qemu_thread_get_self(cpu->thread);
>         >> +
>         >> +    cpu->thread_id = qemu_get_thread_id();
>         >> +    cpu->can_do_io = 1;
>         >> +    current_cpu = cpu;
>         >> +
>         >> +    hvf_init_vcpu(cpu);
>         >> +
>         >> +    /* signal CPU creation */
>         >> +    cpu_thread_signal_created(cpu);
>         >> +    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>         >> +
>         >> +    do {
>         >> +        if (cpu_can_run(cpu)) {
>         >> +            r = hvf_vcpu_exec(cpu);
>         >> +            if (r == EXCP_DEBUG) {
>         >> +                cpu_handle_guest_debug(cpu);
>         >> +            }
>         >> +        }
>         >> +        qemu_wait_io_event(cpu);
>         >> +    } while (!cpu->unplug || cpu_can_run(cpu));
>         >> +
>         >> +    hvf_vcpu_destroy(cpu);
>         >> +    cpu_thread_signal_destroyed(cpu);
>         >> +    qemu_mutex_unlock_iothread();
>         >> +    rcu_unregister_thread();
>         >> +    return NULL;
>         >> +}
>         >> +
>         >> +static void hvf_start_vcpu_thread(CPUState *cpu)
>         >> +{
>         >> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>         >> +
>         >> +    /*
>         >> +     * HVF currently does not support TCG, and only runs in
>         >> +     * unrestricted-guest mode.
>         >> +     */
>         >> +    assert(hvf_enabled());
>         >> +
>         >> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>         >> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>         >> +    qemu_cond_init(cpu->halt_cond);
>         >> +
>         >> +    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>         >> +             cpu->cpu_index);
>         >> +    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
>         >> +                       cpu, QEMU_THREAD_JOINABLE);
>         >> +}
>         >> +
>         >> +static const CpusAccel hvf_cpus = {
>         >> +    .create_vcpu_thread = hvf_start_vcpu_thread,
>         >> +
>         >> +    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>         >> +    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>         >> +    .synchronize_state = hvf_cpu_synchronize_state,
>         >> +    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>         >> +};
>         >> +
>         >> +static int hvf_accel_init(MachineState *ms)
>         >> +{
>         >> +    int x;
>         >> +    hv_return_t ret;
>         >> +    HVFState *s;
>         >> +
>         >> +    ret = hv_vm_create(HV_VM_DEFAULT);
>         >> +    assert_hvf_ok(ret);
>         >> +
>         >> +    s = g_new0(HVFState, 1);
>         >> +
>         >> +    s->num_slots = 32;
>         >> +    for (x = 0; x < s->num_slots; ++x) {
>         >> +        s->slots[x].size = 0;
>         >> +        s->slots[x].slot_id = x;
>         >> +    }
>         >> +
>         >> +    hvf_state = s;
>         >> +    memory_listener_register(&hvf_memory_listener, &address_space_memory);
>         >> +    cpus_register_accel(&hvf_cpus);
>         >> +    return 0;
>         >> +}
>         >> +
>         >> +static void hvf_accel_class_init(ObjectClass *oc, void *data)
>         >> +{
>         >> +    AccelClass *ac = ACCEL_CLASS(oc);
>         >> +    ac->name = "HVF";
>         >> +    ac->init_machine = hvf_accel_init;
>         >> +    ac->allowed = &hvf_allowed;
>         >> +}
>         >> +
>         >> +static const TypeInfo hvf_accel_type = {
>         >> +    .name = TYPE_HVF_ACCEL,
>         >> +    .parent = TYPE_ACCEL,
>         >> +    .class_init = hvf_accel_class_init,
>         >> +};
>         >> +
>         >> +static void hvf_type_init(void)
>         >> +{
>         >> +    type_register_static(&hvf_accel_type);
>         >> +}
>         >> +
>         >> +type_init(hvf_type_init);
>         >> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
>         >> new file mode 100644
>         >> index 0000000000..dfd6b68dc7
>         >> --- /dev/null
>         >> +++ b/accel/hvf/meson.build
>         >> @@ -0,0 +1,7 @@
>         >> +hvf_ss = ss.source_set()
>         >> +hvf_ss.add(files(
>         >> +  'hvf-all.c',
>         >> +  'hvf-cpus.c',
>         >> +))
>         >> +
>         >> +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
>         >> diff --git a/accel/meson.build b/accel/meson.build
>         >> index b26cca227a..6de12ce5d5 100644
>         >> --- a/accel/meson.build
>         >> +++ b/accel/meson.build
>         >> @@ -1,5 +1,6 @@
>         >>   softmmu_ss.add(files('accel.c'))
>         >>
>         >> +subdir('hvf')
>         >>   subdir('qtest')
>         >>   subdir('kvm')
>         >>   subdir('tcg')
>         >> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
>         >> new file mode 100644
>         >> index 0000000000..de9bad23a8
>         >> --- /dev/null
>         >> +++ b/include/sysemu/hvf_int.h
>         >> @@ -0,0 +1,69 @@
>         >> +/*
>         >> + * QEMU Hypervisor.framework (HVF) support
>         >> + *
>         >> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>         >> + * See the COPYING file in the top-level directory.
>         >> + *
>         >> + */
>         >> +
>         >> +/* header to be included in HVF-specific code */
>         >> +
>         >> +#ifndef HVF_INT_H
>         >> +#define HVF_INT_H
>         >> +
>         >> +#include <Hypervisor/Hypervisor.h>
>         >> +
>         >> +#define HVF_MAX_VCPU 0x10
>         >> +
>         >> +extern struct hvf_state hvf_global;
>         >> +
>         >> +struct hvf_vm {
>         >> +    int id;
>         >> +    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>         >> +};
>         >> +
>         >> +struct hvf_state {
>         >> +    uint32_t version;
>         >> +    struct hvf_vm *vm;
>         >> +    uint64_t mem_quota;
>         >> +};
>         >> +
>         >> +/* hvf_slot flags */
>         >> +#define HVF_SLOT_LOG (1 << 0)
>         >> +
>         >> +typedef struct hvf_slot {
>         >> +    uint64_t start;
>         >> +    uint64_t size;
>         >> +    uint8_t *mem;
>         >> +    int slot_id;
>         >> +    uint32_t flags;
>         >> +    MemoryRegion *region;
>         >> +} hvf_slot;
>         >> +
>         >> +typedef struct hvf_vcpu_caps {
>         >> +    uint64_t vmx_cap_pinbased;
>         >> +    uint64_t vmx_cap_procbased;
>         >> +    uint64_t vmx_cap_procbased2;
>         >> +    uint64_t vmx_cap_entry;
>         >> +    uint64_t vmx_cap_exit;
>         >> +    uint64_t vmx_cap_preemption_timer;
>         >> +} hvf_vcpu_caps;
>         >> +
>         >> +struct HVFState {
>         >> +    AccelState parent;
>         >> +    hvf_slot slots[32];
>         >> +    int num_slots;
>         >> +
>         >> +    hvf_vcpu_caps *hvf_caps;
>         >> +};
>         >> +extern HVFState *hvf_state;
>         >> +
>         >> +void assert_hvf_ok(hv_return_t ret);
>         >> +int hvf_get_registers(CPUState *cpu);
>         >> +int hvf_put_registers(CPUState *cpu);
>         >> +int hvf_arch_init_vcpu(CPUState *cpu);
>         >> +void hvf_arch_vcpu_destroy(CPUState *cpu);
>         >> +int hvf_vcpu_exec(CPUState *cpu);
>         >> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>         >> +
>         >> +#endif
>         >> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
>         >> deleted file mode 100644
>         >> index 817b3d7452..0000000000
>         >> --- a/target/i386/hvf/hvf-cpus.c
>         >> +++ /dev/null
>         >> @@ -1,131 +0,0 @@
>         >> -/*
>         >> - * Copyright 2008 IBM Corporation
>         >> - *           2008 Red Hat, Inc.
>         >> - * Copyright 2011 Intel Corporation
>         >> - * Copyright 2016 Veertu, Inc.
>         >> - * Copyright 2017 The Android Open Source Project
>         >> - *
>         >> - * QEMU Hypervisor.framework support
>         >> - *
>         >> - * This program is free software; you can redistribute it and/or
>         >> - * modify it under the terms of version 2 of the GNU General Public
>         >> - * License as published by the Free Software Foundation.
>         >> - *
>         >> - * This program is distributed in the hope that it will be useful,
>         >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
>         >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>         >> - * General Public License for more details.
>         >> - *
>         >> - * You should have received a copy of the GNU General Public License
>         >> - * along with this program; if not, see <http://www.gnu.org/licenses/>.
>         >> - *
>         >> - * This file contain code under public domain from the hvdos project:
>         >> - * https://github.com/mist64/hvdos
>         >> - *
>         >> - * Parts Copyright (c) 2011 NetApp, Inc.
>         >> - * All rights reserved.
>         >> - *
>         >> - * Redistribution and use in source and binary forms, with or without
>         >> - * modification, are permitted provided that the following conditions
>         >> - * are met:
>         >> - * 1. Redistributions of source code must retain the above copyright
>         >> - *    notice, this list of conditions and the following disclaimer.
>         >> - * 2. Redistributions in binary form must reproduce the above copyright
>         >> - *    notice, this list of conditions and the following disclaimer in the
>         >> - *    documentation and/or other materials provided with the distribution.
>         >> - *
>         >> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>         >> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>         >> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>         >> - * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
>         >> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>         >> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>         >> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>         >> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>         >> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>         >> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>         >> - * SUCH DAMAGE.
>         >> - */
>         >> -
>         >> -#include "qemu/osdep.h"
>         >> -#include "qemu/error-report.h"
>         >> -#include "qemu/main-loop.h"
>         >> -#include "sysemu/hvf.h"
>         >> -#include "sysemu/runstate.h"
>         >> -#include "target/i386/cpu.h"
>         >> -#include "qemu/guest-random.h"
>         >> -
>         >> -#include "hvf-cpus.h"
>         >> -
>         >> -/*
>         >> - * The HVF-specific vCPU thread function. This one should
>         only run when the host
>         >> - * CPU supports the VMX "unrestricted guest" feature.
>         >> - */
>         >> -static void *hvf_cpu_thread_fn(void *arg)
>         >> -{
>         >> -    CPUState *cpu = arg;
>         >> -
>         >> -    int r;
>         >> -
>         >> -    assert(hvf_enabled());
>         >> -
>         >> -    rcu_register_thread();
>         >> -
>         >> -    qemu_mutex_lock_iothread();
>         >> -    qemu_thread_get_self(cpu->thread);
>         >> -
>         >> -    cpu->thread_id = qemu_get_thread_id();
>         >> -    cpu->can_do_io = 1;
>         >> -    current_cpu = cpu;
>         >> -
>         >> -    hvf_init_vcpu(cpu);
>         >> -
>         >> -    /* signal CPU creation */
>         >> -    cpu_thread_signal_created(cpu);
>         >> - qemu_guest_random_seed_thread_part2(cpu->random_seed);
>         >> -
>         >> -    do {
>         >> -        if (cpu_can_run(cpu)) {
>         >> -            r = hvf_vcpu_exec(cpu);
>         >> -            if (r == EXCP_DEBUG) {
>         >> -                cpu_handle_guest_debug(cpu);
>         >> -            }
>         >> -        }
>         >> -        qemu_wait_io_event(cpu);
>         >> -    } while (!cpu->unplug || cpu_can_run(cpu));
>         >> -
>         >> -    hvf_vcpu_destroy(cpu);
>         >> -    cpu_thread_signal_destroyed(cpu);
>         >> -    qemu_mutex_unlock_iothread();
>         >> -    rcu_unregister_thread();
>         >> -    return NULL;
>         >> -}
>         >> -
>         >> -static void hvf_start_vcpu_thread(CPUState *cpu)
>         >> -{
>         >> -    char thread_name[VCPU_THREAD_NAME_SIZE];
>         >> -
>         >> -    /*
>         >> -     * HVF currently does not support TCG, and only runs in
>         >> -     * unrestricted-guest mode.
>         >> -     */
>         >> -    assert(hvf_enabled());
>         >> -
>         >> -    cpu->thread = g_malloc0(sizeof(QemuThread));
>         >> -    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>         >> -    qemu_cond_init(cpu->halt_cond);
>         >> -
>         >> -    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>         >> -             cpu->cpu_index);
>         >> -    qemu_thread_create(cpu->thread, thread_name,
>         hvf_cpu_thread_fn,
>         >> -                       cpu, QEMU_THREAD_JOINABLE);
>         >> -}
>         >> -
>         >> -const CpusAccel hvf_cpus = {
>         >> -    .create_vcpu_thread = hvf_start_vcpu_thread,
>         >> -
>         >> -    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>         >> -    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>         >> -    .synchronize_state = hvf_cpu_synchronize_state,
>         >> -    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>         >> -};
>         >> diff --git a/target/i386/hvf/hvf-cpus.h
>         b/target/i386/hvf/hvf-cpus.h
>         >> deleted file mode 100644
>         >> index ced31b82c0..0000000000
>         >> --- a/target/i386/hvf/hvf-cpus.h
>         >> +++ /dev/null
>         >> @@ -1,25 +0,0 @@
>         >> -/*
>         >> - * Accelerator CPUS Interface
>         >> - *
>         >> - * Copyright 2020 SUSE LLC
>         >> - *
>         >> - * This work is licensed under the terms of the GNU GPL,
>         version 2 or later.
>         >> - * See the COPYING file in the top-level directory.
>         >> - */
>         >> -
>         >> -#ifndef HVF_CPUS_H
>         >> -#define HVF_CPUS_H
>         >> -
>         >> -#include "sysemu/cpus.h"
>         >> -
>         >> -extern const CpusAccel hvf_cpus;
>         >> -
>         >> -int hvf_init_vcpu(CPUState *);
>         >> -int hvf_vcpu_exec(CPUState *);
>         >> -void hvf_cpu_synchronize_state(CPUState *);
>         >> -void hvf_cpu_synchronize_post_reset(CPUState *);
>         >> -void hvf_cpu_synchronize_post_init(CPUState *);
>         >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *);
>         >> -void hvf_vcpu_destroy(CPUState *);
>         >> -
>         >> -#endif /* HVF_CPUS_H */
>         >> diff --git a/target/i386/hvf/hvf-i386.h
>         b/target/i386/hvf/hvf-i386.h
>         >> index e0edffd077..6d56f8f6bb 100644
>         >> --- a/target/i386/hvf/hvf-i386.h
>         >> +++ b/target/i386/hvf/hvf-i386.h
>         >> @@ -18,57 +18,11 @@
>         >>
>         >>   #include "sysemu/accel.h"
>         >>   #include "sysemu/hvf.h"
>         >> +#include "sysemu/hvf_int.h"
>         >>   #include "cpu.h"
>         >>   #include "x86.h"
>         >>
>         >> -#define HVF_MAX_VCPU 0x10
>         >> -
>         >> -extern struct hvf_state hvf_global;
>         >> -
>         >> -struct hvf_vm {
>         >> -    int id;
>         >> -    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>         >> -};
>         >> -
>         >> -struct hvf_state {
>         >> -    uint32_t version;
>         >> -    struct hvf_vm *vm;
>         >> -    uint64_t mem_quota;
>         >> -};
>         >> -
>         >> -/* hvf_slot flags */
>         >> -#define HVF_SLOT_LOG (1 << 0)
>         >> -
>         >> -typedef struct hvf_slot {
>         >> -    uint64_t start;
>         >> -    uint64_t size;
>         >> -    uint8_t *mem;
>         >> -    int slot_id;
>         >> -    uint32_t flags;
>         >> -    MemoryRegion *region;
>         >> -} hvf_slot;
>         >> -
>         >> -typedef struct hvf_vcpu_caps {
>         >> -    uint64_t vmx_cap_pinbased;
>         >> -    uint64_t vmx_cap_procbased;
>         >> -    uint64_t vmx_cap_procbased2;
>         >> -    uint64_t vmx_cap_entry;
>         >> -    uint64_t vmx_cap_exit;
>         >> -    uint64_t vmx_cap_preemption_timer;
>         >> -} hvf_vcpu_caps;
>         >> -
>         >> -struct HVFState {
>         >> -    AccelState parent;
>         >> -    hvf_slot slots[32];
>         >> -    int num_slots;
>         >> -
>         >> -    hvf_vcpu_caps *hvf_caps;
>         >> -};
>         >> -extern HVFState *hvf_state;
>         >> -
>         >> -void hvf_set_phys_mem(MemoryRegionSection *, bool);
>         >>   void hvf_handle_io(CPUArchState *, uint16_t, void *, int,
>         int, int);
>         >> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>         >>
>         >>   #ifdef NEED_CPU_H
>         >>   /* Functions exported to host specific mode */
>         >> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
>         >> index ed9356565c..8b96ecd619 100644
>         >> --- a/target/i386/hvf/hvf.c
>         >> +++ b/target/i386/hvf/hvf.c
>         >> @@ -51,6 +51,7 @@
>         >>   #include "qemu/error-report.h"
>         >>
>         >>   #include "sysemu/hvf.h"
>         >> +#include "sysemu/hvf_int.h"
>         >>   #include "sysemu/runstate.h"
>         >>   #include "hvf-i386.h"
>         >>   #include "vmcs.h"
>         >> @@ -72,171 +73,6 @@
>         >>   #include "sysemu/accel.h"
>         >>   #include "target/i386/cpu.h"
>         >>
>         >> -#include "hvf-cpus.h"
>         >> -
>         >> -HVFState *hvf_state;
>         >> -
>         >> -static void assert_hvf_ok(hv_return_t ret)
>         >> -{
>         >> -    if (ret == HV_SUCCESS) {
>         >> -        return;
>         >> -    }
>         >> -
>         >> -    switch (ret) {
>         >> -    case HV_ERROR:
>         >> -        error_report("Error: HV_ERROR");
>         >> -        break;
>         >> -    case HV_BUSY:
>         >> -        error_report("Error: HV_BUSY");
>         >> -        break;
>         >> -    case HV_BAD_ARGUMENT:
>         >> -        error_report("Error: HV_BAD_ARGUMENT");
>         >> -        break;
>         >> -    case HV_NO_RESOURCES:
>         >> -        error_report("Error: HV_NO_RESOURCES");
>         >> -        break;
>         >> -    case HV_NO_DEVICE:
>         >> -        error_report("Error: HV_NO_DEVICE");
>         >> -        break;
>         >> -    case HV_UNSUPPORTED:
>         >> -        error_report("Error: HV_UNSUPPORTED");
>         >> -        break;
>         >> -    default:
>         >> -        error_report("Unknown Error");
>         >> -    }
>         >> -
>         >> -    abort();
>         >> -}
>         >> -
>         >> -/* Memory slots */
>         >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>         >> -{
>         >> -    hvf_slot *slot;
>         >> -    int x;
>         >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>         >> -        slot = &hvf_state->slots[x];
>         >> -        if (slot->size && start < (slot->start +
>         slot->size) &&
>         >> -            (start + size) > slot->start) {
>         >> -            return slot;
>         >> -        }
>         >> -    }
>         >> -    return NULL;
>         >> -}
>         >> -
>         >> -struct mac_slot {
>         >> -    int present;
>         >> -    uint64_t size;
>         >> -    uint64_t gpa_start;
>         >> -    uint64_t gva;
>         >> -};
>         >> -
>         >> -struct mac_slot mac_slots[32];
>         >> -
>         >> -static int do_hvf_set_memory(hvf_slot *slot,
>         hv_memory_flags_t flags)
>         >> -{
>         >> -    struct mac_slot *macslot;
>         >> -    hv_return_t ret;
>         >> -
>         >> -    macslot = &mac_slots[slot->slot_id];
>         >> -
>         >> -    if (macslot->present) {
>         >> -        if (macslot->size != slot->size) {
>         >> -            macslot->present = 0;
>         >> -            ret = hv_vm_unmap(macslot->gpa_start,
>         macslot->size);
>         >> -            assert_hvf_ok(ret);
>         >> -        }
>         >> -    }
>         >> -
>         >> -    if (!slot->size) {
>         >> -        return 0;
>         >> -    }
>         >> -
>         >> -    macslot->present = 1;
>         >> -    macslot->gpa_start = slot->start;
>         >> -    macslot->size = slot->size;
>         >> -    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start,
>         slot->size, flags);
>         >> -    assert_hvf_ok(ret);
>         >> -    return 0;
>         >> -}
>         >> -
>         >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>         >> -{
>         >> -    hvf_slot *mem;
>         >> -    MemoryRegion *area = section->mr;
>         >> -    bool writeable = !area->readonly && !area->rom_device;
>         >> -    hv_memory_flags_t flags;
>         >> -
>         >> -    if (!memory_region_is_ram(area)) {
>         >> -        if (writeable) {
>         >> -            return;
>         >> -        } else if (!memory_region_is_romd(area)) {
>         >> -            /*
>         >> -             * If the memory device is not in romd_mode,
>         then we actually want
>         >> -             * to remove the hvf memory slot so all
>         accesses will trap.
>         >> -             */
>         >> -             add = false;
>         >> -        }
>         >> -    }
>         >> -
>         >> -    mem = hvf_find_overlap_slot(
>         >> - section->offset_within_address_space,
>         >> -            int128_get64(section->size));
>         >> -
>         >> -    if (mem && add) {
>         >> -        if (mem->size == int128_get64(section->size) &&
>         >> -            mem->start ==
>         section->offset_within_address_space &&
>         >> -            mem->mem == (memory_region_get_ram_ptr(area) +
>         >> -            section->offset_within_region)) {
>         >> -            return; /* Same region was attempted to
>         register, go away. */
>         >> -        }
>         >> -    }
>         >> -
>         >> -    /* Region needs to be reset. set the size to 0 and
>         remap it. */
>         >> -    if (mem) {
>         >> -        mem->size = 0;
>         >> -        if (do_hvf_set_memory(mem, 0)) {
>         >> -            error_report("Failed to reset overlapping slot");
>         >> -            abort();
>         >> -        }
>         >> -    }
>         >> -
>         >> -    if (!add) {
>         >> -        return;
>         >> -    }
>         >> -
>         >> -    if (area->readonly ||
>         >> -        (!memory_region_is_ram(area) &&
>         memory_region_is_romd(area))) {
>         >> -        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>         >> -    } else {
>         >> -        flags = HV_MEMORY_READ | HV_MEMORY_WRITE |
>         HV_MEMORY_EXEC;
>         >> -    }
>         >> -
>         >> -    /* Now make a new slot. */
>         >> -    int x;
>         >> -
>         >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>         >> -        mem = &hvf_state->slots[x];
>         >> -        if (!mem->size) {
>         >> -            break;
>         >> -        }
>         >> -    }
>         >> -
>         >> -    if (x == hvf_state->num_slots) {
>         >> -        error_report("No free slots");
>         >> -        abort();
>         >> -    }
>         >> -
>         >> -    mem->size = int128_get64(section->size);
>         >> -    mem->mem = memory_region_get_ram_ptr(area) +
>         section->offset_within_region;
>         >> -    mem->start = section->offset_within_address_space;
>         >> -    mem->region = area;
>         >> -
>         >> -    if (do_hvf_set_memory(mem, flags)) {
>         >> -        error_report("Error registering new memory slot");
>         >> -        abort();
>         >> -    }
>         >> -}
>         >> -
>         >>   void vmx_update_tpr(CPUState *cpu)
>         >>   {
>         >>       /* TODO: need integrate APIC handling */
>         >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env,
>         uint16_t port, void *buffer,
>         >>       }
>         >>   }
>         >>
>         >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu,
>         run_on_cpu_data arg)
>         >> -{
>         >> -    if (!cpu->vcpu_dirty) {
>         >> -        hvf_get_registers(cpu);
>         >> -        cpu->vcpu_dirty = true;
>         >> -    }
>         >> -}
>         >> -
>         >> -void hvf_cpu_synchronize_state(CPUState *cpu)
>         >> -{
>         >> -    if (!cpu->vcpu_dirty) {
>         >> -        run_on_cpu(cpu, do_hvf_cpu_synchronize_state,
>         RUN_ON_CPU_NULL);
>         >> -    }
>         >> -}
>         >> -
>         >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>         >> - run_on_cpu_data arg)
>         >> -{
>         >> -    hvf_put_registers(cpu);
>         >> -    cpu->vcpu_dirty = false;
>         >> -}
>         >> -
>         >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>         >> -{
>         >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset,
>         RUN_ON_CPU_NULL);
>         >> -}
>         >> -
>         >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>         >> -  run_on_cpu_data arg)
>         >> -{
>         >> -    hvf_put_registers(cpu);
>         >> -    cpu->vcpu_dirty = false;
>         >> -}
>         >> -
>         >> -void hvf_cpu_synchronize_post_init(CPUState *cpu)
>         >> -{
>         >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init,
>         RUN_ON_CPU_NULL);
>         >> -}
>         >> -
>         >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>         >> - run_on_cpu_data arg)
>         >> -{
>         >> -    cpu->vcpu_dirty = true;
>         >> -}
>         >> -
>         >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>         >> -{
>         >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm,
>         RUN_ON_CPU_NULL);
>         >> -}
>         >> -
>         >>   static bool ept_emulation_fault(hvf_slot *slot, uint64_t
>         gpa, uint64_t ept_qual)
>         >>   {
>         >>       int read, write;
>         >> @@ -370,109 +156,19 @@ static bool
>         ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t
>         ept_qual)
>         >>       return false;
>         >>   }
>         >>
>         >> -static void hvf_set_dirty_tracking(MemoryRegionSection
>         *section, bool on)
>         >> -{
>         >> -    hvf_slot *slot;
>         >> -
>         >> -    slot = hvf_find_overlap_slot(
>         >> - section->offset_within_address_space,
>         >> -            int128_get64(section->size));
>         >> -
>         >> -    /* protect region against writes; begin tracking it */
>         >> -    if (on) {
>         >> -        slot->flags |= HVF_SLOT_LOG;
>         >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>         >> -                      HV_MEMORY_READ);
>         >> -    /* stop tracking region*/
>         >> -    } else {
>         >> -        slot->flags &= ~HVF_SLOT_LOG;
>         >> - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>         >> -                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>         >> -    }
>         >> -}
>         >> -
>         >> -static void hvf_log_start(MemoryListener *listener,
>         >> -                          MemoryRegionSection *section,
>         int old, int new)
>         >> -{
>         >> -    if (old != 0) {
>         >> -        return;
>         >> -    }
>         >> -
>         >> -    hvf_set_dirty_tracking(section, 1);
>         >> -}
>         >> -
>         >> -static void hvf_log_stop(MemoryListener *listener,
>         >> -                         MemoryRegionSection *section, int
>         old, int new)
>         >> -{
>         >> -    if (new != 0) {
>         >> -        return;
>         >> -    }
>         >> -
>         >> -    hvf_set_dirty_tracking(section, 0);
>         >> -}
>         >> -
>         >> -static void hvf_log_sync(MemoryListener *listener,
>         >> -                         MemoryRegionSection *section)
>         >> -{
>         >> -    /*
>         >> -     * sync of dirty pages is handled elsewhere; just make
>         sure we keep
>         >> -     * tracking the region.
>         >> -     */
>         >> -    hvf_set_dirty_tracking(section, 1);
>         >> -}
>         >> -
>         >> -static void hvf_region_add(MemoryListener *listener,
>         >> -                           MemoryRegionSection *section)
>         >> -{
>         >> -    hvf_set_phys_mem(section, true);
>         >> -}
>         >> -
>         >> -static void hvf_region_del(MemoryListener *listener,
>         >> -                           MemoryRegionSection *section)
>         >> -{
>         >> -    hvf_set_phys_mem(section, false);
>         >> -}
>         >> -
>         >> -static MemoryListener hvf_memory_listener = {
>         >> -    .priority = 10,
>         >> -    .region_add = hvf_region_add,
>         >> -    .region_del = hvf_region_del,
>         >> -    .log_start = hvf_log_start,
>         >> -    .log_stop = hvf_log_stop,
>         >> -    .log_sync = hvf_log_sync,
>         >> -};
>         >> -
>         >> -void hvf_vcpu_destroy(CPUState *cpu)
>         >> +void hvf_arch_vcpu_destroy(CPUState *cpu)
>         >>   {
>         >>       X86CPU *x86_cpu = X86_CPU(cpu);
>         >>       CPUX86State *env = &x86_cpu->env;
>         >>
>         >> -    hv_return_t ret =
>         hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
>         >>       g_free(env->hvf_mmio_buf);
>         >> -    assert_hvf_ok(ret);
>         >> -}
>         >> -
>         >> -static void dummy_signal(int sig)
>         >> -{
>         >>   }
>         >>
>         >> -int hvf_init_vcpu(CPUState *cpu)
>         >> +int hvf_arch_init_vcpu(CPUState *cpu)
>         >>   {
>         >>
>         >>       X86CPU *x86cpu = X86_CPU(cpu);
>         >>       CPUX86State *env = &x86cpu->env;
>         >> -    int r;
>         >> -
>         >> -    /* init cpu signals */
>         >> -    sigset_t set;
>         >> -    struct sigaction sigact;
>         >> -
>         >> -    memset(&sigact, 0, sizeof(sigact));
>         >> -    sigact.sa_handler = dummy_signal;
>         >> -    sigaction(SIG_IPI, &sigact, NULL);
>         >> -
>         >> -    pthread_sigmask(SIG_BLOCK, NULL, &set);
>         >> -    sigdelset(&set, SIG_IPI);
>         >>
>         >>       init_emu();
>         >>       init_decoder();
>         >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu)
>         >>       hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1);
>         >>       env->hvf_mmio_buf = g_new(char, 4096);
>         >>
>         >> -    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd,
>         HV_VCPU_DEFAULT);
>         >> -    cpu->vcpu_dirty = 1;
>         >> -    assert_hvf_ok(r);
>         >> -
>         >>       if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
>         >>  &hvf_state->hvf_caps->vmx_cap_pinbased)) {
>         >>           abort();
>         >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu)
>         >>
>         >>       return ret;
>         >>   }
>         >> -
>         >> -bool hvf_allowed;
>         >> -
>         >> -static int hvf_accel_init(MachineState *ms)
>         >> -{
>         >> -    int x;
>         >> -    hv_return_t ret;
>         >> -    HVFState *s;
>         >> -
>         >> -    ret = hv_vm_create(HV_VM_DEFAULT);
>         >> -    assert_hvf_ok(ret);
>         >> -
>         >> -    s = g_new0(HVFState, 1);
>         >> -
>         >> -    s->num_slots = 32;
>         >> -    for (x = 0; x < s->num_slots; ++x) {
>         >> -        s->slots[x].size = 0;
>         >> -        s->slots[x].slot_id = x;
>         >> -    }
>         >> -
>         >> -    hvf_state = s;
>         >> - memory_listener_register(&hvf_memory_listener,
>         &address_space_memory);
>         >> -    cpus_register_accel(&hvf_cpus);
>         >> -    return 0;
>         >> -}
>         >> -
>         >> -static void hvf_accel_class_init(ObjectClass *oc, void *data)
>         >> -{
>         >> -    AccelClass *ac = ACCEL_CLASS(oc);
>         >> -    ac->name = "HVF";
>         >> -    ac->init_machine = hvf_accel_init;
>         >> -    ac->allowed = &hvf_allowed;
>         >> -}
>         >> -
>         >> -static const TypeInfo hvf_accel_type = {
>         >> -    .name = TYPE_HVF_ACCEL,
>         >> -    .parent = TYPE_ACCEL,
>         >> -    .class_init = hvf_accel_class_init,
>         >> -};
>         >> -
>         >> -static void hvf_type_init(void)
>         >> -{
>         >> -    type_register_static(&hvf_accel_type);
>         >> -}
>         >> -
>         >> -type_init(hvf_type_init);
>         >> diff --git a/target/i386/hvf/meson.build
>         b/target/i386/hvf/meson.build
>         >> index 409c9a3f14..c8a43717ee 100644
>         >> --- a/target/i386/hvf/meson.build
>         >> +++ b/target/i386/hvf/meson.build
>         >> @@ -1,6 +1,5 @@
>         >>   i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true:
>         files(
>         >>     'hvf.c',
>         >> -  'hvf-cpus.c',
>         >>     'x86.c',
>         >>     'x86_cpuid.c',
>         >>     'x86_decode.c',
>         >> diff --git a/target/i386/hvf/x86hvf.c
>         b/target/i386/hvf/x86hvf.c
>         >> index bbec412b6c..89b8e9d87a 100644
>         >> --- a/target/i386/hvf/x86hvf.c
>         >> +++ b/target/i386/hvf/x86hvf.c
>         >> @@ -20,6 +20,9 @@
>         >>   #include "qemu/osdep.h"
>         >>
>         >>   #include "qemu-common.h"
>         >> +#include "sysemu/hvf.h"
>         >> +#include "sysemu/hvf_int.h"
>         >> +#include "sysemu/hw_accel.h"
>         >>   #include "x86hvf.h"
>         >>   #include "vmx.h"
>         >>   #include "vmcs.h"
>         >> @@ -32,8 +35,6 @@
>         >>   #include <Hypervisor/hv.h>
>         >>   #include <Hypervisor/hv_vmx.h>
>         >>
>         >> -#include "hvf-cpus.h"
>         >> -
>         >>   void hvf_set_segment(struct CPUState *cpu, struct
>         vmx_segment *vmx_seg,
>         >>                        SegmentCache *qseg, bool is_tr)
>         >>   {
>         >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state)
>         >>       env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
>         >>
>         >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
>         >> -        hvf_cpu_synchronize_state(cpu_state);
>         >> +        cpu_synchronize_state(cpu_state);
>         >>           do_cpu_init(cpu);
>         >>       }
>         >>
>         >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState
>         *cpu_state)
>         >>           cpu_state->halted = 0;
>         >>       }
>         >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
>         >> -        hvf_cpu_synchronize_state(cpu_state);
>         >> +        cpu_synchronize_state(cpu_state);
>         >>           do_cpu_sipi(cpu);
>         >>       }
>         >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
>         >>           cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
>         >> -        hvf_cpu_synchronize_state(cpu_state);
>         >> +        cpu_synchronize_state(cpu_state);
>         > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps
>         > should be a separate patch. It follows the cpu/accel cleanups
>         > Claudio was doing over the summer.
>
>
>         The only reason they're in here is because we no longer have
>         access to
>         the hvf_ functions from the file. I am perfectly happy to
>         rebase the
>         patch on top of Claudio's if his goes in first. I'm sure it'll be
>         trivial for him to rebase on top of this too if my series goes
>         in first.
>
>
>         >
>         > Philippe raised the idea that the patch might go ahead of the
>         > ARM-specific part (which might involve some discussions) and I
>         > agree with that.
>         >
>         > Some sync between Claudio's series (CC'd him) and the patch
>         > might be needed.
>
>
>         I would prefer not to hold back because of the sync. Claudio's
>         cleanup
>         is trivial enough to adjust for if it gets merged ahead of this.
>
>
>         Alex
>
>
>
Frank Yang Nov. 30, 2020, 8:55 p.m. UTC | #6
On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:

> Hi Frank,
>
> Thanks for the update :). Your previous email nudged me into the right
> direction. I previously had implemented WFI through the internal timer
> framework which performed way worse.
>
Cool, glad it's helping. Also, Peter found out that the main thing keeping
us from just using cntpct_el0 on the host directly and comparing it with cval
is that if we sleep, cval is going to be much less than cntpct_el0, by the
sleep time. If we can get either the architecture or macOS to report the
sleep time, then we might not need a poll interval either!
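
To make the comparison above concrete, here is a minimal sketch in C of
turning "cval minus the current counter" into a bounded sleep time. It is not
taken from either patch series. The function name, the `offset` parameter
(the host-to-guest counter offset, which is exactly the quantity that is hard
to know after a host sleep) and the `max_wait_ns` cap standing in for a poll
interval are all assumptions made for illustration.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: how long a vCPU thread could sleep before the guest
 * timer fires, capped at a poll interval.
 *
 * cval        - guest timer compare value, in counter ticks
 * host_count  - current host counter reading, in counter ticks
 * offset      - ASSUMED known host-to-guest counter offset, in ticks
 * freq_hz     - counter frequency in Hz (assumed well below 2^34)
 * max_wait_ns - upper bound on the sleep (the poll interval), in ns
 */
struct timespec wfi_sleep_time(uint64_t cval, uint64_t host_count,
                               uint64_t offset, uint64_t freq_hz,
                               uint64_t max_wait_ns)
{
    struct timespec ts = { 0, 0 };
    uint64_t guest_count = host_count - offset;
    uint64_t delta, secs, nsecs;

    if (freq_hz == 0 || cval <= guest_count) {
        return ts;                      /* timer already expired: no sleep */
    }

    delta = cval - guest_count;
    /* Split into seconds and remainder so the multiply cannot overflow. */
    secs = delta / freq_hz;
    nsecs = (delta % freq_hz) * NSEC_PER_SEC / freq_hz;

    /* Never sleep longer than the poll interval. */
    if (secs > max_wait_ns / NSEC_PER_SEC ||
        nsecs > max_wait_ns - secs * NSEC_PER_SEC) {
        secs = max_wait_ns / NSEC_PER_SEC;
        nsecs = max_wait_ns % NSEC_PER_SEC;
    }

    ts.tv_sec = (time_t)secs;
    ts.tv_nsec = (long)nsecs;
    return ts;
}
```

If the offset could be kept accurate across host sleep, the cap would only
act as a safety net; without that, the cap is what bounds the error.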

> Along the way, I stumbled over a few issues though. For starters, the
> signal mask for SIG_IPI was not set correctly, so while pselect() would
> exit, the signal would never get delivered to the thread! For a fix, check
> out
>
>
> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
>
>
Thanks, we'll take a look :)
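
For readers following along, the signal-mask issue quoted above can be
sketched roughly as follows: the kick signal is kept blocked in the vCPU
thread and is unblocked only for the duration of pselect(), so a pending kick
is delivered exactly when the thread goes to sleep. This is a standalone
illustration, not the code from the linked patch. The function names are
hypothetical, and the caller passes in whichever signal plays the role of
SIG_IPI (SIGUSR1 is a convenient stand-in when experimenting).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t kick_seen;

static void kick_handler(int sig)
{
    (void)sig;
    kick_seen = 1;
}

/*
 * Hypothetical setup step: install a handler for kick_sig and block the
 * signal in the calling thread. Returns 0 on success.
 */
int vcpu_block_kick(int kick_sig)
{
    struct sigaction sa;
    sigset_t set;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = kick_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(kick_sig, &sa, NULL) != 0) {
        return -1;
    }

    sigemptyset(&set);
    sigaddset(&set, kick_sig);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/*
 * Hypothetical wait step: sleep for up to `timeout`, with kick_sig unblocked
 * only while inside pselect(). Returns 1 if the kick handler ran, else 0.
 */
int vcpu_wait_for_kick(int kick_sig, const struct timespec *timeout)
{
    sigset_t during_wait;

    /* Query the current mask, then remove the kick signal from the copy. */
    pthread_sigmask(SIG_BLOCK, NULL, &during_wait);
    sigdelset(&during_wait, kick_sig);

    kick_seen = 0;
    /* pselect() swaps in during_wait atomically while it sleeps. */
    (void)pselect(0, NULL, NULL, NULL, timeout, &during_wait);
    return kick_seen ? 1 : 0;
}
```

The point of the design is that computing the mask is not enough; it has to
be the mask handed to pselect(), otherwise the signal stays blocked and the
handler never runs.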


> Please also have a look at my latest stab at WFI emulation. It doesn't
> handle WFE (that's only relevant in overcommitted scenarios). But it does
> handle WFI and even does something similar to hlt polling, albeit not with
> an adaptive threshold.
>
> Also, is there a particular reason you're working on this super
> interesting and useful code in a random downstream fork of QEMU? Wouldn't
> it be more helpful to contribute to the upstream code base instead?
>
We'd actually like to contribute upstream too :) We do want to maintain our
own downstream though; the Android Emulator codebase needs to work solidly on
macOS and Windows, which has made keeping up with upstream difficult and
staying on a previous version (2.12) with known quirks easier. (There's also
some Android-related customization relating to the Qt UI, a different set of
virtual devices, and snapshot support (incl. snapshots of graphics devices
with OpenGL ES state tracking), which we hope to separate into other
libraries/processes, but it's not insignificant.)

>
> Alex
>
> On 30.11.20 21:15, Frank Yang wrote:
>
> Update: We're not quite sure how to compare the CNTV_CVAL and CNTVCT. But
> the high CPU usage seems to be mitigated by having a poll interval (like
> KVM does) in handling WFI:
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512501
>
> This is loosely inspired by
> https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766
> which does seem to specify a poll interval.
>
> It would be cool if we could have a lightweight way to enter sleep and
> restart the vcpus precisely when CVAL passes, though.
>
> Frank
>
>
> On Fri, Nov 27, 2020 at 3:30 PM Frank Yang <lfy@google.com> wrote:
>
>> Hi all,
>>
>> +Peter Collingbourne <pcc@google.com>
>>
>> I'm a developer on the Android Emulator, which is in a fork of QEMU.
>>
>> Peter and I have been working on an HVF Apple Silicon backend with an eye
>> toward Android guests.
>>
>> We have gotten things to basically switch to Android userspace already
>> (logcat/shell and graphics available at least)
>>
>> Our strategy so far has been to import logic from the KVM implementation
>> and hook into QEMU's software devices that were previously assumed to only
>> work with TCG, or have KVM-specific paths.
>>
>> Thanks to Alexander for the tip on the 36-bit address space limitation
>> btw; our way of addressing this is to still allow highmem but not put pci
>> high mmio so high.
>>
>> Also, note we have a sleep/signal based mechanism to deal with WFx, which
>> might be worth looking into in Alexander's implementation as well:
>>
>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551
>>
>> Patches so far, FYI:
>>
>>
>> https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1
>>
>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3
>>
>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3
>>
>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3
>>
>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3
>>
>>
>> https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a
>>
>> https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b
>>
>> https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01
>>
>> https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228
>>
>> https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102
>>
>> https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6
>>
>> Peter's also noticed that there are extra steps needed for M1s to allow
>> TCG to work, as it involves JIT:
>>
>>
>> https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9
>>
>> We'd appreciate any feedback/comments :)
>>
>> Best,
>>
>> Frank
>>
>> On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf <agraf@csgraf.de> wrote:
>>
>>>
>>> On 27.11.20 21:00, Roman Bolshakov wrote:
>>> > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote:
>>> >> Until now, Hypervisor.framework has only been available on x86_64 systems.
>>> >> With Apple Silicon shipping now, it extends its reach to aarch64. To
>>> >> prepare for support for multiple architectures, let's move common code out
>>> >> into its own accel directory.
>>> >>
>>> >> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>>> >> ---
>>> >>   MAINTAINERS                 |   9 +-
>>> >>   accel/hvf/hvf-all.c         |  56 +++++
>>> >>   accel/hvf/hvf-cpus.c        | 468 ++++++++++++++++++++++++++++++++++++
>>> >>   accel/hvf/meson.build       |   7 +
>>> >>   accel/meson.build           |   1 +
>>> >>   include/sysemu/hvf_int.h    |  69 ++++++
>>> >>   target/i386/hvf/hvf-cpus.c  | 131 ----------
>>> >>   target/i386/hvf/hvf-cpus.h  |  25 --
>>> >>   target/i386/hvf/hvf-i386.h  |  48 +---
>>> >>   target/i386/hvf/hvf.c       | 360 +--------------------------
>>> >>   target/i386/hvf/meson.build |   1 -
>>> >>   target/i386/hvf/x86hvf.c    |  11 +-
>>> >>   target/i386/hvf/x86hvf.h    |   2 -
>>> >>   13 files changed, 619 insertions(+), 569 deletions(-)
>>> >>   create mode 100644 accel/hvf/hvf-all.c
>>> >>   create mode 100644 accel/hvf/hvf-cpus.c
>>> >>   create mode 100644 accel/hvf/meson.build
>>> >>   create mode 100644 include/sysemu/hvf_int.h
>>> >>   delete mode 100644 target/i386/hvf/hvf-cpus.c
>>> >>   delete mode 100644 target/i386/hvf/hvf-cpus.h
>>> >>
>>> >> diff --git a/MAINTAINERS b/MAINTAINERS
>>> >> index 68bc160f41..ca4b6d9279 100644
>>> >> --- a/MAINTAINERS
>>> >> +++ b/MAINTAINERS
>>> >> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com>
>>> >>   M: Roman Bolshakov <r.bolshakov@yadro.com>
>>> >>   W: https://wiki.qemu.org/Features/HVF
>>> >>   S: Maintained
>>> >> -F: accel/stubs/hvf-stub.c
>>> > There was a patch for that in the RFC series from Claudio.
>>>
>>>
>>> Yeah, I'm not worried about this hunk :).
>>>
>>>
>>> >
>>> >>   F: target/i386/hvf/
>>> >> +
>>> >> +HVF
>>> >> +M: Cameron Esfahani <dirty@apple.com>
>>> >> +M: Roman Bolshakov <r.bolshakov@yadro.com>
>>> >> +W: https://wiki.qemu.org/Features/HVF
>>> >> +S: Maintained
>>> >> +F: accel/hvf/
>>> >>   F: include/sysemu/hvf.h
>>> >> +F: include/sysemu/hvf_int.h
>>> >>
>>> >>   WHPX CPUs
>>> >>   M: Sunil Muthuswamy <sunilmut@microsoft.com>
>>> >> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
>>> >> new file mode 100644
>>> >> index 0000000000..47d77a472a
>>> >> --- /dev/null
>>> >> +++ b/accel/hvf/hvf-all.c
>>> >> @@ -0,0 +1,56 @@
>>> >> +/*
>>> >> + * QEMU Hypervisor.framework support
>>> >> + *
>>> >> + * This work is licensed under the terms of the GNU GPL, version 2. See
>>> >> + * the COPYING file in the top-level directory.
>>> >> + *
>>> >> + * Contributions after 2012-01-13 are licensed under the terms of the
>>> >> + * GNU GPL, version 2 or (at your option) any later version.
>>> >> + */
>>> >> +
>>> >> +#include "qemu/osdep.h"
>>> >> +#include "qemu-common.h"
>>> >> +#include "qemu/error-report.h"
>>> >> +#include "sysemu/hvf.h"
>>> >> +#include "sysemu/hvf_int.h"
>>> >> +#include "sysemu/runstate.h"
>>> >> +
>>> >> +#include "qemu/main-loop.h"
>>> >> +#include "sysemu/accel.h"
>>> >> +
>>> >> +#include <Hypervisor/Hypervisor.h>
>>> >> +
>>> >> +bool hvf_allowed;
>>> >> +HVFState *hvf_state;
>>> >> +
>>> >> +void assert_hvf_ok(hv_return_t ret)
>>> >> +{
>>> >> +    if (ret == HV_SUCCESS) {
>>> >> +        return;
>>> >> +    }
>>> >> +
>>> >> +    switch (ret) {
>>> >> +    case HV_ERROR:
>>> >> +        error_report("Error: HV_ERROR");
>>> >> +        break;
>>> >> +    case HV_BUSY:
>>> >> +        error_report("Error: HV_BUSY");
>>> >> +        break;
>>> >> +    case HV_BAD_ARGUMENT:
>>> >> +        error_report("Error: HV_BAD_ARGUMENT");
>>> >> +        break;
>>> >> +    case HV_NO_RESOURCES:
>>> >> +        error_report("Error: HV_NO_RESOURCES");
>>> >> +        break;
>>> >> +    case HV_NO_DEVICE:
>>> >> +        error_report("Error: HV_NO_DEVICE");
>>> >> +        break;
>>> >> +    case HV_UNSUPPORTED:
>>> >> +        error_report("Error: HV_UNSUPPORTED");
>>> >> +        break;
>>> >> +    default:
>>> >> +        error_report("Unknown Error");
>>> >> +    }
>>> >> +
>>> >> +    abort();
>>> >> +}
>>> >> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
>>> >> new file mode 100644
>>> >> index 0000000000..f9bb5502b7
>>> >> --- /dev/null
>>> >> +++ b/accel/hvf/hvf-cpus.c
>>> >> @@ -0,0 +1,468 @@
>>> >> +/*
>>> >> + * Copyright 2008 IBM Corporation
>>> >> + *           2008 Red Hat, Inc.
>>> >> + * Copyright 2011 Intel Corporation
>>> >> + * Copyright 2016 Veertu, Inc.
>>> >> + * Copyright 2017 The Android Open Source Project
>>> >> + *
>>> >> + * QEMU Hypervisor.framework support
>>> >> + *
>>> >> + * This program is free software; you can redistribute it and/or
>>> >> + * modify it under the terms of version 2 of the GNU General Public
>>> >> + * License as published by the Free Software Foundation.
>>> >> + *
>>> >> + * This program is distributed in the hope that it will be useful,
>>> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> >> + * General Public License for more details.
>>> >> + *
>>> >> + * You should have received a copy of the GNU General Public License
>>> >> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>>> >> + *
>>> >> + * This file contain code under public domain from the hvdos project:
>>> >> + * https://github.com/mist64/hvdos
>>> >> + *
>>> >> + * Parts Copyright (c) 2011 NetApp, Inc.
>>> >> + * All rights reserved.
>>> >> + *
>>> >> + * Redistribution and use in source and binary forms, with or without
>>> >> + * modification, are permitted provided that the following conditions
>>> >> + * are met:
>>> >> + * 1. Redistributions of source code must retain the above copyright
>>> >> + *    notice, this list of conditions and the following disclaimer.
>>> >> + * 2. Redistributions in binary form must reproduce the above
>>> copyright
>>> >> + *    notice, this list of conditions and the following disclaimer
>>> in the
>>> >> + *    documentation and/or other materials provided with the
>>> distribution.
>>> >> + *
>>> >> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>>> >> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
>>> THE
>>> >> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>>> PARTICULAR PURPOSE
>>> >> + * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE
>>> LIABLE
>>> >> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>>> CONSEQUENTIAL
>>> >> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
>>> GOODS
>>> >> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
>>> INTERRUPTION)
>>> >> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
>>> CONTRACT, STRICT
>>> >> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
>>> ANY WAY
>>> >> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
>>> POSSIBILITY OF
>>> >> + * SUCH DAMAGE.
>>> >> + */
>>> >> +
>>> >> +#include "qemu/osdep.h"
>>> >> +#include "qemu/error-report.h"
>>> >> +#include "qemu/main-loop.h"
>>> >> +#include "exec/address-spaces.h"
>>> >> +#include "exec/exec-all.h"
>>> >> +#include "sysemu/cpus.h"
>>> >> +#include "sysemu/hvf.h"
>>> >> +#include "sysemu/hvf_int.h"
>>> >> +#include "sysemu/runstate.h"
>>> >> +#include "qemu/guest-random.h"
>>> >> +
>>> >> +#include <Hypervisor/Hypervisor.h>
>>> >> +
>>> >> +/* Memory slots */
>>> >> +
>>> >> +struct mac_slot {
>>> >> +    int present;
>>> >> +    uint64_t size;
>>> >> +    uint64_t gpa_start;
>>> >> +    uint64_t gva;
>>> >> +};
>>> >> +
>>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>>> >> +{
>>> >> +    hvf_slot *slot;
>>> >> +    int x;
>>> >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>>> >> +        slot = &hvf_state->slots[x];
>>> >> +        if (slot->size && start < (slot->start + slot->size) &&
>>> >> +            (start + size) > slot->start) {
>>> >> +            return slot;
>>> >> +        }
>>> >> +    }
>>> >> +    return NULL;
>>> >> +}
>>> >> +
>>> >> +struct mac_slot mac_slots[32];
>>> >> +
>>> >> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
>>> >> +{
>>> >> +    struct mac_slot *macslot;
>>> >> +    hv_return_t ret;
>>> >> +
>>> >> +    macslot = &mac_slots[slot->slot_id];
>>> >> +
>>> >> +    if (macslot->present) {
>>> >> +        if (macslot->size != slot->size) {
>>> >> +            macslot->present = 0;
>>> >> +            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
>>> >> +            assert_hvf_ok(ret);
>>> >> +        }
>>> >> +    }
>>> >> +
>>> >> +    if (!slot->size) {
>>> >> +        return 0;
>>> >> +    }
>>> >> +
>>> >> +    macslot->present = 1;
>>> >> +    macslot->gpa_start = slot->start;
>>> >> +    macslot->size = slot->size;
>>> >> +    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
>>> >> +    assert_hvf_ok(ret);
>>> >> +    return 0;
>>> >> +}
>>> >> +
>>> >> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>>> >> +{
>>> >> +    hvf_slot *mem;
>>> >> +    MemoryRegion *area = section->mr;
>>> >> +    bool writeable = !area->readonly && !area->rom_device;
>>> >> +    hv_memory_flags_t flags;
>>> >> +
>>> >> +    if (!memory_region_is_ram(area)) {
>>> >> +        if (writeable) {
>>> >> +            return;
>>> >> +        } else if (!memory_region_is_romd(area)) {
>>> >> +            /*
>>> >> +             * If the memory device is not in romd_mode, then we actually want
>>> >> +             * to remove the hvf memory slot so all accesses will trap.
>>> >> +             */
>>> >> +             add = false;
>>> >> +        }
>>> >> +    }
>>> >> +
>>> >> +    mem = hvf_find_overlap_slot(
>>> >> +            section->offset_within_address_space,
>>> >> +            int128_get64(section->size));
>>> >> +
>>> >> +    if (mem && add) {
>>> >> +        if (mem->size == int128_get64(section->size) &&
>>> >> +            mem->start == section->offset_within_address_space &&
>>> >> +            mem->mem == (memory_region_get_ram_ptr(area) +
>>> >> +            section->offset_within_region)) {
>>> >> +            return; /* Same region was attempted to register, go away. */
>>> >> +        }
>>> >> +    }
>>> >> +
>>> >> +    /* Region needs to be reset. set the size to 0 and remap it. */
>>> >> +    if (mem) {
>>> >> +        mem->size = 0;
>>> >> +        if (do_hvf_set_memory(mem, 0)) {
>>> >> +            error_report("Failed to reset overlapping slot");
>>> >> +            abort();
>>> >> +        }
>>> >> +    }
>>> >> +
>>> >> +    if (!add) {
>>> >> +        return;
>>> >> +    }
>>> >> +
>>> >> +    if (area->readonly ||
>>> >> +        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
>>> >> +        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>>> >> +    } else {
>>> >> +        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
>>> >> +    }
>>> >> +
>>> >> +    /* Now make a new slot. */
>>> >> +    int x;
>>> >> +
>>> >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>>> >> +        mem = &hvf_state->slots[x];
>>> >> +        if (!mem->size) {
>>> >> +            break;
>>> >> +        }
>>> >> +    }
>>> >> +
>>> >> +    if (x == hvf_state->num_slots) {
>>> >> +        error_report("No free slots");
>>> >> +        abort();
>>> >> +    }
>>> >> +
>>> >> +    mem->size = int128_get64(section->size);
>>> >> +    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
>>> >> +    mem->start = section->offset_within_address_space;
>>> >> +    mem->region = area;
>>> >> +
>>> >> +    if (do_hvf_set_memory(mem, flags)) {
>>> >> +        error_report("Error registering new memory slot");
>>> >> +        abort();
>>> >> +    }
>>> >> +}
>>> >> +
>>> >> +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
>>> >> +{
>>> >> +    hvf_slot *slot;
>>> >> +
>>> >> +    slot = hvf_find_overlap_slot(
>>> >> +            section->offset_within_address_space,
>>> >> +            int128_get64(section->size));
>>> >> +
>>> >> +    /* protect region against writes; begin tracking it */
>>> >> +    if (on) {
>>> >> +        slot->flags |= HVF_SLOT_LOG;
>>> >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>>> >> +                      HV_MEMORY_READ);
>>> >> +    /* stop tracking region*/
>>> >> +    } else {
>>> >> +        slot->flags &= ~HVF_SLOT_LOG;
>>> >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>>> >> +                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>>> >> +    }
>>> >> +}
>>> >> +
>>> >> +static void hvf_log_start(MemoryListener *listener,
>>> >> +                          MemoryRegionSection *section, int old, int new)
>>> >> +{
>>> >> +    if (old != 0) {
>>> >> +        return;
>>> >> +    }
>>> >> +
>>> >> +    hvf_set_dirty_tracking(section, 1);
>>> >> +}
>>> >> +
>>> >> +static void hvf_log_stop(MemoryListener *listener,
>>> >> +                         MemoryRegionSection *section, int old, int new)
>>> >> +{
>>> >> +    if (new != 0) {
>>> >> +        return;
>>> >> +    }
>>> >> +
>>> >> +    hvf_set_dirty_tracking(section, 0);
>>> >> +}
>>> >> +
>>> >> +static void hvf_log_sync(MemoryListener *listener,
>>> >> +                         MemoryRegionSection *section)
>>> >> +{
>>> >> +    /*
>>> >> +     * sync of dirty pages is handled elsewhere; just make sure we keep
>>> >> +     * tracking the region.
>>> >> +     */
>>> >> +    hvf_set_dirty_tracking(section, 1);
>>> >> +}
>>> >> +
>>> >> +static void hvf_region_add(MemoryListener *listener,
>>> >> +                           MemoryRegionSection *section)
>>> >> +{
>>> >> +    hvf_set_phys_mem(section, true);
>>> >> +}
>>> >> +
>>> >> +static void hvf_region_del(MemoryListener *listener,
>>> >> +                           MemoryRegionSection *section)
>>> >> +{
>>> >> +    hvf_set_phys_mem(section, false);
>>> >> +}
>>> >> +
>>> >> +static MemoryListener hvf_memory_listener = {
>>> >> +    .priority = 10,
>>> >> +    .region_add = hvf_region_add,
>>> >> +    .region_del = hvf_region_del,
>>> >> +    .log_start = hvf_log_start,
>>> >> +    .log_stop = hvf_log_stop,
>>> >> +    .log_sync = hvf_log_sync,
>>> >> +};
>>> >> +
>>> >> +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>>> >> +{
>>> >> +    if (!cpu->vcpu_dirty) {
>>> >> +        hvf_get_registers(cpu);
>>> >> +        cpu->vcpu_dirty = true;
>>> >> +    }
>>> >> +}
>>> >> +
>>> >> +static void hvf_cpu_synchronize_state(CPUState *cpu)
>>> >> +{
>>> >> +    if (!cpu->vcpu_dirty) {
>>> >> +        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
>>> >> +    }
>>> >> +}
>>> >> +
>>> >> +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>>> >> +                                              run_on_cpu_data arg)
>>> >> +{
>>> >> +    hvf_put_registers(cpu);
>>> >> +    cpu->vcpu_dirty = false;
>>> >> +}
>>> >> +
>>> >> +static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>>> >> +{
>>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
>>> >> +}
>>> >> +
>>> >> +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>>> >> +                                             run_on_cpu_data arg)
>>> >> +{
>>> >> +    hvf_put_registers(cpu);
>>> >> +    cpu->vcpu_dirty = false;
>>> >> +}
>>> >> +
>>> >> +static void hvf_cpu_synchronize_post_init(CPUState *cpu)
>>> >> +{
>>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
>>> >> +}
>>> >> +
>>> >> +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>>> >> +                                              run_on_cpu_data arg)
>>> >> +{
>>> >> +    cpu->vcpu_dirty = true;
>>> >> +}
>>> >> +
>>> >> +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>>> >> +{
>>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
>>> >> +}
>>> >> +
>>> >> +static void hvf_vcpu_destroy(CPUState *cpu)
>>> >> +{
>>> >> +    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
>>> >> +    assert_hvf_ok(ret);
>>> >> +
>>> >> +    hvf_arch_vcpu_destroy(cpu);
>>> >> +}
>>> >> +
>>> >> +static void dummy_signal(int sig)
>>> >> +{
>>> >> +}
>>> >> +
>>> >> +static int hvf_init_vcpu(CPUState *cpu)
>>> >> +{
>>> >> +    int r;
>>> >> +
>>> >> +    /* init cpu signals */
>>> >> +    sigset_t set;
>>> >> +    struct sigaction sigact;
>>> >> +
>>> >> +    memset(&sigact, 0, sizeof(sigact));
>>> >> +    sigact.sa_handler = dummy_signal;
>>> >> +    sigaction(SIG_IPI, &sigact, NULL);
>>> >> +
>>> >> +    pthread_sigmask(SIG_BLOCK, NULL, &set);
>>> >> +    sigdelset(&set, SIG_IPI);
>>> >> +
>>> >> +#ifdef __aarch64__
>>> >> +    r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL);
>>> >> +#else
>>> >> +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>>> >> +#endif
>>> > I think the first __aarch64__ bit fits better in the ARM part of the series.
>>>
>>>
>>> Oops. Thanks for catching it! Yes, absolutely. It should be part of the
>>> ARM enablement.
>>>
>>>
>>> >
>>> >> +    cpu->vcpu_dirty = 1;
>>> >> +    assert_hvf_ok(r);
>>> >> +
>>> >> +    return hvf_arch_init_vcpu(cpu);
>>> >> +}
>>> >> +
>>> >> +/*
>>> >> + * The HVF-specific vCPU thread function. This one should only run when the host
>>> >> + * CPU supports the VMX "unrestricted guest" feature.
>>> >> + */
>>> >> +static void *hvf_cpu_thread_fn(void *arg)
>>> >> +{
>>> >> +    CPUState *cpu = arg;
>>> >> +
>>> >> +    int r;
>>> >> +
>>> >> +    assert(hvf_enabled());
>>> >> +
>>> >> +    rcu_register_thread();
>>> >> +
>>> >> +    qemu_mutex_lock_iothread();
>>> >> +    qemu_thread_get_self(cpu->thread);
>>> >> +
>>> >> +    cpu->thread_id = qemu_get_thread_id();
>>> >> +    cpu->can_do_io = 1;
>>> >> +    current_cpu = cpu;
>>> >> +
>>> >> +    hvf_init_vcpu(cpu);
>>> >> +
>>> >> +    /* signal CPU creation */
>>> >> +    cpu_thread_signal_created(cpu);
>>> >> +    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>> >> +
>>> >> +    do {
>>> >> +        if (cpu_can_run(cpu)) {
>>> >> +            r = hvf_vcpu_exec(cpu);
>>> >> +            if (r == EXCP_DEBUG) {
>>> >> +                cpu_handle_guest_debug(cpu);
>>> >> +            }
>>> >> +        }
>>> >> +        qemu_wait_io_event(cpu);
>>> >> +    } while (!cpu->unplug || cpu_can_run(cpu));
>>> >> +
>>> >> +    hvf_vcpu_destroy(cpu);
>>> >> +    cpu_thread_signal_destroyed(cpu);
>>> >> +    qemu_mutex_unlock_iothread();
>>> >> +    rcu_unregister_thread();
>>> >> +    return NULL;
>>> >> +}
>>> >> +
>>> >> +static void hvf_start_vcpu_thread(CPUState *cpu)
>>> >> +{
>>> >> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>>> >> +
>>> >> +    /*
>>> >> +     * HVF currently does not support TCG, and only runs in
>>> >> +     * unrestricted-guest mode.
>>> >> +     */
>>> >> +    assert(hvf_enabled());
>>> >> +
>>> >> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>>> >> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>> >> +    qemu_cond_init(cpu->halt_cond);
>>> >> +
>>> >> +    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>>> >> +             cpu->cpu_index);
>>> >> +    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
>>> >> +                       cpu, QEMU_THREAD_JOINABLE);
>>> >> +}
>>> >> +
>>> >> +static const CpusAccel hvf_cpus = {
>>> >> +    .create_vcpu_thread = hvf_start_vcpu_thread,
>>> >> +
>>> >> +    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>>> >> +    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>>> >> +    .synchronize_state = hvf_cpu_synchronize_state,
>>> >> +    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>>> >> +};
>>> >> +
>>> >> +static int hvf_accel_init(MachineState *ms)
>>> >> +{
>>> >> +    int x;
>>> >> +    hv_return_t ret;
>>> >> +    HVFState *s;
>>> >> +
>>> >> +    ret = hv_vm_create(HV_VM_DEFAULT);
>>> >> +    assert_hvf_ok(ret);
>>> >> +
>>> >> +    s = g_new0(HVFState, 1);
>>> >> +
>>> >> +    s->num_slots = 32;
>>> >> +    for (x = 0; x < s->num_slots; ++x) {
>>> >> +        s->slots[x].size = 0;
>>> >> +        s->slots[x].slot_id = x;
>>> >> +    }
>>> >> +
>>> >> +    hvf_state = s;
>>> >> +    memory_listener_register(&hvf_memory_listener, &address_space_memory);
>>> >> +    cpus_register_accel(&hvf_cpus);
>>> >> +    return 0;
>>> >> +}
>>> >> +
>>> >> +static void hvf_accel_class_init(ObjectClass *oc, void *data)
>>> >> +{
>>> >> +    AccelClass *ac = ACCEL_CLASS(oc);
>>> >> +    ac->name = "HVF";
>>> >> +    ac->init_machine = hvf_accel_init;
>>> >> +    ac->allowed = &hvf_allowed;
>>> >> +}
>>> >> +
>>> >> +static const TypeInfo hvf_accel_type = {
>>> >> +    .name = TYPE_HVF_ACCEL,
>>> >> +    .parent = TYPE_ACCEL,
>>> >> +    .class_init = hvf_accel_class_init,
>>> >> +};
>>> >> +
>>> >> +static void hvf_type_init(void)
>>> >> +{
>>> >> +    type_register_static(&hvf_accel_type);
>>> >> +}
>>> >> +
>>> >> +type_init(hvf_type_init);
>>> >> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
>>> >> new file mode 100644
>>> >> index 0000000000..dfd6b68dc7
>>> >> --- /dev/null
>>> >> +++ b/accel/hvf/meson.build
>>> >> @@ -0,0 +1,7 @@
>>> >> +hvf_ss = ss.source_set()
>>> >> +hvf_ss.add(files(
>>> >> +  'hvf-all.c',
>>> >> +  'hvf-cpus.c',
>>> >> +))
>>> >> +
>>> >> +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
>>> >> diff --git a/accel/meson.build b/accel/meson.build
>>> >> index b26cca227a..6de12ce5d5 100644
>>> >> --- a/accel/meson.build
>>> >> +++ b/accel/meson.build
>>> >> @@ -1,5 +1,6 @@
>>> >>   softmmu_ss.add(files('accel.c'))
>>> >>
>>> >> +subdir('hvf')
>>> >>   subdir('qtest')
>>> >>   subdir('kvm')
>>> >>   subdir('tcg')
>>> >> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
>>> >> new file mode 100644
>>> >> index 0000000000..de9bad23a8
>>> >> --- /dev/null
>>> >> +++ b/include/sysemu/hvf_int.h
>>> >> @@ -0,0 +1,69 @@
>>> >> +/*
>>> >> + * QEMU Hypervisor.framework (HVF) support
>>> >> + *
>>> >> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>> >> + * See the COPYING file in the top-level directory.
>>> >> + *
>>> >> + */
>>> >> +
>>> >> +/* header to be included in HVF-specific code */
>>> >> +
>>> >> +#ifndef HVF_INT_H
>>> >> +#define HVF_INT_H
>>> >> +
>>> >> +#include <Hypervisor/Hypervisor.h>
>>> >> +
>>> >> +#define HVF_MAX_VCPU 0x10
>>> >> +
>>> >> +extern struct hvf_state hvf_global;
>>> >> +
>>> >> +struct hvf_vm {
>>> >> +    int id;
>>> >> +    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>>> >> +};
>>> >> +
>>> >> +struct hvf_state {
>>> >> +    uint32_t version;
>>> >> +    struct hvf_vm *vm;
>>> >> +    uint64_t mem_quota;
>>> >> +};
>>> >> +
>>> >> +/* hvf_slot flags */
>>> >> +#define HVF_SLOT_LOG (1 << 0)
>>> >> +
>>> >> +typedef struct hvf_slot {
>>> >> +    uint64_t start;
>>> >> +    uint64_t size;
>>> >> +    uint8_t *mem;
>>> >> +    int slot_id;
>>> >> +    uint32_t flags;
>>> >> +    MemoryRegion *region;
>>> >> +} hvf_slot;
>>> >> +
>>> >> +typedef struct hvf_vcpu_caps {
>>> >> +    uint64_t vmx_cap_pinbased;
>>> >> +    uint64_t vmx_cap_procbased;
>>> >> +    uint64_t vmx_cap_procbased2;
>>> >> +    uint64_t vmx_cap_entry;
>>> >> +    uint64_t vmx_cap_exit;
>>> >> +    uint64_t vmx_cap_preemption_timer;
>>> >> +} hvf_vcpu_caps;
>>> >> +
>>> >> +struct HVFState {
>>> >> +    AccelState parent;
>>> >> +    hvf_slot slots[32];
>>> >> +    int num_slots;
>>> >> +
>>> >> +    hvf_vcpu_caps *hvf_caps;
>>> >> +};
>>> >> +extern HVFState *hvf_state;
>>> >> +
>>> >> +void assert_hvf_ok(hv_return_t ret);
>>> >> +int hvf_get_registers(CPUState *cpu);
>>> >> +int hvf_put_registers(CPUState *cpu);
>>> >> +int hvf_arch_init_vcpu(CPUState *cpu);
>>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu);
>>> >> +int hvf_vcpu_exec(CPUState *cpu);
>>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>>> >> +
>>> >> +#endif
>>> >> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
>>> >> deleted file mode 100644
>>> >> index 817b3d7452..0000000000
>>> >> --- a/target/i386/hvf/hvf-cpus.c
>>> >> +++ /dev/null
>>> >> @@ -1,131 +0,0 @@
>>> >> -/*
>>> >> - * Copyright 2008 IBM Corporation
>>> >> - *           2008 Red Hat, Inc.
>>> >> - * Copyright 2011 Intel Corporation
>>> >> - * Copyright 2016 Veertu, Inc.
>>> >> - * Copyright 2017 The Android Open Source Project
>>> >> - *
>>> >> - * QEMU Hypervisor.framework support
>>> >> - *
>>> >> - * This program is free software; you can redistribute it and/or
>>> >> - * modify it under the terms of version 2 of the GNU General Public
>>> >> - * License as published by the Free Software Foundation.
>>> >> - *
>>> >> - * This program is distributed in the hope that it will be useful,
>>> >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> >> - * General Public License for more details.
>>> >> - *
>>> >> - * You should have received a copy of the GNU General Public License
>>> >> - * along with this program; if not, see <http://www.gnu.org/licenses/>.
>>> >> - *
>>> >> - * This file contain code under public domain from the hvdos project:
>>> >> - * https://github.com/mist64/hvdos
>>> >> - *
>>> >> - * Parts Copyright (c) 2011 NetApp, Inc.
>>> >> - * All rights reserved.
>>> >> - *
>>> >> - * Redistribution and use in source and binary forms, with or without
>>> >> - * modification, are permitted provided that the following conditions
>>> >> - * are met:
>>> >> - * 1. Redistributions of source code must retain the above copyright
>>> >> - *    notice, this list of conditions and the following disclaimer.
>>> >> - * 2. Redistributions in binary form must reproduce the above copyright
>>> >> - *    notice, this list of conditions and the following disclaimer in the
>>> >> - *    documentation and/or other materials provided with the distribution.
>>> >> - *
>>> >> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>>> >> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>>> >> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>>> >> - * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
>>> >> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>>> >> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>>> >> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>>> >> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>>> >> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>>> >> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>>> >> - * SUCH DAMAGE.
>>> >> - */
>>> >> -
>>> >> -#include "qemu/osdep.h"
>>> >> -#include "qemu/error-report.h"
>>> >> -#include "qemu/main-loop.h"
>>> >> -#include "sysemu/hvf.h"
>>> >> -#include "sysemu/runstate.h"
>>> >> -#include "target/i386/cpu.h"
>>> >> -#include "qemu/guest-random.h"
>>> >> -
>>> >> -#include "hvf-cpus.h"
>>> >> -
>>> >> -/*
>>> >> - * The HVF-specific vCPU thread function. This one should only run when the host
>>> >> - * CPU supports the VMX "unrestricted guest" feature.
>>> >> - */
>>> >> -static void *hvf_cpu_thread_fn(void *arg)
>>> >> -{
>>> >> -    CPUState *cpu = arg;
>>> >> -
>>> >> -    int r;
>>> >> -
>>> >> -    assert(hvf_enabled());
>>> >> -
>>> >> -    rcu_register_thread();
>>> >> -
>>> >> -    qemu_mutex_lock_iothread();
>>> >> -    qemu_thread_get_self(cpu->thread);
>>> >> -
>>> >> -    cpu->thread_id = qemu_get_thread_id();
>>> >> -    cpu->can_do_io = 1;
>>> >> -    current_cpu = cpu;
>>> >> -
>>> >> -    hvf_init_vcpu(cpu);
>>> >> -
>>> >> -    /* signal CPU creation */
>>> >> -    cpu_thread_signal_created(cpu);
>>> >> -    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>> >> -
>>> >> -    do {
>>> >> -        if (cpu_can_run(cpu)) {
>>> >> -            r = hvf_vcpu_exec(cpu);
>>> >> -            if (r == EXCP_DEBUG) {
>>> >> -                cpu_handle_guest_debug(cpu);
>>> >> -            }
>>> >> -        }
>>> >> -        qemu_wait_io_event(cpu);
>>> >> -    } while (!cpu->unplug || cpu_can_run(cpu));
>>> >> -
>>> >> -    hvf_vcpu_destroy(cpu);
>>> >> -    cpu_thread_signal_destroyed(cpu);
>>> >> -    qemu_mutex_unlock_iothread();
>>> >> -    rcu_unregister_thread();
>>> >> -    return NULL;
>>> >> -}
>>> >> -
>>> >> -static void hvf_start_vcpu_thread(CPUState *cpu)
>>> >> -{
>>> >> -    char thread_name[VCPU_THREAD_NAME_SIZE];
>>> >> -
>>> >> -    /*
>>> >> -     * HVF currently does not support TCG, and only runs in
>>> >> -     * unrestricted-guest mode.
>>> >> -     */
>>> >> -    assert(hvf_enabled());
>>> >> -
>>> >> -    cpu->thread = g_malloc0(sizeof(QemuThread));
>>> >> -    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>> >> -    qemu_cond_init(cpu->halt_cond);
>>> >> -
>>> >> -    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>>> >> -             cpu->cpu_index);
>>> >> -    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
>>> >> -                       cpu, QEMU_THREAD_JOINABLE);
>>> >> -}
>>> >> -
>>> >> -const CpusAccel hvf_cpus = {
>>> >> -    .create_vcpu_thread = hvf_start_vcpu_thread,
>>> >> -
>>> >> -    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>>> >> -    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>>> >> -    .synchronize_state = hvf_cpu_synchronize_state,
>>> >> -    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>>> >> -};
>>> >> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h
>>> >> deleted file mode 100644
>>> >> index ced31b82c0..0000000000
>>> >> --- a/target/i386/hvf/hvf-cpus.h
>>> >> +++ /dev/null
>>> >> @@ -1,25 +0,0 @@
>>> >> -/*
>>> >> - * Accelerator CPUS Interface
>>> >> - *
>>> >> - * Copyright 2020 SUSE LLC
>>> >> - *
>>> >> - * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>> >> - * See the COPYING file in the top-level directory.
>>> >> - */
>>> >> -
>>> >> -#ifndef HVF_CPUS_H
>>> >> -#define HVF_CPUS_H
>>> >> -
>>> >> -#include "sysemu/cpus.h"
>>> >> -
>>> >> -extern const CpusAccel hvf_cpus;
>>> >> -
>>> >> -int hvf_init_vcpu(CPUState *);
>>> >> -int hvf_vcpu_exec(CPUState *);
>>> >> -void hvf_cpu_synchronize_state(CPUState *);
>>> >> -void hvf_cpu_synchronize_post_reset(CPUState *);
>>> >> -void hvf_cpu_synchronize_post_init(CPUState *);
>>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *);
>>> >> -void hvf_vcpu_destroy(CPUState *);
>>> >> -
>>> >> -#endif /* HVF_CPUS_H */
>>> >> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
>>> >> index e0edffd077..6d56f8f6bb 100644
>>> >> --- a/target/i386/hvf/hvf-i386.h
>>> >> +++ b/target/i386/hvf/hvf-i386.h
>>> >> @@ -18,57 +18,11 @@
>>> >>
>>> >>   #include "sysemu/accel.h"
>>> >>   #include "sysemu/hvf.h"
>>> >> +#include "sysemu/hvf_int.h"
>>> >>   #include "cpu.h"
>>> >>   #include "x86.h"
>>> >>
>>> >> -#define HVF_MAX_VCPU 0x10
>>> >> -
>>> >> -extern struct hvf_state hvf_global;
>>> >> -
>>> >> -struct hvf_vm {
>>> >> -    int id;
>>> >> -    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>>> >> -};
>>> >> -
>>> >> -struct hvf_state {
>>> >> -    uint32_t version;
>>> >> -    struct hvf_vm *vm;
>>> >> -    uint64_t mem_quota;
>>> >> -};
>>> >> -
>>> >> -/* hvf_slot flags */
>>> >> -#define HVF_SLOT_LOG (1 << 0)
>>> >> -
>>> >> -typedef struct hvf_slot {
>>> >> -    uint64_t start;
>>> >> -    uint64_t size;
>>> >> -    uint8_t *mem;
>>> >> -    int slot_id;
>>> >> -    uint32_t flags;
>>> >> -    MemoryRegion *region;
>>> >> -} hvf_slot;
>>> >> -
>>> >> -typedef struct hvf_vcpu_caps {
>>> >> -    uint64_t vmx_cap_pinbased;
>>> >> -    uint64_t vmx_cap_procbased;
>>> >> -    uint64_t vmx_cap_procbased2;
>>> >> -    uint64_t vmx_cap_entry;
>>> >> -    uint64_t vmx_cap_exit;
>>> >> -    uint64_t vmx_cap_preemption_timer;
>>> >> -} hvf_vcpu_caps;
>>> >> -
>>> >> -struct HVFState {
>>> >> -    AccelState parent;
>>> >> -    hvf_slot slots[32];
>>> >> -    int num_slots;
>>> >> -
>>> >> -    hvf_vcpu_caps *hvf_caps;
>>> >> -};
>>> >> -extern HVFState *hvf_state;
>>> >> -
>>> >> -void hvf_set_phys_mem(MemoryRegionSection *, bool);
>>> >>   void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
>>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>>> >>
>>> >>   #ifdef NEED_CPU_H
>>> >>   /* Functions exported to host specific mode */
>>> >> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
>>> >> index ed9356565c..8b96ecd619 100644
>>> >> --- a/target/i386/hvf/hvf.c
>>> >> +++ b/target/i386/hvf/hvf.c
>>> >> @@ -51,6 +51,7 @@
>>> >>   #include "qemu/error-report.h"
>>> >>
>>> >>   #include "sysemu/hvf.h"
>>> >> +#include "sysemu/hvf_int.h"
>>> >>   #include "sysemu/runstate.h"
>>> >>   #include "hvf-i386.h"
>>> >>   #include "vmcs.h"
>>> >> @@ -72,171 +73,6 @@
>>> >>   #include "sysemu/accel.h"
>>> >>   #include "target/i386/cpu.h"
>>> >>
>>> >> -#include "hvf-cpus.h"
>>> >> -
>>> >> -HVFState *hvf_state;
>>> >> -
>>> >> -static void assert_hvf_ok(hv_return_t ret)
>>> >> -{
>>> >> -    if (ret == HV_SUCCESS) {
>>> >> -        return;
>>> >> -    }
>>> >> -
>>> >> -    switch (ret) {
>>> >> -    case HV_ERROR:
>>> >> -        error_report("Error: HV_ERROR");
>>> >> -        break;
>>> >> -    case HV_BUSY:
>>> >> -        error_report("Error: HV_BUSY");
>>> >> -        break;
>>> >> -    case HV_BAD_ARGUMENT:
>>> >> -        error_report("Error: HV_BAD_ARGUMENT");
>>> >> -        break;
>>> >> -    case HV_NO_RESOURCES:
>>> >> -        error_report("Error: HV_NO_RESOURCES");
>>> >> -        break;
>>> >> -    case HV_NO_DEVICE:
>>> >> -        error_report("Error: HV_NO_DEVICE");
>>> >> -        break;
>>> >> -    case HV_UNSUPPORTED:
>>> >> -        error_report("Error: HV_UNSUPPORTED");
>>> >> -        break;
>>> >> -    default:
>>> >> -        error_report("Unknown Error");
>>> >> -    }
>>> >> -
>>> >> -    abort();
>>> >> -}
>>> >> -
>>> >> -/* Memory slots */
>>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>>> >> -{
>>> >> -    hvf_slot *slot;
>>> >> -    int x;
>>> >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>>> >> -        slot = &hvf_state->slots[x];
>>> >> -        if (slot->size && start < (slot->start + slot->size) &&
>>> >> -            (start + size) > slot->start) {
>>> >> -            return slot;
>>> >> -        }
>>> >> -    }
>>> >> -    return NULL;
>>> >> -}
>>> >> -
>>> >> -struct mac_slot {
>>> >> -    int present;
>>> >> -    uint64_t size;
>>> >> -    uint64_t gpa_start;
>>> >> -    uint64_t gva;
>>> >> -};
>>> >> -
>>> >> -struct mac_slot mac_slots[32];
>>> >> -
>>> >> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
>>> >> -{
>>> >> -    struct mac_slot *macslot;
>>> >> -    hv_return_t ret;
>>> >> -
>>> >> -    macslot = &mac_slots[slot->slot_id];
>>> >> -
>>> >> -    if (macslot->present) {
>>> >> -        if (macslot->size != slot->size) {
>>> >> -            macslot->present = 0;
>>> >> -            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
>>> >> -            assert_hvf_ok(ret);
>>> >> -        }
>>> >> -    }
>>> >> -
>>> >> -    if (!slot->size) {
>>> >> -        return 0;
>>> >> -    }
>>> >> -
>>> >> -    macslot->present = 1;
>>> >> -    macslot->gpa_start = slot->start;
>>> >> -    macslot->size = slot->size;
>>> >> -    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size,
>>> flags);
>>> >> -    assert_hvf_ok(ret);
>>> >> -    return 0;
>>> >> -}
>>> >> -
>>> >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>>> >> -{
>>> >> -    hvf_slot *mem;
>>> >> -    MemoryRegion *area = section->mr;
>>> >> -    bool writeable = !area->readonly && !area->rom_device;
>>> >> -    hv_memory_flags_t flags;
>>> >> -
>>> >> -    if (!memory_region_is_ram(area)) {
>>> >> -        if (writeable) {
>>> >> -            return;
>>> >> -        } else if (!memory_region_is_romd(area)) {
>>> >> -            /*
>>> >> -             * If the memory device is not in romd_mode, then we
>>> actually want
>>> >> -             * to remove the hvf memory slot so all accesses will
>>> trap.
>>> >> -             */
>>> >> -             add = false;
>>> >> -        }
>>> >> -    }
>>> >> -
>>> >> -    mem = hvf_find_overlap_slot(
>>> >> -            section->offset_within_address_space,
>>> >> -            int128_get64(section->size));
>>> >> -
>>> >> -    if (mem && add) {
>>> >> -        if (mem->size == int128_get64(section->size) &&
>>> >> -            mem->start == section->offset_within_address_space &&
>>> >> -            mem->mem == (memory_region_get_ram_ptr(area) +
>>> >> -            section->offset_within_region)) {
>>> >> -            return; /* Same region was attempted to register, go
>>> away. */
>>> >> -        }
>>> >> -    }
>>> >> -
>>> >> -    /* Region needs to be reset. set the size to 0 and remap it. */
>>> >> -    if (mem) {
>>> >> -        mem->size = 0;
>>> >> -        if (do_hvf_set_memory(mem, 0)) {
>>> >> -            error_report("Failed to reset overlapping slot");
>>> >> -            abort();
>>> >> -        }
>>> >> -    }
>>> >> -
>>> >> -    if (!add) {
>>> >> -        return;
>>> >> -    }
>>> >> -
>>> >> -    if (area->readonly ||
>>> >> -        (!memory_region_is_ram(area) &&
>>> memory_region_is_romd(area))) {
>>> >> -        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>>> >> -    } else {
>>> >> -        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
>>> >> -    }
>>> >> -
>>> >> -    /* Now make a new slot. */
>>> >> -    int x;
>>> >> -
>>> >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>>> >> -        mem = &hvf_state->slots[x];
>>> >> -        if (!mem->size) {
>>> >> -            break;
>>> >> -        }
>>> >> -    }
>>> >> -
>>> >> -    if (x == hvf_state->num_slots) {
>>> >> -        error_report("No free slots");
>>> >> -        abort();
>>> >> -    }
>>> >> -
>>> >> -    mem->size = int128_get64(section->size);
>>> >> -    mem->mem = memory_region_get_ram_ptr(area) +
>>> section->offset_within_region;
>>> >> -    mem->start = section->offset_within_address_space;
>>> >> -    mem->region = area;
>>> >> -
>>> >> -    if (do_hvf_set_memory(mem, flags)) {
>>> >> -        error_report("Error registering new memory slot");
>>> >> -        abort();
>>> >> -    }
>>> >> -}
>>> >> -
>>> >>   void vmx_update_tpr(CPUState *cpu)
>>> >>   {
>>> >>       /* TODO: need integrate APIC handling */
>>> >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t
>>> port, void *buffer,
>>> >>       }
>>> >>   }
>>> >>
>>> >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu,
>>> run_on_cpu_data arg)
>>> >> -{
>>> >> -    if (!cpu->vcpu_dirty) {
>>> >> -        hvf_get_registers(cpu);
>>> >> -        cpu->vcpu_dirty = true;
>>> >> -    }
>>> >> -}
>>> >> -
>>> >> -void hvf_cpu_synchronize_state(CPUState *cpu)
>>> >> -{
>>> >> -    if (!cpu->vcpu_dirty) {
>>> >> -        run_on_cpu(cpu, do_hvf_cpu_synchronize_state,
>>> RUN_ON_CPU_NULL);
>>> >> -    }
>>> >> -}
>>> >> -
>>> >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>>> >> -                                              run_on_cpu_data arg)
>>> >> -{
>>> >> -    hvf_put_registers(cpu);
>>> >> -    cpu->vcpu_dirty = false;
>>> >> -}
>>> >> -
>>> >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>>> >> -{
>>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset,
>>> RUN_ON_CPU_NULL);
>>> >> -}
>>> >> -
>>> >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>>> >> -                                             run_on_cpu_data arg)
>>> >> -{
>>> >> -    hvf_put_registers(cpu);
>>> >> -    cpu->vcpu_dirty = false;
>>> >> -}
>>> >> -
>>> >> -void hvf_cpu_synchronize_post_init(CPUState *cpu)
>>> >> -{
>>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init,
>>> RUN_ON_CPU_NULL);
>>> >> -}
>>> >> -
>>> >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>>> >> -                                              run_on_cpu_data arg)
>>> >> -{
>>> >> -    cpu->vcpu_dirty = true;
>>> >> -}
>>> >> -
>>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>>> >> -{
>>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm,
>>> RUN_ON_CPU_NULL);
>>> >> -}
>>> >> -
>>> >>   static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa,
>>> uint64_t ept_qual)
>>> >>   {
>>> >>       int read, write;
>>> >> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot
>>> *slot, uint64_t gpa, uint64_t ept_qual)
>>> >>       return false;
>>> >>   }
>>> >>
>>> >> -static void hvf_set_dirty_tracking(MemoryRegionSection *section,
>>> bool on)
>>> >> -{
>>> >> -    hvf_slot *slot;
>>> >> -
>>> >> -    slot = hvf_find_overlap_slot(
>>> >> -            section->offset_within_address_space,
>>> >> -            int128_get64(section->size));
>>> >> -
>>> >> -    /* protect region against writes; begin tracking it */
>>> >> -    if (on) {
>>> >> -        slot->flags |= HVF_SLOT_LOG;
>>> >> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>>> >> -                      HV_MEMORY_READ);
>>> >> -    /* stop tracking region*/
>>> >> -    } else {
>>> >> -        slot->flags &= ~HVF_SLOT_LOG;
>>> >> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>>> >> -                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>>> >> -    }
>>> >> -}
>>> >> -
>>> >> -static void hvf_log_start(MemoryListener *listener,
>>> >> -                          MemoryRegionSection *section, int old, int
>>> new)
>>> >> -{
>>> >> -    if (old != 0) {
>>> >> -        return;
>>> >> -    }
>>> >> -
>>> >> -    hvf_set_dirty_tracking(section, 1);
>>> >> -}
>>> >> -
>>> >> -static void hvf_log_stop(MemoryListener *listener,
>>> >> -                         MemoryRegionSection *section, int old, int
>>> new)
>>> >> -{
>>> >> -    if (new != 0) {
>>> >> -        return;
>>> >> -    }
>>> >> -
>>> >> -    hvf_set_dirty_tracking(section, 0);
>>> >> -}
>>> >> -
>>> >> -static void hvf_log_sync(MemoryListener *listener,
>>> >> -                         MemoryRegionSection *section)
>>> >> -{
>>> >> -    /*
>>> >> -     * sync of dirty pages is handled elsewhere; just make sure we
>>> keep
>>> >> -     * tracking the region.
>>> >> -     */
>>> >> -    hvf_set_dirty_tracking(section, 1);
>>> >> -}
>>> >> -
>>> >> -static void hvf_region_add(MemoryListener *listener,
>>> >> -                           MemoryRegionSection *section)
>>> >> -{
>>> >> -    hvf_set_phys_mem(section, true);
>>> >> -}
>>> >> -
>>> >> -static void hvf_region_del(MemoryListener *listener,
>>> >> -                           MemoryRegionSection *section)
>>> >> -{
>>> >> -    hvf_set_phys_mem(section, false);
>>> >> -}
>>> >> -
>>> >> -static MemoryListener hvf_memory_listener = {
>>> >> -    .priority = 10,
>>> >> -    .region_add = hvf_region_add,
>>> >> -    .region_del = hvf_region_del,
>>> >> -    .log_start = hvf_log_start,
>>> >> -    .log_stop = hvf_log_stop,
>>> >> -    .log_sync = hvf_log_sync,
>>> >> -};
>>> >> -
>>> >> -void hvf_vcpu_destroy(CPUState *cpu)
>>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu)
>>> >>   {
>>> >>       X86CPU *x86_cpu = X86_CPU(cpu);
>>> >>       CPUX86State *env = &x86_cpu->env;
>>> >>
>>> >> -    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
>>> >>       g_free(env->hvf_mmio_buf);
>>> >> -    assert_hvf_ok(ret);
>>> >> -}
>>> >> -
>>> >> -static void dummy_signal(int sig)
>>> >> -{
>>> >>   }
>>> >>
>>> >> -int hvf_init_vcpu(CPUState *cpu)
>>> >> +int hvf_arch_init_vcpu(CPUState *cpu)
>>> >>   {
>>> >>
>>> >>       X86CPU *x86cpu = X86_CPU(cpu);
>>> >>       CPUX86State *env = &x86cpu->env;
>>> >> -    int r;
>>> >> -
>>> >> -    /* init cpu signals */
>>> >> -    sigset_t set;
>>> >> -    struct sigaction sigact;
>>> >> -
>>> >> -    memset(&sigact, 0, sizeof(sigact));
>>> >> -    sigact.sa_handler = dummy_signal;
>>> >> -    sigaction(SIG_IPI, &sigact, NULL);
>>> >> -
>>> >> -    pthread_sigmask(SIG_BLOCK, NULL, &set);
>>> >> -    sigdelset(&set, SIG_IPI);
>>> >>
>>> >>       init_emu();
>>> >>       init_decoder();
>>> >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu)
>>> >>       hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1);
>>> >>       env->hvf_mmio_buf = g_new(char, 4096);
>>> >>
>>> >> -    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>>> >> -    cpu->vcpu_dirty = 1;
>>> >> -    assert_hvf_ok(r);
>>> >> -
>>> >>       if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
>>> >>           &hvf_state->hvf_caps->vmx_cap_pinbased)) {
>>> >>           abort();
>>> >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu)
>>> >>
>>> >>       return ret;
>>> >>   }
>>> >> -
>>> >> -bool hvf_allowed;
>>> >> -
>>> >> -static int hvf_accel_init(MachineState *ms)
>>> >> -{
>>> >> -    int x;
>>> >> -    hv_return_t ret;
>>> >> -    HVFState *s;
>>> >> -
>>> >> -    ret = hv_vm_create(HV_VM_DEFAULT);
>>> >> -    assert_hvf_ok(ret);
>>> >> -
>>> >> -    s = g_new0(HVFState, 1);
>>> >> -
>>> >> -    s->num_slots = 32;
>>> >> -    for (x = 0; x < s->num_slots; ++x) {
>>> >> -        s->slots[x].size = 0;
>>> >> -        s->slots[x].slot_id = x;
>>> >> -    }
>>> >> -
>>> >> -    hvf_state = s;
>>> >> -    memory_listener_register(&hvf_memory_listener,
>>> &address_space_memory);
>>> >> -    cpus_register_accel(&hvf_cpus);
>>> >> -    return 0;
>>> >> -}
>>> >> -
>>> >> -static void hvf_accel_class_init(ObjectClass *oc, void *data)
>>> >> -{
>>> >> -    AccelClass *ac = ACCEL_CLASS(oc);
>>> >> -    ac->name = "HVF";
>>> >> -    ac->init_machine = hvf_accel_init;
>>> >> -    ac->allowed = &hvf_allowed;
>>> >> -}
>>> >> -
>>> >> -static const TypeInfo hvf_accel_type = {
>>> >> -    .name = TYPE_HVF_ACCEL,
>>> >> -    .parent = TYPE_ACCEL,
>>> >> -    .class_init = hvf_accel_class_init,
>>> >> -};
>>> >> -
>>> >> -static void hvf_type_init(void)
>>> >> -{
>>> >> -    type_register_static(&hvf_accel_type);
>>> >> -}
>>> >> -
>>> >> -type_init(hvf_type_init);
>>> >> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
>>> >> index 409c9a3f14..c8a43717ee 100644
>>> >> --- a/target/i386/hvf/meson.build
>>> >> +++ b/target/i386/hvf/meson.build
>>> >> @@ -1,6 +1,5 @@
>>> >>   i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
>>> >>     'hvf.c',
>>> >> -  'hvf-cpus.c',
>>> >>     'x86.c',
>>> >>     'x86_cpuid.c',
>>> >>     'x86_decode.c',
>>> >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
>>> >> index bbec412b6c..89b8e9d87a 100644
>>> >> --- a/target/i386/hvf/x86hvf.c
>>> >> +++ b/target/i386/hvf/x86hvf.c
>>> >> @@ -20,6 +20,9 @@
>>> >>   #include "qemu/osdep.h"
>>> >>
>>> >>   #include "qemu-common.h"
>>> >> +#include "sysemu/hvf.h"
>>> >> +#include "sysemu/hvf_int.h"
>>> >> +#include "sysemu/hw_accel.h"
>>> >>   #include "x86hvf.h"
>>> >>   #include "vmx.h"
>>> >>   #include "vmcs.h"
>>> >> @@ -32,8 +35,6 @@
>>> >>   #include <Hypervisor/hv.h>
>>> >>   #include <Hypervisor/hv_vmx.h>
>>> >>
>>> >> -#include "hvf-cpus.h"
>>> >> -
>>> >>   void hvf_set_segment(struct CPUState *cpu, struct vmx_segment
>>> *vmx_seg,
>>> >>                        SegmentCache *qseg, bool is_tr)
>>> >>   {
>>> >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state)
>>> >>       env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
>>> >>
>>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>> >> +        cpu_synchronize_state(cpu_state);
>>> >>           do_cpu_init(cpu);
>>> >>       }
>>> >>
>>> >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state)
>>> >>           cpu_state->halted = 0;
>>> >>       }
>>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>> >> +        cpu_synchronize_state(cpu_state);
>>> >>           do_cpu_sipi(cpu);
>>> >>       }
>>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
>>> >>           cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>> >> +        cpu_synchronize_state(cpu_state);
>>> > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should
>>> > be a separate patch. It follows the cpu/accel cleanups Claudio was
>>> > doing over the summer.
>>>
>>>
>>> The only reason they're in here is because we no longer have access to
>>> the hvf_ functions from the file. I am perfectly happy to rebase the
>>> patch on top of Claudio's if his goes in first. I'm sure it'll be
>>> trivial for him to rebase on top of this too if my series goes in first.
>>>
>>>
>>> >
>>> > Philippe raised the idea that the patch might go ahead of the
>>> > ARM-specific part (which might involve some discussions) and I agree
>>> > with that.
>>> >
>>> > Some sync between Claudio's series (CC'd him) and the patch might be
>>> > needed.
>>>
>>>
>>> I would prefer not to hold back because of the sync. Claudio's cleanup
>>> is trivial enough to adjust for if it gets merged ahead of this.
>>>
>>>
>>> Alex
>>>
>>>
>>>
>>>
Peter Collingbourne Nov. 30, 2020, 9:08 p.m. UTC | #7
On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
>
>
>
> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
>>
>> Hi Frank,
>>
>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse.
>
> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and comparing it with cval is that if we sleep, cval is going to be much less than cntpct_el0, by the sleep time. If we can get either the architecture or macOS to read out the sleep time, then we might not have to use a poll interval either!
>>
>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out
>>
>>   https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
>>
>
> Thanks, we'll take a look :)
>
>>
>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold.

Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
I'll reply to your patch here. You have:

+                    /* Set cpu->hvf->sleeping so that we get a SIG_IPI signal. */
+                    cpu->hvf->sleeping = true;
+                    smp_mb();
+
+                    /* Bail out if we received an IRQ meanwhile */
+                    if (cpu->thread_kicked || (cpu->interrupt_request &
+                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
+                        cpu->hvf->sleeping = false;
+                        break;
+                    }
+
+                    /* nanosleep returns on signal, so we wake up on kick. */
+                    nanosleep(ts, NULL);

and then send the signal conditional on whether sleeping is true, but
I think this is racy. If the signal is sent after sleeping is set to
true but before entering nanosleep then I think it will be ignored and
we will miss the wakeup. That's why in my implementation I block IPI
on the CPU thread at startup and then use pselect to atomically
unblock and begin sleeping. The signal is sent unconditionally so
there's no need to worry about races between actually sleeping and the
"we think we're sleeping" state. It may lead to an extra wakeup but
that's better than missing it entirely.
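
To make the pattern concrete, here is a minimal standalone sketch of "block the signal at thread start, then let pselect() unblock it atomically while sleeping". It is not the code from either tree: DEMO_SIG_IPI, demo_vcpu_thread_init() and demo_wait_for_kick() are hypothetical names, and SIGUSR1 stands in for QEMU's SIG_IPI.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

/* Assumption: SIGUSR1 stands in for QEMU's SIG_IPI in this sketch. */
#define DEMO_SIG_IPI SIGUSR1

/* The handler does nothing; it only exists so delivery interrupts pselect(). */
static void demo_ipi_handler(int sig)
{
    (void)sig;
}

/*
 * Call once at vCPU thread startup: install the handler and keep the
 * signal blocked, so a kick sent at any time stays pending until we wait.
 */
static int demo_vcpu_thread_init(void)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_ipi_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(DEMO_SIG_IPI, &sa, NULL) != 0) {
        return -1;
    }

    sigemptyset(&block);
    sigaddset(&block, DEMO_SIG_IPI);
    return pthread_sigmask(SIG_BLOCK, &block, NULL);
}

/*
 * Sleep until kicked or until the timeout expires.
 * Returns 1 if woken by the signal, 0 on timeout, -1 on any other error.
 */
static int demo_wait_for_kick(const struct timespec *timeout)
{
    sigset_t unblocked;
    int r;

    assert(timeout != NULL);

    /* Take the current mask and remove the kick signal from it. */
    pthread_sigmask(SIG_BLOCK, NULL, &unblocked);
    sigdelset(&unblocked, DEMO_SIG_IPI);

    /*
     * pselect() swaps in the mask and starts waiting as one atomic step,
     * so a kick that arrived before this call is delivered right here
     * instead of being lost.
     */
    r = pselect(0, NULL, NULL, NULL, timeout, &unblocked);
    if (r == 0) {
        return 0;
    }
    if (r < 0 && errno == EINTR) {
        return 1;
    }
    return -1;
}
```

With this shape the sender can signal unconditionally: a kick sent before the wait stays pending and ends the pselect() call immediately, at the cost of an occasional spurious wakeup.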

Peter

>>
>> Also, is there a particular reason you're working on this super interesting and useful code in a random downstream fork of QEMU? Wouldn't it be more helpful to contribute to the upstream code base instead?
>
> We'd actually like to contribute upstream too :) We do want to maintain our own downstream though; the Android Emulator codebase needs to work solidly on macOS and Windows, which has made keeping up with upstream difficult, and staying on a previous version (2.12) with known quirks easier. (There's also some Android-related customization relating to Qt UI + a different set of virtual devices and snapshot support (incl. snapshots of graphics devices with OpenGLES state tracking), which we hope to separate into other libraries/processes, but it's not insignificant.)
>>
>>
>> Alex
>>
>>
>> On 30.11.20 21:15, Frank Yang wrote:
>>
>> Update: We're not quite sure how to compare the CNTV_CVAL and CNTVCT. But the high CPU usage seems to be mitigated by having a poll interval (like KVM does) in handling WFI:
>>
>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512501
>>
>> This is loosely inspired by https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766 which does seem to specify a poll interval.
>>
>> It would be cool if we could have a lightweight way to enter sleep and restart the vcpus precisely when CVAL passes, though.
>>
>> Frank
>>
>>
>> On Fri, Nov 27, 2020 at 3:30 PM Frank Yang <lfy@google.com> wrote:
>>>
>>> Hi all,
>>>
>>> +Peter Collingbourne
>>>
>>> I'm a developer on the Android Emulator, which is in a fork of QEMU.
>>>
>>> Peter and I have been working on an HVF Apple Silicon backend with an eye toward Android guests.
>>>
>>> We have gotten things to basically switch to Android userspace already (logcat/shell and graphics available at least)
>>>
>>> Our strategy so far has been to import logic from the KVM implementation and hook into QEMU's software devices that were previously assumed to work only with TCG, or that have KVM-specific paths.
>>>
>>> Thanks to Alexander for the tip on the 36-bit address space limitation btw; our way of addressing this is to still allow highmem but not put pci high mmio so high.
>>>
>>> Also, note we have a sleep/signal based mechanism to deal with WFx, which might be worth looking into in Alexander's implementation as well:
>>>
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551
>>>
>>> Patches so far, FYI:
>>>
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3
>>>
>>> https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a
>>> https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b
>>> https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01
>>> https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228
>>> https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102
>>> https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6
>>>
>>> Peter's also noticed that there are extra steps needed for M1's to allow TCG to work, as it involves JIT:
>>>
>>> https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9
>>>
>>> We'd appreciate any feedback/comments :)
>>>
>>> Best,
>>>
>>> Frank
>>>
>>> On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf <agraf@csgraf.de> wrote:
>>>>
>>>>
>>>> On 27.11.20 21:00, Roman Bolshakov wrote:
>>>> > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote:
>>>> >> Until now, Hypervisor.framework has only been available on x86_64 systems.
>>>> >> With Apple Silicon shipping now, it extends its reach to aarch64. To
>>>> >> prepare for support for multiple architectures, let's move common code out
>>>> >> into its own accel directory.
>>>> >>
>>>> >> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>>>> >> ---
>>>> >>   MAINTAINERS                 |   9 +-
>>>> >>   accel/hvf/hvf-all.c         |  56 +++++
>>>> >>   accel/hvf/hvf-cpus.c        | 468 ++++++++++++++++++++++++++++++++++++
>>>> >>   accel/hvf/meson.build       |   7 +
>>>> >>   accel/meson.build           |   1 +
>>>> >>   include/sysemu/hvf_int.h    |  69 ++++++
>>>> >>   target/i386/hvf/hvf-cpus.c  | 131 ----------
>>>> >>   target/i386/hvf/hvf-cpus.h  |  25 --
>>>> >>   target/i386/hvf/hvf-i386.h  |  48 +---
>>>> >>   target/i386/hvf/hvf.c       | 360 +--------------------------
>>>> >>   target/i386/hvf/meson.build |   1 -
>>>> >>   target/i386/hvf/x86hvf.c    |  11 +-
>>>> >>   target/i386/hvf/x86hvf.h    |   2 -
>>>> >>   13 files changed, 619 insertions(+), 569 deletions(-)
>>>> >>   create mode 100644 accel/hvf/hvf-all.c
>>>> >>   create mode 100644 accel/hvf/hvf-cpus.c
>>>> >>   create mode 100644 accel/hvf/meson.build
>>>> >>   create mode 100644 include/sysemu/hvf_int.h
>>>> >>   delete mode 100644 target/i386/hvf/hvf-cpus.c
>>>> >>   delete mode 100644 target/i386/hvf/hvf-cpus.h
>>>> >>
>>>> >> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> >> index 68bc160f41..ca4b6d9279 100644
>>>> >> --- a/MAINTAINERS
>>>> >> +++ b/MAINTAINERS
>>>> >> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com>
>>>> >>   M: Roman Bolshakov <r.bolshakov@yadro.com>
>>>> >>   W: https://wiki.qemu.org/Features/HVF
>>>> >>   S: Maintained
>>>> >> -F: accel/stubs/hvf-stub.c
>>>> > There was a patch for that in the RFC series from Claudio.
>>>>
>>>>
>>>> Yeah, I'm not worried about this hunk :).
>>>>
>>>>
>>>> >
>>>> >>   F: target/i386/hvf/
>>>> >> +
>>>> >> +HVF
>>>> >> +M: Cameron Esfahani <dirty@apple.com>
>>>> >> +M: Roman Bolshakov <r.bolshakov@yadro.com>
>>>> >> +W: https://wiki.qemu.org/Features/HVF
>>>> >> +S: Maintained
>>>> >> +F: accel/hvf/
>>>> >>   F: include/sysemu/hvf.h
>>>> >> +F: include/sysemu/hvf_int.h
>>>> >>
>>>> >>   WHPX CPUs
>>>> >>   M: Sunil Muthuswamy <sunilmut@microsoft.com>
>>>> >> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
>>>> >> new file mode 100644
>>>> >> index 0000000000..47d77a472a
>>>> >> --- /dev/null
>>>> >> +++ b/accel/hvf/hvf-all.c
>>>> >> @@ -0,0 +1,56 @@
>>>> >> +/*
>>>> >> + * QEMU Hypervisor.framework support
>>>> >> + *
>>>> >> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>>> >> + * the COPYING file in the top-level directory.
>>>> >> + *
>>>> >> + * Contributions after 2012-01-13 are licensed under the terms of the
>>>> >> + * GNU GPL, version 2 or (at your option) any later version.
>>>> >> + */
>>>> >> +
>>>> >> +#include "qemu/osdep.h"
>>>> >> +#include "qemu-common.h"
>>>> >> +#include "qemu/error-report.h"
>>>> >> +#include "sysemu/hvf.h"
>>>> >> +#include "sysemu/hvf_int.h"
>>>> >> +#include "sysemu/runstate.h"
>>>> >> +
>>>> >> +#include "qemu/main-loop.h"
>>>> >> +#include "sysemu/accel.h"
>>>> >> +
>>>> >> +#include <Hypervisor/Hypervisor.h>
>>>> >> +
>>>> >> +bool hvf_allowed;
>>>> >> +HVFState *hvf_state;
>>>> >> +
>>>> >> +void assert_hvf_ok(hv_return_t ret)
>>>> >> +{
>>>> >> +    if (ret == HV_SUCCESS) {
>>>> >> +        return;
>>>> >> +    }
>>>> >> +
>>>> >> +    switch (ret) {
>>>> >> +    case HV_ERROR:
>>>> >> +        error_report("Error: HV_ERROR");
>>>> >> +        break;
>>>> >> +    case HV_BUSY:
>>>> >> +        error_report("Error: HV_BUSY");
>>>> >> +        break;
>>>> >> +    case HV_BAD_ARGUMENT:
>>>> >> +        error_report("Error: HV_BAD_ARGUMENT");
>>>> >> +        break;
>>>> >> +    case HV_NO_RESOURCES:
>>>> >> +        error_report("Error: HV_NO_RESOURCES");
>>>> >> +        break;
>>>> >> +    case HV_NO_DEVICE:
>>>> >> +        error_report("Error: HV_NO_DEVICE");
>>>> >> +        break;
>>>> >> +    case HV_UNSUPPORTED:
>>>> >> +        error_report("Error: HV_UNSUPPORTED");
>>>> >> +        break;
>>>> >> +    default:
>>>> >> +        error_report("Unknown Error");
>>>> >> +    }
>>>> >> +
>>>> >> +    abort();
>>>> >> +}
>>>> >> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
>>>> >> new file mode 100644
>>>> >> index 0000000000..f9bb5502b7
>>>> >> --- /dev/null
>>>> >> +++ b/accel/hvf/hvf-cpus.c
>>>> >> @@ -0,0 +1,468 @@
>>>> >> +/*
>>>> >> + * Copyright 2008 IBM Corporation
>>>> >> + *           2008 Red Hat, Inc.
>>>> >> + * Copyright 2011 Intel Corporation
>>>> >> + * Copyright 2016 Veertu, Inc.
>>>> >> + * Copyright 2017 The Android Open Source Project
>>>> >> + *
>>>> >> + * QEMU Hypervisor.framework support
>>>> >> + *
>>>> >> + * This program is free software; you can redistribute it and/or
>>>> >> + * modify it under the terms of version 2 of the GNU General Public
>>>> >> + * License as published by the Free Software Foundation.
>>>> >> + *
>>>> >> + * This program is distributed in the hope that it will be useful,
>>>> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>> >> + * General Public License for more details.
>>>> >> + *
>>>> >> + * You should have received a copy of the GNU General Public License
>>>> >> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>>>> >> + *
>>>> >> + * This file contain code under public domain from the hvdos project:
>>>> >> + * https://github.com/mist64/hvdos
>>>> >> + *
>>>> >> + * Parts Copyright (c) 2011 NetApp, Inc.
>>>> >> + * All rights reserved.
>>>> >> + *
>>>> >> + * Redistribution and use in source and binary forms, with or without
>>>> >> + * modification, are permitted provided that the following conditions
>>>> >> + * are met:
>>>> >> + * 1. Redistributions of source code must retain the above copyright
>>>> >> + *    notice, this list of conditions and the following disclaimer.
>>>> >> + * 2. Redistributions in binary form must reproduce the above copyright
>>>> >> + *    notice, this list of conditions and the following disclaimer in the
>>>> >> + *    documentation and/or other materials provided with the distribution.
>>>> >> + *
>>>> >> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>>>> >> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>>>> >> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>>>> >> + * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
>>>> >> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>>>> >> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>>>> >> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>>>> >> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>>>> >> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>>>> >> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>>>> >> + * SUCH DAMAGE.
>>>> >> + */
>>>> >> +
>>>> >> +#include "qemu/osdep.h"
>>>> >> +#include "qemu/error-report.h"
>>>> >> +#include "qemu/main-loop.h"
>>>> >> +#include "exec/address-spaces.h"
>>>> >> +#include "exec/exec-all.h"
>>>> >> +#include "sysemu/cpus.h"
>>>> >> +#include "sysemu/hvf.h"
>>>> >> +#include "sysemu/hvf_int.h"
>>>> >> +#include "sysemu/runstate.h"
>>>> >> +#include "qemu/guest-random.h"
>>>> >> +
>>>> >> +#include <Hypervisor/Hypervisor.h>
>>>> >> +
>>>> >> +/* Memory slots */
>>>> >> +
>>>> >> +struct mac_slot {
>>>> >> +    int present;
>>>> >> +    uint64_t size;
>>>> >> +    uint64_t gpa_start;
>>>> >> +    uint64_t gva;
>>>> >> +};
>>>> >> +
>>>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>>>> >> +{
>>>> >> +    hvf_slot *slot;
>>>> >> +    int x;
>>>> >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>>>> >> +        slot = &hvf_state->slots[x];
>>>> >> +        if (slot->size && start < (slot->start + slot->size) &&
>>>> >> +            (start + size) > slot->start) {
>>>> >> +            return slot;
>>>> >> +        }
>>>> >> +    }
>>>> >> +    return NULL;
>>>> >> +}
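A side note on the lookup quoted above: it treats each slot as a half-open range [start, start + size), so two ranges that merely touch do not count as overlapping, and a slot with size 0 never matches. Below is a minimal standalone sketch of that predicate. The demo_slot type and the demo_* names are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for hvf_slot; only the fields the lookup reads. */
struct demo_slot {
    uint64_t start;
    uint64_t size;   /* 0 marks an unused slot */
};

/* True when [start, start + size) intersects the slot's range. */
bool demo_slot_overlaps(const struct demo_slot *slot,
                        uint64_t start, uint64_t size)
{
    return slot->size &&
           start < slot->start + slot->size &&
           start + size > slot->start;
}

/* Return the first overlapping slot, or NULL if there is none. */
struct demo_slot *demo_find_overlap(struct demo_slot *slots, size_t n,
                                    uint64_t start, uint64_t size)
{
    assert(slots != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (demo_slot_overlaps(&slots[i], start, size)) {
            return &slots[i];
        }
    }
    return NULL;
}
```

With slots covering [0x1000, 0x2000) and [0x4000, 0x6000), a query for [0x2000, 0x3000) returns NULL because it only touches the first range, while [0x3fff, 0x4001) returns the second slot.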
>>>> >> +
>>>> >> +struct mac_slot mac_slots[32];
>>>> >> +
>>>> >> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
>>>> >> +{
>>>> >> +    struct mac_slot *macslot;
>>>> >> +    hv_return_t ret;
>>>> >> +
>>>> >> +    macslot = &mac_slots[slot->slot_id];
>>>> >> +
>>>> >> +    if (macslot->present) {
>>>> >> +        if (macslot->size != slot->size) {
>>>> >> +            macslot->present = 0;
>>>> >> +            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
>>>> >> +            assert_hvf_ok(ret);
>>>> >> +        }
>>>> >> +    }
>>>> >> +
>>>> >> +    if (!slot->size) {
>>>> >> +        return 0;
>>>> >> +    }
>>>> >> +
>>>> >> +    macslot->present = 1;
>>>> >> +    macslot->gpa_start = slot->start;
>>>> >> +    macslot->size = slot->size;
>>>> >> +    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
>>>> >> +    assert_hvf_ok(ret);
>>>> >> +    return 0;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>>>> >> +{
>>>> >> +    hvf_slot *mem;
>>>> >> +    MemoryRegion *area = section->mr;
>>>> >> +    bool writeable = !area->readonly && !area->rom_device;
>>>> >> +    hv_memory_flags_t flags;
>>>> >> +
>>>> >> +    if (!memory_region_is_ram(area)) {
>>>> >> +        if (writeable) {
>>>> >> +            return;
>>>> >> +        } else if (!memory_region_is_romd(area)) {
>>>> >> +            /*
>>>> >> +             * If the memory device is not in romd_mode, then we actually want
>>>> >> +             * to remove the hvf memory slot so all accesses will trap.
>>>> >> +             */
>>>> >> +             add = false;
>>>> >> +        }
>>>> >> +    }
>>>> >> +
>>>> >> +    mem = hvf_find_overlap_slot(
>>>> >> +            section->offset_within_address_space,
>>>> >> +            int128_get64(section->size));
>>>> >> +
>>>> >> +    if (mem && add) {
>>>> >> +        if (mem->size == int128_get64(section->size) &&
>>>> >> +            mem->start == section->offset_within_address_space &&
>>>> >> +            mem->mem == (memory_region_get_ram_ptr(area) +
>>>> >> +            section->offset_within_region)) {
>>>> >> +            return; /* Same region was attempted to register, go away. */
>>>> >> +        }
>>>> >> +    }
>>>> >> +
>>>> >> +    /* Region needs to be reset. set the size to 0 and remap it. */
>>>> >> +    if (mem) {
>>>> >> +        mem->size = 0;
>>>> >> +        if (do_hvf_set_memory(mem, 0)) {
>>>> >> +            error_report("Failed to reset overlapping slot");
>>>> >> +            abort();
>>>> >> +        }
>>>> >> +    }
>>>> >> +
>>>> >> +    if (!add) {
>>>> >> +        return;
>>>> >> +    }
>>>> >> +
>>>> >> +    if (area->readonly ||
>>>> >> +        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
>>>> >> +        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>>>> >> +    } else {
>>>> >> +        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
>>>> >> +    }
>>>> >> +
>>>> >> +    /* Now make a new slot. */
>>>> >> +    int x;
>>>> >> +
>>>> >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>>>> >> +        mem = &hvf_state->slots[x];
>>>> >> +        if (!mem->size) {
>>>> >> +            break;
>>>> >> +        }
>>>> >> +    }
>>>> >> +
>>>> >> +    if (x == hvf_state->num_slots) {
>>>> >> +        error_report("No free slots");
>>>> >> +        abort();
>>>> >> +    }
>>>> >> +
>>>> >> +    mem->size = int128_get64(section->size);
>>>> >> +    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
>>>> >> +    mem->start = section->offset_within_address_space;
>>>> >> +    mem->region = area;
>>>> >> +
>>>> >> +    if (do_hvf_set_memory(mem, flags)) {
>>>> >> +        error_report("Error registering new memory slot");
>>>> >> +        abort();
>>>> >> +    }
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
>>>> >> +{
>>>> >> +    hvf_slot *slot;
>>>> >> +
>>>> >> +    slot = hvf_find_overlap_slot(
>>>> >> +            section->offset_within_address_space,
>>>> >> +            int128_get64(section->size));
>>>> >> +
>>>> >> +    /* protect region against writes; begin tracking it */
>>>> >> +    if (on) {
>>>> >> +        slot->flags |= HVF_SLOT_LOG;
>>>> >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>>>> >> +                      HV_MEMORY_READ);
>>>> >> +    /* stop tracking region*/
>>>> >> +    } else {
>>>> >> +        slot->flags &= ~HVF_SLOT_LOG;
>>>> >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>>>> >> +                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>>>> >> +    }
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_log_start(MemoryListener *listener,
>>>> >> +                          MemoryRegionSection *section, int old, int new)
>>>> >> +{
>>>> >> +    if (old != 0) {
>>>> >> +        return;
>>>> >> +    }
>>>> >> +
>>>> >> +    hvf_set_dirty_tracking(section, 1);
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_log_stop(MemoryListener *listener,
>>>> >> +                         MemoryRegionSection *section, int old, int new)
>>>> >> +{
>>>> >> +    if (new != 0) {
>>>> >> +        return;
>>>> >> +    }
>>>> >> +
>>>> >> +    hvf_set_dirty_tracking(section, 0);
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_log_sync(MemoryListener *listener,
>>>> >> +                         MemoryRegionSection *section)
>>>> >> +{
>>>> >> +    /*
>>>> >> +     * sync of dirty pages is handled elsewhere; just make sure we keep
>>>> >> +     * tracking the region.
>>>> >> +     */
>>>> >> +    hvf_set_dirty_tracking(section, 1);
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_region_add(MemoryListener *listener,
>>>> >> +                           MemoryRegionSection *section)
>>>> >> +{
>>>> >> +    hvf_set_phys_mem(section, true);
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_region_del(MemoryListener *listener,
>>>> >> +                           MemoryRegionSection *section)
>>>> >> +{
>>>> >> +    hvf_set_phys_mem(section, false);
>>>> >> +}
>>>> >> +
>>>> >> +static MemoryListener hvf_memory_listener = {
>>>> >> +    .priority = 10,
>>>> >> +    .region_add = hvf_region_add,
>>>> >> +    .region_del = hvf_region_del,
>>>> >> +    .log_start = hvf_log_start,
>>>> >> +    .log_stop = hvf_log_stop,
>>>> >> +    .log_sync = hvf_log_sync,
>>>> >> +};
>>>> >> +
>>>> >> +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>>>> >> +{
>>>> >> +    if (!cpu->vcpu_dirty) {
>>>> >> +        hvf_get_registers(cpu);
>>>> >> +        cpu->vcpu_dirty = true;
>>>> >> +    }
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_cpu_synchronize_state(CPUState *cpu)
>>>> >> +{
>>>> >> +    if (!cpu->vcpu_dirty) {
>>>> >> +        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
>>>> >> +    }
>>>> >> +}
>>>> >> +
>>>> >> +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>>>> >> +                                              run_on_cpu_data arg)
>>>> >> +{
>>>> >> +    hvf_put_registers(cpu);
>>>> >> +    cpu->vcpu_dirty = false;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>>>> >> +{
>>>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
>>>> >> +}
>>>> >> +
>>>> >> +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>>>> >> +                                             run_on_cpu_data arg)
>>>> >> +{
>>>> >> +    hvf_put_registers(cpu);
>>>> >> +    cpu->vcpu_dirty = false;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_cpu_synchronize_post_init(CPUState *cpu)
>>>> >> +{
>>>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
>>>> >> +}
>>>> >> +
>>>> >> +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>>>> >> +                                              run_on_cpu_data arg)
>>>> >> +{
>>>> >> +    cpu->vcpu_dirty = true;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>>>> >> +{
>>>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
>>>> >> +}
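The four synchronize hooks quoted above all revolve around the vcpu_dirty flag. As far as the quoted code shows, synchronize_state fetches registers only when the flag is clear and then sets it, post_reset and post_init push registers and clear it, and pre_loadvm just sets it. A toy model of that protocol might look like the following sketch. The demo_vcpu struct, its single register, and the fetch counter are assumptions for illustration; the real code goes through run_on_cpu() and hvf_get_registers()/hvf_put_registers().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one register stands in for the whole vCPU state. */
struct demo_vcpu {
    unsigned long hw_reg;    /* value held by the hypervisor */
    unsigned long qemu_reg;  /* the cached copy on the QEMU side */
    bool dirty;              /* true: the cached copy has been loaded */
    int fetches;             /* counts hypervisor reads, for illustration */
};

/* Mirrors do_hvf_cpu_synchronize_state(): fetch once, then mark dirty. */
void demo_synchronize_state(struct demo_vcpu *v)
{
    assert(v != NULL);
    if (!v->dirty) {
        v->qemu_reg = v->hw_reg;
        v->fetches++;
        v->dirty = true;
    }
}

/* Mirrors the post_reset/post_init helpers: push the cache, clear dirty. */
void demo_synchronize_post_reset(struct demo_vcpu *v)
{
    assert(v != NULL);
    v->hw_reg = v->qemu_reg;
    v->dirty = false;
}

/* Mirrors pre_loadvm: only sets the flag, so a later sync skips the fetch. */
void demo_synchronize_pre_loadvm(struct demo_vcpu *v)
{
    assert(v != NULL);
    v->dirty = true;
}
```

Calling demo_synchronize_state() twice in a row reads the hypervisor value only once, which is the point of the flag: repeated state inspection does not cost repeated register reads.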
>>>> >> +
>>>> >> +static void hvf_vcpu_destroy(CPUState *cpu)
>>>> >> +{
>>>> >> +    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
>>>> >> +    assert_hvf_ok(ret);
>>>> >> +
>>>> >> +    hvf_arch_vcpu_destroy(cpu);
>>>> >> +}
>>>> >> +
>>>> >> +static void dummy_signal(int sig)
>>>> >> +{
>>>> >> +}
>>>> >> +
>>>> >> +static int hvf_init_vcpu(CPUState *cpu)
>>>> >> +{
>>>> >> +    int r;
>>>> >> +
>>>> >> +    /* init cpu signals */
>>>> >> +    sigset_t set;
>>>> >> +    struct sigaction sigact;
>>>> >> +
>>>> >> +    memset(&sigact, 0, sizeof(sigact));
>>>> >> +    sigact.sa_handler = dummy_signal;
>>>> >> +    sigaction(SIG_IPI, &sigact, NULL);
>>>> >> +
>>>> >> +    pthread_sigmask(SIG_BLOCK, NULL, &set);
>>>> >> +    sigdelset(&set, SIG_IPI);
>>>> >> +
>>>> >> +#ifdef __aarch64__
>>>> >> +    r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL);
>>>> >> +#else
>>>> >> +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>>>> >> +#endif
>>>> > I think the first __aarch64__ bit fits better in the ARM part of the series.
>>>>
>>>>
>>>> Oops. Thanks for catching it! Yes, absolutely. It should be part of the
>>>> ARM enablement.
>>>>
>>>>
>>>> >
>>>> >> +    cpu->vcpu_dirty = 1;
>>>> >> +    assert_hvf_ok(r);
>>>> >> +
>>>> >> +    return hvf_arch_init_vcpu(cpu);
>>>> >> +}
>>>> >> +
>>>> >> +/*
>>>> >> + * The HVF-specific vCPU thread function. This one should only run when the host
>>>> >> + * CPU supports the VMX "unrestricted guest" feature.
>>>> >> + */
>>>> >> +static void *hvf_cpu_thread_fn(void *arg)
>>>> >> +{
>>>> >> +    CPUState *cpu = arg;
>>>> >> +
>>>> >> +    int r;
>>>> >> +
>>>> >> +    assert(hvf_enabled());
>>>> >> +
>>>> >> +    rcu_register_thread();
>>>> >> +
>>>> >> +    qemu_mutex_lock_iothread();
>>>> >> +    qemu_thread_get_self(cpu->thread);
>>>> >> +
>>>> >> +    cpu->thread_id = qemu_get_thread_id();
>>>> >> +    cpu->can_do_io = 1;
>>>> >> +    current_cpu = cpu;
>>>> >> +
>>>> >> +    hvf_init_vcpu(cpu);
>>>> >> +
>>>> >> +    /* signal CPU creation */
>>>> >> +    cpu_thread_signal_created(cpu);
>>>> >> +    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>>> >> +
>>>> >> +    do {
>>>> >> +        if (cpu_can_run(cpu)) {
>>>> >> +            r = hvf_vcpu_exec(cpu);
>>>> >> +            if (r == EXCP_DEBUG) {
>>>> >> +                cpu_handle_guest_debug(cpu);
>>>> >> +            }
>>>> >> +        }
>>>> >> +        qemu_wait_io_event(cpu);
>>>> >> +    } while (!cpu->unplug || cpu_can_run(cpu));
>>>> >> +
>>>> >> +    hvf_vcpu_destroy(cpu);
>>>> >> +    cpu_thread_signal_destroyed(cpu);
>>>> >> +    qemu_mutex_unlock_iothread();
>>>> >> +    rcu_unregister_thread();
>>>> >> +    return NULL;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_start_vcpu_thread(CPUState *cpu)
>>>> >> +{
>>>> >> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>>>> >> +
>>>> >> +    /*
>>>> >> +     * HVF currently does not support TCG, and only runs in
>>>> >> +     * unrestricted-guest mode.
>>>> >> +     */
>>>> >> +    assert(hvf_enabled());
>>>> >> +
>>>> >> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>>>> >> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>>> >> +    qemu_cond_init(cpu->halt_cond);
>>>> >> +
>>>> >> +    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>>>> >> +             cpu->cpu_index);
>>>> >> +    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
>>>> >> +                       cpu, QEMU_THREAD_JOINABLE);
>>>> >> +}
>>>> >> +
>>>> >> +static const CpusAccel hvf_cpus = {
>>>> >> +    .create_vcpu_thread = hvf_start_vcpu_thread,
>>>> >> +
>>>> >> +    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>>>> >> +    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>>>> >> +    .synchronize_state = hvf_cpu_synchronize_state,
>>>> >> +    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>>>> >> +};
>>>> >> +
>>>> >> +static int hvf_accel_init(MachineState *ms)
>>>> >> +{
>>>> >> +    int x;
>>>> >> +    hv_return_t ret;
>>>> >> +    HVFState *s;
>>>> >> +
>>>> >> +    ret = hv_vm_create(HV_VM_DEFAULT);
>>>> >> +    assert_hvf_ok(ret);
>>>> >> +
>>>> >> +    s = g_new0(HVFState, 1);
>>>> >> +
>>>> >> +    s->num_slots = 32;
>>>> >> +    for (x = 0; x < s->num_slots; ++x) {
>>>> >> +        s->slots[x].size = 0;
>>>> >> +        s->slots[x].slot_id = x;
>>>> >> +    }
>>>> >> +
>>>> >> +    hvf_state = s;
>>>> >> +    memory_listener_register(&hvf_memory_listener, &address_space_memory);
>>>> >> +    cpus_register_accel(&hvf_cpus);
>>>> >> +    return 0;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_accel_class_init(ObjectClass *oc, void *data)
>>>> >> +{
>>>> >> +    AccelClass *ac = ACCEL_CLASS(oc);
>>>> >> +    ac->name = "HVF";
>>>> >> +    ac->init_machine = hvf_accel_init;
>>>> >> +    ac->allowed = &hvf_allowed;
>>>> >> +}
>>>> >> +
>>>> >> +static const TypeInfo hvf_accel_type = {
>>>> >> +    .name = TYPE_HVF_ACCEL,
>>>> >> +    .parent = TYPE_ACCEL,
>>>> >> +    .class_init = hvf_accel_class_init,
>>>> >> +};
>>>> >> +
>>>> >> +static void hvf_type_init(void)
>>>> >> +{
>>>> >> +    type_register_static(&hvf_accel_type);
>>>> >> +}
>>>> >> +
>>>> >> +type_init(hvf_type_init);
>>>> >> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
>>>> >> new file mode 100644
>>>> >> index 0000000000..dfd6b68dc7
>>>> >> --- /dev/null
>>>> >> +++ b/accel/hvf/meson.build
>>>> >> @@ -0,0 +1,7 @@
>>>> >> +hvf_ss = ss.source_set()
>>>> >> +hvf_ss.add(files(
>>>> >> +  'hvf-all.c',
>>>> >> +  'hvf-cpus.c',
>>>> >> +))
>>>> >> +
>>>> >> +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
>>>> >> diff --git a/accel/meson.build b/accel/meson.build
>>>> >> index b26cca227a..6de12ce5d5 100644
>>>> >> --- a/accel/meson.build
>>>> >> +++ b/accel/meson.build
>>>> >> @@ -1,5 +1,6 @@
>>>> >>   softmmu_ss.add(files('accel.c'))
>>>> >>
>>>> >> +subdir('hvf')
>>>> >>   subdir('qtest')
>>>> >>   subdir('kvm')
>>>> >>   subdir('tcg')
>>>> >> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
>>>> >> new file mode 100644
>>>> >> index 0000000000..de9bad23a8
>>>> >> --- /dev/null
>>>> >> +++ b/include/sysemu/hvf_int.h
>>>> >> @@ -0,0 +1,69 @@
>>>> >> +/*
>>>> >> + * QEMU Hypervisor.framework (HVF) support
>>>> >> + *
>>>> >> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>>> >> + * See the COPYING file in the top-level directory.
>>>> >> + *
>>>> >> + */
>>>> >> +
>>>> >> +/* header to be included in HVF-specific code */
>>>> >> +
>>>> >> +#ifndef HVF_INT_H
>>>> >> +#define HVF_INT_H
>>>> >> +
>>>> >> +#include <Hypervisor/Hypervisor.h>
>>>> >> +
>>>> >> +#define HVF_MAX_VCPU 0x10
>>>> >> +
>>>> >> +extern struct hvf_state hvf_global;
>>>> >> +
>>>> >> +struct hvf_vm {
>>>> >> +    int id;
>>>> >> +    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>>>> >> +};
>>>> >> +
>>>> >> +struct hvf_state {
>>>> >> +    uint32_t version;
>>>> >> +    struct hvf_vm *vm;
>>>> >> +    uint64_t mem_quota;
>>>> >> +};
>>>> >> +
>>>> >> +/* hvf_slot flags */
>>>> >> +#define HVF_SLOT_LOG (1 << 0)
>>>> >> +
>>>> >> +typedef struct hvf_slot {
>>>> >> +    uint64_t start;
>>>> >> +    uint64_t size;
>>>> >> +    uint8_t *mem;
>>>> >> +    int slot_id;
>>>> >> +    uint32_t flags;
>>>> >> +    MemoryRegion *region;
>>>> >> +} hvf_slot;
>>>> >> +
>>>> >> +typedef struct hvf_vcpu_caps {
>>>> >> +    uint64_t vmx_cap_pinbased;
>>>> >> +    uint64_t vmx_cap_procbased;
>>>> >> +    uint64_t vmx_cap_procbased2;
>>>> >> +    uint64_t vmx_cap_entry;
>>>> >> +    uint64_t vmx_cap_exit;
>>>> >> +    uint64_t vmx_cap_preemption_timer;
>>>> >> +} hvf_vcpu_caps;
>>>> >> +
>>>> >> +struct HVFState {
>>>> >> +    AccelState parent;
>>>> >> +    hvf_slot slots[32];
>>>> >> +    int num_slots;
>>>> >> +
>>>> >> +    hvf_vcpu_caps *hvf_caps;
>>>> >> +};
>>>> >> +extern HVFState *hvf_state;
>>>> >> +
>>>> >> +void assert_hvf_ok(hv_return_t ret);
>>>> >> +int hvf_get_registers(CPUState *cpu);
>>>> >> +int hvf_put_registers(CPUState *cpu);
>>>> >> +int hvf_arch_init_vcpu(CPUState *cpu);
>>>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu);
>>>> >> +int hvf_vcpu_exec(CPUState *cpu);
>>>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>>>> >> +
>>>> >> +#endif
>>>> >> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
>>>> >> deleted file mode 100644
>>>> >> index 817b3d7452..0000000000
>>>> >> --- a/target/i386/hvf/hvf-cpus.c
>>>> >> +++ /dev/null
>>>> >> @@ -1,131 +0,0 @@
>>>> >> -/*
>>>> >> - * Copyright 2008 IBM Corporation
>>>> >> - *           2008 Red Hat, Inc.
>>>> >> - * Copyright 2011 Intel Corporation
>>>> >> - * Copyright 2016 Veertu, Inc.
>>>> >> - * Copyright 2017 The Android Open Source Project
>>>> >> - *
>>>> >> - * QEMU Hypervisor.framework support
>>>> >> - *
>>>> >> - * This program is free software; you can redistribute it and/or
>>>> >> - * modify it under the terms of version 2 of the GNU General Public
>>>> >> - * License as published by the Free Software Foundation.
>>>> >> - *
>>>> >> - * This program is distributed in the hope that it will be useful,
>>>> >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>> >> - * General Public License for more details.
>>>> >> - *
>>>> >> - * You should have received a copy of the GNU General Public License
>>>> >> - * along with this program; if not, see <http://www.gnu.org/licenses/>.
>>>> >> - *
>>>> >> - * This file contain code under public domain from the hvdos project:
>>>> >> - * https://github.com/mist64/hvdos
>>>> >> - *
>>>> >> - * Parts Copyright (c) 2011 NetApp, Inc.
>>>> >> - * All rights reserved.
>>>> >> - *
>>>> >> - * Redistribution and use in source and binary forms, with or without
>>>> >> - * modification, are permitted provided that the following conditions
>>>> >> - * are met:
>>>> >> - * 1. Redistributions of source code must retain the above copyright
>>>> >> - *    notice, this list of conditions and the following disclaimer.
>>>> >> - * 2. Redistributions in binary form must reproduce the above copyright
>>>> >> - *    notice, this list of conditions and the following disclaimer in the
>>>> >> - *    documentation and/or other materials provided with the distribution.
>>>> >> - *
>>>> >> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>>>> >> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>>>> >> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>>>> >> - * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
>>>> >> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>>>> >> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>>>> >> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>>>> >> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>>>> >> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>>>> >> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>>>> >> - * SUCH DAMAGE.
>>>> >> - */
>>>> >> -
>>>> >> -#include "qemu/osdep.h"
>>>> >> -#include "qemu/error-report.h"
>>>> >> -#include "qemu/main-loop.h"
>>>> >> -#include "sysemu/hvf.h"
>>>> >> -#include "sysemu/runstate.h"
>>>> >> -#include "target/i386/cpu.h"
>>>> >> -#include "qemu/guest-random.h"
>>>> >> -
>>>> >> -#include "hvf-cpus.h"
>>>> >> -
>>>> >> -/*
>>>> >> - * The HVF-specific vCPU thread function. This one should only run when the host
>>>> >> - * CPU supports the VMX "unrestricted guest" feature.
>>>> >> - */
>>>> >> -static void *hvf_cpu_thread_fn(void *arg)
>>>> >> -{
>>>> >> -    CPUState *cpu = arg;
>>>> >> -
>>>> >> -    int r;
>>>> >> -
>>>> >> -    assert(hvf_enabled());
>>>> >> -
>>>> >> -    rcu_register_thread();
>>>> >> -
>>>> >> -    qemu_mutex_lock_iothread();
>>>> >> -    qemu_thread_get_self(cpu->thread);
>>>> >> -
>>>> >> -    cpu->thread_id = qemu_get_thread_id();
>>>> >> -    cpu->can_do_io = 1;
>>>> >> -    current_cpu = cpu;
>>>> >> -
>>>> >> -    hvf_init_vcpu(cpu);
>>>> >> -
>>>> >> -    /* signal CPU creation */
>>>> >> -    cpu_thread_signal_created(cpu);
>>>> >> -    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>>> >> -
>>>> >> -    do {
>>>> >> -        if (cpu_can_run(cpu)) {
>>>> >> -            r = hvf_vcpu_exec(cpu);
>>>> >> -            if (r == EXCP_DEBUG) {
>>>> >> -                cpu_handle_guest_debug(cpu);
>>>> >> -            }
>>>> >> -        }
>>>> >> -        qemu_wait_io_event(cpu);
>>>> >> -    } while (!cpu->unplug || cpu_can_run(cpu));
>>>> >> -
>>>> >> -    hvf_vcpu_destroy(cpu);
>>>> >> -    cpu_thread_signal_destroyed(cpu);
>>>> >> -    qemu_mutex_unlock_iothread();
>>>> >> -    rcu_unregister_thread();
>>>> >> -    return NULL;
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_start_vcpu_thread(CPUState *cpu)
>>>> >> -{
>>>> >> -    char thread_name[VCPU_THREAD_NAME_SIZE];
>>>> >> -
>>>> >> -    /*
>>>> >> -     * HVF currently does not support TCG, and only runs in
>>>> >> -     * unrestricted-guest mode.
>>>> >> -     */
>>>> >> -    assert(hvf_enabled());
>>>> >> -
>>>> >> -    cpu->thread = g_malloc0(sizeof(QemuThread));
>>>> >> -    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>>> >> -    qemu_cond_init(cpu->halt_cond);
>>>> >> -
>>>> >> -    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>>>> >> -             cpu->cpu_index);
>>>> >> -    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
>>>> >> -                       cpu, QEMU_THREAD_JOINABLE);
>>>> >> -}
>>>> >> -
>>>> >> -const CpusAccel hvf_cpus = {
>>>> >> -    .create_vcpu_thread = hvf_start_vcpu_thread,
>>>> >> -
>>>> >> -    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>>>> >> -    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>>>> >> -    .synchronize_state = hvf_cpu_synchronize_state,
>>>> >> -    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>>>> >> -};
>>>> >> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h
>>>> >> deleted file mode 100644
>>>> >> index ced31b82c0..0000000000
>>>> >> --- a/target/i386/hvf/hvf-cpus.h
>>>> >> +++ /dev/null
>>>> >> @@ -1,25 +0,0 @@
>>>> >> -/*
>>>> >> - * Accelerator CPUS Interface
>>>> >> - *
>>>> >> - * Copyright 2020 SUSE LLC
>>>> >> - *
>>>> >> - * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>>> >> - * See the COPYING file in the top-level directory.
>>>> >> - */
>>>> >> -
>>>> >> -#ifndef HVF_CPUS_H
>>>> >> -#define HVF_CPUS_H
>>>> >> -
>>>> >> -#include "sysemu/cpus.h"
>>>> >> -
>>>> >> -extern const CpusAccel hvf_cpus;
>>>> >> -
>>>> >> -int hvf_init_vcpu(CPUState *);
>>>> >> -int hvf_vcpu_exec(CPUState *);
>>>> >> -void hvf_cpu_synchronize_state(CPUState *);
>>>> >> -void hvf_cpu_synchronize_post_reset(CPUState *);
>>>> >> -void hvf_cpu_synchronize_post_init(CPUState *);
>>>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *);
>>>> >> -void hvf_vcpu_destroy(CPUState *);
>>>> >> -
>>>> >> -#endif /* HVF_CPUS_H */
>>>> >> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
>>>> >> index e0edffd077..6d56f8f6bb 100644
>>>> >> --- a/target/i386/hvf/hvf-i386.h
>>>> >> +++ b/target/i386/hvf/hvf-i386.h
>>>> >> @@ -18,57 +18,11 @@
>>>> >>
>>>> >>   #include "sysemu/accel.h"
>>>> >>   #include "sysemu/hvf.h"
>>>> >> +#include "sysemu/hvf_int.h"
>>>> >>   #include "cpu.h"
>>>> >>   #include "x86.h"
>>>> >>
>>>> >> -#define HVF_MAX_VCPU 0x10
>>>> >> -
>>>> >> -extern struct hvf_state hvf_global;
>>>> >> -
>>>> >> -struct hvf_vm {
>>>> >> -    int id;
>>>> >> -    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>>>> >> -};
>>>> >> -
>>>> >> -struct hvf_state {
>>>> >> -    uint32_t version;
>>>> >> -    struct hvf_vm *vm;
>>>> >> -    uint64_t mem_quota;
>>>> >> -};
>>>> >> -
>>>> >> -/* hvf_slot flags */
>>>> >> -#define HVF_SLOT_LOG (1 << 0)
>>>> >> -
>>>> >> -typedef struct hvf_slot {
>>>> >> -    uint64_t start;
>>>> >> -    uint64_t size;
>>>> >> -    uint8_t *mem;
>>>> >> -    int slot_id;
>>>> >> -    uint32_t flags;
>>>> >> -    MemoryRegion *region;
>>>> >> -} hvf_slot;
>>>> >> -
>>>> >> -typedef struct hvf_vcpu_caps {
>>>> >> -    uint64_t vmx_cap_pinbased;
>>>> >> -    uint64_t vmx_cap_procbased;
>>>> >> -    uint64_t vmx_cap_procbased2;
>>>> >> -    uint64_t vmx_cap_entry;
>>>> >> -    uint64_t vmx_cap_exit;
>>>> >> -    uint64_t vmx_cap_preemption_timer;
>>>> >> -} hvf_vcpu_caps;
>>>> >> -
>>>> >> -struct HVFState {
>>>> >> -    AccelState parent;
>>>> >> -    hvf_slot slots[32];
>>>> >> -    int num_slots;
>>>> >> -
>>>> >> -    hvf_vcpu_caps *hvf_caps;
>>>> >> -};
>>>> >> -extern HVFState *hvf_state;
>>>> >> -
>>>> >> -void hvf_set_phys_mem(MemoryRegionSection *, bool);
>>>> >>   void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
>>>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>>>> >>
>>>> >>   #ifdef NEED_CPU_H
>>>> >>   /* Functions exported to host specific mode */
>>>> >> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
>>>> >> index ed9356565c..8b96ecd619 100644
>>>> >> --- a/target/i386/hvf/hvf.c
>>>> >> +++ b/target/i386/hvf/hvf.c
>>>> >> @@ -51,6 +51,7 @@
>>>> >>   #include "qemu/error-report.h"
>>>> >>
>>>> >>   #include "sysemu/hvf.h"
>>>> >> +#include "sysemu/hvf_int.h"
>>>> >>   #include "sysemu/runstate.h"
>>>> >>   #include "hvf-i386.h"
>>>> >>   #include "vmcs.h"
>>>> >> @@ -72,171 +73,6 @@
>>>> >>   #include "sysemu/accel.h"
>>>> >>   #include "target/i386/cpu.h"
>>>> >>
>>>> >> -#include "hvf-cpus.h"
>>>> >> -
>>>> >> -HVFState *hvf_state;
>>>> >> -
>>>> >> -static void assert_hvf_ok(hv_return_t ret)
>>>> >> -{
>>>> >> -    if (ret == HV_SUCCESS) {
>>>> >> -        return;
>>>> >> -    }
>>>> >> -
>>>> >> -    switch (ret) {
>>>> >> -    case HV_ERROR:
>>>> >> -        error_report("Error: HV_ERROR");
>>>> >> -        break;
>>>> >> -    case HV_BUSY:
>>>> >> -        error_report("Error: HV_BUSY");
>>>> >> -        break;
>>>> >> -    case HV_BAD_ARGUMENT:
>>>> >> -        error_report("Error: HV_BAD_ARGUMENT");
>>>> >> -        break;
>>>> >> -    case HV_NO_RESOURCES:
>>>> >> -        error_report("Error: HV_NO_RESOURCES");
>>>> >> -        break;
>>>> >> -    case HV_NO_DEVICE:
>>>> >> -        error_report("Error: HV_NO_DEVICE");
>>>> >> -        break;
>>>> >> -    case HV_UNSUPPORTED:
>>>> >> -        error_report("Error: HV_UNSUPPORTED");
>>>> >> -        break;
>>>> >> -    default:
>>>> >> -        error_report("Unknown Error");
>>>> >> -    }
>>>> >> -
>>>> >> -    abort();
>>>> >> -}
>>>> >> -
>>>> >> -/* Memory slots */
>>>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>>>> >> -{
>>>> >> -    hvf_slot *slot;
>>>> >> -    int x;
>>>> >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>>>> >> -        slot = &hvf_state->slots[x];
>>>> >> -        if (slot->size && start < (slot->start + slot->size) &&
>>>> >> -            (start + size) > slot->start) {
>>>> >> -            return slot;
>>>> >> -        }
>>>> >> -    }
>>>> >> -    return NULL;
>>>> >> -}
>>>> >> -
>>>> >> -struct mac_slot {
>>>> >> -    int present;
>>>> >> -    uint64_t size;
>>>> >> -    uint64_t gpa_start;
>>>> >> -    uint64_t gva;
>>>> >> -};
>>>> >> -
>>>> >> -struct mac_slot mac_slots[32];
>>>> >> -
>>>> >> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
>>>> >> -{
>>>> >> -    struct mac_slot *macslot;
>>>> >> -    hv_return_t ret;
>>>> >> -
>>>> >> -    macslot = &mac_slots[slot->slot_id];
>>>> >> -
>>>> >> -    if (macslot->present) {
>>>> >> -        if (macslot->size != slot->size) {
>>>> >> -            macslot->present = 0;
>>>> >> -            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
>>>> >> -            assert_hvf_ok(ret);
>>>> >> -        }
>>>> >> -    }
>>>> >> -
>>>> >> -    if (!slot->size) {
>>>> >> -        return 0;
>>>> >> -    }
>>>> >> -
>>>> >> -    macslot->present = 1;
>>>> >> -    macslot->gpa_start = slot->start;
>>>> >> -    macslot->size = slot->size;
>>>> >> -    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
>>>> >> -    assert_hvf_ok(ret);
>>>> >> -    return 0;
>>>> >> -}
>>>> >> -
>>>> >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>>>> >> -{
>>>> >> -    hvf_slot *mem;
>>>> >> -    MemoryRegion *area = section->mr;
>>>> >> -    bool writeable = !area->readonly && !area->rom_device;
>>>> >> -    hv_memory_flags_t flags;
>>>> >> -
>>>> >> -    if (!memory_region_is_ram(area)) {
>>>> >> -        if (writeable) {
>>>> >> -            return;
>>>> >> -        } else if (!memory_region_is_romd(area)) {
>>>> >> -            /*
>>>> >> -             * If the memory device is not in romd_mode, then we actually want
>>>> >> -             * to remove the hvf memory slot so all accesses will trap.
>>>> >> -             */
>>>> >> -             add = false;
>>>> >> -        }
>>>> >> -    }
>>>> >> -
>>>> >> -    mem = hvf_find_overlap_slot(
>>>> >> -            section->offset_within_address_space,
>>>> >> -            int128_get64(section->size));
>>>> >> -
>>>> >> -    if (mem && add) {
>>>> >> -        if (mem->size == int128_get64(section->size) &&
>>>> >> -            mem->start == section->offset_within_address_space &&
>>>> >> -            mem->mem == (memory_region_get_ram_ptr(area) +
>>>> >> -            section->offset_within_region)) {
>>>> >> -            return; /* Same region was attempted to register, go away. */
>>>> >> -        }
>>>> >> -    }
>>>> >> -
>>>> >> -    /* Region needs to be reset. set the size to 0 and remap it. */
>>>> >> -    if (mem) {
>>>> >> -        mem->size = 0;
>>>> >> -        if (do_hvf_set_memory(mem, 0)) {
>>>> >> -            error_report("Failed to reset overlapping slot");
>>>> >> -            abort();
>>>> >> -        }
>>>> >> -    }
>>>> >> -
>>>> >> -    if (!add) {
>>>> >> -        return;
>>>> >> -    }
>>>> >> -
>>>> >> -    if (area->readonly ||
>>>> >> -        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
>>>> >> -        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>>>> >> -    } else {
>>>> >> -        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
>>>> >> -    }
>>>> >> -
>>>> >> -    /* Now make a new slot. */
>>>> >> -    int x;
>>>> >> -
>>>> >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>>>> >> -        mem = &hvf_state->slots[x];
>>>> >> -        if (!mem->size) {
>>>> >> -            break;
>>>> >> -        }
>>>> >> -    }
>>>> >> -
>>>> >> -    if (x == hvf_state->num_slots) {
>>>> >> -        error_report("No free slots");
>>>> >> -        abort();
>>>> >> -    }
>>>> >> -
>>>> >> -    mem->size = int128_get64(section->size);
>>>> >> -    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
>>>> >> -    mem->start = section->offset_within_address_space;
>>>> >> -    mem->region = area;
>>>> >> -
>>>> >> -    if (do_hvf_set_memory(mem, flags)) {
>>>> >> -        error_report("Error registering new memory slot");
>>>> >> -        abort();
>>>> >> -    }
>>>> >> -}
>>>> >> -
>>>> >>   void vmx_update_tpr(CPUState *cpu)
>>>> >>   {
>>>> >>       /* TODO: need integrate APIC handling */
>>>> >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
>>>> >>       }
>>>> >>   }
>>>> >>
>>>> >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>>>> >> -{
>>>> >> -    if (!cpu->vcpu_dirty) {
>>>> >> -        hvf_get_registers(cpu);
>>>> >> -        cpu->vcpu_dirty = true;
>>>> >> -    }
>>>> >> -}
>>>> >> -
>>>> >> -void hvf_cpu_synchronize_state(CPUState *cpu)
>>>> >> -{
>>>> >> -    if (!cpu->vcpu_dirty) {
>>>> >> -        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
>>>> >> -    }
>>>> >> -}
>>>> >> -
>>>> >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>>>> >> -                                              run_on_cpu_data arg)
>>>> >> -{
>>>> >> -    hvf_put_registers(cpu);
>>>> >> -    cpu->vcpu_dirty = false;
>>>> >> -}
>>>> >> -
>>>> >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>>>> >> -{
>>>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
>>>> >> -}
>>>> >> -
>>>> >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>>>> >> -                                             run_on_cpu_data arg)
>>>> >> -{
>>>> >> -    hvf_put_registers(cpu);
>>>> >> -    cpu->vcpu_dirty = false;
>>>> >> -}
>>>> >> -
>>>> >> -void hvf_cpu_synchronize_post_init(CPUState *cpu)
>>>> >> -{
>>>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
>>>> >> -}
>>>> >> -
>>>> >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>>>> >> -                                              run_on_cpu_data arg)
>>>> >> -{
>>>> >> -    cpu->vcpu_dirty = true;
>>>> >> -}
>>>> >> -
>>>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>>>> >> -{
>>>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
>>>> >> -}
>>>> >> -
>>>> >>   static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
>>>> >>   {
>>>> >>       int read, write;
>>>> >> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
>>>> >>       return false;
>>>> >>   }
>>>> >>
>>>> >> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
>>>> >> -{
>>>> >> -    hvf_slot *slot;
>>>> >> -
>>>> >> -    slot = hvf_find_overlap_slot(
>>>> >> -            section->offset_within_address_space,
>>>> >> -            int128_get64(section->size));
>>>> >> -
>>>> >> -    /* protect region against writes; begin tracking it */
>>>> >> -    if (on) {
>>>> >> -        slot->flags |= HVF_SLOT_LOG;
>>>> >> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>>>> >> -                      HV_MEMORY_READ);
>>>> >> -    /* stop tracking region*/
>>>> >> -    } else {
>>>> >> -        slot->flags &= ~HVF_SLOT_LOG;
>>>> >> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>>>> >> -                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>>>> >> -    }
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_log_start(MemoryListener *listener,
>>>> >> -                          MemoryRegionSection *section, int old, int new)
>>>> >> -{
>>>> >> -    if (old != 0) {
>>>> >> -        return;
>>>> >> -    }
>>>> >> -
>>>> >> -    hvf_set_dirty_tracking(section, 1);
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_log_stop(MemoryListener *listener,
>>>> >> -                         MemoryRegionSection *section, int old, int new)
>>>> >> -{
>>>> >> -    if (new != 0) {
>>>> >> -        return;
>>>> >> -    }
>>>> >> -
>>>> >> -    hvf_set_dirty_tracking(section, 0);
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_log_sync(MemoryListener *listener,
>>>> >> -                         MemoryRegionSection *section)
>>>> >> -{
>>>> >> -    /*
>>>> >> -     * sync of dirty pages is handled elsewhere; just make sure we keep
>>>> >> -     * tracking the region.
>>>> >> -     */
>>>> >> -    hvf_set_dirty_tracking(section, 1);
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_region_add(MemoryListener *listener,
>>>> >> -                           MemoryRegionSection *section)
>>>> >> -{
>>>> >> -    hvf_set_phys_mem(section, true);
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_region_del(MemoryListener *listener,
>>>> >> -                           MemoryRegionSection *section)
>>>> >> -{
>>>> >> -    hvf_set_phys_mem(section, false);
>>>> >> -}
>>>> >> -
>>>> >> -static MemoryListener hvf_memory_listener = {
>>>> >> -    .priority = 10,
>>>> >> -    .region_add = hvf_region_add,
>>>> >> -    .region_del = hvf_region_del,
>>>> >> -    .log_start = hvf_log_start,
>>>> >> -    .log_stop = hvf_log_stop,
>>>> >> -    .log_sync = hvf_log_sync,
>>>> >> -};
>>>> >> -
>>>> >> -void hvf_vcpu_destroy(CPUState *cpu)
>>>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu)
>>>> >>   {
>>>> >>       X86CPU *x86_cpu = X86_CPU(cpu);
>>>> >>       CPUX86State *env = &x86_cpu->env;
>>>> >>
>>>> >> -    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
>>>> >>       g_free(env->hvf_mmio_buf);
>>>> >> -    assert_hvf_ok(ret);
>>>> >> -}
>>>> >> -
>>>> >> -static void dummy_signal(int sig)
>>>> >> -{
>>>> >>   }
>>>> >>
>>>> >> -int hvf_init_vcpu(CPUState *cpu)
>>>> >> +int hvf_arch_init_vcpu(CPUState *cpu)
>>>> >>   {
>>>> >>
>>>> >>       X86CPU *x86cpu = X86_CPU(cpu);
>>>> >>       CPUX86State *env = &x86cpu->env;
>>>> >> -    int r;
>>>> >> -
>>>> >> -    /* init cpu signals */
>>>> >> -    sigset_t set;
>>>> >> -    struct sigaction sigact;
>>>> >> -
>>>> >> -    memset(&sigact, 0, sizeof(sigact));
>>>> >> -    sigact.sa_handler = dummy_signal;
>>>> >> -    sigaction(SIG_IPI, &sigact, NULL);
>>>> >> -
>>>> >> -    pthread_sigmask(SIG_BLOCK, NULL, &set);
>>>> >> -    sigdelset(&set, SIG_IPI);
>>>> >>
>>>> >>       init_emu();
>>>> >>       init_decoder();
>>>> >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu)
>>>> >>       hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1);
>>>> >>       env->hvf_mmio_buf = g_new(char, 4096);
>>>> >>
>>>> >> -    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>>>> >> -    cpu->vcpu_dirty = 1;
>>>> >> -    assert_hvf_ok(r);
>>>> >> -
>>>> >>       if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
>>>> >>           &hvf_state->hvf_caps->vmx_cap_pinbased)) {
>>>> >>           abort();
>>>> >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu)
>>>> >>
>>>> >>       return ret;
>>>> >>   }
>>>> >> -
>>>> >> -bool hvf_allowed;
>>>> >> -
>>>> >> -static int hvf_accel_init(MachineState *ms)
>>>> >> -{
>>>> >> -    int x;
>>>> >> -    hv_return_t ret;
>>>> >> -    HVFState *s;
>>>> >> -
>>>> >> -    ret = hv_vm_create(HV_VM_DEFAULT);
>>>> >> -    assert_hvf_ok(ret);
>>>> >> -
>>>> >> -    s = g_new0(HVFState, 1);
>>>> >> -
>>>> >> -    s->num_slots = 32;
>>>> >> -    for (x = 0; x < s->num_slots; ++x) {
>>>> >> -        s->slots[x].size = 0;
>>>> >> -        s->slots[x].slot_id = x;
>>>> >> -    }
>>>> >> -
>>>> >> -    hvf_state = s;
>>>> >> -    memory_listener_register(&hvf_memory_listener, &address_space_memory);
>>>> >> -    cpus_register_accel(&hvf_cpus);
>>>> >> -    return 0;
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_accel_class_init(ObjectClass *oc, void *data)
>>>> >> -{
>>>> >> -    AccelClass *ac = ACCEL_CLASS(oc);
>>>> >> -    ac->name = "HVF";
>>>> >> -    ac->init_machine = hvf_accel_init;
>>>> >> -    ac->allowed = &hvf_allowed;
>>>> >> -}
>>>> >> -
>>>> >> -static const TypeInfo hvf_accel_type = {
>>>> >> -    .name = TYPE_HVF_ACCEL,
>>>> >> -    .parent = TYPE_ACCEL,
>>>> >> -    .class_init = hvf_accel_class_init,
>>>> >> -};
>>>> >> -
>>>> >> -static void hvf_type_init(void)
>>>> >> -{
>>>> >> -    type_register_static(&hvf_accel_type);
>>>> >> -}
>>>> >> -
>>>> >> -type_init(hvf_type_init);
>>>> >> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
>>>> >> index 409c9a3f14..c8a43717ee 100644
>>>> >> --- a/target/i386/hvf/meson.build
>>>> >> +++ b/target/i386/hvf/meson.build
>>>> >> @@ -1,6 +1,5 @@
>>>> >>   i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
>>>> >>     'hvf.c',
>>>> >> -  'hvf-cpus.c',
>>>> >>     'x86.c',
>>>> >>     'x86_cpuid.c',
>>>> >>     'x86_decode.c',
>>>> >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
>>>> >> index bbec412b6c..89b8e9d87a 100644
>>>> >> --- a/target/i386/hvf/x86hvf.c
>>>> >> +++ b/target/i386/hvf/x86hvf.c
>>>> >> @@ -20,6 +20,9 @@
>>>> >>   #include "qemu/osdep.h"
>>>> >>
>>>> >>   #include "qemu-common.h"
>>>> >> +#include "sysemu/hvf.h"
>>>> >> +#include "sysemu/hvf_int.h"
>>>> >> +#include "sysemu/hw_accel.h"
>>>> >>   #include "x86hvf.h"
>>>> >>   #include "vmx.h"
>>>> >>   #include "vmcs.h"
>>>> >> @@ -32,8 +35,6 @@
>>>> >>   #include <Hypervisor/hv.h>
>>>> >>   #include <Hypervisor/hv_vmx.h>
>>>> >>
>>>> >> -#include "hvf-cpus.h"
>>>> >> -
>>>> >>   void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
>>>> >>                        SegmentCache *qseg, bool is_tr)
>>>> >>   {
>>>> >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state)
>>>> >>       env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
>>>> >>
>>>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
>>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>>> >> +        cpu_synchronize_state(cpu_state);
>>>> >>           do_cpu_init(cpu);
>>>> >>       }
>>>> >>
>>>> >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state)
>>>> >>           cpu_state->halted = 0;
>>>> >>       }
>>>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
>>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>>> >> +        cpu_synchronize_state(cpu_state);
>>>> >>           do_cpu_sipi(cpu);
>>>> >>       }
>>>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
>>>> >>           cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
>>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>>> >> +        cpu_synchronize_state(cpu_state);
>>>> > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should
>>>> > be a separate patch. They follow the cpu/accel cleanups Claudio was doing
>>>> > over the summer.
>>>>
>>>>
>>>> The only reason they're in here is because we no longer have access to
>>>> the hvf_ functions from the file. I am perfectly happy to rebase the
>>>> patch on top of Claudio's if his goes in first. I'm sure it'll be
>>>> trivial for him to rebase on top of this too if my series goes in first.
>>>>
>>>>
>>>> >
>>>> > Philippe raised the idea that the patch might go ahead of the ARM-specific
>>>> > part (which might involve some discussions) and I agree with that.
>>>> >
>>>> > Some sync between Claudio's series (CC'd him) and the patch might be needed.
>>>>
>>>>
>>>> I would prefer not to hold back because of the sync. Claudio's cleanup
>>>> is trivial enough to adjust for if it gets merged ahead of this.
>>>>
>>>>
>>>> Alex
>>>>
>>>>
>>>>
Alexander Graf Nov. 30, 2020, 9:40 p.m. UTC | #8
Hi Peter,

On 30.11.20 22:08, Peter Collingbourne wrote:
> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
>>
>>
>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
>>> Hi Frank,
>>>
>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse.
>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either!
>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out
>>>
>>>    https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
>>>
>> Thanks, we'll take a look :)
>>
>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold.
> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
> I'll reply to your patch here. You have:
>
> +                    /* Set cpu->hvf->sleeping so that we get a
> SIG_IPI signal. */
> +                    cpu->hvf->sleeping = true;
> +                    smp_mb();
> +
> +                    /* Bail out if we received an IRQ meanwhile */
> +                    if (cpu->thread_kicked || (cpu->interrupt_request &
> +                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> +                        cpu->hvf->sleeping = false;
> +                        break;
> +                    }
> +
> +                    /* nanosleep returns on signal, so we wake up on kick. */
> +                    nanosleep(ts, NULL);
>
> and then send the signal conditional on whether sleeping is true, but
> I think this is racy. If the signal is sent after sleeping is set to
> true but before entering nanosleep then I think it will be ignored and
> we will miss the wakeup. That's why in my implementation I block IPI
> on the CPU thread at startup and then use pselect to atomically
> unblock and begin sleeping. The signal is sent unconditionally so
> there's no need to worry about races between actually sleeping and the
> "we think we're sleeping" state. It may lead to an extra wakeup but
> that's better than missing it entirely.


Thanks a bunch for the comment! So the trick I was using here is to 
modify the timespec from the kick function before sending the IPI 
signal. That way, we know that either we are inside the sleep (where the 
signal wakes it up) or we are outside the sleep (where timespec={} will 
make it return immediately).

The only race I can think of is if nanosleep does calculations based on 
the timespec and we happen to send the signal right there and then.
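
As a rough illustration of the timespec trick described above (a minimal sketch, not the code from the patch): the struct and function names below are hypothetical, the signal delivery itself is left out, and the memory barriers a real multi-threaded version would need are omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical per-vCPU sleep state; the field name is illustrative only. */
struct vcpu_sleep {
    struct timespec ts;   /* timeout the vCPU thread hands to nanosleep() */
};

/* Kick side: zero the timeout first, then (in real code) send the signal. */
static void kick_vcpu(struct vcpu_sleep *s)
{
    assert(s != NULL);
    s->ts.tv_sec = 0;
    s->ts.tv_nsec = 0;
    /* A real kick would now send SIG_IPI to the vCPU thread. */
}

/*
 * vCPU side: sleep for whatever timeout is currently stored and return
 * the elapsed time in nanoseconds. If the kick ran before we got here,
 * the timeout is already zero and nanosleep() returns immediately.
 */
static long long sleep_until_kick(struct vcpu_sleep *s)
{
    struct timespec before, after;

    clock_gettime(CLOCK_MONOTONIC, &before);
    nanosleep(&s->ts, NULL);
    clock_gettime(CLOCK_MONOTONIC, &after);

    return (after.tv_sec - before.tv_sec) * 1000000000LL +
           (after.tv_nsec - before.tv_nsec);
}
```

The sketch only covers the "kick arrives before the sleep" case. It does not close the window mentioned above, where the kick lands after nanosleep() has already read the timespec but before the thread is actually asleep.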

The problem with blocking IPIs is basically what Frank was describing 
earlier: How do you unset the IPI signal pending status? If the signal 
is never delivered, how can pselect differentiate "signal from last time 
is still pending" from "new signal because I got an IPI"?
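
For reference, the block-then-pselect pattern Peter describes can be sketched as follows. This is a minimal sketch under stated assumptions, not either implementation: SIGUSR1 stands in for SIG_IPI, the helper names are hypothetical, and sigprocmask() is used where threaded code would use pthread_sigmask(). In this sketch a signal left pending from an earlier kick is delivered as soon as pselect() unblocks it, so the handler runs and the pending state is consumed; the cost is the extra wakeup mentioned in the quoted text.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

#define KICK_SIG SIGUSR1   /* stand-in for QEMU's SIG_IPI (assumption) */

static volatile sig_atomic_t kicks_seen;

static void kick_handler(int sig)
{
    (void)sig;
    kicks_seen++;   /* delivery to the handler clears the pending state */
}

/* Thread startup: install the handler and keep KICK_SIG blocked. */
static void vcpu_thread_setup(void)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = kick_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(KICK_SIG, &sa, NULL);

    sigemptyset(&block);
    sigaddset(&block, KICK_SIG);
    sigprocmask(SIG_BLOCK, &block, NULL);
}

/* Returns 1 if woken by a kick, 0 on timeout, -1 on any other error. */
static int wait_for_kick(const struct timespec *timeout)
{
    sigset_t during_sleep;
    int r;

    /* Current mask minus KICK_SIG: unblocked only while inside pselect(). */
    sigprocmask(SIG_BLOCK, NULL, &during_sleep);
    sigdelset(&during_sleep, KICK_SIG);

    r = pselect(0, NULL, NULL, NULL, timeout, &during_sleep);
    if (r == 0) {
        return 0;
    }
    return (r < 0 && errno == EINTR) ? 1 : -1;
}
```

Because the mask swap and the sleep happen atomically inside pselect(), a kick sent between the "should I sleep?" check and the sleep itself stays pending and interrupts the wait instead of being lost.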


Alex
Peter Maydell Nov. 30, 2020, 10:10 p.m. UTC | #9
On Mon, 30 Nov 2020 at 20:56, Frank Yang <lfy@google.com> wrote:
> We'd actually like to contribute upstream too :) We do want to maintain
> our own downstream though; Android Emulator codebase needs to work
> solidly on macos and windows which has made keeping up with upstream difficult

One of the main reasons OSX and Windows support upstream is
not so great is that very few people are helping to develop,
test and support it upstream. The way to fix that IMHO is for more
people who do care about those platforms to actively engage
with us upstream to help move those platforms closer to
being first class citizens. If you stay on a downstream fork
forever then I don't think you'll ever see things improve.

thanks
-- PMM
Peter Collingbourne Nov. 30, 2020, 10:46 p.m. UTC | #10
On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
>
>
>
> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
>>
>> Hi Frank,
>>
>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse.
>
> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either!

We tracked down the discrepancies between CNTPCT_EL0 on the guest and
on the host. CNTPCT_EL0 on the guest does not increment while the
system is asleep, so it corresponds to mach_absolute_time() on the host
(if you read the XNU sources you will see that mach_absolute_time() is
implemented as CNTPCT_EL0 plus a constant representing the time spent
asleep). CNTPCT_EL0 on the host, by contrast, does increment while
asleep. This patch switches the implementation over to using
mach_absolute_time() instead of reading CNTPCT_EL0 directly:

https://android-review.googlesource.com/c/platform/external/qemu/+/1514870
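
Once "now" and the timer compare value come from the same time base, the WFI sleep length is a plain tick-to-nanosecond conversion. A minimal sketch, assuming a hypothetical helper name and a caller-supplied counter frequency; on macOS the "now" argument would come from mach_absolute_time() as described above, which is not called here so the snippet stays portable.

```c
#include <stdint.h>

/*
 * Nanoseconds until the timer compare value (cval) is reached, or 0 if
 * the deadline has already passed. "now" and "cval" must be counter
 * values from the same time base; freq_hz is the counter frequency.
 */
static uint64_t deadline_ns(uint64_t now, uint64_t cval, uint64_t freq_hz)
{
    uint64_t delta, secs;

    if (freq_hz == 0 || cval <= now) {
        return 0;
    }

    delta = cval - now;
    secs = delta / freq_hz;

    /* Saturate rather than overflow for very distant deadlines. */
    if (secs > UINT64_MAX / 1000000000ULL - 1) {
        return UINT64_MAX;
    }

    /* Split the conversion so delta * 1e9 never has to fit in 64 bits. */
    return secs * 1000000000ULL +
           (delta % freq_hz) * 1000000000ULL / freq_hz;
}
```

For example, with a 24 MHz counter (an assumed frequency for illustration), a compare value 24 ticks ahead of "now" gives a 1000 ns sleep.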

Peter

>>
>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out
>>
>>   https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
>>
>
> Thanks, we'll take a look :)
>
>>
>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold.
>>
>> Also, is there a particular reason you're working on this super interesting and useful code in a random downstream fork of QEMU? Wouldn't it be more helpful to contribute to the upstream code base instead?
>
> We'd actually like to contribute upstream too :) We do want to maintain our own downstream though; the Android Emulator codebase needs to work solidly on macOS and Windows, which has made keeping up with upstream difficult, and staying on a previous version (2.12) with known quirks easier. (There's also some Android-related customization relating to the Qt UI, a different set of virtual devices, and snapshot support (incl. snapshots of graphics devices with OpenGLES state tracking), which we hope to separate into other libraries/processes, but it's not insignificant.)
>>
>>
>> Alex
>>
>>
>> On 30.11.20 21:15, Frank Yang wrote:
>>
>> Update: We're not quite sure how to compare the CNTV_CVAL and CNTVCT. But the high CPU usage seems to be mitigated by having a poll interval (like KVM does) in handling WFI:
>>
>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512501
>>
>> This is loosely inspired by https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766 which does seem to specify a poll interval.
>>
>> It would be cool if we could have a lightweight way to enter sleep and restart the vcpus precisely when CVAL passes, though.
>>
>> Frank
>>
>>
>> On Fri, Nov 27, 2020 at 3:30 PM Frank Yang <lfy@google.com> wrote:
>>>
>>> Hi all,
>>>
>>> +Peter Collingbourne
>>>
>>> I'm a developer on the Android Emulator, which is in a fork of QEMU.
>>>
>>> Peter and I have been working on an HVF Apple Silicon backend with an eye toward Android guests.
>>>
>>> We have gotten things to basically switch to Android userspace already (logcat/shell and graphics available at least)
>>>
>>> Our strategy so far has been to import logic from the KVM implementation and hook into QEMU's software devices that were previously assumed to work only with TCG, or that have KVM-specific paths.
>>>
>>> Thanks to Alexander for the tip on the 36-bit address space limitation btw; our way of addressing this is to still allow highmem but not put pci high mmio so high.
>>>
>>> Also, note we have a sleep/signal based mechanism to deal with WFx, which might be worth looking into in Alexander's implementation as well:
>>>
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551
>>>
>>> Patches so far, FYI:
>>>
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3
>>>
>>> https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a
>>> https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b
>>> https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01
>>> https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228
>>> https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102
>>> https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6
>>>
>>> Peter's also noticed that there are extra steps needed for M1's to allow TCG to work, as it involves JIT:
>>>
>>> https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9
>>>
>>> We'd appreciate any feedback/comments :)
>>>
>>> Best,
>>>
>>> Frank
>>>
>>> On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf <agraf@csgraf.de> wrote:
>>>>
>>>>
>>>> On 27.11.20 21:00, Roman Bolshakov wrote:
>>>> > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote:
>>>> >> Until now, Hypervisor.framework has only been available on x86_64 systems.
>>>> >> With Apple Silicon shipping now, it extends its reach to aarch64. To
>>>> >> prepare for support for multiple architectures, let's move common code out
>>>> >> into its own accel directory.
>>>> >>
>>>> >> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>>>> >> ---
>>>> >>   MAINTAINERS                 |   9 +-
>>>> >>   accel/hvf/hvf-all.c         |  56 +++++
>>>> >>   accel/hvf/hvf-cpus.c        | 468 ++++++++++++++++++++++++++++++++++++
>>>> >>   accel/hvf/meson.build       |   7 +
>>>> >>   accel/meson.build           |   1 +
>>>> >>   include/sysemu/hvf_int.h    |  69 ++++++
>>>> >>   target/i386/hvf/hvf-cpus.c  | 131 ----------
>>>> >>   target/i386/hvf/hvf-cpus.h  |  25 --
>>>> >>   target/i386/hvf/hvf-i386.h  |  48 +---
>>>> >>   target/i386/hvf/hvf.c       | 360 +--------------------------
>>>> >>   target/i386/hvf/meson.build |   1 -
>>>> >>   target/i386/hvf/x86hvf.c    |  11 +-
>>>> >>   target/i386/hvf/x86hvf.h    |   2 -
>>>> >>   13 files changed, 619 insertions(+), 569 deletions(-)
>>>> >>   create mode 100644 accel/hvf/hvf-all.c
>>>> >>   create mode 100644 accel/hvf/hvf-cpus.c
>>>> >>   create mode 100644 accel/hvf/meson.build
>>>> >>   create mode 100644 include/sysemu/hvf_int.h
>>>> >>   delete mode 100644 target/i386/hvf/hvf-cpus.c
>>>> >>   delete mode 100644 target/i386/hvf/hvf-cpus.h
>>>> >>
>>>> >> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> >> index 68bc160f41..ca4b6d9279 100644
>>>> >> --- a/MAINTAINERS
>>>> >> +++ b/MAINTAINERS
>>>> >> @@ -444,9 +444,16 @@ M: Cameron Esfahani <dirty@apple.com>
>>>> >>   M: Roman Bolshakov <r.bolshakov@yadro.com>
>>>> >>   W: https://wiki.qemu.org/Features/HVF
>>>> >>   S: Maintained
>>>> >> -F: accel/stubs/hvf-stub.c
>>>> > There was a patch for that in the RFC series from Claudio.
>>>>
>>>>
>>>> Yeah, I'm not worried about this hunk :).
>>>>
>>>>
>>>> >
>>>> >>   F: target/i386/hvf/
>>>> >> +
>>>> >> +HVF
>>>> >> +M: Cameron Esfahani <dirty@apple.com>
>>>> >> +M: Roman Bolshakov <r.bolshakov@yadro.com>
>>>> >> +W: https://wiki.qemu.org/Features/HVF
>>>> >> +S: Maintained
>>>> >> +F: accel/hvf/
>>>> >>   F: include/sysemu/hvf.h
>>>> >> +F: include/sysemu/hvf_int.h
>>>> >>
>>>> >>   WHPX CPUs
>>>> >>   M: Sunil Muthuswamy <sunilmut@microsoft.com>
>>>> >> diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
>>>> >> new file mode 100644
>>>> >> index 0000000000..47d77a472a
>>>> >> --- /dev/null
>>>> >> +++ b/accel/hvf/hvf-all.c
>>>> >> @@ -0,0 +1,56 @@
>>>> >> +/*
>>>> >> + * QEMU Hypervisor.framework support
>>>> >> + *
>>>> >> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>>> >> + * the COPYING file in the top-level directory.
>>>> >> + *
>>>> >> + * Contributions after 2012-01-13 are licensed under the terms of the
>>>> >> + * GNU GPL, version 2 or (at your option) any later version.
>>>> >> + */
>>>> >> +
>>>> >> +#include "qemu/osdep.h"
>>>> >> +#include "qemu-common.h"
>>>> >> +#include "qemu/error-report.h"
>>>> >> +#include "sysemu/hvf.h"
>>>> >> +#include "sysemu/hvf_int.h"
>>>> >> +#include "sysemu/runstate.h"
>>>> >> +
>>>> >> +#include "qemu/main-loop.h"
>>>> >> +#include "sysemu/accel.h"
>>>> >> +
>>>> >> +#include <Hypervisor/Hypervisor.h>
>>>> >> +
>>>> >> +bool hvf_allowed;
>>>> >> +HVFState *hvf_state;
>>>> >> +
>>>> >> +void assert_hvf_ok(hv_return_t ret)
>>>> >> +{
>>>> >> +    if (ret == HV_SUCCESS) {
>>>> >> +        return;
>>>> >> +    }
>>>> >> +
>>>> >> +    switch (ret) {
>>>> >> +    case HV_ERROR:
>>>> >> +        error_report("Error: HV_ERROR");
>>>> >> +        break;
>>>> >> +    case HV_BUSY:
>>>> >> +        error_report("Error: HV_BUSY");
>>>> >> +        break;
>>>> >> +    case HV_BAD_ARGUMENT:
>>>> >> +        error_report("Error: HV_BAD_ARGUMENT");
>>>> >> +        break;
>>>> >> +    case HV_NO_RESOURCES:
>>>> >> +        error_report("Error: HV_NO_RESOURCES");
>>>> >> +        break;
>>>> >> +    case HV_NO_DEVICE:
>>>> >> +        error_report("Error: HV_NO_DEVICE");
>>>> >> +        break;
>>>> >> +    case HV_UNSUPPORTED:
>>>> >> +        error_report("Error: HV_UNSUPPORTED");
>>>> >> +        break;
>>>> >> +    default:
>>>> >> +        error_report("Unknown Error");
>>>> >> +    }
>>>> >> +
>>>> >> +    abort();
>>>> >> +}
>>>> >> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
>>>> >> new file mode 100644
>>>> >> index 0000000000..f9bb5502b7
>>>> >> --- /dev/null
>>>> >> +++ b/accel/hvf/hvf-cpus.c
>>>> >> @@ -0,0 +1,468 @@
>>>> >> +/*
>>>> >> + * Copyright 2008 IBM Corporation
>>>> >> + *           2008 Red Hat, Inc.
>>>> >> + * Copyright 2011 Intel Corporation
>>>> >> + * Copyright 2016 Veertu, Inc.
>>>> >> + * Copyright 2017 The Android Open Source Project
>>>> >> + *
>>>> >> + * QEMU Hypervisor.framework support
>>>> >> + *
>>>> >> + * This program is free software; you can redistribute it and/or
>>>> >> + * modify it under the terms of version 2 of the GNU General Public
>>>> >> + * License as published by the Free Software Foundation.
>>>> >> + *
>>>> >> + * This program is distributed in the hope that it will be useful,
>>>> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>> >> + * General Public License for more details.
>>>> >> + *
>>>> >> + * You should have received a copy of the GNU General Public License
>>>> >> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>>>> >> + *
>>>> >> + * This file contain code under public domain from the hvdos project:
>>>> >> + * https://github.com/mist64/hvdos
>>>> >> + *
>>>> >> + * Parts Copyright (c) 2011 NetApp, Inc.
>>>> >> + * All rights reserved.
>>>> >> + *
>>>> >> + * Redistribution and use in source and binary forms, with or without
>>>> >> + * modification, are permitted provided that the following conditions
>>>> >> + * are met:
>>>> >> + * 1. Redistributions of source code must retain the above copyright
>>>> >> + *    notice, this list of conditions and the following disclaimer.
>>>> >> + * 2. Redistributions in binary form must reproduce the above copyright
>>>> >> + *    notice, this list of conditions and the following disclaimer in the
>>>> >> + *    documentation and/or other materials provided with the distribution.
>>>> >> + *
>>>> >> + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>>>> >> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>>>> >> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>>>> >> + * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
>>>> >> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>>>> >> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>>>> >> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>>>> >> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>>>> >> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>>>> >> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>>>> >> + * SUCH DAMAGE.
>>>> >> + */
>>>> >> +
>>>> >> +#include "qemu/osdep.h"
>>>> >> +#include "qemu/error-report.h"
>>>> >> +#include "qemu/main-loop.h"
>>>> >> +#include "exec/address-spaces.h"
>>>> >> +#include "exec/exec-all.h"
>>>> >> +#include "sysemu/cpus.h"
>>>> >> +#include "sysemu/hvf.h"
>>>> >> +#include "sysemu/hvf_int.h"
>>>> >> +#include "sysemu/runstate.h"
>>>> >> +#include "qemu/guest-random.h"
>>>> >> +
>>>> >> +#include <Hypervisor/Hypervisor.h>
>>>> >> +
>>>> >> +/* Memory slots */
>>>> >> +
>>>> >> +struct mac_slot {
>>>> >> +    int present;
>>>> >> +    uint64_t size;
>>>> >> +    uint64_t gpa_start;
>>>> >> +    uint64_t gva;
>>>> >> +};
>>>> >> +
>>>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>>>> >> +{
>>>> >> +    hvf_slot *slot;
>>>> >> +    int x;
>>>> >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>>>> >> +        slot = &hvf_state->slots[x];
>>>> >> +        if (slot->size && start < (slot->start + slot->size) &&
>>>> >> +            (start + size) > slot->start) {
>>>> >> +            return slot;
>>>> >> +        }
>>>> >> +    }
>>>> >> +    return NULL;
>>>> >> +}
>>>> >> +
>>>> >> +struct mac_slot mac_slots[32];
>>>> >> +
>>>> >> +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
>>>> >> +{
>>>> >> +    struct mac_slot *macslot;
>>>> >> +    hv_return_t ret;
>>>> >> +
>>>> >> +    macslot = &mac_slots[slot->slot_id];
>>>> >> +
>>>> >> +    if (macslot->present) {
>>>> >> +        if (macslot->size != slot->size) {
>>>> >> +            macslot->present = 0;
>>>> >> +            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
>>>> >> +            assert_hvf_ok(ret);
>>>> >> +        }
>>>> >> +    }
>>>> >> +
>>>> >> +    if (!slot->size) {
>>>> >> +        return 0;
>>>> >> +    }
>>>> >> +
>>>> >> +    macslot->present = 1;
>>>> >> +    macslot->gpa_start = slot->start;
>>>> >> +    macslot->size = slot->size;
>>>> >> +    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
>>>> >> +    assert_hvf_ok(ret);
>>>> >> +    return 0;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>>>> >> +{
>>>> >> +    hvf_slot *mem;
>>>> >> +    MemoryRegion *area = section->mr;
>>>> >> +    bool writeable = !area->readonly && !area->rom_device;
>>>> >> +    hv_memory_flags_t flags;
>>>> >> +
>>>> >> +    if (!memory_region_is_ram(area)) {
>>>> >> +        if (writeable) {
>>>> >> +            return;
>>>> >> +        } else if (!memory_region_is_romd(area)) {
>>>> >> +            /*
>>>> >> +             * If the memory device is not in romd_mode, then we actually want
>>>> >> +             * to remove the hvf memory slot so all accesses will trap.
>>>> >> +             */
>>>> >> +             add = false;
>>>> >> +        }
>>>> >> +    }
>>>> >> +
>>>> >> +    mem = hvf_find_overlap_slot(
>>>> >> +            section->offset_within_address_space,
>>>> >> +            int128_get64(section->size));
>>>> >> +
>>>> >> +    if (mem && add) {
>>>> >> +        if (mem->size == int128_get64(section->size) &&
>>>> >> +            mem->start == section->offset_within_address_space &&
>>>> >> +            mem->mem == (memory_region_get_ram_ptr(area) +
>>>> >> +            section->offset_within_region)) {
>>>> >> +            return; /* Same region was attempted to register, go away. */
>>>> >> +        }
>>>> >> +    }
>>>> >> +
>>>> >> +    /* Region needs to be reset. set the size to 0 and remap it. */
>>>> >> +    if (mem) {
>>>> >> +        mem->size = 0;
>>>> >> +        if (do_hvf_set_memory(mem, 0)) {
>>>> >> +            error_report("Failed to reset overlapping slot");
>>>> >> +            abort();
>>>> >> +        }
>>>> >> +    }
>>>> >> +
>>>> >> +    if (!add) {
>>>> >> +        return;
>>>> >> +    }
>>>> >> +
>>>> >> +    if (area->readonly ||
>>>> >> +        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
>>>> >> +        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>>>> >> +    } else {
>>>> >> +        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
>>>> >> +    }
>>>> >> +
>>>> >> +    /* Now make a new slot. */
>>>> >> +    int x;
>>>> >> +
>>>> >> +    for (x = 0; x < hvf_state->num_slots; ++x) {
>>>> >> +        mem = &hvf_state->slots[x];
>>>> >> +        if (!mem->size) {
>>>> >> +            break;
>>>> >> +        }
>>>> >> +    }
>>>> >> +
>>>> >> +    if (x == hvf_state->num_slots) {
>>>> >> +        error_report("No free slots");
>>>> >> +        abort();
>>>> >> +    }
>>>> >> +
>>>> >> +    mem->size = int128_get64(section->size);
>>>> >> +    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
>>>> >> +    mem->start = section->offset_within_address_space;
>>>> >> +    mem->region = area;
>>>> >> +
>>>> >> +    if (do_hvf_set_memory(mem, flags)) {
>>>> >> +        error_report("Error registering new memory slot");
>>>> >> +        abort();
>>>> >> +    }
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
>>>> >> +{
>>>> >> +    hvf_slot *slot;
>>>> >> +
>>>> >> +    slot = hvf_find_overlap_slot(
>>>> >> +            section->offset_within_address_space,
>>>> >> +            int128_get64(section->size));
>>>> >> +
>>>> >> +    /* protect region against writes; begin tracking it */
>>>> >> +    if (on) {
>>>> >> +        slot->flags |= HVF_SLOT_LOG;
>>>> >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>>>> >> +                      HV_MEMORY_READ);
>>>> >> +    /* stop tracking region*/
>>>> >> +    } else {
>>>> >> +        slot->flags &= ~HVF_SLOT_LOG;
>>>> >> +        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
>>>> >> +                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>>>> >> +    }
>>>> >> +}
>>>> >> +
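
The bookkeeping side of hvf_set_dirty_tracking() above comes down to setting or clearing one bit in slot->flags. hvf_find_overlap_slot() can return NULL, so the sketch below also shows one way a caller could guard against that. It is only a sketch under assumptions: struct slot_sketch and slot_set_logging are hypothetical names, the struct is a stripped-down stand-in for hvf_slot, and the hv_vm_protect() calls are left out on purpose.

```c
#include <stddef.h>
#include <stdint.h>

#define HVF_SLOT_LOG (1 << 0)   /* same value as in hvf_int.h in this patch */

/* Reduced stand-in for hvf_slot; only the field used here. */
struct slot_sketch {
    uint32_t flags;
};

/*
 * Hypothetical helper (not in the patch): toggle the logging bit.
 * Returns 0 on success, -1 if no slot was found (slot == NULL).
 */
static int slot_set_logging(struct slot_sketch *slot, int on)
{
    if (!slot) {
        return -1;
    }
    if (on) {
        slot->flags |= HVF_SLOT_LOG;
    } else {
        slot->flags &= ~(uint32_t)HVF_SLOT_LOG;
    }
    return 0;
}
```

Other bits in flags are left untouched, so the helper can be called repeatedly with the same value without side effects.
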
>>>> >> +static void hvf_log_start(MemoryListener *listener,
>>>> >> +                          MemoryRegionSection *section, int old, int new)
>>>> >> +{
>>>> >> +    if (old != 0) {
>>>> >> +        return;
>>>> >> +    }
>>>> >> +
>>>> >> +    hvf_set_dirty_tracking(section, 1);
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_log_stop(MemoryListener *listener,
>>>> >> +                         MemoryRegionSection *section, int old, int new)
>>>> >> +{
>>>> >> +    if (new != 0) {
>>>> >> +        return;
>>>> >> +    }
>>>> >> +
>>>> >> +    hvf_set_dirty_tracking(section, 0);
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_log_sync(MemoryListener *listener,
>>>> >> +                         MemoryRegionSection *section)
>>>> >> +{
>>>> >> +    /*
>>>> >> +     * sync of dirty pages is handled elsewhere; just make sure we keep
>>>> >> +     * tracking the region.
>>>> >> +     */
>>>> >> +    hvf_set_dirty_tracking(section, 1);
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_region_add(MemoryListener *listener,
>>>> >> +                           MemoryRegionSection *section)
>>>> >> +{
>>>> >> +    hvf_set_phys_mem(section, true);
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_region_del(MemoryListener *listener,
>>>> >> +                           MemoryRegionSection *section)
>>>> >> +{
>>>> >> +    hvf_set_phys_mem(section, false);
>>>> >> +}
>>>> >> +
>>>> >> +static MemoryListener hvf_memory_listener = {
>>>> >> +    .priority = 10,
>>>> >> +    .region_add = hvf_region_add,
>>>> >> +    .region_del = hvf_region_del,
>>>> >> +    .log_start = hvf_log_start,
>>>> >> +    .log_stop = hvf_log_stop,
>>>> >> +    .log_sync = hvf_log_sync,
>>>> >> +};
>>>> >> +
>>>> >> +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>>>> >> +{
>>>> >> +    if (!cpu->vcpu_dirty) {
>>>> >> +        hvf_get_registers(cpu);
>>>> >> +        cpu->vcpu_dirty = true;
>>>> >> +    }
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_cpu_synchronize_state(CPUState *cpu)
>>>> >> +{
>>>> >> +    if (!cpu->vcpu_dirty) {
>>>> >> +        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
>>>> >> +    }
>>>> >> +}
>>>> >> +
>>>> >> +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>>>> >> +                                              run_on_cpu_data arg)
>>>> >> +{
>>>> >> +    hvf_put_registers(cpu);
>>>> >> +    cpu->vcpu_dirty = false;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>>>> >> +{
>>>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
>>>> >> +}
>>>> >> +
>>>> >> +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>>>> >> +                                             run_on_cpu_data arg)
>>>> >> +{
>>>> >> +    hvf_put_registers(cpu);
>>>> >> +    cpu->vcpu_dirty = false;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_cpu_synchronize_post_init(CPUState *cpu)
>>>> >> +{
>>>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
>>>> >> +}
>>>> >> +
>>>> >> +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>>>> >> +                                              run_on_cpu_data arg)
>>>> >> +{
>>>> >> +    cpu->vcpu_dirty = true;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>>>> >> +{
>>>> >> +    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_vcpu_destroy(CPUState *cpu)
>>>> >> +{
>>>> >> +    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
>>>> >> +    assert_hvf_ok(ret);
>>>> >> +
>>>> >> +    hvf_arch_vcpu_destroy(cpu);
>>>> >> +}
>>>> >> +
>>>> >> +static void dummy_signal(int sig)
>>>> >> +{
>>>> >> +}
>>>> >> +
>>>> >> +static int hvf_init_vcpu(CPUState *cpu)
>>>> >> +{
>>>> >> +    int r;
>>>> >> +
>>>> >> +    /* init cpu signals */
>>>> >> +    sigset_t set;
>>>> >> +    struct sigaction sigact;
>>>> >> +
>>>> >> +    memset(&sigact, 0, sizeof(sigact));
>>>> >> +    sigact.sa_handler = dummy_signal;
>>>> >> +    sigaction(SIG_IPI, &sigact, NULL);
>>>> >> +
>>>> >> +    pthread_sigmask(SIG_BLOCK, NULL, &set);
>>>> >> +    sigdelset(&set, SIG_IPI);
>>>> >> +
>>>> >> +#ifdef __aarch64__
>>>> >> +    r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL);
>>>> >> +#else
>>>> >> +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>>>> >> +#endif
>>>> > I think the first __aarch64__ bit fits better in the ARM part of the series.
>>>>
>>>>
>>>> Oops. Thanks for catching it! Yes, absolutely. It should be part of the
>>>> ARM enablement.
>>>>
>>>>
>>>> >
>>>> >> +    cpu->vcpu_dirty = 1;
>>>> >> +    assert_hvf_ok(r);
>>>> >> +
>>>> >> +    return hvf_arch_init_vcpu(cpu);
>>>> >> +}
>>>> >> +
>>>> >> +/*
>>>> >> + * The HVF-specific vCPU thread function. This one should only run when the host
>>>> >> + * CPU supports the VMX "unrestricted guest" feature.
>>>> >> + */
>>>> >> +static void *hvf_cpu_thread_fn(void *arg)
>>>> >> +{
>>>> >> +    CPUState *cpu = arg;
>>>> >> +
>>>> >> +    int r;
>>>> >> +
>>>> >> +    assert(hvf_enabled());
>>>> >> +
>>>> >> +    rcu_register_thread();
>>>> >> +
>>>> >> +    qemu_mutex_lock_iothread();
>>>> >> +    qemu_thread_get_self(cpu->thread);
>>>> >> +
>>>> >> +    cpu->thread_id = qemu_get_thread_id();
>>>> >> +    cpu->can_do_io = 1;
>>>> >> +    current_cpu = cpu;
>>>> >> +
>>>> >> +    hvf_init_vcpu(cpu);
>>>> >> +
>>>> >> +    /* signal CPU creation */
>>>> >> +    cpu_thread_signal_created(cpu);
>>>> >> +    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>>> >> +
>>>> >> +    do {
>>>> >> +        if (cpu_can_run(cpu)) {
>>>> >> +            r = hvf_vcpu_exec(cpu);
>>>> >> +            if (r == EXCP_DEBUG) {
>>>> >> +                cpu_handle_guest_debug(cpu);
>>>> >> +            }
>>>> >> +        }
>>>> >> +        qemu_wait_io_event(cpu);
>>>> >> +    } while (!cpu->unplug || cpu_can_run(cpu));
>>>> >> +
>>>> >> +    hvf_vcpu_destroy(cpu);
>>>> >> +    cpu_thread_signal_destroyed(cpu);
>>>> >> +    qemu_mutex_unlock_iothread();
>>>> >> +    rcu_unregister_thread();
>>>> >> +    return NULL;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_start_vcpu_thread(CPUState *cpu)
>>>> >> +{
>>>> >> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>>>> >> +
>>>> >> +    /*
>>>> >> +     * HVF currently does not support TCG, and only runs in
>>>> >> +     * unrestricted-guest mode.
>>>> >> +     */
>>>> >> +    assert(hvf_enabled());
>>>> >> +
>>>> >> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>>>> >> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>>> >> +    qemu_cond_init(cpu->halt_cond);
>>>> >> +
>>>> >> +    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>>>> >> +             cpu->cpu_index);
>>>> >> +    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
>>>> >> +                       cpu, QEMU_THREAD_JOINABLE);
>>>> >> +}
>>>> >> +
>>>> >> +static const CpusAccel hvf_cpus = {
>>>> >> +    .create_vcpu_thread = hvf_start_vcpu_thread,
>>>> >> +
>>>> >> +    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>>>> >> +    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>>>> >> +    .synchronize_state = hvf_cpu_synchronize_state,
>>>> >> +    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>>>> >> +};
>>>> >> +
>>>> >> +static int hvf_accel_init(MachineState *ms)
>>>> >> +{
>>>> >> +    int x;
>>>> >> +    hv_return_t ret;
>>>> >> +    HVFState *s;
>>>> >> +
>>>> >> +    ret = hv_vm_create(HV_VM_DEFAULT);
>>>> >> +    assert_hvf_ok(ret);
>>>> >> +
>>>> >> +    s = g_new0(HVFState, 1);
>>>> >> +
>>>> >> +    s->num_slots = 32;
>>>> >> +    for (x = 0; x < s->num_slots; ++x) {
>>>> >> +        s->slots[x].size = 0;
>>>> >> +        s->slots[x].slot_id = x;
>>>> >> +    }
>>>> >> +
>>>> >> +    hvf_state = s;
>>>> >> +    memory_listener_register(&hvf_memory_listener, &address_space_memory);
>>>> >> +    cpus_register_accel(&hvf_cpus);
>>>> >> +    return 0;
>>>> >> +}
>>>> >> +
>>>> >> +static void hvf_accel_class_init(ObjectClass *oc, void *data)
>>>> >> +{
>>>> >> +    AccelClass *ac = ACCEL_CLASS(oc);
>>>> >> +    ac->name = "HVF";
>>>> >> +    ac->init_machine = hvf_accel_init;
>>>> >> +    ac->allowed = &hvf_allowed;
>>>> >> +}
>>>> >> +
>>>> >> +static const TypeInfo hvf_accel_type = {
>>>> >> +    .name = TYPE_HVF_ACCEL,
>>>> >> +    .parent = TYPE_ACCEL,
>>>> >> +    .class_init = hvf_accel_class_init,
>>>> >> +};
>>>> >> +
>>>> >> +static void hvf_type_init(void)
>>>> >> +{
>>>> >> +    type_register_static(&hvf_accel_type);
>>>> >> +}
>>>> >> +
>>>> >> +type_init(hvf_type_init);
>>>> >> diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
>>>> >> new file mode 100644
>>>> >> index 0000000000..dfd6b68dc7
>>>> >> --- /dev/null
>>>> >> +++ b/accel/hvf/meson.build
>>>> >> @@ -0,0 +1,7 @@
>>>> >> +hvf_ss = ss.source_set()
>>>> >> +hvf_ss.add(files(
>>>> >> +  'hvf-all.c',
>>>> >> +  'hvf-cpus.c',
>>>> >> +))
>>>> >> +
>>>> >> +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
>>>> >> diff --git a/accel/meson.build b/accel/meson.build
>>>> >> index b26cca227a..6de12ce5d5 100644
>>>> >> --- a/accel/meson.build
>>>> >> +++ b/accel/meson.build
>>>> >> @@ -1,5 +1,6 @@
>>>> >>   softmmu_ss.add(files('accel.c'))
>>>> >>
>>>> >> +subdir('hvf')
>>>> >>   subdir('qtest')
>>>> >>   subdir('kvm')
>>>> >>   subdir('tcg')
>>>> >> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
>>>> >> new file mode 100644
>>>> >> index 0000000000..de9bad23a8
>>>> >> --- /dev/null
>>>> >> +++ b/include/sysemu/hvf_int.h
>>>> >> @@ -0,0 +1,69 @@
>>>> >> +/*
>>>> >> + * QEMU Hypervisor.framework (HVF) support
>>>> >> + *
>>>> >> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>>> >> + * See the COPYING file in the top-level directory.
>>>> >> + *
>>>> >> + */
>>>> >> +
>>>> >> +/* header to be included in HVF-specific code */
>>>> >> +
>>>> >> +#ifndef HVF_INT_H
>>>> >> +#define HVF_INT_H
>>>> >> +
>>>> >> +#include <Hypervisor/Hypervisor.h>
>>>> >> +
>>>> >> +#define HVF_MAX_VCPU 0x10
>>>> >> +
>>>> >> +extern struct hvf_state hvf_global;
>>>> >> +
>>>> >> +struct hvf_vm {
>>>> >> +    int id;
>>>> >> +    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>>>> >> +};
>>>> >> +
>>>> >> +struct hvf_state {
>>>> >> +    uint32_t version;
>>>> >> +    struct hvf_vm *vm;
>>>> >> +    uint64_t mem_quota;
>>>> >> +};
>>>> >> +
>>>> >> +/* hvf_slot flags */
>>>> >> +#define HVF_SLOT_LOG (1 << 0)
>>>> >> +
>>>> >> +typedef struct hvf_slot {
>>>> >> +    uint64_t start;
>>>> >> +    uint64_t size;
>>>> >> +    uint8_t *mem;
>>>> >> +    int slot_id;
>>>> >> +    uint32_t flags;
>>>> >> +    MemoryRegion *region;
>>>> >> +} hvf_slot;
>>>> >> +
>>>> >> +typedef struct hvf_vcpu_caps {
>>>> >> +    uint64_t vmx_cap_pinbased;
>>>> >> +    uint64_t vmx_cap_procbased;
>>>> >> +    uint64_t vmx_cap_procbased2;
>>>> >> +    uint64_t vmx_cap_entry;
>>>> >> +    uint64_t vmx_cap_exit;
>>>> >> +    uint64_t vmx_cap_preemption_timer;
>>>> >> +} hvf_vcpu_caps;
>>>> >> +
>>>> >> +struct HVFState {
>>>> >> +    AccelState parent;
>>>> >> +    hvf_slot slots[32];
>>>> >> +    int num_slots;
>>>> >> +
>>>> >> +    hvf_vcpu_caps *hvf_caps;
>>>> >> +};
>>>> >> +extern HVFState *hvf_state;
>>>> >> +
>>>> >> +void assert_hvf_ok(hv_return_t ret);
>>>> >> +int hvf_get_registers(CPUState *cpu);
>>>> >> +int hvf_put_registers(CPUState *cpu);
>>>> >> +int hvf_arch_init_vcpu(CPUState *cpu);
>>>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu);
>>>> >> +int hvf_vcpu_exec(CPUState *cpu);
>>>> >> +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>>>> >> +
>>>> >> +#endif
>>>> >> diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
>>>> >> deleted file mode 100644
>>>> >> index 817b3d7452..0000000000
>>>> >> --- a/target/i386/hvf/hvf-cpus.c
>>>> >> +++ /dev/null
>>>> >> @@ -1,131 +0,0 @@
>>>> >> -/*
>>>> >> - * Copyright 2008 IBM Corporation
>>>> >> - *           2008 Red Hat, Inc.
>>>> >> - * Copyright 2011 Intel Corporation
>>>> >> - * Copyright 2016 Veertu, Inc.
>>>> >> - * Copyright 2017 The Android Open Source Project
>>>> >> - *
>>>> >> - * QEMU Hypervisor.framework support
>>>> >> - *
>>>> >> - * This program is free software; you can redistribute it and/or
>>>> >> - * modify it under the terms of version 2 of the GNU General Public
>>>> >> - * License as published by the Free Software Foundation.
>>>> >> - *
>>>> >> - * This program is distributed in the hope that it will be useful,
>>>> >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>> >> - * General Public License for more details.
>>>> >> - *
>>>> >> - * You should have received a copy of the GNU General Public License
>>>> >> - * along with this program; if not, see <http://www.gnu.org/licenses/>.
>>>> >> - *
>>>> >> - * This file contain code under public domain from the hvdos project:
>>>> >> - * https://github.com/mist64/hvdos
>>>> >> - *
>>>> >> - * Parts Copyright (c) 2011 NetApp, Inc.
>>>> >> - * All rights reserved.
>>>> >> - *
>>>> >> - * Redistribution and use in source and binary forms, with or without
>>>> >> - * modification, are permitted provided that the following conditions
>>>> >> - * are met:
>>>> >> - * 1. Redistributions of source code must retain the above copyright
>>>> >> - *    notice, this list of conditions and the following disclaimer.
>>>> >> - * 2. Redistributions in binary form must reproduce the above copyright
>>>> >> - *    notice, this list of conditions and the following disclaimer in the
>>>> >> - *    documentation and/or other materials provided with the distribution.
>>>> >> - *
>>>> >> - * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
>>>> >> - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>>>> >> - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>>>> >> - * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
>>>> >> - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>>>> >> - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>>>> >> - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>>>> >> - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>>>> >> - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>>>> >> - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>>>> >> - * SUCH DAMAGE.
>>>> >> - */
>>>> >> -
>>>> >> -#include "qemu/osdep.h"
>>>> >> -#include "qemu/error-report.h"
>>>> >> -#include "qemu/main-loop.h"
>>>> >> -#include "sysemu/hvf.h"
>>>> >> -#include "sysemu/runstate.h"
>>>> >> -#include "target/i386/cpu.h"
>>>> >> -#include "qemu/guest-random.h"
>>>> >> -
>>>> >> -#include "hvf-cpus.h"
>>>> >> -
>>>> >> -/*
>>>> >> - * The HVF-specific vCPU thread function. This one should only run when the host
>>>> >> - * CPU supports the VMX "unrestricted guest" feature.
>>>> >> - */
>>>> >> -static void *hvf_cpu_thread_fn(void *arg)
>>>> >> -{
>>>> >> -    CPUState *cpu = arg;
>>>> >> -
>>>> >> -    int r;
>>>> >> -
>>>> >> -    assert(hvf_enabled());
>>>> >> -
>>>> >> -    rcu_register_thread();
>>>> >> -
>>>> >> -    qemu_mutex_lock_iothread();
>>>> >> -    qemu_thread_get_self(cpu->thread);
>>>> >> -
>>>> >> -    cpu->thread_id = qemu_get_thread_id();
>>>> >> -    cpu->can_do_io = 1;
>>>> >> -    current_cpu = cpu;
>>>> >> -
>>>> >> -    hvf_init_vcpu(cpu);
>>>> >> -
>>>> >> -    /* signal CPU creation */
>>>> >> -    cpu_thread_signal_created(cpu);
>>>> >> -    qemu_guest_random_seed_thread_part2(cpu->random_seed);
>>>> >> -
>>>> >> -    do {
>>>> >> -        if (cpu_can_run(cpu)) {
>>>> >> -            r = hvf_vcpu_exec(cpu);
>>>> >> -            if (r == EXCP_DEBUG) {
>>>> >> -                cpu_handle_guest_debug(cpu);
>>>> >> -            }
>>>> >> -        }
>>>> >> -        qemu_wait_io_event(cpu);
>>>> >> -    } while (!cpu->unplug || cpu_can_run(cpu));
>>>> >> -
>>>> >> -    hvf_vcpu_destroy(cpu);
>>>> >> -    cpu_thread_signal_destroyed(cpu);
>>>> >> -    qemu_mutex_unlock_iothread();
>>>> >> -    rcu_unregister_thread();
>>>> >> -    return NULL;
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_start_vcpu_thread(CPUState *cpu)
>>>> >> -{
>>>> >> -    char thread_name[VCPU_THREAD_NAME_SIZE];
>>>> >> -
>>>> >> -    /*
>>>> >> -     * HVF currently does not support TCG, and only runs in
>>>> >> -     * unrestricted-guest mode.
>>>> >> -     */
>>>> >> -    assert(hvf_enabled());
>>>> >> -
>>>> >> -    cpu->thread = g_malloc0(sizeof(QemuThread));
>>>> >> -    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>>> >> -    qemu_cond_init(cpu->halt_cond);
>>>> >> -
>>>> >> -    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>>>> >> -             cpu->cpu_index);
>>>> >> -    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
>>>> >> -                       cpu, QEMU_THREAD_JOINABLE);
>>>> >> -}
>>>> >> -
>>>> >> -const CpusAccel hvf_cpus = {
>>>> >> -    .create_vcpu_thread = hvf_start_vcpu_thread,
>>>> >> -
>>>> >> -    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
>>>> >> -    .synchronize_post_init = hvf_cpu_synchronize_post_init,
>>>> >> -    .synchronize_state = hvf_cpu_synchronize_state,
>>>> >> -    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
>>>> >> -};
>>>> >> diff --git a/target/i386/hvf/hvf-cpus.h b/target/i386/hvf/hvf-cpus.h
>>>> >> deleted file mode 100644
>>>> >> index ced31b82c0..0000000000
>>>> >> --- a/target/i386/hvf/hvf-cpus.h
>>>> >> +++ /dev/null
>>>> >> @@ -1,25 +0,0 @@
>>>> >> -/*
>>>> >> - * Accelerator CPUS Interface
>>>> >> - *
>>>> >> - * Copyright 2020 SUSE LLC
>>>> >> - *
>>>> >> - * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>>> >> - * See the COPYING file in the top-level directory.
>>>> >> - */
>>>> >> -
>>>> >> -#ifndef HVF_CPUS_H
>>>> >> -#define HVF_CPUS_H
>>>> >> -
>>>> >> -#include "sysemu/cpus.h"
>>>> >> -
>>>> >> -extern const CpusAccel hvf_cpus;
>>>> >> -
>>>> >> -int hvf_init_vcpu(CPUState *);
>>>> >> -int hvf_vcpu_exec(CPUState *);
>>>> >> -void hvf_cpu_synchronize_state(CPUState *);
>>>> >> -void hvf_cpu_synchronize_post_reset(CPUState *);
>>>> >> -void hvf_cpu_synchronize_post_init(CPUState *);
>>>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *);
>>>> >> -void hvf_vcpu_destroy(CPUState *);
>>>> >> -
>>>> >> -#endif /* HVF_CPUS_H */
>>>> >> diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
>>>> >> index e0edffd077..6d56f8f6bb 100644
>>>> >> --- a/target/i386/hvf/hvf-i386.h
>>>> >> +++ b/target/i386/hvf/hvf-i386.h
>>>> >> @@ -18,57 +18,11 @@
>>>> >>
>>>> >>   #include "sysemu/accel.h"
>>>> >>   #include "sysemu/hvf.h"
>>>> >> +#include "sysemu/hvf_int.h"
>>>> >>   #include "cpu.h"
>>>> >>   #include "x86.h"
>>>> >>
>>>> >> -#define HVF_MAX_VCPU 0x10
>>>> >> -
>>>> >> -extern struct hvf_state hvf_global;
>>>> >> -
>>>> >> -struct hvf_vm {
>>>> >> -    int id;
>>>> >> -    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
>>>> >> -};
>>>> >> -
>>>> >> -struct hvf_state {
>>>> >> -    uint32_t version;
>>>> >> -    struct hvf_vm *vm;
>>>> >> -    uint64_t mem_quota;
>>>> >> -};
>>>> >> -
>>>> >> -/* hvf_slot flags */
>>>> >> -#define HVF_SLOT_LOG (1 << 0)
>>>> >> -
>>>> >> -typedef struct hvf_slot {
>>>> >> -    uint64_t start;
>>>> >> -    uint64_t size;
>>>> >> -    uint8_t *mem;
>>>> >> -    int slot_id;
>>>> >> -    uint32_t flags;
>>>> >> -    MemoryRegion *region;
>>>> >> -} hvf_slot;
>>>> >> -
>>>> >> -typedef struct hvf_vcpu_caps {
>>>> >> -    uint64_t vmx_cap_pinbased;
>>>> >> -    uint64_t vmx_cap_procbased;
>>>> >> -    uint64_t vmx_cap_procbased2;
>>>> >> -    uint64_t vmx_cap_entry;
>>>> >> -    uint64_t vmx_cap_exit;
>>>> >> -    uint64_t vmx_cap_preemption_timer;
>>>> >> -} hvf_vcpu_caps;
>>>> >> -
>>>> >> -struct HVFState {
>>>> >> -    AccelState parent;
>>>> >> -    hvf_slot slots[32];
>>>> >> -    int num_slots;
>>>> >> -
>>>> >> -    hvf_vcpu_caps *hvf_caps;
>>>> >> -};
>>>> >> -extern HVFState *hvf_state;
>>>> >> -
>>>> >> -void hvf_set_phys_mem(MemoryRegionSection *, bool);
>>>> >>   void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
>>>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
>>>> >>
>>>> >>   #ifdef NEED_CPU_H
>>>> >>   /* Functions exported to host specific mode */
>>>> >> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
>>>> >> index ed9356565c..8b96ecd619 100644
>>>> >> --- a/target/i386/hvf/hvf.c
>>>> >> +++ b/target/i386/hvf/hvf.c
>>>> >> @@ -51,6 +51,7 @@
>>>> >>   #include "qemu/error-report.h"
>>>> >>
>>>> >>   #include "sysemu/hvf.h"
>>>> >> +#include "sysemu/hvf_int.h"
>>>> >>   #include "sysemu/runstate.h"
>>>> >>   #include "hvf-i386.h"
>>>> >>   #include "vmcs.h"
>>>> >> @@ -72,171 +73,6 @@
>>>> >>   #include "sysemu/accel.h"
>>>> >>   #include "target/i386/cpu.h"
>>>> >>
>>>> >> -#include "hvf-cpus.h"
>>>> >> -
>>>> >> -HVFState *hvf_state;
>>>> >> -
>>>> >> -static void assert_hvf_ok(hv_return_t ret)
>>>> >> -{
>>>> >> -    if (ret == HV_SUCCESS) {
>>>> >> -        return;
>>>> >> -    }
>>>> >> -
>>>> >> -    switch (ret) {
>>>> >> -    case HV_ERROR:
>>>> >> -        error_report("Error: HV_ERROR");
>>>> >> -        break;
>>>> >> -    case HV_BUSY:
>>>> >> -        error_report("Error: HV_BUSY");
>>>> >> -        break;
>>>> >> -    case HV_BAD_ARGUMENT:
>>>> >> -        error_report("Error: HV_BAD_ARGUMENT");
>>>> >> -        break;
>>>> >> -    case HV_NO_RESOURCES:
>>>> >> -        error_report("Error: HV_NO_RESOURCES");
>>>> >> -        break;
>>>> >> -    case HV_NO_DEVICE:
>>>> >> -        error_report("Error: HV_NO_DEVICE");
>>>> >> -        break;
>>>> >> -    case HV_UNSUPPORTED:
>>>> >> -        error_report("Error: HV_UNSUPPORTED");
>>>> >> -        break;
>>>> >> -    default:
>>>> >> -        error_report("Unknown Error");
>>>> >> -    }
>>>> >> -
>>>> >> -    abort();
>>>> >> -}
>>>> >> -
>>>> >> -/* Memory slots */
>>>> >> -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
>>>> >> -{
>>>> >> -    hvf_slot *slot;
>>>> >> -    int x;
>>>> >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>>>> >> -        slot = &hvf_state->slots[x];
>>>> >> -        if (slot->size && start < (slot->start + slot->size) &&
>>>> >> -            (start + size) > slot->start) {
>>>> >> -            return slot;
>>>> >> -        }
>>>> >> -    }
>>>> >> -    return NULL;
>>>> >> -}
>>>> >> -
>>>> >> -struct mac_slot {
>>>> >> -    int present;
>>>> >> -    uint64_t size;
>>>> >> -    uint64_t gpa_start;
>>>> >> -    uint64_t gva;
>>>> >> -};
>>>> >> -
>>>> >> -struct mac_slot mac_slots[32];
>>>> >> -
>>>> >> -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
>>>> >> -{
>>>> >> -    struct mac_slot *macslot;
>>>> >> -    hv_return_t ret;
>>>> >> -
>>>> >> -    macslot = &mac_slots[slot->slot_id];
>>>> >> -
>>>> >> -    if (macslot->present) {
>>>> >> -        if (macslot->size != slot->size) {
>>>> >> -            macslot->present = 0;
>>>> >> -            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
>>>> >> -            assert_hvf_ok(ret);
>>>> >> -        }
>>>> >> -    }
>>>> >> -
>>>> >> -    if (!slot->size) {
>>>> >> -        return 0;
>>>> >> -    }
>>>> >> -
>>>> >> -    macslot->present = 1;
>>>> >> -    macslot->gpa_start = slot->start;
>>>> >> -    macslot->size = slot->size;
>>>> >> -    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
>>>> >> -    assert_hvf_ok(ret);
>>>> >> -    return 0;
>>>> >> -}
>>>> >> -
>>>> >> -void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
>>>> >> -{
>>>> >> -    hvf_slot *mem;
>>>> >> -    MemoryRegion *area = section->mr;
>>>> >> -    bool writeable = !area->readonly && !area->rom_device;
>>>> >> -    hv_memory_flags_t flags;
>>>> >> -
>>>> >> -    if (!memory_region_is_ram(area)) {
>>>> >> -        if (writeable) {
>>>> >> -            return;
>>>> >> -        } else if (!memory_region_is_romd(area)) {
>>>> >> -            /*
>>>> >> -             * If the memory device is not in romd_mode, then we actually want
>>>> >> -             * to remove the hvf memory slot so all accesses will trap.
>>>> >> -             */
>>>> >> -             add = false;
>>>> >> -        }
>>>> >> -    }
>>>> >> -
>>>> >> -    mem = hvf_find_overlap_slot(
>>>> >> -            section->offset_within_address_space,
>>>> >> -            int128_get64(section->size));
>>>> >> -
>>>> >> -    if (mem && add) {
>>>> >> -        if (mem->size == int128_get64(section->size) &&
>>>> >> -            mem->start == section->offset_within_address_space &&
>>>> >> -            mem->mem == (memory_region_get_ram_ptr(area) +
>>>> >> -            section->offset_within_region)) {
>>>> >> -            return; /* Same region was attempted to register, go away. */
>>>> >> -        }
>>>> >> -    }
>>>> >> -
>>>> >> -    /* Region needs to be reset. set the size to 0 and remap it. */
>>>> >> -    if (mem) {
>>>> >> -        mem->size = 0;
>>>> >> -        if (do_hvf_set_memory(mem, 0)) {
>>>> >> -            error_report("Failed to reset overlapping slot");
>>>> >> -            abort();
>>>> >> -        }
>>>> >> -    }
>>>> >> -
>>>> >> -    if (!add) {
>>>> >> -        return;
>>>> >> -    }
>>>> >> -
>>>> >> -    if (area->readonly ||
>>>> >> -        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
>>>> >> -        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
>>>> >> -    } else {
>>>> >> -        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
>>>> >> -    }
>>>> >> -
>>>> >> -    /* Now make a new slot. */
>>>> >> -    int x;
>>>> >> -
>>>> >> -    for (x = 0; x < hvf_state->num_slots; ++x) {
>>>> >> -        mem = &hvf_state->slots[x];
>>>> >> -        if (!mem->size) {
>>>> >> -            break;
>>>> >> -        }
>>>> >> -    }
>>>> >> -
>>>> >> -    if (x == hvf_state->num_slots) {
>>>> >> -        error_report("No free slots");
>>>> >> -        abort();
>>>> >> -    }
>>>> >> -
>>>> >> -    mem->size = int128_get64(section->size);
>>>> >> -    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
>>>> >> -    mem->start = section->offset_within_address_space;
>>>> >> -    mem->region = area;
>>>> >> -
>>>> >> -    if (do_hvf_set_memory(mem, flags)) {
>>>> >> -        error_report("Error registering new memory slot");
>>>> >> -        abort();
>>>> >> -    }
>>>> >> -}
>>>> >> -
>>>> >>   void vmx_update_tpr(CPUState *cpu)
>>>> >>   {
>>>> >>       /* TODO: need integrate APIC handling */
>>>> >> @@ -276,56 +112,6 @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
>>>> >>       }
>>>> >>   }
>>>> >>
>>>> >> -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>>>> >> -{
>>>> >> -    if (!cpu->vcpu_dirty) {
>>>> >> -        hvf_get_registers(cpu);
>>>> >> -        cpu->vcpu_dirty = true;
>>>> >> -    }
>>>> >> -}
>>>> >> -
>>>> >> -void hvf_cpu_synchronize_state(CPUState *cpu)
>>>> >> -{
>>>> >> -    if (!cpu->vcpu_dirty) {
>>>> >> -        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
>>>> >> -    }
>>>> >> -}
>>>> >> -
>>>> >> -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
>>>> >> -                                              run_on_cpu_data arg)
>>>> >> -{
>>>> >> -    hvf_put_registers(cpu);
>>>> >> -    cpu->vcpu_dirty = false;
>>>> >> -}
>>>> >> -
>>>> >> -void hvf_cpu_synchronize_post_reset(CPUState *cpu)
>>>> >> -{
>>>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
>>>> >> -}
>>>> >> -
>>>> >> -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
>>>> >> -                                             run_on_cpu_data arg)
>>>> >> -{
>>>> >> -    hvf_put_registers(cpu);
>>>> >> -    cpu->vcpu_dirty = false;
>>>> >> -}
>>>> >> -
>>>> >> -void hvf_cpu_synchronize_post_init(CPUState *cpu)
>>>> >> -{
>>>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
>>>> >> -}
>>>> >> -
>>>> >> -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
>>>> >> -                                              run_on_cpu_data arg)
>>>> >> -{
>>>> >> -    cpu->vcpu_dirty = true;
>>>> >> -}
>>>> >> -
>>>> >> -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
>>>> >> -{
>>>> >> -    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
>>>> >> -}
>>>> >> -
>>>> >>   static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
>>>> >>   {
>>>> >>       int read, write;
>>>> >> @@ -370,109 +156,19 @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
>>>> >>       return false;
>>>> >>   }
>>>> >>
>>>> >> -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
>>>> >> -{
>>>> >> -    hvf_slot *slot;
>>>> >> -
>>>> >> -    slot = hvf_find_overlap_slot(
>>>> >> -            section->offset_within_address_space,
>>>> >> -            int128_get64(section->size));
>>>> >> -
>>>> >> -    /* protect region against writes; begin tracking it */
>>>> >> -    if (on) {
>>>> >> -        slot->flags |= HVF_SLOT_LOG;
>>>> >> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>>>> >> -                      HV_MEMORY_READ);
>>>> >> -    /* stop tracking region*/
>>>> >> -    } else {
>>>> >> -        slot->flags &= ~HVF_SLOT_LOG;
>>>> >> -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
>>>> >> -                      HV_MEMORY_READ | HV_MEMORY_WRITE);
>>>> >> -    }
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_log_start(MemoryListener *listener,
>>>> >> -                          MemoryRegionSection *section, int old, int new)
>>>> >> -{
>>>> >> -    if (old != 0) {
>>>> >> -        return;
>>>> >> -    }
>>>> >> -
>>>> >> -    hvf_set_dirty_tracking(section, 1);
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_log_stop(MemoryListener *listener,
>>>> >> -                         MemoryRegionSection *section, int old, int new)
>>>> >> -{
>>>> >> -    if (new != 0) {
>>>> >> -        return;
>>>> >> -    }
>>>> >> -
>>>> >> -    hvf_set_dirty_tracking(section, 0);
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_log_sync(MemoryListener *listener,
>>>> >> -                         MemoryRegionSection *section)
>>>> >> -{
>>>> >> -    /*
>>>> >> -     * sync of dirty pages is handled elsewhere; just make sure we keep
>>>> >> -     * tracking the region.
>>>> >> -     */
>>>> >> -    hvf_set_dirty_tracking(section, 1);
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_region_add(MemoryListener *listener,
>>>> >> -                           MemoryRegionSection *section)
>>>> >> -{
>>>> >> -    hvf_set_phys_mem(section, true);
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_region_del(MemoryListener *listener,
>>>> >> -                           MemoryRegionSection *section)
>>>> >> -{
>>>> >> -    hvf_set_phys_mem(section, false);
>>>> >> -}
>>>> >> -
>>>> >> -static MemoryListener hvf_memory_listener = {
>>>> >> -    .priority = 10,
>>>> >> -    .region_add = hvf_region_add,
>>>> >> -    .region_del = hvf_region_del,
>>>> >> -    .log_start = hvf_log_start,
>>>> >> -    .log_stop = hvf_log_stop,
>>>> >> -    .log_sync = hvf_log_sync,
>>>> >> -};
>>>> >> -
>>>> >> -void hvf_vcpu_destroy(CPUState *cpu)
>>>> >> +void hvf_arch_vcpu_destroy(CPUState *cpu)
>>>> >>   {
>>>> >>       X86CPU *x86_cpu = X86_CPU(cpu);
>>>> >>       CPUX86State *env = &x86_cpu->env;
>>>> >>
>>>> >> -    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
>>>> >>       g_free(env->hvf_mmio_buf);
>>>> >> -    assert_hvf_ok(ret);
>>>> >> -}
>>>> >> -
>>>> >> -static void dummy_signal(int sig)
>>>> >> -{
>>>> >>   }
>>>> >>
>>>> >> -int hvf_init_vcpu(CPUState *cpu)
>>>> >> +int hvf_arch_init_vcpu(CPUState *cpu)
>>>> >>   {
>>>> >>
>>>> >>       X86CPU *x86cpu = X86_CPU(cpu);
>>>> >>       CPUX86State *env = &x86cpu->env;
>>>> >> -    int r;
>>>> >> -
>>>> >> -    /* init cpu signals */
>>>> >> -    sigset_t set;
>>>> >> -    struct sigaction sigact;
>>>> >> -
>>>> >> -    memset(&sigact, 0, sizeof(sigact));
>>>> >> -    sigact.sa_handler = dummy_signal;
>>>> >> -    sigaction(SIG_IPI, &sigact, NULL);
>>>> >> -
>>>> >> -    pthread_sigmask(SIG_BLOCK, NULL, &set);
>>>> >> -    sigdelset(&set, SIG_IPI);
>>>> >>
>>>> >>       init_emu();
>>>> >>       init_decoder();
>>>> >> @@ -480,10 +176,6 @@ int hvf_init_vcpu(CPUState *cpu)
>>>> >>       hvf_state->hvf_caps = g_new0(struct hvf_vcpu_caps, 1);
>>>> >>       env->hvf_mmio_buf = g_new(char, 4096);
>>>> >>
>>>> >> -    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
>>>> >> -    cpu->vcpu_dirty = 1;
>>>> >> -    assert_hvf_ok(r);
>>>> >> -
>>>> >>       if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
>>>> >>           &hvf_state->hvf_caps->vmx_cap_pinbased)) {
>>>> >>           abort();
>>>> >> @@ -865,49 +557,3 @@ int hvf_vcpu_exec(CPUState *cpu)
>>>> >>
>>>> >>       return ret;
>>>> >>   }
>>>> >> -
>>>> >> -bool hvf_allowed;
>>>> >> -
>>>> >> -static int hvf_accel_init(MachineState *ms)
>>>> >> -{
>>>> >> -    int x;
>>>> >> -    hv_return_t ret;
>>>> >> -    HVFState *s;
>>>> >> -
>>>> >> -    ret = hv_vm_create(HV_VM_DEFAULT);
>>>> >> -    assert_hvf_ok(ret);
>>>> >> -
>>>> >> -    s = g_new0(HVFState, 1);
>>>> >> -
>>>> >> -    s->num_slots = 32;
>>>> >> -    for (x = 0; x < s->num_slots; ++x) {
>>>> >> -        s->slots[x].size = 0;
>>>> >> -        s->slots[x].slot_id = x;
>>>> >> -    }
>>>> >> -
>>>> >> -    hvf_state = s;
>>>> >> -    memory_listener_register(&hvf_memory_listener, &address_space_memory);
>>>> >> -    cpus_register_accel(&hvf_cpus);
>>>> >> -    return 0;
>>>> >> -}
>>>> >> -
>>>> >> -static void hvf_accel_class_init(ObjectClass *oc, void *data)
>>>> >> -{
>>>> >> -    AccelClass *ac = ACCEL_CLASS(oc);
>>>> >> -    ac->name = "HVF";
>>>> >> -    ac->init_machine = hvf_accel_init;
>>>> >> -    ac->allowed = &hvf_allowed;
>>>> >> -}
>>>> >> -
>>>> >> -static const TypeInfo hvf_accel_type = {
>>>> >> -    .name = TYPE_HVF_ACCEL,
>>>> >> -    .parent = TYPE_ACCEL,
>>>> >> -    .class_init = hvf_accel_class_init,
>>>> >> -};
>>>> >> -
>>>> >> -static void hvf_type_init(void)
>>>> >> -{
>>>> >> -    type_register_static(&hvf_accel_type);
>>>> >> -}
>>>> >> -
>>>> >> -type_init(hvf_type_init);
>>>> >> diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
>>>> >> index 409c9a3f14..c8a43717ee 100644
>>>> >> --- a/target/i386/hvf/meson.build
>>>> >> +++ b/target/i386/hvf/meson.build
>>>> >> @@ -1,6 +1,5 @@
>>>> >>   i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
>>>> >>     'hvf.c',
>>>> >> -  'hvf-cpus.c',
>>>> >>     'x86.c',
>>>> >>     'x86_cpuid.c',
>>>> >>     'x86_decode.c',
>>>> >> diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
>>>> >> index bbec412b6c..89b8e9d87a 100644
>>>> >> --- a/target/i386/hvf/x86hvf.c
>>>> >> +++ b/target/i386/hvf/x86hvf.c
>>>> >> @@ -20,6 +20,9 @@
>>>> >>   #include "qemu/osdep.h"
>>>> >>
>>>> >>   #include "qemu-common.h"
>>>> >> +#include "sysemu/hvf.h"
>>>> >> +#include "sysemu/hvf_int.h"
>>>> >> +#include "sysemu/hw_accel.h"
>>>> >>   #include "x86hvf.h"
>>>> >>   #include "vmx.h"
>>>> >>   #include "vmcs.h"
>>>> >> @@ -32,8 +35,6 @@
>>>> >>   #include <Hypervisor/hv.h>
>>>> >>   #include <Hypervisor/hv_vmx.h>
>>>> >>
>>>> >> -#include "hvf-cpus.h"
>>>> >> -
>>>> >>   void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
>>>> >>                        SegmentCache *qseg, bool is_tr)
>>>> >>   {
>>>> >> @@ -437,7 +438,7 @@ int hvf_process_events(CPUState *cpu_state)
>>>> >>       env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
>>>> >>
>>>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
>>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>>> >> +        cpu_synchronize_state(cpu_state);
>>>> >>           do_cpu_init(cpu);
>>>> >>       }
>>>> >>
>>>> >> @@ -451,12 +452,12 @@ int hvf_process_events(CPUState *cpu_state)
>>>> >>           cpu_state->halted = 0;
>>>> >>       }
>>>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
>>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>>> >> +        cpu_synchronize_state(cpu_state);
>>>> >>           do_cpu_sipi(cpu);
>>>> >>       }
>>>> >>       if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
>>>> >>           cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
>>>> >> -        hvf_cpu_synchronize_state(cpu_state);
>>>> >> +        cpu_synchronize_state(cpu_state);
>>>> > The changes from hvf_cpu_*() to cpu_*() are cleanup and perhaps should
>>>> > be a separate patch. It follows the cpu/accel cleanups Claudio was doing
>>>> > over the summer.
>>>>
>>>>
>>>> The only reason they're in here is because we no longer have access to
>>>> the hvf_ functions from the file. I am perfectly happy to rebase the
>>>> patch on top of Claudio's if his goes in first. I'm sure it'll be
>>>> trivial for him to rebase on top of this too if my series goes in first.
>>>>
>>>>
>>>> >
>>>> > Philippe raised the idea that the patch might go in ahead of the
>>>> > ARM-specific part (which might involve some discussions) and I agree
>>>> > with that.
>>>> >
>>>> > Some sync between Claudio's series (CC'd him) and the patch might be needed.
>>>>
>>>>
>>>> I would prefer not to hold back because of the sync. Claudio's cleanup
>>>> is trivial enough to adjust for if it gets merged ahead of this.
>>>>
>>>>
>>>> Alex
>>>>
>>>>
>>>>
Peter Collingbourne Nov. 30, 2020, 11:01 p.m. UTC | #11
On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote:
>
> Hi Peter,
>
> On 30.11.20 22:08, Peter Collingbourne wrote:
> > On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
> >>
> >>
> >> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
> >>> Hi Frank,
> >>>
> >>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse.
> >> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either!
> >>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out
> >>>
> >>>    https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
> >>>
> >> Thanks, we'll take a look :)
> >>
> >>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold.
> > Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
> > I'll reply to your patch here. You have:
> >
> > +                    /* Set cpu->hvf->sleeping so that we get a
> > SIG_IPI signal. */
> > +                    cpu->hvf->sleeping = true;
> > +                    smp_mb();
> > +
> > +                    /* Bail out if we received an IRQ meanwhile */
> > +                    if (cpu->thread_kicked || (cpu->interrupt_request &
> > +                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> > +                        cpu->hvf->sleeping = false;
> > +                        break;
> > +                    }
> > +
> > +                    /* nanosleep returns on signal, so we wake up on kick. */
> > +                    nanosleep(ts, NULL);
> >
> > and then send the signal conditional on whether sleeping is true, but
> > I think this is racy. If the signal is sent after sleeping is set to
> > true but before entering nanosleep then I think it will be ignored and
> > we will miss the wakeup. That's why in my implementation I block IPI
> > on the CPU thread at startup and then use pselect to atomically
> > unblock and begin sleeping. The signal is sent unconditionally so
> > there's no need to worry about races between actually sleeping and the
> > "we think we're sleeping" state. It may lead to an extra wakeup but
> > that's better than missing it entirely.
>
>
> Thanks a bunch for the comment! So the trick I was using here is to
> modify the timespec from the kick function before sending the IPI
> signal. That way, we know that either we are inside the sleep (where the
> signal wakes it up) or we are outside the sleep (where timespec={} will
> make it return immediately).
>
> The only race I can think of is if nanosleep does calculations based on
> the timespec and we happen to send the signal right there and then.

Yes that's the race I was thinking of. Admittedly it's a small window
but it's theoretically possible and part of the reason why pselect was
created.

> The problem with blocking IPIs is basically what Frank was describing
> earlier: How do you unset the IPI signal pending status? If the signal
> is never delivered, how can pselect differentiate "signal from last time
> is still pending" from "new signal because I got an IPI"?

In this case we would take the additional wakeup, which should be
harmless since we will take the WFx exit again and end up in the
correct state. But that's a lot better than busy looping.

I reckon that you could improve things a little by unblocking the
signal and then reblocking it before unlocking iothread (e.g. with a
pselect with zero time interval), which would flush any pending
signals. Since any such signal would correspond to a signal from last
time (because we still have the iothread lock) we know that any future
signals should correspond to new IPIs.

Peter
Alexander Graf Nov. 30, 2020, 11:18 p.m. UTC | #12
On 01.12.20 00:01, Peter Collingbourne wrote:
> On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote:
>> Hi Peter,
>>
>> On 30.11.20 22:08, Peter Collingbourne wrote:
>>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
>>>>
>>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
>>>>> Hi Frank,
>>>>>
>>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse.
>>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either!
>>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out
>>>>>
>>>>>     https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
>>>>>
>>>> Thanks, we'll take a look :)
>>>>
>>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold.
>>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
>>> I'll reply to your patch here. You have:
>>>
>>> +                    /* Set cpu->hvf->sleeping so that we get a
>>> SIG_IPI signal. */
>>> +                    cpu->hvf->sleeping = true;
>>> +                    smp_mb();
>>> +
>>> +                    /* Bail out if we received an IRQ meanwhile */
>>> +                    if (cpu->thread_kicked || (cpu->interrupt_request &
>>> +                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
>>> +                        cpu->hvf->sleeping = false;
>>> +                        break;
>>> +                    }
>>> +
>>> +                    /* nanosleep returns on signal, so we wake up on kick. */
>>> +                    nanosleep(ts, NULL);
>>>
>>> and then send the signal conditional on whether sleeping is true, but
>>> I think this is racy. If the signal is sent after sleeping is set to
>>> true but before entering nanosleep then I think it will be ignored and
>>> we will miss the wakeup. That's why in my implementation I block IPI
>>> on the CPU thread at startup and then use pselect to atomically
>>> unblock and begin sleeping. The signal is sent unconditionally so
>>> there's no need to worry about races between actually sleeping and the
>>> "we think we're sleeping" state. It may lead to an extra wakeup but
>>> that's better than missing it entirely.
>>
>> Thanks a bunch for the comment! So the trick I was using here is to
>> modify the timespec from the kick function before sending the IPI
>> signal. That way, we know that either we are inside the sleep (where the
>> signal wakes it up) or we are outside the sleep (where timespec={} will
>> make it return immediately).
>>
>> The only race I can think of is if nanosleep does calculations based on
>> the timespec and we happen to send the signal right there and then.
> Yes that's the race I was thinking of. Admittedly it's a small window
> but it's theoretically possible and part of the reason why pselect was
> created.
>
>> The problem with blocking IPIs is basically what Frank was describing
>> earlier: How do you unset the IPI signal pending status? If the signal
>> is never delivered, how can pselect differentiate "signal from last time
>> is still pending" from "new signal because I got an IPI"?
> In this case we would take the additional wakeup which should be
> harmless since we will take the WFx exit again and put us in the
> correct state. But that's a lot better than busy looping.


I'm not sure I follow. I'm thinking of the following scenario:

   - trap into WFI handler
   - go to sleep with blocked SIG_IPI
   - SIG_IPI arrives, pselect() exits
   - signal is still pending because it's blocked
   - enter guest
   - trap into WFI handler
   - run pselect(), but it immediately exits because SIG_IPI is still pending

This was the loop I was seeing when running with SIG_IPI blocked. That's 
part of the reason why I switched to a different model.


> I reckon that you could improve things a little by unblocking the
> signal and then reblocking it before unlocking iothread (e.g. with a
> pselect with zero time interval), which would flush any pending
> signals. Since any such signal would correspond to a signal from last
> time (because we still have the iothread lock) we know that any future
> signals should correspond to new IPIs.


Yeah, I think you actually *have* to do exactly that, because otherwise 
pselect() will always return after 0ns because the signal is still pending.

And yes, I agree that that starts to sound a bit less racy now. But it 
means we can probably also just do

   - WFI handler
   - block SIG_IPI
   - set hvf->sleeping = true
   - check for pending interrupts
   - pselect()
   - unblock SIG_IPI

which means we run with SIG_IPI unmasked by default. I don't think the 
number of signal mask changes is any different with that compared to 
running with SIG_IPI always masked, right?


Alex
Peter Collingbourne Dec. 1, 2020, midnight UTC | #13
On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote:
>
>
> On 01.12.20 00:01, Peter Collingbourne wrote:
> > On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote:
> >> Hi Peter,
> >>
> >> On 30.11.20 22:08, Peter Collingbourne wrote:
> >>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
> >>>>
> >>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
> >>>>> Hi Frank,
> >>>>>
> >>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse.
> >>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either!
> >>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out
> >>>>>
> >>>>>     https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
> >>>>>
> >>>> Thanks, we'll take a look :)
> >>>>
> >>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold.
> >>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
> >>> I'll reply to your patch here. You have:
> >>>
> >>> +                    /* Set cpu->hvf->sleeping so that we get a
> >>> SIG_IPI signal. */
> >>> +                    cpu->hvf->sleeping = true;
> >>> +                    smp_mb();
> >>> +
> >>> +                    /* Bail out if we received an IRQ meanwhile */
> >>> +                    if (cpu->thread_kicked || (cpu->interrupt_request &
> >>> +                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> >>> +                        cpu->hvf->sleeping = false;
> >>> +                        break;
> >>> +                    }
> >>> +
> >>> +                    /* nanosleep returns on signal, so we wake up on kick. */
> >>> +                    nanosleep(ts, NULL);
> >>>
> >>> and then send the signal conditional on whether sleeping is true, but
> >>> I think this is racy. If the signal is sent after sleeping is set to
> >>> true but before entering nanosleep then I think it will be ignored and
> >>> we will miss the wakeup. That's why in my implementation I block IPI
> >>> on the CPU thread at startup and then use pselect to atomically
> >>> unblock and begin sleeping. The signal is sent unconditionally so
> >>> there's no need to worry about races between actually sleeping and the
> >>> "we think we're sleeping" state. It may lead to an extra wakeup but
> >>> that's better than missing it entirely.
> >>
> >> Thanks a bunch for the comment! So the trick I was using here is to
> >> modify the timespec from the kick function before sending the IPI
> >> signal. That way, we know that either we are inside the sleep (where the
> >> signal wakes it up) or we are outside the sleep (where timespec={} will
> >> make it return immediately).
> >>
> >> The only race I can think of is if nanosleep does calculations based on
> >> the timespec and we happen to send the signal right there and then.
> > Yes that's the race I was thinking of. Admittedly it's a small window
> > but it's theoretically possible and part of the reason why pselect was
> > created.
> >
> >> The problem with blocking IPIs is basically what Frank was describing
> >> earlier: How do you unset the IPI signal pending status? If the signal
> >> is never delivered, how can pselect differentiate "signal from last time
> >> is still pending" from "new signal because I got an IPI"?
> > In this case we would take the additional wakeup which should be
> > harmless since we will take the WFx exit again and put us in the
> > correct state. But that's a lot better than busy looping.
>
>
> I'm not sure I follow. I'm thinking of the following scenario:
>
>    - trap into WFI handler
>    - go to sleep with blocked SIG_IPI
>    - SIG_IPI arrives, pselect() exits
>    - signal is still pending because it's blocked
>    - enter guest
>    - trap into WFI handler
> >    - run pselect(), but it immediately exits because SIG_IPI is still pending
>
> This was the loop I was seeing when running with SIG_IPI blocked. That's
> part of the reason why I switched to a different model.

What I observe is that when returning from a pending signal pselect
consumes the signal (which is also consistent with my understanding of
what pselect does). That means that it doesn't matter if we take a
second WFx exit because once we reach the pselect in the second WFx
exit the signal will have been consumed by the pselect in the first
exit and we will just wait for the next one.

I don't know why things may have been going wrong in your
implementation but it may be related to the issue with
mach_absolute_time() which I posted about separately and was also
causing busy loops for us in some cases. Once that issue was fixed in
our implementation, sleeping until the VTIMER is due started working
properly.

>
>
> > I reckon that you could improve things a little by unblocking the
> > signal and then reblocking it before unlocking iothread (e.g. with a
> > pselect with zero time interval), which would flush any pending
> > signals. Since any such signal would correspond to a signal from last
> > time (because we still have the iothread lock) we know that any future
> > signals should correspond to new IPIs.
>
>
> Yeah, I think you actually *have* to do exactly that, because otherwise
> pselect() will always return after 0ns because the signal is still pending.
>
> And yes, I agree that that starts to sound a bit less racy now. But it
> means we can probably also just do
>
>    - WFI handler
>    - block SIG_IPI
>    - set hvf->sleeping = true
>    - check for pending interrupts
>    - pselect()
>    - unblock SIG_IPI
>
> which means we run with SIG_IPI unmasked by default. I don't think the
> number of signal mask changes is any different with that compared to
> running with SIG_IPI always masked, right?

And unlock/lock iothread around the pselect? I suppose that could work
but as I mentioned it would just be an optimization.

Maybe I can try to make my approach work on top of your series, or if
you already have a patch I can try to debug it. Let me know.

Peter
Alexander Graf Dec. 1, 2020, 12:13 a.m. UTC | #14
On 01.12.20 01:00, Peter Collingbourne wrote:
> On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote:
>>
>> On 01.12.20 00:01, Peter Collingbourne wrote:
>>> On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote:
>>>> Hi Peter,
>>>>
>>>> On 30.11.20 22:08, Peter Collingbourne wrote:
>>>>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
>>>>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
>>>>>>> Hi Frank,
>>>>>>>
>>>>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse.
>>>>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either!
>>>>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out
>>>>>>>
>>>>>>>      https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
>>>>>>>
>>>>>> Thanks, we'll take a look :)
>>>>>>
>>>>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold.
>>>>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
>>>>> I'll reply to your patch here. You have:
>>>>>
>>>>> +                    /* Set cpu->hvf->sleeping so that we get a
>>>>> SIG_IPI signal. */
>>>>> +                    cpu->hvf->sleeping = true;
>>>>> +                    smp_mb();
>>>>> +
>>>>> +                    /* Bail out if we received an IRQ meanwhile */
>>>>> +                    if (cpu->thread_kicked || (cpu->interrupt_request &
>>>>> +                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
>>>>> +                        cpu->hvf->sleeping = false;
>>>>> +                        break;
>>>>> +                    }
>>>>> +
>>>>> +                    /* nanosleep returns on signal, so we wake up on kick. */
>>>>> +                    nanosleep(ts, NULL);
>>>>>
>>>>> and then send the signal conditional on whether sleeping is true, but
>>>>> I think this is racy. If the signal is sent after sleeping is set to
>>>>> true but before entering nanosleep then I think it will be ignored and
>>>>> we will miss the wakeup. That's why in my implementation I block IPI
>>>>> on the CPU thread at startup and then use pselect to atomically
>>>>> unblock and begin sleeping. The signal is sent unconditionally so
>>>>> there's no need to worry about races between actually sleeping and the
>>>>> "we think we're sleeping" state. It may lead to an extra wakeup but
>>>>> that's better than missing it entirely.
>>>> Thanks a bunch for the comment! So the trick I was using here is to
>>>> modify the timespec from the kick function before sending the IPI
>>>> signal. That way, we know that either we are inside the sleep (where the
>>>> signal wakes it up) or we are outside the sleep (where timespec={} will
>>>> make it return immediately).
>>>>
>>>> The only race I can think of is if nanosleep does calculations based on
>>>> the timespec and we happen to send the signal right there and then.
>>> Yes that's the race I was thinking of. Admittedly it's a small window
>>> but it's theoretically possible and part of the reason why pselect was
>>> created.
>>>
>>>> The problem with blocking IPIs is basically what Frank was describing
>>>> earlier: How do you unset the IPI signal pending status? If the signal
>>>> is never delivered, how can pselect differentiate "signal from last time
>>>> is still pending" from "new signal because I got an IPI"?
>>> In this case we would take the additional wakeup which should be
>>> harmless since we will take the WFx exit again and put us in the
>>> correct state. But that's a lot better than busy looping.
>>
>> I'm not sure I follow. I'm thinking of the following scenario:
>>
>>     - trap into WFI handler
>>     - go to sleep with blocked SIG_IPI
>>     - SIG_IPI arrives, pselect() exits
>>     - signal is still pending because it's blocked
>>     - enter guest
>>     - trap into WFI handler
>>     - run pselect(), but it immediately exits because SIG_IPI is still pending
>>
>> This was the loop I was seeing when running with SIG_IPI blocked. That's
>> part of the reason why I switched to a different model.
> What I observe is that when returning from a pending signal pselect
> consumes the signal (which is also consistent with my understanding of
> what pselect does). That means that it doesn't matter if we take a
> second WFx exit because once we reach the pselect in the second WFx
> exit the signal will have been consumed by the pselect in the first
> exit and we will just wait for the next one.
>
> I don't know why things may have been going wrong in your
> implementation but it may be related to the issue with
> mach_absolute_time() which I posted about separately and was also
> causing busy loops for us in some cases. Once that issue was fixed in
> our implementation we started seeing sleep until VTIMER due work
> properly.
>
>>
>>> I reckon that you could improve things a little by unblocking the
>>> signal and then reblocking it before unlocking iothread (e.g. with a
>>> pselect with zero time interval), which would flush any pending
>>> signals. Since any such signal would correspond to a signal from last
>>> time (because we still have the iothread lock) we know that any future
>>> signals should correspond to new IPIs.
>>
>> Yeah, I think you actually *have* to do exactly that, because otherwise
>> pselect() will always return after 0ns because the signal is still pending.
>>
>> And yes, I agree that that starts to sound a bit less racy now. But it
>> means we can probably also just do
>>
>>     - WFI handler
>>     - block SIG_IPI
>>     - set hvf->sleeping = true
>>     - check for pending interrupts
>>     - pselect()
>>     - unblock SIG_IPI
>>
>> which means we run with SIG_IPI unmasked by default. I don't think the
>> number of signal mask changes is any different with that compared to
>> running with SIG_IPI always masked, right?
> And unlock/lock iothread around the pselect? I suppose that could work
> but as I mentioned it would just be an optimization.
>
> Maybe I can try to make my approach work on top of your series, or if
> you already have a patch I can try to debug it. Let me know.


I would love to take a patch from you here :). I'll still be stuck for a 
while with the sysreg sync rework that Peter asked for before I can look 
at WFI again.


Alex
Roman Bolshakov Dec. 1, 2020, 12:37 a.m. UTC | #15
On Mon, Nov 30, 2020 at 10:40:49PM +0100, Alexander Graf wrote:
> Hi Peter,
> 
> On 30.11.20 22:08, Peter Collingbourne wrote:
> > On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
> > > 
> > > 
> > > On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
> > > > Hi Frank,
> > > > 
> > > > Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse.
> > > Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either!
> > > > Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out
> > > > 
> > > >    https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
> > > > 
> > > Thanks, we'll take a look :)
> > > 
> > > > Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold.
> > Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
> > I'll reply to your patch here. You have:
> > 
> > +                    /* Set cpu->hvf->sleeping so that we get a SIG_IPI signal. */
> > +                    cpu->hvf->sleeping = true;
> > +                    smp_mb();
> > +
> > +                    /* Bail out if we received an IRQ meanwhile */
> > +                    if (cpu->thread_kicked || (cpu->interrupt_request &
> > +                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> > +                        cpu->hvf->sleeping = false;
> > +                        break;
> > +                    }
> > +
> > +                    /* nanosleep returns on signal, so we wake up on kick. */
> > +                    nanosleep(ts, NULL);
> > 
> > and then send the signal conditional on whether sleeping is true, but
> > I think this is racy. If the signal is sent after sleeping is set to
> > true but before entering nanosleep then I think it will be ignored and
> > we will miss the wakeup. That's why in my implementation I block IPI
> > on the CPU thread at startup and then use pselect to atomically
> > unblock and begin sleeping. The signal is sent unconditionally so
> > there's no need to worry about races between actually sleeping and the
> > "we think we're sleeping" state. It may lead to an extra wakeup but
> > that's better than missing it entirely.
> 
> 
> Thanks a bunch for the comment! So the trick I was using here is to modify
> the timespec from the kick function before sending the IPI signal. That way,
> we know that either we are inside the sleep (where the signal wakes it up)
> or we are outside the sleep (where timespec={} will make it return
> immediately).
> 
> The only race I can think of is if nanosleep does calculations based on the
> timespec and we happen to send the signal right there and then.
> 
> The problem with blocking IPIs is basically what Frank was describing
> earlier: How do you unset the IPI signal pending status? If the signal is
> never delivered, how can pselect differentiate "signal from last time is
> still pending" from "new signal because I got an IPI"?
> 
> 

Hi Alex,

There was a patch for x86 HVF that implements CPU kick and it wasn't
merged (mostly because of my laziness). It has some changes like the ones
you introduced in the series and VMX-specific handling of the preemption
timer to guarantee interrupt delivery without kick loss:

https://patchwork.kernel.org/project/qemu-devel/patch/20200729124832.79375-1-r.bolshakov@yadro.com/

I wonder if it'd be possible to have common handling of kicks for both x86
and arm (given that arch-specific bits are wrapped)?

Thanks,
Roman
Frank Yang Dec. 1, 2020, 2:49 a.m. UTC | #16
On Mon, Nov 30, 2020 at 2:10 PM Peter Maydell <peter.maydell@linaro.org>
wrote:

> On Mon, 30 Nov 2020 at 20:56, Frank Yang <lfy@google.com> wrote:
> > We'd actually like to contribute upstream too :) We do want to maintain
> > our own downstream though; Android Emulator codebase needs to work
> > solidly on macos and windows which has made keeping up with upstream
> > difficult
>
> One of the main reasons why OSX and Windows support upstream is
> not so great is because very few people are helping to develop,
> test and support it upstream. The way to fix that IMHO is for more
> people who do care about those platforms to actively engage
> with us upstream to help in making those platforms move closer to
> being first class citizens. If you stay on a downstream fork
> forever then I don't think you'll ever see things improve.
>
> thanks
> -- PMM
>

That's a really good point. I'll definitely be more active about sending
comments upstream in the future :)

Frank
Roman Bolshakov Dec. 3, 2020, 9:41 a.m. UTC | #17
On Mon, Nov 30, 2020 at 04:00:11PM -0800, Peter Collingbourne wrote:
> On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote:
> >
> >
> > On 01.12.20 00:01, Peter Collingbourne wrote:
> > > On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote:
> > >> Hi Peter,
> > >>
> > >> On 30.11.20 22:08, Peter Collingbourne wrote:
> > >>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
> > >>>>
> > >>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
> > >>>>> Hi Frank,
> > >>>>>
> > >>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse.
> > >>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either!
> > >>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out
> > >>>>>
> > >>>>>     https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
> > >>>>>
> > >>>> Thanks, we'll take a look :)
> > >>>>
> > >>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold.
> > >>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
> > >>> I'll reply to your patch here. You have:
> > >>>
> > >>> +                    /* Set cpu->hvf->sleeping so that we get a SIG_IPI signal. */
> > >>> +                    cpu->hvf->sleeping = true;
> > >>> +                    smp_mb();
> > >>> +
> > >>> +                    /* Bail out if we received an IRQ meanwhile */
> > >>> +                    if (cpu->thread_kicked || (cpu->interrupt_request &
> > >>> +                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> > >>> +                        cpu->hvf->sleeping = false;
> > >>> +                        break;
> > >>> +                    }
> > >>> +
> > >>> +                    /* nanosleep returns on signal, so we wake up on kick. */
> > >>> +                    nanosleep(ts, NULL);
> > >>>
> > >>> and then send the signal conditional on whether sleeping is true, but
> > >>> I think this is racy. If the signal is sent after sleeping is set to
> > >>> true but before entering nanosleep then I think it will be ignored and
> > >>> we will miss the wakeup. That's why in my implementation I block IPI
> > >>> on the CPU thread at startup and then use pselect to atomically
> > >>> unblock and begin sleeping. The signal is sent unconditionally so
> > >>> there's no need to worry about races between actually sleeping and the
> > >>> "we think we're sleeping" state. It may lead to an extra wakeup but
> > >>> that's better than missing it entirely.
> > >>
> > >> Thanks a bunch for the comment! So the trick I was using here is to
> > >> modify the timespec from the kick function before sending the IPI
> > >> signal. That way, we know that either we are inside the sleep (where the
> > >> signal wakes it up) or we are outside the sleep (where timespec={} will
> > >> make it return immediately).
> > >>
> > >> The only race I can think of is if nanosleep does calculations based on
> > >> the timespec and we happen to send the signal right there and then.
> > > Yes that's the race I was thinking of. Admittedly it's a small window
> > > but it's theoretically possible and part of the reason why pselect was
> > > created.
> > >
> > >> The problem with blocking IPIs is basically what Frank was describing
> > >> earlier: How do you unset the IPI signal pending status? If the signal
> > >> is never delivered, how can pselect differentiate "signal from last time
> > >> is still pending" from "new signal because I got an IPI"?
> > > In this case we would take the additional wakeup which should be
> > > harmless since we will take the WFx exit again and put us in the
> > > correct state. But that's a lot better than busy looping.
> >
> >
> > I'm not sure I follow. I'm thinking of the following scenario:
> >
> >    - trap into WFI handler
> >    - go to sleep with blocked SIG_IPI
> >    - SIG_IPI arrives, pselect() exits
> >    - signal is still pending because it's blocked
> >    - enter guest
> >    - trap into WFI handler
> >    - run pselect(), but it immediately exits because SIG_IPI is still pending
> >
> > This was the loop I was seeing when running with SIG_IPI blocked. That's
> > part of the reason why I switched to a different model.
> 
> What I observe is that when returning from a pending signal pselect
> consumes the signal (which is also consistent with my understanding of
> what pselect does). That means that it doesn't matter if we take a
> second WFx exit because once we reach the pselect in the second WFx
> exit the signal will have been consumed by the pselect in the first
> exit and we will just wait for the next one.
> 

Aha! Thanks for the explanation. So, the first WFI in the series of
guest WFIs will likely wake up immediately? After a period without WFIs
there must be a pending SIG_IPI...

It shouldn't be a critical issue though because (as defined in D1.16.2)
"the architecture permits a PE to leave the low-power state for any
reason, it is permissible for a PE to treat WFI as a NOP, but this is
not recommended for lowest power operation."

BTW. I think a bit from the thread should go into the description of
patch 8, because it's not trivial and it would really be helpful to keep
in repo history. At least something like this (taken from an earlier
reply in the thread):

  In this implementation IPI is blocked on the CPU thread at startup and
  pselect() is used to atomically unblock the signal and begin sleeping.
  The signal is sent unconditionally so there's no need to worry about
  races between actually sleeping and the "we think we're sleeping"
  state. It may lead to an extra wakeup but that's better than missing
  it entirely.


Thanks,
Roman

> I don't know why things may have been going wrong in your
> implementation but it may be related to the issue with
> mach_absolute_time() which I posted about separately and was also
> causing busy loops for us in some cases. Once that issue was fixed in
> our implementation we started seeing sleep until VTIMER due work
> properly.
> 
> >
> >
> > > I reckon that you could improve things a little by unblocking the
> > > signal and then reblocking it before unlocking iothread (e.g. with a
> > > pselect with zero time interval), which would flush any pending
> > > signals. Since any such signal would correspond to a signal from last
> > > time (because we still have the iothread lock) we know that any future
> > > signals should correspond to new IPIs.
> >
> >
> > Yeah, I think you actually *have* to do exactly that, because otherwise
> > pselect() will always return after 0ns because the signal is still pending.
> >
> > And yes, I agree that that starts to sound a bit less racy now. But it
> > means we can probably also just do
> >
> >    - WFI handler
> >    - block SIG_IPI
> >    - set hvf->sleeping = true
> >    - check for pending interrupts
> >    - pselect()
> >    - unblock SIG_IPI
> >
> > which means we run with SIG_IPI unmasked by default. I don't think the
> > number of signal mask changes is any different with that compared to
> > running with SIG_IPI always masked, right?
> 

P.S. Just found that Alex already raised my concern. Pending signals
have to be consumed or there should be no pending signals to start
sleeping on the very first WFI.
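
For reference, consuming a stale pending signal along the lines Peter
suggested (a pselect with a zero time interval) could look roughly like
the sketch below. FLUSH_SIG and the flush_* helpers are made-up names,
SIGUSR2 stands in for SIG_IPI, and the code assumes a pending signal is
delivered even when the timeout is zero. That matches what was reported
in this thread, but it should be checked on the target OS.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical stand-in for QEMU's SIG_IPI. */
#define FLUSH_SIG SIGUSR2

static volatile sig_atomic_t flush_handled;

static void flush_handler(int sig)
{
    (void)sig;
    flush_handled++;
}

/* Install the handler and keep the signal blocked, as in the pselect model. */
static void flush_setup(void)
{
    struct sigaction sa;
    sigset_t set;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = flush_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(FLUSH_SIG, &sa, NULL);
    assert(rc == 0);

    sigemptyset(&set);
    sigaddset(&set, FLUSH_SIG);
    rc = sigprocmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);
    (void)rc;
}

/* True if the signal is currently pending for the caller. */
static bool flush_is_pending(void)
{
    sigset_t pending;

    sigpending(&pending);
    return sigismember(&pending, FLUSH_SIG) == 1;
}

/*
 * Unblock the signal for the duration of a zero-timeout pselect() so that
 * a signal left over from earlier is delivered now instead of cutting the
 * next real sleep short. The call itself never waits.
 */
static void flush_stale_signal(void)
{
    sigset_t unblocked;
    struct timespec zero = { 0, 0 };

    sigprocmask(SIG_SETMASK, NULL, &unblocked);
    sigdelset(&unblocked, FLUSH_SIG);
    (void)pselect(0, NULL, NULL, NULL, &zero, &unblocked);
}
```

Calling flush_stale_signal() before the first WFI sleep would leave no
old signal pending, so the first real pselect() only returns early for
a new kick.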

> And unlock/lock iothread around the pselect? I suppose that could work
> but as I mentioned it would just be an optimization.
> 
> Maybe I can try to make my approach work on top of your series, or if
> you already have a patch I can try to debug it. Let me know.
> 
> Peter
Peter Collingbourne Dec. 3, 2020, 6:42 p.m. UTC | #18
On Thu, Dec 3, 2020 at 1:41 AM Roman Bolshakov <r.bolshakov@yadro.com> wrote:
>
> On Mon, Nov 30, 2020 at 04:00:11PM -0800, Peter Collingbourne wrote:
> > On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote:
> > >
> > >
> > > On 01.12.20 00:01, Peter Collingbourne wrote:
> > > > On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote:
> > > >> Hi Peter,
> > > >>
> > > >> On 30.11.20 22:08, Peter Collingbourne wrote:
> > > >>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
> > > >>>>
> > > >>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
> > > >>>>> Hi Frank,
> > > >>>>>
> > > >>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse.
> > > >>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either!
> > > >>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out
> > > >>>>>
> > > >>>>>     https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
> > > >>>>>
> > > >>>> Thanks, we'll take a look :)
> > > >>>>
> > > >>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold.
> > > >>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
> > > >>> I'll reply to your patch here. You have:
> > > >>>
> > > >>> +                    /* Set cpu->hvf->sleeping so that we get a SIG_IPI signal. */
> > > >>> +                    cpu->hvf->sleeping = true;
> > > >>> +                    smp_mb();
> > > >>> +
> > > >>> +                    /* Bail out if we received an IRQ meanwhile */
> > > >>> +                    if (cpu->thread_kicked || (cpu->interrupt_request &
> > > >>> +                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> > > >>> +                        cpu->hvf->sleeping = false;
> > > >>> +                        break;
> > > >>> +                    }
> > > >>> +
> > > >>> +                    /* nanosleep returns on signal, so we wake up on kick. */
> > > >>> +                    nanosleep(ts, NULL);
> > > >>>
> > > >>> and then send the signal conditional on whether sleeping is true, but
> > > >>> I think this is racy. If the signal is sent after sleeping is set to
> > > >>> true but before entering nanosleep then I think it will be ignored and
> > > >>> we will miss the wakeup. That's why in my implementation I block IPI
> > > >>> on the CPU thread at startup and then use pselect to atomically
> > > >>> unblock and begin sleeping. The signal is sent unconditionally so
> > > >>> there's no need to worry about races between actually sleeping and the
> > > >>> "we think we're sleeping" state. It may lead to an extra wakeup but
> > > >>> that's better than missing it entirely.
> > > >>
> > > >> Thanks a bunch for the comment! So the trick I was using here is to
> > > >> modify the timespec from the kick function before sending the IPI
> > > >> signal. That way, we know that either we are inside the sleep (where the
> > > >> signal wakes it up) or we are outside the sleep (where timespec={} will
> > > >> make it return immediately).
> > > >>
> > > >> The only race I can think of is if nanosleep does calculations based on
> > > >> the timespec and we happen to send the signal right there and then.
> > > > Yes that's the race I was thinking of. Admittedly it's a small window
> > > > but it's theoretically possible and part of the reason why pselect was
> > > > created.
> > > >
> > > >> The problem with blocking IPIs is basically what Frank was describing
> > > >> earlier: How do you unset the IPI signal pending status? If the signal
> > > >> is never delivered, how can pselect differentiate "signal from last time
> > > >> is still pending" from "new signal because I got an IPI"?
> > > > In this case we would take the additional wakeup which should be
> > > > harmless since we will take the WFx exit again and put us in the
> > > > correct state. But that's a lot better than busy looping.
> > >
> > >
> > > I'm not sure I follow. I'm thinking of the following scenario:
> > >
> > >    - trap into WFI handler
> > >    - go to sleep with blocked SIG_IPI
> > >    - SIG_IPI arrives, pselect() exits
> > >    - signal is still pending because it's blocked
> > >    - enter guest
> > >    - trap into WFI handler
> > >    - run pselect(), but it immediately exits because SIG_IPI is still pending
> > >
> > > This was the loop I was seeing when running with SIG_IPI blocked. That's
> > > part of the reason why I switched to a different model.
> >
> > What I observe is that when returning from a pending signal pselect
> > consumes the signal (which is also consistent with my understanding of
> > what pselect does). That means that it doesn't matter if we take a
> > second WFx exit because once we reach the pselect in the second WFx
> > exit the signal will have been consumed by the pselect in the first
> > exit and we will just wait for the next one.
> >
>
> Aha! Thanks for the explanation. So, the first WFI in the series of
> guest WFIs will likely wake up immediately? After a period without WFIs
> there must be a pending SIG_IPI...
>
> It shouldn't be a critical issue though because (as defined in D1.16.2)
> "the architecture permits a PE to leave the low-power state for any
> reason, it is permissible for a PE to treat WFI as a NOP, but this is
> not recommended for lowest power operation."
>
> BTW. I think a bit from the thread should go into the description of
> patch 8, because it's not trivial and it would really be helpful to keep
> in repo history. At least something like this (taken from an earlier
> reply in the thread):
>
>   In this implementation IPI is blocked on the CPU thread at startup and
>   pselect() is used to atomically unblock the signal and begin sleeping.
>   The signal is sent unconditionally so there's no need to worry about
>   races between actually sleeping and the "we think we're sleeping"
>   state. It may lead to an extra wakeup but that's better than missing
>   it entirely.

Okay, I'll add something like that to the next version of the patch I send out.

Peter

>
>
> Thanks,
> Roman
>
> > I don't know why things may have been going wrong in your
> > implementation but it may be related to the issue with
> > mach_absolute_time() which I posted about separately and was also
> > causing busy loops for us in some cases. Once that issue was fixed in
> > our implementation we started seeing sleep until VTIMER due work
> > properly.
> >
> > >
> > >
> > > > I reckon that you could improve things a little by unblocking the
> > > > signal and then reblocking it before unlocking iothread (e.g. with a
> > > > pselect with zero time interval), which would flush any pending
> > > > signals. Since any such signal would correspond to a signal from last
> > > > time (because we still have the iothread lock) we know that any future
> > > > signals should correspond to new IPIs.
> > >
> > >
> > > Yeah, I think you actually *have* to do exactly that, because otherwise
> > > pselect() will always return after 0ns because the signal is still pending.
> > >
> > > And yes, I agree that that starts to sound a bit less racy now. But it
> > > means we can probably also just do
> > >
> > >    - WFI handler
> > >    - block SIG_IPI
> > >    - set hvf->sleeping = true
> > >    - check for pending interrupts
> > >    - pselect()
> > >    - unblock SIG_IPI
> > >
> > > which means we run with SIG_IPI unmasked by default. I don't think the
> > > number of signal mask changes is any different with that compared to
> > > running with SIG_IPI always masked, right?
> >
>
> P.S. Just found that Alex already raised my concern. Pending signals
> have to be consumed or there should be no pending signals to start
> sleeping on the very first WFI.
>
> > And unlock/lock iothread around the pselect? I suppose that could work
> > but as I mentioned it would just be an optimization.
> >
> > Maybe I can try to make my approach work on top of your series, or if
> > you already have a patch I can try to debug it. Let me know.
> >
> > Peter
Alexander Graf Dec. 3, 2020, 10:13 p.m. UTC | #19
On 03.12.20 19:42, Peter Collingbourne wrote:
> On Thu, Dec 3, 2020 at 1:41 AM Roman Bolshakov <r.bolshakov@yadro.com> wrote:
>> On Mon, Nov 30, 2020 at 04:00:11PM -0800, Peter Collingbourne wrote:
>>> On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote:
>>>>
>>>> On 01.12.20 00:01, Peter Collingbourne wrote:
>>>>> On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> On 30.11.20 22:08, Peter Collingbourne wrote:
>>>>>>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
>>>>>>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
>>>>>>>>> Hi Frank,
>>>>>>>>>
>>>>>>>>> Thanks for the update :). Your previous email nudged me into the right direction. I previously had implemented WFI through the internal timer framework which performed way worse.
>>>>>>>> Cool, glad it's helping. Also, Peter found out that the main thing keeping us from just using cntpct_el0 on the host directly and compare with cval is that if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get either the architecture or macos to read out the sleep time then we might be able to not have to use a poll interval either!
>>>>>>>>> Along the way, I stumbled over a few issues though. For starters, the signal mask for SIG_IPI was not set correctly, so while pselect() would exit, the signal would never get delivered to the thread! For a fix, check out
>>>>>>>>>
>>>>>>>>>      https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
>>>>>>>>>
>>>>>>>> Thanks, we'll take a look :)
>>>>>>>>
>>>>>>>>> Please also have a look at my latest stab at WFI emulation. It doesn't handle WFE (that's only relevant in overcommitted scenarios). But it does handle WFI and even does something similar to hlt polling, albeit not with an adaptive threshold.
>>>>>>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
>>>>>>> I'll reply to your patch here. You have:
>>>>>>>
>>>>>>> +                    /* Set cpu->hvf->sleeping so that we get a SIG_IPI signal. */
>>>>>>> +                    cpu->hvf->sleeping = true;
>>>>>>> +                    smp_mb();
>>>>>>> +
>>>>>>> +                    /* Bail out if we received an IRQ meanwhile */
>>>>>>> +                    if (cpu->thread_kicked || (cpu->interrupt_request &
>>>>>>> +                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
>>>>>>> +                        cpu->hvf->sleeping = false;
>>>>>>> +                        break;
>>>>>>> +                    }
>>>>>>> +
>>>>>>> +                    /* nanosleep returns on signal, so we wake up on kick. */
>>>>>>> +                    nanosleep(ts, NULL);
>>>>>>>
>>>>>>> and then send the signal conditional on whether sleeping is true, but
>>>>>>> I think this is racy. If the signal is sent after sleeping is set to
>>>>>>> true but before entering nanosleep then I think it will be ignored and
>>>>>>> we will miss the wakeup. That's why in my implementation I block IPI
>>>>>>> on the CPU thread at startup and then use pselect to atomically
>>>>>>> unblock and begin sleeping. The signal is sent unconditionally so
>>>>>>> there's no need to worry about races between actually sleeping and the
>>>>>>> "we think we're sleeping" state. It may lead to an extra wakeup but
>>>>>>> that's better than missing it entirely.
>>>>>> Thanks a bunch for the comment! So the trick I was using here is to
>>>>>> modify the timespec from the kick function before sending the IPI
>>>>>> signal. That way, we know that either we are inside the sleep (where the
>>>>>> signal wakes it up) or we are outside the sleep (where timespec={} will
>>>>>> make it return immediately).
>>>>>>
>>>>>> The only race I can think of is if nanosleep does calculations based on
>>>>>> the timespec and we happen to send the signal right there and then.
>>>>> Yes that's the race I was thinking of. Admittedly it's a small window
>>>>> but it's theoretically possible and part of the reason why pselect was
>>>>> created.
>>>>>
>>>>>> The problem with blocking IPIs is basically what Frank was describing
>>>>>> earlier: How do you unset the IPI signal pending status? If the signal
>>>>>> is never delivered, how can pselect differentiate "signal from last time
>>>>>> is still pending" from "new signal because I got an IPI"?
>>>>> In this case we would take the additional wakeup which should be
>>>>> harmless since we will take the WFx exit again and put us in the
>>>>> correct state. But that's a lot better than busy looping.
>>>>
>>>> I'm not sure I follow. I'm thinking of the following scenario:
>>>>
>>>>     - trap into WFI handler
>>>>     - go to sleep with blocked SIG_IPI
>>>>     - SIG_IPI arrives, pselect() exits
>>>>     - signal is still pending because it's blocked
>>>>     - enter guest
>>>>     - trap into WFI handler
>>>>     - run pselect(), but it immediately exits because SIG_IPI is still pending
>>>>
>>>> This was the loop I was seeing when running with SIG_IPI blocked. That's
>>>> part of the reason why I switched to a different model.
>>> What I observe is that when returning from a pending signal pselect
>>> consumes the signal (which is also consistent with my understanding of
>>> what pselect does). That means that it doesn't matter if we take a
>>> second WFx exit because once we reach the pselect in the second WFx
>>> exit the signal will have been consumed by the pselect in the first
>>> exit and we will just wait for the next one.
>>>
>> Aha! Thanks for the explanation. So, the first WFI in the series of
>> guest WFIs will likely wake up immediately? After a period without WFIs
>> there must be a pending SIG_IPI...
>>
>> It shouldn't be a critical issue though because (as defined in D1.16.2)
>> "the architecture permits a PE to leave the low-power state for any
>> reason, it is permissible for a PE to treat WFI as a NOP, but this is
>> not recommended for lowest power operation."
>>
>> BTW. I think a bit from the thread should go into the description of
>> patch 8, because it's not trivial and it would really be helpful to keep
>> in repo history. At least something like this (taken from an earlier
>> reply in the thread):
>>
>>    In this implementation IPI is blocked on the CPU thread at startup and
>>    pselect() is used to atomically unblock the signal and begin sleeping.
>>    The signal is sent unconditionally so there's no need to worry about
>>    races between actually sleeping and the "we think we're sleeping"
>>    state. It may lead to an extra wakeup but that's better than missing
>>    it entirely.
> Okay, I'll add something like that to the next version of the patch I send out.


If this is the only change, I've already added it for v4. If you want me 
to change it further, just let me know what to replace the patch 
description with.


Alex
Roman Bolshakov Dec. 3, 2020, 11:04 p.m. UTC | #20
On Thu, Dec 03, 2020 at 11:13:35PM +0100, Alexander Graf wrote:
> 
> On 03.12.20 19:42, Peter Collingbourne wrote:
> > On Thu, Dec 3, 2020 at 1:41 AM Roman Bolshakov <r.bolshakov@yadro.com> wrote:
> > > On Mon, Nov 30, 2020 at 04:00:11PM -0800, Peter Collingbourne wrote:
> > > > What I observe is that when returning from a pending signal pselect
> > > > consumes the signal (which is also consistent with my understanding of
> > > > what pselect does). That means that it doesn't matter if we take a
> > > > second WFx exit because once we reach the pselect in the second WFx
> > > > exit the signal will have been consumed by the pselect in the first
> > > > exit and we will just wait for the next one.
> > > > 
> > > Aha! Thanks for the explanation. So, the first WFI in the series of
> > > guest WFIs will likely wake up immediately? After a period without WFIs
> > > there must be a pending SIG_IPI...
> > > 
> > > It shouldn't be a critical issue though because (as defined in D1.16.2)
> > > "the architecture permits a PE to leave the low-power state for any
> > > reason, it is permissible for a PE to treat WFI as a NOP, but this is
> > > not recommended for lowest power operation."
> > > 
> > > BTW. I think a bit from the thread should go into the description of
> > > patch 8, because it's not trivial and it would really be helpful to keep
> > > in repo history. At least something like this (taken from an earlier
> > > reply in the thread):
> > > 
> > >    In this implementation IPI is blocked on the CPU thread at startup and
> > >    pselect() is used to atomically unblock the signal and begin sleeping.
> > >    The signal is sent unconditionally so there's no need to worry about
> > >    races between actually sleeping and the "we think we're sleeping"
> > >    state. It may lead to an extra wakeup but that's better than missing
> > >    it entirely.
> > Okay, I'll add something like that to the next version of the patch I send out.
> 
> 
> If this is the only change, I've already added it for v4. If you want me to
> change it further, just let me know what to replace the patch description
> with.
> 
> 

Thanks, Alex.

I'm fine with the description and all set.

-Roman
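
For readers following the pselect() discussion above, here is a minimal standalone sketch of the pattern, not the code from patch 8. Assumptions: SIGUSR1 stands in for QEMU's SIG_IPI, the helper names wake_setup() and wake_wait() are hypothetical, and sigprocmask() is used instead of pthread_sigmask() because the sketch is single-threaded. The signal stays blocked except while pselect() sleeps, so a kick sent before the sleep is left pending and ends the next pselect() instead of being lost.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

/* Stand-in for QEMU's SIG_IPI (assumption made for this sketch). */
#define WAKE_SIGNAL SIGUSR1

/* Empty handler: the signal only has to interrupt pselect(). */
static void dummy_signal(int sig)
{
    (void)sig;
}

/*
 * Block WAKE_SIGNAL on the caller and fill wait_mask with the mask that
 * pselect() installs while sleeping (the current mask minus WAKE_SIGNAL).
 */
static void wake_setup(sigset_t *wait_mask)
{
    struct sigaction sigact;
    sigset_t block;
    int r;

    memset(&sigact, 0, sizeof(sigact));
    sigact.sa_handler = dummy_signal;
    r = sigaction(WAKE_SIGNAL, &sigact, NULL);
    assert(r == 0);

    sigemptyset(&block);
    sigaddset(&block, WAKE_SIGNAL);
    /* A vCPU thread would use pthread_sigmask(); same semantics here. */
    r = sigprocmask(SIG_BLOCK, &block, NULL);
    assert(r == 0);

    /* Query the current mask, then drop WAKE_SIGNAL from the copy. */
    r = sigprocmask(SIG_BLOCK, NULL, wait_mask);
    assert(r == 0);
    sigdelset(wait_mask, WAKE_SIGNAL);
}

/*
 * Sleep until WAKE_SIGNAL arrives or timeout_ms elapses.
 * Returns 1 if a signal ended the sleep, 0 on timeout, -1 on other errors.
 */
static int wake_wait(const sigset_t *wait_mask, long timeout_ms)
{
    struct timespec ts;
    int r;

    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

    /* Atomically install wait_mask and sleep; no fds are watched. */
    r = pselect(0, NULL, NULL, NULL, &ts, wait_mask);
    if (r == -1 && errno == EINTR) {
        return 1;
    }
    return r;
}
```

If a caller runs wake_setup(), sends itself the signal with raise(WAKE_SIGNAL) and then calls wake_wait() twice, the first call should return right away because the pending signal is delivered inside pselect(). The second call should time out, since the first one already consumed the signal. That matches the "extra wakeup, but never a missed one" behaviour described in the thread.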

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 68bc160f41..ca4b6d9279 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -444,9 +444,16 @@  M: Cameron Esfahani <dirty@apple.com>
 M: Roman Bolshakov <r.bolshakov@yadro.com>
 W: https://wiki.qemu.org/Features/HVF
 S: Maintained
-F: accel/stubs/hvf-stub.c
 F: target/i386/hvf/
+
+HVF
+M: Cameron Esfahani <dirty@apple.com>
+M: Roman Bolshakov <r.bolshakov@yadro.com>
+W: https://wiki.qemu.org/Features/HVF
+S: Maintained
+F: accel/hvf/
 F: include/sysemu/hvf.h
+F: include/sysemu/hvf_int.h
 
 WHPX CPUs
 M: Sunil Muthuswamy <sunilmut@microsoft.com>
diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
new file mode 100644
index 0000000000..47d77a472a
--- /dev/null
+++ b/accel/hvf/hvf-all.c
@@ -0,0 +1,56 @@ 
+/*
+ * QEMU Hypervisor.framework support
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
+#include "sysemu/runstate.h"
+
+#include "qemu/main-loop.h"
+#include "sysemu/accel.h"
+
+#include <Hypervisor/Hypervisor.h>
+
+bool hvf_allowed;
+HVFState *hvf_state;
+
+void assert_hvf_ok(hv_return_t ret)
+{
+    if (ret == HV_SUCCESS) {
+        return;
+    }
+
+    switch (ret) {
+    case HV_ERROR:
+        error_report("Error: HV_ERROR");
+        break;
+    case HV_BUSY:
+        error_report("Error: HV_BUSY");
+        break;
+    case HV_BAD_ARGUMENT:
+        error_report("Error: HV_BAD_ARGUMENT");
+        break;
+    case HV_NO_RESOURCES:
+        error_report("Error: HV_NO_RESOURCES");
+        break;
+    case HV_NO_DEVICE:
+        error_report("Error: HV_NO_DEVICE");
+        break;
+    case HV_UNSUPPORTED:
+        error_report("Error: HV_UNSUPPORTED");
+        break;
+    default:
+        error_report("Unknown Error");
+    }
+
+    abort();
+}
diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
new file mode 100644
index 0000000000..f9bb5502b7
--- /dev/null
+++ b/accel/hvf/hvf-cpus.c
@@ -0,0 +1,468 @@ 
+/*
+ * Copyright 2008 IBM Corporation
+ *           2008 Red Hat, Inc.
+ * Copyright 2011 Intel Corporation
+ * Copyright 2016 Veertu, Inc.
+ * Copyright 2017 The Android Open Source Project
+ *
+ * QEMU Hypervisor.framework support
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ * This file contains code under public domain from the hvdos project:
+ * https://github.com/mist64/hvdos
+ *
+ * Parts Copyright (c) 2011 NetApp, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "exec/address-spaces.h"
+#include "exec/exec-all.h"
+#include "sysemu/cpus.h"
+#include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
+#include "sysemu/runstate.h"
+#include "qemu/guest-random.h"
+
+#include <Hypervisor/Hypervisor.h>
+
+/* Memory slots */
+
+struct mac_slot {
+    int present;
+    uint64_t size;
+    uint64_t gpa_start;
+    uint64_t gva;
+};
+
+hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
+{
+    hvf_slot *slot;
+    int x;
+    for (x = 0; x < hvf_state->num_slots; ++x) {
+        slot = &hvf_state->slots[x];
+        if (slot->size && start < (slot->start + slot->size) &&
+            (start + size) > slot->start) {
+            return slot;
+        }
+    }
+    return NULL;
+}
+
+struct mac_slot mac_slots[32];
+
+static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
+{
+    struct mac_slot *macslot;
+    hv_return_t ret;
+
+    macslot = &mac_slots[slot->slot_id];
+
+    if (macslot->present) {
+        if (macslot->size != slot->size) {
+            macslot->present = 0;
+            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
+            assert_hvf_ok(ret);
+        }
+    }
+
+    if (!slot->size) {
+        return 0;
+    }
+
+    macslot->present = 1;
+    macslot->gpa_start = slot->start;
+    macslot->size = slot->size;
+    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
+    assert_hvf_ok(ret);
+    return 0;
+}
+
+static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
+{
+    hvf_slot *mem;
+    MemoryRegion *area = section->mr;
+    bool writeable = !area->readonly && !area->rom_device;
+    hv_memory_flags_t flags;
+
+    if (!memory_region_is_ram(area)) {
+        if (writeable) {
+            return;
+        } else if (!memory_region_is_romd(area)) {
+            /*
+             * If the memory device is not in romd_mode, then we actually want
+             * to remove the hvf memory slot so all accesses will trap.
+             */
+            add = false;
+        }
+    }
+
+    mem = hvf_find_overlap_slot(
+            section->offset_within_address_space,
+            int128_get64(section->size));
+
+    if (mem && add) {
+        if (mem->size == int128_get64(section->size) &&
+            mem->start == section->offset_within_address_space &&
+            mem->mem == (memory_region_get_ram_ptr(area) +
+            section->offset_within_region)) {
+            return; /* The same region is already registered; nothing to do. */
+        }
+    }
+
+    /* Region needs to be reset. Set the size to 0 and remap it. */
+    if (mem) {
+        mem->size = 0;
+        if (do_hvf_set_memory(mem, 0)) {
+            error_report("Failed to reset overlapping slot");
+            abort();
+        }
+    }
+
+    if (!add) {
+        return;
+    }
+
+    if (area->readonly ||
+        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
+        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
+    } else {
+        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
+    }
+
+    /* Now make a new slot. */
+    int x;
+
+    for (x = 0; x < hvf_state->num_slots; ++x) {
+        mem = &hvf_state->slots[x];
+        if (!mem->size) {
+            break;
+        }
+    }
+
+    if (x == hvf_state->num_slots) {
+        error_report("No free slots");
+        abort();
+    }
+
+    mem->size = int128_get64(section->size);
+    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
+    mem->start = section->offset_within_address_space;
+    mem->region = area;
+
+    if (do_hvf_set_memory(mem, flags)) {
+        error_report("Error registering new memory slot");
+        abort();
+    }
+}
+
+static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
+{
+    hvf_slot *slot;
+
+    slot = hvf_find_overlap_slot(
+            section->offset_within_address_space,
+            int128_get64(section->size));
+
+    /* protect region against writes; begin tracking it */
+    if (on) {
+        slot->flags |= HVF_SLOT_LOG;
+        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
+                      HV_MEMORY_READ);
+    /* stop tracking region */
+    } else {
+        slot->flags &= ~HVF_SLOT_LOG;
+        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
+                      HV_MEMORY_READ | HV_MEMORY_WRITE);
+    }
+}
+
+static void hvf_log_start(MemoryListener *listener,
+                          MemoryRegionSection *section, int old, int new)
+{
+    if (old != 0) {
+        return;
+    }
+
+    hvf_set_dirty_tracking(section, 1);
+}
+
+static void hvf_log_stop(MemoryListener *listener,
+                         MemoryRegionSection *section, int old, int new)
+{
+    if (new != 0) {
+        return;
+    }
+
+    hvf_set_dirty_tracking(section, 0);
+}
+
+static void hvf_log_sync(MemoryListener *listener,
+                         MemoryRegionSection *section)
+{
+    /*
+     * sync of dirty pages is handled elsewhere; just make sure we keep
+     * tracking the region.
+     */
+    hvf_set_dirty_tracking(section, 1);
+}
+
+static void hvf_region_add(MemoryListener *listener,
+                           MemoryRegionSection *section)
+{
+    hvf_set_phys_mem(section, true);
+}
+
+static void hvf_region_del(MemoryListener *listener,
+                           MemoryRegionSection *section)
+{
+    hvf_set_phys_mem(section, false);
+}
+
+static MemoryListener hvf_memory_listener = {
+    .priority = 10,
+    .region_add = hvf_region_add,
+    .region_del = hvf_region_del,
+    .log_start = hvf_log_start,
+    .log_stop = hvf_log_stop,
+    .log_sync = hvf_log_sync,
+};
+
+static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
+{
+    if (!cpu->vcpu_dirty) {
+        hvf_get_registers(cpu);
+        cpu->vcpu_dirty = true;
+    }
+}
+
+static void hvf_cpu_synchronize_state(CPUState *cpu)
+{
+    if (!cpu->vcpu_dirty) {
+        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
+    }
+}
+
+static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
+                                              run_on_cpu_data arg)
+{
+    hvf_put_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+}
+
+static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
+                                             run_on_cpu_data arg)
+{
+    hvf_put_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void hvf_cpu_synchronize_post_init(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+}
+
+static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
+                                              run_on_cpu_data arg)
+{
+    cpu->vcpu_dirty = true;
+}
+
+static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+static void hvf_vcpu_destroy(CPUState *cpu)
+{
+    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
+    assert_hvf_ok(ret);
+
+    hvf_arch_vcpu_destroy(cpu);
+}
+
+static void dummy_signal(int sig)
+{
+}
+
+static int hvf_init_vcpu(CPUState *cpu)
+{
+    int r;
+
+    /* init cpu signals */
+    sigset_t set;
+    struct sigaction sigact;
+
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = dummy_signal;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    pthread_sigmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+
+#ifdef __aarch64__
+    r = hv_vcpu_create(&cpu->hvf_fd, (hv_vcpu_exit_t **)&cpu->hvf_exit, NULL);
+#else
+    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
+#endif
+    cpu->vcpu_dirty = 1;
+    assert_hvf_ok(r);
+
+    return hvf_arch_init_vcpu(cpu);
+}
+
+/*
+ * The HVF-specific vCPU thread function. On x86 hosts, this should only run
+ * when the CPU supports the VMX "unrestricted guest" feature.
+ */
+static void *hvf_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    int r;
+
+    assert(hvf_enabled());
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+
+    hvf_init_vcpu(cpu);
+
+    /* signal CPU creation */
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = hvf_vcpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    hvf_vcpu_destroy(cpu);
+    cpu_thread_signal_destroyed(cpu);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static void hvf_start_vcpu_thread(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    /*
+     * HVF currently does not support TCG, and only runs in
+     * unrestricted-guest mode.
+     */
+    assert(hvf_enabled());
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, hvf_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+}
+
+static const CpusAccel hvf_cpus = {
+    .create_vcpu_thread = hvf_start_vcpu_thread,
+
+    .synchronize_post_reset = hvf_cpu_synchronize_post_reset,
+    .synchronize_post_init = hvf_cpu_synchronize_post_init,
+    .synchronize_state = hvf_cpu_synchronize_state,
+    .synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm,
+};
+
+static int hvf_accel_init(MachineState *ms)
+{
+    int x;
+    hv_return_t ret;
+    HVFState *s;
+
+    ret = hv_vm_create(HV_VM_DEFAULT);
+    assert_hvf_ok(ret);
+
+    s = g_new0(HVFState, 1);
+
+    s->num_slots = 32;
+    for (x = 0; x < s->num_slots; ++x) {
+        s->slots[x].size = 0;
+        s->slots[x].slot_id = x;
+    }
+
+    hvf_state = s;
+    memory_listener_register(&hvf_memory_listener, &address_space_memory);
+    cpus_register_accel(&hvf_cpus);
+    return 0;
+}
+
+static void hvf_accel_class_init(ObjectClass *oc, void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+    ac->name = "HVF";
+    ac->init_machine = hvf_accel_init;
+    ac->allowed = &hvf_allowed;
+}
+
+static const TypeInfo hvf_accel_type = {
+    .name = TYPE_HVF_ACCEL,
+    .parent = TYPE_ACCEL,
+    .class_init = hvf_accel_class_init,
+};
+
+static void hvf_type_init(void)
+{
+    type_register_static(&hvf_accel_type);
+}
+
+type_init(hvf_type_init);
diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
new file mode 100644
index 0000000000..dfd6b68dc7
--- /dev/null
+++ b/accel/hvf/meson.build
@@ -0,0 +1,7 @@ 
+hvf_ss = ss.source_set()
+hvf_ss.add(files(
+  'hvf-all.c',
+  'hvf-cpus.c',
+))
+
+specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
diff --git a/accel/meson.build b/accel/meson.build
index b26cca227a..6de12ce5d5 100644
--- a/accel/meson.build
+++ b/accel/meson.build
@@ -1,5 +1,6 @@ 
 softmmu_ss.add(files('accel.c'))
 
+subdir('hvf')
 subdir('qtest')
 subdir('kvm')
 subdir('tcg')
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
new file mode 100644
index 0000000000..de9bad23a8
--- /dev/null
+++ b/include/sysemu/hvf_int.h
@@ -0,0 +1,69 @@ 
+/*
+ * QEMU Hypervisor.framework (HVF) support
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/* header to be included in HVF-specific code */
+
+#ifndef HVF_INT_H
+#define HVF_INT_H
+
+#include <Hypervisor/Hypervisor.h>
+
+#define HVF_MAX_VCPU 0x10
+
+extern struct hvf_state hvf_global;
+
+struct hvf_vm {
+    int id;
+    struct hvf_vcpu_state *vcpus[HVF_MAX_VCPU];
+};
+
+struct hvf_state {
+    uint32_t version;
+    struct hvf_vm *vm;
+    uint64_t mem_quota;
+};
+
+/* hvf_slot flags */
+#define HVF_SLOT_LOG (1 << 0)
+
+typedef struct hvf_slot {
+    uint64_t start;
+    uint64_t size;
+    uint8_t *mem;
+    int slot_id;
+    uint32_t flags;
+    MemoryRegion *region;
+} hvf_slot;
+
+typedef struct hvf_vcpu_caps {
+    uint64_t vmx_cap_pinbased;
+    uint64_t vmx_cap_procbased;
+    uint64_t vmx_cap_procbased2;
+    uint64_t vmx_cap_entry;
+    uint64_t vmx_cap_exit;
+    uint64_t vmx_cap_preemption_timer;
+} hvf_vcpu_caps;
+
+struct HVFState {
+    AccelState parent;
+    hvf_slot slots[32];
+    int num_slots;
+
+    hvf_vcpu_caps *hvf_caps;
+};
+extern HVFState *hvf_state;
+
+void assert_hvf_ok(hv_return_t ret);
+int hvf_get_registers(CPUState *cpu);
+int hvf_put_registers(CPUState *cpu);
+int hvf_arch_init_vcpu(CPUState *cpu);
+void hvf_arch_vcpu_destroy(CPUState *cpu);
+int hvf_vcpu_exec(CPUState *cpu);
+hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
+
+#endif
diff --git a/target/i386/hvf/hvf-cpus.c b/target/i386/hvf/hvf-cpus.c
deleted file mode 100644
index 817b3d7452..0000000000
--- a/target/i386/hvf/hvf-cpus.c
+++ /dev/null
@@ -1,131 +0,0 @@ 
-/*
- * Copyright 2008 IBM Corporation
- *           2008 Red Hat, Inc.
- * Copyright 2011 Intel Corporation
- * Copyright 2016 Veertu, Inc.
- * Copyright 2017 The Android Open Source Project
- *
- * QEMU Hypervisor.framework support
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * This file contain code under public domain from the hvdos project:
- * https://github.com/mist64/hvdos
- *
- * Parts Copyright (c) 2011 NetApp, Inc.
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- *    notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *    notice, this list of conditions and the following disclaimer in the
- *    documentation and/or other materials provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED.  IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
- * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- * OR SERVICES;