
[v5,2/2] Xen: Use the ioreq-server API when available

Message ID 1417776605-36309-3-git-send-email-paul.durrant@citrix.com
State New

Commit Message

Paul Durrant Dec. 5, 2014, 10:50 a.m. UTC
The ioreq-server API added to Xen 4.5 offers better security than
the existing Xen/QEMU interface because the shared pages that are
used to pass emulation request/results back and forth are removed
from the guest's memory space before any requests are serviced.
This prevents the guest from mapping these pages (they are in a
well known location) and attempting to attack QEMU by synthesizing
its own request structures. Hence, this patch modifies configure
to detect whether the API is available, and adds the necessary
code to use the API if it is.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefan Weil <sw@weilnetz.de>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Alexander Graf <agraf@suse.de>
---
 configure                   |   29 ++++++
 include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
 trace-events                |    9 ++
 xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
 4 files changed, 399 insertions(+), 22 deletions(-)

Comments

Don Slutz Jan. 28, 2015, 7:32 p.m. UTC | #1
On 12/05/14 05:50, Paul Durrant wrote:
> The ioreq-server API added to Xen 4.5 offers better security than
> the existing Xen/QEMU interface because the shared pages that are
> used to pass emulation request/results back and forth are removed
> from the guest's memory space before any requests are serviced.
> This prevents the guest from mapping these pages (they are in a
> well known location) and attempting to attack QEMU by synthesizing
> its own request structures. Hence, this patch modifies configure
> to detect whether the API is available, and adds the necessary
> code to use the API if it is.

This patch (which is now on xenbits qemu staging) is causing me
issues.

So far I have tracked it back to hvm_select_ioreq_server(),
which selects the "default_ioreq_server".  Since I have only one
QEMU, it is both the "default_ioreq_server" and an enabled
2nd ioreq_server.  I am still working out why my changes
are causing this.  More below.

This patch causes QEMU to only call xc_evtchn_bind_interdomain()
for the enabled 2nd ioreq_server.  So when (if)
hvm_select_ioreq_server() selects the "default_ioreq_server", the
guest hangs on an I/O.

Using the debug key 'e':

(XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
(XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
(XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
(XEN) [2015-01-28 18:57:07]     port [p/m/s]
(XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
(XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
(XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
(XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
(XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
(XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
(XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
(XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
(XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
(XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
(XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
(XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
(XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
(XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
(XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
(XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
(XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
(XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
(XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
(XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
(XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
(XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
(XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
(XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
(XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
(XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
(XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
(XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
(XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
(XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
(XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
(XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
(XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
(XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
(XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
(XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
(XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
(XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
(XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
(XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
(XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
(XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
(XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
(XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
(XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
(XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
(XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
(XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
(XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
(XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
(XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
(XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
(XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
(XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
(XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
(XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
(XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
(XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
(XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
(XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
(XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
(XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
(XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
(XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
(XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
(XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
(XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
(XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
(XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
(XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
(XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
(XEN) [2015-01-28 18:57:07]     port [p/m/s]
(XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
(XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
(XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
(XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
(XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
(XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
(XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
(XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
(XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
(XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
(XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
(XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
(XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
(XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
(XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
(XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0

You can see that domain 1 has only half of its event channels
fully set up.  So when (if) hvm_send_assist_req_to_ioreq_server()
does:

            notify_via_xen_event_channel(d, port);

Nothing happens and you hang in hvm_wait_for_io() forever.


This does raise the questions:

1) Does this patch cause extra event channels to be created
   that cannot be used?

2) Should the "default_ioreq_server" be deleted?


Not sure the right way to go.

    -Don Slutz


> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Michael Tokarev <mjt@tls.msk.ru>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Stefan Weil <sw@weilnetz.de>
> Cc: Olaf Hering <olaf@aepfle.de>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: Alexander Graf <agraf@suse.de>
> ---
>  configure                   |   29 ++++++
>  include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
>  trace-events                |    9 ++
>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
>  4 files changed, 399 insertions(+), 22 deletions(-)
> 
> diff --git a/configure b/configure
> index 47048f0..b1f8c2a 100755
> --- a/configure
> +++ b/configure
> @@ -1877,6 +1877,32 @@ int main(void) {
>    xc_gnttab_open(NULL, 0);
>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
> +  return 0;
> +}
> +EOF
> +      compile_prog "" "$xen_libs"
> +    then
> +    xen_ctrl_version=450
> +    xen=yes
> +
> +  elif
> +      cat > $TMPC <<EOF &&
> +#include <xenctrl.h>
> +#include <xenstore.h>
> +#include <stdint.h>
> +#include <xen/hvm/hvm_info_table.h>
> +#if !defined(HVM_MAX_VCPUS)
> +# error HVM_MAX_VCPUS not defined
> +#endif
> +int main(void) {
> +  xc_interface *xc;
> +  xs_daemon_open();
> +  xc = xc_interface_open(0, 0, 0);
> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
> +  xc_gnttab_open(NULL, 0);
> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>    return 0;
>  }
>  EOF
> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
>      echo "Target Sparc Arch $sparc_cpu"
>  fi
>  echo "xen support       $xen"
> +if test "$xen" = "yes" ; then
> +  echo "xen ctrl version  $xen_ctrl_version"
> +fi
>  echo "brlapi support    $brlapi"
>  echo "bluez  support    $bluez"
>  echo "Documentation     $docs"
> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
> index 95612a4..519696f 100644
> --- a/include/hw/xen/xen_common.h
> +++ b/include/hw/xen/xen_common.h
> @@ -16,7 +16,9 @@
>  
>  #include "hw/hw.h"
>  #include "hw/xen/xen.h"
> +#include "hw/pci/pci.h"
>  #include "qemu/queue.h"
> +#include "trace.h"
>  
>  /*
>   * We don't support Xen prior to 3.3.0.
> @@ -179,4 +181,225 @@ static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
>  }
>  #endif
>  
> +/* Xen before 4.5 */
> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
> +
> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
> +#endif
> +
> +#define IOREQ_TYPE_PCI_CONFIG 2
> +
> +typedef uint32_t ioservid_t;
> +
> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
> +                                          ioservid_t ioservid,
> +                                          MemoryRegionSection *section)
> +{
> +}
> +
> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid,
> +                                            MemoryRegionSection *section)
> +{
> +}
> +
> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
> +                                      ioservid_t ioservid,
> +                                      MemoryRegionSection *section)
> +{
> +}
> +
> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
> +                                        ioservid_t ioservid,
> +                                        MemoryRegionSection *section)
> +{
> +}
> +
> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
> +                                  ioservid_t ioservid,
> +                                  PCIDevice *pci_dev)
> +{
> +}
> +
> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
> +                                    ioservid_t ioservid,
> +                                    PCIDevice *pci_dev)
> +{
> +}
> +
> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
> +                                          ioservid_t *ioservid)
> +{
> +    return 0;
> +}
> +
> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid)
> +{
> +}
> +
> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid,
> +                                            xen_pfn_t *ioreq_pfn,
> +                                            xen_pfn_t *bufioreq_pfn,
> +                                            evtchn_port_t *bufioreq_evtchn)
> +{
> +    unsigned long param;
> +    int rc;
> +
> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
> +    if (rc < 0) {
> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
> +        return -1;
> +    }
> +
> +    *ioreq_pfn = param;
> +
> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
> +    if (rc < 0) {
> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
> +        return -1;
> +    }
> +
> +    *bufioreq_pfn = param;
> +
> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
> +                          &param);
> +    if (rc < 0) {
> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
> +        return -1;
> +    }
> +
> +    *bufioreq_evtchn = param;
> +
> +    return 0;
> +}
> +
> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
> +                                             ioservid_t ioservid,
> +                                             bool enable)
> +{
> +    return 0;
> +}
> +
> +/* Xen 4.5 */
> +#else
> +
> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
> +                                          ioservid_t ioservid,
> +                                          MemoryRegionSection *section)
> +{
> +    hwaddr start_addr = section->offset_within_address_space;
> +    ram_addr_t size = int128_get64(section->size);
> +    hwaddr end_addr = start_addr + size - 1;
> +
> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
> +                                        start_addr, end_addr);
> +}
> +
> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid,
> +                                            MemoryRegionSection *section)
> +{
> +    hwaddr start_addr = section->offset_within_address_space;
> +    ram_addr_t size = int128_get64(section->size);
> +    hwaddr end_addr = start_addr + size - 1;
> +
> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
> +                                            start_addr, end_addr);
> +}
> +
> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
> +                                      ioservid_t ioservid,
> +                                      MemoryRegionSection *section)
> +{
> +    hwaddr start_addr = section->offset_within_address_space;
> +    ram_addr_t size = int128_get64(section->size);
> +    hwaddr end_addr = start_addr + size - 1;
> +
> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
> +                                        start_addr, end_addr);
> +}
> +
> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
> +                                        ioservid_t ioservid,
> +                                        MemoryRegionSection *section)
> +{
> +    hwaddr start_addr = section->offset_within_address_space;
> +    ram_addr_t size = int128_get64(section->size);
> +    hwaddr end_addr = start_addr + size - 1;
> +
> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
> +                                            start_addr, end_addr);
> +}
> +
> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
> +                                  ioservid_t ioservid,
> +                                  PCIDevice *pci_dev)
> +{
> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
> +                                      0, pci_bus_num(pci_dev->bus),
> +                                      PCI_SLOT(pci_dev->devfn),
> +                                      PCI_FUNC(pci_dev->devfn));
> +}
> +
> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
> +                                    ioservid_t ioservid,
> +                                    PCIDevice *pci_dev)
> +{
> +    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
> +                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
> +    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
> +                                          0, pci_bus_num(pci_dev->bus),
> +                                          PCI_SLOT(pci_dev->devfn),
> +                                          PCI_FUNC(pci_dev->devfn));
> +}
> +
> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
> +                                          ioservid_t *ioservid)
> +{
> +    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
> +
> +    if (rc == 0) {
> +        trace_xen_ioreq_server_create(*ioservid);
> +    }
> +
> +    return rc;
> +}
> +
> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid)
> +{
> +    trace_xen_ioreq_server_destroy(ioservid);
> +    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
> +}
> +
> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
> +                                            ioservid_t ioservid,
> +                                            xen_pfn_t *ioreq_pfn,
> +                                            xen_pfn_t *bufioreq_pfn,
> +                                            evtchn_port_t *bufioreq_evtchn)
> +{
> +    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
> +                                        ioreq_pfn, bufioreq_pfn,
> +                                        bufioreq_evtchn);
> +}
> +
> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
> +                                             ioservid_t ioservid,
> +                                             bool enable)
> +{
> +    trace_xen_ioreq_server_state(ioservid, enable);
> +    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
> +}
> +
> +#endif
> +
>  #endif /* QEMU_HW_XEN_COMMON_H */
> diff --git a/trace-events b/trace-events
> index b5722ea..abd1118 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages:
>  # xen-hvm.c
>  xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
> +xen_ioreq_server_create(uint32_t id) "id: %u"
> +xen_ioreq_server_destroy(uint32_t id) "id: %u"
> +xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
> +xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> +xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> +xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> +xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>  
>  # xen-mapcache.c
>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
> diff --git a/xen-hvm.c b/xen-hvm.c
> index 7548794..31cb3ca 100644
> --- a/xen-hvm.c
> +++ b/xen-hvm.c
> @@ -85,9 +85,6 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>  }
>  #  define FMT_ioreq_size "u"
>  #endif
> -#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
> -#define HVM_PARAM_BUFIOREQ_EVTCHN 26
> -#endif
>  
>  #define BUFFER_IO_MAX_DELAY  100
>  
> @@ -101,6 +98,7 @@ typedef struct XenPhysmap {
>  } XenPhysmap;
>  
>  typedef struct XenIOState {
> +    ioservid_t ioservid;
>      shared_iopage_t *shared_page;
>      shared_vmport_iopage_t *shared_vmport_page;
>      buffered_iopage_t *buffered_io_page;
> @@ -117,6 +115,8 @@ typedef struct XenIOState {
>  
>      struct xs_handle *xenstore;
>      MemoryListener memory_listener;
> +    MemoryListener io_listener;
> +    DeviceListener device_listener;
>      QLIST_HEAD(, XenPhysmap) physmap;
>      hwaddr free_phys_offset;
>      const XenPhysmap *log_for_dirtybit;
> @@ -467,12 +467,23 @@ static void xen_set_memory(struct MemoryListener *listener,
>      bool log_dirty = memory_region_is_logging(section->mr);
>      hvmmem_type_t mem_type;
>  
> +    if (section->mr == &ram_memory) {
> +        return;
> +    } else {
> +        if (add) {
> +            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
> +                                   section);
> +        } else {
> +            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
> +                                     section);
> +        }
> +    }
> +
>      if (!memory_region_is_ram(section->mr)) {
>          return;
>      }
>  
> -    if (!(section->mr != &ram_memory
> -          && ( (log_dirty && add) || (!log_dirty && !add)))) {
> +    if (log_dirty != add) {
>          return;
>      }
>  
> @@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener *listener,
>      memory_region_unref(section->mr);
>  }
>  
> +static void xen_io_add(MemoryListener *listener,
> +                       MemoryRegionSection *section)
> +{
> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
> +
> +    memory_region_ref(section->mr);
> +
> +    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
> +}
> +
> +static void xen_io_del(MemoryListener *listener,
> +                       MemoryRegionSection *section)
> +{
> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
> +
> +    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
> +
> +    memory_region_unref(section->mr);
> +}
> +
> +static void xen_device_realize(DeviceListener *listener,
> +			       DeviceState *dev)
> +{
> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
> +
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
> +
> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
> +    }
> +}
> +
> +static void xen_device_unrealize(DeviceListener *listener,
> +				 DeviceState *dev)
> +{
> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
> +
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
> +
> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
> +    }
> +}
> +
>  static void xen_sync_dirty_bitmap(XenIOState *state,
>                                    hwaddr start_addr,
>                                    ram_addr_t size)
> @@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener = {
>      .priority = 10,
>  };
>  
> +static MemoryListener xen_io_listener = {
> +    .region_add = xen_io_add,
> +    .region_del = xen_io_del,
> +    .priority = 10,
> +};
> +
> +static DeviceListener xen_device_listener = {
> +    .realize = xen_device_realize,
> +    .unrealize = xen_device_unrealize,
> +};
> +
>  /* get the ioreq packets from share mem */
>  static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState *state, int vcpu)
>  {
> @@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state, ioreq_t *req)
>          case IOREQ_TYPE_INVALIDATE:
>              xen_invalidate_map_cache();
>              break;
> +        case IOREQ_TYPE_PCI_CONFIG: {
> +            uint32_t sbdf = req->addr >> 32;
> +            uint32_t val;
> +
> +            /* Fake a write to port 0xCF8 so that
> +             * the config space access will target the
> +             * correct device model.
> +             */
> +            val = (1u << 31) |
> +                  ((req->addr & 0x0f00) << 16) |
> +                  ((sbdf & 0xffff) << 8) |
> +                  (req->addr & 0xfc);
> +            do_outp(0xcf8, 4, val);
> +
> +            /* Now issue the config space access via
> +             * port 0xCFC
> +             */
> +            req->addr = 0xcfc | (req->addr & 0x03);
> +            cpu_ioreq_pio(req);
> +            break;
> +        }
>          default:
>              hw_error("Invalid ioreq type 0x%x\n", req->type);
>      }
> @@ -993,9 +1080,15 @@ static void xen_main_loop_prepare(XenIOState *state)
>  static void xen_hvm_change_state_handler(void *opaque, int running,
>                                           RunState rstate)
>  {
> +    XenIOState *state = opaque;
> +
>      if (running) {
> -        xen_main_loop_prepare((XenIOState *)opaque);
> +        xen_main_loop_prepare(state);
>      }
> +
> +    xen_set_ioreq_server_state(xen_xc, xen_domid,
> +                               state->ioservid,
> +                               (rstate == RUN_STATE_RUNNING));
>  }
>  
>  static void xen_exit_notifier(Notifier *n, void *data)
> @@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>                   MemoryRegion **ram_memory)
>  {
>      int i, rc;
> -    unsigned long ioreq_pfn;
> -    unsigned long bufioreq_evtchn;
> +    xen_pfn_t ioreq_pfn;
> +    xen_pfn_t bufioreq_pfn;
> +    evtchn_port_t bufioreq_evtchn;
>      XenIOState *state;
>  
>      state = g_malloc0(sizeof (XenIOState));
> @@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>          return -1;
>      }
>  
> +    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
> +    if (rc < 0) {
> +        perror("xen: ioreq server create");
> +        return -1;
> +    }
> +
>      state->exit.notify = xen_exit_notifier;
>      qemu_add_exit_notifier(&state->exit);
>  
> @@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>      state->wakeup.notify = xen_wakeup_notifier;
>      qemu_register_wakeup_notifier(&state->wakeup);
>  
> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn);
> +    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state->ioservid,
> +                                   &ioreq_pfn, &bufioreq_pfn,
> +                                   &bufioreq_evtchn);
> +    if (rc < 0) {
> +        hw_error("failed to get ioreq server info: error %d handle=" XC_INTERFACE_FMT,
> +                 errno, xen_xc);
> +    }
> +
>      DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
> +    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
> +    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
> +
>      state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
>                                                PROT_READ|PROT_WRITE, ioreq_pfn);
>      if (state->shared_page == NULL) {
> @@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>          hw_error("get vmport regs pfn returned error %d, rc=%d", errno, rc);
>      }
>  
> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
> -    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
> -    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
> -                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
> +    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid,
> +                                                   XC_PAGE_SIZE,
> +                                                   PROT_READ|PROT_WRITE,
> +                                                   bufioreq_pfn);
>      if (state->buffered_io_page == NULL) {
>          hw_error("map buffered IO page returned error %d", errno);
>      }
> @@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>      /* Note: cpus is empty at this point in init */
>      state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
>  
> +    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
> +    if (rc < 0) {
> +        hw_error("failed to enable ioreq server info: error %d handle=" XC_INTERFACE_FMT,
> +                 errno, xen_xc);
> +    }
> +
>      state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
>  
>      /* FIXME: how about if we overflow the page here? */
> @@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>          rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>                                          xen_vcpu_eport(state->shared_page, i));
>          if (rc == -1) {
> -            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
> +            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
>              return -1;
>          }
>          state->ioreq_local_port[i] = rc;
>      }
>  
> -    rc = xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_EVTCHN,
> -            &bufioreq_evtchn);
> -    if (rc < 0) {
> -        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
> -        return -1;
> -    }
>      rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
> -            (uint32_t)bufioreq_evtchn);
> +                                    bufioreq_evtchn);
>      if (rc == -1) {
> -        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
> +        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
>          return -1;
>      }
>      state->bufioreq_local_port = rc;
> @@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>      memory_listener_register(&state->memory_listener, &address_space_memory);
>      state->log_for_dirtybit = NULL;
>  
> +    state->io_listener = xen_io_listener;
> +    memory_listener_register(&state->io_listener, &address_space_io);
> +
> +    state->device_listener = xen_device_listener;
> +    device_listener_register(&state->device_listener);
> +
>      /* Initialize backend core & drivers */
>      if (xen_be_init() != 0) {
>          fprintf(stderr, "%s: xen backend core setup failed\n", __FUNCTION__);
>
Don Slutz Jan. 29, 2015, 12:05 a.m. UTC | #2
On 01/28/15 14:32, Don Slutz wrote:
> On 12/05/14 05:50, Paul Durrant wrote:
>> The ioreq-server API added to Xen 4.5 offers better security than
>> the existing Xen/QEMU interface because the shared pages that are
>> used to pass emulation request/results back and forth are removed
>> from the guest's memory space before any requests are serviced.
>> This prevents the guest from mapping these pages (they are in a
>> well known location) and attempting to attack QEMU by synthesizing
>> its own request structures. Hence, this patch modifies configure
>> to detect whether the API is available, and adds the necessary
>> code to use the API if it is.
> 
> This patch (which is now on xenbits qemu staging) is causing me
> issues.
> 

I have found the key.

The following will reproduce my issue:

1) xl create -p <config>
2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
   HVM_PARAM_BUFIOREQ_EVTCHN
3) xl unpause new guest

The guest will hang in hvmloader.
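To make step 2 concrete: below is a minimal sketch of reading the legacy
params with the same xc_get_hvm_param() calls that the pre-4.5 compatibility
code in the patch uses. It assumes a Xen 4.5 dom0 with the libxenctrl
headers installed; the default domid of 1 and the missing error checks on
the param reads are purely illustrative.

```c
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>   /* libxenctrl, as used by QEMU's xen_common.h */

/* Minimal sketch: merely fetching one of the legacy ioreq params is
 * enough to trigger creation of the default ioreq server in Xen.
 * The domid (1 by default here) is illustrative; pass the real one. */
int main(int argc, char **argv)
{
    domid_t domid = argc > 1 ? (domid_t)atoi(argv[1]) : 1;
    unsigned long ioreq_pfn, bufioreq_pfn, bufioreq_evtchn;
    xc_interface *xc = xc_interface_open(NULL, NULL, 0);

    if (!xc) {
        perror("xc_interface_open");
        return 1;
    }

    xc_get_hvm_param(xc, domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn);
    xc_get_hvm_param(xc, domid, HVM_PARAM_BUFIOREQ_PFN, &bufioreq_pfn);
    xc_get_hvm_param(xc, domid, HVM_PARAM_BUFIOREQ_EVTCHN, &bufioreq_evtchn);

    printf("ioreq_pfn=%lx bufioreq_pfn=%lx bufioreq_evtchn=%lx\n",
           ioreq_pfn, bufioreq_pfn, bufioreq_evtchn);

    xc_interface_close(xc);
    return 0;
}
```

Running something like this in dom0 between `xl create -p` and `xl unpause`
reproduces the hang described above.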

More in thread:


Subject: Re: [Xen-devel] [PATCH] ioreq-server: handle
IOREQ_TYPE_PCI_CONFIG in assist function
References: <1422385589-17316-1-git-send-email-wei.liu2@citrix.com>


    -Don Slutz


> So far I have tracked it back to hvm_select_ioreq_server()
> which selects the "default_ioreq_server".  Since I have only one
> QEMU, it is both the "default_ioreq_server" and an enabled
> 2nd ioreq_server.  I am still investigating why my changes
> are causing this.  More below.
> 
> This patch causes QEMU to only call xc_evtchn_bind_interdomain()
> for the enabled 2nd ioreq_server.  So when (if)
> hvm_select_ioreq_server() selects the "default_ioreq_server", the
> guest hangs on an I/O.
> 
> Using the debug key 'e':
> 
> (XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
> (XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
> (XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
> (XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
> (XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
> (XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
> (XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
> (XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
> (XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
> (XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
> (XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
> (XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
> (XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
> (XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
> (XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
> (XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
> (XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
> (XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
> (XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
> (XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
> (XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
> (XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
> (XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
> (XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
> (XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
> (XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
> (XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
> (XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
> (XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
> (XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
> (XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
> (XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
> (XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
> (XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
> (XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
> (XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
> (XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
> (XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
> (XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
> (XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
> (XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
> (XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
> (XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
> (XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
> (XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
> (XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
> (XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0
> 
> You can see that domain 1 has only half of its event channels
> fully set up.  So when (if) hvm_send_assist_req_to_ioreq_server()
> does:
> 
>             notify_via_xen_event_channel(d, port);
> 
> Nothing happens and you hang in hvm_wait_for_io() forever.
> 
> 
> This does raise the questions:
> 
> 1) Does this patch cause extra event channels to be created
>    that cannot be used?
> 
> 2) Should the "default_ioreq_server" be deleted?
> 
> 
> Not sure the right way to go.
> 
>     -Don Slutz
> 
> 
>>
>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Michael Tokarev <mjt@tls.msk.ru>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Cc: Stefan Weil <sw@weilnetz.de>
>> Cc: Olaf Hering <olaf@aepfle.de>
>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Cc: Alexander Graf <agraf@suse.de>
>> ---
>>  configure                   |   29 ++++++
>>  include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
>>  trace-events                |    9 ++
>>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
>>  4 files changed, 399 insertions(+), 22 deletions(-)
>>
>> diff --git a/configure b/configure
>> index 47048f0..b1f8c2a 100755
>> --- a/configure
>> +++ b/configure
>> @@ -1877,6 +1877,32 @@ int main(void) {
>>    xc_gnttab_open(NULL, 0);
>>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
>> +  return 0;
>> +}
>> +EOF
>> +      compile_prog "" "$xen_libs"
>> +    then
>> +    xen_ctrl_version=450
>> +    xen=yes
>> +
>> +  elif
>> +      cat > $TMPC <<EOF &&
>> +#include <xenctrl.h>
>> +#include <xenstore.h>
>> +#include <stdint.h>
>> +#include <xen/hvm/hvm_info_table.h>
>> +#if !defined(HVM_MAX_VCPUS)
>> +# error HVM_MAX_VCPUS not defined
>> +#endif
>> +int main(void) {
>> +  xc_interface *xc;
>> +  xs_daemon_open();
>> +  xc = xc_interface_open(0, 0, 0);
>> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
>> +  xc_gnttab_open(NULL, 0);
>> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>    return 0;
>>  }
>>  EOF
>> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
>>      echo "Target Sparc Arch $sparc_cpu"
>>  fi
>>  echo "xen support       $xen"
>> +if test "$xen" = "yes" ; then
>> +  echo "xen ctrl version  $xen_ctrl_version"
>> +fi
>>  echo "brlapi support    $brlapi"
>>  echo "bluez  support    $bluez"
>>  echo "Documentation     $docs"
>> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
>> index 95612a4..519696f 100644
>> --- a/include/hw/xen/xen_common.h
>> +++ b/include/hw/xen/xen_common.h
>> @@ -16,7 +16,9 @@
>>  
>>  #include "hw/hw.h"
>>  #include "hw/xen/xen.h"
>> +#include "hw/pci/pci.h"
>>  #include "qemu/queue.h"
>> +#include "trace.h"
>>  
>>  /*
>>   * We don't support Xen prior to 3.3.0.
>> @@ -179,4 +181,225 @@ static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
>>  }
>>  #endif
>>  
>> +/* Xen before 4.5 */
>> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
>> +
>> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>> +#endif
>> +
>> +#define IOREQ_TYPE_PCI_CONFIG 2
>> +
>> +typedef uint32_t ioservid_t;
>> +
>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>> +                                          ioservid_t ioservid,
>> +                                          MemoryRegionSection *section)
>> +{
>> +}
>> +
>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid,
>> +                                            MemoryRegionSection *section)
>> +{
>> +}
>> +
>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>> +                                      ioservid_t ioservid,
>> +                                      MemoryRegionSection *section)
>> +{
>> +}
>> +
>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>> +                                        ioservid_t ioservid,
>> +                                        MemoryRegionSection *section)
>> +{
>> +}
>> +
>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>> +                                  ioservid_t ioservid,
>> +                                  PCIDevice *pci_dev)
>> +{
>> +}
>> +
>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>> +                                    ioservid_t ioservid,
>> +                                    PCIDevice *pci_dev)
>> +{
>> +}
>> +
>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>> +                                          ioservid_t *ioservid)
>> +{
>> +    return 0;
>> +}
>> +
>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid)
>> +{
>> +}
>> +
>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid,
>> +                                            xen_pfn_t *ioreq_pfn,
>> +                                            xen_pfn_t *bufioreq_pfn,
>> +                                            evtchn_port_t *bufioreq_evtchn)
>> +{
>> +    unsigned long param;
>> +    int rc;
>> +
>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
>> +    if (rc < 0) {
>> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
>> +        return -1;
>> +    }
>> +
>> +    *ioreq_pfn = param;
>> +
>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
>> +    if (rc < 0) {
>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
>> +        return -1;
>> +    }
>> +
>> +    *bufioreq_pfn = param;
>> +
>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
>> +                          &param);
>> +    if (rc < 0) {
>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
>> +        return -1;
>> +    }
>> +
>> +    *bufioreq_evtchn = param;
>> +
>> +    return 0;
>> +}
>> +
>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>> +                                             ioservid_t ioservid,
>> +                                             bool enable)
>> +{
>> +    return 0;
>> +}
>> +
>> +/* Xen 4.5 */
>> +#else
>> +
>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>> +                                          ioservid_t ioservid,
>> +                                          MemoryRegionSection *section)
>> +{
>> +    hwaddr start_addr = section->offset_within_address_space;
>> +    ram_addr_t size = int128_get64(section->size);
>> +    hwaddr end_addr = start_addr + size - 1;
>> +
>> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
>> +                                        start_addr, end_addr);
>> +}
>> +
>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid,
>> +                                            MemoryRegionSection *section)
>> +{
>> +    hwaddr start_addr = section->offset_within_address_space;
>> +    ram_addr_t size = int128_get64(section->size);
>> +    hwaddr end_addr = start_addr + size - 1;
>> +
>> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
>> +                                            start_addr, end_addr);
>> +}
>> +
>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>> +                                      ioservid_t ioservid,
>> +                                      MemoryRegionSection *section)
>> +{
>> +    hwaddr start_addr = section->offset_within_address_space;
>> +    ram_addr_t size = int128_get64(section->size);
>> +    hwaddr end_addr = start_addr + size - 1;
>> +
>> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
>> +                                        start_addr, end_addr);
>> +}
>> +
>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>> +                                        ioservid_t ioservid,
>> +                                        MemoryRegionSection *section)
>> +{
>> +    hwaddr start_addr = section->offset_within_address_space;
>> +    ram_addr_t size = int128_get64(section->size);
>> +    hwaddr end_addr = start_addr + size - 1;
>> +
>> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
>> +                                            start_addr, end_addr);
>> +}
>> +
>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>> +                                  ioservid_t ioservid,
>> +                                  PCIDevice *pci_dev)
>> +{
>> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
>> +                                      0, pci_bus_num(pci_dev->bus),
>> +                                      PCI_SLOT(pci_dev->devfn),
>> +                                      PCI_FUNC(pci_dev->devfn));
>> +}
>> +
>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>> +                                    ioservid_t ioservid,
>> +                                    PCIDevice *pci_dev)
>> +{
>> +    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>> +                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>> +    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
>> +                                          0, pci_bus_num(pci_dev->bus),
>> +                                          PCI_SLOT(pci_dev->devfn),
>> +                                          PCI_FUNC(pci_dev->devfn));
>> +}
>> +
>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>> +                                          ioservid_t *ioservid)
>> +{
>> +    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
>> +
>> +    if (rc == 0) {
>> +        trace_xen_ioreq_server_create(*ioservid);
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid)
>> +{
>> +    trace_xen_ioreq_server_destroy(ioservid);
>> +    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
>> +}
>> +
>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>> +                                            ioservid_t ioservid,
>> +                                            xen_pfn_t *ioreq_pfn,
>> +                                            xen_pfn_t *bufioreq_pfn,
>> +                                            evtchn_port_t *bufioreq_evtchn)
>> +{
>> +    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
>> +                                        ioreq_pfn, bufioreq_pfn,
>> +                                        bufioreq_evtchn);
>> +}
>> +
>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>> +                                             ioservid_t ioservid,
>> +                                             bool enable)
>> +{
>> +    trace_xen_ioreq_server_state(ioservid, enable);
>> +    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
>> +}
>> +
>> +#endif
>> +
>>  #endif /* QEMU_HW_XEN_COMMON_H */
>> diff --git a/trace-events b/trace-events
>> index b5722ea..abd1118 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages:
>>  # xen-hvm.c
>>  xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
>>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>> +xen_ioreq_server_create(uint32_t id) "id: %u"
>> +xen_ioreq_server_destroy(uint32_t id) "id: %u"
>> +xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
>> +xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>> +xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>> +xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>> +xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>>  
>>  # xen-mapcache.c
>>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
>> diff --git a/xen-hvm.c b/xen-hvm.c
>> index 7548794..31cb3ca 100644
>> --- a/xen-hvm.c
>> +++ b/xen-hvm.c
>> @@ -85,9 +85,6 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>>  }
>>  #  define FMT_ioreq_size "u"
>>  #endif
>> -#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>> -#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>> -#endif
>>  
>>  #define BUFFER_IO_MAX_DELAY  100
>>  
>> @@ -101,6 +98,7 @@ typedef struct XenPhysmap {
>>  } XenPhysmap;
>>  
>>  typedef struct XenIOState {
>> +    ioservid_t ioservid;
>>      shared_iopage_t *shared_page;
>>      shared_vmport_iopage_t *shared_vmport_page;
>>      buffered_iopage_t *buffered_io_page;
>> @@ -117,6 +115,8 @@ typedef struct XenIOState {
>>  
>>      struct xs_handle *xenstore;
>>      MemoryListener memory_listener;
>> +    MemoryListener io_listener;
>> +    DeviceListener device_listener;
>>      QLIST_HEAD(, XenPhysmap) physmap;
>>      hwaddr free_phys_offset;
>>      const XenPhysmap *log_for_dirtybit;
>> @@ -467,12 +467,23 @@ static void xen_set_memory(struct MemoryListener *listener,
>>      bool log_dirty = memory_region_is_logging(section->mr);
>>      hvmmem_type_t mem_type;
>>  
>> +    if (section->mr == &ram_memory) {
>> +        return;
>> +    } else {
>> +        if (add) {
>> +            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
>> +                                   section);
>> +        } else {
>> +            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
>> +                                     section);
>> +        }
>> +    }
>> +
>>      if (!memory_region_is_ram(section->mr)) {
>>          return;
>>      }
>>  
>> -    if (!(section->mr != &ram_memory
>> -          && ( (log_dirty && add) || (!log_dirty && !add)))) {
>> +    if (log_dirty != add) {
>>          return;
>>      }
>>  
>> @@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener *listener,
>>      memory_region_unref(section->mr);
>>  }
>>  
>> +static void xen_io_add(MemoryListener *listener,
>> +                       MemoryRegionSection *section)
>> +{
>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>> +
>> +    memory_region_ref(section->mr);
>> +
>> +    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
>> +}
>> +
>> +static void xen_io_del(MemoryListener *listener,
>> +                       MemoryRegionSection *section)
>> +{
>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>> +
>> +    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
>> +
>> +    memory_region_unref(section->mr);
>> +}
>> +
>> +static void xen_device_realize(DeviceListener *listener,
>> +			       DeviceState *dev)
>> +{
>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
>> +
>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>> +
>> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>> +    }
>> +}
>> +
>> +static void xen_device_unrealize(DeviceListener *listener,
>> +				 DeviceState *dev)
>> +{
>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
>> +
>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>> +
>> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>> +    }
>> +}
>> +
>>  static void xen_sync_dirty_bitmap(XenIOState *state,
>>                                    hwaddr start_addr,
>>                                    ram_addr_t size)
>> @@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener = {
>>      .priority = 10,
>>  };
>>  
>> +static MemoryListener xen_io_listener = {
>> +    .region_add = xen_io_add,
>> +    .region_del = xen_io_del,
>> +    .priority = 10,
>> +};
>> +
>> +static DeviceListener xen_device_listener = {
>> +    .realize = xen_device_realize,
>> +    .unrealize = xen_device_unrealize,
>> +};
>> +
>>  /* get the ioreq packets from share mem */
>>  static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState *state, int vcpu)
>>  {
>> @@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state, ioreq_t *req)
>>          case IOREQ_TYPE_INVALIDATE:
>>              xen_invalidate_map_cache();
>>              break;
>> +        case IOREQ_TYPE_PCI_CONFIG: {
>> +            uint32_t sbdf = req->addr >> 32;
>> +            uint32_t val;
>> +
>> +            /* Fake a write to port 0xCF8 so that
>> +             * the config space access will target the
>> +             * correct device model.
>> +             */
>> +            val = (1u << 31) |
>> +                  ((req->addr & 0x0f00) << 16) |
>> +                  ((sbdf & 0xffff) << 8) |
>> +                  (req->addr & 0xfc);
>> +            do_outp(0xcf8, 4, val);
>> +
>> +            /* Now issue the config space access via
>> +             * port 0xCFC
>> +             */
>> +            req->addr = 0xcfc | (req->addr & 0x03);
>> +            cpu_ioreq_pio(req);
>> +            break;
>> +        }
>>          default:
>>              hw_error("Invalid ioreq type 0x%x\n", req->type);
>>      }
>> @@ -993,9 +1080,15 @@ static void xen_main_loop_prepare(XenIOState *state)
>>  static void xen_hvm_change_state_handler(void *opaque, int running,
>>                                           RunState rstate)
>>  {
>> +    XenIOState *state = opaque;
>> +
>>      if (running) {
>> -        xen_main_loop_prepare((XenIOState *)opaque);
>> +        xen_main_loop_prepare(state);
>>      }
>> +
>> +    xen_set_ioreq_server_state(xen_xc, xen_domid,
>> +                               state->ioservid,
>> +                               (rstate == RUN_STATE_RUNNING));
>>  }
>>  
>>  static void xen_exit_notifier(Notifier *n, void *data)
>> @@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>                   MemoryRegion **ram_memory)
>>  {
>>      int i, rc;
>> -    unsigned long ioreq_pfn;
>> -    unsigned long bufioreq_evtchn;
>> +    xen_pfn_t ioreq_pfn;
>> +    xen_pfn_t bufioreq_pfn;
>> +    evtchn_port_t bufioreq_evtchn;
>>      XenIOState *state;
>>  
>>      state = g_malloc0(sizeof (XenIOState));
>> @@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>          return -1;
>>      }
>>  
>> +    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
>> +    if (rc < 0) {
>> +        perror("xen: ioreq server create");
>> +        return -1;
>> +    }
>> +
>>      state->exit.notify = xen_exit_notifier;
>>      qemu_add_exit_notifier(&state->exit);
>>  
>> @@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>      state->wakeup.notify = xen_wakeup_notifier;
>>      qemu_register_wakeup_notifier(&state->wakeup);
>>  
>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn);
>> +    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state->ioservid,
>> +                                   &ioreq_pfn, &bufioreq_pfn,
>> +                                   &bufioreq_evtchn);
>> +    if (rc < 0) {
>> +        hw_error("failed to get ioreq server info: error %d handle=" XC_INTERFACE_FMT,
>> +                 errno, xen_xc);
>> +    }
>> +
>>      DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
>> +    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
>> +    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
>> +
>>      state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
>>                                                PROT_READ|PROT_WRITE, ioreq_pfn);
>>      if (state->shared_page == NULL) {
>> @@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>          hw_error("get vmport regs pfn returned error %d, rc=%d", errno, rc);
>>      }
>>  
>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
>> -    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
>> -    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
>> -                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
>> +    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid,
>> +                                                   XC_PAGE_SIZE,
>> +                                                   PROT_READ|PROT_WRITE,
>> +                                                   bufioreq_pfn);
>>      if (state->buffered_io_page == NULL) {
>>          hw_error("map buffered IO page returned error %d", errno);
>>      }
>> @@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>      /* Note: cpus is empty at this point in init */
>>      state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
>>  
>> +    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
>> +    if (rc < 0) {
>> +        hw_error("failed to enable ioreq server info: error %d handle=" XC_INTERFACE_FMT,
>> +                 errno, xen_xc);
>> +    }
>> +
>>      state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
>>  
>>      /* FIXME: how about if we overflow the page here? */
>> @@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>          rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>                                          xen_vcpu_eport(state->shared_page, i));
>>          if (rc == -1) {
>> -            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>> +            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
>>              return -1;
>>          }
>>          state->ioreq_local_port[i] = rc;
>>      }
>>  
>> -    rc = xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_EVTCHN,
>> -            &bufioreq_evtchn);
>> -    if (rc < 0) {
>> -        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
>> -        return -1;
>> -    }
>>      rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>> -            (uint32_t)bufioreq_evtchn);
>> +                                    bufioreq_evtchn);
>>      if (rc == -1) {
>> -        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>> +        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
>>          return -1;
>>      }
>>      state->bufioreq_local_port = rc;
>> @@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>      memory_listener_register(&state->memory_listener, &address_space_memory);
>>      state->log_for_dirtybit = NULL;
>>  
>> +    state->io_listener = xen_io_listener;
>> +    memory_listener_register(&state->io_listener, &address_space_io);
>> +
>> +    state->device_listener = xen_device_listener;
>> +    device_listener_register(&state->device_listener);
>> +
>>      /* Initialize backend core & drivers */
>>      if (xen_be_init() != 0) {
>>          fprintf(stderr, "%s: xen backend core setup failed\n", __FUNCTION__);
>>
>
Don Slutz Jan. 29, 2015, 12:57 a.m. UTC | #3
On 01/28/15 19:05, Don Slutz wrote:
> On 01/28/15 14:32, Don Slutz wrote:
>> On 12/05/14 05:50, Paul Durrant wrote:
>>> The ioreq-server API added to Xen 4.5 offers better security than
>>> the existing Xen/QEMU interface because the shared pages that are
>>> used to pass emulation request/results back and forth are removed
>>> from the guest's memory space before any requests are serviced.
>>> This prevents the guest from mapping these pages (they are in a
>>> well known location) and attempting to attack QEMU by synthesizing
>>> its own request structures. Hence, this patch modifies configure
>>> to detect whether the API is available, and adds the necessary
>>> code to use the API if it is.
>>
>> This patch (which is now on xenbits qemu staging) is causing me
>> issues.
>>
> 
> I have found the key.
> 
> The following will reproduce my issue:
> 
> 1) xl create -p <config>
> 2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
>    HVM_PARAM_BUFIOREQ_EVTCHN
> 3) xl unpause new guest
> 
> The guest will hang in hvmloader.
> 
> More in thread:
> 
> 
> Subject: Re: [Xen-devel] [PATCH] ioreq-server: handle
> IOREQ_TYPE_PCI_CONFIG in assist function
> References: <1422385589-17316-1-git-send-email-wei.liu2@citrix.com>
> 
> 

Oops, that thread does not make sense as a place for what I have found.

Here is the info I was going to send there:


Using QEMU upstream master (or xenbits qemu staging), you do not have a
default ioreq server, and so hvm_select_ioreq_server() returns NULL for
hvmloader's I/O request to:

CPU4  0 (+       0)  HANDLE_PIO [ port = 0x0cfe size = 2 dir = 1 ]

(I added this xentrace to figure out what is happening, and I have
a lot of data about it, if anyone wants it.)

To get a guest hang instead of calling hvm_complete_assist_req()
for some of hvmloader's pci_read() calls, you can do the following:


1) xl create -p <config>
2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
   HVM_PARAM_BUFIOREQ_EVTCHN
3) xl unpause new guest

The guest will hang in hvmloader.

The read of HVM_PARAM_IOREQ_PFN causes a default ioreq server to
be created, with requests then directed to the upstream QEMU, which
is not a default ioreq server.  This read also creates the extra
event channels that I see.

I see that hvmop_create_ioreq_server() prevents you from creating
an is_default ioreq_server, so QEMU is not able to do so.

Not sure where we go from here.
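For illustration, the selection problem described above can be reduced to a toy model (pure Python, not Xen code; the helper name and dict fields are invented). It captures the behaviour Don describes: with no default server an unclaimed port is handled internally (hvm_complete_assist_req in Xen 4.5) and the guest makes progress, but once the param read conjures up a default server, unclaimed requests route to a server whose event channels QEMU never bound.

```python
# Toy model of hvm_select_ioreq_server() routing, as described in this
# thread.  Not Xen code; server records are illustrative dicts.

def select_ioreq_server(servers, port):
    """Pick the first non-default server claiming `port`; otherwise
    fall back to the default server if one exists (None if not)."""
    for s in servers:
        if not s.get("default") and port in s.get("ports", ()):
            return s
    return next((s for s in servers if s.get("default")), None)

# QEMU registered a secondary server and bound its event channels.
secondary = {"name": "qemu", "ports": {0x3f8}, "bound": True}
servers = [secondary]

# No default server: an unclaimed port gets no server, so Xen
# completes the request itself and hvmloader keeps running.
assert select_ioreq_server(servers, 0xCFE) is None

# After dom0 reads HVM_PARAM_IOREQ_PFN, a default server appears...
default = {"name": "default", "default": True, "bound": False}
servers.append(default)

# ...and the unclaimed port now routes to a server whose event
# channels were never bound, so the notification is lost: a hang.
chosen = select_ioreq_server(servers, 0xCFE)
assert chosen is default and not chosen["bound"]
```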

   -Don Slutz


>     -Don Slutz
> 
> 
>> So far I have tracked it back to hvm_select_ioreq_server(),
>> which selects the "default_ioreq_server".  Since I have only one
>> QEMU, it is both the "default_ioreq_server" and an enabled 2nd
>> ioreq_server.  I am still investigating why my changes are causing
>> this.  More below.
>>
>> This patch causes QEMU to only call xc_evtchn_bind_interdomain()
>> for the enabled 2nd ioreq_server.  So when (if)
>> hvm_select_ioreq_server() selects the "default_ioreq_server", the
>> guest hangs on an I/O.
>>
>> Using the debug key 'e':
>>
>> (XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
>> (XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
>> (XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
>> (XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
>> (XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
>> (XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
>> (XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
>> (XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
>> (XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
>> (XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
>> (XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
>> (XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
>> (XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
>> (XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
>> (XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
>> (XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
>> (XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
>> (XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
>> (XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
>> (XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
>> (XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
>> (XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
>> (XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
>> (XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
>> (XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
>> (XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
>> (XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
>> (XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
>> (XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
>> (XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
>> (XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
>> (XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
>> (XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
>> (XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
>> (XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
>> (XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
>> (XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
>> (XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
>> (XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
>> (XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
>> (XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
>> (XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
>> (XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
>> (XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
>> (XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0
>>
>> You can see that domain 1 has only half of its event channels
>> fully set up.  So when (if) hvm_send_assist_req_to_ioreq_server()
>> does:
>>
>>             notify_via_xen_event_channel(d, port);
>>
>> Nothing happens and you hang in hvm_wait_for_io() forever.
>>
>>
>> This does raise the questions:
>>
>> 1) Does this patch cause extra event channels to be created
>>    that cannot be used?
>>
>> 2) Should the "default_ioreq_server" be deleted?
>>
>>
>> Not sure the right way to go.
>>
>>     -Don Slutz
>>
>>
>>>
>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>> Cc: Peter Maydell <peter.maydell@linaro.org>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Michael Tokarev <mjt@tls.msk.ru>
>>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>>> Cc: Stefan Weil <sw@weilnetz.de>
>>> Cc: Olaf Hering <olaf@aepfle.de>
>>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> Cc: Alexander Graf <agraf@suse.de>
>>> ---
>>>  configure                   |   29 ++++++
>>>  include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
>>>  trace-events                |    9 ++
>>>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
>>>  4 files changed, 399 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/configure b/configure
>>> index 47048f0..b1f8c2a 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -1877,6 +1877,32 @@ int main(void) {
>>>    xc_gnttab_open(NULL, 0);
>>>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
>>> +  return 0;
>>> +}
>>> +EOF
>>> +      compile_prog "" "$xen_libs"
>>> +    then
>>> +    xen_ctrl_version=450
>>> +    xen=yes
>>> +
>>> +  elif
>>> +      cat > $TMPC <<EOF &&
>>> +#include <xenctrl.h>
>>> +#include <xenstore.h>
>>> +#include <stdint.h>
>>> +#include <xen/hvm/hvm_info_table.h>
>>> +#if !defined(HVM_MAX_VCPUS)
>>> +# error HVM_MAX_VCPUS not defined
>>> +#endif
>>> +int main(void) {
>>> +  xc_interface *xc;
>>> +  xs_daemon_open();
>>> +  xc = xc_interface_open(0, 0, 0);
>>> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
>>> +  xc_gnttab_open(NULL, 0);
>>> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>>    return 0;
>>>  }
>>>  EOF
>>> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
>>>      echo "Target Sparc Arch $sparc_cpu"
>>>  fi
>>>  echo "xen support       $xen"
>>> +if test "$xen" = "yes" ; then
>>> +  echo "xen ctrl version  $xen_ctrl_version"
>>> +fi
>>>  echo "brlapi support    $brlapi"
>>>  echo "bluez  support    $bluez"
>>>  echo "Documentation     $docs"
>>> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
>>> index 95612a4..519696f 100644
>>> --- a/include/hw/xen/xen_common.h
>>> +++ b/include/hw/xen/xen_common.h
>>> @@ -16,7 +16,9 @@
>>>  
>>>  #include "hw/hw.h"
>>>  #include "hw/xen/xen.h"
>>> +#include "hw/pci/pci.h"
>>>  #include "qemu/queue.h"
>>> +#include "trace.h"
>>>  
>>>  /*
>>>   * We don't support Xen prior to 3.3.0.
>>> @@ -179,4 +181,225 @@ static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
>>>  }
>>>  #endif
>>>  
>>> +/* Xen before 4.5 */
>>> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
>>> +
>>> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>>> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>>> +#endif
>>> +
>>> +#define IOREQ_TYPE_PCI_CONFIG 2
>>> +
>>> +typedef uint32_t ioservid_t;
>>> +
>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>> +                                          ioservid_t ioservid,
>>> +                                          MemoryRegionSection *section)
>>> +{
>>> +}
>>> +
>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid,
>>> +                                            MemoryRegionSection *section)
>>> +{
>>> +}
>>> +
>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>> +                                      ioservid_t ioservid,
>>> +                                      MemoryRegionSection *section)
>>> +{
>>> +}
>>> +
>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>> +                                        ioservid_t ioservid,
>>> +                                        MemoryRegionSection *section)
>>> +{
>>> +}
>>> +
>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>> +                                  ioservid_t ioservid,
>>> +                                  PCIDevice *pci_dev)
>>> +{
>>> +}
>>> +
>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>> +                                    ioservid_t ioservid,
>>> +                                    PCIDevice *pci_dev)
>>> +{
>>> +}
>>> +
>>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>> +                                          ioservid_t *ioservid)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid)
>>> +{
>>> +}
>>> +
>>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid,
>>> +                                            xen_pfn_t *ioreq_pfn,
>>> +                                            xen_pfn_t *bufioreq_pfn,
>>> +                                            evtchn_port_t *bufioreq_evtchn)
>>> +{
>>> +    unsigned long param;
>>> +    int rc;
>>> +
>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
>>> +    if (rc < 0) {
>>> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
>>> +        return -1;
>>> +    }
>>> +
>>> +    *ioreq_pfn = param;
>>> +
>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
>>> +    if (rc < 0) {
>>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
>>> +        return -1;
>>> +    }
>>> +
>>> +    *bufioreq_pfn = param;
>>> +
>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
>>> +                          &param);
>>> +    if (rc < 0) {
>>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
>>> +        return -1;
>>> +    }
>>> +
>>> +    *bufioreq_evtchn = param;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>>> +                                             ioservid_t ioservid,
>>> +                                             bool enable)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +/* Xen 4.5 */
>>> +#else
>>> +
>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>> +                                          ioservid_t ioservid,
>>> +                                          MemoryRegionSection *section)
>>> +{
>>> +    hwaddr start_addr = section->offset_within_address_space;
>>> +    ram_addr_t size = int128_get64(section->size);
>>> +    hwaddr end_addr = start_addr + size - 1;
>>> +
>>> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
>>> +                                        start_addr, end_addr);
>>> +}
>>> +
>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid,
>>> +                                            MemoryRegionSection *section)
>>> +{
>>> +    hwaddr start_addr = section->offset_within_address_space;
>>> +    ram_addr_t size = int128_get64(section->size);
>>> +    hwaddr end_addr = start_addr + size - 1;
>>> +
>>> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
>>> +                                            start_addr, end_addr);
>>> +}
>>> +
>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>> +                                      ioservid_t ioservid,
>>> +                                      MemoryRegionSection *section)
>>> +{
>>> +    hwaddr start_addr = section->offset_within_address_space;
>>> +    ram_addr_t size = int128_get64(section->size);
>>> +    hwaddr end_addr = start_addr + size - 1;
>>> +
>>> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
>>> +                                        start_addr, end_addr);
>>> +}
>>> +
>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>> +                                        ioservid_t ioservid,
>>> +                                        MemoryRegionSection *section)
>>> +{
>>> +    hwaddr start_addr = section->offset_within_address_space;
>>> +    ram_addr_t size = int128_get64(section->size);
>>> +    hwaddr end_addr = start_addr + size - 1;
>>> +
>>> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
>>> +                                            start_addr, end_addr);
>>> +}
>>> +
>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>> +                                  ioservid_t ioservid,
>>> +                                  PCIDevice *pci_dev)
>>> +{
>>> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>>> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
>>> +                                      0, pci_bus_num(pci_dev->bus),
>>> +                                      PCI_SLOT(pci_dev->devfn),
>>> +                                      PCI_FUNC(pci_dev->devfn));
>>> +}
>>> +
>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>> +                                    ioservid_t ioservid,
>>> +                                    PCIDevice *pci_dev)
>>> +{
>>> +    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>> +                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>>> +    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
>>> +                                          0, pci_bus_num(pci_dev->bus),
>>> +                                          PCI_SLOT(pci_dev->devfn),
>>> +                                          PCI_FUNC(pci_dev->devfn));
>>> +}
>>> +
>>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>> +                                          ioservid_t *ioservid)
>>> +{
>>> +    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
>>> +
>>> +    if (rc == 0) {
>>> +        trace_xen_ioreq_server_create(*ioservid);
>>> +    }
>>> +
>>> +    return rc;
>>> +}
>>> +
>>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid)
>>> +{
>>> +    trace_xen_ioreq_server_destroy(ioservid);
>>> +    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
>>> +}
>>> +
>>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>>> +                                            ioservid_t ioservid,
>>> +                                            xen_pfn_t *ioreq_pfn,
>>> +                                            xen_pfn_t *bufioreq_pfn,
>>> +                                            evtchn_port_t *bufioreq_evtchn)
>>> +{
>>> +    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
>>> +                                        ioreq_pfn, bufioreq_pfn,
>>> +                                        bufioreq_evtchn);
>>> +}
>>> +
>>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>>> +                                             ioservid_t ioservid,
>>> +                                             bool enable)
>>> +{
>>> +    trace_xen_ioreq_server_state(ioservid, enable);
>>> +    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
>>> +}
>>> +
>>> +#endif
>>> +
>>>  #endif /* QEMU_HW_XEN_COMMON_H */
>>> diff --git a/trace-events b/trace-events
>>> index b5722ea..abd1118 100644
>>> --- a/trace-events
>>> +++ b/trace-events
>>> @@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages:
>>>  # xen-hvm.c
>>>  xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
>>>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>>> +xen_ioreq_server_create(uint32_t id) "id: %u"
>>> +xen_ioreq_server_destroy(uint32_t id) "id: %u"
>>> +xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
>>> +xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>> +xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>> +xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>> +xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>>> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
>>>  
>>>  # xen-mapcache.c
>>>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
>>> diff --git a/xen-hvm.c b/xen-hvm.c
>>> index 7548794..31cb3ca 100644
>>> --- a/xen-hvm.c
>>> +++ b/xen-hvm.c
>>> @@ -85,9 +85,6 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>>>  }
>>>  #  define FMT_ioreq_size "u"
>>>  #endif
>>> -#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>>> -#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>>> -#endif
>>>  
>>>  #define BUFFER_IO_MAX_DELAY  100
>>>  
>>> @@ -101,6 +98,7 @@ typedef struct XenPhysmap {
>>>  } XenPhysmap;
>>>  
>>>  typedef struct XenIOState {
>>> +    ioservid_t ioservid;
>>>      shared_iopage_t *shared_page;
>>>      shared_vmport_iopage_t *shared_vmport_page;
>>>      buffered_iopage_t *buffered_io_page;
>>> @@ -117,6 +115,8 @@ typedef struct XenIOState {
>>>  
>>>      struct xs_handle *xenstore;
>>>      MemoryListener memory_listener;
>>> +    MemoryListener io_listener;
>>> +    DeviceListener device_listener;
>>>      QLIST_HEAD(, XenPhysmap) physmap;
>>>      hwaddr free_phys_offset;
>>>      const XenPhysmap *log_for_dirtybit;
>>> @@ -467,12 +467,23 @@ static void xen_set_memory(struct MemoryListener *listener,
>>>      bool log_dirty = memory_region_is_logging(section->mr);
>>>      hvmmem_type_t mem_type;
>>>  
>>> +    if (section->mr == &ram_memory) {
>>> +        return;
>>> +    } else {
>>> +        if (add) {
>>> +            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
>>> +                                   section);
>>> +        } else {
>>> +            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
>>> +                                     section);
>>> +        }
>>> +    }
>>> +
>>>      if (!memory_region_is_ram(section->mr)) {
>>>          return;
>>>      }
>>>  
>>> -    if (!(section->mr != &ram_memory
>>> -          && ( (log_dirty && add) || (!log_dirty && !add)))) {
>>> +    if (log_dirty != add) {
>>>          return;
>>>      }
>>>  
>>> @@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener *listener,
>>>      memory_region_unref(section->mr);
>>>  }
>>>  
>>> +static void xen_io_add(MemoryListener *listener,
>>> +                       MemoryRegionSection *section)
>>> +{
>>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>>> +
>>> +    memory_region_ref(section->mr);
>>> +
>>> +    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
>>> +}
>>> +
>>> +static void xen_io_del(MemoryListener *listener,
>>> +                       MemoryRegionSection *section)
>>> +{
>>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
>>> +
>>> +    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
>>> +
>>> +    memory_region_unref(section->mr);
>>> +}
>>> +
>>> +static void xen_device_realize(DeviceListener *listener,
>>> +			       DeviceState *dev)
>>> +{
>>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
>>> +
>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>>> +
>>> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>> +    }
>>> +}
>>> +
>>> +static void xen_device_unrealize(DeviceListener *listener,
>>> +				 DeviceState *dev)
>>> +{
>>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
>>> +
>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
>>> +
>>> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>> +    }
>>> +}
>>> +
>>>  static void xen_sync_dirty_bitmap(XenIOState *state,
>>>                                    hwaddr start_addr,
>>>                                    ram_addr_t size)
>>> @@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener = {
>>>      .priority = 10,
>>>  };
>>>  
>>> +static MemoryListener xen_io_listener = {
>>> +    .region_add = xen_io_add,
>>> +    .region_del = xen_io_del,
>>> +    .priority = 10,
>>> +};
>>> +
>>> +static DeviceListener xen_device_listener = {
>>> +    .realize = xen_device_realize,
>>> +    .unrealize = xen_device_unrealize,
>>> +};
>>> +
>>>  /* get the ioreq packets from share mem */
>>>  static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState *state, int vcpu)
>>>  {
>>> @@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state, ioreq_t *req)
>>>          case IOREQ_TYPE_INVALIDATE:
>>>              xen_invalidate_map_cache();
>>>              break;
>>> +        case IOREQ_TYPE_PCI_CONFIG: {
>>> +            uint32_t sbdf = req->addr >> 32;
>>> +            uint32_t val;
>>> +
>>> +            /* Fake a write to port 0xCF8 so that
>>> +             * the config space access will target the
>>> +             * correct device model.
>>> +             */
>>> +            val = (1u << 31) |
>>> +                  ((req->addr & 0x0f00) << 16) |
>>> +                  ((sbdf & 0xffff) << 8) |
>>> +                  (req->addr & 0xfc);
>>> +            do_outp(0xcf8, 4, val);
>>> +
>>> +            /* Now issue the config space access via
>>> +             * port 0xCFC
>>> +             */
>>> +            req->addr = 0xcfc | (req->addr & 0x03);
>>> +            cpu_ioreq_pio(req);
>>> +            break;
>>> +        }
>>>          default:
>>>              hw_error("Invalid ioreq type 0x%x\n", req->type);
>>>      }
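For readers following the IOREQ_TYPE_PCI_CONFIG hunk above: the synthesized 0xCF8 value and data-port selection can be sketched in Python (hypothetical helper name, mirroring the quoted C). Note how a config access at offset 0xfe for device 00:03.0 lands on port 0xcfe — matching the HANDLE_PIO trace earlier in this thread.

```python
# Sketch of the 0xCF8/0xCFC translation done in the hunk above for an
# IOREQ_TYPE_PCI_CONFIG request (SBDF in the upper 32 bits of addr,
# config-space offset in the lower bits).

def pci_config_to_cf8_cfc(ioreq_addr):
    """Return (value written to port 0xCF8, data port used)."""
    sbdf = (ioreq_addr >> 32) & 0xffffffff
    cf8 = ((1 << 31) |                       # enable bit
           ((ioreq_addr & 0x0f00) << 16) |   # extended register bits
           ((sbdf & 0xffff) << 8) |          # bus/dev/fn
           (ioreq_addr & 0xfc))              # dword-aligned offset
    data_port = 0xcfc | (ioreq_addr & 0x03)  # byte lane within dword
    return cf8, data_port

# Device 00:03.0 (bdf 0x18), config offset 0xfe:
cf8, port = pci_config_to_cf8_cfc((0x18 << 32) | 0xfe)
# cf8 == 0x800018fc, port == 0xcfe
```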
>>> @@ -993,9 +1080,15 @@ static void xen_main_loop_prepare(XenIOState *state)
>>>  static void xen_hvm_change_state_handler(void *opaque, int running,
>>>                                           RunState rstate)
>>>  {
>>> +    XenIOState *state = opaque;
>>> +
>>>      if (running) {
>>> -        xen_main_loop_prepare((XenIOState *)opaque);
>>> +        xen_main_loop_prepare(state);
>>>      }
>>> +
>>> +    xen_set_ioreq_server_state(xen_xc, xen_domid,
>>> +                               state->ioservid,
>>> +                               (rstate == RUN_STATE_RUNNING));
>>>  }
>>>  
>>>  static void xen_exit_notifier(Notifier *n, void *data)
>>> @@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>                   MemoryRegion **ram_memory)
>>>  {
>>>      int i, rc;
>>> -    unsigned long ioreq_pfn;
>>> -    unsigned long bufioreq_evtchn;
>>> +    xen_pfn_t ioreq_pfn;
>>> +    xen_pfn_t bufioreq_pfn;
>>> +    evtchn_port_t bufioreq_evtchn;
>>>      XenIOState *state;
>>>  
>>>      state = g_malloc0(sizeof (XenIOState));
>>> @@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>          return -1;
>>>      }
>>>  
>>> +    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
>>> +    if (rc < 0) {
>>> +        perror("xen: ioreq server create");
>>> +        return -1;
>>> +    }
>>> +
>>>      state->exit.notify = xen_exit_notifier;
>>>      qemu_add_exit_notifier(&state->exit);
>>>  
>>> @@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>      state->wakeup.notify = xen_wakeup_notifier;
>>>      qemu_register_wakeup_notifier(&state->wakeup);
>>>  
>>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn);
>>> +    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state->ioservid,
>>> +                                   &ioreq_pfn, &bufioreq_pfn,
>>> +                                   &bufioreq_evtchn);
>>> +    if (rc < 0) {
>>> +        hw_error("failed to get ioreq server info: error %d handle=" XC_INTERFACE_FMT,
>>> +                 errno, xen_xc);
>>> +    }
>>> +
>>>      DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
>>> +    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
>>> +    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
>>> +
>>>      state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
>>>                                                PROT_READ|PROT_WRITE, ioreq_pfn);
>>>      if (state->shared_page == NULL) {
>>> @@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>          hw_error("get vmport regs pfn returned error %d, rc=%d", errno, rc);
>>>      }
>>>  
>>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
>>> -    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
>>> -    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
>>> -                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
>>> +    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid,
>>> +                                                   XC_PAGE_SIZE,
>>> +                                                   PROT_READ|PROT_WRITE,
>>> +                                                   bufioreq_pfn);
>>>      if (state->buffered_io_page == NULL) {
>>>          hw_error("map buffered IO page returned error %d", errno);
>>>      }
>>> @@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>      /* Note: cpus is empty at this point in init */
>>>      state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
>>>  
>>> +    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
>>> +    if (rc < 0) {
>>> +        hw_error("failed to enable ioreq server info: error %d handle=" XC_INTERFACE_FMT,
>>> +                 errno, xen_xc);
>>> +    }
>>> +
>>>      state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
>>>  
>>>      /* FIXME: how about if we overflow the page here? */
>>> @@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>          rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>>                                          xen_vcpu_eport(state->shared_page, i));
>>>          if (rc == -1) {
>>> -            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>>> +            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
>>>              return -1;
>>>          }
>>>          state->ioreq_local_port[i] = rc;
>>>      }
>>>  
>>> -    rc = xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_EVTCHN,
>>> -            &bufioreq_evtchn);
>>> -    if (rc < 0) {
>>> -        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
>>> -        return -1;
>>> -    }
>>>      rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
>>> -            (uint32_t)bufioreq_evtchn);
>>> +                                    bufioreq_evtchn);
>>>      if (rc == -1) {
>>> -        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
>>> +        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
>>>          return -1;
>>>      }
>>>      state->bufioreq_local_port = rc;
>>> @@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>      memory_listener_register(&state->memory_listener, &address_space_memory);
>>>      state->log_for_dirtybit = NULL;
>>>  
>>> +    state->io_listener = xen_io_listener;
>>> +    memory_listener_register(&state->io_listener, &address_space_io);
>>> +
>>> +    state->device_listener = xen_device_listener;
>>> +    device_listener_register(&state->device_listener);
>>> +
>>>      /* Initialize backend core & drivers */
>>>      if (xen_be_init() != 0) {
>>>          fprintf(stderr, "%s: xen backend core setup failed\n", __FUNCTION__);
>>>
>>
Paul Durrant Jan. 29, 2015, 12:09 p.m. UTC | #4
> -----Original Message-----
> From: Don Slutz [mailto:dslutz@verizon.com]
> Sent: 29 January 2015 00:58
> To: Don Slutz; Paul Durrant; qemu-devel@nongnu.org; Stefano Stabellini
> Cc: Peter Maydell; Olaf Hering; Alexey Kardashevskiy; Stefan Weil; Michael
> Tokarev; Alexander Graf; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini
> Subject: Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API
> when available
> 
> 
> 
> On 01/28/15 19:05, Don Slutz wrote:
> > On 01/28/15 14:32, Don Slutz wrote:
> >> On 12/05/14 05:50, Paul Durrant wrote:
> >>> The ioreq-server API added to Xen 4.5 offers better security than
> >>> the existing Xen/QEMU interface because the shared pages that are
> >>> used to pass emulation request/results back and forth are removed
> >>> from the guest's memory space before any requests are serviced.
> >>> This prevents the guest from mapping these pages (they are in a
> >>> well known location) and attempting to attack QEMU by synthesizing
> >>> its own request structures. Hence, this patch modifies configure
> >>> to detect whether the API is available, and adds the necessary
> >>> code to use the API if it is.
> >>
> >> This patch (which is now on xenbits qemu staging) is causing me
> >> issues.
> >>
> >
> > I have found the key.
> >
> > The following will reproduce my issue:
> >
> > 1) xl create -p <config>
> > 2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
> >    HVM_PARAM_BUFIOREQ_EVTCHN
> > 3) xl unpause new guest
> >
> > The guest will hang in hvmloader.
> >
> > More in thread:
> >
> >
> > Subject: Re: [Xen-devel] [PATCH] ioreq-server: handle
> > IOREQ_TYPE_PCI_CONFIG in assist function
> > References: <1422385589-17316-1-git-send-email-wei.liu2@citrix.com>
> >
> >
> 
> Oops, that thread is not the right place for what I have found.
> 
> Here is the info I was going to send there:
> 
> 
> Using QEMU upstream master (or xenbits qemu staging), you do not have a
> default ioreq server.  And so hvm_select_ioreq_server() returns NULL for
> hvmloader's I/O request to:
> 
> CPU4  0 (+       0)  HANDLE_PIO [ port = 0x0cfe size = 2 dir = 1 ]
> 
> (I added this xentrace to figure out what is happening, and I have
> a lot of data about it, if anyone wants it.)
> 
> To get a guest hang instead of calling hvm_complete_assist_req()
> for some of hvmloader's pci_read() calls, you can do the following:
> 
> 
> 1) xl create -p <config>
> 2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
>    HVM_PARAM_BUFIOREQ_EVTCHN
> 3) xl unpause new guest
> 
> The guest will hang in hvmloader.
> 
> The read of HVM_PARAM_IOREQ_PFN causes a default ioreq server to
> be created, with its requests directed at an upstream QEMU that is
> not a default ioreq server.  This read also creates the extra event
> channels that I see.
> 
> I see that hvmop_create_ioreq_server() prevents you from creating
> an is_default ioreq_server, so QEMU is not able to do so.
> 
> Not sure where we go from here.
> 

Given that IIRC you are using a new dedicated IOREQ type, I think there needs to be something that allows an emulator to register for this IOREQ type. How about adding a new type to those defined for HVMOP_map_io_range_to_ioreq_server for your case? (In your case the start and end values in the hypercall would be meaningless, but it could be used to steer hvm_select_ioreq_server() into sending all emulation requests of your new type to QEMU.)
Actually, such a mechanism could also be used to steer IOREQ_TYPE_TIMEOFFSET requests since, with the new QEMU patches, they are going nowhere. Upstream QEMU (as default) used to ignore them anyway, which is why I didn't bother with such a patch to Xen before, but since you now need one maybe you could add that too?

  Paul

>    -Don Slutz
> 
> 
> >     -Don Slutz
> >
> >
> >> So far I have tracked it back to hvm_select_ioreq_server()
> >> which selects the "default_ioreq_server".  Since I have only one
> >> QEMU, it is both the "default_ioreq_server" and an enabled
> >> 2nd ioreq_server.  I am still investigating why my changes
> >> are causing this.  More below.
> >>
> >> This patch causes QEMU to only call xc_evtchn_bind_interdomain()
> >> for the enabled 2nd ioreq_server.  So when (if)
> >> hvm_select_ioreq_server() selects the "default_ioreq_server", the
> >> guest hangs on an I/O.
> >>
> >> Using the debug key 'e':
> >>
> >> (XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
> >> (XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
> >> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
> >> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
> >> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
> >> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
> >> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
> >> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
> >> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
> >> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
> >> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
> >> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
> >> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
> >> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
> >> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
> >> (XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
> >> (XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
> >> (XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
> >> (XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
> >> (XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
> >> (XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
> >> (XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
> >> (XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
> >> (XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
> >> (XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
> >> (XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
> >> (XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
> >> (XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
> >> (XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
> >> (XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
> >> (XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
> >> (XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
> >> (XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
> >> (XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
> >> (XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
> >> (XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
> >> (XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
> >> (XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
> >> (XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
> >> (XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
> >> (XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
> >> (XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
> >> (XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
> >> (XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
> >> (XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
> >> (XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
> >> (XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
> >> (XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
> >> (XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
> >> (XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
> >> (XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
> >> (XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
> >> (XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
> >> (XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
> >> (XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
> >> (XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
> >> (XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
> >> (XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
> >> (XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
> >> (XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
> >> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
> >> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
> >> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
> >> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
> >> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
> >> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
> >> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
> >> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
> >> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
> >> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
> >> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
> >> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
> >> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0
> >>
> >> You can see that domain 1 has only half of its event channels
> >> fully set up.  So when (if) hvm_send_assist_req_to_ioreq_server()
> >> does:
> >>
> >>             notify_via_xen_event_channel(d, port);
> >>
> >> Nothing happens and you hang in hvm_wait_for_io() forever.
> >>
> >>
> >> This does raise the questions:
> >>
> >> 1) Does this patch cause extra event channels to be created
> >>    that cannot be used?
> >>
> >> 2) Should the "default_ioreq_server" be deleted?
> >>
> >>
> >> Not sure the right way to go.
> >>
> >>     -Don Slutz
> >>
> >>
> >>>
> >>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> >>> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> >>> Cc: Peter Maydell <peter.maydell@linaro.org>
> >>> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >>> Cc: Michael Tokarev <mjt@tls.msk.ru>
> >>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> >>> Cc: Stefan Weil <sw@weilnetz.de>
> >>> Cc: Olaf Hering <olaf@aepfle.de>
> >>> Cc: Gerd Hoffmann <kraxel@redhat.com>
> >>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>> Cc: Alexander Graf <agraf@suse.de>
> >>> ---
> >>>  configure                   |   29 ++++++
> >>>  include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
> >>>  trace-events                |    9 ++
> >>>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
> >>>  4 files changed, 399 insertions(+), 22 deletions(-)
> >>>
> >>> diff --git a/configure b/configure
> >>> index 47048f0..b1f8c2a 100755
> >>> --- a/configure
> >>> +++ b/configure
> >>> @@ -1877,6 +1877,32 @@ int main(void) {
> >>>    xc_gnttab_open(NULL, 0);
> >>>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
> >>>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
> >>> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
> >>> +  return 0;
> >>> +}
> >>> +EOF
> >>> +      compile_prog "" "$xen_libs"
> >>> +    then
> >>> +    xen_ctrl_version=450
> >>> +    xen=yes
> >>> +
> >>> +  elif
> >>> +      cat > $TMPC <<EOF &&
> >>> +#include <xenctrl.h>
> >>> +#include <xenstore.h>
> >>> +#include <stdint.h>
> >>> +#include <xen/hvm/hvm_info_table.h>
> >>> +#if !defined(HVM_MAX_VCPUS)
> >>> +# error HVM_MAX_VCPUS not defined
> >>> +#endif
> >>> +int main(void) {
> >>> +  xc_interface *xc;
> >>> +  xs_daemon_open();
> >>> +  xc = xc_interface_open(0, 0, 0);
> >>> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
> >>> +  xc_gnttab_open(NULL, 0);
> >>> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
> >>> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
> >>>    return 0;
> >>>  }
> >>>  EOF
> >>> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
> >>>      echo "Target Sparc Arch $sparc_cpu"
> >>>  fi
> >>>  echo "xen support       $xen"
> >>> +if test "$xen" = "yes" ; then
> >>> +  echo "xen ctrl version  $xen_ctrl_version"
> >>> +fi
> >>>  echo "brlapi support    $brlapi"
> >>>  echo "bluez  support    $bluez"
> >>>  echo "Documentation     $docs"
> >>> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
> >>> index 95612a4..519696f 100644
> >>> --- a/include/hw/xen/xen_common.h
> >>> +++ b/include/hw/xen/xen_common.h
> >>> @@ -16,7 +16,9 @@
> >>>
> >>>  #include "hw/hw.h"
> >>>  #include "hw/xen/xen.h"
> >>> +#include "hw/pci/pci.h"
> >>>  #include "qemu/queue.h"
> >>> +#include "trace.h"
> >>>
> >>>  /*
> >>>   * We don't support Xen prior to 3.3.0.
> >>> @@ -179,4 +181,225 @@ static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
> >>>  }
> >>>  #endif
> >>>
> >>> +/* Xen before 4.5 */
> >>> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
> >>> +
> >>> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
> >>> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
> >>> +#endif
> >>> +
> >>> +#define IOREQ_TYPE_PCI_CONFIG 2
> >>> +
> >>> +typedef uint32_t ioservid_t;
> >>> +
> >>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
> >>> +                                          ioservid_t ioservid,
> >>> +                                          MemoryRegionSection *section)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid,
> >>> +                                            MemoryRegionSection *section)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
> >>> +                                      ioservid_t ioservid,
> >>> +                                      MemoryRegionSection *section)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
> >>> +                                        ioservid_t ioservid,
> >>> +                                        MemoryRegionSection *section)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
> >>> +                                  ioservid_t ioservid,
> >>> +                                  PCIDevice *pci_dev)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
> >>> +                                    ioservid_t ioservid,
> >>> +                                    PCIDevice *pci_dev)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
> >>> +                                          ioservid_t *ioservid)
> >>> +{
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid)
> >>> +{
> >>> +}
> >>> +
> >>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid,
> >>> +                                            xen_pfn_t *ioreq_pfn,
> >>> +                                            xen_pfn_t *bufioreq_pfn,
> >>> +                                            evtchn_port_t *bufioreq_evtchn)
> >>> +{
> >>> +    unsigned long param;
> >>> +    int rc;
> >>> +
> >>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
> >>> +    if (rc < 0) {
> >>> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
> >>> +        return -1;
> >>> +    }
> >>> +
> >>> +    *ioreq_pfn = param;
> >>> +
> >>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
> >>> +    if (rc < 0) {
> >>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
> >>> +        return -1;
> >>> +    }
> >>> +
> >>> +    *bufioreq_pfn = param;
> >>> +
> >>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
> >>> +                          &param);
> >>> +    if (rc < 0) {
> >>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
> >>> +        return -1;
> >>> +    }
> >>> +
> >>> +    *bufioreq_evtchn = param;
> >>> +
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
> >>> +                                             ioservid_t ioservid,
> >>> +                                             bool enable)
> >>> +{
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +/* Xen 4.5 */
> >>> +#else
> >>> +
> >>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
> >>> +                                          ioservid_t ioservid,
> >>> +                                          MemoryRegionSection *section)
> >>> +{
> >>> +    hwaddr start_addr = section->offset_within_address_space;
> >>> +    ram_addr_t size = int128_get64(section->size);
> >>> +    hwaddr end_addr = start_addr + size - 1;
> >>> +
> >>> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
> >>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
> >>> +                                        start_addr, end_addr);
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid,
> >>> +                                            MemoryRegionSection *section)
> >>> +{
> >>> +    hwaddr start_addr = section->offset_within_address_space;
> >>> +    ram_addr_t size = int128_get64(section->size);
> >>> +    hwaddr end_addr = start_addr + size - 1;
> >>> +
> >>> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
> >>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
> >>> +                                            start_addr, end_addr);
> >>> +}
> >>> +
> >>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
> >>> +                                      ioservid_t ioservid,
> >>> +                                      MemoryRegionSection *section)
> >>> +{
> >>> +    hwaddr start_addr = section->offset_within_address_space;
> >>> +    ram_addr_t size = int128_get64(section->size);
> >>> +    hwaddr end_addr = start_addr + size - 1;
> >>> +
> >>> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
> >>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
> >>> +                                        start_addr, end_addr);
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
> >>> +                                        ioservid_t ioservid,
> >>> +                                        MemoryRegionSection *section)
> >>> +{
> >>> +    hwaddr start_addr = section->offset_within_address_space;
> >>> +    ram_addr_t size = int128_get64(section->size);
> >>> +    hwaddr end_addr = start_addr + size - 1;
> >>> +
> >>> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
> >>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
> >>> +                                            start_addr, end_addr);
> >>> +}
> >>> +
> >>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
> >>> +                                  ioservid_t ioservid,
> >>> +                                  PCIDevice *pci_dev)
> >>> +{
> >>> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
> >>> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
> >>> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
> >>> +                                      0, pci_bus_num(pci_dev->bus),
> >>> +                                      PCI_SLOT(pci_dev->devfn),
> >>> +                                      PCI_FUNC(pci_dev->devfn));
> >>> +}
> >>> +
> >>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
> >>> +                                    ioservid_t ioservid,
> >>> +                                    PCIDevice *pci_dev)
> >>> +{
> >>> +    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
> >>> +                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
> >>> +    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
> >>> +                                          0, pci_bus_num(pci_dev->bus),
> >>> +                                          PCI_SLOT(pci_dev->devfn),
> >>> +                                          PCI_FUNC(pci_dev->devfn));
> >>> +}
> >>> +
> >>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
> >>> +                                          ioservid_t *ioservid)
> >>> +{
> >>> +    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
> >>> +
> >>> +    if (rc == 0) {
> >>> +        trace_xen_ioreq_server_create(*ioservid);
> >>> +    }
> >>> +
> >>> +    return rc;
> >>> +}
> >>> +
> >>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid)
> >>> +{
> >>> +    trace_xen_ioreq_server_destroy(ioservid);
> >>> +    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
> >>> +}
> >>> +
> >>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
> >>> +                                            ioservid_t ioservid,
> >>> +                                            xen_pfn_t *ioreq_pfn,
> >>> +                                            xen_pfn_t *bufioreq_pfn,
> >>> +                                            evtchn_port_t *bufioreq_evtchn)
> >>> +{
> >>> +    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
> >>> +                                        ioreq_pfn, bufioreq_pfn,
> >>> +                                        bufioreq_evtchn);
> >>> +}
> >>> +
> >>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
> >>> +                                             ioservid_t ioservid,
> >>> +                                             bool enable)
> >>> +{
> >>> +    trace_xen_ioreq_server_state(ioservid, enable);
> >>> +    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
> >>> +}
> >>> +
> >>> +#endif
> >>> +
> >>>  #endif /* QEMU_HW_XEN_COMMON_H */
> >>> diff --git a/trace-events b/trace-events
> >>> index b5722ea..abd1118 100644
> >>> --- a/trace-events
> >>> +++ b/trace-events
> >>> @@ -897,6 +897,15 @@ pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages:
> >>>  # xen-hvm.c
> >>>  xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
> >>>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
> >>> +xen_ioreq_server_create(uint32_t id) "id: %u"
> >>> +xen_ioreq_server_destroy(uint32_t id) "id: %u"
> >>> +xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
> >>> +xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> >>> +xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> >>> +xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> >>> +xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
> >>> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
> >>> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
> >>>
> >>>  # xen-mapcache.c
> >>>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
> >>> diff --git a/xen-hvm.c b/xen-hvm.c
> >>> index 7548794..31cb3ca 100644
> >>> --- a/xen-hvm.c
> >>> +++ b/xen-hvm.c
> >>> @@ -85,9 +85,6 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
> >>>  }
> >>>  #  define FMT_ioreq_size "u"
> >>>  #endif
> >>> -#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
> >>> -#define HVM_PARAM_BUFIOREQ_EVTCHN 26
> >>> -#endif
> >>>
> >>>  #define BUFFER_IO_MAX_DELAY  100
> >>>
> >>> @@ -101,6 +98,7 @@ typedef struct XenPhysmap {
> >>>  } XenPhysmap;
> >>>
> >>>  typedef struct XenIOState {
> >>> +    ioservid_t ioservid;
> >>>      shared_iopage_t *shared_page;
> >>>      shared_vmport_iopage_t *shared_vmport_page;
> >>>      buffered_iopage_t *buffered_io_page;
> >>> @@ -117,6 +115,8 @@ typedef struct XenIOState {
> >>>
> >>>      struct xs_handle *xenstore;
> >>>      MemoryListener memory_listener;
> >>> +    MemoryListener io_listener;
> >>> +    DeviceListener device_listener;
> >>>      QLIST_HEAD(, XenPhysmap) physmap;
> >>>      hwaddr free_phys_offset;
> >>>      const XenPhysmap *log_for_dirtybit;
> >>> @@ -467,12 +467,23 @@ static void xen_set_memory(struct MemoryListener *listener,
> >>>      bool log_dirty = memory_region_is_logging(section->mr);
> >>>      hvmmem_type_t mem_type;
> >>>
> >>> +    if (section->mr == &ram_memory) {
> >>> +        return;
> >>> +    } else {
> >>> +        if (add) {
> >>> +            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
> >>> +                                   section);
> >>> +        } else {
> >>> +            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
> >>> +                                     section);
> >>> +        }
> >>> +    }
> >>> +
> >>>      if (!memory_region_is_ram(section->mr)) {
> >>>          return;
> >>>      }
> >>>
> >>> -    if (!(section->mr != &ram_memory
> >>> -          && ( (log_dirty && add) || (!log_dirty && !add)))) {
> >>> +    if (log_dirty != add) {
> >>>          return;
> >>>      }
> >>>
> >>> @@ -515,6 +526,50 @@ static void xen_region_del(MemoryListener *listener,
> >>>      memory_region_unref(section->mr);
> >>>  }
> >>>
> >>> +static void xen_io_add(MemoryListener *listener,
> >>> +                       MemoryRegionSection *section)
> >>> +{
> >>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
> >>> +
> >>> +    memory_region_ref(section->mr);
> >>> +
> >>> +    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
> >>> +}
> >>> +
> >>> +static void xen_io_del(MemoryListener *listener,
> >>> +                       MemoryRegionSection *section)
> >>> +{
> >>> +    XenIOState *state = container_of(listener, XenIOState, io_listener);
> >>> +
> >>> +    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
> >>> +
> >>> +    memory_region_unref(section->mr);
> >>> +}
> >>> +
> >>> +static void xen_device_realize(DeviceListener *listener,
> >>> +			       DeviceState *dev)
> >>> +{
> >>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
> >>> +
> >>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> >>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
> >>> +
> >>> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
> >>> +    }
> >>> +}
> >>> +
> >>> +static void xen_device_unrealize(DeviceListener *listener,
> >>> +				 DeviceState *dev)
> >>> +{
> >>> +    XenIOState *state = container_of(listener, XenIOState, device_listener);
> >>> +
> >>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> >>> +        PCIDevice *pci_dev = PCI_DEVICE(dev);
> >>> +
> >>> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
> >>> +    }
> >>> +}
> >>> +
> >>>  static void xen_sync_dirty_bitmap(XenIOState *state,
> >>>                                    hwaddr start_addr,
> >>>                                    ram_addr_t size)
> >>> @@ -615,6 +670,17 @@ static MemoryListener xen_memory_listener = {
> >>>      .priority = 10,
> >>>  };
> >>>
> >>> +static MemoryListener xen_io_listener = {
> >>> +    .region_add = xen_io_add,
> >>> +    .region_del = xen_io_del,
> >>> +    .priority = 10,
> >>> +};
> >>> +
> >>> +static DeviceListener xen_device_listener = {
> >>> +    .realize = xen_device_realize,
> >>> +    .unrealize = xen_device_unrealize,
> >>> +};
> >>> +
> >>>  /* get the ioreq packets from share mem */
> >>>  static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState *state, int vcpu)
> >>>  {
> >>> @@ -863,6 +929,27 @@ static void handle_ioreq(XenIOState *state, ioreq_t *req)
> >>>          case IOREQ_TYPE_INVALIDATE:
> >>>              xen_invalidate_map_cache();
> >>>              break;
> >>> +        case IOREQ_TYPE_PCI_CONFIG: {
> >>> +            uint32_t sbdf = req->addr >> 32;
> >>> +            uint32_t val;
> >>> +
> >>> +            /* Fake a write to port 0xCF8 so that
> >>> +             * the config space access will target the
> >>> +             * correct device model.
> >>> +             */
> >>> +            val = (1u << 31) |
> >>> +                  ((req->addr & 0x0f00) << 16) |
> >>> +                  ((sbdf & 0xffff) << 8) |
> >>> +                  (req->addr & 0xfc);
> >>> +            do_outp(0xcf8, 4, val);
> >>> +
> >>> +            /* Now issue the config space access via
> >>> +             * port 0xCFC
> >>> +             */
> >>> +            req->addr = 0xcfc | (req->addr & 0x03);
> >>> +            cpu_ioreq_pio(req);
> >>> +            break;
> >>> +        }
> >>>          default:
> >>>              hw_error("Invalid ioreq type 0x%x\n", req->type);
> >>>      }
> >>> @@ -993,9 +1080,15 @@ static void xen_main_loop_prepare(XenIOState *state)
> >>>  static void xen_hvm_change_state_handler(void *opaque, int running,
> >>>                                           RunState rstate)
> >>>  {
> >>> +    XenIOState *state = opaque;
> >>> +
> >>>      if (running) {
> >>> -        xen_main_loop_prepare((XenIOState *)opaque);
> >>> +        xen_main_loop_prepare(state);
> >>>      }
> >>> +
> >>> +    xen_set_ioreq_server_state(xen_xc, xen_domid,
> >>> +                               state->ioservid,
> >>> +                               (rstate == RUN_STATE_RUNNING));
> >>>  }
> >>>
> >>>  static void xen_exit_notifier(Notifier *n, void *data)
> >>> @@ -1064,8 +1157,9 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>                   MemoryRegion **ram_memory)
> >>>  {
> >>>      int i, rc;
> >>> -    unsigned long ioreq_pfn;
> >>> -    unsigned long bufioreq_evtchn;
> >>> +    xen_pfn_t ioreq_pfn;
> >>> +    xen_pfn_t bufioreq_pfn;
> >>> +    evtchn_port_t bufioreq_evtchn;
> >>>      XenIOState *state;
> >>>
> >>>      state = g_malloc0(sizeof (XenIOState));
> >>> @@ -1082,6 +1176,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>          return -1;
> >>>      }
> >>>
> >>> +    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
> >>> +    if (rc < 0) {
> >>> +        perror("xen: ioreq server create");
> >>> +        return -1;
> >>> +    }
> >>> +
> >>>      state->exit.notify = xen_exit_notifier;
> >>>      qemu_add_exit_notifier(&state->exit);
> >>>
> >>> @@ -1091,8 +1191,18 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>      state->wakeup.notify = xen_wakeup_notifier;
> >>>      qemu_register_wakeup_notifier(&state->wakeup);
> >>>
> >>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn);
> >>> +    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state->ioservid,
> >>> +                                   &ioreq_pfn, &bufioreq_pfn,
> >>> +                                   &bufioreq_evtchn);
> >>> +    if (rc < 0) {
> >>> +        hw_error("failed to get ioreq server info: error %d handle=" XC_INTERFACE_FMT,
> >>> +                 errno, xen_xc);
> >>> +    }
> >>> +
> >>>      DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
> >>> +    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
> >>> +    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
> >>> +
> >>>      state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
> >>>                                                PROT_READ|PROT_WRITE, ioreq_pfn);
> >>>      if (state->shared_page == NULL) {
> >>> @@ -1114,10 +1224,10 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>          hw_error("get vmport regs pfn returned error %d, rc=%d", errno, rc);
> >>>      }
> >>>
> >>> -    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
> >>> -    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
> >>> -    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
> >>> -                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
> >>> +    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid,
> >>> +                                                   XC_PAGE_SIZE,
> >>> +                                                   PROT_READ|PROT_WRITE,
> >>> +                                                   bufioreq_pfn);
> >>>      if (state->buffered_io_page == NULL) {
> >>>          hw_error("map buffered IO page returned error %d", errno);
> >>>      }
> >>> @@ -1125,6 +1235,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>      /* Note: cpus is empty at this point in init */
> >>>      state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
> >>>
> >>> +    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
> >>> +    if (rc < 0) {
> >>> +        hw_error("failed to enable ioreq server info: error %d handle=" XC_INTERFACE_FMT,
> >>> +                 errno, xen_xc);
> >>> +    }
> >>> +
> >>>      state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
> >>>
> >>>      /* FIXME: how about if we overflow the page here? */
> >>> @@ -1132,22 +1248,16 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>          rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
> >>>                                          xen_vcpu_eport(state->shared_page, i));
> >>>          if (rc == -1) {
> >>> -            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
> >>> +            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
> >>>              return -1;
> >>>          }
> >>>          state->ioreq_local_port[i] = rc;
> >>>      }
> >>>
> >>> -    rc = xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_EVTCHN,
> >>> -            &bufioreq_evtchn);
> >>> -    if (rc < 0) {
> >>> -        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
> >>> -        return -1;
> >>> -    }
> >>>      rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
> >>> -            (uint32_t)bufioreq_evtchn);
> >>> +                                    bufioreq_evtchn);
> >>>      if (rc == -1) {
> >>> -        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
> >>> +        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
> >>>          return -1;
> >>>      }
> >>>      state->bufioreq_local_port = rc;
> >>> @@ -1163,6 +1273,12 @@ int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
> >>>      memory_listener_register(&state->memory_listener, &address_space_memory);
> >>>      state->log_for_dirtybit = NULL;
> >>>
> >>> +    state->io_listener = xen_io_listener;
> >>> +    memory_listener_register(&state->io_listener, &address_space_io);
> >>> +
> >>> +    state->device_listener = xen_device_listener;
> >>> +    device_listener_register(&state->device_listener);
> >>> +
> >>>      /* Initialize backend core & drivers */
> >>>      if (xen_be_init() != 0) {
> >>>          fprintf(stderr, "%s: xen backend core setup failed\n", __FUNCTION__);
> >>>
> >>
Don Slutz Jan. 29, 2015, 7:14 p.m. UTC | #5
On 01/29/15 07:09, Paul Durrant wrote:
>> -----Original Message-----
>> From: Don Slutz [mailto:dslutz@verizon.com]
>> Sent: 29 January 2015 00:58
>> To: Don Slutz; Paul Durrant; qemu-devel@nongnu.org; Stefano Stabellini
>> Cc: Peter Maydell; Olaf Hering; Alexey Kardashevskiy; Stefan Weil; Michael
>> Tokarev; Alexander Graf; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini
>> Subject: Re: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API
>> when available
>>
>>
>>
>> On 01/28/15 19:05, Don Slutz wrote:
>>> On 01/28/15 14:32, Don Slutz wrote:
>>>> On 12/05/14 05:50, Paul Durrant wrote:
>>>>> The ioreq-server API added to Xen 4.5 offers better security than
>>>>> the existing Xen/QEMU interface because the shared pages that are
>>>>> used to pass emulation request/results back and forth are removed
>>>>> from the guest's memory space before any requests are serviced.
>>>>> This prevents the guest from mapping these pages (they are in a
>>>>> well known location) and attempting to attack QEMU by synthesizing
>>>>> its own request structures. Hence, this patch modifies configure
>>>>> to detect whether the API is available, and adds the necessary
>>>>> code to use the API if it is.
>>>>
>>>> This patch (which is now on xenbits qemu staging) is causing me
>>>> issues.
>>>>
>>>
>>> I have found the key.
>>>
>>> The following will reproduce my issue:
>>>
>>> 1) xl create -p <config>
>>> 2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
>>>    HVM_PARAM_BUFIOREQ_EVTCHN
>>> 3) xl unpause new guest
>>>
>>> The guest will hang in hvmloader.
>>>
>>> More in thread:
>>>
>>>
>>> Subject: Re: [Xen-devel] [PATCH] ioreq-server: handle
>>> IOREQ_TYPE_PCI_CONFIG in assist function
>>> References: <1422385589-17316-1-git-send-email-wei.liu2@citrix.com>
>>>
>>>
>>
>> Oops, that thread is not the right place to include what I have found.
>>
>> Here is the info I was going to send there:
>>
>>
>> Using QEMU upstream master (or xenbits qemu staging), you do not have a
>> default ioreq server.  And so hvm_select_ioreq_server() returns NULL for
>> hvmloader's iorequest to:
>>
>> CPU4  0 (+       0)  HANDLE_PIO [ port = 0x0cfe size = 2 dir = 1 ]
>>
>> (I added this xentrace to figure out what is happening, and I have
>> a lot of data about it, if any one wants it.)
>>
>> To get a guest hang instead of calling hvm_complete_assist_req()
>> for some of hvmloader's pci_read() calls, you can do the following:
>>
>>
>> 1) xl create -p <config>
>> 2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
>>    HVM_PARAM_BUFIOREQ_EVTCHN
>> 3) xl unpause new guest
>>
>> The guest will hang in hvmloader.
>>
>> The read of HVM_PARAM_IOREQ_PFN will cause a default ioreq server to
>> be created and directed to the upstream QEMU that is not a default
>> ioreq server.  This read also creates the extra event channels that
>> I see.
>>
>> I see that hvmop_create_ioreq_server() prevents you from creating
>> an is_default ioreq_server, so QEMU is not able to do so.
>>
>> Not sure where we go from here.
>>
> 
> Given that IIRC you are using a new dedicated IOREQ type, I
> think there needs to be something that allows an emulator to
> register for this IOREQ type. How about adding a new type to
> those defined for HVMOP_map_io_range_to_ioreq_server for your
> case? (In your case the start and end values in the hypercall
> would be meaningless but it could be used to steer
> hvm_select_ioreq_server() into sending all emulation requests of
> your new type to QEMU.)
> 
> Actually such a mechanism could be used
> to steer IOREQ_TYPE_TIMEOFFSET requests as, with the new QEMU
> patches, they are going nowhere. Upstream QEMU (as default) used
> to ignore them anyway, which is why I didn't bother with such a
> patch to Xen before, but since you now need one, maybe you could
> add that too?
>

I am confused by these statements.  They do contain useful information
but do not have any relation to the issue I am talking about.

Here is longer description:


In a newly cloned xen with the .config:

QEMU_UPSTREAM_REVISION = master
QEMU_UPSTREAM_URL = git://xenbits.xen.org/staging/qemu-upstream-unstable.git
debug = n

And build it:

./configure --prefix=/usr --disable-stubdom
make -j8 rpmball

And run it:

[root@dcs-xen-54 ~]# xl cre -p -V /home/don/aoe-xfg/C63-min-tools.trace.xfg

Read hvm_param(Ioreq_Pfn)

[root@dcs-xen-54 ~]# xl unpause C63-min-tools
[root@dcs-xen-54 ~]# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  2048     8     r-----      35.5
C63-min-tools                                1  8194     1     ------       0.0
[root@dcs-xen-54 ~]# date
Thu Jan 29 12:23:10 EST 2015
[root@dcs-xen-54 ~]# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  2048     8     r-----      66.9
C63-min-tools                                1  8194     1     ------       0.0
[root@dcs-xen-54 ~]# /usr/lib/xen/bin/xenctx 1
cs:eip: 0018:00101583
flags: 00000002 nz
ss:esp: 0020:001ba488
eax: 80000108   ebx: 00000002   ecx: 00000002   edx: 00000cfe
esi: 00000000   edi: 00000000   ebp: 00000000
 ds:     0020    es:     0020    fs:     0020    gs:     0020
Code (instr addr 00101583)
0c 00 00 ec 0f b6 c0 5b c3 90 83 e3 02 8d 93 fc 0c 00 00 66 ed <0f> b7 c0 5b c3 90 8d b4 26 00 00
[root@dcs-xen-54 ~]#


You can see that the guest is still waiting for the 2-byte port read (in) from 0x00000cfe.




-- I used the tool (From:

Subject: [OPTIONAL][PATCH for-4.5 v7 7/7] Add xen-hvm-param
Date: Thu, 2 Oct 2014 17:30:17 -0400
Message-ID: <1412285417-19180-8-git-send-email-dslutz@verizon.com>
X-Mailer: git-send-email 1.8.4
In-Reply-To: <1412285417-19180-1-git-send-email-dslutz@verizon.com>

) "dist/install/usr/sbin/xen-hvm-param 1" to do the hvm param read (any
way that calls xc_get_hvm_param(,,HVM_PARAM_IOREQ_PFN,) will cause
the issue).

   -Don Slutz


>   Paul
> 
>>    -Don Slutz
>>
>>
>>>     -Don Slutz
>>>
>>>
>>>> So far I have tracked it back to hvm_select_ioreq_server()
>>>> which selects the "default_ioreq_server".  Since I have only 1
>>>> QEMU, it is both the "default_ioreq_server" and an enabled
>>>> 2nd ioreq_server.  I am continuing to investigate why my changes
>>>> are causing this.  More below.
>>>>
>>>> This patch causes QEMU to only call xc_evtchn_bind_interdomain()
>>>> for the enabled 2nd ioreq_server.  So when (if)
>>>> hvm_select_ioreq_server() selects the "default_ioreq_server", the
>>>> guest hangs on an I/O.
>>>>
>>>> Using the debug key 'e':
>>>>
>>>> (XEN) [2015-01-28 18:57:07] 'e' pressed -> dumping event-channel info
>>>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 0:
>>>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>>>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>>>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=5 n=0 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=6 n=0 x=0
>>>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=6 n=0 x=0
>>>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=5 n=0 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=6 n=0 x=0
>>>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=6 n=0 x=0
>>>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=5 n=1 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=6 n=1 x=0
>>>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=6 n=1 x=0
>>>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=5 n=1 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=6 n=1 x=0
>>>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=6 n=1 x=0
>>>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=5 n=2 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=6 n=2 x=0
>>>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=6 n=2 x=0
>>>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=5 n=2 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       17 [0/0/0]: s=6 n=2 x=0
>>>> (XEN) [2015-01-28 18:57:07]       18 [0/0/0]: s=6 n=2 x=0
>>>> (XEN) [2015-01-28 18:57:07]       19 [0/0/0]: s=5 n=3 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       20 [0/0/0]: s=6 n=3 x=0
>>>> (XEN) [2015-01-28 18:57:07]       21 [0/0/0]: s=6 n=3 x=0
>>>> (XEN) [2015-01-28 18:57:07]       22 [0/0/0]: s=5 n=3 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       23 [0/0/0]: s=6 n=3 x=0
>>>> (XEN) [2015-01-28 18:57:07]       24 [0/0/0]: s=6 n=3 x=0
>>>> (XEN) [2015-01-28 18:57:07]       25 [0/0/0]: s=5 n=4 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       26 [0/0/0]: s=6 n=4 x=0
>>>> (XEN) [2015-01-28 18:57:07]       27 [0/0/0]: s=6 n=4 x=0
>>>> (XEN) [2015-01-28 18:57:07]       28 [0/0/0]: s=5 n=4 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       29 [0/0/0]: s=6 n=4 x=0
>>>> (XEN) [2015-01-28 18:57:07]       30 [0/0/0]: s=6 n=4 x=0
>>>> (XEN) [2015-01-28 18:57:07]       31 [0/0/0]: s=5 n=5 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       32 [0/0/0]: s=6 n=5 x=0
>>>> (XEN) [2015-01-28 18:57:07]       33 [0/0/0]: s=6 n=5 x=0
>>>> (XEN) [2015-01-28 18:57:07]       34 [0/0/0]: s=5 n=5 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       35 [0/0/0]: s=6 n=5 x=0
>>>> (XEN) [2015-01-28 18:57:07]       36 [0/0/0]: s=6 n=5 x=0
>>>> (XEN) [2015-01-28 18:57:07]       37 [0/0/0]: s=5 n=6 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       38 [0/0/0]: s=6 n=6 x=0
>>>> (XEN) [2015-01-28 18:57:07]       39 [0/0/0]: s=6 n=6 x=0
>>>> (XEN) [2015-01-28 18:57:07]       40 [0/0/0]: s=5 n=6 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       41 [0/0/0]: s=6 n=6 x=0
>>>> (XEN) [2015-01-28 18:57:07]       42 [0/0/0]: s=6 n=6 x=0
>>>> (XEN) [2015-01-28 18:57:07]       43 [0/0/0]: s=5 n=7 x=0 v=0
>>>> (XEN) [2015-01-28 18:57:07]       44 [0/0/0]: s=6 n=7 x=0
>>>> (XEN) [2015-01-28 18:57:07]       45 [0/0/0]: s=6 n=7 x=0
>>>> (XEN) [2015-01-28 18:57:07]       46 [0/0/0]: s=5 n=7 x=0 v=1
>>>> (XEN) [2015-01-28 18:57:07]       47 [0/0/0]: s=6 n=7 x=0
>>>> (XEN) [2015-01-28 18:57:07]       48 [0/0/0]: s=6 n=7 x=0
>>>> (XEN) [2015-01-28 18:57:07]       49 [0/0/0]: s=3 n=0 x=0 d=0 p=58
>>>> (XEN) [2015-01-28 18:57:07]       50 [0/0/0]: s=5 n=0 x=0 v=9
>>>> (XEN) [2015-01-28 18:57:07]       51 [0/0/0]: s=4 n=0 x=0 p=9 i=9
>>>> (XEN) [2015-01-28 18:57:07]       52 [0/0/0]: s=5 n=0 x=0 v=2
>>>> (XEN) [2015-01-28 18:57:07]       53 [0/0/0]: s=4 n=4 x=0 p=16 i=16
>>>> (XEN) [2015-01-28 18:57:07]       54 [0/0/0]: s=4 n=0 x=0 p=17 i=17
>>>> (XEN) [2015-01-28 18:57:07]       55 [0/0/0]: s=4 n=6 x=0 p=18 i=18
>>>> (XEN) [2015-01-28 18:57:07]       56 [0/0/0]: s=4 n=0 x=0 p=8 i=8
>>>> (XEN) [2015-01-28 18:57:07]       57 [0/0/0]: s=4 n=0 x=0 p=19 i=19
>>>> (XEN) [2015-01-28 18:57:07]       58 [0/0/0]: s=3 n=0 x=0 d=0 p=49
>>>> (XEN) [2015-01-28 18:57:07]       59 [0/0/0]: s=5 n=0 x=0 v=3
>>>> (XEN) [2015-01-28 18:57:07]       60 [0/0/0]: s=5 n=0 x=0 v=4
>>>> (XEN) [2015-01-28 18:57:07]       61 [0/0/0]: s=3 n=0 x=0 d=1 p=1
>>>> (XEN) [2015-01-28 18:57:07]       62 [0/0/0]: s=3 n=0 x=0 d=1 p=2
>>>> (XEN) [2015-01-28 18:57:07]       63 [0/0/0]: s=3 n=0 x=0 d=1 p=3
>>>> (XEN) [2015-01-28 18:57:07]       64 [0/0/0]: s=3 n=0 x=0 d=1 p=5
>>>> (XEN) [2015-01-28 18:57:07]       65 [0/0/0]: s=3 n=0 x=0 d=1 p=6
>>>> (XEN) [2015-01-28 18:57:07]       66 [0/0/0]: s=3 n=0 x=0 d=1 p=7
>>>> (XEN) [2015-01-28 18:57:07]       67 [0/0/0]: s=3 n=0 x=0 d=1 p=8
>>>> (XEN) [2015-01-28 18:57:07]       68 [0/0/0]: s=3 n=0 x=0 d=1 p=9
>>>> (XEN) [2015-01-28 18:57:07]       69 [0/0/0]: s=3 n=0 x=0 d=1 p=4
>>>> (XEN) [2015-01-28 18:57:07] Event channel information for domain 1:
>>>> (XEN) [2015-01-28 18:57:07] Polling vCPUs: {}
>>>> (XEN) [2015-01-28 18:57:07]     port [p/m/s]
>>>> (XEN) [2015-01-28 18:57:07]        1 [0/0/0]: s=3 n=0 x=0 d=0 p=61
>>>> (XEN) [2015-01-28 18:57:07]        2 [0/0/0]: s=3 n=0 x=0 d=0 p=62
>>>> (XEN) [2015-01-28 18:57:07]        3 [0/0/0]: s=3 n=0 x=1 d=0 p=63
>>>> (XEN) [2015-01-28 18:57:07]        4 [0/0/0]: s=3 n=0 x=1 d=0 p=69
>>>> (XEN) [2015-01-28 18:57:07]        5 [0/0/0]: s=3 n=1 x=1 d=0 p=64
>>>> (XEN) [2015-01-28 18:57:07]        6 [0/0/0]: s=3 n=2 x=1 d=0 p=65
>>>> (XEN) [2015-01-28 18:57:07]        7 [0/0/0]: s=3 n=3 x=1 d=0 p=66
>>>> (XEN) [2015-01-28 18:57:07]        8 [0/0/0]: s=3 n=4 x=1 d=0 p=67
>>>> (XEN) [2015-01-28 18:57:07]        9 [0/0/0]: s=3 n=5 x=1 d=0 p=68
>>>> (XEN) [2015-01-28 18:57:07]       10 [0/0/0]: s=2 n=0 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       11 [0/0/0]: s=2 n=0 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       12 [0/0/0]: s=2 n=1 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       13 [0/0/0]: s=2 n=2 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       14 [0/0/0]: s=2 n=3 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       15 [0/0/0]: s=2 n=4 x=1 d=0
>>>> (XEN) [2015-01-28 18:57:07]       16 [0/0/0]: s=2 n=5 x=1 d=0
>>>>
>>>> You can see that domain 1 has only half of its event channels
>>>> fully set up.  So when (if) hvm_send_assist_req_to_ioreq_server()
>>>> does:
>>>>
>>>>             notify_via_xen_event_channel(d, port);
>>>>
>>>> Nothing happens and you hang in hvm_wait_for_io() forever.
>>>>
>>>>
>>>> This does raise the questions:
>>>>
>>>> 1) Does this patch cause extra event channels to be created
>>>>    that cannot be used?
>>>>
>>>> 2) Should the "default_ioreq_server" be deleted?
>>>>
>>>>
>>>> Not sure the right way to go.
>>>>
>>>>     -Don Slutz
>>>>
>>>>
>>>>>
>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>>> Cc: Peter Maydell <peter.maydell@linaro.org>
>>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>>> Cc: Michael Tokarev <mjt@tls.msk.ru>
>>>>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>>>>> Cc: Stefan Weil <sw@weilnetz.de>
>>>>> Cc: Olaf Hering <olaf@aepfle.de>
>>>>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>> Cc: Alexander Graf <agraf@suse.de>
>>>>> ---
>>>>>  configure                   |   29 ++++++
>>>>>  include/hw/xen/xen_common.h |  223 +++++++++++++++++++++++++++++++++++++++++++
>>>>>  trace-events                |    9 ++
>>>>>  xen-hvm.c                   |  160 ++++++++++++++++++++++++++-----
>>>>>  4 files changed, 399 insertions(+), 22 deletions(-)
>>>>>
>>>>> diff --git a/configure b/configure
>>>>> index 47048f0..b1f8c2a 100755
>>>>> --- a/configure
>>>>> +++ b/configure
>>>>> @@ -1877,6 +1877,32 @@ int main(void) {
>>>>>    xc_gnttab_open(NULL, 0);
>>>>>    xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>>>>    xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>>>> +  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
>>>>> +  return 0;
>>>>> +}
>>>>> +EOF
>>>>> +      compile_prog "" "$xen_libs"
>>>>> +    then
>>>>> +    xen_ctrl_version=450
>>>>> +    xen=yes
>>>>> +
>>>>> +  elif
>>>>> +      cat > $TMPC <<EOF &&
>>>>> +#include <xenctrl.h>
>>>>> +#include <xenstore.h>
>>>>> +#include <stdint.h>
>>>>> +#include <xen/hvm/hvm_info_table.h>
>>>>> +#if !defined(HVM_MAX_VCPUS)
>>>>> +# error HVM_MAX_VCPUS not defined
>>>>> +#endif
>>>>> +int main(void) {
>>>>> +  xc_interface *xc;
>>>>> +  xs_daemon_open();
>>>>> +  xc = xc_interface_open(0, 0, 0);
>>>>> +  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
>>>>> +  xc_gnttab_open(NULL, 0);
>>>>> +  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
>>>>> +  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
>>>>>    return 0;
>>>>>  }
>>>>>  EOF
>>>>> @@ -4283,6 +4309,9 @@ if test -n "$sparc_cpu"; then
>>>>>      echo "Target Sparc Arch $sparc_cpu"
>>>>>  fi
>>>>>  echo "xen support       $xen"
>>>>> +if test "$xen" = "yes" ; then
>>>>> +  echo "xen ctrl version  $xen_ctrl_version"
>>>>> +fi
>>>>>  echo "brlapi support    $brlapi"
>>>>>  echo "bluez  support    $bluez"
>>>>>  echo "Documentation     $docs"
>>>>> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
>>>>> index 95612a4..519696f 100644
>>>>> --- a/include/hw/xen/xen_common.h
>>>>> +++ b/include/hw/xen/xen_common.h
>>>>> @@ -16,7 +16,9 @@
>>>>>
>>>>>  #include "hw/hw.h"
>>>>>  #include "hw/xen/xen.h"
>>>>> +#include "hw/pci/pci.h"
>>>>>  #include "qemu/queue.h"
>>>>> +#include "trace.h"
>>>>>
>>>>>  /*
>>>>>   * We don't support Xen prior to 3.3.0.
>>>>> @@ -179,4 +181,225 @@ static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
>>>>>  }
>>>>>  #endif
>>>>>
>>>>> +/* Xen before 4.5 */
>>>>> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
>>>>> +
>>>>> +#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
>>>>> +#define HVM_PARAM_BUFIOREQ_EVTCHN 26
>>>>> +#endif
>>>>> +
>>>>> +#define IOREQ_TYPE_PCI_CONFIG 2
>>>>> +
>>>>> +typedef uint32_t ioservid_t;
>>>>> +
>>>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>>>> +                                          ioservid_t ioservid,
>>>>> +                                          MemoryRegionSection *section)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>>>>> +                                            ioservid_t ioservid,
>>>>> +                                            MemoryRegionSection *section)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>>>> +                                      ioservid_t ioservid,
>>>>> +                                      MemoryRegionSection *section)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>>>> +                                        ioservid_t ioservid,
>>>>> +                                        MemoryRegionSection *section)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>>>> +                                  ioservid_t ioservid,
>>>>> +                                  PCIDevice *pci_dev)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>>>> +                                    ioservid_t ioservid,
>>>>> +                                    PCIDevice *pci_dev)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>>>> +                                          ioservid_t *ioservid)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
>>>>> +                                            ioservid_t ioservid)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
>>>>> +                                            ioservid_t ioservid,
>>>>> +                                            xen_pfn_t *ioreq_pfn,
>>>>> +                                            xen_pfn_t *bufioreq_pfn,
>>>>> +                                            evtchn_port_t *bufioreq_evtchn)
>>>>> +{
>>>>> +    unsigned long param;
>>>>> +    int rc;
>>>>> +
>>>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
>>>>> +    if (rc < 0) {
>>>>> +        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>> +    *ioreq_pfn = param;
>>>>> +
>>>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
>>>>> +    if (rc < 0) {
>>>>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>> +    *bufioreq_pfn = param;
>>>>> +
>>>>> +    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
>>>>> +                          &param);
>>>>> +    if (rc < 0) {
>>>>> +        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>> +    *bufioreq_evtchn = param;
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
>>>>> +                                             ioservid_t ioservid,
>>>>> +                                             bool enable)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +/* Xen 4.5 */
>>>>> +#else
>>>>> +
>>>>> +static inline void xen_map_memory_section(XenXC xc, domid_t dom,
>>>>> +                                          ioservid_t ioservid,
>>>>> +                                          MemoryRegionSection *section)
>>>>> +{
>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>> +
>>>>> +    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
>>>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
>>>>> +                                        start_addr, end_addr);
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
>>>>> +                                            ioservid_t ioservid,
>>>>> +                                            MemoryRegionSection *section)
>>>>> +{
>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>> +
>>>>> +    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
>>>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
>>>>> +                                            start_addr, end_addr);
>>>>> +}
>>>>> +
>>>>> +static inline void xen_map_io_section(XenXC xc, domid_t dom,
>>>>> +                                      ioservid_t ioservid,
>>>>> +                                      MemoryRegionSection *section)
>>>>> +{
>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>> +
>>>>> +    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
>>>>> +    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
>>>>> +                                        start_addr, end_addr);
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
>>>>> +                                        ioservid_t ioservid,
>>>>> +                                        MemoryRegionSection *section)
>>>>> +{
>>>>> +    hwaddr start_addr = section->offset_within_address_space;
>>>>> +    ram_addr_t size = int128_get64(section->size);
>>>>> +    hwaddr end_addr = start_addr + size - 1;
>>>>> +
>>>>> +    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
>>>>> +    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
>>>>> +                                            start_addr, end_addr);
>>>>> +}
>>>>> +
>>>>> +static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>>>> +                                  ioservid_t ioservid,
>>>>> +                                  PCIDevice *pci_dev)
>>>>> +{
>>>>> +    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>>>> +                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
>>>>> +    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
>>>>> +                                      0, pci_bus_num(pci_dev->bus),
>>>>> +                                      PCI_SLOT(pci_dev->devfn),
>>>>> +                                      PCI_FUNC(pci_dev->devfn));
>>>>> +}
>>>>> +
>>>>> +static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>>>> +                                    ioservid_t ioservid,
Don Slutz Jan. 29, 2015, 7:41 p.m. UTC | #6
>> On 01/29/15 07:09, Paul Durrant wrote:
...
>> Given that IIRC you are using a new dedicated IOREQ type, I
>> think there needs to be something that allows an emulator to
>> register for this IOREQ type. How about adding a new type to
>> those defined for HVMOP_map_io_range_to_ioreq_server for your
>> case? (In your case the start and end values in the hypercall
>> would be meaningless, but it could be used to steer
>> hvm_select_ioreq_server() into sending all emulation requests of
>> your new type to QEMU.)
>>

This is an interesting idea.  Will need to spend more time on it.


>> Actually such a mechanism could be used
>> to steer IOREQ_TYPE_TIMEOFFSET requests as, with the new QEMU
>> patches, they are going nowhere. Upstream QEMU (as default) used
>> to ignore them anyway, which is why I didn't bother with such a
>> patch to Xen before but since you now need one maybe you could
>> add that too?
>>

I think it would not be that hard.  Will look into adding it.


Currently I do not see how hvm_do_resume() works with 2 ioreq servers.
It looks to me like, if a vcpu (e.g. vcpu 0) needs to wait for the
2nd ioreq server, hvm_do_resume() will check the 1st ioreq server
and return as if the ioreq is done.  What am I missing?

   -Don Slutz
Paul Durrant Jan. 30, 2015, 10:23 a.m. UTC | #7
> Subject: New IOREQ type -- IOREQ_TYPE_VMWARE_PORT
> Currently I do not see how hvm_do_resume() works with 2 ioreq servers.
> It looks like to me that if a vpcu (like 0) needs to wait for the
> 2nd ioreq server, hvm_do_resume() will check the 1st ioreq server
> and return as if the ioreq is done.  What am I missing?
> 

hvm_do_resume() walks the ioreq server list, looking at the IOREQ state in the shared page of each server in turn. If no IOREQ was sent to that server then the state will be IOREQ_NONE and hvm_wait_for_io() will return 1 immediately, so the outer loop in hvm_do_resume() will move on to the next server. If a state of IOREQ_READY or IOREQ_INPROCESS is found then the vcpu blocks on the relevant event channel until the state transitions to IORESP_READY. The IOREQ is then completed and the loop moves on to the next server.
Normally an IOREQ would only be directed at one server, and indeed IOREQs that are issued for emulation requests (i.e. when io_state != HVMIO_none) fall into this category. But there is one case of a broadcast IOREQ: the INVALIDATE IOREQ (sent to tell emulators to invalidate any mappings of guest memory they may have cached), and that is why hvm_do_resume() has to iterate over all servers.

Does that make sense?

  Paul

>    -Don Slutz
Don Slutz Jan. 30, 2015, 6:26 p.m. UTC | #8
On 01/30/15 05:23, Paul Durrant wrote:
>>>> On 01/29/15 07:09, Paul Durrant wrote:
>> ...
>>>> Given that IIRC you are using a new dedicated IOREQ type, I
>>>> think there needs to be something that allows an emulator to
>>>> register for this IOREQ type. How about adding a new type to
>>>> those defined for HVMOP_map_io_range_to_ioreq_server for your
>>>> case? (In your case the start and end values in the hypercall
>>>> would be meaningless but it could be used to steer
>>>> hvm_select_ioreq_server() into sending all emulation requests or
>>>> your new type to QEMU.
>>>>
>>
>> This is an interesting idea.  Will need to spend more time on it.
>>

This does look very doable.  The only issue I see is that it requires
a QEMU change in order to work.  This would prevent Xen 4.6 from using
QEMU 2.2.0 and vmport (vmware-tools, vmware-mouse).

What makes sense to me is to "invert it", i.e. the default is to send
IOREQ_TYPE_VMWARE_PORT via io_range, and an ioreq server can ask to stop
being sent them.

The reason this is safe so far is that IOREQ_TYPE_VMWARE_PORT can only
be sent if vmport is configured on.  And in that case QEMU will be
started with vmport=on, which will cause all QEMUs that do not support
IOREQ_TYPE_VMWARE_PORT to crash.




> Does that make sense?
> 

Thanks for the clear info.  It does make sense.

   -Don Slutz

Patch

diff --git a/configure b/configure
index 47048f0..b1f8c2a 100755
--- a/configure
+++ b/configure
@@ -1877,6 +1877,32 @@  int main(void) {
   xc_gnttab_open(NULL, 0);
   xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
   xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
+  xc_hvm_create_ioreq_server(xc, 0, 0, NULL);
+  return 0;
+}
+EOF
+      compile_prog "" "$xen_libs"
+    then
+    xen_ctrl_version=450
+    xen=yes
+
+  elif
+      cat > $TMPC <<EOF &&
+#include <xenctrl.h>
+#include <xenstore.h>
+#include <stdint.h>
+#include <xen/hvm/hvm_info_table.h>
+#if !defined(HVM_MAX_VCPUS)
+# error HVM_MAX_VCPUS not defined
+#endif
+int main(void) {
+  xc_interface *xc;
+  xs_daemon_open();
+  xc = xc_interface_open(0, 0, 0);
+  xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0);
+  xc_gnttab_open(NULL, 0);
+  xc_domain_add_to_physmap(0, 0, XENMAPSPACE_gmfn, 0, 0);
+  xc_hvm_inject_msi(xc, 0, 0xf0000000, 0x00000000);
   return 0;
 }
 EOF
@@ -4283,6 +4309,9 @@  if test -n "$sparc_cpu"; then
     echo "Target Sparc Arch $sparc_cpu"
 fi
 echo "xen support       $xen"
+if test "$xen" = "yes" ; then
+  echo "xen ctrl version  $xen_ctrl_version"
+fi
 echo "brlapi support    $brlapi"
 echo "bluez  support    $bluez"
 echo "Documentation     $docs"
diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index 95612a4..519696f 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -16,7 +16,9 @@ 
 
 #include "hw/hw.h"
 #include "hw/xen/xen.h"
+#include "hw/pci/pci.h"
 #include "qemu/queue.h"
+#include "trace.h"
 
 /*
  * We don't support Xen prior to 3.3.0.
@@ -179,4 +181,225 @@  static inline int xen_get_vmport_regs_pfn(XenXC xc, domid_t dom,
 }
 #endif
 
+/* Xen before 4.5 */
+#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 450
+
+#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
+#define HVM_PARAM_BUFIOREQ_EVTCHN 26
+#endif
+
+#define IOREQ_TYPE_PCI_CONFIG 2
+
+typedef uint32_t ioservid_t;
+
+static inline void xen_map_memory_section(XenXC xc, domid_t dom,
+                                          ioservid_t ioservid,
+                                          MemoryRegionSection *section)
+{
+}
+
+static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid,
+                                            MemoryRegionSection *section)
+{
+}
+
+static inline void xen_map_io_section(XenXC xc, domid_t dom,
+                                      ioservid_t ioservid,
+                                      MemoryRegionSection *section)
+{
+}
+
+static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
+                                        ioservid_t ioservid,
+                                        MemoryRegionSection *section)
+{
+}
+
+static inline void xen_map_pcidev(XenXC xc, domid_t dom,
+                                  ioservid_t ioservid,
+                                  PCIDevice *pci_dev)
+{
+}
+
+static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
+                                    ioservid_t ioservid,
+                                    PCIDevice *pci_dev)
+{
+}
+
+static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
+                                          ioservid_t *ioservid)
+{
+    return 0;
+}
+
+static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid)
+{
+}
+
+static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid,
+                                            xen_pfn_t *ioreq_pfn,
+                                            xen_pfn_t *bufioreq_pfn,
+                                            evtchn_port_t *bufioreq_evtchn)
+{
+    unsigned long param;
+    int rc;
+
+    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_IOREQ_PFN, &param);
+    if (rc < 0) {
+        fprintf(stderr, "failed to get HVM_PARAM_IOREQ_PFN\n");
+        return -1;
+    }
+
+    *ioreq_pfn = param;
+
+    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_PFN, &param);
+    if (rc < 0) {
+        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_PFN\n");
+        return -1;
+    }
+
+    *bufioreq_pfn = param;
+
+    rc = xc_get_hvm_param(xc, dom, HVM_PARAM_BUFIOREQ_EVTCHN,
+                          &param);
+    if (rc < 0) {
+        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
+        return -1;
+    }
+
+    *bufioreq_evtchn = param;
+
+    return 0;
+}
+
+static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
+                                             ioservid_t ioservid,
+                                             bool enable)
+{
+    return 0;
+}
+
+/* Xen 4.5 */
+#else
+
+static inline void xen_map_memory_section(XenXC xc, domid_t dom,
+                                          ioservid_t ioservid,
+                                          MemoryRegionSection *section)
+{
+    hwaddr start_addr = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    hwaddr end_addr = start_addr + size - 1;
+
+    trace_xen_map_mmio_range(ioservid, start_addr, end_addr);
+    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 1,
+                                        start_addr, end_addr);
+}
+
+static inline void xen_unmap_memory_section(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid,
+                                            MemoryRegionSection *section)
+{
+    hwaddr start_addr = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    hwaddr end_addr = start_addr + size - 1;
+
+    trace_xen_unmap_mmio_range(ioservid, start_addr, end_addr);
+    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 1,
+                                            start_addr, end_addr);
+}
+
+static inline void xen_map_io_section(XenXC xc, domid_t dom,
+                                      ioservid_t ioservid,
+                                      MemoryRegionSection *section)
+{
+    hwaddr start_addr = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    hwaddr end_addr = start_addr + size - 1;
+
+    trace_xen_map_portio_range(ioservid, start_addr, end_addr);
+    xc_hvm_map_io_range_to_ioreq_server(xc, dom, ioservid, 0,
+                                        start_addr, end_addr);
+}
+
+static inline void xen_unmap_io_section(XenXC xc, domid_t dom,
+                                        ioservid_t ioservid,
+                                        MemoryRegionSection *section)
+{
+    hwaddr start_addr = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    hwaddr end_addr = start_addr + size - 1;
+
+    trace_xen_unmap_portio_range(ioservid, start_addr, end_addr);
+    xc_hvm_unmap_io_range_from_ioreq_server(xc, dom, ioservid, 0,
+                                            start_addr, end_addr);
+}
+
+static inline void xen_map_pcidev(XenXC xc, domid_t dom,
+                                  ioservid_t ioservid,
+                                  PCIDevice *pci_dev)
+{
+    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
+                         PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
+    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
+                                      0, pci_bus_num(pci_dev->bus),
+                                      PCI_SLOT(pci_dev->devfn),
+                                      PCI_FUNC(pci_dev->devfn));
+}
+
+static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
+                                    ioservid_t ioservid,
+                                    PCIDevice *pci_dev)
+{
+    trace_xen_unmap_pcidev(ioservid, pci_bus_num(pci_dev->bus),
+                           PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
+    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
+                                          0, pci_bus_num(pci_dev->bus),
+                                          PCI_SLOT(pci_dev->devfn),
+                                          PCI_FUNC(pci_dev->devfn));
+}
+
+static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
+                                          ioservid_t *ioservid)
+{
+    int rc = xc_hvm_create_ioreq_server(xc, dom, 1, ioservid);
+
+    if (rc == 0) {
+        trace_xen_ioreq_server_create(*ioservid);
+    }
+
+    return rc;
+}
+
+static inline void xen_destroy_ioreq_server(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid)
+{
+    trace_xen_ioreq_server_destroy(ioservid);
+    xc_hvm_destroy_ioreq_server(xc, dom, ioservid);
+}
+
+static inline int xen_get_ioreq_server_info(XenXC xc, domid_t dom,
+                                            ioservid_t ioservid,
+                                            xen_pfn_t *ioreq_pfn,
+                                            xen_pfn_t *bufioreq_pfn,
+                                            evtchn_port_t *bufioreq_evtchn)
+{
+    return xc_hvm_get_ioreq_server_info(xc, dom, ioservid,
+                                        ioreq_pfn, bufioreq_pfn,
+                                        bufioreq_evtchn);
+}
+
+static inline int xen_set_ioreq_server_state(XenXC xc, domid_t dom,
+                                             ioservid_t ioservid,
+                                             bool enable)
+{
+    trace_xen_ioreq_server_state(ioservid, enable);
+    return xc_hvm_set_ioreq_server_state(xc, dom, ioservid, enable);
+}
+
+#endif
+
 #endif /* QEMU_HW_XEN_COMMON_H */
diff --git a/trace-events b/trace-events
index b5722ea..abd1118 100644
--- a/trace-events
+++ b/trace-events
@@ -897,6 +897,15 @@  pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages:
 # xen-hvm.c
 xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
 xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
+xen_ioreq_server_create(uint32_t id) "id: %u"
+xen_ioreq_server_destroy(uint32_t id) "id: %u"
+xen_ioreq_server_state(uint32_t id, bool enable) "id: %u: enable: %i"
+xen_map_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
+xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
+xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
+xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
+xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
+xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id: %u bdf: %02x.%02x.%02x"
 
 # xen-mapcache.c
 xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
diff --git a/xen-hvm.c b/xen-hvm.c
index 7548794..31cb3ca 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -85,9 +85,6 @@  static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
 }
 #  define FMT_ioreq_size "u"
 #endif
-#ifndef HVM_PARAM_BUFIOREQ_EVTCHN
-#define HVM_PARAM_BUFIOREQ_EVTCHN 26
-#endif
 
 #define BUFFER_IO_MAX_DELAY  100
 
@@ -101,6 +98,7 @@  typedef struct XenPhysmap {
 } XenPhysmap;
 
 typedef struct XenIOState {
+    ioservid_t ioservid;
     shared_iopage_t *shared_page;
     shared_vmport_iopage_t *shared_vmport_page;
     buffered_iopage_t *buffered_io_page;
@@ -117,6 +115,8 @@  typedef struct XenIOState {
 
     struct xs_handle *xenstore;
     MemoryListener memory_listener;
+    MemoryListener io_listener;
+    DeviceListener device_listener;
     QLIST_HEAD(, XenPhysmap) physmap;
     hwaddr free_phys_offset;
     const XenPhysmap *log_for_dirtybit;
@@ -467,12 +467,23 @@  static void xen_set_memory(struct MemoryListener *listener,
     bool log_dirty = memory_region_is_logging(section->mr);
     hvmmem_type_t mem_type;
 
+    if (section->mr == &ram_memory) {
+        return;
+    } else {
+        if (add) {
+            xen_map_memory_section(xen_xc, xen_domid, state->ioservid,
+                                   section);
+        } else {
+            xen_unmap_memory_section(xen_xc, xen_domid, state->ioservid,
+                                     section);
+        }
+    }
+
     if (!memory_region_is_ram(section->mr)) {
         return;
     }
 
-    if (!(section->mr != &ram_memory
-          && ( (log_dirty && add) || (!log_dirty && !add)))) {
+    if (log_dirty != add) {
         return;
     }
 
@@ -515,6 +526,50 @@  static void xen_region_del(MemoryListener *listener,
     memory_region_unref(section->mr);
 }
 
+static void xen_io_add(MemoryListener *listener,
+                       MemoryRegionSection *section)
+{
+    XenIOState *state = container_of(listener, XenIOState, io_listener);
+
+    memory_region_ref(section->mr);
+
+    xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
+}
+
+static void xen_io_del(MemoryListener *listener,
+                       MemoryRegionSection *section)
+{
+    XenIOState *state = container_of(listener, XenIOState, io_listener);
+
+    xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
+
+    memory_region_unref(section->mr);
+}
+
+static void xen_device_realize(DeviceListener *listener,
+                               DeviceState *dev)
+{
+    XenIOState *state = container_of(listener, XenIOState, device_listener);
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        PCIDevice *pci_dev = PCI_DEVICE(dev);
+
+        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
+    }
+}
+
+static void xen_device_unrealize(DeviceListener *listener,
+                                 DeviceState *dev)
+{
+    XenIOState *state = container_of(listener, XenIOState, device_listener);
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        PCIDevice *pci_dev = PCI_DEVICE(dev);
+
+        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
+    }
+}
+
 static void xen_sync_dirty_bitmap(XenIOState *state,
                                   hwaddr start_addr,
                                   ram_addr_t size)
@@ -615,6 +670,17 @@  static MemoryListener xen_memory_listener = {
     .priority = 10,
 };
 
+static MemoryListener xen_io_listener = {
+    .region_add = xen_io_add,
+    .region_del = xen_io_del,
+    .priority = 10,
+};
+
+static DeviceListener xen_device_listener = {
+    .realize = xen_device_realize,
+    .unrealize = xen_device_unrealize,
+};
+
 /* get the ioreq packets from share mem */
 static ioreq_t *cpu_get_ioreq_from_shared_memory(XenIOState *state, int vcpu)
 {
@@ -863,6 +929,27 @@  static void handle_ioreq(XenIOState *state, ioreq_t *req)
         case IOREQ_TYPE_INVALIDATE:
             xen_invalidate_map_cache();
             break;
+        case IOREQ_TYPE_PCI_CONFIG: {
+            uint32_t sbdf = req->addr >> 32;
+            uint32_t val;
+
+            /* Fake a write to port 0xCF8 so that
+             * the config space access will target the
+             * correct device model.
+             */
+            val = (1u << 31) |
+                  ((req->addr & 0x0f00) << 16) |
+                  ((sbdf & 0xffff) << 8) |
+                  (req->addr & 0xfc);
+            do_outp(0xcf8, 4, val);
+
+            /* Now issue the config space access via
+             * port 0xCFC
+             */
+            req->addr = 0xcfc | (req->addr & 0x03);
+            cpu_ioreq_pio(req);
+            break;
+        }
         default:
             hw_error("Invalid ioreq type 0x%x\n", req->type);
     }
@@ -993,9 +1080,15 @@  static void xen_main_loop_prepare(XenIOState *state)
 static void xen_hvm_change_state_handler(void *opaque, int running,
                                          RunState rstate)
 {
+    XenIOState *state = opaque;
+
     if (running) {
-        xen_main_loop_prepare((XenIOState *)opaque);
+        xen_main_loop_prepare(state);
     }
+
+    xen_set_ioreq_server_state(xen_xc, xen_domid,
+                               state->ioservid,
+                               (rstate == RUN_STATE_RUNNING));
 }
 
 static void xen_exit_notifier(Notifier *n, void *data)
@@ -1064,8 +1157,9 @@  int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
                  MemoryRegion **ram_memory)
 {
     int i, rc;
-    unsigned long ioreq_pfn;
-    unsigned long bufioreq_evtchn;
+    xen_pfn_t ioreq_pfn;
+    xen_pfn_t bufioreq_pfn;
+    evtchn_port_t bufioreq_evtchn;
     XenIOState *state;
 
     state = g_malloc0(sizeof (XenIOState));
@@ -1082,6 +1176,12 @@  int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
         return -1;
     }
 
+    rc = xen_create_ioreq_server(xen_xc, xen_domid, &state->ioservid);
+    if (rc < 0) {
+        perror("xen: ioreq server create");
+        return -1;
+    }
+
     state->exit.notify = xen_exit_notifier;
     qemu_add_exit_notifier(&state->exit);
 
@@ -1091,8 +1191,18 @@  int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
     state->wakeup.notify = xen_wakeup_notifier;
     qemu_register_wakeup_notifier(&state->wakeup);
 
-    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn);
+    rc = xen_get_ioreq_server_info(xen_xc, xen_domid, state->ioservid,
+                                   &ioreq_pfn, &bufioreq_pfn,
+                                   &bufioreq_evtchn);
+    if (rc < 0) {
+        hw_error("failed to get ioreq server info: error %d handle=" XC_INTERFACE_FMT,
+                 errno, xen_xc);
+    }
+
     DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
+    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
+    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
+
     state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
                                               PROT_READ|PROT_WRITE, ioreq_pfn);
     if (state->shared_page == NULL) {
@@ -1114,10 +1224,10 @@  int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
         hw_error("get vmport regs pfn returned error %d, rc=%d", errno, rc);
     }
 
-    xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn);
-    DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn);
-    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE,
-                                                   PROT_READ|PROT_WRITE, ioreq_pfn);
+    state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid,
+                                                   XC_PAGE_SIZE,
+                                                   PROT_READ|PROT_WRITE,
+                                                   bufioreq_pfn);
     if (state->buffered_io_page == NULL) {
         hw_error("map buffered IO page returned error %d", errno);
     }
@@ -1125,6 +1235,12 @@  int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
     /* Note: cpus is empty at this point in init */
     state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
 
+    rc = xen_set_ioreq_server_state(xen_xc, xen_domid, state->ioservid, true);
+    if (rc < 0) {
+        hw_error("failed to enable ioreq server: error %d handle=" XC_INTERFACE_FMT,
+                 errno, xen_xc);
+    }
+
     state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
 
     /* FIXME: how about if we overflow the page here? */
@@ -1132,22 +1248,16 @@  int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
         rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
                                         xen_vcpu_eport(state->shared_page, i));
         if (rc == -1) {
-            fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
+            fprintf(stderr, "shared evtchn %d bind error %d\n", i, errno);
             return -1;
         }
         state->ioreq_local_port[i] = rc;
     }
 
-    rc = xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_EVTCHN,
-            &bufioreq_evtchn);
-    if (rc < 0) {
-        fprintf(stderr, "failed to get HVM_PARAM_BUFIOREQ_EVTCHN\n");
-        return -1;
-    }
     rc = xc_evtchn_bind_interdomain(state->xce_handle, xen_domid,
-            (uint32_t)bufioreq_evtchn);
+                                    bufioreq_evtchn);
     if (rc == -1) {
-        fprintf(stderr, "bind interdomain ioctl error %d\n", errno);
+        fprintf(stderr, "buffered evtchn bind error %d\n", errno);
         return -1;
     }
     state->bufioreq_local_port = rc;
@@ -1163,6 +1273,12 @@  int xen_hvm_init(ram_addr_t *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
     memory_listener_register(&state->memory_listener, &address_space_memory);
     state->log_for_dirtybit = NULL;
 
+    state->io_listener = xen_io_listener;
+    memory_listener_register(&state->io_listener, &address_space_io);
+
+    state->device_listener = xen_device_listener;
+    device_listener_register(&state->device_listener);
+
     /* Initialize backend core & drivers */
     if (xen_be_init() != 0) {
         fprintf(stderr, "%s: xen backend core setup failed\n", __FUNCTION__);