From patchwork Mon Jul 25 14:02:54 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Avi Kivity X-Patchwork-Id: 106673 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [140.186.70.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 0ED24B6F90 for ; Tue, 26 Jul 2011 00:03:43 +1000 (EST) Received: from localhost ([::1]:47943 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QlLkt-00043L-0S for incoming@patchwork.ozlabs.org; Mon, 25 Jul 2011 10:03:35 -0400 Received: from eggs.gnu.org ([140.186.70.92]:47271) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QlLkZ-0003kH-KF for qemu-devel@nongnu.org; Mon, 25 Jul 2011 10:03:20 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QlLkU-00053Y-IU for qemu-devel@nongnu.org; Mon, 25 Jul 2011 10:03:15 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35469) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QlLkU-00052H-Ae for qemu-devel@nongnu.org; Mon, 25 Jul 2011 10:03:10 -0400 Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p6PE39cZ008851 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 25 Jul 2011 10:03:09 -0400 Received: from cleopatra.tlv.redhat.com (cleopatra.tlv.redhat.com [10.35.255.11]) by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p6PE38RW024464; Mon, 25 Jul 2011 10:03:09 -0400 Received: from s01.tlv.redhat.com (s01.tlv.redhat.com [10.35.255.8]) by cleopatra.tlv.redhat.com (Postfix) with ESMTP id ED02E250B43; Mon, 25 Jul 2011 17:03:06 +0300 (IDT) From: Avi Kivity To: qemu-devel@nongnu.org Date: Mon, 25 Jul 2011 17:02:54 +0300 Message-Id: <1311602584-23409-14-git-send-email-avi@redhat.com> In-Reply-To: <1311602584-23409-1-git-send-email-avi@redhat.com> References: <1311602584-23409-1-git-send-email-avi@redhat.com> X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 209.132.183.28 Cc: kvm@vger.kernel.org Subject: [Qemu-devel] [PATCH 13/23] memory: document the memory API X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Signed-off-by: Avi Kivity --- docs/memory.txt | 172 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 172 insertions(+), 0 deletions(-) create mode 100644 docs/memory.txt diff --git a/docs/memory.txt b/docs/memory.txt new file mode 100644 index 0000000..4460c06 --- /dev/null +++ b/docs/memory.txt @@ -0,0 +1,172 @@ +The memory API +============== + +The memory API models the memory and I/O buses and controllers of a QEMU +machine. It attempts to allow modelling of: + + - ordinary RAM + - memory-mapped I/O (MMIO) + - memory controllers that can dynamically reroute physical memory regions + to different destinations + +The memory model provides support for + + - tracking RAM changes by the guest + - setting up coalesced memory for kvm + - setting up ioeventfd regions for kvm + +Memory is modelled as an tree (really acyclic graph) of MemoryRegion objects. +The root of the tree is memory as seen from the CPU's viewpoint (the system +bus). Nodes in the tree represent other buses, memory controllers, and +memory regions that have been rerouted. Leaves are RAM and MMIO regions. + +Types of regions +---------------- + +There are four types of memory regions (all represented by a single C type +MemoryRegion): + +- RAM: a RAM region is simply a range of host memory that can be made available + to the guest. + +- MMIO: a range of guest memory that is implemented by host callbacks; + each read or write causes a callback to be called on the host. + +- container: a container simply includes other memory regions, each at + a different offset. Containers are useful for grouping several regions + into one unit. For example, a PCI BAR may be composed of a RAM region + and an MMIO region. + + A container's subregions are usually non-overlapping. In some cases it is + useful to have overlapping regions; for example a memory controller that + can overlay a subregion of RAM with MMIO or ROM, or a PCI controller + that does not prevent card from claiming overlapping BARs. + +- alias: a subsection of another region. Aliases allow a region to be + split apart into discontiguous regions. Examples of uses are memory banks + used when the guest address space is smaller than the amount of RAM + addressed, or a memory controller that splits main memory to expose a "PCI + hole". Aliases may point to any type of region, including other aliases, + but an alias may not point back to itself, directly or indirectly. + + +Region names +------------ + +Regions are assigned names by the constructor. For most regions these are +only used for debugging purposes, but RAM regions also use the name to identify +live migration sections. This means that RAM region names need to have ABI +stability. + +Region lifecycle +---------------- + +A region is created by one of the constructor functions (memory_region_init*()) +and destroyed by the destructor (memory_region_destroy()). In between, +a region can be added to an address space by using memory_region_add_subregion() +and removed using memory_region_del_subregion(). Region attributes may be +changed at any point; they take effect once the region becomes exposed to the +guest. + +Overlapping regions and priority +-------------------------------- +Usually, regions may not overlap each other; a memory address decodes into +exactly one target. In some cases it is useful to allow regions to overlap, +and sometimes to control which of an overlapping regions is visible to the +guest. This is done with memory_region_add_subregion_overlap(), which +allows the region to overlap any other region in the same container, and +specifies a priority that allows the core to decide which of two regions at +the same address are visible (highest wins). + +Visibility +---------- +The memory core uses the following rules to select a memory region when the +guest accesses an address: + +- all direct subregions of the root region are matched against the address, in + descending priority order + - if the address lies outside the region offset/size, the subregion is + discarded + - if the subregion is a leaf (RAM or MMIO), the seach terminates + - if the subregion is a container, the same algorithm is used within the + subregion (after the address is adjusted by the subregion offset) + - if the subregion is an alias, the search is continues at the alias target + (after the address is adjusted by the subregion offset and alias offset) + +Example memory map +------------------ + +system_memory: container@0-2^48-1 + | + +---- lomem: alias@0-0xdfffffff ---> #ram (0-0xdfffffff) + | + +---- himem: alias@0x100000000-0x11fffffff ---> #ram (0xe0000000-0xffffffff) + | + +---- vga-window: alias@0xa0000-0xbfffff ---> #pci (0xa0000-0xbffff) + | (prio 1) + | + +---- pci-hole: alias@0xe0000000-0xffffffff ---> #pci (0xe0000000-0xffffffff) + +pci (0-2^32-1) + | + +--- vga-area: container@0xa0000-0xbffff + | | + | +--- alias@0x00000-0x7fff ---> #vram (0x010000-0x017fff) + | | + | +--- alias@0x08000-0xffff ---> #vram (0x020000-0x027fff) + | + +---- vram: ram@0xe1000000-0xe1ffffff + | + +---- vga-mmio: mmio@0xe2000000-0xe200ffff + +ram: ram@0x00000000-0xffffffff + +The is a (simplified) PC memory map. The 4GB RAM block is mapped into the +system address space via two aliases: "lomem" is a 1:1 mapping of the first +3.5GB; "himem" maps the last 0.5GB at address 4GB. This leaves 0.5GB for the +so-called PCI hole, that allows a 32-bit PCI bus to exist in a system with +4GB of memory. + +The memory controller diverts addresses in the range 640K-768K to the PCI +address space. This is modeled using the "vga-window" alias, mapped at a +higher priority so it obscures the RAM at the same addresses. The vga window +can be removed by programming the memory controller; this is modelled by +removing the alias and exposing the RAM underneath. + +The pci address space is not a direct child of the system address space, since +we only want parts of it to be visible (we accomplish this using aliases). +It has two subregions: vga-area models the legacy vga window and is occupied +by two 32K memory banks pointing at two sections of the framebuffer. +In addition the vram is mapped as a BAR at address e1000000, and an additional +BAR containing MMIO registers is mapped after it. + +Note that if the guest maps a BAR outside the PCI hole, it would not be +visible as the pci-hole alias clips it to a 0.5GB range. + +Attributes +---------- + +Various region attributes (read-only, dirty logging, coalesced mmio, ioeventfd) +can be changed during the region lifecycle. They take effect once the region +is made visible (which can be immediately, later, or never). + +MMIO Operations +--------------- + +MMIO regions are provided with ->read() and ->write() callbacks; in addition +various constraints can be supplied to control how these callbacks are called: + + - .valid.min_access_size, .valid.max_access_size define the access sizes + (in bytes) which the device accepts; accesses outside this range will + have device and bus specific behaviour (ignored, or machine check) + - .valid.aligned specifies that the device only accepts naturally aligned + accesses. Unaligned accesses invoke device and bus specific behaviour. + - .impl.min_access_size, .impl.max_access_size define the access sizes + (in bytes) supported by the *implementation*; other access sizes will be + emulated using the ones available. For example a 4-byte write will be + emulated using four 1-byte write, is .impl.max_access_size = 1. + - .impl.valid specifies that the *implementation* only supports unaligned + accesses; unaligned accesses will be emulated by two aligned accesses. + - .old_portio and .old_mmio can be used to ease porting from code using + cpu_register_io_memory() and register_ioport(). They should not be used + in new code.