diff mbox

[RFC,v1] Add declarations for hierarchical memory region API

Message ID 1305814352-15044-2-git-send-email-avi@redhat.com
State New
Headers show

Commit Message

Avi Kivity May 19, 2011, 2:12 p.m. UTC
The memory API separates the attributes of a memory region (its size, how
reads or writes are handled, dirty logging, and coalescing) from where it
is mapped and whether it is enabled.  This allows a device to configure
a memory region once, then hand it off to its parent bus to map it according
to the bus configuration.

Hierarchical registration also allows a device to compose a region out of
a number of sub-regions with different properties; for example some may be
RAM while others may be MMIO.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 memory.h |  142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 142 insertions(+), 0 deletions(-)
 create mode 100644 memory.h

Comments

Alex Williamson May 19, 2011, 7:07 p.m. UTC | #1
On Thu, 2011-05-19 at 10:12 -0400, Avi Kivity wrote:
> The memory API separates the attributes of a memory region (its size, how
> reads or writes are handled, dirty logging, and coalescing) from where it
> is mapped and whether it is enabled.  This allows a device to configure
> a memory region once, then hand it off to its parent bus to map it according
> to the bus configuration.
> 
> Hierarchical registration also allows a device to compose a region out of
> a number of sub-regions with different properties; for example some may be
> RAM while others may be MMIO.
> 
> Signed-off-by: Avi Kivity <avi@redhat.com>
> ---
>  memory.h |  142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 142 insertions(+), 0 deletions(-)
>  create mode 100644 memory.h
> 
> diff --git a/memory.h b/memory.h
> new file mode 100644
> index 0000000..77c5951
> --- /dev/null
> +++ b/memory.h
> @@ -0,0 +1,142 @@
> +#ifndef MEMORY_H
> +#define MEMORY_H
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include "qemu-common.h"
> +#include "cpu-common.h"
> +#include "targphys.h"
> +#include "qemu-queue.h"
> +
> +typedef struct MemoryRegionOps MemoryRegionOps;
> +typedef struct MemoryRegion MemoryRegion;
> +
> +/*
> + * Memory region callbacks
> + */
> +struct MemoryRegionOps {
> +    /* Read from the memory region. @addr is relative to @mr; @size is
> +     * in bytes. */
> +    uint64_t (*read)(MemoryRegion *mr,
> +                     target_phys_addr_t addr,
> +                     unsigned size);
> +    /* Write to the memory region. @addr is relative to @mr; @size is
> +     * in bytes. */
> +    void (*write)(MemoryRegion *mr,
> +                  target_phys_addr_t addr,
> +                  uint64_t data,
> +                  unsigned size);
> +    /* Guest-visible constraints: */
> +    struct {
> +        /* If nonzero, specify bounds on access sizes beyond which a machine
> +         * check is thrown.
> +         */
> +        unsigned min_access_size;
> +        unsigned max_access_size;

Do we always support all access sizes between min and max?  This might
be easier to describe as a bitmap of supported power of 2 access sizes.
Thanks,

Alex
Jan Kiszka May 19, 2011, 7:27 p.m. UTC | #2
On 2011-05-19 16:12, Avi Kivity wrote:
> +/* Sets an offset to be added to MemoryRegionOps callbacks. */
> +void memory_region_set_offset(MemoryRegion *mr, target_phys_addr_t offset);

Please mark this as a legacy helper, ideally to be removed after the
complete conversion to this API. During that phase we should try to
identify those devices which still depend on offset=0 and maybe directly
fix them.

> +/* Turn loggging on or off for specified client (display, migration) */
> +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
> +/* Enable memory coalescing for the region.  MMIO ->write callbacks may be
> + * delayed until a non-coalesced MMIO is issued.
> + */
> +void memory_region_set_coalescing(MemoryRegion *mr);
> +/* Enable memory coalescing for a sub-range of the region.  MMIO ->write
> + * callbacks may be delayed until a non-coalesced MMIO is issued.
> + */
> +void memory_region_add_coalescing(MemoryRegion *mr,
> +                                  target_phys_addr_t offset,
> +                                  target_phys_addr_t size);

Will this be such a common use case that requesting the user to split up
the region and then use set_coalescing will generate too much boiler
plate code?

> +/* Disable MMIO coalescing for the region. */
> +void memory_region_clear_coalescing(MemoryRegion *mr);

And what about clearing coalescing for sub-ranges? Maybe skip
add_coalescing for the first run and see how far we get.

Jan
Anthony Liguori May 19, 2011, 8:43 p.m. UTC | #3
On 05/19/2011 09:12 AM, Avi Kivity wrote:
> The memory API separates the attributes of a memory region (its size, how
> reads or writes are handled, dirty logging, and coalescing) from where it
> is mapped and whether it is enabled.  This allows a device to configure
> a memory region once, then hand it off to its parent bus to map it according
> to the bus configuration.
>
> Hierarchical registration also allows a device to compose a region out of
> a number of sub-regions with different properties; for example some may be
> RAM while others may be MMIO.
>
> Signed-off-by: Avi Kivity<avi@redhat.com>
> ---
>   memory.h |  142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 files changed, 142 insertions(+), 0 deletions(-)
>   create mode 100644 memory.h
>
> diff --git a/memory.h b/memory.h
> new file mode 100644
> index 0000000..77c5951
> --- /dev/null
> +++ b/memory.h
> @@ -0,0 +1,142 @@
> +#ifndef MEMORY_H
> +#define MEMORY_H
> +
> +#include<stdint.h>
> +#include<stdbool.h>
> +#include "qemu-common.h"
> +#include "cpu-common.h"
> +#include "targphys.h"
> +#include "qemu-queue.h"
> +
> +typedef struct MemoryRegionOps MemoryRegionOps;
> +typedef struct MemoryRegion MemoryRegion;
> +
> +/*
> + * Memory region callbacks
> + */
> +struct MemoryRegionOps {
> +    /* Read from the memory region. @addr is relative to @mr; @size is
> +     * in bytes. */
> +    uint64_t (*read)(MemoryRegion *mr,
> +                     target_phys_addr_t addr,
> +                     unsigned size);
> +    /* Write to the memory region. @addr is relative to @mr; @size is
> +     * in bytes. */
> +    void (*write)(MemoryRegion *mr,
> +                  target_phys_addr_t addr,
> +                  uint64_t data,
> +                  unsigned size);
> +    /* Guest-visible constraints: */
> +    struct {
> +        /* If nonzero, specify bounds on access sizes beyond which a machine
> +         * check is thrown.
> +         */
> +        unsigned min_access_size;
> +        unsigned max_access_size;
> +        /* If true, unaligned accesses are supported.  Otherwise unaligned
> +         * accesses throw machine checks.
> +         */
> +         bool unaligned;
> +    } valid;

Under what circumstances would this be used?

The behavior of devices that receive non-natural accesses varies wildly.

For PCI devices, invalid accesses almost always return ~0.  I can't 
think of a device where an MCE would occur.

Regards,

Anthony Liguori
Stefan Weil May 19, 2011, 9:04 p.m. UTC | #4
Am 19.05.2011 16:12, schrieb Avi Kivity:
> The memory API separates the attributes of a memory region (its size, how
> reads or writes are handled, dirty logging, and coalescing) from where it
> is mapped and whether it is enabled. This allows a device to configure
> a memory region once, then hand it off to its parent bus to map it 
> according
> to the bus configuration.
>
> Hierarchical registration also allows a device to compose a region out of
> a number of sub-regions with different properties; for example some may be
> RAM while others may be MMIO.
>
> Signed-off-by: Avi Kivity <avi@redhat.com>
> ---
> memory.h | 142 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 142 insertions(+), 0 deletions(-)
> create mode 100644 memory.h
>
> diff --git a/memory.h b/memory.h
> new file mode 100644
> index 0000000..77c5951
> --- /dev/null
> +++ b/memory.h
> @@ -0,0 +1,142 @@
> +#ifndef MEMORY_H
> +#define MEMORY_H
> +
> +#include <stdint.h>
> +#include <stdbool.h>

stdbool.h is already included in qemu-common.h,
stdint.h (indirectly) too.

Therefore both include statements can be removed.

> +#include "qemu-common.h"
> +#include "cpu-common.h"
> +#include "targphys.h"
> +#include "qemu-queue.h"
> +
> +typedef struct MemoryRegionOps MemoryRegionOps;
> +typedef struct MemoryRegion MemoryRegion;
> +
> +/*
> + * Memory region callbacks
> + */
> +struct MemoryRegionOps {
> + /* Read from the memory region. @addr is relative to @mr; @size is
> + * in bytes. */
> + uint64_t (*read)(MemoryRegion *mr,
> + target_phys_addr_t addr,
> + unsigned size);
> + /* Write to the memory region. @addr is relative to @mr; @size is
> + * in bytes. */
> + void (*write)(MemoryRegion *mr,
> + target_phys_addr_t addr,
> + uint64_t data,
> + unsigned size);
> + /* Guest-visible constraints: */
> + struct {
> + /* If nonzero, specify bounds on access sizes beyond which a machine
> + * check is thrown.
> + */
> + unsigned min_access_size;
> + unsigned max_access_size;
> + /* If true, unaligned accesses are supported. Otherwise unaligned
> + * accesses throw machine checks.
> + */
> + bool unaligned;
> + } valid;
> + /* Internal implementation constraints: */
> + struct {
> + /* If nonzero, specifies the minimum size implemented. Smaller sizes
> + * will be rounded upwards and a partial result will be returned.
> + */
> + unsigned min_access_size;
> + /* If nonzero, specifies the maximum size implemented. Larger sizes
> + * will be done as a series of accesses with smaller sizes.
> + */
> + unsigned max_access_size;
> + /* If true, unaligned accesses are supported. Otherwise all accesses
> + * are converted to (possibly multiple) naturally aligned accesses.
> + */
> + bool unaligned;
> + } impl;
> +};
> +
> +typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> +
> +struct CoalescedMemoryRange {
> + target_phys_addr_t start;
> + target_phys_addr_t size;
> + QTAILQ_ENTRY(coalesced_ranges) link;
> +};
> +
> +struct MemoryRegion {
> + /* All fields are private - violators will be prosecuted */

Is it possible to move this private declaration into the implementation
file (or a private header file if the declaration is needed by more than
one file)?

> + const MemoryRegionOps *ops;
> + MemoryRegion *parent;
> + target_phys_addr_t size;
> + target_phys_addr_t addr;
> + ram_addr_t ram_addr;
> + unsigned priority;
> + bool may_overlap;
> + QTAILQ_HEAD(subregions, MemoryRegion) subregions;
> + QTAILQ_ENTRY(subregions) subregions_link;
> + QTAILQ_HEAD(coalesced_ranges, CoalescedMemoryRange) coalesced;
> +};
> +
> +/* Initialize a memory region
> + *
> + * The region typically acts as a container for other memory regions.
> + */
> +void memory_region_init(MemoryRegion *mr,
> + target_phys_addr_t size);
> +/* Initialize an I/O memory region. Accesses into the region will be
> + * cause the callbacks in @ops to be called.
> + *
> + * if @size is nonzero, subregions will be clipped to @size.
> + */
> +void memory_region_init_io(MemoryRegion *mr,
> + const MemoryRegionOps *ops,
> + target_phys_addr_t size);
> +/* Initialize an I/O memory region. Accesses into the region will be
> + * modify memory directly.
> + */
> +void memory_region_init_ram(MemoryRegion *mr,
> + target_phys_addr_t size);
> +/* Initialize a RAM memory region. Accesses into the region will be
> + * modify memory in @ptr directly.
> + */
> +void memory_region_init_ram_ptr(MemoryRegion *mr,
> + target_phys_addr_t size,
> + void *ptr);
> +/* Destroy a memory region. The memory becomes inaccessible. */
> +void memory_region_destroy(MemoryRegion *mr);
> +/* Sets an offset to be added to MemoryRegionOps callbacks. */
> +void memory_region_set_offset(MemoryRegion *mr, target_phys_addr_t 
> offset);
> +/* Turn loggging on or off for specified client (display, migration) */
> +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
> +/* Enable memory coalescing for the region. MMIO ->write callbacks may be
> + * delayed until a non-coalesced MMIO is issued.
> + */
> +void memory_region_set_coalescing(MemoryRegion *mr);
> +/* Enable memory coalescing for a sub-range of the region. MMIO ->write
> + * callbacks may be delayed until a non-coalesced MMIO is issued.
> + */
> +void memory_region_add_coalescing(MemoryRegion *mr,
> + target_phys_addr_t offset,
> + target_phys_addr_t size);
> +/* Disable MMIO coalescing for the region. */
> +void memory_region_clear_coalescing(MemoryRegion *mr);
> +
> +/* Add a sub-region at @offset. The sub-region may not overlap with other
> + * subregions (except for those explicitly marked as overlapping)
> + */
> +void memory_region_add_subregion(MemoryRegion *mr,
> + target_phys_addr_t offset,
> + MemoryRegion *subregion);
> +/* Add a sub-region at @offset. The sun-region may overlap other 
> subregions;
> + * conflicts are resolved by having a higher @priority hide a lower 
> @priority.
> + * Subregions without priority are taken as @priority 0.
> + */
> +void memory_region_add_subregion_overlap(MemoryRegion *mr,
> + target_phys_addr_t offset,
> + MemoryRegion *subregion,
> + unsigned priority);
> +/* Remove a subregion. */
> +void memory_region_del_subregion(MemoryRegion *mr,
> + MemoryRegion *subregion);
> +
> +#endif

Kind regards,
Stefan W.
Stefan Hajnoczi May 19, 2011, 9:11 p.m. UTC | #5
On Thu, May 19, 2011 at 3:12 PM, Avi Kivity <avi@redhat.com> wrote:
> +struct MemoryRegion {
> +    /* All fields are private - violators will be prosecuted */
> +    const MemoryRegionOps *ops;
> +    MemoryRegion *parent;

In the case where a region is aliased (mapped twice into the address
space at different addresses) I need two MemoryRegions?  The
MemoryRegion describes an actual mapping in the <parent, addr,
ram_addr> tuple, not just the attributes of the region (ops, size,
...).

Stefan
Avi Kivity May 20, 2011, 9:18 a.m. UTC | #6
On 05/19/2011 10:07 PM, Alex Williamson wrote:
> On Thu, 2011-05-19 at 10:12 -0400, Avi Kivity wrote:
> >  The memory API separates the attributes of a memory region (its size, how
> >  reads or writes are handled, dirty logging, and coalescing) from where it
> >  is mapped and whether it is enabled.  This allows a device to configure
> >  a memory region once, then hand it off to its parent bus to map it according
> >  to the bus configuration.
> >
> >  Hierarchical registration also allows a device to compose a region out of
> >  a number of sub-regions with different properties; for example some may be
> >  RAM while others may be MMIO.
>
> >  +    /* Guest-visible constraints: */
> >  +    struct {
> >  +        /* If nonzero, specify bounds on access sizes beyond which a machine
> >  +         * check is thrown.
> >  +         */
> >  +        unsigned min_access_size;
> >  +        unsigned max_access_size;
>
> Do we always support all access sizes between min and max?

As far as I can tell, yes.

> This might
> be easier to describe as a bitmap of supported power of 2 access sizes.

This is uglier to initialize.  However we can provide #defines for 
common use (MEM_ACCESS_BYTE_TO_LONG, MEM_ACCESS_LONG).
Avi Kivity May 20, 2011, 9:20 a.m. UTC | #7
On 05/19/2011 10:27 PM, Jan Kiszka wrote:
> On 2011-05-19 16:12, Avi Kivity wrote:
> >  +/* Sets an offset to be added to MemoryRegionOps callbacks. */
> >  +void memory_region_set_offset(MemoryRegion *mr, target_phys_addr_t offset);
>
> Please mark this as a legacy helper, ideally to be removed after the
> complete conversion to this API. During that phase we should try to
> identify those devices which still depend on offset=0 and maybe directly
> fix them.

Okay.

> >  +/* Turn loggging on or off for specified client (display, migration) */
> >  +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
> >  +/* Enable memory coalescing for the region.  MMIO ->write callbacks may be
> >  + * delayed until a non-coalesced MMIO is issued.
> >  + */
> >  +void memory_region_set_coalescing(MemoryRegion *mr);
> >  +/* Enable memory coalescing for a sub-range of the region.  MMIO ->write
> >  + * callbacks may be delayed until a non-coalesced MMIO is issued.
> >  + */
> >  +void memory_region_add_coalescing(MemoryRegion *mr,
> >  +                                  target_phys_addr_t offset,
> >  +                                  target_phys_addr_t size);
>
> Will this be such a common use case that requesting the user to split up
> the region and then use set_coalescing will generate too much boiler
> plate code?

Look at e1000, coalescing ranges have byte granularity.

> >  +/* Disable MMIO coalescing for the region. */
> >  +void memory_region_clear_coalescing(MemoryRegion *mr);
>
> And what about clearing coalescing for sub-ranges?

Clear them all and rebuild.

> Maybe skip
> add_coalescing for the first run and see how far we get.

We get as far as e.
Avi Kivity May 20, 2011, 9:23 a.m. UTC | #8
On 05/19/2011 11:43 PM, Anthony Liguori wrote:
> On 05/19/2011 09:12 AM, Avi Kivity wrote:
>> The memory API separates the attributes of a memory region (its size, 
>> how
>> reads or writes are handled, dirty logging, and coalescing) from 
>> where it
>> is mapped and whether it is enabled.  This allows a device to configure
>> a memory region once, then hand it off to its parent bus to map it 
>> according
>> to the bus configuration.
>>
>> Hierarchical registration also allows a device to compose a region 
>> out of
>> a number of sub-regions with different properties; for example some 
>> may be
>> RAM while others may be MMIO.
>>
>> +    struct {
>> +        /* If nonzero, specify bounds on access sizes beyond which a 
>> machine
>> +         * check is thrown.
>> +         */
>> +        unsigned min_access_size;
>> +        unsigned max_access_size;
>> +        /* If true, unaligned accesses are supported.  Otherwise 
>> unaligned
>> +         * accesses throw machine checks.
>> +         */
>> +         bool unaligned;
>> +    } valid;
>
> Under what circumstances would this be used?
>
> The behavior of devices that receive non-natural accesses varies wildly.
>
> For PCI devices, invalid accesses almost always return ~0.  I can't 
> think of a device where an MCE would occur.

This was requested by Richard, so I'll let him comment.
Avi Kivity May 20, 2011, 9:26 a.m. UTC | #9
On 05/20/2011 12:04 AM, Stefan Weil wrote:
> Am 19.05.2011 16:12, schrieb Avi Kivity:
>> The memory API separates the attributes of a memory region (its size, 
>> how
>> reads or writes are handled, dirty logging, and coalescing) from 
>> where it
>> is mapped and whether it is enabled. This allows a device to configure
>> a memory region once, then hand it off to its parent bus to map it 
>> according
>> to the bus configuration.
>>
>> Hierarchical registration also allows a device to compose a region 
>> out of
>> a number of sub-regions with different properties; for example some 
>> may be
>> RAM while others may be MMIO.

>> --- /dev/null
>> +++ b/memory.h
>> @@ -0,0 +1,142 @@
>> +#ifndef MEMORY_H
>> +#define MEMORY_H
>> +
>> +#include <stdint.h>
>> +#include <stdbool.h>
>
> stdbool.h is already included in qemu-common.h,
> stdint.h (indirectly) too.
>
> Therefore both include statements can be removed.

We shouldn't rely on indirect includes, it makes updating headers very 
hard.  Each header should #include what it directly needs and no more.

>> +typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>> +
>> +struct CoalescedMemoryRange {
>> + target_phys_addr_t start;
>> + target_phys_addr_t size;
>> + QTAILQ_ENTRY(coalesced_ranges) link;
>> +};
>> +
>> +struct MemoryRegion {
>> + /* All fields are private - violators will be prosecuted */
>
> Is it possible to move this private declaration into the implementation
> file (or a private header file if the declaration is needed by more than
> one file)?


No, the structure size is needed by clients.
Avi Kivity May 20, 2011, 9:28 a.m. UTC | #10
On 05/20/2011 12:11 AM, Stefan Hajnoczi wrote:
> On Thu, May 19, 2011 at 3:12 PM, Avi Kivity<avi@redhat.com>  wrote:
> >  +struct MemoryRegion {
> >  +    /* All fields are private - violators will be prosecuted */
> >  +    const MemoryRegionOps *ops;
> >  +    MemoryRegion *parent;
>
> In the case where a region is aliased (mapped twice into the address
> space at different addresses) I need two MemoryRegions?

Yes.

> The
> MemoryRegion describes an actual mapping in the<parent, addr,
> ram_addr>  tuple, not just the attributes of the region (ops, size,
> ...).

Correct.  The region is not just a read-only descriptor.  
memory_region_add_subregion() can be used only once on a region (unless 
you _del_subregion() in between).

(it also follows from the fact that there is no separate opaque for 
registration, and from the fact that RAM is owned by the region, not 
provided as part of registration).
Richard Henderson May 20, 2011, 2:06 p.m. UTC | #11
On 05/20/2011 02:23 AM, Avi Kivity wrote:
> On 05/19/2011 11:43 PM, Anthony Liguori wrote:
>> On 05/19/2011 09:12 AM, Avi Kivity wrote:
>>> The memory API separates the attributes of a memory region (its size, how
>>> reads or writes are handled, dirty logging, and coalescing) from where it
>>> is mapped and whether it is enabled.  This allows a device to configure
>>> a memory region once, then hand it off to its parent bus to map it according
>>> to the bus configuration.
>>>
>>> Hierarchical registration also allows a device to compose a region out of
>>> a number of sub-regions with different properties; for example some may be
>>> RAM while others may be MMIO.
>>>
>>> +    struct {
>>> +        /* If nonzero, specify bounds on access sizes beyond which a machine
>>> +         * check is thrown.
>>> +         */
>>> +        unsigned min_access_size;
>>> +        unsigned max_access_size;
>>> +        /* If true, unaligned accesses are supported.  Otherwise unaligned
>>> +         * accesses throw machine checks.
>>> +         */
>>> +         bool unaligned;
>>> +    } valid;
>>
>> Under what circumstances would this be used?
>>
>> The behavior of devices that receive non-natural accesses varies wildly.
>>
>> For PCI devices, invalid accesses almost always return ~0.  I can't think of a device where an MCE would occur.
> 
> This was requested by Richard, so I'll let him comment.
> 

Several alpha system chips MCE when accessed with incorrect sizes.
E.g. only 64-bit accesses are allowed.

Is this structure honestly any better than 4 function pointers?
I can't see that it is, myself.


r~
Anthony Liguori May 20, 2011, 2:31 p.m. UTC | #12
On 05/20/2011 09:06 AM, Richard Henderson wrote:
> On 05/20/2011 02:23 AM, Avi Kivity wrote:
>> On 05/19/2011 11:43 PM, Anthony Liguori wrote:
>>> On 05/19/2011 09:12 AM, Avi Kivity wrote:
>>>> The memory API separates the attributes of a memory region (its size, how
>>>> reads or writes are handled, dirty logging, and coalescing) from where it
>>>> is mapped and whether it is enabled.  This allows a device to configure
>>>> a memory region once, then hand it off to its parent bus to map it according
>>>> to the bus configuration.
>>>>
>>>> Hierarchical registration also allows a device to compose a region out of
>>>> a number of sub-regions with different properties; for example some may be
>>>> RAM while others may be MMIO.
>>>>
>>>> +    struct {
>>>> +        /* If nonzero, specify bounds on access sizes beyond which a machine
>>>> +         * check is thrown.
>>>> +         */
>>>> +        unsigned min_access_size;
>>>> +        unsigned max_access_size;
>>>> +        /* If true, unaligned accesses are supported.  Otherwise unaligned
>>>> +         * accesses throw machine checks.
>>>> +         */
>>>> +         bool unaligned;
>>>> +    } valid;
>>>
>>> Under what circumstances would this be used?
>>>
>>> The behavior of devices that receive non-natural accesses varies wildly.
>>>
>>> For PCI devices, invalid accesses almost always return ~0.  I can't think of a device where an MCE would occur.
>>
>> This was requested by Richard, so I'll let him comment.
>>
>
> Several alpha system chips MCE when accessed with incorrect sizes.
> E.g. only 64-bit accesses are allowed.

But is this a characteristic of devices or is this a characteristic of 
the chipset/CPU?

At any rate, I'm fairly sure it doesn't belong in the MemoryRegion 
structure.

Regards,

Anthony Liguori
Richard Henderson May 20, 2011, 2:40 p.m. UTC | #13
On 05/20/2011 07:31 AM, Anthony Liguori wrote:
> But is this a characteristic of devices or is this a characteristic of the chipset/CPU?

Chipset.


r~
Anthony Liguori May 20, 2011, 2:46 p.m. UTC | #14
On 05/20/2011 09:40 AM, Richard Henderson wrote:
> On 05/20/2011 07:31 AM, Anthony Liguori wrote:
>> But is this a characteristic of devices or is this a characteristic of the chipset/CPU?
>
> Chipset.

So if the chipset only allows accesses that are 64-bit, then you'll want 
to have hierarchical dispatch filter non 64-bit accesses and raise an 
MCE appropriately.

So you don't need anything in MemoryRegion, you need code in the 
dispatch path.

Regards,

Anthony Liguori

>
> r~
>
Blue Swirl May 20, 2011, 5:59 p.m. UTC | #15
On Thu, May 19, 2011 at 5:12 PM, Avi Kivity <avi@redhat.com> wrote:
> The memory API separates the attributes of a memory region (its size, how
> reads or writes are handled, dirty logging, and coalescing) from where it
> is mapped and whether it is enabled.  This allows a device to configure
> a memory region once, then hand it off to its parent bus to map it according
> to the bus configuration.
>
> Hierarchical registration also allows a device to compose a region out of
> a number of sub-regions with different properties; for example some may be
> RAM while others may be MMIO.
>
> Signed-off-by: Avi Kivity <avi@redhat.com>
> ---
>  memory.h |  142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 142 insertions(+), 0 deletions(-)
>  create mode 100644 memory.h
>
> diff --git a/memory.h b/memory.h
> new file mode 100644
> index 0000000..77c5951
> --- /dev/null
> +++ b/memory.h
> @@ -0,0 +1,142 @@
> +#ifndef MEMORY_H
> +#define MEMORY_H
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include "qemu-common.h"
> +#include "cpu-common.h"
> +#include "targphys.h"
> +#include "qemu-queue.h"
> +
> +typedef struct MemoryRegionOps MemoryRegionOps;
> +typedef struct MemoryRegion MemoryRegion;
> +
> +/*
> + * Memory region callbacks
> + */
> +struct MemoryRegionOps {
> +    /* Read from the memory region. @addr is relative to @mr; @size is
> +     * in bytes. */
> +    uint64_t (*read)(MemoryRegion *mr,
> +                     target_phys_addr_t addr,
> +                     unsigned size);
> +    /* Write to the memory region. @addr is relative to @mr; @size is
> +     * in bytes. */
> +    void (*write)(MemoryRegion *mr,
> +                  target_phys_addr_t addr,
> +                  uint64_t data,
> +                  unsigned size);
> +    /* Guest-visible constraints: */
> +    struct {
> +        /* If nonzero, specify bounds on access sizes beyond which a machine
> +         * check is thrown.
> +         */
> +        unsigned min_access_size;
> +        unsigned max_access_size;
> +        /* If true, unaligned accesses are supported.  Otherwise unaligned
> +         * accesses throw machine checks.
> +         */
> +         bool unaligned;
> +    } valid;
> +    /* Internal implementation constraints: */
> +    struct {
> +        /* If nonzero, specifies the minimum size implemented.  Smaller sizes
> +         * will be rounded upwards and a partial result will be returned.
> +         */
> +        unsigned min_access_size;
> +        /* If nonzero, specifies the maximum size implemented.  Larger sizes
> +         * will be done as a series of accesses with smaller sizes.
> +         */
> +        unsigned max_access_size;
> +        /* If true, unaligned accesses are supported.  Otherwise all accesses
> +         * are converted to (possibly multiple) naturally aligned accesses.
> +         */
> +         bool unaligned;
> +    } impl;
> +};
> +
> +typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> +
> +struct CoalescedMemoryRange {
> +    target_phys_addr_t start;
> +    target_phys_addr_t size;
> +    QTAILQ_ENTRY(coalesced_ranges) link;
> +};
> +
> +struct MemoryRegion {
> +    /* All fields are private - violators will be prosecuted */
> +    const MemoryRegionOps *ops;
> +    MemoryRegion *parent;
> +    target_phys_addr_t size;
> +    target_phys_addr_t addr;
> +    ram_addr_t ram_addr;
> +    unsigned priority;
> +    bool may_overlap;
> +    QTAILQ_HEAD(subregions, MemoryRegion) subregions;
> +    QTAILQ_ENTRY(subregions) subregions_link;
> +    QTAILQ_HEAD(coalesced_ranges, CoalescedMemoryRange) coalesced;
> +};
> +
> +/* Initialize a memory region
> + *
> + * The region typically acts as a container for other memory regions.
> + */
> +void memory_region_init(MemoryRegion *mr,
> +                        target_phys_addr_t size);
> +/* Initialize an I/O memory region.  Accesses into the region will be
> + * cause the callbacks in @ops to be called.
> + *
> + * if @size is nonzero, subregions will be clipped to @size.
> + */
> +void memory_region_init_io(MemoryRegion *mr,
> +                           const MemoryRegionOps *ops,
> +                           target_phys_addr_t size);
> +/* Initialize an I/O memory region.  Accesses into the region will be
> + * modify memory directly.
> + */
> +void memory_region_init_ram(MemoryRegion *mr,
> +                            target_phys_addr_t size);
> +/* Initialize a RAM memory region.  Accesses into the region will be
> + * modify memory in @ptr directly.
> + */
> +void memory_region_init_ram_ptr(MemoryRegion *mr,
> +                                target_phys_addr_t size,
> +                                void *ptr);
> +/* Destroy a memory region.  The memory becomes inaccessible. */
> +void memory_region_destroy(MemoryRegion *mr);

Doesn't the lower priority region become accessible instead in some cases?

> +/* Sets an offset to be added to MemoryRegionOps callbacks. */
> +void memory_region_set_offset(MemoryRegion *mr, target_phys_addr_t offset);
> +/* Turn loggging on or off for specified client (display, migration) */

g--

> +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
> +/* Enable memory coalescing for the region.  MMIO ->write callbacks may be
> + * delayed until a non-coalesced MMIO is issued.
> + */
> +void memory_region_set_coalescing(MemoryRegion *mr);
> +/* Enable memory coalescing for a sub-range of the region.  MMIO ->write
> + * callbacks may be delayed until a non-coalesced MMIO is issued.
> + */
> +void memory_region_add_coalescing(MemoryRegion *mr,
> +                                  target_phys_addr_t offset,
> +                                  target_phys_addr_t size);
> +/* Disable MMIO coalescing for the region. */
> +void memory_region_clear_coalescing(MemoryRegion *mr);

Perhaps the interface could be more generic, like
+void memory_region_set_property(MemoryRegion *mr, unsigned flags);
+void memory_region_clear_property(MemoryRegion *mr, unsigned flags);

> +
> +/* Add a sub-region at @offset.  The sub-region may not overlap with other
> + * subregions (except for those explicitly marked as overlapping)
> + */
> +void memory_region_add_subregion(MemoryRegion *mr,
> +                                 target_phys_addr_t offset,
> +                                 MemoryRegion *subregion);
> +/* Add a sub-region at @offset.  The sun-region may overlap other subregions;

Sunny regions?

> + * conflicts are resolved by having a higher @priority hide a lower @priority.
> + * Subregions without priority are taken as @priority 0.
> + */
> +void memory_region_add_subregion_overlap(MemoryRegion *mr,
> +                                         target_phys_addr_t offset,
> +                                         MemoryRegion *subregion,
> +                                         unsigned priority);
> +/* Remove a subregion. */
> +void memory_region_del_subregion(MemoryRegion *mr,
> +                                 MemoryRegion *subregion);

What would the subregions be used for?
Blue Swirl May 20, 2011, 6:16 p.m. UTC | #16
On Fri, May 20, 2011 at 5:46 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 05/20/2011 09:40 AM, Richard Henderson wrote:
>>
>> On 05/20/2011 07:31 AM, Anthony Liguori wrote:
>>>
>>> But is this a characteristic of devices or is this a characteristic of
>>> the chipset/CPU?
>>
>> Chipset.
>
> So if the chipset only allows accesses that are 64-bit, then you'll want to
> have hierarchical dispatch filter non 64-bit accesses and raise an MCE
> appropriately.
>
> So you don't need anything in MemoryRegion, you need code in the dispatch
> path.

Sparc (32/64) systems are also very picky about wrong sized accesses.
I think the buses have lines for access size and the device can (and
they will) signal an error in these cases. Then the bus controller
will raise an NMI.

I think the easiest way to handle this could be to use overlapping
registrations for specific sizes. Then we could make a default error
generator device, which would just signal NMI/MCE on any access. It
would register for all of the picky area with lowest possible
priority. Other devices would register the small working access areas
with no knowledge about this error generator device. Any correct
access should go to other devices, bad accesses to the error
generator.

Though this would not be very different from current unassigned access handling.
Avi Kivity May 22, 2011, 6:38 a.m. UTC | #17
On 05/20/2011 05:31 PM, Anthony Liguori wrote:
>> Several alpha system chips MCE when accessed with incorrect sizes.
>> E.g. only 64-bit accesses are allowed.
>
>
> But is this a characteristic of devices or is this a characteristic of 
> the chipset/CPU?

The chipset is modelled by a MemoryRegion too.

>
> At any rate, I'm fairly sure it doesn't belong in the MemoryRegion 
> structure.
>

Since it isn't a global property, where does it belong?
Avi Kivity May 22, 2011, 6:39 a.m. UTC | #18
On 05/20/2011 05:46 PM, Anthony Liguori wrote:
> On 05/20/2011 09:40 AM, Richard Henderson wrote:
>> On 05/20/2011 07:31 AM, Anthony Liguori wrote:
>>> But is this a characteristic of devices or is this a characteristic 
>>> of the chipset/CPU?
>>
>> Chipset.
>
> So if the chipset only allows accesses that are 64-bit, then you'll 
> want to have hierarchical dispatch filter non 64-bit accesses and 
> raise an MCE appropriately.
>
> So you don't need anything in MemoryRegion, you need code in the 
> dispatch path.

MemoryRegion *is* the dispatch path.  Only done declaratively so we can 
flatten it whenever it changes.
Avi Kivity May 22, 2011, 6:40 a.m. UTC | #19
On 05/20/2011 09:16 PM, Blue Swirl wrote:
> On Fri, May 20, 2011 at 5:46 PM, Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >  On 05/20/2011 09:40 AM, Richard Henderson wrote:
> >>
> >>  On 05/20/2011 07:31 AM, Anthony Liguori wrote:
> >>>
> >>>  But is this a characteristic of devices or is this a characteristic of
> >>>  the chipset/CPU?
> >>
> >>  Chipset.
> >
> >  So if the chipset only allows accesses that are 64-bit, then you'll want to
> >  have hierarchical dispatch filter non 64-bit accesses and raise an MCE
> >  appropriately.
> >
> >  So you don't need anything in MemoryRegion, you need code in the dispatch
> >  path.
>
> Sparc (32/64) systems are also very picky about wrong sized accesses.
> I think the buses have lines for access size and the device can (and
> they will) signal an error in these cases. Then the bus controller
> will raise an NMI.
>
> I think the easiest way to handle this could be to use overlapping
> registrations for specific sizes. Then we could make a default error
> generator device, which would just signal NMI/MCE on any access. It
> would register for all of the picky area with lowest possible
> priority. Other devices would register the small working access areas
> with no knowledge about this error generator device. Any correct
> access should go to other devices, bad accesses to the error
> generator.
>
> Though this would not be very different from current unassigned access handling.

The MemoryRegion way of doing it would be to register the MemoryRegion 
of the bus with the picky attributes.  Any sub-regions will inherit the 
property.
Avi Kivity May 22, 2011, 6:45 a.m. UTC | #20
On 05/20/2011 08:59 PM, Blue Swirl wrote:
> On Thu, May 19, 2011 at 5:12 PM, Avi Kivity<avi@redhat.com>  wrote:
> >  The memory API separates the attributes of a memory region (its size, how
> >  reads or writes are handled, dirty logging, and coalescing) from where it
> >  is mapped and whether it is enabled.  This allows a device to configure
> >  a memory region once, then hand it off to its parent bus to map it according
> >  to the bus configuration.
> >
> >  Hierarchical registration also allows a device to compose a region out of
> >  a number of sub-regions with different properties; for example some may be
> >  RAM while others may be MMIO.

> >  +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
> >  +/* Enable memory coalescing for the region.  MMIO ->write callbacks may be
> >  + * delayed until a non-coalesced MMIO is issued.
> >  + */
> >  +void memory_region_set_coalescing(MemoryRegion *mr);
> >  +/* Enable memory coalescing for a sub-range of the region.  MMIO ->write
> >  + * callbacks may be delayed until a non-coalesced MMIO is issued.
> >  + */
> >  +void memory_region_add_coalescing(MemoryRegion *mr,
> >  +                                  target_phys_addr_t offset,
> >  +                                  target_phys_addr_t size);
> >  +/* Disable MMIO coalescing for the region. */
> >  +void memory_region_clear_coalescing(MemoryRegion *mr);
>
> Perhaps the interface could be more generic, like
> +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
> +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);
>

Coalescing is a complex property, not just a boolean attribute.  We 
probably will have a number of boolean attributes later, though.

> >  + * conflicts are resolved by having a higher @priority hide a lower @priority.
> >  + * Subregions without priority are taken as @priority 0.
> >  + */
> >  +void memory_region_add_subregion_overlap(MemoryRegion *mr,
> >  +                                         target_phys_addr_t offset,
> >  +                                         MemoryRegion *subregion,
> >  +                                         unsigned priority);
> >  +/* Remove a subregion. */
> >  +void memory_region_del_subregion(MemoryRegion *mr,
> >  +                                 MemoryRegion *subregion);
>
> What would the subregions be used for?

Subregions describe the flow of data through the memory bus.  We'd have 
a subregion for the PCI bus, with its own subregions for various BARs, 
with some having subregions for dispatching different MMIO types within 
the BAR.

This allows, for example, the PCI layer to move a BAR without the PCI 
device knowing anything about it.
Avi Kivity May 22, 2011, 7:01 a.m. UTC | #21
On 05/20/2011 05:06 PM, Richard Henderson wrote:
> Is this structure honestly any better than 4 function pointers?
> I can't see that it is, myself.
>

That was requested by Anthony.  And in fact we have two bits of 
information per access size, one is whether the access is allowed or 
not, the other is whether we want to use a larger or smaller access size 
function (useful for auto-splitting a 64-bit access into two 32-bit 
accesses for example).
Avi Kivity May 22, 2011, 7:04 a.m. UTC | #22
On 05/20/2011 08:59 PM, Blue Swirl wrote:
> On Thu, May 19, 2011 at 5:12 PM, Avi Kivity<avi@redhat.com>  wrote:
> >  The memory API separates the attributes of a memory region (its size, how
> >  reads or writes are handled, dirty logging, and coalescing) from where it
> >  is mapped and whether it is enabled.  This allows a device to configure
> >  a memory region once, then hand it off to its parent bus to map it according
> >  to the bus configuration.
> >
> >  Hierarchical registration also allows a device to compose a region out of
> >  a number of sub-regions with different properties; for example some may be
> >  RAM while others may be MMIO.
> >
> >  +/* Destroy a memory region.  The memory becomes inaccessible. */
> >  +void memory_region_destroy(MemoryRegion *mr);
>
> Doesn't the lower priority region become accessible instead in some cases?

If we require that _add_subregion() and _del_subregion() be paired, then no.
Blue Swirl May 22, 2011, 9:32 a.m. UTC | #23
On Sun, May 22, 2011 at 9:45 AM, Avi Kivity <avi@redhat.com> wrote:
> On 05/20/2011 08:59 PM, Blue Swirl wrote:
>>
>> On Thu, May 19, 2011 at 5:12 PM, Avi Kivity<avi@redhat.com>  wrote:
>> >  The memory API separates the attributes of a memory region (its size,
>> > how
>> >  reads or writes are handled, dirty logging, and coalescing) from where
>> > it
>> >  is mapped and whether it is enabled.  This allows a device to configure
>> >  a memory region once, then hand it off to its parent bus to map it
>> > according
>> >  to the bus configuration.
>> >
>> >  Hierarchical registration also allows a device to compose a region out
>> > of
>> >  a number of sub-regions with different properties; for example some may
>> > be
>> >  RAM while others may be MMIO.
>
>> >  +void memory_region_set_log(MemoryRegion *mr, bool log, unsigned
>> > client);
>> >  +/* Enable memory coalescing for the region.  MMIO ->write callbacks
>> > may be
>> >  + * delayed until a non-coalesced MMIO is issued.
>> >  + */
>> >  +void memory_region_set_coalescing(MemoryRegion *mr);
>> >  +/* Enable memory coalescing for a sub-range of the region.  MMIO
>> > ->write
>> >  + * callbacks may be delayed until a non-coalesced MMIO is issued.
>> >  + */
>> >  +void memory_region_add_coalescing(MemoryRegion *mr,
>> >  +                                  target_phys_addr_t offset,
>> >  +                                  target_phys_addr_t size);
>> >  +/* Disable MMIO coalescing for the region. */
>> >  +void memory_region_clear_coalescing(MemoryRegion *mr);
>>
>> Perhaps the interface could be more generic, like
>> +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
>> +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);
>>
>
> Coalescing is a complex property, not just a boolean attribute.  We probably
> will have a number of boolean attributes later, though.

But what is the difference between adding coalescing to an area and
setting the bit property 'coalescing' to an area? At least what you
propose now is not so complex that it couldn't be handled as a single
bit.

>> >  + * conflicts are resolved by having a higher @priority hide a lower
>> > @priority.
>> >  + * Subregions without priority are taken as @priority 0.
>> >  + */
>> >  +void memory_region_add_subregion_overlap(MemoryRegion *mr,
>> >  +                                         target_phys_addr_t offset,
>> >  +                                         MemoryRegion *subregion,
>> >  +                                         unsigned priority);
>> >  +/* Remove a subregion. */
>> >  +void memory_region_del_subregion(MemoryRegion *mr,
>> >  +                                 MemoryRegion *subregion);
>>
>> What would the subregions be used for?
>
> Subregions describe the flow of data through the memory bus.  We'd have a
> subregion for the PCI bus, with its own subregions for various BARs, with
> some having subregions for dispatching different MMIO types within the BAR.
>
> This allows, for example, the PCI layer to move a BAR without the PCI device
> knowing anything about it.

But why can't a first class region be used for that?
Avi Kivity May 22, 2011, 11:36 a.m. UTC | #24
On 05/22/2011 12:32 PM, Blue Swirl wrote:
> >>  >    +void memory_region_add_coalescing(MemoryRegion *mr,
> >>  >    +                                  target_phys_addr_t offset,
> >>  >    +                                  target_phys_addr_t size);
> >>  >    +/* Disable MMIO coalescing for the region. */
> >>  >    +void memory_region_clear_coalescing(MemoryRegion *mr);
> >>
> >>  Perhaps the interface could be more generic, like
> >>  +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
> >>  +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);
> >>
> >
> >  Coalescing is a complex property, not just a boolean attribute.  We probably
> >  will have a number of boolean attributes later, though.
>
> But what is the difference between adding coalescing to an area and
> setting the bit property 'coalescing' to an area? At least what you
> propose now is not so complex that it couldn't be handled as a single
> bit.

Look at the API - add_coalescing() sets the coalescing property on a 
subrange of the memory region, not the entire region.

(motivation - hw/e1000.c).

> >>  >    + * conflicts are resolved by having a higher @priority hide a lower
> >>  >  @priority.
> >>  >    + * Subregions without priority are taken as @priority 0.
> >>  >    + */
> >>  >    +void memory_region_add_subregion_overlap(MemoryRegion *mr,
> >>  >    +                                         target_phys_addr_t offset,
> >>  >    +                                         MemoryRegion *subregion,
> >>  >    +                                         unsigned priority);
> >>  >    +/* Remove a subregion. */
> >>  >    +void memory_region_del_subregion(MemoryRegion *mr,
> >>  >    +                                 MemoryRegion *subregion);
> >>
> >>  What would the subregions be used for?
> >
> >  Subregions describe the flow of data through the memory bus.  We'd have a
> >  subregion for the PCI bus, with its own subregions for various BARs, with
> >  some having subregions for dispatching different MMIO types within the BAR.
> >
> >  This allows, for example, the PCI layer to move a BAR without the PCI device
> >  knowing anything about it.
>
> But why can't a first class region be used for that?

Subregions are first-class regions.  In fact all regions are subregions 
except the root.

It's a tree of regions, each level adding an offset, clipping, and 
perhaps other attributes, with the leaves providing actual memory (mmio 
or RAM).
Blue Swirl May 22, 2011, 12:06 p.m. UTC | #25
On Sun, May 22, 2011 at 2:36 PM, Avi Kivity <avi@redhat.com> wrote:
> On 05/22/2011 12:32 PM, Blue Swirl wrote:
>>
>> >>  >    +void memory_region_add_coalescing(MemoryRegion *mr,
>> >>  >    +                                  target_phys_addr_t offset,
>> >>  >    +                                  target_phys_addr_t size);
>> >>  >    +/* Disable MMIO coalescing for the region. */
>> >>  >    +void memory_region_clear_coalescing(MemoryRegion *mr);
>> >>
>> >>  Perhaps the interface could be more generic, like
>> >>  +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
>> >>  +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);
>> >>
>> >
>> >  Coalescing is a complex property, not just a boolean attribute.  We
>> > probably
>> >  will have a number of boolean attributes later, though.
>>
>> But what is the difference between adding coalescing to an area and
>> setting the bit property 'coalescing' to an area? At least what you
>> propose now is not so complex that it couldn't be handled as a single
>> bit.
>
> Look at the API - add_coalescing() sets the coalescing property on a
> subrange of the memory region, not the entire region.

Right, but doesn't the same apply to any other properties, they may
apply to a full range or just a subrange?

> (motivation - hw/e1000.c).
>
>> >>  >    + * conflicts are resolved by having a higher @priority hide a
>> >> lower
>> >>  >  @priority.
>> >>  >    + * Subregions without priority are taken as @priority 0.
>> >>  >    + */
>> >>  >    +void memory_region_add_subregion_overlap(MemoryRegion *mr,
>> >>  >    +                                         target_phys_addr_t
>> >> offset,
>> >>  >    +                                         MemoryRegion
>> >> *subregion,
>> >>  >    +                                         unsigned priority);
>> >>  >    +/* Remove a subregion. */
>> >>  >    +void memory_region_del_subregion(MemoryRegion *mr,
>> >>  >    +                                 MemoryRegion *subregion);
>> >>
>> >>  What would the subregions be used for?
>> >
>> >  Subregions describe the flow of data through the memory bus.  We'd have
>> > a
>> >  subregion for the PCI bus, with its own subregions for various BARs,
>> > with
>> >  some having subregions for dispatching different MMIO types within the
>> > BAR.
>> >
>> >  This allows, for example, the PCI layer to move a BAR without the PCI
>> > device
>> >  knowing anything about it.
>>
>> But why can't a first class region be used for that?
>
> Subregions are first-class regions.  In fact all regions are subregions
> except the root.

Oh, I see now. Maybe the comments should describe this. Or perhaps the
terms should be something like 'bus/bridge/root' and 'region' instead
of 'region' and 'subregion'?

> It's a tree of regions, each level adding an offset, clipping, and perhaps
> other attributes, with the leaves providing actual memory (mmio or RAM).

I thought that there are two classes of regions, like PCI device vs. a
single BAR.
Avi Kivity May 22, 2011, 12:18 p.m. UTC | #26
On 05/22/2011 03:06 PM, Blue Swirl wrote:
> On Sun, May 22, 2011 at 2:36 PM, Avi Kivity<avi@redhat.com>  wrote:
> >  On 05/22/2011 12:32 PM, Blue Swirl wrote:
> >>
> >>  >>    >      +void memory_region_add_coalescing(MemoryRegion *mr,
> >>  >>    >      +                                  target_phys_addr_t offset,
> >>  >>    >      +                                  target_phys_addr_t size);
> >>  >>    >      +/* Disable MMIO coalescing for the region. */
> >>  >>    >      +void memory_region_clear_coalescing(MemoryRegion *mr);
> >>  >>
> >>  >>    Perhaps the interface could be more generic, like
> >>  >>    +void memory_region_set_property(MemoryRegion *mr, unsigned flags);
> >>  >>    +void memory_region_clear_property(MemoryRegion *mr, unsigned flags);
> >>  >>
> >>  >
> >>  >    Coalescing is a complex property, not just a boolean attribute.  We
> >>  >  probably
> >>  >    will have a number of boolean attributes later, though.
> >>
> >>  But what is the difference between adding coalescing to an area and
> >>  setting the bit property 'coalescing' to an area? At least what you
> >>  propose now is not so complex that it couldn't be handled as a single
> >>  bit.
> >
> >  Look at the API - add_coalescing() sets the coalescing property on a
> >  subrange of the memory region, not the entire region.
>
> Right, but doesn't the same apply to any other properties, they may
> apply to a full range or just a subrange?

We'll know when we have more properties.  I expect most will be region-wide.


> >
> >  Subregions are first-class regions.  In fact all regions are subregions
> >  except the root.
>
> Oh, I see now. Maybe the comments should describe this. Or perhaps the
> terms should be something like 'bus/bridge/root' and 'region' instead
> of 'region' and 'subregion'?

Problem is, memory_region_add_subregion() adds both sub-bridges and leaf 
regions.

It's quite possible that BAR 0 will be a leaf region, and BAR 1 will be 
a sub-bridge.

Can you suggest an alternative naming for the API?


> >  It's a tree of regions, each level adding an offset, clipping, and perhaps
> >  other attributes, with the leaves providing actual memory (mmio or RAM).
>
> I thought that there are two classes of regions, like PCI device vs. a
> single BAR.

It's true in a way, except the mapping is not 1:1.
Blue Swirl May 22, 2011, 3:32 p.m. UTC | #27
On Sun, May 22, 2011 at 3:18 PM, Avi Kivity <avi@redhat.com> wrote:
> On 05/22/2011 03:06 PM, Blue Swirl wrote:
>>
>> On Sun, May 22, 2011 at 2:36 PM, Avi Kivity<avi@redhat.com>  wrote:
>> >  On 05/22/2011 12:32 PM, Blue Swirl wrote:
>> >>
>> >>  >>    >      +void memory_region_add_coalescing(MemoryRegion *mr,
>> >>  >>    >      +                                  target_phys_addr_t
>> >> offset,
>> >>  >>    >      +                                  target_phys_addr_t
>> >> size);
>> >>  >>    >      +/* Disable MMIO coalescing for the region. */
>> >>  >>    >      +void memory_region_clear_coalescing(MemoryRegion *mr);
>> >>  >>
>> >>  >>    Perhaps the interface could be more generic, like
>> >>  >>    +void memory_region_set_property(MemoryRegion *mr, unsigned
>> >> flags);
>> >>  >>    +void memory_region_clear_property(MemoryRegion *mr, unsigned
>> >> flags);
>> >>  >>
>> >>  >
>> >>  >    Coalescing is a complex property, not just a boolean attribute.
>> >>  We
>> >>  >  probably
>> >>  >    will have a number of boolean attributes later, though.
>> >>
>> >>  But what is the difference between adding coalescing to an area and
>> >>  setting the bit property 'coalescing' to an area? At least what you
>> >>  propose now is not so complex that it couldn't be handled as a single
>> >>  bit.
>> >
>> >  Look at the API - add_coalescing() sets the coalescing property on a
>> >  subrange of the memory region, not the entire region.
>>
>> Right, but doesn't the same apply to any other properties, they may
>> apply to a full range or just a subrange?
>
> We'll know when we have more properties.  I expect most will be region-wide.

Since we don't know about those yet, coalescing API could be like you
propose. Later it can be changed to the property API, or leave for
convenience.

>> >
>> >  Subregions are first-class regions.  In fact all regions are subregions
>> >  except the root.
>>
>> Oh, I see now. Maybe the comments should describe this. Or perhaps the
>> terms should be something like 'bus/bridge/root' and 'region' instead
>> of 'region' and 'subregion'?
>
> Problem is, memory_region_add_subregion() adds both sub-bridges and leaf
> regions.
>
> It's quite possible that BAR 0 will be a leaf region, and BAR 1 will be a
> sub-bridge.
>
> Can you suggest an alternative naming for the API?

How about
memory_region_container_init()
memory_region_add()
Avi Kivity May 22, 2011, 3:36 p.m. UTC | #28
On 05/22/2011 06:32 PM, Blue Swirl wrote:
> >
> >  Can you suggest an alternative naming for the API?
>
> How about
> memory_region_container_init()
> memory_region_add()

I'm neutral.  If someone seconds this, I'll make it so.
Anthony Liguori May 22, 2011, 3:44 p.m. UTC | #29
On 05/22/2011 01:38 AM, Avi Kivity wrote:
> On 05/20/2011 05:31 PM, Anthony Liguori wrote:
>>> Several alpha system chips MCE when accessed with incorrect sizes.
>>> E.g. only 64-bit accesses are allowed.
>>
>>
>> But is this a characteristic of devices or is this a characteristic of
>> the chipset/CPU?
>
> The chipset is modelled by a MemoryRegion too.
>
>>
>> At any rate, I'm fairly sure it doesn't belong in the MemoryRegion
>> structure.
>>
>
> Since it isn't a global property, where does it belong?

The chipset should have an intercept in the dispatch path that enforces 
this (this assumes hierarchical dispatch).

Regards,

Anthony Liguori

>
Anthony Liguori May 22, 2011, 3:46 p.m. UTC | #30
On 05/22/2011 01:39 AM, Avi Kivity wrote:
> On 05/20/2011 05:46 PM, Anthony Liguori wrote:
>> On 05/20/2011 09:40 AM, Richard Henderson wrote:
>>> On 05/20/2011 07:31 AM, Anthony Liguori wrote:
>>>> But is this a characteristic of devices or is this a characteristic
>>>> of the chipset/CPU?
>>>
>>> Chipset.
>>
>> So if the chipset only allows accesses that are 64-bit, then you'll
>> want to have hierarchical dispatch filter non 64-bit accesses and
>> raise an MCE appropriately.
>>
>> So you don't need anything in MemoryRegion, you need code in the
>> dispatch path.
>
> MemoryRegion *is* the dispatch path. Only done declaratively so we can
> flatten it whenever it changes.

We don't want dispatch to be 100% declarative.  That's what will cause 
the API to get horrendously ugly.

An example is PCI-bus level endianness conversion.  I also believe the 
Sparc IOMMU has an xor engine.

You could add a 'bool swap_endian' and an 'uint32_t xor_mask' in 
MemoryRegion but now you're adding a ton of platform specific knowledge 
to what should be an independent layer.

Regards,

Anthony Liguori

>
Avi Kivity May 22, 2011, 3:52 p.m. UTC | #31
On 05/22/2011 06:46 PM, Anthony Liguori wrote:
>> MemoryRegion *is* the dispatch path. Only done declaratively so we can
>> flatten it whenever it changes.
>
>
> We don't want dispatch to be 100% declarative.  That's what will cause 
> the API to get horrendously ugly.
>
> An example is PCI-bus level endianness conversion.  I also believe the 
> Sparc IOMMU has an xor engine.
>
> You could add a 'bool swap_endian' and an 'uint32_t xor_mask' in 
> MemoryRegion but now you're adding a ton of platform specific 
> knowledge to what should be an independent layer.
>

Currently containers do not use the read/write function pointers.  We 
could make them (or perhaps others) act as transformations on the data 
as it passes.  So it's still declarative (the entire flow is known at 
registration time) but it doesn't embed platform magic.

Byteswap is sufficiently generic to add as a region property, IMO.

btw, wrt iommu emulation, the API finally allows us to determine the 
path between any two devices, so we can apply the right iommu 
transformations.
Avi Kivity May 22, 2011, 3:56 p.m. UTC | #32
On 05/22/2011 06:44 PM, Anthony Liguori wrote:
>
>>>
>>> At any rate, I'm fairly sure it doesn't belong in the MemoryRegion
>>> structure.
>>>
>>
>> Since it isn't a global property, where does it belong?
>
>
> The chipset should have an intercept in the dispatch path that 
> enforces this (this assumes hierarchical dispatch).

So instead of region->ops->valid.*, region->ops->intercept()?

btw, that still doesn't require hierarchical dispatch.  If intercepts 
only check if the access is valid, it can still be flattened.

Hierarchical dispatch means that chipset callbacks get to choose which 
subregion callbacks are called, which isn't the case here.  If it were, 
it would be impossible to figure out the kvm slot layout.
diff mbox

Patch

diff --git a/memory.h b/memory.h
new file mode 100644
index 0000000..77c5951
--- /dev/null
+++ b/memory.h
@@ -0,0 +1,142 @@ 
+#ifndef MEMORY_H
+#define MEMORY_H
+
+#include <stdint.h>
+#include <stdbool.h>
+#include "qemu-common.h"
+#include "cpu-common.h"
+#include "targphys.h"
+#include "qemu-queue.h"
+
+typedef struct MemoryRegionOps MemoryRegionOps;
+typedef struct MemoryRegion MemoryRegion;
+
+/*
+ * Memory region callbacks
+ */
+struct MemoryRegionOps {
+    /* Read from the memory region. @addr is relative to @mr; @size is
+     * in bytes. */
+    uint64_t (*read)(MemoryRegion *mr,
+                     target_phys_addr_t addr,
+                     unsigned size);
+    /* Write to the memory region. @addr is relative to @mr; @size is
+     * in bytes. */
+    void (*write)(MemoryRegion *mr,
+                  target_phys_addr_t addr,
+                  uint64_t data,
+                  unsigned size);
+    /* Guest-visible constraints: */
+    struct {
+        /* If nonzero, specify bounds on access sizes beyond which a machine
+         * check is thrown.
+         */
+        unsigned min_access_size;
+        unsigned max_access_size;
+        /* If true, unaligned accesses are supported.  Otherwise unaligned
+         * accesses throw machine checks.
+         */
+         bool unaligned;
+    } valid;
+    /* Internal implementation constraints: */
+    struct {
+        /* If nonzero, specifies the minimum size implemented.  Smaller sizes
+         * will be rounded upwards and a partial result will be returned.
+         */
+        unsigned min_access_size;
+        /* If nonzero, specifies the maximum size implemented.  Larger sizes
+         * will be done as a series of accesses with smaller sizes.
+         */
+        unsigned max_access_size;
+        /* If true, unaligned accesses are supported.  Otherwise all accesses
+         * are converted to (possibly multiple) naturally aligned accesses.
+         */
+         bool unaligned;
+    } impl;
+};
+
+typedef struct CoalescedMemoryRange CoalescedMemoryRange;
+
+struct CoalescedMemoryRange {
+    target_phys_addr_t start;
+    target_phys_addr_t size;
+    QTAILQ_ENTRY(coalesced_ranges) link;
+};
+
+struct MemoryRegion {
+    /* All fields are private - violators will be prosecuted */
+    const MemoryRegionOps *ops;
+    MemoryRegion *parent;
+    target_phys_addr_t size;
+    target_phys_addr_t addr;
+    ram_addr_t ram_addr;
+    unsigned priority;
+    bool may_overlap;
+    QTAILQ_HEAD(subregions, MemoryRegion) subregions;
+    QTAILQ_ENTRY(subregions) subregions_link;
+    QTAILQ_HEAD(coalesced_ranges, CoalescedMemoryRange) coalesced;
+};
+
+/* Initialize a memory region
+ *
+ * The region typically acts as a container for other memory regions.
+ */
+void memory_region_init(MemoryRegion *mr,
+                        target_phys_addr_t size);
+/* Initialize an I/O memory region.  Accesses into the region will be
+ * cause the callbacks in @ops to be called.
+ *
+ * if @size is nonzero, subregions will be clipped to @size.
+ */
+void memory_region_init_io(MemoryRegion *mr,
+                           const MemoryRegionOps *ops,
+                           target_phys_addr_t size);
+/* Initialize an I/O memory region.  Accesses into the region will be
+ * modify memory directly.
+ */
+void memory_region_init_ram(MemoryRegion *mr,
+                            target_phys_addr_t size);
+/* Initialize a RAM memory region.  Accesses into the region will be
+ * modify memory in @ptr directly.
+ */
+void memory_region_init_ram_ptr(MemoryRegion *mr,
+                                target_phys_addr_t size,
+                                void *ptr);
+/* Destroy a memory region.  The memory becomes inaccessible. */
+void memory_region_destroy(MemoryRegion *mr);
+/* Sets an offset to be added to MemoryRegionOps callbacks. */
+void memory_region_set_offset(MemoryRegion *mr, target_phys_addr_t offset);
+/* Turn loggging on or off for specified client (display, migration) */
+void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
+/* Enable memory coalescing for the region.  MMIO ->write callbacks may be
+ * delayed until a non-coalesced MMIO is issued.
+ */
+void memory_region_set_coalescing(MemoryRegion *mr);
+/* Enable memory coalescing for a sub-range of the region.  MMIO ->write
+ * callbacks may be delayed until a non-coalesced MMIO is issued.
+ */
+void memory_region_add_coalescing(MemoryRegion *mr,
+                                  target_phys_addr_t offset,
+                                  target_phys_addr_t size);
+/* Disable MMIO coalescing for the region. */
+void memory_region_clear_coalescing(MemoryRegion *mr);
+
+/* Add a sub-region at @offset.  The sub-region may not overlap with other
+ * subregions (except for those explicitly marked as overlapping)
+ */
+void memory_region_add_subregion(MemoryRegion *mr,
+                                 target_phys_addr_t offset,
+                                 MemoryRegion *subregion);
+/* Add a sub-region at @offset.  The sun-region may overlap other subregions;
+ * conflicts are resolved by having a higher @priority hide a lower @priority.
+ * Subregions without priority are taken as @priority 0.
+ */
+void memory_region_add_subregion_overlap(MemoryRegion *mr,
+                                         target_phys_addr_t offset,
+                                         MemoryRegion *subregion,
+                                         unsigned priority);
+/* Remove a subregion. */
+void memory_region_del_subregion(MemoryRegion *mr,
+                                 MemoryRegion *subregion);
+
+#endif