diff mbox

[RFC] memory: drop _overlap variant

Message ID 20130214124555.GA13861@redhat.com
State New
Headers show

Commit Message

Michael S. Tsirkin Feb. 14, 2013, 12:45 p.m. UTC
overlap flag in the region is currently unused, most devices have no
idea whether their region overlaps with anything, so drop it,
assume that all regions can overlap and always require priority.

It's also not clear how should devices allocate priorities.
As a first step, define a set of symbolic priority
names so it's easier to grep for.

The result of this patch is that it's easy to see who
uses a given priority.

To avoid breaking build, this patch should be combined with
a patch updating all devices.
I have it working but am not sending this yet to avoid unnecessary load
on list, so this patch shoul dnot be applied as is.

Will send the full version if people don't object to this one.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: avi.kivity@gmail.com

---

Comments

Peter Maydell Feb. 14, 2013, 12:56 p.m. UTC | #1
On 14 February 2013 12:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> overlap flag in the region is currently unused, most devices have no
> idea whether their region overlaps with anything, so drop it,
> assume that all regions can overlap and always require priority.

Devices themselves shouldn't care, for the most part -- they just
provide a memory region and it's their parent that has to map it
and know whether it overlaps or not. Similarly, parents should
generally be in control of the container they're mapping the
memory region into, and know whether it will be an overlapping
map or not.

> It's also not clear how should devices allocate priorities.

Up to the parent which controls the region being mapped into.
I definitely don't like making the priority argument mandatory:
this is just introducing pointless boilerplate for the common
case where nothing overlaps and you know nothing overlaps.

Maybe we should take the printf() about subregion collisions
in memory_region_add_subregion_common() out of the #if 0
that it currently sits in?

-- PMM
Michael S. Tsirkin Feb. 14, 2013, 1:09 p.m. UTC | #2
On Thu, Feb 14, 2013 at 12:56:02PM +0000, Peter Maydell wrote:
> On 14 February 2013 12:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> > overlap flag in the region is currently unused, most devices have no
> > idea whether their region overlaps with anything, so drop it,
> > assume that all regions can overlap and always require priority.
> 
> Devices themselves shouldn't care, for the most part -- they just
> provide a memory region and it's their parent that has to map it
> and know whether it overlaps or not. Similarly, parents should
> generally be in control of the container they're mapping the
> memory region into, and know whether it will be an overlapping
> map or not.
> 
> > It's also not clear how should devices allocate priorities.
> 
> Up to the parent which controls the region being mapped into.

We could just assume same priority as parent but what happens if it
has to be different? There are also aliases so a region
can have multiple parents. Presumably it will have to have
different priorities depending on what the parent does?
Here's a list of instances using priority != 0.

hw/armv7m_nvic.c:                                MEMORY_PRIO_LOW);
hw/cirrus_vga.c:                                    MEMORY_PRIO_LOW);
hw/cirrus_vga.c:                                    MEMORY_PRIO_LOW);
hw/cirrus_vga.c:                                &s->low_mem_container, MEMORY_PRIO_LOW);
hw/kvm/pci-assign.c: &r_dev->mmio, MEMORY_PRIO_LOW);
hw/kvmvapic.c:    memory_region_add_subregion(as, rom_paddr, &s->rom, MEMORY_PRIO_HIGH);
hw/lpc_ich9.c:                                        MEMORY_PRIO_LOW);
hw/onenand.c:                                &s->mapped_ram, MEMORY_PRIO_LOW);
hw/pam.c:                                    MEMORY_PRIO_LOW);
hw/pc.c:                                MEMORY_PRIO_LOW);
hw/pc_sysfw.c:                                isa_bios, MEMORY_PRIO_LOW);
hw/pc_sysfw.c:                                isa_bios, MEMORY_PRIO_LOW);
hw/pci/pci.c:                                        MEMORY_PRIO_LOW);
hw/pci/pci_bridge.c:    memory_region_add_subregion(parent_space, base, alias, MEMORY_PRIO_LOW);
hw/piix_pci.c:                                MEMORY_PRIO_LOW);
hw/piix_pci.c:                                &d->rcr_mem, MEMORY_PRIO_LOW);
hw/q35.c:                                &mch->smram_region, MEMORY_PRIO_LOW);
hw/vga-isa.c:                                MEMORY_PRIO_LOW);
hw/vga.c:                                    MEMORY_PRIO_MEDIUM);
hw/vga.c:                                vga_io_memory, MEMORY_PRIO_LOW);
hw/xen_pt_msi.c:                                MEMORY_PRIO_MEDIUM); /* Priority: pci default + 1

Making priority relative to parent but not the same just seems like a recipe for disaster.

> I definitely don't like making the priority argument mandatory:
> this is just introducing pointless boilerplate for the common
> case where nothing overlaps and you know nothing overlaps.

Non overlapping is not a common case at all.  E.g. with normal PCI
devices you have no way to know nothing overlaps - addresses are guest
programmable.

See also recent discussion about 64 bit BARs.

We could add a wrapper for MEMORY_PRIO_LOWEST - will that address
your concern?

> Maybe we should take the printf() about subregion collisions
> in memory_region_add_subregion_common() out of the #if 0
> that it currently sits in?
> 
> -- PMM

This is just a debugging tool, it won't fix anything.
Peter Maydell Feb. 14, 2013, 1:22 p.m. UTC | #3
On 14 February 2013 13:09, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Feb 14, 2013 at 12:56:02PM +0000, Peter Maydell wrote:
>> Up to the parent which controls the region being mapped into.
>
> We could just assume same priority as parent

Er, no. I mean the code in control of the parent MR sets the
priority, when it calls memory_region_add_subregion_overlap().

> but what happens if it
> has to be different? There are also aliases so a region
> can have multiple parents.

The alias has its own priority.

> Presumably it will have to have
> different priorities depending on what the parent does?
> Here's a list of instances using priority != 0.
>
> hw/armv7m_nvic.c:                                MEMORY_PRIO_LOW);

So this one I know about, and it's a good example of what
I'm talking about. This function sets up a container memory
region ("nvic"), and it is in total control of what is
mapped into that container. Specifically, it puts in a
"nvic_sysregs" background region which covers the whole
0x1000 size of the container (at an implicit priority of
zero). It then layers over that an alias of the GIC
registers ("nvic-gic") at a specific address and with
a priority of 1 so it appears above the background region.
Nobody else ever puts anything in this container, so
the only thing we care about is that the priority of
the nvic-gic region is higher than that of the nvic_sysregs
region; and it's clear from the code that we do that.
Priority is a local question whose meaning is only relevant
within a particular container region, not system-wide, and
having system-wide MEMORY_PRIO_ defines obscures that IMHO.

>> I definitely don't like making the priority argument mandatory:
>> this is just introducing pointless boilerplate for the common
>> case where nothing overlaps and you know nothing overlaps.
>
> Non overlapping is not a common case at all.  E.g. with normal PCI
> devices you have no way to know nothing overlaps - addresses are guest
> programmable.

That means PCI is a special case :-) If the guest can
program overlap then presumably PCI specifies semantics
for what happens then, and there need to be PCI specific
wrappers that enforce those semantics and they can call
the relevant _overlap functions when mapping things.
In any case this isn't a concern for the PCI *device*,
which can just provide its memory regions. It's a problem
the PCI *host adaptor* has to deal with when it's figuring
out how to map those regions into the container which
corresponds to its area of the address space.

> We could add a wrapper for MEMORY_PRIO_LOWEST - will that address
> your concern?

Well, I'm entirely happy with the memory API we have at
the moment, and I'm trying to figure out why you want to
change it...

>> Maybe we should take the printf() about subregion collisions
>> in memory_region_add_subregion_common() out of the #if 0
>> that it currently sits in?

> This is just a debugging tool, it won't fix anything.

It might tell us what bits of code are currently erroneously
mapping regions that overlap without using the _overlap()
function. Then we could fix them.

-- PMM
Michael S. Tsirkin Feb. 14, 2013, 2:02 p.m. UTC | #4
On Thu, Feb 14, 2013 at 01:22:18PM +0000, Peter Maydell wrote:
> On 14 February 2013 13:09, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Feb 14, 2013 at 12:56:02PM +0000, Peter Maydell wrote:
> >> Up to the parent which controls the region being mapped into.
> >
> > We could just assume same priority as parent
> 
> Er, no. I mean the code in control of the parent MR sets the
> priority, when it calls memory_region_add_subregion_overlap().
> 
> > but what happens if it
> > has to be different? There are also aliases so a region
> > can have multiple parents.
> 
> The alias has its own priority.

Well that's the status quo. One of the issues is, you have
no idea what else uses each priority. With this change,
at least you can grep for it.

> > Presumably it will have to have
> > different priorities depending on what the parent does?
> > Here's a list of instances using priority != 0.
> >
> > hw/armv7m_nvic.c:                                MEMORY_PRIO_LOW);
> 
> So this one I know about, and it's a good example of what
> I'm talking about. This function sets up a container memory
> region ("nvic"), and it is in total control of what is
> mapped into that container. Specifically, it puts in a
> "nvic_sysregs" background region which covers the whole
> 0x1000 size of the container (at an implicit priority of
> zero). It then layers over that an alias of the GIC
> registers ("nvic-gic") at a specific address and with
> a priority of 1 so it appears above the background region.
> Nobody else ever puts anything in this container, so
> the only thing we care about is that the priority of
> the nvic-gic region is higher than that of the nvic_sysregs
> region; and it's clear from the code that we do that.
> Priority is a local question whose meaning is only relevant
> within a particular container region, not system-wide,
> and
> having system-wide MEMORY_PRIO_ defines obscures that IMHO.

Well that's not how it seems to work, and I don't see how it *could*
work. Imagine the specific example: ioapic and pci devices. ioapic has
an address within the pci hole but it is not a subregion.
If priority has no meaning how would you decide which one
to use?

Also, on a PC many addresses are guest programmable. We need to behave
in some defined way if guest programs addresses to something silly.

The only reason it works sometimes is because some systems
use fixes addresses which never overlap.

> 
> >> I definitely don't like making the priority argument mandatory:
> >> this is just introducing pointless boilerplate for the common
> >> case where nothing overlaps and you know nothing overlaps.
> >
> > Non overlapping is not a common case at all.  E.g. with normal PCI
> > devices you have no way to know nothing overlaps - addresses are guest
> > programmable.
> 
> That means PCI is a special case :-)
> If the guest can
> program overlap then presumably PCI specifies semantics
> for what happens then, and there need to be PCI specific
> wrappers that enforce those semantics and they can call
> the relevant _overlap functions when mapping things.
> In any case this isn't a concern for the PCI *device*,
> which can just provide its memory regions. It's a problem
> the PCI *host adaptor* has to deal with when it's figuring
> out how to map those regions into the container which
> corresponds to its area of the address space.

Issue is, a PCI device overlapping something else suddenly
becomes this something else's problem.

> > We could add a wrapper for MEMORY_PRIO_LOWEST - will that address
> > your concern?
> 
> Well, I'm entirely happy with the memory API we have at
> the moment, and I'm trying to figure out why you want to
> change it...

I am guessing your systems all have hardcoded addresses
not controlled by guest.

> >> Maybe we should take the printf() about subregion collisions
> >> in memory_region_add_subregion_common() out of the #if 0
> >> that it currently sits in?
> 
> > This is just a debugging tool, it won't fix anything.
> 
> It might tell us what bits of code are currently erroneously
> mapping regions that overlap without using the _overlap()
> function. Then we could fix them.
> 
> -- PMM

When there is a single guest programmable device,
any address can be overlapped by it.
We could invent rules like 'non overlappable is higher
priority' but it seems completely arbitrary, a single
priority is clearer.
Avi Kivity Feb. 14, 2013, 2:14 p.m. UTC | #5
On Thu, Feb 14, 2013 at 3:09 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Feb 14, 2013 at 12:56:02PM +0000, Peter Maydell wrote:
>> On 14 February 2013 12:45, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > overlap flag in the region is currently unused, most devices have no
>> > idea whether their region overlaps with anything, so drop it,
>> > assume that all regions can overlap and always require priority.
>>
>> Devices themselves shouldn't care, for the most part -- they just
>> provide a memory region and it's their parent that has to map it
>> and know whether it overlaps or not. Similarly, parents should
>> generally be in control of the container they're mapping the
>> memory region into, and know whether it will be an overlapping
>> map or not.
>>
>> > It's also not clear how should devices allocate priorities.
>>
>> Up to the parent which controls the region being mapped into.
>
> We could just assume same priority as parent but what happens if it
> has to be different?

Priority is only considered relative to siblings.  The parent's
priority is only considered wrt the parent's siblings, not its
children.

> There are also aliases so a region
> can have multiple parents. Presumably it will have to have
> different priorities depending on what the parent does?

The alias region has its own priority

> Here's a list of instances using priority != 0.
>
> hw/armv7m_nvic.c:                                MEMORY_PRIO_LOW);
> hw/cirrus_vga.c:                                    MEMORY_PRIO_LOW);
> hw/cirrus_vga.c:                                    MEMORY_PRIO_LOW);
> hw/cirrus_vga.c:                                &s->low_mem_container, MEMORY_PRIO_LOW);
> hw/kvm/pci-assign.c: &r_dev->mmio, MEMORY_PRIO_LOW);
> hw/kvmvapic.c:    memory_region_add_subregion(as, rom_paddr, &s->rom, MEMORY_PRIO_HIGH);
> hw/lpc_ich9.c:                                        MEMORY_PRIO_LOW);
> hw/onenand.c:                                &s->mapped_ram, MEMORY_PRIO_LOW);
> hw/pam.c:                                    MEMORY_PRIO_LOW);
> hw/pc.c:                                MEMORY_PRIO_LOW);
> hw/pc_sysfw.c:                                isa_bios, MEMORY_PRIO_LOW);
> hw/pc_sysfw.c:                                isa_bios, MEMORY_PRIO_LOW);
> hw/pci/pci.c:                                        MEMORY_PRIO_LOW);
> hw/pci/pci_bridge.c:    memory_region_add_subregion(parent_space, base, alias, MEMORY_PRIO_LOW);
> hw/piix_pci.c:                                MEMORY_PRIO_LOW);
> hw/piix_pci.c:                                &d->rcr_mem, MEMORY_PRIO_LOW);
> hw/q35.c:                                &mch->smram_region, MEMORY_PRIO_LOW);
> hw/vga-isa.c:                                MEMORY_PRIO_LOW);
> hw/vga.c:                                    MEMORY_PRIO_MEDIUM);
> hw/vga.c:                                vga_io_memory, MEMORY_PRIO_LOW);
> hw/xen_pt_msi.c:                                MEMORY_PRIO_MEDIUM); /* Priority: pci default + 1
>
> Making priority relative to parent but not the same just seems like a recipe for disaster.
>
>> I definitely don't like making the priority argument mandatory:
>> this is just introducing pointless boilerplate for the common
>> case where nothing overlaps and you know nothing overlaps.
>
> Non overlapping is not a common case at all.  E.g. with normal PCI
> devices you have no way to know nothing overlaps - addresses are guest
> programmable.

Non overlapping is mostly useful for embedded platforms.
Avi Kivity Feb. 14, 2013, 2:23 p.m. UTC | #6
On Thu, Feb 14, 2013 at 4:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Feb 14, 2013 at 01:22:18PM +0000, Peter Maydell wrote:
>> On 14 February 2013 13:09, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Thu, Feb 14, 2013 at 12:56:02PM +0000, Peter Maydell wrote:
>> >> Up to the parent which controls the region being mapped into.
>> >
>> > We could just assume same priority as parent
>>
>> Er, no. I mean the code in control of the parent MR sets the
>> priority, when it calls memory_region_add_subregion_overlap().
>>
>> > but what happens if it
>> > has to be different? There are also aliases so a region
>> > can have multiple parents.
>>
>> The alias has its own priority.
>
> Well that's the status quo. One of the issues is, you have
> no idea what else uses each priority. With this change,
> at least you can grep for it.

The question "what priorities do aliases of this region have" is not
an interesting question.  Priority is a local attribute, not an
attribute of the region being prioritized.

>
>> > Presumably it will have to have
>> > different priorities depending on what the parent does?
>> > Here's a list of instances using priority != 0.
>> >
>> > hw/armv7m_nvic.c:                                MEMORY_PRIO_LOW);
>>
>> So this one I know about, and it's a good example of what
>> I'm talking about. This function sets up a container memory
>> region ("nvic"), and it is in total control of what is
>> mapped into that container. Specifically, it puts in a
>> "nvic_sysregs" background region which covers the whole
>> 0x1000 size of the container (at an implicit priority of
>> zero). It then layers over that an alias of the GIC
>> registers ("nvic-gic") at a specific address and with
>> a priority of 1 so it appears above the background region.
>> Nobody else ever puts anything in this container, so
>> the only thing we care about is that the priority of
>> the nvic-gic region is higher than that of the nvic_sysregs
>> region; and it's clear from the code that we do that.
>> Priority is a local question whose meaning is only relevant
>> within a particular container region, not system-wide,
>> and
>> having system-wide MEMORY_PRIO_ defines obscures that IMHO.
>
> Well that's not how it seems to work, and I don't see how it *could*
> work. Imagine the specific example: ioapic and pci devices. ioapic has
> an address within the pci hole but it is not a subregion.
> If priority has no meaning how would you decide which one
> to use?

Like PMM said.  You look at the semantics of the hardware, and map
that onto the API.  If the pci controller says that BARs hide the
ioapic, then you give them higher priority.  If it says that the
ioapic hides BARs, then that gets higher priority.  If it doesn't say
anything, take your pick (or give them the same priority).

>
> Also, on a PC many addresses are guest programmable. We need to behave
> in some defined way if guest programs addresses to something silly.

That's why _overlap exists.

> The only reason it works sometimes is because some systems
> use fixes addresses which never overlap.

That's why the no overlap API exists.

>
>>
>> >> I definitely don't like making the priority argument mandatory:
>> >> this is just introducing pointless boilerplate for the common
>> >> case where nothing overlaps and you know nothing overlaps.
>> >
>> > Non overlapping is not a common case at all.  E.g. with normal PCI
>> > devices you have no way to know nothing overlaps - addresses are guest
>> > programmable.
>>
>> That means PCI is a special case :-)
>> If the guest can
>> program overlap then presumably PCI specifies semantics
>> for what happens then, and there need to be PCI specific
>> wrappers that enforce those semantics and they can call
>> the relevant _overlap functions when mapping things.
>> In any case this isn't a concern for the PCI *device*,
>> which can just provide its memory regions. It's a problem
>> the PCI *host adaptor* has to deal with when it's figuring
>> out how to map those regions into the container which
>> corresponds to its area of the address space.
>
> Issue is, a PCI device overlapping something else suddenly
> becomes this something else's problem.

It is not a problem at all.

>
>> >> Maybe we should take the printf() about subregion collisions
>> >> in memory_region_add_subregion_common() out of the #if 0
>> >> that it currently sits in?
>>
>> > This is just a debugging tool, it won't fix anything.
>>
>> It might tell us what bits of code are currently erroneously
>> mapping regions that overlap without using the _overlap()
>> function. Then we could fix them.
>>
>> -- PMM
>
> When there is a single guest programmable device,
> any address can be overlapped by it.

No.  Only addresses within the same container.  Other containers work
fine without overlap.

> We could invent rules like 'non overlappable is higher
> priority' but it seems completely arbitrary, a single
> priority is clearer.

It's just noise for the xx% of cases which don't need it.
Peter Maydell Feb. 14, 2013, 2:34 p.m. UTC | #7
On 14 February 2013 14:02, Michael S. Tsirkin <mst@redhat.com> wrote:
> Well that's the status quo. One of the issues is, you have
> no idea what else uses each priority. With this change,
> at least you can grep for it.

No, because most of the code you find will be setting
priorities for completely irrelevant containers (for
instance PCI doesn't care at all about priorities used
by the v7m NVIC).

> Imagine the specific example: ioapic and pci devices. ioapic has
> an address within the pci hole but it is not a subregion.
> If priority has no meaning how would you decide which one
> to use?

I don't know about the specifics of the PC's memory layout,
but *something* has to manage the address space that is
being set up. I would expect something like:

 * PCI host controller has a memory region (container) which
   all the PCI devices are mapped into as per guest programming
 * ioapic has a memory region
 * there is another container which contains both these
   memory regions. The code that controls and sets up that
   container [which is probably the pc board model] gets to
   decide priorities, which are purely local to it

(It's possible that at the moment the "another container" is
the get_system_memory() system address space. If it makes life
easier you can always invent another container to give you a
fresh level of indirection.)

> Also, on a PC many addresses are guest programmable. We need to behave
> in some defined way if guest programs addresses to something silly.

Yes, this is the job of the code controlling the container(s)
into which those memory regions may be mapped.

>> If the guest can
>> program overlap then presumably PCI specifies semantics
>> for what happens then, and there need to be PCI specific
>> wrappers that enforce those semantics and they can call
>> the relevant _overlap functions when mapping things.
>> In any case this isn't a concern for the PCI *device*,
>> which can just provide its memory regions. It's a problem
>> the PCI *host adaptor* has to deal with when it's figuring
>> out how to map those regions into the container which
>> corresponds to its area of the address space.
>
> Issue is, a PCI device overlapping something else suddenly
> becomes this something else's problem.

Nope, because the PCI host controller model should be in
complete control of the container all the PCI devices live
in, and it is the thing doing the mapping and unmapping
so it gets to set priorities and mark things as OK to
overlap. Also, memory.c permits overlap if either of the
two memory regions in question is marked as may-overlap;
they don't both have to be marked.

>> > We could add a wrapper for MEMORY_PRIO_LOWEST - will that address
>> > your concern?
>>
>> Well, I'm entirely happy with the memory API we have at
>> the moment, and I'm trying to figure out why you want to
>> change it...
>
> I am guessing your systems all have hardcoded addresses
> not controlled by guest.

Nope. omap_gpmc.c for instance has guest programmable subregions;
it uses a container so the guest's manipulation of these can't
leak out and cause weird things to happen to other bits of QEMU.
[I think we don't implement the correct guest-facing behaviour
when the guest asks for overlapping regions, but we shouldn't
hit the memory.c overlapping-region issue, or if we do it's
a bug to be fixed.]

There's also PCI on the versatilepb, but PCI devices can't
just appear anywhere, the PCI memory windows are at known
addresses and the PCI device can't escape from the wrong
side of the PCI controller.

>> >> Maybe we should take the printf() about subregion collisions
>> >> in memory_region_add_subregion_common() out of the #if 0
>> >> that it currently sits in?
>>
>> > This is just a debugging tool, it won't fix anything.
>>
>> It might tell us what bits of code are currently erroneously
>> mapping regions that overlap without using the _overlap()
>> function. Then we could fix them.

> When there is a single guest programmable device,
> any address can be overlapped by it.

Do we really have an example of a guest programmable
device where the *device itself* decides where it lives
in the address space, rather than the guest being able to
program a host controller/bus fabric/equivalent thing to
specify where the device should live, or the device
effectively negotiating with its bus controller? That
seems very implausible to me just because hardware itself
generally has some kind of hierarchy of buses and it's not
really possible for a leaf node to make itself appear
anywhere in the hierarchy; all it can do is by agreement
with the thing above it appear at some different address at
the same level.

[of course there are trivial systems with a totally flat
bus but that's just a degenerate case of the above where
there's only one thing (the board) managing a single
layer, and typically those systems have everything at
a fixed address anyhow.]

-- PMM
Michael S. Tsirkin Feb. 14, 2013, 2:40 p.m. UTC | #8
On Thu, Feb 14, 2013 at 04:14:39PM +0200, Avi Kivity wrote:
> On Thu, Feb 14, 2013 at 3:09 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Feb 14, 2013 at 12:56:02PM +0000, Peter Maydell wrote:
> >> On 14 February 2013 12:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > overlap flag in the region is currently unused, most devices have no
> >> > idea whether their region overlaps with anything, so drop it,
> >> > assume that all regions can overlap and always require priority.
> >>
> >> Devices themselves shouldn't care, for the most part -- they just
> >> provide a memory region and it's their parent that has to map it
> >> and know whether it overlaps or not. Similarly, parents should
> >> generally be in control of the container they're mapping the
> >> memory region into, and know whether it will be an overlapping
> >> map or not.
> >>
> >> > It's also not clear how should devices allocate priorities.
> >>
> >> Up to the parent which controls the region being mapped into.
> >
> > We could just assume same priority as parent but what happens if it
> > has to be different?
> 
> Priority is only considered relative to siblings.  The parent's
> priority is only considered wrt the parent's siblings, not its
> children.

But some parents are system created and shared by many devices so children for
such have no idea who their siblings are.

Please take a look at the typical map in this mail:
'[BUG] Guest OS hangs on boot when 64bit BAR present'

system overlap 0 pri 0 [0x0 - 0x7fffffffffffffff]
     kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000]
         pc.ram overlap 0 pri 0 [0xca000 - 0xcd000]
         ++ pc.ram [0xca000 - 0xcd000] is added to view
     ....................
     smram-region overlap 1 pri 1 [0xa0000 - 0xc0000]
         pci overlap 0 pri 0 [0xa0000 - 0xc0000]
             cirrus-lowmem-container overlap 1 pri 1 [0xa0000 - 0xc0000]
                 cirrus-low-memory overlap 0 pri 0 [0xa0000 - 0xc0000]
                ++cirrus-low-memory [0xa0000 - 0xc0000] is added to view
     kvm-ioapic overlap 0 pri 0 [0xfec00000 - 0xfec01000]
    ++kvm-ioapic [0xfec00000 - 0xfec01000] is added to view
     pci-hole64 overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
         pci overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
     pci-hole overlap 0 pri 0 [0x7d000000 - 0x100000000]
         pci overlap 0 pri 0 [0x7d000000 - 0x100000000]
             ivshmem-bar2-container overlap 1 pri 1 [0xfe000000 - 0x100000000]
                 ivshmem.bar2 overlap 0 pri 0 [0xfe000000 - 0x100000000]
                ++ivshmem.bar2 [0xfe000000 - 0xfec00000] is added to view
                ++ivshmem.bar2  [0xfec01000 - 0x100000000] is added to view

As you see, ioapic at 0xfec00000 overlaps pci-hole.
ioapic is guest programmable in theory - should use _overlap?
pci-hole is not but can overlap with ioapic.
So also _overlap?

Let's imagine someone writes a guest programmable device for
ARM. Now we should update all ARM devices from regular to _overlap?


> > There are also aliases so a region
> > can have multiple parents. Presumably it will have to have
> > different priorities depending on what the parent does?
> 
> The alias region has its own priority
> 
> > Here's a list of instances using priority != 0.
> >
> > hw/armv7m_nvic.c:                                MEMORY_PRIO_LOW);
> > hw/cirrus_vga.c:                                    MEMORY_PRIO_LOW);
> > hw/cirrus_vga.c:                                    MEMORY_PRIO_LOW);
> > hw/cirrus_vga.c:                                &s->low_mem_container, MEMORY_PRIO_LOW);
> > hw/kvm/pci-assign.c: &r_dev->mmio, MEMORY_PRIO_LOW);
> > hw/kvmvapic.c:    memory_region_add_subregion(as, rom_paddr, &s->rom, MEMORY_PRIO_HIGH);
> > hw/lpc_ich9.c:                                        MEMORY_PRIO_LOW);
> > hw/onenand.c:                                &s->mapped_ram, MEMORY_PRIO_LOW);
> > hw/pam.c:                                    MEMORY_PRIO_LOW);
> > hw/pc.c:                                MEMORY_PRIO_LOW);
> > hw/pc_sysfw.c:                                isa_bios, MEMORY_PRIO_LOW);
> > hw/pc_sysfw.c:                                isa_bios, MEMORY_PRIO_LOW);
> > hw/pci/pci.c:                                        MEMORY_PRIO_LOW);
> > hw/pci/pci_bridge.c:    memory_region_add_subregion(parent_space, base, alias, MEMORY_PRIO_LOW);
> > hw/piix_pci.c:                                MEMORY_PRIO_LOW);
> > hw/piix_pci.c:                                &d->rcr_mem, MEMORY_PRIO_LOW);
> > hw/q35.c:                                &mch->smram_region, MEMORY_PRIO_LOW);
> > hw/vga-isa.c:                                MEMORY_PRIO_LOW);
> > hw/vga.c:                                    MEMORY_PRIO_MEDIUM);
> > hw/vga.c:                                vga_io_memory, MEMORY_PRIO_LOW);
> > hw/xen_pt_msi.c:                                MEMORY_PRIO_MEDIUM); /* Priority: pci default + 1
> >
> > Making priority relative to parent but not the same just seems like a recipe for disaster.
> >
> >> I definitely don't like making the priority argument mandatory:
> >> this is just introducing pointless boilerplate for the common
> >> case where nothing overlaps and you know nothing overlaps.
> >
> > Non overlapping is not a common case at all.  E.g. with normal PCI
> > devices you have no way to know nothing overlaps - addresses are guest
> > programmable.
> 
> Non overlapping is mostly useful for embedded platforms.

Maybe it should have a longer name like _nonoverlap then?
Current API makes people assume _overlap is only for special
cases and default should be non overlap.
Michael S. Tsirkin Feb. 14, 2013, 2:47 p.m. UTC | #9
On Thu, Feb 14, 2013 at 02:34:20PM +0000, Peter Maydell wrote:
> On 14 February 2013 14:02, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Well that's the status quo. One of the issues is, you have
> > no idea what else uses each priority. With this change,
> > at least you can grep for it.
> 
> No, because most of the code you find will be setting
> priorities for completely irrelevant containers (for
> instance PCI doesn't care at all about priorities used
> by the v7m NVIC).
> 
> > Imagine the specific example: ioapic and pci devices. ioapic has
> > an address within the pci hole but it is not a subregion.
> > If priority has no meaning how would you decide which one
> > to use?
> 
> I don't know about the specifics of the PC's memory layout,
> but *something* has to manage the address space that is
> being set up. I would expect something like:
> 
>  * PCI host controller has a memory region (container) which
>    all the PCI devices are mapped into as per guest programming
>  * ioapic has a memory region
>  * there is another container which contains both these
>    memory regions. The code that controls and sets up that
>    container [which is probably the pc board model] gets to
>    decide priorities, which are purely local to it

This assumes we set up devices in code.
We are trying to move away from that, and have
APIs that let you set up boards from command line.


> (It's possible that at the moment the "another container" is
> the get_system_memory() system address space. If it makes life
> easier you can always invent another container to give you a
> fresh level of indirection.)
> 
> > Also, on a PC many addresses are guest programmable. We need to behave
> > in some defined way if guest programs addresses to something silly.
> 
> Yes, this is the job of the code controlling the container(s)
> into which those memory regions may be mapped.

Some containers don't know what is mapped into them.

> >> If the guest can
> >> program overlap then presumably PCI specifies semantics
> >> for what happens then, and there need to be PCI specific
> >> wrappers that enforce those semantics and they can call
> >> the relevant _overlap functions when mapping things.
> >> In any case this isn't a concern for the PCI *device*,
> >> which can just provide its memory regions. It's a problem
> >> the PCI *host adaptor* has to deal with when it's figuring
> >> out how to map those regions into the container which
> >> corresponds to its area of the address space.
> >
> > Issue is, a PCI device overlapping something else suddenly
> > becomes this something else's problem.
> 
> Nope, because the PCI host controller model should be in
> complete control of the container all the PCI devices live
> in, and it is the thing doing the mapping and unmapping
> so it gets to set priorities and mark things as OK to
> overlap. Also, memory.c permits overlap if either of the
> two memory regions in question is marked as may-overlap;
> they don't both have to be marked.

That's undocumented, isn't it?
And then which one wins?


> >> > We could add a wrapper for MEMORY_PRIO_LOWEST - will that address
> >> > your concern?
> >>
> >> Well, I'm entirely happy with the memory API we have at
> >> the moment, and I'm trying to figure out why you want to
> >> change it...
> >
> > I am guessing your systems all have hardcoded addresses
> > not controlled by guest.
> 
> Nope. omap_gpmc.c for instance has guest programmable subregions;
> it uses a container so the guest's manipulation of these can't
> leak out and cause weird things to happen to other bits of QEMU.
> [I think we don't implement the correct guest-facing behaviour
> when the guest asks for overlapping regions, but we shouldn't
> hit the memory.c overlapping-region issue, or if we do it's
> a bug to be fixed.]
> 
> There's also PCI on the versatilepb, but PCI devices can't
> just appear anywhere, the PCI memory windows are at known
> addresses and the PCI device can't escape from the wrong
> side of the PCI controller.

But, there are devices who's addresses can overlap the PCI
window.


> >> >> Maybe we should take the printf() about subregion collisions
> >> >> in memory_region_add_subregion_common() out of the #if 0
> >> >> that it currently sits in?
> >>
> >> > This is just a debugging tool, it won't fix anything.
> >>
> >> It might tell us what bits of code are currently erroneously
> >> mapping regions that overlap without using the _overlap()
> >> function. Then we could fix them.
> 
> > When there is a single guest programmable device,
> > any address can be overlapped by it.
> 
> Do we really have an example of a guest programmable
> device where the *device itself* decides where it lives
> in the address space, rather than the guest being able to
> program a host controller/bus fabric/equivalent thing to
> specify where the device should live, or the device
> effectively negotiating with its bus controller? That
> seems very implausible to me just because hardware itself
> generally has some kind of hierarchy of buses and it's not
> really possible for a leaf node to make itself appear
> anywhere in the hierarchy; all it can do is by agreement
> with the thing above it appear at some different address at
> the same level.
> [of course there are trivial systems with a totally flat
> bus but that's just a degenerate case of the above where
> there's only one thing (the board) managing a single
> layer, and typically those systems have everything at
> a fixed address anyhow.]
> 
> -- PMM

x86 APIC seems to be such a device: guest programs it,
it's the first to get to say where it is.
Avi Kivity Feb. 14, 2013, 3:07 p.m. UTC | #10
On Thu, Feb 14, 2013 at 4:40 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Feb 14, 2013 at 04:14:39PM +0200, Avi Kivity wrote:
>
> But some parents are system created and shared by many devices so children for
> such have no idea who their siblings are.
>
> Please take a look at the typical map in this mail:
> '[BUG] Guest OS hangs on boot when 64bit BAR present'
>
> system overlap 0 pri 0 [0x0 - 0x7fffffffffffffff]
>      kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000]
>          pc.ram overlap 0 pri 0 [0xca000 - 0xcd000]
>          ++ pc.ram [0xca000 - 0xcd000] is added to view
>      ....................
>      smram-region overlap 1 pri 1 [0xa0000 - 0xc0000]
>          pci overlap 0 pri 0 [0xa0000 - 0xc0000]
>              cirrus-lowmem-container overlap 1 pri 1 [0xa0000 - 0xc0000]
>                  cirrus-low-memory overlap 0 pri 0 [0xa0000 - 0xc0000]
>                 ++cirrus-low-memory [0xa0000 - 0xc0000] is added to view
>      kvm-ioapic overlap 0 pri 0 [0xfec00000 - 0xfec01000]
>     ++kvm-ioapic [0xfec00000 - 0xfec01000] is added to view
>      pci-hole64 overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
>          pci overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
>      pci-hole overlap 0 pri 0 [0x7d000000 - 0x100000000]
>          pci overlap 0 pri 0 [0x7d000000 - 0x100000000]
>              ivshmem-bar2-container overlap 1 pri 1 [0xfe000000 - 0x100000000]
>                  ivshmem.bar2 overlap 0 pri 0 [0xfe000000 - 0x100000000]
>                 ++ivshmem.bar2 [0xfe000000 - 0xfec00000] is added to view
>                 ++ivshmem.bar2  [0xfec01000 - 0x100000000] is added to view
>
> As you see, ioapic at 0xfec00000 overlaps pci-hole.
> ioapic is guest programmable in theory - should use _overlap?
> pci-hole is not but can overlap with ioapic.
> So also _overlap?

It's a bug.  The ioapic is in the pci address space, not the system
address space.  And yes it's overlappable.

>
> Let's imagine someone writes a guest programmable device for
> ARM. Now we should update all ARM devices from regular to _overlap?

It's sufficient to update the programmable device.

>> >
>> > Non overlapping is not a common case at all.  E.g. with normal PCI
>> > devices you have no way to know nothing overlaps - addresses are guest
>> > programmable.
>>
>> Non overlapping is mostly useful for embedded platforms.
>
> Maybe it should have a longer name like _nonoverlap then?
> Current API makes people assume _overlap is only for special
> cases and default should be non overlap.

The assumption is correct.
Michael S. Tsirkin Feb. 14, 2013, 4:50 p.m. UTC | #11
On Thu, Feb 14, 2013 at 05:07:02PM +0200, Avi Kivity wrote:
> On Thu, Feb 14, 2013 at 4:40 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Feb 14, 2013 at 04:14:39PM +0200, Avi Kivity wrote:
> >
> > But some parents are system created and shared by many devices so children for
> > such have no idea who their siblings are.
> >
> > Please take a look at the typical map in this mail:
> > '[BUG] Guest OS hangs on boot when 64bit BAR present'
> >
> > system overlap 0 pri 0 [0x0 - 0x7fffffffffffffff]
> >      kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000]
> >          pc.ram overlap 0 pri 0 [0xca000 - 0xcd000]
> >          ++ pc.ram [0xca000 - 0xcd000] is added to view
> >      ....................
> >      smram-region overlap 1 pri 1 [0xa0000 - 0xc0000]
> >          pci overlap 0 pri 0 [0xa0000 - 0xc0000]
> >              cirrus-lowmem-container overlap 1 pri 1 [0xa0000 - 0xc0000]
> >                  cirrus-low-memory overlap 0 pri 0 [0xa0000 - 0xc0000]
> >                 ++cirrus-low-memory [0xa0000 - 0xc0000] is added to view
> >      kvm-ioapic overlap 0 pri 0 [0xfec00000 - 0xfec01000]
> >     ++kvm-ioapic [0xfec00000 - 0xfec01000] is added to view
> >      pci-hole64 overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
> >          pci overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
> >      pci-hole overlap 0 pri 0 [0x7d000000 - 0x100000000]
> >          pci overlap 0 pri 0 [0x7d000000 - 0x100000000]
> >              ivshmem-bar2-container overlap 1 pri 1 [0xfe000000 - 0x100000000]
> >                  ivshmem.bar2 overlap 0 pri 0 [0xfe000000 - 0x100000000]
> >                 ++ivshmem.bar2 [0xfe000000 - 0xfec00000] is added to view
> >                 ++ivshmem.bar2  [0xfec01000 - 0x100000000] is added to view
> >
> > As you see, ioapic at 0xfec00000 overlaps pci-hole.
> > ioapic is guest programmable in theory - should use _overlap?
> > pci-hole is not but can overlap with ioapic.
> > So also _overlap?
> 
> It's a bug.  The ioapic is in the pci address space, not the system
> address space.  And yes it's overlappable.

So you want to put it where? Under pci-hole?
And we'll have to teach all machine types
creating pci-hole about it?

> >
> > Let's imagine someone writes a guest programmable device for
> > ARM. Now we should update all ARM devices from regular to _overlap?
> 
> It's sufficient to update the programmable device.

Then the device can be higher priority (works for apic)
but not lower priority. Make priority signed?

> >> >
> >> > Non overlapping is not a common case at all.  E.g. with normal PCI
> >> > devices you have no way to know nothing overlaps - addresses are guest
> >> > programmable.
> >>
> >> Non overlapping is mostly useful for embedded platforms.
> >
> > Maybe it should have a longer name like _nonoverlap then?
> > Current API makes people assume _overlap is only for special
> > cases and default should be non overlap.
> 
> The assumption is correct.
Avi Kivity Feb. 14, 2013, 5:02 p.m. UTC | #12
On Thu, Feb 14, 2013 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > As you see, ioapic at 0xfec00000 overlaps pci-hole.
>> > ioapic is guest programmable in theory - should use _overlap?
>> > pci-hole is not but can overlap with ioapic.
>> > So also _overlap?
>>
>> It's a bug.  The ioapic is in the pci address space, not the system
>> address space.  And yes it's overlappable.
>
> So you want to put it where? Under pci-hole?

No, under the pci address space.  Look at the 440fx block diagram.

> And we'll have to teach all machine types
> creating pci-hole about it?

No.

>
>> >
>> > Let's imagine someone writes a guest programmable device for
>> > ARM. Now we should update all ARM devices from regular to _overlap?
>>
>> It's sufficient to update the programmable device.
>
> Then the device can be higher priority (works for apic)
> but not lower priority. Make priority signed?

Is there an actual real problem that needs fixing?
Michael S. Tsirkin Feb. 14, 2013, 6:12 p.m. UTC | #13
On Thu, Feb 14, 2013 at 07:02:15PM +0200, Avi Kivity wrote:
> On Thu, Feb 14, 2013 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > As you see, ioapic at 0xfec00000 overlaps pci-hole.
> >> > ioapic is guest programmable in theory - should use _overlap?
> >> > pci-hole is not but can overlap with ioapic.
> >> > So also _overlap?
> >>
> >> It's a bug.  The ioapic is in the pci address space, not the system
> >> address space.  And yes it's overlappable.
> >
> > So you want to put it where? Under pci-hole?
> 
> No, under the pci address space.  Look at the 440fx block diagram.
> 
> > And we'll have to teach all machine types
> > creating pci-hole about it?
> 
> No.
> 
> >
> >> >
> >> > Let's imagine someone writes a guest programmable device for
> >> > ARM. Now we should update all ARM devices from regular to _overlap?
> >>
> >> It's sufficient to update the programmable device.
> >
> > Then the device can be higher priority (works for apic)
> > but not lower priority. Make priority signed?
> 
> Is there an actual real problem that needs fixing?

Yes. Guests sometimes cause device BARs to temporary overlap
the APIC range during BAR sizing. It works fine on a physical
system but fails on KVM since pci has same priority.

See the report:
[BUG] Guest OS hangs on boot when 64bit BAR present
Avi Kivity Feb. 14, 2013, 6:23 p.m. UTC | #14
On Thu, Feb 14, 2013 at 8:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> Is there an actual real problem that needs fixing?
>
> Yes. Guests sometimes cause device BARs to temporary overlap
> the APIC range during BAR sizing. It works fine on a physical
> system but fails on KVM since pci has same priority.
>
> See the report:
> [BUG] Guest OS hangs on boot when 64bit BAR present
>

Is PCI_COMMAND_MEMORY set while this is going on?
Michael S. Tsirkin Feb. 19, 2013, 2:41 p.m. UTC | #15
On Thu, Feb 14, 2013 at 08:23:04PM +0200, Avi Kivity wrote:
> On Thu, Feb 14, 2013 at 8:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>
> >> Is there an actual real problem that needs fixing?
> >
> > Yes. Guests sometimes cause device BARs to temporary overlap
> > the APIC range during BAR sizing. It works fine on a physical
> > system but fails on KVM since pci has same priority.
> >
> > See the report:
> > [BUG] Guest OS hangs on boot when 64bit BAR present
> >
> 
> Is PCI_COMMAND_MEMORY set while this is going on?

I think Linux never clears PCI_COMMAND_MEMORY because
it's buggy in some devices.
Avi Kivity Feb. 19, 2013, 3:58 p.m. UTC | #16
On Tue, Feb 19, 2013 at 4:41 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Feb 14, 2013 at 08:23:04PM +0200, Avi Kivity wrote:
>> On Thu, Feb 14, 2013 at 8:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >>
>> >> Is there an actual real problem that needs fixing?
>> >
>> > Yes. Guests sometimes cause device BARs to temporary overlap
>> > the APIC range during BAR sizing. It works fine on a physical
>> > system but fails on KVM since pci has same priority.
>> >
>> > See the report:
>> > [BUG] Guest OS hangs on boot when 64bit BAR present
>> >
>>
>> Is PCI_COMMAND_MEMORY set while this is going on?
>
> I think Linux never clears PCI_COMMAND_MEMORY because
> it's buggy in some devices.

Ok.  Then I recommend defining the MSI message area as overlapped with
sufficient priority.  It should probably be a child of the PCI address
space.

The IOAPIC is actually closer to ISA, but again it's sufficient to
move it to the PCI address space.  I doubt its priority matters.
Michael S. Tsirkin Feb. 19, 2013, 4:08 p.m. UTC | #17
On Tue, Feb 19, 2013 at 05:58:38PM +0200, Avi Kivity wrote:
> On Tue, Feb 19, 2013 at 4:41 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Feb 14, 2013 at 08:23:04PM +0200, Avi Kivity wrote:
> >> On Thu, Feb 14, 2013 at 8:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >>
> >> >> Is there an actual real problem that needs fixing?
> >> >
> >> > Yes. Guests sometimes cause device BARs to temporary overlap
> >> > the APIC range during BAR sizing. It works fine on a physical
> >> > system but fails on KVM since pci has same priority.
> >> >
> >> > See the report:
> >> > [BUG] Guest OS hangs on boot when 64bit BAR present
> >> >
> >>
> >> Is PCI_COMMAND_MEMORY set while this is going on?
> >
> > I think Linux never clears PCI_COMMAND_MEMORY because
> > it's buggy in some devices.
> 
> Ok.  Then I recommend defining the MSI message area as overlapped with
> sufficient priority.  It should probably be a child of the PCI address
> space.
> 
> The IOAPIC is actually closer to ISA, but again it's sufficient to
> move it to the PCI address space.  I doubt its priority matters.

Well moving IOAPIC to PCI seems strange, it's not a PCI thing,
and I think it can be moved outside PCI though guests don't do it.
So I think ideally we really should have it look something like:

	sysbus -> ioapic
	       -> pci -> msi
Avi Kivity Feb. 19, 2013, 4:14 p.m. UTC | #18
On Tue, Feb 19, 2013 at 6:08 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> The IOAPIC is actually closer to ISA, but again it's sufficient to
>> move it to the PCI address space.  I doubt its priority matters.
>
> Well moving IOAPIC to PCI seems strange, it's not a PCI thing,
> and I think it can be moved outside PCI though guests don't do it.

Look at the 440fx/piix datasheets.  It's connected to the piix which
decodes its address.  So it's definitely part of the pci address
space.


> So I think ideally we really should have it look something like:
>
>         sysbus -> ioapic
>                -> pci -> msi
>
diff mbox

Patch

diff --git a/hw/sysbus.c b/hw/sysbus.c
index 6d9d1df..f039cb8 100644
--- a/hw/sysbus.c
+++ b/hw/sysbus.c
@@ -61,9 +61,8 @@  void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
         memory_region_del_subregion(get_system_memory(), dev->mmio[n].memory);
     }
     dev->mmio[n].addr = addr;
-    memory_region_add_subregion(get_system_memory(),
-                                addr,
-                                dev->mmio[n].memory);
+    memory_region_add_subregion(get_system_memory(), addr,
+                                dev->mmio[n].memory, MEMORY_PRIO_LOWEST);
 }
 
 
@@ -215,16 +214,10 @@  static char *sysbus_get_fw_dev_path(DeviceState *dev)
 }
 
 void sysbus_add_memory(SysBusDevice *dev, hwaddr addr,
-                       MemoryRegion *mem)
-{
-    memory_region_add_subregion(get_system_memory(), addr, mem);
-}
-
-void sysbus_add_memory_overlap(SysBusDevice *dev, hwaddr addr,
-                               MemoryRegion *mem, unsigned priority)
+                       MemoryRegion *mem, MemoryRegionPriority priority)
 {
-    memory_region_add_subregion_overlap(get_system_memory(), addr, mem,
-                                        priority);
+    memory_region_add_subregion(get_system_memory(), addr, mem,
+                                priority);
 }
 
 void sysbus_del_memory(SysBusDevice *dev, MemoryRegion *mem)
@@ -235,7 +228,8 @@  void sysbus_del_memory(SysBusDevice *dev, MemoryRegion *mem)
 void sysbus_add_io(SysBusDevice *dev, hwaddr addr,
                        MemoryRegion *mem)
 {
-    memory_region_add_subregion(get_system_io(), addr, mem);
+    memory_region_add_subregion(get_system_io(), addr, mem,
+                                MEMORY_PRIO_LOWEST);
 }
 
 void sysbus_del_io(SysBusDevice *dev, MemoryRegion *mem)
diff --git a/hw/sysbus.h b/hw/sysbus.h
index a7fcded..4059c03 100644
--- a/hw/sysbus.h
+++ b/hw/sysbus.h
@@ -57,9 +57,7 @@  void sysbus_init_ioports(SysBusDevice *dev, pio_addr_t ioport, pio_addr_t size);
 void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq);
 void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr);
 void sysbus_add_memory(SysBusDevice *dev, hwaddr addr,
-                       MemoryRegion *mem);
-void sysbus_add_memory_overlap(SysBusDevice *dev, hwaddr addr,
-                               MemoryRegion *mem, unsigned priority);
+                       MemoryRegion *mem, MemoryRegionPriority priority);
 void sysbus_del_memory(SysBusDevice *dev, MemoryRegion *mem);
 void sysbus_add_io(SysBusDevice *dev, hwaddr addr,
                    MemoryRegion *mem);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2322732..af1e0fa 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -147,6 +147,14 @@  struct MemoryRegion {
     MemoryRegionIoeventfd *ioeventfds;
 };
 
+typedef enum MemoryRegionPriority {
+    MEMORY_PRIO_LOWEST = 0x0,
+    MEMORY_PRIO_LOW = 0x1,
+    MEMORY_PRIO_MEDIUM = 0x2,
+    MEMORY_PRIO_HIGH = 1000,
+    MEMORY_PRIO_HIGHEST = UINT_MAX,
+} MemoryRegionPriority;
+
 struct MemoryRegionPortio {
     uint32_t offset;
     uint32_t len;
@@ -628,27 +636,9 @@  void memory_region_del_eventfd(MemoryRegion *mr,
 /**
  * memory_region_add_subregion: Add a subregion to a container.
  *
- * Adds a subregion at @offset.  The subregion may not overlap with other
- * subregions (except for those explicitly marked as overlapping).  A region
- * may only be added once as a subregion (unless removed with
- * memory_region_del_subregion()); use memory_region_init_alias() if you
- * want a region to be a subregion in multiple locations.
- *
- * @mr: the region to contain the new subregion; must be a container
- *      initialized with memory_region_init().
- * @offset: the offset relative to @mr where @subregion is added.
- * @subregion: the subregion to be added.
- */
-void memory_region_add_subregion(MemoryRegion *mr,
-                                 hwaddr offset,
-                                 MemoryRegion *subregion);
-/**
- * memory_region_add_subregion_overlap: Add a subregion to a container
- *                                      with overlap.
- *
  * Adds a subregion at @offset.  The subregion may overlap with other
  * subregions.  Conflicts are resolved by having a higher @priority hide a
- * lower @priority. Subregions without priority are taken as @priority 0.
+ * lower @priority.
  * A region may only be added once as a subregion (unless removed with
  * memory_region_del_subregion()); use memory_region_init_alias() if you
  * want a region to be a subregion in multiple locations.
@@ -659,10 +649,10 @@  void memory_region_add_subregion(MemoryRegion *mr,
  * @subregion: the subregion to be added.
  * @priority: used for resolving overlaps; highest priority wins.
  */
-void memory_region_add_subregion_overlap(MemoryRegion *mr,
-                                         hwaddr offset,
-                                         MemoryRegion *subregion,
-                                         unsigned priority);
+void memory_region_add_subregion(MemoryRegion *mr,
+                                 hwaddr offset,
+                                 MemoryRegion *subregion,
+                                 MemoryRegionPriority priority);
 
 /**
  * memory_region_get_ram_addr: Get the ram address associated with a memory
diff --git a/memory.c b/memory.c
index cd7d5e0..4a2c5cd 100644
--- a/memory.c
+++ b/memory.c
@@ -806,7 +806,6 @@  void memory_region_init(MemoryRegion *mr,
     mr->rom_device = false;
     mr->destructor = memory_region_destructor_none;
     mr->priority = 0;
-    mr->may_overlap = false;
     mr->alias = NULL;
     QTAILQ_INIT(&mr->subregions);
     memset(&mr->subregions_link, 0, sizeof mr->subregions_link);
@@ -1318,9 +1317,6 @@  static void memory_region_add_subregion_common(MemoryRegion *mr,
     subregion->parent = mr;
     subregion->addr = offset;
     QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
-        if (subregion->may_overlap || other->may_overlap) {
-            continue;
-        }
         if (int128_gt(int128_make64(offset),
                       int128_add(int128_make64(other->addr), other->size))
             || int128_le(int128_add(int128_make64(offset), subregion->size),
@@ -1353,19 +1349,9 @@  done:
 
 void memory_region_add_subregion(MemoryRegion *mr,
                                  hwaddr offset,
-                                 MemoryRegion *subregion)
+                                 MemoryRegion *subregion,
+                                 MemoryRegionPriority priority)
 {
-    subregion->may_overlap = false;
-    subregion->priority = 0;
-    memory_region_add_subregion_common(mr, offset, subregion);
-}
-
-void memory_region_add_subregion_overlap(MemoryRegion *mr,
-                                         hwaddr offset,
-                                         MemoryRegion *subregion,
-                                         unsigned priority)
-{
-    subregion->may_overlap = true;
     subregion->priority = priority;
     memory_region_add_subregion_common(mr, offset, subregion);
 }
@@ -1396,7 +1382,6 @@  void memory_region_set_address(MemoryRegion *mr, hwaddr addr)
 {
     MemoryRegion *parent = mr->parent;
     unsigned priority = mr->priority;
-    bool may_overlap = mr->may_overlap;
 
     if (addr == mr->addr || !parent) {
         mr->addr = addr;
@@ -1405,11 +1390,7 @@  void memory_region_set_address(MemoryRegion *mr, hwaddr addr)
 
     memory_region_transaction_begin();
     memory_region_del_subregion(parent, mr);
-    if (may_overlap) {
-        memory_region_add_subregion_overlap(parent, addr, mr, priority);
-    } else {
-        memory_region_add_subregion(parent, addr, mr);
-    }
+    memory_region_add_subregion(parent, addr, mr, priority);
     memory_region_transaction_commit();
 }