
[v2,2/2] RAM API: Make use of it for x86 PC

Message ID 20101101151415.3927.87944.stgit@s20.home
State New

Commit Message

Alex Williamson Nov. 1, 2010, 3:14 p.m. UTC
Register the actual VM RAM using the new API

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/pc.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

Comments

Anthony Liguori Nov. 16, 2010, 2:58 p.m. UTC | #1
On 11/01/2010 10:14 AM, Alex Williamson wrote:
> Register the actual VM RAM using the new API
>
> Signed-off-by: Alex Williamson<alex.williamson@redhat.com>
> ---
>
>   hw/pc.c |   12 ++++++------
>   1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/hw/pc.c b/hw/pc.c
> index 69b13bf..0ea6d10 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -912,14 +912,14 @@ void pc_memory_init(ram_addr_t ram_size,
>       /* allocate RAM */
>       ram_addr = qemu_ram_alloc(NULL, "pc.ram",
>                                 below_4g_mem_size + above_4g_mem_size);
> -    cpu_register_physical_memory(0, 0xa0000, ram_addr);
> -    cpu_register_physical_memory(0x100000,
> -                 below_4g_mem_size - 0x100000,
> -                 ram_addr + 0x100000);
> +
> +    qemu_ram_register(0, 0xa0000, ram_addr);
> +    qemu_ram_register(0x100000, below_4g_mem_size - 0x100000,
> +                      ram_addr + 0x100000);
>   #if TARGET_PHYS_ADDR_BITS>  32
>       if (above_4g_mem_size>  0) {
> -        cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
> -                                     ram_addr + below_4g_mem_size);
> +        qemu_ram_register(0x100000000ULL, above_4g_mem_size,
> +                          ram_addr + below_4g_mem_size);
>       }
>    

Take a look at the memory shadowing in the i440fx.  The regions of 
memory in the BIOS area can temporarily become RAM.

That's because there is normally RAM backing this space but the memory 
controller redirects writes to the ROM space.

I'm not sure of the best way to handle this, but the basic concept is: the 
RAM always exists, but when a device tries to access it, it may or may not 
be accessible as RAM at any given point in time.

Regards,

Anthony Liguori

>   #endif
>
>
>
>
>
Alex Williamson Nov. 16, 2010, 9:24 p.m. UTC | #2
On Tue, 2010-11-16 at 08:58 -0600, Anthony Liguori wrote:
> On 11/01/2010 10:14 AM, Alex Williamson wrote:
> > Register the actual VM RAM using the new API
> >
> > Signed-off-by: Alex Williamson<alex.williamson@redhat.com>
> > ---
> >
> >   hw/pc.c |   12 ++++++------
> >   1 files changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/pc.c b/hw/pc.c
> > index 69b13bf..0ea6d10 100644
> > --- a/hw/pc.c
> > +++ b/hw/pc.c
> > @@ -912,14 +912,14 @@ void pc_memory_init(ram_addr_t ram_size,
> >       /* allocate RAM */
> >       ram_addr = qemu_ram_alloc(NULL, "pc.ram",
> >                                 below_4g_mem_size + above_4g_mem_size);
> > -    cpu_register_physical_memory(0, 0xa0000, ram_addr);
> > -    cpu_register_physical_memory(0x100000,
> > -                 below_4g_mem_size - 0x100000,
> > -                 ram_addr + 0x100000);
> > +
> > +    qemu_ram_register(0, 0xa0000, ram_addr);
> > +    qemu_ram_register(0x100000, below_4g_mem_size - 0x100000,
> > +                      ram_addr + 0x100000);
> >   #if TARGET_PHYS_ADDR_BITS>  32
> >       if (above_4g_mem_size>  0) {
> > -        cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
> > -                                     ram_addr + below_4g_mem_size);
> > +        qemu_ram_register(0x100000000ULL, above_4g_mem_size,
> > +                          ram_addr + below_4g_mem_size);
> >       }
> >    
> 
> Take a look at the memory shadowing in the i440fx.  The regions of 
> memory in the BIOS area can temporarily become RAM.
> 
> That's because there is normally RAM backing this space but the memory 
> controller redirects writes to the ROM space.
> 
> Not sure the best way to handle this, but the basic concept is, RAM 
> always exists but if a device tries to access it, it may or may not be 
> accessible as RAM at any given point in time.

Gack.  For the benefit of those who want to join the fun without
digging up the spec: the magic flippable segments the i440fx can
toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
64k segment from 0xf0000 to 0xfffff.  There are read-enable and
write-enable bits for each, so the chipset can be configured to read
from the BIOS and write to memory (to set up BIOS-RAM caching), and to read
from memory and write to the BIOS (to enable BIOS-RAM caching).  The
other bit combinations are also available.
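For readers following along, the segment layout described above can be
enumerated like this (a sketch only; the helper name and struct are invented
here, not QEMU or i440fx API — the real chipset exposes these through the
PAM0..PAM6 registers):

```c
#include <stdint.h>

typedef struct PAMSegment {
    uint32_t base;
    uint32_t size;
} PAMSegment;

/* Segment 0 is the single 64k BIOS segment at 0xf0000; segments
 * 1..12 are the twelve fixed 16k segments starting at 0xc0000. */
static PAMSegment pam_segment(int i)
{
    PAMSegment s;
    if (i == 0) {
        s.base = 0xf0000;
        s.size = 0x10000;                    /* one 64k segment */
    } else {
        s.base = 0xc0000 + (i - 1) * 0x4000;
        s.size = 0x4000;                     /* twelve 16k segments */
    }
    return s;
}
```

Note that segment 12 ends exactly where the 64k segment begins, so the
thirteen segments tile 0xc0000-0xfffff with no gaps.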

For my purpose in using this to program the IOMMU with guest physical to
host virtual addresses for device assignment, it doesn't really matter
since there should never be a DMA in this range of memory.  But for a
general RAM API, I'm not sure either.  I'm tempted to say that while
this is in fact a use of RAM, the RAM is never presented to the guest as
usable system memory (E820_RAM for x86), and should therefore be
excluded from the RAM API if we're using it only to track regions that
are actual guest usable physical memory.

We had talked on irc that pc.c should be registering 0x0 to
below_4g_mem_size as ram, but now I tend to disagree with that.  The
memory backing 0xa0000-0x100000 is present, but it's not presented to
the guest as usable RAM.  What's your strict definition of what the RAM
API includes?  Is it only what the guest could consider usable RAM or
does it also include quirky chipset accelerator features like this
(everything with a guest physical address)?  Thanks,
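Under the "strict" reading Alex proposes (only E820_RAM counts), the
classification for the PC layout in this patch would look roughly like the
sketch below. The function name is hypothetical; the 0xa0000 boundary follows
the registration in the patch itself:

```c
#include <stdbool.h>
#include <stdint.h>

#define E820_RAM 1   /* usable-RAM type in the x86 E820 map */

/* Sketch: would this guest-physical address be reported as E820_RAM,
 * and hence tracked by the RAM API under the strict definition? */
static bool is_guest_usable_ram(uint64_t addr, uint64_t below_4g_mem_size)
{
    if (addr < 0xa0000) {
        return true;                     /* conventional memory */
    }
    if (addr < 0x100000) {
        return false;                    /* VGA window + BIOS area */
    }
    return addr < below_4g_mem_size;     /* high memory below 4G */
}
```

The 0xa0000-0xfffff hole is exactly the range the shadowing discussion above
is about: backed by RAM, but never E820_RAM.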

Alex
Gleb Natapov Nov. 17, 2010, 9:31 a.m. UTC | #3
On Tue, Nov 16, 2010 at 02:24:06PM -0700, Alex Williamson wrote:
> On Tue, 2010-11-16 at 08:58 -0600, Anthony Liguori wrote:
> > On 11/01/2010 10:14 AM, Alex Williamson wrote:
> > > Register the actual VM RAM using the new API
> > >
> > > Signed-off-by: Alex Williamson<alex.williamson@redhat.com>
> > > ---
> > >
> > >   hw/pc.c |   12 ++++++------
> > >   1 files changed, 6 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/hw/pc.c b/hw/pc.c
> > > index 69b13bf..0ea6d10 100644
> > > --- a/hw/pc.c
> > > +++ b/hw/pc.c
> > > @@ -912,14 +912,14 @@ void pc_memory_init(ram_addr_t ram_size,
> > >       /* allocate RAM */
> > >       ram_addr = qemu_ram_alloc(NULL, "pc.ram",
> > >                                 below_4g_mem_size + above_4g_mem_size);
> > > -    cpu_register_physical_memory(0, 0xa0000, ram_addr);
> > > -    cpu_register_physical_memory(0x100000,
> > > -                 below_4g_mem_size - 0x100000,
> > > -                 ram_addr + 0x100000);
> > > +
> > > +    qemu_ram_register(0, 0xa0000, ram_addr);
> > > +    qemu_ram_register(0x100000, below_4g_mem_size - 0x100000,
> > > +                      ram_addr + 0x100000);
> > >   #if TARGET_PHYS_ADDR_BITS>  32
> > >       if (above_4g_mem_size>  0) {
> > > -        cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
> > > -                                     ram_addr + below_4g_mem_size);
> > > +        qemu_ram_register(0x100000000ULL, above_4g_mem_size,
> > > +                          ram_addr + below_4g_mem_size);
> > >       }
> > >    
> > 
> > Take a look at the memory shadowing in the i440fx.  The regions of 
> > memory in the BIOS area can temporarily become RAM.
> > 
> > That's because there is normally RAM backing this space but the memory 
> > controller redirects writes to the ROM space.
> > 
> > Not sure the best way to handle this, but the basic concept is, RAM 
> > always exists but if a device tries to access it, it may or may not be 
> > accessible as RAM at any given point in time.
> 
> Gack.  For the benefit of those that want to join the fun without
> digging up the spec, these magic flippable segments the i440fx can
> toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
> 64k segment from 0xf0000 to 0xfffff.  There are read-enable and
> write-enable bits for each, so the chipset can be configured to read
> from the bios and write to memory (to setup BIOS-RAM caching), and read
> from memory and write to the bios (to enable BIOS-RAM caching).  The
> other bit combinations are also available.
> 
There is also 0xa0000-0xbffff, which is usually part of the framebuffer, but
the chipset can be configured to access this memory as RAM when the CPU is in
SMM mode.

> For my purpose in using this to program the IOMMU with guest physical to
> host virtual addresses for device assignment, it doesn't really matter
> since there should never be a DMA in this range of memory.  But for a
IIRC the spec defines, for each memory range, whether it is accessible from the PCI bus.

> general RAM API, I'm not sure either.  I'm tempted to say that while
> this is in fact a use of RAM, the RAM is never presented to the guest as
> usable system memory (E820_RAM for x86), and should therefore be
> excluded from the RAM API if we're using it only to track regions that
> are actual guest usable physical memory.
A guest is not only the OS (like Windows or Linux); the BIOS code is also part
of the guest, and it can access all of this memory.

> 
> We had talked on irc that pc.c should be registering 0x0 to
> below_4g_mem_size as ram, but now I tend to disagree with that.  The
> memory backing 0xa0000-0x100000 is present, but it's not presented to
> the guest as usable RAM.
It is, during SMM, if bios configured chipset to do so.
 
>                          What's your strict definition of what the RAM
> API includes?  Is it only what the guest could consider usable RAM or
> does it also include quirky chipset accelerator features like this
> (everything with a guest physical address)?  Thanks,
> 

--
			Gleb.
Anthony Liguori Nov. 17, 2010, 11:42 p.m. UTC | #4
On 11/16/2010 03:24 PM, Alex Williamson wrote:
> On Tue, 2010-11-16 at 08:58 -0600, Anthony Liguori wrote:
>    
>> On 11/01/2010 10:14 AM, Alex Williamson wrote:
>>      
>>> Register the actual VM RAM using the new API
>>>
>>> Signed-off-by: Alex Williamson<alex.williamson@redhat.com>
>>> ---
>>>
>>>    hw/pc.c |   12 ++++++------
>>>    1 files changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/hw/pc.c b/hw/pc.c
>>> index 69b13bf..0ea6d10 100644
>>> --- a/hw/pc.c
>>> +++ b/hw/pc.c
>>> @@ -912,14 +912,14 @@ void pc_memory_init(ram_addr_t ram_size,
>>>        /* allocate RAM */
>>>        ram_addr = qemu_ram_alloc(NULL, "pc.ram",
>>>                                  below_4g_mem_size + above_4g_mem_size);
>>> -    cpu_register_physical_memory(0, 0xa0000, ram_addr);
>>> -    cpu_register_physical_memory(0x100000,
>>> -                 below_4g_mem_size - 0x100000,
>>> -                 ram_addr + 0x100000);
>>> +
>>> +    qemu_ram_register(0, 0xa0000, ram_addr);
>>> +    qemu_ram_register(0x100000, below_4g_mem_size - 0x100000,
>>> +                      ram_addr + 0x100000);
>>>    #if TARGET_PHYS_ADDR_BITS>   32
>>>        if (above_4g_mem_size>   0) {
>>> -        cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
>>> -                                     ram_addr + below_4g_mem_size);
>>> +        qemu_ram_register(0x100000000ULL, above_4g_mem_size,
>>> +                          ram_addr + below_4g_mem_size);
>>>        }
>>>
>>>        
>> Take a look at the memory shadowing in the i440fx.  The regions of
>> memory in the BIOS area can temporarily become RAM.
>>
>> That's because there is normally RAM backing this space but the memory
>> controller redirects writes to the ROM space.
>>
>> Not sure the best way to handle this, but the basic concept is, RAM
>> always exists but if a device tries to access it, it may or may not be
>> accessible as RAM at any given point in time.
>>      
> Gack.  For the benefit of those that want to join the fun without
> digging up the spec, these magic flippable segments the i440fx can
> toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
> 64k segment from 0xf0000 to 0xfffff.  There are read-enable and
> write-enable bits for each, so the chipset can be configured to read
> from the bios and write to memory (to setup BIOS-RAM caching), and read
> from memory and write to the bios (to enable BIOS-RAM caching).  The
> other bit combinations are also available.
>    

Yup.  As Gleb mentions, there's the SMRAM register, which controls 
whether 0xa0000 is mapped to PCI or whether it's mapped to RAM (but KVM 
explicitly disabled SMM support).

> For my purpose in using this to program the IOMMU with guest physical to
> host virtual addresses for device assignment, it doesn't really matter
> since there should never be a DMA in this range of memory.  But for a
> general RAM API, I'm not sure either.  I'm tempted to say that while
> this is in fact a use of RAM, the RAM is never presented to the guest as
> usable system memory (E820_RAM for x86), and should therefore be
> excluded from the RAM API if we're using it only to track regions that
> are actual guest usable physical memory.
>
> We had talked on irc that pc.c should be registering 0x0 to
> below_4g_mem_size as ram, but now I tend to disagree with that.  The
> memory backing 0xa0000-0x100000 is present, but it's not presented to
> the guest as usable RAM.  What's your strict definition of what the RAM
> API includes?  Is it only what the guest could consider usable RAM or
> does it also include quirky chipset accelerator features like this
> (everything with a guest physical address)?  Thanks,
>    

Today we model a flat space that's a mix of device memory, RAM, and 
ROM.  This is not how machines work, and the limitations of this model are 
holding us back.

IRL, there's a block of RAM that's connected to a memory controller.  
The CPU is also connected to the memory controller.  Devices are 
connected to another controller which is in turn connected to the memory 
controller.  There may, in fact, be more than one controller between a 
device and the memory controller.

A controller may change the way a device sees memory in arbitrary ways.  
In fact, two controllers accessing the same page might see something 
totally different.

The idea behind the RAM API is to begin to establish this hierarchy.  
RAM is not what any particular device sees--it's actual RAM.  IOW, the 
RAM API should represent what address mapping I would get if I talked 
directly to DIMMs.

This is not what RamBlock is, even though the name would suggest 
otherwise.  RamBlocks are anything that qemu represents as cache-consistent, 
directly accessible memory.  Device ROMs and areas of device 
RAM are all allocated from the RamBlock space.

So the very first task of a RAM API is to differentiate these 
two things.  Once we have the base RAM API, we can start adding the 
proper APIs that sit on top of it (like a PCI memory API).
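The split being drawn here could be sketched as follows (hypothetical types
and names, not the actual patch API): the RAM API tracks only the mapping
you'd see talking straight to the DIMMs, independent of what any RamBlock
happens to back.

```c
#include <stdint.h>

/* One contiguous range of actual RAM, as the DIMMs see it. */
typedef struct RamRegion {
    uint64_t guest_phys;    /* address from the DIMMs' point of view */
    uint64_t size;
    uint64_t ram_offset;    /* offset into the allocated RAM block */
    struct RamRegion *next;
} RamRegion;

static RamRegion *ram_regions;

/* Something like qemu_ram_register() from the patch would append
 * each registered range to this list. */
static void ram_region_add(RamRegion *r)
{
    r->next = ram_regions;
    ram_regions = r;
}
```

Device ROMs and device RAM would deliberately never appear on this list, even
though they live in RamBlock space today.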

Regards,

Anthony Liguori

> Alex
>
>
>
Avi Kivity Nov. 18, 2010, 3:22 p.m. UTC | #5
On 11/18/2010 01:42 AM, Anthony Liguori wrote:
>> Gack.  For the benefit of those that want to join the fun without
>> digging up the spec, these magic flippable segments the i440fx can
>> toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
>> 64k segment from 0xf0000 to 0xfffff.  There are read-enable and
>> write-enable bits for each, so the chipset can be configured to read
>> from the bios and write to memory (to setup BIOS-RAM caching), and read
>> from memory and write to the bios (to enable BIOS-RAM caching).  The
>> other bit combinations are also available.
>
> Yup.  As Gleb mentions, there's the SDRAM register which controls 
> whether 0xa0000 is mapped to PCI or whether it's mapped to RAM (but 
> KVM explicitly disabled SMM support).

KVM not supporting SMM is a bug (albeit one that is likely to remain 
unresolved for a while).  Let's pretend that kvm smm support is not an 
issue.

IIUC, SMM means that there are two memory maps when the cpu accesses memory, 
one for SMM, one for non-SMM.

>
>> For my purpose in using this to program the IOMMU with guest physical to
>> host virtual addresses for device assignment, it doesn't really matter
>> since there should never be a DMA in this range of memory.  But for a
>> general RAM API, I'm not sure either.  I'm tempted to say that while
>> this is in fact a use of RAM, the RAM is never presented to the guest as
>> usable system memory (E820_RAM for x86), and should therefore be
>> excluded from the RAM API if we're using it only to track regions that
>> are actual guest usable physical memory.
>>
>> We had talked on irc that pc.c should be registering 0x0 to
>> below_4g_mem_size as ram, but now I tend to disagree with that.  The
>> memory backing 0xa0000-0x100000 is present, but it's not presented to
>> the guest as usable RAM.  What's your strict definition of what the RAM
>> API includes?  Is it only what the guest could consider usable RAM or
>> does it also include quirky chipset accelerator features like this
>> (everything with a guest physical address)?  Thanks,
>
> Today we model on flat space that's a mixed of device memory, RAM, or 
> ROM.  This is not how machines work and the limitations of this model 
> is holding us back.
>
> IRL, there's a block of RAM that's connected to a memory controller.  
> The CPU is also connected to the memory controller.  Devices are 
> connected to another controller which is in turn connected to the 
> memory controller.  There may, in fact, be more than one controller 
> between a device and the memory controller.
>
> A controller may change the way a device sees memory in arbitrary 
> ways.  In fact, two controllers accessing the same page might see 
> something totally different.
>
> The idea behind the RAM API is to begin to establish this hierarchy.  
> RAM is not what any particular device sees--it's actual RAM.  IOW, the 
> RAM API should represent what address mapping I would get if I talked 
> directly to DIMMs.
>
> This is not what RamBlock is even though the name would suggest 
> otherwise.  RamBlocks are anything that qemu represents as cache 
> consistency directly accessable memory.  Device ROMs and areas of 
> device RAM are all allocated from the RamBlock space.
>
> So the very first task of a RAM API is to simplify differentiate these 
> two things.  Once we have the base RAM API, we can start adding the 
> proper APIs that sit on top of it (like a PCI memory API).

Things aren't that bad - a ram_addr_t and a physical address are already 
different things, so we already have one level of translation.
Anthony Liguori Nov. 18, 2010, 3:46 p.m. UTC | #6
On 11/18/2010 09:22 AM, Avi Kivity wrote:
> On 11/18/2010 01:42 AM, Anthony Liguori wrote:
>>> Gack.  For the benefit of those that want to join the fun without
>>> digging up the spec, these magic flippable segments the i440fx can
>>> toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
>>> 64k segment from 0xf0000 to 0xfffff.  There are read-enable and
>>> write-enable bits for each, so the chipset can be configured to read
>>> from the bios and write to memory (to setup BIOS-RAM caching), and read
>>> from memory and write to the bios (to enable BIOS-RAM caching).  The
>>> other bit combinations are also available.
>>
>> Yup.  As Gleb mentions, there's the SDRAM register which controls 
>> whether 0xa0000 is mapped to PCI or whether it's mapped to RAM (but 
>> KVM explicitly disabled SMM support).
>
> KVM not supporting SMM is a bug (albeit one that is likely to remain 
> unresolved for a while).  Let's pretend that kvm smm support is not an 
> issue.
>
> IIUC, SMM means that there two memory maps when the cpu accesses 
> memory, one for SMM, one for non-SMM.

No.  That's not what it means.  With the i440fx, when the CPU accesses 
0xa0000, it gets forwarded to the PCI bus no different than an access to 
0xe0000.

If the CPU asserts the EXF4#/Ab7# signal, then the i440fx directs CPU 
accesses to 0xa0000 to RAM instead of the PCI bus.

Alternatively, if the SMRAM register is activated, then the i440fx will 
redirect 0xa0000 to RAM regardless of whether the CPU asserts that 
signal.  That means that even without KVM supporting SMM, this mode can 
happen.

In general, the memory controller can redirect IO accesses to RAM or to 
the PCI bus.  The PCI bus may redirect the access to the ISA bus.
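The routing decision for 0xa0000 described above reduces to something like
this sketch (invented names; the real logic lives in the i440fx's SMRAM
register handling):

```c
#include <stdbool.h>

enum target { TO_RAM, TO_PCI };

/* The access goes to RAM if the CPU asserts the SMM signal
 * (EXF4#/Ab7#) or if the SMRAM register forces the space open;
 * otherwise it is forwarded to the PCI bus. */
static enum target route_a0000(bool cpu_asserts_smm, bool smram_open)
{
    if (cpu_asserts_smm || smram_open) {
        return TO_RAM;
    }
    return TO_PCI;
}
```

The second condition is why this mode matters even without KVM SMM support:
firmware can flip the register and redirect the range with no SMM signal at
all.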

>>> For my purpose in using this to program the IOMMU with guest 
>>> physical to
>>> host virtual addresses for device assignment, it doesn't really matter
>>> since there should never be a DMA in this range of memory.  But for a
>>> general RAM API, I'm not sure either.  I'm tempted to say that while
>>> this is in fact a use of RAM, the RAM is never presented to the 
>>> guest as
>>> usable system memory (E820_RAM for x86), and should therefore be
>>> excluded from the RAM API if we're using it only to track regions that
>>> are actual guest usable physical memory.
>>>
>>> We had talked on irc that pc.c should be registering 0x0 to
>>> below_4g_mem_size as ram, but now I tend to disagree with that.  The
>>> memory backing 0xa0000-0x100000 is present, but it's not presented to
>>> the guest as usable RAM.  What's your strict definition of what the RAM
>>> API includes?  Is it only what the guest could consider usable RAM or
>>> does it also include quirky chipset accelerator features like this
>>> (everything with a guest physical address)?  Thanks,
>>
>> Today we model on flat space that's a mixed of device memory, RAM, or 
>> ROM.  This is not how machines work and the limitations of this model 
>> is holding us back.
>>
>> IRL, there's a block of RAM that's connected to a memory controller.  
>> The CPU is also connected to the memory controller.  Devices are 
>> connected to another controller which is in turn connected to the 
>> memory controller.  There may, in fact, be more than one controller 
>> between a device and the memory controller.
>>
>> A controller may change the way a device sees memory in arbitrary 
>> ways.  In fact, two controllers accessing the same page might see 
>> something totally different.
>>
>> The idea behind the RAM API is to begin to establish this hierarchy.  
>> RAM is not what any particular device sees--it's actual RAM.  IOW, 
>> the RAM API should represent what address mapping I would get if I 
>> talked directly to DIMMs.
>>
>> This is not what RamBlock is even though the name would suggest 
>> otherwise.  RamBlocks are anything that qemu represents as cache 
>> consistency directly accessable memory.  Device ROMs and areas of 
>> device RAM are all allocated from the RamBlock space.
>>
>> So the very first task of a RAM API is to simplify differentiate 
>> these two things.  Once we have the base RAM API, we can start adding 
>> the proper APIs that sit on top of it (like a PCI memory API).
>
> Things aren't that bad - a ram_addr_t and a physical address are 
> already different things, so we already have one level of translation.

Yeah, but ram_addr_t doesn't model anything meaningful IRL.  It's an 
internal implementation detail.

Regards,

Anthony Liguori
Gleb Natapov Nov. 18, 2010, 3:51 p.m. UTC | #7
On Wed, Nov 17, 2010 at 05:42:28PM -0600, Anthony Liguori wrote:
> >For my purpose in using this to program the IOMMU with guest physical to
> >host virtual addresses for device assignment, it doesn't really matter
> >since there should never be a DMA in this range of memory.  But for a
> >general RAM API, I'm not sure either.  I'm tempted to say that while
> >this is in fact a use of RAM, the RAM is never presented to the guest as
> >usable system memory (E820_RAM for x86), and should therefore be
> >excluded from the RAM API if we're using it only to track regions that
> >are actual guest usable physical memory.
> >
> >We had talked on irc that pc.c should be registering 0x0 to
> >below_4g_mem_size as ram, but now I tend to disagree with that.  The
> >memory backing 0xa0000-0x100000 is present, but it's not presented to
> >the guest as usable RAM.  What's your strict definition of what the RAM
> >API includes?  Is it only what the guest could consider usable RAM or
> >does it also include quirky chipset accelerator features like this
> >(everything with a guest physical address)?  Thanks,
> 
> Today we model on flat space that's a mixed of device memory, RAM,
> or ROM.  This is not how machines work and the limitations of this
> model is holding us back.
> 
> IRL, there's a block of RAM that's connected to a memory controller.
> The CPU is also connected to the memory controller.  Devices are
> connected to another controller which is in turn connected to the
> memory controller.  There may, in fact, be more than one controller
> between a device and the memory controller.
> 
> A controller may change the way a device sees memory in arbitrary
> ways.  In fact, two controllers accessing the same page might see
> something totally different.
> 
> The idea behind the RAM API is to begin to establish this hierarchy.
> RAM is not what any particular device sees--it's actual RAM.  IOW,
> the RAM API should represent what address mapping I would get if I
> talked directly to DIMMs.
> 
> This is not what RamBlock is even though the name would suggest
> otherwise.  RamBlocks are anything that qemu represents as cache
> consistency directly accessable memory.  Device ROMs and areas of
> device RAM are all allocated from the RamBlock space.
> 
> So the very first task of a RAM API is to simplify differentiate
> these two things.  Once we have the base RAM API, we can start
> adding the proper APIs that sit on top of it (like a PCI memory
> API).
> 
+1 for all above. What happens when a device accesses some address is
completely different from what happens when the CPU accesses the same address
(or even another device on another bus). For instance, with how MSI is
implemented now, the CPU can send an MSI by writing to the 0xfee00000 memory
range. I do not think you can do that on real HW.
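For context, 0xfee00000 is the x86 MSI/local-APIC address window: a write
there from a PCI device is decoded as a message-signalled interrupt, with the
destination ID carried in address bits 19:12. A minimal decode sketch
(helper names are made up here):

```c
#include <stdbool.h>
#include <stdint.h>

/* An address is in the MSI window if its top 12 bits are 0xfee. */
static bool is_msi_address(uint64_t addr)
{
    return (addr & 0xfff00000ULL) == 0xfee00000ULL;
}

/* Destination APIC ID lives in bits 19:12 of the MSI address. */
static uint8_t msi_dest_id(uint64_t addr)
{
    return (addr >> 12) & 0xff;
}
```

Which is exactly Gleb's point: on real hardware this decode happens on the
bus between the device and the memory controller, not in the CPU's own path.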

--
			Gleb.
Avi Kivity Nov. 18, 2010, 3:57 p.m. UTC | #8
On 11/18/2010 05:46 PM, Anthony Liguori wrote:
> On 11/18/2010 09:22 AM, Avi Kivity wrote:
>> On 11/18/2010 01:42 AM, Anthony Liguori wrote:
>>>> Gack.  For the benefit of those that want to join the fun without
>>>> digging up the spec, these magic flippable segments the i440fx can
>>>> toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
>>>> 64k segment from 0xf0000 to 0xfffff.  There are read-enable and
>>>> write-enable bits for each, so the chipset can be configured to read
>>>> from the bios and write to memory (to setup BIOS-RAM caching), and 
>>>> read
>>>> from memory and write to the bios (to enable BIOS-RAM caching).  The
>>>> other bit combinations are also available.
>>>
>>> Yup.  As Gleb mentions, there's the SDRAM register which controls 
>>> whether 0xa0000 is mapped to PCI or whether it's mapped to RAM (but 
>>> KVM explicitly disabled SMM support).
>>
>> KVM not supporting SMM is a bug (albeit one that is likely to remain 
>> unresolved for a while).  Let's pretend that kvm smm support is not 
>> an issue.
>>
>> IIUC, SMM means that there two memory maps when the cpu accesses 
>> memory, one for SMM, one for non-SMM.
>
> No.  That's not what it means.  With the i440fx, when the CPU accesses 
> 0xa0000, it gets forwarded to the PCI bus no different than an access 
> to 0xe0000.
>
> If the CPU asserts the EXF4#/Ab7# signal, then the i440fx directs CPU 
> accesses to 0xa0000 to RAM instead of the PCI bus.

That's what "two memory maps" mean.  If you have one cpu in SMM and 
another outside SMM, then those two maps are active simultaneously.

>
> Alternatively, if the SMRAM register is activated, then the i440fx 
> will redirect 0xa0000 to RAM regardless of whether the CPU asserts 
> that signal.  That means that even without KVM supporting SMM, this 
> mode can happen.

That's a single memory map that is modified under hardware control, it's 
no different than BARs and such.

>> Things aren't that bad - a ram_addr_t and a physical address are 
>> already different things, so we already have one level of translation.
>
> Yeah, but ram_addr_t doesn't model anything meaningful IRL.  It's an 
> internal implementation detail.
>

Does it matter?  We can say those are addresses on the memory bus.  
Since they are not observable anyway, who cares if they correspond with 
reality or not?
Anthony Liguori Nov. 18, 2010, 4:09 p.m. UTC | #9
On 11/18/2010 09:57 AM, Avi Kivity wrote:
> On 11/18/2010 05:46 PM, Anthony Liguori wrote:
>> On 11/18/2010 09:22 AM, Avi Kivity wrote:
>>> On 11/18/2010 01:42 AM, Anthony Liguori wrote:
>>>>> Gack.  For the benefit of those that want to join the fun without
>>>>> digging up the spec, these magic flippable segments the i440fx can
>>>>> toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
>>>>> 64k segment from 0xf0000 to 0xfffff.  There are read-enable and
>>>>> write-enable bits for each, so the chipset can be configured to read
>>>>> from the bios and write to memory (to setup BIOS-RAM caching), and 
>>>>> read
>>>>> from memory and write to the bios (to enable BIOS-RAM caching).  The
>>>>> other bit combinations are also available.
>>>>
>>>> Yup.  As Gleb mentions, there's the SDRAM register which controls 
>>>> whether 0xa0000 is mapped to PCI or whether it's mapped to RAM (but 
>>>> KVM explicitly disabled SMM support).
>>>
>>> KVM not supporting SMM is a bug (albeit one that is likely to remain 
>>> unresolved for a while).  Let's pretend that kvm smm support is not 
>>> an issue.
>>>
>>> IIUC, SMM means that there two memory maps when the cpu accesses 
>>> memory, one for SMM, one for non-SMM.
>>
>> No.  That's not what it means.  With the i440fx, when the CPU 
>> accesses 0xa0000, it gets forwarded to the PCI bus no different than 
>> an access to 0xe0000.
>>
>> If the CPU asserts the EXF4#/Ab7# signal, then the i440fx directs CPU 
>> accesses to 0xa0000 to RAM instead of the PCI bus.
>
> That's what "two memory maps" mean.  If you have one cpu in SMM and 
> another outside SMM, then those two maps are active simultaneously.

I'm not sure if more modern memory controllers do special things here, 
but for the i440fx, if any CPU asserts SMM mode, then any memory access 
to that space is going to access SMRAM.

>>
>> Alternatively, if the SMRAM register is activated, then the i440fx 
>> will redirect 0xa0000 to RAM regardless of whether the CPU asserts 
>> that signal.  That means that even without KVM supporting SMM, this 
>> mode can happen.
>
> That's a single memory map that is modified under hardware control, 
> it's no different than BARs and such.

There is a single block of RAM.

The memory controller may either forward an address unmodified to the 
RAM block or it may forward the address to the PCI bus[1].  A non-CPU 
access goes through a controller hierarchy and may be modified while it 
traverses the hierarchy.

So really, we should have a big chunk of RAM that we associate with a 
guest, with a list of intercepts that changes as the devices are 
modified.  Instead of having that list dispatch directly to a device, we 
should send all intercepted accesses to the memory controller and let 
the memory controller propagate out the access to the appropriate device.

[1] The exception is access to the local APIC.  That's handled directly by 
the CPU (or immediately outside of the CPU, before the access gets to the 
memory controller, if the local APIC is external to the CPU).
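The "big chunk of RAM plus a list of intercepts" model proposed above could
be sketched like this (all names invented; dispatch to the actual memory
controller is elided):

```c
#include <stdbool.h>
#include <stdint.h>

/* One intercepted guest-physical range. */
typedef struct Intercept {
    uint64_t base, size;
    struct Intercept *next;
} Intercept;

static Intercept *intercepts;

/* An access that hits an intercept would be handed to the memory
 * controller for routing; everything else goes straight to RAM. */
static bool access_intercepted(uint64_t addr)
{
    for (Intercept *i = intercepts; i; i = i->next) {
        if (addr >= i->base && addr < i->base + i->size) {
            return true;
        }
    }
    return false;
}
```

The intercept list would change as devices reprogram the chipset, while the
underlying RAM block itself never changes.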

>>> Things aren't that bad - a ram_addr_t and a physical address are 
>>> already different things, so we already have one level of translation.
>>
>> Yeah, but ram_addr_t doesn't model anything meaningful IRL.  It's an 
>> internal implementation detail.
>>
>
> Does it matter?  We can say those are addresses on the memory bus.  
> Since they are not observable anyway, who cares if the correspond with 
> reality or not?

It matters a lot because the life cycle of RAM is different from the 
life cycle of ROM.

For instance, the original goal was to madvise(MADV_DONTNEED) RAM on 
reboot.  You can't do that to ROM because the contents matter.

But for PV devices, we can be loose in how we define the way the devices 
interact with the rest of the system.  For instance, we can say that 
virtio-pci devices are directly connected to RAM and do not go through 
the memory controllers.  That means we could get stable mappings of the 
virtio ring.

Regards,

Anthony Liguori
Avi Kivity Nov. 18, 2010, 4:18 p.m. UTC | #10
On 11/18/2010 06:09 PM, Anthony Liguori wrote:
>> That's what "two memory maps" mean.  If you have one cpu in SMM and 
>> another outside SMM, then those two maps are active simultaneously.
>
>
> I'm not sure if more modern memory controllers do special things here, 
> but for the i440fx, if any CPU asserts SMM mode, then any memory 
> access to that space is going to access SMRAM.

How does SMP work then?

> SMM Space Open (DOPEN). When DOPEN=1 and DLCK=0, SMM space DRAM is 
> made visible even
> when CPU cycle does not indicate SMM mode access via EXF4#/Ab7# 
> signal. This is intended to help
> BIOS initialize SMM space. Software should ensure that DOPEN=1 is 
> mutually exclusive with DCLS=1.
> When DLCK is set to a 1, DOPEN is set to 0 and becomes read only.

The words "cpu cycle does not indicate SMM mode" seem to say that SMM 
accesses are made on a per-transaction basis, or so my lawyers tell me.


>
>>>
>>> Alternatively, if the SMRAM register is activated, then the i440fx 
>>> will redirect 0xa0000 to RAM regardless of whether the CPU asserts 
>>> that signal.  That means that even without KVM supporting SMM, this 
>>> mode can happen.
>>
>> That's a single memory map that is modified under hardware control, 
>> it's no different than BARs and such.
>
> There is a single block of RAM.
>
> The memory controller may either forward an address unmodified to the 
> RAM block or it may forward the address to the PCI bus[1].  A non-CPU 
> access goes through a controller hierarchy and may be modified while 
> it traverses the hierarchy.
>
> So really, we should have a big chunk of RAM that we associate with a 
> guest, with a list of intercepts that changes as the devices are 
> modified.  Instead of having that list dispatch directly to a device, 
> we should send all intercepted accesses to the memory controller and 
> let the memory controller propagate out the access to the appropriate 
> device.
>
> [1] The exception is access to the local APIC.  That's handled directly 
> by the CPU (or immediately outside of the CPU, before the access gets 
> to the memory controller, if the local APIC is external to the CPU).
>

Agree.  However the point with SMM is that the dispatch is made not only 
based on the address, but also based on SMM mode (and, unfortunately, 
can also be different based on read vs write).

>>>> Things aren't that bad - a ram_addr_t and a physical address are 
>>>> already different things, so we already have one level of translation.
>>>
>>> Yeah, but ram_addr_t doesn't model anything meaningful IRL.  It's an 
>>> internal implementation detail.
>>>
>>
> Does it matter?  We can say those are addresses on the memory bus.  
> Since they are not observable anyway, who cares if they correspond 
> with reality or not?
>
> It matters a lot because the life cycle of RAM is different from the 
> life cycle of ROM.
>
> For instance, the original goal was to madvise(MADV_DONTNEED) RAM on 
> reboot.  You can't do that to ROM because the contents matter.

I don't think you can do that to RAM either.

>
> But for PV devices, we can be loose in how we define the way the 
> devices interact with the rest of the system.  For instance, we can 
> say that virtio-pci devices are directly connected to RAM and do not 
> go through the memory controllers.  That means we could get stable 
> mappings of the virtio ring.

That wouldn't work once we have an iommu and start to assign them to 
nested guests.
Michael S. Tsirkin Nov. 18, 2010, 4:35 p.m. UTC | #11
On Thu, Nov 18, 2010 at 06:18:06PM +0200, Avi Kivity wrote:
> >But for PV devices, we can be loose in how we define the way the
> >devices interact with the rest of the system.  For instance, we
> >can say that virtio-pci devices are directly connected to RAM and
> >do not go through the memory controllers.  That means we could get
> >stable mappings of the virtio ring.
> 
> That wouldn't work once we have an iommu and start to assign them to
> nested guests.

Yea. Not sure whether I'm worried about that though.
Mixing in all the problems inherent in nested virt, PV and assigned
devices seems especially masochistic.

> -- 
> error compiling committee.c: too many arguments to function

Patch

diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..0ea6d10 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -912,14 +912,14 @@  void pc_memory_init(ram_addr_t ram_size,
     /* allocate RAM */
     ram_addr = qemu_ram_alloc(NULL, "pc.ram",
                               below_4g_mem_size + above_4g_mem_size);
-    cpu_register_physical_memory(0, 0xa0000, ram_addr);
-    cpu_register_physical_memory(0x100000,
-                 below_4g_mem_size - 0x100000,
-                 ram_addr + 0x100000);
+
+    qemu_ram_register(0, 0xa0000, ram_addr);
+    qemu_ram_register(0x100000, below_4g_mem_size - 0x100000,
+                      ram_addr + 0x100000);
 #if TARGET_PHYS_ADDR_BITS > 32
     if (above_4g_mem_size > 0) {
-        cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
-                                     ram_addr + below_4g_mem_size);
+        qemu_ram_register(0x100000000ULL, above_4g_mem_size,
+                          ram_addr + below_4g_mem_size);
     }
 #endif