Patchwork [RFC] vga: flag vga ram for notifiers

login
register
mail settings
Submitter Michael S. Tsirkin
Date March 31, 2011, 5:43 p.m.
Message ID <20110331174328.GA25133@redhat.com>
Download mbox | patch
Permalink /patch/89102/
State New
Headers show

Comments

Michael S. Tsirkin - March 31, 2011, 5:43 p.m.
Currently, vga cards that allocate vga ram,
register it as regular ram. When this happens
a lot, vhost need to get notified and flush
its memory tables, which is slow.

This was observed with cirrus vga.

As a solution, add an explicit flag when
registering vga ram, vhost-net can simply ignore it.

Long term, we might be able to use this API
to avoid the need to request
dirty loggin from devices explicitly.

Tested: with cirrus vga only.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 cpu-common.h    |   22 +++++++++++++++++-----
 exec.c          |   14 ++++++++------
 hw/cirrus_vga.c |   34 ++++++++++++++++++++++------------
 hw/g364fb.c     |    2 +-
 hw/vga-isa-mm.c |    4 ++--
 hw/vga-pci.c    |    3 ++-
 hw/vga.c        |    4 ++--
 hw/vhost.c      |    7 ++++++-
 8 files changed, 60 insertions(+), 30 deletions(-)
Anthony Liguori - March 31, 2011, 6:33 p.m.
On 03/31/2011 12:43 PM, Michael S. Tsirkin wrote:
> Currently, vga cards that allocate vga ram,
> register it as regular ram. When this happens
> a lot, vhost need to get notified and flush
> its memory tables, which is slow.
>
> This was observed with cirrus vga.
>
> As a solution, add an explicit flag when
> registering vga ram, vhost-net can simply ignore it.
>
> Long term, we might be able to use this API
> to avoid the need to request
> dirty loggin from devices explicitly.
>
> Tested: with cirrus vga only.
>
> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>

Treating vga specially is not the right approach.

You want to treat real RAM specially and only make that visible to 
vhost.  See http://wiki.qemu.org/Features/RamAPI

There is nothing special about VGA.

Regards,

Anthony Liguori
Michael S. Tsirkin - March 31, 2011, 6:49 p.m.
On Thu, Mar 31, 2011 at 01:33:58PM -0500, Anthony Liguori wrote:
> On 03/31/2011 12:43 PM, Michael S. Tsirkin wrote:
> >Currently, vga cards that allocate vga ram,
> >register it as regular ram. When this happens
> >a lot, vhost need to get notified and flush
> >its memory tables, which is slow.
> >
> >This was observed with cirrus vga.
> >
> >As a solution, add an explicit flag when
> >registering vga ram, vhost-net can simply ignore it.
> >
> >Long term, we might be able to use this API
> >to avoid the need to request
> >dirty loggin from devices explicitly.
> >
> >Tested: with cirrus vga only.
> >
> >Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> 
> Treating vga specially is not the right approach.
> 
> You want to treat real RAM specially and only make that visible to
> vhost.  See http://wiki.qemu.org/Features/RamAPI

That seems like a dead project? And VGa is unhandled there.


> There is nothing special about VGA.

It is special in that guest can control host virtual to
guest physical mappings. In this VGA is similar to 
IO rather than RAM.
Anthony Liguori - March 31, 2011, 7:01 p.m.
On 03/31/2011 01:49 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 31, 2011 at 01:33:58PM -0500, Anthony Liguori wrote:
>> On 03/31/2011 12:43 PM, Michael S. Tsirkin wrote:
>>> Currently, vga cards that allocate vga ram,
>>> register it as regular ram. When this happens
>>> a lot, vhost need to get notified and flush
>>> its memory tables, which is slow.
>>>
>>> This was observed with cirrus vga.
>>>
>>> As a solution, add an explicit flag when
>>> registering vga ram, vhost-net can simply ignore it.
>>>
>>> Long term, we might be able to use this API
>>> to avoid the need to request
>>> dirty loggin from devices explicitly.
>>>
>>> Tested: with cirrus vga only.
>>>
>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>> Treating vga specially is not the right approach.
>>
>> You want to treat real RAM specially and only make that visible to
>> vhost.  See http://wiki.qemu.org/Features/RamAPI
> That seems like a dead project? And VGa is unhandled there.

Just needs some love.

VGA is just another device.  It happens to be that we treat VGA device 
memory as something that behaves like ram occassionally but that does 
not make it RAM.

Something like vhost doesn't need to see anything but RAM.  If we have a 
mechanism to identify RAM as RAM, then vhost can only look at RAM memory 
and not worry about things like VGA.

I thought Alex had gotten a mini-version of RamAPI in but I can't seem 
to figure out what that included.  At any rate, the point is still that 
registering things that you want to exclude in vhost is the wrong 
approach, you want to explicitly mark the things you want to include.

Regards,

Anthony Liguori

>> There is nothing special about VGA.
> It is special in that guest can control host virtual to
> guest physical mappings. In this VGA is similar to
> IO rather than RAM.
>
>
Michael S. Tsirkin - March 31, 2011, 7:07 p.m.
On Thu, Mar 31, 2011 at 02:01:52PM -0500, Anthony Liguori wrote:
> On 03/31/2011 01:49 PM, Michael S. Tsirkin wrote:
> >On Thu, Mar 31, 2011 at 01:33:58PM -0500, Anthony Liguori wrote:
> >>On 03/31/2011 12:43 PM, Michael S. Tsirkin wrote:
> >>>Currently, vga cards that allocate vga ram,
> >>>register it as regular ram. When this happens
> >>>a lot, vhost need to get notified and flush
> >>>its memory tables, which is slow.
> >>>
> >>>This was observed with cirrus vga.
> >>>
> >>>As a solution, add an explicit flag when
> >>>registering vga ram, vhost-net can simply ignore it.
> >>>
> >>>Long term, we might be able to use this API
> >>>to avoid the need to request
> >>>dirty loggin from devices explicitly.
> >>>
> >>>Tested: with cirrus vga only.
> >>>
> >>>Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> >>Treating vga specially is not the right approach.
> >>
> >>You want to treat real RAM specially and only make that visible to
> >>vhost.  See http://wiki.qemu.org/Features/RamAPI
> >That seems like a dead project? And VGa is unhandled there.
> 
> Just needs some love.
> 
> VGA is just another device.  It happens to be that we treat VGA
> device memory as something that behaves like ram occassionally but
> that does not make it RAM.

If we agree on that, will a pair of functions for this work?
How about device_register_ram / device_unregister_ram ?


> Something like vhost doesn't need to see anything but RAM.  If we
> have a mechanism to identify RAM as RAM, then vhost can only look at
> RAM memory and not worry about things like VGA.
> 
> I thought Alex had gotten a mini-version of RamAPI in but I can't
> seem to figure out what that included.

Me neither.

>  At any rate, the point is
> still that registering things that you want to exclude in vhost is
> the wrong approach, you want to explicitly mark the things you want
> to include.
> 
> Regards,
> 
> Anthony Liguori

vhost just wants RAM.
Alex Williamson - March 31, 2011, 7:17 p.m.
On Thu, 2011-03-31 at 14:01 -0500, Anthony Liguori wrote:
> On 03/31/2011 01:49 PM, Michael S. Tsirkin wrote:
> > On Thu, Mar 31, 2011 at 01:33:58PM -0500, Anthony Liguori wrote:
> >> On 03/31/2011 12:43 PM, Michael S. Tsirkin wrote:
> >>> Currently, vga cards that allocate vga ram,
> >>> register it as regular ram. When this happens
> >>> a lot, vhost need to get notified and flush
> >>> its memory tables, which is slow.
> >>>
> >>> This was observed with cirrus vga.
> >>>
> >>> As a solution, add an explicit flag when
> >>> registering vga ram, vhost-net can simply ignore it.
> >>>
> >>> Long term, we might be able to use this API
> >>> to avoid the need to request
> >>> dirty loggin from devices explicitly.
> >>>
> >>> Tested: with cirrus vga only.
> >>>
> >>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> >> Treating vga specially is not the right approach.
> >>
> >> You want to treat real RAM specially and only make that visible to
> >> vhost.  See http://wiki.qemu.org/Features/RamAPI
> > That seems like a dead project? And VGa is unhandled there.
> 
> Just needs some love.
> 
> VGA is just another device.  It happens to be that we treat VGA device 
> memory as something that behaves like ram occassionally but that does 
> not make it RAM.
> 
> Something like vhost doesn't need to see anything but RAM.  If we have a 
> mechanism to identify RAM as RAM, then vhost can only look at RAM memory 
> and not worry about things like VGA.
> 
> I thought Alex had gotten a mini-version of RamAPI in but I can't seem 
> to figure out what that included.  At any rate, the point is still that 
> registering things that you want to exclude in vhost is the wrong 
> approach, you want to explicitly mark the things you want to include.

Conveniently I've just started to push my WIP VFIO patches out to a
public tree.  You can find the minimal RamAPI in these:

https://github.com/awilliam/qemu-kvm-vfio/commit/182e557b544bb6e8c6cc57a2a6075d768d224082
https://github.com/awilliam/qemu-kvm-vfio/commit/a505fe08defb4fa1720fffbdf3489289bc930264

(I reserve the right to break these commits)

VFIO only needs the ram_for_each_slot() interface to setup and teardown
the iommu, but obviously this should include memory client type
callbacks for things that only want to track memory.

Alex
Peter Maydell - March 31, 2011, 7:18 p.m.
On 31 March 2011 20:01, Anthony Liguori <anthony@codemonkey.ws> wrote:
> VGA is just another device.  It happens to be that we treat VGA device
> memory as something that behaves like ram occassionally but that does not
> make it RAM.

So, to ask a dumb question, what does make something RAM?
My take on RAM is that RAM is just another device; the only
difference is that you want to be able to implement fast
paths that go straight(ish) to target memory; but that's
an optimisation detail, not something that makes RAM
conceptually different from other devices...

-- PMM
Anthony Liguori - March 31, 2011, 7:26 p.m.
On 03/31/2011 02:07 PM, Michael S. Tsirkin wrote:
>> Just needs some love.
>>
>> VGA is just another device.  It happens to be that we treat VGA
>> device memory as something that behaves like ram occassionally but
>> that does not make it RAM.
> If we agree on that, will a pair of functions for this work?
> How about device_register_ram / device_unregister_ram ?

To register normal RAM or to register stuff that isn't RAM but looks and 
tastes like RAM?


>> Something like vhost doesn't need to see anything but RAM.  If we
>> have a mechanism to identify RAM as RAM, then vhost can only look at
>> RAM memory and not worry about things like VGA.
>>
>> I thought Alex had gotten a mini-version of RamAPI in but I can't
>> seem to figure out what that included.
> Me neither.
>
>>   At any rate, the point is
>> still that registering things that you want to exclude in vhost is
>> the wrong approach, you want to explicitly mark the things you want
>> to include.
>>
>> Regards,
>>
>> Anthony Liguori
> vhost just wants RAM.

Right, so mark RAM, and call it a day :-)

Regards,

Anthony Liguori
Anthony Liguori - March 31, 2011, 7:29 p.m.
On 03/31/2011 02:18 PM, Peter Maydell wrote:
> On 31 March 2011 20:01, Anthony Liguori<anthony@codemonkey.ws>  wrote:
>> VGA is just another device.  It happens to be that we treat VGA device
>> memory as something that behaves like ram occassionally but that does not
>> make it RAM.
> So, to ask a dumb question, what does make something RAM?

It's a made up concept that we use to make device performance faster.

Basically, RAM should include all of the memory that a reasonable device 
(that we control) would DMA to and has a relatively stable mapping.

> My take on RAM is that RAM is just another device; the only
> difference is that you want to be able to implement fast
> paths that go straight(ish) to target memory; but that's
> an optimisation detail, not something that makes RAM
> conceptually different from other devices...

Right, the trouble is, if you want to treat RAM like any other device, 
you can't get stable mappings to it which is bad for something like 
vhost-net.

Regards,

Anthony Liguori

> -- PMM
>
Michael S. Tsirkin - March 31, 2011, 8:03 p.m.
On Thu, Mar 31, 2011 at 02:26:59PM -0500, Anthony Liguori wrote:
> On 03/31/2011 02:07 PM, Michael S. Tsirkin wrote:
> >>Just needs some love.
> >>
> >>VGA is just another device.  It happens to be that we treat VGA
> >>device memory as something that behaves like ram occassionally but
> >>that does not make it RAM.
> >If we agree on that, will a pair of functions for this work?
> >How about device_register_ram / device_unregister_ram ?
> 
> To register normal RAM or to register stuff that isn't RAM but looks
> and tastes like RAM?

The later.

> >>Something like vhost doesn't need to see anything but RAM.  If we
> >>have a mechanism to identify RAM as RAM, then vhost can only look at
> >>RAM memory and not worry about things like VGA.
> >>
> >>I thought Alex had gotten a mini-version of RamAPI in but I can't
> >>seem to figure out what that included.
> >Me neither.
> >
> >>  At any rate, the point is
> >>still that registering things that you want to exclude in vhost is
> >>the wrong approach, you want to explicitly mark the things you want
> >>to include.
> >>
> >>Regards,
> >>
> >>Anthony Liguori
> >vhost just wants RAM.
> 
> Right, so mark RAM, and call it a day :-)
> 
> Regards,
> 
> Anthony Liguori
>
Peter Maydell - March 31, 2011, 8:12 p.m.
On 31 March 2011 20:29, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 03/31/2011 02:18 PM, Peter Maydell wrote:
>> So, to ask a dumb question, what does make something RAM?
>
> It's a made up concept that we use to make device performance faster.
>
> Basically, RAM should include all of the memory that a reasonable device
> (that we control) would DMA to and has a relatively stable mapping.

This seems rather vague :-) "Suitable as a target for DMA"
seems more like a guest concept than a model one.

>> My take on RAM is that RAM is just another device; the only
>> difference is that you want to be able to implement fast
>> paths that go straight(ish) to target memory; but that's
>> an optimisation detail, not something that makes RAM
>> conceptually different from other devices...
>
> Right, the trouble is, if you want to treat RAM like any other device, you
> can't get stable mappings to it which is bad for something like vhost-net.

Well, obviously you need to be able to revoke the permission
to use the fastpath pointer to the underlying memory. But you
need to be able to do that anyhow, to cover cases where (eg) the
guest has just written to some register that remaps the bottom
part of the address space so it's ROM rather than RAM, or whatever.
It's just a feature your optimisation needs to have. Equally, you
don't remap unless you have to, but if the mapping's changed then
it's changed...

-- PMM
Anthony Liguori - March 31, 2011, 8:21 p.m.
On 03/31/2011 03:03 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 31, 2011 at 02:26:59PM -0500, Anthony Liguori wrote:
>> On 03/31/2011 02:07 PM, Michael S. Tsirkin wrote:
>>>> Just needs some love.
>>>>
>>>> VGA is just another device.  It happens to be that we treat VGA
>>>> device memory as something that behaves like ram occassionally but
>>>> that does not make it RAM.
>>> If we agree on that, will a pair of functions for this work?
>>> How about device_register_ram / device_unregister_ram ?
>> To register normal RAM or to register stuff that isn't RAM but looks
>> and tastes like RAM?
> The later.

If you're going to take that approach (and I'd strongly advise you to 
reconsider :-)), I'd at least suggest making the name a bit clearer.  
For instance, device_shadow_memory_as_ram() or something along those lines.

Regards,

Anthony Liguori

>>>> Something like vhost doesn't need to see anything but RAM.  If we
>>>> have a mechanism to identify RAM as RAM, then vhost can only look at
>>>> RAM memory and not worry about things like VGA.
>>>>
>>>> I thought Alex had gotten a mini-version of RamAPI in but I can't
>>>> seem to figure out what that included.
>>> Me neither.
>>>
>>>>   At any rate, the point is
>>>> still that registering things that you want to exclude in vhost is
>>>> the wrong approach, you want to explicitly mark the things you want
>>>> to include.
>>>>
>>>> Regards,
>>>>
>>>> Anthony Liguori
>>> vhost just wants RAM.
>> Right, so mark RAM, and call it a day :-)
>>
>> Regards,
>>
>> Anthony Liguori
>>
Anthony Liguori - March 31, 2011, 8:23 p.m.
On 03/31/2011 03:12 PM, Peter Maydell wrote:
> On 31 March 2011 20:29, Anthony Liguori<anthony@codemonkey.ws>  wrote:
>> On 03/31/2011 02:18 PM, Peter Maydell wrote:
>>> So, to ask a dumb question, what does make something RAM?
>> It's a made up concept that we use to make device performance faster.
>>
>> Basically, RAM should include all of the memory that a reasonable device
>> (that we control) would DMA to and has a relatively stable mapping.
> This seems rather vague :-) "Suitable as a target for DMA"
> seems more like a guest concept than a model one.

This is really exploited for paravirtual I/O (namely virtio) so it does 
cross into a guest concept.

>> Right, the trouble is, if you want to treat RAM like any other device, you
>> can't get stable mappings to it which is bad for something like vhost-net.
> Well, obviously you need to be able to revoke the permission
> to use the fastpath pointer to the underlying memory. But you
> need to be able to do that anyhow, to cover cases where (eg) the
> guest has just written to some register that remaps the bottom
> part of the address space so it's ROM rather than RAM, or whatever.
> It's just a feature your optimisation needs to have. Equally, you
> don't remap unless you have to, but if the mapping's changed then
> it's changed...

Right, the trouble now is that there's no way to distinguish between 
mapping where 1) we don't care about them in virtio and 2) they change 
frequently.

Maybe the right approach here is to just use a virtio specific API and 
register RAM as register_virtio_dma_area().

Regards,

Anthony Liguori

> -- PMM
>
Michael S. Tsirkin - March 31, 2011, 9:26 p.m.
On Thu, Mar 31, 2011 at 02:29:50PM -0500, Anthony Liguori wrote:
> On 03/31/2011 02:18 PM, Peter Maydell wrote:
> >On 31 March 2011 20:01, Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >>VGA is just another device.  It happens to be that we treat VGA device
> >>memory as something that behaves like ram occassionally but that does not
> >>make it RAM.
> >So, to ask a dumb question, what does make something RAM?
> 
> It's a made up concept that we use to make device performance faster.
> 
> Basically, RAM should include all of the memory that a reasonable
> device (that we control) would DMA to and has a relatively stable
> mapping.
> 
> >My take on RAM is that RAM is just another device; the only
> >difference is that you want to be able to implement fast
> >paths that go straight(ish) to target memory; but that's
> >an optimisation detail, not something that makes RAM
> >conceptually different from other devices...
> 
> Right, the trouble is, if you want to treat RAM like any other
> device, you can't get stable mappings to it which is bad for
> something like vhost-net.
> 
> Regards,
> 
> Anthony Liguori

Not only that I guess. Removing the VGA memory with the baloon
will likely also be a bad idea.
Peter Maydell - March 31, 2011, 9:32 p.m.
On 31 March 2011 21:23, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 03/31/2011 03:12 PM, Peter Maydell wrote:
>> Well, obviously you need to be able to revoke the permission
>> to use the fastpath pointer to the underlying memory. But you
>> need to be able to do that anyhow, to cover cases where (eg) the
>> guest has just written to some register that remaps the bottom
>> part of the address space so it's ROM rather than RAM, or whatever.
>> It's just a feature your optimisation needs to have. Equally, you
>> don't remap unless you have to, but if the mapping's changed then
>> it's changed...
>
> Right, the trouble now is that there's no way to distinguish between mapping
> where 1) we don't care about them in virtio and 2) they change frequently.

Aha. Thanks for the explanation.

> Maybe the right approach here is to just use a virtio specific API and
> register RAM as register_virtio_dma_area().

That seems like a clearer API, yes. I think it makes it much more
obvious what it's trying to achieve.

-- PMM
Anthony Liguori - March 31, 2011, 9:32 p.m.
On 03/31/2011 04:26 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 31, 2011 at 02:29:50PM -0500, Anthony Liguori wrote:
>> On 03/31/2011 02:18 PM, Peter Maydell wrote:
>>> On 31 March 2011 20:01, Anthony Liguori<anthony@codemonkey.ws>   wrote:
>>>> VGA is just another device.  It happens to be that we treat VGA device
>>>> memory as something that behaves like ram occassionally but that does not
>>>> make it RAM.
>>> So, to ask a dumb question, what does make something RAM?
>> It's a made up concept that we use to make device performance faster.
>>
>> Basically, RAM should include all of the memory that a reasonable
>> device (that we control) would DMA to and has a relatively stable
>> mapping.
>>
>>> My take on RAM is that RAM is just another device; the only
>>> difference is that you want to be able to implement fast
>>> paths that go straight(ish) to target memory; but that's
>>> an optimisation detail, not something that makes RAM
>>> conceptually different from other devices...
>> Right, the trouble is, if you want to treat RAM like any other
>> device, you can't get stable mappings to it which is bad for
>> something like vhost-net.
>>
>> Regards,
>>
>> Anthony Liguori
> Not only that I guess. Removing the VGA memory with the baloon
> will likely also be a bad idea.

It's just the equivalent of a memset(0).  It would be a silly thing for 
a guest to do but not somethign to be concerned about.

Regards,

Anthony Liguori
Michael S. Tsirkin - March 31, 2011, 9:37 p.m.
On Thu, Mar 31, 2011 at 04:32:44PM -0500, Anthony Liguori wrote:
> On 03/31/2011 04:26 PM, Michael S. Tsirkin wrote:
> >On Thu, Mar 31, 2011 at 02:29:50PM -0500, Anthony Liguori wrote:
> >>On 03/31/2011 02:18 PM, Peter Maydell wrote:
> >>>On 31 March 2011 20:01, Anthony Liguori<anthony@codemonkey.ws>   wrote:
> >>>>VGA is just another device.  It happens to be that we treat VGA device
> >>>>memory as something that behaves like ram occassionally but that does not
> >>>>make it RAM.
> >>>So, to ask a dumb question, what does make something RAM?
> >>It's a made up concept that we use to make device performance faster.
> >>
> >>Basically, RAM should include all of the memory that a reasonable
> >>device (that we control) would DMA to and has a relatively stable
> >>mapping.
> >>
> >>>My take on RAM is that RAM is just another device; the only
> >>>difference is that you want to be able to implement fast
> >>>paths that go straight(ish) to target memory; but that's
> >>>an optimisation detail, not something that makes RAM
> >>>conceptually different from other devices...
> >>Right, the trouble is, if you want to treat RAM like any other
> >>device, you can't get stable mappings to it which is bad for
> >>something like vhost-net.
> >>
> >>Regards,
> >>
> >>Anthony Liguori
> >Not only that I guess. Removing the VGA memory with the baloon
> >will likely also be a bad idea.
> 
> It's just the equivalent of a memset(0).  It would be a silly thing
> for a guest to do but not somethign to be concerned about.
> 
> Regards,
> 
> Anthony Liguori
> 


BTW, what is IO_MEM_NOTDIRTY?
I thought another way would be to replace IO_MEM_RAM with
IO_MEM_DEVICE_SHADOW_RAM in these cases but no idea
what the right value for that enum would be.
Michael S. Tsirkin - March 31, 2011, 9:38 p.m.
On Thu, Mar 31, 2011 at 10:32:11PM +0100, Peter Maydell wrote:
> On 31 March 2011 21:23, Anthony Liguori <anthony@codemonkey.ws> wrote:
> > On 03/31/2011 03:12 PM, Peter Maydell wrote:
> >> Well, obviously you need to be able to revoke the permission
> >> to use the fastpath pointer to the underlying memory. But you
> >> need to be able to do that anyhow, to cover cases where (eg) the
> >> guest has just written to some register that remaps the bottom
> >> part of the address space so it's ROM rather than RAM, or whatever.
> >> It's just a feature your optimisation needs to have. Equally, you
> >> don't remap unless you have to, but if the mapping's changed then
> >> it's changed...
> >
> > Right, the trouble now is that there's no way to distinguish between mapping
> > where 1) we don't care about them in virtio and 2) they change frequently.
> 
> Aha. Thanks for the explanation.
> 
> > Maybe the right approach here is to just use a virtio specific API and
> > register RAM as register_virtio_dma_area().
> 
> That seems like a clearer API, yes. I think it makes it much more
> obvious what it's trying to achieve.
> 
> -- PMM

Maybe register_dma_area - its' not 100% virtio specific.
Anthony Liguori - March 31, 2011, 9:48 p.m.
On 03/31/2011 04:37 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 31, 2011 at 04:32:44PM -0500, Anthony Liguori wrote:
>> On 03/31/2011 04:26 PM, Michael S. Tsirkin wrote:
>>> On Thu, Mar 31, 2011 at 02:29:50PM -0500, Anthony Liguori wrote:
>>>> On 03/31/2011 02:18 PM, Peter Maydell wrote:
>>>>> On 31 March 2011 20:01, Anthony Liguori<anthony@codemonkey.ws>    wrote:
>>>>>> VGA is just another device.  It happens to be that we treat VGA device
>>>>>> memory as something that behaves like ram occassionally but that does not
>>>>>> make it RAM.
>>>>> So, to ask a dumb question, what does make something RAM?
>>>> It's a made up concept that we use to make device performance faster.
>>>>
>>>> Basically, RAM should include all of the memory that a reasonable
>>>> device (that we control) would DMA to and has a relatively stable
>>>> mapping.
>>>>
>>>>> My take on RAM is that RAM is just another device; the only
>>>>> difference is that you want to be able to implement fast
>>>>> paths that go straight(ish) to target memory; but that's
>>>>> an optimisation detail, not something that makes RAM
>>>>> conceptually different from other devices...
>>>> Right, the trouble is, if you want to treat RAM like any other
>>>> device, you can't get stable mappings to it which is bad for
>>>> something like vhost-net.
>>>>
>>>> Regards,
>>>>
>>>> Anthony Liguori
>>> Not only that I guess. Removing the VGA memory with the baloon
>>> will likely also be a bad idea.
>> It's just the equivalent of a memset(0).  It would be a silly thing
>> for a guest to do but not somethign to be concerned about.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>
> BTW, what is IO_MEM_NOTDIRTY?

TCG needs to keep track of dirty memory in order to be able to deal with 
self modifying code.  IO_MEM_NOTDIRTY is the state that RAM takes before 
the first write.  I don't think IO_MEM_NOTDIRTY would ever show up in 
the l1_phys_map but it will show up in the iotlb.  The iotlb uses the 
same dispatch code as the l1_phys_map though.

Regards,

Anthony Liguori

> I thought another way would be to replace IO_MEM_RAM with
> IO_MEM_DEVICE_SHADOW_RAM in these cases but no idea
> what the right value for that enum would be.
>
Anthony Liguori - March 31, 2011, 9:49 p.m.
On 03/31/2011 04:38 PM, Michael S. Tsirkin wrote:
>
>> That seems like a clearer API, yes. I think it makes it much more
>> obvious what it's trying to achieve.
>>
>> -- PMM
> Maybe register_dma_area - its' not 100% virtio specific.

It's never been clear to me whether that's true or not.  I've heard 
mixed things about whether devices DMA to other devices.  I've never 
been able to find something in a specification stating authoritatively 
one way or another.

Regards,

Anthony Liguori
Michael S. Tsirkin - March 31, 2011, 11:42 p.m.
On Thu, Mar 31, 2011 at 04:49:46PM -0500, Anthony Liguori wrote:
> On 03/31/2011 04:38 PM, Michael S. Tsirkin wrote:
> >
> >>That seems like a clearer API, yes. I think it makes it much more
> >>obvious what it's trying to achieve.
> >>
> >>-- PMM
> >Maybe register_dma_area - its' not 100% virtio specific.
> 
> It's never been clear to me whether that's true or not.  I've heard
> mixed things about whether devices DMA to other devices.  I've never
> been able to find something in a specification stating
> authoritatively one way or another.
> 
> Regards,
> 
> Anthony Liguori
> 

AFAIK the capability of cross-talk between PCI devices
exists in PCI and is optional in PCI Express.

PCI spec says:
	Full multi-master capability allowing any PCI master peer-to-peer
	access to any PCI master/target.

The Express spec says:
	"The capability to route peer-to-peer transactions between hierarchy
	domains through a Root Complex is optional and implementation dependent.
	For example, an implementation may incorporate a real or virtual Switch
	internally within the Root Complex to enable full peer-to- peer support
	in a software transparent way."

However I don't think guests use this with devices we emulate in any way.

Haven't looked at ISA.
Anthony Liguori - April 1, 2011, 12:44 a.m.
On 03/31/2011 06:42 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 31, 2011 at 04:49:46PM -0500, Anthony Liguori wrote:
>> On 03/31/2011 04:38 PM, Michael S. Tsirkin wrote:
>>>> That seems like a clearer API, yes. I think it makes it much more
>>>> obvious what it's trying to achieve.
>>>>
>>>> -- PMM
>>> Maybe register_dma_area - its' not 100% virtio specific.
>> It's never been clear to me whether that's true or not.  I've heard
>> mixed things about whether devices DMA to other devices.  I've never
>> been able to find something in a specification stating
>> authoritatively one way or another.
>>
>> Regards,
>>
>> Anthony Liguori
>>
> AFAIK the capability of cross-talk between PCI devices
> exists in PCI and is optional in PCI Express.
>
> PCI spec says:
> 	Full multi-master capability allowing any PCI master peer-to-peer
> 	access to any PCI master/target.
>
> The Express spec says:
> 	"The capability to route peer-to-peer transactions between hierarchy
> 	domains through a Root Complex is optional and implementation dependent.
> 	For example, an implementation may incorporate a real or virtual Switch
> 	internally within the Root Complex to enable full peer-to- peer support
> 	in a software transparent way."

What's not clear to me though, is whether peer-to-peer transactions are 
done via a special PCI mechanism or whether it's done by doing a I/O 
access to the address that the device happens to be mapped to.

Regards,

Anthony Liguori

> However I don't think guests use this with devices we emulate in any way.
>
> Haven't looked at ISA.
>
Michael S. Tsirkin - April 1, 2011, 1:07 a.m.
On Thu, Mar 31, 2011 at 07:44:35PM -0500, Anthony Liguori wrote:
> On 03/31/2011 06:42 PM, Michael S. Tsirkin wrote:
> >On Thu, Mar 31, 2011 at 04:49:46PM -0500, Anthony Liguori wrote:
> >>On 03/31/2011 04:38 PM, Michael S. Tsirkin wrote:
> >>>>That seems like a clearer API, yes. I think it makes it much more
> >>>>obvious what it's trying to achieve.
> >>>>
> >>>>-- PMM
> >>>Maybe register_dma_area - its' not 100% virtio specific.
> >>It's never been clear to me whether that's true or not.  I've heard
> >>mixed things about whether devices DMA to other devices.  I've never
> >>been able to find something in a specification stating
> >>authoritatively one way or another.
> >>
> >>Regards,
> >>
> >>Anthony Liguori
> >>
> >AFAIK the capability of cross-talk between PCI devices
> >exists in PCI and is optional in PCI Express.
> >
> >PCI spec says:
> >	Full multi-master capability allowing any PCI master peer-to-peer
> >	access to any PCI master/target.
> >
> >The Express spec says:
> >	"The capability to route peer-to-peer transactions between hierarchy
> >	domains through a Root Complex is optional and implementation dependent.
> >	For example, an implementation may incorporate a real or virtual Switch
> >	internally within the Root Complex to enable full peer-to- peer support
> >	in a software transparent way."
> 
> What's not clear to me though, is whether peer-to-peer transactions
> are done via a special PCI mechanism or whether it's done by doing a
> I/O access to the address that the device happens to be mapped to.

Section 2.2.4.1:
Address routing is used with Memory and I/O Requests.
Peter Maydell - April 1, 2011, 7:12 a.m.
On 31 March 2011 22:38, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Mar 31, 2011 at 10:32:11PM +0100, Peter Maydell wrote:
>> On 31 March 2011 21:23, Anthony Liguori <anthony@codemonkey.ws> wrote:
>> > Maybe the right approach here is to just use a virtio specific API and
>> > register RAM as register_virtio_dma_area().
>>
>> That seems like a clearer API, yes. I think it makes it much more
>> obvious what it's trying to achieve.

> Maybe register_dma_area - its' not 100% virtio specific.

Presumably it is specific to virtualisation-aware devices
though? Guest DMA has to just work to all the locations you
can DMA to/from on hardware, right?

-- PMM
Michael S. Tsirkin - April 1, 2011, 11:21 a.m.
On Fri, Apr 01, 2011 at 08:12:48AM +0100, Peter Maydell wrote:
> On 31 March 2011 22:38, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Mar 31, 2011 at 10:32:11PM +0100, Peter Maydell wrote:
> >> On 31 March 2011 21:23, Anthony Liguori <anthony@codemonkey.ws> wrote:
> >> > Maybe the right approach here is to just use a virtio specific API and
> >> > register RAM as register_virtio_dma_area().
> >>
> >> That seems like a clearer API, yes. I think it makes it much more
> >> obvious what it's trying to achieve.
> 
> > Maybe register_dma_area - its' not 100% virtio specific.
> 
> Presumably it is specific to virtualisation-aware devices
> though? Guest DMA has to just work to all the locations you
> can DMA to/from on hardware, right?
> 
> -- PMM

I guess so, yes.  But it might not be possible e.g. for a PCI device
to DMA into an ISA device.

Patch

diff --git a/cpu-common.h b/cpu-common.h
index acb91ac..e0477e9 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -18,10 +18,21 @@  typedef unsigned long ram_addr_t;
 typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, uint32_t value);
 typedef uint32_t CPUReadMemoryFunc(void *opaque, target_phys_addr_t addr);
 
-void cpu_register_physical_memory_offset(target_phys_addr_t start_addr,
-                                         ram_addr_t size,
-                                         ram_addr_t phys_offset,
-                                         ram_addr_t region_offset);
+void cpu_register_physical_memory_vga(target_phys_addr_t start_addr,
+                                      ram_addr_t size,
+                                      ram_addr_t phys_offset,
+                                      ram_addr_t region_offset,
+                                      bool vga_ram);
+
+static inline void cpu_register_physical_memory_offset(target_phys_addr_t start_addr,
+                                                       ram_addr_t size,
+                                                       ram_addr_t phys_offset,
+                                                       ram_addr_t region_offset)
+{
+    cpu_register_physical_memory_vga(start_addr, size, phys_offset,
+                                     region_offset, false);
+}
+
 static inline void cpu_register_physical_memory(target_phys_addr_t start_addr,
                                                 ram_addr_t size,
                                                 ram_addr_t phys_offset)
@@ -69,7 +80,8 @@  struct CPUPhysMemoryClient {
     void (*set_memory)(struct CPUPhysMemoryClient *client,
                        target_phys_addr_t start_addr,
                        ram_addr_t size,
-                       ram_addr_t phys_offset);
+                       ram_addr_t phys_offset,
+                       bool vga_ram);
     int (*sync_dirty_bitmap)(struct CPUPhysMemoryClient *client,
                              target_phys_addr_t start_addr,
                              target_phys_addr_t end_addr);
diff --git a/exec.c b/exec.c
index b992016..6435378 100644
--- a/exec.c
+++ b/exec.c
@@ -1635,11 +1635,12 @@  static QLIST_HEAD(memory_client_list, CPUPhysMemoryClient) memory_client_list
 
 static void cpu_notify_set_memory(target_phys_addr_t start_addr,
 				  ram_addr_t size,
-				  ram_addr_t phys_offset)
+				  ram_addr_t phys_offset,
+                                  bool vga_ram)
 {
     CPUPhysMemoryClient *client;
     QLIST_FOREACH(client, &memory_client_list, list) {
-        client->set_memory(client, start_addr, size, phys_offset);
+        client->set_memory(client, start_addr, size, phys_offset, vga_ram);
     }
 }
 
@@ -1682,7 +1683,7 @@  static void phys_page_for_each_in_l1_map(PhysPageDesc **phys_map,
                 continue;
             }
             client->set_memory(client, pd[l2].region_offset,
-                               TARGET_PAGE_SIZE, pd[l2].phys_offset);
+                               TARGET_PAGE_SIZE, pd[l2].phys_offset, false);
         }
     }
 }
@@ -2418,10 +2419,11 @@  static void *subpage_init (target_phys_addr_t base, ram_addr_t *phys,
    start_addr and region_offset are rounded down to a page boundary
    before calculating this offset.  This should not be a problem unless
    the low bits of start_addr and region_offset differ.  */
-void cpu_register_physical_memory_offset(target_phys_addr_t start_addr,
+void cpu_register_physical_memory_vga(target_phys_addr_t start_addr,
                                          ram_addr_t size,
                                          ram_addr_t phys_offset,
-                                         ram_addr_t region_offset)
+                                         ram_addr_t region_offset,
+                                         bool vga_ram)
 {
     target_phys_addr_t addr, end_addr;
     PhysPageDesc *p;
@@ -2432,7 +2434,7 @@  void cpu_register_physical_memory_offset(target_phys_addr_t start_addr,
     if (kvm_enabled())
         kvm_set_phys_mem(start_addr, size, phys_offset);
 
-    cpu_notify_set_memory(start_addr, size, phys_offset);
+    cpu_notify_set_memory(start_addr, size, phys_offset, vga_ram);
 
     if (phys_offset == IO_MEM_UNASSIGNED) {
         region_offset = start_addr;
diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index d04adeb..435d914 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -2557,7 +2557,9 @@  static void map_linear_vram(CirrusVGAState *s)
     if (!s->vga.map_addr && s->vga.lfb_addr && s->vga.lfb_end) {
         s->vga.map_addr = s->vga.lfb_addr;
         s->vga.map_end = s->vga.lfb_end;
-        cpu_register_physical_memory(s->vga.map_addr, s->vga.map_end - s->vga.map_addr, s->vga.vram_offset);
+        cpu_register_physical_memory_vga(s->vga.map_addr,
+					 s->vga.map_end - s->vga.map_addr,
+					 s->vga.vram_offset, 0, true);
     }
 
     if (!s->vga.map_addr)
@@ -2566,26 +2568,34 @@  static void map_linear_vram(CirrusVGAState *s)
 #ifndef TARGET_IA64
     s->vga.lfb_vram_mapped = 0;
 
-    cpu_register_physical_memory(isa_mem_base + 0xa0000, 0x8000,
-                                (s->vga.vram_offset + s->cirrus_bank_base[0]) | IO_MEM_UNASSIGNED);
-    cpu_register_physical_memory(isa_mem_base + 0xa8000, 0x8000,
-                                (s->vga.vram_offset + s->cirrus_bank_base[1]) | IO_MEM_UNASSIGNED);
+    cpu_register_physical_memory_vga(isa_mem_base + 0xa0000, 0x8000,
+				     (s->vga.vram_offset +
+				      s->cirrus_bank_base[0]) |
+				     IO_MEM_UNASSIGNED, 0, true);
+    cpu_register_physical_memory_vga(isa_mem_base + 0xa8000, 0x8000,
+				     (s->vga.vram_offset +
+				      s->cirrus_bank_base[1]) |
+				     IO_MEM_UNASSIGNED, 0, true);
     if (!(s->cirrus_srcptr != s->cirrus_srcptr_end)
         && !((s->vga.sr[0x07] & 0x01) == 0)
         && !((s->vga.gr[0x0B] & 0x14) == 0x14)
         && !(s->vga.gr[0x0B] & 0x02)) {
 
         vga_dirty_vga_stop(&s->vga);
-        cpu_register_physical_memory(isa_mem_base + 0xa0000, 0x8000,
-                                    (s->vga.vram_offset + s->cirrus_bank_base[0]) | IO_MEM_RAM);
-        cpu_register_physical_memory(isa_mem_base + 0xa8000, 0x8000,
-                                    (s->vga.vram_offset + s->cirrus_bank_base[1]) | IO_MEM_RAM);
+        cpu_register_physical_memory_vga(isa_mem_base + 0xa0000, 0x8000,
+					 (s->vga.vram_offset +
+					  s->cirrus_bank_base[0]) |
+					 IO_MEM_RAM, 0, true);
+        cpu_register_physical_memory_vga(isa_mem_base + 0xa8000, 0x8000,
+					 (s->vga.vram_offset +
+					  s->cirrus_bank_base[1]) |
+					 IO_MEM_RAM, 0, true);
 
         s->vga.lfb_vram_mapped = 1;
     }
     else {
         cpu_register_physical_memory(isa_mem_base + 0xa0000, 0x20000,
                                      s->vga.vga_io_memory);
     }
 #endif
 
diff --git a/hw/g364fb.c b/hw/g364fb.c
index 3c8fb98..3734902 100644
--- a/hw/g364fb.c
+++ b/hw/g364fb.c
@@ -605,7 +605,7 @@  int g364fb_mm_init(target_phys_addr_t vram_base,
                                  g364fb_invalidate_display,
                                  g364fb_screen_dump, NULL, s);
 
-    cpu_register_physical_memory(vram_base, s->vram_size, s->vram_offset);
+    cpu_register_physical_memory_vga(vram_base, s->vram_size, s->vram_offset, 0, true);
 
     io_ctrl = cpu_register_io_memory(g364fb_ctrl_read, g364fb_ctrl_write, s);
     cpu_register_physical_memory(ctrl_base, 0x200000, io_ctrl);
diff --git a/hw/vga-isa-mm.c b/hw/vga-isa-mm.c
index b4ff23c..500507d 100644
--- a/hw/vga-isa-mm.c
+++ b/hw/vga-isa-mm.c
@@ -123,8 +123,8 @@  int isa_vga_mm_init(target_phys_addr_t vram_base,
 
 #ifdef CONFIG_BOCHS_VBE
     /* XXX: use optimized standard vga accesses */
-    cpu_register_physical_memory(VBE_DISPI_LFB_PHYSICAL_ADDRESS,
-                                 VGA_RAM_SIZE, s->vga.vram_offset);
+    cpu_register_physical_memory_vga(VBE_DISPI_LFB_PHYSICAL_ADDRESS,
+                                     VGA_RAM_SIZE, s->vga.vram_offset, 0, true);
 #endif
     return 0;
 }
diff --git a/hw/vga-pci.c b/hw/vga-pci.c
index f6fb1b3..3a673cf 100644
--- a/hw/vga-pci.c
+++ b/hw/vga-pci.c
@@ -55,7 +55,8 @@  static void vga_map(PCIDevice *pci_dev, int region_num,
     if (region_num == PCI_ROM_SLOT) {
         cpu_register_physical_memory(addr, s->bios_size, s->bios_offset);
     } else {
-        cpu_register_physical_memory(addr, s->vram_size, s->vram_offset);
+        cpu_register_physical_memory_vga(addr, s->vram_size, s->vram_offset, 0,
+                                         true);
         s->map_addr = addr;
         s->map_end = addr + s->vram_size;
         vga_dirty_vga_start(s);
diff --git a/hw/vga.c b/hw/vga.c
index 1d269d5..a742dd4 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -2359,8 +2359,8 @@  void vga_init_vbe(VGACommonState *s)
 {
 #ifdef CONFIG_BOCHS_VBE
     /* XXX: use optimized standard vga accesses */
-    cpu_register_physical_memory(VBE_DISPI_LFB_PHYSICAL_ADDRESS,
-                                 VGA_RAM_SIZE, s->vram_offset);
+    cpu_register_physical_memory_vga(VBE_DISPI_LFB_PHYSICAL_ADDRESS,
+                                     VGA_RAM_SIZE, s->vram_offset,0 , true);
     s->vbe_mapped = 1;
 #endif 
 }
diff --git a/hw/vhost.c b/hw/vhost.c
index 8e28fd9..27d08c0 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -379,7 +379,8 @@  static bool vhost_dev_cmp_memory(struct vhost_dev *dev,
 static void vhost_client_set_memory(CPUPhysMemoryClient *client,
                                     target_phys_addr_t start_addr,
                                     ram_addr_t size,
-                                    ram_addr_t phys_offset)
+                                    ram_addr_t phys_offset,
+                                    bool vga_ram)
 {
     struct vhost_dev *dev = container_of(client, struct vhost_dev, client);
     ram_addr_t flags = phys_offset & ~TARGET_PAGE_MASK;
@@ -388,6 +389,10 @@  static void vhost_client_set_memory(CPUPhysMemoryClient *client,
     uint64_t log_size;
     int r;
 
+    if (vga_ram) {
+        flags = IO_MEM_UNASSIGNED;
+    }
+
     /* Optimize no-change case. At least cirrus_vga does this a lot at this time. */
     if (flags == IO_MEM_RAM) {
         if (!vhost_dev_cmp_memory(dev, start_addr, size,