Patchwork [v2,1/2] memory: Fix old portio word accesses

login
register
mail settings
Submitter Jan Kiszka
Date Sept. 18, 2011, 12:51 p.m.
Message ID <4E75E96E.90101@web.de>
Download mbox | patch
Permalink /patch/115236/
State New
Headers show

Comments

Jan Kiszka - Sept. 18, 2011, 12:51 p.m.
From: Jan Kiszka <jan.kiszka@siemens.com>

As we register old portio regions via ioport_register, we are also
responsible for providing the word access wrapper.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Oops, was lacking a shift for word reads.

 memory.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)
Avi Kivity - Sept. 18, 2011, 3:37 p.m.
On 09/18/2011 03:51 PM, Jan Kiszka wrote:
> From: Jan Kiszka<jan.kiszka@siemens.com>
>
> As we register old portio regions via ioport_register, we are also
> responsible for providing the word access wrapper.
>
> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
> ---
>
> Oops, was lacking a shift for word reads.
>
>   memory.c |   10 ++++++++++
>   1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/memory.c b/memory.c
> index b3ee232..aef4702 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -397,6 +397,11 @@ static void memory_region_iorange_read(IORange *iorange,
>           *data = ((uint64_t)1<<  (width * 8)) - 1;
>           if (mrp) {
>               *data = mrp->read(mr->opaque, offset + mr->offset);
> +        } else if (width == 2) {
> +            mrp = find_portio(mr, offset, 1, false);
> +            assert(mrp);
> +            *data = mrp->read(mr->opaque, offset + mr->offset) |
> +                    (mrp->read(mr->opaque, offset + mr->offset + 1)<<  8);
>           }

What about width 4?  Why not use access_with_adjusted_size()?
Jan Kiszka - Sept. 18, 2011, 3:43 p.m.
On 2011-09-18 17:37, Avi Kivity wrote:
> On 09/18/2011 03:51 PM, Jan Kiszka wrote:
>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>
>> As we register old portio regions via ioport_register, we are also
>> responsible for providing the word access wrapper.
>>
>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>> ---
>>
>> Oops, was lacking a shift for word reads.
>>
>>   memory.c |   10 ++++++++++
>>   1 files changed, 10 insertions(+), 0 deletions(-)
>>
>> diff --git a/memory.c b/memory.c
>> index b3ee232..aef4702 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -397,6 +397,11 @@ static void memory_region_iorange_read(IORange
>> *iorange,
>>           *data = ((uint64_t)1<<  (width * 8)) - 1;
>>           if (mrp) {
>>               *data = mrp->read(mr->opaque, offset + mr->offset);
>> +        } else if (width == 2) {
>> +            mrp = find_portio(mr, offset, 1, false);
>> +            assert(mrp);
>> +            *data = mrp->read(mr->opaque, offset + mr->offset) |
>> +                    (mrp->read(mr->opaque, offset + mr->offset +
>> 1)<<  8);
>>           }
> 
> What about width 4?

This is PIO, limited by the x86 address space to 16 bit. Will add a comment.

>  Why not use access_with_adjusted_size()?

Because of different accessor prototypes.

Jan
Avi Kivity - Sept. 18, 2011, 3:45 p.m.
On 09/18/2011 06:43 PM, Jan Kiszka wrote:
> On 2011-09-18 17:37, Avi Kivity wrote:
> >  On 09/18/2011 03:51 PM, Jan Kiszka wrote:
> >>  From: Jan Kiszka<jan.kiszka@siemens.com>
> >>
> >>  As we register old portio regions via ioport_register, we are also
> >>  responsible for providing the word access wrapper.
> >>
> >>  Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
> >>  ---
> >>
> >>  Oops, was lacking a shift for word reads.
> >>
> >>    memory.c |   10 ++++++++++
> >>    1 files changed, 10 insertions(+), 0 deletions(-)
> >>
> >>  diff --git a/memory.c b/memory.c
> >>  index b3ee232..aef4702 100644
> >>  --- a/memory.c
> >>  +++ b/memory.c
> >>  @@ -397,6 +397,11 @@ static void memory_region_iorange_read(IORange
> >>  *iorange,
> >>            *data = ((uint64_t)1<<   (width * 8)) - 1;
> >>            if (mrp) {
> >>                *data = mrp->read(mr->opaque, offset + mr->offset);
> >>  +        } else if (width == 2) {
> >>  +            mrp = find_portio(mr, offset, 1, false);
> >>  +            assert(mrp);
> >>  +            *data = mrp->read(mr->opaque, offset + mr->offset) |
> >>  +                    (mrp->read(mr->opaque, offset + mr->offset +
> >>  1)<<   8);
> >>            }
> >
> >  What about width 4?
>
> This is PIO, limited by the x86 address space to 16 bit. Will add a comment.

x86 PIO is not limited to 16 bits, just ISA, which memory.c knows 
nothing about.

> >   Why not use access_with_adjusted_size()?
>
> Because of different accessor prototypes.
>

Can be thunked.  There is a different issue, a_w_a_s() can use small 
accesses to emulate large ones, but not vice versa.  It needs fixing anyway.
Jan Kiszka - Sept. 18, 2011, 4:28 p.m.
On 2011-09-18 17:45, Avi Kivity wrote:
> On 09/18/2011 06:43 PM, Jan Kiszka wrote:
>> On 2011-09-18 17:37, Avi Kivity wrote:
>> >  On 09/18/2011 03:51 PM, Jan Kiszka wrote:
>> >>  From: Jan Kiszka<jan.kiszka@siemens.com>
>> >>
>> >>  As we register old portio regions via ioport_register, we are also
>> >>  responsible for providing the word access wrapper.
>> >>
>> >>  Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>> >>  ---
>> >>
>> >>  Oops, was lacking a shift for word reads.
>> >>
>> >>    memory.c |   10 ++++++++++
>> >>    1 files changed, 10 insertions(+), 0 deletions(-)
>> >>
>> >>  diff --git a/memory.c b/memory.c
>> >>  index b3ee232..aef4702 100644
>> >>  --- a/memory.c
>> >>  +++ b/memory.c
>> >>  @@ -397,6 +397,11 @@ static void memory_region_iorange_read(IORange
>> >>  *iorange,
>> >>            *data = ((uint64_t)1<<   (width * 8)) - 1;
>> >>            if (mrp) {
>> >>                *data = mrp->read(mr->opaque, offset + mr->offset);
>> >>  +        } else if (width == 2) {
>> >>  +            mrp = find_portio(mr, offset, 1, false);
>> >>  +            assert(mrp);
>> >>  +            *data = mrp->read(mr->opaque, offset + mr->offset) |
>> >>  +                    (mrp->read(mr->opaque, offset + mr->offset +
>> >>  1)<<   8);
>> >>            }
>> >
>> >  What about width 4?
>>
>> This is PIO, limited by the x86 address space to 16 bit. Will add a
>> comment.
> 
> x86 PIO is not limited to 16 bits, just ISA, which memory.c knows
> nothing about.

Confused address and data, the former is limited 16, the latter can be
32 as well. But I guess only ISA models made use of the core's split up
service, and that's why QEMU limited itself accordingly.

> 
>> >   Why not use access_with_adjusted_size()?
>>
>> Because of different accessor prototypes.
>>
> 
> Can be thunked.  There is a different issue, a_w_a_s() can use small
> accesses to emulate large ones, but not vice versa.  It needs fixing
> anyway.
> 

IIRC, that's a feature: Devices not implementing small accesses tend to
refuse them in reality.

Jan
Avi Kivity - Sept. 18, 2011, 4:49 p.m.
On 09/18/2011 07:28 PM, Jan Kiszka wrote:
> >>
> >>  This is PIO, limited by the x86 address space to 16 bit. Will add a
> >>  comment.
> >
> >  x86 PIO is not limited to 16 bits, just ISA, which memory.c knows
> >  nothing about.
>
> Confused address and data, the former is limited 16, the latter can be
> 32 as well. But I guess only ISA models made use of the core's split up
> service, and that's why QEMU limited itself accordingly.

Let's not bury such details in the core.

>
> >
> >>  >    Why not use access_with_adjusted_size()?
> >>
> >>  Because of different accessor prototypes.
> >>
> >
> >  Can be thunked.  There is a different issue, a_w_a_s() can use small
> >  accesses to emulate large ones, but not vice versa.  It needs fixing
> >  anyway.
> >
>
> IIRC, that's a feature: Devices not implementing small accesses tend to
> refuse them in reality.

I don't think this holds for pci; there the bus always generates 32-bit 
writes with separate byte enables for each lane.  The device need not 
even be aware of a sub-word access, for reads.
Jan Kiszka - Sept. 18, 2011, 7:07 p.m.
On 2011-09-18 18:49, Avi Kivity wrote:
> On 09/18/2011 07:28 PM, Jan Kiszka wrote:
>> >>
>> >>  This is PIO, limited by the x86 address space to 16 bit. Will add a
>> >>  comment.
>> >
>> >  x86 PIO is not limited to 16 bits, just ISA, which memory.c knows
>> >  nothing about.
>>
>> Confused address and data, the former is limited 16, the latter can be
>> 32 as well. But I guess only ISA models made use of the core's split up
>> service, and that's why QEMU limited itself accordingly.
> 
> Let's not bury such details in the core.

It's already in the core (ioport), and would refrain from changing it in
this fix.

> 
>>
>> >
>> >>  >    Why not use access_with_adjusted_size()?
>> >>
>> >>  Because of different accessor prototypes.
>> >>
>> >
>> >  Can be thunked.  There is a different issue, a_w_a_s() can use small
>> >  accesses to emulate large ones, but not vice versa.  It needs fixing
>> >  anyway.
>> >
>>
>> IIRC, that's a feature: Devices not implementing small accesses tend to
>> refuse them in reality.
> 
> I don't think this holds for pci; there the bus always generates 32-bit
> writes with separate byte enables for each lane.  The device need not
> even be aware of a sub-word access, for reads.

The problem is that once we "enhance" the core with such a support to
potentially help one use case, we need to validate all users again if
they depend on the old behavior. That's tricky as breakage may only show
up with odd guests that issue invalid but so far harmless requests.

Jan
Avi Kivity - Sept. 19, 2011, 12:19 p.m.
On 09/18/2011 10:07 PM, Jan Kiszka wrote:
> On 2011-09-18 18:49, Avi Kivity wrote:
> >  On 09/18/2011 07:28 PM, Jan Kiszka wrote:
> >>  >>
> >>  >>   This is PIO, limited by the x86 address space to 16 bit. Will add a
> >>  >>   comment.
> >>  >
> >>  >   x86 PIO is not limited to 16 bits, just ISA, which memory.c knows
> >>  >   nothing about.
> >>
> >>  Confused address and data, the former is limited 16, the latter can be
> >>  32 as well. But I guess only ISA models made use of the core's split up
> >>  service, and that's why QEMU limited itself accordingly.
> >
> >  Let's not bury such details in the core.
>
> It's already in the core (ioport), and would refrain from changing it in
> this fix.

That's the bad old core we're trying to get away from.

> >
> >  I don't think this holds for pci; there the bus always generates 32-bit
> >  writes with separate byte enables for each lane.  The device need not
> >  even be aware of a sub-word access, for reads.
>
> The problem is that once we "enhance" the core with such a support to
> potentially help one use case, we need to validate all users again if
> they depend on the old behavior. That's tricky as breakage may only show
> up with odd guests that issue invalid but so far harmless requests.

It's opt-in.  If a device sets 
MemoryRegionOps::impl.{min,max}_access_size = 1, it will only be fed 
byte accesses (the core will take care of breaking apart larger 
writes).  If it sets MemoryRegionOps::impl.{min,max}_access_size = 4, it 
will only get long accesses (and the core will/should shift/mask or 
RMW).  Refusing illegal access sizes is done using 
MemoryRegionOps::valid.  Most of this is unimplemented unfortunately.
Jan Kiszka - Sept. 19, 2011, 12:32 p.m.
On 2011-09-19 14:19, Avi Kivity wrote:
> On 09/18/2011 10:07 PM, Jan Kiszka wrote:
>> On 2011-09-18 18:49, Avi Kivity wrote:
>> >  On 09/18/2011 07:28 PM, Jan Kiszka wrote:
>> >>  >>
>> >>  >>   This is PIO, limited by the x86 address space to 16 bit. Will
>> add a
>> >>  >>   comment.
>> >>  >
>> >>  >   x86 PIO is not limited to 16 bits, just ISA, which memory.c knows
>> >>  >   nothing about.
>> >>
>> >>  Confused address and data, the former is limited 16, the latter
>> can be
>> >>  32 as well. But I guess only ISA models made use of the core's
>> split up
>> >>  service, and that's why QEMU limited itself accordingly.
>> >
>> >  Let's not bury such details in the core.
>>
>> It's already in the core (ioport), and would refrain from changing it in
>> this fix.
> 
> That's the bad old core we're trying to get away from.

We overcome it at the point the last portio user was converted and that
legacy is removed.

> 
>> >
>> >  I don't think this holds for pci; there the bus always generates
>> 32-bit
>> >  writes with separate byte enables for each lane.  The device need not
>> >  even be aware of a sub-word access, for reads.
>>
>> The problem is that once we "enhance" the core with such a support to
>> potentially help one use case, we need to validate all users again if
>> they depend on the old behavior. That's tricky as breakage may only show
>> up with odd guests that issue invalid but so far harmless requests.
> 
> It's opt-in.  If a device sets
> MemoryRegionOps::impl.{min,max}_access_size = 1, it will only be fed
> byte accesses (the core will take care of breaking apart larger
> writes).  If it sets MemoryRegionOps::impl.{min,max}_access_size = 4, it
> will only get long accesses (and the core will/should shift/mask or
> RMW).  Refusing illegal access sizes is done using
> MemoryRegionOps::valid.  Most of this is unimplemented unfortunately.

That makes sense (for non-old_portio users).

Jan
Avi Kivity - Sept. 19, 2011, 12:42 p.m.
On 09/19/2011 03:32 PM, Jan Kiszka wrote:
> >  It's opt-in.  If a device sets
> >  MemoryRegionOps::impl.{min,max}_access_size = 1, it will only be fed
> >  byte accesses (the core will take care of breaking apart larger
> >  writes).  If it sets MemoryRegionOps::impl.{min,max}_access_size = 4, it
> >  will only get long accesses (and the core will/should shift/mask or
> >  RMW).  Refusing illegal access sizes is done using
> >  MemoryRegionOps::valid.  Most of this is unimplemented unfortunately.
>
> That makes sense (for non-old_portio users).
>
>

The trick of having a way to register N callbacks with one shot is worth 
growing.  Ideally each register in a BAR would have a callback and we'd 
do something like

     MemoryRegionOps mydev_ops = {
         .registers = {
              { MYDEV_REG_x, 4, 4, mydev_reg_x_read, mydev_reg_x_write, },
              ...
          },
     }

with hints to the core like "this register sits at this offset, use it 
for reads instead of a callback", or, "this is a read-only register".
Jan Kiszka - Sept. 19, 2011, 12:55 p.m.
On 2011-09-19 14:42, Avi Kivity wrote:
> On 09/19/2011 03:32 PM, Jan Kiszka wrote:
>> >  It's opt-in.  If a device sets
>> >  MemoryRegionOps::impl.{min,max}_access_size = 1, it will only be fed
>> >  byte accesses (the core will take care of breaking apart larger
>> >  writes).  If it sets MemoryRegionOps::impl.{min,max}_access_size =
>> 4, it
>> >  will only get long accesses (and the core will/should shift/mask or
>> >  RMW).  Refusing illegal access sizes is done using
>> >  MemoryRegionOps::valid.  Most of this is unimplemented unfortunately.
>>
>> That makes sense (for non-old_portio users).
>>
>>
> 
> The trick of having a way to register N callbacks with one shot is worth
> growing.  Ideally each register in a BAR would have a callback and we'd
> do something like
> 
>     MemoryRegionOps mydev_ops = {
>         .registers = {
>              { MYDEV_REG_x, 4, 4, mydev_reg_x_read, mydev_reg_x_write, },
>              ...
>          },
>     }
> 
> with hints to the core like "this register sits at this offset, use it
> for reads instead of a callback", or, "this is a read-only register".

This has pros and cons. If you have n registers to dispatch, you then
have to write n function prologues and maybe epilogues instead of just
one. Specifically if the register access is trivial, that could case
quite some LoC blowup on the device side.

What may have a better ratio are generic register get/set handlers.

Jan
Avi Kivity - Sept. 19, 2011, 1:09 p.m.
On 09/19/2011 03:55 PM, Jan Kiszka wrote:
> >
> >  The trick of having a way to register N callbacks with one shot is worth
> >  growing.  Ideally each register in a BAR would have a callback and we'd
> >  do something like
> >
> >      MemoryRegionOps mydev_ops = {
> >          .registers = {
> >               { MYDEV_REG_x, 4, 4, mydev_reg_x_read, mydev_reg_x_write, },
> >               ...
> >           },
> >      }
> >
> >  with hints to the core like "this register sits at this offset, use it
> >  for reads instead of a callback", or, "this is a read-only register".
>
> This has pros and cons. If you have n registers to dispatch, you then
> have to write n function prologues and maybe epilogues instead of just
> one. Specifically if the register access is trivial, that could case
> quite some LoC blowup on the device side.
>
> What may have a better ratio are generic register get/set handlers.
>

With C++ pointers-to-members and pointers-to-member-functions, you 
actually get some nice representation:

class MyDev {

     void reg_1_read(...) { return some_computation(); }
     void reg_1_write(...) { do_something(); }

     uint32_t reg_2;
     void reg_2_write(...) { reg_2 = value; do_something(); }

     uint64_t reg_3;

     static const Register registers[] = {
           Register(REG_1, &MyDev::reg_1_read, &MyDev::reg_1_write),
           Register(REG_2, &MyDev::reg_2, &MyDev::reg_2_write),
           Register(REG_1, &MyDev::reg_3),
     };
};

... and the Register class generates the appropriate accessors.  We can 
emulate some of this with macros, but the conversion from opaque to the 
actual type will always be ugly.
Gerd Hoffmann - Sept. 19, 2011, 1:55 p.m.
Hi,

>> The trick of having a way to register N callbacks with one shot is worth
>> growing.  Ideally each register in a BAR would have a callback and we'd
>> do something like
>>
>>      MemoryRegionOps mydev_ops = {
>>          .registers = {
>>               { MYDEV_REG_x, 4, 4, mydev_reg_x_read, mydev_reg_x_write, },
>>               ...
>>           },
>>      }
>>
>> with hints to the core like "this register sits at this offset, use it
>> for reads instead of a callback", or, "this is a read-only register".
>
> This has pros and cons. If you have n registers to dispatch, you then
> have to write n function prologues and maybe epilogues instead of just
> one. Specifically if the register access is trivial, that could case
> quite some LoC blowup on the device side.

If the register access is trivial then you don't need to call into the 
driver at all ...

You can have a look at hw/intel-hda.c which actually implements 
something like this, with some commonly needed features:

   * The "offset" field already mentioned by avi is there, so trivial
     reads/writes can be handled by the core.
   * A "wmask" field to specify which bits are guest writable.
   * A "wclear" field to specify which bits have write-one-to-clear
     semantics.
   * A "reset" field which specified the value this field has after
     device reset.  Also serves as value for read-only registers.
   * read/write handlers of course.  The write handler is called after
     the core applied sanity checks and calculated the new register
     value (using wmask+wclear).
   * A "name" field (for debug logging).

It's pretty nice, alot more readable that a big switch, forces you to 
think which bits the guest can set (not specifying a wmask gives you a 
read-only register ;).

Also no bloat.  With this moving to memory core the all the handlers 
will gain a line with a container_of(), but that isn't too bad too IMHO.

cheers,
   Gerd
Avi Kivity - Sept. 19, 2011, 1:58 p.m.
On 09/19/2011 04:55 PM, Gerd Hoffmann wrote:
>
> If the register access is trivial then you don't need to call into the 
> driver at all ...
>
> You can have a look at hw/intel-hda.c which actually implements 
> something like this, with some commonly needed features:
>
>   * The "offset" field already mentioned by avi is there, so trivial
>     reads/writes can be handled by the core.
>   * A "wmask" field to specify which bits are guest writable.
>   * A "wclear" field to specify which bits have write-one-to-clear
>     semantics.
>   * A "reset" field which specified the value this field has after
>     device reset.  Also serves as value for read-only registers.
>   * read/write handlers of course.  The write handler is called after
>     the core applied sanity checks and calculated the new register
>     value (using wmask+wclear).
>   * A "name" field (for debug logging).
>
> It's pretty nice, alot more readable that a big switch, forces you to 
> think which bits the guest can set (not specifying a wmask gives you a 
> read-only register ;).
>
> Also no bloat.  With this moving to memory core the all the handlers 
> will gain a line with a container_of(), but that isn't too bad too IMHO.

It's also more secure.  Move as much as possible into the core, and 
review (and fuzz) that like hell.
Avi Kivity - Sept. 26, 2011, 1:30 p.m.
On 09/18/2011 03:51 PM, Jan Kiszka wrote:
> From: Jan Kiszka<jan.kiszka@siemens.com>
>
> As we register old portio regions via ioport_register, we are also
> responsible for providing the word access wrapper.
>
> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
> ---
>
> Oops, was lacking a shift for word reads.
>
>

Thanks, applied; while I don't like it, vga is such a mess now that we 
have to reduce the number of fronts and get something working.

Patch

diff --git a/memory.c b/memory.c
index b3ee232..aef4702 100644
--- a/memory.c
+++ b/memory.c
@@ -397,6 +397,11 @@  static void memory_region_iorange_read(IORange *iorange,
         *data = ((uint64_t)1 << (width * 8)) - 1;
         if (mrp) {
             *data = mrp->read(mr->opaque, offset + mr->offset);
+        } else if (width == 2) {
+            mrp = find_portio(mr, offset, 1, false);
+            assert(mrp);
+            *data = mrp->read(mr->opaque, offset + mr->offset) |
+                    (mrp->read(mr->opaque, offset + mr->offset + 1) << 8);
         }
         return;
     }
@@ -419,6 +424,11 @@  static void memory_region_iorange_write(IORange *iorange,
 
         if (mrp) {
             mrp->write(mr->opaque, offset + mr->offset, data);
+        } else if (width == 2) {
+            mrp = find_portio(mr, offset, 1, false);
+            assert(mrp);
+            mrp->write(mr->opaque, offset + mr->offset, data & 0xff);
+            mrp->write(mr->opaque, offset + mr->offset + 1, data >> 8);
         }
         return;
     }