Patchwork Re: [PATCH 1/6] Make config space accessor host bus trapable

login
register
mail settings
Submitter Isaku Yamahata
Date Jan. 4, 2010, 10:45 a.m.
Message ID <20100104104516.GD4672@valinux.co.jp>
Download mbox | patch
Permalink /patch/42058/
State New
Headers show

Comments

Isaku Yamahata - Jan. 4, 2010, 10:45 a.m.
On Mon, Jan 04, 2010 at 04:26:46AM +0100, Alexander Graf wrote:
> 
> On 03.01.2010, at 21:50, Benjamin Herrenschmidt wrote:
> 
> > On Sun, 2010-01-03 at 21:27 +0100, Alexander Graf wrote:
> > 
> >> I think if unin_pci is the only user, it'd be better to do it hacky
> >> inside unin_pci.c. But if there's a chance there's another user, it'd
> >> be better to make it generic.
> >> 
> >> Since this is the first time I ever stumbled across type 0 and type 1
> >> PCI config space addresses, I simply don't know if there are any. Blue
> >> Swirls comment indicated that there are. Ben also sounded as if it's
> >> rather common to not use the x86 format. On the other hand, it looks
> >> like nobody in qemu needed it so far - and we're implementing ARM,
> >> MIPS and all other sorts of platforms.
> >> 
> >> So if anyone with knowledge in PCI could shed some light here, please
> >> do so.
> > 
> > My feeling is that what you're better off doing is to have the qemu core
> > take an abstract struct to identify a device config space location, that
> > consists of separate fields for:
> > 
> > - The PCI domain (which is what host bridge it hangs off since bus
> > numbers are not unique between domains)
> > 
> > - The bus number
> 
> Hm, I think it'd make more sense to just store a PCIBus pointer in there. We could then fetch the bus and domain id from there.
> 
> I'll write something up :-).
> 
> 
> Alex
> 

Does the following patch help?
I did only compile test though.

From 9c62b4846f95ebe84e182f76295016e1fe980699 Mon Sep 17 00:00:00 2001
From: Isaku Yamahata <yamahata@valinux.co.jp>
Date: Mon, 4 Jan 2010 19:39:36 +0900
Subject: [PATCH] pci: pcihost clean up.

remove some codes by introduce callback to calculate pci device and offset
in configuration space.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 hw/gt64xxx.c           |    9 +---
 hw/pci_host.c          |  115 ++++++++++++++++++++++++++++--------------------
 hw/pci_host.h          |   17 ++++++-
 hw/pci_host_template.h |   21 +++------
 hw/prep_pci.c          |   68 +++--------------------------
 hw/sh_pci.c            |   92 +++++++++++++++++---------------------
 hw/versatile_pci.c     |   74 +++++--------------------------
 7 files changed, 150 insertions(+), 246 deletions(-)
Alexander Graf - Jan. 4, 2010, 10:55 a.m.
On 04.01.2010, at 11:45, Isaku Yamahata wrote:

> On Mon, Jan 04, 2010 at 04:26:46AM +0100, Alexander Graf wrote:
>> 
>> On 03.01.2010, at 21:50, Benjamin Herrenschmidt wrote:
>> 
>>> On Sun, 2010-01-03 at 21:27 +0100, Alexander Graf wrote:
>>> 
>>>> I think if unin_pci is the only user, it'd be better to do it hacky
>>>> inside unin_pci.c. But if there's a chance there's another user, it'd
>>>> be better to make it generic.
>>>> 
>>>> Since this is the first time I ever stumbled across type 0 and type 1
>>>> PCI config space addresses, I simply don't know if there are any. Blue
>>>> Swirls comment indicated that there are. Ben also sounded as if it's
>>>> rather common to not use the x86 format. On the other hand, it looks
>>>> like nobody in qemu needed it so far - and we're implementing ARM,
>>>> MIPS and all other sorts of platforms.
>>>> 
>>>> So if anyone with knowledge in PCI could shed some light here, please
>>>> do so.
>>> 
>>> My feeling is that what you're better off doing is to have the qemu core
>>> take an abstract struct to identify a device config space location, that
>>> consists of separate fields for:
>>> 
>>> - The PCI domain (which is what host bridge it hangs off since bus
>>> numbers are not unique between domains)
>>> 
>>> - The bus number
>> 
>> Hm, I think it'd make more sense to just store a PCIBus pointer in there. We could then fetch the bus and domain id from there.
>> 
>> I'll write something up :-).
>> 
>> 
>> Alex
>> 
> 
> Does the following patch help?
> I did only compile test though.

I sent out v2 already, which contains a complete resolution framework. It also allows for incremental cleanup of the users, as I'd rather like to see everyone use pci_host instead of hacky homegrown functions. But I don't think changing all at once is going to fly wrt reviewablity.

I'd be really glad if you could take a glimpse at my version. You're definitely more knowledgable in the PCI areas than me :-). I verified that it fixes Uninorth and x86 still works.


Alex
Michael S. Tsirkin - Jan. 4, 2010, 11:07 a.m.
On Mon, Jan 04, 2010 at 07:45:16PM +0900, Isaku Yamahata wrote:
> +static PCIDevice *pci_host_dev_find_fn_noswap(PCIHostState *s, uint32_t addr)
> +{
> +    return pci_host_find_dev_active(s->bus, pci_host_pci_addr(s, addr));
> +}
> +
> +static PCIDevice *pci_host_dev_find_fn(PCIHostState *s, uint32_t addr)
> +{
> +#ifdef TARGET_WORDS_BIGENDIAN
> +    addr = bswap32(addr);
> +#endif
> +    return pci_host_find_dev_active(s->bus, pci_host_pci_addr(s, addr));
> +}
> +

BTW, I think we really should think about the right way to address the
swap/noswap issue without using a preprocessor. Maybe make pci host
bridge explicitly specify whether to swap bytes?  How about adding a
field in PCIHostState to make it do this?
Isaku Yamahata - Jan. 4, 2010, 11:08 a.m.
On Mon, Jan 04, 2010 at 11:55:10AM +0100, Alexander Graf wrote:
> 
> On 04.01.2010, at 11:45, Isaku Yamahata wrote:
> 
> > On Mon, Jan 04, 2010 at 04:26:46AM +0100, Alexander Graf wrote:
> >> 
> >> On 03.01.2010, at 21:50, Benjamin Herrenschmidt wrote:
> >> 
> >>> On Sun, 2010-01-03 at 21:27 +0100, Alexander Graf wrote:
> >>> 
> >>>> I think if unin_pci is the only user, it'd be better to do it hacky
> >>>> inside unin_pci.c. But if there's a chance there's another user, it'd
> >>>> be better to make it generic.
> >>>> 
> >>>> Since this is the first time I ever stumbled across type 0 and type 1
> >>>> PCI config space addresses, I simply don't know if there are any. Blue
> >>>> Swirls comment indicated that there are. Ben also sounded as if it's
> >>>> rather common to not use the x86 format. On the other hand, it looks
> >>>> like nobody in qemu needed it so far - and we're implementing ARM,
> >>>> MIPS and all other sorts of platforms.
> >>>> 
> >>>> So if anyone with knowledge in PCI could shed some light here, please
> >>>> do so.
> >>> 
> >>> My feeling is that what you're better off doing is to have the qemu core
> >>> take an abstract struct to identify a device config space location, that
> >>> consists of separate fields for:
> >>> 
> >>> - The PCI domain (which is what host bridge it hangs off since bus
> >>> numbers are not unique between domains)
> >>> 
> >>> - The bus number
> >> 
> >> Hm, I think it'd make more sense to just store a PCIBus pointer in there. We could then fetch the bus and domain id from there.
> >> 
> >> I'll write something up :-).
> >> 
> >> 
> >> Alex
> >> 
> > 
> > Does the following patch help?
> > I did only compile test though.
> 
> I sent out v2 already, which contains a complete resolution framework. It also allows for incremental cleanup of the users, as I'd rather like to see everyone use pci_host instead of hacky homegrown functions. But I don't think changing all at once is going to fly wrt reviewablity.

Agreed. Anyway that patch is just for discussion, not for commit.
If wanted, I'm willing to split it up into a series of patches or
rebase it on top of your patches. (Or throw it away)


> I'd be really glad if you could take a glimpse at my version. You're definitely more knowledgable in the PCI areas than me :-). I verified that it fixes Uninorth and x86 still works.

I'll have a look at it tomorrow.
Alexander Graf - Jan. 4, 2010, 11:13 a.m.
On 04.01.2010, at 12:07, Michael S. Tsirkin wrote:

> On Mon, Jan 04, 2010 at 07:45:16PM +0900, Isaku Yamahata wrote:
>> +static PCIDevice *pci_host_dev_find_fn_noswap(PCIHostState *s, uint32_t addr)
>> +{
>> +    return pci_host_find_dev_active(s->bus, pci_host_pci_addr(s, addr));
>> +}
>> +
>> +static PCIDevice *pci_host_dev_find_fn(PCIHostState *s, uint32_t addr)
>> +{
>> +#ifdef TARGET_WORDS_BIGENDIAN
>> +    addr = bswap32(addr);
>> +#endif
>> +    return pci_host_find_dev_active(s->bus, pci_host_pci_addr(s, addr));
>> +}
>> +
> 
> BTW, I think we really should think about the right way to address the
> swap/noswap issue without using a preprocessor. Maybe make pci host
> bridge explicitly specify whether to swap bytes?  How about adding a
> field in PCIHostState to make it do this?

Sounds reasonable. But let's take baby steps here. I don't want to end up with a 10000 lines patch :-).


Alex
Benjamin Herrenschmidt - Jan. 4, 2010, 8:10 p.m.
On Mon, 2010-01-04 at 13:07 +0200, Michael S. Tsirkin wrote:
> BTW, I think we really should think about the right way to address the
> swap/noswap issue without using a preprocessor. Maybe make pci host
> bridge explicitly specify whether to swap bytes?  How about adding a
> field in PCIHostState to make it do this? 

No, this is a non issue if you get your design right. Just abstract out
the reference to a device in a struct like Alex is proposing and have
the host bridge specific code fill that up appropriately. I don't see
why there would be any need for swapping and in any case, this should go
away once the host bridge code knows how to interpret the write to
whatever config access registers it exposes.

Ben.
Michael S. Tsirkin - Jan. 4, 2010, 9:12 p.m.
On Tue, Jan 05, 2010 at 07:10:58AM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2010-01-04 at 13:07 +0200, Michael S. Tsirkin wrote:
> > BTW, I think we really should think about the right way to address the
> > swap/noswap issue without using a preprocessor. Maybe make pci host
> > bridge explicitly specify whether to swap bytes?  How about adding a
> > field in PCIHostState to make it do this? 
> 
> No, this is a non issue if you get your design right. Just abstract out
> the reference to a device in a struct like Alex is proposing and have
> the host bridge specific code fill that up appropriately. I don't see
> why there would be any need for swapping and in any case, this should go
> away once the host bridge code knows how to interpret the write to
> whatever config access registers it exposes.
> 
> Ben.

Well, the main issue if I understand correcttly is that basically the
same hardware bridge can be connected to host in different ways. Yes, we
can say "if it's connected differently it's a different device" but this
is slightly ugly, device should not have to know how it's connected. It
would be cleaner to have a "connector" device in the middle that swaps
bytes.  Even though yes, what you describe would be less ugly than using
proprocessor as we do now.
Benjamin Herrenschmidt - Jan. 4, 2010, 9:25 p.m.
On Mon, 2010-01-04 at 23:12 +0200, Michael S. Tsirkin wrote:
> Well, the main issue if I understand correcttly is that basically the
> same hardware bridge can be connected to host in different ways. Yes, we
> can say "if it's connected differently it's a different device" but this
> is slightly ugly, device should not have to know how it's connected. It
> would be cleaner to have a "connector" device in the middle that swaps
> bytes.  Even though yes, what you describe would be less ugly than using
> proprocessor as we do now.

Well, the thing is... PCI is always little endian. I'm not 100% familiar
on how emulation of devices works in qemu, but it's possible that the
problem here is simply not how a standard PCI host bridge is connected
to the processor changing but rather whether it's connected to an x86
host or a ppc host should make the byte order appear different. IE. a
PPC operating system will byteswap accesses. If qemu just passes on
accesses -as-is- instead of doing natural byteswapping then indeed you
will need that added swap there too.

I still think though that this should be buried in the accessors for the
host bridge themselves, eventually controlled by a flag set when
instanciating the host bridge in case it can indeed be wired in
different ways.

Cheers,
Ben.
Michael S. Tsirkin - Jan. 4, 2010, 9:30 p.m.
On Tue, Jan 05, 2010 at 08:25:30AM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2010-01-04 at 23:12 +0200, Michael S. Tsirkin wrote:
> > Well, the main issue if I understand correcttly is that basically the
> > same hardware bridge can be connected to host in different ways. Yes, we
> > can say "if it's connected differently it's a different device" but this
> > is slightly ugly, device should not have to know how it's connected. It
> > would be cleaner to have a "connector" device in the middle that swaps
> > bytes.  Even though yes, what you describe would be less ugly than using
> > proprocessor as we do now.
> 
> Well, the thing is... PCI is always little endian.
> I'm not 100% familiar
> on how emulation of devices works in qemu, but it's possible that the
> problem here is simply not how a standard PCI host bridge is connected
> to the processor changing but rather whether it's connected to an x86
> host or a ppc host should make the byte order appear different. IE. a
> PPC operating system will byteswap accesses. If qemu just passes on
> accesses -as-is- instead of doing natural byteswapping then indeed you
> will need that added swap there too.
> 

Yes, but I think how you program your host to pci bridge is platform specific,
the standard (mostly) applies to what happens below the bridge.  There's
no real standard for how PCI host bridge is connected to processor
AFAIK, it's by luck we can share code there at all.

> I still think though that this should be buried in the accessors for the
> host bridge themselves, eventually controlled by a flag set when
> instanciating the host bridge in case it can indeed be wired in
> different ways.
> 
> Cheers,
> Ben.
>  

buried sounds scary, but generally yes, a flag in host bridge code
is the idea.
Benjamin Herrenschmidt - Jan. 4, 2010, 9:53 p.m.
> Yes, but I think how you program your host to pci bridge is platform specific,
> the standard (mostly) applies to what happens below the bridge.  There's
> no real standard for how PCI host bridge is connected to processor
> AFAIK, it's by luck we can share code there at all.

Well, yes and no ... there's a standard on how a PCI host bridge is
connected in the sense that how normal MMIO accesses go through in term
of endianness is well defined.

How you actually issue config space cycles is a property of a given
bridge. How you issue IO cycles as well in fact.

Cheers,
Ben.
Michael S. Tsirkin - Jan. 4, 2010, 10:25 p.m.
On Tue, Jan 05, 2010 at 08:53:52AM +1100, Benjamin Herrenschmidt wrote:
> 
> > Yes, but I think how you program your host to pci bridge is platform specific,
> > the standard (mostly) applies to what happens below the bridge.  There's
> > no real standard for how PCI host bridge is connected to processor
> > AFAIK, it's by luck we can share code there at all.
> 
> Well, yes and no ... there's a standard on how a PCI host bridge is
> connected in the sense that how normal MMIO accesses go through in term
> of endianness is well defined.
> 

Go through where? From processor to PCI?
Which spec do you refer to?

> How you actually issue config space cycles is a property of a given
> bridge. How you issue IO cycles as well in fact.
> 
> Cheers,
> Ben.
Benjamin Herrenschmidt - Jan. 4, 2010, 10:51 p.m.
On Tue, 2010-01-05 at 00:25 +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 05, 2010 at 08:53:52AM +1100, Benjamin Herrenschmidt wrote:
> > 
> > > Yes, but I think how you program your host to pci bridge is platform specific,
> > > the standard (mostly) applies to what happens below the bridge.  There's
> > > no real standard for how PCI host bridge is connected to processor
> > > AFAIK, it's by luck we can share code there at all.
> > 
> > Well, yes and no ... there's a standard on how a PCI host bridge is
> > connected in the sense that how normal MMIO accesses go through in term
> > of endianness is well defined.
> > 
> 
> Go through where? From processor to PCI?
> Which spec do you refer to?

The PCI spec from memory :-) Byte swizzling for MMIO between a processor
and PCI bus is well defined.

Now of course, since issuing config space cycles tend to be host-bridge
chip specific, everything is possible there :-) In -most- cases though,
they use a mechanism similar to x86 using the cf8/cfc ports or
equivalent mapped wherever the PIO region is mapped in MMIO space for
non-x86 processors, and thus end up with the exact same byte swizzling.

In fact, this is true of accesses to PCI devices as well. Take for
example, a device that has a 32-bit MMIO register. This register is
meant to appear as little endian (well, unless the device itself plays
tricks but most don't) whatever the host processor is. Thus an x86 host
will need no byteswap but a powerpc host (assuming the ppc is running in
BE mode) will.

This is why for example the base readl and writel function in Linux do
byteswap on powerpc.

This is important to understand that this is a property of how the PCI
bridge is connected to the host processor, such that the PCI "native"
byte order is preserved along with address invariance for sub-32-bit
quantities.

The endianness of the host bridge config space access register is thus
most of the time just a natural side effect of said register being part
of the bridge PIO space and thus getting the natural byteswap explained
above for a 32-bit LE register on PCI.

Cheers,
Ben.
Michael S. Tsirkin - Jan. 4, 2010, 10:59 p.m.
On Tue, Jan 05, 2010 at 09:51:48AM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2010-01-05 at 00:25 +0200, Michael S. Tsirkin wrote:
> > On Tue, Jan 05, 2010 at 08:53:52AM +1100, Benjamin Herrenschmidt wrote:
> > > 
> > > > Yes, but I think how you program your host to pci bridge is platform specific,
> > > > the standard (mostly) applies to what happens below the bridge.  There's
> > > > no real standard for how PCI host bridge is connected to processor
> > > > AFAIK, it's by luck we can share code there at all.
> > > 
> > > Well, yes and no ... there's a standard on how a PCI host bridge is
> > > connected in the sense that how normal MMIO accesses go through in term
> > > of endianness is well defined.
> > > 
> > 
> > Go through where? From processor to PCI?
> > Which spec do you refer to?
> 
> The PCI spec from memory :-) Byte swizzling for MMIO between a processor
> and PCI bus is well defined.

Couldn't find it in spec.

> Now of course, since issuing config space cycles tend to be host-bridge
> chip specific, everything is possible there :-) In -most- cases though,
> they use a mechanism similar to x86 using the cf8/cfc ports or
> equivalent mapped wherever the PIO region is mapped in MMIO space for
> non-x86 processors, and thus end up with the exact same byte swizzling.
> 
> In fact, this is true of accesses to PCI devices as well. Take for
> example, a device that has a 32-bit MMIO register. This register is
> meant to appear as little endian (well, unless the device itself plays
> tricks but most don't) whatever the host processor is. Thus an x86 host
> will need no byteswap but a powerpc host (assuming the ppc is running in
> BE mode) will.
> 
> This is why for example the base readl and writel function in Linux do
> byteswap on powerpc.
> 
> This is important to understand that this is a property of how the PCI
> bridge is connected to the host processor, such that the PCI "native"
> byte order is preserved along with address invariance for sub-32-bit
> quantities.
> 
> The endianness of the host bridge config space access register is thus
> most of the time just a natural side effect of said register being part
> of the bridge PIO space and thus getting the natural byteswap explained
> above for a 32-bit LE register on PCI.
> 
> Cheers,
> Ben.

So, it appears that this is not the case for many platforms: bridge
itself does a byteswap to make devices behind it work according to spec,
but this does not apply to programming bridge itself.

This seems common on BE platforms, this is why qemu has
ifdef TARGET_WORDS_BIGENDIAN there IIUC.
Alexander Graf - Jan. 4, 2010, 11:08 p.m.
On 04.01.2010, at 23:59, Michael S. Tsirkin wrote:

> On Tue, Jan 05, 2010 at 09:51:48AM +1100, Benjamin Herrenschmidt wrote:
>> On Tue, 2010-01-05 at 00:25 +0200, Michael S. Tsirkin wrote:
>>> On Tue, Jan 05, 2010 at 08:53:52AM +1100, Benjamin Herrenschmidt wrote:
>>>> 
>>>>> Yes, but I think how you program your host to pci bridge is platform specific,
>>>>> the standard (mostly) applies to what happens below the bridge.  There's
>>>>> no real standard for how PCI host bridge is connected to processor
>>>>> AFAIK, it's by luck we can share code there at all.
>>>> 
>>>> Well, yes and no ... there's a standard on how a PCI host bridge is
>>>> connected in the sense that how normal MMIO accesses go through in term
>>>> of endianness is well defined.
>>>> 
>>> 
>>> Go through where? From processor to PCI?
>>> Which spec do you refer to?
>> 
>> The PCI spec from memory :-) Byte swizzling for MMIO between a processor
>> and PCI bus is well defined.
> 
> Couldn't find it in spec.
> 
>> Now of course, since issuing config space cycles tend to be host-bridge
>> chip specific, everything is possible there :-) In -most- cases though,
>> they use a mechanism similar to x86 using the cf8/cfc ports or
>> equivalent mapped wherever the PIO region is mapped in MMIO space for
>> non-x86 processors, and thus end up with the exact same byte swizzling.
>> 
>> In fact, this is true of accesses to PCI devices as well. Take for
>> example, a device that has a 32-bit MMIO register. This register is
>> meant to appear as little endian (well, unless the device itself plays
>> tricks but most don't) whatever the host processor is. Thus an x86 host
>> will need no byteswap but a powerpc host (assuming the ppc is running in
>> BE mode) will.
>> 
>> This is why for example the base readl and writel function in Linux do
>> byteswap on powerpc.
>> 
>> This is important to understand that this is a property of how the PCI
>> bridge is connected to the host processor, such that the PCI "native"
>> byte order is preserved along with address invariance for sub-32-bit
>> quantities.
>> 
>> The endianness of the host bridge config space access register is thus
>> most of the time just a natural side effect of said register being part
>> of the bridge PIO space and thus getting the natural byteswap explained
>> above for a 32-bit LE register on PCI.
>> 
>> Cheers,
>> Ben.
> 
> So, it appears that this is not the case for many platforms: bridge
> itself does a byteswap to make devices behind it work according to spec,
> but this does not apply to programming bridge itself.
> 
> This seems common on BE platforms, this is why qemu has
> ifdef TARGET_WORDS_BIGENDIAN there IIUC.

IIRC qemu's mmio functions just pass the register value the guest had at that moment to the mmio function.

While that is pretty simple, it means that we behave different from real hardware. Real hardware has 2 components:

1) CPU
2) Device

The CPU sends an MMIO request in local byte order to the device. The device also has a native endianness - either BE or LE.
So what the byte swap here does is simply converting from wrong LE (what Linux thought the device needs) to a proper variable.

I'm not sure what the correct solution to this is. I'm inclined to suggest that mmio callbacks should get called with tswap'ed values. But then again the code as is works in quite a lot of cases and I don't want to spend months getting it back to where it is just because of a cosmetic change.

Alex
Michael S. Tsirkin - Jan. 4, 2010, 11:12 p.m.
On Tue, Jan 05, 2010 at 12:08:19AM +0100, Alexander Graf wrote:
> 
> On 04.01.2010, at 23:59, Michael S. Tsirkin wrote:
> 
> > On Tue, Jan 05, 2010 at 09:51:48AM +1100, Benjamin Herrenschmidt wrote:
> >> On Tue, 2010-01-05 at 00:25 +0200, Michael S. Tsirkin wrote:
> >>> On Tue, Jan 05, 2010 at 08:53:52AM +1100, Benjamin Herrenschmidt wrote:
> >>>> 
> >>>>> Yes, but I think how you program your host to pci bridge is platform specific,
> >>>>> the standard (mostly) applies to what happens below the bridge.  There's
> >>>>> no real standard for how PCI host bridge is connected to processor
> >>>>> AFAIK, it's by luck we can share code there at all.
> >>>> 
> >>>> Well, yes and no ... there's a standard on how a PCI host bridge is
> >>>> connected in the sense that how normal MMIO accesses go through in term
> >>>> of endianness is well defined.
> >>>> 
> >>> 
> >>> Go through where? From processor to PCI?
> >>> Which spec do you refer to?
> >> 
> >> The PCI spec from memory :-) Byte swizzling for MMIO between a processor
> >> and PCI bus is well defined.
> > 
> > Couldn't find it in spec.
> > 
> >> Now of course, since issuing config space cycles tend to be host-bridge
> >> chip specific, everything is possible there :-) In -most- cases though,
> >> they use a mechanism similar to x86 using the cf8/cfc ports or
> >> equivalent mapped wherever the PIO region is mapped in MMIO space for
> >> non-x86 processors, and thus end up with the exact same byte swizzling.
> >> 
> >> In fact, this is true of accesses to PCI devices as well. Take for
> >> example, a device that has a 32-bit MMIO register. This register is
> >> meant to appear as little endian (well, unless the device itself plays
> >> tricks but most don't) whatever the host processor is. Thus an x86 host
> >> will need no byteswap but a powerpc host (assuming the ppc is running in
> >> BE mode) will.
> >> 
> >> This is why for example the base readl and writel function in Linux do
> >> byteswap on powerpc.
> >> 
> >> This is important to understand that this is a property of how the PCI
> >> bridge is connected to the host processor, such that the PCI "native"
> >> byte order is preserved along with address invariance for sub-32-bit
> >> quantities.
> >> 
> >> The endianness of the host bridge config space access register is thus
> >> most of the time just a natural side effect of said register being part
> >> of the bridge PIO space and thus getting the natural byteswap explained
> >> above for a 32-bit LE register on PCI.
> >> 
> >> Cheers,
> >> Ben.
> > 
> > So, it appears that this is not the case for many platforms: bridge
> > itself does a byteswap to make devices behind it work according to spec,
> > but this does not apply to programming bridge itself.
> > 
> > This seems common on BE platforms, this is why qemu has
> > ifdef TARGET_WORDS_BIGENDIAN there IIUC.
> 
> IIRC qemu's mmio functions just pass the register value the guest had at that moment to the mmio function.
> 
> While that is pretty simple, it means that we behave different from real hardware. Real hardware has 2 components:
> 
> 1) CPU
> 2) Device
> 
> The CPU sends an MMIO request in local byte order to the device. The device also has a native endianness - either BE or LE.
> So what the byte swap here does is simply converting from wrong LE (what Linux thought the device needs) to a proper variable.
> 
> I'm not sure what the correct solution to this is. I'm inclined to suggest that mmio callbacks should get called with tswap'ed values. But then again the code as is works in quite a lot of cases and I don't want to spend months getting it back to where it is just because of a cosmetic change.
> 
> Alex

If I understand correctly this is just a byteswap to emulate how host
bridge is connected, unrelated to local byte order.  By anyway, I agree:
let's not spend months on this.
Benjamin Herrenschmidt - Jan. 4, 2010, 11:33 p.m.
> So, it appears that this is not the case for many platforms: bridge
> itself does a byteswap to make devices behind it work according to spec,
> but this does not apply to programming bridge itself.
> 
> This seems common on BE platforms, this is why qemu has
> ifdef TARGET_WORDS_BIGENDIAN there IIUC.

Sadly, it is quite often that misleaded HW designers thing they are
doing SW a service by making something BE instead of LE on a BE
platform, but in the end just forcing everybody to special case :-)

It's hard to say what is the most common. All powerpc chrp have it LE
afaik, FSL tried to be smart (FAIL) and got their SoC config space
access reg BE instead. Apple stuff is just "special", etc...

In any case, it doesn't matter much as long as qemu gets normal MMIO
emulation right. Host bridge config space is chipset specific and as
such platform specific.

Cheers,
Ben.
Benjamin Herrenschmidt - Jan. 4, 2010, 11:39 p.m.
On Tue, 2010-01-05 at 00:08 +0100, Alexander Graf wrote:
> 
> IIRC qemu's mmio functions just pass the register value the guest had
> at that moment to the mmio function.

That means that qemu HW emulation needs, for each device, to add a layer
of byteswap depending on whether the CPU is LE or BE which sucks :-)

> While that is pretty simple, it means that we behave different from
> real hardware. Real hardware has 2 components:
> 
> 1) CPU
> 2) Device
> 
> The CPU sends an MMIO request in local byte order to the device. The
> device also has a native endianness - either BE or LE.
> So what the byte swap here does is simply converting from wrong LE
> (what Linux thought the device needs) to a proper variable.

Not quite. The CPU sends an MMIO request to a PCI host bridge. For
devices (and I'm not talking about the host bridge registers themselves,
those are on the CPU bus and could be wired in any fancy way), but for
PCI devices behind that bridge, you expect things to be wired in such a
way that the BE CPU MSB is wired to the PCI LSB.

IE. If a piece of C code writes 0xaabbccdd to a PCI device 32-bit
register using a native aligned access from the CPU of the form
*(unsigned int *)addr, the device should see 0xaabbccdd with a LE CPU
and 0xddccbbaa with a BE CPU :-) Really ! This is why writel() in linux
will byteswap on BE CPUs.

Ideally, QEMU should do the same swapping for all accesses over PCI in
order to avoid to CPU conditional byteswap tricks in the device
emulation proper.

Now, regarding the host bridge own registers, as I said above, if they
are legacy IO cf8/cfc (which is the case of most CHRPs for example) they
will be LE just the same way as above. Some SoCs like FSL do try to be
smart and get them BE though. But yeah, that is all specific to a given
host bridge.

> I'm not sure what the correct solution to this is. I'm inclined to
> suggest that mmio callbacks should get called with tswap'ed values.
> But then again the code as is works in quite a lot of cases and I
> don't want to spend months getting it back to where it is just because
> of a cosmetic change.

Well, I'd say that MMIO callbacks should continue being native for
anything on the CPU bus, -but- when crossing a bridge, they may need to
get some swapping done depending on what that bridge does. For example,
I would expect a powerpc/amba bridge to do a similar swapping as above
for a powerpc/pci bridge.

Cheers,
Ben.

> Alex

Patch

diff --git a/hw/gt64xxx.c b/hw/gt64xxx.c
index fb7f5bd..b399f0c 100644
--- a/hw/gt64xxx.c
+++ b/hw/gt64xxx.c
@@ -530,8 +530,7 @@  static void gt64120_writel (void *opaque, target_phys_addr_t addr,
     case GT_PCI0_CFGDATA:
         if (!(s->regs[GT_PCI0_CMD] & 1) && (s->pci->config_reg & 0x00fff800))
             val = bswap32(val);
-        if (s->pci->config_reg & (1u << 31))
-            pci_data_write(s->pci->bus, s->pci->config_reg, val, 4);
+        pci_data_write(s->pci, 0, val, 4);
         break;
 
     /* Interrupts */
@@ -768,10 +767,7 @@  static uint32_t gt64120_readl (void *opaque,
         val = s->pci->config_reg;
         break;
     case GT_PCI0_CFGDATA:
-        if (!(s->pci->config_reg & (1 << 31)))
-            val = 0xffffffff;
-        else
-            val = pci_data_read(s->pci->bus, s->pci->config_reg, 4);
+        val = pci_data_read(s->pci, 0, 4);
         if (!(s->regs[GT_PCI0_CMD] & 1) && (s->pci->config_reg & 0x00fff800))
             val = bswap32(val);
         break;
@@ -1119,6 +1115,7 @@  PCIBus *pci_gt64120_init(qemu_irq *pic)
 
     s = qemu_mallocz(sizeof(GT64120State));
     s->pci = qemu_mallocz(sizeof(GT64120PCIState));
+    pci_host_init(s->pci, NULL, NULL);
 
     s->pci->bus = pci_register_bus(NULL, "pci",
                                    pci_gt64120_set_irq, pci_gt64120_map_irq,
diff --git a/hw/pci_host.c b/hw/pci_host.c
index eeb8dee..6677175 100644
--- a/hw/pci_host.c
+++ b/hw/pci_host.c
@@ -40,6 +40,49 @@  do { printf("pci_host_data: " fmt , ## __VA_ARGS__); } while (0)
  */
 
 /* the helper functio to get a PCIDeice* for a given pci address */
+PCIDevice *pci_host_find_dev(PCIBus *bus, uint32_t addr)
+{
+    uint8_t bus_num = addr >> 16;
+    uint8_t devfn = addr >> 8;
+
+    return pci_find_device(bus, bus_num, PCI_SLOT(devfn), PCI_FUNC(devfn));
+}
+
+uint32_t pci_host_config_offset(uint32_t addr)
+{
+    return addr & (PCI_CONFIG_SPACE_SIZE - 1);
+}
+
+static uint32_t pci_host_pci_addr(PCIHostState *s, uint32_t addr)
+{
+    return s->config_reg | (addr & 3);
+}
+static PCIDevice *pci_host_find_dev_active(PCIBus *bus, uint32_t pci_addr)
+{
+    if (!(pci_addr & (1u << 31)))
+        return NULL;
+
+    return pci_host_find_dev(bus, pci_addr);
+}
+
+static PCIDevice *pci_host_dev_find_fn_noswap(PCIHostState *s, uint32_t addr)
+{
+    return pci_host_find_dev_active(s->bus, pci_host_pci_addr(s, addr));
+}
+
+static PCIDevice *pci_host_dev_find_fn(PCIHostState *s, uint32_t addr)
+{
+#ifdef TARGET_WORDS_BIGENDIAN
+    addr = bswap32(addr);
+#endif
+    return pci_host_find_dev_active(s->bus, pci_host_pci_addr(s, addr));
+}
+
+static uint32_t pci_host_config_offset_fn(PCIHostState *s, uint32_t addr)
+{
+    return pci_host_config_offset(pci_host_pci_addr(s, addr));
+}
+
 static inline PCIDevice *pci_dev_find_by_addr(PCIBus *bus, uint32_t addr)
 {
     uint8_t bus_num = addr >> 16;
@@ -48,10 +91,10 @@  static inline PCIDevice *pci_dev_find_by_addr(PCIBus *bus, uint32_t addr)
     return pci_find_device(bus, bus_num, PCI_SLOT(devfn), PCI_FUNC(devfn));
 }
 
-void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len)
+void pci_data_write(PCIHostState *s, uint32_t addr, uint32_t val, int len)
 {
-    PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
-    uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
+    PCIDevice *pci_dev = s->dev_find(s, addr);
+    uint32_t config_addr = s->config_offset(s, addr);
 
     if (!pci_dev)
         return;
@@ -61,10 +104,10 @@  void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len)
     pci_dev->config_write(pci_dev, config_addr, val, len);
 }
 
-uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len)
+uint32_t pci_data_read(PCIHostState *s, uint32_t addr, int len)
 {
-    PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
-    uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
+    PCIDevice *pci_dev = s->dev_find(s, addr);
+    uint32_t config_addr = s->config_offset(s, addr);
     uint32_t val;
 
     assert(len == 1 || len == 2 || len == 4);
@@ -79,14 +122,24 @@  uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len)
     return val;
 }
 
+void pci_host_init(PCIHostState *s,
+                   pci_dev_find_fn dev_find,
+                   pci_config_offset_fn config_offset)
+{
+    if (!dev_find)
+        dev_find = pci_host_dev_find_fn;
+    if (!config_offset)
+        config_offset = pci_host_config_offset_fn;
+
+    s->dev_find = dev_find;
+    s->config_offset = config_offset;
+}
+
 static void pci_host_config_writel(void *opaque, target_phys_addr_t addr,
                                    uint32_t val)
 {
     PCIHostState *s = opaque;
 
-#ifdef TARGET_WORDS_BIGENDIAN
-    val = bswap32(val);
-#endif
     PCI_DPRINTF("%s addr " TARGET_FMT_plx " val %"PRIx32"\n",
                 __func__, addr, val);
     s->config_reg = val;
@@ -97,9 +150,6 @@  static uint32_t pci_host_config_readl(void *opaque, target_phys_addr_t addr)
     PCIHostState *s = opaque;
     uint32_t val = s->config_reg;
 
-#ifdef TARGET_WORDS_BIGENDIAN
-    val = bswap32(val);
-#endif
     PCI_DPRINTF("%s addr " TARGET_FMT_plx " val %"PRIx32"\n",
                 __func__, addr, val);
     return val;
@@ -119,48 +169,16 @@  static CPUReadMemoryFunc * const pci_host_config_read[] = {
 
 int pci_host_conf_register_mmio(PCIHostState *s)
 {
+    pci_host_init(s, NULL, NULL);
     return cpu_register_io_memory(pci_host_config_read,
                                   pci_host_config_write, s);
 }
 
-static void pci_host_config_writel_noswap(void *opaque,
-                                          target_phys_addr_t addr,
-                                          uint32_t val)
-{
-    PCIHostState *s = opaque;
-
-    PCI_DPRINTF("%s addr " TARGET_FMT_plx " val %"PRIx32"\n",
-                __func__, addr, val);
-    s->config_reg = val;
-}
-
-static uint32_t pci_host_config_readl_noswap(void *opaque,
-                                             target_phys_addr_t addr)
-{
-    PCIHostState *s = opaque;
-    uint32_t val = s->config_reg;
-
-    PCI_DPRINTF("%s addr " TARGET_FMT_plx " val %"PRIx32"\n",
-                __func__, addr, val);
-    return val;
-}
-
-static CPUWriteMemoryFunc * const pci_host_config_write_noswap[] = {
-    &pci_host_config_writel_noswap,
-    &pci_host_config_writel_noswap,
-    &pci_host_config_writel_noswap,
-};
-
-static CPUReadMemoryFunc * const pci_host_config_read_noswap[] = {
-    &pci_host_config_readl_noswap,
-    &pci_host_config_readl_noswap,
-    &pci_host_config_readl_noswap,
-};
-
 int pci_host_conf_register_mmio_noswap(PCIHostState *s)
 {
-    return cpu_register_io_memory(pci_host_config_read_noswap,
-                                  pci_host_config_write_noswap, s);
+    pci_host_init(s, pci_host_dev_find_fn_noswap, NULL);
+    return cpu_register_io_memory(pci_host_config_read,
+                                  pci_host_config_write, s);
 }
 
 static void pci_host_config_writel_ioport(void *opaque,
@@ -183,6 +201,7 @@  static uint32_t pci_host_config_readl_ioport(void *opaque, uint32_t addr)
 
 void pci_host_conf_register_ioport(pio_addr_t ioport, PCIHostState *s)
 {
+    pci_host_init(s, NULL, NULL);
     register_ioport_write(ioport, 4, 4, pci_host_config_writel_ioport, s);
     register_ioport_read(ioport, 4, 4, pci_host_config_readl_ioport, s);
 }
diff --git a/hw/pci_host.h b/hw/pci_host.h
index a006687..47c339a 100644
--- a/hw/pci_host.h
+++ b/hw/pci_host.h
@@ -30,14 +30,27 @@ 
 
 #include "sysbus.h"
 
+typedef PCIDevice *(*pci_dev_find_fn)(PCIHostState *s, uint32_t addr);
+typedef uint32_t (*pci_config_offset_fn)(PCIHostState *s, uint32_t addr);
+
 struct PCIHostState {
     SysBusDevice busdev;
     uint32_t config_reg;
     PCIBus *bus;
+
+    pci_dev_find_fn dev_find;
+    pci_config_offset_fn config_offset;
 };
 
-void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len);
-uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len);
+void pci_host_init(PCIHostState *s,
+                   pci_dev_find_fn dev_find,
+                   pci_config_offset_fn config_offset);
+
+PCIDevice *pci_host_find_dev(PCIBus *bus, uint32_t addr);
+uint32_t pci_host_config_offset(uint32_t addr);
+
+void pci_data_write(PCIHostState *s, uint32_t addr, uint32_t val, int len);
+uint32_t pci_data_read(PCIHostState *s, uint32_t addr, int len);
 
 /* for mmio */
 int pci_host_conf_register_mmio(PCIHostState *s);
diff --git a/hw/pci_host_template.h b/hw/pci_host_template.h
index 11e6c88..24a03e0 100644
--- a/hw/pci_host_template.h
+++ b/hw/pci_host_template.h
@@ -32,8 +32,7 @@  static void glue(pci_host_data_writeb, PCI_HOST_SUFFIX)(
 
     PCI_DPRINTF("writeb addr " TARGET_FMT_plx " val %x\n",
                 (target_phys_addr_t)addr, val);
-    if (s->config_reg & (1u << 31))
-        pci_data_write(s->bus, s->config_reg | (addr & 3), val, 1);
+    pci_data_write(s, addr, val, 1);
 }
 
 static void glue(pci_host_data_writew, PCI_HOST_SUFFIX)(
@@ -45,8 +44,7 @@  static void glue(pci_host_data_writew, PCI_HOST_SUFFIX)(
 #endif
     PCI_DPRINTF("writew addr " TARGET_FMT_plx " val %x\n",
                 (target_phys_addr_t)addr, val);
-    if (s->config_reg & (1u << 31))
-        pci_data_write(s->bus, s->config_reg | (addr & 3), val, 2);
+    pci_data_write(s, addr, val, 2);
 }
 
 static void glue(pci_host_data_writel, PCI_HOST_SUFFIX)(
@@ -58,8 +56,7 @@  static void glue(pci_host_data_writel, PCI_HOST_SUFFIX)(
 #endif
     PCI_DPRINTF("writel addr " TARGET_FMT_plx " val %x\n",
                 (target_phys_addr_t)addr, val);
-    if (s->config_reg & (1u << 31))
-        pci_data_write(s->bus, s->config_reg, val, 4);
+    pci_data_write(s, addr, val, 4);
 }
 
 static uint32_t glue(pci_host_data_readb, PCI_HOST_SUFFIX)(
@@ -68,9 +65,7 @@  static uint32_t glue(pci_host_data_readb, PCI_HOST_SUFFIX)(
     PCIHostState *s = opaque;
     uint32_t val;
 
-    if (!(s->config_reg & (1 << 31)))
-        return 0xff;
-    val = pci_data_read(s->bus, s->config_reg | (addr & 3), 1);
+    val = pci_data_read(s, addr, 1);
     PCI_DPRINTF("readb addr " TARGET_FMT_plx " val %x\n",
                 (target_phys_addr_t)addr, val);
     return val;
@@ -81,9 +76,7 @@  static uint32_t glue(pci_host_data_readw, PCI_HOST_SUFFIX)(
 {
     PCIHostState *s = opaque;
     uint32_t val;
-    if (!(s->config_reg & (1 << 31)))
-        return 0xffff;
-    val = pci_data_read(s->bus, s->config_reg | (addr & 3), 2);
+    val = pci_data_read(s, addr, 2);
     PCI_DPRINTF("readw addr " TARGET_FMT_plx " val %x\n",
                 (target_phys_addr_t)addr, val);
 #ifdef TARGET_WORDS_BIGENDIAN
@@ -97,9 +90,7 @@  static uint32_t glue(pci_host_data_readl, PCI_HOST_SUFFIX)(
 {
     PCIHostState *s = opaque;
     uint32_t val;
-    if (!(s->config_reg & (1 << 31)))
-        return 0xffffffff;
-    val = pci_data_read(s->bus, s->config_reg | (addr & 3), 4);
+    val = pci_data_read(s, addr, 4);
     PCI_DPRINTF("readl addr " TARGET_FMT_plx " val %x\n",
                 (target_phys_addr_t)addr, val);
 #ifdef TARGET_WORDS_BIGENDIAN
diff --git a/hw/prep_pci.c b/hw/prep_pci.c
index 19f028c..8640b2e 100644
--- a/hw/prep_pci.c
+++ b/hw/prep_pci.c
@@ -40,72 +40,16 @@  static inline uint32_t PPC_PCIIO_config(target_phys_addr_t addr)
     return (addr & 0x7ff) |  (i << 11);
 }
 
-static void PPC_PCIIO_writeb (void *opaque, target_phys_addr_t addr, uint32_t val)
+static PCIDevice *PPC_dev_find_fn(PCIHostState *s, uint32_t addr)
 {
-    PREPPCIState *s = opaque;
-    pci_data_write(s->bus, PPC_PCIIO_config(addr), val, 1);
+    return pci_host_find_dev(s->bus, PPC_PCIIO_config(addr));
 }
 
-static void PPC_PCIIO_writew (void *opaque, target_phys_addr_t addr, uint32_t val)
+static uint32_t PPC_config_offset_fn(PCIHostState *s, uint32_t addr)
 {
-    PREPPCIState *s = opaque;
-#ifdef TARGET_WORDS_BIGENDIAN
-    val = bswap16(val);
-#endif
-    pci_data_write(s->bus, PPC_PCIIO_config(addr), val, 2);
+    return pci_host_config_offset(PPC_PCIIO_config(addr));
 }
 
-static void PPC_PCIIO_writel (void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-    PREPPCIState *s = opaque;
-#ifdef TARGET_WORDS_BIGENDIAN
-    val = bswap32(val);
-#endif
-    pci_data_write(s->bus, PPC_PCIIO_config(addr), val, 4);
-}
-
-static uint32_t PPC_PCIIO_readb (void *opaque, target_phys_addr_t addr)
-{
-    PREPPCIState *s = opaque;
-    uint32_t val;
-    val = pci_data_read(s->bus, PPC_PCIIO_config(addr), 1);
-    return val;
-}
-
-static uint32_t PPC_PCIIO_readw (void *opaque, target_phys_addr_t addr)
-{
-    PREPPCIState *s = opaque;
-    uint32_t val;
-    val = pci_data_read(s->bus, PPC_PCIIO_config(addr), 2);
-#ifdef TARGET_WORDS_BIGENDIAN
-    val = bswap16(val);
-#endif
-    return val;
-}
-
-static uint32_t PPC_PCIIO_readl (void *opaque, target_phys_addr_t addr)
-{
-    PREPPCIState *s = opaque;
-    uint32_t val;
-    val = pci_data_read(s->bus, PPC_PCIIO_config(addr), 4);
-#ifdef TARGET_WORDS_BIGENDIAN
-    val = bswap32(val);
-#endif
-    return val;
-}
-
-static CPUWriteMemoryFunc * const PPC_PCIIO_write[] = {
-    &PPC_PCIIO_writeb,
-    &PPC_PCIIO_writew,
-    &PPC_PCIIO_writel,
-};
-
-static CPUReadMemoryFunc * const PPC_PCIIO_read[] = {
-    &PPC_PCIIO_readb,
-    &PPC_PCIIO_readw,
-    &PPC_PCIIO_readl,
-};
-
 static int prep_map_irq(PCIDevice *pci_dev, int irq_num)
 {
     return (irq_num + (pci_dev->devfn >> 3)) & 1;
@@ -132,8 +76,8 @@  PCIBus *pci_prep_init(qemu_irq *pic)
 
     pci_host_data_register_ioport(0xcfc, s);
 
-    PPC_io_memory = cpu_register_io_memory(PPC_PCIIO_read,
-                                           PPC_PCIIO_write, s);
+    pci_host_init(s, PPC_dev_find_fn, PPC_config_offset_fn);
+    PPC_io_memory = pci_host_data_register_mmio(s);
     cpu_register_physical_memory(0x80800000, 0x00400000, PPC_io_memory);
 
     /* PCI host bridge */
diff --git a/hw/sh_pci.c b/hw/sh_pci.c
index abe4c75..209b92e 100644
--- a/hw/sh_pci.c
+++ b/hw/sh_pci.c
@@ -29,13 +29,37 @@ 
 #include "bswap.h"
 
 typedef struct {
-    PCIBus *bus;
+    PCIHostState pci_state;
     PCIDevice *dev;
-    uint32_t par;
     uint32_t mbr;
     uint32_t iobr;
+    uint32_t par;
 } SHPCIC;
 
+static void sh_pci_data_write(SHPCIC *p, uint32_t val)
+{
+    uint32_t addr = p->par;
+    PCIDevice *dev = pci_host_find_dev(p->pci_state.bus, addr);
+    uint32_t config_offset = pci_host_config_offset(addr);
+
+    if (!dev) {
+        return;
+    }
+    dev->config_write(dev, config_offset, val, 4);
+}
+
+static uint32_t sh_pci_data_read(SHPCIC *p)
+{
+    uint32_t addr = p->par;
+    PCIDevice *dev = pci_host_find_dev(p->pci_state.bus, addr);
+    uint32_t config_offset = pci_host_config_offset(addr);
+
+    if (!dev) {
+        return ~(uint32_t)0;
+    }
+    return dev->config_read(dev, config_offset, 4);
+}
+
 static void sh_pci_reg_write (void *p, target_phys_addr_t addr, uint32_t val)
 {
     SHPCIC *pcic = p;
@@ -53,7 +77,7 @@  static void sh_pci_reg_write (void *p, target_phys_addr_t addr, uint32_t val)
         pcic->iobr = val;
         break;
     case 0x220:
-        pci_data_write(pcic->bus, pcic->par, val, 4);
+        sh_pci_data_write(pcic, val);
         break;
     }
 }
@@ -67,51 +91,21 @@  static uint32_t sh_pci_reg_read (void *p, target_phys_addr_t addr)
     case 0x1c0:
         return pcic->par;
     case 0x220:
-        return pci_data_read(pcic->bus, pcic->par, 4);
+        return sh_pci_data_read(pcic);
     }
     return 0;
 }
 
-static void sh_pci_data_write (SHPCIC *pcic, target_phys_addr_t addr,
-                               uint32_t val, int size)
-{
-    pci_data_write(pcic->bus, addr + pcic->mbr, val, size);
-}
-
-static uint32_t sh_pci_mem_read (SHPCIC *pcic, target_phys_addr_t addr,
-                                 int size)
-{
-    return pci_data_read(pcic->bus, addr + pcic->mbr, size);
-}
-
-static void sh_pci_writeb (void *p, target_phys_addr_t addr, uint32_t val)
-{
-    sh_pci_data_write(p, addr, val, 1);
-}
-
-static void sh_pci_writew (void *p, target_phys_addr_t addr, uint32_t val)
-{
-    sh_pci_data_write(p, addr, val, 2);
-}
-
-static void sh_pci_writel (void *p, target_phys_addr_t addr, uint32_t val)
+static PCIDevice *sh_pci_dev_find_fn(PCIHostState *s, uint32_t addr)
 {
-    sh_pci_data_write(p, addr, val, 4);
+    SHPCIC *pcic = DO_UPCAST(SHPCIC, pci_state, s);
+    return pci_host_find_dev(pcic->pci_state.bus, addr + pcic->mbr);
 }
 
-static uint32_t sh_pci_readb (void *p, target_phys_addr_t addr)
+static uint32_t sh_pci_config_offset_fn(PCIHostState *s, uint32_t addr)
 {
-    return sh_pci_mem_read(p, addr, 1);
-}
-
-static uint32_t sh_pci_readw (void *p, target_phys_addr_t addr)
-{
-    return sh_pci_mem_read(p, addr, 2);
-}
-
-static uint32_t sh_pci_readl (void *p, target_phys_addr_t addr)
-{
-    return sh_pci_mem_read(p, addr, 4);
+    SHPCIC *pcic = DO_UPCAST(SHPCIC, pci_state, s);
+    return pci_host_config_offset(addr + pcic->mbr);
 }
 
 static int sh_pci_addr2port(SHPCIC *pcic, target_phys_addr_t addr)
@@ -159,11 +153,6 @@  static MemOp sh_pci_reg = {
     { NULL, NULL, sh_pci_reg_write },
 };
 
-static MemOp sh_pci_mem = {
-    { sh_pci_readb, sh_pci_readw, sh_pci_readl },
-    { sh_pci_writeb, sh_pci_writew, sh_pci_writel },
-};
-
 static MemOp sh_pci_iop = {
     { sh_pci_inb, sh_pci_inw, sh_pci_inl },
     { sh_pci_outb, sh_pci_outw, sh_pci_outl },
@@ -176,14 +165,15 @@  PCIBus *sh_pci_register_bus(pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
     int mem, reg, iop;
 
     p = qemu_mallocz(sizeof(SHPCIC));
-    p->bus = pci_register_bus(NULL, "pci",
-                              set_irq, map_irq, opaque, devfn_min, nirq);
+    p->pci_state.bus = pci_register_bus(NULL, "pci", set_irq, map_irq,
+                                        opaque, devfn_min, nirq);
 
-    p->dev = pci_register_device(p->bus, "SH PCIC", sizeof(PCIDevice),
-                                 -1, NULL, NULL);
+    p->dev = pci_register_device(p->pci_state.bus, "SH PCIC",
+                                 sizeof(PCIDevice), -1, NULL, NULL);
     reg = cpu_register_io_memory(sh_pci_reg.r, sh_pci_reg.w, p);
     iop = cpu_register_io_memory(sh_pci_iop.r, sh_pci_iop.w, p);
-    mem = cpu_register_io_memory(sh_pci_mem.r, sh_pci_mem.w, p);
+    pci_host_init(&p->pci_state, sh_pci_dev_find_fn, sh_pci_config_offset_fn);
+    mem = pci_host_data_register_mmio(&p->pci_state);
     cpu_register_physical_memory(0x1e200000, 0x224, reg);
     cpu_register_physical_memory(0x1e240000, 0x40000, iop);
     cpu_register_physical_memory(0x1d000000, 0x1000000, mem);
@@ -198,5 +188,5 @@  PCIBus *sh_pci_register_bus(pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
     p->dev->config[0x06] = 0x90;
     p->dev->config[0x07] = 0x02;
 
-    return p->bus;
+    return p->pci_state.bus;
 }
diff --git a/hw/versatile_pci.c b/hw/versatile_pci.c
index 153c651..31bc1cd 100644
--- a/hw/versatile_pci.c
+++ b/hw/versatile_pci.c
@@ -12,7 +12,7 @@ 
 #include "pci_host.h"
 
 typedef struct {
-    SysBusDevice busdev;
+    PCIHostState pci_state;
     qemu_irq irq[4];
     int realview;
     int mem_config;
@@ -23,69 +23,16 @@  static inline uint32_t vpb_pci_config_addr(target_phys_addr_t addr)
     return addr & 0xffffff;
 }
 
-static void pci_vpb_config_writeb (void *opaque, target_phys_addr_t addr,
-                                   uint32_t val)
+static PCIDevice *pci_vpb_dev_find_fn(PCIHostState *s, uint32_t addr)
 {
-    pci_data_write(opaque, vpb_pci_config_addr (addr), val, 1);
+    return pci_host_find_dev(s->bus, vpb_pci_config_addr(addr));
 }
 
-static void pci_vpb_config_writew (void *opaque, target_phys_addr_t addr,
-                                   uint32_t val)
+static uint32_t pci_vpb_config_offset_fn(PCIHostState *s, uint32_t addr)
 {
-#ifdef TARGET_WORDS_BIGENDIAN
-    val = bswap16(val);
-#endif
-    pci_data_write(opaque, vpb_pci_config_addr (addr), val, 2);
+    return pci_host_config_offset(vpb_pci_config_addr(addr));
 }
 
-static void pci_vpb_config_writel (void *opaque, target_phys_addr_t addr,
-                                   uint32_t val)
-{
-#ifdef TARGET_WORDS_BIGENDIAN
-    val = bswap32(val);
-#endif
-    pci_data_write(opaque, vpb_pci_config_addr (addr), val, 4);
-}
-
-static uint32_t pci_vpb_config_readb (void *opaque, target_phys_addr_t addr)
-{
-    uint32_t val;
-    val = pci_data_read(opaque, vpb_pci_config_addr (addr), 1);
-    return val;
-}
-
-static uint32_t pci_vpb_config_readw (void *opaque, target_phys_addr_t addr)
-{
-    uint32_t val;
-    val = pci_data_read(opaque, vpb_pci_config_addr (addr), 2);
-#ifdef TARGET_WORDS_BIGENDIAN
-    val = bswap16(val);
-#endif
-    return val;
-}
-
-static uint32_t pci_vpb_config_readl (void *opaque, target_phys_addr_t addr)
-{
-    uint32_t val;
-    val = pci_data_read(opaque, vpb_pci_config_addr (addr), 4);
-#ifdef TARGET_WORDS_BIGENDIAN
-    val = bswap32(val);
-#endif
-    return val;
-}
-
-static CPUWriteMemoryFunc * const pci_vpb_config_write[] = {
-    &pci_vpb_config_writeb,
-    &pci_vpb_config_writew,
-    &pci_vpb_config_writel,
-};
-
-static CPUReadMemoryFunc * const pci_vpb_config_read[] = {
-    &pci_vpb_config_readb,
-    &pci_vpb_config_readw,
-    &pci_vpb_config_readl,
-};
-
 static int pci_vpb_map_irq(PCIDevice *d, int irq_num)
 {
     return irq_num;
@@ -114,7 +61,8 @@  static void pci_vpb_map(SysBusDevice *dev, target_phys_addr_t base)
 
 static int pci_vpb_init(SysBusDevice *dev)
 {
-    PCIVPBState *s = FROM_SYSBUS(PCIVPBState, dev);
+    PCIHostState *pci_state = FROM_SYSBUS(PCIHostState, dev);
+    PCIVPBState *s = DO_UPCAST(PCIVPBState, pci_state, pci_state);
     PCIBus *bus;
     int i;
 
@@ -127,8 +75,9 @@  static int pci_vpb_init(SysBusDevice *dev)
 
     /* ??? Register memory space.  */
 
-    s->mem_config = cpu_register_io_memory(pci_vpb_config_read,
-                                           pci_vpb_config_write, bus);
+    pci_host_init(&s->pci_state,
+                  pci_vpb_dev_find_fn, pci_vpb_config_offset_fn);
+    s->mem_config = pci_host_data_register_mmio(&s->pci_state);
     sysbus_init_mmio_cb(dev, 0x04000000, pci_vpb_map);
 
     pci_create_simple(bus, -1, "versatile_pci_host");
@@ -137,7 +86,8 @@  static int pci_vpb_init(SysBusDevice *dev)
 
 static int pci_realview_init(SysBusDevice *dev)
 {
-    PCIVPBState *s = FROM_SYSBUS(PCIVPBState, dev);
+    PCIHostState *pci_state = FROM_SYSBUS(PCIHostState, dev);
+    PCIVPBState *s = DO_UPCAST(PCIVPBState, pci_state, pci_state);
     s->realview = 1;
     return pci_vpb_init(dev);
 }