diff mbox

[2/3] PCI: ARM: add support for virtual PCI host controller

Message ID 1391532784-1953-3-git-send-email-will.deacon@arm.com
State Changes Requested
Headers show

Commit Message

Will Deacon Feb. 4, 2014, 4:53 p.m. UTC
This patch adds support for an extremely simple virtual PCI host
controller. The controller itself has no configuration registers, and
has its address spaces described entirely by the device-tree (using the
bindings described by ePAPR). This allows emulations, such as kvmtool,
to provide a simple means for a guest Linux instance to make use of
PCI devices.

Corresponding documentation is added for the DT binding.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 .../devicetree/bindings/pci/linux,pci-virt.txt     |  38 ++++
 drivers/pci/host/Kconfig                           |   7 +
 drivers/pci/host/Makefile                          |   1 +
 drivers/pci/host/pci-virt.c                        | 200 +++++++++++++++++++++
 4 files changed, 246 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/linux,pci-virt.txt
 create mode 100644 drivers/pci/host/pci-virt.c

Comments

Arnd Bergmann Feb. 4, 2014, 7:13 p.m. UTC | #1
On Tuesday 04 February 2014 16:53:03 Will Deacon wrote:
> +
> +- ranges         : As described in IEEE Std 1275-1994, but must provide
> +                   at least a definition of the Configuration Space plus
> +                   one or both of IO and Memory Space.
> +

I might need to reread the spec, but I think the config space is not
actually supposed to be in the 'ranges' of the host bridge at all,
and it should just be listed in the 'reg'.

IIRC the reason why the config space is part of the three-cell address
is so that you can have funky ways to say "memory space of the device
with bus/dev/fn is actually translated to address X rather then Y".

It's too late to change that for the other drivers now, after the
binding is established.

> +Configuration Space is assumed to be memory-mapped (as opposed to being
> +accessed via an ioport) and laid out with a direct correspondence to the
> +geography of a PCI bus address, by concatenating the various components
> +to form a 24-bit offset:
> +
> +        cfg_offset(bus, device, function, register) =
> +                bus << 16 | device << 11 | function << 8 | register

This won't allow extended config space. Why not just do the
regular mmconfig layout and make this:

	cfg_offset(bus, device, function, register) =
		bus << 20 | device << 15 | function << 12 | register;

> +static int virt_pci_setup(int nr, struct pci_sys_data *sys)
> +{
> +	struct virt_pci *pci = sys->private_data;
> +
> +	if (resource_type(&pci->io)) {
> +		pci_add_resource(&sys->resources, &pci->io);
> +		pci_ioremap_io(nr * resource_size(&pci->io), pci->io.start);
> +	}

This should really compute an io_offset.

> +	if (resource_type(&pci->mem))
> +		pci_add_resource(&sys->resources, &pci->mem);

and also a mem_offset, which is something different.

> +	pci->cfg_base = devm_ioremap_resource(pci->dev, &pci->cfg);
> +	return !IS_ERR(pci->cfg_base);
> +}
> +
> +static const struct of_device_id virt_pci_of_match[] = {
> +	{ .compatible = "linux,pci-virt" },
> +	{ },
> +};
> +MODULE_DEVICE_TABLE(of, virt_pci_of_match);

I don't think we even want "virt" in the compatible string. The
binding should be generic enough that it can actually work with
real hardware.

> +	for_each_of_pci_range(&parser, &range) {
> +		u32 restype = range.flags & IORESOURCE_TYPE_BITS;
> +
> +		switch (restype) {
> +		case IORESOURCE_IO:
> +			if (resource_type(&pci->io))
> +				dev_warn(dev,
> +					 "ignoring additional io resource\n");
> +			else
> +				of_pci_range_to_resource(&range, np, &pci->io);
> +			break;
> +		case IORESOURCE_MEM:
> +			if (resource_type(&pci->mem))
> +				dev_warn(dev,
> +					 "ignoring additional mem resource\n");
> +			else
> +				of_pci_range_to_resource(&range, np, &pci->mem);
> +			break;

This shows once more that the range parser code is suboptimal. So far
every single driver got the I/O space wrong here, because the obvious
way to write this function is also completely wrong.

What you get out of "of_pci_range_to_resource(&range, np, &pci->io)"
is not the resource you want to pass into pci_add_resource()
later.

> +	memset(&hw, 0, sizeof(hw));
> +	hw.nr_controllers	= 1;
> +	hw.private_data		= (void **)&pci;
> +	hw.setup		= virt_pci_setup;
> +	hw.map_irq		= of_irq_parse_and_map_pci;
> +	hw.ops			= &virt_pci_ops;
> +	pci_common_init_dev(dev, &hw);

Since most fields here are constant, I'd just write this as

	struct hw_pci hw = {
		.nr_controllers = 1,
		.setup = virt_pci_setup,
		...
	};

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Deacon Feb. 5, 2014, 7:09 p.m. UTC | #2
Hi Arnd,

Thanks for having a look.

On Tue, Feb 04, 2014 at 07:13:49PM +0000, Arnd Bergmann wrote:
> On Tuesday 04 February 2014 16:53:03 Will Deacon wrote:
> > +
> > +- ranges         : As described in IEEE Std 1275-1994, but must provide
> > +                   at least a definition of the Configuration Space plus
> > +                   one or both of IO and Memory Space.
> > +
> 
> I might need to reread the spec, but I think the config space is not
> actually supposed to be in the 'ranges' of the host bridge at all,
> and it should just be listed in the 'reg'.

This wasn't at all clear to me (I listed it in the cover-letter as being
something to sort out).

> IIRC the reason why the config space is part of the three-cell address
> is so that you can have funky ways to say "memory space of the device
> with bus/dev/fn is actually translated to address X rather then Y".
> 
> It's too late to change that for the other drivers now, after the
> binding is established.

The spec is based on the idea that open-firmware enumerates your entire PCI
bus topology, then provides the Conriguation Space address for each device
using a reg property.

Since:

  (a) This doesn't match what we're planning to support
  (b) Runs the risk of making the "reg" encoding something specific to this
      driver
  (c) Doesn't match how we describe Memory and IO Spaces
  (d) There is already precendence in mainline

I chose to use "ranges" instead.

Now, if "reg" is definitely the correct thing to do, is it simply a matter
of putting the Configuration Space base address in there, or do we also need
to do the rest of what ePAPR says (expansion ROM details, ...)? I don't like
the idea of enumerating the entire bus in the DT when we don't need to.

> > +Configuration Space is assumed to be memory-mapped (as opposed to being
> > +accessed via an ioport) and laid out with a direct correspondence to the
> > +geography of a PCI bus address, by concatenating the various components
> > +to form a 24-bit offset:
> > +
> > +        cfg_offset(bus, device, function, register) =
> > +                bus << 16 | device << 11 | function << 8 | register
> 
> This won't allow extended config space. Why not just do the
> regular mmconfig layout and make this:
> 
> 	cfg_offset(bus, device, function, register) =
> 		bus << 20 | device << 15 | function << 12 | register;

Is it worth adding a DT property to support both, or is ECAM the only thing
to care about? I'm happy either way, although I'll need to hack kvmtool to
use the new scheme.

> > +static int virt_pci_setup(int nr, struct pci_sys_data *sys)
> > +{
> > +	struct virt_pci *pci = sys->private_data;
> > +
> > +	if (resource_type(&pci->io)) {
> > +		pci_add_resource(&sys->resources, &pci->io);
> > +		pci_ioremap_io(nr * resource_size(&pci->io), pci->io.start);
> > +	}
> 
> This should really compute an io_offset.
> 
> > +	if (resource_type(&pci->mem))
> > +		pci_add_resource(&sys->resources, &pci->mem);
> 
> and also a mem_offset, which is something different.

As somebody new to PCI, I'm afraid you've lost me here. Are you referring to
using pci_add_resource_offset instead, then removing my restriction on
having a single resource from the parsing code?

> > +	pci->cfg_base = devm_ioremap_resource(pci->dev, &pci->cfg);
> > +	return !IS_ERR(pci->cfg_base);
> > +}
> > +
> > +static const struct of_device_id virt_pci_of_match[] = {
> > +	{ .compatible = "linux,pci-virt" },
> > +	{ },
> > +};
> > +MODULE_DEVICE_TABLE(of, virt_pci_of_match);
> 
> I don't think we even want "virt" in the compatible string. The
> binding should be generic enough that it can actually work with
> real hardware.

Sure. How about "arm,pci-generic" ? Alternatives are
"arm,pcie-generic" or "linux,pci-generic".

> > +	for_each_of_pci_range(&parser, &range) {
> > +		u32 restype = range.flags & IORESOURCE_TYPE_BITS;
> > +
> > +		switch (restype) {
> > +		case IORESOURCE_IO:
> > +			if (resource_type(&pci->io))
> > +				dev_warn(dev,
> > +					 "ignoring additional io resource\n");
> > +			else
> > +				of_pci_range_to_resource(&range, np, &pci->io);
> > +			break;
> > +		case IORESOURCE_MEM:
> > +			if (resource_type(&pci->mem))
> > +				dev_warn(dev,
> > +					 "ignoring additional mem resource\n");
> > +			else
> > +				of_pci_range_to_resource(&range, np, &pci->mem);
> > +			break;
> 
> This shows once more that the range parser code is suboptimal. So far
> every single driver got the I/O space wrong here, because the obvious
> way to write this function is also completely wrong.

I see you mentioned to Liviu that you should register a logical resource,
rather than physical resource returned from the parser. It seems odd that
I/O space appears to work with this code as-is (I've tested it on arm using
kvmtool by removing the memory bars).

> What you get out of "of_pci_range_to_resource(&range, np, &pci->io)"
> is not the resource you want to pass into pci_add_resource()
> later.

Do I need to open-code the resource translation from phys -> logical?

> 
> > +	memset(&hw, 0, sizeof(hw));
> > +	hw.nr_controllers	= 1;
> > +	hw.private_data		= (void **)&pci;
> > +	hw.setup		= virt_pci_setup;
> > +	hw.map_irq		= of_irq_parse_and_map_pci;
> > +	hw.ops			= &virt_pci_ops;
> > +	pci_common_init_dev(dev, &hw);
> 
> Since most fields here are constant, I'd just write this as
> 
> 	struct hw_pci hw = {
> 		.nr_controllers = 1,
> 		.setup = virt_pci_setup,
> 		...
> 	};

Can do.

Thanks,

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Feb. 5, 2014, 7:27 p.m. UTC | #3
On Wed, Feb 05, 2014 at 07:09:47PM +0000, Will Deacon wrote:
> Hi Arnd,
> 
> Thanks for having a look.
> 
> On Tue, Feb 04, 2014 at 07:13:49PM +0000, Arnd Bergmann wrote:
> > On Tuesday 04 February 2014 16:53:03 Will Deacon wrote:
> > > +
> > > +- ranges         : As described in IEEE Std 1275-1994, but must provide
> > > +                   at least a definition of the Configuration Space plus
> > > +                   one or both of IO and Memory Space.
> > > +
> > 
> > I might need to reread the spec, but I think the config space is not
> > actually supposed to be in the 'ranges' of the host bridge at all,
> > and it should just be listed in the 'reg'.
> 
> This wasn't at all clear to me (I listed it in the cover-letter as being
> something to sort out).

When we talked about this earlier on the DT bindings list the
consensus seemed to be that configuration MMIO ranges should only be
used if the underlying memory was exactly ECAM, and was not to be used
for random configuration related register blocks.

The rational being that generic code, upon seeing that ranges entry,
could just go ahead and assume ECAM mapping.

> Now, if "reg" is definitely the correct thing to do, is it simply a matter
> of putting the Configuration Space base address in there, or do we also need
> to do the rest of what ePAPR says (expansion ROM details, ...)? I don't like
> the idea of enumerating the entire bus in the DT when we don't need to.

If you use 'reg' then it is a private base address to your driver and
you can do whatever you want with it.

Most of the ePAPR things are only needed if the FW is going to
communicate detailed information to the OS, for an environment that
relies on Linux resource assignment and discovery you can just ignore
all that.

> > This won't allow extended config space. Why not just do the
> > regular mmconfig layout and make this:
> > 
> > 	cfg_offset(bus, device, function, register) =
> > 		bus << 20 | device << 15 | function << 12 | register;
> 
> Is it worth adding a DT property to support both, or is ECAM the only thing
> to care about? I'm happy either way, although I'll need to hack kvmtool to
> use the new scheme.

If you use ECAM then I wonder if your driver might be a generic SBSA
driver too?

I can't think of a reason to support alternate MMIO layouts if
kvmtool is the only user and can be changed.

> > I don't think we even want "virt" in the compatible string. The
> > binding should be generic enough that it can actually work with
> > real hardware.
> 
> Sure. How about "arm,pci-generic" ? Alternatives are
> "arm,pcie-generic" or "linux,pci-generic".

arm,pci-ecam-generic ?

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Maydell Feb. 5, 2014, 7:41 p.m. UTC | #4
On 5 February 2014 19:27, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> I can't think of a reason to support alternate MMIO layouts if
> kvmtool is the only user and can be changed.

I expect we'll get a QEMU implementation too at some point,
but we should be able to just make that match whatever you
decide is the correct layout and behaviour.

The only constraint QEMU has which kvmtool doesn't is that
we care about not randomly shuffling things around between
QEMU versions, so whatever we pick we're more or less
stuck with. So a layout which leaves scope for "supports
PCI now, may do PCI-e later, MSI after that" would be good.
(My PCI knowledge is very piecemeal, I hope that makes sense...)

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Feb. 5, 2014, 8:26 p.m. UTC | #5
On Wednesday 05 February 2014 19:09:47 Will Deacon wrote:

> On Tue, Feb 04, 2014 at 07:13:49PM +0000, Arnd Bergmann wrote:
> > On Tuesday 04 February 2014 16:53:03 Will Deacon wrote:
> Now, if "reg" is definitely the correct thing to do, is it simply a matter
> of putting the Configuration Space base address in there, or do we also need
> to do the rest of what ePAPR says (expansion ROM details, ...)? I don't like
> the idea of enumerating the entire bus in the DT when we don't need to.

You won't have to worry about the per-device stuff, aside from that,
do what Jason says ;-)
 
> > > +Configuration Space is assumed to be memory-mapped (as opposed to being
> > > +accessed via an ioport) and laid out with a direct correspondence to the
> > > +geography of a PCI bus address, by concatenating the various components
> > > +to form a 24-bit offset:
> > > +
> > > +        cfg_offset(bus, device, function, register) =
> > > +                bus << 16 | device << 11 | function << 8 | register
> > 
> > This won't allow extended config space. Why not just do the
> > regular mmconfig layout and make this:
> > 
> > 	cfg_offset(bus, device, function, register) =
> > 		bus << 20 | device << 15 | function << 12 | register;
> 
> Is it worth adding a DT property to support both, or is ECAM the only thing
> to care about? I'm happy either way, although I'll need to hack kvmtool to
> use the new scheme.

For any 64-bit system, ECAM is the only thing we need. On 32-bit
systems, we can't just map the entire config space though, since
that would require 256MB of vmalloc space. Ideally the driver is
smart enough to map only the space for the present buses (1MB
per bus), which I think is what x86 does, or one page per
PCI function that is present, but that can be tricky for hotplugging.

> > > +static int virt_pci_setup(int nr, struct pci_sys_data *sys)
> > > +{
> > > +	struct virt_pci *pci = sys->private_data;
> > > +
> > > +	if (resource_type(&pci->io)) {
> > > +		pci_add_resource(&sys->resources, &pci->io);
> > > +		pci_ioremap_io(nr * resource_size(&pci->io), pci->io.start);
> > > +	}
> > 
> > This should really compute an io_offset.
> > 
> > > +	if (resource_type(&pci->mem))
> > > +		pci_add_resource(&sys->resources, &pci->mem);
> > 
> > and also a mem_offset, which is something different.
> 
> As somebody new to PCI, I'm afraid you've lost me here. Are you referring to
> using pci_add_resource_offset instead, then removing my restriction on
> having a single resource from the parsing code?

I mean pci_add_resource_offset, but I don't understand what the
restriction is that you are talking about. To elaborate on the offsets,
the common case is that the PCI memory space is the same as the
host physical address space for both MMIO and DMA, while you have
only one PCI host with a single I/O space range from port 0 to 65536.

If you mandate that, your code is actually correct and you do not
require an io_offset or mem_offset.

In practice, there can be various ways that a system requires something
more complex:

* You can have a memory space range that puts PCI bus address zero
  at the start of the pci->mem resource. In this case, you have
  mem_offset = pci->mem.start. We should probably try not to do
  this, but there is existing hardware doing it.

* You might have multiple sections of the PCI bus space mapped
  into CPU physical space. If you want to support legacy VGA
  console, you probably want to map the first 16MB of the bus
  space at an arbitrary location (with the mem_offset as above),
  plus a second, larger section of the bus space with an identity
  mapping (mem_offset_= 0) for all devices other than VGA.
  You'd also need to copy some VGA specific code from arm32 to
  arm64 to actually make this work.

* If you have two PCI host bridges and each of them comes with
  its own I/O space range starting between 0x0-0xffff, you have
  to map one of them into logical I/O space address 0x10000-0x1ffff
  and set io_offset = 0x10000 for that bus.

* Alternatively, the second bus in that case can use *bus* I/O
  port address 0x10000-0x1ffff and use io_offset=0, but that
  prevents you from having legacy ISA I/O ports on the
  second bus, since they are hardwired to a known port number
  in the range 0x0-0x1000 (the first 4KB). This includes VGA
  console.

> > > +	for_each_of_pci_range(&parser, &range) {
> > > +		u32 restype = range.flags & IORESOURCE_TYPE_BITS;
> > > +
> > > +		switch (restype) {
> > > +		case IORESOURCE_IO:
> > > +			if (resource_type(&pci->io))
> > > +				dev_warn(dev,
> > > +					 "ignoring additional io resource\n");
> > > +			else
> > > +				of_pci_range_to_resource(&range, np, &pci->io);
> > > +			break;
> > > +		case IORESOURCE_MEM:
> > > +			if (resource_type(&pci->mem))
> > > +				dev_warn(dev,
> > > +					 "ignoring additional mem resource\n");
> > > +			else
> > > +				of_pci_range_to_resource(&range, np, &pci->mem);
> > > +			break;
> > 
> > This shows once more that the range parser code is suboptimal. So far
> > every single driver got the I/O space wrong here, because the obvious
> > way to write this function is also completely wrong.
> 
> I see you mentioned to Liviu that you should register a logical resource,
> rather than physical resource returned from the parser. It seems odd that
> I/O space appears to work with this code as-is (I've tested it on arm using
> kvmtool by removing the memory bars).

what do you see in /proc/ioports and /proc/iomem then?

> > What you get out of "of_pci_range_to_resource(&range, np, &pci->io)"
> > is not the resource you want to pass into pci_add_resource()
> > later.
> 
> Do I need to open-code the resource translation from phys -> logical?

I think we should have some common infrastructure that lets you
get this right more easily.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Feb. 5, 2014, 8:53 p.m. UTC | #6
On Wed, Feb 05, 2014 at 09:26:17PM +0100, Arnd Bergmann wrote:
> > > What you get out of "of_pci_range_to_resource(&range, np, &pci->io)"
> > > is not the resource you want to pass into pci_add_resource()
> > > later.

Right, of_pci_range_to_resource returns the CPU MMIO address. A big
problem here is that struct resource is being re-used for bus
physical, CPU MMIO, and Linux Driver addressing domains without any
helpful tagging.

typedef struct resource resource_bus;
typedef struct resource resource_cpu;

?

> > Do I need to open-code the resource translation from phys -> logical?
> 
> I think we should have some common infrastructure that lets you
> get this right more easily.

The offset stuff seems to be very confusing to people, removing it
from the APIs and forcing drivers to talk about bus addresess, CPU
addresses and internal Linux addresses seems a bit more plain?

What do you think about something like this:

int of_pci_alloc_io(.. resources,
                    struct of_pci_range *range,
                    struct device_node *np)
{
  struct resource bus_address, mmio_window, res;

  bus_address.start = range->pci_addr;
  bus_address.end = range->pci_addr + range->size - 1;

  mmio_window.start = range->cpu_addr;
  mmio_window.end = range->cpu_addr + range->size - 1;

  /* Input bus_address - addresses seen on the bus
           mmio_window - physical CPU address to create the bus
	   	         addreses
     Output res - Address suitable for use in drivers
     This does the pci_ioremap_io too */
  pci_alloc_virtual_io_window(&bus_address, &mmio_window, &res);

  /* bus_address - addresses seen on the bus
     res - matching driver view for the bus addresses */
  pci_add_resource_bus(&resources, &bus_address, &res);
}

And a similar function for MMIO that just omits
pci_alloc_virtual_io_window.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Feb. 6, 2014, 8:28 a.m. UTC | #7
On Wednesday 05 February 2014, Jason Gunthorpe wrote:
> On Wed, Feb 05, 2014 at 09:26:17PM +0100, Arnd Bergmann wrote:
> > > > What you get out of "of_pci_range_to_resource(&range, np, &pci->io)"
> > > > is not the resource you want to pass into pci_add_resource()
> > > > later.
> 
> Right, of_pci_range_to_resource returns the CPU MMIO address. A big
> problem here is that struct resource is being re-used for bus
> physical, CPU MMIO, and Linux Driver addressing domains without any
> helpful tagging.
> 
> typedef struct resource resource_bus;
> typedef struct resource resource_cpu;
> 
> ?

I fear the resource management needs a bigger overhaul than than.
Most importantly, 'struct resource' is not connected to 'struct device'
at the moment, and it only deals with resource types that were around
15 years ago.

> > > Do I need to open-code the resource translation from phys -> logical?
> > 
> > I think we should have some common infrastructure that lets you
> > get this right more easily.
> 
> The offset stuff seems to be very confusing to people, removing it
> from the APIs and forcing drivers to talk about bus addresess, CPU
> addresses and internal Linux addresses seems a bit more plain?

Interesting idea. I'm not sure how Bjorn will like that
after he just did an overhaul of all users of the offset
API some time ago, but let's see what he thinks.

> What do you think about something like this:
> 
> int of_pci_alloc_io(.. resources,
>                     struct of_pci_range *range,
>                     struct device_node *np)
> {
>   struct resource bus_address, mmio_window, res;
> 
>   bus_address.start = range->pci_addr;
>   bus_address.end = range->pci_addr + range->size - 1;
> 
>   mmio_window.start = range->cpu_addr;
>   mmio_window.end = range->cpu_addr + range->size - 1;
> 
>   /* Input bus_address - addresses seen on the bus
>            mmio_window - physical CPU address to create the bus
> 	   	         addreses
>      Output res - Address suitable for use in drivers
>      This does the pci_ioremap_io too */
>   pci_alloc_virtual_io_window(&bus_address, &mmio_window, &res);
> 
>   /* bus_address - addresses seen on the bus
>      res - matching driver view for the bus addresses */
>   pci_add_resource_bus(&resources, &bus_address, &res);
> }
> 
> And a similar function for MMIO that just omits
> pci_alloc_virtual_io_window.

It certainly seems workable. OTOH if we just manage to do a
helper that scans the OF ranges, allocates the I/O window,
remaps it and calls the existing pci_add_resource_offset()
helper, PCI host drivers don't need to worry about the
io_offsets computation either and just need to pull out the
correct window locations if they need to set up the hardware
translation windows (which I'd hope we can often let the boot
loader take care of).

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anup Patel Feb. 6, 2014, 8:54 a.m. UTC | #8
On Tue, Feb 4, 2014 at 10:23 PM, Will Deacon <will.deacon@arm.com> wrote:
> This patch adds support for an extremely simple virtual PCI host
> controller. The controller itself has no configuration registers, and
> has its address spaces described entirely by the device-tree (using the
> bindings described by ePAPR). This allows emulations, such as kvmtool,
> to provide a simple means for a guest Linux instance to make use of
> PCI devices.
>
> Corresponding documentation is added for the DT binding.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  .../devicetree/bindings/pci/linux,pci-virt.txt     |  38 ++++
>  drivers/pci/host/Kconfig                           |   7 +
>  drivers/pci/host/Makefile                          |   1 +
>  drivers/pci/host/pci-virt.c                        | 200 +++++++++++++++++++++
>  4 files changed, 246 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/pci/linux,pci-virt.txt
>  create mode 100644 drivers/pci/host/pci-virt.c
>
> diff --git a/Documentation/devicetree/bindings/pci/linux,pci-virt.txt b/Documentation/devicetree/bindings/pci/linux,pci-virt.txt
> new file mode 100644
> index 000000000000..54668a283498
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/pci/linux,pci-virt.txt
> @@ -0,0 +1,38 @@
> +* ARM Basic Virtual PCI controller
> +
> +PCI emulations, such as the virtio-pci implementations found in kvmtool
> +and other para-virtualised systems, do not require driver support for
> +complexities such as regulator and clock management. In fact, the
> +controller may not even have a control interface visible to the
> +operating system, instead presenting a set of fixed windows describing a
> +subset of IO, Memory and Configuration spaces.
> +
> +Such a controller can be described purely in terms of the standardized
> +device tree bindings communicated in pci.txt:
> +
> +- compatible     : Must be "linux,pci-virt"
> +
> +- ranges         : As described in IEEE Std 1275-1994, but must provide
> +                   at least a definition of the Configuration Space plus
> +                   one or both of IO and Memory Space.
> +
> +- #address-cells : Must be 3
> +
> +- #size-cells    : Must be 2
> +
> +Configuration Space is assumed to be memory-mapped (as opposed to being

It would be great to have a flag to specify whether Configuration Space is over
ioports or memory mapped.

Regards,
Anup

> +accessed via an ioport) and laid out with a direct correspondence to the
> +geography of a PCI bus address, by concatenating the various components
> +to form a 24-bit offset:
> +
> +        cfg_offset(bus, device, function, register) =
> +                bus << 16 | device << 11 | function << 8 | register
> +
> +Interrupt mapping is exactly as described in `Open Firmware Recommended
> +Practice: Interrupt Mapping' and requires the following properties:
> +
> +- #interrupt-cells   : Must be 1
> +
> +- interrupt-map      : <see aforementioned specification>
> +
> +- interrupt-map-mask : <see aforementioned specification>
> diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
> index 47d46c6d8468..fd4460573b81 100644
> --- a/drivers/pci/host/Kconfig
> +++ b/drivers/pci/host/Kconfig
> @@ -33,4 +33,11 @@ config PCI_RCAR_GEN2
>           There are 3 internal PCI controllers available with a single
>           built-in EHCI/OHCI host controller present on each one.
>
> +config PCI_VIRT_HOST
> +       bool "Virtual PCI host controller"
> +       depends on ARM && OF
> +       help
> +         Say Y here if you want to support a very simple virtual PCI
> +         host controller, such as the one emulated by kvmtool.
> +
>  endmenu
> diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
> index 13fb3333aa05..9b6775d95d3b 100644
> --- a/drivers/pci/host/Makefile
> +++ b/drivers/pci/host/Makefile
> @@ -4,3 +4,4 @@ obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
>  obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
>  obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o
>  obj-$(CONFIG_PCI_RCAR_GEN2) += pci-rcar-gen2.o
> +obj-$(CONFIG_PCI_VIRT_HOST) += pci-virt.o
> diff --git a/drivers/pci/host/pci-virt.c b/drivers/pci/host/pci-virt.c
> new file mode 100644
> index 000000000000..ded01474453b
> --- /dev/null
> +++ b/drivers/pci/host/pci-virt.c
> @@ -0,0 +1,200 @@
> +/*
> + * Very basic PCI host controller driver targetting virtual machines
> + * (e.g. the PCI emulation provided by kvmtool).
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (C) 2014 ARM Limited
> + *
> + * Author: Will Deacon <will.deacon@arm.com>
> + *
> + * This driver currently supports (per instance):
> + *     - A single controller
> + *     - A single memory space and/or port space
> + *     - A memory-mapped configuration space
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/of_address.h>
> +#include <linux/of_pci.h>
> +#include <linux/platform_device.h>
> +
> +struct virt_pci {
> +       struct device   *dev;
> +
> +       struct resource cfg;
> +       struct resource io;
> +       struct resource mem;
> +
> +       void __iomem    *cfg_base;
> +};
> +
> +static void __iomem *virt_pci_config_address(struct pci_bus *bus,
> +                                            unsigned int devfn,
> +                                            int where)
> +{
> +       struct pci_sys_data *sys = bus->sysdata;
> +       struct virt_pci *pci = sys->private_data;
> +       void __iomem *addr = pci->cfg_base;
> +
> +       /*
> +        * We construct config space addresses by simply sandwiching
> +        * together all of the PCI address components and using the
> +        * result as an offset into a 16M region.
> +        */
> +       return addr + (((u32)bus->number << 16) | (devfn << 8) | where);
> +}
> +
> +
> +static int virt_pci_config_read(struct pci_bus *bus, unsigned int devfn,
> +                               int where, int size, u32 *val)
> +{
> +       void __iomem *addr = virt_pci_config_address(bus, devfn, where);
> +
> +       switch (size) {
> +       case 1:
> +               *val = readb(addr);
> +               break;
> +       case 2:
> +               *val = readw(addr);
> +               break;
> +       default:
> +               *val = readl(addr);
> +       }
> +
> +       return PCIBIOS_SUCCESSFUL;
> +}
> +
> +static int virt_pci_config_write(struct pci_bus *bus, unsigned int devfn,
> +                                int where, int size, u32 val)
> +{
> +       void __iomem *addr = virt_pci_config_address(bus, devfn, where);
> +
> +       switch (size) {
> +       case 1:
> +               writeb(val, addr);
> +               break;
> +       case 2:
> +               writew(val, addr);
> +               break;
> +       default:
> +               writel(val, addr);
> +       }
> +
> +       return PCIBIOS_SUCCESSFUL;
> +}
> +
> +static struct pci_ops virt_pci_ops = {
> +       .read   = virt_pci_config_read,
> +       .write  = virt_pci_config_write,
> +};
> +
> +static int virt_pci_setup(int nr, struct pci_sys_data *sys)
> +{
> +       struct virt_pci *pci = sys->private_data;
> +
> +       if (resource_type(&pci->io)) {
> +               pci_add_resource(&sys->resources, &pci->io);
> +               pci_ioremap_io(nr * resource_size(&pci->io), pci->io.start);
> +       }
> +
> +       if (resource_type(&pci->mem))
> +               pci_add_resource(&sys->resources, &pci->mem);
> +
> +       pci->cfg_base = devm_ioremap_resource(pci->dev, &pci->cfg);
> +       return !IS_ERR(pci->cfg_base);
> +}
> +
> +static const struct of_device_id virt_pci_of_match[] = {
> +       { .compatible = "linux,pci-virt" },
> +       { },
> +};
> +MODULE_DEVICE_TABLE(of, virt_pci_of_match);
> +
> +static int virt_pci_probe(struct platform_device *pdev)
> +{
> +       struct hw_pci hw;
> +       struct of_pci_range range;
> +       struct of_pci_range_parser parser;
> +       struct virt_pci *pci;
> +       struct device *dev = &pdev->dev;
> +       struct device_node *np = dev->of_node;
> +
> +       if (of_pci_range_parser_init(&parser, np)) {
> +               dev_err(dev, "missing \"ranges\" property\n");
> +               return -EINVAL;
> +       }
> +
> +       pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
> +       if (!pci)
> +               return -ENOMEM;
> +
> +       pci->dev = dev;
> +       for_each_of_pci_range(&parser, &range) {
> +               u32 restype = range.flags & IORESOURCE_TYPE_BITS;
> +
> +               switch (restype) {
> +               case IORESOURCE_IO:
> +                       if (resource_type(&pci->io))
> +                               dev_warn(dev,
> +                                        "ignoring additional io resource\n");
> +                       else
> +                               of_pci_range_to_resource(&range, np, &pci->io);
> +                       break;
> +               case IORESOURCE_MEM:
> +                       if (resource_type(&pci->mem))
> +                               dev_warn(dev,
> +                                        "ignoring additional mem resource\n");
> +                       else
> +                               of_pci_range_to_resource(&range, np, &pci->mem);
> +                       break;
> +               case 0: /* cfg */
> +                       if (resource_type(&pci->cfg)) {
> +                               dev_warn(dev,
> +                                        "ignoring additional cfg resource\n");
> +                       } else {
> +                               of_pci_range_to_resource(&range, np, &pci->cfg);
> +                               pci->cfg.flags |= IORESOURCE_MEM;
> +                       }
> +                       break;
> +               default:
> +                       dev_warn(dev,
> +                               "ignoring unknown/unsupported resource type %x\n",
> +                                restype);
> +               }
> +       }
> +
> +       memset(&hw, 0, sizeof(hw));
> +       hw.nr_controllers       = 1;
> +       hw.private_data         = (void **)&pci;
> +       hw.setup                = virt_pci_setup;
> +       hw.map_irq              = of_irq_parse_and_map_pci;
> +       hw.ops                  = &virt_pci_ops;
> +       pci_common_init_dev(dev, &hw);
> +       return 0;
> +}
> +
> +static struct platform_driver virt_pci_driver = {
> +       .driver = {
> +               .name = "pci-virt",
> +               .owner = THIS_MODULE,
> +               .of_match_table = virt_pci_of_match,
> +       },
> +       .probe = virt_pci_probe,
> +};
> +module_platform_driver(virt_pci_driver);
> +
> +MODULE_DESCRIPTION("Virtual PCI host driver");
> +MODULE_AUTHOR("Will Deacon <will.deacon@arm.com>");
> +MODULE_LICENSE("GPLv2");
> --
> 1.8.2.2
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Feb. 6, 2014, 10:26 a.m. UTC | #9
On Thursday 06 February 2014 14:24:03 Anup Patel wrote:
> > diff --git a/Documentation/devicetree/bindings/pci/linux,pci-virt.txt b/Documentation/devicetree/bindings/pci/linux,pci-virt.txt
> > new file mode 100644
> > index 000000000000..54668a283498
> > +- compatible     : Must be "linux,pci-virt"
> > +
> > +- ranges         : As described in IEEE Std 1275-1994, but must provide
> > +                   at least a definition of the Configuration Space plus
> > +                   one or both of IO and Memory Space.
> > +
> > +- #address-cells : Must be 3
> > +
> > +- #size-cells    : Must be 2
> > +
> > +Configuration Space is assumed to be memory-mapped (as opposed to being
> 
> It would be great to have a flag to specify whether Configuration Space is over
> ioports or memory mapped.
> 

I haven't seen config space access through ioport for a long
time on non-x86. Do you have reason to believe that we will need
that?

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anup Patel Feb. 6, 2014, 10:52 a.m. UTC | #10
On Thu, Feb 6, 2014 at 3:56 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 06 February 2014 14:24:03 Anup Patel wrote:
>> > diff --git a/Documentation/devicetree/bindings/pci/linux,pci-virt.txt b/Documentation/devicetree/bindings/pci/linux,pci-virt.txt
>> > new file mode 100644
>> > index 000000000000..54668a283498
>> > +- compatible     : Must be "linux,pci-virt"
>> > +
>> > +- ranges         : As described in IEEE Std 1275-1994, but must provide
>> > +                   at least a definition of the Configuration Space plus
>> > +                   one or both of IO and Memory Space.
>> > +
>> > +- #address-cells : Must be 3
>> > +
>> > +- #size-cells    : Must be 2
>> > +
>> > +Configuration Space is assumed to be memory-mapped (as opposed to being
>>
>> It would be great to have a flag to specify whether Configuration Space is over
>> ioports or memory mapped.
>>
>
> I haven't seen config space access through ioport for a long
> time on non-x86. Do you have reason to believe that we will need
> that?

I suggested it to make "pci-virt" useful for x86-world too.

>
>         Arnd

--
Anup
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liviu Dudau Feb. 6, 2014, 10:54 a.m. UTC | #11
On Thu, Feb 06, 2014 at 08:54:03AM +0000, Anup Patel wrote:
> On Tue, Feb 4, 2014 at 10:23 PM, Will Deacon <will.deacon@arm.com> wrote:
> > This patch adds support for an extremely simple virtual PCI host
> > controller. The controller itself has no configuration registers, and
> > has its address spaces described entirely by the device-tree (using the
> > bindings described by ePAPR). This allows emulations, such as kvmtool,
> > to provide a simple means for a guest Linux instance to make use of
> > PCI devices.
> >
> > Corresponding documentation is added for the DT binding.
> >
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> >  .../devicetree/bindings/pci/linux,pci-virt.txt     |  38 ++++
> >  drivers/pci/host/Kconfig                           |   7 +
> >  drivers/pci/host/Makefile                          |   1 +
> >  drivers/pci/host/pci-virt.c                        | 200 +++++++++++++++++++++
> >  4 files changed, 246 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/pci/linux,pci-virt.txt
> >  create mode 100644 drivers/pci/host/pci-virt.c
> >
> > diff --git a/Documentation/devicetree/bindings/pci/linux,pci-virt.txt b/Documentation/devicetree/bindings/pci/linux,pci-virt.txt
> > new file mode 100644
> > index 000000000000..54668a283498
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/pci/linux,pci-virt.txt
> > @@ -0,0 +1,38 @@
> > +* ARM Basic Virtual PCI controller
> > +
> > +PCI emulations, such as the virtio-pci implementations found in kvmtool
> > +and other para-virtualised systems, do not require driver support for
> > +complexities such as regulator and clock management. In fact, the
> > +controller may not even have a control interface visible to the
> > +operating system, instead presenting a set of fixed windows describing a
> > +subset of IO, Memory and Configuration spaces.
> > +
> > +Such a controller can be described purely in terms of the standardized
> > +device tree bindings communicated in pci.txt:
> > +
> > +- compatible     : Must be "linux,pci-virt"
> > +
> > +- ranges         : As described in IEEE Std 1275-1994, but must provide
> > +                   at least a definition of the Configuration Space plus
> > +                   one or both of IO and Memory Space.
> > +
> > +- #address-cells : Must be 3
> > +
> > +- #size-cells    : Must be 2
> > +
> > +Configuration Space is assumed to be memory-mapped (as opposed to being
> 
> It would be great to have a flag to specify whether Configuration Space is over
> ioports or memory mapped.

This is another reason why I prefer the reg property for specifying the configuration
space address range. I don't see a straight way of making the distinction you
need using the ranges property.

> 
> Regards,
> Anup
> 
> > +accessed via an ioport) and laid out with a direct correspondence to the
> > +geography of a PCI bus address, by concatenating the various components
> > +to form a 24-bit offset:
> > +
> > +        cfg_offset(bus, device, function, register) =
> > +                bus << 16 | device << 11 | function << 8 | register
> > +
> > +Interrupt mapping is exactly as described in `Open Firmware Recommended
> > +Practice: Interrupt Mapping' and requires the following properties:
> > +
> > +- #interrupt-cells   : Must be 1
> > +
> > +- interrupt-map      : <see aforementioned specification>
> > +
> > +- interrupt-map-mask : <see aforementioned specification>
> > diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
> > index 47d46c6d8468..fd4460573b81 100644
> > --- a/drivers/pci/host/Kconfig
> > +++ b/drivers/pci/host/Kconfig
> > @@ -33,4 +33,11 @@ config PCI_RCAR_GEN2
> >           There are 3 internal PCI controllers available with a single
> >           built-in EHCI/OHCI host controller present on each one.
> >
> > +config PCI_VIRT_HOST
> > +       bool "Virtual PCI host controller"
> > +       depends on ARM && OF
> > +       help
> > +         Say Y here if you want to support a very simple virtual PCI
> > +         host controller, such as the one emulated by kvmtool.
> > +
> >  endmenu
> > diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
> > index 13fb3333aa05..9b6775d95d3b 100644
> > --- a/drivers/pci/host/Makefile
> > +++ b/drivers/pci/host/Makefile
> > @@ -4,3 +4,4 @@ obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
> >  obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
> >  obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o
> >  obj-$(CONFIG_PCI_RCAR_GEN2) += pci-rcar-gen2.o
> > +obj-$(CONFIG_PCI_VIRT_HOST) += pci-virt.o
> > diff --git a/drivers/pci/host/pci-virt.c b/drivers/pci/host/pci-virt.c
> > new file mode 100644
> > index 000000000000..ded01474453b
> > --- /dev/null
> > +++ b/drivers/pci/host/pci-virt.c
> > @@ -0,0 +1,200 @@
> > +/*
> > + * Very basic PCI host controller driver targetting virtual machines
> > + * (e.g. the PCI emulation provided by kvmtool).
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > + *
> > + * Copyright (C) 2014 ARM Limited
> > + *
> > + * Author: Will Deacon <will.deacon@arm.com>
> > + *
> > + * This driver currently supports (per instance):
> > + *     - A single controller
> > + *     - A single memory space and/or port space
> > + *     - A memory-mapped configuration space
> > + */
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/module.h>
> > +#include <linux/of_address.h>
> > +#include <linux/of_pci.h>
> > +#include <linux/platform_device.h>
> > +
> > +struct virt_pci {
> > +       struct device   *dev;
> > +
> > +       struct resource cfg;
> > +       struct resource io;
> > +       struct resource mem;
> > +
> > +       void __iomem    *cfg_base;
> > +};
> > +
> > +static void __iomem *virt_pci_config_address(struct pci_bus *bus,
> > +                                            unsigned int devfn,
> > +                                            int where)
> > +{
> > +       struct pci_sys_data *sys = bus->sysdata;
> > +       struct virt_pci *pci = sys->private_data;
> > +       void __iomem *addr = pci->cfg_base;
> > +
> > +       /*
> > +        * We construct config space addresses by simply sandwiching
> > +        * together all of the PCI address components and using the
> > +        * result as an offset into a 16M region.
> > +        */
> > +       return addr + (((u32)bus->number << 16) | (devfn << 8) | where);
> > +}
> > +
> > +
> > +static int virt_pci_config_read(struct pci_bus *bus, unsigned int devfn,
> > +                               int where, int size, u32 *val)
> > +{
> > +       void __iomem *addr = virt_pci_config_address(bus, devfn, where);
> > +
> > +       switch (size) {
> > +       case 1:
> > +               *val = readb(addr);
> > +               break;
> > +       case 2:
> > +               *val = readw(addr);
> > +               break;
> > +       default:
> > +               *val = readl(addr);
> > +       }
> > +
> > +       return PCIBIOS_SUCCESSFUL;
> > +}
> > +
> > +static int virt_pci_config_write(struct pci_bus *bus, unsigned int devfn,
> > +                                int where, int size, u32 val)
> > +{
> > +       void __iomem *addr = virt_pci_config_address(bus, devfn, where);
> > +
> > +       switch (size) {
> > +       case 1:
> > +               writeb(val, addr);
> > +               break;
> > +       case 2:
> > +               writew(val, addr);
> > +               break;
> > +       default:
> > +               writel(val, addr);
> > +       }
> > +
> > +       return PCIBIOS_SUCCESSFUL;
> > +}
> > +
> > +static struct pci_ops virt_pci_ops = {
> > +       .read   = virt_pci_config_read,
> > +       .write  = virt_pci_config_write,
> > +};
> > +
> > +static int virt_pci_setup(int nr, struct pci_sys_data *sys)
> > +{
> > +       struct virt_pci *pci = sys->private_data;
> > +
> > +       if (resource_type(&pci->io)) {
> > +               pci_add_resource(&sys->resources, &pci->io);
> > +               pci_ioremap_io(nr * resource_size(&pci->io), pci->io.start);
> > +       }
> > +
> > +       if (resource_type(&pci->mem))
> > +               pci_add_resource(&sys->resources, &pci->mem);
> > +
> > +       pci->cfg_base = devm_ioremap_resource(pci->dev, &pci->cfg);
> > +       return !IS_ERR(pci->cfg_base);
> > +}
> > +
> > +static const struct of_device_id virt_pci_of_match[] = {
> > +       { .compatible = "linux,pci-virt" },
> > +       { },
> > +};
> > +MODULE_DEVICE_TABLE(of, virt_pci_of_match);
> > +
> > +static int virt_pci_probe(struct platform_device *pdev)
> > +{
> > +       struct hw_pci hw;
> > +       struct of_pci_range range;
> > +       struct of_pci_range_parser parser;
> > +       struct virt_pci *pci;
> > +       struct device *dev = &pdev->dev;
> > +       struct device_node *np = dev->of_node;
> > +
> > +       if (of_pci_range_parser_init(&parser, np)) {
> > +               dev_err(dev, "missing \"ranges\" property\n");
> > +               return -EINVAL;
> > +       }
> > +
> > +       pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
> > +       if (!pci)
> > +               return -ENOMEM;
> > +
> > +       pci->dev = dev;
> > +       for_each_of_pci_range(&parser, &range) {
> > +               u32 restype = range.flags & IORESOURCE_TYPE_BITS;
> > +
> > +               switch (restype) {
> > +               case IORESOURCE_IO:
> > +                       if (resource_type(&pci->io))
> > +                               dev_warn(dev,
> > +                                        "ignoring additional io resource\n");
> > +                       else
> > +                               of_pci_range_to_resource(&range, np, &pci->io);
> > +                       break;
> > +               case IORESOURCE_MEM:
> > +                       if (resource_type(&pci->mem))
> > +                               dev_warn(dev,
> > +                                        "ignoring additional mem resource\n");
> > +                       else
> > +                               of_pci_range_to_resource(&range, np, &pci->mem);
> > +                       break;
> > +               case 0: /* cfg */
> > +                       if (resource_type(&pci->cfg)) {
> > +                               dev_warn(dev,
> > +                                        "ignoring additional cfg resource\n");
> > +                       } else {
> > +                               of_pci_range_to_resource(&range, np, &pci->cfg);
> > +                               pci->cfg.flags |= IORESOURCE_MEM;
> > +                       }
> > +                       break;
> > +               default:
> > +                       dev_warn(dev,
> > +                               "ignoring unknown/unsupported resource type %x\n",
> > +                                restype);
> > +               }
> > +       }
> > +
> > +       memset(&hw, 0, sizeof(hw));
> > +       hw.nr_controllers       = 1;
> > +       hw.private_data         = (void **)&pci;
> > +       hw.setup                = virt_pci_setup;
> > +       hw.map_irq              = of_irq_parse_and_map_pci;
> > +       hw.ops                  = &virt_pci_ops;
> > +       pci_common_init_dev(dev, &hw);
> > +       return 0;
> > +}
> > +
> > +static struct platform_driver virt_pci_driver = {
> > +       .driver = {
> > +               .name = "pci-virt",
> > +               .owner = THIS_MODULE,
> > +               .of_match_table = virt_pci_of_match,
> > +       },
> > +       .probe = virt_pci_probe,
> > +};
> > +module_platform_driver(virt_pci_driver);
> > +
> > +MODULE_DESCRIPTION("Virtual PCI host driver");
> > +MODULE_AUTHOR("Will Deacon <will.deacon@arm.com>");
> > +MODULE_LICENSE("GPLv2");
> > --
> > 1.8.2.2
> >
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Will Deacon Feb. 6, 2014, 11 a.m. UTC | #12
On Thu, Feb 06, 2014 at 10:54:42AM +0000, Liviu Dudau wrote:
> On Thu, Feb 06, 2014 at 08:54:03AM +0000, Anup Patel wrote:
> > On Tue, Feb 4, 2014 at 10:23 PM, Will Deacon <will.deacon@arm.com> wrote:
> > > This patch adds support for an extremely simple virtual PCI host
> > > controller. The controller itself has no configuration registers, and
> > > has its address spaces described entirely by the device-tree (using the
> > > bindings described by ePAPR). This allows emulations, such as kvmtool,
> > > to provide a simple means for a guest Linux instance to make use of
> > > PCI devices.
> > >
> > > Corresponding documentation is added for the DT binding.
> > >
> > > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > > ---
> > >  .../devicetree/bindings/pci/linux,pci-virt.txt     |  38 ++++
> > >  drivers/pci/host/Kconfig                           |   7 +
> > >  drivers/pci/host/Makefile                          |   1 +
> > >  drivers/pci/host/pci-virt.c                        | 200 +++++++++++++++++++++
> > >  4 files changed, 246 insertions(+)
> > >  create mode 100644 Documentation/devicetree/bindings/pci/linux,pci-virt.txt
> > >  create mode 100644 drivers/pci/host/pci-virt.c
> > >
> > > diff --git a/Documentation/devicetree/bindings/pci/linux,pci-virt.txt b/Documentation/devicetree/bindings/pci/linux,pci-virt.txt
> > > new file mode 100644
> > > index 000000000000..54668a283498
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/pci/linux,pci-virt.txt
> > > @@ -0,0 +1,38 @@
> > > +* ARM Basic Virtual PCI controller
> > > +
> > > +PCI emulations, such as the virtio-pci implementations found in kvmtool
> > > +and other para-virtualised systems, do not require driver support for
> > > +complexities such as regulator and clock management. In fact, the
> > > +controller may not even have a control interface visible to the
> > > +operating system, instead presenting a set of fixed windows describing a
> > > +subset of IO, Memory and Configuration spaces.
> > > +
> > > +Such a controller can be described purely in terms of the standardized
> > > +device tree bindings communicated in pci.txt:
> > > +
> > > +- compatible     : Must be "linux,pci-virt"
> > > +
> > > +- ranges         : As described in IEEE Std 1275-1994, but must provide
> > > +                   at least a definition of the Configuration Space plus
> > > +                   one or both of IO and Memory Space.
> > > +
> > > +- #address-cells : Must be 3
> > > +
> > > +- #size-cells    : Must be 2
> > > +
> > > +Configuration Space is assumed to be memory-mapped (as opposed to being
> > 
> > It would be great to have a flag to specify whether Configuration Space is over
> > ioports or memory mapped.
> 
> This is another reason why I prefer the reg property for specifying the configuration
> space address range. I don't see a straight way of making the distinction you
> need using the ranges property.

Well if we need to distinguish cam vs ecam, then adding ioport to the mix
should be (conceptually) easy and I don't think it would involve the "reg"
property.

However, I'm not planning to add ioport support myself.

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Feb. 6, 2014, 11:28 a.m. UTC | #13
On Thursday 06 February 2014 11:00:16 Will Deacon wrote:
> On Thu, Feb 06, 2014 at 10:54:42AM +0000, Liviu Dudau wrote:
> > On Thu, Feb 06, 2014 at 08:54:03AM +0000, Anup Patel wrote:
> > > On Tue, Feb 4, 2014 at 10:23 PM, Will Deacon <will.deacon@arm.com> wrote:
> > 
> > This is another reason why I prefer the reg property for specifying the configuration
> > space address range. I don't see a straight way of making the distinction you
> > need using the ranges property.
> 
> Well if we need to distinguish cam vs ecam, then adding ioport to the mix
> should be (conceptually) easy and I don't think it would involve the "reg"
> property.
> 
> However, I'm not planning to add ioport support myself.

Maybe it's better to have separate compatible strings for these cases,
can call it "pci-ecam-generic" or "pci-cam-generic" to have an easy
distinction. Or we just use the "reg-names" property so the device
can have a "cam" register or an "ecam" register or both.

I don't think we can make the driver generic enough for
x86 as Anup suggests though, since x86 has its own set of quirks to
deal with the various PCI config space access methods.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux Feb. 6, 2014, 8:31 p.m. UTC | #14
On Thu, Feb 06, 2014 at 09:28:52AM +0100, Arnd Bergmann wrote:
> It certainly seems workable. OTOH if we just manage to do a
> helper that scans the OF ranges, allocates the I/O window,
> remaps it and calls the existing pci_add_resource_offset()
> helper, PCI host drivers don't need to worry about the
> io_offsets computation either and just need to pull out the
> correct window locations if they need to set up the hardware
> translation windows (which I'd hope we can often let the boot
> loader take care of).

Wrong.  Think about it for a moment.

Let's say you have two host bridges.  One host bridge has an IO window
which maps to bus I/O address 0 at 0x40000000.  The other host bridge
has it's IO window which maps bus I/O address 0 to 0x48000000.

So, the contents of the /hardware/ BARs is going to be in the range
of 0x0000 to 0xffff.  That means the value found in them is meaningless
without knowing which bus it's on.

Now, remember we said that I/O was going to be in a fixed location of
a fixed size, that being 0xfee00000 and 1MB in size.  So, we map the
first IO window to 0xfee00000.  What about the other one?  Well,
that could be mapped to 0xfee10000.

However, we need the IO addresses visible to the Linux kernel to be
offset by 0x10000 for the second bus - merely reading the BAR and
storing that in a resource, for the driver to later pick up and pass
into inb()/outb() won't work.  There needs to be offsetting.  This is
the exact reason why we have the offsetting for IO windows.

Exactly the same goes for memory windows as well.  It's no good working
in the hosts physical address space when looking at BARs, because they're
not in that address space.  They're in the bus address space which can
be entirely different.

So, whenever you enumerate a PCI bus, and read the resource information
out of the BARs, you /must/ know how that address region specified in
the BAR as a /bus/ address maps to the /host/ address space.
Will Deacon Feb. 7, 2014, 11:46 a.m. UTC | #15
Hi Arnd, Jason,

First of all, thanks for the really helpful feedback. I'll take it on-board
for v2.

On Wed, Feb 05, 2014 at 08:26:17PM +0000, Arnd Bergmann wrote:
> On Wednesday 05 February 2014 19:09:47 Will Deacon wrote:
> > On Tue, Feb 04, 2014 at 07:13:49PM +0000, Arnd Bergmann wrote:
> > > This should really compute an io_offset.
> > > 
> > > > +	if (resource_type(&pci->mem))
> > > > +		pci_add_resource(&sys->resources, &pci->mem);
> > > 
> > > and also a mem_offset, which is something different.
> > 
> > As somebody new to PCI, I'm afraid you've lost me here. Are you referring to
> > using pci_add_resource_offset instead, then removing my restriction on
> > having a single resource from the parsing code?
> 
> I mean pci_add_resource_offset, but I don't understand what the
> restriction is that you are talking about. To elaborate on the offsets,
> the common case is that the PCI memory space is the same as the
> host physical address space for both MMIO and DMA, while you have
> only one PCI host with a single I/O space range from port 0 to 65536.
> 
> If you mandate that, your code is actually correct and you do not
> require an io_offset or mem_offset.

Right, so that's what I've currently been relying on. If I mandate that,
will I be making this driver significantly less useful?

> In practice, there can be various ways that a system requires something
> more complex:
> 
> * You can have a memory space range that puts PCI bus address zero
>   at the start of the pci->mem resource. In this case, you have
>   mem_offset = pci->mem.start. We should probably try not to do
>   this, but there is existing hardware doing it.

If it's not the common case, then the generic driver might not need to care
(at least, initially).

> * You might have multiple sections of the PCI bus space mapped
>   into CPU physical space. If you want to support legacy VGA
>   console, you probably want to map the first 16MB of the bus
>   space at an arbitrary location (with the mem_offset as above),
>   plus a second, larger section of the bus space with an identity
>   mapping (mem_offset_= 0) for all devices other than VGA.
>   You'd also need to copy some VGA specific code from arm32 to
>   arm64 to actually make this work.

Again, I'd rather cross that bridge (no pun intended) when we decide we want
legacy VGA.

> * If you have two PCI host bridges and each of them comes with
>   its own I/O space range starting between 0x0-0xffff, you have
>   to map one of them into logical I/O space address 0x10000-0x1ffff
>   and set io_offset = 0x10000 for that bus.

Right, this sounds a lot more plausible from the perspective of a generic
driver. Since the current code only allows exactly one I/O space region, we
avoid the issue but I can remove that restriction (that I mentioned earlier)
and offset the region based on the bridge.

> > > This shows once more that the range parser code is suboptimal. So far
> > > every single driver got the I/O space wrong here, because the obvious
> > > way to write this function is also completely wrong.
> > 
> > I see you mentioned to Liviu that you should register a logical resource,
> > rather than physical resource returned from the parser. It seems odd that
> > I/O space appears to work with this code as-is (I've tested it on arm using
> > kvmtool by removing the memory bars).
> 
> what do you see in /proc/ioports and /proc/iomem then?

bash-4.2# cat /proc/ioports
00006200-000065ff : virtio-pci
00006600-000069ff : virtio-pci
00006a00-00006dff : virtio-pci
00006e00-000071ff : virtio-pci

bash-4.2# cat /proc/iomem
40000000-40ffffff : /pci
41000400-410005ff : virtio-pci
41000c00-41000dff : virtio-pci
41001400-410015ff : virtio-pci
41001c00-41001dff : virtio-pci
80000000-93ffffff : System RAM
  80008000-8053df0b : Kernel code
  80570000-805c07fb : Kernel data

> > > What you get out of "of_pci_range_to_resource(&range, np, &pci->io)"
> > > is not the resource you want to pass into pci_add_resource()
> > > later.
> > 
> > Do I need to open-code the resource translation from phys -> logical?
> 
> I think we should have some common infrastructure that lets you
> get this right more easily.

Okey doke, is anybody working on that? (I see the follow up from Jason, but
it's not clear whether that's going to be merged).

Cheers,

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Feb. 7, 2014, 5:54 p.m. UTC | #16
On Fri, Feb 07, 2014 at 11:46:07AM +0000, Will Deacon wrote:

> > In practice, there can be various ways that a system requires something
> > more complex:
> > 
> > * You can have a memory space range that puts PCI bus address zero
> >   at the start of the pci->mem resource. In this case, you have
> >   mem_offset = pci->mem.start. We should probably try not to do
> >   this, but there is existing hardware doing it.
> 
> If it's not the common case, then the generic driver might not need to care
> (at least, initially).

Something to think about, other people are going to reference this
driver when writing drivers for their own hardware, it would be nice
to see it perfect..

AFAIK, the job is fairly simple, when you call pci_add_resource_offset
for memory compute the offset from 
  of_pci_range.pci_addr - of_pci_range.cpu_addr

(or is it the other way around ?)

And when you do it for IO then you compute the offset between the
requested io mapping base to the pci_addr.

> > * You might have multiple sections of the PCI bus space mapped
> >   into CPU physical space. If you want to support legacy VGA
> >   console, you probably want to map the first 16MB of the bus
> >   space at an arbitrary location (with the mem_offset as above),
> >   plus a second, larger section of the bus space with an identity
> >   mapping (mem_offset_= 0) for all devices other than VGA.
> >   You'd also need to copy some VGA specific code from arm32 to
> >   arm64 to actually make this work.
> 
> Again, I'd rather cross that bridge (no pun intended) when we decide we want
> legacy VGA.

Fortuantely if you compute the offset directly from the DT then you
don't need to do anything more. If someone wants to use this
arrangement then they just have to setup the HW and write a proper DT
with two ranges lines for memory and everything should just work.

> Okey doke, is anybody working on that? (I see the follow up from
> Jason, but it's not clear whether that's going to be merged).

Nope, just a thought to stimulate discussion :)

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Feb. 9, 2014, 8:18 p.m. UTC | #17
On Thursday 06 February 2014, Russell King - ARM Linux wrote:
> On Thu, Feb 06, 2014 at 09:28:52AM +0100, Arnd Bergmann wrote:
> > It certainly seems workable. OTOH if we just manage to do a
> > helper that scans the OF ranges, allocates the I/O window,
> > remaps it and calls the existing pci_add_resource_offset()
> > helper, PCI host drivers don't need to worry about the
> > io_offsets computation either and just need to pull out the
> > correct window locations if they need to set up the hardware
> > translation windows (which I'd hope we can often let the boot
> > loader take care of).

...

> So, whenever you enumerate a PCI bus, and read the resource information
> out of the BARs, you must know how that address region specified in
> the BAR as a bus address maps to the host address space.
> 

None of that contradicts what I wrote. Please try to understand what
I suggested, which is to have a common way to communicate that
information from DT to the PCI core without involving the PCI host
bridge driver.

All the bus scanning is done in common code, which already knows
how to factor in io_offset and mem_offset. The mem_offset can be
trivially computed from the ranges property, and the io_offset
is known by the time we call pci_ioremap_io() or rather a
replacement such as the one I proposed that also contains an
allocator.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Feb. 9, 2014, 8:30 p.m. UTC | #18
On Friday 07 February 2014, Will Deacon wrote:
> On Wed, Feb 05, 2014 at 08:26:17PM +0000, Arnd Bergmann wrote:
> > On Wednesday 05 February 2014 19:09:47 Will Deacon wrote:
> > > On Tue, Feb 04, 2014 at 07:13:49PM +0000, Arnd Bergmann wrote:
> > If you mandate that, your code is actually correct and you do not
> > require an io_offset or mem_offset.
> 
> Right, so that's what I've currently been relying on. If I mandate that,
> will I be making this driver significantly less useful?

After thinking about it some more, I think we should really try to
keep that logic completely out of the host controller driver and instead
make it generic enough to cover all possible cases.

Let's make sure that nobody ever has to call the range parser from
a PCI host driver again, no matter how weird their hardware setup is,
and put the necessary code in a common location. This will make your
driver even simpler.

> > * You might have multiple sections of the PCI bus space mapped
> >   into CPU physical space. If you want to support legacy VGA
> >   console, you probably want to map the first 16MB of the bus
> >   space at an arbitrary location (with the mem_offset as above),
> >   plus a second, larger section of the bus space with an identity
> >   mapping (mem_offset_= 0) for all devices other than VGA.
> >   You'd also need to copy some VGA specific code from arm32 to
> >   arm64 to actually make this work.
> 
> Again, I'd rather cross that bridge (no pun intended) when we decide we want
> legacy VGA.

Fair enough.

> 
> > > > This shows once more that the range parser code is suboptimal. So far
> > > > every single driver got the I/O space wrong here, because the obvious
> > > > way to write this function is also completely wrong.
> > > 
> > > I see you mentioned to Liviu that you should register a logical resource,
> > > rather than physical resource returned from the parser. It seems odd that
> > > I/O space appears to work with this code as-is (I've tested it on arm using
> > > kvmtool by removing the memory bars).
> > 
> > what do you see in /proc/ioports and /proc/iomem then?
> 
> bash-4.2# cat /proc/ioports
> 00006200-000065ff : virtio-pci
> 00006600-000069ff : virtio-pci
> 00006a00-00006dff : virtio-pci
> 00006e00-000071ff : virtio-pci
> 
> bash-4.2# cat /proc/iomem
> 40000000-40ffffff : /pci
> 41000400-410005ff : virtio-pci
> 41000c00-41000dff : virtio-pci
> 41001400-410015ff : virtio-pci
> 41001c00-41001dff : virtio-pci
> 80000000-93ffffff : System RAM
>   80008000-8053df0b : Kernel code
>   80570000-805c07fb : Kernel data

You should normally see a parent resource for the PCI bus and the virtio-pci
resources under that. For some reason, neither of the two appears to have
been registered correctly.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux Feb. 9, 2014, 8:34 p.m. UTC | #19
On Sun, Feb 09, 2014 at 09:18:19PM +0100, Arnd Bergmann wrote:
> On Thursday 06 February 2014, Russell King - ARM Linux wrote:
> > On Thu, Feb 06, 2014 at 09:28:52AM +0100, Arnd Bergmann wrote:
> > > It certainly seems workable. OTOH if we just manage to do a
> > > helper that scans the OF ranges, allocates the I/O window,
> > > remaps it and calls the existing pci_add_resource_offset()
> > > helper, PCI host drivers don't need to worry about the
> > > io_offsets computation either and just need to pull out the
> > > correct window locations if they need to set up the hardware
> > > translation windows (which I'd hope we can often let the boot
> > > loader take care of).
> 
> ...
> 
> > So, whenever you enumerate a PCI bus, and read the resource information
> > out of the BARs, you must know how that address region specified in
> > the BAR as a bus address maps to the host address space.
> > 
> 
> None of that contradicts what I wrote. Please try to understand what
> I suggested, which is to have a common way to communicate that
> information from DT to the PCI core without involving the PCI host
> bridge driver.

Please explain it better then.
Jason Gunthorpe Feb. 10, 2014, 5:34 p.m. UTC | #20
On Sun, Feb 09, 2014 at 09:30:25PM +0100, Arnd Bergmann wrote:

> > bash-4.2# cat /proc/iomem
> > 40000000-40ffffff : /pci
> > 41000400-410005ff : virtio-pci
> > 41000c00-41000dff : virtio-pci
> > 41001400-410015ff : virtio-pci
> > 41001c00-41001dff : virtio-pci
> > 80000000-93ffffff : System RAM
> >   80008000-8053df0b : Kernel code
> >   80570000-805c07fb : Kernel data
> 
> You should normally see a parent resource for the PCI bus and the virtio-pci
> resources under that. For some reason, neither of the two appears to have
> been registered correctly.

I noticed this on mvebu as well..

3.13 w/ mvebu driver:

e0001000-e0001fff : /mbus/pex@e0000000/pcie@1,0/fpga@0/fpga_sysmon@1000
e0006000-e0006fff : /mbus/pex@e0000000/pcie@1,0/fpga@0/qdr2p@6000

3.10 w/ old kirkwood driver:

e0000000-e7ffffff : PCIe 0 MEM
  e0000000-e001ffff : 0000:00:01.0
    e0001000-e0001fff : /mbus/pex@e0000000/pcie@1,0/fpga@0/fpga_sysmon@1000
    e0006000-e0006fff : /mbus/pex@e0000000/pcie@1,0/fpga@0/qdr2p@6000

The latter is obviously correct and matches x86. I'm not sure where
the new style host drivers are going wrong, even the resource that
should be added by the PCI core itself for the BAR is missing..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Feb. 11, 2014, 7:15 p.m. UTC | #21
On Sunday 09 February 2014, Russell King - ARM Linux wrote:
> On Sun, Feb 09, 2014 at 09:18:19PM +0100, Arnd Bergmann wrote:
> > On Thursday 06 February 2014, Russell King - ARM Linux wrote:
> > 
> > > So, whenever you enumerate a PCI bus, and read the resource information
> > > out of the BARs, you must know how that address region specified in
> > > the BAR as a bus address maps to the host address space.
> > > 
> > 
> > None of that contradicts what I wrote. Please try to understand what
> > I suggested, which is to have a common way to communicate that
> > information from DT to the PCI core without involving the PCI host
> > bridge driver.
> 
> Please explain it better then.
>

Let me try again: Looking at the dw_pcie driver (since that is one of
the few that gets it mostly right), we have quite a bit of generic
code in dw_pcie_host_init() and in dw_pcie_setup(). All the DT parsing
in there just implements the generic PCI DT binding, and what the
setup function does depends exclusively on what comes out of the
parser.

If we manage to move all the common parts into a generic helper
function that gets called by the setup() on arm32, we no longer
need to worry about host drivers implementing an incorrect DT
parser or getting the io_offset calculation wrong, because they
don't actually see any of that, plus we save a lot of duplicate
code.

How it fits together with the new arm64 pci support is a different
question, but if it's just a helper function, it should really work
on any architecture.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Deacon Feb. 12, 2014, 6:10 p.m. UTC | #22
On Fri, Feb 07, 2014 at 05:54:10PM +0000, Jason Gunthorpe wrote:
> On Fri, Feb 07, 2014 at 11:46:07AM +0000, Will Deacon wrote:
> 
> > > In practice, there can be various ways that a system requires something
> > > more complex:
> > > 
> > > * You can have a memory space range that puts PCI bus address zero
> > >   at the start of the pci->mem resource. In this case, you have
> > >   mem_offset = pci->mem.start. We should probably try not to do
> > >   this, but there is existing hardware doing it.
> > 
> > If it's not the common case, then the generic driver might not need to care
> > (at least, initially).
> 
> Something to think about, other people are going to reference this
> driver when writing drivers for their own hardware, it would be nice
> to see it perfect..
> 
> AFAIK, the job is fairly simple, when you call pci_add_resource_offset
> for memory compute the offset from 
>   of_pci_range.pci_addr - of_pci_range.cpu_addr
> 
> (or is it the other way around ?)

I think it's the other way round: bus = cpu - offset, then Arnd's example of
PCI bus 0 works out as: 0 = cpu - pci->mem_start.

I added that to my driver, but I get some weird looking bus addresses in
dmesg:

[    0.307585] pci-arm-generic 40000000.pci: PCI host bridge to bus 0000:00
[    0.307601] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
[    0.307615] pci_bus 0000:00: root bus resource [mem 0x00000000-0x3effffff] (bus address [0xffffffffbf000000-0xfffffffffdffffff])

Looking at drivers/pci/probe.c, it seems to think that res->start - offset
gives a bus address, which implies that the resources are indeed *CPU*
addresses.

Are you sure pci_add_resource_offset wants bus addresses?

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Feb. 12, 2014, 6:19 p.m. UTC | #23
On Wed, Feb 12, 2014 at 06:10:15PM +0000, Will Deacon wrote:

> > AFAIK, the job is fairly simple, when you call pci_add_resource_offset
> > for memory compute the offset from 
> >   of_pci_range.pci_addr - of_pci_range.cpu_addr
> > 
> > (or is it the other way around ?)
> 
> I think it's the other way round: bus = cpu - offset, then Arnd's example of
> PCI bus 0 works out as: 0 = cpu - pci->mem_start.

That looks right to me

> I added that to my driver, but I get some weird looking bus addresses in
> dmesg:
> 
> [    0.307585] pci-arm-generic 40000000.pci: PCI host bridge to bus 0000:00
> [    0.307601] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
> [    0.307615] pci_bus 0000:00: root bus resource [mem 0x00000000-0x3effffff] (bus address [0xffffffffbf000000-0xfffffffffdffffff])
> 
> Looking at drivers/pci/probe.c, it seems to think that res->start - offset
> gives a bus address, which implies that the resources are indeed *CPU*
> addresses.
> 
> Are you sure pci_add_resource_offset wants bus addresses?

Sorry, I wasn't clear: It accepts a cpu address in the struct
resource and an offset to convert back to a bus address.

You should compute 0 as the offset in the normal case, ie
of_pci_range.pci_addr and of_pci_range.cpu_addr should be identical,
which depends on the DT ranges being correct..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Deacon Feb. 12, 2014, 6:21 p.m. UTC | #24
On Wed, Feb 12, 2014 at 06:19:13PM +0000, Jason Gunthorpe wrote:
> On Wed, Feb 12, 2014 at 06:10:15PM +0000, Will Deacon wrote:
> 
> > > AFAIK, the job is fairly simple, when you call pci_add_resource_offset
> > > for memory compute the offset from 
> > >   of_pci_range.pci_addr - of_pci_range.cpu_addr
> > > 
> > > (or is it the other way around ?)
> > 
> > I think it's the other way round: bus = cpu - offset, then Arnd's example of
> > PCI bus 0 works out as: 0 = cpu - pci->mem_start.
> 
> That looks right to me
> 
> > I added that to my driver, but I get some weird looking bus addresses in
> > dmesg:
> > 
> > [    0.307585] pci-arm-generic 40000000.pci: PCI host bridge to bus 0000:00
> > [    0.307601] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
> > [    0.307615] pci_bus 0000:00: root bus resource [mem 0x00000000-0x3effffff] (bus address [0xffffffffbf000000-0xfffffffffdffffff])
> > 
> > Looking at drivers/pci/probe.c, it seems to think that res->start - offset
> > gives a bus address, which implies that the resources are indeed *CPU*
> > addresses.
> > 
> > Are you sure pci_add_resource_offset wants bus addresses?
> 
> Sorry, I wasn't clear: It accepts a cpu address in the struct
> resource and an offset to convert back to a bus address.
> 
> You should compute 0 as the offset in the normal case, ie
> of_pci_range.pci_addr and of_pci_range.cpu_addr should be identical,
> which depends on the DT ranges being correct..

Aha! That explains all of the confusion. I'll remove my homebrew
resource translation code then :)

Thanks,

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/pci/linux,pci-virt.txt b/Documentation/devicetree/bindings/pci/linux,pci-virt.txt
new file mode 100644
index 000000000000..54668a283498
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/linux,pci-virt.txt
@@ -0,0 +1,38 @@ 
+* ARM Basic Virtual PCI controller
+
+PCI emulations, such as the virtio-pci implementations found in kvmtool
+and other para-virtualised systems, do not require driver support for
+complexities such as regulator and clock management. In fact, the
+controller may not even have a control interface visible to the
+operating system, instead presenting a set of fixed windows describing a
+subset of IO, Memory and Configuration spaces.
+
+Such a controller can be described purely in terms of the standardized
+device tree bindings communicated in pci.txt:
+
+- compatible     : Must be "linux,pci-virt"
+
+- ranges         : As described in IEEE Std 1275-1994, but must provide
+                   at least a definition of the Configuration Space plus
+                   one or both of IO and Memory Space.
+
+- #address-cells : Must be 3
+
+- #size-cells    : Must be 2
+
+Configuration Space is assumed to be memory-mapped (as opposed to being
+accessed via an ioport) and laid out with a direct correspondence to the
+geography of a PCI bus address, by concatenating the various components
+to form a 24-bit offset:
+
+        cfg_offset(bus, device, function, register) =
+                bus << 16 | device << 11 | function << 8 | register
+
+Interrupt mapping is exactly as described in `Open Firmware Recommended
+Practice: Interrupt Mapping' and requires the following properties:
+
+- #interrupt-cells   : Must be 1
+
+- interrupt-map      : <see aforementioned specification>
+
+- interrupt-map-mask : <see aforementioned specification>
diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
index 47d46c6d8468..fd4460573b81 100644
--- a/drivers/pci/host/Kconfig
+++ b/drivers/pci/host/Kconfig
@@ -33,4 +33,11 @@  config PCI_RCAR_GEN2
 	  There are 3 internal PCI controllers available with a single
 	  built-in EHCI/OHCI host controller present on each one.
 
+config PCI_VIRT_HOST
+	bool "Virtual PCI host controller"
+	depends on ARM && OF
+	help
+	  Say Y here if you want to support a very simple virtual PCI
+	  host controller, such as the one emulated by kvmtool.
+
 endmenu
diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
index 13fb3333aa05..9b6775d95d3b 100644
--- a/drivers/pci/host/Makefile
+++ b/drivers/pci/host/Makefile
@@ -4,3 +4,4 @@  obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
 obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
 obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o
 obj-$(CONFIG_PCI_RCAR_GEN2) += pci-rcar-gen2.o
+obj-$(CONFIG_PCI_VIRT_HOST) += pci-virt.o
diff --git a/drivers/pci/host/pci-virt.c b/drivers/pci/host/pci-virt.c
new file mode 100644
index 000000000000..ded01474453b
--- /dev/null
+++ b/drivers/pci/host/pci-virt.c
@@ -0,0 +1,200 @@ 
+/*
+ * Very basic PCI host controller driver targetting virtual machines
+ * (e.g. the PCI emulation provided by kvmtool).
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (C) 2014 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ *
+ * This driver currently supports (per instance):
+ *	- A single controller
+ *	- A single memory space and/or port space
+ *	- A memory-mapped configuration space
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of_address.h>
+#include <linux/of_pci.h>
+#include <linux/platform_device.h>
+
+struct virt_pci {
+	struct device	*dev;
+
+	struct resource	cfg;
+	struct resource	io;
+	struct resource	mem;
+
+	void __iomem	*cfg_base;
+};
+
+static void __iomem *virt_pci_config_address(struct pci_bus *bus,
+					     unsigned int devfn,
+					     int where)
+{
+	struct pci_sys_data *sys = bus->sysdata;
+	struct virt_pci *pci = sys->private_data;
+	void __iomem *addr = pci->cfg_base;
+
+	/*
+	 * We construct config space addresses by simply sandwiching
+	 * together all of the PCI address components and using the
+	 * result as an offset into a 16M region.
+	 */
+	return addr + (((u32)bus->number << 16) | (devfn << 8) | where);
+}
+
+
+static int virt_pci_config_read(struct pci_bus *bus, unsigned int devfn,
+				int where, int size, u32 *val)
+{
+	void __iomem *addr = virt_pci_config_address(bus, devfn, where);
+
+	switch (size) {
+	case 1:
+		*val = readb(addr);
+		break;
+	case 2:
+		*val = readw(addr);
+		break;
+	default:
+		*val = readl(addr);
+	}
+
+	return PCIBIOS_SUCCESSFUL;
+}
+
+static int virt_pci_config_write(struct pci_bus *bus, unsigned int devfn,
+				 int where, int size, u32 val)
+{
+	void __iomem *addr = virt_pci_config_address(bus, devfn, where);
+
+	switch (size) {
+	case 1:
+		writeb(val, addr);
+		break;
+	case 2:
+		writew(val, addr);
+		break;
+	default:
+		writel(val, addr);
+	}
+
+	return PCIBIOS_SUCCESSFUL;
+}
+
+static struct pci_ops virt_pci_ops = {
+	.read	= virt_pci_config_read,
+	.write	= virt_pci_config_write,
+};
+
+static int virt_pci_setup(int nr, struct pci_sys_data *sys)
+{
+	struct virt_pci *pci = sys->private_data;
+
+	if (resource_type(&pci->io)) {
+		pci_add_resource(&sys->resources, &pci->io);
+		pci_ioremap_io(nr * resource_size(&pci->io), pci->io.start);
+	}
+
+	if (resource_type(&pci->mem))
+		pci_add_resource(&sys->resources, &pci->mem);
+
+	pci->cfg_base = devm_ioremap_resource(pci->dev, &pci->cfg);
+	return !IS_ERR(pci->cfg_base);
+}
+
+static const struct of_device_id virt_pci_of_match[] = {
+	{ .compatible = "linux,pci-virt" },
+	{ },
+};
+MODULE_DEVICE_TABLE(of, virt_pci_of_match);
+
+static int virt_pci_probe(struct platform_device *pdev)
+{
+	struct hw_pci hw;
+	struct of_pci_range range;
+	struct of_pci_range_parser parser;
+	struct virt_pci *pci;
+	struct device *dev = &pdev->dev;
+	struct device_node *np = dev->of_node;
+
+	if (of_pci_range_parser_init(&parser, np)) {
+		dev_err(dev, "missing \"ranges\" property\n");
+		return -EINVAL;
+	}
+
+	pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
+	if (!pci)
+		return -ENOMEM;
+
+	pci->dev = dev;
+	for_each_of_pci_range(&parser, &range) {
+		u32 restype = range.flags & IORESOURCE_TYPE_BITS;
+
+		switch (restype) {
+		case IORESOURCE_IO:
+			if (resource_type(&pci->io))
+				dev_warn(dev,
+					 "ignoring additional io resource\n");
+			else
+				of_pci_range_to_resource(&range, np, &pci->io);
+			break;
+		case IORESOURCE_MEM:
+			if (resource_type(&pci->mem))
+				dev_warn(dev,
+					 "ignoring additional mem resource\n");
+			else
+				of_pci_range_to_resource(&range, np, &pci->mem);
+			break;
+		case 0:	/* cfg */
+			if (resource_type(&pci->cfg)) {
+				dev_warn(dev,
+					 "ignoring additional cfg resource\n");
+			} else {
+				of_pci_range_to_resource(&range, np, &pci->cfg);
+				pci->cfg.flags |= IORESOURCE_MEM;
+			}
+			break;
+		default:
+			dev_warn(dev,
+				"ignoring unknown/unsupported resource type %x\n",
+				 restype);
+		}
+	}
+
+	memset(&hw, 0, sizeof(hw));
+	hw.nr_controllers	= 1;
+	hw.private_data		= (void **)&pci;
+	hw.setup		= virt_pci_setup;
+	hw.map_irq		= of_irq_parse_and_map_pci;
+	hw.ops			= &virt_pci_ops;
+	pci_common_init_dev(dev, &hw);
+	return 0;
+}
+
+static struct platform_driver virt_pci_driver = {
+	.driver = {
+		.name = "pci-virt",
+		.owner = THIS_MODULE,
+		.of_match_table = virt_pci_of_match,
+	},
+	.probe = virt_pci_probe,
+};
+module_platform_driver(virt_pci_driver);
+
+MODULE_DESCRIPTION("Virtual PCI host driver");
+MODULE_AUTHOR("Will Deacon <will.deacon@arm.com>");
+MODULE_LICENSE("GPLv2");