[1/3] PCI: Introduce PCI software bridge common logic

Message ID 20180629092231.32207-2-thomas.petazzoni@bootlin.com
State Superseded
Delegated to: Lorenzo Pieralisi
Headers show
Series
  • PCI: emulated PCI bridge common logic
Related show

Commit Message

Thomas Petazzoni June 29, 2018, 9:22 a.m.
Some PCI host controllers do not expose at the hardware level a root
port PCI bridge. Due to this, the Marvell Armada 370/38x/XP PCI
controller driver (pci-mvebu) emulates a root port PCI bridge, and
uses that to (among other thingsà) dynamically create the memory
windows that correspond to the PCI MEM and I/O regions.

Since we now need to add a very similar logic for the Marvell Armada
37xx PCI controller driver (pci-aardvark), instead of duplicating the
code, we create in this commit a common logic called pci-sw-bridge.

The idea of this logic is to emulate a root port PCI bridge by
providing configuration space read/write operations, and faking behind
the scenes the configuration space of a PCI bridge. A PCI host
controller driver simply has to call pci_sw_bridge_read() and
pci_sw_bridge_write() to read/write the configuration space of the
bridge.

By default, the PCI bridge configuration space is simply emulated by a
chunk of memory, but the PCI host controller can override the behavior
of the read and write operations on a per-register basis to do
additional actions if needed.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 drivers/pci/Kconfig           |   3 +
 drivers/pci/Makefile          |   1 +
 drivers/pci/pci-sw-bridge.c   | 149 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci-sw-bridge.h | 125 +++++++++++++++++++++++++++++++++++
 4 files changed, 278 insertions(+)
 create mode 100644 drivers/pci/pci-sw-bridge.c
 create mode 100644 include/linux/pci-sw-bridge.h

Comments

Bjorn Helgaas July 12, 2018, 7:58 p.m. | #1
[+cc Ray since he's doing something similar for iProc,
     Ley Foon & Simon for a possible Altera & R-Car bug]

On Fri, Jun 29, 2018 at 11:22:29AM +0200, Thomas Petazzoni wrote:
> Some PCI host controllers do not expose at the hardware level a root
> port PCI bridge. Due to this, the Marvell Armada 370/38x/XP PCI
> controller driver (pci-mvebu) emulates a root port PCI bridge, and
> uses that to (among other thingsà) dynamically create the memory
> windows that correspond to the PCI MEM and I/O regions.
> 
> Since we now need to add a very similar logic for the Marvell Armada
> 37xx PCI controller driver (pci-aardvark), instead of duplicating the
> code, we create in this commit a common logic called pci-sw-bridge.
> 
> The idea of this logic is to emulate a root port PCI bridge by
> providing configuration space read/write operations, and faking behind
> the scenes the configuration space of a PCI bridge. A PCI host
> controller driver simply has to call pci_sw_bridge_read() and
> pci_sw_bridge_write() to read/write the configuration space of the
> bridge.

These systems must actually have a Root Port (there's obviously a Port
that originates the link), but it's not visible in a spec-compliant
way.  So this is basically a wrapper that translates accesses to
standard config space into the vendor-specific registers.

Since there really *is* a hardware bridge but it just doesn't have the
interface we expect, "sw_bridge" doesn't feel like quite the right
name for it.  Maybe something related to adapter/emulation/interface/...?

There are several drivers that do this, and it would be really cool to
have them all do it in a similar way.  I found at least these:

  hv_pcifront_read_config()
  mvebu_pcie_rd_conf()
  thunder_ecam_config_read()
  thunder_pem_config_read()
  xgene_pcie_config_read32()
  iproc_pcie_config_read32()
  rcar_pcie_read_conf()

I *really* like the fact that your accessors use the normal register
names, e.g., PCI_EXP_DEVCAP instead of PCISWCAP_EXP_DEVCAP.  That's
huge for greppability.

> By default, the PCI bridge configuration space is simply emulated by a
> chunk of memory, but the PCI host controller can override the behavior
> of the read and write operations on a per-register basis to do
> additional actions if needed.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> ---
>  drivers/pci/Kconfig           |   3 +
>  drivers/pci/Makefile          |   1 +
>  drivers/pci/pci-sw-bridge.c   | 149 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci-sw-bridge.h | 125 +++++++++++++++++++++++++++++++++++

Could this go in drivers/pci instead of include/linux?  I'd prefer to
hide it inside the PCI core if that's possible.

> + * Should be called by the PCI controller driver when reading the PCI
> + * configuration space of the fake bridge. It will call back the
> + * ->ops->read_base or ->ops->read_pcie operations.
> + */
> +int pci_sw_bridge_read(struct pci_sw_bridge *bridge, int where,
> +		       int size, u32 *value)

We don't really have a strong convention about the names of config
accessors: *_rd_conf(), *_read_config(), *config_read() are all used.
Maybe we could at least include "conf" to connect this with config
space as opposed to MMIO space.

> +{
> +	int ret;
> +	int reg = where & ~3;
> +
> +	if (bridge->has_pcie && reg >= PCI_CAP_PCIE_END) {
> +		*value = 0;
> +		return PCIBIOS_SUCCESSFUL;
> +	}
> +
> +	if (!bridge->has_pcie && reg >= PCI_BRIDGE_CONF_END) {
> +		*value = 0;
> +		return PCIBIOS_SUCCESSFUL;
> +	}
> +
> +	if (bridge->has_pcie && reg >= PCI_CAP_PCIE_START) {
> +		reg -= PCI_CAP_PCIE_START;
> +
> +		if (bridge->ops->read_pcie)
> +			ret = bridge->ops->read_pcie(bridge, reg, value);
> +		else
> +			ret = PCI_SW_BRIDGE_NOT_HANDLED;
> +
> +		if (ret == PCI_SW_BRIDGE_NOT_HANDLED)
> +			*value = *((u32*) &bridge->pcie_conf + reg / 4);
> +	} else {
> +		if (bridge->ops->read_base)
> +			ret = bridge->ops->read_base(bridge, reg, value);
> +		else
> +			ret = PCI_SW_BRIDGE_NOT_HANDLED;
> +
> +		if (ret == PCI_SW_BRIDGE_NOT_HANDLED)
> +			*value = *((u32*) &bridge->conf + reg / 4);

I'm not sure about this part, where the config space that's not
handled by the bridge's ops ends up just being RAM that's readable and
writable with no effect on the hardware.  That seems a little
counter-intuitive.  I think dropping writes and reading zeroes would
simplify my mental model.

> +	}
> +
> +	if (size == 1)
> +		*value = (*value >> (8 * (where & 3))) & 0xff;
> +	else if (size == 2)
> +		*value = (*value >> (8 * (where & 3))) & 0xffff;
> +	else if (size != 4)
> +		return PCIBIOS_BAD_REGISTER_NUMBER;

This isn't directly related to your patches, but this is a common
pattern with some variations between drivers, which makes me wonder
whether there are bugs lurking.  These use "where & 3" for size 2:

  advk_pcie_rd_conf()
  mtk_pcie_hw_rd_cfg()
  pci_generic_config_read32()

But these use "where & 2":

  _altera_pcie_cfg_read()
  rcar_pcie_read_conf()

I *think* this means that unaligned 2-byte config reads on Altera and
R-Car will return incorrect data.  In both cases, the actual read seen
by the hardware is a 32-bit read, but I think we extract the wrong 16
bits from that result.

I wonder if there's a way to use a common helper function to do this.

> +	return PCIBIOS_SUCCESSFUL;
> +}

> +	/*
> +	 * Called when writing to the regular PCI bridge configuration
> +	 * space. old is the current value, new is the new value being
> +	 * written, and mask indicates which parts of the value are
> +	 * being changed.
> +	 */
> +	void (*write_base)(struct pci_sw_bridge *bridge, int reg,
> +			   u32 old, u32 new, u32 mask);
> +
> +	/*
> +	 * Same as ->read_base(), except it is for reading from the
> +	 * PCIe capability configuration space.

I think you mean "->write_base(), except it is for writing to"

> +	 */
> +	void (*write_pcie)(struct pci_sw_bridge *bridge, int reg,
> +			   u32 old, u32 new, u32 mask);
> +};
Thomas Petazzoni Aug. 1, 2018, 8:49 a.m. | #2
Hello Bjorn,

Thanks for your review!

On Thu, 12 Jul 2018 14:58:02 -0500, Bjorn Helgaas wrote:

> > The idea of this logic is to emulate a root port PCI bridge by
> > providing configuration space read/write operations, and faking behind
> > the scenes the configuration space of a PCI bridge. A PCI host
> > controller driver simply has to call pci_sw_bridge_read() and
> > pci_sw_bridge_write() to read/write the configuration space of the
> > bridge.  
> 
> These systems must actually have a Root Port (there's obviously a Port
> that originates the link), but it's not visible in a spec-compliant
> way.  So this is basically a wrapper that translates accesses to
> standard config space into the vendor-specific registers.

Correct.

> Since there really *is* a hardware bridge but it just doesn't have the
> interface we expect, "sw_bridge" doesn't feel like quite the right
> name for it.  Maybe something related to adapter/emulation/interface/...?

sw_adapter ? sw_adap ?

> There are several drivers that do this, and it would be really cool to
> have them all do it in a similar way.  I found at least these:
> 
>   hv_pcifront_read_config()

This one indeed looks potentially a bit similar.

>   mvebu_pcie_rd_conf()

This one is converted by my patch series.

>   thunder_ecam_config_read()

This one I'm not sure what is happening,

>   thunder_pem_config_read()

For this one, it seems like the HW exposes a config space, just that it
is bogus. It is a bit of a different situation. It could be handled by
the same sw_bridge/sw_adapt stuff, but it's a slightly different use
case than "the HW doesn't expose any spec-compliant root port config
space".

>   xgene_pcie_config_read32()

I don't see why this one would need sw_bridge/sw_adap.

>   iproc_pcie_config_read32()

Not sure about this one either.

>   rcar_pcie_read_conf()

Perhaps it needs sw_bridge/sw_adap.

Really for all those ones, it would require some involvement from the
corresponding driver maintainers, who understand better their HW and
see if they would benefit from sw_bridge/sw_adap.

> I *really* like the fact that your accessors use the normal register
> names, e.g., PCI_EXP_DEVCAP instead of PCISWCAP_EXP_DEVCAP.  That's
> huge for greppability.

Thanks. There's however one down-side: I had to add separate accessors
for reading/writing the main PCI config space and the PCIe capability
config space. Any additional capability config space would require
adding another pair of accessors. But I really wanted to re-use the
existing PCI_EXP_* definitions. Another approach is perhaps to pass an
"ID" of the config space being accessed, i.e 0x0 for the main config
space, and PCI_CAP_ID_EXP for PCI express.

So instead of 4 accessors in pci_sw_bridge_ops, we would have only two:

pci_sw_bridge_read_status_t (*read)(struct pci_sw_bridge *bridge,
				    int id, int reg, u32 *value);
void (*write)(struct pci_sw_bridge *bridge, int id, int reg,
              u32 old, u32 new, u32 mask);

where "id" would allow the accessor to know if we're accessing the main
config space, or the PCI express capability config space.

How does that sound ?

> > By default, the PCI bridge configuration space is simply emulated by a
> > chunk of memory, but the PCI host controller can override the behavior
> > of the read and write operations on a per-register basis to do
> > additional actions if needed.
> > 
> > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> > ---
> >  drivers/pci/Kconfig           |   3 +
> >  drivers/pci/Makefile          |   1 +
> >  drivers/pci/pci-sw-bridge.c   | 149 ++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci-sw-bridge.h | 125 +++++++++++++++++++++++++++++++++++  
> 
> Could this go in drivers/pci instead of include/linux?  I'd prefer to
> hide it inside the PCI core if that's possible.

Sure, I'll fix that.

> > + * Should be called by the PCI controller driver when reading the PCI
> > + * configuration space of the fake bridge. It will call back the
> > + * ->ops->read_base or ->ops->read_pcie operations.
> > + */
> > +int pci_sw_bridge_read(struct pci_sw_bridge *bridge, int where,
> > +		       int size, u32 *value)  
> 
> We don't really have a strong convention about the names of config
> accessors: *_rd_conf(), *_read_config(), *config_read() are all used.
> Maybe we could at least include "conf" to connect this with config
> space as opposed to MMIO space.

No problem, I'll fix that, makes perfect sense.

> > +	if (bridge->has_pcie && reg >= PCI_CAP_PCIE_START) {
> > +		reg -= PCI_CAP_PCIE_START;
> > +
> > +		if (bridge->ops->read_pcie)
> > +			ret = bridge->ops->read_pcie(bridge, reg, value);
> > +		else
> > +			ret = PCI_SW_BRIDGE_NOT_HANDLED;
> > +
> > +		if (ret == PCI_SW_BRIDGE_NOT_HANDLED)
> > +			*value = *((u32*) &bridge->pcie_conf + reg / 4);
> > +	} else {
> > +		if (bridge->ops->read_base)
> > +			ret = bridge->ops->read_base(bridge, reg, value);
> > +		else
> > +			ret = PCI_SW_BRIDGE_NOT_HANDLED;
> > +
> > +		if (ret == PCI_SW_BRIDGE_NOT_HANDLED)
> > +			*value = *((u32*) &bridge->conf + reg / 4);  
> 
> I'm not sure about this part, where the config space that's not
> handled by the bridge's ops ends up just being RAM that's readable and
> writable with no effect on the hardware.  That seems a little
> counter-intuitive.  I think dropping writes and reading zeroes would
> simplify my mental model.

The fact that it is handled just like RAM is actually needed for a
number of registers. For example, in the current pci-mvebu driver, when
reading we have:

        case PCI_VENDOR_ID:
                *value = bridge->device << 16 | bridge->vendor;
                break;

        case PCI_COMMAND:
                *value = bridge->command | bridge->status << 16;
                break;

        case PCI_CLASS_REVISION:
                *value = bridge->class << 16 | bridge->interface << 8 |
                         bridge->revision;
                break;

        case PCI_CACHE_LINE_SIZE:
                *value = bridge->bist << 24 | bridge->header_type << 16 |
                         bridge->latency_timer << 8 | bridge->cache_line_size;
                break;

        case PCI_BASE_ADDRESS_0 ... PCI_BASE_ADDRESS_1:
                *value = bridge->bar[((where & ~3) - PCI_BASE_ADDRESS_0) / 4];
                break;

        case PCI_PRIMARY_BUS:
                *value = (bridge->secondary_latency_timer << 24 |
                          bridge->subordinate_bus         << 16 |
                          bridge->secondary_bus           <<  8 |
                          bridge->primary_bus);
                break;

        case PCI_IO_BASE:
                if (!mvebu_has_ioport(port))
                        *value = bridge->secondary_status << 16;
                else
                        *value = (bridge->secondary_status << 16 |
                                  bridge->iolimit          <<  8 |
                                  bridge->iobase);
                break;

We definitely cannot return zeroes for all these values.

And for writes, we have code like this:

        case PCI_BASE_ADDRESS_0 ... PCI_BASE_ADDRESS_1:
                bridge->bar[((where & ~3) - PCI_BASE_ADDRESS_0) / 4] = value;
                break;

        case PCI_IO_BASE:
                /*
                 * We also keep bit 1 set, it is a read-only bit that
                 * indicates we support 32 bits addressing for the
                 * I/O
                 */
                bridge->iobase = (value & 0xff) | PCI_IO_RANGE_TYPE_32;
                bridge->iolimit = ((value >> 8) & 0xff) | PCI_IO_RANGE_TYPE_32;
                mvebu_pcie_handle_iobase_change(port);
                break;

        case PCI_MEMORY_BASE:
                bridge->membase = value & 0xffff;
                bridge->memlimit = value >> 16;
                mvebu_pcie_handle_membase_change(port);
                break;

        case PCI_IO_BASE_UPPER16:
                bridge->iobaseupper = value & 0xffff;
                bridge->iolimitupper = value >> 16;
                mvebu_pcie_handle_iobase_change(port);
                break;

        case PCI_PRIMARY_BUS:
                bridge->primary_bus             = value & 0xff;
                bridge->secondary_bus           = (value >> 8) & 0xff;
                bridge->subordinate_bus         = (value >> 16) & 0xff;
                bridge->secondary_latency_timer = (value >> 24) & 0xff;
                mvebu_pcie_set_local_bus_nr(port, bridge->secondary_bus);
                break;

The writes being made to those parts of the config space are being
reflected back when reading the config space later.

> > +	if (size == 1)
> > +		*value = (*value >> (8 * (where & 3))) & 0xff;
> > +	else if (size == 2)
> > +		*value = (*value >> (8 * (where & 3))) & 0xffff;
> > +	else if (size != 4)
> > +		return PCIBIOS_BAD_REGISTER_NUMBER;  
> 
> This isn't directly related to your patches, but this is a common
> pattern with some variations between drivers, which makes me wonder
> whether there are bugs lurking.  These use "where & 3" for size 2:
> 
>   advk_pcie_rd_conf()
>   mtk_pcie_hw_rd_cfg()
>   pci_generic_config_read32()
> 
> But these use "where & 2":
> 
>   _altera_pcie_cfg_read()
>   rcar_pcie_read_conf()
> 
> I *think* this means that unaligned 2-byte config reads on Altera and
> R-Car will return incorrect data.  In both cases, the actual read seen
> by the hardware is a 32-bit read, but I think we extract the wrong 16
> bits from that result.

Indeed, if unaligned 2-byte config reads are made, they will return
bogus results. But are such reads allowed ?

> I wonder if there's a way to use a common helper function to do this.

Yes, this small bit of logic is duplicated all over the place. I'll see
if I can come up with some reasonable helpers for that.

> > +	return PCIBIOS_SUCCESSFUL;
> > +}  
> 
> > +	/*
> > +	 * Called when writing to the regular PCI bridge configuration
> > +	 * space. old is the current value, new is the new value being
> > +	 * written, and mask indicates which parts of the value are
> > +	 * being changed.
> > +	 */
> > +	void (*write_base)(struct pci_sw_bridge *bridge, int reg,
> > +			   u32 old, u32 new, u32 mask);
> > +
> > +	/*
> > +	 * Same as ->read_base(), except it is for reading from the
> > +	 * PCIe capability configuration space.  
> 
> I think you mean "->write_base(), except it is for writing to"

Indeed, yes. Will fix that.

So, to sum up, there are really three key questions:

 (1) Which name do you want for this ?

 (2) Agreeing on whether a RAM-like behavior is acceptable, and if not,
     which solution do you propose instead.

 (3) How much do you want me to convert other drivers vs. the
     maintainers for those drivers being responsible for doing this.

Since "There are only two hard things in Computer Science: cache
invalidation and naming things.", I expect question (1) to be the most
difficult one :-)

Best regards,

Thomas
Russell King - ARM Linux Aug. 1, 2018, 9:21 a.m. | #3
On Wed, Aug 01, 2018 at 10:49:57AM +0200, Thomas Petazzoni wrote:
> Indeed, yes. Will fix that.
> 
> So, to sum up, there are really three key questions:
> 
>  (1) Which name do you want for this ?
> 
>  (2) Agreeing on whether a RAM-like behavior is acceptable, and if not,
>      which solution do you propose instead.

Please take the time to read the PCI(e) specifications and implement
what it recommends, rather than doing something else.  That's what I
did when I originally reworked the mvebu PCIe driver for Armada 388,
and it's the only sensible approach to have something that works with
the rest of the Linux PCI(e) layer.

Going off and implementing a non-standard behaviour is a recipe for
things breaking as the PCIe kernel support evolves.

The spec requires reserved registers to read back as zero and ignore
writes, whereas implemented registers have a mixture of behaviours:

- read/write
- read, write to clear
- read-only

Here's the extract(s):

   All PCI devices must treat Configuration Space write operations
   to reserved registers as no-ops; that is, the access must be
   completed normally on the bus and the data discarded. Read
   accesses to reserved or unimplemented registers must be completed
   normally and a data value of 0 returned.

(eg) PCI status:
   Reserved bits should be read-only and return zero when read.
   A one bit is reset (if it is not read-only) whenever the register
   is written, and the write data in the corresponding bit location
   is a 1.

[which is why doing the read-modify-write action that some host
 bridges that only support 32-bit accesses is dangerous - it leads
 to various status bits being inadvertently reset.]

Getting this right in a software emulation of the register space for
each bit in every register makes for a happy Linux PCIe layer.

Now, configuration read/writes use naturally aligned addresses.  The
PCI specification defines the PC IO 0xcf8/0xcfc configuration access
mechanism.  The first register defines the address with a 6 bit
"double-word" address to cover the 256 bytes of standard config space.
Accesses to 0xcfc are forwarded to the PCI bus as a single
configuration request - and that means there is nothing to deal with
an attempt to access a mis-aligned word.

Indeed, if 0xcfe was to be accessed as a double word, this would
result in the processor reading the two bytes from IO address 0xcfe,
0xcff, as well as 0xd00 and 0xd01 which are not part of the
configuration data register - resulting in undefined data being read.

So, basically, misaligned configuration accesses are not supported.
Thomas Petazzoni Aug. 1, 2018, 9:37 a.m. | #4
Hello Russell,

Thanks for your feedback.

On Wed, 1 Aug 2018 10:21:19 +0100, Russell King - ARM Linux wrote:

> >  (2) Agreeing on whether a RAM-like behavior is acceptable, and if not,
> >      which solution do you propose instead.  
> 
> Please take the time to read the PCI(e) specifications and implement
> what it recommends, rather than doing something else.  That's what I
> did when I originally reworked the mvebu PCIe driver for Armada 388,
> and it's the only sensible approach to have something that works with
> the rest of the Linux PCI(e) layer.

Actually, all what I'm doing is generalizing what the mvebu PCIe driver
is doing (which you significantly contributed) to, and try to make it
usable by other drivers that have the same need.

So, I'm precisely trying to make sure we implement this stuff once,
correctly, rather than duplicate it in several drivers. I'm not
pretending that my first iteration is entirely correct, and your review
of it to make it better is much appreciated.

> Going off and implementing a non-standard behaviour is a recipe for
> things breaking as the PCIe kernel support evolves.

Sure.

> The spec requires reserved registers to read back as zero and ignore
> writes, whereas implemented registers have a mixture of behaviours:
> 
> - read/write
> - read, write to clear
> - read-only
> 
> Here's the extract(s):
> 
>    All PCI devices must treat Configuration Space write operations
>    to reserved registers as no-ops; that is, the access must be
>    completed normally on the bus and the data discarded. Read
>    accesses to reserved or unimplemented registers must be completed
>    normally and a data value of 0 returned.
> 
> (eg) PCI status:
>    Reserved bits should be read-only and return zero when read.
>    A one bit is reset (if it is not read-only) whenever the register
>    is written, and the write data in the corresponding bit location
>    is a 1.

Right. This pci-sw-bridge layer should implement this behavior.

> [which is why doing the read-modify-write action that some host
>  bridges that only support 32-bit accesses is dangerous - it leads
>  to various status bits being inadvertently reset.]
> 
> Getting this right in a software emulation of the register space for
> each bit in every register makes for a happy Linux PCIe layer.

Absolutely.

> Now, configuration read/writes use naturally aligned addresses.  The
> PCI specification defines the PC IO 0xcf8/0xcfc configuration access
> mechanism.  The first register defines the address with a 6 bit
> "double-word" address to cover the 256 bytes of standard config space.
> Accesses to 0xcfc are forwarded to the PCI bus as a single
> configuration request - and that means there is nothing to deal with
> an attempt to access a mis-aligned word.
> 
> Indeed, if 0xcfe was to be accessed as a double word, this would
> result in the processor reading the two bytes from IO address 0xcfe,
> 0xcff, as well as 0xd00 and 0xd01 which are not part of the
> configuration data register - resulting in undefined data being read.
> 
> So, basically, misaligned configuration accesses are not supported.

OK. So the where & 2 that the RCAR driver is doing for 2 bytes accesses
is OK, because bit 0 of where will always be 0 for a 2 bytes access.

Thanks,

Thomas Petazzoni
Thomas Petazzoni Aug. 1, 2018, 9:54 a.m. | #5
Hello Russell,

On Wed, 1 Aug 2018 10:21:19 +0100, Russell King - ARM Linux wrote:


>    All PCI devices must treat Configuration Space write operations
>    to reserved registers as no-ops; that is, the access must be
>    completed normally on the bus and the data discarded. Read
>    accesses to reserved or unimplemented registers must be completed
>    normally and a data value of 0 returned.
> 
> (eg) PCI status:
>    Reserved bits should be read-only and return zero when read.
>    A one bit is reset (if it is not read-only) whenever the register
>    is written, and the write data in the corresponding bit location
>    is a 1.
> 
> [which is why doing the read-modify-write action that some host
>  bridges that only support 32-bit accesses is dangerous - it leads
>  to various status bits being inadvertently reset.]

Speaking of this, the generic pci_generic_config_write32() function
indeed does this incorrectly, and prints a warning.

However, I just looked at the pci-thunder-pem code, and it seems to
correctly account for those W1C bits, by having an explicit list of
registers and their W1C bits (see thunder_pem_bridge_w1c_bits). This
isn't specific to the Thunder PCIe controller at all, and would benefit
from being made generic, no ?

Best regards,

Thomas
Thomas Petazzoni Aug. 1, 2018, 11:13 a.m. | #6
Bjorn,

On Wed, 1 Aug 2018 10:49:57 +0200, Thomas Petazzoni wrote:

> > I wonder if there's a way to use a common helper function to do this.  
> 
> Yes, this small bit of logic is duplicated all over the place. I'll see
> if I can come up with some reasonable helpers for that.

Here is an attempt at doing this:

 - Introduce some helpers

   https://github.com/MISL-EBU-System-SW/mainline-public/commit/55543d2050d9aa3abe297569be830bde8680e1e9

 - Use them in drivers/pci/access.c

   https://github.com/MISL-EBU-System-SW/mainline-public/commit/ce6158d5c6a039e0cddf5ee6840dadeb46c5fc4b

 - Use them in PCI host controller drivers

   https://github.com/MISL-EBU-System-SW/mainline-public/commit/df4d0b5272ea3b4c3226c96466e5360c5a89253f

I'm not a big fan of the naming though. Let me know what you think, if
you think it's worth it, I'll submit the patches.

Note: the whole thing is compile-tested only for now.

Best regards,

Thomas

Patch

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 56ff8f6d31fc..1f13575d052b 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -98,6 +98,9 @@  config PCI_ECAM
 config PCI_LOCKLESS_CONFIG
 	bool
 
+config PCI_SW_BRIDGE
+	bool
+
 config PCI_IOV
 	bool "PCI IOV support"
 	depends on PCI
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 535201984b8b..0bd65ddd2c50 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -19,6 +19,7 @@  obj-$(CONFIG_HOTPLUG_PCI)	+= hotplug/
 obj-$(CONFIG_PCI_MSI)		+= msi.o
 obj-$(CONFIG_PCI_ATS)		+= ats.o
 obj-$(CONFIG_PCI_IOV)		+= iov.o
+obj-$(CONFIG_PCI_SW_BRIDGE)	+= pci-sw-bridge.o
 obj-$(CONFIG_ACPI)		+= pci-acpi.o
 obj-$(CONFIG_PCI_LABEL)		+= pci-label.o
 obj-$(CONFIG_X86_INTEL_MID)	+= pci-mid.o
diff --git a/drivers/pci/pci-sw-bridge.c b/drivers/pci/pci-sw-bridge.c
new file mode 100644
index 000000000000..304e8c0e164d
--- /dev/null
+++ b/drivers/pci/pci-sw-bridge.c
@@ -0,0 +1,149 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 Marvell
+ *
+ * Author: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
+ *
+ * This file helps PCI controller drivers implement a fake root port
+ * PCI bridge when the HW doesn't provide such a root port PCI
+ * bridge.
+ *
+ * It emulates a PCI bridge by providing a fake PCI configuration
+ * space (and optionally a PCIe capability configuration space) in
+ * memory. By default the read/write operations simply read and update
+ * this fake configuration space in memory. However, PCI controller
+ * drivers can provide through the 'struct pci_sw_bridge_ops'
+ * structure a set of operations to override or complement this
+ * default behavior.
+ */
+
+#include <linux/pci-sw-bridge.h>
+#include <linux/pci.h>
+
+#define PCI_BRIDGE_CONF_END	(PCI_BRIDGE_CONTROL + 2)
+#define PCI_CAP_PCIE_START	PCI_BRIDGE_CONF_END
+#define PCI_CAP_PCIE_END	(PCI_CAP_PCIE_START + PCI_EXP_SLTSTA2 + 2)
+
+/*
+ * Initialize a pci_sw_bridge structure to represent a fake PCI
+ * bridge. The caller needs to have initialized the PCI configuration
+ * space with whatever values make sense (typically at least vendor,
+ * device, revision), the ->ops pointer, and possibly ->data and
+ * ->has_pcie.
+ */
+void pci_sw_bridge_init(struct pci_sw_bridge *bridge)
+{
+	bridge->conf.class = PCI_CLASS_BRIDGE_PCI;
+	bridge->conf.header_type = PCI_HEADER_TYPE_BRIDGE;
+	bridge->conf.cache_line_size = 0x10;
+	bridge->conf.status = PCI_STATUS_CAP_LIST;
+
+	if (bridge->has_pcie) {
+		bridge->conf.capabilities_pointer = PCI_CAP_PCIE_START;
+		bridge->pcie_conf.cap_id = PCI_CAP_ID_EXP;
+		/* Set PCIe v2, root port, slot support */
+		bridge->pcie_conf.cap = PCI_EXP_TYPE_ROOT_PORT << 4 | 2 |
+			PCI_EXP_FLAGS_SLOT;
+	}
+}
+
+/*
+ * Should be called by the PCI controller driver when reading the PCI
+ * configuration space of the fake bridge. It will call back the
+ * ->ops->read_base or ->ops->read_pcie operations.
+ */
+int pci_sw_bridge_read(struct pci_sw_bridge *bridge, int where,
+		       int size, u32 *value)
+{
+	int ret;
+	int reg = where & ~3;
+
+	if (bridge->has_pcie && reg >= PCI_CAP_PCIE_END) {
+		*value = 0;
+		return PCIBIOS_SUCCESSFUL;
+	}
+
+	if (!bridge->has_pcie && reg >= PCI_BRIDGE_CONF_END) {
+		*value = 0;
+		return PCIBIOS_SUCCESSFUL;
+	}
+
+	if (bridge->has_pcie && reg >= PCI_CAP_PCIE_START) {
+		reg -= PCI_CAP_PCIE_START;
+
+		if (bridge->ops->read_pcie)
+			ret = bridge->ops->read_pcie(bridge, reg, value);
+		else
+			ret = PCI_SW_BRIDGE_NOT_HANDLED;
+
+		if (ret == PCI_SW_BRIDGE_NOT_HANDLED)
+			*value = *((u32*) &bridge->pcie_conf + reg / 4);
+	} else {
+		if (bridge->ops->read_base)
+			ret = bridge->ops->read_base(bridge, reg, value);
+		else
+			ret = PCI_SW_BRIDGE_NOT_HANDLED;
+
+		if (ret == PCI_SW_BRIDGE_NOT_HANDLED)
+			*value = *((u32*) &bridge->conf + reg / 4);
+	}
+
+	if (size == 1)
+		*value = (*value >> (8 * (where & 3))) & 0xff;
+	else if (size == 2)
+		*value = (*value >> (8 * (where & 3))) & 0xffff;
+	else if (size != 4)
+		return PCIBIOS_BAD_REGISTER_NUMBER;
+
+	return PCIBIOS_SUCCESSFUL;
+}
+
+/*
+ * Should be called by the PCI controller driver when writing the PCI
+ * configuration space of the fake bridge. It will call back the
+ * ->ops->write_base or ->ops->write_pcie operations.
+ */
+int pci_sw_bridge_write(struct pci_sw_bridge *bridge, int where,
+			int size, u32 value)
+{
+	int reg = where & ~3;
+	int mask, ret, old, new;
+
+	if (bridge->has_pcie && reg >= PCI_CAP_PCIE_END)
+		return PCIBIOS_SUCCESSFUL;
+
+	if (!bridge->has_pcie && reg >= PCI_BRIDGE_CONF_END)
+		return PCIBIOS_SUCCESSFUL;
+
+	if (size == 4)
+		mask = 0xffffffff;
+	else if (size == 2)
+		mask = 0xffff << ((where & 0x3) * 8);
+	else if (size == 1)
+		mask = 0xff << ((where & 0x3) * 8);
+	else
+		return PCIBIOS_BAD_REGISTER_NUMBER;
+
+	ret = pci_sw_bridge_read(bridge, reg, 4, &old);
+	if (ret != PCIBIOS_SUCCESSFUL)
+		return ret;
+
+	new = old & ~mask;
+	new |= (value << ((where & 0x3) * 8)) & mask;
+
+	if (bridge->has_pcie && reg >= PCI_CAP_PCIE_START) {
+		reg -= PCI_CAP_PCIE_START;
+
+		*((u32*) &bridge->pcie_conf + reg / 4) = new;
+
+		if (bridge->ops->write_pcie)
+			bridge->ops->write_pcie(bridge, reg, old, new, mask);
+	} else {
+		*((u32*) &bridge->conf + reg / 4) = new;
+
+		if (bridge->ops->write_base)
+			bridge->ops->write_base(bridge, reg, old, new, mask);
+	}
+
+	return PCIBIOS_SUCCESSFUL;
+}
diff --git a/include/linux/pci-sw-bridge.h b/include/linux/pci-sw-bridge.h
new file mode 100644
index 000000000000..152648124df5
--- /dev/null
+++ b/include/linux/pci-sw-bridge.h
@@ -0,0 +1,125 @@ 
+#ifndef __PCI_SW_BRIDGE_H__
+#define __PCI_SW_BRIDGE_H__
+
+#include <linux/kernel.h>
+
+/* PCI configuration space of a PCI-to-PCI bridge. */
+struct pci_sw_bridge_conf {
+	u16 vendor;
+	u16 device;
+	u16 command;
+	u16 status;
+	u8 revision;
+	u8 interface;
+	u16 class;
+	u8 cache_line_size;
+	u8 latency_timer;
+	u8 header_type;
+	u8 bist;
+	u32 bar[2];
+	u8 primary_bus;
+	u8 secondary_bus;
+	u8 subordinate_bus;
+	u8 secondary_latency_timer;
+	u8 iobase;
+	u8 iolimit;
+	u16 secondary_status;
+	u16 membase;
+	u16 memlimit;
+	u16 pref_mem_base;
+	u16 pref_mem_limit;
+	u32 prefbaseupper;
+	u32 preflimitupper;
+	u16 iobaseupper;
+	u16 iolimitupper;
+	u8 capabilities_pointer;
+	u8 reserve[3];
+	u32 romaddr;
+	u8 intline;
+	u8 intpin;
+	u16 bridgectrl;
+};
+
+/* PCI configuration space of the PCIe capabilities */
+struct pci_sw_bridge_pcie_conf {
+	u8 cap_id;
+	u8 next;
+	u16 cap;
+	u32 devcap;
+	u16 devctl;
+	u16 devsta;
+	u32 lnkcap;
+	u16 lnkctl;
+	u16 lnksta;
+	u32 slotcap;
+	u16 slotctl;
+	u16 slotsta;
+	u16 rootctl;
+	u16 rsvd;
+	u32 rootsta;
+	u32 devcap2;
+	u16 devctl2;
+	u16 devsta2;
+	u32 lnkcap2;
+	u16 lnkctl2;
+	u16 lnksta2;
+	u32 slotcap2;
+	u16 slotctl2;
+	u16 slotsta2;
+};
+
+struct pci_sw_bridge;
+
+typedef enum { PCI_SW_BRIDGE_HANDLED,
+	       PCI_SW_BRIDGE_NOT_HANDLED } pci_sw_bridge_read_status_t;
+
+struct pci_sw_bridge_ops {
+	/*
+	 * Called when reading from the regular PCI bridge
+	 * configuration space. Return PCI_SW_BRIDGE_HANDLED when the
+	 * operation has handled the read operation and filled in the
+	 * *value, or PCI_SW_BRIDGE_NOT_HANDLED when the read should
+	 * be emulated by the common code by reading from the
+	 * in-memory copy of the configuration space.
+	 */
+	pci_sw_bridge_read_status_t (*read_base)(struct pci_sw_bridge *bridge,
+						 int reg, u32 *value);
+
+	/*
+	 * Same as ->read_base(), except it is for reading from the
+	 * PCIe capability configuration space.
+	 */
+	pci_sw_bridge_read_status_t (*read_pcie)(struct pci_sw_bridge *bridge,
+						 int reg, u32 *value);
+	/*
+	 * Called when writing to the regular PCI bridge configuration
+	 * space. old is the current value, new is the new value being
+	 * written, and mask indicates which parts of the value are
+	 * being changed.
+	 */
+	void (*write_base)(struct pci_sw_bridge *bridge, int reg,
+			   u32 old, u32 new, u32 mask);
+
+	/*
+	 * Same as ->read_base(), except it is for reading from the
+	 * PCIe capability configuration space.
+	 */
+	void (*write_pcie)(struct pci_sw_bridge *bridge, int reg,
+			   u32 old, u32 new, u32 mask);
+};
+
+struct pci_sw_bridge {
+	struct pci_sw_bridge_conf conf;
+	struct pci_sw_bridge_pcie_conf pcie_conf;
+	struct pci_sw_bridge_ops *ops;
+	void *data;
+	bool has_pcie;
+};
+
+void pci_sw_bridge_init(struct pci_sw_bridge *bridge);
+int pci_sw_bridge_read(struct pci_sw_bridge *bridge, int where,
+		       int size, u32 *value);
+int pci_sw_bridge_write(struct pci_sw_bridge *bridge, int where,
+			int size, u32 value);
+
+#endif /* __PCI_SW_BRIDGE_H__ */