Patchwork [v2,1/2] pci: Fix flaw in pci_acs_enabled()

login
register
mail settings
Submitter Alex Williamson
Date June 7, 2013, 4:34 p.m.
Message ID <20130607163441.7733.23221.stgit@bling.home>
Download mbox | patch
Permalink /patch/249759/
State Superseded
Headers show

Comments

Alex Williamson - June 7, 2013, 4:34 p.m.
Downstream ports support for all ACS flags supercedes multifunction
exclusion of some flags.  The PCIe spec also fully specifies which
PCIe types are subject to the multifunction rules and excludes event
collectors and PCIe-to-PCI bridges entirely.  Document each rule to
the section of the PCIe spec.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/pci/pci.c |   59 ++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 14 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - June 18, 2013, 5:09 p.m.
On Fri, Jun 07, 2013 at 10:34:41AM -0600, Alex Williamson wrote:
> Downstream ports support for all ACS flags supercedes multifunction
> exclusion of some flags.  The PCIe spec also fully specifies which
> PCIe types are subject to the multifunction rules and excludes event
> collectors and PCIe-to-PCI bridges entirely.  Document each rule to
> the section of the PCIe spec.

What's the flaw, exactly?  Does this fix a user-visible issue?  Should
it be backported to any stable releases?  It'd be nice to have an
example or two of devices where we return the wrong thing today.

Oh, I think I see: the original code filters out flags for
multi-function downstream ports, and your new code does not.

> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/pci/pci.c |   59 ++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 45 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b099e00..457ae51 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2354,6 +2354,19 @@ void pci_enable_acs(struct pci_dev *dev)
>  	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> +static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
> +{
> +	int pos;
> +	u16 ctrl;
> +
> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
> +	if (!pos)
> +		return false;
> +
> +	pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
> +	return (ctrl & acs_flags) == acs_flags;
> +}
> +
>  /**
>   * pci_acs_enabled - test ACS against required flags for a given device
>   * @pdev: device to test
> @@ -2364,8 +2377,7 @@ void pci_enable_acs(struct pci_dev *dev)
>   */
>  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)

I know you didn't change the *name* of this function, but I think it
would be easier to follow if you did change the name to something more
descriptive, e.g., something to do with the actual property you're
interested in, which has to do with routing peer-to-peer DMA.

That property makes sense even for the excluded devices, while the
idea of an ACS capability that doesn't even exist is implicitly
enabled, really doesn't.

>  {
> -	int pos, ret;
> -	u16 ctrl;
> +	int ret;
>  
>  	ret = pci_dev_specific_acs_enabled(pdev, acs_flags);
>  	if (ret >= 0)
> @@ -2374,23 +2386,42 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
>  	if (!pci_is_pcie(pdev))
>  		return false;
>  
> -	/* Filter out flags not applicable to multifunction */
> -	if (pdev->multifunction)
> +	switch (pci_pcie_type(pdev)) {
> +	/*
> +	 * PCIe 3.0, 6.12.1 excludes ACS on these devices.  "Not applicable"
> +	 */
> +	case PCI_EXP_TYPE_PCI_BRIDGE:
> +	case PCI_EXP_TYPE_RC_EC:
> +		break;

Why not just return "true" immediately here, or even better, just omit
the case altogether, since it doesn't change the behavior at all?

> +	/*
> +	 * PCIe 3.0, 6.12.1.1 specifies full ACS capabilities on downstream
> +	 * ports, regardless of them being multifunction devices.
> +	 */
> +	case PCI_EXP_TYPE_DOWNSTREAM:
> +	case PCI_EXP_TYPE_ROOT_PORT:
> +		return pci_acs_flags_enabled(pdev, acs_flags);
> +	/*
> +	 * PCIe 3.0, 6.12.1.2 specifies unimplemented ACS capabilities on
> +	 * multifunction devices.  6.12 footnote identifies specifically
> +	 * which devices types this applies to.
> +	 */
> +	case PCI_EXP_TYPE_ENDPOINT:
> +	case PCI_EXP_TYPE_UPSTREAM:
> +	case PCI_EXP_TYPE_LEG_END:
> +	case PCI_EXP_TYPE_RC_END:
> +		if (!pdev->multifunction)
> +			break;

And here.

> +
>  		acs_flags &= (PCI_ACS_RR | PCI_ACS_CR |
>  			      PCI_ACS_EC | PCI_ACS_DT);
>  
> -	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM ||
> -	    pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT ||
> -	    pdev->multifunction) {
> -		pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
> -		if (!pos)
> -			return false;
> -
> -		pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
> -		if ((ctrl & acs_flags) != acs_flags)
> -			return false;
> +		return pci_acs_flags_enabled(pdev, acs_flags);
>  	}
>  
> +	/*
> +	 * PCIe 3.0, 6.12.1.3 specifies no ACS capabilties are applicable
> +	 * to single function devices with the exception of downstream ports.
> +	 */
>  	return true;
>  }
>  
> 

I actually liked the original code organization, because it's closer
to expressing the idea of "here are the control points where hardware
can make choices about routing peer-to-peer DMA and here's the knob
(ACS) that constrains those choices."

Maybe something like (pidgin C):

    if (PCI_EXP_TYPE_DOWNSTREAM || PCI_EXP_TYPE_ROOT_PORT)
	return pci_acs_flags_enabled(pdev, acs_flags);

    if (!pdev->multifunction)
	return true;

    acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | ...);
    return pci_acs_flags_enabled(pdev, acs_flags);

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson - June 18, 2013, 6:38 p.m.
On Tue, 2013-06-18 at 11:09 -0600, Bjorn Helgaas wrote:
> On Fri, Jun 07, 2013 at 10:34:41AM -0600, Alex Williamson wrote:
> > Downstream ports support for all ACS flags supercedes multifunction
> > exclusion of some flags.  The PCIe spec also fully specifies which
> > PCIe types are subject to the multifunction rules and excludes event
> > collectors and PCIe-to-PCI bridges entirely.  Document each rule to
> > the section of the PCIe spec.
> 
> What's the flaw, exactly?  Does this fix a user-visible issue?  Should
> it be backported to any stable releases?  It'd be nice to have an
> example or two of devices where we return the wrong thing today.
> 
> Oh, I think I see: the original code filters out flags for
> multi-function downstream ports, and your new code does not.

Right, if you have a multifunction root port and you want to check a
feature that applies to downstream ports, but not multifunction devices,
we were previously filtering those out.

> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  drivers/pci/pci.c |   59 ++++++++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 45 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b099e00..457ae51 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2354,6 +2354,19 @@ void pci_enable_acs(struct pci_dev *dev)
> >  	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> >  }
> >  
> > +static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
> > +{
> > +	int pos;
> > +	u16 ctrl;
> > +
> > +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
> > +	if (!pos)
> > +		return false;
> > +
> > +	pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
> > +	return (ctrl & acs_flags) == acs_flags;
> > +}
> > +
> >  /**
> >   * pci_acs_enabled - test ACS against required flags for a given device
> >   * @pdev: device to test
> > @@ -2364,8 +2377,7 @@ void pci_enable_acs(struct pci_dev *dev)
> >   */
> >  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
> 
> I know you didn't change the *name* of this function, but I think it
> would be easier to follow if you did change the name to something more
> descriptive, e.g., something to do with the actual property you're
> interested in, which has to do with routing peer-to-peer DMA.
> 
> That property makes sense even for the excluded devices, while the
> idea of an ACS capability that doesn't even exist is implicitly
> enabled, really doesn't.

I think we also don't want to put the complexity at the caller for
understanding what capabilities are applicable to a given device.  It's
also convenient to use the set of ACS flags.  Given that, the current
naming came about.  It's a little awkward, but it's easy to use.
Suggestions for a better name?

> >  {
> > -	int pos, ret;
> > -	u16 ctrl;
> > +	int ret;
> >  
> >  	ret = pci_dev_specific_acs_enabled(pdev, acs_flags);
> >  	if (ret >= 0)
> > @@ -2374,23 +2386,42 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
> >  	if (!pci_is_pcie(pdev))
> >  		return false;
> >  
> > -	/* Filter out flags not applicable to multifunction */
> > -	if (pdev->multifunction)
> > +	switch (pci_pcie_type(pdev)) {
> > +	/*
> > +	 * PCIe 3.0, 6.12.1 excludes ACS on these devices.  "Not applicable"
> > +	 */
> > +	case PCI_EXP_TYPE_PCI_BRIDGE:
> > +	case PCI_EXP_TYPE_RC_EC:
> > +		break;
> 
> Why not just return "true" immediately here, or even better, just omit
> the case altogether, since it doesn't change the behavior at all?

It could certainly return true, I debated that for a while and decided
to break just to reduce the number of exit points.  Omitting the case
means that anyone who looks at this in the future has to figure out why
those pcie types aren't explicitly listed.

> > +	/*
> > +	 * PCIe 3.0, 6.12.1.1 specifies full ACS capabilities on downstream
> > +	 * ports, regardless of them being multifunction devices.
> > +	 */
> > +	case PCI_EXP_TYPE_DOWNSTREAM:
> > +	case PCI_EXP_TYPE_ROOT_PORT:
> > +		return pci_acs_flags_enabled(pdev, acs_flags);
> > +	/*
> > +	 * PCIe 3.0, 6.12.1.2 specifies unimplemented ACS capabilities on
> > +	 * multifunction devices.  6.12 footnote identifies specifically
> > +	 * which devices types this applies to.
> > +	 */
> > +	case PCI_EXP_TYPE_ENDPOINT:
> > +	case PCI_EXP_TYPE_UPSTREAM:
> > +	case PCI_EXP_TYPE_LEG_END:
> > +	case PCI_EXP_TYPE_RC_END:
> > +		if (!pdev->multifunction)
> > +			break;
> 
> And here.

Same

> > +
> >  		acs_flags &= (PCI_ACS_RR | PCI_ACS_CR |
> >  			      PCI_ACS_EC | PCI_ACS_DT);
> >  
> > -	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM ||
> > -	    pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT ||
> > -	    pdev->multifunction) {
> > -		pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
> > -		if (!pos)
> > -			return false;
> > -
> > -		pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
> > -		if ((ctrl & acs_flags) != acs_flags)
> > -			return false;
> > +		return pci_acs_flags_enabled(pdev, acs_flags);
> >  	}
> >  
> > +	/*
> > +	 * PCIe 3.0, 6.12.1.3 specifies no ACS capabilties are applicable
> > +	 * to single function devices with the exception of downstream ports.
> > +	 */
> >  	return true;
> >  }
> >  
> > 
> 
> I actually liked the original code organization, because it's closer
> to expressing the idea of "here are the control points where hardware
> can make choices about routing peer-to-peer DMA and here's the knob
> (ACS) that constrains those choices."
> 
> Maybe something like (pidgin C):
> 
>     if (PCI_EXP_TYPE_DOWNSTREAM || PCI_EXP_TYPE_ROOT_PORT)
> 	return pci_acs_flags_enabled(pdev, acs_flags);
> 
>     if (!pdev->multifunction)
> 	return true;
> 
>     acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | ...);
>     return pci_acs_flags_enabled(pdev, acs_flags);

Ok.  I re-organized it because I got tired of re-reading the spec to
make sure that a given type of device was handled correctly.  Note that
the above simplification incorrectly handles multifunction bridges or EC
devices.  I can do something like you suggest, but being explicit was
actually a goal in fixing this bug and avoiding future bugs.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - June 18, 2013, 10:10 p.m.
On Tue, Jun 18, 2013 at 12:38 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2013-06-18 at 11:09 -0600, Bjorn Helgaas wrote:
>> On Fri, Jun 07, 2013 at 10:34:41AM -0600, Alex Williamson wrote:
>> > Downstream ports support for all ACS flags supercedes multifunction
>> > exclusion of some flags.  The PCIe spec also fully specifies which
>> > PCIe types are subject to the multifunction rules and excludes event
>> > collectors and PCIe-to-PCI bridges entirely.  Document each rule to
>> > the section of the PCIe spec.
>>
>> What's the flaw, exactly?  Does this fix a user-visible issue?  Should
>> it be backported to any stable releases?  It'd be nice to have an
>> example or two of devices where we return the wrong thing today.
>>
>> Oh, I think I see: the original code filters out flags for
>> multi-function downstream ports, and your new code does not.
>
> Right, if you have a multifunction root port and you want to check a
> feature that applies to downstream ports, but not multifunction devices,
> we were previously filtering those out.
>
>> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>> > ---
>> >  drivers/pci/pci.c |   59 ++++++++++++++++++++++++++++++++++++++++-------------
>> >  1 file changed, 45 insertions(+), 14 deletions(-)
>> >
>> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> > index b099e00..457ae51 100644
>> > --- a/drivers/pci/pci.c
>> > +++ b/drivers/pci/pci.c
>> > @@ -2354,6 +2354,19 @@ void pci_enable_acs(struct pci_dev *dev)
>> >     pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>> >  }
>> >
>> > +static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
>> > +{
>> > +   int pos;
>> > +   u16 ctrl;
>> > +
>> > +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
>> > +   if (!pos)
>> > +           return false;
>> > +
>> > +   pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
>> > +   return (ctrl & acs_flags) == acs_flags;
>> > +}
>> > +
>> >  /**
>> >   * pci_acs_enabled - test ACS against required flags for a given device
>> >   * @pdev: device to test
>> > @@ -2364,8 +2377,7 @@ void pci_enable_acs(struct pci_dev *dev)
>> >   */
>> >  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
>>
>> I know you didn't change the *name* of this function, but I think it
>> would be easier to follow if you did change the name to something more
>> descriptive, e.g., something to do with the actual property you're
>> interested in, which has to do with routing peer-to-peer DMA.
>>
>> That property makes sense even for the excluded devices, while the
>> idea of an ACS capability that doesn't even exist is implicitly
>> enabled, really doesn't.
>
> I think we also don't want to put the complexity at the caller for
> understanding what capabilities are applicable to a given device.  It's
> also convenient to use the set of ACS flags.  Given that, the current
> naming came about.  It's a little awkward, but it's easy to use.
> Suggestions for a better name?

100% agreed the caller shouldn't have to worry about different device
types.  I was thinking something like "pci_enforces_peer_isolation()"
or "pci_peer_dma_routed_upstream()".  Or maybe it should be
"pci_dev_...()".

>> >  {
>> > -   int pos, ret;
>> > -   u16 ctrl;
>> > +   int ret;
>> >
>> >     ret = pci_dev_specific_acs_enabled(pdev, acs_flags);
>> >     if (ret >= 0)
>> > @@ -2374,23 +2386,42 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
>> >     if (!pci_is_pcie(pdev))
>> >             return false;
>> >
>> > -   /* Filter out flags not applicable to multifunction */
>> > -   if (pdev->multifunction)
>> > +   switch (pci_pcie_type(pdev)) {
>> > +   /*
>> > +    * PCIe 3.0, 6.12.1 excludes ACS on these devices.  "Not applicable"
>> > +    */
>> > +   case PCI_EXP_TYPE_PCI_BRIDGE:
>> > +   case PCI_EXP_TYPE_RC_EC:
>> > +           break;
>>
>> Why not just return "true" immediately here, or even better, just omit
>> the case altogether, since it doesn't change the behavior at all?
>
> It could certainly return true, I debated that for a while and decided
> to break just to reduce the number of exit points.

That's interesting; I know lots of people advocate a single exit
point, but for functions like this where there's no cleanup and we're
essentially doing a "goto" to a "return <constant>", it seems harder
to read because I have to look down at the bottom to see how this case
is resolved.  Oh well, no point in arguing about *that* :)

> Omitting the case
> means that anyone who looks at this in the future has to figure out why
> those pcie types aren't explicitly listed.
>
>> > +   /*
>> > +    * PCIe 3.0, 6.12.1.1 specifies full ACS capabilities on downstream
>> > +    * ports, regardless of them being multifunction devices.
>> > +    */
>> > +   case PCI_EXP_TYPE_DOWNSTREAM:
>> > +   case PCI_EXP_TYPE_ROOT_PORT:
>> > +           return pci_acs_flags_enabled(pdev, acs_flags);
>> > +   /*
>> > +    * PCIe 3.0, 6.12.1.2 specifies unimplemented ACS capabilities on
>> > +    * multifunction devices.  6.12 footnote identifies specifically
>> > +    * which devices types this applies to.
>> > +    */
>> > +   case PCI_EXP_TYPE_ENDPOINT:
>> > +   case PCI_EXP_TYPE_UPSTREAM:
>> > +   case PCI_EXP_TYPE_LEG_END:
>> > +   case PCI_EXP_TYPE_RC_END:
>> > +           if (!pdev->multifunction)
>> > +                   break;
>>
>> And here.
>
> Same
>
>> > +
>> >             acs_flags &= (PCI_ACS_RR | PCI_ACS_CR |
>> >                           PCI_ACS_EC | PCI_ACS_DT);
>> >
>> > -   if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM ||
>> > -       pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT ||
>> > -       pdev->multifunction) {
>> > -           pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
>> > -           if (!pos)
>> > -                   return false;
>> > -
>> > -           pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
>> > -           if ((ctrl & acs_flags) != acs_flags)
>> > -                   return false;
>> > +           return pci_acs_flags_enabled(pdev, acs_flags);
>> >     }
>> >
>> > +   /*
>> > +    * PCIe 3.0, 6.12.1.3 specifies no ACS capabilties are applicable
>> > +    * to single function devices with the exception of downstream ports.
>> > +    */
>> >     return true;
>> >  }
>> >
>> >
>>
>> I actually liked the original code organization, because it's closer
>> to expressing the idea of "here are the control points where hardware
>> can make choices about routing peer-to-peer DMA and here's the knob
>> (ACS) that constrains those choices."
>>
>> Maybe something like (pidgin C):
>>
>>     if (PCI_EXP_TYPE_DOWNSTREAM || PCI_EXP_TYPE_ROOT_PORT)
>>       return pci_acs_flags_enabled(pdev, acs_flags);
>>
>>     if (!pdev->multifunction)
>>       return true;
>>
>>     acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | ...);
>>     return pci_acs_flags_enabled(pdev, acs_flags);
>
> Ok.  I re-organized it because I got tired of re-reading the spec to
> make sure that a given type of device was handled correctly.  Note that
> the above simplification incorrectly handles multifunction bridges or EC
> devices.

Hmm...  What *is* the correct behavior for a bridge?  You return
"true," i.e., you're saying that a PCIe-to-PCI bridge will always
route peer-to-peer transactions from PCI devices upstream to its PCIe
link.  But that seems wrong: a PCI DMA transaction can target a peer
on the same PCI bus, and it's not even possible for the bridge to
validate the transaction or forward it upstream.

I suspect the "ACS is never applicable to a PCI Express to PCI Bridge
Function" statement in 6.12.1 just means "it's impossible for ACS to
isolate the devices below the bridge from each other, so it would be
misleading to implement the capability."

I have no idea or speculation about why the spec calls out Event
Collectors.  They aren't bridges, so I suppose it must be related to
multi-function devices?

> I can do something like you suggest, but being explicit was
> actually a goal in fixing this bug and avoiding future bugs.  Thanks,

Is there a bugzilla you can reference here?  Or can we at least turn
the "multi-function downstream port flag filtering" thing into a
user-visible effect?  I can see what it fixes at the microscopic
level, but I really don't know the implications of it.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson - June 18, 2013, 10:47 p.m.
On Tue, 2013-06-18 at 16:10 -0600, Bjorn Helgaas wrote:
> On Tue, Jun 18, 2013 at 12:38 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Tue, 2013-06-18 at 11:09 -0600, Bjorn Helgaas wrote:
> >> On Fri, Jun 07, 2013 at 10:34:41AM -0600, Alex Williamson wrote:
> >> > Downstream ports support for all ACS flags supercedes multifunction
> >> > exclusion of some flags.  The PCIe spec also fully specifies which
> >> > PCIe types are subject to the multifunction rules and excludes event
> >> > collectors and PCIe-to-PCI bridges entirely.  Document each rule to
> >> > the section of the PCIe spec.
> >>
> >> What's the flaw, exactly?  Does this fix a user-visible issue?  Should
> >> it be backported to any stable releases?  It'd be nice to have an
> >> example or two of devices where we return the wrong thing today.
> >>
> >> Oh, I think I see: the original code filters out flags for
> >> multi-function downstream ports, and your new code does not.
> >
> > Right, if you have a multifunction root port and you want to check a
> > feature that applies to downstream ports, but not multifunction devices,
> > we were previously filtering those out.
> >
> >> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >> > ---
> >> >  drivers/pci/pci.c |   59 ++++++++++++++++++++++++++++++++++++++++-------------
> >> >  1 file changed, 45 insertions(+), 14 deletions(-)
> >> >
> >> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> > index b099e00..457ae51 100644
> >> > --- a/drivers/pci/pci.c
> >> > +++ b/drivers/pci/pci.c
> >> > @@ -2354,6 +2354,19 @@ void pci_enable_acs(struct pci_dev *dev)
> >> >     pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> >> >  }
> >> >
> >> > +static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
> >> > +{
> >> > +   int pos;
> >> > +   u16 ctrl;
> >> > +
> >> > +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
> >> > +   if (!pos)
> >> > +           return false;
> >> > +
> >> > +   pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
> >> > +   return (ctrl & acs_flags) == acs_flags;
> >> > +}
> >> > +
> >> >  /**
> >> >   * pci_acs_enabled - test ACS against required flags for a given device
> >> >   * @pdev: device to test
> >> > @@ -2364,8 +2377,7 @@ void pci_enable_acs(struct pci_dev *dev)
> >> >   */
> >> >  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
> >>
> >> I know you didn't change the *name* of this function, but I think it
> >> would be easier to follow if you did change the name to something more
> >> descriptive, e.g., something to do with the actual property you're
> >> interested in, which has to do with routing peer-to-peer DMA.
> >>
> >> That property makes sense even for the excluded devices, while the
> >> idea of an ACS capability that doesn't even exist is implicitly
> >> enabled, really doesn't.
> >
> > I think we also don't want to put the complexity at the caller for
> > understanding what capabilities are applicable to a given device.  It's
> > also convenient to use the set of ACS flags.  Given that, the current
> > naming came about.  It's a little awkward, but it's easy to use.
> > Suggestions for a better name?
> 
> 100% agreed the caller shouldn't have to worry about different device
> types.  I was thinking something like "pci_enforces_peer_isolation()"
> or "pci_peer_dma_routed_upstream()".  Or maybe it should be
> "pci_dev_...()".

Ok, I'll play with those.  I'm worried that there are nuances to each
flag bit that don't all fit under such a broad description.  Do you want
to gate this series on a rename of an existing function?

> >> >  {
> >> > -   int pos, ret;
> >> > -   u16 ctrl;
> >> > +   int ret;
> >> >
> >> >     ret = pci_dev_specific_acs_enabled(pdev, acs_flags);
> >> >     if (ret >= 0)
> >> > @@ -2374,23 +2386,42 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
> >> >     if (!pci_is_pcie(pdev))
> >> >             return false;
> >> >
> >> > -   /* Filter out flags not applicable to multifunction */
> >> > -   if (pdev->multifunction)
> >> > +   switch (pci_pcie_type(pdev)) {
> >> > +   /*
> >> > +    * PCIe 3.0, 6.12.1 excludes ACS on these devices.  "Not applicable"
> >> > +    */
> >> > +   case PCI_EXP_TYPE_PCI_BRIDGE:
> >> > +   case PCI_EXP_TYPE_RC_EC:
> >> > +           break;
> >>
> >> Why not just return "true" immediately here, or even better, just omit
> >> the case altogether, since it doesn't change the behavior at all?
> >
> > It could certainly return true, I debated that for a while and decided
> > to break just to reduce the number of exit points.
> 
> That's interesting; I know lots of people advocate a single exit
> point, but for functions like this where there's no cleanup and we're
> essentially doing a "goto" to a "return <constant>", it seems harder
> to read because I have to look down at the bottom to see how this case
> is resolved.  Oh well, no point in arguing about *that* :)
> 
> > Omitting the case
> > means that anyone who looks at this in the future has to figure out why
> > those pcie types aren't explicitly listed.
> >
> >> > +   /*
> >> > +    * PCIe 3.0, 6.12.1.1 specifies full ACS capabilities on downstream
> >> > +    * ports, regardless of them being multifunction devices.
> >> > +    */
> >> > +   case PCI_EXP_TYPE_DOWNSTREAM:
> >> > +   case PCI_EXP_TYPE_ROOT_PORT:
> >> > +           return pci_acs_flags_enabled(pdev, acs_flags);
> >> > +   /*
> >> > +    * PCIe 3.0, 6.12.1.2 specifies unimplemented ACS capabilities on
> >> > +    * multifunction devices.  6.12 footnote identifies specifically
> >> > +    * which devices types this applies to.
> >> > +    */
> >> > +   case PCI_EXP_TYPE_ENDPOINT:
> >> > +   case PCI_EXP_TYPE_UPSTREAM:
> >> > +   case PCI_EXP_TYPE_LEG_END:
> >> > +   case PCI_EXP_TYPE_RC_END:
> >> > +           if (!pdev->multifunction)
> >> > +                   break;
> >>
> >> And here.
> >
> > Same
> >
> >> > +
> >> >             acs_flags &= (PCI_ACS_RR | PCI_ACS_CR |
> >> >                           PCI_ACS_EC | PCI_ACS_DT);
> >> >
> >> > -   if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM ||
> >> > -       pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT ||
> >> > -       pdev->multifunction) {
> >> > -           pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
> >> > -           if (!pos)
> >> > -                   return false;
> >> > -
> >> > -           pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
> >> > -           if ((ctrl & acs_flags) != acs_flags)
> >> > -                   return false;
> >> > +           return pci_acs_flags_enabled(pdev, acs_flags);
> >> >     }
> >> >
> >> > +   /*
> >> > +    * PCIe 3.0, 6.12.1.3 specifies no ACS capabilties are applicable
> >> > +    * to single function devices with the exception of downstream ports.
> >> > +    */
> >> >     return true;
> >> >  }
> >> >
> >> >
> >>
> >> I actually liked the original code organization, because it's closer
> >> to expressing the idea of "here are the control points where hardware
> >> can make choices about routing peer-to-peer DMA and here's the knob
> >> (ACS) that constrains those choices."
> >>
> >> Maybe something like (pidgin C):
> >>
> >>     if (PCI_EXP_TYPE_DOWNSTREAM || PCI_EXP_TYPE_ROOT_PORT)
> >>       return pci_acs_flags_enabled(pdev, acs_flags);
> >>
> >>     if (!pdev->multifunction)
> >>       return true;
> >>
> >>     acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | ...);
> >>     return pci_acs_flags_enabled(pdev, acs_flags);
> >
> > Ok.  I re-organized it because I got tired of re-reading the spec to
> > make sure that a given type of device was handled correctly.  Note that
> > the above simplification incorrectly handles multifunction bridges or EC
> > devices.
> 
> Hmm...  What *is* the correct behavior for a bridge?  You return
> "true," i.e., you're saying that a PCIe-to-PCI bridge will always
> route peer-to-peer transactions from PCI devices upstream to its PCIe
> link.  But that seems wrong: a PCI DMA transaction can target a peer
> on the same PCI bus, and it's not even possible for the bridge to
> validate the transaction or forward it upstream.
> 
> I suspect the "ACS is never applicable to a PCI Express to PCI Bridge
> Function" statement in 6.12.1 just means "it's impossible for ACS to
> isolate the devices below the bridge from each other, so it would be
> misleading to implement the capability."

Note that we never consider ACS to be enabled for a conventional PCI
device.  I suppose we could have cases where it's the only device on a
bus, but for the most part, it's not worth the trouble (it may be the
only device now, then a hotplug occurs).  So really saying the bridge
does or doesn't support ACS doesn't matter to the devices behind it.
What does matter is the fan-out of that isolation group of the
conventional devices beyond the bridge.  If the spec is indicating that
a bridge cannot do peer-to-peer with other devices  then all of the
conventional devices behind it are in an isolation domain so long as the
path between the bridge and the RC supports ACS.  If the bridge can do
peer-to-peer and it is a multifunction device, then the isolation domain
grows to include the other functions and subordinates of the other
functions.  I took the assumption that a bridge probably needs to
forward transaction upstream.  Do you have an alternate opinion?

> I have no idea or speculation about why the spec calls out Event
> Collectors.  They aren't bridges, so I suppose it must be related to
> multi-function devices?

I wonder if they can even do DMA, that might be why they're called out.
I don't really know anything about ECs though.

> > I can do something like you suggest, but being explicit was
> > actually a goal in fixing this bug and avoiding future bugs.  Thanks,
> 
> Is there a bugzilla you can reference here?  Or can we at least turn
> the "multi-function downstream port flag filtering" thing into a
> user-visible effect?  I can see what it fixes at the microscopic
> level, but I really don't know the implications of it.

There's no bugzilla for this.  This is a subtle change I don't think you
can see it.  If we assume that all devices comply with the spec (ha!),
then downstream ports must implement all the capability bits.  We mask
some out incorrectly, but we've already blasted a compatible set into
the control register in pci_enable_acs().  So this is purely for
correctness and you should not actually be able to see a difference in
iommu group topology from it (unless you can find a non-spec compliant
device which misses some required bits).  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - June 19, 2013, 2:30 a.m.
On Tue, Jun 18, 2013 at 4:47 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2013-06-18 at 16:10 -0600, Bjorn Helgaas wrote:
>> On Tue, Jun 18, 2013 at 12:38 PM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > On Tue, 2013-06-18 at 11:09 -0600, Bjorn Helgaas wrote:
>> >> On Fri, Jun 07, 2013 at 10:34:41AM -0600, Alex Williamson wrote:
> ...
>> >> >   * pci_acs_enabled - test ACS against required flags for a given device
>> >> >   * @pdev: device to test
>> >> > @@ -2364,8 +2377,7 @@ void pci_enable_acs(struct pci_dev *dev)
>> >> >   */
>> >> >  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
>> >>
>> >> I know you didn't change the *name* of this function, but I think it
>> >> would be easier to follow if you did change the name to something more
>> >> descriptive, e.g., something to do with the actual property you're
>> >> interested in, which has to do with routing peer-to-peer DMA.
>> >>
>> >> That property makes sense even for the excluded devices, while the
>> >> idea of an ACS capability that doesn't even exist is implicitly
>> >> enabled, really doesn't.
>> >
>> > I think we also don't want to put the complexity at the caller for
>> > understanding what capabilities are applicable to a given device.  It's
>> > also convenient to use the set of ACS flags.  Given that, the current
>> > naming came about.  It's a little awkward, but it's easy to use.
>> > Suggestions for a better name?
>>
>> 100% agreed the caller shouldn't have to worry about different device
>> types.  I was thinking something like "pci_enforces_peer_isolation()"
>> or "pci_peer_dma_routed_upstream()".  Or maybe it should be
>> "pci_dev_...()".
>
> Ok, I'll play with those.  I'm worried that there are nuances to each
> flag bit that don't all fit under such a broad description.

It's true that they might be overly broad.  On the other hand, the set
of flags we look for is always the same: PCI_ACS_SV | PCI_ACS_RR |
PCI_ACS_CR | PCI_ACS_UF, so what's the point in making a completely
general-purpose interface?  I'm not sure it's even worth passing the
flags around if the code would be clearer without that.

> Do you want
> to gate this series on a rename of an existing function?

You put me in a bit of a tight spot :)  My #1 concern is correctness
and maintainability.  Naming things so they're consistent with other
code and make sense to other readers is a huge part of that.
Unfortunately I don't have time to do any work myself, and my only
tool is to apply patches or not.  But no, I don't want to gate a
simple bug fix on other "cleanup" rework.

> ...
>> >> Maybe something like (pidgin C):
>> >>
>> >>     if (PCI_EXP_TYPE_DOWNSTREAM || PCI_EXP_TYPE_ROOT_PORT)
>> >>       return pci_acs_flags_enabled(pdev, acs_flags);
>> >>
>> >>     if (!pdev->multifunction)
>> >>       return true;
>> >>
>> >>     acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | ...);
>> >>     return pci_acs_flags_enabled(pdev, acs_flags);
>> >
>> > ...  Note that
>> > the above simplification incorrectly handles multifunction bridges or EC
>> > devices.
>>
>> Hmm...  What *is* the correct behavior for a bridge?  You return
>> "true," i.e., you're saying that a PCIe-to-PCI bridge will always
>> route peer-to-peer transactions from PCI devices upstream to its PCIe
>> link.  But that seems wrong: a PCI DMA transaction can target a peer
>> on the same PCI bus, and it's not even possible for the bridge to
>> validate the transaction or forward it upstream.
>>
>> I suspect the "ACS is never applicable to a PCI Express to PCI Bridge
>> Function" statement in 6.12.1 just means "it's impossible for ACS to
>> isolate the devices below the bridge from each other, so it would be
>> misleading to implement the capability."
>
> Note that we never consider ACS to be enabled for a conventional PCI
> device.  I suppose we could have cases where it's the only device on a
> bus, but for the most part, it's not worth the trouble (it may be the
> only device now, then a hotplug occurs).  So really saying the bridge
> does or doesn't support ACS doesn't matter to the devices behind it.

> What does matter is the fan-out of that isolation group of the
> conventional devices beyond the bridge.  If the spec is indicating that
> a bridge cannot do peer-to-peer with other devices  then all of the
> conventional devices behind it are in an isolation domain so long as the
> path between the bridge and the RC supports ACS.  If the bridge can do
> peer-to-peer and it is a multifunction device, then the isolation domain
> grows to include the other functions and subordinates of the other
> functions.  I took the assumption that a bridge probably needs to
> forward transaction upstream.  Do you have an alternate opinion?

I think you're talking about a multi-function device with several
PCIe-to-PCI bridges (e.g., Option A of Figure 1-4, p. 29, of the PCIe
bridge spec 1.0), and the question is whether the bridge can forward a
transaction between bus X and bus Y without forwarding it upstream.

I don't see any mention of forwarding transactions between functions
of a multi-ported bridge, so the conservative assumption would be that
a bridge is allowed to do that.  I would expect a conventional
multi-port PCI-PCI bridge to forward between functions, because in
conventional PCI there would be no reason to forward it upstream, so
I'd think PCIe-PCI bridges, at least those designed pre-ACS, would be
similar.  And I don't think the ACS capability is expressive enough to
say "peer-to-peer transactions between functions must be forwarded
upstream, but peer-to-peer transactions below a single function may
not be."

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson - June 20, 2013, 9:43 p.m.
On Tue, 2013-06-18 at 20:30 -0600, Bjorn Helgaas wrote:
> On Tue, Jun 18, 2013 at 4:47 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Tue, 2013-06-18 at 16:10 -0600, Bjorn Helgaas wrote:
> >> On Tue, Jun 18, 2013 at 12:38 PM, Alex Williamson
> >> <alex.williamson@redhat.com> wrote:
> >> > On Tue, 2013-06-18 at 11:09 -0600, Bjorn Helgaas wrote:
> >> >> On Fri, Jun 07, 2013 at 10:34:41AM -0600, Alex Williamson wrote:
> > ...
> >> >> >   * pci_acs_enabled - test ACS against required flags for a given device
> >> >> >   * @pdev: device to test
> >> >> > @@ -2364,8 +2377,7 @@ void pci_enable_acs(struct pci_dev *dev)
> >> >> >   */
> >> >> >  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
> >> >>
> >> >> I know you didn't change the *name* of this function, but I think it
> >> >> would be easier to follow if you did change the name to something more
> >> >> descriptive, e.g., something to do with the actual property you're
> >> >> interested in, which has to do with routing peer-to-peer DMA.
> >> >>
> >> >> That property makes sense even for the excluded devices, while the
> >> >> idea of an ACS capability that doesn't even exist is implicitly
> >> >> enabled, really doesn't.
> >> >
> >> > I think we also don't want to put the complexity at the caller for
> >> > understanding what capabilities are applicable to a given device.  It's
> >> > also convenient to use the set of ACS flags.  Given that, the current
> >> > naming came about.  It's a little awkward, but it's easy to use.
> >> > Suggestions for a better name?
> >>
> >> 100% agreed the caller shouldn't have to worry about different device
> >> types.  I was thinking something like "pci_enforces_peer_isolation()"
> >> or "pci_peer_dma_routed_upstream()".  Or maybe it should be
> >> "pci_dev_...()".
> >
> > Ok, I'll play with those.  I'm worried that there are nuances to each
> > flag bit that don't all fit under such a broad description.
> 
> It's true that they might be overly broad.  On the other hand, the set
> of flags we look for is always the same: PCI_ACS_SV | PCI_ACS_RR |
> PCI_ACS_CR | PCI_ACS_UF, so what's the point in making a completely
> general-purpose interface?  I'm not sure it's even worth passing the
> flags around if the code would be clearer without that.
> 
> > Do you want
> > to gate this series on a rename of an existing function?
> 
> You put me in a bit of a tight spot :)  My #1 concern is correctness
> and maintainability.  Naming things so they're consistent with other
> code and make sense to other readers is a huge part of that.
> Unfortunately I don't have time to do any work myself, and my only
> tool is to apply patches or not.  But no, I don't want to gate a
> simple bug fix on other "cleanup" rework.
> 
> > ...
> >> >> Maybe something like (pidgin C):
> >> >>
> >> >>     if (PCI_EXP_TYPE_DOWNSTREAM || PCI_EXP_TYPE_ROOT_PORT)
> >> >>       return pci_acs_flags_enabled(pdev, acs_flags);
> >> >>
> >> >>     if (!pdev->multifunction)
> >> >>       return true;
> >> >>
> >> >>     acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | ...);
> >> >>     return pci_acs_flags_enabled(pdev, acs_flags);
> >> >
> >> > ...  Note that
> >> > the above simplification incorrectly handles multifunction bridges or EC
> >> > devices.
> >>
> >> Hmm...  What *is* the correct behavior for a bridge?  You return
> >> "true," i.e., you're saying that a PCIe-to-PCI bridge will always
> >> route peer-to-peer transactions from PCI devices upstream to its PCIe
> >> link.  But that seems wrong: a PCI DMA transaction can target a peer
> >> on the same PCI bus, and it's not even possible for the bridge to
> >> validate the transaction or forward it upstream.
> >>
> >> I suspect the "ACS is never applicable to a PCI Express to PCI Bridge
> >> Function" statement in 6.12.1 just means "it's impossible for ACS to
> >> isolate the devices below the bridge from each other, so it would be
> >> misleading to implement the capability."
> >
> > Note that we never consider ACS to be enabled for a conventional PCI
> > device.  I suppose we could have cases where it's the only device on a
> > bus, but for the most part, it's not worth the trouble (it may be the
> > only device now, then a hotplug occurs).  So really saying the bridge
> > does or doesn't support ACS doesn't matter to the devices behind it.
> 
> > What does matter is the fan-out of that isolation group of the
> > conventional devices beyond the bridge.  If the spec is indicating that
> > a bridge cannot do peer-to-peer with other devices  then all of the
> > conventional devices behind it are in an isolation domain so long as the
> > path between the bridge and the RC supports ACS.  If the bridge can do
> > peer-to-peer and it is a multifunction device, then the isolation domain
> > grows to include the other functions and subordinates of the other
> > functions.  I took the assumption that a bridge probably needs to
> > forward transaction upstream.  Do you have an alternate opinion?
> 
> I think you're talking about a multi-function device with several
> PCIe-to-PCI bridges (e.g., Option A of Figure 1-4, p. 29, of the PCIe
> bridge spec 1.0), and the question is whether the bridge can forward a
> transaction between bus X and bus Y without forwarding it upstream.

Right, or the second function may not be a bridge, it could be anything
that could accept a p2p DMA.

> I don't see any mention of forwarding transactions between functions
> of a multi-ported bridge, so the conservative assumption would be that
> a bridge is allowed to do that.  I would expect a conventional
> multi-port PCI-PCI bridge to forward between functions, because in
> conventional PCI there would be no reason to forward it upstream, so
> I'd think PCIe-PCI bridges, at least those designed pre-ACS, would be
> similar.  And I don't think the ACS capability is expressive enough to
> say "peer-to-peer transactions between functions must be forwarded
> upstream, but peer-to-peer transactions below a single function may
> not be."

The best I can find is the statement that the bridge must forward
transactions from the secondary interface to the primary interface.  We
could infer that that means upstream, but we can't rule out some switch
logic in a multifunction package that could do a re-route from that
statement.  However then why would the PCIe spec forbid a PCIe-to-PCI
bridge from implementing ACS?

ACS doesn't need to be expressive enough for what you're describing.  We
know that there's no protection against p2p on conventional PCI.  The
PCIe bridge spec even indicates that transactions within the bridge
apertures should not be forwarded to the primary interface (which should
really be a big red flag that we shouldn't even attempt to support
assignment of such devices).  The question is whether a transaction
going out the primary interface is necessarily headed upstream or if it
can go directly to a peer device.  In that case, I can't figure out why
ACS is precluded from PCIe-to-PCI bridges.  If anything this is an
example of why we need a user override, then we could it to the more
conservative value pending further information and let the user change
it.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - June 24, 2013, 5:27 p.m.
On Thu, Jun 20, 2013 at 3:43 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Tue, 2013-06-18 at 20:30 -0600, Bjorn Helgaas wrote:
>> On Tue, Jun 18, 2013 at 4:47 PM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > On Tue, 2013-06-18 at 16:10 -0600, Bjorn Helgaas wrote:
>> >> On Tue, Jun 18, 2013 at 12:38 PM, Alex Williamson
>> >> <alex.williamson@redhat.com> wrote:
>> >> > On Tue, 2013-06-18 at 11:09 -0600, Bjorn Helgaas wrote:
>> >> >> On Fri, Jun 07, 2013 at 10:34:41AM -0600, Alex Williamson wrote:
>> > ...
>> >> >> >   * pci_acs_enabled - test ACS against required flags for a given device
>> >> >> >   * @pdev: device to test
>> >> >> > @@ -2364,8 +2377,7 @@ void pci_enable_acs(struct pci_dev *dev)
>> >> >> >   */
>> >> >> >  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
>> >> >>
>> >> >> I know you didn't change the *name* of this function, but I think it
>> >> >> would be easier to follow if you did change the name to something more
>> >> >> descriptive, e.g., something to do with the actual property you're
>> >> >> interested in, which has to do with routing peer-to-peer DMA.
>> >> >>
>> >> >> That property makes sense even for the excluded devices, while the
>> >> >> idea of an ACS capability that doesn't even exist is implicitly
>> >> >> enabled, really doesn't.
>> >> >
>> >> > I think we also don't want to put the complexity at the caller for
>> >> > understanding what capabilities are applicable to a given device.  It's
>> >> > also convenient to use the set of ACS flags.  Given that, the current
>> >> > naming came about.  It's a little awkward, but it's easy to use.
>> >> > Suggestions for a better name?
>> >>
>> >> 100% agreed the caller shouldn't have to worry about different device
>> >> types.  I was thinking something like "pci_enforces_peer_isolation()"
>> >> or "pci_peer_dma_routed_upstream()".  Or maybe it should be
>> >> "pci_dev_...()".
>> >
>> > Ok, I'll play with those.  I'm worried that there are nuances to each
>> > flag bit that don't all fit under such a broad description.
>>
>> It's true that they might be overly broad.  On the other hand, the set
>> of flags we look for is always the same: PCI_ACS_SV | PCI_ACS_RR |
>> PCI_ACS_CR | PCI_ACS_UF, so what's the point in making a completely
>> general-purpose interface?  I'm not sure it's even worth passing the
>> flags around if the code would be clearer without that.
>>
>> > Do you want
>> > to gate this series on a rename of an existing function?

OK, forget about the name.  I'm happy with the existing name as long
as we add a comment about the routing implications.  The spec
references are good, but they don't really help understand what's
going on in the hardware.

>> >> Hmm...  What *is* the correct behavior for a bridge?  You return
>> >> "true," i.e., you're saying that a PCIe-to-PCI bridge will always
>> >> route peer-to-peer transactions from PCI devices upstream to its PCIe
>> >> link.  But that seems wrong: a PCI DMA transaction can target a peer
>> >> on the same PCI bus, and it's not even possible for the bridge to
>> >> validate the transaction or forward it upstream.
>> >>
>> >> I suspect the "ACS is never applicable to a PCI Express to PCI Bridge
>> >> Function" statement in 6.12.1 just means "it's impossible for ACS to
>> >> isolate the devices below the bridge from each other, so it would be
>> >> misleading to implement the capability."
>> >
>> > Note that we never consider ACS to be enabled for a conventional PCI
>> > device.  I suppose we could have cases where it's the only device on a
>> > bus, but for the most part, it's not worth the trouble (it may be the
>> > only device now, then a hotplug occurs).  So really saying the bridge
>> > does or doesn't support ACS doesn't matter to the devices behind it.
>>
>> > What does matter is the fan-out of that isolation group of the
>> > conventional devices beyond the bridge.  If the spec is indicating that
>> > a bridge cannot do peer-to-peer with other devices  then all of the
>> > conventional devices behind it are in an isolation domain so long as the
>> > path between the bridge and the RC supports ACS.  If the bridge can do
>> > peer-to-peer and it is a multifunction device, then the isolation domain
>> > grows to include the other functions and subordinates of the other
>> > functions.  I took the assumption that a bridge probably needs to
>> > forward transaction upstream.  Do you have an alternate opinion?
>>
>> I think you're talking about a multi-function device with several
>> PCIe-to-PCI bridges (e.g., Option A of Figure 1-4, p. 29, of the PCIe
>> bridge spec 1.0), and the question is whether the bridge can forward a
>> transaction between bus X and bus Y without forwarding it upstream.
>
> Right, or the second function may not be a bridge, it could be anything
> that could accept a p2p DMA.
>
>> I don't see any mention of forwarding transactions between functions
>> of a multi-ported bridge, so the conservative assumption would be that
>> a bridge is allowed to do that.  I would expect a conventional
>> multi-port PCI-PCI bridge to forward between functions, because in
>> conventional PCI there would be no reason to forward it upstream, so
>> I'd think PCIe-PCI bridges, at least those designed pre-ACS, would be
>> similar.  And I don't think the ACS capability is expressive enough to
>> say "peer-to-peer transactions between functions must be forwarded
>> upstream, but peer-to-peer transactions below a single function may
>> not be."
>
> The best I can find is the statement that the bridge must forward
> transactions from the secondary interface to the primary interface.  We
> could infer that that means upstream, but we can't rule out some switch
> logic in a multifunction package that could do a re-route from that
> statement.  However then why would the PCIe spec forbid a PCIe-to-PCI
> bridge from implementing ACS?
>
> ACS doesn't need to be expressive enough for what you're describing.  We
> know that there's no protection against p2p on conventional PCI.

My point was that if a PCIe-to-PCI bridge did implement ACS, it would
be hard to interpret it.  Setting the R bit would have to mean
something like "peer-to-peer transactions between devices below
separate bridge functions must be forwarded upstream, but peer-to-peer
transactions between devices below a single bridge function are not."
Those are pretty complicated semantics, and that could be enough of a
reason to disallow it.

>  The
> PCIe bridge spec even indicates that transactions within the bridge
> apertures should not be forwarded to the primary interface (which should
> really be a big red flag that we shouldn't even attempt to support
> assignment of such devices).  The question is whether a transaction
> going out the primary interface is necessarily headed upstream or if it
> can go directly to a peer device.  In that case, I can't figure out why
> ACS is precluded from PCIe-to-PCI bridges.  If anything this is an
> example of why we need a user override, then we could it to the more
> conservative value pending further information and let the user change
> it.  Thanks,

Getting back to the point, I think we should return "false" for
PCIe-to-PCI bridges (and probably event collectors, though it probably
doesn't matter there), not "true."  I just don't see a convincing
argument otherwise.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b099e00..457ae51 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2354,6 +2354,19 @@  void pci_enable_acs(struct pci_dev *dev)
 	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
 }
 
+static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
+{
+	int pos;
+	u16 ctrl;
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
+	if (!pos)
+		return false;
+
+	pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
+	return (ctrl & acs_flags) == acs_flags;
+}
+
 /**
  * pci_acs_enabled - test ACS against required flags for a given device
  * @pdev: device to test
@@ -2364,8 +2377,7 @@  void pci_enable_acs(struct pci_dev *dev)
  */
 bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
 {
-	int pos, ret;
-	u16 ctrl;
+	int ret;
 
 	ret = pci_dev_specific_acs_enabled(pdev, acs_flags);
 	if (ret >= 0)
@@ -2374,23 +2386,42 @@  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
 	if (!pci_is_pcie(pdev))
 		return false;
 
-	/* Filter out flags not applicable to multifunction */
-	if (pdev->multifunction)
+	switch (pci_pcie_type(pdev)) {
+	/*
+	 * PCIe 3.0, 6.12.1 excludes ACS on these devices.  "Not applicable"
+	 */
+	case PCI_EXP_TYPE_PCI_BRIDGE:
+	case PCI_EXP_TYPE_RC_EC:
+		break;
+	/*
+	 * PCIe 3.0, 6.12.1.1 specifies full ACS capabilities on downstream
+	 * ports, regardless of them being multifunction devices.
+	 */
+	case PCI_EXP_TYPE_DOWNSTREAM:
+	case PCI_EXP_TYPE_ROOT_PORT:
+		return pci_acs_flags_enabled(pdev, acs_flags);
+	/*
+	 * PCIe 3.0, 6.12.1.2 specifies unimplemented ACS capabilities on
+	 * multifunction devices.  6.12 footnote identifies specifically
+	 * which devices types this applies to.
+	 */
+	case PCI_EXP_TYPE_ENDPOINT:
+	case PCI_EXP_TYPE_UPSTREAM:
+	case PCI_EXP_TYPE_LEG_END:
+	case PCI_EXP_TYPE_RC_END:
+		if (!pdev->multifunction)
+			break;
+
 		acs_flags &= (PCI_ACS_RR | PCI_ACS_CR |
 			      PCI_ACS_EC | PCI_ACS_DT);
 
-	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM ||
-	    pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT ||
-	    pdev->multifunction) {
-		pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
-		if (!pos)
-			return false;
-
-		pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
-		if ((ctrl & acs_flags) != acs_flags)
-			return false;
+		return pci_acs_flags_enabled(pdev, acs_flags);
 	}
 
+	/*
+	 * PCIe 3.0, 6.12.1.3 specifies no ACS capabilties are applicable
+	 * to single function devices with the exception of downstream ports.
+	 */
 	return true;
 }