Patchwork [5/8] pci: resource assignment based on p2p alignment

login
register
mail settings
Submitter Gavin Shan
Date Aug. 20, 2012, 1:46 p.m.
Message ID <1345470395-17295-6-git-send-email-shangw@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/178781/
State Superseded
Headers show

Comments

Gavin Shan - Aug. 20, 2012, 1:46 p.m.
The patch changes function pbus_size_io() and pbus_size_mem() to
do resource (I/O, memory and prefetchable memory) reassignment
based on the minimal alignments for the p2p bridge, which was
retrieved by function window_alignment().

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 drivers/pci/setup-bus.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)
Bjorn Helgaas - Aug. 21, 2012, 5:46 p.m.
On Tue, Aug 21, 2012 at 12:36 AM, Gavin Shan <shangw@linux.vnet.ibm.com> wrote:
>
> Last night, I talked to Bjorn through IRC. It seems that Bjorn still have
> concerns on what we're doing in function pcibios_window_alignment() and
> pcibios_align_resource(). Another question that Bjorn has is that's possible
> to put the PPC alignment (alignment on p2p memory windows) to pcibios_align_resource().
> Actually, I've replied to that before and it seems the reply was lost again.

It's not that I missed the previous discussion, it's just that I
didn't think we had come up with a clear explanation for why we needed
two kinds of alignment hooks and a way to decide when to use one vs.
the other.  It might be clear in *your* head, but it's not clear to me
yet :)

> Anyway, here's the full message about the discussion.
>
> https://patchwork.kernel.org/patch/1203171/
>
>>> Hmm..'struct resource *window' may not yet know which aperature it is
>>> allocated from. Will it? We are still in the sizing process, the allocation will
>>> be done much later.
>>
>> Of course, you're absolutely right; I had this backwards.  In
>> pbus_size_io/mem(), we do "b_res = find_free_bus_resource()", so b_res
>> is a bus resource that matches the desired type (IO/MEM).  This
>> resource represents an aperture of the upstream bridge leading to the
>> bus.  I was thinking that b_res->start would contain address
>> information that the arch could use to decide alignment.
>>
>> But at this point, in pbus_size_io/mem(), we set "b_res->start =
>> min_align", so obviously b_res contains no information about the
>> window base that will eventually be assigned.  I think b_res is
>> basically the *container* into which we'll eventually put the P2P
>> aperture start/end, but here, we're using that container to hold the
>> information about the size and alignment we need for that aperture.
>>
>> The fact that we deal with alignment in pbus_size_mem() and again in
>> __pci_assign_resource() (via pcibios_align_resource) is confusing to
>> me -- I don't have a clear idea of what sorts of alignment are done in
>> each place.  Could this powerpc alignment be done in
>> pcibios_align_resource()?  We do have the actual proposed address
>> there, as well as the pci_dev.
>
> Currently, we have 2 phases for resourece reallocation. During the first
> phase that is being covered by function pbus_size_io() and pbus_size_mem(),
> the PCI hierarchy tree is scanned from bottom to top so that we can reserve
> enough resources in p2p I/O or memory windows for its child devices (including
> child p2p bridge possiblly). During the time, pcibios_window_alignment() will
> be called to determine the minimal alignment of the p2p windows (I/O or memory)
> so that the p2p windows could be aligned to what the specific platform requires.
>
> In the 2nd phase, the resources of p2p windows as well as PCI devices will
> be allocated. That will be done according to the PCI hierarchy tree from top
> to bottom. During the period, function pcibios_align_resource() will be called
> to do _little_ adjustment on those resources that have special requirments. If
> I understood correctly, it's unsafe to change the resource size (or start address)
> greatly.
>
> Lets have one example here for more explanation on why it's unsafe to change
> the resource size (or start address) greatly by pcibios_align_resource(). As
> the following figure shows, we have 2 p2p bridges (A, B) and another one PCI
> device (C). The memory resources they're owning are shown below them after the
> phase one. Now, we're running in phase 2 (resource allocation) for (B) when the
> allocation on resources of (A) should have done because phase 2 is expected to
> be done from top to bottom of the PCI tree. It is impossible to extend the resource
> of (B) for another one 1MB to [0x100000, 0x2fffff] through pcibios_align_resource()
> since (C) needs some resources as well. The cause is that we never did enough resource
> reservation for (A) during phase 1. So pcibios_align_resource() isn't safe enough to
> cover the PowerPC alignment we requires, but pcibios_window_alignment() can accomodate
> that very well :-)
>
>                 [ p2p bridge A ]
>                 [0x100000, 0x2fffff]
>                       |
>         ---------------------------------
>         |                               |
>       [ p2p bridge B ]              [ PCI dev C]
>       [0x100000, 0x1fffff]
>
> In summary, pcibios_window_alignment() takes affect in phase 1 (resource reservation),
> but pcibios_align_resource() will function in phase 2 (resource allocation). They're
> different.

This is a good start.  You're saying that "large" alignments must be
done in phase 1 with pcibios_window_alignment() and
pcibios_align_resource() can do "small" adjustments in phase 2.  How
do we quantify that?  *Can* it be quantified?

I think pcibios_align_resource() is primarily used to deal with
devices that only do partial decoding, e.g., ISA 10-bit I/O port
resources.  By the time we call it, I think the upstream aperture has
already been determined, so the allocation may fail completely.

It seems like we can't accurately do even the resource reservation
part without knowing what pcibios_align_resource() is going to do.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - Sept. 6, 2012, 11:21 p.m.
On Mon, Aug 20, 2012 at 7:46 AM, Gavin Shan <shangw@linux.vnet.ibm.com> wrote:
> The patch changes function pbus_size_io() and pbus_size_mem() to
> do resource (I/O, memory and prefetchable memory) reassignment
> based on the minimal alignments for the p2p bridge, which was
> retrieved by function window_alignment().
>
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>  drivers/pci/setup-bus.c |   16 ++++++++++++----
>  1 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 826a4f3..bb2eade 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -741,7 +741,7 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
>         struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO);
>         unsigned long size = 0, size0 = 0, size1 = 0;
>         resource_size_t children_add_size = 0;
> -       resource_size_t min_align = 4096, align;
> +       resource_size_t io_align, min_align, align;
>
>         if (!b_res)
>                 return;
> @@ -750,8 +750,14 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
>          * Per spec, I/O windows are 4K-aligned, but some bridges have an
>          * extension to support 1K alignment.
>          */
> -       if (bus->self->io_window_1k)
> +       if (bus->self->io_window_1k) {
> +               io_align = 4096;

Why is io_align 4K instead of 1K here?

>                 min_align = 1024;
> +       } else {
> +               io_align = window_alignment(bus, IORESOURCE_IO);

This doesn't seem right.  Only the non-1k path uses
window_alignment(), so there's no way for the platform to influence
the alignment of bridges that support 1k windows.  I think the
io_window_1k checking needs to be moved from pbus_size_io() to
window_alignment().

Maybe you can then make the min_align handling in pbus_size_io() more
similar to that in pbus_size_mem(), e.g.,

    min_align = 0;
    ...
    min_align = max(min_align, window_alignment(bus, b_res->flags & mask));

> +               min_align = io_align;
> +       }
> +
>         list_for_each_entry(dev, &bus->devices, bus_list) {
>                 int i;
>
> @@ -778,8 +784,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
>                 }
>         }
>
> -       if (min_align > 4096)
> -               min_align = 4096;
> +       if (min_align > io_align)
> +               min_align = io_align;
>
>         size0 = calculate_iosize(size, min_size, size1,
>                         resource_size(b_res), min_align);
> @@ -901,6 +907,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
>                         min_align = align1 >> 1;
>                 align += aligns[order];
>         }
> +
> +       min_align = max(min_align, window_alignment(bus, b_res->flags & mask));
>         size0 = calculate_memsize(size, min_size, 0, resource_size(b_res), min_align);
>         if (children_add_size > add_size)
>                 add_size = children_add_size;
> --
> 1.7.5.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 826a4f3..bb2eade 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -741,7 +741,7 @@  static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO);
 	unsigned long size = 0, size0 = 0, size1 = 0;
 	resource_size_t children_add_size = 0;
-	resource_size_t min_align = 4096, align;
+	resource_size_t io_align, min_align, align;
 
 	if (!b_res)
  		return;
@@ -750,8 +750,14 @@  static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	 * Per spec, I/O windows are 4K-aligned, but some bridges have an
 	 * extension to support 1K alignment.
 	 */
-	if (bus->self->io_window_1k)
+	if (bus->self->io_window_1k) {
+		io_align = 4096;
 		min_align = 1024;
+	} else {
+		io_align = window_alignment(bus, IORESOURCE_IO);
+		min_align = io_align;
+	}
+
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
 
@@ -778,8 +784,8 @@  static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 		}
 	}
 
-	if (min_align > 4096)
-		min_align = 4096;
+	if (min_align > io_align)
+		min_align = io_align;
 
 	size0 = calculate_iosize(size, min_size, size1,
 			resource_size(b_res), min_align);
@@ -901,6 +907,8 @@  static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			min_align = align1 >> 1;
 		align += aligns[order];
 	}
+
+	min_align = max(min_align, window_alignment(bus, b_res->flags & mask));
 	size0 = calculate_memsize(size, min_size, 0, resource_size(b_res), min_align);
 	if (children_add_size > add_size)
 		add_size = children_add_size;