From: Chris Metcalf
Date: Fri, 13 Jul 2012 11:52:11 -0400
To: Bjorn Helgaas
CC: Marek Szyprowski
Subject: Re: [PATCH 3/3] tile pci: enable IOMMU to support DMA for legacy devices
Message-ID: <5000442B.7090208@tilera.com>
X-Mailing-List: linux-pci@vger.kernel.org

Sorry for the slow reply to your feedback; I had to coordinate with our primary PCI developer (in another timezone), and we both had various unrelated fires to fight along the way. I've appended the patch that corrects all the issues you reported.

Bjorn, I'm assuming that it's appropriate for me to push this change through the tile tree (along with all the infrastructural changes to support the TILE-Gx TRIO shim that implements PCIe for our chip) rather than breaking it out to push it through the pci tree; does that sound correct to you?

On 6/22/2012 7:24 AM, Bjorn Helgaas wrote:
> On Fri, Jun 15, 2012 at 1:23 PM, Chris Metcalf wrote:
>> This change uses the TRIO IOMMU to map the PCI DMA space and physical
>> memory at different addresses. We also now use the dma_mapping_ops
>> to provide support for non-PCI DMA, PCIe DMA (64-bit) and legacy PCI
>> DMA (32-bit). We use the kernel's software I/O TLB framework
>> (i.e. bounce buffers) for the legacy 32-bit PCI device support since
>> there are a limited number of TLB entries in the IOMMU and it is
>> non-trivial to handle indexing, searching, matching, etc. For 32-bit
>> devices the performance impact of bounce buffers should not be a concern.
>>
>>
>> +extern void
>> +pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
>> + struct resource *res);
>> +
>> +extern void
>> +pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
>> + struct pci_bus_region *region);
> These extern declarations look like leftovers that shouldn't be needed.

Thanks. Removed.

>> +/* PCI I/O space support is not implemented. */
>> +static struct resource pci_ioport_resource = {
>> + .name = "PCI IO",
>> + .start = 0,
>> + .end = 0,
>> + .flags = IORESOURCE_IO,
>> +};
> You don't need to define pci_ioport_resource at all if you don't
> support I/O space.

We have some internal changes to support I/O space, but for now I've gone ahead and removed pci_ioport_resource.

>> + /*
>> + * The PCI memory resource is located above the PA space.
>> + * The memory range for the PCI root bus should not overlap
>> + * with the physical RAM
>> + */
>> + pci_add_resource_offset(&resources, &iomem_resource,
>> + 1ULL << CHIP_PA_WIDTH());
> This says that your entire physical address space (currently
> 0x0-0xffffffff_ffffffff) is routed to the PCI bus, which is not true.
> I think what you want here is pci_iomem_resource, but I'm not sure
> that's set up correctly. It should contain the CPU physical address
> that are routed to the PCI bus. Since you mention an offset, the PCI
> bus addresses will "CPU physical address - offset".

Yes, we've changed it to use pci_iomem_resource. On TILE-Gx, there are two types of CPU physical addresses: physical RAM addresses and MMIO addresses. MMIO addresses have the MMIO attribute set in the page table, so the physical address spaces for RAM and for PCI are completely separate. Instead, we have the following relationship:

  PCI bus address = PCI resource address - offset

where the PCI resource addresses are defined by pci_iomem_resource and are never generated by the CPU.

> I don't understand the CHIP_PA_WIDTH() usage -- that seems to be the
> physical address width, but you define TILE_PCI_MEM_END as "((1ULL <<
> CHIP_PA_WIDTH()) + TILE_PCI_BAR_WINDOW_TOP)", which would mean the CPU
> could never generate that address.

Exactly. The CPU-generated physical addresses for the PCI space, i.e. the MMIO addresses, have an address format that is defined by the RC controller. They go to the RC controller directly, because the page table entry also encodes the RC controller's location on the chip.

> I might understand this better if you could give a concrete example of
> the CPU address range and the corresponding PCI bus address range.
> For example, I have a box where CPU physical address range [mem
> 0xf0000000000-0xf007edfffff] is routed to PCI bus address range
> [0x80000000-0xfedfffff].
> In this case, the struct resource contains
> 0xf0000000000-0xf007edfffff, and the offset is 0xf0000000000 -
> 0x80000000 or 0xeff80000000.

The TILE-Gx chip's CHIP_PA_WIDTH is 40 bits. In the following example, the system has 32GB of RAM installed, with 16GB in each of the 2 memory controllers. For the first mvsas device, the PCI memory resource is [0x100c0000000, 0x100c003ffff], and the corresponding PCI bus address range is [0xc0000000, 0xc003ffff] after subtracting the offset of (1ul << 40). The low 32 bits of the PCI MMIO address contain the PCI bus address.

# cat /proc/iomem
00000000-3fbffffff : System RAM
  00000000-007eeb1f : Kernel code
  00860000-00af6e4b : Kernel data
4000000000-43ffffffff : System RAM
100c0000000-100c003ffff : mvsas
100c0040000-100c005ffff : mvsas
100c0200000-100c0203fff : sky2
100c0300000-100c0303fff : sata_sil24
100c0304000-100c030407f : sata_sil24
100c0400000-100c0403fff : sky2

Note that in the above example, the 2 mvsas devices are in a different PCI domain from the other 4 devices.

> The comments at TILE_PCI_MEM_MAP_BASE_OFFSET suggest that you have two
> MMIO regions (one for bus addresses <4GB), so there should be two
> resources on the list here.

There is a single MMIO region, defined by the corresponding resource pci_iomem_resource. TILE_PCI_MEM_MAP_BASE_OFFSET is used in the context of inbound access only, i.e. for DMA access. Yes, there are two inbound windows. The first is [1ULL << CHIP_PA_WIDTH(), 1ULL << (CHIP_PA_WIDTH() + 1)], used by devices that can generate 64-bit DMA addresses; the hardware IOMMU derives the real RAM address by subtracting 1ULL << CHIP_PA_WIDTH() from the DMA address. The second inbound window is [0, 3GB] with direct mapping, used by 32-bit devices, where 3GB = 4GB minus the size of the MMIO region.

> The list should also include a bus number resource describing the bus
> numbers claimed by the host bridge. Since you don't have that, we'll
> default to [bus 00-ff], but that's wrong if you have more than one
> host bridge.

Fixed.

> In fact, since it appears that you *do* have multiple host bridges,
> the "resources" list should be constructed so it contains the bus
> number and MMIO apertures for each bridge, which should be
> non-overlapping.

We use the same pci_iomem_resource for different domains or host bridges, but the MMIO apertures for each bridge do not overlap, because non-overlapping resource ranges are allocated for each domain.

>> void __devinit pcibios_fixup_bus(struct pci_bus *bus)
>> {
>> - /* Nothing needs to be done. */
>> + struct pci_dev *dev = bus->self;
>> +
>> + if (!dev) {
>> + /* This is the root bus. */
>> + bus->resource[0] = &pci_ioport_resource;
>> + bus->resource[1] = &pci_iomem_resource;
>> + }
> Please don't add this. I'm in the process of removing
> pcibios_fixup_bus() altogether. Instead, you should put
> pci_iomem_resource on a resources list and use pci_scan_root_bus().

I removed the contents of pcibios_fixup_bus(), but I've left the no-op function in place for now, until after the 3.6 merge.

>> /*
>> - * We reserve all resources above 4GB so that PCI won't try to put
>> + * On Pro, we reserve all resources above 4GB so that PCI won't try to put
>> * mappings above 4GB; the standard allows that for some devices but
>> * the probing code trunates values to 32 bits.
> I think this comment about probing code truncating values is out of
> date. Or if it's not, please point me to it so we can fix it :)

Yes, it's out of date; fixed.

>> @@ -1588,7 +1585,7 @@ static int __init request_standard_resources(void)
>> enum { CODE_DELTA = MEM_SV_INTRPT - PAGE_OFFSET };
>>
>> iomem_resource.end = -1LL;
> This patch isn't touching iomem_resource, but iomem_resource.end
> *should* be set to the highest physical address your CPU can generate,
> which is probably smaller than this.

This is not necessarily true. It is true on x86, where the PA space is shared by RAM and PCI.
On TILE-Gx, iomem_resource covers all resources of type IORESOURCE_MEM, which include the RAM resource and the PCI resource. Also, setting it here is unnecessary, because it is already set to -1 in iomem_resource's definition in kernel/resource.c.

The change follows.

commit d52776fade4dadf0b034d101f0cd4ce4f8d2f48f
Author: Chris Metcalf
Date:   Sun Jul 1 14:42:49 2012 -0400

    tile: updates to pci root complex from community feedback

diff --git a/arch/tile/include/asm/pci.h b/arch/tile/include/asm/pci.h
index 553b7ff..93a1f14 100644
--- a/arch/tile/include/asm/pci.h
+++ b/arch/tile/include/asm/pci.h
@@ -161,6 +161,7 @@ struct pci_controller {
 	uint64_t mem_offset;	/* cpu->bus memory mapping offset. */
+	int first_busno;
 	int last_busno;
 	struct pci_ops *ops;
@@ -179,14 +180,6 @@ extern gxio_trio_context_t trio_contexts[TILEGX_NUM_TRIO];
 extern void pci_iounmap(struct pci_dev *dev, void __iomem *);
-extern void
-pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
-			struct resource *res);
-
-extern void
-pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
-			struct pci_bus_region *region);
-
 /*
  * The PCI address space does not equal the physical memory address
  * space (we have an IOMMU). The IDE and SCSI device layers use this
diff --git a/arch/tile/kernel/pci_gx.c b/arch/tile/kernel/pci_gx.c
index 27f7ab0..56a3c97 100644
--- a/arch/tile/kernel/pci_gx.c
+++ b/arch/tile/kernel/pci_gx.c
@@ -96,14 +96,6 @@ static struct pci_ops tile_cfg_ops;
 /* Mask of CPUs that should receive PCIe interrupts. */
 static struct cpumask intr_cpus_map;
-/* PCI I/O space support is not implemented. */
-static struct resource pci_ioport_resource = {
-	.name	= "PCI IO",
-	.start	= 0,
-	.end	= 0,
-	.flags	= IORESOURCE_IO,
-};
-
 static struct resource pci_iomem_resource = {
 	.name	= "PCI mem",
 	.start	= TILE_PCI_MEM_START,
@@ -588,6 +580,7 @@ int __init pcibios_init(void)
 {
 	resource_size_t offset;
 	LIST_HEAD(resources);
+	int next_busno;
 	int i;
 	tile_pci_init();
@@ -628,7 +621,7 @@ int __init pcibios_init(void)
 	msleep(250);
 	/* Scan all of the recorded PCI controllers. */
-	for (i = 0; i < num_rc_controllers; i++) {
+	for (next_busno = 0, i = 0; i < num_rc_controllers; i++) {
 		struct pci_controller *controller = &pci_controllers[i];
 		gxio_trio_context_t *trio_context = controller->trio;
 		TRIO_PCIE_INTFC_PORT_CONFIG_t port_config;
@@ -843,13 +836,14 @@ int __init pcibios_init(void)
 		 * The memory range for the PCI root bus should not overlap
 		 * with the physical RAM
 		 */
-		pci_add_resource_offset(&resources, &iomem_resource,
+		pci_add_resource_offset(&resources, &pci_iomem_resource,
 					1ULL << CHIP_PA_WIDTH());
-		bus = pci_scan_root_bus(NULL, 0, controller->ops,
+		controller->first_busno = next_busno;
+		bus = pci_scan_root_bus(NULL, next_busno, controller->ops,
 					controller, &resources);
 		controller->root_bus = bus;
-		controller->last_busno = bus->subordinate;
+		next_busno = bus->subordinate + 1;
 	}
@@ -1011,20 +1005,9 @@ alloc_mem_map_failed:
 }
 subsys_initcall(pcibios_init);
-/*
- * PCI scan code calls the arch specific pcibios_fixup_bus() each time it scans
- * a new bridge. Called after each bus is probed, but before its children are
- * examined.
- */
+/* Note: to be deleted after Linux 3.6 merge. */
 void __devinit pcibios_fixup_bus(struct pci_bus *bus)
 {
-	struct pci_dev *dev = bus->self;
-
-	if (!dev) {
-		/* This is the root bus. */
-		bus->resource[0] = &pci_ioport_resource;
-		bus->resource[1] = &pci_iomem_resource;
-	}
 }
 /*
@@ -1172,11 +1155,11 @@ static int __devinit tile_cfg_read(struct pci_bus *bus,
 	void *mmio_addr;
 	/*
-	 * Map all accesses to the local device (bus == 0) into the
+	 * Map all accesses to the local device on root bus into the
 	 * MMIO space of the MAC. Accesses to the downstream devices
 	 * go to the PIO space.
 	 */
-	if (busnum == 0) {
+	if (pci_is_root_bus(bus)) {
 		if (device == 0) {
 			/*
 			 * This is the internal downstream P2P bridge,
@@ -1205,11 +1188,11 @@ static int __devinit tile_cfg_read(struct pci_bus *bus,
 	}
 	/*
-	 * Accesses to the directly attached device (bus == 1) have to be
+	 * Accesses to the directly attached device have to be
 	 * sent as type-0 configs.
 	 */
-	if (busnum == 1) {
+	if (busnum == (controller->first_busno + 1)) {
 		/*
 		 * There is only one device off of our built-in P2P bridge.
 		 */
@@ -1303,11 +1286,11 @@ static int __devinit tile_cfg_write(struct pci_bus *bus,
 	u8 val_8 = (u8)val;
 	/*
-	 * Map all accesses to the local device (bus == 0) into the
+	 * Map all accesses to the local device on root bus into the
 	 * MMIO space of the MAC. Accesses to the downstream devices
 	 * go to the PIO space.
 	 */
-	if (busnum == 0) {
+	if (pci_is_root_bus(bus)) {
 		if (device == 0) {
 			/*
 			 * This is the internal downstream P2P bridge,
@@ -1336,11 +1319,11 @@ static int __devinit tile_cfg_write(struct pci_bus *bus,
 	}
 	/*
-	 * Accesses to the directly attached device (bus == 1) have to be
+	 * Accesses to the directly attached device have to be
 	 * sent as type-0 configs.
 	 */
-	if (busnum == 1) {
+	if (busnum == (controller->first_busno + 1)) {
 		/*
 		 * There is only one device off of our built-in P2P bridge.
 		 */
diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 2b8b689..ea930ba 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -1536,8 +1536,7 @@ static struct resource code_resource = {
 /*
  * On Pro, we reserve all resources above 4GB so that PCI won't try to put
- * mappings above 4GB; the standard allows that for some devices but
- * the probing code trunates values to 32 bits.
+ * mappings above 4GB.
  */
 #if defined(CONFIG_PCI) && !defined(__tilegx__)
 static struct resource* __init
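
For readers checking the numbers in this thread, the address relationships and the new bus-numbering scheme can be modeled in a small standalone sketch. It is not part of the patch and not kernel code: every name in it (SKETCH_PA_WIDTH, resource_to_bus(), dma64_to_ram(), struct rc_bus_range, assign_bus_ranges(), and so on) is hypothetical. It assumes CHIP_PA_WIDTH() is 40, as stated above for TILE-Gx, treats the 64-bit inbound window as the half-open range [1 << 40, 1 << 41) (the half-open bound is an assumption), and replaces the subordinate bus number that pci_scan_root_bus() would report with a made-up count of buses per root complex.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical standalone model of the TILE-Gx PCI address arithmetic
 * described in this thread; not kernel code.
 * Assumption: CHIP_PA_WIDTH() == 40, as stated for TILE-Gx.
 */
#define SKETCH_PA_WIDTH 40
#define SKETCH_PCI_MEM_OFFSET (1ULL << SKETCH_PA_WIDTH)

/* Outbound: PCI bus address = PCI resource address - offset. */
static uint64_t resource_to_bus(uint64_t res_addr)
{
	assert(res_addr >= SKETCH_PCI_MEM_OFFSET);
	return res_addr - SKETCH_PCI_MEM_OFFSET;
}

/*
 * Inbound 64-bit DMA window, modeled as the half-open range
 * [1 << 40, 1 << 41); the half-open upper bound is an assumption.
 */
static int in_dma64_window(uint64_t dma_addr)
{
	return dma_addr >= SKETCH_PCI_MEM_OFFSET &&
	       dma_addr < (SKETCH_PCI_MEM_OFFSET << 1);
}

/* The IOMMU recovers the RAM address by subtracting 1 << 40. */
static uint64_t dma64_to_ram(uint64_t dma_addr)
{
	assert(in_dma64_window(dma_addr));
	return dma_addr - SKETCH_PCI_MEM_OFFSET;
}

/* Hypothetical per-root-complex bus number range. */
struct rc_bus_range {
	int first_busno;
	int last_busno;
};

/*
 * Model of the scan loop in pcibios_init(): each root complex starts at
 * next_busno, and the next one starts just past the subordinate bus.
 * buses_found[i] stands in for what pci_scan_root_bus() would discover.
 */
static void assign_bus_ranges(const int *buses_found, int n,
			      struct rc_bus_range *out)
{
	int next_busno = 0;

	for (int i = 0; i < n; i++) {
		out[i].first_busno = next_busno;
		/* plays the role of bus->subordinate */
		out[i].last_busno = next_busno + buses_found[i] - 1;
		next_busno = out[i].last_busno + 1;
	}
}

/* Type-0 configs go to the bus just below the built-in P2P bridge. */
static int is_directly_attached_bus(const struct rc_bus_range *rc, int busnum)
{
	return busnum == rc->first_busno + 1;
}
```

For example, resource_to_bus(0x100c0000000ULL) yields 0xc0000000, matching the first mvsas range in the /proc/iomem listing. With per-controller bus ranges, the "directly attached" check becomes relative to each controller's first_busno instead of the fixed bus 1, which is the point of the tile_cfg_read()/tile_cfg_write() changes.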