From patchwork Mon Nov 12 21:05:05 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Thode
X-Patchwork-Id: 198458
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 27E7E2C007E
	for ; Tue, 13 Nov 2012 08:04:50 +1100 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751502Ab2KLVEs (ORCPT ); Mon, 12 Nov 2012 16:04:48 -0500
Received: from rrcs-24-173-105-85.sw.biz.rr.com ([24.173.105.85]:40622
	"EHLO mx1.mthode.org" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751347Ab2KLVEs (ORCPT );
	Mon, 12 Nov 2012 16:04:48 -0500
Received: from khorne.mthode.org (unknown [IPv6:2001:470:e1cc:3:c0b3:8fff:fe3c:be27])
	(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.mthode.org (Postfix) with ESMTPSA id 63523DBF1;
	Mon, 12 Nov 2012 16:04:46 -0500 (EST)
Message-ID: <50A16481.5030309@gentoo.org>
Date: Mon, 12 Nov 2012 15:05:05 -0600
From: Matthew Thode
Reply-To: prometheanfire@gentoo.org
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121030 Thunderbird/16.0.1
MIME-Version: 1.0
To: Don Dutile
CC: Doug Goldstein, linux-kernel@vger.kernel.org, bhelgaas@google.com,
	linux-pci@vger.kernel.org, mthode@mthode.org,
	iommu@lists.linux-foundation.org, Alex Williamson
Subject: Re: [BUG 3.7-rc5] NULL pointer deref when using a pcie-pci bridged pci device and intel-iommu
References: <50A03281.6040206@gentoo.org> <50A1549C.7020404@redhat.com>
In-Reply-To: <50A1549C.7020404@redhat.com>
X-Enigmail-Version: 1.5a1pre
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-pci@vger.kernel.org

On 11/12/2012 01:57 PM, Don Dutile wrote:
> On 11/12/2012 04:26 AM, Doug Goldstein wrote:
>> On Sun, Nov 11, 2012 at 5:19 PM, Matthew Thode
>> wrote:
>>> System boots with vt-d disabled in bios. Otherwise I get the errors in
>>> the attached log. I can do whatever testing you need as this system is
>>> not in production yet. gonna paste the important part here. Let me
>>> know if you want anything else.
>>>
>>> Please CC me directly as I am not subscribed to the LKML.
>>>
>>>
>>> Trying to unpack rootfs image as initramfs...
>>> Freeing initrd memory: 5124k freed
>>> IOMMU 0 0xfbffe000: using Queued invalidation
>>> IOMMU: Setting RMRR:
>>> IOMMU: Setting identity map for device 0000:00:1d.0 [0xbf7ec000 - 0xbf7fffff]
>>> IOMMU: Setting identity map for device 0000:00:1d.1 [0xbf7ec000 - 0xbf7fffff]
>>> IOMMU: Setting identity map for device 0000:00:1d.2 [0xbf7ec000 - 0xbf7fffff]
>>> IOMMU: Setting identity map for device 0000:00:1d.7 [0xbf7ec000 - 0xbf7fffff]
>>> IOMMU: Setting identity map for device 0000:00:1a.0 [0xbf7ec000 - 0xbf7fffff]
>>> IOMMU: Setting identity map for device 0000:00:1a.1 [0xbf7ec000 - 0xbf7fffff]
>>> IOMMU: Setting identity map for device 0000:00:1a.2 [0xbf7ec000 - 0xbf7fffff]
>>> IOMMU: Setting identity map for device 0000:00:1a.7 [0xbf7ec000 - 0xbf7fffff]
>>> IOMMU: Setting identity map for device 0000:00:1d.0 [0xec000 - 0xeffff]
>>> IOMMU: Setting identity map for device 0000:00:1d.1 [0xec000 - 0xeffff]
>>> IOMMU: Setting identity map for device 0000:00:1d.2 [0xec000 - 0xeffff]
>>> IOMMU: Setting identity map for device 0000:00:1d.7 [0xec000 - 0xeffff]
>>> IOMMU: Setting identity map for device 0000:00:1a.0 [0xec000 - 0xeffff]
>>> IOMMU: Setting identity map for device 0000:00:1a.1 [0xec000 - 0xeffff]
>>> IOMMU: Setting identity map for device 0000:00:1a.2 [0xec000 - 0xeffff]
>>> IOMMU: Setting identity map for device 0000:00:1a.7 [0xec000 - 0xeffff]
>>> IOMMU: Prepare 0-16MiB unity mapping for LPC
>>> IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
>>> PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
>>> BUG: unable to handle kernel NULL pointer dereference at 000000000000003c
>>> IP: [] pci_get_dma_source+0xf/0x41
>>> PGD 0
>>> Oops: 0000 [#1] SMP
>>> Modules linked in:
>>> CPU 7
>>> Pid: 1, comm: swapper/0 Not tainted 3.7.0-rc5 #1 Penguin Computing Relion 1751/X8DTU
>>> RIP: 0010:[] [] pci_get_dma_source+0xf/0x41
>>> RSP: 0000:ffff8806264d1d88 EFLAGS: 00010282
>>> RAX: ffffffff813bd3a8 RBX: ffff8806261d1000 RCX: 00000000e8221180
>>> RDX: ffffffff818624f0 RSI: ffff88062635b0c0 RDI: 0000000000000000
>>> RBP: ffff8806264d1d88 R08: ffff8806263d6000 R09: 00000000ffffffff
>>> R10: ffff8806264d1ca8 R11: 0000000000000005 R12: 0000000000000000
>>> R13: ffff8806261d1098 R14: 0000000000000000 R15: 0000000000000000
>>> FS: 0000000000000000(0000) GS:ffff88063f2e0000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> CR2: 000000000000003c CR3: 0000000001c0b000 CR4: 00000000000007e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Process swapper/0 (pid: 1, threadinfo ffff8806264d0000, task ffff8806264cf910)
>>> Stack:
>>>  ffff8806264d1dc8 ffffffff815d02c9 0000000000000000 ffff880600000000
>>>  ffff8806264d1dd8 ffffffff81c64b00 ffff8806261d1098 ffff8806264d1df8
>>>  ffff8806264d1de8 ffffffff815cd5a4 ffffffff81c64b00 ffffffff815cd56a
>>> Call Trace:
>>>  [] intel_iommu_add_device+0x95/0x167
>>>  [] add_iommu_group+0x3a/0x41
>>>  [] ? bus_set_iommu+0x44/0x44
>>>  [] bus_for_each_dev+0x54/0x81
>>>  [] bus_set_iommu+0x3d/0x44
>>>  [] intel_iommu_init+0xae5/0xb5e
>>>  [] ? free_initrd+0x9e/0x9e
>>>  [] ? memblock_find_dma_reserve+0x13f/0x13f
>>>  [] pci_iommu_init+0x16/0x41
>>>  [] ? pci_proc_init+0x6b/0x6b
>>>  [] do_one_initcall+0x7a/0x129
>>>  [] kernel_init+0x139/0x2a2
>>>  [] ? loglevel+0x31/0x31
>>>  [] ? rest_init+0x6f/0x6f
>>>  [] ret_from_fork+0x7c/0xb0
>>>  [] ? rest_init+0x6f/0x6f
>>> Code: ff c1 75 04 ff d0 eb 12 48 83 c2 10 48 8b 42 08 48 85 c0 75 d3 b8 e7 ff ff ff c9 c3 55 48 c7 c2 f0 24 86 81 48 89 e5 eb 24 8b 0a<66> 3b 4f 3c 74 05 66 ff c1 75 13 66 8b 4a 02 66 3b 4f 3e 74 05
>>> RIP [] pci_get_dma_source+0xf/0x41
>>> RSP
>>> CR2: 000000000000003c
>>> ---[ end trace 5c5a2ceca067e0ec ]---
>>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>>>
>>> ------------[ cut here ]------------
>>> WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x25/0x51()
>>> Hardware name: Relion 1751
>>> Modules linked in:
>>> Pid: 1, comm: swapper/0 Tainted: G D 3.7.0-rc5 #1
>>> Call Trace:
>>>  [] warn_slowpath_common+0x80/0x98
>>>  [] warn_slowpath_null+0x15/0x17
>>>  [] native_smp_send_reschedule+0x25/0x51
>>>  [] trigger_load_balance+0x1e8/0x214
>>>  [] scheduler_tick+0xd8/0xe1
>>>  [] update_process_times+0x62/0x73
>>>  [] tick_sched_timer+0x7c/0x9b
>>>  [] __run_hrtimer.clone.24+0x4e/0xc1
>>>  [] hrtimer_interrupt+0xc7/0x1ac
>>>  [] smp_apic_timer_interrupt+0x81/0x94
>>>  [] apic_timer_interrupt+0x6a/0x70
>>>  [] ? console_unlock+0x2c2/0x2ed
>>>  [] ? panic+0x189/0x1c5
>>>  [] ? panic+0xee/0x1c5
>>>  [] do_exit+0x357/0x7b2
>>>  [] oops_end+0xb2/0xba
>>>  [] no_context+0x266/0x275
>>>  [] __bad_area_nosemaphore+0x1bb/0x1db
>>>  [] ? sysfs_addrm_finish+0x2f/0xa6
>>>  [] bad_area_nosemaphore+0xe/0x10
>>>  [] __do_page_fault+0x360/0x39f
>>>  [] ? ida_get_new_above+0xf9/0x19e
>>>  [] ? slab_node+0x59/0xa2
>>>  [] ? mutex_unlock+0x9/0xb
>>>  [] ? klist_put+0x4c/0x70
>>>  [] ? klist_next+0x30/0xb6
>>>  [] ? pci_do_find_bus+0x49/0x49
>>>  [] do_page_fault+0x9/0xb
>>>  [] page_fault+0x22/0x30
>>>  [] ? nv_msi_ht_cap_quirk_all+0x10/0x10
>>>  [] ? pci_get_dma_source+0xf/0x41
>>>  [] intel_iommu_add_device+0x95/0x167
>>>  [] add_iommu_group+0x3a/0x41
>>>  [] ? bus_set_iommu+0x44/0x44
>>>  [] bus_for_each_dev+0x54/0x81
>>>  [] bus_set_iommu+0x3d/0x44
>>>  [] intel_iommu_init+0xae5/0xb5e
>>>  [] ? free_initrd+0x9e/0x9e
>>>  [] ? memblock_find_dma_reserve+0x13f/0x13f
>>>  [] pci_iommu_init+0x16/0x41
>>>  [] ? pci_proc_init+0x6b/0x6b
>>>  [] do_one_initcall+0x7a/0x129
>>>  [] kernel_init+0x139/0x2a2
>>>  [] ? loglevel+0x31/0x31
>>>  [] ? rest_init+0x6f/0x6f
>>>  [] ret_from_fork+0x7c/0xb0
>>>  [] ? rest_init+0x6f/0x6f
>>> ---[ end trace 5c5a2ceca067e0ed ]---
>>>
>>> --
>>> -- Matthew Thode (prometheanfire)
>>
>> The root cause of Matt's issue is that intel_iommu_add_device() calls
>> pci_get_domain_bus_and_slot() which is returning NULL. Which is not an
>> expected value. The reason NULL is being returned is that Matt has a
>> card with a TI XIO2000A/XIO2200A PCIe-PCI bridge (VID: 104C, DID:
>> 8231) on it. This device already has a quirk setup for disabling fast
>> back to back transfers on its secondary bus. If we cause it to use the
>> primary bus, that appears to resolve the issue. I'm not sure exactly
>> how to proceed from here due to relative lack of knowledge of PCI. Do
>> all PCIe-PCI bridges with secondary buses need their DMA parent to be
>> the primary bus or is that just something that should be done for the
>> TI XIO2000A due to the existing quirk?
>>
> DMA from a (legacy) PCI device does not have a SRC-ID in the transaction,
> so the source of the device generating the DMA is unknown. When bridging
> to a PCIe device, the Parent PPB's dev-id is inserted on the PCIe as the
> source of a transaction -- in this case, DMA read/write transaction.
> This (sw) mapping should have happened by default, unless a recent
> change from VFIO broke this mapping.... or the TI bridge didn't report
> itself correctly as a PCIe-PCI bridge.
> Alex ?
>
>
>> The failing call with arguments was pci_get_domain_bus_and_slot(0, 5,
>> 0), while pci_get_domain_bus_and_slot(0, 4, 0) resulted in a system
>> that didn't panic and a device that worked.
>>
>> $ lspci -tvn
>> -+-[0000:ff]-+-00.0  8086:2c40
>>  |           +-00.1  8086:2c01
>>  |           +-02.0  8086:2c10
>>  |           +-02.1  8086:2c11
>>  |           +-02.4  8086:2c14
>>  |           +-02.5  8086:2c15
>>  |           +-03.0  8086:2c18
>>  |           +-03.1  8086:2c19
>>  |           +-03.2  8086:2c1a
>>  |           +-03.4  8086:2c1c
>>  |           +-04.0  8086:2c20
>>  |           +-04.1  8086:2c21
>>  |           +-04.2  8086:2c22
>>  |           +-04.3  8086:2c23
>>  |           +-05.0  8086:2c28
>>  |           +-05.1  8086:2c29
>>  |           +-05.2  8086:2c2a
>>  |           +-05.3  8086:2c2b
>>  |           +-06.0  8086:2c30
>>  |           +-06.1  8086:2c31
>>  |           +-06.2  8086:2c32
>>  |           \-06.3  8086:2c33
>>  \-[0000:00]-+-00.0  8086:3406
>>              +-01.0-[01]--+-00.0  8086:10c9
>>              |            \-00.1  8086:10c9
>>              +-03.0-[02]--
>>              +-05.0-[03]--
>>              +-07.0-[04-05]----00.0-[05]----08.0  d161:8006
>>              +-09.0-[06]----00.0  8086:10b9
>>              +-13.0  8086:342d
>>              +-14.0  8086:342e
>>              +-14.1  8086:3422
>>              +-14.2  8086:3423
>>              +-14.3  8086:3438
>>              +-16.0  8086:3430
>>              +-16.1  8086:3431
>>              +-16.2  8086:3432
>>              +-16.3  8086:3433
>>              +-16.4  8086:3429
>>              +-16.5  8086:342a
>>              +-16.6  8086:342b
>>              +-16.7  8086:342c
>>              +-1a.0  8086:3a37
>>              +-1a.1  8086:3a38
>>              +-1a.2  8086:3a39
>>              +-1a.7  8086:3a3c
>>              +-1d.0  8086:3a34
>>              +-1d.1  8086:3a35
>>              +-1d.2  8086:3a36
>>              +-1d.7  8086:3a3a
>>              +-1e.0-[07]----01.0  102b:0532
>>              +-1f.0  8086:3a16
>>              +-1f.2  8086:3a22
>>              \-1f.3  8086:3a30
>>
>> If someone can craft the correct patch that'd be great or answer the
>> above question and I'll gladly craft it.
>>
>> Thanks.
> because I didn't see it.

Here is the patch that got it working for me (ignore the printks); it
applies against 3.6.6 and 3.7-rc5.

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d4a4cd4..df0d7ef 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4118,11 +4118,20 @@ static int intel_iommu_add_device(struct device *dev)
 	bridge = pci_find_upstream_pcie_bridge(pdev);
 	if (bridge) {
-		if (pci_is_pcie(bridge))
+		if (pci_is_pcie(bridge)) {
 			dma_pdev = pci_get_domain_bus_and_slot(
 						pci_domain_nr(pdev->bus),
 						bridge->subordinate->number, 0);
-		else
+			if (!dma_pdev) {
+				printk("MTHODE: debugging\n");
+				printk("domain: %d, bus: %d, bridge->sub %d\n",
+						pci_domain_nr(pdev->bus), pdev->bus,
+						bridge->subordinate->number);
+				dma_pdev = pci_get_domain_bus_and_slot(
+						pci_domain_nr(pdev->bus),
+						bridge->subordinate->number - 1, 0);
+			}
+		} else
 			dma_pdev = pci_dev_get(bridge);
 	} else
 		dma_pdev = pci_dev_get(pdev);
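
For anyone who wants to see why the fallback helps on this box without
booting a kernel, here is a rough user-space sketch of the lookup. It is
only a model under stated assumptions: struct model_dev, model_lookup()
and model_dma_source() are names made up for illustration and are not
kernel API, and the device table just copies the 07.0-[04-05] branch of
the lspci tree. model_lookup() stands in for
pci_get_domain_bus_and_slot(), which returns NULL when no device sits at
the requested bus/devfn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel type. */
struct model_dev {
	int domain;
	int bus;
	int devfn;		/* (slot << 3) | function */
	const char *name;
};

/* Devices on the 07.0-[04-05] branch of the lspci tree in this thread. */
static const struct model_dev devices[] = {
	{ 0, 0, (7 << 3) | 0, "00:07.0 root port" },
	{ 0, 4, (0 << 3) | 0, "04:00.0 PCIe-PCI bridge" },
	{ 0, 5, (8 << 3) | 0, "05:08.0 d161:8006" },
};

/* Models pci_get_domain_bus_and_slot(): NULL when nothing matches. */
static const struct model_dev *model_lookup(int domain, int bus, int devfn)
{
	size_t i;

	for (i = 0; i < sizeof(devices) / sizeof(devices[0]); i++) {
		if (devices[i].domain == domain &&
		    devices[i].bus == bus &&
		    devices[i].devfn == devfn)
			return &devices[i];
	}
	return NULL;
}

/*
 * Models the patched branch: try devfn 0 on the bridge's secondary bus,
 * and if nothing is there, retry devfn 0 on the bus numbered one lower.
 */
static const struct model_dev *model_dma_source(int domain, int secondary_bus)
{
	const struct model_dev *dev;

	assert(secondary_bus > 0);	/* the retry subtracts one */

	dev = model_lookup(domain, secondary_bus, 0);
	if (!dev)
		dev = model_lookup(domain, secondary_bus - 1, 0);
	return dev;
}
```

Calling model_dma_source(0, 5) misses on bus 5, because the only device
there is 05:08.0, and then finds the bridge at 04:00.0 on the retry. Note
that in this model the retry only lands on the bridge because the
bridge's primary bus happens to be one below its secondary bus and the
bridge sits at devfn 0; I have not checked whether that holds on other
topologies.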