Message ID | 20130129020357.GB12156@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Accepted, archived |
Commit | 14b6f00f8a4fdec5ccd45a0710284de301a61628 |
Delegated to: | Benjamin Herrenschmidt |
Headers | show |
On Mon, 2013-01-28 at 18:03 -0800, Nishanth Aravamudan wrote: > pseries/iommu: remove DDW on kexec > ... > > I believe the simplest, easiest-to-maintain fix is to just change our > initcall to, rather than detecting and updating the new kernel's DDW > knowledge, just remove all DDW configurations. When the drivers > re-initialize, we will set everything back up as it was before. I don't know this code at all, but this sounds like it will also work for kdump, right? ie. when the original kernel has crashed the 2nd kernel will tear the DDW down and set it back up. cheers
Hi Michael, On 29.01.2013 [21:58:28 +1100], Michael Ellerman wrote: > On Mon, 2013-01-28 at 18:03 -0800, Nishanth Aravamudan wrote: > > pseries/iommu: remove DDW on kexec > > ... > > > > I believe the simplest, easiest-to-maintain fix is to just change our > > initcall to, rather than detecting and updating the new kernel's DDW > > knowledge, just remove all DDW configurations. When the drivers > > re-initialize, we will set everything back up as it was before. > > I don't know this code at all, but this sounds like it will also work > for kdump, right? ie. when the original kernel has crashed the 2nd > kernel will tear the DDW down and set it back up. Yes, my actual test-case (and what was reported as broken) was kdump. From my relatively vague (but now growing) understanding of that process, kdump does use kexec under the covers to switch to the crash kernel, and so does get the same benefit from this change. Another datapoint, though, is that it might make sense to recommend (and I'm working on figuring this out for the distros, etc) to use disable_ddw anyways for the kdump kernel command-line, as DDW isn't 'free' and it's unclear if performance is a huge concern for the crash kernel (sort of varies with where your storage is, and how much you need to dump, which for kdump generally doesn't seem like that much?). Thanks, Nish
On Tue, 2013-01-29 at 12:33 -0800, Nishanth Aravamudan wrote: > Hi Michael, > > On 29.01.2013 [21:58:28 +1100], Michael Ellerman wrote: > > On Mon, 2013-01-28 at 18:03 -0800, Nishanth Aravamudan wrote: > > > pseries/iommu: remove DDW on kexec > > > ... > > > > > > I believe the simplest, easiest-to-maintain fix is to just change our > > > initcall to, rather than detecting and updating the new kernel's DDW > > > knowledge, just remove all DDW configurations. When the drivers > > > re-initialize, we will set everything back up as it was before. > > > > I don't know this code at all, but this sounds like it will also work > > for kdump, right? ie. when the original kernel has crashed the 2nd > > kernel will tear the DDW down and set it back up. > > Yes, my actual test-case (and what was reported as broken) was kdump. > From my relatively vague (but now growing) understanding of that > process, kdump does use kexec under the covers to switch to the crash > kernel, and so does get the same benefit from this change. OK good. Yeah kdump is basically equivalent to kexec with one major difference, which is that before a kexec the first kernel is shutdown more or less normally - as if it was about to reboot. kdump on the other hand is invoked in the panic path, so nothing is shutdown in the first kernel (almost, see ehea). So to accommodate kdump you want any logic to be in the startup path of the 2nd kernel. > Another datapoint, though, is that it might make sense to recommend (and > I'm working on figuring this out for the distros, etc) to use > disable_ddw anyways for the kdump kernel command-line, as DDW isn't > 'free' and it's unclear if performance is a huge concern for the crash > kernel (sort of varies with where your storage is, and how much you need > to dump, which for kdump generally doesn't seem like that much?). The kdump kernel /should/ just be creating a crash dump and then rebooting. That may involve writing all of RAM out to disk/nw, which could be a lot of I/O. So performance may be a concern, but correctness is much much much more important. Some folks have talked about having the kdump kernel just come up and pretend it is back in production, but that is madness for various reasons and I don't think anyone ever did it. cheers
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index a8e99f9..1b2a174 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -787,33 +787,68 @@ static u64 find_existing_ddw(struct device_node *pdn) return dma_addr; } +static void __restore_default_window(struct eeh_dev *edev, + u32 ddw_restore_token) +{ + u32 cfg_addr; + u64 buid; + int ret; + + /* + * Get the config address and phb buid of the PE window. + * Rely on eeh to retrieve this for us. + * Retrieve them from the pci device, not the node with the + * dma-window property + */ + cfg_addr = edev->config_addr; + if (edev->pe_config_addr) + cfg_addr = edev->pe_config_addr; + buid = edev->phb->buid; + + do { + ret = rtas_call(ddw_restore_token, 3, 1, NULL, cfg_addr, + BUID_HI(buid), BUID_LO(buid)); + } while (rtas_busy_delay(ret)); + pr_info("ibm,reset-pe-dma-windows(%x) %x %x %x returned %d\n", + ddw_restore_token, cfg_addr, BUID_HI(buid), BUID_LO(buid), ret); +} + static int find_existing_ddw_windows(void) { - int len; struct device_node *pdn; - struct direct_window *window; const struct dynamic_dma_window_prop *direct64; + const u32 *ddw_extensions; if (!firmware_has_feature(FW_FEATURE_LPAR)) return 0; for_each_node_with_property(pdn, DIRECT64_PROPNAME) { - direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len); + direct64 = of_get_property(pdn, DIRECT64_PROPNAME, NULL); if (!direct64) continue; - window = kzalloc(sizeof(*window), GFP_KERNEL); - if (!window || len < sizeof(struct dynamic_dma_window_prop)) { - kfree(window); - remove_ddw(pdn); - continue; - } + /* + * We need to ensure the IOMMU table is active when we + * return from the IOMMU setup so that the common code + * can clear the table or find the holes. To that end, + * first, remove any existing DDW configuration. + */ + remove_ddw(pdn); - window->device = pdn; - window->prop = direct64; - spin_lock(&direct_window_list_lock); - list_add(&window->list, &direct_window_list); - spin_unlock(&direct_window_list_lock); + /* + * Second, if we are running on a new enough level of + * firmware where the restore API is present, use it to + * restore the 32-bit window, which was removed in + * create_ddw. + * If the API is not present, then create_ddw couldn't + * have removed the 32-bit window in the first place, so + * removing the DDW configuration should be sufficient. + */ + ddw_extensions = of_get_property(pdn, "ibm,ddw-extensions", + NULL); + if (ddw_extensions && ddw_extensions[0] > 0) + __restore_default_window(of_node_to_eeh_dev(pdn), + ddw_extensions[1]); } return 0; @@ -886,30 +921,7 @@ static int create_ddw(struct pci_dev *dev, const u32 *ddw_avail, static void restore_default_window(struct pci_dev *dev, u32 ddw_restore_token) { - struct eeh_dev *edev; - u32 cfg_addr; - u64 buid; - int ret; - - /* - * Get the config address and phb buid of the PE window. - * Rely on eeh to retrieve this for us. - * Retrieve them from the pci device, not the node with the - * dma-window property - */ - edev = pci_dev_to_eeh_dev(dev); - cfg_addr = edev->config_addr; - if (edev->pe_config_addr) - cfg_addr = edev->pe_config_addr; - buid = edev->phb->buid; - - do { - ret = rtas_call(ddw_restore_token, 3, 1, NULL, cfg_addr, - BUID_HI(buid), BUID_LO(buid)); - } while (rtas_busy_delay(ret)); - dev_info(&dev->dev, - "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d\n", - ddw_restore_token, cfg_addr, BUID_HI(buid), BUID_LO(buid), ret); + __restore_default_window(pci_dev_to_eeh_dev(dev), ddw_restore_token); } /*
pseries/iommu: remove DDW on kexec We currently insert a property in the device-tree when we successfully configure DDW for a given slot. This was meant to be an optimization to speed up kexec/kdump, so that we don't need to make the RTAS calls again to re-configured DDW in the new kernel. However, we end up tripping a plpar_tce_stuff failure on kexec/kdump because we unconditionally parse the ibm,dma-window property for the node at bus/dev setup time. This property contains the 32-bit DMA window LIOBN, which is distinct from the DDW window's. We pass that LIOBN (via iommu_table_init -> iommu_table_clear -> tce_free -> tce_freemulti_pSeriesLP) to plpar_tce_stuff, which fails because that 32-bit window is no longer present after 25ebc45b93452d0bc60271f178237123c4b26808 ("powerpc/pseries/iommu: remove default window before attempting DDW manipulation"). I believe the simplest, easiest-to-maintain fix is to just change our initcall to, rather than detecting and updating the new kernel's DDW knowledge, just remove all DDW configurations. When the drivers re-initialize, we will set everything back up as it was before. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>