Message ID | 20150902153928.GB47557@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Michael Ellerman |
Headers | show |
On Wed, 2015-09-02 at 08:39 -0700, Nishanth Aravamudan wrote: > On 02.09.2015 [19:00:31 +1000], Alexey Kardashevskiy wrote: > > On 09/02/2015 11:11 AM, Nishanth Aravamudan wrote: > > >diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c > > >index 85cbc96eff6c..0d7967e31169 100644 > > >--- a/arch/powerpc/platforms/powernv/pci-ioda.c > > >+++ b/arch/powerpc/platforms/powernv/pci-ioda.c > > >@@ -2077,10 +2077,17 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe) > > > { > > > struct iommu_table *tbl = NULL; > > > long rc; > > >+ /* > > >+ * In memory constrained environments, e.g. kdump kernel, the > > >+ * DMA window can be larger than available memory, which will > > >+ * cause errors later. > > >+ */ > > >+ __u64 window_size = > > > > I asked for "const __u64" ;) > > I knew I'd forget something! Nish! In future please send a reply with the above comment, and then the v2 as a separate mail, otherwise I have to manually edit out your comment when applying. > When attempting to kdump with the 4.2 kernel, we see for each PCI > device: > > pci 0003:01 : [PE# 000] Assign DMA32 space > pci 0003:01 : [PE# 000] Setting up 32-bit TCE table at 0..80000000 > pci 0003:01 : [PE# 000] Failed to create 32-bit TCE table, err -22 > PCI: Domain 0004 has 8 available 32-bit DMA segments > PCI: 4 PE# for a total weight of 70 > pci 0004:01 : [PE# 002] Assign DMA32 space > pci 0004:01 : [PE# 002] Setting up 32-bit TCE table at 0..80000000 > pci 0004:01 : [PE# 002] Failed to create 32-bit TCE table, err -22 > pci 0004:0d : [PE# 005] Assign DMA32 space > pci 0004:0d : [PE# 005] Setting up 32-bit TCE table at 0..80000000 > pci 0004:0d : [PE# 005] Failed to create 32-bit TCE table, err -22 > pci 0004:0e : [PE# 006] Assign DMA32 space > pci 0004:0e : [PE# 006] Setting up 32-bit TCE table at 0..80000000 > pci 0004:0e : [PE# 006] Failed to create 32-bit TCE table, err -22 > pci 0004:10 : [PE# 008] Assign DMA32 space > pci 0004:10 : [PE# 008] Setting up 32-bit TCE table at 0..80000000 > pci 0004:10 : [PE# 008] Failed to create 32-bit TCE table, err -22 > > and eventually the kdump kernel fails to boot as none of the PCI devices > (including the disk controller) are successfully initialized. > > The EINVAL response is because the DMA window (the 2GB base window) is > larger than the kdump kernel's reserved memory (crashkernel=, in this > case specified to be 1024M). The check in question, > > if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size)) > > is a valid sanity check for pnv_pci_ioda2_table_alloc_pages(), so adjust > the caller to pass in a smaller window size if our maximum memory value > is smaller than the DMA window. > > After this change, the PCI devices successfully set up the 32-bit TCE > table and kdump succeeds. > > The problem was seen on a Firestone machine originally. > > Fixes: aca6913f5551 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages") That went into 4.2, so should this have a: Cc: stable@vger.kernel.org # 4.2 ? > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c > index 85cbc96eff6c..e51aff01a218 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -2077,10 +2077,17 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe) > { > struct iommu_table *tbl = NULL; > long rc; > + /* > + * In memory constrained environments, e.g. kdump kernel, the > + * DMA window can be larger than available memory, which will > + * cause errors later. > + */ > + const __u64 window_size = Why is this using __u64 and not u64, it's not exported to userspace. It looks like pnv_pci_ioda2_create_table() uses __u64, but there's no reason for that AFAICS either. And yes I did commit it so it's my fault :) I'll fix this up to use u64. cheers
On 03.09.2015 [19:58:53 +1000], Michael Ellerman wrote: > On Wed, 2015-09-02 at 08:39 -0700, Nishanth Aravamudan wrote: > > On 02.09.2015 [19:00:31 +1000], Alexey Kardashevskiy wrote: > > > On 09/02/2015 11:11 AM, Nishanth Aravamudan wrote: > > > >diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c > > > >index 85cbc96eff6c..0d7967e31169 100644 > > > >--- a/arch/powerpc/platforms/powernv/pci-ioda.c > > > >+++ b/arch/powerpc/platforms/powernv/pci-ioda.c > > > >@@ -2077,10 +2077,17 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe) > > > > { > > > > struct iommu_table *tbl = NULL; > > > > long rc; > > > >+ /* > > > >+ * In memory constrained environments, e.g. kdump kernel, the > > > >+ * DMA window can be larger than available memory, which will > > > >+ * cause errors later. > > > >+ */ > > > >+ __u64 window_size = > > > > > > I asked for "const __u64" ;) > > > > I knew I'd forget something! > > Nish! In future please send a reply with the above comment, and then > the v2 as a separate mail, otherwise I have to manually edit out your > comment when applying. Sorry! Will amend my practices in the future. <snip> > > Fixes: aca6913f5551 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages") > > That went into 4.2, so should this have a: > > Cc: stable@vger.kernel.org # 4.2 Grr, another mess-up, you're right, it should. Can you amend on your end? Sorry, I was in too much of a hurry to get the fixlet out. -Nish
On 09/03/2015 07:58 PM, Michael Ellerman wrote: > On Wed, 2015-09-02 at 08:39 -0700, Nishanth Aravamudan wrote: >> On 02.09.2015 [19:00:31 +1000], Alexey Kardashevskiy wrote: >>> On 09/02/2015 11:11 AM, Nishanth Aravamudan wrote: >>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c >>>> index 85cbc96eff6c..0d7967e31169 100644 >>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c >>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>>> @@ -2077,10 +2077,17 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe) >>>> { >>>> struct iommu_table *tbl = NULL; >>>> long rc; >>>> + /* >>>> + * In memory constrained environments, e.g. kdump kernel, the >>>> + * DMA window can be larger than available memory, which will >>>> + * cause errors later. >>>> + */ >>>> + __u64 window_size = >>> >>> I asked for "const __u64" ;) >> >> I knew I'd forget something! > > Nish! In future please send a reply with the above comment, and then the v2 as > a separate mail, otherwise I have to manually edit out your comment when > applying. > >> When attempting to kdump with the 4.2 kernel, we see for each PCI >> device: >> >> pci 0003:01 : [PE# 000] Assign DMA32 space >> pci 0003:01 : [PE# 000] Setting up 32-bit TCE table at 0..80000000 >> pci 0003:01 : [PE# 000] Failed to create 32-bit TCE table, err -22 >> PCI: Domain 0004 has 8 available 32-bit DMA segments >> PCI: 4 PE# for a total weight of 70 >> pci 0004:01 : [PE# 002] Assign DMA32 space >> pci 0004:01 : [PE# 002] Setting up 32-bit TCE table at 0..80000000 >> pci 0004:01 : [PE# 002] Failed to create 32-bit TCE table, err -22 >> pci 0004:0d : [PE# 005] Assign DMA32 space >> pci 0004:0d : [PE# 005] Setting up 32-bit TCE table at 0..80000000 >> pci 0004:0d : [PE# 005] Failed to create 32-bit TCE table, err -22 >> pci 0004:0e : [PE# 006] Assign DMA32 space >> pci 0004:0e : [PE# 006] Setting up 32-bit TCE table at 0..80000000 >> pci 0004:0e : [PE# 006] Failed to create 32-bit TCE table, err -22 >> pci 0004:10 : [PE# 008] Assign DMA32 space >> pci 0004:10 : [PE# 008] Setting up 32-bit TCE table at 0..80000000 >> pci 0004:10 : [PE# 008] Failed to create 32-bit TCE table, err -22 >> >> and eventually the kdump kernel fails to boot as none of the PCI devices >> (including the disk controller) are successfully initialized. >> >> The EINVAL response is because the DMA window (the 2GB base window) is >> larger than the kdump kernel's reserved memory (crashkernel=, in this >> case specified to be 1024M). The check in question, >> >> if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size)) >> >> is a valid sanity check for pnv_pci_ioda2_table_alloc_pages(), so adjust >> the caller to pass in a smaller window size if our maximum memory value >> is smaller than the DMA window. >> >> After this change, the PCI devices successfully set up the 32-bit TCE >> table and kdump succeeds. >> >> The problem was seen on a Firestone machine originally. >> >> Fixes: aca6913f5551 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages") > > That went into 4.2, so should this have a: > > Cc: stable@vger.kernel.org # 4.2 > > ? > >> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c >> index 85cbc96eff6c..e51aff01a218 100644 >> --- a/arch/powerpc/platforms/powernv/pci-ioda.c >> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c >> @@ -2077,10 +2077,17 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe) >> { >> struct iommu_table *tbl = NULL; >> long rc; >> + /* >> + * In memory constrained environments, e.g. kdump kernel, the >> + * DMA window can be larger than available memory, which will >> + * cause errors later. >> + */ >> + const __u64 window_size = > > Why is this using __u64 and not u64, it's not exported to userspace. > > It looks like pnv_pci_ioda2_create_table() uses __u64, but there's no reason > for that AFAICS either. And yes I did commit it so it's my fault :) There is VFIO_IOMMU_SPAPR_TCE_CREATE ioctl which receives vfio_iommu_spapr_tce_create struct from the user space and there is "__u64 window_size" which is passed across: tce_iommu_create_window() tce_iommu_create_table() pnv_pci_ioda2_create_table (via table_group->ops->create_table()) At what point should __u64 have become u64? Thanks. > > I'll fix this up to use u64. > > cheers > > >
On Thu, 2015-09-03 at 08:59 -0700, Nishanth Aravamudan wrote: > On 03.09.2015 [19:58:53 +1000], Michael Ellerman wrote: > > On Wed, 2015-09-02 at 08:39 -0700, Nishanth Aravamudan wrote: > > > On 02.09.2015 [19:00:31 +1000], Alexey Kardashevskiy wrote: > > > > On 09/02/2015 11:11 AM, Nishanth Aravamudan wrote: > > > > >diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c > > > > >index 85cbc96eff6c..0d7967e31169 100644 > > > > >--- a/arch/powerpc/platforms/powernv/pci-ioda.c > > > > >+++ b/arch/powerpc/platforms/powernv/pci-ioda.c > > > > >@@ -2077,10 +2077,17 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe) > > > > > { > > > > > struct iommu_table *tbl = NULL; > > > > > long rc; > > > > >+ /* > > > > >+ * In memory constrained environments, e.g. kdump kernel, the > > > > >+ * DMA window can be larger than available memory, which will > > > > >+ * cause errors later. > > > > >+ */ > > > > >+ __u64 window_size = > > > > > > > > I asked for "const __u64" ;) > > > > > > I knew I'd forget something! > > > > Nish! In future please send a reply with the above comment, and then > > the v2 as a separate mail, otherwise I have to manually edit out your > > comment when applying. > > Sorry! Will amend my practices in the future. Thanks. I concede none of this is documented anywhere, maybe one day I'll have the time to do that :) > > > Fixes: aca6913f5551 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages") > > > > That went into 4.2, so should this have a: > > > > Cc: stable@vger.kernel.org # 4.2 > > Grr, another mess-up, you're right, it should. Can you amend on your > end? Sorry, I was in too much of a hurry to get the fixlet out. That's OK, I'll fix it up. cheers
On Fri, 2015-09-04 at 13:41 +1000, Alexey Kardashevskiy wrote: > On 09/03/2015 07:58 PM, Michael Ellerman wrote: > > On Wed, 2015-09-02 at 08:39 -0700, Nishanth Aravamudan wrote: > >> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c > >> index 85cbc96eff6c..e51aff01a218 100644 > >> --- a/arch/powerpc/platforms/powernv/pci-ioda.c > >> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > >> @@ -2077,10 +2077,17 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe) > >> { > >> struct iommu_table *tbl = NULL; > >> long rc; > >> + /* > >> + * In memory constrained environments, e.g. kdump kernel, the > >> + * DMA window can be larger than available memory, which will > >> + * cause errors later. > >> + */ > >> + const __u64 window_size = > > > > Why is this using __u64 and not u64, it's not exported to userspace. > > > > It looks like pnv_pci_ioda2_create_table() uses __u64, but there's no reason > > for that AFAICS either. And yes I did commit it so it's my fault :) > > There is VFIO_IOMMU_SPAPR_TCE_CREATE ioctl which receives > vfio_iommu_spapr_tce_create struct from the user space and there is "__u64 > window_size" which is passed across: > tce_iommu_create_window() > tce_iommu_create_table() > pnv_pci_ioda2_create_table (via table_group->ops->create_table()) > > At what point should __u64 have become u64? As soon as it was pulled out of the struct. So here: ret = tce_iommu_create_window(container, create.page_shift, create.window_size, create.levels, &create.start_addr); ie, instead of: static long tce_iommu_create_window(struct tce_container *container, __u32 page_shift, __u64 window_size, __u32 levels, __u64 *start_addr) this: static long tce_iommu_create_window(struct tce_container *container, u32 page_shift, u64 window_size, u32 levels, u64 *start_addr) cheers
On Wed, 2015-02-09 at 15:39:28 UTC, Nishanth Aravamudan wrote: > When attempting to kdump with the 4.2 kernel, we see for each PCI > device: > > pci 0003:01 : [PE# 000] Assign DMA32 space > pci 0003:01 : [PE# 000] Setting up 32-bit TCE table at 0..80000000 > pci 0003:01 : [PE# 000] Failed to create 32-bit TCE table, err -22 <snip> > is a valid sanity check for pnv_pci_ioda2_table_alloc_pages(), so adjust > the caller to pass in a smaller window size if our maximum memory value > is smaller than the DMA window. > > After this change, the PCI devices successfully set up the 32-bit TCE > table and kdump succeeds. > > The problem was seen on a Firestone machine originally. > > Fixes: aca6913f5551 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages") > Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> > Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/bb0054552d080dd929907c59 cheers
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 85cbc96eff6c..e51aff01a218 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2077,10 +2077,17 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe) { struct iommu_table *tbl = NULL; long rc; + /* + * In memory constrained environments, e.g. kdump kernel, the + * DMA window can be larger than available memory, which will + * cause errors later. + */ + const __u64 window_size = + min((u64)pe->table_group.tce32_size, memory_hotplug_max()); rc = pnv_pci_ioda2_create_table(&pe->table_group, 0, IOMMU_PAGE_SHIFT_4K, - pe->table_group.tce32_size, + window_size, POWERNV_IOMMU_DEFAULT_LEVELS, &tbl); if (rc) { pe_err(pe, "Failed to create 32-bit TCE table, err %ld",