diff mbox series

[v4,5/7] powerpc/iommu: Move iommu_table cleaning routine to iommu_table_clean

Message ID 20200716071658.467820-6-leobras.c@gmail.com
State Changes Requested, archived
Headers show
Series Remove default DMA window before creating DDW | expand

Commit Message

Leonardo Brás July 16, 2020, 7:16 a.m. UTC
Move the part of iommu_table_free() that does struct iommu_table cleaning
into iommu_table_clean, so we can invoke it separately.

This new function is useful for cleaning struct iommu_table before
initializing it again with a new DMA window, without having it freed and
allocated again.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/kernel/iommu.c | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

Comments

Alexey Kardashevskiy July 21, 2020, 4:59 a.m. UTC | #1
On 16/07/2020 17:16, Leonardo Bras wrote:
> Move the part of iommu_table_free() that does struct iommu_table cleaning
> into iommu_table_clean, so we can invoke it separately.
> 
> This new function is useful for cleaning struct iommu_table before
> initializing it again with a new DMA window, without having it freed and
> allocated again.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>  arch/powerpc/kernel/iommu.c | 30 ++++++++++++++++++------------
>  1 file changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 9704f3f76e63..c3242253a4e7 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -735,21 +735,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>  	return tbl;
>  }
>  
> -static void iommu_table_free(struct kref *kref)
> +static void iommu_table_clean(struct iommu_table *tbl)


iommu_table_free() + iommu_init_table() + set_iommu_table_base() should
work too, why new helper?

There is also iommu_table_clear() which does a different thing so you
need a better name.

Second, iommu_table_free() would print a warning if any IOMMU page is in
use and it would be ok as we would only see this when hot-unplugging a
PE because we always kept the default window.

Btw you must be seeing these warnings now every time you create DDW with
these patches as at least the first page is reserved, do not you?

Since we are replacing a table for a device which is still in the
system, we should not try messing with its DMA if it already has
mappings so the warning should become an error preventing DDW. It is
rather hard to trigger in practice but I could hack a driver to ask for
32bit DMA mask first, map few pages and then ask for 64bit DMA mask, it
is not illegal, I think. So this needs a new helper - "bool
iommu_table_in_use(tbl)" - to use in enable_ddw(). Or I am overthinking
this?... Thanks,



>  {
>  	unsigned long bitmap_sz;
>  	unsigned int order;
> -	struct iommu_table *tbl;
> -
> -	tbl = container_of(kref, struct iommu_table, it_kref);
> -
> -	if (tbl->it_ops->free)
> -		tbl->it_ops->free(tbl);
> -
> -	if (!tbl->it_map) {
> -		kfree(tbl);
> -		return;
> -	}
>  
>  	iommu_table_release_pages(tbl);
>  
> @@ -763,6 +752,23 @@ static void iommu_table_free(struct kref *kref)
>  	/* free bitmap */
>  	order = get_order(bitmap_sz);
>  	free_pages((unsigned long) tbl->it_map, order);
> +}
> +
> +static void iommu_table_free(struct kref *kref)
> +{
> +	struct iommu_table *tbl;
> +
> +	tbl = container_of(kref, struct iommu_table, it_kref);
> +
> +	if (tbl->it_ops->free)
> +		tbl->it_ops->free(tbl);
> +
> +	if (!tbl->it_map) {
> +		kfree(tbl);
> +		return;
> +	}
> +
> +	iommu_table_clean(tbl);
>  
>  	/* free table */
>  	kfree(tbl);
>
Leonardo Brás July 21, 2020, 10:13 p.m. UTC | #2
On Tue, 2020-07-21 at 14:59 +1000, Alexey Kardashevskiy wrote:
> 
> On 16/07/2020 17:16, Leonardo Bras wrote:
> > Move the part of iommu_table_free() that does struct iommu_table cleaning
> > into iommu_table_clean, so we can invoke it separately.
> > 
> > This new function is useful for cleaning struct iommu_table before
> > initializing it again with a new DMA window, without having it freed and
> > allocated again.
> > 
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > ---
> >  arch/powerpc/kernel/iommu.c | 30 ++++++++++++++++++------------
> >  1 file changed, 18 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> > index 9704f3f76e63..c3242253a4e7 100644
> > --- a/arch/powerpc/kernel/iommu.c
> > +++ b/arch/powerpc/kernel/iommu.c
> > @@ -735,21 +735,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
> >  	return tbl;
> >  }
> >  
> > -static void iommu_table_free(struct kref *kref)
> > +static void iommu_table_clean(struct iommu_table *tbl)
> 
> iommu_table_free() + iommu_init_table() + set_iommu_table_base() should
> work too, why new helper?

iommu_table_free() also frees the tbl, which would need allocate it
again (new address) and to fill it up again, unnecessarily. 
I think it's a better approach to only change what is needed.

> There is also iommu_table_clear() which does a different thing so you
> need a better name.

I agree.
I had not noticed this other function before sending the patchset. What
would be a better name though? __iommu_table_free()? 

> Second, iommu_table_free
> use and it would be ok as we would only see this when hot-unplugging a
> PE because we always kept the default window.
> Btw you must be seeing these warnings now every time you create DDW with
> these patches as at least the first page is reserved, do not you?

It does not print a warning.
I noticed other warnings, but not this one from iommu_table_free():
/* verify that table contains no entries */
if (!bitmap_empty(tbl->it_ma
p, tbl->it_size))
	pr_warn("%s: Unexpected TCEs\n", __func__);

Before that, iommu_table_release_pages(tbl) is supposed to clear the 
bitmap, so this only tests for a tce that is created in this short period.

> Since we are replacing a table for a device which is still in the
> system, we should not try messing with its DMA if it already has
> mappings so the warning should become an error preventing DDW. It is
> rather hard to trigger in practice but I could hack a driver to ask for
> 32bit DMA mask first, map few pages and then ask for 64bit DMA mask, it
> is not illegal, I think. So this needs a new helper - "bool
> iommu_table_in_use(tbl)" - to use in enable_ddw(). Or I am overthinking
> this?... Thanks,

As of today, there seems to be nothing like that happening in the
driver I am testing. 
I spoke to Brian King on slack, and he mentioned that at the point DDW
is created there should be no allocations in place.

But I suppose some driver could try to do this.

Maybe a better approach would be removing the mapping only if the
default window is removed (at the end of enable_ddw, as an else to
resetting the default DMA window), and having a way to add more
mappings to those pools. But this last part doesn't look so simple, and
it would be better to understand if it's necessary investing work in
this.

What do you think?

Best regards,
Brian King July 22, 2020, 12:52 a.m. UTC | #3
On 7/21/20 5:13 PM, Leonardo Bras wrote:
> On Tue, 2020-07-21 at 14:59 +1000, Alexey Kardashevskiy wrote:
>>
>> On 16/07/2020 17:16, Leonardo Bras wrote:
>>> Move the part of iommu_table_free() that does struct iommu_table cleaning
>>> into iommu_table_clean, so we can invoke it separately.
>>>
>>> This new function is useful for cleaning struct iommu_table before
>>> initializing it again with a new DMA window, without having it freed and
>>> allocated again.
>>>
>>> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
>>> ---
>>>  arch/powerpc/kernel/iommu.c | 30 ++++++++++++++++++------------
>>>  1 file changed, 18 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>> index 9704f3f76e63..c3242253a4e7 100644
>>> --- a/arch/powerpc/kernel/iommu.c
>>> +++ b/arch/powerpc/kernel/iommu.c
>>> @@ -735,21 +735,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>>>  	return tbl;
>>>  }
>>>  
>>> -static void iommu_table_free(struct kref *kref)
>>> +static void iommu_table_clean(struct iommu_table *tbl)
>>
>> iommu_table_free() + iommu_init_table() + set_iommu_table_base() should
>> work too, why new helper?
> 
> iommu_table_free() also frees the tbl, which would need allocate it
> again (new address) and to fill it up again, unnecessarily. 
> I think it's a better approach to only change what is needed.
> 
>> There is also iommu_table_clear() which does a different thing so you
>> need a better name.
> 
> I agree.
> I had not noticed this other function before sending the patchset. What
> would be a better name though? __iommu_table_free()? 
> 
>> Second, iommu_table_free
>> use and it would be ok as we would only see this when hot-unplugging a
>> PE because we always kept the default window.
>> Btw you must be seeing these warnings now every time you create DDW with
>> these patches as at least the first page is reserved, do not you?
> 
> It does not print a warning.
> I noticed other warnings, but not this one from iommu_table_free():
> /* verify that table contains no entries */
> if (!bitmap_empty(tbl->it_ma
> p, tbl->it_size))
> 	pr_warn("%s: Unexpected TCEs\n", __func__);
> 
> Before that, iommu_table_release_pages(tbl) is supposed to clear the 
> bitmap, so this only tests for a tce that is created in this short period.
> 
>> Since we are replacing a table for a device which is still in the
>> system, we should not try messing with its DMA if it already has
>> mappings so the warning should become an error preventing DDW. It is
>> rather hard to trigger in practice but I could hack a driver to ask for
>> 32bit DMA mask first, map few pages and then ask for 64bit DMA mask, it
>> is not illegal, I think. So this needs a new helper - "bool
>> iommu_table_in_use(tbl)" - to use in enable_ddw(). Or I am overthinking
>> this?... Thanks,
> 
> As of today, there seems to be nothing like that happening in the
> driver I am testing. 
> I spoke to Brian King on slack, and he mentioned that at the point DDW
> is created there should be no allocations in place.

I think there are a couple of scenarios here. One is where there is a DMA
allocation prior to a call to set the DMA mask. Second scenario is if the
driver makes multiple calls to set the DMA mask. I would argue that a properly
written driver should tell the IOMMU subsystem what DMA mask it supports prior
to allocating DMA memroy. Documentation/core-api/dma-api-howto.rst should
describe what is legal and what is not.

It might be reasonable to declare its not allowed to allocate DMA memory
and then later change the DMA mask and clearly call this out in the documentation
if its not already.

-Brian
Alexey Kardashevskiy July 22, 2020, 1:28 a.m. UTC | #4
On 22/07/2020 08:13, Leonardo Bras wrote:
> On Tue, 2020-07-21 at 14:59 +1000, Alexey Kardashevskiy wrote:
>>
>> On 16/07/2020 17:16, Leonardo Bras wrote:
>>> Move the part of iommu_table_free() that does struct iommu_table cleaning
>>> into iommu_table_clean, so we can invoke it separately.
>>>
>>> This new function is useful for cleaning struct iommu_table before
>>> initializing it again with a new DMA window, without having it freed and
>>> allocated again.
>>>
>>> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
>>> ---
>>>  arch/powerpc/kernel/iommu.c | 30 ++++++++++++++++++------------
>>>  1 file changed, 18 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>> index 9704f3f76e63..c3242253a4e7 100644
>>> --- a/arch/powerpc/kernel/iommu.c
>>> +++ b/arch/powerpc/kernel/iommu.c
>>> @@ -735,21 +735,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>>>  	return tbl;
>>>  }
>>>  
>>> -static void iommu_table_free(struct kref *kref)
>>> +static void iommu_table_clean(struct iommu_table *tbl)
>>
>> iommu_table_free() + iommu_init_table() + set_iommu_table_base() should
>> work too, why new helper?
> 
> iommu_table_free() also frees the tbl, which would need allocate it
> again (new address) and to fill it up again, unnecessarily. 

It is a new table in fact, everything is new there. You are only saving
kfree+kzalloc which does not seem a huge win.

Also, iommu_table_update() simply assumes 64bit window by passing
res_start=res_end=0 to iommu_init_table() which is not horribly robust
either. Yeah, I know, iommu_init_table() is always called with zeroes in
pseries but this is somewhat ok as those tables are from the device tree
and those windows don't overlap with 32bit MMIO but under KVM they will
(well, if we hack QEMU to advertise a single window).

I suggest removing iommu_pseries_table_update() from 6/7 and do
iommu_table_free() + iommu_init_table() + set_iommu_table_base() with a
WARN_ON(pdev->dev.archdata.dma_offset>=SZ_4G), may be even do this all
in enable_ddw() where we know for sure if it is 1:1 mapping or just a
big window.

Out of curiosity - what page sizes does pHyp advertise in "query"?


> I think it's a better approach to only change what is needed.
> 
>> There is also iommu_table_clear() which does a different thing so you
>> need a better name.
> 
> I agree.
> I had not noticed this other function before sending the patchset. What
> would be a better name though? __iommu_table_free()? 
> 
>> Second, iommu_table_free
>> use and it would be ok as we would only see this when hot-unplugging a
>> PE because we always kept the default window.
>> Btw you must be seeing these warnings now every time you create DDW with
>> these patches as at least the first page is reserved, do not you?
> 
> It does not print a warning.
> I noticed other warnings,

And what are these?

> but not this one from iommu_table_free():
> /* verify that table contains no entries */
> if (!bitmap_empty(tbl->it_ma
> p, tbl->it_size))
> 	pr_warn("%s: Unexpected TCEs\n", __func__);
> 
> Before that, iommu_table_release_pages(tbl) is supposed to clear the 
> bitmap, so this only tests for a tce that is created in this short period.


iommu_table_release_pages() only clears reserved pages - page 0 (just a
protection against NULL DMA pointers) and 32bit MMIO (these should not
be set for 64bit window). The "%s: Unexpected TCEs\n" is what checks for
actual mapped TCEs.


>> Since we are replacing a table for a device which is still in the
>> system, we should not try messing with its DMA if it already has
>> mappings so the warning should become an error preventing DDW. It is
>> rather hard to trigger in practice but I could hack a driver to ask for
>> 32bit DMA mask first, map few pages and then ask for 64bit DMA mask, it
>> is not illegal, I think. So this needs a new helper - "bool
>> iommu_table_in_use(tbl)" - to use in enable_ddw(). Or I am overthinking
>> this?... Thanks,
> 
> As of today, there seems to be nothing like that happening in the
> driver I am testing. 
> I spoke to Brian King on slack, and he mentioned that at the point DDW
> is created there should be no allocations in place.

Correct, there should not be. But it is also not a huge effort to fall
back if there are.


> But I suppose some driver could try to do this.
> 
> Maybe a better approach would be removing the mapping only if the
> default window is removed (at the end of enable_ddw, as an else to
> resetting the default DMA window), and having a way to add more
> mappings to those pools. But this last part doesn't look so simple, and
> it would be better to understand if it's necessary investing work in
> this.
> 
> What do you think?

Add iommu_table_in_use(tbl) and fail DDW if that says "yes".
Leonardo Brás July 22, 2020, 11:37 p.m. UTC | #5
On Wed, 2020-07-22 at 11:28 +1000, Alexey Kardashevskiy wrote:
> 
> On 22/07/2020 08:13, Leonardo Bras wrote:
> > On Tue, 2020-07-21 at 14:59 +1000, Alexey Kardashevskiy wrote:
> > > On 16/07/2020 17:16, Leonardo Bras wrote:
> > > > Move the part of iommu_table_free() that does struct iommu_table cleaning
> > > > into iommu_table_clean, so we can invoke it separately.
> > > > 
> > > > This new function is useful for cleaning struct iommu_table before
> > > > initializing it again with a new DMA window, without having it freed and
> > > > allocated again.
> > > > 
> > > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > > > ---
> > > >  arch/powerpc/kernel/iommu.c | 30 ++++++++++++++++++------------
> > > >  1 file changed, 18 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> > > > index 9704f3f76e63..c3242253a4e7 100644
> > > > --- a/arch/powerpc/kernel/iommu.c
> > > > +++ b/arch/powerpc/kernel/iommu.c
> > > > @@ -735,21 +735,10 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
> > > >  	return tbl;
> > > >  }
> > > >  
> > > > -static void iommu_table_free(struct kref *kref)
> > > > +static void iommu_table_clean(struct iommu_table *tbl)
> > > 
> > > iommu_table_free() + iommu_init_table() + set_iommu_table_base() should
> > > work too, why new helper?
> > 
> > iommu_table_free() also frees the tbl, which would need allocate it
> > again (new address) and to fill it up again, unnecessarily. 
> 
> It is a new table in fact, everything is new there. You are only saving
> kfree+kzalloc which does not seem a huge win.
> 
> Also, iommu_table_update() simply assumes 64bit window by passing
> res_start=res_end=0 to iommu_init_table() which is not horribly robust
> either. Yeah, I know, iommu_init_table() is always called with zeroes in
> pseries but this is somewhat ok as those tables are from the device tree
> and those windows don't overlap with 32bit MMIO but under KVM they will
> (well, if we hack QEMU to advertise a single window).
> 
> I suggest removing iommu_pseries_table_update() from 6/7 and do
> iommu_table_free() + iommu_init_table() + set_iommu_table_base() with a
> WARN_ON(pdev->dev.archdata.dma_offset>=SZ_4G), may be even do this all
> in enable_ddw() where we know for sure if it is 1:1 mapping or just a
> big window.

Sure, I have yet to understand the full impact of this change, but I
will implement this and give it a try.

> 
> Out of curiosity - what page sizes does pHyp advertise in "query"?

64kB (page shift 0x10)

> 
> 
> > I think it's a better approach to only change what is needed.
> > 
> > > There is also iommu_table_clear() which does a different thing so you
> > > need a better name.
> > 
> > I agree.
> > I had not noticed this other function before sending the patchset. What
> > would be a better name though? __iommu_table_free()? 
> > 
> > > Second, iommu_table_free
> > > use and it would be ok as we would only see this when hot-unplugging a
> > > PE because we always kept the default window.
> > > Btw you must be seeing these warnings now every time you create DDW with
> > > these patches as at least the first page is reserved, do not you?
> > 
> > It does not print a warning.
> > I noticed other warnings,
> 
> And what are these?

tce_freemulti_pSeriesLP: plpar_tce_stuff failed
[...]

It's regarding the change in pagesize. 
Some places have the tceshift hardcoded as 12, tce_freemulti_pSeriesLP
is one of them, and that is causing some errors.

I wrote a patch fixing this, and I will include it in the next series.

> 
> > but not this one from iommu_table_free():
> > /* verify that table contains no entries */
> > if (!bitmap_empty(tbl->it_ma
> > p, tbl->it_size))
> > 	pr_warn("%s: Unexpected TCEs\n", __func__);
> > 
> > Before that, iommu_table_release_pages(tbl) is supposed to clear the 
> > bitmap, so this only tests for a tce that is created in this short period.
> 
> iommu_table_release_pages() only clears reserved pages - page 0 (just a
> protection against NULL DMA pointers) and 32bit MMIO (these should not
> be set for 64bit window). The "%s: Unexpected TCEs\n" is what checks for
> actual mapped TCEs.
> 

Oh, I haven't noticed that. Thanks for pointing!

> > > Since we are replacing a table for a device which is still in the
> > > system, we should not try messing with its DMA if it already has
> > > mappings so the warning should become an error preventing DDW. It is
> > > rather hard to trigger in practice but I could hack a driver to ask for
> > > 32bit DMA mask first, map few pages and then ask for 64bit DMA mask, it
> > > is not illegal, I think. So this needs a new helper - "bool
> > > iommu_table_in_use(tbl)" - to use in enable_ddw(). Or I am overthinking
> > > this?... Thanks,
> > 
> > As of today, there seems to be nothing like that happening in the
> > driver I am testing. 
> > I spoke to Brian King on slack, and he mentioned that at the point DDW
> > is created there should be no allocations in place.
> 
> Correct, there should not be. But it is also not a huge effort to fall
> back if there are.

True.

> 
> > But I suppose some driver could try to do this.
> > 
> > Maybe a better approach would be removing the mapping only if the
> > default window is removed (at the end of enable_ddw, as an else to
> > resetting the default DMA window), and having a way to add more
> > mappings to those pools. But this last part doesn't look so simple, and
> > it would be better to understand if it's necessary investing work in
> > this.
> > 
> > What do you think?
> 
> Add iommu_table_in_use(tbl) and fail DDW if that says "yes".

Seems good, I will include that on the next patchset.

Still, I will try to implement that more complex approach in a future
patchset, as it may come to be useful.

Thank you for the feedback!
Leonardo Brás July 22, 2020, 11:45 p.m. UTC | #6
On Tue, 2020-07-21 at 19:52 -0500, Brian King wrote:
> > 
> > As of today, there seems to be nothing like that happening in the
> > driver I am testing. 
> > I spoke to Brian King on slack, and he mentioned that at the point DDW
> > is created there should be no allocations in place.
> 
> I think there are a couple of scenarios here. One is where there is a DMA
> allocation prior to a call to set the DMA mask. Second scenario is if the
> driver makes multiple calls to set the DMA mask. I would argue that a properly
> written driver should tell the IOMMU subsystem what DMA mask it supports prior
> to allocating DMA memroy. Documentation/core-api/dma-api-howto.rst should
> describe what is legal and what is not.
> 
> It might be reasonable to declare its not allowed to allocate DMA memory
> and then later change the DMA mask and clearly call this out in the documentation
> if its not already.
> 
> -Brian

Thank you for the feedback Brian!

That makes sense to me. I will try to have this in mind for the next
patchset. 

Best regards,
diff mbox series

Patch

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 9704f3f76e63..c3242253a4e7 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -735,21 +735,10 @@  struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 	return tbl;
 }
 
-static void iommu_table_free(struct kref *kref)
+static void iommu_table_clean(struct iommu_table *tbl)
 {
 	unsigned long bitmap_sz;
 	unsigned int order;
-	struct iommu_table *tbl;
-
-	tbl = container_of(kref, struct iommu_table, it_kref);
-
-	if (tbl->it_ops->free)
-		tbl->it_ops->free(tbl);
-
-	if (!tbl->it_map) {
-		kfree(tbl);
-		return;
-	}
 
 	iommu_table_release_pages(tbl);
 
@@ -763,6 +752,23 @@  static void iommu_table_free(struct kref *kref)
 	/* free bitmap */
 	order = get_order(bitmap_sz);
 	free_pages((unsigned long) tbl->it_map, order);
+}
+
+static void iommu_table_free(struct kref *kref)
+{
+	struct iommu_table *tbl;
+
+	tbl = container_of(kref, struct iommu_table, it_kref);
+
+	if (tbl->it_ops->free)
+		tbl->it_ops->free(tbl);
+
+	if (!tbl->it_map) {
+		kfree(tbl);
+		return;
+	}
+
+	iommu_table_clean(tbl);
 
 	/* free table */
 	kfree(tbl);