[kernel] powerpc/powernv/ioda: Store correct amount of memory used for table

Message ID 20190211074801.125646-1-aik@ozlabs.ru
State New
Headers show
Series
  • [kernel] powerpc/powernv/ioda: Store correct amount of memory used for table
Related show

Checks

Context Check Description
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
snowpatch_ozlabs/build-pmac32 success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64e success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64be success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64le success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/apply_patch success next/apply_patch Successfully applied

Commit Message

Alexey Kardashevskiy Feb. 11, 2019, 7:48 a.m.
We store 2 multilevel tables in iommu_table - one for the hardware and
one with the corresponding userspace addresses. Before allocating
the tables, the iommu_table_group_ops::get_table_size() hook returns
the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
the locked_vm counter correctly. When the table is actually allocated,
the amount of allocated memory is stored in iommu_table::it_allocated_size
and used to adjust the locked_vm counter when we release the memory used
by the table; .get_table_size() and .create_table() calculate it
independently but the result is expected to be the same.

Unfortunately the allocator does not add the userspace table size to
::it_allocated_size so when we destroy the table because of VFIO PCI
unplug (i.e. VFIO container is gone but the userspace keeps running),
we decrement locked_vm by just a half of size of memory we are releasing.
As the result, we leak locked_vm and may not be able to allocate more
IOMMU tables after few iterations of hotplug/unplug.

This adjusts it_allocated_size if the userspace addresses table was
requested (total_allocated_uas is initialized by zero).

Fixes: 090bad39b "powerpc/powernv: Add indirect levels to it_userspace"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Gibson Feb. 12, 2019, 12:20 a.m. | #1
On Mon, Feb 11, 2019 at 06:48:01PM +1100, Alexey Kardashevskiy wrote:
> We store 2 multilevel tables in iommu_table - one for the hardware and
> one with the corresponding userspace addresses. Before allocating
> the tables, the iommu_table_group_ops::get_table_size() hook returns
> the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
> the locked_vm counter correctly. When the table is actually allocated,
> the amount of allocated memory is stored in iommu_table::it_allocated_size
> and used to adjust the locked_vm counter when we release the memory used
> by the table; .get_table_size() and .create_table() calculate it
> independently but the result is expected to be the same.

Any way we can remove that redundant calculation?  That seems like
begging for bugs.

> Unfortunately the allocator does not add the userspace table size to
> ::it_allocated_size so when we destroy the table because of VFIO PCI
> unplug (i.e. VFIO container is gone but the userspace keeps running),
> we decrement locked_vm by just a half of size of memory we are releasing.
> As the result, we leak locked_vm and may not be able to allocate more
> IOMMU tables after few iterations of hotplug/unplug.
> 
> This adjusts it_allocated_size if the userspace addresses table was
> requested (total_allocated_uas is initialized by zero).
> 
> Fixes: 090bad39b "powerpc/powernv: Add indirect levels to it_userspace"
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> index 697449a..58146e1 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> @@ -313,7 +313,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>  			page_shift);
>  	tbl->it_level_size = 1ULL << (level_shift - 3);
>  	tbl->it_indirect_levels = levels - 1;
> -	tbl->it_allocated_size = total_allocated;
> +	tbl->it_allocated_size = total_allocated + total_allocated_uas;
>  	tbl->it_userspace = uas;
>  	tbl->it_nid = nid;
>
Alexey Kardashevskiy Feb. 12, 2019, 7:33 a.m. | #2
On 12/02/2019 11:20, David Gibson wrote:
> On Mon, Feb 11, 2019 at 06:48:01PM +1100, Alexey Kardashevskiy wrote:
>> We store 2 multilevel tables in iommu_table - one for the hardware and
>> one with the corresponding userspace addresses. Before allocating
>> the tables, the iommu_table_group_ops::get_table_size() hook returns
>> the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
>> the locked_vm counter correctly. When the table is actually allocated,
>> the amount of allocated memory is stored in iommu_table::it_allocated_size
>> and used to adjust the locked_vm counter when we release the memory used
>> by the table; .get_table_size() and .create_table() calculate it
>> independently but the result is expected to be the same.
> 
> Any way we can remove that redundant calculation?  That seems like
> begging for bugs.


I do not see an easy way. One way could be adding a "dryrun" flag to
pnv_pci_ioda2_table_alloc_pages(), count allocated memory there and call
it from .get_table_size() but for multilevel TCEs it only allocates
first level...


>> Unfortunately the allocator does not add the userspace table size to
>> ::it_allocated_size so when we destroy the table because of VFIO PCI
>> unplug (i.e. VFIO container is gone but the userspace keeps running),
>> we decrement locked_vm by just a half of size of memory we are releasing.
>> As the result, we leak locked_vm and may not be able to allocate more
>> IOMMU tables after few iterations of hotplug/unplug.
>>
>> This adjusts it_allocated_size if the userspace addresses table was
>> requested (total_allocated_uas is initialized by zero).
>>
>> Fixes: 090bad39b "powerpc/powernv: Add indirect levels to it_userspace"
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
>> ---
>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>> index 697449a..58146e1 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>> @@ -313,7 +313,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>>  			page_shift);
>>  	tbl->it_level_size = 1ULL << (level_shift - 3);
>>  	tbl->it_indirect_levels = levels - 1;
>> -	tbl->it_allocated_size = total_allocated;
>> +	tbl->it_allocated_size = total_allocated + total_allocated_uas;
>>  	tbl->it_userspace = uas;
>>  	tbl->it_nid = nid;
>>  
>
Alexey Kardashevskiy Feb. 12, 2019, 11:57 p.m. | #3
On 12/02/2019 18:33, Alexey Kardashevskiy wrote:
> 
> 
> On 12/02/2019 11:20, David Gibson wrote:
>> On Mon, Feb 11, 2019 at 06:48:01PM +1100, Alexey Kardashevskiy wrote:
>>> We store 2 multilevel tables in iommu_table - one for the hardware and
>>> one with the corresponding userspace addresses. Before allocating
>>> the tables, the iommu_table_group_ops::get_table_size() hook returns
>>> the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
>>> the locked_vm counter correctly. When the table is actually allocated,
>>> the amount of allocated memory is stored in iommu_table::it_allocated_size
>>> and used to adjust the locked_vm counter when we release the memory used
>>> by the table; .get_table_size() and .create_table() calculate it
>>> independently but the result is expected to be the same.
>>
>> Any way we can remove that redundant calculation?  That seems like
>> begging for bugs.
> 
> 
> I do not see an easy way. One way could be adding a "dryrun" flag to
> pnv_pci_ioda2_table_alloc_pages(), count allocated memory there and call
> it from .get_table_size() but for multilevel TCEs it only allocates
> first level...


byyyyyy the way even this it_allocated_size is buggy as it is what was
actually allocated so it does not include any indirect levels which
might be allocated later although it should.

I'll simply make it tbl->it_allocated_size=.get_table_size() or just
ditch it_allocated_size, let me see...



> 
> 
>>> Unfortunately the allocator does not add the userspace table size to
>>> ::it_allocated_size so when we destroy the table because of VFIO PCI
>>> unplug (i.e. VFIO container is gone but the userspace keeps running),
>>> we decrement locked_vm by just a half of size of memory we are releasing.
>>> As the result, we leak locked_vm and may not be able to allocate more
>>> IOMMU tables after few iterations of hotplug/unplug.
>>>
>>> This adjusts it_allocated_size if the userspace addresses table was
>>> requested (total_allocated_uas is initialized by zero).
>>>
>>> Fixes: 090bad39b "powerpc/powernv: Add indirect levels to it_userspace"
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>>
>>> ---
>>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>>> index 697449a..58146e1 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>>> @@ -313,7 +313,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>>>  			page_shift);
>>>  	tbl->it_level_size = 1ULL << (level_shift - 3);
>>>  	tbl->it_indirect_levels = levels - 1;
>>> -	tbl->it_allocated_size = total_allocated;
>>> +	tbl->it_allocated_size = total_allocated + total_allocated_uas;
>>>  	tbl->it_userspace = uas;
>>>  	tbl->it_nid = nid;
>>>  
>>
>

Patch

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index 697449a..58146e1 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -313,7 +313,7 @@  long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 			page_shift);
 	tbl->it_level_size = 1ULL << (level_shift - 3);
 	tbl->it_indirect_levels = levels - 1;
-	tbl->it_allocated_size = total_allocated;
+	tbl->it_allocated_size = total_allocated + total_allocated_uas;
 	tbl->it_userspace = uas;
 	tbl->it_nid = nid;