diff mbox

[qemu,v4,1/3] spapr_iommu: Realloc guest visible TCE table when hot(un)plugging vfio-pci

Message ID 20170720072231.35054-2-aik@ozlabs.ru
State New
Headers show

Commit Message

Alexey Kardashevskiy July 20, 2017, 7:22 a.m. UTC
This replaces g_malloc() with spapr_tce_alloc_table() as this is
the standard way of allocating tables and this allows moving the table
back to KVM when unplugging a VFIO PCI device and VFIO TCE acceleration
support is not present in the KVM.

Although spapr_tce_alloc_table() is expected to fail with EBUSY
if called when previous fd is not closed yet, in practice we will not
see it because cap_spapr_vfio is false at the moment.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_iommu.c | 35 ++++++++++++++---------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

Comments

David Gibson July 24, 2017, 5:53 a.m. UTC | #1
On Thu, Jul 20, 2017 at 05:22:29PM +1000, Alexey Kardashevskiy wrote:
> This replaces g_malloc() with spapr_tce_alloc_table() as this is
> the standard way of allocating tables and this allows moving the table
> back to KVM when unplugging a VFIO PCI device and VFIO TCE acceleration
> support is not present in the KVM.

So, I like the idea here, and I think it will work for now, but one
thing concerns me.

AIUI, your future plans include allowing in-kernel accelerated TCE
tables even with VFIO devices.  When that happens, we might have an
in-kernel table both before and after a change in the "need_vfio"
state.

But you won't be able to have two in-kernel tables allocated at once,
whereas this code relies on having both the old and new tables at once
so it can copy the contents over.

How do you intend to handle that?

> Although spapr_tce_alloc_table() is expected to fail with EBUSY
> if called when previous fd is not closed yet, in practice we will not
> see it because cap_spapr_vfio is false at the moment.

I don't follow this.  How would cap_spapr_vfio be changing at any
point?

> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/ppc/spapr_iommu.c | 35 ++++++++++++++---------------------
>  1 file changed, 14 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index e614621a83..307dc3021e 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -275,33 +275,26 @@ static int spapr_tce_table_realize(DeviceState *dev)
>  void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
>  {
>      size_t table_size = tcet->nb_table * sizeof(uint64_t);
> -    void *newtable;
> +    uint64_t *oldtable;
> +    int newfd = -1;
>  
> -    if (need_vfio == tcet->need_vfio) {
> -        /* Nothing to do */
> -        return;
> -    }
> +    g_assert(need_vfio != tcet->need_vfio);
>  
> -    if (!need_vfio) {
> -        /* FIXME: We don't support transition back to KVM accelerated
> -         * TCEs yet */
> -        return;
> -    }
> +    tcet->need_vfio = need_vfio;
>  
> -    tcet->need_vfio = true;
> +    oldtable = tcet->table;
>  
> -    if (tcet->fd < 0) {
> -        /* Table is already in userspace, nothing to be do */
> -        return;
> -    }
> +    tcet->table = spapr_tce_alloc_table(tcet->liobn,
> +                                        tcet->page_shift,
> +                                        tcet->bus_offset,
> +                                        tcet->nb_table,
> +                                        &newfd,
> +                                        need_vfio);
> +    memcpy(tcet->table, oldtable, table_size);
>  
> -    newtable = g_malloc(table_size);
> -    memcpy(newtable, tcet->table, table_size);
> +    spapr_tce_free_table(oldtable, tcet->fd, tcet->nb_table);
>  
> -    kvmppc_remove_spapr_tce(tcet->table, tcet->fd, tcet->nb_table);
> -
> -    tcet->fd = -1;
> -    tcet->table = newtable;
> +    tcet->fd = newfd;
>  }
>  
>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn)
Alexey Kardashevskiy July 24, 2017, 10:48 a.m. UTC | #2
On 24/07/17 15:53, David Gibson wrote:
> On Thu, Jul 20, 2017 at 05:22:29PM +1000, Alexey Kardashevskiy wrote:
>> This replaces g_malloc() with spapr_tce_alloc_table() as this is
>> the standard way of allocating tables and this allows moving the table
>> back to KVM when unplugging a VFIO PCI device and VFIO TCE acceleration
>> support is not present in the KVM.
> 
> So, I like the idea here, and I think it will work for now, but one
> thing concerns me.
> 
> AIUI, your future plans include allowing in-kernel accelerated TCE
> tables even with VFIO devices.  When that happens, we might have an
> in-kernel table both before and after a change in the "need_vfio"
> state.

The in-kernel table just stays there, mappings will be replayed but in the
end of the replay only the hardware table will change.

> 
> But you won't be able to have two in-kernel tables allocated at once,
> whereas this code relies on having both the old and new tables at once
> so it can copy the contents over.
> 
> How do you intend to handle that?

I intend to make this function no-op when cap_spapr_vfio is present.


> 
>> Although spapr_tce_alloc_table() is expected to fail with EBUSY
>> if called when previous fd is not closed yet, in practice we will not
>> see it because cap_spapr_vfio is false at the moment.
> 
> I don't follow this.  How would cap_spapr_vfio be changing at any
> point?

It depends on the version of the host kernel.

> 
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  hw/ppc/spapr_iommu.c | 35 ++++++++++++++---------------------
>>  1 file changed, 14 insertions(+), 21 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index e614621a83..307dc3021e 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -275,33 +275,26 @@ static int spapr_tce_table_realize(DeviceState *dev)
>>  void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
>>  {
>>      size_t table_size = tcet->nb_table * sizeof(uint64_t);
>> -    void *newtable;
>> +    uint64_t *oldtable;
>> +    int newfd = -1;
>>  
>> -    if (need_vfio == tcet->need_vfio) {
>> -        /* Nothing to do */
>> -        return;
>> -    }
>> +    g_assert(need_vfio != tcet->need_vfio);
>>  
>> -    if (!need_vfio) {
>> -        /* FIXME: We don't support transition back to KVM accelerated
>> -         * TCEs yet */
>> -        return;
>> -    }
>> +    tcet->need_vfio = need_vfio;
>>  
>> -    tcet->need_vfio = true;
>> +    oldtable = tcet->table;
>>  
>> -    if (tcet->fd < 0) {
>> -        /* Table is already in userspace, nothing to be do */
>> -        return;
>> -    }
>> +    tcet->table = spapr_tce_alloc_table(tcet->liobn,
>> +                                        tcet->page_shift,
>> +                                        tcet->bus_offset,
>> +                                        tcet->nb_table,
>> +                                        &newfd,
>> +                                        need_vfio);
>> +    memcpy(tcet->table, oldtable, table_size);
>>  
>> -    newtable = g_malloc(table_size);
>> -    memcpy(newtable, tcet->table, table_size);
>> +    spapr_tce_free_table(oldtable, tcet->fd, tcet->nb_table);
>>  
>> -    kvmppc_remove_spapr_tce(tcet->table, tcet->fd, tcet->nb_table);
>> -
>> -    tcet->fd = -1;
>> -    tcet->table = newtable;
>> +    tcet->fd = newfd;
>>  }
>>  
>>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn)
>
Alexey Kardashevskiy Aug. 7, 2017, 7:29 a.m. UTC | #3
On 24/07/17 20:48, Alexey Kardashevskiy wrote:
> On 24/07/17 15:53, David Gibson wrote:
>> On Thu, Jul 20, 2017 at 05:22:29PM +1000, Alexey Kardashevskiy wrote:
>>> This replaces g_malloc() with spapr_tce_alloc_table() as this is
>>> the standard way of allocating tables and this allows moving the table
>>> back to KVM when unplugging a VFIO PCI device and VFIO TCE acceleration
>>> support is not present in the KVM.
>>
>> So, I like the idea here, and I think it will work for now, but one
>> thing concerns me.
>>
>> AIUI, your future plans include allowing in-kernel accelerated TCE
>> tables even with VFIO devices.  When that happens, we might have an
>> in-kernel table both before and after a change in the "need_vfio"
>> state.
> 
> The in-kernel table just stays there, mappings will be replayed but in the
> end of the replay only the hardware table will change.
> 
>>
>> But you won't be able to have two in-kernel tables allocated at once,
>> whereas this code relies on having both the old and new tables at once
>> so it can copy the contents over.
>>
>> How do you intend to handle that?
> 
> I intend to make this function no-op when cap_spapr_vfio is present.
> 
> 
>>
>>> Although spapr_tce_alloc_table() is expected to fail with EBUSY
>>> if called when previous fd is not closed yet, in practice we will not
>>> see it because cap_spapr_vfio is false at the moment.
>>
>> I don't follow this.  How would cap_spapr_vfio be changing at any
>> point?
> 
> It depends on the version of the host kernel.


Ping?


> 
>>
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  hw/ppc/spapr_iommu.c | 35 ++++++++++++++---------------------
>>>  1 file changed, 14 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>>> index e614621a83..307dc3021e 100644
>>> --- a/hw/ppc/spapr_iommu.c
>>> +++ b/hw/ppc/spapr_iommu.c
>>> @@ -275,33 +275,26 @@ static int spapr_tce_table_realize(DeviceState *dev)
>>>  void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
>>>  {
>>>      size_t table_size = tcet->nb_table * sizeof(uint64_t);
>>> -    void *newtable;
>>> +    uint64_t *oldtable;
>>> +    int newfd = -1;
>>>  
>>> -    if (need_vfio == tcet->need_vfio) {
>>> -        /* Nothing to do */
>>> -        return;
>>> -    }
>>> +    g_assert(need_vfio != tcet->need_vfio);
>>>  
>>> -    if (!need_vfio) {
>>> -        /* FIXME: We don't support transition back to KVM accelerated
>>> -         * TCEs yet */
>>> -        return;
>>> -    }
>>> +    tcet->need_vfio = need_vfio;
>>>  
>>> -    tcet->need_vfio = true;
>>> +    oldtable = tcet->table;
>>>  
>>> -    if (tcet->fd < 0) {
>>> -        /* Table is already in userspace, nothing to be do */
>>> -        return;
>>> -    }
>>> +    tcet->table = spapr_tce_alloc_table(tcet->liobn,
>>> +                                        tcet->page_shift,
>>> +                                        tcet->bus_offset,
>>> +                                        tcet->nb_table,
>>> +                                        &newfd,
>>> +                                        need_vfio);
>>> +    memcpy(tcet->table, oldtable, table_size);
>>>  
>>> -    newtable = g_malloc(table_size);
>>> -    memcpy(newtable, tcet->table, table_size);
>>> +    spapr_tce_free_table(oldtable, tcet->fd, tcet->nb_table);
>>>  
>>> -    kvmppc_remove_spapr_tce(tcet->table, tcet->fd, tcet->nb_table);
>>> -
>>> -    tcet->fd = -1;
>>> -    tcet->table = newtable;
>>> +    tcet->fd = newfd;
>>>  }
>>>  
>>>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn)
>>
> 
>
David Gibson Aug. 9, 2017, 7:33 a.m. UTC | #4
On Mon, Jul 24, 2017 at 08:48:45PM +1000, Alexey Kardashevskiy wrote:
> On 24/07/17 15:53, David Gibson wrote:
> > On Thu, Jul 20, 2017 at 05:22:29PM +1000, Alexey Kardashevskiy wrote:
> >> This replaces g_malloc() with spapr_tce_alloc_table() as this is
> >> the standard way of allocating tables and this allows moving the table
> >> back to KVM when unplugging a VFIO PCI device and VFIO TCE acceleration
> >> support is not present in the KVM.
> > 
> > So, I like the idea here, and I think it will work for now, but one
> > thing concerns me.
> > 
> > AIUI, your future plans include allowing in-kernel accelerated TCE
> > tables even with VFIO devices.  When that happens, we might have an
> > in-kernel table both before and after a change in the "need_vfio"
> > state.
> 
> The in-kernel table just stays there, mappings will be replayed but in the
> end of the replay only the hardware table will change.
> 
> > 
> > But you won't be able to have two in-kernel tables allocated at once,
> > whereas this code relies on having both the old and new tables at once
> > so it can copy the contents over.
> > 
> > How do you intend to handle that?
> 
> I intend to make this function no-op when cap_spapr_vfio is present.
> 
> 
> > 
> >> Although spapr_tce_alloc_table() is expected to fail with EBUSY
> >> if called when previous fd is not closed yet, in practice we will not
> >> see it because cap_spapr_vfio is false at the moment.
> > 
> > I don't follow this.  How would cap_spapr_vfio be changing at any
> > point?
> 
> It depends on the version of the host kernel.

Ok.  I'm still a bit dubious about the line of reasoning here, but the
patch is correct for now, so we just have to make sure subsequent
changes work with it.

Applied to ppc-for-2.11.

> 
> > 
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>  hw/ppc/spapr_iommu.c | 35 ++++++++++++++---------------------
> >>  1 file changed, 14 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> >> index e614621a83..307dc3021e 100644
> >> --- a/hw/ppc/spapr_iommu.c
> >> +++ b/hw/ppc/spapr_iommu.c
> >> @@ -275,33 +275,26 @@ static int spapr_tce_table_realize(DeviceState *dev)
> >>  void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
> >>  {
> >>      size_t table_size = tcet->nb_table * sizeof(uint64_t);
> >> -    void *newtable;
> >> +    uint64_t *oldtable;
> >> +    int newfd = -1;
> >>  
> >> -    if (need_vfio == tcet->need_vfio) {
> >> -        /* Nothing to do */
> >> -        return;
> >> -    }
> >> +    g_assert(need_vfio != tcet->need_vfio);
> >>  
> >> -    if (!need_vfio) {
> >> -        /* FIXME: We don't support transition back to KVM accelerated
> >> -         * TCEs yet */
> >> -        return;
> >> -    }
> >> +    tcet->need_vfio = need_vfio;
> >>  
> >> -    tcet->need_vfio = true;
> >> +    oldtable = tcet->table;
> >>  
> >> -    if (tcet->fd < 0) {
> >> -        /* Table is already in userspace, nothing to be do */
> >> -        return;
> >> -    }
> >> +    tcet->table = spapr_tce_alloc_table(tcet->liobn,
> >> +                                        tcet->page_shift,
> >> +                                        tcet->bus_offset,
> >> +                                        tcet->nb_table,
> >> +                                        &newfd,
> >> +                                        need_vfio);
> >> +    memcpy(tcet->table, oldtable, table_size);
> >>  
> >> -    newtable = g_malloc(table_size);
> >> -    memcpy(newtable, tcet->table, table_size);
> >> +    spapr_tce_free_table(oldtable, tcet->fd, tcet->nb_table);
> >>  
> >> -    kvmppc_remove_spapr_tce(tcet->table, tcet->fd, tcet->nb_table);
> >> -
> >> -    tcet->fd = -1;
> >> -    tcet->table = newtable;
> >> +    tcet->fd = newfd;
> >>  }
> >>  
> >>  sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn)
> > 
> 
>
Alexey Kardashevskiy Aug. 16, 2017, 3:30 a.m. UTC | #5
On 09/08/17 17:33, David Gibson wrote:
> On Mon, Jul 24, 2017 at 08:48:45PM +1000, Alexey Kardashevskiy wrote:
>> On 24/07/17 15:53, David Gibson wrote:
>>> On Thu, Jul 20, 2017 at 05:22:29PM +1000, Alexey Kardashevskiy wrote:
>>>> This replaces g_malloc() with spapr_tce_alloc_table() as this is
>>>> the standard way of allocating tables and this allows moving the table
>>>> back to KVM when unplugging a VFIO PCI device and VFIO TCE acceleration
>>>> support is not present in the KVM.
>>>
>>> So, I like the idea here, and I think it will work for now, but one
>>> thing concerns me.
>>>
>>> AIUI, your future plans include allowing in-kernel accelerated TCE
>>> tables even with VFIO devices.  When that happens, we might have an
>>> in-kernel table both before and after a change in the "need_vfio"
>>> state.
>>
>> The in-kernel table just stays there, mappings will be replayed but in the
>> end of the replay only the hardware table will change.
>>
>>>
>>> But you won't be able to have two in-kernel tables allocated at once,
>>> whereas this code relies on having both the old and new tables at once
>>> so it can copy the contents over.
>>>
>>> How do you intend to handle that?
>>
>> I intend to make this function no-op when cap_spapr_vfio is present.
>>
>>
>>>
>>>> Although spapr_tce_alloc_table() is expected to fail with EBUSY
>>>> if called when previous fd is not closed yet, in practice we will not
>>>> see it because cap_spapr_vfio is false at the moment.
>>>
>>> I don't follow this.  How would cap_spapr_vfio be changing at any
>>> point?
>>
>> It depends on the version of the host kernel.
> 
> Ok.  I'm still a bit dubious about the line of reasoning here, but the
> patch is correct for now, so we just have to make sure subsequent
> changes work with it.
> 
> Applied to ppc-for-2.11.

Thanks, what about the other two (2/3, 3/3)?
diff mbox

Patch

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index e614621a83..307dc3021e 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -275,33 +275,26 @@  static int spapr_tce_table_realize(DeviceState *dev)
 void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
 {
     size_t table_size = tcet->nb_table * sizeof(uint64_t);
-    void *newtable;
+    uint64_t *oldtable;
+    int newfd = -1;
 
-    if (need_vfio == tcet->need_vfio) {
-        /* Nothing to do */
-        return;
-    }
+    g_assert(need_vfio != tcet->need_vfio);
 
-    if (!need_vfio) {
-        /* FIXME: We don't support transition back to KVM accelerated
-         * TCEs yet */
-        return;
-    }
+    tcet->need_vfio = need_vfio;
 
-    tcet->need_vfio = true;
+    oldtable = tcet->table;
 
-    if (tcet->fd < 0) {
-        /* Table is already in userspace, nothing to be do */
-        return;
-    }
+    tcet->table = spapr_tce_alloc_table(tcet->liobn,
+                                        tcet->page_shift,
+                                        tcet->bus_offset,
+                                        tcet->nb_table,
+                                        &newfd,
+                                        need_vfio);
+    memcpy(tcet->table, oldtable, table_size);
 
-    newtable = g_malloc(table_size);
-    memcpy(newtable, tcet->table, table_size);
+    spapr_tce_free_table(oldtable, tcet->fd, tcet->nb_table);
 
-    kvmppc_remove_spapr_tce(tcet->table, tcet->fd, tcet->nb_table);
-
-    tcet->fd = -1;
-    tcet->table = newtable;
+    tcet->fd = newfd;
 }
 
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn)