[RFC] pseries: define coldplugged devices as "configured"

Message ID 1439470382-17540-1-git-send-email-lvivier@redhat.com
State New

Commit Message

Laurent Vivier Aug. 13, 2015, 12:53 p.m. UTC
When a device is hotplugged, attach() sets "configured" to
false and waits for the OS to configure the device and then
call ibm,configure-connector. On ibm,configure-connector,
the hypervisor sets "configured" to true.

For a coldplugged device, attach() also sets "configured" to
false, but neither firmware nor the OS ever calls
ibm,configure-connector in this case, so it remains false.

This could be harmless, but when we unplug a device, the hypervisor
waits for it to become configured: it treats an unconfigured device
as one that is still being configured, so it defers the unplug until
configuration ends. That never happens, so the device is never unplugged.

This patch sets "configured" to true by default for coldplugged
devices and to false for hotplugged devices.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/ppc/spapr_drc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bharata B Rao Aug. 14, 2015, 5:20 a.m. UTC | #1
On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote:
> When a device is hotplugged, attach() sets "configured" to
> false, waiting an action from the OS to configure it and then
> to call ibm,configure-connector. On ibm,configure-connector,
> the hypervisor sets "configured" to true.
> 
> In case of coldplugged device, attach() sets "configured" to
> false, but firmware and OS never call the ibm,configure-connector
> in this case, so it remains set to false.
> 
> It could be harmless, but when we unplug a device, hypervisor
> waits the device becomes configured because for it, a not configured
> device is a device being configured, so it waits the end of configuration
> to unplug it... and it never happens, so it is never unplugged.

Not true, at least for logical DR devices like CPUs. I am able to cleanly
unplug a cold plugged CPU in the patchset I posted at:

https://lists.gnu.org/archive/html/qemu-ppc/2015-08/msg00041.html

And this is how the state transitions work for cold plugged CPU devices:

- Cold plugged CPU DRC is explicitly set with allocation_state=USABLE
  and isolation_state=UNISOLATED.
- device_del results in drck->detach() that just returns by setting
  drc->awaiting_release to true.
- Unplug notification is sent to guest.
- Guest comes back with set_indicator RTAS call for setting isolation_state
  to ISOLATED. set_isolation_state() sets drc->configured to false.
- Guest comes back again with set_indicator RTAS call for setting allocation
  state to UNUSABLE. set_allocation_state() finalizes the device removal by
  calling drck->detach()
- drck->detach() now calls drc->detach_cb() that truly releases the
  CPU resource by getting rid of vCPU thread in QEMU.
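
The sequence above can be sketched as a toy state machine. This is a
simplified model for illustration only, not the code in hw/ppc/spapr_drc.c:
every Toy*/toy_* name is hypothetical, and the conditions in toy_detach()
are an assumption about how release is gated on the isolation and
allocation states. It starts from configured=false, as for a cold plugged
device without the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not QEMU's. */
typedef enum { TOY_CPU, TOY_PCI } ToyDrcType;
typedef enum { TOY_ISOLATED, TOY_UNISOLATED } ToyIsolation;
typedef enum { TOY_UNUSABLE, TOY_USABLE } ToyAllocation;

typedef struct {
    ToyDrcType type;
    ToyIsolation isolation;
    ToyAllocation allocation;
    bool configured;
    bool awaiting_release;
    bool released;          /* stands in for detach_cb() having run */
} ToyDrc;

/* Release only once the guest has let go of the connector (assumed rule). */
static void toy_detach(ToyDrc *drc)
{
    if (drc->isolation != TOY_ISOLATED) {
        drc->awaiting_release = true;
        return;
    }
    if (drc->type != TOY_PCI && drc->allocation != TOY_UNUSABLE) {
        drc->awaiting_release = true;
        return;
    }
    drc->released = true;
    drc->awaiting_release = false;
}

/* Guest: set_indicator, isolation_state -> ISOLATED. */
static void toy_set_isolated(ToyDrc *drc)
{
    drc->isolation = TOY_ISOLATED;
    if (drc->awaiting_release && drc->configured) {
        toy_detach(drc);
    }
    drc->configured = false;
}

/* Guest: set_indicator, allocation_state -> UNUSABLE (logical DR only). */
static void toy_set_unusable(ToyDrc *drc)
{
    drc->allocation = TOY_UNUSABLE;
    if (drc->awaiting_release && drc->type != TOY_PCI) {
        toy_detach(drc);
    }
}

/* Unplug a cold plugged device that still has configured == false.
 * Returns true if the device ends up released. */
static bool toy_unplug_coldplugged(ToyDrcType type)
{
    ToyDrc drc = { type, TOY_UNISOLATED, TOY_USABLE, false, false, false };

    toy_detach(&drc);               /* device_del: only marks the DRC */
    assert(drc.awaiting_release);
    toy_set_isolated(&drc);         /* no release: configured is false */
    toy_set_unusable(&drc);         /* no-op for PCI in this model */
    return drc.released;
}
```

In this model toy_unplug_coldplugged(TOY_CPU) returns true because the
allocation-state path finalizes the removal, while
toy_unplug_coldplugged(TOY_PCI) returns false.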

Regards,
Bharata.
Laurent Vivier Aug. 14, 2015, 7:16 a.m. UTC | #2
On 14/08/2015 07:20, Bharata B Rao wrote:
> On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote:
>> When a device is hotplugged, attach() sets "configured" to
>> false, waiting an action from the OS to configure it and then
>> to call ibm,configure-connector. On ibm,configure-connector,
>> the hypervisor sets "configured" to true.
>>
>> In case of coldplugged device, attach() sets "configured" to
>> false, but firmware and OS never call the ibm,configure-connector
>> in this case, so it remains set to false.
>>
>> It could be harmless, but when we unplug a device, hypervisor
>> waits the device becomes configured because for it, a not configured
>> device is a device being configured, so it waits the end of configuration
>> to unplug it... and it never happens, so it is never unplugged.
> 
> Not true for at least logical DR device like CPU. I am able to cleanly
> unplug a cold plugged CPU in the patchset I posted at:
> 
> https://lists.gnu.org/archive/html/qemu-ppc/2015-08/msg00041.html
> 
> And this is how the state transitions work for cold plugged CPU devices:

Could you try with a PCI card?

Thanks,
Laurent
Bharata B Rao Aug. 14, 2015, 7:44 a.m. UTC | #3
On Fri, Aug 14, 2015 at 09:16:08AM +0200, Laurent Vivier wrote:
> 
> 
> On 14/08/2015 07:20, Bharata B Rao wrote:
> > On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote:
> >> When a device is hotplugged, attach() sets "configured" to
> >> false, waiting an action from the OS to configure it and then
> >> to call ibm,configure-connector. On ibm,configure-connector,
> >> the hypervisor sets "configured" to true.
> >>
> >> In case of coldplugged device, attach() sets "configured" to
> >> false, but firmware and OS never call the ibm,configure-connector
> >> in this case, so it remains set to false.
> >>
> >> It could be harmless, but when we unplug a device, hypervisor
> >> waits the device becomes configured because for it, a not configured
> >> device is a device being configured, so it waits the end of configuration
> >> to unplug it... and it never happens, so it is never unplugged.
> > 
> > Not true for at least logical DR device like CPU. I am able to cleanly
> > unplug a cold plugged CPU in the patchset I posted at:
> > 
> > https://lists.gnu.org/archive/html/qemu-ppc/2015-08/msg00041.html
> > 
> > And this is how the state transitions work for cold plugged CPU devices:
> 
> Could you try with a PCI card ?

Yes, there is an issue with removal of cold plugged PCI devices. I can see
the device getting completely removed in the guest but it still remains
in QEMU as shown by the QEMU monitor. So your patch fixes this by ensuring
complete removal.

Regards,
Bharata.
Laurent Vivier Aug. 14, 2015, 7:46 a.m. UTC | #4
On 14/08/2015 07:20, Bharata B Rao wrote:
> On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote:
>> When a device is hotplugged, attach() sets "configured" to
>> false, waiting an action from the OS to configure it and then
>> to call ibm,configure-connector. On ibm,configure-connector,
>> the hypervisor sets "configured" to true.
>>
>> In case of coldplugged device, attach() sets "configured" to
>> false, but firmware and OS never call the ibm,configure-connector
>> in this case, so it remains set to false.
>>
>> It could be harmless, but when we unplug a device, hypervisor
>> waits the device becomes configured because for it, a not configured
>> device is a device being configured, so it waits the end of configuration
>> to unplug it... and it never happens, so it is never unplugged.
> 
> Not true for at least logical DR device like CPU. I am able to cleanly
> unplug a cold plugged CPU in the patchset I posted at:
> 
> https://lists.gnu.org/archive/html/qemu-ppc/2015-08/msg00041.html
> 
> And this is how the state transitions work for cold plugged CPU devices:
> 
> - Cold plugged CPU DRC is explicitly set with allocation_state=USABLE
>   and isolation_state=UNISOLATED.
> - device_del results in drck->detach() that just returns by setting
>   drc->awaiting_release to true.
> - Unplug notification is sent to guest.
> - Guest comes back with set_indicator RTAS call for setting isolation_state
>   to ISOLATED. set_isolation_state() sets drc->configured to false.
> - Guest comes back again with set_indicator RTAS call for setting allocation
>   state to UNUSABLE. set_allocation_state() finalizes the device removal by
>   calling drck->detach()

It doesn't work for PCI, because (QEMU 2.4.0):

static int set_allocation_state(sPAPRDRConnector *drc,
                                sPAPRDRAllocationState state)
...
    if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
...
            drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
                         drc->detach_cb_opaque, NULL);
...
    }

> - drck->detach() now calls drc->detach_cb() that truly releases the
>   CPU resource by getting rid of vCPU thread in QEMU.

Laurent
Laurent Vivier Aug. 14, 2015, 12:33 p.m. UTC | #5
I'd like to know whether this is the right way to fix the problem: are there
more comments on this patch? People from IBM?

Laurent

On 13/08/2015 14:53, Laurent Vivier wrote:
> When a device is hotplugged, attach() sets "configured" to
> false, waiting an action from the OS to configure it and then
> to call ibm,configure-connector. On ibm,configure-connector,
> the hypervisor sets "configured" to true.
> 
> In case of coldplugged device, attach() sets "configured" to
> false, but firmware and OS never call the ibm,configure-connector
> in this case, so it remains set to false.
> 
> It could be harmless, but when we unplug a device, hypervisor
> waits the device becomes configured because for it, a not configured
> device is a device being configured, so it waits the end of configuration
> to unplug it... and it never happens, so it is never unplugged.
> 
> This patch set by default coldplugged device to "configured=true",
> hotplugged device to "configured=false".
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  hw/ppc/spapr_drc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index ee87432..e86babf 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -310,7 +310,7 @@ static void attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
>      drc->dev = d;
>      drc->fdt = fdt;
>      drc->fdt_start_offset = fdt_start_offset;
> -    drc->configured = false;
> +    drc->configured = coldplug;
>  
>      object_property_add_link(OBJECT(drc), "device",
>                               object_get_typename(OBJECT(drc->dev)),
>
Michael Roth Aug. 23, 2015, 7:08 p.m. UTC | #6
Quoting Laurent Vivier (2015-08-14 02:46:49)
> 
> 
> On 14/08/2015 07:20, Bharata B Rao wrote:
> > On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote:
> >> When a device is hotplugged, attach() sets "configured" to
> >> false, waiting an action from the OS to configure it and then
> >> to call ibm,configure-connector. On ibm,configure-connector,
> >> the hypervisor sets "configured" to true.
> >>
> >> In case of coldplugged device, attach() sets "configured" to
> >> false, but firmware and OS never call the ibm,configure-connector
> >> in this case, so it remains set to false.
> >>
> >> It could be harmless, but when we unplug a device, hypervisor
> >> waits the device becomes configured because for it, a not configured
> >> device is a device being configured, so it waits the end of configuration
> >> to unplug it... and it never happens, so it is never unplugged.
> > 
> > Not true for at least logical DR device like CPU. I am able to cleanly
> > unplug a cold plugged CPU in the patchset I posted at:
> > 
> > https://lists.gnu.org/archive/html/qemu-ppc/2015-08/msg00041.html
> > 
> > And this is how the state transitions work for cold plugged CPU devices:
> > 
> > - Cold plugged CPU DRC is explicitly set with allocation_state=USABLE
> >   and isolation_state=UNISOLATED.
> > - device_del results in drck->detach() that just returns by setting
> >   drc->awaiting_release to true.
> > - Unplug notification is sent to guest.
> > - Guest comes back with set_indicator RTAS call for setting isolation_state
> >   to ISOLATED. set_isolation_state() sets drc->configured to false.
> > - Guest comes back again with set_indicator RTAS call for setting allocation
> >   state to UNUSABLE. set_allocation_state() finalizes the device removal by
> >   calling drck->detach()
> 
> It doesn't work for PCI, because (QEMU 2.4.0):
> 
> static int set_allocation_state(sPAPRDRConnector *drc,
>                                 sPAPRDRAllocationState state)
> ...
>     if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
> ...
>             drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
>                          drc->detach_cb_opaque, NULL);
> ...

Ok, that makes sense then:

the is_configured() checks were added due to a race specifically with
PCI devices: when we plug the device we hand control over to OS and set
state to unisolated as a result. The guest assumes 'interactive' hotplug
where it sets a slot back to isolated and waits for the user to actually
plug it in. Once plugged in, state is moved back to unisolated, and guest
starts configuring device. We use a flag in guest drmgr invocation to skip
the wait, but it *still* does the change to isolated state. So there's an
extra unisolated->isolated->unisolated transition for PCI in guest code.

Because of that check, if management does a quick device_add+device_del,
there's a race where we mark the device as awaiting_release as soon as
the device_del comes in (even though device_add event might still be
getting processed by guest). That would be fine normally, but in this state
a transition to isolated state results in the device getting immediately
finalized and then disappearing while the guest is trying to configure
it, so the extra transition in the PCI case races with device_del.

The is_configured() check removes that race window, and the check was
added in set_isolation(). 'logical' resources (lmb/cpu/phb) get
finalized via set_allocation() however, which is why they didn't appear
affected by this bug. And from what I can tell, cpu/lmb don't make extra
'isolated'/'unallocated' transitions, just the ones at the end of unplug,
so the fact that we're missing the check in set_allocation() shouldn't
be a problem. Makes sense to set the configured flag appropriately for
those cases as well though for consistency.
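
To make the interaction concrete, here is a minimal sketch of the PCI
path only. It is a toy model under stated assumptions, not the QEMU
implementation: the ToyPciDrc type and all toy_* functions are
hypothetical, the slot is assumed to be unisolated when device_del
arrives, and the 'patched' parameter simply switches between
"configured = false" and "configured = coldplug" in attach.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not QEMU's. */
typedef struct {
    bool configured;        /* set by ibm,configure-connector */
    bool awaiting_release;  /* device_del seen, removal pending */
    bool released;          /* device finalized on the QEMU side */
} ToyPciDrc;

/* attach(): unpatched always starts unconfigured, patched uses coldplug. */
static ToyPciDrc toy_attach(bool coldplug, bool patched)
{
    ToyPciDrc drc = { false, false, false };
    drc.configured = patched ? coldplug : false;
    return drc;
}

/* device_del while the slot is unisolated: only mark the DRC. */
static void toy_device_del(ToyPciDrc *drc)
{
    drc->awaiting_release = true;
}

/* Guest finished ibm,configure-connector. */
static void toy_configure_connector(ToyPciDrc *drc)
{
    drc->configured = true;
}

/* Guest moves the slot to ISOLATED; release is gated on 'configured'. */
static void toy_set_isolated(ToyPciDrc *drc)
{
    if (drc->awaiting_release && drc->configured) {
        drc->released = true;
        drc->awaiting_release = false;
    }
    drc->configured = false;
}

/* Cold plugged device, then device_del and the guest's isolate step. */
static bool toy_coldplug_unplug(bool patched)
{
    ToyPciDrc drc = toy_attach(true, patched);

    toy_device_del(&drc);
    toy_set_isolated(&drc);
    return drc.released;
}

/* Quick device_add+device_del: the device must survive the extra
 * isolate transition and go away only after it has been configured. */
static bool toy_hotplug_race_safe(bool patched)
{
    ToyPciDrc drc = toy_attach(false, patched);
    bool released_early;

    assert(!drc.configured);        /* hotplug starts unconfigured */
    toy_device_del(&drc);           /* arrives during guest configuration */
    toy_set_isolated(&drc);         /* extra transition made by the guest */
    released_early = drc.released;
    toy_configure_connector(&drc);
    toy_set_isolated(&drc);         /* guest now handles the unplug */
    return !released_early && drc.released;
}
```

In this model toy_coldplug_unplug(false) returns false (the device is
stuck) and toy_coldplug_unplug(true) returns true, while
toy_hotplug_race_safe() returns true in both cases, since the patch
leaves the hotplug path unchanged.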

Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

>     }
> 
> > - drck->detach() now calls drc->detach_cb() that truly releases the
> >   CPU resource by getting rid of vCPU thread in QEMU.
> 
> Laurent
>
Laurent Vivier Aug. 26, 2015, 1:04 p.m. UTC | #7
On 23/08/2015 21:08, Michael Roth wrote:
> Quoting Laurent Vivier (2015-08-14 02:46:49)
>>
>>
>> On 14/08/2015 07:20, Bharata B Rao wrote:
>>> On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote:
>>>> When a device is hotplugged, attach() sets "configured" to
>>>> false, waiting an action from the OS to configure it and then
>>>> to call ibm,configure-connector. On ibm,configure-connector,
>>>> the hypervisor sets "configured" to true.
>>>>
>>>> In case of coldplugged device, attach() sets "configured" to
>>>> false, but firmware and OS never call the ibm,configure-connector
>>>> in this case, so it remains set to false.
>>>>
>>>> It could be harmless, but when we unplug a device, hypervisor
>>>> waits the device becomes configured because for it, a not configured
>>>> device is a device being configured, so it waits the end of configuration
>>>> to unplug it... and it never happens, so it is never unplugged.
>>>
>>> Not true for at least logical DR device like CPU. I am able to cleanly
>>> unplug a cold plugged CPU in the patchset I posted at:
>>>
>>> https://lists.gnu.org/archive/html/qemu-ppc/2015-08/msg00041.html
>>>
>>> And this is how the state transitions work for cold plugged CPU devices:
>>>
>>> - Cold plugged CPU DRC is explicitly set with allocation_state=USABLE
>>>   and isolation_state=UNISOLATED.
>>> - device_del results in drck->detach() that just returns by setting
>>>   drc->awaiting_release to true.
>>> - Unplug notification is sent to guest.
>>> - Guest comes back with set_indicator RTAS call for setting isolation_state
>>>   to ISOLATED. set_isolation_state() sets drc->configured to false.
>>> - Guest comes back again with set_indicator RTAS call for setting allocation
>>>   state to UNUSABLE. set_allocation_state() finalizes the device removal by
>>>   calling drck->detach()
>>
>> It doesn't work for PCI, because (QEMU 2.4.0):
>>
>> static int set_allocation_state(sPAPRDRConnector *drc,
>>                                 sPAPRDRAllocationState state)
>> ...
>>     if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
>> ...
>>             drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
>>                          drc->detach_cb_opaque, NULL);
>> ...
> 
> Ok, that makes sense then:
> 
> the is_configured() checks were added due to a race specifically with
> PCI devices: when we plug the device we hand control over to OS and set
> state to unisolated as a result. The guest assumes 'interactive' hotplug
> where it sets a slot back to isolated and waits for the user to actually
> plug it in. Once plugged in, state is moved back to isolated, and guest
> starts configuring device. We use a flag in guest drmgr invocation to skip
> the wait, but it *still* does the change to isolated state. So there's an
> extra unisolated->isolated->unisolated transition for PCI in guest code.
> 
> Because of that check, if management does a quick device_add+device_del,
> there's a race where we mark the device as awaiting_release as soon as
> the device_del comes in (even though device_add event might still be
> getting processed by guest). That would fine normally, but in this state
> a transition to isolated state results in the device getting immediately
> finalized and then disappearing while the guest is trying to configure
> it, so the extra transition in the PCI case races with device_del.
> 
> The is_configured() check removes that race window, and the check was
> added in set_isolation(). 'logical' resources (lmb/cpu/phb) get
> finalized via set_allocation() however, which is why they didn't appear
> affected by this bug. And from what I can tell, cpu/lmb don't make extra
> 'isolated'/'unallocated' transitions, just the ones at the end unplug,
> so the fact that we're missing the check in set_allocation() shouldn't
> be a problem. Makes sense to set the configured flag appropriately for
> those case as well though for consistency.
> 
> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

David or Alex, are you ready to take this patch to your -next branch ?

> 
>>     }
>>
>>> - drck->detach() now calls drc->detach_cb() that truly releases the
>>>   CPU resource by getting rid of vCPU thread in QEMU.
>>
>> Laurent
>>
>
David Gibson Sept. 1, 2015, 5 a.m. UTC | #8
On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote:
> When a device is hotplugged, attach() sets "configured" to
> false, waiting an action from the OS to configure it and then
> to call ibm,configure-connector. On ibm,configure-connector,
> the hypervisor sets "configured" to true.
> 
> In case of coldplugged device, attach() sets "configured" to
> false, but firmware and OS never call the ibm,configure-connector
> in this case, so it remains set to false.
> 
> It could be harmless, but when we unplug a device, hypervisor
> waits the device becomes configured because for it, a not configured
> device is a device being configured, so it waits the end of configuration
> to unplug it... and it never happens, so it is never unplugged.
> 
> This patch set by default coldplugged device to "configured=true",
> hotplugged device to "configured=false".
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Merged to spapr-next, thanks.

Patch

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index ee87432..e86babf 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -310,7 +310,7 @@  static void attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
     drc->dev = d;
     drc->fdt = fdt;
     drc->fdt_start_offset = fdt_start_offset;
-    drc->configured = false;
+    drc->configured = coldplug;
 
     object_property_add_link(OBJECT(drc), "device",
                              object_get_typename(OBJECT(drc->dev)),