diff mbox

[2/4] macio: add dma_active to VMStateDescription

Message ID 1452112646-15969-3-git-send-email-mark.cave-ayland@ilande.co.uk
State New
Headers show

Commit Message

Mark Cave-Ayland Jan. 6, 2016, 8:37 p.m. UTC
Make sure that we include the value of dma_active in the migration stream.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
---
 hw/ide/macio.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

John Snow Jan. 6, 2016, 8:57 p.m. UTC | #1
On 01/06/2016 03:37 PM, Mark Cave-Ayland wrote:
> Make sure that we include the value of dma_active in the migration stream.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> ---
>  hw/ide/macio.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> index 560c071..695d4d2 100644
> --- a/hw/ide/macio.c
> +++ b/hw/ide/macio.c
> @@ -518,11 +518,12 @@ static const MemoryRegionOps pmac_ide_ops = {
>  
>  static const VMStateDescription vmstate_pmac = {
>      .name = "ide",
> -    .version_id = 3,
> +    .version_id = 4,
>      .minimum_version_id = 0,
>      .fields = (VMStateField[]) {
>          VMSTATE_IDE_BUS(bus, MACIOIDEState),
>          VMSTATE_IDE_DRIVES(bus.ifs, MACIOIDEState),
> +        VMSTATE_BOOL(dma_active, MACIOIDEState),
>          VMSTATE_END_OF_LIST()
>      }
>  };
> 

Did you wind up ever observing this value to be non-zero when it was
written to the migration stream?

I really did think that we should be able to assume this was always
false due to how migration will drain all outstanding AIO, but maybe I
am mistaken.

--js
Mark Cave-Ayland Jan. 6, 2016, 9:17 p.m. UTC | #2
On 06/01/16 20:57, John Snow wrote:

> On 01/06/2016 03:37 PM, Mark Cave-Ayland wrote:
>> Make sure that we include the value of dma_active in the migration stream.
>>
>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>> ---
>>  hw/ide/macio.c |    3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
>> index 560c071..695d4d2 100644
>> --- a/hw/ide/macio.c
>> +++ b/hw/ide/macio.c
>> @@ -518,11 +518,12 @@ static const MemoryRegionOps pmac_ide_ops = {
>>  
>>  static const VMStateDescription vmstate_pmac = {
>>      .name = "ide",
>> -    .version_id = 3,
>> +    .version_id = 4,
>>      .minimum_version_id = 0,
>>      .fields = (VMStateField[]) {
>>          VMSTATE_IDE_BUS(bus, MACIOIDEState),
>>          VMSTATE_IDE_DRIVES(bus.ifs, MACIOIDEState),
>> +        VMSTATE_BOOL(dma_active, MACIOIDEState),
>>          VMSTATE_END_OF_LIST()
>>      }
>>  };
>>
> 
> Did you wind up ever observing this value to be non-zero when it was
> written to the migration stream?
> 
> I really did think that we should be able to assume this was always
> false due to how migration will drain all outstanding AIO, but maybe I
> am mistaken.

I think this can happen because Darwin/MacOS sets the DBDMA processor
running first *before* the IDE request is issued, compared to pretty
much every other OS which issues the IDE request *first* which then in
turn invokes the DMA engine (which is the general assumption in the QEMU
IDE/DMA APIs).

So there could be a window where the DBDMA is programmed and active but
migration takes place before the corresponding IDE request has been
issued (which is exactly the situation that this flag handles).


ATB,

Mark.
John Snow Jan. 8, 2016, 8:55 p.m. UTC | #3
On 01/06/2016 04:17 PM, Mark Cave-Ayland wrote:
> On 06/01/16 20:57, John Snow wrote:
> 
>> On 01/06/2016 03:37 PM, Mark Cave-Ayland wrote:
>>> Make sure that we include the value of dma_active in the migration stream.
>>>
>>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>> ---
>>>  hw/ide/macio.c |    3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
>>> index 560c071..695d4d2 100644
>>> --- a/hw/ide/macio.c
>>> +++ b/hw/ide/macio.c
>>> @@ -518,11 +518,12 @@ static const MemoryRegionOps pmac_ide_ops = {
>>>  
>>>  static const VMStateDescription vmstate_pmac = {
>>>      .name = "ide",
>>> -    .version_id = 3,
>>> +    .version_id = 4,
>>>      .minimum_version_id = 0,
>>>      .fields = (VMStateField[]) {
>>>          VMSTATE_IDE_BUS(bus, MACIOIDEState),
>>>          VMSTATE_IDE_DRIVES(bus.ifs, MACIOIDEState),
>>> +        VMSTATE_BOOL(dma_active, MACIOIDEState),
>>>          VMSTATE_END_OF_LIST()
>>>      }
>>>  };
>>>
>>
>> Did you wind up ever observing this value to be non-zero when it was
>> written to the migration stream?
>>
>> I really did think that we should be able to assume this was always
>> false due to how migration will drain all outstanding AIO, but maybe I
>> am mistaken.
> 
> I think this can happen because Darwin/MacOS sets the DBDMA processor
> running first *before* the IDE request is issued, compared to pretty
> much every other OS which issues the IDE request *first* which then in
> turn invokes the DMA engine (which is the general assumption in the QEMU
> IDE/DMA APIs).
> 
> So there could be a window where the DBDMA is programmed and active but
> migration takes place before the corresponding IDE request has been
> issued (which is exactly the situation that this flag handles).
> 
> 
> ATB,
> 
> Mark.
> 

sadly that seems to be the case. ide_dbdma_start looks like it can yield
through DBDMA_kick, so there's time for things to go awry.

Acked-by: John Snow <jsnow@redhat.com>

I had an off-list discussion with David Gilbert on how the migration
fields work here -- this will introduce a hard incompatibility between
pre-2.5 and post-2.5, which might be fine since Mac has never really
quite worked correctly anyway.

If you want to worry about compatibility, David advised me that a
conditional subsection might be appropriate:

since dma_active is /usually/ false, we can use this as a flag for
deciding to migrate it or not: i.e. if it's false, we skip the field and
the receiver assumes it's false in post_load, or if we migrate to an
older version, it never has to worry about it.

If it's true, you get a migration error that says the subsection wasn't
found, but you get to try to migrate again -- it's kind of a cheesy way
to say that you can't migrate to older versions while the DMA is active.
Future versions can accept the true boolean, though.

HTH
--js
Mark Cave-Ayland Jan. 11, 2016, 11:41 p.m. UTC | #4
On 08/01/16 20:55, John Snow wrote:

> On 01/06/2016 04:17 PM, Mark Cave-Ayland wrote:
>> On 06/01/16 20:57, John Snow wrote:
>>
>>> On 01/06/2016 03:37 PM, Mark Cave-Ayland wrote:
>>>> Make sure that we include the value of dma_active in the migration stream.
>>>>
>>>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>>> ---
>>>>  hw/ide/macio.c |    3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
>>>> index 560c071..695d4d2 100644
>>>> --- a/hw/ide/macio.c
>>>> +++ b/hw/ide/macio.c
>>>> @@ -518,11 +518,12 @@ static const MemoryRegionOps pmac_ide_ops = {
>>>>  
>>>>  static const VMStateDescription vmstate_pmac = {
>>>>      .name = "ide",
>>>> -    .version_id = 3,
>>>> +    .version_id = 4,
>>>>      .minimum_version_id = 0,
>>>>      .fields = (VMStateField[]) {
>>>>          VMSTATE_IDE_BUS(bus, MACIOIDEState),
>>>>          VMSTATE_IDE_DRIVES(bus.ifs, MACIOIDEState),
>>>> +        VMSTATE_BOOL(dma_active, MACIOIDEState),
>>>>          VMSTATE_END_OF_LIST()
>>>>      }
>>>>  };
>>>>
>>>
>>> Did you wind up ever observing this value to be non-zero when it was
>>> written to the migration stream?
>>>
>>> I really did think that we should be able to assume this was always
>>> false due to how migration will drain all outstanding AIO, but maybe I
>>> am mistaken.
>>
>> I think this can happen because Darwin/MacOS sets the DBDMA processor
>> running first *before* the IDE request is issued, compared to pretty
>> much every other OS which issues the IDE request *first* which then in
>> turn invokes the DMA engine (which is the general assumption in the QEMU
>> IDE/DMA APIs).
>>
>> So there could be a window where the DBDMA is programmed and active but
>> migration takes place before the corresponding IDE request has been
>> issued (which is exactly the situation that this flag handles).
>>
>>
>> ATB,
>>
>> Mark.
>>
> 
> sadly that seems to be the case. ide_dbdma_start looks like it can yield
> through DBDMA_kick, so there's time for things to go awry.
> 
> Acked-by: John Snow <jsnow@redhat.com>
> 
> I had an off-list discussion with David Gilbert on how the migration
> fields work here -- this will introduce a hard incompatibility between
> pre-2.5 and post-2.5, which might be fine since Mac has never really
> quite worked correctly anyway.
> 
> If you want to worry about compatibility, David advised me that a
> conditional subsection might be appropriate:
> 
> since dma_active is /usually/ false, we can use this as a flag for
> deciding to migrate it or not: i.e. if it's false, we skip the field and
> the receiver assumes it's false in post_load, or if we migrate to an
> older version, it never has to worry about it.
> 
> If it's true, you get a migration error that says the subsection wasn't
> found, but you get to try to migrate again -- it's kind of a cheesy way
> to say that you can't migrate to older versions while the DMA is active.
> Future versions can accept the true boolean, though.

I'm not too worried about this since before my patchset last year then
none of the Mac machines could be migrated since version ~0.10, and even
then, only when there was no outstanding disk activity (e.g. just within
OpenBIOS).

As there are also issues with the CPU interrupt status under TCG (see my
related patchset) then the chance of getting a successful migration
before now is amazingly small. Alex, what do you think?


ATB,

Mark.
John Snow Jan. 14, 2016, 4:51 p.m. UTC | #5
On 01/11/2016 06:41 PM, Mark Cave-Ayland wrote:
> On 08/01/16 20:55, John Snow wrote:
> 
>> On 01/06/2016 04:17 PM, Mark Cave-Ayland wrote:
>>> On 06/01/16 20:57, John Snow wrote:
>>>
>>>> On 01/06/2016 03:37 PM, Mark Cave-Ayland wrote:
>>>>> Make sure that we include the value of dma_active in the migration stream.
>>>>>
>>>>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>>>> ---
>>>>>  hw/ide/macio.c |    3 ++-
>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
>>>>> index 560c071..695d4d2 100644
>>>>> --- a/hw/ide/macio.c
>>>>> +++ b/hw/ide/macio.c
>>>>> @@ -518,11 +518,12 @@ static const MemoryRegionOps pmac_ide_ops = {
>>>>>  
>>>>>  static const VMStateDescription vmstate_pmac = {
>>>>>      .name = "ide",
>>>>> -    .version_id = 3,
>>>>> +    .version_id = 4,
>>>>>      .minimum_version_id = 0,
>>>>>      .fields = (VMStateField[]) {
>>>>>          VMSTATE_IDE_BUS(bus, MACIOIDEState),
>>>>>          VMSTATE_IDE_DRIVES(bus.ifs, MACIOIDEState),
>>>>> +        VMSTATE_BOOL(dma_active, MACIOIDEState),
>>>>>          VMSTATE_END_OF_LIST()
>>>>>      }
>>>>>  };
>>>>>
>>>>
>>>> Did you wind up ever observing this value to be non-zero when it was
>>>> written to the migration stream?
>>>>
>>>> I really did think that we should be able to assume this was always
>>>> false due to how migration will drain all outstanding AIO, but maybe I
>>>> am mistaken.
>>>
>>> I think this can happen because Darwin/MacOS sets the DBDMA processor
>>> running first *before* the IDE request is issued, compared to pretty
>>> much every other OS which issues the IDE request *first* which then in
>>> turn invokes the DMA engine (which is the general assumption in the QEMU
>>> IDE/DMA APIs).
>>>
>>> So there could be a window where the DBDMA is programmed and active but
>>> migration takes place before the corresponding IDE request has been
>>> issued (which is exactly the situation that this flag handles).
>>>
>>>
>>> ATB,
>>>
>>> Mark.
>>>
>>
>> sadly that seems to be the case. ide_dbdma_start looks like it can yield
>> through DBDMA_kick, so there's time for things to go awry.
>>
>> Acked-by: John Snow <jsnow@redhat.com>
>>
>> I had an off-list discussion with David Gilbert on how the migration
>> fields work here -- this will introduce a hard incompatibility between
>> pre-2.5 and post-2.5, which might be fine since Mac has never really
>> quite worked correctly anyway.
>>
>> If you want to worry about compatibility, David advised me that a
>> conditional subsection might be appropriate:
>>
>> since dma_active is /usually/ false, we can use this as a flag for
>> deciding to migrate it or not: i.e. if it's false, we skip the field and
>> the receiver assumes it's false in post_load, or if we migrate to an
>> older version, it never has to worry about it.
>>
>> If it's true, you get a migration error that says the subsection wasn't
>> found, but you get to try to migrate again -- it's kind of a cheesy way
>> to say that you can't migrate to older versions while the DMA is active.
>> Future versions can accept the true boolean, though.
> 
> I'm not too worried about this since before my patchset last year then
> none of the Mac machines could be migrated since version ~0.10, and even
> then, only when there was no outstanding disk activity (e.g. just within
> OpenBIOS).
> 
> As there are also issues with the CPU interrupt status under TCG (see my
> related patchset) then the chance of getting a successful migration
> before now is amazingly small. Alex, what do you think?
> 
> 
> ATB,
> 
> Mark.
> 

We can revisit this once the solution for the CPU interrupt status is
nailed down :)

Broadly, you are right that this board has been pretty broken for a long
time, but it appears to be at least semi-functional in 2.5, so it might
be time to start maintaining some migration compatibility -- even if
it's just in the form of nicer error messages instead of stuff exploding.

--js
Mark Cave-Ayland Jan. 14, 2016, 8:18 p.m. UTC | #6
On 14/01/16 16:51, John Snow wrote:

> We can revisit this once the solution for the CPU interrupt status is
> nailed down :)

Sure :)

> Broadly, you are right that this board has been pretty broken for a long
> time, but it appears to be at least semi-functional in 2.5, so it might
> be time to start maintaining some migration compatibility -- even if
> it's just in the form of nicer error messages instead of stuff exploding.

I think with the interrupt status/timer issues sorted soon it's worth
maintaining migration from this release onwards, but I can say without
fear having been trying to get migration to work on various PPC/TCG
guests up until now that there is a < 10% of having a working image on
loadvm and without the macio fixes a very good chance of disk corruption
too.


ATB,

Mark.
diff mbox

Patch

diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index 560c071..695d4d2 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -518,11 +518,12 @@  static const MemoryRegionOps pmac_ide_ops = {
 
 static const VMStateDescription vmstate_pmac = {
     .name = "ide",
-    .version_id = 3,
+    .version_id = 4,
     .minimum_version_id = 0,
     .fields = (VMStateField[]) {
         VMSTATE_IDE_BUS(bus, MACIOIDEState),
         VMSTATE_IDE_DRIVES(bus.ifs, MACIOIDEState),
+        VMSTATE_BOOL(dma_active, MACIOIDEState),
         VMSTATE_END_OF_LIST()
     }
 };