Message ID | 20190715121338.20600-3-philmd@redhat.com |
---|---|
State | New |
Headers | show |
Series | hw/block/pflash_cfi01: Add DeviceReset() handler | expand |
Philippe asked me to have a look at this one, so here goes. Philippe Mathieu-Daudé <philmd@redhat.com> writes: > In the document [*] the "Read Array Flowchart", the READ_ARRAY > command has a value of 0xff. > > Use the correct value in the pflash model. > > There is no change of behavior in the guest, because: > - when the guest were sending 0xFF, the reset_flash label > was setting the command value as 0x00 > - 0x00 was used internally for READ_ARRAY *Groan* Is this cleanup, or does it fix an observable bug? > To keep migration with older versions behaving correctly, we > decide to always migrate the READ_ARRAY as 0x00. > > If the CFI open standard decide to assign a new command of value > 0x00, this model is flawed because it uses this value internally. > If a guest eventually requires this new CFI feature, a different > model will be required (or this same model but breaking backward > migration). So it is safe to keep migrating READ_ARRAY as 0x00. We could perhaps keep migration working for "benign" device states, with judicious use of subsections. We'll cross that bridge when we get to it. > [*] "Common Flash Interface (CFI) and Command Sets" > (Intel Application Note 646) > Appendix B "Basic Command Set" > > Reviewed-by: John Snow <jsnow@redhat.com> > Reviewed-by: Alistair Francis <alistair.francis@wdc.com> > Regression-tested-by: Laszlo Ersek <lersek@redhat.com> > Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> > --- > v3: Handle migrating the 'cmd' field. > v4: Handle migrating to older QEMU (Dave) > v5: Add a paragraph about why this model is flawed due to > historically using READ_ARRAY as 0x00 (Dave, Peter). > > Since Laszlo stated he did not test migration [*], I'm keeping his > test tag, because the change with v2 has no impact in the tests > he ran. > > Likewise I'm keeping John and Alistair tags, but I'd like an extra > review for the migration change, thanks! > > [*] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00679.html > --- > hw/block/pflash_cfi01.c | 57 ++++++++++++++++++++++++++++++++++------- > 1 file changed, 48 insertions(+), 9 deletions(-) > > diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c > index 9e34fd4e82..85bb2132c0 100644 > --- a/hw/block/pflash_cfi01.c > +++ b/hw/block/pflash_cfi01.c > @@ -96,6 +96,37 @@ struct PFlashCFI01 { > bool old_multiple_chip_handling; > }; > > +static int pflash_pre_save(void *opaque) > +{ > + PFlashCFI01 *s = opaque; > + > + /* > + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the > + * READ_ARRAY command. To preserve migrating to these older version, > + * always migrate the READ_ARRAY command as 0x00. > + */ > + if (s->cmd == 0xff) { > + s->cmd = 0x00; > + } > + > + return 0; > +} > + > +static int pflash_post_save(void *opaque) > +{ > + PFlashCFI01 *s = opaque; > + > + /* > + * If migration failed, the guest will continue to run. > + * Restore the correct READ_ARRAY value. > + */ > + if (s->cmd == 0x00) { > + s->cmd = 0xff; > + } > + > + return 0; > +} Uh, this gives me a queasy feeling. Perhaps David can assuage it. I figure the intent is to migrate PFlashCFI01 member @cmd value 0xFF as 0x00, for migration compatibility to and from older versions. You do this by monkey-patching it to 0x00 before migration, and to 0xFF afterwards. On the incoming side, you replace 0x00 by 0xFF, in pflash_post_load() below. Questions: * Can anything but the code that sends @cmd see the temporary 0x00 value between pflash_pre_save() and pflash_post_save() * Consider the matrix source \in { old, new } x dest \in { old, new } x @cmd on source in { 0x00, 0xFF }. What does migration put into @cmd on dest? Eight cases: source @cmd -> wire -> dest @cmd old 0x00 -> 0x00 -> old 0x00 (1) new 0xFF (2) old 0xFF -> 0xFF -> old 0xFF (3) new 0xFF (4) new 0x00 -> 0x00 -> old 0x00 (5) new 0xFF (6) new 0xFF -> 0x00 -> old 0x00 (7) new 0xFF (8) Old -> old (cases 1 and 3) is unaffected by this patch. New -> new leaves 0xFF unchanged (8). It changes 0x00 to 0xFF (6). Uh-oh. Can this happen? Rephrasing the question: can @cmd ever be 0x00 with this patch applied? Old -> new leaves 0xFF unchanged (4). It changes 0x00 to 0xFF (2), which I think is intentional. New -> old leaves 0x00 unchanged (5). It changes 0xFF to 0x00 (7), which I think is intentional. Old -> new -> old leaves 0x00 unchanged. Good. It changes 0xFF to 0x00. Uh-oh. Can @cmd ever be 0xFF before this patch? New -> old -> new leaves 0xFF unchanged. Good. It changes 0x00 to 0xFF. Same uh-oh as for new -> new. > + > static int pflash_post_load(void *opaque, int version_id); > > static const VMStateDescription vmstate_pflash = { > @@ -103,6 +134,8 @@ static const VMStateDescription vmstate_pflash = { > .version_id = 1, > .minimum_version_id = 1, > .post_load = pflash_post_load, > + .pre_save = pflash_pre_save, > + .post_save = pflash_post_save, > .fields = (VMStateField[]) { > VMSTATE_UINT8(wcycle, PFlashCFI01), > VMSTATE_UINT8(cmd, PFlashCFI01), > @@ -277,10 +310,9 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr offset, > /* This should never happen : reset state & treat it as a read */ > DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd); > pfl->wcycle = 0; > - pfl->cmd = 0; > + pfl->cmd = 0xff; > /* fall through to read code */ > - case 0x00: > - /* Flash area read */ > + case 0xff: /* Read Array */ > ret = pflash_data_read(pfl, offset, width, be); > break; On 0xFF, we no longer zap pfl->wcycle and pfl->cmd. On 0x00, we do. We zap pfl->cmd to 0xFF instead of 0x00. Same below after label error_flash and reset_flash. Related: initialization to 0xFF instead of 0x00 in pflash_cfi01_realize(). I *guess* these changes together ensure pfl->cmd can't become 0x00. Correct? > case 0x10: /* Single byte program */ > @@ -448,8 +480,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, > case 0: > /* read mode */ > switch (cmd) { > - case 0x00: /* ??? */ > - goto reset_flash; On 0x00, we now use default: goto error_flash. Can this happen? > case 0x10: /* Single Byte Program */ > case 0x40: /* Single Byte Program */ > DPRINTF("%s: Single Byte Program\n", __func__); > @@ -526,7 +556,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, > if (cmd == 0xd0) { /* confirm */ > pfl->wcycle = 0; > pfl->status |= 0x80; > - } else if (cmd == 0xff) { /* read array mode */ > + } else if (cmd == 0xff) { /* Read Array */ > goto reset_flash; > } else > goto error_flash; > @@ -553,7 +583,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, > } else if (cmd == 0x01) { > pfl->wcycle = 0; > pfl->status |= 0x80; > - } else if (cmd == 0xff) { > + } else if (cmd == 0xff) { /* read array mode */ Your new comment is phrased the way you corrected in the previous hunk. Intentional? > goto reset_flash; > } else { > DPRINTF("%s: Unknown (un)locking command\n", __func__); > @@ -645,7 +675,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, error_flash: qemu_log_mask(LOG_UNIMP, "%s: Unimplemented flash cmd sequence " "(offset " TARGET_FMT_plx ", wcycle 0x%x cmd 0x%x value 0x%x)" "\n", __func__, offset, pfl->wcycle, pfl->cmd, value); reset_flash: > trace_pflash_reset(); > memory_region_rom_device_set_romd(&pfl->mem, true); > pfl->wcycle = 0; > - pfl->cmd = 0; > + pfl->cmd = 0xff; > } > > > @@ -761,7 +791,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp) > } > > pfl->wcycle = 0; > - pfl->cmd = 0; > + pfl->cmd = 0xff; > pfl->status = 0; > /* Hardcoded CFI table */ > /* Standard "QRY" string */ > @@ -1001,5 +1031,14 @@ static int pflash_post_load(void *opaque, int version_id) > pfl->vmstate = qemu_add_vm_change_state_handler(postload_update_cb, > pfl); > } > + > + /* > + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the > + * READ_ARRAY command. > + */ > + if (pfl->cmd == 0x00) { > + pfl->cmd = 0xff; > + } > + > return 0; > }
* Markus Armbruster (armbru@redhat.com) wrote: > Philippe asked me to have a look at this one, so here goes. > > Philippe Mathieu-Daudé <philmd@redhat.com> writes: > > > In the document [*] the "Read Array Flowchart", the READ_ARRAY > > command has a value of 0xff. > > > > Use the correct value in the pflash model. > > > > There is no change of behavior in the guest, because: > > - when the guest were sending 0xFF, the reset_flash label > > was setting the command value as 0x00 > > - 0x00 was used internally for READ_ARRAY > > *Groan* > > Is this cleanup, or does it fix an observable bug? > > > To keep migration with older versions behaving correctly, we > > decide to always migrate the READ_ARRAY as 0x00. > > > > If the CFI open standard decide to assign a new command of value > > 0x00, this model is flawed because it uses this value internally. > > If a guest eventually requires this new CFI feature, a different > > model will be required (or this same model but breaking backward > > migration). So it is safe to keep migrating READ_ARRAY as 0x00. > > We could perhaps keep migration working for "benign" device states, with > judicious use of subsections. We'll cross that bridge when we get to > it. > > > [*] "Common Flash Interface (CFI) and Command Sets" > > (Intel Application Note 646) > > Appendix B "Basic Command Set" > > > > Reviewed-by: John Snow <jsnow@redhat.com> > > Reviewed-by: Alistair Francis <alistair.francis@wdc.com> > > Regression-tested-by: Laszlo Ersek <lersek@redhat.com> > > Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> > > --- > > v3: Handle migrating the 'cmd' field. > > v4: Handle migrating to older QEMU (Dave) > > v5: Add a paragraph about why this model is flawed due to > > historically using READ_ARRAY as 0x00 (Dave, Peter). > > > > Since Laszlo stated he did not test migration [*], I'm keeping his > > test tag, because the change with v2 has no impact in the tests > > he ran. > > > > Likewise I'm keeping John and Alistair tags, but I'd like an extra > > review for the migration change, thanks! > > > > [*] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00679.html > > --- > > hw/block/pflash_cfi01.c | 57 ++++++++++++++++++++++++++++++++++------- > > 1 file changed, 48 insertions(+), 9 deletions(-) > > > > diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c > > index 9e34fd4e82..85bb2132c0 100644 > > --- a/hw/block/pflash_cfi01.c > > +++ b/hw/block/pflash_cfi01.c > > @@ -96,6 +96,37 @@ struct PFlashCFI01 { > > bool old_multiple_chip_handling; > > }; > > > > +static int pflash_pre_save(void *opaque) > > +{ > > + PFlashCFI01 *s = opaque; > > + > > + /* > > + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the > > + * READ_ARRAY command. To preserve migrating to these older version, > > + * always migrate the READ_ARRAY command as 0x00. > > + */ > > + if (s->cmd == 0xff) { > > + s->cmd = 0x00; > > + } > > + > > + return 0; > > +} > > + > > +static int pflash_post_save(void *opaque) > > +{ > > + PFlashCFI01 *s = opaque; > > + > > + /* > > + * If migration failed, the guest will continue to run. > > + * Restore the correct READ_ARRAY value. > > + */ > > + if (s->cmd == 0x00) { > > + s->cmd = 0xff; > > + } > > + > > + return 0; > > +} > > Uh, this gives me a queasy feeling. Perhaps David can assuage it. See the previous 4 versions of discussion.... > I figure the intent is to migrate PFlashCFI01 member @cmd value 0xFF as > 0x00, for migration compatibility to and from older versions. > > You do this by monkey-patching it to 0x00 before migration, and to 0xFF > afterwards. On the incoming side, you replace 0x00 by 0xFF, in > pflash_post_load() below. > > Questions: > > * Can anything but the code that sends @cmd see the temporary 0x00 value > between pflash_pre_save() and pflash_post_save() It is the same pflash data structure; but all CPUs are stopped and we're just walking the list of devices serialising them; so no nothing should be seeing that value. (There is another way to do this, which is to produce a temporary structure at this point, populate the temporary structure and migrate that) Dave > * Consider the matrix source \in { old, new } x dest \in { old, new } x > @cmd on source in { 0x00, 0xFF }. What does migration put into @cmd > on dest? Eight cases: > > source @cmd -> wire -> dest @cmd > old 0x00 -> 0x00 -> old 0x00 (1) > new 0xFF (2) > old 0xFF -> 0xFF -> old 0xFF (3) > new 0xFF (4) > new 0x00 -> 0x00 -> old 0x00 (5) > new 0xFF (6) > new 0xFF -> 0x00 -> old 0x00 (7) > new 0xFF (8) > > Old -> old (cases 1 and 3) is unaffected by this patch. > > New -> new leaves 0xFF unchanged (8). It changes 0x00 to 0xFF (6). > Uh-oh. Can this happen? Rephrasing the question: can @cmd ever be > 0x00 with this patch applied? > > Old -> new leaves 0xFF unchanged (4). It changes 0x00 to 0xFF (2), > which I think is intentional. > > New -> old leaves 0x00 unchanged (5). It changes 0xFF to 0x00 (7), > which I think is intentional. > > Old -> new -> old leaves 0x00 unchanged. Good. It changes 0xFF to > 0x00. Uh-oh. Can @cmd ever be 0xFF before this patch? > > New -> old -> new leaves 0xFF unchanged. Good. It changes 0x00 to > 0xFF. Same uh-oh as for new -> new. > > > + > > static int pflash_post_load(void *opaque, int version_id); > > > > static const VMStateDescription vmstate_pflash = { > > @@ -103,6 +134,8 @@ static const VMStateDescription vmstate_pflash = { > > .version_id = 1, > > .minimum_version_id = 1, > > .post_load = pflash_post_load, > > + .pre_save = pflash_pre_save, > > + .post_save = pflash_post_save, > > .fields = (VMStateField[]) { > > VMSTATE_UINT8(wcycle, PFlashCFI01), > > VMSTATE_UINT8(cmd, PFlashCFI01), > > @@ -277,10 +310,9 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr offset, > > /* This should never happen : reset state & treat it as a read */ > > DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd); > > pfl->wcycle = 0; > > - pfl->cmd = 0; > > + pfl->cmd = 0xff; > > /* fall through to read code */ > > - case 0x00: > > - /* Flash area read */ > > + case 0xff: /* Read Array */ > > ret = pflash_data_read(pfl, offset, width, be); > > break; > > On 0xFF, we no longer zap pfl->wcycle and pfl->cmd. > > On 0x00, we do. > > We zap pfl->cmd to 0xFF instead of 0x00. Same below after label > error_flash and reset_flash. Related: initialization to 0xFF instead of > 0x00 in pflash_cfi01_realize(). I *guess* these changes together ensure > pfl->cmd can't become 0x00. Correct? > > > case 0x10: /* Single byte program */ > > @@ -448,8 +480,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, > > case 0: > > /* read mode */ > > switch (cmd) { > > - case 0x00: /* ??? */ > > - goto reset_flash; > > On 0x00, we now use default: goto error_flash. Can this happen? > > > case 0x10: /* Single Byte Program */ > > case 0x40: /* Single Byte Program */ > > DPRINTF("%s: Single Byte Program\n", __func__); > > @@ -526,7 +556,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, > > if (cmd == 0xd0) { /* confirm */ > > pfl->wcycle = 0; > > pfl->status |= 0x80; > > - } else if (cmd == 0xff) { /* read array mode */ > > + } else if (cmd == 0xff) { /* Read Array */ > > goto reset_flash; > > } else > > goto error_flash; > > @@ -553,7 +583,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, > > } else if (cmd == 0x01) { > > pfl->wcycle = 0; > > pfl->status |= 0x80; > > - } else if (cmd == 0xff) { > > + } else if (cmd == 0xff) { /* read array mode */ > > Your new comment is phrased the way you corrected in the previous hunk. > Intentional? > > > goto reset_flash; > > } else { > > DPRINTF("%s: Unknown (un)locking command\n", __func__); > > @@ -645,7 +675,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, > error_flash: > qemu_log_mask(LOG_UNIMP, "%s: Unimplemented flash cmd sequence " > "(offset " TARGET_FMT_plx ", wcycle 0x%x cmd 0x%x value 0x%x)" > "\n", __func__, offset, pfl->wcycle, pfl->cmd, value); > > reset_flash: > > trace_pflash_reset(); > > memory_region_rom_device_set_romd(&pfl->mem, true); > > pfl->wcycle = 0; > > - pfl->cmd = 0; > > + pfl->cmd = 0xff; > > } > > > > > > @@ -761,7 +791,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp) > > } > > > > pfl->wcycle = 0; > > - pfl->cmd = 0; > > + pfl->cmd = 0xff; > > pfl->status = 0; > > /* Hardcoded CFI table */ > > /* Standard "QRY" string */ > > @@ -1001,5 +1031,14 @@ static int pflash_post_load(void *opaque, int version_id) > > pfl->vmstate = qemu_add_vm_change_state_handler(postload_update_cb, > > pfl); > > } > > + > > + /* > > + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the > > + * READ_ARRAY command. > > + */ > > + if (pfl->cmd == 0x00) { > > + pfl->cmd = 0xff; > > + } > > + > > return 0; > > } -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: > * Markus Armbruster (armbru@redhat.com) wrote: >> Philippe asked me to have a look at this one, so here goes. >> >> Philippe Mathieu-Daudé <philmd@redhat.com> writes: >> >> > In the document [*] the "Read Array Flowchart", the READ_ARRAY >> > command has a value of 0xff. >> > >> > Use the correct value in the pflash model. >> > >> > There is no change of behavior in the guest, because: >> > - when the guest were sending 0xFF, the reset_flash label >> > was setting the command value as 0x00 >> > - 0x00 was used internally for READ_ARRAY >> >> *Groan* >> >> Is this cleanup, or does it fix an observable bug? >> >> > To keep migration with older versions behaving correctly, we >> > decide to always migrate the READ_ARRAY as 0x00. >> > >> > If the CFI open standard decide to assign a new command of value >> > 0x00, this model is flawed because it uses this value internally. >> > If a guest eventually requires this new CFI feature, a different >> > model will be required (or this same model but breaking backward >> > migration). So it is safe to keep migrating READ_ARRAY as 0x00. >> >> We could perhaps keep migration working for "benign" device states, with >> judicious use of subsections. We'll cross that bridge when we get to >> it. >> >> > [*] "Common Flash Interface (CFI) and Command Sets" >> > (Intel Application Note 646) >> > Appendix B "Basic Command Set" >> > >> > Reviewed-by: John Snow <jsnow@redhat.com> >> > Reviewed-by: Alistair Francis <alistair.francis@wdc.com> >> > Regression-tested-by: Laszlo Ersek <lersek@redhat.com> >> > Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> >> > --- >> > v3: Handle migrating the 'cmd' field. >> > v4: Handle migrating to older QEMU (Dave) >> > v5: Add a paragraph about why this model is flawed due to >> > historically using READ_ARRAY as 0x00 (Dave, Peter). >> > >> > Since Laszlo stated he did not test migration [*], I'm keeping his >> > test tag, because the change with v2 has no impact in the tests >> > he ran. >> > >> > Likewise I'm keeping John and Alistair tags, but I'd like an extra >> > review for the migration change, thanks! >> > >> > [*] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00679.html >> > --- >> > hw/block/pflash_cfi01.c | 57 ++++++++++++++++++++++++++++++++++------- >> > 1 file changed, 48 insertions(+), 9 deletions(-) >> > >> > diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c >> > index 9e34fd4e82..85bb2132c0 100644 >> > --- a/hw/block/pflash_cfi01.c >> > +++ b/hw/block/pflash_cfi01.c >> > @@ -96,6 +96,37 @@ struct PFlashCFI01 { >> > bool old_multiple_chip_handling; >> > }; >> > >> > +static int pflash_pre_save(void *opaque) >> > +{ >> > + PFlashCFI01 *s = opaque; >> > + >> > + /* >> > + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the >> > + * READ_ARRAY command. To preserve migrating to these older version, >> > + * always migrate the READ_ARRAY command as 0x00. >> > + */ >> > + if (s->cmd == 0xff) { >> > + s->cmd = 0x00; >> > + } >> > + >> > + return 0; >> > +} >> > + >> > +static int pflash_post_save(void *opaque) >> > +{ >> > + PFlashCFI01 *s = opaque; >> > + >> > + /* >> > + * If migration failed, the guest will continue to run. >> > + * Restore the correct READ_ARRAY value. >> > + */ >> > + if (s->cmd == 0x00) { >> > + s->cmd = 0xff; >> > + } >> > + >> > + return 0; >> > +} >> >> Uh, this gives me a queasy feeling. Perhaps David can assuage it. > > See the previous 4 versions of discussion.... Jumped in at v5; sorry about that. >> I figure the intent is to migrate PFlashCFI01 member @cmd value 0xFF as >> 0x00, for migration compatibility to and from older versions. >> >> You do this by monkey-patching it to 0x00 before migration, and to 0xFF >> afterwards. On the incoming side, you replace 0x00 by 0xFF, in >> pflash_post_load() below. >> >> Questions: >> >> * Can anything but the code that sends @cmd see the temporary 0x00 value >> between pflash_pre_save() and pflash_post_save() > > It is the same pflash data structure; but all CPUs are stopped and we're > just walking the list of devices serialising them; so no nothing should > be seeing that value. Sounds good. > (There is another way to do this, which is to produce a temporary > structure at this point, populate the temporary structure and migrate > that) Not necessary. The uh-ohs below still need assuaging, not necessarily yours. > Dave > >> * Consider the matrix source \in { old, new } x dest \in { old, new } x >> @cmd on source in { 0x00, 0xFF }. What does migration put into @cmd >> on dest? Eight cases: >> >> source @cmd -> wire -> dest @cmd >> old 0x00 -> 0x00 -> old 0x00 (1) >> new 0xFF (2) >> old 0xFF -> 0xFF -> old 0xFF (3) >> new 0xFF (4) >> new 0x00 -> 0x00 -> old 0x00 (5) >> new 0xFF (6) >> new 0xFF -> 0x00 -> old 0x00 (7) >> new 0xFF (8) >> >> Old -> old (cases 1 and 3) is unaffected by this patch. >> >> New -> new leaves 0xFF unchanged (8). It changes 0x00 to 0xFF (6). >> Uh-oh. Can this happen? Rephrasing the question: can @cmd ever be >> 0x00 with this patch applied? >> >> Old -> new leaves 0xFF unchanged (4). It changes 0x00 to 0xFF (2), >> which I think is intentional. >> >> New -> old leaves 0x00 unchanged (5). It changes 0xFF to 0x00 (7), >> which I think is intentional. >> >> Old -> new -> old leaves 0x00 unchanged. Good. It changes 0xFF to >> 0x00. Uh-oh. Can @cmd ever be 0xFF before this patch? >> >> New -> old -> new leaves 0xFF unchanged. Good. It changes 0x00 to >> 0xFF. Same uh-oh as for new -> new. >> >> > + >> > static int pflash_post_load(void *opaque, int version_id); >> > >> > static const VMStateDescription vmstate_pflash = { >> > @@ -103,6 +134,8 @@ static const VMStateDescription vmstate_pflash = { >> > .version_id = 1, >> > .minimum_version_id = 1, >> > .post_load = pflash_post_load, >> > + .pre_save = pflash_pre_save, >> > + .post_save = pflash_post_save, >> > .fields = (VMStateField[]) { >> > VMSTATE_UINT8(wcycle, PFlashCFI01), >> > VMSTATE_UINT8(cmd, PFlashCFI01), >> > @@ -277,10 +310,9 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr offset, >> > /* This should never happen : reset state & treat it as a read */ >> > DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd); >> > pfl->wcycle = 0; >> > - pfl->cmd = 0; >> > + pfl->cmd = 0xff; >> > /* fall through to read code */ >> > - case 0x00: >> > - /* Flash area read */ >> > + case 0xff: /* Read Array */ >> > ret = pflash_data_read(pfl, offset, width, be); >> > break; >> >> On 0xFF, we no longer zap pfl->wcycle and pfl->cmd. >> >> On 0x00, we do. >> >> We zap pfl->cmd to 0xFF instead of 0x00. Same below after label >> error_flash and reset_flash. Related: initialization to 0xFF instead of >> 0x00 in pflash_cfi01_realize(). I *guess* these changes together ensure >> pfl->cmd can't become 0x00. Correct? >> >> > case 0x10: /* Single byte program */ >> > @@ -448,8 +480,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >> > case 0: >> > /* read mode */ >> > switch (cmd) { >> > - case 0x00: /* ??? */ >> > - goto reset_flash; >> >> On 0x00, we now use default: goto error_flash. Can this happen? >> >> > case 0x10: /* Single Byte Program */ >> > case 0x40: /* Single Byte Program */ >> > DPRINTF("%s: Single Byte Program\n", __func__); >> > @@ -526,7 +556,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >> > if (cmd == 0xd0) { /* confirm */ >> > pfl->wcycle = 0; >> > pfl->status |= 0x80; >> > - } else if (cmd == 0xff) { /* read array mode */ >> > + } else if (cmd == 0xff) { /* Read Array */ >> > goto reset_flash; >> > } else >> > goto error_flash; >> > @@ -553,7 +583,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >> > } else if (cmd == 0x01) { >> > pfl->wcycle = 0; >> > pfl->status |= 0x80; >> > - } else if (cmd == 0xff) { >> > + } else if (cmd == 0xff) { /* read array mode */ >> >> Your new comment is phrased the way you corrected in the previous hunk. >> Intentional? >> >> > goto reset_flash; >> > } else { >> > DPRINTF("%s: Unknown (un)locking command\n", __func__); >> > @@ -645,7 +675,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >> error_flash: >> qemu_log_mask(LOG_UNIMP, "%s: Unimplemented flash cmd sequence " >> "(offset " TARGET_FMT_plx ", wcycle 0x%x cmd 0x%x value 0x%x)" >> "\n", __func__, offset, pfl->wcycle, pfl->cmd, value); >> >> reset_flash: >> > trace_pflash_reset(); >> > memory_region_rom_device_set_romd(&pfl->mem, true); >> > pfl->wcycle = 0; >> > - pfl->cmd = 0; >> > + pfl->cmd = 0xff; >> > } >> > >> > >> > @@ -761,7 +791,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp) >> > } >> > >> > pfl->wcycle = 0; >> > - pfl->cmd = 0; >> > + pfl->cmd = 0xff; >> > pfl->status = 0; >> > /* Hardcoded CFI table */ >> > /* Standard "QRY" string */ >> > @@ -1001,5 +1031,14 @@ static int pflash_post_load(void *opaque, int version_id) >> > pfl->vmstate = qemu_add_vm_change_state_handler(postload_update_cb, >> > pfl); >> > } >> > + >> > + /* >> > + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the >> > + * READ_ARRAY command. >> > + */ >> > + if (pfl->cmd == 0x00) { >> > + pfl->cmd = 0xff; >> > + } >> > + >> > return 0; >> > } > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Hi Markus. On 7/16/19 5:12 PM, Markus Armbruster wrote: > "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: > >> * Markus Armbruster (armbru@redhat.com) wrote: >>> Philippe asked me to have a look at this one, so here goes. Thanks a lot for your careful analysis. I got scared the uh-oh you raised would get this series or rework of it still refused for the release, so I went for a quick-and-dirty bugfix. Since this bugfix got merged (as commit 3a283507c0347), I can now think again about how to properly fix this (if it is fixable...). >>> Philippe Mathieu-Daudé <philmd@redhat.com> writes: >>> >>>> In the document [*] the "Read Array Flowchart", the READ_ARRAY >>>> command has a value of 0xff. >>>> >>>> Use the correct value in the pflash model. >>>> >>>> There is no change of behavior in the guest, because: >>>> - when the guest were sending 0xFF, the reset_flash label >>>> was setting the command value as 0x00 >>>> - 0x00 was used internally for READ_ARRAY >>> >>> *Groan* >>> >>> Is this cleanup, or does it fix an observable bug? Well it depends where you stand. I have a few patches on top of this adding trace events (4.2 material) and while debugging it was not making sense with the CFI specs. 1/ The guest writes 0xFF to go in READ_ARRAY mode, the model report a warning and take the switch default case which calls pflash_reset(), which happens to set the flash in READ_ARRAY. 2/ Then a later series adds the CFI specs timings (like the CFI02 model), because it would useful to test the UEFI Capsule Update feature with real-time behavior. For the 'Virt' pflash, the timing is disabled. Now running a non-Virt pflash, it becomes very slow because each time the guest goes into READ_ARRAY mode, the reset delay (which is the longest) occurs. Talking with Laszlo, I figured for 1/ instead of fixing the model, I can display 0x00 as 0xFF and ignore the pflash_reset() when the caller is not system_reset(). Dirty again. For 2/ it is not that easy, it will depends if there is more interest from the UEFI community (Intel parallel NOR flashes are used on x86 and aarch64 UEFI platforms). If we justify 1/ and 2/ are not important, then it is simply a cleanup. >>>> To keep migration with older versions behaving correctly, we >>>> decide to always migrate the READ_ARRAY as 0x00. >>>> >>>> If the CFI open standard decide to assign a new command of value >>>> 0x00, this model is flawed because it uses this value internally. >>>> If a guest eventually requires this new CFI feature, a different >>>> model will be required (or this same model but breaking backward >>>> migration). So it is safe to keep migrating READ_ARRAY as 0x00. >>> >>> We could perhaps keep migration working for "benign" device states, with >>> judicious use of subsections. We'll cross that bridge when we get to >>> it. >>> >>>> [*] "Common Flash Interface (CFI) and Command Sets" >>>> (Intel Application Note 646) >>>> Appendix B "Basic Command Set" >>>> >>>> Reviewed-by: John Snow <jsnow@redhat.com> >>>> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> >>>> Regression-tested-by: Laszlo Ersek <lersek@redhat.com> >>>> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> >>>> --- >>>> v3: Handle migrating the 'cmd' field. >>>> v4: Handle migrating to older QEMU (Dave) >>>> v5: Add a paragraph about why this model is flawed due to >>>> historically using READ_ARRAY as 0x00 (Dave, Peter). >>>> >>>> Since Laszlo stated he did not test migration [*], I'm keeping his >>>> test tag, because the change with v2 has no impact in the tests >>>> he ran. >>>> >>>> Likewise I'm keeping John and Alistair tags, but I'd like an extra >>>> review for the migration change, thanks! >>>> >>>> [*] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00679.html >>>> --- >>>> hw/block/pflash_cfi01.c | 57 ++++++++++++++++++++++++++++++++++------- >>>> 1 file changed, 48 insertions(+), 9 deletions(-) >>>> >>>> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c >>>> index 9e34fd4e82..85bb2132c0 100644 >>>> --- a/hw/block/pflash_cfi01.c >>>> +++ b/hw/block/pflash_cfi01.c >>>> @@ -96,6 +96,37 @@ struct PFlashCFI01 { >>>> bool old_multiple_chip_handling; >>>> }; >>>> >>>> +static int pflash_pre_save(void *opaque) >>>> +{ >>>> + PFlashCFI01 *s = opaque; >>>> + >>>> + /* >>>> + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the >>>> + * READ_ARRAY command. To preserve migrating to these older version, >>>> + * always migrate the READ_ARRAY command as 0x00. >>>> + */ >>>> + if (s->cmd == 0xff) { >>>> + s->cmd = 0x00; >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int pflash_post_save(void *opaque) >>>> +{ >>>> + PFlashCFI01 *s = opaque; >>>> + >>>> + /* >>>> + * If migration failed, the guest will continue to run. >>>> + * Restore the correct READ_ARRAY value. >>>> + */ >>>> + if (s->cmd == 0x00) { >>>> + s->cmd = 0xff; >>>> + } >>>> + >>>> + return 0; >>>> +} >>> >>> Uh, this gives me a queasy feeling. Perhaps David can assuage it. >> >> See the previous 4 versions of discussion.... > > Jumped in at v5; sorry about that. > >>> I figure the intent is to migrate PFlashCFI01 member @cmd value 0xFF as >>> 0x00, for migration compatibility to and from older versions. >>> >>> You do this by monkey-patching it to 0x00 before migration, and to 0xFF >>> afterwards. On the incoming side, you replace 0x00 by 0xFF, in >>> pflash_post_load() below. >>> >>> Questions: >>> >>> * Can anything but the code that sends @cmd see the temporary 0x00 value >>> between pflash_pre_save() and pflash_post_save() >> >> It is the same pflash data structure; but all CPUs are stopped and we're >> just walking the list of devices serialising them; so no nothing should >> be seeing that value. > > Sounds good. > >> (There is another way to do this, which is to produce a temporary >> structure at this point, populate the temporary structure and migrate >> that) > > Not necessary. > > The uh-ohs below still need assuaging, not necessarily yours. > >> Dave >> >>> * Consider the matrix source \in { old, new } x dest \in { old, new } x >>> @cmd on source in { 0x00, 0xFF }. What does migration put into @cmd >>> on dest? Eight cases: >>> >>> source @cmd -> wire -> dest @cmd >>> old 0x00 -> 0x00 -> old 0x00 (1) >>> new 0xFF (2) >>> old 0xFF -> 0xFF -> old 0xFF (3) >>> new 0xFF (4) >>> new 0x00 -> 0x00 -> old 0x00 (5) >>> new 0xFF (6) >>> new 0xFF -> 0x00 -> old 0x00 (7) >>> new 0xFF (8) >>> >>> Old -> old (cases 1 and 3) is unaffected by this patch. >>> >>> New -> new leaves 0xFF unchanged (8). It changes 0x00 to 0xFF (6). >>> Uh-oh. Can this happen? Rephrasing the question: can @cmd ever be >>> 0x00 with this patch applied? 0x00 is not in the spec (but as suggested Peter Maydell, the spec can eventually assign the value in the future [*]). So no guests use it. This value is only set: - when the (old) model is initialized, without any access to the guest - if the guest wrote an incorrect value, hitting the switch default case. The guest can not read this value (value internal to the state machine). So answer: Yes :( [*] However the spec has a way to announce supported features to the guest. >>> >>> Old -> new leaves 0xFF unchanged (4). It changes 0x00 to 0xFF (2), >>> which I think is intentional. >>> >>> New -> old leaves 0x00 unchanged (5). It changes 0xFF to 0x00 (7), >>> which I think is intentional. >>> >>> Old -> new -> old leaves 0x00 unchanged. Good. It changes 0xFF to >>> 0x00. Uh-oh. Can @cmd ever be 0xFF before this patch? I understand the full question as "Can @cmd ever be 0xFF [in Old] before this patch?". Answer: No, neither the guest nor the state machine can set @cmd to 0xFF in Old. >>> >>> New -> old -> new leaves 0xFF unchanged. Good. It changes 0x00 to >>> 0xFF. Same uh-oh as for new -> new. "Same uh-oh", do you mean "Can @cmd ever be 0x00 [in New] before this patch?"? Same answer: No, neither the guest nor the state machine can set @cmd to 0x00 in New. >>> >>>> + >>>> static int pflash_post_load(void *opaque, int version_id); >>>> >>>> static const VMStateDescription vmstate_pflash = { >>>> @@ -103,6 +134,8 @@ static const VMStateDescription vmstate_pflash = { >>>> .version_id = 1, >>>> .minimum_version_id = 1, >>>> .post_load = pflash_post_load, >>>> + .pre_save = pflash_pre_save, >>>> + .post_save = pflash_post_save, >>>> .fields = (VMStateField[]) { >>>> VMSTATE_UINT8(wcycle, PFlashCFI01), >>>> VMSTATE_UINT8(cmd, PFlashCFI01), >>>> @@ -277,10 +310,9 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr offset, >>>> /* This should never happen : reset state & treat it as a read */ >>>> DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd); >>>> pfl->wcycle = 0; >>>> - pfl->cmd = 0; >>>> + pfl->cmd = 0xff; >>>> /* fall through to read code */ >>>> - case 0x00: >>>> - /* Flash area read */ >>>> + case 0xff: /* Read Array */ >>>> ret = pflash_data_read(pfl, offset, width, be); >>>> break; >>> >>> On 0xFF, we no longer zap pfl->wcycle and pfl->cmd. We have 2 ways to set @cmd=0xFF. - Write 0xFF, write an invalid command, finish a multicycle operation (wcycle returns to 0): pflash_write() goto reset_flash, then calls memory_region_rom_device_set_romd(). set wcycle=0, cmd=READ_ARRAY. Next read() will be in ROMD mode, we won't reach pflash_read(). - if next access is write, we'll enter the same pflash_write(). - The /* This should never happen */ comment in pflash_read(). It might happens migrating? So we migrated crap, the guest wants to read, the crap defaults to READ_ARRAY in I/O mode. Wrong in many ways, not sure what the guest expects there, probably not ARRAY data. Anyway, we stay in this READ_ARRAY I/O mode until the guest eventually does a write access. Wrong. The case "The state machine set 0xff, let the device in I/O mode" so we expect to answer to a read() with READ_ARRAY is wrong too, the device should already be in ROMD mode. >>> >>> On 0x00, we do. Because we have no idea how we got there... Neither what we should do. >>> >>> We zap pfl->cmd to 0xFF instead of 0x00. Same below after label >>> error_flash and reset_flash. Related: initialization to 0xFF instead of >>> 0x00 in pflash_cfi01_realize(). I *guess* these changes together ensure >>> pfl->cmd can't become 0x00. Correct? >>> >>>> case 0x10: /* Single byte program */ >>>> @@ -448,8 +480,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >>>> case 0: >>>> /* read mode */ >>>> switch (cmd) { >>>> - case 0x00: /* ??? */ >>>> - goto reset_flash; >>> >>> On 0x00, we now use default: goto error_flash. Can this happen? This could happen if the the guest is writing crap, so we correctly report this as an error. >>> >>>> case 0x10: /* Single Byte Program */ >>>> case 0x40: /* Single Byte Program */ >>>> DPRINTF("%s: Single Byte Program\n", __func__); >>>> @@ -526,7 +556,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >>>> if (cmd == 0xd0) { /* confirm */ >>>> pfl->wcycle = 0; >>>> pfl->status |= 0x80; >>>> - } else if (cmd == 0xff) { /* read array mode */ >>>> + } else if (cmd == 0xff) { /* Read Array */ >>>> goto reset_flash; >>>> } else >>>> goto error_flash; >>>> @@ -553,7 +583,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >>>> } else if (cmd == 0x01) { >>>> pfl->wcycle = 0; >>>> pfl->status |= 0x80; >>>> - } else if (cmd == 0xff) { >>>> + } else if (cmd == 0xff) { /* read array mode */ >>> >>> Your new comment is phrased the way you corrected in the previous hunk. >>> Intentional? No :/ Too many rebases. >>> >>>> goto reset_flash; >>>> } else { >>>> DPRINTF("%s: Unknown (un)locking command\n", __func__); >>>> @@ -645,7 +675,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >>> error_flash: >>> qemu_log_mask(LOG_UNIMP, "%s: Unimplemented flash cmd sequence " >>> "(offset " TARGET_FMT_plx ", wcycle 0x%x cmd 0x%x value 0x%x)" >>> "\n", __func__, offset, pfl->wcycle, pfl->cmd, value); >>> >>> reset_flash: >>>> trace_pflash_reset(); >>>> memory_region_rom_device_set_romd(&pfl->mem, true); >>>> pfl->wcycle = 0; >>>> - pfl->cmd = 0; >>>> + pfl->cmd = 0xff; >>>> } >>>> >>>> >>>> @@ -761,7 +791,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp) >>>> } >>>> >>>> pfl->wcycle = 0; >>>> - pfl->cmd = 0; >>>> + pfl->cmd = 0xff; >>>> pfl->status = 0; >>>> /* Hardcoded CFI table */ >>>> /* Standard "QRY" string */ >>>> @@ -1001,5 +1031,14 @@ static int pflash_post_load(void *opaque, int version_id) >>>> pfl->vmstate = qemu_add_vm_change_state_handler(postload_update_cb, >>>> pfl); >>>> } >>>> + >>>> + /* >>>> + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the >>>> + * READ_ARRAY command. >>>> + */ >>>> + if (pfl->cmd == 0x00) { >>>> + pfl->cmd = 0xff; >>>> + } >>>> + >>>> return 0; >>>> } >> -- >> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Philippe Mathieu-Daudé <philmd@redhat.com> writes: > Hi Markus. > > On 7/16/19 5:12 PM, Markus Armbruster wrote: >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: >> >>> * Markus Armbruster (armbru@redhat.com) wrote: >>>> Philippe asked me to have a look at this one, so here goes. > > Thanks a lot for your careful analysis. > > I got scared the uh-oh you raised would get this series or rework of it > still refused for the release, so I went for a quick-and-dirty bugfix. This close to the release, minimal bug fix now and cleanup later makes lots of sense. > Since this bugfix got merged (as commit 3a283507c0347), I can now think > again about how to properly fix this (if it is fixable...). > >>>> Philippe Mathieu-Daudé <philmd@redhat.com> writes: >>>> >>>>> In the document [*] the "Read Array Flowchart", the READ_ARRAY >>>>> command has a value of 0xff. >>>>> >>>>> Use the correct value in the pflash model. >>>>> >>>>> There is no change of behavior in the guest, because: >>>>> - when the guest were sending 0xFF, the reset_flash label >>>>> was setting the command value as 0x00 >>>>> - 0x00 was used internally for READ_ARRAY >>>> >>>> *Groan* >>>> >>>> Is this cleanup, or does it fix an observable bug? > > Well it depends where you stand. > > I have a few patches on top of this adding trace events (4.2 material) > and while debugging it was not making sense with the CFI specs. > > 1/ The guest writes 0xFF to go in READ_ARRAY mode, the model report a > warning and take the switch default case which calls pflash_reset(), > which happens to set the flash in READ_ARRAY. This one, in pflash_write()? switch (pfl->wcycle) { case 0: ... ---> case 0xff: /* Read array mode */ DPRINTF("%s: Read array mode\n", __func__); goto reset_flash; ... } return; ... reset_flash: trace_pflash_reset(); memory_region_rom_device_set_romd(&pfl->mem, true); pfl->wcycle = 0; pfl->cmd = 0; I can't see a warning here. Let's ignore the tracepoint. Is the memory_region_rom_device_set_romd() appropriate for READ_ARRAY? pfl->wcycle = 0 is a no-op. pfl->cmd = 0 is part of the "use 0 instead 0f 0xFF to represent READ_ARRAY state" madness. By the way, use of tracing and DPRINTF() in the same .c is an anti-pattern. Care to convert the remaining DPRINTF() into tracepoints? Feel free to delete useless ones, if any. > 2/ Then a later series adds the CFI specs timings (like the CFI02 > model), because it would useful to test the UEFI Capsule Update feature > with real-time behavior. For the 'Virt' pflash, the timing is disabled. > Now running a non-Virt pflash, it becomes very slow because each time > the guest goes into READ_ARRAY mode, the reset delay (which is the > longest) occurs. Feels like a latent bug. Adding timing turns it into a real one. > Talking with Laszlo, I figured for 1/ instead of fixing the model, I can > display 0x00 as 0xFF and ignore the pflash_reset() when the caller is > not system_reset(). Dirty again. > > For 2/ it is not that easy, it will depends if there is more interest > from the UEFI community (Intel parallel NOR flashes are used on x86 and > aarch64 UEFI platforms). > > If we justify 1/ and 2/ are not important, then it is simply a cleanup. If it's a bug fix, have the commit message explain the bug. If it's just cleanup, heave the commit message say so. >>>>> To keep migration with older versions behaving correctly, we >>>>> decide to always migrate the READ_ARRAY as 0x00. >>>>> >>>>> If the CFI open standard decide to assign a new command of value >>>>> 0x00, this model is flawed because it uses this value internally. >>>>> If a guest eventually requires this new CFI feature, a different >>>>> model will be required (or this same model but breaking backward >>>>> migration). So it is safe to keep migrating READ_ARRAY as 0x00. >>>> >>>> We could perhaps keep migration working for "benign" device states, with >>>> judicious use of subsections. We'll cross that bridge when we get to >>>> it. >>>> >>>>> [*] "Common Flash Interface (CFI) and Command Sets" >>>>> (Intel Application Note 646) >>>>> Appendix B "Basic Command Set" >>>>> >>>>> Reviewed-by: John Snow <jsnow@redhat.com> >>>>> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> >>>>> Regression-tested-by: Laszlo Ersek <lersek@redhat.com> >>>>> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> >>>>> --- >>>>> v3: Handle migrating the 'cmd' field. >>>>> v4: Handle migrating to older QEMU (Dave) >>>>> v5: Add a paragraph about why this model is flawed due to >>>>> historically using READ_ARRAY as 0x00 (Dave, Peter). >>>>> >>>>> Since Laszlo stated he did not test migration [*], I'm keeping his >>>>> test tag, because the change with v2 has no impact in the tests >>>>> he ran. >>>>> >>>>> Likewise I'm keeping John and Alistair tags, but I'd like an extra >>>>> review for the migration change, thanks! >>>>> >>>>> [*] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00679.html >>>>> --- >>>>> hw/block/pflash_cfi01.c | 57 ++++++++++++++++++++++++++++++++++------- >>>>> 1 file changed, 48 insertions(+), 9 deletions(-) >>>>> >>>>> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c >>>>> index 9e34fd4e82..85bb2132c0 100644 >>>>> --- a/hw/block/pflash_cfi01.c >>>>> +++ b/hw/block/pflash_cfi01.c >>>>> @@ -96,6 +96,37 @@ struct PFlashCFI01 { >>>>> bool old_multiple_chip_handling; >>>>> }; >>>>> >>>>> +static int pflash_pre_save(void *opaque) >>>>> +{ >>>>> + PFlashCFI01 *s = opaque; >>>>> + >>>>> + /* >>>>> + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the >>>>> + * READ_ARRAY command. To preserve migrating to these older version, >>>>> + * always migrate the READ_ARRAY command as 0x00. >>>>> + */ >>>>> + if (s->cmd == 0xff) { >>>>> + s->cmd = 0x00; >>>>> + } >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>>>> +static int pflash_post_save(void *opaque) >>>>> +{ >>>>> + PFlashCFI01 *s = opaque; >>>>> + >>>>> + /* >>>>> + * If migration failed, the guest will continue to run. >>>>> + * Restore the correct READ_ARRAY value. >>>>> + */ >>>>> + if (s->cmd == 0x00) { >>>>> + s->cmd = 0xff; >>>>> + } >>>>> + >>>>> + return 0; >>>>> +} >>>> >>>> Uh, this gives me a queasy feeling. Perhaps David can assuage it. >>> >>> See the previous 4 versions of discussion.... >> >> Jumped in at v5; sorry about that. >> >>>> I figure the intent is to migrate PFlashCFI01 member @cmd value 0xFF as >>>> 0x00, for migration compatibility to and from older versions. >>>> >>>> You do this by monkey-patching it to 0x00 before migration, and to 0xFF >>>> afterwards. On the incoming side, you replace 0x00 by 0xFF, in >>>> pflash_post_load() below. >>>> >>>> Questions: >>>> >>>> * Can anything but the code that sends @cmd see the temporary 0x00 value >>>> between pflash_pre_save() and pflash_post_save() >>> >>> It is the same pflash data structure; but all CPUs are stopped and we're >>> just walking the list of devices serialising them; so no nothing should >>> be seeing that value. >> >> Sounds good. >> >>> (There is another way to do this, which is to produce a temporary >>> structure at this point, populate the temporary structure and migrate >>> that) >> >> Not necessary. >> >> The uh-ohs below still need assuaging, not necessarily yours. >> >>> Dave >>> >>>> * Consider the matrix source \in { old, new } x dest \in { old, new } x >>>> @cmd on source in { 0x00, 0xFF }. What does migration put into @cmd >>>> on dest? Eight cases: >>>> >>>> source @cmd -> wire -> dest @cmd >>>> old 0x00 -> 0x00 -> old 0x00 (1) >>>> new 0xFF (2) >>>> old 0xFF -> 0xFF -> old 0xFF (3) >>>> new 0xFF (4) >>>> new 0x00 -> 0x00 -> old 0x00 (5) >>>> new 0xFF (6) >>>> new 0xFF -> 0x00 -> old 0x00 (7) >>>> new 0xFF (8) >>>> >>>> Old -> old (cases 1 and 3) is unaffected by this patch. >>>> >>>> New -> new leaves 0xFF unchanged (8). It changes 0x00 to 0xFF (6). >>>> Uh-oh. Can this happen? Rephrasing the question: can @cmd ever be >>>> 0x00 with this patch applied? > > 0x00 is not in the spec (but as suggested Peter Maydell, the spec can > eventually assign the value in the future [*]). So no guests use it. > This value is only set: > - when the (old) model is initialized, without any access to the guest > - if the guest wrote an incorrect value, hitting the switch default case. > The guest can not read this value (value internal to the state machine). > > So answer: Yes :( Could zero @cmd be avoided? > [*] However the spec has a way to announce supported features to the guest. > >>>> >>>> Old -> new leaves 0xFF unchanged (4). It changes 0x00 to 0xFF (2), >>>> which I think is intentional. >>>> >>>> New -> old leaves 0x00 unchanged (5). It changes 0xFF to 0x00 (7), >>>> which I think is intentional. >>>> >>>> Old -> new -> old leaves 0x00 unchanged. Good. It changes 0xFF to >>>> 0x00. Uh-oh. Can @cmd ever be 0xFF before this patch? > > I understand the full question as "Can @cmd ever be 0xFF [in Old] before > this patch?". > > Answer: No, neither the guest nor the state machine can set @cmd to 0xFF > in Old. Good. >>>> >>>> New -> old -> new leaves 0xFF unchanged. Good. It changes 0x00 to >>>> 0xFF. Same uh-oh as for new -> new. > > "Same uh-oh", do you mean "Can @cmd ever be 0x00 [in New] before this > patch?"? Uh, isn't "[in New] before this patch" is a contradiction? Old = QEMU before this patch New = QEMU with this patch applied New -> old -> new = migrate from QEMU with the patch to QEMU without the patch, then migrate again to QEMU with the patch. If @cmd can be 0x00 initially, then the first migration is (5), and @cmd remains 0x00. The second migration is (2), and @cmd becomes 0xFF. The two migrations together change device state, which could be bad. > Same answer: No, neither the guest nor the state machine can set @cmd to > 0x00 in New. I'm afraid you gave a different answer to the same question above: "Yes :(". >>>> >>>>> + >>>>> static int pflash_post_load(void *opaque, int version_id); >>>>> >>>>> static const VMStateDescription vmstate_pflash = { >>>>> @@ -103,6 +134,8 @@ static const VMStateDescription vmstate_pflash = { >>>>> .version_id = 1, >>>>> .minimum_version_id = 1, >>>>> .post_load = pflash_post_load, >>>>> + .pre_save = pflash_pre_save, >>>>> + .post_save = pflash_post_save, >>>>> .fields = (VMStateField[]) { >>>>> VMSTATE_UINT8(wcycle, PFlashCFI01), >>>>> VMSTATE_UINT8(cmd, PFlashCFI01), >>>>> @@ -277,10 +310,9 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr offset, >>>>> /* This should never happen : reset state & treat it as a read */ >>>>> DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd); >>>>> pfl->wcycle = 0; >>>>> - pfl->cmd = 0; >>>>> + pfl->cmd = 0xff; >>>>> /* fall through to read code */ >>>>> - case 0x00: >>>>> - /* Flash area read */ >>>>> + case 0xff: /* Read Array */ >>>>> ret = pflash_data_read(pfl, offset, width, be); >>>>> break; >>>> >>>> On 0xFF, we no longer zap pfl->wcycle and pfl->cmd. > > We have 2 ways to set @cmd=0xFF. > > - Write 0xFF, write an invalid command, > finish a multicycle operation (wcycle returns to 0): > > pflash_write() goto reset_flash, then calls > memory_region_rom_device_set_romd(). set wcycle=0, cmd=READ_ARRAY. > > Next read() will be in ROMD mode, we won't reach pflash_read(). > > - if next access is write, we'll enter the same pflash_write(). > > - The /* This should never happen */ comment in pflash_read(). > > It might happens migrating? So we migrated crap, the guest wants to > read, the crap defaults to READ_ARRAY in I/O mode. Wrong in many ways, > not sure what the guest expects there, probably not ARRAY data. > Anyway, we stay in this READ_ARRAY I/O mode until the guest eventually > does a write access. Wrong. > > The case "The state machine set 0xff, let the device in I/O mode" so we > expect to answer to a read() with READ_ARRAY is wrong too, the device > should already be in ROMD mode. > >>>> >>>> On 0x00, we do. > > Because we have no idea how we got there... Neither what we should do. The part I actually understand here is "this device model is wrong in so many ways". Is this device model fixable with reasonable effort? Would starting over be easier? >>>> >>>> We zap pfl->cmd to 0xFF instead of 0x00. Same below after label >>>> error_flash and reset_flash. Related: initialization to 0xFF instead of >>>> 0x00 in pflash_cfi01_realize(). I *guess* these changes together ensure >>>> pfl->cmd can't become 0x00. Correct? >>>> >>>>> case 0x10: /* Single byte program */ >>>>> @@ -448,8 +480,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >>>>> case 0: >>>>> /* read mode */ >>>>> switch (cmd) { >>>>> - case 0x00: /* ??? */ >>>>> - goto reset_flash; >>>> >>>> On 0x00, we now use default: goto error_flash. Can this happen? > > This could happen if the the guest is writing crap, so we correctly > report this as an error. A bug fix of sorts. >>>> >>>>> case 0x10: /* Single Byte Program */ >>>>> case 0x40: /* Single Byte Program */ >>>>> DPRINTF("%s: Single Byte Program\n", __func__); >>>>> @@ -526,7 +556,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >>>>> if (cmd == 0xd0) { /* confirm */ >>>>> pfl->wcycle = 0; >>>>> pfl->status |= 0x80; >>>>> - } else if (cmd == 0xff) { /* read array mode */ >>>>> + } else if (cmd == 0xff) { /* Read Array */ >>>>> goto reset_flash; >>>>> } else >>>>> goto error_flash; >>>>> @@ -553,7 +583,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >>>>> } else if (cmd == 0x01) { >>>>> pfl->wcycle = 0; >>>>> pfl->status |= 0x80; >>>>> - } else if (cmd == 0xff) { >>>>> + } else if (cmd == 0xff) { /* read array mode */ >>>> >>>> Your new comment is phrased the way you corrected in the previous hunk. >>>> Intentional? > > No :/ Too many rebases. Easy enough to clean up :) >>>> >>>>> goto reset_flash; >>>>> } else { >>>>> DPRINTF("%s: Unknown (un)locking command\n", __func__); >>>>> @@ -645,7 +675,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, >>>> error_flash: >>>> qemu_log_mask(LOG_UNIMP, "%s: Unimplemented flash cmd sequence " >>>> "(offset " TARGET_FMT_plx ", wcycle 0x%x cmd 0x%x value 0x%x)" >>>> "\n", __func__, offset, pfl->wcycle, pfl->cmd, value); >>>> >>>> reset_flash: >>>>> trace_pflash_reset(); >>>>> memory_region_rom_device_set_romd(&pfl->mem, true); >>>>> pfl->wcycle = 0; >>>>> - pfl->cmd = 0; >>>>> + pfl->cmd = 0xff; >>>>> } >>>>> >>>>> >>>>> @@ -761,7 +791,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp) >>>>> } >>>>> >>>>> pfl->wcycle = 0; >>>>> - pfl->cmd = 0; >>>>> + pfl->cmd = 0xff; >>>>> pfl->status = 0; >>>>> /* Hardcoded CFI table */ >>>>> /* Standard "QRY" string */ >>>>> @@ -1001,5 +1031,14 @@ static int pflash_post_load(void *opaque, int version_id) >>>>> pfl->vmstate = qemu_add_vm_change_state_handler(postload_update_cb, >>>>> pfl); >>>>> } >>>>> + >>>>> + /* >>>>> + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the >>>>> + * READ_ARRAY command. >>>>> + */ >>>>> + if (pfl->cmd == 0x00) { >>>>> + pfl->cmd = 0xff; >>>>> + } >>>>> + >>>>> return 0; >>>>> } >>> -- >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c index 9e34fd4e82..85bb2132c0 100644 --- a/hw/block/pflash_cfi01.c +++ b/hw/block/pflash_cfi01.c @@ -96,6 +96,37 @@ struct PFlashCFI01 { bool old_multiple_chip_handling; }; +static int pflash_pre_save(void *opaque) +{ + PFlashCFI01 *s = opaque; + + /* + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the + * READ_ARRAY command. To preserve migrating to these older version, + * always migrate the READ_ARRAY command as 0x00. + */ + if (s->cmd == 0xff) { + s->cmd = 0x00; + } + + return 0; +} + +static int pflash_post_save(void *opaque) +{ + PFlashCFI01 *s = opaque; + + /* + * If migration failed, the guest will continue to run. + * Restore the correct READ_ARRAY value. + */ + if (s->cmd == 0x00) { + s->cmd = 0xff; + } + + return 0; +} + static int pflash_post_load(void *opaque, int version_id); static const VMStateDescription vmstate_pflash = { @@ -103,6 +134,8 @@ static const VMStateDescription vmstate_pflash = { .version_id = 1, .minimum_version_id = 1, .post_load = pflash_post_load, + .pre_save = pflash_pre_save, + .post_save = pflash_post_save, .fields = (VMStateField[]) { VMSTATE_UINT8(wcycle, PFlashCFI01), VMSTATE_UINT8(cmd, PFlashCFI01), @@ -277,10 +310,9 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr offset, /* This should never happen : reset state & treat it as a read */ DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd); pfl->wcycle = 0; - pfl->cmd = 0; + pfl->cmd = 0xff; /* fall through to read code */ - case 0x00: - /* Flash area read */ + case 0xff: /* Read Array */ ret = pflash_data_read(pfl, offset, width, be); break; case 0x10: /* Single byte program */ @@ -448,8 +480,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, case 0: /* read mode */ switch (cmd) { - case 0x00: /* ??? */ - goto reset_flash; case 0x10: /* Single Byte Program */ case 0x40: /* Single Byte Program */ DPRINTF("%s: Single Byte Program\n", __func__); @@ -526,7 +556,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, if (cmd == 0xd0) { /* confirm */ pfl->wcycle = 0; pfl->status |= 0x80; - } else if (cmd == 0xff) { /* read array mode */ + } else if (cmd == 0xff) { /* Read Array */ goto reset_flash; } else goto error_flash; @@ -553,7 +583,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, } else if (cmd == 0x01) { pfl->wcycle = 0; pfl->status |= 0x80; - } else if (cmd == 0xff) { + } else if (cmd == 0xff) { /* read array mode */ goto reset_flash; } else { DPRINTF("%s: Unknown (un)locking command\n", __func__); @@ -645,7 +675,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, trace_pflash_reset(); memory_region_rom_device_set_romd(&pfl->mem, true); pfl->wcycle = 0; - pfl->cmd = 0; + pfl->cmd = 0xff; } @@ -761,7 +791,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp) } pfl->wcycle = 0; - pfl->cmd = 0; + pfl->cmd = 0xff; pfl->status = 0; /* Hardcoded CFI table */ /* Standard "QRY" string */ @@ -1001,5 +1031,14 @@ static int pflash_post_load(void *opaque, int version_id) pfl->vmstate = qemu_add_vm_change_state_handler(postload_update_cb, pfl); } + + /* + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the + * READ_ARRAY command. + */ + if (pfl->cmd == 0x00) { + pfl->cmd = 0xff; + } + return 0; }