Message ID | 1434984574-21037-1-git-send-email-dgilbert@redhat.com |
---|---|
State | New |
On 22.06.15 16:49, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The VMDescription section maybe after the EOF mark, the current code
> does a 'qemu_get_byte' and either gets the header byte identifying the
> description or an error (which it ignores). Doing the 'get' upsets
> RDMA which hangs on old machine types without the VMDescription.
>
> Using 'qemu_peek_byte' avoids that.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Fun. I did actually use peek at first and then figured it's the same as
read in the qemu file implementation. Have you figured out why exactly
peek does make a difference for the RDMA case?

Alex
* Alexander Graf (agraf@suse.de) wrote:
>
>
> On 22.06.15 16:49, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > The VMDescription section maybe after the EOF mark, the current code
> > does a 'qemu_get_byte' and either gets the header byte identifying the
> > description or an error (which it ignores). Doing the 'get' upsets
> > RDMA which hangs on old machine types without the VMDescription.
> >
> > Using 'qemu_peek_byte' avoids that.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Fun. I did actually use peek at first and then figured it's the same as
> read in the qemu file implementation. Have you figured out why exactly
> peek does make a difference for the RDMA case?

Hmm, no, I agree that is odd but I do need to look again at it.

I started off with a simple empty VM (no guest running) and found that it
wouldn't migrate over RDMA using machine types older than 2.3 unless I
sent the vmdesc, and this seems to fix that for me. However, I've found
another case; a busy migrate running stressapp that still fails on older
machine types.

My guess is that it depends whether we've read the data already - if
we're lucky and the data is already in the qemu-file buffer it doesn't
end up calling the RDMA code again.

Dave

>
>
> Alex
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
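
To illustrate the guess above, here is a minimal toy model of a buffered reader, written for this thread only. It is not QEMU's QEMUFile code: `ToyTransport`, `ToyFile`, `toy_fill`, `toy_peek_byte` and `toy_get_byte` are all hypothetical names, and the 4-byte buffer is arbitrary. The sketch only shows that, in a reader of this shape, peek and get behave the same with respect to the transport: either one goes back to it when the buffer is drained, and neither does when the byte is already buffered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport: serves bytes from an array and counts how often
 * the reader had to come back to it for more data. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
    int fill_calls;    /* number of times the transport was asked for data */
} ToyTransport;

/* Hypothetical buffered reader (a toy, not QEMU's QEMUFile). */
typedef struct {
    ToyTransport *t;
    uint8_t buf[4];
    size_t buf_index;  /* next unread byte in buf */
    size_t buf_size;   /* number of valid bytes in buf */
} ToyFile;

/* Refill the buffer from the transport; only legal once it is drained. */
void toy_fill(ToyFile *f)
{
    ToyTransport *t = f->t;
    size_t n = 0;

    assert(f->buf_index == f->buf_size);
    t->fill_calls++;
    while (n < sizeof(f->buf) && t->pos < t->len) {
        f->buf[n++] = t->data[t->pos++];
    }
    f->buf_index = 0;
    f->buf_size = n;
}

/* Return the next byte without consuming it, or -1 at end of data. */
int toy_peek_byte(ToyFile *f)
{
    if (f->buf_index == f->buf_size) {
        toy_fill(f);   /* a peek on an empty buffer still hits the transport */
    }
    if (f->buf_index == f->buf_size) {
        return -1;
    }
    return f->buf[f->buf_index];
}

/* Return the next byte and consume it, or -1 at end of data. */
int toy_get_byte(ToyFile *f)
{
    int b = toy_peek_byte(f);

    if (b >= 0) {
        f->buf_index++;
    }
    return b;
}
```

In this model a peek past the last buffered byte increments `fill_calls` exactly as a get would, so on a transport where that extra call blocks, swapping get for peek would only help when the byte happens to be buffered already.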
* Alexander Graf (agraf@suse.de) wrote:
>
>
> On 22.06.15 16:49, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > The VMDescription section maybe after the EOF mark, the current code
> > does a 'qemu_get_byte' and either gets the header byte identifying the
> > description or an error (which it ignores). Doing the 'get' upsets
> > RDMA which hangs on old machine types without the VMDescription.
> >
> > Using 'qemu_peek_byte' avoids that.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Fun. I did actually use peek at first and then figured it's the same as
> read in the qemu file implementation. Have you figured out why exactly
> peek does make a difference for the RDMA case?

Yeh, scrap this patch.

I've just posted

  'Only try and read a VMDescription if it should be there'

as a replacement.

Fundamentally, the trick of trying to send/read stuff after the EOF
just isn't safe on all transports. We've got to read stuff if it's
expected and only if it's expected and obey the EOF marker. If it
wasn't for keeping compatibility I'd swing this section around so it
went before the EOF, but we can't break compatibility with streams
that already have it.

Dave

>
>
> Alex
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
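
The "read it if it's expected and only if it's expected" rule can be sketched roughly as below. This is a toy under stated assumptions, not the replacement patch: `ToyStream`, `toy_read_byte`, `toy_load_trailer`, the `expect_vmdesc` flag, the `TOY_*` marker values and the one-byte length field are all made up for illustration. How the real loader decides whether a description should be present is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section markers for the toy stream. */
enum { TOY_EOF = 0x00, TOY_VMDESC = 0x06 };

/* Hypothetical in-memory stream standing in for the migration transport. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} ToyStream;

/* Read one byte, or return -1 if the stream is exhausted. */
int toy_read_byte(ToyStream *s)
{
    return s->pos < s->len ? s->data[s->pos++] : -1;
}

/*
 * Meant to be called right after the EOF marker has been consumed.
 * Returns the number of description bytes consumed, 0 if no description
 * was expected (the stream is then not touched at all), or -1 if one was
 * expected but is missing or truncated.
 */
int toy_load_trailer(ToyStream *s, bool expect_vmdesc)
{
    int size;

    assert(s != NULL);
    if (!expect_vmdesc) {
        return 0;                   /* obey EOF: no read, no peek */
    }
    if (toy_read_byte(s) != TOY_VMDESC) {
        return -1;                  /* expected, so its absence is an error */
    }
    size = toy_read_byte(s);        /* toy format: one-byte length */
    if (size < 0) {
        return -1;
    }
    for (int i = 0; i < size; i++) {
        if (toy_read_byte(s) < 0) {
            return -1;              /* truncated description */
        }
    }
    return size;
}
```

The point of the design is that when nothing is expected, the transport is never asked for another byte after EOF, so it cannot block waiting for data that the sender will never produce.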
On 23.06.15 18:37, Dr. David Alan Gilbert wrote:
> * Alexander Graf (agraf@suse.de) wrote:
>>
>>
>> On 22.06.15 16:49, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>
>>> The VMDescription section maybe after the EOF mark, the current code
>>> does a 'qemu_get_byte' and either gets the header byte identifying the
>>> description or an error (which it ignores). Doing the 'get' upsets
>>> RDMA which hangs on old machine types without the VMDescription.
>>>
>>> Using 'qemu_peek_byte' avoids that.
>>>
>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>
>> Fun. I did actually use peek at first and then figured it's the same as
>> read in the qemu file implementation. Have you figured out why exactly
>> peek does make a difference for the RDMA case?
>
> Yeh, scrap this patch.
>
> I've just posted
>
> 'Only try and read a VMDescription if it should be there'
>
> as a replacement.
> Fundamentally, the trick of trying to send/read stuff after the EOF
> just isn't safe on all transports. We've got to read stuff if it's
> expected and only if it's expected and obey the EOF marker. If it
> wasn't for keeping compatibility I'd swing this section around so it
> went before the EOF, but we can't break compatibility with streams
> that already have it.

Meh, that's truly a shame. The post-things-after-EOF-hack sounded so
great...

Alex
diff --git a/migration/savevm.c b/migration/savevm.c
index 2004dce..4bd3709 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1128,9 +1128,14 @@ int qemu_loadvm_state(QEMUFile *f)
      * Try to read in the VMDESC section as well, so that dumping tools that
      * intercept our migration stream have the chance to see it.
      */
-    if (qemu_get_byte(f) == QEMU_VM_VMDESCRIPTION) {
-        uint32_t size = qemu_get_be32(f);
+    if (qemu_peek_byte(f, 0) == QEMU_VM_VMDESCRIPTION) {
         uint8_t *buf = g_malloc(0x1000);
+        uint32_t size;
+
+        /* Consume the peeked byte */
+        size = qemu_get_byte(f);
+
+        size = qemu_get_be32(f);
 
         while (size > 0) {
             uint32_t read_chunk = MIN(size, 0x1000);