Patchwork virtio: verify that all outstanding buffers are flushed (was Re: vmstate conversion for virtio?)

Submitter Michael S. Tsirkin
Date Dec. 5, 2012, 11:08 a.m.
Message ID <20121205110807.GA10045@redhat.com>
Permalink /patch/203850/
State New

Comments

Michael S. Tsirkin - Dec. 5, 2012, 11:08 a.m.
Add sanity check to address the following concern:

On Wed, Dec 05, 2012 at 09:47:22AM +1030, Rusty Russell wrote:
> All we need is the index of the request; the rest can be re-read from
> the ring.

I'd like to point out that this is not generally
true if any available requests are outstanding.
Imagine a ring of size 4.
Below, A means available and U means used.

A 1
A 2
U 2
A 2
U 2
A 2
U 2
A 2
U 2

At this point the available ring has wrapped around, so the only
way to know that head 1 is still outstanding is for the backend
to have stored this information somewhere.

The reason we manage to migrate without tracking this in migration
state is that we flush outstanding requests before migration.
This flush is device-specific, though, so let's add
a safeguard in virtio core to ensure it's done properly.
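
The wrap-around described above can be replayed with a small toy model.
This is only a sketch, not QEMU's VirtQueue: the struct toy_vq and the
toy_* helpers are hypothetical names, and the model assumes the backend
pops every head as soon as the guest publishes it. It shows that after
the sequence A1 A2 U2 A2 U2 A2 U2 A2 U2 the available ring no longer
mentions head 1, while a backend-side in-use counter still records one
outstanding request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4

/* Toy model only; field names are illustrative, not QEMU's. */
struct toy_vq {
    uint16_t avail_ring[RING_SIZE]; /* heads published by the guest */
    uint16_t avail_idx;             /* free-running count of published heads */
    unsigned int inuse;             /* popped by the backend, not yet used */
};

/* "A n": the guest publishes head n and the backend pops it right away. */
static void toy_make_available(struct toy_vq *vq, uint16_t head)
{
    vq->avail_ring[vq->avail_idx % RING_SIZE] = head;
    vq->avail_idx++;
    vq->inuse++;
}

/* "U n": the backend completes one request and returns it to the guest. */
static void toy_mark_used(struct toy_vq *vq)
{
    assert(vq->inuse > 0);
    vq->inuse--;
}

/* Can the given head still be found by re-reading the available ring? */
static bool toy_avail_ring_contains(const struct toy_vq *vq, uint16_t head)
{
    for (int i = 0; i < RING_SIZE; i++) {
        if (vq->avail_ring[i] == head) {
            return true;
        }
    }
    return false;
}

/* Replays A1 A2 U2 A2 U2 A2 U2 A2 U2 on a ring of size 4. */
static struct toy_vq toy_replay_example(void)
{
    struct toy_vq vq = {0};

    toy_make_available(&vq, 1);      /* head 1 stays outstanding */
    for (int i = 0; i < 4; i++) {
        toy_make_available(&vq, 2);  /* A 2 */
        toy_mark_used(&vq);          /* U 2 */
    }
    return vq;
}
```

After toy_replay_example() the fifth published head has overwritten slot
0, so re-reading the ring cannot recover head 1; only the counter shows
that something is still in flight. That counter is the kind of state the
proposed safeguard checks before saving.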

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---
Rusty Russell - Dec. 6, 2012, 6:03 a.m.
"Michael S. Tsirkin" <mst@redhat.com> writes:
> Add sanity check to address the following concern:
>
> On Wed, Dec 05, 2012 at 09:47:22AM +1030, Rusty Russell wrote:
>> All we need is the index of the request; the rest can be re-read from
>> the ring.

The terminology I used here was loose, indeed.

We need the head of the chained descriptor, which we already read from
the ring when we gathered the request.

Currently we dump a massive structure; it's inelegant at the very least.

Cheers,
Rusty.
Michael S. Tsirkin - Dec. 6, 2012, 8:02 a.m.
On Thu, Dec 06, 2012 at 04:33:06PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > Add sanity check to address the following concern:
> >
> > On Wed, Dec 05, 2012 at 09:47:22AM +1030, Rusty Russell wrote:
> >> All we need is the index of the request; the rest can be re-read from
> >> the ring.
> 
> The terminology I used here was loose, indeed.
> 
> We need the head of the chained descriptor, which we already read from
> the ring when we gathered the request.

So ack that patch?

> Currently we dump a massive structure; it's inelegant at the very least.
> 
> Cheers,
> Rusty.

Hmm not sure what you refer to. I see this per ring:

        qemu_put_be32(f, vdev->vq[i].vring.num);
        qemu_put_be64(f, vdev->vq[i].pa);
        qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);

Looks like there's no way around saving these fields.
Rusty Russell - Dec. 6, 2012, 11:39 p.m.
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Dec 06, 2012 at 04:33:06PM +1030, Rusty Russell wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> > Add sanity check to address the following concern:
>> >
>> > On Wed, Dec 05, 2012 at 09:47:22AM +1030, Rusty Russell wrote:
>> >> All we need is the index of the request; the rest can be re-read from
>> >> the ring.
>> 
>> The terminology I used here was loose, indeed.
>> 
>> We need the head of the chained descriptor, which we already read from
>> the ring when we gathered the request.
>
> So ack that patch?

No, because I don't understand it.  Is it true for the case of
virtio_blk, which has outstanding requests?

>> Currently we dump a massive structure; it's inelegant at the very least.
>> 
>> Cheers,
>> Rusty.
>
> Hmm not sure what you refer to. I see this per ring:
>
>         qemu_put_be32(f, vdev->vq[i].vring.num);
>         qemu_put_be64(f, vdev->vq[i].pa);
>         qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
>
> Looks like there's no way around saving these fields.

Not what I'm referring to.  See here:

virtio.h defines a 48k structure:

#define VIRTQUEUE_MAX_SIZE 1024

typedef struct VirtQueueElement
{
    unsigned int index;
    unsigned int out_num;
    unsigned int in_num;
    hwaddr in_addr[VIRTQUEUE_MAX_SIZE];
    hwaddr out_addr[VIRTQUEUE_MAX_SIZE];
    struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
    struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
} VirtQueueElement;
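
As a rough check of the 48k figure, the structure can be measured with
sizeof. This is a sketch under assumptions that are not stated in the
thread: hwaddr is taken to be a 64-bit integer, struct iovec comes from
<sys/uio.h>, and the host is a typical 64-bit Linux system. The helper
names elem_bytes and head_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>   /* struct iovec */

#define VIRTQUEUE_MAX_SIZE 1024

typedef uint64_t hwaddr; /* assumption: 64-bit guest physical address */

typedef struct VirtQueueElement
{
    unsigned int index;
    unsigned int out_num;
    unsigned int in_num;
    hwaddr in_addr[VIRTQUEUE_MAX_SIZE];
    hwaddr out_addr[VIRTQUEUE_MAX_SIZE];
    struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
    struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
} VirtQueueElement;

/* Bytes written per request when the whole element is dumped. */
static size_t elem_bytes(void)
{
    return sizeof(VirtQueueElement);
}

/* Bytes needed to name a descriptor head in a ring of at most 1024 entries. */
static size_t head_bytes(void)
{
    return sizeof(uint16_t);
}
```

With 8-byte addresses and 16-byte iovecs the four arrays alone take
2 * 8 KiB + 2 * 16 KiB = 48 KiB, so elem_bytes() lands just above that,
against two bytes for head_bytes(). On a 32-bit host the iovec arrays
are smaller and the total differs.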

virtio-blk.c uses it in its request struct:

typedef struct VirtIOBlockReq
{
    VirtIOBlock *dev;
    VirtQueueElement elem;
    struct virtio_blk_inhdr *in;
    struct virtio_blk_outhdr *out;
    struct virtio_scsi_inhdr *scsi;
    QEMUIOVector qiov;
    struct VirtIOBlockReq *next;
    BlockAcctCookie acct;
} VirtIOBlockReq;

... and saves it in virtio_blk_save:

static void virtio_blk_save(QEMUFile *f, void *opaque)
{
    VirtIOBlock *s = opaque;
    VirtIOBlockReq *req = s->rq;

    virtio_save(&s->vdev, f);
    
    while (req) {
        qemu_put_sbyte(f, 1);
        qemu_put_buffer(f, (unsigned char*)&req->elem, sizeof(req->elem));
        req = req->next;
    }
    qemu_put_sbyte(f, 0);
}

Cheers,
Rusty.
Anthony Liguori - Dec. 10, 2012, 2:16 p.m.
Rusty Russell <rusty@rustcorp.com.au> writes:

> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
> No, because I don't understand it.  Is it true for the case of
> virtio_blk, which has outstanding requests?
>
>>> Currently we dump a massive structure; it's inelegant at the very
>>> least.

Inelegant is a kind word..

There's a couple things to consider though which is why this code hasn't
changed so far.

1) We're writing native-endian values to the wire.  This is seriously
   broken.  Just imagine trying to migrate from qemu-system-i386 on a
   big-endian box to a little-endian box.

2) Fixing (1) either means (a) breaking migration across the board
   gracefully or (b) breaking migration on [big|little] endian hosts in
   an extremely ungraceful way.

3) We send a ton of crap over the wire that is unnecessary, but we need
   to maintain it.
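
To make point (1) concrete, here is a minimal sketch contrasting the two
ways of putting a 32-bit value on the wire. The toy_* functions are
hypothetical helpers written for this illustration; they are not QEMU's
qemu_put_be32() or qemu_put_buffer(), although they mimic what those
calls do to the byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fixed wire order: most significant byte first, whatever the host is. */
static void toy_put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Inverse of toy_put_be32(); also independent of the host byte order. */
static uint32_t toy_get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/*
 * What dumping a struct with a raw buffer write amounts to: the bytes are
 * copied in host order, so the wire image depends on the sending machine.
 */
static void toy_put_native32(uint8_t out[4], uint32_t v)
{
    memcpy(out, &v, sizeof(v));
}
```

A value written with toy_put_native32() on a big-endian host and read
back with memcpy on a little-endian host comes out byte-swapped, which is
the failure mode described in (1). The toy_put_be32()/toy_get_be32() pair
round-trips on any host.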

I wrote up a patch series to try to improve the situation that I'll send
out.  I haven't gotten around to testing it with an older version of
QEMU yet.

I went for 2.b and chose to break big-endian hosts.

>>> 
>>> Cheers,
>>> Rusty.
>>
>> Hmm not sure what you refer to. I see this per ring:
>>
>>         qemu_put_be32(f, vdev->vq[i].vring.num);
>>         qemu_put_be64(f, vdev->vq[i].pa);
>>         qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
>>
>> Looks like there's no way around saving these fields.

Correct.

Regards,

Anthony Liguori

>
> Not what I'm referring to.  See here:
>
> virtio.h defines a 48k structure:
>
> #define VIRTQUEUE_MAX_SIZE 1024
>
> typedef struct VirtQueueElement
> {
>     unsigned int index;
>     unsigned int out_num;
>     unsigned int in_num;
>     hwaddr in_addr[VIRTQUEUE_MAX_SIZE];
>     hwaddr out_addr[VIRTQUEUE_MAX_SIZE];
>     struct iovec in_sg[VIRTQUEUE_MAX_SIZE];
>     struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
> } VirtQueueElement;
>
> virtio-blk.c uses it in its request struct:
>
> typedef struct VirtIOBlockReq
> {
>     VirtIOBlock *dev;
>     VirtQueueElement elem;
>     struct virtio_blk_inhdr *in;
>     struct virtio_blk_outhdr *out;
>     struct virtio_scsi_inhdr *scsi;
>     QEMUIOVector qiov;
>     struct VirtIOBlockReq *next;
>     BlockAcctCookie acct;
> } VirtIOBlockReq;
>
> ... and saves it in virtio_blk_save:
>
> static void virtio_blk_save(QEMUFile *f, void *opaque)
> {
>     VirtIOBlock *s = opaque;
>     VirtIOBlockReq *req = s->rq;
>
>     virtio_save(&s->vdev, f);
>     
>     while (req) {
>         qemu_put_sbyte(f, 1);
>         qemu_put_buffer(f, (unsigned char*)&req->elem, sizeof(req->elem));
>         req = req->next;
>     }
>     qemu_put_sbyte(f, 0);
> }
>
> Cheers,
> Rusty.
Michael S. Tsirkin - Dec. 10, 2012, 8:35 p.m.
On Wed, Dec 05, 2012 at 01:08:07PM +0200, Michael S. Tsirkin wrote:
> Add sanity check to address the following concern:
> 
> On Wed, Dec 05, 2012 at 09:47:22AM +1030, Rusty Russell wrote:
> > All we need is the index of the request; the rest can be re-read from
> > the ring.
> 
> I'd like to point out that this is not generally
> true if any available requests are outstanding.
> Imagine a ring of size 4.
> Below A means available U means used.
> 
> A 1
> A 2
> U 2
> A 2
> U 2
> A 2
> U 2
> A 2
> U 2
> 
> At this point available ring has wrapped around, the only
> way to know head 1 is outstanding is because backend
> has stored this info somewhere.
> 
> The reason we manage to migrate without tracking this in migration
> state is because we flush outstanding requests before
> migration.
> This flush is device-specific though, let's add
> a safeguard in virtio core to ensure it's done properly.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Doh, sent the wrong patch, sorry. I'll resend the right one shortly.
Please disregard.

> ---
> 
> diff --git a/hw/virtio.c b/hw/virtio.c
> index f40a8c5..b80a5a9 100644
> --- a/hw/virtio.c
> +++ b/hw/virtio.c
> @@ -788,6 +788,8 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
>          if (vdev->vq[i].vring.num == 0)
>              break;
>  
> +        assert(!vq->inuse);
> +
>          qemu_put_be32(f, vdev->vq[i].vring.num);
>          qemu_put_be64(f, vdev->vq[i].pa);
>          qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
> 
> -- 
> MST
Rusty Russell - Dec. 10, 2012, 11:54 p.m.
Anthony Liguori <anthony@codemonkey.ws> writes:

> Rusty Russell <rusty@rustcorp.com.au> writes:
>
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>
>> No, because I don't understand it.  Is it true for the case of
>> virtio_blk, which has outstanding requests?
>>
>>>> Currently we dump a massive structure; it's inelegant at the very
>>>> least.
>
> Inelegant is a kind word..
>
> There's a couple things to consider though which is why this code hasn't
> changed so far.
>
> 1) We're writing native endian values to the wire.  This is seriously
>    broken.  Just imagine trying to migrate from qemu-system-i386 on an
>    big endian box to a little endian box.
>
> 2) Fixing (1) either means (a) breaking migration across the board
>    gracefully or (b) breaking migration on [big|little] endian hosts in
>    an extremely ungraceful way.
>
> 3) We send a ton of crap over the wire that is unnecessary, but we need
>    to maintain it.
>
> I wrote up a patch series to try to improve the situation that I'll send
> out.  I haven't gotten around to testing it with an older version of
> QEMU yet.
>
> I went for 2.b and choose to break big endian hosts.

Since we only actually want to save the descriptor head, I was planning
on a new format version.  That will fix both...
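
A head-only format could look roughly like the sketch below. Everything
here is an assumption made for illustration: the function name
toy_save_heads, the marker bytes (borrowed from the existing
virtio_blk_save loop) and the big-endian 16-bit head are not an agreed
QEMU format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wire format:
 *   per in-flight request: marker byte 1, then the head as big-endian u16
 *   terminator:            marker byte 0
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t toy_save_heads(uint8_t *buf, size_t cap,
                             const uint16_t *heads, size_t n)
{
    size_t need = n * 3 + 1;
    size_t pos = 0;

    if (cap < need) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        buf[pos++] = 1;                          /* another request follows */
        buf[pos++] = (uint8_t)(heads[i] >> 8);   /* head, high byte */
        buf[pos++] = (uint8_t)(heads[i] & 0xff); /* head, low byte */
    }
    buf[pos++] = 0;                              /* end of list */
    return pos;
}
```

Each request costs three bytes in a fixed byte order instead of a dump of
the whole VirtQueueElement; the loading side would then re-read the
descriptor chain from the ring starting at each saved head.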

Look forward to your patch,
Rusty.

Patch

diff --git a/hw/virtio.c b/hw/virtio.c
index f40a8c5..b80a5a9 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -788,6 +788,8 @@  void virtio_save(VirtIODevice *vdev, QEMUFile *f)
         if (vdev->vq[i].vring.num == 0)
             break;
 
+        assert(!vq->inuse);
+
         qemu_put_be32(f, vdev->vq[i].vring.num);
         qemu_put_be64(f, vdev->vq[i].pa);
         qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);