
virtio-blk: correctly dirty guest memory

Message ID 20150402162601.GL15412@fam-t430.nay.redhat.com
State New

Commit Message

Fam Zheng April 2, 2015, 4:26 p.m. UTC
On Thu, 04/02 17:21, Paolo Bonzini wrote:
> 
> 
> On 02/04/2015 17:16, Fam Zheng wrote:
>>>>>>>> After qemu_iovec_destroy, the QEMUIOVector's size is zeroed and
>>>>>>>> the zero size ultimately is used to compute virtqueue_push's len
>>>>>>>> argument.  Therefore, reads from virtio-blk devices did not
>>>>>>>> migrate their results correctly.  (Writes were okay).
>>>>>>
>>>>>> Can't we move qemu_iovec_destroy to virtio_blk_free_request?
>>>>
>>>> You would still have to add more code to differentiate reads and
>>>> writes---I think.
>> Yeah, but the extra field will not be needed.
> 
> Can you post an alternative patch?  One small complication is that
> is_write is in mrb but not in mrb->reqs[x].  virtio_blk_rw_complete is
> already doing
> 
>     int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
>     bool is_read = !(p & VIRTIO_BLK_T_OUT);
> 
> but only in a slow path.
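
The type check Paolo quotes follows the virtio-blk spec's request encoding, where bit 0 of the type field (VIRTIO_BLK_T_OUT) marks a write. A minimal sketch of that test — `req_is_read` is an illustrative helper name, not a QEMU function:

```c
#include <stdbool.h>

/* Request type bit from the virtio-blk spec: bit 0 set means a write
 * (VIRTIO_BLK_T_OUT).  req_is_read() is an illustrative helper. */
#define VIRTIO_BLK_T_IN   0
#define VIRTIO_BLK_T_OUT  1

static bool req_is_read(unsigned int type)
{
    return !(type & VIRTIO_BLK_T_OUT);
}
```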

OK, so it looks like a new field is the simplest way to achieve this.

There is another problem with your patch - read_size is not initialized in
non-RW paths like scsi and flush.

I think the optimization for write is a separate thing, though. Shouldn't the
patch below already fix the migration issue?
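
The ordering bug described in the commit message can be modeled with a toy structure; all names below are simplified stand-ins for QEMUIOVector and virtqueue_push, not the real QEMU API:

```c
#include <stddef.h>

/* Toy stand-in for QEMUIOVector: destroy zeroes 'size', and the
 * completion path hands 'size' to virtqueue_push() as 'len'. */
typedef struct {
    size_t size;
    int nalloc;    /* -1 when the iovec borrows external memory */
} QiovModel;

static void model_destroy(QiovModel *q)
{
    q->size = 0;
    q->nalloc = -1;
}

/* The 'len' virtqueue_push() would receive, i.e. how many bytes of
 * guest memory get dirtied for migration. */
static size_t model_push_len(const QiovModel *q)
{
    return q->size;
}

/* Pre-patch order: destroy before push, so reads dirty zero bytes. */
static size_t complete_buggy(QiovModel *q)
{
    model_destroy(q);
    return model_push_len(q);
}

/* Patched order: push first, destroy afterwards. */
static size_t complete_fixed(QiovModel *q)
{
    size_t len = model_push_len(q);
    model_destroy(q);
    return len;
}
```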

Comments

Paolo Bonzini April 2, 2015, 4:36 p.m. UTC | #1
On 02/04/2015 18:26, Fam Zheng wrote:
> There is another problem with your patch - read_size is not initialized in
> non-RW paths like scsi and flush.

Right, but...

> I think the optimization for write is a separate thing, though. Shouldn't the
> patch below already fix the migration issue?

... it also doesn't cover SCSI, does it?

Paolo
Wen Congyang April 3, 2015, 1:22 a.m. UTC | #2
On 04/03/2015 12:26 AM, Fam Zheng wrote:
> On Thu, 04/02 17:21, Paolo Bonzini wrote:
>>
>>
>> On 02/04/2015 17:16, Fam Zheng wrote:
>>>>>>>>> After qemu_iovec_destroy, the QEMUIOVector's size is zeroed and
>>>>>>>>> the zero size ultimately is used to compute virtqueue_push's len
>>>>>>>>> argument.  Therefore, reads from virtio-blk devices did not
>>>>>>>>> migrate their results correctly.  (Writes were okay).
>>>>>>>
>>>>>>> Can't we move qemu_iovec_destroy to virtio_blk_free_request?
>>>>>
>>>>> You would still have to add more code to differentiate reads and
>>>>> writes---I think.
>>> Yeah, but the extra field will not be needed.
>>
>> Can you post an alternative patch?  One small complication is that
>> is_write is in mrb but not in mrb->reqs[x].  virtio_blk_rw_complete is
>> already doing
>>
>>     int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
>>     bool is_read = !(p & VIRTIO_BLK_T_OUT);
>>
>> but only in a slow path.
> 
> OK, so it looks like a new field is the simplest way to achieve this.
> 
> There is another problem with your patch - read_size is not initialized in
> non-RW paths like scsi and flush.
> 
> I think the optimization for write is a separate thing, though. Shouldn't the
> patch below already fix the migration issue?
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 000c38d..ee6e198 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -92,13 +92,6 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
>          next = req->mr_next;
>          trace_virtio_blk_rw_complete(req, ret);
>  
> -        if (req->qiov.nalloc != -1) {
> -            /* If nalloc is != 1 req->qiov is a local copy of the original
> -             * external iovec. It was allocated in submit_merged_requests
> -             * to be able to merge requests. */
> -            qemu_iovec_destroy(&req->qiov);
> -        }
> -
>          if (ret) {
>              int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
>              bool is_read = !(p & VIRTIO_BLK_T_OUT);
> @@ -109,6 +102,13 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
>  
>          virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>          block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
> +
> +        if (req->qiov.nalloc != -1) {
> +            /* This means req->qiov is a local copy of the original external
> +             * iovec. It was allocated in virtio_blk_submit_multireq in order
> +             * to merge requests. */
> +            qemu_iovec_destroy(&req->qiov);
> +        }

We will not reach this point on I/O failure, so it will cause a memory leak.

Thanks
Wen Congyang

>          virtio_blk_free_request(req);
>      }
>  }
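
Wen's objection can be sketched with a toy completion function — the names are illustrative, not QEMU's:

```c
#include <stdbool.h>

/* Toy request: 'merged_iov' mirrors req->qiov.nalloc != -1 (a local
 * copy made to merge requests); 'destroyed' records whether that copy
 * was freed. */
typedef struct {
    bool merged_iov;
    bool destroyed;
} ReqModel;

/* The patch as posted: the destroy sits after the success-path
 * completion, so an error return skips it and the copy leaks. */
static void complete_success_only(ReqModel *r, int ret)
{
    if (ret) {
        return;                 /* error handled elsewhere; no destroy */
    }
    if (r->merged_iov) {
        r->destroyed = true;
    }
}

/* Wen's point applied: free the copy on every path, e.g. from the
 * request-free helper that both success and error paths reach. */
static void complete_all_paths(ReqModel *r, int ret)
{
    (void)ret;                  /* freed regardless of I/O outcome */
    if (r->merged_iov) {
        r->destroyed = true;
    }
}
```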

Patch

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 000c38d..ee6e198 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -92,13 +92,6 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
         next = req->mr_next;
         trace_virtio_blk_rw_complete(req, ret);
 
-        if (req->qiov.nalloc != -1) {
-            /* If nalloc is != 1 req->qiov is a local copy of the original
-             * external iovec. It was allocated in submit_merged_requests
-             * to be able to merge requests. */
-            qemu_iovec_destroy(&req->qiov);
-        }
-
         if (ret) {
             int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
             bool is_read = !(p & VIRTIO_BLK_T_OUT);
@@ -109,6 +102,13 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 
         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
         block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
+
+        if (req->qiov.nalloc != -1) {
+            /* This means req->qiov is a local copy of the original external
+             * iovec. It was allocated in virtio_blk_submit_multireq in order
+             * to merge requests. */
+            qemu_iovec_destroy(&req->qiov);
+        }
         virtio_blk_free_request(req);
     }
 }