Patchwork [14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

Submitter Yoshiaki Tamura
Date Jan. 19, 2011, 5:44 a.m.
Message ID <1295415904-11918-15-git-send-email-tamura.yoshiaki@lab.ntt.co.jp>
Permalink /patch/79408/
State New

Comments

Yoshiaki Tamura - Jan. 19, 2011, 5:44 a.m.
The event-tap function is called only when event-tap is on and the
request was sent from a device emulator.

Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
---
 block.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)
Kevin Wolf - Jan. 19, 2011, 9:05 a.m.
Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
> event-tap function is called only when it is on, and requests sent
> from device emulators.
> 
> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
> ---
>  block.c |   11 +++++++++++
>  1 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index ff2795b..85bd8b8 100644
> --- a/block.c
> +++ b/block.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "module.h"
>  #include "qemu-objects.h"
> +#include "event-tap.h"
>  
>  #ifdef CONFIG_BSD
>  #include <sys/types.h>
> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>          return NULL;
>  
> +    if (bs->device_name && event_tap_is_on()) {

bs->device_name is a pointer to a char array contained in bs, so it's
never NULL. You probably mean *bs->device_name?
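
A minimal sketch of the difference, using a toy struct rather than
QEMU's BlockDriverState. The names toy_bs, toy_bs_init,
name_ptr_is_set and name_is_nonempty are hypothetical, and the 32-byte
array size is an assumption chosen only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for BlockDriverState; only the field discussed here. */
struct toy_bs {
    char device_name[32];   /* embedded array, size is an assumption */
};

/* Hypothetical helper: zero the struct and copy in a (possibly empty) name. */
void toy_bs_init(struct toy_bs *bs, const char *name)
{
    assert(name != NULL);
    memset(bs, 0, sizeof(*bs));
    strncpy(bs->device_name, name, sizeof(bs->device_name) - 1);
}

/* The check as posted: the array decays to its own address, which is
 * never NULL, so this is true for every valid bs. */
bool name_ptr_is_set(const struct toy_bs *bs)
{
    const char *p = bs->device_name;
    return p != NULL;
}

/* The suggested check: is the first character of the name non-NUL? */
bool name_is_nonempty(const struct toy_bs *bs)
{
    return *bs->device_name != '\0';
}
```

With an empty name, name_ptr_is_set() still reports true while
name_is_nonempty() reports false, which is the case the posted
condition cannot distinguish.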

Kevin
Kevin Wolf - Jan. 19, 2011, 9:47 a.m.
Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
> event-tap function is called only when it is on, and requests sent
> from device emulators.
> 
> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
> ---
>  block.c |   11 +++++++++++
>  1 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index ff2795b..85bd8b8 100644
> --- a/block.c
> +++ b/block.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "module.h"
>  #include "qemu-objects.h"
> +#include "event-tap.h"
>  
>  #ifdef CONFIG_BSD
>  #include <sys/types.h>
> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>          return NULL;
>  
> +    if (bs->device_name && event_tap_is_on()) {
> +        return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
> +                                         cb, opaque);
> +    }
> +
>      if (bs->dirty_bitmap) {
>          blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>                                           opaque);

Just noticed the context here... Does this patch break block migration
when event-tap is on?

Another question that came to my mind is if we really hook everything we
need. I think we'll need to have a hook in bdrv_flush as well. I don't
know if you do hook qemu_aio_flush and friends -  does a call cause
event-tap to flush its queue? If not, a call to qemu_aio_flush might
hang qemu because it's waiting for requests to complete which are
actually stuck in the event-tap queue.

Kevin
Yoshiaki Tamura - Jan. 19, 2011, 12:06 p.m.
2011/1/19 Kevin Wolf <kwolf@redhat.com>:
> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>> event-tap function is called only when it is on, and requests sent
>> from device emulators.
>>
>> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
>> ---
>>  block.c |   11 +++++++++++
>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index ff2795b..85bd8b8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -28,6 +28,7 @@
>>  #include "block_int.h"
>>  #include "module.h"
>>  #include "qemu-objects.h"
>> +#include "event-tap.h"
>>
>>  #ifdef CONFIG_BSD
>>  #include <sys/types.h>
>> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>>          return NULL;
>>
>> +    if (bs->device_name && event_tap_is_on()) {
>
> bs->device_name is a pointer to a char array contained in bs, so it's
> never NULL. You probably mean *bs->device_name?

Yes, thanks for pointing that out.  The bug wasn't exposed because
event_tap_is_on() was false when flushing after synchronization.

Yoshi

>
> Kevin
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Yoshiaki Tamura - Jan. 19, 2011, 1:16 p.m.
2011/1/19 Kevin Wolf <kwolf@redhat.com>:
> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>> event-tap function is called only when it is on, and requests sent
>> from device emulators.
>>
>> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
>> ---
>>  block.c |   11 +++++++++++
>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index ff2795b..85bd8b8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -28,6 +28,7 @@
>>  #include "block_int.h"
>>  #include "module.h"
>>  #include "qemu-objects.h"
>> +#include "event-tap.h"
>>
>>  #ifdef CONFIG_BSD
>>  #include <sys/types.h>
>> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>>          return NULL;
>>
>> +    if (bs->device_name && event_tap_is_on()) {
>> +        return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
>> +                                         cb, opaque);
>> +    }
>> +
>>      if (bs->dirty_bitmap) {
>>          blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>>                                           opaque);
>
> Just noticed the context here... Does this patch break block migration
> when event-tap is on?

I don't think so.  event-tap will call bdrv_aio_writev() upon
flushing requests and it shouldn't affect block-migration.  The
block written after the synchronization should be marked as dirty
and should be sent in the next round.  Am I missing the point?
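
The argument can be pictured with a toy model (not the event-tap code;
tap_on, dirty, queued, toy_aio_writev and toy_tap_flush are all
hypothetical names). It assumes, as stated elsewhere in this thread,
that event_tap_is_on() is false while the queued requests are flushed
after synchronization, so the replayed write takes the normal path and
reaches the dirty tracking:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SECTORS 64
#define TOY_QUEUE   16

static bool tap_on;                  /* stands in for event_tap_is_on() */
static bool dirty[TOY_SECTORS];      /* stands in for bs->dirty_bitmap */
static int64_t queued[TOY_QUEUE];    /* sectors held back by the tap */
static int n_queued;

/* Models the hooked bdrv_aio_writev(): queue while the tap is on,
 * otherwise fall through to the dirty tracking. */
void toy_aio_writev(int64_t sector)
{
    assert(sector >= 0 && sector < TOY_SECTORS);
    if (tap_on) {
        assert(n_queued < TOY_QUEUE);
        queued[n_queued++] = sector;
        return;
    }
    dirty[sector] = true;
}

/* Models the flush after synchronization: the tap is off, so each
 * replayed write goes through toy_aio_writev() again and is marked dirty. */
void toy_tap_flush(void)
{
    int i;

    tap_on = false;
    for (i = 0; i < n_queued; i++) {
        toy_aio_writev(queued[i]);
    }
    n_queued = 0;
}
```

In this model a write submitted while the tap is on is not marked dirty
at submission time, only once toy_tap_flush() replays it.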

> Another question that came to my mind is if we really hook everything we
> need. I think we'll need to have a hook in bdrv_flush as well. I don't
> know if you do hook qemu_aio_flush and friends -  does a call cause
> event-tap to flush its queue? If not, a call to qemu_aio_flush might
> hang qemu because it's waiting for requests to complete which are
> actually stuck in the event-tap queue.

No, it doesn't queue at event-tap.  Marcelo pointed out that we should
hook bdrv_aio_flush to avoid request inversion, which made sense
to me.  Do we need to hook bdrv_flush for the same reason?  If
we hook bdrv_flush and qemu_aio_flush, we're going to loop forever
because the synchronization code calls vm_stop, which calls
bdrv_flush_all and qemu_aio_flush.
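
One way to picture the request inversion, as a toy model under my
reading of the term and not the event-tap code (toy_tap, toy_submit and
toy_release are hypothetical names): if writes are held back but a
flush is not, the flush reaches the disk ahead of a write that was
submitted before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_op { TOY_WRITE, TOY_FLUSH };

#define TOY_MAX 8

struct toy_tap {
    bool hook_flush;              /* whether flushes are intercepted too */
    enum toy_op queue[TOY_MAX];   /* held back until synchronization */
    size_t n_queue;
    enum toy_op disk[TOY_MAX];    /* order in which ops reach the disk */
    size_t n_disk;
};

/* Writes are always queued; a flush is queued only when it is hooked,
 * otherwise it goes straight to the disk. */
void toy_submit(struct toy_tap *t, enum toy_op op)
{
    if (op == TOY_WRITE || t->hook_flush) {
        assert(t->n_queue < TOY_MAX);
        t->queue[t->n_queue++] = op;
    } else {
        assert(t->n_disk < TOY_MAX);
        t->disk[t->n_disk++] = op;
    }
}

/* After synchronization, release the queued ops in submission order. */
void toy_release(struct toy_tap *t)
{
    size_t i;

    for (i = 0; i < t->n_queue; i++) {
        assert(t->n_disk < TOY_MAX);
        t->disk[t->n_disk++] = t->queue[i];
    }
    t->n_queue = 0;
}
```

Submitting a write followed by a flush yields the disk order flush,
write when hook_flush is false, and write, flush when it is true.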

Yoshi

>
> Kevin
>
Kevin Wolf - Jan. 19, 2011, 2:08 p.m.
Am 19.01.2011 14:16, schrieb Yoshiaki Tamura:
> 2011/1/19 Kevin Wolf <kwolf@redhat.com>:
>> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>>> event-tap function is called only when it is on, and requests sent
>>> from device emulators.
>>>
>>> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
>>> ---
>>>  block.c |   11 +++++++++++
>>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index ff2795b..85bd8b8 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -28,6 +28,7 @@
>>>  #include "block_int.h"
>>>  #include "module.h"
>>>  #include "qemu-objects.h"
>>> +#include "event-tap.h"
>>>
>>>  #ifdef CONFIG_BSD
>>>  #include <sys/types.h>
>>> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>>>          return NULL;
>>>
>>> +    if (bs->device_name && event_tap_is_on()) {
>>> +        return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
>>> +                                         cb, opaque);
>>> +    }
>>> +
>>>      if (bs->dirty_bitmap) {
>>>          blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>>>                                           opaque);
>>
>> Just noticed the context here... Does this patch break block migration
>> when event-tap is on?
> 
> I don't think so.  event-tap will call bdrv_aio_writev() upon
> flushing requests and it shouldn't affect block-migration.  The
> block written after the synchronization should be marked as dirty
> and should be sent in the next round.  Am I missing the point?

No, that makes sense. I don't have a complete understanding of the whole
series yet, so there may well be more misunderstandings on my side.

>> Another question that came to my mind is if we really hook everything we
>> need. I think we'll need to have a hook in bdrv_flush as well. I don't
>> know if you do hook qemu_aio_flush and friends -  does a call cause
>> event-tap to flush its queue? If not, a call to qemu_aio_flush might
>> hang qemu because it's waiting for requests to complete which are
>> actually stuck in the event-tap queue.
> 
> No it doesn't queue at event-tap.  Marcelo pointed that we should
> hook bdrv_aio_flush to avoid requests inversion, that made sense
> to me.  Do we need to hook bdrv_flush for that same reason?  If

bdrv_flush() is the synchronous version of bdrv_aio_flush(), so in
general it seems likely that we need to do the same.

> we hook bdrv_flush and qemu_aio_flush, we're going loop forever
> because the synchronization code is calling vm_stop that call
> bdrv_flush_all and qemu_aio_flush.

qemu_aio_flush doesn't invoke any bdrv_* functions, so I don't see why
we would loop forever. It just waits for AIO requests to complete.

I just looked up the code and I think the situation is a bit different
than I thought originally: qemu_aio_flush waits only for completion of
requests which belong to a driver that has registered using
qemu_aio_set_fd_handler. So this means that AIO requests queued in
event-tap are not considered in-flight requests and we won't get stuck
in a loop. Maybe we have no problem in fact. :-)
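
As a toy model of that observation (not QEMU's implementation;
toy_handler, toy_register, toy_tap_queue and toy_aio_flush are
hypothetical names): the wait loop only consults registered handlers,
so requests parked in a separate queue neither keep it spinning nor get
completed by it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver registered with the AIO layer. */
struct toy_handler {
    int in_flight;   /* requests submitted by this driver, not yet completed */
};

#define TOY_MAX_HANDLERS 4

static struct toy_handler *handlers[TOY_MAX_HANDLERS];
static size_t n_handlers;
static int tap_queued;   /* requests parked in the tap, never registered */

void toy_register(struct toy_handler *h)
{
    assert(n_handlers < TOY_MAX_HANDLERS);
    handlers[n_handlers++] = h;
}

/* Park one request in the tap queue; the flush loop never looks at it. */
void toy_tap_queue(void)
{
    tap_queued++;
}

/* Models the wait loop: each pass completes one request per busy
 * handler. Returns the number of passes that did any work. */
int toy_aio_flush(void)
{
    int passes = 0;
    bool busy;

    do {
        size_t i;

        busy = false;
        for (i = 0; i < n_handlers; i++) {
            if (handlers[i]->in_flight > 0) {
                handlers[i]->in_flight--;   /* pretend one request completed */
                busy = true;
            }
        }
        if (busy) {
            passes++;
        }
    } while (busy);
    return passes;
}
```

The loop terminates once the registered handlers are idle, and
tap_queued is left untouched, which is the second half of the concern:
the parked requests are still outstanding afterwards.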

On the other hand, e.g. migration relies on the fact that after a
qemu_aio_flush, all AIO requests that the device model has submitted are
completed. I think event-tap must make sure that the requests which are
still queued are not left out of the migration. (Maybe the code already
considers this, I'm just writing down what comes to my mind...)

Kevin
Yoshiaki Tamura - Jan. 20, 2011, 5:01 a.m.
2011/1/19 Kevin Wolf <kwolf@redhat.com>:
> Am 19.01.2011 14:16, schrieb Yoshiaki Tamura:
>> 2011/1/19 Kevin Wolf <kwolf@redhat.com>:
>>> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>>>> event-tap function is called only when it is on, and requests sent
>>>> from device emulators.
>>>>
>>>> Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
>>>> ---
>>>>  block.c |   11 +++++++++++
>>>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index ff2795b..85bd8b8 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -28,6 +28,7 @@
>>>>  #include "block_int.h"
>>>>  #include "module.h"
>>>>  #include "qemu-objects.h"
>>>> +#include "event-tap.h"
>>>>
>>>>  #ifdef CONFIG_BSD
>>>>  #include <sys/types.h>
>>>> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>>>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>>>>          return NULL;
>>>>
>>>> +    if (bs->device_name && event_tap_is_on()) {
>>>> +        return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
>>>> +                                         cb, opaque);
>>>> +    }
>>>> +
>>>>      if (bs->dirty_bitmap) {
>>>>          blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>>>>                                           opaque);
>>>
>>> Just noticed the context here... Does this patch break block migration
>>> when event-tap is on?
>>
>> I don't think so.  event-tap will call bdrv_aio_writev() upon
>> flushing requests and it shouldn't affect block-migration.  The
>> block written after the synchronization should be marked as dirty
>> and should be sent in the next round.  Am I missing the point?
>
> No, that makes sense. I don't have a complete understanding of the whole
> series yet, so there may be well more misunderstandings on my side.

It's OK.  I'm glad that you're reviewing.

>>> Another question that came to my mind is if we really hook everything we
>>> need. I think we'll need to have a hook in bdrv_flush as well. I don't
>>> know if you do hook qemu_aio_flush and friends -  does a call cause
>>> event-tap to flush its queue? If not, a call to qemu_aio_flush might
>>> hang qemu because it's waiting for requests to complete which are
>>> actually stuck in the event-tap queue.
>>
>> No it doesn't queue at event-tap.  Marcelo pointed that we should
>> hook bdrv_aio_flush to avoid requests inversion, that made sense
>> to me.  Do we need to hook bdrv_flush for that same reason?  If
>
> bdrv_flush() is the synchronous version of bdrv_aio_flush(), so in
> general it seems likely that we need to do the same.

Hmm.  Because it's synchronous, we need to start synchronization
right away and, once that is done, flush the requests queued in
event-tap and then return.

>> we hook bdrv_flush and qemu_aio_flush, we're going loop forever
>> because the synchronization code is calling vm_stop that call
>> bdrv_flush_all and qemu_aio_flush.
>
> qemu_aio_flush doesn't invoke any bdrv_* functions, so I don't see why
> we would loop forever. It just waits for AIO requests to complete.
>
> I just looked up the code and I think the situation is a bit different
> than I thought originally: qemu_aio_flush waits only for completion of
> requests which belong to a driver that has registered using
> qemu_aio_set_fd_handler. So this means that AIO requests queued in
> event-tap are not considered in-flight requests and we won't get stuck
> in a loop. Maybe we have no problem in fact. :-)

I had the same thoughts.  We don't have to hook qemu_aio_flush.

> On the other hand, e.g. migration relies on the fact that after a
> qemu_aio_flush, all AIO requests that the device model has submitted are
> completed. I think event-tap must take care that the requests which are
> queued are not forgotten to be migrated. (Maybe the code already
> considers this, I'm just writing down what comes to my mind...)

That's where event-tap calls qemu_aio_flush.  It should be
almost the same as for live migration.  Requests queued in event-tap
are replayed on the secondary side; that is the core design of
Kemari.

Thanks,

Yoshi

>
> Kevin
>

Patch

diff --git a/block.c b/block.c
index ff2795b..85bd8b8 100644
--- a/block.c
+++ b/block.c
@@ -28,6 +28,7 @@ 
 #include "block_int.h"
 #include "module.h"
 #include "qemu-objects.h"
+#include "event-tap.h"
 
 #ifdef CONFIG_BSD
 #include <sys/types.h>
@@ -2111,6 +2112,11 @@  BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
     if (bdrv_check_request(bs, sector_num, nb_sectors))
         return NULL;
 
+    if (bs->device_name && event_tap_is_on()) {
+        return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
+                                         cb, opaque);
+    }
+
     if (bs->dirty_bitmap) {
         blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
                                          opaque);
@@ -2374,6 +2380,11 @@  BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
 
     if (!drv)
         return NULL;
+
+    if (bs->device_name && event_tap_is_on()) {
+        return event_tap_bdrv_aio_flush(bs, cb, opaque);
+    }
+
     return drv->bdrv_aio_flush(bs, cb, opaque);
 }