Patchwork Weird iscsi/fd-event issue since recent merge of event system changes

login
register
mail settings
Submitter ronniesahlberg@gmail.com
Date May 22, 2012, 9:15 a.m.
Message ID <CAN05THTh4oQeo0cRZ8bjY4i7L8TNNsY0mNJuDyjJzW7NBZ_V9g@mail.gmail.com>
Download mbox | patch
Permalink /patch/160571/
State New
Headers show

Comments

ronniesahlberg@gmail.com - May 22, 2012, 9:15 a.m.
Hi,

Now that I see what happens, I can easily workaround this in block/iscsi.c
by the patch below, but I dont know if this is the right thing to do.

It does appear that here,  when I use qemu_set_fd_handler() and add a
handler for "writeble" it takes 55ms before the event system notices
this and reacts.







On Tue, May 22, 2012 at 7:07 PM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> Hi Stefan
>
> On Mon, May 21, 2012 at 11:20 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Mon, May 21, 2012 at 12:00 PM, ronnie sahlberg
>> <ronniesahlberg@gmail.com> wrote:
>>> Yes, I use IDE since I boot from this LUN.
>>>
>>>
>>> I just managed to track it down to the IDE changes.
>>>
>>> It looks like basically this change triggered it :
>>>
>>> commit bef0fd5958120542f126f2dedbfce65d8839a94d
>>> Author: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>> Date:   Thu Mar 29 10:31:30 2012 +0100
>>>    ide: convert ide_sector_read() to asynchronous I/O
>>>
>>>
>>> Stefan, any ideas?   Ill continue trying to understand the IDE code
>>> tomorrow and maybe I make some progress.
>>
>> It sounds like aio for iSCSI is broken.  IDE submits bdrv_aio_readv()
>> requests which iSCSI handles.  QEMU main loop should find that the
>> iSCSI socket file descriptor becomes readable when the SCSI response
>> arrives.  We can then report I/O completion back to the guest.
>>
>> However, it appears that completions are not triggering.  My guess is
>> that once the OS is fully booted you don't see the problem because
>> other events are pumping the main loop besides a 55ms timer.
>> Therefore this problem probably applies across the board but it's most
>> noticable when the BIOS/bootloader/early OS is executing.
>
> I booted off a CDROM image and then had iscsi device as a disk.
> From the cdrom booted guest I then did 'dd if=/dev/sda of=/dev/null
> bs=512 iflag=direct'
> but did not see these 55ms delays anymore.
> I did however see a ~4ms delay .
>
>
>>
>> Have you added debugging into block/iscsi.c or libiscsi to watch when
>> read handlers get called?  Using tcpdump or strace might also be
>> useful to figure out when the reply becomes available and why we
>> aren't registering a read handler function with the QEMU main loop.
>
> I just did that, and I print a message and timeval to stdout for the
> functions in block/iscsi.c
> What seems to happen is that we call iscsi_aio_readv() which sets up a
> "write event"  but this event is not triggered for another 55ms.
> Since the descriptor is writeable at this point,  i would expected
> that the write_event_callback would trigger almost immediately.
>
>
> Below   iscsi_aio_readv   is printed when we enter this function.
> This function prepares a CDB and sets it up for queueing,  but it does
> not actually write it to the socket.
> At the end of this function, we set up the events  by calling
> iscsi_set_events() and we set it up for writeable.
>
> iscsi_process_write() is when we enter the fd-is-writeable callback.
>
> iscsi_aio_readv                   1337676989.429822
> iscsi_set_events   write_event?Y  1337676989.429834
> iscsi_process_write               1337676989.484424
> iscsi_set_events   write_event?N  1337676989.484507
> iscsi_process_read                1337676989.484628
> iscsi_aio_read10_cb               1337676989.484647
> iscsi_set_events   write_event?N  1337676989.484660
> iscsi_readv_writev_bh_cb          1337676989.484665
> iscsi_aio_readv                   1337676989.484751
>
>
> So  what happens seems to be that once we have set up the i/o  and
> queued it,  and we set up the "writeable" event,
> but this event is not triggered immediately! It takes another 55ms
> before the event is triggered.
>
>
> To me this looks like a bug in the eventsystem. The socket is always
> writeable. shouldnt the callback be invoked almost immediately here ?
>
>
> regards
> ronnie sahlberg
Paolo Bonzini - May 22, 2012, 9:29 a.m.
Il 22/05/2012 11:15, ronnie sahlberg ha scritto:
> Hi,
> 
> Now that I see what happens, I can easily workaround this in block/iscsi.c
> by the patch below, but I dont know if this is the right thing to do.
> 
> It does appear that here,  when I use qemu_set_fd_handler() and add a
> handler for "writeble" it takes 55ms before the event system notices
> this and reacts.
> 
> 
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index d37c4ee..1ebff0f 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun)
>  {
>      struct iscsi_context *iscsi = iscsilun->iscsi;
> 
> +    if (iscsi_which_events(iscsi) & POLLOUT) {
> +        iscsi_process_write(iscsilun);
> +    }
> +
>      qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
>                             (iscsi_which_events(iscsi) & POLLOUT)
>                             ? iscsi_process_write : NULL,

Doh, now I remember.  Whenever you change the aio handlers you need to
call qemu_notify_event() afterwards, if the handler may fire right away.
 The alternative is your patch, which is correct.  Can you resend it
with SoB?

Paolo
ronniesahlberg@gmail.com - May 22, 2012, 9:48 a.m.
On Tue, May 22, 2012 at 7:29 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 22/05/2012 11:15, ronnie sahlberg ha scritto:
>> Hi,
>>
>> Now that I see what happens, I can easily workaround this in block/iscsi.c
>> by the patch below, but I dont know if this is the right thing to do.
>>
>> It does appear that here,  when I use qemu_set_fd_handler() and add a
>> handler for "writeble" it takes 55ms before the event system notices
>> this and reacts.
>>
>>
>>
>> diff --git a/block/iscsi.c b/block/iscsi.c
>> index d37c4ee..1ebff0f 100644
>> --- a/block/iscsi.c
>> +++ b/block/iscsi.c
>> @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun)
>>  {
>>      struct iscsi_context *iscsi = iscsilun->iscsi;
>>
>> +    if (iscsi_which_events(iscsi) & POLLOUT) {
>> +        iscsi_process_write(iscsilun);
>> +    }
>> +
>>      qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
>>                             (iscsi_which_events(iscsi) & POLLOUT)
>>                             ? iscsi_process_write : NULL,
>
> Doh, now I remember.  Whenever you change the aio handlers you need to
> call qemu_notify_event() afterwards, if the handler may fire right away.

Thanks.  I just confirmed that qemu_notify_event() fixes the issue.
Ill send a patch that uses qemu_notify_event() and a comment why this is needed.


regards
ronnie sahlberg
ronniesahlberg@gmail.com - May 22, 2012, 10:03 a.m.
None of the other drivers in block/*.c call qemu_notify_event()

Do you you think those should be audited and have this call added to
where required ?


regards
ronnie sahlberg

On Tue, May 22, 2012 at 7:48 PM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> On Tue, May 22, 2012 at 7:29 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 22/05/2012 11:15, ronnie sahlberg ha scritto:
>>> Hi,
>>>
>>> Now that I see what happens, I can easily workaround this in block/iscsi.c
>>> by the patch below, but I dont know if this is the right thing to do.
>>>
>>> It does appear that here,  when I use qemu_set_fd_handler() and add a
>>> handler for "writeble" it takes 55ms before the event system notices
>>> this and reacts.
>>>
>>>
>>>
>>> diff --git a/block/iscsi.c b/block/iscsi.c
>>> index d37c4ee..1ebff0f 100644
>>> --- a/block/iscsi.c
>>> +++ b/block/iscsi.c
>>> @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun)
>>>  {
>>>      struct iscsi_context *iscsi = iscsilun->iscsi;
>>>
>>> +    if (iscsi_which_events(iscsi) & POLLOUT) {
>>> +        iscsi_process_write(iscsilun);
>>> +    }
>>> +
>>>      qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
>>>                             (iscsi_which_events(iscsi) & POLLOUT)
>>>                             ? iscsi_process_write : NULL,
>>
>> Doh, now I remember.  Whenever you change the aio handlers you need to
>> call qemu_notify_event() afterwards, if the handler may fire right away.
>
> Thanks.  I just confirmed that qemu_notify_event() fixes the issue.
> Ill send a patch that uses qemu_notify_event() and a comment why this is needed.
>
>
> regards
> ronnie sahlberg
Michael Tokarev - May 22, 2012, 4:31 p.m.
On 22.05.2012 14:03, ronnie sahlberg wrote:
[]
>>> Doh, now I remember.  Whenever you change the aio handlers you need to
>>> call qemu_notify_event() afterwards, if the handler may fire right away.
>>
>> Thanks.  I just confirmed that qemu_notify_event() fixes the issue.
>> Ill send a patch that uses qemu_notify_event() and a comment why this is needed.

When I were writing network block driver I tried to not call qemu_aio_set_fd_handler()
unless absolutely necessary -- that is, trying to write as much as possible till
the next write returns EAGAIN, or to read till the next read returns EAGAIN.  This
way there's no need to call qemu_notify_event().  FWIW.

/mjt
ronniesahlberg@gmail.com - May 25, 2012, 6:53 a.m.
On Wed, May 23, 2012 at 2:31 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
> On 22.05.2012 14:03, ronnie sahlberg wrote:
> []
>>>> Doh, now I remember.  Whenever you change the aio handlers you need to
>>>> call qemu_notify_event() afterwards, if the handler may fire right away.
>>>
>>> Thanks.  I just confirmed that qemu_notify_event() fixes the issue.
>>> Ill send a patch that uses qemu_notify_event() and a comment why this is needed.
>
> When I were writing network block driver I tried to not call qemu_aio_set_fd_handler()
> unless absolutely necessary -- that is, trying to write as much as possible till
> the next write returns EAGAIN, or to read till the next read returns EAGAIN.  This
> way there's no need to call qemu_notify_event().  FWIW.
>

Good idea. Ill create a patch that keeps track of which events we are
listening on already and only calling these functions when the events
change.

regards
ronnie sahlberg

Patch

diff --git a/block/iscsi.c b/block/iscsi.c
index d37c4ee..1ebff0f 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -105,6 +105,10 @@  iscsi_set_events(IscsiLun *iscsilun)
 {
     struct iscsi_context *iscsi = iscsilun->iscsi;

+    if (iscsi_which_events(iscsi) & POLLOUT) {
+        iscsi_process_write(iscsilun);
+    }
+
     qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
                            (iscsi_which_events(iscsi) & POLLOUT)
                            ? iscsi_process_write : NULL,