Patchwork Re: KVM call minutes for Feb 1

Submitter Jan Kiszka
Date Feb. 1, 2011, 5:03 p.m.
Message ID <4D483CCF.60009@siemens.com>
Permalink /patch/81363/
State New

Comments

Jan Kiszka - Feb. 1, 2011, 5:03 p.m.
On 2011-02-01 17:53, Anthony Liguori wrote:
> On 02/01/2011 10:36 AM, Jan Kiszka wrote:
>> On 2011-02-01 16:54, Chris Wright wrote:
>>    
>>> KVM upstream merge: status, plans, coordination
>>> - Jan has a git tree, consolidating
>>> - qemu-kvm io threading is still an issue
>>> - Anthony wants to just merge
>>>    - concerns with non-x86 arch and merge
>>>    - concerns with big-bang patch merge and following stability
>>> - post 0.14 conversion to glib mainloop, non-upstreamed qemu-kvm will be
>>>    a problem if it's not there by then
>>> - testing and nuances are still an issue (e.g. stefan berger's mmio read issue)
>>> - qemu-kvm still evolving, needs to get sync'd or it will keep diverging
>>> - 2 implementations of main init, cpu init, Jan has merged them into one
>>>    - qemu-kvm-x86.c file that's only a few hundred lines
>>> - review as one patch to see the fundamental difference
>>>      
>> More precisely, my current work flow is to pick some function(s), e.g.
>> kvm_cpu_exec/kvm_run, and start wondering "What needs to be done to
>> upstream so that qemu-kvm could use that implementation?". If they
>> differ, the reasons need to be understood and patched away, either by
>> fixing/enhancing upstream or simplifying qemu-kvm. Once the upstream
>> changes are merged back, a qemu-kvm patch is posted to switch to that
>> version.
>>
>> Any help will be welcome, either via review of my subtle regressions or
>> on resolving concrete differences.
>>
>> E.g. posix-aio-compat.c: Why does qemu-kvm differ here? If it's because
>> of its own iothread code, can we wrap that away or do we need to
>> consolidate the threading code first? Or do we need to fix something in
>> upstream?
>>    
> 
> I bet it's the eventfd thing.  It's arbitrary.  If you've got a small 
> diff post your series, I'd be happy to take a look at it and see what I 
> can explain.
> 

Looks like it's around signalfd and its emulation:

[git diff qemu/master..master posix-aio-compat.c]



Jan

Anthony Liguori - Feb. 1, 2011, 5:20 p.m.
On 02/01/2011 11:03 AM, Jan Kiszka wrote:
> On 2011-02-01 17:53, Anthony Liguori wrote:
>    
>> On 02/01/2011 10:36 AM, Jan Kiszka wrote:
>>      
>>> On 2011-02-01 16:54, Chris Wright wrote:
>>>
>>>        
>>>> KVM upstream merge: status, plans, coordination
>>>> - Jan has a git tree, consolidating
>>>> - qemu-kvm io threading is still an issue
>>>> - Anthony wants to just merge
>>>>     - concerns with non-x86 arch and merge
>>>>     - concerns with big-bang patch merge and following stability
>>>> - post 0.14 conversion to glib mainloop, non-upstreamed qemu-kvm will be
>>>>     a problem if it's not there by then
>>>> - testing and nuances are still an issue (e.g. stefan berger's mmio read issue)
>>>> - qemu-kvm still evolving, needs to get sync'd or it will keep diverging
>>>> - 2 implementations of main init, cpu init, Jan has merged them into one
>>>>     - qemu-kvm-x86.c file that's only a few hundred lines
>>>> - review as one patch to see the fundamental difference
>>>>
>>>>          
>>> More precisely, my current work flow is to pick some function(s), e.g.
>>> kvm_cpu_exec/kvm_run, and start wondering "What needs to be done to
>>> upstream so that qemu-kvm could use that implementation?". If they
>>> differ, the reasons need to be understood and patched away, either by
>>> fixing/enhancing upstream or simplifying qemu-kvm. Once the upstream
>>> changes are merged back, a qemu-kvm patch is posted to switch to that
>>> version.
>>>
>>> Any help will be welcome, either via review of my subtle regressions or
>>> on resolving concrete differences.
>>>
>>> E.g. posix-aio-compat.c: Why does qemu-kvm differ here? If it's because
>>> of its own iothread code, can we wrap that away or do we need to
>>> consolidate the threading code first? Or do we need to fix something in
>>> upstream?
>>>
>>>        
>> I bet it's the eventfd thing.  It's arbitrary.  If you've got a small
>> diff post your series, I'd be happy to take a look at it and see what I
>> can explain.
>>
>>      
> Looks like it's around signalfd and its emulation:
>    

I really meant the compatfd thing.

signalfd can't really be emulated properly so in upstream we switched to 
a pipe() which Avi didn't like.

But with glib, this all goes away anyway so we should just drop the 
qemu-kvm changes and use the upstream version.  Once we enable I/O 
thread in qemu.git, we no longer need to use signals for I/O completion 
which I think everyone would agree is a better solution.

Regards,

Anthony Liguori

Anthony Liguori - Feb. 1, 2011, 8:28 p.m.
On 02/01/2011 11:34 AM, Jan Kiszka wrote:
> On 2011-02-01 18:20, Anthony Liguori wrote:
>    
>> On 02/01/2011 11:03 AM, Jan Kiszka wrote:
>>      
>>> On 2011-02-01 17:53, Anthony Liguori wrote:
>>>
>>>        
>>>> On 02/01/2011 10:36 AM, Jan Kiszka wrote:
>>>>
>>>>          
>>>>> On 2011-02-01 16:54, Chris Wright wrote:
>>>>>
>>>>>
>>>>>            
>>>>>> KVM upstream merge: status, plans, coordination
>>>>>> - Jan has a git tree, consolidating
>>>>>> - qemu-kvm io threading is still an issue
>>>>>> - Anthony wants to just merge
>>>>>>      - concerns with non-x86 arch and merge
>>>>>>      - concerns with big-bang patch merge and following stability
>>>>>> - post 0.14 conversion to glib mainloop, non-upstreamed qemu-kvm will be
>>>>>>      a problem if it's not there by then
>>>>>> - testing and nuances are still an issue (e.g. stefan berger's mmio read issue)
>>>>>> - qemu-kvm still evolving, needs to get sync'd or it will keep diverging
>>>>>> - 2 implementations of main init, cpu init, Jan has merged them into one
>>>>>>      - qemu-kvm-x86.c file that's only a few hundred lines
>>>>>> - review as one patch to see the fundamental difference
>>>>>>
>>>>>>
>>>>>>              
>>>>> More precisely, my current work flow is to pick some function(s), e.g.
>>>>> kvm_cpu_exec/kvm_run, and start wondering "What needs to be done to
>>>>> upstream so that qemu-kvm could use that implementation?". If they
>>>>> differ, the reasons need to be understood and patched away, either by
>>>>> fixing/enhancing upstream or simplifying qemu-kvm. Once the upstream
>>>>> changes are merged back, a qemu-kvm patch is posted to switch to that
>>>>> version.
>>>>>
>>>>> Any help will be welcome, either via review of my subtle regressions or
>>>>> on resolving concrete differences.
>>>>>
>>>>> E.g. posix-aio-compat.c: Why does qemu-kvm differ here? If it's because
>>>>> of its own iothread code, can we wrap that away or do we need to
>>>>> consolidate the threading code first? Or do we need to fix something in
>>>>> upstream?
>>>>>
>>>>>
>>>>>            
>>>> I bet it's the eventfd thing.  It's arbitrary.  If you've got a small
>>>> diff post your series, I'd be happy to take a look at it and see what I
>>>> can explain.
>>>>
>>>>
>>>>          
>>> Looks like it's around signalfd and its emulation:
>>>
>>>        
>> I really meant the compatfd thing.
>>
>> signalfd can't really be emulated properly so in upstream we switched to
>> a pipe() which Avi didn't like.
>>
>> But with glib, this all goes away anyway so we should just drop the
>> qemu-kvm changes and use the upstream version.  Once we enable I/O
>> thread in qemu.git, we no longer need to use signals for I/O completion
>> which I think everyone would agree is a better solution.
>>      
> Don't understand: If we do not need SIGIO for AIO emulation in threaded
> mode, why wasn't that stubbed out already?

Historically, we used posix-aio which only notifies completion based on 
signals.

However, because of the signal/select race, there's nothing useful that 
can be done in the signal handler.  So we then added signalfd such that 
we could poll the signal safely from the select loop.

However, signalfd is only available in newer kernels, so we had been 
emulating it on older ones, and that emulation cannot be made reliable.  
So we switched to having the signal handler write to a pipe(), which 
gives us an fd-based notification mechanism.  While qemu.git made that 
change, qemu-kvm.git carried the signalfd version, probably because we 
just didn't argue about it enough back then.

Now, since we haven't used posix-aio in a very long time, there's really 
no reason to go through this signal nonsense in the first place.  We 
can just make the helper threads write to a file descriptor (eventfd or 
pipe).  At one point, that's what we did in the tree.  However, when TCG 
does TB chaining, the only thing that will break a guest out of a tight 
loop is a signal delivery.  In single threaded TCG, if the guest doesn't 
have a periodic timer enabled and issues an I/O operation, the 
signalling in posix-aio-compat would break it out of the TB loop to let 
it handle the completion.  When we got rid of it, we broke these guests 
with the symptom of I/Os not completing until you typed a key in the 
serial console.

However, once we enable the I/O thread for TCG, the I/O thread can issue 
a select() statement while the TCG thread is doing chaining.  As long as 
we send a signal to the TCG thread after select() returns and then wait 
for qemu_mutex to be released, this problem doesn't exist anymore.

So enabling the I/O thread universally means we can drop signaling in 
posix-aio.
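
As a rough illustration of the signal-free notification mentioned above
(helper threads writing to an eventfd or pipe), the sketch below uses a
Linux eventfd. It is an assumption-laden example, not code from either
tree: completion_notify, completion_drain and eventfd_demo are
hypothetical names, and the demo calls the notify function from a single
thread where real code would call it from the worker threads.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Would be called by a helper thread when a request finishes; write() on an
 * eventfd adds to its 64-bit counter. Returns 0 on success, -1 on error. */
int completion_notify(int efd)
{
    uint64_t one = 1;

    assert(efd >= 0);
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Would be called by the main loop once select()/poll() reports the fd
 * readable; read() returns the accumulated count and resets it to zero.
 * Returns the count, or -1 if nothing was pending (EAGAIN) or on error. */
int64_t completion_drain(int efd)
{
    uint64_t count;

    assert(efd >= 0);
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}

/* Hypothetical demo: three completions coalesce into a single wakeup.
 * Returns 0 if the counter behaved as described, -1 otherwise. */
int eventfd_demo(void)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    int64_t pending, again;

    if (efd == -1)
        return -1;

    if (completion_notify(efd) || completion_notify(efd) ||
        completion_notify(efd)) {
        close(efd);
        return -1;
    }

    pending = completion_drain(efd);   /* all three notifications at once */
    again = completion_drain(efd);     /* counter is now zero: EAGAIN */

    close(efd);
    return (pending == 3 && again == -1) ? 0 : -1;
}
```

No signal handler or signal mask is involved, which is why this path only
works once something else (the I/O thread, in the scenario above) takes
care of kicking a TCG thread out of chained TBs.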

Regards,

Anthony Liguori

>   If that helps reducing
> worries about the signalfd emulation (which is likely a non-issue anyway
> as anyone with serious workload should run a kernel with such support).
>
> Jan
>
>

Marcelo Tosatti - Feb. 3, 2011, 10:11 a.m.
On Tue, Feb 01, 2011 at 06:34:50PM +0100, Jan Kiszka wrote:
> On 2011-02-01 18:20, Anthony Liguori wrote:
> > On 02/01/2011 11:03 AM, Jan Kiszka wrote:
> >> On 2011-02-01 17:53, Anthony Liguori wrote:
> >>    
> >>> On 02/01/2011 10:36 AM, Jan Kiszka wrote:
> >>>      
> >>>> On 2011-02-01 16:54, Chris Wright wrote:
> >>>>
> >>>>        
> >>>>> KVM upstream merge: status, plans, coordination
> >>>>> - Jan has a git tree, consolidating
> >>>>> - qemu-kvm io threading is still an issue
> >>>>> - Anthony wants to just merge
> >>>>>     - concerns with non-x86 arch and merge
> >>>>>     - concerns with big-bang patch merge and following stability
> >>>>> - post 0.14 conversion to glib mainloop, non-upstreamed qemu-kvm will be
> >>>>>     a problem if it's not there by then
> >>>>> - testing and nuances are still an issue (e.g. stefan berger's mmio read issue)
> >>>>> - qemu-kvm still evolving, needs to get sync'd or it will keep diverging
> >>>>> - 2 implementations of main init, cpu init, Jan has merged them into one
> >>>>>     - qemu-kvm-x86.c file that's only a few hundred lines
> >>>>> - review as one patch to see the fundamental difference
> >>>>>
> >>>>>          
> >>>> More precisely, my current work flow is to pick some function(s), e.g.
> >>>> kvm_cpu_exec/kvm_run, and start wondering "What needs to be done to
> >>>> upstream so that qemu-kvm could use that implementation?". If they
> >>>> differ, the reasons need to be understood and patched away, either by
> >>>> fixing/enhancing upstream or simplifying qemu-kvm. Once the upstream
> >>>> changes are merged back, a qemu-kvm patch is posted to switch to that
> >>>> version.
> >>>>
> >>>> Any help will be welcome, either via review of my subtle regressions or
> >>>> on resolving concrete differences.
> >>>>
> >>>> E.g. posix-aio-compat.c: Why does qemu-kvm differ here? If it's because
> >>>> of its own iothread code, can we wrap that away or do we need to
> >>>> consolidate the threading code first? Or do we need to fix something in
> >>>> upstream?
> >>>>
> >>>>        
> >>> I bet it's the eventfd thing.  It's arbitrary.  If you've got a small
> >>> diff post your series, I'd be happy to take a look at it and see what I
> >>> can explain.
> >>>
> >>>      
> >> Looks like it's around signalfd and its emulation:
> >>    
> > 
> > I really meant the compatfd thing.
> > 
> > signalfd can't really be emulated properly so in upstream we switched to 
> > a pipe() which Avi didn't like.
> > 
> > But with glib, this all goes away anyway so we should just drop the 
> > qemu-kvm changes and use the upstream version.  Once we enable I/O 
> > thread in qemu.git, we no longer need to use signals for I/O completion 
> > which I think everyone would agree is a better solution.
> Don't understand: If we do not need SIGIO for AIO emulation in threaded
> mode, why wasn't that stubbed out already? If that helps reducing
> worries about the signalfd emulation (which is likely a non-issue anyway
> as anyone with serious workload should run a kernel with such support).

qemu-kvm has this modification for performance reasons.
SIGUSR2 can't be blocked otherwise. See example test case at
https://patchwork.kernel.org/patch/20817/.

Problem is that you can't block the AIO signal and process it via
signalfd because of synchronous IO emulation:

- submit io
- qemu_aio_wait

Since the aio signal is processed in main_loop_wait by the iothread, the
above deadlocks. To be more clear:

SIGUSR2 unblocked:
signal -> aio_signal_handler -> write(posix_fd)

SIGUSR2 blocked:
signal -> signalfd -> aio_signal_handler -> write(posix_fd)

It would be good to maintain this behaviour upstream, before switching
(can be selective on CONFIG_IOTHREAD), IMO.

Anthony Liguori - Feb. 3, 2011, 1:48 p.m.
On 02/03/2011 04:11 AM, Marcelo Tosatti wrote:
> On Tue, Feb 01, 2011 at 06:34:50PM +0100, Jan Kiszka wrote:
>    
>> On 2011-02-01 18:20, Anthony Liguori wrote:
>>      
>>> On 02/01/2011 11:03 AM, Jan Kiszka wrote:
>>>        
>>>> On 2011-02-01 17:53, Anthony Liguori wrote:
>>>>
>>>>          
>>>>> On 02/01/2011 10:36 AM, Jan Kiszka wrote:
>>>>>
>>>>>            
>>>>>> On 2011-02-01 16:54, Chris Wright wrote:
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> KVM upstream merge: status, plans, coordination
>>>>>>> - Jan has a git tree, consolidating
>>>>>>> - qemu-kvm io threading is still an issue
>>>>>>> - Anthony wants to just merge
>>>>>>>      - concerns with non-x86 arch and merge
>>>>>>>      - concerns with big-bang patch merge and following stability
>>>>>>> - post 0.14 conversion to glib mainloop, non-upstreamed qemu-kvm will be
>>>>>>>      a problem if it's not there by then
>>>>>>> - testing and nuances are still an issue (e.g. stefan berger's mmio read issue)
>>>>>>> - qemu-kvm still evolving, needs to get sync'd or it will keep diverging
>>>>>>> - 2 implementations of main init, cpu init, Jan has merged them into one
>>>>>>>      - qemu-kvm-x86.c file that's only a few hundred lines
>>>>>>> - review as one patch to see the fundamental difference
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> More precisely, my current work flow is to pick some function(s), e.g.
>>>>>> kvm_cpu_exec/kvm_run, and start wondering "What needs to be done to
>>>>>> upstream so that qemu-kvm could use that implementation?". If they
>>>>>> differ, the reasons need to be understood and patched away, either by
>>>>>> fixing/enhancing upstream or simplifying qemu-kvm. Once the upstream
>>>>>> changes are merged back, a qemu-kvm patch is posted to switch to that
>>>>>> version.
>>>>>>
>>>>>> Any help will be welcome, either via review of my subtle regressions or
>>>>>> on resolving concrete differences.
>>>>>>
>>>>>> E.g. posix-aio-compat.c: Why does qemu-kvm differ here? If it's because
>>>>>> of its own iothread code, can we wrap that away or do we need to
>>>>>> consolidate the threading code first? Or do we need to fix something in
>>>>>> upstream?
>>>>>>
>>>>>>
>>>>>>              
>>>>> I bet it's the eventfd thing.  It's arbitrary.  If you've got a small
>>>>> diff post your series, I'd be happy to take a look at it and see what I
>>>>> can explain.
>>>>>
>>>>>
>>>>>            
>>>> Looks like it's around signalfd and its emulation:
>>>>
>>>>          
>>> I really meant the compatfd thing.
>>>
>>> signalfd can't really be emulated properly so in upstream we switched to
>>> a pipe() which Avi didn't like.
>>>
>>> But with glib, this all goes away anyway so we should just drop the
>>> qemu-kvm changes and use the upstream version.  Once we enable I/O
>>> thread in qemu.git, we no longer need to use signals for I/O completion
>>> which I think everyone would agree is a better solution.
>>>        
>> Don't understand: If we do not need SIGIO for AIO emulation in threaded
>> mode, why wasn't that stubbed out already? If that helps reducing
>> worries about the signalfd emulation (which is likely a non-issue anyway
>> as anyone with serious workload should run a kernel with such support).
>>      
> qemu-kvm has this modification for performance reasons.
> SIGUSR2 can't be blocked otherwise. See example test case at
> https://patchwork.kernel.org/patch/20817/.
>    

That test-case is not realistic.  That's 10k signals per second.  With 
batching, we're at an I/O op rate that we're not even close to today.  I 
can guarantee that you won't find a real workload where you can actually 
measure the difference.

And keep in mind, the signal notification should go away so having this 
change in qemu-kvm really doesn't make sense.

Regards,

Anthony Liguori

Patch

diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index fa5494d..0704064 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -28,6 +28,7 @@ 
 #include "qemu-common.h"
 #include "trace.h"
 #include "block_int.h"
+#include "compatfd.h"
 
 #include "block/raw-posix-aio.h"
 
@@ -55,7 +56,7 @@  struct qemu_paiocb {
 };
 
 typedef struct PosixAioState {
-    int rfd, wfd;
+    int fd;
     struct qemu_paiocb *first_aio;
 } PosixAioState;
 
@@ -474,18 +475,29 @@  static int posix_aio_process_queue(void *opaque)
 static void posix_aio_read(void *opaque)
 {
     PosixAioState *s = opaque;
-    ssize_t len;
+    union {
+        struct qemu_signalfd_siginfo siginfo;
+        char buf[128];
+    } sig;
+    size_t offset;
 
-    /* read all bytes from signal pipe */
-    for (;;) {
-        char bytes[16];
+    /* try to read from signalfd, don't freak out if we can't read anything */
+    offset = 0;
+    while (offset < 128) {
+        ssize_t len;
 
-        len = read(s->rfd, bytes, sizeof(bytes));
+        len = read(s->fd, sig.buf + offset, 128 - offset);
         if (len == -1 && errno == EINTR)
-            continue; /* try again */
-        if (len == sizeof(bytes))
-            continue; /* more to read */
-        break;
+            continue;
+        if (len == -1 && errno == EAGAIN) {
+            /* there is no natural reason for this to happen,
+             * so we'll spin hard until we get everything just
+             * to be on the safe side. */
+            if (offset > 0)
+                continue;
+        }
+
+        offset += len;
     }
 
     posix_aio_process_queue(s);
@@ -499,20 +511,6 @@  static int posix_aio_flush(void *opaque)
 
 static PosixAioState *posix_aio_state;
 
-static void aio_signal_handler(int signum)
-{
-    if (posix_aio_state) {
-        char byte = 0;
-        ssize_t ret;
-
-        ret = write(posix_aio_state->wfd, &byte, sizeof(byte));
-        if (ret < 0 && errno != EAGAIN)
-            die("write()");
-    }
-
-    qemu_service_io();
-}
-
 static void paio_remove(struct qemu_paiocb *acb)
 {
     struct qemu_paiocb **pacb;
@@ -616,9 +614,8 @@  BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
 
 int paio_init(void)
 {
-    struct sigaction act;
+    sigset_t mask;
     PosixAioState *s;
-    int fds[2];
     int ret;
 
     if (posix_aio_state)
@@ -626,24 +623,21 @@  int paio_init(void)
 
     s = qemu_malloc(sizeof(PosixAioState));
 
-    sigfillset(&act.sa_mask);
-    act.sa_flags = 0; /* do not restart syscalls to interrupt select() */
-    act.sa_handler = aio_signal_handler;
-    sigaction(SIGUSR2, &act, NULL);
+    /* Make sure to block AIO signal */
+    sigemptyset(&mask);
+    sigaddset(&mask, SIGUSR2);
+    sigprocmask(SIG_BLOCK, &mask, NULL);
 
     s->first_aio = NULL;
-    if (qemu_pipe(fds) == -1) {
-        fprintf(stderr, "failed to create pipe\n");
+    s->fd = qemu_signalfd(&mask);
+    if (s->fd == -1) {
+        fprintf(stderr, "failed to create signalfd\n");
         return -1;
     }
 
-    s->rfd = fds[0];
-    s->wfd = fds[1];
-
-    fcntl(s->rfd, F_SETFL, O_NONBLOCK);
-    fcntl(s->wfd, F_SETFL, O_NONBLOCK);
+    fcntl(s->fd, F_SETFL, O_NONBLOCK);
 
-    qemu_aio_set_fd_handler(s->rfd, posix_aio_read, NULL, posix_aio_flush,
+    qemu_aio_set_fd_handler(s->fd, posix_aio_read, NULL, posix_aio_flush,
         posix_aio_process_queue, s);
 
     ret = pthread_attr_init(&attr);