diff mbox

[4/4] qemu-nbd: do not start the block layer in the parent

Message ID 1319797077-25441-5-git-send-email-pbonzini@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini Oct. 28, 2011, 10:17 a.m. UTC
Forking and threading do not behave very well together.  With qemu-nbd in
-c mode it could happen that the thread pool is started in the parent, and
the children (the one actually doing the I/O) is left without working I/O.
Fix this by only running bdrv_init in the child process.

Reported-by: Pierre Riteau <Pierre.Riteau@irisa.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 qemu-nbd.c |   31 ++++++++++++++-----------------
 1 files changed, 14 insertions(+), 17 deletions(-)

Comments

Pierre Riteau Oct. 28, 2011, 11:56 a.m. UTC | #1
Thanks a lot Paolo, I confirm this patchset fixes the bug!
Paolo Bonzini Oct. 28, 2011, 11:57 a.m. UTC | #2
On 10/28/2011 01:56 PM, Pierre Riteau wrote:
> Thanks a lot Paolo, I confirm this patchset fixes the bug!

Do you believe the limitation on error reporting to be too strong?

Paolo
Pierre Riteau Oct. 28, 2011, 12:16 p.m. UTC | #3
On 28 oct. 2011, at 13:57, Paolo Bonzini wrote:

> On 10/28/2011 01:56 PM, Pierre Riteau wrote:
>> Thanks a lot Paolo, I confirm this patchset fixes the bug!
> 
> Do you believe the limitation on error reporting to be too strong?
> 
> Paolo

Yes, it would be better if we could have error output on stderr. Now, "simple" errors such as a missing image file (or wrong path to the image) are reported to syslog instead. It could be the source of some headaches...

Is there a way we could have the child send the error to the parent over a pipe and have the parent print it on stderr?

Pierre
Paolo Bonzini Oct. 28, 2011, 12:17 p.m. UTC | #4
On 10/28/2011 02:16 PM, Pierre Riteau wrote:
> Yes, it would be better if we could have error output on stderr. Now,
> "simple" errors such as a missing image file (or wrong path to the
> image) are reported to syslog instead. It could be the source of some
> headaches...
>
> Is there a way we could have the child send the error to the parent
> over a pipe and have the parent print it on stderr?

A way could be to change the fork() into a separate thread, so that you 
can daemonize as soon as you accept the socket rather than having to do 
it early.

Paolo
Paolo Bonzini Nov. 4, 2011, 9:46 a.m. UTC | #5
On 10/28/2011 02:17 PM, Paolo Bonzini wrote:
>> Yes, it would be better if we could have error output on stderr. Now,
>> "simple" errors such as a missing image file (or wrong path to the
>> image) are reported to syslog instead. It could be the source of some
>> headaches...
>>
>> Is there a way we could have the child send the error to the parent
>> over a pipe and have the parent print it on stderr?
>
> A way could be to change the fork() into a separate thread, so that you
> can daemonize as soon as you accept the socket rather than having to do
> it early.

I tried implementing this, but in general daemonization (which forks and 
leave only the children) breaks the threading.

So we could either keep this series (which moves all errors to syslog, 
but doesn't otherwise change behavior), or I can finish and post an 
alternative series that removes all forking from qemu-nbd *but* changes 
behavior in that "qemu-nbd -c" will not daemonize anymore.

Since this is 1.0 after all, I'm slightly more inclined towards the latter.

Opinions?  Kevin, Anthony?

Paolo
Kevin Wolf Nov. 4, 2011, 10:31 a.m. UTC | #6
Am 04.11.2011 10:46, schrieb Paolo Bonzini:
> On 10/28/2011 02:17 PM, Paolo Bonzini wrote:
>>> Yes, it would be better if we could have error output on stderr. Now,
>>> "simple" errors such as a missing image file (or wrong path to the
>>> image) are reported to syslog instead. It could be the source of some
>>> headaches...
>>>
>>> Is there a way we could have the child send the error to the parent
>>> over a pipe and have the parent print it on stderr?
>>
>> A way could be to change the fork() into a separate thread, so that you
>> can daemonize as soon as you accept the socket rather than having to do
>> it early.
> 
> I tried implementing this, but in general daemonization (which forks and 
> leave only the children) breaks the threading.
> 
> So we could either keep this series (which moves all errors to syslog, 
> but doesn't otherwise change behavior), or I can finish and post an 
> alternative series that removes all forking from qemu-nbd *but* changes 
> behavior in that "qemu-nbd -c" will not daemonize anymore.
> 
> Since this is 1.0 after all, I'm slightly more inclined towards the latter.
> 
> Opinions?  Kevin, Anthony?

I'm surprised that -c is what causes trouble. As far as I understand,
the code for implementing -c doesn't use the qemu block layer at all.

So why can't we just change the code to fork before it initialises the
block layer and opens the image file?

Kevin
Paolo Bonzini Nov. 4, 2011, 11:10 a.m. UTC | #7
On 11/04/2011 11:31 AM, Kevin Wolf wrote:
> >  I tried implementing this, but in general daemonization (which forks and
> >  leave only the children) breaks the threading.
> >
> >  So we could either keep this series (which moves all errors to syslog,
> >  but doesn't otherwise change behavior), or I can finish and post an
> >  alternative series that removes all forking from qemu-nbd*but*  changes
> >  behavior in that "qemu-nbd -c" will not daemonize anymore.
> >
> >  Since this is 1.0 after all, I'm slightly more inclined towards the latter.
> >
> >  Opinions?  Kevin, Anthony?
>
> I'm surprised that -c is what causes trouble. As far as I understand,
> the code for implementing -c doesn't use the qemu block layer at all.

-c causes qemu-nbd to fork (both to run the server in a child process, 
and to daemonize).  The server runs in a child process, but that means 
that the server process loses the aio threads when the parent forks.

> So why can't we just change the code to fork before it initialises the
> block layer and opens the image file?

That's exactly what this series did.  However, daemonization has also to 
be done before opening the image file.  So the series has to support 
reporting errors to syslog, and also qemu-nbd will not give a nonzero 
exit status when errors occur.

Exchanging the server and client roles is also possible (i.e. put the 
server in the parent process and the client in the child).  It fixes the 
problem that the thread pool is lost in the server process.  However it 
still requires daemonization to occur before opening the image file so 
that errors cannot be reported properly.

Paolo
Kevin Wolf Nov. 4, 2011, 11:22 a.m. UTC | #8
Am 04.11.2011 12:10, schrieb Paolo Bonzini:
> On 11/04/2011 11:31 AM, Kevin Wolf wrote:
>>>  I tried implementing this, but in general daemonization (which forks and
>>>  leave only the children) breaks the threading.
>>>
>>>  So we could either keep this series (which moves all errors to syslog,
>>>  but doesn't otherwise change behavior), or I can finish and post an
>>>  alternative series that removes all forking from qemu-nbd*but*  changes
>>>  behavior in that "qemu-nbd -c" will not daemonize anymore.
>>>
>>>  Since this is 1.0 after all, I'm slightly more inclined towards the latter.
>>>
>>>  Opinions?  Kevin, Anthony?
>>
>> I'm surprised that -c is what causes trouble. As far as I understand,
>> the code for implementing -c doesn't use the qemu block layer at all.
> 
> -c causes qemu-nbd to fork (both to run the server in a child process, 
> and to daemonize).  The server runs in a child process, but that means 
> that the server process loses the aio threads when the parent forks.
> 
>> So why can't we just change the code to fork before it initialises the
>> block layer and opens the image file?
> 
> That's exactly what this series did.  However, daemonization has also to 
> be done before opening the image file.  So the series has to support 
> reporting errors to syslog, and also qemu-nbd will not give a nonzero 
> exit status when errors occur.

The parent could wait until the initialisation is done. As long as it
runs, writing to stderr should just work, no? Or if that doesn't work,
the child could use a pipe to communicate any errors to the parent in
the startup phase and only after the initialisation has completed it
would switch to using syslog.

> Exchanging the server and client roles is also possible (i.e. put the 
> server in the parent process and the client in the child).  It fixes the 
> problem that the thread pool is lost in the server process.  However it 
> still requires daemonization to occur before opening the image file so 
> that errors cannot be reported properly.

Yes, this doesn't solve the problem.

Kevin
Paolo Bonzini Nov. 4, 2011, 11:25 a.m. UTC | #9
On 11/04/2011 12:22 PM, Kevin Wolf wrote:
>> >
>> >  That's exactly what this series did.  However, daemonization has also to
>> >  be done before opening the image file.  So the series has to support
>> >  reporting errors to syslog, and also qemu-nbd will not give a nonzero
>> >  exit status when errors occur.
>
> The parent could wait until the initialisation is done.

You need to daemonize first, then fork the server.  If you fork the 
server before daemonizing, the server:

1) is not in its own process group, and still has a controlling TTY;

2) is not your child so your process structure is all broken, with the 
client and server being both child of PID 1;

3) is not your child, so you cannot reliably kill it (because if it has 
exited and PID 1 has already reaped it, you might kill a random process 
instead!).

Paolo
diff mbox

Patch

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 5031158..962952c 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -371,21 +371,6 @@  int main(int argc, char **argv)
 	return 0;
     }
 
-    bdrv_init();
-
-    bs = bdrv_new("hda");
-
-    if ((ret = bdrv_open(bs, argv[optind], flags, NULL)) < 0) {
-        errno = -ret;
-        err(EXIT_FAILURE, "Failed to bdrv_open '%s'", argv[optind]);
-    }
-
-    fd_size = bs->total_sectors * 512;
-
-    if (partition != -1 &&
-        find_partition(bs, partition, &dev_offset, &fd_size))
-        err(EXIT_FAILURE, "Could not find partition %d", partition);
-
     if (device) {
         pid_t pid;
         int sock;
@@ -418,7 +403,6 @@  int main(int argc, char **argv)
             size_t blocksize;
 
             ret = 0;
-            bdrv_close(bs);
 
             do {
                 sock = unix_socket_outgoing(socket);
@@ -473,8 +457,21 @@  int main(int argc, char **argv)
         /* children */
     }
 
+    bdrv_init();
+    bs = bdrv_new("hda");
+    if ((ret = bdrv_open(bs, argv[optind], flags, NULL)) < 0) {
+        errno = -ret;
+        err(EXIT_FAILURE, "Failed to bdrv_open '%s'", argv[optind]);
+    }
+
+    fd_size = bs->total_sectors * 512;
+
+    if (partition != -1 &&
+        find_partition(bs, partition, &dev_offset, &fd_size)) {
+        err(EXIT_FAILURE, "Could not find partition %d", partition);
+    }
+
     sharing_fds = g_malloc((shared + 1) * sizeof(int));
-
     if (socket) {
         sharing_fds[0] = unix_socket_incoming(socket);
     } else {