diff mbox

Don't leak file descriptors

Message ID 1258125436-23759-1-git-send-email-kwolf@redhat.com
State New
Headers show

Commit Message

Kevin Wolf Nov. 13, 2009, 3:17 p.m. UTC
We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
descriptors that don't need to be passed to children to stop this misbehaviour.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/raw-posix.c  |    1 +
 gdbstub.c          |    6 ++++++
 kvm-all.c          |    1 +
 migration-tcp.c    |    6 +++---
 migration-unix.c   |    6 +++---
 net.c              |    8 ++++----
 osdep.c            |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 posix-aio-compat.c |    3 +++
 qemu-char.c        |    7 ++++++-
 qemu-common.h      |    2 ++
 qemu-sockets.c     |   10 +++++-----
 qemu_socket.h      |    2 ++
 slirp/misc.c       |    4 ++--
 slirp/slirp.h      |    4 ++++
 slirp/socket.c     |    2 +-
 slirp/tcp_subr.c   |    2 +-
 slirp/udp.c        |    4 ++--
 vl.c               |    9 +++++++++
 vnc.c              |    2 +-
 19 files changed, 102 insertions(+), 23 deletions(-)

Comments

Scott Tsai Nov. 13, 2009, 3:36 p.m. UTC | #1
On Fri, Nov 13, 2009 at 11:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
> descriptors that don't need to be passed to children to stop this misbehaviour.

Since qemu is a multi threaded program, how about opening those file
descriptors with the equivalent of O_CLOEXEC set to avoid the race
condition when a fork comes between the 'open/socket/accept' operation
and the 'fcntl'?

We could create helper functions like 'qemu_socket_cloexec'.
The implementation of qemu_socket_cloexec would use the new system
calls and flags listed in:
http://udrepper.livejournal.com/20407.html
if available and fall back to separate 'open' and 'fcnt' operations
when not building with a new enough glibc.
Nathan Froyd Nov. 13, 2009, 3:41 p.m. UTC | #2
On Fri, Nov 13, 2009 at 04:17:16PM +0100, Kevin Wolf wrote:
> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
> descriptors that don't need to be passed to children to stop this
> misbehaviour.
>
>> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -2356,6 +2356,9 @@ static void gdb_accept(void)
>              perror("accept");
>              return;
>          } else if (fd >= 0) {
> +#ifndef _WIN32
> +            fcntl(fd, F_SETFD, FD_CLOEXEC);
> +#endif
>              break;
>          }
>      }

Why not just use the new accept wrapper here?

> @@ -2385,6 +2388,9 @@ static int gdbserver_open(int port)
>          perror("socket");
>          return -1;
>      }
> +#ifndef _WIN32
> +    fcntl(fd, F_SETFD, FD_CLOEXEC);
> +#endif
>  
>      /* allow fast reuse */
>      val = 1;

...and ditto for using the new wrapper here.

-Nathan
Kevin Wolf Nov. 13, 2009, 3:44 p.m. UTC | #3
Am 13.11.2009 16:41, schrieb Nathan Froyd:
> On Fri, Nov 13, 2009 at 04:17:16PM +0100, Kevin Wolf wrote:
>> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
>> descriptors that don't need to be passed to children to stop this
>> misbehaviour.
>>
>>> --- a/gdbstub.c
>> +++ b/gdbstub.c
>> @@ -2356,6 +2356,9 @@ static void gdb_accept(void)
>>              perror("accept");
>>              return;
>>          } else if (fd >= 0) {
>> +#ifndef _WIN32
>> +            fcntl(fd, F_SETFD, FD_CLOEXEC);
>> +#endif
>>              break;
>>          }
>>      }
> 
> Why not just use the new accept wrapper here?

gdbstub.c is also used in the Linux userspace emulator where the accept
wrapper is not available. I tried to add osdep.c to the linux-user build
- after all, it looked easy enough - but it ended up being too much
Makefile magic. This is why I decided to go for the easy way and expand it.

Kevin
Blue Swirl Nov. 13, 2009, 9:05 p.m. UTC | #4
On Fri, Nov 13, 2009 at 5:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
> descriptors that don't need to be passed to children to stop this misbehaviour.

> -        c = accept(s, (struct sockaddr *)&addr, &addrlen);
> +        c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);

Would it be possible to improve the interface so that no casts are
needed for the calling code?
Jamie Lokier Nov. 16, 2009, 2:15 a.m. UTC | #5
Scott Tsai wrote:
> On Fri, Nov 13, 2009 at 11:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
> > descriptors that don't need to be passed to children to stop this misbehaviour.
> 
> Since qemu is a multi threaded program, how about opening those file
> descriptors with the equivalent of O_CLOEXEC set to avoid the race
> condition when a fork comes between the 'open/socket/accept' operation
> and the 'fcntl'?

For qemu-system this shoudn't matter, as long as all the forking and
opening is done in the same thread, with other threads only used for
virtual CPUs.

For qemu-user, maybe it's relevant?

> We could create helper functions like 'qemu_socket_cloexec'.
> The implementation of qemu_socket_cloexec would use the new system
> calls and flags listed in:
> http://udrepper.livejournal.com/20407.html
> if available and fall back to separate 'open' and 'fcnt' operations
> when not building with a new enough glibc.

To do it thoroughly, it's possible to emulate O_CLOEXEC's
thread-safety, by blocking fork operations in other threads during an
open+fcntl sequence using an rwlock, or without locking by keeping
track of "possibly open" file descriptors and closing them
unconditionally in fork children.

That would help with qemu-user emulation of O_CLOEXEC (et al) too.

-- Jamie
Kevin Wolf Nov. 16, 2009, 12:47 p.m. UTC | #6
Am 13.11.2009 22:05, schrieb Blue Swirl:
> On Fri, Nov 13, 2009 at 5:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
>> descriptors that don't need to be passed to children to stop this misbehaviour.
> 
>> -        c = accept(s, (struct sockaddr *)&addr, &addrlen);
>> +        c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);
> 
> Would it be possible to improve the interface so that no casts are
> needed for the calling code?

How exactly would you do that? The only way I see to do it would be
using void*, but I'm not sure if this really is an improvement.

Kevin
Blue Swirl Nov. 16, 2009, 4:21 p.m. UTC | #7
On Mon, Nov 16, 2009 at 2:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 13.11.2009 22:05, schrieb Blue Swirl:
>> On Fri, Nov 13, 2009 at 5:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
>>> descriptors that don't need to be passed to children to stop this misbehaviour.
>>
>>> -        c = accept(s, (struct sockaddr *)&addr, &addrlen);
>>> +        c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);
>>
>> Would it be possible to improve the interface so that no casts are
>> needed for the calling code?
>
> How exactly would you do that? The only way I see to do it would be
> using void*, but I'm not sure if this really is an improvement.

Instead of sockaddr_in vs. sockaddr and the lame casts in between, we
could have QSockAddr which magically works. Or if we only ever use
sockaddr_in, just use that.
Avi Kivity Nov. 16, 2009, 4:46 p.m. UTC | #8
On 11/16/2009 06:21 PM, Blue Swirl wrote:
>
>> How exactly would you do that? The only way I see to do it would be
>> using void*, but I'm not sure if this really is an improvement.
>>      
> Instead of sockaddr_in vs. sockaddr and the lame casts in between, we
> could have QSockAddr which magically works.

Probably a bunch of work for something that is rarely used.

> Or if we only ever use
> sockaddr_in, just use that.
>    

There's some ipv6 support in qemu-sockets.c.
Jamie Lokier Nov. 16, 2009, 11:05 p.m. UTC | #9
Blue Swirl wrote:
> On Mon, Nov 16, 2009 at 2:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > Am 13.11.2009 22:05, schrieb Blue Swirl:
> >> On Fri, Nov 13, 2009 at 5:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> >>> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
> >>> descriptors that don't need to be passed to children to stop this misbehaviour.
> >>
> >>> -        c = accept(s, (struct sockaddr *)&addr, &addrlen);
> >>> +        c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);
> >>
> >> Would it be possible to improve the interface so that no casts are
> >> needed for the calling code?
> >
> > How exactly would you do that? The only way I see to do it would be
> > using void*, but I'm not sure if this really is an improvement.
> 
> Instead of sockaddr_in vs. sockaddr and the lame casts in between, we
> could have QSockAddr which magically works. Or if we only ever use
> sockaddr_in, just use that.

int qemu_accept(int s, union __attribute__((__transparent_union__)) {
                               struct sockaddr *sa;
                               struct sockaddr_in *sin;
                               struct sockaddr_in6 *sin6;
                } addr, socklen_t len);

#define qemu_accept(s, addr) qemu_accept(s, addr, sizeof(*addr))

Seems to work. :-)

Or use a typedef to prettify, but there's no need for another named
type in the API.  (But you might create QSockAddr anyway, for things
like printing and parsing).

Enjoy,
-- Jamie
Jamie Lokier Nov. 16, 2009, 11:10 p.m. UTC | #10
Jamie Lokier wrote:
> Blue Swirl wrote:
> > On Mon, Nov 16, 2009 at 2:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > > Am 13.11.2009 22:05, schrieb Blue Swirl:
> > >> On Fri, Nov 13, 2009 at 5:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > >>> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
> > >>> descriptors that don't need to be passed to children to stop this misbehaviour.
> > >>
> > >>> -        c = accept(s, (struct sockaddr *)&addr, &addrlen);
> > >>> +        c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);
> > >>
> > >> Would it be possible to improve the interface so that no casts are
> > >> needed for the calling code?
> > >
> > > How exactly would you do that? The only way I see to do it would be
> > > using void*, but I'm not sure if this really is an improvement.
> > 
> > Instead of sockaddr_in vs. sockaddr and the lame casts in between, we
> > could have QSockAddr which magically works. Or if we only ever use
> > sockaddr_in, just use that.
> 
> int qemu_accept(int s, union __attribute__((__transparent_union__)) {
>                                struct sockaddr *sa;
>                                struct sockaddr_in *sin;
>                                struct sockaddr_in6 *sin6;
>                 } addr, socklen_t len);
> 
> #define qemu_accept(s, addr) qemu_accept(s, addr, sizeof(*addr))
> 
> Seems to work. :-)

The transparent_union is what Glibc uses for accept(), by the way,
when _GNU_SOURCE is defined.  That's why the old code using
accept() compiled without warnings despite the type mismatch.

Grep for __SOCKADDR_ARG in Glibc's /usr/include/sys/socket.h.

-- Jamie
Anthony Liguori Nov. 16, 2009, 11:26 p.m. UTC | #11
Kevin Wolf wrote:
> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
> descriptors that don't need to be passed to children to stop this misbehaviour.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>   

    pid = fork();
    if (pid == 0) {
        int open_max = sysconf(_SC_OPEN_MAX), i;

        for (i = 0; i < open_max; i++) {
            if (i != STDIN_FILENO &&
                i != STDOUT_FILENO &&
                i != STDERR_FILENO &&
                i != fd) {
                close(i);
            }

Handles this in a less invasive way.  I think the only problem we have 
today is that we use popen() for exec: migration.  The solution to that 
though should be to convert popen to a proper fork/exec() with a pipe.

I'd prefer to introduce a single fork/exec helper that behaved properly 
instead of having to deal with cloexec everywhere.

Regards,

Anthony Liguori
Jamie Lokier Nov. 16, 2009, 11:44 p.m. UTC | #12
Anthony Liguori wrote:
> Kevin Wolf wrote:
> >We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
> >descriptors that don't need to be passed to children to stop this 
> >misbehaviour.
> >
> >Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> >  
> 
>    pid = fork();
>    if (pid == 0) {
>        int open_max = sysconf(_SC_OPEN_MAX), i;
> 
>        for (i = 0; i < open_max; i++) {
>            if (i != STDIN_FILENO &&
>                i != STDOUT_FILENO &&
>                i != STDERR_FILENO &&
>                i != fd) {
>                close(i);
>            }
> 
> Handles this in a less invasive way.  I think the only problem we have 
> today is that we use popen() for exec: migration.  The solution to that 
> though should be to convert popen to a proper fork/exec() with a pipe.
> 
> I'd prefer to introduce a single fork/exec helper that behaved properly 
> instead of having to deal with cloexec everywhere.

The above can be a bit slow when sysconf(_SC_OPEN_MAX) == 131072,
which you get if running qemu from some web servers or some user
environments set up to run web servers...  But it's not _that_ slow on
a modern machine on Linux - 10^7 closes per second has been measured.
Still a bit slow if it's INT_MAX :-)

A scalable method on Linux is readdir(/proc/self/fd).  (I'm not sure
if readdir returns everything reliably if you close while reading, so
just reading to get the largest open fd value, then closing all fds up
to that value is what I do).

Or just copy the closefrom() implementation from openssh/sudo.

Interestingly, that says "We avoid checking resource limits since it
is possible to open a file descriptor and then drop the rlimit such
that it is below the open fd..." but then uses _SC_OPEN_MAX, which I
think on Glibc checks the resource limits...

-- Jamie
Kevin Wolf Nov. 17, 2009, 9 a.m. UTC | #13
Am 17.11.2009 00:26, schrieb Anthony Liguori:
> Kevin Wolf wrote:
>> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
>> descriptors that don't need to be passed to children to stop this misbehaviour.
>>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>   
> 
>     pid = fork();
>     if (pid == 0) {
>         int open_max = sysconf(_SC_OPEN_MAX), i;
> 
>         for (i = 0; i < open_max; i++) {
>             if (i != STDIN_FILENO &&
>                 i != STDOUT_FILENO &&
>                 i != STDERR_FILENO &&
>                 i != fd) {
>                 close(i);
>             }
> 
> Handles this in a less invasive way.  I think the only problem we have 
> today is that we use popen() for exec: migration.  The solution to that 
> though should be to convert popen to a proper fork/exec() with a pipe.
> 
> I'd prefer to introduce a single fork/exec helper that behaved properly 
> instead of having to deal with cloexec everywhere.

No, unfortunately this doesn't work because it requires knowledge of all
execs. However, the glibc people believe that they are free to fork/exec
whenever they want in any function. Actually, the bug report that led to
this fix was triggered by a hidden fork/exec in glibc. I'm not convinced
that it's right what they're doing, but it's the way it is.

If you like to read it up, you can use
https://bugzilla.redhat.com/show_bug.cgi?id=528134 as a starting point
and dig through the referenced bugs.

Kevin
Kevin Wolf Nov. 17, 2009, 9:12 a.m. UTC | #14
Am 17.11.2009 00:05, schrieb Jamie Lokier:
> Blue Swirl wrote:
>> On Mon, Nov 16, 2009 at 2:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> Am 13.11.2009 22:05, schrieb Blue Swirl:
>>>> On Fri, Nov 13, 2009 at 5:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>>> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
>>>>> descriptors that don't need to be passed to children to stop this misbehaviour.
>>>>
>>>>> -        c = accept(s, (struct sockaddr *)&addr, &addrlen);
>>>>> +        c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);
>>>>
>>>> Would it be possible to improve the interface so that no casts are
>>>> needed for the calling code?
>>>
>>> How exactly would you do that? The only way I see to do it would be
>>> using void*, but I'm not sure if this really is an improvement.
>>
>> Instead of sockaddr_in vs. sockaddr and the lame casts in between, we
>> could have QSockAddr which magically works. Or if we only ever use
>> sockaddr_in, just use that.
> 
> int qemu_accept(int s, union __attribute__((__transparent_union__)) {
>                                struct sockaddr *sa;
>                                struct sockaddr_in *sin;
>                                struct sockaddr_in6 *sin6;
>                 } addr, socklen_t len);
> 
> #define qemu_accept(s, addr) qemu_accept(s, addr, sizeof(*addr))
> 
> Seems to work. :-)

Interesting. I didn't even know that __transparent_union__ exists. Might
be worth to consider, but the CLOEXEC patch definitely isn't the right
place to do the conversion.

Blue Swirl, care to do it on top of my patch?

Kevin
Blue Swirl Nov. 17, 2009, 8:28 p.m. UTC | #15
On Tue, Nov 17, 2009 at 11:12 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 17.11.2009 00:05, schrieb Jamie Lokier:
>> Blue Swirl wrote:
>>> On Mon, Nov 16, 2009 at 2:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>> Am 13.11.2009 22:05, schrieb Blue Swirl:
>>>>> On Fri, Nov 13, 2009 at 5:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>>>> We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
>>>>>> descriptors that don't need to be passed to children to stop this misbehaviour.
>>>>>
>>>>>> -        c = accept(s, (struct sockaddr *)&addr, &addrlen);
>>>>>> +        c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);
>>>>>
>>>>> Would it be possible to improve the interface so that no casts are
>>>>> needed for the calling code?
>>>>
>>>> How exactly would you do that? The only way I see to do it would be
>>>> using void*, but I'm not sure if this really is an improvement.
>>>
>>> Instead of sockaddr_in vs. sockaddr and the lame casts in between, we
>>> could have QSockAddr which magically works. Or if we only ever use
>>> sockaddr_in, just use that.
>>
>> int qemu_accept(int s, union __attribute__((__transparent_union__)) {
>>                                struct sockaddr *sa;
>>                                struct sockaddr_in *sin;
>>                                struct sockaddr_in6 *sin6;
>>                 } addr, socklen_t len);
>>
>> #define qemu_accept(s, addr) qemu_accept(s, addr, sizeof(*addr))
>>
>> Seems to work. :-)

But on the downside, it's a gcc extension.

> Interesting. I didn't even know that __transparent_union__ exists. Might
> be worth to consider, but the CLOEXEC patch definitely isn't the right
> place to do the conversion.
>
> Blue Swirl, care to do it on top of my patch?

The interface change is clearly out of scope of your patch.
diff mbox

Patch

diff --git a/block/raw-posix.c b/block/raw-posix.c
index f558976..5288948 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -160,6 +160,7 @@  static int raw_open_common(BlockDriverState *bs, const char *filename,
             ret = -EACCES;
         return ret;
     }
+    qemu_set_cloexec(fd);
     s->fd = fd;
     s->aligned_buf = NULL;
 
diff --git a/gdbstub.c b/gdbstub.c
index 055093f..5a4f7d4 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -2356,6 +2356,9 @@  static void gdb_accept(void)
             perror("accept");
             return;
         } else if (fd >= 0) {
+#ifndef _WIN32
+            fcntl(fd, F_SETFD, FD_CLOEXEC);
+#endif
             break;
         }
     }
@@ -2385,6 +2388,9 @@  static int gdbserver_open(int port)
         perror("socket");
         return -1;
     }
+#ifndef _WIN32
+    fcntl(fd, F_SETFD, FD_CLOEXEC);
+#endif
 
     /* allow fast reuse */
     val = 1;
diff --git a/kvm-all.c b/kvm-all.c
index 1916ec6..e78dd44 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -420,6 +420,7 @@  int kvm_init(int smp_cpus)
         ret = -errno;
         goto err;
     }
+    qemu_set_cloexec(s->fd);
 
     ret = kvm_ioctl(s, KVM_GET_API_VERSION, 0);
     if (ret < KVM_API_VERSION) {
diff --git a/migration-tcp.c b/migration-tcp.c
index 9ed92b4..dc8772c 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -99,7 +99,7 @@  MigrationState *tcp_start_outgoing_migration(const char *host_port,
     s->state = MIG_STATE_ACTIVE;
     s->mon_resume = NULL;
     s->bandwidth_limit = bandwidth_limit;
-    s->fd = socket(PF_INET, SOCK_STREAM, 0);
+    s->fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
     if (s->fd == -1) {
         qemu_free(s);
         return NULL;
@@ -139,7 +139,7 @@  static void tcp_accept_incoming_migration(void *opaque)
     int c, ret;
 
     do {
-        c = accept(s, (struct sockaddr *)&addr, &addrlen);
+        c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);
     } while (c == -1 && socket_error() == EINTR);
 
     dprintf("accepted migration\n");
@@ -186,7 +186,7 @@  int tcp_start_incoming_migration(const char *host_port)
         return -EINVAL;
     }
 
-    s = socket(PF_INET, SOCK_STREAM, 0);
+    s = qemu_socket(PF_INET, SOCK_STREAM, 0);
     if (s == -1)
         return -socket_error();
 
diff --git a/migration-unix.c b/migration-unix.c
index a26587a..d3de7ae 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -98,7 +98,7 @@  MigrationState *unix_start_outgoing_migration(const char *path,
     s->state = MIG_STATE_ACTIVE;
     s->mon_resume = NULL;
     s->bandwidth_limit = bandwidth_limit;
-    s->fd = socket(PF_UNIX, SOCK_STREAM, 0);
+    s->fd = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
     if (s->fd < 0) {
         dprintf("Unable to open socket");
         goto err_after_alloc;
@@ -143,7 +143,7 @@  static void unix_accept_incoming_migration(void *opaque)
     int c, ret;
 
     do {
-        c = accept(s, (struct sockaddr *)&addr, &addrlen);
+        c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);
     } while (c == -1 && socket_error() == EINTR);
 
     dprintf("accepted migration\n");
@@ -184,7 +184,7 @@  int unix_start_incoming_migration(const char *path)
 
     dprintf("Attempting to start an incoming migration\n");
 
-    sock = socket(PF_UNIX, SOCK_STREAM, 0);
+    sock = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
     if (sock < 0) {
         fprintf(stderr, "Could not open unix socket: %s\n", strerror(errno));
         return -EINVAL;
diff --git a/net.c b/net.c
index 9ea66e3..cff6efd 100644
--- a/net.c
+++ b/net.c
@@ -1515,7 +1515,7 @@  static int net_socket_mcast_create(struct sockaddr_in *mcastaddr)
 	return -1;
 
     }
-    fd = socket(PF_INET, SOCK_DGRAM, 0);
+    fd = qemu_socket(PF_INET, SOCK_DGRAM, 0);
     if (fd < 0) {
         perror("socket(PF_INET, SOCK_DGRAM)");
         return -1;
@@ -1693,7 +1693,7 @@  static void net_socket_accept(void *opaque)
 
     for(;;) {
         len = sizeof(saddr);
-        fd = accept(s->fd, (struct sockaddr *)&saddr, &len);
+        fd = qemu_accept(s->fd, (struct sockaddr *)&saddr, &len);
         if (fd < 0 && errno != EINTR) {
             return;
         } else if (fd >= 0) {
@@ -1724,7 +1724,7 @@  static int net_socket_listen_init(VLANState *vlan,
 
     s = qemu_mallocz(sizeof(NetSocketListenState));
 
-    fd = socket(PF_INET, SOCK_STREAM, 0);
+    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
     if (fd < 0) {
         perror("socket");
         return -1;
@@ -1765,7 +1765,7 @@  static int net_socket_connect_init(VLANState *vlan,
     if (parse_host_port(&saddr, host_str) < 0)
         return -1;
 
-    fd = socket(PF_INET, SOCK_STREAM, 0);
+    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
     if (fd < 0) {
         perror("socket");
         return -1;
diff --git a/osdep.c b/osdep.c
index fd8bbd7..15c5ff1 100644
--- a/osdep.c
+++ b/osdep.c
@@ -128,6 +128,8 @@  int qemu_create_pidfile(const char *filename)
     if (lockf(fd, F_TLOCK, 0) == -1)
         return -1;
 
+    qemu_set_cloexec(fd);
+
     len = snprintf(buffer, sizeof(buffer), "%ld\n", (long)getpid());
     if (write(fd, buffer, len) != len)
         return -1;
@@ -201,11 +203,55 @@  int inet_aton(const char *cp, struct in_addr *ia)
     ia->s_addr = addr;
     return 1;
 }
+
+void qemu_set_cloexec(int fd)
+{
+}
+
 #else
+
 void socket_set_nonblock(int fd)
 {
     int f;
     f = fcntl(fd, F_GETFL);
     fcntl(fd, F_SETFL, f | O_NONBLOCK);
 }
+
+void qemu_set_cloexec(int fd)
+{
+    int f;
+    f = fcntl(fd, F_GETFD);
+    fcntl(fd, F_SETFD, f | FD_CLOEXEC);
+}
+
 #endif
+
+/*
+ * Opens a socket with FD_CLOEXEC set
+ */
+int qemu_socket(int domain, int type, int protocol)
+{
+    int ret;
+
+    ret = socket(domain, type, protocol);
+    if (ret >= 0) {
+        qemu_set_cloexec(ret);
+    }
+
+    return ret;
+}
+
+/*
+ * Accept a connection and set FD_CLOEXEC
+ */
+int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen)
+{
+    int ret;
+
+    ret = accept(s, addr, addrlen);
+    if (ret >= 0) {
+        qemu_set_cloexec(ret);
+    }
+
+    return ret;
+}
diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index 7f391c9..855763f 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -630,6 +630,9 @@  int paio_init(void)
         return -1;
     }
 
+    qemu_set_cloexec(fds[0]);
+    qemu_set_cloexec(fds[1]);
+
     s->rfd = fds[0];
     s->wfd = fds[1];
 
diff --git a/qemu-char.c b/qemu-char.c
index 40bd7e8..6f86459 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -630,6 +630,7 @@  static CharDriverState *qemu_chr_open_file_out(QemuOpts *opts)
                       O_WRONLY | O_TRUNC | O_CREAT | O_BINARY, 0666));
     if (fd_out < 0)
         return NULL;
+    qemu_set_cloexec(fd_out);
     return qemu_chr_open_fd(-1, fd_out);
 }
 
@@ -657,6 +658,10 @@  static CharDriverState *qemu_chr_open_pipe(QemuOpts *opts)
         if (fd_in < 0)
             return NULL;
     }
+
+    qemu_set_cloexec(fd_in);
+    qemu_set_cloexec(fd_out);
+
     return qemu_chr_open_fd(fd_in, fd_out);
 }
 
@@ -2106,7 +2111,7 @@  static void tcp_chr_accept(void *opaque)
 	    len = sizeof(saddr);
 	    addr = (struct sockaddr *)&saddr;
 	}
-        fd = accept(s->listen_fd, addr, &len);
+        fd = qemu_accept(s->listen_fd, addr, &len);
         if (fd < 0 && errno != EINTR) {
             return;
         } else if (fd >= 0) {
diff --git a/qemu-common.h b/qemu-common.h
index b779cfe..2f6ca1a 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -159,6 +159,8 @@  void *get_mmap_addr(unsigned long size);
 void qemu_mutex_lock_iothread(void);
 void qemu_mutex_unlock_iothread(void);
 
+void qemu_set_cloexec(int fd);
+
 /* Error handling.  */
 
 void QEMU_NORETURN hw_error(const char *fmt, ...)
diff --git a/qemu-sockets.c b/qemu-sockets.c
index 8801453..8850516 100644
--- a/qemu-sockets.c
+++ b/qemu-sockets.c
@@ -160,7 +160,7 @@  int inet_listen_opts(QemuOpts *opts, int port_offset)
         getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen,
 		        uaddr,INET6_ADDRSTRLEN,uport,32,
 		        NI_NUMERICHOST | NI_NUMERICSERV);
-        slisten = socket(e->ai_family, e->ai_socktype, e->ai_protocol);
+        slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
         if (slisten < 0) {
             fprintf(stderr,"%s: socket(%s): %s\n", __FUNCTION__,
                     inet_strfamily(e->ai_family), strerror(errno));
@@ -258,7 +258,7 @@  int inet_connect_opts(QemuOpts *opts)
             fprintf(stderr,"%s: getnameinfo: oops\n", __FUNCTION__);
             continue;
         }
-        sock = socket(e->ai_family, e->ai_socktype, e->ai_protocol);
+        sock = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
         if (sock < 0) {
             fprintf(stderr,"%s: socket(%s): %s\n", __FUNCTION__,
             inet_strfamily(e->ai_family), strerror(errno));
@@ -351,7 +351,7 @@  int inet_dgram_opts(QemuOpts *opts)
     }
 
     /* create socket */
-    sock = socket(peer->ai_family, peer->ai_socktype, peer->ai_protocol);
+    sock = qemu_socket(peer->ai_family, peer->ai_socktype, peer->ai_protocol);
     if (sock < 0) {
         fprintf(stderr,"%s: socket(%s): %s\n", __FUNCTION__,
                 inet_strfamily(peer->ai_family), strerror(errno));
@@ -505,7 +505,7 @@  int unix_listen_opts(QemuOpts *opts)
     const char *path = qemu_opt_get(opts, "path");
     int sock, fd;
 
-    sock = socket(PF_UNIX, SOCK_STREAM, 0);
+    sock = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
     if (sock < 0) {
         perror("socket(unix)");
         return -1;
@@ -560,7 +560,7 @@  int unix_connect_opts(QemuOpts *opts)
         return -1;
     }
 
-    sock = socket(PF_UNIX, SOCK_STREAM, 0);
+    sock = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
     if (sock < 0) {
         perror("socket(unix)");
         return -1;
diff --git a/qemu_socket.h b/qemu_socket.h
index c253b32..86bdbf5 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -32,6 +32,8 @@  int inet_aton(const char *cp, struct in_addr *ia);
 #include "qemu-option.h"
 
 /* misc helpers */
+int qemu_socket(int domain, int type, int protocol);
+int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 void socket_set_nonblock(int fd);
 int send_all(int fd, const void *buf, int len1);
 
diff --git a/slirp/misc.c b/slirp/misc.c
index e9f08fd..c76ad8f 100644
--- a/slirp/misc.c
+++ b/slirp/misc.c
@@ -132,7 +132,7 @@  fork_exec(struct socket *so, const char *ex, int do_pty)
 		addr.sin_port = 0;
 		addr.sin_addr.s_addr = INADDR_ANY;
 
-		if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
+		if ((s = qemu_socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
 		    bind(s, (struct sockaddr *)&addr, addrlen) < 0 ||
 		    listen(s, 1) < 0) {
 			lprint("Error: inet socket: %s\n", strerror(errno));
@@ -165,7 +165,7 @@  fork_exec(struct socket *so, const char *ex, int do_pty)
 			 * Connect to the socket
 			 * XXX If any of these fail, we're in trouble!
 	 		 */
-			s = socket(AF_INET, SOCK_STREAM, 0);
+			s = qemu_socket(AF_INET, SOCK_STREAM, 0);
 			addr.sin_addr = loopback_addr;
                         do {
                             ret = connect(s, (struct sockaddr *)&addr, addrlen);
diff --git a/slirp/slirp.h b/slirp/slirp.h
index 9ef57ea..98a2644 100644
--- a/slirp/slirp.h
+++ b/slirp/slirp.h
@@ -197,6 +197,10 @@  int inet_aton(const char *cp, struct in_addr *ia);
 #include "bootp.h"
 #include "tftp.h"
 
+/* osdep.c */
+int qemu_socket(int domain, int type, int protocol);
+
+
 struct Slirp {
     QTAILQ_ENTRY(Slirp) entry;
 
diff --git a/slirp/socket.c b/slirp/socket.c
index 207109c..cf6e6a9 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -626,7 +626,7 @@  tcp_listen(Slirp *slirp, u_int32_t haddr, u_int hport, u_int32_t laddr,
 	addr.sin_addr.s_addr = haddr;
 	addr.sin_port = hport;
 
-	if (((s = socket(AF_INET,SOCK_STREAM,0)) < 0) ||
+	if (((s = qemu_socket(AF_INET,SOCK_STREAM,0)) < 0) ||
 	    (setsockopt(s,SOL_SOCKET,SO_REUSEADDR,(char *)&opt,sizeof(int)) < 0) ||
 	    (bind(s,(struct sockaddr *)&addr, sizeof(addr)) < 0) ||
 	    (listen(s,1) < 0)) {
diff --git a/slirp/tcp_subr.c b/slirp/tcp_subr.c
index 0417345..7851307 100644
--- a/slirp/tcp_subr.c
+++ b/slirp/tcp_subr.c
@@ -325,7 +325,7 @@  int tcp_fconnect(struct socket *so)
   DEBUG_CALL("tcp_fconnect");
   DEBUG_ARG("so = %lx", (long )so);
 
-  if( (ret=so->s=socket(AF_INET,SOCK_STREAM,0)) >= 0) {
+  if( (ret = so->s = qemu_socket(AF_INET,SOCK_STREAM,0)) >= 0) {
     int opt, s=so->s;
     struct sockaddr_in addr;
 
diff --git a/slirp/udp.c b/slirp/udp.c
index a88b645..d6c39b9 100644
--- a/slirp/udp.c
+++ b/slirp/udp.c
@@ -302,7 +302,7 @@  int udp_output(struct socket *so, struct mbuf *m,
 int
 udp_attach(struct socket *so)
 {
-  if((so->s = socket(AF_INET,SOCK_DGRAM,0)) != -1) {
+  if((so->s = qemu_socket(AF_INET,SOCK_DGRAM,0)) != -1) {
     so->so_expire = curtime + SO_EXPIRE;
     insque(so, &so->slirp->udb);
   }
@@ -350,7 +350,7 @@  udp_listen(Slirp *slirp, u_int32_t haddr, u_int hport, u_int32_t laddr,
 	if (!so) {
 	    return NULL;
 	}
-	so->s = socket(AF_INET,SOCK_DGRAM,0);
+	so->s = qemu_socket(AF_INET,SOCK_DGRAM,0);
 	so->so_expire = curtime + SO_EXPIRE;
 	insque(so, &slirp->udb);
 
diff --git a/vl.c b/vl.c
index fff8e8d..c84b7fb 100644
--- a/vl.c
+++ b/vl.c
@@ -1245,6 +1245,7 @@  static int hpet_start_timer(struct qemu_alarm_timer *t)
     fd = open("/dev/hpet", O_RDONLY);
     if (fd < 0)
         return -1;
+    qemu_set_cloexec(fd);
 
     /* Set frequency */
     r = ioctl(fd, HPET_IRQFREQ, RTC_FREQ);
@@ -1294,6 +1295,7 @@  static int rtc_start_timer(struct qemu_alarm_timer *t)
     TFR(rtc_fd = open("/dev/rtc", O_RDONLY));
     if (rtc_fd < 0)
         return -1;
+    qemu_set_cloexec(rtc_fd);
     ioctl(rtc_fd, RTC_IRQP_READ, &current_rtc_freq);
     if (current_rtc_freq != RTC_FREQ &&
         ioctl(rtc_fd, RTC_IRQP_SET, RTC_FREQ) < 0) {
@@ -3305,6 +3307,9 @@  static int qemu_event_init(void)
     if (err == -1)
         return -errno;
 
+    qemu_set_cloexec(fds[0]);
+    qemu_set_cloexec(fds[1]);
+
     err = fcntl_setfl(fds[0], O_NONBLOCK);
     if (err < 0)
         goto fail;
@@ -5461,6 +5466,9 @@  int main(int argc, char **argv, char **envp)
 	} else if (pid < 0)
             exit(1);
 
+	close(fds[0]);
+	qemu_set_cloexec(fds[1]);
+
 	setsid();
 
 	pid = fork();
@@ -5862,6 +5870,7 @@  int main(int argc, char **argv, char **envp)
 
 	chdir("/");
 	TFR(fd = open("/dev/null", O_RDWR));
+	qemu_set_cloexec(fd);
 	if (fd == -1)
 	    exit(1);
     }
diff --git a/vnc.c b/vnc.c
index 2bb8024..32c4678 100644
--- a/vnc.c
+++ b/vnc.c
@@ -2247,7 +2247,7 @@  static void vnc_listen_read(void *opaque)
     /* Catch-up */
     vga_hw_update();
 
-    int csock = accept(vs->lsock, (struct sockaddr *)&addr, &addrlen);
+    int csock = qemu_accept(vs->lsock, (struct sockaddr *)&addr, &addrlen);
     if (csock != -1) {
         vnc_connect(vs, csock);
     }