Patchwork [v3,3/5] osdep: Enable qemu_open to dup pre-opened fd

login
register
mail settings
Submitter Corey Bryant
Date June 14, 2012, 3:55 p.m.
Message ID <1339689305-27031-4-git-send-email-coreyb@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/164959/
State New
Headers show

Comments

Corey Bryant - June 14, 2012, 3:55 p.m.
This patch adds support to qemu_open to dup(fd) a pre-opened file
descriptor if the filename is of the format /dev/fd/X.

This can be used when QEMU is restricted from opening files, and
the management application opens files on QEMU's behalf.

If the fd was passed to the monitor with the pass-fd command, it
must be explicitly closed with the 'closefd' command when it is
no longer required, in order to prevent fd leaks.

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
---
v2:
 -Get rid of file_open and move dup code to qemu_open
  (kwolf@redhat.com)
 -Use strtol wrapper instead of atoi (kwolf@redhat.com)

v3:
 -Add note about fd leakage (eblake@redhat.com)

 osdep.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)
Eric Blake - June 15, 2012, 3:16 p.m.
On 06/14/2012 09:55 AM, Corey Bryant wrote:
> This patch adds support to qemu_open to dup(fd) a pre-opened file
> descriptor if the filename is of the format /dev/fd/X.
> 

> +++ b/osdep.c
> @@ -82,6 +82,19 @@ int qemu_open(const char *name, int flags, ...)
>      int ret;
>      int mode = 0;
>  
> +#ifndef _WIN32
> +    const char *p;
> +
> +    /* Attempt dup of fd for pre-opened file */
> +    if (strstart(name, "/dev/fd/", &p)) {
> +        ret = qemu_parse_fd(p);
> +        if (ret == -1) {
> +            return -1;
> +        }
> +        return dup(ret);

I think you need to honor flags so that the end use of the fd will be as
if qemu had directly opened the file, rather than just doing a blind dup
with a resulting fd that is in a different state than the caller
expected.  I can think of at least the following cases (there may be more):

if (flags & O_TRUNCATE ||
      ((flags & (O_CREAT|O_EXCL)) == (O_CREAT|O_EXCL))
    ftruncate(ret, 0);

Why do I truncate on O_CREAT|O_EXCL?  Because that's a case where open()
guarantees that the file will have just been created empty.

if (flags & O_CLOEXEC)
   use fcntl(F_DUPFD_CLOEXEC) instead of dup(), if possible
   else fallback to fcntl(F_GETFD/F_SETFD) to set FD_CLOEXEC non-atomically

Oh, and are we using MSG_CMSG_CLOEXEC where possible (and where not
possible, falling back to fcntl(F_GETFD/F_SETFD) to set FD_CLOEXEC) on
all fds received by 'getfd' and 'pass-fd'?  I can't think of any reason
why 'migrate fd:name' would need to be inheritable, and in the case of
/dev/fd/ parsing, while the dup() result may need to be inheritable, the
original that we are dup'ing from should most certainly be cloexec.

if (flags & O_NONBLOCK)
   use fcntl(F_GETFL/F_SETFL) to set O_NONBLOCK
else
   use fcntl(F_GETFL/F_SETFL) to clear O_NONBLOCK

or maybe we document that callers of pass-fd must always pass fds with
O_NONBLOCK clear instead of clearing it ourselves.  Or maybe we make
sure part of the process of tying name with fd in the lookup list of
named fds is determining the current O_NONBLOCK state in case future
qemu_open() need it in the opposite state.

Treat O_APPEND similarly to O_NONBLOCK (with fcntl(F_GETFL/F_SETFL) to
match the desired open() setting).
Corey Bryant - June 15, 2012, 6:16 p.m.
On 06/15/2012 11:16 AM, Eric Blake wrote:
> On 06/14/2012 09:55 AM, Corey Bryant wrote:
>> This patch adds support to qemu_open to dup(fd) a pre-opened file
>> descriptor if the filename is of the format /dev/fd/X.
>>
>
>> +++ b/osdep.c
>> @@ -82,6 +82,19 @@ int qemu_open(const char *name, int flags, ...)
>>       int ret;
>>       int mode = 0;
>>
>> +#ifndef _WIN32
>> +    const char *p;
>> +
>> +    /* Attempt dup of fd for pre-opened file */
>> +    if (strstart(name, "/dev/fd/", &p)) {
>> +        ret = qemu_parse_fd(p);
>> +        if (ret == -1) {
>> +            return -1;
>> +        }
>> +        return dup(ret);
>
> I think you need to honor flags so that the end use of the fd will be as
> if qemu had directly opened the file, rather than just doing a blind dup
> with a resulting fd that is in a different state than the caller
> expected.  I can think of at least the following cases (there may be more):

I was thinking libvirt would handle all the flag settings on open 
(obviously since that's how I coded it).  I think you're right with this 
approach though as QEMU will re-open the same file various times with 
different flags.

There are some flags that I don't think we'll be able to change.  For 
example: O_RDONLY, O_WRONLY, O_RDWR.  I assume libvirt would open all 
files O_RDWR.

>
> if (flags & O_TRUNCATE ||
>        ((flags & (O_CREAT|O_EXCL)) == (O_CREAT|O_EXCL))
>      ftruncate(ret, 0);
>
> Why do I truncate on O_CREAT|O_EXCL?  Because that's a case where open()
> guarantees that the file will have just been created empty.
>

That makes sense.

> if (flags & O_CLOEXEC)
>     use fcntl(F_DUPFD_CLOEXEC) instead of dup(), if possible
>     else fallback to fcntl(F_GETFD/F_SETFD) to set FD_CLOEXEC non-atomically
>

That makes sense too.

> Oh, and are we using MSG_CMSG_CLOEXEC where possible (and where not
> possible, falling back to fcntl(F_GETFD/F_SETFD) to set FD_CLOEXEC) on
> all fds received by 'getfd' and 'pass-fd'?  I can't think of any reason
> why 'migrate fd:name' would need to be inheritable, and in the case of
> /dev/fd/ parsing, while the dup() result may need to be inheritable, the
> original that we are dup'ing from should most certainly be cloexec.

It doesn't look like we use MSG_CMSG_CLOEXEC anywhere in QEMU.  I don't 
think we can modify getfd at this point (compatibility) but we could 
update pass-fd to set it.  It may make more sense to set it with 
fcntl(FD_CLOEXEC) in qmp_pass_fd().

>
> if (flags & O_NONBLOCK)
>     use fcntl(F_GETFL/F_SETFL) to set O_NONBLOCK
> else
>     use fcntl(F_GETFL/F_SETFL) to clear O_NONBLOCK
>
> or maybe we document that callers of pass-fd must always pass fds with
> O_NONBLOCK clear instead of clearing it ourselves.  Or maybe we make
> sure part of the process of tying name with fd in the lookup list of
> named fds is determining the current O_NONBLOCK state in case future
> qemu_open() need it in the opposite state.

Just documenting it seems error-prone.  Why not just set/clear it based 
on the flag passed to qemu_open?

>
> Treat O_APPEND similarly to O_NONBLOCK (with fcntl(F_GETFL/F_SETFL) to
> match the desired open() setting).
>

Ok
Eric Blake - June 15, 2012, 6:42 p.m.
On 06/15/2012 12:16 PM, Corey Bryant wrote:

>> I think you need to honor flags so that the end use of the fd will be as
>> if qemu had directly opened the file, rather than just doing a blind dup
>> with a resulting fd that is in a different state than the caller
>> expected.  I can think of at least the following cases (there may be
>> more):
> 
> I was thinking libvirt would handle all the flag settings on open
> (obviously since that's how I coded it).  I think you're right with this
> approach though as QEMU will re-open the same file various times with
> different flags.

How will libvirt know whether qemu wanted to open a file with O_TRUNCATE
vs. reusing an entire file, or with O_NONBLOCK or not, and so forth?  I
think there are just too many qmeu_open() calls with different flags to
expect libvirt to know how to pre-open all possible fds in such a manner
that /dev/fd/nnn will be a valid replacement for what qemu would have
done, in the cases where qemu can instead correct flags itself.

> 
> There are some flags that I don't think we'll be able to change.  For
> example: O_RDONLY, O_WRONLY, O_RDWR.  I assume libvirt would open all
> files O_RDWR.

I agree that this can't be changed, so this is one case where libvirt
will have to know what the file will used for.  But it is also a case
where qemu can at least check whether the fd passed in has sufficient
permissions (fcntl(F_GETFL)) for what qemu wanted to use, and should
error out here rather than silently succeed here then have a weird EBADF
failure down the road when the dup'd fd is finally used.  You are right
that libvirt should always be safe in passing in an O_RDWR fd for
whatever qemu wants, although that may represent security holes (there's
reasons for O_RDONLY); and in cases where libvirt is instead passing in
one end of a pipe, the fd will necessarily be O_RDONLY or O_WRONLY
(since you can't have an O_RDWR pipe).

>> Oh, and are we using MSG_CMSG_CLOEXEC where possible (and where not
>> possible, falling back to fcntl(F_GETFD/F_SETFD) to set FD_CLOEXEC) on
>> all fds received by 'getfd' and 'pass-fd'?  I can't think of any reason
>> why 'migrate fd:name' would need to be inheritable, and in the case of
>> /dev/fd/ parsing, while the dup() result may need to be inheritable, the
>> original that we are dup'ing from should most certainly be cloexec.
> 
> It doesn't look like we use MSG_CMSG_CLOEXEC anywhere in QEMU.  I don't
> think we can modify getfd at this point (compatibility) but we could
> update pass-fd to set it.  It may make more sense to set it with
> fcntl(FD_CLOEXEC) in qmp_pass_fd().

That's not atomic.  If we don't care about atomicity (for example, if we
can guarantee that no other thread in qemu can possibly fork/exec in
between the monitor thread receiving an fd then converting it to
cloexec, based on how we hold a mutex), then that's fine.  OR, if we
make sure _all_ fork() calls sanitize themselves, by closing all fds in
the getfd/pass-fd list prior to calling exec(), then we don't have to
even worry about cloexec (but then you have to worry about locking the
getfd name list, since locking a list to iterate it is not an async-safe
action and probably can't be done between fork() and exec()).
Otherwise, using MSG_CMSG_CLOEXEC is the only safe way to avoid leaking
unintended fds into another thread's child process.

> 
>>
>> if (flags & O_NONBLOCK)
>>     use fcntl(F_GETFL/F_SETFL) to set O_NONBLOCK
>> else
>>     use fcntl(F_GETFL/F_SETFL) to clear O_NONBLOCK
>>
>> or maybe we document that callers of pass-fd must always pass fds with
>> O_NONBLOCK clear instead of clearing it ourselves.  Or maybe we make
>> sure part of the process of tying name with fd in the lookup list of
>> named fds is determining the current O_NONBLOCK state in case future
>> qemu_open() need it in the opposite state.
> 
> Just documenting it seems error-prone.  Why not just set/clear it based
> on the flag passed to qemu_open?

Yep, back to my argument that making libvirt have to know every context
that qemu will want to open, and what flags it would be using in those
contexts, is painful compared to having qemu just do the right thing in
the first place, or gracefully erroring out right away at the point of
the qemu_open(/dev/fd/nnn).
Kevin Wolf - June 15, 2012, 6:46 p.m.
Am 15.06.2012 20:16, schrieb Corey Bryant:
> 
> 
> On 06/15/2012 11:16 AM, Eric Blake wrote:
>> On 06/14/2012 09:55 AM, Corey Bryant wrote:
>>> This patch adds support to qemu_open to dup(fd) a pre-opened file
>>> descriptor if the filename is of the format /dev/fd/X.
>>>
>>
>>> +++ b/osdep.c
>>> @@ -82,6 +82,19 @@ int qemu_open(const char *name, int flags, ...)
>>>       int ret;
>>>       int mode = 0;
>>>
>>> +#ifndef _WIN32
>>> +    const char *p;
>>> +
>>> +    /* Attempt dup of fd for pre-opened file */
>>> +    if (strstart(name, "/dev/fd/", &p)) {
>>> +        ret = qemu_parse_fd(p);
>>> +        if (ret == -1) {
>>> +            return -1;
>>> +        }
>>> +        return dup(ret);
>>
>> I think you need to honor flags so that the end use of the fd will be as
>> if qemu had directly opened the file, rather than just doing a blind dup
>> with a resulting fd that is in a different state than the caller
>> expected.  I can think of at least the following cases (there may be more):
> 
> I was thinking libvirt would handle all the flag settings on open 
> (obviously since that's how I coded it).  I think you're right with this 
> approach though as QEMU will re-open the same file various times with 
> different flags.
> 
> There are some flags that I don't think we'll be able to change.  For 
> example: O_RDONLY, O_WRONLY, O_RDWR.  I assume libvirt would open all 
> files O_RDWR.

I think we need to check all of them and fail qemu_open() if they don't
match. Those that qemu can change, should be just changed, of course.

>> Oh, and are we using MSG_CMSG_CLOEXEC where possible (and where not
>> possible, falling back to fcntl(F_GETFD/F_SETFD) to set FD_CLOEXEC) on
>> all fds received by 'getfd' and 'pass-fd'?  I can't think of any reason
>> why 'migrate fd:name' would need to be inheritable, and in the case of
>> /dev/fd/ parsing, while the dup() result may need to be inheritable, the
>> original that we are dup'ing from should most certainly be cloexec.
> 
> It doesn't look like we use MSG_CMSG_CLOEXEC anywhere in QEMU.  I don't 
> think we can modify getfd at this point (compatibility) but we could 
> update pass-fd to set it.  It may make more sense to set it with 
> fcntl(FD_CLOEXEC) in qmp_pass_fd().

In which scenario would any client break if we set FD_CLOEXEC? I don't
think compatibility means we can't fix any bugs.

>> if (flags & O_NONBLOCK)
>>     use fcntl(F_GETFL/F_SETFL) to set O_NONBLOCK
>> else
>>     use fcntl(F_GETFL/F_SETFL) to clear O_NONBLOCK
>>
>> or maybe we document that callers of pass-fd must always pass fds with
>> O_NONBLOCK clear instead of clearing it ourselves.  Or maybe we make
>> sure part of the process of tying name with fd in the lookup list of
>> named fds is determining the current O_NONBLOCK state in case future
>> qemu_open() need it in the opposite state.
> 
> Just documenting it seems error-prone.  Why not just set/clear it based 
> on the flag passed to qemu_open?

I agree. We could just check and return an error if they aren't set
correctly, but I think adjusting the flags is nicer.

Kevin
Corey Bryant - June 15, 2012, 7:02 p.m.
On 06/15/2012 02:42 PM, Eric Blake wrote:
> On 06/15/2012 12:16 PM, Corey Bryant wrote:
>
>>> I think you need to honor flags so that the end use of the fd will be as
>>> if qemu had directly opened the file, rather than just doing a blind dup
>>> with a resulting fd that is in a different state than the caller
>>> expected.  I can think of at least the following cases (there may be
>>> more):
>>
>> I was thinking libvirt would handle all the flag settings on open
>> (obviously since that's how I coded it).  I think you're right with this
>> approach though as QEMU will re-open the same file various times with
>> different flags.
>
> How will libvirt know whether qemu wanted to open a file with O_TRUNCATE
> vs. reusing an entire file, or with O_NONBLOCK or not, and so forth?  I
> think there are just too many qmeu_open() calls with different flags to
> expect libvirt to know how to pre-open all possible fds in such a manner
> that /dev/fd/nnn will be a valid replacement for what qemu would have
> done, in the cases where qemu can instead correct flags itself.
>

I agree completely.

>>
>> There are some flags that I don't think we'll be able to change.  For
>> example: O_RDONLY, O_WRONLY, O_RDWR.  I assume libvirt would open all
>> files O_RDWR.
>
> I agree that this can't be changed, so this is one case where libvirt
> will have to know what the file will used for.  But it is also a case
> where qemu can at least check whether the fd passed in has sufficient
> permissions (fcntl(F_GETFL)) for what qemu wanted to use, and should
> error out here rather than silently succeed here then have a weird EBADF
> failure down the road when the dup'd fd is finally used.  You are right
> that libvirt should always be safe in passing in an O_RDWR fd for
> whatever qemu wants, although that may represent security holes (there's
> reasons for O_RDONLY); and in cases where libvirt is instead passing in
> one end of a pipe, the fd will necessarily be O_RDONLY or O_WRONLY
> (since you can't have an O_RDWR pipe).

Good points.  In addition to flag setting, I'll add some flag checking 
for those flags that can't be set (ie. O_RDONLY, O_WRONLY).

>
>>> Oh, and are we using MSG_CMSG_CLOEXEC where possible (and where not
>>> possible, falling back to fcntl(F_GETFD/F_SETFD) to set FD_CLOEXEC) on
>>> all fds received by 'getfd' and 'pass-fd'?  I can't think of any reason
>>> why 'migrate fd:name' would need to be inheritable, and in the case of
>>> /dev/fd/ parsing, while the dup() result may need to be inheritable, the
>>> original that we are dup'ing from should most certainly be cloexec.
>>
>> It doesn't look like we use MSG_CMSG_CLOEXEC anywhere in QEMU.  I don't
>> think we can modify getfd at this point (compatibility) but we could
>> update pass-fd to set it.  It may make more sense to set it with
>> fcntl(FD_CLOEXEC) in qmp_pass_fd().
>
> That's not atomic.  If we don't care about atomicity (for example, if we
> can guarantee that no other thread in qemu can possibly fork/exec in
> between the monitor thread receiving an fd then converting it to
> cloexec, based on how we hold a mutex), then that's fine.  OR, if we
> make sure _all_ fork() calls sanitize themselves, by closing all fds in
> the getfd/pass-fd list prior to calling exec(), then we don't have to
> even worry about cloexec (but then you have to worry about locking the
> getfd name list, since locking a list to iterate it is not an async-safe
> action and probably can't be done between fork() and exec()).
> Otherwise, using MSG_CMSG_CLOEXEC is the only safe way to avoid leaking
> unintended fds into another thread's child process.

Ok I'm sold.  I'll first look into setting MSG_CMSG_CLOEXEC.

>
>>
>>>
>>> if (flags & O_NONBLOCK)
>>>      use fcntl(F_GETFL/F_SETFL) to set O_NONBLOCK
>>> else
>>>      use fcntl(F_GETFL/F_SETFL) to clear O_NONBLOCK
>>>
>>> or maybe we document that callers of pass-fd must always pass fds with
>>> O_NONBLOCK clear instead of clearing it ourselves.  Or maybe we make
>>> sure part of the process of tying name with fd in the lookup list of
>>> named fds is determining the current O_NONBLOCK state in case future
>>> qemu_open() need it in the opposite state.
>>
>> Just documenting it seems error-prone.  Why not just set/clear it based
>> on the flag passed to qemu_open?
>
> Yep, back to my argument that making libvirt have to know every context
> that qemu will want to open, and what flags it would be using in those
> contexts, is painful compared to having qemu just do the right thing in
> the first place, or gracefully erroring out right away at the point of
> the qemu_open(/dev/fd/nnn).
>

I agree.
Corey Bryant - June 15, 2012, 7:19 p.m.
On 06/15/2012 02:46 PM, Kevin Wolf wrote:
> Am 15.06.2012 20:16, schrieb Corey Bryant:
>>
>>
>> On 06/15/2012 11:16 AM, Eric Blake wrote:
>>> On 06/14/2012 09:55 AM, Corey Bryant wrote:
>>>> This patch adds support to qemu_open to dup(fd) a pre-opened file
>>>> descriptor if the filename is of the format /dev/fd/X.
>>>>
>>>
>>>> +++ b/osdep.c
>>>> @@ -82,6 +82,19 @@ int qemu_open(const char *name, int flags, ...)
>>>>        int ret;
>>>>        int mode = 0;
>>>>
>>>> +#ifndef _WIN32
>>>> +    const char *p;
>>>> +
>>>> +    /* Attempt dup of fd for pre-opened file */
>>>> +    if (strstart(name, "/dev/fd/", &p)) {
>>>> +        ret = qemu_parse_fd(p);
>>>> +        if (ret == -1) {
>>>> +            return -1;
>>>> +        }
>>>> +        return dup(ret);
>>>
>>> I think you need to honor flags so that the end use of the fd will be as
>>> if qemu had directly opened the file, rather than just doing a blind dup
>>> with a resulting fd that is in a different state than the caller
>>> expected.  I can think of at least the following cases (there may be more):
>>
>> I was thinking libvirt would handle all the flag settings on open
>> (obviously since that's how I coded it).  I think you're right with this
>> approach though as QEMU will re-open the same file various times with
>> different flags.
>>
>> There are some flags that I don't think we'll be able to change.  For
>> example: O_RDONLY, O_WRONLY, O_RDWR.  I assume libvirt would open all
>> files O_RDWR.
>
> I think we need to check all of them and fail qemu_open() if they don't
> match. Those that qemu can change, should be just changed, of course.
>

Ok.  I remember a scenario where QEMU opens a file read-only (perhaps to 
check headers and determine the file format) before re-opening it 
read-write.  Perhaps this is only when format= isn't specified with 
-drive.  I'm thinking we may need to change flags to read-write where 
they used to be read-only, in some circumstances.

>>> Oh, and are we using MSG_CMSG_CLOEXEC where possible (and where not
>>> possible, falling back to fcntl(F_GETFD/F_SETFD) to set FD_CLOEXEC) on
>>> all fds received by 'getfd' and 'pass-fd'?  I can't think of any reason
>>> why 'migrate fd:name' would need to be inheritable, and in the case of
>>> /dev/fd/ parsing, while the dup() result may need to be inheritable, the
>>> original that we are dup'ing from should most certainly be cloexec.
>>
>> It doesn't look like we use MSG_CMSG_CLOEXEC anywhere in QEMU.  I don't
>> think we can modify getfd at this point (compatibility) but we could
>> update pass-fd to set it.  It may make more sense to set it with
>> fcntl(FD_CLOEXEC) in qmp_pass_fd().
>
> In which scenario would any client break if we set FD_CLOEXEC? I don't
> think compatibility means we can't fix any bugs.
>

I don't know if it breaks any client.  Maybe it's not a compatibility 
error.  It dopes change behavior down the line though.  If you think 
it's ok to set FD_CLOEXEC for getfd too, then I'm happy to do it.

>>> if (flags & O_NONBLOCK)
>>>      use fcntl(F_GETFL/F_SETFL) to set O_NONBLOCK
>>> else
>>>      use fcntl(F_GETFL/F_SETFL) to clear O_NONBLOCK
>>>
>>> or maybe we document that callers of pass-fd must always pass fds with
>>> O_NONBLOCK clear instead of clearing it ourselves.  Or maybe we make
>>> sure part of the process of tying name with fd in the lookup list of
>>> named fds is determining the current O_NONBLOCK state in case future
>>> qemu_open() need it in the opposite state.
>>
>> Just documenting it seems error-prone.  Why not just set/clear it based
>> on the flag passed to qemu_open?
>
> I agree. We could just check and return an error if they aren't set
> correctly, but I think adjusting the flags is nicer.
>
> Kevin
>

Ok thanks for the input!
Eric Blake - June 15, 2012, 8 p.m.
On 06/15/2012 01:19 PM, Corey Bryant wrote:

>>> There are some flags that I don't think we'll be able to change.  For
>>> example: O_RDONLY, O_WRONLY, O_RDWR.  I assume libvirt would open all
>>> files O_RDWR.
>>
>> I think we need to check all of them and fail qemu_open() if they don't
>> match. Those that qemu can change, should be just changed, of course.
>>
> 
> Ok.  I remember a scenario where QEMU opens a file read-only (perhaps to
> check headers and determine the file format) before re-opening it
> read-write.  Perhaps this is only when format= isn't specified with
> -drive.  I'm thinking we may need to change flags to read-write where
> they used to be read-only, in some circumstances.

In those situations, libvirt would pass fd with O_RDWR, and qemu_open()
would be fine requesting O_RDONLY the first time (subset is okay), and
O_RDWR the second time.  Where you have to error out is where libvirt
passes O_RDONLY but qemu wants O_RDWR, and so forth.


>>
>> In which scenario would any client break if we set FD_CLOEXEC? I don't
>> think compatibility means we can't fix any bugs.
>>
> 
> I don't know if it breaks any client.  Maybe it's not a compatibility
> error.  It dopes change behavior down the line though.  If you think
> it's ok to set FD_CLOEXEC for getfd too, then I'm happy to do it.

The only case that a client might break is if there were a way to pass
an fd into qemu and then intentionally see that fd in a child process of
qemu.  But in the case of 'migrate fd:nnn', you aren't spawning a child
process, and even in the case of 'migrate exec:command' (which libvirt
no longer uses if fd:nnn works), I don't see how the client could have
ever intentionally tried to use 'getfd' in advance to pass an extra fd
for use inside the 'exec:command' child.  Besides, before 'pass-fd' was
around, how would the management app triggering the 'exec:command' even
know what fd number would accidentally be inherited into the
exec:command child?  I think it is pretty much a straight bug-fix for
'getfd' to always set FD_CLOEXEC, and preferably set it atomically via
MSG_CMSG_CLOEXEC.
Corey Bryant - June 15, 2012, 8:49 p.m.
On 06/15/2012 04:00 PM, Eric Blake wrote:
> On 06/15/2012 01:19 PM, Corey Bryant wrote:
>
>>>> There are some flags that I don't think we'll be able to change.  For
>>>> example: O_RDONLY, O_WRONLY, O_RDWR.  I assume libvirt would open all
>>>> files O_RDWR.
>>>
>>> I think we need to check all of them and fail qemu_open() if they don't
>>> match. Those that qemu can change, should be just changed, of course.
>>>
>>
>> Ok.  I remember a scenario where QEMU opens a file read-only (perhaps to
>> check headers and determine the file format) before re-opening it
>> read-write.  Perhaps this is only when format= isn't specified with
>> -drive.  I'm thinking we may need to change flags to read-write where
>> they used to be read-only, in some circumstances.
>
> In those situations, libvirt would pass fd with O_RDWR, and qemu_open()
> would be fine requesting O_RDONLY the first time (subset is okay), and
> O_RDWR the second time.  Where you have to error out is where libvirt
> passes O_RDONLY but qemu wants O_RDWR, and so forth.
>

I'll plan on going with this approach.

>
>>>
>>> In which scenario would any client break if we set FD_CLOEXEC? I don't
>>> think compatibility means we can't fix any bugs.
>>>
>>
>> I don't know if it breaks any client.  Maybe it's not a compatibility
>> error.  It dopes change behavior down the line though.  If you think

s/dopes/does

>> it's ok to set FD_CLOEXEC for getfd too, then I'm happy to do it.
>
> The only case that a client might break is if there were a way to pass
> an fd into qemu and then intentionally see that fd in a child process of
> qemu.  But in the case of 'migrate fd:nnn', you aren't spawning a child
> process, and even in the case of 'migrate exec:command' (which libvirt
> no longer uses if fd:nnn works), I don't see how the client could have
> ever intentionally tried to use 'getfd' in advance to pass an extra fd
> for use inside the 'exec:command' child.  Besides, before 'pass-fd' was
> around, how would the management app triggering the 'exec:command' even
> know what fd number would accidentally be inherited into the
> exec:command child?  I think it is pretty much a straight bug-fix for
> 'getfd' to always set FD_CLOEXEC, and preferably set it atomically via
> MSG_CMSG_CLOEXEC.
>

Alright, I'll go ahead and make this update in the next version of the 
patch series.

Thanks for all the input!
Kevin Wolf - June 18, 2012, 8:10 a.m.
Am 15.06.2012 22:00, schrieb Eric Blake:
> On 06/15/2012 01:19 PM, Corey Bryant wrote:
> 
>>>> There are some flags that I don't think we'll be able to change.  For
>>>> example: O_RDONLY, O_WRONLY, O_RDWR.  I assume libvirt would open all
>>>> files O_RDWR.
>>>
>>> I think we need to check all of them and fail qemu_open() if they don't
>>> match. Those that qemu can change, should be just changed, of course.
>>>
>>
>> Ok.  I remember a scenario where QEMU opens a file read-only (perhaps to
>> check headers and determine the file format) before re-opening it
>> read-write.  Perhaps this is only when format= isn't specified with
>> -drive.  I'm thinking we may need to change flags to read-write where
>> they used to be read-only, in some circumstances.
> 
> In those situations, libvirt would pass fd with O_RDWR, and qemu_open()
> would be fine requesting O_RDONLY the first time (subset is okay), and
> O_RDWR the second time.  Where you have to error out is where libvirt
> passes O_RDONLY but qemu wants O_RDWR, and so forth.

Let's try it with requiring an exact match first. If you pass the
format, I think the probing is completely avoided indeed, and having
read-only images really opened O_RDONLY protects against stupid mistakes.

Or if we really need to open the file for probing, maybe we could add a
flag that relaxes the check and that isn't used in the real bdrv_open().

Kevin
Corey Bryant - June 19, 2012, 1:59 p.m.
On 06/18/2012 04:10 AM, Kevin Wolf wrote:
> Am 15.06.2012 22:00, schrieb Eric Blake:
>> On 06/15/2012 01:19 PM, Corey Bryant wrote:
>>
>>>>> There are some flags that I don't think we'll be able to change.  For
>>>>> example: O_RDONLY, O_WRONLY, O_RDWR.  I assume libvirt would open all
>>>>> files O_RDWR.
>>>>
>>>> I think we need to check all of them and fail qemu_open() if they don't
>>>> match. Those that qemu can change, should be just changed, of course.
>>>>
>>>
>>> Ok.  I remember a scenario where QEMU opens a file read-only (perhaps to
>>> check headers and determine the file format) before re-opening it
>>> read-write.  Perhaps this is only when format= isn't specified with
>>> -drive.  I'm thinking we may need to change flags to read-write where
>>> they used to be read-only, in some circumstances.
>>
>> In those situations, libvirt would pass fd with O_RDWR, and qemu_open()
>> would be fine requesting O_RDONLY the first time (subset is okay), and
>> O_RDWR the second time.  Where you have to error out is where libvirt
>> passes O_RDONLY but qemu wants O_RDWR, and so forth.
>
> Let's try it with requiring an exact match first. If you pass the
> format, I think the probing is completely avoided indeed, and having
> read-only images really opened O_RDONLY protects against stupid mistakes.
>
> Or if we really need to open the file for probing, maybe we could add a
> flag that relaxes the check and that isn't used in the real bdrv_open().
>
> Kevin
>

I haven't heard any objection to this so I'll be checking for exact 
match, and implementing a flag to relax the check only if it's necessary.

Patch

diff --git a/osdep.c b/osdep.c
index 3e6bada..c17cdcb 100644
--- a/osdep.c
+++ b/osdep.c
@@ -82,6 +82,19 @@  int qemu_open(const char *name, int flags, ...)
     int ret;
     int mode = 0;
 
+#ifndef _WIN32
+    const char *p;
+
+    /* Attempt dup of fd for pre-opened file */
+    if (strstart(name, "/dev/fd/", &p)) {
+        ret = qemu_parse_fd(p);
+        if (ret == -1) {
+            return -1;
+        }
+        return dup(ret);
+    }
+#endif
+
     if (flags & O_CREAT) {
         va_list ap;