Patchwork [v2,2/9] block: raw-posix image file reopen

login
register
mail settings
Submitter Supriya Kannery
Date July 30, 2012, 9:34 p.m.
Message ID <20120730213437.21536.87232.sendpatchset@skannery.in.ibm.com>
Download mbox | patch
Permalink /patch/174099/
State New
Headers show

Comments

Supriya Kannery - July 30, 2012, 9:34 p.m.
raw-posix driver changes for bdrv_reopen_xx functions to
safely reopen image files. Reopening of image files while 
changing hostcache dynamically is handled here.

Signed-off-by: Supriya Kannery <supriyak@linux.vnet.ibm.com>

---
Eric Blake - July 31, 2012, 5:17 p.m.
On 07/30/2012 03:34 PM, Supriya Kannery wrote:
> raw-posix driver changes for bdrv_reopen_xx functions to
> safely reopen image files. Reopening of image files while 
> changing hostcache dynamically is handled here.
> 
> Signed-off-by: Supriya Kannery <supriyak@linux.vnet.ibm.com>
> 
> ---
> Index: qemu/block/raw.c
> ===================================================================

> +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs,
> +                              int flags)
> +{
> +    BDRVRawReopenState *raw_rs = g_malloc0(sizeof(BDRVRawReopenState));
> +    BDRVRawState *s = bs->opaque;
> +    int ret = 0;
> +
> +    raw_rs->reopen_state.bs = bs;
> +
> +    /* stash state before reopen */
> +    raw_rs->stash_s = g_malloc0(sizeof(BDRVRawState));
> +    raw_stash_state(raw_rs->stash_s, s);
> +    s->fd = dup3(raw_rs->stash_s->fd, s->fd, O_CLOEXEC);

You called it out in your cover letter, but just pointing out that this
is one of the locations that needs a conditional fallback to
dup2/qemu_set_cloexec if dup3 and O_CLOEXEC are missing.

More importantly, though, you want this to be fcntl(F_DUP_CLOEXEC) and
NOT dup3, so that you are duplicating to the first available fd rather
than accidentally overwriting whatever s->fd happened to be on entry.
Also, where is your error checking that you didn't just assign s->fd to
-1?  It looks like your goal here is to stash a copy of the fd, so that
on failure you can then pivot over to your copy.

> +
> +    *prs = &(raw_rs->reopen_state);
> +
> +    /* Flags that can be set using fcntl */
> +    int fcntl_flags = BDRV_O_NOCACHE;

This variable name is misleading; fcntl_flags in my mind implies O_* not
BDRV_O_*, as we are not guaranteed that they are the same values.  Maybe
bdrv_flags is a better name.  Or are you trying to list the subset of
BDRV_O flags that can be changed via mapping to the appropriate O_ flag
during fcntl?

> +
> +    if ((bs->open_flags & ~fcntl_flags) == (flags & ~fcntl_flags)) {
> +        if ((flags & BDRV_O_NOCACHE)) {
> +            s->open_flags |= O_DIRECT;
> +        } else {
> +            s->open_flags &= ~O_DIRECT;
> +        }
> +        ret = fcntl_setfl(s->fd, s->open_flags);
> +    } else {
> +
> +        /* close and reopen using new flags */
> +        bs->drv->bdrv_close(bs);
> +        ret = bs->drv->bdrv_file_open(bs, bs->filename, flags);
> +    }

At any rate, in spite of the poor choice of naming for fcntl_flags, the
logic here looked correct - if the only BDRV_O_ bits that changed
between existing flags and the requested flags can be handled by fcntl,
then use fcntl to change them, otherwise open from scratch.

> +static void raw_reopen_abort(BlockDriverState *bs, BDRVReopenState *rs)
> +{
> +    BDRVRawReopenState *raw_rs;
> +    BDRVRawState *s = bs->opaque;
> +
> +    raw_rs = container_of(rs, BDRVRawReopenState, reopen_state);
> +
> +    /* revert to stashed state */
> +    if (s->fd != -1) {
> +        close(s->fd);
> +    }

At first, I worried that you needed to use dup3() here to restore s->fd
from the cached fd...

> +    raw_revert_state(s, raw_rs->stash_s);

then I read the code for raw_revert_state, where you are rewriting the
state to permanently use the dup'd copy rather than trying to revert
back to the old fd number.  As long as we are sure that all other points
in the code base always go through s->fd rather than caching the old
value, and therefore don't have a stale reference, then swapping the
struct instead of using dup3 to restore the exact fd value we previously
had should work.
Kevin Wolf - Aug. 1, 2012, 6:18 a.m.
Am 31.07.2012 19:17, schrieb Eric Blake:
> On 07/30/2012 03:34 PM, Supriya Kannery wrote:
>> raw-posix driver changes for bdrv_reopen_xx functions to
>> safely reopen image files. Reopening of image files while 
>> changing hostcache dynamically is handled here.
>>
>> Signed-off-by: Supriya Kannery <supriyak@linux.vnet.ibm.com>
>>
>> ---
>> Index: qemu/block/raw.c
>> ===================================================================
> 
>> +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs,
>> +                              int flags)
>> +{
>> +    BDRVRawReopenState *raw_rs = g_malloc0(sizeof(BDRVRawReopenState));
>> +    BDRVRawState *s = bs->opaque;
>> +    int ret = 0;
>> +
>> +    raw_rs->reopen_state.bs = bs;
>> +
>> +    /* stash state before reopen */
>> +    raw_rs->stash_s = g_malloc0(sizeof(BDRVRawState));
>> +    raw_stash_state(raw_rs->stash_s, s);
>> +    s->fd = dup3(raw_rs->stash_s->fd, s->fd, O_CLOEXEC);
> 
> You called it out in your cover letter, but just pointing out that this
> is one of the locations that needs a conditional fallback to
> dup2/qemu_set_cloexec if dup3 and O_CLOEXEC are missing.
> 
> More importantly, though, you want this to be fcntl(F_DUP_CLOEXEC) and
> NOT dup3, so that you are duplicating to the first available fd rather
> than accidentally overwriting whatever s->fd happened to be on entry.
> Also, where is your error checking that you didn't just assign s->fd to
> -1?  It looks like your goal here is to stash a copy of the fd, so that
> on failure you can then pivot over to your copy.

Doesn't Corey's fd passing series introduce a helper function for
F_DUP_CLOEXEC and emulation if it isn't support? Could be reused here then.

Kevin
Jeff Cody - Aug. 3, 2012, 10:32 p.m.
On 07/31/2012 01:17 PM, Eric Blake wrote:
> On 07/30/2012 03:34 PM, Supriya Kannery wrote:
>> raw-posix driver changes for bdrv_reopen_xx functions to
>> safely reopen image files. Reopening of image files while 
>> changing hostcache dynamically is handled here.
>>
>> Signed-off-by: Supriya Kannery <supriyak@linux.vnet.ibm.com>
>>
>> ---
>> Index: qemu/block/raw.c
>> ===================================================================
> 
>> +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs,
>> +                              int flags)
>> +{
>> +    BDRVRawReopenState *raw_rs = g_malloc0(sizeof(BDRVRawReopenState));
>> +    BDRVRawState *s = bs->opaque;
>> +    int ret = 0;
>> +
>> +    raw_rs->reopen_state.bs = bs;
>> +
>> +    /* stash state before reopen */
>> +    raw_rs->stash_s = g_malloc0(sizeof(BDRVRawState));
>> +    raw_stash_state(raw_rs->stash_s, s);
>> +    s->fd = dup3(raw_rs->stash_s->fd, s->fd, O_CLOEXEC);
> 
> You called it out in your cover letter, but just pointing out that this
> is one of the locations that needs a conditional fallback to
> dup2/qemu_set_cloexec if dup3 and O_CLOEXEC are missing.
> 
> More importantly, though, you want this to be fcntl(F_DUP_CLOEXEC) and
> NOT dup3, so that you are duplicating to the first available fd rather
> than accidentally overwriting whatever s->fd happened to be on entry.
> Also, where is your error checking that you didn't just assign s->fd to
> -1?  It looks like your goal here is to stash a copy of the fd, so that
> on failure you can then pivot over to your copy.

In addition, as it is written above will always assign -1 to s->fd, and
stash_s->fd every time, with errno "Bad file descriptor".

The function raw_stash_state() does not save s->fd, but instead sets the
stashed copy (raw_rs->stash_s->fd) to be -1.

Then this:
    s->fd = dup3(raw_rs->stash_s->fd, s->fd, O_CLOEXEC);

assigns s->fd to -1.

I don't think dup3() will work at all, because if -1 is used for one of the
file descriptors, that is EBADF, and if stash_s->fd == s->fd, that is
EINVAL.

So I agree, I think we definitely want fcntl(F_DUP_CLOEXEC) here.

> 
>> +
>> +    *prs = &(raw_rs->reopen_state);
>> +
>> +    /* Flags that can be set using fcntl */
>> +    int fcntl_flags = BDRV_O_NOCACHE;
> 
> This variable name is misleading; fcntl_flags in my mind implies O_* not
> BDRV_O_*, as we are not guaranteed that they are the same values.  Maybe
> bdrv_flags is a better name.  Or are you trying to list the subset of
> BDRV_O flags that can be changed via mapping to the appropriate O_ flag
> during fcntl?
> 
>> +
>> +    if ((bs->open_flags & ~fcntl_flags) == (flags & ~fcntl_flags)) {
>> +        if ((flags & BDRV_O_NOCACHE)) {
>> +            s->open_flags |= O_DIRECT;
>> +        } else {
>> +            s->open_flags &= ~O_DIRECT;
>> +        }
>> +        ret = fcntl_setfl(s->fd, s->open_flags);
>> +    } else {
>> +
>> +        /* close and reopen using new flags */
>> +        bs->drv->bdrv_close(bs);
>> +        ret = bs->drv->bdrv_file_open(bs, bs->filename, flags);
>> +    }
> 
> At any rate, in spite of the poor choice of naming for fcntl_flags, the
> logic here looked correct - if the only BDRV_O_ bits that changed
> between existing flags and the requested flags can be handled by fcntl,
> then use fcntl to change them, otherwise open from scratch.
> 
>> +static void raw_reopen_abort(BlockDriverState *bs, BDRVReopenState *rs)
>> +{
>> +    BDRVRawReopenState *raw_rs;
>> +    BDRVRawState *s = bs->opaque;
>> +
>> +    raw_rs = container_of(rs, BDRVRawReopenState, reopen_state);
>> +
>> +    /* revert to stashed state */
>> +    if (s->fd != -1) {
>> +        close(s->fd);
>> +    }
> 
> At first, I worried that you needed to use dup3() here to restore s->fd
> from the cached fd...
> 
>> +    raw_revert_state(s, raw_rs->stash_s);
> 
> then I read the code for raw_revert_state, where you are rewriting the
> state to permanently use the dup'd copy rather than trying to revert
> back to the old fd number.  As long as we are sure that all other points
> in the code base always go through s->fd rather than caching the old
> value, and therefore don't have a stale reference, then swapping the
> struct instead of using dup3 to restore the exact fd value we previously
> had should work.
>
Jeff Cody - Aug. 9, 2012, 4:26 a.m.
On 07/30/2012 05:34 PM, Supriya Kannery wrote:
> raw-posix driver changes for bdrv_reopen_xx functions to
> safely reopen image files. Reopening of image files while 
> changing hostcache dynamically is handled here.
> 
> Signed-off-by: Supriya Kannery <supriyak@linux.vnet.ibm.com>
> 
> ---
> Index: qemu/block/raw.c
> ===================================================================
> --- qemu.orig/block/raw.c
> +++ qemu/block/raw.c
> @@ -9,6 +9,22 @@ static int raw_open(BlockDriverState *bs
>      return 0;
>  }
>  
> +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs,
> +                              int flags)
> +{
> +    return bdrv_reopen_prepare(bs->file, prs, flags);
> +}
> +
> +static void raw_reopen_commit(BlockDriverState *bs, BDRVReopenState *rs)
> +{
> +    bdrv_reopen_commit(bs->file, rs);
> +}
> +
> +static void raw_reopen_abort(BlockDriverState *bs, BDRVReopenState *rs)
> +{
> +    bdrv_reopen_abort(bs->file, rs);
> +}
> +

I think the above should be stubs that do nothing (or, alternatively,
are not present at all), if we have the reopen() do the bs->file
reopen() (which is needed for qcow2 - see my comments for patch 1/9)

>  static int coroutine_fn raw_co_readv(BlockDriverState *bs, int64_t sector_num,
>                                       int nb_sectors, QEMUIOVector *qiov)
>  {
> @@ -113,6 +129,10 @@ static BlockDriver bdrv_raw = {
>      .instance_size      = 1,
>  
>      .bdrv_open          = raw_open,
> +    .bdrv_reopen_prepare
> +                        = raw_reopen_prepare,
> +    .bdrv_reopen_commit = raw_reopen_commit,
> +    .bdrv_reopen_abort  = raw_reopen_abort,
>      .bdrv_close         = raw_close,
>  
>      .bdrv_co_readv          = raw_co_readv,
> Index: qemu/block/raw-posix.c
> ===================================================================
> --- qemu.orig/block/raw-posix.c
> +++ qemu/block/raw-posix.c
> @@ -140,8 +140,15 @@ typedef struct BDRVRawState {
>  #endif
>  } BDRVRawState;
>  
> +typedef struct BDRVRawReopenState {
> +    BDRVReopenState reopen_state;
> +    BDRVRawState *stash_s;
> +} BDRVRawReopenState;
> +
>  static int fd_open(BlockDriverState *bs);
>  static int64_t raw_getlength(BlockDriverState *bs);
> +static void raw_stash_state(BDRVRawState *stashed_state, BDRVRawState *s);
> +static void raw_revert_state(BDRVRawState *s, BDRVRawState *stashed_state);
>  
>  #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>  static int cdrom_reopen(BlockDriverState *bs);
> @@ -283,6 +290,117 @@ static int raw_open(BlockDriverState *bs
>      return raw_open_common(bs, filename, flags, 0);
>  }
>  
> +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs,
> +                              int flags)
> +{
> +    BDRVRawReopenState *raw_rs = g_malloc0(sizeof(BDRVRawReopenState));
> +    BDRVRawState *s = bs->opaque;
> +    int ret = 0;
> +
> +    raw_rs->reopen_state.bs = bs;
> +
> +    /* stash state before reopen */
> +    raw_rs->stash_s = g_malloc0(sizeof(BDRVRawState));
> +    raw_stash_state(raw_rs->stash_s, s);
> +    s->fd = dup3(raw_rs->stash_s->fd, s->fd, O_CLOEXEC);
> +
> +    *prs = &(raw_rs->reopen_state);
> +
> +    /* Flags that can be set using fcntl */
> +    int fcntl_flags = BDRV_O_NOCACHE;
> +
> +    if ((bs->open_flags & ~fcntl_flags) == (flags & ~fcntl_flags)) {
> +        if ((flags & BDRV_O_NOCACHE)) {
> +            s->open_flags |= O_DIRECT;
> +        } else {
> +            s->open_flags &= ~O_DIRECT;
> +        }
> +        ret = fcntl_setfl(s->fd, s->open_flags);
> +    } else {
> +
> +        /* close and reopen using new flags */
> +        bs->drv->bdrv_close(bs);
> +        ret = bs->drv->bdrv_file_open(bs, bs->filename, flags);

This worries me, in that we close the BDS and then try
to reopen it.  At this point, if the bdrv_file_open() fails, we have no
recovery mechanism.

Maybe we could try to do something like a bdrv_file_open() into a new BDS
before we close, followed by something akin to a bdrv_swap() /
bdrv_append() if the new bdrv_file_open() was successful, and then do
the close on the original.  On failure, just abandon the new BDS.

> +    }
> +    return ret;
> +}
> +
> +static void raw_reopen_commit(BlockDriverState *bs, BDRVReopenState *rs)
> +{
> +    BDRVRawReopenState *raw_rs;
> +
> +    raw_rs = container_of(rs, BDRVRawReopenState, reopen_state);
> +
> +    /* clean up stashed state */
> +    close(raw_rs->stash_s->fd);
> +    g_free(raw_rs->stash_s);
> +    g_free(raw_rs);
> +}
> +
> +static void raw_reopen_abort(BlockDriverState *bs, BDRVReopenState *rs)
> +{
> +    BDRVRawReopenState *raw_rs;
> +    BDRVRawState *s = bs->opaque;
> +
> +    raw_rs = container_of(rs, BDRVRawReopenState, reopen_state);
> +
> +    /* revert to stashed state */
> +    if (s->fd != -1) {
> +        close(s->fd);
> +    }
> +    raw_revert_state(s, raw_rs->stash_s);
> +    g_free(raw_rs->stash_s);
> +    g_free(raw_rs);
> +}
> +
> +static void raw_stash_state(BDRVRawState *stashed_s, BDRVRawState *s)
> +{
> +    stashed_s->fd = -1;
> +    stashed_s->type = s->type;
> +    stashed_s->open_flags = s->open_flags;
> +#if defined(__linux__)
> +    /* linux floppy specific */
> +    stashed_s->fd_open_time = s->fd_open_time;
> +    stashed_s->fd_error_time = s->fd_error_time;
> +    stashed_s->fd_got_error = s->fd_got_error;
> +    stashed_s->fd_media_changed = s->fd_media_changed;
> +#endif
> +#ifdef CONFIG_LINUX_AIO
> +    stashed_s->use_aio = s->use_aio;
> +    stashed_s->aio_ctx = s->aio_ctx;
> +#endif
> +    stashed_s->aligned_buf = s->aligned_buf;
> +    stashed_s->aligned_buf_size = s->aligned_buf_size;
> +#ifdef CONFIG_XFS
> +    stashed_s->is_xfs = s->is_xfs;
> +#endif
> +
> +}
> +
> +static void raw_revert_state(BDRVRawState *s, BDRVRawState *stashed_s)
> +{
> +
> +    s->fd = stashed_s->fd;
> +    s->type = stashed_s->type;
> +    s->open_flags = stashed_s->open_flags;
> +#if defined(__linux__)
> +    /* linux floppy specific */
> +    s->fd_open_time = stashed_s->fd_open_time;
> +    s->fd_error_time = stashed_s->fd_error_time;
> +    s->fd_got_error = stashed_s->fd_got_error;
> +    s->fd_media_changed = stashed_s->fd_media_changed;
> +#endif
> +#ifdef CONFIG_LINUX_AIO
> +    s->use_aio = stashed_s->use_aio;
> +    s->aio_ctx = stashed_s->aio_ctx;
> +#endif
> +    s->aligned_buf = stashed_s->aligned_buf;
> +    s->aligned_buf_size = stashed_s->aligned_buf_size;
> +#ifdef CONFIG_XFS
> +    s->is_xfs = stashed_s->is_xfs;
> +#endif
> +}
> +
>  /* XXX: use host sector size if necessary with:
>  #ifdef DIOCGSECTORSIZE
>          {
> @@ -735,6 +853,9 @@ static BlockDriver bdrv_file = {
>      .instance_size = sizeof(BDRVRawState),
>      .bdrv_probe = NULL, /* no probe for protocols */
>      .bdrv_file_open = raw_open,
> +    .bdrv_reopen_prepare = raw_reopen_prepare,
> +    .bdrv_reopen_commit = raw_reopen_commit,
> +    .bdrv_reopen_abort = raw_reopen_abort,
>      .bdrv_close = raw_close,
>      .bdrv_create = raw_create,
>      .bdrv_co_discard = raw_co_discard,
> 
>
Corey Bryant - Aug. 10, 2012, 1:45 p.m.
On 07/30/2012 05:34 PM, Supriya Kannery wrote:
> raw-posix driver changes for bdrv_reopen_xx functions to
> safely reopen image files. Reopening of image files while
> changing hostcache dynamically is handled here.
>
> Signed-off-by: Supriya Kannery <supriyak@linux.vnet.ibm.com>
>
> ---
> Index: qemu/block/raw.c
> ===================================================================
> --- qemu.orig/block/raw.c
> +++ qemu/block/raw.c
> @@ -9,6 +9,22 @@ static int raw_open(BlockDriverState *bs
>       return 0;
>   }
>
> +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs,
> +                              int flags)
> +{
> +    return bdrv_reopen_prepare(bs->file, prs, flags);
> +}
> +
> +static void raw_reopen_commit(BlockDriverState *bs, BDRVReopenState *rs)
> +{
> +    bdrv_reopen_commit(bs->file, rs);
> +}
> +
> +static void raw_reopen_abort(BlockDriverState *bs, BDRVReopenState *rs)
> +{
> +    bdrv_reopen_abort(bs->file, rs);
> +}
> +
>   static int coroutine_fn raw_co_readv(BlockDriverState *bs, int64_t sector_num,
>                                        int nb_sectors, QEMUIOVector *qiov)
>   {
> @@ -113,6 +129,10 @@ static BlockDriver bdrv_raw = {
>       .instance_size      = 1,
>
>       .bdrv_open          = raw_open,
> +    .bdrv_reopen_prepare
> +                        = raw_reopen_prepare,
> +    .bdrv_reopen_commit = raw_reopen_commit,
> +    .bdrv_reopen_abort  = raw_reopen_abort,
>       .bdrv_close         = raw_close,
>
>       .bdrv_co_readv          = raw_co_readv,
> Index: qemu/block/raw-posix.c
> ===================================================================
> --- qemu.orig/block/raw-posix.c
> +++ qemu/block/raw-posix.c
> @@ -140,8 +140,15 @@ typedef struct BDRVRawState {
>   #endif
>   } BDRVRawState;
>
> +typedef struct BDRVRawReopenState {
> +    BDRVReopenState reopen_state;
> +    BDRVRawState *stash_s;
> +} BDRVRawReopenState;
> +
>   static int fd_open(BlockDriverState *bs);
>   static int64_t raw_getlength(BlockDriverState *bs);
> +static void raw_stash_state(BDRVRawState *stashed_state, BDRVRawState *s);
> +static void raw_revert_state(BDRVRawState *s, BDRVRawState *stashed_state);
>
>   #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>   static int cdrom_reopen(BlockDriverState *bs);
> @@ -283,6 +290,117 @@ static int raw_open(BlockDriverState *bs
>       return raw_open_common(bs, filename, flags, 0);
>   }
>
> +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs,
> +                              int flags)
> +{
> +    BDRVRawReopenState *raw_rs = g_malloc0(sizeof(BDRVRawReopenState));
> +    BDRVRawState *s = bs->opaque;
> +    int ret = 0;
> +
> +    raw_rs->reopen_state.bs = bs;
> +
> +    /* stash state before reopen */
> +    raw_rs->stash_s = g_malloc0(sizeof(BDRVRawState));
> +    raw_stash_state(raw_rs->stash_s, s);
> +    s->fd = dup3(raw_rs->stash_s->fd, s->fd, O_CLOEXEC);
> +
> +    *prs = &(raw_rs->reopen_state);
> +
> +    /* Flags that can be set using fcntl */
> +    int fcntl_flags = BDRV_O_NOCACHE;
> +
> +    if ((bs->open_flags & ~fcntl_flags) == (flags & ~fcntl_flags)) {
> +        if ((flags & BDRV_O_NOCACHE)) {
> +            s->open_flags |= O_DIRECT;
> +        } else {
> +            s->open_flags &= ~O_DIRECT;
> +        }
> +        ret = fcntl_setfl(s->fd, s->open_flags);
> +    } else {
> +
> +        /* close and reopen using new flags */
> +        bs->drv->bdrv_close(bs);
> +        ret = bs->drv->bdrv_file_open(bs, bs->filename, flags);

Will this allow the fdset refcount to get to zero?  I was hoping your 
patches would prevent that from happening.  Perhaps Kevin or Eric can 
weigh in.  qemu_open() increments the refcount for an fdset when an fd 
from it is used, and qemu_close() decrements it.  I think if you were 
able to perform the open before the close here that refcount wouldn't 
get to zero.
Supriya Kannery - Aug. 14, 2012, 10:53 a.m.
On 07/31/2012 10:47 PM, Eric Blake wrote:
> On 07/30/2012 03:34 PM, Supriya Kannery wrote:
>> +    s->fd = dup3(raw_rs->stash_s->fd, s->fd, O_CLOEXEC);
> 
> You called it out in your cover letter, but just pointing out that this
> is one of the locations that needs a conditional fallback to
> dup2/qemu_set_cloexec if dup3 and O_CLOEXEC are missing.
> 
> More importantly, though, you want this to be fcntl(F_DUP_CLOEXEC) and
> NOT dup3, so that you are duplicating to the first available fd rather
> than accidentally overwriting whatever s->fd happened to be on entry.
> Also, where is your error checking that you didn't just assign s->fd to
> -1?  It looks like your goal here is to stash a copy of the fd, so that
> on failure you can then pivot over to your copy.
> 
 
Will use fcntl(F_DUP_CLOEXEC) in updated patch.
 
>> +
>> +    *prs =&(raw_rs->reopen_state);
>> +
>> +    /* Flags that can be set using fcntl */
>> +    int fcntl_flags = BDRV_O_NOCACHE;
> 
> This variable name is misleading; fcntl_flags in my mind implies O_* not
> BDRV_O_*, as we are not guaranteed that they are the same values.  Maybe
> bdrv_flags is a better name.  Or are you trying to list the subset of
> BDRV_O flags that can be changed via mapping to the appropriate O_ flag
> during fcntl?
> 

  From the list of flags in bdrv->openflags (BDRV_O*), only few of them can be changed
using fcntl. So to identify those allowed subset, I am using fcntl_flags.

Excerpt from man fcntl for F_SETFL:
            On Linux this command can only  change  the  O_APPEND,  O_ASYNC,
              O_DIRECT, O_NOATIME, and O_NONBLOCK flags.

May be naming it fcntl_supported_flags would be better?

- thanks, Supriya
Supriya Kannery - Aug. 14, 2012, 11:13 a.m.
On 08/10/2012 07:15 PM, Corey Bryant wrote:
> 
> 
> On 07/30/2012 05:34 PM, Supriya Kannery wrote:

>> +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState 
>> **prs,
>> + int flags)
>> +{
>> + BDRVRawReopenState *raw_rs = g_malloc0(sizeof(BDRVRawReopenState));
>> + BDRVRawState *s = bs->opaque;
>> + int ret = 0;
>> +
>> + raw_rs->reopen_state.bs = bs;
>> +
>> + /* stash state before reopen */
>> + raw_rs->stash_s = g_malloc0(sizeof(BDRVRawState));
>> + raw_stash_state(raw_rs->stash_s, s);
>> + s->fd = dup3(raw_rs->stash_s->fd, s->fd, O_CLOEXEC);
>> +
>> + *prs = &(raw_rs->reopen_state);
>> +
>> + /* Flags that can be set using fcntl */
>> + int fcntl_flags = BDRV_O_NOCACHE;
>> +
>> + if ((bs->open_flags & ~fcntl_flags) == (flags & ~fcntl_flags)) {
>> + if ((flags & BDRV_O_NOCACHE)) {
>> + s->open_flags |= O_DIRECT;
>> + } else {
>> + s->open_flags &= ~O_DIRECT;
>> + }
>> + ret = fcntl_setfl(s->fd, s->open_flags);
>> + } else {
>> +
>> + /* close and reopen using new flags */
>> + bs->drv->bdrv_close(bs);
>> + ret = bs->drv->bdrv_file_open(bs, bs->filename, flags);
> 
> Will this allow the fdset refcount to get to zero? I was hoping your 
> patches would prevent that from happening. Perhaps Kevin or Eric can 
> weigh in. qemu_open() increments the refcount for an fdset when an fd 
> from it is used, and qemu_close() decrements it. I think if you were 
> able to perform the open before the close here that refcount wouldn't 
> get to zero.
> 

 Since we are duping the file descriptor before reaching this bdrv_close(),
refcount for fd won't become zero.

- thanks, Supriya
Kevin Wolf - Aug. 14, 2012, 11:35 a.m.
Am 14.08.2012 13:13, schrieb Supriya Kannery:
> On 08/10/2012 07:15 PM, Corey Bryant wrote:
>>
>>
>> On 07/30/2012 05:34 PM, Supriya Kannery wrote:
> 
>>> +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState 
>>> **prs,
>>> + int flags)
>>> +{
>>> + BDRVRawReopenState *raw_rs = g_malloc0(sizeof(BDRVRawReopenState));
>>> + BDRVRawState *s = bs->opaque;
>>> + int ret = 0;
>>> +
>>> + raw_rs->reopen_state.bs = bs;
>>> +
>>> + /* stash state before reopen */
>>> + raw_rs->stash_s = g_malloc0(sizeof(BDRVRawState));
>>> + raw_stash_state(raw_rs->stash_s, s);
>>> + s->fd = dup3(raw_rs->stash_s->fd, s->fd, O_CLOEXEC);
>>> +
>>> + *prs = &(raw_rs->reopen_state);
>>> +
>>> + /* Flags that can be set using fcntl */
>>> + int fcntl_flags = BDRV_O_NOCACHE;
>>> +
>>> + if ((bs->open_flags & ~fcntl_flags) == (flags & ~fcntl_flags)) {
>>> + if ((flags & BDRV_O_NOCACHE)) {
>>> + s->open_flags |= O_DIRECT;
>>> + } else {
>>> + s->open_flags &= ~O_DIRECT;
>>> + }
>>> + ret = fcntl_setfl(s->fd, s->open_flags);
>>> + } else {
>>> +
>>> + /* close and reopen using new flags */
>>> + bs->drv->bdrv_close(bs);
>>> + ret = bs->drv->bdrv_file_open(bs, bs->filename, flags);
>>
>> Will this allow the fdset refcount to get to zero? I was hoping your 
>> patches would prevent that from happening. Perhaps Kevin or Eric can 
>> weigh in. qemu_open() increments the refcount for an fdset when an fd 
>> from it is used, and qemu_close() decrements it. I think if you were 
>> able to perform the open before the close here that refcount wouldn't 
>> get to zero.
>>
> 
>  Since we are duping the file descriptor before reaching this bdrv_close(),
> refcount for fd won't become zero.

We need to use a qemu_dup() here, so that the fdset implementation can
keep track of the new fd.

Kevin

Patch

Index: qemu/block/raw.c
===================================================================
--- qemu.orig/block/raw.c
+++ qemu/block/raw.c
@@ -9,6 +9,22 @@  static int raw_open(BlockDriverState *bs
     return 0;
 }
 
+static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs,
+                              int flags)
+{
+    return bdrv_reopen_prepare(bs->file, prs, flags);
+}
+
+static void raw_reopen_commit(BlockDriverState *bs, BDRVReopenState *rs)
+{
+    bdrv_reopen_commit(bs->file, rs);
+}
+
+static void raw_reopen_abort(BlockDriverState *bs, BDRVReopenState *rs)
+{
+    bdrv_reopen_abort(bs->file, rs);
+}
+
 static int coroutine_fn raw_co_readv(BlockDriverState *bs, int64_t sector_num,
                                      int nb_sectors, QEMUIOVector *qiov)
 {
@@ -113,6 +129,10 @@  static BlockDriver bdrv_raw = {
     .instance_size      = 1,
 
     .bdrv_open          = raw_open,
+    .bdrv_reopen_prepare
+                        = raw_reopen_prepare,
+    .bdrv_reopen_commit = raw_reopen_commit,
+    .bdrv_reopen_abort  = raw_reopen_abort,
     .bdrv_close         = raw_close,
 
     .bdrv_co_readv          = raw_co_readv,
Index: qemu/block/raw-posix.c
===================================================================
--- qemu.orig/block/raw-posix.c
+++ qemu/block/raw-posix.c
@@ -140,8 +140,15 @@  typedef struct BDRVRawState {
 #endif
 } BDRVRawState;
 
+typedef struct BDRVRawReopenState {
+    BDRVReopenState reopen_state;
+    BDRVRawState *stash_s;
+} BDRVRawReopenState;
+
 static int fd_open(BlockDriverState *bs);
 static int64_t raw_getlength(BlockDriverState *bs);
+static void raw_stash_state(BDRVRawState *stashed_state, BDRVRawState *s);
+static void raw_revert_state(BDRVRawState *s, BDRVRawState *stashed_state);
 
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
 static int cdrom_reopen(BlockDriverState *bs);
@@ -283,6 +290,117 @@  static int raw_open(BlockDriverState *bs
     return raw_open_common(bs, filename, flags, 0);
 }
 
+static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs,
+                              int flags)
+{
+    BDRVRawReopenState *raw_rs = g_malloc0(sizeof(BDRVRawReopenState));
+    BDRVRawState *s = bs->opaque;
+    int ret = 0;
+
+    raw_rs->reopen_state.bs = bs;
+
+    /* stash state before reopen */
+    raw_rs->stash_s = g_malloc0(sizeof(BDRVRawState));
+    raw_stash_state(raw_rs->stash_s, s);
+    s->fd = dup3(raw_rs->stash_s->fd, s->fd, O_CLOEXEC);
+
+    *prs = &(raw_rs->reopen_state);
+
+    /* Flags that can be set using fcntl */
+    int fcntl_flags = BDRV_O_NOCACHE;
+
+    if ((bs->open_flags & ~fcntl_flags) == (flags & ~fcntl_flags)) {
+        if ((flags & BDRV_O_NOCACHE)) {
+            s->open_flags |= O_DIRECT;
+        } else {
+            s->open_flags &= ~O_DIRECT;
+        }
+        ret = fcntl_setfl(s->fd, s->open_flags);
+    } else {
+
+        /* close and reopen using new flags */
+        bs->drv->bdrv_close(bs);
+        ret = bs->drv->bdrv_file_open(bs, bs->filename, flags);
+    }
+    return ret;
+}
+
+static void raw_reopen_commit(BlockDriverState *bs, BDRVReopenState *rs)
+{
+    BDRVRawReopenState *raw_rs;
+
+    raw_rs = container_of(rs, BDRVRawReopenState, reopen_state);
+
+    /* clean up stashed state */
+    close(raw_rs->stash_s->fd);
+    g_free(raw_rs->stash_s);
+    g_free(raw_rs);
+}
+
+static void raw_reopen_abort(BlockDriverState *bs, BDRVReopenState *rs)
+{
+    BDRVRawReopenState *raw_rs;
+    BDRVRawState *s = bs->opaque;
+
+    raw_rs = container_of(rs, BDRVRawReopenState, reopen_state);
+
+    /* revert to stashed state */
+    if (s->fd != -1) {
+        close(s->fd);
+    }
+    raw_revert_state(s, raw_rs->stash_s);
+    g_free(raw_rs->stash_s);
+    g_free(raw_rs);
+}
+
+static void raw_stash_state(BDRVRawState *stashed_s, BDRVRawState *s)
+{
+    stashed_s->fd = -1;
+    stashed_s->type = s->type;
+    stashed_s->open_flags = s->open_flags;
+#if defined(__linux__)
+    /* linux floppy specific */
+    stashed_s->fd_open_time = s->fd_open_time;
+    stashed_s->fd_error_time = s->fd_error_time;
+    stashed_s->fd_got_error = s->fd_got_error;
+    stashed_s->fd_media_changed = s->fd_media_changed;
+#endif
+#ifdef CONFIG_LINUX_AIO
+    stashed_s->use_aio = s->use_aio;
+    stashed_s->aio_ctx = s->aio_ctx;
+#endif
+    stashed_s->aligned_buf = s->aligned_buf;
+    stashed_s->aligned_buf_size = s->aligned_buf_size;
+#ifdef CONFIG_XFS
+    stashed_s->is_xfs = s->is_xfs;
+#endif
+
+}
+
+static void raw_revert_state(BDRVRawState *s, BDRVRawState *stashed_s)
+{
+
+    s->fd = stashed_s->fd;
+    s->type = stashed_s->type;
+    s->open_flags = stashed_s->open_flags;
+#if defined(__linux__)
+    /* linux floppy specific */
+    s->fd_open_time = stashed_s->fd_open_time;
+    s->fd_error_time = stashed_s->fd_error_time;
+    s->fd_got_error = stashed_s->fd_got_error;
+    s->fd_media_changed = stashed_s->fd_media_changed;
+#endif
+#ifdef CONFIG_LINUX_AIO
+    s->use_aio = stashed_s->use_aio;
+    s->aio_ctx = stashed_s->aio_ctx;
+#endif
+    s->aligned_buf = stashed_s->aligned_buf;
+    s->aligned_buf_size = stashed_s->aligned_buf_size;
+#ifdef CONFIG_XFS
+    s->is_xfs = stashed_s->is_xfs;
+#endif
+}
+
 /* XXX: use host sector size if necessary with:
 #ifdef DIOCGSECTORSIZE
         {
@@ -735,6 +853,9 @@  static BlockDriver bdrv_file = {
     .instance_size = sizeof(BDRVRawState),
     .bdrv_probe = NULL, /* no probe for protocols */
     .bdrv_file_open = raw_open,
+    .bdrv_reopen_prepare = raw_reopen_prepare,
+    .bdrv_reopen_commit = raw_reopen_commit,
+    .bdrv_reopen_abort = raw_reopen_abort,
     .bdrv_close = raw_close,
     .bdrv_create = raw_create,
     .bdrv_co_discard = raw_co_discard,