diff mbox series

[2/5] migration: Fix race on qemu_file_shutdown()

Message ID 20220920223800.47467-3-peterx@redhat.com
State New
Series migration: Bug fixes (prepare for preempt-full)

Commit Message

Peter Xu Sept. 20, 2022, 10:37 p.m. UTC
In qemu_file_shutdown(), there's a possible race with the current order of
operations.  There are two major things to do:

  (1) Do real shutdown() (e.g. shutdown() syscall on socket)
  (2) Update qemufile's last_error

We must do (2) before (1) otherwise there can be a race condition like:

      page receiver                     other thread
      -------------                     ------------
      qemu_get_buffer()
                                        do shutdown()
        returns 0 (buffer all zero)
        (meanwhile we didn't check this retcode)
      try to detect IO error
        last_error==NULL, IO okay
      install ALL-ZERO page
                                        set last_error
      --> guest crash!

To fix this, we could also check the retval of qemu_get_buffer(), but not
all APIs can be properly checked and ultimately we still need to go back to
qemu_file_get_error().  E.g., qemu_get_byte() doesn't return an error.

Maybe some day a rework of the qemufile API is really needed, but for now
keep using qemu_file_get_error() and fix it by not allowing that race
condition to happen.  shutdown() is special here because its last_error is
emulated rather than produced by a failing syscall.  Real -EIO errors will
always be set when e.g. a sendmsg() failure triggers, so we won't miss
those; only shutdown() is tricky.

Cc: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/qemu-file.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

Comments

Dr. David Alan Gilbert Sept. 22, 2022, 3:43 p.m. UTC | #1
* Peter Xu (peterx@redhat.com) wrote:
> In qemu_file_shutdown(), there's a possible race if with current order of
> operation.  There're two major things to do:
> 
>   (1) Do real shutdown() (e.g. shutdown() syscall on socket)
>   (2) Update qemufile's last_error
> 
> We must do (2) before (1) otherwise there can be a race condition like:
> 
>       page receiver                     other thread
>       -------------                     ------------
>       qemu_get_buffer()
>                                         do shutdown()
>         returns 0 (buffer all zero)
>         (meanwhile we didn't check this retcode)
>       try to detect IO error
>         last_error==NULL, IO okay
>       install ALL-ZERO page
>                                         set last_error
>       --> guest crash!
> 
> To fix this, we can also check retval of qemu_get_buffer(), but not all
> APIs can be properly checked and ultimately we still need to go back to
> qemu_file_get_error().  E.g. qemu_get_byte() doesn't return error.
> 
> Maybe some day a rework of qemufile API is really needed, but for now keep
> using qemu_file_get_error() and fix it by not allowing that race condition
> to happen.  Here shutdown() is indeed special because the last_error was
> emulated.  For real -EIO errors it'll always be set when e.g. sendmsg()
> error triggers so we won't miss those ones, only shutdown() is a bit tricky
> here.
> 
> Cc: Daniel P. Berrange <berrange@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Oh that's kind of fun,


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/qemu-file.c | 27 ++++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 4f400c2e52..2d5f74ffc2 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -79,6 +79,30 @@ int qemu_file_shutdown(QEMUFile *f)
>      int ret = 0;
>  
>      f->shutdown = true;
> +
> +    /*
> +     * We must set qemufile error before the real shutdown(), otherwise
> +     * there can be a race window where we thought IO all went through
> +     * (because last_error==NULL) but actually IO has already stopped.
> +     *
> +     * Without correct ordering, the race can happen like this:
> +     *
> +     *      page receiver                     other thread
> +     *      -------------                     ------------
> +     *      qemu_get_buffer()
> +     *                                        do shutdown()
> +     *        returns 0 (buffer all zero)
> +     *        (we didn't check this retcode)
> +     *      try to detect IO error
> +     *        last_error==NULL, IO okay
> +     *      install ALL-ZERO page
> +     *                                        set last_error
> +     *      --> guest crash!
> +     */
> +    if (!f->last_error) {
> +        qemu_file_set_error(f, -EIO);
> +    }
> +
>      if (!qio_channel_has_feature(f->ioc,
>                                   QIO_CHANNEL_FEATURE_SHUTDOWN)) {
>          return -ENOSYS;
> @@ -88,9 +112,6 @@ int qemu_file_shutdown(QEMUFile *f)
>          ret = -EIO;
>      }
>  
> -    if (!f->last_error) {
> -        qemu_file_set_error(f, -EIO);
> -    }
>      return ret;
>  }
>  
> -- 
> 2.32.0
>
Daniel P. Berrangé Sept. 22, 2022, 4:58 p.m. UTC | #2
On Tue, Sep 20, 2022 at 06:37:57PM -0400, Peter Xu wrote:
> In qemu_file_shutdown(), there's a possible race if with current order of
> operation.  There're two major things to do:
> 
>   (1) Do real shutdown() (e.g. shutdown() syscall on socket)
>   (2) Update qemufile's last_error
> 
> We must do (2) before (1) otherwise there can be a race condition like:
> 
>       page receiver                     other thread
>       -------------                     ------------
>       qemu_get_buffer()
>                                         do shutdown()
>         returns 0 (buffer all zero)
>         (meanwhile we didn't check this retcode)
>       try to detect IO error
>         last_error==NULL, IO okay
>       install ALL-ZERO page
>                                         set last_error
>       --> guest crash!
> 
> To fix this, we can also check retval of qemu_get_buffer(), but not all
> APIs can be properly checked and ultimately we still need to go back to
> qemu_file_get_error().  E.g. qemu_get_byte() doesn't return error.
> 
> Maybe some day a rework of qemufile API is really needed, but for now keep
> using qemu_file_get_error() and fix it by not allowing that race condition
> to happen.  Here shutdown() is indeed special because the last_error was
> emulated.  For real -EIO errors it'll always be set when e.g. sendmsg()
> error triggers so we won't miss those ones, only shutdown() is a bit tricky
> here.

The ultimate answer really is to stop using QEMUFile entirely and just
do migration with the QIOChannel objects directly. The work I did in the
last cycle to remove the QEMUFileOps callbacks was another piece of the
puzzle in moving in that direction, by ensuring that every QEMUFile is
now backed by a QIOChannel. All QEMUFile is doing now is adding the I/O
caching layer and some convenience APIs for I/O operations.

So the next step would be to add a QIOChannelCached class that can wrap
over another QIOChannel, to add the I/O buffering functionality that
currently exists in QEMUFile. Once that's done, it'd be at the stage
where we could look at how to use QIOChannel APIs for VMstate. It would
likely involve wiring up an Error **errp parameter into a great many
places so we get synchronous error propagation instead of out-of-band
error checking, so a phased transition would need to be figured out.

With regards,
Daniel
Peter Xu Sept. 22, 2022, 7:37 p.m. UTC | #3
On Thu, Sep 22, 2022 at 05:58:25PM +0100, Daniel P. Berrangé wrote:
> On Tue, Sep 20, 2022 at 06:37:57PM -0400, Peter Xu wrote:
> > In qemu_file_shutdown(), there's a possible race if with current order of
> > operation.  There're two major things to do:
> > 
> >   (1) Do real shutdown() (e.g. shutdown() syscall on socket)
> >   (2) Update qemufile's last_error
> > 
> > We must do (2) before (1) otherwise there can be a race condition like:
> > 
> >       page receiver                     other thread
> >       -------------                     ------------
> >       qemu_get_buffer()
> >                                         do shutdown()
> >         returns 0 (buffer all zero)
> >         (meanwhile we didn't check this retcode)
> >       try to detect IO error
> >         last_error==NULL, IO okay
> >       install ALL-ZERO page
> >                                         set last_error
> >       --> guest crash!
> > 
> > To fix this, we can also check retval of qemu_get_buffer(), but not all
> > APIs can be properly checked and ultimately we still need to go back to
> > qemu_file_get_error().  E.g. qemu_get_byte() doesn't return error.
> > 
> > Maybe some day a rework of qemufile API is really needed, but for now keep
> > using qemu_file_get_error() and fix it by not allowing that race condition
> > to happen.  Here shutdown() is indeed special because the last_error was
> > emulated.  For real -EIO errors it'll always be set when e.g. sendmsg()
> > error triggers so we won't miss those ones, only shutdown() is a bit tricky
> > here.
> 
> The ultimate answer really is to stop using QEMUFile entirely and just
> do migration with the QIOChannel objects directly. The work I did in the
> last cycle to remove the QEMUFileOps callbacks was another piece of the
> puzzle in moving in that direction, by ensuring that every QEMUFile is
> now backed by a QIOChannel. All QEMUFile is doing now is adding the I/O
> caching layer and some convenience APIs for I/O operations.
> 
> So the next step would be to add a  QIOChannelCached class that can wrap
> over another QIOChannel, to add the I/O buffering functionality that
> currently exists in QEMUFile. Once that's done, it'd be at the stage
> where we could look at how to use QIOChannel APIs for VMstate. It would
> likely involve wiring up an Error **errp  parameter into a great many
> places so we get synchronous error propagation instead of out-of-band
> error checking, so a phased transition would need to be figured out.

Yes, Error** sounds very good to have.

So far I'm not satisfied with the qemufile API, mainly because of that
error handling, especially on the *get() interfaces.

Besides that, do you have anything else in mind that would make
QIOChannelCached better than qemufile (e.g. on how we do caching)?

Thanks,
Daniel P. Berrangé Sept. 23, 2022, 7:14 a.m. UTC | #4
On Thu, Sep 22, 2022 at 03:37:11PM -0400, Peter Xu wrote:
> On Thu, Sep 22, 2022 at 05:58:25PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Sep 20, 2022 at 06:37:57PM -0400, Peter Xu wrote:
> > > In qemu_file_shutdown(), there's a possible race if with current order of
> > > operation.  There're two major things to do:
> > > 
> > >   (1) Do real shutdown() (e.g. shutdown() syscall on socket)
> > >   (2) Update qemufile's last_error
> > > 
> > > We must do (2) before (1) otherwise there can be a race condition like:
> > > 
> > >       page receiver                     other thread
> > >       -------------                     ------------
> > >       qemu_get_buffer()
> > >                                         do shutdown()
> > >         returns 0 (buffer all zero)
> > >         (meanwhile we didn't check this retcode)
> > >       try to detect IO error
> > >         last_error==NULL, IO okay
> > >       install ALL-ZERO page
> > >                                         set last_error
> > >       --> guest crash!
> > > 
> > > To fix this, we can also check retval of qemu_get_buffer(), but not all
> > > APIs can be properly checked and ultimately we still need to go back to
> > > qemu_file_get_error().  E.g. qemu_get_byte() doesn't return error.
> > > 
> > > Maybe some day a rework of qemufile API is really needed, but for now keep
> > > using qemu_file_get_error() and fix it by not allowing that race condition
> > > to happen.  Here shutdown() is indeed special because the last_error was
> > > emulated.  For real -EIO errors it'll always be set when e.g. sendmsg()
> > > error triggers so we won't miss those ones, only shutdown() is a bit tricky
> > > here.
> > 
> > The ultimate answer really is to stop using QEMUFile entirely and just
> > do migration with the QIOChannel objects directly. The work I did in the
> > last cycle to remove the QEMUFileOps callbacks was another piece of the
> > puzzle in moving in that direction, by ensuring that every QEMUFile is
> > now backed by a QIOChannel. All QEMUFile is doing now is adding the I/O
> > caching layer and some convenience APIs for I/O operations.
> > 
> > So the next step would be to add a  QIOChannelCached class that can wrap
> > over another QIOChannel, to add the I/O buffering functionality that
> > currently exists in QEMUFile. Once that's done, it'd be at the stage
> > where we could look at how to use QIOChannel APIs for VMstate. It would
> > likely involve wiring up an Error **errp  parameter into a great many
> > places so we get synchronous error propagation instead of out-of-band
> > error checking, so a phased transition would need to be figured out.
> 
> Yes, Error** sounds very good to have.
> 
> So far I'm not satisfied with qemufile api majorly because of that error
> handling, especially on *get() interfaces.
> 
> Besides that, do you have anything else in mind that would make
> QIOChannelCached better than qemufile (e.g. on how we do caching)?

Depends what you mean by better?  I think the caching code would be
a bit easier to understand, because QEMUFile gets a bit confusing
about which logic is used for the read side and which for the
write side.

With regards,
Daniel
Peter Xu Sept. 23, 2022, 6:27 p.m. UTC | #5
On Fri, Sep 23, 2022 at 08:14:25AM +0100, Daniel P. Berrangé wrote:
> > Besides that, do you have anything else in mind that would make
> > QIOChannelCached better than qemufile (e.g. on how we do caching)?
> 
> Depends what you mean by better ? I think the caching code would be
> a bit easier to understand, because QEMUFile gets a bit confusing
> about which logic is used for read side and which is used for the
> write side.

I see, looking forward to it.  Thanks,

Patch

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 4f400c2e52..2d5f74ffc2 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -79,6 +79,30 @@  int qemu_file_shutdown(QEMUFile *f)
     int ret = 0;
 
     f->shutdown = true;
+
+    /*
+     * We must set qemufile error before the real shutdown(), otherwise
+     * there can be a race window where we thought IO all went through
+     * (because last_error==NULL) but actually IO has already stopped.
+     *
+     * Without correct ordering, the race can happen like this:
+     *
+     *      page receiver                     other thread
+     *      -------------                     ------------
+     *      qemu_get_buffer()
+     *                                        do shutdown()
+     *        returns 0 (buffer all zero)
+     *        (we didn't check this retcode)
+     *      try to detect IO error
+     *        last_error==NULL, IO okay
+     *      install ALL-ZERO page
+     *                                        set last_error
+     *      --> guest crash!
+     */
+    if (!f->last_error) {
+        qemu_file_set_error(f, -EIO);
+    }
+
     if (!qio_channel_has_feature(f->ioc,
                                  QIO_CHANNEL_FEATURE_SHUTDOWN)) {
         return -ENOSYS;
@@ -88,9 +112,6 @@  int qemu_file_shutdown(QEMUFile *f)
         ret = -EIO;
     }
 
-    if (!f->last_error) {
-        qemu_file_set_error(f, -EIO);
-    }
     return ret;
 }