diff mbox series

[3/8] xen: defer call to xen_restrict until just before os_setup_post

Message ID 1507564902-9000-4-git-send-email-ian.jackson@eu.citrix.com
State New
Headers show
Series xen: xen-domid-restrict improvements | expand

Commit Message

Ian Jackson Oct. 9, 2017, 4:01 p.m. UTC
We need to restrict *all* the control fds that qemu opens.  Looking in
/proc/PID/fd shows there are many; their allocation seems scattered
throughout Xen support code in qemu.

We must postpone the restrict call until roughly the same time as qemu
changes its uid, chroots (if applicable), and so on.

There doesn't seem to be an appropriate hook already.  The RunState
change hook fires at different times depending on exactly what mode
qemu is operating in.

And it appears that no-one but the Xen code wants a hook at this phase
of execution.  So, introduce a bare call to a new function
xen_setup_post, just before os_setup_post.  Also provide the
appropriate stub for when Xen compilation is disabled.

We do the restriction before rather than after os_setup_post, because
xen_restrict may need to open /dev/null, and os_setup_post might have
called chroot.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v3: Do xen_setup_post just before, not just after, os_setup_post,
    to improve interaction with chroot.  Thanks to Ross Lagerwall.
---
 hw/i386/xen/xen-hvm.c   |  8 --------
 hw/xen/xen-common.c     | 13 +++++++++++++
 include/sysemu/sysemu.h |  2 ++
 stubs/xen-hvm.c         |  5 +++++
 vl.c                    |  1 +
 5 files changed, 21 insertions(+), 8 deletions(-)

Comments

Anthony PERARD Oct. 10, 2017, 11:48 a.m. UTC | #1
On Mon, Oct 09, 2017 at 05:01:37PM +0100, Ian Jackson wrote:
> We need to restrict *all* the control fds that qemu opens.  Looking in
> /proc/PID/fd shows there are many; their allocation seems scattered
> throughout Xen support code in qemu.
> 
> We must postpone the restrict call until roughly the same time as qemu
> changes its uid, chroots (if applicable), and so on.
> 
> There doesn't seem to be an appropriate hook already.  The RunState
> change hook fires at different times depending on exactly what mode
> qemu is operating in.
> 
> And it appears that no-one but the Xen code wants a hook at this phase
> of execution.  So, introduce a bare call to a new function
> xen_setup_post, just before os_setup_post.  Also provide the
> appropriate stub for when Xen compilation is disabled.
> 
> We do the restriction before rather than after os_setup_post, because
> xen_restrict may need to open /dev/null, and os_setup_post might have
> called chroot.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Ross Lagerwall Oct. 13, 2017, 8:37 a.m. UTC | #2
On 10/09/2017 05:01 PM, Ian Jackson wrote:
> We need to restrict *all* the control fds that qemu opens.  Looking in
> /proc/PID/fd shows there are many; their allocation seems scattered
> throughout Xen support code in qemu.
> 
> We must postpone the restrict call until roughly the same time as qemu
> changes its uid, chroots (if applicable), and so on.
> 
> There doesn't seem to be an appropriate hook already.  The RunState
> change hook fires at different times depending on exactly what mode
> qemu is operating in.
> 
> And it appears that no-one but the Xen code wants a hook at this phase
> of execution.  So, introduce a bare call to a new function
> xen_setup_post, just before os_setup_post.  Also provide the
> appropriate stub for when Xen compilation is disabled.
> 
> We do the restriction before rather than after os_setup_post, because
> xen_restrict may need to open /dev/null, and os_setup_post might have
> called chroot.
> 
This works for normally starting a VM but doesn't seem to work when 
resuming/migrating.

Here is the order of selected operations when starting a VM normally:
     VNC server running on 127.0.0.1:5901
     xen_change_state_handler()
     xenstore_record_dm_state: running
     xen_setup_post()
     xentoolcore_restrict_all: rc = 0
     os_setup_post()
     main_loop()

Here is the order of selected operations when starting QEMU with 
-incoming fd:... :
     VNC server running on 127.0.0.1:5902
     migration_fd_incoming()
     xen_setup_post()
     xentoolcore_restrict_all: rc = 0
     os_setup_post()
     main_loop()
     migration_set_incoming_channel()
     migrate_set_state()
     xen_change_state_handler()
     xenstore_record_dm_state: running
     error recording dm state
     qemu exited with code 1

The issue is that QEMU needs xenstore access to write "running" but this 
is after it has already been restricted. Moving xen_setup_post() into 
xen_change_state_handler() works fine. The only issue is that in the 
migration case, it executes after os_setup_post() so QEMU might be 
chrooted in which case opening /dev/null to restrict fds doesn't work 
(unless its new root has a /dev/null).
Andrew Cooper Oct. 13, 2017, 8:59 a.m. UTC | #3
On 13/10/2017 09:37, Ross Lagerwall wrote:
> On 10/09/2017 05:01 PM, Ian Jackson wrote:
>> We need to restrict *all* the control fds that qemu opens.  Looking in
>> /proc/PID/fd shows there are many; their allocation seems scattered
>> throughout Xen support code in qemu.
>>
>> We must postpone the restrict call until roughly the same time as qemu
>> changes its uid, chroots (if applicable), and so on.
>>
>> There doesn't seem to be an appropriate hook already.  The RunState
>> change hook fires at different times depending on exactly what mode
>> qemu is operating in.
>>
>> And it appears that no-one but the Xen code wants a hook at this phase
>> of execution.  So, introduce a bare call to a new function
>> xen_setup_post, just before os_setup_post.  Also provide the
>> appropriate stub for when Xen compilation is disabled.
>>
>> We do the restriction before rather than after os_setup_post, because
>> xen_restrict may need to open /dev/null, and os_setup_post might have
>> called chroot.
>>
> This works for normally starting a VM but doesn't seem to work when
> resuming/migrating.
>
> Here is the order of selected operations when starting a VM normally:
>     VNC server running on 127.0.0.1:5901
>     xen_change_state_handler()
>     xenstore_record_dm_state: running
>     xen_setup_post()
>     xentoolcore_restrict_all: rc = 0
>     os_setup_post()
>     main_loop()
>
> Here is the order of selected operations when starting QEMU with
> -incoming fd:... :
>     VNC server running on 127.0.0.1:5902
>     migration_fd_incoming()
>     xen_setup_post()
>     xentoolcore_restrict_all: rc = 0
>     os_setup_post()
>     main_loop()
>     migration_set_incoming_channel()
>     migrate_set_state()
>     xen_change_state_handler()
>     xenstore_record_dm_state: running
>     error recording dm state
>     qemu exited with code 1
>
> The issue is that QEMU needs xenstore access to write "running" but
> this is after it has already been restricted. Moving xen_setup_post()
> into xen_change_state_handler() works fine. The only issue is that in
> the migration case, it executes after os_setup_post() so QEMU might be
> chrooted in which case opening /dev/null to restrict fds doesn't work
> (unless its new root has a /dev/null).
>

Wasn't the agreement in the end to remove all use of xenstore from the
device mode?  This running notification can and should be QMP, at which
point we break a causal dependency.

For safety reasons, qemu needs to have restricted/dropped/etc all
permissions before it looks at a single byte of incoming migration data,
to protect against buggy or malicious alterations to the migration stream.

~Andrew
Paul Durrant Oct. 13, 2017, 9:06 a.m. UTC | #4
> -----Original Message-----

> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of

> Andrew Cooper

> Sent: 13 October 2017 10:00

> To: Ross Lagerwall <ross.lagerwall@citrix.com>; Ian Jackson

> <Ian.Jackson@citrix.com>; qemu-devel@nongnu.org

> Cc: Anthony Perard <anthony.perard@citrix.com>; Juergen Gross

> <jgross@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-

> devel@lists.xenproject.org

> Subject: Re: [Xen-devel] [PATCH 3/8] xen: defer call to xen_restrict until just

> before os_setup_post

> 

> On 13/10/2017 09:37, Ross Lagerwall wrote:

> > On 10/09/2017 05:01 PM, Ian Jackson wrote:

> >> We need to restrict *all* the control fds that qemu opens.  Looking in

> >> /proc/PID/fd shows there are many; their allocation seems scattered

> >> throughout Xen support code in qemu.

> >>

> >> We must postpone the restrict call until roughly the same time as qemu

> >> changes its uid, chroots (if applicable), and so on.

> >>

> >> There doesn't seem to be an appropriate hook already.  The RunState

> >> change hook fires at different times depending on exactly what mode

> >> qemu is operating in.

> >>

> >> And it appears that no-one but the Xen code wants a hook at this phase

> >> of execution.  So, introduce a bare call to a new function

> >> xen_setup_post, just before os_setup_post.  Also provide the

> >> appropriate stub for when Xen compilation is disabled.

> >>

> >> We do the restriction before rather than after os_setup_post, because

> >> xen_restrict may need to open /dev/null, and os_setup_post might have

> >> called chroot.

> >>

> > This works for normally starting a VM but doesn't seem to work when

> > resuming/migrating.

> >

> > Here is the order of selected operations when starting a VM normally:

> >     VNC server running on 127.0.0.1:5901

> >     xen_change_state_handler()

> >     xenstore_record_dm_state: running

> >     xen_setup_post()

> >     xentoolcore_restrict_all: rc = 0

> >     os_setup_post()

> >     main_loop()

> >

> > Here is the order of selected operations when starting QEMU with

> > -incoming fd:... :

> >     VNC server running on 127.0.0.1:5902

> >     migration_fd_incoming()

> >     xen_setup_post()

> >     xentoolcore_restrict_all: rc = 0

> >     os_setup_post()

> >     main_loop()

> >     migration_set_incoming_channel()

> >     migrate_set_state()

> >     xen_change_state_handler()

> >     xenstore_record_dm_state: running

> >     error recording dm state

> >     qemu exited with code 1

> >

> > The issue is that QEMU needs xenstore access to write "running" but

> > this is after it has already been restricted. Moving xen_setup_post()

> > into xen_change_state_handler() works fine. The only issue is that in

> > the migration case, it executes after os_setup_post() so QEMU might be

> > chrooted in which case opening /dev/null to restrict fds doesn't work

> > (unless its new root has a /dev/null).

> >

> 

> Wasn't the agreement in the end to remove all use of xenstore from the

> device mode?  This running notification can and should be QMP, at which

> point we break a causal dependency.

> 


Yes, that was the agreement. One problem is that there is not yet adequate separation in either QEMU's common and pv/hvm init routines to do this yet. Nor, I believe, is there support in libxl to spawn separate xenpv and xenfv instances of QEMU for the same guest.

  Paul

> For safety reasons, qemu needs to have restricted/dropped/etc all

> permissions before it looks at a single byte of incoming migration data,

> to protect against buggy or malicious alterations to the migration stream.

> 

> ~Andrew

> 

> _______________________________________________

> Xen-devel mailing list

> Xen-devel@lists.xen.org

> https://lists.xen.org/xen-devel
Ian Jackson Oct. 13, 2017, 10:43 a.m. UTC | #5
Ross Lagerwall writes ("Re: [PATCH 3/8] xen: defer call to xen_restrict until just before os_setup_post"):
> This works for normally starting a VM but doesn't seem to work when 
> resuming/migrating.
> 
> Here is the order of selected operations when starting a VM normally:
>      VNC server running on 127.0.0.1:5901
>      xen_change_state_handler()
>      xenstore_record_dm_state: running
>      xen_setup_post()
>      xentoolcore_restrict_all: rc = 0
>      os_setup_post()
>      main_loop()
> 
> Here is the order of selected operations when starting QEMU with 
> -incoming fd:... :
>      VNC server running on 127.0.0.1:5902
>      migration_fd_incoming()
>      xen_setup_post()
>      xentoolcore_restrict_all: rc = 0
>      os_setup_post()
>      main_loop()
>      migration_set_incoming_channel()
>      migrate_set_state()
>      xen_change_state_handler()
>      xenstore_record_dm_state: running
>      error recording dm state
>      qemu exited with code 1
> 
> The issue is that QEMU needs xenstore access to write "running" but this 
> is after it has already been restricted. Moving xen_setup_post() into 
> xen_change_state_handler() works fine. The only issue is that in the 
> migration case, it executes after os_setup_post() so QEMU might be 
> chrooted in which case opening /dev/null to restrict fds doesn't work 
> (unless its new root has a /dev/null).

Thanks for the extensive diagnosis.

We do in fact want the restriction to occur before the migration
stream is read.  This is because we are trying to defend against a
guest which can exploit a bug in qemu.  That means that the sending
qemu must be assumed to be compromised.  In this context I don't think
qemu's migration stream receiver can be regarded as hardened against
hostile input; it's far too complicated, even if efforts have been
made in that direction (I haven't checked).  So to avoid the receiving
qemu being compromised while still unrestricted, we should restrict
before starting to read the migration stream.

The correct fix is to use a different technique to notify the
toolstack that qemu is up and running.  That obviously requires
changes on the Xen tools side.  I will look into that for the Xen 4.11
release cycle.

Ian.
diff mbox series

Patch

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index d9ccd5d..7b60ec6 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1254,14 +1254,6 @@  void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
         goto err;
     }
 
-    if (xen_domid_restrict) {
-        rc = xen_restrict(xen_domid);
-        if (rc < 0) {
-            error_report("failed to restrict: error %d", errno);
-            goto err;
-        }
-    }
-
     xen_create_ioreq_server(xen_domid, &state->ioservid);
 
     state->exit.notify = xen_exit_notifier;
diff --git a/hw/xen/xen-common.c b/hw/xen/xen-common.c
index 632a938..4056420 100644
--- a/hw/xen/xen-common.c
+++ b/hw/xen/xen-common.c
@@ -117,6 +117,19 @@  static void xen_change_state_handler(void *opaque, int running,
     }
 }
 
+void xen_setup_post(void)
+{
+    int rc;
+
+    if (xen_domid_restrict) {
+        rc = xen_restrict(xen_domid);
+        if (rc < 0) {
+            perror("xen: failed to restrict");
+            exit(1);
+        }
+    }
+}
+
 static int xen_init(MachineState *ms)
 {
     xen_xc = xc_interface_open(0, 0, 0);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index b213696..b064a55 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -93,6 +93,8 @@  void qemu_remove_machine_init_done_notifier(Notifier *notify);
 
 void qemu_announce_self(void);
 
+void xen_setup_post(void);
+
 extern int autostart;
 
 typedef enum {
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 3ca6c51..9701feb 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -13,6 +13,7 @@ 
 #include "hw/xen/xen.h"
 #include "exec/memory.h"
 #include "qmp-commands.h"
+#include "sysemu/sysemu.h"
 
 int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
 {
@@ -61,3 +62,7 @@  void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
 {
 }
+
+void xen_setup_post(void)
+{
+}
diff --git a/vl.c b/vl.c
index fb1f05b..ca06553 100644
--- a/vl.c
+++ b/vl.c
@@ -4792,6 +4792,7 @@  int main(int argc, char **argv, char **envp)
         vm_start();
     }
 
+    xen_setup_post();
     os_setup_post();
 
     main_loop();