diff mbox

[RFC] net: Peer with existing NIC in netdev_add

Message ID 1351082961-13628-1-git-send-email-stefanha@redhat.com
State New
Headers show

Commit Message

Stefan Hajnoczi Oct. 24, 2012, 12:49 p.m. UTC
Allow netdev_del followed by netdev_add to re-peer a NIC and its netdev:

  (qemu) info network
  virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
   \ netdev0: type=user,net=10.0.2.0,restrict=off

  (qemu) netdev_del netdev0

  (qemu) netdev_add socket,id=netdev0,listen=:1234

  (qemu) info network
  virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
   \ netdev0: type=socket,

This makes it possible to switch netdev while the guest is running.  It
is not necessary to reset the NIC.

Note that the NIC's link goes down in netdev_del and back up again in
netdev_add.  Therefore the guest becomes aware that the network has
changed, although this depends on the emulated NIC model providing link
status change interrupts.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
This patch addresses the netdev hotplug issue that has been discussed
previously on the mailing list:

  https://www.redhat.com/archives/libvir-list/2012-October/msg00629.html

Initially I thought we need to add a new monitor command.  After looking at the
code it seems that simply allowing netdev_del followed by netdev_add (as
suggested by Laine) is the simplest way to solve the problem.  No new monitor
commands are necessary.

The current code is not safe with virtio-net + tap, where the virtio-net device
reports offload features from the tap device to the guest.  If you try to
switch to a -netdev user or -netdev socket device, those offload capabilities
are not present and network communication will fail.

Laine: Please try this out and see if it works for your use case.

 net.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

Comments

Michael S. Tsirkin Oct. 30, 2012, 3:24 p.m. UTC | #1
On Wed, Oct 24, 2012 at 02:49:21PM +0200, Stefan Hajnoczi wrote:
> Allow netdev_del followed by netdev_add to re-peer a NIC and its netdev:
> 
>   (qemu) info network
>   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
>    \ netdev0: type=user,net=10.0.2.0,restrict=off
> 
>   (qemu) netdev_del netdev0
> 
>   (qemu) netdev_add socket,id=netdev0,listen=:1234
> 
>   (qemu) info network
>   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
>    \ netdev0: type=socket,
> 
> This makes it possible to switch netdev while the guest is running.  It
> is not necessary to reset the NIC.
> 
> Note that the NIC's link goes down in netdev_del and back up again in
> netdev_add.  Therefore the guest becomes aware that the network has
> changed, although this depends on the emulated NIC model providing link
> status change interrupts.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

I'd be surprised if this patch worked when one or both backends are tap.
tap supports offloads but slirp doesn't, since guest
probes offloads at startup, it assumes it can use offloads.
We also program tap during device operation e.g. on set features.
vhost operation could also be interesting, have not looked into it.

> ---
> This patch addresses the netdev hotplug issue that has been discussed
> previously on the mailing list:
> 
>   https://www.redhat.com/archives/libvir-list/2012-October/msg00629.html

I actually think both we and libvirt should address (2):
	(2) because qemu has no asynchronous event to notify libvirt
	when the guest's PCI detach has actually completed, I have to stick in
	an arbitrary call to sleep() which is generally *way* too long, but may
	be too short in some cases of extremely high load.

libvirt should verify device is gone after calling sleep,
if not yet gone - sleep again.

qemu should report an event when device is ejected.

> Initially I thought we need to add a new monitor command.  After looking at the
> code it seems that simply allowing netdev_del followed by netdev_add (as
> suggested by Laine) is the simplest way to solve the problem.  No new monitor
> commands are necessary.
> 
> The current code is not safe with virtio-net + tap, where the virtio-net device
> reports offload features from the tap device to the guest.  If you try to
> switch to a -netdev user or -netdev socket device, those offload capabilities
> are not present and network communication will fail.
> 
> Laine: Please try this out and see if it works for your use case.
> 
>  net.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/net.c b/net.c
> index ae4bc0d..d7c60b0 100644
> --- a/net.c
> +++ b/net.c
> @@ -619,6 +619,19 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
>          [NET_CLIENT_OPTIONS_KIND_HUBPORT]   = net_init_hubport,
>  };
>  
> +static NICState *net_find_nic_for_peername(const char *name)
> +{
> +    NetClientState *nc;
> +    QTAILQ_FOREACH(nc, &net_clients, next) {
> +        if (nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC) {
> +            continue;
> +        }
> +        if (nc->peer && strcmp(nc->peer->name, name) == 0) {
> +            return (NICState *)nc;
> +        }
> +    }
> +    return NULL;
> +}
>  
>  static int net_client_init1(const void *object, int is_netdev, Error **errp)
>  {
> @@ -678,6 +691,25 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
>                        NetClientOptionsKind_lookup[opts->kind]);
>              return -1;
>          }
> +
> +        /* Peer with existing nic if netdev name matches */
> +        if (is_netdev && opts->kind != NET_CLIENT_OPTIONS_KIND_NIC) {
> +            NICState *nic = net_find_nic_for_peername(name);
> +            if (nic && nic->peer_deleted) {
> +                NetClientState *netdev = qemu_find_netdev(name);
> +
> +                /* TODO vnet_hdr compatibility check */
> +
> +                qemu_free_net_client(nic->nc.peer);
> +                nic->nc.peer = netdev;
> +                netdev->peer = &nic->nc;
> +                nic->peer_deleted = false;
> +                nic->nc.link_down = false;
> +                if (nic->nc.info->link_status_changed) {
> +                    nic->nc.info->link_status_changed(&nic->nc);
> +                }
> +            }
> +        }
>      }
>      return 0;
>  }
> -- 
> 1.7.11.7
>
Stefan Hajnoczi Oct. 31, 2012, 8:07 a.m. UTC | #2
On Tue, Oct 30, 2012 at 05:24:06PM +0200, Michael S. Tsirkin wrote:
> On Wed, Oct 24, 2012 at 02:49:21PM +0200, Stefan Hajnoczi wrote:
> > Allow netdev_del followed by netdev_add to re-peer a NIC and its netdev:
> > 
> >   (qemu) info network
> >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> >    \ netdev0: type=user,net=10.0.2.0,restrict=off
> > 
> >   (qemu) netdev_del netdev0
> > 
> >   (qemu) netdev_add socket,id=netdev0,listen=:1234
> > 
> >   (qemu) info network
> >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> >    \ netdev0: type=socket,
> > 
> > This makes it possible to switch netdev while the guest is running.  It
> > is not necessary to reset the NIC.
> > 
> > Note that the NIC's link goes down in netdev_del and back up again in
> > netdev_add.  Therefore the guest becomes aware that the network has
> > changed, although this depends on the emulated NIC model providing link
> > status change interrupts.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> I'd be surprised if this patch worked when one or both backends are tap.
> tap supports offloads but slirp doesn't, since guest
> probes offloads at startup, it assumes it can use offloads.
> We also program tap during device operation e.g. on set features.
> vhost operation could also be interesting, have not looked into it.

Yes, I left a TODO in the RFC patch and described the issue below.
We'll have to reject incompatible netdevs.

At this stage I'd like to see if this approach works for Laine.  If it
looks good I'll figure out how we can make vhost and offload checks.

> > ---
> > This patch addresses the netdev hotplug issue that has been discussed
> > previously on the mailing list:
> > 
> >   https://www.redhat.com/archives/libvir-list/2012-October/msg00629.html
> 
> I actually think both we and libvirt should address (2):
> 	(2) because qemu has no asynchronous event to notify libvirt
> 	when the guest's PCI detach has actually completed, I have to stick in
> 	an arbitrary call to sleep() which is generally *way* too long, but may
> 	be too short in some cases of extremely high load.
> 
> libvirt should verify device is gone after calling sleep,
> if not yet gone - sleep again.
> 
> qemu should report an event when device is ejected.

I agree.  The hotplug implementation and how it is driven by QMP clients
is really ugly today.

Stefan
Michael S. Tsirkin Oct. 31, 2012, 8:57 a.m. UTC | #3
On Wed, Oct 31, 2012 at 09:07:27AM +0100, Stefan Hajnoczi wrote:
> On Tue, Oct 30, 2012 at 05:24:06PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Oct 24, 2012 at 02:49:21PM +0200, Stefan Hajnoczi wrote:
> > > Allow netdev_del followed by netdev_add to re-peer a NIC and its netdev:
> > > 
> > >   (qemu) info network
> > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> > >    \ netdev0: type=user,net=10.0.2.0,restrict=off
> > > 
> > >   (qemu) netdev_del netdev0
> > > 
> > >   (qemu) netdev_add socket,id=netdev0,listen=:1234
> > > 
> > >   (qemu) info network
> > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> > >    \ netdev0: type=socket,
> > > 
> > > This makes it possible to switch netdev while the guest is running.  It
> > > is not necessary to reset the NIC.
> > > 
> > > Note that the NIC's link goes down in netdev_del and back up again in
> > > netdev_add.  Therefore the guest becomes aware that the network has
> > > changed, although this depends on the emulated NIC model providing link
> > > status change interrupts.
> > > 
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > I'd be surprised if this patch worked when one or both backends are tap.
> > tap supports offloads but slirp doesn't, since guest
> > probes offloads at startup, it assumes it can use offloads.
> > We also program tap during device operation e.g. on set features.
> > vhost operation could also be interesting, have not looked into it.
> 
> Yes, I left a TODO in the RFC patch and described the issue below.
> We'll have to reject incompatible netdevs.

Ideally, we'd probe all backend capabilities at init time.
However, looks like we allowed netdev and device creation in any order.
Can we change this and require netdev always be there before device?

> At this stage I'd like to see if this approach works for Laine.  If it
> looks good I'll figure out how we can make vhost and offload checks.
> 
> > > ---
> > > This patch addresses the netdev hotplug issue that has been discussed
> > > previously on the mailing list:
> > > 
> > >   https://www.redhat.com/archives/libvir-list/2012-October/msg00629.html
> > 
> > I actually think both we and libvirt should address (2):
> > 	(2) because qemu has no asynchronous event to notify libvirt
> > 	when the guest's PCI detach has actually completed, I have to stick in
> > 	an arbitrary call to sleep() which is generally *way* too long, but may
> > 	be too short in some cases of extremely high load.
> > 
> > libvirt should verify device is gone after calling sleep,
> > if not yet gone - sleep again.
> > 
> > qemu should report an event when device is ejected.
> 
> I agree.  The hotplug implementation and how it is driven by QMP clients
> is really ugly today.
> 
> Stefan
Stefan Hajnoczi Oct. 31, 2012, 2:51 p.m. UTC | #4
On Wed, Oct 31, 2012 at 10:57:24AM +0200, Michael S. Tsirkin wrote:
> On Wed, Oct 31, 2012 at 09:07:27AM +0100, Stefan Hajnoczi wrote:
> > On Tue, Oct 30, 2012 at 05:24:06PM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Oct 24, 2012 at 02:49:21PM +0200, Stefan Hajnoczi wrote:
> > > > Allow netdev_del followed by netdev_add to re-peer a NIC and its netdev:
> > > > 
> > > >   (qemu) info network
> > > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> > > >    \ netdev0: type=user,net=10.0.2.0,restrict=off
> > > > 
> > > >   (qemu) netdev_del netdev0
> > > > 
> > > >   (qemu) netdev_add socket,id=netdev0,listen=:1234
> > > > 
> > > >   (qemu) info network
> > > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> > > >    \ netdev0: type=socket,
> > > > 
> > > > This makes it possible to switch netdev while the guest is running.  It
> > > > is not necessary to reset the NIC.
> > > > 
> > > > Note that the NIC's link goes down in netdev_del and back up again in
> > > > netdev_add.  Therefore the guest becomes aware that the network has
> > > > changed, although this depends on the emulated NIC model providing link
> > > > status change interrupts.
> > > > 
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > 
> > > I'd be surprised if this patch worked when one or both backends are tap.
> > > tap supports offloads but slirp doesn't, since guest
> > > probes offloads at startup, it assumes it can use offloads.
> > > We also program tap during device operation e.g. on set features.
> > > vhost operation could also be interesting, have not looked into it.
> > 
> > Yes, I left a TODO in the RFC patch and described the issue below.
> > We'll have to reject incompatible netdevs.
> 
> Ideally, we'd probe all backend capabilities at init time.
> However, looks like we allowed netdev and device creation in any order.
> Can we change this and require netdev always be there before device?

I don't think the order is a problem.  The relaxed order is only
relevant during startup from main() - but in that case we have no
constraints yet anyway.

The problem only occurs when netdev_add is used to create an
incompatible netdev after devices have initialized.  We should be able
to check and error out in the code that my RFC patch modifies.  If
constraints are violated then netdev_add can fail with an error (the new
netdev is not created and the QMP client needs to try again with a
compatible netdev configuration).

Maybe I'm misunderstanding your point?

Stefan
Michael S. Tsirkin Oct. 31, 2012, 4:34 p.m. UTC | #5
On Wed, Oct 31, 2012 at 03:51:08PM +0100, Stefan Hajnoczi wrote:
> On Wed, Oct 31, 2012 at 10:57:24AM +0200, Michael S. Tsirkin wrote:
> > On Wed, Oct 31, 2012 at 09:07:27AM +0100, Stefan Hajnoczi wrote:
> > > On Tue, Oct 30, 2012 at 05:24:06PM +0200, Michael S. Tsirkin wrote:
> > > > On Wed, Oct 24, 2012 at 02:49:21PM +0200, Stefan Hajnoczi wrote:
> > > > > Allow netdev_del followed by netdev_add to re-peer a NIC and its netdev:
> > > > > 
> > > > >   (qemu) info network
> > > > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> > > > >    \ netdev0: type=user,net=10.0.2.0,restrict=off
> > > > > 
> > > > >   (qemu) netdev_del netdev0
> > > > > 
> > > > >   (qemu) netdev_add socket,id=netdev0,listen=:1234
> > > > > 
> > > > >   (qemu) info network
> > > > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> > > > >    \ netdev0: type=socket,
> > > > > 
> > > > > This makes it possible to switch netdev while the guest is running.  It
> > > > > is not necessary to reset the NIC.
> > > > > 
> > > > > Note that the NIC's link goes down in netdev_del and back up again in
> > > > > netdev_add.  Therefore the guest becomes aware that the network has
> > > > > changed, although this depends on the emulated NIC model providing link
> > > > > status change interrupts.
> > > > > 
> > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > 
> > > > I'd be surprised if this patch worked when one or both backends are tap.
> > > > tap supports offloads but slirp doesn't, since guest
> > > > probes offloads at startup, it assumes it can use offloads.
> > > > We also program tap during device operation e.g. on set features.
> > > > vhost operation could also be interesting, have not looked into it.
> > > 
> > > Yes, I left a TODO in the RFC patch and described the issue below.
> > > We'll have to reject incompatible netdevs.
> > 
> > Ideally, we'd probe all backend capabilities at init time.
> > However, looks like we allowed netdev and device creation in any order.
> > Can we change this and require netdev always be there before device?
> 
> I don't think the order is a problem.  The relaxed order is only
> relevant during startup from main() - but in that case we have no
> constraints yet anyway.
> The problem only occurs when netdev_add is used to create an
> incompatible netdev after devices have initialized.  We should be able
> to check and error out in the code that my RFC patch modifies.  If
> constraints are violated then netdev_add can fail with an error (the new
> netdev is not created and the QMP client needs to try again with a
> compatible netdev configuration).
> 
> Maybe I'm misunderstanding your point?
> 
> Stefan

OK so if we basically require same type backend then I think it's mostly
fine.  I was trying to think of a way to allow changing backend type,
this becomes messy very quickly.  In partuclar macvtap probably
shouldn't be swapped with tap even though they are the same type
formally.
Stefan Hajnoczi Nov. 1, 2012, 9:53 a.m. UTC | #6
On Wed, Oct 31, 2012 at 06:34:07PM +0200, Michael S. Tsirkin wrote:
> On Wed, Oct 31, 2012 at 03:51:08PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Oct 31, 2012 at 10:57:24AM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Oct 31, 2012 at 09:07:27AM +0100, Stefan Hajnoczi wrote:
> > > > On Tue, Oct 30, 2012 at 05:24:06PM +0200, Michael S. Tsirkin wrote:
> > > > > On Wed, Oct 24, 2012 at 02:49:21PM +0200, Stefan Hajnoczi wrote:
> > > > > > Allow netdev_del followed by netdev_add to re-peer a NIC and its netdev:
> > > > > > 
> > > > > >   (qemu) info network
> > > > > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> > > > > >    \ netdev0: type=user,net=10.0.2.0,restrict=off
> > > > > > 
> > > > > >   (qemu) netdev_del netdev0
> > > > > > 
> > > > > >   (qemu) netdev_add socket,id=netdev0,listen=:1234
> > > > > > 
> > > > > >   (qemu) info network
> > > > > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> > > > > >    \ netdev0: type=socket,
> > > > > > 
> > > > > > This makes it possible to switch netdev while the guest is running.  It
> > > > > > is not necessary to reset the NIC.
> > > > > > 
> > > > > > Note that the NIC's link goes down in netdev_del and back up again in
> > > > > > netdev_add.  Therefore the guest becomes aware that the network has
> > > > > > changed, although this depends on the emulated NIC model providing link
> > > > > > status change interrupts.
> > > > > > 
> > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > 
> > > > > I'd be surprised if this patch worked when one or both backends are tap.
> > > > > tap supports offloads but slirp doesn't, since guest
> > > > > probes offloads at startup, it assumes it can use offloads.
> > > > > We also program tap during device operation e.g. on set features.
> > > > > vhost operation could also be interesting, have not looked into it.
> > > > 
> > > > Yes, I left a TODO in the RFC patch and described the issue below.
> > > > We'll have to reject incompatible netdevs.
> > > 
> > > Ideally, we'd probe all backend capabilities at init time.
> > > However, looks like we allowed netdev and device creation in any order.
> > > Can we change this and require netdev always be there before device?
> > 
> > I don't think the order is a problem.  The relaxed order is only
> > relevant during startup from main() - but in that case we have no
> > constraints yet anyway.
> > The problem only occurs when netdev_add is used to create an
> > incompatible netdev after devices have initialized.  We should be able
> > to check and error out in the code that my RFC patch modifies.  If
> > constraints are violated then netdev_add can fail with an error (the new
> > netdev is not created and the QMP client needs to try again with a
> > compatible netdev configuration).
> > 
> > Maybe I'm misunderstanding your point?
> > 
> > Stefan
> 
> OK so if we basically require same type backend then I think it's mostly
> fine.  I was trying to think of a way to allow changing backend type,
> this becomes messy very quickly.  In partuclar macvtap probably
> shouldn't be swapped with tap even though they are the same type
> formally.

As long as they are offload-compatible, I think they can be swapped.
It's up to the user or the management stack to make sure switching
netdevs makes "sense".  So the network may be different and the guest
needs to DHCP again, but that's the user's problem.

Is there a reason why macvtap <-> tap won't work given compatible
vnet_hdr offload?

Stefan
Michael S. Tsirkin Nov. 1, 2012, 11:31 a.m. UTC | #7
On Thu, Nov 01, 2012 at 10:53:52AM +0100, Stefan Hajnoczi wrote:
> On Wed, Oct 31, 2012 at 06:34:07PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Oct 31, 2012 at 03:51:08PM +0100, Stefan Hajnoczi wrote:
> > > On Wed, Oct 31, 2012 at 10:57:24AM +0200, Michael S. Tsirkin wrote:
> > > > On Wed, Oct 31, 2012 at 09:07:27AM +0100, Stefan Hajnoczi wrote:
> > > > > On Tue, Oct 30, 2012 at 05:24:06PM +0200, Michael S. Tsirkin wrote:
> > > > > > On Wed, Oct 24, 2012 at 02:49:21PM +0200, Stefan Hajnoczi wrote:
> > > > > > > Allow netdev_del followed by netdev_add to re-peer a NIC and its netdev:
> > > > > > > 
> > > > > > >   (qemu) info network
> > > > > > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> > > > > > >    \ netdev0: type=user,net=10.0.2.0,restrict=off
> > > > > > > 
> > > > > > >   (qemu) netdev_del netdev0
> > > > > > > 
> > > > > > >   (qemu) netdev_add socket,id=netdev0,listen=:1234
> > > > > > > 
> > > > > > >   (qemu) info network
> > > > > > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
> > > > > > >    \ netdev0: type=socket,
> > > > > > > 
> > > > > > > This makes it possible to switch netdev while the guest is running.  It
> > > > > > > is not necessary to reset the NIC.
> > > > > > > 
> > > > > > > Note that the NIC's link goes down in netdev_del and back up again in
> > > > > > > netdev_add.  Therefore the guest becomes aware that the network has
> > > > > > > changed, although this depends on the emulated NIC model providing link
> > > > > > > status change interrupts.
> > > > > > > 
> > > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > 
> > > > > > I'd be surprised if this patch worked when one or both backends are tap.
> > > > > > tap supports offloads but slirp doesn't, since guest
> > > > > > probes offloads at startup, it assumes it can use offloads.
> > > > > > We also program tap during device operation e.g. on set features.
> > > > > > vhost operation could also be interesting, have not looked into it.
> > > > > 
> > > > > Yes, I left a TODO in the RFC patch and described the issue below.
> > > > > We'll have to reject incompatible netdevs.
> > > > 
> > > > Ideally, we'd probe all backend capabilities at init time.
> > > > However, looks like we allowed netdev and device creation in any order.
> > > > Can we change this and require netdev always be there before device?
> > > 
> > > I don't think the order is a problem.  The relaxed order is only
> > > relevant during startup from main() - but in that case we have no
> > > constraints yet anyway.
> > > The problem only occurs when netdev_add is used to create an
> > > incompatible netdev after devices have initialized.  We should be able
> > > to check and error out in the code that my RFC patch modifies.  If
> > > constraints are violated then netdev_add can fail with an error (the new
> > > netdev is not created and the QMP client needs to try again with a
> > > compatible netdev configuration).
> > > 
> > > Maybe I'm misunderstanding your point?
> > > 
> > > Stefan
> > 
> > OK so if we basically require same type backend then I think it's mostly
> > fine.  I was trying to think of a way to allow changing backend type,
> > this becomes messy very quickly.  In partuclar macvtap probably
> > shouldn't be swapped with tap even though they are the same type
> > formally.
> 
> As long as they are offload-compatible, I think they can be swapped.
> It's up to the user or the management stack to make sure switching
> netdevs makes "sense".  So the network may be different and the guest
> needs to DHCP again, but that's the user's problem.

I think a simple rule like "use same backend type" is better than
an opaque one "are offload-compatible" - user has no idea
which offloads do each of the frontends and backends support.
Also if in future we add offloads to backend X suddenly we
break ability to swap with backend Y.
Let's keep it simple.

> Is there a reason why macvtap <-> tap won't work given compatible
> vnet_hdr offload?
> 
> Stefan

There's no guarantee they will support same offload options in all
kernels, in fact thats' not the case in some kernel.org kernels.
Stefan Hajnoczi Nov. 2, 2012, 10:32 a.m. UTC | #8
On Thu, Nov 1, 2012 at 12:31 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Nov 01, 2012 at 10:53:52AM +0100, Stefan Hajnoczi wrote:
>> On Wed, Oct 31, 2012 at 06:34:07PM +0200, Michael S. Tsirkin wrote:
>> > On Wed, Oct 31, 2012 at 03:51:08PM +0100, Stefan Hajnoczi wrote:
>> > > On Wed, Oct 31, 2012 at 10:57:24AM +0200, Michael S. Tsirkin wrote:
>> > > > On Wed, Oct 31, 2012 at 09:07:27AM +0100, Stefan Hajnoczi wrote:
>> > > > > On Tue, Oct 30, 2012 at 05:24:06PM +0200, Michael S. Tsirkin wrote:
>> > > > > > On Wed, Oct 24, 2012 at 02:49:21PM +0200, Stefan Hajnoczi wrote:
>> > > > > > > Allow netdev_del followed by netdev_add to re-peer a NIC and its netdev:
>> > > > > > >
>> > > > > > >   (qemu) info network
>> > > > > > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
>> > > > > > >    \ netdev0: type=user,net=10.0.2.0,restrict=off
>> > > > > > >
>> > > > > > >   (qemu) netdev_del netdev0
>> > > > > > >
>> > > > > > >   (qemu) netdev_add socket,id=netdev0,listen=:1234
>> > > > > > >
>> > > > > > >   (qemu) info network
>> > > > > > >   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
>> > > > > > >    \ netdev0: type=socket,
>> > > > > > >
>> > > > > > > This makes it possible to switch netdev while the guest is running.  It
>> > > > > > > is not necessary to reset the NIC.
>> > > > > > >
>> > > > > > > Note that the NIC's link goes down in netdev_del and back up again in
>> > > > > > > netdev_add.  Therefore the guest becomes aware that the network has
>> > > > > > > changed, although this depends on the emulated NIC model providing link
>> > > > > > > status change interrupts.
>> > > > > > >
>> > > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> > > > > >
>> > > > > > I'd be surprised if this patch worked when one or both backends are tap.
>> > > > > > tap supports offloads but slirp doesn't, since guest
>> > > > > > probes offloads at startup, it assumes it can use offloads.
>> > > > > > We also program tap during device operation e.g. on set features.
>> > > > > > vhost operation could also be interesting, have not looked into it.
>> > > > >
>> > > > > Yes, I left a TODO in the RFC patch and described the issue below.
>> > > > > We'll have to reject incompatible netdevs.
>> > > >
>> > > > Ideally, we'd probe all backend capabilities at init time.
>> > > > However, looks like we allowed netdev and device creation in any order.
>> > > > Can we change this and require netdev always be there before device?
>> > >
>> > > I don't think the order is a problem.  The relaxed order is only
>> > > relevant during startup from main() - but in that case we have no
>> > > constraints yet anyway.
>> > > The problem only occurs when netdev_add is used to create an
>> > > incompatible netdev after devices have initialized.  We should be able
>> > > to check and error out in the code that my RFC patch modifies.  If
>> > > constraints are violated then netdev_add can fail with an error (the new
>> > > netdev is not created and the QMP client needs to try again with a
>> > > compatible netdev configuration).
>> > >
>> > > Maybe I'm misunderstanding your point?
>> > >
>> > > Stefan
>> >
>> > OK so if we basically require same type backend then I think it's mostly
>> > fine.  I was trying to think of a way to allow changing backend type,
>> > this becomes messy very quickly.  In partuclar macvtap probably
>> > shouldn't be swapped with tap even though they are the same type
>> > formally.
>>
>> As long as they are offload-compatible, I think they can be swapped.
>> It's up to the user or the management stack to make sure switching
>> netdevs makes "sense".  So the network may be different and the guest
>> needs to DHCP again, but that's the user's problem.
>
> I think a simple rule like "use same backend type" is better than
> an opaque one "are offload-compatible" - user has no idea
> which offloads do each of the frontends and backends support.
> Also if in future we add offloads to backend X suddenly we
> break ability to swap with backend Y.
> Let's keep it simple.

Okay, that's a safe constraint that we can start with.  If users
request more freedom later we can get fancy.

Stefan
Stefan Hajnoczi Nov. 2, 2012, 10:34 a.m. UTC | #9
On Wed, Oct 24, 2012 at 2:49 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Laine: Please try this out and see if it works for your use case.

Waiting for your feedback before I prepare a final patch that can go into QEMU.

There's no time pressure from my side to get this feature in so take
as much time as you need.

Stefan
diff mbox

Patch

diff --git a/net.c b/net.c
index ae4bc0d..d7c60b0 100644
--- a/net.c
+++ b/net.c
@@ -619,6 +619,19 @@  static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
         [NET_CLIENT_OPTIONS_KIND_HUBPORT]   = net_init_hubport,
 };
 
+static NICState *net_find_nic_for_peername(const char *name)
+{
+    NetClientState *nc;
+    QTAILQ_FOREACH(nc, &net_clients, next) {
+        if (nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC) {
+            continue;
+        }
+        if (nc->peer && strcmp(nc->peer->name, name) == 0) {
+            return (NICState *)nc;
+        }
+    }
+    return NULL;
+}
 
 static int net_client_init1(const void *object, int is_netdev, Error **errp)
 {
@@ -678,6 +691,25 @@  static int net_client_init1(const void *object, int is_netdev, Error **errp)
                       NetClientOptionsKind_lookup[opts->kind]);
             return -1;
         }
+
+        /* Peer with existing nic if netdev name matches */
+        if (is_netdev && opts->kind != NET_CLIENT_OPTIONS_KIND_NIC) {
+            NICState *nic = net_find_nic_for_peername(name);
+            if (nic && nic->peer_deleted) {
+                NetClientState *netdev = qemu_find_netdev(name);
+
+                /* TODO vnet_hdr compatibility check */
+
+                qemu_free_net_client(nic->nc.peer);
+                nic->nc.peer = netdev;
+                netdev->peer = &nic->nc;
+                nic->peer_deleted = false;
+                nic->nc.link_down = false;
+                if (nic->nc.info->link_status_changed) {
+                    nic->nc.info->link_status_changed(&nic->nc);
+                }
+            }
+        }
     }
     return 0;
 }