Patchwork [v2] net: Fix hotplug with pci_add

login
register
mail settings
Submitter Amit Shah
Date June 8, 2010, 3:43 p.m.
Message ID <356ef9bdde008d695e7c75dd1566222d6160d4b6.1276011638.git.amit.shah@redhat.com>
Download mbox | patch
Permalink /patch/55002/
State New
Headers show

Comments

Amit Shah - June 8, 2010, 3:43 p.m.
The correct model type wasn't getting added when hotplugging nics with
pci_add.

Testcase: start VM with default nic type. In the qemu_monitor:

(qemu) pci_add auto nic model=virtio

This results in a nic hot-plug of the same nic type as the default.

This was broken in 5294e2c774f120e10b44652ac143abda356f44eb

Also changes the behaviour where no .init is defined for a
net_client_type. Previously, 0 was returned, which indicated the init
was successful and that 0 was the index into the nd_tables[] array.
Return -1, indicating unsuccessful init, in such a case.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
Sorry, v1 was a stale patch.

v2:
 - Init 'ret' to -1, fixes compile err and added note in the commit msg
   explaining this.

 net.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)
Juan Quintela - June 8, 2010, 3:55 p.m.
Amit Shah <amit.shah@redhat.com> wrote:
> The correct model type wasn't getting added when hotplugging nics with
> pci_add.
>
> Testcase: start VM with default nic type. In the qemu_monitor:
>
> (qemu) pci_add auto nic model=virtio
>
> This results in a nic hot-plug of the same nic type as the default.
>
> This was broken in 5294e2c774f120e10b44652ac143abda356f44eb
>
> Also changes the behaviour where no .init is defined for a
> net_client_type. Previously, 0 was returned, which indicated the init
> was successful and that 0 was the index into the nd_tables[] array.
> Return -1, indicating unsuccessful init, in such a case.
>
> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
> Sorry, v1 was a stale patch.
>
> v2:
>  - Init 'ret' to -1, fixes compile err and added note in the commit msg
>    explaining this.
>
>  net.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)

Acked-by: Juan Quintela <quintela@redhat.com>
Markus Armbruster - June 8, 2010, 4:33 p.m.
Amit Shah <amit.shah@redhat.com> writes:

> The correct model type wasn't getting added when hotplugging nics with
> pci_add.
>
> Testcase: start VM with default nic type. In the qemu_monitor:
>
> (qemu) pci_add auto nic model=virtio
>
> This results in a nic hot-plug of the same nic type as the default.

Works fine for me on master, fd1dc858370d9a9ac7ea2512812c3a152ee6484b.
What am I doing wrong?

> This was broken in 5294e2c774f120e10b44652ac143abda356f44eb
>
> Also changes the behaviour where no .init is defined for a
> net_client_type. Previously, 0 was returned, which indicated the init
> was successful and that 0 was the index into the nd_tables[] array.
> Return -1, indicating unsuccessful init, in such a case.

The only element of net_client_types[] without an init() method is type
"none", index 0.  So, doesn't this break -net none?  And what does it
fix?
Amit Shah - June 8, 2010, 5:26 p.m.
On (Tue) Jun 08 2010 [18:33:00], Markus Armbruster wrote:
> Amit Shah <amit.shah@redhat.com> writes:
> 
> > The correct model type wasn't getting added when hotplugging nics with
> > pci_add.
> >
> > Testcase: start VM with default nic type. In the qemu_monitor:
> >
> > (qemu) pci_add auto nic model=virtio
> >
> > This results in a nic hot-plug of the same nic type as the default.
> 
> Works fine for me on master, fd1dc858370d9a9ac7ea2512812c3a152ee6484b.
> What am I doing wrong?

Did you start with a virtio nic added? The 'default' here is the nic
type that's added as the first nic. Try this: start a VM with model
e1000 and use pci_add to add a nic type of virtio.

> > This was broken in 5294e2c774f120e10b44652ac143abda356f44eb
> >
> > Also changes the behaviour where no .init is defined for a
> > net_client_type. Previously, 0 was returned, which indicated the init
> > was successful and that 0 was the index into the nd_tables[] array.
> > Return -1, indicating unsuccessful init, in such a case.
> 
> The only element of net_client_types[] without an init() method is type
> "none", index 0.  So, doesn't this break -net none?  And what does it
> fix?

The net_client_types[] index isn't relevant here. -net none works fine,
no problem.

		Amit
Markus Armbruster - June 9, 2010, 6:37 a.m.
Amit Shah <amit.shah@redhat.com> writes:

> On (Tue) Jun 08 2010 [18:33:00], Markus Armbruster wrote:
>> Amit Shah <amit.shah@redhat.com> writes:
>> 
>> > The correct model type wasn't getting added when hotplugging nics with
>> > pci_add.
>> >
>> > Testcase: start VM with default nic type. In the qemu_monitor:
>> >
>> > (qemu) pci_add auto nic model=virtio
>> >
>> > This results in a nic hot-plug of the same nic type as the default.
>> 
>> Works fine for me on master, fd1dc858370d9a9ac7ea2512812c3a152ee6484b.
>> What am I doing wrong?
>
> Did you start with a virtio nic added? The 'default' here is the nic
> type that's added as the first nic. Try this: start a VM with model
> e1000 and use pci_add to add a nic type of virtio.

I do.  Nevertheless, "pci_add auto nic model=e1000" adds an e1000.

Also works if I start without a NIC.

Ah, if I start with a -net nic, then pci_add breaks as described!

Now my next question is *how* your patch fixes this.  It's not at all
obvious to me.

>> > This was broken in 5294e2c774f120e10b44652ac143abda356f44eb
>> >
>> > Also changes the behaviour where no .init is defined for a
>> > net_client_type. Previously, 0 was returned, which indicated the init
>> > was successful and that 0 was the index into the nd_tables[] array.
>> > Return -1, indicating unsuccessful init, in such a case.
>> 
>> The only element of net_client_types[] without an init() method is type
>> "none", index 0.  So, doesn't this break -net none?  And what does it
>> fix?
>
> The net_client_types[] index isn't relevant here. -net none works fine,
> no problem.

Let me rephrase: Behavior changes for -net types without an init()
method.  The only one without an init() method is "none".  Before,
net_client_init() succeeded for it.  Now it fails.  What's the impact of
that change?  And why does it make sense?
Juan Quintela - June 9, 2010, 7:59 a.m.
Markus Armbruster <armbru@redhat.com> wrote:
> Amit Shah <amit.shah@redhat.com> writes:
>
>> On (Tue) Jun 08 2010 [18:33:00], Markus Armbruster wrote:
>>> Amit Shah <amit.shah@redhat.com> writes:
>>> 
>>> > The correct model type wasn't getting added when hotplugging nics with
>>> > pci_add.
>>> >
>>> > Testcase: start VM with default nic type. In the qemu_monitor:
>>> >
>>> > (qemu) pci_add auto nic model=virtio
>>> >
>>> > This results in a nic hot-plug of the same nic type as the default.
>>> 
>>> Works fine for me on master, fd1dc858370d9a9ac7ea2512812c3a152ee6484b.
>>> What am I doing wrong?
>>
>> Did you start with a virtio nic added? The 'default' here is the nic
>> type that's added as the first nic. Try this: start a VM with model
>> e1000 and use pci_add to add a nic type of virtio.
>
> I do.  Nevertheless, "pci_add auto nic model=e1000" adds an e1000.
>
> Also works if I start without a NIC.
>
> Ah, if I start with a -net nic, then pci_add breaks as described!
>
> Now my next question is *how* your patch fixes this.  It's not at all
> obvious to me.

Far away form the patch file.  Look at:

hw/pci-hotplug.c

static PCIDevice *qemu_pci_hot_add_nic(Monitor *mon,
                                       const char *devaddr,
                                       const char *opts_str)
{
    .....
    ret = net_client_init(mon, opts, 0);
    .....
    return pci_nic_init(&nd_table[ret], "rtl8139", devaddr);
}


You can see that accessing nd_table with value 1 as before was wrong.

BTW, once here, didn't default nic should be e1000? not rtl8139?

>>> > This was broken in 5294e2c774f120e10b44652ac143abda356f44eb
>>> >
>>> > Also changes the behaviour where no .init is defined for a
>>> > net_client_type. Previously, 0 was returned, which indicated the init
>>> > was successful and that 0 was the index into the nd_tables[] array.
>>> > Return -1, indicating unsuccessful init, in such a case.
>>> 
>>> The only element of net_client_types[] without an init() method is type
>>> "none", index 0.  So, doesn't this break -net none?  And what does it
>>> fix?
>>
>> The net_client_types[] index isn't relevant here. -net none works fine,
>> no problem.
>
> Let me rephrase: Behavior changes for -net types without an init()
> method.  The only one without an init() method is "none".  Before,
> net_client_init() succeeded for it.  Now it fails.  What's the impact of
> that change?  And why does it make sense?

Here, I don't know.

Later, Juan.
Amit Shah - June 9, 2010, 9:58 a.m.
On (Wed) Jun 09 2010 [08:37:26], Markus Armbruster wrote:
> Amit Shah <amit.shah@redhat.com> writes:
> 
> > On (Tue) Jun 08 2010 [18:33:00], Markus Armbruster wrote:
> >> Amit Shah <amit.shah@redhat.com> writes:
> >> 
> >> > The correct model type wasn't getting added when hotplugging nics with
> >> > pci_add.
> >> >
> >> > Testcase: start VM with default nic type. In the qemu_monitor:
> >> >
> >> > (qemu) pci_add auto nic model=virtio
> >> >
> >> > This results in a nic hot-plug of the same nic type as the default.
> >> 
> >> Works fine for me on master, fd1dc858370d9a9ac7ea2512812c3a152ee6484b.
> >> What am I doing wrong?
> >
> > Did you start with a virtio nic added? The 'default' here is the nic
> > type that's added as the first nic. Try this: start a VM with model
> > e1000 and use pci_add to add a nic type of virtio.
> 
> I do.  Nevertheless, "pci_add auto nic model=e1000" adds an e1000.
> 
> Also works if I start without a NIC.
> 
> Ah, if I start with a -net nic, then pci_add breaks as described!
> 
> Now my next question is *how* your patch fixes this.  It's not at all
> obvious to me.

The callers of net_client_init() expect a positive return value that
means an index into the nd_table[] array, where the options just parsed
are placed.

Let's look at 5294e2c774f120e10b44652ac143abda356f44eb, which broke this
thing:

 
             if (net_client_types[i].init) {
-                return net_client_types[i].init(opts, mon, name, vlan);
-            } else {
-                return 0;
+                if (net_client_types[i].init(opts, mon, name, vlan) < 0) {
+                    /* TODO push error reporting into init() methods */
+                    qerror_report(QERR_DEVICE_INIT_FAILED, type);
+                    return -1;
+                }
             }
+            return 0;

See it yet? instead of returning the value returned by
net_client_types[i].init(), we're now just returning either 0 or -1. The
return value is now by default '0' for successful init(), which causes
the calling functions to just end up using the same parameters that were
passed for the first nic type.

> >> > This was broken in 5294e2c774f120e10b44652ac143abda356f44eb
> >> >
> >> > Also changes the behaviour where no .init is defined for a
> >> > net_client_type. Previously, 0 was returned, which indicated the init
> >> > was successful and that 0 was the index into the nd_tables[] array.
> >> > Return -1, indicating unsuccessful init, in such a case.
> >> 
> >> The only element of net_client_types[] without an init() method is type
> >> "none", index 0.  So, doesn't this break -net none?  And what does it
> >> fix?
> >
> > The net_client_types[] index isn't relevant here. -net none works fine,
> > no problem.
> 
> Let me rephrase: Behavior changes for -net types without an init()
> method.  The only one without an init() method is "none".  Before,
> net_client_init() succeeded for it.  Now it fails.  What's the impact of
> that change?  And why does it make sense?

It makes sense because we don't actually initialise anything. We don't
place anything in the nd_table[] array. That means callers shouldn't
poke in the array for any values. Returning -1 makes sense for that
reason. If we continued to return 0, callers might just assume that init
was successful and that nd_table[0] was set up for use appropriately.

The thing is that the code doesn't go this far in case of '-net none'
anyway. This was just a potential bug lurking around for any new -net
method which didn't have an init.

		Amit
Michael S. Tsirkin - June 9, 2010, 9:58 a.m.
On Tue, Jun 08, 2010 at 09:13:58PM +0530, Amit Shah wrote:
> The correct model type wasn't getting added when hotplugging nics with
> pci_add.
> 
> Testcase: start VM with default nic type. In the qemu_monitor:
> 
> (qemu) pci_add auto nic model=virtio
> 
> This results in a nic hot-plug of the same nic type as the default.
> 
> This was broken in 5294e2c774f120e10b44652ac143abda356f44eb
> 
> Also changes the behaviour where no .init is defined for a
> net_client_type. Previously, 0 was returned, which indicated the init
> was successful and that 0 was the index into the nd_tables[] array.
> Return -1, indicating unsuccessful init, in such a case.
> 
> Signed-off-by: Amit Shah <amit.shah@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

I'll pick this up if no one beats me to it.

> ---
> Sorry, v1 was a stale patch.
> 
> v2:
>  - Init 'ret' to -1, fixes compile err and added note in the commit msg
>    explaining this.
> 
>  net.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net.c b/net.c
> index efa8b3d..4cb93ed 100644
> --- a/net.c
> +++ b/net.c
> @@ -1106,6 +1106,7 @@ int net_client_init(Monitor *mon, QemuOpts *opts, int is_netdev)
>      for (i = 0; net_client_types[i].type != NULL; i++) {
>          if (!strcmp(net_client_types[i].type, type)) {
>              VLANState *vlan = NULL;
> +            int ret;
>  
>              if (qemu_opts_validate(opts, &net_client_types[i].desc[0]) == -1) {
>                  return -1;
> @@ -1118,14 +1119,16 @@ int net_client_init(Monitor *mon, QemuOpts *opts, int is_netdev)
>                  vlan = qemu_find_vlan(qemu_opt_get_number(opts, "vlan", 0), 1);
>              }
>  
> +            ret = -1;
>              if (net_client_types[i].init) {
> -                if (net_client_types[i].init(opts, mon, name, vlan) < 0) {
> +                ret = net_client_types[i].init(opts, mon, name, vlan);
> +                if (ret < 0) {
>                      /* TODO push error reporting into init() methods */
>                      qerror_report(QERR_DEVICE_INIT_FAILED, type);
>                      return -1;
>                  }
>              }
> -            return 0;
> +            return ret;
>          }
>      }
>  
> -- 
> 1.7.0.1
>
Amit Shah - June 9, 2010, 10:28 a.m.
On (Wed) Jun 09 2010 [09:59:50], Juan Quintela wrote:
> 
> BTW, once here, didn't default nic should be e1000? not rtl8139?

Are you looking at qemu-kvm sources? ;-)

		Amit
Markus Armbruster - June 9, 2010, 11:31 a.m.
Amit Shah <amit.shah@redhat.com> writes:

> On (Wed) Jun 09 2010 [08:37:26], Markus Armbruster wrote:
>> Amit Shah <amit.shah@redhat.com> writes:
>> 
>> > On (Tue) Jun 08 2010 [18:33:00], Markus Armbruster wrote:
>> >> Amit Shah <amit.shah@redhat.com> writes:
>> >> 
>> >> > The correct model type wasn't getting added when hotplugging nics with
>> >> > pci_add.
>> >> >
>> >> > Testcase: start VM with default nic type. In the qemu_monitor:
>> >> >
>> >> > (qemu) pci_add auto nic model=virtio
>> >> >
>> >> > This results in a nic hot-plug of the same nic type as the default.
>> >> 
>> >> Works fine for me on master, fd1dc858370d9a9ac7ea2512812c3a152ee6484b.
>> >> What am I doing wrong?
>> >
>> > Did you start with a virtio nic added? The 'default' here is the nic
>> > type that's added as the first nic. Try this: start a VM with model
>> > e1000 and use pci_add to add a nic type of virtio.
>> 
>> I do.  Nevertheless, "pci_add auto nic model=e1000" adds an e1000.
>> 
>> Also works if I start without a NIC.
>> 
>> Ah, if I start with a -net nic, then pci_add breaks as described!
>> 
>> Now my next question is *how* your patch fixes this.  It's not at all
>> obvious to me.
>
> The callers of net_client_init() expect a positive return value that
> means an index into the nd_table[] array, where the options just parsed
> are placed.
>
> Let's look at 5294e2c774f120e10b44652ac143abda356f44eb, which broke this
> thing:
>
>  
>              if (net_client_types[i].init) {
> -                return net_client_types[i].init(opts, mon, name, vlan);
> -            } else {
> -                return 0;
> +                if (net_client_types[i].init(opts, mon, name, vlan) < 0) {
> +                    /* TODO push error reporting into init() methods */
> +                    qerror_report(QERR_DEVICE_INIT_FAILED, type);
> +                    return -1;
> +                }
>              }
> +            return 0;
>
> See it yet? instead of returning the value returned by
> net_client_types[i].init(), we're now just returning either 0 or -1. The
> return value is now by default '0' for successful init(), which causes
> the calling functions to just end up using the same parameters that were
> passed for the first nic type.

Ah!  Could a brief version of this be added to the commit message?

>> >> > This was broken in 5294e2c774f120e10b44652ac143abda356f44eb
>> >> >
>> >> > Also changes the behaviour where no .init is defined for a
>> >> > net_client_type. Previously, 0 was returned, which indicated the init
>> >> > was successful and that 0 was the index into the nd_tables[] array.
>> >> > Return -1, indicating unsuccessful init, in such a case.
>> >> 
>> >> The only element of net_client_types[] without an init() method is type
>> >> "none", index 0.  So, doesn't this break -net none?  And what does it
>> >> fix?
>> >
>> > The net_client_types[] index isn't relevant here. -net none works fine,
>> > no problem.
>> 
>> Let me rephrase: Behavior changes for -net types without an init()
>> method.  The only one without an init() method is "none".  Before,
>> net_client_init() succeeded for it.  Now it fails.  What's the impact of
>> that change?  And why does it make sense?
>
> It makes sense because we don't actually initialise anything. We don't
> place anything in the nd_table[] array. That means callers shouldn't
> poke in the array for any values. Returning -1 makes sense for that
> reason. If we continued to return 0, callers might just assume that init
> was successful and that nd_table[0] was set up for use appropriately.
>
> The thing is that the code doesn't go this far in case of '-net none'
> anyway. This was just a potential bug lurking around for any new -net
> method which didn't have an init.

Okay.

Thanks for your patience.
Amit Shah - June 15, 2010, 8:05 a.m.
On (Wed) Jun 09 2010 [15:28:11], Amit Shah wrote:
> > Let me rephrase: Behavior changes for -net types without an init()
> > method.  The only one without an init() method is "none".  Before,
> > net_client_init() succeeded for it.  Now it fails.  What's the impact of
> > that change?  And why does it make sense?
> 
> It makes sense because we don't actually initialise anything. We don't
> place anything in the nd_table[] array. That means callers shouldn't
> poke in the array for any values. Returning -1 makes sense for that
> reason. If we continued to return 0, callers might just assume that init
> was successful and that nd_table[0] was set up for use appropriately.
> 
> The thing is that the code doesn't go this far in case of '-net none'
> anyway. This was just a potential bug lurking around for any new -net
> method which didn't have an init.

Hm, I only tested -net none from the monitor, using pci_add. Looks like
I skipped testing it from the cmd line. I did that today and found out
the VM doesn't start. I've sent a patch to revert to the original
behaviour of returning 0, and we'll have to find another way of fixing
the 'success or index' return problem.

		Amit

Patch

diff --git a/net.c b/net.c
index efa8b3d..4cb93ed 100644
--- a/net.c
+++ b/net.c
@@ -1106,6 +1106,7 @@  int net_client_init(Monitor *mon, QemuOpts *opts, int is_netdev)
     for (i = 0; net_client_types[i].type != NULL; i++) {
         if (!strcmp(net_client_types[i].type, type)) {
             VLANState *vlan = NULL;
+            int ret;
 
             if (qemu_opts_validate(opts, &net_client_types[i].desc[0]) == -1) {
                 return -1;
@@ -1118,14 +1119,16 @@  int net_client_init(Monitor *mon, QemuOpts *opts, int is_netdev)
                 vlan = qemu_find_vlan(qemu_opt_get_number(opts, "vlan", 0), 1);
             }
 
+            ret = -1;
             if (net_client_types[i].init) {
-                if (net_client_types[i].init(opts, mon, name, vlan) < 0) {
+                ret = net_client_types[i].init(opts, mon, name, vlan);
+                if (ret < 0) {
                     /* TODO push error reporting into init() methods */
                     qerror_report(QERR_DEVICE_INIT_FAILED, type);
                     return -1;
                 }
             }
-            return 0;
+            return ret;
         }
     }