Patchwork Re: [PATCH] PCI: Bus number from the bridge, not the device

login
register
mail settings
Submitter Michael S. Tsirkin
Date Dec. 14, 2010, 4:57 a.m.
Message ID <20101214045715.GG9554@redhat.com>
Download mbox | patch
Permalink /patch/75467/
State New
Headers show

Comments

Michael S. Tsirkin - Dec. 14, 2010, 4:57 a.m.
On Mon, Dec 13, 2010 at 09:49:21PM -0700, Alex Williamson wrote:
> On Tue, 2010-12-14 at 06:46 +0200, Michael S. Tsirkin wrote:
> > On Mon, Dec 13, 2010 at 01:04:23PM -0700, Alex Williamson wrote:
> > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > > > pcibus_dev_print() was erroneously retrieving the device bus
> > > > > number from the secondary bus number offset of the device
> > > > > instead of the bridge above the device.  This ends of landing
> > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > > is usually zero.  pcibus_get_dev_path() copied this code,
> > > > > inheriting the same bug.  pcibus_get_dev_path() is used for
> > > > > ramblock naming, so changing it can effect migration.  However,
> > > > > I've only seen this byte be non-zero for an assigned device,
> > > > > which can't migrate anyway, so hopefully we won't run into
> > > > > any issues.
> > > > > 
> > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > 
> > > > Good catch. Applied.
> > > 
> > > Um... submitted vs applied:
> > > 
> > >      PCI: Bus number from the bridge, not the device
> > >  
> > > @@ -6,20 +8,28 @@
> > >      number from the secondary bus number offset of the device
> > >      instead of the bridge above the device.  This ends of landing
> > >      in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > -    is usually zero.  pcibus_get_dev_path() copied this code,
> > > +    is usually zero.
> > > +    
> > > +    Note: pcibus_get_dev_path() copied this code,
> > >      inheriting the same bug.  pcibus_get_dev_path() is used for
> > >      ramblock naming, so changing it can effect migration.  However,
> > >      I've only seen this byte be non-zero for an assigned device,
> > >      which can't migrate anyway, so hopefully we won't run into
> > >      any issues.
> > >  
> > > +    This patch does not touch pcibus_get_dev_path, as
> > > +    bus number is guest assigned for nested buses,
> > > +    so using it for migration is broken anyway.
> > > +    Fix it properly later.
> > > +    
> > >      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > +    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > >  
> > >  diff --git a/hw/pci.c b/hw/pci.c
> > > -index 6d0934d..15416dd 100644
> > > +index 962886e..8f6fcf8 100644
> > >  --- a/hw/pci.c
> > >  +++ b/hw/pci.c
> > > -@@ -1940,8 +1940,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> > > +@@ -1806,8 +1806,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> > >   
> > >       monitor_printf(mon, "%*sclass %s, addr %02x:%02x.%x, "
> > >                      "pci id %04x:%04x (sub %04x:%04x)\n",
> > > @@ -29,14 +39,3 @@
> > >                      PCI_SLOT(d->devfn), PCI_FUNC(d->devfn),
> > >                      pci_get_word(d->config + PCI_VENDOR_ID),
> > >                      pci_get_word(d->config + PCI_DEVICE_ID),
> > > -@@ -1965,7 +1964,7 @@ static char *pcibus_get_dev_path(DeviceState *dev)
> > > -     char path[16];
> > > - 
> > > -     snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> > > --             pci_find_domain(d->bus), d->config[PCI_SECONDARY_BUS],
> > > -+             pci_find_domain(d->bus), pci_bus_num(d->bus),
> > > -              PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> > > - 
> > > -     return strdup(path);
> > > -
> > > -
> > > 
> > > So the chunk that fixed the part that I was actually interested in got
> > > dropped even though the existing code is clearly wrong.  Yes, we still
> > > have issues with nested bridges (not that we have many of those), but
> > > until the "Fix it properly later" part comes along, can we please
> > > include the obvious bug fix?  Thanks,
> > > 
> > > Alex
> > 
> > We can stick 0 in there - would that help?  I would much rather not
> > create a version where we put the bus number there.
> 
> Yep, 0 is good enough until we solve the nested bridge problem.  Thanks,
> 
> Alex

I'm surprised you see that it matters in practice, but ok.
Like this?
Alex Williamson - Dec. 14, 2010, 5:04 a.m.
On Tue, 2010-12-14 at 06:57 +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 13, 2010 at 09:49:21PM -0700, Alex Williamson wrote:
> > On Tue, 2010-12-14 at 06:46 +0200, Michael S. Tsirkin wrote:
> > > On Mon, Dec 13, 2010 at 01:04:23PM -0700, Alex Williamson wrote:
> > > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > > > > pcibus_dev_print() was erroneously retrieving the device bus
> > > > > > number from the secondary bus number offset of the device
> > > > > > instead of the bridge above the device.  This ends of landing
> > > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > > > is usually zero.  pcibus_get_dev_path() copied this code,
> > > > > > inheriting the same bug.  pcibus_get_dev_path() is used for
> > > > > > ramblock naming, so changing it can effect migration.  However,
> > > > > > I've only seen this byte be non-zero for an assigned device,
> > > > > > which can't migrate anyway, so hopefully we won't run into
> > > > > > any issues.
> > > > > > 
> > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > > 
> > > > > Good catch. Applied.
> > > > 
> > > > Um... submitted vs applied:
> > > > 
> > > >      PCI: Bus number from the bridge, not the device
> > > >  
> > > > @@ -6,20 +8,28 @@
> > > >      number from the secondary bus number offset of the device
> > > >      instead of the bridge above the device.  This ends of landing
> > > >      in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > -    is usually zero.  pcibus_get_dev_path() copied this code,
> > > > +    is usually zero.
> > > > +    
> > > > +    Note: pcibus_get_dev_path() copied this code,
> > > >      inheriting the same bug.  pcibus_get_dev_path() is used for
> > > >      ramblock naming, so changing it can effect migration.  However,
> > > >      I've only seen this byte be non-zero for an assigned device,
> > > >      which can't migrate anyway, so hopefully we won't run into
> > > >      any issues.
> > > >  
> > > > +    This patch does not touch pcibus_get_dev_path, as
> > > > +    bus number is guest assigned for nested buses,
> > > > +    so using it for migration is broken anyway.
> > > > +    Fix it properly later.
> > > > +    
> > > >      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > +    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > >  
> > > >  diff --git a/hw/pci.c b/hw/pci.c
> > > > -index 6d0934d..15416dd 100644
> > > > +index 962886e..8f6fcf8 100644
> > > >  --- a/hw/pci.c
> > > >  +++ b/hw/pci.c
> > > > -@@ -1940,8 +1940,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> > > > +@@ -1806,8 +1806,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> > > >   
> > > >       monitor_printf(mon, "%*sclass %s, addr %02x:%02x.%x, "
> > > >                      "pci id %04x:%04x (sub %04x:%04x)\n",
> > > > @@ -29,14 +39,3 @@
> > > >                      PCI_SLOT(d->devfn), PCI_FUNC(d->devfn),
> > > >                      pci_get_word(d->config + PCI_VENDOR_ID),
> > > >                      pci_get_word(d->config + PCI_DEVICE_ID),
> > > > -@@ -1965,7 +1964,7 @@ static char *pcibus_get_dev_path(DeviceState *dev)
> > > > -     char path[16];
> > > > - 
> > > > -     snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> > > > --             pci_find_domain(d->bus), d->config[PCI_SECONDARY_BUS],
> > > > -+             pci_find_domain(d->bus), pci_bus_num(d->bus),
> > > > -              PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> > > > - 
> > > > -     return strdup(path);
> > > > -
> > > > -
> > > > 
> > > > So the chunk that fixed the part that I was actually interested in got
> > > > dropped even though the existing code is clearly wrong.  Yes, we still
> > > > have issues with nested bridges (not that we have many of those), but
> > > > until the "Fix it properly later" part comes along, can we please
> > > > include the obvious bug fix?  Thanks,
> > > > 
> > > > Alex
> > > 
> > > We can stick 0 in there - would that help?  I would much rather not
> > > create a version where we put the bus number there.
> > 
> > Yep, 0 is good enough until we solve the nested bridge problem.  Thanks,
> > 
> > Alex
> 
> I'm surprised you see that it matters in practice, but ok.
> Like this?

I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an
assigned device, so I'm pretty sure we're not going to hurt migration,
but the code is clearly wrong and I'd like to make sure we don't trip on
a migration failure for a minor device config space change.

> diff --git a/hw/pci.c b/hw/pci.c
> index 254647b..81231c5 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1952,7 +1952,10 @@ static char *pcibus_get_dev_path(DeviceState *dev)
>      char path[16];
>  
>      snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> -             pci_find_domain(d->bus), d->config[PCI_SECONDARY_BUS],
> +             pci_find_domain(d->bus),
> +             0 /* TODO: need a persistent path for nested buses.
> +                * Note: pci_bus_num(d->bus) is not right as it's guest
> +                * assigned. */,
>               PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
>  
>      return strdup(path);

Sure, that's fine.

Acked-by: Alex Williamson <alex.williamson@redhat.com>

Thanks,

Alex
Michael S. Tsirkin - Dec. 14, 2010, 12:26 p.m.
On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote:
> On Tue, 2010-12-14 at 06:57 +0200, Michael S. Tsirkin wrote:
> > On Mon, Dec 13, 2010 at 09:49:21PM -0700, Alex Williamson wrote:
> > > On Tue, 2010-12-14 at 06:46 +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Dec 13, 2010 at 01:04:23PM -0700, Alex Williamson wrote:
> > > > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > > > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > > > > > pcibus_dev_print() was erroneously retrieving the device bus
> > > > > > > number from the secondary bus number offset of the device
> > > > > > > instead of the bridge above the device.  This ends of landing
> > > > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > > > > is usually zero.  pcibus_get_dev_path() copied this code,
> > > > > > > inheriting the same bug.  pcibus_get_dev_path() is used for
> > > > > > > ramblock naming, so changing it can effect migration.  However,
> > > > > > > I've only seen this byte be non-zero for an assigned device,
> > > > > > > which can't migrate anyway, so hopefully we won't run into
> > > > > > > any issues.
> > > > > > > 
> > > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > > > 
> > > > > > Good catch. Applied.
> > > > > 
> > > > > Um... submitted vs applied:
> > > > > 
> > > > >      PCI: Bus number from the bridge, not the device
> > > > >  
> > > > > @@ -6,20 +8,28 @@
> > > > >      number from the secondary bus number offset of the device
> > > > >      instead of the bridge above the device.  This ends of landing
> > > > >      in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > > -    is usually zero.  pcibus_get_dev_path() copied this code,
> > > > > +    is usually zero.
> > > > > +    
> > > > > +    Note: pcibus_get_dev_path() copied this code,
> > > > >      inheriting the same bug.  pcibus_get_dev_path() is used for
> > > > >      ramblock naming, so changing it can effect migration.  However,
> > > > >      I've only seen this byte be non-zero for an assigned device,
> > > > >      which can't migrate anyway, so hopefully we won't run into
> > > > >      any issues.
> > > > >  
> > > > > +    This patch does not touch pcibus_get_dev_path, as
> > > > > +    bus number is guest assigned for nested buses,
> > > > > +    so using it for migration is broken anyway.
> > > > > +    Fix it properly later.
> > > > > +    
> > > > >      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > > +    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > >  
> > > > >  diff --git a/hw/pci.c b/hw/pci.c
> > > > > -index 6d0934d..15416dd 100644
> > > > > +index 962886e..8f6fcf8 100644
> > > > >  --- a/hw/pci.c
> > > > >  +++ b/hw/pci.c
> > > > > -@@ -1940,8 +1940,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> > > > > +@@ -1806,8 +1806,7 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> > > > >   
> > > > >       monitor_printf(mon, "%*sclass %s, addr %02x:%02x.%x, "
> > > > >                      "pci id %04x:%04x (sub %04x:%04x)\n",
> > > > > @@ -29,14 +39,3 @@
> > > > >                      PCI_SLOT(d->devfn), PCI_FUNC(d->devfn),
> > > > >                      pci_get_word(d->config + PCI_VENDOR_ID),
> > > > >                      pci_get_word(d->config + PCI_DEVICE_ID),
> > > > > -@@ -1965,7 +1964,7 @@ static char *pcibus_get_dev_path(DeviceState *dev)
> > > > > -     char path[16];
> > > > > - 
> > > > > -     snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> > > > > --             pci_find_domain(d->bus), d->config[PCI_SECONDARY_BUS],
> > > > > -+             pci_find_domain(d->bus), pci_bus_num(d->bus),
> > > > > -              PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> > > > > - 
> > > > > -     return strdup(path);
> > > > > -
> > > > > -
> > > > > 
> > > > > So the chunk that fixed the part that I was actually interested in got
> > > > > dropped even though the existing code is clearly wrong.  Yes, we still
> > > > > have issues with nested bridges (not that we have many of those), but
> > > > > until the "Fix it properly later" part comes along, can we please
> > > > > include the obvious bug fix?  Thanks,
> > > > > 
> > > > > Alex
> > > > 
> > > > We can stick 0 in there - would that help?  I would much rather not
> > > > create a version where we put the bus number there.
> > > 
> > > Yep, 0 is good enough until we solve the nested bridge problem.  Thanks,
> > > 
> > > Alex
> > 
> > I'm surprised you see that it matters in practice, but ok.
> > Like this?
> 
> I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an
> assigned device, so I'm pretty sure we're not going to hurt migration,
> but the code is clearly wrong and I'd like to make sure we don't trip on
> a migration failure for a minor device config space change.

Which reminds me: maybe just mark nested bridges as non-migrateable
for now?  Care writing such a patch?

> > diff --git a/hw/pci.c b/hw/pci.c
> > index 254647b..81231c5 100644
> > --- a/hw/pci.c
> > +++ b/hw/pci.c
> > @@ -1952,7 +1952,10 @@ static char *pcibus_get_dev_path(DeviceState *dev)
> >      char path[16];
> >  
> >      snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> > -             pci_find_domain(d->bus), d->config[PCI_SECONDARY_BUS],
> > +             pci_find_domain(d->bus),
> > +             0 /* TODO: need a persistent path for nested buses.
> > +                * Note: pci_bus_num(d->bus) is not right as it's guest
> > +                * assigned. */,
> >               PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> >  
> >      return strdup(path);
> 
> Sure, that's fine.
> 
> Acked-by: Alex Williamson <alex.williamson@redhat.com>
> 
> Thanks,
> 
> Alex
Alex Williamson - Dec. 14, 2010, 6:34 p.m.
On Tue, 2010-12-14 at 14:26 +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote:
> > 
> > I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an
> > assigned device, so I'm pretty sure we're not going to hurt migration,
> > but the code is clearly wrong and I'd like to make sure we don't trip on
> > a migration failure for a minor device config space change.
> 
> Which reminds me: maybe just mark nested bridges as non-migrateable
> for now?  Care writing such a patch?

Hmm, this is trickier than it sounds.  We're really only broken wrt
migration if a device under a bridge calls qemu_ram_alloc.  Any device
is free to do this, but typically it only happens via
pci_add_option_rom() (not counting vga as typical).  So maybe the better
approach for now is to prevent the problem by disallowing option ROMs
for devices below a bridge.  We obviously risk devices coming along that
allocate RAM on their own, but we could still allow the most common
issue with almost no lost functionality (assuming no one wants to boot
off that nested device).  Thoughts?  Thanks,

Alex
Michael S. Tsirkin - Dec. 15, 2010, 9:56 a.m.
On Tue, Dec 14, 2010 at 11:34:53AM -0700, Alex Williamson wrote:
> On Tue, 2010-12-14 at 14:26 +0200, Michael S. Tsirkin wrote:
> > On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote:
> > > 
> > > I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an
> > > assigned device, so I'm pretty sure we're not going to hurt migration,
> > > but the code is clearly wrong and I'd like to make sure we don't trip on
> > > a migration failure for a minor device config space change.
> > 
> > Which reminds me: maybe just mark nested bridges as non-migrateable
> > for now?  Care writing such a patch?
> 
> Hmm, this is trickier than it sounds.

Hmm, since 0 is put in the path instead of the bridge number,
will the correct bridge be restored?

>  We're really only broken wrt
> migration if a device under a bridge calls qemu_ram_alloc.

I guess there's more broken-ness. What exactly breaks qemu_ram_alloc?

>  Any device
> is free to do this, but typically it only happens via
> pci_add_option_rom() (not counting vga as typical).  So maybe the better
> approach for now is to prevent the problem by disallowing option ROMs
> for devices below a bridge.  We obviously risk devices coming along that
> allocate RAM on their own, but we could still allow the most common
> issue with almost no lost functionality (assuming no one wants to boot
> off that nested device).  Thoughts?  Thanks,
> 
> Alex
Alex Williamson - Dec. 15, 2010, 3:27 p.m.
On Wed, 2010-12-15 at 11:56 +0200, Michael S. Tsirkin wrote:
> On Tue, Dec 14, 2010 at 11:34:53AM -0700, Alex Williamson wrote:
> > On Tue, 2010-12-14 at 14:26 +0200, Michael S. Tsirkin wrote:
> > > On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote:
> > > > 
> > > > I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an
> > > > assigned device, so I'm pretty sure we're not going to hurt migration,
> > > > but the code is clearly wrong and I'd like to make sure we don't trip on
> > > > a migration failure for a minor device config space change.
> > > 
> > > Which reminds me: maybe just mark nested bridges as non-migrateable
> > > for now?  Care writing such a patch?
> > 
> > Hmm, this is trickier than it sounds.
> 
> Hmm, since 0 is put in the path instead of the bridge number,
> will the correct bridge be restored?
> 
> >  We're really only broken wrt
> > migration if a device under a bridge calls qemu_ram_alloc.
> 
> I guess there's more broken-ness. What exactly breaks qemu_ram_alloc?

You're right, it's more broken than that.  Anything that calls
get_dev_path is broken for migration of bridges since the path is
determined before the guest updates bus numbers.  That includes
qemu_ram_alloc and vmstate.  I was only looking at the qemu_ram_alloc
side.  So perhaps the right answer, for the moment, is to block
migration if there's a p2p bridge.

Alex

> >  Any device
> > is free to do this, but typically it only happens via
> > pci_add_option_rom() (not counting vga as typical).  So maybe the better
> > approach for now is to prevent the problem by disallowing option ROMs
> > for devices below a bridge.  We obviously risk devices coming along that
> > allocate RAM on their own, but we could still allow the most common
> > issue with almost no lost functionality (assuming no one wants to boot
> > off that nested device).  Thoughts?  Thanks,
> > 
> > Alex
Isaku Yamahata - Dec. 16, 2010, 7:08 a.m.
On Wed, Dec 15, 2010 at 08:27:49AM -0700, Alex Williamson wrote:
> On Wed, 2010-12-15 at 11:56 +0200, Michael S. Tsirkin wrote:
> > On Tue, Dec 14, 2010 at 11:34:53AM -0700, Alex Williamson wrote:
> > > On Tue, 2010-12-14 at 14:26 +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote:
> > > > > 
> > > > > I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an
> > > > > assigned device, so I'm pretty sure we're not going to hurt migration,
> > > > > but the code is clearly wrong and I'd like to make sure we don't trip on
> > > > > a migration failure for a minor device config space change.
> > > > 
> > > > Which reminds me: maybe just mark nested bridges as non-migrateable
> > > > for now?  Care writing such a patch?
> > > 
> > > Hmm, this is trickier than it sounds.
> > 
> > Hmm, since 0 is put in the path instead of the bridge number,
> > will the correct bridge be restored?
> > 
> > >  We're really only broken wrt
> > > migration if a device under a bridge calls qemu_ram_alloc.
> > 
> > I guess there's more broken-ness. What exactly breaks qemu_ram_alloc?
> 
> You're right, it's more broken than that.  Anything that calls
> get_dev_path is broken for migration of bridges since the path is
> determined before the guest updates bus numbers.  That includes
> qemu_ram_alloc and vmstate.  I was only looking at the qemu_ram_alloc
> side.  So perhaps the right answer, for the moment, is to block
> migration if there's a p2p bridge.

Eww. That's bad. Anyway, I agree to disable it for the moment.
Let's fix it after 0.14 release.
Michael S. Tsirkin - Dec. 16, 2010, 8:36 a.m.
On Thu, Dec 16, 2010 at 04:08:16PM +0900, Isaku Yamahata wrote:
> On Wed, Dec 15, 2010 at 08:27:49AM -0700, Alex Williamson wrote:
> > On Wed, 2010-12-15 at 11:56 +0200, Michael S. Tsirkin wrote:
> > > On Tue, Dec 14, 2010 at 11:34:53AM -0700, Alex Williamson wrote:
> > > > On Tue, 2010-12-14 at 14:26 +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote:
> > > > > > 
> > > > > > I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an
> > > > > > assigned device, so I'm pretty sure we're not going to hurt migration,
> > > > > > but the code is clearly wrong and I'd like to make sure we don't trip on
> > > > > > a migration failure for a minor device config space change.
> > > > > 
> > > > > Which reminds me: maybe just mark nested bridges as non-migrateable
> > > > > for now?  Care writing such a patch?
> > > > 
> > > > Hmm, this is trickier than it sounds.
> > > 
> > > Hmm, since 0 is put in the path instead of the bridge number,
> > > will the correct bridge be restored?
> > > 
> > > >  We're really only broken wrt
> > > > migration if a device under a bridge calls qemu_ram_alloc.
> > > 
> > > I guess there's more broken-ness. What exactly breaks qemu_ram_alloc?
> > 
> > You're right, it's more broken than that.  Anything that calls
> > get_dev_path is broken for migration of bridges since the path is
> > determined before the guest updates bus numbers.  That includes
> > qemu_ram_alloc and vmstate.  I was only looking at the qemu_ram_alloc
> > side.  So perhaps the right answer, for the moment, is to block
> > migration if there's a p2p bridge.
> 
> Eww. That's bad. Anyway, I agree to disable it for the moment.

Patch?

> Let's fix it after 0.14 release.
> -- 
> yamahata
Isaku Yamahata - Dec. 21, 2010, 10:13 a.m.
On Thu, Dec 16, 2010 at 10:36:08AM +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 16, 2010 at 04:08:16PM +0900, Isaku Yamahata wrote:
> > On Wed, Dec 15, 2010 at 08:27:49AM -0700, Alex Williamson wrote:
> > > On Wed, 2010-12-15 at 11:56 +0200, Michael S. Tsirkin wrote:
> > > > On Tue, Dec 14, 2010 at 11:34:53AM -0700, Alex Williamson wrote:
> > > > > On Tue, 2010-12-14 at 14:26 +0200, Michael S. Tsirkin wrote:
> > > > > > On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote:
> > > > > > > 
> > > > > > > I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an
> > > > > > > assigned device, so I'm pretty sure we're not going to hurt migration,
> > > > > > > but the code is clearly wrong and I'd like to make sure we don't trip on
> > > > > > > a migration failure for a minor device config space change.
> > > > > > 
> > > > > > Which reminds me: maybe just mark nested bridges as non-migrateable
> > > > > > for now?  Care writing such a patch?
> > > > > 
> > > > > Hmm, this is trickier than it sounds.
> > > > 
> > > > Hmm, since 0 is put in the path instead of the bridge number,
> > > > will the correct bridge be restored?
> > > > 
> > > > >  We're really only broken wrt
> > > > > migration if a device under a bridge calls qemu_ram_alloc.
> > > > 
> > > > I guess there's more broken-ness. What exactly breaks qemu_ram_alloc?
> > > 
> > > You're right, it's more broken than that.  Anything that calls
> > > get_dev_path is broken for migration of bridges since the path is
> > > determined before the guest updates bus numbers.  That includes
> > > qemu_ram_alloc and vmstate.  I was only looking at the qemu_ram_alloc
> > > side.  So perhaps the right answer, for the moment, is to block
> > > migration if there's a p2p bridge.
> > 
> > Eww. That's bad. Anyway, I agree to disable it for the moment.
> 
> Patch?

Although I'm willing to create patch, how to disable the migration?
As long as I know, Alex Williamson had tried to push the patch series
"Save state error handling (kill off no_migrate)", they haven't been
merged.
If they are merged in, it seems quite easy to disable the migration.

Anyway register_device_unmigratable() seems to need some clean up
for DeviceInfo::vmsd. Any ideas?
Michael S. Tsirkin - Dec. 21, 2010, 10:56 a.m.
On Tue, Dec 21, 2010 at 07:13:57PM +0900, Isaku Yamahata wrote:
> On Thu, Dec 16, 2010 at 10:36:08AM +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 16, 2010 at 04:08:16PM +0900, Isaku Yamahata wrote:
> > > On Wed, Dec 15, 2010 at 08:27:49AM -0700, Alex Williamson wrote:
> > > > On Wed, 2010-12-15 at 11:56 +0200, Michael S. Tsirkin wrote:
> > > > > On Tue, Dec 14, 2010 at 11:34:53AM -0700, Alex Williamson wrote:
> > > > > > On Tue, 2010-12-14 at 14:26 +0200, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Dec 13, 2010 at 10:04:24PM -0700, Alex Williamson wrote:
> > > > > > > > 
> > > > > > > > I've only ever seen config[PCI_SECONDARY_BUS] be non-zero for an
> > > > > > > > assigned device, so I'm pretty sure we're not going to hurt migration,
> > > > > > > > but the code is clearly wrong and I'd like to make sure we don't trip on
> > > > > > > > a migration failure for a minor device config space change.
> > > > > > > 
> > > > > > > Which reminds me: maybe just mark nested bridges as non-migrateable
> > > > > > > for now?  Care writing such a patch?
> > > > > > 
> > > > > > Hmm, this is trickier than it sounds.
> > > > > 
> > > > > Hmm, since 0 is put in the path instead of the bridge number,
> > > > > will the correct bridge be restored?
> > > > > 
> > > > > >  We're really only broken wrt
> > > > > > migration if a device under a bridge calls qemu_ram_alloc.
> > > > > 
> > > > > I guess there's more broken-ness. What exactly breaks qemu_ram_alloc?
> > > > 
> > > > You're right, it's more broken than that.  Anything that calls
> > > > get_dev_path is broken for migration of bridges since the path is
> > > > determined before the guest updates bus numbers.  That includes
> > > > qemu_ram_alloc and vmstate.  I was only looking at the qemu_ram_alloc
> > > > side.  So perhaps the right answer, for the moment, is to block
> > > > migration if there's a p2p bridge.
> > > 
> > > Eww. That's bad. Anyway, I agree to disable it for the moment.
> > 
> > Patch?
> 
> Although I'm willing to create patch, how to disable the migration?

Set the no_migrate flag in SaveStateEntry.

> As long as I know, Alex Williamson had tried to push the patch series
> "Save state error handling (kill off no_migrate)", they haven't been
> merged.
> If they are merged in, it seems quite easy to disable the migration.

These look unlikely to be merged: they seem to go
contrary to the table-driver approach to migration that we are trying to
take.

> Anyway register_device_unmigratable() seems to need some clean up
> for DeviceInfo::vmsd. Any ideas?

So clean it up if you like.

> -- 
> yamahata

Patch

diff --git a/hw/pci.c b/hw/pci.c
index 254647b..81231c5 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1952,7 +1952,10 @@  static char *pcibus_get_dev_path(DeviceState *dev)
     char path[16];
 
     snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
-             pci_find_domain(d->bus), d->config[PCI_SECONDARY_BUS],
+             pci_find_domain(d->bus),
+             0 /* TODO: need a persistent path for nested buses.
+                * Note: pci_bus_num(d->bus) is not right as it's guest
+                * assigned. */,
              PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
 
     return strdup(path);