[2/2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number

Message ID bc0da7d123650f7d5caca74591c9ab875835b12c.1346347994.git.jbaron@redhat.com
State New

Commit Message

Jason Baron Aug. 30, 2012, 5:51 p.m. UTC
The Advanced Error Interrupt Message Number (bits 31:27 of the Root
Error Status Register) is updated when the number of MSI messages assigned to a
device changes. Migration of Windows 7 on q35 chipset failed because the check
in get_pci_config_device() fails due to wmask being set on these bits. It's valid
to update these bits and we must restore this state across migration.

Signed-off-by: Jason Baron <jbaron@redhat.com>
---
 hw/pcie_aer.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

Comments

Michael S. Tsirkin Aug. 31, 2012, 8:42 a.m. UTC | #1
Some minor nits below. If you don't get to it I will tweak this patch
when I apply it early next week.

On Thu, Aug 30, 2012 at 01:51:15PM -0400, Jason Baron wrote:
> The Advanced Error Interrupt Message Number (bits 31:27 of the Root
> Error Status Register) is updated when the number of MSI messages assigned to a
> device changes. Migration of Windows 7 on q35 chipset failed because the check
> in get_pci_config_device() fails due to wmask being set on these bits.

I think you actually mean 'not being set on these bits'?

> It's valid
> to update these bits and we must restore this state across migration.
> 
> Signed-off-by: Jason Baron <jbaron@redhat.com>
> ---
>  hw/pcie_aer.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
> index 3b6981c..6edcd79 100644
> --- a/hw/pcie_aer.c
> +++ b/hw/pcie_aer.c
> @@ -738,6 +738,12 @@ void pcie_aer_root_init(PCIDevice *dev)
>                   PCI_ERR_ROOT_CMD_EN_MASK);
>      pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
>                   PCI_ERR_ROOT_STATUS_REPORT_MASK);
> +    /* Bits 31:27 - Advanced Error Interrupt Message Number

This line is better moved to near definition of PCI_ERR_ROOT_IRQ.
Then here we can say 'PCI_ERR_ROOT_IRQ is RO but devices
change it using a device-specific method.'

> +     * These bits are updated when the number of MSI messages changes.
> +     * By clearing the cmask, pcie devices can be migrated.
> +     */
> +    pci_set_long(dev->cmask + pos + PCI_ERR_ROOT_STATUS,
> +                 (1 << PCI_ERR_ROOT_IRQ_SHIFT) - 1);

~PCI_ERR_ROOT_IRQ would be clearer I think.

>  }



>  
>  void pcie_aer_root_reset(PCIDevice *dev)
> -- 
> 1.7.1
Jason Baron Aug. 31, 2012, 2:45 p.m. UTC | #2
On Fri, Aug 31, 2012 at 11:42:27AM +0300, Michael S. Tsirkin wrote:
> Some minor nits below. If you don't get to it I will tweak this patch
> when I apply it early next week.
> 
> On Thu, Aug 30, 2012 at 01:51:15PM -0400, Jason Baron wrote:
> > The Advanced Error Interrupt Message Number (bits 31:27 of the Root
> > Error Status Register) is updated when the number of MSI messages assigned to a
> > device changes. Migration of Windows 7 on q35 chipset failed because the check
> > in get_pci_config_device() fails due to wmask being set on these bits.
> 
> I think you actually mean 'not being set on these bits'?
> 

No, the check is:

static int get_pci_config_device(QEMUFile *f, void *pv, size_t size)
{
    PCIDevice *s = container_of(pv, PCIDevice, config);
    uint8_t *config;
    int i;

    assert(size == pci_config_size(s));
    config = g_malloc(size);

    qemu_get_buffer(f, config, size);
    for (i = 0; i < size; ++i) {
        if ((config[i] ^ s->config[i]) &
            s->cmask[i] & ~s->wmask[i] & ~s->w1cmask[i]) {
            g_free(config);
            return -EINVAL;
        }
    }
.....

So because cmask is set and these bits differ in config space (due to
them being updated before migration), the migration aborts.
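
As an illustration of the check above, here is a minimal, self-contained
sketch (not QEMU code). The struct cfg_masks and the helpers check_incoming()
and try_migrate() are hypothetical names, and the byte values are made up for
the example; only the mask expression is taken from get_pci_config_device().
It assumes the little-endian config space layout, so bits 31:27 of the
register are the top five bits of its fourth byte, and it leaves wmask and
w1cmask at zero for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device arrays; not QEMU's PCIDevice. */
struct cfg_masks {
    const uint8_t *config;   /* destination's current config space */
    const uint8_t *cmask;    /* bits that must match on load */
    const uint8_t *wmask;    /* guest-writable bits */
    const uint8_t *w1cmask;  /* write-1-to-clear bits */
};

/* Same comparison as the loop quoted above: 0 if compatible, -1 if not. */
static int check_incoming(const struct cfg_masks *m,
                          const uint8_t *incoming, size_t size)
{
    for (size_t i = 0; i < size; ++i) {
        if ((incoming[i] ^ m->config[i]) &
            m->cmask[i] & ~m->wmask[i] & ~m->w1cmask[i]) {
            return -1;
        }
    }
    return 0;
}

/* One 32-bit register; the source changed bit 27 (0x08 in byte 3). */
static int try_migrate(uint8_t cmask_top_byte)
{
    const uint8_t local[4]    = { 0x00, 0x00, 0x00, 0x00 };
    const uint8_t incoming[4] = { 0x00, 0x00, 0x00, 0x08 };
    const uint8_t cmask[4]    = { 0xff, 0xff, 0xff, cmask_top_byte };
    const uint8_t wmask[4]    = { 0 };
    const uint8_t w1cmask[4]  = { 0 };
    const struct cfg_masks m  = { local, cmask, wmask, w1cmask };

    return check_incoming(&m, incoming, sizeof(local));
}
```

With cmask fully set on that byte (0xff), the changed message number is
rejected; with the five field bits cleared from cmask (0x07), the same
incoming data is accepted.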


> > It's valid
> > to update these bits and we must restore this state across migration.
> > 
> > Signed-off-by: Jason Baron <jbaron@redhat.com>
> > ---
> >  hw/pcie_aer.c |    6 ++++++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> > 
> > diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
> > index 3b6981c..6edcd79 100644
> > --- a/hw/pcie_aer.c
> > +++ b/hw/pcie_aer.c
> > @@ -738,6 +738,12 @@ void pcie_aer_root_init(PCIDevice *dev)
> >                   PCI_ERR_ROOT_CMD_EN_MASK);
> >      pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
> >                   PCI_ERR_ROOT_STATUS_REPORT_MASK);
> > +    /* Bits 31:27 - Advanced Error Interrupt Message Number
> 
> This line is better moved to near definition of PCI_ERR_ROOT_IRQ.
> Then here we can say 'PCI_ERR_ROOT_IRQ is RO but devices
> change it using a device-specific method.'

ok.

> 
> > +     * These bits are updated when the number of MSI messages changes.
> > +     * By clearing the cmask, pcie devices can be migrated.
> > +     */
> > +    pci_set_long(dev->cmask + pos + PCI_ERR_ROOT_STATUS,
> > +                 (1 << PCI_ERR_ROOT_IRQ_SHIFT) - 1);
> 
> ~PCI_ERR_ROOT_IRQ would be clearer I think.
> 

agreed.

I will re-spin and post a v2.

Thanks,

-Jason
Michael S. Tsirkin Aug. 31, 2012, 3:35 p.m. UTC | #3
On Fri, Aug 31, 2012 at 10:45:52AM -0400, Jason Baron wrote:
> On Fri, Aug 31, 2012 at 11:42:27AM +0300, Michael S. Tsirkin wrote:
> > Some minor nits below. If you don't get to it I will tweak this patch
> > when I apply it early next week.
> > 
> > On Thu, Aug 30, 2012 at 01:51:15PM -0400, Jason Baron wrote:
> > > The Advanced Error Interrupt Message Number (bits 31:27 of the Root
> > > Error Status Register) is updated when the number of MSI messages assigned to a
> > > device changes. Migration of Windows 7 on q35 chipset failed because the check
> > > in get_pci_config_device() fails due to wmask being set on these bits.
> > 
> > I think you actually mean 'not being set on these bits'?
> > 
> 
> No, the check is:
> 
> static int get_pci_config_device(QEMUFile *f, void *pv, size_t size)
> {
>     PCIDevice *s = container_of(pv, PCIDevice, config);
>     uint8_t *config;
>     int i;
> 
>     assert(size == pci_config_size(s));
>     config = g_malloc(size);
> 
>     qemu_get_buffer(f, config, size);
>     for (i = 0; i < size; ++i) {
>         if ((config[i] ^ s->config[i]) &
>             s->cmask[i] & ~s->wmask[i] & ~s->w1cmask[i]) {
>             g_free(config);
>             return -EINVAL;
>         }
>     }
> .....
> 
> So because cmask is set and these bits differ in config space (due to
> them being updated before migration), the migration aborts.

Ah so you mean 'cmask being set' - not wmask as you wrote.

> 
> > > It's valid
> > > to update these bits and we must restore this state across migration.
> > > 
> > > Signed-off-by: Jason Baron <jbaron@redhat.com>
> > > ---
> > >  hw/pcie_aer.c |    6 ++++++
> > >  1 files changed, 6 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
> > > index 3b6981c..6edcd79 100644
> > > --- a/hw/pcie_aer.c
> > > +++ b/hw/pcie_aer.c
> > > @@ -738,6 +738,12 @@ void pcie_aer_root_init(PCIDevice *dev)
> > >                   PCI_ERR_ROOT_CMD_EN_MASK);
> > >      pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
> > >                   PCI_ERR_ROOT_STATUS_REPORT_MASK);
> > > +    /* Bits 31:27 - Advanced Error Interrupt Message Number
> > 
> > This line is better moved to near definition of PCI_ERR_ROOT_IRQ.
> > Then here we can say 'PCI_ERR_ROOT_IRQ is RO but devices
> > change it using a device-specific method.'
> 
> ok.
> 
> > 
> > > +     * These bits are updated when the number of MSI messages changes.
> > > +     * By clearing the cmask, pcie devices can be migrated.
> > > +     */
> > > +    pci_set_long(dev->cmask + pos + PCI_ERR_ROOT_STATUS,
> > > +                 (1 << PCI_ERR_ROOT_IRQ_SHIFT) - 1);
> > 
> > ~PCI_ERR_ROOT_IRQ would be clearer I think.
> > 
> 
> agreed.
> 
> I will re-spin and post a v2.
> 
> Thanks,
> 
> -Jason
Jason Baron Aug. 31, 2012, 3:43 p.m. UTC | #4
On Fri, Aug 31, 2012 at 06:35:13PM +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 31, 2012 at 10:45:52AM -0400, Jason Baron wrote:
> > On Fri, Aug 31, 2012 at 11:42:27AM +0300, Michael S. Tsirkin wrote:
> > > Some minor nits below. If you don't get to it I will tweak this patch
> > > when I apply it early next week.
> > > 
> > > On Thu, Aug 30, 2012 at 01:51:15PM -0400, Jason Baron wrote:
> > > > The Advanced Error Interrupt Message Number (bits 31:27 of the Root
> > > > Error Status Register) is updated when the number of MSI messages assigned to a
> > > > device changes. Migration of Windows 7 on q35 chipset failed because the check
> > > > in get_pci_config_device() fails due to wmask being set on these bits.
> > > 
> > > I think you actually mean 'not being set on these bits'?
> > > 
> > 
> > No, the check is:
> > 
> > static int get_pci_config_device(QEMUFile *f, void *pv, size_t size)
> > {
> >     PCIDevice *s = container_of(pv, PCIDevice, config);
> >     uint8_t *config;
> >     int i;
> > 
> >     assert(size == pci_config_size(s));
> >     config = g_malloc(size);
> > 
> >     qemu_get_buffer(f, config, size);
> >     for (i = 0; i < size; ++i) {
> >         if ((config[i] ^ s->config[i]) &
> >             s->cmask[i] & ~s->wmask[i] & ~s->w1cmask[i]) {
> >             g_free(config);
> >             return -EINVAL;
> >         }
> >     }
> > .....
> > 
> > So because cmask is set and these bits differ in config space (due to
> > them being updated before migration), the migration aborts.
> 
> Ah so you mean 'cmask being set' - not wmask as you wrote.
> 

Sorry - my mistake, yes I meant 'cmask'. I will fix the changelog in the
re-post.

Thanks,

-Jason

Patch

diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
index 3b6981c..6edcd79 100644
--- a/hw/pcie_aer.c
+++ b/hw/pcie_aer.c
@@ -738,6 +738,12 @@  void pcie_aer_root_init(PCIDevice *dev)
                  PCI_ERR_ROOT_CMD_EN_MASK);
     pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
                  PCI_ERR_ROOT_STATUS_REPORT_MASK);
+    /* Bits 31:27 - Advanced Error Interrupt Message Number
+     * These bits are updated when the number of MSI messages changes.
+     * By clearing the cmask, pcie devices can be migrated.
+     */
+    pci_set_long(dev->cmask + pos + PCI_ERR_ROOT_STATUS,
+                 (1 << PCI_ERR_ROOT_IRQ_SHIFT) - 1);
 }
 
 void pcie_aer_root_reset(PCIDevice *dev)
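
For reference, the mask expression in the patch and the ~PCI_ERR_ROOT_IRQ form
suggested in review should produce the same value. The sketch below checks
this under stated assumptions: ERR_ROOT_IRQ and ERR_ROOT_IRQ_SHIFT are
hypothetical stand-ins whose values (0xf8000000 and 27) are derived from
"bits 31:27" in the commit message, not copied from the QEMU headers.

```c
#include <stdint.h>

/* Assumed values for a field occupying bits 31:27 of a 32-bit register. */
#define ERR_ROOT_IRQ        UINT32_C(0xf8000000)
#define ERR_ROOT_IRQ_SHIFT  27

/* Form used in the patch: all bits below the field. */
static uint32_t cmask_as_posted(void)
{
    return (UINT32_C(1) << ERR_ROOT_IRQ_SHIFT) - 1;
}

/* Form suggested in review: everything except the field. */
static uint32_t cmask_as_suggested(void)
{
    return ~ERR_ROOT_IRQ;
}
```

Both forms leave cmask set on bits 26:0 and clear on bits 31:27; the second
states the intent (exclude the message number field) more directly.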