Patchwork [2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page

login
register
mail settings
Submitter Michael S. Tsirkin
Date April 9, 2014, 1:52 p.m.
Message ID <20140409135231.GA12232@redhat.com>
Download mbox | patch
Permalink /patch/337846/
State New
Headers show

Comments

Michael S. Tsirkin - April 9, 2014, 1:52 p.m.
On Wed, Apr 09, 2014 at 10:56:57AM +0000, Gonglei (Arei) wrote:
> Hi,
> 
> > > QEMU only mmap MSIX_PAGE_SIZE memory for all pci devices in
> > > assigned_dev_register_msix_mmio(), meanwhile the set the one
> > > page memmory to zero, so the rest memory will be random value
> > > (maybe etnry.data is not 0). In the assigned_dev_update_msix_mmio()
> > > maybe occur the issue of entry_nr > 256, and the kmod reports
> > > the EINVAL error.
> > >
> > > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > 
> > Okay so this kind of works because guest does not try
> > to use so many vectors.
> > 
> Yes. It's true.
> 
> > But I think it's better not to give guest more entries
> > than we can actually support.
> > 
> > How about tweaking MSIX capability exposed to guest to limit table size?
> > 
> IMHO, the MSIX table entry size got form the physic NIC hardware. The software
> layer shouldn't tweak the capability of real hardware, neither QEMU nor KVM. 
> 
> 
> Best regards,
> -Gonglei

Should be easy to fix though. Does the following help?

-->

pci-assign: limit # of msix vectors


Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Gonglei (Arei) - April 10, 2014, 2:34 a.m.
> On Wed, Apr 09, 2014 at 10:56:57AM +0000, Gonglei (Arei) wrote:
> > Hi,
> >
> > > > QEMU only mmap MSIX_PAGE_SIZE memory for all pci devices in
> > > > assigned_dev_register_msix_mmio(), meanwhile the set the one
> > > > page memmory to zero, so the rest memory will be random value
> > > > (maybe etnry.data is not 0). In the assigned_dev_update_msix_mmio()
> > > > maybe occur the issue of entry_nr > 256, and the kmod reports
> > > > the EINVAL error.
> > > >
> > > > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > >
> > > Okay so this kind of works because guest does not try
> > > to use so many vectors.
> > >
> > Yes. It's true.
> >
> > > But I think it's better not to give guest more entries
> > > than we can actually support.
> > >
> > > How about tweaking MSIX capability exposed to guest to limit table size?
> > >
> > IMHO, the MSIX table entry size got form the physic NIC hardware. The
> software
> > layer shouldn't tweak the capability of real hardware, neither QEMU nor
> KVM.
> >
> >
> > Best regards,
> > -Gonglei
> 
> Should be easy to fix though. Does the following help?
> 
Good, the patch is perfect.

> -->
> 
> pci-assign: limit # of msix vectors
> 
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> 
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index a825871..76aa86e 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -1258,6 +1258,7 @@ static int assigned_device_pci_cap_init(PCIDevice
> *pci_dev)
>      if (pos != 0 && kvm_device_msix_supported(kvm_state)) {
>          int bar_nr;
>          uint32_t msix_table_entry;
> +        uint16_t msix_max;
> 
>          if (!check_irqchip_in_kernel()) {
>              return -ENOTSUP;
> @@ -1269,9 +1270,10 @@ static int assigned_device_pci_cap_init(PCIDevice
> *pci_dev)
>          }
>          pci_dev->msix_cap = pos;
> 
> -        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
> -                     pci_get_word(pci_dev->config + pos +
> PCI_MSIX_FLAGS) &
> -                     PCI_MSIX_FLAGS_QSIZE);
> +        msix_max = (pci_get_word(pci_dev->config + pos +
> PCI_MSIX_FLAGS) &
> +                    PCI_MSIX_FLAGS_QSIZE) + 1;
> +        msix_max = MIN(msix_max, KVM_MAX_MSIX_PER_DEV);
> +        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS, msix_max -
> 1);
> 
>          /* Only enable and function mask bits are writable */
>          pci_set_word(pci_dev->wmask + pos + PCI_MSIX_FLAGS,
> @@ -1281,9 +1283,7 @@ static int assigned_device_pci_cap_init(PCIDevice
> *pci_dev)
>          bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
>          msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
>          dev->msix_table_addr = pci_region[bar_nr].base_addr +
> msix_table_entry;
> -        dev->msix_max = pci_get_word(pci_dev->config + pos +
> PCI_MSIX_FLAGS);
> -        dev->msix_max &= PCI_MSIX_FLAGS_QSIZE;
> -        dev->msix_max += 1;
> +        dev->msix_max = msix_max;
>      }
> 
>      /* Minimal PM support, nothing writable, device appears to NAK changes
> */

In the Guest OS, the pci config of VF as bellow:

linux:/sys/bus/pci/devices/0000:00:08.0 # hexdump config 
0000000 10df e228 0507 0010 0010 0200 0000 0000
0000010 000c fe00 0000 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 10df e264
0000030 0000 0000 0054 0000 0000 0000 0000 0000
0000040 0000 0000 0008 0000 0000 0000 0000 0000
0000050 0000 0000 8409 0008 2b41 c002 0000 0000
0000060 0005 000a 0000 0000 0000 0000 0000 0000
0000070 0000 0000 0000 0000 6011 80ff 4000 0000
0000080 3400 0000 9403 0000 0000 0000 0000 0000
0000090 0000 0000 7810 0002 8724 1000 0000 0000
00000a0 dc83 0041 0000 0000 0000 0000 0000 0000
00000b0 0000 0000 0000 0000 001f 0010 0000 0000
00000c0 000e 0000 0000 0000 0000 0000 0000 0000
00000d0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000100

# lspci
00:08.0 Ethernet controller: Emulex Corporation Device e228 (rev 10)

The msix table size is *0ff*. Expect a formal patch. Thanks.

Tested-by: Gonglei <arei.gonglei@huawei.com>


Best regards,
-Gonglei
Paolo Bonzini - April 28, 2014, 1:51 p.m.
Il 09/04/2014 15:52, Michael S. Tsirkin ha scritto:
> On Wed, Apr 09, 2014 at 10:56:57AM +0000, Gonglei (Arei) wrote:
>> Hi,
>>
>>>> QEMU only mmap MSIX_PAGE_SIZE memory for all pci devices in
>>>> assigned_dev_register_msix_mmio(), meanwhile the set the one
>>>> page memmory to zero, so the rest memory will be random value
>>>> (maybe etnry.data is not 0). In the assigned_dev_update_msix_mmio()
>>>> maybe occur the issue of entry_nr > 256, and the kmod reports
>>>> the EINVAL error.
>>>>
>>>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>>>
>>> Okay so this kind of works because guest does not try
>>> to use so many vectors.
>>>
>> Yes. It's true.
>>
>>> But I think it's better not to give guest more entries
>>> than we can actually support.
>>>
>>> How about tweaking MSIX capability exposed to guest to limit table size?
>>>
>> IMHO, the MSIX table entry size got form the physic NIC hardware. The software
>> layer shouldn't tweak the capability of real hardware, neither QEMU nor KVM.
>>
>>
>> Best regards,
>> -Gonglei
>
> Should be easy to fix though. Does the following help?
>
> -->
>
> pci-assign: limit # of msix vectors
>
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
>
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index a825871..76aa86e 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -1258,6 +1258,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>      if (pos != 0 && kvm_device_msix_supported(kvm_state)) {
>          int bar_nr;
>          uint32_t msix_table_entry;
> +        uint16_t msix_max;
>
>          if (!check_irqchip_in_kernel()) {
>              return -ENOTSUP;
> @@ -1269,9 +1270,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>          }
>          pci_dev->msix_cap = pos;
>
> -        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
> -                     pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
> -                     PCI_MSIX_FLAGS_QSIZE);
> +        msix_max = (pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
> +                    PCI_MSIX_FLAGS_QSIZE) + 1;
> +        msix_max = MIN(msix_max, KVM_MAX_MSIX_PER_DEV);
> +        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS, msix_max - 1);
>
>          /* Only enable and function mask bits are writable */
>          pci_set_word(pci_dev->wmask + pos + PCI_MSIX_FLAGS,
> @@ -1281,9 +1283,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>          bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
>          msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
>          dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
> -        dev->msix_max = pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS);
> -        dev->msix_max &= PCI_MSIX_FLAGS_QSIZE;
> -        dev->msix_max += 1;
> +        dev->msix_max = msix_max;
>      }
>
>      /* Minimal PM support, nothing writable, device appears to NAK changes */
>

Michael, can you send it as toplevel and with a nice commit message so 
that I can apply it to uq/master?  In the meanwhile I've applied patch 1.

Paolo

Patch

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index a825871..76aa86e 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -1258,6 +1258,7 @@  static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     if (pos != 0 && kvm_device_msix_supported(kvm_state)) {
         int bar_nr;
         uint32_t msix_table_entry;
+        uint16_t msix_max;
 
         if (!check_irqchip_in_kernel()) {
             return -ENOTSUP;
@@ -1269,9 +1270,10 @@  static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         }
         pci_dev->msix_cap = pos;
 
-        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
-                     pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
-                     PCI_MSIX_FLAGS_QSIZE);
+        msix_max = (pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
+                    PCI_MSIX_FLAGS_QSIZE) + 1;
+        msix_max = MIN(msix_max, KVM_MAX_MSIX_PER_DEV);
+        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS, msix_max - 1);
 
         /* Only enable and function mask bits are writable */
         pci_set_word(pci_dev->wmask + pos + PCI_MSIX_FLAGS,
@@ -1281,9 +1283,7 @@  static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
         msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
         dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
-        dev->msix_max = pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS);
-        dev->msix_max &= PCI_MSIX_FLAGS_QSIZE;
-        dev->msix_max += 1;
+        dev->msix_max = msix_max;
     }
 
     /* Minimal PM support, nothing writable, device appears to NAK changes */