diff mbox

[v2,5/6] vmw_pvscsi: The pvscsi device is a PCIE endpoint

Message ID 1449994112-7054-6-git-send-email-shmulik.ladkani@ravellosystems.com
State New
Headers show

Commit Message

Shmulik Ladkani Dec. 13, 2015, 8:08 a.m. UTC
Report the 'express endpoint' capability if on a PCIE bus.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
---
 hw/scsi/vmw_pvscsi.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini Dec. 14, 2015, 5:14 p.m. UTC | #1
On 13/12/2015 09:08, Shmulik Ladkani wrote:
> +    pvs_k->parent_dc_realize = dc->realize;

Marcel, Michael,

this creates a really nasty dependency on the contents of pci_qdev_realize.

Can you instead change PCIDeviceClass's pc->is_express to a function
pointer, and provide a sample implementation pci_is_express_true for the
devices that set is_express to true?

Thanks,

Paolo
Shmulik Ladkani Dec. 14, 2015, 5:31 p.m. UTC | #2
Hi,

On Mon, 14 Dec 2015 18:14:37 +0100 Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 13/12/2015 09:08, Shmulik Ladkani wrote:
> > +    pvs_k->parent_dc_realize = dc->realize;
> 
> Marcel, Michael,
> 
> this creates a really nasty dependency on the contents of pci_qdev_realize.
> 
> Can you instead change PCIDeviceClass's pc->is_express to a function
> pointer, and provide a sample implementation pci_is_express_true for the
> devices that set is_express to true?

Thanks Paolo, I like the idea.
Indeed repeating the parent_dc_realize hack for various devices seems
awkward.

If this approach is accepted, I'm okay doing the suggested refactor.

Regards,
Shmulik
Michael S. Tsirkin Dec. 14, 2015, 5:37 p.m. UTC | #3
On Mon, Dec 14, 2015 at 06:14:37PM +0100, Paolo Bonzini wrote:
> 
> 
> On 13/12/2015 09:08, Shmulik Ladkani wrote:
> > +    pvs_k->parent_dc_realize = dc->realize;
> 
> Marcel, Michael,
> 
> this creates a really nasty dependency on the contents of pci_qdev_realize.
> 
> Can you instead change PCIDeviceClass's pc->is_express to a function
> pointer, and provide a sample implementation pci_is_express_true for the
> devices that set is_express to true?
> 
> Thanks,
> 
> Paolo

I'm not very familiar with vmw code, and I dislike overusing callbacks.
What exactly would this callback do?
Paolo Bonzini Dec. 14, 2015, 6:05 p.m. UTC | #4
On 14/12/2015 18:37, Michael S. Tsirkin wrote:
> > Can you instead change PCIDeviceClass's pc->is_express to a function
> > pointer, and provide a sample implementation pci_is_express_true for the
> > devices that set is_express to true?
> 
> I'm not very familiar with vmw code, and I dislike overusing callbacks.
> What exactly would this callback do?

It would be exactly the same for virtio.  The callback would return true
if the device should have a PCIe capability, false if not.  The default
would be always false.

Paolo
Marcel Apfelbaum Dec. 14, 2015, 6:25 p.m. UTC | #5
On 12/14/2015 07:14 PM, Paolo Bonzini wrote:
>
>
> On 13/12/2015 09:08, Shmulik Ladkani wrote:
>> +    pvs_k->parent_dc_realize = dc->realize;
>
> Marcel, Michael,
>
> this creates a really nasty dependency on the contents of pci_qdev_realize.

Hi Paolo,

In this case is exactly want we want to do. We don't care what pci_qdev_realize does.
We just want to made a change and then call it. We can think of it as "pre-realize."
I think I already saw a "post-realize" or something.

The usage is also documented in include/qom/object.h, how to derive "do_something" (see derived_do_something).

>
> Can you instead change PCIDeviceClass's pc->is_express to a function
> pointer, and provide a sample implementation pci_is_express_true for the
> devices that set is_express to true?

I think I thougth about this too and I have nothing against it.
The only "down side" I can see is that "is_express" means actually "can_be_express",
and some classes keep it true even if sometimes is not actually a PCIe device.

Now we change back the meaning to "is_really_express" :) , but we may encounter some side effects.
(I understand that we keep it true so we don't have to maintain some complicated code
on VMState related to the config space size.)

Thanks,
Marcel

>
> Thanks,
>
> Paolo
>
Shmulik Ladkani Dec. 14, 2015, 9:01 p.m. UTC | #6
Hi Michael,

On Mon, 14 Dec 2015 19:37:29 +0200 "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Dec 14, 2015 at 06:14:37PM +0100, Paolo Bonzini wrote:
> > 
> > On 13/12/2015 09:08, Shmulik Ladkani wrote:
> > > +    pvs_k->parent_dc_realize = dc->realize;
> > 
> > Marcel, Michael,
> > 
> > this creates a really nasty dependency on the contents of pci_qdev_realize.
> > 
> > Can you instead change PCIDeviceClass's pc->is_express to a function
> > pointer, and provide a sample implementation pci_is_express_true for the
> > devices that set is_express to true?
> > 
> 
> I'm not very familiar with vmw code, and I dislike overusing callbacks.
> What exactly would this callback do?

Not specific to vmw.

Recently we've made virtio-pci a pcie:
  1811e64c hw/virtio: Add PCIe capability to virtio devices

For migration, 'x-pcie-disable' proprety was introduced.
Thus, instances of virtio-pci may be either pci or pcie.

We'd like to do the same for vmxnet3 and vmw_pvscsi.

This exposes a design limitation.

PCIDeviceClass.is_express is a propery of the class.
All pci_dev instances get the QEMU_PCI_CAP_EXPRESS bit assigned into
their 'cap_present' according to their class 'pc->is_express' value,
early in 'pci_qdev_realize', quote:

static void pci_qdev_realize(DeviceState *qdev, Error **errp)
    ...
    /* initialize cap_present for pci_is_express() and pci_config_size() */
    if (pc->is_express) {
        pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
    }


However, we'd like to set QEMU_PCI_CAP_EXPRESS conditionally per
instance.

In order to set QEMU_PCI_CAP_EXPRESS conditionally per instance
(for example, according to the given x-pcie-disable property), the
'parent_dc_realize' trick was suggested.

Reasoning is documented in:
  0560b0e9 virtio-pci: Set the QEMU_PCI_CAP_EXPRESS capability early in its DeviceClass realize method

Alas, the same trick needs to be repeated for all classes whose
instances may be either pci or pcie.

What Paolo suggest, is having a callback which can return whether the
device *instance* needs the QEMU_PCI_CAP_EXPRESS bit.

So in 'pci_qdev_realize', instead of:

    if (pc->is_express)
        pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;

we'll have something like:

    if (pc->is_express(qdev))
        pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;

WDYT?

Regards,
Shmulik
Michael S. Tsirkin Dec. 14, 2015, 9:10 p.m. UTC | #7
On Mon, Dec 14, 2015 at 11:01:05PM +0200, Shmulik Ladkani wrote:
> Hi Michael,
> 
> On Mon, 14 Dec 2015 19:37:29 +0200 "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Dec 14, 2015 at 06:14:37PM +0100, Paolo Bonzini wrote:
> > > 
> > > On 13/12/2015 09:08, Shmulik Ladkani wrote:
> > > > +    pvs_k->parent_dc_realize = dc->realize;
> > > 
> > > Marcel, Michael,
> > > 
> > > this creates a really nasty dependency on the contents of pci_qdev_realize.
> > > 
> > > Can you instead change PCIDeviceClass's pc->is_express to a function
> > > pointer, and provide a sample implementation pci_is_express_true for the
> > > devices that set is_express to true?
> > > 
> > 
> > I'm not very familiar with vmw code, and I dislike overusing callbacks.
> > What exactly would this callback do?
> 
> Not specific to vmw.
> 
> Recently we've made virtio-pci a pcie:
>   1811e64c hw/virtio: Add PCIe capability to virtio devices
> 
> For migration, 'x-pcie-disable' proprety was introduced.
> Thus, instances of virtio-pci may be either pci or pcie.
> 
> We'd like to do the same for vmxnet3 and vmw_pvscsi.
> 
> This exposes a design limitation.
> 
> PCIDeviceClass.is_express is a propery of the class.
> All pci_dev instances get the QEMU_PCI_CAP_EXPRESS bit assigned into
> their 'cap_present' according to their class 'pc->is_express' value,
> early in 'pci_qdev_realize', quote:
> 
> static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>     ...
>     /* initialize cap_present for pci_is_express() and pci_config_size() */
>     if (pc->is_express) {
>         pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
>     }
> 
> 
> However, we'd like to set QEMU_PCI_CAP_EXPRESS conditionally per
> instance.
> 
> In order to set QEMU_PCI_CAP_EXPRESS conditionally per instance
> (for example, according to the given x-pcie-disable property), the
> 'parent_dc_realize' trick was suggested.
> 
> Reasoning is documented in:
>   0560b0e9 virtio-pci: Set the QEMU_PCI_CAP_EXPRESS capability early in its DeviceClass realize method
> 
> Alas, the same trick needs to be repeated for all classes whose
> instances may be either pci or pcie.

Actually, I think you can just move the code to pci core.
There won't be a need for a callback then.

> What Paolo suggest, is having a callback which can return whether the
> device *instance* needs the QEMU_PCI_CAP_EXPRESS bit.
> 
> So in 'pci_qdev_realize', instead of:
> 
>     if (pc->is_express)
>         pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
> 
> we'll have something like:
> 
>     if (pc->is_express(qdev))
>         pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
> 
> WDYT?
> 
> Regards,
> Shmulik
Shmulik Ladkani Dec. 15, 2015, 5:48 a.m. UTC | #8
Hi,

On Mon, 14 Dec 2015 23:10:20 +0200 "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > In order to set QEMU_PCI_CAP_EXPRESS conditionally per instance
> > (for example, according to the given x-pcie-disable property), the
> > 'parent_dc_realize' trick was suggested.
> > 
> > Reasoning is documented in:
> >   0560b0e9 virtio-pci: Set the QEMU_PCI_CAP_EXPRESS capability early in its DeviceClass realize method
> > 
> > Alas, the same trick needs to be repeated for all classes whose
> > instances may be either pci or pcie.
> 
> Actually, I think you can just move the code to pci core.
> There won't be a need for a callback then.

Michael, can you please elaborate?

Regards,
Shmulik
diff mbox

Patch

diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
index 00d6900..fb53b24 100644
--- a/hw/scsi/vmw_pvscsi.c
+++ b/hw/scsi/vmw_pvscsi.c
@@ -50,6 +50,7 @@ 
 
 typedef struct PVSCSIClass {
     PCIDeviceClass parent_class;
+    DeviceRealize parent_dc_realize;
 } PVSCSIClass;
 
 #define TYPE_PVSCSI "pvscsi"
@@ -64,11 +65,15 @@  typedef struct PVSCSIClass {
 #define PVSCSI_COMPAT_OLD_PCI_CONFIGURATION_BIT 0
 #define PVSCSI_COMPAT_OLD_PCI_CONFIGURATION \
     (1 << PVSCSI_COMPAT_OLD_PCI_CONFIGURATION_BIT)
+#define PVSCSI_COMPAT_DISABLE_PCIE_BIT 1
+#define PVSCSI_COMPAT_DISABLE_PCIE \
+    (1 << PVSCSI_COMPAT_DISABLE_PCIE_BIT)
 
 #define PVSCSI_USE_OLD_PCI_CONFIGURATION(s) \
     ((s)->compat_flags & PVSCSI_COMPAT_OLD_PCI_CONFIGURATION)
 #define PVSCSI_MSI_OFFSET(s) \
     (PVSCSI_USE_OLD_PCI_CONFIGURATION(s) ? 0x50 : 0x7c)
+#define PVSCSI_EXP_EP_OFFSET (0x40)
 
 typedef struct PVSCSIRingInfo {
     uint64_t            rs_pa;
@@ -1112,6 +1117,10 @@  pvscsi_init(PCIDevice *pci_dev)
 
     pvscsi_init_msi(s);
 
+    if (pci_is_express(pci_dev) && pci_bus_is_express(pci_dev->bus)) {
+        pcie_endpoint_cap_init(pci_dev, PVSCSI_EXP_EP_OFFSET);
+    }
+
     s->completion_worker = qemu_bh_new(pvscsi_process_completion_queue, s);
     if (!s->completion_worker) {
         pvscsi_cleanup_msi(s);
@@ -1166,6 +1175,27 @@  pvscsi_post_load(void *opaque, int version_id)
     return 0;
 }
 
+static bool pvscsi_vmstate_need_pcie_device(void *opaque)
+{
+    PVSCSIState *s = PVSCSI(opaque);
+
+    return !(s->compat_flags & PVSCSI_COMPAT_DISABLE_PCIE);
+}
+
+static bool pvscsi_vmstate_test_pci_device(void *opaque, int version_id)
+{
+    return !pvscsi_vmstate_need_pcie_device(opaque);
+}
+
+static const VMStateDescription vmstate_pvscsi_pcie_device = {
+    .name = "pvscsi/pcie",
+    .needed = pvscsi_vmstate_need_pcie_device,
+    .fields = (VMStateField[]) {
+        VMSTATE_PCIE_DEVICE(parent_obj, PVSCSIState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_pvscsi = {
     .name = "pvscsi",
     .version_id = 0,
@@ -1173,7 +1203,9 @@  static const VMStateDescription vmstate_pvscsi = {
     .pre_save = pvscsi_pre_save,
     .post_load = pvscsi_post_load,
     .fields = (VMStateField[]) {
-        VMSTATE_PCI_DEVICE(parent_obj, PVSCSIState),
+        VMSTATE_STRUCT_TEST(parent_obj, PVSCSIState,
+                            pvscsi_vmstate_test_pci_device, 0,
+                            vmstate_pci_device, PCIDevice),
         VMSTATE_UINT8(msi_used, PVSCSIState),
         VMSTATE_UINT32(resetting, PVSCSIState),
         VMSTATE_UINT64(reg_interrupt_status, PVSCSIState),
@@ -1198,6 +1230,10 @@  static const VMStateDescription vmstate_pvscsi = {
         VMSTATE_UINT64(rings.filled_cmp_ptr, PVSCSIState),
 
         VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription*[]) {
+        &vmstate_pvscsi_pcie_device,
+        NULL
     }
 };
 
@@ -1208,10 +1244,24 @@  static Property pvscsi_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static void pvscsi_realize(DeviceState *qdev, Error **errp)
+{
+    PVSCSIClass *pvs_c = PVSCSI_DEVICE_GET_CLASS(qdev);
+    PCIDevice *pci_dev = PCI_DEVICE(qdev);
+    PVSCSIState *s = PVSCSI(qdev);
+
+    if (!(s->compat_flags & PVSCSI_COMPAT_DISABLE_PCIE)) {
+        pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
+    }
+
+    pvs_c->parent_dc_realize(qdev, errp);
+}
+
 static void pvscsi_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+    PVSCSIClass *pvs_k = PVSCSI_DEVICE_CLASS(klass);
     HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
 
     k->init = pvscsi_init;
@@ -1220,6 +1270,8 @@  static void pvscsi_class_init(ObjectClass *klass, void *data)
     k->device_id = PCI_DEVICE_ID_VMWARE_PVSCSI;
     k->class_id = PCI_CLASS_STORAGE_SCSI;
     k->subsystem_id = 0x1000;
+    pvs_k->parent_dc_realize = dc->realize;
+    dc->realize = pvscsi_realize;
     dc->reset = pvscsi_reset;
     dc->vmsd = &vmstate_pvscsi;
     dc->props = pvscsi_properties;