
[v4,00/17] Add VFIO mediated device support and DEV-MSI support for the idxd driver

Message ID 160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com

Message

Dave Jiang Oct. 30, 2020, 6:50 p.m. UTC
- Would like to acquire Reviewed-by tags from Thomas for the MSI and IMS related bits.
- Would like to acquire Reviewed-by tags from Alex and/or Kirti for the VFIO mdev driver bits.
- Would like to acquire a Reviewed-by tag from Bjorn for the PCI common bits.
- Would like to acquire 5.11 kernel acceptance through dmaengine (Vinod) with the review tags.

v4:
dev-msi:
- Make interrupt remapping code more readable (Thomas)
- Add flush writes to unmask/write and reset ims slots (Thomas)
- Interrupt Message Storm -> Interrupt Message Store (Thomas)
- Merge in pasid programming code. (Thomas)

mdev:
- Fixed up domain assignment (Thomas)
- Define magic numbers (Thomas)
- Move siov detection code to PCI common (Thomas)
- Remove duplicated MSI entry info (Thomas)
- Convert code to use ims_slot (Thomas)
- Add explanation of pasid programming for IMS entry (Thomas)
- Add interrupt handle release support due to DSA spec 1.1 update.

v3:
Dev-msi:
- No need to add support for 2 different dev-msi irq domains, a common
  one can be used for both cases (with IR enabled/disabled)
- Add arch specific function to specify additions to msi_prepare callback
  instead of making the callback a weak function
- Call platform ops directly instead of a wrapper function
- Make mask/unmask callbacks as void functions
- dev->msi_domain should be updated at the device driver level before
  calling dev_msi_alloc_irqs()
- dev_msi_alloc/free_irqs() cannot be used for PCI devices
- Followed the generic layering scheme: infrastructure bits -> arch bits -> enabling bits

Mdev:
- Remove set kvm group notifier (Yan Zhao)
- Fix VFIO irq trigger removal (Yan Zhao)
- Add mmio read flush to ims mask (Jason)

v2:
IMS (now dev-msi):
- With recommendations from Jason/Thomas/Dan on making IMS more generic:
- Pass a non-PCI generic device (struct device) for IMS management instead of mdev
- Remove all references to mdev and symbol_get/put
- Remove all references to IMS in common code and replace with dev-msi
- Remove dynamic allocation of platform-msi interrupts: no groups, no
  new msi list or list helpers
- Create a generic dev-msi domain with and without interrupt remapping enabled.
- Introduce dev_msi_domain_alloc_irqs and dev_msi_domain_free_irqs apis

mdev: 
- Removed unrelated bits from SVA enabling that are not necessary for
  this submission. (Kevin)
- Restructured entire mdev driver series to make reviewing easier (Kevin)
- Made rw emulation more robust (Kevin)
- Removed uuid wq type and added single dedicated wq type (Kevin)
- Locking fixes for vdev (Yan Zhao)
- VFIO MSIX trigger fixes (Yan Zhao)

This code series matches the support of the 5.6 kernel (stage 1) driver, but for the guest.

The code has dependency on Thomas’s MSI restructuring patch series:
https://lore.kernel.org/lkml/20200826111628.794979401@linutronix.de/

The code has dependency on Baolu’s mdev domain patches:
https://lore.kernel.org/lkml/20201030045809.957927-1-baolu.lu@linux.intel.com/

The code has dependency on David Box’s dvsec definition patch:
https://lore.kernel.org/linux-pci/bc5f059c5bae957daebde699945c80808286bf45.camel@linux.intel.com/T/#m1d0dc12e3b2c739e2c37106a45f325bb8f001774

Stage 1 of the driver has been accepted in the v5.6 kernel. It supports dedicated workqueues (wq)
without Shared Virtual Memory (SVM) support.

Stage 2 of the driver supports shared wq and SVM. It is queued in dmaengine/next,
pending for 5.11.

The VFIO mediated device framework allows vendor drivers to wrap a portion of
device resources into virtual devices (mdev). Each mdev can be assigned
to a different guest using the same set of VFIO uAPIs as assigning a
physical device. Access to the mdev resources is served with mixed
policies. For example, vendor drivers typically mark the data-path interface
as pass-through for fast guest operations, and then trap-and-mediate the
control-path interface to avoid undesired interference between mdevs. Some
level of emulation is necessary behind the vfio mdev to compose the virtual
device interface.

This series brings mdev to the idxd driver to enable Intel Scalable IOV
(SIOV), a hardware-assisted mediated pass-through technology. SIOV makes
each DSA wq independently assignable through PASID-granular resource/DMA
isolation. It helps improve scalability and reduces mediation complexity
compared to purely software-based mdev implementations. Each assigned wq is
configured by the host and exposed to the guest in a read-only configuration
mode, which allows the guest to use the wq without additional setup. This
design greatly reduces the emulation bits, focusing them on handling commands
from guests.

There are two possible avenues to support virtual device composition:
1. VFIO mediated device (mdev) or 2. user space DMA through a char device
(or UACCE). Given the small amount of emulation needed to satisfy our needs,
and that VFIO mdev already has the infrastructure to support device
passthrough, we feel that VFIO mdev is the better route. For a more in-depth
explanation, see the documentation in Documentation/driver-api/vfio/mdev-idxd.rst.

Introducing the “1dwq-v1” mdev type. This mdev type allows
allocation of a single dedicated wq from the available dedicated wqs. After
a workqueue (wq) is enabled, the user generates a uuid. On mdev
creation, the mdev driver code finds a dwq depending on the mdev
type. When the create operation is successful, the user-generated uuid
can be passed to qemu. When the guest boots up, it should discover a
DSA device during PCI discovery.

Example usage of the “1dwq-v1” type:
1. Enable a wq with the “mdev” wq type.
2. Generate a uuid for the mdev.
3. Write the uuid to the mdev class sysfs path:
   echo $UUID > /sys/class/mdev_bus/0000\:00\:0a.0/mdev_supported_types/idxd-1dwq-v1/create
4. Pass the following parameter to qemu:
   "-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:0a.0/$UUID"
 
The wq exported through mdev will have the read-only config bit set
in its configuration. This means that the device does not require the
typical configuration; after enabling the device, the user only needs to set
the wq type and name. That is all that is necessary to enable the wq and
start using it. The single-wq configuration is not the only way to create
the mdev; multi-wq support for mdev is planned as future work.
 
The mdev utilizes Interrupt Message Store, or IMS[3], a device-specific
MSI implementation, instead of MSI-X for guest interrupts. This
preserves MSI-X for host usage and also allows a significantly larger
number of interrupt vectors for guest usage.

The idxd driver implements IMS as on-device memory-mapped unified
storage. Each interrupt message is stored as a DWORD-sized data payload
and a 64-bit address (same as MSI-X). Access to the IMS is through the
host idxd driver.
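
For reference, one IMS slot can be pictured roughly as in the sketch below.
This is illustrative only, based on the description above (64-bit address
plus DWORD data plus device-specific control); the names and exact field
layout are not the real idxd register definitions.

        /* Illustrative only: one IMS slot as described above. */
        struct ims_slot_sketch {
                u32 address_lo;   /* lower 32 bits of the message address */
                u32 address_hi;   /* upper 32 bits of the message address */
                u32 data;         /* DWORD message data payload */
                u32 ctrl;         /* device-specific control, e.g. mask and PASID */
        };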

The idxd driver makes use of the generic IMS irq chip and domain which
stores the interrupt messages as an array in device memory. Allocation and
freeing of interrupts happens via the generic msi_domain_alloc/free_irqs()
interface. One only needs to ensure the interrupt domain is stored in
the underlying device struct.
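
As a rough sketch (not the actual idxd code; ims_irq_domain and nvecs are
placeholder names, and the IMS domain itself comes from the prerequisite
series), the usage pattern looks like:

        /* Attach the IMS irq domain to the mdev's struct device and
         * allocate vectors through the generic MSI interface. */
        dev_set_msi_domain(mdev_dev, ims_irq_domain);
        rc = msi_domain_alloc_irqs(ims_irq_domain, mdev_dev, nvecs);
        if (rc)
                return rc;

        /* ... on teardown ... */
        msi_domain_free_irqs(ims_irq_domain, mdev_dev);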

[1]: https://lore.kernel.org/lkml/157965011794.73301.15960052071729101309.stgit@djiang5-desk3.ch.intel.com/
[2]: https://software.intel.com/en-us/articles/intel-sdm
[3]: https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[4]: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification
[5]: https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator
[6]: https://intel.github.io/idxd/
[7]: https://github.com/intel/idxd-driver idxd-stage2.5

---

Dave Jiang (15):
      dmaengine: idxd: add theory of operation documentation for idxd mdev
      dmaengine: idxd: add support for readonly config devices
      dmaengine: idxd: add interrupt handle request support
      PCI: add SIOV and IMS capability detection
      dmaengine: idxd: add IMS support in base driver
      dmaengine: idxd: add device support functions in prep for mdev
      dmaengine: idxd: add basic mdev registration and helper functions
      dmaengine: idxd: add emulation rw routines
      dmaengine: idxd: prep for virtual device commands
      dmaengine: idxd: virtual device commands emulation
      dmaengine: idxd: ims setup for the vdcm
      dmaengine: idxd: add mdev type as a new wq type
      dmaengine: idxd: add dedicated wq mdev type
      dmaengine: idxd: add new wq state for mdev
      dmaengine: idxd: add error notification from host driver to mediated device

Megha Dey (1):
      iommu/vt-d: Add DEV-MSI support

Thomas Gleixner (1):
      irqchip: Add IMS (Interrupt Message Store) driver


 .../ABI/stable/sysfs-driver-dma-idxd          |    6 +
 Documentation/driver-api/vfio/mdev-idxd.rst   |  404 ++++++
 MAINTAINERS                                   |    1 +
 drivers/dma/Kconfig                           |    9 +
 drivers/dma/idxd/Makefile                     |    2 +
 drivers/dma/idxd/cdev.c                       |    6 +-
 drivers/dma/idxd/device.c                     |  294 ++++-
 drivers/dma/idxd/idxd.h                       |   67 +-
 drivers/dma/idxd/init.c                       |   86 ++
 drivers/dma/idxd/irq.c                        |    6 +-
 drivers/dma/idxd/mdev.c                       | 1121 +++++++++++++++++
 drivers/dma/idxd/mdev.h                       |  116 ++
 drivers/dma/idxd/registers.h                  |   38 +-
 drivers/dma/idxd/submit.c                     |   37 +-
 drivers/dma/idxd/sysfs.c                      |   52 +-
 drivers/dma/idxd/vdev.c                       |  976 ++++++++++++++
 drivers/dma/idxd/vdev.h                       |   28 +
 drivers/iommu/intel/iommu.c                   |   31 +-
 drivers/iommu/intel/irq_remapping.c           |   34 +-
 drivers/pci/Kconfig                           |   15 +
 drivers/pci/Makefile                          |    2 +
 drivers/pci/dvsec.c                           |   40 +
 drivers/pci/siov.c                            |   50 +
 include/linux/pci-siov.h                      |   18 +
 include/linux/pci.h                           |    3 +
 include/uapi/linux/idxd.h                     |    2 +
 include/uapi/linux/pci_regs.h                 |    4 +
 kernel/irq/msi.c                              |    2 +
 28 files changed, 3352 insertions(+), 98 deletions(-)
 create mode 100644 Documentation/driver-api/vfio/mdev-idxd.rst
 create mode 100644 drivers/dma/idxd/mdev.c
 create mode 100644 drivers/dma/idxd/mdev.h
 create mode 100644 drivers/dma/idxd/vdev.c
 create mode 100644 drivers/dma/idxd/vdev.h
 create mode 100644 drivers/pci/dvsec.c
 create mode 100644 drivers/pci/siov.c
 create mode 100644 include/linux/pci-siov.h

--

Comments

Jason Gunthorpe Oct. 30, 2020, 6:58 p.m. UTC | #1
On Fri, Oct 30, 2020 at 11:50:47AM -0700, Dave Jiang wrote:
>  .../ABI/stable/sysfs-driver-dma-idxd          |    6 +
>  Documentation/driver-api/vfio/mdev-idxd.rst   |  404 ++++++
>  MAINTAINERS                                   |    1 +
>  drivers/dma/Kconfig                           |    9 +
>  drivers/dma/idxd/Makefile                     |    2 +
>  drivers/dma/idxd/cdev.c                       |    6 +-
>  drivers/dma/idxd/device.c                     |  294 ++++-
>  drivers/dma/idxd/idxd.h                       |   67 +-
>  drivers/dma/idxd/init.c                       |   86 ++
>  drivers/dma/idxd/irq.c                        |    6 +-
>  drivers/dma/idxd/mdev.c                       | 1121 +++++++++++++++++
>  drivers/dma/idxd/mdev.h                       |  116 ++

Again, a subsystem driver belongs in the directory hierarchy of the
subsystem, not in other random places. All this mdev stuff belongs
under drivers/vfio

Jason
Dave Jiang Oct. 30, 2020, 7:13 p.m. UTC | #2
On 10/30/2020 11:58 AM, Jason Gunthorpe wrote:
> On Fri, Oct 30, 2020 at 11:50:47AM -0700, Dave Jiang wrote:
>>   .../ABI/stable/sysfs-driver-dma-idxd          |    6 +
>>   Documentation/driver-api/vfio/mdev-idxd.rst   |  404 ++++++
>>   MAINTAINERS                                   |    1 +
>>   drivers/dma/Kconfig                           |    9 +
>>   drivers/dma/idxd/Makefile                     |    2 +
>>   drivers/dma/idxd/cdev.c                       |    6 +-
>>   drivers/dma/idxd/device.c                     |  294 ++++-
>>   drivers/dma/idxd/idxd.h                       |   67 +-
>>   drivers/dma/idxd/init.c                       |   86 ++
>>   drivers/dma/idxd/irq.c                        |    6 +-
>>   drivers/dma/idxd/mdev.c                       | 1121 +++++++++++++++++
>>   drivers/dma/idxd/mdev.h                       |  116 ++
> 
> Again, a subsytem driver belongs in the directory hierarchy of the
> subsystem, not in other random places. All this mdev stuff belongs
> under drivers/vfio

Alex seems to have disagreed last time....
https://lore.kernel.org/dmaengine/20200917113016.425dcde7@x1.home/

And I do agree with his perspective. The mdev is an extension of the PF driver. 
It's a bit awkward to be a stand alone mdev driver under vfio/mdev/.

> 
> Jason
>
Jason Gunthorpe Oct. 30, 2020, 7:17 p.m. UTC | #3
On Fri, Oct 30, 2020 at 12:13:48PM -0700, Dave Jiang wrote:
> 
> 
> On 10/30/2020 11:58 AM, Jason Gunthorpe wrote:
> > On Fri, Oct 30, 2020 at 11:50:47AM -0700, Dave Jiang wrote:
> > >   [diffstat snipped]
> > 
> > Again, a subsytem driver belongs in the directory hierarchy of the
> > subsystem, not in other random places. All this mdev stuff belongs
> > under drivers/vfio
> 
> Alex seems to have disagreed last time....
> https://lore.kernel.org/dmaengine/20200917113016.425dcde7@x1.home/

Nobody else in the kernel is splitting subsystems up anymore
 
> And I do agree with his perspective. The mdev is an extension of the PF
> driver. It's a bit awkward to be a stand alone mdev driver under vfio/mdev/.

By this logic we'd have gigantic drivers under drivers/ethernet
touching netdev, rdma, scsi, vdpa, etc just because that is where the
PF driver came from.

It is not how the kernel works. Subsystem owners are responsible for
their subsystem, drivers implementing their subsystem are under the
subsystem directory.

Jason
Ashok Raj Oct. 30, 2020, 7:23 p.m. UTC | #4
On Fri, Oct 30, 2020 at 04:17:06PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 30, 2020 at 12:13:48PM -0700, Dave Jiang wrote:
> > 
> > 
> > On 10/30/2020 11:58 AM, Jason Gunthorpe wrote:
> > > On Fri, Oct 30, 2020 at 11:50:47AM -0700, Dave Jiang wrote:
> > > >   [diffstat snipped]
> > > 
> > > Again, a subsytem driver belongs in the directory hierarchy of the
> > > subsystem, not in other random places. All this mdev stuff belongs
> > > under drivers/vfio
> > 
> > Alex seems to have disagreed last time....
> > https://lore.kernel.org/dmaengine/20200917113016.425dcde7@x1.home/
> 
> Nobody else in the kernel is splitting subsystems up anymore
>  
> > And I do agree with his perspective. The mdev is an extension of the PF
> > driver. It's a bit awkward to be a stand alone mdev driver under vfio/mdev/.
> 
> By this logic we'd have giagantic drivers under drivers/ethernet
> touching netdev, rdma, scsi, vdpa, etc just because that is where the
> PF driver came from.

What makes you think this is providing services like scsi/rdma/vdpa etc.?

For DSA this plays the exact same role, not a different function
as you highlight above. These mdevs are creating DSA instances for virtualization
use. They aren't providing a completely different role or subsystem per se.

Cheers,
Ashok
Jason Gunthorpe Oct. 30, 2020, 7:30 p.m. UTC | #5
On Fri, Oct 30, 2020 at 12:23:25PM -0700, Raj, Ashok wrote:
> On Fri, Oct 30, 2020 at 04:17:06PM -0300, Jason Gunthorpe wrote:
> > On Fri, Oct 30, 2020 at 12:13:48PM -0700, Dave Jiang wrote:
> > > 
> > > 
> > > On 10/30/2020 11:58 AM, Jason Gunthorpe wrote:
> > > > On Fri, Oct 30, 2020 at 11:50:47AM -0700, Dave Jiang wrote:
> > > > >   [diffstat snipped]
> > > > 
> > > > Again, a subsytem driver belongs in the directory hierarchy of the
> > > > subsystem, not in other random places. All this mdev stuff belongs
> > > > under drivers/vfio
> > > 
> > > Alex seems to have disagreed last time....
> > > https://lore.kernel.org/dmaengine/20200917113016.425dcde7@x1.home/
> > 
> > Nobody else in the kernel is splitting subsystems up anymore
> >  
> > > And I do agree with his perspective. The mdev is an extension of the PF
> > > driver. It's a bit awkward to be a stand alone mdev driver under vfio/mdev/.
> > 
> > By this logic we'd have giagantic drivers under drivers/ethernet
> > touching netdev, rdma, scsi, vdpa, etc just because that is where the
> > PF driver came from.
> 
> What makes you think this is providing services like scsi/rdma/vdpa etc.. ?
> 
> for DSA this playes the exact same role, not a different function 
> as you highlight above. these mdev's are creating DSA for virtualization
> use. They aren't providing a completely different role or subsystem per-se.

It is a different subsystem, different maintainer, and different
reviewers.

It is a development process problem, it doesn't matter what it is
doing.

Jason
Ashok Raj Oct. 30, 2020, 8:43 p.m. UTC | #6
On Fri, Oct 30, 2020 at 04:30:45PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 30, 2020 at 12:23:25PM -0700, Raj, Ashok wrote:
> > On Fri, Oct 30, 2020 at 04:17:06PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 30, 2020 at 12:13:48PM -0700, Dave Jiang wrote:
> > > > 
> > > > 
> > > > On 10/30/2020 11:58 AM, Jason Gunthorpe wrote:
> > > > > On Fri, Oct 30, 2020 at 11:50:47AM -0700, Dave Jiang wrote:
> > > > > >   [diffstat snipped]
> > > > > 
> > > > > Again, a subsytem driver belongs in the directory hierarchy of the
> > > > > subsystem, not in other random places. All this mdev stuff belongs
> > > > > under drivers/vfio
> > > > 
> > > > Alex seems to have disagreed last time....
> > > > https://lore.kernel.org/dmaengine/20200917113016.425dcde7@x1.home/
> > > 
> > > Nobody else in the kernel is splitting subsystems up anymore
> > >  
> > > > And I do agree with his perspective. The mdev is an extension of the PF
> > > > driver. It's a bit awkward to be a stand alone mdev driver under vfio/mdev/.
> > > 
> > > By this logic we'd have giagantic drivers under drivers/ethernet
> > > touching netdev, rdma, scsi, vdpa, etc just because that is where the
> > > PF driver came from.
> > 
> > What makes you think this is providing services like scsi/rdma/vdpa etc.. ?
> > 
> > for DSA this playes the exact same role, not a different function 
> > as you highlight above. these mdev's are creating DSA for virtualization
> > use. They aren't providing a completely different role or subsystem per-se.
> 
> It is a different subsystem, different maintainer, and different
> reviewers.
> 
> It is a development process problem, it doesn't matter what it is
> doing.

So drawing that parallel, do you expect all drivers that call
pci_register_driver() to be located in drivers/pci? Aren't they scattered
all over the place: ata, scsi, platform drivers and such?

As Alex pointed out, i915 and a handful of s390 drivers that are mdev users
are not in drivers/vfio. Are you saying those drivers don't get reviewed?

This is no different than a PF driver offering VF services. It's a logical
extension.

Reviews happen for mdev users today. What you suggest seems like cutting
the feet to fit the shoe. Unless the maintainers are asking for things
to be split just because they call mdev_register_device(), that practice
doesn't exist, and it would be totally weird to move all callers of
pci_register_driver().

Your argument seems interesting, even entertaining :-). But honestly I'm not finding it
practical :-). So does every caller of mmu_notifier_register() need to be in
mm?

What you mention for different functions makes absolute sense; not arguing
against that. But this ain't that.

And we just follow the asks of the maintainers.

I know you aren't going to give up, but there is little we can do. I want
the maintainers to make that call and I'm not adding more noise to this.

Cheers,
Ashok
Thomas Gleixner Oct. 30, 2020, 8:48 p.m. UTC | #7
On Fri, Oct 30 2020 at 11:50, Dave Jiang wrote:
> The code has dependency on Thomas’s MSI restructuring patch series:
> https://lore.kernel.org/lkml/20200826111628.794979401@linutronix.de/

which is outdated and no longer applicable.

Thanks,

        tglx
Dave Jiang Oct. 30, 2020, 8:59 p.m. UTC | #8
On 10/30/2020 1:48 PM, Thomas Gleixner wrote:
> On Fri, Oct 30 2020 at 11:50, Dave Jiang wrote:
>> The code has dependency on Thomas’s MSI restructuring patch series:
>> https://lore.kernel.org/lkml/20200826111628.794979401@linutronix.de/
> 
> which is outdated and not longer applicable.

Yes.... I wasn't sure how to point to these patches from you as a dependency.

irqdomain/msi: Provide msi_alloc/free_store() callbacks
platform-msi: Add device MSI infrastructure
genirq/msi: Provide and use msi_domain_set_default_info_flags()
genirq/proc: Take buslock on affinity write
platform-msi: Provide default irq_chip:: Ack
x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI
x86/irq: Add DEV_MSI allocation type

Do I need to include these patches in my series? Thanks!

> 
> Thanks,
> 
>          tglx
>
Thomas Gleixner Oct. 30, 2020, 10:10 p.m. UTC | #9
On Fri, Oct 30 2020 at 13:59, Dave Jiang wrote:
> On 10/30/2020 1:48 PM, Thomas Gleixner wrote:
>> On Fri, Oct 30 2020 at 11:50, Dave Jiang wrote:
>>> The code has dependency on Thomas’s MSI restructuring patch series:
>>> https://lore.kernel.org/lkml/20200826111628.794979401@linutronix.de/
>> 
>> which is outdated and not longer applicable.
>
> Yes.... I wasn't sure how to point to these patches from you as a dependency.
>
> irqdomain/msi: Provide msi_alloc/free_store() callbacks
> platform-msi: Add device MSI infrastructure
> genirq/msi: Provide and use msi_domain_set_default_info_flags()
> genirq/proc: Take buslock on affinity write
> platform-msi: Provide default irq_chip:: Ack
> x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI
> x86/irq: Add DEV_MSI allocation type

How can you point at something which is no longer applicable?

> Do I need to include these patches in my series? Thanks!

No. They are NOT part of this series. Prerequisites are separate
entities and your series can be based on them.

So for one you want to make sure that the prerequisites for your IDXD
stuff are going to be merged into the relevant maintainer trees.

To allow people working with your stuff you simply provide an
aggregation git tree which contains all the collected prerequisites.
This aggregation tree needs to be rebased when the prerequisites change
during review or are merged into a maintainer tree/branch.

It's not rocket science and a lot of people do exactly this all the time
in order to coordinate changes which have dependencies over multiple
subsystems.

Thanks,

        tglx
Jason Gunthorpe Oct. 30, 2020, 10:54 p.m. UTC | #10
On Fri, Oct 30, 2020 at 01:43:07PM -0700, Raj, Ashok wrote:
 
> So drawing that parallel, do you expect all drivers that call
> pci_register_driver() to be located in drivers/pci? Aren't they scattered
> all over the place ata,scsi, platform drivers and such?

The subsystem is the thing that calls
device_register. pci_register_driver() doesn't do that.

> As Alex pointed out, i915 and handful of s390 drivers that are mdev users
> are not in drivers/vfio. Are you sayint those drivers don't get reviewed? 

Past mistakes do not justify continuing to do it wrong.

ARM and PPC went through a huge multi-year cleanup moving code out of
arch and into the proper drivers/ directories. We know this is the
correct way to work the development process.

> Your argument seems interesting even entertaining :-). But honestly i'm not finding it
> practical :-). So every caller of mmu_register_notifier() needs to be in
> mm? 

mmu notifiers are not a subsystem, they are core library code.

You seem to completely not understand what a subsystem is. :(

> I know you aren't going to give up, but there is little we can do. I want
> the maintainers to make that call and I'm not add more noise to this.

Well, hopefully Vinod will insist on following kernel norms here.

Jason
Thomas Gleixner Oct. 31, 2020, 2:50 a.m. UTC | #11
Ashok,

On Fri, Oct 30 2020 at 13:43, Ashok Raj wrote:
> On Fri, Oct 30, 2020 at 04:30:45PM -0300, Jason Gunthorpe wrote:
>> On Fri, Oct 30, 2020 at 12:23:25PM -0700, Raj, Ashok wrote:
>> It is a different subsystem, different maintainer, and different
>> reviewers.
>> 
>> It is a development process problem, it doesn't matter what it is
>> doing.

< skip a lot of non-sensical arguments>

> I know you aren't going to give up, but there is little we can do. I want
> the maintainers to make that call and I'm not add more noise to this.

Jason is absolutely right.

Just because there is historical precedent which does not care about
the differentiation of subsystems is not an argument at all to make the
same mistakes which were made years ago.

IDXD is just infrastructure which provides the base for a variety of
different functionalities. Very similar to what multi-function devices
provide. In fact IDXD is pretty much an MFD facility.

Sticking all of it into dmaengine is sloppy at best. The dma engine
related part of IDXD is only a part of the overall functionality.

I'm well aware that it is convenient to just throw everything into
drivers/myturf/ but that neither makes it reviewable nor
maintainable.

What's the problem with restructuring your code in a way which makes it
fit into existing subsystems?

The whole thing - as I pointed out to Dave earlier - is based on 'works
for me' wishful thinking with a blissful ignorance of the development
process and the requirement to split a large problem into the proper
bits and pieces aka. engineering 101.

Thanks,

        tglx
Ashok Raj Oct. 31, 2020, 11:53 p.m. UTC | #12
Hi Thomas,

On Sat, Oct 31, 2020 at 03:50:43AM +0100, Thomas Gleixner wrote:
> Ashok,
> 
> < skip a lot of non-sensical arguments>

Ouch!.. Didn't mean to awaken you like this :-).. apologies.. profusely! 

> 
> Just because there is historical precendence which does not care about
> the differentiation of subsystems is not an argument at all to make the
> same mistakes which have been made years ago.
> 
> IDXD is just infrastructure which provides the base for a variety of
> different functionalities. Very similar to what multi function devices
> provide. In fact IDXD is pretty much a MFD facility.

I'm only asking this to better understand the thought process.
I don't intend to be defensive; I have my hands tied back, so we will do
whatever you say fits best per your recommendation.

Not my intent to dig a deeper hole than I have already dug! :-(

IDXD is just a glorified DMA engine, a data mover. It also does a few other
things, so in that sense it's a multi-function facility. But it doesn't provide
different functional pieces the way a PCIe multi-function device does, i.e.
it doesn't also do storage and networking.

> 
> Sticking all of it into dmaengine is sloppy at best. The dma engine
> related part of IDXD is only a part of the overall functionality.

dmaengine is the basic non-transformational data mover. Doing other operations
or transformations is just the glorified data-mover part, but it is
fundamentally not different.

> 
> I'm well aware that it is conveniant to just throw everything into
> drivers/myturf/ but that does neither make it reviewable nor
> maintainable.

That's true when we add a lot of functionality in one place. IDXD doing
mdev support is not offering new functionality. SR-IOV PF drivers that support
PF/VF mailboxes are part of the PF drivers today. IDXD mdev is playing precisely
that role.

If we are doing this just to improve review effectiveness, now we would need
some parent driver, and these sub-drivers registering seemed like a bit of
over-engineering when these sub-drivers actually are an extension of the
base driver and offer nothing more than extending sub-device partitions
of IDXD for guest drivers. These look and feel like IDXD, not another device
interface. In that sense, if we move the PF/VF mailboxes into
separate drivers, I thought it feels a bit odd.

Please don't take it the wrong way. 

Cheers,
Ashok
Jason Gunthorpe Nov. 2, 2020, 1:20 p.m. UTC | #13
On Sat, Oct 31, 2020 at 04:53:59PM -0700, Raj, Ashok wrote:

> If we are doing this just to improve review effectiveness, Now we would need
> some parent driver, and these sub-drivers registering seemed like a bit of
> over-engineering when these sub-drivers actually are an extension of the
> base driver and offer nothing more than extending sub-device partitions
> of IDXD for guest drivers. These look and feel like IDXD, not another device 
> interface. In that sense if we move PF/VF mailboxes as
> separate drivers i thought it feels a bit odd.

You need this split anyhow, putting VFIO calls into the main idxd
module is not OK.

Plugging in a PCI device should not auto-load VFIO modules.

Jason
Ashok Raj Nov. 2, 2020, 4:20 p.m. UTC | #14
Hi Jason

On Mon, Nov 02, 2020 at 09:20:36AM -0400, Jason Gunthorpe wrote:

> > of IDXD for guest drivers. These look and feel like IDXD, not another device 
> > interface. In that sense if we move PF/VF mailboxes as
> > separate drivers i thought it feels a bit odd.
> 
> You need this split anyhow, putting VFIO calls into the main idxd
> module is not OK.
> 
> Plugging in a PCI device should not auto-load VFIO modules.

Yes, I agree that would be a good reason to separate them completely and
glue functionality with private APIs between the 2 modules.

- Separate mdev code from base idxd.
- Separate maintainers, so it's easy to review and include. (But remember
  they are heavily interdependent. They have to move together.)

Almost all SRIOV drivers today are just configured with some form of Kconfig
and those relevant files are compiled into the same module.

I think in *most* applications idxd would be operating in that mode, where
you have the base driver and mdev parts (like VF) compiled in if configured
such.

Creating these private interfaces for intra-module are just 1-1 and not
general purpose and every accelerator needs to create these instances.

I wasn't sure forcibly creating this firewall between the PF/VF interfaces
is actually worth the work every driver is going to require. I can see
where this is required when they offer separate functional interfaces,
when we talk about multi-function in the more confined definition used today.

idxd mdevs are purely a VF extension. They don't provide any different
function, unlike e.g. an RDMA device that can provide iWARP, ipoib or
even multiplex storage over IB. IDXD is a fixed-function interface.

Sure, having separate modules helps with that isolation. But I'm not
convinced whether this simplifies or complicates things more than what is
required for these device types.

Cheers,
Ashok
Jason Gunthorpe Nov. 2, 2020, 5:19 p.m. UTC | #15
On Mon, Nov 02, 2020 at 08:20:43AM -0800, Raj, Ashok wrote:
> Creating these private interfaces for intra-module are just 1-1 and not
> general purpose and every accelerator needs to create these instances.

This is where we are going, auxiliary bus should be merged soon which
is specifically to connect these kinds of devices across subsystems

Jason
Dave Jiang Nov. 2, 2020, 6:18 p.m. UTC | #16
On 11/2/2020 10:19 AM, Jason Gunthorpe wrote:
> On Mon, Nov 02, 2020 at 08:20:43AM -0800, Raj, Ashok wrote:
>> Creating these private interfaces for intra-module are just 1-1 and not
>> general purpose and every accelerator needs to create these instances.
> 
> This is where we are going, auxillary bus should be merged soon which
> is specifically to connect these kinds of devices across subsystems

I think this resolves the aux device probe/remove issue via a common bus. But it
does not help with the mdev device needing a lot of the device handling calls
from the parent driver, as it shares the same handling as the parent device. My
plan is to export all the needed calls via EXPORT_SYMBOL_NS() so the calls can be
shared in their own namespace between the modules. Do you have any objection to
that?
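
For illustration, the pattern would look roughly like the sketch below (the
helper and namespace names are made up, not the actual idxd symbols):

        /* idxd base module: export a shared helper in a dedicated namespace. */
        int idxd_mdev_host_op(struct idxd_device *idxd)
        {
                /* device handling shared between host and mdev paths */
                return 0;
        }
        EXPORT_SYMBOL_NS_GPL(idxd_mdev_host_op, IDXD);

        /* idxd mdev module: opt in to the namespace, then call it directly. */
        MODULE_IMPORT_NS(IDXD);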

> 
> Jason
>
Jason Gunthorpe Nov. 2, 2020, 6:26 p.m. UTC | #17
On Mon, Nov 02, 2020 at 11:18:33AM -0700, Dave Jiang wrote:
> 
> 
> On 11/2/2020 10:19 AM, Jason Gunthorpe wrote:
> > On Mon, Nov 02, 2020 at 08:20:43AM -0800, Raj, Ashok wrote:
> > > Creating these private interfaces for intra-module are just 1-1 and not
> > > general purpose and every accelerator needs to create these instances.
> > 
> > This is where we are going, auxillary bus should be merged soon which
> > is specifically to connect these kinds of devices across subsystems
> 
> I think this resolves the aux device probe/remove issue via a common bus.
> But it does not help with the mdev device needing a lot of the device
> handling calls from the parent driver as it share the same handling as the
> parent device.

The intention of auxiliary bus is that the two parts will tightly
couple across some exported function interface.

> My plan is to export all the needed call via EXPORT_SYMBOL_NS() so
> the calls can be shared in its own namespace between the modules. Do
> you have any objection with that?

I think you will be the first to use the namespace stuff for this, it
seems like a good idea and others should probably do so as well.

Jason
Dan Williams Nov. 2, 2020, 6:38 p.m. UTC | #18
On Mon, Nov 2, 2020 at 10:26 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Mon, Nov 02, 2020 at 11:18:33AM -0700, Dave Jiang wrote:
> >
> >
> > On 11/2/2020 10:19 AM, Jason Gunthorpe wrote:
> > > On Mon, Nov 02, 2020 at 08:20:43AM -0800, Raj, Ashok wrote:
> > > > Creating these private interfaces for intra-module are just 1-1 and not
> > > > general purpose and every accelerator needs to create these instances.
> > >
> > > This is where we are going, auxillary bus should be merged soon which
> > > is specifically to connect these kinds of devices across subsystems
> >
> > I think this resolves the aux device probe/remove issue via a common bus.
> > But it does not help with the mdev device needing a lot of the device
> > handling calls from the parent driver as it share the same handling as the
> > parent device.
>
> The intention of auxiliary bus is that the two parts will tightly
> couple across some exported function interface.
>
> > My plan is to export all the needed call via EXPORT_SYMBOL_NS() so
> > the calls can be shared in its own namespace between the modules. Do
> > you have any objection with that?
>
> I think you will be the first to use the namespace stuff for this, it
> seems like a good idea and others should probably do so as well.

I was thinking either EXPORT_SYMBOL_NS, or auxiliary bus, because you
should be able to export an ops structure with all the necessary
callbacks. Aux bus seems cleaner because the lifetime rules and
ownership concerns are clearer.
Jason Gunthorpe Nov. 2, 2020, 6:51 p.m. UTC | #19
On Mon, Nov 02, 2020 at 10:38:28AM -0800, Dan Williams wrote:

> > I think you will be the first to use the namespace stuff for this, it
> > seems like a good idea and others should probably do so as well.
> 
> I was thinking either EXPORT_SYMBOL_NS, or auxiliary bus, because you
> should be able to export an ops structure with all the necessary
> callbacks. 

'or'? 

Auxiliary bus should not be used with huge arrays of function
pointers... The module providing the device should export a normal
linkable function interface. Putting that in a namespace makes a lot
of sense.

Jason
Dan Williams Nov. 2, 2020, 7:26 p.m. UTC | #20
On Mon, Nov 2, 2020 at 10:52 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Mon, Nov 02, 2020 at 10:38:28AM -0800, Dan Williams wrote:
>
> > > I think you will be the first to use the namespace stuff for this, it
> > > seems like a good idea and others should probably do so as well.
> >
> > I was thinking either EXPORT_SYMBOL_NS, or auxiliary bus, because you
> > should be able to export an ops structure with all the necessary
> > callbacks.
>
> 'or'?
>
> Auxiliary bus should not be used with huge arrays of function
> pointers... The module providing the device should export a normal
> linkable function interface. Putting that in a namespace makes a lot
> of sense.

True, probably needs to be a mixture of both.