diff mbox series

[08/20] genirq/msi: Make MSI descriptor iterators device domain aware

Message ID 20221111132706.500733944@linutronix.de
State New
Headers show
Series genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 2 API rework | expand

Commit Message

Thomas Gleixner Nov. 11, 2022, 1:56 p.m. UTC
To support multiple MSI interrupt domains per device it is necessary to
segment the xarray MSI descriptor storage. Each domain gets up to
MSI_MAX_INDEX entries.

Change the iterators so they operate with domain ids and take the domain
offsets into account.

The publicly available iterators which are mostly used in legacy
implementations and the PCI/MSI core default to MSI_DEFAULT_DOMAIN (0)
which is the id for the existing "global" domains.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/msi.h |   45 +++++++++++++++++++++++++++++++++++++++++----
 kernel/irq/msi.c    |   43 +++++++++++++++++++++++++++++++++++--------
 2 files changed, 76 insertions(+), 12 deletions(-)

Comments

Jason Gunthorpe Nov. 16, 2022, 6:36 p.m. UTC | #1
On Fri, Nov 11, 2022 at 02:56:50PM +0100, Thomas Gleixner wrote:
> To support multiple MSI interrupt domains per device it is necessary to
> segment the xarray MSI descriptor storage. Each domain gets up to
> MSI_MAX_INDEX entries.

This kinds of suggests that the new per-device MSI domains should hold
this storage instead of per-device xarray?

I suppose the reason to avoid this is because alot of the driver
facing API is now built on vector index numbers that index this
xarray?

But on the other hand can we just say drivers using multiple domains
are "new" and they should use some new style pointer based interface
so we don't have to have arrays of things?

At least, I'd like to understand a bit better the motivation for using
a domain ID instead of a pointer.

It feels like we are baking in several hard coded limits with this
choice

> +static int msi_get_domain_base_index(struct device *dev, unsigned int domid)
> +{
> +	lockdep_assert_held(&dev->msi.data->mutex);
> +
> +	if (WARN_ON_ONCE(domid >= MSI_MAX_DEVICE_IRQDOMAINS))
> +		return -ENODEV;
> +
> +	if (WARN_ON_ONCE(!dev->msi.data->__irqdomains[domid]))
> +		return -ENODEV;
> +
> +	return domid * MSI_XA_DOMAIN_SIZE;
> +}
> +
> +
>  /**

Extra new line

Jason
Thomas Gleixner Nov. 16, 2022, 10:32 p.m. UTC | #2
On Wed, Nov 16 2022 at 14:36, Jason Gunthorpe wrote:
> On Fri, Nov 11, 2022 at 02:56:50PM +0100, Thomas Gleixner wrote:
>> To support multiple MSI interrupt domains per device it is necessary to
>> segment the xarray MSI descriptor storage. Each domain gets up to
>> MSI_MAX_INDEX entries.
>
> This kinds of suggests that the new per-device MSI domains should hold
> this storage instead of per-device xarray?

No, really not. This would create random storage in random driver places
instead of having a central storage place which is managed by the core
code. We've had that back in the days when every architecture had it's
own magic place to store and manage interrupt descriptors. Seen that,
mopped it up and never want to go back.

> I suppose the reason to avoid this is because alot of the driver
> facing API is now built on vector index numbers that index this
> xarray?

That's one aspect, but as I demonstrate later even for the IMS domains
which do not have a real requirement for 'index' you still need to have
a place to store the MSI descriptor and allocate storage space for it.

I really don't want to have random places doing that because then I
can't provide implicit MSI descriptor management, e.g. automatic
alloc/free anymore and everything has to happen at the driver side. The
only reason why I still need to do that for PCI/MSI is to be able to
support the museum architectures which still depend on the arch_....()
interfaces from 20 years ago.

So if a IMS domain, which e.g. stores the MSI message in queue memory,
wants a new interrupt then it allocates it with MSI_ANY_INDEX, which
gives it the next free slot in the XARRAY section of the MSI domain.

This avoids having IDA, bitmap allocators or whatever at the driver side
and having a virtual index number to track things does not affect the
flexibility of the driver side in any way.

All the driver needs at the very end is the interrupt number and the
message itself.

> But on the other hand can we just say drivers using multiple domains
> are "new" and they should use some new style pointer based interface
> so we don't have to have arrays of things?

Then driver writers have to provide storage for the domain pointer and
care about teardown etc. Seriously? NO!

> At least, I'd like to understand a bit better the motivation for using
> a domain ID instead of a pointer.

The main motivation was to avoid device specific storage for the irq
domain pointers. It would have started with PCI/MSI[X]: I'd had to add a
irqdomain pointer to struct pci_dev and then have the PCI core care
about it. So we'd add that to everything and the world which utilizes
per device MSI domains which is quite a few places outside of PCI in the
ARM64 world and growing.

The msi_device_data struct which is allocated on demand for MSI usage is
the obvious point to store _and_ manage these things, i.e. managed
teardown etc.

Giving this up makes any change to the core code hard because you have
to chase all usage sites and mop them up. Just look at the ARM part of
this series which is by now 40+ patches just to mop up the irqchip
core. There are still 25 PCI/MSI global irqdomain left.

> It feels like we are baking in several hard coded limits with this
> choice

Which ones?

The chosen array section size per domain is arbitrary and can be changed
at any given time. Though you have to exhaust 64k vectors per domain
first before we start debating that.

The number of irqdomains is not really hard limited either. It's trivial
enough to extend that number and once we hit 32 we just can stash them
away in the xarray. I pondered to do that right away, but that wastes
too much memory for now.

It really does not matter whether the domain creation results in a
number or in a pointer. Pointers are required for the inner workings of
the domain hierarchy but absolutely uninteresting for endpoint domains.

All you need there is a conveniant way to create the domain and then
allocate/free interrupts as you see fit.

We agreed a year ago that we want to abstract most of these things away
for driver writers and that all they need is simple way to create the
domains and the corresponding interrupt chip is mostly about writing the
MSI message to implementation defined storage and eventually providing a
implementation specific mask/unmask operation.

So what are you concerned about?

Thanks,

        tglx
Jason Gunthorpe Nov. 17, 2022, 12:37 a.m. UTC | #3
On Wed, Nov 16, 2022 at 11:32:15PM +0100, Thomas Gleixner wrote:
> On Wed, Nov 16 2022 at 14:36, Jason Gunthorpe wrote:
> > On Fri, Nov 11, 2022 at 02:56:50PM +0100, Thomas Gleixner wrote:
> >> To support multiple MSI interrupt domains per device it is necessary to
> >> segment the xarray MSI descriptor storage. Each domain gets up to
> >> MSI_MAX_INDEX entries.
> >
> > This kinds of suggests that the new per-device MSI domains should hold
> > this storage instead of per-device xarray?
> 
> No, really not. This would create random storage in random driver places
> instead of having a central storage place which is managed by the core
> code. We've had that back in the days when every architecture had it's
> own magic place to store and manage interrupt descriptors. Seen that,
> mopped it up and never want to go back.

I don't mean shift it into the msi_domain driver logic, I just mean
stick an xarray in the struct msi_domain that the core code, and only
the core code, manages.

But I suppose, on reflection, the strong reason not to do this is that
the msi_descriptor array is per-device, and while it would work OK
with per-device msi_domains we still have the legacy of global msi
domains and thus still need a per-device place to store the global msi
domain's per-device descriptors.

> > At least, I'd like to understand a bit better the motivation for using
> > a domain ID instead of a pointer.
> 
> The main motivation was to avoid device specific storage for the irq
> domain pointers. It would have started with PCI/MSI[X]: I'd had to add a
> irqdomain pointer to struct pci_dev and then have the PCI core care
> about it. So we'd add that to everything and the world which utilizes
> per device MSI domains which is quite a few places outside of PCI in the
> ARM64 world and growing.

I was thinking more that the "default" domain (eg the domain ID 0 as
this series has it) would remain as a domain pointer in the device
data as it is here, but any secondary domains would be handled with a
pointer that the driver owns.

You could have as many secondary domains as is required this way. Few
drivers would ever use a secondary domain, so it not really a big deal
for them to hold the pointer lifetime.

> So what are you concerned about?

Mostly API clarity, I find it very un-kernly to swap a clear pointer
for an ID #. We loose typing, the APIs become less clear and we now
have to worry about ID allocation policy if we ever need more than 2.

Thanks,
Jason
Thomas Gleixner Nov. 17, 2022, 8:45 a.m. UTC | #4
On Wed, Nov 16 2022 at 20:37, Jason Gunthorpe wrote:
> On Wed, Nov 16, 2022 at 11:32:15PM +0100, Thomas Gleixner wrote:
>> On Wed, Nov 16 2022 at 14:36, Jason Gunthorpe wrote:
>> > On Fri, Nov 11, 2022 at 02:56:50PM +0100, Thomas Gleixner wrote:
>> >> To support multiple MSI interrupt domains per device it is necessary to
>> >> segment the xarray MSI descriptor storage. Each domain gets up to
>> >> MSI_MAX_INDEX entries.
>> >
>> > This kinds of suggests that the new per-device MSI domains should hold
>> > this storage instead of per-device xarray?
>> 
>> No, really not. This would create random storage in random driver places
>> instead of having a central storage place which is managed by the core
>> code. We've had that back in the days when every architecture had it's
>> own magic place to store and manage interrupt descriptors. Seen that,
>> mopped it up and never want to go back.
>
> I don't mean shift it into the msi_domain driver logic, I just mean
> stick an xarray in the struct msi_domain that the core code, and only
> the core code, manages.
>
> But I suppose, on reflection, the strong reason not to do this is that
> the msi_descriptor array is per-device, and while it would work OK
> with per-device msi_domains we still have the legacy of global msi
> domains and thus still need a per-device place to store the global msi
> domain's per-device descriptors.

I tried several approaches but all of them ended up having slightly
different code pathes and decided to keep everything the same from
legacy arch over global MSI and the modern per device MSI models.

Due to that some of the constructs are slightly awkward, but the
important outcome for me was that I ended up with as many shared code
pathes as possible. Having separate code pathes for all variants is for
one causing code bloat and what's worse it's a guarantee for divergance
and maintenance nightmares. As this is setup/teardown management code
and not the fancy hotpath where we really want to spare cycles, I went
for the unified model.

> You could have as many secondary domains as is required this way. Few
> drivers would ever use a secondary domain, so it not really a big deal
> for them to hold the pointer lifetime.
>
>> So what are you concerned about?
>
> Mostly API clarity, I find it very un-kernly to swap a clear pointer
> for an ID #. We loose typing, the APIs become less clear and we now
> have to worry about ID allocation policy if we ever need more than 2.

I don't see an issue with that.

  id = msi_create_device_domain(dev, &template, ...);
  
is not much different from:

  ptr = msi_create_device_domain(dev, &template, ...);

But it makes a massive difference vs. encapsulation and pointer leakage.

If you have a stale ID then you can't do harm, a stale pointer very much
so.

Aside of that once pointers are available people insist on fiddling in
the guts. As I'm mopping up behind driver writers for the last twenty
years now, my confidence in them is pretty close to zero.

So I rather be defensive and work towards encapsulation where ever its
possible. Interrupts are a source of hard to debug subtle bugs, so
taking the tinkerers the tools away to cause them is a good thing IMO.

Thanks,

        tglx
Tian, Kevin Nov. 18, 2022, 7:57 a.m. UTC | #5
> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Friday, November 11, 2022 9:57 PM
> 
> +/* Invalid XA index which is outside of any searchable range */
> +#define MSI_XA_MAX_INDEX	(ULONG_MAX - 1)
> +#define MSI_XA_DOMAIN_SIZE	(MSI_MAX_INDEX + 1)
> +

Out of curiosity. Other places treat MSI_MAX_INDEX - 1 as the upper
bound of a valid range. This size definition here implies that the last ID
is wasted for every domain. Is it intended?
Thomas Gleixner Nov. 18, 2022, 12:17 p.m. UTC | #6
On Fri, Nov 18 2022 at 07:57, Kevin Tian wrote:
>> From: Thomas Gleixner <tglx@linutronix.de>
>> Sent: Friday, November 11, 2022 9:57 PM
>> 
>> +/* Invalid XA index which is outside of any searchable range */
>> +#define MSI_XA_MAX_INDEX	(ULONG_MAX - 1)
>> +#define MSI_XA_DOMAIN_SIZE	(MSI_MAX_INDEX + 1)
>> +
>
> Out of curiosity. Other places treat MSI_MAX_INDEX - 1 as the upper
> bound of a valid range. This size definition here implies that the last ID
> is wasted for every domain. Is it intended?

Bah. MSI_MAX_INDEX is inclusive so the size must be + 1. I obviously
missed that in the other places which use it as upper bound.

Not that it matters, but yes. Let me fix that.
diff mbox series

Patch

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -177,6 +177,7 @@  enum msi_desc_filter {
  * @mutex:		Mutex protecting the MSI descriptor store
  * @__store:		Xarray for storing MSI descriptor pointers
  * @__iter_idx:		Index to search the next entry for iterators
+ * @__iter_max:		Index to limit the search
  * @__irqdomains:	Per device interrupt domains
  */
 struct msi_device_data {
@@ -185,6 +186,7 @@  struct msi_device_data {
 	struct mutex			mutex;
 	struct xarray			__store;
 	unsigned long			__iter_idx;
+	unsigned long			__iter_max;
 	struct irq_domain		*__irqdomains[MSI_MAX_DEVICE_IRQDOMAINS];
 };
 
@@ -193,14 +195,34 @@  int msi_setup_device_data(struct device
 void msi_lock_descs(struct device *dev);
 void msi_unlock_descs(struct device *dev);
 
-struct msi_desc *msi_first_desc(struct device *dev, enum msi_desc_filter filter);
+struct msi_desc *msi_domain_first_desc(struct device *dev, unsigned int domid,
+				       enum msi_desc_filter filter);
+
+/**
+ * msi_first_desc - Get the first MSI descriptor of the default irqdomain
+ * @dev:	Device to operate on
+ * @filter:	Descriptor state filter
+ *
+ * Must be called with the MSI descriptor mutex held, i.e. msi_lock_descs()
+ * must be invoked before the call.
+ *
+ * Return: Pointer to the first MSI descriptor matching the search
+ *	   criteria, NULL if none found.
+ */
+static inline struct msi_desc *msi_first_desc(struct device *dev,
+					      enum msi_desc_filter filter)
+{
+	return msi_domain_first_desc(dev, MSI_DEFAULT_DOMAIN, filter);
+}
+
 struct msi_desc *msi_next_desc(struct device *dev, enum msi_desc_filter filter);
 
 /**
- * msi_for_each_desc - Iterate the MSI descriptors
+ * msi_domain_for_each_desc - Iterate the MSI descriptors in a specific domain
  *
  * @desc:	struct msi_desc pointer used as iterator
  * @dev:	struct device pointer - device to iterate
+ * @domid:	The id of the interrupt domain which should be walked.
  * @filter:	Filter for descriptor selection
  *
  * Notes:
@@ -208,10 +230,25 @@  struct msi_desc *msi_next_desc(struct de
  *    pair.
  *  - It is safe to remove a retrieved MSI descriptor in the loop.
  */
-#define msi_for_each_desc(desc, dev, filter)			\
-	for ((desc) = msi_first_desc((dev), (filter)); (desc);	\
+#define msi_domain_for_each_desc(desc, dev, domid, filter)			\
+	for ((desc) = msi_domain_first_desc((dev), (domid), (filter)); (desc);	\
 	     (desc) = msi_next_desc((dev), (filter)))
 
+/**
+ * msi_for_each_desc - Iterate the MSI descriptors in the default irqdomain
+ *
+ * @desc:	struct msi_desc pointer used as iterator
+ * @dev:	struct device pointer - device to iterate
+ * @filter:	Filter for descriptor selection
+ *
+ * Notes:
+ *  - The loop must be protected with a msi_lock_descs()/msi_unlock_descs()
+ *    pair.
+ *  - It is safe to remove a retrieved MSI descriptor in the loop.
+ */
+#define msi_for_each_desc(desc, dev, filter)					\
+	msi_domain_for_each_desc((desc), (dev), MSI_DEFAULT_DOMAIN, (filter))
+
 #define msi_desc_to_dev(desc)		((desc)->dev)
 
 #ifdef CONFIG_IRQ_MSI_IOMMU
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -21,6 +21,10 @@ 
 
 static inline int msi_sysfs_create_group(struct device *dev);
 
+/* Invalid XA index which is outside of any searchable range */
+#define MSI_XA_MAX_INDEX	(ULONG_MAX - 1)
+#define MSI_XA_DOMAIN_SIZE	(MSI_MAX_INDEX + 1)
+
 static inline void msi_setup_default_irqdomain(struct device *dev, struct msi_device_data *md)
 {
 	if (!dev->msi.domain)
@@ -33,6 +37,20 @@  static inline void msi_setup_default_irq
 		md->__irqdomains[MSI_DEFAULT_DOMAIN] = dev->msi.domain;
 }
 
+static int msi_get_domain_base_index(struct device *dev, unsigned int domid)
+{
+	lockdep_assert_held(&dev->msi.data->mutex);
+
+	if (WARN_ON_ONCE(domid >= MSI_MAX_DEVICE_IRQDOMAINS))
+		return -ENODEV;
+
+	if (WARN_ON_ONCE(!dev->msi.data->__irqdomains[domid]))
+		return -ENODEV;
+
+	return domid * MSI_XA_DOMAIN_SIZE;
+}
+
+
 /**
  * msi_alloc_desc - Allocate an initialized msi_desc
  * @dev:	Pointer to the device for which this is allocated
@@ -229,6 +247,7 @@  int msi_setup_device_data(struct device
 
 	xa_init(&md->__store);
 	mutex_init(&md->mutex);
+	md->__iter_idx = MSI_XA_MAX_INDEX;
 	dev->msi.data = md;
 	devres_add(dev, md);
 	return 0;
@@ -251,7 +270,7 @@  EXPORT_SYMBOL_GPL(msi_lock_descs);
 void msi_unlock_descs(struct device *dev)
 {
 	/* Invalidate the index wich was cached by the iterator */
-	dev->msi.data->__iter_idx = MSI_MAX_INDEX;
+	dev->msi.data->__iter_idx = MSI_XA_MAX_INDEX;
 	mutex_unlock(&dev->msi.data->mutex);
 }
 EXPORT_SYMBOL_GPL(msi_unlock_descs);
@@ -260,17 +279,18 @@  static struct msi_desc *msi_find_desc(st
 {
 	struct msi_desc *desc;
 
-	xa_for_each_start(&md->__store, md->__iter_idx, desc, md->__iter_idx) {
+	xa_for_each_range(&md->__store, md->__iter_idx, desc, md->__iter_idx, md->__iter_max) {
 		if (msi_desc_match(desc, filter))
 			return desc;
 	}
-	md->__iter_idx = MSI_MAX_INDEX;
+	md->__iter_idx = MSI_XA_MAX_INDEX;
 	return NULL;
 }
 
 /**
- * msi_first_desc - Get the first MSI descriptor of a device
+ * msi_domain_first_desc - Get the first MSI descriptor of an irqdomain associated to a device
  * @dev:	Device to operate on
+ * @domid:	The id of the interrupt domain which should be walked.
  * @filter:	Descriptor state filter
  *
  * Must be called with the MSI descriptor mutex held, i.e. msi_lock_descs()
@@ -279,19 +299,26 @@  static struct msi_desc *msi_find_desc(st
  * Return: Pointer to the first MSI descriptor matching the search
  *	   criteria, NULL if none found.
  */
-struct msi_desc *msi_first_desc(struct device *dev, enum msi_desc_filter filter)
+struct msi_desc *msi_domain_first_desc(struct device *dev, unsigned int domid,
+				       enum msi_desc_filter filter)
 {
 	struct msi_device_data *md = dev->msi.data;
+	int baseidx;
 
 	if (WARN_ON_ONCE(!md))
 		return NULL;
 
 	lockdep_assert_held(&md->mutex);
 
-	md->__iter_idx = 0;
+	baseidx = msi_get_domain_base_index(dev, domid);
+	if (baseidx < 0)
+		return NULL;
+
+	md->__iter_idx = baseidx;
+	md->__iter_max = baseidx + MSI_MAX_INDEX - 1;
 	return msi_find_desc(md, filter);
 }
-EXPORT_SYMBOL_GPL(msi_first_desc);
+EXPORT_SYMBOL_GPL(msi_domain_first_desc);
 
 /**
  * msi_next_desc - Get the next MSI descriptor of a device
@@ -315,7 +342,7 @@  struct msi_desc *msi_next_desc(struct de
 
 	lockdep_assert_held(&md->mutex);
 
-	if (md->__iter_idx >= (unsigned long)MSI_MAX_INDEX)
+	if (md->__iter_idx >= md->__iter_max)
 		return NULL;
 
 	md->__iter_idx++;