diff mbox

[RFC] irqdomain: Introduce new interfaces to support hierarchy irqdomains

Message ID 1406880493-18859-1-git-send-email-jiang.liu@linux.intel.com
State Not Applicable
Headers show

Commit Message

Jiang Liu Aug. 1, 2014, 8:08 a.m. UTC
We plan to use hierarchy irqdomain to suppport CPU vector assignment,
interrupt remapping controller, IO-APIC controller, MSI interrupt
and hypertransport interrupt etc on x86 platforms. So extend irqdomain
interfaces to support hierarchy irqdomain.

There are already many clients of current irqdomain interfaces.
To minimize the changes, we choose to introduce new version 2 interfaces
to support hierarchy instead of extending existing irqdomain interfaces.

The new V2 interfaces include six functions:
irq_domain_{alloc|free}_irqs():
	Allocate IRQ from domain or free IRQ into domain. It's mainly
used to allocate and free resources associated with interrupt descriptor.
irq_domain_{associate|disassociate}_irqs:
	Associate or disassociate IRQ descriptor with hardware interrupt
resource(pin). It's mainly used to program interrupt controller.
irq_domain_{create|destroy}_irqs():
	Call above four interfaces to create or destroy IRQs.

There are several design points about the new interfaces.

First, architecture needs to define struct irq_map_info, which will be
used to pass architecture specific information to controller specific
callbacks.

Second, all new interfaces have parameter 'size' to support multiple
continous IRQs, which is needed by MSI when interrupt remapping is
enabled on x86.

Third, a special value IRQDOMAIN_AUTO_ASSIGN_HWIRQ is defined out of
irq_hw_number_t, which indicates that irqdomain callbacks should
automatically hardware interrupt number for clients. This will be used
to support CPU vector allocation and interrupt remapping controller
on x86 platforms.

Fourth, the flag IRQDOMAIN_FLAG_HIERARCHY is used to indicate weather
irqdomain operations are hierarchy request. The irqdomain core uses
domain and hwirq fields in struct irq_data to save domain and hardware
interrupt number, but this causes trouble when enabling hierarchy
irqdomain. We solve this limitation by:
1) Still use domain and hwirq fields in struct irq_data to save
   infomation about the out-most irqdomain.
2) For hierarchy irqdomains, the parent field in struct irq_domain is
   used to save point to parent irqdomain.
3) For hierarchy irqdomains, it must implement a private method to save
   hardware interrupt number (hwirq).

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
Hi all,	
	We are trying to extend current irqdomain interfaces to support
hierarchy irqdomains. Please help to comment on the new interface design.
We will switch x86 irq management subsystem to use hierarchy irqdomains
once we reach agreement on the new interfaces.
Regards!
Gerry
---
 include/linux/irqdomain.h |   49 ++++++++-
 kernel/irq/irqdomain.c    |  252 ++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 262 insertions(+), 39 deletions(-)

Comments

Thomas Gleixner Aug. 26, 2014, 9:06 p.m. UTC | #1
Jiang,

On Fri, 1 Aug 2014, Jiang Liu wrote:

> First, architecture needs to define struct irq_map_info, which will be
> used to pass architecture specific information to controller specific
> callbacks.

We should not have another "universal" data structure which needs to
fit various levels of the hierarchy. See below.

> Second, all new interfaces have parameter 'size' to support multiple
> continous IRQs, which is needed by MSI when interrupt remapping is
> enabled on x86.

Nitpick: nrirqs would be more intuitive.
 
> Third, a special value IRQDOMAIN_AUTO_ASSIGN_HWIRQ is defined out of
> irq_hw_number_t, which indicates that irqdomain callbacks should
> automatically hardware interrupt number for clients. This will be used
> to support CPU vector allocation and interrupt remapping controller
> on x86 platforms.

I can't see the point of this. If we have a hierarchy then this is a
property of the hierarchy itself not of an indiviual call.
 
> Fourth, the flag IRQDOMAIN_FLAG_HIERARCHY is used to indicate weather
> irqdomain operations are hierarchy request. The irqdomain core uses

Why do we need that flag? If a domain has a parent domain, we already
know that this domain is part of a hierarchy.

> domain and hwirq fields in struct irq_data to save domain and hardware
> interrupt number, but this causes trouble when enabling hierarchy
> irqdomain. We solve this limitation by:
> 1) Still use domain and hwirq fields in struct irq_data to save
>    infomation about the out-most irqdomain.
> 2) For hierarchy irqdomains, the parent field in struct irq_domain is
>    used to save point to parent irqdomain.
> 3) For hierarchy irqdomains, it must implement a private method to save
>    hardware interrupt number (hwirq).

I'm not so fond of this design decision. Let's look at a hierarchy:

    PCI-Device
	MSI Interrupt
	    Remap Entry
	    	  CPU Vector

But your data representation is not hierarchical because only the
outmost domain map is stored in irq_data. For the parent domains you
offload the storage to the domain implementation itself.

We should be more clever and make the storage hierarchical and
indivual itself.

What about adding a field:

     struct irq_data *parent_data;

to struct irq_data and allocate it when we allocate an interrupt down
the hierarchy chain?

That allows to store hw_irq for each hierarchy level along with a
pointer to the interrupt chip and irq specific data. So the storage
would look like this:

irq
   irq_desc
      irq_data (hwirq, chip, chipdata, domain ...)
         irq_data (hwirq, chip, chipdata, domain ...)
             irq_data (hwirq, chip, chipdata, domain ...)

That allows us also to support scenarios where manipulation of the top
level interrupt "irq" requires to walk down the hierarchy without
consulting any irqdomain management code and without having specific
knowledge in the interrupt chip callbacks. Assume a two level
hierarchy which requires to mask/unmask both levels. In the current
mode we need to deal with this in the chip callback of the outermost
chip. But with a hierarchical data set, we can do:

mask(struct irqdata *d)
{
	d->chip->irq_mask(d);
	if (d->parent_data)
	   mask(d->parent_data);
}

and avoid magic conditionals in the chip callbacks.

Lets look at the current way of allocating a MSI interrupt with
remapping.

msi_capability_init()
  arch_setup_msi_irqs()
    irq_remapping_setup_msi_irqs()
      irq_alloc_hwirq()
         __assign_irq_vector()
      intel_msi_alloc_irq()
         alloc_irte()

In a proper domain hierarchy it would look like this:

msi_capability_init()
  arch_setup_msi_irqs()
    irq_domain_alloc(msi_domain)
       irq_domain_alloc(remap_domain)
         irq_domain_alloc(vector_domain)

In a non remapped scenario the msi_domain parent would be
vector_domain and the chain would look like this:

msi_capability_init()
  arch_setup_msi_irqs()
    irq_domain_alloc(msi_domain)
      irq_domain_alloc(vector_domain)

So now the question is where the irq descriptor which is associated to
"irq" is allocated. The most logical point is at the end of the
hierarchy, i.e. the domain which has no parent domain. That makes a
single level domain scenario just a natural subset of a multi level
hierarchy, but that makes the hierarchical irq data allocation and
handling more complex because the innermost domain would use the
irqdata associated to irqdesc. Not a really good idea, if we look at
it from the interrupt handling code. We could copy and shuffle stuff
around, but thats ugly as hell.

So we are better off by doing:

irq_domain_alloc(....)
{
   int virq = alloc_irq_desc(nrirqs);

   if (domain->parent) {
      ret = irq_domain_alloc_parent(domain->parent, virq, nrirqs);
      ....
   }
   ....
   return virq;
}   

And have:

irq_domain_alloc_parent(struct irqdomain *domain, int virq,
			int nrirqs)
{
   alloc_irq_data(domain, virq, nrirqs);

   if (domain->parent) {
      ret = irq_domain_alloc_parent(domain->parent, nrirqs, virq);
      ....
   }
   /*
    * Allocate the actual hardware resources in this domain and store
    * the domain specific information in the proper hierarchy level
    * irqdata.
    */
   domain->alloc_hwirqs(domain, virq, nrirqs);
   ...
}

alloc_irq_data(struct irqdomain *domain, int virq, int nrirqs)
{
   /* Allocate irq_data and domain specific storage in one go */	
   int size = sizeof(struct irq_data) + domain->private_data_size;
   void *p;

   p = kzalloc(nrirqs * size);
  
   for (i = 0; i < nrirqs; i++, virq++, hwirq++, p += size) {
     struct irq_data *d = irq_get_irq_data(virq);
   
     while (d->parent_data)
       d = d->parent_data;

     d->parent_data = p;

     d = p;
     d->irq = virq;
     d->domain = domain;

     /* If we have domain specific storage, let chip_data point to it */
     if (domain->private_data_size)
       d->chip_data = d + 1;
   }
}

To lookup the irqdata of a hierarchy level we need a helper function:

irq_get_domain_irq_data(int virq, struct irq_domain *d)
{
   struct irq_data *d = irq_get_irq_data(virq);

   while (d) {
      if (d->domain == d)
         break;
      d = d->parent_data;
   }
   return d;	
}

So if we look at the various hierarchy levels, then we really can make
sensible use of the separate irq_data instances. Right now, the struct
irq_2_iommu is part of the vector domain config structure, but thats
not because the vector domain needs that information, it just happens
to be there. So we can put it into the irqdata associated to the remap
domain.

There are a few other things to consider. Right now the irq remapping
code modifies the x86_ops to redirect the ioapic and the msi functions
to the remap implementations and at irq setup time it also modifies
the irq chip callbacks. That's a completely unreadable and
undocumented maze of indirections and substitutions.

And just looking at the msi setup code makes me run away screaming.

So in the intel irq remapping case arch_setup_msi_irqs() ends up with
the following call chain for a single MSI-X interrupt:

irq_remapping_setup_msi_irqs() 	      irq_remapping.c
  do_setup_msix_irqs()

    irq_alloc_hwirq()		      irq core code
      irq = __irq_alloc_descs()
      arch_setup_hwirq(irq)	      io_apic.c
        cfg = alloc_irq_cfg(irq)
        __assign_irq_vector(irq, cfg)
	irq_set_chip_data(irq, cfg)

    msi_alloc_remapped_irq(irq)	      irq_remapping.c
      intel_msi_alloc_irq(irq)	      intel_irq_remapping.c
        alloc_irte(irq)

    setup_msi_irq(irq)		      io_apic.c
      msi_compose_msg(irq)
        assign_irq_vector()           <- Second invocation !?!?!
	x86_msi.compose_msi_msg()     
	  compose_remapped_msi_msg()  irq_remapping.c
            intel_compose_msi_msg()   intel_irq_remapping.c
	      fiddle_with_irte()
      write_msi_msg()		      io_apic.c/msi.c
      setup_remapped_irq()	      irq_remapping.c
        modify_irq_chip_call_backs()

All in all design from hell hacked into submission ...

So how can we do better with irq domains?

First of all we need to separate the resource allocation and the
actual activation, i.e.:

Allocate irq descriptors, remap area and vector block first, then
activate the interrupts. Doing it this way is also simpler for backoff
in the error case than undoing a partial reservation/allocation which
has already been activated.

irq_domain_alloc(msi_domain, nrirqs)
{
  virq = irq_alloc_desc(nriqs)

  irq_domain_alloc_parent(remap_domain, virq, nrirqs)
  
    irq_domain_alloc_parent(vector_domain, virq, nrirqs)
}

Each parent allocation allocates the irqdata and stores the
reservation/allocation information in it. Further it assigns an
interrupt chip to the irqdata if appropriate.

So we don't need to activate the interrupt in this step at all. All we
need to do is to allocate and reserve the necessary resources.

The point where we really need to activate the interrupt is at
request_irq() time. And here the interrupt chips which we assigned
come into play:

request_irq(virq)
    ....
    irq_startup();
    
So we need to teach irq_startup about hierarchies, but that's not
rocket science.

irq_startup(struct irq_desc *desc, bool resend)
{
  irq_startup_parent(&desc->irq_data);
  ...
}

And have

irq_startup_parent(struct irq_data *d)
{
  if (d->parent_data)
    irq_startup_parent(d->parent_data);

  if (d->chip && d->irq_startup)
    d->chip->irq_startup(d);  
}

So in the PCI MSI[X] remapping case this would 

1) Call down into the vector domain and the assigned irq chip would
   have a startup function which finally activates the reserved
   vector.

2) Write the irte entry from the remap domain and compose the msi msg.

3) Enable the msi msg from the msi domain

Now you might ask the question how #2 makes use of #1
(cfg->vector/cfg->domain) and #3 makes use of #2 (msi msg). That's
rather simple.

The remap domain chip knows that its parent is the vector domain, so
it simply can look at d->parent_data->chip_data to retrieve irq_cfg
which contains the vector and the domain.

Now the MSI domain chip checks, whether the parent domain has
d->parent_data->msi_desc set and if not NULL it gets the remapped msi
msg information from there. If remapping is disabled, then
d->parent_data->msi_desc is NULL and the native msg composition takes
place.

While we are at it can we please get rid of the abnomination of
modifying the callbacks of the irq chip which is associated to a
particular interrupt line. That's a nasty and fcking non obvious
hack. Why would the MSI chip care about an override to irq_eoi?

What's wrong with either having proper interrupt chips for the
remapped and non remapped case or modifying the msi chip callbacks
once when interrupt remapping is enabled instead of doing it with the
assbackwards setup_remapped_irq() function over and over again?

Now if you followed the above you'll notice that it will get rid of
another annoying level of indirection, i.e. the x86_msi ops. And I'm
pretty sure, that this also gets rid of the ioapic ops hackery if we
treat it the same way.

Another benefit is that you can get rid of the whole irq_remapping.c
indirection and conditional maze, which is completely overengineered
and pointless.

So the whole irq_remapping mess will boil down to establish the proper
domain parent pointers for MSI, HPET_MSI, IOAPIC and whatever gets
mangled through the remap code. That's a simple setup time operation
and does not require any x86_*ops overrides at all. So if we do it
right we can get rid of quite some pointless shite.

I probably missed a few things, though I don't see any real show
stopper right now. I did not elaborate on irq_set_affinity() but that
should be an obvious excercise; if not we need to think it through
again and maybe from a different angle, but we really want a proper
cleanup for the existing pile of shite and not another bolted on
solution.

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiang Liu Aug. 27, 2014, 7:57 a.m. UTC | #2
Hi Thomas,
	Thanks for your great comments! Please refer to
inline comments below.
Regards!
Gerry

On 2014/8/27 5:06, Thomas Gleixner wrote:
> Jiang,
> 
> On Fri, 1 Aug 2014, Jiang Liu wrote:
> 
>> First, architecture needs to define struct irq_map_info, which will be
>> used to pass architecture specific information to controller specific
>> callbacks.
> 
> We should not have another "universal" data structure which needs to
> fit various levels of the hierarchy. See below.
> 
>> Second, all new interfaces have parameter 'size' to support multiple
>> continous IRQs, which is needed by MSI when interrupt remapping is
>> enabled on x86.
> 
> Nitpick: nrirqs would be more intuitive.
Will use nrirqs in next version.

>  
>> Third, a special value IRQDOMAIN_AUTO_ASSIGN_HWIRQ is defined out of
>> irq_hw_number_t, which indicates that irqdomain callbacks should
>> automatically hardware interrupt number for clients. This will be used
>> to support CPU vector allocation and interrupt remapping controller
>> on x86 platforms.
> 
> I can't see the point of this. If we have a hierarchy then this is a
> property of the hierarchy itself not of an indiviual call.
When invoking irqdomain interfaces, we need to pass in an hwirq. For
IOAPIC, it's the pin number. But for remap and vector domains, caller
can't provide hwirq, it's assigned by remap and vector domains
themselves. So introduced IRQDOMAIN_AUTO_ASSIGN_HWIRQ to indicate
that irqdomain will assign hwirq for callers.

>  
>> Fourth, the flag IRQDOMAIN_FLAG_HIERARCHY is used to indicate weather
>> irqdomain operations are hierarchy request. The irqdomain core uses
> 
> Why do we need that flag? If a domain has a parent domain, we already
> know that this domain is part of a hierarchy.
This flag is passed into hierarchy irqdomain interfaces for two
purposes:
1) to protect irq_data->hwirq and irq_data->domain
2) to solve recursive lock issues in irqdomain core.

> 
>> domain and hwirq fields in struct irq_data to save domain and hardware
>> interrupt number, but this causes trouble when enabling hierarchy
>> irqdomain. We solve this limitation by:
>> 1) Still use domain and hwirq fields in struct irq_data to save
>>    infomation about the out-most irqdomain.
>> 2) For hierarchy irqdomains, the parent field in struct irq_domain is
>>    used to save point to parent irqdomain.
>> 3) For hierarchy irqdomains, it must implement a private method to save
>>    hardware interrupt number (hwirq).
> 
> I'm not so fond of this design decision. Let's look at a hierarchy:
> 
>     PCI-Device
> 	MSI Interrupt
> 	    Remap Entry
> 	    	  CPU Vector
> 
> But your data representation is not hierarchical because only the
> outmost domain map is stored in irq_data. For the parent domains you
> offload the storage to the domain implementation itself.
One of my design rules is only to change x86 arch specific code when
possible, so used above solution. If we could make changes to public
data structures, we may find better solution as you have suggested:)

> 
> We should be more clever and make the storage hierarchical and
> indivual itself.
> 
> What about adding a field:
> 
>      struct irq_data *parent_data;
> 
> to struct irq_data and allocate it when we allocate an interrupt down
> the hierarchy chain?
> 
> That allows to store hw_irq for each hierarchy level along with a
> pointer to the interrupt chip and irq specific data. So the storage
> would look like this:
> 
> irq
>    irq_desc
>       irq_data (hwirq, chip, chipdata, domain ...)
>          irq_data (hwirq, chip, chipdata, domain ...)
>              irq_data (hwirq, chip, chipdata, domain ...)
> 
> That allows us also to support scenarios where manipulation of the top
> level interrupt "irq" requires to walk down the hierarchy without
> consulting any irqdomain management code and without having specific
> knowledge in the interrupt chip callbacks. Assume a two level
> hierarchy which requires to mask/unmask both levels. In the current
> mode we need to deal with this in the chip callback of the outermost
> chip. But with a hierarchical data set, we can do:
> 
> mask(struct irqdata *d)
> {
> 	d->chip->irq_mask(d);
> 	if (d->parent_data)
> 	   mask(d->parent_data);
> }
> 
> and avoid magic conditionals in the chip callbacks.
That's a good suggestion. Should we reuse irq_data directly or
group some fields from irq_data into a new data structure?

> 
> Lets look at the current way of allocating a MSI interrupt with
> remapping.
> 
> msi_capability_init()
>   arch_setup_msi_irqs()
>     irq_remapping_setup_msi_irqs()
>       irq_alloc_hwirq()
>          __assign_irq_vector()
>       intel_msi_alloc_irq()
>          alloc_irte()
> 
> In a proper domain hierarchy it would look like this:
> 
> msi_capability_init()
>   arch_setup_msi_irqs()
>     irq_domain_alloc(msi_domain)
>        irq_domain_alloc(remap_domain)
>          irq_domain_alloc(vector_domain)
> 
> In a non remapped scenario the msi_domain parent would be
> vector_domain and the chain would look like this:
> 
> msi_capability_init()
>   arch_setup_msi_irqs()
>     irq_domain_alloc(msi_domain)
>       irq_domain_alloc(vector_domain)
That's what I want to achieve:)

> 
> So now the question is where the irq descriptor which is associated to
> "irq" is allocated. The most logical point is at the end of the
> hierarchy, i.e. the domain which has no parent domain. That makes a
> single level domain scenario just a natural subset of a multi level
> hierarchy, but that makes the hierarchical irq data allocation and
> handling more complex because the innermost domain would use the
> irqdata associated to irqdesc. Not a really good idea, if we look at
> it from the interrupt handling code. We could copy and shuffle stuff
> around, but thats ugly as hell.
> 
> So we are better off by doing:
> 
> irq_domain_alloc(....)
> {
>    int virq = alloc_irq_desc(nrirqs);
> 
>    if (domain->parent) {
>       ret = irq_domain_alloc_parent(domain->parent, virq, nrirqs);
>       ....
>    }
>    ....
>    return virq;
> }   
> 
> And have:
> 
> irq_domain_alloc_parent(struct irqdomain *domain, int virq,
> 			int nrirqs)
> {
>    alloc_irq_data(domain, virq, nrirqs);
> 
>    if (domain->parent) {
>       ret = irq_domain_alloc_parent(domain->parent, nrirqs, virq);
>       ....
>    }
>    /*
>     * Allocate the actual hardware resources in this domain and store
>     * the domain specific information in the proper hierarchy level
>     * irqdata.
>     */
>    domain->alloc_hwirqs(domain, virq, nrirqs);
>    ...
> }
> 
> alloc_irq_data(struct irqdomain *domain, int virq, int nrirqs)
> {
>    /* Allocate irq_data and domain specific storage in one go */	
>    int size = sizeof(struct irq_data) + domain->private_data_size;
>    void *p;
> 
>    p = kzalloc(nrirqs * size);
>   
>    for (i = 0; i < nrirqs; i++, virq++, hwirq++, p += size) {
>      struct irq_data *d = irq_get_irq_data(virq);
>    
>      while (d->parent_data)
>        d = d->parent_data;
> 
>      d->parent_data = p;
> 
>      d = p;
>      d->irq = virq;
>      d->domain = domain;
> 
>      /* If we have domain specific storage, let chip_data point to it */
>      if (domain->private_data_size)
>        d->chip_data = d + 1;
>    }
> }
> 
> To lookup the irqdata of a hierarchy level we need a helper function:
> 
> irq_get_domain_irq_data(int virq, struct irq_domain *d)
> {
>    struct irq_data *d = irq_get_irq_data(virq);
> 
>    while (d) {
>       if (d->domain == d)
>          break;
>       d = d->parent_data;
>    }
>    return d;	
> }
> 
> So if we look at the various hierarchy levels, then we really can make
> sensible use of the separate irq_data instances. Right now, the struct
> irq_2_iommu is part of the vector domain config structure, but thats
> not because the vector domain needs that information, it just happens
> to be there. So we can put it into the irqdata associated to the remap
> domain.
> 
> There are a few other things to consider. Right now the irq remapping
> code modifies the x86_ops to redirect the ioapic and the msi functions
> to the remap implementations and at irq setup time it also modifies
> the irq chip callbacks. That's a completely unreadable and
> undocumented maze of indirections and substitutions.
> 
> And just looking at the msi setup code makes me run away screaming.
> 
> So in the intel irq remapping case arch_setup_msi_irqs() ends up with
> the following call chain for a single MSI-X interrupt:
> 
> irq_remapping_setup_msi_irqs() 	      irq_remapping.c
>   do_setup_msix_irqs()
> 
>     irq_alloc_hwirq()		      irq core code
>       irq = __irq_alloc_descs()
>       arch_setup_hwirq(irq)	      io_apic.c
>         cfg = alloc_irq_cfg(irq)
>         __assign_irq_vector(irq, cfg)
> 	irq_set_chip_data(irq, cfg)
> 
>     msi_alloc_remapped_irq(irq)	      irq_remapping.c
>       intel_msi_alloc_irq(irq)	      intel_irq_remapping.c
>         alloc_irte(irq)
> 
>     setup_msi_irq(irq)		      io_apic.c
>       msi_compose_msg(irq)
>         assign_irq_vector()           <- Second invocation !?!?!
> 	x86_msi.compose_msi_msg()     
> 	  compose_remapped_msi_msg()  irq_remapping.c
>             intel_compose_msi_msg()   intel_irq_remapping.c
> 	      fiddle_with_irte()
>       write_msi_msg()		      io_apic.c/msi.c
>       setup_remapped_irq()	      irq_remapping.c
>         modify_irq_chip_call_backs()
> 
> All in all design from hell hacked into submission ...
> 
> So how can we do better with irq domains?
> 
> First of all we need to separate the resource allocation and the
> actual activation, i.e.:
> 
> Allocate irq descriptors, remap area and vector block first, then
> activate the interrupts. Doing it this way is also simpler for backoff
> in the error case than undoing a partial reservation/allocation which
> has already been activated.
> 
> irq_domain_alloc(msi_domain, nrirqs)
> {
>   virq = irq_alloc_desc(nriqs)
> 
>   irq_domain_alloc_parent(remap_domain, virq, nrirqs)
>   
>     irq_domain_alloc_parent(vector_domain, virq, nrirqs)
> }
> 
> Each parent allocation allocates the irqdata and stores the
> reservation/allocation information in it. Further it assigns an
> interrupt chip to the irqdata if appropriate.
> 
> So we don't need to activate the interrupt in this step at all. All we
> need to do is to allocate and reserve the necessary resources.
Yes, I have split it into two steps: resource allocation and activation.

> 
> The point where we really need to activate the interrupt is at
> request_irq() time. And here the interrupt chips which we assigned
> come into play:
> 
> request_irq(virq)
>     ....
>     irq_startup();
>     
> So we need to teach irq_startup about hierarchies, but that's not
> rocket science.
> 
> irq_startup(struct irq_desc *desc, bool resend)
> {
>   irq_startup_parent(&desc->irq_data);
>   ...
> }
> 
> And have
> 
> irq_startup_parent(struct irq_data *d)
> {
>   if (d->parent_data)
>     irq_startup_parent(d->parent_data);
> 
>   if (d->chip && d->irq_startup)
>     d->chip->irq_startup(d);  
> }
> 
> So in the PCI MSI[X] remapping case this would 
> 
> 1) Call down into the vector domain and the assigned irq chip would
>    have a startup function which finally activates the reserved
>    vector.
> 
> 2) Write the irte entry from the remap domain and compose the msi msg.
> 
> 3) Enable the msi msg from the msi domain
> 
> Now you might ask the question how #2 makes use of #1
> (cfg->vector/cfg->domain) and #3 makes use of #2 (msi msg). That's
> rather simple.
Currently we solve this issue by packing all data into irq_cfg,
we remap and ioapic level could access apic id and vector in
vector domain.

> 
> The remap domain chip knows that its parent is the vector domain, so
> it simply can look at d->parent_data->chip_data to retrieve irq_cfg
> which contains the vector and the domain.
> 
> Now the MSI domain chip checks, whether the parent domain has
> d->parent_data->msi_desc set and if not NULL it gets the remapped msi
> msg information from there. If remapping is disabled, then
> d->parent_data->msi_desc is NULL and the native msg composition takes
> place.
> 
> While we are at it can we please get rid of the abnomination of
> modifying the callbacks of the irq chip which is associated to a
> particular interrupt line. That's a nasty and fcking non obvious
> hack. Why would the MSI chip care about an override to irq_eoi?
> 
> What's wrong with either having proper interrupt chips for the
> remapped and non remapped case or modifying the msi chip callbacks
> once when interrupt remapping is enabled instead of doing it with the
> assbackwards setup_remapped_irq() function over and over again?
> 
> Now if you followed the above you'll notice that it will get rid of
> another annoying level of indirection, i.e. the x86_msi ops. And I'm
> pretty sure, that this also gets rid of the ioapic ops hackery if we
> treat it the same way.
That's one of my goal: to simplify x86_msi and ioapic ops structures,
get rid of callback override, so the code will be much more easier
to read and maintain.

> 
> Another benefit is that you can get rid of the whole irq_remapping.c
> indirection and conditional maze, which is completely overengineered
> and pointless.
> 
> So the whole irq_remapping mess will boil down to establish the proper
> domain parent pointers for MSI, HPET_MSI, IOAPIC and whatever gets
> mangled through the remap code. That's a simple setup time operation
> and does not require any x86_*ops overrides at all. So if we do it
> right we can get rid of quite some pointless shite.
> 
> I probably missed a few things, though I don't see any real show
> stopper right now. I did not elaborate on irq_set_affinity() but that
> should be an obvious excercise; if not we need to think it through
> again and maybe from a different angle, but we really want a proper
> cleanup for the existing pile of shite and not another bolted on
> solution.
One more issue I'm facing now.

I plan to build one irqdomain to support MSI/MSIx, but system may have
multiple interrupt remapping units. It's a little tricky to maintain
hierarchy relationship between MSI irqdomain and remapping irqdomain.
It's hard to maintain N:N relationship for MSI and remapping irqdomains.
So should we maintain 1:N or 1:1 relationships? In other words, should
we build one irqdomain for each remapping unit or only build one
irqdomain for all remapping units?

On the other hand, it's a good news that we almost have the same goals,
and just have different ways to achieve our goals. I tried to change
x86 arch code only, and you suggest to touch the irq public code.
To be honest, I have no enough confidence to touch irq public code
at the first step:(

And I have a working patch set based on the draft design, should I
send out it fore RFC? I know I will rewrite them based on your
suggestions, but it may help to find more design issues with my
current implementation.

Regards!
Gerry

> 
> Thanks,
> 
> 	tglx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Gleixner Aug. 27, 2014, 8:57 a.m. UTC | #3
Jiang,

On Wed, 27 Aug 2014, Jiang Liu wrote:
> >> Third, a special value IRQDOMAIN_AUTO_ASSIGN_HWIRQ is defined out of
> >> irq_hw_number_t, which indicates that irqdomain callbacks should
> >> automatically hardware interrupt number for clients. This will be used
> >> to support CPU vector allocation and interrupt remapping controller
> >> on x86 platforms.
> > 
> > I can't see the point of this. If we have a hierarchy then this is a
> > property of the hierarchy itself not of an indiviual call.
> When invoking irqdomain interfaces, we need to pass in an hwirq. For
> IOAPIC, it's the pin number. But for remap and vector domains, caller
> can't provide hwirq, it's assigned by remap and vector domains
> themselves. So introduced IRQDOMAIN_AUTO_ASSIGN_HWIRQ to indicate
> that irqdomain will assign hwirq for callers.

I don't think it's an issue. You don't have to worry about the
existing irqdomain semantics and functionality. By introducing
hierarchy some of the existing rules are going to change no matter
what. So we should not try to make the interfaces which are required
for the hierarchical domains follow the semantics of the existing
plain interfaces.

If we decide to have the allocation scheme which I outlined, then this
becomes completely moot, simply because the allocation will take care
of this.

Lets look at the MSI example again. MSI does not have a hwirq number,
the MSI domain manages the MSI msg and that is composed from the
information which is created/managed by the remap and vector domains.
 
> >> Fourth, the flag IRQDOMAIN_FLAG_HIERARCHY is used to indicate weather
> >> irqdomain operations are hierarchy request. The irqdomain core uses
> > 
> > Why do we need that flag? If a domain has a parent domain, we already
> > know that this domain is part of a hierarchy.
> This flag is passed into hierarchy irqdomain interfaces for two
> purposes:
> 1) to protect irq_data->hwirq and irq_data->domain

Again, you try to bolt the hierarchy into the existing design rather
than doing a hierarchy design for irq domains and either map the
existing flat domain functionality into it or just leave it alone.

> > But your data representation is not hierarchical because only the
> > outmost domain map is stored in irq_data. For the parent domains you
> > offload the storage to the domain implementation itself.
> 
> One of my design rules is only to change x86 arch specific code when
> possible, so used above solution.

This design rule is wrong to begin with. You need to touch core code
anyway to support the hierarchy mechanisms. So you better have a
proper support for all of this in the core than having half baken
infrastructure plus ugly workarounds in the architecture code.

> If we could make changes to public data structures, we may find
> better solution as you have suggested:)

Of course we can do that and we should do it.

> > and avoid magic conditionals in the chip callbacks.
> That's a good suggestion. Should we reuse irq_data directly or
> group some fields from irq_data into a new data structure?

If we keep irq_data, then all nested chip callbacks and other things
just work. So creating a new sub structure is probably
counterproductive.

> > Now you might ask the question how #2 makes use of #1
> > (cfg->vector/cfg->domain) and #3 makes use of #2 (msi msg). That's
> > rather simple.
>
> Currently we solve this issue by packing all data into irq_cfg,
> we remap and ioapic level could access apic id and vector in
> vector domain.

Well, that's how it was hacked into the code in the first place, but
that's not something we want to keep. Clear separation of storage is
definitely a goal of doing the whole hierarchy change.
 
> I plan to build one irqdomain to support MSI/MSIx, but system may have
> multiple interrupt remapping units. It's a little tricky to maintain
> hierarchy relationship between MSI irqdomain and remapping irqdomain.
> It's hard to maintain N:N relationship for MSI and remapping irqdomains.
> So should we maintain 1:N or 1:1 relationships? In other words, should
> we build one irqdomain for each remapping unit or only build one
> irqdomain for all remapping units?

If you have several remapping domains, then you might consider to have
several corresponding MSI[X] domains as well. That's how the hardware
is structured.
 
> On the other hand, it's a good news that we almost have the same goals,
> and just have different ways to achieve our goals. I tried to change
> x86 arch code only, and you suggest to touch the irq public code.
> To be honest, I have no enough confidence to touch irq public code
> at the first step:(

Don't worry about touching generic code. It's not different from x86
code and having a proper core infrastructure makes the architecture
side clean and simple rather than stuffed with obscure workarounds.

I'm happy to guide you through if there are questions or design
decisions to make.
 
Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index b0f9d16e48f6..8c3ab6b62049 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -38,9 +38,16 @@ 
 struct device_node;
 struct irq_domain;
 struct of_device_id;
+struct irq_map_info;
 
 /* Number of irqs reserved for a legacy isa controller */
-#define NUM_ISA_INTERRUPTS	16
+#define NUM_ISA_INTERRUPTS		16
+
+/* Auto-assign hardware pin number */
+#define IRQDOMAIN_AUTO_ASSIGN_HWIRQ	((irq_hw_number_t)-1)
+
+/* Associate/disassociate IRQ for hierarchy IRQ domain */
+#define IRQDOMAIN_FLAG_HIERARCHY	0x1
 
 /**
  * struct irq_domain_ops - Methods for irq_domain objects
@@ -64,6 +71,14 @@  struct irq_domain_ops {
 	int (*xlate)(struct irq_domain *d, struct device_node *node,
 		     const u32 *intspec, unsigned int intsize,
 		     unsigned long *out_hwirq, unsigned int *out_type);
+
+	/* extended V2 interfaces to support hierarchy irqdomains */
+	int (*alloc)(struct irq_domain *d, irq_hw_number_t *hw,
+		     unsigned int size, struct irq_map_info *info);
+	void (*free)(struct irq_domain *d, unsigned int virq,
+		     irq_hw_number_t hw, unsigned int size);
+	int (*map2)(struct irq_domain *d, unsigned int virq,
+		    irq_hw_number_t hw, struct irq_map_info *info);
 };
 
 extern struct irq_domain_ops irq_generic_chip_ops;
@@ -99,6 +114,7 @@  struct irq_domain {
 	void *host_data;
 
 	/* Optional data */
+	struct irq_domain *parent;
 	struct device_node *of_node;
 	struct irq_domain_chip_generic *gc;
 
@@ -115,6 +131,7 @@  struct irq_domain *__irq_domain_add(struct device_node *of_node, int size,
 				    irq_hw_number_t hwirq_max, int direct_max,
 				    const struct irq_domain_ops *ops,
 				    void *host_data);
+
 struct irq_domain *irq_domain_add_simple(struct device_node *of_node,
 					 unsigned int size,
 					 unsigned int first_irq,
@@ -143,6 +160,7 @@  static inline struct irq_domain *irq_domain_add_linear(struct device_node *of_no
 {
 	return __irq_domain_add(of_node, size, size, 0, ops, host_data);
 }
+
 static inline struct irq_domain *irq_domain_add_nomap(struct device_node *of_node,
 					 unsigned int max_irq,
 					 const struct irq_domain_ops *ops,
@@ -150,6 +168,7 @@  static inline struct irq_domain *irq_domain_add_nomap(struct device_node *of_nod
 {
 	return __irq_domain_add(of_node, 0, max_irq, max_irq, ops, host_data);
 }
+
 static inline struct irq_domain *irq_domain_add_legacy_isa(
 				struct device_node *of_node,
 				const struct irq_domain_ops *ops,
@@ -158,6 +177,7 @@  static inline struct irq_domain *irq_domain_add_legacy_isa(
 	return irq_domain_add_legacy(of_node, NUM_ISA_INTERRUPTS, 0, 0, ops,
 				     host_data);
 }
+
 static inline struct irq_domain *irq_domain_add_tree(struct device_node *of_node,
 					 const struct irq_domain_ops *ops,
 					 void *host_data)
@@ -177,6 +197,7 @@  extern void irq_domain_disassociate(struct irq_domain *domain,
 
 extern unsigned int irq_create_mapping(struct irq_domain *host,
 				       irq_hw_number_t hwirq);
+
 extern void irq_dispose_mapping(unsigned int virq);
 
 /**
@@ -194,6 +215,7 @@  static inline unsigned int irq_linear_revmap(struct irq_domain *domain,
 {
 	return hwirq < domain->revmap_size ? domain->linear_revmap[hwirq] : 0;
 }
+
 extern unsigned int irq_find_mapping(struct irq_domain *host,
 				     irq_hw_number_t hwirq);
 extern unsigned int irq_create_direct_mapping(struct irq_domain *host);
@@ -220,6 +242,31 @@  int irq_domain_xlate_onetwocell(struct irq_domain *d, struct device_node *ctrlr,
 			const u32 *intspec, unsigned int intsize,
 			irq_hw_number_t *out_hwirq, unsigned int *out_type);
 
+/* V2 interfaces to support hierarchy IRQ domains. */
+extern int irq_domain_alloc_descs(struct irq_domain *domain, int virq,
+				  irq_hw_number_t hwirq, unsigned int size,
+				  int node);
+extern void irq_domain_free_descs(struct irq_domain *domain, unsigned int virq,
+				  unsigned int size);
+extern int irq_domain_alloc_irqs(struct irq_domain *domain,
+				 irq_hw_number_t *hwirq, unsigned int size,
+				 struct irq_map_info *info);
+extern void irq_domain_free_irqs(struct irq_domain *domain, unsigned int virq,
+				 irq_hw_number_t hwirq, unsigned int size);
+extern int irq_domain_associate_irqs(struct irq_domain *domain,
+				     unsigned int irq, irq_hw_number_t hwirq,
+				     unsigned int size, u32 flags,
+				     struct irq_map_info *info);
+extern void irq_domain_disassociate_irqs(struct irq_domain *domain,
+					 unsigned int irq,
+					 irq_hw_number_t hwirq,
+					 unsigned int size, u32 flags);
+extern int irq_domain_create_irqs(struct irq_domain *domain,
+				  irq_hw_number_t *hwirq, unsigned int size,
+				  struct irq_map_info *info);
+extern void irq_domain_destroy_irqs(struct irq_domain *domain,
+				    unsigned int virq, irq_hw_number_t hwirq,
+				    unsigned int size);
 #else /* CONFIG_IRQ_DOMAIN */
 static inline void irq_dispose_mapping(unsigned int virq) { }
 #endif /* !CONFIG_IRQ_DOMAIN */
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 6534ff6ce02e..e1f4e0685fdd 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -174,10 +174,8 @@  struct irq_domain *irq_domain_add_legacy(struct device_node *of_node,
 
 	domain = __irq_domain_add(of_node, first_hwirq + size,
 				  first_hwirq + size, 0, ops, host_data);
-	if (!domain)
-		return NULL;
-
-	irq_domain_associate_many(domain, first_irq, first_hwirq, size);
+	if (domain)
+		irq_domain_associate_many(domain, first_irq, first_hwirq, size);
 
 	return domain;
 }
@@ -231,31 +229,45 @@  void irq_set_default_host(struct irq_domain *domain)
 }
 EXPORT_SYMBOL_GPL(irq_set_default_host);
 
-void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq)
+static void __irq_domain_disassociate(struct irq_domain *domain,
+				      unsigned int irq, irq_hw_number_t hwirq,
+				      u32 flags)
 {
 	struct irq_data *irq_data = irq_get_irq_data(irq);
-	irq_hw_number_t hwirq;
 
-	if (WARN(!irq_data || irq_data->domain != domain,
-		 "virq%i doesn't exist; cannot disassociate\n", irq))
-		return;
+	if (domain == NULL) {
+		domain = irq_default_domain;
+		if (WARN(!domain,
+			 "domain for virq%i is NULL; cannot disassociate\n",
+			 irq))
+			return;
+	}
 
-	hwirq = irq_data->hwirq;
-	irq_set_status_flags(irq, IRQ_NOREQUEST);
+	if (!(flags & IRQDOMAIN_FLAG_HIERARCHY)) {
+		if (WARN(!irq_data || irq_data->domain != domain,
+			 "virq%i doesn't exist; cannot disassociate\n", irq))
+			return;
 
-	/* remove chip and handler */
-	irq_set_chip_and_handler(irq, NULL, NULL);
+		if (hwirq == IRQDOMAIN_AUTO_ASSIGN_HWIRQ)
+			hwirq = irq_data->hwirq;
+		irq_set_status_flags(irq, IRQ_NOREQUEST);
 
-	/* Make sure it's completed */
-	synchronize_irq(irq);
+		/* remove chip and handler */
+		irq_set_chip_and_handler(irq, NULL, NULL);
+
+		/* Make sure it's completed */
+		synchronize_irq(irq);
+	}
 
 	/* Tell the PIC about it */
 	if (domain->ops->unmap)
 		domain->ops->unmap(domain, irq);
 	smp_mb();
 
-	irq_data->domain = NULL;
-	irq_data->hwirq = 0;
+	if (!(flags & IRQDOMAIN_FLAG_HIERARCHY)) {
+		irq_data->domain = NULL;
+		irq_data->hwirq = 0;
+	}
 
 	/* Clear reverse map for this hwirq */
 	if (hwirq < domain->revmap_size) {
@@ -267,38 +279,66 @@  void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq)
 	}
 }
 
-int irq_domain_associate(struct irq_domain *domain, unsigned int virq,
-			 irq_hw_number_t hwirq)
+void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq)
+{
+	__irq_domain_disassociate(domain, irq, IRQDOMAIN_AUTO_ASSIGN_HWIRQ, 0);
+}
+
+static int __irq_domain_associate(struct irq_domain *domain, unsigned int virq,
+				  irq_hw_number_t hwirq, u32 flags,
+				  struct irq_map_info *info)
 {
 	struct irq_data *irq_data = irq_get_irq_data(virq);
 	int ret;
 
+	if (domain == NULL) {
+		domain = irq_default_domain;
+		if (WARN(!domain,
+			 "domain for virq%i is NULL; cannot associate\n", virq))
+			return -EINVAL;
+	}
+
 	if (WARN(hwirq >= domain->hwirq_max,
 		 "error: hwirq 0x%x is too large for %s\n", (int)hwirq, domain->name))
 		return -EINVAL;
 	if (WARN(!irq_data, "error: virq%i is not allocated", virq))
 		return -EINVAL;
-	if (WARN(irq_data->domain, "error: virq%i is already associated", virq))
+	if (!(flags & IRQDOMAIN_FLAG_HIERARCHY) &&
+	    WARN(irq_data->domain, "error: virq%i is already associated", virq))
 		return -EINVAL;
 
-	mutex_lock(&irq_domain_mutex);
-	irq_data->hwirq = hwirq;
-	irq_data->domain = domain;
-	if (domain->ops->map) {
-		ret = domain->ops->map(domain, virq, hwirq);
+	/*
+	 * When supporting hierarchy irqdomains, hwirq and domain fields in
+	 * struct irq_data are used to store information for the outer-most
+	 * domain. And the parent field in struct irq_domain may be used to
+	 * store parent domain. But architecture needs provide a way to save
+	 * hwirq for inner domains.
+	 */
+	if (!(flags & IRQDOMAIN_FLAG_HIERARCHY)) {
+		mutex_lock(&irq_domain_mutex);
+		irq_data->hwirq = hwirq;
+		irq_data->domain = domain;
+	}
+
+	if (domain->ops->map || domain->ops->map2) {
+		if (domain->ops->map2)
+			ret = domain->ops->map2(domain, virq, hwirq, info);
+		else
+			ret = domain->ops->map(domain, virq, hwirq);
 		if (ret != 0) {
 			/*
 			 * If map() returns -EPERM, this interrupt is protected
 			 * by the firmware or some other service and shall not
 			 * be mapped. Don't bother telling the user about it.
 			 */
-			if (ret != -EPERM) {
+			if (ret != -EPERM)
 				pr_info("%s didn't like hwirq-0x%lx to VIRQ%i mapping (rc=%d)\n",
 				       domain->name, hwirq, virq, ret);
+			if (!(flags & IRQDOMAIN_FLAG_HIERARCHY)) {
+				irq_data->domain = NULL;
+				irq_data->hwirq = 0;
+				mutex_unlock(&irq_domain_mutex);
 			}
-			irq_data->domain = NULL;
-			irq_data->hwirq = 0;
-			mutex_unlock(&irq_domain_mutex);
 			return ret;
 		}
 
@@ -314,12 +354,20 @@  int irq_domain_associate(struct irq_domain *domain, unsigned int virq,
 		radix_tree_insert(&domain->revmap_tree, hwirq, irq_data);
 		mutex_unlock(&revmap_trees_mutex);
 	}
-	mutex_unlock(&irq_domain_mutex);
+
+	if (!(flags & IRQDOMAIN_FLAG_HIERARCHY))
+		mutex_unlock(&irq_domain_mutex);
 
 	irq_clear_status_flags(virq, IRQ_NOREQUEST);
 
 	return 0;
 }
+
+int irq_domain_associate(struct irq_domain *domain, unsigned int irq,
+			 irq_hw_number_t hwirq)
+{
+	return __irq_domain_associate(domain, irq, hwirq, 0, NULL);
+}
 EXPORT_SYMBOL_GPL(irq_domain_associate);
 
 void irq_domain_associate_many(struct irq_domain *domain, unsigned int irq_base,
@@ -331,11 +379,144 @@  void irq_domain_associate_many(struct irq_domain *domain, unsigned int irq_base,
 		of_node_full_name(domain->of_node), irq_base, (int)hwirq_base, count);
 
 	for (i = 0; i < count; i++) {
-		irq_domain_associate(domain, irq_base + i, hwirq_base + i);
+		__irq_domain_associate(domain, irq_base + i, hwirq_base + i,
+				       0, NULL);
 	}
 }
 EXPORT_SYMBOL_GPL(irq_domain_associate_many);
 
+int irq_domain_associate_irqs(struct irq_domain *domain, unsigned int irq,
+			      irq_hw_number_t hwirq, unsigned int size,
+			      u32 flags, struct irq_map_info *info)
+{
+	int ret;
+	unsigned int i;
+
+	for (i = 0; i < size; i++) {
+		ret = __irq_domain_associate(domain, irq + i, hwirq + i,
+					     flags, info);
+		if (ret) {
+			irq_domain_disassociate_irqs(domain, irq, hwirq,
+						     i, flags);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+void irq_domain_disassociate_irqs(struct irq_domain *domain, unsigned int irq,
+				  irq_hw_number_t hwirq, unsigned int size,
+				  u32 flags)
+{
+	unsigned int i;
+
+	for (i = 0; i < size; i++)
+		__irq_domain_disassociate(domain, irq + i, hwirq + i, flags);
+}
+
+int irq_domain_alloc_descs(struct irq_domain *domain, int virq,
+			   irq_hw_number_t hwirq, unsigned int size, int node)
+{
+	unsigned int hint;
+
+	/* Allocate a virtual interrupt number */
+	if (virq >= 0) {
+		virq = irq_alloc_descs(virq, virq, size, node);
+	} else {
+		hint = hwirq % nr_irqs;
+		if (hint == 0)
+			hint++;
+		virq = irq_alloc_descs_from(hint, size, node);
+		if (virq <= 0)
+			virq = irq_alloc_descs_from(1, size, node);
+	}
+
+	return virq;
+}
+
+void irq_domain_free_descs(struct irq_domain *domain, unsigned int virq,
+			   unsigned int size)
+{
+	unsigned int i;
+
+	for (i = 0; i < size; i++)
+		irq_free_desc(virq + i);
+}
+
+int irq_domain_alloc_irqs(struct irq_domain *domain, irq_hw_number_t *hwirq,
+			  unsigned int size, struct irq_map_info *info)
+{
+	int virq;
+
+	if (domain == NULL) {
+		domain = irq_default_domain;
+		if (WARN(!domain, "domain is NULL; cannot allocate irq\n"))
+			return -EINVAL;
+	}
+
+	if (*hwirq != IRQDOMAIN_AUTO_ASSIGN_HWIRQ) {
+		/* Check if mapping already exists */
+		virq = irq_find_mapping(domain, *hwirq);
+		if (virq) {
+			pr_debug("-> existing mapping on virq %d\n", virq);
+			return -EEXIST;
+		}
+	} else if (!domain->ops->alloc) {
+		pr_debug("-> ops->alloc() is NULL\n");
+		return -ENOSYS;
+	}
+
+	if (domain->ops->alloc)
+		virq = domain->ops->alloc(domain, hwirq, size, info);
+	else
+		virq = irq_domain_alloc_descs(domain, -1, *hwirq, size,
+					      of_node_to_nid(domain->of_node));
+
+	return virq;
+}
+
+void irq_domain_free_irqs(struct irq_domain *domain, unsigned int virq,
+			  irq_hw_number_t hwirq, unsigned int size)
+{
+	if (domain == NULL) {
+		domain = irq_default_domain;
+		if (WARN(!domain, "domain is NULL; cannot free irq%d\n",
+			 virq))
+			return;
+	}
+
+	if (domain->ops->free)
+		domain->ops->free(domain, virq, hwirq, size);
+	else
+		irq_domain_free_descs(domain, virq, size);
+}
+
+int irq_domain_create_irqs(struct irq_domain *domain, irq_hw_number_t *hwirq,
+			   unsigned int size, struct irq_map_info *info)
+{
+	int virq, ret;
+
+	virq = irq_domain_alloc_irqs(domain, hwirq, size, info);
+	if (virq >= 0) {
+		ret = irq_domain_associate_irqs(domain, virq, *hwirq, size,
+						0, info);
+		if (ret < 0) {
+			irq_domain_free_irqs(domain, virq, *hwirq, size);
+			virq = -1;
+		}
+	}
+
+	return virq;
+}
+
+void irq_domain_destroy_irqs(struct irq_domain *domain, unsigned int virq,
+			     irq_hw_number_t hwirq, unsigned int size)
+{
+	irq_domain_disassociate_irqs(domain, virq, hwirq, size, 0);
+	irq_domain_free_irqs(domain, virq, hwirq, size);
+}
+
 /**
  * irq_create_direct_mapping() - Allocate an irq for direct mapping
  * @domain: domain to allocate the irq for or NULL for default domain
@@ -388,7 +569,6 @@  EXPORT_SYMBOL_GPL(irq_create_direct_mapping);
 unsigned int irq_create_mapping(struct irq_domain *domain,
 				irq_hw_number_t hwirq)
 {
-	unsigned int hint;
 	int virq;
 
 	pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq);
@@ -410,12 +590,8 @@  unsigned int irq_create_mapping(struct irq_domain *domain,
 	}
 
 	/* Allocate a virtual interrupt number */
-	hint = hwirq % nr_irqs;
-	if (hint == 0)
-		hint++;
-	virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node));
-	if (virq <= 0)
-		virq = irq_alloc_desc_from(1, of_node_to_nid(domain->of_node));
+	virq = irq_domain_alloc_descs(domain, -1, hwirq, 1,
+				      of_node_to_nid(domain->of_node));
 	if (virq <= 0) {
 		pr_debug("-> virq allocation failed\n");
 		return 0;