diff mbox

[v5,06/27] irq_domain/powerpc: eliminate irq_map; use irq_alloc_desc() instead

Message ID 1329383368-12122-7-git-send-email-grant.likely@secretlab.ca (mailing list archive)
State Not Applicable
Headers show

Commit Message

Grant Likely Feb. 16, 2012, 9:09 a.m. UTC
This patch drops the powerpc-specific irq_map table and replaces it with
directly using the irq_alloc_desc()/irq_free_desc() interfaces for allocating
and freeing irq_desc structures.

This patch is a preparation step for generalizing the powerpc-specific virq
infrastructure to become irq_domains.

As part of this change, the irq_big_lock is changed to a mutex from a raw
spinlock.  There is no longer any need to use a spin lock since the irq_desc
allocation code is now responsible for the critical section of finding
an unused range of irq numbers.

The radix lookup table is also changed to store the irq_data pointer instead
of the irq_map entry since the irq_map is removed.  This should end up being
functionally equivalent since only allocated irq_descs are ever added to the
radix tree.

v5: - Really don't ever allocate virq 0.  The previous version could still
      do it if hint == 0
    - Respect irq_virq_count setting for NOMAP.  Some NOMAP domains cannot
      use virq values above irq_virq_count.
    - Use numa_node_id() when allocating irq_descs.  Ideally the API should
      obtain that value from the caller, but that touches a lot of call sites
      so will be deferred to a follow-on patch.
    - Fix irq_find_mapping() to include irq numbers lower than
      NUM_ISA_INTERRUPTS.  With the switch to irq_alloc_desc*(), the lowest
      possible allocated irq is now returned by arch_probe_nr_irqs().
v4: - Fix incorrect access to irq_data structure in debugfs code
    - Don't ever allocate virq 0

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
---
 arch/powerpc/include/asm/irq.h |   27 -----
 arch/powerpc/kernel/irq.c      |  240 ++++++++++++----------------------------
 2 files changed, 69 insertions(+), 198 deletions(-)

Comments

Andreas Schwab April 1, 2012, 9:27 p.m. UTC | #1
Grant Likely <grant.likely@secretlab.ca> writes:

> This patch drops the powerpc-specific irq_map table and replaces it with
> directly using the irq_alloc_desc()/irq_free_desc() interfaces for allocating
> and freeing irq_desc structures.

This breaks irqs on PowerMac G5.  I see lost irq errors from the sata
driver.

Note that the breakage only started after the series was merged into
mainline.  But when transplanted upon the parent of the merge this is
the commit that was blamed by bisect.  So something changed between
v3.3-rc3 and the merge of this series that broke it.

Andreas.
Benjamin Herrenschmidt April 2, 2012, 4:21 a.m. UTC | #2
On Sun, 2012-04-01 at 23:27 +0200, Andreas Schwab wrote:
> Grant Likely <grant.likely@secretlab.ca> writes:
> 
> > This patch drops the powerpc-specific irq_map table and replaces it with
> > directly using the irq_alloc_desc()/irq_free_desc() interfaces for allocating
> > and freeing irq_desc structures.
> 
> This breaks irqs on PowerMac G5.  I see lost irq errors from the sata
> driver.
> 
> Note that the breakage only started after the series was merged into
> mainline.  But when transplanted upon the parent of the merge this is
> the commit that was blamed by bisect.  So something changed between
> v3.3-rc3 and the merge of this series that broke it.

can you tell me more ? (What machine precisely, what .config etc...) ?

I tested here on two G5's:

 - PowerMac7,3 aka Dual G5 AGP. This ones has U3 + K2 and
uses cascaded MPICs

 - PowerMac11,2 aka Quad G5. This one has U4 + K2 and uses a single MPIC
in the northbridge

And both appear to work fine with today upstream (basically -rc1).

Cheers,
Ben.
Andreas Schwab April 2, 2012, 10:31 a.m. UTC | #3
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

>  - PowerMac7,3 aka Dual G5 AGP. This ones has U3 + K2 and
> uses cascaded MPICs

That's the same as mine, config is attached.  I'm currently trying to
bisect the commit that actually broke the series.

Andreas.
Andreas Schwab April 2, 2012, 4:29 p.m. UTC | #4
Andreas Schwab <schwab@linux-m68k.org> writes:

> Grant Likely <grant.likely@secretlab.ca> writes:
>
>> This patch drops the powerpc-specific irq_map table and replaces it with
>> directly using the irq_alloc_desc()/irq_free_desc() interfaces for allocating
>> and freeing irq_desc structures.
>
> This breaks irqs on PowerMac G5.  I see lost irq errors from the sata
> driver.

When I revert a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5 ("genirq: Fix
long-term regression in genirq irq_set_irq_type() handling") on top of
3.4-rc1 the sata irq errors disappear, but I see a lot of spurious
interrupts.  Also the X server is broken somehow, though I don't know
whether that is related or a different bug.

Andreas.
Grant Likely April 2, 2012, 8:28 p.m. UTC | #5
On Mon, 02 Apr 2012 18:29:15 +0200, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Andreas Schwab <schwab@linux-m68k.org> writes:
> 
> > Grant Likely <grant.likely@secretlab.ca> writes:
> >
> >> This patch drops the powerpc-specific irq_map table and replaces it with
> >> directly using the irq_alloc_desc()/irq_free_desc() interfaces for allocating
> >> and freeing irq_desc structures.
> >
> > This breaks irqs on PowerMac G5.  I see lost irq errors from the sata
> > driver.
> 
> When I revert a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5 ("genirq: Fix
> long-term regression in genirq irq_set_irq_type() handling") on top of
> 3.4-rc1 the sata irq errors disappear, but I see a lot of spurious
> interrupts.  Also the X server is broken somehow, though I don't know
> whether that is related or a different bug.

That change is:

@@ -61,8 +61,7 @@ int irq_set_irq_type(unsigned int irq, unsigned int type)
                return -EINVAL;
 
        type &= IRQ_TYPE_SENSE_MASK;
-       if (type != IRQ_TYPE_NONE)
-               ret = __irq_set_trigger(desc, irq, type);
+       ret = __irq_set_trigger(desc, irq, type);
        irq_put_desc_busunlock(desc, flags);
        return ret;

So presumably irq_set_irq_type() is getting called with type ==
IRQ_TYPE_NONE.  From Russell's description, presumably that would mean
the G5 sata driver isn't setting the correct type for the interrupt,
but I have *no* idea how that intersects with the change removing the
powerpc irq map.

Can you dump out /debug/powerpc/virq_mapping from both before and
after the irq_map patch is applied?

g.
Thomas Gleixner April 2, 2012, 8:52 p.m. UTC | #6
On Mon, 2 Apr 2012, Andreas Schwab wrote:

> Andreas Schwab <schwab@linux-m68k.org> writes:
> 
> > Grant Likely <grant.likely@secretlab.ca> writes:
> >
> >> This patch drops the powerpc-specific irq_map table and replaces it with
> >> directly using the irq_alloc_desc()/irq_free_desc() interfaces for allocating
> >> and freeing irq_desc structures.
> >
> > This breaks irqs on PowerMac G5.  I see lost irq errors from the sata
> > driver.
> 
> When I revert a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5 ("genirq: Fix
> long-term regression in genirq irq_set_irq_type() handling") on top of
> 3.4-rc1 the sata irq errors disappear, but I see a lot of spurious

Hmm. Which irq chip is handling the interrupt for that sata irq ?

Thanks,

	tglx
Benjamin Herrenschmidt April 2, 2012, 9:11 p.m. UTC | #7
On Mon, 2012-04-02 at 12:31 +0200, Andreas Schwab wrote:
> >  - PowerMac7,3 aka Dual G5 AGP. This ones has U3 + K2 and
> > uses cascaded MPICs
> 
> That's the same as mine, config is attached.  I'm currently trying to
> bisect the commit that actually broke the series. 

Hrm, odd.. I'll dbl check today that I didn't b0rk something in my test
(like loading the wrong kernel :-) I'll also try with your config.

Cheers,
Ben.
Benjamin Herrenschmidt April 2, 2012, 9:20 p.m. UTC | #8
On Mon, 2012-04-02 at 22:52 +0200, Thomas Gleixner wrote:
> > When I revert a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5 ("genirq: Fix
> > long-term regression in genirq irq_set_irq_type() handling") on top
> of
> > 3.4-rc1 the sata irq errors disappear, but I see a lot of spurious
> 
> Hmm. Which irq chip is handling the interrupt for that sata irq ?

MPIC. The irq is a PCI IRQ, and so should be level-low, it should have
been mapped by the PCI code using the of_* calls.

Cheers,
Ben.
Andreas Schwab April 2, 2012, 9:22 p.m. UTC | #9
Thomas Gleixner <tglx@linutronix.de> writes:

> Hmm. Which irq chip is handling the interrupt for that sata irq ?

It's MPIC 1.

Andreas.
Thomas Gleixner April 2, 2012, 9:27 p.m. UTC | #10
On Tue, 3 Apr 2012, Benjamin Herrenschmidt wrote:

> On Mon, 2012-04-02 at 22:52 +0200, Thomas Gleixner wrote:
> > > When I revert a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5 ("genirq: Fix
> > > long-term regression in genirq irq_set_irq_type() handling") on top
> > of
> > > 3.4-rc1 the sata irq errors disappear, but I see a lot of spurious
> > 
> > Hmm. Which irq chip is handling the interrupt for that sata irq ?
> 
> MPIC. The irq is a PCI IRQ, and so should be level-low, it should have
> been mapped by the PCI code using the of_* calls.

So it's covered by this section:

mpic_set_irq_type()

        if (flow_type == IRQ_TYPE_NONE)
                if (mpic->senses && src < mpic->senses_count)
                        flow_type = mpic->senses[src];
        if (flow_type == IRQ_TYPE_NONE)
                flow_type = IRQ_TYPE_LEVEL_LOW;

And with IRQ_TYPE_NONE this is always going to end up with
IRQ_TYPE_LEVEL_LOW simply because the only function which might set
mpic->senses is mpic_set_default_senses(). And this function does not
have a single caller .....

/me scratches head
Russell King - ARM Linux April 2, 2012, 9:55 p.m. UTC | #11
On Mon, Apr 02, 2012 at 02:28:48PM -0600, Grant Likely wrote:
> On Mon, 02 Apr 2012 18:29:15 +0200, Andreas Schwab <schwab@linux-m68k.org> wrote:
> > Andreas Schwab <schwab@linux-m68k.org> writes:
> > 
> > > Grant Likely <grant.likely@secretlab.ca> writes:
> > >
> > >> This patch drops the powerpc-specific irq_map table and replaces it with
> > >> directly using the irq_alloc_desc()/irq_free_desc() interfaces for allocating
> > >> and freeing irq_desc structures.
> > >
> > > This breaks irqs on PowerMac G5.  I see lost irq errors from the sata
> > > driver.
> > 
> > When I revert a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5 ("genirq: Fix
> > long-term regression in genirq irq_set_irq_type() handling") on top of
> > 3.4-rc1 the sata irq errors disappear, but I see a lot of spurious
> > interrupts.  Also the X server is broken somehow, though I don't know
> > whether that is related or a different bug.
> 
> That change is:
> 
> @@ -61,8 +61,7 @@ int irq_set_irq_type(unsigned int irq, unsigned int type)
>                 return -EINVAL;
>  
>         type &= IRQ_TYPE_SENSE_MASK;
> -       if (type != IRQ_TYPE_NONE)
> -               ret = __irq_set_trigger(desc, irq, type);
> +       ret = __irq_set_trigger(desc, irq, type);
>         irq_put_desc_busunlock(desc, flags);
>         return ret;
> 
> So presumably irq_set_irq_type() is getting called with type ==
> IRQ_TYPE_NONE.  From Russell's description, presumably that would mean
> the G5 sata driver isn't setting the correct type for the interrupt,
> but I have *no* idea how that intersects with the change removing the
> powerpc irq map.

Well, presumably someone is calling irq_set_irq_type() asking explicitly
for IRQ_TYPE_NONE.  The code will now (as it always used to before David's
change) do exactly what you ask this to: it will ask the type to be set
to none.

If you don't want to set the type to none, don't call the interface asking
for that to happen.
Benjamin Herrenschmidt April 2, 2012, 10:32 p.m. UTC | #12
On Mon, 2012-04-02 at 23:27 +0200, Thomas Gleixner wrote:
> So it's covered by this section:
> 
> mpic_set_irq_type()
> 
>         if (flow_type == IRQ_TYPE_NONE)
>                 if (mpic->senses && src < mpic->senses_count)
>                         flow_type = mpic->senses[src];
>         if (flow_type == IRQ_TYPE_NONE)
>                 flow_type = IRQ_TYPE_LEVEL_LOW;
> 
> And with IRQ_TYPE_NONE this is always going to end up with
> IRQ_TYPE_LEVEL_LOW simply because the only function which might set
> mpic->senses is mpic_set_default_senses(). And this function does not
> have a single caller .....
> 
> /me scratches head 

Something must have broken ... when converting from the device-tree we
should be calling xlate which should extract the polarity/sense from the
device-tree.

Now, afaik, IRQ_LEVEL_LOW is correct for PCI unless the interrupt is
inverted inside Apple ASIC which is possible... (it's not an external
PCI, it's a cell inside K2).

Cheers,
Ben.
Benjamin Herrenschmidt April 2, 2012, 10:33 p.m. UTC | #13
On Mon, 2012-04-02 at 22:55 +0100, Russell King - ARM Linux wrote:
> Well, presumably someone is calling irq_set_irq_type() asking explicitly
> for IRQ_TYPE_NONE.  The code will now (as it always used to before David's
> change) do exactly what you ask this to: it will ask the type to be set
> to none.
> 
> If you don't want to set the type to none, don't call the interface asking
> for that to happen. 

I think part of the idea with NONE is to use it as "reset that interrupt
to the "default" setting, whatever that means ...

Cheers,
Ben.
Russell King - ARM Linux April 2, 2012, 10:52 p.m. UTC | #14
On Tue, Apr 03, 2012 at 08:33:25AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2012-04-02 at 22:55 +0100, Russell King - ARM Linux wrote:
> > Well, presumably someone is calling irq_set_irq_type() asking explicitly
> > for IRQ_TYPE_NONE.  The code will now (as it always used to before David's
> > change) do exactly what you ask this to: it will ask the type to be set
> > to none.
> > 
> > If you don't want to set the type to none, don't call the interface asking
> > for that to happen. 
> 
> I think part of the idea with NONE is to use it as "reset that interrupt
> to the "default" setting, whatever that means ...

No, it never was (anyone who thought that is plain wrong because that
wasn't my original design.)

NONE really did mean "I don't want any trigger level" for set_irq_type()
as it was originally designed on ARM, and there's a number of PCMCIA
socket drivers which rely on that behaviour: the reason is that with
edge triggered interrupts, Linux normally remembers edges which happen
with the interrupt disabled, and delivers those events when the interrupt
is re-enabled.  There are situations (such as edge triggered status GPIOs)
where we _really_ do want to ignore transitions on an edge triggered
input.  And PCMCIA soc_common uses set_irq_type() with NONE to do this.

The reason you can't request an interrupt with a NONE type is because
I saw that to be pointless - and in any case, the type is a bitmask and
NONE ended up being zero.  Zero is also what you get with request_irq()
used with drivers which don't pass any IRQ trigger flags, so request_irq()
has to ignore this case.

Now, arguably, we could say that there should be an interface for clearing
pending edges in our interrupt model, but traditionally there hasn't been,
even when my ARM IRQ stuff was moved forward into genirq.  However,
subsequently changing genirq in a way which breaks previously working
stuff is a regression, and that's what my revert addresses.

If we want to fix it a better way, then sure, that'll be good.  But what
we shouldn't do is re-introduce one regression to fix a different
regression.

So, Thomas, what do you think about providing a way that a disabled
interrupt could have its pending status cleared such that when it's
enabled, any queued events are ignored?  Maybe an enable_and_clear_irq() ?
Benjamin Herrenschmidt April 2, 2012, 11:38 p.m. UTC | #15
On Mon, 2012-04-02 at 23:52 +0100, Russell King - ARM Linux wrote:
> If we want to fix it a better way, then sure, that'll be good.  But
> what
> we shouldn't do is re-introduce one regression to fix a different
> regression.
> 
> So, Thomas, what do you think about providing a way that a disabled
> interrupt could have its pending status cleared such that when it's
> enabled, any queued events are ignored?  Maybe an
> enable_and_clear_irq() ? 

Ok. Doesn't matter anyway, this shouldn't be the problem in this
specific case. IE. we shouldn't be setting that interrupt to NONE.

We should have parsed the device-tree and set the trigger appropriately.

Cheers,
Ben.
Benjamin Herrenschmidt April 3, 2012, 12:37 a.m. UTC | #16
On Mon, 2012-04-02 at 18:29 +0200, Andreas Schwab wrote:
> > This breaks irqs on PowerMac G5.  I see lost irq errors from the sata
> > driver.
> 
> When I revert a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5 ("genirq: Fix
> long-term regression in genirq irq_set_irq_type() handling") on top of
> 3.4-rc1 the sata irq errors disappear, but I see a lot of spurious
> interrupts.  Also the X server is broken somehow, though I don't know
> whether that is related or a different bug.

I can't explain why but it works for me with your config on a similar
dual G5.

It would be interesting to figure out what causes the call to
set_irq_type(NONE)...

The SATA driver isn't doing anything. What should be happening
is that:

 - pci_read_irq_line() in arch/powerpc/kernel/pci-common.c

This will call of_irq_map_pci() and then call irq_of_create_mapping()
with the result if it's successful (it should be).

 - irq_of_create_mapping in kernel/irq/irqdomain.c

That should then call domain->ops->xlate() where domain is the domain
of the first MPIC (the one in K2), which should return your IRQ number
for the SATA controller (btw, it's 0, in case that matters, ie, HW irq 0
on MPIC 1 iirc) and the interrupt type. This is where you should
get IRQ_TYPE_LEVEL_LOW (i verified and that's what my device-tree
contains).

You can add printk's in there to see what's happening in your case ?

Cheers,
Ben.
Thomas Gleixner April 3, 2012, 8:20 a.m. UTC | #17
On Tue, 3 Apr 2012, Benjamin Herrenschmidt wrote:

> On Mon, 2012-04-02 at 22:55 +0100, Russell King - ARM Linux wrote:
> > Well, presumably someone is calling irq_set_irq_type() asking explicitly
> > for IRQ_TYPE_NONE.  The code will now (as it always used to before David's
> > change) do exactly what you ask this to: it will ask the type to be set
> > to none.
> > 
> > If you don't want to set the type to none, don't call the interface asking
> > for that to happen. 
> 
> I think part of the idea with NONE is to use it as "reset that interrupt
> to the "default" setting, whatever that means ...

Well, the ship side set_type function could simply leave it alone and
not touch the thing at all. That's how the core code did until we
discovered that we broke Russells toys that way.

Thanks,

	tglx
Thomas Gleixner April 3, 2012, 8:23 a.m. UTC | #18
On Mon, 2 Apr 2012, Russell King - ARM Linux wrote:
> If we want to fix it a better way, then sure, that'll be good.  But what
> we shouldn't do is re-introduce one regression to fix a different
> regression.
> 
> So, Thomas, what do you think about providing a way that a disabled
> interrupt could have its pending status cleared such that when it's
> enabled, any queued events are ignored?  Maybe an enable_and_clear_irq() ?
 
We can make it a flag, which you can either add to the irq itself or
hand it in on request_irq().

Does that require a special chip->callback() function as well ?

Thanks,

	tglx
Andreas Schwab April 3, 2012, 12:11 p.m. UTC | #19
Grant Likely <grant.likely@secretlab.ca> writes:

> Can you dump out /debug/powerpc/virq_mapping from both before and
> after the irq_map patch is applied?

before:
virq   hwirq    chip name        chip data           host name
   16  0x00000   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   21  0x00001   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   24  0x00002   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   25  0x00019   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   26  0x0001a   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   27  0x0001b   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   28  0x0001c   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   29  0x0003d   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   30  0x0001e   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   31  0x0003c   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   39  0x00027   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   40  0x00028   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   41  0x00029   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   42  0x0002a   MPIC 2          0xc00000017a011000  /u3@0,f8000000/mpic@f8040000
   47  0x0002f   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   59  0x000fb   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   60  0x000fc   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   61  0x000fd   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   62  0x000fe   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   63  0x0003f   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000

after:
virq   hwirq    chip name        chip data           host name
   16  0x00000   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   21  0x00001   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   24  0x00002   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   25  0x00019   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   26  0x0001a   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   27  0x0001b   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   28  0x0001c   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   30  0x0001e   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   39  0x00027   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   40  0x00028   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   41  0x00029   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   42  0x0002a   MPIC 2          0xc00000017a011000  /u3@0,f8000000/mpic@f8040000
   47  0x0002f   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   59  0x000fb   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   60  0x000fc   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   61  0x000fd   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   62  0x000fe   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   63  0x0003f   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   64  0x0003d   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000
   65  0x0003c   MPIC 1          0xc00000017a010000  /ht@0,f2000000/pci@1/mac-io@7/mpic@40000

But I have NR_IRQS=64.  Bounds checking missing?  Irqs 64/65 are related
to the sound chip (headphone-detect and line-out-detect).

When reconfiguring with NR_IRQS=128 interrupts are working again, but I
still see a lot of spurious interrupts, and the X server is still broken
(no input works, but I still don't know whether that is an unrelated
bug).

This is a sample of /proc/interrupts from 3.3 (with NR_IRQS=64):
           CPU0       CPU1       
 16:       2039       6070   MPIC 1    Level     sata_svw
 21:          0          0   MPIC 1    Edge      i2sbus: i2s-a (tx)
 22:         12         20   MPIC 1    Level   
 23:         14         18   MPIC 1    Level   
 24:          0          0   MPIC 1    Edge      i2sbus: i2s-a (rx)
 25:          3          0   MPIC 1    Level     VIA-PMU
 26:         16         62   MPIC 1    Level     keywest i2c
 27:          0          1   MPIC 1    Level     ohci_hcd:usb2
 28:          0          1   MPIC 1    Level     ohci_hcd:usb3
 29:          0          0   MPIC 1    Edge      headphone-detect
 30:          0          0   MPIC 1    Level     i2sbus: i2s-a (control)
 31:          0          0   MPIC 1    Edge      line-output-detect
 39:         22         64   MPIC 1    Level     pata-pci-macio
 40:          0          2   MPIC 1    Level     firewire_ohci
 41:         52        147   MPIC 1    Level     eth0
 42:       1732       5053   MPIC 2    Level     keywest i2c
 47:          0          0   MPIC 1    Level     GPIO1 ADB
 59:          0          0   MPIC 1    Edge      ipi call function
 60:       2064       1940   MPIC 1    Edge      ipi reschedule
 61:       3406        945   MPIC 1    Edge      ipi call function single
 62:          0          0   MPIC 1    Edge      ipi debugger
 63:         39         91   MPIC 1    Level     ehci_hcd:usb1, ohci_hcd:usb4, ohci_hcd:usb5
LOC:       3503       3719   Local timer interrupts
SPU:          2          0   Spurious interrupts
CNT:          0          0   Performance monitoring interrupts
MCE:          0          0   Machine check exceptions

This is a sample of /proc/interrupts from 3.4-rc1 (with NR_IRQS=128):
           CPU0       CPU1       
 16:       2603       7596   MPIC 1    Level     sata_svw
 21:          1          0   MPIC 1    Edge      i2sbus: i2s-a (tx)
 22:         13         19   MPIC 1    Level   
 23:          8         24   MPIC 1    Level   
 24:          0          1   MPIC 1    Edge      i2sbus: i2s-a (rx)
 25:          2          1   MPIC 1    Level     VIA-PMU
 26:         21         57   MPIC 1    Level     keywest i2c
 27:          0          1   MPIC 1    Level     ohci_hcd:usb2
 28:          0          1   MPIC 1    Level     ohci_hcd:usb3
 30:          0          0   MPIC 1    Level     i2sbus: i2s-a (control)
 39:         39        131   MPIC 1    Level     pata-pci-macio
 40:          2          2   MPIC 1    Level     firewire_ohci
 41:         93        268   MPIC 1    Level     eth0
 42:       8569      24140   MPIC 2    Level     keywest i2c
 47:          0          0   MPIC 1    Level     GPIO1 ADB
 60:          1          0   MPIC 1    Edge      line-output-detect
 61:          1          0   MPIC 1    Edge      headphone-detect
 63:        153        502   MPIC 1    Level     ehci_hcd:usb1, ohci_hcd:usb4, ohci_hcd:usb5
123:          0          0   MPIC 1    Edge      ipi call function
124:       1978       2349   MPIC 1    Edge      ipi reschedule
125:       2356       1816   MPIC 1    Edge      ipi call function single
126:          0          0   MPIC 1    Edge      ipi debugger
LOC:       4417       7985   Local timer interrupts
SPU:       9586      25811   Spurious interrupts
CNT:          0          0   Performance monitoring interrupts
MCE:          0          0   Machine check exceptions

Andreas.
Benjamin Herrenschmidt April 3, 2012, 9:43 p.m. UTC | #20
On Tue, 2012-04-03 at 14:11 +0200, Andreas Schwab wrote:
> 
> When reconfiguring with NR_IRQS=128 interrupts are working again, but I
> still see a lot of spurious interrupts, and the X server is still broken
> (no input works, but I still don't know whether that is an unrelated
> bug). 

I have an idea or two about the spurrious interrupt business, I'll have
a look later today hopefully. As for the X server, I don't think it's
related, but of course it's hard to tell. Do you see a relevant
difference in the X log ?

Cheers,
Ben.
Andreas Schwab April 4, 2012, 12:51 p.m. UTC | #21
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> Do you see a relevant difference in the X log ?

Nothing at all.

Andreas.
Andreas Schwab April 5, 2012, 12:35 p.m. UTC | #22
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> Hrm, odd.. I'll dbl check today that I didn't b0rk something in my test
> (like loading the wrong kernel :-) I'll also try with your config.

Do you have anything that triggers loading the sound drivers?  That's
exactly the point where the irq problems start.

Andreas.
Andreas Schwab April 6, 2012, 11:51 a.m. UTC | #23
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> Ok. Doesn't matter anyway, this shouldn't be the problem in this
> specific case. IE. we shouldn't be setting that interrupt to NONE.

After removinge the call to irq_set_irq_type in mpic_host_map the irq
problem disappears.

Instead of the irq_set_irq_type(,NONE) calls from mpic_host_map I see
corresponding irq_set_irq_type(,LEVEL_LOW) calls now.

Andreas.
Benjamin Herrenschmidt April 6, 2012, 11:37 p.m. UTC | #24
On Fri, 2012-04-06 at 13:51 +0200, Andreas Schwab wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
> > Ok. Doesn't matter anyway, this shouldn't be the problem in this
> > specific case. IE. we shouldn't be setting that interrupt to NONE.
> 
> After removinge the call to irq_set_irq_type in mpic_host_map the irq
> problem disappears.
> 
> Instead of the irq_set_irq_type(,NONE) calls from mpic_host_map I see
> corresponding irq_set_irq_type(,LEVEL_LOW) calls now.

It's arguable that this irq_set_irq_type(,NONE) shouln't be there but
still ... it's been around for ever and things worked :-) So something
-else- is causing the problem and I'd like to understand what exactly.

(When I say arguable, I do mean it. There are some reasons to keep it
even if we agree that it's not "setting a default" but marking the
interrupt as somewhat disabled as Russell wants, that's fine with me, an
MPIC external interrupt should -always- have a proper type set before
being used).

First we need to fix that business with NR_IRQS/nr_irqs everywhere, then
if the problem persists, check why we aren't calling the proper
irq_set_irq_type() after the mapping is established (we should be).

Cheers,
Ben.
Andreas Schwab April 7, 2012, 12:27 p.m. UTC | #25
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> It's arguable that this irq_set_irq_type(,NONE) shouln't be there but
> still ... it's been around for ever and things worked :-) So something
> -else- is causing the problem and I'd like to understand what exactly.

AFAICS before a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5
irq_set_irq_type(,NONE) was actually a no-op.

Andreas.
Benjamin Herrenschmidt April 11, 2012, 1:13 a.m. UTC | #26
On Sat, 2012-04-07 at 14:27 +0200, Andreas Schwab wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
> > It's arguable that this irq_set_irq_type(,NONE) shouln't be there but
> > still ... it's been around for ever and things worked :-) So something
> > -else- is causing the problem and I'd like to understand what exactly.
> 
> AFAICS before a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5
> irq_set_irq_type(,NONE) was actually a no-op.

So I'm still a bit baffled... ie, I understand some of what's happening
but not why it breaks things, I haven't yet managed to reproduce but I
haven't tried too hard just yet (was away from the HW) :

Basically, we end up setting the mpic to IRQ_TYPE_LEVEL_LOW as a result
of a request to set it to IRQ_TYPE_NONE, ie, we apply a "default"
setting instead of just setting it to "none" (in part because we don't
have a way in HW to disable the trigger completely. We then write that
into the irq_desc with irqd_set_trigger_type().

Then we return IRQ_SET_MASK_OK_NOCOPY, which means that the generic
__irq_set_trigger() code will read back what we've set in the desc and
adjust things accordingly, rather than writing the original value in,
ie, our descriptor ends up being effectively set to IRQ_TYPE_LEVEL_LOW
rather than IRQ_TYPE_NONE.

Later, when we actually try to set it to IRQ_TYPE_LEVEL_LOW as a result
of parsing the device-tree, we then hit the code
irq_create_of_mapping(), at the bottom, which will skip the call to
irq_set_irq_type() if we are trying to set it to the same type that it's
already set to....

But I don't see why that breaks anything, ie we -have- set things to
LEVEL_LOW as a result of IRQ_TYPE_NONE, which is the right setting, so
things should work despite the fact that we skip the second call.

Out of curiosity, can you try removing the check in irqdomain and always
apply the setting regardless of the previous value (ie, remove the check
of type against irqd_get_trigger at the end of irq_create_of_mapping),
see if that makes a difference ?

Cheers,
Ben.
Benjamin Herrenschmidt April 11, 2012, 1:33 a.m. UTC | #27
On Wed, 2012-04-11 at 11:13 +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2012-04-07 at 14:27 +0200, Andreas Schwab wrote:
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> > 
> > > It's arguable that this irq_set_irq_type(,NONE) shouln't be there but
> > > still ... it's been around for ever and things worked :-) So something
> > > -else- is causing the problem and I'd like to understand what exactly.
> > 
> > AFAICS before a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5
> > irq_set_irq_type(,NONE) was actually a no-op.
> 
> So I'm still a bit baffled... ie, I understand some of what's happening
> but not why it breaks things, I haven't yet managed to reproduce but I
> haven't tried too hard just yet (was away from the HW) :

Allright, I have a repro-case, I'll dig.

Cheers,
Ben.
Benjamin Herrenschmidt April 11, 2012, 5:29 a.m. UTC | #28
On Wed, 2012-04-11 at 11:33 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-04-11 at 11:13 +1000, Benjamin Herrenschmidt wrote:
> > On Sat, 2012-04-07 at 14:27 +0200, Andreas Schwab wrote:
> > > Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> > > 
> > > > It's arguable that this irq_set_irq_type(,NONE) shouln't be there but
> > > > still ... it's been around for ever and things worked :-) So something
> > > > -else- is causing the problem and I'd like to understand what exactly.
> > > 
> > > AFAICS before a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5
> > > irq_set_irq_type(,NONE) was actually a no-op.
> > 
> > So I'm still a bit baffled... ie, I understand some of what's happening
> > but not why it breaks things, I haven't yet managed to reproduce but I
> > haven't tried too hard just yet (was away from the HW) :
> 
> Allright, I have a repro-case, I'll dig.

Ok, so it's Grant's fault :-)

So basically, it's quite subtle and I'm only 99% sure of the details but
I believe what happens is:

 - The audio interrupts get virq 64 and 65 (so above NR_IRQS)

 - The reverse map isn't pre-filled at map time (we should probably do
it nowadays), we do it lazily so ...

 - The audio interrupt happens, it tries to pre-fill the reverse map
and ... fails (see below)

 - This cause irq_linear_revmap() to returns 0

 - This hits another bug in mpic where when that happens it doesn't EOI
the interrupt, which means the priority remains stuck high and we don't
get any other interrupt.

There's thus two bugs here:

 - We don't properly establish the reverse mapping for 64 and 65 (well,
not always, again see below)

 - We don't EOI an interrupt we can't reverse map (we should warn & EOI,
I'll send a patch for that).

The second problem is a bug we shouldn't hit if the first one didn't
happen, the first one comes from way irq_find_mapping() works. Basically
when the revmap entry is 0, we call it to find the mapping & populate
the revmap... and that fails.

That fails because it will only search up to irq_virq_count which is
statically initialized to NR_IRQS, which in our case is 64 so it won't
find our interrupts 64 and 65... 

The reason it works if you don't do the set_type(NONE) is a fluke, it
will crash as soon as you actually do audio:

The set_type(NONE) has the effect in mpic to configure the interrupt to
level low (default). This is also the idle state of the audio interrupt
(which is positive edge), and the MPIC appears to be latching it (yeah
odd, it does seem to latch level interrupts).

So as soon as the audio driver enables it, it fires, triggering the bug.
Without the set_type(NONE) the occurence of the bug is delayed to the
fist time you get a real audio interrupt.

Now what is the solution ? Well, there's several things I think we need
to do:

 - MPIC should properly EOI interrupts it can't revmap (I'll fix that)

 - MPIC should probably not do the set_type(NONE) on map, it's not
useful

 - However, we do have a risk of discrepency between the default trigger
type in the irq_desc vs. the default HW value at startup/map time, so we
do need to do something to reconcile them (that was the intend of NONE I
think)... without that, we risk having irq_create_of_mapping() not
setting the trigger because it thinks it thinks it's right... What do
you think is best here ? We can probably read the HW config and sync the
irq desc appropriately ...

 - The whole business with irq_virq_count needs fixing. Basically the
default value shouldn't be NR_IRQ. I suggest making it 0 and have all
the use sites do something like:

	max = irq_virq_count ? irq_virq_count : nr_irqs;

(Grant, can you take care of that ?)

 - We still need to clear up all the NR_IRQS occurences in arch/powerpc

Cheers,
Ben.
Grant Likely April 11, 2012, 8:57 p.m. UTC | #29
On Wed, 11 Apr 2012 15:29:42 +1000, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Wed, 2012-04-11 at 11:33 +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2012-04-11 at 11:13 +1000, Benjamin Herrenschmidt wrote:
> > > On Sat, 2012-04-07 at 14:27 +0200, Andreas Schwab wrote:
> > > > Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> > > > 
> > > > > It's arguable that this irq_set_irq_type(,NONE) shouln't be there but
> > > > > still ... it's been around for ever and things worked :-) So something
> > > > > -else- is causing the problem and I'd like to understand what exactly.
> > > > 
> > > > AFAICS before a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5
> > > > irq_set_irq_type(,NONE) was actually a no-op.
> > > 
> > > So I'm still a bit baffled... ie, I understand some of what's happening
> > > but not why it breaks things, I haven't yet managed to reproduce but I
> > > haven't tried too hard just yet (was away from the HW) :
> > 
> > Allright, I have a repro-case, I'll dig.
> 
> Ok, so it's Grant's fault :-)

I pretty much expected it would be.  :-p

> So basically, it's quite subtle and I'm only 99% sure of the details but
> I believe what happens is:
> 
>  - The audio interrupts get virq 64 and 65 (so above NR_IRQS)
> 
>  - The reverse map isn't pre-filled at map time (we should probably do
> it nowadays), we do it lazily so ...

Hmmm... I though I had merged a patch that does that.

/me goes to look again...

Okay, I did write that patch, but I never merged it because it didn't
get much review and I was already nervous about the other irq_domain
changes.  I'll post it again and ask for feedback.

>  - The whole business with irq_virq_count needs fixing. Basically the
> default value shouldn't be NR_IRQ. I suggest making it 0 and have all
> the use sites do something like:
> 
> 	max = irq_virq_count ? irq_virq_count : nr_irqs;
> 
> (Grant, can you take care of that ?)

Yeah, I've got a different way to fix it though.  There is exactly one
user of irq_virq_count in-tree right now: PS3.  Also, irq_virq_count
is only useful for the NOMAP mapping.  So, instead of having a single
global irq_virq_count values, I've dropped it entirely and added a
max_irq argument to irq_domain_add_nomap().  That makes it a property
of an individual nomap irq domain instead of a global system settting.

Hopefully I'll have a draft patch ready today.

g.
Benjamin Herrenschmidt April 11, 2012, 9:37 p.m. UTC | #30
On Wed, 2012-04-11 at 14:57 -0600, Grant Likely wrote:
> 
> Yeah, I've got a different way to fix it though.  There is exactly one
> user of irq_virq_count in-tree right now: PS3.  Also, irq_virq_count
> is only useful for the NOMAP mapping.  So, instead of having a single
> global irq_virq_count values, I've dropped it entirely and added a
> max_irq argument to irq_domain_add_nomap().  That makes it a property
> of an individual nomap irq domain instead of a global system settting.
> 
> Hopefully I'll have a draft patch ready today. 

That works for me. I'll send patches for cleanup MPIC as well.

One thing tho (Thomas, Russell) is that I like using set_irq_trigger to
establish the "defaults" in mpic, ie, it does the descriptor locking
etc... for me, I'd rather avoid open coding all of that. What I need is
a "variant" that doesn't actually change the trigger but instead
initializes the irq_desc with whatever settings the HW currently has
(ie, I need to make sure things are properly in sync) though other
implementations may want to use that for defaults.

Any objection to defining something like IRQ_TYPE_DEFAULT ?

I was thinking about making it equal to IRQ_TYPE_SENSE_MASK since that
can obviously not be a valid trigger value and is distinct from
IRQ_TYPE_NONE.

Cheers,
Ben.
Thomas Gleixner April 11, 2012, 9:47 p.m. UTC | #31
On Thu, 12 Apr 2012, Benjamin Herrenschmidt wrote:

> On Wed, 2012-04-11 at 14:57 -0600, Grant Likely wrote:
> > 
> > Yeah, I've got a different way to fix it though.  There is exactly one
> > user of irq_virq_count in-tree right now: PS3.  Also, irq_virq_count
> > is only useful for the NOMAP mapping.  So, instead of having a single
> > global irq_virq_count values, I've dropped it entirely and added a
> > max_irq argument to irq_domain_add_nomap().  That makes it a property
> > of an individual nomap irq domain instead of a global system settting.
> > 
> > Hopefully I'll have a draft patch ready today. 
> 
> That works for me. I'll send patches for cleanup MPIC as well.
> 
> One thing tho (Thomas, Russell) is that I like using set_irq_trigger to
> establish the "defaults" in mpic, ie, it does the descriptor locking
> etc... for me, I'd rather avoid open coding all of that. What I need is
> a "variant" that doesn't actually change the trigger but instead
> initializes the irq_desc with whatever settings the HW currently has
> (ie, I need to make sure things are properly in sync) though other
> implementations may want to use that for defaults.
> 
> Any objection to defining something like IRQ_TYPE_DEFAULT ?
> 
> I was thinking about making it equal to IRQ_TYPE_SENSE_MASK since that
> can obviously not be a valid trigger value and is distinct from
> IRQ_TYPE_NONE.

No objections.
Grant Likely April 19, 2012, 6:42 p.m. UTC | #32
On Thu, 12 Apr 2012 07:37:28 +1000, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Wed, 2012-04-11 at 14:57 -0600, Grant Likely wrote:
> > 
> > Yeah, I've got a different way to fix it though.  There is exactly one
> > user of irq_virq_count in-tree right now: PS3.  Also, irq_virq_count
> > is only useful for the NOMAP mapping.  So, instead of having a single
> > global irq_virq_count values, I've dropped it entirely and added a
> > max_irq argument to irq_domain_add_nomap().  That makes it a property
> > of an individual nomap irq domain instead of a global system settting.
> > 
> > Hopefully I'll have a draft patch ready today. 
> 
> That works for me. I'll send patches for cleanup MPIC as well.

Okay, I'll wait on these.  The MPIC fixes will need to be applied
before I can apply the automatic revmapping and hint removal patches.
Those patches will also need testing before I apply to linux-next.

g.
diff mbox

Patch

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index cb06b39..abdd7ef 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -191,33 +191,6 @@  extern unsigned int irq_linear_revmap(struct irq_domain *host,
 				      irq_hw_number_t hwirq);
 
 
-
-/**
- * irq_alloc_virt - Allocate virtual irq numbers
- * @host: host owning these new virtual irqs
- * @count: number of consecutive numbers to allocate
- * @hint: pass a hint number, the allocator will try to use a 1:1 mapping
- *
- * This is a low level function that is used internally by irq_create_mapping()
- * and that can be used by some irq controllers implementations for things
- * like allocating ranges of numbers for MSIs. The revmaps are left untouched.
- */
-extern unsigned int irq_alloc_virt(struct irq_domain *host,
-				   unsigned int count,
-				   unsigned int hint);
-
-/**
- * irq_free_virt - Free virtual irq numbers
- * @virq: virtual irq number of the first interrupt to free
- * @count: number of interrupts to free
- *
- * This function is the opposite of irq_alloc_virt. It will not clear reverse
- * maps, this should be done previously by unmap'ing the interrupt. In fact,
- * all interrupts covered by the range being freed should have been unmapped
- * prior to calling this.
- */
-extern void irq_free_virt(unsigned int virq, unsigned int count);
-
 /**
  * irq_early_init - Init irq remapping subsystem
  */
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 7305f2f..03c95f0 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -491,38 +491,29 @@  void do_softirq(void)
  * IRQ controller and virtual interrupts
  */
 
-/* The main irq map itself is an array of NR_IRQ entries containing the
- * associate host and irq number. An entry with a host of NULL is free.
- * An entry can be allocated if it's free, the allocator always then sets
- * hwirq first to the host's invalid irq number and then fills ops.
- */
-struct irq_map_entry {
-	irq_hw_number_t	hwirq;
-	struct irq_domain	*host;
-};
-
 static LIST_HEAD(irq_domain_list);
-static DEFINE_RAW_SPINLOCK(irq_big_lock);
+static DEFINE_MUTEX(irq_domain_mutex);
 static DEFINE_MUTEX(revmap_trees_mutex);
-static struct irq_map_entry irq_map[NR_IRQS];
 static unsigned int irq_virq_count = NR_IRQS;
 static struct irq_domain *irq_default_host;
 
 irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
 {
-	return irq_map[d->irq].hwirq;
+	return d->hwirq;
 }
 EXPORT_SYMBOL_GPL(irqd_to_hwirq);
 
 irq_hw_number_t virq_to_hw(unsigned int virq)
 {
-	return irq_map[virq].hwirq;
+	struct irq_data *irq_data = irq_get_irq_data(virq);
+	return WARN_ON(!irq_data) ? 0 : irq_data->hwirq;
 }
 EXPORT_SYMBOL_GPL(virq_to_hw);
 
 bool virq_is_host(unsigned int virq, struct irq_domain *host)
 {
-	return irq_map[virq].host == host;
+	struct irq_data *irq_data = irq_get_irq_data(virq);
+	return irq_data ? irq_data->domain == host : false;
 }
 EXPORT_SYMBOL_GPL(virq_is_host);
 
@@ -537,11 +528,10 @@  struct irq_domain *irq_alloc_host(struct device_node *of_node,
 				struct irq_domain_ops *ops,
 				irq_hw_number_t inval_irq)
 {
-	struct irq_domain *host;
+	struct irq_domain *host, *h;
 	unsigned int size = sizeof(struct irq_domain);
 	unsigned int i;
 	unsigned int *rmap;
-	unsigned long flags;
 
 	/* Allocate structure and revmap table if using linear mapping */
 	if (revmap_type == IRQ_DOMAIN_MAP_LINEAR)
@@ -559,23 +549,20 @@  struct irq_domain *irq_alloc_host(struct device_node *of_node,
 	if (host->ops->match == NULL)
 		host->ops->match = default_irq_host_match;
 
-	raw_spin_lock_irqsave(&irq_big_lock, flags);
-
-	/* If it's a legacy controller, check for duplicates and
-	 * mark it as allocated (we use irq 0 host pointer for that
-	 */
+	mutex_lock(&irq_domain_mutex);
+	/* Make sure only one legacy controller can be created */
 	if (revmap_type == IRQ_DOMAIN_MAP_LEGACY) {
-		if (irq_map[0].host != NULL) {
-			raw_spin_unlock_irqrestore(&irq_big_lock, flags);
-			of_node_put(host->of_node);
-			kfree(host);
-			return NULL;
+		list_for_each_entry(h, &irq_domain_list, link) {
+			if (WARN_ON(h->revmap_type == IRQ_DOMAIN_MAP_LEGACY)) {
+				mutex_unlock(&irq_domain_mutex);
+				of_node_put(host->of_node);
+				kfree(host);
+				return NULL;
+			}
 		}
-		irq_map[0].host = host;
 	}
-
 	list_add(&host->link, &irq_domain_list);
-	raw_spin_unlock_irqrestore(&irq_big_lock, flags);
+	mutex_unlock(&irq_domain_mutex);
 
 	/* Additional setups per revmap type */
 	switch(revmap_type) {
@@ -584,10 +571,9 @@  struct irq_domain *irq_alloc_host(struct device_node *of_node,
 		host->inval_irq = 0;
 		/* setup us as the host for all legacy interrupts */
 		for (i = 1; i < NUM_ISA_INTERRUPTS; i++) {
-			irq_map[i].hwirq = i;
-			smp_wmb();
-			irq_map[i].host = host;
-			smp_wmb();
+			struct irq_data *irq_data = irq_get_irq_data(i);
+			irq_data->hwirq = i;
+			irq_data->domain = host;
 
 			/* Legacy flags are left to default at this point,
 			 * one can then use irq_create_mapping() to
@@ -604,7 +590,6 @@  struct irq_domain *irq_alloc_host(struct device_node *of_node,
 		for (i = 0; i < revmap_arg; i++)
 			rmap[i] = NO_IRQ;
 		host->revmap_data.linear.size = revmap_arg;
-		smp_wmb();
 		host->revmap_data.linear.revmap = rmap;
 		break;
 	case IRQ_DOMAIN_MAP_TREE:
@@ -622,20 +607,19 @@  struct irq_domain *irq_alloc_host(struct device_node *of_node,
 struct irq_domain *irq_find_host(struct device_node *node)
 {
 	struct irq_domain *h, *found = NULL;
-	unsigned long flags;
 
 	/* We might want to match the legacy controller last since
 	 * it might potentially be set to match all interrupts in
 	 * the absence of a device node. This isn't a problem so far
 	 * yet though...
 	 */
-	raw_spin_lock_irqsave(&irq_big_lock, flags);
+	mutex_lock(&irq_domain_mutex);
 	list_for_each_entry(h, &irq_domain_list, link)
 		if (h->ops->match(h, node)) {
 			found = h;
 			break;
 		}
-	raw_spin_unlock_irqrestore(&irq_big_lock, flags);
+	mutex_unlock(&irq_domain_mutex);
 	return found;
 }
 EXPORT_SYMBOL_GPL(irq_find_host);
@@ -659,33 +643,20 @@  void irq_set_virq_count(unsigned int count)
 static int irq_setup_virq(struct irq_domain *host, unsigned int virq,
 			    irq_hw_number_t hwirq)
 {
-	int res;
-
-	res = irq_alloc_desc_at(virq, 0);
-	if (res != virq) {
-		pr_debug("irq: -> allocating desc failed\n");
-		goto error;
-	}
-
-	/* map it */
-	smp_wmb();
-	irq_map[virq].hwirq = hwirq;
-	smp_mb();
+	struct irq_data *irq_data = irq_get_irq_data(virq);
 
+	irq_data->hwirq = hwirq;
+	irq_data->domain = host;
 	if (host->ops->map(host, virq, hwirq)) {
 		pr_debug("irq: -> mapping failed, freeing\n");
-		goto errdesc;
+		irq_data->domain = NULL;
+		irq_data->hwirq = 0;
+		return -1;
 	}
 
 	irq_clear_status_flags(virq, IRQ_NOREQUEST);
 
 	return 0;
-
-errdesc:
-	irq_free_descs(virq, 1);
-error:
-	irq_free_virt(virq, 1);
-	return -1;
 }
 
 unsigned int irq_create_direct_mapping(struct irq_domain *host)
@@ -698,16 +669,24 @@  unsigned int irq_create_direct_mapping(struct irq_domain *host)
 	BUG_ON(host == NULL);
 	WARN_ON(host->revmap_type != IRQ_DOMAIN_MAP_NOMAP);
 
-	virq = irq_alloc_virt(host, 1, 0);
+	virq = irq_alloc_desc_from(1, 0);
 	if (virq == NO_IRQ) {
 		pr_debug("irq: create_direct virq allocation failed\n");
 		return NO_IRQ;
 	}
+	if (virq >= irq_virq_count) {
+		pr_err("ERROR: no free irqs available below %i maximum\n",
+			irq_virq_count);
+		irq_free_desc(virq);
+		return 0;
+	}
 
 	pr_debug("irq: create_direct obtained virq %d\n", virq);
 
-	if (irq_setup_virq(host, virq, virq))
+	if (irq_setup_virq(host, virq, virq)) {
+		irq_free_desc(virq);
 		return NO_IRQ;
+	}
 
 	return virq;
 }
@@ -747,15 +726,22 @@  unsigned int irq_create_mapping(struct irq_domain *host,
 	} else {
 		/* Allocate a virtual interrupt number */
 		hint = hwirq % irq_virq_count;
-		virq = irq_alloc_virt(host, 1, hint);
+		if (hint == 0)
+			hint = 1;
+		virq = irq_alloc_desc_from(hint, 0);
+		if (!virq)
+			virq = irq_alloc_desc_from(1, 0);
 		if (virq == NO_IRQ) {
 			pr_debug("irq: -> virq allocation failed\n");
 			return NO_IRQ;
 		}
 	}
 
-	if (irq_setup_virq(host, virq, hwirq))
+	if (irq_setup_virq(host, virq, hwirq)) {
+		if (host->revmap_type != IRQ_DOMAIN_MAP_LEGACY)
+			irq_free_desc(virq);
 		return NO_IRQ;
+	}
 
 	pr_debug("irq: irq %lu on host %s mapped to virtual irq %u\n",
 		hwirq, host->of_node ? host->of_node->full_name : "null", virq);
@@ -806,13 +792,14 @@  EXPORT_SYMBOL_GPL(irq_create_of_mapping);
 
 void irq_dispose_mapping(unsigned int virq)
 {
+	struct irq_data *irq_data = irq_get_irq_data(virq);
 	struct irq_domain *host;
 	irq_hw_number_t hwirq;
 
-	if (virq == NO_IRQ)
+	if (virq == NO_IRQ || !irq_data)
 		return;
 
-	host = irq_map[virq].host;
+	host = irq_data->domain;
 	if (WARN_ON(host == NULL))
 		return;
 
@@ -834,7 +821,7 @@  void irq_dispose_mapping(unsigned int virq)
 	smp_mb();
 
 	/* Clear reverse map */
-	hwirq = irq_map[virq].hwirq;
+	hwirq = irq_data->hwirq;
 	switch(host->revmap_type) {
 	case IRQ_DOMAIN_MAP_LINEAR:
 		if (hwirq < host->revmap_data.linear.size)
@@ -848,12 +835,9 @@  void irq_dispose_mapping(unsigned int virq)
 	}
 
 	/* Destroy map */
-	smp_mb();
-	irq_map[virq].hwirq = host->inval_irq;
+	irq_data->hwirq = host->inval_irq;
 
-	irq_free_descs(virq, 1);
-	/* Free it */
-	irq_free_virt(virq, 1);
+	irq_free_desc(virq);
 }
 EXPORT_SYMBOL_GPL(irq_dispose_mapping);
 
@@ -874,16 +858,16 @@  unsigned int irq_find_mapping(struct irq_domain *host,
 		return hwirq;
 
 	/* Slow path does a linear search of the map */
-	if (hint < NUM_ISA_INTERRUPTS)
-		hint = NUM_ISA_INTERRUPTS;
+	if (hint == 0)
+		hint = 1;
 	i = hint;
-	do  {
-		if (irq_map[i].host == host &&
-		    irq_map[i].hwirq == hwirq)
+	do {
+		struct irq_data *data = irq_get_irq_data(i);
+		if (data && (data->domain == host) && (data->hwirq == hwirq))
 			return i;
 		i++;
 		if (i >= irq_virq_count)
-			i = NUM_ISA_INTERRUPTS;
+			i = 1;
 	} while(i != hint);
 	return NO_IRQ;
 }
@@ -928,19 +912,17 @@  int irq_choose_cpu(const struct cpumask *mask)
 unsigned int irq_radix_revmap_lookup(struct irq_domain *host,
 				     irq_hw_number_t hwirq)
 {
-	struct irq_map_entry *ptr;
-	unsigned int virq;
+	struct irq_data *irq_data;
 
 	if (WARN_ON_ONCE(host->revmap_type != IRQ_DOMAIN_MAP_TREE))
 		return irq_find_mapping(host, hwirq);
 
 	/*
-	 * The ptr returned references the static global irq_map.
-	 * but freeing an irq can delete nodes along the path to
+	 * Freeing an irq can delete nodes along the path to
 	 * do the lookup via call_rcu.
 	 */
 	rcu_read_lock();
-	ptr = radix_tree_lookup(&host->revmap_data.tree, hwirq);
+	irq_data = radix_tree_lookup(&host->revmap_data.tree, hwirq);
 	rcu_read_unlock();
 
 	/*
@@ -948,24 +930,20 @@  unsigned int irq_radix_revmap_lookup(struct irq_domain *host,
 	 * Else fallback to linear lookup - this should not happen in practice
 	 * as it means that we failed to insert the node in the radix tree.
 	 */
-	if (ptr)
-		virq = ptr - irq_map;
-	else
-		virq = irq_find_mapping(host, hwirq);
-
-	return virq;
+	return irq_data ? irq_data->irq : irq_find_mapping(host, hwirq);
 }
 
 void irq_radix_revmap_insert(struct irq_domain *host, unsigned int virq,
 			     irq_hw_number_t hwirq)
 {
+	struct irq_data *irq_data = irq_get_irq_data(virq);
+
 	if (WARN_ON(host->revmap_type != IRQ_DOMAIN_MAP_TREE))
 		return;
 
 	if (virq != NO_IRQ) {
 		mutex_lock(&revmap_trees_mutex);
-		radix_tree_insert(&host->revmap_data.tree, hwirq,
-				  &irq_map[virq]);
+		radix_tree_insert(&host->revmap_data.tree, hwirq, irq_data);
 		mutex_unlock(&revmap_trees_mutex);
 	}
 }
@@ -994,86 +972,6 @@  unsigned int irq_linear_revmap(struct irq_domain *host,
 	return revmap[hwirq];
 }
 
-unsigned int irq_alloc_virt(struct irq_domain *host,
-			    unsigned int count,
-			    unsigned int hint)
-{
-	unsigned long flags;
-	unsigned int i, j, found = NO_IRQ;
-
-	if (count == 0 || count > (irq_virq_count - NUM_ISA_INTERRUPTS))
-		return NO_IRQ;
-
-	raw_spin_lock_irqsave(&irq_big_lock, flags);
-
-	/* Use hint for 1 interrupt if any */
-	if (count == 1 && hint >= NUM_ISA_INTERRUPTS &&
-	    hint < irq_virq_count && irq_map[hint].host == NULL) {
-		found = hint;
-		goto hint_found;
-	}
-
-	/* Look for count consecutive numbers in the allocatable
-	 * (non-legacy) space
-	 */
-	for (i = NUM_ISA_INTERRUPTS, j = 0; i < irq_virq_count; i++) {
-		if (irq_map[i].host != NULL)
-			j = 0;
-		else
-			j++;
-
-		if (j == count) {
-			found = i - count + 1;
-			break;
-		}
-	}
-	if (found == NO_IRQ) {
-		raw_spin_unlock_irqrestore(&irq_big_lock, flags);
-		return NO_IRQ;
-	}
- hint_found:
-	for (i = found; i < (found + count); i++) {
-		irq_map[i].hwirq = host->inval_irq;
-		smp_wmb();
-		irq_map[i].host = host;
-	}
-	raw_spin_unlock_irqrestore(&irq_big_lock, flags);
-	return found;
-}
-
-void irq_free_virt(unsigned int virq, unsigned int count)
-{
-	unsigned long flags;
-	unsigned int i;
-
-	WARN_ON (virq < NUM_ISA_INTERRUPTS);
-	WARN_ON (count == 0 || (virq + count) > irq_virq_count);
-
-	if (virq < NUM_ISA_INTERRUPTS) {
-		if (virq + count < NUM_ISA_INTERRUPTS)
-			return;
-		count  =- NUM_ISA_INTERRUPTS - virq;
-		virq = NUM_ISA_INTERRUPTS;
-	}
-
-	if (count > irq_virq_count || virq > irq_virq_count - count) {
-		if (virq > irq_virq_count)
-			return;
-		count = irq_virq_count - virq;
-	}
-
-	raw_spin_lock_irqsave(&irq_big_lock, flags);
-	for (i = virq; i < (virq + count); i++) {
-		struct irq_domain *host;
-
-		host = irq_map[i].host;
-		irq_map[i].hwirq = host->inval_irq;
-		smp_wmb();
-		irq_map[i].host = NULL;
-	}
-	raw_spin_unlock_irqrestore(&irq_big_lock, flags);
-}
-
 int arch_early_irq_init(void)
 {
 	return 0;
@@ -1103,7 +1001,7 @@  static int virq_debug_show(struct seq_file *m, void *private)
 			struct irq_chip *chip;
 
 			seq_printf(m, "%5d  ", i);
-			seq_printf(m, "0x%05lx  ", irq_map[i].hwirq);
+			seq_printf(m, "0x%05lx  ", desc->irq_data.hwirq);
 
 			chip = irq_desc_get_chip(desc);
 			if (chip && chip->name)
@@ -1115,8 +1013,8 @@  static int virq_debug_show(struct seq_file *m, void *private)
 			data = irq_desc_get_chip_data(desc);
 			seq_printf(m, "0x%16p  ", data);
 
-			if (irq_map[i].host && irq_map[i].host->of_node)
-				p = irq_map[i].host->of_node->full_name;
+			if (desc->irq_data.domain->of_node)
+				p = desc->irq_data.domain->of_node->full_name;
 			else
 				p = none;
 			seq_printf(m, "%s\n", p);