[RESEND,v3,5/6] AHCI: Optimize single IRQ interrupt processing

Message ID b1a04379e05e40f9774fb4668609fac4255a6514.1411297686.git.agordeev@redhat.com
State Not Applicable
Delegated to: David Miller

Commit Message

Alexander Gordeev Sept. 21, 2014, 1:19 p.m. UTC
Split the interrupt service routine into a hardware context handler
and a threaded context handler. That allows ports to be protected
with individual locks rather than a single host-wide lock, and moves
port interrupt handling out of the hardware interrupt context.
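
The split uses the kernel's standard threaded-IRQ facility.
Schematically (the names below are illustrative only; the real
handlers are in the patch):

	/* hardware context: check and ack the controller, defer the work */
	static irqreturn_t foo_hardirq(int irq, void *dev_id)
	{
		struct foo_host *host = dev_id;	/* hypothetical type */

		if (!foo_irq_pending(host))	/* hypothetical check */
			return IRQ_NONE;	/* not ours (shared line) */

		foo_ack_irq(host);		/* minimal hardirq work */
		return IRQ_WAKE_THREAD;		/* wake foo_thread_fn() */
	}

	/* threaded context: runs preemptibly with local interrupts enabled */
	static irqreturn_t foo_thread_fn(int irq, void *dev_id)
	{
		/* heavier per-port servicing under per-port locks goes here */
		return IRQ_HANDLED;
	}

	rc = devm_request_threaded_irq(host->dev, irq, foo_hardirq,
				       foo_thread_fn, IRQF_SHARED,
				       "foo", host);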

Testing was done by transferring 8GB from two hard drives in
parallel using the command 'dd if=/dev/sd{a,b} of=/dev/null'. With
lock_stat statistics I measured access times to the ata_host::lock
spinlock (since the interrupt handler code is entirely covered by
this lock). The lock's average hold time decreased eightfold while
the average wait time halved.

Both before and after the change the transfer time is the same,
while 'perf record -e cycles:k ...' shows 1%-4% of CPU time spent
in the ahci_single_irq_intr() routine before the update, whereas
after the update ahci_single_irq_intr() is not even sampled.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: linux-ide@vger.kernel.org
---
 drivers/ata/ahci.c    | 29 ++++++++++++++++++++++++++--
 drivers/ata/ahci.h    |  1 +
 drivers/ata/libahci.c | 53 ++++++++++++++++++++++++++++++++-------------------
 3 files changed, 61 insertions(+), 22 deletions(-)

Comments

Tejun Heo Sept. 23, 2014, 8:57 p.m. UTC | #1
Hello, Alexander.

On Sun, Sep 21, 2014 at 03:19:28PM +0200, Alexander Gordeev wrote:
> Split the interrupt service routine into a hardware context handler
> and a threaded context handler. That allows ports to be protected
> with individual locks rather than a single host-wide lock, and moves
> port interrupt handling out of the hardware interrupt context.
> 
> Testing was done by transferring 8GB from two hard drives in
> parallel using the command 'dd if=/dev/sd{a,b} of=/dev/null'. With
> lock_stat statistics I measured access times to the ata_host::lock
> spinlock (since the interrupt handler code is entirely covered by
> this lock). The lock's average hold time decreased eightfold while
> the average wait time halved.
> 
> Both before and after the change the transfer time is the same,
> while 'perf record -e cycles:k ...' shows 1%-4% of CPU time spent
> in the ahci_single_irq_intr() routine before the update, whereas
> after the update ahci_single_irq_intr() is not even sampled.

Hmmm... how does it affect single device operation tho?  It does make
individual interrupt handling heavier, no?

Thanks.
Alexander Gordeev Sept. 24, 2014, 10:42 a.m. UTC | #2
On Tue, Sep 23, 2014 at 04:57:10PM -0400, Tejun Heo wrote:
> Hmmm... how does it affect single device operation tho?  It does make
> individual interrupt handling heavier, no?

I think it is difficult to assess "individual interrupt handling", since
it depends on both the hardware and the device access pattern. On the
system I use the results are rather counter-intuitive: neither
ahci_thread_fn() nor ahci_single_irq_intr() shows up in the perf report
at all, while before the change ahci_single_irq_intr() reported 0.00%.

But since the handling is split into two parts, it is rather incorrect to
apply the same metric to the threaded context. Obviously, the threaded
handler is expected to be slowed down by other interrupt handlers, but
the whole system should benefit from it, which is exactly the aim of
this change.

Tejun Heo Sept. 24, 2014, 1:04 p.m. UTC | #3
Hello, Alexander.

On Wed, Sep 24, 2014 at 11:42:15AM +0100, Alexander Gordeev wrote:
> On Tue, Sep 23, 2014 at 04:57:10PM -0400, Tejun Heo wrote:
> > Hmmm... how does it affect single device operation tho?  It does make
> > individual interrupt handling heavier, no?
> 
> > I think it is difficult to assess "individual interrupt handling", since
> > it depends on both the hardware and the device access pattern. On the
> > system I use the results are rather counter-intuitive: neither
> > ahci_thread_fn() nor ahci_single_irq_intr() shows up in the perf report
> > at all, while before the change ahci_single_irq_intr() reported 0.00%.
> > 
> > But since the handling is split into two parts, it is rather incorrect to
> > apply the same metric to the threaded context. Obviously, the threaded
> > handler is expected to be slowed down by other interrupt handlers, but
> > the whole system should benefit from it, which is exactly the aim of
> > this change.

Hmmm, how would the whole system benefit from it if there's only a
single device?  Each individual servicing of the interrupt now does
more, including scheduling, which may end up adding to completion
latency.

The thing I don't get is why multiple MSI handling and this patchset
are tied to threaded interrupt handling.  Splitting locks doesn't
necessarily have much to do with threaded handling, and it's not like
ahci interrupt handling is heavy.  The hot path is pretty short
actually.  The meat of the work - completing requests and propagating
completions - is offloaded to softirq by the block layer anyway.
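
For concreteness, the softirq punt looks roughly like this in the
pre-blk-mq path (blk_complete_request() and blk_done_softirq() are the
actual block layer bits; foo_irq_complete() is a made-up driver hook):

	#include <linux/blkdev.h>

	/* called from (hard or threaded) irq context by the driver */
	static void foo_irq_complete(struct request *rq)
	{
		/*
		 * Doesn't finish the request here; it just queues it on
		 * a per-cpu list and raises BLOCK_SOFTIRQ.  The actual
		 * end_io work runs later from blk_done_softirq().
		 */
		blk_complete_request(rq);
	}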

Just to be clear, I'm not against the proposed changes but wanna
understand the justifications behind them.

Thanks.
Chuck Ebbert Sept. 24, 2014, 1:27 p.m. UTC | #4
On Wed, 24 Sep 2014 09:04:44 -0400
Tejun Heo <tj@kernel.org> wrote:

> Hello, Alexander.
> 
> On Wed, Sep 24, 2014 at 11:42:15AM +0100, Alexander Gordeev wrote:
> > On Tue, Sep 23, 2014 at 04:57:10PM -0400, Tejun Heo wrote:
> > > Hmmm... how does it affect single device operation tho?  It does make
> > > individual interrupt handling heavier, no?
> > 
> > I think it is difficult to assess "individual interrupt handling", since
> > it depends on both the hardware and the device access pattern. On the
> > system I use the results are rather counter-intuitive: neither
> > ahci_thread_fn() nor ahci_single_irq_intr() shows up in the perf report
> > at all, while before the change ahci_single_irq_intr() reported 0.00%.
> > 
> > But since the handling is split into two parts, it is rather incorrect to
> > apply the same metric to the threaded context. Obviously, the threaded
> > handler is expected to be slowed down by other interrupt handlers, but
> > the whole system should benefit from it, which is exactly the aim of
> > this change.
> 
> Hmmm, how would the whole system benefit from it if there's only a
> single device?  Each individual servicing of the interrupt now does
> more, including scheduling, which may end up adding to completion
> latency.
> 

I think he meant other, non-AHCI, interrupt handlers would benefit. A
good test of this patch might be to stream 10Gb ethernet while also
streaming writes to an AHCI device.
Tejun Heo Sept. 24, 2014, 1:36 p.m. UTC | #5
On Wed, Sep 24, 2014 at 08:27:12AM -0500, Chuck Ebbert wrote:
> I think he meant other, non-AHCI, interrupt handlers would benefit. A
> good test of this patch might be to stream 10Gb ethernet while also
> streaming writes to an AHCI device.

I'm a bit doubtful this'd make a noticeable difference.  The ahci
interrupt handler doesn't do that much to begin with.

Thanks.
Alexander Gordeev Sept. 24, 2014, 2:08 p.m. UTC | #6
On Wed, Sep 24, 2014 at 09:04:44AM -0400, Tejun Heo wrote:
> Hello, Alexander.
> 
> On Wed, Sep 24, 2014 at 11:42:15AM +0100, Alexander Gordeev wrote:
> > On Tue, Sep 23, 2014 at 04:57:10PM -0400, Tejun Heo wrote:
> > > Hmmm... how does it affect single device operation tho?  It does make
> > > individual interrupt handling heavier, no?
> > 
> > I think it is difficult to assess "individual interrupt handling", since
> > it depends on both the hardware and the device access pattern. On the
> > system I use the results are rather counter-intuitive: neither
> > ahci_thread_fn() nor ahci_single_irq_intr() shows up in the perf report
> > at all, while before the change ahci_single_irq_intr() reported 0.00%.
> > 
> > But since the handling is split into two parts, it is rather incorrect to
> > apply the same metric to the threaded context. Obviously, the threaded
> > handler is expected to be slowed down by other interrupt handlers, but
> > the whole system should benefit from it, which is exactly the aim of
> > this change.
> 
> Hmmm, how would the whole system benefit from it if there's only a
> single device?  Each individual servicing of the interrupt now does
> more, including scheduling, which may end up adding to completion
> latency.

As Chuck noticed, non-AHCI hardware context handlers will benefit.

> The thing I don't get is why multiple MSI handling and this patchset
> are tied to threaded interrupt handling.

Multiple MSIs were implemented with the above aim (let's call it aim #1)
right from the start. Single MSI/IRQ handling is being updated with this
series.

> Splitting locks doesn't
> necessarily have much to do with threaded handling, and it's not like
> ahci interrupt handling is heavy.  The hot path is pretty short
> actually.  The meat of the work - completing requests and propagating
> completions - is offloaded to softirq by the block layer anyway.

So the aim (let's say aim #2) is to prevent any of those from competing
with the hardware context handler. IOW, not to wait on host/port
spinlocks with local interrupts disabled unnecessarily.

I assume that if the two interrupt contexts had existed at the time the
original handlers were written, they would have been written the way I
propose now :)

> Just to be clear, I'm not against the proposed changes but wanna
> understand the justifications behind them.

Should I send the fixed series? ;)

Tejun Heo Sept. 24, 2014, 2:39 p.m. UTC | #7
Hello, Alexander.

On Wed, Sep 24, 2014 at 03:08:44PM +0100, Alexander Gordeev wrote:
> > Hmmm, how would the whole system benefit from it if there's only a
> > single device?  Each individual servicing of the interrupt now does
> > more, including scheduling, which may end up adding to completion
> > latency.
> 
> As Chuck noticed, non-AHCI hardware context handlers will benefit.

Maybe I'm off but I'm kinda skeptical that we'd be gaining back the
overhead we pay by punting to a thread.

> > The thing I don't get is why multiple MSI handling and this patchset
> > are tied to threaded interrupt handling.
> 
> Multiple MSIs were implemented with the above aim (let's call it aim #1)
> right from the start. Single MSI/IRQ handling is being updated with this
> series.

Yeah, I get that.  I'm curious whether that was justified.

> > Splitting locks doesn't
> > necessarily have much to do with threaded handling, and it's not like
> > ahci interrupt handling is heavy.  The hot path is pretty short
> > actually.  The meat of the work - completing requests and propagating
> > completions - is offloaded to softirq by the block layer anyway.
> 
> So the aim (let's say aim #2) is to prevent any of those from competing
> with the hardware context handler. IOW, not to wait on host/port
> spinlocks with local interrupts disabled unnecessarily.
> 
> I assume that if the two interrupt contexts had existed at the time the
> original handlers were written, they would have been written the way I
> propose now :)

Maybe it makes sense with many high-speed devices attached to a single
host; otherwise, I think we'd prolly be paying more than we're
gaining.  Lock splitting itself is likely beneficial, as our issue path
is a lot heavier than the completion path, but I'm not too sure about
splitting completion contexts, especially given that completions for
the block layer and up are already punted to softirq.

Would it be possible for you to compare threaded vs. unthreaded under
relatively heavy load?  ie. let the interrupt handler access the irq
status under the host lock but release it and then go through the
per-port locks from the interrupt handler.
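
Something along these lines (a rough, untested sketch which assumes
the per-port locks from this series are already in place):

	static irqreturn_t ahci_single_irq_unthreaded(int irq, void *dev_instance)
	{
		struct ata_host *host = dev_instance;
		struct ahci_host_priv *hpriv = host->private_data;
		void __iomem *mmio = hpriv->mmio;
		u32 irq_stat, irq_masked;
		unsigned int i, handled = 0;

		/* take the host lock only to sample the global status */
		spin_lock(&host->lock);
		irq_stat = readl(mmio + HOST_IRQ_STAT);
		spin_unlock(&host->lock);

		if (!irq_stat)
			return IRQ_NONE;

		irq_masked = irq_stat & hpriv->port_map;

		for (i = 0; i < host->n_ports; i++) {
			struct ata_port *ap = host->ports[i];
			void __iomem *port_mmio;
			u32 status;

			if (!(irq_masked & (1 << i)) || !ap)
				continue;

			/* per-port lock, still in hardirq context */
			spin_lock(ap->lock);
			port_mmio = ahci_port_base(ap);
			status = readl(port_mmio + PORT_IRQ_STAT);
			writel(status, port_mmio + PORT_IRQ_STAT);
			ahci_handle_port_interrupt(ap, status);
			spin_unlock(ap->lock);
			handled = 1;
		}

		/* ack the host-level status last, as the current handler does */
		spin_lock(&host->lock);
		writel(irq_stat, mmio + HOST_IRQ_STAT);
		spin_unlock(&host->lock);

		return IRQ_RETVAL(handled);
	}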

Thanks for doing this!
Alexander Gordeev Sept. 24, 2014, 2:59 p.m. UTC | #8
On Wed, Sep 24, 2014 at 10:39:13AM -0400, Tejun Heo wrote:
> Would it be possible for you to compare threaded vs. unthreaded under
> relatively heavy load?

I will try, although not quite soon.

In the meantime I could fix and resend patches 1,2,3 and 6 as
they are not related to this topic. Makes sense?

Elliott, Robert (Server Storage) Sept. 25, 2014, 3 a.m. UTC | #9
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Tejun Heo
...
> The thing I don't get is why multiple MSI handling and this patchset
> are tied to threaded interrupt handling.  Splitting locks doesn't
> necessarily have much to do with threaded handling, and it's not like
> ahci interrupt handling is heavy.  The hot path is pretty short
> actually.  The meat of the work - completing requests and propagating
> completions - is offloaded to softirq by the block layer anyway.

blk-mq/scsi-mq chose to move all completion work into hardirq context,
so this seems headed in a different direction.


Tejun Heo Sept. 25, 2014, 3:27 a.m. UTC | #10
On Wed, Sep 24, 2014 at 03:59:07PM +0100, Alexander Gordeev wrote:
> On Wed, Sep 24, 2014 at 10:39:13AM -0400, Tejun Heo wrote:
> > Would it be possible for you to compare threaded vs. unthreaded under
> > relatively heavy load?
> 
> I will try, although not quite soon.
> 
> In the meantime I could fix and resend patches 1,2,3 and 6 as
> they are not related to this topic. Makes sense?

Yeap, sure thing.

Thanks.
Alexander Gordeev Oct. 1, 2014, 3:31 p.m. UTC | #11
On Wed, Sep 24, 2014 at 10:39:13AM -0400, Tejun Heo wrote:
> Hello, Alexander.
> 
> On Wed, Sep 24, 2014 at 03:08:44PM +0100, Alexander Gordeev wrote:
> > > Hmmm, how would the whole system benefit from it if there's only a
> > > single device?  Each individual servicing of the interrupt now does
> > > more, including scheduling, which may end up adding to completion
> > > latency.
> > 
> > As Chuck noticed, non-AHCI hardware context handlers will benefit.
> 
> Maybe I'm off but I'm kinda skeptical that we'd be gaining back the
> overhead we pay by punting to a thread.

Hi Tejun,

As odd as it sounds, I failed to mention that there is *no* change in IO
performance at all (on my system): neither with one drive nor with two.
The change is only about how the interrupt handlers co-exist with
other devices.

I am attaching excerpts from some new perf tests I have done (this
time in legacy interrupt mode). As you can see, ahci_interrupt()
CPU time drops from 4% to none.

As for your concern wrt the threaded handler invocation overhead - I am
not quite sure here, but perhaps the SCHED_FIFO policy (which the
handler runs with) makes the difference? Anyway, as said above, the
overall IO does not suffer.
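
(For reference, the genirq core promotes every IRQ thread to SCHED_FIFO
when it creates it; from memory, kernel/irq/manage.c does roughly the
following, so the threaded handler preempts normal tasks:)

	static void setup_irq_thread_policy(struct task_struct *t)
	{
		/* approximately what setup_irq_thread() does */
		static const struct sched_param param = {
			.sched_priority = MAX_USER_RT_PRIO / 2,
		};

		sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
	}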

Alexander Gordeev Oct. 1, 2014, 3:39 p.m. UTC | #12
On Wed, Oct 01, 2014 at 04:31:14PM +0100, Alexander Gordeev wrote:
> I am attaching excerpts from some new perf tests I have done (this

Attaching :)
Tejun Heo Oct. 5, 2014, 3:23 a.m. UTC | #13
Hey, Alexander.

On Wed, Oct 01, 2014 at 04:31:15PM +0100, Alexander Gordeev wrote:
> As for your concern wrt the threaded handler invocation overhead - I am
> not quite sure here, but perhaps the SCHED_FIFO policy (which the
> handler runs with) makes the difference? Anyway, as said above, the
> overall IO does not suffer.

Hmmm.... so, AFAICS, there are no real pros or cons to going either way,
right?  The only thing which could be different is possibly slightly
lower latency in servicing other IRQs or RT tasks on the same CPU, but
given that the ahci IRQ handler already doesn't do anything which
takes time, I'm doubtful whether that'd be anything measurable.

I just don't get why ahci bothers with threaded irq, MMSI or not.

Thanks.
Tejun Heo Oct. 5, 2014, 4:16 p.m. UTC | #14
A bit of addition.

On Sat, Oct 04, 2014 at 10:23:11PM -0400, Tejun Heo wrote:
> Hmmm.... so, AFAICS, there are no real pros or cons to going either way,
> right?  The only thing which could be different is possibly slightly
> lower latency in servicing other IRQs or RT tasks on the same CPU, but
> given that the ahci IRQ handler already doesn't do anything which
> takes time, I'm doubtful whether that'd be anything measurable.
> 
> I just don't get why ahci bothers with threaded irq, MMSI or not.

I think the thing which bothers me is that due to the softirq we end up
bouncing the context twice.  The IRQ handler schedules the threaded IRQ
handler after doing a minimal amount of work.  The threaded IRQ handler
gets scheduled and again doesn't do much; it basically just schedules
the block softirq to actually run the completions, which is the heavier
part.  Apparently this doesn't seem to hurt measurably, but it's just
weird.  Why are we bouncing the context twice?

Thanks.
Alexander Gordeev Oct. 6, 2014, 7:27 a.m. UTC | #15
On Sun, Oct 05, 2014 at 12:16:46PM -0400, Tejun Heo wrote:
> I think the thing which bothers me is that due to the softirq we end up
> bouncing the context twice.  The IRQ handler schedules the threaded IRQ
> handler after doing a minimal amount of work.  The threaded IRQ handler
> gets scheduled and again doesn't do much; it basically just schedules
> the block softirq to actually run the completions, which is the heavier
> part.  Apparently this doesn't seem to hurt measurably, but it's just
> weird.

Hi Tejun,

That is exactly the point I was concerned with when I stated in one of
the changelogs "The downside of this change is introduction of a kernel
thread". Splitting the service routine in two parts is a small change
(in terms of code familiarity). Yet it right away provides benefits I
could observe and justify (to myself at least).

> Why are we bouncing the context twice?

I *did* consider moving the threaded handler code to the softirq part.
I just wanted to get the updates in stages: to address hardware
interrupt latency first and possibly the threaded handler next. Getting
these two done together would be too big a change for me ;)

Tejun Heo Oct. 6, 2014, 12:58 p.m. UTC | #16
Hello, Alexander.

On Mon, Oct 06, 2014 at 08:27:11AM +0100, Alexander Gordeev wrote:
> > Why are we bouncing the context twice?
> 
> I *did* consider moving the threaded handler code to the softirq part.
> I just wanted to get the updates in stages: to address hardware
> interrupt latency first and possibly the threaded handler next. Getting
> these two done together would be too big a change for me ;)

I don't think we'd be able to move the libata handling to the block
softirq and would probably end up just doing it from the irq context.
Anyways, as long as you're gonna keep working on it, I have no
objection to the proposed changes.  Do you have a refreshed version or
is the current version good for inclusion?

Thanks.
Alexander Gordeev Oct. 6, 2014, 1:24 p.m. UTC | #17
On Mon, Oct 06, 2014 at 08:58:17AM -0400, Tejun Heo wrote:
> I don't think we'd be able to move the libata handling to the block
> softirq and would probably end up just doing it from the irq context.
> Anyways, as long as you're gonna keep working on it, I have no
> objection to the proposed changes.  Do you have a refreshed version or
> is the current version good for inclusion?

No, this one would not apply. I can send an updated version on top of
the v5 I posted earlier. Should I?

Tejun Heo Oct. 6, 2014, 2:54 p.m. UTC | #18
On Mon, Oct 06, 2014 at 02:24:46PM +0100, Alexander Gordeev wrote:
> On Mon, Oct 06, 2014 at 08:58:17AM -0400, Tejun Heo wrote:
> > I don't think we'd be able to move the libata handling to the block
> > softirq and would probably end up just doing it from the irq context.
> > Anyways, as long as you're gonna keep working on it, I have no
> > objection to the proposed changes.  Do you have a refreshed version or
> > is the current version good for inclusion?
> 
> No, this one would not apply. I can send an updated version on top of
> the v5 I posted earlier. Should I?

Yeap, please do so.

Thanks a lot for your patience!  :)
Patch

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 4a849f8..0a6d112 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1280,6 +1280,31 @@  out_free_irqs:
 	return rc;
 }
 
+static int ahci_host_activate_single_irq(struct ata_host *host, int irq,
+					 struct scsi_host_template *sht)
+{
+	int i, rc;
+
+	rc = ata_host_start(host);
+	if (rc)
+		return rc;
+
+	rc = devm_request_threaded_irq(host->dev, irq, ahci_single_irq_intr,
+				       ahci_thread_fn, IRQF_SHARED,
+				       dev_driver_string(host->dev), host);
+	if (rc)
+		return rc;
+
+	for (i = 0; i < host->n_ports; i++)
+		ata_port_desc(host->ports[i], "irq %d", irq);
+
+	rc = ata_host_register(host, sht);
+	if (rc)
+		devm_free_irq(host->dev, irq, host);
+
+	return rc;
+}
+
 /**
  *	ahci_host_activate - start AHCI host, request IRQs and register it
  *	@host: target ATA host
@@ -1305,8 +1330,8 @@  int ahci_host_activate(struct ata_host *host, int irq,
 	if (hpriv->flags & AHCI_HFLAG_MULTI_MSI)
 		rc = ahci_host_activate_multi_irqs(host, irq, sht);
 	else
-		rc = ata_host_activate(host, irq, ahci_single_irq_intr,
-				       IRQF_SHARED, sht);
+		rc = ahci_host_activate_single_irq(host, irq, sht);
+	return rc;
 }
 EXPORT_SYMBOL_GPL(ahci_host_activate);
 
diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h
index 44c02f7..c12f590 100644
--- a/drivers/ata/ahci.h
+++ b/drivers/ata/ahci.h
@@ -390,6 +390,7 @@  void ahci_set_em_messages(struct ahci_host_priv *hpriv,
 int ahci_reset_em(struct ata_host *host);
 irqreturn_t ahci_single_irq_intr(int irq, void *dev_instance);
 irqreturn_t ahci_multi_irqs_intr(int irq, void *dev_instance);
+irqreturn_t ahci_thread_fn(int irq, void *dev_instance);
 irqreturn_t ahci_port_thread_fn(int irq, void *dev_instance);
 void ahci_print_info(struct ata_host *host, const char *scc_s);
 int ahci_host_activate(struct ata_host *host, int irq,
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index cbe7757..169c272 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1778,17 +1778,6 @@  static void ahci_handle_port_interrupt(struct ata_port *ap, u32 status)
 	}
 }
 
-static void ahci_port_intr(struct ata_port *ap)
-{
-	void __iomem *port_mmio = ahci_port_base(ap);
-	u32 status;
-
-	status = readl(port_mmio + PORT_IRQ_STAT);
-	writel(status, port_mmio + PORT_IRQ_STAT);
-
-	ahci_handle_port_interrupt(ap, status);
-}
-
 irqreturn_t ahci_port_thread_fn(int irq, void *dev_instance)
 {
 	struct ata_port *ap = dev_instance;
@@ -1810,6 +1799,35 @@  irqreturn_t ahci_port_thread_fn(int irq, void *dev_instance)
 }
 EXPORT_SYMBOL_GPL(ahci_port_thread_fn);
 
+irqreturn_t ahci_thread_fn(int irq, void *dev_instance)
+{
+	struct ata_host *host = dev_instance;
+	struct ahci_host_priv *hpriv = host->private_data;
+	u32 irq_masked = hpriv->port_map;
+	unsigned int i;
+
+	for (i = 0; i < host->n_ports; i++) {
+		struct ata_port *ap;
+
+		if (!(irq_masked & (1 << i)))
+			continue;
+
+		ap = host->ports[i];
+		if (ap) {
+			ahci_port_thread_fn(irq, ap);
+			VPRINTK("port %u\n", i);
+		} else {
+			VPRINTK("port %u (no irq)\n", i);
+			if (ata_ratelimit())
+				dev_warn(host->dev,
+					 "interrupt on disabled port %u\n", i);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+EXPORT_SYMBOL_GPL(ahci_thread_fn);
+
 static void ahci_update_intr_status(struct ata_port *ap)
 {
 	void __iomem *port_mmio = ahci_port_base(ap);
@@ -1908,7 +1926,7 @@  irqreturn_t ahci_single_irq_intr(int irq, void *dev_instance)
 
 		ap = host->ports[i];
 		if (ap) {
-			ahci_port_intr(ap);
+			ahci_update_intr_status(ap);
 			VPRINTK("port %u\n", i);
 		} else {
 			VPRINTK("port %u (no irq)\n", i);
@@ -1935,7 +1953,7 @@  irqreturn_t ahci_single_irq_intr(int irq, void *dev_instance)
 
 	VPRINTK("EXIT\n");
 
-	return IRQ_RETVAL(handled);
+	return handled ? IRQ_WAKE_THREAD : IRQ_NONE;
 }
 EXPORT_SYMBOL_GPL(ahci_single_irq_intr);
 
@@ -2348,13 +2366,8 @@  static int ahci_port_start(struct ata_port *ap)
 	 */
 	pp->intr_mask = DEF_PORT_IRQ;
 
-	/*
-	 * Switch to per-port locking in case each port has its own MSI vector.
-	 */
-	if ((hpriv->flags & AHCI_HFLAG_MULTI_MSI)) {
-		spin_lock_init(&pp->lock);
-		ap->lock = &pp->lock;
-	}
+	spin_lock_init(&pp->lock);
+	ap->lock = &pp->lock;
 
 	ap->private_data = pp;