[60/98] aacraid: Fix for arrays are going offline in the system. System hangs

Message ID 1373552708-15235-61-git-send-email-luis.henriques@canonical.com
State New

Commit Message

Luis Henriques, July 11, 2013, 2:24 p.m.

-stable review patch.  If anyone has any objections, please let me know.


From: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com>

commit c5bebd829dd95602c15f8da8cc50fa938b5e0254 upstream.

One customer reported that a set of RAID logical arrays became unavailable
(I/O offline) after many hours of an I/O stress test.  The OS was not
accessible afterwards and required a hard reset.

This driver patch fixes a race condition between the doorbell and the
circular buffer. The driver is modified to do an extra read after clearing the
doorbell in case a completion was posted during the small timing window.
With this fix, we ran IO stress for ~13 days. There were no IO failures.

Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[ luis: backported to 3.5: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
---
 drivers/scsi/aacraid/src.c | 3 +++
 1 file changed, 3 insertions(+)


diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index 7628206..4de2612 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -101,6 +101,9 @@  static irqreturn_t aac_src_intr_message(int irq, void *dev_id)
 			struct list_head *entry;
 			int send_it = 0;
 
+			src_writel(dev, MUnit.ODR_C, bellbits);
+			src_readl(dev, MUnit.ODR_C);
+
 			if (dev->sync_fib) {
 				our_interrupt = 1;
 				if (dev->sync_fib->callback)