diff mbox

please fix FUSION (Was: [v3.13][v3.14][Regression] kthread:makekthread_create()killable)

Message ID 20140321183443.GA3891@redhat.com
State New
Headers show

Commit Message

Oleg Nesterov March 21, 2014, 6:34 p.m. UTC
On 03/20, Oleg Nesterov wrote:
>
> On 03/20, Joseph Salisbury wrote:
> >
> > There was some testing done with your test kernel.  The data collected
> > while running your kernel is available in the bug report:
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705/comments/58
>
> Joseph, thanks a lot.
>
> I'll try to read the logs tomorrow, but at first glance Tetsuo was right,
> I do not see a "long" sleep in that log.

Yes, it seems that it actually needs > 30 secs. It spends most of the time
(30.13286 seconds) in

	msleep+0x20/0x30
	WaitForDoorbellInt+0x103/0x130 [mptbase]
	WaitForDoorbellReply+0x42/0x220 [mptbase]
	mpt_handshake_req_reply_wait+0x1dc/0x2c0 [mptbase]
	SendPortEnable.constprop.23+0x94/0xc0 [mptbase]

WaitForDoorbellInt() does msleep(1) in a loop. This trace starts at the line
6001, and it is repeated 3792 times, till the line 176686 which apparently
shows the trace of the 2nd WaitForDoorbellInt() in WaitForDoorbellReply().

SendPortEnable:

	if (ioc->ir_firmware || ioc->bus_type == SAS) {
		rc = mpt_handshake_req_reply_wait(ioc, req_sz,
		(u32*)&port_enable, reply_sz, (u16*)&reply_buf,
		300 /*seconds*/, sleepFlag);
	} else {
		rc = mpt_handshake_req_reply_wait(ioc, req_sz,
		(u32*)&port_enable, reply_sz, (u16*)&reply_buf,
		30 /*seconds*/, sleepFlag);
	}

I am wondering which branch calls mpt_handshake_req_reply_wait(), the
else's timeout=30 (passed to the 1st WaitForDoorbellInt) suspiciously
matches the time WaitForDoorbellInt() actually runs... But everything
works fine and at first glance the potential timeout error should be
propogated correctly. So "timeout" is probably 300. And probably this
all is fine.

All I can suggest is the dirty hack for now. User-space should be
changed too, I think, but this is another story.

Tetsuo, what do you think?

Oleg.
---

Comments

Linus Torvalds March 21, 2014, 7:32 p.m. UTC | #1
On Fri, Mar 21, 2014 at 11:34 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Yes, it seems that it actually needs > 30 secs. It spends most of the time
> (30.13286 seconds) in [..]

So how about taking a completely different approach:

 - just say that waiting for devices in the module init sequence for
over 30 seconds is really really wrong.

 - make the damn mptsas driver just register the controller from the
init sequence, and then do device discovery asynchronously.

The ATA layer does this correctly: it synchronously finds each host,
but then it does

        /* perform each probe asynchronously */
        for (i = 0; i < host->n_ports; i++) {
                struct ata_port *ap = host->ports[i];
                async_schedule(async_port_probe, ap);
        }

and I really think SCSI drivers should do the same if they have this
kind of "ports can take forever to probe" behavior.

What would be the equivalent magic to do this for SCSI? Could we just
make something like scsi_probe_and_add_lun() just always do this, the
same way ata_host_register() does it?

                   Linus
James Bottomley March 21, 2014, 10:56 p.m. UTC | #2
On Fri, 2014-03-21 at 12:32 -0700, Linus Torvalds wrote:
> On Fri, Mar 21, 2014 at 11:34 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Yes, it seems that it actually needs > 30 secs. It spends most of the time
> > (30.13286 seconds) in [..]
> 
> So how about taking a completely different approach:
> 
>  - just say that waiting for devices in the module init sequence for
> over 30 seconds is really really wrong.
> 
>  - make the damn mptsas driver just register the controller from the
> init sequence, and then do device discovery asynchronously.
> 
> The ATA layer does this correctly: it synchronously finds each host,
> but then it does
> 
>         /* perform each probe asynchronously */
>         for (i = 0; i < host->n_ports; i++) {
>                 struct ata_port *ap = host->ports[i];
>                 async_schedule(async_port_probe, ap);
>         }
> 
> and I really think SCSI drivers should do the same if they have this
> kind of "ports can take forever to probe" behavior.
> 
> What would be the equivalent magic to do this for SCSI? Could we just
> make something like scsi_probe_and_add_lun() just always do this, the
> same way ata_host_register() does it?

Well, we do do this asynchronously.  The idea is that the add host only
initialises the actual hardware.  The port probing is supposed to be
done asynchronously (provided the async probe option is enabled in SCSI,
of course).  The way this is supposed to happen is the driver
initialises the hardware and then calls scsi_scan_host().  If the
platform is set up for async scanning, that kicks off all the async
workqueues and returns (or does it all synchronously if async scanning
isn't enabled).

It is possible fusion gets this wrong because the sas driver doesn't
really couple to SCSI's libsas, which is where it would pick up most of
the generic infrastructure for this.  Plus it depends where all the time
is being wasted.  The fusion was the last sas chipset I got the specs
for (under NDA).  It's actually table driven, so if the problem is the
controller taking ages to fill in the tables it might necessitate a
fusion specific fix.  I can see from the driver that it seems to do all
the probing itself instead of relying on probe callbacks from
scsi_scan_host(), so I know what needs to be fixed ... it's less clear
how easy this would be given how monolithic the routine looks.

James
diff mbox

Patch

--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -5395,6 +5395,8 @@  static struct pci_driver mptsas_driver = {
 #endif
 };
 
+#include <linux/signal.h>
+
 static int __init
 mptsas_init(void)
 {
@@ -5424,7 +5426,31 @@  mptsas_init(void)
 	mpt_event_register(mptsasDoneCtx, mptsas_event_process);
 	mpt_reset_register(mptsasDoneCtx, mptsas_ioc_reset);
 
-	error = pci_register_driver(&mptsas_driver);
+	{
+		sigset_t full, save;
+		/*
+		 * KILL ME. THIS IS THE DIRTY AND WRONG HACK WAITING FOR THE
+		 * FIX FROM MAINTAINERS.
+		 *
+		 * - This driver needs a lot of time to complete SendPortEnable()
+		 *   but systemd-udevd sends SIGKILL after 30 seconds, see
+		 *   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705
+		 *
+		 *   Probably user-space should be changed, but:
+		 *
+		 * - Since commit 786235eeba0e "kthread: make kthread_create()
+		 *   killable" scsi_host_alloc() becomes killable and this SIGKILL
+		 *   crashes the kernel.
+		 *
+		 *   If scsi_host_alloc() fails mptsas_probe() blindly calls
+		 *   mptscsih_remove() and scsi_remove_host() hits host == NULL.
+		 */
+		sigfillset(&full);
+		sigprocmask(SIG_SETMASK, &full, &save);
+		error = pci_register_driver(&mptsas_driver);
+		sigprocmask(SIG_SETMASK, &save, NULL);
+	}
+
 	if (error)
 		sas_release_transport(mptsas_transport_template);