[v8,08/13] libsas: libsas.force_hard_reset module parameter

Message ID 20120210084520.25701.83280.stgit@dwillia2-linux.jf.intel.com
State Not Applicable
Delegated to: David Miller

Commit Message

Dan Williams Feb. 10, 2012, 8:45 a.m. UTC
It is possible for a host to get "locked out" from talking to sata
devices in the domain if, for example, its sas address changes but the
expander topology has existing affiliations with the old address.  If
the system is booted userspace can write to
/sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
if this condition exists for the root device the module parameter can be
used to promote all ata resets to hard resets.

After the system is booted this state can be cleared via
/sys/module/libsas/parameters/force_hard_reset

Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Cc: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/kernel-parameters.txt |    6 ++++++
 drivers/scsi/libsas/sas_init.c      |    6 +++++-
 2 files changed, 11 insertions(+), 1 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
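
The two sysfs writes described in the commit message can be sketched from
userspace as below. This is a minimal illustration, not part of the patch:
write_attr() is a hypothetical helper name, and the value to write to the
hard_reset attribute is an assumption (the message does not state one).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Write a short string to a sysfs-style attribute file.
 * Returns 0 on success or a negative errno value on failure.
 * write_attr() is a hypothetical helper, not a kernel or libc API.
 */
int write_attr(const char *path, const char *value)
{
	FILE *f;
	int err = 0;

	assert(path != NULL && value != NULL);

	f = fopen(path, "w");
	if (!f)
		return errno ? -errno : -EIO;

	if (fputs(value, f) == EOF)
		err = errno ? -errno : -EIO;

	/* Buffered data is flushed on close, so a write error can surface here. */
	if (fclose(f) == EOF && err == 0)
		err = errno ? -errno : -EIO;

	return err;
}
```

With the patch applied, clearing the boot-time override would then look like
write_attr("/sys/module/libsas/parameters/force_hard_reset", "0"), and the
per-phy case would target /sys/class/sas_phy/<phy-X>/hard_reset with the
placeholder replaced by a real phy name.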

Comments

James Bottomley Feb. 29, 2012, 9:55 p.m. UTC | #1
On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
> It is possible for a host to get "locked out" from talking to sata
> devices in the domain if, for example, its sas address changes but the
> expander topology has existing affiliations with the old address.  If
> the system is booted userspace can write to
> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
> if this condition exists for the root device the module parameter can be
> used to promote all ata resets to hard resets.

I don't quite understand this.  Are you saying we can't (or shouldn't)
execute 

/sys/class/sas_phy/<phy-X>/hard_reset

on the root device for some reason?

> After the system is booted this state can be cleared via
> /sys/module/libsas/parameters/force_hard_reset

I really don't think a module parameter for this is such a good idea ...
it effectively promotes all soft resets to being hard ones, which can
have a lot of unintended consequences.

James


Douglas Gilbert Feb. 29, 2012, 10:40 p.m. UTC | #2
On 12-02-29 04:55 PM, James Bottomley wrote:
> On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
>> It is possible for a host to get "locked out" from talking to sata
>> devices in the domain if, for example, its sas address changes but the
>> expander topology has existing affiliations with the old address.  If
>> the system is booted userspace can write to
>> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
>> if this condition exists for the root device the module parameter can be
>> used to promote all ata resets to hard resets.

A point of order: SAS has link resets and hard resets. The
hard reset is a superset of link reset. A "link reset sequence
serves as a hard reset for SATA devices" and hence is
sufficient to reset a SATA device. To reset a SAS device
(e.g. a SAS disk) you need a SAS hard reset. Therefore a link
reset is the appropriately sized "gun" to reset a SATA device.

I have a SAS-2 expander that annoyingly powers up with the
programmed maximum physical link rate of its phys at 3 Gbps
even though its hardware maximum rate is 6 Gbps. For expander
phys connected to SAS-2 disks I can up the programmed maximum
value to 6 Gbps on the expander phy then do a link reset on
that phy. So without upsetting Linux (or any other OS) I can
switch that path from 3 Gbps to 6 Gbps. Can't do that with a
SATA disk without the OS finding out.

Also to clear a SATA affiliation you should be using a SMP
PHY CONTROL (phy_op=6) function.

Doug Gilbert
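
The sizing argument above can be written down as a small decision function.
This is only a sketch of the reasoning in this message, not libsas code: the
enum and function names are hypothetical, and the numeric values for link
reset and hard reset are assumed from the SMP PHY CONTROL phy operation
encoding (only phy_op=6 is stated in the message itself).

```c
#include <assert.h>

/* PHY CONTROL phy operations named in the message above. */
enum phy_op {
	PHY_OP_LINK_RESET        = 1,	/* assumed encoding */
	PHY_OP_HARD_RESET        = 2,	/* assumed encoding */
	PHY_OP_CLEAR_AFFILIATION = 6,	/* the phy_op=6 mentioned above */
};

/* What is attached to the expander phy (hypothetical names). */
enum attached_dev { ATTACHED_SATA, ATTACHED_SAS };

/* What the caller is trying to achieve (hypothetical names). */
enum intent { INTENT_RESET_DEVICE, INTENT_CLEAR_AFFILIATION };

/* Pick the smallest phy operation that achieves the intent. */
enum phy_op pick_phy_op(enum attached_dev dev, enum intent what)
{
	if (what == INTENT_CLEAR_AFFILIATION) {
		/* The message only discusses affiliations for SATA devices. */
		assert(dev == ATTACHED_SATA);
		return PHY_OP_CLEAR_AFFILIATION;
	}

	/* A link reset sequence already acts as a hard reset for SATA. */
	if (dev == ATTACHED_SATA)
		return PHY_OP_LINK_RESET;

	/* A SAS device needs a SAS hard reset. */
	return PHY_OP_HARD_RESET;
}
```

Under these assumptions, pick_phy_op(ATTACHED_SATA, INTENT_RESET_DEVICE)
selects the link reset, which is the point being made against promoting
every ATA reset to a hard reset.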
Dan Williams Feb. 29, 2012, 11:22 p.m. UTC | #3
On Wed, Feb 29, 2012 at 1:55 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
>> It is possible for a host to get "locked out" from talking to sata
>> devices in the domain if, for example, its sas address changes but the
>> expander topology has existing affiliations with the old address.  If
>> the system is booted userspace can write to
>> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
>> if this condition exists for the root device the module parameter can be
>> used to promote all ata resets to hard resets.
>
> I don't quite understand this.  Are you saying we can't (or shouldn't)
> execute
>
> /sys/class/sas_phy/<phy-X>/hard_reset
>
> on the root device for some reason?

The case I ran into was accidentally changing the host sas address
between reboots.  If the sata device had been a root device then I
would not have been able to boot the system.  But now that I think about
it, if Linux could not boot then neither could the pre-os
option-rom/efi driver.

>> After the system is booted this state can be cleared via
>> /sys/module/libsas/parameters/force_hard_reset
>
> I really don't think a module parameter for this is such a good idea ...
> it effectively promotes all soft resets to being hard ones, which can
> have a lot of unintended consequences.

Yes, it was only meant as a temporary "get out of a sticky situation"
option, but given the above pre-os-driver realization it is not even
useful for that case.  So I'm fine killing this patch.

--
Dan
Dan Williams Feb. 29, 2012, 11:27 p.m. UTC | #4
On Wed, Feb 29, 2012 at 2:40 PM, Douglas Gilbert <dgilbert@interlog.com> wrote:
> On 12-02-29 04:55 PM, James Bottomley wrote:
>>
>> On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
>>>
>>> It is possible for a host to get "locked out" from talking to sata
>>> devices in the domain if, for example, its sas address changes but the
>>> expander topology has existing affiliations with the old address.  If
>>> the system is booted userspace can write to
>>> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
>>> if this condition exists for the root device the module parameter can be
>>> used to promote all ata resets to hard resets.
>
>
> A point of order: SAS has link resets and hard resets. The
> hard reset is a superset of link reset. A "link reset sequence
> serves as a hard reset for SATA devices" and hence is
> sufficient to reset a SATA device. To reset a SAS device
> (e.g. a SAS disk) you need a SAS hard reset. Therefore a link
> reset is the appropriately sized "gun" to reset a SATA device.
>
> I have a SAS-2 expander that annoyingly powers up with the
> programmed maximum physical link rate of its phys at 3 Gbps
> even though its hardware maximum rate is 6 Gbps. For expander
> phys connected to SAS-2 disks I can up the programmed maximum
> value to 6 Gbps on the expander phy then do a link reset on
> that phy. So without upsetting Linux (or any other OS) I can
> switch that path from 3 Gbps to 6 Gbps. Can't do that with a
> SATA disk without the OS finding out.

At least now (with these pending patches) if you trigger a link-reset
via the sysfs interface libsas will manage the link recovery like any
other error-recovery initiated reset.

Something like a libsas.force_max_phys_link_rate module parameter
might not be a bad idea for this scenario, since libsas sata discovery
always forces at least one reset of the disk after the phy reports
"attached sata device".

--
Dan
Douglas Gilbert March 1, 2012, 12:23 a.m. UTC | #5
On 12-02-29 06:27 PM, Dan Williams wrote:
> On Wed, Feb 29, 2012 at 2:40 PM, Douglas Gilbert<dgilbert@interlog.com>  wrote:
>> On 12-02-29 04:55 PM, James Bottomley wrote:
>>>
>>> On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
>>>>
>>>> It is possible for a host to get "locked out" from talking to sata
>>>> devices in the domain if, for example, its sas address changes but the
>>>> expander topology has existing affiliations with the old address.  If
>>>> the system is booted userspace can write to
>>>> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
>>>> if this condition exists for the root device the module parameter can be
>>>> used to promote all ata resets to hard resets.
>>
>>
>> A point of order: SAS has link resets and hard resets. The
>> hard reset is a superset of link reset. A "link reset sequence
>> serves as a hard reset for SATA devices" and hence is
>> sufficient to reset a SATA device. To reset a SAS device
>> (e.g. a SAS disk) you need a SAS hard reset. Therefore a link
>> reset is the appropriately sized "gun" to reset a SATA device.
>>
>> I have a SAS-2 expander that annoyingly powers up with the
>> programmed maximum physical link rate of its phys at 3 Gbps
>> even though its hardware maximum rate is 6 Gbps. For expander
>> phys connected to SAS-2 disks I can up the programmed maximum
>> value to 6 Gbps on the expander phy then do a link reset on
>> that phy. So without upsetting Linux (or any other OS) I can
>> switch that path from 3 Gbps to 6 Gbps. Can't do that with a
>> SATA disk without the OS finding out.
>
> At least now (with these pending patches) if you trigger a link-reset
> via the sysfs interface libsas will manage the link recovery like any
> other error-recovery initiated reset.

I can think of 4 cases for link reset. The other end
of the link is:
   a) a SAS target: not error recovery situation
   b) a SAS expander phy: not error recovery situation
   c) a SATA device: error recovery situation
   d) a SAS initiator: not sure, probably not

Doug Gilbert
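
The four cases listed above can be restated as a lookup. This is a sketch of
the classification in this message only; the type and function names are
hypothetical, and case (d) encodes the message's own "probably not" rather
than a confirmed behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* What sits at the other end of the link being reset (hypothetical names). */
enum link_peer {
	PEER_SAS_TARGET,	/* case a */
	PEER_SAS_EXPANDER_PHY,	/* case b */
	PEER_SATA_DEVICE,	/* case c */
	PEER_SAS_INITIATOR,	/* case d */
};

/* True when a link reset should be handled as an error-recovery event. */
bool link_reset_needs_error_recovery(enum link_peer peer)
{
	switch (peer) {
	case PEER_SATA_DEVICE:
		return true;
	case PEER_SAS_TARGET:
	case PEER_SAS_EXPANDER_PHY:
		return false;
	case PEER_SAS_INITIATOR:
		/* "not sure, probably not" in the message above */
		return false;
	}

	assert(0 && "unknown link peer");
	return false;
}
```

Only the SATA case returns true, which matches the one case the message
marks as an error recovery situation.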
Dan Williams March 1, 2012, 12:35 a.m. UTC | #6
On Wed, Feb 29, 2012 at 4:23 PM, Douglas Gilbert <dgilbert@interlog.com> wrote:
> On 12-02-29 06:27 PM, Dan Williams wrote:
>>
>> On Wed, Feb 29, 2012 at 2:40 PM, Douglas Gilbert<dgilbert@interlog.com>
>>  wrote:
>>>
>>> On 12-02-29 04:55 PM, James Bottomley wrote:
>>>>
>>>>
>>>> On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
>>>>>
>>>>>
>>>>> It is possible for a host to get "locked out" from talking to sata
>>>>> devices in the domain if, for example, its sas address changes but the
>>>>> expander topology has existing affiliations with the old address.  If
>>>>> the system is booted userspace can write to
>>>>> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
>>>>> if this condition exists for the root device the module parameter can
>>>>> be
>>>>> used to promote all ata resets to hard resets.
>>>
>>>
>>>
>>> A point of order: SAS has link resets and hard resets. The
>>> hard reset is a superset of link reset. A "link reset sequence
>>> serves as a hard reset for SATA devices" and hence is
>>> sufficient to reset a SATA device. To reset a SAS device
>>> (e.g. a SAS disk) you need a SAS hard reset. Therefore a link
>>> reset is the appropriately sized "gun" to reset a SATA device.
>>>
>>> I have a SAS-2 expander that annoyingly powers up with the
>>> programmed maximum physical link rate of its phys at 3 Gbps
>>> even though its hardware maximum rate is 6 Gbps. For expander
>>> phys connected to SAS-2 disks I can up the programmed maximum
>>> value to 6 Gbps on the expander phy then do a link reset on
>>> that phy. So without upsetting Linux (or any other OS) I can
>>> switch that path from 3 Gbps to 6 Gbps. Can't do that with a
>>> SATA disk without the OS finding out.
>>
>>
>> At least now (with these pending patches) if you trigger a link-reset
>> via the sysfs interface libsas will manage the link recovery like any
>> other error-recovery initiated reset.
>
>
> I can think of 4 cases for link reset. The other end
> of the link is:
>  a) a SAS target: not error recovery situation
>  b) a SAS expander phy: not error recovery situation
>  c) a SATA device: error recovery situation

sas_try_ata_reset() [1] is what promotes user requested resets into
error recovery managed resets if the other end of the link is sata.

[1]: http://git.kernel.org/?p=linux/kernel/git/djbw/isci.git;a=blob;f=drivers/scsi/libsas/sas_init.c;h=57e7ac97b3e3dba3091f83a64c0c32a6660390cb;hb=refs/heads/all#l222
James Bottomley March 1, 2012, 2:27 p.m. UTC | #7
On Wed, 2012-02-29 at 15:22 -0800, Dan Williams wrote:
> On Wed, Feb 29, 2012 at 1:55 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Fri, 2012-02-10 at 00:45 -0800, Dan Williams wrote:
> >> It is possible for a host to get "locked out" from talking to sata
> >> devices in the domain if, for example, its sas address changes but the
> >> expander topology has existing affiliations with the old address.  If
> >> the system is booted userspace can write to
> >> /sys/class/sas_phy/<phy-X>/hard_reset to clear the affiliation, however
> >> if this condition exists for the root device the module parameter can be
> >> used to promote all ata resets to hard resets.
> >
> > I don't quite understand this.  Are you saying we can't (or shouldn't)
> > execute
> >
> > /sys/class/sas_phy/<phy-X>/hard_reset
> >
> > on the root device for some reason?
> 
> The case I ran into was accidentally changing the host sas address
> between reboots.  If the sata device had been a root device then I
> would not have been able to boot the system.  But now that I think about
> it, if Linux could not boot then neither could the pre-os
> option-rom/efi driver.
> 
> >> After the system is booted this state can be cleared via
> >> /sys/module/libsas/parameters/force_hard_reset
> >
> > I really don't think a module parameter for this is such a good idea ...
> > it effectively promotes all soft resets to being hard ones, which can
> > have a lot of unintended consequences.
> 
> Yes, it was only meant as a temporary "get out of a sticky situation"
> option, but given the above pre-os-driver realization it is not even
> useful for that case.  So I'm fine killing this patch.

Great, I'll drop it, thanks.

James


Patch

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 81c287f..ffefa3b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1283,6 +1283,12 @@  bytes respectively. Such letter suffixes can also be entirely omitted.
 			If there are multiple matching configurations changing
 			the same attribute, the last one is used.
 
+	libsas.force_hard_reset=
+			[LIBSAS] Clear SATA affiliations with every reset, for
+			cases where affiliation errors are causing boot
+			failures, otherwise use sysfs hard_reset interface to
+			clear individual phys.
+
 	memblock=debug	[KNL] Enable memblock debug messages.
 
 	load_ramdisk=	[RAM] List of ramdisks to load from floppy
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 120bff6..2fc23d3 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -293,6 +293,10 @@  static int sas_phy_enable(struct sas_phy *phy, int enable)
 	return ret;
 }
 
+static bool force_hard_reset;
+module_param(force_hard_reset, bool, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(force_hard_reset, "clear sata affiliations on every reset");
+
 int sas_phy_reset(struct sas_phy *phy, int hard_reset)
 {
 	int ret;
@@ -301,7 +305,7 @@  int sas_phy_reset(struct sas_phy *phy, int hard_reset)
 	if (!phy->enabled)
 		return -ENODEV;
 
-	if (hard_reset)
+	if (hard_reset || force_hard_reset)
 		reset_type = PHY_FUNC_HARD_RESET;
 	else
 		reset_type = PHY_FUNC_LINK_RESET;
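
The effect of the second hunk can be modelled in userspace as follows. This
is a sketch of the selection logic only, not the kernel function: struct
model_phy and model_reset_type() are hypothetical stand-ins, the enum values
are assumed, and -1 stands in for the -ENODEV return of the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reset types chosen by the hunk above (numeric values assumed). */
enum phy_func {
	PHY_FUNC_LINK_RESET = 1,
	PHY_FUNC_HARD_RESET = 2,
};

/* Stand-in for the module parameter; off unless the administrator sets it. */
bool force_hard_reset;

/* Hypothetical, trimmed-down phy: only the field the check needs. */
struct model_phy {
	bool enabled;
};

/*
 * Return the reset type the patched sas_phy_reset() would select,
 * or -1 where the real function returns -ENODEV for a disabled phy.
 */
int model_reset_type(const struct model_phy *phy, bool hard_reset)
{
	assert(phy != NULL);

	if (!phy->enabled)
		return -1;

	/* With the parameter set, every request becomes a hard reset. */
	if (hard_reset || force_hard_reset)
		return PHY_FUNC_HARD_RESET;

	return PHY_FUNC_LINK_RESET;
}
```

With force_hard_reset set, a caller asking for a link reset still gets
PHY_FUNC_HARD_RESET, which is the blanket promotion objected to in the
comments above.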