diff mbox

[libsas,v12,04/11] sysfs: handle 'parent deleted before child added'

Message ID 20120322063214.22036.77957.stgit@dwillia2-linux.jf.intel.com
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Dan Williams March 22, 2012, 6:32 a.m. UTC
In scsi at least two cases of the parent device being deleted before the
child is added have been observed.

1/ scsi is performing async scans and the device is removed prior to the
   async can thread running.

2/ libsas discovery event running after the parent port has been torn
   down.

Result in crash signatures like:
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
 IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
 ...
 Process scsi_scan_8 (pid: 5417, threadinfo ffff88080bd16000, task ffff880801b8a0b0)
 Stack:
  00000000fffffffe ffff880813470628 ffff88080bd17cd0 ffff88080614b7e8
  ffff88080b45c108 00000000fffffffe ffff88080bd17d20 ffffffff8125e4a8
  ffff88080bd17cf0 ffffffff81075149 ffff88080bd17d30 ffff88080614b7e8
 Call Trace:
  [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
  [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
  [<ffffffff8125e70b>] kobject_add+0x64/0x66
  [<ffffffff8131122b>] device_add+0x12d/0x63a

These scenarios need to be cleaned up, but in the meantime the system
need not crash if this ordering occurs.  Instead report:

 kobject_add_internal failed for target8:0:16 (error: -2 parent: end_device-8:0:24)
 Pid: 2942, comm: scsi_scan_8 Not tainted 3.3.0-rc7-isci+ #2
 Call Trace:
  [<ffffffff8125e551>] kobject_add_internal+0x1c1/0x1f3
  [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff8125e659>] kobject_add_varg+0x41/0x50
  [<ffffffff8125e723>] kobject_add+0x64/0x66
  [<ffffffff8131124b>] device_add+0x12d/0x63a
  [<ffffffff8125e0ef>] ? kobject_put+0x4c/0x50
  [<ffffffff8132f370>] scsi_sysfs_add_sdev+0x4e/0x28a
  [<ffffffff8132dce3>] do_scan_async+0x9c/0x145

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/sysfs/dir.c |    3 +++
 lib/kobject.c  |    7 ++++---
 2 files changed, 7 insertions(+), 3 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

gregkh@linuxfoundation.org March 22, 2012, 2:39 p.m. UTC | #1
On Wed, Mar 21, 2012 at 11:32:14PM -0700, Dan Williams wrote:
> In scsi at least two cases of the parent device being deleted before the
> child is added have been observed.
> 
> 1/ scsi is performing async scans and the device is removed prior to the
>    async can thread running.
> 
> 2/ libsas discovery event running after the parent port has been torn
>    down.
> 
> Result in crash signatures like:
>  BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
>  IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
>  ...
>  Process scsi_scan_8 (pid: 5417, threadinfo ffff88080bd16000, task ffff880801b8a0b0)
>  Stack:
>   00000000fffffffe ffff880813470628 ffff88080bd17cd0 ffff88080614b7e8
>   ffff88080b45c108 00000000fffffffe ffff88080bd17d20 ffffffff8125e4a8
>   ffff88080bd17cf0 ffffffff81075149 ffff88080bd17d30 ffff88080614b7e8
>  Call Trace:
>   [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
>   [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
>   [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
>   [<ffffffff8125e70b>] kobject_add+0x64/0x66
>   [<ffffffff8131122b>] device_add+0x12d/0x63a
> 
> These scenarios need to be cleaned up, but in the meantime the system
> need not crash if this ordering occurs.  Instead report:
> 
>  kobject_add_internal failed for target8:0:16 (error: -2 parent: end_device-8:0:24)
>  Pid: 2942, comm: scsi_scan_8 Not tainted 3.3.0-rc7-isci+ #2
>  Call Trace:
>   [<ffffffff8125e551>] kobject_add_internal+0x1c1/0x1f3
>   [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
>   [<ffffffff8125e659>] kobject_add_varg+0x41/0x50
>   [<ffffffff8125e723>] kobject_add+0x64/0x66
>   [<ffffffff8131124b>] device_add+0x12d/0x63a
>   [<ffffffff8125e0ef>] ? kobject_put+0x4c/0x50
>   [<ffffffff8132f370>] scsi_sysfs_add_sdev+0x4e/0x28a
>   [<ffffffff8132dce3>] do_scan_async+0x9c/0x145
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  fs/sysfs/dir.c |    3 +++
>  lib/kobject.c  |    7 ++++---
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 7fdf6a7..86521ee 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -714,6 +714,9 @@ int sysfs_create_dir(struct kobject * kobj)
>  	else
>  		parent_sd = &sysfs_root;
>  
> +	if (!parent_sd)
> +		return -ENOENT;
> +
>  	if (sysfs_ns_type(parent_sd))
>  		ns = kobj->ktype->namespace(kobj);
>  	type = sysfs_read_ns_type(kobj);

So what happens if this is true?  Does this patch fix the oops?  What
kernels should this be applied to where this problem has been seen?

> diff --git a/lib/kobject.c b/lib/kobject.c
> index c33d7a1..e5f86c0 100644
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -192,13 +192,14 @@ static int kobject_add_internal(struct kobject *kobj)
>  
>  		/* be noisy on error issues */
>  		if (error == -EEXIST)
> -			printk(KERN_ERR "%s failed for %s with "
> +			pr_err("%s failed for %s with "
>  			       "-EEXIST, don't try to register things with "
>  			       "the same name in the same directory.\n",
>  			       __func__, kobject_name(kobj));
>  		else
> -			printk(KERN_ERR "%s failed for %s (%d)\n",
> -			       __func__, kobject_name(kobj), error);
> +			pr_err("%s failed for %s (error: %d parent: %s)\n",
> +			       __func__, kobject_name(kobj), error,
> +			       parent ? kobject_name(parent) : "'none'");
>  		dump_stack();
>  	} else
>  		kobj->state_in_sysfs = 1;

These changes have nothing to do with the above fix, so why include them
here?

And note, I hate pr_err(), what's wrong with printk() in this instance?

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley March 22, 2012, 2:47 p.m. UTC | #2
On Wed, 2012-03-21 at 23:32 -0700, Dan Williams wrote:
> In scsi at least two cases of the parent device being deleted before the
> child is added have been observed.
> 
> 1/ scsi is performing async scans and the device is removed prior to the
>    async can thread running.

This doesn't sound right.  We do an explicit get on the sdkp (and the
sdkp holds the sdp) before we schedule an async scan.  That's only
removed after the async scan has completed, so it should be impossible
for the parent to vanish.

> 2/ libsas discovery event running after the parent port has been torn
>    down.

This sounds possible, but should be fixable by referencing and checking
in libsas rather than having to alter sysfs, shouldn't it?

In general, I think any case where the parent is freed before the child
of that parent created is our refcounting cockup rather than a generic
problem in sysfs.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams March 22, 2012, 4:27 p.m. UTC | #3
On Thu, Mar 22, 2012 at 7:39 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Wed, Mar 21, 2012 at 11:32:14PM -0700, Dan Williams wrote:
[..]
>> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
>> index 7fdf6a7..86521ee 100644
>> --- a/fs/sysfs/dir.c
>> +++ b/fs/sysfs/dir.c
>> @@ -714,6 +714,9 @@ int sysfs_create_dir(struct kobject * kobj)
>>       else
>>               parent_sd = &sysfs_root;
>>
>> +     if (!parent_sd)
>> +             return -ENOENT;
>> +
>>       if (sysfs_ns_type(parent_sd))
>>               ns = kobj->ktype->namespace(kobj);
>>       type = sysfs_read_ns_type(kobj);
>
> So what happens if this is true?  Does this patch fix the oops?

This patch downgrades the oops by turning it into a device_add()
failure, but the patches that *fix* this warning are here [1] and here
[2].

> What kernels should this be applied to where this problem has been seen?

I assume this has been a latent problem ever since scsi async scanning
was added (2.6.20-rc2), but it's a rare corner case to unplug devices
during the initial scan.

>> diff --git a/lib/kobject.c b/lib/kobject.c
>> index c33d7a1..e5f86c0 100644
>> --- a/lib/kobject.c
>> +++ b/lib/kobject.c
>> @@ -192,13 +192,14 @@ static int kobject_add_internal(struct kobject *kobj)
>>
>>               /* be noisy on error issues */
>>               if (error == -EEXIST)
>> -                     printk(KERN_ERR "%s failed for %s with "
>> +                     pr_err("%s failed for %s with "
>>                              "-EEXIST, don't try to register things with "
>>                              "the same name in the same directory.\n",
>>                              __func__, kobject_name(kobj));
>>               else
>> -                     printk(KERN_ERR "%s failed for %s (%d)\n",
>> -                            __func__, kobject_name(kobj), error);
>> +                     pr_err("%s failed for %s (error: %d parent: %s)\n",
>> +                            __func__, kobject_name(kobj), error,
>> +                            parent ? kobject_name(parent) : "'none'");
>>               dump_stack();
>>       } else
>>               kobj->state_in_sysfs = 1;
>
> These changes have nothing to do with the above fix, so why include them
> here?

It wasn't until I realized which 'parent' and which 'child' were
interacting that I was able to identify the real fixes.  Since it was
helpful for the scsi/sas case, I decided to leave the more informative
warning for the next person that gets to debug a similar failure.

> And note, I hate pr_err(), what's wrong with printk() in this instance?

This is a bit circuitous, but extending the warning to include the
'parent' and 'child' pushed up against 80 columns and since this
routine has a pr_debug() a few lines up I thought a pr_ conversion was
acceptable.  The pr_err() conversion of the EEXIST case just came
along for the ride to keep the print style consistent (at least in
this routine).

--
Dan

[1]: http://marc.info/?l=linux-scsi&m=133239707903443&w=2
[2]: http://marc.info/?l=linux-scsi&m=133239709603452&w=2
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams March 22, 2012, 4:34 p.m. UTC | #4
On Thu, Mar 22, 2012 at 7:47 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Wed, 2012-03-21 at 23:32 -0700, Dan Williams wrote:
>> In scsi at least two cases of the parent device being deleted before the
>> child is added have been observed.
>>
>> 1/ scsi is performing async scans and the device is removed prior to the
>>    async can thread running.
>
> This doesn't sound right.  We do an explicit get on the sdkp (and the
> sdkp holds the sdp) before we schedule an async scan.  That's only
> removed after the async scan has completed, so it should be impossible
> for the parent to vanish.

Right, the parent does not get freed because we have a reference, but
it still gets device_del()'d which leads to:

  device_del()->kobject_del()->sysfs_remove_dir()

  kobj->sd = NULL;

...and then sysfs_create_dir() (without this fix) goes ahead and
de-references parent_sd via sysfs_ns_type():

  return (sd->s_flags & SYSFS_NS_TYPE_MASK) >> SYSFS_NS_TYPE_SHIFT;

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Richter March 22, 2012, 10:51 p.m. UTC | #5
On Mar 22 Williams, Dan J wrote:
> On Thu, Mar 22, 2012 at 7:39 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Wed, Mar 21, 2012 at 11:32:14PM -0700, Dan Williams wrote:
> [..]
> >> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> >> index 7fdf6a7..86521ee 100644
> >> --- a/fs/sysfs/dir.c
> >> +++ b/fs/sysfs/dir.c
> >> @@ -714,6 +714,9 @@ int sysfs_create_dir(struct kobject * kobj)
> >>       else
> >>               parent_sd = &sysfs_root;
> >>
> >> +     if (!parent_sd)
> >> +             return -ENOENT;
> >> +
> >>       if (sysfs_ns_type(parent_sd))
> >>               ns = kobj->ktype->namespace(kobj);
> >>       type = sysfs_read_ns_type(kobj);
> >
> > So what happens if this is true?  Does this patch fix the oops?
> 
> This patch downgrades the oops by turning it into a device_add()
> failure, but the patches that *fix* this warning are here [1] and here
> [2].
[...]
> [1]: http://marc.info/?l=linux-scsi&m=133239707903443&w=2
> [2]: http://marc.info/?l=linux-scsi&m=133239709603452&w=2

Isn't this something which is to be accomplished by counting references to
the parent device?
Dan Williams March 22, 2012, 11:11 p.m. UTC | #6
On Thu, Mar 22, 2012 at 3:51 PM, Stefan Richter
<stefanr@s5r6.in-berlin.de> wrote:
> Isn't this something which is to be accomplished by counting references to
> the parent device?

As I mentioned to James [1] the reference count keeps the parent from
being kfree()'d but not device_del()'d where kobj->sd gets sets to
NULL.

--
Dan

[1]: http://marc.info/?l=linux-scsi&m=133243414518064&w=2
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Richter March 22, 2012, 11:26 p.m. UTC | #7
On Mar 22 Stefan Richter wrote:
> On Mar 22 Williams, Dan J wrote:
> > the patches that *fix* this warning are here [1] and here
> > [2].
> [...]
> > [1]: http://marc.info/?l=linux-scsi&m=133239707903443&w=2
> > [2]: http://marc.info/?l=linux-scsi&m=133239709603452&w=2
> 
> Isn't this something which is to be accomplished by counting references to
> the parent device?

Oh, OK, it is about ensuring device_add(parent) -- device_add(child) --
device_del(child) -- device_del(parent) nesting order.  So not just
counting of plain references but a wait-for-completion type issue.
diff mbox

Patch

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 7fdf6a7..86521ee 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -714,6 +714,9 @@  int sysfs_create_dir(struct kobject * kobj)
 	else
 		parent_sd = &sysfs_root;
 
+	if (!parent_sd)
+		return -ENOENT;
+
 	if (sysfs_ns_type(parent_sd))
 		ns = kobj->ktype->namespace(kobj);
 	type = sysfs_read_ns_type(kobj);
diff --git a/lib/kobject.c b/lib/kobject.c
index c33d7a1..e5f86c0 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -192,13 +192,14 @@  static int kobject_add_internal(struct kobject *kobj)
 
 		/* be noisy on error issues */
 		if (error == -EEXIST)
-			printk(KERN_ERR "%s failed for %s with "
+			pr_err("%s failed for %s with "
 			       "-EEXIST, don't try to register things with "
 			       "the same name in the same directory.\n",
 			       __func__, kobject_name(kobj));
 		else
-			printk(KERN_ERR "%s failed for %s (%d)\n",
-			       __func__, kobject_name(kobj), error);
+			pr_err("%s failed for %s (error: %d parent: %s)\n",
+			       __func__, kobject_name(kobj), error,
+			       parent ? kobject_name(parent) : "'none'");
 		dump_stack();
 	} else
 		kobj->state_in_sysfs = 1;