diff mbox

i2c: core: Prevent race condition when removing i2c devices

Message ID 20170608024344.9311-1-david.thomson@alliedtelesis.co.nz
State Rejected
Headers show

Commit Message

David Thomson June 8, 2017, 2:43 a.m. UTC
There's a race condition that occurs when deleting & closing multiple
i2c devices, which results in a kernel warning on an incorrect reference
count:

Call Trace:
[c00000007e2cfaf0] [c0000000000b84e4] .module_put+0x24/0xb0 (unreliable)
[c00000007e2cfb70] [c0000000005db830] .i2c_put_adapter+0x30/0x50
[c00000007e2cfbf0] [c0000000005deb74] .i2cdev_release+0x24/0x60
[c00000007e2cfc70] [c00000000014015c] .__fput+0xbc/0x290
[c00000007e2cfd10] [c000000000059b64] .task_work_run+0xf4/0x130
[c00000007e2cfdb0] [c00000000000a0a4] .do_notify_resume+0x74/0x80
[c00000007e2cfe30] [c000000000000acc] .ret_from_except_lite+0x78/0x7c

The race is seen when an i2c adapter is being deleted while
file descriptor on another i2c adapter is being closed. The following
shows two threads executing concurrently (the first number is the thread
id):

4580 i2c_del_mux_adapter adapter i2c-6-mux (chan_id 2)
4580 i2c_del_adapter adapter i2c-6-mux (chan_id 2)
4580 i2cdev_detach_adapter adapter i2c-6-mux (chan_id 2)
2627 i2cdev_release START adapter i2c-6-mux (chan_id 2)
2627 i2c_put_adapter START adapter i2c-6-mux (chan_id 2) owner (null)
2627 i2c_adapter_dev_release adapter i2c-6-mux (chan_id 2)
4580 i2c_del_mux_adapter adapter i2c-6-mux (chan_id 3)
4580 i2c_del_adapter adapter i2c-6-mux (chan_id 3)
4580 i2cdev_detach_adapter adapter i2c-6-mux (chan_id 3)
2627 i2c_put_adapter END adapter i2c-6-mux (chan_id 2) owner c0000000edf66400
2627 i2cdev_release END adapter i2c-6-mux (chan_id 2)
2627 i2cdev_release START adapter i2c-6-mux (chan_id 3)
2627 i2c_put_adapter START adapter i2c-6-mux (chan_id 3) owner (null)
2627 i2c_adapter_dev_release adapter i2c-6-mux (chan_id 3)
2627 i2c_put_adapter END adapter i2c-6-mux (chan_id 3) owner (null)
2627 i2cdev_release END adapter i2c-6-mux (chan_id 3)

When thread 4580 for chan_id 3 interrupts 2627 doing
i2c_adapter_dev_release, the 'owner' field for chan_id 2 becomes
non-null. This causes the code to attempt to decrement the reference for
this owner, but the current reference count is invalid. I'm not sure how
the owner is getting set, but adding a lock around the
put_device/module_put calls resolves the issue.

Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz>
---
 drivers/i2c/i2c-core.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Wolfram Sang June 19, 2017, 2:36 p.m. UTC | #1
On Thu, Jun 08, 2017 at 02:43:44PM +1200, David Thomson wrote:
> There's a race condition that occurs when deleting & closing multiple
> i2c devices, which results in a kernel warning on an incorrect reference
> count:

Thanks for the report and patch!

CCing peda. Do you have time to look into it? Muxes and locking are
kinda calling for your expertise ;)

> 
> Call Trace:
> [c00000007e2cfaf0] [c0000000000b84e4] .module_put+0x24/0xb0 (unreliable)
> [c00000007e2cfb70] [c0000000005db830] .i2c_put_adapter+0x30/0x50
> [c00000007e2cfbf0] [c0000000005deb74] .i2cdev_release+0x24/0x60
> [c00000007e2cfc70] [c00000000014015c] .__fput+0xbc/0x290
> [c00000007e2cfd10] [c000000000059b64] .task_work_run+0xf4/0x130
> [c00000007e2cfdb0] [c00000000000a0a4] .do_notify_resume+0x74/0x80
> [c00000007e2cfe30] [c000000000000acc] .ret_from_except_lite+0x78/0x7c
> 
> The race is seen when an i2c adapter is being deleted while
> file descriptor on another i2c adapter is being closed. The following
> shows two threads executing concurrently (the first number is the thread
> id):
> 
> 4580 i2c_del_mux_adapter adapter i2c-6-mux (chan_id 2)
> 4580 i2c_del_adapter adapter i2c-6-mux (chan_id 2)
> 4580 i2cdev_detach_adapter adapter i2c-6-mux (chan_id 2)
> 2627 i2cdev_release START adapter i2c-6-mux (chan_id 2)
> 2627 i2c_put_adapter START adapter i2c-6-mux (chan_id 2) owner (null)
> 2627 i2c_adapter_dev_release adapter i2c-6-mux (chan_id 2)
> 4580 i2c_del_mux_adapter adapter i2c-6-mux (chan_id 3)
> 4580 i2c_del_adapter adapter i2c-6-mux (chan_id 3)
> 4580 i2cdev_detach_adapter adapter i2c-6-mux (chan_id 3)
> 2627 i2c_put_adapter END adapter i2c-6-mux (chan_id 2) owner c0000000edf66400
> 2627 i2cdev_release END adapter i2c-6-mux (chan_id 2)
> 2627 i2cdev_release START adapter i2c-6-mux (chan_id 3)
> 2627 i2c_put_adapter START adapter i2c-6-mux (chan_id 3) owner (null)
> 2627 i2c_adapter_dev_release adapter i2c-6-mux (chan_id 3)
> 2627 i2c_put_adapter END adapter i2c-6-mux (chan_id 3) owner (null)
> 2627 i2cdev_release END adapter i2c-6-mux (chan_id 3)
> 
> When thread 4580 for chan_id 3 interrupts 2627 doing
> i2c_adapter_dev_release, the 'owner' field for chan_id 2 becomes
> non-null. This causes the code to attempt to decrement the reference for
> this owner, but the current reference count is invalid. I'm not sure how
> the owner is getting set, but adding a lock around the
> put_device/module_put calls resolves the issue.
> 
> Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz>
> ---
>  drivers/i2c/i2c-core.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
> index 82576aaccc90..8831d9b363a7 100644
> --- a/drivers/i2c/i2c-core.c
> +++ b/drivers/i2c/i2c-core.c
> @@ -3151,8 +3151,12 @@ void i2c_put_adapter(struct i2c_adapter *adap)
>  	if (!adap)
>  		return;
>  
> +	mutex_lock(&core_lock);
> +
>  	put_device(&adap->dev);
>  	module_put(adap->owner);
> +
> +	mutex_unlock(&core_lock);
>  }
>  EXPORT_SYMBOL(i2c_put_adapter);
>  
> -- 
> 2.13.0
>
Peter Rosin June 19, 2017, 4:34 p.m. UTC | #2
On 2017-06-19 16:36, Wolfram Sang wrote:
> On Thu, Jun 08, 2017 at 02:43:44PM +1200, David Thomson wrote:
>> There's a race condition that occurs when deleting & closing multiple
>> i2c devices, which results in a kernel warning on an incorrect reference
>> count:
> 
> Thanks for the report and patch!
> 
> CCing peda. Do you have time to look into it? Muxes and locking are
> kinda calling for your expertise ;)
> 
>>
>> Call Trace:
>> [c00000007e2cfaf0] [c0000000000b84e4] .module_put+0x24/0xb0 (unreliable)
>> [c00000007e2cfb70] [c0000000005db830] .i2c_put_adapter+0x30/0x50
>> [c00000007e2cfbf0] [c0000000005deb74] .i2cdev_release+0x24/0x60
>> [c00000007e2cfc70] [c00000000014015c] .__fput+0xbc/0x290
>> [c00000007e2cfd10] [c000000000059b64] .task_work_run+0xf4/0x130
>> [c00000007e2cfdb0] [c00000000000a0a4] .do_notify_resume+0x74/0x80
>> [c00000007e2cfe30] [c000000000000acc] .ret_from_except_lite+0x78/0x7c
>>
>> The race is seen when an i2c adapter is being deleted while
>> file descriptor on another i2c adapter is being closed. The following
>> shows two threads executing concurrently (the first number is the thread
>> id):
>>
>> 4580 i2c_del_mux_adapter adapter i2c-6-mux (chan_id 2)

This function no longer exist in the kernel. I'm not saying that the
problem has been solved (it probably hasn't), but what version was this
and can you please redo this with a newer kernel?

>> 4580 i2c_del_adapter adapter i2c-6-mux (chan_id 2)
>> 4580 i2cdev_detach_adapter adapter i2c-6-mux (chan_id 2)
>> 2627 i2cdev_release START adapter i2c-6-mux (chan_id 2)
>> 2627 i2c_put_adapter START adapter i2c-6-mux (chan_id 2) owner (null)
>> 2627 i2c_adapter_dev_release adapter i2c-6-mux (chan_id 2)
>> 4580 i2c_del_mux_adapter adapter i2c-6-mux (chan_id 3)
>> 4580 i2c_del_adapter adapter i2c-6-mux (chan_id 3)
>> 4580 i2cdev_detach_adapter adapter i2c-6-mux (chan_id 3)
>> 2627 i2c_put_adapter END adapter i2c-6-mux (chan_id 2) owner c0000000edf66400
>> 2627 i2cdev_release END adapter i2c-6-mux (chan_id 2)
>> 2627 i2cdev_release START adapter i2c-6-mux (chan_id 3)
>> 2627 i2c_put_adapter START adapter i2c-6-mux (chan_id 3) owner (null)
>> 2627 i2c_adapter_dev_release adapter i2c-6-mux (chan_id 3)
>> 2627 i2c_put_adapter END adapter i2c-6-mux (chan_id 3) owner (null)
>> 2627 i2cdev_release END adapter i2c-6-mux (chan_id 3)
>>
>> When thread 4580 for chan_id 3 interrupts 2627 doing
>> i2c_adapter_dev_release, the 'owner' field for chan_id 2 becomes
>> non-null. This causes the code to attempt to decrement the reference for
>> this owner, but the current reference count is invalid. I'm not sure how
>> the owner is getting set, but adding a lock around the
>> put_device/module_put calls resolves the issue.

It would certainly be nice to know why .owner is clobbered, because I
don't see it. But then again, I don't know what sources I should be
reading...

Cheers,
peda

>> Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz>
>> ---
>>  drivers/i2c/i2c-core.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
>> index 82576aaccc90..8831d9b363a7 100644
>> --- a/drivers/i2c/i2c-core.c
>> +++ b/drivers/i2c/i2c-core.c
>> @@ -3151,8 +3151,12 @@ void i2c_put_adapter(struct i2c_adapter *adap)
>>  	if (!adap)
>>  		return;
>>  
>> +	mutex_lock(&core_lock);
>> +
>>  	put_device(&adap->dev);
>>  	module_put(adap->owner);
>> +
>> +	mutex_unlock(&core_lock);
>>  }
>>  EXPORT_SYMBOL(i2c_put_adapter);
>>  
>> -- 
>> 2.13.0
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Rosin June 19, 2017, 7:30 p.m. UTC | #3
On 2017-06-19 18:34, Peter Rosin wrote:
> It would certainly be nice to know why .owner is clobbered, because I
> don't see it. But then again, I don't know what sources I should be
> reading...

BTW, one thing I noticed when reading the current code is that I see this
at the end of i2c_del_adapter:

	/* Clear the device structure in case this adapter is ever going to be
	   added again */
	memset(&adap->dev, 0, sizeof(adap->dev));
}
EXPORT_SYMBOL(i2c_del_adapter);


which is not very compatible with this part of the device_add() docs:


 * Do not call this routine or device_register() more than once for
 * any device structure.  The driver model core is not designed to work
 * with devices that get unregistered and then spring back to life.
 * (Among other things, it's very hard to guarantee that all references
 * to the previous incarnation of @dev have been dropped.)  Allocate
 * and register a fresh new struct device instead.


That, of course, probably have no bearing on the problem/patch in this
thread...

Or maybe struct device reuse is exactly what is going on???

Cheers,
peda
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Thomson June 28, 2017, 10:20 p.m. UTC | #4
Hi Peter

Thanks for your reply. Unfortunately we cannot reproduce the issue on a newer kernel as it requires userspace applications (for processing the hotswap) that rely on our (many) changes to the kernel. I appreciate you taking the time to consider the issue.

Thanks
David
Wolfram Sang July 31, 2017, 1:38 p.m. UTC | #5
On Mon, Jun 19, 2017 at 09:30:06PM +0200, Peter Rosin wrote:
> On 2017-06-19 18:34, Peter Rosin wrote:
> > It would certainly be nice to know why .owner is clobbered, because I
> > don't see it. But then again, I don't know what sources I should be
> > reading...
>
> BTW, one thing I noticed when reading the current code is that I see this
> at the end of i2c_del_adapter:
>
> 	/* Clear the device structure in case this adapter is ever going to be
> 	   added again */
> 	memset(&adap->dev, 0, sizeof(adap->dev));
> }
> EXPORT_SYMBOL(i2c_del_adapter);

It was added with commit bd4bc3dbded9cd ("i2c: Clear i2c_adapter.dev on
adapter removal") with this commit message:

    i2c: Clear i2c_adapter.dev on adapter removal

    Clear i2c_adapter.dev on adapter removal. This makes it possible to
    re-add the adapter at a later point, which some drivers
    (i2c-amd756-s4882, i2c-nforce2-s4985) actually do.

    This fixes a bug reported by John Stultz here:
      http://lkml.org/lkml/2008/7/15/720
    and by Ingo Molar there:
      http://lkml.org/lkml/2008/7/16/78

So, despite the docs, it used to be an issue actually...
Peter Rosin July 31, 2017, 2:02 p.m. UTC | #6
On 2017-07-31 15:38, Wolfram Sang wrote:
> On Mon, Jun 19, 2017 at 09:30:06PM +0200, Peter Rosin wrote:
>> On 2017-06-19 18:34, Peter Rosin wrote:
>>> It would certainly be nice to know why .owner is clobbered, because I
>>> don't see it. But then again, I don't know what sources I should be
>>> reading...
>>
>> BTW, one thing I noticed when reading the current code is that I see this
>> at the end of i2c_del_adapter:
>>
>> 	/* Clear the device structure in case this adapter is ever going to be
>> 	   added again */
>> 	memset(&adap->dev, 0, sizeof(adap->dev));
>> }
>> EXPORT_SYMBOL(i2c_del_adapter);
> 
> It was added with commit bd4bc3dbded9cd ("i2c: Clear i2c_adapter.dev on
> adapter removal") with this commit message:
> 
>     i2c: Clear i2c_adapter.dev on adapter removal
> 
>     Clear i2c_adapter.dev on adapter removal. This makes it possible to
>     re-add the adapter at a later point, which some drivers
>     (i2c-amd756-s4882, i2c-nforce2-s4985) actually do.
> 
>     This fixes a bug reported by John Stultz here:
>       http://lkml.org/lkml/2008/7/15/720
>     and by Ingo Molar there:
>       http://lkml.org/lkml/2008/7/16/78
> 
> So, despite the docs, it used to be an issue actually...

What do you mean "despite the docs"? The docs says that you shouldn't
reuse struct device. When you did, you got splats. Zeroing out struct
device happens to work in this case probably because there is a
completion going on around the call to device_unregister(). Without
that completion it would have been seriously wrong to zero out the
struct device, I think. But even with it, I'm not sure it's correct
to do so and worry about rcu grace periods etc. Has any driver model
expert looked at this struct device reuse and blessed it as ok?

Cheers,
Peter
Wolfram Sang July 31, 2017, 2:11 p.m. UTC | #7
> to do so and worry about rcu grace periods etc. Has any driver model
> expert looked at this struct device reuse and blessed it as ok?

I am quite sure this did not happen. That's what I mean with "despite
the docs": One shouldn't reuse struct device, some people did it and got
this "fix" in place, and now people might rely on the "fix". As a
result, we can't simply remove it by saying "Ah, that's cruft. It can
just go." We'd need some deeper look into random drivers, like you
sketched.
Peter Rosin Aug. 18, 2017, 8:04 a.m. UTC | #8
On 2017-06-29 00:20, David Thomson wrote:
> On 2017-06-19 18:34, Peter Rosin wrote:
>> It would certainly be nice to know why .owner is clobbered, because I
>> don't see it. But then again, I don't know what sources I should be
>> reading...
> Hi Peter
> 
> Thanks for your reply. Unfortunately we cannot reproduce the issue on a newer kernel as it requires userspace applications (for processing the hotswap) that rely on our (many) changes to the kernel. I appreciate you taking the time to consider the issue.

Still interested in what base version you have and if there are any
patches of relevance to the i2c subsystem... The oldest mainline
that had i2c_del_mux_adapter is v4.6 (May 2016), so I had a second
look at that. Could you also tell us what i2c-mux driver this is
and whether it is expected that .owner is NULL in the first place? In
other words, is the i2c-mux driver in question built-in or a loaded
module?

If you have not added any new i2c-mux driver (which I doubt, you
would surely have mentioned that) the only way that this race can
happen is if the driver is removed twice since i2c_del_mux_adapter
is only called in .remove (and in .probe on failure, but I don't
think that can race). At least that's the case for the i2c-mux drivers
that I looked at...

It seems pretty serious that a driver can have its .remove called
twice so I suspect your patch just papers over a much bigger problem?

So, why is the i2c-mux driver being removed more than once? The
module refcounting must be badly screwed up for that to happen. Or?

Cheers,
Peter
diff mbox

Patch

diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 82576aaccc90..8831d9b363a7 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -3151,8 +3151,12 @@  void i2c_put_adapter(struct i2c_adapter *adap)
 	if (!adap)
 		return;
 
+	mutex_lock(&core_lock);
+
 	put_device(&adap->dev);
 	module_put(adap->owner);
+
+	mutex_unlock(&core_lock);
 }
 EXPORT_SYMBOL(i2c_put_adapter);