diff mbox series

[v4] i2c: acpi: Unbind mux adapters before delete

Message ID 20240312221632.859695-1-hamish.martin@alliedtelesis.co.nz
State New
Headers show
Series [v4] i2c: acpi: Unbind mux adapters before delete | expand

Commit Message

Hamish Martin March 12, 2024, 10:16 p.m. UTC
There is an issue with ACPI overlay table removal specifically related
to I2C multiplexers.

Consider an ACPI SSDT Overlay that defines a PCA9548 I2C mux on an
existing I2C bus. When this table is loaded we see the creation of a
device for the overall PCA9548 chip and 8 further devices - one
i2c_adapter each for the mux channels. These are all bound to their
ACPI equivalents via an eventual invocation of acpi_bind_one().

When we unload the SSDT overlay we run into the problem. The ACPI
devices are deleted as normal via acpi_device_del_work_fn() and the
acpi_device_del_list.

However, the following warning and stack trace is output as the
deletion does not go smoothly:
------------[ cut here ]------------
kernfs: can not remove 'physical_node', no directory
WARNING: CPU: 1 PID: 11 at fs/kernfs/dir.c:1674 kernfs_remove_by_name_ns+0xb9/0xc0
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u128:0 Not tainted 6.8.0-rc6+ #1
Hardware name: congatec AG conga-B7E3/conga-B7E3, BIOS 5.13 05/16/2023
Workqueue: kacpi_hotplug acpi_device_del_work_fn
RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0
Code: e4 00 48 89 ef e8 07 71 db ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 a7 55 e4 00 0f 0b eb a6 48 c7 c7 f0 38 0d 9d e8 97 0a d5 ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffff9f864008fb28 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8ef90a8d4940 RCX: 0000000000000000
RDX: ffff8f000e267d10 RSI: ffff8f000e25c780 RDI: ffff8f000e25c780
RBP: ffff8ef9186f9870 R08: 0000000000013ffb R09: 00000000ffffbfff
R10: 00000000ffffbfff R11: ffff8f000e0a0000 R12: ffff9f864008fb50
R13: ffff8ef90c93dd60 R14: ffff8ef9010d0958 R15: ffff8ef9186f98c8
FS:  0000000000000000(0000) GS:ffff8f000e240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f48f5253a08 CR3: 00000003cb82e000 CR4: 00000000003506f0
Call Trace:
 <TASK>
 ? kernfs_remove_by_name_ns+0xb9/0xc0
 ? __warn+0x7c/0x130
 ? kernfs_remove_by_name_ns+0xb9/0xc0
 ? report_bug+0x171/0x1a0
 ? handle_bug+0x3c/0x70
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? kernfs_remove_by_name_ns+0xb9/0xc0
 ? kernfs_remove_by_name_ns+0xb9/0xc0
 acpi_unbind_one+0x108/0x180
 device_del+0x18b/0x490
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 device_unregister+0xd/0x30
 i2c_del_adapter.part.0+0x1bf/0x250
 i2c_mux_del_adapters+0xa1/0xe0
 i2c_device_remove+0x1e/0x80
 device_release_driver_internal+0x19a/0x200
 bus_remove_device+0xbf/0x100
 device_del+0x157/0x490
 ? __pfx_device_match_fwnode+0x10/0x10
 ? srso_return_thunk+0x5/0x5f
 device_unregister+0xd/0x30
 i2c_acpi_notify+0x10f/0x140
 notifier_call_chain+0x58/0xd0
 blocking_notifier_call_chain+0x3a/0x60
 acpi_device_del_work_fn+0x85/0x1d0
 process_one_work+0x134/0x2f0
 worker_thread+0x2f0/0x410
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xe3/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2f/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
---[ end trace 0000000000000000 ]---
...
repeated 7 more times, 1 for each channel of the mux
...

The issue is that the binding of the ACPI devices to their peer I2C
adapters is not correctly cleaned up. Digging deeper into the issue we
see that the deletion order is such that the ACPI devices matching the
mux channel i2c adapters are deleted first during the SSDT overlay
removal. For each of the channels we see a call to i2c_acpi_notify()
with ACPI_RECONFIG_DEVICE_REMOVE but, because these devices are not
actually i2c_clients, nothing is done for them.

Later on, after each of the mux channels has been dealt with, we come
to delete the i2c_client representing the PCA9548 device. This is the
call stack we see above, whereby the kernel cleans up the i2c_client
including destruction of the mux and its channel adapters. At this
point we do attempt to unbind from the ACPI peers but those peers no
longer exist and so we hit the kernfs errors.

The fix is to augment i2c_acpi_notify() to handle i2c_adapters. But,
given that the life cycle of the adapters is linked to the i2c_client,
instead of deleting the i2c_adapters during the i2c_acpi_notify(), we
just trigger unbinding of the ACPI device from the adapter device, and
allow the clean up of the adapter to continue in the way it always has.

Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Fixes: 525e6fabeae2 ("i2c / ACPI: add support for ACPI reconfigure notifications")
Cc: <stable@vger.kernel.org> # v4.8+
---

Notes:
    v4:
      Resolve Build failure noted by:
        Linux Kernel Functional Testing <lkft@linaro.org>, and
        kernel test robot <lkp@intel.com>
      These failures led to revert of the v3 version of this patch that had been accepted earlier.
    v3:
      Add reviewed by tags (Mika Westerberg and Andi Shyti) and Fixes tag.
    v2:
      Moved long problem description from cover letter to commit description at Mika's suggestion

 drivers/i2c/i2c-core-acpi.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Comments

Hamish Martin April 3, 2024, 2:10 a.m. UTC | #1
On Wed, 2024-03-13 at 11:16 +1300, Hamish Martin wrote:
> There is an issue with ACPI overlay table removal specifically
> related
> to I2C multiplexers.
> 
> Consider an ACPI SSDT Overlay that defines a PCA9548 I2C mux on an
> existing I2C bus. When this table is loaded we see the creation of a
> device for the overall PCA9548 chip and 8 further devices - one
> i2c_adapter each for the mux channels. These are all bound to their
> ACPI equivalents via an eventual invocation of acpi_bind_one().
> 
> When we unload the SSDT overlay we run into the problem. The ACPI
> devices are deleted as normal via acpi_device_del_work_fn() and the
> acpi_device_del_list.
> 
> However, the following warning and stack trace is output as the
> deletion does not go smoothly:
> ------------[ cut here ]------------
> kernfs: can not remove 'physical_node', no directory
> WARNING: CPU: 1 PID: 11 at fs/kernfs/dir.c:1674
> kernfs_remove_by_name_ns+0xb9/0xc0
> Modules linked in:
> CPU: 1 PID: 11 Comm: kworker/u128:0 Not tainted 6.8.0-rc6+ #1
> Hardware name: congatec AG conga-B7E3/conga-B7E3, BIOS 5.13
> 05/16/2023
> Workqueue: kacpi_hotplug acpi_device_del_work_fn
> RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0
> Code: e4 00 48 89 ef e8 07 71 db ff 5b b8 fe ff ff ff 5d 41 5c 41 5d
> e9 a7 55 e4 00 0f 0b eb a6 48 c7 c7 f0 38 0d 9d e8 97 0a d5 ff <0f>
> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> RSP: 0018:ffff9f864008fb28 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: ffff8ef90a8d4940 RCX: 0000000000000000
> RDX: ffff8f000e267d10 RSI: ffff8f000e25c780 RDI: ffff8f000e25c780
> RBP: ffff8ef9186f9870 R08: 0000000000013ffb R09: 00000000ffffbfff
> R10: 00000000ffffbfff R11: ffff8f000e0a0000 R12: ffff9f864008fb50
> R13: ffff8ef90c93dd60 R14: ffff8ef9010d0958 R15: ffff8ef9186f98c8
> FS:  0000000000000000(0000) GS:ffff8f000e240000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f48f5253a08 CR3: 00000003cb82e000 CR4: 00000000003506f0
> Call Trace:
>  <TASK>
>  ? kernfs_remove_by_name_ns+0xb9/0xc0
>  ? __warn+0x7c/0x130
>  ? kernfs_remove_by_name_ns+0xb9/0xc0
>  ? report_bug+0x171/0x1a0
>  ? handle_bug+0x3c/0x70
>  ? exc_invalid_op+0x17/0x70
>  ? asm_exc_invalid_op+0x1a/0x20
>  ? kernfs_remove_by_name_ns+0xb9/0xc0
>  ? kernfs_remove_by_name_ns+0xb9/0xc0
>  acpi_unbind_one+0x108/0x180
>  device_del+0x18b/0x490
>  ? srso_return_thunk+0x5/0x5f
>  ? srso_return_thunk+0x5/0x5f
>  device_unregister+0xd/0x30
>  i2c_del_adapter.part.0+0x1bf/0x250
>  i2c_mux_del_adapters+0xa1/0xe0
>  i2c_device_remove+0x1e/0x80
>  device_release_driver_internal+0x19a/0x200
>  bus_remove_device+0xbf/0x100
>  device_del+0x157/0x490
>  ? __pfx_device_match_fwnode+0x10/0x10
>  ? srso_return_thunk+0x5/0x5f
>  device_unregister+0xd/0x30
>  i2c_acpi_notify+0x10f/0x140
>  notifier_call_chain+0x58/0xd0
>  blocking_notifier_call_chain+0x3a/0x60
>  acpi_device_del_work_fn+0x85/0x1d0
>  process_one_work+0x134/0x2f0
>  worker_thread+0x2f0/0x410
>  ? __pfx_worker_thread+0x10/0x10
>  kthread+0xe3/0x110
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork+0x2f/0x50
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork_asm+0x1b/0x30
>  </TASK>
> ---[ end trace 0000000000000000 ]---
> ...
> repeated 7 more times, 1 for each channel of the mux
> ...
> 
> The issue is that the binding of the ACPI devices to their peer I2C
> adapters is not correctly cleaned up. Digging deeper into the issue
> we
> see that the deletion order is such that the ACPI devices matching
> the
> mux channel i2c adapters are deleted first during the SSDT overlay
> removal. For each of the channels we see a call to i2c_acpi_notify()
> with ACPI_RECONFIG_DEVICE_REMOVE but, because these devices are not
> actually i2c_clients, nothing is done for them.
> 
> Later on, after each of the mux channels has been dealt with, we come
> to delete the i2c_client representing the PCA9548 device. This is the
> call stack we see above, whereby the kernel cleans up the i2c_client
> including destruction of the mux and its channel adapters. At this
> point we do attempt to unbind from the ACPI peers but those peers no
> longer exist and so we hit the kernfs errors.
> 
> The fix is to augment i2c_acpi_notify() to handle i2c_adapters. But,
> given that the life cycle of the adapters is linked to the
> i2c_client,
> instead of deleting the i2c_adapters during the i2c_acpi_notify(), we
> just trigger unbinding of the ACPI device from the adapter device,
> and
> allow the clean up of the adapter to continue in the way it always
> has.
> 
> Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
> Fixes: 525e6fabeae2 ("i2c / ACPI: add support for ACPI reconfigure
> notifications")
> Cc: <stable@vger.kernel.org> # v4.8+
> ---
> 
> Notes:
>     v4:
>       Resolve Build failure noted by:
>         Linux Kernel Functional Testing <lkft@linaro.org>, and
>         kernel test robot <lkp@intel.com>
>       These failures led to revert of the v3 version of this patch
> that had been accepted earlier.
>     v3:
>       Add reviewed by tags (Mika Westerberg and Andi Shyti) and Fixes
> tag.
>     v2:
>       Moved long problem description from cover letter to commit
> description at Mika's suggestion
> 
>  drivers/i2c/i2c-core-acpi.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-
> acpi.c
> index d6037a328669..14ae0cfc325e 100644
> --- a/drivers/i2c/i2c-core-acpi.c
> +++ b/drivers/i2c/i2c-core-acpi.c
> @@ -445,6 +445,11 @@ static struct i2c_client
> *i2c_acpi_find_client_by_adev(struct acpi_device *adev)
>         return i2c_find_device_by_fwnode(acpi_fwnode_handle(adev));
>  }
>  
> +static struct i2c_adapter *i2c_acpi_find_adapter_by_adev(struct
> acpi_device *adev)
> +{
> +       return i2c_find_adapter_by_fwnode(acpi_fwnode_handle(adev));
> +}
> +
>  static int i2c_acpi_notify(struct notifier_block *nb, unsigned long
> value,
>                            void *arg)
>  {
> @@ -471,11 +476,17 @@ static int i2c_acpi_notify(struct
> notifier_block *nb, unsigned long value,
>                         break;
>  
>                 client = i2c_acpi_find_client_by_adev(adev);
> -               if (!client)
> -                       break;
> +               if (client) {
> +                       i2c_unregister_device(client);
> +                       put_device(&client->dev);
> +               }
> +
> +               adapter = i2c_acpi_find_adapter_by_adev(adev);
> +               if (adapter) {
> +                       acpi_unbind_one(&adapter->dev);
> +                       put_device(&adapter->dev);
> +               }
>  
> -               i2c_unregister_device(client);
> -               put_device(&client->dev);
>                 break;
>         }
>  

I wonder if this patch has slipped through the net due to a process
error on my part.

The v3 version was accepted by Wolfram into for-current on March 4th,
but a build issue was detected by a couple of test robots, leading to
it being reverted 3 days later.
I submitted this corrected v4 version on March 13th. Perhaps I should
have not submitted a v4 and that has screwed something up, or perhaps
folks are just busy ;-).

Let me know what further work is required to get this one accepted.

Thanks,
Hamish M.
diff mbox series

Patch

diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-acpi.c
index d6037a328669..14ae0cfc325e 100644
--- a/drivers/i2c/i2c-core-acpi.c
+++ b/drivers/i2c/i2c-core-acpi.c
@@ -445,6 +445,11 @@  static struct i2c_client *i2c_acpi_find_client_by_adev(struct acpi_device *adev)
 	return i2c_find_device_by_fwnode(acpi_fwnode_handle(adev));
 }
 
+static struct i2c_adapter *i2c_acpi_find_adapter_by_adev(struct acpi_device *adev)
+{
+	return i2c_find_adapter_by_fwnode(acpi_fwnode_handle(adev));
+}
+
 static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
 			   void *arg)
 {
@@ -471,11 +476,17 @@  static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
 			break;
 
 		client = i2c_acpi_find_client_by_adev(adev);
-		if (!client)
-			break;
+		if (client) {
+			i2c_unregister_device(client);
+			put_device(&client->dev);
+		}
+
+		adapter = i2c_acpi_find_adapter_by_adev(adev);
+		if (adapter) {
+			acpi_unbind_one(&adapter->dev);
+			put_device(&adapter->dev);
+		}
 
-		i2c_unregister_device(client);
-		put_device(&client->dev);
 		break;
 	}