mbox series

[SRU,Impish,0/1] amd_sfh: Null pointer dereference on early device init causes early panic and fails to boot

Message ID 20220112234115.11152-1-matthew.ruffell@canonical.com
Headers show
Series amd_sfh: Null pointer dereference on early device init causes early panic and fails to boot | expand

Message

Matthew Ruffell Jan. 12, 2022, 11:41 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1956519

[Impact]

A regression was introduced into 5.13.0-23-generic for devices using AMD Ryzen 
chipsets that incorporate AMD Sensor Fusion Hub (SFH) HID devices, which are
mostly Ryzen based laptops, but desktops do have the SOC embedded as well.

On early boot, when the driver initialises the device, it hits a null pointer
dereference with the following stack trace:

BUG: kernel NULL pointer dereference, address: 000000000000000c
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
CPU: 0 PID: 175 Comm: systemd-udevd Not tainted 5.13.0-23-generic #23-Ubuntu
RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
Call Trace:
  ? __pci_set_master+0x5f/0xe0
  amd_mp2_pci_probe+0xad/0x160 [amd_sfh]
  local_pci_probe+0x48/0x80
  pci_device_probe+0x105/0x1c0
  really_probe+0x24b/0x4c0
  driver_probe_device+0xf0/0x160
  device_driver_attach+0xab/0xb0
  __driver_attach+0xb2/0x140
  ? device_driver_attach+0xb0/0xb0
  bus_for_each_dev+0x7e/0xc0
  driver_attach+0x1e/0x20
  bus_add_driver+0x135/0x1f0
  driver_register+0x95/0xf0
  ? 0xffffffffc03d2000
  __pci_register_driver+0x57/0x60
  amd_mp2_pci_driver_init+0x23/0x1000 [amd_sfh]
  do_one_initcall+0x48/0x1d0
  ? kmem_cache_alloc_trace+0xfb/0x240
  do_init_module+0x62/0x290
  load_module+0xa8f/0xb10
  __do_sys_finit_module+0xc2/0x120
  __x64_sys_finit_module+0x18/0x20
  do_syscall_64+0x61/0xb0
  ? ksys_mmap_pgoff+0x135/0x260
  ? exit_to_user_mode_prepare+0x37/0xb0
  ? syscall_exit_to_user_mode+0x27/0x50
  ? __x64_sys_mmap+0x33/0x40
  ? do_syscall_64+0x6e/0xb0
  ? do_syscall_64+0x6e/0xb0
  ? do_syscall_64+0x6e/0xb0
  ? syscall_exit_to_user_mode+0x27/0x50
  ? do_syscall_64+0x6e/0xb0
  ? exc_page_fault+0x8f/0x170
  ? asm_exc_page_fault+0x8/0x30
  entry_SYSCALL_64_after_hwframe+0x44/0xae

This causes a panic and the system is unable to continue booting, and the user
must select an older kernel to boot.

[Fix]

The issue was introduced in 5.13.0-23-generic by the commit:

commit d46ef750ed58cbeeba2d9a55c99231c30a172764
commit-impish 56559d7910e704470ad72da58469b5588e8cbf85
Author: Evgeny Novikov <novikov@ispras.ru>
Date: Tue Jun 1 19:38:01 2021 +0300
Subject:HID: amd_sfh: Fix potential NULL pointer dereference
Link: https://github.com/torvalds/linux/commit/d46ef750ed58cbeeba2d9a55c99231c30a172764

The issue is pretty straightforward, amd_sfh_client.c attempts to dereference
cl_data, but it is NULL:

$ eu-addr2line -ifae ./usr/lib/debug/lib/modules/5.13.0-23-generic/kernel/drivers/hid/amd-sfh-hid/amd_sfh.ko amd_sfh_hid_client_init+0x47
0x0000000000000767
amd_sfh_hid_client_init
/build/linux-k2e9CH/linux-5.13.0/drivers/hid/amd-sfh-hid/amd_sfh_client.c:147:27

134 int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
135 {
...
146
147 cl_data->num_hid_devices = amd_mp2_get_sensor_num(privdata, &cl_data->sensor_idx[0]);
148
...

The patch moves the call to amd_sfh_hid_client_init() before privdata->cl_data
is actually allocated by devm_kzalloc, hence cl_data being NULL.

+ rc = amd_sfh_hid_client_init(privdata);
+ if (rc)
+ return rc;
+
        privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL);
        if (!privdata->cl_data)
                return -ENOMEM;
...
- return amd_sfh_hid_client_init(privdata);
+ return 0;

The issue was fixed upstream in 5.15-rc4 by the commit:

commit 88a04049c08cd62e698bc1b1af2d09574b9e0aee
Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Date: Thu Sep 23 17:59:27 2021 +0530
Subject: HID: amd_sfh: Fix potential NULL pointer dereference
Link: https://github.com/torvalds/linux/commit/88a04049c08cd62e698bc1b1af2d09574b9e0aee

The fix places the call to amd_sfh_hid_client_init() after privdata->cl_data is
allocated, and it changes the order of amd_sfh_hid_client_init() to happen
before devm_add_action_or_reset() fixing the actual null pointer dereference
which caused these commits to exist.

This patch also landed in 5.14.10 -stable, but it seems it was omitted from
being backported to impish, likely due to it sharing the exact same subject line
as the regression commit, so it was likely dropped as a duplicate?

[Testcase]

You need an AMD Ryzen based system that has a AMD Sensor Fusion Hub HID device
built in to test this.

Simply booting the system is enough to trigger the issue.

A test kernel is available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp1956519-test

A community user has tested the test kernel, and has confirmed that it fixes the
issue.

[Where problems could occur]

If a regression were to occur, it would only affect AMD Ryzen based systems with
the AMD Sensor Fusion Hub HID device SOC. Since the changes affect the device
initialisation function, a regression could cause systems to panic during boot,
forcing users to revert to older kernels to start their systems.

Saying that, the patch is present in 5.15-rc4 and is in 5.14.10, and is in
widespread use, and is already present in Jammy.

Basavaraj Natikar (1):
  HID: amd_sfh: Fix potential NULL pointer dereference

 drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

Comments

Kleber Sacilotto de Souza Jan. 13, 2022, 11:22 a.m. UTC | #1
On 13.01.22 00:41, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1956519
>
> [Impact]
>
> A regression was introduced into 5.13.0-23-generic for devices using AMD Ryzen
> chipsets that incorporate AMD Sensor Fusion Hub (SFH) HID devices, which are
> mostly Ryzen based laptops, but desktops do have the SOC embedded as well.
>
> On early boot, when the driver initialises the device, it hits a null pointer
> dereference with the following stack trace:
>
> BUG: kernel NULL pointer dereference, address: 000000000000000c
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 0 P4D 0
> Oops: 0002 [#1] SMP NOPTI
> CPU: 0 PID: 175 Comm: systemd-udevd Not tainted 5.13.0-23-generic #23-Ubuntu
> RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
> Call Trace:
>    ? __pci_set_master+0x5f/0xe0
>    amd_mp2_pci_probe+0xad/0x160 [amd_sfh]
>    local_pci_probe+0x48/0x80
>    pci_device_probe+0x105/0x1c0
>    really_probe+0x24b/0x4c0
>    driver_probe_device+0xf0/0x160
>    device_driver_attach+0xab/0xb0
>    __driver_attach+0xb2/0x140
>    ? device_driver_attach+0xb0/0xb0
>    bus_for_each_dev+0x7e/0xc0
>    driver_attach+0x1e/0x20
>    bus_add_driver+0x135/0x1f0
>    driver_register+0x95/0xf0
>    ? 0xffffffffc03d2000
>    __pci_register_driver+0x57/0x60
>    amd_mp2_pci_driver_init+0x23/0x1000 [amd_sfh]
>    do_one_initcall+0x48/0x1d0
>    ? kmem_cache_alloc_trace+0xfb/0x240
>    do_init_module+0x62/0x290
>    load_module+0xa8f/0xb10
>    __do_sys_finit_module+0xc2/0x120
>    __x64_sys_finit_module+0x18/0x20
>    do_syscall_64+0x61/0xb0
>    ? ksys_mmap_pgoff+0x135/0x260
>    ? exit_to_user_mode_prepare+0x37/0xb0
>    ? syscall_exit_to_user_mode+0x27/0x50
>    ? __x64_sys_mmap+0x33/0x40
>    ? do_syscall_64+0x6e/0xb0
>    ? do_syscall_64+0x6e/0xb0
>    ? do_syscall_64+0x6e/0xb0
>    ? syscall_exit_to_user_mode+0x27/0x50
>    ? do_syscall_64+0x6e/0xb0
>    ? exc_page_fault+0x8f/0x170
>    ? asm_exc_page_fault+0x8/0x30
>    entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> This causes a panic and the system is unable to continue booting, and the user
> must select an older kernel to boot.
>
> [Fix]
>
> The issue was introduced in 5.13.0-23-generic by the commit:
>
> commit d46ef750ed58cbeeba2d9a55c99231c30a172764
> commit-impish 56559d7910e704470ad72da58469b5588e8cbf85
> Author: Evgeny Novikov <novikov@ispras.ru>
> Date: Tue Jun 1 19:38:01 2021 +0300
> Subject:HID: amd_sfh: Fix potential NULL pointer dereference
> Link: https://github.com/torvalds/linux/commit/d46ef750ed58cbeeba2d9a55c99231c30a172764
>
> The issue is pretty straightforward, amd_sfh_client.c attempts to dereference
> cl_data, but it is NULL:
>
> $ eu-addr2line -ifae ./usr/lib/debug/lib/modules/5.13.0-23-generic/kernel/drivers/hid/amd-sfh-hid/amd_sfh.ko amd_sfh_hid_client_init+0x47
> 0x0000000000000767
> amd_sfh_hid_client_init
> /build/linux-k2e9CH/linux-5.13.0/drivers/hid/amd-sfh-hid/amd_sfh_client.c:147:27
>
> 134 int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
> 135 {
> ...
> 146
> 147 cl_data->num_hid_devices = amd_mp2_get_sensor_num(privdata, &cl_data->sensor_idx[0]);
> 148
> ...
>
> The patch moves the call to amd_sfh_hid_client_init() before privdata->cl_data
> is actually allocated by devm_kzalloc, hence cl_data being NULL.
>
> + rc = amd_sfh_hid_client_init(privdata);
> + if (rc)
> + return rc;
> +
>          privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL);
>          if (!privdata->cl_data)
>                  return -ENOMEM;
> ...
> - return amd_sfh_hid_client_init(privdata);
> + return 0;
>
> The issue was fixed upstream in 5.15-rc4 by the commit:
>
> commit 88a04049c08cd62e698bc1b1af2d09574b9e0aee
> Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> Date: Thu Sep 23 17:59:27 2021 +0530
> Subject: HID: amd_sfh: Fix potential NULL pointer dereference
> Link: https://github.com/torvalds/linux/commit/88a04049c08cd62e698bc1b1af2d09574b9e0aee
>
> The fix places the call to amd_sfh_hid_client_init() after privdata->cl_data is
> allocated, and it changes the order of amd_sfh_hid_client_init() to happen
> before devm_add_action_or_reset() fixing the actual null pointer dereference
> which caused these commits to exist.
>
> This patch also landed in 5.14.10 -stable, but it seems it was omitted from
> being backported to impish, likely due to it sharing the exact same subject line
> as the regression commit, so it was likely dropped as a duplicate?
>
> [Testcase]
>
> You need an AMD Ryzen based system that has a AMD Sensor Fusion Hub HID device
> built in to test this.
>
> Simply booting the system is enough to trigger the issue.
>
> A test kernel is available in the following ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/lp1956519-test
>
> A community user has tested the test kernel, and has confirmed that it fixes the
> issue.
>
> [Where problems could occur]
>
> If a regression were to occur, it would only affect AMD Ryzen based systems with
> the AMD Sensor Fusion Hub HID device SOC. Since the changes affect the device
> initialisation function, a regression could cause systems to panic during boot,
> forcing users to revert to older kernels to start their systems.
>
> Saying that, the patch is present in 5.15-rc4 and is in 5.14.10, and is in
> widespread use, and is already present in Jammy.
>
> Basavaraj Natikar (1):
>    HID: amd_sfh: Fix potential NULL pointer dereference
>
>   drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 12 ++++--------
>   1 file changed, 4 insertions(+), 8 deletions(-)
>

Applied to impish:linux.

Thanks,
Kleber