Message ID | 20190614071643.18607-1-mine260309@gmail.com |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | [linux,dev-5.1] fsi: core: Fix NULL dereference issue | expand |
Hi Lei, > The failure case in fsi_slave_init() is wrong and could cause NULL > dereference issue. > E.g. on FP5280G2 machine, it could get failure in > fsi_slave_set_smode(), > and when it does fsi rescan, kernel crashes due to: > > Unable to handle kernel NULL pointer dereference at virtual > address 00000060 > > The fix is to make it not calling kfree() but just goto err_free. > > However, in err_free, it calls put_device() to free the device, it > still > cause issue during fsi rescan, that the device is used after freed. > > WARNING: CPU: 0 PID: 1433 at lib/refcount.c:190 > refcount_sub_and_test_checked+0x94/0xac > refcount_t: underflow; use-after-free. > > So the put_device() is removed and "err_free" label is renamed to > "fail". It looks like this will leak memory (through the struct fsi_slave) that has been kzalloc()ed. After device_register, we need to call put_device() to free the struct fsi_slave, but there's no mechanism for that to happen if we remove it from fsi_slave_init(). The error paths for this function do need to be fixed, but I don't think this is the right approach. Do you have a backtrace of the refcount_sub_and_test_checked+0x94/0xac warning? This may not be the actual struct device that underflows. Cheers, Jeremy
On Wed, Jun 26, 2019 at 9:33 AM Jeremy Kerr <jk@ozlabs.org> wrote: > > Hi Lei, > > > The failure case in fsi_slave_init() is wrong and could cause NULL > > dereference issue. > > E.g. on FP5280G2 machine, it could get failure in > > fsi_slave_set_smode(), > > and when it does fsi rescan, kernel crashes due to: > > > > Unable to handle kernel NULL pointer dereference at virtual > > address 00000060 > > > > The fix is to make it not calling kfree() but just goto err_free. > > > > However, in err_free, it calls put_device() to free the device, it > > still > > cause issue during fsi rescan, that the device is used after freed. > > > > WARNING: CPU: 0 PID: 1433 at lib/refcount.c:190 > > refcount_sub_and_test_checked+0x94/0xac > > refcount_t: underflow; use-after-free. > > > > So the put_device() is removed and "err_free" label is renamed to > > "fail". > > It looks like this will leak memory (through the struct fsi_slave) that > has been kzalloc()ed. After device_register, we need to call > put_device() to free the struct fsi_slave, but there's no mechanism for > that to happen if we remove it from fsi_slave_init(). The memory is "leaked" in this function, that the slave device is not freed here. But eventually, it will be freed in fsi_slave_release() (if I understand the code correctly), so there is no leak, eventually. > The error paths for this function do need to be fixed, but I don't think > this is the right approach. > > Do you have a backtrace of the refcount_sub_and_test_checked+0x94/0xac > warning? This may not be the actual struct device that underflows. Yes, below is the full trace: kernel: ------------[ cut here ]------------ kernel: WARNING: CPU: 0 PID: 1417 at lib/refcount.c:190 refcount_sub_and_test_checked+0x94/0xac kernel: refcount_t: underflow; use-after-free. kernel: CPU: 0 PID: 1417 Comm: openpower-proc- Tainted: G W 5.1.1-60926c8a1d00a056568cb44da9e96d3ee5f6e654 #1 kernel: Hardware name: Generic DT based system kernel: Backtrace: kernel: [<80107ca0>] (dump_backtrace) from [<80107ed4>] (show_stack+0x20/0x24) kernel: r7:803629e8 r6:00000009 r5:00000000 r4:95937d14 kernel: [<80107eb4>] (show_stack) from [<8067e944>] (dump_stack+0x20/0x28) kernel: [<8067e924>] (dump_stack) from [<80115ff8>] (__warn.part.3+0xb4/0xdc) kernel: [<80115f44>] (__warn.part.3) from [<8011608c>] (warn_slowpath_fmt+0x6c/0x90) kernel: r6:000000be r5:807ff14c r4:80a07008 kernel: [<80116024>] (warn_slowpath_fmt) from [<803629e8>] (refcount_sub_and_test_checked+0x94/0xac) kernel: r3:80a4cbda r2:807ff1b8 kernel: r7:80410c1c r6:94506710 r5:95937ddc r4:00000000 kernel: [<80362954>] (refcount_sub_and_test_checked) from [<80362a18>] (refcount_dec_and_test_checked+0x18/0x1c) kernel: r5:95937ddc r4:94503c08 kernel: [<80362a00>] (refcount_dec_and_test_checked) from [<806836a8>] (kobject_put+0x2c/0x1e8) kernel: [<8068367c>] (kobject_put) from [<80410c44>] (klist_children_put+0x28/0x2c) kernel: r6:94506710 r5:95937ddc r4:945060fc kernel: [<80410c1c>] (klist_children_put) from [<80683130>] (klist_next+0x8c/0xe0) kernel: [<806830a4>] (klist_next) from [<80412548>] (device_for_each_child+0x4c/0xa4) kernel: r10:00000051 r9:95937f60 r8:95075030 r7:80a07008 r6:80525804 r5:00000000 kernel: r4:00000000 r3:db89ad67 kernel: [<804124fc>] (device_for_each_child) from [<80525910>] (fsi_master_rescan+0x40/0x68) kernel: r7:959293a0 r6:95075020 r5:94503afc r4:94503a00 kernel: [<805258d0>] (fsi_master_rescan) from [<8052562c>] (master_rescan_store+0x1c/0x28) kernel: r5:00000000 r4:00000001 kernel: [<80525610>] (master_rescan_store) from [<80410354>] (dev_attr_store+0x28/0x34) kernel: r5:00000000 r4:80525610 kernel: [<8041032c>] (dev_attr_store) from [<802acac8>] (sysfs_kf_write+0x48/0x54) kernel: r5:00000000 r4:8041032c kernel: [<802aca80>] (sysfs_kf_write) from [<802abef8>] (kernfs_fop_write+0x10c/0x1ec) kernel: r5:00000000 r4:00000000 kernel: [<802abdec>] (kernfs_fop_write) from [<80237e80>] (__vfs_write+0x44/0x18c) kernel: r10:95937f60 r9:00000001 r8:00000000 r7:95937f60 r6:802abdec r5:80a07008 kernel: r4:95930f40 kernel: [<80237e3c>] (__vfs_write) from [<80239864>] (vfs_write+0xb0/0x194) kernel: r10:00000000 r9:00000000 r8:00000000 r7:95937f60 r6:0101a3d0 r5:95930f40 kernel: r4:00000001 kernel: [<802397b4>] (vfs_write) from [<80239b10>] (ksys_write+0x70/0xe8) kernel: r8:00000000 r7:0101a3d0 r6:80a07008 r5:95930f40 r4:95930f40 kernel: [<80239aa0>] (ksys_write) from [<80239ba0>] (sys_write+0x18/0x1c) kernel: r9:95936000 r8:801011e4 r7:00000004 r6:00000003 r5:0101a3d0 r4:00000001 kernel: [<80239b88>] (sys_write) from [<80101000>] (ret_fast_syscall+0x0/0x54) kernel: Exception stack(0x95937fa8 to 0x95937ff0) kernel: 7fa0: 00000001 0101a3d0 00000003 0101a3d0 00000001 00000000 kernel: 7fc0: 00000001 0101a3d0 00000003 00000004 0006ac0c 0006ac20 7e994c88 7e994a1c kernel: 7fe0: 00000070 7e9949a8 48a884ac 48714fec kernel: ---[ end trace 366cbd7a56b7dc9f ]---
On 06/26/2019 around 02:39AM in some timezone, Lei YU wrote: >On Wed, Jun 26, 2019 at 9:33 AM Jeremy Kerr <jk@ozlabs.org> wrote: >> >> Hi Lei, >> >> > The failure case in fsi_slave_init() is wrong and could cause >NULL >> > dereference issue. >> > E.g. on FP5280G2 machine, it could get failure in >> > fsi_slave_set_smode(), >> > and when it does fsi rescan, kernel crashes due to: >> > >> > Unable to handle kernel NULL pointer dereference at virtual >> > address 00000060 >> > >> > The fix is to make it not calling kfree() but just goto err_free. >> > >> > However, in err_free, it calls put_device() to free the device, >it >> > still >> > cause issue during fsi rescan, that the device is used after >freed. >> > >> > WARNING: CPU: 0 PID: 1433 at lib/refcount.c:190 >> > refcount_sub_and_test_checked+0x94/0xac >> > refcount_t: underflow; use-after-free. >> > >> > So the put_device() is removed and "err_free" label is renamed to >> > "fail". >> >> It looks like this will leak memory (through the struct fsi_slave) >that >> has been kzalloc()ed. After device_register, we need to call >> put_device() to free the struct fsi_slave, but there's no mechanism >for >> that to happen if we remove it from fsi_slave_init(). > >The memory is "leaked" in this function, that the slave device is not >freed >here. But eventually, it will be freed in fsi_slave_release() (if I >understand >the code correctly), so there is no leak, eventually. This is definitely bad. In other paths the struct device for the fsi-slave might be kept beyond the lifetime of the fsi-master slave device, which would lead to use after free. Each kobject must have its own release method and be allocated via a separate kmalloc. > >> The error paths for this function do need to be fixed, but I don't >think >> this is the right approach. >> >> Do you have a backtrace of the >refcount_sub_and_test_checked+0x94/0xac >> warning? This may not be the actual struct device that underflows. > >Yes, below is the full trace: This is just one possibility. BTW, there doesn't seem to be any MAINTAINERS entry for drivers/fsi Milton
Hi Lei > > > > It looks like this will leak memory (through the struct fsi_slave) > > that > > has been kzalloc()ed. After device_register, we need to call > > put_device() to free the struct fsi_slave, but there's no mechanism > > for > > that to happen if we remove it from fsi_slave_init(). > > The memory is "leaked" in this function, that the slave device is not > freed > here. But eventually, it will be freed in fsi_slave_release() (if I > understand > the code correctly), so there is no leak, eventually. But there's no way for fsi_slave_release() to be called, as the device isn't registered with the core (d1dcd67825 effectively removed the device_add from the slave device init). I think that the BUG_ON that we're hitting in the release path is through the cdev parent release, which brings the refcount down to 0, then we do another put_device(). This could just be a matter of correcting the registration of the slave to the device core - I'll take a look here. Cheers, Jeremy
diff --git a/drivers/fsi/fsi-core.c b/drivers/fsi/fsi-core.c index 2c31563fdcae..ebede90cf4bd 100644 --- a/drivers/fsi/fsi-core.c +++ b/drivers/fsi/fsi-core.c @@ -1041,14 +1041,14 @@ static int fsi_slave_init(struct fsi_master *master, int link, uint8_t id) rc = __fsi_get_new_minor(slave, fsi_dev_cfam, &slave->dev.devt, &slave->cdev_idx); if (rc) - goto err_free; + goto fail; /* Create chardev for userspace access */ cdev_init(&slave->cdev, &cfam_fops); rc = cdev_device_add(&slave->cdev, &slave->dev); if (rc) { dev_err(&slave->dev, "Error %d creating slave device\n", rc); - goto err_free; + goto fail; } rc = fsi_slave_set_smode(slave); @@ -1056,8 +1056,8 @@ static int fsi_slave_init(struct fsi_master *master, int link, uint8_t id) dev_warn(&master->dev, "can't set smode on slave:%02x:%02x %d\n", link, id, rc); - kfree(slave); - return -ENODEV; + rc = -ENODEV; + goto fail; } if (master->link_config) master->link_config(master, link, @@ -1075,10 +1075,7 @@ static int fsi_slave_init(struct fsi_master *master, int link, uint8_t id) dev_dbg(&master->dev, "failed during slave scan with: %d\n", rc); - return rc; - - err_free: - put_device(&slave->dev); + fail: return rc; }
The failure case in fsi_slave_init() is wrong and could cause NULL dereference issue. E.g. on FP5280G2 machine, it could get failure in fsi_slave_set_smode(), and when it does fsi rescan, kernel crashes due to: Unable to handle kernel NULL pointer dereference at virtual address 00000060 The fix is to make it not calling kfree() but just goto err_free. However, in err_free, it calls put_device() to free the device, it still cause issue during fsi rescan, that the device is used after freed. WARNING: CPU: 0 PID: 1433 at lib/refcount.c:190 refcount_sub_and_test_checked+0x94/0xac refcount_t: underflow; use-after-free. So the put_device() is removed and "err_free" label is renamed to "fail". The slave device is freed by put_device() in fsi_master_remove_slave(). Signed-off-by: Lei YU <mine260309@gmail.com> --- drivers/fsi/fsi-core.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-)