Message ID | 20220809075753.21950-1-liwei.song@windriver.com |
---|---|
State | Changes Requested |
Headers | show |
Series | mtd_blkdevs: add mtd_table_mutex lock back to blktrans_{open, release} to avoid race condition | expand |
On Tue, Aug 09, 2022 at 03:57:53PM +0800, Liwei Song wrote: > without lock mtd_table_mutex in blktrans_{open, release}, there will > be a race condition when access devices /dev/mtd1 and /dev/mtdblock1 > at the same time with a high frequency open and close test: > > kernel BUG at drivers/mtd/mtdcore.c:1221! > lr : blktrans_release+0xb0/0x120 This does not seem on a current mainline kernel and seems to be a somewhat incomplete backtrace. Can you send the full dmesg of a latest mainline run and maybe share the reproducer? > diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c > index b8ae1ec14e17..147e4a11dfe4 100644 > --- a/drivers/mtd/mtd_blkdevs.c > +++ b/drivers/mtd/mtd_blkdevs.c > @@ -188,6 +188,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) > > kref_get(&dev->ref); > > + if (!mutex_trylock(&mtd_table_mutex)) > + return ret; No, that's not really the solution. Turning the kref_get above into a kref_get_unless_zero might be better path to look into.
On 8/9/22 16:08, ChristophHellwig wrote: > On Tue, Aug 09, 2022 at 03:57:53PM +0800, Liwei Song wrote: >> without lock mtd_table_mutex in blktrans_{open, release}, there will >> be a race condition when access devices /dev/mtd1 and /dev/mtdblock1 >> at the same time with a high frequency open and close test: >> >> kernel BUG at drivers/mtd/mtdcore.c:1221! >> lr : blktrans_release+0xb0/0x120 > > This does not seem on a current mainline kernel and seems to be > a somewhat incomplete backtrace. Can you send the full dmesg of > a latest mainline run and maybe share the reproducer? Yes, the kernel I used is 5.15, unfortunately this is the latest version that worked on my board, the whole log is: [ 31.301343] ------------[ cut here ]------------ [ 31.301343] ------------[ cut here ]------------ [ 31.301365] kernel BUG at drivers/mtd/mtdcore.c:1221! [ 31.314981] kernel BUG at drivers/mtd/mtdcore.c:1221! [ 31.329328] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 31.374117] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse [ 31.395820] CPU: 2 PID: 390 Comm: a.out Not tainted 5.15.58-yocto-standard #1 [ 31.412672] Hardware name: SoCFPGA Agilex SoCDK (DT) [ 31.427372] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 31.444058] pc : __put_mtd_device+0x4c/0x84 [ 31.457977] lr : put_mtd_device+0x3c/0x5c [ 31.464122] sp : ffff80000c26bc50 [ 31.466126] x29: ffff80000c26bc50 x28: ffff000285785100 x27: 0000000000000000 [ 31.471945] x26: 0000000045585401 x25: 0000000000000000 x24: ffff000285785768 [ 31.477762] x23: ffff0002803ee520 x22: ffff0002804f8900 x21: ffff000281956800 [ 31.483580] x20: ffff000281956800 x19: ffff000281955080 x18: 0000000000000000 [ 31.489402] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 31.495219] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 31.501037] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088a79b0 [ 31.506855] x8 : ffff80000c26bcb0 x7 : 0000000000000000 x6 : 0000000000000001 [ 31.512673] x5 : ffff000280959488 x4 : 0000000000000000 x3 : 0000000000000000 [ 31.518491] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080 [ 31.524310] Call trace: [ 31.525450] __put_mtd_device+0x4c/0x84 [ 31.527979] put_mtd_device+0x3c/0x5c [ 31.530333] mtdchar_close+0x3c/0x84 [ 31.532604] __fput+0x78/0x220 [ 31.534357] ____fput+0x1c/0x30 [ 31.536193] task_work_run+0x88/0xc0 [ 31.538467] do_notify_resume+0x384/0x12a0 [ 31.541261] el0_svc+0x6c/0x80 [ 31.543015] el0t_64_sync_handler+0xa4/0x130 [ 31.545977] el0t_64_sync+0x1a0/0x1a4 [ 31.548338] Code: b9448841 51000421 b9048841 36ffff41 (d4210000) [ 31.553115] ---[ end trace 9652b26ea1d7daa1 ]--- [ 31.556420] Internal error: Oops - BUG: 0 [#2] PREEMPT SMP [ 31.556420] note: a.out[390] exited with preempt_count 1 [ 31.560588] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse [ 31.575265] CPU: 3 PID: 391 Comm: a.out Tainted: G D 5.15.58-yocto-standard #1 [ 31.582466] Hardware name: SoCFPGA Agilex SoCDK (DT) [ 31.586115] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 31.591757] pc : __put_mtd_device+0x4c/0x84 [ 31.594642] lr : blktrans_release+0xb0/0x120 [ 31.597603] sp : ffff80000c22bc20 [ 31.599608] x29: ffff80000c22bc20 x28: ffff000285785e80 x27: 0000000000000000 [ 31.605428] x26: 0000000045585401 x25: 0000000000000000 x24: ffff0002857864e8 [ 31.611247] x23: ffff0002803ee520 x22: ffff0002803e3470 x21: ffff0002803e3400 [ 31.617066] x20: ffff0002803e3020 x19: ffff000281955080 x18: 0000000000000000 [ 31.622884] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 31.628702] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 31.634519] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088b0230 [ 31.640337] x8 : ffff80000c22bb90 x7 : 0000000000000000 x6 : 0000000000000001 [ 31.646155] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 31.651973] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080 [ 31.657792] Call trace: [ 31.658933] __put_mtd_device+0x4c/0x84 [ 31.661462] blktrans_release+0xb0/0x120 [ 31.664077] blkdev_put+0xd4/0x210 [ 31.666175] blkdev_close+0x34/0x50 [ 31.668355] __fput+0x78/0x220 [ 31.670108] ____fput+0x1c/0x30 [ 31.671943] task_work_run+0x88/0xc0 [ 31.674217] do_notify_resume+0x384/0x12a0 [ 31.677009] el0_svc+0x6c/0x80 [ 31.678762] el0t_64_sync_handler+0xa4/0x130 [ 31.681723] el0t_64_sync+0x1a0/0x1a4 [ 31.684082] Code: b9448841 51000421 b9048841 36ffff41 (d4210000) [ 31.688857] ---[ end trace 9652b26ea1d7daa2 ]--- [ 31.692161] note: a.out[391] exited with preempt_count 1 the testcase a.out is compiled from below code: when run the case /dev/mtd1 and /dev/mtdblock1 will be used for open and close test. #include <stdio.h> #include <stdlib.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> #include <signal.h> #include <unistd.h> int main(int argc, char *argv[]) { pid_t pid, pid1, pid2; int fd,ret = 0; int status = 0; char device_char[12]="/dev/mtd"; char device_block[17]="/dev/mtdblock"; strcat(device_char, argv[1]); strcat(device_block, argv[1]); pid1 = fork(); if(pid1 == 0){ while(1){ fd = open(device_char, O_SYNC|O_RDWR); ret = close(fd); } } pid2 = fork(); if(pid2 == 0){ while(1){ fd = open(device_block, O_SYNC|O_RDWR); ret = close(fd); } } sleep(10); kill(pid1, SIGKILL); kill(pid2, SIGKILL); pid = wait(&status); pid = wait(&status); return 0; } > >> diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c >> index b8ae1ec14e17..147e4a11dfe4 100644 >> --- a/drivers/mtd/mtd_blkdevs.c >> +++ b/drivers/mtd/mtd_blkdevs.c >> @@ -188,6 +188,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) >> >> kref_get(&dev->ref); >> >> + if (!mutex_trylock(&mtd_table_mutex)) >> + return ret; > > No, that's not really the solution. > > Turning the kref_get above into a kref_get_unless_zero might be better > path to look into. Thanks, I will have a look at this. Liwei. >
On 8/9/22 17:03, Liwei Song wrote: > > > On 8/9/22 16:08, ChristophHellwig wrote: >> On Tue, Aug 09, 2022 at 03:57:53PM +0800, Liwei Song wrote: >>> without lock mtd_table_mutex in blktrans_{open, release}, there will >>> be a race condition when access devices /dev/mtd1 and /dev/mtdblock1 >>> at the same time with a high frequency open and close test: >>> >>> kernel BUG at drivers/mtd/mtdcore.c:1221! >>> lr : blktrans_release+0xb0/0x120 >> >> This does not seem on a current mainline kernel and seems to be >> a somewhat incomplete backtrace. Can you send the full dmesg of >> a latest mainline run and maybe share the reproducer? > > Yes, the kernel I used is 5.15, unfortunately this is the latest version > that worked on my board, the whole log is: > > [ 31.301343] ------------[ cut here ]------------ > [ 31.301343] ------------[ cut here ]------------ > [ 31.301365] kernel BUG at drivers/mtd/mtdcore.c:1221! > [ 31.314981] kernel BUG at drivers/mtd/mtdcore.c:1221! > [ 31.329328] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP > [ 31.374117] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse > [ 31.395820] CPU: 2 PID: 390 Comm: a.out Not tainted 5.15.58-yocto-standard #1 > [ 31.412672] Hardware name: SoCFPGA Agilex SoCDK (DT) > [ 31.427372] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 31.444058] pc : __put_mtd_device+0x4c/0x84 > [ 31.457977] lr : put_mtd_device+0x3c/0x5c > [ 31.464122] sp : ffff80000c26bc50 > [ 31.466126] x29: ffff80000c26bc50 x28: ffff000285785100 x27: 0000000000000000 > [ 31.471945] x26: 0000000045585401 x25: 0000000000000000 x24: ffff000285785768 > [ 31.477762] x23: ffff0002803ee520 x22: ffff0002804f8900 x21: ffff000281956800 > [ 31.483580] x20: ffff000281956800 x19: ffff000281955080 x18: 0000000000000000 > [ 31.489402] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 > [ 31.495219] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 > [ 31.501037] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088a79b0 > [ 31.506855] x8 : ffff80000c26bcb0 x7 : 0000000000000000 x6 : 0000000000000001 > [ 31.512673] x5 : ffff000280959488 x4 : 0000000000000000 x3 : 0000000000000000 > [ 31.518491] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080 > [ 31.524310] Call trace: > [ 31.525450] __put_mtd_device+0x4c/0x84 > [ 31.527979] put_mtd_device+0x3c/0x5c > [ 31.530333] mtdchar_close+0x3c/0x84 > [ 31.532604] __fput+0x78/0x220 > [ 31.534357] ____fput+0x1c/0x30 > [ 31.536193] task_work_run+0x88/0xc0 > [ 31.538467] do_notify_resume+0x384/0x12a0 > [ 31.541261] el0_svc+0x6c/0x80 > [ 31.543015] el0t_64_sync_handler+0xa4/0x130 > [ 31.545977] el0t_64_sync+0x1a0/0x1a4 > [ 31.548338] Code: b9448841 51000421 b9048841 36ffff41 (d4210000) > [ 31.553115] ---[ end trace 9652b26ea1d7daa1 ]--- > [ 31.556420] Internal error: Oops - BUG: 0 [#2] PREEMPT SMP > [ 31.556420] note: a.out[390] exited with preempt_count 1 > [ 31.560588] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse > [ 31.575265] CPU: 3 PID: 391 Comm: a.out Tainted: G D 5.15.58-yocto-standard #1 > [ 31.582466] Hardware name: SoCFPGA Agilex SoCDK (DT) > [ 31.586115] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 31.591757] pc : __put_mtd_device+0x4c/0x84 > [ 31.594642] lr : blktrans_release+0xb0/0x120 > [ 31.597603] sp : ffff80000c22bc20 > [ 31.599608] x29: ffff80000c22bc20 x28: ffff000285785e80 x27: 0000000000000000 > [ 31.605428] x26: 0000000045585401 x25: 0000000000000000 x24: ffff0002857864e8 > [ 31.611247] x23: ffff0002803ee520 x22: ffff0002803e3470 x21: ffff0002803e3400 > [ 31.617066] x20: ffff0002803e3020 x19: ffff000281955080 x18: 0000000000000000 > [ 31.622884] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 > [ 31.628702] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 > [ 31.634519] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088b0230 > [ 31.640337] x8 : ffff80000c22bb90 x7 : 0000000000000000 x6 : 0000000000000001 > [ 31.646155] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 > [ 31.651973] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080 > [ 31.657792] Call trace: > [ 31.658933] __put_mtd_device+0x4c/0x84 > [ 31.661462] blktrans_release+0xb0/0x120 > [ 31.664077] blkdev_put+0xd4/0x210 > [ 31.666175] blkdev_close+0x34/0x50 > [ 31.668355] __fput+0x78/0x220 > [ 31.670108] ____fput+0x1c/0x30 > [ 31.671943] task_work_run+0x88/0xc0 > [ 31.674217] do_notify_resume+0x384/0x12a0 > [ 31.677009] el0_svc+0x6c/0x80 > [ 31.678762] el0t_64_sync_handler+0xa4/0x130 > [ 31.681723] el0t_64_sync+0x1a0/0x1a4 > [ 31.684082] Code: b9448841 51000421 b9048841 36ffff41 (d4210000) > [ 31.688857] ---[ end trace 9652b26ea1d7daa2 ]--- > [ 31.692161] note: a.out[391] exited with preempt_count 1 > > the testcase a.out is compiled from below code: > when run the case /dev/mtd1 and /dev/mtdblock1 will be used for open and close test. > > #include <stdio.h> > #include <stdlib.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <fcntl.h> > #include <string.h> > #include <signal.h> > #include <unistd.h> > > int main(int argc, char *argv[]) > { > pid_t pid, pid1, pid2; > int fd,ret = 0; > int status = 0; > char device_char[12]="/dev/mtd"; > char device_block[17]="/dev/mtdblock"; > > strcat(device_char, argv[1]); > strcat(device_block, argv[1]); > > pid1 = fork(); > if(pid1 == 0){ > while(1){ > fd = open(device_char, O_SYNC|O_RDWR); > ret = close(fd); > } > } > pid2 = fork(); > if(pid2 == 0){ > while(1){ > fd = open(device_block, O_SYNC|O_RDWR); > ret = close(fd); > } > } > sleep(10); > kill(pid1, SIGKILL); > kill(pid2, SIGKILL); > pid = wait(&status); > pid = wait(&status); > return 0; > } > >> >>> diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c >>> index b8ae1ec14e17..147e4a11dfe4 100644 >>> --- a/drivers/mtd/mtd_blkdevs.c >>> +++ b/drivers/mtd/mtd_blkdevs.c >>> @@ -188,6 +188,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) >>> >>> kref_get(&dev->ref); >>> >>> + if (!mutex_trylock(&mtd_table_mutex)) >>> + return ret; >> >> No, that's not really the solution. >> >> Turning the kref_get above into a kref_get_unless_zero might be better >> path to look into. > > Thanks, I will have a look at this. Hi Christoph, It seems this way can not stop the race to decrease/increase mtd->usecount, the race condition is between mtdchar_{open, close}()->(get)put_mtd_device()->__(get)put_mtd_device() and blktrans_{open,release}()-> __(get)put_mtd_device(), when operate the same device as char device(/dev/mtd1) and block device(/dev/mtdblock1), the original fix for this issue is 073db4a51ee4 ("mtd: fix: avoid race condition when accessing mtd->usecount"), Could you give some suggestion about this? Thanks, Liwei. > > Liwei. > > >>
On 8/10/22 15:47, Liwei Song wrote: > > > On 8/9/22 17:03, Liwei Song wrote: >> >> >> On 8/9/22 16:08, ChristophHellwig wrote: >>> On Tue, Aug 09, 2022 at 03:57:53PM +0800, Liwei Song wrote: >>>> without lock mtd_table_mutex in blktrans_{open, release}, there will >>>> be a race condition when access devices /dev/mtd1 and /dev/mtdblock1 >>>> at the same time with a high frequency open and close test: >>>> >>>> kernel BUG at drivers/mtd/mtdcore.c:1221! >>>> lr : blktrans_release+0xb0/0x120 >>> >>> This does not seem on a current mainline kernel and seems to be >>> a somewhat incomplete backtrace. Can you send the full dmesg of >>> a latest mainline run and maybe share the reproducer? >> >> Yes, the kernel I used is 5.15, unfortunately this is the latest version >> that worked on my board, the whole log is: >> >> [ 31.301343] ------------[ cut here ]------------ >> [ 31.301343] ------------[ cut here ]------------ >> [ 31.301365] kernel BUG at drivers/mtd/mtdcore.c:1221! >> [ 31.314981] kernel BUG at drivers/mtd/mtdcore.c:1221! >> [ 31.329328] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP >> [ 31.374117] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse >> [ 31.395820] CPU: 2 PID: 390 Comm: a.out Not tainted 5.15.58-yocto-standard #1 >> [ 31.412672] Hardware name: SoCFPGA Agilex SoCDK (DT) >> [ 31.427372] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> [ 31.444058] pc : __put_mtd_device+0x4c/0x84 >> [ 31.457977] lr : put_mtd_device+0x3c/0x5c >> [ 31.464122] sp : ffff80000c26bc50 >> [ 31.466126] x29: ffff80000c26bc50 x28: ffff000285785100 x27: 0000000000000000 >> [ 31.471945] x26: 0000000045585401 x25: 0000000000000000 x24: ffff000285785768 >> [ 31.477762] x23: ffff0002803ee520 x22: ffff0002804f8900 x21: ffff000281956800 >> [ 31.483580] x20: ffff000281956800 x19: ffff000281955080 x18: 0000000000000000 >> [ 31.489402] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 >> [ 31.495219] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 >> [ 31.501037] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088a79b0 >> [ 31.506855] x8 : ffff80000c26bcb0 x7 : 0000000000000000 x6 : 0000000000000001 >> [ 31.512673] x5 : ffff000280959488 x4 : 0000000000000000 x3 : 0000000000000000 >> [ 31.518491] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080 >> [ 31.524310] Call trace: >> [ 31.525450] __put_mtd_device+0x4c/0x84 >> [ 31.527979] put_mtd_device+0x3c/0x5c >> [ 31.530333] mtdchar_close+0x3c/0x84 >> [ 31.532604] __fput+0x78/0x220 >> [ 31.534357] ____fput+0x1c/0x30 >> [ 31.536193] task_work_run+0x88/0xc0 >> [ 31.538467] do_notify_resume+0x384/0x12a0 >> [ 31.541261] el0_svc+0x6c/0x80 >> [ 31.543015] el0t_64_sync_handler+0xa4/0x130 >> [ 31.545977] el0t_64_sync+0x1a0/0x1a4 >> [ 31.548338] Code: b9448841 51000421 b9048841 36ffff41 (d4210000) >> [ 31.553115] ---[ end trace 9652b26ea1d7daa1 ]--- >> [ 31.556420] Internal error: Oops - BUG: 0 [#2] PREEMPT SMP >> [ 31.556420] note: a.out[390] exited with preempt_count 1 >> [ 31.560588] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse >> [ 31.575265] CPU: 3 PID: 391 Comm: a.out Tainted: G D 5.15.58-yocto-standard #1 >> [ 31.582466] Hardware name: SoCFPGA Agilex SoCDK (DT) >> [ 31.586115] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> [ 31.591757] pc : __put_mtd_device+0x4c/0x84 >> [ 31.594642] lr : blktrans_release+0xb0/0x120 >> [ 31.597603] sp : ffff80000c22bc20 >> [ 31.599608] x29: ffff80000c22bc20 x28: ffff000285785e80 x27: 0000000000000000 >> [ 31.605428] x26: 0000000045585401 x25: 0000000000000000 x24: ffff0002857864e8 >> [ 31.611247] x23: ffff0002803ee520 x22: ffff0002803e3470 x21: ffff0002803e3400 >> [ 31.617066] x20: ffff0002803e3020 x19: ffff000281955080 x18: 0000000000000000 >> [ 31.622884] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 >> [ 31.628702] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 >> [ 31.634519] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088b0230 >> [ 31.640337] x8 : ffff80000c22bb90 x7 : 0000000000000000 x6 : 0000000000000001 >> [ 31.646155] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 >> [ 31.651973] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080 >> [ 31.657792] Call trace: >> [ 31.658933] __put_mtd_device+0x4c/0x84 >> [ 31.661462] blktrans_release+0xb0/0x120 >> [ 31.664077] blkdev_put+0xd4/0x210 >> [ 31.666175] blkdev_close+0x34/0x50 >> [ 31.668355] __fput+0x78/0x220 >> [ 31.670108] ____fput+0x1c/0x30 >> [ 31.671943] task_work_run+0x88/0xc0 >> [ 31.674217] do_notify_resume+0x384/0x12a0 >> [ 31.677009] el0_svc+0x6c/0x80 >> [ 31.678762] el0t_64_sync_handler+0xa4/0x130 >> [ 31.681723] el0t_64_sync+0x1a0/0x1a4 >> [ 31.684082] Code: b9448841 51000421 b9048841 36ffff41 (d4210000) >> [ 31.688857] ---[ end trace 9652b26ea1d7daa2 ]--- >> [ 31.692161] note: a.out[391] exited with preempt_count 1 >> >> the testcase a.out is compiled from below code: >> when run the case /dev/mtd1 and /dev/mtdblock1 will be used for open and close test. >> >> #include <stdio.h> >> #include <stdlib.h> >> #include <sys/types.h> >> #include <sys/stat.h> >> #include <fcntl.h> >> #include <string.h> >> #include <signal.h> >> #include <unistd.h> >> >> int main(int argc, char *argv[]) >> { >> pid_t pid, pid1, pid2; >> int fd,ret = 0; >> int status = 0; >> char device_char[12]="/dev/mtd"; >> char device_block[17]="/dev/mtdblock"; >> >> strcat(device_char, argv[1]); >> strcat(device_block, argv[1]); >> >> pid1 = fork(); >> if(pid1 == 0){ >> while(1){ >> fd = open(device_char, O_SYNC|O_RDWR); >> ret = close(fd); >> } >> } >> pid2 = fork(); >> if(pid2 == 0){ >> while(1){ >> fd = open(device_block, O_SYNC|O_RDWR); >> ret = close(fd); >> } >> } >> sleep(10); >> kill(pid1, SIGKILL); >> kill(pid2, SIGKILL); >> pid = wait(&status); >> pid = wait(&status); >> return 0; >> } >> >>> >>>> diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c >>>> index b8ae1ec14e17..147e4a11dfe4 100644 >>>> --- a/drivers/mtd/mtd_blkdevs.c >>>> +++ b/drivers/mtd/mtd_blkdevs.c >>>> @@ -188,6 +188,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) >>>> >>>> kref_get(&dev->ref); >>>> >>>> + if (!mutex_trylock(&mtd_table_mutex)) >>>> + return ret; >>> >>> No, that's not really the solution. >>> >>> Turning the kref_get above into a kref_get_unless_zero might be better >>> path to look into. >> >> Thanks, I will have a look at this. > > Hi Christoph, > > It seems this way can not stop the race to decrease/increase mtd->usecount, > the race condition is between mtdchar_{open, close}()->(get)put_mtd_device()->__(get)put_mtd_device() > and blktrans_{open,release}()-> __(get)put_mtd_device(), when operate the same device > as char device(/dev/mtd1) and block device(/dev/mtdblock1), the original fix for > this issue is 073db4a51ee4 ("mtd: fix: avoid race condition when accessing mtd->usecount"), > Could you give some suggestion about this? Hi, Any suggestion about my new found? Thanks, Liwei. > > Thanks, > Liwei. > > > >> >> Liwei. >> >> >>>
Can you try this patch (it'll need to be split up into a few if it works): diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c index 60b222799871e..9eda1dd98a406 100644 --- a/drivers/mtd/mtd_blkdevs.c +++ b/drivers/mtd/mtd_blkdevs.c @@ -24,24 +24,16 @@ static LIST_HEAD(blktrans_majors); -static void blktrans_dev_release(struct kref *kref) +static void blktrans_free_disk(struct gendisk *disk) { - struct mtd_blktrans_dev *dev = - container_of(kref, struct mtd_blktrans_dev, ref); + struct mtd_blktrans_dev *dev = disk->private_data; - put_disk(dev->disk); blk_mq_free_tag_set(dev->tag_set); kfree(dev->tag_set); list_del(&dev->list); kfree(dev); } -static void blktrans_dev_put(struct mtd_blktrans_dev *dev) -{ - kref_put(&dev->ref, blktrans_dev_release); -} - - static blk_status_t do_blktrans_request(struct mtd_blktrans_ops *tr, struct mtd_blktrans_dev *dev, struct request *req) @@ -187,63 +179,58 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) struct mtd_blktrans_dev *dev = bdev->bd_disk->private_data; int ret = 0; - kref_get(&dev->ref); + if (disk_openers(bdev->bd_disk) > 0) + return 0; - mutex_lock(&dev->lock); - - if (dev->open) - goto unlock; + mutex_lock(&mtd_table_mutex); + if (!dev->mtd) { + mutex_lock(&mtd_table_mutex); + return -EINVAL; + } + ret = __get_mtd_device(dev->mtd); + mutex_unlock(&mtd_table_mutex); + if (ret) + return ret; + mutex_lock(&dev->lock); __module_get(dev->tr->owner); - - if (!dev->mtd) - goto unlock; - if (dev->tr->open) { ret = dev->tr->open(dev); if (ret) goto error_put; } - - ret = __get_mtd_device(dev->mtd); - if (ret) - goto error_release; dev->file_mode = mode; - -unlock: dev->open++; mutex_unlock(&dev->lock); - return ret; -error_release: - if (dev->tr->release) - dev->tr->release(dev); + return 0; + error_put: module_put(dev->tr->owner); mutex_unlock(&dev->lock); - blktrans_dev_put(dev); + + put_mtd_device(dev->mtd); return ret; } static void blktrans_release(struct gendisk *disk, fmode_t mode) { struct mtd_blktrans_dev *dev = disk->private_data; + struct mtd_info *mtd = NULL; - mutex_lock(&dev->lock); - - if (--dev->open) - goto unlock; + if (disk_openers(disk) > 0) + return; + mutex_lock(&dev->lock); + dev->open--; module_put(dev->tr->owner); - - if (dev->mtd) { - if (dev->tr->release) - dev->tr->release(dev); - __put_mtd_device(dev->mtd); - } -unlock: + mtd = dev->mtd; + if (mtd && dev->tr->release) + dev->tr->release(dev); mutex_unlock(&dev->lock); - blktrans_dev_put(dev); + + if (mtd) + put_mtd_device(dev->mtd); } static int blktrans_getgeo(struct block_device *bdev, struct hd_geometry *geo) @@ -266,6 +253,7 @@ static const struct block_device_operations mtd_block_ops = { .owner = THIS_MODULE, .open = blktrans_open, .release = blktrans_release, + .free_disk = blktrans_free_disk, .getgeo = blktrans_getgeo, }; @@ -318,7 +306,6 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new) added: mutex_init(&new->lock); - kref_init(&new->ref); if (!tr->writesect) new->readonly = 1; @@ -410,6 +397,7 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new) int del_mtd_blktrans_dev(struct mtd_blktrans_dev *old) { + struct mtd_info *old_mtd = NULL; unsigned long flags; lockdep_assert_held(&mtd_table_mutex); @@ -438,13 +426,14 @@ int del_mtd_blktrans_dev(struct mtd_blktrans_dev *old) if (old->open) { if (old->tr->release) old->tr->release(old); - __put_mtd_device(old->mtd); + old_mtd = old->mtd; } - old->mtd = NULL; - mutex_unlock(&old->lock); - blktrans_dev_put(old); + put_disk(old->disk); + + if (old->mtd) + put_mtd_device(old_mtd); return 0; } diff --git a/include/linux/mtd/blktrans.h b/include/linux/mtd/blktrans.h index 15cc9b95e32b5..41a81fc9f0462 100644 --- a/include/linux/mtd/blktrans.h +++ b/include/linux/mtd/blktrans.h @@ -7,7 +7,6 @@ #define __MTD_TRANS_H__ #include <linux/mutex.h> -#include <linux/kref.h> #include <linux/sysfs.h> struct hd_geometry; @@ -26,7 +25,6 @@ struct mtd_blktrans_dev { unsigned long size; int readonly; int open; - struct kref ref; struct gendisk *disk; struct attribute_group *disk_attributes; struct request_queue *rq;
On 9/9/22 22:36, ChristophHellwig wrote: > Can you try this patch (it'll need to be split up into a few if it > works): Hi Christoph, Thanks for your help, With this patch, the race condition issue I met can be fixed, but there will be a deadlock issue as below when boot the board: [ 10.277872] ====================================================== [ 10.282765] WARNING: possible circular locking dependency detected [ 10.287661] 5.18.0-yocto-standard+ #4 Not tainted [ 10.291078] ------------------------------------------------------ [ 10.295967] systemd-udevd/173 is trying to acquire lock: [ 10.299994] ffff800009999150 (mtd_table_mutex){+.+.}-{3:3}, at: blktrans_open+0x60/0x128 [ 10.306854] but task is already holding lock: [ 10.310097] ffff000180832718 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev+0xf8/0x2f8 [ 10.317464] which lock already depends on the new lock. [ 10.321749] the existing dependency chain (in reverse order) is: [ 10.326659] -> #1 (&disk->open_mutex){+.+.}-{3:3}: [ 10.330373] __mutex_lock+0x90/0x8f8 [ 10.333202] mutex_lock_nested+0x54/0x70 [ 10.336365] bd_register_pending_holders+0x30/0x120 [ 10.340489] device_add_disk+0x1f4/0x370 [ 10.343669] add_mtd_blktrans_dev+0x304/0x468 [ 10.343691] mtdblock_add_mtd+0x7c/0xd0 [ 10.343702] blktrans_notify_add+0x50/0x70 [ 10.343713] add_mtd_device+0x40c/0x460 [ 10.343724] add_mtd_partitions+0xa0/0x1a8 [ 10.343735] parse_mtd_partitions+0x1b4/0x400 [ 10.343746] mtd_device_parse_register+0x94/0x2a0 [ 10.343757] spi_nor_probe+0x220/0x2e8 [ 10.343767] spi_mem_probe+0x74/0xb0 [ 10.343777] spi_probe+0x88/0xe0 [ 10.343787] really_probe+0xb0/0x268 [ 10.343801] __driver_probe_device+0x7c/0xe0 [ 10.343811] driver_probe_device+0x50/0x100 [ 10.343823] __device_attach_driver+0x98/0xd0 [ 10.343833] bus_for_each_drv+0x74/0xd8 [ 10.343844] __device_attach+0xf0/0x150 [ 10.343854] device_initial_probe+0x24/0x30 [ 10.343865] bus_probe_device+0xa4/0xb0 [ 10.343875] device_add+0x424/0x8b8 [ 10.343884] __spi_add_device+0x7c/0x120 [ 10.343898] spi_add_device+0x4c/0x80 [ 10.343909] of_register_spi_device+0x228/0x380 [ 10.343920] spi_register_controller+0x3c4/0x718 [ 10.343931] devm_spi_register_controller+0x28/0x80 [ 10.343943] cqspi_probe+0x714/0x9f8 [ 10.343952] platform_probe+0x6c/0xd8 [ 10.343962] really_probe+0xb0/0x268 [ 10.343972] __driver_probe_device+0x7c/0xe0 [ 10.343982] driver_probe_device+0x50/0x100 [ 10.343994] __driver_attach+0xa4/0x100 [ 10.344004] bus_for_each_dev+0x84/0xd8 [ 10.344014] driver_attach+0x30/0x40 [ 10.344024] bus_add_driver+0x160/0x208 [ 10.344034] driver_register+0x64/0x110 [ 10.344045] __platform_driver_register+0x34/0x40 [ 10.344054] cqspi_platform_driver_init+0x20/0x28 [ 10.344068] do_one_initcall+0xa4/0x440 [ 10.344080] kernel_init_freeable+0x320/0x378 [ 10.344092] kernel_init+0x2c/0x138 [ 10.344104] ret_from_fork+0x10/0x20 [ 10.344115] -> #0 (mtd_table_mutex){+.+.}-{3:3}: [ 10.344137] __lock_acquire+0x1118/0x15d8 [ 10.344148] lock_acquire+0x2b8/0x3f8 [ 10.344156] __mutex_lock+0x90/0x8f8 [ 10.344166] mutex_lock_nested+0x54/0x70 [ 10.344177] blktrans_open+0x60/0x128 [ 10.344189] blkdev_get_whole+0x3c/0xb8 [ 10.344203] blkdev_get_by_dev+0x1ac/0x2f8 [ 10.344211] blkdev_open+0x64/0xb8 [ 10.344219] do_dentry_open+0x1b0/0x3b8 [ 10.344234] vfs_open+0x38/0x48 [ 10.344243] path_openat+0x758/0x938 [ 10.344255] do_filp_open+0x94/0x118 [ 10.344267] do_sys_openat2+0x234/0x308 [ 10.344278] do_sys_open+0x84/0xc0 [ 10.344286] __arm64_sys_openat+0x2c/0x38 [ 10.344295] invoke_syscall+0x64/0x138 [ 10.344309] el0_svc_common.constprop.4+0xf8/0x118 [ 10.344320] do_el0_svc+0x80/0xa0 [ 10.344330] el0_svc+0x68/0x1c8 [ 10.344339] el0t_64_sync_handler+0x88/0xb0 [ 10.344349] el0t_64_sync+0x15c/0x160 [ 10.344358] other info that might help us debug this: [ 10.344362] Possible unsafe locking scenario: [ 10.344366] CPU0 CPU1 [ 10.344370] ---- ---- [ 10.344373] lock(&disk->open_mutex); [ 10.344384] lock(mtd_table_mutex); [ 10.344395] lock(&disk->open_mutex); [ 10.344406] lock(mtd_table_mutex); [ 10.344416] *** DEADLOCK *** [ 10.344420] 1 lock held by systemd-udevd/173: [ 10.344427] #0: ffff000180832718 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev+0xf8/0x2f8 [ 10.344457] stack backtrace: [ 10.344465] CPU: 3 PID: 173 Comm: systemd-udevd Not tainted 5.18.0-yocto-standard+ #4 [ 10.344476] Hardware name: SoCFPGA Stratix 10 SoCDK (DT) [ 10.344483] Call trace: [ 10.344488] dump_backtrace.part.6+0xf4/0x100 [ 10.344498] show_stack+0x2c/0x48 [ 10.344508] dump_stack_lvl+0x9c/0xcc [ 10.344522] dump_stack+0x14/0x2c [ 10.344532] print_circular_bug.isra.49+0x1a8/0x200 [ 10.344546] check_noncircular+0x124/0x138 [ 10.344557] __lock_acquire+0x1118/0x15d8 [ 10.344566] lock_acquire+0x2b8/0x3f8 [ 10.344574] __mutex_lock+0x90/0x8f8 [ 10.344585] mutex_lock_nested+0x54/0x70 [ 10.344595] blktrans_open+0x60/0x128 [ 10.344607] blkdev_get_whole+0x3c/0xb8 [ 10.344618] blkdev_get_by_dev+0x1ac/0x2f8 [ 10.344627] blkdev_open+0x64/0xb8 [ 10.344636] do_dentry_open+0x1b0/0x3b8 [ 10.344647] vfs_open+0x38/0x48 [ 10.344656] path_openat+0x758/0x938 [ 10.344666] do_filp_open+0x94/0x118 [ 10.344676] do_sys_openat2+0x234/0x308 [ 10.344687] do_sys_open+0x84/0xc0 [ 10.344695] __arm64_sys_openat+0x2c/0x38 [ 10.344704] invoke_syscall+0x64/0x138 [ 10.344715] el0_svc_common.constprop.4+0xf8/0x118 [ 10.344726] do_el0_svc+0x80/0xa0 [ 10.344736] el0_svc+0x68/0x1c8 [ 10.344745] el0t_64_sync_handler+0x88/0xb0 [ 10.344754] el0t_64_sync+0x15c/0x160 Thanks, Liwei. > > diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c > index 60b222799871e..9eda1dd98a406 100644 > --- a/drivers/mtd/mtd_blkdevs.c > +++ b/drivers/mtd/mtd_blkdevs.c > @@ -24,24 +24,16 @@ > > static LIST_HEAD(blktrans_majors); > > -static void blktrans_dev_release(struct kref *kref) > +static void blktrans_free_disk(struct gendisk *disk) > { > - struct mtd_blktrans_dev *dev = > - container_of(kref, struct mtd_blktrans_dev, ref); > + struct mtd_blktrans_dev *dev = disk->private_data; > > - put_disk(dev->disk); > blk_mq_free_tag_set(dev->tag_set); > kfree(dev->tag_set); > list_del(&dev->list); > kfree(dev); > } > > -static void blktrans_dev_put(struct mtd_blktrans_dev *dev) > -{ > - kref_put(&dev->ref, blktrans_dev_release); > -} > - > - > static blk_status_t do_blktrans_request(struct mtd_blktrans_ops *tr, > struct mtd_blktrans_dev *dev, > struct request *req) > @@ -187,63 +179,58 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) > struct mtd_blktrans_dev *dev = bdev->bd_disk->private_data; > int ret = 0; > > - kref_get(&dev->ref); > + if (disk_openers(bdev->bd_disk) > 0) > + return 0; > > - mutex_lock(&dev->lock); > - > - if (dev->open) > - goto unlock; > + mutex_lock(&mtd_table_mutex); > + if (!dev->mtd) { > + mutex_lock(&mtd_table_mutex); > + return -EINVAL; > + } > + ret = __get_mtd_device(dev->mtd); > + mutex_unlock(&mtd_table_mutex); > + if (ret) > + return ret; > > + mutex_lock(&dev->lock); > __module_get(dev->tr->owner); > - > - if (!dev->mtd) > - goto unlock; > - > if (dev->tr->open) { > ret = dev->tr->open(dev); > if (ret) > goto error_put; > } > - > - ret = __get_mtd_device(dev->mtd); > - if (ret) > - goto error_release; > dev->file_mode = mode; > - > -unlock: > dev->open++; > mutex_unlock(&dev->lock); > - return ret; > > -error_release: > - if (dev->tr->release) > - dev->tr->release(dev); > + return 0; > + > error_put: > module_put(dev->tr->owner); > mutex_unlock(&dev->lock); > - blktrans_dev_put(dev); > + > + put_mtd_device(dev->mtd); > return ret; > } > > static void blktrans_release(struct gendisk *disk, fmode_t mode) > { > struct mtd_blktrans_dev *dev = disk->private_data; > + struct mtd_info *mtd = NULL; > > - mutex_lock(&dev->lock); > - > - if (--dev->open) > - goto unlock; > + if (disk_openers(disk) > 0) > + return; > > + mutex_lock(&dev->lock); > + dev->open--; > module_put(dev->tr->owner); > - > - if (dev->mtd) { > - if (dev->tr->release) > - dev->tr->release(dev); > - __put_mtd_device(dev->mtd); > - } > -unlock: > + mtd = dev->mtd; > + if (mtd && dev->tr->release) > + dev->tr->release(dev); > mutex_unlock(&dev->lock); > - blktrans_dev_put(dev); > + > + if (mtd) > + put_mtd_device(dev->mtd); > } > > static int blktrans_getgeo(struct block_device *bdev, struct hd_geometry *geo) > @@ -266,6 +253,7 @@ static const struct block_device_operations mtd_block_ops = { > .owner = THIS_MODULE, > .open = blktrans_open, > .release = blktrans_release, > + .free_disk = blktrans_free_disk, > .getgeo = blktrans_getgeo, > }; > > @@ -318,7 +306,6 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new) > added: > > mutex_init(&new->lock); > - kref_init(&new->ref); > if (!tr->writesect) > new->readonly = 1; > > @@ -410,6 +397,7 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new) > > int del_mtd_blktrans_dev(struct mtd_blktrans_dev *old) > { > + struct mtd_info *old_mtd = NULL; > unsigned long flags; > > lockdep_assert_held(&mtd_table_mutex); > @@ -438,13 +426,14 @@ int del_mtd_blktrans_dev(struct mtd_blktrans_dev *old) > if (old->open) { > if (old->tr->release) > old->tr->release(old); > - __put_mtd_device(old->mtd); > + old_mtd = old->mtd; > } > - > old->mtd = NULL; > - > mutex_unlock(&old->lock); > - blktrans_dev_put(old); > + put_disk(old->disk); > + > + if (old->mtd) > + put_mtd_device(old_mtd); > return 0; > } > > diff --git a/include/linux/mtd/blktrans.h b/include/linux/mtd/blktrans.h > index 15cc9b95e32b5..41a81fc9f0462 100644 > --- a/include/linux/mtd/blktrans.h > +++ b/include/linux/mtd/blktrans.h > @@ -7,7 +7,6 @@ > #define __MTD_TRANS_H__ > > #include <linux/mutex.h> > -#include <linux/kref.h> > #include <linux/sysfs.h> > > struct hd_geometry; > @@ -26,7 +25,6 @@ struct mtd_blktrans_dev { > unsigned long size; > int readonly; > int open; > - struct kref ref; > struct gendisk *disk; > struct attribute_group *disk_attributes; > struct request_queue *rq; >
diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c index b8ae1ec14e17..147e4a11dfe4 100644 --- a/drivers/mtd/mtd_blkdevs.c +++ b/drivers/mtd/mtd_blkdevs.c @@ -188,6 +188,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) kref_get(&dev->ref); + if (!mutex_trylock(&mtd_table_mutex)) + return ret; mutex_lock(&dev->lock); if (dev->open) @@ -212,6 +214,7 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) unlock: dev->open++; mutex_unlock(&dev->lock); + mutex_unlock(&mtd_table_mutex); return ret; error_release: @@ -220,6 +223,7 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode) error_put: module_put(dev->tr->owner); mutex_unlock(&dev->lock); + mutex_unlock(&mtd_table_mutex); blktrans_dev_put(dev); return ret; } @@ -228,6 +232,8 @@ static void blktrans_release(struct gendisk *disk, fmode_t mode) { struct mtd_blktrans_dev *dev = disk->private_data; + if (!mutex_trylock(&mtd_table_mutex)) + return; mutex_lock(&dev->lock); if (--dev->open) @@ -242,6 +248,7 @@ static void blktrans_release(struct gendisk *disk, fmode_t mode) } unlock: mutex_unlock(&dev->lock); + mutex_unlock(&mtd_table_mutex); blktrans_dev_put(dev); }
without lock mtd_table_mutex in blktrans_{open, release}, there will be a race condition when access devices /dev/mtd1 and /dev/mtdblock1 at the same time with a high frequency open and close test: kernel BUG at drivers/mtd/mtdcore.c:1221! lr : blktrans_release+0xb0/0x120 Call trace: __put_mtd_device+0x4c/0x84 blktrans_release+0xb0/0x120 blkdev_put+0xd4/0x210 blkdev_close+0x34/0x50 __fput+0x8c/0x240 ____fput+0x1c/0x30 task_work_run+0x98/00t_64_sync_handler+0xa4/0x130 el0t_64_sync+0x1a0/0x1a4 since the original purpose of commit 799ae31c58ae ("mtd_blkdevs: don't hold del_mtd_blktrans_dev in blktrans_{open, release}") is to fix a DEADLOCK issue, here convert mutex_lock to mutex_trylock and return immediately if failed acquire mtd_table_mutex, then both race condition and DEADLOCK can be avoided. Fixes: 799ae31c58ae ("mtd_blkdevs: don't hold del_mtd_blktrans_dev in blktrans_{open, release}") Signed-off-by: Liwei Song <liwei.song@windriver.com> --- drivers/mtd/mtd_blkdevs.c | 7 +++++++ 1 file changed, 7 insertions(+)