Message ID | 20200306134518.84416-1-kgraul@linux.ibm.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net] net/smc: cancel event worker during device removal | expand |
On Fri, Mar 06, 2020 at 02:45:18PM +0100, Karsten Graul wrote: > During IB device removal, cancel the event worker before the device > structure is freed. In the worker, check if the device is being > terminated and do not proceed with the event work in that case. > > Fixes: a4cf0443c414 ("smc: introduce SMC as an IB-client") > Reported-by: syzbot+b297c6825752e7a07272@syzkaller.appspotmail.com > Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> > Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> > --- > net/smc/smc_ib.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c > index d6ba186f67e2..5e4e64a9aa4b 100644 > --- a/net/smc/smc_ib.c > +++ b/net/smc/smc_ib.c > @@ -240,6 +240,9 @@ static void smc_ib_port_event_work(struct work_struct *work) > work, struct smc_ib_device, port_event_work); > u8 port_idx; > > + if (list_empty(&smcibdev->list)) > + return; > + How can it be true if you are not holding "smc_ib_devices.lock" during execution of smc_ib_port_event_work()? > for_each_set_bit(port_idx, &smcibdev->port_event_mask, SMC_MAX_PORTS) { > smc_ib_remember_port_attr(smcibdev, port_idx + 1); > clear_bit(port_idx, &smcibdev->port_event_mask); > @@ -582,6 +585,7 @@ static void smc_ib_remove_dev(struct ib_device *ibdev, void *client_data) > smc_smcr_terminate_all(smcibdev); > smc_ib_cleanup_per_ibdev(smcibdev); > ib_unregister_event_handler(&smcibdev->event_handler); > + cancel_work_sync(&smcibdev->port_event_work); > kfree(smcibdev); > } > > -- > 2.17.1 >
On 08/03/2020 16:01, Leon Romanovsky wrote: > On Fri, Mar 06, 2020 at 02:45:18PM +0100, Karsten Graul wrote: >> During IB device removal, cancel the event worker before the device >> structure is freed. In the worker, check if the device is being >> terminated and do not proceed with the event work in that case. >> >> Fixes: a4cf0443c414 ("smc: introduce SMC as an IB-client") >> Reported-by: syzbot+b297c6825752e7a07272@syzkaller.appspotmail.com >> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> >> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> >> --- >> net/smc/smc_ib.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c >> index d6ba186f67e2..5e4e64a9aa4b 100644 >> --- a/net/smc/smc_ib.c >> +++ b/net/smc/smc_ib.c >> @@ -240,6 +240,9 @@ static void smc_ib_port_event_work(struct work_struct *work) >> work, struct smc_ib_device, port_event_work); >> u8 port_idx; >> >> + if (list_empty(&smcibdev->list)) >> + return; >> + > > How can it be true if you are not holding "smc_ib_devices.lock" during > execution of smc_ib_port_event_work()? > It is true when smc_ib_remove_dev() runs before the work actually started. Other than that its only a shortcut to return earlier, when the item is removed from the list after the check then the processing just takes a little bit longer...its still save. >> for_each_set_bit(port_idx, &smcibdev->port_event_mask, SMC_MAX_PORTS) { >> smc_ib_remember_port_attr(smcibdev, port_idx + 1); >> clear_bit(port_idx, &smcibdev->port_event_mask); >> @@ -582,6 +585,7 @@ static void smc_ib_remove_dev(struct ib_device *ibdev, void *client_data) >> smc_smcr_terminate_all(smcibdev); >> smc_ib_cleanup_per_ibdev(smcibdev); >> ib_unregister_event_handler(&smcibdev->event_handler); >> + cancel_work_sync(&smcibdev->port_event_work); >> kfree(smcibdev); >> } >> >> -- >> 2.17.1 >>
On Sun, Mar 08, 2020 at 08:59:33PM +0100, Karsten Graul wrote: > On 08/03/2020 16:01, Leon Romanovsky wrote: > > On Fri, Mar 06, 2020 at 02:45:18PM +0100, Karsten Graul wrote: > >> During IB device removal, cancel the event worker before the device > >> structure is freed. In the worker, check if the device is being > >> terminated and do not proceed with the event work in that case. > >> > >> Fixes: a4cf0443c414 ("smc: introduce SMC as an IB-client") > >> Reported-by: syzbot+b297c6825752e7a07272@syzkaller.appspotmail.com > >> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> > >> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> > >> --- > >> net/smc/smc_ib.c | 4 ++++ > >> 1 file changed, 4 insertions(+) > >> > >> diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c > >> index d6ba186f67e2..5e4e64a9aa4b 100644 > >> --- a/net/smc/smc_ib.c > >> +++ b/net/smc/smc_ib.c > >> @@ -240,6 +240,9 @@ static void smc_ib_port_event_work(struct work_struct *work) > >> work, struct smc_ib_device, port_event_work); > >> u8 port_idx; > >> > >> + if (list_empty(&smcibdev->list)) > >> + return; > >> + > > > > How can it be true if you are not holding "smc_ib_devices.lock" during > > execution of smc_ib_port_event_work()? > > > > It is true when smc_ib_remove_dev() runs before the work actually started. > Other than that its only a shortcut to return earlier, when the item is > removed from the list after the check then the processing just takes a > little bit longer...its still save. The check itself maybe safe, but it can't fix syzkaller bug reported above. As you said, the smc_ib_remove_dev() can be called immediately after your list_empty() check and we return to original behavior. The correct design will be to ensure that smc_ib_port_event_work() is executed only smcibdev->list is not empty. Thanks > > >> for_each_set_bit(port_idx, &smcibdev->port_event_mask, SMC_MAX_PORTS) { > >> smc_ib_remember_port_attr(smcibdev, port_idx + 1); > >> clear_bit(port_idx, &smcibdev->port_event_mask); > >> @@ -582,6 +585,7 @@ static void smc_ib_remove_dev(struct ib_device *ibdev, void *client_data) > >> smc_smcr_terminate_all(smcibdev); > >> smc_ib_cleanup_per_ibdev(smcibdev); > >> ib_unregister_event_handler(&smcibdev->event_handler); > >> + cancel_work_sync(&smcibdev->port_event_work); > >> kfree(smcibdev); > >> } > >> > >> -- > >> 2.17.1 > >> > > -- > Karsten > > (I'm a dude) >
On 09/03/2020 09:04, Leon Romanovsky wrote: > On Sun, Mar 08, 2020 at 08:59:33PM +0100, Karsten Graul wrote: >> On 08/03/2020 16:01, Leon Romanovsky wrote: >>> On Fri, Mar 06, 2020 at 02:45:18PM +0100, Karsten Graul wrote: >>>> During IB device removal, cancel the event worker before the device >>>> structure is freed. In the worker, check if the device is being >>>> terminated and do not proceed with the event work in that case. >>>> >>>> Fixes: a4cf0443c414 ("smc: introduce SMC as an IB-client") >>>> Reported-by: syzbot+b297c6825752e7a07272@syzkaller.appspotmail.com >>>> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> >>>> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> >>>> --- >>>> net/smc/smc_ib.c | 4 ++++ >>>> 1 file changed, 4 insertions(+) >>>> >>>> diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c >>>> index d6ba186f67e2..5e4e64a9aa4b 100644 >>>> --- a/net/smc/smc_ib.c >>>> +++ b/net/smc/smc_ib.c >>>> @@ -240,6 +240,9 @@ static void smc_ib_port_event_work(struct work_struct *work) >>>> work, struct smc_ib_device, port_event_work); >>>> u8 port_idx; >>>> >>>> + if (list_empty(&smcibdev->list)) >>>> + return; >>>> + >>> >>> How can it be true if you are not holding "smc_ib_devices.lock" during >>> execution of smc_ib_port_event_work()? >>> >> >> It is true when smc_ib_remove_dev() runs before the work actually started. >> Other than that its only a shortcut to return earlier, when the item is >> removed from the list after the check then the processing just takes a >> little bit longer...its still save. > > The check itself maybe safe, but it can't fix syzkaller bug reported above. > As you said, the smc_ib_remove_dev() can be called immediately after > your list_empty() check and we return to original behavior. > > The correct design will be to ensure that smc_ib_port_event_work() is > executed only smcibdev->list is not empty. > > Thanks > The fix I had in mind was the cancel_work_sync(&smcibdev->port_event_work); to wait for a running port_event_work to finish before smcibdev is freed. I can remove the list_empty() check if that is too confusing. >> >>>> for_each_set_bit(port_idx, &smcibdev->port_event_mask, SMC_MAX_PORTS) { >>>> smc_ib_remember_port_attr(smcibdev, port_idx + 1); >>>> clear_bit(port_idx, &smcibdev->port_event_mask); >>>> @@ -582,6 +585,7 @@ static void smc_ib_remove_dev(struct ib_device *ibdev, void *client_data) >>>> smc_smcr_terminate_all(smcibdev); >>>> smc_ib_cleanup_per_ibdev(smcibdev); >>>> ib_unregister_event_handler(&smcibdev->event_handler); >>>> + cancel_work_sync(&smcibdev->port_event_work); >>>> kfree(smcibdev); >>>> } >>>> >>>> -- >>>> 2.17.1 >>>> >> >> -- >> Karsten >> >> (I'm a dude) >>
On Mon, Mar 09, 2020 at 10:40:16AM +0100, Karsten Graul wrote: > On 09/03/2020 09:04, Leon Romanovsky wrote: > > On Sun, Mar 08, 2020 at 08:59:33PM +0100, Karsten Graul wrote: > >> On 08/03/2020 16:01, Leon Romanovsky wrote: > >>> On Fri, Mar 06, 2020 at 02:45:18PM +0100, Karsten Graul wrote: > >>>> During IB device removal, cancel the event worker before the device > >>>> structure is freed. In the worker, check if the device is being > >>>> terminated and do not proceed with the event work in that case. > >>>> > >>>> Fixes: a4cf0443c414 ("smc: introduce SMC as an IB-client") > >>>> Reported-by: syzbot+b297c6825752e7a07272@syzkaller.appspotmail.com > >>>> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> > >>>> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> > >>>> --- > >>>> net/smc/smc_ib.c | 4 ++++ > >>>> 1 file changed, 4 insertions(+) > >>>> > >>>> diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c > >>>> index d6ba186f67e2..5e4e64a9aa4b 100644 > >>>> --- a/net/smc/smc_ib.c > >>>> +++ b/net/smc/smc_ib.c > >>>> @@ -240,6 +240,9 @@ static void smc_ib_port_event_work(struct work_struct *work) > >>>> work, struct smc_ib_device, port_event_work); > >>>> u8 port_idx; > >>>> > >>>> + if (list_empty(&smcibdev->list)) > >>>> + return; > >>>> + > >>> > >>> How can it be true if you are not holding "smc_ib_devices.lock" during > >>> execution of smc_ib_port_event_work()? > >>> > >> > >> It is true when smc_ib_remove_dev() runs before the work actually started. > >> Other than that its only a shortcut to return earlier, when the item is > >> removed from the list after the check then the processing just takes a > >> little bit longer...its still save. > > > > The check itself maybe safe, but it can't fix syzkaller bug reported above. > > As you said, the smc_ib_remove_dev() can be called immediately after > > your list_empty() check and we return to original behavior. > > > > The correct design will be to ensure that smc_ib_port_event_work() is > > executed only smcibdev->list is not empty. > > > > Thanks > > > > The fix I had in mind was the > > cancel_work_sync(&smcibdev->port_event_work); > > to wait for a running port_event_work to finish before smcibdev is freed. > I can remove the list_empty() check if that is too confusing. Yes, please. Thanks > > >> > >>>> for_each_set_bit(port_idx, &smcibdev->port_event_mask, SMC_MAX_PORTS) { > >>>> smc_ib_remember_port_attr(smcibdev, port_idx + 1); > >>>> clear_bit(port_idx, &smcibdev->port_event_mask); > >>>> @@ -582,6 +585,7 @@ static void smc_ib_remove_dev(struct ib_device *ibdev, void *client_data) > >>>> smc_smcr_terminate_all(smcibdev); > >>>> smc_ib_cleanup_per_ibdev(smcibdev); > >>>> ib_unregister_event_handler(&smcibdev->event_handler); > >>>> + cancel_work_sync(&smcibdev->port_event_work); > >>>> kfree(smcibdev); > >>>> } > >>>> > >>>> -- > >>>> 2.17.1 > >>>> > >> > >> -- > >> Karsten > >> > >> (I'm a dude) > >> > > -- > Karsten > > (I'm a dude) >
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c index d6ba186f67e2..5e4e64a9aa4b 100644 --- a/net/smc/smc_ib.c +++ b/net/smc/smc_ib.c @@ -240,6 +240,9 @@ static void smc_ib_port_event_work(struct work_struct *work) work, struct smc_ib_device, port_event_work); u8 port_idx; + if (list_empty(&smcibdev->list)) + return; + for_each_set_bit(port_idx, &smcibdev->port_event_mask, SMC_MAX_PORTS) { smc_ib_remember_port_attr(smcibdev, port_idx + 1); clear_bit(port_idx, &smcibdev->port_event_mask); @@ -582,6 +585,7 @@ static void smc_ib_remove_dev(struct ib_device *ibdev, void *client_data) smc_smcr_terminate_all(smcibdev); smc_ib_cleanup_per_ibdev(smcibdev); ib_unregister_event_handler(&smcibdev->event_handler); + cancel_work_sync(&smcibdev->port_event_work); kfree(smcibdev); }