Message ID | 20211216102145.12640-2-ioanna-maria.alifieraki@canonical.com |
---|---|
State | New |
Series | Fix crash on ipmi module unload |
On Thu, Dec 16, 2021 at 12:21:44PM +0200, Ioanna Alifieraki wrote:
> BugLink: https://bugs.launchpad.net/bugs/1950666
>
> Currently when removing an ipmi_user the removal is deferred as a work on
> the system's workqueue. Although this guarantees the free operation will
> occur in non atomic context, it can race with the ipmi_msghandler module
> removal (see [1]). In case a remove_user work is scheduled for removal
> and shortly after ipmi_msghandler module is removed we can end up in a
> situation where the module is removed first and when the work is executed
> the system crashes with:
> BUG: unable to handle page fault for address: ffffffffc05c3450
> PF: supervisor instruction fetch in kernel mode
> PF: error_code(0x0010) - not-present page
> because the pages of the module are gone. In cleanup_ipmi() there is no
> easy way to detect if there are any pending works to flush them before
> removing the module. This patch creates a separate workqueue and schedules
> the remove_work works on it. When removing the module the workqueue is
> drained when destroyed to avoid the race.
>
> [1] https://bugs.launchpad.net/bugs/1950666
>
> Cc: stable@vger.kernel.org # 5.1
> Fixes: 3b9a907223d7 (ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier)
> Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
> Message-Id: <20211115131645.25116-1-ioanna-maria.alifieraki@canonical.com>
> Signed-off-by: Corey Minyard <cminyard@mvista.com>
> (cherry picked from commit 1d49eb91e86e8c1c1614c72e3e958b6b7e2472a9)
> Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
> ---
>  drivers/char/ipmi/ipmi_msghandler.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index a08f53f208bf..f3a2f228f648 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -191,6 +191,8 @@ struct ipmi_user {
>  	struct work_struct remove_work;
>  };
>
> +struct workqueue_struct *remove_work_wq;
> +
>  static struct ipmi_user *acquire_ipmi_user(struct ipmi_user *user, int *index)
>  	__acquires(user->release_barrier)
>  {
> @@ -1261,7 +1263,7 @@ static void free_user(struct kref *ref)
>  	struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount);
>
>  	/* SRCU cleanup must happen in task context. */
> -	schedule_work(&user->remove_work);
> +	queue_work(remove_work_wq, &user->remove_work);
>  }
>
>  static void _ipmi_destroy_user(struct ipmi_user *user)
> @@ -5153,6 +5155,13 @@ static int ipmi_init_msghandler(void)
>
>  	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
>
> +	remove_work_wq = create_singlethread_workqueue("ipmi-msghandler-remove-wq");
> +	if (!remove_work_wq) {
> +		pr_err("unable to create ipmi-msghandler-remove-wq workqueue");
> +		rv = -ENOMEM;
> +		goto out;
> +	}
> +

Though not so easy to trigger: if this returns an error, then
initialized == false, but the timer has already been set up and the
panic_notifier has already been registered. That is, when unloading the
module, you have some new problems to deal with. The exit path in
ipmi_init_msghandler should undo these, or rather, this allocation should
be done first.

Cascardo.

>  	initialized = true;
>
>  out:
> @@ -5178,6 +5187,8 @@ static void __exit cleanup_ipmi(void)
>  	int count;
>
>  	if (initialized) {
> +		destroy_workqueue(remove_work_wq);
> +
>  		atomic_notifier_chain_unregister(&panic_notifier_list,
>  						 &panic_block);
>
> --
> 2.17.1
>
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
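
The ordering concern raised in the review can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the flags and `stub_create_wq()` are hypothetical stand-ins for the timer, the panic notifier and `create_singlethread_workqueue()`, and the code is not the fix that was later sent upstream. It contrasts doing the fallible allocation last (as in the patch above) with doing it first.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for module state; these are not kernel symbols. */
static bool timer_armed;
static bool notifier_registered;
static bool initialized;
static void *remove_wq;

static void reset_state(void)
{
	timer_armed = false;
	notifier_registered = false;
	initialized = false;
	remove_wq = NULL;
}

/* Stub allocator: 'fail' simulates the workqueue allocation returning NULL. */
static void *stub_create_wq(bool fail)
{
	static int token;

	return fail ? NULL : &token;
}

/* Ordering as in the patch: side effects first, fallible step last. */
static int init_fallible_last(bool fail)
{
	timer_armed = true;
	notifier_registered = true;

	remove_wq = stub_create_wq(fail);
	if (!remove_wq)
		return -ENOMEM;	/* timer and notifier are left behind */

	initialized = true;
	return 0;
}

/* Reordered: fallible step first, so a failure leaves nothing to undo. */
static int init_fallible_first(bool fail)
{
	remove_wq = stub_create_wq(fail);
	if (!remove_wq)
		return -ENOMEM;

	timer_armed = true;
	notifier_registered = true;
	initialized = true;
	return 0;
}

/* Returns 0 if both orderings behave as the comments above describe. */
static int run_init_order_demo(void)
{
	reset_state();
	assert(init_fallible_last(true) == -ENOMEM);
	assert(timer_armed && notifier_registered && !initialized);

	reset_state();
	assert(init_fallible_first(true) == -ENOMEM);
	assert(!timer_armed && !notifier_registered && !initialized);
	return 0;
}
```

Calling `run_init_order_demo()` from a `main()` exercises both orderings. In the first, a failed allocation leaves the timer and notifier flags set while `initialized` stays false, which is the state in which a cleanup path guarded by `if (initialized)` would skip undoing them.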
On Thu, Dec 16, 2021 at 10:52:22AM -0300, Thadeu Lima de Souza Cascardo wrote:
> On Thu, Dec 16, 2021 at 12:21:44PM +0200, Ioanna Alifieraki wrote:
> > [...]
> > @@ -5153,6 +5155,13 @@ static int ipmi_init_msghandler(void)
> >
> >  	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> >
> > +	remove_work_wq = create_singlethread_workqueue("ipmi-msghandler-remove-wq");
> > +	if (!remove_work_wq) {
> > +		pr_err("unable to create ipmi-msghandler-remove-wq workqueue");
> > +		rv = -ENOMEM;
> > +		goto out;
> > +	}
> > +
>
> Though not so easy to trigger: if this returns an error, then
> initialized == false, but the timer has already been set up and the
> panic_notifier has already been registered. That is, when unloading the
> module, you have some new problems to deal with. The exit path in
> ipmi_init_msghandler should undo these, or rather, this allocation should
> be done first.
>
> Cascardo.

A fix for this was sent upstream and accepted by the maintainer:

https://lore.kernel.org/all/20211217154410.1228673-1-cascardo@canonical.com/

Cascardo.

> > [...]
diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index a08f53f208bf..f3a2f228f648 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -191,6 +191,8 @@ struct ipmi_user {
 	struct work_struct remove_work;
 };
 
+struct workqueue_struct *remove_work_wq;
+
 static struct ipmi_user *acquire_ipmi_user(struct ipmi_user *user, int *index)
 	__acquires(user->release_barrier)
 {
@@ -1261,7 +1263,7 @@ static void free_user(struct kref *ref)
 	struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount);
 
 	/* SRCU cleanup must happen in task context. */
-	schedule_work(&user->remove_work);
+	queue_work(remove_work_wq, &user->remove_work);
 }
 
 static void _ipmi_destroy_user(struct ipmi_user *user)
@@ -5153,6 +5155,13 @@ static int ipmi_init_msghandler(void)
 
 	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
 
+	remove_work_wq = create_singlethread_workqueue("ipmi-msghandler-remove-wq");
+	if (!remove_work_wq) {
+		pr_err("unable to create ipmi-msghandler-remove-wq workqueue");
+		rv = -ENOMEM;
+		goto out;
+	}
+
 	initialized = true;
 
 out:
@@ -5178,6 +5187,8 @@ static void __exit cleanup_ipmi(void)
 	int count;
 
 	if (initialized) {
+		destroy_workqueue(remove_work_wq);
+
 		atomic_notifier_chain_unregister(&panic_notifier_list,
 						 &panic_block);
 
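
The idea behind the patch (queue the removal work on a queue the module owns, and drain that queue before teardown) can be modelled in userspace roughly as follows. This is a single-threaded toy, not the kernel workqueue implementation: `struct toy_wq`, `toy_queue_work()`, `toy_destroy_wq()` and `free_user_work()` are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_CAPACITY 8

typedef void (*work_fn)(void *arg);

/* Toy queue owned by the "module"; pending items are [head, tail). */
struct toy_wq {
	work_fn fn[WQ_CAPACITY];
	void *arg[WQ_CAPACITY];
	size_t head, tail;
	int alive;
};

static void toy_wq_init(struct toy_wq *wq)
{
	wq->head = 0;
	wq->tail = 0;
	wq->alive = 1;
}

/* Defer a callback; refused once the queue is destroyed or full. */
static int toy_queue_work(struct toy_wq *wq, work_fn fn, void *arg)
{
	if (!wq->alive || wq->tail == WQ_CAPACITY)
		return -1;
	wq->fn[wq->tail] = fn;
	wq->arg[wq->tail] = arg;
	wq->tail++;
	return 0;
}

/* Run every pending item, then mark the queue dead (drain on destroy). */
static void toy_destroy_wq(struct toy_wq *wq)
{
	while (wq->head < wq->tail) {
		size_t i = wq->head++;

		wq->fn[i](wq->arg[i]);
	}
	wq->alive = 0;
}

/* Stand-in for the deferred user removal: counts how many times it ran. */
static void free_user_work(void *arg)
{
	int *freed = arg;

	(*freed)++;
}

/* Returns how many deferred removals ran before teardown finished. */
static int run_drain_demo(void)
{
	struct toy_wq wq;
	int freed = 0;

	toy_wq_init(&wq);
	toy_queue_work(&wq, free_user_work, &freed);
	toy_queue_work(&wq, free_user_work, &freed);

	toy_destroy_wq(&wq);	/* both items run here, before "unload" */
	assert(freed == 2);
	assert(toy_queue_work(&wq, free_user_work, &freed) == -1);
	return freed;
}
```

Because the module owns the queue, its teardown path has a single place to wait for outstanding work, which is what the shared system workqueue could not offer in `cleanup_ipmi()`.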