
[SRU,J/I/H/F,1/2] ipmi: Move remove_work to dedicated workqueue

Message ID 20211216102145.12640-2-ioanna-maria.alifieraki@canonical.com
State New
Series Fix crash on ipmi module unload

Commit Message

Ioanna Alifieraki Dec. 16, 2021, 10:21 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1950666

Currently, when an ipmi_user is removed, the removal is deferred as a work
item on the system workqueue. Although this guarantees that the free
operation occurs in non-atomic context, it can race with removal of the
ipmi_msghandler module (see [1]). If a remove_user work item is scheduled
and the ipmi_msghandler module is removed shortly afterwards, the module
can be gone before the work executes, and the system then crashes with:
BUG: unable to handle page fault for address: ffffffffc05c3450
PF: supervisor instruction fetch in kernel mode
PF: error_code(0x0010) - not-present page
because the module's pages are gone. In cleanup_ipmi() there is no easy
way to detect whether any such work is still pending, so it cannot be
flushed before the module is removed. This patch creates a dedicated
workqueue and queues the remove_work items on it. When the module is
removed, destroying the workqueue drains it, which closes the race.

[1] https://bugs.launchpad.net/bugs/1950666

Cc: stable@vger.kernel.org # 5.1
Fixes: 3b9a907223d7 (ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier)
Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Message-Id: <20211115131645.25116-1-ioanna-maria.alifieraki@canonical.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
(cherry picked from commit 1d49eb91e86e8c1c1614c72e3e958b6b7e2472a9)
Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
---
 drivers/char/ipmi/ipmi_msghandler.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
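The ordering the fix relies on can be illustrated with a small userspace C model. This is a sketch, not kernel code: it assumes a single-threaded queue whose items run only when the queue is drained, and every model_* name below is hypothetical, loosely mirroring queue_work() and destroy_workqueue(). It shows that when each remove_work item sits on a queue the module owns, draining that queue before the module's code goes away guarantees that no handler runs afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Userspace model, not kernel code. A queue holds deferred work items;
 * nothing runs until the queue is drained. All model_* names are
 * hypothetical and only loosely mirror queue_work()/destroy_workqueue().
 */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_work {
	void (*func)(struct model_work *work);
	struct model_work *next;
};

struct model_wq {
	struct model_work *head, *tail;
};

static struct model_wq *model_create_wq(void)
{
	return calloc(1, sizeof(struct model_wq));
}

static void model_queue_work(struct model_wq *wq, struct model_work *work)
{
	work->next = NULL;
	if (wq->tail)
		wq->tail->next = work;
	else
		wq->head = work;
	wq->tail = work;
}

/* Destroying the queue first runs every pending item, then frees it. */
static void model_destroy_wq(struct model_wq *wq)
{
	while (wq->head) {
		struct model_work *work = wq->head;

		wq->head = work->next;
		if (!wq->head)
			wq->tail = NULL;
		work->func(work); /* may free the item; do not touch it after */
	}
	free(wq);
}

/* Stands in for "the module's code is still mapped". */
static bool module_text_present = true;
static int users_freed;

struct model_user {
	int id;
	struct model_work remove_work;
};

/* Plays the role of the remove_work handler, which lives in the module. */
static void model_remove_user_work(struct model_work *work)
{
	struct model_user *user =
		model_container_of(work, struct model_user, remove_work);

	assert(module_text_present); /* the real bug: this code is already gone */
	free(user);
	users_freed++;
}

/* Mirrors the patched cleanup order: drain the dedicated queue, then unload. */
static void model_unload_module(struct model_wq *remove_wq)
{
	model_destroy_wq(remove_wq);
	module_text_present = false;
}
```

If model_unload_module() cleared module_text_present before draining, the assert in model_remove_user_work() would fire, which is the userspace analogue of the page fault above. With schedule_work() the items sit on the shared system workqueue, which the module has no easy way to wait on; a dedicated queue gives cleanup_ipmi() something it owns and can drain.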

Comments

Thadeu Lima de Souza Cascardo Dec. 16, 2021, 1:52 p.m. UTC | #1
On Thu, Dec 16, 2021 at 12:21:44PM +0200, Ioanna Alifieraki wrote:
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index a08f53f208bf..f3a2f228f648 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -191,6 +191,8 @@ struct ipmi_user {
>  	struct work_struct remove_work;
>  };
>  
> +struct workqueue_struct *remove_work_wq;
> +
>  static struct ipmi_user *acquire_ipmi_user(struct ipmi_user *user, int *index)
>  	__acquires(user->release_barrier)
>  {
> @@ -1261,7 +1263,7 @@ static void free_user(struct kref *ref)
>  	struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount);
>  
>  	/* SRCU cleanup must happen in task context. */
> -	schedule_work(&user->remove_work);
> +	queue_work(remove_work_wq, &user->remove_work);
>  }
>  
>  static void _ipmi_destroy_user(struct ipmi_user *user)
> @@ -5153,6 +5155,13 @@ static int ipmi_init_msghandler(void)
>  
>  	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
>  
> +	remove_work_wq = create_singlethread_workqueue("ipmi-msghandler-remove-wq");
> +	if (!remove_work_wq) {
> +		pr_err("unable to create ipmi-msghandler-remove-wq workqueue");
> +		rv = -ENOMEM;
> +		goto out;
> +	}
> +

Though not so easy to trigger: if this fails, initialized stays false, but
the timer has already been set up and the panic notifier has already been
registered. Since cleanup_ipmi() only undoes them when initialized is true,
unloading the module leaves you with some new problems to deal with. The
error path in ipmi_init_msghandler() should undo these, or rather, the
workqueue should be created first.

Cascardo.

>  	initialized = true;
>  
>  out:
> @@ -5178,6 +5187,8 @@ static void __exit cleanup_ipmi(void)
>  	int count;
>  
>  	if (initialized) {
> +		destroy_workqueue(remove_work_wq);
> +
>  		atomic_notifier_chain_unregister(&panic_notifier_list,
>  						 &panic_block);
>  
> -- 
> 2.17.1
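The review point above can be sketched the same way. This is a hypothetical userspace C model, not the follow-up fix that was actually merged: each boolean stands in for one resource that ipmi_init_msghandler() acquires, and fail_wq_alloc simulates the workqueue allocation failing. It shows both options the review mentions: acquiring the fallible workqueue first, or keeping the original order and unwinding in reverse on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace model only. Each flag stands in for one resource that
 * ipmi_init_msghandler() acquires; the names are hypothetical.
 */
static bool timer_armed;         /* ~ timer_setup() + mod_timer() */
static bool notifier_registered; /* ~ atomic_notifier_chain_register() */
static bool wq_created;          /* ~ create_singlethread_workqueue() */
static bool initialized;
static bool fail_wq_alloc;       /* test knob: simulate allocation failure */

static int model_create_wq(void)
{
	if (fail_wq_alloc)
		return -ENOMEM;
	wq_created = true;
	return 0;
}

/* Option 1 (the reviewer's preference): acquire the fallible resource first. */
static int model_init_create_first(void)
{
	int rv = model_create_wq();

	if (rv) {
		fprintf(stderr, "unable to create remove workqueue\n");
		return rv; /* nothing else acquired yet, so nothing to undo */
	}
	timer_armed = true;
	notifier_registered = true;
	initialized = true;
	return 0;
}

/* Option 2: keep the original order but unwind in reverse on failure. */
static int model_init_with_unwind(void)
{
	int rv;

	timer_armed = true;
	notifier_registered = true;

	rv = model_create_wq();
	if (rv)
		goto out_unregister;

	initialized = true;
	return 0;

out_unregister:
	notifier_registered = false; /* ~ atomic_notifier_chain_unregister() */
	timer_armed = false;         /* ~ del_timer_sync() */
	return rv;
}

/* Mirrors cleanup_ipmi(): teardown only happens when initialized is true. */
static void model_cleanup(void)
{
	if (initialized) {
		wq_created = false;
		notifier_registered = false;
		timer_armed = false;
		initialized = false;
	}
}
```

With the patch as posted, a failed allocation returns with the timer and notifier still active while initialized is false, so the equivalent of model_cleanup() skips them; both init variants above leave nothing behind on failure.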
Thadeu Lima de Souza Cascardo Dec. 18, 2021, 10:15 a.m. UTC | #2
On Thu, Dec 16, 2021 at 10:52:22AM -0300, Thadeu Lima de Souza Cascardo wrote:
> On Thu, Dec 16, 2021 at 12:21:44PM +0200, Ioanna Alifieraki wrote:
> > +	remove_work_wq = create_singlethread_workqueue("ipmi-msghandler-remove-wq");
> > +	if (!remove_work_wq) {
> > +		pr_err("unable to create ipmi-msghandler-remove-wq workqueue");
> > +		rv = -ENOMEM;
> > +		goto out;
> > +	}
> > +
> 
> Though not so easy to trigger: if this fails, initialized stays false, but
> the timer has already been set up and the panic notifier has already been
> registered. Since cleanup_ipmi() only undoes them when initialized is true,
> unloading the module leaves you with some new problems to deal with. The
> error path in ipmi_init_msghandler() should undo these, or rather, the
> workqueue should be created first.
> 
> Cascardo.

I sent a fix for this, and the maintainer has accepted it:

https://lore.kernel.org/all/20211217154410.1228673-1-cascardo@canonical.com/

Cascardo.


Patch

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index a08f53f208bf..f3a2f228f648 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -191,6 +191,8 @@ struct ipmi_user {
 	struct work_struct remove_work;
 };
 
+struct workqueue_struct *remove_work_wq;
+
 static struct ipmi_user *acquire_ipmi_user(struct ipmi_user *user, int *index)
 	__acquires(user->release_barrier)
 {
@@ -1261,7 +1263,7 @@ static void free_user(struct kref *ref)
 	struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount);
 
 	/* SRCU cleanup must happen in task context. */
-	schedule_work(&user->remove_work);
+	queue_work(remove_work_wq, &user->remove_work);
 }
 
 static void _ipmi_destroy_user(struct ipmi_user *user)
@@ -5153,6 +5155,13 @@ static int ipmi_init_msghandler(void)
 
 	atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
 
+	remove_work_wq = create_singlethread_workqueue("ipmi-msghandler-remove-wq");
+	if (!remove_work_wq) {
+		pr_err("unable to create ipmi-msghandler-remove-wq workqueue");
+		rv = -ENOMEM;
+		goto out;
+	}
+
 	initialized = true;
 
 out:
@@ -5178,6 +5187,8 @@ static void __exit cleanup_ipmi(void)
 	int count;
 
 	if (initialized) {
+		destroy_workqueue(remove_work_wq);
+
 		atomic_notifier_chain_unregister(&panic_notifier_list,
 						 &panic_block);