diff mbox series

[SRU,ZESTY] UBUNTU:SAUCE: exec: fix lockup because retry loop may never exit

Message ID 20171207173134.27413-1-colin.king@canonical.com
State New
Headers show
Series [SRU,ZESTY] UBUNTU:SAUCE: exec: fix lockup because retry loop may never exit | expand

Commit Message

Colin Ian King Dec. 7, 2017, 5:31 p.m. UTC
From: Colin Ian King <colin.king@canonical.com>

BugLink: https://launchpad.net/bugs/1730717

My early fix for bug LP#1672819 could get stuck in an infinite
retry loop causing lockups.  Currently the retry is aggressively
spinning on a check and will never abort.  Relax the aggressive
retry by adding a small interruptible delay and limit the number
of retries.

I've analyzed the retries and 95% of the cases on an 8 CPU Xeon
host never hit the retry loop and less that 0.5% of retries are
ever more than 1 retry using the original reproducer program.
Considering that the reproducer is an extreeme case of forcing
the race condition I believe we now have a suitable balance of
non-aggressive CPU eating retries and a mechanism to bail out
after enough retries and avoid a lockup.

Admittedly this workaround is a bit of an ugly hack, but the
retry path is never executed for the majority of use cases and
only hits the retry/delay for the racy condition we hit with the
original bug.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 fs/exec.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Seth Forshee Dec. 8, 2017, 9:30 p.m. UTC | #1
On Thu, Dec 07, 2017 at 05:31:34PM +0000, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> BugLink: https://launchpad.net/bugs/1730717
> 
> My early fix for bug LP#1672819 could get stuck in an infinite
> retry loop causing lockups.  Currently the retry is aggressively
> spinning on a check and will never abort.  Relax the aggressive
> retry by adding a small interruptible delay and limit the number
> of retries.
> 
> I've analyzed the retries and 95% of the cases on an 8 CPU Xeon
> host never hit the retry loop and less that 0.5% of retries are
> ever more than 1 retry using the original reproducer program.
> Considering that the reproducer is an extreeme case of forcing
> the race condition I believe we now have a suitable balance of
> non-aggressive CPU eating retries and a mechanism to bail out
> after enough retries and avoid a lockup.
> 
> Admittedly this workaround is a bit of an ugly hack, but the
> retry path is never executed for the majority of use cases and
> only hits the retry/delay for the racy condition we hit with the
> original bug.
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Looks reasonable. I have tested this (in artful) to confirm that it
fixes the regression.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Stefan Bader Jan. 23, 2018, 10:10 a.m. UTC | #2
On 07.12.2017 18:31, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> BugLink: https://launchpad.net/bugs/1730717
> 
> My early fix for bug LP#1672819 could get stuck in an infinite
> retry loop causing lockups.  Currently the retry is aggressively
> spinning on a check and will never abort.  Relax the aggressive
> retry by adding a small interruptible delay and limit the number
> of retries.
> 
> I've analyzed the retries and 95% of the cases on an 8 CPU Xeon
> host never hit the retry loop and less that 0.5% of retries are
> ever more than 1 retry using the original reproducer program.
> Considering that the reproducer is an extreeme case of forcing
> the race condition I believe we now have a suitable balance of
> non-aggressive CPU eating retries and a mechanism to bail out
> after enough retries and avoid a lockup.
> 
> Admittedly this workaround is a bit of an ugly hack, but the
> retry path is never executed for the majority of use cases and
> only hits the retry/delay for the racy condition we hit with the
> original bug.
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  fs/exec.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 8cf76e2..2234a41 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -57,6 +57,7 @@
>  #include <linux/oom.h>
>  #include <linux/compat.h>
>  #include <linux/vmalloc.h>
> +#include <linux/delay.h>
>  
>  #include <trace/events/fs.h>
>  
> @@ -1458,6 +1459,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
>  	struct task_struct *p = current, *t;
>  	unsigned n_fs;
>  	bool fs_recheck;
> +	int count = 0;
>  
>  	if (p->ptrace) {
>  		if (ptracer_capable(p, current_user_ns()))
> @@ -1489,8 +1491,11 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
>  
>  	if (p->fs->users > n_fs) {
>  		if (fs_recheck) {
> -			spin_unlock(&p->fs->lock);
> -			goto recheck;
> +			if (count++ < 20) {
> +				spin_unlock(&p->fs->lock);
> +				msleep_interruptible(1);
> +				goto recheck;
> +			}
>  		}
>  		bprm->unsafe |= LSM_UNSAFE_SHARE;
>  	} else
> 
Zesty EOL
diff mbox series

Patch

diff --git a/fs/exec.c b/fs/exec.c
index 8cf76e2..2234a41 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -57,6 +57,7 @@ 
 #include <linux/oom.h>
 #include <linux/compat.h>
 #include <linux/vmalloc.h>
+#include <linux/delay.h>
 
 #include <trace/events/fs.h>
 
@@ -1458,6 +1459,7 @@  static void check_unsafe_exec(struct linux_binprm *bprm)
 	struct task_struct *p = current, *t;
 	unsigned n_fs;
 	bool fs_recheck;
+	int count = 0;
 
 	if (p->ptrace) {
 		if (ptracer_capable(p, current_user_ns()))
@@ -1489,8 +1491,11 @@  static void check_unsafe_exec(struct linux_binprm *bprm)
 
 	if (p->fs->users > n_fs) {
 		if (fs_recheck) {
-			spin_unlock(&p->fs->lock);
-			goto recheck;
+			if (count++ < 20) {
+				spin_unlock(&p->fs->lock);
+				msleep_interruptible(1);
+				goto recheck;
+			}
 		}
 		bprm->unsafe |= LSM_UNSAFE_SHARE;
 	} else