[SRU,ARTFUL] UBUNTU:SAUCE: exec: fix lockup because retry loop may never exit

Message ID 20171207173233.27493-1-colin.king@canonical.com
State New
Series [SRU,ARTFUL] UBUNTU:SAUCE: exec: fix lockup because retry loop may never exit

Commit Message

Colin Ian King Dec. 7, 2017, 5:32 p.m. UTC
From: Colin Ian King <colin.king@canonical.com>

BugLink: https://launchpad.net/bugs/1730717

My earlier fix for bug LP#1672819 could get stuck in an infinite
retry loop, causing lockups.  Currently the retry aggressively
spins on a check and will never abort.  Relax the aggressive
retry by adding a small interruptible delay and limiting the
number of retries.

I've analyzed the retries: on an 8 CPU Xeon host running the
original reproducer program, 95% of cases never hit the retry
loop at all, and less than 0.5% of retries need more than one
attempt.  Considering that the reproducer is an extreme case of
forcing the race condition, I believe we now have a suitable
balance between retries that do not aggressively eat CPU and a
mechanism that bails out after enough retries to avoid a lockup.

Admittedly this workaround is a bit of an ugly hack, but the
retry path is not executed for the majority of use cases; the
retry/delay is only hit for the racy condition seen with the
original bug.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 fs/exec.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Seth Forshee Dec. 8, 2017, 9:30 p.m. UTC | #1
On Thu, Dec 07, 2017 at 05:32:33PM +0000, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> [...]
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Looks reasonable. I have tested this to confirm that it fixes the
regression.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Stefan Bader Jan. 23, 2018, 10:11 a.m. UTC | #2
On 07.12.2017 18:32, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> [...]
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>

Khalid Elmously Feb. 1, 2018, 8:33 p.m. UTC | #3
Applied to Artful

On 2017-12-07 17:32:33 , Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> [...]
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Patch

diff --git a/fs/exec.c b/fs/exec.c
index 5c2c1ff3bd..0f47bb8486 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -62,6 +62,7 @@
 #include <linux/oom.h>
 #include <linux/compat.h>
 #include <linux/vmalloc.h>
+#include <linux/delay.h>
 
 #include <trace/events/fs.h>
 
@@ -1465,6 +1466,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 	struct task_struct *p = current, *t;
 	unsigned n_fs;
 	bool fs_recheck;
+	int count = 0;
 
 	if (p->ptrace)
 		bprm->unsafe |= LSM_UNSAFE_PTRACE;
@@ -1492,8 +1494,11 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 
 	if (p->fs->users > n_fs) {
 		if (fs_recheck) {
-			spin_unlock(&p->fs->lock);
-			goto recheck;
+			if (count++ < 20) {
+				spin_unlock(&p->fs->lock);
+				msleep_interruptible(1);
+				goto recheck;
+			}
 		}
 		bprm->unsafe |= LSM_UNSAFE_SHARE;
 	} else