| Message ID | 20171207173233.27493-1-colin.king@canonical.com |
|---|---|
| State | New |
| Series | [SRU,ARTFUL] UBUNTU: SAUCE: exec: fix lockup because retry loop may never exit |
On Thu, Dec 07, 2017 at 05:32:33PM +0000, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> BugLink: https://launchpad.net/bugs/1730717
>
> My earlier fix for bug LP#1672819 could get stuck in an infinite
> retry loop, causing lockups. Currently the retry aggressively
> spins on a check and will never abort. Relax the aggressive
> retry by adding a small interruptible delay and limiting the
> number of retries.
>
> I've analyzed the retries: 95% of the cases on an 8-CPU Xeon
> host never hit the retry loop, and less than 0.5% of retries
> ever need more than 1 retry with the original reproducer program.
> Considering that the reproducer is an extreme case of forcing
> the race condition, I believe we now have a suitable balance of
> non-aggressive retries and a mechanism to bail out after enough
> retries, avoiding a lockup.
>
> Admittedly this workaround is a bit of an ugly hack, but the
> retry path is never executed for the majority of use cases and
> only hits the retry/delay for the racy condition we hit with the
> original bug.
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Looks reasonable. I have tested this to confirm that it fixes the
regression.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
On 07.12.2017 18:32, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> BugLink: https://launchpad.net/bugs/1730717
>
> My earlier fix for bug LP#1672819 could get stuck in an infinite
> retry loop, causing lockups. Currently the retry aggressively
> spins on a check and will never abort. Relax the aggressive
> retry by adding a small interruptible delay and limiting the
> number of retries.
>
> I've analyzed the retries: 95% of the cases on an 8-CPU Xeon
> host never hit the retry loop, and less than 0.5% of retries
> ever need more than 1 retry with the original reproducer program.
> Considering that the reproducer is an extreme case of forcing
> the race condition, I believe we now have a suitable balance of
> non-aggressive retries and a mechanism to bail out after enough
> retries, avoiding a lockup.
>
> Admittedly this workaround is a bit of an ugly hack, but the
> retry path is never executed for the majority of use cases and
> only hits the retry/delay for the racy condition we hit with the
> original bug.
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Acked-by: Stefan Bader <stefan.bader@canonical.com>

> ---
>  fs/exec.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 5c2c1ff3bd..0f47bb8486 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -62,6 +62,7 @@
>  #include <linux/oom.h>
>  #include <linux/compat.h>
>  #include <linux/vmalloc.h>
> +#include <linux/delay.h>
>
>  #include <trace/events/fs.h>
>
> @@ -1465,6 +1466,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
>  	struct task_struct *p = current, *t;
>  	unsigned n_fs;
>  	bool fs_recheck;
> +	int count = 0;
>
>  	if (p->ptrace)
>  		bprm->unsafe |= LSM_UNSAFE_PTRACE;
> @@ -1492,8 +1494,11 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
>
>  	if (p->fs->users > n_fs) {
>  		if (fs_recheck) {
> -			spin_unlock(&p->fs->lock);
> -			goto recheck;
> +			if (count++ < 20) {
> +				spin_unlock(&p->fs->lock);
> +				msleep_interruptible(1);
> +				goto recheck;
> +			}
>  		}
>  		bprm->unsafe |= LSM_UNSAFE_SHARE;
>  	} else
Applied to Artful.

On 2017-12-07 17:32:33, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> BugLink: https://launchpad.net/bugs/1730717
>
> My earlier fix for bug LP#1672819 could get stuck in an infinite
> retry loop, causing lockups. Currently the retry aggressively
> spins on a check and will never abort. Relax the aggressive
> retry by adding a small interruptible delay and limiting the
> number of retries.
>
> I've analyzed the retries: 95% of the cases on an 8-CPU Xeon
> host never hit the retry loop, and less than 0.5% of retries
> ever need more than 1 retry with the original reproducer program.
> Considering that the reproducer is an extreme case of forcing
> the race condition, I believe we now have a suitable balance of
> non-aggressive retries and a mechanism to bail out after enough
> retries, avoiding a lockup.
>
> Admittedly this workaround is a bit of an ugly hack, but the
> retry path is never executed for the majority of use cases and
> only hits the retry/delay for the racy condition we hit with the
> original bug.
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  fs/exec.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 5c2c1ff3bd..0f47bb8486 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -62,6 +62,7 @@
>  #include <linux/oom.h>
>  #include <linux/compat.h>
>  #include <linux/vmalloc.h>
> +#include <linux/delay.h>
>
>  #include <trace/events/fs.h>
>
> @@ -1465,6 +1466,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
>  	struct task_struct *p = current, *t;
>  	unsigned n_fs;
>  	bool fs_recheck;
> +	int count = 0;
>
>  	if (p->ptrace)
>  		bprm->unsafe |= LSM_UNSAFE_PTRACE;
> @@ -1492,8 +1494,11 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
>
>  	if (p->fs->users > n_fs) {
>  		if (fs_recheck) {
> -			spin_unlock(&p->fs->lock);
> -			goto recheck;
> +			if (count++ < 20) {
> +				spin_unlock(&p->fs->lock);
> +				msleep_interruptible(1);
> +				goto recheck;
> +			}
>  		}
>  		bprm->unsafe |= LSM_UNSAFE_SHARE;
>  	} else
> --
> 2.14.1
>
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
```diff
diff --git a/fs/exec.c b/fs/exec.c
index 5c2c1ff3bd..0f47bb8486 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -62,6 +62,7 @@
 #include <linux/oom.h>
 #include <linux/compat.h>
 #include <linux/vmalloc.h>
+#include <linux/delay.h>

 #include <trace/events/fs.h>

@@ -1465,6 +1466,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 	struct task_struct *p = current, *t;
 	unsigned n_fs;
 	bool fs_recheck;
+	int count = 0;

 	if (p->ptrace)
 		bprm->unsafe |= LSM_UNSAFE_PTRACE;
@@ -1492,8 +1494,11 @@ static void check_unsafe_exec(struct linux_binprm *bprm)

 	if (p->fs->users > n_fs) {
 		if (fs_recheck) {
-			spin_unlock(&p->fs->lock);
-			goto recheck;
+			if (count++ < 20) {
+				spin_unlock(&p->fs->lock);
+				msleep_interruptible(1);
+				goto recheck;
+			}
 		}
 		bprm->unsafe |= LSM_UNSAFE_SHARE;
 	} else
```