Taunt OOM killer in fork12 setup()

Message ID 20200130161337.31614-1-mdoucha@suse.cz
State Superseded

Commit Message

Martin Doucha Jan. 30, 2020, 4:13 p.m. UTC
On a system with low memory, fork12 can trigger OOM killer before it hits
any fork() limits. The OOM killer might accidentally kill e.g. the parent
shell and external testing tools will assume the test failed.

Set high oom_score_adj on the fork12 process so that the OOM killer focuses
on it and its children.

Signed-off-by: Martin Doucha <mdoucha@suse.cz>
---
 testcases/kernel/syscalls/fork/fork12.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Li Wang Jan. 31, 2020, 9:14 a.m. UTC | #1
On Fri, Jan 31, 2020 at 12:13 AM Martin Doucha <mdoucha@suse.cz> wrote:

> On a system with low memory, fork12 can trigger OOM killer before it hits
> any fork() limits. The OOM killer might accidentally kill e.g. the parent
> shell and external testing tools will assume the test failed.
>
> Set high oom_score_adj on the fork12 process so that the OOM killer focuses
> on it and its children.
>

This sounds more like an OOM killer defect than a fork12 problem. What we do
for that is protect the parent shell and its harness so that
oom_kill_process() does not act on them.

On the other hand, raising the OOM score of fork12 would not guarantee that
the OOM killer evaluates correctly; it just makes fork12 easier to kill
during testing.
Jan Stancek Jan. 31, 2020, 9:37 a.m. UTC | #2
----- Original Message -----
> 
> 
> On Fri, Jan 31, 2020 at 12:13 AM Martin Doucha < mdoucha@suse.cz > wrote:
> 
> 
> On a system with low memory, fork12 can trigger OOM killer before it hits
> any fork() limits. The OOM killer might accidentally kill e.g. the parent
> shell and external testing tools will assume the test failed.
> 
> Set high oom_score_adj on the fork12 process so that the OOM killer focuses
> on it and its children.
> 
> It sounds more like the OOM-Killer defect but not fork12.

The badness score is based on the proportion of rss/swap. It doesn't seem like
a defect to me, we just quickly spawn many small tasks.

> What we do for that
> is to protect the parent shell and its harness to avoid oom_kill_process()
> acting on them.
> 
> On the other side, if we do raise the oom score of fork12, that would not
> guarantee OOM-Killer do right evaluation but just makes fork12 easily to be
> killed in testing.

fork12 is not an OOM test, so I don't see a problem with this. We only need
the OOM killer to kill something we don't care about, in case it triggers.

I'd move oom_score_adj after fork, so that only the child processes become
better targets, not the parent.
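
A minimal sketch of the child-only variant suggested above, in plain C
without the LTP helpers. The function names are hypothetical and not part of
fork12.c; the only interface assumed is /proc/self/oom_score_adj. The child
raises its own score right after fork(), and the parent's value stays as it
was.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write an oom_score_adj value for the calling process; 0 on success. */
static int set_self_oom_score_adj(int adj)
{
	FILE *f = fopen("/proc/self/oom_score_adj", "w");

	if (!f)
		return -1;
	if (fprintf(f, "%d", adj) < 0) {
		fclose(f);
		return -1;
	}
	/* The data reaches the kernel on flush, so check fclose() too. */
	return fclose(f) == 0 ? 0 : -1;
}

/* Read the calling process's oom_score_adj into *adj; 0 on success. */
static int get_self_oom_score_adj(int *adj)
{
	FILE *f = fopen("/proc/self/oom_score_adj", "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", adj) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Fork one child that raises only its own score and then exits.
 * Returns 0 if the child succeeded and the parent's value is unchanged,
 * -1 otherwise.
 */
static int fork_child_with_raised_score(int adj)
{
	int before, after, status;
	pid_t pid;

	if (get_self_oom_score_adj(&before))
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: become the preferred OOM victim; real work would follow. */
		_exit(set_self_oom_score_adj(adj) ? 1 : 0);
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	if (get_self_oom_score_adj(&after))
		return -1;
	return before == after ? 0 : -1;
}
```

Calling fork_child_with_raised_score(500) would mirror the value used in the
patch. Raising the score needs no privileges; lowering it again generally
does, which is why the write happens in the child only.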
Li Wang Jan. 31, 2020, 9:47 a.m. UTC | #3
On Fri, Jan 31, 2020 at 5:37 PM Jan Stancek <jstancek@redhat.com> wrote:

>
> ----- Original Message -----
> >
> >
> > On Fri, Jan 31, 2020 at 12:13 AM Martin Doucha < mdoucha@suse.cz >
> wrote:
> >
> >
> > On a system with low memory, fork12 can trigger OOM killer before it hits
> > any fork() limits. The OOM killer might accidentally kill e.g. the parent
> > shell and external testing tools will assume the test failed.
> >
> > Set high oom_score_adj on the fork12 process so that the OOM killer
> focuses
> > on it and its children.
> >
> > It sounds more like the OOM-Killer defect but not fork12.
>
> Badness score is based on proportion of rss/swap. It doesn't seem like
> defect to me, we just quickly spawn many small tasks.
>

Ok, but killing the parent shell is not what we want anyway. Even if the OOM
killer evaluates correctly and gives a high score to the whole test
framework, killing it will break LTP reporting.


>
> > What we do for that
> > is to protect the parent shell and its harness to avoid
> oom_kill_process()
> > acting on them.
> >
> > On the other side, if we do raise the oom score of fork12, that would not
> > guarantee OOM-Killer do right evaluation but just makes fork12 easily to
> be
> > killed in testing.
>
> fork12 is not an OOM test, so I don't see problem with this. We only need
> OOM
> to kill something we don't care about, in case it triggers.
>
> I'd move oom_score_adj after fork, so only child processes are better
> target,
> not the parent.
>

Exactly, I agree with this. At least target only the child processes.
Martin Doucha Jan. 31, 2020, 12:40 p.m. UTC | #4
On 1/31/20 10:37 AM, Jan Stancek wrote:
> ----- Original Message ----
>> It sounds more like the OOM-Killer defect but not fork12.
> 
> Badness score is based on proportion of rss/swap. It doesn't seem
> like defect to me, we just quickly spawn many small tasks.

Yes, OOM killer is working as intended here. fork12 is basically a fork
bomb test so it spawns thousands of processes with almost no allocated
memory. Since kernel 2.6.36, OOM killer uses only two criteria to decide
which process to kill:
- how much memory/swap it has allocated
- whether the process is privileged

Since fork12 children have low memory footprint, most system processes
look like better targets for OOM killer right now. But we're not testing
userspace resilience against fork bomb here. We're trying to crash the
kernel itself.
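
To make the two criteria above concrete, here is a simplified model of the
score calculation. It is a sketch only: the function name, the 0..1000 scale
and the 3% discount for privileged tasks are assumptions based on the
2.6.36-era heuristic, and real kernels differ in detail between versions.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified badness model (NOT the kernel's code).
 * Returns a score in the range 0..1000; higher means a better OOM victim.
 */
static long approx_badness(unsigned long rss_pages, unsigned long swap_pages,
			   unsigned long total_pages, bool privileged,
			   int oom_score_adj)
{
	long points;

	if (total_pages == 0)
		return 0;

	/* Share of RAM + swap used by the task, scaled to 0..1000. */
	points = (long)((rss_pages + swap_pages) * 1000 / total_pages);

	/* Assumed discount for privileged tasks: 3% of the scale. */
	if (privileged)
		points -= 30;

	/* oom_score_adj shifts the result directly. */
	points += oom_score_adj;

	if (points < 0)
		return 0;
	return points > 1000 ? 1000 : points;
}
```

With a million pages in total, a 100-page fork12 child scores 0 in this
model, while a privileged daemon using 50000 pages scores 20. Adding an
oom_score_adj of 500 to the child puts it far ahead of the daemon.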

>> What we do for that is to protect the parent shell and its harness
>> to avoid oom_kill_process() acting on them.
>> 
>> On the other side, if we do raise the oom score of fork12, that
>> would not guarantee OOM-Killer do right evaluation but just makes
>> fork12 easily to be killed in testing.
> 
> fork12 is not an OOM test, so I don't see problem with this. We only
> need OOM to kill something we don't care about, in case it triggers.
> 
> I'd move oom_score_adj after fork, so only child processes are better
> target, not the parent.

oom_score_adj is inherited by child processes and OOM killer tries to
kill first-level children if it can. So setting oom_score_adj on the
main fork12 process will work exactly the way we want - OOM killer will
kill one of the child processes, fork12 will notice on line 80 and exit
gracefully.

There could be problems only on kernels older than 2.6.36 where the
number of forked children was included in OOM score calculation and the
main worker process might get targeted directly (not sure if the
kill-children-first approach was used back then).

Either way, trying to protect the parent shell is a bad idea. We'd have
to set negative oom_score_adj on it and if fork12 crashes before it can
reset it back to zero, all further test processes would inherit the OOM
protection.
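
The inheritance part can be checked with a small standalone program. This is
a sketch with hypothetical helper names, not code from fork12.c; it only
shows that a value written by the parent is seen by a child created
afterwards, and says nothing about which task the OOM killer then picks.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write oom_score_adj for the calling process; 0 on success. */
static int write_adj(int adj)
{
	FILE *f = fopen("/proc/self/oom_score_adj", "w");

	if (!f)
		return -1;
	if (fprintf(f, "%d", adj) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Read oom_score_adj of the calling process; 0 on success. */
static int read_adj(int *adj)
{
	FILE *f = fopen("/proc/self/oom_score_adj", "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", adj) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Set adj on the parent, fork, and report through a pipe which value the
 * child sees for itself. Returns 0 on success and stores the value in *seen.
 */
static int adj_seen_by_child(int adj, int *seen)
{
	int fds[2];
	pid_t pid;
	ssize_t n;

	if (write_adj(adj))
		return -1;
	if (pipe(fds))
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		int mine = 0;
		int ok;

		/* Child: read its own value and send it back to the parent. */
		close(fds[0]);
		ok = read_adj(&mine) == 0 &&
		     write(fds[1], &mine, sizeof(mine)) == (ssize_t)sizeof(mine);
		_exit(ok ? 0 : 1);
	}

	close(fds[1]);
	n = read(fds[0], seen, sizeof(*seen));
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return n == (ssize_t)sizeof(*seen) ? 0 : -1;
}
```

Note that the calling process keeps the raised value afterwards, which is
the same trade-off as in the patch: the main fork12 process becomes a
preferred target together with its children.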
Li Wang Jan. 31, 2020, 2:20 p.m. UTC | #5
On Fri, Jan 31, 2020 at 8:40 PM Martin Doucha <mdoucha@suse.cz> wrote:

> On 1/31/20 10:37 AM, Jan Stancek wrote:
> > ----- Original Message ----
> >> It sounds more like the OOM-Killer defect but not fork12.
> >
> > Badness score is based on proportion of rss/swap. It doesn't seem
> > like defect to me, we just quickly spawn many small tasks.
>
> Yes, OOM killer is working as intended here. fork12 is basically a fork
> bomb test so it spawns thousands of processes with almost no allocated
> memory. Since kernel 2.6.36, OOM killer uses only two criteria to decide
> which process to kill:
> - how much memory/swap it has allocated
> - whether the process is privileged
>
> Since fork12 children have low memory footprint, most system processes
> look like better targets for OOM killer right now. But we're not testing
> userspace resilience against fork bomb here. We're trying to crash the
> kernel itself.
>

Sounds reasonable to me, thanks!


>
> >> What we do for that is to protect the parent shell and its harness
> >> to avoid oom_kill_process() acting on them.
> >>
> >> On the other side, if we do raise the oom score of fork12, that
> >> would not guarantee OOM-Killer do right evaluation but just makes
> >> fork12 easily to be killed in testing.
> >
> > fork12 is not an OOM test, so I don't see problem with this. We only
> > need OOM to kill something we don't care about, in case it triggers.
> >
> > I'd move oom_score_adj after fork, so only child processes are better
> > target, not the parent.
>
> oom_score_adj is inherited by child processes and OOM killer tries to
> kill first-level children if it can. So setting oom_score_adj on the
> main fork12 process will work exactly the way we want - OOM killer will
> kill one of the child processes, fork12 will notice on line 80 and exit
> gracefully.
>

Theoretically yes! But wouldn't the main fork12 process be more robust if we
did not set a high score on it?


> There could be problems only on kernels older than 2.6.36 where the
> number of forked children was included in OOM score calculation and the
> main worker process might get targeted directly (not sure if the
> kill-children-first approach was used back then).
>

That sounds a little tricky. The only method I can think of now is to adjust
the max process limit on low-memory systems, so that the test still includes
a fork bomb but does NOT consume too many resources in fork12?


>
> Either way, trying to protect the parent shell is a bad idea. We'd have
> to set negative oom_score_adj on it and if fork12 crashes before it can
> reset it back to zero, all further test processes would inherit the OOM
> protection.
>

My bad! The way we have tried is to set a negative score for the test
harness and LTP-related processes, then reset the parent shell's score to 0,
which keeps other tests from inheriting OOM protection from the shell.
Patch

diff --git a/testcases/kernel/syscalls/fork/fork12.c b/testcases/kernel/syscalls/fork/fork12.c
index 75278b012..99b6900f4 100644
--- a/testcases/kernel/syscalls/fork/fork12.c
+++ b/testcases/kernel/syscalls/fork/fork12.c
@@ -108,6 +108,8 @@  int main(int ac, char **av)
 static void setup(void)
 {
 	tst_sig(FORK, fork12_sigs, cleanup);
+	/* Taunt the OOM killer so that it doesn't kill system processes */
+	SAFE_FILE_PRINTF(cleanup, "/proc/self/oom_score_adj", "500");
 	TEST_PAUSE;
 }