diff mbox

Trinity: BUG at fs/ext4/inode.c:1590!

Message ID 519A40DE.2060409@redhat.com
State Not Applicable, archived
Headers show

Commit Message

Eric Sandeen May 20, 2013, 3:27 p.m. UTC
On 5/19/13 5:25 PM, Toralf Förster wrote:
> The following BUG happened today at a stable Gentoo Linux 32bit system with stable kernel 3.9.3:

Please say right away, or even in the subject, when you're reporting a trinity fuzzer bug.
It helps set the stage for what we're looking at.  :)

> 2013-05-19T23:28:34.195+02:00 n22 kernel: ------------[ cut here ]------------
> 2013-05-19T23:28:34.195+02:00 n22 kernel: kernel BUG at fs/ext4/inode.c:1590!

<snip>

> The trinity log files hangs here since about a hour:
> ...

ok so you're running it as:

# trinity -C 4 -V /mnt/n22/v1/v2/ -m

so: 4 children, victim files on /mnt/n22/v1/v2, and monochrome

   PID
   |      child syscall nr
   |      |
> [4673] [415] rt_sigsuspend(unewset=0xc0100220, sigsetsize=0x5ffdef7a) = -1 (Invalid argument)
> [4673] [416] munlock(addr=0x85c6800, len=4096) = 0
> [4673] [417] splice(fd_in=8, off_in=0, fd_out=12, off_out=0x85c3000[page_0xff], len=4097, flags=8) = 4097
> [4673] [418] fstatat64(dfd=12, filename="/mnt/n22/v1/v2/d10", statbuf=0, flag=0x284d0014) = -1 (Invalid argument)
> [4673] [419] mincore(start=1, len=0x1000000, vec=0x85c0000[page_zeros]) = -1 (Invalid argument)
> [4673] [420] timer_settime(timer_id=0x5f3bdbfa, flags=0x3075aee6, new_setting=0x85c3000[page_0xff], old_setting=0x85c3001) = -1 (Invalid argument)
> [4673] [421] syncfs(fd=12) [watchdog] pid 4514 hasn't made progress in 30 seconds! (last:1368998898 now:1368998928 diff:30). Stuck in syscall 267:clock_nanosleep. Sending SIGKILL.

hm, and it's telling us (?) that pid 4514 is the one that's stuck, is that in the logs, should we see what syscalls were done there?

trinity reseeds itself now and then, and prints out the seed it's using at that point.

Gathering at least the seed info would allow the test to be restarted with that seed (-s option), and in particular see if it's repeatable.

Dave, I had suggested earlier in the thread that an option to specify a seed and a nr. of syscalls to skip would help narrow down what triggers a bug.  i.e. in this case, we could find the last seed (for this child?) and then run with:

* that seed
* max syscalls 421
* skip the first 400 syscalls

and see if it reproduces.  Keep narrowing the window until we get the smallest set that reproduces.  fsx has something similar to this.

(heh, now I want an option to emit C code to recreate the last N syscalls it's made, for permanent testcases.  I suppose that'd be tough) ;)

-Eric

Here's a really hacky, untested patch that might implement the skipping I'm talking about.  Caveat; I've never actually used trinity.  :)

So if I'm missing something obvious about how to narrow down a failure to the call that caused it, I'm all ears.  :)



--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Dave Jones May 21, 2013, 5:07 p.m. UTC | #1
On Mon, May 20, 2013 at 10:27:26AM -0500, Eric Sandeen wrote:

 > Dave, I had suggested earlier in the thread that an option to specify a seed and a nr. of syscalls
 > to skip would help narrow down what triggers a bug.
 >  i.e. in this case, we could find the last seed (for this child?) and then run with:

The seed is 'global', ie, child 1 uses seed+1, child 2 uses seed+2 etc etc.
We only log the seed value without the child number, so as long as you run with the same number
of child processes, the children will all get the correct seed values.
 
 > * that seed
 > * max syscalls 421
 > * skip the first 400 syscalls
 > 
 > and see if it reproduces.  Keep narrowing the window until we get the smallest set that reproduces.  fsx has something similar to this.

Setting the seed with -N should have the same net result as skipping the syscalls up until that point.

There are however two gotchas with the code to use prior seeds.

1. imagine this scenario.

   a. initial seed
   b. N syscalls done
   c. reseed
   d. N syscalls done.

  If we get an oops after D, you might think "great, I can ignore everything before (c)" but that may not necessarily
  be the case, if something at (b) created/corrupted some kernel state that (d) ends up falling over.

2. We gather a list of filenames at startup from /proc, /sys and /dev, and these change at each run.
   This has undesirable results when you're trying to recreate something based on 'fd 243' etc, when that
   ends up mapping to a different file.
   However, in Toralf's case, if he's using -V, we'll only gather fd's from there, and as long as the files/dirs
   at that path are static across runs, all should be good.

 > (heh, now I want an option to emit C code to recreate the last N syscalls it's made, for permanent testcases.  I suppose that'd be tough) ;)

Not tough, just yet another idea to add to the already mile-long TODO :-)
 
 > Here's a really hacky, untested patch that might implement the skipping I'm talking about.  Caveat; I've never actually used trinity.  :)

I think if we wanted to skip a few million syscalls this would involve
quite a wait, and you're not really going to be saving much time over
just doing the syscalls ;)

 > So if I'm missing something obvious about how to narrow down a failure to the call that caused it, I'm all ears.  :)

I really hope that -N is enough here.  The only real gotcha is the 2nd case
above, which we could solve by using a filename cache in the same way
there's a network socket cache.

	Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

=====

trinity: Add option to skip first X syscalls

To narrow down what set of operations triggers a bug, it may
be useful to rerun w/ the same seed, but skip the first X syscalls,
to get the minimum set of calls which causes a problem.

Add an "-X" option to do this.  This is per-child, in the
same way that the -N option is per-child. (I think!)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/include/syscall.h b/include/syscall.h
index ccbe436..6dc00ec 100644
--- a/include/syscall.h
+++ b/include/syscall.h
@@ -99,6 +99,7 @@  extern const struct syscalltable *syscalls_32bit;
 extern const struct syscalltable *syscalls_64bit;
 
 extern unsigned long syscalls_todo;
+extern unsigned long syscalls_skip;
 extern unsigned int max_nr_syscalls;
 extern unsigned int max_nr_32bit_syscalls;
 extern unsigned int max_nr_64bit_syscalls;
diff --git a/params.c b/params.c
index c2c3f70..d09c032 100644
--- a/params.c
+++ b/params.c
@@ -64,6 +64,7 @@  static void usage(void)
 	fprintf(stderr, " -N#: do # syscalls then exit.\n");
 	fprintf(stderr, " -p:  pause after syscall.\n");
 	fprintf(stderr, " -s#: use # as random seed.\n");
+	fprintf(stderr, " -X#: skip first # syscalls.\n");
 	exit(EXIT_SUCCESS);
 }
 
@@ -92,7 +93,7 @@  void parse_args(int argc, char *argv[])
 {
 	int opt;
 
-	while ((opt = getopt_long(argc, argv, "c:C:dDg:hIl:LN:mnP:pqr:s:SV:vx:", longopts, NULL)) != -1) {
+	while ((opt = getopt_long(argc, argv, "c:C:dDg:hIl:LN:mnP:pqr:s:SV:vx:X", longopts, NULL)) != -1) {
 		switch (opt) {
 		default:
 			if (opt == '?')
@@ -189,7 +190,9 @@  void parse_args(int argc, char *argv[])
 			seed = strtol(optarg, NULL, 10);
 			user_set_seed = TRUE;
 			break;
-
+		case 'X':
+			syscalls_skip = strtol(optarg, NULL, 10);
+			break;
 
 		case 'S':
 			do_syslog = TRUE;
diff --git a/syscall.c b/syscall.c
index 42ac75e..cc33c52 100644
--- a/syscall.c
+++ b/syscall.c
@@ -83,10 +83,12 @@  static unsigned long do_syscall(int childno, int *errno_saved)
 
 	errno = 0;
 
-	if (shm->do32bit[childno] == FALSE)
-		ret = syscall(nr, a1, a2, a3, a4, a5, a6);
-	else
-		ret = syscall32(num_args, nr, a1, a2, a3, a4, a5, a6);
+	if (shm->total_syscalls_done > syscalls_skip) {
+		if (shm->do32bit[childno] == FALSE)
+			ret = syscall(nr, a1, a2, a3, a4, a5, a6);
+		else
+			ret = syscall32(num_args, nr, a1, a2, a3, a4, a5, a6);
+	}
 
 	*errno_saved = errno;
 
@@ -221,7 +223,8 @@  args_done:
 	sptr += sprintf(sptr, ") ");
 	*sptr = '\0';
 
-	output(2, "%s", string);
+	if (shm->total_syscalls_done >= syscalls_skip)
+		output(2, "%s", string);
 
 	if (dopause == TRUE) {
 		synclogs();
@@ -243,6 +246,7 @@  args_done:
 	 */
 	call += SYSCALL_OFFSET;
 
+	/* do_syscall will skip the actual call if we're still skipping */
 	ret = do_syscall(childno, &errno_saved);
 
 	sptr = string;
@@ -262,7 +266,8 @@  args_done:
 
 	*sptr = '\0';
 
-	output(2, "%s", string);
+	if (shm->total_syscalls_done > syscalls_skip)
+		output(2, "%s", string);
 
 	/* If the syscall doesn't exist don't bother calling it next time. */
 	if ((ret == -1) && (errno_saved == ENOSYS)) {
diff --git a/tables.c b/tables.c
index 2bccd83..de11fb5 100644
--- a/tables.c
+++ b/tables.c
@@ -19,6 +19,7 @@  const struct syscalltable *syscalls_32bit;
 const struct syscalltable *syscalls_64bit;
 
 unsigned long syscalls_todo = 0;
+unsigned long syscalls_skip = 0;
 
 unsigned int max_nr_syscalls;
 unsigned int max_nr_32bit_syscalls;