From patchwork Mon May 20 15:27:26 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Sandeen X-Patchwork-Id: 245058 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40F132C0091 for ; Tue, 21 May 2013 01:27:32 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756143Ab3ETP1b (ORCPT ); Mon, 20 May 2013 11:27:31 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44857 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755261Ab3ETP1a (ORCPT ); Mon, 20 May 2013 11:27:30 -0400 Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r4KFRQqW029438 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 20 May 2013 11:27:26 -0400 Received: from liberator.sandeen.net (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r4KFRPYV018110 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Mon, 20 May 2013 11:27:25 -0400 Message-ID: <519A40DE.2060409@redhat.com> Date: Mon, 20 May 2013 10:27:26 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: =?UTF-8?B?VG9yYWxmIEbDtnJzdGVy?= CC: linux-ext4@vger.kernel.org, Dave Jones Subject: Re: Trinity: BUG at fs/ext4/inode.c:1590! References: <5199514D.5090606@gmx.de> In-Reply-To: <5199514D.5090606@gmx.de> X-Enigmail-Version: 1.5.1 X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On 5/19/13 5:25 PM, Toralf Förster wrote: > The following BUG happened today at a stable Gentoo Linux 32bit system with stable kernel 3.9.3: Please say right away, or even in the subject, when you're reporting a trinity fuzzer bug. It helps set the stage for what we're looking at. :) > 2013-05-19T23:28:34.195+02:00 n22 kernel: ------------[ cut here ]------------ > 2013-05-19T23:28:34.195+02:00 n22 kernel: kernel BUG at fs/ext4/inode.c:1590! > The trinity log files hangs here since about a hour: > ... ok so you're running it as: # trinity -C 4 -V /mnt/n22/v1/v2/ -m so: 4 children, victim files on /mnt/n22/v1/v2, and monochrome PID | child syscall nr | | > [4673] [415] rt_sigsuspend(unewset=0xc0100220, sigsetsize=0x5ffdef7a) = -1 (Invalid argument) > [4673] [416] munlock(addr=0x85c6800, len=4096) = 0 > [4673] [417] splice(fd_in=8, off_in=0, fd_out=12, off_out=0x85c3000[page_0xff], len=4097, flags=8) = 4097 > [4673] [418] fstatat64(dfd=12, filename="/mnt/n22/v1/v2/d10", statbuf=0, flag=0x284d0014) = -1 (Invalid argument) > [4673] [419] mincore(start=1, len=0x1000000, vec=0x85c0000[page_zeros]) = -1 (Invalid argument) > [4673] [420] timer_settime(timer_id=0x5f3bdbfa, flags=0x3075aee6, new_setting=0x85c3000[page_0xff], old_setting=0x85c3001) = -1 (Invalid argument) > [4673] [421] syncfs(fd=12) [watchdog] pid 4514 hasn't made progress in 30 seconds! (last:1368998898 now:1368998928 diff:30). Stuck in syscall 267:clock_nanosleep. Sending SIGKILL. hm, and it's telling us (?) that pid 4514 is the one that's stuck, is that in the logs, should we see what syscalls were done there? trinity reseeds itself now and then, and prints out the seed it's using at that point. Gathering at least the seed info would allow the test to be restarted with that seed (-s option), and in particular see if it's repeatable. Dave, I had suggested earlier in the thread that an option to specify a seed and a nr. of syscalls to skip would help narrow down what triggers a bug. i.e. in this case, we could find the last seed (for this child?) and then run with: * that seed * max syscalls 421 * skip the first 400 syscalls and see if it reproduces. Keep narrowing the window until we get the smallest set that reproduces. fsx has something similar to this. (heh, now I want an option to emit C code to recreate the last N syscalls it's made, for permanent testcases. I suppose that'd be tough) ;) -Eric Here's a really hacky, untested patch that might implement the skipping I'm talking about. Caveat; I've never actually used trinity. :) So if I'm missing something obvious about how to narrow down a failure to the call that caused it, I'm all ears. :) --- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ===== trinity: Add option to skip first X syscalls To narrow down what set of operations triggers a bug, it may be useful to rerun w/ the same seed, but skip the first X syscalls, to get the minimum set of calls which causes a problem. Add an "-X" option to do this. This is per-child, in the same way that the -N option is per-child. (I think!) Signed-off-by: Eric Sandeen --- diff --git a/include/syscall.h b/include/syscall.h index ccbe436..6dc00ec 100644 --- a/include/syscall.h +++ b/include/syscall.h @@ -99,6 +99,7 @@ extern const struct syscalltable *syscalls_32bit; extern const struct syscalltable *syscalls_64bit; extern unsigned long syscalls_todo; +extern unsigned long syscalls_skip; extern unsigned int max_nr_syscalls; extern unsigned int max_nr_32bit_syscalls; extern unsigned int max_nr_64bit_syscalls; diff --git a/params.c b/params.c index c2c3f70..d09c032 100644 --- a/params.c +++ b/params.c @@ -64,6 +64,7 @@ static void usage(void) fprintf(stderr, " -N#: do # syscalls then exit.\n"); fprintf(stderr, " -p: pause after syscall.\n"); fprintf(stderr, " -s#: use # as random seed.\n"); + fprintf(stderr, " -X#: skip first # syscalls.\n"); exit(EXIT_SUCCESS); } @@ -92,7 +93,7 @@ void parse_args(int argc, char *argv[]) { int opt; - while ((opt = getopt_long(argc, argv, "c:C:dDg:hIl:LN:mnP:pqr:s:SV:vx:", longopts, NULL)) != -1) { + while ((opt = getopt_long(argc, argv, "c:C:dDg:hIl:LN:mnP:pqr:s:SV:vx:X", longopts, NULL)) != -1) { switch (opt) { default: if (opt == '?') @@ -189,7 +190,9 @@ void parse_args(int argc, char *argv[]) seed = strtol(optarg, NULL, 10); user_set_seed = TRUE; break; - + case 'X': + syscalls_skip = strtol(optarg, NULL, 10); + break; case 'S': do_syslog = TRUE; diff --git a/syscall.c b/syscall.c index 42ac75e..cc33c52 100644 --- a/syscall.c +++ b/syscall.c @@ -83,10 +83,12 @@ static unsigned long do_syscall(int childno, int *errno_saved) errno = 0; - if (shm->do32bit[childno] == FALSE) - ret = syscall(nr, a1, a2, a3, a4, a5, a6); - else - ret = syscall32(num_args, nr, a1, a2, a3, a4, a5, a6); + if (shm->total_syscalls_done > syscalls_skip) { + if (shm->do32bit[childno] == FALSE) + ret = syscall(nr, a1, a2, a3, a4, a5, a6); + else + ret = syscall32(num_args, nr, a1, a2, a3, a4, a5, a6); + } *errno_saved = errno; @@ -221,7 +223,8 @@ args_done: sptr += sprintf(sptr, ") "); *sptr = '\0'; - output(2, "%s", string); + if (shm->total_syscalls_done >= syscalls_skip) + output(2, "%s", string); if (dopause == TRUE) { synclogs(); @@ -243,6 +246,7 @@ args_done: */ call += SYSCALL_OFFSET; + /* do_syscall will skip the actual call if we're still skipping */ ret = do_syscall(childno, &errno_saved); sptr = string; @@ -262,7 +266,8 @@ args_done: *sptr = '\0'; - output(2, "%s", string); + if (shm->total_syscalls_done > syscalls_skip) + output(2, "%s", string); /* If the syscall doesn't exist don't bother calling it next time. */ if ((ret == -1) && (errno_saved == ENOSYS)) { diff --git a/tables.c b/tables.c index 2bccd83..de11fb5 100644 --- a/tables.c +++ b/tables.c @@ -19,6 +19,7 @@ const struct syscalltable *syscalls_32bit; const struct syscalltable *syscalls_64bit; unsigned long syscalls_todo = 0; +unsigned long syscalls_skip = 0; unsigned int max_nr_syscalls; unsigned int max_nr_32bit_syscalls;