From patchwork Sat Oct 27 22:42:07 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Sandeen X-Patchwork-Id: 194637 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id A3F822C0090 for ; Sun, 28 Oct 2012 09:42:20 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759584Ab2J0WmS (ORCPT ); Sat, 27 Oct 2012 18:42:18 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34299 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759568Ab2J0WmR (ORCPT ); Sat, 27 Oct 2012 18:42:17 -0400 Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q9RMgAOx006247 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Sat, 27 Oct 2012 18:42:10 -0400 Received: from liberator.sandeen.net (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1]) by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id q9RMg7n7029462 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 27 Oct 2012 18:42:09 -0400 Message-ID: <508C633F.4070100@redhat.com> Date: Sat, 27 Oct 2012 17:42:07 -0500 From: Eric Sandeen User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20121010 Thunderbird/16.0.1 MIME-Version: 1.0 To: Nix CC: "Theodore Ts'o" , linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?) References: <874nllxi7e.fsf_-_@spindle.srvr.nix> <87pq48nbyz.fsf_-_@spindle.srvr.nix> <508AF3FA.4020506@redhat.com> <87wqydx957.fsf@spindle.srvr.nix> <20121026205618.GC8614@thunk.org> <87objpx84k.fsf@spindle.srvr.nix> <20121026211542.GE8614@thunk.org> <87haphx76u.fsf@spindle.srvr.nix> <20121027002258.GB31030@thunk.org> <873910xevu.fsf@spindle.srvr.nix> <20121027175534.GA7783@thunk.org> <87fw4zzra3.fsf@spindle.srvr.nix> <508C4FE5.1030102@redhat.com> In-Reply-To: <508C4FE5.1030102@redhat.com> X-Enigmail-Version: 1.4.5 X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On 10/27/12 4:19 PM, Eric Sandeen wrote: > On 10/27/12 1:47 PM, Nix wrote: >> On 27 Oct 2012, Theodore Ts'o said: >> >>> On Sat, Oct 27, 2012 at 01:45:25PM +0100, Nix wrote: >>>> Ah! it's turned on by journal_async_commit. OK, that alone argues >>>> against use of journal_async_commit, tested or not, and I'd not have >>>> turned it on if I'd noticed that. >>>> >>>> (So, the combinations I'll be trying for effect on this bug are: >>>> >>>> journal_async_commit (as now) >>>> journal_checksum >>>> none >>> >>> Can you also check and see whether the presence or absence of >>> "nobarrier" makes a difference? >> >> Done. (Also checked the effect of your patches posted earlier this week: >> no effect, I'm afraid, certainly not under the fails-even-on-3.6.1 test >> I was carrying out, umount -l'ing /var as the very last thing I did >> before /sbin/reboot -f.) >> >> nobarrier makes a difference that I, at least, did not expect: >> >> [no options] No corruption >> >> nobarrier No corruption >> >> journal_checksum Corruption >> Corrupted transaction, journal aborted >> >> nobarrier,journal_checksum Corruption >> Corrupted transaction, journal aborted >> >> journal_async_commit Corruption >> Corrupted transaction, journal aborted >> >> nobarrier,journal_async_commit Corruption >> No corrupted transaction or aborted journal > > That's what we needed. Woulda been great a few days ago ;) > > In my testing journal_checksum is broken, and my bisection seems to > implicate > > commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 > Author: Theodore Ts'o > Date: Mon Feb 6 20:12:03 2012 -0500 > > ext4: fold ext4_claim_inode into ext4_new_inode > > as the culprit. I haven't had time to look into why, yet. It looks like the inode_bitmap_bh is being modified outside a transaction: ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data); It needs a get_write_access / handle_dirty_metadata pair around it. A hacky patch like this seems to work but it was done 5mins before I have to be out the door to dinner so it probably needs cleanup or at least scrutiny. [PATCH] ext4: fix unjournaled inode bitmap modification commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 modified this function such that the inode bitmap was being modified outside a transaction, which could lead to corruption, and was discovered when journal_checksum found a bad checksum in the journal. Signed-off-by: Eric Sandeen --- > -Eric > >> I didn't expect the last case at all, and it adequately explains why you >> are mostly seeing corrupted journal messages in your tests but I was >> not. It also explains why when I saw this for the first time I was able >> to mount the resulting corrupted filesystem read-write and corrupt it >> further before I noticed that anything was wrong. >> >> It is also clear that journal_checksum and all that relies on it is >> worse than useless right now, as Eric reported while I was testing this. >> It should probably be marked CONFIG_BROKEN in future 3.[346].* stable >> kernels, if CONFIG_BROKEN existed anymore, which it doesn't. >> >> It's a shame journal_async_commit depends on a broken feature: it might >> be notionally unsafe but on some of my systems (without nobarrier or >> flashy caching controllers) it was associated with a noticeable speedup >> of metadata-heavy workloads -- though that was way back in 2009... >> however, "safety first" definitely applies in this case. >> > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --- ialloc.c.reverted2 2012-10-27 17:31:20.351537073 -0500 +++ ialloc.c 2012-10-27 17:40:18.643553576 -0500 @@ -669,6 +669,10 @@ inode_bitmap_bh = ext4_read_inode_bitmap(sb, group); if (!inode_bitmap_bh) goto fail; + BUFFER_TRACE(inode_bitmap_bh, "get_write_access"); + err = ext4_journal_get_write_access(handle, inode_bitmap_bh); + if (err) + goto fail; repeat_in_this_group: ino = ext4_find_next_zero_bit((unsigned long *) @@ -690,6 +694,10 @@ ino++; /* the inode bitmap is zero-based */ if (!ret2) goto got; /* we grabbed the inode! */ + BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata"); + err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh); + if (err) + goto fail; if (ino < EXT4_INODES_PER_GROUP(sb)) goto repeat_in_this_group; }