Message ID | nsxr4k2kdwv.fsf@closure.thunk.org |
---|---|
State | Rejected, archived |
On 2013.02.26 at 15:39 -0500, Theodore Ts'o wrote:
>
> The following changes since commit 9931faca02c604c22335f5a935a501bb2ace6e20:
>
>   Linux 3.8-rc3 (2013-01-09 18:59:55 -0800)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus

Just booted today's Linux tree and got the following errors:

...
Feb 27 13:33:31 x4 kernel: EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
...
Feb 27 13:33:32 x4 kernel: EXT4-fs error (device sda): ext4_find_dest_de:1657: inode #70647809: block 14164000: comm cupsd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2569822761, rec_len=3837, name_len=1
Feb 27 13:33:32 x4 kernel: EXT4-fs error (device sda): ext4_find_dest_de:1657: inode #70911401: block 15213579: comm pdnsd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2000846358, rec_len=36782, name_len=120

/dev/sda       ext4      1.4T  655G  651G  51% /var
/dev/sda on /var type ext4 (rw,noatime,data=ordered)

Running "fsck.ext4 -f /dev/sda" shows no problems.
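For readers unfamiliar with the "rec_len % 4 != 0" message above, here is a minimal Python sketch of the kind of sanity check a directory entry has to pass. It is only loosely modelled on what the log message names and is not the kernel code: the 4096-byte block size, the helper names (`min_rec_len`, `check_dir_entry`, `make_block`) and every message string other than "rec_len % 4 != 0" are assumptions made for illustration. The entry header is taken to be a little-endian 32-bit inode number, a 16-bit record length, an 8-bit name length and an 8-bit file type.

```python
import struct

BLOCK_SIZE = 4096  # assumed block size; the thread does not state it


def min_rec_len(name_len: int) -> int:
    """Smallest legal record: 8-byte header plus name, rounded up to 4 bytes."""
    return (8 + name_len + 3) & ~3


def check_dir_entry(block: bytes, offset: int = 0):
    """Return None if the entry at `offset` looks sane, else a reason string."""
    inode, rec_len, name_len, _file_type = struct.unpack_from("<IHBB", block, offset)
    if rec_len < min_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < min_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > len(block):
        return "directory entry across range"
    return None


def make_block(inode: int, rec_len: int, name_len: int, name: bytes = b"") -> bytes:
    """Build a hypothetical block whose first entry has the given header fields."""
    header = struct.pack("<IHBB", inode, rec_len, name_len, 0)
    return (header + name).ljust(BLOCK_SIZE, b"\0")


# The two entries reported in the log, fed through the sketch.
for inode, rec_len, name_len in [(2569822761, 3837, 1), (2000846358, 36782, 120)]:
    print(rec_len, check_dir_entry(make_block(inode, rec_len, name_len)))
```

Both reported record lengths (3837 and 36782) are not multiples of 4, so the first entry in each block fails at offset 0. That is what one would expect if the block being read is not a directory block at all.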
On Wed, Feb 27, 2013 at 01:47:27PM +0100, Markus Trippelsdorf wrote:
> Just booted todays Linux tree and got the following errors:
>
> ...
> Feb 27 13:33:31 x4 kernel: EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
> ...
> Feb 27 13:33:32 x4 kernel: EXT4-fs error (device sda): ext4_find_dest_de:1657: inode #70647809: block 14164000: comm cupsd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2569822761, rec_len=3837, name_len=1
> Feb 27 13:33:32 x4 kernel: EXT4-fs error (device sda): ext4_find_dest_de:1657: inode #70911401: block 15213579: comm pdnsd: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2000846358, rec_len=36782, name_len=120

Is this reproducible?

This looks like the in-memory copy of the directory got corrupted.
This could be caused by a hardware error, or a wild pointer, or a bug
in the buffer cache code, etc.  Since there are so many different
possible causes of this kind of complaint, we really need some kind of
reproducible test case to do anything with this.

I did do a test compile of the ext4 tree with the latest linus.git
tree merged in, and ran a full set of regression tests before I sent
my pull request.  Now, the regression tests take over 14 hours to run,
and there is a delay between when a maintainer sends the pull request
to when Linus acts on it --- so Linus almost certainly pulled in some
other trees between when I did my final regression testing and when I
sent the pull request and he pulled it into his tree.

I'll see if I can reproduce this on my end, on Linus's tree after the
ext4 tree was merged in, but at least in the past, this is the sort of
thing that is almost certainly caused by a hardware failure or bug
somewhere in the device driver, mm, or buffer cache, given that the
directory looks completely insane and a subsequent e2fsck -f didn't
discover any problem.

Is there anything special about your system?  How much memory do you
have?  What kind of device is /dev/sda?  What sort of workload did you
have running on your system before the failure?

Also, can you send us the output of "debugfs -R "stat <70647809>"
/dev/sda" so I can confirm that block 14164000 really is assigned to
inode 70647809?  The one potential cause of this error I can think of
that might be related to recent changes in ext4 is if the extent
status tree had the wrong logical-to-physical mapping cached for the
directory inode.

Regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2013.02.27 at 10:34 -0500, Theodore Ts'o wrote:
> Is this reproducible?

Haven't checked yet, because I got scared.

> Is there anything special about your system?  How much memory do you
> have?  What kind of device is /dev/sda?  What sort of workload did you
> have running on your system before the failure?

There is nothing special about my system. /dev/sda is a standard SATA
drive:

Model Family:     Seagate Barracuda Green (AF)
Device Model:     ST1500DL003-9VT16L

The error happens during boot. I use ECC memory.

> Also, can you send us the output of "debugfs -R "stat <70647809>"
> /dev/sda" so I can confirm that block 14164000 really is assigned to
> inode 70647809?  The one potential cause of this error I can think of
> that might be related to recent changes in ext4 is if the extent
> status tree had the wrong logical-to-physical mapping cached for the
> directory inode.

Inode: 70647809   Type: directory    Mode:  0755   Flags: 0x80000
Generation: 4138624941    Version: 0x00000000:0000000d
User:     0   Group:     0   Size: 4096
File ACL: 0    Directory ACL: 0
Links: 13   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x50ab80de:2c77d5bc -- Tue Nov 20 14:08:46 2012
 atime: 0x50aa5519:c556886c -- Mon Nov 19 16:49:45 2012
 mtime: 0x50ab80de:2c77d5bc -- Tue Nov 20 14:08:46 2012
crtime: 0x50aa4a81:06bc6fd0 -- Mon Nov 19 16:04:33 2012
Size of extra inode fields: 28
EXTENTS:
(0):282599456
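
The debugfs output above can be compared with the block number in the kernel error message. Below is a minimal Python sketch, assuming the EXTENTS line uses the "(logical):physical" or "(first-last):first-last" notation seen here; the function names `parse_extents` and `block_belongs_to` are hypothetical helpers, not part of debugfs or e2fsprogs.

```python
import re

# Matches "(0):282599456" as well as ranges such as "(0-3):1000-1003".
EXTENT_RE = re.compile(r"\((\d+)(?:-(\d+))?\):(\d+)(?:-(\d+))?")


def parse_extents(text: str):
    """Return a list of (first_logical, last_logical, first_physical) tuples."""
    extents = []
    for m in EXTENT_RE.finditer(text):
        first_logical = int(m.group(1))
        last_logical = int(m.group(2)) if m.group(2) else first_logical
        first_physical = int(m.group(3))
        extents.append((first_logical, last_logical, first_physical))
    return extents


def block_belongs_to(extents, physical_block: int) -> bool:
    """True if physical_block falls inside any of the parsed extents."""
    return any(
        first_physical <= physical_block <= first_physical + (last - first)
        for first, last, first_physical in extents
    )


extents = parse_extents("(0):282599456")   # from the debugfs output above
print(block_belongs_to(extents, 14164000))  # block named in the kernel error
```

On disk, logical block 0 of inode 70647809 maps to physical block 282599456, not to block 14164000 as the error message says. That is consistent with the suspicion that a wrong logical-to-physical mapping was cached for the directory inode.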

On 2013.02.27 at 16:44 +0100, Markus Trippelsdorf wrote:
> On 2013.02.27 at 10:34 -0500, Theodore Ts'o wrote:
> > Is this reproducible?

The issue is caused by:

commit d100eef2440fea13e4f09e88b1c8bcbca64beb9f
Author: Zheng Liu <wenqing.lz@taobao.com>
Date:   Mon Feb 18 00:29:59 2013 -0500

    ext4: lookup block mapping in extent status tree

On 2013-2-28, at 1:01 AM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> The issue is caused by:
>
> commit d100eef2440fea13e4f09e88b1c8bcbca64beb9f
> Author: Zheng Liu <wenqing.lz@taobao.com>
> Date:   Mon Feb 18 00:29:59 2013 -0500
>
>     ext4: lookup block mapping in extent status tree

Hi Markus,

Thanks for the report. I am very sorry about that. Now I am trying to fix it.

Thanks,
                                                - Zheng

On 2013.02.28 at 01:10 +0800, gnehzuil.liu wrote:
> Thanks for the report. I am very sorry about that. Now I am trying to fix it.

Thank you.

Please note that if I run a kernel build with:
"git reset --hard f7fec032aa782d3fd7e51fbdf08aa3a296c01500"
(the commit before d100eef24) I get a different but similar error:

EXT4-fs (sda): error count: 4
EXT4-fs (sda): initial error at 1361983458: ext4_find_dest_de:1657: inode 70911401: block 15213579
EXT4-fs (sda): last error at 1361983663: ext4_find_dest_de:1658: inode 70647809: block 14164000
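
The "initial error at" and "last error at" values in the summary above look like seconds since the Unix epoch. Under that assumption, a short Python sketch (the helper name `error_time` is hypothetical) converts them to UTC so they can be lined up against the mail timestamps in this thread:

```python
from datetime import datetime, timezone


def error_time(epoch_seconds: int) -> datetime:
    """Interpret an error timestamp as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


initial = error_time(1361983458)
last = error_time(1361983663)

print(initial.isoformat())                # 2013-02-27T16:44:18+00:00
print(last.isoformat())                   # 2013-02-27T16:47:43+00:00
print((last - initial).total_seconds())   # 205.0
```

Read this way, the two recorded errors are 205 seconds apart on the afternoon of 27 February 2013 (UTC).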

On 2013-2-28, at 1:22 AM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> Please note that if I run a kernel build with:
> "git reset --hard f7fec032aa782d3fd7e51fbdf08aa3a296c01500"
> (the commit before d100eef24) I get a different but similar error:
>
> EXT4-fs (sda): error count: 4
> EXT4-fs (sda): initial error at 1361983458: ext4_find_dest_de:1657: inode 70911401: block 15213579
> EXT4-fs (sda): last error at 1361983663: ext4_find_dest_de:1658: inode 70647809: block 14164000

Yup, thanks for the note.  Could you please revert to this commit
a25a4e1a5d5dc0f97dddbca44e695c532d8228c1 (the commit before f7fec032)?
I suspect that commit f7fec032 is the root cause.  Thanks for your
help.

Regards,
                                                - Zheng

On 2013.02.28 at 01:38 +0800, gnehzuil.liu wrote:
> Yup, thanks for the note.  Could you please revert to this commit
> a25a4e1a5d5dc0f97dddbca44e695c532d8228c1 (the commit before f7fec032)?
> I suspect that commit f7fec032 is the root cause.  Thanks for your
> help.

git revert 06b0c886214a223dde7b21cbfc3008fd20a8ce16..74cd15cd02708c7188581f279f33a98b2ae8d322
fixes all issues...

On Wed, Feb 27, 2013 at 9:45 AM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
>
> git revert 06b0c886214a223dde7b21cbfc3008fd20a8ce16..74cd15cd02708c7188581f279f33a98b2ae8d322
> fixes all issues...

Hmm. I'm hoping this will have a quick resolution that doesn't mean
having to revert all that. But it's good to have the option.

Zheng, Ted, I'm holding up further merging (ugh, second time this
merge window) in the hope that we can get this sorted out quickly. I
hate continuing to merge stuff when there are known nasty issues
pending in my tree. I can't hold for all that long, but it would be
really nice if this got sorted out quickly.

                 Linus