Patchwork logfs unmount bug

Submitter Jörn Engel
Date Aug. 12, 2011, 9:34 a.m.
Message ID <20110812093429.GS26160@logfs.org>
Permalink /patch/109806/
State New

Comments

Jörn Engel - Aug. 12, 2011, 9:34 a.m.
On Wed, 10 August 2011 16:36:26 +0530, srimugunthan dhandapani wrote:
> 
> > Ok, since I cannot reproduce this at all, can you try the patch below?
> 
> I was using a week-old kernel from git. That may be the problem.
> Can you tell me which kernel version you are using, in which the bonnie test passes?
> I am now trying kernel 3.0 and I will test it and let you know
> the results.

I wouldn't suspect any code changes to cause the different behaviour.
Kernel config and test machine (memsize, etc.) would be more likely
candidates.  So if you could try the patch and send me the output,
that would be useful.

The problem is that there are two mutually exclusive reasons why
page->private is set.  Either the page is an indirect block and
contains an alias (some bits set in block->alias_map), or the page is a
regular data page that is dirty and hasn't been written back yet
(block->reserved_bytes > 0).  There should never be a case where both
happen at the same time.  So what you are seeing is only the symptom
of a bug that happened some time before.
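To make the invariant concrete, here is a minimal userspace sketch in C. It is not logfs code: `struct model_block`, `model_block_in_use()`, `model_block_consistent()` and `model_check()` are hypothetical names, `alias_map` is modeled as a single 64-bit word (an assumption; the real field may be wider), and `assert()` stands in for the kernel's BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for struct logfs_block: only the
 * two fields that explain why page->private is set are modeled.
 */
struct model_block {
	uint64_t alias_map;      /* bits set: indirect block holding aliases */
	size_t   reserved_bytes; /* > 0: dirty data page not yet written back */
};

/* True if the block has any reason to be attached to page->private. */
static bool model_block_in_use(const struct model_block *block)
{
	return block->alias_map != 0 || block->reserved_bytes > 0;
}

/* The two reasons are mutually exclusive: both at once means corruption. */
static bool model_block_consistent(const struct model_block *block)
{
	return !(block->alias_map != 0 && block->reserved_bytes > 0);
}

/* assert() plays the role of the kernel's BUG_ON() in this model. */
static void model_check(const struct model_block *block)
{
	assert(model_block_consistent(block));
}
```

An indirect block with `alias_map = 0x5` and a dirty data page with `reserved_bytes = 4096` both pass `model_check()`; a block with both fields set fails it. In this model the failed check is only where the damage becomes visible, not where it was caused.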

The best candidate I have right now would be fixed by the patch below.
But since I cannot reproduce the bug, that patch is just guesswork.

Jörn
srimugunthan dhandapani - Aug. 12, 2011, 5:26 p.m.
Hi,

> I wouldn't suspect any code changes to cause the different behaviour.
> Kernel config and test machine (memsize, etc.) would be more likely
> candidates.  So if you could try the patch and send me the output,
> that would be useful.

I tried your patches, and I also tried the official kernel release 3.0.1.
I have shared the bonnie output and logs at the link below:

https://docs.google.com/leaf?id=0BycgLWCW61phNjY0ZDg4ZjUtYzAyMy00YTgwLWFlMmItNjlmZWIzMWFlNGUy&hl=en_US

Basically, the bonnie test gets stuck when it reaches "Create files in
sequential order...".
I did all my tests with nandsim. Are there any problems using logfs
with nandsim?
I hit another bug (segment.c:782) by running the following script:
-
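#!/bin/bash
# Mount, create one directory, and unmount, ten times in a row.
for i in $(seq 1 10)
do
	sudo mount -t logfs /dev/mtdblock0 /mnt/flash_drive/
	cd /mnt/flash_drive/
	sudo mkdir "dir$i"
	# Leave the mount point so that umount does not fail with EBUSY.
	cd
	sudo umount /mnt/flash_drive/
done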
-
I did a partial analysis. It happens on remount, when check_area()
follows the "Possibly incomplete write" code path (the area is open and
memchr_inv() returns a non-NULL pointer). It then tries to call
logfs_rewrite_block(), but since the free list has not been filled at
that point, it prints "LOGFS: ran out of free segments" and hits the
BUG.
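The condition described above can be sketched in userspace C. This is not the logfs implementation: `model_memchr_inv()` re-implements the documented semantics of the kernel's memchr_inv() (return a pointer to the first byte that differs from `c`, or NULL if all bytes match), and `model_possibly_incomplete_write()` is a hypothetical helper. It assumes that the bytes examined are those past the area's write offset and that erased NAND reads back as 0xff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of memchr_inv(): pointer to the first byte in
 * [start, start + bytes) that is NOT equal to c, or NULL if none.
 */
static const void *model_memchr_inv(const void *start, int c, size_t bytes)
{
	const unsigned char *p = start;
	unsigned char value = (unsigned char)c;

	for (size_t i = 0; i < bytes; i++)
		if (p[i] != value)
			return p + i;
	return NULL;
}

/*
 * Hypothetical check in the spirit of check_area(): for an open area,
 * everything past the recorded write offset should still be erased
 * (0xff). Any other byte there suggests an interrupted write.
 */
static bool model_possibly_incomplete_write(const unsigned char *segment,
					    size_t segment_size,
					    size_t written_bytes,
					    bool area_is_open)
{
	assert(written_bytes <= segment_size);
	if (!area_is_open)
		return false;
	return model_memchr_inv(segment + written_bytes, 0xff,
				segment_size - written_bytes) != NULL;
}
```

In this model a segment whose tail is all 0xff is treated as cleanly written, while a single stray byte past the write offset triggers the "possibly incomplete write" path, which in the report above is the path that leads to logfs_rewrite_block().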
Hope you are able to reproduce this problem.
thanks,
mugunthan
Jörn Engel - Aug. 16, 2011, 5:17 p.m.
On Fri, 12 August 2011 22:56:02 +0530, srimugunthan dhandapani wrote:
> 
> > I wouldn't suspect any code changes to cause the different behaviour.
> > Kernel config and test machine (memsize, etc.) would be more likely
> > candidates.  So if you could try the patch and send me the output,
> > that would be useful.
> 
> I tried your patches, and I also tried the official kernel release 3.0.1.
> I have shared the bonnie output and logs at the link below:
> 
> https://docs.google.com/leaf?id=0BycgLWCW61phNjY0ZDg4ZjUtYzAyMy00YTgwLWFlMmItNjlmZWIzMWFlNGUy&hl=en_US

Are you sure you tried the first patch?  When I grep for
"logfs_invalidatepage", I only find the oopses, not my debug output.

> I did all my tests with nandsim. Are there any problems using logfs
> with nandsim?

Shouldn't be, unless nandsim is corrupting memory in a peculiar and
reproducible way on your system.

Jörn
srimugunthan dhandapani - Aug. 21, 2011, 9:19 p.m.
Hi,
>> I tried your patches, and I also tried the official kernel release 3.0.1.
>> I have shared the bonnie output and logs at the link below:
>>
>> https://docs.google.com/leaf?id=0BycgLWCW61phNjY0ZDg4ZjUtYzAyMy00YTgwLWFlMmItNjlmZWIzMWFlNGUy&hl=en_US

In the shared collection, the files corresponding to the first patch are
2_bonnie_out.txt (the patch used and the bonnie output),
2_patch1_first50mb_log.txt (head of the log, first 50MB) and
2_patch1_last50mb_log.txt (tail of the log, last 50MB).

The other files correspond to other kernel versions I tried.

> Are you sure you tried the first patch?  When I grep for
> "logfs_invalidatepage", I only find the oopses, not my debug output.

I had a printk("PATCH1") that gets printed in the log.
I don't know why your "logfs_invalidatepage" debug print does not appear.

Thanks,
Mugunthan
Jörn Engel - Aug. 26, 2011, 7:49 p.m.
On Mon, 22 August 2011 02:49:14 +0530, srimugunthan dhandapani wrote:
> 
> I had a printk("PATCH1") that gets printed in the log.
> I don't know why your "logfs_invalidatepage" debug print does not appear.

Are you sure you have my patch applied?

Jörn
srimugunthan dhandapani - Aug. 29, 2011, 10:07 a.m.
> Are you sure you have my patch applied?
>

To clarify the bugs I reported:
1. The bonnie test (bonnie -s 20 -r 10) does not complete. It gets stuck
at "Creating files in sequential order..."
(tested with nandsim on kernels 3.0.1 and 2.6.38.8; consistently
reproducible on 2 machines.)
The free command shows that, while the bonnie test ran for half an
hour, free memory dropped from 2982340 KB to 2550156 KB.

2. With the mount-mkdir-unmount loop, logfs hits a kernel BUG at segment.c:784
(tested with nandsim; the kernel is from your git tree.)

3. With the bonnie test, it sometimes hits a kernel BUG at file.c:172
(happens only on the unstable kernel that I was trying; not
consistently reproducible on other kernels.)
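For scale, the free(1) figures quoted in item 1 can be reduced to a total drop and an average rate. This small C sketch assumes the run lasted exactly 30 minutes ("half an hour"); the helper names are hypothetical.

```c
#include <assert.h>

/* Figures from item 1, as reported by free(1), in KB. */
enum { FREE_BEFORE_KB = 2982340, FREE_AFTER_KB = 2550156, RUN_MINUTES = 30 };

/* Drop in free memory over the run, in KB. */
static long free_drop_kb(long before_kb, long after_kb)
{
	assert(before_kb >= after_kb);
	return before_kb - after_kb;
}

/* Average drop per minute, in KB; integer division truncates. */
static long free_drop_kb_per_minute(long drop_kb, long minutes)
{
	assert(minutes > 0);
	return drop_kb / minutes;
}
```

`free_drop_kb(FREE_BEFORE_KB, FREE_AFTER_KB)` returns 432184 (about 422 MiB), and `free_drop_kb_per_minute(432184, RUN_MINUTES)` returns 14406, so the hung run kept consuming memory at roughly 14 MB per minute.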

Regarding the third bug, to double-check, I thought of capturing the
log with your patch once more.
But I have since recompiled the kernel on my machine, and unfortunately I am
not able to reproduce the third bug any more.

I think the first two bugs should be reproducible at your end. If not,
please let me know, and I will see what's wrong with my test setup.
Thanks
mugunthan
Jörn Engel - Aug. 31, 2011, 5:58 a.m.
On Mon, 29 August 2011 15:37:09 +0530, srimugunthan dhandapani wrote:
> 
> > Are you sure you have my patch applied?
> >
> 
> To clarify the bugs I reported:
> 1. The bonnie test (bonnie -s 20 -r 10) does not complete. It gets stuck
> at "Creating files in sequential order..."
> (tested with nandsim on kernels 3.0.1 and 2.6.38.8; consistently
> reproducible on 2 machines.)
> The free command shows that, while the bonnie test ran for half an
> hour, free memory dropped from 2982340 KB to 2550156 KB.

The only trouble is that no amount of bonnie output helps me to nail
this one down.  What I need is the kernel output, more specifically the
output of the debug patch I sent you.  Lacking that, there is nothing
I can do.  Can you ensure you have my patch applied, rerun the test
case and send me the kernel output?

Alternatively, we can spend our time figuring out the differences
between your setup and mine.  You could be running a different kernel with
different patches applied, have a different .config or a different
machine, probably all three.  But my gut feeling is that it shouldn't
be too hard to fix the bug, provided you are able to run test patches
and return the results to me.

Jörn

Patch

diff --git a/fs/logfs/readwrite.c b/fs/logfs/readwrite.c
index 6c23507..3b12b4f 100644
--- a/fs/logfs/readwrite.c
+++ b/fs/logfs/readwrite.c
@@ -1538,6 +1538,7 @@  static int grow_inode(struct inode *inode, u64 bix, level_t level)
 static int __logfs_write_buf(struct inode *inode, struct page *page, long flags)
 {
 	struct logfs_super *super = logfs_super(inode->i_sb);
+	struct logfs_block *block = logfs_block(page);
 	pgoff_t index = page->index;
 	u64 bix;
 	level_t level;
@@ -1547,8 +1548,10 @@  static int __logfs_write_buf(struct inode *inode, struct page *page, long flags)
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME;
 
 	logfs_unpack_index(index, &bix, &level);
-	if (logfs_block(page) && logfs_block(page)->reserved_bytes)
-		super->s_dirty_bytes -= logfs_block(page)->reserved_bytes;
+	if (block && block->reserved_bytes) {
+		super->s_dirty_bytes -= block->reserved_bytes;
+		block->reserved_bytes = 0;
+	}
 
 	if (index < I0_BLOCKS)
 		return logfs_write_direct(inode, page, flags);
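One plausible reading of the change, sketched in userspace C with hypothetical `model_*` types standing in for struct logfs_super and struct logfs_block: before the patch, writing a page released its reservation from s_dirty_bytes but left block->reserved_bytes set, so a second write of the same page subtracted it again, and the block kept looking like a dirty data page. This is an illustration under those assumptions, not a confirmed diagnosis; the thread itself calls the patch guesswork.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct logfs_super and struct logfs_block. */
struct model_super { long s_dirty_bytes; };
struct model_data_block { long reserved_bytes; };

/* Pre-patch accounting: the reservation is released from the
 * superblock counter but stays recorded in the block. */
static void model_write_buf_old(struct model_super *super,
				struct model_data_block *block)
{
	if (block && block->reserved_bytes)
		super->s_dirty_bytes -= block->reserved_bytes;
}

/* Patched accounting: the reservation is released exactly once. */
static void model_write_buf_new(struct model_super *super,
				struct model_data_block *block)
{
	if (block && block->reserved_bytes) {
		super->s_dirty_bytes -= block->reserved_bytes;
		block->reserved_bytes = 0;
	}
	/* Sanity check for this model: each reservation is charged once and
	 * released once, so the counter should stay non-negative. */
	assert(super->s_dirty_bytes >= 0);
}
```

Writing the same 4096-byte page twice drives the old model's counter to -4096 and leaves `reserved_bytes` at 4096, whereas the patched model ends with both at 0.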