
[3/4] fs: Avoid data corruption with blocksize < pagesize

Message ID 1237311235-13623-4-git-send-email-jack@suse.cz
State Not Applicable, archived

Commit Message

Jan Kara March 17, 2009, 5:33 p.m. UTC
Assume the following situation:
a filesystem with blocksize < pagesize, say blocksize = 1024 and
pagesize = 4096. File 'f' has its first four blocks already allocated.
(Each line starting with "state:" shows the state of the buffers in the
page: m = mapped, u = uptodate, d = dirty.)

  process 1:                       process 2:

write to 'f' bytes 0 - 1024
  state: |mud,-,-,-|, page dirty
                                   write to 'f' bytes 1024 - 4096:
                                     __block_prepare_write() maps blocks
                                       state: |mud,m,m,m|, page dirty
                                     we fail to copy data -> copied = 0
                                     block_write_end() does nothing
                                     page gets unlocked
writepage() is called on the page
  block_write_full_page() writes buffers with garbage

This patch fixes the problem by skipping !uptodate buffers in
block_write_full_page().

CC: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/buffer.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

Comments

Nick Piggin March 18, 2009, noon UTC | #1
On Tue, Mar 17, 2009 at 06:33:54PM +0100, Jan Kara wrote:
> Assume the following situation:
> Filesystem with blocksize < pagesize - suppose blocksize = 1024,
> pagesize = 4096. File 'f' has first four blocks already allocated.
> (line with "state:" contains the state of buffers in the page - m = mapped,
> u = uptodate, d = dirty)
> 
>   process 1:                       process 2:
> 
> write to 'f' bytes 0 - 1024
>   state: |mud,-,-,-|, page dirty
>                                    write to 'f' bytes 1024 - 4096:
>                                      __block_prepare_write() maps blocks
>                                        state: |mud,m,m,m|, page dirty
>                                      we fail to copy data -> copied = 0
>                                      block_write_end() does nothing
>                                      page gets unlocked
> writepage() is called on the page
>   block_write_full_page() writes buffers with garbage
> 
> This patch fixes the problem by skipping !uptodate buffers in
> block_write_full_page().
> 
> CC: Nick Piggin <npiggin@suse.de>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/buffer.c |    7 ++++++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 9f69741..22c0144 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1774,7 +1774,12 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
>  	} while (bh != head);
>  
>  	do {
> -		if (!buffer_mapped(bh))
> +		/*
> +		 * Parallel write could have already mapped the buffers but
> +		 * it then had to restart before copying in new data. We
> +		 * must avoid writing garbage so just skip the buffer.
> +		 */
> +		if (!buffer_mapped(bh) || !buffer_uptodate(bh))
>  			continue;

I don't quite see how this can happen. Further down in this loop,
we do a test_clear_buffer_dirty(), which should exclude this I
think? And marking the buffer dirty if it is not uptodate should
be a bug.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara March 18, 2009, 2:13 p.m. UTC | #2
On Wed 18-03-09 13:00:23, Nick Piggin wrote:
> On Tue, Mar 17, 2009 at 06:33:54PM +0100, Jan Kara wrote:
> > Assume the following situation:
> > Filesystem with blocksize < pagesize - suppose blocksize = 1024,
> > pagesize = 4096. File 'f' has first four blocks already allocated.
> > (line with "state:" contains the state of buffers in the page - m = mapped,
> > u = uptodate, d = dirty)
> > 
> >   process 1:                       process 2:
> > 
> > write to 'f' bytes 0 - 1024
> >   state: |mud,-,-,-|, page dirty
> >                                    write to 'f' bytes 1024 - 4096:
> >                                      __block_prepare_write() maps blocks
> >                                        state: |mud,m,m,m|, page dirty
> >                                      we fail to copy data -> copied = 0
> >                                      block_write_end() does nothing
> >                                      page gets unlocked
> > writepage() is called on the page
> >   block_write_full_page() writes buffers with garbage
> > 
> > This patch fixes the problem by skipping !uptodate buffers in
> > block_write_full_page().
> > 
> > CC: Nick Piggin <npiggin@suse.de>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/buffer.c |    7 ++++++-
> >  1 files changed, 6 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 9f69741..22c0144 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -1774,7 +1774,12 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
> >  	} while (bh != head);
> >  
> >  	do {
> > -		if (!buffer_mapped(bh))
> > +		/*
> > +		 * Parallel write could have already mapped the buffers but
> > +		 * it then had to restart before copying in new data. We
> > +		 * must avoid writing garbage so just skip the buffer.
> > +		 */
> > +		if (!buffer_mapped(bh) || !buffer_uptodate(bh))
> >  			continue;
> 
> I don't quite see how this can happen. Further down in this loop,
> we do a test_clear_buffer_dirty(), which should exclude this I
> think? And marking the buffer dirty if it is not uptodate should
> be a bug.
  Hmm, this patch definitely does something important: without it I hit
corruption in UML in ~20 minutes, and with it no corruption happens in
~3 hours. Maybe someone calls set_page_dirty() on the page and
__set_page_dirty_buffers() unconditionally dirties all of the page's
buffers? But I still don't see how the write could be lost, which is what
I observe in the fsx-linux test. I'm doing some more tests to understand
this better.

									Honza
Aneesh Kumar K.V March 18, 2009, 6:42 p.m. UTC | #3
On Tue, Mar 17, 2009 at 06:33:54PM +0100, Jan Kara wrote:
> Assume the following situation:
> Filesystem with blocksize < pagesize - suppose blocksize = 1024,
> pagesize = 4096. File 'f' has first four blocks already allocated.
> (line with "state:" contains the state of buffers in the page - m = mapped,
> u = uptodate, d = dirty)
> 
>   process 1:                       process 2:
> 
> write to 'f' bytes 0 - 1024
>   state: |mud,-,-,-|, page dirty
>                                    write to 'f' bytes 1024 - 4096:
>                                      __block_prepare_write() maps blocks
>                                        state: |mud,m,m,m|, page dirty
>                                      we fail to copy data -> copied = 0
>                                      block_write_end() does nothing
>                                      page gets unlocked


If copied = 0, then in block_write_end() we do

page_zero_new_buffers(page, start+copied, start+len);

which would mean we should not see garbage.


> writepage() is called on the page
>   block_write_full_page() writes buffers with garbage
> 

-aneesh
Jan Kara March 18, 2009, 6:50 p.m. UTC | #4
On Thu 19-03-09 00:12:22, Aneesh Kumar K.V wrote:
> On Tue, Mar 17, 2009 at 06:33:54PM +0100, Jan Kara wrote:
> > Assume the following situation:
> > Filesystem with blocksize < pagesize - suppose blocksize = 1024,
> > pagesize = 4096. File 'f' has first four blocks already allocated.
> > (line with "state:" contains the state of buffers in the page - m = mapped,
> > u = uptodate, d = dirty)
> > 
> >   process 1:                       process 2:
> > 
> > write to 'f' bytes 0 - 1024
> >   state: |mud,-,-,-|, page dirty
> >                                    write to 'f' bytes 1024 - 4096:
> >                                      __block_prepare_write() maps blocks
> >                                        state: |mud,m,m,m|, page dirty
> >                                      we fail to copy data -> copied = 0
> >                                      block_write_end() does nothing
> >                                      page gets unlocked
> 
> 
> If copied = 0, then in block_write_end() we do
> 
> page_zero_new_buffers(page, start+copied, start+len);
> 
> which would mean we should not see garbage.
  But this will zero only *new* buffers, so if the blocks are already
allocated, get_block() won't set the new flag and they won't be zeroed...
  That said, I still don't understand why this seems to help against the
corruption under UML, since we don't seem to be writing !uptodate buffers there.

									Honza
Jan Kara March 18, 2009, 6:57 p.m. UTC | #5
On Wed 18-03-09 13:00:23, Nick Piggin wrote:
> On Tue, Mar 17, 2009 at 06:33:54PM +0100, Jan Kara wrote:
> > Assume the following situation:
> > Filesystem with blocksize < pagesize - suppose blocksize = 1024,
> > pagesize = 4096. File 'f' has first four blocks already allocated.
> > (line with "state:" contains the state of buffers in the page - m = mapped,
> > u = uptodate, d = dirty)
> > 
> >   process 1:                       process 2:
> > 
> > write to 'f' bytes 0 - 1024
> >   state: |mud,-,-,-|, page dirty
> >                                    write to 'f' bytes 1024 - 4096:
> >                                      __block_prepare_write() maps blocks
> >                                        state: |mud,m,m,m|, page dirty
> >                                      we fail to copy data -> copied = 0
> >                                      block_write_end() does nothing
> >                                      page gets unlocked
> > writepage() is called on the page
> >   block_write_full_page() writes buffers with garbage
> > 
> > This patch fixes the problem by skipping !uptodate buffers in
> > block_write_full_page().
> > 
> > CC: Nick Piggin <npiggin@suse.de>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/buffer.c |    7 ++++++-
> >  1 files changed, 6 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 9f69741..22c0144 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -1774,7 +1774,12 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
> >  	} while (bh != head);
> >  
> >  	do {
> > -		if (!buffer_mapped(bh))
> > +		/*
> > +		 * Parallel write could have already mapped the buffers but
> > +		 * it then had to restart before copying in new data. We
> > +		 * must avoid writing garbage so just skip the buffer.
> > +		 */
> > +		if (!buffer_mapped(bh) || !buffer_uptodate(bh))
> >  			continue;
> 
> I don't quite see how this can happen. Further down in this loop,
> we do a test_clear_buffer_dirty(), which should exclude this I
> think? And marking the buffer dirty if it is not uptodate should
> be a bug.
  OK, I spoke too soon. Now I reproduced the corruption under UML even with
this patch. So it may be something different...

									Honza

Patch

diff --git a/fs/buffer.c b/fs/buffer.c
index 9f69741..22c0144 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1774,7 +1774,12 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
 	} while (bh != head);
 
 	do {
-		if (!buffer_mapped(bh))
+		/*
+		 * Parallel write could have already mapped the buffers but
+		 * it then had to restart before copying in new data. We
+		 * must avoid writing garbage so just skip the buffer.
+		 */
+		if (!buffer_mapped(bh) || !buffer_uptodate(bh))
 			continue;
 		/*
 		 * If it's a fully non-blocking write attempt and we cannot