Message ID | 1444363269-25956-1-git-send-email-tytso@mit.edu |
---|---|
State | Accepted, archived |
Headers | show |
On Fri, 9 Oct 2015 00:01:09 -0400 Theodore Ts'o <tytso@mit.edu> wrote: > If there is a error while copying data from userspace into the page > cache during a write(2) system call, in data=journal mode, in > ext4_journalled_write_end() were using page_zero_new_buffers() from > fs/buffer.c. Unfortunately, this sets the buffer dirty flag, which is > no good if journalling is enabled. This is a long-standing bug that > goes back for years and years in ext3, but a combination of (a) > data=journal not being very common, (b) in many case it only results > in a warning message. and (c) only very rarely causes the kernel hang, > means that we only really noticed this as a problem when commit > 998ef75ddb caused this failure to happen frequently enough to cause > generic/208 to fail when run in data=journal mode. > > The fix is to have our own version of this function that doesn't call > mark_dirty_buffer(), since we will end up calling > ext4_handle_dirty_metadata() on the buffer head(s) in questions very > shortly afterwards in ext4_journalled_write_end(). > > Thanks to Dave Hansen and Linus Torvalds for helping to identify the > root cause of the problem. > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > --- > fs/ext4/inode.c | 34 +++++++++++++++++++++++++++++++++- > 1 file changed, 33 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index ae52e32..0a589bb 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1181,6 +1181,38 @@ errout: > return ret ? ret : copied; > } > > +/* > + * This is a private version of page_zero_new_buffers() which doesn't > + * set the buffer to be dirty, since in data=journalled mode we need > + * to call ext4_handle_dirty_metadata() instad. Small typo: s/instad/instead/ > + */ > +static void zero_new_buffers(struct page *page, unsigned from, unsigned to) > +{ > + unsigned int block_start = 0, block_end; > + struct buffer_head *head, *bh; > + > + bh = head = page_buffers(page); > + do { > + block_end = block_start + bh->b_size; > + if (buffer_new(bh)) { > + if (block_end > from && block_start < to) { > + if (!PageUptodate(page)) { > + unsigned start, size; > + > + start = max(from, block_start); > + size = min(to, block_end) - start; > + > + zero_user(page, start, size); > + set_buffer_uptodate(bh); > + } > + clear_buffer_new(bh); > + } > + } > + block_start = block_end; > + bh = bh->b_this_page; > + } while (bh != head); > +} > + > static int ext4_journalled_write_end(struct file *file, > struct address_space *mapping, > loff_t pos, unsigned len, unsigned copied, > @@ -1207,7 +1239,7 @@ static int ext4_journalled_write_end(struct file *file, > if (copied < len) { > if (!PageUptodate(page)) > copied = 0; > - page_zero_new_buffers(page, from+copied, to); > + zero_new_buffers(page, from+copied, to); > } > > ret = ext4_walk_page_buffers(handle, page_buffers(page), from,
On Fri 09-10-15 00:01:09, Ted Tso wrote: > If there is a error while copying data from userspace into the page > cache during a write(2) system call, in data=journal mode, in > ext4_journalled_write_end() were using page_zero_new_buffers() from > fs/buffer.c. Unfortunately, this sets the buffer dirty flag, which is > no good if journalling is enabled. This is a long-standing bug that > goes back for years and years in ext3, but a combination of (a) > data=journal not being very common, (b) in many case it only results > in a warning message. and (c) only very rarely causes the kernel hang, > means that we only really noticed this as a problem when commit > 998ef75ddb caused this failure to happen frequently enough to cause > generic/208 to fail when run in data=journal mode. > > The fix is to have our own version of this function that doesn't call > mark_dirty_buffer(), since we will end up calling > ext4_handle_dirty_metadata() on the buffer head(s) in questions very > shortly afterwards in ext4_journalled_write_end(). > > Thanks to Dave Hansen and Linus Torvalds for helping to identify the > root cause of the problem. > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> Looks good to me. You can add: Reviewed-by: Jan Kara <jack@suse.com> Honza > --- > fs/ext4/inode.c | 34 +++++++++++++++++++++++++++++++++- > 1 file changed, 33 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index ae52e32..0a589bb 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1181,6 +1181,38 @@ errout: > return ret ? ret : copied; > } > > +/* > + * This is a private version of page_zero_new_buffers() which doesn't > + * set the buffer to be dirty, since in data=journalled mode we need > + * to call ext4_handle_dirty_metadata() instad. > + */ > +static void zero_new_buffers(struct page *page, unsigned from, unsigned to) > +{ > + unsigned int block_start = 0, block_end; > + struct buffer_head *head, *bh; > + > + bh = head = page_buffers(page); > + do { > + block_end = block_start + bh->b_size; > + if (buffer_new(bh)) { > + if (block_end > from && block_start < to) { > + if (!PageUptodate(page)) { > + unsigned start, size; > + > + start = max(from, block_start); > + size = min(to, block_end) - start; > + > + zero_user(page, start, size); > + set_buffer_uptodate(bh); > + } > + clear_buffer_new(bh); > + } > + } > + block_start = block_end; > + bh = bh->b_this_page; > + } while (bh != head); > +} > + > static int ext4_journalled_write_end(struct file *file, > struct address_space *mapping, > loff_t pos, unsigned len, unsigned copied, > @@ -1207,7 +1239,7 @@ static int ext4_journalled_write_end(struct file *file, > if (copied < len) { > if (!PageUptodate(page)) > copied = 0; > - page_zero_new_buffers(page, from+copied, to); > + zero_new_buffers(page, from+copied, to); > } > > ret = ext4_walk_page_buffers(handle, page_buffers(page), from, > -- > 2.5.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index ae52e32..0a589bb 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1181,6 +1181,38 @@ errout: return ret ? ret : copied; } +/* + * This is a private version of page_zero_new_buffers() which doesn't + * set the buffer to be dirty, since in data=journalled mode we need + * to call ext4_handle_dirty_metadata() instad. + */ +static void zero_new_buffers(struct page *page, unsigned from, unsigned to) +{ + unsigned int block_start = 0, block_end; + struct buffer_head *head, *bh; + + bh = head = page_buffers(page); + do { + block_end = block_start + bh->b_size; + if (buffer_new(bh)) { + if (block_end > from && block_start < to) { + if (!PageUptodate(page)) { + unsigned start, size; + + start = max(from, block_start); + size = min(to, block_end) - start; + + zero_user(page, start, size); + set_buffer_uptodate(bh); + } + clear_buffer_new(bh); + } + } + block_start = block_end; + bh = bh->b_this_page; + } while (bh != head); +} + static int ext4_journalled_write_end(struct file *file, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, @@ -1207,7 +1239,7 @@ static int ext4_journalled_write_end(struct file *file, if (copied < len) { if (!PageUptodate(page)) copied = 0; - page_zero_new_buffers(page, from+copied, to); + zero_new_buffers(page, from+copied, to); } ret = ext4_walk_page_buffers(handle, page_buffers(page), from,
If there is a error while copying data from userspace into the page cache during a write(2) system call, in data=journal mode, in ext4_journalled_write_end() were using page_zero_new_buffers() from fs/buffer.c. Unfortunately, this sets the buffer dirty flag, which is no good if journalling is enabled. This is a long-standing bug that goes back for years and years in ext3, but a combination of (a) data=journal not being very common, (b) in many case it only results in a warning message. and (c) only very rarely causes the kernel hang, means that we only really noticed this as a problem when commit 998ef75ddb caused this failure to happen frequently enough to cause generic/208 to fail when run in data=journal mode. The fix is to have our own version of this function that doesn't call mark_dirty_buffer(), since we will end up calling ext4_handle_dirty_metadata() on the buffer head(s) in questions very shortly afterwards in ext4_journalled_write_end(). Thanks to Dave Hansen and Linus Torvalds for helping to identify the root cause of the problem. Signed-off-by: Theodore Ts'o <tytso@mit.edu> --- fs/ext4/inode.c | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-)