| Submitter | Toshiyuki Okajima |
|---|---|
| Date | April 25, 2011, 6:28 a.m. |
| Message ID | <20110425152858.a9edfe58.toshi.okajima@jp.fujitsu.com> |
| Download | mbox | patch |
| Permalink | /patch/92705/ |
| State | Rejected |
| Headers | show |
Comments
On 04/25/2011 09:28 AM, Toshiyuki Okajima wrote: > Hi. > > On Sat, 23 Apr 2011 00:10:25 +0200 > Jan Kara<jack@suse.cz> wrote: >> On Fri 22-04-11 15:58:39, Toshiyuki Okajima wrote: >>> I have confirmed that the following patch works fine while my or >>> Mizuma-san's reproducer is running. Therefore, >>> we can block to write the data, which is mmapped to a file, into a disk >>> by a page-fault while fsfreezing. >>> >>> I think this patch fixes the following two problems: >>> - A deadlock occurs between ext4_da_writepages() (called from >>> writeback_inodes_wb) and thaw_super(). (reported by Mizuma-san) >>> - We can also write the data, which is mmapped to a file, >>> into a disk while fsfreezing (ext3/ext4). >>> (reported by me) >>> >>> Please examine this patch. >> Thanks for the patch. The ext3 part is not as easy as this. You cannot >> really get i_alloc_sem in ext3_page_mkwrite() because mmap_sem is already >> held by page fault code and i_alloc_sem should be acquired before it (yes I >> know, ext4 already has this bug which should be fixed when I get to it). >> Also you'll find that performance of random writers via mmap (which is >> relatively common) is going to be rather bad with this patch (because the >> file will be heavily fragmented). We have to be more clever which is >> exactly why it's taking me so long with my patch :) But tests are already >> running so if everything goes fine, I should have patches to submit next >> week. > OK, I'll wait your patch. :) > >> >> The ext4 part looks correct. I'd just also like to have some comments about >> how freeze handling is done because it's kind of subtle. > > How about this? We can have a race here too - since we are only checking if the F.S is in a frozen state or not at _that_ point. We are _not_ preventing a F.S freeze from happening _after_ this point. So here is what can happen: Key: (tx: time at xth unit) Scenario: Task 1: mmapped write - (case: page mapped to disk and is in page cache) t1) ext4_page_mkwrite() t2) vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE); t3) ---- Preempted ---- Task 2: Freeze Task t4) freezes the super block... ...(continues).... tn) the page cache is clean and the F.S is frozen. Freeze has completed execution. Task1: mmapped write - (case: page mapped to disk and is in page cache) tn+1)ext4_page_mkwrite() returns 0. The write to the mmapped page continues with writing to a page in the page cache when the F.S is frozen! So after the vfs_check_frozen() we _are_ susceptible to "dirtying the page cache when F.S is frozen" In this case we are not protected by a transaction. Are we? Warm Regards, Surbhi. > > Thanks, > Toshiyuki Okajima > > ---------------------------------------------------------------------------------------------------- > Subject: [PATCH] ext4: prevent the mmapped page flushing to disk while fsfreezing > > Signed-off-by: Toshiyuki Okajima<toshi.okajima@jp.fujitsu.com> > --- > fs/ext4/inode.c | 10 +++++++++- > 1 files changed, 9 insertions(+), 1 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index f2fa5e8..411b177 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5812,7 +5812,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) > } > ret = 0; > if (PageMappedToDisk(page)) > - goto out_unlock; > + goto out_frozen; > > if (page->index == size>> PAGE_CACHE_SHIFT) > len = size& ~PAGE_CACHE_MASK; > @@ -5830,6 +5830,14 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) > if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL, > ext4_bh_unmapped)) { > unlock_page(page); > +out_frozen: > + /* > + * We must wait here while the filesystem is being > + * frozen otherwise a flushing thread can write this > + * page to the disk (we can update the filesystem even > + * if it is frozen). > + */ > + vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE); > goto out_unlock; > } > } -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Patch
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f2fa5e8..411b177 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5812,7 +5812,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) } ret = 0; if (PageMappedToDisk(page)) - goto out_unlock; + goto out_frozen; if (page->index == size >> PAGE_CACHE_SHIFT) len = size & ~PAGE_CACHE_MASK; @@ -5830,6 +5830,14 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL, ext4_bh_unmapped)) { unlock_page(page); +out_frozen: + /* + * We must wait here while the filesystem is being + * frozen otherwise a flushing thread can write this + * page to the disk (we can update the filesystem even + * if it is frozen). + */ + vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE); goto out_unlock; } }