Patchwork [v4,3/7] mm: Allow filesystems to defer cmtime updates

Submitter Andrew Lutomirski
Date Aug. 23, 2013, 12:03 a.m.
Message ID <e1620c8e4909a65e270c8e9590e307c22fd96a44.1377193658.git.luto@amacapital.net>
Permalink /patch/269231/
State New

Comments

Andrew Lutomirski - Aug. 23, 2013, 12:03 a.m.
Filesystems that defer cmtime updates should update cmtime when any
of these events happen after a write via a mapping:

 - The mapping is written back to disk.  This happens from all kinds
   of places, most of which eventually call ->writepages.  (The
   exceptions are vmscan and migration.)

 - munmap is called or the mapping is removed when the process exits.

 - msync(MS_ASYNC) is called.  Linux currently does nothing for
   msync(MS_ASYNC), but POSIX says that cmtime should be updated some
   time between an mmapped write and the subsequent msync call.
   MS_SYNC calls ->writepages, but MS_ASYNC needs special handling.

Filesystems are responsible for checking for pending deferred cmtime
updates in .writepages (a helper is provided for this purpose) and
for doing the actual update in .update_cmtime_deferred.

These changes have no effect by themselves; filesystems must opt in
by implementing .update_cmtime_deferred and removing any
file_update_time call in .page_mkwrite.

This patch does not implement the MS_ASYNC case; that's in the next
patch.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 include/linux/fs.h        |  8 +++++++
 include/linux/pagemap.h   |  6 ++++++
 include/linux/writeback.h |  1 +
 mm/migrate.c              |  2 ++
 mm/mmap.c                 |  6 +++++-
 mm/page-writeback.c       | 53 ++++++++++++++++++++++++++++++++++++++++++++++-
 mm/vmscan.c               |  1 +
 7 files changed, 75 insertions(+), 2 deletions(-)
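
For readers who want to see the user-visible side of the three events listed in the commit message, here is a minimal userspace sketch. It is not part of the patch. The names mmap_write_and_stat and demo_on_tempfile are hypothetical helpers made up for illustration. The sketch writes one byte through a shared mapping, calls msync(MS_ASYNC), unmaps, and reports st_mtime before and after. When exactly the timestamp moves depends on the kernel version and the filesystem, so the helper only checks that mtime did not go backwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define DEMO_LEN 4096

/*
 * Hypothetical helper: dirty one byte of `path` through a MAP_SHARED
 * mapping, then msync(MS_ASYNC) and munmap.  Reports st_mtime sampled
 * before the write and after the munmap.  Returns 0 on success.
 */
int mmap_write_and_stat(const char *path, time_t *before, time_t *after)
{
    struct stat st;
    char *p;
    int rc;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, DEMO_LEN) != 0 || fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    *before = st.st_mtime;

    p = mmap(NULL, DEMO_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[0] = 'x';                          /* the write via the mapping */
    rc = msync(p, DEMO_LEN, MS_ASYNC);   /* event: msync(MS_ASYNC)    */
    if (munmap(p, DEMO_LEN) != 0)        /* event: munmap             */
        rc = -1;

    if (rc == 0 && fstat(fd, &st) != 0)
        rc = -1;
    if (rc == 0)
        *after = st.st_mtime;
    close(fd);
    return rc;
}

/* Hypothetical driver: run the helper on a scratch file under /tmp. */
int demo_on_tempfile(void)
{
    char path[] = "/tmp/cmtime-demo-XXXXXX";
    time_t before = 0, after = 0;
    int rc;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    close(fd);
    rc = mmap_write_and_stat(path, &before, &after);
    unlink(path);
    if (rc != 0)
        return -1;
    return after >= before ? 0 : -1;     /* mtime must not go backwards */
}
```

Calling demo_on_tempfile() from a small main() and printing the two timestamps is enough to compare filesystems by hand.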
Jan Kara - Sept. 4, 2013, 2:57 p.m.
On Thu 22-08-13 17:03:19, Andy Lutomirski wrote:
> Filesystems that defer cmtime updates should update cmtime when any
> of these events happen after a write via a mapping:
> 
>  - The mapping is written back to disk.  This happens from all kinds
>    of places, most of which eventually call ->writepages.  (The
>    exceptions are vmscan and migration.)
> 
>  - munmap is called or the mapping is removed when the process exits
> 
>  - msync(MS_ASYNC) is called.  Linux currently does nothing for
>    msync(MS_ASYNC), but POSIX says that cmtime should be updated some
>    time between an mmaped write and the subsequent msync call.
>    MS_SYNC calls ->writepages, but MS_ASYNC needs special handling.
> 
> Filesystems are responsible for checking for pending deferred cmtime
> updates in .writepages (a helper is provided for this purpose) and
> for doing the actual update in .update_cmtime_deferred.
> 
> These changes have no effect by themselves; filesystems must opt in
> by implementing .update_cmtime_deferred and removing any
> file_update_time call in .page_mkwrite.
> 
> This patch does not implement the MS_ASYNC case; that's in the next
> patch.
> 
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
...
> +/**
> + * generic_update_cmtime_deferred - update cmtime after an mmapped write
> + * @mapping: The mapping
> + *
> + * This library function implements .update_cmtime_deferred.  It is unlikely
> + * that any filesystem will want to do anything here except update the time
> + * (using this helper) or nothing at all (by leaving .update_cmtime_deferred
> + * NULL).
> + */
> +void generic_update_cmtime_deferred(struct address_space *mapping)
> +{
> +	struct blk_plug plug;
> +	blk_start_plug(&plug);
> +	inode_update_time_writable(mapping->host);
> +	blk_finish_plug(&plug);
> +}
> +EXPORT_SYMBOL(generic_update_cmtime_deferred);
> +
  You can remove the plugging here. An inode update will likely result in a
single write, so there's no point.

> @@ -1970,6 +1988,39 @@ int write_one_page(struct page *page, int wait)
>  }
>  EXPORT_SYMBOL(write_one_page);
>  
> +void mapping_flush_cmtime(struct address_space *mapping)
> +{
> +	if (mapping_test_clear_cmtime(mapping) &&
> +	    mapping->a_ops->update_cmtime_deferred)
> +		mapping->a_ops->update_cmtime_deferred(mapping);
> +}
> +EXPORT_SYMBOL(mapping_flush_cmtime);
  Hum, is there a reason for the update_cmtime_deferred() operation? I can
hardly imagine anyone wanting to do anything other than what
inode_update_time_writable() does, so why bother? You mention that tmpfs & co.
don't fit into your scheme well, with which I agree, so let's just keep
file_update_time() in their page_mkwrite() operation. But I don't see a
real need for avoiding the deferred cmtime logic...
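
To make the test-and-clear behaviour of mapping_flush_cmtime() concrete, here is a minimal userspace model using C11 atomics. It is a sketch, not kernel code: struct model_mapping, MODEL_AS_CMTIME and every model_* function are hypothetical stand-ins, and the bit value is an assumption. It shows that the update runs at most once per batch of dirtying writes, and that with a NULL operation the flag is still consumed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_AS_CMTIME 0x1UL   /* hypothetical bit, stands in for AS_CMTIME */

struct model_mapping {
    atomic_ulong flags;
    void (*update_cmtime_deferred)(struct model_mapping *);
    int updates;                /* how many times the timestamp was "updated" */
};

/* Models a write fault on a filesystem that defers cmtime updates. */
void model_mark_cmtime(struct model_mapping *m)
{
    atomic_fetch_or(&m->flags, MODEL_AS_CMTIME);
}

/* Atomically clear the flag and report whether it had been set. */
bool model_test_clear_cmtime(struct model_mapping *m)
{
    return (atomic_fetch_and(&m->flags, ~MODEL_AS_CMTIME) & MODEL_AS_CMTIME) != 0;
}

/* Same shape as mapping_flush_cmtime() in the patch. */
void model_flush_cmtime(struct model_mapping *m)
{
    if (model_test_clear_cmtime(m) && m->update_cmtime_deferred)
        m->update_cmtime_deferred(m);
}

/* Stand-in for the filesystem's update operation. */
void model_update(struct model_mapping *m)
{
    m->updates++;
}
```

Because the flag is cleared before the operation pointer is checked, a mapping whose operation is NULL silently drops the pending update in this model.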

> +
> +void mapping_flush_cmtime_nowb(struct address_space *mapping)
> +{
> +	/*
> +	 * We get called from munmap and msync.  Both calls can race
> +	 * with fs freezing.  If the fs is frozen after
> +	 * mapping_test_clear_cmtime but before the time update, then
> +	 * sync_filesystem will miss the cmtime update (because we
> +	 * just cleared it) and we don't be able to write (because the
> +	 * fs is frozen).  On the other hand, we can't just return if
> +	 * we're in the SB_FREEZE_PAGEFAULT state because our caller
> +	 * expects the timestamp to be synchronously updated.  So we
> +	 * get write access without blocking, at the SB_FREEZE_FS
> +	 * level.  If the fs is already fully frozen, then we already
> +	 * know we have nothing to do.
> +	 */
> +
> +	if (!mapping_test_cmtime(mapping))
> +		return;  /* Optimization: nothing to do. */
> +
> +	if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
> +		mapping_flush_cmtime(mapping);
> +		__sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
> +	}
> +}
  This is wrong because the SB_FREEZE_FS level is targeted for filesystem
internal use. Also it is racy. mapping_flush_cmtime() ends up calling
mark_inode_dirty(), and filesystems such as ext4 or xfs will start a
transaction to store the inode in the journal. This gets freeze protection at
the SB_FREEZE_FS level again. If freeze_super() sets s_writers.frozen to
SB_FREEZE_FS before this second protection, things will deadlock.

Since the callers of this function hold mmap_sem, using SB_FREEZE_PAGEFAULT
protection would be appropriate. Also since there are just two places that
need the freeze protection I'd be inclined to open code the protection in
the two places rather than hiding it in a special function.

								Honza
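
The deadlock described above can be illustrated with a small single-threaded model. This is a sketch under stated assumptions, not the kernel implementation: struct model_sb and the model_* functions are hypothetical, and the only thing borrowed from the kernel is the ordering of the freeze levels. The model takes freeze protection at the FS level, lets the "freezer" advance the frozen state, and then attempts the nested acquisition that a journal transaction would make.

```c
#include <stdbool.h>

/* Same ordering as the kernel's freeze levels; names are model-only. */
enum model_freeze_level {
    MODEL_UNFROZEN         = 0,
    MODEL_FREEZE_WRITE     = 1,
    MODEL_FREEZE_PAGEFAULT = 2,
    MODEL_FREEZE_FS        = 3,
    MODEL_FREEZE_COMPLETE  = 4,
};

struct model_sb {
    int frozen;        /* current freeze state                    */
    int writers[3];    /* active holders per level (index lvl-1)  */
};

/* Non-blocking acquisition: fails once the freezer reached `level`. */
bool model_sb_start_write_trylock(struct model_sb *sb, int level)
{
    if (sb->frozen >= level)
        return false;
    sb->writers[level - 1]++;
    return true;
}

void model_sb_end_write(struct model_sb *sb, int level)
{
    sb->writers[level - 1]--;
}

/*
 * Returns true when the nested acquisition cannot proceed while the
 * outer reference is still held.  In the real system the freezer would
 * wait for the outer holder to drop its reference, and the inner,
 * blocking acquisition would wait for the thaw: neither can advance.
 */
bool model_nested_acquire_blocks(void)
{
    struct model_sb sb = { 0 };
    bool outer, inner, stuck;

    outer = model_sb_start_write_trylock(&sb, MODEL_FREEZE_FS);
    sb.frozen = MODEL_FREEZE_FS;     /* freezer advances in between */
    inner = model_sb_start_write_trylock(&sb, MODEL_FREEZE_FS);

    stuck = outer && !inner && sb.writers[MODEL_FREEZE_FS - 1] > 0;

    if (inner)
        model_sb_end_write(&sb, MODEL_FREEZE_FS);
    if (outer)
        model_sb_end_write(&sb, MODEL_FREEZE_FS);
    return stuck;
}
```

The model uses a non-blocking attempt for the inner acquisition only so that it can report the outcome; a blocking one is what would actually hang.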
Andrew Lutomirski - Sept. 4, 2013, 5:54 p.m.
On Wed, Sep 4, 2013 at 7:57 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 22-08-13 17:03:19, Andy Lutomirski wrote:
>> Filesystems that defer cmtime updates should update cmtime when any
>> of these events happen after a write via a mapping:
>>
>>  - The mapping is written back to disk.  This happens from all kinds
>>    of places, most of which eventually call ->writepages.  (The
>>    exceptions are vmscan and migration.)
>>
>>  - munmap is called or the mapping is removed when the process exits
>>
>>  - msync(MS_ASYNC) is called.  Linux currently does nothing for
>>    msync(MS_ASYNC), but POSIX says that cmtime should be updated some
>>    time between an mmaped write and the subsequent msync call.
>>    MS_SYNC calls ->writepages, but MS_ASYNC needs special handling.
>>
>> Filesystems are responsible for checking for pending deferred cmtime
>> updates in .writepages (a helper is provided for this purpose) and
>> for doing the actual update in .update_cmtime_deferred.
>>
>> These changes have no effect by themselves; filesystems must opt in
>> by implementing .update_cmtime_deferred and removing any
>> file_update_time call in .page_mkwrite.
>>
>> This patch does not implement the MS_ASYNC case; that's in the next
>> patch.
>>
>> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ...
>> +/**
>> + * generic_update_cmtime_deferred - update cmtime after an mmapped write
>> + * @mapping: The mapping
>> + *
>> + * This library function implements .update_cmtime_deferred.  It is unlikely
>> + * that any filesystem will want to do anything here except update the time
>> + * (using this helper) or nothing at all (by leaving .update_cmtime_deferred
>> + * NULL).
>> + */
>> +void generic_update_cmtime_deferred(struct address_space *mapping)
>> +{
>> +     struct blk_plug plug;
>> +     blk_start_plug(&plug);
>> +     inode_update_time_writable(mapping->host);
>> +     blk_finish_plug(&plug);
>> +}
>> +EXPORT_SYMBOL(generic_update_cmtime_deferred);
>> +
>   You can remove the pluggin here. Inode update will likely result in a
> single write so there's no point.
>
>> @@ -1970,6 +1988,39 @@ int write_one_page(struct page *page, int wait)
>>  }
>>  EXPORT_SYMBOL(write_one_page);
>>
>> +void mapping_flush_cmtime(struct address_space *mapping)
>> +{
>> +     if (mapping_test_clear_cmtime(mapping) &&
>> +         mapping->a_ops->update_cmtime_deferred)
>> +             mapping->a_ops->update_cmtime_deferred(mapping);
>> +}
>> +EXPORT_SYMBOL(mapping_flush_cmtime);
>   Hum, is there a reason for update_cmtime_deferred() operation? I can
> hardly imagine anyone will want to do anything else than what
> inode_update_time_writable() does so why bother? You mention tmpfs & co.
> don't fit into your scheme well with which I agree so let's just keep
> file_update_time() in their page_mkwrite() operation. But I don't see a
> real need for avoiding the deferred cmtime logic...

I think there might be odd corner cases.  For example, mmap a tmpfs
file, write it, and unmap it.  Then, an hour later, maybe the system
will be under memory pressure and page out the file.  This could
trigger a surprising time update.  (I'm not sure this can actually
happen on tmpfs, but maybe it would on some other filesystem.)

Does this actually matter?  A flag to turn the feature on or off would
do the trick, but I don't think there's precedent for sticking a flag
in a_ops.

>
>> +
>> +void mapping_flush_cmtime_nowb(struct address_space *mapping)
>> +{
>> +     /*
>> +      * We get called from munmap and msync.  Both calls can race
>> +      * with fs freezing.  If the fs is frozen after
>> +      * mapping_test_clear_cmtime but before the time update, then
>> +      * sync_filesystem will miss the cmtime update (because we
>> +      * just cleared it) and we don't be able to write (because the
>> +      * fs is frozen).  On the other hand, we can't just return if
>> +      * we're in the SB_FREEZE_PAGEFAULT state because our caller
>> +      * expects the timestamp to be synchronously updated.  So we
>> +      * get write access without blocking, at the SB_FREEZE_FS
>> +      * level.  If the fs is already fully frozen, then we already
>> +      * know we have nothing to do.
>> +      */
>> +
>> +     if (!mapping_test_cmtime(mapping))
>> +             return;  /* Optimization: nothing to do. */
>> +
>> +     if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
>> +             mapping_flush_cmtime(mapping);
>> +             __sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
>> +     }
>> +}
>   This is wrong because SB_FREEZE_FS level is targetted for filesystem
> internal use. Also it is racy. mapping_flush_cmtime() ends up calling
> mark_inode_dirty() and filesystems such as ext4 or xfs will start a
> transaction to store inode in the journal. This gets freeze protection at
> SB_FREEZE_FS level again. If freeze_super() sets s_writers.frozen to
> SB_FREEZE_FS before this second protection, things will deadlock.

Whoops -- I assumed that it was safe to recursively take freeze
protection at the same level.

I'm worried about the following race:

Thread 1 (in munmap):
Check AS_CMTIME set
sb_start_pagefault

Thread 2 (freezing the fs):
frozen = SB_FREEZE_PAGEFAULT;
sync_filesystem()

Thread 1 is now stuck.  It doesn't need to be, because sync_filesystem
will flush out the cmtime write.  But there doesn't seem to be a clean
mechanism to wait for the freeze to finish.

Is there a clean way to avoid this?  I don't want to return
immediately if a freeze is in progress, because userspace expects that
munmap will update cmtime synchronously.

An ugly but simple solution is:

struct super_block *sb = mapping->host->i_sb;

if (!mapping_test_cmtime(mapping))
    return;  /* Optimization: nothing to do. */

if (__sb_start_write(sb, SB_FREEZE_FS, false)) {
    mapping_flush_cmtime(mapping);
    __sb_end_write(sb, SB_FREEZE_FS);
} else {
    /*
     * Freeze is or was in progress.  The part of freezing from
     * SB_FREEZE_PAGEFAULT through sync_filesystem holds s_umount
     * for write, so we can wait for it to finish by taking
     * s_umount for read.
     */
    down_read(&sb->s_umount);
    up_read(&sb->s_umount);
}

--Andy

>
> Since the callers of this function hold mmap_sem, using SB_FREEZE_PAGEFAULT
> protection would be appropriate. Also since there are just two places that
> need the freeze protection I'd be inclined to open code the protection in
> the two places rather than hiding it in a special function.

Given that this is rather subtle (I've gotten it wrong multiple
times), I'd rather leave it in one place and comment it well.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara - Sept. 4, 2013, 7:20 p.m.
On Wed 04-09-13 10:54:50, Andy Lutomirski wrote:
> >> @@ -1970,6 +1988,39 @@ int write_one_page(struct page *page, int wait)
> >>  }
> >>  EXPORT_SYMBOL(write_one_page);
> >>
> >> +void mapping_flush_cmtime(struct address_space *mapping)
> >> +{
> >> +     if (mapping_test_clear_cmtime(mapping) &&
> >> +         mapping->a_ops->update_cmtime_deferred)
> >> +             mapping->a_ops->update_cmtime_deferred(mapping);
> >> +}
> >> +EXPORT_SYMBOL(mapping_flush_cmtime);
> >   Hum, is there a reason for update_cmtime_deferred() operation? I can
> > hardly imagine anyone will want to do anything else than what
> > inode_update_time_writable() does so why bother? You mention tmpfs & co.
> > don't fit into your scheme well with which I agree so let's just keep
> > file_update_time() in their page_mkwrite() operation. But I don't see a
> > real need for avoiding the deferred cmtime logic...
> 
> I think there might be odd corner cases.  For example, mmap a tmpfs
> file, write it, and unmap it.  Then, an hour later, maybe the system
  If you unmap it, then that will handle the update. But if you don't unmap,
you'd get spurious updates of timestamps, which would be strange.

> will be under memory pressure and page out the file.  This could
> trigger a surprising time update.  (I'm not sure this can actually
> happen on tmpfs, but maybe it would on some other filesystem.)
> 
> Does this actually matter?  A flag to turn the feature on or off would
> do the trick, but I don't think there's precedent for sticking a flag
> in a_ops.
  A flag in a_ops is ugly. But you can have a flag in 'struct
file_system_type', which would be reasonable.

> >> +void mapping_flush_cmtime_nowb(struct address_space *mapping)
> >> +{
> >> +     /*
> >> +      * We get called from munmap and msync.  Both calls can race
> >> +      * with fs freezing.  If the fs is frozen after
> >> +      * mapping_test_clear_cmtime but before the time update, then
> >> +      * sync_filesystem will miss the cmtime update (because we
> >> +      * just cleared it) and we don't be able to write (because the
> >> +      * fs is frozen).  On the other hand, we can't just return if
> >> +      * we're in the SB_FREEZE_PAGEFAULT state because our caller
> >> +      * expects the timestamp to be synchronously updated.  So we
> >> +      * get write access without blocking, at the SB_FREEZE_FS
> >> +      * level.  If the fs is already fully frozen, then we already
> >> +      * know we have nothing to do.
> >> +      */
> >> +
> >> +     if (!mapping_test_cmtime(mapping))
> >> +             return;  /* Optimization: nothing to do. */
> >> +
> >> +     if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
> >> +             mapping_flush_cmtime(mapping);
> >> +             __sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
> >> +     }
> >> +}
> >   This is wrong because SB_FREEZE_FS level is targetted for filesystem
> > internal use. Also it is racy. mapping_flush_cmtime() ends up calling
> > mark_inode_dirty() and filesystems such as ext4 or xfs will start a
> > transaction to store inode in the journal. This gets freeze protection at
> > SB_FREEZE_FS level again. If freeze_super() sets s_writers.frozen to
> > SB_FREEZE_FS before this second protection, things will deadlock.
> 
> Whoops -- I assumed that it was safe to recursively take freeze
> protection at the same level.
> 
> I'm worried about the following race:
> 
> Thread 1 (in munmap):
> Check AS_CMTIME set
> sb_start_pagefault
> 
> Thread 2 (freezing the fs):
> frozen = SB_FREEZE_PAGEFAULT;
> sync_filesystem()
> 
> Thread 1 is now stuck.  It doesn't need to be, because sync_filesystem
> will flush out the cmtime write.  But there doesn't seem to be a clean
> mechanism to wait for the freeze to finish.
  OK, I see. Frankly, I'd rather live with msync() and munmap() blocking
while the filesystem is frozen than try to outsmart the freezing logic...
If someone comes up with a use case where it causes trouble, we can always
improve the logic with some clever tricks.

> Is there a clean way to avoid this?  I don't want to return
> immediately if a freeze is in progress, because userspace expects that
> munmap will update cmtime synchronously.
> 
> And ugly but simple solution is:
> 
> if (!mapping_test_cmtime(mapping))
>     return;  /* Optimization: nothing to do. */
> 
> if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
>     mapping_flush_cmtime(mapping);
>     __sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
> } else {
>     /* Freeze is or was in progress.  The part of freezing from
> SB_FREEZE_PAGEFAULT through sync_filesystem holds s_umount for write,
> so we can wait for it to finish by taking s_umount for read. */
>     down_read(&sb->s_umount);
>     up_read(&sb->s_umount);
> }
  Yes, this would probably work, but as I said above, I'd prefer to keep
it simple unless we have a good reason for the complex solution.

								Honza
Andrew Lutomirski - Sept. 4, 2013, 8:05 p.m.
On Wed, Sep 4, 2013 at 12:20 PM, Jan Kara <jack@suse.cz> wrote:
> On Wed 04-09-13 10:54:50, Andy Lutomirski wrote:
>> >> @@ -1970,6 +1988,39 @@ int write_one_page(struct page *page, int wait)
>> >>  }
>> >>  EXPORT_SYMBOL(write_one_page);
>> >>
>> >> +void mapping_flush_cmtime(struct address_space *mapping)
>> >> +{
>> >> +     if (mapping_test_clear_cmtime(mapping) &&
>> >> +         mapping->a_ops->update_cmtime_deferred)
>> >> +             mapping->a_ops->update_cmtime_deferred(mapping);
>> >> +}
>> >> +EXPORT_SYMBOL(mapping_flush_cmtime);
>> >   Hum, is there a reason for update_cmtime_deferred() operation? I can
>> > hardly imagine anyone will want to do anything else than what
>> > inode_update_time_writable() does so why bother? You mention tmpfs & co.
>> > don't fit into your scheme well with which I agree so let's just keep
>> > file_update_time() in their page_mkwrite() operation. But I don't see a
>> > real need for avoiding the deferred cmtime logic...
>>
>> I think there might be odd corner cases.  For example, mmap a tmpfs
>> file, write it, and unmap it.  Then, an hour later, maybe the system
>   If you unmap it then that will handle the update. But if you won't unmap,
> you'd get spurious updates of timestamps which would be strange.
>
>> will be under memory pressure and page out the file.  This could
>> trigger a surprising time update.  (I'm not sure this can actually
>> happen on tmpfs, but maybe it would on some other filesystem.)
>>
>> Does this actually matter?  A flag to turn the feature on or off would
>> do the trick, but I don't think there's precedent for sticking a flag
>> in a_ops.
>   Flag in a_ops is ugly. But you can have a flag in 'struct
> filesystem_type' which would be reasonable.

OK, will do.

>
>> >> +void mapping_flush_cmtime_nowb(struct address_space *mapping)
>> >> +{
>> >> +     /*
>> >> +      * We get called from munmap and msync.  Both calls can race
>> >> +      * with fs freezing.  If the fs is frozen after
>> >> +      * mapping_test_clear_cmtime but before the time update, then
>> >> +      * sync_filesystem will miss the cmtime update (because we
>> >> +      * just cleared it) and we don't be able to write (because the
>> >> +      * fs is frozen).  On the other hand, we can't just return if
>> >> +      * we're in the SB_FREEZE_PAGEFAULT state because our caller
>> >> +      * expects the timestamp to be synchronously updated.  So we
>> >> +      * get write access without blocking, at the SB_FREEZE_FS
>> >> +      * level.  If the fs is already fully frozen, then we already
>> >> +      * know we have nothing to do.
>> >> +      */
>> >> +
>> >> +     if (!mapping_test_cmtime(mapping))
>> >> +             return;  /* Optimization: nothing to do. */
>> >> +
>> >> +     if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
>> >> +             mapping_flush_cmtime(mapping);
>> >> +             __sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
>> >> +     }
>> >> +}
>> >   This is wrong because SB_FREEZE_FS level is targetted for filesystem
>> > internal use. Also it is racy. mapping_flush_cmtime() ends up calling
>> > mark_inode_dirty() and filesystems such as ext4 or xfs will start a
>> > transaction to store inode in the journal. This gets freeze protection at
>> > SB_FREEZE_FS level again. If freeze_super() sets s_writers.frozen to
>> > SB_FREEZE_FS before this second protection, things will deadlock.
>>
>> Whoops -- I assumed that it was safe to recursively take freeze
>> protection at the same level.
>>
>> I'm worried about the following race:
>>
>> Thread 1 (in munmap):
>> Check AS_CMTIME set
>> sb_start_pagefault
>>
>> Thread 2 (freezing the fs):
>> frozen = SB_FREEZE_PAGEFAULT;
>> sync_filesystem()
>>
>> Thread 1 is now stuck.  It doesn't need to be, because sync_filesystem
>> will flush out the cmtime write.  But there doesn't seem to be a clean
>> mechanism to wait for the freeze to finish.
>   OK, I see. Frankly, I'd rather live with msync() and munmap() blocking
> while filesystem is frozen than trying to outsmart the freezing logic...
> If someone comes up with a usecase where it causes trouble, we can always
> improve the logic with some clever tricks.

I'll at least check that it's a shared writable mapping before doing
the flush to avoid blocking on other types of munmap.
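
The check mentioned here could look roughly like the predicate below. This is a userspace sketch only: the MODEL_VM_* constants and model_needs_cmtime_flush are hypothetical names, and the bit values are assumptions chosen for the model rather than taken from the patch. The idea is simply that private or read-only mappings skip the flush entirely and therefore never block on a frozen filesystem.

```c
#include <stdbool.h>

/* Hypothetical flag bits for the model; not taken from the patch. */
#define MODEL_VM_READ   0x1UL
#define MODEL_VM_WRITE  0x2UL
#define MODEL_VM_SHARED 0x8UL

/* Only a shared AND writable mapping can have dirtied the file. */
bool model_needs_cmtime_flush(unsigned long vm_flags)
{
    const unsigned long need = MODEL_VM_SHARED | MODEL_VM_WRITE;

    return (vm_flags & need) == need;
}
```

In the posted patch, remove_vma() tests only VM_SHARED; the extra writable test is the tightening Andy describes, modelled here under the assumptions above.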

--Andy

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 86cf0a4..f6b0f8b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -350,6 +350,14 @@  struct address_space_operations {
 	/* Write back some dirty pages from this mapping. */
 	int (*writepages)(struct address_space *, struct writeback_control *);
 
+	/*
+	 * Called when a deferred cmtime update should be applied.
+	 * Implementations should update cmtime.  (As an optional
+	 * optimization, implementations can call mapping_test_clear_cmtime
+	 * from writepages as well.)
+	 */
+	void (*update_cmtime_deferred)(struct address_space *);
+
 	/* Set a page dirty.  Return true if this dirtied it */
 	int (*set_page_dirty)(struct page *page);
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 9a461ee..2647a13 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -90,6 +90,12 @@  static inline bool mapping_test_clear_cmtime(struct address_space * mapping)
 	return test_and_clear_bit(AS_CMTIME, &mapping->flags);
 }
 
+/* Use this one in writepages, etc. */
+extern void mapping_flush_cmtime(struct address_space * mapping);
+
+/* Use this one outside writeback. */
+extern void mapping_flush_cmtime_nowb(struct address_space * mapping);
+
 /*
  * This is non-atomic.  Only to be used before the mapping is activated.
  * Probably needs a barrier...
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 4e198ca..efe4970 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -174,6 +174,7 @@  typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
 
 int generic_writepages(struct address_space *mapping,
 		       struct writeback_control *wbc);
+void generic_update_cmtime_deferred(struct address_space *mapping);
 void tag_pages_for_writeback(struct address_space *mapping,
 			     pgoff_t start, pgoff_t end);
 int write_cache_pages(struct address_space *mapping,
diff --git a/mm/migrate.c b/mm/migrate.c
index 6f0c244..e4124e2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -627,6 +627,8 @@  static int writeout(struct address_space *mapping, struct page *page)
 		/* unlocked. Relock */
 		lock_page(page);
 
+	mapping_flush_cmtime(mapping);
+
 	return (rc < 0) ? -EIO : -EAGAIN;
 }
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 1edbaa3..189eb7a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1,3 +1,4 @@ 
+
 /*
  * mm/mmap.c
  *
@@ -249,8 +250,11 @@  static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
 	might_sleep();
 	if (vma->vm_ops && vma->vm_ops->close)
 		vma->vm_ops->close(vma);
-	if (vma->vm_file)
+	if (vma->vm_file) {
+		if ((vma->vm_flags & VM_SHARED) && vma->vm_file->f_mapping)
+			mapping_flush_cmtime_nowb(vma->vm_file->f_mapping);
 		fput(vma->vm_file);
+	}
 	mpol_put(vma_policy(vma));
 	kmem_cache_free(vm_area_cachep, vma);
 	return next;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3f0c895..4ec8c02 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1912,12 +1912,30 @@  int generic_writepages(struct address_space *mapping,
 
 	blk_start_plug(&plug);
 	ret = write_cache_pages(mapping, wbc, __writepage, mapping);
+	mapping_flush_cmtime(mapping);
 	blk_finish_plug(&plug);
 	return ret;
 }
-
 EXPORT_SYMBOL(generic_writepages);
 
+/**
+ * generic_update_cmtime_deferred - update cmtime after an mmapped write
+ * @mapping: The mapping
+ *
+ * This library function implements .update_cmtime_deferred.  It is unlikely
+ * that any filesystem will want to do anything here except update the time
+ * (using this helper) or nothing at all (by leaving .update_cmtime_deferred
+ * NULL).
+ */
+void generic_update_cmtime_deferred(struct address_space *mapping)
+{
+	struct blk_plug plug;
+	blk_start_plug(&plug);
+	inode_update_time_writable(mapping->host);
+	blk_finish_plug(&plug);
+}
+EXPORT_SYMBOL(generic_update_cmtime_deferred);
+
 int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
 {
 	int ret;
@@ -1970,6 +1988,39 @@  int write_one_page(struct page *page, int wait)
 }
 EXPORT_SYMBOL(write_one_page);
 
+void mapping_flush_cmtime(struct address_space *mapping)
+{
+	if (mapping_test_clear_cmtime(mapping) &&
+	    mapping->a_ops->update_cmtime_deferred)
+		mapping->a_ops->update_cmtime_deferred(mapping);
+}
+EXPORT_SYMBOL(mapping_flush_cmtime);
+
+void mapping_flush_cmtime_nowb(struct address_space *mapping)
+{
+	/*
+	 * We get called from munmap and msync.  Both calls can race
+	 * with fs freezing.  If the fs is frozen after
+	 * mapping_test_clear_cmtime but before the time update, then
+	 * sync_filesystem will miss the cmtime update (because we
+	 * just cleared it) and we won't be able to write (because the
+	 * fs is frozen).  On the other hand, we can't just return if
+	 * we're in the SB_FREEZE_PAGEFAULT state because our caller
+	 * expects the timestamp to be synchronously updated.  So we
+	 * get write access without blocking, at the SB_FREEZE_FS
+	 * level.  If the fs is already fully frozen, then we already
+	 * know we have nothing to do.
+	 */
+
+	if (!mapping_test_cmtime(mapping))
+		return;  /* Optimization: nothing to do. */
+
+	if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
+		mapping_flush_cmtime(mapping);
+		__sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
+	}
+}
+
 /*
  * For address_spaces which do not use buffers nor write back.
  */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2cff0d4..3b759e7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -429,6 +429,7 @@  static pageout_t pageout(struct page *page, struct address_space *mapping,
 		res = mapping->a_ops->writepage(page, &wbc);
 		if (res < 0)
 			handle_write_error(mapping, page, res);
+		mapping_flush_cmtime(mapping);
 		if (res == AOP_WRITEPAGE_ACTIVATE) {
 			ClearPageReclaim(page);
 			return PAGE_ACTIVATE;