diff mbox

[1/3] mm: Don't blindly assign fallback_migrate_page()

Message ID 1466112375-1717-2-git-send-email-richard@nod.at
State Rejected
Headers show

Commit Message

Richard Weinberger June 16, 2016, 9:26 p.m. UTC
While block oriented filesystems use buffer_migrate_page()
as page migration function other filesystems which don't
implement ->migratepage() will automatically get fallback_migrate_page()
assigned. fallback_migrate_page() is not as generic as is should
be. Page migration is filesystem specific and a one-fits-all function
is hard to achieve. UBIFS leaned this lection the hard way.
It uses various page flags and fallback_migrate_page() does not
handle these flags as UBIFS expected.

To make sure that no further filesystem will get confused by
fallback_migrate_page() disable the automatic assignment and
allow filesystems to use this function explicitly if it is
really suitable.

Signed-off-by: Richard Weinberger <richard@nod.at>
---
 include/linux/migrate.h |  9 +++++++++
 mm/migrate.c            | 16 ++++++++++++----
 2 files changed, 21 insertions(+), 4 deletions(-)

Comments

Andrew Morton June 16, 2016, 11:11 p.m. UTC | #1
On Thu, 16 Jun 2016 23:26:13 +0200 Richard Weinberger <richard@nod.at> wrote:

> While block oriented filesystems use buffer_migrate_page()
> as page migration function other filesystems which don't
> implement ->migratepage() will automatically get fallback_migrate_page()
> assigned. fallback_migrate_page() is not as generic as is should
> be. Page migration is filesystem specific and a one-fits-all function
> is hard to achieve. UBIFS leaned this lection the hard way.
> It uses various page flags and fallback_migrate_page() does not
> handle these flags as UBIFS expected.
> 
> To make sure that no further filesystem will get confused by
> fallback_migrate_page() disable the automatic assignment and
> allow filesystems to use this function explicitly if it is
> really suitable.

hm, is there really much point in doing this?  I assume it doesn't
actually affect any current filesystems?

[2/3] is of course OK - please add it to the UBIFS tree.
Richard Weinberger June 17, 2016, 7:41 a.m. UTC | #2
Andrew,

Am 17.06.2016 um 01:11 schrieb Andrew Morton:
> On Thu, 16 Jun 2016 23:26:13 +0200 Richard Weinberger <richard@nod.at> wrote:
> 
>> While block oriented filesystems use buffer_migrate_page()
>> as page migration function other filesystems which don't
>> implement ->migratepage() will automatically get fallback_migrate_page()
>> assigned. fallback_migrate_page() is not as generic as is should
>> be. Page migration is filesystem specific and a one-fits-all function
>> is hard to achieve. UBIFS leaned this lection the hard way.
>> It uses various page flags and fallback_migrate_page() does not
>> handle these flags as UBIFS expected.
>>
>> To make sure that no further filesystem will get confused by
>> fallback_migrate_page() disable the automatic assignment and
>> allow filesystems to use this function explicitly if it is
>> really suitable.
> 
> hm, is there really much point in doing this?  I assume it doesn't
> actually affect any current filesystems?

Well, we simply don't know which filesystems are affected by similar issues.
JFFS2 is maybe also affected since it is not block based.
For UBIFS it also worked many years.

> [2/3] is of course OK - please add it to the UBIFS tree.

Can I add your Acked-by?

Thanks,
//richard
Michal Hocko June 17, 2016, 4:28 p.m. UTC | #3
On Fri 17-06-16 09:41:38, Richard Weinberger wrote:
> Andrew,
> 
> Am 17.06.2016 um 01:11 schrieb Andrew Morton:
> > On Thu, 16 Jun 2016 23:26:13 +0200 Richard Weinberger <richard@nod.at> wrote:
> > 
> >> While block oriented filesystems use buffer_migrate_page()
> >> as page migration function other filesystems which don't
> >> implement ->migratepage() will automatically get fallback_migrate_page()
> >> assigned. fallback_migrate_page() is not as generic as is should
> >> be. Page migration is filesystem specific and a one-fits-all function
> >> is hard to achieve. UBIFS leaned this lection the hard way.
> >> It uses various page flags and fallback_migrate_page() does not
> >> handle these flags as UBIFS expected.
> >>
> >> To make sure that no further filesystem will get confused by
> >> fallback_migrate_page() disable the automatic assignment and
> >> allow filesystems to use this function explicitly if it is
> >> really suitable.
> > 
> > hm, is there really much point in doing this?  I assume it doesn't
> > actually affect any current filesystems?
> 
> Well, we simply don't know which filesystems are affected by similar issues.

But doesn't this disable the page migration and so potentially reduce
the compaction success rate for the large pile of filesystems? Without
any hint about that?

$ git grep "\.migratepage[[:space:]]*=" -- fs | wc -l
16
out of
$ git grep "struct address_space_operations[[:space:]]*[a-zA-Z0-9_]*[[:space:]]*=" -- fs | wc -l
87

That just seems to be too conservative for something that even not might
be a problem, especially when considering the fallback migration code is
there for many years with only UBIFS seeing a problem.

Wouldn't it be safer to contact FS developers who might have have
similar issue and work with them to use a proper migration code?
Richard Weinberger June 17, 2016, 4:55 p.m. UTC | #4
Am 17.06.2016 um 18:28 schrieb Michal Hocko:
> But doesn't this disable the page migration and so potentially reduce
> the compaction success rate for the large pile of filesystems? Without
> any hint about that?

The WARN_ON_ONCE() is the hint. ;)
But I can understand your point we'd have to communicate that change better.

> $ git grep "\.migratepage[[:space:]]*=" -- fs | wc -l
> 16
> out of
> $ git grep "struct address_space_operations[[:space:]]*[a-zA-Z0-9_]*[[:space:]]*=" -- fs | wc -l
> 87
> 
> That just seems to be too conservative for something that even not might
> be a problem, especially when considering the fallback migration code is
> there for many years with only UBIFS seeing a problem.

UBIFS is also there for many years.
It just shows that the issue is hard to hit but at least for UBIFS it is real.

> Wouldn't it be safer to contact FS developers who might have have
> similar issue and work with them to use a proper migration code?

That was the goal of this patch. Forcing the filesystem developers
to look as the WARN_ON_ONCE() triggered.

I fear just sending a mail to linux-fsdevel@vger is not enough.

Thanks,
//richard
Michal Hocko June 17, 2016, 6:27 p.m. UTC | #5
On Fri 17-06-16 18:55:45, Richard Weinberger wrote:
> Am 17.06.2016 um 18:28 schrieb Michal Hocko:
> > But doesn't this disable the page migration and so potentially reduce
> > the compaction success rate for the large pile of filesystems? Without
> > any hint about that?
> 
> The WARN_ON_ONCE() is the hint. ;)

Right. My reply turned a different way than I meant... I meant to say
that there might be different regressions caused by this change without much
hint that a particular warning would be the smoking gun...
Richard Weinberger June 17, 2016, 7:36 p.m. UTC | #6
Am 17.06.2016 um 20:27 schrieb Michal Hocko:
> On Fri 17-06-16 18:55:45, Richard Weinberger wrote:
>> Am 17.06.2016 um 18:28 schrieb Michal Hocko:
>>> But doesn't this disable the page migration and so potentially reduce
>>> the compaction success rate for the large pile of filesystems? Without
>>> any hint about that?
>>
>> The WARN_ON_ONCE() is the hint. ;)
> 
> Right. My reply turned a different way than I meant... I meant to say
> that there might be different regressions caused by this change without much
> hint that a particular warning would be the smoking gun... 
> 

Okay, what about something like that?
That way everything works as before and we don't have regressions
but FS maintainers will notice the WARN_ON_ONCE() and hopefully review
whether generic_migrate_page() is really suitable.
If so, they can set their a_ops->migratepage to generic_migrate_page().

@@ -771,8 +773,15 @@ static int move_to_new_page(struct page *newpage, struct page *page,
                 * is the most common path for page migration.
                 */
                rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
-       else
-               rc = fallback_migrate_page(mapping, newpage, page, mode);
+       else {
+               /*
+                * Dear filesystem maintainer, please verify whether
+                * generic_migrate_page() is suitable for your
+                * filesystem, especially wrt. page flag handling.
+                */
+               WARN_ON_ONCE(1);
+               rc = generic_migrate_page(mapping, newpage, page, mode);
+       }

        /*
         * When successful, old pagecache page->mapping must be cleared before

Thanks,
//richard
Michal Hocko June 17, 2016, 8:03 p.m. UTC | #7
On Fri 17-06-16 21:36:30, Richard Weinberger wrote:
> 
> 
> Am 17.06.2016 um 20:27 schrieb Michal Hocko:
> > On Fri 17-06-16 18:55:45, Richard Weinberger wrote:
> >> Am 17.06.2016 um 18:28 schrieb Michal Hocko:
> >>> But doesn't this disable the page migration and so potentially reduce
> >>> the compaction success rate for the large pile of filesystems? Without
> >>> any hint about that?
> >>
> >> The WARN_ON_ONCE() is the hint. ;)
> > 
> > Right. My reply turned a different way than I meant... I meant to say
> > that there might be different regressions caused by this change without much
> > hint that a particular warning would be the smoking gun... 
> > 
> 
> Okay, what about something like that?
> That way everything works as before and we don't have regressions
> but FS maintainers will notice the WARN_ON_ONCE() and hopefully review
> whether generic_migrate_page() is really suitable.
> If so, they can set their a_ops->migratepage to generic_migrate_page().

Yes this sounds better to me. I would just be more verbose about which
a_ops is missing the migratepage callback. The WARN_ON_ONCE will not
tell us which fs is the culprit. I am not even sure the calltrace is
really helpful and maybe printk_once would be more appropriate.

	printk_once(KERN_INFO "%ps is missing migratepage callback. Please report to the respective filesystem maintainers.\n",
			mapping->a_ops);

Or print once per a_ops would be even better but that sounds like an
over engineering...
 
> @@ -771,8 +773,15 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>                  * is the most common path for page migration.
>                  */
>                 rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
> -       else
> -               rc = fallback_migrate_page(mapping, newpage, page, mode);
> +       else {
> +               /*
> +                * Dear filesystem maintainer, please verify whether
> +                * generic_migrate_page() is suitable for your
> +                * filesystem, especially wrt. page flag handling.
> +                */
> +               WARN_ON_ONCE(1);
> +               rc = generic_migrate_page(mapping, newpage, page, mode);
> +       }
> 
>         /*
>          * When successful, old pagecache page->mapping must be cleared before
> 
> Thanks,
> //richard
Richard Weinberger June 22, 2016, 10:21 p.m. UTC | #8
Am 17.06.2016 um 01:11 schrieb Andrew Morton:
> On Thu, 16 Jun 2016 23:26:13 +0200 Richard Weinberger <richard@nod.at> wrote:
> 
>> While block oriented filesystems use buffer_migrate_page()
>> as page migration function other filesystems which don't
>> implement ->migratepage() will automatically get fallback_migrate_page()
>> assigned. fallback_migrate_page() is not as generic as is should
>> be. Page migration is filesystem specific and a one-fits-all function
>> is hard to achieve. UBIFS leaned this lection the hard way.
>> It uses various page flags and fallback_migrate_page() does not
>> handle these flags as UBIFS expected.
>>
>> To make sure that no further filesystem will get confused by
>> fallback_migrate_page() disable the automatic assignment and
>> allow filesystems to use this function explicitly if it is
>> really suitable.
> 
> hm, is there really much point in doing this?  I assume it doesn't
> actually affect any current filesystems?
> 
> [2/3] is of course OK - please add it to the UBIFS tree.

Pushed 2/3 and 3/3 into UBIFS next tree.

Thanks,
//richard
diff mbox

Patch

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9b50325..aba86d4 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -47,6 +47,9 @@  extern int migrate_page_move_mapping(struct address_space *mapping,
 		struct page *newpage, struct page *page,
 		struct buffer_head *head, enum migrate_mode mode,
 		int extra_count);
+extern int generic_migrate_page(struct address_space *mapping,
+				struct page *newpage, struct page *page,
+				enum migrate_mode mode);
 #else
 
 static inline void putback_movable_pages(struct list_head *l) {}
@@ -67,6 +70,12 @@  static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
 	return -ENOSYS;
 }
 
+static inline int generic_migrate_page(struct address_space *mapping,
+				       struct page *newpage, struct page *page,
+				       enum migrate_mode mode)
+{
+	return -ENOSYS;
+}
 #endif /* CONFIG_MIGRATION */
 
 #ifdef CONFIG_NUMA_BALANCING
diff --git a/mm/migrate.c b/mm/migrate.c
index 9baf41c..5129143 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -719,8 +719,9 @@  static int writeout(struct address_space *mapping, struct page *page)
 /*
  * Default handling if a filesystem does not provide a migration function.
  */
-static int fallback_migrate_page(struct address_space *mapping,
-	struct page *newpage, struct page *page, enum migrate_mode mode)
+int generic_migrate_page(struct address_space *mapping,
+			 struct page *newpage, struct page *page,
+			 enum migrate_mode mode)
 {
 	if (PageDirty(page)) {
 		/* Only writeback pages in full synchronous migration */
@@ -771,8 +772,15 @@  static int move_to_new_page(struct page *newpage, struct page *page,
 		 * is the most common path for page migration.
 		 */
 		rc = mapping->a_ops->migratepage(mapping, newpage, page, mode);
-	else
-		rc = fallback_migrate_page(mapping, newpage, page, mode);
+	else {
+		/*
+		 * Dear filesystem maintainer, please verify whether
+		 * generic_migrate_page() is suitable for your
+		 * filesystem, especially wrt. page flag handling.
+		 */
+		WARN_ON_ONCE(1);
+		rc = -EINVAL;
+	}
 
 	/*
 	 * When successful, old pagecache page->mapping must be cleared before