Patchwork [2/3] mm: Ensure that mark_page_accessed moves pages to the active list

Submitter Mel Gorman
Date April 29, 2013, 4:31 p.m.
Message ID <1367253119-6461-3-git-send-email-mgorman@suse.de>
Permalink /patch/240416/
State Not Applicable

Comments

Mel Gorman - April 29, 2013, 4:31 p.m.
If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
may fail to move the page to the active list as expected. Now that the
LRU is selected at LRU drain time, mark pages PageActive if they are
on a pagevec so they get moved to the correct list at LRU drain time.
Using a debugging patch it was found that, for a simple git checkout
based workload, pages were never added to the active file list in
practice but with this patch applied they are.

				before   after
LRU Add Active File                  0  757121
LRU Add Active Anon            2678833 2633924
LRU Add Inactive File          8821711 8085543
LRU Add Inactive Anon              183     200
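
As an illustration only, the behaviour change described above can be
modelled in user-space C. This is a sketch, not kernel code: struct
model_page, its boolean fields and the model_* functions are hypothetical
stand-ins for the page flags and for mark_page_accessed() before and after
the patch, and activate_page() is reduced to setting the Active flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page flags involved; not struct page. */
struct model_page {
	bool lru;          /* models PageLRU: page is on an LRU list */
	bool active;       /* models PageActive */
	bool referenced;   /* models PageReferenced */
	bool unevictable;  /* models PageUnevictable */
};

/* Pre-patch logic: promotion only happens when the page is PageLRU. */
void model_mark_page_accessed_old(struct model_page *p)
{
	if (!p->active && !p->unevictable && p->referenced && p->lru) {
		p->active = true;       /* stands in for activate_page() */
		p->referenced = false;
	} else if (!p->referenced) {
		p->referenced = true;
	}
}

/* Patched logic: a page that is not on the LRU is marked Active anyway. */
void model_mark_page_accessed_new(struct model_page *p)
{
	if (!p->active && !p->unevictable && p->referenced) {
		/*
		 * On the LRU the kernel calls activate_page(); otherwise it
		 * calls SetPageActive(). Both end with the Active flag set,
		 * which is all this model tracks.
		 */
		p->active = true;
		p->referenced = false;
	} else if (!p->referenced) {
		p->referenced = true;
	}
}

/* Touch a page that sits on a pagevec (so !PageLRU) twice. */
bool model_active_after_two_accesses(void (*mark)(struct model_page *))
{
	struct model_page p = { .lru = false };

	assert(mark);
	mark(&p);   /* first access sets Referenced */
	mark(&p);   /* second access is the promotion candidate */
	return p.active;
}
```

Under these assumptions, two accesses to a page that is still on a pagevec
leave it inactive with the old logic and Active with the new logic, which is
consistent with the "LRU Add Active File" counter moving off zero.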

The question to consider is if this is universally safe. If the page
was isolated for reclaim and there is a parallel mark_page_accessed()
then vmscan.c will get upset when it finds an isolated PageActive page.
Similarly a potential race exists between a per-cpu drain on a pagevec
list and an activation on a remote CPU.

				lru_add_drain_cpu
				__pagevec_lru_add
				  lru = page_lru(page);
mark_page_accessed
  if (PageLRU(page))
    activate_page
  else
    SetPageActive
				  SetPageLRU(page);
				  add_page_to_lru_list(page, lruvec, lru);

A PageActive page is now added to the inactive list.
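
The interleaving in the diagram can be replayed sequentially in a small
user-space C model. This is only a sketch under stated assumptions: the
race_* names are hypothetical, the page is assumed to be PageReferenced
already, only the file LRUs are modelled, and the two CPUs are simulated by
ordering the steps by hand rather than by real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; the race_* names are illustrative only. */
enum race_lru { RACE_INACTIVE_FILE, RACE_ACTIVE_FILE };

struct race_page {
	bool lru;               /* models PageLRU */
	bool active;            /* models PageActive */
	enum race_lru on_list;  /* list the drain placed the page on */
};

/* Models page_lru(): pick the list from the Active flag. */
enum race_lru race_page_lru(const struct race_page *p)
{
	return p->active ? RACE_ACTIVE_FILE : RACE_INACTIVE_FILE;
}

/* Models the !PageLRU branch of the patched mark_page_accessed(). */
void race_mark_accessed(struct race_page *p)
{
	if (!p->lru)
		p->active = true;   /* SetPageActive */
	/* The PageLRU branch (activate_page) is not modelled here. */
}

/*
 * Replays the diagram step by step. Returns true when the page ends up
 * flagged Active while sitting on the inactive list.
 */
bool race_replay(void)
{
	struct race_page p = { 0 };
	enum race_lru lru;

	lru = race_page_lru(&p);   /* drain CPU:  lru = page_lru(page) */
	race_mark_accessed(&p);    /* remote CPU: SetPageActive */
	assert(!p.lru);            /* the page must not be on an LRU yet */
	p.lru = true;              /* drain CPU:  SetPageLRU(page) */
	p.on_list = lru;           /* drain CPU:  add_page_to_lru_list() */

	return p.active && p.on_list == RACE_INACTIVE_FILE;
}
```

In this model the list is sampled before the remote SetPageActive lands, so
the stale value is used when the page is added and the flag and the list
disagree afterwards.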

While this looks strange, I think it is sufficiently harmless that additional
barriers to address the case are not justified.  Unfortunately, while I never
witnessed it myself, these parallel updates could potentially trigger defensive
DEBUG_VM checks on PageActive, and hence those checks are removed by this patch.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/swap.c   | 18 ++++++++++++------
 mm/vmscan.c |  3 ---
 2 files changed, 12 insertions(+), 9 deletions(-)
Rik van Riel - April 29, 2013, 5:12 p.m.
On 04/29/2013 12:31 PM, Mel Gorman wrote:

> A PageActive page is now added to the inactive list.
>
> While this looks strange, I think it is sufficiently harmless that additional
> barriers to address the case are not justified.  Unfortunately, while I never
> witnessed it myself, these parallel updates could potentially trigger defensive
> DEBUG_VM checks on PageActive, and hence those checks are removed by this patch.

Could this not cause issues with __page_cache_release, called from
munmap, exit, truncate, etc.?

Could the eventual skewing of active vs inactive numbers break page
reclaim heuristics?

I wonder if we would need to move to a scheme where the PG_active bit
is always the authoritative one, and we never pass an overriding "lru"
parameter to __pagevec_lru_add.

Would memory ordering between SetPageLRU and testing for PageLRU be
enough to then prevent the statistics from going off?
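
One possible reading of the scheme suggested above, sketched as a
user-space C model. Everything here is an assumption made for illustration:
the sketch_* names are hypothetical, only the file LRUs are modelled, and
the steps run sequentially. In particular, it says nothing about whether
memory ordering between SetPageLRU and a remote PageLRU test would be
sufficient on real hardware; that question stays open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; the sketch_* names are illustrative. */
enum sketch_lru { SKETCH_INACTIVE_FILE, SKETCH_ACTIVE_FILE };

struct sketch_page {
	bool lru;                 /* models PageLRU */
	bool active;              /* models PageActive (PG_active) */
	enum sketch_lru on_list;  /* list the drain placed the page on */
};

/*
 * Hypothetical drain step that takes no "lru" argument: the page is made
 * PageLRU first and the Active flag is read only afterwards, so the flag
 * alone decides which list the page joins.
 */
void sketch_pagevec_lru_add(struct sketch_page *p)
{
	assert(!p->lru);   /* the page must not be on an LRU yet */
	p->lru = true;     /* SetPageLRU */
	p->on_list = p->active ? SKETCH_ACTIVE_FILE : SKETCH_INACTIVE_FILE;
}

/* True when the Active flag and the list the page is on agree. */
bool sketch_flag_matches_list(const struct sketch_page *p)
{
	return p->active == (p->on_list == SKETCH_ACTIVE_FILE);
}

/* A remote SetPageActive lands while the page is still on the pagevec. */
bool sketch_replay(void)
{
	struct sketch_page p = { 0 };

	p.active = true;               /* remote CPU: SetPageActive */
	sketch_pagevec_lru_add(&p);    /* drain CPU derives the list */
	return sketch_flag_matches_list(&p);
}
```

In this sequential model the flag and the list always agree because no
earlier sample of the list is carried across the flag update. An activation
that arrives after SetPageLRU would take the activate_page() path instead,
which is not modelled.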
Sam Ben - May 1, 2013, 5:41 a.m.
Hi Mel,
On 04/30/2013 12:31 AM, Mel Gorman wrote:
> If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
> may fail to move a page to the active list as expected. Now that the
> LRU is selected at LRU drain time, mark pages PageActive if they are
> on a pagevec so it gets moved to the correct list at LRU drain time.
> Using a debugging patch it was found that for a simple git checkout
> based workload that pages were never added to the active file list in

Could you show us the details of your workload?

> practice but with this patch applied they are.
Mel Gorman - May 1, 2013, 8:06 a.m.
On Wed, May 01, 2013 at 01:41:34PM +0800, Sam Ben wrote:
> Hi Mel,
> On 04/30/2013 12:31 AM, Mel Gorman wrote:
> >If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
> >may fail to move a page to the active list as expected. Now that the
> >LRU is selected at LRU drain time, mark pages PageActive if they are
> >on a pagevec so it gets moved to the correct list at LRU drain time.
> >Using a debugging patch it was found that for a simple git checkout
> >based workload that pages were never added to the active file list in
> 
> Could you show us the details of your workload?
> 

The workload is git checkouts of a fixed number of commits for the
kernel git tree. It starts with a warm-up run that is not timed and then
records the time for a number of iterations.
Ric Mason - May 1, 2013, 8:14 a.m.
On 05/01/2013 04:06 PM, Mel Gorman wrote:
> On Wed, May 01, 2013 at 01:41:34PM +0800, Sam Ben wrote:
>> Hi Mel,
>> On 04/30/2013 12:31 AM, Mel Gorman wrote:
>>> If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
>>> may fail to move a page to the active list as expected. Now that the
>>> LRU is selected at LRU drain time, mark pages PageActive if they are
>>> on a pagevec so it gets moved to the correct list at LRU drain time.
>>> Using a debugging patch it was found that for a simple git checkout
>>> based workload that pages were never added to the active file list in
>> Could you show us the details of your workload?
>>
> The workload is git checkouts of a fixed number of commits for the

Is there a script that you used?

> kernel git tree. It starts with a warm-up run that is not timed and then
> records the time for a number of iterations.

How do you record the time for a number of iterations? Does "iteration"
here mean an LRU scan?
Mel Gorman - May 1, 2013, 8:31 a.m.
On Wed, May 01, 2013 at 04:14:16PM +0800, Ric Mason wrote:
> On 05/01/2013 04:06 PM, Mel Gorman wrote:
> >On Wed, May 01, 2013 at 01:41:34PM +0800, Sam Ben wrote:
> >>Hi Mel,
> >>On 04/30/2013 12:31 AM, Mel Gorman wrote:
> >>>If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
> >>>may fail to move a page to the active list as expected. Now that the
> >>>LRU is selected at LRU drain time, mark pages PageActive if they are
> >>>on a pagevec so it gets moved to the correct list at LRU drain time.
> >>>Using a debugging patch it was found that for a simple git checkout
> >>>based workload that pages were never added to the active file list in
> >>Could you show us the details of your workload?
> >>
> >The workload is git checkouts of a fixed number of commits for the
> 
> Is there a script that you used?
> 

mmtests with config-global-dhp__io-gitcheckout-randread-starvation. The
parallel randread was there to see whether the random file reads would push
the metadata blocks out. I expected it would not be enough to trigger the
reported problem, but that it would be enough to determine whether file pages
were being added to the active lists.

> >kernel git tree. It starts with a warm-up run that is not timed and then
> >records the time for a number of iterations.
> 
> How do you record the time for a number of iterations? Does "iteration"
> here mean an LRU scan?
> 

/usr/bin/time

Patch

diff --git a/mm/swap.c b/mm/swap.c
index 80fbc37..2a10d08 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -437,8 +437,17 @@  void activate_page(struct page *page)
 void mark_page_accessed(struct page *page)
 {
 	if (!PageActive(page) && !PageUnevictable(page) &&
-			PageReferenced(page) && PageLRU(page)) {
-		activate_page(page);
+			PageReferenced(page)) {
+
+		/*
+		 * If the page is on the LRU, promote immediately. Otherwise,
+		 * assume the page is on a pagevec, mark it active and it'll
+		 * be moved to the active LRU on the next drain
+		 */
+		if (PageLRU(page))
+			activate_page(page);
+		else
+			SetPageActive(page);
 		ClearPageReferenced(page);
 	} else if (!PageReferenced(page)) {
 		SetPageReferenced(page);
@@ -478,11 +487,8 @@  EXPORT_SYMBOL(__lru_cache_add);
  */
 void lru_cache_add_lru(struct page *page, enum lru_list lru)
 {
-	if (PageActive(page)) {
+	if (PageActive(page))
 		VM_BUG_ON(PageUnevictable(page));
-	} else if (PageUnevictable(page)) {
-		VM_BUG_ON(PageActive(page));
-	}
 
 	VM_BUG_ON(PageLRU(page));
 	__lru_cache_add(page, lru);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 88c5fed..751b897 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -704,7 +704,6 @@  static unsigned long shrink_page_list(struct list_head *page_list,
 		if (!trylock_page(page))
 			goto keep;
 
-		VM_BUG_ON(PageActive(page));
 		VM_BUG_ON(page_zone(page) != zone);
 
 		sc->nr_scanned++;
@@ -935,7 +934,6 @@  activate_locked:
 		/* Not a candidate for swapping, so reclaim swap space. */
 		if (PageSwapCache(page) && vm_swap_full())
 			try_to_free_swap(page);
-		VM_BUG_ON(PageActive(page));
 		SetPageActive(page);
 		pgactivate++;
 keep_locked:
@@ -3488,7 +3486,6 @@  void check_move_unevictable_pages(struct page **pages, int nr_pages)
 		if (page_evictable(page)) {
 			enum lru_list lru = page_lru_base_type(page);
 
-			VM_BUG_ON(PageActive(page));
 			ClearPageUnevictable(page);
 			del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
 			add_page_to_lru_list(page, lruvec, lru);