[SRU,B,C,1/1] UBUNTU: SAUCE: cachefiles: Page leaking in cachefiles_read_backing_file while vmscan is active

Message ID 20180920055057.3121-2-daniel.axtens@canonical.com
State New
Headers show
Series
  • [SRU,B,C,1/1] UBUNTU: SAUCE: cachefiles: Page leaking in cachefiles_read_backing_file while vmscan is active
Related show

Commit Message

Daniel Axtens Sept. 20, 2018, 5:50 a.m.
From: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>

BugLink: https://bugs.launchpad.net/bugs/1793430

[Description]
In a heavily loaded system where the system pagecache is nearing
memory limits and fscache is enabled, pages can be leaked by fscache
while trying read pages from cachefiles backend.  This can happen
because two applications can be reading same page from a single mount,
two threads can be trying to read the backing page at same time. This
results in one of the thread finding that a page for the backing file
or netfs file is already in the radix tree. During the error handling
cachefiles does not cleanup the reference on backing page, leading to
page leak.

[Fix]
The fix is straightforward, to decrement the reference when error is
encounterd.

[Testing]
I have tested the fix using following method for 12+ hrs.

1) mkdir -p /mnt/nfs ; mount -o vers=3,fsc <server_ip>:/export /mnt/nfs
2) create 10000 files of 2.8MB in a NFS mount.
3) start a thread to simulate heavy VM presssure
   (while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done)&
4) start multiple parallel reader for data set at same time
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   ..
   ..
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
5) finally check using cat /proc/fs/fscache/stats | grep -i pages ;
   free -h , cat /proc/meminfo and page-types -r -b lru
   to ensure all pages are freed.

Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Shantanu Goel <sgoel01@yahoo.com>
Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
[dja: forward ported to current upstream]
Signed-off-by: Daniel Axtens <dja@axtens.net>
[applied from
 https://www.redhat.com/archives/linux-cachefs/2018-August/msg00007.html
 This is v2 of the patch. It has sat on the list for weeks without
 any response or forward progress. v1 was first posted in 2014 and
 was reposted this August.]
Signed-off-by: Daniel Axtens <daniel.axtens@canonical.com>
---
 fs/cachefiles/rdwr.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Seth Forshee Sept. 20, 2018, 8:24 a.m. | #1
On Thu, Sep 20, 2018 at 03:50:57PM +1000, Daniel Axtens wrote:
> From: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1793430
> 
> [Description]
> In a heavily loaded system where the system pagecache is nearing
> memory limits and fscache is enabled, pages can be leaked by fscache
> while trying read pages from cachefiles backend.  This can happen
> because two applications can be reading same page from a single mount,
> two threads can be trying to read the backing page at same time. This
> results in one of the thread finding that a page for the backing file
> or netfs file is already in the radix tree. During the error handling
> cachefiles does not cleanup the reference on backing page, leading to
> page leak.
> 
> [Fix]
> The fix is straightforward, to decrement the reference when error is
> encounterd.
> 
> [Testing]
> I have tested the fix using following method for 12+ hrs.
> 
> 1) mkdir -p /mnt/nfs ; mount -o vers=3,fsc <server_ip>:/export /mnt/nfs
> 2) create 10000 files of 2.8MB in a NFS mount.
> 3) start a thread to simulate heavy VM presssure
>    (while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done)&
> 4) start multiple parallel reader for data set at same time
>    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
>    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
>    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
>    ..
>    ..
>    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
>    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
> 5) finally check using cat /proc/fs/fscache/stats | grep -i pages ;
>    free -h , cat /proc/meminfo and page-types -r -b lru
>    to ensure all pages are freed.
> 
> Reviewed-by: Daniel Axtens <dja@axtens.net>
> Signed-off-by: Shantanu Goel <sgoel01@yahoo.com>
> Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
> [dja: forward ported to current upstream]
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> [applied from
>  https://www.redhat.com/archives/linux-cachefs/2018-August/msg00007.html
>  This is v2 of the patch. It has sat on the list for weeks without
>  any response or forward progress. v1 was first posted in 2014 and
>  was reposted this August.]
> Signed-off-by: Daniel Axtens <daniel.axtens@canonical.com>

I'm on the fence about this patch. On the one hand I don't think it's
harmful and does look to legitimately fix some leaks. On the other hand
it also makes changes that don't fix any leaks, afaict. Maybe you can
convince me that I'm incorrect.

> ---
>  fs/cachefiles/rdwr.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
> index e09438612607..54d11248910c 100644
> --- a/fs/cachefiles/rdwr.c
> +++ b/fs/cachefiles/rdwr.c
> @@ -274,6 +274,8 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
>  			goto installed_new_backing_page;
>  		if (ret != -EEXIST)
>  			goto nomem_page;
> +		put_page(newpage);
> +		newpage = NULL;
>  	}

I don't think this actually fixes any leaks. On the next loop iteration
it either finds a backing page and goes to backing_page_already_present,
where the reference is dropped, or else it tries again to install
newpage as the backing page. I don't see how a reference is dropped.

On the other hand, I don't see it as being harmful, except that maybe
it as to allocate a page again when it could have used the page from the
previous iteration.

>  
>  	/* we've installed a new backing page, so now we need to start
> @@ -511,6 +513,8 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
>  				goto installed_new_backing_page;
>  			if (ret != -EEXIST)
>  				goto nomem;
> +			put_page(newpage);
> +			newpage = NULL;
>  		}

Wow, this has to be the most goto-happy function I have ever seen.

This one does look to prevent leaks.

>  
>  		/* we've installed a new backing page, so now we need
> @@ -535,7 +539,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
>  					    netpage->index, cachefiles_gfp);
>  		if (ret < 0) {
>  			if (ret == -EEXIST) {
> +				put_page(backpage);
> +				backpage = NULL;

This seems right.

>  				put_page(netpage);
> +				netpage = NULL;

Unnecessary but not harmful, possibly good for clarity's sake.

>  				fscache_retrieval_complete(op, 1);
>  				continue;
>  			}
> @@ -608,6 +615,8 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
>  					    netpage->index, cachefiles_gfp);
>  		if (ret < 0) {
>  			if (ret == -EEXIST) {
> +				put_page(backpage);
> +				backpage = NULL;

Seems correct.

>  				put_page(netpage);

Funny that netpage isn't also assigned NULL here, as it's no different
from the case above where it was.

>  				fscache_retrieval_complete(op, 1);
>  				continue;
> -- 
> 2.17.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Daniel Axtens Sept. 21, 2018, 1:06 a.m. | #2
Hi Seth,

Thanks for the review.

> I'm on the fence about this patch. On the one hand I don't think it's
> harmful and does look to legitimately fix some leaks. On the other hand
> it also makes changes that don't fix any leaks, afaict. Maybe you can
> convince me that I'm incorrect.

I'm talking with Kiran and we'll have a new version for you ASAP.

> Wow, this has to be the most goto-happy function I have ever seen.
Isn't it ever!!

Regards,
Daniel
Daniel Axtens Sept. 21, 2018, 2:59 a.m. | #3
From: Daniel Axtens <dja@axtens.net>

FYI, Kiran's response hasn't hit the list for some reason, so I've kept
it in full below. Perhaps it's stuck in moderation.


I found I was getting confused between the
cachefiles_read_backing_file_one() function, which was the first hunk of
the original patch, and cachefiles_read_backing_file(), which has the
other 3 hunks of the original patch.

(For clarity, I'm referring to this bit:

--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -274,6 +274,8 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
                        goto installed_new_backing_page;
                if (ret != -EEXIST)
                        goto nomem_page;
+               put_page(newpage);
+               newpage = NULL;
        }

)

In cachefiles_read_backing_file(), there is an equivalent section to
this already - this just brings the two functions in to alignment.

Seth's question was whether or not that would make a difference. The
case we are concerned about is what happens if you reach the end of the
loop, the find a backing page and goto backing_page_already_present.

In cachefiles_read_backing_file_one(), backing_page_already_present
starts around line 317, and one of the first things it does is check if
newpage is not NULL and puts it if so. So we don't have a leak.

On the other hand, the parallel backing_page_already_present in
cachefiles_read_backing_file() (around line 577) does *not* free newpage
if it exists. However, we also don't have a leak here as the loop in
cachefiles_read_backing_file() *does* put_page(newpage) unconditionally
at the end of the loop. Confusing!

While it would be desirable to bring these functions into alignment, we
can do that upstream rather than in a stable release. So I think we can
safely drop the changes to cachefiles_read_backing_file_one() and just
focus on cachefiles_read_backing_file().

If we're just focussing on that function, we can safely leave all 3 of
the original hunks from patch v1. I agree that some of the sets of NULL
are not strictly necessary and some are missing where you'd expect
them. Ideally I'd like to keep the delta between what we have here and
what we're hoping will go upstream to a minimum, so how does the
following look as both a patch for Ubuntu and a patch for upstream? 


diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index e09438612607..4d03a0c93eda 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -511,6 +511,8 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
                                goto installed_new_backing_page;
                        if (ret != -EEXIST)
                                goto nomem;
+                       put_page(newpage);
+                       newpage = NULL;
                }
 
                /* we've installed a new backing page, so now we need
@@ -535,7 +537,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
                                            netpage->index, cachefiles_gfp);
                if (ret < 0) {
                        if (ret == -EEXIST) {
+                               put_page(backpage);
+                               backpage = NULL;
                                put_page(netpage);
+                               netpage = NULL;
                                fscache_retrieval_complete(op, 1);
                                continue;
                        }
@@ -608,7 +613,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
                                            netpage->index, cachefiles_gfp);
                if (ret < 0) {
                        if (ret == -EEXIST) {
+                               put_page(backpage);
+                               backpage = NULL;
                                put_page(netpage);
+                               netpage = NULL;
                                fscache_retrieval_complete(op, 1);
                                continue;
                        }

If it looks OK to both of you I'll do another backport and submission.

Regards,
Daniel

kiran.modukuri@gmail.com writes:

> From: kmodukuri <kmodukuri@nvidia.com>
>
> BugLink: https://bugs.launchpad.net/bugs/1793430
>
> v2 update: Addressed Seth, Forshee's comment about the unecessary freeing of newpage in the for loop, by restricting the freeing to the case when the
> code jumps to "backing_page_already_present" in function cachefiles_read_backing_file.
>
> [Description]
> In a heavily loaded system where the system pagecache is nearing memory limits and fscache is enabled,
> pages can be leaked by fscache while trying read pages from cachefiles backend.
> This can happen because two applications can be reading same page from a single mount,
> two threads can be trying to read the backing page at same time. This results in one of the thread
> finding that a page for the backing file or netfs file is already in the radix tree. During the error
> handling cachefiles does not cleanup the reference on backing page, leading to page leak.
>
> [Fix]
> The fix is straightforward, to decrement the reference when error is encounterd.
>
> [Testing]
> I have tested the fix using following method for 12+ hrs.
>
> 1) mkdir -p /mnt/nfs ; mount -o vers=3,fsc <server_ip>:/export /mnt/nfs
> 2) create 10000 files of 2.8MB in a NFS mount.
> 3) start a thread to simulate heavy VM presssure
>    (while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done)&
> 4) start multiple parallel reader for data set at same time
>    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
>    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
>    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
>    ..
>    ..
>    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
>    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
> 5) finally check using cat /proc/fs/fscache/stats | grep -i pages ;
>    free -h , cat /proc/meminfo and page-types -r -b lru
>    to ensure all pages are freed.
>
> Signed-off-by: Shantanu Goel <sgoel01@yahoo.com>
> Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
> [dja: forward ported to current upstream]
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  fs/cachefiles/rdwr.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
> index 40f7595..b33a4b6 100644
> --- a/fs/cachefiles/rdwr.c
> +++ b/fs/cachefiles/rdwr.c
> @@ -535,7 +535,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
>  					    netpage->index, cachefiles_gfp);
>  		if (ret < 0) {
>  			if (ret == -EEXIST) {
> +				put_page(backpage);
> +				backpage = NULL;
>  				put_page(netpage);
> +				netpage = NULL;
>  				fscache_retrieval_complete(op, 1);
>  				continue;
>  			}
> @@ -570,6 +573,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
>  		/* if the backing page is already present, it can be in one of
>  		 * three states: read in progress, read failed or read okay */
>  	backing_page_already_present:
> +		if (newpage) {
> +			put_page(newpage);
> +			newpage = NULL;
> +		}
>  		_debug("- present %p", backpage);
>  
>  		if (PageError(backpage))
> @@ -608,6 +615,8 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
>  					    netpage->index, cachefiles_gfp);
>  		if (ret < 0) {
>  			if (ret == -EEXIST) {
> +				put_page(backpage);
> +				backpage = NULL;
>  				put_page(netpage);
>  				fscache_retrieval_complete(op, 1);
>  				continue;
> -- 
> 2.7.4
Seth Forshee Sept. 21, 2018, 12:51 p.m. | #4
On Fri, Sep 21, 2018 at 12:59:40PM +1000, Daniel Axtens wrote:
> From: Daniel Axtens <dja@axtens.net>
> 
> FYI, Kiran's response hasn't hit the list for some reason, so I've kept
> it in full below. Perhaps it's stuck in moderation.
> 
> 
> I found I was getting confused between the
> cachefiles_read_backing_file_one() function, which was the first hunk of
> the original patch, and cachefiles_read_backing_file(), which has the
> other 3 hunks of the original patch.
> 
> (For clarity, I'm referring to this bit:
> 
> --- a/fs/cachefiles/rdwr.c
> +++ b/fs/cachefiles/rdwr.c
> @@ -274,6 +274,8 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
>                         goto installed_new_backing_page;
>                 if (ret != -EEXIST)
>                         goto nomem_page;
> +               put_page(newpage);
> +               newpage = NULL;
>         }
> 
> )
> 
> In cachefiles_read_backing_file(), there is an equivalent section to
> this already - this just brings the two functions in to alignment.
> 
> Seth's question was whether or not that would make a difference. The
> case we are concerned about is what happens if you reach the end of the
> loop, the find a backing page and goto backing_page_already_present.
> 
> In cachefiles_read_backing_file_one(), backing_page_already_present
> starts around line 317, and one of the first things it does is check if
> newpage is not NULL and puts it if so. So we don't have a leak.
> 
> On the other hand, the parallel backing_page_already_present in
> cachefiles_read_backing_file() (around line 577) does *not* free newpage
> if it exists. However, we also don't have a leak here as the loop in
> cachefiles_read_backing_file() *does* put_page(newpage) unconditionally
> at the end of the loop. Confusing!
> 
> While it would be desirable to bring these functions into alignment, we
> can do that upstream rather than in a stable release. So I think we can
> safely drop the changes to cachefiles_read_backing_file_one() and just
> focus on cachefiles_read_backing_file().
> 
> If we're just focussing on that function, we can safely leave all 3 of
> the original hunks from patch v1. I agree that some of the sets of NULL
> are not strictly necessary and some are missing where you'd expect
> them. Ideally I'd like to keep the delta between what we have here and
> what we're hoping will go upstream to a minimum, so how does the
> following look as both a patch for Ubuntu and a patch for upstream? 

Ultimately I would prefer that we apply whatever fix ends up upstream,
within reason. There's no patch upstream yet however, so I was giving
some general feedback. If the expectation is that upstream prefers those
two loops to remain the same, then that's the version we want to
consider for Ubuntu.

Seth
Daniel Axtens Sept. 21, 2018, 1:28 p.m. | #5
Hi,

> > If we're just focussing on that function, we can safely leave all 3 of
> > the original hunks from patch v1. I agree that some of the sets of NULL
> > are not strictly necessary and some are missing where you'd expect
> > them. Ideally I'd like to keep the delta between what we have here and
> > what we're hoping will go upstream to a minimum, so how does the
> > following look as both a patch for Ubuntu and a patch for upstream?
>
> Ultimately I would prefer that we apply whatever fix ends up upstream,
> within reason. There's no patch upstream yet however, so I was giving
> some general feedback. If the expectation is that upstream prefers those
> two loops to remain the same, then that's the version we want to
> consider for Ubuntu.

Absolutely, I agree that we want to match upstream.

However, we've had a lot of trouble getting fscache patches into
upstream in any predictable sort of time-frame. This bug is affecting
people at the moment, hence the desire to submit a sauce patch while
the upstream patch is still on the list, in hopes of getting it into
the coming SRU cycle.

I will submit my proposed version both upstream and here first thing
Monday morning.

Regards,
Daniel

>
> Seth

Patch

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index e09438612607..54d11248910c 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -274,6 +274,8 @@  static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 			goto installed_new_backing_page;
 		if (ret != -EEXIST)
 			goto nomem_page;
+		put_page(newpage);
+		newpage = NULL;
 	}
 
 	/* we've installed a new backing page, so now we need to start
@@ -511,6 +513,8 @@  static int cachefiles_read_backing_file(struct cachefiles_object *object,
 				goto installed_new_backing_page;
 			if (ret != -EEXIST)
 				goto nomem;
+			put_page(newpage);
+			newpage = NULL;
 		}
 
 		/* we've installed a new backing page, so now we need
@@ -535,7 +539,10 @@  static int cachefiles_read_backing_file(struct cachefiles_object *object,
 					    netpage->index, cachefiles_gfp);
 		if (ret < 0) {
 			if (ret == -EEXIST) {
+				put_page(backpage);
+				backpage = NULL;
 				put_page(netpage);
+				netpage = NULL;
 				fscache_retrieval_complete(op, 1);
 				continue;
 			}
@@ -608,6 +615,8 @@  static int cachefiles_read_backing_file(struct cachefiles_object *object,
 					    netpage->index, cachefiles_gfp);
 		if (ret < 0) {
 			if (ret == -EEXIST) {
+				put_page(backpage);
+				backpage = NULL;
 				put_page(netpage);
 				fscache_retrieval_complete(op, 1);
 				continue;