diff mbox series

[v9,12/25] mm: Move end_index check out of readahead loop

Message ID 20200320142231.2402-13-willy@infradead.org
State Superseded
Headers show
Series Change readahead API | expand

Commit Message

Matthew Wilcox (Oracle) March 20, 2020, 2:22 p.m. UTC
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

By reducing nr_to_read, we can eliminate this check from inside the loop.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 mm/readahead.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

Comments

Eric Biggers March 20, 2020, 4:58 p.m. UTC | #1
On Fri, Mar 20, 2020 at 07:22:18AM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> By reducing nr_to_read, we can eliminate this check from inside the loop.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
> ---
>  mm/readahead.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/readahead.c b/mm/readahead.c
> index d01531ef9f3c..a37b68f66233 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -167,8 +167,6 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  		unsigned long lookahead_size)
>  {
>  	struct inode *inode = mapping->host;
> -	struct page *page;
> -	unsigned long end_index;	/* The last page we want to read */
>  	LIST_HEAD(page_pool);
>  	loff_t isize = i_size_read(inode);
>  	gfp_t gfp_mask = readahead_gfp_mask(mapping);
> @@ -178,22 +176,29 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  		._index = index,
>  	};
>  	unsigned long i;
> +	pgoff_t end_index;	/* The last page we want to read */
>  
>  	if (isize == 0)
>  		return;
>  
> -	end_index = ((isize - 1) >> PAGE_SHIFT);
> +	end_index = (isize - 1) >> PAGE_SHIFT;
> +	if (index > end_index)
> +		return;
> +	/* Avoid wrapping to the beginning of the file */
> +	if (index + nr_to_read < index)
> +		nr_to_read = ULONG_MAX - index + 1;
> +	/* Don't read past the page containing the last byte of the file */
> +	if (index + nr_to_read >= end_index)
> +		nr_to_read = end_index - index + 1;

There seem to be a couple off-by-one errors here.  Shouldn't it be:

	/* Avoid wrapping to the beginning of the file */
	if (index + nr_to_read < index)
		nr_to_read = ULONG_MAX - index;
	/* Don't read past the page containing the last byte of the file */
	if (index + nr_to_read > end_index)
		nr_to_read = end_index - index + 1;

I.e., 'ULONG_MAX - index' rather than 'ULONG_MAX - index + 1', so that
'index + nr_to_read' is then ULONG_MAX rather than overflowed to 0.

Then 'index + nr_to_read > end_index' rather 'index + nr_to_read >= end_index',
since otherwise nr_to_read can be increased by 1 rather than decreased or stay
the same as expected.

- Eric
Matthew Wilcox (Oracle) March 20, 2020, 5:30 p.m. UTC | #2
On Fri, Mar 20, 2020 at 09:58:28AM -0700, Eric Biggers wrote:
> On Fri, Mar 20, 2020 at 07:22:18AM -0700, Matthew Wilcox wrote:
> > +	/* Avoid wrapping to the beginning of the file */
> > +	if (index + nr_to_read < index)
> > +		nr_to_read = ULONG_MAX - index + 1;
> > +	/* Don't read past the page containing the last byte of the file */
> > +	if (index + nr_to_read >= end_index)
> > +		nr_to_read = end_index - index + 1;
> 
> There seem to be a couple off-by-one errors here.  Shouldn't it be:
> 
> 	/* Avoid wrapping to the beginning of the file */
> 	if (index + nr_to_read < index)
> 		nr_to_read = ULONG_MAX - index;

I think it's right.  Imagine that index is ULONG_MAX.  We should read one
page (the one at ULONG_MAX).  That would be ULONG_MAX - ULONG_MAX + 1.

> 	/* Don't read past the page containing the last byte of the file */
> 	if (index + nr_to_read > end_index)
> 		nr_to_read = end_index - index + 1;
> 
> I.e., 'ULONG_MAX - index' rather than 'ULONG_MAX - index + 1', so that
> 'index + nr_to_read' is then ULONG_MAX rather than overflowed to 0.
> 
> Then 'index + nr_to_read > end_index' rather 'index + nr_to_read >= end_index',
> since otherwise nr_to_read can be increased by 1 rather than decreased or stay
> the same as expected.

Ooh, I missed the overflow case here.  It should be:

+	if (index + nr_to_read - 1 > end_index)
+		nr_to_read = end_index - index + 1;

Let's say index comes in at ULONG_MAX - 2, end_index is ULONG_MAX - 1
and nr_to_read is 8.  The first condition triggers and nr_to_read is
reduced to 3.  But then the second condition wouldn't trigger because
ULONG_MAX - 2 + 3 is 0.

With the rewrite I have in this message, ULONG_MAX - 2 + 3 - 1 is ULONG_MAX,
which is > ULONG_MAX - 1.  So the condition triggers and nr_to_read becomes
(ULONG_MAX - 1) - (ULONG_MAX - 2) + 1.  Which is -1 + 2 + 1, which is 2.
Which is the right answer because we want to read two pages; the one
at ULONG_MAX - 2 and the one at ULONG_MAX - 1.

Thank you!
Eric Biggers March 20, 2020, 6 p.m. UTC | #3
On Fri, Mar 20, 2020 at 10:30:40AM -0700, Matthew Wilcox wrote:
> On Fri, Mar 20, 2020 at 09:58:28AM -0700, Eric Biggers wrote:
> > On Fri, Mar 20, 2020 at 07:22:18AM -0700, Matthew Wilcox wrote:
> > > +	/* Avoid wrapping to the beginning of the file */
> > > +	if (index + nr_to_read < index)
> > > +		nr_to_read = ULONG_MAX - index + 1;
> > > +	/* Don't read past the page containing the last byte of the file */
> > > +	if (index + nr_to_read >= end_index)
> > > +		nr_to_read = end_index - index + 1;
> > 
> > There seem to be a couple off-by-one errors here.  Shouldn't it be:
> > 
> > 	/* Avoid wrapping to the beginning of the file */
> > 	if (index + nr_to_read < index)
> > 		nr_to_read = ULONG_MAX - index;
> 
> I think it's right.  Imagine that index is ULONG_MAX.  We should read one
> page (the one at ULONG_MAX).  That would be ULONG_MAX - ULONG_MAX + 1.
> 
> > 	/* Don't read past the page containing the last byte of the file */
> > 	if (index + nr_to_read > end_index)
> > 		nr_to_read = end_index - index + 1;
> > 
> > I.e., 'ULONG_MAX - index' rather than 'ULONG_MAX - index + 1', so that
> > 'index + nr_to_read' is then ULONG_MAX rather than overflowed to 0.
> > 
> > Then 'index + nr_to_read > end_index' rather 'index + nr_to_read >= end_index',
> > since otherwise nr_to_read can be increased by 1 rather than decreased or stay
> > the same as expected.
> 
> Ooh, I missed the overflow case here.  It should be:
> 
> +	if (index + nr_to_read - 1 > end_index)
> +		nr_to_read = end_index - index + 1;
> 

But then if someone passes index=0 and nr_to_read=0, this underflows and the
entire file gets read.

The page cache isn't actually supposed to contain a page at index ULONG_MAX,
since MAX_LFS_FILESIZE is at most ((loff_t)ULONG_MAX << PAGE_SHIFT), right?  So
I don't think we need to worry about reading the page with index ULONG_MAX.
I.e. I think it's fine to limit nr_to_read to 'ULONG_MAX - index', if that makes
it easier to avoid an overflow or underflow in the next check.

- Eric
Matthew Wilcox (Oracle) March 20, 2020, 6:11 p.m. UTC | #4
On Fri, Mar 20, 2020 at 11:00:17AM -0700, Eric Biggers wrote:
> On Fri, Mar 20, 2020 at 10:30:40AM -0700, Matthew Wilcox wrote:
> > On Fri, Mar 20, 2020 at 09:58:28AM -0700, Eric Biggers wrote:
> > > On Fri, Mar 20, 2020 at 07:22:18AM -0700, Matthew Wilcox wrote:
> > > > +	/* Avoid wrapping to the beginning of the file */
> > > > +	if (index + nr_to_read < index)
> > > > +		nr_to_read = ULONG_MAX - index + 1;
> > > > +	/* Don't read past the page containing the last byte of the file */
> > > > +	if (index + nr_to_read >= end_index)
> > > > +		nr_to_read = end_index - index + 1;
> > > 
> > > There seem to be a couple off-by-one errors here.  Shouldn't it be:
> > > 
> > > 	/* Avoid wrapping to the beginning of the file */
> > > 	if (index + nr_to_read < index)
> > > 		nr_to_read = ULONG_MAX - index;
> > 
> > I think it's right.  Imagine that index is ULONG_MAX.  We should read one
> > page (the one at ULONG_MAX).  That would be ULONG_MAX - ULONG_MAX + 1.
> > 
> > > 	/* Don't read past the page containing the last byte of the file */
> > > 	if (index + nr_to_read > end_index)
> > > 		nr_to_read = end_index - index + 1;
> > > 
> > > I.e., 'ULONG_MAX - index' rather than 'ULONG_MAX - index + 1', so that
> > > 'index + nr_to_read' is then ULONG_MAX rather than overflowed to 0.
> > > 
> > > Then 'index + nr_to_read > end_index' rather 'index + nr_to_read >= end_index',
> > > since otherwise nr_to_read can be increased by 1 rather than decreased or stay
> > > the same as expected.
> > 
> > Ooh, I missed the overflow case here.  It should be:
> > 
> > +	if (index + nr_to_read - 1 > end_index)
> > +		nr_to_read = end_index - index + 1;
> > 
> 
> But then if someone passes index=0 and nr_to_read=0, this underflows and the
> entire file gets read.

nr_to_read == 0 doesn't make sense ... I thought we filtered that out
earlier, but I can't find anywhere that does that right now.  I'd
rather return early from __do_page_cache_readahead() to fix that.

> The page cache isn't actually supposed to contain a page at index ULONG_MAX,
> since MAX_LFS_FILESIZE is at most ((loff_t)ULONG_MAX << PAGE_SHIFT), right?  So
> I don't think we need to worry about reading the page with index ULONG_MAX.
> I.e. I think it's fine to limit nr_to_read to 'ULONG_MAX - index', if that makes
> it easier to avoid an overflow or underflow in the next check.

I think we can get a page at ULONG_MAX on 32-bit systems?  I mean, we can buy
hard drives which are larger than 16TiB these days:
https://www.pcmag.com/news/seagate-will-ship-18tb-and-20tb-hard-drives-in-2020
(even ignoring RAID devices)
Eric Biggers March 20, 2020, 6:24 p.m. UTC | #5
On Fri, Mar 20, 2020 at 11:11:32AM -0700, Matthew Wilcox wrote:
> On Fri, Mar 20, 2020 at 11:00:17AM -0700, Eric Biggers wrote:
> > On Fri, Mar 20, 2020 at 10:30:40AM -0700, Matthew Wilcox wrote:
> > > On Fri, Mar 20, 2020 at 09:58:28AM -0700, Eric Biggers wrote:
> > > > On Fri, Mar 20, 2020 at 07:22:18AM -0700, Matthew Wilcox wrote:
> > > > > +	/* Avoid wrapping to the beginning of the file */
> > > > > +	if (index + nr_to_read < index)
> > > > > +		nr_to_read = ULONG_MAX - index + 1;
> > > > > +	/* Don't read past the page containing the last byte of the file */
> > > > > +	if (index + nr_to_read >= end_index)
> > > > > +		nr_to_read = end_index - index + 1;
> > > > 
> > > > There seem to be a couple off-by-one errors here.  Shouldn't it be:
> > > > 
> > > > 	/* Avoid wrapping to the beginning of the file */
> > > > 	if (index + nr_to_read < index)
> > > > 		nr_to_read = ULONG_MAX - index;
> > > 
> > > I think it's right.  Imagine that index is ULONG_MAX.  We should read one
> > > page (the one at ULONG_MAX).  That would be ULONG_MAX - ULONG_MAX + 1.
> > > 
> > > > 	/* Don't read past the page containing the last byte of the file */
> > > > 	if (index + nr_to_read > end_index)
> > > > 		nr_to_read = end_index - index + 1;
> > > > 
> > > > I.e., 'ULONG_MAX - index' rather than 'ULONG_MAX - index + 1', so that
> > > > 'index + nr_to_read' is then ULONG_MAX rather than overflowed to 0.
> > > > 
> > > > Then 'index + nr_to_read > end_index' rather 'index + nr_to_read >= end_index',
> > > > since otherwise nr_to_read can be increased by 1 rather than decreased or stay
> > > > the same as expected.
> > > 
> > > Ooh, I missed the overflow case here.  It should be:
> > > 
> > > +	if (index + nr_to_read - 1 > end_index)
> > > +		nr_to_read = end_index - index + 1;
> > > 
> > 
> > But then if someone passes index=0 and nr_to_read=0, this underflows and the
> > entire file gets read.
> 
> nr_to_read == 0 doesn't make sense ... I thought we filtered that out
> earlier, but I can't find anywhere that does that right now.  I'd
> rather return early from __do_page_cache_readahead() to fix that.
> 
> > The page cache isn't actually supposed to contain a page at index ULONG_MAX,
> > since MAX_LFS_FILESIZE is at most ((loff_t)ULONG_MAX << PAGE_SHIFT), right?  So
> > I don't think we need to worry about reading the page with index ULONG_MAX.
> > I.e. I think it's fine to limit nr_to_read to 'ULONG_MAX - index', if that makes
> > it easier to avoid an overflow or underflow in the next check.
> 
> I think we can get a page at ULONG_MAX on 32-bit systems?  I mean, we can buy
> hard drives which are larger than 16TiB these days:
> https://www.pcmag.com/news/seagate-will-ship-18tb-and-20tb-hard-drives-in-2020
> (even ignoring RAID devices)

The max file size is ((loff_t)ULONG_MAX << PAGE_SHIFT) which means the maximum
page *index* is ULONG_MAX - 1, not ULONG_MAX.

Anyway, I think we may be making this much too complicated.  How about just:

	pgoff_t i_nrpages = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);

	if (index >= i_nrpages)
		return;
	/* Don't read past the end of the file */
	nr_to_read = min(nr_to_read, i_nrpages - index);

That's 2 branches instead of 4.  (Note that assigning to i_nrpages can't
overflow, since the max number of pages is ULONG_MAX not ULONG_MAX + 1.)

- Eric
Matthew Wilcox (Oracle) March 22, 2020, 4:28 p.m. UTC | #6
On Fri, Mar 20, 2020 at 11:24:52AM -0700, Eric Biggers wrote:
> On Fri, Mar 20, 2020 at 11:11:32AM -0700, Matthew Wilcox wrote:
> > On Fri, Mar 20, 2020 at 11:00:17AM -0700, Eric Biggers wrote:
> > > But then if someone passes index=0 and nr_to_read=0, this underflows and the
> > > entire file gets read.
> > 
> > nr_to_read == 0 doesn't make sense ... I thought we filtered that out
> > earlier, but I can't find anywhere that does that right now.  I'd
> > rather return early from __do_page_cache_readahead() to fix that.
> > 
> > > The page cache isn't actually supposed to contain a page at index ULONG_MAX,
> > > since MAX_LFS_FILESIZE is at most ((loff_t)ULONG_MAX << PAGE_SHIFT), right?  So
> > > I don't think we need to worry about reading the page with index ULONG_MAX.
> > > I.e. I think it's fine to limit nr_to_read to 'ULONG_MAX - index', if that makes
> > > it easier to avoid an overflow or underflow in the next check.
> > 
> > I think we can get a page at ULONG_MAX on 32-bit systems?  I mean, we can buy
> > hard drives which are larger than 16TiB these days:
> > https://www.pcmag.com/news/seagate-will-ship-18tb-and-20tb-hard-drives-in-2020
> > (even ignoring RAID devices)
> 
> The max file size is ((loff_t)ULONG_MAX << PAGE_SHIFT) which means the maximum
> page *index* is ULONG_MAX - 1, not ULONG_MAX.

I see where we set that for _files_ ... I can't find anywhere that we prevent
i_size getting that big for block devices.  Maybe I'm missing something.

> Anyway, I think we may be making this much too complicated.  How about just:
> 
> 	pgoff_t i_nrpages = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> 
> 	if (index >= i_nrpages)
> 		return;
> 	/* Don't read past the end of the file */
> 	nr_to_read = min(nr_to_read, i_nrpages - index);
> 
> That's 2 branches instead of 4.  (Note that assigning to i_nrpages can't
> overflow, since the max number of pages is ULONG_MAX not ULONG_MAX + 1.)

I like where you're going with this.  Just to be on the safe side, I'd
prefer to do this:

@@ -266,11 +266,8 @@ void __do_page_cache_readahead(struct address_space *mapping,
        end_index = (isize - 1) >> PAGE_SHIFT;
        if (index > end_index)
                return;
-       /* Avoid wrapping to the beginning of the file */
-       if (index + nr_to_read < index)
-               nr_to_read = ULONG_MAX - index + 1;
        /* Don't read past the page containing the last byte of the file */
-       if (index + nr_to_read >= end_index)
+       if (nr_to_read > end_index - index)
                nr_to_read = end_index - index + 1;
 
        page_cache_readahead_unbounded(mapping, file, index, nr_to_read,

end_index - index + 1 could only overflow if end_index is ULONG_MAX
and index is 0.  But if end_index is ULONG_MAX and index is 0, then
nr_to_read is necessarily <= ULONG_MAX, so the condition is false.
And if nr_to_read is 0, then the condition is also false, so it won't
increase nr_to_read from 0 to 1.  It might assign x to nr_to_read when
nr_to_read is already x, but that's harmless.

Thanks!
diff mbox series

Patch

diff --git a/mm/readahead.c b/mm/readahead.c
index d01531ef9f3c..a37b68f66233 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -167,8 +167,6 @@  void __do_page_cache_readahead(struct address_space *mapping,
 		unsigned long lookahead_size)
 {
 	struct inode *inode = mapping->host;
-	struct page *page;
-	unsigned long end_index;	/* The last page we want to read */
 	LIST_HEAD(page_pool);
 	loff_t isize = i_size_read(inode);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
@@ -178,22 +176,29 @@  void __do_page_cache_readahead(struct address_space *mapping,
 		._index = index,
 	};
 	unsigned long i;
+	pgoff_t end_index;	/* The last page we want to read */
 
 	if (isize == 0)
 		return;
 
-	end_index = ((isize - 1) >> PAGE_SHIFT);
+	end_index = (isize - 1) >> PAGE_SHIFT;
+	if (index > end_index)
+		return;
+	/* Avoid wrapping to the beginning of the file */
+	if (index + nr_to_read < index)
+		nr_to_read = ULONG_MAX - index + 1;
+	/* Don't read past the page containing the last byte of the file */
+	if (index + nr_to_read >= end_index)
+		nr_to_read = end_index - index + 1;
 
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
 	for (i = 0; i < nr_to_read; i++) {
-		if (index + i > end_index)
-			break;
+		struct page *page = xa_load(&mapping->i_pages, index + i);
 
 		BUG_ON(index + i != rac._index + rac._nr_pages);
 
-		page = xa_load(&mapping->i_pages, index + i);
 		if (page && !xa_is_value(page)) {
 			/*
 			 * Page already present?  Kick off the current batch of