diff mbox series

[RFC,V3,07/12] mpage_readpage[s]: Introduce post process callback parameters

Message ID 20180522160110.1161-8-chandan@linux.vnet.ibm.com
State Changes Requested
Headers show
Series Ext4 encryption support for blocksize < pagesize | expand

Commit Message

Chandan Rajendra May 22, 2018, 4:01 p.m. UTC
This commit introduces a new parameter to mpage_readpage[s]()
functions. This parameter contains pointers to functions that can be
used to decrypt data read from the backing device. These are stored in
the fscrypt_ctx structure and one of these functions is invoked after
the read operation is completed.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/block_dev.c                  |   5 +-
 fs/buffer.c                     | 298 ++++++++++++++++++++++++----------------
 fs/crypto/bio.c                 |  95 ++++++++++++-
 fs/crypto/crypto.c              |   2 +
 fs/ext2/inode.c                 |   4 +-
 fs/ext4/Makefile                |   2 +-
 fs/ext4/inode.c                 |  13 +-
 fs/ext4/readpage.c              | 294 ---------------------------------------
 fs/fat/inode.c                  |   4 +-
 fs/isofs/inode.c                |   5 +-
 fs/mpage.c                      |  48 +++++--
 fs/xfs/xfs_aops.c               |   4 +-
 include/linux/buffer_head.h     |   2 +-
 include/linux/fs.h              |   4 +
 include/linux/fscrypt_notsupp.h |  37 ++++-
 include/linux/fscrypt_supp.h    |  13 +-
 include/linux/mpage.h           |   6 +-
 17 files changed, 392 insertions(+), 444 deletions(-)
 delete mode 100644 fs/ext4/readpage.c

Comments

Theodore Ts'o May 25, 2018, 8:01 p.m. UTC | #1
On Tue, May 22, 2018 at 09:31:05PM +0530, Chandan Rajendra wrote:
> This commit introduces a new parameter to mpage_readpage[s]()
> functions. This parameter contains pointers to functions that can be
> used to decrypt data read from the backing device. These are stored in
> the fscrypt_ctx structure and one of these functions is invoked after
> the read operation is completed.
> 
> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>

Can you describe more of what you are doing here; specifically, you
deleted all of fs/ext4/readpage.c --- was this because you moved
functionality back into fs/mpage.c?  Did you make sure all of the
local changes in fs/ext4/readpage was moved back to fs/mpage.c?

If the goal is to refactor code to remove the need for
fs/ext4/readpage.c, you should probably make that be the first patch
as a prerequisite patch.  And we then need to make sure we don't
accidentally break anyone else who might be using fs/mpage.c.  Saying
a bit more about why you think the refactor is a good thing would also
be useful.

	     	   	       	   	    	  - Ted

P.S.  What version of the kernel was these patches against?  I noticed
the patches weren't applying cleanly to the ext4 git tree.  Given how
invasive these patches are, it's not surprising that they are very
version-sensitive.
Chandan Rajendra May 28, 2018, 5:35 a.m. UTC | #2
On Saturday, May 26, 2018 1:31:21 AM IST Theodore Y. Ts'o wrote:
> On Tue, May 22, 2018 at 09:31:05PM +0530, Chandan Rajendra wrote:
> > This commit introduces a new parameter to mpage_readpage[s]()
> > functions. This parameter contains pointers to functions that can be
> > used to decrypt data read from the backing device. These are stored in
> > the fscrypt_ctx structure and one of these functions is invoked after
> > the read operation is completed.
> > 
> > Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
> 
> Can you describe more of what you are doing here; specifically, you
> deleted all of fs/ext4/readpage.c --- was this because you moved
> functionality back into fs/mpage.c?  Did you make sure all of the
> local changes in fs/ext4/readpage was moved back to fs/mpage.c?
> 
> If the goal is to refactor code to remove the need for
> fs/ext4/readpage.c, you should probably make that be the first patch
> as a prerequisite patch.  And we then need to make sure we don't
> accidentally break anyone else who might be using fs/mpage.c.  Saying
> a bit more about why you think the refactor is a good thing would also
> be useful.

I will split this patch into two as suggested by you. Also, I will update 
the commit messages.

The patchset was based on next-20180503 tree. You can obtain it from
"https://github.com/chandanr/linux.git ext4-encryption"
Theodore Ts'o May 28, 2018, 7:34 p.m. UTC | #3
On Mon, May 28, 2018 at 11:05:52AM +0530, Chandan Rajendra wrote:
> > Can you describe more of what you are doing here; specifically, you
> > deleted all of fs/ext4/readpage.c --- was this because you moved
> > functionality back into fs/mpage.c?  Did you make sure all of the
> > local changes in fs/ext4/readpage was moved back to fs/mpage.c?
> > 
> > If the goal is to refactor code to remove the need for
> > fs/ext4/readpage.c, you should probably make that be the first patch
> > as a prerequisite patch.  And we then need to make sure we don't
> > accidentally break anyone else who might be using fs/mpage.c.  Saying
> > a bit more about why you think the refactor is a good thing would also
> > be useful.
> 
> I will split this patch into two as suggested by you. Also, I will update 
> the commit messages.

Note that I was planning on making changes to fs/ext4/readpage.c as
part of integrating fsverity[1][2] support into ext4.  Basically, I
need to do something like [3] to fs/ext4/readpage.c.

[1] https://www.spinics.net/lists/linux-fsdevel/msg121182.html
[2] https://www.youtube.com/watch?v=GlEWcVuRbNA
[3] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/commit/?h=fs-verity-dev&id=827faba05972517f49fa2f2aaf272150f5766af2

Which is why I'm really interested in your reasoning for why you
propose to drop fs/ext4/readpage.c.  :-)

						- Ted
Chandan Rajendra May 29, 2018, 3:04 a.m. UTC | #4
On Tuesday, May 29, 2018 1:04:37 AM IST Theodore Y. Ts'o wrote:
> On Mon, May 28, 2018 at 11:05:52AM +0530, Chandan Rajendra wrote:
> > > Can you describe more of what you are doing here; specifically, you
> > > deleted all of fs/ext4/readpage.c --- was this because you moved
> > > functionality back into fs/mpage.c?  Did you make sure all of the
> > > local changes in fs/ext4/readpage was moved back to fs/mpage.c?
> > > 
> > > If the goal is to refactor code to remove the need for
> > > fs/ext4/readpage.c, you should probably make that be the first patch
> > > as a prerequisite patch.  And we then need to make sure we don't
> > > accidentally break anyone else who might be using fs/mpage.c.  Saying
> > > a bit more about why you think the refactor is a good thing would also
> > > be useful.
> > 
> > I will split this patch into two as suggested by you. Also, I will update 
> > the commit messages.
> 
> Note that I was planning on making changes to fs/ext4/readpage.c as
> part of integrating fsverity[1][2] support into ext4.  Basically, I
> need to do something like [3] to fs/ext4/readpage.c.
> 
> [1] https://www.spinics.net/lists/linux-fsdevel/msg121182.html
> [2] https://www.youtube.com/watch?v=GlEWcVuRbNA
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/commit/?h=fs-verity-dev&id=827faba05972517f49fa2f2aaf272150f5766af2
> 
> Which is why I'm really interested in your reasoning for why you
> propose to drop fs/ext4/readpage.c.  :-)
> 

The first patchset to support encryption in subpage-blocksize scenario copied
the block_read_full_page() from fs/buffer.c to ext4/readpage.c and had made
changes required to support encryption in that function. However, the
conclusion was to not create copies of existing code but rather add support
for decryption inside generic mpage_readpage[s] functions. Hence this patchset
implements the required decryption logic in the generic mpage_readpage[s]
functions. Since this makes the code in ext4/readpage.c redundant, I had
decided to delete the ext4/readpage.c.
Eric Biggers May 29, 2018, 5:53 p.m. UTC | #5
Hi Chandan,

On Tue, May 29, 2018 at 08:34:21AM +0530, Chandan Rajendra wrote:
> On Tuesday, May 29, 2018 1:04:37 AM IST Theodore Y. Ts'o wrote:
> > On Mon, May 28, 2018 at 11:05:52AM +0530, Chandan Rajendra wrote:
> > > > Can you describe more of what you are doing here; specifically, you
> > > > deleted all of fs/ext4/readpage.c --- was this because you moved
> > > > functionality back into fs/mpage.c?  Did you make sure all of the
> > > > local changes in fs/ext4/readpage was moved back to fs/mpage.c?
> > > > 
> > > > If the goal is to refactor code to remove the need for
> > > > fs/ext4/readpage.c, you should probably make that be the first patch
> > > > as a prerequisite patch.  And we then need to make sure we don't
> > > > accidentally break anyone else who might be using fs/mpage.c.  Saying
> > > > a bit more about why you think the refactor is a good thing would also
> > > > be useful.
> > > 
> > > I will split this patch into two as suggested by you. Also, I will update 
> > > the commit messages.
> > 
> > Note that I was planning on making changes to fs/ext4/readpage.c as
> > part of integrating fsverity[1][2] support into ext4.  Basically, I
> > need to do something like [3] to fs/ext4/readpage.c.
> > 
> > [1] https://www.spinics.net/lists/linux-fsdevel/msg121182.html
> > [2] https://www.youtube.com/watch?v=GlEWcVuRbNA
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/commit/?h=fs-verity-dev&id=827faba05972517f49fa2f2aaf272150f5766af2
> > 
> > Which is why I'm really interested in your reasoning for why you
> > propose to drop fs/ext4/readpage.c.  :-)
> > 
> 
> The first patchset to support encryption in subpage-blocksize scenario copied
> the block_read_full_page() from fs/buffer.c to ext4/readpage.c and had made
> changes required to support encryption in that function. However, the
> conclusion was to not create copies of existing code but rather add support
> for decryption inside generic mpage_readpage[s] functions. Hence this patchset
> implements the required decryption logic in the generic mpage_readpage[s]
> functions. Since this makes the code in ext4/readpage.c redundant, I had
> decided to delete the ext4/readpage.c.
> 

Strictly speaking, I don't think anything has been "concluded" yet.  The issue,
as I saw it, was that your original patchset just copy-and-pasted lots more
generic code from fs/buffer.c into ext4, without consideration of alternatives
that would allow the code to be shared, such as adding a postprocessing callback
to mpage_readpage{,s}().  My hope was that you would thoughtfully consider the
alternatives and make a decision of what was the best solution, and then explain
that decision as part of your patchset -- not just implement some solution
without much explanation, which makes it very difficult for people to decide
whether it's the best solution or not.

And yes, now that fs-verity is planned to be a thing too, we should stop
thinking of the problem as specifically "how to support decryption", but rather
how to support the ability to post-process the data using potentially multiple
length-preserving postprocessing steps such as decryption,
integrity/authenticity verification, etc.

I'll take a closer look at this patch when I have a chance, but as Ted pointed
out it really needs to be split out into multiple patches.  Just as a
preliminary comment, it looks like you are directly calling into fs/crypto/ from
fs/buffer.c, e.g. fscrypt_enqueue_decrypt_bio().  I don't understand that.  If
you're doing that (which would start requiring that fscrypt be built-in, not
modular) then there should be no need for the filesystem to pass a
postprocessing callback to the generic code, as you could just check
S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode) in generic code to tell whether
decryption needs to be done.  The whole point of the postprocessing callback
would be to allow the generic read code to be used without it having to be aware
of all the specific types of post-read processing that filesystems may want.

Thanks!

- Eric
Chandan Rajendra May 30, 2018, 3:09 a.m. UTC | #6
On Tuesday, May 29, 2018 11:23:17 PM IST Eric Biggers wrote:
> Hi Chandan,
> 
> On Tue, May 29, 2018 at 08:34:21AM +0530, Chandan Rajendra wrote:
> > On Tuesday, May 29, 2018 1:04:37 AM IST Theodore Y. Ts'o wrote:
> > > On Mon, May 28, 2018 at 11:05:52AM +0530, Chandan Rajendra wrote:
> > > > > Can you describe more of what you are doing here; specifically, you
> > > > > deleted all of fs/ext4/readpage.c --- was this because you moved
> > > > > functionality back into fs/mpage.c?  Did you make sure all of the
> > > > > local changes in fs/ext4/readpage was moved back to fs/mpage.c?
> > > > > 
> > > > > If the goal is to refactor code to remove the need for
> > > > > fs/ext4/readpage.c, you should probably make that be the first patch
> > > > > as a prerequisite patch.  And we then need to make sure we don't
> > > > > accidentally break anyone else who might be using fs/mpage.c.  Saying
> > > > > a bit more about why you think the refactor is a good thing would also
> > > > > be useful.
> > > > 
> > > > I will split this patch into two as suggested by you. Also, I will update 
> > > > the commit messages.
> > > 
> > > Note that I was planning on making changes to fs/ext4/readpage.c as
> > > part of integrating fsverity[1][2] support into ext4.  Basically, I
> > > need to do something like [3] to fs/ext4/readpage.c.
> > > 
> > > [1] https://www.spinics.net/lists/linux-fsdevel/msg121182.html
> > > [2] https://www.youtube.com/watch?v=GlEWcVuRbNA
> > > [3] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/commit/?h=fs-verity-dev&id=827faba05972517f49fa2f2aaf272150f5766af2
> > > 
> > > Which is why I'm really interested in your reasoning for why you
> > > propose to drop fs/ext4/readpage.c.  :-)
> > > 
> > 
> > The first patchset to support encryption in subpage-blocksize scenario copied
> > the block_read_full_page() from fs/buffer.c to ext4/readpage.c and had made
> > changes required to support encryption in that function. However, the
> > conclusion was to not create copies of existing code but rather add support
> > for decryption inside generic mpage_readpage[s] functions. Hence this patchset
> > implements the required decryption logic in the generic mpage_readpage[s]
> > functions. Since this makes the code in ext4/readpage.c redundant, I had
> > decided to delete the ext4/readpage.c.
> > 
> 
> Strictly speaking, I don't think anything has been "concluded" yet.  The issue,
> as I saw it, was that your original patchset just copy-and-pasted lots more
> generic code from fs/buffer.c into ext4, without consideration of alternatives
> that would allow the code to be shared, such as adding a postprocessing callback
> to mpage_readpage{,s}().  My hope was that you would thoughtfully consider the
> alternatives and make a decision of what was the best solution, and then explain
> that decision as part of your patchset -- not just implement some solution
> without much explanation, which makes it very difficult for people to decide
> whether it's the best solution or not.
> 
> And yes, now that fs-verity is planned to be a thing too, we should stop
> thinking of the problem as specifically "how to support decryption", but rather
> how to support the ability to post-process the data using potentially multiple
> length-preserving postprocessing steps such as decryption,
> integrity/authenticity verification, etc.
> 
> I'll take a closer look at this patch when I have a chance, but as Ted pointed
> out it really needs to be split out into multiple patches.  Just as a
> preliminary comment, it looks like you are directly calling into fs/crypto/ from
> fs/buffer.c, e.g. fscrypt_enqueue_decrypt_bio().  I don't understand that.  If
> you're doing that (which would start requiring that fscrypt be built-in, not
> modular) then there should be no need for the filesystem to pass a
> postprocessing callback to the generic code, as you could just check
> S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode) in generic code to tell whether
> decryption needs to be done.  The whole point of the postprocessing callback
> would be to allow the generic read code to be used without it having to be aware
> of all the specific types of post-read processing that filesystems may want.

Hi Eric,

I had misunderstood the requirement. Sorry about that. I had written the
patchset in its current form with the understanding that fs/buffer.c and
fs/ext4/*.c would need to get compiled even when fscrypt code isn't compiled
at all. When the fscrypt module isn't selected for build in the kernel config,
calls to fscrypt_*() functions would end up calling the equivalent nop
functions in fscrypt_notsupp.h file.

For the generic code to be completely unaware of several stages of "post
processing" functionality, I would most likely have to add more callback
pointers into the newly introduced "struct post_process_read" structure. I
will work on this and post the results in the next version of the patchset.
Theodore Ts'o May 30, 2018, 5:06 a.m. UTC | #7
Note: we may want to move this thread so that it's on linux-fscrypt
exclusively.  I'm thinking that we should consider using linux-fscrypt
for fscrypt and fsverity discussions.  That way we can avoid adding
extra noise to the linux-fsdevel and linux-ext4 lists.  Comments?

On Wed, May 30, 2018 at 08:39:11AM +0530, Chandan Rajendra wrote:
> 
> I had misunderstood the requirement. Sorry about that. I had written the
> patchset in its current form with the understanding that fs/buffer.c and
> fs/ext4/*.c would need to get compiled even when fscrypt code isn't compiled
> at all. When the fscrypt module isn't selected for build in the kernel config,
> calls to fscrypt_*() functions would end up calling the equivalent nop
> functions in fscrypt_notsupp.h file.
> 
> For the generic code to be completely unaware of several stages of "post
> processinhg" functionality, I would most likely have to add more callback
> pointers into the newly introduced "struct post_process_read" structure. 

It's still a work in progress, but I have an initial integration of
ext4 with fsverity.  See the fsverity branch on ext4.git, and in
particular, the changes made to fs/ext4/readpage.c.

Let's be clear that neither Eric and I are completely happy with how
the fsveirty post-read processing is being done.  What's there right
now is a place-holder so we can continue to development/debug the
other aspects of fsverity.  In particular, we're aware that there is
code duplication between code in fs/f2fs/data.c and fs/ext4/readpage.c

One of the things which is a bit tricky right now is that fscrypt and
fsverity can be enabled on a per-file system basis.  That is, there are
separate CONFIG options:

   * CONFIG_EXT4_FS_ENCRYPTION
   * CONFIG_F2FS_FS_ENCRYPTION
   * CONFIG_EXT4_FS_VERITY
   * CONFIG_F2FS_FS_VERITY

And in each file system, you have to do this before including the header files:

#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
#include <linux/fscrypt.h>

#define __FS_HAS_VERITY IS_ENABLED(CONFIG_EXT4_FS_VERITY)
#include <linux/fsverity.ha>

That's because whether you get the function prototype or an inline
function which returns EOhPNOTSUPP for functions such as
fscrypt_decrypt_bio() depends on how __FS_HAS_ENCRYPTION is defined
before including linux/fscrypt.h.

I now think this was a mistake, and that we should handle this the
same way we handle CONFIG_QUOTA.  If we enable fscrypt or fsverity, it
should be enabled for all file sytems which support that feature.
Otherwise it becomes too hard to try to support this in a file system
independent way --- and we end up having code which is cut-and-pasted
for each file system (e.g., in fs/f2fs/data.c and fs/ext4/readpage.c)
but which may compile to something quite different thanks to the C
preprocessor magic which is going on at the moment.

> I will work on this and post the results in the next version of the patchset.

So in order to avoid your wasting time and energy, and perhaps getting
unnecessarily frustrated, I'd recommend that before you immediately
start diving into implementation, that we have a design discussion
about the best way to proceed.  And then when we have a common
agreement about how to proceed, let's get something upstream first
which changes the infrastructure used by the file systems and by
fscrypt first.  And let's get that working, *before* we start
integrating the changes for supporting fscrypt for 4k blocks for
systems with 32k pagesize, or before we start making final (e.g.,
non-prototype) changes to integrate fsverity.

I'll include my first attempt that I played around over the weekend
with in terms of generic infrastructure, before I realized that we
need to decide whether the current way we configure fscrypt and
fsverity makes sense or not.  It's an example of why we should have
design discussions first, and not just immediately start diving into
code.

Fortunately (perhaps because I've some experience with these sorts of
things) I only spent about an hour or so working on the prototype, and
none trying to integrate it and doing all of the testing and
debugging, before recognizing that we needed to have a meeting of the
minds about design and requirements first.  :-)

Cheers,

						- Ted

------------- include/linux/bio_post_read.h

#ifndef _LINUX_BIO_POST_READ_H
#define _LINUX_BIO_POST_READ_H

#include <linux/bio.h>

enum bio_post_read_step {
	STEP_INITIAL = 0,
	STEP_DECRYPT,
	STEP_VERITY,
};

struct bio_post_read_ctx {
	struct bio *bio;
	struct work_struct work;
	unsigned int cur_step;
	unsigned int enabled_steps;
};

static inline bool bpr_required(struct bio *bio)
{
	return bio->bi_private && !bio->bi_status;
}

extern void __bpr_read_end_io(struct bio *bio);
void bpr_do_processing(struct bio_post_read_ctx *ctx);

#endif /* _LINUX_BIO_POST_READ_H */

------------- fs/bio_post_read.c

// SPDX-License-Identifier: GPL-2.0
/*
 * fs/bio_post_read.c
 *
 * Copyright (C) 2018, Google LLC
 *
 * Contains helper functions used by file systems which use fscrypt
 * and/or fsverity
 */

#include <linux/mempool.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/bio_post_read.h>
#include <linux/pagemap.h>
#include <linux/fscrypt.h>
#include <linux/fsverity.h>

static unsigned int num_prealloc_post_read_ctxs = 128;

module_param(num_prealloc_post_read_ctxs, uint, 0444);
MODULE_PARM_DESC(num_prealloc_post_read_ctxs,
		"Number of bio_post_read contexts to preallocate");

static struct kmem_cache *bpr_ctx_cache;
static mempool_t *bpr_ctx_pool;

void __bpr_read_end_io(struct bio *bio)
{
	struct page *page;
	struct bio_vec *bv;
	int i;

	bio_for_each_segment_all(bv, bio, i) {
		page = bv->bv_page;

		/* PG_error was set if any post_read step failed */
		if (bio->bi_status || PageError(page)) {
			ClearPageUptodate(page);
			SetPageError(page);
		} else {
			SetPageUptodate(page);
		}
		unlock_page(page);
	}
	if (bio->bi_private)
		mempool_free(bio->bi_private, bpr_ctx_pool);
	bio_put(bio);
}

static void decrypt_work(struct work_struct *work)
{
	struct bio_post_read_ctx *ctx =
		container_of(work, struct bio_post_read_ctx, work);

	fscrypt_decrypt_bio(ctx->bio);

	bpr_do_processing(ctx);
}

static void verity_work(struct work_struct *work)
{
	struct bio_post_read_ctx *ctx =
		container_of(work, struct bio_post_read_ctx, work);

	fsverity_verify_bio(ctx->bio);

	bpr_do_processing(ctx);
}

void bpr_do_processing(struct bio_post_read_ctx *ctx)
{
	/*
	 * We use different work queues for decryption and for verity because
	 * verity may require reading metadata pages that need decryption, and
	 * we shouldn't recurse to the same workqueue.
	 */
	switch (++ctx->cur_step) {
	case STEP_DECRYPT:
		if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
			INIT_WORK(&ctx->work, decrypt_work);
			fscrypt_enqueue_decrypt_work(&ctx->work);
			return;
		}
		ctx->cur_step++;
		/* fall-through */
	case STEP_VERITY:
		if (ctx->enabled_steps & (1 << STEP_VERITY)) {
			INIT_WORK(&ctx->work, verity_work);
			fsverity_enqueue_verify_work(&ctx->work);
			return;
		}
		ctx->cur_step++;
		/* fall-through */
	default:
		__bpr_read_end_io(ctx->bio);
	}
}

int __init bpr_init(void)
{
	bpr_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0);
	if (!bpr_ctx_cache)
		goto fail;
	bpr_ctx_pool = mempool_create_slab_pool(num_prealloc_post_read_ctxs,
						bpr_ctx_cache);
	if (!bpr_ctx_pool)
		goto fail_free_cache;
	return 0;

fail_free_cache:
	kmem_cache_destroy(bpr_ctx_cache);
fail:
	return -ENOMEM;
}

void __exit bpr_exit(void)
{
	mempool_destroy(bpr_ctx_pool);
	kmem_cache_destroy(bpr_ctx_cache);
}

module_init(bpr_init);
module_exit(bpr_exit);
MODULE_LICENSE("GPL");
Chandan Rajendra May 30, 2018, 11:33 a.m. UTC | #8
On Wednesday, May 30, 2018 10:36:42 AM IST Theodore Y. Ts'o wrote:
> Note: we may want to move this thread so that it's on linux-fscrypt
> exclusively.  I'm thinking that we should consider using linux-fscrypt
> for fscrypt and fsverity discussions.  That way we can avoid adding
> extra noise to the linux-fsdevel and linux-ext4 lists.  Comments?

I agree.

> 
> On Wed, May 30, 2018 at 08:39:11AM +0530, Chandan Rajendra wrote:
> > 
> > I had misunderstood the requirement. Sorry about that. I had written the
> > patchset in its current form with the understanding that fs/buffer.c and
> > fs/ext4/*.c would need to get compiled even when fscrypt code isn't compiled
> > at all. When the fscrypt module isn't selected for build in the kernel config,
> > calls to fscrypt_*() functions would end up calling the equivalent nop
> > functions in fscrypt_notsupp.h file.
> > 
> > For the generic code to be completely unaware of several stages of "post
> > processinhg" functionality, I would most likely have to add more callback
> > pointers into the newly introduced "struct post_process_read" structure. 
> 
> It's still a work in progress, but I have an initial integration of
> ext4 with fsverity.  See the fsverity branch on ext4.git, and in
> particular, the changes made to fs/ext4/readpage.c.
> 
> Let's be clear that neither Eric and I are completely happy with how
> the fsveirty post-read processing is being done.  What's there right
> now is a place-holder so we can continue to development/debug the
> other aspects of fsverity.  In particular, we're aware that there is
> code duplication between code in fs/f2fs/data.c and fs/ext4/readpage.c
> 
> One of the things which is a bit tricky right now is that fscrypt and
> fsverity can be enabled on a per-file system basis.  That is, there are
> separate CONFIG options:
> 
>    * CONFIG_EXT4_FS_ENCRYPTION
>    * CONFIG_F2FS_FS_ENCRYPTION
>    * CONFIG_EXT4_FS_VERITY
>    * CONFIG_F2FS_FS_VERITY
> 
> And in each file system, you have to do this before including the header files:
> 
> #define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
> #include <linux/fscrypt.h>
> 
> #define __FS_HAS_VERITY IS_ENABLED(CONFIG_EXT4_FS_VERITY)
> #include <linux/fsverity.ha>
> 
> That's because whether you get the function prototype or an inline
> function which returns EOhPNOTSUPP for functions such as
> fscrypt_decrypt_bio() depends on how __FS_HAS_ENCRYPTION is defined
> before including linux/fscrypt.h.
> 
> I now think this was a mistake, and that we should handle this the
> same way we handle CONFIG_QUOTA.  If we enable fscrypt or fsverity, it
> should be enabled for all file sytems which support that feature.

Do we continue to retain ext4/readpage.c? I ask because if the logic in
ext4/readpage.c has to be moved to fs/buffer.c (to avoid code duplication) then 
we end up losing the ability to build fscrypt as a module even if one of
the filesystems using fs/bio_post_read.c is itself built as a module.


> Otherwise it becomes too hard to try to support this in a file system
> independent way --- and we end up having code which is cut-and-pasted
> for each file system (e.g., in fs/f2fs/data.c and fs/ext4/readpage.c)
> but which may compile to something quite different thanks to the C
> preprocessor magic which is going on at the moment.
> 
> > I will work on this and post the results in the next version of the patchset.
> 
> So in order to avoid your wasting time and energy, and perhaps getting
> unnecessarily frustrated, I'd recommend that before you immediately
> start diving into implementation, that we have a design discussion
> about the best way to proceed.  And then when we have a common
> agreement about how to proceed, let's get something upstream first
> which changes the infrastructure used by the file systems and by
> fscrypt first.  And let's get that working, *before* we start
> integrating the changes for supporting fscrypt for 4k blocks for
> systems with 32k pagesize, or before we start making final (e.g.,
> non-prototype) changes to integrate fsverity.
> 
> I'll include my first attempt that I played around over the weekend
> with in terms of generic infrastructure, before I realized that we
> need to decide whether the current way we configure fscrypt and
> fsverity makes sense or not.  It's an example of why we should have
> design discussions first, and not just immediately start diving into
> code.
> 
> Fortunately (perhaps because I've some experience with these sorts of
> things) I only spent about an hour or so working on the prototype, and
> none trying to integrate it and doing all of the testing and
> debugging, before recognizing that we needed to have a meeting of the
> minds about design and requirements first.  :-)
> 
> Cheers,
> 
> 						- Ted
> 
> ------------- include/linux/bio_post_read.h
> 
> #ifndef _LINUX_BIO_POST_READ_H
> #define _LINUX_BIO_POST_READ_H
> 
> #include <linux/bio.h>
> 
> enum bio_post_read_step {
> 	STEP_INITIAL = 0,
> 	STEP_DECRYPT,
> 	STEP_VERITY,
> };
> 
> struct bio_post_read_ctx {
> 	struct bio *bio;
> 	struct work_struct work;
> 	unsigned int cur_step;
> 	unsigned int enabled_steps;
> };
> 
> static inline bool bpr_required(struct bio *bio)
> {
> 	return bio->bi_private && !bio->bi_status;
> }

Relying on bio->bi_private to check for requirement of post processing steps
does not seem to be correct. AFAIK (please correct me if I am wrong) the
convention is that bi_private field is owned by the code that allocates the
bio and the owner can assign its own value to this field.

Ideally we should be checking for per-inode encryption/verity flags to decide
if any post-processing steps are required to be executed.

I had implemented the following in the patchset that I had posted earlier:

bool fscrypt_bio_encrypted(struct bio *bio)
{
	struct address_space *mapping;
	struct inode *inode;
	struct page *page;

	if (bio_op(bio) == REQ_OP_READ && bio->bi_vcnt) {
		page = bio->bi_io_vec->bv_page;

		if (!PageSwapCache(page)) {
			mapping = page_mapping(page);
			if (mapping) {
				inode = mapping->host;

				if (IS_ENCRYPTED(inode) &&
					S_ISREG(inode->i_mode))
					return true;
			}
		}
	}

	return false;
}

> 
> extern void __bpr_read_end_io(struct bio *bio);
> void bpr_do_processing(struct bio_post_read_ctx *ctx);
> 
> #endif /* _LINUX_BIO_POST_READ_H */
> 
> ------------- fs/bio_post_read.c
> 
> // SPDX-License-Identifier: GPL-2.0
> /*
>  * fs/bio_post_read.c
>  *
>  * Copyright (C) 2018, Google LLC
>  *
>  * Contains helper functions used by file systems which use fscrypt
>  * and/or fsverity
>  */
> 
> #include <linux/mempool.h>
> #include <linux/module.h>
> #include <linux/slab.h>
> #include <linux/bio_post_read.h>
> #include <linux/pagemap.h>
> #include <linux/fscrypt.h>
> #include <linux/fsverity.h>
> 
> static unsigned int num_prealloc_post_read_ctxs = 128;
> 
> module_param(num_prealloc_post_read_ctxs, uint, 0444);
> MODULE_PARM_DESC(num_prealloc_post_read_ctxs,
> 		"Number of bio_post_read contexts to preallocate");
> 
> static struct kmem_cache *bpr_ctx_cache;
> static mempool_t *bpr_ctx_pool;
> 
> void __bpr_read_end_io(struct bio *bio)
> {
> 	struct page *page;
> 	struct bio_vec *bv;
> 	int i;
> 
> 	bio_for_each_segment_all(bv, bio, i) {
> 		page = bv->bv_page;
> 
> 		/* PG_error was set if any post_read step failed */
> 		if (bio->bi_status || PageError(page)) {
> 			ClearPageUptodate(page);
> 			SetPageError(page);
> 		} else {
> 			SetPageUptodate(page);
> 		}
> 		unlock_page(page);
> 	}
> 	if (bio->bi_private)
> 		mempool_free(bio->bi_private, bpr_ctx_pool);
> 	bio_put(bio);
> }
> 
> static void decrypt_work(struct work_struct *work)
> {
> 	struct bio_post_read_ctx *ctx =
> 		container_of(work, struct bio_post_read_ctx, work);
> 
> 	fscrypt_decrypt_bio(ctx->bio);
> 
> 	bpr_do_processing(ctx);
> }
> 
> static void verity_work(struct work_struct *work)
> {
> 	struct bio_post_read_ctx *ctx =
> 		container_of(work, struct bio_post_read_ctx, work);
> 
> 	fsverity_verify_bio(ctx->bio);
> 
> 	bpr_do_processing(ctx);
> }
> 
> void bpr_do_processing(struct bio_post_read_ctx *ctx)
> {
> 	/*
> 	 * We use different work queues for decryption and for verity because
> 	 * verity may require reading metadata pages that need decryption, and
> 	 * we shouldn't recurse to the same workqueue.
> 	 */
> 	switch (++ctx->cur_step) {
> 	case STEP_DECRYPT:
> 		if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
> 			INIT_WORK(&ctx->work, decrypt_work);
> 			fscrypt_enqueue_decrypt_work(&ctx->work);
> 			return;
> 		}
> 		ctx->cur_step++;
> 		/* fall-through */
> 	case STEP_VERITY:
> 		if (ctx->enabled_steps & (1 << STEP_VERITY)) {
> 			INIT_WORK(&ctx->work, verity_work);
> 			fsverity_enqueue_verify_work(&ctx->work);
> 			return;
> 		}
> 		ctx->cur_step++;
> 		/* fall-through */
> 	default:
> 		__bpr_read_end_io(ctx->bio);
> 	}
> }
> 
> int __init bpr_init(void)
> {
> 	bpr_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0);
> 	if (!bpr_ctx_cache)
> 		goto fail;
> 	bpr_ctx_pool = mempool_create_slab_pool(num_prealloc_post_read_ctxs,
> 						bpr_ctx_cache);
> 	if (!bpr_ctx_pool)
> 		goto fail_free_cache;
> 	return 0;
> 
> fail_free_cache:
> 	kmem_cache_destroy(bpr_ctx_cache);
> fail:
> 	return -ENOMEM;
> }
> 
> void __exit bpr_exit(void)
> {
> 	mempool_destroy(bpr_ctx_pool);
> 	kmem_cache_destroy(bpr_ctx_cache);
> }
> 
> module_init(bpr_init);
> module_exit(bpr_exit);
> MODULE_LICENSE("GPL");
> 
>
Theodore Ts'o May 30, 2018, 4:02 p.m. UTC | #9
On Wed, May 30, 2018 at 05:03:40PM +0530, Chandan Rajendra wrote:
> On Wednesday, May 30, 2018 10:36:42 AM IST Theodore Y. Ts'o wrote:
> > Note: we may want to move this thread so that it's on linux-fscrypt
> > exclusively.  I'm thinking that we should consider using linux-fscrypt
> > for fscrypt and fsverity discussions.  That way we can avoid adding
> > extra noise to the linux-fsdevel and linux-ext4 lists.  Comments?
> 
> I agree.

OK, I've moved linux-ext4 and linux-fsdevel to bcc.

> > I now think this was a mistake, and that we should handle this the
> > same way we handle CONFIG_QUOTA.  If we enable fscrypt or fsverity, it
> > should be enabled for all file sytems which support that feature.
> 
> Do we continue to retain ext4/readpage.c? I ask because if the logic in
> ext4/readpage.c has to be moved to fs/buffer.c (to avoid code duplication) then 
> we end up losing the ability to build fscrypt as a module even if one of
> the filesystems using fs/bio_post_read.c is itself built as a module.

It's not necessarily an either-or.  For example, we could use function
pointers to allow avoid needing code in the built-in portion of the
kernel from linking directly to the module.  Just as an illustration,
we could hang a pointer off of the struct super data structure which
is an array of post-processing functions, so each file system can
register post-processing handlers for each "post processing step".

I think the bigger deal is that mpage_readpage() and mpage_readpages()
are used by a large number of file systems, with a variety of
requirements --- including some with high performance requirements.
So if we make changes to fs/mpage.c, we need to make sure we don't
make things worse for them.  In particular, if post-processing isn't
needed, we shouldn't be doing any extra memory allocations or taking
any extra locks.

The somewhat smaller concern is that because mpage_readpages() has to
be used for some very old file systems, it uses a very simple
interface for getting the mapping from a logical block # to the
physical block number.  One of the benefits when I created
fs/ext4/readpage.c was that I dind't need to restrict myself to using
the old get_blocks_t interface, but could call ext4_map_blocks()
directly.  So for a readpages() call, we don't need end up needing to
call ext4_get_block() for each block mapping.  This saves CPU and the
need to take a read spinlock multiple times.  On the other hand,
readpages() is only used for for preallocation reads (so it's not
_quite_ as performance critical) and as an indication, XFS, which
tends to be highly performance sensitive, is using mpage_readpages()
without worrying too much about this issue.

The bottom line is that there is a large set of engineering tradeoffsh
to be made here, which is why we should talk about the approach first.
It may be there are people who can emit large amounts of perfectly
designed code all at once at high speed (perhaps like Mozart, who once
compared is musical composition process to vomiting, to the point
where he was bottlenecked on the speed that he could write notes on
staff paper), but I'm not smart enough to do that.  For code this
complex, doing some design ahead of time is a good thing.  :-)

> Relying on bio->bi_private to check for requirement of post processing steps
> does not seem to be correct. AFAIK (please correct me if I am wrong) the
> convention is that bi_private field is owned by the code that allocates the
> bio and the owner can assign its own value to this field.

Well, there is precedence for common code creating a convention for
using a private field.  For example, page_buffers() uses
page->private.  If a page belongs to a user such as a file system,
it's up the owner to decide how to use the private field, that's true.
However, if it uses fs/mpage.c, fs/mpage.c does use page_buffers(), so
a file system which uses mpage_readpages() has to allow page->private
to be used according to the page_buffers convention.

If we try to use bio->bi_private for fscrypt and fsveirty, *and* we
want to fold this support into common code like fs/mpage.c --- we're
probably OK, since at the moment, the fs/mpage.c code:

   * creates the bio
   * submits the bio
   * handles the bio completion handling

There is no way callers of mpage_readpage() and mpage_readpages() can
override how the bio is created (and hence control setting bi_private)
nor can they provide a bio completion handler (which could consume the
bi_private) field.

> Ideally we should be checking for per-inode encryption/verity flags to decide
> if any post-processing steps are required to be executed

At least for fsverity, it can't be a per-inode flag, since whether or
not verity processing is enabled depends on what part of the inode you
are reading.  (For example, if you are reading the Merkle tree or the
verity footer block, that's file metadata which does need verity
processing.)

We could make that be handled in the post-processing function, which
either returns an ERR_PTR, 0 if whatever work it did was done
immediately and nothing had to be queued on a workqueue, so the
bio-post-read code should try figure out what is the next
post-processing function should be called, and then call it, or 1 if
the work was queued on a workqueue, and so bio-post-read processor
should exit, since the bio-post-read processing function will be
called once the workqueue function is done doing its thing.

I'm not saying that's how we must do things; this is all part of the
design brainstorming process.  It's a lot easier to throw out possible
ideas for discussion than to spend days or weeks coding, only to
discover that it's not the best way forward.  :-)

						- Ted
Chandan Rajendra June 4, 2018, 10:09 a.m. UTC | #10
On Wednesday, May 30, 2018 10:36:42 AM IST Theodore Y. Ts'o wrote:
> Note: we may want to move this thread so that it's on linux-fscrypt
> exclusively.  I'm thinking that we should consider using linux-fscrypt
> for fscrypt and fsverity discussions.  That way we can avoid adding
> extra noise to the linux-fsdevel and linux-ext4 lists.  Comments?
> 
> On Wed, May 30, 2018 at 08:39:11AM +0530, Chandan Rajendra wrote:
> > 
> > I had misunderstood the requirement. Sorry about that. I had written the
> > patchset in its current form with the understanding that fs/buffer.c and
> > fs/ext4/*.c would need to get compiled even when fscrypt code isn't compiled
> > at all. When the fscrypt module isn't selected for build in the kernel config,
> > calls to fscrypt_*() functions would end up calling the equivalent nop
> > functions in fscrypt_notsupp.h file.
> > 
> > For the generic code to be completely unaware of several stages of "post
> > processinhg" functionality, I would most likely have to add more callback
> > pointers into the newly introduced "struct post_process_read" structure. 
> 
> It's still a work in progress, but I have an initial integration of
> ext4 with fsverity.  See the fsverity branch on ext4.git, and in
> particular, the changes made to fs/ext4/readpage.c.
> 
> Let's be clear that neither Eric and I are completely happy with how
> the fsveirty post-read processing is being done.  What's there right
> now is a place-holder so we can continue to development/debug the
> other aspects of fsverity.  In particular, we're aware that there is
> code duplication between code in fs/f2fs/data.c and fs/ext4/readpage.c
> 
> One of the things which is a bit tricky right now is that fscrypt and
> fsverity can be enabled on a per-file system basis.  That is, there are
> separate CONFIG options:
> 
>    * CONFIG_EXT4_FS_ENCRYPTION
>    * CONFIG_F2FS_FS_ENCRYPTION
>    * CONFIG_EXT4_FS_VERITY
>    * CONFIG_F2FS_FS_VERITY
> 
> And in each file system, you have to do this before including the header files:
> 
> #define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
> #include <linux/fscrypt.h>
> 
> #define __FS_HAS_VERITY IS_ENABLED(CONFIG_EXT4_FS_VERITY)
> #include <linux/fsverity.ha>
> 
> That's because whether you get the function prototype or an inline
> function which returns EOhPNOTSUPP for functions such as
> fscrypt_decrypt_bio() depends on how __FS_HAS_ENCRYPTION is defined
> before including linux/fscrypt.h.
> 
> I now think this was a mistake, and that we should handle this the
> same way we handle CONFIG_QUOTA.  If we enable fscrypt or fsverity, it
> should be enabled for all file sytems which support that feature.
> Otherwise it becomes too hard to try to support this in a file system
> independent way --- and we end up having code which is cut-and-pasted
> for each file system (e.g., in fs/f2fs/data.c and fs/ext4/readpage.c)
> but which may compile to something quite different thanks to the C
> preprocessor magic which is going on at the moment.
> 
> > I will work on this and post the results in the next version of the patchset.
> 
> So in order to avoid your wasting time and energy, and perhaps getting
> unnecessarily frustrated, I'd recommend that before you immediately
> start diving into implementation, that we have a design discussion
> about the best way to proceed.  And then when we have a common
> agreement about how to proceed, let's get something upstream first
> which changes the infrastructure used by the file systems and by
> fscrypt first.  And let's get that working, *before* we start
> integrating the changes for supporting fscrypt for 4k blocks for
> systems with 32k pagesize, or before we start making final (e.g.,
> non-prototype) changes to integrate fsverity.
> 
> I'll include my first attempt that I played around over the weekend
> with in terms of generic infrastructure, before I realized that we
> need to decide whether the current way we configure fscrypt and
> fsverity makes sense or not.  It's an example of why we should have
> design discussions first, and not just immediately start diving into
> code.
> 
> Fortunately (perhaps because I've some experience with these sorts of
> things) I only spent about an hour or so working on the prototype, and
> none trying to integrate it and doing all of the testing and
> debugging, before recognizing that we needed to have a meeting of the
> minds about design and requirements first.  :-)
> 
> Cheers,
> 
> 						- Ted
> 
> ------------- include/linux/bio_post_read.h
> 
> #ifndef _LINUX_BIO_POST_READ_H
> #define _LINUX_BIO_POST_READ_H
> 
> #include <linux/bio.h>
> 
> enum bio_post_read_step {
> 	STEP_INITIAL = 0,
> 	STEP_DECRYPT,
> 	STEP_VERITY,
> };
> 
> struct bio_post_read_ctx {
> 	struct bio *bio;
> 	struct work_struct work;
> 	unsigned int cur_step;
> 	unsigned int enabled_steps;
> };
> 
> static inline bool bpr_required(struct bio *bio)
> {
> 	return bio->bi_private && !bio->bi_status;
> }
> 
> extern void __bpr_read_end_io(struct bio *bio);
> void bpr_do_processing(struct bio_post_read_ctx *ctx);
> 
> #endif /* _LINUX_BIO_POST_READ_H */
> 
> ------------- fs/bio_post_read.c
> 
> // SPDX-License-Identifier: GPL-2.0
> /*
>  * fs/bio_post_read.c
>  *
>  * Copyright (C) 2018, Google LLC
>  *
>  * Contains helper functions used by file systems which use fscrypt
>  * and/or fsverity
>  */
> 
> #include <linux/mempool.h>
> #include <linux/module.h>
> #include <linux/slab.h>
> #include <linux/bio_post_read.h>
> #include <linux/pagemap.h>
> #include <linux/fscrypt.h>
> #include <linux/fsverity.h>
> 
> static unsigned int num_prealloc_post_read_ctxs = 128;
> 
> module_param(num_prealloc_post_read_ctxs, uint, 0444);
> MODULE_PARM_DESC(num_prealloc_post_read_ctxs,
> 		"Number of bio_post_read contexts to preallocate");
> 
> static struct kmem_cache *bpr_ctx_cache;
> static mempool_t *bpr_ctx_pool;
> 
> void __bpr_read_end_io(struct bio *bio)
> {
> 	struct page *page;
> 	struct bio_vec *bv;
> 	int i;
> 
> 	bio_for_each_segment_all(bv, bio, i) {
> 		page = bv->bv_page;
> 
> 		/* PG_error was set if any post_read step failed */
> 		if (bio->bi_status || PageError(page)) {
> 			ClearPageUptodate(page);
> 			SetPageError(page);
> 		} else {
> 			SetPageUptodate(page);
> 		}
> 		unlock_page(page);
> 	}
> 	if (bio->bi_private)
> 		mempool_free(bio->bi_private, bpr_ctx_pool);
> 	bio_put(bio);
> }
> 
> static void decrypt_work(struct work_struct *work)
> {
> 	struct bio_post_read_ctx *ctx =
> 		container_of(work, struct bio_post_read_ctx, work);
> 
> 	fscrypt_decrypt_bio(ctx->bio);
> 
> 	bpr_do_processing(ctx);
> }
> 
> static void verity_work(struct work_struct *work)
> {
> 	struct bio_post_read_ctx *ctx =
> 		container_of(work, struct bio_post_read_ctx, work);
> 
> 	fsverity_verify_bio(ctx->bio);
> 
> 	bpr_do_processing(ctx);
> }
> 
> void bpr_do_processing(struct bio_post_read_ctx *ctx)
> {
> 	/*
> 	 * We use different work queues for decryption and for verity because
> 	 * verity may require reading metadata pages that need decryption, and
> 	 * we shouldn't recurse to the same workqueue.
> 	 */
> 	switch (++ctx->cur_step) {
> 	case STEP_DECRYPT:
> 		if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
> 			INIT_WORK(&ctx->work, decrypt_work);
> 			fscrypt_enqueue_decrypt_work(&ctx->work);
> 			return;
> 		}
> 		ctx->cur_step++;
> 		/* fall-through */
> 	case STEP_VERITY:
> 		if (ctx->enabled_steps & (1 << STEP_VERITY)) {
> 			INIT_WORK(&ctx->work, verity_work);
> 			fsverity_enqueue_verify_work(&ctx->work);
> 			return;
> 		}
> 		ctx->cur_step++;
> 		/* fall-through */
> 	default:
> 		__bpr_read_end_io(ctx->bio);
> 	}
> }

decrypt_work() and verity_work() can be defined in fscrypt and fsverity
modules. inode->[i_crypt_info|i_verity_info] can hold the pointers to
decrypt_work() and verity_work() functions. The newly introduced function
pointers in inode->[i_crypt_info|i_verity_info] can be initialized in
fscrypt_get_encryption_info() and create_fsverity_info() respectively.

'struct bio_post_read_ctx' should have a new member to store a pointer to the
bpr_do_processing() function. Once *_work() functions complete their job, they
could invoke ctx->do_processing() to continue with the next stage of bio
processing.


> 
> int __init bpr_init(void)
> {
> 	bpr_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0);
> 	if (!bpr_ctx_cache)
> 		goto fail;
> 	bpr_ctx_pool = mempool_create_slab_pool(num_prealloc_post_read_ctxs,
> 						bpr_ctx_cache);
> 	if (!bpr_ctx_pool)
> 		goto fail_free_cache;
> 	return 0;
> 
> fail_free_cache:
> 	kmem_cache_destroy(bpr_ctx_cache);
> fail:
> 	return -ENOMEM;
> }
> 
> void __exit bpr_exit(void)
> {
> 	mempool_destroy(bpr_ctx_pool);
> 	kmem_cache_destroy(bpr_ctx_cache);
> }
> 
> module_init(bpr_init);
> module_exit(bpr_exit);
> MODULE_LICENSE("GPL");
> 
>
diff mbox series

Patch

diff --git a/fs/block_dev.c b/fs/block_dev.c
index b549666..254af9a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -568,13 +568,14 @@  static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 
 static int blkdev_readpage(struct file * file, struct page * page)
 {
-	return block_read_full_page(page, blkdev_get_block);
+	return block_read_full_page(page, blkdev_get_block, NULL);
 }
 
 static int blkdev_readpages(struct file *file, struct address_space *mapping,
 			struct list_head *pages, unsigned nr_pages)
 {
-	return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block);
+	return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block,
+			NULL);
 }
 
 static int blkdev_write_begin(struct file *file, struct address_space *mapping,
diff --git a/fs/buffer.c b/fs/buffer.c
index fda7926..978a8b7 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -45,8 +45,12 @@ 
 #include <linux/mpage.h>
 #include <linux/bit_spinlock.h>
 #include <linux/pagevec.h>
+#include <linux/fs.h>
 #include <trace/events/block.h>
 
+#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
+#include <linux/fscrypt.h>
+
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 			 enum rw_hint hint, struct writeback_control *wbc);
@@ -2197,6 +2201,172 @@  int block_is_partially_uptodate(struct page *page, unsigned long from,
 }
 EXPORT_SYMBOL(block_is_partially_uptodate);
 
+static void end_bio_bh_io_sync(struct bio *bio)
+{
+	post_process_read_t *post_process;
+	struct fscrypt_ctx *ctx;
+	struct buffer_head *bh;
+
+	if (fscrypt_bio_encrypted(bio)) {
+		ctx = bio->bi_private;
+		post_process = fscrypt_get_post_process(ctx);
+
+		if (bio->bi_status || post_process->process_block == NULL) {
+			bh = fscrypt_get_bh(ctx);
+			fscrypt_release_ctx(ctx);
+		} else {
+			fscrypt_enqueue_decrypt_bio(ctx, bio,
+						post_process->process_block);
+			return;
+		}
+	} else {
+		bh = bio->bi_private;
+	}
+
+	if (unlikely(bio_flagged(bio, BIO_QUIET)))
+		set_bit(BH_Quiet, &bh->b_state);
+
+	bh->b_end_io(bh, !bio->bi_status);
+	bio_put(bio);
+}
+
+/*
+ * This allows us to do IO even on the odd last sectors
+ * of a device, even if the block size is some multiple
+ * of the physical sector size.
+ *
+ * We'll just truncate the bio to the size of the device,
+ * and clear the end of the buffer head manually.
+ *
+ * Truly out-of-range accesses will turn into actual IO
+ * errors, this only handles the "we need to be able to
+ * do IO at the final sector" case.
+ */
+void guard_bio_eod(int op, struct bio *bio)
+{
+	sector_t maxsector;
+	struct bio_vec *bvec = bio_last_bvec_all(bio);
+	unsigned truncated_bytes;
+	struct hd_struct *part;
+
+	rcu_read_lock();
+	part = __disk_get_part(bio->bi_disk, bio->bi_partno);
+	if (part)
+		maxsector = part_nr_sects_read(part);
+	else
+		maxsector = get_capacity(bio->bi_disk);
+	rcu_read_unlock();
+
+	if (!maxsector)
+		return;
+
+	/*
+	 * If the *whole* IO is past the end of the device,
+	 * let it through, and the IO layer will turn it into
+	 * an EIO.
+	 */
+	if (unlikely(bio->bi_iter.bi_sector >= maxsector))
+		return;
+
+	maxsector -= bio->bi_iter.bi_sector;
+	if (likely((bio->bi_iter.bi_size >> 9) <= maxsector))
+		return;
+
+	/* Uhhuh. We've got a bio that straddles the device size! */
+	truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
+
+	/* Truncate the bio.. */
+	bio->bi_iter.bi_size -= truncated_bytes;
+	bvec->bv_len -= truncated_bytes;
+
+	/* ..and clear the end of the buffer for reads */
+	if (op == REQ_OP_READ) {
+		zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
+				truncated_bytes);
+	}
+}
+
+struct bio *create_bh_bio(int op, int op_flags, struct buffer_head *bh,
+			enum rw_hint write_hint,
+			post_process_read_t *post_process)
+{
+	struct address_space *mapping;
+	struct fscrypt_ctx *ctx = NULL;
+	struct inode *inode;
+	struct page *page;
+	struct bio *bio;
+
+	BUG_ON(!buffer_locked(bh));
+	BUG_ON(!buffer_mapped(bh));
+	BUG_ON(!bh->b_end_io);
+	BUG_ON(buffer_delay(bh));
+	BUG_ON(buffer_unwritten(bh));
+
+	/*
+	 * Only clear out a write error when rewriting
+	 */
+	if (test_set_buffer_req(bh) && (op == REQ_OP_WRITE))
+		clear_buffer_write_io_error(bh);
+
+	page = bh->b_page;
+
+	if (op == REQ_OP_READ) {
+		mapping = page_mapping(page);
+		if (mapping && !PageSwapCache(page)) {
+			inode = mapping->host;
+			if (IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode)) {
+				ctx = fscrypt_get_ctx(inode, GFP_NOFS);
+				BUG_ON(!ctx);
+				fscrypt_set_bh(ctx, bh);
+				if (post_process)
+					fscrypt_set_post_process(ctx,
+								post_process);
+			}
+		}
+	}
+
+	/*
+	 * from here on down, it's all bio -- do the initial mapping,
+	 * submit_bio -> generic_make_request may further map this bio around
+	 */
+	bio = bio_alloc(GFP_NOIO, 1);
+
+	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+	bio_set_dev(bio, bh->b_bdev);
+	bio->bi_write_hint = write_hint;
+
+	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
+	BUG_ON(bio->bi_iter.bi_size != bh->b_size);
+
+	bio->bi_end_io = end_bio_bh_io_sync;
+
+	if (ctx)
+		bio->bi_private = ctx;
+	else
+		bio->bi_private = bh;
+
+	/* Take care of bh's that straddle the end of the device */
+	guard_bio_eod(op, bio);
+
+	if (buffer_meta(bh))
+		op_flags |= REQ_META;
+	if (buffer_prio(bh))
+		op_flags |= REQ_PRIO;
+	bio_set_op_attrs(bio, op, op_flags);
+
+	return bio;
+}
+
+static int submit_bh_post_process(int op, int op_flags, struct buffer_head *bh,
+		post_process_read_t *post_process)
+{
+	struct bio *bio;
+
+	bio = create_bh_bio(op, op_flags, bh, 0, post_process);
+	submit_bio(bio);
+	return 0;
+}
+
 /*
  * Generic "read page" function for block devices that have the normal
  * get_block functionality. This is most of the block device filesystems.
@@ -2204,7 +2374,8 @@  EXPORT_SYMBOL(block_is_partially_uptodate);
  * set/clear_buffer_uptodate() functions propagate buffer state into the
  * page struct once IO has completed.
  */
-int block_read_full_page(struct page *page, get_block_t *get_block)
+int block_read_full_page(struct page *page, get_block_t *get_block,
+			post_process_read_t *post_process)
 {
 	struct inode *inode = page->mapping->host;
 	sector_t iblock, lblock;
@@ -2284,7 +2455,8 @@  int block_read_full_page(struct page *page, get_block_t *get_block)
 		if (buffer_uptodate(bh))
 			end_buffer_async_read(bh, 1);
 		else
-			submit_bh(REQ_OP_READ, 0, bh);
+			submit_bh_post_process(REQ_OP_READ, 0, bh,
+					post_process);
 	}
 	return 0;
 }
@@ -2959,124 +3131,12 @@  sector_t generic_block_bmap(struct address_space *mapping, sector_t block,
 }
 EXPORT_SYMBOL(generic_block_bmap);
 
-static void end_bio_bh_io_sync(struct bio *bio)
-{
-	struct buffer_head *bh = bio->bi_private;
-
-	if (unlikely(bio_flagged(bio, BIO_QUIET)))
-		set_bit(BH_Quiet, &bh->b_state);
-
-	bh->b_end_io(bh, !bio->bi_status);
-	bio_put(bio);
-}
-
-/*
- * This allows us to do IO even on the odd last sectors
- * of a device, even if the block size is some multiple
- * of the physical sector size.
- *
- * We'll just truncate the bio to the size of the device,
- * and clear the end of the buffer head manually.
- *
- * Truly out-of-range accesses will turn into actual IO
- * errors, this only handles the "we need to be able to
- * do IO at the final sector" case.
- */
-void guard_bio_eod(int op, struct bio *bio)
-{
-	sector_t maxsector;
-	struct bio_vec *bvec = bio_last_bvec_all(bio);
-	unsigned truncated_bytes;
-	struct hd_struct *part;
-
-	rcu_read_lock();
-	part = __disk_get_part(bio->bi_disk, bio->bi_partno);
-	if (part)
-		maxsector = part_nr_sects_read(part);
-	else
-		maxsector = get_capacity(bio->bi_disk);
-	rcu_read_unlock();
-
-	if (!maxsector)
-		return;
-
-	/*
-	 * If the *whole* IO is past the end of the device,
-	 * let it through, and the IO layer will turn it into
-	 * an EIO.
-	 */
-	if (unlikely(bio->bi_iter.bi_sector >= maxsector))
-		return;
-
-	maxsector -= bio->bi_iter.bi_sector;
-	if (likely((bio->bi_iter.bi_size >> 9) <= maxsector))
-		return;
-
-	/* Uhhuh. We've got a bio that straddles the device size! */
-	truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
-
-	/* Truncate the bio.. */
-	bio->bi_iter.bi_size -= truncated_bytes;
-	bvec->bv_len -= truncated_bytes;
-
-	/* ..and clear the end of the buffer for reads */
-	if (op == REQ_OP_READ) {
-		zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
-				truncated_bytes);
-	}
-}
-
-struct bio *create_bh_bio(int op, int op_flags, struct buffer_head *bh,
-                          enum rw_hint write_hint)
-{
-	struct bio *bio;
-
-	BUG_ON(!buffer_locked(bh));
-	BUG_ON(!buffer_mapped(bh));
-	BUG_ON(!bh->b_end_io);
-	BUG_ON(buffer_delay(bh));
-	BUG_ON(buffer_unwritten(bh));
-
-	/*
-	 * Only clear out a write error when rewriting
-	 */
-	if (test_set_buffer_req(bh) && (op == REQ_OP_WRITE))
-		clear_buffer_write_io_error(bh);
-
-	/*
-	 * from here on down, it's all bio -- do the initial mapping,
-	 * submit_bio -> generic_make_request may further map this bio around
-	 */
-	bio = bio_alloc(GFP_NOIO, 1);
-
-	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
-	bio_set_dev(bio, bh->b_bdev);
-	bio->bi_write_hint = write_hint;
-
-	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
-	BUG_ON(bio->bi_iter.bi_size != bh->b_size);
-
-	bio->bi_end_io = end_bio_bh_io_sync;
-	bio->bi_private = bh;
-
-	/* Take care of bh's that straddle the end of the device */
-	guard_bio_eod(op, bio);
-
-	if (buffer_meta(bh))
-		op_flags |= REQ_META;
-	if (buffer_prio(bh))
-		op_flags |= REQ_PRIO;
-	bio_set_op_attrs(bio, op, op_flags);
-
-	return bio;
-}
-
 static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 			 enum rw_hint write_hint, struct writeback_control *wbc)
 {
 	struct bio *bio;
 
-	bio = create_bh_bio(op, op_flags, bh, write_hint);
+	bio = create_bh_bio(op, op_flags, bh, write_hint, NULL);
 
 	if (wbc) {
 		wbc_init_bio(wbc, bio);
@@ -3092,7 +3152,7 @@  int submit_bh_blkcg_css(int op, int op_flags, struct buffer_head *bh,
 {
 	struct bio *bio;
 
-	bio = create_bh_bio(op, op_flags, bh, 0);
+	bio = create_bh_bio(op, op_flags, bh, 0, NULL);
 	bio_associate_blkcg(bio, blkcg_css);
 	submit_bio(bio);
 	return 0;
@@ -3101,11 +3161,7 @@  EXPORT_SYMBOL(submit_bh_blkcg_css);
 
 int submit_bh(int op, int op_flags, struct buffer_head *bh)
 {
-	struct bio *bio;
-
-	bio = create_bh_bio(op, op_flags, bh, 0);
-	submit_bio(bio);
-	return 0;
+	return submit_bh_post_process(op, op_flags, bh, NULL);
 }
 EXPORT_SYMBOL(submit_bh);
 
diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index 32288c3..aba22f7 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -24,6 +24,7 @@ 
 #include <linux/module.h>
 #include <linux/bio.h>
 #include <linux/namei.h>
+#include <linux/buffer_head.h>
 #include "fscrypt_private.h"
 
 static void __fscrypt_decrypt_bio(struct bio *bio, bool done)
@@ -59,7 +60,7 @@  void fscrypt_decrypt_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(fscrypt_decrypt_bio);
 
-static void completion_pages(struct work_struct *work)
+void fscrypt_complete_pages(struct work_struct *work)
 {
 	struct fscrypt_ctx *ctx =
 		container_of(work, struct fscrypt_ctx, r.work);
@@ -69,15 +70,103 @@  static void completion_pages(struct work_struct *work)
 	fscrypt_release_ctx(ctx);
 	bio_put(bio);
 }
+EXPORT_SYMBOL(fscrypt_complete_pages);
 
-void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx, struct bio *bio)
+void fscrypt_complete_block(struct work_struct *work)
 {
-	INIT_WORK(&ctx->r.work, completion_pages);
+	struct fscrypt_ctx *ctx =
+		container_of(work, struct fscrypt_ctx, r.work);
+	struct buffer_head *bh;
+	struct bio *bio;
+	struct bio_vec *bv;
+	struct page *page;
+	struct inode *inode;
+	u64 blk_nr;
+	int ret;
+
+	bio = ctx->r.bio;
+	WARN_ON(bio->bi_vcnt != 1);
+
+	bv = bio->bi_io_vec;
+	page = bv->bv_page;
+	inode = page->mapping->host;
+
+	WARN_ON(bv->bv_len != i_blocksize(inode));
+
+	blk_nr = page->index << (PAGE_SHIFT - inode->i_blkbits);
+	blk_nr += bv->bv_offset >> inode->i_blkbits;
+
+	bh = ctx->r.bh;
+
+	ret = fscrypt_decrypt_page(inode, page, bv->bv_len,
+				bv->bv_offset, blk_nr);
+
+	bh->b_end_io(bh, !ret);
+
+	fscrypt_release_ctx(ctx);
+	bio_put(bio);
+}
+EXPORT_SYMBOL(fscrypt_complete_block);
+
+bool fscrypt_bio_encrypted(struct bio *bio)
+{
+	struct address_space *mapping;
+	struct inode *inode;
+	struct page *page;
+
+	if (bio_op(bio) == REQ_OP_READ && bio->bi_vcnt) {
+		page = bio->bi_io_vec->bv_page;
+
+		if (!PageSwapCache(page)) {
+			mapping = page_mapping(page);
+			if (mapping) {
+				inode = mapping->host;
+
+				if (IS_ENCRYPTED(inode) &&
+					S_ISREG(inode->i_mode))
+					return true;
+			}
+		}
+	}
+
+	return false;
+}
+EXPORT_SYMBOL(fscrypt_bio_encrypted);
+
+void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx, struct bio *bio,
+			void (*process_bio)(struct work_struct *))
+{
+	BUG_ON(!process_bio);
+	INIT_WORK(&ctx->r.work, process_bio);
 	ctx->r.bio = bio;
 	fscrypt_enqueue_decrypt_work(&ctx->r.work);
 }
 EXPORT_SYMBOL(fscrypt_enqueue_decrypt_bio);
 
+post_process_read_t *fscrypt_get_post_process(struct fscrypt_ctx *ctx)
+{
+	return &(ctx->r.post_process);
+}
+EXPORT_SYMBOL(fscrypt_get_post_process);
+
+void fscrypt_set_post_process(struct fscrypt_ctx *ctx,
+			post_process_read_t *post_process)
+{
+	ctx->r.post_process = *post_process;
+}
+
+struct buffer_head *fscrypt_get_bh(struct fscrypt_ctx *ctx)
+{
+	return ctx->r.bh;
+}
+EXPORT_SYMBOL(fscrypt_get_bh);
+
+void fscrypt_set_bh(struct fscrypt_ctx *ctx, struct buffer_head *bh)
+{
+	ctx->r.bh = bh;
+}
+EXPORT_SYMBOL(fscrypt_set_bh);
+
 void fscrypt_pullback_bio_page(struct page **page, bool restore)
 {
 	struct fscrypt_ctx *ctx;
diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index 27509b1..2148651 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -127,6 +127,8 @@  struct fscrypt_ctx *fscrypt_get_ctx(const struct inode *inode, gfp_t gfp_flags)
 		ctx->flags |= FS_CTX_REQUIRES_FREE_ENCRYPT_FL;
 	} else {
 		ctx->flags &= ~FS_CTX_REQUIRES_FREE_ENCRYPT_FL;
+		ctx->r.post_process.process_block = NULL;
+		ctx->r.post_process.process_pages = NULL;
 	}
 	ctx->flags &= ~FS_CTX_HAS_BOUNCE_BUFFER_FL;
 	return ctx;
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 1329b69..0a91f87 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -869,14 +869,14 @@  static int ext2_writepage(struct page *page, struct writeback_control *wbc)
 
 static int ext2_readpage(struct file *file, struct page *page)
 {
-	return mpage_readpage(page, ext2_get_block);
+	return mpage_readpage(page, ext2_get_block, NULL);
 }
 
 static int
 ext2_readpages(struct file *file, struct address_space *mapping,
 		struct list_head *pages, unsigned nr_pages)
 {
-	return mpage_readpages(mapping, pages, nr_pages, ext2_get_block);
+	return mpage_readpages(mapping, pages, nr_pages, ext2_get_block, NULL);
 }
 
 static int
diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 8fdfcd3..7c38803 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -8,7 +8,7 @@  obj-$(CONFIG_EXT4_FS) += ext4.o
 ext4-y	:= balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \
 		extents_status.o file.o fsmap.o fsync.o hash.o ialloc.o \
 		indirect.o inline.o inode.o ioctl.o mballoc.o migrate.o \
-		mmp.o move_extent.o namei.o page-io.o readpage.o resize.o \
+		mmp.o move_extent.o namei.o page-io.o resize.o \
 		super.o symlink.o sysfs.o xattr.o xattr_trusted.o xattr_user.o
 
 ext4-$(CONFIG_EXT4_FS_POSIX_ACL)	+= acl.o
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fbc89d9..5ae3c7b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3337,6 +3337,10 @@  static int ext4_readpage(struct file *file, struct page *page)
 {
 	int ret = -EAGAIN;
 	struct inode *inode = page->mapping->host;
+	post_process_read_t post_process = {
+		.process_block = fscrypt_complete_block,
+		.process_pages = fscrypt_complete_pages,
+	};
 
 	trace_ext4_readpage(page);
 
@@ -3344,7 +3348,7 @@  static int ext4_readpage(struct file *file, struct page *page)
 		ret = ext4_readpage_inline(inode, page);
 
 	if (ret == -EAGAIN)
-		return ext4_mpage_readpages(page->mapping, NULL, page, 1);
+		return mpage_readpage(page, ext4_get_block, &post_process);
 
 	return ret;
 }
@@ -3354,12 +3358,17 @@  ext4_readpages(struct file *file, struct address_space *mapping,
 		struct list_head *pages, unsigned nr_pages)
 {
 	struct inode *inode = mapping->host;
+	post_process_read_t post_process = {
+		.process_block = fscrypt_complete_block,
+		.process_pages = fscrypt_complete_pages,
+	};
 
 	/* If the file has inline data, no need to do readpages. */
 	if (ext4_has_inline_data(inode))
 		return 0;
 
-	return ext4_mpage_readpages(mapping, pages, NULL, nr_pages);
+	return mpage_readpages(mapping, pages, nr_pages, ext4_get_block,
+			&post_process);
 }
 
 static void ext4_invalidatepage(struct page *page, unsigned int offset,
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
deleted file mode 100644
index 19b87a8..0000000
--- a/fs/ext4/readpage.c
+++ /dev/null
@@ -1,294 +0,0 @@ 
-// SPDX-License-Identifier: GPL-2.0
-/*
- * linux/fs/ext4/readpage.c
- *
- * Copyright (C) 2002, Linus Torvalds.
- * Copyright (C) 2015, Google, Inc.
- *
- * This was originally taken from fs/mpage.c
- *
- * The intent is the ext4_mpage_readpages() function here is intended
- * to replace mpage_readpages() in the general case, not just for
- * encrypted files.  It has some limitations (see below), where it
- * will fall back to read_block_full_page(), but these limitations
- * should only be hit when page_size != block_size.
- *
- * This will allow us to attach a callback function to support ext4
- * encryption.
- *
- * If anything unusual happens, such as:
- *
- * - encountering a page which has buffers
- * - encountering a page which has a non-hole after a hole
- * - encountering a page with non-contiguous blocks
- *
- * then this code just gives up and calls the buffer_head-based read function.
- * It does handle a page which has holes at the end - that is a common case:
- * the end-of-file on blocksize < PAGE_SIZE setups.
- *
- */
-
-#include <linux/kernel.h>
-#include <linux/export.h>
-#include <linux/mm.h>
-#include <linux/kdev_t.h>
-#include <linux/gfp.h>
-#include <linux/bio.h>
-#include <linux/fs.h>
-#include <linux/buffer_head.h>
-#include <linux/blkdev.h>
-#include <linux/highmem.h>
-#include <linux/prefetch.h>
-#include <linux/mpage.h>
-#include <linux/writeback.h>
-#include <linux/backing-dev.h>
-#include <linux/pagevec.h>
-#include <linux/cleancache.h>
-
-#include "ext4.h"
-
-static inline bool ext4_bio_encrypted(struct bio *bio)
-{
-#ifdef CONFIG_EXT4_FS_ENCRYPTION
-	return unlikely(bio->bi_private != NULL);
-#else
-	return false;
-#endif
-}
-
-/*
- * I/O completion handler for multipage BIOs.
- *
- * The mpage code never puts partial pages into a BIO (except for end-of-file).
- * If a page does not map to a contiguous run of blocks then it simply falls
- * back to block_read_full_page().
- *
- * Why is this?  If a page's completion depends on a number of different BIOs
- * which can complete in any order (or at the same time) then determining the
- * status of that page is hard.  See end_buffer_async_read() for the details.
- * There is no point in duplicating all that complexity.
- */
-static void mpage_end_io(struct bio *bio)
-{
-	struct bio_vec *bv;
-	int i;
-
-	if (ext4_bio_encrypted(bio)) {
-		if (bio->bi_status) {
-			fscrypt_release_ctx(bio->bi_private);
-		} else {
-			fscrypt_enqueue_decrypt_bio(bio->bi_private, bio);
-			return;
-		}
-	}
-	bio_for_each_segment_all(bv, bio, i) {
-		struct page *page = bv->bv_page;
-
-		if (!bio->bi_status) {
-			SetPageUptodate(page);
-		} else {
-			ClearPageUptodate(page);
-			SetPageError(page);
-		}
-		unlock_page(page);
-	}
-
-	bio_put(bio);
-}
-
-int ext4_mpage_readpages(struct address_space *mapping,
-			 struct list_head *pages, struct page *page,
-			 unsigned nr_pages)
-{
-	struct bio *bio = NULL;
-	sector_t last_block_in_bio = 0;
-
-	struct inode *inode = mapping->host;
-	const unsigned blkbits = inode->i_blkbits;
-	const unsigned blocks_per_page = PAGE_SIZE >> blkbits;
-	const unsigned blocksize = 1 << blkbits;
-	sector_t block_in_file;
-	sector_t last_block;
-	sector_t last_block_in_file;
-	sector_t blocks[MAX_BUF_PER_PAGE];
-	unsigned page_block;
-	struct block_device *bdev = inode->i_sb->s_bdev;
-	int length;
-	unsigned relative_block = 0;
-	struct ext4_map_blocks map;
-
-	map.m_pblk = 0;
-	map.m_lblk = 0;
-	map.m_len = 0;
-	map.m_flags = 0;
-
-	for (; nr_pages; nr_pages--) {
-		int fully_mapped = 1;
-		unsigned first_hole = blocks_per_page;
-
-		prefetchw(&page->flags);
-		if (pages) {
-			page = list_entry(pages->prev, struct page, lru);
-			list_del(&page->lru);
-			if (add_to_page_cache_lru(page, mapping, page->index,
-				  readahead_gfp_mask(mapping)))
-				goto next_page;
-		}
-
-		if (page_has_buffers(page))
-			goto confused;
-
-		block_in_file = (sector_t)page->index << (PAGE_SHIFT - blkbits);
-		last_block = block_in_file + nr_pages * blocks_per_page;
-		last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
-		if (last_block > last_block_in_file)
-			last_block = last_block_in_file;
-		page_block = 0;
-
-		/*
-		 * Map blocks using the previous result first.
-		 */
-		if ((map.m_flags & EXT4_MAP_MAPPED) &&
-		    block_in_file > map.m_lblk &&
-		    block_in_file < (map.m_lblk + map.m_len)) {
-			unsigned map_offset = block_in_file - map.m_lblk;
-			unsigned last = map.m_len - map_offset;
-
-			for (relative_block = 0; ; relative_block++) {
-				if (relative_block == last) {
-					/* needed? */
-					map.m_flags &= ~EXT4_MAP_MAPPED;
-					break;
-				}
-				if (page_block == blocks_per_page)
-					break;
-				blocks[page_block] = map.m_pblk + map_offset +
-					relative_block;
-				page_block++;
-				block_in_file++;
-			}
-		}
-
-		/*
-		 * Then do more ext4_map_blocks() calls until we are
-		 * done with this page.
-		 */
-		while (page_block < blocks_per_page) {
-			if (block_in_file < last_block) {
-				map.m_lblk = block_in_file;
-				map.m_len = last_block - block_in_file;
-
-				if (ext4_map_blocks(NULL, inode, &map, 0) < 0) {
-				set_error_page:
-					SetPageError(page);
-					zero_user_segment(page, 0,
-							  PAGE_SIZE);
-					unlock_page(page);
-					goto next_page;
-				}
-			}
-			if ((map.m_flags & EXT4_MAP_MAPPED) == 0) {
-				fully_mapped = 0;
-				if (first_hole == blocks_per_page)
-					first_hole = page_block;
-				page_block++;
-				block_in_file++;
-				continue;
-			}
-			if (first_hole != blocks_per_page)
-				goto confused;		/* hole -> non-hole */
-
-			/* Contiguous blocks? */
-			if (page_block && blocks[page_block-1] != map.m_pblk-1)
-				goto confused;
-			for (relative_block = 0; ; relative_block++) {
-				if (relative_block == map.m_len) {
-					/* needed? */
-					map.m_flags &= ~EXT4_MAP_MAPPED;
-					break;
-				} else if (page_block == blocks_per_page)
-					break;
-				blocks[page_block] = map.m_pblk+relative_block;
-				page_block++;
-				block_in_file++;
-			}
-		}
-		if (first_hole != blocks_per_page) {
-			zero_user_segment(page, first_hole << blkbits,
-					  PAGE_SIZE);
-			if (first_hole == 0) {
-				SetPageUptodate(page);
-				unlock_page(page);
-				goto next_page;
-			}
-		} else if (fully_mapped) {
-			SetPageMappedToDisk(page);
-		}
-		if (fully_mapped && blocks_per_page == 1 &&
-		    !PageUptodate(page) && cleancache_get_page(page) == 0) {
-			SetPageUptodate(page);
-			goto confused;
-		}
-
-		/*
-		 * This page will go to BIO.  Do we need to send this
-		 * BIO off first?
-		 */
-		if (bio && (last_block_in_bio != blocks[0] - 1)) {
-		submit_and_realloc:
-			submit_bio(bio);
-			bio = NULL;
-		}
-		if (bio == NULL) {
-			struct fscrypt_ctx *ctx = NULL;
-
-			if (ext4_encrypted_inode(inode) &&
-			    S_ISREG(inode->i_mode)) {
-				ctx = fscrypt_get_ctx(inode, GFP_NOFS);
-				if (IS_ERR(ctx))
-					goto set_error_page;
-			}
-			bio = bio_alloc(GFP_KERNEL,
-				min_t(int, nr_pages, BIO_MAX_PAGES));
-			if (!bio) {
-				if (ctx)
-					fscrypt_release_ctx(ctx);
-				goto set_error_page;
-			}
-			bio_set_dev(bio, bdev);
-			bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9);
-			bio->bi_end_io = mpage_end_io;
-			bio->bi_private = ctx;
-			bio_set_op_attrs(bio, REQ_OP_READ, 0);
-		}
-
-		length = first_hole << blkbits;
-		if (bio_add_page(bio, page, length, 0) < length)
-			goto submit_and_realloc;
-
-		if (((map.m_flags & EXT4_MAP_BOUNDARY) &&
-		     (relative_block == map.m_len)) ||
-		    (first_hole != blocks_per_page)) {
-			submit_bio(bio);
-			bio = NULL;
-		} else
-			last_block_in_bio = blocks[blocks_per_page - 1];
-		goto next_page;
-	confused:
-		if (bio) {
-			submit_bio(bio);
-			bio = NULL;
-		}
-		if (!PageUptodate(page))
-			block_read_full_page(page, ext4_get_block);
-		else
-			unlock_page(page);
-	next_page:
-		if (pages)
-			put_page(page);
-	}
-	BUG_ON(pages && !list_empty(pages));
-	if (bio)
-		submit_bio(bio);
-	return 0;
-}
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index ffbbf05..ee1ddc4f 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -194,13 +194,13 @@  static int fat_writepages(struct address_space *mapping,
 
 static int fat_readpage(struct file *file, struct page *page)
 {
-	return mpage_readpage(page, fat_get_block);
+	return mpage_readpage(page, fat_get_block, NULL);
 }
 
 static int fat_readpages(struct file *file, struct address_space *mapping,
 			 struct list_head *pages, unsigned nr_pages)
 {
-	return mpage_readpages(mapping, pages, nr_pages, fat_get_block);
+	return mpage_readpages(mapping, pages, nr_pages, fat_get_block, NULL);
 }
 
 static void fat_write_failed(struct address_space *mapping, loff_t to)
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index ec3fba7..60df56f 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -1173,13 +1173,14 @@  struct buffer_head *isofs_bread(struct inode *inode, sector_t block)
 
 static int isofs_readpage(struct file *file, struct page *page)
 {
-	return mpage_readpage(page, isofs_get_block);
+	return mpage_readpage(page, isofs_get_block, NULL);
 }
 
 static int isofs_readpages(struct file *file, struct address_space *mapping,
 			struct list_head *pages, unsigned nr_pages)
 {
-	return mpage_readpages(mapping, pages, nr_pages, isofs_get_block);
+	return mpage_readpages(mapping, pages, nr_pages, isofs_get_block,
+			NULL);
 }
 
 static sector_t _isofs_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/mpage.c b/fs/mpage.c
index b7e7f57..c88fdd4 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -30,6 +30,8 @@ 
 #include <linux/backing-dev.h>
 #include <linux/pagevec.h>
 #include <linux/cleancache.h>
+#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
+#include <linux/fscrypt.h>
 #include "internal.h"
 
 /*
@@ -46,9 +48,24 @@ 
  */
 static void mpage_end_io(struct bio *bio)
 {
+	post_process_read_t *post_process;
+	struct fscrypt_ctx *ctx;
 	struct bio_vec *bv;
 	int i;
 
+	if (fscrypt_bio_encrypted(bio)) {
+		ctx = bio->bi_private;
+		post_process = fscrypt_get_post_process(ctx);
+
+		if (bio->bi_status || post_process->process_pages == NULL) {
+			fscrypt_release_ctx(ctx);
+		} else {
+			fscrypt_enqueue_decrypt_bio(ctx, bio,
+						post_process->process_pages);
+			return;
+		}
+	}
+
 	bio_for_each_segment_all(bv, bio, i) {
 		struct page *page = bv->bv_page;
 		page_endio(page, op_is_write(bio_op(bio)),
@@ -146,7 +163,7 @@  static struct bio *
 do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 		sector_t *last_block_in_bio, struct buffer_head *map_bh,
 		unsigned long *first_logical_block, get_block_t get_block,
-		gfp_t gfp)
+		post_process_read_t *post_process, gfp_t gfp)
 {
 	struct inode *inode = page->mapping->host;
 	const unsigned blkbits = inode->i_blkbits;
@@ -278,15 +295,26 @@  do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 
 alloc_new:
 	if (bio == NULL) {
-		if (first_hole == blocks_per_page) {
+		struct fscrypt_ctx *ctx = NULL;
+
+		if (IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode)) {
+			ctx = fscrypt_get_ctx(inode, gfp & GFP_KERNEL);
+			if (IS_ERR(ctx))
+				goto confused;
+			fscrypt_set_post_process(ctx, post_process);
+		} else if (first_hole == blocks_per_page) {
 			if (!bdev_read_page(bdev, blocks[0] << (blkbits - 9),
 								page))
 				goto out;
 		}
 		bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9),
 				min_t(int, nr_pages, BIO_MAX_PAGES), gfp);
-		if (bio == NULL)
+		if (bio == NULL) {
+			if (ctx)
+				fscrypt_release_ctx(ctx);
 			goto confused;
+		}
+		bio->bi_private = ctx;
 	}
 
 	length = first_hole << blkbits;
@@ -309,7 +337,7 @@  do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 	if (bio)
 		bio = mpage_bio_submit(REQ_OP_READ, 0, bio);
 	if (!PageUptodate(page))
-	        block_read_full_page(page, get_block);
+		block_read_full_page(page, get_block, post_process);
 	else
 		unlock_page(page);
 	goto out;
@@ -361,7 +389,8 @@  do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
  */
 int
 mpage_readpages(struct address_space *mapping, struct list_head *pages,
-				unsigned nr_pages, get_block_t get_block)
+		unsigned nr_pages, get_block_t get_block,
+		post_process_read_t *post_process)
 {
 	struct bio *bio = NULL;
 	unsigned page_idx;
@@ -384,7 +413,8 @@  mpage_readpages(struct address_space *mapping, struct list_head *pages,
 					nr_pages - page_idx,
 					&last_block_in_bio, &map_bh,
 					&first_logical_block,
-					get_block, gfp);
+					get_block, post_process,
+					gfp);
 		}
 		put_page(page);
 	}
@@ -398,7 +428,8 @@  EXPORT_SYMBOL(mpage_readpages);
 /*
  * This isn't called much at all
  */
-int mpage_readpage(struct page *page, get_block_t get_block)
+int mpage_readpage(struct page *page, get_block_t get_block,
+		post_process_read_t *post_process)
 {
 	struct bio *bio = NULL;
 	sector_t last_block_in_bio = 0;
@@ -409,7 +440,8 @@  int mpage_readpage(struct page *page, get_block_t get_block)
 	map_bh.b_state = 0;
 	map_bh.b_size = 0;
 	bio = do_mpage_readpage(bio, page, 1, &last_block_in_bio,
-			&map_bh, &first_logical_block, get_block, gfp);
+				&map_bh, &first_logical_block, get_block,
+				post_process, gfp);
 	if (bio)
 		mpage_bio_submit(REQ_OP_READ, 0, bio);
 	return 0;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0ab824f..74591b8 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1405,7 +1405,7 @@  xfs_vm_readpage(
 	struct page		*page)
 {
 	trace_xfs_vm_readpage(page->mapping->host, 1);
-	return mpage_readpage(page, xfs_get_blocks);
+	return mpage_readpage(page, xfs_get_blocks, NULL);
 }
 
 STATIC int
@@ -1416,7 +1416,7 @@  xfs_vm_readpages(
 	unsigned		nr_pages)
 {
 	trace_xfs_vm_readpages(mapping->host, nr_pages);
-	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
+	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks, NULL);
 }
 
 /*
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index c2fbd97..3718c20 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -224,7 +224,7 @@  int block_write_full_page(struct page *page, get_block_t *get_block,
 int __block_write_full_page(struct inode *inode, struct page *page,
 			get_block_t *get_block, struct writeback_control *wbc,
 			bh_end_io_t *handler);
-int block_read_full_page(struct page*, get_block_t*);
+int block_read_full_page(struct page*, get_block_t*, post_process_read_t*);
 int block_is_partially_uptodate(struct page *page, unsigned long from,
 				unsigned long count);
 int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0eedf74..40e3537 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -83,6 +83,10 @@  typedef int (get_block_t)(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create);
 typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 			ssize_t bytes, void *private);
+typedef struct post_process_read {
+	void (*process_block)(struct work_struct *);
+	void (*process_pages)(struct work_struct *);
+} post_process_read_t;
 
 #define MAY_EXEC		0x00000001
 #define MAY_WRITE		0x00000002
diff --git a/include/linux/fscrypt_notsupp.h b/include/linux/fscrypt_notsupp.h
index 9770be37..ceac8c8 100644
--- a/include/linux/fscrypt_notsupp.h
+++ b/include/linux/fscrypt_notsupp.h
@@ -168,9 +168,44 @@  static inline void fscrypt_decrypt_bio(struct bio *bio)
 {
 }
 
+static inline void fscrypt_complete_block(struct work_struct *work)
+{
+}
+
+static inline void fscrypt_complete_pages(struct work_struct *work)
+{
+}
+
 static inline void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx,
-					       struct bio *bio)
+					struct bio *bio,
+					void (*process_bio)(struct work_struct *))
+{
+}
+
+static inline bool fscrypt_bio_encrypted(struct bio *bio)
+{
+	return false;
+}
+
+static inline post_process_read_t *
+fscrypt_get_post_process(struct fscrypt_ctx *ctx)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
+static inline void
+fscrypt_set_post_process(struct fscrypt_ctx *ctx, post_process_read_t *post_process)
+{
+}
+
+static inline void
+fscrypt_set_bh(struct fscrypt_ctx *ctx, struct buffer_head *bh)
+{
+}
+
+static inline struct buffer_head *fscrypt_get_bh(struct fscrypt_ctx *ctx)
 {
+	return ERR_PTR(-EOPNOTSUPP);
 }
 
 static inline void fscrypt_pullback_bio_page(struct page **page, bool restore)
diff --git a/include/linux/fscrypt_supp.h b/include/linux/fscrypt_supp.h
index 2c9a86a..b946eca 100644
--- a/include/linux/fscrypt_supp.h
+++ b/include/linux/fscrypt_supp.h
@@ -39,8 +39,10 @@  struct fscrypt_ctx {
 			struct page *control_page;	/* Original page  */
 		} w;
 		struct {
+			struct buffer_head *bh;
 			struct bio *bio;
 			struct work_struct work;
+			post_process_read_t post_process;
 		} r;
 		struct list_head free_list;	/* Free list */
 	};
@@ -190,8 +192,17 @@  static inline bool fscrypt_match_name(const struct fscrypt_name *fname,
 
 /* bio.c */
 extern void fscrypt_decrypt_bio(struct bio *);
+extern void fscrypt_complete_pages(struct work_struct *work);
+extern void fscrypt_complete_block(struct work_struct *work);
 extern void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx,
-					struct bio *bio);
+					struct bio *bio,
+					void (*process_bio)(struct work_struct *));
+extern post_process_read_t *fscrypt_get_post_process(struct fscrypt_ctx *ctx);
+extern void fscrypt_set_post_process(struct fscrypt_ctx *ctx,
+				post_process_read_t *post_process);
+extern struct buffer_head *fscrypt_get_bh(struct fscrypt_ctx *ctx);
+extern void fscrypt_set_bh(struct fscrypt_ctx *ctx, struct buffer_head *bh);
+extern bool fscrypt_bio_encrypted(struct bio *bio);
 extern void fscrypt_pullback_bio_page(struct page **, bool);
 extern int fscrypt_zeroout_range(const struct inode *, pgoff_t, sector_t,
 				 unsigned int);
diff --git a/include/linux/mpage.h b/include/linux/mpage.h
index 001f1fc..da2526a 100644
--- a/include/linux/mpage.h
+++ b/include/linux/mpage.h
@@ -15,8 +15,10 @@ 
 struct writeback_control;
 
 int mpage_readpages(struct address_space *mapping, struct list_head *pages,
-				unsigned nr_pages, get_block_t get_block);
-int mpage_readpage(struct page *page, get_block_t get_block);
+		unsigned nr_pages, get_block_t get_block,
+		post_process_read_t *post_process);
+int mpage_readpage(struct page *page, get_block_t get_block,
+		post_process_read_t *post_process);
 int mpage_writepages(struct address_space *mapping,
 		struct writeback_control *wbc, get_block_t get_block);
 int mpage_writepage(struct page *page, get_block_t *get_block,