[RFC,02/10] fs-verity: add data verification hooks for ->readpages()

Message ID 20180824161642.1144-3-ebiggers@kernel.org
State Superseded
Headers show
Series
  • fs-verity: filesystem-level integrity protection
Related show

Commit Message

Eric Biggers Aug. 24, 2018, 4:16 p.m.
From: Eric Biggers <ebiggers@google.com>

Add functions that verify data pages that have been read from a
fs-verity file, against that file's Merkle tree.  These will be called
from filesystems' ->readpage() and ->readpages() methods.

Since data verification can block, a workqueue is provided for these
methods to enqueue verification work from their bio completion callback.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/verity/Makefile           |   2 +-
 fs/verity/fsverity_private.h |   3 +
 fs/verity/setup.c            |  26 ++-
 fs/verity/verify.c           | 310 +++++++++++++++++++++++++++++++++++
 include/linux/fsverity.h     |  23 +++
 5 files changed, 362 insertions(+), 2 deletions(-)
 create mode 100644 fs/verity/verify.c

Comments

Gao Xiang Aug. 25, 2018, 2:29 a.m. | #1
Hi,

On 2018/8/25 0:16, Eric Biggers wrote:
> +/**
> + * fsverity_verify_page - verify a data page
> + *
> + * Verify a page that has just been read from a file against that file's Merkle
> + * tree.  The page is assumed to be a pagecache page.
> + *
> + * Return: true if the page is valid, else false.
> + */
> +bool fsverity_verify_page(struct page *data_page)
> +{
> +	struct inode *inode = data_page->mapping->host;
> +	const struct fsverity_info *vi = get_fsverity_info(inode);
> +	struct ahash_request *req;
> +	bool valid;
> +
> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
> +	if (unlikely(!req))
> +		return false;
> +
> +	valid = verify_page(inode, vi, req, data_page);
> +
> +	ahash_request_free(req);
> +
> +	return valid;
> +}
> +EXPORT_SYMBOL_GPL(fsverity_verify_page);
> +
> +/**
> + * fsverity_verify_bio - verify a 'read' bio that has just completed
> + *
> + * Verify a set of pages that have just been read from a file against that
> + * file's Merkle tree.  The pages are assumed to be pagecache pages.  Pages that
> + * fail verification are set to the Error state.  Verification is skipped for
> + * pages already in the Error state, e.g. due to fscrypt decryption failure.
> + */
> +void fsverity_verify_bio(struct bio *bio)
> +{
> +	struct inode *inode = bio_first_page_all(bio)->mapping->host;
> +	const struct fsverity_info *vi = get_fsverity_info(inode);
> +	struct ahash_request *req;
> +	struct bio_vec *bv;
> +	int i;
> +
> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
> +	if (unlikely(!req)) {
> +		bio_for_each_segment_all(bv, bio, i)
> +			SetPageError(bv->bv_page);
> +		return;
> +	}
> +
> +	bio_for_each_segment_all(bv, bio, i) {
> +		struct page *page = bv->bv_page;
> +
> +		if (!PageError(page) && !verify_page(inode, vi, req, page))
> +			SetPageError(page);
> +	}
> +
> +	ahash_request_free(req);
> +}
> +EXPORT_SYMBOL_GPL(fsverity_verify_bio);

Out of curiosity, I quickly scanned the fs-verity source code and some minor question out there....

If something is wrong, please point out, thanks in advance...

My first question is that 'Is there any way to skip to verify pages in a bio?'
I am thinking about
If metadata and data page are mixed in a filesystem of such kind, they could submit together in a bio, but metadata could be unsuitable for such kind of verification.

The second question is related to the first question --- 'Is there any way to verify a partial page?'
Take scalability into consideration, some files could be totally inlined or partially inlined in metadata.
Is there any way to deal with them in per-file approach? at least --- support for the interface?

At last, I hope filesystems could select the on-disk position of hash tree and 'struct fsverity_descriptor'
rather than fixed in the end of verity files...I think if fs-verity preparing such support and interfaces could be better.....hmmm... :(

Thanks,
Gao Xiang
Theodore Ts'o Aug. 25, 2018, 3:45 a.m. | #2
On Sat, Aug 25, 2018 at 10:29:26AM +0800, Gao Xiang wrote:
> 
> My first question is that 'Is there any way to skip to verify pages in a bio?'
> I am thinking about
> If metadata and data page are mixed in a filesystem of such kind, they could submit together in a bio, but metadata could be unsuitable for such kind of verification.
> 
> The second question is related to the first question --- 'Is there any way to verify a partial page?'
> Take scalability into consideration, some files could be totally inlined or partially inlined in metadata.
> Is there any way to deal with them in per-file approach? at least --- support for the interface?

A requirement of both fscrypt and fsverity is that is that block size
== page size, and that all data is stored in blocks.  Inline data is
not supported.

The files that are intended for use with fsverity are large files
(such as APK files), so optimizing for files smaller than a block was
not a design goal.

					- Ted
Gao Xiang Aug. 25, 2018, 4 a.m. | #3
Hi Ted,

On 2018/8/25 11:45, Theodore Y. Ts'o wrote:
> On Sat, Aug 25, 2018 at 10:29:26AM +0800, Gao Xiang wrote:
>> My first question is that 'Is there any way to skip to verify pages in a bio?'
>> I am thinking about
>> If metadata and data page are mixed in a filesystem of such kind, they could submit together in a bio, but metadata could be unsuitable for such kind of verification.
>>
>> The second question is related to the first question --- 'Is there any way to verify a partial page?'
>> Take scalability into consideration, some files could be totally inlined or partially inlined in metadata.
>> Is there any way to deal with them in per-file approach? at least --- support for the interface?
> A requirement of both fscrypt and fsverity is that is that block size
> == page size, and that all data is stored in blocks.  Inline data is
> not supported.
> 
> The files that are intended for use with fsverity are large files
> (such as APK files), so optimizing for files smaller than a block was
> not a design goal.

Thanks for your quickly reply. :)

I had seen the background of why Google/Android introduces fs-verity before.

> 

But I have some consideration than the current implementation.... (if it is suitable to discuss, thanks...)

1) Since it is the libfs-like library, I think bio-strict is too strict for its future fs users.

bios could be already organized in filesystem-specific way, which could include some other pages that is unnecessary to be verified.

I could give some example, if some filesystem organizes its bios for decompression, and some data exist in metadata.
It could be hard to use this libfs-like fsverity interface.

2) My last question
"At last, I hope filesystems could select the on-disk position of hash tree and 'struct fsverity_descriptor'
rather than fixed in the end of verity files...I think if fs-verity preparing such support and interfaces could be better....."

is also for some files partially or totally encoded (eg. compressed, or whatever ...)

I think the hash tree is unnecessary to be compressed...so I think it could be better that it can be selected by users (filesystems of course).

Thanks,
Gao Xiang.

> 					- Ted
Eric Biggers Aug. 25, 2018, 4:16 a.m. | #4
Hi Gao,

On Sat, Aug 25, 2018 at 10:29:26AM +0800, Gao Xiang wrote:
> Hi,
> 
> On 2018/8/25 0:16, Eric Biggers wrote:
> > +/**
> > + * fsverity_verify_page - verify a data page
> > + *
> > + * Verify a page that has just been read from a file against that file's Merkle
> > + * tree.  The page is assumed to be a pagecache page.
> > + *
> > + * Return: true if the page is valid, else false.
> > + */
> > +bool fsverity_verify_page(struct page *data_page)
> > +{
> > +	struct inode *inode = data_page->mapping->host;
> > +	const struct fsverity_info *vi = get_fsverity_info(inode);
> > +	struct ahash_request *req;
> > +	bool valid;
> > +
> > +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
> > +	if (unlikely(!req))
> > +		return false;
> > +
> > +	valid = verify_page(inode, vi, req, data_page);
> > +
> > +	ahash_request_free(req);
> > +
> > +	return valid;
> > +}
> > +EXPORT_SYMBOL_GPL(fsverity_verify_page);
> > +
> > +/**
> > + * fsverity_verify_bio - verify a 'read' bio that has just completed
> > + *
> > + * Verify a set of pages that have just been read from a file against that
> > + * file's Merkle tree.  The pages are assumed to be pagecache pages.  Pages that
> > + * fail verification are set to the Error state.  Verification is skipped for
> > + * pages already in the Error state, e.g. due to fscrypt decryption failure.
> > + */
> > +void fsverity_verify_bio(struct bio *bio)
> > +{
> > +	struct inode *inode = bio_first_page_all(bio)->mapping->host;
> > +	const struct fsverity_info *vi = get_fsverity_info(inode);
> > +	struct ahash_request *req;
> > +	struct bio_vec *bv;
> > +	int i;
> > +
> > +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
> > +	if (unlikely(!req)) {
> > +		bio_for_each_segment_all(bv, bio, i)
> > +			SetPageError(bv->bv_page);
> > +		return;
> > +	}
> > +
> > +	bio_for_each_segment_all(bv, bio, i) {
> > +		struct page *page = bv->bv_page;
> > +
> > +		if (!PageError(page) && !verify_page(inode, vi, req, page))
> > +			SetPageError(page);
> > +	}
> > +
> > +	ahash_request_free(req);
> > +}
> > +EXPORT_SYMBOL_GPL(fsverity_verify_bio);
> 
> Out of curiosity, I quickly scanned the fs-verity source code and some minor question out there....
> 
> If something is wrong, please point out, thanks in advance...
> 
> My first question is that 'Is there any way to skip to verify pages in a bio?'
> I am thinking about
> If metadata and data page are mixed in a filesystem of such kind, they could submit together in a bio, but metadata could be unsuitable for such kind of verification.
> 

Pages below i_size are verified, pages above are not.

With my patches, ext4 and f2fs won't actually submit pages in both areas in the
same bio, and they won't call the fs-verity verification function for bios in
the data area.  But even if they did, there's also a check in verify_page() that
skips the verification if the page is above i_size.

> The second question is related to the first question --- 'Is there any way to verify a partial page?'
> Take scalability into consideration, some files could be totally inlined or partially inlined in metadata.
> Is there any way to deal with them in per-file approach? at least --- support for the interface?

Well, one problem is that inline data has its own separate I/O path; see
ext4_readpage_inline() and f2fs_read_inline_data().  So it would be a large
effort to support features like encryption and verity which require
postprocessing after reads, and probably not worthwhile especially for verity
which is primarily intended for large files.

A somewhat separate question is whether the zero padding to a block boundary
after i_size, before the Merkle tree begins, is needed.  The answer is yes,
since mixing data and metadata in the same page would cause problems.  First,
userspace would be able to mmap the page and see some of the metadata rather
than zeroes.  That's not a huge problem, but it breaks the standard behavior.
Second, any page containing data cannot be set Uptodate until it's been
verified.  So, a special case would be needed to handle reading the part of the
metadata that's located in a data page.

> At last, I hope filesystems could select the on-disk position of hash tree and 'struct fsverity_descriptor'
> rather than fixed in the end of verity files...I think if fs-verity preparing such support and interfaces could be better.....hmmm... :(

In theory it would be a much cleaner design to store verity metadata separately
from the data.  But the Merkle tree can be very large.  For example, a 1 GB file
using SHA-512 would have a 16.6 MB Merkle tree.  So the Merkle tree can't be an
extended attribute, since the xattrs API requires xattrs to be small (<= 64 KB),
and most filesystems further limit xattr sizes in their on-disk format to as
little as 4 KB.  Furthermore, even if both of these limits were to be increased,
the xattrs functions (both the syscalls, and the internal functions that
filesystems have) are all based around getting/setting the entire xattr value.

Also when used with fscrypt, we want the Merkle tree and fsverity_descriptor to
be encrypted, so they doesn't leak plaintext hashes.  And we want the Merkle
tree to be paged into memory, just like the file contents, to take advantage of
the usual Linux memory management.

What we really need is *streams*, like NTFS has.  But the filesystems we're
targetting don't support streams, nor does the Linux syscall interface have any
API for accessing streams, nor does the VFS support them.

Adding streams support to all those things would be a huge multi-year effort,
controversial, and almost certainly not worth it just for fs-verity.

So simply storing the verity metadata past i_size seems like the best solution
for now.

That being said, in the future we could pretty easily swap out the calls to
read_mapping_page() with something else if a particular filesystem wanted to
store the metadata somewhere else.  We actually even originally had a function
->read_metadata_page() in the filesystem's fsverity_operations, but it turned
out to be unnecessary and I replaced it with directly calling
read_mapping_page(), but it could be changed back at any time.

- Eric
Theodore Ts'o Aug. 25, 2018, 5:06 a.m. | #5
On Sat, Aug 25, 2018 at 12:00:04PM +0800, Gao Xiang wrote:
> 
> But I have some consideration than the current implementation.... (if it is suitable to discuss, thanks...)
> 
> 1) Since it is the libfs-like library, I think bio-strict is too strict for its future fs users.

Well, it's always possible to potentially expand fs-crypt and
fs-verity to be more flexible in the future.  For example, Chandan
Rajendra from IBM has been working on a set of patches to support file
systems that have a block size smaller than a page size.  This turns
out to be important on Power architecture with 64k page sizes.

Fundamentally, a Merkle tree is a data structure that works on fixed
size chunks, both for the data blocks and the hash tree.  The natural
size to use is the page size, since data is cached in the page cache.

So a file system can be store data in any number of places, but
ultimately, most interesting file systems are ones where you can
execute ELF binaries out of said file system with demand paging, which
in turn means that mmap has to work, which in turn means that file
data will be stored in the page cache.  This is true of f2fs, btrfs,
ext4, xfs, etc.  So basically, fs-verity will be verifying the page
before it is marked as uptodate.  Right now, all of the file systems
that we are interested in trigger the call to ask fsverity to verify
the page via the bio endio callback function.

Some other file systems could theoretically call that function after
assembling the page from a dozen random locations in a b-tree.  In
that case, it could call fsverity after assembling the page in the
page cache.  But I'd suggest worrying about it when such a file system
comes out of the woodwork, and someone is willing to do the work to
integrate fserity in that file system.

> 2) My last question
> "At last, I hope filesystems could select the on-disk position of hash tree and 'struct fsverity_descriptor'
> rather than fixed in the end of verity files...I think if fs-verity preparing such support and interfaces could be better....."
> is also for some files partially or totally encoded (eg. compressed, or whatever ...)

Well, the userspace interface for instantiating a fs-verity file is
that it writes the file data with the fs-verity metadata (which
consists of the Merkle tree with a fs-verity header at the end of the
file).  The program (which might be a package manager such as dpkg or
rpm) would then call an ioctl which would cause the file system to
read the fs-verity header and make only the file data visible, and the
file system would the verify the data as it is read into the page
cache.

That is the userspace API to the fs-verity system.  That has to remain
the same, regardless of which file system is in use.  We need a common
interface so that whether it is the Android APK management system, or
some distribution package manager, can instantiate fs-verity protected
file the same way regardless of the file system in use.

There is a very simple, easy way to implement this in the file system,
and f2fs and ext4 both do it that way --- which is to simply change
the i_size exposed to the userspace when you stat the file, and we use
the file system's existing mechanism to map logical block numbers to
physical block numbers to read the Merkle tree.

If the file system wants to import that file data and store it
somewhere else random --- perhaps it breaks it apart into a zillion
tiny pieces and puts it in a b-tree --- a file system implementor is
free to do that.  I personally think it is a completely insane thing
to do, but there is nothing in the fs-verity design that *prohibits*
that.

Regards,

						- Ted
Gao Xiang Aug. 25, 2018, 6:31 a.m. | #6
Hi Eric,

Thanks for your detailed reply.

My english is not quite well, I could not type logically and quickly like you and could use some words improperly,
I just want to express my personal concern, please understand, thanks. :)

On 2018/8/25 12:16, Eric Biggers wrote:
> Hi Gao,
> 
> On Sat, Aug 25, 2018 at 10:29:26AM +0800, Gao Xiang wrote:
>> Hi,
>>
>> On 2018/8/25 0:16, Eric Biggers wrote:
>>> +/**
>>> + * fsverity_verify_page - verify a data page
>>> + *
>>> + * Verify a page that has just been read from a file against that file's Merkle
>>> + * tree.  The page is assumed to be a pagecache page.
>>> + *
>>> + * Return: true if the page is valid, else false.
>>> + */
>>> +bool fsverity_verify_page(struct page *data_page)
>>> +{
>>> +	struct inode *inode = data_page->mapping->host;
>>> +	const struct fsverity_info *vi = get_fsverity_info(inode);
>>> +	struct ahash_request *req;
>>> +	bool valid;
>>> +
>>> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
>>> +	if (unlikely(!req))
>>> +		return false;
>>> +
>>> +	valid = verify_page(inode, vi, req, data_page);
>>> +
>>> +	ahash_request_free(req);
>>> +
>>> +	return valid;
>>> +}
>>> +EXPORT_SYMBOL_GPL(fsverity_verify_page);
>>> +
>>> +/**
>>> + * fsverity_verify_bio - verify a 'read' bio that has just completed
>>> + *
>>> + * Verify a set of pages that have just been read from a file against that
>>> + * file's Merkle tree.  The pages are assumed to be pagecache pages.  Pages that
>>> + * fail verification are set to the Error state.  Verification is skipped for
>>> + * pages already in the Error state, e.g. due to fscrypt decryption failure.
>>> + */
>>> +void fsverity_verify_bio(struct bio *bio)
>>> +{
>>> +	struct inode *inode = bio_first_page_all(bio)->mapping->host;
>>> +	const struct fsverity_info *vi = get_fsverity_info(inode);
>>> +	struct ahash_request *req;
>>> +	struct bio_vec *bv;
>>> +	int i;
>>> +
>>> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
>>> +	if (unlikely(!req)) {
>>> +		bio_for_each_segment_all(bv, bio, i)
>>> +			SetPageError(bv->bv_page);
>>> +		return;
>>> +	}
>>> +
>>> +	bio_for_each_segment_all(bv, bio, i) {
>>> +		struct page *page = bv->bv_page;
>>> +
>>> +		if (!PageError(page) && !verify_page(inode, vi, req, page))
>>> +			SetPageError(page);
>>> +	}
>>> +
>>> +	ahash_request_free(req);
>>> +}
>>> +EXPORT_SYMBOL_GPL(fsverity_verify_bio);
>>
>> Out of curiosity, I quickly scanned the fs-verity source code and some minor question out there....
>>
>> If something is wrong, please point out, thanks in advance...
>>
>> My first question is that 'Is there any way to skip to verify pages in a bio?'
>> I am thinking about
>> If metadata and data page are mixed in a filesystem of such kind, they could submit together in a bio, but metadata could be unsuitable for such kind of verification.
>>
> 
> Pages below i_size are verified, pages above are not.
> 
> With my patches, ext4 and f2fs won't actually submit pages in both areas in the
> same bio, and they won't call the fs-verity verification function for bios in
> the data area.  But even if they did, there's also a check in verify_page() that

I think you mean the hash area?
Yes, I understand your design. It is a wonderful job for ext4/f2fs for now as Ted said.

> skips the verification if the page is above i_size.
>

I think it could not be as simple as you said for all cases.

If some fs submits contiguous access with different MAPPING (something like mixed FILE_MAPPING and META_MAPPING),
their page->index are actually unreliable(could be logical page index for FILE_MAPPING,and physical page index for META_MAPPING),
and data are organized by design in multi bios for a fs-specific use (such as compresssion).

You couldn't do such verification `if the page is above i_size' and it could be hard to integrate somehow.

>> The second question is related to the first question --- 'Is there any way to verify a partial page?'
>> Take scalability into consideration, some files could be totally inlined or partially inlined in metadata.
>> Is there any way to deal with them in per-file approach? at least --- support for the interface?
> 
> Well, one problem is that inline data has its own separate I/O path; see
> ext4_readpage_inline() and f2fs_read_inline_data().  So it would be a large
> effort to support features like encryption and verity which require
> postprocessing after reads, and probably not worthwhile especially for verity
> which is primarily intended for large files.

Yes, for the current user ext4 and f2fs, it is absolutely wonderful.


I have to admit I am curious about Google fs-verity roadmap for the future Android
(I have to identify whether it is designed to replace dm-verity, currently I think is no)

since it is very important whether our EROFS should support fs-verity or not in the near future...


I could give some EROFS use case if you have some time to discuss.

EROFS uses a more aggressive inline approach, which means it not only inline data for small files.
It is designed to inline the last page, which are reasonable small (eg. only a few byte) to inline for all files, eg.

                                                                 IN FILE_MAPPING
IN META_MAPPING                                                  blk-aligned
+--------------------------------------|                         +--------+--------+     +----------+.......+
|inode A+inlined-last data.. inode B...|                         | page 0 | page 1 | ... | page n-1 . page n.
+--------------------------------------+                         +--------+--------+     +----------+.......+
         |------------------------------------------------------------------------------------------|\

In priciple, this approach could be also used for read-write file systems to save more storage space.
I think it is still easy for uncompressed file if you do the zero padding as you said below.

But if considering _compression_.....especially compression in VLE, I think it should not rely on `bio' directly, because,
1) endio with compressed data rather than FILE_MAPPING plain data, these pages which could from META_MAPPING
(for caching compressed page on purpose) or FILE_MAPPING(for decompressing in-place to save redundant META_MAPPING memory).

I think it should be decompress at first and then fs-verity, but there could be more filepages other than compresssed pages joined
(eg. 128kb->32kb, we submit 8 pages but decompress end with 32 pages), it should not be the original bio any more...
(actually I think it is not the bio concept anymore...)

2) EROFS VLE is more complicated, we could end a bio with a compressed page but decompress a partial file page, eg.
    +-------------------+--------------------+
... | compressed page X |compressed page X+1 |
    +-------------------|--------------------+
            end of bio Y/     bio Y+1
                 \      |      /
                  +-------------------------+
                  |   plain data (file page)|
                  +-------------------------+
which means a bio could only decompress partial data of a page, the page could be Uptodate by two bios rather than one,
I have no idea how to fs-verity like this...

`it could call fsverity after assembling the page in the page cache.` as Ted said in that case.

> 
> A somewhat separate question is whether the zero padding to a block boundary
> after i_size, before the Merkle tree begins, is needed.  The answer is yes,
> since mixing data and metadata in the same page would cause problems.  First,
> userspace would be able to mmap the page and see some of the metadata rather
> than zeroes.  That's not a huge problem, but it breaks the standard behavior.
> Second, any page containing data cannot be set Uptodate until it's been
> verified.  So, a special case would be needed to handle reading the part of the
> metadata that's located in a data page.

Yes, after I just thinked over, I think there should be a zero padding to a block boundary
as you said due to Uptodate and mmap scenerio if you directly use its inode(file) mapping for verification.


> 
>> At last, I hope filesystems could select the on-disk position of hash tree and 'struct fsverity_descriptor'
>> rather than fixed in the end of verity files...I think if fs-verity preparing such support and interfaces could be better.....hmmm... :(
> 
> In theory it would be a much cleaner design to store verity metadata separately
> from the data.  But the Merkle tree can be very large.  For example, a 1 GB file
> using SHA-512 would have a 16.6 MB Merkle tree.  So the Merkle tree can't be an
> extended attribute, since the xattrs API requires xattrs to be small (<= 64 KB),
> and most filesystems further limit xattr sizes in their on-disk format to as
> little as 4 KB.  Furthermore, even if both of these limits were to be increased,
> the xattrs functions (both the syscalls, and the internal functions that
> filesystems have) are all based around getting/setting the entire xattr value.
> 
> Also when used with fscrypt, we want the Merkle tree and fsverity_descriptor to
> be encrypted, so they doesn't leak plaintext hashes.  And we want the Merkle
> tree to be paged into memory, just like the file contents, to take advantage of
> the usual Linux memory management.
> 
> What we really need is *streams*, like NTFS has.  But the filesystems we're
> targetting don't support streams, nor does the Linux syscall interface have any
> API for accessing streams, nor does the VFS support them.
> 
> Adding streams support to all those things would be a huge multi-year effort,
> controversial, and almost certainly not worth it just for fs-verity.
> 
> So simply storing the verity metadata past i_size seems like the best solution
> for now.
> 
> That being said, in the future we could pretty easily swap out the calls to
> read_mapping_page() with something else if a particular filesystem wanted to
> store the metadata somewhere else.  We actually even originally had a function
> ->read_metadata_page() in the filesystem's fsverity_operations, but it turned
> out to be unnecessary and I replaced it with directly calling
> read_mapping_page(), but it could be changed back at any time.

OK, I got it.

I have to look into that and think over again. Thanks for your reply again in the end. :)

Thanks,
Gao Xiang

> 
> - Eric
>
Eric Biggers Aug. 25, 2018, 7:18 a.m. | #7
Hi Gao,

On Sat, Aug 25, 2018 at 02:31:16PM +0800, Gao Xiang wrote:
> Hi Eric,
> 
> Thanks for your detailed reply.
> 
> My english is not quite well, I could not type logically and quickly like you and could use some words improperly,
> I just want to express my personal concern, please understand, thanks. :)
> 
> On 2018/8/25 12:16, Eric Biggers wrote:
> > Hi Gao,
> > 
> > On Sat, Aug 25, 2018 at 10:29:26AM +0800, Gao Xiang wrote:
> >> Hi,
> >>
> >> On 2018/8/25 0:16, Eric Biggers wrote:
> >>> +/**
> >>> + * fsverity_verify_page - verify a data page
> >>> + *
> >>> + * Verify a page that has just been read from a file against that file's Merkle
> >>> + * tree.  The page is assumed to be a pagecache page.
> >>> + *
> >>> + * Return: true if the page is valid, else false.
> >>> + */
> >>> +bool fsverity_verify_page(struct page *data_page)
> >>> +{
> >>> +	struct inode *inode = data_page->mapping->host;
> >>> +	const struct fsverity_info *vi = get_fsverity_info(inode);
> >>> +	struct ahash_request *req;
> >>> +	bool valid;
> >>> +
> >>> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
> >>> +	if (unlikely(!req))
> >>> +		return false;
> >>> +
> >>> +	valid = verify_page(inode, vi, req, data_page);
> >>> +
> >>> +	ahash_request_free(req);
> >>> +
> >>> +	return valid;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(fsverity_verify_page);
> >>> +
> >>> +/**
> >>> + * fsverity_verify_bio - verify a 'read' bio that has just completed
> >>> + *
> >>> + * Verify a set of pages that have just been read from a file against that
> >>> + * file's Merkle tree.  The pages are assumed to be pagecache pages.  Pages that
> >>> + * fail verification are set to the Error state.  Verification is skipped for
> >>> + * pages already in the Error state, e.g. due to fscrypt decryption failure.
> >>> + */
> >>> +void fsverity_verify_bio(struct bio *bio)
> >>> +{
> >>> +	struct inode *inode = bio_first_page_all(bio)->mapping->host;
> >>> +	const struct fsverity_info *vi = get_fsverity_info(inode);
> >>> +	struct ahash_request *req;
> >>> +	struct bio_vec *bv;
> >>> +	int i;
> >>> +
> >>> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
> >>> +	if (unlikely(!req)) {
> >>> +		bio_for_each_segment_all(bv, bio, i)
> >>> +			SetPageError(bv->bv_page);
> >>> +		return;
> >>> +	}
> >>> +
> >>> +	bio_for_each_segment_all(bv, bio, i) {
> >>> +		struct page *page = bv->bv_page;
> >>> +
> >>> +		if (!PageError(page) && !verify_page(inode, vi, req, page))
> >>> +			SetPageError(page);
> >>> +	}
> >>> +
> >>> +	ahash_request_free(req);
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(fsverity_verify_bio);
> >>
> >> Out of curiosity, I quickly scanned the fs-verity source code and some minor question out there....
> >>
> >> If something is wrong, please point out, thanks in advance...
> >>
> >> My first question is that 'Is there any way to skip to verify pages in a bio?'
> >> I am thinking about
> >> If metadata and data page are mixed in a filesystem of such kind, they could submit together in a bio, but metadata could be unsuitable for such kind of verification.
> >>
> > 
> > Pages below i_size are verified, pages above are not.
> > 
> > With my patches, ext4 and f2fs won't actually submit pages in both areas in the
> > same bio, and they won't call the fs-verity verification function for bios in
> > the data area.  But even if they did, there's also a check in verify_page() that
> 
> I think you mean the hash area?

Yes, I meant the hash area.

> > skips the verification if the page is above i_size.
> >
> 
> I think it could not be as simple as you said for all cases.
> 
> If some fs submits contiguous access with different MAPPING (something like
> mixed FILE_MAPPING and META_MAPPING), their page->index are actually
> unreliable(could be logical page index for FILE_MAPPING,and physical page
> index for META_MAPPING), and data are organized by design in multi bios for a
> fs-specific use (such as compresssion).
> 
> You couldn't do such verification `if the page is above i_size' and it could
> be hard to integrate somehow.

We do have to be very careful here, but the same restriction already exists with
fscrypt which both f2fs and ext4 already support too.  With fscrypt, each page
is decrypted with the key from page->mapping->host->i_crypt_info and the
initialization vector from page->index.  With fs-verity, each page is verified
using the Merkle tree state from page->mapping->host->i_verify_info and the
block location from page->index.  So, they are very similar.

On f2fs, any pages submitted via META_MAPPING just skip both fscrypt and
fs-verity since the "meta_inode" doesn't have either feature enabled.  That's
done intentionally, so that garbage collection can move the blocks on-disk.
Regular reads aren't done via META_MAPPING.

> 
> >> The second question is related to the first question --- 'Is there any way to verify a partial page?'
> >> Take scalability into consideration, some files could be totally inlined or partially inlined in metadata.
> >> Is there any way to deal with them in per-file approach? at least --- support for the interface?
> > 
> > Well, one problem is that inline data has its own separate I/O path; see
> > ext4_readpage_inline() and f2fs_read_inline_data().  So it would be a large
> > effort to support features like encryption and verity which require
> > postprocessing after reads, and probably not worthwhile especially for verity
> > which is primarily intended for large files.
> 
> Yes, for the current user ext4 and f2fs, it is absolutely wonderful.
> 
> 
> I have to admit I am curious about Google fs-verity roadmap for the future Android
> (I have to identify whether it is designed to replace dm-verity, currently I think is no)
> 
> since it is very important whether our EROFS should support fs-verity or not in the near future...
> 
> 
> I could give some EROFS use case if you have some time to discuss.
> 
> EROFS uses a more aggressive inline approach, which means it not only inline data for small files.
> It is designed to inline the last page, which are reasonable small (eg. only a few byte) to inline for all files, eg.
> 
>                                                                  IN FILE_MAPPING
> IN META_MAPPING                                                  blk-aligned
> +--------------------------------------|                         +--------+--------+     +----------+.......+
> |inode A+inlined-last data.. inode B...|                         | page 0 | page 1 | ... | page n-1 . page n.
> +--------------------------------------+                         +--------+--------+     +----------+.......+
>          |------------------------------------------------------------------------------------------|\
> 
> In priciple, this approach could be also used for read-write file systems to save more storage space.
> I think it is still easy for uncompressed file if you do the zero padding as you said below.
> 
> But if considering _compression_.....especially compression in VLE, I think it should not rely on `bio' directly, because,
> 1) endio with compressed data rather than FILE_MAPPING plain data, these pages which could from META_MAPPING
> (for caching compressed page on purpose) or FILE_MAPPING(for decompressing in-place to save redundant META_MAPPING memory).
> 
> I think it should be decompress at first and then fs-verity, but there could be more filepages other than compresssed pages joined
> (eg. 128kb->32kb, we submit 8 pages but decompress end with 32 pages), it should not be the original bio any more...
> (actually I think it is not the bio concept anymore...)
> 
> 2) EROFS VLE is more complicated, we could end a bio with a compressed page but decompress a partial file page, eg.
>     +-------------------+--------------------+
> ... | compressed page X |compressed page X+1 |
>     +-------------------|--------------------+
>             end of bio Y/     bio Y+1
>                  \      |      /
>                   +-------------------------+
>                   |   plain data (file page)|
>                   +-------------------------+
> which means a bio could only decompress partial data of a page, the page could be Uptodate by two bios rather than one,
> I have no idea how to fs-verity like this...
> 
> `it could call fsverity after assembling the page in the page cache.` as Ted said in that case.
> 

I don't know of any plan to use fs-verity on Android's system partition or to
replace dm-verity on the system partition.  The use cases so far have been
verifying files on /data, like APK files.

So I don't think you need to support fs-verity in EROFS.

Re: the compression, I don't see how it would be much of a problem (even if you
did need or want to add fs-verity support).  Assuming that the verification is
done over the uncompressed version of the data, you wouldn't verify the pages
directly from the bio's page list since those would contain compressed data.
But even without fs-verity you'd need to decompress the data into pagecache
pages... so you could just call fsverity_verify_page() on each of those
decompressed pages before unlocking them and setting them Uptodate.  You don't
*have* to call fsverity_verify_bio() to do the verification; it's just a helper
for the case where the list of pages to verify happens to be in a completed bio.

> > 
> > A somewhat separate question is whether the zero padding to a block boundary
> > after i_size, before the Merkle tree begins, is needed.  The answer is yes,
> > since mixing data and metadata in the same page would cause problems.  First,
> > userspace would be able to mmap the page and see some of the metadata rather
> > than zeroes.  That's not a huge problem, but it breaks the standard behavior.
> > Second, any page containing data cannot be set Uptodate until it's been
> > verified.  So, a special case would be needed to handle reading the part of the
> > metadata that's located in a data page.
> 
> Yes, after I just thinked over, I think there should be a zero padding to a block boundary
> as you said due to Uptodate and mmap scenerio if you directly use its inode(file) mapping for verification.
> 
> 
> > 
> >> At last, I hope filesystems could select the on-disk position of hash tree and 'struct fsverity_descriptor'
> >> rather than fixed in the end of verity files...I think if fs-verity preparing such support and interfaces could be better.....hmmm... :(
> > 
> > In theory it would be a much cleaner design to store verity metadata separately
> > from the data.  But the Merkle tree can be very large.  For example, a 1 GB file
> > using SHA-512 would have a 16.6 MB Merkle tree.  So the Merkle tree can't be an
> > extended attribute, since the xattrs API requires xattrs to be small (<= 64 KB),
> > and most filesystems further limit xattr sizes in their on-disk format to as
> > little as 4 KB.  Furthermore, even if both of these limits were to be increased,
> > the xattrs functions (both the syscalls, and the internal functions that
> > filesystems have) are all based around getting/setting the entire xattr value.
> > 
> > Also when used with fscrypt, we want the Merkle tree and fsverity_descriptor to
> > be encrypted, so they doesn't leak plaintext hashes.  And we want the Merkle
> > tree to be paged into memory, just like the file contents, to take advantage of
> > the usual Linux memory management.
> > 
> > What we really need is *streams*, like NTFS has.  But the filesystems we're
> > targetting don't support streams, nor does the Linux syscall interface have any
> > API for accessing streams, nor does the VFS support them.
> > 
> > Adding streams support to all those things would be a huge multi-year effort,
> > controversial, and almost certainly not worth it just for fs-verity.
> > 
> > So simply storing the verity metadata past i_size seems like the best solution
> > for now.
> > 
> > That being said, in the future we could pretty easily swap out the calls to
> > read_mapping_page() with something else if a particular filesystem wanted to
> > store the metadata somewhere else.  We actually even originally had a function
> > ->read_metadata_page() in the filesystem's fsverity_operations, but it turned
> > out to be unnecessary and I replaced it with directly calling
> > read_mapping_page(), but it could be changed back at any time.
> 
> OK, I got it.
> 
> I have to look into that and think over again. Thanks for your reply again in the end. :)
> 
> Thanks,
> Gao Xiang
> 

- Eric
Gao Xiang Aug. 25, 2018, 7:33 a.m. | #8
Hi Ted,

Thanks for your detailed reply. Sorry about my english, the words could not be logical.

Tiny pieces in B-tree to compose a page is too far from us too, and you are right,
fs-verity is complete for >99% cases for the existed file system, and no need to worry about currently.

As I mentioned in reply to Eric, I am actually curious about the Google fs-verity roadmap
for the future Android, I need to analyze if it is only limited to APKs for the read-write partitions
and not to replace dm-verity in the near future since fs-verity has some conflicts
to EROFS I am working on I mentioned in the email to Eric.

I think it is more than just to handle FILE_MAPPING and bio-strict for compression use.

On 2018/8/25 13:06, Theodore Y. Ts'o wrote:
> But I'd suggest worrying about it when such a file system
> comes out of the woodwork, and someone is willing to do the work to
> integrate fserity in that file system.
> 
Yes, we are now handling partial page due to compression use.

fs could submit bios in pages from different mapping(FILE_MAPPING[compress in-place and no caching
compressed page to reduce extra memory overhead] or META_MAPPING [for caching compressed page]) and
they could be decompressed into many full pages and (possible) a partial page (in-place or out-of-place).

so in principle, since we have BIO_MAX_PAGES limitation, a filemap page could be Uptodate
after two bios is ended and decompressed. and other runtime limitations could also divide a bio into two bios for encoded cases.

Therefore, I think in that case we could not just consider FILE_MAPPING and one bio, and as you said `In
that case, it could call fsverity after assembling the page in the page cache.' should be done in this way.

> Well, the userspace interface for instantiating a fs-verity file is
> that it writes the file data with the fs-verity metadata (which
> consists of the Merkle tree with a fs-verity header at the end of the
> file).  The program (which might be a package manager such as dpkg or
> rpm) would then call an ioctl which would cause the file system to
> read the fs-verity header and make only the file data visible, and the
> file system would the verify the data as it is read into the page
> cache.

Thanks for your reply again, I think fs-verity is good enough for now.
However, I need to think over about fs-verity itself more... :(

Thanks,
Gao Xiang
Gao Xiang Aug. 25, 2018, 7:43 a.m. | #9
Hi Eric,

On 2018/8/25 15:18, Eric Biggers wrote:
> We do have to be very careful here, but the same restriction already exists with
> fscrypt which both f2fs and ext4 already support too.  With fscrypt, each page
> is decrypted with the key from page->mapping->host->i_crypt_info and the
> initialization vector from page->index.  With fs-verity, each page is verified
> using the Merkle tree state from page->mapping->host->i_verify_info and the
> block location from page->index.  So, they are very similar.
> 
> On f2fs, any pages submitted via META_MAPPING just skip both fscrypt and
> fs-verity since the "meta_inode" doesn't have either feature enabled.  That's
> done intentionally, so that garbage collection can move the blocks on-disk.
> Regular reads aren't done via META_MAPPING.
> 

I think you deal with the existed cases quite well, I was just thinking about EROFS... :)

> I don't know of any plan to use fs-verity on Android's system partition or to
> replace dm-verity on the system partition.  The use cases so far have been
> verifying files on /data, like APK files.
> 
> So I don't think you need to support fs-verity in EROFS.
> 

Thanks for your information about fs-verity, that is quite useful for us
Actually, I was worrying about that these months...  :)

> Re: the compression, I don't see how it would be much of a problem (even if you
> did need or want to add fs-verity support).  Assuming that the verification is
> done over the uncompressed version of the data, you wouldn't verify the pages
> directly from the bio's page list since those would contain compressed data.
> But even without fs-verity you'd need to decompress the data into pagecache
> pages... so you could just call fsverity_verify_page() on each of those
> decompressed pages before unlocking them and setting them Uptodate.  You don't
> *have* to call fsverity_verify_bio() to do the verification; it's just a helper
> for the case where the list of pages to verify happens to be in a completed bio.
> 

I haven't look into all patches, I will look into that carefully if I finish my current job.
It is wonderful to have such a helper --- fsverity_verify_page :)

I have no other problem currently, and look forward for your final implementation.

Best Regards,
Gao Xiang

> - Eric
Gao Xiang Aug. 25, 2018, 7:55 a.m. | #10
Hi Ted,

Please ignore the following email, Eric has replied to me. :)
I need to dig into these fs-verity patches later and best wishes to fs-verity.

Thanks,
Gao Xiang

On 2018/8/25 15:33, Gao Xiang wrote:
> Hi Ted,
> 
> Thanks for your detailed reply. Sorry about my english, the words could not be logical.
> 
> Tiny pieces in B-tree to compose a page is too far from us too, and you are right,
> fs-verity is complete for >99% cases for the existed file system, and no need to worry about currently.
> 
> As I mentioned in reply to Eric, I am actually curious about the Google fs-verity roadmap
> for the future Android, I need to analyze if it is only limited to APKs for the read-write partitions
> and not to replace dm-verity in the near future since fs-verity has some conflicts
> to EROFS I am working on I mentioned in the email to Eric.
> 
> I think it is more than just to handle FILE_MAPPING and bio-strict for compression use.
> 
> On 2018/8/25 13:06, Theodore Y. Ts'o wrote:
>> But I'd suggest worrying about it when such a file system
>> comes out of the woodwork, and someone is willing to do the work to
>> integrate fserity in that file system.
>>
> Yes, we are now handling partial page due to compression use.
> 
> fs could submit bios in pages from different mapping(FILE_MAPPING[compress in-place and no caching
> compressed page to reduce extra memory overhead] or META_MAPPING [for caching compressed page]) and
> they could be decompressed into many full pages and (possible) a partial page (in-place or out-of-place).
> 
> so in principle, since we have BIO_MAX_PAGES limitation, a filemap page could be Uptodate
> after two bios is ended and decompressed. and other runtime limitations could also divide a bio into two bios for encoded cases.
> 
> Therefore, I think in that case we could not just consider FILE_MAPPING and one bio, and as you said `In
> that case, it could call fsverity after assembling the page in the page cache.' should be done in this way.
> 
>> Well, the userspace interface for instantiating a fs-verity file is
>> that it writes the file data with the fs-verity metadata (which
>> consists of the Merkle tree with a fs-verity header at the end of the
>> file).  The program (which might be a package manager such as dpkg or
>> rpm) would then call an ioctl which would cause the file system to
>> read the fs-verity header and make only the file data visible, and the
>> file system would the verify the data as it is read into the page
>> cache.
> Thanks for your reply again, I think fs-verity is good enough for now.
> However, I need to think over about fs-verity itself more... :(
> 
> Thanks,
> Gao Xiang
Theodore Ts'o Aug. 25, 2018, 5:06 p.m. | #11
On Sat, Aug 25, 2018 at 03:43:43PM +0800, Gao Xiang wrote:
> > I don't know of any plan to use fs-verity on Android's system partition or to
> > replace dm-verity on the system partition.  The use cases so far have been
> > verifying files on /data, like APK files.
> > 
> > So I don't think you need to support fs-verity in EROFS.
> 
> Thanks for your information about fs-verity, that is quite useful for us
> Actually, I was worrying about that these months...  :)

I'll be even clearer --- I can't *imagine* any situation where it
would make sense to use fs-verity on the Android system partition.
Remember, for OTA to work the system image has to be bit-for-bit
identical to the official golden image for that release.  So the
system image has to be completely locked down from any modification
(to data or metadata), and that means dm-verity and *NOT* fs-verity.

The initial use of fs-verity (as you can see if you look at AOSP) will
be to protect a small number of privileged APK's that are stored on
the data partition.  Previously, they were verified when they were
downloaded, and never again.

Part of the goal which we are trying to achieve here is that even if
the kernel gets compromised by a 0-day, a successful reboot should
restore the system to a known state.  That is, the secure bootloader
checks the signature of the kernel, and then in turn, dm-verity will
verify the root Merkle hash protecting the system partition, and
fs-verity will protect the privileged APK's.  If malware modifies any
these components in an attempt to be persistent, the modifications
would be detected, and the worst it could do is to cause subsequent
reboots to fail until the phone's software could be reflashed.

Cheers,

					- Ted
Gao Xiang Aug. 26, 2018, 1:44 p.m. | #12
Hi Ted,

Sorry for the late reply...

On 2018/8/26 1:06, Theodore Y. Ts'o wrote:
> On Sat, Aug 25, 2018 at 03:43:43PM +0800, Gao Xiang wrote:
>>> I don't know of any plan to use fs-verity on Android's system partition or to
>>> replace dm-verity on the system partition.  The use cases so far have been
>>> verifying files on /data, like APK files.
>>>
>>> So I don't think you need to support fs-verity in EROFS.
>>
>> Thanks for your information about fs-verity, that is quite useful for us
>> Actually, I was worrying about that these months...  :)
> 
> I'll be even clearer --- I can't *imagine* any situation where it
> would make sense to use fs-verity on the Android system partition.
> Remember, for OTA to work the system image has to be bit-for-bit
> identical to the official golden image for that release.  So the
> system image has to be completely locked down from any modification
> (to data or metadata), and that means dm-verity and *NOT* fs-verity.

I think so mainly because of the security reason you said above.

In addition, I think it is mandatory that the Android system partition
should also _never_ suffer from filesystem corrupted by design (expect
for the storage device corrupt or malware), therefore I think the
bit-for-bit read-only, and identical-verity requirement is quite strong
for Android, which will make the Android system steady and as solid as
rocks.

But I need to make sure my personal thoughts through this topic. :)

> 
> The initial use of fs-verity (as you can see if you look at AOSP) will
> be to protect a small number of privileged APK's that are stored on
> the data partition.  Previously, they were verified when they were
> downloaded, and never again.
> 
> Part of the goal which we are trying to achieve here is that even if
> the kernel gets compromised by a 0-day, a successful reboot should
> restore the system to a known state.  That is, the secure bootloader
> checks the signature of the kernel, and then in turn, dm-verity will
> verify the root Merkle hash protecting the system partition, and
> fs-verity will protect the privileged APK's.  If malware modifies any
> these components in an attempt to be persistent, the modifications
> would be detected, and the worst it could do is to cause subsequent
> reboots to fail until the phone's software could be reflashed.
> 

Yeah, I have seen the the fs-verity presentation and materials from
Android bootcamp and other official channels before.


Thanks for your kindly detailed explanation. :)


Best regards,
Gao Xiang

> Cheers,
> 
> 					- Ted
>
Chuck Lever Aug. 26, 2018, 3:55 p.m. | #13
> On Aug 24, 2018, at 12:16 PM, Eric Biggers <ebiggers@kernel.org> wrote:
> 
> From: Eric Biggers <ebiggers@google.com>
> 
> Add functions that verify data pages that have been read from a
> fs-verity file, against that file's Merkle tree.  These will be called
> from filesystems' ->readpage() and ->readpages() methods.
> 
> Since data verification can block, a workqueue is provided for these
> methods to enqueue verification work from their bio completion callback.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
> fs/verity/Makefile           |   2 +-
> fs/verity/fsverity_private.h |   3 +
> fs/verity/setup.c            |  26 ++-
> fs/verity/verify.c           | 310 +++++++++++++++++++++++++++++++++++
> include/linux/fsverity.h     |  23 +++
> 5 files changed, 362 insertions(+), 2 deletions(-)
> create mode 100644 fs/verity/verify.c
> 
> diff --git a/fs/verity/Makefile b/fs/verity/Makefile
> index 39e123805c827..a6c7cefb61ab7 100644
> --- a/fs/verity/Makefile
> +++ b/fs/verity/Makefile
> @@ -1,3 +1,3 @@
> obj-$(CONFIG_FS_VERITY)	+= fsverity.o
> 
> -fsverity-y := hash_algs.o setup.o
> +fsverity-y := hash_algs.o setup.o verify.o
> diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
> index a18ff645695f4..c553f99dc4973 100644
> --- a/fs/verity/fsverity_private.h
> +++ b/fs/verity/fsverity_private.h
> @@ -96,4 +96,7 @@ static inline bool set_fsverity_info(struct inode *inode,
> 	return true;
> }
> 
> +/* verify.c */
> +extern struct workqueue_struct *fsverity_read_workqueue;
> +
> #endif /* _FSVERITY_PRIVATE_H */
> diff --git a/fs/verity/setup.c b/fs/verity/setup.c
> index e675c52898d5b..84cc2edeca25b 100644
> --- a/fs/verity/setup.c
> +++ b/fs/verity/setup.c
> @@ -824,18 +824,42 @@ EXPORT_SYMBOL_GPL(fsverity_full_i_size);
> 
> static int __init fsverity_module_init(void)
> {
> +	int err;
> +
> +	/*
> +	 * Use an unbound workqueue to allow bios to be verified in parallel
> +	 * even when they happen to complete on the same CPU.  This sacrifices
> +	 * locality, but it's worthwhile since hashing is CPU-intensive.
> +	 *
> +	 * Also use a high-priority workqueue to prioritize verification work,
> +	 * which blocks reads from completing, over regular application tasks.
> +	 */
> +	err = -ENOMEM;
> +	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
> +						  WQ_UNBOUND | WQ_HIGHPRI,
> +						  num_online_cpus());
> +	if (!fsverity_read_workqueue)
> +		goto error;
> +
> +	err = -ENOMEM;
> 	fsverity_info_cachep = KMEM_CACHE(fsverity_info, SLAB_RECLAIM_ACCOUNT);
> 	if (!fsverity_info_cachep)
> -		return -ENOMEM;
> +		goto error_free_workqueue;
> 
> 	fsverity_check_hash_algs();
> 
> 	pr_debug("Initialized fs-verity\n");
> 	return 0;
> +
> +error_free_workqueue:
> +	destroy_workqueue(fsverity_read_workqueue);
> +error:
> +	return err;
> }
> 
> static void __exit fsverity_module_exit(void)
> {
> +	destroy_workqueue(fsverity_read_workqueue);
> 	kmem_cache_destroy(fsverity_info_cachep);
> 	fsverity_exit_hash_algs();
> }
> diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> new file mode 100644
> index 0000000000000..1452dd05f75d3
> --- /dev/null
> +++ b/fs/verity/verify.c
> @@ -0,0 +1,310 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * fs/verity/verify.c: fs-verity data verification functions,
> + *		       i.e. hooks for ->readpages()
> + *
> + * Copyright (C) 2018 Google LLC
> + *
> + * Originally written by Jaegeuk Kim and Michael Halcrow;
> + * heavily rewritten by Eric Biggers.
> + */
> +
> +#include "fsverity_private.h"
> +
> +#include <crypto/hash.h>
> +#include <linux/bio.h>
> +#include <linux/pagemap.h>
> +#include <linux/ratelimit.h>
> +#include <linux/scatterlist.h>
> +
> +struct workqueue_struct *fsverity_read_workqueue;
> +
> +/**
> + * hash_at_level() - compute the location of the block's hash at the given level
> + *
> + * @vi:		(in) the file's verity info
> + * @dindex:	(in) the index of the data block being verified
> + * @level:	(in) the level of hash we want
> + * @hindex:	(out) the index of the hash block containing the wanted hash
> + * @hoffset:	(out) the byte offset to the wanted hash within the hash block
> + */
> +static void hash_at_level(const struct fsverity_info *vi, pgoff_t dindex,
> +			  unsigned int level, pgoff_t *hindex,
> +			  unsigned int *hoffset)
> +{
> +	pgoff_t hoffset_in_lvl;
> +
> +	/*
> +	 * Compute the offset of the hash within the level's region, in hashes.
> +	 * For example, with 4096-byte blocks and 32-byte hashes, there are
> +	 * 4096/32 = 128 = 2^7 hashes per hash block, i.e. log_arity = 7.  Then,
> +	 * if the data block index is 65668 and we want the level 1 hash, it is
> +	 * located at 65668 >> 7 = 513 hashes into the level 1 region.
> +	 */
> +	hoffset_in_lvl = dindex >> (level * vi->log_arity);
> +
> +	/*
> +	 * Compute the index of the hash block containing the wanted hash.
> +	 * Continuing the above example, the block would be at index 513 >> 7 =
> +	 * 4 within the level 1 region.  To this we'd add the index at which the
> +	 * level 1 region starts.
> +	 */
> +	*hindex = vi->hash_lvl_region_idx[level] +
> +		  (hoffset_in_lvl >> vi->log_arity);
> +
> +	/*
> +	 * Finally, compute the index of the hash within the block rather than
> +	 * the region, and multiply by the hash size to turn it into a byte
> +	 * offset.  Continuing the above example, the hash would be at byte
> +	 * offset (513 & ((1 << 7) - 1)) * 32 = 32 within the block.
> +	 */
> +	*hoffset = (hoffset_in_lvl & ((1 << vi->log_arity) - 1)) *
> +		   vi->hash_alg->digest_size;
> +}
> +
> +/* Extract a hash from a hash page */
> +static void extract_hash(struct page *hpage, unsigned int hoffset,
> +			 unsigned int hsize, u8 *out)
> +{
> +	void *virt = kmap_atomic(hpage);
> +
> +	memcpy(out, virt + hoffset, hsize);
> +	kunmap_atomic(virt);
> +}
> +
> +static int hash_page(const struct fsverity_info *vi, struct ahash_request *req,
> +		     struct page *page, u8 *out)
> +{
> +	struct scatterlist sg[3];
> +	DECLARE_CRYPTO_WAIT(wait);
> +	int err;
> +
> +	sg_init_table(sg, 1);
> +	sg_set_page(&sg[0], page, PAGE_SIZE, 0);
> +
> +	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
> +				   CRYPTO_TFM_REQ_MAY_BACKLOG,
> +				   crypto_req_done, &wait);
> +	ahash_request_set_crypt(req, sg, out, PAGE_SIZE);
> +
> +	err = crypto_ahash_import(req, vi->hashstate);
> +	if (err)
> +		return err;
> +
> +	return crypto_wait_req(crypto_ahash_finup(req), &wait);
> +}
> +
> +static inline int compare_hashes(const u8 *want_hash, const u8 *real_hash,
> +				 int digest_size, struct inode *inode,
> +				 pgoff_t index, int level, const char *algname)
> +{
> +	if (memcmp(want_hash, real_hash, digest_size) == 0)
> +		return 0;
> +
> +	pr_warn_ratelimited("VERIFICATION FAILURE!  ino=%lu, index=%lu, level=%d, want_hash=%s:%*phN, real_hash=%s:%*phN\n",
> +			    inode->i_ino, index, level,
> +			    algname, digest_size, want_hash,
> +			    algname, digest_size, real_hash);
> +	return -EBADMSG;
> +}
> +
> +/*
> + * Verify a single data page against the file's Merkle tree.
> + *
> + * In principle, we need to verify the entire path to the root node.  But as an
> + * optimization, we cache the hash pages in the file's page cache, similar to
> + * data pages.  Therefore, we can stop verifying as soon as a verified hash page
> + * is seen while ascending the tree.
> + *
> + * Note that unlike data pages, hash pages are marked Uptodate *before* they are
> + * verified; instead, the Checked bit is set on hash pages that have been
> + * verified.  Multiple tasks may race to verify a hash page and mark it Checked,
> + * but it doesn't matter.  The use of the Checked bit also implies that the hash
> + * block size must equal PAGE_SIZE (for now).
> + */
> +static bool verify_page(struct inode *inode, const struct fsverity_info *vi,
> +			struct ahash_request *req, struct page *data_page)
> +{
> +	pgoff_t index = data_page->index;
> +	int level = 0;
> +	u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE];
> +	const u8 *want_hash = NULL;
> +	u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE];
> +	struct page *hpages[FS_VERITY_MAX_LEVELS];
> +	unsigned int hoffsets[FS_VERITY_MAX_LEVELS];
> +	int err;
> +
> +	/* The page must not be unlocked until verification has completed. */
> +	if (WARN_ON_ONCE(!PageLocked(data_page)))
> +		return false;
> +
> +	/*
> +	 * Since ->i_size is overridden with ->data_i_size, and fs-verity avoids
> +	 * recursing into itself when reading hash pages, we shouldn't normally
> +	 * get here with a page beyond ->data_i_size.  But, it can happen if a
> +	 * read is issued at or beyond EOF since the VFS doesn't check i_size
> +	 * before calling ->readpage().  Thus, just skip verification if the
> +	 * page is beyond ->data_i_size.
> +	 */
> +	if (index >= (vi->data_i_size + PAGE_SIZE - 1) >> PAGE_SHIFT) {
> +		pr_debug("Page %lu is in metadata region\n", index);
> +		return true;
> +	}
> +
> +	pr_debug_ratelimited("Verifying data page %lu...\n", index);
> +
> +	/*
> +	 * Starting at the leaves, ascend the tree saving hash pages along the
> +	 * way until we find a verified hash page, indicated by PageChecked; or
> +	 * until we reach the root.
> +	 */
> +	for (level = 0; level < vi->depth; level++) {
> +		pgoff_t hindex;
> +		unsigned int hoffset;
> +		struct page *hpage;
> +
> +		hash_at_level(vi, index, level, &hindex, &hoffset);
> +
> +		pr_debug_ratelimited("Level %d: hindex=%lu, hoffset=%u\n",
> +				     level, hindex, hoffset);
> +
> +		hpage = read_mapping_page(inode->i_mapping, hindex, NULL);
> +		if (IS_ERR(hpage)) {
> +			err = PTR_ERR(hpage);
> +			goto out;
> +		}
> +
> +		if (PageChecked(hpage)) {
> +			extract_hash(hpage, hoffset, vi->hash_alg->digest_size,
> +				     _want_hash);
> +			want_hash = _want_hash;
> +			put_page(hpage);
> +			pr_debug_ratelimited("Hash page already checked, want %s:%*phN\n",
> +					     vi->hash_alg->name,
> +					     vi->hash_alg->digest_size,
> +					     want_hash);
> +			break;
> +		}
> +		pr_debug_ratelimited("Hash page not yet checked\n");
> +		hpages[level] = hpage;
> +		hoffsets[level] = hoffset;
> +	}
> +
> +	if (!want_hash) {
> +		want_hash = vi->root_hash;
> +		pr_debug("Want root hash: %s:%*phN\n", vi->hash_alg->name,
> +			 vi->hash_alg->digest_size, want_hash);
> +	}
> +
> +	/* Descend the tree verifying hash pages */
> +	for (; level > 0; level--) {
> +		struct page *hpage = hpages[level - 1];
> +		unsigned int hoffset = hoffsets[level - 1];
> +
> +		err = hash_page(vi, req, hpage, real_hash);
> +		if (err)
> +			goto out;
> +		err = compare_hashes(want_hash, real_hash,
> +				     vi->hash_alg->digest_size,
> +				     inode, index, level - 1,
> +				     vi->hash_alg->name);
> +		if (err)
> +			goto out;
> +		SetPageChecked(hpage);
> +		extract_hash(hpage, hoffset, vi->hash_alg->digest_size,
> +			     _want_hash);
> +		want_hash = _want_hash;
> +		put_page(hpage);
> +		pr_debug("Verified hash page at level %d, now want %s:%*phN\n",
> +			 level - 1, vi->hash_alg->name,
> +			 vi->hash_alg->digest_size, want_hash);
> +	}
> +
> +	/* Finally, verify the data page */
> +	err = hash_page(vi, req, data_page, real_hash);
> +	if (err)
> +		goto out;
> +	err = compare_hashes(want_hash, real_hash, vi->hash_alg->digest_size,
> +			     inode, index, -1, vi->hash_alg->name);
> +out:
> +	for (; level > 0; level--)
> +		put_page(hpages[level - 1]);
> +	if (err) {
> +		pr_warn_ratelimited("Error verifying page; ino=%lu, index=%lu (err=%d)\n",
> +				    inode->i_ino, data_page->index, err);
> +		return false;
> +	}
> +	return true;
> +}
> +
> +/**
> + * fsverity_verify_page - verify a data page
> + *
> + * Verify a page that has just been read from a file against that file's Merkle
> + * tree.  The page is assumed to be a pagecache page.
> + *
> + * Return: true if the page is valid, else false.
> + */
> +bool fsverity_verify_page(struct page *data_page)
> +{
> +	struct inode *inode = data_page->mapping->host;
> +	const struct fsverity_info *vi = get_fsverity_info(inode);
> +	struct ahash_request *req;
> +	bool valid;
> +
> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
> +	if (unlikely(!req))
> +		return false;
> +
> +	valid = verify_page(inode, vi, req, data_page);
> +
> +	ahash_request_free(req);
> +
> +	return valid;
> +}
> +EXPORT_SYMBOL_GPL(fsverity_verify_page);
> +
> +/**
> + * fsverity_verify_bio - verify a 'read' bio that has just completed
> + *
> + * Verify a set of pages that have just been read from a file against that
> + * file's Merkle tree.  The pages are assumed to be pagecache pages.  Pages that
> + * fail verification are set to the Error state.  Verification is skipped for
> + * pages already in the Error state, e.g. due to fscrypt decryption failure.
> + */
> +void fsverity_verify_bio(struct bio *bio)

Hi Eric-

This kind of API won't work for remote filesystems, which do not use
"struct bio" to do their I/O. Could a remote filesystem solely use
fsverity_verify_page instead?


> +{
> +	struct inode *inode = bio_first_page_all(bio)->mapping->host;
> +	const struct fsverity_info *vi = get_fsverity_info(inode);
> +	struct ahash_request *req;
> +	struct bio_vec *bv;
> +	int i;
> +
> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
> +	if (unlikely(!req)) {
> +		bio_for_each_segment_all(bv, bio, i)
> +			SetPageError(bv->bv_page);
> +		return;
> +	}
> +
> +	bio_for_each_segment_all(bv, bio, i) {
> +		struct page *page = bv->bv_page;
> +
> +		if (!PageError(page) && !verify_page(inode, vi, req, page))
> +			SetPageError(page);
> +	}
> +
> +	ahash_request_free(req);
> +}
> +EXPORT_SYMBOL_GPL(fsverity_verify_bio);
> +
> +/**
> + * fsverity_enqueue_verify_work - enqueue work on the fs-verity workqueue
> + *
> + * Enqueue verification work for asynchronous processing.
> + */
> +void fsverity_enqueue_verify_work(struct work_struct *work)
> +{
> +	queue_work(fsverity_read_workqueue, work);
> +}
> +EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
> diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
> index 3af55241046aa..56341f10aa965 100644
> --- a/include/linux/fsverity.h
> +++ b/include/linux/fsverity.h
> @@ -28,6 +28,11 @@ extern int fsverity_prepare_getattr(struct inode *inode);
> extern void fsverity_cleanup_inode(struct inode *inode);
> extern loff_t fsverity_full_i_size(const struct inode *inode);
> 
> +/* verify.c */
> +extern bool fsverity_verify_page(struct page *page);
> +extern void fsverity_verify_bio(struct bio *bio);
> +extern void fsverity_enqueue_verify_work(struct work_struct *work);
> +
> #else /* !__FS_HAS_VERITY */
> 
> /* setup.c */
> @@ -57,6 +62,24 @@ static inline loff_t fsverity_full_i_size(const struct inode *inode)
> 	return i_size_read(inode);
> }
> 
> +/* verify.c */
> +
> +static inline bool fsverity_verify_page(struct page *page)
> +{
> +	WARN_ON(1);
> +	return false;
> +}
> +
> +static inline void fsverity_verify_bio(struct bio *bio)
> +{
> +	WARN_ON(1);
> +}
> +
> +static inline void fsverity_enqueue_verify_work(struct work_struct *work)
> +{
> +	WARN_ON(1);
> +}
> +
> #endif	/* !__FS_HAS_VERITY */
> 
> #endif	/* _LINUX_FSVERITY_H */
> -- 
> 2.18.0
> 

--
Chuck Lever
chucklever@gmail.com
Eric Biggers Aug. 26, 2018, 5:04 p.m. | #14
Hi Chuck,

On Sun, Aug 26, 2018 at 11:55:57AM -0400, Chuck Lever wrote:
> > +
> > +/**
> > + * fsverity_verify_page - verify a data page
> > + *
> > + * Verify a page that has just been read from a file against that file's Merkle
> > + * tree.  The page is assumed to be a pagecache page.
> > + *
> > + * Return: true if the page is valid, else false.
> > + */
> > +bool fsverity_verify_page(struct page *data_page)
> > +{
> > +	struct inode *inode = data_page->mapping->host;
> > +	const struct fsverity_info *vi = get_fsverity_info(inode);
> > +	struct ahash_request *req;
> > +	bool valid;
> > +
> > +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
> > +	if (unlikely(!req))
> > +		return false;
> > +
> > +	valid = verify_page(inode, vi, req, data_page);
> > +
> > +	ahash_request_free(req);
> > +
> > +	return valid;
> > +}
> > +EXPORT_SYMBOL_GPL(fsverity_verify_page);
> > +
> > +/**
> > + * fsverity_verify_bio - verify a 'read' bio that has just completed
> > + *
> > + * Verify a set of pages that have just been read from a file against that
> > + * file's Merkle tree.  The pages are assumed to be pagecache pages.  Pages that
> > + * fail verification are set to the Error state.  Verification is skipped for
> > + * pages already in the Error state, e.g. due to fscrypt decryption failure.
> > + */
> > +void fsverity_verify_bio(struct bio *bio)
> 
> Hi Eric-
> 
> This kind of API won't work for remote filesystems, which do not use
> "struct bio" to do their I/O. Could a remote filesystem solely use
> fsverity_verify_page instead?
> 

Yes, filesystems don't have to use fsverity_verify_bio().  They can call
fsverity_verify_page() on each page instead.  I will clarify this in the next
revision of the patchset.

- Eric
Gao Xiang Aug. 26, 2018, 5:44 p.m. | #15
Hi Eric,

On 2018/8/27 1:04, Eric Biggers wrote:
> Hi Chuck,
> 
> On Sun, Aug 26, 2018 at 11:55:57AM -0400, Chuck Lever wrote:
>>> +
>>> +/**
>>> + * fsverity_verify_page - verify a data page
>>> + *
>>> + * Verify a page that has just been read from a file against that file's Merkle
>>> + * tree.  The page is assumed to be a pagecache page.
>>> + *
>>> + * Return: true if the page is valid, else false.
>>> + */
>>> +bool fsverity_verify_page(struct page *data_page)
>>> +{
>>> +	struct inode *inode = data_page->mapping->host;
>>> +	const struct fsverity_info *vi = get_fsverity_info(inode);
>>> +	struct ahash_request *req;
>>> +	bool valid;
>>> +
>>> +	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);

Some minor suggestions occurred to me after I saw this part of code
again before sleeping...

1) How about introducing an iterator callback to avoid too many
ahash_request_alloc and ahash_request_free... (It seems too many pages
and could be some slower than fsverity_verify_bio...)

2) How about adding a gfp_t input argument since I don't know whether
GFP_KERNEL is suitable for all use cases...

It seems there could be more fsverity_verify_page users as well as
fsverity_verify_bio ;)

Sorry for interruption...

Thanks,
Gao Xiang

>>> +	if (unlikely(!req))
>>> +		return false;
>>> +
>>> +	valid = verify_page(inode, vi, req, data_page);
>>> +
>>> +	ahash_request_free(req);
>>> +
>>> +	return valid;
>>> +}
>>> +EXPORT_SYMBOL_GPL(fsverity_verify_page);
>>> +
>>> +/**
>>> + * fsverity_verify_bio - verify a 'read' bio that has just completed
>>> + *
>>> + * Verify a set of pages that have just been read from a file against that
>>> + * file's Merkle tree.  The pages are assumed to be pagecache pages.  Pages that
>>> + * fail verification are set to the Error state.  Verification is skipped for
>>> + * pages already in the Error state, e.g. due to fscrypt decryption failure.
>>> + */
>>> +void fsverity_verify_bio(struct bio *bio)
>>
>> Hi Eric-
>>
>> This kind of API won't work for remote filesystems, which do not use
>> "struct bio" to do their I/O. Could a remote filesystem solely use
>> fsverity_verify_page instead?
>>
> 
> Yes, filesystems don't have to use fsverity_verify_bio().  They can call
> fsverity_verify_page() on each page instead.  I will clarify this in the next
> revision of the patchset.
> 
> - Eric
>
Olof Johansson Sept. 2, 2018, 2:35 a.m. | #16
Hi,

On Fri, Aug 24, 2018 at 9:16 PM, Eric Biggers <ebiggers@kernel.org> wrote:
> Hi Gao,
>
> On Sat, Aug 25, 2018 at 10:29:26AM +0800, Gao Xiang wrote:
>> Hi,
>>
>> At last, I hope filesystems could select the on-disk position of hash tree and 'struct fsverity_descriptor'
>> rather than fixed in the end of verity files...I think if fs-verity preparing such support and interfaces could be better.....hmmm... :(
>
> In theory it would be a much cleaner design to store verity metadata separately
> from the data.  But the Merkle tree can be very large.  For example, a 1 GB file
> using SHA-512 would have a 16.6 MB Merkle tree.  So the Merkle tree can't be an
> extended attribute, since the xattrs API requires xattrs to be small (<= 64 KB),
> and most filesystems further limit xattr sizes in their on-disk format to as
> little as 4 KB.  Furthermore, even if both of these limits were to be increased,
> the xattrs functions (both the syscalls, and the internal functions that
> filesystems have) are all based around getting/setting the entire xattr value.
>
> Also when used with fscrypt, we want the Merkle tree and fsverity_descriptor to
> be encrypted, so they doesn't leak plaintext hashes.  And we want the Merkle
> tree to be paged into memory, just like the file contents, to take advantage of
> the usual Linux memory management.
>
> What we really need is *streams*, like NTFS has.  But the filesystems we're
> targetting don't support streams, nor does the Linux syscall interface have any
> API for accessing streams, nor does the VFS support them.
>
> Adding streams support to all those things would be a huge multi-year effort,
> controversial, and almost certainly not worth it just for fs-verity.
>
> So simply storing the verity metadata past i_size seems like the best solution
> for now.
>
> That being said, in the future we could pretty easily swap out the calls to
> read_mapping_page() with something else if a particular filesystem wanted to
> store the metadata somewhere else.  We actually even originally had a function
> ->read_metadata_page() in the filesystem's fsverity_operations, but it turned
> out to be unnecessary and I replaced it with directly calling
> read_mapping_page(), but it could be changed back at any time.

What about an xattr not to hold the Merkle tree, but to contain a
suitable reference to a file/inode+offset that contains it (+ toplevel
hash for said tree/file or the descriptor/struct)?

If you also expose said file in the directory structure, things such
as backups might be easier to handle. For where the tree is appended
to the file, you could self-reference.


-Olof

Patch

diff --git a/fs/verity/Makefile b/fs/verity/Makefile
index 39e123805c827..a6c7cefb61ab7 100644
--- a/fs/verity/Makefile
+++ b/fs/verity/Makefile
@@ -1,3 +1,3 @@ 
 obj-$(CONFIG_FS_VERITY)	+= fsverity.o
 
-fsverity-y := hash_algs.o setup.o
+fsverity-y := hash_algs.o setup.o verify.o
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index a18ff645695f4..c553f99dc4973 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -96,4 +96,7 @@  static inline bool set_fsverity_info(struct inode *inode,
 	return true;
 }
 
+/* verify.c */
+extern struct workqueue_struct *fsverity_read_workqueue;
+
 #endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/setup.c b/fs/verity/setup.c
index e675c52898d5b..84cc2edeca25b 100644
--- a/fs/verity/setup.c
+++ b/fs/verity/setup.c
@@ -824,18 +824,42 @@  EXPORT_SYMBOL_GPL(fsverity_full_i_size);
 
 static int __init fsverity_module_init(void)
 {
+	int err;
+
+	/*
+	 * Use an unbound workqueue to allow bios to be verified in parallel
+	 * even when they happen to complete on the same CPU.  This sacrifices
+	 * locality, but it's worthwhile since hashing is CPU-intensive.
+	 *
+	 * Also use a high-priority workqueue to prioritize verification work,
+	 * which blocks reads from completing, over regular application tasks.
+	 */
+	err = -ENOMEM;
+	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
+						  WQ_UNBOUND | WQ_HIGHPRI,
+						  num_online_cpus());
+	if (!fsverity_read_workqueue)
+		goto error;
+
+	err = -ENOMEM;
 	fsverity_info_cachep = KMEM_CACHE(fsverity_info, SLAB_RECLAIM_ACCOUNT);
 	if (!fsverity_info_cachep)
-		return -ENOMEM;
+		goto error_free_workqueue;
 
 	fsverity_check_hash_algs();
 
 	pr_debug("Initialized fs-verity\n");
 	return 0;
+
+error_free_workqueue:
+	destroy_workqueue(fsverity_read_workqueue);
+error:
+	return err;
 }
 
 static void __exit fsverity_module_exit(void)
 {
+	destroy_workqueue(fsverity_read_workqueue);
 	kmem_cache_destroy(fsverity_info_cachep);
 	fsverity_exit_hash_algs();
 }
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
new file mode 100644
index 0000000000000..1452dd05f75d3
--- /dev/null
+++ b/fs/verity/verify.c
@@ -0,0 +1,310 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fs/verity/verify.c: fs-verity data verification functions,
+ *		       i.e. hooks for ->readpages()
+ *
+ * Copyright (C) 2018 Google LLC
+ *
+ * Originally written by Jaegeuk Kim and Michael Halcrow;
+ * heavily rewritten by Eric Biggers.
+ */
+
+#include "fsverity_private.h"
+
+#include <crypto/hash.h>
+#include <linux/bio.h>
+#include <linux/pagemap.h>
+#include <linux/ratelimit.h>
+#include <linux/scatterlist.h>
+
+struct workqueue_struct *fsverity_read_workqueue;
+
+/**
+ * hash_at_level() - compute the location of the block's hash at the given level
+ *
+ * @vi:		(in) the file's verity info
+ * @dindex:	(in) the index of the data block being verified
+ * @level:	(in) the level of hash we want
+ * @hindex:	(out) the index of the hash block containing the wanted hash
+ * @hoffset:	(out) the byte offset to the wanted hash within the hash block
+ */
+static void hash_at_level(const struct fsverity_info *vi, pgoff_t dindex,
+			  unsigned int level, pgoff_t *hindex,
+			  unsigned int *hoffset)
+{
+	pgoff_t hoffset_in_lvl;
+
+	/*
+	 * Compute the offset of the hash within the level's region, in hashes.
+	 * For example, with 4096-byte blocks and 32-byte hashes, there are
+	 * 4096/32 = 128 = 2^7 hashes per hash block, i.e. log_arity = 7.  Then,
+	 * if the data block index is 65668 and we want the level 1 hash, it is
+	 * located at 65668 >> 7 = 513 hashes into the level 1 region.
+	 */
+	hoffset_in_lvl = dindex >> (level * vi->log_arity);
+
+	/*
+	 * Compute the index of the hash block containing the wanted hash.
+	 * Continuing the above example, the block would be at index 513 >> 7 =
+	 * 4 within the level 1 region.  To this we'd add the index at which the
+	 * level 1 region starts.
+	 */
+	*hindex = vi->hash_lvl_region_idx[level] +
+		  (hoffset_in_lvl >> vi->log_arity);
+
+	/*
+	 * Finally, compute the index of the hash within the block rather than
+	 * the region, and multiply by the hash size to turn it into a byte
+	 * offset.  Continuing the above example, the hash would be at byte
+	 * offset (513 & ((1 << 7) - 1)) * 32 = 32 within the block.
+	 */
+	*hoffset = (hoffset_in_lvl & ((1 << vi->log_arity) - 1)) *
+		   vi->hash_alg->digest_size;
+}
+
+/* Extract a hash from a hash page */
+static void extract_hash(struct page *hpage, unsigned int hoffset,
+			 unsigned int hsize, u8 *out)
+{
+	void *virt = kmap_atomic(hpage);
+
+	memcpy(out, virt + hoffset, hsize);
+	kunmap_atomic(virt);
+}
+
+static int hash_page(const struct fsverity_info *vi, struct ahash_request *req,
+		     struct page *page, u8 *out)
+{
+	struct scatterlist sg[3];
+	DECLARE_CRYPTO_WAIT(wait);
+	int err;
+
+	sg_init_table(sg, 1);
+	sg_set_page(&sg[0], page, PAGE_SIZE, 0);
+
+	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
+				   CRYPTO_TFM_REQ_MAY_BACKLOG,
+				   crypto_req_done, &wait);
+	ahash_request_set_crypt(req, sg, out, PAGE_SIZE);
+
+	err = crypto_ahash_import(req, vi->hashstate);
+	if (err)
+		return err;
+
+	return crypto_wait_req(crypto_ahash_finup(req), &wait);
+}
+
+static inline int compare_hashes(const u8 *want_hash, const u8 *real_hash,
+				 int digest_size, struct inode *inode,
+				 pgoff_t index, int level, const char *algname)
+{
+	if (memcmp(want_hash, real_hash, digest_size) == 0)
+		return 0;
+
+	pr_warn_ratelimited("VERIFICATION FAILURE!  ino=%lu, index=%lu, level=%d, want_hash=%s:%*phN, real_hash=%s:%*phN\n",
+			    inode->i_ino, index, level,
+			    algname, digest_size, want_hash,
+			    algname, digest_size, real_hash);
+	return -EBADMSG;
+}
+
+/*
+ * Verify a single data page against the file's Merkle tree.
+ *
+ * In principle, we need to verify the entire path to the root node.  But as an
+ * optimization, we cache the hash pages in the file's page cache, similar to
+ * data pages.  Therefore, we can stop verifying as soon as a verified hash page
+ * is seen while ascending the tree.
+ *
+ * Note that unlike data pages, hash pages are marked Uptodate *before* they are
+ * verified; instead, the Checked bit is set on hash pages that have been
+ * verified.  Multiple tasks may race to verify a hash page and mark it Checked,
+ * but it doesn't matter.  The use of the Checked bit also implies that the hash
+ * block size must equal PAGE_SIZE (for now).
+ */
+static bool verify_page(struct inode *inode, const struct fsverity_info *vi,
+			struct ahash_request *req, struct page *data_page)
+{
+	pgoff_t index = data_page->index;
+	int level = 0;
+	u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE];
+	const u8 *want_hash = NULL;
+	u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE];
+	struct page *hpages[FS_VERITY_MAX_LEVELS];
+	unsigned int hoffsets[FS_VERITY_MAX_LEVELS];
+	int err;
+
+	/* The page must not be unlocked until verification has completed. */
+	if (WARN_ON_ONCE(!PageLocked(data_page)))
+		return false;
+
+	/*
+	 * Since ->i_size is overridden with ->data_i_size, and fs-verity avoids
+	 * recursing into itself when reading hash pages, we shouldn't normally
+	 * get here with a page beyond ->data_i_size.  But, it can happen if a
+	 * read is issued at or beyond EOF since the VFS doesn't check i_size
+	 * before calling ->readpage().  Thus, just skip verification if the
+	 * page is beyond ->data_i_size.
+	 */
+	if (index >= (vi->data_i_size + PAGE_SIZE - 1) >> PAGE_SHIFT) {
+		pr_debug("Page %lu is in metadata region\n", index);
+		return true;
+	}
+
+	pr_debug_ratelimited("Verifying data page %lu...\n", index);
+
+	/*
+	 * Starting at the leaves, ascend the tree saving hash pages along the
+	 * way until we find a verified hash page, indicated by PageChecked; or
+	 * until we reach the root.
+	 */
+	for (level = 0; level < vi->depth; level++) {
+		pgoff_t hindex;
+		unsigned int hoffset;
+		struct page *hpage;
+
+		hash_at_level(vi, index, level, &hindex, &hoffset);
+
+		pr_debug_ratelimited("Level %d: hindex=%lu, hoffset=%u\n",
+				     level, hindex, hoffset);
+
+		hpage = read_mapping_page(inode->i_mapping, hindex, NULL);
+		if (IS_ERR(hpage)) {
+			err = PTR_ERR(hpage);
+			goto out;
+		}
+
+		if (PageChecked(hpage)) {
+			extract_hash(hpage, hoffset, vi->hash_alg->digest_size,
+				     _want_hash);
+			want_hash = _want_hash;
+			put_page(hpage);
+			pr_debug_ratelimited("Hash page already checked, want %s:%*phN\n",
+					     vi->hash_alg->name,
+					     vi->hash_alg->digest_size,
+					     want_hash);
+			break;
+		}
+		pr_debug_ratelimited("Hash page not yet checked\n");
+		hpages[level] = hpage;
+		hoffsets[level] = hoffset;
+	}
+
+	if (!want_hash) {
+		want_hash = vi->root_hash;
+		pr_debug("Want root hash: %s:%*phN\n", vi->hash_alg->name,
+			 vi->hash_alg->digest_size, want_hash);
+	}
+
+	/* Descend the tree verifying hash pages */
+	for (; level > 0; level--) {
+		struct page *hpage = hpages[level - 1];
+		unsigned int hoffset = hoffsets[level - 1];
+
+		err = hash_page(vi, req, hpage, real_hash);
+		if (err)
+			goto out;
+		err = compare_hashes(want_hash, real_hash,
+				     vi->hash_alg->digest_size,
+				     inode, index, level - 1,
+				     vi->hash_alg->name);
+		if (err)
+			goto out;
+		SetPageChecked(hpage);
+		extract_hash(hpage, hoffset, vi->hash_alg->digest_size,
+			     _want_hash);
+		want_hash = _want_hash;
+		put_page(hpage);
+		pr_debug("Verified hash page at level %d, now want %s:%*phN\n",
+			 level - 1, vi->hash_alg->name,
+			 vi->hash_alg->digest_size, want_hash);
+	}
+
+	/* Finally, verify the data page */
+	err = hash_page(vi, req, data_page, real_hash);
+	if (err)
+		goto out;
+	err = compare_hashes(want_hash, real_hash, vi->hash_alg->digest_size,
+			     inode, index, -1, vi->hash_alg->name);
+out:
+	for (; level > 0; level--)
+		put_page(hpages[level - 1]);
+	if (err) {
+		pr_warn_ratelimited("Error verifying page; ino=%lu, index=%lu (err=%d)\n",
+				    inode->i_ino, data_page->index, err);
+		return false;
+	}
+	return true;
+}
+
+/**
+ * fsverity_verify_page - verify a data page
+ *
+ * Verify a page that has just been read from a file against that file's Merkle
+ * tree.  The page is assumed to be a pagecache page.
+ *
+ * Return: true if the page is valid, else false.
+ */
+bool fsverity_verify_page(struct page *data_page)
+{
+	struct inode *inode = data_page->mapping->host;
+	const struct fsverity_info *vi = get_fsverity_info(inode);
+	struct ahash_request *req;
+	bool valid;
+
+	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
+	if (unlikely(!req))
+		return false;
+
+	valid = verify_page(inode, vi, req, data_page);
+
+	ahash_request_free(req);
+
+	return valid;
+}
+EXPORT_SYMBOL_GPL(fsverity_verify_page);
+
+/**
+ * fsverity_verify_bio - verify a 'read' bio that has just completed
+ *
+ * Verify a set of pages that have just been read from a file against that
+ * file's Merkle tree.  The pages are assumed to be pagecache pages.  Pages that
+ * fail verification are set to the Error state.  Verification is skipped for
+ * pages already in the Error state, e.g. due to fscrypt decryption failure.
+ */
+void fsverity_verify_bio(struct bio *bio)
+{
+	struct inode *inode = bio_first_page_all(bio)->mapping->host;
+	const struct fsverity_info *vi = get_fsverity_info(inode);
+	struct ahash_request *req;
+	struct bio_vec *bv;
+	int i;
+
+	req = ahash_request_alloc(vi->hash_alg->tfm, GFP_KERNEL);
+	if (unlikely(!req)) {
+		bio_for_each_segment_all(bv, bio, i)
+			SetPageError(bv->bv_page);
+		return;
+	}
+
+	bio_for_each_segment_all(bv, bio, i) {
+		struct page *page = bv->bv_page;
+
+		if (!PageError(page) && !verify_page(inode, vi, req, page))
+			SetPageError(page);
+	}
+
+	ahash_request_free(req);
+}
+EXPORT_SYMBOL_GPL(fsverity_verify_bio);
+
+/**
+ * fsverity_enqueue_verify_work - enqueue work on the fs-verity workqueue
+ *
+ * Enqueue verification work for asynchronous processing.
+ */
+void fsverity_enqueue_verify_work(struct work_struct *work)
+{
+	queue_work(fsverity_read_workqueue, work);
+}
+EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 3af55241046aa..56341f10aa965 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -28,6 +28,11 @@  extern int fsverity_prepare_getattr(struct inode *inode);
 extern void fsverity_cleanup_inode(struct inode *inode);
 extern loff_t fsverity_full_i_size(const struct inode *inode);
 
+/* verify.c */
+extern bool fsverity_verify_page(struct page *page);
+extern void fsverity_verify_bio(struct bio *bio);
+extern void fsverity_enqueue_verify_work(struct work_struct *work);
+
 #else /* !__FS_HAS_VERITY */
 
 /* setup.c */
@@ -57,6 +62,24 @@  static inline loff_t fsverity_full_i_size(const struct inode *inode)
 	return i_size_read(inode);
 }
 
+/* verify.c */
+
+static inline bool fsverity_verify_page(struct page *page)
+{
+	WARN_ON(1);
+	return false;
+}
+
+static inline void fsverity_verify_bio(struct bio *bio)
+{
+	WARN_ON(1);
+}
+
+static inline void fsverity_enqueue_verify_work(struct work_struct *work)
+{
+	WARN_ON(1);
+}
+
 #endif	/* !__FS_HAS_VERITY */
 
 #endif	/* _LINUX_FSVERITY_H */