LUKS: support preallocation in qemu-img
diff mbox series

Message ID 20190710170349.1548-1-mlevitsk@redhat.com
State New
Headers show
Series
  • LUKS: support preallocation in qemu-img
Related show

Commit Message

Maxim Levitsky July 10, 2019, 5:03 p.m. UTC
preallocation=off and preallocation=metadata
both allocate luks header only, and preallocation=falloc/full
is passed to underlying file, with the given image size.

Note that the actual preallocated size is a bit smaller due
to luks header.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 block/crypto.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

Comments

Max Reitz July 10, 2019, 9:24 p.m. UTC | #1
On 10.07.19 19:03, Maxim Levitsky wrote:
> preallocation=off and preallocation=metadata
> both allocate luks header only, and preallocation=falloc/full
> is passed to underlying file, with the given image size.
> 
> Note that the actual preallocated size is a bit smaller due
> to luks header.

Couldn’t you just preallocate it after creating the crypto header so
qcrypto_block_get_payload_offset(crypto->block) + size is the actual
file size?

> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  block/crypto.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)

Hm.  I would expect a preallocated image to read 0.  But if you just
pass this through to the protocol layer, it won’t read 0.

(In fact, I don’t even quite see the point of having LUKS as an own
format still.  It was useful when qcow2 didn’t have LUKS support, but
now it does, so...  I suppose everyone using the LUKS format should
actually be using qcow2 with LUKS?)

Max
Max Reitz July 10, 2019, 9:52 p.m. UTC | #2
On 10.07.19 23:24, Max Reitz wrote:
> On 10.07.19 19:03, Maxim Levitsky wrote:
>> preallocation=off and preallocation=metadata
>> both allocate luks header only, and preallocation=falloc/full
>> is passed to underlying file, with the given image size.
>>
>> Note that the actual preallocated size is a bit smaller due
>> to luks header.
> 
> Couldn’t you just preallocate it after creating the crypto header so
> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
> file size?
> 
>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
>>
>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>> ---
>>  block/crypto.c | 28 ++++++++++++++++++++++++++--
>>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> Hm.  I would expect a preallocated image to read 0.  But if you just
> pass this through to the protocol layer, it won’t read 0.
> 
> (In fact, I don’t even quite see the point of having LUKS as an own
> format still.  It was useful when qcow2 didn’t have LUKS support, but
> now it does, so...  I suppose everyone using the LUKS format should
> actually be using qcow2 with LUKS?)

Kevin just pointed out to me that our LUKS format is compatible to the
actual layout cryptsetup uses.  OK, that is an important use case.

Hm.  Unfortunately, that doesn’t really necessitate preallocation.

Well, whatever.  If it’s simple enough, that shouldn’t stop us from
implementing preallocation anyway.


Now I found that qapi/block-core.json defines PreallocMode’s falloc and
full values as follows:

> # @falloc: like @full preallocation but allocate disk space by
> #          posix_fallocate() rather than writing zeros.
> # @full: preallocate all data by writing zeros to device to ensure disk
> #        space is really available. @full preallocation also sets up
> #        metadata correctly.

So it isn’t just me who expects these to pre-initialize the image to 0.
 Hm, although...  I suppose @falloc technically does not specify whether
the data reads as zeroes.  I kind of find it to be implied, but, well...

Max
Maxim Levitsky July 11, 2019, 8:39 a.m. UTC | #3
On Wed, 2019-07-10 at 23:52 +0200, Max Reitz wrote:
> On 10.07.19 23:24, Max Reitz wrote:
> > On 10.07.19 19:03, Maxim Levitsky wrote:
> > > preallocation=off and preallocation=metadata
> > > both allocate luks header only, and preallocation=falloc/full
> > > is passed to underlying file, with the given image size.
> > > 
> > > Note that the actual preallocated size is a bit smaller due
> > > to luks header.
> > 
> > Couldn’t you just preallocate it after creating the crypto header so
> > qcrypto_block_get_payload_offset(crypto->block) + size is the actual
> > file size?

I kind of thought of the same thing after I send the patch. I'll see now it I can make it work.


> > 
> > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> > > 
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >  block/crypto.c | 28 ++++++++++++++++++++++++++--
> > >  1 file changed, 26 insertions(+), 2 deletions(-)
> > 
> > Hm.  I would expect a preallocated image to read 0.  But if you just
> > pass this through to the protocol layer, it won’t read 0.
> > 
> > (In fact, I don’t even quite see the point of having LUKS as an own
> > format still.  It was useful when qcow2 didn’t have LUKS support, but
> > now it does, so...  I suppose everyone using the LUKS format should
> > actually be using qcow2 with LUKS?)
> 
> Kevin just pointed out to me that our LUKS format is compatible to the
> actual layout cryptsetup uses.  OK, that is an important use case.
> 
> Hm.  Unfortunately, that doesn’t really necessitate preallocation.
> 
> Well, whatever.  If it’s simple enough, that shouldn’t stop us from
> implementing preallocation anyway.
Exactly. Since I already know the area of qemu-img relatively well, and
this bug is on my backlog, I thought why not to do it.


> 
> 
> Now I found that qapi/block-core.json defines PreallocMode’s falloc and
> full values as follows:
> 
> > # @falloc: like @full preallocation but allocate disk space by
> > #          posix_fallocate() rather than writing zeros.
> > # @full: preallocate all data by writing zeros to device to ensure disk
> > #        space is really available. @full preallocation also sets up
> > #        metadata correctly.
> 
> So it isn’t just me who expects these to pre-initialize the image to 0.
>  Hm, although...  I suppose @falloc technically does not specify whether
> the data reads as zeroes.  I kind of find it to be implied, but, well...

I personally don't really think that zeros are important, but rather the level of allocation.
posix_fallocate probably won't write the data blocks but rather only the inode metadata / used block bitmap/etc.

On the other hand writing zeros (or anything else) will force the block layer to actually write to the underlying
storage which could trigger lower layer allocation if the underlying storage is thin-provisioned.

In fact IMHO, instead of writing zeros, it would be better to write random garbage instead (or have that as an even 'fuller'
preallocation mode), since underlying storage might 'compress' the zeros. 

In this version I do have a bug that I mentioned, about not preallocation some data at the end of the image, and I will
fix it, so that all image is zeros as expected

Best regards,
	Maxim Levitsky


> 
> Max
>
Daniel P. Berrangé July 11, 2019, 9:20 a.m. UTC | #4
On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:
> On 10.07.19 19:03, Maxim Levitsky wrote:
> > preallocation=off and preallocation=metadata
> > both allocate luks header only, and preallocation=falloc/full
> > is passed to underlying file, with the given image size.
> > 
> > Note that the actual preallocated size is a bit smaller due
> > to luks header.
> 
> Couldn’t you just preallocate it after creating the crypto header so
> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
> file size?

Yeah that would be preferrable. If that's really not possible, we
could likely provide some API to query the expected hreader size for
a given set of creation options. 

> 
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  block/crypto.c | 28 ++++++++++++++++++++++++++--
> >  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> Hm.  I would expect a preallocated image to read 0.  But if you just
> pass this through to the protocol layer, it won’t read 0.

Yes, it will be zeros at the physical layer, but unintelligble
garbage from POV of the virtual disk.

I don't think this is really a problem though - this is what you
get already if you create a LUKS volume on top of a block device
today.

AFAIK, we've not documented that preallocation guarantees future
reads will return zeros. Preallocation simply ensures that all
required space is allocated upfront. We do mention that it might
be achieved by writing zeros to the underlying storage but never
said you'll get zeros back.

IOW I think its at most a docs problem to more clearly explain
that preallocation != guaranteed zeros for reads.

> (In fact, I don’t even quite see the point of having LUKS as an own
> format still.  It was useful when qcow2 didn’t have LUKS support, but
> now it does, so...  I suppose everyone using the LUKS format should
> actually be using qcow2 with LUKS?)

Certainly not. LUKS on raw is going to be very common, not least because
that's directly compatible with what Linux kernel supports. If you don't
want the features of qcow2 like snapshots, it just adds overhead and mgmt
complexity for no gain, especially if dealing with block device backed
storage (iSCSI, RBD).

OpenStack will use cryptsetup when initializing its block storage with
LUKS, then tell QEMU to run with the raw + LUKS driver.

Regards,
Daniel
Max Reitz July 11, 2019, 12:23 p.m. UTC | #5
On 11.07.19 11:20, Daniel P. Berrangé wrote:
> On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:
>> On 10.07.19 19:03, Maxim Levitsky wrote:
>>> preallocation=off and preallocation=metadata
>>> both allocate luks header only, and preallocation=falloc/full
>>> is passed to underlying file, with the given image size.
>>>
>>> Note that the actual preallocated size is a bit smaller due
>>> to luks header.
>>
>> Couldn’t you just preallocate it after creating the crypto header so
>> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
>> file size?
> 
> Yeah that would be preferrable. If that's really not possible, we
> could likely provide some API to query the expected hreader size for
> a given set of creation options. 
> 
>>
>>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
>>>
>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>> ---
>>>  block/crypto.c | 28 ++++++++++++++++++++++++++--
>>>  1 file changed, 26 insertions(+), 2 deletions(-)
>>
>> Hm.  I would expect a preallocated image to read 0.  But if you just
>> pass this through to the protocol layer, it won’t read 0.
> 
> Yes, it will be zeros at the physical layer, but unintelligble
> garbage from POV of the virtual disk.
> 
> I don't think this is really a problem though - this is what you
> get already if you create a LUKS volume on top of a block device
> today.

Which is why we have BlockDriver.bdrv_has_zero_init(), which the LUKS
driver does not implement, hence it being treated as false.

But if you are preallocating, you have a choice of what you write, and
why not make that zeroes?

> AFAIK, we've not documented that preallocation guarantees future
> reads will return zeros. Preallocation simply ensures that all
> required space is allocated upfront. We do mention that it might
> be achieved by writing zeros to the underlying storage but never
> said you'll get zeros back.

But we have, as I wrote in my second reply.  PreallocMode's
documentation says that at least “full” is writing zeroes, and to say
those zeroes can be anywhere in the stack is cheating, from my POV.

> IOW I think its at most a docs problem to more clearly explain
> that preallocation != guaranteed zeros for reads.
> 
>> (In fact, I don’t even quite see the point of having LUKS as an own
>> format still.  It was useful when qcow2 didn’t have LUKS support, but
>> now it does, so...  I suppose everyone using the LUKS format should
>> actually be using qcow2 with LUKS?)
> 
> Certainly not. LUKS on raw is going to be very common, not least because
> that's directly compatible with what Linux kernel supports. If you don't
> want the features of qcow2 like snapshots, it just adds overhead and mgmt
> complexity for no gain, especially if dealing with block device backed
> storage (iSCSI, RBD).
> 
> OpenStack will use cryptsetup when initializing its block storage with
> LUKS, then tell QEMU to run with the raw + LUKS driver.

I see the compatibility with the Linux kernel, yes (as I wrote in my
second reply), but I’m not sure whether “overhead” really is that big of
a point when using encryption.

Max
Max Reitz July 11, 2019, 12:24 p.m. UTC | #6
On 11.07.19 10:39, Maxim Levitsky wrote:
> On Wed, 2019-07-10 at 23:52 +0200, Max Reitz wrote:
>> On 10.07.19 23:24, Max Reitz wrote:
>>> On 10.07.19 19:03, Maxim Levitsky wrote:
>>>> preallocation=off and preallocation=metadata
>>>> both allocate luks header only, and preallocation=falloc/full
>>>> is passed to underlying file, with the given image size.
>>>>
>>>> Note that the actual preallocated size is a bit smaller due
>>>> to luks header.
>>>
>>> Couldn’t you just preallocate it after creating the crypto header so
>>> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
>>> file size?
> 
> I kind of thought of the same thing after I send the patch. I'll see now it I can make it work.
> 
> 
>>>
>>>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
>>>>
>>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>>> ---
>>>>  block/crypto.c | 28 ++++++++++++++++++++++++++--
>>>>  1 file changed, 26 insertions(+), 2 deletions(-)
>>>
>>> Hm.  I would expect a preallocated image to read 0.  But if you just
>>> pass this through to the protocol layer, it won’t read 0.
>>>
>>> (In fact, I don’t even quite see the point of having LUKS as an own
>>> format still.  It was useful when qcow2 didn’t have LUKS support, but
>>> now it does, so...  I suppose everyone using the LUKS format should
>>> actually be using qcow2 with LUKS?)
>>
>> Kevin just pointed out to me that our LUKS format is compatible to the
>> actual layout cryptsetup uses.  OK, that is an important use case.
>>
>> Hm.  Unfortunately, that doesn’t really necessitate preallocation.
>>
>> Well, whatever.  If it’s simple enough, that shouldn’t stop us from
>> implementing preallocation anyway.
> Exactly. Since I already know the area of qemu-img relatively well, and
> this bug is on my backlog, I thought why not to do it.
> 
> 
>>
>>
>> Now I found that qapi/block-core.json defines PreallocMode’s falloc and
>> full values as follows:
>>
>>> # @falloc: like @full preallocation but allocate disk space by
>>> #          posix_fallocate() rather than writing zeros.
>>> # @full: preallocate all data by writing zeros to device to ensure disk
>>> #        space is really available. @full preallocation also sets up
>>> #        metadata correctly.
>>
>> So it isn’t just me who expects these to pre-initialize the image to 0.
>>  Hm, although...  I suppose @falloc technically does not specify whether
>> the data reads as zeroes.  I kind of find it to be implied, but, well...
> 
> I personally don't really think that zeros are important, but rather the level of allocation.
> posix_fallocate probably won't write the data blocks but rather only the inode metadata / used block bitmap/etc.
> 
> On the other hand writing zeros (or anything else) will force the block layer to actually write to the underlying
> storage which could trigger lower layer allocation if the underlying storage is thin-provisioned.
> 
> In fact IMHO, instead of writing zeros, it would be better to write random garbage instead (or have that as an even 'fuller'
> preallocation mode), since underlying storage might 'compress' the zeros. 

Which is actually an argument why you should just write zeroes on the
LUKS layer, because this will then turn into quasi-random data on the
protocol layer.

Max

> In this version I do have a bug that I mentioned, about not preallocation some data at the end of the image, and I will
> fix it, so that all image is zeros as expected
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
>>
>> Max
>>
> 
>
Max Reitz July 11, 2019, 12:54 p.m. UTC | #7
On 11.07.19 14:23, Max Reitz wrote:
> On 11.07.19 11:20, Daniel P. Berrangé wrote:
>> On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:

[...]

>>> Hm.  I would expect a preallocated image to read 0.  But if you just
>>> pass this through to the protocol layer, it won’t read 0.
>>
>> Yes, it will be zeros at the physical layer, but unintelligble
>> garbage from POV of the virtual disk.
>>
>> I don't think this is really a problem though - this is what you
>> get already if you create a LUKS volume on top of a block device
>> today.
> 
> Which is why we have BlockDriver.bdrv_has_zero_init(), which the LUKS
> driver does not implement, hence it being treated as false.
> 
> But if you are preallocating, you have a choice of what you write, and
> why not make that zeroes?
> 
>> AFAIK, we've not documented that preallocation guarantees future
>> reads will return zeros. Preallocation simply ensures that all
>> required space is allocated upfront. We do mention that it might
>> be achieved by writing zeros to the underlying storage but never
>> said you'll get zeros back.
> 
> But we have, as I wrote in my second reply.  PreallocMode's
> documentation says that at least “full” is writing zeroes, and to say
> those zeroes can be anywhere in the stack is cheating, from my POV.

I should add that I don’t mind changing the current documentation too much:

>> IOW I think its at most a docs problem to more clearly explain
>> that preallocation != guaranteed zeros for reads.

If there is a good reason to do that, sure.  But it needs to be done
explicitly, with an accompanying justification.  I don’t like just
ignoring the documentation we have.

(And yes, if something says “this writes zeroes”, I personally will
always interpret that as “it will read as zeroes”.)

Max
Daniel P. Berrangé July 11, 2019, 1:01 p.m. UTC | #8
On Thu, Jul 11, 2019 at 02:23:55PM +0200, Max Reitz wrote:
> On 11.07.19 11:20, Daniel P. Berrangé wrote:
> > On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:
> >> On 10.07.19 19:03, Maxim Levitsky wrote:
> >>> preallocation=off and preallocation=metadata
> >>> both allocate luks header only, and preallocation=falloc/full
> >>> is passed to underlying file, with the given image size.
> >>>
> >>> Note that the actual preallocated size is a bit smaller due
> >>> to luks header.
> >>
> >> Couldn’t you just preallocate it after creating the crypto header so
> >> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
> >> file size?
> > 
> > Yeah that would be preferrable. If that's really not possible, we
> > could likely provide some API to query the expected hreader size for
> > a given set of creation options. 
> > 
> >>
> >>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> >>>
> >>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> >>> ---
> >>>  block/crypto.c | 28 ++++++++++++++++++++++++++--
> >>>  1 file changed, 26 insertions(+), 2 deletions(-)
> >>
> >> Hm.  I would expect a preallocated image to read 0.  But if you just
> >> pass this through to the protocol layer, it won’t read 0.
> > 
> > Yes, it will be zeros at the physical layer, but unintelligble
> > garbage from POV of the virtual disk.
> > 
> > I don't think this is really a problem though - this is what you
> > get already if you create a LUKS volume on top of a block device
> > today.
> 
> Which is why we have BlockDriver.bdrv_has_zero_init(), which the LUKS
> driver does not implement, hence it being treated as false.
> 
> But if you are preallocating, you have a choice of what you write, and
> why not make that zeroes?
> 
> > AFAIK, we've not documented that preallocation guarantees future
> > reads will return zeros. Preallocation simply ensures that all
> > required space is allocated upfront. We do mention that it might
> > be achieved by writing zeros to the underlying storage but never
> > said you'll get zeros back.
> 
> But we have, as I wrote in my second reply.  PreallocMode's
> documentation says that at least “full” is writing zeroes, and to say
> those zeroes can be anywhere in the stack is cheating, from my POV.

I guess it depends on your interpretation of the docs. In qemu-img
man page it says

  "falloc" mode preallocates space for image by calling posix_fallocate().
  "full" mode preallocates space for image by writing zeros to underlying
  storage.

To me both those sentances are talking about the lowest level in the
stack, closest to the physical storage medium, though I can understand
if people have other interpretations.

> > IOW I think its at most a docs problem to more clearly explain
> > that preallocation != guaranteed zeros for reads.
> > 
> >> (In fact, I don’t even quite see the point of having LUKS as an own
> >> format still.  It was useful when qcow2 didn’t have LUKS support, but
> >> now it does, so...  I suppose everyone using the LUKS format should
> >> actually be using qcow2 with LUKS?)
> > 
> > Certainly not. LUKS on raw is going to be very common, not least because
> > that's directly compatible with what Linux kernel supports. If you don't
> > want the features of qcow2 like snapshots, it just adds overhead and mgmt
> > complexity for no gain, especially if dealing with block device backed
> > storage (iSCSI, RBD).
> > 
> > OpenStack will use cryptsetup when initializing its block storage with
> > LUKS, then tell QEMU to run with the raw + LUKS driver.
> 
> I see the compatibility with the Linux kernel, yes (as I wrote in my
> second reply), but I’m not sure whether “overhead” really is that big of
> a point when using encryption.

Overhead is not purely about CPU burn. There's non-negligible memory
overhead for qcow2s data tables that doesn't exist at all with raw.
The mgmt complexity & interoperability is the real killer feature
benefit of raw + LUKS vs qcow + LUKS though.

Regards,
Daniel
Eric Blake July 11, 2019, 1:50 p.m. UTC | #9
On 7/11/19 7:24 AM, Max Reitz wrote:

>>> So it isn’t just me who expects these to pre-initialize the image to 0.
>>>  Hm, although...  I suppose @falloc technically does not specify whether
>>> the data reads as zeroes.  I kind of find it to be implied, but, well...
>>
>> I personally don't really think that zeros are important, but rather the level of allocation.
>> posix_fallocate probably won't write the data blocks but rather only the inode metadata / used block bitmap/etc.
>>
>> On the other hand writing zeros (or anything else) will force the block layer to actually write to the underlying
>> storage which could trigger lower layer allocation if the underlying storage is thin-provisioned.
>>
>> In fact IMHO, instead of writing zeros, it would be better to write random garbage instead (or have that as an even 'fuller'
>> preallocation mode), since underlying storage might 'compress' the zeros. 
> 
> Which is actually an argument why you should just write zeroes on the
> LUKS layer, because this will then turn into quasi-random data on the
> protocol layer.

We want preallocation to be fast (insofar as possible). Writing zeroes
in LUKS is not fast, because it forces random data on the protocol
layer; while writing zeroes on the protocol layer can be fast, even if
it reads back as random on the LUKS layer. If you WANT to guarantee
reading zeroes, that's image scrubbing, not preallocation.  I think this
patch is taking the right approach, of letting the underlying layer
allocate data efficiently (but the burden is then on the underlying
layer to actually allocate data, and not optimize by compressing zeroes
into non-allocated storage).
Daniel P. Berrangé July 11, 2019, 1:56 p.m. UTC | #10
On Thu, Jul 11, 2019 at 08:50:56AM -0500, Eric Blake wrote:
> On 7/11/19 7:24 AM, Max Reitz wrote:
> 
> >>> So it isn’t just me who expects these to pre-initialize the image to 0.
> >>>  Hm, although...  I suppose @falloc technically does not specify whether
> >>> the data reads as zeroes.  I kind of find it to be implied, but, well...
> >>
> >> I personally don't really think that zeros are important, but rather the level of allocation.
> >> posix_fallocate probably won't write the data blocks but rather only the inode metadata / used block bitmap/etc.
> >>
> >> On the other hand writing zeros (or anything else) will force the block layer to actually write to the underlying
> >> storage which could trigger lower layer allocation if the underlying storage is thin-provisioned.
> >>
> >> In fact IMHO, instead of writing zeros, it would be better to write random garbage instead (or have that as an even 'fuller'
> >> preallocation mode), since underlying storage might 'compress' the zeros. 
> > 
> > Which is actually an argument why you should just write zeroes on the
> > LUKS layer, because this will then turn into quasi-random data on the
> > protocol layer.
> 
> We want preallocation to be fast (insofar as possible). Writing zeroes
> in LUKS is not fast, because it forces random data on the protocol
> layer; while writing zeroes on the protocol layer can be fast, even if
> it reads back as random on the LUKS layer. If you WANT to guarantee
> reading zeroes, that's image scrubbing, not preallocation.  I think this
> patch is taking the right approach, of letting the underlying layer
> allocate data efficiently (but the burden is then on the underlying
> layer to actually allocate data, and not optimize by compressing zeroes
> into non-allocated storage).

On the topic of scrubbing, it would actually be nice to have a
"secure delete" for QEMU block driver formats that can do some
level of scrubbing in software and/or calling out to hardware support.

Similarly to prealloc a choice of 'metadata' or 'full'. Wwith LUKS
you can do well by just scrubbing the image header (which kills the
master decryption key rendering payload useless).

Regards,
Daniel
Kevin Wolf July 11, 2019, 2:12 p.m. UTC | #11
Am 11.07.2019 um 15:50 hat Eric Blake geschrieben:
> On 7/11/19 7:24 AM, Max Reitz wrote:
> 
> >>> So it isn’t just me who expects these to pre-initialize the image to 0.
> >>>  Hm, although...  I suppose @falloc technically does not specify whether
> >>> the data reads as zeroes.  I kind of find it to be implied, but, well...
> >>
> >> I personally don't really think that zeros are important, but rather the level of allocation.
> >> posix_fallocate probably won't write the data blocks but rather only the inode metadata / used block bitmap/etc.
> >>
> >> On the other hand writing zeros (or anything else) will force the block layer to actually write to the underlying
> >> storage which could trigger lower layer allocation if the underlying storage is thin-provisioned.
> >>
> >> In fact IMHO, instead of writing zeros, it would be better to write random garbage instead (or have that as an even 'fuller'
> >> preallocation mode), since underlying storage might 'compress' the zeros. 
> > 
> > Which is actually an argument why you should just write zeroes on the
> > LUKS layer, because this will then turn into quasi-random data on the
> > protocol layer.
> 
> We want preallocation to be fast (insofar as possible). Writing zeroes
> in LUKS is not fast, because it forces random data on the protocol
> layer; while writing zeroes on the protocol layer can be fast, even if
> it reads back as random on the LUKS layer. If you WANT to guarantee
> reading zeroes, that's image scrubbing, not preallocation.  I think this
> patch is taking the right approach, of letting the underlying layer
> allocate data efficiently (but the burden is then on the underlying
> layer to actually allocate data, and not optimize by compressing zeroes
> into non-allocated storage).

Isn't letting the host efficiently preallocate things what we have
preallocation=falloc for? We implement preallocation=full as explicit
writes to make sure that no shortcuts are taken and things are _really_
preallocated throughout all layers. Not being efficient, but thorough is
almost like the whole point of the option.

So I'm inclined to think that writing zeros on the LUKS layer would be
right for full preallocation.

Kevin

Patch
diff mbox series

diff --git a/block/crypto.c b/block/crypto.c
index 8237424ae6..74b789d278 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -251,6 +251,7 @@  static int block_crypto_open_generic(QCryptoBlockFormat format,
 static int block_crypto_co_create_generic(BlockDriverState *bs,
                                           int64_t size,
                                           QCryptoBlockCreateOptions *opts,
+                                          PreallocMode prealloc,
                                           Error **errp)
 {
     int ret;
@@ -266,6 +267,13 @@  static int block_crypto_co_create_generic(BlockDriverState *bs,
         goto cleanup;
     }
 
+    if (prealloc != PREALLOC_MODE_OFF) {
+        ret = blk_truncate(blk, size, prealloc, errp);
+        if (ret < 0) {
+                goto cleanup;
+        }
+    }
+
     data = (struct BlockCryptoCreateData) {
         .blk = blk,
         .size = size,
@@ -516,7 +524,7 @@  block_crypto_co_create_luks(BlockdevCreateOptions *create_options, Error **errp)
     };
 
     ret = block_crypto_co_create_generic(bs, luks_opts->size, &create_opts,
-                                         errp);
+                                         PREALLOC_MODE_OFF, errp);
     if (ret < 0) {
         goto fail;
     }
@@ -534,12 +542,28 @@  static int coroutine_fn block_crypto_co_create_opts_luks(const char *filename,
     QCryptoBlockCreateOptions *create_opts = NULL;
     BlockDriverState *bs = NULL;
     QDict *cryptoopts;
+    PreallocMode prealloc;
+    char *buf = NULL;
     int64_t size;
     int ret;
+    Error *local_err = NULL;
 
     /* Parse options */
     size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
 
+    buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+    prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
+                                   PREALLOC_MODE_OFF, &local_err);
+    g_free(buf);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return -EINVAL;
+    }
+
+    if (prealloc == PREALLOC_MODE_METADATA) {
+        prealloc  = PREALLOC_MODE_OFF;
+    }
+
     cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
                                              &block_crypto_create_opts_luks,
                                              true);
@@ -565,7 +589,7 @@  static int coroutine_fn block_crypto_co_create_opts_luks(const char *filename,
     }
 
     /* Create format layer */
-    ret = block_crypto_co_create_generic(bs, size, create_opts, errp);
+    ret = block_crypto_co_create_generic(bs, size, create_opts, prealloc, errp);
     if (ret < 0) {
         goto fail;
     }