Patchwork [RFC,06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations

login
register
mail settings
Submitter Richard Laager
Date March 15, 2012, 12:42 a.m.
Message ID <1331772155.24052.849.camel@watermelon.coderich.net>
Download mbox | patch
Permalink /patch/146792/
State New
Headers show

Comments

Richard Laager - March 15, 2012, 12:42 a.m.
On Wed, 2012-03-14 at 08:41 +0100, Paolo Bonzini wrote:
> Il 13/03/2012 20:13, Richard Laager ha scritto:
> >>> > >       * For SCSI, report an unmap_granularity to the guest as follows:
> >>> > >       max(logical_block_size, discard_granularity) / logical_block_size
> >> > 
> >> > This is more or less already in place later in the series.
> > I didn't see it. Which patch number?
> 
> Patch 11:

I was saying QEMU should pass the discard_granularity to the guest as
OPTIMAL UNMAP GRANULARITY. This would almost surely need to be done in
hw/scsi-disk.c, roughly around this change from your patch 10:


The code from patch 11 is more along the lines of what I think QEMU
should have in the block layer:

> +    } else if (discard_granularity < s->qdev.conf.logical_block_size) {
> +        error_report("scsi-block: invalid discard_granularity");
> +        return -1;
>
> +    } else if (discard_granularity & (discard_granularity - 1)) {
> +        error_report("scsi-block: discard_granularity not a power of two");
> +        return -1;
> +    }

However, I think the first check is unnecessarily restrictive. As long
as discard_granularity is a power of two, if it's less than the block
size (which is also a power of two), the block size will always be a
multiple of discard_granularity, so there's no problem.

I'd also like to explicitly allow discard_granularity = 1, which is what
fallocate() provides.

> It is worse in that we do not want the hardware parameters exposed to the
> guest to change behind the scenes, except if you change the machine type
> or if you use the default unversioned type.

You're saying that discard_granularity and discard_zeros_data need to be
properties of the machine type? If you start with that as a requirement,
I can see why you want to always report discard_granularity=512 &
discard_zeros_data=1. But that design has many downsides. I'm not
convinced that discard_granularity and discard_zeros_data need to be
properties of the machine type. Why do you feel that's necessary? What's
the harm in those properties changing across QEMU startups (i.e. guest
boots)?
Paolo Bonzini - March 15, 2012, 9:36 a.m.
Il 15/03/2012 01:42, Richard Laager ha scritto:
>> > It is worse in that we do not want the hardware parameters exposed to the
>> > guest to change behind the scenes, except if you change the machine type
>> > or if you use the default unversioned type.
> You're saying that discard_granularity and discard_zeros_data need to be
> properties of the machine type? If you start with that as a requirement,
> I can see why you want to always report discard_granularity=512 &
> discard_zeros_data=1. But that design has many downsides. I'm not
> convinced that discard_granularity and discard_zeros_data need to be
> properties of the machine type. Why do you feel that's necessary? What's
> the harm in those properties changing across QEMU startups (i.e. guest
> boots)?

Changing across guest boots is a minor problem, but changing across
migration must be avoided at all costs.

BTW, after this discussion I think we can instead report
discard_granularity = 512 and discard_zeroes_data=0 and get most of the
benefit, at least on file-backed storage.

Paolo
Richard Laager - March 16, 2012, 12:47 a.m.
On Thu, 2012-03-15 at 10:36 +0100, Paolo Bonzini wrote:
> Changing across guest boots is a minor problem, but changing across
> migration must be avoided at all costs.
> 
> BTW, after this discussion I think we can instead report
> discard_granularity = 512 and discard_zeroes_data=0 and get most of the
> benefit, at least on file-backed storage.

Are you going to report that to guests all the time, or only when the
host supports discard? If you don't report it all the time, you could
still be "changing across migration". If you do report it all the time,
then you're incurring a performance penalty on systems that don't
support discard, as the guest will be sending discard requests that QEMU
has to throw away (but by which time some work has been wasted).

And either way, what you're proposing means that users with
discard_zeros_data = 1 hosts can't get the (albeit small) benefits of
that because some other QEMU user might want to do a migration across
heterogeneous storage.

The more I think about this, the more it seems like we just need to make
this configurable. Then people that, because of migration concerns, want
to advertise lowest-common-denominator storage to their guests can do so
(just like they already can and likely do advertise a
lowest-common-denominator CPU). Everyone else can get whatever their
host supports (just like they do with CPUs).

Adding configuration options also means that developers can use QEMU to
test guest discard support in various ways.

Finally, I see your proposal of advertising fixed discard support to
guests (regardless of which set of values is chosen) out of line with
the existing QEMU precedent with CPUs as well as the design decision
used for discard in the Linux kernel (which passes the values all the
way up the stack). While this doesn't inherently mean it's a bad design,
I think QEMU should tread carefully.
Paolo Bonzini - March 16, 2012, 9:34 a.m.
Il 16/03/2012 01:47, Richard Laager ha scritto:
> On Thu, 2012-03-15 at 10:36 +0100, Paolo Bonzini wrote:
>> Changing across guest boots is a minor problem, but changing across
>> migration must be avoided at all costs.
>>
>> BTW, after this discussion I think we can instead report
>> discard_granularity = 512 and discard_zeroes_data=0 and get most of the
>> benefit, at least on file-backed storage.
> 
> Are you going to report that to guests all the time, or only when the
> host supports discard? If you don't report it all the time, you could
> still be "changing across migration". If you do report it all the time,
> then you're incurring a performance penalty on systems that don't
> support discard, as the guest will be sending discard requests that QEMU
> has to throw away (but by which time some work has been wasted).

I don't think that should be that bad.  Discard requests should be
relatively rare.

> And either way, what you're proposing means that users with
> discard_zeros_data = 1 hosts can't get the (albeit small) benefits of
> that because some other QEMU user might want to do a migration across
> heterogeneous storage.

Yes, discard_zeroes_data can be made configurable on top, and either
rejected or emulated if the storage does not support it.

> Finally, I see your proposal of advertising fixed discard support

It does not really have to be fixed, it's just a default.

Paolo

Patch

--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -1295,8 +1295,11 @@  static int scsi_disk_emulate_command(SCSIDiskReq *r)
             outbuf[13] = get_physical_block_exp(&s->qdev.conf);
 
             /* set TPE bit if the format supports discard */
-            if (s->qdev.conf.discard_granularity) {
+            if (s->qdev.type == TYPE_DISK && s->qdev.conf.discard_granularity) {
                 outbuf[14] = 0x80;
+                if (s->qdev.conf.discard_zeroes_data) {
+                    outbuf[14] |= 0x40;
+                }
             }
 
             /* Protection, exponent and lowest lba field left blank. */