From patchwork Tue Dec 2 07:47:45 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Francesco Romani X-Patchwork-Id: 416737 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id C8BE21402A9 for ; Tue, 2 Dec 2014 18:48:18 +1100 (AEDT) Received: from localhost ([::1]:35516 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XviBz-0005rL-NQ for incoming@patchwork.ozlabs.org; Tue, 02 Dec 2014 02:48:15 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51111) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XviBe-0005RG-Eb for qemu-devel@nongnu.org; Tue, 02 Dec 2014 02:47:58 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XviBZ-0006dp-OE for qemu-devel@nongnu.org; Tue, 02 Dec 2014 02:47:54 -0500 Received: from mx5-phx2.redhat.com ([209.132.183.37]:39428) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XviBZ-0006cd-FO for qemu-devel@nongnu.org; Tue, 02 Dec 2014 02:47:49 -0500 Received: from zmail22.collab.prod.int.phx2.redhat.com (zmail22.collab.prod.int.phx2.redhat.com [10.5.83.26]) by mx5-phx2.redhat.com (8.14.4/8.14.4) with ESMTP id sB27ljXw015955; Tue, 2 Dec 2014 02:47:45 -0500 Date: Tue, 2 Dec 2014 02:47:45 -0500 (EST) From: Francesco Romani To: Eric Blake Message-ID: <364789514.5335967.1417506465064.JavaMail.zimbra@redhat.com> In-Reply-To: <547CD89A.2090903@redhat.com> References: <1417177867-27918-1-git-send-email-fromani@redhat.com> <1417177867-27918-2-git-send-email-fromani@redhat.com> <547CD89A.2090903@redhat.com> MIME-Version: 1.0 X-Originating-IP: [10.5.82.12] X-Mailer: Zimbra 8.0.6_GA_5922 (ZimbraWebClient - FF33 (Linux)/8.0.6_GA_5922) Thread-Topic: block: add event when disk usage exceeds threshold Thread-Index: cfBZOEkkM0GzgfYnK4j4KsyZufwYuA== X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 209.132.183.37 Cc: kwolf@redhat.com, lcapitulino@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, mdroth@linux.vnet.ibm.com Subject: Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds threshold X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Thanks for the quick review! ----- Original Message ----- > From: "Eric Blake" > To: "Francesco Romani" , qemu-devel@nongnu.org > Cc: kwolf@redhat.com, mdroth@linux.vnet.ibm.com, stefanha@redhat.com, lcapitulino@redhat.com > Sent: Monday, December 1, 2014 10:07:38 PM > Subject: Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds threshold > > On 11/28/2014 05:31 AM, Francesco Romani wrote: > > Managing applications, like oVirt (http://www.ovirt.org), make extensive > > use of thin-provisioned disk images. > > To let the guest run smoothly and be not unnecessarily paused, oVirt sets > > a disk usage threshold (so called 'high water mark') based on the > > occupation > > of the device, and automatically extends the image once the threshold > > is reached or exceeded. > > > > In order to detect the crossing of the threshold, oVirt has no choice but > > aggressively polling the QEMU monitor using the query-blockstats command. > > This lead to unnecessary system load, and is made even worse under scale: > > deployments with hundreds of VMs are no longer rare. > > > > To fix this, this patch adds: > > * A new monitor command to set a mark for a given block device. > > * A new event to report if a block device usage exceeds the threshold. > > > > This will allow the managing application to drop the polling > > altogether and just wait for a watermark crossing event. > > I like the idea! > > Question - what happens if management misses the event (for example, if > libvirtd is restarted)? Does the existing 'query-blockstats' and/or > 'query-named-block-nodes' still work to query the current threshold and > whether it has been exceeded, as a poll-once command executed when > reconnecting to the monitor? Indeed oVirt wants to keep the existing polling and to use it as fallback, to make sure no events are missed. oVirt will not rely on the new notification *alone*. The plan is to "just" poll *much less* frequently. Today's default poll rate is every 2 (two) seconds, so there is a lot of room for improvement. > > > > > Signed-off-by: Francesco Romani > > --- > > No need for a 0/1 cover letter on a 1-patch series; you have the option > of just putting the side-band information here and sending it as a > single mail. But the cover letter approach doesn't hurt either, and I > can see how it can be easier for some workflows to always send a cover > letter than to special-case a 1-patch series. Also, I found that a separate context/introduction/room for comments would helped, especially on first two revisions of the patch which were tagged as RFC. I have zero problems in dropping the cover letter now that consensus is forming and patch is taking shape, just let me know. [... snip spelling: thanks! will fix] > > +++ b/qapi/block-core.json > > @@ -239,6 +239,9 @@ > > # > > # @iops_size: #optional an I/O size in bytes (Since 1.7) > > # > > +# @write-threshold: configured write threshold for the device. > > +# 0 if disabled. (Since 2.3) > > +# > > # Since: 0.14.0 > > # > > ## > > @@ -253,7 +256,7 @@ > > '*bps_max': 'int', '*bps_rd_max': 'int', > > '*bps_wr_max': 'int', '*iops_max': 'int', > > '*iops_rd_max': 'int', '*iops_wr_max': 'int', > > - '*iops_size': 'int' } } > > + '*iops_size': 'int', 'write-threshold': 'uint64' } } > > In QMP specs, 'uint64' and 'int' are practically synonyms. I can live > with either spelling, although 'int' is more common. > > Bikeshed on naming: Although we prefer '-' over '_' in new interfaces, > we also favor consistency, and BlockDeviceInfo is one of those dinosaur > commands that uses _ everywhere until your addition. So naming this > field 'write_threshold' would be more consistent. Agreed. Will fix. > > +# > > +# @node-name: graph node name on which the threshold was exceeded. > > +# > > +# @amount-exceeded: amount of data which exceeded the threshold, in bytes. > > +# > > +# @offset-threshold: last configured threshold, in bytes. > > +# > > Might want to mention that this event is one-shot; after it triggers, a > user must re-register a threshold to get the event again. Good point. Will fix. > > > +# Since: 2.3 > > +## > > +{ 'event': 'BLOCK_USAGE_THRESHOLD', > > + 'data': { 'node-name': 'str', > > + 'amount-exceeded': 'uint64', > > TAB damage. Please use spaces. ./scripts/checkpatch.pl will catch some > offenders (although I didn't test if it will catch this one). > > However, here you are correct in using '-' for naming :) Oops. Rebase error (was clean before!) will fix. > > > + 'threshold': 'uint64' } } > > + > > +## > > +# @block-set-threshold > > +# > > +# Change usage threshold for a block drive. An event will be delivered > > +# if a write to this block drive crosses the configured threshold. > > +# This is useful to transparently resize thin-provisioned drives without > > +# the guest OS noticing. > > +# > > +# @node-name: graph node name on which the threshold must be set. > > +# > > +# @write-threshold: configured threshold for the block device, bytes. > > +# Use 0 to disable the threshold. > > +# > > +# Returns: Nothing on success > > +# If @node name is not found on the block device graph, > > +# DeviceNotFound > > +# > > +# Since: 2.3 > > +## > > +{ 'command': 'block-set-threshold', > > + 'data': { 'node-name': 'str', 'threshold': 'uint64' } } > > Again, 'int' instead of 'uint64' is more typical, but shouldn't hurt > with either spelling. > > > +SQMP > > +block-set-threshold > > +------------ > > + > > +Change the write threshold for a block drive. The threshold is an offset, > > +thus must be non-negative. Default is not write threshold. > > s/not/no/ > > > +To set the threshold to zero disables it. > > s/To set/Setting/ > > Missing the change to the 'query-block' and 'query-named-block-nodes' > examples to show the new always-present output field. Sorry, not sure I'm following. Isn't that hunk enough? It produces the output I'd expect on my tests: Welcome to the QMP low-level shell! Connected to QEMU 2.1.91 (QEMU) query-block { u'return': [ { u'device': u'drive-virtio-disk0', u'inserted': { u'backing_file_depth': 0, u'bps': 0, u'bps_rd': 0, u'bps_wr': 0, u'detect_zeroes': u'off', u'drv': u'qcow2', u'encrypted': False, u'encryption_key_missing': False, u'file': u'/var/lib/libvirt/images/fedora20.qcow2', u'image': { u'actual-size': 11804672, u'cluster-size': 65536, u'dirty-flag': False, u'filename': u'/var/lib/libvirt/images/fedora20.qcow2', u'format': u'qcow2', u'format-specific': { u'data': { u'compat': u'1.1', u'corrupt': False, u'lazy-refcounts': False}, u'type': u'qcow2'}, u'virtual-size': 17179869184}, u'iops': 0, u'iops_rd': 0, u'iops_wr': 0, u'node-name': u'foo', u'ro': False, u'write-threshold': 0}, u'io-status': u'ok', u'locked': False, u'removable': False, u'type': u'unknown'}, { u'device': u'drive-ide0-0-0', u'inserted': { u'backing_file_depth': 0, u'bps': 0, u'bps_rd': 0, u'bps_wr': 0, u'detect_zeroes': u'off', u'drv': u'raw', u'encrypted': False, u'encryption_key_missing': False, u'file': u'/var/lib/libvirt/images/Fedora-Live-Desktop-x86_64-20-1.iso', u'image': { u'actual-size': 999297024, u'dirty-flag': False, u'filename': u'/var/lib/libvirt/images/Fedora-Live-Desktop-x86_64-20-1.iso', u'format': u'raw', u'virtual-size': 999292928}, u'iops': 0, u'iops_rd': 0, u'iops_wr': 0, u'ro': True, u'write-threshold': 0}, u'io-status': u'ok', u'locked': False, u'removable': True, u'tray_open': False, u'type': u'unknown'}]} (QEMU) query-named-block-nodes { u'return': [ { u'backing_file_depth': 0, u'bps': 0, u'bps_rd': 0, u'bps_wr': 0, u'detect_zeroes': u'off', u'drv': u'qcow2', u'encrypted': False, u'encryption_key_missing': False, u'file': u'/var/lib/libvirt/images/fedora20.qcow2', u'image': { }, u'iops': 0, u'iops_rd': 0, u'iops_wr': 0, u'node-name': u'foo', u'ro': False, u'write-threshold': 0}, { u'backing_file_depth': 0, u'bps': 0, u'bps_rd': 0, u'bps_wr': 0, u'detect_zeroes': u'off', u'drv': u'file', u'encrypted': False, u'encryption_key_missing': False, u'file': u'/var/lib/libvirt/images/fedora20.qcow2', u'image': { }, u'iops': 0, u'iops_rd': 0, u'iops_wr': 0, u'node-name': u'bar', u'ro': False, u'write-threshold': 0}]} (relevant cmd-line snippet: -drive file=/var/lib/libvirt/images/fedora20.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,node-name=foo,file.node-name=bar \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 \ ) Bests, diff --git a/block/qapi.c b/block/qapi.c index a87a34a..c85abb0 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -24,6 +24,7 @@ #include "block/qapi.h" #include "block/block_int.h" +#include "block/usage-threshold.h" #include "qmp-commands.h" #include "qapi-visit.h" #include "qapi/qmp-output-visitor.h" @@ -82,6 +83,8 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs) info->iops_size = cfg.op_size; } + info->write_threshold = bdrv_usage_threshold_get(bs); + return info; }