[v3] docs: add blkdebug block driver documentation

Message ID 1411551854-23025-1-git-send-email-stefanha@redhat.com
State New

Commit Message

Stefan Hajnoczi Sept. 24, 2014, 9:44 a.m. UTC
The blkdebug block driver is undocumented.  Documenting it is worthwhile
since it offers powerful error injection features that are used by
qemu-iotests test cases.

This document will make it easier for people to learn about and use
blkdebug.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v3:
 * Fix tab space damage [Eric]
 * Rephrase event_names[] as full list of events [Eric]
 * Explain that blkdebug state is not observable from outside [Eric]
 * Clarify state 0 and state 1 [Eric]

v2:
 * Added GPL v2 or later license and Red Hat copyright [Eric]
 * Expanded ini rules file explanation [Paolo]
 * Added note that errno values depend on the host [Eric]

 docs/blkdebug.txt | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 161 insertions(+)
 create mode 100644 docs/blkdebug.txt

Comments

Benoît Canet Sept. 24, 2014, 3:43 p.m. UTC | #1
On Wednesday 24 Sep 2014 at 10:44:14 (+0100), Stefan Hajnoczi wrote:
> The blkdebug block driver is undocumented.  Documenting it is worthwhile
> since it offers powerful error injection features that are used by
> qemu-iotests test cases.
> 
> This document will make it easier for people to learn about and use
> blkdebug.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v3:
>  * Fix tab space damage [Eric]
>  * Rephrase event_names[] as full list of events [Eric]
>  * Explain that blkdebug state is not observable from outside [Eric]
>  * Clarify state 0 and state 1 [Eric]
> 
> v2:
>  * Added GPL v2 or later license and Red Hat copyright [Eric]
>  * Expanded ini rules file explanation [Paolo]
>  * Added note that errno values depend on the host [Eric]
> 
>  docs/blkdebug.txt | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 161 insertions(+)
>  create mode 100644 docs/blkdebug.txt
> 
> diff --git a/docs/blkdebug.txt b/docs/blkdebug.txt
> new file mode 100644
> index 0000000..5dde072
> --- /dev/null
> +++ b/docs/blkdebug.txt
> @@ -0,0 +1,161 @@
> +Block I/O error injection using blkdebug
> +----------------------------------------
> +Copyright (C) 2014 Red Hat Inc
> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.  See
> +the COPYING file in the top-level directory.
> +
> +The blkdebug block driver is a rule-based error injection engine.  It can be
> +used to exercise error code paths in block drivers including ENOSPC (out of
> +space) and EIO.
> +
> +This document gives an overview of the features available in blkdebug.
> +
> +Background
> +----------
> +Block drivers have many error code paths that handle I/O errors.  Image formats
> +are especially complex since metadata I/O errors during cluster allocation or
> +while updating tables happen halfway through request processing and require
> +discipline to keep image files consistent.
> +
> +Error injection allows test cases to trigger I/O errors at specific points.
> +This way, all error paths can be tested to make sure they are correct.
> +
> +Rules
> +-----
> +The blkdebug block driver takes a list of "rules" that tell the error injection
> +engine when to fail an I/O request.
> +
> +Each I/O request is evaluated against the rules.  If a rule matches the request
> +then its "action" is executed.
> +
> +Rules can be placed in a configuration file; the configuration file
> +follows the same .ini-like format used by QEMU's -readconfig option, and
> +each section of the file represents a rule.
> +
> +The following configuration file defines a single rule:
> +
> +  $ cat blkdebug.conf
> +  [inject-error]
> +  event = "read_aio"
> +  errno = "28"
> +
> +This rule fails all aio read requests with ENOSPC (28).  Note that the errno
> +value depends on the host.  On Linux, see
> +/usr/include/asm-generic/errno-base.h for errno values.
> +
> +Invoke QEMU as follows:
> +
> +  $ qemu-system-x86_64 \
> +        -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
> +        -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0
> +
> +Rules support the following attributes:
> +
> +  event - which type of operation to match (e.g. read_aio, write_aio,
> +          flush_to_os, flush_to_disk).  See the "Events" section for
> +          information on events.
> +
> +  state - (optional) the engine must be in this state number in order for this
> +          rule to match.  See the "State transitions" section for information
> +          on states.
> +
> +  errno - the numeric errno value to return when a request matches this rule.
> +          The errno values depend on the host since the numeric values are not
> +          standardized in the POSIX specification.
> +
> +  sector - (optional) a sector number that the request must overlap in order to
> +           match this rule
> +
> +  once - (optional, default "off") only execute this action on the first
> +         matching request
> +
> +  immediately - (optional, default "off") return a NULL BlockDriverAIOCB
> +                pointer and fail without an errno instead.  This exercises the
> +                code path where BlockDriverAIOCB fails and the caller's
> +                BlockDriverCompletionFunc is not invoked.
> +
> +Events
> +------
> +Block drivers provide information about the type of I/O request they are about
> +to make so rules can match specific types of requests.  For example, the qcow2
> +block driver tells blkdebug when it accesses the L1 table so rules can match
> +only L1 table accesses and not other metadata or guest data requests.
> +
> +The core events are:
> +
> +  read_aio - guest data read
> +
> +  write_aio - guest data write
> +
> +  flush_to_os - write out unwritten block driver state (e.g. cached metadata)
> +
> +  flush_to_disk - flush the host block device's disk cache
> +
> +See block/blkdebug.c:event_names[] for the full list of events.  You may need
> +to grep block driver source code to understand the meaning of specific events.
> +
> +State transitions
> +-----------------
> +There are cases where more power is needed to match a particular I/O request in
> +a longer sequence of requests.  For example:
> +
> +  write_aio
> +  flush_to_disk
> +  write_aio
> +
> +How do we match the 2nd write_aio but not the first?  This is where state
> +transitions come in.
> +
> +The error injection engine has an integer called the "state" that always starts
> +initialized to 1.  The state integer is internal to blkdebug and cannot be
> +observed from outside but rules can interact with it for powerful matching
> +behavior.
> +
> +Rules can be conditional on the current state and they can transition to a new
> +state.
> +
> +When a rule's "state" attribute is non-zero then the current state must equal
> +the attribute in order for the rule to match.
> +
> +For example, to match the 2nd write_aio:
> +
> +  [set-state]
> +  event = "write_aio"
> +  state = "1"
> +  new_state = "2"
> +
> +  [inject-error]
> +  event = "write_aio"
> +  state = "2"
> +  errno = "5"
> +
> +The first write_aio request matches the set-state rule and transitions from
> +state 1 to state 2.  Once state 2 has been entered, the set-state rule no
> +longer matches since it requires state 1.  But the inject-error rule now
> +matches the next write_aio request and injects EIO (5).
> +
> +State transition rules support the following attributes:
> +
> +  event - which type of operation to match (e.g. read_aio, write_aio,
> +          flush_to_os, flush_to_disk).  See the "Events" section for
> +          information on events.
> +
> +  state - (optional) the engine must be in this state number in order for this
> +          rule to match
> +
> +  new_state - transition to this state number
> +
> +Suspend and resume
> +------------------
> +Exercising code paths in block drivers may require specific ordering amongst
> +concurrent requests.  The "breakpoint" feature allows requests to be halted on
> +a blkdebug event and resumed later.  This makes it possible to achieve
> +deterministic ordering when multiple requests are in flight.
> +
> +Breakpoints on blkdebug events are associated with a user-defined "tag" string.
> +This tag serves as an identifier by which the request can be resumed at a later
> +point.
> +
> +See the qemu-io(1) break, resume, remove_break, and wait_break commands for
> +details.
> -- 
> 1.9.3
> 
> 

I won't be able to spellcheck or clarify it better than Eric, but it's
nice that it is getting documented since it's a powerful and useful feature.

Best regards

Benoît
Max Reitz Sept. 24, 2014, 4:24 p.m. UTC | #2
On 24.09.2014 11:44, Stefan Hajnoczi wrote:
> The blkdebug block driver is undocumented.  Documenting it is worthwhile
> since it offers powerful error injection features that are used by
> qemu-iotests test cases.
>
> This document will make it easier for people to learn about and use
> blkdebug.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v3:
>   * Fix tab space damage [Eric]
>   * Rephrase event_names[] as full list of events [Eric]
>   * Explain that blkdebug state is not observable from outside [Eric]
>   * Clarify state 0 and state 1 [Eric]
>
> v2:
>   * Added GPL v2 or later license and Red Hat copyright [Eric]
>   * Expanded ini rules file explanation [Paolo]
>   * Added note that errno values depend on the host [Eric]
>
>   docs/blkdebug.txt | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 161 insertions(+)
>   create mode 100644 docs/blkdebug.txt

Reviewed-by: Max Reitz <mreitz@redhat.com>

Maybe I'll add information about blkdebug's QMP interface sometime...
John Snow Sept. 24, 2014, 4:26 p.m. UTC | #3
On 09/24/2014 12:24 PM, Max Reitz wrote:
> On 24.09.2014 11:44, Stefan Hajnoczi wrote:
>> The blkdebug block driver is undocumented.  Documenting it is worthwhile
>> since it offers powerful error injection features that are used by
>> qemu-iotests test cases.
>>
>> This document will make it easier for people to learn about and use
>> blkdebug.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>> v3:
>>   * Fix tab space damage [Eric]
>>   * Rephrase event_names[] as full list of events [Eric]
>>   * Explain that blkdebug state is not observable from outside [Eric]
>>   * Clarify state 0 and state 1 [Eric]
>>
>> v2:
>>   * Added GPL v2 or later license and Red Hat copyright [Eric]
>>   * Expanded ini rules file explanation [Paolo]
>>   * Added note that errno values depend on the host [Eric]
>>
>>   docs/blkdebug.txt | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 161 insertions(+)
>>   create mode 100644 docs/blkdebug.txt
>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
>
> Maybe I'll add information about blkdebug's QMP interface sometime...

If you have the cycles and the knowledge, you definitely should!
Eric Blake Sept. 24, 2014, 5:12 p.m. UTC | #4
On 09/24/2014 03:44 AM, Stefan Hajnoczi wrote:
> The blkdebug block driver is undocumented.  Documenting it is worthwhile
> since it offers powerful error injection features that are used by
> qemu-iotests test cases.
> 
> This document will make it easier for people to learn about and use
> blkdebug.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v3:
>  * Fix tab space damage [Eric]
>  * Rephrase event_names[] as full list of events [Eric]
>  * Explain that blkdebug state is not observable from outside [Eric]
>  * Clarify state 0 and state 1 [Eric]

Thanks for the updates.

Reviewed-by: Eric Blake <eblake@redhat.com>


> +
> +The error injection engine has an integer called the "state" that always starts
> +initialized to 1.  The state integer is internal to blkdebug and cannot be
> +observed from outside but rules can interact with it for powerful matching
> +behavior.
> +

We could always expose it and update the documentation in a later patch.
But it is fair to document the current state of things (current docs
are always better than no docs :)
Kevin Wolf Sept. 25, 2014, 7:41 a.m. UTC | #5
On 24.09.2014 at 11:44, Stefan Hajnoczi wrote:
> The blkdebug block driver is undocumented.  Documenting it is worthwhile
> since it offers powerful error injection features that are used by
> qemu-iotests test cases.
> 
> This document will make it easier for people to learn about and use
> blkdebug.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Thanks, applied to the block branch.

Kevin
Patch

diff --git a/docs/blkdebug.txt b/docs/blkdebug.txt
new file mode 100644
index 0000000..5dde072
--- /dev/null
+++ b/docs/blkdebug.txt
@@ -0,0 +1,161 @@ 
+Block I/O error injection using blkdebug
+----------------------------------------
+Copyright (C) 2014 Red Hat Inc
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.  See
+the COPYING file in the top-level directory.
+
+The blkdebug block driver is a rule-based error injection engine.  It can be
+used to exercise error code paths in block drivers, such as the handling of
+ENOSPC (out of space) and EIO.
+
+This document gives an overview of the features available in blkdebug.
+
+Background
+----------
+Block drivers have many error code paths that handle I/O errors.  Image formats
+are especially complex since metadata I/O errors during cluster allocation or
+while updating tables happen halfway through request processing and require
+discipline to keep image files consistent.
+
+Error injection allows test cases to trigger I/O errors at specific points.
+This way, all error paths can be tested to make sure they are correct.
+
+Rules
+-----
+The blkdebug block driver takes a list of "rules" that tell the error injection
+engine when to fail an I/O request.
+
+Each I/O request is evaluated against the rules.  If a rule matches the request
+then its "action" is executed.
+
+Rules can be placed in a configuration file; the configuration file
+follows the same .ini-like format used by QEMU's -readconfig option, and
+each section of the file represents a rule.
+
+The following configuration file defines a single rule:
+
+  $ cat blkdebug.conf
+  [inject-error]
+  event = "read_aio"
+  errno = "28"
+
+This rule fails all aio read requests with ENOSPC (28).  Note that the errno
+value depends on the host.  On Linux, see
+/usr/include/asm-generic/errno-base.h for errno values.
+
+Invoke QEMU as follows:
+
+  $ qemu-system-x86_64 \
+        -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
+        -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0
+
+Rules support the following attributes:
+
+  event - which type of operation to match (e.g. read_aio, write_aio,
+          flush_to_os, flush_to_disk).  See the "Events" section for
+          information on events.
+
+  state - (optional) the engine must be in this state number in order for this
+          rule to match.  See the "State transitions" section for information
+          on states.
+
+  errno - the numeric errno value to return when a request matches this rule.
+          The errno values depend on the host since the numeric values are not
+          standardized in the POSIX specification.
+
+  sector - (optional) a sector number that the request must overlap in order to
+           match this rule
+
+  once - (optional, default "off") only execute this action on the first
+         matching request
+
+  immediately - (optional, default "off") return a NULL BlockDriverAIOCB
+                pointer and fail without an errno instead.  This exercises the
+                code path where BlockDriverAIOCB fails and the caller's
+                BlockDriverCompletionFunc is not invoked.
+
+Events
+------
+Block drivers provide information about the type of I/O request they are about
+to make so rules can match specific types of requests.  For example, the qcow2
+block driver tells blkdebug when it accesses the L1 table so rules can match
+only L1 table accesses and not other metadata or guest data requests.
+
+The core events are:
+
+  read_aio - guest data read
+
+  write_aio - guest data write
+
+  flush_to_os - write out unwritten block driver state (e.g. cached metadata)
+
+  flush_to_disk - flush the host block device's disk cache
+
+See block/blkdebug.c:event_names[] for the full list of events.  You may need
+to grep block driver source code to understand the meaning of specific events.
+
+State transitions
+-----------------
+There are cases where more power is needed to match a particular I/O request in
+a longer sequence of requests.  For example:
+
+  write_aio
+  flush_to_disk
+  write_aio
+
+How do we match the 2nd write_aio but not the first?  This is where state
+transitions come in.
+
+The error injection engine has an integer called the "state" that always starts
+initialized to 1.  The state integer is internal to blkdebug and cannot be
+observed from outside but rules can interact with it for powerful matching
+behavior.
+
+Rules can be conditional on the current state and they can transition to a new
+state.
+
+When a rule's "state" attribute is non-zero, the current state must equal it
+for the rule to match.  A "state" of 0 (the default) matches in any state.
+
+For example, to match the 2nd write_aio:
+
+  [set-state]
+  event = "write_aio"
+  state = "1"
+  new_state = "2"
+
+  [inject-error]
+  event = "write_aio"
+  state = "2"
+  errno = "5"
+
+The first write_aio request matches the set-state rule and transitions from
+state 1 to state 2.  Once state 2 has been entered, the set-state rule no
+longer matches since it requires state 1.  But the inject-error rule now
+matches the next write_aio request and injects EIO (5).
+
+State transition rules support the following attributes:
+
+  event - which type of operation to match (e.g. read_aio, write_aio,
+          flush_to_os, flush_to_disk).  See the "Events" section for
+          information on events.
+
+  state - (optional) the engine must be in this state number in order for this
+          rule to match
+
+  new_state - transition to this state number
+
+Suspend and resume
+------------------
+Exercising code paths in block drivers may require specific ordering amongst
+concurrent requests.  The "breakpoint" feature allows requests to be halted on
+a blkdebug event and resumed later.  This makes it possible to achieve
+deterministic ordering when multiple requests are in flight.
+
+Breakpoints on blkdebug events are associated with a user-defined "tag" string.
+This tag serves as an identifier by which the request can be resumed at a later
+point.
+
+See the qemu-io(1) break, resume, remove_break, and wait_break commands for
+details.
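
The matching behavior described in the "State transitions" section of the
patch can be illustrated with a small model.  The following is a minimal
Python sketch, not blkdebug's actual implementation: the Rule and Engine
names are hypothetical, and the choice to apply a new state only after all
rules for an event have been checked is an assumption made for this sketch.
It replays the write_aio / flush_to_disk / write_aio sequence against the
two example rules and uses the host's own errno.EIO rather than a hard-coded
number, since errno values depend on the host.

```python
import errno
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """One section of a rules file (hypothetical model, not blkdebug's struct)."""
    kind: str             # "inject-error" or "set-state"
    event: str            # e.g. "write_aio"
    state: int = 0        # 0 means "match in any state"
    errno_value: int = 0  # used by inject-error rules
    new_state: int = 0    # used by set-state rules


class Engine:
    """Toy rule engine following the matching logic the document describes."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules
        self.state = 1  # the document says the state always starts at 1

    def handle(self, event: str) -> Optional[int]:
        """Process one event; return an errno to inject, or None for success."""
        injected = None
        next_state = self.state
        for rule in self.rules:
            if rule.event != event:
                continue
            # A non-zero rule state must equal the current engine state.
            if rule.state != 0 and rule.state != self.state:
                continue
            if rule.kind == "set-state":
                next_state = rule.new_state
            elif rule.kind == "inject-error" and injected is None:
                injected = rule.errno_value
        # Assumption: the new state takes effect only after every rule for
        # this event has been evaluated against the old state.
        self.state = next_state
        return injected


rules = [
    Rule(kind="set-state", event="write_aio", state=1, new_state=2),
    Rule(kind="inject-error", event="write_aio", state=2,
         errno_value=errno.EIO),
]
engine = Engine(rules)
results = [engine.handle(e) for e in ["write_aio", "flush_to_disk", "write_aio"]]
print(results)
```

In this model the first write_aio only moves the engine to state 2, the
flush_to_disk matches no rule, and the second write_aio returns errno.EIO,
which mirrors the walkthrough given after the example rules.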