qed: add qed-tool.py image manipulation utility

Message ID 1322758843-19809-1-git-send-email-stefanha@linux.vnet.ibm.com
State New

Commit Message

Stefan Hajnoczi Dec. 1, 2011, 5 p.m. UTC
The qed-tool.py utility can inspect and manipulate QED image files.  It
can be used for testing to see the state of image metadata and also to
inject corruptions into the image file.  It also has a scrubbing feature
to copy just the metadata out of an image file, allowing users to share
broken image files without revealing data in bug reports.
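To illustrate the kind of metadata inspection described above, here is a minimal sketch of decoding a QED header with `struct`. It is written for Python 3, whereas the patch itself targets Python 2. The field layout mirrors `header_fmt` and `field_names` from the patch; the helper name `parse_qed_header` and all sample values (including the magic number) are placeholders made up for illustration.

```python
import struct

# Layout copied from the patch: four 32-bit fields, five 64-bit fields,
# then two 32-bit fields, all little-endian.
HEADER_FMT = '<IIIIQQQQQII'
HEADER_SIZE = struct.calcsize(HEADER_FMT)
FIELD_NAMES = ['magic', 'cluster_size', 'table_size',
               'header_size', 'features', 'compat_features',
               'autoclear_features', 'l1_table_offset', 'image_size',
               'backing_filename_offset', 'backing_filename_size']


def parse_qed_header(data):
    """Decode the fixed-size header at the start of an image (hypothetical helper)."""
    if len(data) < HEADER_SIZE:
        raise ValueError('short header: got %d bytes' % len(data))
    values = struct.unpack(HEADER_FMT, data[:HEADER_SIZE])
    return dict(zip(FIELD_NAMES, values))


# Illustrative values only, not taken from a real image.
sample = struct.pack(HEADER_FMT, 0x12345678, 65536, 4, 1, 0, 0, 0,
                     65536, 1 << 30, 0, 0)
header = parse_qed_header(sample)
print(header['cluster_size'], header['table_size'], header['image_size'])
```

Reading the first `HEADER_SIZE` bytes of an image file and passing them to `parse_qed_header` would give the same kind of dictionary that `show header` prints in the patch.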

This has lived in my local repo for a long time but could be useful to
others.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 scripts/qed-tool.py |  250 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 250 insertions(+), 0 deletions(-)
 create mode 100755 scripts/qed-tool.py

Comments

Kevin Wolf Dec. 2, 2011, 9:15 a.m. UTC | #1
On 01.12.2011 18:00, Stefan Hajnoczi wrote:
> The qed-tool.py utility can inspect and manipulate QED image files.  It
> can be used for testing to see the state of image metadata and also to
> inject corruptions into the image file.  It also has a scrubbing feature
> to copy just the metadata out of an image file, allowing users to share
> broken image files without revealing data in bug reports.
> 
> This has lived in my local repo for a long time but could be useful to
> others.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

For most of the commands, I think qemu-img/qemu-io should be extended
instead of creating scripts for one or two formats and lacking the
functionality for the rest.

Kevin
Stefan Hajnoczi Dec. 2, 2011, 10:23 a.m. UTC | #2
On Fri, Dec 2, 2011 at 9:15 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 01.12.2011 18:00, Stefan Hajnoczi wrote:
>> The qed-tool.py utility can inspect and manipulate QED image files.  It
>> can be used for testing to see the state of image metadata and also to
>> inject corruptions into the image file.  It also has a scrubbing feature
>> to copy just the metadata out of an image file, allowing users to share
>> broken image files without revealing data in bug reports.
>>
>> This has lived in my local repo for a long time but could be useful to
>> others.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>
> For most of the commands, I think qemu-img/qemu-io should be extended
> instead of creating scripts for one or two formats and lacking the
> functionality for the rest.

I have mixed feelings about that because I don't think a common
interface will ever live up to its promise.  We will have an interface
that no two file formats implement much of (i.e. lots of NULL function
pointers).  The user experience will be that these commands don't work
("Operation not supported") and it's more flexible (and less code) to
write a format-specific script like this.
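The concern can be pictured with a small sketch. Everything in it is hypothetical: the `FormatDriver` class, the operation names, and the per-format coverage are invented for illustration and do not describe QEMU's actual block driver interface.

```python
class FormatDriver(object):
    """Hypothetical per-format driver; operations a format lacks stay None."""

    def __init__(self, name, **ops):
        self.name = name
        self.show_header = ops.get('show_header')
        self.corrupt_table = ops.get('corrupt_table')
        self.copy_metadata = ops.get('copy_metadata')


def call_op(driver, op_name, *args):
    """Dispatch through the common interface, failing if the slot is empty."""
    op = getattr(driver, op_name)
    if op is None:
        return '%s: %s: Operation not supported' % (driver.name, op_name)
    return op(*args)


# Invented coverage: each format fills in a different slot.
drivers = [
    FormatDriver('format-a', show_header=lambda: 'format-a header'),
    FormatDriver('format-b', copy_metadata=lambda: 'format-b metadata copied'),
]

for drv in drivers:
    for op_name in ('show_header', 'corrupt_table', 'copy_metadata'):
        print(call_op(drv, op_name))
```

With this invented coverage, four of the six calls end in "Operation not supported", which is the user experience the paragraph above worries about.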

Also, usually before I use any of these potentially destructive
commands I review the script's code to double-check exactly what the
impact on the file will be.  It's nice to have a concise Python script
that can be reviewed easily rather than looking through layers of
production C code.

Do you really think there is much worth making common here?

Stefan
Kevin Wolf Dec. 2, 2011, 11:13 a.m. UTC | #3
On 02.12.2011 11:23, Stefan Hajnoczi wrote:
> On Fri, Dec 2, 2011 at 9:15 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> On 01.12.2011 18:00, Stefan Hajnoczi wrote:
>>> The qed-tool.py utility can inspect and manipulate QED image files.  It
>>> can be used for testing to see the state of image metadata and also to
>>> inject corruptions into the image file.  It also has a scrubbing feature
>>> to copy just the metadata out of an image file, allowing users to share
>>> broken image files without revealing data in bug reports.
>>>
>>> This has lived in my local repo for a long time but could be useful to
>>> others.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>
>> For most of the commands, I think qemu-img/qemu-io should be extended
>> instead of creating scripts for one or two formats and lacking the
>> functionality for the rest.
> 
> I have mixed feelings about that because I don't think a common
> interface will ever live up to its promise.  We will have an interface
> that no two file formats implement much of (i.e. lots of NULL function
> pointers).  The user experience will be that these commands don't work
> ("Operation not supported") and it's more flexible (and less code) to
> write a format-specific script like this.
> 
> Also, usually before I use any of these potentially destructive
> commands I review the script's code to double-check exactly what the
> impact on the file will be.  It's nice to have a concise Python script
> that can be reviewed easily rather than looking through layers of
> production C code.
> 
> Do you really think there is much worth making common here?

Ok, I had another, closer look and there are two functions that I would
prefer to see in qemu-img info, namely fragmentation and dirty flag
status. For the rest you're probably right that an external script makes
more sense.
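For reference, the two pieces of information mentioned here need little logic. The sketch below (Python 3; the patch is Python 2) follows the approach of `cmd_need_check` and `cmd_fragcheck` in the patch, but works on already-decoded values. The function names and sample offsets are assumptions for illustration, not a proposal for how qemu-img info would implement this.

```python
QED_F_NEED_CHECK = 0x02  # feature bit value as defined in the patch


def needs_check(features):
    """Return True if the dirty (need-check) bit is set in the feature mask."""
    return bool(features & QED_F_NEED_CHECK)


def fragmentation_stats(l2_tables, cluster_size):
    """Count allocated clusters and those not adjacent to the previous one.

    l2_tables is a list of lists of data cluster offsets in guest order;
    offsets below cluster_size are treated as unallocated, as in the patch.
    """
    allocated = 0
    fragmented = 0
    last_offset = None
    for table in l2_tables:
        for offset in table:
            if offset < cluster_size:
                continue  # unallocated or zero cluster
            allocated += 1
            if last_offset is not None and offset != last_offset + cluster_size:
                fragmented += 1
            last_offset = offset
    return allocated, fragmented


# Made-up offsets with a 64 KiB cluster size.
cs = 65536
tables = [[3 * cs, 4 * cs, 0, 9 * cs], [10 * cs, 1]]
print(needs_check(0x03), fragmentation_stats(tables, cs))
```

Returning raw counts rather than percentages leaves the division, and the empty-image case where nothing is allocated, to the caller.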

Kevin
Stefan Hajnoczi Dec. 2, 2011, 1:31 p.m. UTC | #4
On Fri, Dec 2, 2011 at 11:13 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 02.12.2011 11:23, Stefan Hajnoczi wrote:
>> On Fri, Dec 2, 2011 at 9:15 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> On 01.12.2011 18:00, Stefan Hajnoczi wrote:
>>>> The qed-tool.py utility can inspect and manipulate QED image files.  It
>>>> can be used for testing to see the state of image metadata and also to
>>>> inject corruptions into the image file.  It also has a scrubbing feature
>>>> to copy just the metadata out of an image file, allowing users to share
>>>> broken image files without revealing data in bug reports.
>>>>
>>>> This has lived in my local repo for a long time but could be useful to
>>>> others.
>>>>
>>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>>
>>> For most of the commands, I think qemu-img/qemu-io should be extended
>>> instead of creating scripts for one or two formats and lacking the
>>> functionality for the rest.
>>
>> I have mixed feelings about that because I don't think a common
>> interface will ever live up to its promise.  We will have an interface
>> that no two file formats implement much of (i.e. lots of NULL function
>> pointers).  The user experience will be that these commands don't work
>> ("Operation not supported") and it's more flexible (and less code) to
>> write a format-specific script like this.
>>
>> Also, usually before I use any of these potentially destructive
>> commands I review the script's code to double-check exactly what the
>> impact on the file will be.  It's nice to have a concise Python script
>> that can be reviewed easily rather than looking through layers of
>> production C code.
>>
>> Do you really think there is much worth making common here?
>
> Ok, I had another, closer look and there are two functions that I would
> prefer to see in qemu-img info, namely fragmentation and dirty flag
> status. For the rest you're probably right that an external script makes
> more sense.

Okay, I will resubmit with patches to implement those two and a
cut-down qed-tool.py.

Stefan

Patch

diff --git a/scripts/qed-tool.py b/scripts/qed-tool.py
new file mode 100755
index 0000000..90cc375
--- /dev/null
+++ b/scripts/qed-tool.py
@@ -0,0 +1,250 @@ 
+#!/usr/bin/env python
+#
+# Tool to manipulate QED image files
+#
+# Copyright (C) 2010 IBM, Corp.
+#
+# Authors:
+#  Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2.  See
+# the COPYING file in the top-level directory.
+
+import sys
+import struct
+import random
+import optparse
+
+QED_F_NEED_CHECK = 0x02
+
+header_fmt = '<IIIIQQQQQII'
+header_size = struct.calcsize(header_fmt)
+field_names = ['magic', 'cluster_size', 'table_size',
+               'header_size', 'features', 'compat_features',
+               'autoclear_features', 'l1_table_offset', 'image_size',
+               'backing_filename_offset', 'backing_filename_size']
+table_elem_fmt = '<Q'
+table_elem_size = struct.calcsize(table_elem_fmt)
+
+def err(msg):
+    sys.stderr.write(msg + '\n')
+    sys.exit(1)
+
+def unpack_header(s):
+    fields = struct.unpack(header_fmt, s)
+    return dict((field_names[idx], val) for idx, val in enumerate(fields))
+
+def pack_header(header):
+    fields = tuple(header[x] for x in field_names)
+    return struct.pack(header_fmt, *fields)
+
+def unpack_table_elem(s):
+    return struct.unpack(table_elem_fmt, s)[0]
+
+def pack_table_elem(elem):
+    return struct.pack(table_elem_fmt, elem)
+
+class QED(object):
+    def __init__(self, f):
+        self.f = f
+
+        self.f.seek(0, 2)
+        self.filesize = f.tell()
+
+        self.load_header()
+        self.load_l1_table()
+
+    def raw_pread(self, offset, size):
+        self.f.seek(offset)
+        return self.f.read(size)
+
+    def raw_pwrite(self, offset, data):
+        self.f.seek(offset)
+        return self.f.write(data)
+
+    def load_header(self):
+        self.header = unpack_header(self.raw_pread(0, header_size))
+
+    def store_header(self):
+        self.raw_pwrite(0, pack_header(self.header))
+
+    def read_table(self, offset):
+        size = self.header['table_size'] * self.header['cluster_size']
+        s = self.raw_pread(offset, size)
+        table = [unpack_table_elem(s[i:i + table_elem_size]) for i in xrange(0, size, table_elem_size)]
+        return table
+
+    def load_l1_table(self):
+        self.l1_table = self.read_table(self.header['l1_table_offset'])
+        self.table_nelems = self.header['table_size'] * self.header['cluster_size'] / table_elem_size
+
+    def write_table(self, offset, table):
+        s = ''.join(pack_table_elem(x) for x in table)
+        self.raw_pwrite(offset, s)
+
+def random_table_item(table):
+    return random.choice([(index, offset) for index, offset in enumerate(table) if offset != 0])
+
+def corrupt_table_duplicate(table):
+    '''Corrupt a table by introducing a duplicate offset'''
+    _, dup_victim = random_table_item(table)
+
+    for i in xrange(len(table)):
+        dup_target = random.randint(0, len(table) - 1)
+        if table[dup_target] != dup_victim:
+            table[dup_target] = dup_victim
+            return
+    raise Exception('no duplication corruption possible in table')
+
+def corrupt_table_invalidate(qed, table):
+    '''Corrupt a table by introducing an invalid offset'''
+    index, _ = random_table_item(table)
+    table[index] = qed.filesize + random.randint(0, 100 * 1024 * 1024 * 1024 * 1024)
+
+def cmd_show(qed, *args):
+    '''show [header|l1|l2 <offset>] - Show header, l1 table, or an l2 table'''
+    if not args or args[0] == 'header':
+        print qed.header
+    elif args[0] == 'l1':
+        print qed.l1_table
+    elif len(args) == 2 and args[0] == 'l2':
+        offset = int(args[1])
+        print qed.read_table(offset)
+    else:
+        err('unrecognized sub-command')
+
+def cmd_duplicate(qed, table_level):
+    '''duplicate l1|l2 - Duplicate a table element'''
+    if table_level == 'l1':
+        offset = qed.header['l1_table_offset']
+        table = qed.l1_table
+    elif table_level == 'l2':
+        _, offset = random_table_item(qed.l1_table)
+        table = qed.read_table(offset)
+    else:
+        err('unrecognized sub-command')
+    corrupt_table_duplicate(table)
+    qed.write_table(offset, table)
+
+def cmd_invalidate(qed, table_level):
+    '''invalidate l1|l2 - Plant an invalid table element'''
+    if table_level == 'l1':
+        offset = qed.header['l1_table_offset']
+        table = qed.l1_table
+    elif table_level == 'l2':
+        _, offset = random_table_item(qed.l1_table)
+        table = qed.read_table(offset)
+    else:
+        err('unrecognized sub-command')
+    corrupt_table_invalidate(qed, table)
+    qed.write_table(offset, table)
+
+def cmd_need_check(qed, *args):
+    '''need_check [on|off] - Test, set, or clear the QED_F_NEED_CHECK header bit'''
+    if not args:
+        print bool(qed.header['features'] & QED_F_NEED_CHECK)
+        return
+
+    if args[0] == 'on':
+        qed.header['features'] |= QED_F_NEED_CHECK
+    elif args[0] == 'off':
+        qed.header['features'] &= ~QED_F_NEED_CHECK
+    else:
+        err('unrecognized sub-command')
+    qed.store_header()
+
+def cmd_fragcheck(qed):
+    '''fragcheck - Determine allocation and fragmentation statistics'''
+    cluster_size = qed.header['cluster_size']
+    allocated_clusters = 0
+    fragmented_clusters = 0
+    last_offset = None
+
+    for l2_offset in qed.l1_table:
+        if l2_offset == 0:
+            continue
+        l2_table = qed.read_table(l2_offset)
+        for offset in l2_table:
+            if offset < cluster_size:
+                continue # ignore unallocated cluster
+            allocated_clusters += 1
+            if last_offset and offset != last_offset + cluster_size:
+                fragmented_clusters += 1
+            last_offset = offset
+
+    total_clusters = (qed.header['image_size'] + cluster_size - 1) / cluster_size
+    print '%d/%d = %0.2f%% allocated, %0.2f%% fragmented' % (
+            allocated_clusters, total_clusters,
+            allocated_clusters * 100.0 / total_clusters,
+            fragmented_clusters * 100.0 / max(allocated_clusters, 1))
+
+def cmd_zerocluster(qed, pos, *args):
+    '''zerocluster <pos> [<n>] - Zero data clusters'''
+    pos, n = int(pos), 1
+    if args:
+        if len(args) != 1:
+            err('expected one argument')
+        n = int(args[0])
+
+    for i in xrange(n):
+        l1_index = pos / qed.header['cluster_size'] / len(qed.l1_table)
+        if qed.l1_table[l1_index] == 0:
+            err('no l2 table allocated')
+
+        l2_offset = qed.l1_table[l1_index]
+        l2_table = qed.read_table(l2_offset)
+
+        l2_index = (pos / qed.header['cluster_size']) % len(qed.l1_table)
+        l2_table[l2_index] = 1 # zero the data cluster
+        qed.write_table(l2_offset, l2_table)
+        pos += qed.header['cluster_size']
+
+def cmd_copy_metadata(qed, outfile):
+    '''copy_metadata <outfile> - Copy metadata only (for scrubbing corrupted images)'''
+    out = open(outfile, 'wb')
+
+    # Match file size
+    out.seek(qed.filesize - 1)
+    out.write('\0')
+
+    # Copy header clusters
+    out.seek(0)
+    header_size_bytes = qed.header['header_size'] * qed.header['cluster_size']
+    out.write(qed.raw_pread(0, header_size_bytes))
+
+    # Copy L1 table
+    out.seek(qed.header['l1_table_offset'])
+    s = ''.join(pack_table_elem(x) for x in qed.l1_table)
+    out.write(s)
+
+    # Copy L2 tables
+    for l2_offset in qed.l1_table:
+        if l2_offset == 0:
+            continue
+        l2_table = qed.read_table(l2_offset)
+        out.seek(l2_offset)
+        s = ''.join(pack_table_elem(x) for x in l2_table)
+        out.write(s)
+
+    out.close()
+
+def usage():
+    sys.stderr.write('usage: %s <filename> <command> [<args...>]\n\n' % sys.argv[0])
+    for cmd in sorted(x for x in globals() if x.startswith('cmd_')):
+        sys.stderr.write(globals()[cmd].__doc__ + '\n')
+    sys.exit(1)
+
+if len(sys.argv) < 3:
+    usage()
+filename, cmd = sys.argv[1:3]
+
+cmd = 'cmd_' + cmd
+if cmd not in globals():
+    usage()
+
+qed = QED(open(filename, 'r+b'))
+try:
+    globals()[cmd](qed, *sys.argv[3:])
+except TypeError:
+    sys.stderr.write(globals()[cmd].__doc__ + '\n')
+    sys.exit(1)