diff mbox

[v3,5/5] Add migration stream analyzation script

Message ID 1419604968-87437-6-git-send-email-agraf@suse.de
State New
Headers show

Commit Message

Alexander Graf Dec. 26, 2014, 2:42 p.m. UTC
This patch adds a python tool to the scripts directory that can read
a dumped migration stream if it contains the JSON description of the
device states. I constructs a human readable JSON stream out of it.

It's very simple to use:

  $ qemu-system-x86_64
    (qemu) migrate "exec:cat > mig"
  $ ./scripts/analyze_migration.py -f mig

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v1 -> v2:

  - Remove support for multiple vmsd versions
  - Add support for HTAB
  - Add support for array_len
  - Move to new naming schema for fields
  - Use dynamic page size

v2 -> v3:

  - Add tell function to MigrationFile
  - Report where subsections were not found
  - Print ram sections with size
  - Remove foreign desc file
  - Add memory dump support (-m option)
  - Add -x option to extract all sections into files
---
 scripts/analyze-migration.py | 592 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 592 insertions(+)
 create mode 100755 scripts/analyze-migration.py

Comments

Eric Blake Jan. 6, 2015, 4:05 p.m. UTC | #1
On 12/26/2014 07:42 AM, Alexander Graf wrote:
> This patch adds a python tool to the scripts directory that can read
> a dumped migration stream if it contains the JSON description of the
> device states. I constructs a human readable JSON stream out of it.
> 
> It's very simple to use:
> 
>   $ qemu-system-x86_64
>     (qemu) migrate "exec:cat > mig"
>   $ ./scripts/analyze_migration.py -f mig
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>
> 
> ---

> diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
> new file mode 100755
> index 0000000..3363172
> --- /dev/null
> +++ b/scripts/analyze-migration.py
> @@ -0,0 +1,592 @@
> +#!/usr/bin/env python
> +#
> +#  Migration Stream Analyzer
> +#
> +#  Copyright (c) 2013 Alexander Graf <agraf@suse.de>

Started in 2013, but you may want to add 2014 and 2015 by the time this
patch is polished.

> +
> +    # The VMSD description is at the end of the file, after EOF. Look for
> +    # the last NULL byte, then for the beginning brace of JSON.

Hmm, you picked up on an optimization that I missed in my ramblings
about 4/5 :)  Since a well-formed JSON description contains no NUL bytes
or control characters, a pipeline read of a migration stream can just
continually look for every '\6' marker byte followed by '{' as the
potential start of an object, and reset that search every time a '\0' or
another '\6' byte is seen; eventually, this will result in just the tail
of the file being seen as the JSON object (assuming that we never add
any other tail metadata - or that all other tail metadata is added as
backwards-compatible extensions to the JSON).  It STILL might be worth
the efficiency gain of stashing an offset in the file somewhere (as it
is faster to seek to an offset than it is to scan for particular
patterns), but at least your scan works forwards rather than in reverse.

> +
> +        # Find the last NULL byte, then the first brace after that. This should
> +        # be the beginning of our JSON data.
> +        nulpos = data.rfind("\0")
> +        jsonpos = data.find("{", nulpos)

Are you sure that there will NEVER be a '{' somewhere between the last
NUL and the end-of-stream marker?  Shouldn't you really be searching for
the last QEMU_VM_VMDESCRIPTION ('\6') magic byte, rather than NUL?
Alexander Graf Jan. 6, 2015, 9:29 p.m. UTC | #2
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On 06.01.15 17:05, Eric Blake wrote:
> On 12/26/2014 07:42 AM, Alexander Graf wrote:
>> This patch adds a python tool to the scripts directory that can
>> read a dumped migration stream if it contains the JSON
>> description of the device states. I constructs a human readable
>> JSON stream out of it.
>> 
>> It's very simple to use:
>> 
>> $ qemu-system-x86_64 (qemu) migrate "exec:cat > mig" $
>> ./scripts/analyze_migration.py -f mig
>> 
>> Signed-off-by: Alexander Graf <agraf@suse.de>
>> 
>> ---
> 
>> diff --git a/scripts/analyze-migration.py
>> b/scripts/analyze-migration.py new file mode 100755 index
>> 0000000..3363172 --- /dev/null +++
>> b/scripts/analyze-migration.py @@ -0,0 +1,592 @@ +#!/usr/bin/env
>> python +# +#  Migration Stream Analyzer +# +#  Copyright (c) 2013
>> Alexander Graf <agraf@suse.de>
> 
> Started in 2013, but you may want to add 2014 and 2015 by the time
> this patch is polished.

Oops :).

> 
>> + +    # The VMSD description is at the end of the file, after
>> EOF. Look for +    # the last NULL byte, then for the beginning
>> brace of JSON.
> 
> Hmm, you picked up on an optimization that I missed in my
> ramblings about 4/5 :)  Since a well-formed JSON description
> contains no NUL bytes or control characters, a pipeline read of a
> migration stream can just continually look for every '\6' marker
> byte followed by '{' as the potential start of an object, and reset
> that search every time a '\0' or another '\6' byte is seen;
> eventually, this will result in just the tail of the file being
> seen as the JSON object (assuming that we never add any other tail
> metadata - or that all other tail metadata is added as 
> backwards-compatible extensions to the JSON).  It STILL might be
> worth the efficiency gain of stashing an offset in the file
> somewhere (as it is faster to seek to an offset than it is to scan
> for particular patterns), but at least your scan works forwards
> rather than in reverse.

Have you tried to run the script yet? Full search, assembly and
analysis of a normal sized migration stream take less than a second -
and that's with in-memory assembly of all the device objects and
properties (which takes the bulk of the time).

I don't think we have an efficiency problem :).

> 
>> + +        # Find the last NULL byte, then the first brace after
>> that. This should +        # be the beginning of our JSON data. +
>> nulpos = data.rfind("\0") +        jsonpos = data.find("{",
>> nulpos)
> 
> Are you sure that there will NEVER be a '{' somewhere between the
> last NUL and the end-of-stream marker?  Shouldn't you really be
> searching for the last QEMU_VM_VMDESCRIPTION ('\6') magic byte,
> rather than NUL?

NUL is the EOF marker, so I'm definitely hoping for it :). If we
decide to add more data after EOF (which we really shouldn't I think),
we can always adjust the script again IMHO.


Alex
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQIcBAEBAgAGBQJUrFO0AAoJECszeR4D/txg2wUQAIp8R22LiQzqQPrqs5VMiUbE
kOLvieg0fRRHmgQ/SE7aHc/s4sC88EEKsOBintijgcNPkt8Lu65xm71JvI9660BM
9Y/ASYIliNrBKr28ThszK074xAKsvkjSBULIPH6hMUk9e6P8p5SjnKckerWWc7a/
YK7xa+1uxTaK/3+JSzV4ZKFVwbIePrEXqGxYHFcKVqmNs/OSqNwkIkErFJQCyd8D
edqU/zQ4A4v+ZlJl4HZkZqXCBwUGJd6kF0ZUDmF4a7lsAELbv+MTxmVy8enBmH8p
KngnDkI67tpQmlxwSYu9tBW/25qcTIiZOigGT/VShDN6cRQfbw+h3lC3klt/GKA7
ghu464ept6/xI5FQpMkdkPQ742c6lhkuQAUTVqlorVtTsc9ZF2ONLl36Y/xPNmZB
R2BLDq3zDijWe6spWrfoMmc2pmtohHwyBNDtjVbI3nAkhdPddHdR2SzaCUf0B4Nj
tMckMDCucGWYMGTXGMElkw9SZE8kwExPme43nSsGRaHdUVLv24kIv3lNhUlIlpBq
Ljg9CMsHV6RhDySIvucfDueOpmgnyc7KGadXVEFr1pfEJrvg/SjhGDdX+ilmQeRg
iCk0f0/DyJWhPskGDeKBMm9DiJ0QZ3j+Lnw+ZYnOEazyr5tOy0kTuhmPx1eDK9yg
q1YJ7JGn7zVDwoBVWiXK
=wERA
-----END PGP SIGNATURE-----
diff mbox

Patch

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
new file mode 100755
index 0000000..3363172
--- /dev/null
+++ b/scripts/analyze-migration.py
@@ -0,0 +1,592 @@ 
+#!/usr/bin/env python
+#
+#  Migration Stream Analyzer
+#
+#  Copyright (c) 2013 Alexander Graf <agraf@suse.de>
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+import numpy as np
+import json
+import os
+import argparse
+import collections
+import pprint
+
+def mkdir_p(path):
+    try:
+        os.makedirs(path)
+    except OSError:
+        pass
+
+class MigrationFile(object):
+    def __init__(self, filename):
+        self.filename = filename
+        self.file = open(self.filename, "rb")
+
+    def read64(self):
+        return np.asscalar(np.fromfile(self.file, count=1, dtype='>i8')[0])
+
+    def read32(self):
+        return np.asscalar(np.fromfile(self.file, count=1, dtype='>i4')[0])
+
+    def read16(self):
+        return np.asscalar(np.fromfile(self.file, count=1, dtype='>i2')[0])
+
+    def read8(self):
+        return np.asscalar(np.fromfile(self.file, count=1, dtype='>i1')[0])
+
+    def readstr(self, len = None):
+        if len is None:
+            len = self.read8()
+        if len == 0:
+            return ""
+        return np.fromfile(self.file, count=1, dtype=('S%d' % len))[0]
+
+    def readvar(self, size = None):
+        if size is None:
+            size = self.read8()
+        if size == 0:
+            return ""
+        value = self.file.read(size)
+        if len(value) != size:
+            raise Exception("Unexpected end of %s at 0x%x" % (self.filename, self.file.tell()))
+        return value
+
+    def tell(self):
+        return self.file.tell()
+
+    # The VMSD description is at the end of the file, after EOF. Look for
+    # the last NULL byte, then for the beginning brace of JSON.
+    def read_migration_debug_json(self):
+        QEMU_VM_VMDESCRIPTION = 0x06
+
+        # Remember the offset in the file when we started
+        entrypos = self.file.tell()
+
+        # Read the last 10MB
+        self.file.seek(0, os.SEEK_END)
+        endpos = self.file.tell()
+        self.file.seek(max(-endpos, -10 * 1024 * 1024), os.SEEK_END)
+        datapos = self.file.tell()
+        data = self.file.read()
+        # The full file read closed the file as well, reopen it
+        self.file = open(self.filename, "rb")
+
+        # Find the last NULL byte, then the first brace after that. This should
+        # be the beginning of our JSON data.
+        nulpos = data.rfind("\0")
+        jsonpos = data.find("{", nulpos)
+
+        # Check backwards from there and see whether we guessed right
+        self.file.seek(datapos + jsonpos - 5, 0)
+        if self.read8() != QEMU_VM_VMDESCRIPTION:
+            raise Exception("No Debug Migration device found")
+
+        jsonlen = self.read32()
+
+        # Seek back to where we were at the beginning
+        self.file.seek(entrypos, 0)
+
+        return data[jsonpos:jsonpos + jsonlen]
+
+    def close(self):
+        self.file.close()
+
+class RamSection(object):
+    RAM_SAVE_FLAG_COMPRESS = 0x02
+    RAM_SAVE_FLAG_MEM_SIZE = 0x04
+    RAM_SAVE_FLAG_PAGE     = 0x08
+    RAM_SAVE_FLAG_EOS      = 0x10
+    RAM_SAVE_FLAG_CONTINUE = 0x20
+    RAM_SAVE_FLAG_XBZRLE   = 0x40
+    RAM_SAVE_FLAG_HOOK     = 0x80
+
+    def __init__(self, file, version_id, ramargs, section_key):
+        if version_id != 4:
+            raise Exception("Unknown RAM version %d" % version_id)
+
+        self.file = file
+        self.section_key = section_key
+        self.TARGET_PAGE_SIZE = ramargs['page_size']
+        self.dump_memory = ramargs['dump_memory']
+        self.write_memory = ramargs['write_memory']
+        self.sizeinfo = collections.OrderedDict()
+        self.data = collections.OrderedDict()
+        self.data['section sizes'] = self.sizeinfo
+        self.name = ''
+        if self.write_memory:
+            self.files = { }
+        if self.dump_memory:
+            self.memory = collections.OrderedDict()
+            self.data['memory'] = self.memory
+
+    def __repr__(self):
+        return self.data.__repr__()
+
+    def __str__(self):
+        return self.data.__str__()
+
+    def getDict(self):
+        return self.data
+
+    def read(self):
+        # Read all RAM sections
+        while True:
+            addr = self.file.read64()
+            flags = addr & (self.TARGET_PAGE_SIZE - 1)
+            addr &= ~(self.TARGET_PAGE_SIZE - 1)
+
+            if flags & self.RAM_SAVE_FLAG_MEM_SIZE:
+                while True:
+                    namelen = self.file.read8()
+                    # We assume that no RAM chunk is big enough to ever
+                    # hit the first byte of the address, so when we see
+                    # a zero here we know it has to be an address, not the
+                    # length of the next block.
+                    if namelen == 0:
+                        self.file.file.seek(-1, 1)
+                        break
+                    self.name = self.file.readstr(len = namelen)
+                    len = self.file.read64()
+                    self.sizeinfo[self.name] = '0x%016x' % len
+                    if self.write_memory:
+                        print self.name
+                        mkdir_p('./' + os.path.dirname(self.name))
+                        f = open('./' + self.name, "wb")
+                        f.truncate(0)
+                        f.truncate(len)
+                        self.files[self.name] = f
+                flags &= ~self.RAM_SAVE_FLAG_MEM_SIZE
+
+            if flags & self.RAM_SAVE_FLAG_COMPRESS:
+                if flags & self.RAM_SAVE_FLAG_CONTINUE:
+                    flags &= ~self.RAM_SAVE_FLAG_CONTINUE
+                else:
+                    self.name = self.file.readstr()
+                fill_char = self.file.read8()
+                # The page in question is filled with fill_char now
+                if self.write_memory and fill_char != 0:
+                    self.files[self.name].seek(addr, os.SEEK_SET)
+                    self.files[self.name].write(chr(fill_char) * self.TARGET_PAGE_SIZE)
+                if self.dump_memory:
+                    self.memory['%s (0x%016x)' % (self.name, addr)] = 'Filled with 0x%02x' % fill_char
+                flags &= ~self.RAM_SAVE_FLAG_COMPRESS
+            elif flags & self.RAM_SAVE_FLAG_PAGE:
+                if flags & self.RAM_SAVE_FLAG_CONTINUE:
+                    flags &= ~self.RAM_SAVE_FLAG_CONTINUE
+                else:
+                    self.name = self.file.readstr()
+
+                if self.write_memory or self.dump_memory:
+                    data = self.file.readvar(size = self.TARGET_PAGE_SIZE)
+                else: # Just skip RAM data
+                    self.file.file.seek(self.TARGET_PAGE_SIZE, 1)
+
+                if self.write_memory:
+                    self.files[self.name].seek(addr, os.SEEK_SET)
+                    self.files[self.name].write(data)
+                if self.dump_memory:
+                    hexdata = " ".join("{0:02x}".format(ord(c)) for c in data)
+                    self.memory['%s (0x%016x)' % (self.name, addr)] = hexdata
+
+                flags &= ~self.RAM_SAVE_FLAG_PAGE
+            elif flags & self.RAM_SAVE_FLAG_XBZRLE:
+                raise Exception("XBZRLE RAM compression is not supported yet")
+            elif flags & self.RAM_SAVE_FLAG_HOOK:
+                raise Exception("RAM hooks don't make sense with files")
+
+            # End of RAM section
+            if flags & self.RAM_SAVE_FLAG_EOS:
+                break
+
+            if flags != 0:
+                raise Exception("Unknown RAM flags: %x" % flags)
+
+    def __del__(self):
+        if self.write_memory:
+            for key in self.files:
+                self.files[key].close()
+
+
+class HTABSection(object):
+    HASH_PTE_SIZE_64       = 16
+
+    def __init__(self, file, version_id, device, section_key):
+        if version_id != 1:
+            raise Exception("Unknown HTAB version %d" % version_id)
+
+        self.file = file
+        self.section_key = section_key
+
+    def read(self):
+
+        header = self.file.read32()
+
+        if (header > 0):
+            # First section, just the hash shift
+            return
+
+        # Read until end marker
+        while True:
+            index = self.file.read32()
+            n_valid = self.file.read16()
+            n_invalid = self.file.read16()
+
+            if index == 0 and n_valid == 0 and n_invalid == 0:
+                break
+
+            self.file.readvar(n_valid * HASH_PTE_SIZE_64)
+
+    def getDict(self):
+        return ""
+
+class VMSDFieldGeneric(object):
+    def __init__(self, desc, file):
+        self.file = file
+        self.desc = desc
+        self.data = ""
+
+    def __repr__(self):
+        return str(self.__str__())
+
+    def __str__(self):
+        return " ".join("{0:02x}".format(ord(c)) for c in self.data)
+
+    def getDict(self):
+        return self.__str__()
+
+    def read(self):
+        size = int(self.desc['size'])
+        self.data = self.file.readvar(size)
+        return self.data
+
+class VMSDFieldInt(VMSDFieldGeneric):
+    def __init__(self, desc, file):
+        super(VMSDFieldInt, self).__init__(desc, file)
+        self.size = int(desc['size'])
+        self.format = '0x%%0%dx' % (self.size * 2)
+        self.sdtype = '>i%d' % self.size
+        self.udtype = '>u%d' % self.size
+
+    def __repr__(self):
+        if self.data < 0:
+            return ('%s (%d)' % ((self.format % self.udata), self.data))
+        else:
+            return self.format % self.data
+
+    def __str__(self):
+        return self.__repr__()
+
+    def getDict(self):
+        return self.__str__()
+
+    def read(self):
+        super(VMSDFieldInt, self).read()
+        self.sdata = np.fromstring(self.data, count=1, dtype=(self.sdtype))[0]
+        self.udata = np.fromstring(self.data, count=1, dtype=(self.udtype))[0]
+        self.data = self.sdata
+        return self.data
+
+class VMSDFieldUInt(VMSDFieldInt):
+    def __init__(self, desc, file):
+        super(VMSDFieldUInt, self).__init__(desc, file)
+
+    def read(self):
+        super(VMSDFieldUInt, self).read()
+        self.data = self.udata
+        return self.data
+
+class VMSDFieldIntLE(VMSDFieldInt):
+    def __init__(self, desc, file):
+        super(VMSDFieldIntLE, self).__init__(desc, file)
+        self.dtype = '<i%d' % self.size
+
+class VMSDFieldBool(VMSDFieldGeneric):
+    def __init__(self, desc, file):
+        super(VMSDFieldBool, self).__init__(desc, file)
+
+    def __repr__(self):
+        return self.data.__repr__()
+
+    def __str__(self):
+        return self.data.__str__()
+
+    def getDict(self):
+        return self.data
+
+    def read(self):
+        super(VMSDFieldBool, self).read()
+        if self.data[0] == 0:
+            self.data = False
+        else:
+            self.data = True
+        return self.data
+
+class VMSDFieldStruct(VMSDFieldGeneric):
+    QEMU_VM_SUBSECTION    = 0x05
+
+    def __init__(self, desc, file):
+        super(VMSDFieldStruct, self).__init__(desc, file)
+        self.data = collections.OrderedDict()
+
+        # When we see compressed array elements, unfold them here
+        new_fields = []
+        for field in self.desc['struct']['fields']:
+            if not 'array_len' in field:
+                new_fields.append(field)
+                continue
+            array_len = field.pop('array_len')
+            field['index'] = 0
+            new_fields.append(field)
+            for i in xrange(1, array_len):
+                c = field.copy()
+                c['index'] = i
+                new_fields.append(c)
+
+        self.desc['struct']['fields'] = new_fields
+
+    def __repr__(self):
+        return self.data.__repr__()
+
+    def __str__(self):
+        return self.data.__str__()
+
+    def read(self):
+        for field in self.desc['struct']['fields']:
+            try:
+                reader = vmsd_field_readers[field['type']]
+            except:
+                reader = VMSDFieldGeneric
+
+            field['data'] = reader(field, self.file)
+            field['data'].read()
+
+            if 'index' in field:
+                if field['name'] not in self.data:
+                    self.data[field['name']] = []
+                a = self.data[field['name']]
+                if len(a) != int(field['index']):
+                    raise Exception("internal index of data field unmatched (%d/%d)" % (len(a), int(field['index'])))
+                a.append(field['data'])
+            else:
+                self.data[field['name']] = field['data']
+
+        if 'subsections' in self.desc['struct']:
+            for subsection in self.desc['struct']['subsections']:
+                if self.file.read8() != self.QEMU_VM_SUBSECTION:
+                    raise Exception("Subsection %s not found at offset %x" % ( subsection['vmsd_name'], self.file.tell()))
+                name = self.file.readstr()
+                version_id = self.file.read32()
+                self.data[name] = VMSDSection(self.file, version_id, subsection, (name, 0))
+                self.data[name].read()
+
+    def getDictItem(self, value):
+       # Strings would fall into the array category, treat
+       # them specially
+       if value.__class__ is ''.__class__:
+           return value
+
+       try:
+           return self.getDictOrderedDict(value)
+       except:
+           try:
+               return self.getDictArray(value)
+           except:
+               try:
+                   return value.getDict()
+               except:
+                   return value
+
+    def getDictArray(self, array):
+        r = []
+        for value in array:
+           r.append(self.getDictItem(value))
+        return r
+
+    def getDictOrderedDict(self, dict):
+        r = collections.OrderedDict()
+        for (key, value) in dict.items():
+            r[key] = self.getDictItem(value)
+        return r
+
+    def getDict(self):
+        return self.getDictOrderedDict(self.data)
+
+vmsd_field_readers = {
+    "bool" : VMSDFieldBool,
+    "int8" : VMSDFieldInt,
+    "int16" : VMSDFieldInt,
+    "int32" : VMSDFieldInt,
+    "int32 equal" : VMSDFieldInt,
+    "int32 le" : VMSDFieldIntLE,
+    "int64" : VMSDFieldInt,
+    "uint8" : VMSDFieldUInt,
+    "uint16" : VMSDFieldUInt,
+    "uint32" : VMSDFieldUInt,
+    "uint32 equal" : VMSDFieldUInt,
+    "uint64" : VMSDFieldUInt,
+    "int64 equal" : VMSDFieldInt,
+    "uint8 equal" : VMSDFieldInt,
+    "uint16 equal" : VMSDFieldInt,
+    "float64" : VMSDFieldGeneric,
+    "timer" : VMSDFieldGeneric,
+    "buffer" : VMSDFieldGeneric,
+    "unused_buffer" : VMSDFieldGeneric,
+    "bitmap" : VMSDFieldGeneric,
+    "struct" : VMSDFieldStruct,
+    "unknown" : VMSDFieldGeneric,
+}
+
+class VMSDSection(VMSDFieldStruct):
+    def __init__(self, file, version_id, device, section_key):
+        self.file = file
+        self.data = ""
+        self.vmsd_name = ""
+        self.section_key = section_key
+        desc = device
+        if 'vmsd_name' in device:
+            self.vmsd_name = device['vmsd_name']
+
+        # A section really is nothing but a FieldStruct :)
+        super(VMSDSection, self).__init__({ 'struct' : desc }, file)
+
+###############################################################################
+
+class MigrationDump(object):
+    QEMU_VM_FILE_MAGIC    = 0x5145564d
+    QEMU_VM_FILE_VERSION  = 0x00000003
+    QEMU_VM_EOF           = 0x00
+    QEMU_VM_SECTION_START = 0x01
+    QEMU_VM_SECTION_PART  = 0x02
+    QEMU_VM_SECTION_END   = 0x03
+    QEMU_VM_SECTION_FULL  = 0x04
+    QEMU_VM_SUBSECTION    = 0x05
+    QEMU_VM_VMDESCRIPTION = 0x06
+
+    def __init__(self, filename):
+        self.section_classes = { ( 'ram', 0 ) : [ RamSection, None ],
+                                 ( 'spapr/htab', 0) : ( HTABSection, None ) }
+        self.filename = filename
+        self.vmsd_desc = None
+
+    def read(self, desc_only = False, dump_memory = False, write_memory = False):
+        # Read in the whole file
+        file = MigrationFile(self.filename)
+
+        # File magic
+        data = file.read32()
+        if data != self.QEMU_VM_FILE_MAGIC:
+            raise Exception("Invalid file magic %x" % data)
+
+        # Version (has to be v3)
+        data = file.read32()
+        if data != self.QEMU_VM_FILE_VERSION:
+            raise Exception("Invalid version number %d" % data)
+
+        self.load_vmsd_json(file)
+
+        # Read sections
+        self.sections = collections.OrderedDict()
+
+        if desc_only:
+            return
+
+        ramargs = {}
+        ramargs['page_size'] = self.vmsd_desc['page_size']
+        ramargs['dump_memory'] = dump_memory
+        ramargs['write_memory'] = write_memory
+        self.section_classes[('ram',0)][1] = ramargs
+
+        while True:
+            section_type = file.read8()
+            if section_type == self.QEMU_VM_EOF:
+                break
+            elif section_type == self.QEMU_VM_SECTION_START or section_type == self.QEMU_VM_SECTION_FULL:
+                section_id = file.read32()
+                name = file.readstr()
+                instance_id = file.read32()
+                version_id = file.read32()
+                section_key = (name, instance_id)
+                classdesc = self.section_classes[section_key]
+                section = classdesc[0](file, version_id, classdesc[1], section_key)
+                self.sections[section_id] = section
+                section.read()
+            elif section_type == self.QEMU_VM_SECTION_PART or section_type == self.QEMU_VM_SECTION_END:
+                section_id = file.read32()
+                self.sections[section_id].read()
+            else:
+                raise Exception("Unknown section type: %d" % section_type)
+        file.close()
+
+    def load_vmsd_json(self, file):
+        vmsd_json = file.read_migration_debug_json()
+        self.vmsd_desc = json.loads(vmsd_json, object_pairs_hook=collections.OrderedDict)
+        for device in self.vmsd_desc['devices']:
+            key = (device['name'], device['instance_id'])
+            value = ( VMSDSection, device )
+            self.section_classes[key] = value
+
+    def getDict(self):
+        r = collections.OrderedDict()
+        for (key, value) in self.sections.items():
+           key = "%s (%d)" % ( value.section_key[0], key )
+           r[key] = value.getDict()
+        return r
+
+###############################################################################
+
+class JSONEncoder(json.JSONEncoder):
+    def default(self, o):
+        if isinstance(o, VMSDFieldGeneric):
+            return str(o)
+        return json.JSONEncoder.default(self, o)
+
+parser = argparse.ArgumentParser()
+parser.add_argument("-f", "--file", help='migration dump to read from', required=True)
+parser.add_argument("-m", "--memory", help='dump RAM contents as well', action='store_true')
+parser.add_argument("-d", "--dump", help='what to dump ("state" or "desc")', default='state')
+parser.add_argument("-x", "--extract", help='extract contents into individual files', action='store_true')
+args = parser.parse_args()
+
+jsonenc = JSONEncoder(indent=4, separators=(',', ': '))
+
+if args.extract:
+    dump = MigrationDump(args.file)
+
+    dump.read(desc_only = True)
+    print "desc.json"
+    f = open("desc.json", "wb")
+    f.truncate()
+    f.write(jsonenc.encode(dump.vmsd_desc))
+    f.close()
+
+    dump.read(write_memory = True)
+    dict = dump.getDict()
+    print "state.json"
+    f = open("state.json", "wb")
+    f.truncate()
+    f.write(jsonenc.encode(dict))
+    f.close()
+elif args.dump == "state":
+    dump = MigrationDump(args.file)
+    dump.read(dump_memory = args.memory)
+    dict = dump.getDict()
+    print jsonenc.encode(dict)
+elif args.dump == "desc":
+    dump = MigrationDump(args.file)
+    dump.read(desc_only = True)
+    print jsonenc.encode(dump.vmsd_desc)
+else:
+    raise Exception("Please specify either -x, -d state or -d dump")