From: Anthony Liguori
To: qemu-devel@nongnu.org
Date: Mon, 14 Nov 2011 16:41:41 -0600
Message-Id: <1321310505-27845-2-git-send-email-aliguori@us.ibm.com>
In-Reply-To: <1321310505-27845-1-git-send-email-aliguori@us.ibm.com>
X-Patchwork-Id: 125623
Cc: Anthony Liguori
Subject: [Qemu-devel] [PATCH 1/5] docs: convert qed_spec.txt to markdown

Signed-off-by: Anthony Liguori
---
 docs/specs/qed_spec.md  |  221 +++++++++++++++++++++++++++++++++++++++++++++++
 docs/specs/qed_spec.txt |  138 -----------------------------
 2 files changed, 221 insertions(+), 138 deletions(-)
 create mode 100644 docs/specs/qed_spec.md
 delete mode 100644 docs/specs/qed_spec.txt

diff --git a/docs/specs/qed_spec.md b/docs/specs/qed_spec.md
new file mode 100644
index 0000000..d74984c
--- /dev/null
+++ b/docs/specs/qed_spec.md
@@ -0,0 +1,221 @@
+Specification
+=============
+
+The file format looks like this:
+
+    +----------+----------+----------+-----+
+    | cluster0 | cluster1 | cluster2 | ... |
+    +----------+----------+----------+-----+
+
+The first cluster begins with the *header*. The header contains information
+about where regular clusters start; this allows the header to be extensible and
+store extra information about the image file. A regular cluster may be a *data
+cluster*, an *L2*, or an *L1 table*. L1 and L2 tables are composed of one or
+more contiguous clusters.
+
+Normally the file size will be a multiple of the cluster size. If the file
+size is not a multiple, extra information after the last cluster may not be
+preserved if data is written. Legitimate extra information should use space
+between the header and the first regular cluster.
+
+All fields are little-endian.
+
+Header
+------
+    Header {
+        uint32_t magic;               /* QED\0 */
+
+        uint32_t cluster_size;        /* in bytes */
+        uint32_t table_size;          /* for L1 and L2 tables, in clusters */
+        uint32_t header_size;         /* in clusters */
+
+        uint64_t features;            /* format feature bits */
+        uint64_t compat_features;     /* compat feature bits */
+        uint64_t autoclear_features;  /* self-resetting feature bits */
+
+        uint64_t l1_table_offset;     /* in bytes */
+        uint64_t image_size;          /* total logical image size, in bytes */
+
+        /* if (features & QED_F_BACKING_FILE) */
+        uint32_t backing_filename_offset; /* in bytes from start of header */
+        uint32_t backing_filename_size;   /* in bytes */
+    }
+
+Field descriptions:
+
+ * __cluster_size__ must be a power of 2 in range [2^12, 2^26].
+
+ * __table_size__ must be a power of 2 in range [1, 16].
+
+ * __header_size__ is the number of clusters used by the header and any
+   additional information stored before regular clusters.
+
+ * __features__, __compat_features__, and __autoclear_features__ are file format
+   extension bitmaps. They work as follows:
+
+   - An image with unknown __features__ bits enabled must not be opened. File
+     format changes that are not backwards-compatible must use __features__ bits.
+
+   - An image with unknown __compat_features__ bits enabled can be opened safely.
+     The unknown features are simply ignored and represent backwards-compatible
+     changes to the file format.
+
+   - An image with unknown __autoclear_features__ bits enabled can be opened
+     safely after clearing the unknown bits. This allows for
+     backwards-compatible changes to the file format which degrade gracefully and
+     can be re-enabled again by a new program later.
+
+ * __l1_table_offset__ is the offset of the first byte of the L1 table in the
+   image file and must be a multiple of __cluster_size__.
+
+ * __image_size__ is the block device size seen by the guest and must be a
+   multiple of 512 bytes.
+
+ * __backing_filename_offset__ and __backing_filename_size__ describe a string
+   in (byte offset, byte size) form. It is not NUL-terminated and has no
+   alignment constraints. The string must be stored within the first
+   __header_size__ clusters. The backing filename may be an absolute path or
+   relative to the image file.
+
+Feature bits:
+
+ * QED_F_BACKING_FILE = 0x01. The image uses a backing file.
+
+ * QED_F_NEED_CHECK = 0x02. The image needs a consistency check before use.
+
+ * QED_F_BACKING_FORMAT_NO_PROBE = 0x04. The backing file is a raw disk image
+   and no file format autodetection should be attempted. This should be used
+   to ensure that raw backing files are never detected as an image format if
+   they happen to contain magic constants.
+
+There are currently no defined __compat_features__ or __autoclear_features__
+bits.
+
+Fields predicated on a feature bit are only used when that feature is set. The
+fields always take up header space, regardless of whether or not the feature
+bit is set.
+
+Tables
+------
+
+Tables provide the translation from logical offsets in the block device to
+cluster offsets in the file.
+
+    #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
+
+    Table {
+        uint64_t offsets[TABLE_NOFFSETS];
+    }
+
+The tables are organized as follows:
+
+                       +----------+
+                       | L1 table |
+                       +----------+
+                  ,------'  |  '------.
+             +----------+   |    +----------+
+             | L2 table |  ...   | L2 table |
+             +----------+        +----------+
+         ,------'  |  '------.
+    +----------+   |    +----------+
+    |   Data   |  ...   |   Data   |
+    +----------+        +----------+
+
+A table is made up of one or more contiguous clusters. The table_size header
+field determines table size for an image file. For example, cluster_size=64 KB
+and table_size=4 results in 256 KB tables.
+
+The logical image size must be less than or equal to the maximum possible size
+of clusters rooted by the L1 table:
+
+    header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
+
+L1, L2, and data cluster offsets must be aligned to header.cluster_size. The
+following offsets have special meanings:
+
+### L2 table offsets
+
+ * 0 - unallocated. The L2 table is not yet allocated.
+
+### Data cluster offsets
+
+ * 0 - unallocated. The data cluster is not yet allocated.
+
+ * 1 - zero. The data cluster contents are all zeroes and no cluster is
+   allocated.
+
+Future format extensions may wish to store per-offset information. The least
+significant 12 bits of an offset are reserved for this purpose and must be set
+to zero. Image files with cluster_size > 2^12 will have more unused bits which
+should also be zeroed.
+
+### Unallocated L2 tables and data clusters
+
+Reads to an unallocated area of the image file access the backing file. If
+there is no backing file, then zeroes are produced. The backing file may be
+smaller than the image file and reads of unallocated areas beyond the end of
+the backing file produce zeroes.
+
+Writes to an unallocated area cause a new data cluster to be allocated, and a
+new L2 table if that is also unallocated. The new data cluster is populated
+with data from the backing file (or zeroes if no backing file) and the data
+being written.
+
+### Zero data clusters
+
+Zero data clusters are a space-efficient way of storing zeroed regions of the
+image.
+
+Reads to a zero data cluster produce zeroes. Note that the difference between
+an unallocated and a zero data cluster is that zero data clusters stop the
+reading of contents from the backing file.
+
+Writes to a zero data cluster cause a new data cluster to be allocated. The
+new data cluster is populated with zeroes and the data being written.
+
+### Logical offset translation
+
+Logical offsets are translated into cluster offsets as follows:
+
+     table_bits table_bits    cluster_bits
+     <--------> <--------> <--------------->
+    +----------+----------+-----------------+
+    | L1 index | L2 index |     byte offset |
+    +----------+----------+-----------------+
+
+Structure of a logical offset
+
+    offset_mask = ~(cluster_size - 1) # mask for the image file byte offset
+
+    def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
+        l2_offset = l1_table[l1_index]
+        l2_table = load_table(l2_offset)
+        cluster_offset = l2_table[l2_index] & offset_mask
+        return cluster_offset + byte_offset
+
+Consistency checking
+--------------------
+
+This section is informational and included to provide background on the use of
+the QED_F_NEED_CHECK __features__ bit.
+
+The QED_F_NEED_CHECK bit is used to mark an image as dirty before starting an
+operation that could leave the image in an inconsistent state if interrupted by
+a crash or power failure. A dirty image must be checked on open because its
+metadata may not be consistent.
+
+The consistency check includes the following invariants:
+
+ 1. Each cluster is referenced once and only once. It is an inconsistency to
+    have a cluster referenced more than once by L1 or L2 tables. A cluster has
+    been leaked if it has no references.
+
+ 2. Offsets must be within the image file size and must be __cluster_size__
+    aligned.
+
+ 3. Table offsets must be at least __table_size__ * __cluster_size__ bytes from
+    the end of the image file so that there is space for the entire table.
+
+The consistency check process starts from __l1_table_offset__ and scans all
+L2 tables. After the check completes with no errors other than leaks, the
+QED_F_NEED_CHECK bit can be cleared and the image can be accessed.
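
Reviewer note (not part of the patch): the logical offset translation and the special offsets 0 and 1 described in the spec can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not QEMU's implementation. The names `split_logical_offset` and `resolve_read` are hypothetical, and `load_table` is assumed to be a caller-supplied function that returns an L2 table as a list of integers, mirroring the spec's pseudocode.

```python
def split_logical_offset(pos, cluster_size, table_size):
    """Split a logical byte offset into (l1_index, l2_index, byte_offset).

    Assumes cluster_size and table_size are powers of two, as the spec
    requires, so TABLE_NOFFSETS is a power of two as well.
    """
    entries = table_size * cluster_size // 8       # TABLE_NOFFSETS
    cluster_bits = cluster_size.bit_length() - 1   # log2(cluster_size)
    table_bits = entries.bit_length() - 1          # log2(TABLE_NOFFSETS)
    byte_offset = pos & (cluster_size - 1)
    l2_index = (pos >> cluster_bits) & (entries - 1)
    l1_index = pos >> (cluster_bits + table_bits)
    return l1_index, l2_index, byte_offset


def resolve_read(pos, l1_table, load_table, cluster_size, table_size):
    """Classify a read at logical offset pos.

    Returns ("data", file_offset), ("zero", None) or ("unallocated", None).
    An "unallocated" result means the caller falls back to the backing
    file, or to zeroes if there is none.
    """
    l1_index, l2_index, byte_offset = split_logical_offset(
        pos, cluster_size, table_size)
    l2_offset = l1_table[l1_index]
    if l2_offset == 0:                 # L2 table not yet allocated
        return ("unallocated", None)
    entry = load_table(l2_offset)[l2_index]
    if entry == 0:                     # data cluster not yet allocated
        return ("unallocated", None)
    if entry == 1:                     # zero cluster: never read backing file
        return ("zero", None)
    offset_mask = ~(cluster_size - 1)  # drop the reserved low bits
    return ("data", (entry & offset_mask) + byte_offset)
```

With cluster_size=64 KB and table_size=4, each table holds 32768 offsets, so a logical offset carries 16 bits of byte offset and 15 bits for each table index.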
diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt
deleted file mode 100644
index 7982e05..0000000
--- a/docs/specs/qed_spec.txt
+++ /dev/null
@@ -1,138 +0,0 @@
-=Specification=
-
-The file format looks like this:
-
- +----------+----------+----------+-----+
- | cluster0 | cluster1 | cluster2 | ... |
- +----------+----------+----------+-----+
-
-The first cluster begins with the '''header'''. The header contains information about where regular clusters start; this allows the header to be extensible and store extra information about the image file. A regular cluster may be a '''data cluster''', an '''L2''', or an '''L1 table'''. L1 and L2 tables are composed of one or more contiguous clusters.
-
-Normally the file size will be a multiple of the cluster size. If the file size is not a multiple, extra information after the last cluster may not be preserved if data is written. Legitimate extra information should use space between the header and the first regular cluster.
-
-All fields are little-endian.
-
-==Header==
- Header {
-     uint32_t magic;               /* QED\0 */
-
-     uint32_t cluster_size;        /* in bytes */
-     uint32_t table_size;          /* for L1 and L2 tables, in clusters */
-     uint32_t header_size;         /* in clusters */
-
-     uint64_t features;            /* format feature bits */
-     uint64_t compat_features;     /* compat feature bits */
-     uint64_t autoclear_features;  /* self-resetting feature bits */
-
-     uint64_t l1_table_offset;     /* in bytes */
-     uint64_t image_size;          /* total logical image size, in bytes */
-
-     /* if (features & QED_F_BACKING_FILE) */
-     uint32_t backing_filename_offset; /* in bytes from start of header */
-     uint32_t backing_filename_size;   /* in bytes */
- }
-
-Field descriptions:
-* ''cluster_size'' must be a power of 2 in range [2^12, 2^26].
-* ''table_size'' must be a power of 2 in range [1, 16].
-* ''header_size'' is the number of clusters used by the header and any additional information stored before regular clusters.
-* ''features'', ''compat_features'', and ''autoclear_features'' are file format extension bitmaps. They work as follows:
-** An image with unknown ''features'' bits enabled must not be opened. File format changes that are not backwards-compatible must use ''features'' bits.
-** An image with unknown ''compat_features'' bits enabled can be opened safely. The unknown features are simply ignored and represent backwards-compatible changes to the file format.
-** An image with unknown ''autoclear_features'' bits enable can be opened safely after clearing the unknown bits. This allows for backwards-compatible changes to the file format which degrade gracefully and can be re-enabled again by a new program later.
-* ''l1_table_offset'' is the offset of the first byte of the L1 table in the image file and must be a multiple of ''cluster_size''.
-* ''image_size'' is the block device size seen by the guest and must be a multiple of 512 bytes.
-* ''backing_filename_offset'' and ''backing_filename_size'' describe a string in (byte offset, byte size) form. It is not NUL-terminated and has no alignment constraints. The string must be stored within the first ''header_size'' clusters. The backing filename may be an absolute path or relative to the image file.
-
-Feature bits:
-* QED_F_BACKING_FILE = 0x01. The image uses a backing file.
-* QED_F_NEED_CHECK = 0x02. The image needs a consistency check before use.
-* QED_F_BACKING_FORMAT_NO_PROBE = 0x04. The backing file is a raw disk image and no file format autodetection should be attempted. This should be used to ensure that raw backing files are never detected as an image format if they happen to contain magic constants.
-
-There are currently no defined ''compat_features'' or ''autoclear_features'' bits.
-
-Fields predicated on a feature bit are only used when that feature is set. The fields always take up header space, regardless of whether or not the feature bit is set.
-
-==Tables==
-
-Tables provide the translation from logical offsets in the block device to cluster offsets in the file.
-
- #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
-
- Table {
-     uint64_t offsets[TABLE_NOFFSETS];
- }
-
-The tables are organized as follows:
-
-                    +----------+
-                    | L1 table |
-                    +----------+
-               ,------'  |  '------.
-          +----------+   |    +----------+
-          | L2 table |  ...   | L2 table |
-          +----------+        +----------+
-      ,------'  |  '------.
- +----------+   |    +----------+
- |   Data   |  ...   |   Data   |
- +----------+        +----------+
-
-A table is made up of one or more contiguous clusters. The table_size header field determines table size for an image file. For example, cluster_size=64 KB and table_size=4 results in 256 KB tables.
-
-The logical image size must be less than or equal to the maximum possible size of clusters rooted by the L1 table:
- header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
-
-L1, L2, and data cluster offsets must be aligned to header.cluster_size. The following offsets have special meanings:
-
-===L2 table offsets===
-* 0 - unallocated. The L2 table is not yet allocated.
-
-===Data cluster offsets===
-* 0 - unallocated. The data cluster is not yet allocated.
-* 1 - zero. The data cluster contents are all zeroes and no cluster is allocated.
-
-Future format extensions may wish to store per-offset information. The least significant 12 bits of an offset are reserved for this purpose and must be set to zero. Image files with cluster_size > 2^12 will have more unused bits which should also be zeroed.
-
-===Unallocated L2 tables and data clusters===
-Reads to an unallocated area of the image file access the backing file. If there is no backing file, then zeroes are produced. The backing file may be smaller than the image file and reads of unallocated areas beyond the end of the backing file produce zeroes.
-
-Writes to an unallocated area cause a new data clusters to be allocated, and a new L2 table if that is also unallocated. The new data cluster is populated with data from the backing file (or zeroes if no backing file) and the data being written.
-
-===Zero data clusters===
-Zero data clusters are a space-efficient way of storing zeroed regions of the image.
-
-Reads to a zero data cluster produce zeroes. Note that the difference between an unallocated and a zero data cluster is that zero data clusters stop the reading of contents from the backing file.
-
-Writes to a zero data cluster cause a new data cluster to be allocated. The new data cluster is populated with zeroes and the data being written.
-
-===Logical offset translation===
-Logical offsets are translated into cluster offsets as follows:
-
-  table_bits table_bits    cluster_bits
-  <--------> <--------> <--------------->
- +----------+----------+-----------------+
- | L1 index | L2 index |     byte offset |
- +----------+----------+-----------------+
-
- Structure of a logical offset
-
- offset_mask = ~(cluster_size - 1) # mask for the image file byte offset
-
- def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
-     l2_offset = l1_table[l1_index]
-     l2_table = load_table(l2_offset)
-     cluster_offset = l2_table[l2_index] & offset_mask
-     return cluster_offset + byte_offset
-
-==Consistency checking==
-
-This section is informational and included to provide background on the use of the QED_F_NEED_CHECK ''features'' bit.
-
-The QED_F_NEED_CHECK bit is used to mark an image as dirty before starting an operation that could leave the image in an inconsistent state if interrupted by a crash or power failure. A dirty image must be checked on open because its metadata may not be consistent.
-
-Consistency check includes the following invariants:
-# Each cluster is referenced once and only once. It is an inconsistency to have a cluster referenced more than once by L1 or L2 tables. A cluster has been leaked if it has no references.
-# Offsets must be within the image file size and must be ''cluster_size'' aligned.
-# Table offsets must at least ''table_size'' * ''cluster_size'' bytes from the end of the image file so that there is space for the entire table.
-
-The consistency check process starts by from ''l1_table_offset'' and scans all L2 tables. After the check completes with no other errors besides leaks, the QED_F_NEED_CHECK bit can be cleared and the image can be accessed.
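
Reviewer note (not part of the patch): the header layout and the field constraints in the specification can be sketched with Python's `struct` module. This is an illustration under stated assumptions, not QEMU's parser. The names `parse_header`, `HEADER_FORMAT`, `HEADER_FIELDS` and `KNOWN_FEATURES` are hypothetical, the fields are assumed to be packed without padding in the order shown in the `Header` structure, and the magic is assumed to appear on disk as the bytes `QED\0`.

```python
import struct

# Little-endian, no padding: magic, three uint32, five uint64, two uint32.
HEADER_FORMAT = "<4sIIIQQQQQII"
HEADER_FIELDS = (
    "magic", "cluster_size", "table_size", "header_size",
    "features", "compat_features", "autoclear_features",
    "l1_table_offset", "image_size",
    "backing_filename_offset", "backing_filename_size",
)

QED_F_BACKING_FILE = 0x01
QED_F_NEED_CHECK = 0x02
QED_F_BACKING_FORMAT_NO_PROBE = 0x04
KNOWN_FEATURES = (QED_F_BACKING_FILE | QED_F_NEED_CHECK
                  | QED_F_BACKING_FORMAT_NO_PROBE)


def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


def parse_header(buf):
    """Unpack and validate a header; raise ValueError on any violation."""
    h = dict(zip(HEADER_FIELDS, struct.unpack_from(HEADER_FORMAT, buf)))
    if h["magic"] != b"QED\x00":
        raise ValueError("bad magic")
    cs, ts = h["cluster_size"], h["table_size"]
    if not (is_power_of_two(cs) and 2**12 <= cs <= 2**26):
        raise ValueError("invalid cluster_size")
    if not (is_power_of_two(ts) and 1 <= ts <= 16):
        raise ValueError("invalid table_size")
    if h["features"] & ~KNOWN_FEATURES:
        raise ValueError("unknown features bits: image must not be opened")
    if h["l1_table_offset"] % cs:
        raise ValueError("l1_table_offset is not cluster aligned")
    if h["image_size"] % 512:
        raise ValueError("image_size is not a multiple of 512")
    entries = ts * cs // 8  # TABLE_NOFFSETS
    if h["image_size"] > entries * entries * cs:
        raise ValueError("image_size exceeds what the tables can map")
    # No autoclear bits are defined, so every set bit is unknown and cleared.
    h["autoclear_features"] = 0
    # Unknown compat_features bits are simply ignored.
    return h
```

Under these assumptions the fixed part of the header occupies `struct.calcsize(HEADER_FORMAT)` = 64 bytes, and the backing filename fields are unpacked even when QED_F_BACKING_FILE is clear because they always take up header space.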