Add better example about how to compress e2image raw image

Message ID 1316803672-25550-1-git-send-email-cmaiolino@redhat.com
State Rejected, archived

Commit Message

Carlos Maiolino Sept. 23, 2011, 6:47 p.m. UTC
The current example in the man page uses bzip2 to compress
the raw image file created by e2image, but bzip2 does
not honor sparse files, which causes the image to expand to
the full size of the filesystem.
Using tar together with bzip2 preserves the sparseness of
the file, which makes the archive more transportable
than the current one if the filesystem is large.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
---
 misc/e2image.8.in |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
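
To see the effect the commit message describes, the sketch below builds a small sparse file and archives it with tar's S (--sparse) flag. The name demo.e2i and the 64 MiB size are made up for illustration. It assumes GNU tar and GNU coreutils on a filesystem that supports holes, and it leaves out the j (bzip2) flag from the patch so that it runs without bzip2 installed.

```shell
# Illustration only: demo.e2i stands in for a raw e2image output file.
sparse_dir=$(mktemp -d)
cd "$sparse_dir" || exit 1

# A 64 MiB file whose only real data is a few bytes at the start.
truncate -s 64M demo.e2i
printf 'metadata' | dd of=demo.e2i conv=notrunc status=none

apparent_bytes=$(stat -c %s demo.e2i)      # logical size of the file
allocated_kb=$(du -k demo.e2i | cut -f1)   # space actually allocated on disk

# S makes tar record the holes instead of archiving long runs of zeros.
tar -Scf demo.tar demo.e2i
archive_bytes=$(stat -c %s demo.tar)

echo "apparent: $apparent_bytes bytes, allocated: $allocated_kb KiB, archive: $archive_bytes bytes"
```

The archive should come out far smaller than the apparent size, because only the data regions are stored along with a map of the holes.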

Comments

Andreas Dilger Sept. 23, 2011, 8:24 p.m. UTC | #1
On 2011-09-23, at 12:47 PM, Carlos Maiolino wrote:
> The current example in the man page uses bzip2 to compress
> the raw image file created by the e2image, but, bzip2 does
> not honors sparse files, which causes the image to have the
> same size of the filesystem.
> Using tar together with bzip2 will make the compressed file
> to honor the sparsed file, which makes it more transportable
> than the current one if the filesystem is large.
> 
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
> misc/e2image.8.in |    2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/misc/e2image.8.in b/misc/e2image.8.in
> index 74d2a0b..fcf3d20 100644
> --- a/misc/e2image.8.in
> +++ b/misc/e2image.8.in
> @@ -115,7 +115,7 @@ as part of bug reports to e2fsprogs.  When used in this capacity, the
> recommended command is as follows (replace hda1 with the appropriate device):
> .PP
> .br
> -\	\fBe2image \-r /dev/hda1 \- | bzip2 > hda1.e2i.bz2\fR
> +\	\fBe2image \-r /dev/hda1 hda1.e2i && tar Sjcvf e2i.tar.bz2 hda1.e2i\fR

Even better would be the use of the QCOW2 format that Lukas added, if it could also be operated on directly by the e2fsprogs utils (I don't know if that is possible or not).


Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Carlos Maiolino Sept. 23, 2011, 8:51 p.m. UTC | #2
On Fri, Sep 23, 2011 at 02:24:23PM -0600, Andreas Dilger wrote:
> On 2011-09-23, at 12:47 PM, Carlos Maiolino wrote:
> > The current example in the man page uses bzip2 to compress
> > the raw image file created by the e2image, but, bzip2 does
> > not honors sparse files, which causes the image to have the
> > same size of the filesystem.
> > Using tar together with bzip2 will make the compressed file
> > to honor the sparsed file, which makes it more transportable
> > than the current one if the filesystem is large.
> > 
> > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> > ---
> > misc/e2image.8.in |    2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/misc/e2image.8.in b/misc/e2image.8.in
> > index 74d2a0b..fcf3d20 100644
> > --- a/misc/e2image.8.in
> > +++ b/misc/e2image.8.in
> > @@ -115,7 +115,7 @@ as part of bug reports to e2fsprogs.  When used in this capacity, the
> > recommended command is as follows (replace hda1 with the appropriate device):
> > .PP
> > .br
> > -\	\fBe2image \-r /dev/hda1 \- | bzip2 > hda1.e2i.bz2\fR
> > +\	\fBe2image \-r /dev/hda1 hda1.e2i && tar Sjcvf e2i.tar.bz2 hda1.e2i\fR
> 
> Even better would be the use of the QCOW2 format that Lukas added, if it could also be operated on directly by the e2fsprogs utils (I don't know if that is possible or not).
> 
The QCOW2 format can only be operated on with qcow2-capable tools like qemu-img =[ To use an image directly with the e2fsprogs tools, we still need raw images.
Theodore Ts'o Sept. 24, 2011, 4:31 p.m. UTC | #3
On Fri, Sep 23, 2011 at 05:51:24PM -0300, Carlos Maiolino wrote:
> On Fri, Sep 23, 2011 at 02:24:23PM -0600, Andreas Dilger wrote:
> > On 2011-09-23, at 12:47 PM, Carlos Maiolino wrote:
> > > The current example in the man page uses bzip2 to compress
> > > the raw image file created by the e2image, but, bzip2 does
> > > not honors sparse files, which causes the image to have the
> > > same size of the filesystem.
> > > Using tar together with bzip2 will make the compressed file
> > > to honor the sparsed file, which makes it more transportable
> > > than the current one if the filesystem is large.

The problem with using tar is that it requires extra disk space from the
user --- somewhat more than double the disk space
(because you need to have space for the hda1.e2i file before it gets
compressed).  For very large file systems, this can be quite
significant.  My general philosophy has been that making things as easy
as possible for the users is more important than making them easy for
the developers.

For the developers, we do have contrib/make-sparse.c.  All we have to do is:

    bunzip2 < hda1.e2i.bz2 | make-sparse hda1.e2i

... and this creates a sparse file in hda1.e2i.

> > Even better would be the use of the QCOW2 format that Lukas added,
> > if it could also be operated on directly by the e2fsprogs utils (I
> > don't know if that is possible or not).
> > 
> The QCOW2 format can only be operated with qcow2 capable tools like
> qemu-img to use directly with e2fsprogs tools, we still need to
> use raw images.

Yeah, it would be nice if we had an io_manager implementation that
understood qcow2, which could then be used by dumpe2fs and debugfs.
Hopefully at some point someone will implement it.

						- Ted
Carlos Maiolino Sept. 26, 2011, 12:13 p.m. UTC | #4
> The problem with using tar is that it requires extra disk space by the
> user --- somewhere a bit more than double the extra disk space
> (because you need to have space for the hda1.e2i file before it gets
> compressed).  For very large file systems, this can be quite
> significant.  My general philosophy has been to make things easy as
> possible for the users as being more important for the developers.
> 
> For the developers, we do have contrib/make-sparse.c.  All we have to do is:
> 
>     bunzip2 < hda1.e2i.bz2 | make-sparse hda1.e2i
> 
> ... and this creates a sparse file in hda1.e2i.
> 

Good to know, Ted. I'll use this instead. Thanks.
Bernd Schubert Sept. 26, 2011, 12:22 p.m. UTC | #5
On 09/24/2011 06:31 PM, Ted Ts'o wrote:
> On Fri, Sep 23, 2011 at 05:51:24PM -0300, Carlos Maiolino wrote:
>> On Fri, Sep 23, 2011 at 02:24:23PM -0600, Andreas Dilger wrote:
>>> On 2011-09-23, at 12:47 PM, Carlos Maiolino wrote:
>>>> The current example in the man page uses bzip2 to compress
>>>> the raw image file created by the e2image, but, bzip2 does
>>>> not honors sparse files, which causes the image to have the
>>>> same size of the filesystem.
>>>> Using tar together with bzip2 will make the compressed file
>>>> to honor the sparsed file, which makes it more transportable
>>>> than the current one if the filesystem is large.
>
> The problem with using tar is that it requires extra disk space by the
> user --- somewhere a bit more than double the extra disk space
> (because you need to have space for the hda1.e2i file before it gets
> compressed).  For very large file systems, this can be quite
> significant.  My general philosophy has been to make things easy as
> possible for the users as being more important for the developers.
>
> For the developers, we do have contrib/make-sparse.c.  All we have to do is:
>
>     bunzip2 < hda1.e2i.bz2 | make-sparse hda1.e2i
>
> ... and this creates a sparse file in hda1.e2i.

The problem is that the bzip2 run will take a huge amount of time to
compress all the zeros. In 2009 (with a recent CPU of that time) I
aborted such a run for an 8 TiB file system after a couple of days, then
stored the e2image directly on disk and compressed it with tar and
sparse support, which finished after only 12 hours...
I don't think more modern CPUs are much faster for single-threaded runs
like bzip2's.
So IMHO the man page should at least warn about that issue and suggest
using a similar tar command.


Cheers,
Bernd
Eric Sandeen Sept. 26, 2011, 4:24 p.m. UTC | #6
On 9/24/11 11:31 AM, Ted Ts'o wrote:
> On Fri, Sep 23, 2011 at 05:51:24PM -0300, Carlos Maiolino wrote:
>> On Fri, Sep 23, 2011 at 02:24:23PM -0600, Andreas Dilger wrote:
>>> On 2011-09-23, at 12:47 PM, Carlos Maiolino wrote:
>>>> The current example in the man page uses bzip2 to compress
>>>> the raw image file created by the e2image, but, bzip2 does
>>>> not honors sparse files, which causes the image to have the
>>>> same size of the filesystem.
>>>> Using tar together with bzip2 will make the compressed file
>>>> to honor the sparsed file, which makes it more transportable
>>>> than the current one if the filesystem is large.
> 
> The problem with using tar is that it requires extra disk space by the
> user --- somewhere a bit more than double the extra disk space
> (because you need to have space for the hda1.e2i file before it gets
> compressed).  For very large file systems, this can be quite
> significant.  My general philosophy has been to make things easy as
> possible for the users as being more important for the developers.
> 
> For the developers, we do have contrib/make-sparse.c.  All we have to do is:
> 
>     bunzip2 < hda1.e2i.bz2 | make-sparse hda1.e2i
> 
> ... and this creates a sparse file in hda1.e2i.

or | cp --sparse=always /dev/stdin sparse.img works too.

But have you ever tried this with a multi-terabyte image?

It takes -forever- to process all those 0s, with cpus pegged.

The tar command seems to actually annotate the sparseness efficiently.
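
A small-scale sketch of the stream round trip discussed above, for anyone who wants to try it: the image is compressed as a plain byte stream (so every zero gets read and compressed), then re-sparsified on the receiving side with cp --sparse=always. The file names and the 64 MiB size are hypothetical, gzip stands in for bzip2 purely so the sketch runs where bzip2 is not installed, and GNU coreutils is assumed.

```shell
# Illustration only: orig.img stands in for a raw e2image file.
restore_dir=$(mktemp -d)
cd "$restore_dir" || exit 1

truncate -s 64M orig.img
printf 'metadata' | dd of=orig.img conv=notrunc status=none

# Sender side: stream compression has to read and compress every zero.
gzip -c orig.img > orig.img.gz

# Receiver side: decompress and let cp turn the runs of zeros back into holes.
gunzip -c orig.img.gz | cp --sparse=always /dev/stdin restored.img

restored_bytes=$(stat -c %s restored.img)
restored_kb=$(du -k restored.img | cut -f1)
cmp orig.img restored.img &&
    echo "identical; restored.img uses $restored_kb KiB of $((restored_bytes / 1024)) KiB"
```

At 64 MiB both steps are instant; the cost being discussed only shows up when the stream is terabytes of zeros.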

Ted, your concern about space - it doesn't take the full fs size worth
of space, right, just the metadata space?  So in general it should not
be THAT much ...

-Eric

>>> Even better would be the use of the QCOW2 format that Lukas added,
>>> if it could also be operated on directly by the e2fsprogs utils (I
>>> don't know if that is possible or not).
>>>
>> The QCOW2 format can only be operated with qcow2 capable tools like
>> qemu-img to use directly with e2fsprogs tools, we still need to
>> use raw images.
> 
> Yeah, it would be nice if we had an io_manager implementation that
> understood qcow2, which could then be used by dumpe2fs and debugfs.
> Hopefully at some point someone will implement it.
> 
> 						- Ted
Eric Sandeen Sept. 26, 2011, 4:25 p.m. UTC | #7
On 9/26/11 7:22 AM, Bernd Schubert wrote:
> On 09/24/2011 06:31 PM, Ted Ts'o wrote:
>> On Fri, Sep 23, 2011 at 05:51:24PM -0300, Carlos Maiolino wrote:
>>> On Fri, Sep 23, 2011 at 02:24:23PM -0600, Andreas Dilger wrote:
>>>> On 2011-09-23, at 12:47 PM, Carlos Maiolino wrote:
>>>>> The current example in the man page uses bzip2 to compress
>>>>> the raw image file created by the e2image, but, bzip2 does
>>>>> not honors sparse files, which causes the image to have the
>>>>> same size of the filesystem.
>>>>> Using tar together with bzip2 will make the compressed file
>>>>> to honor the sparsed file, which makes it more transportable
>>>>> than the current one if the filesystem is large.
>>
>> The problem with using tar is that it requires extra disk space by the
>> user --- somewhere a bit more than double the extra disk space
>> (because you need to have space for the hda1.e2i file before it gets
>> compressed).  For very large file systems, this can be quite
>> significant.  My general philosophy has been to make things easy as
>> possible for the users as being more important for the developers.
>>
>> For the developers, we do have contrib/make-sparse.c.  All we have to do is:
>>
>>     bunzip2 < hda1.e2i.bz2 | make-sparse hda1.e2i
>>
>> ... and this creates a sparse file in hda1.e2i.
> 
> The problem is that the bzip2 run will take a huge amount of time to
> compress all the zeros. In 2009 (with a recent CPU of that time) I
> aborted such a run for a 8TiB file system after a couple of days,
> then stored the e2image directly on disk and compressed it with tar
> and sparse support, which finished after only 12 hours... I don't
> think more modern CPUs are much faster for single threaded runs as
> bzip2 does it. So IMHO the man page should at least warn about that
> issue and suggest to use a similar tar command.

Agreed!

I think they both have their place.

passing images around in qcow format may be best in the long run though.

-Eric

> 
> Cheers,
> Bernd
Theodore Ts'o Sept. 26, 2011, 7:23 p.m. UTC | #8
On Mon, Sep 26, 2011 at 11:24:31AM -0500, Eric Sandeen wrote:
> > 
> >     bunzip2 < hda1.e2i.bz2 | make-sparse hda1.e2i
> > 
> > ... and this creates a sparse file in hda1.e2i.
> 
> or | cp --sparse=always /dev/stdin sparse.img works too.
> 
> But have you ever tried this with a multi-terabyte image?
> 
> It takes -forever- to process all those 0s, with cpus pegged.

Yeah, I didn't realize until I read another message on this thread
that bzip2's CPU usage was causing problems.  Is gzip sufficiently
better, I wonder, or is it still problematic?

> Ted, your concern about space - it doesn't take the full fs size worth
> of space, right, just the metadata space?  So in general it should not
> be THAT much ...

Yes, it's just the metadata space that I was worried about.  So it's
not *that* much, but it still adds up on large systems.  But then
again, on large systems we precisely have the problem of bzip2 taking
forever.

If we decide that we're OK with not compressing qcow2, we could use
qcow2.  But note that the qcow2 format is still very compressible ---
it looks like it could do a better job removing zero blocks.  (I had a
256meg qcow2 e2image file compress down to 9 megs.)  Unfortunately we
can't do stream compression with qcow2.

Long run I think we should make the qcow2 support better (by dropping
all-zero blocks, and adding support for qcow2 to
debugfs/dumpe2fs/e2fsck, and perhaps adding support for native
compression).  Anyone looking for a project?  :-)
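
For reference, a rough sketch of the qcow2 workflow being discussed, using the -Q option documented in e2image(8) and converting back to raw with -r. The scratch filesystem image fs.img is hypothetical and is created here only so the commands can be tried without a real device; the sketch skips itself when mke2fs or e2image is not on the PATH, and gzip is used only to show that the qcow2 file still compresses.

```shell
# Illustration only: fs.img is a throwaway 16 MiB filesystem image.
if command -v mke2fs >/dev/null 2>&1 && command -v e2image >/dev/null 2>&1; then
    qcow_dir=$(mktemp -d)
    truncate -s 16M "$qcow_dir/fs.img"
    mke2fs -q -F "$qcow_dir/fs.img"

    # Metadata-only image in QCOW2 format (no stream compression possible).
    e2image -Q "$qcow_dir/fs.img" "$qcow_dir/fs.qcow2"

    # The qcow2 file can still be compressed after the fact.
    gzip -c "$qcow_dir/fs.qcow2" > "$qcow_dir/fs.qcow2.gz"

    # Convert back to a raw image that dumpe2fs/debugfs can read directly.
    e2image -r "$qcow_dir/fs.qcow2" "$qcow_dir/fs.raw"

    ls -ls "$qcow_dir"
    qcow_demo=done
else
    echo "e2fsprogs not found; skipping qcow2 sketch" >&2
    qcow_demo=skipped
fi
```

The conversion back to raw is the step that native qcow2 support in debugfs/dumpe2fs/e2fsck would make unnecessary.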

						- Ted
Carlos Maiolino Sept. 26, 2011, 7:36 p.m. UTC | #9
Hi,

> If we decide that we're OK with not compressing qcow2, we could use
> qcow2.  But note that the qcow2 format is still very compressible ---
> it looks like it could do a better job removing zero blocks.  (I had a
> 256meg qcow2 e2image file compress down to 9 megs.)  Unfortunately we
> can't do stream compression with qcow2.
> 
> Long run I think we should make the qcow2 support better (by dropping
> all-zero blocks, and adding support for qcow2 to
> debugfs/dumpe2fs/e2fsck, and perhaps adding support for native
> compression).  Anyone looking for a project?  :-)
> 
> 						- Ted

Actually I'm pretty interested in contributing more to ExtFS, although I'm
not very familiar with all the Ext internals.
I've been contributing to GFS2 only. If you're not in a hurry to get it asap
and don't mind receiving some (maybe lots of) questions, I can take this
project (I will need to learn more about ExtFS first :).

It may take me some time to implement, given my limited knowledge of ExtFS
internals, but, if you're ok with the above, I'm in.
Amir Goldstein Sept. 26, 2011, 8:01 p.m. UTC | #10
On Mon, Sep 26, 2011 at 10:23 PM, Ted Ts'o <tytso@mit.edu> wrote:
> On Mon, Sep 26, 2011 at 11:24:31AM -0500, Eric Sandeen wrote:
>> >
>> >     bunzip2 < hda1.e2i.bz2 | make-sparse hda1.e2i
>> >
>> > ... and this creates a sparse file in hda1.e2i.
>>
>> or | cp --sparse=always /dev/stdin sparse.img works too.
>>
>> But have you ever tried this with a multi-terabyte image?
>>
>> It takes -forever- to process all those 0s, with cpus pegged.
>
> Yeah, I didn't realize until I read another message on this thread
> that bzip2's CPU problems were causing problems.  Is gzip sufficiently
> better, I wonder, or is it still problematic?
>
>> Ted, your concern about space - it doesn't take the full fs size worth
>> of space, right, just the metadata space?  So in general it should not
>> be THAT much ...
>
> Yes, it's just the metadata space that I was worried about.  So it's
> not *that* much, but it still adds up on large systems.  But then
> again, on large systems we precisely have the problem of bzip2 taking
> forever.
>
> If we decide that we're OK with not compressing qcow2, we could use
> qcow2.  But note that the qcow2 format is still very compressible ---
> it looks like it could do a better job removing zero blocks.  (I had a
> 256meg qcow2 e2image file compress down to 9 megs.)  Unfortunately we
> can't do stream compression with qcow2.

I wasn't sure if I should bring this up, but what the heck...
Shardul, one of our GSoC students, has implemented e4send/e4receive
for streaming an ext4 snapshot image (with data) to a remote machine.
The code can be found in his github repo:
https://github.com/shardulmangade/e2fsprogs-snapshots

His code mostly reuses the e2image code and uses the "LVM snapshot store"
format for streaming block numbers + block contents to e4receive, which
writes to a sparse file or block device.
The LVM snapshot store format was chosen simply because we needed
something quick and the code had already been implemented by another GSoC
student for his own project (revert to ext4 snapshot using LVM merge).

So without trying to promote upstream inclusion of this implementation,
just so you know:
1. the code works, although not configured for exporting metadata at the moment
2. it's simple
3. no intermediate files needed
4. output can be streamed compressed
5. Shardul would be happy to help with further questions

Amir.

>
> Long run I think we should make the qcow2 support better (by dropping
> all-zero blocks, and adding support for qcow2 to
> debugfs/dumpe2fs/e2fsck, and perhaps adding support for native
> compression).  Anyone looking for a project?  :-)
>
>                                                - Ted

Patch

diff --git a/misc/e2image.8.in b/misc/e2image.8.in
index 74d2a0b..fcf3d20 100644
--- a/misc/e2image.8.in
+++ b/misc/e2image.8.in
@@ -115,7 +115,7 @@  as part of bug reports to e2fsprogs.  When used in this capacity, the
 recommended command is as follows (replace hda1 with the appropriate device):
 .PP
 .br
-\	\fBe2image \-r /dev/hda1 \- | bzip2 > hda1.e2i.bz2\fR
+\	\fBe2image \-r /dev/hda1 hda1.e2i && tar Sjcvf e2i.tar.bz2 hda1.e2i\fR
 .PP
 This will only send the metadata information, without any data blocks.  
 However, the filenames in the directory blocks can still reveal