
[4/4] e2fsck: Add QCOW2 support

Message ID 1298638173-25050-4-git-send-email-lczerner@redhat.com
State Superseded, archived

Commit Message

Lukas Czerner Feb. 25, 2011, 12:49 p.m. UTC
This commit adds QCOW2 support to e2fsck. In order to avoid
implementing real QCOW2 image support, which would require a lot of
code, we simply bypass the problem by converting the QCOW2 image into a
raw image and then letting e2fsck work with the raw image. The
conversion itself can be quite fast, so it should not be a serious
slowdown.

Add a '-Q' option to specify the path for the raw image. If not
specified, the raw image will be saved in the /tmp directory in the
format <qcow2_filename>.raw.XXXXXX, where each X is chosen randomly.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 e2fsck/e2fsck.8.in |    8 +++++-
 e2fsck/unix.c      |   74 ++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 76 insertions(+), 6 deletions(-)
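As an aside, the fallback naming described in the commit message is the
standard mkstemp() template. Here is a minimal sketch of that naming
scheme, written in Python rather than the patch's C for brevity (the
function name is mine, not from the patch):

```python
import os
import tempfile

def default_raw_path(qcow2_path, dest=None):
    # If a destination was given (the '-Q' argument), use it as-is.
    if dest is not None:
        return dest
    # Otherwise create /tmp/<qcow2_filename>.raw.XXXXXX, where the
    # trailing characters are chosen randomly by mkstemp().
    base = os.path.basename(qcow2_path)
    fd, path = tempfile.mkstemp(prefix=base + ".raw.", dir="/tmp")
    os.close(fd)  # the checker would keep the fd and write the raw image
    return path
```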

Comments

Theodore Ts'o Feb. 26, 2011, 4:44 p.m. UTC | #1
On Fri, Feb 25, 2011 at 01:49:33PM +0100, Lukas Czerner wrote:
> This commit adds QCOW2 support for e2fsck. In order to avoid creating
> real QCOW2 image support, which would require creating a lot of code, we
> simply bypass the problem by converting the QCOW2 image into raw image
> and than let e2fsck work with raw image. Conversion itself can be quite
> fast, so it should not be a serious slowdown.
> 
> Add '-Q' option to specify path for the raw image. It not specified the
> raw image will be saved in /tmp direcotry in format
> <qcow2_filename>.raw.XXXXXX, where X chosen randomly.
> 
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>

If we're just going to convert the qcow2 image into a raw image, that
means that if someone sends us an N gigabyte QCOW2 image, it will take
a lot of time (I'm not sure I agree with the "quite fast" part), and
consume an extra N gigabytes of free space to create the raw image.

In that case, I'm not so sure we really want to have a -Q option to
e2fsck.  We might be better off simply forcing the use of e2image to
convert the image back.

Note that the other reason why it's a lot better to allow e2fsck to
work on the qcow2 image directly is that if a customer sends a qcow2
metadata-only image from their 3TB raid array, we won't be able to
expand that to a raw image because of ext2/3/4's 2TB maximum file size
limit.  The qcow2 image might be only a few hundred megabytes, so being
able to have e2fsck operate on that image directly would be a huge win.

Adding io_manager support would also allow debugfs to access the qcow2
image directly --- also a win.
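For reference, the fixed part of a qcow2 (version 2) header that such
an io_manager would have to read is only 72 bytes of big-endian fields.
A sketch of parsing it, in Python for brevity (field layout per the
published QCOW2 format spec; the helper name is mine):

```python
import struct

QCOW2_MAGIC = 0x514649FB  # the bytes 'Q' 'F' 'I' 0xFB

def parse_qcow2_header(buf):
    # All header fields are big-endian; this is the v2 fixed header.
    (magic, version, backing_file_offset, backing_file_size,
     cluster_bits, size, crypt_method, l1_size, l1_table_offset,
     refcount_table_offset, refcount_table_clusters, nb_snapshots,
     snapshots_offset) = struct.unpack(">IIQIIQIIQQIIQ", buf[:72])
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return {"version": version, "cluster_bits": cluster_bits,
            "size": size, "l1_size": l1_size,
            "l1_table_offset": l1_table_offset}
```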

Whether or not we add the io_manager support right away (eventually I
think it's a must-have feature), I don't think a "decompress a qcow2
image to a sparse raw image" mode makes sense as an explicit e2fsck
option.  It just clutters up the e2fsck option space, and people might
be confused when e2fsck breaks because there wasn't enough free space
to decompress the raw image.  Also, e2fsck doesn't delete the /tmp
file afterwards, which is bad --- but if it takes a large amount of
time to create the raw image, deleting it afterwards is a bit of a
waste as well.  Probably better to force the user to manage the
converted raw file system image.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rogier Wolff Feb. 28, 2011, 9:44 a.m. UTC | #2
On Sat, Feb 26, 2011 at 11:44:42AM -0500, Ted Ts'o wrote:
> ext2/3/4's 2TB maximum file size limit.  The qcow2 image might be only
> a few hundreds of megabytes, so being able to have e2fsck operate on
> that image directly would be a huge win. 

driepoot:~> ls -ls /mnt/md3.img 
61558920 -rw------- 1 root root 2937535070208 Feb 26 00:36 /mnt/md3.img

61 Gigabytes in my case... (and my system finished counting: I have
8.9M directories on there...)

	Roger.
Lukas Czerner March 1, 2011, 11:42 a.m. UTC | #3
On Sat, 26 Feb 2011, Ted Ts'o wrote:

> On Fri, Feb 25, 2011 at 01:49:33PM +0100, Lukas Czerner wrote:
> > --snip--
> 
> If we're just going to convert the qcow2 image into a raw image, that
> means that if someone sends us a N gigabyte QCOW2 image, it will lots
> of time (I'm not sure I agree with the "quite fast part"), and consume
> an extra N gigabytes of free space to create the raw image.
> 
> In that case, I'm not so sure we really want to have a -Q option to
> e2fsck.  We might be better off simply forcing the use of e2image to
> convert the image back.
> 
> Note that the other reason why it's a lot better to be able to allow
> e2fsck to be able to work on the raw image directly is that if a
> customer sends a qcow2's metadata-only image from their 3TB raid
> array, we won't be able to expand that to a raw image because of
> ext2/3/4's 2TB maximum file size limit.  The qcow2 image might be only
> a few hundreds of megabytes, so being able to have e2fsck operate on
> that image directly would be a huge win. 
> 
> Adding iomanager support would also allow debugfs to access the qcow2
> image directly --- also a win.
> 
> Whether or not we add the io_manager support right away (eventually I
> think it's a must have feature), I don't think having a "decompress a
> qcow2 image to a sparse raw image" makes sense as an explicit e2fsck
> option.  It just clutters up the e2fsck option space, and people might
> be confused because now e2fsck could break because there wasn't enough
> free space to decompress the raw image.  Also, e2fsck doesn't delete
> the /tmp file afterwards, which is bad --- but if it takes a large
> amount of time to create the raw image, deleting afterwards is a bit
> of waste as well.  Probably better to force the user to manage the
> converted raw file system image.
> 
> 					- Ted
> 

Hi Ted,

sorry for the late answer, but I was running some benchmarks to have
some numbers to throw at you :). Now let's see how "quite fast" it
actually is in comparison:

I have a 6TB raid composed of four drives and I flooded it with lots
and lots of files (copying /usr/share over and over again) and also
created some big files (1M, 20M, 1G, 10G), so the number of used inodes
on the filesystem is 10928139. I am using e2fsck from the top of the
master branch.

Before each step I run:
sync; echo 3 > /proc/sys/vm/drop_caches

exporting raw image:
time .//misc/e2image -r /dev/mapper/vg_raid-lv_stripe image.raw

	real    12m3.798s
	user    2m53.116s
	sys     3m38.430s

	6,0G    image.raw

exporting qcow2 image:
time .//misc/e2image -Q /dev/mapper/vg_raid-lv_stripe image.qcow2
e2image 1.41.14 (22-Dec-2010)

	real    11m55.574s
	user    2m50.521s
	sys     3m41.515s

	6,1G    image.qcow2

So we can see that the running time is essentially the same, so there
is no crazy overhead in creating the qcow2 image. Note that the qcow2
image is slightly bigger because of all the qcow2-related metadata, and
its size really depends on the size of the device. Also, I tried to see
how long it takes to export a bzip2-compressed raw image, but it has
been running for almost a day now, so it is not even comparable.
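To put a rough number on "its size really depends on the size of the
device": every mapped cluster costs an 8-byte L2 entry, plus one L1
entry per L2 table. A back-of-the-envelope sketch of the worst case
(fully allocated image; refcount tables add a comparable amount; this
is illustrative arithmetic, not e2image's code):

```python
def qcow2_mapping_overhead(device_size, cluster_bits=16):
    # With 64 KiB clusters (cluster_bits=16), one L2 table is one
    # cluster of 8-byte entries, so it maps 8192 clusters = 512 MiB.
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8
    mapped_per_l2 = l2_entries * cluster_size
    n_l2 = -(-device_size // mapped_per_l2)  # ceiling division
    l1_bytes = n_l2 * 8                      # one L1 entry per L2 table
    l2_bytes = n_l2 * cluster_size           # the L2 tables themselves
    return l1_bytes + l2_bytes
```

For a fully mapped 6TB device this comes to roughly 0.7GB of L1+L2
tables; an image that only maps ~6GB of data, like the one above, pays
only for the L2 tables of the clusters it actually uses, so its
overhead stays far smaller.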

e2fsck on the device:
time .//e2fsck/e2fsck -fn /dev/mapper/vg_raid-lv_stripe

	real    3m9.400s
	user    0m47.558s
	sys     0m15.098s

e2fsck on the raw image:
time .//e2fsck/e2fsck -fn image.raw

	real    2m36.767s
	user    0m47.613s
	sys     0m8.403s

We can see that e2fsck on the raw image is a bit faster, but that is
expected since the drive does not have to seek as much (right?).

Now converting qcow2 image into raw image:
time .//misc/e2image -r image.qcow2 image.qcow2.raw

	real    1m23.486s
	user    0m0.704s
	sys     0m22.574s

It is hard to say whether that is "quite fast" or not, but I would say
it is not terribly slow either. Just out of curiosity, I have tried to
convert qcow2->raw with the qemu-img convert tool:

time qemu-img convert -O raw image.qcow2 image.qemu.raw
...it has been running for almost an hour now, so it is not comparable
either :)

e2fsck on the qcow2 image:
time .//e2fsck/e2fsck -fn -Q ./image.qcow2.img.tmp image.qcow2

	real    2m47.256s
	user    0m41.646s
	sys     0m28.618s

Now that is surprising. Well, not so much actually... We can see that
the e2fsck check on the qcow2 image, including the qcow2->raw
conversion, is a bit slower than checking the raw image (by 7%, which
is not much) but it is still faster than checking the device itself.
The reason is probably that the raw image we are creating is partially
cached in memory, which speeds up e2fsck. So I do not think that
converting the image before the check is such a bad idea (especially
when you have enough memory :)).

I completely agree that having an io_manager for the qcow2 format would
be cool, if someone is willing to do that, but I am not convinced that
it is worth it. Your concerns are all valid and I agree, however I do
not think e2image is used by regular, inexperienced users, so it should
not confuse them, but that may be a naive assumption :).

Also, remember that if you really do not want to convert the image,
because of the file size limit or whatever, you can always use qemu-nbd
to attach the qcow2 image to an NBD block device and use that as a
regular device.
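The qemu-nbd route can be sketched as the following command sequence,
shown here as argv lists in Python so the shape is explicit (the device
path and image name are examples; the qemu-nbd flags are the common
connect/disconnect options):

```python
def nbd_fsck_plan(image, nbd_dev="/dev/nbd0"):
    # The commands for checking a qcow2 image via NBD: load the nbd
    # driver, attach the image, fsck it read-only, then detach.
    return [
        ["modprobe", "nbd"],
        ["qemu-nbd", "--connect=" + nbd_dev, image],
        ["e2fsck", "-fn", nbd_dev],
        ["qemu-nbd", "--disconnect", nbd_dev],
    ]
```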

Regarding e2fsck and the qcow2 support (the -Q option), I think it is
useful, but I do not really insist on keeping it, and as you said we
can always force the user to use e2image for the conversion. It is just
that this way it seems easier to do it automatically. Maybe we can ask
the user whether they want to keep the raw image after the check or
not?

Regarding the separate qcow2.h file and the "qcow2_" prefix: I have
done this because I am using this code from both e2image and e2fsck, so
it seemed convenient to have it in a separate header; however, I guess
I can move it into e2image.c and e2image.h if you want.

So, what do you think?

Thanks!
-Lukas
Amir Goldstein March 7, 2011, 10:40 a.m. UTC | #4
On Tue, Mar 1, 2011 at 1:42 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> On Sat, 26 Feb 2011, Ted Ts'o wrote:
>
> --snip--
> Also, remember that if you really do not want to convert the image
> because of file size limit, or whatever, you can always use qemu-nbd to
> attach qcow2 image into nbd block device and use that as regular device.

Did you consider the possibility of using the QCOW2 format for doing a
"tryout" fsck on the filesystem, with the option to roll back?

If the QCOW2 image is created with the 'backing_file' option set to the
origin block device (and 'backing_fmt' set to 'host_device'), then
qemu-nbd will be able to see the exported image metadata as well as the
filesystem data.

You can then do an "intrusive" fsck run on the NBD device, mount your
filesystem (from the NBD device) and view the results.

If you are satisfied with the results, you can apply the fsck changes
to the origin block device (there is probably a qemu-img command to do
that). If you are unsatisfied with the results, you can simply discard
the image, or better yet, revert to a QCOW2 snapshot which you created
just before running fsck.
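This tryout workflow maps onto concrete commands roughly as follows,
again as argv lists (the overlay name and device paths are examples;
'qemu-img commit' is presumably the "apply the changes" command alluded
to above):

```python
def tryout_fsck_plan(origin_dev, overlay="tryout.qcow2",
                     nbd_dev="/dev/nbd0"):
    # An overlay qcow2 backed by the origin device: fsck's writes land
    # in the overlay only; 'qemu-img commit' copies them to the origin.
    return [
        ["qemu-img", "create", "-f", "qcow2",
         "-o", "backing_file=%s,backing_fmt=host_device" % origin_dev,
         overlay],
        ["qemu-nbd", "--connect=" + nbd_dev, overlay],
        ["e2fsck", "-fy", nbd_dev],       # intrusive run, hits overlay
        ["qemu-nbd", "--disconnect", nbd_dev],
        ["qemu-img", "commit", overlay],  # apply changes to the origin
    ]
```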

Can you provide the performance figures for running fsck over NBD?

Lukas Czerner March 7, 2011, 12:40 p.m. UTC | #5
On Mon, 7 Mar 2011, Amir Goldstein wrote:

> On Tue, Mar 1, 2011 at 1:42 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> > --snip--
> 
> Did you consider the possibility to use QCOW2 format for doing a "tryout"
> fsck on the filesystem with the option to rollback?
> 
> If QCOW2 image is created with the 'backing_file' option set to the origin
> block device (and 'backing_fmt' is set to 'host_device'), then qemu-nbd
> will be able to see the exported image metadata as well as the filesystem
> data.
> 
> You can then do an "intrusive" fsck run on the NBD, mount your filesystem
> (from the NBD) and view the results.
> 
> If you are satisfied with the results, you can apply the fsck changes to the
> origin block device (there is probably a qemu-img command to do that).
> If you are unsatisfied with the results, you can simply discard the image
> or better yet, revert to a QCOW2 snapshot, which you created just before
> running fsck.

But this is something you can do even now. You can mount the qcow2
metadata image without any problems; you just will not see any data.
But I can take a look at this functionality, it seems simple enough.

> 
> Can you provide the performance figures for running fsck over NBD?

Well, unfortunately I do not have access to the same machine anymore,
but I have some simple results which were gathered elsewhere; due to
the lack of proper storage this was done on a loop device (which should
not affect the relative raw and qcow2 results).

[+] fsck raw image
real    0m30.176s
user    0m22.397s
sys     0m2.289s

[+] fsck NBD exported qcow2 image
real    0m31.667s
user    0m21.561s
sys     0m3.293s

So you can see that performance here is a bit worse (about 5%).

Thanks!
-Lukas

Lukas Czerner March 9, 2011, 4:30 p.m. UTC | #6
--snip--
> > 
> > Did you consider the possibility to use QCOW2 format for doing a "tryout"
> > fsck on the filesystem with the option to rollback?
> > 
> > If QCOW2 image is created with the 'backing_file' option set to the origin
> > block device (and 'backing_fmt' is set to 'host_device'), then qemu-nbd
> > will be able to see the exported image metadata as well as the filesystem
> > data.
> > 
> > You can then do an "intrusive" fsck run on the NBD, mount your filesystem
> > (from the NBD) and view the results.
> > 
> > If you are satisfied with the results, you can apply the fsck changes to the
> > origin block device (there is probably a qemu-img command to do that).
> > If you are unsatisfied with the results, you can simply discard the image
> > or better yet, revert to a QCOW2 snapshot, which you created just before
> > running fsck.
> 
> But this is something you can do even now. You can mount the qcow2
> metadata image without any problems, you just will not see any data. But
> I can take a look at this functionality, it seems simple enough.

So I have done this and it works as expected as long as the device
you've created the image from is present in the system, which might not
be true, especially in the case where you are transferring the image to
another machine (the bug-report scenario).

If a device with the same name as the original does not exist in the
system, qemu-nbd is not smart enough to just ignore that fact and mount
the image anyway. And looking at the man page, there is no way to make
it do so.

So, the result is that I am not going to include this in my patches
(unless someone changes my mind :)) as I do not want to create just
another switch for e2image. Also, I fail to see the benefit of it
anyway :).

Thanks!
-Lukas


--snip--
Amir Goldstein March 9, 2011, 5:52 p.m. UTC | #7
On Wed, Mar 9, 2011 at 6:30 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> --snip--
>
> So I have done this and it works as expected as long as the device
> you've created the image from is present in the system, which might not
> be true, especially in the case you are transferring the image to the
> another machine (bug report).
>
> If a device with the same name as the original does not exist in the
> system, qemu-nbd is not smart enough to just ignore that fact and mount
> the image anyway. And looking at the man page, there is no way to do it.
>
> So, the result is that I am not going to include this in my patches (if
> someone does not change my mind :)) as I do not want to create just-another
> switch for e2image. Also, I fail to see the benefit of it anyway :).
>

The benefit, as I see it, is the following capability:
A user with a corrupted fs sends an e2image to an expert,
who examines the file system (so far, that's what we already have).
Then the expert can fix the fs image (say, using hard-core debugfs'ing) and
send it back to the user.
The user can then "test mount" the fixed fs and, if his valuable data is back,
send the other half of the payment to the expert, apply the fix to the origin
device, and go on with his life.

It's a shame that qemu-nbd doesn't play along with that plan, but you can't
blame it, can you...

Anyway, thanks for testing my idea and thanks for QCOW2 e2image :-)
This is just one example of the nice things that the new e2image format
can be leveraged for.

Amir.
Lukas Czerner March 17, 2011, 1:05 p.m. UTC | #8
Hi Ted,

any comment on this ?

Thanks!
-Lukas

On Tue, 1 Mar 2011, Lukas Czerner wrote:

> On Sat, 26 Feb 2011, Ted Ts'o wrote:
> 
> > On Fri, Feb 25, 2011 at 01:49:33PM +0100, Lukas Czerner wrote:
> > > This commit adds QCOW2 support for e2fsck. In order to avoid creating
> > > real QCOW2 image support, which would require creating a lot of code, we
> > > simply bypass the problem by converting the QCOW2 image into a raw image
> > > and then let e2fsck work with the raw image. The conversion itself can be
> > > quite fast, so it should not be a serious slowdown.
> > > 
> > > Add '-Q' option to specify the path for the raw image. If not specified,
> > > the raw image will be saved in the /tmp directory in the format
> > > <qcow2_filename>.raw.XXXXXX, where each X is chosen randomly.
> > > 
> > > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > 
> > If we're just going to convert the qcow2 image into a raw image, that
> > means that if someone sends us an N gigabyte QCOW2 image, it will take
> > lots of time (I'm not sure I agree with the "quite fast" part), and consume
> > an extra N gigabytes of free space to create the raw image.
> > 
> > In that case, I'm not so sure we really want to have a -Q option to
> > e2fsck.  We might be better off simply forcing the use of e2image to
> > convert the image back.
> > 
> > Note that the other reason why it's a lot better to allow e2fsck to
> > work on the qcow2 image directly is that if a customer sends a qcow2
> > metadata-only image from their 3TB raid array, we won't be able to
> > expand that to a raw image because of ext2/3/4's 2TB maximum file size
> > limit.  The qcow2 image might be only a few hundred megabytes, so being
> > able to have e2fsck operate on that image directly would be a huge win.
> > 
> > Adding iomanager support would also allow debugfs to access the qcow2
> > image directly --- also a win.
> > 
> > Whether or not we add the io_manager support right away (eventually I
> > think it's a must-have feature), I don't think having a "decompress a
> > qcow2 image to a sparse raw image" step makes sense as an explicit e2fsck
> > option.  It just clutters up the e2fsck option space, and people might
> > be confused because now e2fsck could break because there wasn't enough
> > free space to decompress the raw image.  Also, e2fsck doesn't delete
> > the /tmp file afterwards, which is bad --- but if it takes a large
> > amount of time to create the raw image, deleting it afterwards is a bit
> > of a waste as well.  Probably better to force the user to manage the
> > converted raw file system image.
> > 
> > 					- Ted
> > 
> 
> Hi Ted,
> 
> sorry for the late answer, but I was running some benchmarks to have some
> numbers to throw at you :). Now let's see how "quite fast" it actually is
> in comparison:
> 
> I have a 6TB raid composed of four drives and I flooded it with lots and
> lots of files (copying /usr/share over and over again) and even created
> some big files (1M, 20M, 1G, 10G), so the number of used inodes on the
> filesystem is 10928139. I am using e2fsck from the top of the master branch.
> 
> Before each step I run:
> sync; echo 3 > /proc/sys/vm/drop_caches
> 
> exporting raw image:
> time .//misc/e2image -r /dev/mapper/vg_raid-lv_stripe image.raw
> 
> 	real    12m3.798s
> 	user    2m53.116s
> 	sys     3m38.430s
> 
> 	6,0G    image.raw
> 
> exporting qcow2 image
> time .//misc/e2image -Q /dev/mapper/vg_raid-lv_stripe image.qcow2
> e2image 1.41.14 (22-Dec-2010)
> 
> 	real    11m55.574s
> 	user    2m50.521s
> 	sys     3m41.515s
> 
> 	6,1G    image.qcow2
> 
> So we can see that the running time is essentially the same, so there is
> no crazy overhead in creating a qcow2 image. Note that the qcow2 image is
> slightly bigger because of all the qcow2-related metadata, and its size
> really depends on the size of the device. Also, I tried to see how long
> it takes to export a bzip2-compressed raw image, but it has been running
> for almost a day now, so it is not even comparable.
> 
> e2fsck on the device:
> time .//e2fsck/e2fsck -fn /dev/mapper/vg_raid-lv_stripe
> 
> 	real    3m9.400s
> 	user    0m47.558s
> 	sys     0m15.098s
> 
> e2fsck on the raw image:
> time .//e2fsck/e2fsck -fn image.raw
> 
> 	real    2m36.767s
> 	user    0m47.613s
> 	sys     0m8.403s
> 
> We can see that e2fsck on the raw image is a bit faster, but that is
> obvious since the drive does not have to seek so much (right?).
> 
> Now converting qcow2 image into raw image:
> time .//misc/e2image -r image.qcow2 image.qcow2.raw
> 
> 	real    1m23.486s
> 	user    0m0.704s
> 	sys     0m22.574s
> 
> It is hard to say if that is "quite fast" or not, but I would say it is
> not terribly slow either. Just out of curiosity, I have tried to convert
> qcow2->raw with the qemu-img convert tool:
> 
> time qemu-img convert -O raw image.qcow2 image.qemu.raw
> ...it has been running for almost an hour now, so it is not comparable either :)
> 
> e2fsck on the qcow2 image:
> time .//e2fsck/e2fsck -fn -Q ./image.qcow2.img.tmp image.qcow2
> 
> 	real    2m47.256s
> 	user    0m41.646s
> 	sys     0m28.618s
> 
> Now that is surprising. Well, not so much actually... We can see that the
> e2fsck check on the qcow2 image, including the qcow2->raw conversion, is a
> bit slower than checking the raw image (by 7%, which is not much) but it is
> still faster than checking the device itself. Now, the reason is probably
> that the raw image we are creating is partially loaded into memory, hence
> accelerating e2fsck. So I do not think that converting the image before the
> check is such a bad idea (especially when you have enough memory :)).
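For context, the check that decides whether a conversion is needed at all comes down to the fixed header prefix defined by the QCOW format: the 4-byte magic "QFI\xfb" followed by a 32-bit big-endian version field. A minimal sketch of such a probe (`probe_qcow2()` is a hypothetical name, not the patch's `qcow2_read_header()`, and it only inspects an in-memory buffer):

```c
/*
 * Sketch of a qcow2 probe, assuming only the fixed header prefix from
 * the QCOW format specification: the 4-byte magic "QFI\xfb" followed
 * by a 32-bit big-endian version field.  probe_qcow2() is a
 * hypothetical helper, not the patch's qcow2_read_header().
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if buf looks like the start of a QCOW2 (version 2) image. */
int probe_qcow2(const unsigned char *buf, size_t len)
{
	static const unsigned char magic[4] = { 'Q', 'F', 'I', 0xfb };
	uint32_t version;

	if (len < 8 || memcmp(buf, magic, sizeof(magic)) != 0)
		return 0;

	/* The version field is stored big-endian on disk. */
	version = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
		  ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
	return version == 2;
}
```

In the patch itself this role is played by `qcow2_read_header()`, which additionally reads and validates the rest of the header.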
> 
> I completely agree that having an io_manager for the qcow2 format would be
> cool, if someone is willing to do that, but I am not convinced that it
> is worth it. Your concerns are all valid and I agree, however I do not
> think e2image is used by regular inexperienced users, so it should not
> confuse them, but that is just a stupid assumption :).
> 
> Also, remember that if you really do not want to convert the image,
> because of the file size limit or whatever, you can always use qemu-nbd to
> attach the qcow2 image to an nbd block device and use that as a regular device.
> 
> Regarding the e2fsck qcow2 support (the -Q option), I think it is
> useful, but I do not really insist on keeping it, and as you said we can
> always force the user to use e2image for the conversion. It is just that
> this way it seems easier to do it automatically. Maybe we can ask the
> user whether he wants to keep the raw image after the check or not?
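One way to sidestep the leftover-/tmp-file question Ted raised would be the classic mkstemp-and-unlink idiom, where the scratch file disappears as soon as the descriptor is closed. A sketch under the assumption that descriptor access alone is enough (`open_scratch_raw()` is a hypothetical helper; the patched e2fsck actually needs a pathname to hand to `blkid_get_devname()`, so this is a design note rather than a drop-in fix):

```c
/*
 * Sketch of the mkstemp-and-unlink idiom: the directory entry is
 * removed immediately, so nothing is left behind in /tmp even if the
 * checker crashes, and the data lives only as long as the open
 * descriptor.  open_scratch_raw() is a hypothetical helper, not part
 * of the patch.
 */
#include <stdlib.h>
#include <unistd.h>

int open_scratch_raw(void)
{
	char name[] = "/tmp/e2fsck-raw.XXXXXX";
	int fd = mkstemp(name);	/* creates and opens a unique file */

	if (fd < 0)
		return -1;
	unlink(name);		/* directory entry gone; fd stays valid */
	return fd;
}
```

The trade-off is that the converted raw image can then never be reused for a second run, which cuts against the "keep the raw image after the check" idea above.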
> 
> Regarding the separate qcow2.h file and the "qcow2_" prefix: I have done this
> because I am using this code from both e2image and e2fsck, so it seemed
> convenient to have it in a separate header; however, I guess I can move it
> into e2image.c and e2image.h if you want.
> 
> So, what do you think?
> 
> Thanks!
> -Lukas
> 

Patch

diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index 3fb15e6..36d1492 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -8,7 +8,7 @@  e2fsck \- check a Linux ext2/ext3/ext4 file system
 .SH SYNOPSIS
 .B e2fsck
 [
-.B \-pacnyrdfkvtDFV
+.B \-pacnyrdfkvtDFVQ
 ]
 [
 .B \-b
@@ -263,6 +263,12 @@  will print a description of the problem and then exit with the value 4
 logically or'ed into the exit code.  (See the \fBEXIT CODE\fR section.)
 This option is normally used by the system's boot scripts.  It may not 
 be specified at the same time as the
+.TP
+.BI \-Q " filename"
+When e2fsck checks a QCOW2 image, it has to convert it into a raw image
+first.  This option specifies the filename for the raw image.  If this
+option is omitted, the raw image will be created in the /tmp directory.
+.TP
 .B \-n
 or
 .B \-y
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 7eb269c..acfff47 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -19,6 +19,7 @@ 
 #include <fcntl.h>
 #include <ctype.h>
 #include <time.h>
+#include <limits.h>
 #ifdef HAVE_SIGNAL_H
 #include <signal.h>
 #endif
@@ -53,6 +54,7 @@  extern int optind;
 #include "e2p/e2p.h"
 #include "e2fsck.h"
 #include "problem.h"
+#include "ext2fs/qcow2.h"
 #include "../version.h"
 
 /* Command line options */
@@ -626,8 +628,10 @@  static const char *config_fn[] = { ROOT_SYSCONFDIR "/e2fsck.conf", 0 };
 
 static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 {
-	int		flush = 0;
-	int		c, fd;
+	int		flush = 0, raw_name_set = 0;
+	int		c, fd, qcow2_fd;
+	struct		ext2_qcow2_hdr *header = NULL;
+	char		*d_name, raw_name[PATH_MAX];
 #ifdef MTRACE
 	extern void	*mallwatch;
 #endif
@@ -667,7 +671,7 @@  static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 		ctx->program_name = *argv;
 	else
 		ctx->program_name = "e2fsck";
-	while ((c = getopt (argc, argv, "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDk")) != EOF)
+	while ((c = getopt (argc, argv, "panyrcC:B:dE:fvtFVM:b:I:j:P:l:L:N:SsDkQ:")) != EOF)
 		switch (c) {
 		case 'C':
 			ctx->progress = e2fsck_update_progress;
@@ -790,6 +794,10 @@  static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 		case 'k':
 			keep_bad_blocks++;
 			break;
+		case 'Q':
+			raw_name_set++;
+			snprintf(raw_name, PATH_MAX, "%s", optarg);
+			break;
 		default:
 			usage(ctx);
 		}
@@ -819,10 +827,66 @@  static errcode_t PRS(int argc, char *argv[], e2fsck_t *ret_ctx)
 	ctx->io_options = strchr(argv[optind], '?');
 	if (ctx->io_options)
 		*ctx->io_options++ = 0;
-	ctx->filesystem_name = blkid_get_devname(ctx->blkid, argv[optind], 0);
+
+	d_name = argv[optind];
+
+	/* Check whether the device or image is QCOW2 */
+#ifdef HAVE_OPEN64
+	qcow2_fd = open64(d_name, O_RDONLY);
+#else
+	qcow2_fd = open(d_name, O_RDONLY);
+#endif
+	if (qcow2_fd < 0)
+		goto skip_qcow2;
+
+	header = qcow2_read_header(qcow2_fd, d_name);
+	if (header) {
+		int raw_fd;
+		char *path;
+		/*
+		 * We have qcow2 image, so need to convert it into raw
+		 * image, then pass its filename into further e2fsck code.
+		 */
+		if (!raw_name_set) {
+			if (!(path = strdup(d_name)))
+				fatal_error(ctx, "Could not allocate path");
+			snprintf(raw_name, PATH_MAX, "/tmp/%s.raw.XXXXXX",
+				 basename(path));
+			free(path);
+			raw_fd = mkstemp(raw_name);
+			printf(_("QCOW2 image detected! Converting into raw"
+				 " image = %s\n"), raw_name);
+		} else {
+#ifdef HAVE_OPEN64
+			raw_fd = open64(raw_name, O_CREAT|O_TRUNC|O_WRONLY, 0600);
+#else
+			raw_fd = open(raw_name, O_CREAT|O_TRUNC|O_WRONLY, 0600);
+#endif
+		}
+
+		if (raw_fd < 0) {
+			com_err(ctx->program_name, errno,
+				_("while opening raw image file %s"),raw_name);
+			fatal_error(ctx, 0);
+		}
+
+		retval = qcow2_write_raw_image(qcow2_fd, raw_fd, header);
+		if (retval) {
+			com_err(ctx->program_name, retval,
+				_("while converting qcow image %s into "
+				  "raw image %s"),d_name, raw_name);
+			fatal_error(ctx, 0);
+		}
+		close(raw_fd);
+		d_name = raw_name;
+	}
+	close(qcow2_fd);
+
+skip_qcow2:
+	ctx->filesystem_name = blkid_get_devname(ctx->blkid, d_name, 0);
 	if (!ctx->filesystem_name) {
 		com_err(ctx->program_name, 0, _("Unable to resolve '%s'"),
-			argv[optind]);
+			d_name);
 		fatal_error(ctx, 0);
 	}
 	if (extended_opts)