diff mbox

HMP: snapshot_blkdev can not consider //root/sn1 and /root/sn1 as the same file

Message ID 52864C79.20800@gmail.com
State New
Headers show

Commit Message

lijun Nov. 15, 2013, 4:31 p.m. UTC
From: Jun Li <junmuzi@gmail.com>

Hi all,

snapshot_blkdev can not consider //root/sn1 and /root/sn1 as the same 
file. when file /root/sn1 is the base file, do snapshot using file 
//root/sn1, qemu consider it as a new file. So this will rewrite the 
base file.

Signed-off-by: Jun Li <junmuzi@gmail.com>

snapshot
@@ -970,6 +972,18 @@ void hmp_snapshot_blkdev(Monitor *mon, c
          return;
      }

+    /* Delete duplicate '/' in filename. */
+    while (*p != '\0') {
+        if (*p == '/') {
+            while (*(p + count++) == '/') {
+                /* do null. */
+            }
+            strcpy((char *)(p + 1), (char *)(p + count - 1));
+            count = 1;
+        }
+        p++;
+    }
+
      mode = reuse ? NEW_IMAGE_MODE_EXISTING : 
NEW_IMAGE_MODE_ABSOLUTE_PATHS;
      qmp_blockdev_snapshot_sync(device, filename, !!format, format,
                                 true, mode, &errp);

Comments

Max Reitz Nov. 15, 2013, 4:42 p.m. UTC | #1
On 15.11.2013 17:31, lijun wrote:
> From: Jun Li <junmuzi@gmail.com>
>
> Hi all,
>
> snapshot_blkdev can not consider //root/sn1 and /root/sn1 as the same
> file. when file /root/sn1 is the base file, do snapshot using file
> //root/sn1, qemu consider it as a new file. So this will rewrite the
> base file.

Actually, the same problem can occur anyway if you have a path with a
couple of “.” and “..” in it – or even just a hardlink. Thus, to be
completely safe, we'd have to check whether the snapshot file (if it
already exists) has a different inode number and/or is located on a
different filesystem.

> Signed-off-by: Jun Li <junmuzi@gmail.com>
>
> --- a/hmp.c 2013-11-15 23:15:46.733361130 +0800
> +++ b/hmp.c 2013-11-16 00:20:23.972248509 +0800
> @@ -957,10 +957,12 @@ void hmp_snapshot_blkdev(Monitor *mon, c
> {
> const char *device = qdict_get_str(qdict, "device");
> const char *filename = qdict_get_try_str(qdict, "snapshot-file");
> + const char *p = filename;
> const char *format = qdict_get_try_str(qdict, "format");
> int reuse = qdict_get_try_bool(qdict, "reuse", 0);
> enum NewImageMode mode;
> Error *errp = NULL;
> + int count = 1;
>
> if (!filename) {
> /* In the future, if 'snapshot-file' is not specified, the snapshot
> @@ -970,6 +972,18 @@ void hmp_snapshot_blkdev(Monitor *mon, c
> return;
> }
>
> + /* Delete duplicate '/' in filename. */
> + while (*p != '\0') {
> + if (*p == '/') {
> + while (*(p + count++) == '/') {
> + /* do null. */
> + }
> + strcpy((char *)(p + 1), (char *)(p + count - 1));

Casting a const char * to a char * seems very bogus to me. Using
g_strdup or something before is probably a better idea. Also, don't use
strcpy but memmove (since both strings overlap).

All in all, using realpath() is probably the better idea anyway (it'd
also resolve the . and .. problem (and symlinks), though it still won't
prevent hardlinks from messing things up).

Max

> + count = 1;
> + }
> + p++;
> + }
> +
> mode = reuse ? NEW_IMAGE_MODE_EXISTING : NEW_IMAGE_MODE_ABSOLUTE_PATHS;
> qmp_blockdev_snapshot_sync(device, filename, !!format, format,
> true, mode, &errp);
>
Eric Blake Nov. 15, 2013, 5:21 p.m. UTC | #2
On 11/15/2013 09:42 AM, Max Reitz wrote:

> Actually, the same problem can occur anyway if you have a path with a
> couple of “.” and “..” in it – or even just a hardlink. Thus, to be
> completely safe, we'd have to check whether the snapshot file (if it
> already exists) has a different inode number and/or is located on a
> different filesystem.

See also the recent thread on detecting backing file loops - this should
be part of that solution (if it isn't already):
https://lists.gnu.org/archive/html/qemu-devel/2013-11/msg01840.html


Backing file loops might get away with string-only detection; but then I
start to worry that the string-only detection will misbehave on relative
paths (consider: /dir1/a <- /dir1/b [backed by relative 'a'] <- /dir2/a
[backed by absolute /dir1/b] <- /dir2/a [backed by relative 'a']);
devno/inode pairs are the only reliable to detect loops when only the
filesystem is involved, but then you also introduce network protocols
(and there, it's worse: gluster://host1/vol/img and
gluster://host2/vol/img could be the same file, if host1 and host2 are
part of the same storage cluster, but there is no devno/inode to tell
you that).
Stefan Hajnoczi Nov. 18, 2013, 1:31 p.m. UTC | #3
On Fri, Nov 15, 2013 at 10:21:40AM -0700, Eric Blake wrote:
> On 11/15/2013 09:42 AM, Max Reitz wrote:
> 
> > Actually, the same problem can occur anyway if you have a path with a
> > couple of “.” and “..” in it – or even just a hardlink. Thus, to be
> > completely safe, we'd have to check whether the snapshot file (if it
> > already exists) has a different inode number and/or is located on a
> > different filesystem.
> 
> See also the recent thread on detecting backing file loops - this should
> be part of that solution (if it isn't already):
> https://lists.gnu.org/archive/html/qemu-devel/2013-11/msg01840.html
> 
> 
> Backing file loops might get away with string-only detection; but then I
> start to worry that the string-only detection will misbehave on relative
> paths (consider: /dir1/a <- /dir1/b [backed by relative 'a'] <- /dir2/a
> [backed by absolute /dir1/b] <- /dir2/a [backed by relative 'a']);
> devno/inode pairs are the only reliable to detect loops when only the
> filesystem is involved, but then you also introduce network protocols
> (and there, it's worse: gluster://host1/vol/img and
> gluster://host2/vol/img could be the same file, if host1 and host2 are
> part of the same storage cluster, but there is no devno/inode to tell
> you that).

Detecting identical "files" is not a problem that can be solved in the
general case.  Once network storage comes into play we don't have the
ability to check file identity.

Users can misconfigure QEMU if they try hard enough.  Filename string
manipulation is very error-prone and I'd rather just avoid it than
provide a false sense of security.

What's the real use case for this patch?

Stefan
lijun Dec. 9, 2013, 6:06 a.m. UTC | #4
If mount a local file(disk) in two different dirctories, it is similar to the
network storage. Detecting identical "files" is still a problem.
Such as:
dd if=/dev/zero of=aa bs=1M count=10
mkfs.ext4 aa
Then mount aa to two directories.
mount aa /mnt/dir1
mount aa /mnt/dir2
# tune2fs -l aa
tune2fs 1.42.5 (29-Jul-2012)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          bb095a44-e896-4949-b5f4-9d9468a7178e
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index
filetype needs_recovery extent flex_bg sparse_super huge_file uninit_bg
dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              2560
Block count:              10240
Reserved block count:     512
Free blocks:              8795
Free inodes:              2549
First block:              1
Block size:               1024
Fragment size:            1024
Reserved GDT blocks:      39
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         1280
Inode blocks per group:   160
Flex block group size:    16
Filesystem created:       Mon Dec  9 14:00:46 2013
Last mount time:          Mon Dec  9 14:01:35 2013
Last write time:          Mon Dec  9 14:01:35 2013
Mount count:              2
Maximum mount count:      -1
Last checked:             Mon Dec  9 14:00:46 2013
Check interval:           0 (<none>)
Lifetime writes:          1150 kB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:              128
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      efe7990e-1414-4ede-9488-4f4d369fe7f5
Journal backup:           inode blocks
---
Can create the same file in /mnt/dir1 and /mnt/dir2.
-----
For network storage, usually mounted via FuseFS(such as glusterfs). Can not
check file identity when mount in more than a directory. So maybe it is the
same as the network storage file.

Best wishes,
Jun Li



2013/11/18 Stefan Hajnoczi <stefanha@gmail.com>

> On Fri, Nov 15, 2013 at 10:21:40AM -0700, Eric Blake wrote:
> > On 11/15/2013 09:42 AM, Max Reitz wrote:
> >
> > > Actually, the same problem can occur anyway if you have a path with a
> > > couple of “.” and “..” in it – or even just a hardlink. Thus, to be
> > > completely safe, we'd have to check whether the snapshot file (if it
> > > already exists) has a different inode number and/or is located on a
> > > different filesystem.
> >
> > See also the recent thread on detecting backing file loops - this should
> > be part of that solution (if it isn't already):
> > https://lists.gnu.org/archive/html/qemu-devel/2013-11/msg01840.html
> >
> >
> > Backing file loops might get away with string-only detection; but then I
> > start to worry that the string-only detection will misbehave on relative
> > paths (consider: /dir1/a <- /dir1/b [backed by relative 'a'] <- /dir2/a
> > [backed by absolute /dir1/b] <- /dir2/a [backed by relative 'a']);
> > devno/inode pairs are the only reliable to detect loops when only the
> > filesystem is involved, but then you also introduce network protocols
> > (and there, it's worse: gluster://host1/vol/img and
> > gluster://host2/vol/img could be the same file, if host1 and host2 are
> > part of the same storage cluster, but there is no devno/inode to tell
> > you that).
>
> Detecting identical "files" is not a problem that can be solved in the
> general case.  Once network storage comes into play we don't have the
> ability to check file identity.
>
> Users can misconfigure QEMU if they try hard enough.  Filename string
> manipulation is very error-prone and I'd rather just avoid it than
> provide a false sense of security.
>
> What's the real use case for this patch?
>
> Stefan
>
Stefan Hajnoczi Dec. 9, 2013, 12:17 p.m. UTC | #5
On Mon, Dec 09, 2013 at 02:06:21PM +0800, jun muzi wrote:
> If mount a local file(disk) in two different dirctories, it is similar to the
> network storage. Detecting identical "files" is still a problem.
> Such as:
> dd if=/dev/zero of=aa bs=1M count=10
> mkfs.ext4 aa
> Then mount aa to two directories.
> mount aa /mnt/dir1
> mount aa /mnt/dir2

(Off-topic, but you should never mount a regular file system read/write
more than once at a time.  This example is wrong, it can corrupt the
file system.)

My question was about the bug.  What is the problem you are seeing and
how do you reproduce it (including the HMP commands)?

Stefan
lijun Dec. 10, 2013, 1:58 p.m. UTC | #6
On 12/09/2013 08:17 PM, Stefan Hajnoczi wrote:
> On Mon, Dec 09, 2013 at 02:06:21PM +0800, jun muzi wrote:
>> If mount a local file(disk) in two different dirctories, it is similar to the
>> network storage. Detecting identical "files" is still a problem.
>> Such as:
>> dd if=/dev/zero of=aa bs=1M count=10
>> mkfs.ext4 aa
>> Then mount aa to two directories.
>> mount aa /mnt/dir1
>> mount aa /mnt/dir2
> (Off-topic, but you should never mount a regular file system read/write
> more than once at a time.  This example is wrong, it can corrupt the
> file system.)
yes, I should not to mount a regular file more than once at a time. But 
I just want to say the network storage will be mount more than once at a 
time.
Such as :
Create a nfs server, and mount it as followings:
# mount $IP:/home/ /mnt/dir1
# mount $IP:/home/ /mnt/dir2
If create a new file named test in /mnt/dir1, /mnt/dir2 will know this 
new file "test". So the linux kernel can know the file in /mnt/dir1 and 
/mnt/dir2 whether it is the same file or not.
Based on above info, qemu-kvm can know whether file1 in /mnt/dir1 and 
file2 in /mnt/dir2 are the same file or not. How do qemu-kvm distinguish 
between file1 and file2 is the problem we should solve. To solve this 
issue, maybe could refer to the linux kernel. But I don't know now.
----
How to fix this bug:
1, realpath() ---- string-only detection
2, hard/soft link. ----maybe ref 
https://lists.gnu.org/archive/html/qemu-devel/2013-11/msg01840.html
3, distinguish network storage file mounted on different dir is the same 
file or not.
> My question was about the bug.  What is the problem you are seeing and
> how do you reproduce it (including the HMP commands)?
Now I am seeing is how to "distinguish network storage file mounted on 
different dir is the same file or not".
Reproduce this bug using nfs mounted in different directories:
Create a nfs server, and mount it as followings:
# mount $IP:/home/ /mnt/dir1
# mount $IP:/home/ /mnt/dir2
(qemu) snapshot_blkdev drive-scsi0-0-0  /mnt/dir1/sn1
Formatting '/mnt/dir1/sn1', fmt=qcow2 size=32212254720 
backing_file='/home/juli/RHEL-7.0-20131127.1.qcow2_v3' 
backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
(qemu) snapshot_blkdev drive-scsi0-0-0  /mnt/dir2/sn1
Formatting '/mnt/dir2/sn1', fmt=qcow2 size=32212254720 
backing_file='/mnt/dir1/sn1' backing_fmt='qcow2' encryption=off 
cluster_size=65536 lazy_refcounts=off
(qemu) info block
drive-scsi0-0-0: removable=0 io-status=ok file=/mnt/dir2/sn1 
backing_file=/mnt/dir1/sn1 backing_file_depth=2 ro=0 drv=qcow2 
encrypted=0 bps=0 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
drive-virtio-disk1: removable=0 io-status=ok 
file=/home/juli/data_disk.qcow2_v3 ro=0 drv=qcow2 encrypted=0 bps=0 
bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
ide1-cd0: removable=1 locked=0 tray-open=0 io-status=ok [not inserted]
floppy0: removable=1 locked=0 tray-open=0 [not inserted]
sd0: removable=1 locked=0 tray-open=0 [not inserted]
(qemu) snapshot_blkdev drive-scsi0-0-0  /mnt/dir2/sn2
Could not open '/mnt/dir2/sn1': Could not set AIO state: Too many open 
files: Too many open files

Best Regards,
Jun Li
>
> Stefan
Stefan Hajnoczi Jan. 2, 2014, 8:55 a.m. UTC | #7
On Tue, Dec 10, 2013 at 09:58:44PM +0800, lijun wrote:
> On 12/09/2013 08:17 PM, Stefan Hajnoczi wrote:
> >On Mon, Dec 09, 2013 at 02:06:21PM +0800, jun muzi wrote:
> >My question was about the bug.  What is the problem you are seeing and
> >how do you reproduce it (including the HMP commands)?
> Now I am seeing is how to "distinguish network storage file mounted
> on different dir is the same file or not".
> Reproduce this bug using nfs mounted in different directories:
> Create a nfs server, and mount it as followings:
> # mount $IP:/home/ /mnt/dir1
> # mount $IP:/home/ /mnt/dir2
> (qemu) snapshot_blkdev drive-scsi0-0-0  /mnt/dir1/sn1
> Formatting '/mnt/dir1/sn1', fmt=qcow2 size=32212254720
> backing_file='/home/juli/RHEL-7.0-20131127.1.qcow2_v3'
> backing_fmt='qcow2' encryption=off cluster_size=65536
> lazy_refcounts=off
> (qemu) snapshot_blkdev drive-scsi0-0-0  /mnt/dir2/sn1
> Formatting '/mnt/dir2/sn1', fmt=qcow2 size=32212254720
> backing_file='/mnt/dir1/sn1' backing_fmt='qcow2' encryption=off
> cluster_size=65536 lazy_refcounts=off
> (qemu) info block
> drive-scsi0-0-0: removable=0 io-status=ok file=/mnt/dir2/sn1
> backing_file=/mnt/dir1/sn1 backing_file_depth=2 ro=0 drv=qcow2
> encrypted=0 bps=0 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
> drive-virtio-disk1: removable=0 io-status=ok
> file=/home/juli/data_disk.qcow2_v3 ro=0 drv=qcow2 encrypted=0 bps=0
> bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
> ide1-cd0: removable=1 locked=0 tray-open=0 io-status=ok [not inserted]
> floppy0: removable=1 locked=0 tray-open=0 [not inserted]
> sd0: removable=1 locked=0 tray-open=0 [not inserted]
> (qemu) snapshot_blkdev drive-scsi0-0-0  /mnt/dir2/sn2
> Could not open '/mnt/dir2/sn1': Could not set AIO state: Too many
> open files: Too many open files

Okay, I think we got side-tracked worrying about identifying identical
files.  The problem in your example is more fundamental.

Here is what should have happened:

  (qemu) snapshot_blkdev drive-scsi0-0-0  /mnt/dir2/sn1
  Could not create '/mnt/dir2/sn1': File exists

The file was previously created with the name /mnt/dir1/sn1.  Running
snapshot_blkdev with /mnt/dir2/sn1 clobbers the file in your example.
This is bad because it corrupts the backing chain.

If QEMU uses O_CREAT | O_EXCL open(2) flags then creating the snapshot
file fails and the user will not mistakingly overwrite sn1.

However, QEMU does not use O_EXCL to create image files anywhere today
(qemu-img or QEMU monitor commands).  Returning an error is not
backwards-compatible since existing users might rely on clobbering files
(there are safe cases where it can be useful).

We could add an option to commands that create files but I'm not sure if
it's worth the effort since human users who are most at risk probably
won't provide this new option...

That said, if you want to add an O_EXCL option to bdrv_img_create(),
qemu-img commands, and monitor commands, I'm happy to review patches.

Stefan
Eric Blake Jan. 2, 2014, 1:06 p.m. UTC | #8
On 01/02/2014 01:55 AM, Stefan Hajnoczi wrote:

> Okay, I think we got side-tracked worrying about identifying identical
> files.  The problem in your example is more fundamental.
> 
> Here is what should have happened:
> 
>   (qemu) snapshot_blkdev drive-scsi0-0-0  /mnt/dir2/sn1
>   Could not create '/mnt/dir2/sn1': File exists
> 
> The file was previously created with the name /mnt/dir1/sn1.  Running
> snapshot_blkdev with /mnt/dir2/sn1 clobbers the file in your example.
> This is bad because it corrupts the backing chain.
> 
> If QEMU uses O_CREAT | O_EXCL open(2) flags then creating the snapshot
> file fails and the user will not mistakingly overwrite sn1.
> 
> However, QEMU does not use O_EXCL to create image files anywhere today
> (qemu-img or QEMU monitor commands).  Returning an error is not
> backwards-compatible since existing users might rely on clobbering files
> (there are safe cases where it can be useful).

In fact, libvirt _requires_ that you clobber existing image files - when
libvirt drives qemu with SELinux labelling enabled, then qemu cannot
create files, but can only open already existing files that have already
been labelled.  So libvirt would need a way to avoid O_EXCL, even if you
add it and make it default (and if you change the default to O_EXCL, you
also need to provide a way to probe for the new switch that can bypass
that new default).

> 
> We could add an option to commands that create files but I'm not sure if
> it's worth the effort since human users who are most at risk probably
> won't provide this new option...

Changing to be safe by default and requiring a new option to allow reuse
of existing files might be okay; but it will be backwards incompatible.
 Keeping existing behavior and requiring a new option to turn on O_EXCL
for safety is back-compat friendly, but is the very situation where
users aren't going to know they need to use the new option, so you've
gained no safety.
diff mbox

Patch

--- a/hmp.c    2013-11-15 23:15:46.733361130 +0800
+++ b/hmp.c    2013-11-16 00:20:23.972248509 +0800
@@ -957,10 +957,12 @@  void hmp_snapshot_blkdev(Monitor *mon, c
  {
      const char *device = qdict_get_str(qdict, "device");
      const char *filename = qdict_get_try_str(qdict, "snapshot-file");
+    const char *p = filename;
      const char *format = qdict_get_try_str(qdict, "format");
      int reuse = qdict_get_try_bool(qdict, "reuse", 0);
      enum NewImageMode mode;
      Error *errp = NULL;
+    int count = 1;

      if (!filename) {
          /* In the future, if 'snapshot-file' is not specified, the