Message ID: 52864C79.20800@gmail.com
State: New
On 15.11.2013 17:31, lijun wrote:
> From: Jun Li <junmuzi@gmail.com>
>
> Hi all,
>
> snapshot_blkdev cannot treat //root/sn1 and /root/sn1 as the same
> file. When /root/sn1 is the base file and a snapshot is taken using
> //root/sn1, qemu considers it a new file, so the base file gets
> overwritten.

Actually, the same problem can occur anyway if you have a path with a
couple of “.” and “..” in it – or even just a hardlink. Thus, to be
completely safe, we'd have to check whether the snapshot file (if it
already exists) has a different inode number and/or is located on a
different filesystem.

> Signed-off-by: Jun Li <junmuzi@gmail.com>
>
> --- a/hmp.c 2013-11-15 23:15:46.733361130 +0800
> +++ b/hmp.c 2013-11-16 00:20:23.972248509 +0800
> @@ -957,10 +957,12 @@ void hmp_snapshot_blkdev(Monitor *mon, c
>  {
>      const char *device = qdict_get_str(qdict, "device");
>      const char *filename = qdict_get_try_str(qdict, "snapshot-file");
> +    const char *p = filename;
>      const char *format = qdict_get_try_str(qdict, "format");
>      int reuse = qdict_get_try_bool(qdict, "reuse", 0);
>      enum NewImageMode mode;
>      Error *errp = NULL;
> +    int count = 1;
>
>      if (!filename) {
>          /* In the future, if 'snapshot-file' is not specified, the snapshot
> @@ -970,6 +972,18 @@ void hmp_snapshot_blkdev(Monitor *mon, c
>          return;
>      }
>
> +    /* Delete duplicate '/' in filename. */
> +    while (*p != '\0') {
> +        if (*p == '/') {
> +            while (*(p + count++) == '/') {
> +                /* do null. */
> +            }
> +            strcpy((char *)(p + 1), (char *)(p + count - 1));

Casting a const char * to a char * seems very bogus to me. Using
g_strdup or something before is probably a better idea. Also, don't use
strcpy but memmove (since both strings overlap).

All in all, using realpath() is probably the better idea anyway (it'd
also resolve the . and .. problem (and symlinks), though it still won't
prevent hardlinks from messing things up).

Max

> +            count = 1;
> +        }
> +        p++;
> +    }
> +
>      mode = reuse ? NEW_IMAGE_MODE_EXISTING : NEW_IMAGE_MODE_ABSOLUTE_PATHS;
>      qmp_blockdev_snapshot_sync(device, filename, !!format, format,
>                                 true, mode, &errp);
On 11/15/2013 09:42 AM, Max Reitz wrote:
> Actually, the same problem can occur anyway if you have a path with a
> couple of “.” and “..” in it – or even just a hardlink. Thus, to be
> completely safe, we'd have to check whether the snapshot file (if it
> already exists) has a different inode number and/or is located on a
> different filesystem.

See also the recent thread on detecting backing file loops - this should
be part of that solution (if it isn't already):
https://lists.gnu.org/archive/html/qemu-devel/2013-11/msg01840.html

Backing file loops might get away with string-only detection; but then I
start to worry that the string-only detection will misbehave on relative
paths (consider: /dir1/a <- /dir1/b [backed by relative 'a'] <- /dir2/a
[backed by absolute /dir1/b] <- /dir2/a [backed by relative 'a']);
devno/inode pairs are the only reliable way to detect loops when only
the filesystem is involved, but then you also introduce network
protocols (and there, it's worse: gluster://host1/vol/img and
gluster://host2/vol/img could be the same file, if host1 and host2 are
part of the same storage cluster, but there is no devno/inode to tell
you that).
On Fri, Nov 15, 2013 at 10:21:40AM -0700, Eric Blake wrote:
> On 11/15/2013 09:42 AM, Max Reitz wrote:
> > Actually, the same problem can occur anyway if you have a path with a
> > couple of “.” and “..” in it – or even just a hardlink. Thus, to be
> > completely safe, we'd have to check whether the snapshot file (if it
> > already exists) has a different inode number and/or is located on a
> > different filesystem.
>
> See also the recent thread on detecting backing file loops - this should
> be part of that solution (if it isn't already):
> https://lists.gnu.org/archive/html/qemu-devel/2013-11/msg01840.html
>
> Backing file loops might get away with string-only detection; but then I
> start to worry that the string-only detection will misbehave on relative
> paths (consider: /dir1/a <- /dir1/b [backed by relative 'a'] <- /dir2/a
> [backed by absolute /dir1/b] <- /dir2/a [backed by relative 'a']);
> devno/inode pairs are the only reliable way to detect loops when only
> the filesystem is involved, but then you also introduce network
> protocols (and there, it's worse: gluster://host1/vol/img and
> gluster://host2/vol/img could be the same file, if host1 and host2 are
> part of the same storage cluster, but there is no devno/inode to tell
> you that).

Detecting identical "files" is not a problem that can be solved in the
general case. Once network storage comes into play we don't have the
ability to check file identity.

Users can misconfigure QEMU if they try hard enough. Filename string
manipulation is very error-prone and I'd rather just avoid it than
provide a false sense of security.

What's the real use case for this patch?

Stefan
If you mount a local file (disk) in two different directories, it is
similar to network storage: detecting identical "files" is still a
problem. For example:

  dd if=/dev/zero of=aa bs=1M count=10
  mkfs.ext4 aa

Then mount aa in two directories:

  mount aa /mnt/dir1
  mount aa /mnt/dir2

  # tune2fs -l aa
  tune2fs 1.42.5 (29-Jul-2012)
  Filesystem volume name:   <none>
  Last mounted on:          <not available>
  Filesystem UUID:          bb095a44-e896-4949-b5f4-9d9468a7178e
  Filesystem magic number:  0xEF53
  Filesystem revision #:    1 (dynamic)
  Filesystem features:      has_journal ext_attr resize_inode dir_index
                            filetype needs_recovery extent flex_bg
                            sparse_super huge_file uninit_bg dir_nlink
                            extra_isize
  Filesystem flags:         signed_directory_hash
  Default mount options:    user_xattr acl
  Filesystem state:         clean
  Errors behavior:          Continue
  Filesystem OS type:       Linux
  Inode count:              2560
  Block count:              10240
  Reserved block count:     512
  Free blocks:              8795
  Free inodes:              2549
  First block:              1
  Block size:               1024
  Fragment size:            1024
  Reserved GDT blocks:      39
  Blocks per group:         8192
  Fragments per group:      8192
  Inodes per group:         1280
  Inode blocks per group:   160
  Flex block group size:    16
  Filesystem created:       Mon Dec  9 14:00:46 2013
  Last mount time:          Mon Dec  9 14:01:35 2013
  Last write time:          Mon Dec  9 14:01:35 2013
  Mount count:              2
  Maximum mount count:      -1
  Last checked:             Mon Dec  9 14:00:46 2013
  Check interval:           0 (<none>)
  Lifetime writes:          1150 kB
  Reserved blocks uid:      0 (user root)
  Reserved blocks gid:      0 (group root)
  First inode:              11
  Inode size:               128
  Journal inode:            8
  Default directory hash:   half_md4
  Directory Hash Seed:      efe7990e-1414-4ede-9488-4f4d369fe7f5
  Journal backup:           inode blocks

Now the "same" file can be created in both /mnt/dir1 and /mnt/dir2.

As for network storage, it is usually mounted via a FUSE filesystem
(such as glusterfs), and file identity cannot be checked when it is
mounted in more than one directory, so this local case behaves much
like the network storage one.

Best wishes,
Jun Li

2013/11/18 Stefan Hajnoczi <stefanha@gmail.com>

> On Fri, Nov 15, 2013 at 10:21:40AM -0700, Eric Blake wrote:
> > On 11/15/2013 09:42 AM, Max Reitz wrote:
> > > Actually, the same problem can occur anyway if you have a path
> > > with a couple of “.” and “..” in it – or even just a hardlink.
> > > Thus, to be completely safe, we'd have to check whether the
> > > snapshot file (if it already exists) has a different inode number
> > > and/or is located on a different filesystem.
> >
> > See also the recent thread on detecting backing file loops - this
> > should be part of that solution (if it isn't already):
> > https://lists.gnu.org/archive/html/qemu-devel/2013-11/msg01840.html
> >
> > Backing file loops might get away with string-only detection; but
> > then I start to worry that the string-only detection will misbehave
> > on relative paths (consider: /dir1/a <- /dir1/b [backed by relative
> > 'a'] <- /dir2/a [backed by absolute /dir1/b] <- /dir2/a [backed by
> > relative 'a']); devno/inode pairs are the only reliable way to
> > detect loops when only the filesystem is involved, but then you
> > also introduce network protocols (and there, it's worse:
> > gluster://host1/vol/img and gluster://host2/vol/img could be the
> > same file, if host1 and host2 are part of the same storage cluster,
> > but there is no devno/inode to tell you that).
>
> Detecting identical "files" is not a problem that can be solved in the
> general case. Once network storage comes into play we don't have the
> ability to check file identity.
>
> Users can misconfigure QEMU if they try hard enough. Filename string
> manipulation is very error-prone and I'd rather just avoid it than
> provide a false sense of security.
>
> What's the real use case for this patch?
>
> Stefan
On Mon, Dec 09, 2013 at 02:06:21PM +0800, jun muzi wrote:
> If mount a local file(disk) in two different dirctories, it is similar
> to the network storage. Detecting identical "files" is still a problem.
> Such as:
> dd if=/dev/zero of=aa bs=1M count=10
> mkfs.ext4 aa
> Then mount aa to two directories.
> mount aa /mnt/dir1
> mount aa /mnt/dir2

(Off-topic, but you should never mount a regular file system read/write
more than once at a time. This example is wrong, it can corrupt the
file system.)

My question was about the bug. What is the problem you are seeing and
how do you reproduce it (including the HMP commands)?

Stefan
On 12/09/2013 08:17 PM, Stefan Hajnoczi wrote:
> On Mon, Dec 09, 2013 at 02:06:21PM +0800, jun muzi wrote:
>> If mount a local file(disk) in two different dirctories, it is similar
>> to the network storage. Detecting identical "files" is still a problem.
>> Such as:
>> dd if=/dev/zero of=aa bs=1M count=10
>> mkfs.ext4 aa
>> Then mount aa to two directories.
>> mount aa /mnt/dir1
>> mount aa /mnt/dir2
> (Off-topic, but you should never mount a regular file system read/write
> more than once at a time. This example is wrong, it can corrupt the
> file system.)

Yes, I should not mount a regular file more than once at a time. But I
just want to point out that network storage can be mounted more than
once at a time. For example, create an NFS server and mount it as
follows:

  # mount $IP:/home/ /mnt/dir1
  # mount $IP:/home/ /mnt/dir2

If a new file named "test" is created in /mnt/dir1, /mnt/dir2 will see
this new file. So the Linux kernel can tell whether a file in /mnt/dir1
and a file in /mnt/dir2 are the same file or not. Based on the above,
qemu-kvm should also be able to tell whether file1 in /mnt/dir1 and
file2 in /mnt/dir2 are the same file. How qemu-kvm distinguishes file1
from file2 is the problem we should solve. To solve it, maybe we could
refer to what the Linux kernel does, but I don't know how yet.

How to fix this bug:
1. realpath() -- string-only detection
2. hard/soft links -- maybe ref
   https://lists.gnu.org/archive/html/qemu-devel/2013-11/msg01840.html
3. distinguish whether a network storage file mounted on different
   directories is the same file or not

> My question was about the bug. What is the problem you are seeing and
> how do you reproduce it (including the HMP commands)?

What I am looking at now is how to "distinguish whether a network
storage file mounted on different directories is the same file or not".
Reproduce this bug using NFS mounted in different directories.

Create an NFS server, and mount it as follows:

  # mount $IP:/home/ /mnt/dir1
  # mount $IP:/home/ /mnt/dir2

  (qemu) snapshot_blkdev drive-scsi0-0-0 /mnt/dir1/sn1
  Formatting '/mnt/dir1/sn1', fmt=qcow2 size=32212254720
  backing_file='/home/juli/RHEL-7.0-20131127.1.qcow2_v3'
  backing_fmt='qcow2' encryption=off cluster_size=65536
  lazy_refcounts=off
  (qemu) snapshot_blkdev drive-scsi0-0-0 /mnt/dir2/sn1
  Formatting '/mnt/dir2/sn1', fmt=qcow2 size=32212254720
  backing_file='/mnt/dir1/sn1' backing_fmt='qcow2' encryption=off
  cluster_size=65536 lazy_refcounts=off
  (qemu) info block
  drive-scsi0-0-0: removable=0 io-status=ok file=/mnt/dir2/sn1
  backing_file=/mnt/dir1/sn1 backing_file_depth=2 ro=0 drv=qcow2
  encrypted=0 bps=0 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
  drive-virtio-disk1: removable=0 io-status=ok
  file=/home/juli/data_disk.qcow2_v3 ro=0 drv=qcow2 encrypted=0 bps=0
  bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
  ide1-cd0: removable=1 locked=0 tray-open=0 io-status=ok [not inserted]
  floppy0: removable=1 locked=0 tray-open=0 [not inserted]
  sd0: removable=1 locked=0 tray-open=0 [not inserted]
  (qemu) snapshot_blkdev drive-scsi0-0-0 /mnt/dir2/sn2
  Could not open '/mnt/dir2/sn1': Could not set AIO state: Too many
  open files: Too many open files

Best Regards,
Jun Li

> Stefan
On Tue, Dec 10, 2013 at 09:58:44PM +0800, lijun wrote:
> On 12/09/2013 08:17 PM, Stefan Hajnoczi wrote:
> > On Mon, Dec 09, 2013 at 02:06:21PM +0800, jun muzi wrote:
> > My question was about the bug. What is the problem you are seeing
> > and how do you reproduce it (including the HMP commands)?
> Now I am seeing is how to "distinguish network storage file mounted
> on different dir is the same file or not".
> Reproduce this bug using nfs mounted in different directories:
> Create a nfs server, and mount it as followings:
> # mount $IP:/home/ /mnt/dir1
> # mount $IP:/home/ /mnt/dir2
> (qemu) snapshot_blkdev drive-scsi0-0-0 /mnt/dir1/sn1
> Formatting '/mnt/dir1/sn1', fmt=qcow2 size=32212254720
> backing_file='/home/juli/RHEL-7.0-20131127.1.qcow2_v3'
> backing_fmt='qcow2' encryption=off cluster_size=65536
> lazy_refcounts=off
> (qemu) snapshot_blkdev drive-scsi0-0-0 /mnt/dir2/sn1
> Formatting '/mnt/dir2/sn1', fmt=qcow2 size=32212254720
> backing_file='/mnt/dir1/sn1' backing_fmt='qcow2' encryption=off
> cluster_size=65536 lazy_refcounts=off
> (qemu) info block
> drive-scsi0-0-0: removable=0 io-status=ok file=/mnt/dir2/sn1
> backing_file=/mnt/dir1/sn1 backing_file_depth=2 ro=0 drv=qcow2
> encrypted=0 bps=0 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
> drive-virtio-disk1: removable=0 io-status=ok
> file=/home/juli/data_disk.qcow2_v3 ro=0 drv=qcow2 encrypted=0 bps=0
> bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
> ide1-cd0: removable=1 locked=0 tray-open=0 io-status=ok [not inserted]
> floppy0: removable=1 locked=0 tray-open=0 [not inserted]
> sd0: removable=1 locked=0 tray-open=0 [not inserted]
> (qemu) snapshot_blkdev drive-scsi0-0-0 /mnt/dir2/sn2
> Could not open '/mnt/dir2/sn1': Could not set AIO state: Too many
> open files: Too many open files

Okay, I think we got side-tracked worrying about identifying identical
files. The problem in your example is more fundamental.
Here is what should have happened:

  (qemu) snapshot_blkdev drive-scsi0-0-0 /mnt/dir2/sn1
  Could not create '/mnt/dir2/sn1': File exists

The file was previously created with the name /mnt/dir1/sn1. Running
snapshot_blkdev with /mnt/dir2/sn1 clobbers the file in your example.
This is bad because it corrupts the backing chain.

If QEMU uses the O_CREAT | O_EXCL open(2) flags then creating the
snapshot file fails and the user will not mistakenly overwrite sn1.

However, QEMU does not use O_EXCL to create image files anywhere today
(qemu-img or QEMU monitor commands). Returning an error is not
backwards-compatible since existing users might rely on clobbering
files (there are safe cases where it can be useful).

We could add an option to commands that create files, but I'm not sure
it's worth the effort, since the human users who are most at risk
probably won't provide this new option...

That said, if you want to add an O_EXCL option to bdrv_img_create(),
qemu-img commands, and monitor commands, I'm happy to review patches.

Stefan
On 01/02/2014 01:55 AM, Stefan Hajnoczi wrote:
> Okay, I think we got side-tracked worrying about identifying identical
> files. The problem in your example is more fundamental.
>
> Here is what should have happened:
>
> (qemu) snapshot_blkdev drive-scsi0-0-0 /mnt/dir2/sn1
> Could not create '/mnt/dir2/sn1': File exists
>
> The file was previously created with the name /mnt/dir1/sn1. Running
> snapshot_blkdev with /mnt/dir2/sn1 clobbers the file in your example.
> This is bad because it corrupts the backing chain.
>
> If QEMU uses O_CREAT | O_EXCL open(2) flags then creating the snapshot
> file fails and the user will not mistakenly overwrite sn1.
>
> However, QEMU does not use O_EXCL to create image files anywhere today
> (qemu-img or QEMU monitor commands). Returning an error is not
> backwards-compatible since existing users might rely on clobbering
> files (there are safe cases where it can be useful).

In fact, libvirt _requires_ that you clobber existing image files -
when libvirt drives qemu with SELinux labelling enabled, qemu cannot
create files, but can only open already existing files that have
already been labelled. So libvirt would need a way to avoid O_EXCL,
even if you add it and make it the default (and if you change the
default to O_EXCL, you also need to provide a way to probe for the new
switch that can bypass that new default).

> We could add an option to commands that create files but I'm not sure
> if it's worth the effort since human users who are most at risk
> probably won't provide this new option...

Changing to be safe by default and requiring a new option to allow
reuse of existing files might be okay, but it will be backwards
incompatible. Keeping existing behavior and requiring a new option to
turn on O_EXCL for safety is back-compat friendly, but that is exactly
the situation where users aren't going to know they need to use the new
option, so you have gained no safety.