Patchwork qemu-img: add FUSE-based image access

Submitter Jan Kiszka
Date March 25, 2010, 5:52 p.m.
Message ID <4BABA2FB.8050505@web.de>
Permalink /patch/48588/
State New

Comments

Jan Kiszka - March 25, 2010, 5:52 p.m.
This adds the "map" subcommand to qemu-img. It is able to expose the raw
content of a disk image via a FUSE filesystem. The whole disk can
be accessed, e.g. to run partitioning tools against it, as well as
individual partitions. This allows creating new filesystems in the
image or loop-back mounting existing ones. Using the great mountlo tool
from the FUSE collection [1][2], the latter can even be done by non-root
users (the former anyway).

There are some dependencies to fulfill to gain all features: Partition
scanning is done via a recent libblkid (I used version 2.17.1). If this
library is not available, only the disk file is provided. Fortunately,
mountlo can do partition scanning as well ("-p n") to work around this.

Moreover, libfuse >= 2.8 and a host kernel >= 2.6.29 are required for
seamless disk access via fdisk. Otherwise, the BLKGETSIZE64 ioctl cannot
be provided, and the number of cylinders has to be set explicitly (e.g. via
"-C n").

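For illustration only (not part of this patch): a minimal sketch of the
size query that the BLKGETSIZE64 ioctl serves. A tool holding an open file
descriptor asks the kernel for the device size in bytes. The helper name
query_size and the fallback to fstat() for files that do not answer the
ioctl are assumptions of this sketch; they describe neither fdisk's actual
behavior nor code in this patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#ifdef __linux__
#include <linux/fs.h>   /* BLKGETSIZE64 */
#endif

/*
 * Hypothetical helper: store the size of the object behind fd in *size
 * (in bytes). Returns 0 on success, -1 on failure.
 */
static int query_size(int fd, uint64_t *size)
{
    struct stat st;

#ifdef BLKGETSIZE64
    uint64_t bytes;

    /* Block devices (and FUSE files that implement the ioctl) answer here. */
    if (ioctl(fd, BLKGETSIZE64, &bytes) == 0) {
        *size = bytes;
        return 0;
    }
#endif

    /* Assumed fallback for plain files: use the size reported by fstat(). */
    if (fstat(fd, &st) < 0) {
        return -1;
    }
    *size = (uint64_t)st.st_size;
    return 0;
}
```

On a regular file the ioctl is expected to fail and the fstat() path is
taken; on a file that answers BLKGETSIZE64 the first branch returns the
size directly.
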
This work was inspired by Ashley Saulsbury's qemu-diskp [3].

[1] http://sourceforge.net/apps/mediawiki/fuse/index.php?title=FileSystems#Mountlo
[2] http://sourceforge.net/projects/fuse/files/mountlo/
[3] http://www.saulsbury.org/software/virtualization.html

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
---
 Makefile         |    6 +-
 Makefile.objs    |    6 +
 configure        |   55 +++++++
 qemu-img-cmds.hx |   11 ++
 qemu-img-map.c   |  438 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-img.c       |   13 +-
 qemu-img.h       |   13 ++
 qemu-img.texi    |   10 ++
 8 files changed, 545 insertions(+), 7 deletions(-)
 create mode 100644 qemu-img-map.c
 create mode 100644 qemu-img.h
Anthony Liguori - March 25, 2010, 8:58 p.m.
On 03/25/2010 12:52 PM, Jan Kiszka wrote:
> This adds the "map" subcommand to qemu-img. It is able to expose the raw
> content of a disk image via a FUSE filesystem. The whole disk can
> be accessed, e.g. to run partitioning tools against it, as well as
> individual partitions. This allows creating new filesystems in the
> image or loop-back mounting existing ones. Using the great mountlo tool
> from the FUSE collection [1][2], the latter can even be done by non-root
> users (the former anyway).
>
> There are some dependencies to fulfill to gain all features: Partition
> scanning is done via a recent libblkid (I used version 2.17.1). If this
> library is not available, only the disk file is provided. Fortunately,
> mountlo can do partition scanning as well ("-p n") to work around this.
>
> Moreover, libfuse >= 2.8 and a host kernel >= 2.6.29 are required for
> seamless disk access via fdisk. Otherwise, the BLKGETSIZE64 ioctl cannot
> be provided, and the number of cylinders has to be set explicitly (e.g. via
> "-C n").
>
> This work was inspired by Ashley Saulsbury's qemu-diskp [3].
>
> [1] http://sourceforge.net/apps/mediawiki/fuse/index.php?title=FileSystems#Mountlo
> [2] http://sourceforge.net/projects/fuse/files/mountlo/
> [3] http://www.saulsbury.org/software/virtualization.html
>
> Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
>    

This has been proposed quite a few times.

In fact, I wrote something like this prior to implementing qemu-nbd.

The problem with fuse is that, as configured by default, you can't actually
enter into a fuse filesystem as root, and since you need to be root to
loopback mount it, it is pretty nasty from a usability perspective.

So why did you go the fuse route instead of using qemu-nbd?

Regards,

Anthony Liguori

> ---
>   Makefile         |    6 +-
>   Makefile.objs    |    6 +
>   configure        |   55 +++++++
>   qemu-img-cmds.hx |   11 ++
>   qemu-img-map.c   |  438 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   qemu-img.c       |   13 +-
>   qemu-img.h       |   13 ++
>   qemu-img.texi    |   10 ++
>   8 files changed, 545 insertions(+), 7 deletions(-)
>   create mode 100644 qemu-img-map.c
>   create mode 100644 qemu-img.h
>
> diff --git a/Makefile b/Makefile
> index 57c354d..d5a1dae 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -126,10 +126,12 @@ bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS)
>
>   ######################################################################
>
> -qemu-img.o: qemu-img-cmds.h
> +qemu-img.o: qemu-img.h qemu-img-cmds.h
>   qemu-img.o qemu-tool.o qemu-nbd.o qemu-io.o: $(GENERATED_HEADERS)
>
> -qemu-img$(EXESUF): qemu-img.o qemu-tool.o $(block-obj-y) $(qobject-obj-y)
> +qemu-img-map.o: QEMU_CFLAGS += $(FUSE_CFLAGS) $(BLKID_CFLAGS)
> +
> +qemu-img$(EXESUF): $(qemu-img-y) $(block-obj-y) $(qobject-obj-y)
>
>   qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o $(block-obj-y) $(qobject-obj-y)
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 281f7a6..8a651d2 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -207,3 +207,9 @@ libdis-$(CONFIG_PPC_DIS) += ppc-dis.o
>   libdis-$(CONFIG_S390_DIS) += s390-dis.o
>   libdis-$(CONFIG_SH4_DIS) += sh4-dis.o
>   libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o
> +
> +######################################################################
> +# qemu-img
> +
> +qemu-img-y = qemu-img.o qemu-tool.o
> +qemu-img-$(CONFIG_FUSE) += qemu-img-map.o
> diff --git a/configure b/configure
> index 6bc40a3..c84aaa9 100755
> --- a/configure
> +++ b/configure
> @@ -263,6 +263,7 @@ vnc_tls=""
>   vnc_sasl=""
>   xen=""
>   linux_aio=""
> +fuse=""
>
>   gprof="no"
>   debug_tcg="no"
> @@ -639,6 +640,10 @@ for opt do
>     ;;
>     --enable-linux-aio) linux_aio="yes"
>     ;;
> +  --disable-fuse) fuse="no"
> +  ;;
> +  --enable-fuse) fuse="yes"
> +  ;;
>     --enable-io-thread) io_thread="yes"
>     ;;
>     --disable-blobs) blobs="no"
> @@ -801,6 +806,8 @@ echo "  --disable-vde            disable support for vde network"
>   echo "  --enable-vde             enable support for vde network"
>   echo "  --disable-linux-aio      disable Linux AIO support"
>   echo "  --enable-linux-aio       enable Linux AIO support"
> +echo "  --disable-fuse           disable support for FUSE in qemu-img"
> +echo "  --enable-fuse            enable support for FUSE in qemu-img"
>   echo "  --enable-io-thread       enable IO thread"
>   echo "  --disable-blobs          disable installing provided firmware blobs"
>   echo "  --kerneldir=PATH         look for kernel includes in PATH"
> @@ -1586,6 +1593,44 @@ EOF
>     fi
>   fi
>
> +##########################################
> +# FUSE libraries probe
> +if test "$fuse" != "no" ; then
> +  fuse_cflags=`pkg-config --cflags fuse 2> /dev/null`
> +  fuse_libs=`pkg-config --libs fuse 2> /dev/null`
> +  cat > $TMPC << EOF
> +#include <fuse.h>
> +int main(int argc, const char *argv[])
> +{
> +    return fuse_main(argc, argv, NULL);
> +}
> +EOF
> +  if compile_prog "$fuse_cflags" "$fuse_libs" ; then
> +    fuse=yes
> +    libs_tools="$fuse_libs $libs_tools"
> +  else
> +    if test "$fuse" = "yes" ; then
> +      feature_not_found "FUSE"
> +    fi
> +    fuse=no
> +  fi
> +fi
> +
> +##########################################
> +# blkid_partlist probe
> +blkid_cflags=`pkg-config --cflags blkid 2> /dev/null`
> +blkid_libs=`pkg-config --libs blkid 2> /dev/null`
> +cat > $TMPC << EOF
> +#include <blkid.h>
> +int main(void) { blkid_partlist ls; return 0; }
> +EOF
> +blkid_partlist=no
> +if compile_prog "$blkid_cflags" "$blkid_libs" ; then
> +  blkid_partlist=yes
> +  libs_tools="$blkid_libs $libs_tools"
> +fi
> +
> +
>   #
>   # Check for xxxat() functions when we are building linux-user
>   # emulator.  This is done because older glibc versions don't
> @@ -1962,6 +2007,8 @@ echo "PIE user targets  $user_pie"
>   echo "vde support       $vde"
>   echo "IO thread         $io_thread"
>   echo "Linux AIO support $linux_aio"
> +echo "FUSE support      $fuse"
> +echo "partlist support  $blkid_partlist"
>   echo "Install blobs     $blobs"
>   echo "KVM support       $kvm"
>   echo "fdt support       $fdt"
> @@ -2183,6 +2230,14 @@ fi
>   if test "$fdatasync" = "yes" ; then
>     echo "CONFIG_FDATASYNC=y" >> $config_host_mak
>   fi
> +if test "$fuse" = "yes" ; then
> +  echo "CONFIG_FUSE=y" >> $config_host_mak
> +  echo "FUSE_CFLAGS=$fuse_cflags" >> $config_host_mak
> +fi
> +if test "$blkid_partlist" = "yes" ; then
> +  echo "CONFIG_BLKID_PARTLIST=y" >> $config_host_mak
> +  echo "BLKID_CFLAGS=$blkid_cflags" >> $config_host_mak
> +fi
>
>   # XXX: suppress that
>   if [ "$bsd" = "yes" ] ; then
> diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
> index f96876a..94c6e66 100644
> --- a/qemu-img-cmds.hx
> +++ b/qemu-img-cmds.hx
> @@ -49,5 +49,16 @@ DEF("rebase", img_rebase,
>       "rebase [-f fmt] [-u] -b backing_file [-F backing_fmt] filename")
>   STEXI
>   @item rebase [-f @var{fmt}] [-u] -b @var{backing_file} [-F @var{backing_fmt}] @var{filename}
> +ETEXI
> +
> +#ifdef CONFIG_FUSE
> +DEF("map", img_map,
> +    "map [-F fmt] [<FUSE options>] filename mountpoint")
> +#endif
> +STEXI
> +@item map [-F @var{fmt}] [@var{FUSE options}] @var{filename} @var{mountpoint}
> +ETEXI
> +
> +STEXI
>   @end table
>   ETEXI
> diff --git a/qemu-img-map.c b/qemu-img-map.c
> new file mode 100644
> index 0000000..cd6bbf4
> --- /dev/null
> +++ b/qemu-img-map.c
> @@ -0,0 +1,438 @@
> +/*
> + * QEMU disk image utility
> + *
> + * Copyright (c) 2010 Jan Kiszka
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "qemu-img.h"
> +#include "qemu-option.h"
> +#include "osdep.h"
> +#include "block_int.h"
> +#include <stdio.h>
> +#include <getopt.h>
> +#include <pthread.h>
> +#include <signal.h>
> +
> +#define FUSE_USE_VERSION 28
> +#include <fuse.h>
> +
> +#ifdef CONFIG_LINUX
> +#include <linux/fs.h>
> +#endif
> +
> +#define ENTRY_INVALID   1
> +#define ENTRY_DIRTY     2
> +
> +#define ENTRY_PATH_MAX  16
> +
> +struct map_entry {
> +    struct map_entry *next;
> +    const char       *path;
> +    size_t           size;
> +    off_t            offset;
> +    unsigned int     use_counter;
> +    unsigned int     flags;
> +};
> +
> +static struct stat img_stat;
> +static BlockDriverState *img_bs;
> +static struct map_entry disk_entry = { .path = "/disk" };
> +static char *disk_path;
> +
> +#ifdef CONFIG_BLKID_PARTLIST
> +
> +#include <blkid.h>
> +
> +static pthread_t reader_thread;
> +static sigset_t wakeup_sigset;
> +static pthread_mutex_t entry_lock = PTHREAD_MUTEX_INITIALIZER;
> +static struct map_entry *last_entry = &disk_entry;
> +
> +static void *partition_reader(void *unused)
> +{
> +    struct map_entry *entry;
> +    blkid_partition par;
> +    blkid_partlist ls;
> +    blkid_probe pr;
> +    int nparts, i;
> +    char *path;
> +
> +    while (sigwaitinfo(&wakeup_sigset, NULL) >= 0) {
> +        pr = blkid_new_probe_from_filename(disk_path);
> +        if (!pr) {
> +            continue;
> +        }
> +
> +        ls = blkid_probe_get_partitions(pr);
> +        if (!ls) {
> +            blkid_free_probe(pr);
> +            continue;
> +        }
> +
> +        nparts = blkid_partlist_numof_partitions(ls);
> +
> +        for (i = 0; i < nparts; i++) {
> +            entry = calloc(1, sizeof(*entry));
> +            if (!entry) {
> +                continue;
> +            }
> +            path = malloc(ENTRY_PATH_MAX);
> +            if (!path) {
> +                free(entry);
> +                continue;
> +            }
> +
> +            par = blkid_partlist_get_partition(ls, i);
> +
> +            snprintf(path, ENTRY_PATH_MAX, "/partition%d",
> +                     blkid_partition_get_partno(par));
> +            entry->path = path;
> +            entry->size = blkid_partition_get_size(par) * BDRV_SECTOR_SIZE;
> +            entry->offset = blkid_partition_get_start(par) * BDRV_SECTOR_SIZE;
> +
> +            pthread_mutex_lock(&entry_lock);
> +
> +            last_entry->next = entry;
> +            last_entry = entry;
> +
> +            pthread_mutex_unlock(&entry_lock);
> +        }
> +
> +        blkid_free_probe(pr);
> +    }
> +
> +    return NULL;
> +}
> +
> +static void update_partitions(void)
> +{
> +    struct map_entry *entry = disk_entry.next;
> +    struct map_entry *old;
> +
> +    /* release old partitions */
> +    pthread_mutex_lock(&entry_lock);
> +
> +    while (entry) {
> +        old = entry;
> +        entry = entry->next;
> +        if (old->use_counter == 0) {
> +            free((void *)old->path);
> +            free(old);
> +        } else {
> +            old->flags = ENTRY_INVALID;
> +        }
> +    }
> +
> +    disk_entry.next = NULL;
> +    last_entry = &disk_entry;
> +
> +    disk_entry.flags &= ~ENTRY_DIRTY;
> +
> +    pthread_mutex_unlock(&entry_lock);
> +
> +    /* kick off partition table scan */
> +    pthread_kill(reader_thread, SIGUSR1);
> +}
> +
> +static void init_reader_thread(void)
> +{
> +    sigemptyset(&wakeup_sigset);
> +    sigaddset(&wakeup_sigset, SIGUSR1);
> +    sigprocmask(SIG_BLOCK, &wakeup_sigset, NULL);
> +
> +    if (pthread_create(&reader_thread, NULL, partition_reader, NULL)) {
> +        error("Could not spawn partition reader thread");
> +    }
> +}
> +
> +#else /* !CONFIG_BLKID_PARTLIST */
> +
> +static inline void update_partitions(void) { }
> +static inline void init_reader_thread(void) { }
> +
> +#endif /* !CONFIG_BLKID_PARTLIST */
> +
> +static struct map_entry *find_map_entry(const char *path)
> +{
> +    struct map_entry *entry = &disk_entry;
> +
> +    do {
> +        if (strcmp(entry->path, path) == 0) {
> +            break;
> +        }
> +        entry = entry->next;
> +    } while (entry);
> +
> +    return entry;
> +}
> +
> +static void *map_init(struct fuse_conn_info *conn)
> +{
> +    init_reader_thread();
> +    update_partitions();
> +    return NULL;
> +}
> +
> +static int map_getattr(const char *path, struct stat *stbuf)
> +{
> +    struct map_entry *entry;
> +    int res = 0;
> +
> +    memset(stbuf, 0, sizeof(struct stat));
> +    stbuf->st_uid   = img_stat.st_uid;
> +    stbuf->st_gid   = img_stat.st_gid;
> +    stbuf->st_atime = img_stat.st_atime;
> +    stbuf->st_mtime = img_stat.st_mtime;
> +    stbuf->st_ctime = img_stat.st_ctime;
> +
> +    if (strcmp(path, "/") == 0) {
> +        stbuf->st_mode = S_IFDIR | 0111 | img_stat.st_mode;
> +        stbuf->st_nlink = 2;
> +    } else {
> +        entry = find_map_entry(path);
> +        if (entry) {
> +            stbuf->st_mode = S_IFREG | img_stat.st_mode;
> +            stbuf->st_nlink = 1;
> +            stbuf->st_size = entry->size;
> +        } else {
> +            res = -ENOENT;
> +        }
> +    }
> +
> +    return res;
> +}
> +
> +static int map_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
> +                       off_t offset, struct fuse_file_info *fi)
> +{
> +    struct map_entry *entry;
> +
> +    if (strcmp(path, "/") != 0) {
> +        return -ENOENT;
> +    }
> +    filler(buf, ".", NULL, 0);
> +    filler(buf, "..", NULL, 0);
> +    for (entry = &disk_entry; entry; entry = entry->next) {
> +        filler(buf, entry->path+1, NULL, 0);
> +    }
> +
> +    return 0;
> +}
> +
> +static int map_open(const char *path, struct fuse_file_info *fi)
> +{
> +    struct map_entry *entry = find_map_entry(path);
> +
> +    if (!entry) {
> +        return -ENOENT;
> +    }
> +
> +    entry->use_counter++;
> +    fi->fh = (uint64_t)entry;
> +
> +    return 0;
> +}
> +
> +static int map_release(const char *path, struct fuse_file_info *fi)
> +{
> +    struct map_entry *entry = (struct map_entry *)fi->fh;
> +
> +    entry->use_counter--;
> +
> +    if (entry == &disk_entry && entry->flags & ENTRY_DIRTY) {
> +        update_partitions();
> +    }
> +    if (entry->flags & ENTRY_INVALID && entry->use_counter == 0) {
> +        free((void *)entry->path);
> +        free(entry);
> +    }
> +
> +    return 0;
> +}
> +
> +static int map_read(const char *path, char *buf, size_t size, off_t offset,
> +                    struct fuse_file_info *fi)
> +{
> +    struct map_entry *entry = (struct map_entry *)fi->fh;
> +    int err;
> +
> +    if (entry->flags & ENTRY_INVALID) {
> +        return -ENOENT;
> +    }
> +
> +    if (offset + size > entry->size) {
> +        size = entry->size - offset;
> +    }
> +
> +    err = bdrv_read(img_bs, (entry->offset + offset) / BDRV_SECTOR_SIZE,
> +                    (uint8_t*)buf, size / BDRV_SECTOR_SIZE);
> +    if (err) {
> +        return err;
> +    }
> +
> +    return size;
> +}
> +
> +static int map_write(const char *path, const char *buf, size_t size,
> +                     off_t offset, struct fuse_file_info *fi)
> +{
> +    struct map_entry *entry = (struct map_entry *)fi->fh;
> +    int err;
> +
> +    if (entry->flags & ENTRY_INVALID) {
> +        return -ENOENT;
> +    }
> +
> +    err = bdrv_write(img_bs, (entry->offset + offset) / BDRV_SECTOR_SIZE,
> +                     (uint8_t*)buf, size / BDRV_SECTOR_SIZE);
> +    if (err) {
> +        return err;
> +    }
> +
> +    entry->flags |= ENTRY_DIRTY;
> +
> +    return size;
> +}
> +
> +#if FUSE_VERSION >= 28
> +static int map_ioctl(const char *path, int cmd, void *arg,
> +                     struct fuse_file_info *fi, unsigned int flags, void *data)
> +{
> +    struct map_entry *entry = (struct map_entry *)fi->fh;
> +
> +    if (entry->flags & ENTRY_INVALID) {
> +        return -ENOENT;
> +    }
> +
> +    switch (cmd) {
> +#ifdef CONFIG_LINUX
> +    case BLKGETSIZE64:
> +        *(uint64_t *)data = entry->size;
> +        return 0;
> +#endif /* CONFIG_LINUX */
> +    default:
> +        return -ENOTTY;
> +    }
> +}
> +#endif /* FUSE_VERSION >= 28 */
> +
> +static struct fuse_operations map_ops = {
> +    .init    = map_init,
> +    .getattr = map_getattr,
> +    .readdir = map_readdir,
> +    .open    = map_open,
> +    .release = map_release,
> +    .read    = map_read,
> +    .write   = map_write,
> +#if FUSE_VERSION >= 28
> +    .ioctl   = map_ioctl,
> +#endif
> +};
> +
> +static void QEMU_NORETURN map_help(struct fuse_args *args)
> +{
> +    printf("usage: qemu-img map [-F fmt] [FUSE options] filename mountpoint\n"
> +           "\ngeneral options:\n"
> +           "    -o opt,[opt...]         mount options\n"
> +           "    -h    --help            print help\n"
> +           "    -V    --version         print version\n"
> +           "\nqemu-img options:\n"
> +           "    -F fmt                  image format\n\n");
> +    fuse_opt_add_arg(args, "-ho");
> +    fuse_main(args->argc, args->argv, &map_ops, NULL);
> +    exit(1);
> +}
> +
> +int img_map(int argc, char **argv)
> +{
> +    struct fuse_args args = FUSE_ARGS_INIT(0, NULL);
> +    const char *filename = NULL;
> +    const char *fmt = NULL;
> +    const char *mountpoint;
> +    char *fs_name;
> +    uint64_t size;
> +
> +    fuse_opt_add_arg(&args, argv[0]);
> +    fuse_opt_add_arg(&args, "-o");
> +    fuse_opt_add_arg(&args, "subtype=qemu-img-map");
> +
> +    /* block layer is not thread-safe */
> +    fuse_opt_add_arg(&args, "-s");
> +
> +    for (;;) {
> +        static const struct option long_opts[] = {
> +            { "help", 0, NULL, 'h' },
> +            { "version", 0, NULL, 'v' },
> +            { NULL, 0, NULL, 0 }
> +        };
> +        int c;
> +
> +        c = getopt_long(argc, argv, "F:dfsho:", long_opts, NULL);
> +        if (c < 0) {
> +            break;
> +        }
> +        switch (c) {
> +        case 'h':
> +            map_help(&args);
> +            break;
> +        case 'F':
> +            fmt = optarg;
> +            break;
> +        case 'o':
> +            fuse_opt_add_arg(&args, "-o");
> +            fuse_opt_add_arg(&args, optarg);
> +            break;
> +        case 'd':
> +            fuse_opt_add_arg(&args, "-d");
> +            break;
> +        case 'f':
> +            fuse_opt_add_arg(&args, "-f");
> +            break;
> +        default:
> +            /* ignore -s, we enforce it anyway */
> +            break;
> +        }
> +    }
> +    if (optind + 1 >= argc) {
> +        map_help(&args);
> +    }
> +
> +    filename = argv[optind++];
> +
> +    size = strlen(filename) + 8;
> +    fs_name = malloc(size);
> +    if (!fs_name) {
> +        error("Not enough memory");
> +    }
> +    snprintf(fs_name, size, "fsname=%s", filename);
> +    fuse_opt_insert_arg(&args, 1, "-o");
> +    fuse_opt_insert_arg(&args, 2, fs_name);
> +    free(fs_name);
> +
> +    mountpoint = argv[optind];
> +    fuse_opt_add_arg(&args, mountpoint);
> +
> +    size = strlen(mountpoint) + strlen(disk_entry.path) + 1;
> +    disk_path = malloc(size);
> +    if (!disk_path) {
> +        error("Not enough memory");
> +    }
> +    snprintf(disk_path, size, "%s%s", mountpoint, disk_entry.path);
> +
> +    if (stat(filename, &img_stat) < 0) {
> +        perror("Unable to process image file");
> +        exit(1);
> +    }
> +    img_stat.st_mode &= S_IRWXU | S_IRWXG | S_IRWXO;
> +
> +    img_bs = bdrv_new_open(filename, fmt, 0);
> +    if (!img_bs) {
> +        error("Could not open '%s'", filename);
> +    }
> +    bdrv_get_geometry(img_bs, &size);
> +    disk_entry.size = size * BDRV_SECTOR_SIZE;
> +
> +    return fuse_main(args.argc, args.argv, &map_ops, NULL);
> +}
> diff --git a/qemu-img.c b/qemu-img.c
> index 9b28664..28b8427 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -21,7 +21,7 @@
>    * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>    * THE SOFTWARE.
>    */
> -#include "qemu-common.h"
> +#include "qemu-img.h"
>   #include "qemu-option.h"
>   #include "osdep.h"
>   #include "block_int.h"
> @@ -39,7 +39,7 @@ typedef struct img_cmd_t {
>   /* Default to cache=writeback as data integrity is not important for qemu-tcg. */
>   #define BRDV_O_FLAGS BDRV_O_CACHE_WB
>
> -static void QEMU_NORETURN error(const char *fmt, ...)
> +void QEMU_NORETURN error(const char *fmt, ...)
>   {
>       va_list ap;
>       va_start(ap, fmt);
> @@ -97,6 +97,9 @@ static void help(void)
>       printf("%s\nSupported formats:", help_msg);
>       bdrv_iterate_format(format_print, NULL);
>       printf("\n");
> +#ifdef CONFIG_FUSE
> +    printf("\nInvoke 'qemu-img map --help' to list FUSE options.\n");
> +#endif
>       exit(1);
>   }
>
> @@ -188,9 +191,9 @@ static int read_password(char *buf, int buf_size)
>   }
>   #endif
>
> -static BlockDriverState *bdrv_new_open(const char *filename,
> -                                       const char *fmt,
> -                                       int readonly)
> +BlockDriverState *bdrv_new_open(const char *filename,
> +                                const char *fmt,
> +                                int readonly)
>   {
>       BlockDriverState *bs;
>       BlockDriver *drv;
> diff --git a/qemu-img.h b/qemu-img.h
> new file mode 100644
> index 0000000..1bf0f27
> --- /dev/null
> +++ b/qemu-img.h
> @@ -0,0 +1,13 @@
> +#ifndef QEMU_IMG_H
> +#define QEMU_IMG_H
> +
> +#include "qemu-common.h"
> +
> +void QEMU_NORETURN error(const char *fmt, ...);
> +BlockDriverState *bdrv_new_open(const char *filename,
> +                                const char *fmt,
> +                                int readonly);
> +
> +int img_map(int argc, char **argv);
> +
> +#endif
> diff --git a/qemu-img.texi b/qemu-img.texi
> index ac97854..a85f454 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -106,6 +106,16 @@ they are displayed too.
>   @item snapshot [-l | -a @var{snapshot} | -c @var{snapshot} | -d @var{snapshot} ] @var{filename}
>
>   List, apply, create or delete snapshots in image @var{filename}.
> +
> +@item map [-F @var{fmt}] [@var{FUSE options}] @var{filename} @var{mountpoint}
> +
> +Make a disk image accessible via pseudo devices under @var{mountpoint}. This
> +command will expose the whole raw image as well as individual partitions, the
> +latter depending on the parsing capabilities of libblkid. The exposed disk
> +device file can be passed to partitioning tools, and any device file containing
> +a valid filesystem can be loop-back mounted to access its content (e.g. via
> +mountlo without any root privileges). For the full list of FUSE-related
> +options, invoke @code{qemu-img map --help}.
>   @end table
>
>   Supported image file formats:
>
>
>
>
Jan Kiszka - March 25, 2010, 9:46 p.m.
Anthony Liguori wrote:
> On 03/25/2010 12:52 PM, Jan Kiszka wrote:
>> This adds the "map" subcommand to qemu-img. It is able to expose the raw
>> content of a disk image via a FUSE filesystem. The whole disk can
>> be accessed, e.g. to run partitioning tools against it, as well as
>> individual partitions. This allows creating new filesystems in the
>> image or loop-back mounting existing ones. Using the great mountlo tool
>> from the FUSE collection [1][2], the latter can even be done by non-root
>> users (the former anyway).
>>
>> There are some dependencies to fulfill to gain all features: Partition
>> scanning is done via a recent libblkid (I used version 2.17.1). If this
>> library is not available, only the disk file is provided. Fortunately,
>> mountlo can do partition scanning as well ("-p n") to work around this.
>>
>> Moreover, libfuse >= 2.8 and a host kernel >= 2.6.29 are required for
>> seamless disk access via fdisk. Otherwise, the BLKGETSIZE64 ioctl cannot
>> be provided, and the number of cylinders has to be set explicitly (e.g. via
>> "-C n").
>>
>> This work was inspired by Ashley Saulsbury's qemu-diskp [3].
>>
>> [1]
>> http://sourceforge.net/apps/mediawiki/fuse/index.php?title=FileSystems#Mountlo
>>
>> [2] http://sourceforge.net/projects/fuse/files/mountlo/
>> [3] http://www.saulsbury.org/software/virtualization.html
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
>>    
> 
> This has been proposed quite a few times.
> 
> In fact, I wrote something like this prior to implementing qemu-nbd.
> 
> The problem with fuse is that, as configured by default, you can't actually
> enter into a fuse filesystem as root, and since you need to be root to
> loopback mount it, it is pretty nasty from a usability perspective.

You don't, see mountlo.

> 
> So why did you go the fuse route instead of using qemu-nbd?

Mostly usability. It's really straightforward to stack mountlo on top of
the mapped image. And you can run (almost) all the filesystem and
partitioning tools.

Moreover, blkid_partlist provides a more complete partition parser than
rolling your own version - which I briefly considered and then quickly
dropped after looking at some implementations.
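
To illustrate what a hand-rolled parser involves (a sketch only, taken
neither from this patch nor from qemu-nbd): the code below reads just the
four primary entries of a classic MBR. Extended/logical partitions, GPT
and other label types are not handled, which is the kind of gap a library
parser such as blkid_partlist is meant to close. All names (parse_mbr,
struct mbr_part, the MBR_* constants) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MBR_SECTOR_SIZE   512
#define MBR_TABLE_OFFSET  446   /* start of the partition table */
#define MBR_ENTRY_SIZE    16
#define MBR_MAX_PRIMARY   4

struct mbr_part {
    int      number;   /* 1-based slot in the table */
    uint8_t  type;     /* partition type byte, e.g. 0x83 */
    uint64_t offset;   /* start of the partition in bytes */
    uint64_t size;     /* length of the partition in bytes */
};

/* Read a little-endian 32-bit value without alignment assumptions. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse the primary partition entries of one MBR sector.
 * Returns the number of used entries stored in out[], or -1 if the
 * 0x55 0xAA boot signature is missing.
 */
static int parse_mbr(const uint8_t sector[MBR_SECTOR_SIZE],
                     struct mbr_part out[MBR_MAX_PRIMARY])
{
    int i, n = 0;

    if (sector[510] != 0x55 || sector[511] != 0xAA) {
        return -1;
    }
    for (i = 0; i < MBR_MAX_PRIMARY; i++) {
        const uint8_t *e = sector + MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE;

        if (e[4] == 0) {
            continue;   /* type 0 marks an unused slot */
        }
        out[n].number = i + 1;
        out[n].type   = e[4];
        out[n].offset = (uint64_t)le32(e + 8) * MBR_SECTOR_SIZE;
        out[n].size   = (uint64_t)le32(e + 12) * MBR_SECTOR_SIZE;
        n++;
    }
    return n;
}
```

The resulting offset/size pairs correspond to what the patch stores per
map_entry, but only for the simplest partition layout.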

Jan
Anthony Liguori - March 25, 2010, 10:04 p.m.
On 03/25/2010 04:46 PM, Jan Kiszka wrote:
> Anthony Liguori wrote:
>    
>> On 03/25/2010 12:52 PM, Jan Kiszka wrote:
>>      
>>> This adds the "map" subcommand to qemu-img. It is able to expose the raw
>>> content of a disk image via a FUSE filesystem. The whole disk can
>>> be accessed, e.g. to run partitioning tools against it, as well as
>>> individual partitions. This allows creating new filesystems in the
>>> image or loop-back mounting existing ones. Using the great mountlo tool
>>> from the FUSE collection [1][2], the latter can even be done by non-root
>>> users (the former anyway).
>>>
>>> There are some dependencies to fulfill to gain all features: Partition
>>> scanning is done via a recent libblkid (I used version 2.17.1). If this
>>> library is not available, only the disk file is provided. Fortunately,
>>> mountlo can do partition scanning as well ("-p n") to work around this.
>>>
>>> Moreover, libfuse >= 2.8 and a host kernel >= 2.6.29 are required for
>>> seamless disk access via fdisk. Otherwise, the BLKGETSIZE64 ioctl cannot
>>> be provided, and the number of cylinders has to be set explicitly (e.g. via
>>> "-C n").
>>>
>>> This work was inspired by Ashley Saulsbury's qemu-diskp [3].
>>>
>>> [1]
>>> http://sourceforge.net/apps/mediawiki/fuse/index.php?title=FileSystems#Mountlo
>>>
>>> [2] http://sourceforge.net/projects/fuse/files/mountlo/
>>> [3] http://www.saulsbury.org/software/virtualization.html
>>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
>>>
>>>        
>> This has been proposed quite a few times.
>>
>> In fact, I wrote something like this prior to implementing qemu-nbd.
>>
>> The problem with fuse is that, as configured by default, you can't actually
>> enter into a fuse filesystem as root, and since you need to be root to
>> loopback mount it, it is pretty nasty from a usability perspective.
>>      
> You don't, see mountlo.
>    

That definitely changes things.  I assume it just uses libe2fs et al to 
display filesystem contents?

Does it preserve ownership?

You still can't do things as root I take it which is problematic.

>> So why did you go the fuse route instead of using qemu-nbd?
>>      
> Mostly usability. It's really straightforward to stack mountlo on top of
> the mapped image. And you can run (almost) all the filesystem and
> partitioning tools.
>
> Moreover, blkid_partlist provides a more complete partition parser than
> rolling your own version

We do in qemu-nbd and in qemu (to guess disk geometry).

Regards,

Anthony Liguori
Jan Kiszka - March 25, 2010, 10:27 p.m.
Anthony Liguori wrote:
> On 03/25/2010 04:46 PM, Jan Kiszka wrote:
>> Anthony Liguori wrote:
>>   
>>> On 03/25/2010 12:52 PM, Jan Kiszka wrote:
>>>     
>>>> This adds the "map" subcommand to qemu-img. It is able to expose the raw
>>>> content of a disk image via a FUSE filesystem. The whole disk can
>>>> be accessed, e.g. to run partitioning tools against it, as well as
>>>> individual partitions. This allows creating new filesystems in the
>>>> image or loop-back mounting existing ones. Using the great mountlo tool
>>>> from the FUSE collection [1][2], the latter can even be done by non-root
>>>> users (the former anyway).
>>>>
>>>> There are some dependencies to fulfill to gain all features: Partition
>>>> scanning is done via a recent libblkid (I used version 2.17.1). If this
>>>> library is not available, only the disk file is provided. Fortunately,
>>>> mountlo can do partition scanning as well ("-p n") to work around this.
>>>>
>>>> Moreover, libfuse >= 2.8 and a host kernel >= 2.6.29 are required for
>>>> seamless disk access via fdisk. Otherwise, the BLKGETSIZE64 ioctl cannot
>>>> be provided, and the number of cylinders has to be set explicitly (e.g. via
>>>> "-C n").
>>>>
>>>> This work was inspired by Ashley Saulsbury's qemu-diskp [3].
>>>>
>>>> [1]
>>>> http://sourceforge.net/apps/mediawiki/fuse/index.php?title=FileSystems#Mountlo
>>>>
>>>>
>>>> [2] http://sourceforge.net/projects/fuse/files/mountlo/
>>>> [3] http://www.saulsbury.org/software/virtualization.html
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
>>>>
>>>>        
>>> This has been proposed quite a few times.
>>>
>>> In fact, I wrote something like this prior to implementing qemu-nbd.
>>>
>>> The problem with fuse is that, as configured by default, you can't actually
>>> enter into a fuse filesystem as root, and since you need to be root to
>>> loopback mount it, it is pretty nasty from a usability perspective.
>>>      
>> You don't, see mountlo.
>>    
> 
> That definitely changes things.  I assume it just uses libe2fs et al to
> display filesystem contents?

Nope. It's a bit like libguestfs as it uses Linux to access the
filesystems, but that Linux runs in UML mode, thus does not require any
qemu/kvm underneath. It simply maps the FUSE requests on corresponding
VFS services in the UML kernel.

> 
> Does it preserve ownership?

Yep.

> 
> You still can't do things as root I take it which is problematic.

At least my default config does not prevent running qemu-img map as root
and then performing a classic "mount -o loop" on the partitions it
provides. Or what do you mean?


What mountlo is lacking (at least so far) are things like LVM or
soft-RAID. There were some posts on the fuse lists announcing work on
it, but that dates 2 years back without any code traces. But if this
path turns out to be useful for us (or libguestfs), I guess that should
be easy to add.

Jan
Anthony Liguori - March 25, 2010, 10:54 p.m.
On 03/25/2010 05:27 PM, Jan Kiszka wrote:
> Anthony Liguori wrote:
>    
>> On 03/25/2010 04:46 PM, Jan Kiszka wrote:
>>      
>>> Anthony Liguori wrote:
>>>
>>>        
>>>> On 03/25/2010 12:52 PM, Jan Kiszka wrote:
>>>>
>>>>          
>>>>> This adds the "map" subcommand to qemu-img. It is able to expose the
>>>>> raw
>>>>> content of a disk image via a FUSE filesystem. Both the whole disk can
>>>>> be accessed, e.g. to run partitioning tools against it, as well as
>>>>> individual partitions. This allows to create new filesystems in the
>>>>> image or loop-back mount exiting ones. Using the great mountlo tool
>>>>> from the FUSE collection [1][2], the latter can even be done by
>>>>> non-root
>>>>> users (the former anyway).
>>>>>
>>>>> There are some dependency to fulfill to gain all features: Partition
>>>>> scanning is done via recent libblkid (I used version 2.17.1). If this
>>>>> library is not available, only the disk file is provide. Fortunately,
>>>>> mountlo can do partition scanning as well ("-p n") to work around this.
>>>>>
>>>>> Moreover, libfuse>= 2.8 and a host kernel>= 2.6.29 is required for
>>>>> seamless disk access via fdisk. Otherwise, the BLKGETSIZE64 IOCTL
>>>>> cannot
>>>>> be provided, and the number of cylinders has to set explicitly (e.g.
>>>>> via
>>>>> "-C n").
>>>>>
>>>>> This work was inspired by Ashley Saulsbury's qemu-diskp [3].
>>>>>
>>>>> [1]
>>>>> http://sourceforge.net/apps/mediawiki/fuse/index.php?title=FileSystems#Mountlo
>>>>>
>>>>>
>>>>> [2] http://sourceforge.net/projects/fuse/files/mountlo/
>>>>> [3] http://www.saulsbury.org/software/virtualization.html
>>>>>
>>>>> Signed-off-by: Jan Kiszka<jan.kiszka@web.de>
>>>>>
>>>>>
>>>>>            
>>>> This has been proposed quite a few times.
>>>>
>>>> In fact, I wrote something like this prior to implementing qemu-nbd.
>>>>
>>>> The problem with fuse is that as default configured, you can't actually
>>>> enter into a fuse filesystem as root and since you need to be root to
>>>> loopback mount it, it pretty nasty from a usability perspective.
>>>>
>>>>          
>>> You don't, see mountlo.
>>>
>>>        
>> That definitely changes things.  I assume it just uses libe2fs et al to
>> display filesystem contents?
>>      
> Nope. It's a bit like libguestfs as it uses Linux to access the
> filesystems, but that Linux runs in UML mode, thus does not require any
> qemu/kvm underneath. It simply maps the FUSE requests on corresponding
> VFS services in the UML kernel.
>
>    
>> Does it preserve ownership?
>>      
> Yep.
>
>    
>> You still can't do things as root I take it which is problematic.
>>      
> At least my default config does not prevent running qemu-img map as root
> and then performing a classic "mount -o loop" on the partitions it
> provides. Or what do you mean?
>    

You need user_allow_other set in /etc/fuse.conf which isn't set by default.
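
To make that concrete, here is a minimal C sketch of how a tool could check for the setting before trying a mount with "-o allow_other". The function name conf_has_user_allow_other is hypothetical and not part of libfuse, and the parsing is an assumption kept deliberately simple: one option per line, with anything else (including "#" comment lines) ignored.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the given fuse.conf text contains a line that, after
 * trimming surrounding whitespace, is exactly "user_allow_other".
 * A commented-out "#user_allow_other" line does not match.
 */
static bool conf_has_user_allow_other(const char *conf_text)
{
    static const char key[] = "user_allow_other";
    const size_t key_len = sizeof(key) - 1;
    const char *line = conf_text;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *p = line;

        /* trim leading whitespace */
        while (len > 0 && isspace((unsigned char)*p)) {
            p++;
            len--;
        }
        /* trim trailing whitespace */
        while (len > 0 && isspace((unsigned char)p[len - 1])) {
            len--;
        }
        if (len == key_len && memcmp(p, key, key_len) == 0) {
            return true;
        }
        /* advance to the next line, or to the terminating NUL */
        line = eol ? eol + 1 : line + strlen(line);
    }
    return false;
}
```

A caller would read /etc/fuse.conf into a NUL-terminated buffer and pass it in; if the function returns false, a non-root user should not expect "-o allow_other" to be accepted.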

Regards,

Anthony Liguori
Jan Kiszka - March 26, 2010, 7:59 a.m.
Anthony Liguori wrote:
> On 03/25/2010 05:27 PM, Jan Kiszka wrote:
>> Anthony Liguori wrote:
>>   
>>> On 03/25/2010 04:46 PM, Jan Kiszka wrote:
>>>     
>>>> Anthony Liguori wrote:
>>>>
>>>>       
>>>>> On 03/25/2010 12:52 PM, Jan Kiszka wrote:
>>>>>
>>>>>         
>>>>>> This adds the "map" subcommand to qemu-img. It is able to expose the
>>>>>> raw
>>>>>> content of a disk image via a FUSE filesystem. Both the whole disk
>>>>>> can
>>>>>> be accessed, e.g. to run partitioning tools against it, as well as
>>>>>> individual partitions. This allows to create new filesystems in the
>>>>>> image or loop-back mount exiting ones. Using the great mountlo tool
>>>>>> from the FUSE collection [1][2], the latter can even be done by
>>>>>> non-root
>>>>>> users (the former anyway).
>>>>>>
>>>>>> There are some dependency to fulfill to gain all features: Partition
>>>>>> scanning is done via recent libblkid (I used version 2.17.1). If this
>>>>>> library is not available, only the disk file is provide. Fortunately,
>>>>>> mountlo can do partition scanning as well ("-p n") to work around
>>>>>> this.
>>>>>>
>>>>>> Moreover, libfuse>= 2.8 and a host kernel>= 2.6.29 is required for
>>>>>> seamless disk access via fdisk. Otherwise, the BLKGETSIZE64 IOCTL
>>>>>> cannot
>>>>>> be provided, and the number of cylinders has to set explicitly (e.g.
>>>>>> via
>>>>>> "-C n").
>>>>>>
>>>>>> This work was inspired by Ashley Saulsbury's qemu-diskp [3].
>>>>>>
>>>>>> [1]
>>>>>> http://sourceforge.net/apps/mediawiki/fuse/index.php?title=FileSystems#Mountlo
>>>>>>
>>>>>>
>>>>>>
>>>>>> [2] http://sourceforge.net/projects/fuse/files/mountlo/
>>>>>> [3] http://www.saulsbury.org/software/virtualization.html
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka<jan.kiszka@web.de>
>>>>>>
>>>>>>
>>>>>>            
>>>>> This has been proposed quite a few times.
>>>>>
>>>>> In fact, I wrote something like this prior to implementing qemu-nbd.
>>>>>
>>>>> The problem with fuse is that as default configured, you can't
>>>>> actually
>>>>> enter into a fuse filesystem as root and since you need to be root to
>>>>> loopback mount it, it pretty nasty from a usability perspective.
>>>>>
>>>>>          
>>>> You don't, see mountlo.
>>>>
>>>>        
>>> That definitely changes things.  I assume it just uses libe2fs et al to
>>> display filesystem contents?
>>>      
>> Nope. It's a bit like libguestfs as it uses Linux to access the
>> filesystems, but that Linux runs in UML mode, thus does not require any
>> qemu/kvm underneath. It simply maps the FUSE requests on corresponding
>> VFS services in the UML kernel.
>>
>>   
>>> Does it preserve ownership?
>>>      
>> Yep.
>>
>>   
>>> You still can't do things as root I take it which is problematic.
>>>      
>> At least my default config does not prevent running qemu-img map as root
>> and then performing a classic "mount -o loop" on the partitions it
>> provides. Or what do you mean?
>>    
> 
> You need user_allow_other set in /etc/fuse.conf which isn't set by default.

I don't see the need for sharing the mount. Either you are root, then
you can do this anyway. Or you are a normal user, and then the vision is
that you can do everything you need for setting up and maintaining guest
images without ever becoming root.

We aren't completely there yet. E.g., the Linux kernel blocks mknod of
devices although FUSE filesystems are automatically mounted with nodev.
But that should be fixable as well.

I think this approach already covers the majority of use cases of
manipulating guest images as normal user, and that without requiring
more than 500 lines of code here plus the external mountlo tool.

Jan
Christoph Hellwig - March 28, 2010, 11:02 a.m.
On Thu, Mar 25, 2010 at 06:52:59PM +0100, Jan Kiszka wrote:
> This adds the "map" subcommand to qemu-img. It is able to expose the raw
> content of a disk image via a FUSE filesystem. Both the whole disk can
> be accessed, e.g. to run partitioning tools against it, as well as
> individual partitions. This allows to create new filesystems in the
> image or loop-back mount exiting ones. Using the great mountlo tool
> from the FUSE collection [1][2], the latter can even be done by non-root
> users (the former anyway).

Is there a good reason to throw this into qemu-img instead of making
a separate qemu-fuse or similar tool?  It's doing something quite
different than the rest of qemu-img.
Jan Kiszka - March 29, 2010, 7:46 a.m.
Christoph Hellwig wrote:
> On Thu, Mar 25, 2010 at 06:52:59PM +0100, Jan Kiszka wrote:
>> This adds the "map" subcommand to qemu-img. It is able to expose the raw
>> content of a disk image via a FUSE filesystem. Both the whole disk can
>> be accessed, e.g. to run partitioning tools against it, as well as
>> individual partitions. This allows to create new filesystems in the
>> image or loop-back mount exiting ones. Using the great mountlo tool
>> from the FUSE collection [1][2], the latter can even be done by non-root
>> users (the former anyway).
> 
> Is there a good reason to throw this into qemu-img instead of making
> a separate qemu-fuse or similar tool?  It's doing something quite
> different than the rest of qemu-img.
> 

qemu-img is the Swiss Army knife for QEMU disk image manipulation (like git
is for everything around a git repository). So, IMHO, mapping the image
content into the host filesystem for further manipulation with standard
tools belongs in it.

If the "map" thing works out for most users, I could even imagine some
helper sub-command "mount" that encapsulates map and mountlo (or some
other unprivileged mounting mechanism). This should make it easier for
users to explore all possibilities they have when working with disk images.

Jan
Alexander Graf - March 29, 2010, 8:57 a.m.
On 29.03.2010, at 09:46, Jan Kiszka wrote:

> Christoph Hellwig wrote:
>> On Thu, Mar 25, 2010 at 06:52:59PM +0100, Jan Kiszka wrote:
>>> This adds the "map" subcommand to qemu-img. It is able to expose the raw
>>> content of a disk image via a FUSE filesystem. Both the whole disk can
>>> be accessed, e.g. to run partitioning tools against it, as well as
>>> individual partitions. This allows to create new filesystems in the
>>> image or loop-back mount exiting ones. Using the great mountlo tool
>>> from the FUSE collection [1][2], the latter can even be done by non-root
>>> users (the former anyway).
>> 
>> Is there a good reason to throw this into qemu-img instead of making
>> a separate qemu-fuse or similar tool?  It's doing something quite
>> different than the rest of qemu-img.
>> 
> 
> qemu-img is the swiss knife for QEMU disk image manipulation (like git
> is for everything around a git repository). So, IHMO, mapping the image
> content into the host filesystem for further manipulation with standard
> tools belongs to this.
> 
> If the "map" thing works out for most users, I could even imagine some
> helper sub-command "mount" that encapsulates map and mountlo (or some
> other unprivileged mounting mechanism). This should make it easier for
> users to explore all possibilities they have when working with disk images.

We also have a tool called "qemu-ext2" lying around that allows you to explore ext2-based file system contents on any backend supported by the qemu block layer.

IMHO the best move to do here (Anthony's idea) is to somehow get the full block layer into a library, move it out of qemu into a separate project and allow other tools in there too.

That move would vastly improve the situation of distributions too. I don't want a separate qemu-img coming from each of the Xen, KVM and Qemu packages. One is enough :-). And it could enable people experienced with the block layer to be the project maintainers, making that more valuable.


Alex
Jan Kiszka - March 29, 2010, 9:37 a.m.
Alexander Graf wrote:
> On 29.03.2010, at 09:46, Jan Kiszka wrote:
> 
>> Christoph Hellwig wrote:
>>> On Thu, Mar 25, 2010 at 06:52:59PM +0100, Jan Kiszka wrote:
>>>> This adds the "map" subcommand to qemu-img. It is able to expose the raw
>>>> content of a disk image via a FUSE filesystem. Both the whole disk can
>>>> be accessed, e.g. to run partitioning tools against it, as well as
>>>> individual partitions. This allows to create new filesystems in the
>>>> image or loop-back mount exiting ones. Using the great mountlo tool
>>>> from the FUSE collection [1][2], the latter can even be done by non-root
>>>> users (the former anyway).
>>> Is there a good reason to throw this into qemu-img instead of making
>>> a separate qemu-fuse or similar tool?  It's doing something quite
>>> different than the rest of qemu-img.
>>>
>> qemu-img is the swiss knife for QEMU disk image manipulation (like git
>> is for everything around a git repository). So, IHMO, mapping the image
>> content into the host filesystem for further manipulation with standard
>> tools belongs to this.
>>
>> If the "map" thing works out for most users, I could even imagine some
>> helper sub-command "mount" that encapsulates map and mountlo (or some
>> other unprivileged mounting mechanism). This should make it easier for
>> users to explore all possibilities they have when working with disk images.
> 
> We also have a tool called "qemu-ext2" lying around that allows you to explore ext2 based file system contents in any qemu block layer supported backend.

"we" == SUSE?

[ Wow - just typed "qemu-ext2" into Big Brother's search bar and found
the very same mail I'm just replying to. That's fast. ]

> 
> IMHO the best move to do here (Anthony's idea) is to somehow get the full block layer into a library, move it out of qemu into a separate project and allow other tools in there too.
> 
> That move would vastly improve the situation of distributions too. I don't want to have a qemu-img each coming from the Xen, KVM and Qemu packages. One is enough :-). And it could enable block layer experienced people to be the project maintainers, making that more valuable.
> 

Full ack.

Jan
Alexander Graf - March 29, 2010, 9:39 a.m.
On 29.03.2010, at 11:37, Jan Kiszka wrote:

> Alexander Graf wrote:
>> On 29.03.2010, at 09:46, Jan Kiszka wrote:
>> 
>>> Christoph Hellwig wrote:
>>>> On Thu, Mar 25, 2010 at 06:52:59PM +0100, Jan Kiszka wrote:
>>>>> This adds the "map" subcommand to qemu-img. It is able to expose the raw
>>>>> content of a disk image via a FUSE filesystem. Both the whole disk can
>>>>> be accessed, e.g. to run partitioning tools against it, as well as
>>>>> individual partitions. This allows to create new filesystems in the
>>>>> image or loop-back mount exiting ones. Using the great mountlo tool
>>>>> from the FUSE collection [1][2], the latter can even be done by non-root
>>>>> users (the former anyway).
>>>> Is there a good reason to throw this into qemu-img instead of making
>>>> a separate qemu-fuse or similar tool?  It's doing something quite
>>>> different than the rest of qemu-img.
>>>> 
>>> qemu-img is the swiss knife for QEMU disk image manipulation (like git
>>> is for everything around a git repository). So, IHMO, mapping the image
>>> content into the host filesystem for further manipulation with standard
>>> tools belongs to this.
>>> 
>>> If the "map" thing works out for most users, I could even imagine some
>>> helper sub-command "mount" that encapsulates map and mountlo (or some
>>> other unprivileged mounting mechanism). This should make it easier for
>>> users to explore all possibilities they have when working with disk images.
>> 
>> We also have a tool called "qemu-ext2" lying around that allows you to explore ext2 based file system contents in any qemu block layer supported backend.
> 
> "we" == SUSE?

"we" == "SUSE Studio" (in fact, Nat wrote it). It is GPL'ed, just not released yet. As soon as there will be a separate project with a broader scope than just qemu for the block layer, I'll happily invest the time to clean it up for upstream submission.


Alex

Patch

diff --git a/Makefile b/Makefile
index 57c354d..d5a1dae 100644
--- a/Makefile
+++ b/Makefile
@@ -126,10 +126,12 @@  bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS)
 
 ######################################################################
 
-qemu-img.o: qemu-img-cmds.h
+qemu-img.o: qemu-img.h qemu-img-cmds.h
 qemu-img.o qemu-tool.o qemu-nbd.o qemu-io.o: $(GENERATED_HEADERS)
 
-qemu-img$(EXESUF): qemu-img.o qemu-tool.o $(block-obj-y) $(qobject-obj-y)
+qemu-img-map.o: QEMU_CFLAGS += $(FUSE_CFLAGS) $(BLKID_CFLAGS)
+
+qemu-img$(EXESUF): $(qemu-img-y) $(block-obj-y) $(qobject-obj-y)
 
 qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o $(block-obj-y) $(qobject-obj-y)
 
diff --git a/Makefile.objs b/Makefile.objs
index 281f7a6..8a651d2 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -207,3 +207,9 @@  libdis-$(CONFIG_PPC_DIS) += ppc-dis.o
 libdis-$(CONFIG_S390_DIS) += s390-dis.o
 libdis-$(CONFIG_SH4_DIS) += sh4-dis.o
 libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o
+
+######################################################################
+# qemu-img
+
+qemu-img-y = qemu-img.o qemu-tool.o
+qemu-img-$(CONFIG_FUSE) += qemu-img-map.o
diff --git a/configure b/configure
index 6bc40a3..c84aaa9 100755
--- a/configure
+++ b/configure
@@ -263,6 +263,7 @@  vnc_tls=""
 vnc_sasl=""
 xen=""
 linux_aio=""
+fuse=""
 
 gprof="no"
 debug_tcg="no"
@@ -639,6 +640,10 @@  for opt do
   ;;
   --enable-linux-aio) linux_aio="yes"
   ;;
+  --disable-fuse) fuse="no"
+  ;;
+  --enable-fuse) fuse="yes"
+  ;;
   --enable-io-thread) io_thread="yes"
   ;;
   --disable-blobs) blobs="no"
@@ -801,6 +806,8 @@  echo "  --disable-vde            disable support for vde network"
 echo "  --enable-vde             enable support for vde network"
 echo "  --disable-linux-aio      disable Linux AIO support"
 echo "  --enable-linux-aio       enable Linux AIO support"
+echo "  --disable-fuse           disable support for FUSE in qemu-img"
+echo "  --enable-fuse            enable support for FUSE in qemu-img"
 echo "  --enable-io-thread       enable IO thread"
 echo "  --disable-blobs          disable installing provided firmware blobs"
 echo "  --kerneldir=PATH         look for kernel includes in PATH"
@@ -1586,6 +1593,44 @@  EOF
   fi
 fi
 
+##########################################
+# FUSE libraries probe
+if test "$fuse" != "no" ; then
+  fuse_cflags=`pkg-config --cflags fuse 2> /dev/null`
+  fuse_libs=`pkg-config --libs fuse 2> /dev/null`
+  cat > $TMPC << EOF
+#include <fuse.h>
+int main(int argc, const char *argv[])
+{
+    return fuse_main(argc, argv, NULL);
+}
+EOF
+  if compile_prog "$fuse_cflags" "$fuse_libs" ; then
+    fuse=yes
+    libs_tools="$fuse_libs $libs_tools"
+  else
+    if test "$fuse" = "yes" ; then
+      feature_not_found "FUSE"
+    fi
+    fuse=no
+  fi
+fi
+
+##########################################
+# blkid_partlist probe
+blkid_cflags=`pkg-config --cflags blkid 2> /dev/null`
+blkid_libs=`pkg-config --libs blkid 2> /dev/null`
+cat > $TMPC <<EOF
+#include <blkid.h>
+int main(void) { blkid_partlist ls; return 0; }
+EOF
+blkid_partlist=no
+if compile_prog "$blkid_cflags" "$blkid_libs" ; then
+  blkid_partlist=yes
+  libs_tools="$blkid_libs $libs_tools"
+fi
+
+
 #
 # Check for xxxat() functions when we are building linux-user
 # emulator.  This is done because older glibc versions don't
@@ -1962,6 +2007,8 @@  echo "PIE user targets  $user_pie"
 echo "vde support       $vde"
 echo "IO thread         $io_thread"
 echo "Linux AIO support $linux_aio"
+echo "FUSE support      $fuse"
+echo "partlist support  $blkid_partlist"
 echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
 echo "fdt support       $fdt"
@@ -2183,6 +2230,14 @@  fi
 if test "$fdatasync" = "yes" ; then
   echo "CONFIG_FDATASYNC=y" >> $config_host_mak
 fi
+if test "$fuse" = "yes" ; then
+  echo "CONFIG_FUSE=y" >> $config_host_mak
+  echo "FUSE_CFLAGS=$fuse_cflags" >> $config_host_mak
+fi
+if test "$blkid_partlist" = "yes" ; then
+  echo "CONFIG_BLKID_PARTLIST=y" >> $config_host_mak
+  echo "BLKID_CFLAGS=$blkid_cflags" >> $config_host_mak
+fi
 
 # XXX: suppress that
 if [ "$bsd" = "yes" ] ; then
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index f96876a..94c6e66 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -49,5 +49,16 @@  DEF("rebase", img_rebase,
     "rebase [-f fmt] [-u] -b backing_file [-F backing_fmt] filename")
 STEXI
 @item rebase [-f @var{fmt}] [-u] -b @var{backing_file} [-F @var{backing_fmt}] @var{filename}
+ETEXI
+
+#ifdef CONFIG_FUSE
+DEF("map", img_map,
+    "map [-F fmt] [<FUSE options>] filename mountpoint")
+#endif
+STEXI
+@item map [-F @var{fmt}] [@var{FUSE options}] @var{filename} @var{mountpoint}
+ETEXI
+
+STEXI
 @end table
 ETEXI
diff --git a/qemu-img-map.c b/qemu-img-map.c
new file mode 100644
index 0000000..cd6bbf4
--- /dev/null
+++ b/qemu-img-map.c
@@ -0,0 +1,438 @@ 
+/*
+ * QEMU disk image utility
+ *
+ * Copyright (c) 2010 Jan Kiszka
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu-img.h"
+#include "qemu-option.h"
+#include "osdep.h"
+#include "block_int.h"
+#include <stdio.h>
+#include <getopt.h>
+#include <pthread.h>
+#include <signal.h>
+
+#define FUSE_USE_VERSION 28
+#include <fuse.h>
+
+#ifdef CONFIG_LINUX
+#include <linux/fs.h>
+#endif
+
+#define ENTRY_INVALID   1
+#define ENTRY_DIRTY     2
+
+#define ENTRY_PATH_MAX  16
+
+struct map_entry {
+    struct map_entry *next;
+    const char       *path;
+    size_t           size;
+    off_t            offset;
+    unsigned int     use_counter;
+    unsigned int     flags;
+};
+
+static struct stat img_stat;
+static BlockDriverState *img_bs;
+static struct map_entry disk_entry = { .path = "/disk" };
+static char *disk_path;
+
+#ifdef CONFIG_BLKID_PARTLIST
+
+#include <blkid.h>
+
+static pthread_t reader_thread;
+static sigset_t wakeup_sigset;
+static pthread_mutex_t entry_lock = PTHREAD_MUTEX_INITIALIZER;
+static struct map_entry *last_entry = &disk_entry;
+
+static void *partition_reader(void *unused)
+{
+    struct map_entry *entry;
+    blkid_partition par;
+    blkid_partlist ls;
+    blkid_probe pr;
+    int nparts, i;
+    char *path;
+
+    while (sigwaitinfo(&wakeup_sigset, NULL) >= 0) {
+        pr = blkid_new_probe_from_filename(disk_path);
+        if (!pr) {
+            continue;
+        }
+
+        ls = blkid_probe_get_partitions(pr);
+        if (!ls) {
+            blkid_free_probe(pr);
+            continue;
+        }
+
+        nparts = blkid_partlist_numof_partitions(ls);
+
+        for (i = 0; i < nparts; i++) {
+            entry = calloc(1, sizeof(*entry));
+            if (!entry) {
+                continue;
+            }
+            path = malloc(ENTRY_PATH_MAX);
+            if (!path) {
+                free(entry);
+                continue;
+            }
+
+            par = blkid_partlist_get_partition(ls, i);
+
+            snprintf(path, ENTRY_PATH_MAX, "/partition%d",
+                     blkid_partition_get_partno(par));
+            entry->path = path;
+            entry->size = blkid_partition_get_size(par) * BDRV_SECTOR_SIZE;
+            entry->offset = blkid_partition_get_start(par) * BDRV_SECTOR_SIZE;
+
+            pthread_mutex_lock(&entry_lock);
+
+            last_entry->next = entry;
+            last_entry = entry;
+
+            pthread_mutex_unlock(&entry_lock);
+        }
+
+        blkid_free_probe(pr);
+    }
+
+    return NULL;
+}
+
+static void update_partitions(void)
+{
+    struct map_entry *entry = disk_entry.next;
+    struct map_entry *old;
+
+    /* release old partitions */
+    pthread_mutex_lock(&entry_lock);
+
+    while (entry) {
+        old = entry;
+        entry = entry->next;
+        if (old->use_counter == 0) {
+            free((void *)old->path);
+            free(old);
+        } else {
+            old->flags = ENTRY_INVALID;
+        }
+    }
+
+    disk_entry.next = NULL;
+    last_entry = &disk_entry;
+
+    disk_entry.flags &= ~ENTRY_DIRTY;
+
+    pthread_mutex_unlock(&entry_lock);
+
+    /* kick off partition table scan */
+    pthread_kill(reader_thread, SIGUSR1);
+}
+
+static void init_reader_thread(void)
+{
+    sigemptyset(&wakeup_sigset);
+    sigaddset(&wakeup_sigset, SIGUSR1);
+    sigprocmask(SIG_BLOCK, &wakeup_sigset, NULL);
+
+    if (pthread_create(&reader_thread, NULL, partition_reader, NULL)) {
+        error("Could not spawn partition reader thread");
+    }
+}
+
+#else /* !CONFIG_BLKID_PARTLIST */
+
+static inline void update_partitions(void) { }
+static inline void init_reader_thread(void) { }
+
+#endif /* !CONFIG_BLKID_PARTLIST */
+
+static struct map_entry *find_map_entry(const char *path)
+{
+    struct map_entry *entry = &disk_entry;
+
+    do {
+        if (strcmp(entry->path, path) == 0) {
+            break;
+        }
+        entry = entry->next;
+    } while (entry);
+
+    return entry;
+}
+
+static void *map_init(struct fuse_conn_info *conn)
+{
+    init_reader_thread();
+    update_partitions();
+    return NULL;
+}
+
+static int map_getattr(const char *path, struct stat *stbuf)
+{
+    struct map_entry *entry;
+    int res = 0;
+
+    memset(stbuf, 0, sizeof(struct stat));
+    stbuf->st_uid   = img_stat.st_uid;
+    stbuf->st_gid   = img_stat.st_gid;
+    stbuf->st_atime = img_stat.st_atime;
+    stbuf->st_mtime = img_stat.st_mtime;
+    stbuf->st_ctime = img_stat.st_ctime;
+
+    if (strcmp(path, "/") == 0) {
+        stbuf->st_mode = S_IFDIR | 0111 | img_stat.st_mode;
+        stbuf->st_nlink = 2;
+    } else {
+        entry = find_map_entry(path);
+        if (entry) {
+            stbuf->st_mode = S_IFREG | img_stat.st_mode;
+            stbuf->st_nlink = 1;
+            stbuf->st_size = entry->size;
+        } else {
+            res = -ENOENT;
+        }
+    }
+
+    return res;
+}
+
+static int map_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
+                       off_t offset, struct fuse_file_info *fi)
+{
+    struct map_entry *entry;
+
+    if (strcmp(path, "/") != 0) {
+        return -ENOENT;
+    }
+    filler(buf, ".", NULL, 0);
+    filler(buf, "..", NULL, 0);
+    for (entry = &disk_entry; entry; entry = entry->next) {
+        filler(buf, entry->path+1, NULL, 0);
+    }
+
+    return 0;
+}
+
+static int map_open(const char *path, struct fuse_file_info *fi)
+{
+    struct map_entry *entry = find_map_entry(path);
+
+    if (!entry) {
+        return -ENOENT;
+    }
+
+    entry->use_counter++;
+    fi->fh = (uint64_t)entry;
+
+    return 0;
+}
+
+static int map_release(const char *path, struct fuse_file_info *fi)
+{
+    struct map_entry *entry = (struct map_entry *)fi->fh;
+
+    entry->use_counter--;
+
+    if (entry == &disk_entry && entry->flags & ENTRY_DIRTY) {
+        update_partitions();
+    }
+    if (entry->flags & ENTRY_INVALID && entry->use_counter == 0) {
+        free((void *)entry->path);
+        free(entry);
+    }
+
+    return 0;
+}
+
+static int map_read(const char *path, char *buf, size_t size, off_t offset,
+                    struct fuse_file_info *fi)
+{
+    struct map_entry *entry = (struct map_entry *)fi->fh;
+    int err;
+
+    if (entry->flags & ENTRY_INVALID) {
+        return -ENOENT;
+    }
+
+    if (offset + size > entry->size) {
+        size = entry->size - offset;
+    }
+
+    err = bdrv_read(img_bs, (entry->offset + offset) / BDRV_SECTOR_SIZE,
+                    (uint8_t*)buf, size / BDRV_SECTOR_SIZE);
+    if (err) {
+        return err;
+    }
+
+    return size;
+}
+
+static int map_write(const char *path, const char *buf, size_t size,
+                     off_t offset, struct fuse_file_info *fi)
+{
+    struct map_entry *entry = (struct map_entry *)fi->fh;
+    int err;
+
+    if (entry->flags & ENTRY_INVALID) {
+        return -ENOENT;
+    }
+
+    err = bdrv_write(img_bs, (entry->offset + offset) / BDRV_SECTOR_SIZE,
+                     (uint8_t*)buf, size / BDRV_SECTOR_SIZE);
+    if (err) {
+        return err;
+    }
+
+    entry->flags |= ENTRY_DIRTY;
+
+    return size;
+}
+
+#if FUSE_VERSION >= 28
+static int map_ioctl(const char *path, int cmd, void *arg,
+                     struct fuse_file_info *fi, unsigned int flags, void *data)
+{
+    struct map_entry *entry = (struct map_entry *)fi->fh;
+
+    if (entry->flags & ENTRY_INVALID) {
+        return -ENOENT;
+    }
+
+    switch (cmd) {
+#ifdef CONFIG_LINUX
+    case BLKGETSIZE64:
+        *(uint64_t *)data = entry->size;
+        return 0;
+#endif /* CONFIG_LINUX */
+    default:
+        return -ENOTTY;
+    }
+}
+#endif /* FUSE_VERSION >= 28 */
+
+static struct fuse_operations map_ops = {
+    .init    = map_init,
+    .getattr = map_getattr,
+    .readdir = map_readdir,
+    .open    = map_open,
+    .release = map_release,
+    .read    = map_read,
+    .write   = map_write,
+#if FUSE_VERSION >= 28
+    .ioctl   = map_ioctl,
+#endif
+};
+
+static void QEMU_NORETURN map_help(struct fuse_args *args)
+{
+    printf("usage: qemu-img map [-F fmt] [FUSE options] filename mountpoint\n"
+           "\ngeneral options:\n"
+           "    -o opt,[opt...]         mount options\n"
+           "    -h    --help            print help\n"
+           "    -V    --version         print version\n"
+           "\nqemu-img options:\n"
+           "    -F fmt                  image format\n\n");
+    fuse_opt_add_arg(args, "-ho");
+    fuse_main(args->argc, args->argv, &map_ops, NULL);
+    exit(1);
+}
+
+int img_map(int argc, char **argv)
+{
+    struct fuse_args args = FUSE_ARGS_INIT(0, NULL);
+    const char *filename = NULL;
+    const char *fmt = NULL;
+    const char *mountpoint;
+    char *fs_name;
+    uint64_t size;
+
+    fuse_opt_add_arg(&args, argv[0]);
+    fuse_opt_add_arg(&args, "-o");
+    fuse_opt_add_arg(&args, "subtype=qemu-img-map");
+
+    /* block layer is not thread-safe */
+    fuse_opt_add_arg(&args, "-s");
+
+    for (;;) {
+        static const struct option long_opts[] = {
+            { "help", 0, NULL, 'h' },
+            { "version", 0, NULL, 'v' },
+            { NULL, 0, NULL, 0 }
+        };
+        int c;
+
+        c = getopt_long(argc, argv, "F:dfsho:", long_opts, NULL);
+        if (c < 0) {
+            break;
+        }
+        switch (c) {
+        case 'h':
+            map_help(&args);
+            break;
+        case 'F':
+            fmt = optarg;
+            break;
+        case 'o':
+            fuse_opt_add_arg(&args, "-o");
+            fuse_opt_add_arg(&args, optarg);
+            break;
+        case 'd':
+            fuse_opt_add_arg(&args, "-d");
+            break;
+        case 'f':
+            fuse_opt_add_arg(&args, "-f");
+            break;
+        default:
+            /* ignore -s, we enforce it anyway */
+            break;
+        }
+    }
+    if (optind + 1 >= argc) {
+        map_help(&args);
+    }
+
+    filename = argv[optind++];
+
+    size = strlen(filename) + 8;
+    fs_name = malloc(size);
+    if (!fs_name) {
+        error("Not enough memory");
+    }
+    snprintf(fs_name, size, "fsname=%s", filename);
+    fuse_opt_insert_arg(&args, 1, "-o");
+    fuse_opt_insert_arg(&args, 2, fs_name);
+    free(fs_name);
+
+    mountpoint = argv[optind];
+    fuse_opt_add_arg(&args, mountpoint);
+
+    size = strlen(mountpoint) + strlen(disk_entry.path) + 1;
+    disk_path = malloc(size);
+    if (!disk_path) {
+        error("Not enough memory");
+    }
+    snprintf(disk_path, size, "%s%s", mountpoint, disk_entry.path);
+
+    if (stat(filename, &img_stat) < 0) {
+        perror("Unable to process image file");
+        exit(1);
+    }
+    img_stat.st_mode &= S_IRWXU | S_IRWXG | S_IRWXO;
+
+    img_bs = bdrv_new_open(filename, fmt, 0);
+    if (!img_bs) {
+        error("Could not open '%s'", filename);
+    }
+    bdrv_get_geometry(img_bs, &size);
+    disk_entry.size = size * BDRV_SECTOR_SIZE;
+
+    return fuse_main(args.argc, args.argv, &map_ops, NULL);
+}
diff --git a/qemu-img.c b/qemu-img.c
index 9b28664..28b8427 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -21,7 +21,7 @@ 
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
-#include "qemu-common.h"
+#include "qemu-img.h"
 #include "qemu-option.h"
 #include "osdep.h"
 #include "block_int.h"
@@ -39,7 +39,7 @@  typedef struct img_cmd_t {
 /* Default to cache=writeback as data integrity is not important for qemu-tcg. */
 #define BRDV_O_FLAGS BDRV_O_CACHE_WB
 
-static void QEMU_NORETURN error(const char *fmt, ...)
+void QEMU_NORETURN error(const char *fmt, ...)
 {
     va_list ap;
     va_start(ap, fmt);
@@ -97,6 +97,9 @@  static void help(void)
     printf("%s\nSupported formats:", help_msg);
     bdrv_iterate_format(format_print, NULL);
     printf("\n");
+#ifdef CONFIG_FUSE
+    printf("\nInvoke 'qemu-img map --help' to list FUSE options.\n");
+#endif
     exit(1);
 }
 
@@ -188,9 +191,9 @@  static int read_password(char *buf, int buf_size)
 }
 #endif
 
-static BlockDriverState *bdrv_new_open(const char *filename,
-                                       const char *fmt,
-                                       int readonly)
+BlockDriverState *bdrv_new_open(const char *filename,
+                                const char *fmt,
+                                int readonly)
 {
     BlockDriverState *bs;
     BlockDriver *drv;
diff --git a/qemu-img.h b/qemu-img.h
new file mode 100644
index 0000000..1bf0f27
--- /dev/null
+++ b/qemu-img.h
@@ -0,0 +1,13 @@ 
+#ifndef QEMU_IMG_H
+#define QEMU_IMG_H
+
+#include "qemu-common.h"
+
+void QEMU_NORETURN error(const char *fmt, ...);
+BlockDriverState *bdrv_new_open(const char *filename,
+                                const char *fmt,
+                                int readonly);
+
+int img_map(int argc, char **argv);
+
+#endif
diff --git a/qemu-img.texi b/qemu-img.texi
index ac97854..a85f454 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -106,6 +106,16 @@  they are displayed too.
 @item snapshot [-l | -a @var{snapshot} | -c @var{snapshot} | -d @var{snapshot} ] @var{filename}
 
 List, apply, create or delete snapshots in image @var{filename}.
+
+@item map [-F @var{fmt}] [@var{FUSE options}] @var{filename} @var{mountpoint}
+
+Make a disk image accessible via pseudo devices under @var{mountpoint}. This
+command will expose the whole raw image as well as individual partitions, the
+latter depending on the parsing capabilities of libblkid. The exposed disk
+device file can be passed to partitioning tools, and any device file containing
+a valid filesystem can be loop-back mounted to access its content (e.g. via
+mountlo without any root privileges). For the full list of FUSE-related
+options, invoke @code{qemu-img map --help}.
 @end table
 
 Supported image file formats:
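
As a side note on the read path in qemu-img-map.c: each exposed file (the whole disk or one partition) is a byte region inside the image, and a request at a file-relative offset is translated into an image sector number and a sector count. The following is a minimal standalone sketch of that arithmetic, not the patch's code. The names struct region, clamp_request, first_sector and sector_count are hypothetical, and SECTOR_SIZE is assumed to be 512 to match BDRV_SECTOR_SIZE. The sketch also adds an explicit check for offsets at or beyond the end of the region, so that the unsigned subtraction cannot wrap.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512  /* assumption: same value as BDRV_SECTOR_SIZE */

/* Byte range of one exposed file within the disk image (hypothetical type). */
struct region {
    uint64_t offset;  /* start of the region in the image, in bytes */
    uint64_t size;    /* length of the region, in bytes */
};

/*
 * Clamp a request of `size` bytes at region-relative `offset` so that it
 * does not run past the end of the region. Returns 0 if the offset is at
 * or beyond the end.
 */
static size_t clamp_request(const struct region *r, uint64_t offset,
                            size_t size)
{
    if (offset >= r->size) {
        return 0;
    }
    if (size > r->size - offset) {
        size = (size_t)(r->size - offset);
    }
    return size;
}

/* Image sector that holds the byte at region-relative `offset`. */
static uint64_t first_sector(const struct region *r, uint64_t offset)
{
    return (r->offset + offset) / SECTOR_SIZE;
}

/* Number of whole sectors covered by `size` bytes. */
static uint64_t sector_count(size_t size)
{
    return size / SECTOR_SIZE;
}
```

For a partition that starts at sector 2048, a read at file offset 4096 begins at image sector 2056. Like the patch, the sketch only handles sector-aligned requests; unaligned offsets or sizes are rounded down by the integer division.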