
[PATCHv5] block: add native support for NFS

Message ID 1388062130-21126-1-git-send-email-pl@kamp.de
State New

Commit Message

Peter Lieven Dec. 26, 2013, 12:48 p.m. UTC
This patch adds native support for accessing images on NFS shares without
the requirement to actually mount the entire NFS share on the host.

NFS images can simply be specified by a URL of the form:
nfs://<host>/<export>/<filename>

For example:
qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2

You need LibNFS from Ronnie Sahlberg available at:
   git://github.com/sahlberg/libnfs.git
for this to work.

During configure, libnfs is probed for automatically and support
is enabled if it is found. You can forbid or enforce libnfs support
with --disable-libnfs or --enable-libnfs respectively.

Due to NFS restrictions you might need to execute your binaries
as root, allow them to open privileged ports (<1024) or specify
the insecure option on the NFS server.

For additional information on ROOT vs. non-ROOT operation and URL
format + parameters see:
   https://raw.github.com/sahlberg/libnfs/master/README

LibNFS currently supports NFS version 3 only.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
v4->v5:
 - discussed with Ronnie and decided to move URL + parameter parsing to LibNFS.
   This allows for URL parameter processing directly in LibNFS without altering
   the qemu NFS block driver. This bumps the version requirement for LibNFS
   to 1.9.0 though.
 - added a pointer to the LibNFS README where additional information about
   ROOT privilege requirements can be found, as this raised a few concerns.
 - removed a trailing dot in an error statement [Fam].

v3->v4:
 - finally added full implementation of bdrv_get_allocated_file_size [Stefan]
 - removed trailing \n from error statements [Stefan]

v2->v3:
 - rebased onto stefanha/block
 - use pkg_config to check for libnfs (ignoring cflags which are broken in 1.8.0) [Stefan]
 - fixed NFSClient declaration [Stefan]
 - renamed Task variables to task [Stefan]
 - renamed NFSTask to NFSRPC [Ronnie]
 - do not update bs->total_sectors in nfs_co_writev [Stefan]
 - return -ENOMEM on all async call failures [Stefan,Ronnie]
 - fully implement ftruncate
 - use util/uri.c for URL parsing [Stefan]
 - reworked nfs_file_open_common to nfs_client_open which works on NFSClient [Stefan]
 - added a comment to the connect message that libnfs supports NFSv3 only at the moment.
 - DID NOT add full implementation of bdrv_get_allocated_file_size because
   we are not in a coroutine context and I cannot do an async call here.
   I could do a sync call if there were a guarantee that no requests
   are in flight. [Stefan]

v1->v2:
 - fixed block/Makefile.objs [Ronnie]
 - do not always register a read handler [Ronnie]
 - add support for reading beyond EOF [Fam]
 - fixed struct and parameter naming [Fam]
 - fixed overlong lines and whitespace errors [Fam]
 - return the return status from libnfs wherever possible [Fam]
 - added comment why we set allocated_file_size to -ENOTSUP after write [Fam]
 - avoid segfault when parsing filename [Fam]
 - remove unused close_bh from NFSClient [Fam]
 - avoid dividing and multiplying total_size by BDRV_SECTOR_SIZE in nfs_file_create [Fam]

 MAINTAINERS         |    5 +
 block/Makefile.objs |    1 +
 block/nfs.c         |  405 +++++++++++++++++++++++++++++++++++++++++++++++++++
 configure           |   26 ++++
 4 files changed, 437 insertions(+)
 create mode 100644 block/nfs.c

Comments

Stefan Hajnoczi Jan. 3, 2014, 10:37 a.m. UTC | #1
Looks good.  In order to merge this new block driver qemu-iotests
support for nfs is required.  That way the block driver can be exercised
and checked for regressions (I guess you performed manual testing
during development).

Please see tests/qemu-iotests/common for examples of
NBD/SSH/Sheepdog/etc support.

The qemu-iotests test suite with raw, qcow2, and vmdk formats should
work on top of NFS.  Assuming you have an NFS server already running
on localhost, something like the following should succeed:

  cd tests/qemu-iotests
  ln -s ../../x86_64-softmmu/qemu-system-x86_64 qemu
  ln -s ../../qemu-img .
  ln -s ../../qemu-io .
  ./check -nfs # raw format by default
  ./check -nfs -qcow2
  ./check -nfs -vmdk

Maybe -nfs should take the base NFS URI as an argument to allow more
flexible test configurations.  It's up to you.

More info on qemu-iotests: http://qemu-project.org/Documentation/QemuIoTests
Peter Lieven Jan. 3, 2014, 10:51 a.m. UTC | #2
On 03.01.2014 11:37, Stefan Hajnoczi wrote:
> Looks good.  In order to merge this new block driver qemu-iotests
> support for nfs is required.  That way the block driver can be exercised
> and checked for regressions (I guess you performed manual testing
> during development).
>
> Please see tests/qemu-iotests/common for examples of
> NBD/SSH/Sheepdog/etc support.
>
> The qemu-iotests test suite with raw, qcow2, and vmdk formats should
> work on top of NFS.  Assuming you have an NFS server already running
> on localhost, something like the following should succeed:
>
>    cd tests/qemu-iotests
>    ln -s ../../x86_64-softmmu/qemu-system-x86_64 qemu
>    ln -s ../../qemu-img .
>    ln -s ../../qemu-io .
>    ./check -nfs # raw format by default
>    ./check -nfs -qcow2
>    ./check -nfs -vmdk
>
> Maybe -nfs should take the base NFS URI as an argument to allow more
> flexible test configurations.  It's up to you.
>
> More info on qemu-iotests: http://qemu-project.org/Documentation/QemuIoTests
That would have been good to know earlier ;-)

Will look into this.

Peter
Peter Lieven Jan. 3, 2014, 11:04 a.m. UTC | #3
On 03.01.2014 11:37, Stefan Hajnoczi wrote:
> Looks good.  In order to merge this new block driver qemu-iotests
> support for nfs is required.  That way the block driver can be exercised
> and checked for regressions (I guess you performed manual testing
> during development).
>
> Please see tests/qemu-iotests/common for examples of
> NBD/SSH/Sheepdog/etc support.
>
> The qemu-iotests test suite with raw, qcow2, and vmdk formats should
> work on top of NFS.  Assuming you have an NFS server already running
> on localhost, something like the following should succeed:
>
>    cd tests/qemu-iotests
>    ln -s ../../x86_64-softmmu/qemu-system-x86_64 qemu
>    ln -s ../../qemu-img .
>    ln -s ../../qemu-io .
>    ./check -nfs # raw format by default
>    ./check -nfs -qcow2
>    ./check -nfs -vmdk
>
> Maybe -nfs should take the base NFS URI as an argument to allow more
> flexible test configurations.  It's up to you.
>
> More info on qemu-iotests: http://qemu-project.org/Documentation/QemuIoTests
I need to exclude test 063.

rm -f nfs://

obviously does not work. I wonder how sheepdog and rbd passed this test...

Peter
Peter Lieven Jan. 3, 2014, 11:28 a.m. UTC | #4
On 03.01.2014 11:37, Stefan Hajnoczi wrote:
> Looks good.  In order to merge this new block driver qemu-iotests
> support for nfs is required.  That way the block driver can be exercised
> and checked for regressions (I guess you performed manual testing
> during development).
>
> Please see tests/qemu-iotests/common for examples of
> NBD/SSH/Sheepdog/etc support.
>
> The qemu-iotests test suite with raw, qcow2, and vmdk formats should
> work on top of NFS.  Assuming you have an NFS server already running
> on localhost, something like the following should succeed:
>
>    cd tests/qemu-iotests
>    ln -s ../../x86_64-softmmu/qemu-system-x86_64 qemu
>    ln -s ../../qemu-img .
>    ln -s ../../qemu-io .
>    ./check -nfs # raw format by default
>    ./check -nfs -qcow2
>    ./check -nfs -vmdk
>
> Maybe -nfs should take the base NFS URI as an argument to allow more
> flexible test configurations.  It's up to you.
>
> More info on qemu-iotests: http://qemu-project.org/Documentation/QemuIoTests
it seems that several tests are broken since they use commands like
rm -f or mv although they are tagged as protocol generic. Shall I fix this?

Peter
Stefan Hajnoczi Jan. 6, 2014, 1:18 a.m. UTC | #5
On Fri, Jan 03, 2014 at 12:28:31PM +0100, Peter Lieven wrote:
> On 03.01.2014 11:37, Stefan Hajnoczi wrote:
> >Looks good.  In order to merge this new block driver qemu-iotests
> >support for nfs is required.  That way the block driver can be exercised
> >and checked for regressions (I guess you performed manual testing
> >during development).
> >
> >Please see tests/qemu-iotests/common for examples of
> >NBD/SSH/Sheepdog/etc support.
> >
> >The qemu-iotests test suite with raw, qcow2, and vmdk formats should
> >work on top of NFS.  Assuming you have an NFS server already running
> >on localhost, something like the following should succeed:
> >
> >   cd tests/qemu-iotests
> >   ln -s ../../x86_64-softmmu/qemu-system-x86_64 qemu
> >   ln -s ../../qemu-img .
> >   ln -s ../../qemu-io .
> >   ./check -nfs # raw format by default
> >   ./check -nfs -qcow2
> >   ./check -nfs -vmdk
> >
> >Maybe -nfs should take the base NFS URI as an argument to allow more
> >flexible test configurations.  It's up to you.
> >
> >More info on qemu-iotests: http://qemu-project.org/Documentation/QemuIoTests
> it seems that several tests are broken since they use commands like
> rm -f or mv although they are tagged as protocol generic. Shall I fix this?

The _cleanup_test_img() function in common.rc has cases for
nbd/rbd/sheepdog.  They don't invoke regular rm(1).  Maybe you need to
add a case for NFS there too.

If you find something broken and can fix it that's great.  If you cannot
fix it please report it so the nbd/rbd/sheepdog/ssh/etc maintainers can
investigate.

Stefan
Peter Lieven Jan. 6, 2014, 6:53 a.m. UTC | #6
On 06.01.2014 02:18, Stefan Hajnoczi wrote:
> On Fri, Jan 03, 2014 at 12:28:31PM +0100, Peter Lieven wrote:
>> On 03.01.2014 11:37, Stefan Hajnoczi wrote:
>>> Looks good.  In order to merge this new block driver qemu-iotests
>>> support for nfs is required.  That way the block driver can be exercised
>>> and checked for regressions (I guess you performed manual testing
>>> during development).
>>>
>>> Please see tests/qemu-iotests/common for examples of
>>> NBD/SSH/Sheepdog/etc support.
>>>
>>> The qemu-iotests test suite with raw, qcow2, and vmdk formats should
>>> work on top of NFS.  Assuming you have an NFS server already running
>>> on localhost, something like the following should succeed:
>>>
>>>    cd tests/qemu-iotests
>>>    ln -s ../../x86_64-softmmu/qemu-system-x86_64 qemu
>>>    ln -s ../../qemu-img .
>>>    ln -s ../../qemu-io .
>>>    ./check -nfs # raw format by default
>>>    ./check -nfs -qcow2
>>>    ./check -nfs -vmdk
>>>
>>> Maybe -nfs should take the base NFS URI as an argument to allow more
>>> flexible test configurations.  It's up to you.
>>>
>>> More info on qemu-iotests: http://qemu-project.org/Documentation/QemuIoTests
>> it seems that several tests are broken since they use commands like
>> rm -f or mv although they are tagged as protocol generic. Shall I fix this?
> The _cleanup_test_img() function in common.rc has cases for
> nbd/rbd/sheepdog.  They don't invoke regular rm(1).  Maybe you need to
> add a case for NFS there too.
>
> If you find something broken and can fix it that's great.  If you cannot
> fix it please report it so the nbd/rbd/sheepdog/ssh/etc maintainers can
> investigate.

I was not thinking of the cleanup. It's mainly that almost every test which
does not deal with the raw format is not working. Please see my v2
of the patch for the NFS tests.

In general, Ronnie and I thought of an LD_PRELOAD library which enables
every POSIX binary to talk to an NFS share. In this case even qcow2.py would
work out of the box.

Peter
Kevin Wolf Jan. 9, 2014, 2:13 p.m. UTC | #7
On 26.12.2013 at 13:48, Peter Lieven wrote:
> This patch adds native support for accessing images on NFS shares without
> the requirement to actually mount the entire NFS share on the host.
> 
> NFS images can simply be specified by a URL of the form:
> nfs://<host>/<export>/<filename>
> 
> For example:
> qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
> 
> You need LibNFS from Ronnie Sahlberg available at:
>    git://github.com/sahlberg/libnfs.git
> for this to work.
> 
> During configure, libnfs is probed for automatically and support
> is enabled if it is found. You can forbid or enforce libnfs support
> with --disable-libnfs or --enable-libnfs respectively.
> 
> Due to NFS restrictions you might need to execute your binaries
> as root, allow them to open privileged ports (<1024) or specify
> the insecure option on the NFS server.
> 
> For additional information on ROOT vs. non-ROOT operation and URL
> format + parameters see:
>    https://raw.github.com/sahlberg/libnfs/master/README
> 
> LibNFS currently supports NFS version 3 only.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> v4->v5:
>  - discussed with Ronnie and decided to move URL + parameter parsing to LibNFS.
>    This allows for URL parameter processing directly in LibNFS without altering
>    the qemu NFS block driver. This bumps the version requirement for LibNFS
>    to 1.9.0 though.

Considering that we'll likely want to add new options in the future, I'm
not sure if this is a good idea. This means that struct nfs_url will
change, and if qemu isn't updated, it might not even notice that some
option was requested in a new field that it doesn't know about and doesn't provide,
so it will silently ignore it. Or if we have a qemu built against an
older libnfs, this must be marked as an incompatible ABI change, so it
can't run at all with the newer libnfs version.

>  - added a pointer to the LibNFS README where additional information about
>    ROOT privilege requirements can be found, as this raised a few concerns.
>  - removed a trailing dot in an error statement [Fam].
> 
> v3->v4:
>  - finally added full implementation of bdrv_get_allocated_file_size [Stefan]
>  - removed trailing \n from error statements [Stefan]
> 
> v2->v3:
>  - rebased onto stefanha/block
>  - use pkg_config to check for libnfs (ignoring cflags which are broken in 1.8.0) [Stefan]
>  - fixed NFSClient declaration [Stefan]
>  - renamed Task variables to task [Stefan]
>  - renamed NFSTask to NFSRPC [Ronnie]
>  - do not update bs->total_sectors in nfs_co_writev [Stefan]
>  - return -ENOMEM on all async call failures [Stefan,Ronnie]
>  - fully implement ftruncate
>  - use util/uri.c for URL parsing [Stefan]
>  - reworked nfs_file_open_common to nfs_client_open which works on NFSClient [Stefan]
>  - added a comment to the connect message that libnfs supports NFSv3 only at the moment.
>  - DID NOT add full implementation of bdrv_get_allocated_file_size because
>    we are not in a coroutine context and I cannot do an async call here.
>    I could do a sync call if there were a guarantee that no requests
>    are in flight. [Stefan]
> 
> v1->v2:
>  - fixed block/Makefile.objs [Ronnie]
>  - do not always register a read handler [Ronnie]
>  - add support for reading beyond EOF [Fam]
>  - fixed struct and parameter naming [Fam]
>  - fixed overlong lines and whitespace errors [Fam]
>  - return the return status from libnfs wherever possible [Fam]
>  - added comment why we set allocated_file_size to -ENOTSUP after write [Fam]
>  - avoid segfault when parsing filename [Fam]
>  - remove unused close_bh from NFSClient [Fam]
>  - avoid dividing and multiplying total_size by BDRV_SECTOR_SIZE in nfs_file_create [Fam]
> 
>  MAINTAINERS         |    5 +
>  block/Makefile.objs |    1 +
>  block/nfs.c         |  405 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  configure           |   26 ++++

qapi-schema.json is missing, so you can't add NFS block devices using
blockdev-add.

>  4 files changed, 437 insertions(+)
>  create mode 100644 block/nfs.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a5ab8f8..09996ab 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -935,6 +935,11 @@ M: Peter Lieven <pl@kamp.de>
>  S: Supported
>  F: block/iscsi.c
>  
> +NFS
> +M: Peter Lieven <pl@kamp.de>
> +S: Maintained
> +F: block/nfs.c
> +
>  SSH
>  M: Richard W.M. Jones <rjones@redhat.com>
>  S: Supported
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 4e8c91e..e254a21 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -12,6 +12,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>  ifeq ($(CONFIG_POSIX),y)
>  block-obj-y += nbd.o nbd-client.o sheepdog.o
>  block-obj-$(CONFIG_LIBISCSI) += iscsi.o
> +block-obj-$(CONFIG_LIBNFS) += nfs.o
>  block-obj-$(CONFIG_CURL) += curl.o
>  block-obj-$(CONFIG_RBD) += rbd.o
>  block-obj-$(CONFIG_GLUSTERFS) += gluster.o
> diff --git a/block/nfs.c b/block/nfs.c
> new file mode 100644
> index 0000000..4023b71
> --- /dev/null
> +++ b/block/nfs.c
> @@ -0,0 +1,405 @@
> +/*
> + * QEMU Block driver for native access to files on NFS shares
> + *
> + * Copyright (c) 2013 Peter Lieven <pl@kamp.de>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "config-host.h"
> +
> +#include <poll.h>
> +#include "qemu-common.h"
> +#include "qemu/config-file.h"
> +#include "qemu/error-report.h"
> +#include "block/block_int.h"
> +#include "trace.h"
> +#include "qemu/iov.h"
> +#include "sysemu/sysemu.h"
> +
> +#include <nfsc/libnfs-zdr.h>
> +#include <nfsc/libnfs.h>
> +#include <nfsc/libnfs-raw.h>
> +#include <nfsc/libnfs-raw-mount.h>
> +
> +typedef struct NFSClient {
> +    struct nfs_context *context;
> +    struct nfsfh *fh;
> +    int events;
> +    bool has_zero_init;
> +} NFSClient;
> +
> +typedef struct NFSRPC {
> +    int status;
> +    int complete;
> +    QEMUIOVector *iov;
> +    struct stat *st;
> +    Coroutine *co;
> +    QEMUBH *bh;
> +} NFSRPC;
> +
> +static void nfs_process_read(void *arg);
> +static void nfs_process_write(void *arg);
> +
> +static void nfs_set_events(NFSClient *client)
> +{
> +    int ev = nfs_which_events(client->context);
> +    if (ev != client->events) {
> +        qemu_aio_set_fd_handler(nfs_get_fd(client->context),
> +                      (ev & POLLIN) ? nfs_process_read : NULL,
> +                      (ev & POLLOUT) ? nfs_process_write : NULL,
> +                      client);
> +
> +    }
> +    client->events = ev;
> +}
> +
> +static void nfs_process_read(void *arg)
> +{
> +    NFSClient *client = arg;
> +    nfs_service(client->context, POLLIN);
> +    nfs_set_events(client);
> +}
> +
> +static void nfs_process_write(void *arg)
> +{
> +    NFSClient *client = arg;
> +    nfs_service(client->context, POLLOUT);
> +    nfs_set_events(client);
> +}
> +
> +static void nfs_co_init_task(NFSClient *client, NFSRPC *task)
> +{
> +    *task = (NFSRPC) {
> +        .co         = qemu_coroutine_self(),
> +    };
> +}
> +
> +static void nfs_co_generic_bh_cb(void *opaque)
> +{
> +    NFSRPC *task = opaque;
> +    qemu_bh_delete(task->bh);
> +    qemu_coroutine_enter(task->co, NULL);
> +}
> +
> +static void
> +nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data,
> +                  void *private_data)
> +{
> +    NFSRPC *task = private_data;
> +    task->complete = 1;
> +    task->status = status;
> +    if (task->status > 0 && task->iov) {
> +        if (task->status <= task->iov->size) {
> +            qemu_iovec_from_buf(task->iov, 0, data, task->status);
> +        } else {
> +            task->status = -EIO;

Short reads don't happen in practice with libnfs?

> +        }
> +    }
> +    if (task->status == 0 && task->st) {
> +        memcpy(task->st, data, sizeof(struct stat));
> +    }
> +    if (task->co) {
> +        task->bh = qemu_bh_new(nfs_co_generic_bh_cb, task);
> +        qemu_bh_schedule(task->bh);
> +    }
> +}
> +
> +static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
> +                                     int64_t sector_num, int nb_sectors,
> +                                     QEMUIOVector *iov)
> +{
> +    NFSClient *client = bs->opaque;
> +    NFSRPC task;
> +
> +    nfs_co_init_task(client, &task);
> +    task.iov = iov;
> +
> +    if (nfs_pread_async(client->context, client->fh,
> +                        sector_num * BDRV_SECTOR_SIZE,
> +                        nb_sectors * BDRV_SECTOR_SIZE,
> +                        nfs_co_generic_cb, &task) != 0) {
> +        return -ENOMEM;
> +    }
> +
> +    while (!task.complete) {
> +        nfs_set_events(client);
> +        qemu_coroutine_yield();
> +    }
> +
> +    if (task.status < 0) {
> +        return task.status;
> +    }
> +
> +    return 0;
> +}
> +
> +static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
> +                                        int64_t sector_num, int nb_sectors,
> +                                        QEMUIOVector *iov)
> +{
> +    NFSClient *client = bs->opaque;
> +    NFSRPC task;
> +    char *buf = NULL;
> +
> +    nfs_co_init_task(client, &task);
> +
> +    buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
> +    qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
> +
> +    if (nfs_pwrite_async(client->context, client->fh,
> +                         sector_num * BDRV_SECTOR_SIZE,
> +                         nb_sectors * BDRV_SECTOR_SIZE,
> +                         buf, nfs_co_generic_cb, &task) != 0) {
> +        g_free(buf);
> +        return -ENOMEM;
> +    }
> +
> +    while (!task.complete) {
> +        nfs_set_events(client);
> +        qemu_coroutine_yield();
> +    }
> +
> +    g_free(buf);
> +
> +    if (task.status != nb_sectors * BDRV_SECTOR_SIZE) {
> +        return task.status < 0 ? task.status : -EIO;

Does this duplicate the logic of nfs_co_generic_cb?

> +    }
> +
> +    return 0;
> +}
> +
> +static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
> +{
> +    NFSClient *client = bs->opaque;
> +    NFSRPC task;
> +
> +    nfs_co_init_task(client, &task);
> +
> +    if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
> +                        &task) != 0) {
> +        return -ENOMEM;
> +    }
> +
> +    while (!task.complete) {
> +        nfs_set_events(client);
> +        qemu_coroutine_yield();
> +    }
> +
> +    return task.status;
> +}
> +
> +static QemuOptsList runtime_opts = {
> +    .name = "nfs",
> +    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
> +    .desc = {
> +        {
> +            .name = "filename",
> +            .type = QEMU_OPT_STRING,
> +            .help = "URL to the NFS file",
> +        },
> +        { /* end of list */ }
> +    },
> +};

I think this is the point where I disagree. First of all, if it's a URL,
call it "url" instead of "filename". But more importantly, a URL encodes
options in a string instead of structured options that can be set
separately.

I would split this into options 'server', 'path', etc. and only have
bdrv_parse_filename() invoke the URL parsing to fill these options if
the user preferred the URL style.
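
For illustration, such a fine-grained option list might look roughly like
this (a sketch only; the option names 'server' and 'path' just follow the
suggestion above, they are not part of the patch):

    static QemuOptsList runtime_opts = {
        .name = "nfs",
        .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
        .desc = {
            {
                .name = "server",
                .type = QEMU_OPT_STRING,
                .help = "Host name or IP of the NFS server",
            },
            {
                .name = "path",
                .type = QEMU_OPT_STRING,
                .help = "Path of the image on the export",
            },
            { /* end of list */ }
        },
    };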

Kevin
Peter Lieven Jan. 9, 2014, 4:08 p.m. UTC | #8
On 09.01.2014 15:13, Kevin Wolf wrote:
> On 26.12.2013 at 13:48, Peter Lieven wrote:
>> This patch adds native support for accessing images on NFS shares without
>> the requirement to actually mount the entire NFS share on the host.
>>
>> NFS images can simply be specified by a URL of the form:
>> nfs://<host>/<export>/<filename>
>>
>> For example:
>> qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
>>
>> You need LibNFS from Ronnie Sahlberg available at:
>>    git://github.com/sahlberg/libnfs.git
>> for this to work.
>>
>> During configure, libnfs is probed for automatically and support
>> is enabled if it is found. You can forbid or enforce libnfs support
>> with --disable-libnfs or --enable-libnfs respectively.
>>
>> Due to NFS restrictions you might need to execute your binaries
>> as root, allow them to open privileged ports (<1024) or specify
>> the insecure option on the NFS server.
>>
>> For additional information on ROOT vs. non-ROOT operation and URL
>> format + parameters see:
>>    https://raw.github.com/sahlberg/libnfs/master/README
>>
>> LibNFS currently supports NFS version 3 only.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> v4->v5:
>>  - discussed with Ronnie and decided to move URL + parameter parsing to LibNFS.
>>    This allows for URL parameter processing directly in LibNFS without altering
>>    the qemu NFS block driver. This bumps the version requirement for LibNFS
>>    to 1.9.0 though.
> Considering that we'll likely want to add new options in the future, I'm
> not sure if this is a good idea. This means that struct nfs_url will
> change, and if qemu isn't updated, it might not even notice that some
> option was requested in a new field that it doesn't know about and doesn't provide,
> so it will silently ignore it. Or if we have a qemu built against an
> older libnfs, this must be marked as an incompatible ABI change, so it
> can't run at all with the newer libnfs version.

Maybe we are talking about different things here. The parameters/options
we were talking about are options to libnfs, not to qemu. This could be
e.g. the uid or the protocol version. This is nothing qemu has to care about.
The nfs_url struct is not likely to change, and even if it did change
I see no problem as long as we only extend the struct and do not change
the position of the server, export and file.
>
>>  - added a pointer to the LibNFS README where additional information about
>>    ROOT privilege requirements can be found, as this raised a few concerns.
>>  - removed a trailing dot in an error statement [Fam].
>>
>> v3->v4:
>>  - finally added full implementation of bdrv_get_allocated_file_size [Stefan]
>>  - removed trailing \n from error statements [Stefan]
>>
>> v2->v3:
>>  - rebased onto stefanha/block
>>  - use pkg_config to check for libnfs (ignoring cflags which are broken in 1.8.0) [Stefan]
>>  - fixed NFSClient declaration [Stefan]
>>  - renamed Task variables to task [Stefan]
>>  - renamed NFSTask to NFSRPC [Ronnie]
>>  - do not update bs->total_sectors in nfs_co_writev [Stefan]
>>  - return -ENOMEM on all async call failures [Stefan,Ronnie]
>>  - fully implement ftruncate
>>  - use util/uri.c for URL parsing [Stefan]
>>  - reworked nfs_file_open_common to nfs_client_open which works on NFSClient [Stefan]
>>  - added a comment to the connect message that libnfs supports NFSv3 only at the moment.
>>  - DID NOT add full implementation of bdrv_get_allocated_file_size because
>>    we are not in a coroutine context and I cannot do an async call here.
>>    I could do a sync call if there were a guarantee that no requests
>>    are in flight. [Stefan]
>>
>> v1->v2:
>>  - fixed block/Makefile.objs [Ronnie]
>>  - do not always register a read handler [Ronnie]
>>  - add support for reading beyond EOF [Fam]
>>  - fixed struct and parameter naming [Fam]
>>  - fixed overlong lines and whitespace errors [Fam]
>>  - return the return status from libnfs wherever possible [Fam]
>>  - added comment why we set allocated_file_size to -ENOTSUP after write [Fam]
>>  - avoid segfault when parsing filename [Fam]
>>  - remove unused close_bh from NFSClient [Fam]
>>  - avoid dividing and multiplying total_size by BDRV_SECTOR_SIZE in nfs_file_create [Fam]
>>
>>  MAINTAINERS         |    5 +
>>  block/Makefile.objs |    1 +
>>  block/nfs.c         |  405 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  configure           |   26 ++++
> qapi-schema.json is missing, so you can't add NFS block devices using
> blockdev-add.
>
>>  4 files changed, 437 insertions(+)
>>  create mode 100644 block/nfs.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index a5ab8f8..09996ab 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -935,6 +935,11 @@ M: Peter Lieven <pl@kamp.de>
>>  S: Supported
>>  F: block/iscsi.c
>>  
>> +NFS
>> +M: Peter Lieven <pl@kamp.de>
>> +S: Maintained
>> +F: block/nfs.c
>> +
>>  SSH
>>  M: Richard W.M. Jones <rjones@redhat.com>
>>  S: Supported
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 4e8c91e..e254a21 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -12,6 +12,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>>  ifeq ($(CONFIG_POSIX),y)
>>  block-obj-y += nbd.o nbd-client.o sheepdog.o
>>  block-obj-$(CONFIG_LIBISCSI) += iscsi.o
>> +block-obj-$(CONFIG_LIBNFS) += nfs.o
>>  block-obj-$(CONFIG_CURL) += curl.o
>>  block-obj-$(CONFIG_RBD) += rbd.o
>>  block-obj-$(CONFIG_GLUSTERFS) += gluster.o
>> diff --git a/block/nfs.c b/block/nfs.c
>> new file mode 100644
>> index 0000000..4023b71
>> --- /dev/null
>> +++ b/block/nfs.c
>> @@ -0,0 +1,405 @@
>> +/*
>> + * QEMU Block driver for native access to files on NFS shares
>> + *
>> + * Copyright (c) 2013 Peter Lieven <pl@kamp.de>
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "config-host.h"
>> +
>> +#include <poll.h>
>> +#include "qemu-common.h"
>> +#include "qemu/config-file.h"
>> +#include "qemu/error-report.h"
>> +#include "block/block_int.h"
>> +#include "trace.h"
>> +#include "qemu/iov.h"
>> +#include "sysemu/sysemu.h"
>> +
>> +#include <nfsc/libnfs-zdr.h>
>> +#include <nfsc/libnfs.h>
>> +#include <nfsc/libnfs-raw.h>
>> +#include <nfsc/libnfs-raw-mount.h>
>> +
>> +typedef struct NFSClient {
>> +    struct nfs_context *context;
>> +    struct nfsfh *fh;
>> +    int events;
>> +    bool has_zero_init;
>> +} NFSClient;
>> +
>> +typedef struct NFSRPC {
>> +    int status;
>> +    int complete;
>> +    QEMUIOVector *iov;
>> +    struct stat *st;
>> +    Coroutine *co;
>> +    QEMUBH *bh;
>> +} NFSRPC;
>> +
>> +static void nfs_process_read(void *arg);
>> +static void nfs_process_write(void *arg);
>> +
>> +static void nfs_set_events(NFSClient *client)
>> +{
>> +    int ev = nfs_which_events(client->context);
>> +    if (ev != client->events) {
>> +        qemu_aio_set_fd_handler(nfs_get_fd(client->context),
>> +                      (ev & POLLIN) ? nfs_process_read : NULL,
>> +                      (ev & POLLOUT) ? nfs_process_write : NULL,
>> +                      client);
>> +
>> +    }
>> +    client->events = ev;
>> +}
>> +
>> +static void nfs_process_read(void *arg)
>> +{
>> +    NFSClient *client = arg;
>> +    nfs_service(client->context, POLLIN);
>> +    nfs_set_events(client);
>> +}
>> +
>> +static void nfs_process_write(void *arg)
>> +{
>> +    NFSClient *client = arg;
>> +    nfs_service(client->context, POLLOUT);
>> +    nfs_set_events(client);
>> +}
>> +
>> +static void nfs_co_init_task(NFSClient *client, NFSRPC *task)
>> +{
>> +    *task = (NFSRPC) {
>> +        .co         = qemu_coroutine_self(),
>> +    };
>> +}
>> +
>> +static void nfs_co_generic_bh_cb(void *opaque)
>> +{
>> +    NFSRPC *task = opaque;
>> +    qemu_bh_delete(task->bh);
>> +    qemu_coroutine_enter(task->co, NULL);
>> +}
>> +
>> +static void
>> +nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data,
>> +                  void *private_data)
>> +{
>> +    NFSRPC *task = private_data;
>> +    task->complete = 1;
>> +    task->status = status;
>> +    if (task->status > 0 && task->iov) {
>> +        if (task->status <= task->iov->size) {
>> +            qemu_iovec_from_buf(task->iov, 0, data, task->status);
>> +        } else {
>> +            task->status = -EIO;
> Short reads don't happen in practice with libnfs?
Not that I know of. Ronnie?
>
>> +        }
>> +    }
>> +    if (task->status == 0 && task->st) {
>> +        memcpy(task->st, data, sizeof(struct stat));
>> +    }
>> +    if (task->co) {
>> +        task->bh = qemu_bh_new(nfs_co_generic_bh_cb, task);
>> +        qemu_bh_schedule(task->bh);
>> +    }
>> +}
>> +
>> +static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
>> +                                     int64_t sector_num, int nb_sectors,
>> +                                     QEMUIOVector *iov)
>> +{
>> +    NFSClient *client = bs->opaque;
>> +    NFSRPC task;
>> +
>> +    nfs_co_init_task(client, &task);
>> +    task.iov = iov;
>> +
>> +    if (nfs_pread_async(client->context, client->fh,
>> +                        sector_num * BDRV_SECTOR_SIZE,
>> +                        nb_sectors * BDRV_SECTOR_SIZE,
>> +                        nfs_co_generic_cb, &task) != 0) {
>> +        return -ENOMEM;
>> +    }
>> +
>> +    while (!task.complete) {
>> +        nfs_set_events(client);
>> +        qemu_coroutine_yield();
>> +    }
>> +
>> +    if (task.status < 0) {
>> +        return task.status;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
>> +                                        int64_t sector_num, int nb_sectors,
>> +                                        QEMUIOVector *iov)
>> +{
>> +    NFSClient *client = bs->opaque;
>> +    NFSRPC task;
>> +    char *buf = NULL;
>> +
>> +    nfs_co_init_task(client, &task);
>> +
>> +    buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
>> +    qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
>> +
>> +    if (nfs_pwrite_async(client->context, client->fh,
>> +                         sector_num * BDRV_SECTOR_SIZE,
>> +                         nb_sectors * BDRV_SECTOR_SIZE,
>> +                         buf, nfs_co_generic_cb, &task) != 0) {
>> +        g_free(buf);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    while (!task.complete) {
>> +        nfs_set_events(client);
>> +        qemu_coroutine_yield();
>> +    }
>> +
>> +    g_free(buf);
>> +
>> +    if (task.status != nb_sectors * BDRV_SECTOR_SIZE) {
>> +        return task.status < 0 ? task.status : -EIO;
> Does this duplicate the logic of nfs_co_generic_cb?
The logic inside nfs_co_generic_cb is for the read case.
>
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
>> +{
>> +    NFSClient *client = bs->opaque;
>> +    NFSRPC task;
>> +
>> +    nfs_co_init_task(client, &task);
>> +
>> +    if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
>> +                        &task) != 0) {
>> +        return -ENOMEM;
>> +    }
>> +
>> +    while (!task.complete) {
>> +        nfs_set_events(client);
>> +        qemu_coroutine_yield();
>> +    }
>> +
>> +    return task.status;
>> +}
>> +
>> +static QemuOptsList runtime_opts = {
>> +    .name = "nfs",
>> +    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
>> +    .desc = {
>> +        {
>> +            .name = "filename",
>> +            .type = QEMU_OPT_STRING,
>> +            .help = "URL to the NFS file",
>> +        },
>> +        { /* end of list */ }
>> +    },
>> +};
> I think this is the point where I disagree. First of all, if it's a URL,
> call it "url" instead of "filename". But more importantly, a URL encodes
> options in a string instead of structured options that can be set
> separately.
That's exactly what I meant above: the options are encoded in the
string, e.g.

nfs://server/export/file?uid=1000&gid=1000

That's what libnfs expects. But all these options are not important for qemu.

"filename" was copied from block/iscsi.c. The semantic is exactly
the same. It accepts an URL and all the paramter parsing is
done inside libiscsi with iscsi_parse_url_full.
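
For reference, the libnfs analogue that v5 relies on looks roughly like this
(a sketch assuming the libnfs 1.9.0 API):

    struct nfs_url *url;

    url = nfs_parse_url_full(client->context, filename);
    if (url == NULL) {
        error_setg(errp, "%s", nfs_get_error(client->context));
        goto fail;
    }
    /* url->server, url->path and url->file are now filled in; any
     * ?uid=...&gid=... parameters have already been applied to the
     * nfs context by the parser */
    nfs_destroy_url(url);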

Peter
Kevin Wolf Jan. 10, 2014, 11:40 a.m. UTC | #9
On 09.01.2014 at 17:08, Peter Lieven wrote:
> On 09.01.2014 15:13, Kevin Wolf wrote:
> > On 26.12.2013 at 13:48, Peter Lieven wrote:
> >> v4->v5:
> >>  - discussed with Ronnie and decided to move URL + parameter parsing to LibNFS.
> >>    This allows for URL parameter processing directly in LibNFS without altering
> >>    the qemu NFS block driver. This bumps the version requirement for LibNFS
> >>    to 1.9.0 though.
> > Considering that we'll likely want to add new options in the future, I'm
> > not sure if this is a good idea. This means that struct nfs_url will
> > change, and if qemu isn't updated, it might not even notice that some
> > option was requested in a new field that it doesn't know about and doesn't provide,
> > so it will silently ignore it. Or if we have a qemu built against an
> > older libnfs, this must be marked as an incompatible ABI change, so it
> > can't run at all with the newer libnfs version.
> 
> Maybe we are talking about different things here. The parameters/options
> we were talking about are options to libnfs, not to qemu. This could be
> e.g. the uid or the protocol version. This is nothing qemu has to care about.
> The nfs_url struct is not likely to change, and even if it did change
> I see no problem as long as we only extend the struct and do not change
> the position of the server, export and file.

The problem is that qemu doesn't treat nfs_url as a black box. It calls
the libnfs function to parse a URL into the struct, and then it accesses
the individual fields of struct nfs_url to pass them to several libnfs
functions.

If struct nfs_url is extended, qemu needs to learn to feed the new fields
to libnfs functions, otherwise they'll be silently ignored.

Just think it through what happens after adding a new option with the
combinations of an old qemu binary run with a new libnfs, and a new qemu
binary run with the old libnfs. You'll come to the conclusion that the
ABI is incompatible both ways.
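
To make the concern concrete, here is a sketch of the struct as shipped in
libnfs 1.9.0 next to a hypothetical extended version (the nfsport field is
invented for the sake of the example):

    /* libnfs 1.9.0 */
    struct nfs_url {
        char *server;
        char *path;
        char *file;
    };

    /* hypothetical later libnfs */
    struct nfs_url {
        char *server;
        char *path;
        char *file;
        char *nfsport;  /* an old qemu binary never reads this field and
                         * silently drops the option; a new qemu binary
                         * running against an old libnfs reads memory the
                         * library never wrote */
    };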

> >> +static QemuOptsList runtime_opts = {
> >> +    .name = "nfs",
> >> +    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
> >> +    .desc = {
> >> +        {
> >> +            .name = "filename",
> >> +            .type = QEMU_OPT_STRING,
> >> +            .help = "URL to the NFS file",
> >> +        },
> >> +        { /* end of list */ }
> >> +    },
> >> +};
> > I think this is the point where I disagree. First of all, if it's a URL,
> > call it "url" instead of "filename". But more importantly, a URL encodes
> > options in a string instead of structured options that can be set
> > separately.
> That's exactly what I meant above: the options are encoded in the
> string, e.g.
> 
> nfs://server/export/file?uid=1000&gid=1000
> 
> That's what libnfs expects. But all these options are not important for qemu.

This means that we only expose one string option to QMP clients. There
is no way for them to figure out which options are valid, even with a
QAPI schema. This will be a real problem for libvirt support for libnfs.

It also means that you can't override single options of a libnfs backing
file on the command line, though I suppose that is a minor use case.

> "filename" was copied from block/iscsi.c. The semantic is exactly
> the same. It accepts an URL and all the paramter parsing is
> done inside libiscsi with iscsi_parse_url_full.

Yup, and the part that you didn't copy is:

    /* TODO Convert to fine grained options */

The iscsi block driver is still using the legacy interface and pending
conversion. Doesn't mean that this is a reason for a new block driver to
not do it right from the beginning.

Kevin
Peter Lieven Jan. 10, 2014, 12:12 p.m. UTC | #10
On 10.01.2014 12:40, Kevin Wolf wrote:
> On 09.01.2014 at 17:08, Peter Lieven wrote:
>> On 09.01.2014 15:13, Kevin Wolf wrote:
>>> On 26.12.2013 at 13:48, Peter Lieven wrote:
>>>> v4->v5:
>>>>   - discussed with Ronnie and decided to move URL + parameter parsing to LibNFS.
>>>>     This allows for URL parameter processing directly in LibNFS without altering
>>>>     the qemu NFS block driver. This bumps the version requirement for LibNFS
>>>>     to 1.9.0 though.
>>> Considering that we'll likely want to add new options in the future, I'm
>>> not sure if this is a good idea. This means that struct nfs_url will
>>> change, and if qemu isn't updated, it might not even notice that some
>>> option was requested in a new field that it doesn't know about and doesn't provide,
>>> so it will silently ignore it. Or if we have a qemu built against an
>>> older libnfs, this must be marked as an incompatible ABI change, so it
>>> can't run at all with the newer libnfs version.
>> Maybe we are talking about different things here. The parameters/options
>> we were talking about are options to libnfs, not to qemu. This could be
>> e.g. the uid or the protocol version. This is nothing qemu has to care about.
>> The nfs_url struct is not likely to change, and even if it did change
>> I see no problem as long as we only extend the struct and do not change
>> the position of the server, export and file.
> The problem is that qemu doesn't treat nfs_url as a black box. It calls
> the libnfs function to parse a URL into the struct, and then it accesses
> the individual fields of struct nfs_url to pass them to several libnfs
> functions.
libnfs feeds itself when the URL is parsed. This way qemu does not
need to be recompiled or adjusted any time there is a new
option to libnfs. The options that are specified here are only meaningful
to the NFS transport itself, nothing qemu has to care about.
>
> If struct nfs_url is extended, qemu needs to learn to feed the new fields
> to libnfs functions, otherwise they'll be silently ignored.
>
> Just think it through what happens after adding a new option with the
> combinations of an old qemu binary run with a new libnfs, and a new qemu
> binary run with the old libnfs. You'll come to the conclusion that the
> ABI is incompatible both ways.
We only have a pointer to the struct, so the size doesn't matter. As
long as the expected fields are at the right positions I do not see the issue.
>
>>>> +static QemuOptsList runtime_opts = {
>>>> +    .name = "nfs",
>>>> +    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
>>>> +    .desc = {
>>>> +        {
>>>> +            .name = "filename",
>>>> +            .type = QEMU_OPT_STRING,
>>>> +            .help = "URL to the NFS file",
>>>> +        },
>>>> +        { /* end of list */ }
>>>> +    },
>>>> +};
>>> I think this is the point where I disagree. First of all, if it's a URL,
>>> call it "url" instead of "filename". But more importantly, a URL encodes
>>> options in a string instead of structured options that can be set
>>> separately.
>> That's exactly what I meant above: the options are encoded in the
>> string, e.g.
>>
>> nfs://server/export/file?uid=1000&gid=1000
>>
>> That's what libnfs expects. But all these options are not important for qemu.
> This means that we only expose one string option to QMP clients. There
> is no way for them to figure out which options are valid, even with a
> QAPI schema. This will be a real problem for libvirt support for libnfs.
Which options are available depends highly on the libnfs version.
Currently libnfs parses the options and ignores them if they are unknown.
Like it or not, that is how it is designed.
>
> It also means that you can't override single options of a libnfs backing
> file on the command line, though I suppose that is a minor use case.
>
>> "filename" was copied from block/iscsi.c. The semantic is exactly
>> the same. It accepts an URL and all the paramter parsing is
>> done inside libiscsi with iscsi_parse_url_full.
> Yup, and the part that you didn't copy is:
>
>      /* TODO Convert to fine grained options */
>
> The iscsi block driver is still using the legacy interface and pending
> conversion. Doesn't mean that this is a reason for a new block driver to
> not do it right from the beginning.
Sorry, I cannot see from the docs or existing drivers how to do this.
All protocols that use URLs are pending regarding the conversion.
To be honest, it is starting to get a little bit frustrating. I wanted to share
this protocol extension I developed to get around some issues with
disappearing NFS shares we experienced. I heard that everything was fine and then
suddenly it had to run through qemu-iotests. Ok, I looked at it
and found that most of the tests only work properly with the
file protocol although they are tagged generic. I fixed that or proposed a fix.
Now, I shall start over and change the parsing that was changed
upon the wish of the libnfs author and that works well for me.
Then I shall convert everything to a qapi schema even though the current
design of libnfs is to work with plain URLs.

I currently do not have the resources to look further into this. If the driver
does not meet the requirements and this can't be adjusted with small
changes then we have to leave it out for the moment.

Peter
Paolo Bonzini Jan. 10, 2014, 12:30 p.m. UTC | #11
On 10/01/2014 13:12, Peter Lieven wrote:
> Then I shall convert everything to a qapi schema even though the current
> design of libnfs is to work with plain URLs.

No, no one is asking you to do this.  URLs are fine, but I agree with
Kevin that parsing them in QEMU is better.

Also because the QEMU parser is known to be based on RFCs and good code
from libxml2.  For example, the iSCSI URL parser, when introduced,
didn't even have percent-escape parsing, causing libvirt to fail with
old libiscsi (and actually not that old too: IIRC libiscsi 1.7 will
still fail).  Unless the libnfs parser is as good as libxml2's, I think
there's value in using the QEMU URI parser.

Paolo
ronnie sahlberg Jan. 10, 2014, 2:49 p.m. UTC | #12
On Fri, Jan 10, 2014 at 4:30 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 10/01/2014 13:12, Peter Lieven wrote:
>> Then I shall convert everything to a qapi schema even though the current
>> design of libnfs is to work with plain URLs.
>
> No, no one is asking you to do this.  URLs are fine, but I agree with
> Kevin that parsing them in QEMU is better.
>
> Also because the QEMU parser is known to be based on RFCs and good code
> from libxml2.  For example, the iSCSI URL parser, when introduced,
> didn't even have percent-escape parsing, causing libvirt to fail with
> old libiscsi (and actually not that old too: IIRC libiscsi 1.7 will
> still fail).  Unless the libnfs parser is as good as libxml2's, I think
> there's value in using the QEMU URI parser.


I think that is fair enough.

The arguments we are talking about are the type of arguments that only
affect the interface between libnfs and the nfs server itself
and are not strictly all that interesting to the application that links
to libnfs.

Since parsing a URL does require a fair amount of code, a hundred
lines or more, it is a bit annoying having to re-implement the parsing
code for every single small utility, for example nfs-ls and nfs-cp,
or for the parsing that is still done in the sg-utils patch.
For a lot of these small and semi-trivial applications we don't really
care all that much about what the options are but we might care a lot
about making it easier to use libnfs and to avoid having to write a
parser each time.

For those use cases, I definitely think that having a built in
function to parse a url, and automatically update the nfs context with
connection related tweaks is a good thing. It eliminates the need to
re-implement the parsing functions in every single trivial
application.


For QEMU and libvirt things may be different. These are non-trivial
applications and may need to be able to control the settings
explicitly in the QEMU code.
That is still possible to do. All the url arguments so far tweak
arguments that can also be controlled through explicit existing APIs.
So for QEMU, there are functions available in libnfs now that will
automatically update the nfs context with things like UID/GID to use
when talking to the server, passed via the URL and QEMU can use them.
On the other hand, if QEMU rather wants to parse the URL itself
and make calls into the libnfs API to tweak these settings directly
from the QEMU codebase, that is also possible.


For example:   nfs://1.2.3.4/path/file?uid=10&gid=10
When parsing these using the libnfs functions, the parsing functions
will automatically update the nfs context so that it will use these
values when it fills in the rpc header to send to the server.
But if you want to parse the url yourself, you can do that too, by
just calling   nfs_set_auth(nfs,  libnfs_authunix_create(..., 10, 10,
...
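
Spelled out, that second approach might look roughly like this (a sketch;
libnfs_authunix_create() mirrors the classic authunix_create() signature,
and "qemu" merely stands in for the client's machine name):

    struct nfs_context *nfs = nfs_init_context();
    /* uid=10 and gid=10 are taken from the example URL above; the last
     * two arguments are the supplementary group list (empty here) */
    nfs_set_auth(nfs, libnfs_authunix_create("qemu", 10, 10, 0, NULL));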


Regards
Ronnie Sahlberg
Peter Lieven Jan. 10, 2014, 3:05 p.m. UTC | #13
On 10.01.2014 15:49, ronnie sahlberg wrote:
> On Fri, Jan 10, 2014 at 4:30 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 10/01/2014 13:12, Peter Lieven wrote:
>>> Then I shall convert everything to a qapi schema even though the current
>>> design of libnfs is to work with plain URLs.
>> No, no one is asking you to do this.  URLs are fine, but I agree with
>> Kevin that parsing them in QEMU is better.
>>
>> Also because the QEMU parser is known to be based on RFCs and good code
>> from libxml2.  For example, the iSCSI URL parser, when introduced,
>> didn't even have percent-escape parsing, causing libvirt to fail with
>> old libiscsi (and actually not that old too: IIRC libiscsi 1.7 will
>> still fail).  Unless the libnfs parser is as good as libxml2's, I think
>> there's value in using the QEMU URI parser.
>
> I think that is fair enough.
>
> The arguments we are talking about are the type of arguments that only
> affect the interface between libnfs and the nfs server itself
> and are not strictly all that interesting to the application that links
> to libnfs.
>
> Since parsing a URL does require a fair amount of code, a hundred
> lines or more, it is a bit annoying having to re-implement the parsing
> code for every single small utility, for example nfs-ls and nfs-cp,
> or for the parsing that is still done in the sg-utils patch.
> For a lot of these small and semi-trivial applications we don't really
> care all that much about what the options are but we might care a lot
> about making it easier to use libnfs and to avoid having to write a
> parser each time.
>
> For those use cases, I definitely think that having a built in
> function to parse a url, and automatically update the nfs context with
> connection related tweaks is a good thing. It eliminates the need to
> re-implement the parsing functions in every single trivial
> application.
>
>
> For QEMU and libvirt things may be different. These are non-trivial
> applications and may need to be able to control the settings
> explicitly in the QEMU code.
> That is still possible to do. All the url arguments so far tweak
> arguments that can also be controlled through explicit existing APIs.
> So for QEMU, there are functions available in libnfs now that will
> automatically update the nfs context with things like UID/GID to use
> when talking to the server, passed via the URL and QEMU can use them.
> On the other hand, if QEMU rather wants to parse the URL itself
> and make calls into the libnfs API to tweak these settings directly
> from the QEMU codebase, that is also possible.
>
>
> For example:   nfs://1.2.3.4/path/file?uid=10&gid=10
> When parsing these using the libnfs functions, the parsing functions
> will automatically update the nfs context so that it will use these
> values when it fills in the rpc header to send to the server.
> But if you want to parse the url yourself, you can do that too, by
> just calling   nfs_set_auth(nfs,  libnfs_authunix_create(..., 10, 10,
> ...


Proposal:
I revert the URL parsing code to v4 of the patch:

     URI *uri;
     char *file = NULL, *strp = NULL;

     /* parse the URL with qemu's RFC-based URI parser from util/uri.c */
     uri = uri_parse(filename);
     if (!uri) {
         error_setg(errp, "Invalid URL specified.");
         goto fail;
     }
     /* split uri->path into the export path and the file name */
     strp = strrchr(uri->path, '/');
     if (strp == NULL) {
         error_setg(errp, "Invalid URL specified.");
         goto fail;
     }
     file = g_strdup(strp);
     *strp = 0;


And then pipe all the URL params (in QueryParams) through a (to be defined
public) function in libnfs

nfs_set_context_args(struct nfs_context *nfs, char *arg, char *val);
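
On the qemu side this could be plumbed roughly as follows (a sketch:
nfs_set_context_args() is the proposed, not yet existing, libnfs function,
and QueryParams/query_params_parse() are qemu's helpers from util/uri.c):

    QueryParams *qp;
    int i;

    qp = query_params_parse(uri->query);
    for (i = 0; i < qp->n; i++) {
        /* hand each key=value pair from the URL to libnfs */
        if (nfs_set_context_args(client->context, qp->p[i].name,
                                 qp->p[i].value) != 0) {
            error_setg(errp, "Invalid URL parameter: %s", qp->p[i].name);
            goto fail;
        }
    }
    query_params_free(qp);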


And we leave all the

QemuOptsList

and qapi-schema.json stuff for a later version when we touch all the other protocols.


Peter
Kevin Wolf Jan. 10, 2014, 3:46 p.m. UTC | #14
On 10.01.2014 at 16:05, Peter Lieven wrote:
> On 10.01.2014 15:49, ronnie sahlberg wrote:
> >On Fri, Jan 10, 2014 at 4:30 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>On 10/01/2014 13:12, Peter Lieven wrote:
> >>>Then I shall convert everything to a qapi schema even though the current
> >>>design of libnfs is to work with plain URLs.
> >>No, no one is asking you to do this.  URLs are fine, but I agree with
> >>Kevin that parsing them in QEMU is better.
> >>
> >>Also because the QEMU parser is known to be based on RFCs and good code
> >>from libxml2.  For example, the iSCSI URL parser, when introduced,
> >>didn't even have percent-escape parsing, causing libvirt to fail with
> >>old libiscsi (and actually not that old too: IIRC libiscsi 1.7 will
> >>still fail).  Unless the libnfs parser is as good as libxml2's, I think
> >>there's value in using the QEMU URI parser.
> >
> >I think that is fair enough.
> >
> >The arguments we are talking about are the type of arguments that only
> >affect the interface between libnfs and the nfs server itself
> >and are not strictly all that interesting to the application that links
> >to libnfs.
> >
> >Since parsing a URL does require a fair amount of code, a hundred
> >lines or more, it is a bit annoying having to re-implement the parsing
> >code for every single small utility, for example nfs-ls and nfs-cp,
> >or for the parsing that is still done in the sg-utils patch.
> >For a lot of these small and semi-trivial applications we don't really
> >care all that much about what the options are but we might care a lot
> >about making it easier to use libnfs and to avoid having to write a
> >parser each time.
> >
> >For those use cases, I definitely think that having a built in
> >function to parse a url, and automatically update the nfs context with
> >connection related tweaks is a good thing. It eliminates the need to
> >re-implement the parsing functions in every single trivial
> >application.
> >
> >
> >For QEMU and libvirt things may be different. These are non-trivial
> >applications and may need to be able to control the settings
> >explicitly in the QEMU code.
> >That is still possible to do. All the url arguments so far tweak
> >arguments that can also be controlled through explicit existing APIs.
> >So for QEMU, there are functions available in libnfs now that will
> >automatically update the nfs context with things like UID/GID to use
> >when talking to the server, passed via the URL and QEMU can use them.
> >On the other hand, if QEMU rather wants to parse the URL itself
> >and make calls into the libnfs API to tweak these settings directly
> >from the QEMU codebase, that is also possible.
> >
> >
> >For example:   nfs://1.2.3.4/path/file?uid=10&gid=10
> >When parsing these using the libnfs functions, the parsing functions
> >will automatically update the nfs context so that it will use these
> >values when it fills in the rpc header to send to the server.
> >But if you want to parse the url yourself, you can do that too, by
> >just calling   nfs_set_auth(nfs,  libnfs_authunix_create(..., 10, 10,
> >...
> 
> 
> Proposal:
> I revert the URL parsing code to v4 of the patch:
> [...]

Agreed.

> And then pipe all the URL params (in QueryParams) through a (to be defined
> public) function in libnfs
> 
> nfs_set_context_args(struct nfs_context *nfs, char *arg, char *val);

I wouldn't do that. We should use specific functions like Ronnie
suggested in his nfs_set_auth() example.
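
For illustration, that route would look roughly like the snippet below
(hypothetical: it fills in the arguments Ronnie elided, and the exact
libnfs_authunix_create() parameter list is assumed from libnfs-zdr.h and
may differ between libnfs versions):

#include <nfsc/libnfs.h>
#include <nfsc/libnfs-zdr.h>

/* Hypothetical: attach UNIX credentials with uid=10/gid=10 to the
 * context directly instead of passing them as URL parameters. */
static void set_unix_auth_example(struct nfs_context *nfs)
{
    nfs_set_auth(nfs, libnfs_authunix_create("localhost", 10, 10, 0, NULL));
}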

> And we leave all the
> 
> QemuOptsList
> 
> and qapi-schema.json stuff for a later version when we touch all the other protocols.

Okay, I'll take care of it. For the time being, please include the TODO
comment that the other network-based drivers have.

Kevin
Peter Lieven Jan. 10, 2014, 4:10 p.m. UTC | #15
On 10.01.2014 16:46, Kevin Wolf wrote:
> On 10.01.2014 at 16:05, Peter Lieven wrote:
>> On 10.01.2014 15:49, ronnie sahlberg wrote:
>>> On Fri, Jan 10, 2014 at 4:30 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> On 10/01/2014 13:12, Peter Lieven wrote:
>>>>> Then I shall convert everything to a qapi schema whereby the current
>>>>> design of libnfs is designed to work with plain URLs.
>>>> No, no one is asking you to do this.  URLs are fine, but I agree with
>>>> Kevin that parsing them in QEMU is better.
>>>>
>>>> Also because the QEMU parser is known to be based on RFCs and good code
>>>> from libxml2.  For example, the iSCSI URL parser, when introduced,
>>>> didn't even have percent-escape parsing, causing libvirt to fail with
>>>> old libiscsi (and actually not that old too: IIRC libiscsi 1.7 will
>>>> still fail).  Unless the libnfs parser is as good as libxml2's, I think
>>>> there's value in using the QEMU URI parser.
>>> I think that is fair enough.
>>>
>>> The arguments we are talking about are the type of arguments that only
>>> affect the interface between libnfs and the nfs server itself
>>> and are not strictly all that interesting to the application that links
>>> to libnfs.
>>>
>>> Since parsing a URL does require a fair amount of code, a hundred
>>> lines or more, it is a bit annoying having to re-implement the parsing
>>> code for every single small utility. For example nfs-ls, nfs-cp, or for
>>> the parsing that is still done in the sg-utils patch.
>>> For a lot of these small and semi-trivial applications we don't really
>>> care all that much about what the options are but we might care a lot
>>> about making it easier to use libnfs and to avoid having to write a
>>> parser each time.
>>>
>>> For those use cases, I definitely think that having a built in
>>> function to parse a url, and automatically update the nfs context with
>>> connection related tweaks is a good thing. It eliminates the need to
>>> re-implement the parsing functions in every single trivial
>>> application.
>>>
>>>
>>> For QEMU and libvirt things may be different. These are non-trivial
>>> applications and may need to be able to control the settings
>>> explicitly in the QEMU code.
>>> That is still possible to do. All the url arguments so far tweak
>>> arguments that can also be controlled through explicit existing APIs.
>>> So for QEMU, there are functions available in libnfs now that will
>>> automatically update the nfs context with things like UID/GID to use
>>> when talking to the server, passed via the URL and QEMU can use them.
>>> On the other hand, if QEMU rather wants to parse the URL itself
>>> and make calls into the libnfs API to tweak these settings directly
>>> from the QEMU codebase, that is also possible.
>>>
>>> For example:   nfs://1.2.3.4/path/file?uid=10&gid=10
>>> When parsing these using the libnfs functions, the parsing functions
>>> will automatically update the nfs context so that it will use these
>>> values when it fills in the rpc header to send to the server.
>>> But if you want to parse the url yourself, you can do that too, by
>>> just calling   nfs_set_auth(nfs,  libnfs_authunix_create(..., 10, 10,
>>> ...
>>
>> Proposal:
>> I revert the URL parsing code to v4 of the patch:
>> [...]
> Agreed.
>
>> And then pipe all the URL params (in QueryParams) through a (to be defined
>> public) function in libnfs
>>
>> nfs_set_context_args(struct nfs_context *nfs, char *arg, char *val);
> I wouldn't do that. We should use specific functions like Ronnie
> suggested in his nfs_set_auth() example.
Ronnie, I would map to the following functions, especially for uid/gid,
because otherwise we would again have to add all the specifics of what to do
on Windows and what to do if a user specifies only a uid and no gid (a rough
sketch of the resulting glue code follows the list):

uid => rpc_set_uid
gid => rpc_set_gid
tcp-syncnt => rpc_set_tcp_syncnt
autoreconnect => rpc_{set,unset}_autoreconnect
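
A rough, untested sketch of that glue code (hypothetical; it assumes the
setters listed above exist under exactly these names, that the rpc context
is reachable via nfs_get_rpc_context(), and that the query parameters have
already been split out by the QEMU URI parser):

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <nfsc/libnfs.h>
#include <nfsc/libnfs-raw.h>

static int nfs_apply_url_param(struct nfs_context *nfs,
                               const char *name, const char *value)
{
    struct rpc_context *rpc = nfs_get_rpc_context(nfs);

    if (!strcmp(name, "uid")) {
        rpc_set_uid(rpc, atoi(value));
    } else if (!strcmp(name, "gid")) {
        rpc_set_gid(rpc, atoi(value));
    } else if (!strcmp(name, "tcp-syncnt")) {
        rpc_set_tcp_syncnt(rpc, atoi(value));
    } else if (!strcmp(name, "autoreconnect")) {
        if (atoi(value)) {
            rpc_set_autoreconnect(rpc);
        } else {
            rpc_unset_autoreconnect(rpc);
        }
    } else {
        return -EINVAL; /* reject unknown URL parameters */
    }
    return 0;
}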

Ronnie, can you also give some short advice on Kevin's question about short
reads? I think they can happen if we read past EOF, or not?

>
>> And we leave all the
>>
>> QemuOptsList
>>
>> and qapi-schema.json stuff for a later version when we touch all the other protocols.
> Okay, I'll take care of it. For the time being, please include the TODO
> comment that the other network-based drivers have.
Thanks. Kevin, can you please advise how to proceed with the qemu-iotests.

Peter
ronnie sahlberg Jan. 10, 2014, 5:16 p.m. UTC | #16
On Fri, Jan 10, 2014 at 8:10 AM, Peter Lieven <pl@kamp.de> wrote:
>
> Ronnie, can you also give some short advice on Kevin's question about short
> reads? I think they can happen if we read past EOF, or not?
>

Short reads should normally not happen in libnfs itself, since servers are
generally careful to send back as much data as the client requested.

There is a common exception though, for the case where you read past
the end of file.
So short reads should normally not happen. Unless QEMU or the guest
sends a request to libnfs to read past the end of the file.


If you send a READ for 1024 bytes to an nfs server at an offset 512 bytes
before the end-of-file, then the server will respond with a read reply
containing 512 bytes of data (and the eof flag set in the reply).

In my experience, most kernel/os based clients seem to be very careful to
never try to read beyond end of file, so this rarely happens in normal nfs.
(I only recall HPUX being a system where it was common to always issue nfs
i/o in multiples of 4k, so for those clients it was very important to make
sure you implement short reads correctly in the server.)


I don't know how careful QEMU is about preventing reads past the end of the
device, or whether it enforces this if the guest tries.
It is probably worth checking for short reads, at least for the case where
you might be reading past the end of the file.
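
To make that concrete on the client side: when a reply carries fewer bytes
than requested, the client has to zero-fill the missing tail of its buffer
itself. A minimal illustrative helper (made-up names, not code from the
patch):

#include <string.h>
#include <sys/types.h>

/* 'count' is the number of bytes requested, 'ret' the number the server
 * actually returned; zero-fill the missing tail of a short read. */
static ssize_t pad_short_read(unsigned char *buf, size_t count, ssize_t ret)
{
    if (ret < 0) {
        return ret;              /* a real error, pass it up unchanged */
    }
    if ((size_t)ret < count) {
        /* e.g. a 1024-byte READ issued 512 bytes before EOF comes back
         * with 512 bytes and the eof flag set; zero bytes [ret, count) */
        memset(buf + ret, 0, count - ret);
    }
    return count;
}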
Paolo Bonzini Jan. 10, 2014, 6:05 p.m. UTC | #17
On 10/01/2014 18:16, ronnie sahlberg wrote:
> 
> There is a common exception though, for the case where you read past
> the end of file.
> So short reads should normally not happen. Unless QEMU or the guest
> sends a request to libnfs to read past the end of the file.

Yes, this can happen in QEMU and the various drivers are careful to pad
with zeroes.  It could perhaps be moved to block.c, but for now each
driver handles it separately.

Paolo
Peter Lieven Jan. 10, 2014, 6:07 p.m. UTC | #18
Sent from my iPad

On 10.01.2014 at 19:05, "Paolo Bonzini" <pbonzini@redhat.com> wrote:

> On 10/01/2014 18:16, ronnie sahlberg wrote:
>> 
>> There is a common exception though, for the case where you read past
>> the end of file.
>> So short reads should normally not happen. Unless QEMU or the guest
>> sends a request to libnfs to read past the end of the file.
> 
> Yes, this can happen in QEMU and the various drivers are careful to pad
> with zeroes.  It could perhaps be moved to block.c, but for now each
> driver handles it separately.

OK, I will add this as well. However, I thought I had seen code for this in block.c already?

> 
> Paolo
Paolo Bonzini Jan. 10, 2014, 6:24 p.m. UTC | #19
On 10/01/2014 19:07, Peter Lieven wrote:
> 
> 
> 
> Sent from my iPad
> 
> On 10.01.2014 at 19:05, "Paolo Bonzini" <pbonzini@redhat.com> wrote:
> 
>> On 10/01/2014 18:16, ronnie sahlberg wrote:
>>>
>>> There is a common exception though, for the case where you read past
>>> the end of file.
>>> So short reads should normally not happen. Unless QEMU or the guest
>>> sends a request to libnfs to read past the end of the file.
>>
>> Yes, this can happen in QEMU and the various drivers are careful to pad
>> with zeroes.  It could perhaps be moved to block.c, but for now each
>> driver handles it separately.
> 
> OK, I will add this as well. However, I thought I had seen code for this in block.c already?

No, it corresponds to this code in block/raw-posix.c:

static int aio_worker(void *arg)
{
    RawPosixAIOData *aiocb = arg;
    ssize_t ret = 0;

    switch (aiocb->aio_type & QEMU_AIO_TYPE_MASK) {
    case QEMU_AIO_READ:
        ret = handle_aiocb_rw(aiocb);
        if (ret >= 0 && ret < aiocb->aio_nbytes && aiocb->bs->growable) {
            iov_memset(aiocb->aio_iov, aiocb->aio_niov, ret,
                      0, aiocb->aio_nbytes - ret);

            ret = aiocb->aio_nbytes;
        }
        if (ret == aiocb->aio_nbytes) {
            ret = 0;
        } else if (ret >= 0 && ret < aiocb->aio_nbytes) {
            ret = -EINVAL;
        }
        break;

Paolo
Peter Lieven Jan. 10, 2014, 6:47 p.m. UTC | #20
On 10.01.2014 19:24, Paolo Bonzini wrote:
> On 10/01/2014 19:07, Peter Lieven wrote:
>>
>>
>> Von meinem iPad gesendet
>>
>> On 10.01.2014 at 19:05, "Paolo Bonzini" <pbonzini@redhat.com> wrote:
>>
>>> On 10/01/2014 18:16, ronnie sahlberg wrote:
>>>> There is a common exception though, for the case where you read past
>>>> the end of file.
>>>> So short reads should normally not happen. Unless QEMU or the guest
>>>> sends a request to libnfs to read past the end of the file.
>>> Yes, this can happen in QEMU and the various drivers are careful to pad
>>> with zeroes.  It could perhaps be moved to block.c, but for now each
>>> driver handles it separately.
>> OK, I will add this as well. However, I thought I had seen code for this in block.c already?
> No, it corresponds to this code in block/raw-posix.c:
>
> static int aio_worker(void *arg)
> {
>     RawPosixAIOData *aiocb = arg;
>     ssize_t ret = 0;
>
>     switch (aiocb->aio_type & QEMU_AIO_TYPE_MASK) {
>     case QEMU_AIO_READ:
>         ret = handle_aiocb_rw(aiocb);
>         if (ret >= 0 && ret < aiocb->aio_nbytes && aiocb->bs->growable) {
>             iov_memset(aiocb->aio_iov, aiocb->aio_niov, ret,
>                       0, aiocb->aio_nbytes - ret);
>
>             ret = aiocb->aio_nbytes;
>         }
>         if (ret == aiocb->aio_nbytes) {
>             ret = 0;
>         } else if (ret >= 0 && ret < aiocb->aio_nbytes) {
>             ret = -EINVAL;
>         }
>         break;

I am a little confused... but it seems the code I had in mind only fills up full sectors?

    if (!(bs->zero_beyond_eof && bs->growable)) {
        ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
    } else {
        /* Read zeros after EOF of growable BDSes */
        int64_t len, total_sectors, max_nb_sectors;

        len = bdrv_getlength(bs);
        if (len < 0) {
            ret = len;
            goto out;
        }

        total_sectors = DIV_ROUND_UP(len, BDRV_SECTOR_SIZE);
        max_nb_sectors = MAX(0, total_sectors - sector_num);
        if (max_nb_sectors > 0) {
            ret = drv->bdrv_co_readv(bs, sector_num,
                                     MIN(nb_sectors, max_nb_sectors), qiov);
        } else {
            ret = 0;
        }

        /* Reading beyond end of file is supposed to produce zeroes */
        if (ret == 0 && total_sectors < sector_num + nb_sectors) {
            uint64_t offset = MAX(0, total_sectors - sector_num);
            uint64_t bytes = (sector_num + nb_sectors - offset) *
                              BDRV_SECTOR_SIZE;
            qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes);
        }
    }

Peter



>
> Paolo

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index a5ab8f8..09996ab 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -935,6 +935,11 @@  M: Peter Lieven <pl@kamp.de>
 S: Supported
 F: block/iscsi.c
 
+NFS
+M: Peter Lieven <pl@kamp.de>
+S: Maintained
+F: block/nfs.c
+
 SSH
 M: Richard W.M. Jones <rjones@redhat.com>
 S: Supported
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 4e8c91e..e254a21 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -12,6 +12,7 @@  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 ifeq ($(CONFIG_POSIX),y)
 block-obj-y += nbd.o nbd-client.o sheepdog.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
+block-obj-$(CONFIG_LIBNFS) += nfs.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 block-obj-$(CONFIG_GLUSTERFS) += gluster.o
diff --git a/block/nfs.c b/block/nfs.c
new file mode 100644
index 0000000..4023b71
--- /dev/null
+++ b/block/nfs.c
@@ -0,0 +1,405 @@ 
+/*
+ * QEMU Block driver for native access to files on NFS shares
+ *
+ * Copyright (c) 2013 Peter Lieven <pl@kamp.de>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config-host.h"
+
+#include <poll.h>
+#include "qemu-common.h"
+#include "qemu/config-file.h"
+#include "qemu/error-report.h"
+#include "block/block_int.h"
+#include "trace.h"
+#include "qemu/iov.h"
+#include "sysemu/sysemu.h"
+
+#include <nfsc/libnfs-zdr.h>
+#include <nfsc/libnfs.h>
+#include <nfsc/libnfs-raw.h>
+#include <nfsc/libnfs-raw-mount.h>
+
+typedef struct NFSClient {
+    struct nfs_context *context;
+    struct nfsfh *fh;
+    int events;
+    bool has_zero_init;
+} NFSClient;
+
+typedef struct NFSRPC {
+    int status;
+    int complete;
+    QEMUIOVector *iov;
+    struct stat *st;
+    Coroutine *co;
+    QEMUBH *bh;
+} NFSRPC;
+
+static void nfs_process_read(void *arg);
+static void nfs_process_write(void *arg);
+
+static void nfs_set_events(NFSClient *client)
+{
+    int ev = nfs_which_events(client->context);
+    if (ev != client->events) {
+        qemu_aio_set_fd_handler(nfs_get_fd(client->context),
+                      (ev & POLLIN) ? nfs_process_read : NULL,
+                      (ev & POLLOUT) ? nfs_process_write : NULL,
+                      client);
+
+    }
+    client->events = ev;
+}
+
+static void nfs_process_read(void *arg)
+{
+    NFSClient *client = arg;
+    nfs_service(client->context, POLLIN);
+    nfs_set_events(client);
+}
+
+static void nfs_process_write(void *arg)
+{
+    NFSClient *client = arg;
+    nfs_service(client->context, POLLOUT);
+    nfs_set_events(client);
+}
+
+static void nfs_co_init_task(NFSClient *client, NFSRPC *task)
+{
+    *task = (NFSRPC) {
+        .co         = qemu_coroutine_self(),
+    };
+}
+
+static void nfs_co_generic_bh_cb(void *opaque)
+{
+    NFSRPC *task = opaque;
+    qemu_bh_delete(task->bh);
+    qemu_coroutine_enter(task->co, NULL);
+}
+
+static void
+nfs_co_generic_cb(int status, struct nfs_context *nfs, void *data,
+                  void *private_data)
+{
+    NFSRPC *task = private_data;
+    task->complete = 1;
+    task->status = status;
+    if (task->status > 0 && task->iov) {
+        if (task->status <= task->iov->size) {
+            qemu_iovec_from_buf(task->iov, 0, data, task->status);
+        } else {
+            task->status = -EIO;
+        }
+    }
+    if (task->status == 0 && task->st) {
+        memcpy(task->st, data, sizeof(struct stat));
+    }
+    if (task->co) {
+        task->bh = qemu_bh_new(nfs_co_generic_bh_cb, task);
+        qemu_bh_schedule(task->bh);
+    }
+}
+
+static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
+                                     int64_t sector_num, int nb_sectors,
+                                     QEMUIOVector *iov)
+{
+    NFSClient *client = bs->opaque;
+    NFSRPC task;
+
+    nfs_co_init_task(client, &task);
+    task.iov = iov;
+
+    if (nfs_pread_async(client->context, client->fh,
+                        sector_num * BDRV_SECTOR_SIZE,
+                        nb_sectors * BDRV_SECTOR_SIZE,
+                        nfs_co_generic_cb, &task) != 0) {
+        return -ENOMEM;
+    }
+
+    while (!task.complete) {
+        nfs_set_events(client);
+        qemu_coroutine_yield();
+    }
+
+    if (task.status < 0) {
+        return task.status;
+    }
+
+    return 0;
+}
+
+static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
+                                        int64_t sector_num, int nb_sectors,
+                                        QEMUIOVector *iov)
+{
+    NFSClient *client = bs->opaque;
+    NFSRPC task;
+    char *buf = NULL;
+
+    nfs_co_init_task(client, &task);
+
+    buf = g_malloc(nb_sectors * BDRV_SECTOR_SIZE);
+    qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
+
+    if (nfs_pwrite_async(client->context, client->fh,
+                         sector_num * BDRV_SECTOR_SIZE,
+                         nb_sectors * BDRV_SECTOR_SIZE,
+                         buf, nfs_co_generic_cb, &task) != 0) {
+        g_free(buf);
+        return -ENOMEM;
+    }
+
+    while (!task.complete) {
+        nfs_set_events(client);
+        qemu_coroutine_yield();
+    }
+
+    g_free(buf);
+
+    if (task.status != nb_sectors * BDRV_SECTOR_SIZE) {
+        return task.status < 0 ? task.status : -EIO;
+    }
+
+    return 0;
+}
+
+static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
+{
+    NFSClient *client = bs->opaque;
+    NFSRPC task;
+
+    nfs_co_init_task(client, &task);
+
+    if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
+                        &task) != 0) {
+        return -ENOMEM;
+    }
+
+    while (!task.complete) {
+        nfs_set_events(client);
+        qemu_coroutine_yield();
+    }
+
+    return task.status;
+}
+
+static QemuOptsList runtime_opts = {
+    .name = "nfs",
+    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
+    .desc = {
+        {
+            .name = "filename",
+            .type = QEMU_OPT_STRING,
+            .help = "URL to the NFS file",
+        },
+        { /* end of list */ }
+    },
+};
+
+static void nfs_client_close(NFSClient *client)
+{
+    if (client->context) {
+        if (client->fh) {
+            nfs_close(client->context, client->fh);
+        }
+        qemu_aio_set_fd_handler(nfs_get_fd(client->context), NULL, NULL, NULL);
+        nfs_destroy_context(client->context);
+    }
+    memset(client, 0, sizeof(NFSClient));
+}
+
+static void nfs_file_close(BlockDriverState *bs)
+{
+    NFSClient *client = bs->opaque;
+    nfs_client_close(client);
+}
+
+static int64_t nfs_client_open(NFSClient *client, const char *filename,
+                               int flags, Error **errp)
+{
+    int ret = -EINVAL;
+    struct stat st;
+    struct nfs_url *url;
+
+    client->context = nfs_init_context();
+    if (client->context == NULL) {
+        error_setg(errp, "Failed to init NFS context");
+        goto fail;
+    }
+
+    url = nfs_parse_url_full(client->context, filename);
+    if (url == NULL) {
+        error_setg(errp, "Failed to parse URL: %s",
+                   nfs_get_error(client->context));
+        goto fail;
+    }
+
+    ret = nfs_mount(client->context, url->server, url->path);
+    if (ret < 0) {
+        error_setg(errp, "Failed to mount nfs share: %s",
+                   nfs_get_error(client->context));
+        goto fail;
+    }
+
+    if (flags & O_CREAT) {
+        ret = nfs_creat(client->context, url->file, 0600, &client->fh);
+        if (ret < 0) {
+            error_setg(errp, "Failed to create file: %s",
+                       nfs_get_error(client->context));
+            goto fail;
+        }
+    } else {
+        ret = nfs_open(client->context, url->file, flags, &client->fh);
+        if (ret < 0) {
+            error_setg(errp, "Failed to open file : %s",
+                       nfs_get_error(client->context));
+            goto fail;
+        }
+    }
+
+    ret = nfs_fstat(client->context, client->fh, &st);
+    if (ret < 0) {
+        error_setg(errp, "Failed to fstat file: %s",
+                   nfs_get_error(client->context));
+        goto fail;
+    }
+
+    ret = DIV_ROUND_UP(st.st_size, BDRV_SECTOR_SIZE);
+    client->has_zero_init = S_ISREG(st.st_mode);
+    goto out;
+fail:
+    nfs_client_close(client);
+out:
+    nfs_destroy_url(url);
+    return ret;
+}
+
+static int nfs_file_open(BlockDriverState *bs, QDict *options, int flags,
+                         Error **errp) {
+    NFSClient *client = bs->opaque;
+    int64_t ret;
+    QemuOpts *opts;
+    Error *local_err = NULL;
+
+    opts = qemu_opts_create_nofail(&runtime_opts);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (error_is_set(&local_err)) {
+        qerror_report_err(local_err);
+        error_free(local_err);
+        return -EINVAL;
+    }
+    ret = nfs_client_open(client, qemu_opt_get(opts, "filename"),
+                          (flags & BDRV_O_RDWR) ? O_RDWR : O_RDONLY,
+                          errp);
+    if (ret < 0) {
+        return ret;
+    }
+    bs->total_sectors = ret;
+    return 0;
+}
+
+static int nfs_file_create(const char *filename, QEMUOptionParameter *options,
+                         Error **errp)
+{
+    int ret = 0;
+    int64_t total_size = 0;
+    NFSClient *client = g_malloc0(sizeof(NFSClient));
+
+    /* Read out options */
+    while (options && options->name) {
+        if (!strcmp(options->name, "size")) {
+            total_size = options->value.n;
+        }
+        options++;
+    }
+
+    ret = nfs_client_open(client, filename, O_CREAT, errp);
+    if (ret < 0) {
+        goto out;
+    }
+    ret = nfs_ftruncate(client->context, client->fh, total_size);
+out:
+    nfs_client_close(client);
+    g_free(client);
+    return ret;
+}
+
+static int nfs_has_zero_init(BlockDriverState *bs)
+{
+    NFSClient *client = bs->opaque;
+    return client->has_zero_init;
+}
+
+static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
+{
+    NFSClient *client = bs->opaque;
+    NFSRPC task = {0};
+    struct stat st;
+
+    task.st = &st;
+    if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
+                        &task) != 0) {
+        return -ENOMEM;
+    }
+
+    while (!task.complete) {
+        nfs_set_events(client);
+        qemu_aio_wait();
+    }
+
+    return (task.status < 0 ? task.status : st.st_blocks * st.st_blksize);
+}
+
+static int nfs_file_truncate(BlockDriverState *bs, int64_t offset)
+{
+    NFSClient *client = bs->opaque;
+    return nfs_ftruncate(client->context, client->fh, offset);
+}
+
+static BlockDriver bdrv_nfs = {
+    .format_name     = "nfs",
+    .protocol_name   = "nfs",
+
+    .instance_size   = sizeof(NFSClient),
+    .bdrv_needs_filename = true,
+    .bdrv_has_zero_init = nfs_has_zero_init,
+    .bdrv_get_allocated_file_size = nfs_get_allocated_file_size,
+    .bdrv_truncate = nfs_file_truncate,
+
+    .bdrv_file_open  = nfs_file_open,
+    .bdrv_close      = nfs_file_close,
+    .bdrv_create     = nfs_file_create,
+
+    .bdrv_co_readv         = nfs_co_readv,
+    .bdrv_co_writev        = nfs_co_writev,
+    .bdrv_co_flush_to_disk = nfs_co_flush,
+};
+
+static void nfs_block_init(void)
+{
+    bdrv_register(&bdrv_nfs);
+}
+
+block_init(nfs_block_init);
diff --git a/configure b/configure
index d97556c..b3c3ca6 100755
--- a/configure
+++ b/configure
@@ -251,6 +251,7 @@  vss_win32_sdk=""
 win_sdk="no"
 want_tools="yes"
 libiscsi=""
+libnfs=""
 coroutine=""
 coroutine_pool=""
 seccomp=""
@@ -839,6 +840,10 @@  for opt do
   ;;
   --enable-libiscsi) libiscsi="yes"
   ;;
+  --disable-libnfs) libnfs="no"
+  ;;
+  --enable-libnfs) libnfs="yes"
+  ;;
   --enable-profiler) profiler="yes"
   ;;
   --disable-cocoa) cocoa="no"
@@ -1210,6 +1215,8 @@  echo "  --enable-spice           enable spice"
 echo "  --enable-rbd             enable building the rados block device (rbd)"
 echo "  --disable-libiscsi       disable iscsi support"
 echo "  --enable-libiscsi        enable iscsi support"
+echo "  --disable-libnfs         disable nfs support"
+echo "  --enable-libnfs          enable nfs support"
 echo "  --disable-smartcard-nss  disable smartcard nss support"
 echo "  --enable-smartcard-nss   enable smartcard nss support"
 echo "  --disable-libusb         disable libusb (for usb passthrough)"
@@ -3596,6 +3603,20 @@  elif test "$debug" = "no" ; then
   CFLAGS="-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 $CFLAGS"
 fi
 
+##########################################
+# Do we have libnfs
+if test "$libnfs" != "no" ; then
+  if $pkg_config --atleast-version=1.8.90 libnfs; then
+    libnfs="yes"
+    libnfs_libs=$($pkg_config --libs libnfs)
+    LIBS="$LIBS $libnfs_libs"
+  else
+    if test "$libnfs" = "yes" ; then
+      feature_not_found "libnfs"
+    fi
+    libnfs="no"
+  fi
+fi
 
 # Disable zero malloc errors for official releases unless explicitly told to
 # enable/disable
@@ -3825,6 +3846,7 @@  echo "libiscsi support  $libiscsi (1.4.0)"
 else
 echo "libiscsi support  $libiscsi"
 fi
+echo "libnfs support    $libnfs"
 echo "build guest agent $guest_agent"
 echo "QGA VSS support   $guest_agent_with_vss"
 echo "seccomp support   $seccomp"
@@ -4161,6 +4183,10 @@  if test "$libiscsi" = "yes" ; then
   fi
 fi
 
+if test "$libnfs" = "yes" ; then
+  echo "CONFIG_LIBNFS=y" >> $config_host_mak
+fi
+
 if test "$seccomp" = "yes"; then
   echo "CONFIG_SECCOMP=y" >> $config_host_mak
 fi