Patchwork RFC: NVRAM for pseries machine

login
register
mail settings
Submitter David Gibson
Date Sept. 21, 2012, 3:08 a.m.
Message ID <1348196931-9660-1-git-send-email-david@gibson.dropbear.id.au>
Download mbox | patch
Permalink /patch/185558/
State New
Headers show

Comments

David Gibson - Sept. 21, 2012, 3:08 a.m.
Below is a patch which implements the (PAPR mandated) NVRAM for the
pseries machine.  It raises a couple of generic questions.

First, this adds a new "nvram" machine option which is used to give a
block device id to back the NVRAM so it is persistent.  Since some
sort of NVRAM is quite common, it seems this might be useful on other
machines one day, although obviously nothing else implements it yet.

Second, if a block device is not specified, it simply allocates a
block of memory to make a non-persistent NVRAM.  Obviously that isn't
really "NV", but it's enough to make many guests happy most of the
time, and doesn't require setting up an image file and drive.  It does
mean a different set of code paths in the driver though, and it will
need special case handling for savevm (not implemented yet).  Is this
the right approach, or should I be creating a dummy block device for a
one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
but maybe I'm missing something.

pseries: Implement PAPR NVRAM

The PAPR specification requires a certain amount of NVRAM, accessed via
RTAS, which we don't currently implement in qemu.  This patch addresses
this deficiency, implementing the NVRAM as a VIO device, with some glue to
instantiate it automatically based on a machine option.

The machine option specifies a drive id, which is used to back the NVRAM,
making it persistent.  If nothing is specified, the driver instead simply
allocates space for the NVRAM, which will not be persistent

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/Makefile.objs |    1 +
 hw/spapr.c           |    3 +
 hw/spapr.h           |    3 +
 hw/spapr_nvram.c     |  225 ++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-config.c        |    4 +
 5 files changed, 236 insertions(+)
 create mode 100644 hw/spapr_nvram.c
Blue Swirl - Sept. 22, 2012, 1:31 p.m.
On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
<david@gibson.dropbear.id.au> wrote:
> Below is a patch which implements the (PAPR mandated) NVRAM for the
> pseries machine.  It raises a couple of generic questions.
>
> First, this adds a new "nvram" machine option which is used to give a
> block device id to back the NVRAM so it is persistent.  Since some
> sort of NVRAM is quite common, it seems this might be useful on other
> machines one day, although obviously nothing else implements it yet.

Yes, there have been discussions earlier since loading NVRAM contents
from a file would be useful for many architectures too.

>
> Second, if a block device is not specified, it simply allocates a
> block of memory to make a non-persistent NVRAM.  Obviously that isn't
> really "NV", but it's enough to make many guests happy most of the
> time, and doesn't require setting up an image file and drive.  It does
> mean a different set of code paths in the driver though, and it will
> need special case handling for savevm (not implemented yet).  Is this
> the right approach, or should I be creating a dummy block device for a
> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
> but maybe I'm missing something.

That was the problem earlier too, it looks like a generic way for all
NVRAM/flash devices should be obvious but so far nobody has been able
to propose something.

What if there are two devices which could use this, for example CMOS
and flash on x86?

This should be extending  -device syntax rather than adding another
top level option. Something like
-drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
spapr-nvram,drive_id=main_nvram

>
> pseries: Implement PAPR NVRAM
>
> The PAPR specification requires a certain amount of NVRAM, accessed via
> RTAS, which we don't currently implement in qemu.  This patch addresses
> this deficiency, implementing the NVRAM as a VIO device, with some glue to
> instantiate it automatically based on a machine option.
>
> The machine option specifies a drive id, which is used to back the NVRAM,
> making it persistent.  If nothing is specified, the driver instead simply
> allocates space for the NVRAM, which will not be persistent
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/Makefile.objs |    1 +
>  hw/spapr.c           |    3 +
>  hw/spapr.h           |    3 +
>  hw/spapr_nvram.c     |  225 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  qemu-config.c        |    4 +
>  5 files changed, 236 insertions(+)
>  create mode 100644 hw/spapr_nvram.c
>
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index 951e407..91cbe8c 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -11,6 +11,7 @@ obj-y += ppc_newworld.o
>  obj-$(CONFIG_PSERIES) += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
>  obj-$(CONFIG_PSERIES) += xics.o spapr_vty.o spapr_llan.o spapr_vscsi.o
>  obj-$(CONFIG_PSERIES) += spapr_pci.o pci-hotplug.o spapr_iommu.o
> +obj-$(CONFIG_PSERIES) += spapr_nvram.o
>  # PowerPC 4xx boards
>  obj-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>  obj-y += ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> index a8bd3c1..079825a 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -804,6 +804,9 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>          }
>      }
>
> +    /* We always have at least the nvram device on VIO */
> +    spapr_create_nvram(spapr);
> +
>      /* Set up PCI */
>      spapr_pci_rtas_init();
>
> diff --git a/hw/spapr.h b/hw/spapr.h
> index e984e3f..d9c3b4a 100644
> --- a/hw/spapr.h
> +++ b/hw/spapr.h
> @@ -6,11 +6,13 @@
>
>  struct VIOsPAPRBus;
>  struct sPAPRPHBState;
> +struct sPAPRNVRAM;
>  struct icp_state;
>
>  typedef struct sPAPREnvironment {
>      struct VIOsPAPRBus *vio_bus;
>      QLIST_HEAD(, sPAPRPHBState) phbs;
> +    struct sPAPRNVRAM *nvram;
>      struct icp_state *icp;
>
>      target_phys_addr_t ram_limit;
> @@ -336,6 +338,7 @@ typedef struct sPAPRTCE {
>  #define SPAPR_PCI_BASE_LIOBN    0x80000000
>
>  void spapr_iommu_init(void);
> +void spapr_create_nvram(sPAPREnvironment *spapr);
>  DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
>  void spapr_tce_free(DMAContext *dma);
>  void spapr_tce_reset(DMAContext *dma);
> diff --git a/hw/spapr_nvram.c b/hw/spapr_nvram.c
> new file mode 100644
> index 0000000..8cd8a53
> --- /dev/null
> +++ b/hw/spapr_nvram.c
> @@ -0,0 +1,225 @@
> +/*
> + * QEMU sPAPR NVRAM emulation
> + *
> + * Copyright (C) 2012 David Gibson, IBM Corporation.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +#include <sys/mman.h>
> +#include <libfdt.h>
> +
> +#include "device_tree.h"
> +#include "hw/sysbus.h"
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +typedef struct sPAPRNVRAM {
> +    VIOsPAPRDevice sdev;
> +    uint32_t size;
> +    uint8_t *buf;
> +    BlockDriverState *drive;
> +} sPAPRNVRAM;
> +
> +#define MIN_NVRAM_SIZE 8192
> +#define DEFAULT_NVRAM_SIZE 16384
> +#define MAX_NVRAM_SIZE (UINT16_MAX * 16)
> +
> +static void rtas_nvram_fetch(sPAPREnvironment *spapr,
> +                             uint32_t token, uint32_t nargs,
> +                             target_ulong args,
> +                             uint32_t nret, target_ulong rets)
> +{
> +    sPAPRNVRAM *nvram = spapr->nvram;
> +    target_phys_addr_t offset, buffer, len;
> +    int alen;
> +    void *membuf;
> +
> +    if ((nargs != 3) || (nret != 2)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    if (!nvram) {
> +        rtas_st(rets, 0, -1);
> +        rtas_st(rets, 1, 0);
> +        return;
> +    }
> +
> +    offset = rtas_ld(args, 0);
> +    buffer = rtas_ld(args, 1);
> +    len = rtas_ld(args, 2);
> +
> +    if (((offset + len) < offset)
> +        || ((offset + len) > nvram->size)) {
> +        rtas_st(rets, 0, -3);
> +        rtas_st(rets, 1, 0);
> +        return;
> +    }
> +
> +    membuf = cpu_physical_memory_map(buffer, &len, 1);
> +    if (nvram->drive) {
> +        alen = bdrv_pread(nvram->drive, offset, membuf, len);
> +    } else {
> +        assert(nvram->buf);
> +
> +        memcpy(membuf, nvram->buf + offset, len);
> +        alen = len;
> +    }
> +    cpu_physical_memory_unmap(membuf, len, 1, len);
> +
> +    rtas_st(rets, 0, (alen < len) ? -1 : 0);
> +    rtas_st(rets, 1, (alen < 0) ? 0 : alen);
> +}
> +
> +static void rtas_nvram_store(sPAPREnvironment *spapr,
> +                             uint32_t token, uint32_t nargs,
> +                             target_ulong args,
> +                             uint32_t nret, target_ulong rets)
> +{
> +    sPAPRNVRAM *nvram = spapr->nvram;
> +    target_phys_addr_t offset, buffer, len;
> +    int alen;
> +    void *membuf;
> +
> +    if ((nargs != 3) || (nret != 2)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    if (!nvram) {
> +        rtas_st(rets, 0, -1);
> +        return;
> +    }
> +
> +    offset = rtas_ld(args, 0);
> +    buffer = rtas_ld(args, 1);
> +    len = rtas_ld(args, 2);
> +
> +    if (((offset + len) < offset)
> +        || ((offset + len) > nvram->size)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    membuf = cpu_physical_memory_map(buffer, &len, 0);
> +    if (nvram->drive) {
> +        alen = bdrv_pwrite(nvram->drive, offset, membuf, len);
> +    } else {
> +        assert(nvram->buf);
> +
> +        memcpy(nvram->buf + offset, membuf, len);
> +        alen = len;
> +    }
> +    cpu_physical_memory_unmap(membuf, len, 0, len);
> +
> +    rtas_st(rets, 0, (alen < len) ? -1 : 0);
> +    rtas_st(rets, 1, (alen < 0) ? 0 : alen);
> +}
> +
> +static int spapr_nvram_init(VIOsPAPRDevice *dev)
> +{
> +    sPAPRNVRAM *nvram = (sPAPRNVRAM *)dev;
> +
> +    if (nvram->drive) {
> +        nvram->size = bdrv_getlength(nvram->drive);
> +    } else {
> +        nvram->size = DEFAULT_NVRAM_SIZE;
> +        nvram->buf = g_malloc0(nvram->size);
> +    }
> +
> +    if ((nvram->size < MIN_NVRAM_SIZE) || (nvram->size > MAX_NVRAM_SIZE)) {
> +        fprintf(stderr, "spapr-nvram must be between %d and %d bytes in size\n",
> +                MIN_NVRAM_SIZE, MAX_NVRAM_SIZE);
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int spapr_nvram_devnode(VIOsPAPRDevice *dev, void *fdt, int node_off)
> +{
> +    sPAPRNVRAM *nvram = (sPAPRNVRAM *)dev;
> +
> +    return fdt_setprop_cell(fdt, node_off, "#bytes", nvram->size);
> +}
> +
> +static Property spapr_nvram_properties[] = {
> +    DEFINE_SPAPR_PROPERTIES(sPAPRNVRAM, sdev),
> +    DEFINE_PROP_DRIVE("drive", sPAPRNVRAM, drive),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void spapr_nvram_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    VIOsPAPRDeviceClass *k = VIO_SPAPR_DEVICE_CLASS(klass);
> +
> +    k->init = spapr_nvram_init;
> +    k->devnode = spapr_nvram_devnode;
> +    k->dt_name = "nvram";
> +    k->dt_type = "nvram";
> +    k->dt_compatible = "qemu,spapr-nvram";
> +    dc->props = spapr_nvram_properties;
> +}
> +
> +static const TypeInfo spapr_nvram_type_info = {
> +    .name          = "spapr-nvram",
> +    .parent        = TYPE_VIO_SPAPR_DEVICE,
> +    .instance_size = sizeof(sPAPRNVRAM),
> +    .class_init    = spapr_nvram_class_init,
> +};
> +
> +static void spapr_nvram_register_types(void)
> +{
> +    type_register_static(&spapr_nvram_type_info);
> +}
> +
> +type_init(spapr_nvram_register_types)
> +
> +void spapr_create_nvram(sPAPREnvironment *spapr)
> +{
> +    QemuOpts *machine_opts;
> +    DeviceState *dev;
> +
> +    dev = qdev_create(&spapr->vio_bus->bus, "spapr-nvram");
> +
> +    machine_opts = qemu_opts_find(qemu_find_opts("machine"), 0);
> +    if (machine_opts) {
> +        const char *drivename;
> +
> +        drivename = qemu_opt_get(machine_opts, "nvram");
> +        if (drivename) {
> +            BlockDriverState *bs;
> +
> +            bs = bdrv_find(drivename);
> +            if (!bs) {
> +                fprintf(stderr, "No such block device \"%s\" for nvram\n",
> +                        drivename);
> +                exit(1);
> +            }
> +            qdev_prop_set_drive_nofail(dev, "drive", bs);
> +        }
> +    }
> +
> +    qdev_init_nofail(dev);
> +
> +    spapr->nvram = (sPAPRNVRAM *)dev;
> +    spapr_rtas_register("nvram-fetch", rtas_nvram_fetch);
> +    spapr_rtas_register("nvram-store", rtas_nvram_store);
> +}
> diff --git a/qemu-config.c b/qemu-config.c
> index 12eafbb..1cd9a1b 100644
> --- a/qemu-config.c
> +++ b/qemu-config.c
> @@ -619,6 +619,10 @@ static QemuOptsList qemu_machine_opts = {
>              .name = "mem-merge",
>              .type = QEMU_OPT_BOOL,
>              .help = "enable/disable memory merge support",
> +        }, {
> +            .name = "nvram",
> +            .type = QEMU_OPT_STRING,
> +            .help = "Drive backing persistent NVRAM",
>          },
>          { /* End of list */ }
>      },
> --
> 1.7.10.4
>
>
Alexander Graf - Sept. 22, 2012, 2:16 p.m.
On 22.09.2012, at 15:31, Blue Swirl <blauwirbel@gmail.com> wrote:

> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
> <david@gibson.dropbear.id.au> wrote:
>> Below is a patch which implements the (PAPR mandated) NVRAM for the
>> pseries machine.  It raises a couple of generic questions.
>> 
>> First, this adds a new "nvram" machine option which is used to give a
>> block device id to back the NVRAM so it is persistent.  Since some
>> sort of NVRAM is quite common, it seems this might be useful on other
>> machines one day, although obviously nothing else implements it yet.
> 
> Yes, there have been discussions earlier since loading NVRAM contents
> from a file would be useful for many architectures too.
> 
>> 
>> Second, if a block device is not specified, it simply allocates a
>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
>> really "NV", but it's enough to make many guests happy most of the
>> time, and doesn't require setting up an image file and drive.  It does
>> mean a different set of code paths in the driver though, and it will
>> need special case handling for savevm (not implemented yet).  Is this
>> the right approach, or should I be creating a dummy block device for a
>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
>> but maybe I'm missing something.
> 
> That was the problem earlier too, it looks like a generic way for all
> NVRAM/flash devices should be obvious but so far nobody has been able
> to propose something.
> 
> What if there are two devices which could use this, for example CMOS
> and flash on x86?
> 
> This should be extending  -device syntax rather than adding another
> top level option. Something like
> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
> spapr-nvram,drive_id=main_nvram

Could we create a simplified syntax for this in addition? Something like

  -device spapr-nvram,file=nvram.raw

which would then automatically spawn a drive for the user. Saving the machine state would obviously save the transparently created drive.

Of course we could also just introduce a -nvram option that machine dependingly does create the device and block backend. More complex setups where a machine doesn't fit into -bios and -nvram options (are there any?) can then go with more complex syntax.

But I don't want to force people to have to use -device syntax for the average qemu use cases. Ever. If we go that route we could just as well librarify all of QEMU and always link it with libvirt and only ever expose an xml file for machine config to users.


Alex
Blue Swirl - Sept. 22, 2012, 2:26 p.m.
On Sat, Sep 22, 2012 at 2:16 PM, Alexander Graf <agraf@suse.de> wrote:
>
>
> On 22.09.2012, at 15:31, Blue Swirl <blauwirbel@gmail.com> wrote:
>
>> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
>> <david@gibson.dropbear.id.au> wrote:
>>> Below is a patch which implements the (PAPR mandated) NVRAM for the
>>> pseries machine.  It raises a couple of generic questions.
>>>
>>> First, this adds a new "nvram" machine option which is used to give a
>>> block device id to back the NVRAM so it is persistent.  Since some
>>> sort of NVRAM is quite common, it seems this might be useful on other
>>> machines one day, although obviously nothing else implements it yet.
>>
>> Yes, there have been discussions earlier since loading NVRAM contents
>> from a file would be useful for many architectures too.
>>
>>>
>>> Second, if a block device is not specified, it simply allocates a
>>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
>>> really "NV", but it's enough to make many guests happy most of the
>>> time, and doesn't require setting up an image file and drive.  It does
>>> mean a different set of code paths in the driver though, and it will
>>> need special case handling for savevm (not implemented yet).  Is this
>>> the right approach, or should I be creating a dummy block device for a
>>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
>>> but maybe I'm missing something.
>>
>> That was the problem earlier too, it looks like a generic way for all
>> NVRAM/flash devices should be obvious but so far nobody has been able
>> to propose something.
>>
>> What if there are two devices which could use this, for example CMOS
>> and flash on x86?
>>
>> This should be extending  -device syntax rather than adding another
>> top level option. Something like
>> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
>> spapr-nvram,drive_id=main_nvram
>
> Could we create a simplified syntax for this in addition? Something like
>
>   -device spapr-nvram,file=nvram.raw
>
> which would then automatically spawn a drive for the user. Saving the machine state would obviously save the transparently created drive.

That would be nice too. Maybe NVRAM-like devices should just declare
that they support backing storage and this would then be attached
automatically if either file=blah or id=drive_id is specified.

>
> Of course we could also just introduce a -nvram option that machine dependingly does create the device and block backend. More complex setups where a machine doesn't fit into -bios and -nvram options (are there any?) can then go with more complex syntax.
>
> But I don't want to force people to have to use -device syntax for the average qemu use cases. Ever. If we go that route we could just as well librarify all of QEMU and always link it with libvirt and only ever expose an xml file for machine config to users.

I don't use -device often either. The good side of -device syntax is
that it forces people to think about for example multiple devices and
how the devices fit together etc.

>
>
> Alex
>
David Gibson - Sept. 24, 2012, 12:31 a.m.
On Sat, Sep 22, 2012 at 01:31:08PM +0000, Blue Swirl wrote:
> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
> <david@gibson.dropbear.id.au> wrote:
> > Below is a patch which implements the (PAPR mandated) NVRAM for the
> > pseries machine.  It raises a couple of generic questions.
> >
> > First, this adds a new "nvram" machine option which is used to give a
> > block device id to back the NVRAM so it is persistent.  Since some
> > sort of NVRAM is quite common, it seems this might be useful on other
> > machines one day, although obviously nothing else implements it yet.
> 
> Yes, there have been discussions earlier since loading NVRAM contents
> from a file would be useful for many architectures too.
> 
> >
> > Second, if a block device is not specified, it simply allocates a
> > block of memory to make a non-persistent NVRAM.  Obviously that isn't
> > really "NV", but it's enough to make many guests happy most of the
> > time, and doesn't require setting up an image file and drive.  It does
> > mean a different set of code paths in the driver though, and it will
> > need special case handling for savevm (not implemented yet).  Is this
> > the right approach, or should I be creating a dummy block device for a
> > one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
> > but maybe I'm missing something.
> 
> That was the problem earlier too, it looks like a generic way for all
> NVRAM/flash devices should be obvious but so far nobody has been able
> to propose something.
> 
> What if there are two devices which could use this, for example CMOS
> and flash on x86?
> 
> This should be extending  -device syntax rather than adding another
> top level option. Something like
> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
> spapr-nvram,drive_id=main_nvram

So, if you look at the patch there is actually a -device form within
there, the machine option is a wrapper around it.  Without the machine
option, I don't see how to get the desired properties for the
configuration that is:
 * NVRAM is always instantiated by default (even if it's
non-persistent)
 * It's easy to set the drive for that always-present NVRAM

Note that in our case the guest interface to the NVRAM precludes
having more than one instance of the device (the paravirtualized calls
to access it take no address or instance id).  You can create multiple
with -device, but there will be no way to access them.
David Gibson - Sept. 24, 2012, 12:34 a.m.
On Sat, Sep 22, 2012 at 02:26:43PM +0000, Blue Swirl wrote:
> On Sat, Sep 22, 2012 at 2:16 PM, Alexander Graf <agraf@suse.de> wrote:
> >
> >
> > On 22.09.2012, at 15:31, Blue Swirl <blauwirbel@gmail.com> wrote:
> >
> >> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
> >> <david@gibson.dropbear.id.au> wrote:
> >>> Below is a patch which implements the (PAPR mandated) NVRAM for the
> >>> pseries machine.  It raises a couple of generic questions.
> >>>
> >>> First, this adds a new "nvram" machine option which is used to give a
> >>> block device id to back the NVRAM so it is persistent.  Since some
> >>> sort of NVRAM is quite common, it seems this might be useful on other
> >>> machines one day, although obviously nothing else implements it yet.
> >>
> >> Yes, there have been discussions earlier since loading NVRAM contents
> >> from a file would be useful for many architectures too.
> >>
> >>>
> >>> Second, if a block device is not specified, it simply allocates a
> >>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
> >>> really "NV", but it's enough to make many guests happy most of the
> >>> time, and doesn't require setting up an image file and drive.  It does
> >>> mean a different set of code paths in the driver though, and it will
> >>> need special case handling for savevm (not implemented yet).  Is this
> >>> the right approach, or should I be creating a dummy block device for a
> >>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
> >>> but maybe I'm missing something.
> >>
> >> That was the problem earlier too, it looks like a generic way for all
> >> NVRAM/flash devices should be obvious but so far nobody has been able
> >> to propose something.
> >>
> >> What if there are two devices which could use this, for example CMOS
> >> and flash on x86?
> >>
> >> This should be extending  -device syntax rather than adding another
> >> top level option. Something like
> >> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
> >> spapr-nvram,drive_id=main_nvram
> >
> > Could we create a simplified syntax for this in addition? Something like
> >
> >   -device spapr-nvram,file=nvram.raw
> >
> > which would then automatically spawn a drive for the user. Saving the machine state would obviously save the transparently created drive.
> 
> That would be nice too. Maybe NVRAM-like devices should just declare
> that they support backing storage and this would then be attached
> automatically if either file=blah or id=drive_id is specified.

Well, that might be cool.  But can we come to some sort of consensus
or otherwise on whether this approach is a good start, rather than
wandering off into how we can extend it just now?
Alexander Graf - Sept. 24, 2012, 10:36 a.m.
On 21.09.2012, at 05:08, David Gibson wrote:

> Below is a patch which implements the (PAPR mandated) NVRAM for the
> pseries machine.  It raises a couple of generic questions.
> 
> First, this adds a new "nvram" machine option which is used to give a
> block device id to back the NVRAM so it is persistent.  Since some
> sort of NVRAM is quite common, it seems this might be useful on other
> machines one day, although obviously nothing else implements it yet.
> 
> Second, if a block device is not specified, it simply allocates a
> block of memory to make a non-persistent NVRAM.  Obviously that isn't
> really "NV", but it's enough to make many guests happy most of the
> time, and doesn't require setting up an image file and drive.  It does
> mean a different set of code paths in the driver though, and it will
> need special case handling for savevm (not implemented yet).  Is this
> the right approach, or should I be creating a dummy block device for a
> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
> but maybe I'm missing something.
> 
> pseries: Implement PAPR NVRAM
> 
> The PAPR specification requires a certain amount of NVRAM, accessed via
> RTAS, which we don't currently implement in qemu.  This patch addresses
> this deficiency, implementing the NVRAM as a VIO device, with some glue to
> instantiate it automatically based on a machine option.
> 
> The machine option specifies a drive id, which is used to back the NVRAM,
> making it persistent.  If nothing is specified, the driver instead simply
> allocates space for the NVRAM, which will not be persistent
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/ppc/Makefile.objs |    1 +
> hw/spapr.c           |    3 +
> hw/spapr.h           |    3 +
> hw/spapr_nvram.c     |  225 ++++++++++++++++++++++++++++++++++++++++++++++++++
> qemu-config.c        |    4 +
> 5 files changed, 236 insertions(+)
> create mode 100644 hw/spapr_nvram.c
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index 951e407..91cbe8c 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -11,6 +11,7 @@ obj-y += ppc_newworld.o
> obj-$(CONFIG_PSERIES) += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
> obj-$(CONFIG_PSERIES) += xics.o spapr_vty.o spapr_llan.o spapr_vscsi.o
> obj-$(CONFIG_PSERIES) += spapr_pci.o pci-hotplug.o spapr_iommu.o
> +obj-$(CONFIG_PSERIES) += spapr_nvram.o
> # PowerPC 4xx boards
> obj-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
> obj-y += ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> index a8bd3c1..079825a 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -804,6 +804,9 @@ static void ppc_spapr_init(ram_addr_t ram_size,
>         }
>     }
> 
> +    /* We always have at least the nvram device on VIO */
> +    spapr_create_nvram(spapr);
> +
>     /* Set up PCI */
>     spapr_pci_rtas_init();
> 
> diff --git a/hw/spapr.h b/hw/spapr.h
> index e984e3f..d9c3b4a 100644
> --- a/hw/spapr.h
> +++ b/hw/spapr.h
> @@ -6,11 +6,13 @@
> 
> struct VIOsPAPRBus;
> struct sPAPRPHBState;
> +struct sPAPRNVRAM;
> struct icp_state;
> 
> typedef struct sPAPREnvironment {
>     struct VIOsPAPRBus *vio_bus;
>     QLIST_HEAD(, sPAPRPHBState) phbs;
> +    struct sPAPRNVRAM *nvram;
>     struct icp_state *icp;
> 
>     target_phys_addr_t ram_limit;
> @@ -336,6 +338,7 @@ typedef struct sPAPRTCE {
> #define SPAPR_PCI_BASE_LIOBN    0x80000000
> 
> void spapr_iommu_init(void);
> +void spapr_create_nvram(sPAPREnvironment *spapr);
> DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
> void spapr_tce_free(DMAContext *dma);
> void spapr_tce_reset(DMAContext *dma);
> diff --git a/hw/spapr_nvram.c b/hw/spapr_nvram.c
> new file mode 100644
> index 0000000..8cd8a53
> --- /dev/null
> +++ b/hw/spapr_nvram.c
> @@ -0,0 +1,225 @@
> +/*
> + * QEMU sPAPR NVRAM emulation
> + *
> + * Copyright (C) 2012 David Gibson, IBM Corporation.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +#include <sys/mman.h>
> +#include <libfdt.h>
> +
> +#include "device_tree.h"
> +#include "hw/sysbus.h"
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +typedef struct sPAPRNVRAM {
> +    VIOsPAPRDevice sdev;
> +    uint32_t size;
> +    uint8_t *buf;
> +    BlockDriverState *drive;
> +} sPAPRNVRAM;
> +
> +#define MIN_NVRAM_SIZE 8192
> +#define DEFAULT_NVRAM_SIZE 16384
> +#define MAX_NVRAM_SIZE (UINT16_MAX * 16)
> +
> +static void rtas_nvram_fetch(sPAPREnvironment *spapr,
> +                             uint32_t token, uint32_t nargs,
> +                             target_ulong args,
> +                             uint32_t nret, target_ulong rets)
> +{
> +    sPAPRNVRAM *nvram = spapr->nvram;
> +    target_phys_addr_t offset, buffer, len;
> +    int alen;
> +    void *membuf;
> +
> +    if ((nargs != 3) || (nret != 2)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    if (!nvram) {
> +        rtas_st(rets, 0, -1);
> +        rtas_st(rets, 1, 0);
> +        return;
> +    }
> +
> +    offset = rtas_ld(args, 0);
> +    buffer = rtas_ld(args, 1);
> +    len = rtas_ld(args, 2);
> +
> +    if (((offset + len) < offset)
> +        || ((offset + len) > nvram->size)) {
> +        rtas_st(rets, 0, -3);
> +        rtas_st(rets, 1, 0);
> +        return;
> +    }
> +
> +    membuf = cpu_physical_memory_map(buffer, &len, 1);
> +    if (nvram->drive) {
> +        alen = bdrv_pread(nvram->drive, offset, membuf, len);
> +    } else {
> +        assert(nvram->buf);
> +
> +        memcpy(membuf, nvram->buf + offset, len);
> +        alen = len;
> +    }
> +    cpu_physical_memory_unmap(membuf, len, 1, len);
> +
> +    rtas_st(rets, 0, (alen < len) ? -1 : 0);
> +    rtas_st(rets, 1, (alen < 0) ? 0 : alen);
> +}
> +
> +static void rtas_nvram_store(sPAPREnvironment *spapr,
> +                             uint32_t token, uint32_t nargs,
> +                             target_ulong args,
> +                             uint32_t nret, target_ulong rets)
> +{
> +    sPAPRNVRAM *nvram = spapr->nvram;
> +    target_phys_addr_t offset, buffer, len;
> +    int alen;
> +    void *membuf;
> +
> +    if ((nargs != 3) || (nret != 2)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    if (!nvram) {
> +        rtas_st(rets, 0, -1);
> +        return;
> +    }
> +
> +    offset = rtas_ld(args, 0);
> +    buffer = rtas_ld(args, 1);
> +    len = rtas_ld(args, 2);
> +
> +    if (((offset + len) < offset)
> +        || ((offset + len) > nvram->size)) {
> +        rtas_st(rets, 0, -3);
> +        return;
> +    }
> +
> +    membuf = cpu_physical_memory_map(buffer, &len, 0);
> +    if (nvram->drive) {
> +        alen = bdrv_pwrite(nvram->drive, offset, membuf, len);
> +    } else {
> +        assert(nvram->buf);
> +
> +        memcpy(nvram->buf + offset, membuf, len);
> +        alen = len;
> +    }
> +    cpu_physical_memory_unmap(membuf, len, 0, len);
> +
> +    rtas_st(rets, 0, (alen < len) ? -1 : 0);
> +    rtas_st(rets, 1, (alen < 0) ? 0 : alen);
> +}
> +
> +static int spapr_nvram_init(VIOsPAPRDevice *dev)
> +{
> +    sPAPRNVRAM *nvram = (sPAPRNVRAM *)dev;
> +
> +    if (nvram->drive) {
> +        nvram->size = bdrv_getlength(nvram->drive);
> +    } else {
> +        nvram->size = DEFAULT_NVRAM_SIZE;
> +        nvram->buf = g_malloc0(nvram->size);
> +    }
> +
> +    if ((nvram->size < MIN_NVRAM_SIZE) || (nvram->size > MAX_NVRAM_SIZE)) {
> +        fprintf(stderr, "spapr-nvram must be between %d and %d bytes in size\n",
> +                MIN_NVRAM_SIZE, MAX_NVRAM_SIZE);
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int spapr_nvram_devnode(VIOsPAPRDevice *dev, void *fdt, int node_off)
> +{
> +    sPAPRNVRAM *nvram = (sPAPRNVRAM *)dev;
> +
> +    return fdt_setprop_cell(fdt, node_off, "#bytes", nvram->size);
> +}
> +
> +static Property spapr_nvram_properties[] = {
> +    DEFINE_SPAPR_PROPERTIES(sPAPRNVRAM, sdev),
> +    DEFINE_PROP_DRIVE("drive", sPAPRNVRAM, drive),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void spapr_nvram_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    VIOsPAPRDeviceClass *k = VIO_SPAPR_DEVICE_CLASS(klass);
> +
> +    k->init = spapr_nvram_init;
> +    k->devnode = spapr_nvram_devnode;
> +    k->dt_name = "nvram";
> +    k->dt_type = "nvram";
> +    k->dt_compatible = "qemu,spapr-nvram";
> +    dc->props = spapr_nvram_properties;
> +}
> +
> +static const TypeInfo spapr_nvram_type_info = {
> +    .name          = "spapr-nvram",
> +    .parent        = TYPE_VIO_SPAPR_DEVICE,
> +    .instance_size = sizeof(sPAPRNVRAM),
> +    .class_init    = spapr_nvram_class_init,
> +};
> +
> +static void spapr_nvram_register_types(void)
> +{
> +    type_register_static(&spapr_nvram_type_info);
> +}
> +
> +type_init(spapr_nvram_register_types)
> +
> +void spapr_create_nvram(sPAPREnvironment *spapr)
> +{
> +    QemuOpts *machine_opts;
> +    DeviceState *dev;
> +
> +    dev = qdev_create(&spapr->vio_bus->bus, "spapr-nvram");
> +
> +    machine_opts = qemu_opts_find(qemu_find_opts("machine"), 0);
> +    if (machine_opts) {
> +        const char *drivename;
> +
> +        drivename = qemu_opt_get(machine_opts, "nvram");
> +        if (drivename) {
> +            BlockDriverState *bs;
> +
> +            bs = bdrv_find(drivename);
> +            if (!bs) {
> +                fprintf(stderr, "No such block device \"%s\" for nvram\n",
> +                        drivename);
> +                exit(1);

I thought you wanted to autogenerate contents in RAM if no drive is available?

> +            }
> +            qdev_prop_set_drive_nofail(dev, "drive", bs);
> +        }
> +    }
> +
> +    qdev_init_nofail(dev);
> +
> +    spapr->nvram = (sPAPRNVRAM *)dev;
> +    spapr_rtas_register("nvram-fetch", rtas_nvram_fetch);
> +    spapr_rtas_register("nvram-store", rtas_nvram_store);
> +}
> diff --git a/qemu-config.c b/qemu-config.c
> index 12eafbb..1cd9a1b 100644
> --- a/qemu-config.c
> +++ b/qemu-config.c
> @@ -619,6 +619,10 @@ static QemuOptsList qemu_machine_opts = {
>             .name = "mem-merge",
>             .type = QEMU_OPT_BOOL,
>             .help = "enable/disable memory merge support",
> +        }, {
> +            .name = "nvram",
> +            .type = QEMU_OPT_STRING,
> +            .help = "Drive backing persistent NVRAM",

I like the idea of a machine implemented NVRAM. Maybe we should add an -nvram option that does an automatic -drive and -machine ...,machine=drive_id for the user :).


Alex
Alexander Graf - Sept. 24, 2012, 10:38 a.m.
On 24.09.2012, at 02:31, David Gibson wrote:

> On Sat, Sep 22, 2012 at 01:31:08PM +0000, Blue Swirl wrote:
>> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
>> <david@gibson.dropbear.id.au> wrote:
>>> Below is a patch which implements the (PAPR mandated) NVRAM for the
>>> pseries machine.  It raises a couple of generic questions.
>>> 
>>> First, this adds a new "nvram" machine option which is used to give a
>>> block device id to back the NVRAM so it is persistent.  Since some
>>> sort of NVRAM is quite common, it seems this might be useful on other
>>> machines one day, although obviously nothing else implements it yet.
>> 
>> Yes, there have been discussions earlier since loading NVRAM contents
>> from a file would be useful for many architectures too.
>> 
>>> 
>>> Second, if a block device is not specified, it simply allocates a
>>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
>>> really "NV", but it's enough to make many guests happy most of the
>>> time, and doesn't require setting up an image file and drive.  It does
>>> mean a different set of code paths in the driver though, and it will
>>> need special case handling for savevm (not implemented yet).  Is this
>>> the right approach, or should I be creating a dummy block device for a
>>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
>>> but maybe I'm missing something.
>> 
>> That was the problem earlier too, it looks like a generic way for all
>> NVRAM/flash devices should be obvious but so far nobody has been able
>> to propose something.
>> 
>> What if there are two devices which could use this, for example CMOS
>> and flash on x86?
>> 
>> This should be extending  -device syntax rather than adding another
>> top level option. Something like
>> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
>> spapr-nvram,drive_id=main_nvram
> 
> So, if you look at the patch there is actually a -device form within
> there, the machine option is a wrapper around it.  Without the machine
> option, I don't see how to get the desired properties for the
> configuration that is:
> * NVRAM is always instantiated by default (even if it's
> non-persistent)
> * It's easy to set the drive for that always-present NVRAM

I suppose the idea is that when creating a machine from a qtree dump, we can still recreate it. Or maybe when using -nodefaults? Not sure. But the way you do it right now is very close to how we want to model USB too, so I do like it. It's consistent.

> Note that in our case the guest interface to the NVRAM precludes
> having more than one instance of the device (the paravirtualized calls
> to access it take no address or instance id).  You can create multiple
> with -device, but there will be no way to access them.

Yes. But one of the future ideas in QEMU land is that you could recreate a machine from a dump of the current documentation, and thus it would require for a -device style instantiation. The default case for the command line shouldn't need that for obvious reasons :).


Alex
David Gibson - Sept. 24, 2012, 12:25 p.m.
On Mon, Sep 24, 2012 at 12:36:33PM +0200, Alexander Graf wrote:
> On 21.09.2012, at 05:08, David Gibson wrote:
[snip]
> > +void spapr_create_nvram(sPAPREnvironment *spapr)
> > +{
> > +    QemuOpts *machine_opts;
> > +    DeviceState *dev;
> > +
> > +    dev = qdev_create(&spapr->vio_bus->bus, "spapr-nvram");
> > +
> > +    machine_opts = qemu_opts_find(qemu_find_opts("machine"), 0);
> > +    if (machine_opts) {
> > +        const char *drivename;
> > +
> > +        drivename = qemu_opt_get(machine_opts, "nvram");
> > +        if (drivename) {
> > +            BlockDriverState *bs;
> > +
> > +            bs = bdrv_find(drivename);
> > +            if (!bs) {
> > +                fprintf(stderr, "No such block device \"%s\" for nvram\n",
> > +                        drivename);
> > +                exit(1);
> 
> I thought you wanted to autogenerate contents in RAM if no drive is available?

We do.  This is the case where the user *has* specified a drive id,
but a drive of that name does not exist.

> > +            }
> > +            qdev_prop_set_drive_nofail(dev, "drive", bs);
> > +        }
> > +    }
> > +
> > +    qdev_init_nofail(dev);
> > +
> > +    spapr->nvram = (sPAPRNVRAM *)dev;
> > +    spapr_rtas_register("nvram-fetch", rtas_nvram_fetch);
> > +    spapr_rtas_register("nvram-store", rtas_nvram_store);
> > +}
> > diff --git a/qemu-config.c b/qemu-config.c
> > index 12eafbb..1cd9a1b 100644
> > --- a/qemu-config.c
> > +++ b/qemu-config.c
> > @@ -619,6 +619,10 @@ static QemuOptsList qemu_machine_opts = {
> >             .name = "mem-merge",
> >             .type = QEMU_OPT_BOOL,
> >             .help = "enable/disable memory merge support",
> > +        }, {
> > +            .name = "nvram",
> > +            .type = QEMU_OPT_STRING,
> > +            .help = "Drive backing persistent NVRAM",
> 
> I like the idea of a machine implemented NVRAM. Maybe we should add
> an -nvram option that does an automatic -drive and -machine
> ...,machine=drive_id for the user :).

Uh, sure, whatever.  I don't really care what syntactic sugar is
added, as long as the basic thing is working.
Alexander Graf - Sept. 24, 2012, 1:44 p.m.
On 24.09.2012, at 14:25, David Gibson wrote:

> On Mon, Sep 24, 2012 at 12:36:33PM +0200, Alexander Graf wrote:
>> On 21.09.2012, at 05:08, David Gibson wrote:
> [snip]
>>> +void spapr_create_nvram(sPAPREnvironment *spapr)
>>> +{
>>> +    QemuOpts *machine_opts;
>>> +    DeviceState *dev;
>>> +
>>> +    dev = qdev_create(&spapr->vio_bus->bus, "spapr-nvram");
>>> +
>>> +    machine_opts = qemu_opts_find(qemu_find_opts("machine"), 0);
>>> +    if (machine_opts) {
>>> +        const char *drivename;
>>> +
>>> +        drivename = qemu_opt_get(machine_opts, "nvram");
>>> +        if (drivename) {
>>> +            BlockDriverState *bs;
>>> +
>>> +            bs = bdrv_find(drivename);
>>> +            if (!bs) {
>>> +                fprintf(stderr, "No such block device \"%s\" for nvram\n",
>>> +                        drivename);
>>> +                exit(1);
>> 
>> I thought you wanted to autogenerate contents in RAM if no drive is available?
> 
> We do.  This is the case where the user *has* specified a drive id,
> but a drive of that name does not exist.

Ah :).

> 
>>> +            }
>>> +            qdev_prop_set_drive_nofail(dev, "drive", bs);
>>> +        }
>>> +    }
>>> +
>>> +    qdev_init_nofail(dev);
>>> +
>>> +    spapr->nvram = (sPAPRNVRAM *)dev;
>>> +    spapr_rtas_register("nvram-fetch", rtas_nvram_fetch);
>>> +    spapr_rtas_register("nvram-store", rtas_nvram_store);
>>> +}
>>> diff --git a/qemu-config.c b/qemu-config.c
>>> index 12eafbb..1cd9a1b 100644
>>> --- a/qemu-config.c
>>> +++ b/qemu-config.c
>>> @@ -619,6 +619,10 @@ static QemuOptsList qemu_machine_opts = {
>>>            .name = "mem-merge",
>>>            .type = QEMU_OPT_BOOL,
>>>            .help = "enable/disable memory merge support",
>>> +        }, {
>>> +            .name = "nvram",
>>> +            .type = QEMU_OPT_STRING,
>>> +            .help = "Drive backing persistent NVRAM",
>> 
>> I like the idea of a machine implemented NVRAM. Maybe we should add
>> an -nvram option that does an automatic -drive and -machine
>> ...,machine=drive_id for the user :).
> 
> Uh, sure, whatever.  I don't really care what syntactic sugar is
> added, as long as the basic thing is working.

It certainly works for me :).


Alex
David Gibson - Sept. 26, 2012, 12:27 a.m.
On Mon, Sep 24, 2012 at 12:38:59PM +0200, Alexander Graf wrote:
> 
> On 24.09.2012, at 02:31, David Gibson wrote:
> 
> > On Sat, Sep 22, 2012 at 01:31:08PM +0000, Blue Swirl wrote:
> >> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
> >> <david@gibson.dropbear.id.au> wrote:
> >>> Below is a patch which implements the (PAPR mandated) NVRAM for the
> >>> pseries machine.  It raises a couple of generic questions.
> >>> 
> >>> First, this adds a new "nvram" machine option which is used to give a
> >>> block device id to back the NVRAM so it is persistent.  Since some
> >>> sort of NVRAM is quite common, it seems this might be useful on other
> >>> machines one day, although obviously nothing else implements it yet.
> >> 
> >> Yes, there have been discussions earlier since loading NVRAM contents
> >> from a file would be useful for many architectures too.
> >> 
> >>> 
> >>> Second, if a block device is not specified, it simply allocates a
> >>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
> >>> really "NV", but it's enough to make many guests happy most of the
> >>> time, and doesn't require setting up an image file and drive.  It does
> >>> mean a different set of code paths in the driver though, and it will
> >>> need special case handling for savevm (not implemented yet).  Is this
> >>> the right approach, or should I be creating a dummy block device for a
> >>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
> >>> but maybe I'm missing something.
> >> 
> >> That was the problem earlier too, it looks like a generic way for all
> >> NVRAM/flash devices should be obvious but so far nobody has been able
> >> to propose something.
> >> 
> >> What if there are two devices which could use this, for example CMOS
> >> and flash on x86?
> >> 
> >> This should be extending  -device syntax rather than adding another
> >> top level option. Something like
> >> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
> >> spapr-nvram,drive_id=main_nvram
> > 
> > So, if you look at the patch there is actually a -device form within
> > there, the machine option is a wrapper around it.  Without the machine
> > option, I don't see how to get the desired properties for the
> > configuration that is:
> > * NVRAM is always instantiated by default (even if it's
> > non-persistent)
> > * It's easy to set the drive for that always-present NVRAM
> 
> I suppose the idea is that when creating a machine from a qtree
> dump, we can still recreate it. Or maybe when using -nodefaults? Not
> sure. But the way you do it right now is very close to how we want
> to model USB too, so I do like it. It's consistent.

I really don't follow what point you're making here.

The problem with -device syntax for my purpose is that with *no* extra
command line arguments we should always have some sort of NVRAM - it's
mandated by the platform spec, and should always be there, just like
the PCI bridge and VIO bridge.  That means instantiating the device
from the machine setup code.  But then, using -device will create a
second instance of the device, which is no good, because only one can
actually be used.
Alexander Graf - Sept. 26, 2012, 1:03 a.m.
On 26.09.2012, at 02:27, David Gibson wrote:

> On Mon, Sep 24, 2012 at 12:38:59PM +0200, Alexander Graf wrote:
>> 
>> On 24.09.2012, at 02:31, David Gibson wrote:
>> 
>>> On Sat, Sep 22, 2012 at 01:31:08PM +0000, Blue Swirl wrote:
>>>> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
>>>> <david@gibson.dropbear.id.au> wrote:
>>>>> Below is a patch which implements the (PAPR mandated) NVRAM for the
>>>>> pseries machine.  It raises a couple of generic questions.
>>>>> 
>>>>> First, this adds a new "nvram" machine option which is used to give a
>>>>> block device id to back the NVRAM so it is persistent.  Since some
>>>>> sort of NVRAM is quite common, it seems this might be useful on other
>>>>> machines one day, although obviously nothing else implements it yet.
>>>> 
>>>> Yes, there have been discussions earlier since loading NVRAM contents
>>>> from a file would be useful for many architectures too.
>>>> 
>>>>> 
>>>>> Second, if a block device is not specified, it simply allocates a
>>>>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
>>>>> really "NV", but it's enough to make many guests happy most of the
>>>>> time, and doesn't require setting up an image file and drive.  It does
>>>>> mean a different set of code paths in the driver though, and it will
>>>>> need special case handling for savevm (not implemented yet).  Is this
>>>>> the right approach, or should I be creating a dummy block device for a
>>>>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
>>>>> but maybe I'm missing something.
>>>> 
>>>> That was the problem earlier too, it looks like a generic way for all
>>>> NVRAM/flash devices should be obvious but so far nobody has been able
>>>> to propose something.
>>>> 
>>>> What if there are two devices which could use this, for example CMOS
>>>> and flash on x86?
>>>> 
>>>> This should be extending  -device syntax rather than adding another
>>>> top level option. Something like
>>>> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
>>>> spapr-nvram,drive_id=main_nvram
>>> 
>>> So, if you look at the patch there is actually a -device form within
>>> there, the machine option is a wrapper around it.  Without the machine
>>> option, I don't see how to get the desired properties for the
>>> configuration that is:
>>> * NVRAM is always instantiated by default (even if it's
>>> non-persistent)
>>> * It's easy to set the drive for that always-present NVRAM
>> 
>> I suppose the idea is that when creating a machine from a qtree
>> dump, we can still recreate it. Or maybe when using -nodefaults? Not
>> sure. But the way you do it right now is very close to how we want
>> to model USB too, so I do like it. It's consistent.
> 
> I really don't follow what point you're making here.
> 
> The problem with -device syntax for my purpose is that with *no* extra
> command line arguments we should always have some sort of NVRAM - it's
> mandated by the platform spec, and should always be there, just like
> the PCI bridge and VIO bridge.  That means instantiating the device
> from the machine setup code.  But then, using -device will create a
> second instance of the device, which is no good, because only one can
> actually be used.

What I'm trying to say is that the machine file should create a device. Always in the case of PAPR. But I suppose pseudo-code is easier to read:

spapr.c:

  create_device("spapr-nvram", drive=machine_opts["nvram"]);

spapr-nvram:

  if (!drive || checksum_is_bad(drive))
    autogenerate_nvram_contents();

Then we can later add in vl.c:

  case OPTION_nvram:
    create_drive("nvram", option);
    machine_opts["nvram"] = drive["nvram"];


Alex
David Gibson - Sept. 26, 2012, 1:18 a.m.
On Wed, Sep 26, 2012 at 03:03:10AM +0200, Alexander Graf wrote:
> On 26.09.2012, at 02:27, David Gibson wrote:
> > On Mon, Sep 24, 2012 at 12:38:59PM +0200, Alexander Graf wrote:
> >> On 24.09.2012, at 02:31, David Gibson wrote:
[snip]
> >>> So, if you look at the patch there is actually a -device form within
> >>> there, the machine option is a wrapper around it.  Without the machine
> >>> option, I don't see how to get the desired properties for the
> >>> configuration that is:
> >>> * NVRAM is always instantiated by default (even if it's
> >>> non-persistent)
> >>> * It's easy to set the drive for that always-present NVRAM
> >> 
> >> I suppose the idea is that when creating a machine from a qtree
> >> dump, we can still recreate it. Or maybe when using -nodefaults? Not
> >> sure. But the way you do it right now is very close to how we want
> >> to model USB too, so I do like it. It's consistent.
> > 
> > I really don't follow what point you're making here.
> > 
> > The problem with -device syntax for my purpose is that with *no* extra
> > command line arguments we should always have some sort of NVRAM - it's
> > mandated by the platform spec, and should always be there, just like
> > the PCI bridge and VIO bridge.  That means instantiating the device
> > from the machine setup code.  But then, using -device will create a
> > second instance of the device, which is no good, because only one can
> > actually be used.
> 
> What I'm trying to say is that the machine file should create a
> device. Always in the case of PAPR. But I suppose pseudo-code is
> easier to read:
> 
> spapr.c:
> 
>   create_device("spapr-nvram", drive=machine_opts["nvram"]);

Ok.  That's what I do now.

> spapr-nvram:
> 
>   if (!drive || checksum_is_bad(drive))
>     autogenerate_nvram_contents();

Actually, I'm planning for the initialization of the content to be
done from the guest firmware.


> Then we can later add in vl.c:
> 
>   case OPTION_nvram:
>     create_drive("nvram", option);
>     machine_opts["nvram"] = drive["nvram"];

Ok, that all works for me.

Blue, does that seem reasonable to you?
Alexander Graf - Sept. 26, 2012, 8:56 a.m.
On 26.09.2012, at 03:18, David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Sep 26, 2012 at 03:03:10AM +0200, Alexander Graf wrote:
>> On 26.09.2012, at 02:27, David Gibson wrote:
>>> On Mon, Sep 24, 2012 at 12:38:59PM +0200, Alexander Graf wrote:
>>>> On 24.09.2012, at 02:31, David Gibson wrote:
> [snip]
>>>>> So, if you look at the patch there is actually a -device form within
>>>>> there, the machine option is a wrapper around it.  Without the machine
>>>>> option, I don't see how to get the desired properties for the
>>>>> configuration that is:
>>>>> * NVRAM is always instantiated by default (even if it's
>>>>> non-persistent)
>>>>> * It's easy to set the drive for that always-present NVRAM
>>>> 
>>>> I suppose the idea is that when creating a machine from a qtree
>>>> dump, we can still recreate it. Or maybe when using -nodefaults? Not
>>>> sure. But the way you do it right now is very close to how we want
>>>> to model USB too, so I do like it. It's consistent.
>>> 
>>> I really don't follow what point you're making here.
>>> 
>>> The problem with -device syntax for my purpose is that with *no* extra
>>> command line arguments we should always have some sort of NVRAM - it's
>>> mandated by the platform spec, and should always be there, just like
>>> the PCI bridge and VIO bridge.  That means instantiating the device
>>> from the machine setup code.  But then, using -device will create a
>>> second instance of the device, which is no good, because only one can
>>> actually be used.
>> 
>> What I'm trying to say is that the machine file should create a
>> device. Always in the case of PAPR. But I suppose pseudo-code is
>> easier to read:
>> 
>> spapr.c:
>> 
>>  create_device("spapr-nvram", drive=machine_opts["nvram"]);
> 
> Ok.  That's what I do now.
> 
>> spapr-nvram:
>> 
>>  if (!drive || checksum_is_bad(drive))
>>    autogenerate_nvram_contents();
> 
> Actually, I'm planning for the initialization of the content to be
> done from the guest firmware.

Does the guest have all information necessary to construct a workable nvram image? If so, then yes, that's even better.

Alex

> 
> 
>> Then we can later add in vl.c:
>> 
>>  case OPTION_nvram:
>>    create_drive("nvram", option);
>>    machine_opts["nvram"] = drive["nvram"];
> 
> Ok, that all works for me.
> 
> Blue, does that seem reasonable to you?
> 
> -- 
> David Gibson            | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au    | minimalist, thank you.  NOT _the_ _other_
>                | _way_ _around_!
> http://www.ozlabs.org/~dgibson
Thomas Huth - Sept. 26, 2012, 12:08 p.m.
Am Wed, 26 Sep 2012 10:56:07 +0200
schrieb Alexander Graf <agraf@suse.de>:

> 
> 
> On 26.09.2012, at 03:18, David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Sep 26, 2012 at 03:03:10AM +0200, Alexander Graf wrote:
[snip]
> >> spapr-nvram:
> >> 
> >>  if (!drive || checksum_is_bad(drive))
> >>    autogenerate_nvram_contents();
> > 
> > Actually, I'm planning for the initialization of the content to be
> > done from the guest firmware.
> 
> Does the guest have all information necessary to construct a workable nvram image? If so, then yes, that's even better.

At least SLOF already contains the code to initialize the PAPR-style
NVRAM partitions from scratch. So as soon as SLOF can work with this
new NVRAM devices, it should be able to initialize the required layout.

 Thomas
Anthony Liguori - Sept. 26, 2012, 8:03 p.m.
Alexander Graf <agraf@suse.de> writes:

> On 22.09.2012, at 15:31, Blue Swirl <blauwirbel@gmail.com> wrote:
>
>> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
>> <david@gibson.dropbear.id.au> wrote:
>>> Below is a patch which implements the (PAPR mandated) NVRAM for the
>>> pseries machine.  It raises a couple of generic questions.
>>> 
>>> First, this adds a new "nvram" machine option which is used to give a
>>> block device id to back the NVRAM so it is persistent.  Since some
>>> sort of NVRAM is quite common, it seems this might be useful on other
>>> machines one day, although obviously nothing else implements it yet.
>> 
>> Yes, there have been discussions earlier since loading NVRAM contents
>> from a file would be useful for many architectures too.
>> 
>>> 
>>> Second, if a block device is not specified, it simply allocates a
>>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
>>> really "NV", but it's enough to make many guests happy most of the
>>> time, and doesn't require setting up an image file and drive.  It does
>>> mean a different set of code paths in the driver though, and it will
>>> need special case handling for savevm (not implemented yet).  Is this
>>> the right approach, or should I be creating a dummy block device for a
>>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
>>> but maybe I'm missing something.
>> 
>> That was the problem earlier too, it looks like a generic way for all
>> NVRAM/flash devices should be obvious but so far nobody has been able
>> to propose something.
>> 
>> What if there are two devices which could use this, for example CMOS
>> and flash on x86?
>> 
>> This should be extending  -device syntax rather than adding another
>> top level option. Something like
>> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
>> spapr-nvram,drive_id=main_nvram
>
> Could we create a simplified syntax for this in addition? Something like
>
>   -device spapr-nvram,file=nvram.raw
>
> which would then automatically spawn a drive for the user. Saving the
> machine state would obviously save the transparently created drive.

We can't ask people to rewrite half of QEMU just to merge a feature.

In this case, what matters is:

0) The device should always be modelled with QOM/qdev

1) If the device is a fundamental part of the machine (i.e. you can't do
   anything useful with out it), then it's configuration should be
   specified as a machine parameter.

2) If !(1), the device should be added with -device

3) Devices deal with backends and only with backends.  We have a syntax
   for specifying backends independently of backends.

If you want a single option to configure a device, that's a problem to
attempt to solve independent of this series.

> But I don't want to force people to have to use -device syntax for the
> average qemu use cases.

Sorry, but that's where we're at today.  -device is part of our user
interface.  It's a management tool only interface and we cannot
replicate every option just because you don't like the syntax.

Regards,

Anthony Liguori
Alexander Graf - Sept. 26, 2012, 8:51 p.m.
On 26.09.2012, at 22:03, Anthony Liguori <anthony@codemonkey.ws> wrote:

> Alexander Graf <agraf@suse.de> writes:
> 
>> On 22.09.2012, at 15:31, Blue Swirl <blauwirbel@gmail.com> wrote:
>> 
>>> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
>>> <david@gibson.dropbear.id.au> wrote:
>>>> Below is a patch which implements the (PAPR mandated) NVRAM for the
>>>> pseries machine.  It raises a couple of generic questions.
>>>> 
>>>> First, this adds a new "nvram" machine option which is used to give a
>>>> block device id to back the NVRAM so it is persistent.  Since some
>>>> sort of NVRAM is quite common, it seems this might be useful on other
>>>> machines one day, although obviously nothing else implements it yet.
>>> 
>>> Yes, there have been discussions earlier since loading NVRAM contents
>>> from a file would be useful for many architectures too.
>>> 
>>>> 
>>>> Second, if a block device is not specified, it simply allocates a
>>>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
>>>> really "NV", but it's enough to make many guests happy most of the
>>>> time, and doesn't require setting up an image file and drive.  It does
>>>> mean a different set of code paths in the driver though, and it will
>>>> need special case handling for savevm (not implemented yet).  Is this
>>>> the right approach, or should I be creating a dummy block device for a
>>>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
>>>> but maybe I'm missing something.
>>> 
>>> That was the problem earlier too, it looks like a generic way for all
>>> NVRAM/flash devices should be obvious but so far nobody has been able
>>> to propose something.
>>> 
>>> What if there are two devices which could use this, for example CMOS
>>> and flash on x86?
>>> 
>>> This should be extending  -device syntax rather than adding another
>>> top level option. Something like
>>> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
>>> spapr-nvram,drive_id=main_nvram
>> 
>> Could we create a simplified syntax for this in addition? Something like
>> 
>>  -device spapr-nvram,file=nvram.raw
>> 
>> which would then automatically spawn a drive for the user. Saving the
>> machine state would obviously save the transparently created drive.
> 
> We can't ask people to rewrite half of QEMU just to merge a feature.

Who is asking anyone to rewrite half of QEMU?

> 
> In this case, what matters is:
> 
> 0) The device should always be modelled with QOM/qdev

Yes

> 
> 1) If the device is a fundamental part of the machine (i.e. you can't do
>   anything useful with out it), then it's configuration should be
>   specified as a machine parameter.

Yes

> 
> 2) If !(1), the device should be added with -device

Yes

> 
> 3) Devices deal with backends and only with backends.  We have a syntax
>   for specifying backends independently of backends.

Yes

and

4) For often occuring use cases, we might want to provide a simplified cmdline syntax

> 
> If you want a single option to configure a device, that's a problem to
> attempt to solve independent of this series.

I never disagreed with that statement. We were merely brainstorming.

> 
>> But I don't want to force people to have to use -device syntax for the
>> average qemu use cases.
> 
> Sorry, but that's where we're at today.  -device is part of our user
> interface.  It's a management tool only interface and we cannot
> replicate every option just because you don't like the syntax.

Sure. And it's good to have. But we should also provide easier syntax for people without mgmnt tools, for use cases that occur often.

From the first xen vs kvm days one main argument about kvm was the difficulty in running it. People took the (overly complex) libvirt execution command line to QEMU and showed it to people. It did indeed scare a few.

So all I'm saying above is that we should not restrict ourselves to -device syntax, if we see a case that happens for more people than usual. However, I'd always try to model it as a shortcut form. So

  -nvram <file>

would just in the cmdline parser be converted to

  -drive file=<file>,if=none,id=nvram -machine nvram=nvram

I hope that makes my point a bit clearer. In fact, I'm quite sure we're in heavy agreement, so I'm not quite sure what you're complaining about :)

Alex

> 
> Regards,
> 
> Anthony Liguori
Blue Swirl - Sept. 29, 2012, 11:46 a.m.
On Wed, Sep 26, 2012 at 8:51 PM, Alexander Graf <agraf@suse.de> wrote:
>
>
> On 26.09.2012, at 22:03, Anthony Liguori <anthony@codemonkey.ws> wrote:
>
>> Alexander Graf <agraf@suse.de> writes:
>>
>>> On 22.09.2012, at 15:31, Blue Swirl <blauwirbel@gmail.com> wrote:
>>>
>>>> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
>>>> <david@gibson.dropbear.id.au> wrote:
>>>>> Below is a patch which implements the (PAPR mandated) NVRAM for the
>>>>> pseries machine.  It raises a couple of generic questions.
>>>>>
>>>>> First, this adds a new "nvram" machine option which is used to give a
>>>>> block device id to back the NVRAM so it is persistent.  Since some
>>>>> sort of NVRAM is quite common, it seems this might be useful on other
>>>>> machines one day, although obviously nothing else implements it yet.
>>>>
>>>> Yes, there have been discussions earlier since loading NVRAM contents
>>>> from a file would be useful for many architectures too.
>>>>
>>>>>
>>>>> Second, if a block device is not specified, it simply allocates a
>>>>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
>>>>> really "NV", but it's enough to make many guests happy most of the
>>>>> time, and doesn't require setting up an image file and drive.  It does
>>>>> mean a different set of code paths in the driver though, and it will
>>>>> need special case handling for savevm (not implemented yet).  Is this
>>>>> the right approach, or should I be creating a dummy block device for a
>>>>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
>>>>> but maybe I'm missing something.
>>>>
>>>> That was the problem earlier too, it looks like a generic way for all
>>>> NVRAM/flash devices should be obvious but so far nobody has been able
>>>> to propose something.
>>>>
>>>> What if there are two devices which could use this, for example CMOS
>>>> and flash on x86?
>>>>
>>>> This should be extending  -device syntax rather than adding another
>>>> top level option. Something like
>>>> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
>>>> spapr-nvram,drive_id=main_nvram
>>>
>>> Could we create a simplified syntax for this in addition? Something like
>>>
>>>  -device spapr-nvram,file=nvram.raw
>>>
>>> which would then automatically spawn a drive for the user. Saving the
>>> machine state would obviously save the transparently created drive.
>>
>> We can't ask people to rewrite half of QEMU just to merge a feature.
>
> Who is asking anyone to rewrite half of QEMU?
>
>>
>> In this case, what matters is:
>>
>> 0) The device should always be modelled with QOM/qdev
>
> Yes
>
>>
>> 1) If the device is a fundamental part of the machine (i.e. you can't do
>>   anything useful with out it), then it's configuration should be
>>   specified as a machine parameter.
>
> Yes
>
>>
>> 2) If !(1), the device should be added with -device
>
> Yes
>
>>
>> 3) Devices deal with backends and only with backends.  We have a syntax
>>   for specifying backends independently of backends.
>
> Yes
>
> and
>
> 4) For often occuring use cases, we might want to provide a simplified cmdline syntax
>
>>
>> If you want a single option to configure a device, that's a problem to
>> attempt to solve independent of this series.
>
> I never disagreed with that statement. We were merely brainstorming.
>
>>
>>> But I don't want to force people to have to use -device syntax for the
>>> average qemu use cases.
>>
>> Sorry, but that's where we're at today.  -device is part of our user
>> interface.  It's a management tool only interface and we cannot
>> replicate every option just because you don't like the syntax.
>
> Sure. And it's good to have. But we should also provide easier syntax for people without mgmnt tools, for use cases that occur often.
>
> From the first xen vs kvm days one main argument about kvm was the difficulty in running it. People took the (overly complex) libvirt execution command line to QEMU and showed it to people. It did indeed scare a few.
>
> So all I'm saying above is that we should not restrict ourselves to -device syntax, if we see a case that happens for more people than usual. However, I'd always try to model it as a shortcut form. So
>
>   -nvram <file>
>
> would just in the cmdline parser be converted to
>
>   -drive file=<file>,if=none,id=nvram -machine nvram=nvram

The problem with this is that it hardcodes the nvram device to one and
only 'nvram'. What about CMOS and flash for x86, which one -nvram
would control?

>
> I hope that makes my point a bit clearer. In fact, I'm quite sure we're in heavy agreement, so I'm not quite sure what you're complaining about :)
>
> Alex
>
>>
>> Regards,
>>
>> Anthony Liguori
Alexander Graf - Sept. 29, 2012, 12:54 p.m.
On 29.09.2012, at 13:46, Blue Swirl <blauwirbel@gmail.com> wrote:

> On Wed, Sep 26, 2012 at 8:51 PM, Alexander Graf <agraf@suse.de> wrote:
>> 
>> 
>> On 26.09.2012, at 22:03, Anthony Liguori <anthony@codemonkey.ws> wrote:
>> 
>>> Alexander Graf <agraf@suse.de> writes:
>>> 
>>>> On 22.09.2012, at 15:31, Blue Swirl <blauwirbel@gmail.com> wrote:
>>>> 
>>>>> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
>>>>> <david@gibson.dropbear.id.au> wrote:
>>>>>> Below is a patch which implements the (PAPR mandated) NVRAM for the
>>>>>> pseries machine.  It raises a couple of generic questions.
>>>>>> 
>>>>>> First, this adds a new "nvram" machine option which is used to give a
>>>>>> block device id to back the NVRAM so it is persistent.  Since some
>>>>>> sort of NVRAM is quite common, it seems this might be useful on other
>>>>>> machines one day, although obviously nothing else implements it yet.
>>>>> 
>>>>> Yes, there have been discussions earlier since loading NVRAM contents
>>>>> from a file would be useful for many architectures too.
>>>>> 
>>>>>> 
>>>>>> Second, if a block device is not specified, it simply allocates a
>>>>>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
>>>>>> really "NV", but it's enough to make many guests happy most of the
>>>>>> time, and doesn't require setting up an image file and drive.  It does
>>>>>> mean a different set of code paths in the driver though, and it will
>>>>>> need special case handling for savevm (not implemented yet).  Is this
>>>>>> the right approach, or should I be creating a dummy block device for a
>>>>>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
>>>>>> but maybe I'm missing something.
>>>>> 
>>>>> That was the problem earlier too, it looks like a generic way for all
>>>>> NVRAM/flash devices should be obvious but so far nobody has been able
>>>>> to propose something.
>>>>> 
>>>>> What if there are two devices which could use this, for example CMOS
>>>>> and flash on x86?
>>>>> 
>>>>> This should be extending  -device syntax rather than adding another
>>>>> top level option. Something like
>>>>> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
>>>>> spapr-nvram,drive_id=main_nvram
>>>> 
>>>> Could we create a simplified syntax for this in addition? Something like
>>>> 
>>>> -device spapr-nvram,file=nvram.raw
>>>> 
>>>> which would then automatically spawn a drive for the user. Saving the
>>>> machine state would obviously save the transparently created drive.
>>> 
>>> We can't ask people to rewrite half of QEMU just to merge a feature.
>> 
>> Who is asking anyone to rewrite half of QEMU?
>> 
>>> 
>>> In this case, what matters is:
>>> 
>>> 0) The device should always be modelled with QOM/qdev
>> 
>> Yes
>> 
>>> 
>>> 1) If the device is a fundamental part of the machine (i.e. you can't do
>>>  anything useful with out it), then it's configuration should be
>>>  specified as a machine parameter.
>> 
>> Yes
>> 
>>> 
>>> 2) If !(1), the device should be added with -device
>> 
>> Yes
>> 
>>> 
>>> 3) Devices deal with backends and only with backends.  We have a syntax
>>>  for specifying backends independently of backends.
>> 
>> Yes
>> 
>> and
>> 
>> 4) For often occuring use cases, we might want to provide a simplified cmdline syntax
>> 
>>> 
>>> If you want a single option to configure a device, that's a problem to
>>> attempt to solve independent of this series.
>> 
>> I never disagreed with that statement. We were merely brainstorming.
>> 
>>> 
>>>> But I don't want to force people to have to use -device syntax for the
>>>> average qemu use cases.
>>> 
>>> Sorry, but that's where we're at today.  -device is part of our user
>>> interface.  It's a management tool only interface and we cannot
>>> replicate every option just because you don't like the syntax.
>> 
>> Sure. And it's good to have. But we should also provide easier syntax for people without mgmnt tools, for use cases that occur often.
>> 
>> From the first xen vs kvm days one main argument about kvm was the difficulty in running it. People took the (overly complex) libvirt execution command line to QEMU and showed it to people. It did indeed scare a few.
>> 
>> So all I'm saying above is that we should not restrict ourselves to -device syntax, if we see a case that happens for more people than usual. However, I'd always try to model it as a shortcut form. So
>> 
>>  -nvram <file>
>> 
>> would just in the cmdline parser be converted to
>> 
>>  -drive file=<file>,if=none,id=nvram -machine nvram=nvram
> 
> The problem with this is that it hardcodes the nvram device to one and
> only 'nvram'. What about CMOS and flash for x86, which one -nvram
> would control?

Then we invent a new option -cmos? These are just ideas. The bit about the machine option is the important one :). Direct cmdline options really should only be shortcuts.


Alex

> 
>> 
>> I hope that makes my point a bit clearer. In fact, I'm quite sure we're in heavy agreement, so I'm not quite sure what you're complaining about :)
>> 
>> Alex
>> 
>>> 
>>> Regards,
>>> 
>>> Anthony Liguori
Blue Swirl - Sept. 29, 2012, 2:11 p.m.
On Sat, Sep 29, 2012 at 12:54 PM, Alexander Graf <agraf@suse.de> wrote:
>
>
> On 29.09.2012, at 13:46, Blue Swirl <blauwirbel@gmail.com> wrote:
>
>> On Wed, Sep 26, 2012 at 8:51 PM, Alexander Graf <agraf@suse.de> wrote:
>>>
>>>
>>> On 26.09.2012, at 22:03, Anthony Liguori <anthony@codemonkey.ws> wrote:
>>>
>>>> Alexander Graf <agraf@suse.de> writes:
>>>>
>>>>> On 22.09.2012, at 15:31, Blue Swirl <blauwirbel@gmail.com> wrote:
>>>>>
>>>>>> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
>>>>>> <david@gibson.dropbear.id.au> wrote:
>>>>>>> Below is a patch which implements the (PAPR mandated) NVRAM for the
>>>>>>> pseries machine.  It raises a couple of generic questions.
>>>>>>>
>>>>>>> First, this adds a new "nvram" machine option which is used to give a
>>>>>>> block device id to back the NVRAM so it is persistent.  Since some
>>>>>>> sort of NVRAM is quite common, it seems this might be useful on other
>>>>>>> machines one day, although obviously nothing else implements it yet.
>>>>>>
>>>>>> Yes, there have been discussions earlier since loading NVRAM contents
>>>>>> from a file would be useful for many architectures too.
>>>>>>
>>>>>>>
>>>>>>> Second, if a block device is not specified, it simply allocates a
>>>>>>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
>>>>>>> really "NV", but it's enough to make many guests happy most of the
>>>>>>> time, and doesn't require setting up an image file and drive.  It does
>>>>>>> mean a different set of code paths in the driver though, and it will
>>>>>>> need special case handling for savevm (not implemented yet).  Is this
>>>>>>> the right approach, or should I be creating a dummy block device for a
>>>>>>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
>>>>>>> but maybe I'm missing something.
>>>>>>
>>>>>> That was the problem earlier too, it looks like a generic way for all
>>>>>> NVRAM/flash devices should be obvious but so far nobody has been able
>>>>>> to propose something.
>>>>>>
>>>>>> What if there are two devices which could use this, for example CMOS
>>>>>> and flash on x86?
>>>>>>
>>>>>> This should be extending  -device syntax rather than adding another
>>>>>> top level option. Something like
>>>>>> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
>>>>>> spapr-nvram,drive_id=main_nvram
>>>>>
>>>>> Could we create a simplified syntax for this in addition? Something like
>>>>>
>>>>> -device spapr-nvram,file=nvram.raw
>>>>>
>>>>> which would then automatically spawn a drive for the user. Saving the
>>>>> machine state would obviously save the transparently created drive.
>>>>
>>>> We can't ask people to rewrite half of QEMU just to merge a feature.
>>>
>>> Who is asking anyone to rewrite half of QEMU?
>>>
>>>>
>>>> In this case, what matters is:
>>>>
>>>> 0) The device should always be modelled with QOM/qdev
>>>
>>> Yes
>>>
>>>>
>>>> 1) If the device is a fundamental part of the machine (i.e. you can't do
>>>>  anything useful with out it), then it's configuration should be
>>>>  specified as a machine parameter.
>>>
>>> Yes
>>>
>>>>
>>>> 2) If !(1), the device should be added with -device
>>>
>>> Yes
>>>
>>>>
>>>> 3) Devices deal with backends and only with backends.  We have a syntax
>>>>  for specifying backends independently of backends.
>>>
>>> Yes
>>>
>>> and
>>>
>>> 4) For often occuring use cases, we might want to provide a simplified cmdline syntax
>>>
>>>>
>>>> If you want a single option to configure a device, that's a problem to
>>>> attempt to solve independent of this series.
>>>
>>> I never disagreed with that statement. We were merely brainstorming.
>>>
>>>>
>>>>> But I don't want to force people to have to use -device syntax for the
>>>>> average qemu use cases.
>>>>
>>>> Sorry, but that's where we're at today.  -device is part of our user
>>>> interface.  It's a management tool only interface and we cannot
>>>> replicate every option just because you don't like the syntax.
>>>
>>> Sure. And it's good to have. But we should also provide easier syntax for people without mgmnt tools, for use cases that occur often.
>>>
>>> From the first xen vs kvm days one main argument about kvm was the difficulty in running it. People took the (overly complex) libvirt execution command line to QEMU and showed it to people. It did indeed scare a few.
>>>
>>> So all I'm saying above is that we should not restrict ourselves to -device syntax, if we see a case that happens for more people than usual. However, I'd always try to model it as a shortcut form. So
>>>
>>>  -nvram <file>
>>>
>>> would just in the cmdline parser be converted to
>>>
>>>  -drive file=<file>,if=none,id=nvram -machine nvram=nvram
>>
>> The problem with this is that it hardcodes the nvram device to one and
>> only 'nvram'. What about CMOS and flash for x86, which one -nvram
>> would control?
>
> Then we invent a new option -cmos? These are just ideas. The bit about the machine option is the important one :). Direct cmdline options really should only be shortcuts.

Again, -flash, -cmos, -nvram? And then the same for the machine,
-machine nvram=foo,cmos=bar,flash=zerg?

>
>
> Alex
>
>>
>>>
>>> I hope that makes my point a bit clearer. In fact, I'm quite sure we're in heavy agreement, so I'm not quite sure what you're complaining about :)
>>>
>>> Alex
>>>
>>>>
>>>> Regards,
>>>>
>>>> Anthony Liguori
Alexander Graf - Sept. 29, 2012, 3:24 p.m.
On 29.09.2012, at 16:11, Blue Swirl <blauwirbel@gmail.com> wrote:

> On Sat, Sep 29, 2012 at 12:54 PM, Alexander Graf <agraf@suse.de> wrote:
>> 
>> 
>> On 29.09.2012, at 13:46, Blue Swirl <blauwirbel@gmail.com> wrote:
>> 
>>> On Wed, Sep 26, 2012 at 8:51 PM, Alexander Graf <agraf@suse.de> wrote:
>>>> 
>>>> 
>>>> On 26.09.2012, at 22:03, Anthony Liguori <anthony@codemonkey.ws> wrote:
>>>> 
>>>>> Alexander Graf <agraf@suse.de> writes:
>>>>> 
>>>>>> On 22.09.2012, at 15:31, Blue Swirl <blauwirbel@gmail.com> wrote:
>>>>>> 
>>>>>>> On Fri, Sep 21, 2012 at 3:08 AM, David Gibson
>>>>>>> <david@gibson.dropbear.id.au> wrote:
>>>>>>>> Below is a patch which implements the (PAPR mandated) NVRAM for the
>>>>>>>> pseries machine.  It raises a couple of generic questions.
>>>>>>>> 
>>>>>>>> First, this adds a new "nvram" machine option which is used to give a
>>>>>>>> block device id to back the NVRAM so it is persistent.  Since some
>>>>>>>> sort of NVRAM is quite common, it seems this might be useful on other
>>>>>>>> machines one day, although obviously nothing else implements it yet.
>>>>>>> 
>>>>>>> Yes, there have been discussions earlier since loading NVRAM contents
>>>>>>> from a file would be useful for many architectures too.
>>>>>>> 
>>>>>>>> 
>>>>>>>> Second, if a block device is not specified, it simply allocates a
>>>>>>>> block of memory to make a non-persistent NVRAM.  Obviously that isn't
>>>>>>>> really "NV", but it's enough to make many guests happy most of the
>>>>>>>> time, and doesn't require setting up an image file and drive.  It does
>>>>>>>> mean a different set of code paths in the driver though, and it will
>>>>>>>> need special case handling for savevm (not implemented yet).  Is this
>>>>>>>> the right approach, or should I be creating a dummy block device for a
>>>>>>>> one-run NVRAM of this kind?  I couldn't see an obvious way to do that,
>>>>>>>> but maybe I'm missing something.
>>>>>>> 
>>>>>>> That was the problem earlier too, it looks like a generic way for all
>>>>>>> NVRAM/flash devices should be obvious but so far nobody has been able
>>>>>>> to propose something.
>>>>>>> 
>>>>>>> What if there are two devices which could use this, for example CMOS
>>>>>>> and flash on x86?
>>>>>>> 
>>>>>>> This should be extending  -device syntax rather than adding another
>>>>>>> top level option. Something like
>>>>>>> -drive foo,file=nvram.qcow2,format=qcow2,id=main_nvram -device
>>>>>>> spapr-nvram,drive_id=main_nvram
>>>>>> 
>>>>>> Could we create a simplified syntax for this in addition? Something like
>>>>>> 
>>>>>> -device spapr-nvram,file=nvram.raw
>>>>>> 
>>>>>> which would then automatically spawn a drive for the user. Saving the
>>>>>> machine state would obviously save the transparently created drive.
>>>>> 
>>>>> We can't ask people to rewrite half of QEMU just to merge a feature.
>>>> 
>>>> Who is asking anyone to rewrite half of QEMU?
>>>> 
>>>>> 
>>>>> In this case, what matters is:
>>>>> 
>>>>> 0) The device should always be modelled with QOM/qdev
>>>> 
>>>> Yes
>>>> 
>>>>> 
>>>>> 1) If the device is a fundamental part of the machine (i.e. you can't do
>>>>> anything useful with out it), then it's configuration should be
>>>>> specified as a machine parameter.
>>>> 
>>>> Yes
>>>> 
>>>>> 
>>>>> 2) If !(1), the device should be added with -device
>>>> 
>>>> Yes
>>>> 
>>>>> 
>>>>> 3) Devices deal with backends and only with backends.  We have a syntax
>>>>> for specifying backends independently of backends.
>>>> 
>>>> Yes
>>>> 
>>>> and
>>>> 
>>>> 4) For often occuring use cases, we might want to provide a simplified cmdline syntax
>>>> 
>>>>> 
>>>>> If you want a single option to configure a device, that's a problem to
>>>>> attempt to solve independent of this series.
>>>> 
>>>> I never disagreed with that statement. We were merely brainstorming.
>>>> 
>>>>> 
>>>>>> But I don't want to force people to have to use -device syntax for the
>>>>>> average qemu use cases.
>>>>> 
>>>>> Sorry, but that's where we're at today.  -device is part of our user
>>>>> interface.  It's a management tool only interface and we cannot
>>>>> replicate every option just because you don't like the syntax.
>>>> 
>>>> Sure. And it's good to have. But we should also provide easier syntax for people without mgmnt tools, for use cases that occur often.
>>>> 
>>>> From the first xen vs kvm days one main argument about kvm was the difficulty in running it. People took the (overly complex) libvirt execution command line to QEMU and showed it to people. It did indeed scare a few.
>>>> 
>>>> So all I'm saying above is that we should not restrict ourselves to -device syntax, if we see a case that happens for more people than usual. However, I'd always try to model it as a shortcut form. So
>>>> 
>>>> -nvram <file>
>>>> 
>>>> would just in the cmdline parser be converted to
>>>> 
>>>> -drive file=<file>,if=none,id=nvram -machine nvram=nvram
>>> 
>>> The problem with this is that it hardcodes the nvram device to one and
>>> only 'nvram'. What about CMOS and flash for x86, which one -nvram
>>> would control?
>> 
>> Then we invent a new option -cmos? These are just ideas. The bit about the machine option is the important one :). Direct cmdline options really should only be shortcuts.
> 
> Again, -flash, -cmos, -nvram? And then the same for the machine,
> -machine nvram=foo,cmos=bar,flash=zerg?

Yup ;). Though I would probably call flash 'bios', since that is essentially what -bios does today.

Alex

> 
>> 
>> 
>> Alex
>> 
>>> 
>>>> 
>>>> I hope that makes my point a bit clearer. In fact, I'm quite sure we're in heavy agreement, so I'm not quite sure what you're complaining about :)
>>>> 
>>>> Alex
>>>> 
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Anthony Liguori

Patch

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 951e407..91cbe8c 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -11,6 +11,7 @@  obj-y += ppc_newworld.o
 obj-$(CONFIG_PSERIES) += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
 obj-$(CONFIG_PSERIES) += xics.o spapr_vty.o spapr_llan.o spapr_vscsi.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o pci-hotplug.o spapr_iommu.o
+obj-$(CONFIG_PSERIES) += spapr_nvram.o
 # PowerPC 4xx boards
 obj-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 obj-y += ppc440_bamboo.o
diff --git a/hw/spapr.c b/hw/spapr.c
index a8bd3c1..079825a 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -804,6 +804,9 @@  static void ppc_spapr_init(ram_addr_t ram_size,
         }
     }
 
+    /* We always have at least the nvram device on VIO */
+    spapr_create_nvram(spapr);
+
     /* Set up PCI */
     spapr_pci_rtas_init();
 
diff --git a/hw/spapr.h b/hw/spapr.h
index e984e3f..d9c3b4a 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -6,11 +6,13 @@ 
 
 struct VIOsPAPRBus;
 struct sPAPRPHBState;
+struct sPAPRNVRAM;
 struct icp_state;
 
 typedef struct sPAPREnvironment {
     struct VIOsPAPRBus *vio_bus;
     QLIST_HEAD(, sPAPRPHBState) phbs;
+    struct sPAPRNVRAM *nvram;
     struct icp_state *icp;
 
     target_phys_addr_t ram_limit;
@@ -336,6 +338,7 @@  typedef struct sPAPRTCE {
 #define SPAPR_PCI_BASE_LIOBN    0x80000000
 
 void spapr_iommu_init(void);
+void spapr_create_nvram(sPAPREnvironment *spapr);
 DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
 void spapr_tce_free(DMAContext *dma);
 void spapr_tce_reset(DMAContext *dma);
diff --git a/hw/spapr_nvram.c b/hw/spapr_nvram.c
new file mode 100644
index 0000000..8cd8a53
--- /dev/null
+++ b/hw/spapr_nvram.c
@@ -0,0 +1,225 @@ 
+/*
+ * QEMU sPAPR NVRAM emulation
+ *
+ * Copyright (C) 2012 David Gibson, IBM Corporation.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include <sys/mman.h>
+#include <libfdt.h>
+
+#include "device_tree.h"
+#include "hw/sysbus.h"
+#include "hw/spapr.h"
+#include "hw/spapr_vio.h"
+
+typedef struct sPAPRNVRAM {
+    VIOsPAPRDevice sdev;
+    uint32_t size;
+    uint8_t *buf;
+    BlockDriverState *drive;
+} sPAPRNVRAM;
+
+#define MIN_NVRAM_SIZE 8192
+#define DEFAULT_NVRAM_SIZE 16384
+#define MAX_NVRAM_SIZE (UINT16_MAX * 16)
+
+static void rtas_nvram_fetch(sPAPREnvironment *spapr,
+                             uint32_t token, uint32_t nargs,
+                             target_ulong args,
+                             uint32_t nret, target_ulong rets)
+{
+    sPAPRNVRAM *nvram = spapr->nvram;
+    target_phys_addr_t offset, buffer, len;
+    int alen;
+    void *membuf;
+
+    if ((nargs != 3) || (nret != 2)) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    if (!nvram) {
+        rtas_st(rets, 0, -1);
+        rtas_st(rets, 1, 0);
+        return;
+    }
+
+    offset = rtas_ld(args, 0);
+    buffer = rtas_ld(args, 1);
+    len = rtas_ld(args, 2);
+
+    if (((offset + len) < offset)
+        || ((offset + len) > nvram->size)) {
+        rtas_st(rets, 0, -3);
+        rtas_st(rets, 1, 0);
+        return;
+    }
+
+    membuf = cpu_physical_memory_map(buffer, &len, 1);
+    if (nvram->drive) {
+        alen = bdrv_pread(nvram->drive, offset, membuf, len);
+    } else {
+        assert(nvram->buf);
+
+        memcpy(membuf, nvram->buf + offset, len);
+        alen = len;
+    }
+    cpu_physical_memory_unmap(membuf, len, 1, len);
+
+    rtas_st(rets, 0, (alen < len) ? -1 : 0);
+    rtas_st(rets, 1, (alen < 0) ? 0 : alen);
+}
+
+static void rtas_nvram_store(sPAPREnvironment *spapr,
+                             uint32_t token, uint32_t nargs,
+                             target_ulong args,
+                             uint32_t nret, target_ulong rets)
+{
+    sPAPRNVRAM *nvram = spapr->nvram;
+    target_phys_addr_t offset, buffer, len;
+    int alen;
+    void *membuf;
+
+    if ((nargs != 3) || (nret != 2)) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    if (!nvram) {
+        rtas_st(rets, 0, -1);
+        return;
+    }
+
+    offset = rtas_ld(args, 0);
+    buffer = rtas_ld(args, 1);
+    len = rtas_ld(args, 2);
+
+    if (((offset + len) < offset)
+        || ((offset + len) > nvram->size)) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    membuf = cpu_physical_memory_map(buffer, &len, 0);
+    if (nvram->drive) {
+        alen = bdrv_pwrite(nvram->drive, offset, membuf, len);
+    } else {
+        assert(nvram->buf);
+
+        memcpy(nvram->buf + offset, membuf, len);
+        alen = len;
+    }
+    cpu_physical_memory_unmap(membuf, len, 0, len);
+
+    rtas_st(rets, 0, (alen < len) ? -1 : 0);
+    rtas_st(rets, 1, (alen < 0) ? 0 : alen);
+}
+
+static int spapr_nvram_init(VIOsPAPRDevice *dev)
+{
+    sPAPRNVRAM *nvram = (sPAPRNVRAM *)dev;
+
+    if (nvram->drive) {
+        nvram->size = bdrv_getlength(nvram->drive);
+    } else {
+        nvram->size = DEFAULT_NVRAM_SIZE;
+        nvram->buf = g_malloc0(nvram->size);
+    }
+
+    if ((nvram->size < MIN_NVRAM_SIZE) || (nvram->size > MAX_NVRAM_SIZE)) {
+        fprintf(stderr, "spapr-nvram must be between %d and %d bytes in size\n",
+                MIN_NVRAM_SIZE, MAX_NVRAM_SIZE);
+        return -1;
+    }
+
+    return 0;
+}
+
+static int spapr_nvram_devnode(VIOsPAPRDevice *dev, void *fdt, int node_off)
+{
+    sPAPRNVRAM *nvram = (sPAPRNVRAM *)dev;
+
+    return fdt_setprop_cell(fdt, node_off, "#bytes", nvram->size);
+}
+
+static Property spapr_nvram_properties[] = {
+    DEFINE_SPAPR_PROPERTIES(sPAPRNVRAM, sdev),
+    DEFINE_PROP_DRIVE("drive", sPAPRNVRAM, drive),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void spapr_nvram_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VIOsPAPRDeviceClass *k = VIO_SPAPR_DEVICE_CLASS(klass);
+
+    k->init = spapr_nvram_init;
+    k->devnode = spapr_nvram_devnode;
+    k->dt_name = "nvram";
+    k->dt_type = "nvram";
+    k->dt_compatible = "qemu,spapr-nvram";
+    dc->props = spapr_nvram_properties;
+}
+
+static const TypeInfo spapr_nvram_type_info = {
+    .name          = "spapr-nvram",
+    .parent        = TYPE_VIO_SPAPR_DEVICE,
+    .instance_size = sizeof(sPAPRNVRAM),
+    .class_init    = spapr_nvram_class_init,
+};
+
+static void spapr_nvram_register_types(void)
+{
+    type_register_static(&spapr_nvram_type_info);
+}
+
+type_init(spapr_nvram_register_types)
+
+void spapr_create_nvram(sPAPREnvironment *spapr)
+{
+    QemuOpts *machine_opts;
+    DeviceState *dev;
+
+    dev = qdev_create(&spapr->vio_bus->bus, "spapr-nvram");
+
+    machine_opts = qemu_opts_find(qemu_find_opts("machine"), 0);
+    if (machine_opts) {
+        const char *drivename;
+
+        drivename = qemu_opt_get(machine_opts, "nvram");
+        if (drivename) {
+            BlockDriverState *bs;
+
+            bs = bdrv_find(drivename);
+            if (!bs) {
+                fprintf(stderr, "No such block device \"%s\" for nvram\n",
+                        drivename);
+                exit(1);
+            }
+            qdev_prop_set_drive_nofail(dev, "drive", bs);
+        }
+    }
+
+    qdev_init_nofail(dev);
+
+    spapr->nvram = (sPAPRNVRAM *)dev;
+    spapr_rtas_register("nvram-fetch", rtas_nvram_fetch);
+    spapr_rtas_register("nvram-store", rtas_nvram_store);
+}
diff --git a/qemu-config.c b/qemu-config.c
index 12eafbb..1cd9a1b 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -619,6 +619,10 @@  static QemuOptsList qemu_machine_opts = {
             .name = "mem-merge",
             .type = QEMU_OPT_BOOL,
             .help = "enable/disable memory merge support",
+        }, {
+            .name = "nvram",
+            .type = QEMU_OPT_STRING,
+            .help = "Drive backing persistent NVRAM",
         },
         { /* End of list */ }
     },