diff mbox

[1/1] ubi: Introduce block devices for UBI volumes

Message ID 1391027881-8354-2-git-send-email-ezequiel.garcia@free-electrons.com
State Superseded
Headers show

Commit Message

Ezequiel Garcia Jan. 29, 2014, 8:38 p.m. UTC
Block device emulation on top of ubi volumes with cached read/write support.
Both the cached access and the write support are selectable at compile time.

Given UBI takes care of wear leveling and bad block management it's possible
to add a thin layer to enable block device access to UBI volumes.
This allows to use a block-oriented filesystem on a flash device.

In a similar fashion to mtdblock, a 1-LEB size cache has been
implemented. However, very memory-constrained systems can choose to
disable the cache and save the 1-LEB byte allocation.

If write support is enabled, the flash device will be written when the cache
is flushed. The following events trigger this:
* block device release (detach)
* different than cached leb is accessed
* io-barrier is received through a REQ_FLUSH request

Despite this efforts, it's very important to remember that regular
block-oriented filesystems have no care at all about wear leveling;
they will access the block device randomly, only caring for performance.
Therefore, write support should be selected only for development and
with extreme caution.

The cache is 1-LEB bytes, vmalloced at open() and freed at release();
in addition, each block device has a workqueue associated.

Block devices are created upon user request through new ioctls:
UBI_IOCVOLATTBLK to attach and UBI_IOCVOLDETBLK to detach.
Also, a new UBI module parameter is added 'ubi.block'. This parameter is
needed in order to attach a block device on boot-up time, allowing to
mount the rootfs on a ubiblock device.
For instance, you could have these kernel parameters:

  ubi.mtd=5 ubi.block=0,0 root=/dev/ubiblock0_0

Or, if you compile ubi as a module:

  $ modprobe ubi mtd=/dev/mtd5 block=/dev/ubi0_0

Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Cc: Tim Bird <tim.bird@am.sony.com>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
---
 drivers/mtd/ubi/Kconfig     |  42 +++
 drivers/mtd/ubi/Makefile    |   1 +
 drivers/mtd/ubi/block.c     | 899 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/mtd/ubi/build.c     |   5 +
 drivers/mtd/ubi/cdev.c      |  20 +
 drivers/mtd/ubi/ubi.h       |  14 +
 include/uapi/mtd/ubi-user.h |  11 +
 7 files changed, 992 insertions(+)
 create mode 100644 drivers/mtd/ubi/block.c

Comments

Willy Tarreau Jan. 31, 2014, 5:06 p.m. UTC | #1
On Wed, Jan 29, 2014 at 05:38:01PM -0300, Ezequiel Garcia wrote:
> Block device emulation on top of ubi volumes with cached read/write support.
> Both the cached access and the write support are selectable at compile time.
> 
> Given UBI takes care of wear leveling and bad block management it's possible
> to add a thin layer to enable block device access to UBI volumes.
> This allows to use a block-oriented filesystem on a flash device.
> 
> In a similar fashion to mtdblock, a 1-LEB size cache has been
> implemented. However, very memory-constrained systems can choose to
> disable the cache and save the 1-LEB byte allocation.
> 
> If write support is enabled, the flash device will be written when the cache
> is flushed. The following events trigger this:
> * block device release (detach)
> * different than cached leb is accessed
> * io-barrier is received through a REQ_FLUSH request
> 
> Despite this efforts, it's very important to remember that regular
> block-oriented filesystems have no care at all about wear leveling;
> they will access the block device randomly, only caring for performance.
> Therefore, write support should be selected only for development and
> with extreme caution.
> 
> The cache is 1-LEB bytes, vmalloced at open() and freed at release();
> in addition, each block device has a workqueue associated.
> 
> Block devices are created upon user request through new ioctls:
> UBI_IOCVOLATTBLK to attach and UBI_IOCVOLDETBLK to detach.
> Also, a new UBI module parameter is added 'ubi.block'. This parameter is
> needed in order to attach a block device on boot-up time, allowing to
> mount the rootfs on a ubiblock device.
> For instance, you could have these kernel parameters:
> 
>   ubi.mtd=5 ubi.block=0,0 root=/dev/ubiblock0_0
> 
> Or, if you compile ubi as a module:
> 
>   $ modprobe ubi mtd=/dev/mtd5 block=/dev/ubi0_0
> 
> Cc: Artem Bityutskiy <dedekind1@gmail.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Brian Norris <computersforpeace@gmail.com>
> Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
> Cc: Tim Bird <tim.bird@am.sony.com>
> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> Cc: Mike Frysinger <vapier@gentoo.org>
> Cc: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
> Cc: Willy Tarreau <w@1wt.eu>
> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>

Works pretty well here on 3.14-git. I've also tested write support
with success. I'm booting from a squashfs installed on top of it.
I find that the combination of squashfs + ubiblock is really good
for a rootfs. It's fast, space-efficient in terms of RAM and NAND,
and made reliable by the ubi layer.

Concerning the question about the usage of write support, I find it
useful to perform rootfs upgrades from Linux.

Feel free to add :

   Tested-By: Willy Tarreau <w@1wt.eu>

Best regards,
Willy
Ezequiel Garcia Feb. 4, 2014, 11:06 a.m. UTC | #2
On Fri, Jan 31, 2014 at 06:06:37PM +0100, Willy Tarreau wrote:
> On Wed, Jan 29, 2014 at 05:38:01PM -0300, Ezequiel Garcia wrote:
> > Block device emulation on top of ubi volumes with cached read/write support.
> > Both the cached access and the write support are selectable at compile time.
> > 
> > Given UBI takes care of wear leveling and bad block management it's possible
> > to add a thin layer to enable block device access to UBI volumes.
> > This allows to use a block-oriented filesystem on a flash device.
> > 
> > In a similar fashion to mtdblock, a 1-LEB size cache has been
> > implemented. However, very memory-constrained systems can choose to
> > disable the cache and save the 1-LEB byte allocation.
> > 
> > If write support is enabled, the flash device will be written when the cache
> > is flushed. The following events trigger this:
> > * block device release (detach)
> > * different than cached leb is accessed
> > * io-barrier is received through a REQ_FLUSH request
> > 
> > Despite this efforts, it's very important to remember that regular
> > block-oriented filesystems have no care at all about wear leveling;
> > they will access the block device randomly, only caring for performance.
> > Therefore, write support should be selected only for development and
> > with extreme caution.
> > 
> > The cache is 1-LEB bytes, vmalloced at open() and freed at release();
> > in addition, each block device has a workqueue associated.
> > 
> > Block devices are created upon user request through new ioctls:
> > UBI_IOCVOLATTBLK to attach and UBI_IOCVOLDETBLK to detach.
> > Also, a new UBI module parameter is added 'ubi.block'. This parameter is
> > needed in order to attach a block device on boot-up time, allowing to
> > mount the rootfs on a ubiblock device.
> > For instance, you could have these kernel parameters:
> > 
> >   ubi.mtd=5 ubi.block=0,0 root=/dev/ubiblock0_0
> > 
> > Or, if you compile ubi as a module:
> > 
> >   $ modprobe ubi mtd=/dev/mtd5 block=/dev/ubi0_0
> > 
> > Cc: Artem Bityutskiy <dedekind1@gmail.com>
> > Cc: David Woodhouse <dwmw2@infradead.org>
> > Cc: Brian Norris <computersforpeace@gmail.com>
> > Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
> > Cc: Tim Bird <tim.bird@am.sony.com>
> > Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> > Cc: Mike Frysinger <vapier@gentoo.org>
> > Cc: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
> > Cc: Willy Tarreau <w@1wt.eu>
> > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> 
> Works pretty well here on 3.14-git. I've also tested write support
> with success. I'm booting from a squashfs installed on top of it.
> I find that the combination of squashfs + ubiblock is really good
> for a rootfs. It's fast, space-efficient in terms of RAM and NAND,
> and made reliable by the ubi layer.
> 
> Concerning the question about the usage of write support, I find it
> useful to perform rootfs upgrades from Linux.
> 
> Feel free to add :
> 
>    Tested-By: Willy Tarreau <w@1wt.eu>
> 

Thanks Willy!

Artem: If at all possible, I'd like to avoid stalling, so feel free to comment
on anything...
Ezequiel Garcia Feb. 8, 2014, 4:50 p.m. UTC | #3
On Tue, Feb 04, 2014 at 08:06:43AM -0300, Ezequiel Garcia wrote:
> On Fri, Jan 31, 2014 at 06:06:37PM +0100, Willy Tarreau wrote:
> > On Wed, Jan 29, 2014 at 05:38:01PM -0300, Ezequiel Garcia wrote:
> > > Block device emulation on top of ubi volumes with cached read/write support.
> > > Both the cached access and the write support are selectable at compile time.
> > > 
> > > Given UBI takes care of wear leveling and bad block management it's possible
> > > to add a thin layer to enable block device access to UBI volumes.
> > > This allows to use a block-oriented filesystem on a flash device.
> > > 
> > > In a similar fashion to mtdblock, a 1-LEB size cache has been
> > > implemented. However, very memory-constrained systems can choose to
> > > disable the cache and save the 1-LEB byte allocation.
> > > 
> > > If write support is enabled, the flash device will be written when the cache
> > > is flushed. The following events trigger this:
> > > * block device release (detach)
> > > * different than cached leb is accessed
> > > * io-barrier is received through a REQ_FLUSH request
> > > 
> > > Despite this efforts, it's very important to remember that regular
> > > block-oriented filesystems have no care at all about wear leveling;
> > > they will access the block device randomly, only caring for performance.
> > > Therefore, write support should be selected only for development and
> > > with extreme caution.
> > > 
> > > The cache is 1-LEB bytes, vmalloced at open() and freed at release();
> > > in addition, each block device has a workqueue associated.
> > > 
> > > Block devices are created upon user request through new ioctls:
> > > UBI_IOCVOLATTBLK to attach and UBI_IOCVOLDETBLK to detach.
> > > Also, a new UBI module parameter is added 'ubi.block'. This parameter is
> > > needed in order to attach a block device on boot-up time, allowing to
> > > mount the rootfs on a ubiblock device.
> > > For instance, you could have these kernel parameters:
> > > 
> > >   ubi.mtd=5 ubi.block=0,0 root=/dev/ubiblock0_0
> > > 
> > > Or, if you compile ubi as a module:
> > > 
> > >   $ modprobe ubi mtd=/dev/mtd5 block=/dev/ubi0_0
> > > 
> > > Cc: Artem Bityutskiy <dedekind1@gmail.com>
> > > Cc: David Woodhouse <dwmw2@infradead.org>
> > > Cc: Brian Norris <computersforpeace@gmail.com>
> > > Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
> > > Cc: Tim Bird <tim.bird@am.sony.com>
> > > Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> > > Cc: Mike Frysinger <vapier@gentoo.org>
> > > Cc: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
> > > Cc: Willy Tarreau <w@1wt.eu>
> > > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> > 
> > Works pretty well here on 3.14-git. I've also tested write support
> > with success. I'm booting from a squashfs installed on top of it.
> > I find that the combination of squashfs + ubiblock is really good
> > for a rootfs. It's fast, space-efficient in terms of RAM and NAND,
> > and made reliable by the ubi layer.
> > 
> > Concerning the question about the usage of write support, I find it
> > useful to perform rootfs upgrades from Linux.
> > 
> > Feel free to add :
> > 
> >    Tested-By: Willy Tarreau <w@1wt.eu>
> > 
> 
> Thanks Willy!
> 
> Artem: If at all possible, I'd like to avoid stalling, so feel free to comment
> on anything...

Ping?
Richard Weinberger Feb. 8, 2014, 9:37 p.m. UTC | #4
On Wed, Jan 29, 2014 at 9:38 PM, Ezequiel Garcia
<ezequiel.garcia@free-electrons.com> wrote:
> Block device emulation on top of ubi volumes with cached read/write support.
> Both the cached access and the write support are selectable at compile time.
>
> Given UBI takes care of wear leveling and bad block management it's possible
> to add a thin layer to enable block device access to UBI volumes.
> This allows to use a block-oriented filesystem on a flash device.
>
> In a similar fashion to mtdblock, a 1-LEB size cache has been
> implemented. However, very memory-constrained systems can choose to
> disable the cache and save the 1-LEB byte allocation.
>
> If write support is enabled, the flash device will be written when the cache
> is flushed. The following events trigger this:
> * block device release (detach)
> * different than cached leb is accessed
> * io-barrier is received through a REQ_FLUSH request
>
> Despite this efforts, it's very important to remember that regular
> block-oriented filesystems have no care at all about wear leveling;
> they will access the block device randomly, only caring for performance.
> Therefore, write support should be selected only for development and
> with extreme caution.
>
> The cache is 1-LEB bytes, vmalloced at open() and freed at release();
> in addition, each block device has a workqueue associated.
>
> Block devices are created upon user request through new ioctls:
> UBI_IOCVOLATTBLK to attach and UBI_IOCVOLDETBLK to detach.
> Also, a new UBI module parameter is added 'ubi.block'. This parameter is
> needed in order to attach a block device on boot-up time, allowing to
> mount the rootfs on a ubiblock device.
> For instance, you could have these kernel parameters:
>
>   ubi.mtd=5 ubi.block=0,0 root=/dev/ubiblock0_0
>
> Or, if you compile ubi as a module:
>
>   $ modprobe ubi mtd=/dev/mtd5 block=/dev/ubi0_0
>
> Cc: Artem Bityutskiy <dedekind1@gmail.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Brian Norris <computersforpeace@gmail.com>
> Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
> Cc: Tim Bird <tim.bird@am.sony.com>
> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> Cc: Mike Frysinger <vapier@gentoo.org>
> Cc: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
> Cc: Willy Tarreau <w@1wt.eu>
> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> ---
>  drivers/mtd/ubi/Kconfig     |  42 +++
>  drivers/mtd/ubi/Makefile    |   1 +
>  drivers/mtd/ubi/block.c     | 899 ++++++++++++++++++++++++++++++++++++++++++++
>  drivers/mtd/ubi/build.c     |   5 +
>  drivers/mtd/ubi/cdev.c      |  20 +
>  drivers/mtd/ubi/ubi.h       |  14 +
>  include/uapi/mtd/ubi-user.h |  11 +
>  7 files changed, 992 insertions(+)
>  create mode 100644 drivers/mtd/ubi/block.c
>
> diff --git a/drivers/mtd/ubi/Kconfig b/drivers/mtd/ubi/Kconfig
> index 36663af..2893775 100644
> --- a/drivers/mtd/ubi/Kconfig
> +++ b/drivers/mtd/ubi/Kconfig
> @@ -87,4 +87,46 @@ config MTD_UBI_GLUEBI
>            work on top of UBI. Do not enable this unless you use legacy
>            software.
>
> +config MTD_UBI_BLOCK
> +       bool "Block device access to UBI volumes"
> +       default n
> +       help
> +          Since UBI already takes care of eraseblock wear leveling
> +          and bad block handling, it's possible to implement a block
> +          device on top of it and therefore mount regular filesystems
> +          (i.e. not flash-oriented, as ext4).
> +          In other words, this is a software flash translation layer.
> +
> +          This can be particularly interesting to allow mounting a read-only
> +          filesystem, such as squashfs, on a NAND device.
> +
> +          When selected, this feature will be built-in the ubi module
> +          and block devices will be able to be created and removed using
> +          the userspace 'ubiblkvol' tool (provided mtd-utils).
> +
> +          If in doubt, say "N".
> +
> +config MTD_UBI_BLOCK_CACHED
> +       bool "Enable cached UBI block access"
> +       default y
> +       depends on MTD_UBI_BLOCK
> +       help
> +          In order to reduce flash device access, this option enables a 1-LEB
> +          sized cache. For read-only access this can be an overkill, given
> +          filesystems most likely implement their own caching policies.
> +          Moreover, since a LEB can be as large as ~1 MiB, memory-constrained
> +          platforms can choose to disable this.
> +
> +config MTD_UBI_BLOCK_WRITE_SUPPORT
> +       bool "Enable write support (DANGEROUS)"
> +       default n
> +       depends on MTD_UBI_BLOCK
> +       select MTD_UBI_BLOCK_CACHED
> +       help
> +          This is a *very* dangerous feature. Using a regular block-oriented
> +          filesystem might impact heavily on a flash device wear.
> +          Use with extreme caution.
> +
> +          If in doubt, say "N".

I really vote for dropping write support at all.

>  endif # MTD_UBI
> diff --git a/drivers/mtd/ubi/Makefile b/drivers/mtd/ubi/Makefile
> index b46b0c97..4e3c3d7 100644
> --- a/drivers/mtd/ubi/Makefile
> +++ b/drivers/mtd/ubi/Makefile
> @@ -3,5 +3,6 @@ obj-$(CONFIG_MTD_UBI) += ubi.o
>  ubi-y += vtbl.o vmt.o upd.o build.o cdev.o kapi.o eba.o io.o wl.o attach.o
>  ubi-y += misc.o debug.o
>  ubi-$(CONFIG_MTD_UBI_FASTMAP) += fastmap.o
> +ubi-$(CONFIG_MTD_UBI_BLOCK) += block.o
>
>  obj-$(CONFIG_MTD_UBI_GLUEBI) += gluebi.o
> diff --git a/drivers/mtd/ubi/block.c b/drivers/mtd/ubi/block.c
> new file mode 100644
> index 0000000..6a7dc00
> --- /dev/null
> +++ b/drivers/mtd/ubi/block.c
> @@ -0,0 +1,899 @@
> +/*
> + * Copyright (c) 2014 Ezequiel Garcia
> + * Copyright (c) 2011 Free Electrons
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, version 2.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
> + * the GNU General Public License for more details.
> + *
> + * A block implementation on UBI volume
> + * ------------------------------------
> + *
> + * How to use this? A bunch of examples worth a thousand words:
> + *
> + * If you want to attach a block device on bootup time, e.g. in order
> + * to mount the rootfs on such a block device:
> + *
> + * Using the UBI volume path:
> + *  ubi.block=/dev/ubi0_0
> + *
> + * Using the UBI device, and the volume name:
> + *  ubi.block=0,rootfs
> + *
> + * Using both UBI device number and UBI volume number:
> + *  ubi.block=0,0
> + *
> + * In this case you would have such kernel parameters:
> + *  ubi.mtd=5 ubi.block=0,0 root=/dev/ubiblock0_0
> + *
> + * Of course, if you compile ubi as a module you can use this
> + * parameter on module insertion time:
> + *  modprobe ubi mtd=/dev/mtd5 block=/dev/ubi0_0
> + *
> + * For runtime block attaching/detaching, see mtd-utils' ubiblkvol tool.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/err.h>
> +#include <linux/kernel.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mtd/ubi.h>
> +#include <linux/workqueue.h>
> +#include <linux/blkdev.h>
> +#include <linux/hdreg.h>
> +
> +#include "ubi-media.h"
> +#include "ubi.h"
> +
> +/* Maximum number of supported devices */
> +#define UBIBLOCK_MAX_DEVICES 32
> +
> +/* Maximum length of the 'block=' parameter */
> +#define UBIBLOCK_PARAM_LEN 63
> +
> +/* Maximum number of comma-separated items in the 'block=' parameter */
> +#define UBIBLOCK_PARAM_COUNT 2
> +
> +struct ubiblock_param {
> +       int ubi_num;
> +       int vol_id;
> +       char name[UBIBLOCK_PARAM_LEN+1];
> +};
> +
> +/* Numbers of elements set in the @ubiblock_param array */
> +static int ubiblock_devs __initdata;
> +
> +/* MTD devices specification parameters */
> +static struct ubiblock_param ubiblock_param[UBIBLOCK_MAX_DEVICES] __initdata;
> +
> +struct ubiblock_cache {
> +       char *buffer;
> +       enum { STATE_EMPTY, STATE_CLEAN, STATE_DIRTY } state;
> +       int leb_num;
> +};
> +
> +struct ubiblock {
> +       struct ubi_volume_desc *desc;
> +       struct ubi_volume_info *vi;
> +       int ubi_num;
> +       int vol_id;
> +       int refcnt;
> +
> +       struct gendisk *gd;
> +       struct request_queue *rq;
> +
> +       struct workqueue_struct *wq;
> +       struct work_struct work;
> +
> +       struct mutex vol_mutex;
> +       spinlock_t queue_lock;
> +       struct list_head list;
> +
> +       int leb_size;
> +#ifdef CONFIG_MTD_UBI_BLOCK_CACHED
> +       struct ubiblock_cache cache;
> +#endif
> +};
> +
> +/* Linked list of all ubiblock instances */
> +static LIST_HEAD(ubiblock_devices);
> +static DEFINE_MUTEX(devices_mutex);
> +static int ubiblock_major;
> +
> +/* Ugh! this parameter parsing code is awful */
> +static int __init ubiblock_set_param(const char *val,
> +                                    const struct kernel_param *kp)
> +{
> +       int i, ret;
> +       size_t len;
> +       struct ubiblock_param *param;
> +       char buf[UBIBLOCK_PARAM_LEN];
> +       char *pbuf = &buf[0];
> +       char *tokens[UBIBLOCK_PARAM_COUNT];
> +
> +       if (!val)
> +               return -EINVAL;
> +
> +       len = strnlen(val, UBIBLOCK_PARAM_LEN);
> +       if (len == 0) {
> +               ubi_warn("block: empty 'block=' parameter - ignored\n");
> +               return 0;
> +       }
> +
> +       if (len == UBIBLOCK_PARAM_LEN) {
> +               ubi_err("block: parameter \"%s\" is too long, max. is %d\n",
> +                       val, UBIBLOCK_PARAM_LEN);
> +               return -EINVAL;
> +       }
> +
> +       strcpy(buf, val);
> +
> +       /* Get rid of the final newline */
> +       if (buf[len - 1] == '\n')
> +               buf[len - 1] = '\0';
> +
> +       for (i = 0; i < UBIBLOCK_PARAM_COUNT; i++)
> +               tokens[i] = strsep(&pbuf, ",");
> +
> +       param = &ubiblock_param[ubiblock_devs];
> +       if (tokens[1]) {
> +               /* Two parameters: can be 'ubi, vol_id' or 'ubi, vol_name' */
> +               ret = kstrtoint(tokens[0], 10, &param->ubi_num);
> +               if (ret < 0)
> +                       return -EINVAL;
> +
> +               /* Second param can be a number or a name */
> +               ret = kstrtoint(tokens[1], 10, &param->vol_id);
> +               if (ret < 0) {
> +                       param->vol_id = -1;
> +                       strcpy(param->name, tokens[1]);
> +               }
> +
> +       } else {
> +               /* One parameter: must be device path */
> +               strcpy(param->name, tokens[0]);
> +               param->ubi_num = -1;
> +               param->vol_id = -1;
> +       }
> +
> +       ubiblock_devs++;
> +
> +       return 0;
> +}
> +
> +static const struct kernel_param_ops ubiblock_param_ops = {
> +       .set    = ubiblock_set_param,
> +};
> +module_param_cb(block, &ubiblock_param_ops, NULL, 0);
> +MODULE_PARM_DESC(block, "Attach block devices to UBI volumes. Parameter format: block=<path|dev,num|dev,name>.\n"
> +                       "Multiple \"block\" parameters may be specified.\n"
> +                       "UBI volumes may be specified by their number, name, or path to the device node.\n"
> +                       "Examples\n"
> +                       "Using the UBI volume path:\n"
> +                       "ubi.block=/dev/ubi0_0\n"
> +                       "Using the UBI device, and the volume name:\n"
> +                       "ubi.block=0,rootfs\n"
> +                       "Using both UBI device number and UBI volume number:\n"
> +                       "ubi.block=0,0\n");
> +
> +static struct ubiblock *find_dev_nolock(int ubi_num, int vol_id)
> +{
> +       struct ubiblock *dev;
> +
> +       list_for_each_entry(dev, &ubiblock_devices, list)
> +               if (dev->ubi_num == ubi_num && dev->vol_id == vol_id)
> +                       return dev;
> +       return NULL;
> +}
> +
> +#ifdef CONFIG_MTD_UBI_BLOCK_CACHED
> +static void ubiblock_clean_cache(struct ubiblock *dev)
> +{
> +       dev->cache.leb_num = -1;
> +       dev->cache.state = STATE_EMPTY;
> +}
> +
> +static void ubiblock_free_cache(struct ubiblock *dev)
> +{
> +       dev->cache.leb_num = -1;
> +       dev->cache.state = STATE_EMPTY;
> +       vfree(dev->cache.buffer);
> +       dev->cache.buffer = NULL;
> +}
> +
> +static int ubiblock_alloc_cache(struct ubiblock *dev)
> +{
> +       dev->cache.state = STATE_EMPTY;
> +       dev->cache.leb_num = -1;
> +       dev->cache.buffer = vmalloc(dev->leb_size);
> +       if (!dev->cache.buffer)
> +               return -ENOMEM;
> +       return 0;
> +}
> +
> +static bool leb_in_cache(struct ubiblock_cache *cache, int leb_num)
> +{
> +       return cache->leb_num == leb_num;
> +}
> +
> +static int ubiblock_fill_cache(struct ubiblock *dev, int leb_num,
> +                              struct ubiblock_cache *cache)
> +{
> +       int ret;
> +
> +       /* Warn if we fill cache while being dirty */
> +       WARN_ON(cache->state == STATE_DIRTY);
> +
> +       cache->leb_num = leb_num;
> +       cache->state = STATE_CLEAN;
> +
> +       ret = ubi_read(dev->desc, leb_num, cache->buffer, 0,
> +                      dev->leb_size);
> +       if (ret) {
> +               ubi_err("%s ubi_read error %d", dev->gd->disk_name, ret);
> +               return ret;
> +       }

If read fails we still end up with a valid cache entry?
Please set STATE_CLEAN only after a successful read.

> +       return 0;
> +}
> +
> +static int ubiblock_read_to_buf(struct ubiblock *dev, char *buffer,
> +                               int leb, int offset, int len)
> +{
> +       int ret;
> +       char *cache_buffer;
> +       /*
> +        * First try in cache, if it's not there load this leb.
> +        * Note that reading never flushes to disk!
> +        */
> +       if (leb_in_cache(&dev->cache, leb)) {
> +               cache_buffer = dev->cache.buffer;
> +       } else {
> +               /* Leb is not in cache: fill it! */
> +               ret = ubiblock_fill_cache(dev, leb, &dev->cache);
> +               if (ret)
> +                       return ret;
> +               cache_buffer = dev->cache.buffer;
> +       }
> +       memcpy(buffer, cache_buffer + offset, len);
> +       return 0;
> +}
> +#else
> +static inline void ubiblock_clean_cache(struct ubiblock *dev) {}
> +
> +static inline void ubiblock_free_cache(struct ubiblock *dev) {}
> +
> +static inline int ubiblock_alloc_cache(struct ubiblock *dev)
> +{
> +       return 0;
> +}
> +
> +static int ubiblock_read_to_buf(struct ubiblock *dev, char *buffer,
> +                               int leb, int offset, int len)
> +{
> +       int ret;
> +
> +       ret = ubi_read(dev->desc, leb, buffer, offset, len);
> +       if (ret) {
> +               ubi_err("%s ubi_read error %d",
> +                       dev->gd->disk_name, ret);
> +               return ret;
> +       }
> +       return 0;
> +}
> +#endif /* CONFIG_MTD_UBI_BLOCK_CACHED */
> +
> +#ifdef CONFIG_MTD_UBI_BLOCK_WRITE_SUPPORT
> +static int ubiblock_flush(struct ubiblock *dev, bool sync)
> +{
> +       struct ubiblock_cache *cache = &dev->cache;
> +       int ret = 0;
> +
> +       if (cache->state != STATE_DIRTY)
> +               return 0;
> +
> +       /*
> +        * mtdblock sets STATE_EMPTY, arguing that it prevents the
> +        * underlying media to get changed without notice.
> +        * I'm not fully convinced, so I just put STATE_CLEAN.
> +        */
> +       cache->state = STATE_CLEAN;
> +
> +       /* Atomically change leb with buffer contents */
> +       ret = ubi_leb_change(dev->desc, cache->leb_num,
> +                            cache->buffer, dev->leb_size);
> +       if (ret) {
> +               ubi_err("%s ubi_leb_change error %d",
> +                       dev->gd->disk_name, ret);
> +               return ret;
> +       }
> +
> +       /* Sync ubi device when device is released and on block flush ioctl */
> +       if (sync)
> +               ret = ubi_sync(dev->ubi_num);
> +
> +       return ret;
> +}
> +
> +static int ubiblock_write(struct ubiblock *dev, const char *buffer,
> +                         int pos, int len)
> +{
> +       int leb, offset, ret;
> +       int bytes_left = len;
> +       int to_write = len;
> +       struct ubiblock_cache *cache = &dev->cache;
> +
> +       /* Get (leb:offset) address to write */
> +       leb = pos / dev->leb_size;
> +       offset = pos % dev->leb_size;
> +
> +       while (bytes_left) {
> +               /*
> +                * We can only write one leb at a time.
> +                * Therefore if the write length is larger than
> +                * one leb size, we split the operation.
> +                */
> +               if (offset + to_write > dev->leb_size)
> +                       to_write = dev->leb_size - offset;
> +
> +               /*
> +                * If leb is not in cache, we flush current cached
> +                * leb to disk. Cache contents will be filled by reading device.
> +                */
> +               if (!leb_in_cache(cache, leb)) {
> +
> +                       ret = ubiblock_flush(dev, false);
> +                       if (ret)
> +                               return ret;
> +
> +                       ret = ubiblock_fill_cache(dev, leb, cache);
> +                       if (ret)
> +                               return ret;
> +               }
> +
> +               memcpy(cache->buffer + offset, buffer, to_write);
> +
> +               /* This is the only place where we dirt the cache */
> +               cache->state = STATE_DIRTY;
> +
> +               buffer += to_write;
> +               bytes_left -= to_write;
> +               to_write = bytes_left;
> +               offset = 0;
> +               leb++;
> +       }
> +       return 0;
> +}
> +#else
> +static int ubiblock_flush(struct ubiblock *dev, bool sync)
> +{
> +       return -EPERM;
> +}
> +
> +static int ubiblock_write(struct ubiblock *dev, const char *buffer,
> +                         int pos, int len)
> +{
> +       return -EPERM;
> +}
> +#endif /* CONFIG_MTD_UBI_BLOCK_WRITE_SUPPORT */
> +
> +static int ubiblock_read(struct ubiblock *dev, char *buffer, int pos, int len)
> +{
> +       int leb, offset, ret;
> +       int bytes_left = len;
> +       int to_read = len;
> +
> +       /* Get leb:offset address to read from */
> +       leb = pos / dev->leb_size;
> +       offset = pos % dev->leb_size;
> +
> +       while (bytes_left) {
> +
> +               /*
> +                * We can only read one leb at a time.
> +                * Therefore if the read length is larger than
> +                * one leb size, we split the operation.
> +                */
> +               if (offset + to_read > dev->leb_size)
> +                       to_read = dev->leb_size - offset;
> +
> +               ret = ubiblock_read_to_buf(dev, buffer, leb, offset, to_read);
> +               if (ret)
> +                       return ret;
> +
> +               buffer += to_read;
> +               bytes_left -= to_read;
> +               to_read = bytes_left;
> +               leb++;
> +               offset = 0;
> +       }
> +       return 0;
> +}
> +
> +static int do_ubiblock_request(struct ubiblock *dev, struct request *req)
> +{
> +       int pos, len;
> +
> +       if (req->cmd_flags & REQ_FLUSH)
> +               return ubiblock_flush(dev, true);
> +
> +       if (req->cmd_type != REQ_TYPE_FS)
> +               return -EIO;
> +
> +       if (blk_rq_pos(req) + blk_rq_cur_sectors(req) >
> +           get_capacity(req->rq_disk))
> +               return -EIO;
> +
> +       pos = blk_rq_pos(req) << 9;
> +       len = blk_rq_cur_bytes(req);
> +
> +       switch (rq_data_dir(req)) {
> +       case READ:
> +               return ubiblock_read(dev, req->buffer, pos, len);
> +       case WRITE:
> +               return ubiblock_write(dev, req->buffer, pos, len);
> +       default:
> +               return -EIO;
> +       }
> +
> +       return 0;
> +}
> +
> +static void ubiblock_do_work(struct work_struct *work)
> +{
> +       struct ubiblock *dev =
> +               container_of(work, struct ubiblock, work);
> +       struct request_queue *rq = dev->rq;
> +       struct request *req;
> +       int res;
> +
> +       spin_lock_irq(rq->queue_lock);
> +
> +       req = blk_fetch_request(rq);
> +       while (req) {
> +
> +               spin_unlock_irq(rq->queue_lock);
> +
> +               mutex_lock(&dev->vol_mutex);
> +               res = do_ubiblock_request(dev, req);
> +               mutex_unlock(&dev->vol_mutex);

This means that you can never do parallel IO?

> +
> +               spin_lock_irq(rq->queue_lock);
> +
> +               /*
> +                * If we're done with this request,
> +                * we need to fetch a new one
> +                */
> +               if (!__blk_end_request_cur(req, res))
> +                       req = blk_fetch_request(rq);
> +       }
> +
> +       spin_unlock_irq(rq->queue_lock);
> +}
> +
> +static void ubiblock_request(struct request_queue *rq)
> +{
> +       struct ubiblock *dev;
> +       struct request *req;
> +
> +       dev = rq->queuedata;
> +
> +       if (!dev)
> +               while ((req = blk_fetch_request(rq)) != NULL)
> +                       __blk_end_request_all(req, -ENODEV);
> +       else
> +               queue_work(dev->wq, &dev->work);
> +}
> +
> +static int ubiblock_open(struct block_device *bdev, fmode_t mode)
> +{
> +       struct ubiblock *dev = bdev->bd_disk->private_data;
> +       int ubi_mode = UBI_READONLY;
> +       int ret;
> +#ifdef CONFIG_MTD_UBI_BLOCK_WRITE_SUPPORT
> +       const bool allow_write = true;
> +#else
> +       const bool allow_write = false;
> +#endif
> +
> +       mutex_lock(&dev->vol_mutex);
> +       if (dev->refcnt > 0) {
> +               /*
> +                * The volume is already opened,
> +                * just increase the reference counter
> +                *
> +                * If the first user has oppened this as read-only,
> +                * we don't allow to open as read-write.
> +                * This is the simplest solution. A better one would
> +                * be to re-open the volume as writable.
> +                */
> +               if ((mode & FMODE_WRITE) &&
> +                   (dev->desc->mode != UBI_READWRITE)) {
> +                       ret = -EBUSY;
> +                       goto out_unlock;
> +               }
> +               goto out_done;
> +       }
> +
> +       if (allow_write) {
> +               if (mode & FMODE_WRITE)
> +                       ubi_mode = UBI_READWRITE;
> +       } else {
> +               if (mode & FMODE_WRITE) {
> +                       ret = -EPERM;
> +                       goto out_unlock;
> +               }
> +       }
> +
> +       dev->desc = ubi_open_volume(dev->ubi_num, dev->vol_id, ubi_mode);
> +       if (IS_ERR(dev->desc)) {
> +               ubi_err("%s failed to open ubi volume %d_%d",
> +                       dev->gd->disk_name, dev->ubi_num, dev->vol_id);
> +
> +               ret = PTR_ERR(dev->desc);
> +               dev->desc = NULL;
> +               goto out_unlock;
> +       }
> +
> +       dev->vi = kzalloc(sizeof(struct ubi_volume_info), GFP_KERNEL);
> +       if (!dev->vi) {
> +               ret = -ENOMEM;
> +               goto out_close;
> +       }
> +       ubi_get_volume_info(dev->desc, dev->vi);
> +       dev->leb_size = dev->vi->usable_leb_size;
> +
> +       ret = ubiblock_alloc_cache(dev);
> +       if (ret)
> +               goto out_free;
> +out_done:
> +       dev->refcnt++;
> +       mutex_unlock(&dev->vol_mutex);
> +       return 0;
> +
> +out_free:
> +       kfree(dev->vi);
> +out_close:
> +       ubi_close_volume(dev->desc);
> +       dev->desc = NULL;
> +out_unlock:
> +       mutex_unlock(&dev->vol_mutex);
> +       return ret;
> +}
> +
> +static void ubiblock_release(struct gendisk *gd, fmode_t mode)
> +{
> +       struct ubiblock *dev = gd->private_data;
> +
> +       mutex_lock(&dev->vol_mutex);
> +
> +       dev->refcnt--;
> +       if (dev->refcnt == 0) {
> +               ubiblock_flush(dev, true);
> +               ubiblock_free_cache(dev);
> +
> +               kfree(dev->vi);
> +               ubi_close_volume(dev->desc);
> +
> +               dev->vi = NULL;
> +               dev->desc = NULL;
> +       }
> +
> +       mutex_unlock(&dev->vol_mutex);
> +}
> +
> +static int ubiblock_ioctl(struct block_device *bdev, fmode_t mode,
> +                             unsigned int cmd, unsigned long arg)
> +{
> +       struct ubiblock *dev = bdev->bd_disk->private_data;
> +       int ret = -ENXIO;
> +
> +       if (!dev)
> +               return ret;
> +
> +       mutex_lock(&dev->vol_mutex);
> +
> +       /* I can't get this to get called. What's going on? */
> +       switch (cmd) {
> +       case BLKFLSBUF:
> +               ret = ubiblock_flush(dev, true);
> +               break;
> +       default:
> +               ret = -ENOTTY;
> +       }
> +
> +       mutex_unlock(&dev->vol_mutex);
> +       return ret;
> +}
> +
> +static int ubiblock_getgeo(struct block_device *bdev, struct hd_geometry *geo)
> +{
> +       /* Some tools might require this information */
> +       geo->heads = 1;
> +       geo->cylinders = 1;
> +       geo->sectors = get_capacity(bdev->bd_disk);
> +       geo->start = 0;
> +       return 0;
> +}
> +
> +static const struct block_device_operations ubiblock_ops = {
> +       .owner = THIS_MODULE,
> +       .open = ubiblock_open,
> +       .release = ubiblock_release,
> +       .ioctl = ubiblock_ioctl,
> +       .getgeo = ubiblock_getgeo,
> +};
> +
> +int ubiblock_add(struct ubi_volume_info *vi)
> +{
> +       struct ubiblock *dev;
> +       struct gendisk *gd;
> +       int disk_capacity;
> +       int ret;
> +
> +       /* Check that the volume isn't already handled */
> +       mutex_lock(&devices_mutex);
> +       if (find_dev_nolock(vi->ubi_num, vi->vol_id)) {
> +               mutex_unlock(&devices_mutex);
> +               return -EEXIST;
> +       }
> +       mutex_unlock(&devices_mutex);
> +
> +       dev = kzalloc(sizeof(struct ubiblock), GFP_KERNEL);
> +       if (!dev)
> +               return -ENOMEM;
> +
> +       mutex_init(&dev->vol_mutex);
> +
> +       dev->ubi_num = vi->ubi_num;
> +       dev->vol_id = vi->vol_id;
> +
> +       ubiblock_clean_cache(dev);
> +
> +       /* Initialize the gendisk of this ubiblock device */
> +       gd = alloc_disk(1);
> +       if (!gd) {
> +               ubi_err("block: alloc_disk failed");
> +               ret = -ENODEV;
> +               goto out_free_dev;
> +       }
> +
> +       gd->fops = &ubiblock_ops;
> +       gd->major = ubiblock_major;
> +       gd->first_minor = dev->ubi_num * UBI_MAX_VOLUMES + dev->vol_id;
> +       gd->private_data = dev;
> +       sprintf(gd->disk_name, "ubiblock%d_%d", dev->ubi_num, dev->vol_id);
> +       disk_capacity = (vi->size * vi->usable_leb_size) >> 9;
> +       set_capacity(gd, disk_capacity);
> +       dev->gd = gd;
> +
> +       spin_lock_init(&dev->queue_lock);
> +       dev->rq = blk_init_queue(ubiblock_request, &dev->queue_lock);
> +       if (!dev->rq) {
> +               ubi_err("block: blk_init_queue failed");
> +               ret = -ENODEV;
> +               goto out_put_disk;
> +       }
> +
> +       dev->rq->queuedata = dev;
> +       dev->gd->queue = dev->rq;
> +
> +       blk_queue_flush(dev->rq, REQ_FLUSH);
> +
> +       /*
> +        * Create one workqueue per volume (per registered block device).
> +        * Rembember workqueues are cheap, they're not threads.
> +        */
> +       dev->wq = alloc_workqueue(gd->disk_name, 0, 0);
> +       if (!dev->wq)
> +               goto out_free_queue;
> +       INIT_WORK(&dev->work, ubiblock_do_work);
> +
> +       mutex_lock(&devices_mutex);
> +       list_add_tail(&dev->list, &ubiblock_devices);
> +       mutex_unlock(&devices_mutex);
> +
> +       /* Must be the last step: anyone can call file ops from now on */
> +       add_disk(dev->gd);
> +
> +       ubi_msg("%s created from ubi%d:%d(%s)",
> +               dev->gd->disk_name, dev->ubi_num, dev->vol_id, vi->name);
> +
> +       return 0;
> +
> +out_free_queue:
> +       blk_cleanup_queue(dev->rq);
> +out_put_disk:
> +       put_disk(dev->gd);
> +out_free_dev:
> +       kfree(dev);
> +
> +       return ret;
> +}
> +
> +static void ubiblock_cleanup(struct ubiblock *dev)
> +{
> +       del_gendisk(dev->gd);
> +       blk_cleanup_queue(dev->rq);
> +       ubi_msg("%s released", dev->gd->disk_name);
> +       put_disk(dev->gd);
> +}
> +
> +int ubiblock_del(struct ubi_volume_info *vi)
> +{
> +       struct ubiblock *dev;
> +
> +       mutex_lock(&devices_mutex);
> +       dev = find_dev_nolock(vi->ubi_num, vi->vol_id);
> +       if (!dev) {
> +               mutex_unlock(&devices_mutex);
> +               return -ENODEV;
> +       }
> +
> +       /* Found a device, let's lock it so we can check if it's busy */
> +       mutex_lock(&dev->vol_mutex);
> +
> +       if (dev->refcnt > 0) {
> +               mutex_unlock(&dev->vol_mutex);
> +               mutex_unlock(&devices_mutex);
> +               return -EBUSY;
> +       }
> +
> +       /* Remove from device list */
> +       list_del(&dev->list);
> +       mutex_unlock(&devices_mutex);
> +
> +       /* Flush pending work and stop this workqueue */
> +       destroy_workqueue(dev->wq);
> +
> +       ubiblock_cleanup(dev);
> +       mutex_unlock(&dev->vol_mutex);
> +       kfree(dev);
> +       return 0;
> +}
> +
> +static void ubiblock_resize(struct ubi_volume_info *vi)
> +{
> +       struct ubiblock *dev;
> +       int disk_capacity;
> +
> +       /*
> +        * Need to lock the device list until we stop using the device,
> +        * otherwise the device struct might get released in ubiblock_del().
> +        */
> +       mutex_lock(&devices_mutex);
> +       dev = find_dev_nolock(vi->ubi_num, vi->vol_id);
> +       if (!dev) {
> +               mutex_unlock(&devices_mutex);
> +               return;
> +       }
> +
> +       mutex_lock(&dev->vol_mutex);
> +       disk_capacity = (vi->size * vi->usable_leb_size) >> 9;
> +       set_capacity(dev->gd, disk_capacity);
> +       ubi_msg("%s resized to %d LEBs", dev->gd->disk_name, vi->size);
> +       mutex_unlock(&dev->vol_mutex);
> +       mutex_unlock(&devices_mutex);
> +}
> +
> +static int ubiblock_notify(struct notifier_block *nb,
> +                        unsigned long notification_type, void *ns_ptr)
> +{
> +       struct ubi_notification *nt = ns_ptr;
> +
> +       switch (notification_type) {
> +       case UBI_VOLUME_ADDED:
> +               /*
> +                * We want to enforce explicit block device attaching for
> +                * volumes; so when a volume is added we do nothing.
> +                */
> +               break;
> +       case UBI_VOLUME_REMOVED:
> +               ubiblock_del(&nt->vi);
> +               break;
> +       case UBI_VOLUME_RESIZED:
> +               ubiblock_resize(&nt->vi);
> +               break;
> +       default:
> +               break;
> +       }
> +       return NOTIFY_OK;
> +}
> +
> +static struct notifier_block ubiblock_notifier = {
> +       .notifier_call = ubiblock_notify,
> +};
> +
> +static struct ubi_volume_desc * __init
> +open_volume_desc(const char *name, int ubi_num, int vol_id)
> +{
> +       if (ubi_num == -1)
> +               /* No ubi num, name must be a vol device path */
> +               return ubi_open_volume_path(name, UBI_READONLY);
> +       else if (vol_id == -1)
> +               /* No vol_id, must be vol_name */
> +               return ubi_open_volume_nm(ubi_num, name, UBI_READONLY);
> +       else
> +               return ubi_open_volume(ubi_num, vol_id, UBI_READONLY);
> +}
> +
> +static void __init ubiblock_add_from_param(void)
> +{
> +       int i, ret;
> +       struct ubiblock_param *p;
> +       struct ubi_volume_desc *desc;
> +       struct ubi_volume_info vi;
> +
> +       for (i = 0; i < ubiblock_devs; i++) {
> +               p = &ubiblock_param[i];
> +
> +               desc = open_volume_desc(p->name, p->ubi_num, p->vol_id);
> +               if (IS_ERR(desc)) {
> +                       ubi_warn("block: can't open volume, err=%ld\n",
> +                                PTR_ERR(desc));
> +                       continue;
> +               }
> +
> +               ubi_get_volume_info(desc, &vi);
> +               ret = ubiblock_add(&vi);
> +               if (ret)
> +                       ubi_warn("block: can't add '%s' volume, err=%d\n",
> +                                vi.name, ret);
> +               ubi_close_volume(desc);
> +       }
> +}
> +
> +int __init ubiblock_init(void)
> +{
> +       int ret;
> +
> +       ubiblock_major = register_blkdev(0, "ubiblock");
> +       if (ubiblock_major < 0)
> +               return ubiblock_major;
> +
> +       /* Attach block devices from 'block=' module param */
> +       ubiblock_add_from_param();
> +
> +       /*
> +        * Block devices needs to be attached to volumes explicitly
> +        * upon user request. So we ignore existing volumes.
> +        */
> +       ret = ubi_register_volume_notifier(&ubiblock_notifier, 1);
> +       if (ret < 0)
> +               unregister_blkdev(ubiblock_major, "ubiblock");
> +       return ret;
> +}
> +
> +void __exit ubiblock_exit(void)
> +{
> +       struct ubiblock *next;
> +       struct ubiblock *dev;
> +
> +       ubi_unregister_volume_notifier(&ubiblock_notifier);
> +
> +       list_for_each_entry_safe(dev, next, &ubiblock_devices, list) {
> +
> +               /* Flush pending work and stop workqueue */
> +               destroy_workqueue(dev->wq);
> +
> +               /* The module is being forcefully removed */
> +               WARN_ON(dev->desc);
> +
> +               /* Remove from device list */
> +               list_del(&dev->list);
> +
> +               ubiblock_cleanup(dev);
> +
> +               kfree(dev);
> +       }
> +
> +       unregister_blkdev(ubiblock_major, "ubiblock");
> +}
> diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c
> index e05dc62..2705bad 100644
> --- a/drivers/mtd/ubi/build.c
> +++ b/drivers/mtd/ubi/build.c
> @@ -1296,6 +1296,9 @@ static int __init ubi_init(void)
>                 }
>         }
>
> +       if (ubiblock_init() < 0)
> +               ubi_warn("cannot init block device access");
> +
>         return 0;
>
>  out_detach:
> @@ -1324,6 +1327,8 @@ static void __exit ubi_exit(void)
>  {
>         int i;
>
> +       ubiblock_exit();
> +
>         for (i = 0; i < UBI_MAX_DEVICES; i++)
>                 if (ubi_devices[i]) {
>                         mutex_lock(&ubi_devices_mutex);
> diff --git a/drivers/mtd/ubi/cdev.c b/drivers/mtd/ubi/cdev.c
> index 8ca49f2..39d3774 100644
> --- a/drivers/mtd/ubi/cdev.c
> +++ b/drivers/mtd/ubi/cdev.c
> @@ -561,6 +561,26 @@ static long vol_cdev_ioctl(struct file *file, unsigned int cmd,
>                 break;
>         }
>
> +       /* Attach a block device to an UBI volume */
> +       case UBI_IOCVOLATTBLK:
> +       {
> +               struct ubi_volume_info vi;
> +
> +               ubi_get_volume_info(desc, &vi);
> +               err = ubiblock_add(&vi);
> +               break;
> +       }
> +
> +       /* Dettach a block device from an UBI volume */
> +       case UBI_IOCVOLDETBLK:
> +       {
> +               struct ubi_volume_info vi;
> +
> +               ubi_get_volume_info(desc, &vi);
> +               err = ubiblock_del(&vi);
> +               break;
> +       }
> +
>         default:
>                 err = -ENOTTY;
>                 break;
> diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
> index 8ea6297..e76ff98 100644
> --- a/drivers/mtd/ubi/ubi.h
> +++ b/drivers/mtd/ubi/ubi.h
> @@ -864,6 +864,20 @@ int ubi_update_fastmap(struct ubi_device *ubi);
>  int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
>                      int fm_anchor);
>
> +/* block.c */
> +#ifdef CONFIG_MTD_UBI_BLOCK
> +int ubiblock_init(void);
> +void ubiblock_exit(void);
> +int ubiblock_add(struct ubi_volume_info *vi);
> +int ubiblock_del(struct ubi_volume_info *vi);
> +#else
> +static inline int ubiblock_init(void) { return 0; }
> +static inline void ubiblock_exit(void) {}
> +static inline int ubiblock_add(struct ubi_volume_info *vi) { return -ENOTTY; }
> +static inline int ubiblock_del(struct ubi_volume_info *vi) { return -ENOTTY; }
> +#endif
> +
> +
>  /*
>   * ubi_rb_for_each_entry - walk an RB-tree.
>   * @rb: a pointer to type 'struct rb_node' to use as a loop counter
> diff --git a/include/uapi/mtd/ubi-user.h b/include/uapi/mtd/ubi-user.h
> index 723c324..1080762 100644
> --- a/include/uapi/mtd/ubi-user.h
> +++ b/include/uapi/mtd/ubi-user.h
> @@ -134,6 +134,13 @@
>   * used. A pointer to a &struct ubi_set_vol_prop_req object is expected to be
>   * passed. The object describes which property should be set, and to which value
>   * it should be set.
> + *
> + * Block device access to UBI volumes
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + *
> + * To attach or detach a block device from an UBI volume the %UBI_IOCVOLATTBLK
> + * and %UBI_IOCVOLDETBLK ioctl commands should be used, respectively.
> + * These commands take no arguments.
>   */
>
>  /*
> @@ -191,6 +198,10 @@
>  /* Set an UBI volume property */
>  #define UBI_IOCSETVOLPROP _IOW(UBI_VOL_IOC_MAGIC, 6, \
>                                struct ubi_set_vol_prop_req)
> +/* Attach a block device to an UBI volume */
> +#define UBI_IOCVOLATTBLK _IO(UBI_VOL_IOC_MAGIC, 7)
> +/* Detach a block device from an UBI volume */
> +#define UBI_IOCVOLDETBLK _IO(UBI_VOL_IOC_MAGIC, 8)
>
>  /* Maximum MTD device name length supported by UBI */
>  #define MAX_UBI_MTD_NAME_LEN 127
> --
> 1.8.1.5
>
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
Willy Tarreau Feb. 8, 2014, 10:51 p.m. UTC | #5
On Sat, Feb 08, 2014 at 10:37:19PM +0100, Richard Weinberger wrote:
> > +config MTD_UBI_BLOCK_WRITE_SUPPORT
> > +       bool "Enable write support (DANGEROUS)"
> > +       default n
> > +       depends on MTD_UBI_BLOCK
> > +       select MTD_UBI_BLOCK_CACHED
> > +       help
> > +          This is a *very* dangerous feature. Using a regular block-oriented
> > +          filesystem might impact heavily on a flash device wear.
> > +          Use with extreme caution.
> > +
> > +          If in doubt, say "N".
> 
> I really vote for dropping write support at all.

Why ? When you put a read-only filesystem there such as squashfs, the
only writes you'll have will be updates, and write support will be the
only way to update the filesystem. So removing write support seriously
impacts the usefulness of the feature itself.

Willy
Richard Weinberger Feb. 8, 2014, 10:56 p.m. UTC | #6
Am 08.02.2014 23:51, schrieb Willy Tarreau:
> On Sat, Feb 08, 2014 at 10:37:19PM +0100, Richard Weinberger wrote:
>>> +config MTD_UBI_BLOCK_WRITE_SUPPORT
>>> +       bool "Enable write support (DANGEROUS)"
>>> +       default n
>>> +       depends on MTD_UBI_BLOCK
>>> +       select MTD_UBI_BLOCK_CACHED
>>> +       help
>>> +          This is a *very* dangerous feature. Using a regular block-oriented
>>> +          filesystem might impact heavily on a flash device wear.
>>> +          Use with extreme caution.
>>> +
>>> +          If in doubt, say "N".
>>
>> I really vote for dropping write support at all.
> 
> Why ? When you put a read-only filesystem there such as squashfs, the
> only writes you'll have will be updates, and write support will be the
> only way to update the filesystem. So removing write support seriously
> impacts the usefulness of the feature itself.

So almost everyone has to enable MTD_UBI_BLOCK_WRITE_SUPPORT?
I thought there is another way to fill the volume with data...

Thanks,
//richard
Willy Tarreau Feb. 8, 2014, 11:01 p.m. UTC | #7
On Sat, Feb 08, 2014 at 11:56:02PM +0100, Richard Weinberger wrote:
> Am 08.02.2014 23:51, schrieb Willy Tarreau:
> > On Sat, Feb 08, 2014 at 10:37:19PM +0100, Richard Weinberger wrote:
> >>> +config MTD_UBI_BLOCK_WRITE_SUPPORT
> >>> +       bool "Enable write support (DANGEROUS)"
> >>> +       default n
> >>> +       depends on MTD_UBI_BLOCK
> >>> +       select MTD_UBI_BLOCK_CACHED
> >>> +       help
> >>> +          This is a *very* dangerous feature. Using a regular block-oriented
> >>> +          filesystem might impact heavily on a flash device wear.
> >>> +          Use with extreme caution.
> >>> +
> >>> +          If in doubt, say "N".
> >>
> >> I really vote for dropping write support at all.
> > 
> > Why ? When you put a read-only filesystem there such as squashfs, the
> > only writes you'll have will be updates, and write support will be the
> > only way to update the filesystem. So removing write support seriously
> > impacts the usefulness of the feature itself.
> 
> So almost everyone has to enable MTD_UBI_BLOCK_WRITE_SUPPORT?
> I thought there is another way to fill the volume with data...

I personally don't see the use of disabling write support on anything
unless the code is broken. Better emit a warning upon first write to
mention that there is limited or no wear leveling. But preventing all
reasonable users from using a useful feature just to save a few ignorant
from shooting themselves in the foot is non-sense in my opinion.

Why not disable write support to ubifs as well then, so that we're
sure that the most demanding ones will never wear their NANDs ? And
why not disable mtdblock so that nobody can mount them as ext2 ? If
people can already do bad things more easily without this code,
there is no reason to remove the feature.

Regards,
Willy
Piergiorgio Beruto Feb. 8, 2014, 11:05 p.m. UTC | #8
Hello,
I am one of the early testers of ubi block.

In my design, which is pretty common for embedded systems, I use ubiblk in
read-only mode (and no caching since squashfs already provides it).
For updating the squash I use ubiupdatevol, as shown in this code snippet
(double bank handling).

swrel_update() {
   swrel_loadst || return
   if [ -z "$FILE" ] ; then
      echo "error: you must specify a valid image file with -f option"
      return 1
   fi

   if [ -z "$BANK" ] ; then
      # search for an active bank to update
      # valid state is when at least one bank is standby
      [ "$SWREL2_STATE" != "active" ] && [ "$SWREL2_STATE" != "committed" ]
&& BANK=2
      [ "$SWREL1_STATE" != "active" ] && [ "$SWREL1_STATE" != "committed" ]
&& BANK=1

      if [ -z "$BANK" ] ; then
         decho "swrel_update: cannot find a standby bank to update"
         exit 2
      fi
   fi

   decho "swrel_update: updating bank #$BANK with file \"$FILE\""

   dev="/dev/ubi-app${BANK}w"
   ubiupdatevol $dev "$FILE" || return

   # if there are no committed banks, commit the updating one automatically
   if [ "$SWREL1_STATE" != "committed" ] && [ "$SWREL2_STATE" != "committed"
] ; then
      decho "swrel_update: committing bank #$BANK as no committed banks were
found"
      swrel_commit
   fi
}

If I had to go for a RW filesystem I would use ubifs instead of ubiblk in RW
mode.
But of course this is my very personal need.

Regards,
PIergiorgio

-----Original Message-----
From: Richard Weinberger [mailto:richard@nod.at] 
Sent: Saturday, February 8, 2014 11:56 PM
To: Willy Tarreau
Cc: Ezequiel Garcia; linux-mtd@lists.infradead.org; Thomas Petazzoni; Mike
Frysinger; Artem Bityutskiy; Michael Opdenacker; Tim Bird; Piergiorgio
Beruto; Brian Norris; David Woodhouse
Subject: Re: [PATCH 1/1] ubi: Introduce block devices for UBI volumes

Am 08.02.2014 23:51, schrieb Willy Tarreau:
> On Sat, Feb 08, 2014 at 10:37:19PM +0100, Richard Weinberger wrote:
>>> +config MTD_UBI_BLOCK_WRITE_SUPPORT
>>> +       bool "Enable write support (DANGEROUS)"
>>> +       default n
>>> +       depends on MTD_UBI_BLOCK
>>> +       select MTD_UBI_BLOCK_CACHED
>>> +       help
>>> +          This is a *very* dangerous feature. Using a regular
block-oriented
>>> +          filesystem might impact heavily on a flash device wear.
>>> +          Use with extreme caution.
>>> +
>>> +          If in doubt, say "N".
>>
>> I really vote for dropping write support at all.
> 
> Why ? When you put a read-only filesystem there such as squashfs, the 
> only writes you'll have will be updates, and write support will be the 
> only way to update the filesystem. So removing write support seriously 
> impacts the usefulness of the feature itself.

So almost everyone has to enable MTD_UBI_BLOCK_WRITE_SUPPORT?
I thought there is another way to fill the volume with data...

Thanks,
//richard
Piergiorgio Beruto Feb. 8, 2014, 11:10 p.m. UTC | #9
Hi,
just to better explain my previous mail, I actually agree with you that it's
generally a bad idea to cut off a feature just because you don't see good
use for yourself or because it's dangerous.
The real reason to remove the RW feature, in my opinion, might be for
reducing the amount of code you have to maintain over time.

Usually I try to understand the benefits vs effort when deciding about such
things.
In this case It seems to me that the effort is not that much but I fail to
see any real use of the RW feature for myself.

I suggest to listen to other people that used ubblk before making any
decision.

Regards,
Piergiorgio

-----Original Message-----
From: Willy Tarreau [mailto:w@1wt.eu] 
Sent: Sunday, February 9, 2014 12:02 AM
To: Richard Weinberger
Cc: Ezequiel Garcia; linux-mtd@lists.infradead.org; Thomas Petazzoni; Mike
Frysinger; Artem Bityutskiy; Michael Opdenacker; Tim Bird; Piergiorgio
Beruto; Brian Norris; David Woodhouse
Subject: Re: [PATCH 1/1] ubi: Introduce block devices for UBI volumes

On Sat, Feb 08, 2014 at 11:56:02PM +0100, Richard Weinberger wrote:
> Am 08.02.2014 23:51, schrieb Willy Tarreau:
> > On Sat, Feb 08, 2014 at 10:37:19PM +0100, Richard Weinberger wrote:
> >>> +config MTD_UBI_BLOCK_WRITE_SUPPORT
> >>> +       bool "Enable write support (DANGEROUS)"
> >>> +       default n
> >>> +       depends on MTD_UBI_BLOCK
> >>> +       select MTD_UBI_BLOCK_CACHED
> >>> +       help
> >>> +          This is a *very* dangerous feature. Using a regular
block-oriented
> >>> +          filesystem might impact heavily on a flash device wear.
> >>> +          Use with extreme caution.
> >>> +
> >>> +          If in doubt, say "N".
> >>
> >> I really vote for dropping write support at all.
> > 
> > Why ? When you put a read-only filesystem there such as squashfs, 
> > the only writes you'll have will be updates, and write support will 
> > be the only way to update the filesystem. So removing write support 
> > seriously impacts the usefulness of the feature itself.
> 
> So almost everyone has to enable MTD_UBI_BLOCK_WRITE_SUPPORT?
> I thought there is another way to fill the volume with data...

I personally don't see the use of disabling write support on anything unless
the code is broken. Better emit a warning upon first write to mention that
there is limited or no wear leveling. But preventing all reasonable users
from using a useful feature just to save a few ignorant from shooting
themselves in the foot is non-sense in my opinion.

Why not disable write support to ubifs as well then, so that we're sure that
the most demanding ones will never wear their NANDs ? And why not disable
mtdblock so that nobody can mount them as ext2 ? If people can already do
bad things more easily without this code, there is no reason to remove the
feature.

Regards,
Willy
Willy Tarreau Feb. 8, 2014, 11:13 p.m. UTC | #10
Hi Piergiorgio,

On Sun, Feb 09, 2014 at 12:05:12AM +0100, Piergiorgio Beruto wrote:
> Hello,
> I am one of the early testers of ubi block.
> 
> In my design, which is pretty common for embedded systems, I use ubiblk in
> read-only mode (and no caching since squashfs already provides it).
> For updating the squash I use ubiupdatevol, as shown in this code snippet
> (double bank handling).
> 
> swrel_update() {
>    swrel_loadst || return
>    if [ -z "$FILE" ] ; then
>       echo "error: you must specify a valid image file with -f option"
>       return 1
>    fi
> 
>    if [ -z "$BANK" ] ; then
>       # search for an active bank to update
>       # valid state is when at least one bank is standby
>       [ "$SWREL2_STATE" != "active" ] && [ "$SWREL2_STATE" != "committed" ]
> && BANK=2
>       [ "$SWREL1_STATE" != "active" ] && [ "$SWREL1_STATE" != "committed" ]
> && BANK=1
> 
>       if [ -z "$BANK" ] ; then
>          decho "swrel_update: cannot find a standby bank to update"
>          exit 2
>       fi
>    fi
> 
>    decho "swrel_update: updating bank #$BANK with file \"$FILE\""
> 
>    dev="/dev/ubi-app${BANK}w"
>    ubiupdatevol $dev "$FILE" || return
> 
>    # if there are no committed banks, commit the updating one automatically
>    if [ "$SWREL1_STATE" != "committed" ] && [ "$SWREL2_STATE" != "committed"
> ] ; then
>       decho "swrel_update: committing bank #$BANK as no committed banks were
> found"
>       swrel_commit
>    fi
> }
> 
> If I had to go for a RW filesystem I would use ubifs instead of ubiblk in RW
> mode.

I've been using ubifs for rootfs, but for a config FS, I'd rather use something
simple like ext2. Ubifs has unfortunately failed too many times on me and there
is no fsck to recover it after a failure. While I don't mind this for a rootfs
which is supposed to be easy to rebuild on an embedded device, it can be
problematic for some other parts like config and/or extra data that need to
be read once per boot and written very rarely. Anyway, squashfs guarantees me
that the FS I'm using matches what I think it should be.

> But of course this is my very personal need.

Yes and that's very interesting to all share our respective needs!

Regards,
Willy
Richard Weinberger Feb. 8, 2014, 11:13 p.m. UTC | #11
Am 09.02.2014 00:01, schrieb Willy Tarreau:
> On Sat, Feb 08, 2014 at 11:56:02PM +0100, Richard Weinberger wrote:
>> Am 08.02.2014 23:51, schrieb Willy Tarreau:
>>> On Sat, Feb 08, 2014 at 10:37:19PM +0100, Richard Weinberger wrote:
>>>>> +config MTD_UBI_BLOCK_WRITE_SUPPORT
>>>>> +       bool "Enable write support (DANGEROUS)"
>>>>> +       default n
>>>>> +       depends on MTD_UBI_BLOCK
>>>>> +       select MTD_UBI_BLOCK_CACHED
>>>>> +       help
>>>>> +          This is a *very* dangerous feature. Using a regular block-oriented
>>>>> +          filesystem might impact heavily on a flash device wear.
>>>>> +          Use with extreme caution.
>>>>> +
>>>>> +          If in doubt, say "N".
>>>>
>>>> I really vote for dropping write support at all.
>>>
>>> Why ? When you put a read-only filesystem there such as squashfs, the
>>> only writes you'll have will be updates, and write support will be the
>>> only way to update the filesystem. So removing write support seriously
>>> impacts the usefulness of the feature itself.
>>
>> So almost everyone has to enable MTD_UBI_BLOCK_WRITE_SUPPORT?
>> I thought there is another way to fill the volume with data...
> 
> I personally don't see the use of disabling write support on anything
> unless the code is broken. Better emit a warning upon first write to
> mention that there is limited or no wear leveling. But preventing all
> reasonable users from using a useful feature just to save a few ignorant
> from shooting themselves in the foot is non-sense in my opinion.

As Piergiorgio wrote, one can use ubiupdatevol to update his squashfs.
There is simply no use case for MTD_UBI_BLOCK_WRITE_SUPPORT.

> Why not disable write support to ubifs as well then, so that we're
> sure that the most demanding ones will never wear their NANDs ? And
> why not disable mtdblock so that nobody can mount them as ext2 ? If
> people can already do bad things more easily without this code,
> there is no reason to remove the feature.

I'd like to avoid another mtdblock.

Thanks,
//richard
Willy Tarreau Feb. 8, 2014, 11:15 p.m. UTC | #12
On Sun, Feb 09, 2014 at 12:13:11AM +0100, Richard Weinberger wrote:
> Am 09.02.2014 00:01, schrieb Willy Tarreau:
> > On Sat, Feb 08, 2014 at 11:56:02PM +0100, Richard Weinberger wrote:
> >> Am 08.02.2014 23:51, schrieb Willy Tarreau:
> >>> On Sat, Feb 08, 2014 at 10:37:19PM +0100, Richard Weinberger wrote:
> >>>>> +config MTD_UBI_BLOCK_WRITE_SUPPORT
> >>>>> +       bool "Enable write support (DANGEROUS)"
> >>>>> +       default n
> >>>>> +       depends on MTD_UBI_BLOCK
> >>>>> +       select MTD_UBI_BLOCK_CACHED
> >>>>> +       help
> >>>>> +          This is a *very* dangerous feature. Using a regular block-oriented
> >>>>> +          filesystem might impact heavily on a flash device wear.
> >>>>> +          Use with extreme caution.
> >>>>> +
> >>>>> +          If in doubt, say "N".
> >>>>
> >>>> I really vote for dropping write support at all.
> >>>
> >>> Why ? When you put a read-only filesystem there such as squashfs, the
> >>> only writes you'll have will be updates, and write support will be the
> >>> only way to update the filesystem. So removing write support seriously
> >>> impacts the usefulness of the feature itself.
> >>
> >> So almost everyone has to enable MTD_UBI_BLOCK_WRITE_SUPPORT?
> >> I thought there is another way to fill the volume with data...
> > 
> > I personally don't see the use of disabling write support on anything
> > unless the code is broken. Better emit a warning upon first write to
> > mention that there is limited or no wear leveling. But preventing all
> > reasonable users from using a useful feature just to save a few ignorant
> > from shooting themselves in the foot is non-sense in my opinion.
> 
> As Piergiorgio wrote, one can use ubiupdatevol to update his squashfs.
> There is simply no use case for MTD_UBI_BLOCK_WRITE_SUPPORT.

I gave an example with ext2 for the config. It's a bit excessive to
quickly declare "there is simply no use case for $put_your_option_here",
it just means that *you* don't have this use case, which I perfectly
respect.

> > Why not disable write support to ubifs as well then, so that we're
> > sure that the most demanding ones will never wear their NANDs ? And
> > why not disable mtdblock so that nobody can mount them as ext2 ? If
> > people can already do bad things more easily without this code,
> > there is no reason to remove the feature.
> 
> I'd like to avoid another mtdblock.

But we already have it. Without write support on ubiblock, people will
continue to use mtdblock for this, which is even worse since it does
not handle bad blocks.

Regards,
Willy
Richard Weinberger Feb. 8, 2014, 11:25 p.m. UTC | #13
Am 09.02.2014 00:15, schrieb Willy Tarreau:
> On Sun, Feb 09, 2014 at 12:13:11AM +0100, Richard Weinberger wrote:
>> Am 09.02.2014 00:01, schrieb Willy Tarreau:
>>> On Sat, Feb 08, 2014 at 11:56:02PM +0100, Richard Weinberger wrote:
>>>> Am 08.02.2014 23:51, schrieb Willy Tarreau:
>>>>> On Sat, Feb 08, 2014 at 10:37:19PM +0100, Richard Weinberger wrote:
>>>>>>> +config MTD_UBI_BLOCK_WRITE_SUPPORT
>>>>>>> +       bool "Enable write support (DANGEROUS)"
>>>>>>> +       default n
>>>>>>> +       depends on MTD_UBI_BLOCK
>>>>>>> +       select MTD_UBI_BLOCK_CACHED
>>>>>>> +       help
>>>>>>> +          This is a *very* dangerous feature. Using a regular block-oriented
>>>>>>> +          filesystem might impact heavily on a flash device wear.
>>>>>>> +          Use with extreme caution.
>>>>>>> +
>>>>>>> +          If in doubt, say "N".
>>>>>>
>>>>>> I really vote for dropping write support at all.
>>>>>
>>>>> Why ? When you put a read-only filesystem there such as squashfs, the
>>>>> only writes you'll have will be updates, and write support will be the
>>>>> only way to update the filesystem. So removing write support seriously
>>>>> impacts the usefulness of the feature itself.
>>>>
>>>> So almost everyone has to enable MTD_UBI_BLOCK_WRITE_SUPPORT?
>>>> I thought there is another way to fill the volume with data...
>>>
>>> I personally don't see the use of disabling write support on anything
>>> unless the code is broken. Better emit a warning upon first write to
>>> mention that there is limited or no wear leveling. But preventing all
>>> reasonable users from using a useful feature just to save a few ignorant
>>> from shooting themselves in the foot is non-sense in my opinion.
>>
>> As Piergiorgio wrote, one can use ubiupdatevol to update his squashfs.
>> There is simply no use case for MTD_UBI_BLOCK_WRITE_SUPPORT.
> 
> I gave an example with ext2 for the config. It's a bit excessive to
> quickly declare "there is simply no use case for $put_your_option_here",
> it just means that *you* don't have this use case, which I perfectly
> respect.

The mail with your ext2 use case arrived afterward I've sent that mail.

So you are using ext2 as config filesystem because you're facing issues with ubifs?

Thanks,
//richard
Willy Tarreau Feb. 8, 2014, 11:37 p.m. UTC | #14
On Sun, Feb 09, 2014 at 12:25:01AM +0100, Richard Weinberger wrote:
> Am 09.02.2014 00:15, schrieb Willy Tarreau:
> > On Sun, Feb 09, 2014 at 12:13:11AM +0100, Richard Weinberger wrote:
> >> Am 09.02.2014 00:01, schrieb Willy Tarreau:
> >>> On Sat, Feb 08, 2014 at 11:56:02PM +0100, Richard Weinberger wrote:
> >>>> Am 08.02.2014 23:51, schrieb Willy Tarreau:
> >>>>> On Sat, Feb 08, 2014 at 10:37:19PM +0100, Richard Weinberger wrote:
> >>>>>>> +config MTD_UBI_BLOCK_WRITE_SUPPORT
> >>>>>>> +       bool "Enable write support (DANGEROUS)"
> >>>>>>> +       default n
> >>>>>>> +       depends on MTD_UBI_BLOCK
> >>>>>>> +       select MTD_UBI_BLOCK_CACHED
> >>>>>>> +       help
> >>>>>>> +          This is a *very* dangerous feature. Using a regular block-oriented
> >>>>>>> +          filesystem might impact heavily on a flash device wear.
> >>>>>>> +          Use with extreme caution.
> >>>>>>> +
> >>>>>>> +          If in doubt, say "N".
> >>>>>>
> >>>>>> I really vote for dropping write support at all.
> >>>>>
> >>>>> Why ? When you put a read-only filesystem there such as squashfs, the
> >>>>> only writes you'll have will be updates, and write support will be the
> >>>>> only way to update the filesystem. So removing write support seriously
> >>>>> impacts the usefulness of the feature itself.
> >>>>
> >>>> So almost everyone has to enable MTD_UBI_BLOCK_WRITE_SUPPORT?
> >>>> I thought there is another way to fill the volume with data...
> >>>
> >>> I personally don't see the use of disabling write support on anything
> >>> unless the code is broken. Better emit a warning upon first write to
> >>> mention that there is limited or no wear leveling. But preventing all
> >>> reasonable users from using a useful feature just to save a few ignorant
> >>> from shooting themselves in the foot is non-sense in my opinion.
> >>
> >> As Piergiorgio wrote, one can use ubiupdatevol to update his squashfs.
> >> There is simply no use case for MTD_UBI_BLOCK_WRITE_SUPPORT.
> > 
> > I gave an example with ext2 for the config. It's a bit excessive to
> > quickly declare "there is simply no use case for $put_your_option_here",
> > it just means that *you* don't have this use case, which I perfectly
> > respect.
> 
> The mail with your ext2 use case arrived afterward I've sent that mail.

No problem.

> So you are using ext2 as config filesystem because you're facing issues with ubifs?

No, I've been using ext2 on x86-based hardware and compact flash for
something like 10 years with a great success (easy to mount, easy to
fix, easy to save, easy to occasionally add a backup copy or an extra
data file, etc). I contemplated ubifs on NAND devices as an alternative
when starting to play with ARM-based devices, and lost the reliability
and ability to fix. Switching back to the proven ext2 completely solved
the issues in the end. Ubifs is nice when you need a real read/write FS,
but most small devices do not need wear leveling or any of such nice
features. When you just write 1-10 times a year, other solutions are
fine enough. Using mtdblock directly is not reliable because of bad
blocks which come from time to time. If your FS happens to be located
on one of them, you're screwed. UBI solves such issues and ubiblock
provides a nice interface for this. I even thought about putting the
kernel on top of UBI so that it better resists NAND issues, but some
versions of u-boot do not seem to update it correctly.

In fact, my feeling is that ubiblock provides the same flexibility
with MTD as you have on new devices with eMMC. You have no wear
levelling, and so what ? You never know if your eMMC does it well
either. I even had a series of compactflash which died after a small
number of writes in the past, so that has existed and will always.

Also, all these low-level features on top of MTD are used by people
who try to build systems and who are expected to understand a little
bit some of the limits of the solutions they use. It's not the basic
joe user who will install ext4 on top of ubiblock on his NAND by
himself.

This I think it's a bad idea to artificially remove some features
if they're not broken.

Regards,
Willy
Richard Weinberger Feb. 9, 2014, 12:17 a.m. UTC | #15
Am 09.02.2014 00:37, schrieb Willy Tarreau:
>>> I gave an example with ext2 for the config. It's a bit excessive to
>>> quickly declare "there is simply no use case for $put_your_option_here",
>>> it just means that *you* don't have this use case, which I perfectly
>>> respect.
>>
>> The mail with your ext2 use case arrived afterward I've sent that mail.
> 
> No problem.
> 
>> So you are using ext2 as config filesystem because you're facing issues with ubifs?
> 
> No, I've been using ext2 on x86-based hardware and compact flash for
> something like 10 years with a great success (easy to mount, easy to
> fix, easy to save, easy to occasionally add a backup copy or an extra
> data file, etc). I contemplated ubifs on NAND devices as an alternative
> when starting to play with ARM-based devices, and lost the reliability
> and ability to fix. Switching back to the proven ext2 completely solved
> the issues in the end. Ubifs is nice when you need a real read/write FS,
> but most small devices do not need wear leveling or any of such nice
> features. When you just write 1-10 times a year, other solutions are
> fine enough. Using mtdblock directly is not reliable because of bad
> blocks which come from time to time. If your FS happens to be located
> on one of them, you're screwed. UBI solves such issues and ubiblock
> provides a nice interface for this. I even thought about putting the
> kernel on top of UBI so that it better resists NAND issues, but some
> versions of u-boot do not seem to update it correctly.

Thanks a lot for the detailed explanation.

> In fact, my feeling is that ubiblock provides the same flexibility
> with MTD as you have on new devices with eMMC. You have no wear
> levelling, and so what ? You never know if your eMMC does it well
> either. I even had a series of compactflash which died after a small
> number of writes in the past, so that has existed and will always.
> 
> Also, all these low-level features on top of MTD are used by people
> who try to build systems and who are expected to understand a little
> bit some of the limits of the solutions they use. It's not the basic
> joe user who will install ext4 on top of ubiblock on his NAND by
> himself.

My experience has shown the opposite. ;-)

> This I think it's a bad idea to artificially remove some features
> if they're not broken.

Your arguments have convinced me, let's keep it and hope the best.

Thanks,
//richard
Willy Tarreau Feb. 9, 2014, 7:51 a.m. UTC | #16
On Sun, Feb 09, 2014 at 01:17:26AM +0100, Richard Weinberger wrote:
> > Also, all these low-level features on top of MTD are used by people
> > who try to build systems and who are expected to understand a little
> > bit some of the limits of the solutions they use. It's not the basic
> > joe user who will install ext4 on top of ubiblock on his NAND by
> > himself.
> 
> My experience has shown the opposite. ;-)

When these happen, it's most likely because they copy-paste a howto
from a blog, so anyway they'll blindly follow whatever appears on the
blog including patching their kernel or forcing whatever option :-)

> > This I think it's a bad idea to artificially remove some features
> > if they're not broken.
> 
> Your arguments have convinced me, let's keep it and hope the best.

OK thanks!
Willy
Richard Weinberger Feb. 9, 2014, 10:56 p.m. UTC | #17
On Wed, Jan 29, 2014 at 9:38 PM, Ezequiel Garcia
<ezequiel.garcia@free-electrons.com> wrote:
> Block device emulation on top of ubi volumes with cached read/write support.
> Both the cached access and the write support are selectable at compile time.
>
> Given UBI takes care of wear leveling and bad block management it's possible
> to add a thin layer to enable block device access to UBI volumes.
> This allows to use a block-oriented filesystem on a flash device.
>
> In a similar fashion to mtdblock, a 1-LEB size cache has been
> implemented. However, very memory-constrained systems can choose to
> disable the cache and save the 1-LEB byte allocation.
>
> If write support is enabled, the flash device will be written when the cache
> is flushed. The following events trigger this:
> * block device release (detach)
> * different than cached leb is accessed
> * io-barrier is received through a REQ_FLUSH request
>
> Despite this efforts, it's very important to remember that regular
> block-oriented filesystems have no care at all about wear leveling;
> they will access the block device randomly, only caring for performance.
> Therefore, write support should be selected only for development and
> with extreme caution.
>
> The cache is 1-LEB bytes, vmalloced at open() and freed at release();
> in addition, each block device has a workqueue associated.
>
> Block devices are created upon user request through new ioctls:
> UBI_IOCVOLATTBLK to attach and UBI_IOCVOLDETBLK to detach.
> Also, a new UBI module parameter is added 'ubi.block'. This parameter is
> needed in order to attach a block device on boot-up time, allowing to
> mount the rootfs on a ubiblock device.
> For instance, you could have these kernel parameters:
>
>   ubi.mtd=5 ubi.block=0,0 root=/dev/ubiblock0_0
>
> Or, if you compile ubi as a module:
>
>   $ modprobe ubi mtd=/dev/mtd5 block=/dev/ubi0_0
>
> Cc: Artem Bityutskiy <dedekind1@gmail.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Brian Norris <computersforpeace@gmail.com>
> Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
> Cc: Tim Bird <tim.bird@am.sony.com>
> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> Cc: Mike Frysinger <vapier@gentoo.org>
> Cc: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
> Cc: Willy Tarreau <w@1wt.eu>
> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> ---
>  drivers/mtd/ubi/Kconfig     |  42 +++
>  drivers/mtd/ubi/Makefile    |   1 +
>  drivers/mtd/ubi/block.c     | 899 ++++++++++++++++++++++++++++++++++++++++++++
>  drivers/mtd/ubi/build.c     |   5 +
>  drivers/mtd/ubi/cdev.c      |  20 +
>  drivers/mtd/ubi/ubi.h       |  14 +
>  include/uapi/mtd/ubi-user.h |  11 +
>  7 files changed, 992 insertions(+)
>  create mode 100644 drivers/mtd/ubi/block.c
>
> diff --git a/drivers/mtd/ubi/Kconfig b/drivers/mtd/ubi/Kconfig
> index 36663af..2893775 100644
> --- a/drivers/mtd/ubi/Kconfig
> +++ b/drivers/mtd/ubi/Kconfig
> @@ -87,4 +87,46 @@ config MTD_UBI_GLUEBI
>            work on top of UBI. Do not enable this unless you use legacy
>            software.
>
> +config MTD_UBI_BLOCK
> +       bool "Block device access to UBI volumes"
> +       default n
> +       help
> +          Since UBI already takes care of eraseblock wear leveling
> +          and bad block handling, it's possible to implement a block
> +          device on top of it and therefore mount regular filesystems
> +          (i.e. not flash-oriented, as ext4).
> +          In other words, this is a software flash translation layer.
> +
> +          This can be particularly interesting to allow mounting a read-only
> +          filesystem, such as squashfs, on a NAND device.
> +
> +          When selected, this feature will be built-in the ubi module
> +          and block devices will be able to be created and removed using
> +          the userspace 'ubiblkvol' tool (provided mtd-utils).
> +
> +          If in doubt, say "N".
> +
> +config MTD_UBI_BLOCK_CACHED
> +       bool "Enable cached UBI block access"
> +       default y
> +       depends on MTD_UBI_BLOCK
> +       help
> +          In order to reduce flash device access, this option enables a 1-LEB
> +          sized cache. For read-only access this can be an overkill, given
> +          filesystems most likely implement their own caching policies.
> +          Moreover, since a LEB can be as large as ~1 MiB, memory-constrained
> +          platforms can choose to disable this.
> +
> +config MTD_UBI_BLOCK_WRITE_SUPPORT
> +       bool "Enable write support (DANGEROUS)"
> +       default n
> +       depends on MTD_UBI_BLOCK
> +       select MTD_UBI_BLOCK_CACHED
> +       help
> +          This is a *very* dangerous feature. Using a regular block-oriented
> +          filesystem might impact heavily on a flash device wear.
> +          Use with extreme caution.
> +
> +          If in doubt, say "N".
> +
>  endif # MTD_UBI
> diff --git a/drivers/mtd/ubi/Makefile b/drivers/mtd/ubi/Makefile
> index b46b0c97..4e3c3d7 100644
> --- a/drivers/mtd/ubi/Makefile
> +++ b/drivers/mtd/ubi/Makefile
> @@ -3,5 +3,6 @@ obj-$(CONFIG_MTD_UBI) += ubi.o
>  ubi-y += vtbl.o vmt.o upd.o build.o cdev.o kapi.o eba.o io.o wl.o attach.o
>  ubi-y += misc.o debug.o
>  ubi-$(CONFIG_MTD_UBI_FASTMAP) += fastmap.o
> +ubi-$(CONFIG_MTD_UBI_BLOCK) += block.o
>
>  obj-$(CONFIG_MTD_UBI_GLUEBI) += gluebi.o
> diff --git a/drivers/mtd/ubi/block.c b/drivers/mtd/ubi/block.c
> new file mode 100644
> index 0000000..6a7dc00
> --- /dev/null
> +++ b/drivers/mtd/ubi/block.c
> @@ -0,0 +1,899 @@
> +/*
> + * Copyright (c) 2014 Ezequiel Garcia
> + * Copyright (c) 2011 Free Electrons
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, version 2.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
> + * the GNU General Public License for more details.
> + *
> + * A block implementation on UBI volume
> + * ------------------------------------
> + *
> + * How to use this? A bunch of examples worth a thousand words:
> + *
> + * If you want to attach a block device on bootup time, e.g. in order
> + * to mount the rootfs on such a block device:
> + *
> + * Using the UBI volume path:
> + *  ubi.block=/dev/ubi0_0
> + *
> + * Using the UBI device, and the volume name:
> + *  ubi.block=0,rootfs
> + *
> + * Using both UBI device number and UBI volume number:
> + *  ubi.block=0,0
> + *
> + * In this case you would have such kernel parameters:
> + *  ubi.mtd=5 ubi.block=0,0 root=/dev/ubiblock0_0
> + *
> + * Of course, if you compile ubi as a module you can use this
> + * parameter on module insertion time:
> + *  modprobe ubi mtd=/dev/mtd5 block=/dev/ubi0_0
> + *
> + * For runtime block attaching/detaching, see mtd-utils' ubiblkvol tool.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/err.h>
> +#include <linux/kernel.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mtd/ubi.h>
> +#include <linux/workqueue.h>
> +#include <linux/blkdev.h>
> +#include <linux/hdreg.h>
> +
> +#include "ubi-media.h"
> +#include "ubi.h"
> +
> +/* Maximum number of supported devices */
> +#define UBIBLOCK_MAX_DEVICES 32
> +
> +/* Maximum length of the 'block=' parameter */
> +#define UBIBLOCK_PARAM_LEN 63
> +
> +/* Maximum number of comma-separated items in the 'block=' parameter */
> +#define UBIBLOCK_PARAM_COUNT 2
> +
> +struct ubiblock_param {
> +       int ubi_num;
> +       int vol_id;
> +       char name[UBIBLOCK_PARAM_LEN+1];
> +};
> +
> +/* Numbers of elements set in the @ubiblock_param array */
> +static int ubiblock_devs __initdata;
> +
> +/* MTD devices specification parameters */
> +static struct ubiblock_param ubiblock_param[UBIBLOCK_MAX_DEVICES] __initdata;
> +
> +struct ubiblock_cache {
> +       char *buffer;
> +       enum { STATE_EMPTY, STATE_CLEAN, STATE_DIRTY } state;
> +       int leb_num;
> +};
> +
> +struct ubiblock {
> +       struct ubi_volume_desc *desc;
> +       struct ubi_volume_info *vi;
> +       int ubi_num;
> +       int vol_id;
> +       int refcnt;
> +
> +       struct gendisk *gd;
> +       struct request_queue *rq;
> +
> +       struct workqueue_struct *wq;
> +       struct work_struct work;
> +
> +       struct mutex vol_mutex;
> +       spinlock_t queue_lock;
> +       struct list_head list;
> +
> +       int leb_size;
> +#ifdef CONFIG_MTD_UBI_BLOCK_CACHED
> +       struct ubiblock_cache cache;
> +#endif
> +};
> +
> +/* Linked list of all ubiblock instances */
> +static LIST_HEAD(ubiblock_devices);
> +static DEFINE_MUTEX(devices_mutex);
> +static int ubiblock_major;
> +
> +/* Ugh! this parameter parsing code is awful */
> +static int __init ubiblock_set_param(const char *val,
> +                                    const struct kernel_param *kp)
> +{
> +       int i, ret;
> +       size_t len;
> +       struct ubiblock_param *param;
> +       char buf[UBIBLOCK_PARAM_LEN];
> +       char *pbuf = &buf[0];
> +       char *tokens[UBIBLOCK_PARAM_COUNT];
> +
> +       if (!val)
> +               return -EINVAL;
> +
> +       len = strnlen(val, UBIBLOCK_PARAM_LEN);
> +       if (len == 0) {
> +               ubi_warn("block: empty 'block=' parameter - ignored\n");
> +               return 0;
> +       }
> +
> +       if (len == UBIBLOCK_PARAM_LEN) {
> +               ubi_err("block: parameter \"%s\" is too long, max. is %d\n",
> +                       val, UBIBLOCK_PARAM_LEN);
> +               return -EINVAL;
> +       }
> +
> +       strcpy(buf, val);
> +
> +       /* Get rid of the final newline */
> +       if (buf[len - 1] == '\n')
> +               buf[len - 1] = '\0';
> +
> +       for (i = 0; i < UBIBLOCK_PARAM_COUNT; i++)
> +               tokens[i] = strsep(&pbuf, ",");
> +
> +       param = &ubiblock_param[ubiblock_devs];
> +       if (tokens[1]) {
> +               /* Two parameters: can be 'ubi, vol_id' or 'ubi, vol_name' */
> +               ret = kstrtoint(tokens[0], 10, &param->ubi_num);
> +               if (ret < 0)
> +                       return -EINVAL;
> +
> +               /* Second param can be a number or a name */
> +               ret = kstrtoint(tokens[1], 10, &param->vol_id);
> +               if (ret < 0) {
> +                       param->vol_id = -1;
> +                       strcpy(param->name, tokens[1]);
> +               }
> +
> +       } else {
> +               /* One parameter: must be device path */
> +               strcpy(param->name, tokens[0]);
> +               param->ubi_num = -1;
> +               param->vol_id = -1;
> +       }
> +
> +       ubiblock_devs++;
> +
> +       return 0;
> +}
> +
> +static const struct kernel_param_ops ubiblock_param_ops = {
> +       .set    = ubiblock_set_param,
> +};
> +module_param_cb(block, &ubiblock_param_ops, NULL, 0);
> +MODULE_PARM_DESC(block, "Attach block devices to UBI volumes. Parameter format: block=<path|dev,num|dev,name>.\n"
> +                       "Multiple \"block\" parameters may be specified.\n"
> +                       "UBI volumes may be specified by their number, name, or path to the device node.\n"
> +                       "Examples\n"
> +                       "Using the UBI volume path:\n"
> +                       "ubi.block=/dev/ubi0_0\n"
> +                       "Using the UBI device, and the volume name:\n"
> +                       "ubi.block=0,rootfs\n"
> +                       "Using both UBI device number and UBI volume number:\n"
> +                       "ubi.block=0,0\n");
> +
> +static struct ubiblock *find_dev_nolock(int ubi_num, int vol_id)
> +{
> +       struct ubiblock *dev;
> +
> +       list_for_each_entry(dev, &ubiblock_devices, list)
> +               if (dev->ubi_num == ubi_num && dev->vol_id == vol_id)
> +                       return dev;
> +       return NULL;
> +}
> +
> +#ifdef CONFIG_MTD_UBI_BLOCK_CACHED
> +static void ubiblock_clean_cache(struct ubiblock *dev)
> +{
> +       dev->cache.leb_num = -1;
> +       dev->cache.state = STATE_EMPTY;
> +}
> +
> +static void ubiblock_free_cache(struct ubiblock *dev)
> +{
> +       dev->cache.leb_num = -1;
> +       dev->cache.state = STATE_EMPTY;
> +       vfree(dev->cache.buffer);
> +       dev->cache.buffer = NULL;
> +}
> +
> +static int ubiblock_alloc_cache(struct ubiblock *dev)
> +{
> +       dev->cache.state = STATE_EMPTY;
> +       dev->cache.leb_num = -1;
> +       dev->cache.buffer = vmalloc(dev->leb_size);
> +       if (!dev->cache.buffer)
> +               return -ENOMEM;
> +       return 0;
> +}
> +
> +static bool leb_in_cache(struct ubiblock_cache *cache, int leb_num)
> +{
> +       return cache->leb_num == leb_num;
> +}
> +
> +static int ubiblock_fill_cache(struct ubiblock *dev, int leb_num,
> +                              struct ubiblock_cache *cache)
> +{
> +       int ret;
> +
> +       /* Warn if we fill cache while being dirty */
> +       WARN_ON(cache->state == STATE_DIRTY);
> +
> +       cache->leb_num = leb_num;
> +       cache->state = STATE_CLEAN;
> +
> +       ret = ubi_read(dev->desc, leb_num, cache->buffer, 0,
> +                      dev->leb_size);
> +       if (ret) {
> +               ubi_err("%s ubi_read error %d", dev->gd->disk_name, ret);
> +               return ret;
> +       }
> +       return 0;
> +}
> +
> +static int ubiblock_read_to_buf(struct ubiblock *dev, char *buffer,
> +                               int leb, int offset, int len)
> +{
> +       int ret;
> +       char *cache_buffer;
> +       /*
> +        * First try in cache, if it's not there load this leb.
> +        * Note that reading never flushes to disk!
> +        */
> +       if (leb_in_cache(&dev->cache, leb)) {
> +               cache_buffer = dev->cache.buffer;
> +       } else {

I think you have to flush the cache here too.
With RW support enabled you can end up with a dirty cache here.

> +               /* Leb is not in cache: fill it! */
> +               ret = ubiblock_fill_cache(dev, leb, &dev->cache);
> +               if (ret)
> +                       return ret;
> +               cache_buffer = dev->cache.buffer;
> +       }
> +       memcpy(buffer, cache_buffer + offset, len);
> +       return 0;
> +}
> +#else
> +static inline void ubiblock_clean_cache(struct ubiblock *dev) {}
> +
> +static inline void ubiblock_free_cache(struct ubiblock *dev) {}
> +
> +static inline int ubiblock_alloc_cache(struct ubiblock *dev)
> +{
> +       return 0;
> +}
> +
> +static int ubiblock_read_to_buf(struct ubiblock *dev, char *buffer,
> +                               int leb, int offset, int len)
> +{
> +       int ret;
> +
> +       ret = ubi_read(dev->desc, leb, buffer, offset, len);
> +       if (ret) {
> +               ubi_err("%s ubi_read error %d",
> +                       dev->gd->disk_name, ret);
> +               return ret;
> +       }
> +       return 0;
> +}
> +#endif /* CONFIG_MTD_UBI_BLOCK_CACHED */
> +
> +#ifdef CONFIG_MTD_UBI_BLOCK_WRITE_SUPPORT
> +static int ubiblock_flush(struct ubiblock *dev, bool sync)
> +{
> +       struct ubiblock_cache *cache = &dev->cache;
> +       int ret = 0;
> +
> +       if (cache->state != STATE_DIRTY)
> +               return 0;
> +
> +       /*
> +        * mtdblock sets STATE_EMPTY, arguing that it prevents the
> +        * underlying media to get changed without notice.
> +        * I'm not fully convinced, so I just put STATE_CLEAN.
> +        */
> +       cache->state = STATE_CLEAN;
> +
> +       /* Atomically change leb with buffer contents */
> +       ret = ubi_leb_change(dev->desc, cache->leb_num,
> +                            cache->buffer, dev->leb_size);
> +       if (ret) {
> +               ubi_err("%s ubi_leb_change error %d",
> +                       dev->gd->disk_name, ret);
> +               return ret;
> +       }
> +
> +       /* Sync ubi device when device is released and on block flush ioctl */
> +       if (sync)
> +               ret = ubi_sync(dev->ubi_num);
> +
> +       return ret;
> +}
> +
> +static int ubiblock_write(struct ubiblock *dev, const char *buffer,
> +                         int pos, int len)
> +{
> +       int leb, offset, ret;
> +       int bytes_left = len;
> +       int to_write = len;
> +       struct ubiblock_cache *cache = &dev->cache;
> +
> +       /* Get (leb:offset) address to write */
> +       leb = pos / dev->leb_size;
> +       offset = pos % dev->leb_size;
> +
> +       while (bytes_left) {
> +               /*
> +                * We can only write one leb at a time.
> +                * Therefore if the write length is larger than
> +                * one leb size, we split the operation.
> +                */
> +               if (offset + to_write > dev->leb_size)
> +                       to_write = dev->leb_size - offset;
> +
> +               /*
> +                * If leb is not in cache, we flush current cached
> +                * leb to disk. Cache contents will be filled by reading device.
> +                */
> +               if (!leb_in_cache(cache, leb)) {
> +
> +                       ret = ubiblock_flush(dev, false);
> +                       if (ret)
> +                               return ret;
> +
> +                       ret = ubiblock_fill_cache(dev, leb, cache);
> +                       if (ret)
> +                               return ret;
> +               }
> +
> +               memcpy(cache->buffer + offset, buffer, to_write);
> +
> +               /* This is the only place where we dirt the cache */
> +               cache->state = STATE_DIRTY;
> +
> +               buffer += to_write;
> +               bytes_left -= to_write;
> +               to_write = bytes_left;
> +               offset = 0;
> +               leb++;
> +       }
> +       return 0;
> +}
> +#else
> +static int ubiblock_flush(struct ubiblock *dev, bool sync)
> +{
> +       return -EPERM;
> +}
> +
> +static int ubiblock_write(struct ubiblock *dev, const char *buffer,
> +                         int pos, int len)
> +{
> +       return -EPERM;
> +}
> +#endif /* CONFIG_MTD_UBI_BLOCK_WRITE_SUPPORT */
> +
> +static int ubiblock_read(struct ubiblock *dev, char *buffer, int pos, int len)
> +{
> +       int leb, offset, ret;
> +       int bytes_left = len;
> +       int to_read = len;
> +
> +       /* Get leb:offset address to read from */
> +       leb = pos / dev->leb_size;
> +       offset = pos % dev->leb_size;
> +
> +       while (bytes_left) {
> +
> +               /*
> +                * We can only read one leb at a time.
> +                * Therefore if the read length is larger than
> +                * one leb size, we split the operation.
> +                */
> +               if (offset + to_read > dev->leb_size)
> +                       to_read = dev->leb_size - offset;
> +
> +               ret = ubiblock_read_to_buf(dev, buffer, leb, offset, to_read);
> +               if (ret)
> +                       return ret;
> +
> +               buffer += to_read;
> +               bytes_left -= to_read;
> +               to_read = bytes_left;
> +               leb++;
> +               offset = 0;
> +       }
> +       return 0;
> +}
> +
> +static int do_ubiblock_request(struct ubiblock *dev, struct request *req)
> +{
> +       int pos, len;
> +
> +       if (req->cmd_flags & REQ_FLUSH)
> +               return ubiblock_flush(dev, true);
> +
> +       if (req->cmd_type != REQ_TYPE_FS)
> +               return -EIO;
> +
> +       if (blk_rq_pos(req) + blk_rq_cur_sectors(req) >
> +           get_capacity(req->rq_disk))
> +               return -EIO;
> +
> +       pos = blk_rq_pos(req) << 9;
> +       len = blk_rq_cur_bytes(req);
> +
> +       switch (rq_data_dir(req)) {
> +       case READ:
> +               return ubiblock_read(dev, req->buffer, pos, len);
> +       case WRITE:
> +               return ubiblock_write(dev, req->buffer, pos, len);
> +       default:
> +               return -EIO;
> +       }
> +
> +       return 0;
> +}
> +
> +static void ubiblock_do_work(struct work_struct *work)
> +{
> +       struct ubiblock *dev =
> +               container_of(work, struct ubiblock, work);
> +       struct request_queue *rq = dev->rq;
> +       struct request *req;
> +       int res;
> +
> +       spin_lock_irq(rq->queue_lock);
> +
> +       req = blk_fetch_request(rq);
> +       while (req) {
> +
> +               spin_unlock_irq(rq->queue_lock);
> +
> +               mutex_lock(&dev->vol_mutex);
> +               res = do_ubiblock_request(dev, req);
> +               mutex_unlock(&dev->vol_mutex);
> +
> +               spin_lock_irq(rq->queue_lock);
> +
> +               /*
> +                * If we're done with this request,
> +                * we need to fetch a new one
> +                */
> +               if (!__blk_end_request_cur(req, res))
> +                       req = blk_fetch_request(rq);
> +       }
> +
> +       spin_unlock_irq(rq->queue_lock);
> +}
> +
> +static void ubiblock_request(struct request_queue *rq)
> +{
> +       struct ubiblock *dev;
> +       struct request *req;
> +
> +       dev = rq->queuedata;
> +
> +       if (!dev)
> +               while ((req = blk_fetch_request(rq)) != NULL)
> +                       __blk_end_request_all(req, -ENODEV);
> +       else
> +               queue_work(dev->wq, &dev->work);
> +}
> +
> +static int ubiblock_open(struct block_device *bdev, fmode_t mode)
> +{
> +       struct ubiblock *dev = bdev->bd_disk->private_data;
> +       int ubi_mode = UBI_READONLY;
> +       int ret;
> +#ifdef CONFIG_MTD_UBI_BLOCK_WRITE_SUPPORT
> +       const bool allow_write = true;
> +#else
> +       const bool allow_write = false;
> +#endif
> +
> +       mutex_lock(&dev->vol_mutex);
> +       if (dev->refcnt > 0) {
> +               /*
> +                * The volume is already opened,
> +                * just increase the reference counter
> +                *
> +                * If the first user has oppened this as read-only,
> +                * we don't allow to open as read-write.
> +                * This is the simplest solution. A better one would
> +                * be to re-open the volume as writable.
> +                */
> +               if ((mode & FMODE_WRITE) &&
> +                   (dev->desc->mode != UBI_READWRITE)) {
> +                       ret = -EBUSY;
> +                       goto out_unlock;
> +               }
> +               goto out_done;
> +       }
> +
> +       if (allow_write) {
> +               if (mode & FMODE_WRITE)
> +                       ubi_mode = UBI_READWRITE;
> +       } else {
> +               if (mode & FMODE_WRITE) {
> +                       ret = -EPERM;
> +                       goto out_unlock;
> +               }
> +       }
> +
> +       dev->desc = ubi_open_volume(dev->ubi_num, dev->vol_id, ubi_mode);
> +       if (IS_ERR(dev->desc)) {
> +               ubi_err("%s failed to open ubi volume %d_%d",
> +                       dev->gd->disk_name, dev->ubi_num, dev->vol_id);
> +
> +               ret = PTR_ERR(dev->desc);
> +               dev->desc = NULL;
> +               goto out_unlock;
> +       }
> +
> +       dev->vi = kzalloc(sizeof(struct ubi_volume_info), GFP_KERNEL);
> +       if (!dev->vi) {
> +               ret = -ENOMEM;
> +               goto out_close;
> +       }
> +       ubi_get_volume_info(dev->desc, dev->vi);
> +       dev->leb_size = dev->vi->usable_leb_size;
> +
> +       ret = ubiblock_alloc_cache(dev);
> +       if (ret)
> +               goto out_free;
> +out_done:
> +       dev->refcnt++;
> +       mutex_unlock(&dev->vol_mutex);
> +       return 0;
> +
> +out_free:
> +       kfree(dev->vi);
> +out_close:
> +       ubi_close_volume(dev->desc);
> +       dev->desc = NULL;
> +out_unlock:
> +       mutex_unlock(&dev->vol_mutex);
> +       return ret;
> +}
> +
> +static void ubiblock_release(struct gendisk *gd, fmode_t mode)
> +{
> +       struct ubiblock *dev = gd->private_data;
> +
> +       mutex_lock(&dev->vol_mutex);
> +
> +       dev->refcnt--;
> +       if (dev->refcnt == 0) {
> +               ubiblock_flush(dev, true);
> +               ubiblock_free_cache(dev);
> +
> +               kfree(dev->vi);
> +               ubi_close_volume(dev->desc);
> +
> +               dev->vi = NULL;
> +               dev->desc = NULL;
> +       }
> +
> +       mutex_unlock(&dev->vol_mutex);
> +}
> +
> +static int ubiblock_ioctl(struct block_device *bdev, fmode_t mode,
> +                             unsigned int cmd, unsigned long arg)
> +{
> +       struct ubiblock *dev = bdev->bd_disk->private_data;
> +       int ret = -ENXIO;
> +
> +       if (!dev)
> +               return ret;
> +
> +       mutex_lock(&dev->vol_mutex);
> +
> +       /* I can't get this to get called. What's going on? */
> +       switch (cmd) {
> +       case BLKFLSBUF:
> +               ret = ubiblock_flush(dev, true);
> +               break;
> +       default:
> +               ret = -ENOTTY;
> +       }
> +
> +       mutex_unlock(&dev->vol_mutex);
> +       return ret;
> +}
> +
> +static int ubiblock_getgeo(struct block_device *bdev, struct hd_geometry *geo)
> +{
> +       /* Some tools might require this information */
> +       geo->heads = 1;
> +       geo->cylinders = 1;
> +       geo->sectors = get_capacity(bdev->bd_disk);
> +       geo->start = 0;
> +       return 0;
> +}
> +
> +static const struct block_device_operations ubiblock_ops = {
> +       .owner = THIS_MODULE,
> +       .open = ubiblock_open,
> +       .release = ubiblock_release,
> +       .ioctl = ubiblock_ioctl,
> +       .getgeo = ubiblock_getgeo,
> +};
> +
> +int ubiblock_add(struct ubi_volume_info *vi)
> +{
> +       struct ubiblock *dev;
> +       struct gendisk *gd;
> +       int disk_capacity;
> +       int ret;
> +
> +       /* Check that the volume isn't already handled */
> +       mutex_lock(&devices_mutex);
> +       if (find_dev_nolock(vi->ubi_num, vi->vol_id)) {
> +               mutex_unlock(&devices_mutex);
> +               return -EEXIST;
> +       }
> +       mutex_unlock(&devices_mutex);
> +
> +       dev = kzalloc(sizeof(struct ubiblock), GFP_KERNEL);
> +       if (!dev)
> +               return -ENOMEM;
> +
> +       mutex_init(&dev->vol_mutex);
> +
> +       dev->ubi_num = vi->ubi_num;
> +       dev->vol_id = vi->vol_id;
> +
> +       ubiblock_clean_cache(dev);
> +
> +       /* Initialize the gendisk of this ubiblock device */
> +       gd = alloc_disk(1);
> +       if (!gd) {
> +               ubi_err("block: alloc_disk failed");
> +               ret = -ENODEV;
> +               goto out_free_dev;
> +       }
> +
> +       gd->fops = &ubiblock_ops;
> +       gd->major = ubiblock_major;
> +       gd->first_minor = dev->ubi_num * UBI_MAX_VOLUMES + dev->vol_id;
> +       gd->private_data = dev;
> +       sprintf(gd->disk_name, "ubiblock%d_%d", dev->ubi_num, dev->vol_id);
> +       disk_capacity = (vi->size * vi->usable_leb_size) >> 9;
> +       set_capacity(gd, disk_capacity);
> +       dev->gd = gd;
> +
> +       spin_lock_init(&dev->queue_lock);
> +       dev->rq = blk_init_queue(ubiblock_request, &dev->queue_lock);
> +       if (!dev->rq) {
> +               ubi_err("block: blk_init_queue failed");
> +               ret = -ENODEV;
> +               goto out_put_disk;
> +       }
> +
> +       dev->rq->queuedata = dev;
> +       dev->gd->queue = dev->rq;
> +
> +       blk_queue_flush(dev->rq, REQ_FLUSH);
> +
> +       /*
> +        * Create one workqueue per volume (per registered block device).
> +        * Rembember workqueues are cheap, they're not threads.
> +        */
> +       dev->wq = alloc_workqueue(gd->disk_name, 0, 0);
> +       if (!dev->wq)
> +               goto out_free_queue;
> +       INIT_WORK(&dev->work, ubiblock_do_work);
> +
> +       mutex_lock(&devices_mutex);
> +       list_add_tail(&dev->list, &ubiblock_devices);
> +       mutex_unlock(&devices_mutex);
> +
> +       /* Must be the last step: anyone can call file ops from now on */
> +       add_disk(dev->gd);
> +
> +       ubi_msg("%s created from ubi%d:%d(%s)",
> +               dev->gd->disk_name, dev->ubi_num, dev->vol_id, vi->name);
> +
> +       return 0;
> +
> +out_free_queue:
> +       blk_cleanup_queue(dev->rq);
> +out_put_disk:
> +       put_disk(dev->gd);
> +out_free_dev:
> +       kfree(dev);
> +
> +       return ret;
> +}
> +
> +static void ubiblock_cleanup(struct ubiblock *dev)
> +{
> +       del_gendisk(dev->gd);
> +       blk_cleanup_queue(dev->rq);
> +       ubi_msg("%s released", dev->gd->disk_name);
> +       put_disk(dev->gd);
> +}
> +
> +int ubiblock_del(struct ubi_volume_info *vi)
> +{
> +       struct ubiblock *dev;
> +
> +       mutex_lock(&devices_mutex);
> +       dev = find_dev_nolock(vi->ubi_num, vi->vol_id);
> +       if (!dev) {
> +               mutex_unlock(&devices_mutex);
> +               return -ENODEV;
> +       }
> +
> +       /* Found a device, let's lock it so we can check if it's busy */
> +       mutex_lock(&dev->vol_mutex);
> +
> +       if (dev->refcnt > 0) {
> +               mutex_unlock(&dev->vol_mutex);
> +               mutex_unlock(&devices_mutex);
> +               return -EBUSY;
> +       }
> +
> +       /* Remove from device list */
> +       list_del(&dev->list);
> +       mutex_unlock(&devices_mutex);
> +
> +       /* Flush pending work and stop this workqueue */
> +       destroy_workqueue(dev->wq);
> +
> +       ubiblock_cleanup(dev);
> +       mutex_unlock(&dev->vol_mutex);
> +       kfree(dev);
> +       return 0;
> +}
> +
> +static void ubiblock_resize(struct ubi_volume_info *vi)
> +{
> +       struct ubiblock *dev;
> +       int disk_capacity;
> +
> +       /*
> +        * Need to lock the device list until we stop using the device,
> +        * otherwise the device struct might get released in ubiblock_del().
> +        */
> +       mutex_lock(&devices_mutex);
> +       dev = find_dev_nolock(vi->ubi_num, vi->vol_id);
> +       if (!dev) {
> +               mutex_unlock(&devices_mutex);
> +               return;
> +       }
> +
> +       mutex_lock(&dev->vol_mutex);
> +       disk_capacity = (vi->size * vi->usable_leb_size) >> 9;
> +       set_capacity(dev->gd, disk_capacity);
> +       ubi_msg("%s resized to %d LEBs", dev->gd->disk_name, vi->size);
> +       mutex_unlock(&dev->vol_mutex);
> +       mutex_unlock(&devices_mutex);
> +}
> +
> +static int ubiblock_notify(struct notifier_block *nb,
> +                        unsigned long notification_type, void *ns_ptr)
> +{
> +       struct ubi_notification *nt = ns_ptr;
> +
> +       switch (notification_type) {
> +       case UBI_VOLUME_ADDED:
> +               /*
> +                * We want to enforce explicit block device attaching for
> +                * volumes; so when a volume is added we do nothing.
> +                */
> +               break;
> +       case UBI_VOLUME_REMOVED:
> +               ubiblock_del(&nt->vi);
> +               break;
> +       case UBI_VOLUME_RESIZED:
> +               ubiblock_resize(&nt->vi);
> +               break;
> +       default:
> +               break;
> +       }
> +       return NOTIFY_OK;
> +}
> +
> +static struct notifier_block ubiblock_notifier = {
> +       .notifier_call = ubiblock_notify,
> +};
> +
> +static struct ubi_volume_desc * __init
> +open_volume_desc(const char *name, int ubi_num, int vol_id)
> +{
> +       if (ubi_num == -1)
> +               /* No ubi num, name must be a vol device path */
> +               return ubi_open_volume_path(name, UBI_READONLY);
> +       else if (vol_id == -1)
> +               /* No vol_id, must be vol_name */
> +               return ubi_open_volume_nm(ubi_num, name, UBI_READONLY);
> +       else
> +               return ubi_open_volume(ubi_num, vol_id, UBI_READONLY);
> +}
> +
> +static void __init ubiblock_add_from_param(void)
> +{
> +       int i, ret;
> +       struct ubiblock_param *p;
> +       struct ubi_volume_desc *desc;
> +       struct ubi_volume_info vi;
> +
> +       for (i = 0; i < ubiblock_devs; i++) {
> +               p = &ubiblock_param[i];
> +
> +               desc = open_volume_desc(p->name, p->ubi_num, p->vol_id);
> +               if (IS_ERR(desc)) {
> +                       ubi_warn("block: can't open volume, err=%ld\n",
> +                                PTR_ERR(desc));
> +                       continue;
> +               }
> +
> +               ubi_get_volume_info(desc, &vi);
> +               ret = ubiblock_add(&vi);
> +               if (ret)
> +                       ubi_warn("block: can't add '%s' volume, err=%d\n",
> +                                vi.name, ret);
> +               ubi_close_volume(desc);
> +       }
> +}
> +
> +int __init ubiblock_init(void)
> +{
> +       int ret;
> +
> +       ubiblock_major = register_blkdev(0, "ubiblock");
> +       if (ubiblock_major < 0)
> +               return ubiblock_major;
> +
> +       /* Attach block devices from 'block=' module param */
> +       ubiblock_add_from_param();
> +
> +       /*
> +        * Block devices needs to be attached to volumes explicitly
> +        * upon user request. So we ignore existing volumes.
> +        */
> +       ret = ubi_register_volume_notifier(&ubiblock_notifier, 1);
> +       if (ret < 0)
> +               unregister_blkdev(ubiblock_major, "ubiblock");
> +       return ret;
> +}
> +
> +void __exit ubiblock_exit(void)
> +{
> +       struct ubiblock *next;
> +       struct ubiblock *dev;
> +
> +       ubi_unregister_volume_notifier(&ubiblock_notifier);
> +
> +       list_for_each_entry_safe(dev, next, &ubiblock_devices, list) {
> +
> +               /* Flush pending work and stop workqueue */
> +               destroy_workqueue(dev->wq);
> +
> +               /* The module is being forcefully removed */
> +               WARN_ON(dev->desc);
> +
> +               /* Remove from device list */
> +               list_del(&dev->list);
> +
> +               ubiblock_cleanup(dev);
> +
> +               kfree(dev);
> +       }
> +
> +       unregister_blkdev(ubiblock_major, "ubiblock");
> +}
> diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c
> index e05dc62..2705bad 100644
> --- a/drivers/mtd/ubi/build.c
> +++ b/drivers/mtd/ubi/build.c
> @@ -1296,6 +1296,9 @@ static int __init ubi_init(void)
>                 }
>         }
>
> +       if (ubiblock_init() < 0)
> +               ubi_warn("cannot init block device access");
> +
>         return 0;
>
>  out_detach:
> @@ -1324,6 +1327,8 @@ static void __exit ubi_exit(void)
>  {
>         int i;
>
> +       ubiblock_exit();
> +
>         for (i = 0; i < UBI_MAX_DEVICES; i++)
>                 if (ubi_devices[i]) {
>                         mutex_lock(&ubi_devices_mutex);
> diff --git a/drivers/mtd/ubi/cdev.c b/drivers/mtd/ubi/cdev.c
> index 8ca49f2..39d3774 100644
> --- a/drivers/mtd/ubi/cdev.c
> +++ b/drivers/mtd/ubi/cdev.c
> @@ -561,6 +561,26 @@ static long vol_cdev_ioctl(struct file *file, unsigned int cmd,
>                 break;
>         }
>
> +       /* Attach a block device to an UBI volume */
> +       case UBI_IOCVOLATTBLK:
> +       {
> +               struct ubi_volume_info vi;
> +
> +               ubi_get_volume_info(desc, &vi);
> +               err = ubiblock_add(&vi);
> +               break;
> +       }
> +
> +       /* Dettach a block device from an UBI volume */
> +       case UBI_IOCVOLDETBLK:
> +       {
> +               struct ubi_volume_info vi;
> +
> +               ubi_get_volume_info(desc, &vi);
> +               err = ubiblock_del(&vi);
> +               break;
> +       }
> +
>         default:
>                 err = -ENOTTY;
>                 break;
> diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
> index 8ea6297..e76ff98 100644
> --- a/drivers/mtd/ubi/ubi.h
> +++ b/drivers/mtd/ubi/ubi.h
> @@ -864,6 +864,20 @@ int ubi_update_fastmap(struct ubi_device *ubi);
>  int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
>                      int fm_anchor);
>
> +/* block.c */
> +#ifdef CONFIG_MTD_UBI_BLOCK
> +int ubiblock_init(void);
> +void ubiblock_exit(void);
> +int ubiblock_add(struct ubi_volume_info *vi);
> +int ubiblock_del(struct ubi_volume_info *vi);
> +#else
> +static inline int ubiblock_init(void) { return 0; }
> +static inline void ubiblock_exit(void) {}
> +static inline int ubiblock_add(struct ubi_volume_info *vi) { return -ENOTTY; }
> +static inline int ubiblock_del(struct ubi_volume_info *vi) { return -ENOTTY; }
> +#endif
> +
> +
>  /*
>   * ubi_rb_for_each_entry - walk an RB-tree.
>   * @rb: a pointer to type 'struct rb_node' to use as a loop counter
> diff --git a/include/uapi/mtd/ubi-user.h b/include/uapi/mtd/ubi-user.h
> index 723c324..1080762 100644
> --- a/include/uapi/mtd/ubi-user.h
> +++ b/include/uapi/mtd/ubi-user.h
> @@ -134,6 +134,13 @@
>   * used. A pointer to a &struct ubi_set_vol_prop_req object is expected to be
>   * passed. The object describes which property should be set, and to which value
>   * it should be set.
> + *
> + * Block device access to UBI volumes
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + *
> + * To attach or detach a block device from an UBI volume the %UBI_IOCVOLATTBLK
> + * and %UBI_IOCVOLDETBLK ioctl commands should be used, respectively.
> + * These commands take no arguments.
>   */
>
>  /*
> @@ -191,6 +198,10 @@
>  /* Set an UBI volume property */
>  #define UBI_IOCSETVOLPROP _IOW(UBI_VOL_IOC_MAGIC, 6, \
>                                struct ubi_set_vol_prop_req)
> +/* Attach a block device to an UBI volume */
> +#define UBI_IOCVOLATTBLK _IO(UBI_VOL_IOC_MAGIC, 7)
> +/* Detach a block device from an UBI volume */
> +#define UBI_IOCVOLDETBLK _IO(UBI_VOL_IOC_MAGIC, 8)
>
>  /* Maximum MTD device name length supported by UBI */
>  #define MAX_UBI_MTD_NAME_LEN 127
> --
> 1.8.1.5
>
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
Ezequiel Garcia Feb. 10, 2014, 1:29 a.m. UTC | #18
Richard,

First of all, thanks for reviewing!

<nitpick>
If at all possible, it would be better if you could [snip]
the parts of the e-mail unrelated to the discussion.
It's a bit harder to follow when you include the whole patch
in your reply. Not a big deal, of course.
</nitpick>

On Sat, Feb 08, 2014 at 10:37:19PM +0100, Richard Weinberger wrote:
> On Wed, Jan 29, 2014 at 9:38 PM, Ezequiel Garcia
> <ezequiel.garcia@free-electrons.com> wrote:
[..]
> > +
> > +config MTD_UBI_BLOCK_WRITE_SUPPORT
> > +       bool "Enable write support (DANGEROUS)"
> > +       default n
> > +       depends on MTD_UBI_BLOCK
> > +       select MTD_UBI_BLOCK_CACHED
> > +       help
> > +          This is a *very* dangerous feature. Using a regular block-oriented
> > +          filesystem might impact heavily on a flash device wear.
> > +          Use with extreme caution.
> > +
> > +          If in doubt, say "N".
> 
> I really vote for dropping write support at all.
> 

I'll reply to this later.

[..]
> > +
> > +static int ubiblock_fill_cache(struct ubiblock *dev, int leb_num,
> > +                              struct ubiblock_cache *cache)
> > +{
> > +       int ret;
> > +
> > +       /* Warn if we fill cache while being dirty */
> > +       WARN_ON(cache->state == STATE_DIRTY);
> > +
> > +       cache->leb_num = leb_num;
> > +       cache->state = STATE_CLEAN;
> > +
> > +       ret = ubi_read(dev->desc, leb_num, cache->buffer, 0,
> > +                      dev->leb_size);
> > +       if (ret) {
> > +               ubi_err("%s ubi_read error %d", dev->gd->disk_name, ret);
> > +               return ret;
> > +       }
> 
> If read fails we still end up with a valid cache entry?
> Please set STATE_CLEAN only after a successful read.
> 

Good catch. I'll fix that.

[..]
> > +
> > +               mutex_lock(&dev->vol_mutex);
> > +               res = do_ubiblock_request(dev, req);
> > +               mutex_unlock(&dev->vol_mutex);
> 
> This means that you can never do parallel IO?
> 

Indeed. Feel free to prepare a follow-up patch improving it,
once this is merged.
Ezequiel Garcia Feb. 10, 2014, 2:36 a.m. UTC | #19
On Sun, Feb 09, 2014 at 11:56:24PM +0100, Richard Weinberger wrote:
> On Wed, Jan 29, 2014 at 9:38 PM, Ezequiel Garcia
> <ezequiel.garcia@free-electrons.com> wrote:
[..]
> > +
> > +static int ubiblock_read_to_buf(struct ubiblock *dev, char *buffer,
> > +                               int leb, int offset, int len)
> > +{
> > +       int ret;
> > +       char *cache_buffer;
> > +       /*
> > +        * First try in cache, if it's not there load this leb.
> > +        * Note that reading never flushes to disk!
> > +        */
> > +       if (leb_in_cache(&dev->cache, leb)) {
> > +               cache_buffer = dev->cache.buffer;
> > +       } else {
> 
> I think you have to flush the cache here too.
> With RW support enabled you can end up with a dirty cache here.
> 

Yes, right. This is a left-over from the two-cache design. Now that we
have just one cache, it's possible to flush it before filling it.

Good catch!
Ezequiel Garcia Feb. 10, 2014, 2:48 a.m. UTC | #20
On Sun, Feb 09, 2014 at 08:51:57AM +0100, Willy Tarreau wrote:
[..]
> 
> > > This I think it's a bad idea to artificially remove some features
> > > if they're not broken.
> > 
> > Your arguments have convinced me, let's keep it and hope the best.
> 

Let me add that keeping the write support follows the whole "mechanism,
not policy" kernel motto, doesn't it?

Regarding users, well the option looks like this:

  [ ]     Enable write support (DANGEROUS)

I think any user would think twice before enabling it.

Regaring the maintainability: the implementation is *really* simple,
and won't add much complexity.

Let's not prevent someone from divising a wear level aware block filesystem
and using it on top of ubiblock. And perhaps f2fs already does a good job!
Keeping this support would allow someone to take the lead and get some real
numbers on that.

As I said, the implementation is so simple that it seems a pity to just
drop it.

BTW, v5 (this is v4 which I forgot to add to the subject) is already on the
oven, with the fixes pointed out by Richard.
Artem Bityutskiy Feb. 10, 2014, 7:35 a.m. UTC | #21
Ezequiel,

first of all, thanks for the driver.

On Sun, 2014-02-09 at 23:48 -0300, Ezequiel Garcia wrote:
> On Sun, Feb 09, 2014 at 08:51:57AM +0100, Willy Tarreau wrote:
> [..]
> > 
> > > > This I think it's a bad idea to artificially remove some features
> > > > if they're not broken.
> > > 
> > > Your arguments have convinced me, let's keep it and hope the best.
> > 
> 
> Let me add that keeping the write support follows the whole "mechanism,
> not policy" kernel motto, doesn't it?
> 
> Regarding users, well the option looks like this:
> 
>   [ ]     Enable write support (DANGEROUS)
> 
> I think any user would think twice before enabling it.

Linus and Andrew usually ask reasonable questions like these for new
features. I'd like to ask them for the write feature.

Who are the customers for these?
How are the user of the write feature? How many?
Have it been tested? If yes, how?

These are really the things which define whether the feature should be
in or not, I think.

If write support has 0 or 1.5 customers and it was not tested
extensively, and never used in any kind of production, I am not sure it
is needed to be there. But let's first hear your answers.

It is simple is not good argument. It will be as simple to add it too.

WRT to DANGEROUS sign, people do not read Kconfig help. Some distro will
just enable this, people will start using this, and then start sending
unappy e-mails. We have this with MTD block. No matter how many times I
wrote to people that this is just a debugging module, they still kept
using it.

If this is about really few lines of code, the alternative would be to
leave them there, but just disable, and remove the Kconfig part. If
someone needs write support and starts looking inside the driver, he/she
will notice the code and start playing with that.
Richard Weinberger Feb. 10, 2014, 7:53 a.m. UTC | #22
Am 10.02.2014 02:29, schrieb Ezequiel Garcia:
>>> +
>>> +               mutex_lock(&dev->vol_mutex);
>>> +               res = do_ubiblock_request(dev, req);
>>> +               mutex_unlock(&dev->vol_mutex);
>>
>> This means that you can never do parallel IO?
>>
> 
> Indeed. Feel free to prepare a follow-up patch improving it,
> once this is merged.

Sorry, this is a very lame argument.

You need to describe why your application design has this flaw.
Modern SoC have very often more than one CPU and many filesystems can to parallel IO.
I'm sure currently the read performance of squashfs suffers because of that.

Thanks,
//richard
Ezequiel Garcia Feb. 10, 2014, 8:12 a.m. UTC | #23
On Mon, Feb 10, 2014 at 08:53:14AM +0100, Richard Weinberger wrote:
> Am 10.02.2014 02:29, schrieb Ezequiel Garcia:
> >>> +
> >>> +               mutex_lock(&dev->vol_mutex);
> >>> +               res = do_ubiblock_request(dev, req);
> >>> +               mutex_unlock(&dev->vol_mutex);
> >>
> >> This means that you can never do parallel IO?
> >>
> > 
> > Indeed. Feel free to prepare a follow-up patch improving it,
> > once this is merged.
> 
> Sorry, this is a very lame argument.
> 
> You need to describe why your application design has this flaw.

Not at all. It's perfectly fine to merge a feature with a simple
implementation and improve it progressively. In fact, I've explicitly
chosen the simplest implementation whenever possible. We can always
get back here and improve the performance.
Artem Bityutskiy Feb. 10, 2014, 8:24 a.m. UTC | #24
On Mon, 2014-02-10 at 05:12 -0300, Ezequiel Garcia wrote:
> On Mon, Feb 10, 2014 at 08:53:14AM +0100, Richard Weinberger wrote:
> > Am 10.02.2014 02:29, schrieb Ezequiel Garcia:
> > >>> +
> > >>> +               mutex_lock(&dev->vol_mutex);
> > >>> +               res = do_ubiblock_request(dev, req);
> > >>> +               mutex_unlock(&dev->vol_mutex);
> > >>
> > >> This means that you can never do parallel IO?
> > >>
> > > 
> > > Indeed. Feel free to prepare a follow-up patch improving it,
> > > once this is merged.
> > 
> > Sorry, this is a very lame argument.
> > 
> > You need to describe why your application design has this flaw.
> 
> Not at all. It's perfectly fine to merge a feature with a simple
> implementation and improve it progressively. In fact, I've explicitly
> chosen the simplest implementation whenever possible. We can always
> get back here and improve the performance.

The NAND part of the MTD layer serializes all the I/O, so probably it is
OK. May be needs to be documented, though. May be a comment in the code
would be nice to have too.

I mean, it is fine to have limitations, just be explicit and open about
all of them, in order to make your user aware of what to expect.

Would you be able to write a small article for the MTD web site about
the driver, may be some I/O figures there too, the limitations too? And
send a patch against mtd-www.git
Ezequiel Garcia Feb. 10, 2014, 8:27 a.m. UTC | #25
On Mon, Feb 10, 2014 at 09:35:50AM +0200, Artem Bityutskiy wrote:
> On Sun, 2014-02-09 at 23:48 -0300, Ezequiel Garcia wrote:
> > On Sun, Feb 09, 2014 at 08:51:57AM +0100, Willy Tarreau wrote:
> > [..]
> > > 
> > > > > This I think it's a bad idea to artificially remove some features
> > > > > if they're not broken.
> > > > 
> > > > Your arguments have convinced me, let's keep it and hope the best.
> > > 
> > 
> > Let me add that keeping the write support follows the whole "mechanism,
> > not policy" kernel motto, doesn't it?
> > 
> > Regarding users, well the option looks like this:
> > 
> >   [ ]     Enable write support (DANGEROUS)
> > 
> > I think any user would think twice before enabling it.
> 
> Linus and Andrew usually ask reasonable questions like these for new
> features. I'd like to ask them for the write feature.
> 
> Who are the customers for these?
> How are the user of the write feature? How many?

I'm not aware of any. So far all the users that I'm aware of will be using
this to mount a squashfs.

> Have it been tested? If yes, how?
> 

To be honest, not much.

> These are really the things which define whether the feature should be
> in or not, I think.
> 
> If write support has 0 or 1.5 customers and it was not tested
> extensively, and never used in any kind of production, I am not sure it
> is needed to be there. But let's first hear your answers.
> 

No, this hasn't been tested intensively and I'm pretty sure nobody would
ever put it in production before conducting such tests himself.

> It is simple is not good argument. It will be as simple to add it too.
> 

OK.

> WRT to DANGEROUS sign, people do not read Kconfig help. Some distro will
> just enable this, people will start using this, and then start sending
> unappy e-mails. We have this with MTD block. No matter how many times I
> wrote to people that this is just a debugging module, they still kept
> using it.
> 

If you really think distros will enable it and users will "just it", without
thinking about the consequences, then I'd say let's just remove it.
Willy Tarreau Feb. 10, 2014, 8:37 a.m. UTC | #26
On Mon, Feb 10, 2014 at 10:24:12AM +0200, Artem Bityutskiy wrote:
> On Mon, 2014-02-10 at 05:12 -0300, Ezequiel Garcia wrote:
> > On Mon, Feb 10, 2014 at 08:53:14AM +0100, Richard Weinberger wrote:
> > > Am 10.02.2014 02:29, schrieb Ezequiel Garcia:
> > > >>> +
> > > >>> +               mutex_lock(&dev->vol_mutex);
> > > >>> +               res = do_ubiblock_request(dev, req);
> > > >>> +               mutex_unlock(&dev->vol_mutex);
> > > >>
> > > >> This means that you can never do parallel IO?
> > > >>
> > > > 
> > > > Indeed. Feel free to prepare a follow-up patch improving it,
> > > > once this is merged.
> > > 
> > > Sorry, this is a very lame argument.
> > > 
> > > You need to describe why your application design has this flaw.
> > 
> > Not at all. It's perfectly fine to merge a feature with a simple
> > implementation and improve it progressively. In fact, I've explicitly
> > chosen the simplest implementation whenever possible. We can always
> > get back here and improve the performance.
> 
> The NAND part of the MTD layer serializes all the I/O, so probably it is
> OK. May be needs to be documented, though. May be a comment in the code
> would be nice to have too.

I think so as well. Seeing how slow my NAND is on the mirabox (about
14 MB/s), I think that whatever level of parallelism we could have
would not make anything better since we're always waiting for the
device anyway. Also, such systems are generally designed to limit
access as much as possible.

Regards,
Willy
Thomas Petazzoni Feb. 10, 2014, 8:42 a.m. UTC | #27
Dear Willy Tarreau,

On Sat, 8 Feb 2014 23:51:49 +0100, Willy Tarreau wrote:

> > I really vote for dropping write support at all.
> 
> Why ? When you put a read-only filesystem there such as squashfs, the
> only writes you'll have will be updates, and write support will be the
> only way to update the filesystem. So removing write support seriously
> impacts the usefulness of the feature itself.

Why can't you use ubi_updatevol to do the update of the squashfs image?
I.e instead of updating through the block layer, you update at the UBI
layer level. No?

Thomas
Willy Tarreau Feb. 10, 2014, 8:46 a.m. UTC | #28
On Mon, Feb 10, 2014 at 05:27:15AM -0300, Ezequiel Garcia wrote:
> On Mon, Feb 10, 2014 at 09:35:50AM +0200, Artem Bityutskiy wrote:
> > On Sun, 2014-02-09 at 23:48 -0300, Ezequiel Garcia wrote:
> > > On Sun, Feb 09, 2014 at 08:51:57AM +0100, Willy Tarreau wrote:
> > > [..]
> > > > 
> > > > > > This I think it's a bad idea to artificially remove some features
> > > > > > if they're not broken.
> > > > > 
> > > > > Your arguments have convinced me, let's keep it and hope the best.
> > > > 
> > > 
> > > Let me add that keeping the write support follows the whole "mechanism,
> > > not policy" kernel motto, doesn't it?
> > > 
> > > Regarding users, well the option looks like this:
> > > 
> > >   [ ]     Enable write support (DANGEROUS)
> > > 
> > > I think any user would think twice before enabling it.
> > 
> > Linus and Andrew usually ask reasonable questions like these for new
> > features. I'd like to ask them for the write feature.
> > 
> > Who are the customers for these?
> > How are the user of the write feature? How many?
> 
> I'm not aware of any. So far all the users that I'm aware of will be using
> this to mount a squashfs.

I'm one of the users who were waiting for this for a long time. In
fact I wanted to try to write such a driver myself when I discovered
Ezequiel had already done it.

I've been experiencing trouble from time to time with initrd that
could not be mounted because the underlying NAND had bad blocks. So
I really wanted to have something to provide block device access on
top of UBI to abstract this part.

The write aspect of it provides several benefits :

  - upgrades are easy and will take care of bad blocks. In fact they
    will be as convenient as with eMMC.

  - it's possible to use small FS like ext2 to save some config and
    a few extra tools that are written a few times a year. There is
    no reason to fear wear levelling when writing a few times a year,
    and such FS are convenient for this since they're small, proven,
    and support recovery tools like fsck for the emergency situations
    which arise from time to time due to power loss.

> > Have it been tested? If yes, how?
> > 
> 
> To be honest, not much.

I have tested it a little bit. Performed squashfs upgrades, written
to ext2, and so far I didn't encounter any issue.

> > These are really the things which define whether the feature should be
> > in or not, I think.
> > 
> > If write support has 0 or 1.5 customers and it was not tested
> > extensively, and never used in any kind of production, I am not sure it
> > is needed to be there. But let's first hear your answers.
> > 
> 
> No, this hasn't been tested intensively and I'm pretty sure nobody would
> ever put it in production before conducting such tests himself.

For sure, but conversely, disabling it in the code would result in
nobody ever testing it !

> > It is simple is not good argument. It will be as simple to add it too.
> > 
> 
> OK.
> 
> > WRT to DANGEROUS sign, people do not read Kconfig help. Some distro will
> > just enable this, people will start using this, and then start sending
> > unappy e-mails. We have this with MTD block. No matter how many times I
> > wrote to people that this is just a debugging module, they still kept
> > using it.
> > 
> 
> If you really think distros will enable it and users will "just it", without
> thinking about the consequences, then I'd say let's just remove it.

In my opinion, this would result in users falling back to mtdblock as
they currently to when they want a block device. This is even worse.

I'd really like to have this feature as a standard one, it shortens
the gap which exists between MTD and eMMC which is becoming more and
more common these days, precisely because of the difficulty to deal
with NAND directly while eMMC provides the abstraction which offers
more flexibility.

Regards,
Willy
Ezequiel Garcia Feb. 10, 2014, 8:50 a.m. UTC | #29
On Mon, Feb 10, 2014 at 10:24:12AM +0200, Artem Bityutskiy wrote:
> On Mon, 2014-02-10 at 05:12 -0300, Ezequiel Garcia wrote:
> > On Mon, Feb 10, 2014 at 08:53:14AM +0100, Richard Weinberger wrote:
> > > Am 10.02.2014 02:29, schrieb Ezequiel Garcia:
> > > >>> +
> > > >>> +               mutex_lock(&dev->vol_mutex);
> > > >>> +               res = do_ubiblock_request(dev, req);
> > > >>> +               mutex_unlock(&dev->vol_mutex);
> > > >>
> > > >> This means that you can never do parallel IO?
> > > >>
> > > > 
> > > > Indeed. Feel free to prepare a follow-up patch improving it,
> > > > once this is merged.
> > > 
> > > Sorry, this is a very lame argument.
> > > 
> > > You need to describe why your application design has this flaw.
> > 
> > Not at all. It's perfectly fine to merge a feature with a simple
> > implementation and improve it progressively. In fact, I've explicitly
> > chosen the simplest implementation whenever possible. We can always
> > get back here and improve the performance.
> 
> The NAND part of the MTD layer serializes all the I/O, so probably it is
> OK. May be needs to be documented, though. May be a comment in the code
> would be nice to have too.
> 

Yes, of course.

[..]
> Would you be able to write a small article for the MTD web site about
> the driver, may be some I/O figures there too, the limitations too? And
> send a patch against mtd-www.git 
> 

Sure!
Artem Bityutskiy Feb. 10, 2014, 8:50 a.m. UTC | #30
On Mon, 2014-02-10 at 05:27 -0300, Ezequiel Garcia wrote:
> > WRT to DANGEROUS sign, people do not read Kconfig help. Some distro
> will
> > just enable this, people will start using this, and then start
> sending
> > unappy e-mails. We have this with MTD block. No matter how many
> times I
> > wrote to people that this is just a debugging module, they still
> kept
> > using it.
> > 
> 
> If you really think distros will enable it and users will "just it",
> without
> thinking about the consequences, then I'd say let's just remove it.

I do not think everyone would do this, of course not. But if there were
many users, some would, I think, yes.
Willy Tarreau Feb. 10, 2014, 8:51 a.m. UTC | #31
Hi Thomas,

On Mon, Feb 10, 2014 at 09:42:46AM +0100, Thomas Petazzoni wrote:
> Dear Willy Tarreau,
> 
> On Sat, 8 Feb 2014 23:51:49 +0100, Willy Tarreau wrote:
> 
> > > I really vote for dropping write support at all.
> > 
> > Why ? When you put a read-only filesystem there such as squashfs, the
> > only writes you'll have will be updates, and write support will be the
> > only way to update the filesystem. So removing write support seriously
> > impacts the usefulness of the feature itself.
> 
> Why can't you use ubi_updatevol to do the update of the squashfs image?
> I.e instead of updating through the block layer, you update at the UBI
> layer level. No?

It only solves one part of the problem with needless extra complexity.
I really fail to see why we would punish MTD users asking them to use
significantly different block device management interfaces than what
they can have on top of eMMC, SD, etc..., allowing to unify the FS
management methods.

Also, ubiupdatevol will offers no possibility to have a small R/W FS
to save a configuration from time to time.

Thanks,
Willy
Ezequiel Garcia Feb. 10, 2014, 2:20 p.m. UTC | #32
On Mon, Feb 10, 2014 at 09:46:16AM +0100, Willy Tarreau wrote:
> > > 
> > > If write support has 0 or 1.5 customers and it was not tested
> > > extensively, and never used in any kind of production, I am not sure it
> > > is needed to be there. But let's first hear your answers.
> > > 
> > 
> > No, this hasn't been tested intensively and I'm pretty sure nobody would
> > ever put it in production before conducting such tests himself.
> 
> For sure, but conversely, disabling it in the code would result in
> nobody ever testing it !
> 

I agree completely and it's why I wanted to have it available.

> > 
> > If you really think distros will enable it and users will "just it", without
> > thinking about the consequences, then I'd say let's just remove it.
> 
> In my opinion, this would result in users falling back to mtdblock as
> they currently to when they want a block device. This is even worse.
> 
> I'd really like to have this feature as a standard one, it shortens
> the gap which exists between MTD and eMMC which is becoming more and
> more common these days, precisely because of the difficulty to deal
> with NAND directly while eMMC provides the abstraction which offers
> more flexibility.
> 

Artem, I'd say it's your call. Want me to drop write support or not?

Unfortunately, I don't have enough time to conduct extensive testings on
that, but just simple read/write test on some filesystem as Willy did on
ext2

Quite frankly, I want to see this merged as soon as possible, so if we
are still having second thoughts, I'll submit a read-only version and
we'll see about adding write support later.
Richard Weinberger Feb. 10, 2014, 2:41 p.m. UTC | #33
Am 10.02.2014 15:20, schrieb Ezequiel Garcia:
> On Mon, Feb 10, 2014 at 09:46:16AM +0100, Willy Tarreau wrote:
>>>>
>>>> If write support has 0 or 1.5 customers and it was not tested
>>>> extensively, and never used in any kind of production, I am not sure it
>>>> is needed to be there. But let's first hear your answers.
>>>>
>>>
>>> No, this hasn't been tested intensively and I'm pretty sure nobody would
>>> ever put it in production before conducting such tests himself.
>>
>> For sure, but conversely, disabling it in the code would result in
>> nobody ever testing it !
>>
> 
> I agree completely and it's why I wanted to have it available.
> 
>>>
>>> If you really think distros will enable it and users will "just it", without
>>> thinking about the consequences, then I'd say let's just remove it.
>>
>> In my opinion, this would result in users falling back to mtdblock as
>> they currently to when they want a block device. This is even worse.
>>
>> I'd really like to have this feature as a standard one, it shortens
>> the gap which exists between MTD and eMMC which is becoming more and
>> more common these days, precisely because of the difficulty to deal
>> with NAND directly while eMMC provides the abstraction which offers
>> more flexibility.
>>
> 
> Artem, I'd say it's your call. Want me to drop write support or not?
> 
> Unfortunately, I don't have enough time to conduct extensive testings on
> that, but just simple read/write test on some filesystem as Willy did on
> ext2

Please, at lest run xfstests. :)

> Quite frankly, I want to see this merged as soon as possible, so if we
> are still having second thoughts, I'll submit a read-only version and
> we'll see about adding write support later.
> 

If you have addressed the issues I pointed out feel free to add
Reviewed-by: Richard Weinberger <richard@nod.at>

Thanks,
//richard
Artem Bityutskiy Feb. 10, 2014, 2:50 p.m. UTC | #34
On Mon, 2014-02-10 at 11:20 -0300, Ezequiel Garcia wrote:
> On Mon, Feb 10, 2014 at 09:46:16AM +0100, Willy Tarreau wrote:
> > > > 
> > > > If write support has 0 or 1.5 customers and it was not tested
> > > > extensively, and never used in any kind of production, I am not sure it
> > > > is needed to be there. But let's first hear your answers.
> > > > 
> > > 
> > > No, this hasn't been tested intensively and I'm pretty sure nobody would
> > > ever put it in production before conducting such tests himself.
> > 
> > For sure, but conversely, disabling it in the code would result in
> > nobody ever testing it !
> > 
> 
> I agree completely and it's why I wanted to have it available.
> 
> > > 
> > > If you really think distros will enable it and users will "just it", without
> > > thinking about the consequences, then I'd say let's just remove it.
> > 
> > In my opinion, this would result in users falling back to mtdblock as
> > they currently to when they want a block device. This is even worse.
> > 
> > I'd really like to have this feature as a standard one, it shortens
> > the gap which exists between MTD and eMMC which is becoming more and
> > more common these days, precisely because of the difficulty to deal
> > with NAND directly while eMMC provides the abstraction which offers
> > more flexibility.
> > 
> 
> Artem, I'd say it's your call. Want me to drop write support or not?

I guess what I am concerned of more now is not the feature, since at
least Willy used is.

I am concerned of a kernel option. All these little tiny compile-time
options are so annoying. We have so many of them.

I'd say, either support write or not.

If you support it, document its limitations in mtd-www. Print a run-time
warning that it is not power-cut-safe on module initialization, after
all. Let others improve it later if needed.

Or mark R/W as experimental and make your module to be R/O by default.
Force people to use 'blockdev --setro' if they want R/W. Run-time.

But do not add anther tiny little Kconfig option.

Just like the option for the cache that you added - please, try to make
it go away. Have only one single option - enable or disable the entire
driver.

How does this sound?

And I understand your concerns about stalling. I do not have time to
carefully review your patch. It looks good in general. And I do not know
if I will have time for review or not. I hope I will soon. But even if I
won't, I am thinking to just merge it to my tree for 3.14 anyway, unless
someone rises a blocking concern.

How does this sound?
Bityutskiy, Artem Feb. 10, 2014, 2:52 p.m. UTC | #35
On Mon, 2014-02-10 at 16:50 +0200, Artem Bityutskiy wrote:
> Or mark R/W as experimental and make your module to be R/O by default.
> Force people to use 'blockdev --setro' if they want R/W. Run-time.

Err, here I did not mean the Kconfig experimental. I am asking for no
Kconfig options for this. I more meant like tell people where the docs
are, and tell there that write support is experimental.
Bityutskiy, Artem Feb. 10, 2014, 2:53 p.m. UTC | #36
On Mon, 2014-02-10 at 16:50 +0200, Artem Bityutskiy wrote:
> Or mark R/W as experimental and make your module to be R/O by default.
> Force people to use 'blockdev --setro' if they want R/W. Run-time.

Gosh, I'm sorry, of course I meant '--setrw' here.
Willy Tarreau Feb. 10, 2014, 4:15 p.m. UTC | #37
On Mon, Feb 10, 2014 at 02:52:12PM +0000, Bityutskiy, Artem wrote:
> On Mon, 2014-02-10 at 16:50 +0200, Artem Bityutskiy wrote:
> > Or mark R/W as experimental and make your module to be R/O by default.
> > Force people to use 'blockdev --setro' if they want R/W. Run-time.
> 
> Err, here I did not mean the Kconfig experimental. I am asking for no
> Kconfig options for this. I more meant like tell people where the docs
> are, and tell there that write support is experimental.

I'd rather have no option either and have it naturally support the
features. Build-time options are always annoying, you disable them
at the beginning believing they're useless, then you regret this
when you try to use them...

Willy
Ezequiel Garcia Feb. 10, 2014, 6:48 p.m. UTC | #38
On Mon, Feb 10, 2014 at 04:50:14PM +0200, Artem Bityutskiy wrote:
[..]
> 
> I am concerned of a kernel option. All these little tiny compile-time
> options are so annoying. We have so many of them.
> 
> I'd say, either support write or not.
> 
> If you support it, document its limitations in mtd-www. Print a run-time
> warning that it is not power-cut-safe on module initialization, after
> all. Let others improve it later if needed.
> 
> Or mark R/W as experimental and make your module to be R/O by default.
> Force people to use 'blockdev --setro' if they want R/W. Run-time.
> 
> But do not add anther tiny little Kconfig option.
> 

Hm... well, this brings another question. The 1-LEB cache was made
optional after a suggestion from Piergiorgio; the reason for not
enabling the cache is to make the block interface even less memory
hungry.

Filesystems can have smarter caches, so it's not clear the cache is
needed.

Maybe we can keep a parameter for it? However, notice that the block
interface is no longer a separate module, but part of the UBI core.
Willy Tarreau Feb. 10, 2014, 9:43 p.m. UTC | #39
On Mon, Feb 10, 2014 at 10:39:18PM +0100, Piergiorgio Beruto wrote:
> Hi all,
> Please, the possibility to disable at compile time unneeded code to save
> memory and/or cpu speed would make the life of embedded developers much more
> easier.
> 
> Ubiblk was exactly what we were looking for, in my personal opinion the
> possibility to disable caching and rw feature is of very high value.

Reminds me the happy days of CONFIG_TINY :-)
So in practice it's probably just a matter of turning the options to :

    Disable Caching (default N, say Y to save memory)
    Disable Write support (default N, say Y to reduce code size)

Thanks,
Willy
Thomas Petazzoni Feb. 10, 2014, 10:37 p.m. UTC | #40
Dear Ezequiel Garcia,

On Mon, 10 Feb 2014 15:48:59 -0300, Ezequiel Garcia wrote:

> Hm... well, this brings another question. The 1-LEB cache was made
> optional after a suggestion from Piergiorgio; the reason for not
> enabling the cache is to make the block interface even less memory
> hungry.

Can you make this runtime configurable using a module parameter?

Best regards,

Thomas
Thomas Petazzoni Feb. 10, 2014, 10:46 p.m. UTC | #41
Dear Piergiorgio Beruto,

On Mon, 10 Feb 2014 23:41:54 +0100, Piergiorgio Beruto wrote:

> sorry for interrupting, but what's the benefit in using a runtime var?
> It is great to being able to reduce code size for embedded systems. And ubiblk is likely to be used for embedded.

The point about the caching feature was not really the increased code
size, but rather the memory consumption caused by the cache itself.
This can be solved using a runtime configurable module parameter.

Best regards,

Thomas
Thomas Petazzoni Feb. 10, 2014, 11:01 p.m. UTC | #42
Dear Piergiorgio Beruto,

(Please wrap your e-mail at ~72 characters per line, as is coming
practice in the open-source community)

On Mon, 10 Feb 2014 23:49:38 +0100, Piergiorgio Beruto wrote:

> Sorry, i understand, just i dont see the point in using a runtime var when the compile time option is already implemented and in my personal opinion is preferable.

That's probably because you don't understand the additional maintenance
burden caused by compile time option. When there are gazillions of
compile time options, there are gazillions of build possibilities that
a maintainer must test when applying patches, and pushing changes
upstream. This is annoying and can be avoided by limiting the number of
compile time parameters.

Of course, it's completely a matter of trade-off: we still do want to
have compile-time options for complete drivers, but we generally try to
avoid them to tune the driver behavior itself.

That being said, I can't speak for the UBI maintainer :-)

Best regards,

Thomas
Ezequiel Garcia Feb. 10, 2014, 11:19 p.m. UTC | #43
On Mon, Feb 10, 2014 at 11:46:00PM +0100, Thomas Petazzoni wrote:
> On Mon, 10 Feb 2014 23:41:54 +0100, Piergiorgio Beruto wrote:
> 
> > sorry for interrupting

Please, don't apoplogize. As a user of this you have all the right
-maybe the obligation- to interrupt :-)

> The point about the caching feature was not really the increased code
> size, but rather the memory consumption caused by the cache itself.
> This can be solved using a runtime configurable module parameter.
> 

Yes, Artem is suggestion that direction too. The write support can be
added with an already existent block device ioctl (blockdev --setrw).

As for the cache, well... we could have a module parameter. The problem
I see in that, is that the block interface is *not* a module (as I recently
pointed out). Instead, it's integrated into the UBI core.

Therefore, it would be an UBI module parameter, such as "ubi.block_cache
= yes/no". I really don't like this.

Oh, and *please* don't ask about the non-module choice. Having the block
interface as an extension of the UBI core, the userspace interface
results very intuitive.

We have a new UBI module parameter:

  $ modprobe ubi mtd=/dev/mtd5 block=/dev/ubi0_0

And a simple tool, which re-uses the already existent UBI ioctl:

  $ ubiblkvol --attach /dev/ubi0_0
  $ ubiblkvol --detach /dev/ubi0_0

The modularized approach meant a very complex userspace interface. It
seemed to require *another* control char device:

  https://lkml.org/lkml/2011/8/15/145

*** ***

On the other side, I don't really agree with all this fuzz around the
two compile options. The code is *really* dead simple, as I went to
great extents to keep it simple.

Sadly, we are not reaching any agreement. So unless anybody has a better
idea, I'd go for a solution that suits the squashfs user: no cache,
no write support.

Artem? What do you say?
Geert Uytterhoeven Feb. 11, 2014, 8:37 a.m. UTC | #44
On Mon, Feb 10, 2014 at 10:43 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Mon, Feb 10, 2014 at 10:39:18PM +0100, Piergiorgio Beruto wrote:
>> Hi all,
>> Please, the possibility to disable at compile time unneeded code to save
>> memory and/or cpu speed would make the life of embedded developers much more
>> easier.
>>
>> Ubiblk was exactly what we were looking for, in my personal opinion the
>> possibility to disable caching and rw feature is of very high value.
>
> Reminds me the happy days of CONFIG_TINY :-)
> So in practice it's probably just a matter of turning the options to :
>
>     Disable Caching (default N, say Y to save memory)
>     Disable Write support (default N, say Y to reduce code size)

But then allmodconfig builds will not exercise those code paths.

Don't you want:

    bool "Enable Caching" if !EXPERT
    default y

    bool "Enable Write support" if !EXPERT
    default y

?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Willy Tarreau Feb. 11, 2014, 9:05 a.m. UTC | #45
On Tue, Feb 11, 2014 at 09:37:25AM +0100, Geert Uytterhoeven wrote:
> On Mon, Feb 10, 2014 at 10:43 PM, Willy Tarreau <w@1wt.eu> wrote:
> > On Mon, Feb 10, 2014 at 10:39:18PM +0100, Piergiorgio Beruto wrote:
> >> Hi all,
> >> Please, the possibility to disable at compile time unneeded code to save
> >> memory and/or cpu speed would make the life of embedded developers much more
> >> easier.
> >>
> >> Ubiblk was exactly what we were looking for, in my personal opinion the
> >> possibility to disable caching and rw feature is of very high value.
> >
> > Reminds me the happy days of CONFIG_TINY :-)
> > So in practice it's probably just a matter of turning the options to :
> >
> >     Disable Caching (default N, say Y to save memory)
> >     Disable Write support (default N, say Y to reduce code size)
> 
> But then allmodconfig builds will not exercise those code paths.
> 
> Don't you want:
> 
>     bool "Enable Caching" if !EXPERT
>     default y
> 
>     bool "Enable Write support" if !EXPERT
>     default y
> 
> ?

Sounds like an excellent idea!

Willy
Ezequiel Garcia Feb. 11, 2014, 9:35 a.m. UTC | #46
On Tue, Feb 11, 2014 at 10:05:33AM +0100, Willy Tarreau wrote:
> On Tue, Feb 11, 2014 at 09:37:25AM +0100, Geert Uytterhoeven wrote:
> > On Mon, Feb 10, 2014 at 10:43 PM, Willy Tarreau <w@1wt.eu> wrote:
> > > On Mon, Feb 10, 2014 at 10:39:18PM +0100, Piergiorgio Beruto wrote:
> > >> Hi all,
> > >> Please, the possibility to disable at compile time unneeded code to save
> > >> memory and/or cpu speed would make the life of embedded developers much more
> > >> easier.
> > >>
> > >> Ubiblk was exactly what we were looking for, in my personal opinion the
> > >> possibility to disable caching and rw feature is of very high value.
> > >
> > > Reminds me the happy days of CONFIG_TINY :-)
> > > So in practice it's probably just a matter of turning the options to :
> > >
> > >     Disable Caching (default N, say Y to save memory)
> > >     Disable Write support (default N, say Y to reduce code size)
> > 
> > But then allmodconfig builds will not exercise those code paths.
> > 
> > Don't you want:
> > 
> >     bool "Enable Caching" if !EXPERT
> >     default y
> > 
> >     bool "Enable Write support" if !EXPERT
> >     default y
> > 
> > ?
> 
> Sounds like an excellent idea!
> 

FWIW, we can also use CONFIG_EMBEDDED. However, most distros seem to
enable this by default.

Artem: What do you say about Geert's suggestion?
Peter Korsgaard Feb. 11, 2014, 9:43 a.m. UTC | #47
>>>>> "Geert" == Geert Uytterhoeven <geert@linux-m68k.org> writes:

Hi,

 >> Disable Caching (default N, say Y to save memory)
 >> Disable Write support (default N, say Y to reduce code size)

 > But then allmodconfig builds will not exercise those code paths.

 > Don't you want:

 >     bool "Enable Caching" if !EXPERT
 >     default y

 >     bool "Enable Write support" if !EXPERT
 >     default y

Ehh, those should be EXPERT, not !EXPERT, right?
Geert Uytterhoeven Feb. 11, 2014, 10:21 a.m. UTC | #48
On Tue, Feb 11, 2014 at 10:43 AM, Peter Korsgaard <peter@korsgaard.com> wrote:
>  >> Disable Caching (default N, say Y to save memory)
>  >> Disable Write support (default N, say Y to reduce code size)
>
>  > But then allmodconfig builds will not exercise those code paths.
>
>  > Don't you want:
>
>  >     bool "Enable Caching" if !EXPERT
>  >     default y
>
>  >     bool "Enable Write support" if !EXPERT
>  >     default y
>
> Ehh, those should be EXPERT, not !EXPERT, right?

Yes, sorry for the mistake.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
diff mbox

Patch

diff --git a/drivers/mtd/ubi/Kconfig b/drivers/mtd/ubi/Kconfig
index 36663af..2893775 100644
--- a/drivers/mtd/ubi/Kconfig
+++ b/drivers/mtd/ubi/Kconfig
@@ -87,4 +87,46 @@  config MTD_UBI_GLUEBI
 	   work on top of UBI. Do not enable this unless you use legacy
 	   software.
 
+config MTD_UBI_BLOCK
+	bool "Block device access to UBI volumes"
+	default n
+	help
+	   Since UBI already takes care of eraseblock wear leveling
+	   and bad block handling, it's possible to implement a block
+	   device on top of it and therefore mount regular filesystems
+	   (i.e. not flash-oriented, as ext4).
+	   In other words, this is a software flash translation layer.
+
+	   This can be particularly interesting to allow mounting a read-only
+	   filesystem, such as squashfs, on a NAND device.
+
+	   When selected, this feature will be built-in the ubi module
+	   and block devices will be able to be created and removed using
+	   the userspace 'ubiblkvol' tool (provided mtd-utils).
+
+	   If in doubt, say "N".
+
+config MTD_UBI_BLOCK_CACHED
+	bool "Enable cached UBI block access"
+	default y
+	depends on MTD_UBI_BLOCK
+	help
+	   In order to reduce flash device access, this option enables a 1-LEB
+	   sized cache. For read-only access this can be an overkill, given
+	   filesystems most likely implement their own caching policies.
+	   Moreover, since a LEB can be as large as ~1 MiB, memory-constrained
+	   platforms can choose to disable this.
+
+config MTD_UBI_BLOCK_WRITE_SUPPORT
+	bool "Enable write support (DANGEROUS)"
+	default n
+	depends on MTD_UBI_BLOCK
+	select MTD_UBI_BLOCK_CACHED
+	help
+	   This is a *very* dangerous feature. Using a regular block-oriented
+	   filesystem might impact heavily on a flash device wear.
+	   Use with extreme caution.
+
+	   If in doubt, say "N".
+
 endif # MTD_UBI
diff --git a/drivers/mtd/ubi/Makefile b/drivers/mtd/ubi/Makefile
index b46b0c97..4e3c3d7 100644
--- a/drivers/mtd/ubi/Makefile
+++ b/drivers/mtd/ubi/Makefile
@@ -3,5 +3,6 @@  obj-$(CONFIG_MTD_UBI) += ubi.o
 ubi-y += vtbl.o vmt.o upd.o build.o cdev.o kapi.o eba.o io.o wl.o attach.o
 ubi-y += misc.o debug.o
 ubi-$(CONFIG_MTD_UBI_FASTMAP) += fastmap.o
+ubi-$(CONFIG_MTD_UBI_BLOCK) += block.o
 
 obj-$(CONFIG_MTD_UBI_GLUEBI) += gluebi.o
diff --git a/drivers/mtd/ubi/block.c b/drivers/mtd/ubi/block.c
new file mode 100644
index 0000000..6a7dc00
--- /dev/null
+++ b/drivers/mtd/ubi/block.c
@@ -0,0 +1,899 @@ 
+/*
+ * Copyright (c) 2014 Ezequiel Garcia
+ * Copyright (c) 2011 Free Electrons
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, version 2.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ * A block implementation on UBI volume
+ * ------------------------------------
+ *
+ * How to use this? A bunch of examples worth a thousand words:
+ *
+ * If you want to attach a block device on bootup time, e.g. in order
+ * to mount the rootfs on such a block device:
+ *
+ * Using the UBI volume path:
+ *  ubi.block=/dev/ubi0_0
+ *
+ * Using the UBI device, and the volume name:
+ *  ubi.block=0,rootfs
+ *
+ * Using both UBI device number and UBI volume number:
+ *  ubi.block=0,0
+ *
+ * In this case you would have such kernel parameters:
+ *  ubi.mtd=5 ubi.block=0,0 root=/dev/ubiblock0_0
+ *
+ * Of course, if you compile ubi as a module you can use this
+ * parameter on module insertion time:
+ *  modprobe ubi mtd=/dev/mtd5 block=/dev/ubi0_0
+ *
+ * For runtime block attaching/detaching, see mtd-utils' ubiblkvol tool.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/mtd/ubi.h>
+#include <linux/workqueue.h>
+#include <linux/blkdev.h>
+#include <linux/hdreg.h>
+
+#include "ubi-media.h"
+#include "ubi.h"
+
+/* Maximum number of supported devices */
+#define UBIBLOCK_MAX_DEVICES 32
+
+/* Maximum length of the 'block=' parameter */
+#define UBIBLOCK_PARAM_LEN 63
+
+/* Maximum number of comma-separated items in the 'block=' parameter */
+#define UBIBLOCK_PARAM_COUNT 2
+
+struct ubiblock_param {
+	int ubi_num;
+	int vol_id;
+	char name[UBIBLOCK_PARAM_LEN+1];
+};
+
+/* Numbers of elements set in the @ubiblock_param array */
+static int ubiblock_devs __initdata;
+
+/* MTD devices specification parameters */
+static struct ubiblock_param ubiblock_param[UBIBLOCK_MAX_DEVICES] __initdata;
+
+struct ubiblock_cache {
+	char *buffer;
+	enum { STATE_EMPTY, STATE_CLEAN, STATE_DIRTY } state;
+	int leb_num;
+};
+
+struct ubiblock {
+	struct ubi_volume_desc *desc;
+	struct ubi_volume_info *vi;
+	int ubi_num;
+	int vol_id;
+	int refcnt;
+
+	struct gendisk *gd;
+	struct request_queue *rq;
+
+	struct workqueue_struct *wq;
+	struct work_struct work;
+
+	struct mutex vol_mutex;
+	spinlock_t queue_lock;
+	struct list_head list;
+
+	int leb_size;
+#ifdef CONFIG_MTD_UBI_BLOCK_CACHED
+	struct ubiblock_cache cache;
+#endif
+};
+
+/* Linked list of all ubiblock instances */
+static LIST_HEAD(ubiblock_devices);
+static DEFINE_MUTEX(devices_mutex);
+static int ubiblock_major;
+
+/* Ugh! this parameter parsing code is awful */
+static int __init ubiblock_set_param(const char *val,
+				     const struct kernel_param *kp)
+{
+	int i, ret;
+	size_t len;
+	struct ubiblock_param *param;
+	char buf[UBIBLOCK_PARAM_LEN];
+	char *pbuf = &buf[0];
+	char *tokens[UBIBLOCK_PARAM_COUNT];
+
+	if (!val)
+		return -EINVAL;
+
+	len = strnlen(val, UBIBLOCK_PARAM_LEN);
+	if (len == 0) {
+		ubi_warn("block: empty 'block=' parameter - ignored\n");
+		return 0;
+	}
+
+	if (len == UBIBLOCK_PARAM_LEN) {
+		ubi_err("block: parameter \"%s\" is too long, max. is %d\n",
+			val, UBIBLOCK_PARAM_LEN);
+		return -EINVAL;
+	}
+
+	strcpy(buf, val);
+
+	/* Get rid of the final newline */
+	if (buf[len - 1] == '\n')
+		buf[len - 1] = '\0';
+
+	for (i = 0; i < UBIBLOCK_PARAM_COUNT; i++)
+		tokens[i] = strsep(&pbuf, ",");
+
+	param = &ubiblock_param[ubiblock_devs];
+	if (tokens[1]) {
+		/* Two parameters: can be 'ubi, vol_id' or 'ubi, vol_name' */
+		ret = kstrtoint(tokens[0], 10, &param->ubi_num);
+		if (ret < 0)
+			return -EINVAL;
+
+		/* Second param can be a number or a name */
+		ret = kstrtoint(tokens[1], 10, &param->vol_id);
+		if (ret < 0) {
+			param->vol_id = -1;
+			strcpy(param->name, tokens[1]);
+		}
+
+	} else {
+		/* One parameter: must be device path */
+		strcpy(param->name, tokens[0]);
+		param->ubi_num = -1;
+		param->vol_id = -1;
+	}
+
+	ubiblock_devs++;
+
+	return 0;
+}
+
+static const struct kernel_param_ops ubiblock_param_ops = {
+	.set    = ubiblock_set_param,
+};
+module_param_cb(block, &ubiblock_param_ops, NULL, 0);
+MODULE_PARM_DESC(block, "Attach block devices to UBI volumes. Parameter format: block=<path|dev,num|dev,name>.\n"
+			"Multiple \"block\" parameters may be specified.\n"
+			"UBI volumes may be specified by their number, name, or path to the device node.\n"
+			"Examples\n"
+			"Using the UBI volume path:\n"
+			"ubi.block=/dev/ubi0_0\n"
+			"Using the UBI device, and the volume name:\n"
+			"ubi.block=0,rootfs\n"
+			"Using both UBI device number and UBI volume number:\n"
+			"ubi.block=0,0\n");
+
+static struct ubiblock *find_dev_nolock(int ubi_num, int vol_id)
+{
+	struct ubiblock *dev;
+
+	list_for_each_entry(dev, &ubiblock_devices, list)
+		if (dev->ubi_num == ubi_num && dev->vol_id == vol_id)
+			return dev;
+	return NULL;
+}
+
+#ifdef CONFIG_MTD_UBI_BLOCK_CACHED
+static void ubiblock_clean_cache(struct ubiblock *dev)
+{
+	dev->cache.leb_num = -1;
+	dev->cache.state = STATE_EMPTY;
+}
+
+static void ubiblock_free_cache(struct ubiblock *dev)
+{
+	dev->cache.leb_num = -1;
+	dev->cache.state = STATE_EMPTY;
+	vfree(dev->cache.buffer);
+	dev->cache.buffer = NULL;
+}
+
+static int ubiblock_alloc_cache(struct ubiblock *dev)
+{
+	dev->cache.state = STATE_EMPTY;
+	dev->cache.leb_num = -1;
+	dev->cache.buffer = vmalloc(dev->leb_size);
+	if (!dev->cache.buffer)
+		return -ENOMEM;
+	return 0;
+}
+
+static bool leb_in_cache(struct ubiblock_cache *cache, int leb_num)
+{
+	return cache->leb_num == leb_num;
+}
+
+static int ubiblock_fill_cache(struct ubiblock *dev, int leb_num,
+			       struct ubiblock_cache *cache)
+{
+	int ret;
+
+	/* Warn if we fill cache while being dirty */
+	WARN_ON(cache->state == STATE_DIRTY);
+
+	cache->leb_num = leb_num;
+	cache->state = STATE_CLEAN;
+
+	ret = ubi_read(dev->desc, leb_num, cache->buffer, 0,
+		       dev->leb_size);
+	if (ret) {
+		ubi_err("%s ubi_read error %d", dev->gd->disk_name, ret);
+		return ret;
+	}
+	return 0;
+}
+
+static int ubiblock_read_to_buf(struct ubiblock *dev, char *buffer,
+				int leb, int offset, int len)
+{
+	int ret;
+	char *cache_buffer;
+	/*
+	 * First try in cache, if it's not there load this leb.
+	 * Note that reading never flushes to disk!
+	 */
+	if (leb_in_cache(&dev->cache, leb)) {
+		cache_buffer = dev->cache.buffer;
+	} else {
+		/* Leb is not in cache: fill it! */
+		ret = ubiblock_fill_cache(dev, leb, &dev->cache);
+		if (ret)
+			return ret;
+		cache_buffer = dev->cache.buffer;
+	}
+	memcpy(buffer, cache_buffer + offset, len);
+	return 0;
+}
+#else
+static inline void ubiblock_clean_cache(struct ubiblock *dev) {}
+
+static inline void ubiblock_free_cache(struct ubiblock *dev) {}
+
+static inline int ubiblock_alloc_cache(struct ubiblock *dev)
+{
+	return 0;
+}
+
+static int ubiblock_read_to_buf(struct ubiblock *dev, char *buffer,
+				int leb, int offset, int len)
+{
+	int ret;
+
+	ret = ubi_read(dev->desc, leb, buffer, offset, len);
+	if (ret) {
+		ubi_err("%s ubi_read error %d",
+			dev->gd->disk_name, ret);
+		return ret;
+	}
+	return 0;
+}
+#endif /* CONFIG_MTD_UBI_BLOCK_CACHED */
+
+#ifdef CONFIG_MTD_UBI_BLOCK_WRITE_SUPPORT
+static int ubiblock_flush(struct ubiblock *dev, bool sync)
+{
+	struct ubiblock_cache *cache = &dev->cache;
+	int ret = 0;
+
+	if (cache->state != STATE_DIRTY)
+		return 0;
+
+	/*
+	 * mtdblock sets STATE_EMPTY, arguing that it prevents the
+	 * underlying media to get changed without notice.
+	 * I'm not fully convinced, so I just put STATE_CLEAN.
+	 */
+	cache->state = STATE_CLEAN;
+
+	/* Atomically change leb with buffer contents */
+	ret = ubi_leb_change(dev->desc, cache->leb_num,
+			     cache->buffer, dev->leb_size);
+	if (ret) {
+		ubi_err("%s ubi_leb_change error %d",
+			dev->gd->disk_name, ret);
+		return ret;
+	}
+
+	/* Sync ubi device when device is released and on block flush ioctl */
+	if (sync)
+		ret = ubi_sync(dev->ubi_num);
+
+	return ret;
+}
+
+static int ubiblock_write(struct ubiblock *dev, const char *buffer,
+			  int pos, int len)
+{
+	int leb, offset, ret;
+	int bytes_left = len;
+	int to_write = len;
+	struct ubiblock_cache *cache = &dev->cache;
+
+	/* Get (leb:offset) address to write */
+	leb = pos / dev->leb_size;
+	offset = pos % dev->leb_size;
+
+	while (bytes_left) {
+		/*
+		 * We can only write one leb at a time.
+		 * Therefore if the write length is larger than
+		 * one leb size, we split the operation.
+		 */
+		if (offset + to_write > dev->leb_size)
+			to_write = dev->leb_size - offset;
+
+		/*
+		 * If leb is not in cache, we flush current cached
+		 * leb to disk. Cache contents will be filled by reading device.
+		 */
+		if (!leb_in_cache(cache, leb)) {
+
+			ret = ubiblock_flush(dev, false);
+			if (ret)
+				return ret;
+
+			ret = ubiblock_fill_cache(dev, leb, cache);
+			if (ret)
+				return ret;
+		}
+
+		memcpy(cache->buffer + offset, buffer, to_write);
+
+		/* This is the only place where we dirt the cache */
+		cache->state = STATE_DIRTY;
+
+		buffer += to_write;
+		bytes_left -= to_write;
+		to_write = bytes_left;
+		offset = 0;
+		leb++;
+	}
+	return 0;
+}
+#else
+static int ubiblock_flush(struct ubiblock *dev, bool sync)
+{
+	return -EPERM;
+}
+
+static int ubiblock_write(struct ubiblock *dev, const char *buffer,
+			  int pos, int len)
+{
+	return -EPERM;
+}
+#endif /* CONFIG_MTD_UBI_BLOCK_WRITE_SUPPORT */
+
+static int ubiblock_read(struct ubiblock *dev, char *buffer, int pos, int len)
+{
+	int leb, offset, ret;
+	int bytes_left = len;
+	int to_read = len;
+
+	/* Get leb:offset address to read from */
+	leb = pos / dev->leb_size;
+	offset = pos % dev->leb_size;
+
+	while (bytes_left) {
+
+		/*
+		 * We can only read one leb at a time.
+		 * Therefore if the read length is larger than
+		 * one leb size, we split the operation.
+		 */
+		if (offset + to_read > dev->leb_size)
+			to_read = dev->leb_size - offset;
+
+		ret = ubiblock_read_to_buf(dev, buffer, leb, offset, to_read);
+		if (ret)
+			return ret;
+
+		buffer += to_read;
+		bytes_left -= to_read;
+		to_read = bytes_left;
+		leb++;
+		offset = 0;
+	}
+	return 0;
+}
+
+static int do_ubiblock_request(struct ubiblock *dev, struct request *req)
+{
+	int pos, len;
+
+	if (req->cmd_flags & REQ_FLUSH)
+		return ubiblock_flush(dev, true);
+
+	if (req->cmd_type != REQ_TYPE_FS)
+		return -EIO;
+
+	if (blk_rq_pos(req) + blk_rq_cur_sectors(req) >
+	    get_capacity(req->rq_disk))
+		return -EIO;
+
+	pos = blk_rq_pos(req) << 9;
+	len = blk_rq_cur_bytes(req);
+
+	switch (rq_data_dir(req)) {
+	case READ:
+		return ubiblock_read(dev, req->buffer, pos, len);
+	case WRITE:
+		return ubiblock_write(dev, req->buffer, pos, len);
+	default:
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static void ubiblock_do_work(struct work_struct *work)
+{
+	struct ubiblock *dev =
+		container_of(work, struct ubiblock, work);
+	struct request_queue *rq = dev->rq;
+	struct request *req;
+	int res;
+
+	spin_lock_irq(rq->queue_lock);
+
+	req = blk_fetch_request(rq);
+	while (req) {
+
+		spin_unlock_irq(rq->queue_lock);
+
+		mutex_lock(&dev->vol_mutex);
+		res = do_ubiblock_request(dev, req);
+		mutex_unlock(&dev->vol_mutex);
+
+		spin_lock_irq(rq->queue_lock);
+
+		/*
+		 * If we're done with this request,
+		 * we need to fetch a new one
+		 */
+		if (!__blk_end_request_cur(req, res))
+			req = blk_fetch_request(rq);
+	}
+
+	spin_unlock_irq(rq->queue_lock);
+}
+
+static void ubiblock_request(struct request_queue *rq)
+{
+	struct ubiblock *dev;
+	struct request *req;
+
+	dev = rq->queuedata;
+
+	if (!dev)
+		while ((req = blk_fetch_request(rq)) != NULL)
+			__blk_end_request_all(req, -ENODEV);
+	else
+		queue_work(dev->wq, &dev->work);
+}
+
+static int ubiblock_open(struct block_device *bdev, fmode_t mode)
+{
+	struct ubiblock *dev = bdev->bd_disk->private_data;
+	int ubi_mode = UBI_READONLY;
+	int ret;
+#ifdef CONFIG_MTD_UBI_BLOCK_WRITE_SUPPORT
+	const bool allow_write = true;
+#else
+	const bool allow_write = false;
+#endif
+
+	mutex_lock(&dev->vol_mutex);
+	if (dev->refcnt > 0) {
+		/*
+		 * The volume is already opened,
+		 * just increase the reference counter
+		 *
+		 * If the first user has oppened this as read-only,
+		 * we don't allow to open as read-write.
+		 * This is the simplest solution. A better one would
+		 * be to re-open the volume as writable.
+		 */
+		if ((mode & FMODE_WRITE) &&
+		    (dev->desc->mode != UBI_READWRITE)) {
+			ret = -EBUSY;
+			goto out_unlock;
+		}
+		goto out_done;
+	}
+
+	if (allow_write) {
+		if (mode & FMODE_WRITE)
+			ubi_mode = UBI_READWRITE;
+	} else {
+		if (mode & FMODE_WRITE) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+	}
+
+	dev->desc = ubi_open_volume(dev->ubi_num, dev->vol_id, ubi_mode);
+	if (IS_ERR(dev->desc)) {
+		ubi_err("%s failed to open ubi volume %d_%d",
+			dev->gd->disk_name, dev->ubi_num, dev->vol_id);
+
+		ret = PTR_ERR(dev->desc);
+		dev->desc = NULL;
+		goto out_unlock;
+	}
+
+	dev->vi = kzalloc(sizeof(struct ubi_volume_info), GFP_KERNEL);
+	if (!dev->vi) {
+		ret = -ENOMEM;
+		goto out_close;
+	}
+	ubi_get_volume_info(dev->desc, dev->vi);
+	dev->leb_size = dev->vi->usable_leb_size;
+
+	ret = ubiblock_alloc_cache(dev);
+	if (ret)
+		goto out_free;
+out_done:
+	dev->refcnt++;
+	mutex_unlock(&dev->vol_mutex);
+	return 0;
+
+out_free:
+	kfree(dev->vi);
+out_close:
+	ubi_close_volume(dev->desc);
+	dev->desc = NULL;
+out_unlock:
+	mutex_unlock(&dev->vol_mutex);
+	return ret;
+}
+
+static void ubiblock_release(struct gendisk *gd, fmode_t mode)
+{
+	struct ubiblock *dev = gd->private_data;
+
+	mutex_lock(&dev->vol_mutex);
+
+	dev->refcnt--;
+	if (dev->refcnt == 0) {
+		ubiblock_flush(dev, true);
+		ubiblock_free_cache(dev);
+
+		kfree(dev->vi);
+		ubi_close_volume(dev->desc);
+
+		dev->vi = NULL;
+		dev->desc = NULL;
+	}
+
+	mutex_unlock(&dev->vol_mutex);
+}
+
+static int ubiblock_ioctl(struct block_device *bdev, fmode_t mode,
+			      unsigned int cmd, unsigned long arg)
+{
+	struct ubiblock *dev = bdev->bd_disk->private_data;
+	int ret = -ENXIO;
+
+	if (!dev)
+		return ret;
+
+	mutex_lock(&dev->vol_mutex);
+
+	/* I can't get this to get called. What's going on? */
+	switch (cmd) {
+	case BLKFLSBUF:
+		ret = ubiblock_flush(dev, true);
+		break;
+	default:
+		ret = -ENOTTY;
+	}
+
+	mutex_unlock(&dev->vol_mutex);
+	return ret;
+}
+
+static int ubiblock_getgeo(struct block_device *bdev, struct hd_geometry *geo)
+{
+	/* Some tools might require this information */
+	geo->heads = 1;
+	geo->cylinders = 1;
+	geo->sectors = get_capacity(bdev->bd_disk);
+	geo->start = 0;
+	return 0;
+}
+
+static const struct block_device_operations ubiblock_ops = {
+	.owner = THIS_MODULE,
+	.open = ubiblock_open,
+	.release = ubiblock_release,
+	.ioctl = ubiblock_ioctl,
+	.getgeo	= ubiblock_getgeo,
+};
+
+int ubiblock_add(struct ubi_volume_info *vi)
+{
+	struct ubiblock *dev;
+	struct gendisk *gd;
+	int disk_capacity;
+	int ret;
+
+	/* Check that the volume isn't already handled */
+	mutex_lock(&devices_mutex);
+	if (find_dev_nolock(vi->ubi_num, vi->vol_id)) {
+		mutex_unlock(&devices_mutex);
+		return -EEXIST;
+	}
+	mutex_unlock(&devices_mutex);
+
+	dev = kzalloc(sizeof(struct ubiblock), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	mutex_init(&dev->vol_mutex);
+
+	dev->ubi_num = vi->ubi_num;
+	dev->vol_id = vi->vol_id;
+
+	ubiblock_clean_cache(dev);
+
+	/* Initialize the gendisk of this ubiblock device */
+	gd = alloc_disk(1);
+	if (!gd) {
+		ubi_err("block: alloc_disk failed");
+		ret = -ENODEV;
+		goto out_free_dev;
+	}
+
+	gd->fops = &ubiblock_ops;
+	gd->major = ubiblock_major;
+	gd->first_minor = dev->ubi_num * UBI_MAX_VOLUMES + dev->vol_id;
+	gd->private_data = dev;
+	sprintf(gd->disk_name, "ubiblock%d_%d", dev->ubi_num, dev->vol_id);
+	disk_capacity = (vi->size * vi->usable_leb_size) >> 9;
+	set_capacity(gd, disk_capacity);
+	dev->gd = gd;
+
+	spin_lock_init(&dev->queue_lock);
+	dev->rq = blk_init_queue(ubiblock_request, &dev->queue_lock);
+	if (!dev->rq) {
+		ubi_err("block: blk_init_queue failed");
+		ret = -ENODEV;
+		goto out_put_disk;
+	}
+
+	dev->rq->queuedata = dev;
+	dev->gd->queue = dev->rq;
+
+	blk_queue_flush(dev->rq, REQ_FLUSH);
+
+	/*
+	 * Create one workqueue per volume (per registered block device).
+	 * Rembember workqueues are cheap, they're not threads.
+	 */
+	dev->wq = alloc_workqueue(gd->disk_name, 0, 0);
+	if (!dev->wq)
+		goto out_free_queue;
+	INIT_WORK(&dev->work, ubiblock_do_work);
+
+	mutex_lock(&devices_mutex);
+	list_add_tail(&dev->list, &ubiblock_devices);
+	mutex_unlock(&devices_mutex);
+
+	/* Must be the last step: anyone can call file ops from now on */
+	add_disk(dev->gd);
+
+	ubi_msg("%s created from ubi%d:%d(%s)",
+		dev->gd->disk_name, dev->ubi_num, dev->vol_id, vi->name);
+
+	return 0;
+
+out_free_queue:
+	blk_cleanup_queue(dev->rq);
+out_put_disk:
+	put_disk(dev->gd);
+out_free_dev:
+	kfree(dev);
+
+	return ret;
+}
+
+static void ubiblock_cleanup(struct ubiblock *dev)
+{
+	del_gendisk(dev->gd);
+	blk_cleanup_queue(dev->rq);
+	ubi_msg("%s released", dev->gd->disk_name);
+	put_disk(dev->gd);
+}
+
+int ubiblock_del(struct ubi_volume_info *vi)
+{
+	struct ubiblock *dev;
+
+	mutex_lock(&devices_mutex);
+	dev = find_dev_nolock(vi->ubi_num, vi->vol_id);
+	if (!dev) {
+		mutex_unlock(&devices_mutex);
+		return -ENODEV;
+	}
+
+	/* Found a device, let's lock it so we can check if it's busy */
+	mutex_lock(&dev->vol_mutex);
+
+	if (dev->refcnt > 0) {
+		mutex_unlock(&dev->vol_mutex);
+		mutex_unlock(&devices_mutex);
+		return -EBUSY;
+	}
+
+	/* Remove from device list */
+	list_del(&dev->list);
+	mutex_unlock(&devices_mutex);
+
+	/* Flush pending work and stop this workqueue */
+	destroy_workqueue(dev->wq);
+
+	ubiblock_cleanup(dev);
+	mutex_unlock(&dev->vol_mutex);
+	kfree(dev);
+	return 0;
+}
+
+static void ubiblock_resize(struct ubi_volume_info *vi)
+{
+	struct ubiblock *dev;
+	int disk_capacity;
+
+	/*
+	 * Need to lock the device list until we stop using the device,
+	 * otherwise the device struct might get released in ubiblock_del().
+	 */
+	mutex_lock(&devices_mutex);
+	dev = find_dev_nolock(vi->ubi_num, vi->vol_id);
+	if (!dev) {
+		mutex_unlock(&devices_mutex);
+		return;
+	}
+
+	mutex_lock(&dev->vol_mutex);
+	disk_capacity = (vi->size * vi->usable_leb_size) >> 9;
+	set_capacity(dev->gd, disk_capacity);
+	ubi_msg("%s resized to %d LEBs", dev->gd->disk_name, vi->size);
+	mutex_unlock(&dev->vol_mutex);
+	mutex_unlock(&devices_mutex);
+}
+
+static int ubiblock_notify(struct notifier_block *nb,
+			 unsigned long notification_type, void *ns_ptr)
+{
+	struct ubi_notification *nt = ns_ptr;
+
+	switch (notification_type) {
+	case UBI_VOLUME_ADDED:
+		/*
+		 * We want to enforce explicit block device attaching for
+		 * volumes; so when a volume is added we do nothing.
+		 */
+		break;
+	case UBI_VOLUME_REMOVED:
+		ubiblock_del(&nt->vi);
+		break;
+	case UBI_VOLUME_RESIZED:
+		ubiblock_resize(&nt->vi);
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block ubiblock_notifier = {
+	.notifier_call = ubiblock_notify,
+};
+
+static struct ubi_volume_desc * __init
+open_volume_desc(const char *name, int ubi_num, int vol_id)
+{
+	if (ubi_num == -1)
+		/* No ubi num, name must be a vol device path */
+		return ubi_open_volume_path(name, UBI_READONLY);
+	else if (vol_id == -1)
+		/* No vol_id, must be vol_name */
+		return ubi_open_volume_nm(ubi_num, name, UBI_READONLY);
+	else
+		return ubi_open_volume(ubi_num, vol_id, UBI_READONLY);
+}
+
+static void __init ubiblock_add_from_param(void)
+{
+	int i, ret;
+	struct ubiblock_param *p;
+	struct ubi_volume_desc *desc;
+	struct ubi_volume_info vi;
+
+	for (i = 0; i < ubiblock_devs; i++) {
+		p = &ubiblock_param[i];
+
+		desc = open_volume_desc(p->name, p->ubi_num, p->vol_id);
+		if (IS_ERR(desc)) {
+			ubi_warn("block: can't open volume, err=%ld\n",
+				 PTR_ERR(desc));
+			continue;
+		}
+
+		ubi_get_volume_info(desc, &vi);
+		ret = ubiblock_add(&vi);
+		if (ret)
+			ubi_warn("block: can't add '%s' volume, err=%d\n",
+				 vi.name, ret);
+		ubi_close_volume(desc);
+	}
+}
+
+int __init ubiblock_init(void)
+{
+	int ret;
+
+	ubiblock_major = register_blkdev(0, "ubiblock");
+	if (ubiblock_major < 0)
+		return ubiblock_major;
+
+	/* Attach block devices from 'block=' module param */
+	ubiblock_add_from_param();
+
+	/*
+	 * Block devices needs to be attached to volumes explicitly
+	 * upon user request. So we ignore existing volumes.
+	 */
+	ret = ubi_register_volume_notifier(&ubiblock_notifier, 1);
+	if (ret < 0)
+		unregister_blkdev(ubiblock_major, "ubiblock");
+	return ret;
+}
+
+void __exit ubiblock_exit(void)
+{
+	struct ubiblock *next;
+	struct ubiblock *dev;
+
+	ubi_unregister_volume_notifier(&ubiblock_notifier);
+
+	list_for_each_entry_safe(dev, next, &ubiblock_devices, list) {
+
+		/* Flush pending work and stop workqueue */
+		destroy_workqueue(dev->wq);
+
+		/* The module is being forcefully removed */
+		WARN_ON(dev->desc);
+
+		/* Remove from device list */
+		list_del(&dev->list);
+
+		ubiblock_cleanup(dev);
+
+		kfree(dev);
+	}
+
+	unregister_blkdev(ubiblock_major, "ubiblock");
+}
diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c
index e05dc62..2705bad 100644
--- a/drivers/mtd/ubi/build.c
+++ b/drivers/mtd/ubi/build.c
@@ -1296,6 +1296,9 @@  static int __init ubi_init(void)
 		}
 	}
 
+	if (ubiblock_init() < 0)
+		ubi_warn("cannot init block device access");
+
 	return 0;
 
 out_detach:
@@ -1324,6 +1327,8 @@  static void __exit ubi_exit(void)
 {
 	int i;
 
+	ubiblock_exit();
+
 	for (i = 0; i < UBI_MAX_DEVICES; i++)
 		if (ubi_devices[i]) {
 			mutex_lock(&ubi_devices_mutex);
diff --git a/drivers/mtd/ubi/cdev.c b/drivers/mtd/ubi/cdev.c
index 8ca49f2..39d3774 100644
--- a/drivers/mtd/ubi/cdev.c
+++ b/drivers/mtd/ubi/cdev.c
@@ -561,6 +561,26 @@  static long vol_cdev_ioctl(struct file *file, unsigned int cmd,
 		break;
 	}
 
+	/* Attach a block device to an UBI volume */
+	case UBI_IOCVOLATTBLK:
+	{
+		struct ubi_volume_info vi;
+
+		ubi_get_volume_info(desc, &vi);
+		err = ubiblock_add(&vi);
+		break;
+	}
+
+	/* Dettach a block device from an UBI volume */
+	case UBI_IOCVOLDETBLK:
+	{
+		struct ubi_volume_info vi;
+
+		ubi_get_volume_info(desc, &vi);
+		err = ubiblock_del(&vi);
+		break;
+	}
+
 	default:
 		err = -ENOTTY;
 		break;
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index 8ea6297..e76ff98 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -864,6 +864,20 @@  int ubi_update_fastmap(struct ubi_device *ubi);
 int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
 		     int fm_anchor);
 
+/* block.c */
+#ifdef CONFIG_MTD_UBI_BLOCK
+int ubiblock_init(void);
+void ubiblock_exit(void);
+int ubiblock_add(struct ubi_volume_info *vi);
+int ubiblock_del(struct ubi_volume_info *vi);
+#else
+static inline int ubiblock_init(void) { return 0; }
+static inline void ubiblock_exit(void) {}
+static inline int ubiblock_add(struct ubi_volume_info *vi) { return -ENOTTY; }
+static inline int ubiblock_del(struct ubi_volume_info *vi) { return -ENOTTY; }
+#endif
+
+
 /*
  * ubi_rb_for_each_entry - walk an RB-tree.
  * @rb: a pointer to type 'struct rb_node' to use as a loop counter
diff --git a/include/uapi/mtd/ubi-user.h b/include/uapi/mtd/ubi-user.h
index 723c324..1080762 100644
--- a/include/uapi/mtd/ubi-user.h
+++ b/include/uapi/mtd/ubi-user.h
@@ -134,6 +134,13 @@ 
  * used. A pointer to a &struct ubi_set_vol_prop_req object is expected to be
  * passed. The object describes which property should be set, and to which value
  * it should be set.
+ *
+ * Block device access to UBI volumes
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * To attach or detach a block device from an UBI volume the %UBI_IOCVOLATTBLK
+ * and %UBI_IOCVOLDETBLK ioctl commands should be used, respectively.
+ * These commands take no arguments.
  */
 
 /*
@@ -191,6 +198,10 @@ 
 /* Set an UBI volume property */
 #define UBI_IOCSETVOLPROP _IOW(UBI_VOL_IOC_MAGIC, 6, \
 			       struct ubi_set_vol_prop_req)
+/* Attach a block device to an UBI volume */
+#define UBI_IOCVOLATTBLK _IO(UBI_VOL_IOC_MAGIC, 7)
+/* Detach a block device from an UBI volume */
+#define UBI_IOCVOLDETBLK _IO(UBI_VOL_IOC_MAGIC, 8)
 
 /* Maximum MTD device name length supported by UBI */
 #define MAX_UBI_MTD_NAME_LEN 127