Message ID: 20211025082104.8017-1-kernel@kempniu.pl
State: Changes Requested
Series: mtdchar: prevent unbounded allocation in MEMWRITE ioctl
Hi Michał,

kernel@kempniu.pl wrote on Mon, 25 Oct 2021 10:21:04 +0200:

> In the mtdchar_write_ioctl() function, memdup_user() is called with its
> 'len' parameter set to verbatim values provided by user space via a
> struct mtd_write_req. Both the 'len' and 'ooblen' fields of that
> structure are 64-bit unsigned integers, which means the MEMWRITE ioctl
> can trigger unbounded kernel memory allocation requests.
>
> Fix by iterating over the buffers provided by user space in a loop,
> processing at most mtd->erasesize bytes in each iteration. Adopt some
> checks from mtd_check_oob_ops() to retain backward user space
> compatibility.

Thank you very much for this proposal!

> Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
> Signed-off-by: Michał Kępień <kernel@kempniu.pl>
> ---
> Fixing this problem was suggested last month, during a discussion about
> a new MEMREAD ioctl. [1] [2]
>
> My primary objective was to not break user space, i.e. to not change
> externally visible behavior compared to the current code. The main
> problem I faced was that splitting the input data into chunks makes the
> MEMWRITE ioctl a _wrapper_ for mtd_write_oob() rather than a direct
> _interface_ to it and yet from the user space perspective it still needs
> to behave as if nothing changed.
>
> Despite my efforts, the patch does _not_ retain absolute backward
> compatibility and I do not know whether this is acceptable.
> Specifically, multi-eraseblock-sized writes (requiring multiple loop
> iterations to be processed) which succeeded with the original code _may_
> now return errors and/or write different contents to the device than the
> original code, depending on the MTD mode of operation requested and on
> whether the start offset is page-aligned. The documentation for struct
> mtd_oob_ops warns about even multi-page write requests, but...
>
> Example:
>
> MTD device parameters:
>
> - mtd->writesize = 2048
> - mtd->erasesize = 131072
> - 64 bytes of raw OOB space per page
>
> struct mtd_write_req req = {
> 	.start = 2048,
> 	.len = 262144,
> 	.ooblen = 64,
> 	.usr_data = ...,
> 	.usr_oob = ...,
> 	.mode = MTD_OPS_RAW,
> };
>
> (This is a 128-page write with OOB data supplied for just one page.)
>
> Current mtdchar_write_ioctl() returns 0 for this request and writes
> 128 pages of data and 1 page's worth of OOB data (64 bytes) to the
> MTD device.
>
> Patched mtdchar_write_ioctl() may return an error because the
> original request gets split into two chunks (<data_len, oob_len>):
>
> <131072, 64>
> <131072, 0>
>
> and the second chunk with zero OOB data length may make the MTD
> driver unhappy in raw mode (resulting in only the first 64 pages of
> data and 1 page's worth of OOB data getting written to the MTD
> device).

Isn't this a driver issue instead? I mean, writing an eraseblock
without providing any OOB data is completely fine, if the driver
accepts 2 blocks + 1 page OOB but refuses 1 block + 1 page OOB and then
1 block, it's broken, no? Have you experienced such a situation in your
testing?

> Is an ioctl like this considered a "degenerate" one and therefore
> acceptable to break or not?

I don't think so :)

> At any rate, the revised code feels brittle to me and I would not be
> particularly surprised if I missed more cases in which it produces
> different results than the original code.
>
> I keep on wondering whether the benefits of this change are worth the
> extra code complexity, but fortunately it is not my call to make :)
> Perhaps I am missing something and my proposal can be simplified? Or
> maybe the way I approached this is completely wrong? Any thoughts on
> this are welcome.
>
> As the outcome of the discussion around this patch will have a
> significant influence on the shape of the v2 of the MEMREAD ioctl, I
> decided to submit this one first as a standalone patch.
>
> [1] https://lists.infradead.org/pipermail/linux-mtd/2021-September/088485.html
> [2] https://lists.infradead.org/pipermail/linux-mtd/2021-September/088544.html
>
>  drivers/mtd/mtdchar.c | 93 ++++++++++++++++++++++++++++++++-----------
>  1 file changed, 70 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
> index 155e991d9d75..a3afc390e254 100644
> --- a/drivers/mtd/mtdchar.c
> +++ b/drivers/mtd/mtdchar.c
> @@ -578,9 +578,10 @@ static int mtdchar_write_ioctl(struct mtd_info *mtd,
>  {
>  	struct mtd_info *master = mtd_get_master(mtd);
>  	struct mtd_write_req req;
> -	struct mtd_oob_ops ops = {};
>  	const void __user *usr_data, *usr_oob;
> -	int ret;
> +	uint8_t *datbuf = NULL, *oobbuf = NULL;
> +	size_t datbuf_len, oobbuf_len;
> +	int ret = 0;
>  
>  	if (copy_from_user(&req, argp, sizeof(req)))
>  		return -EFAULT;
> @@ -590,33 +591,79 @@ static int mtdchar_write_ioctl(struct mtd_info *mtd,
>  
>  	if (!master->_write_oob)
>  		return -EOPNOTSUPP;
> -	ops.mode = req.mode;
> -	ops.len = (size_t)req.len;
> -	ops.ooblen = (size_t)req.ooblen;
> -	ops.ooboffs = 0;
> -
> -	if (usr_data) {
> -		ops.datbuf = memdup_user(usr_data, ops.len);
> -		if (IS_ERR(ops.datbuf))
> -			return PTR_ERR(ops.datbuf);
> -	} else {
> -		ops.datbuf = NULL;
> +
> +	if (!usr_data)
> +		req.len = 0;
> +
> +	if (!usr_oob)
> +		req.ooblen = 0;
> +
> +	if (req.start + req.len > mtd->size)
> +		return -EINVAL;
> +
> +	datbuf_len = min_t(size_t, req.len, mtd->erasesize);
> +	if (datbuf_len > 0) {
> +		datbuf = kmalloc(datbuf_len, GFP_KERNEL);
> +		if (!datbuf)
> +			return -ENOMEM;
>  	}
>  
> -	if (usr_oob) {
> -		ops.oobbuf = memdup_user(usr_oob, ops.ooblen);
> -		if (IS_ERR(ops.oobbuf)) {
> -			kfree(ops.datbuf);
> -			return PTR_ERR(ops.oobbuf);
> +	oobbuf_len = min_t(size_t, req.ooblen, mtd->erasesize);
> +	if (oobbuf_len > 0) {
> +		oobbuf = kmalloc(oobbuf_len, GFP_KERNEL);
> +		if (!oobbuf) {
> +			kfree(datbuf);
> +			return -ENOMEM;
>  		}
> -	} else {
> -		ops.oobbuf = NULL;
>  	}
>  
> -	ret = mtd_write_oob(mtd, (loff_t)req.start, &ops);
> +	while (req.len > 0 || (!usr_data && req.ooblen > 0)) {
> +		struct mtd_oob_ops ops = {
> +			.mode = req.mode,
> +			.len = min_t(size_t, req.len, datbuf_len),
> +			.ooblen = min_t(size_t, req.ooblen, oobbuf_len),
> +			.datbuf = req.len ? datbuf : NULL,
> +			.oobbuf = req.ooblen ? oobbuf : NULL,
> +		};
>  
> -	kfree(ops.datbuf);
> -	kfree(ops.oobbuf);
> +		/*
> +		 * For writes which are not OOB-only, adjust the amount of OOB
> +		 * data written according to the number of data pages written.
> +		 * This is necessary to prevent OOB data from being skipped
> +		 * over in data+OOB writes requiring multiple mtd_write_oob()
> +		 * calls to be completed.
> +		 */
> +		if (ops.len > 0 && ops.ooblen > 0) {
> +			u32 oob_per_page = mtd_oobavail(mtd, &ops);
> +			uint32_t pages_to_write = mtd_div_by_ws(ops.len, mtd);
> +
> +			if (mtd_mod_by_ws(req.start + ops.len, mtd))
> +				pages_to_write++;
> +
> +			ops.ooblen = min_t(size_t, ops.ooblen,
> +					   pages_to_write * oob_per_page);
> +		}
> +
> +		if (copy_from_user(datbuf, usr_data, ops.len) ||
> +		    copy_from_user(oobbuf, usr_oob, ops.ooblen)) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		ret = mtd_write_oob(mtd, req.start, &ops);
> +		if (ret)
> +			break;
> +
> +		req.start += ops.retlen;
> +		req.len -= ops.retlen;
> +		usr_data += ops.retlen;
> +
> +		req.ooblen -= ops.oobretlen;
> +		usr_oob += ops.oobretlen;
> +	}
> +
> +	kfree(datbuf);
> +	kfree(oobbuf);
>  
>  	return ret;
>  }

Thanks,
Miquèl
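The two-chunk split at the heart of the example above can be reproduced with plain arithmetic. Below is a hedged sketch in ordinary C (the function name and the MIN macro are mine, not from mtdchar.c); it models only the chunk-size selection of the v1 loop and ignores the per-page OOB clamp, which does not change the outcome for this example:

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Illustrative model of the v1 loop's chunk sizes: at most one
 * eraseblock of data per mtd_write_oob() call, paired with as many
 * still-unconsumed OOB bytes as fit into the OOB bounce buffer
 * (oobbuf_len in the patch). Returns the number of chunks. */
static int split_request(uint64_t len, uint64_t ooblen, uint32_t erasesize,
                         uint64_t data[], uint64_t oob[])
{
    uint64_t oob_cap = MIN(ooblen, erasesize);
    int n = 0;

    while (len > 0) {
        data[n] = MIN(len, erasesize);   /* min_t(size_t, req.len, datbuf_len) */
        oob[n] = MIN(ooblen, oob_cap);   /* min_t(size_t, req.ooblen, oobbuf_len) */
        len -= data[n];
        ooblen -= oob[n];
        n++;
    }
    return n;
}
```

For the request in the example (len = 262144, ooblen = 64, erasesize = 131072) this yields exactly the two chunks <131072, 64> and <131072, 0> discussed above, the second of which carries no OOB data.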
Hi Miquèl,

TL;DR: thanks, your comment made me look closer and I found what seems
to be a feasible workaround that I will apply in v2 (along other fixes).

> > Despite my efforts, the patch does _not_ retain absolute backward
> > compatibility and I do not know whether this is acceptable.
> > Specifically, multi-eraseblock-sized writes (requiring multiple loop
> > iterations to be processed) which succeeded with the original code _may_
> > now return errors and/or write different contents to the device than the
> > original code, depending on the MTD mode of operation requested and on
> > whether the start offset is page-aligned. The documentation for struct
> > mtd_oob_ops warns about even multi-page write requests, but...
> >
> > Example:
> >
> > MTD device parameters:
> >
> > - mtd->writesize = 2048
> > - mtd->erasesize = 131072
> > - 64 bytes of raw OOB space per page
> >
> > struct mtd_write_req req = {
> > 	.start = 2048,
> > 	.len = 262144,
> > 	.ooblen = 64,
> > 	.usr_data = ...,
> > 	.usr_oob = ...,
> > 	.mode = MTD_OPS_RAW,
> > };
> >
> > (This is a 128-page write with OOB data supplied for just one page.)
> >
> > Current mtdchar_write_ioctl() returns 0 for this request and writes
> > 128 pages of data and 1 page's worth of OOB data (64 bytes) to the
> > MTD device.
> >
> > Patched mtdchar_write_ioctl() may return an error because the
> > original request gets split into two chunks (<data_len, oob_len>):
> >
> > <131072, 64>
> > <131072, 0>
> >
> > and the second chunk with zero OOB data length may make the MTD
> > driver unhappy in raw mode (resulting in only the first 64 pages of
> > data and 1 page's worth of OOB data getting written to the MTD
> > device).
>
> Isn't this a driver issue instead? I mean, writing an eraseblock
> without providing any OOB data is completely fine, if the driver
> accepts 2 blocks + 1 page OOB but refuses 1 block + 1 page OOB and then
> 1 block, it's broken, no? Have you experienced such a situation in your
> testing?
I may not have expressed myself clearly here, sorry - the example was
already getting a bit lengthy at that point... :)

I tested the patch with nandsim, but I do not think it is that specific
driver that is broken. The catch is that when mtd_write_oob() is
called, nand_do_write_ops() splits multi-page requests into single-page
requests and what it passes to nand_write_page() depends on whether the
struct mtd_oob_ops it was invoked with has the oobbuf field set to NULL
or not. This is okay in itself, but when another request-splitting
"layer" is introduced by my patch, the ioctl may start returning
different result codes than it used to.

Here is what happens with the unpatched code for the example above:

1. mtdchar_write_ioctl() gets called with the following request:

   struct mtd_write_req req = {
   	.start = 2048,
   	.len = 262144,
   	.ooblen = 64,
   	.usr_data = 0x10000000,
   	.usr_oob = 0x20000000,
   	.mode = MTD_OPS_RAW,
   };

2. The above request is passed through to mtd_write_oob() verbatim:

   struct mtd_oob_ops ops = {
   	.mode = MTD_OPS_RAW,
   	.len = 262144,
   	.ooblen = 64,
   	.datbuf = 0x10000000,
   	.oobbuf = 0x20000000,
   };

3. nand_do_write_ops() splits this request up into page-sized requests.

   a) For the first page, the initial 2048 bytes of data + 64 bytes of
      OOB data are passed to nand_write_page().

   b) For each subsequent page, a 2048-byte chunk of data + 64 bytes
      of 0xff bytes are passed to nand_write_page().

   Since the oobbuf field in the struct mtd_oob_ops passed is not NULL,
   oob_required is set to 1 for all nand_write_page() calls.

4. The above causes the driver to receive 2112 bytes of data for each
   page write, which results in the ioctl being successful.

Here is what happens with the patched code:

1. mtdchar_write_ioctl() gets called with the same request as above.

2. The original request gets split into two eraseblock-sized
   mtd_write_oob() calls:

   a) struct mtd_oob_ops ops = {
      	.mode = MTD_OPS_RAW,
      	.len = 131072,
      	.ooblen = 64,
      	.datbuf = 0x10000000,
      	.oobbuf = 0x20000000,
      };

   b) struct mtd_oob_ops ops = {
      	.mode = MTD_OPS_RAW,
      	.len = 131072,
      	.ooblen = 0,
      	.datbuf = 0x10020000,
      	.oobbuf = NULL,
      };

   (My code sets oobbuf to NULL if ooblen is 0.)

3. nand_do_write_ops() splits the first request up into page-sized
   requests the same way as for the original code. It returns
   successfully, so mtdchar_write_ioctl() proceeds with the next
   eraseblock-sized request.

4. nand_do_write_ops() splits the second request up into page-sized
   requests. The first page write contains 2048 bytes of data and no
   OOB data; since the oobbuf field in the struct mtd_oob_ops passed is
   NULL, oob_required is set to 0.

5. The above causes the driver to receive 2048 bytes of data for a page
   write in raw mode, which results in an error that propagates all the
   way up to mtdchar_write_ioctl().

The nandsim driver returns the same error if you pass the following
request to the MEMWRITE ioctl:

   struct mtd_write_req req = {
   	.start = 2048,
   	.len = 2048,
   	.ooblen = 0,
   	.usr_data = 0x10000000,
   	.usr_oob = NULL,
   	.mode = MTD_OPS_RAW,
   };

so it is not the driver that is broken or insane, it is the splitting
process that may cause the MEMWRITE ioctl to return different error
codes than before.

I played with the code a bit more and I found a fix which addresses this
issue without breaking other scenarios: setting oobbuf to the same
pointer for every loop iteration (if ooblen is 0, no OOB data will be
written anyway).

I also tackled the problem of mishandling large non-page-aligned writes
in v1 and I managed to fix it by trimming the first mtd_write_oob() call
so that it ends on an eraseblock boundary. This implicitly makes
subsequent writes page-aligned and seems to fix the problem.

Finally, I reworked the OOB length adjustment code to address other
cases of mishandling non-page-aligned writes.

I could not find any other cases in which the revised code behaves
differently than the original one. I will send v2 soon.
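The eraseblock-boundary trimming described above for v2 boils down to a small piece of arithmetic. Here is a hedged stand-alone sketch (the function name is mine, not from the patch): trimming the first chunk so it ends on an eraseblock boundary makes every subsequent chunk start eraseblock-aligned, and therefore page-aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch of the v2 idea: the first chunk covers at most the
 * distance from 'start' to the next eraseblock boundary; later chunks
 * then always begin on a boundary. */
static uint64_t first_chunk_len(uint64_t start, uint64_t len,
                                uint32_t erasesize)
{
    uint64_t to_boundary = erasesize - (start % erasesize);

    return len < to_boundary ? len : to_boundary;
}
```

With the example request (start = 2048, len = 262144, erasesize = 131072), the first chunk is trimmed to 129024 bytes, so the second chunk starts at offset 131072, an eraseblock boundary.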
Hi Michał,

kernel@kempniu.pl wrote on Thu, 25 Nov 2021 13:08:16 +0100:

> Hi Miquèl,
>
> TL;DR: thanks, your comment made me look closer and I found what seems
> to be a feasible workaround that I will apply in v2 (along other fixes).
>
> > > Despite my efforts, the patch does _not_ retain absolute backward
> > > compatibility and I do not know whether this is acceptable.
> > > Specifically, multi-eraseblock-sized writes (requiring multiple loop
> > > iterations to be processed) which succeeded with the original code _may_
> > > now return errors and/or write different contents to the device than the
> > > original code, depending on the MTD mode of operation requested and on
> > > whether the start offset is page-aligned. The documentation for struct
> > > mtd_oob_ops warns about even multi-page write requests, but...
> > >
> > > Example:
> > >
> > > MTD device parameters:
> > >
> > > - mtd->writesize = 2048
> > > - mtd->erasesize = 131072
> > > - 64 bytes of raw OOB space per page
> > >
> > > struct mtd_write_req req = {
> > > 	.start = 2048,
> > > 	.len = 262144,
> > > 	.ooblen = 64,
> > > 	.usr_data = ...,
> > > 	.usr_oob = ...,
> > > 	.mode = MTD_OPS_RAW,
> > > };
> > >
> > > (This is a 128-page write with OOB data supplied for just one page.)
> > >
> > > Current mtdchar_write_ioctl() returns 0 for this request and writes
> > > 128 pages of data and 1 page's worth of OOB data (64 bytes) to the
> > > MTD device.
> > >
> > > Patched mtdchar_write_ioctl() may return an error because the
> > > original request gets split into two chunks (<data_len, oob_len>):
> > >
> > > <131072, 64>
> > > <131072, 0>
> > >
> > > and the second chunk with zero OOB data length may make the MTD
> > > driver unhappy in raw mode (resulting in only the first 64 pages of
> > > data and 1 page's worth of OOB data getting written to the MTD
> > > device).
> >
> > Isn't this a driver issue instead? I mean, writing an eraseblock
> > without providing any OOB data is completely fine, if the driver
> > accepts 2 blocks + 1 page OOB but refuses 1 block + 1 page OOB and then
> > 1 block, it's broken, no? Have you experienced such a situation in your
> > testing?
>
> I may not have expressed myself clearly here, sorry - the example was
> already getting a bit lengthy at that point... :)
>
> I tested the patch with nandsim, but I do not think it is that specific
> driver that is broken. The catch is that when mtd_write_oob() is
> called, nand_do_write_ops() splits multi-page requests into single-page
> requests and what it passes to nand_write_page() depends on whether the
> struct mtd_oob_ops it was invoked with has the oobbuf field set to NULL
> or not. This is okay in itself, but when another request-splitting
> "layer" is introduced by my patch, the ioctl may start returning
> different result codes than it used to.
>
> Here is what happens with the unpatched code for the example above:
>
> 1. mtdchar_write_ioctl() gets called with the following request:
>
>    struct mtd_write_req req = {
>    	.start = 2048,
>    	.len = 262144,
>    	.ooblen = 64,
>    	.usr_data = 0x10000000,
>    	.usr_oob = 0x20000000,
>    	.mode = MTD_OPS_RAW,
>    };
>
> 2. The above request is passed through to mtd_write_oob() verbatim:
>
>    struct mtd_oob_ops ops = {
>    	.mode = MTD_OPS_RAW,
>    	.len = 262144,
>    	.ooblen = 64,
>    	.datbuf = 0x10000000,
>    	.oobbuf = 0x20000000,
>    };
>
> 3. nand_do_write_ops() splits this request up into page-sized requests.
>
>    a) For the first page, the initial 2048 bytes of data + 64 bytes of
>       OOB data are passed to nand_write_page().
>
>    b) For each subsequent page, a 2048-byte chunk of data + 64 bytes
>       of 0xff bytes are passed to nand_write_page().
>
>    Since the oobbuf field in the struct mtd_oob_ops passed is not NULL,
>    oob_required is set to 1 for all nand_write_page() calls.
>
> 4. The above causes the driver to receive 2112 bytes of data for each
>    page write, which results in the ioctl being successful.
>
> Here is what happens with the patched code:
>
> 1. mtdchar_write_ioctl() gets called with the same request as above.
>
> 2. The original request gets split into two eraseblock-sized
>    mtd_write_oob() calls:
>
>    a) struct mtd_oob_ops ops = {
>       	.mode = MTD_OPS_RAW,
>       	.len = 131072,
>       	.ooblen = 64,
>       	.datbuf = 0x10000000,
>       	.oobbuf = 0x20000000,
>       };
>
>    b) struct mtd_oob_ops ops = {
>       	.mode = MTD_OPS_RAW,
>       	.len = 131072,
>       	.ooblen = 0,
>       	.datbuf = 0x10020000,
>       	.oobbuf = NULL,
>       };
>
>    (My code sets oobbuf to NULL if ooblen is 0.)
>
> 3. nand_do_write_ops() splits the first request up into page-sized
>    requests the same way as for the original code. It returns
>    successfully, so mtdchar_write_ioctl() proceeds with the next
>    eraseblock-sized request.
>
> 4. nand_do_write_ops() splits the second request up into page-sized
>    requests. The first page write contains 2048 bytes of data and no
>    OOB data; since the oobbuf field in the struct mtd_oob_ops passed is
>    NULL, oob_required is set to 0.
>
> 5. The above causes the driver to receive 2048 bytes of data for a page
>    write in raw mode, which results in an error that propagates all the
>    way up to mtdchar_write_ioctl().

This is definitely far from an expected behavior. Writing a page
without OOB is completely fine.

> The nandsim driver returns the same error if you pass the following
> request to the MEMWRITE ioctl:
>
>    struct mtd_write_req req = {
>    	.start = 2048,
>    	.len = 2048,
>    	.ooblen = 0,
>    	.usr_data = 0x10000000,
>    	.usr_oob = NULL,
>    	.mode = MTD_OPS_RAW,
>    };
>
> so it is not the driver that is broken or insane, it is the splitting
> process that may cause the MEMWRITE ioctl to return different error
> codes than before.
>
> I played with the code a bit more and I found a fix which addresses this
> issue without breaking other scenarios: setting oobbuf to the same
> pointer for every loop iteration (if ooblen is 0, no OOB data will be
> written anyway).

You mean that
	{ .user_oob = NULL, .ooblen = 0 }
fails, while
	{ .user_oob = random, .ooblen = 0 }
works? This seems a little bit fragile. Could you tell us the origin of
the error? Because in nand_do_write_ops() if ops->oobbuf is populated
then oob_required is set to true no matter the value set in ooblen.

Plus, the code in mtdchar is clear: .oobbuf is set to NULL if there are
no OOBs provided by the user so I believe this is a situation that
should already work.

> I also tackled the problem of mishandling large non-page-aligned writes
> in v1 and I managed to fix it by trimming the first mtd_write_oob() call
> so that it ends on an eraseblock boundary. This implicitly makes
> subsequent writes page-aligned and seems to fix the problem.

Great!

> Finally, I reworked the OOB length adjustment code to address other
> cases of mishandling non-page-aligned writes.
>
> I could not find any other cases in which the revised code behaves
> differently than the original one. I will send v2 soon.

Thanks for all work on this topic!

Cheers,
Miquèl
Hi Miquèl,

> > 5. The above causes the driver to receive 2048 bytes of data for a page
> >    write in raw mode, which results in an error that propagates all the
> >    way up to mtdchar_write_ioctl().
>
> This is definitely far from an expected behavior. Writing a page
> without OOB is completely fine.

Could it be a nandsim quirk? Sorry, I do not feel qualified enough to
comment on this. I prepared a code flow analysis below, though.

> > The nandsim driver returns the same error if you pass the following
> > request to the MEMWRITE ioctl:
> >
> >    struct mtd_write_req req = {
> >    	.start = 2048,
> >    	.len = 2048,
> >    	.ooblen = 0,
> >    	.usr_data = 0x10000000,
> >    	.usr_oob = NULL,
> >    	.mode = MTD_OPS_RAW,
> >    };
> >
> > so it is not the driver that is broken or insane, it is the splitting
> > process that may cause the MEMWRITE ioctl to return different error
> > codes than before.
> >
> > I played with the code a bit more and I found a fix which addresses this
> > issue without breaking other scenarios: setting oobbuf to the same
> > pointer for every loop iteration (if ooblen is 0, no OOB data will be
> > written anyway).
>
> You mean that
> 	{ .user_oob = NULL, .ooblen = 0 }
> fails, while
> 	{ .user_oob = random, .ooblen = 0 }
> works? This seems a little bit fragile.

That is indeed the behavior I am observing with nandsim, even on a
kernel which does not include my patch.

> Could you tell us the origin of the error? Because in
> nand_do_write_ops() if ops->oobbuf is populated then oob_required is
> set to true no matter the value set in ooblen.

Correct - and that is what causes the behavior described above (and why
the tweak I came up with works around the problem).

nand_do_write_ops() calls nand_write_page() with 'oob_required' passed
as the fifth parameter. In raw mode, nand_write_page() calls
nand_write_page_raw(). Here is what happens there:

1. nand_prog_page_begin_op() sets up a page programming operation by
   sending a few commands to the chip. See nand_exec_prog_page_op() for
   details. Note that since the 'prog' parameter is set to false, the
   last two instructions from the 'instrs' array are not run yet.

2. 'oob_required' is checked. If it is set to 1, OOB data is sent to
   the chip; otherwise, it is not sent.

3. nand_prog_page_end_op() is called to finish the programming
   operation. At that point, the ACTION_PRGPAGE switch case in
   ns_do_state_action() (in drivers/mtd/nand/raw/nandsim.c) checks
   whether the number of bytes it received so far for this operation
   (ns->regs.count, updated by ns_nand_write_buf() as data is pushed to
   the chip) equals the number of bytes in a full page with OOB data
   (num). If not, an error is returned, which results in the
   NAND_STATUS_FAIL flag being set in the status byte, triggering an
   -EIO.

This does not happen for any other MTD operation mode because the chip
callbacks that nand_write_page() invokes in those other modes cause OOB
data to be sent to the chip.

> Plus, the code in mtdchar is clear: .oobbuf is set to NULL if there are
> no OOBs provided by the user so I believe this is a situation that
> should already work.

Correct, though current mtdchar_write_ioctl() code only looks at the
value of the 'usr_oob' field in the struct mtd_write_req passed to it,
so even if you pass { .usr_oob = <something non-NULL>, .ooblen = 0 },
it will still set ops.oobbuf to the pointer returned by memdup_user().
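The failure mode in step 3 above can be condensed into a toy model. This is not nandsim code, merely an illustration of the byte-count check under the stated assumptions (2048-byte pages, 64 bytes of OOB); the function name is mine:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the nandsim behavior described above: at program
 * time the simulator compares the byte count pushed to the chip against
 * a full page including OOB. In raw mode, oob_required decides whether
 * the OOB bytes are pushed at all, so a raw page write with
 * oob_required == 0 trips the check. Loosely based on the
 * ACTION_PRGPAGE logic in ns_do_state_action(). */
static bool raw_page_prog_ok(bool oob_required, uint32_t writesize,
                             uint32_t oobsize)
{
    uint32_t received = writesize + (oob_required ? oobsize : 0);

    return received == writesize + oobsize;
}
```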
diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
index 155e991d9d75..a3afc390e254 100644
--- a/drivers/mtd/mtdchar.c
+++ b/drivers/mtd/mtdchar.c
@@ -578,9 +578,10 @@ static int mtdchar_write_ioctl(struct mtd_info *mtd,
 {
 	struct mtd_info *master = mtd_get_master(mtd);
 	struct mtd_write_req req;
-	struct mtd_oob_ops ops = {};
 	const void __user *usr_data, *usr_oob;
-	int ret;
+	uint8_t *datbuf = NULL, *oobbuf = NULL;
+	size_t datbuf_len, oobbuf_len;
+	int ret = 0;
 
 	if (copy_from_user(&req, argp, sizeof(req)))
 		return -EFAULT;
@@ -590,33 +591,79 @@ static int mtdchar_write_ioctl(struct mtd_info *mtd,
 
 	if (!master->_write_oob)
 		return -EOPNOTSUPP;
-	ops.mode = req.mode;
-	ops.len = (size_t)req.len;
-	ops.ooblen = (size_t)req.ooblen;
-	ops.ooboffs = 0;
-
-	if (usr_data) {
-		ops.datbuf = memdup_user(usr_data, ops.len);
-		if (IS_ERR(ops.datbuf))
-			return PTR_ERR(ops.datbuf);
-	} else {
-		ops.datbuf = NULL;
+
+	if (!usr_data)
+		req.len = 0;
+
+	if (!usr_oob)
+		req.ooblen = 0;
+
+	if (req.start + req.len > mtd->size)
+		return -EINVAL;
+
+	datbuf_len = min_t(size_t, req.len, mtd->erasesize);
+	if (datbuf_len > 0) {
+		datbuf = kmalloc(datbuf_len, GFP_KERNEL);
+		if (!datbuf)
+			return -ENOMEM;
 	}
 
-	if (usr_oob) {
-		ops.oobbuf = memdup_user(usr_oob, ops.ooblen);
-		if (IS_ERR(ops.oobbuf)) {
-			kfree(ops.datbuf);
-			return PTR_ERR(ops.oobbuf);
+	oobbuf_len = min_t(size_t, req.ooblen, mtd->erasesize);
+	if (oobbuf_len > 0) {
+		oobbuf = kmalloc(oobbuf_len, GFP_KERNEL);
+		if (!oobbuf) {
+			kfree(datbuf);
+			return -ENOMEM;
 		}
-	} else {
-		ops.oobbuf = NULL;
 	}
 
-	ret = mtd_write_oob(mtd, (loff_t)req.start, &ops);
+	while (req.len > 0 || (!usr_data && req.ooblen > 0)) {
+		struct mtd_oob_ops ops = {
+			.mode = req.mode,
+			.len = min_t(size_t, req.len, datbuf_len),
+			.ooblen = min_t(size_t, req.ooblen, oobbuf_len),
+			.datbuf = req.len ? datbuf : NULL,
+			.oobbuf = req.ooblen ? oobbuf : NULL,
+		};
 
-	kfree(ops.datbuf);
-	kfree(ops.oobbuf);
+		/*
+		 * For writes which are not OOB-only, adjust the amount of OOB
+		 * data written according to the number of data pages written.
+		 * This is necessary to prevent OOB data from being skipped
+		 * over in data+OOB writes requiring multiple mtd_write_oob()
+		 * calls to be completed.
+		 */
+		if (ops.len > 0 && ops.ooblen > 0) {
+			u32 oob_per_page = mtd_oobavail(mtd, &ops);
+			uint32_t pages_to_write = mtd_div_by_ws(ops.len, mtd);
+
+			if (mtd_mod_by_ws(req.start + ops.len, mtd))
+				pages_to_write++;
+
+			ops.ooblen = min_t(size_t, ops.ooblen,
+					   pages_to_write * oob_per_page);
+		}
+
+		if (copy_from_user(datbuf, usr_data, ops.len) ||
+		    copy_from_user(oobbuf, usr_oob, ops.ooblen)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		ret = mtd_write_oob(mtd, req.start, &ops);
+		if (ret)
+			break;
+
+		req.start += ops.retlen;
+		req.len -= ops.retlen;
+		usr_data += ops.retlen;
+
+		req.ooblen -= ops.oobretlen;
+		usr_oob += ops.oobretlen;
+	}
+
+	kfree(datbuf);
+	kfree(oobbuf);
 
 	return ret;
 }
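The per-iteration OOB clamp inside the loop above can be exercised in isolation. The sketch below substitutes plain division and modulo for mtd_div_by_ws()/mtd_mod_by_ws() and takes mtd_oobavail()'s result as a parameter; the function name and MIN macro are illustrative, not kernel API:

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Stand-alone model of the patch's OOB clamp for one loop iteration:
 * count the pages the data chunk touches (including a trailing partial
 * page) and cap the OOB length at oob_per_page bytes per page. */
static uint64_t clamp_ooblen(uint64_t start, uint64_t data_len,
                             uint64_t ooblen, uint32_t writesize,
                             uint32_t oob_per_page)
{
    uint64_t pages_to_write = data_len / writesize;  /* mtd_div_by_ws() */

    if ((start + data_len) % writesize)              /* mtd_mod_by_ws() */
        pages_to_write++;

    return MIN(ooblen, pages_to_write * oob_per_page);
}
```

For a page-aligned one-page write (2048 data bytes) with 200 OOB bytes requested and 64 OOB bytes available per page, the clamp trims the OOB length to 64; shifting the same write off alignment by 1024 bytes makes it touch two pages, so up to 128 OOB bytes are allowed.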
In the mtdchar_write_ioctl() function, memdup_user() is called with its
'len' parameter set to verbatim values provided by user space via a
struct mtd_write_req. Both the 'len' and 'ooblen' fields of that
structure are 64-bit unsigned integers, which means the MEMWRITE ioctl
can trigger unbounded kernel memory allocation requests.

Fix by iterating over the buffers provided by user space in a loop,
processing at most mtd->erasesize bytes in each iteration. Adopt some
checks from mtd_check_oob_ops() to retain backward user space
compatibility.

Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Michał Kępień <kernel@kempniu.pl>
---
Fixing this problem was suggested last month, during a discussion about
a new MEMREAD ioctl. [1] [2]

My primary objective was to not break user space, i.e. to not change
externally visible behavior compared to the current code. The main
problem I faced was that splitting the input data into chunks makes the
MEMWRITE ioctl a _wrapper_ for mtd_write_oob() rather than a direct
_interface_ to it and yet from the user space perspective it still needs
to behave as if nothing changed.

Despite my efforts, the patch does _not_ retain absolute backward
compatibility and I do not know whether this is acceptable.
Specifically, multi-eraseblock-sized writes (requiring multiple loop
iterations to be processed) which succeeded with the original code _may_
now return errors and/or write different contents to the device than the
original code, depending on the MTD mode of operation requested and on
whether the start offset is page-aligned. The documentation for struct
mtd_oob_ops warns about even multi-page write requests, but...

Example:

MTD device parameters:

- mtd->writesize = 2048
- mtd->erasesize = 131072
- 64 bytes of raw OOB space per page

struct mtd_write_req req = {
	.start = 2048,
	.len = 262144,
	.ooblen = 64,
	.usr_data = ...,
	.usr_oob = ...,
	.mode = MTD_OPS_RAW,
};

(This is a 128-page write with OOB data supplied for just one page.)

Current mtdchar_write_ioctl() returns 0 for this request and writes
128 pages of data and 1 page's worth of OOB data (64 bytes) to the
MTD device.

Patched mtdchar_write_ioctl() may return an error because the
original request gets split into two chunks (<data_len, oob_len>):

<131072, 64>
<131072, 0>

and the second chunk with zero OOB data length may make the MTD
driver unhappy in raw mode (resulting in only the first 64 pages of
data and 1 page's worth of OOB data getting written to the MTD
device).

Is an ioctl like this considered a "degenerate" one and therefore
acceptable to break or not?

At any rate, the revised code feels brittle to me and I would not be
particularly surprised if I missed more cases in which it produces
different results than the original code.

I keep on wondering whether the benefits of this change are worth the
extra code complexity, but fortunately it is not my call to make :)
Perhaps I am missing something and my proposal can be simplified? Or
maybe the way I approached this is completely wrong? Any thoughts on
this are welcome.

As the outcome of the discussion around this patch will have a
significant influence on the shape of the v2 of the MEMREAD ioctl, I
decided to submit this one first as a standalone patch.

[1] https://lists.infradead.org/pipermail/linux-mtd/2021-September/088485.html
[2] https://lists.infradead.org/pipermail/linux-mtd/2021-September/088544.html

 drivers/mtd/mtdchar.c | 93 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 70 insertions(+), 23 deletions(-)
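The core of the commit is clamping user-controlled lengths before allocating. A minimal stand-alone model of that bound (min_t(size_t, ...) replaced by a plain ternary; the function name is illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the bounded-allocation fix: the user-supplied 64-bit length
 * is clamped to one eraseblock before the bounce buffer is allocated,
 * so req.len can no longer drive an arbitrarily large kmalloc(). */
static size_t bounded_buf_len(uint64_t req_len, uint32_t erasesize)
{
    return (size_t)(req_len < erasesize ? req_len : erasesize);
}
```

Even a pathological request length (e.g. 1 TiB) now yields at most one eraseblock-sized allocation, while short requests are allocated exactly.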