Message ID | 20190303151846.GQ2217@ZenIV.linux.org.uk |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
Series | aio: prevent the final fput() in the middle of vfs_poll() (Re: KASAN: use-after-free Read in unix_dgram_poll) | expand |
On 03/03/2019 07:18 AM, Al Viro wrote: > Fixes: bfe4037e722ec > Cc: stable@vger.kernel.org > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > --- > diff --git a/fs/aio.c b/fs/aio.c > index 3083180a54c8..7e88bfabdac2 100644 > --- a/fs/aio.c > +++ b/fs/aio.c > @@ -1767,6 +1767,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) > > /* one for removal from waitqueue, one for this function */ > refcount_set(&aiocb->ki_refcnt, 2); > + get_file(req->file); > > mask = vfs_poll(req->file, &apt.pt) & req->events; > if (unlikely(!req->head)) { > @@ -1793,6 +1794,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) > spin_unlock_irq(&ctx->ctx_lock); > > out: > + fput(req->file); > if (unlikely(apt.error)) { > fput(req->file); > return apt.error; > Very nice changelog Al, thanks for fixing this. Reviewed-by: Eric Dumazet <edumazet@google.com>
On Sun, Mar 3, 2019 at 7:18 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > Maybe unrelated to this bug, but... What's to prevent a wakeup > > that happens just after we'd been added to a waitqueue by ->poll() > > triggering aio_poll_wake(), which gets to aio_poll_complete() > > with its fput() *before* we'd reached the end of ->poll() instance? I'm assuming you're talking about the second vfs_poll() in aio_poll_complete_work()? The one we call before we check for "rew->cancelled" properly under the spinlock? > 1) io_submit(2) allocates aio_kiocb instance and passes it to aio_poll() > 2) aio_poll() resolves the descriptor to struct file by > req->file = fget(iocb->aio_fildes) [...] So honestly, the whole filp handling in aio looks overly complicated to me. All the different operations do that silly fget/fput() dance, although aio_read/write at least tried to make a common helper function for handling it. But as far as I can tell, they *all* could do: - look up file in aio_submit() when allocating and creating the aio_kiocb - drop the filp in 'iocb_put()' (which happens whether things complete successfully or not). and we'd have avoided a lot of complexity, and we'd have avoided this bug. Your patch fixes the poll() case, but it does so by just letting the existing complexity remain, and adding a second fget/fput pair in the poll logic. It would seem like it would be much better to rip all the complexity out entirely, and replace it with sane, simple and obviously correct code. Hmm? In other words, why wouldn't something like the attached work instead? TOTALLY UNTESTED! It builds, and it looks sane, but maybe I'm overlooking some obvious problem with it. But doesn't it look nice to see 2 files changed, 41 insertions(+), 50 deletions(-) with actual code reduction, and a fundamental simplification in handling of the file pointer? Linus fs/aio.c | 83 ++++++++++++++++++++++-------------------------------- include/linux/fs.h | 8 +++++- 2 files changed, 41 insertions(+), 50 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index aaaaf4d12c73..9eccd2ea6ca9 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -167,14 +167,18 @@ struct kioctx { unsigned id; }; +/* + * First field must be 'ki_filp' in all the + * iocb unions! + */ struct fsync_iocb { + struct file *ki_filp; struct work_struct work; - struct file *file; bool datasync; }; struct poll_iocb { - struct file *file; + struct file *ki_filp; struct wait_queue_head *head; __poll_t events; bool woken; @@ -183,8 +187,15 @@ struct poll_iocb { struct work_struct work; }; +/* + * NOTE! Each of the iocb union members has "ki_filp" as + * the first entry in their struct definition. So you can + * access the file pointer either directly through this + * anonymous union, or through any of the sub-structs. + */ struct aio_kiocb { union { + struct file *ki_filp; struct kiocb rw; struct fsync_iocb fsync; struct poll_iocb poll; @@ -1060,6 +1071,7 @@ static inline void iocb_put(struct aio_kiocb *iocb) { if (refcount_read(&iocb->ki_refcnt) == 0 || refcount_dec_and_test(&iocb->ki_refcnt)) { + fput(iocb->ki_filp); percpu_ref_put(&iocb->ki_ctx->reqs); kmem_cache_free(kiocb_cachep, iocb); } @@ -1413,7 +1425,7 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) aio_remove_iocb(iocb); if (kiocb->ki_flags & IOCB_WRITE) { - struct inode *inode = file_inode(kiocb->ki_filp); + struct inode *inode = file_inode(iocb->ki_filp); /* * Tell lockdep we inherited freeze protection from submission @@ -1421,10 +1433,9 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) */ if (S_ISREG(inode->i_mode)) __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE); - file_end_write(kiocb->ki_filp); + file_end_write(iocb->ki_filp); } - fput(kiocb->ki_filp); aio_complete(iocb, res, res2); } @@ -1432,9 +1443,6 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) { int ret; - req->ki_filp = fget(iocb->aio_fildes); - if (unlikely(!req->ki_filp)) - return -EBADF; req->ki_complete = aio_complete_rw; req->private = NULL; req->ki_pos = iocb->aio_offset; @@ -1451,7 +1459,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) ret = ioprio_check_cap(iocb->aio_reqprio); if (ret) { pr_debug("aio ioprio check cap error: %d\n", ret); - goto out_fput; + return ret; } req->ki_ioprio = iocb->aio_reqprio; @@ -1460,14 +1468,10 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags); if (unlikely(ret)) - goto out_fput; + return ret; req->ki_flags &= ~IOCB_HIPRI; /* no one is going to poll for this I/O */ return 0; - -out_fput: - fput(req->ki_filp); - return ret; } static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec, @@ -1521,24 +1525,19 @@ static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, if (ret) return ret; file = req->ki_filp; - - ret = -EBADF; if (unlikely(!(file->f_mode & FMODE_READ))) - goto out_fput; + return -EBADF; ret = -EINVAL; if (unlikely(!file->f_op->read_iter)) - goto out_fput; + return -EINVAL; ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter); if (ret) - goto out_fput; + return ret; ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter)); if (!ret) aio_rw_done(req, call_read_iter(file, req, &iter)); kfree(iovec); -out_fput: - if (unlikely(ret)) - fput(file); return ret; } @@ -1555,16 +1554,14 @@ static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb, return ret; file = req->ki_filp; - ret = -EBADF; if (unlikely(!(file->f_mode & FMODE_WRITE))) - goto out_fput; - ret = -EINVAL; + return -EBADF; if (unlikely(!file->f_op->write_iter)) - goto out_fput; + return -EINVAL; ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter); if (ret) - goto out_fput; + return ret; ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter)); if (!ret) { /* @@ -1582,9 +1579,6 @@ static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb, aio_rw_done(req, call_write_iter(file, req, &iter)); } kfree(iovec); -out_fput: - if (unlikely(ret)) - fput(file); return ret; } @@ -1593,8 +1587,7 @@ static void aio_fsync_work(struct work_struct *work) struct fsync_iocb *req = container_of(work, struct fsync_iocb, work); int ret; - ret = vfs_fsync(req->file, req->datasync); - fput(req->file); + ret = vfs_fsync(req->ki_filp, req->datasync); aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0); } @@ -1605,13 +1598,8 @@ static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb, iocb->aio_rw_flags)) return -EINVAL; - req->file = fget(iocb->aio_fildes); - if (unlikely(!req->file)) - return -EBADF; - if (unlikely(!req->file->f_op->fsync)) { - fput(req->file); + if (unlikely(!req->ki_filp->f_op->fsync)) return -EINVAL; - } req->datasync = datasync; INIT_WORK(&req->work, aio_fsync_work); @@ -1621,10 +1609,7 @@ static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb, static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask) { - struct file *file = iocb->poll.file; - aio_complete(iocb, mangle_poll(mask), 0); - fput(file); } static void aio_poll_complete_work(struct work_struct *work) @@ -1636,7 +1621,7 @@ static void aio_poll_complete_work(struct work_struct *work) __poll_t mask = 0; if (!READ_ONCE(req->cancelled)) - mask = vfs_poll(req->file, &pt) & req->events; + mask = vfs_poll(iocb->ki_filp, &pt) & req->events; /* * Note that ->ki_cancel callers also delete iocb from active_reqs after @@ -1743,9 +1728,6 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) INIT_WORK(&req->work, aio_poll_complete_work); req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP; - req->file = fget(iocb->aio_fildes); - if (unlikely(!req->file)) - return -EBADF; req->head = NULL; req->woken = false; @@ -1763,7 +1745,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) /* one for removal from waitqueue, one for this function */ refcount_set(&aiocb->ki_refcnt, 2); - mask = vfs_poll(req->file, &apt.pt) & req->events; + mask = vfs_poll(aiocb->ki_filp, &apt.pt) & req->events; if (unlikely(!req->head)) { /* we did not manage to set up a waitqueue, done */ goto out; @@ -1788,10 +1770,8 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) spin_unlock_irq(&ctx->ctx_lock); out: - if (unlikely(apt.error)) { - fput(req->file); + if (unlikely(apt.error)) return apt.error; - } if (mask) aio_poll_complete(aiocb, mask); @@ -1829,6 +1809,11 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, if (unlikely(!req)) goto out_put_reqs_available; + req->ki_filp = fget(iocb->aio_fildes); + ret = -EBADF; + if (unlikely(!req->ki_filp)) + goto out_put_req; + if (iocb->aio_flags & IOCB_FLAG_RESFD) { /* * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an diff --git a/include/linux/fs.h b/include/linux/fs.h index 29d8e2cfed0e..fd423fec8d83 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -304,13 +304,19 @@ enum rw_hint { struct kiocb { struct file *ki_filp; + + /* The 'ki_filp' pointer is shared in a union for aio */ + randomized_struct_fields_start + loff_t ki_pos; void (*ki_complete)(struct kiocb *iocb, long ret, long ret2); void *private; int ki_flags; u16 ki_hint; u16 ki_ioprio; /* See linux/ioprio.h */ -} __randomize_layout; + + randomized_struct_fields_end +}; static inline bool is_sync_kiocb(struct kiocb *kiocb) {
On Sun, Mar 3, 2019 at 11:44 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > But doesn't it look nice to see > > 2 files changed, 41 insertions(+), 50 deletions(-) > > with actual code reduction, and a fundamental simplification in > handling of the file pointer? A coupl,e of the changes are "useless", and do the same thing as not having them at all: - struct inode *inode = file_inode(kiocb->ki_filp); + struct inode *inode = file_inode(iocb->ki_filp); - file_end_write(kiocb->ki_filp); + file_end_write(iocb->ki_filp); because the "ki_filp" ends up existing in both kiocb and iocb. At one point of editing that file I decided to try to just remove it from the sub-structs entirely and only keep it in the top-level structure, but it needs to be inside the 'struct kiocb' anyway for all the other users outside of fs/aio.c. Anyway, I don't think the patch is wrong (although I haven't actually _tested_ it) but I wanted to point out that those two one-liner changes are just "noise" that doesn't matter for the working of the patch. In the above, we have 'kiocb' being the embedded 'struct kiocb', and 'iocb' is the 'struct aio_kiocb' that contains it. 'ki_filp' is the exact same field in both cases. Linus Linus
On Sun, Mar 03, 2019 at 11:44:33AM -0800, Linus Torvalds wrote: > On Sun, Mar 3, 2019 at 7:18 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > Maybe unrelated to this bug, but... What's to prevent a wakeup > > > that happens just after we'd been added to a waitqueue by ->poll() > > > triggering aio_poll_wake(), which gets to aio_poll_complete() > > > with its fput() *before* we'd reached the end of ->poll() instance? > > I'm assuming you're talking about the second vfs_poll() in > aio_poll_complete_work()? The one we call before we check for > "rew->cancelled" properly under the spinlock? No. The first one, right in aio_poll(). > But as far as I can tell, they *all* could do: > > - look up file in aio_submit() when allocating and creating the aio_kiocb > > - drop the filp in 'iocb_put()' (which happens whether things > complete successfully or not). > > and we'd have avoided a lot of complexity, and we'd have avoided this bug. > > Your patch fixes the poll() case, but it does so by just letting the > existing complexity remain, and adding a second fget/fput pair in the > poll logic. > > It would seem like it would be much better to rip all the complexity > out entirely, and replace it with sane, simple and obviously correct > code. > > Hmm? > > In other words, why wouldn't something like the attached work instead? I'll need to dig out the mail archive from last year, but IIRC this had been considered and there'd been non-trivial problems. Give me an hour or so and I'll dig that out (there'd been many rounds of review, with long threads involved, some private, some on fsdevel). > @@ -1060,6 +1071,7 @@ static inline void iocb_put(struct aio_kiocb *iocb) > { > if (refcount_read(&iocb->ki_refcnt) == 0 || > refcount_dec_and_test(&iocb->ki_refcnt)) { > + fput(iocb->ki_filp); Trivial oops here - it might be NULL.
On Sun, Mar 3, 2019 at 12:30 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Sun, Mar 03, 2019 at 11:44:33AM -0800, Linus Torvalds wrote: > > > > I'm assuming you're talking about the second vfs_poll() in > > aio_poll_complete_work()? The one we call before we check for > > "rew->cancelled" properly under the spinlock? > > No. The first one, right in aio_poll(). Ok, they are both confusing. The lifetime of that filp is just not clear, with the whole "it could have been completed asynchronously" issue. > I'll need to dig out the mail archive from last year, but IIRC this > had been considered and there'd been non-trivial problems. Give me > an hour or so and I'll dig that out (there'd been many rounds of > review, with long threads involved, some private, some on fsdevel). Looking more at the patch, it still looks eminently sane to me.I fixed the silly "ki_filp" thing you noticed (I thought we made fput() take NULL, but you're right we don't), and simplified the patch to not do the changes that aren't necessary. I fixed the silly "filp can be NULL" issue you pointed out (I lazily thought we allowed fput(NULL), but you're right, we definitely don't), and simplified the patch to not do the unnecessary changes where we can just access the file pointer multiple different ways. And the resulting kernel boots fine, but I doubt I have anything that actually uses io_submit(), so that doesn't mean much. Slightly updated patch attached anyway, even smaller than before: 2 files changed, 36 insertions(+), 44 deletions(-) with several of the new lines being comments about that file pointer placement in the union. Linus fs/aio.c | 72 ++++++++++++++++++++++-------------------------------- include/linux/fs.h | 8 +++++- 2 files changed, 36 insertions(+), 44 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index aaaaf4d12c73..82c08422b0f4 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -167,9 +167,13 @@ struct kioctx { unsigned id; }; +/* + * First field must be the file pointer in all the + * iocb unions! See also 'struct kiocb' in <linux/fs.h> + */ struct fsync_iocb { - struct work_struct work; struct file *file; + struct work_struct work; bool datasync; }; @@ -183,8 +187,15 @@ struct poll_iocb { struct work_struct work; }; +/* + * NOTE! Each of the iocb union members has the file pointer + * as the first entry in their struct definition. So you can + * access the file pointer through any of the sub-structs, + * or directly as just 'ki_filp' in this struct. + */ struct aio_kiocb { union { + struct file *ki_filp; struct kiocb rw; struct fsync_iocb fsync; struct poll_iocb poll; @@ -1060,6 +1071,8 @@ static inline void iocb_put(struct aio_kiocb *iocb) { if (refcount_read(&iocb->ki_refcnt) == 0 || refcount_dec_and_test(&iocb->ki_refcnt)) { + if (iocb->ki_filp) + fput(iocb->ki_filp); percpu_ref_put(&iocb->ki_ctx->reqs); kmem_cache_free(kiocb_cachep, iocb); } @@ -1424,7 +1437,6 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) file_end_write(kiocb->ki_filp); } - fput(kiocb->ki_filp); aio_complete(iocb, res, res2); } @@ -1432,9 +1444,6 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) { int ret; - req->ki_filp = fget(iocb->aio_fildes); - if (unlikely(!req->ki_filp)) - return -EBADF; req->ki_complete = aio_complete_rw; req->private = NULL; req->ki_pos = iocb->aio_offset; @@ -1451,7 +1460,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) ret = ioprio_check_cap(iocb->aio_reqprio); if (ret) { pr_debug("aio ioprio check cap error: %d\n", ret); - goto out_fput; + return ret; } req->ki_ioprio = iocb->aio_reqprio; @@ -1460,14 +1469,10 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags); if (unlikely(ret)) - goto out_fput; + return ret; req->ki_flags &= ~IOCB_HIPRI; /* no one is going to poll for this I/O */ return 0; - -out_fput: - fput(req->ki_filp); - return ret; } static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec, @@ -1521,24 +1526,19 @@ static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, if (ret) return ret; file = req->ki_filp; - - ret = -EBADF; if (unlikely(!(file->f_mode & FMODE_READ))) - goto out_fput; + return -EBADF; ret = -EINVAL; if (unlikely(!file->f_op->read_iter)) - goto out_fput; + return -EINVAL; ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter); if (ret) - goto out_fput; + return ret; ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter)); if (!ret) aio_rw_done(req, call_read_iter(file, req, &iter)); kfree(iovec); -out_fput: - if (unlikely(ret)) - fput(file); return ret; } @@ -1555,16 +1555,14 @@ static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb, return ret; file = req->ki_filp; - ret = -EBADF; if (unlikely(!(file->f_mode & FMODE_WRITE))) - goto out_fput; - ret = -EINVAL; + return -EBADF; if (unlikely(!file->f_op->write_iter)) - goto out_fput; + return -EINVAL; ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter); if (ret) - goto out_fput; + return ret; ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter)); if (!ret) { /* @@ -1582,9 +1580,6 @@ static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb, aio_rw_done(req, call_write_iter(file, req, &iter)); } kfree(iovec); -out_fput: - if (unlikely(ret)) - fput(file); return ret; } @@ -1594,7 +1589,6 @@ static void aio_fsync_work(struct work_struct *work) int ret; ret = vfs_fsync(req->file, req->datasync); - fput(req->file); aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0); } @@ -1605,13 +1599,8 @@ static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb, iocb->aio_rw_flags)) return -EINVAL; - req->file = fget(iocb->aio_fildes); - if (unlikely(!req->file)) - return -EBADF; - if (unlikely(!req->file->f_op->fsync)) { - fput(req->file); + if (unlikely(!req->file->f_op->fsync)) return -EINVAL; - } req->datasync = datasync; INIT_WORK(&req->work, aio_fsync_work); @@ -1621,10 +1610,7 @@ static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb, static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask) { - struct file *file = iocb->poll.file; - aio_complete(iocb, mangle_poll(mask), 0); - fput(file); } static void aio_poll_complete_work(struct work_struct *work) @@ -1743,9 +1729,6 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) INIT_WORK(&req->work, aio_poll_complete_work); req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP; - req->file = fget(iocb->aio_fildes); - if (unlikely(!req->file)) - return -EBADF; req->head = NULL; req->woken = false; @@ -1788,10 +1771,8 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) spin_unlock_irq(&ctx->ctx_lock); out: - if (unlikely(apt.error)) { - fput(req->file); + if (unlikely(apt.error)) return apt.error; - } if (mask) aio_poll_complete(aiocb, mask); @@ -1829,6 +1810,11 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, if (unlikely(!req)) goto out_put_reqs_available; + req->ki_filp = fget(iocb->aio_fildes); + ret = -EBADF; + if (unlikely(!req->ki_filp)) + goto out_put_req; + if (iocb->aio_flags & IOCB_FLAG_RESFD) { /* * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an diff --git a/include/linux/fs.h b/include/linux/fs.h index 29d8e2cfed0e..fd423fec8d83 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -304,13 +304,19 @@ enum rw_hint { struct kiocb { struct file *ki_filp; + + /* The 'ki_filp' pointer is shared in a union for aio */ + randomized_struct_fields_start + loff_t ki_pos; void (*ki_complete)(struct kiocb *iocb, long ret, long ret2); void *private; int ki_flags; u16 ki_hint; u16 ki_ioprio; /* See linux/ioprio.h */ -} __randomize_layout; + + randomized_struct_fields_end +}; static inline bool is_sync_kiocb(struct kiocb *kiocb) {
On Sun, Mar 03, 2019 at 02:23:33PM -0800, Linus Torvalds wrote: OK, having dug through the archives, the reasons were not strong. So that part is OK... > @@ -1060,6 +1071,8 @@ static inline void iocb_put(struct aio_kiocb *iocb) > { > if (refcount_read(&iocb->ki_refcnt) == 0 || > refcount_dec_and_test(&iocb->ki_refcnt)) { > + if (iocb->ki_filp) > + fput(iocb->ki_filp); > percpu_ref_put(&iocb->ki_ctx->reqs); > kmem_cache_free(kiocb_cachep, iocb); > } That reminds me - refcount_t here is rather ridiculous; what we have is * everything other than aio_poll: ki_refcnt is 0 all along * aio_poll: originally 0, then set to 2, then two iocb_put() are done (either both synchronous to aio_poll(), or one sync and one async). That's not a refcount at all. It's a flag, set only for aio_poll ones. And that test in iocb_put() is "if flag is set, clear it and bugger off". What's worse, AFAICS the callers in aio_poll() are buggered (see below) > static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask) > { > - struct file *file = iocb->poll.file; > - > aio_complete(iocb, mangle_poll(mask), 0); > - fput(file); > } No reasons to keep that function at all now... > - if (unlikely(apt.error)) { > - fput(req->file); > + if (unlikely(apt.error)) > return apt.error; > - } > > if (mask) > aio_poll_complete(aiocb, mask); Looking at that thing... How does it manage to avoid leaks when we try to use it on e.g. /dev/tty, which has poll_wait(file, &tty->read_wait, wait); poll_wait(file, &tty->write_wait, wait); in n_tty_poll()? AFAICS, we'll succeed adding it to the first queue, then have aio_poll_queue_proc() fail and set apt.error to -EINVAL. Suppose we are looking for EPOLLIN and there's nothing ready to read. We'll go mask = vfs_poll(req->file, &apt.pt) & req->events; mask is 0. if (unlikely(!req->head)) { nope - it's &tty->read_wait, not NULL /* we did not manage to set up a waitqueue, done */ goto out; } spin_lock_irq(&ctx->ctx_lock); spin_lock(&req->head->lock); if (req->woken) { nope - no wakeups so far /* wake_up context handles the rest */ mask = 0; apt.error = 0; } else if (mask || apt.error) { apt.error is non-zero /* if we get an error or a mask we are done */ WARN_ON_ONCE(list_empty(&req->wait.entry)); list_del_init(&req->wait.entry); OK, removed from queue } else { /* actually waiting for an event */ list_add_tail(&aiocb->ki_list, &ctx->active_reqs); aiocb->ki_cancel = aio_poll_cancel; } spin_unlock(&req->head->lock); spin_unlock_irq(&ctx->ctx_lock); out: if (unlikely(apt.error)) { fput(req->file); return apt.error; ... and we return -EINVAL to __io_submit_one(), where we hit /* * If ret is 0, we'd either done aio_complete() ourselves or have * arranged for that to be done asynchronously. Anything non-zero * means that we need to destroy req ourselves. */ if (ret) goto out_put_req; return 0; out_put_req: if (req->ki_eventfd) eventfd_ctx_put(req->ki_eventfd); iocb_put(req); out_put_reqs_available: put_reqs_available(ctx, 1); return ret; and all knowledge of req is lost. But we'd done only *one* iocb_put() in that case, and ->ki_refcnt had been set to 2 by aio_poll(). How could it avoid a leak? The same goes for "->poll() returns something without bothering to call poll_wait()" case, actually... IOW, I would rather have aio_poll() (along with your fput-a-bit-later change) do this - out: if (mask && !apt.error) aio_complete(aiocb, mangle_poll(mask), 0); iocb_put(aiocb); return apt.error; Comments?
On Sun, Mar 3, 2019 at 4:19 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Sun, Mar 03, 2019 at 01:55:02PM +0000, Al Viro wrote: > > > Maybe unrelated to this bug, but... What's to prevent a wakeup > > that happens just after we'd been added to a waitqueue by ->poll() > > triggering aio_poll_wake(), which gets to aio_poll_complete() > > with its fput() *before* we'd reached the end of ->poll() instance? > > > > I wonder if adding > > get_file(req->file); // make sure that early completion is safe > > right after > > req->file = fget(iocb->aio_fildes); > > if (unlikely(!req->file)) > > return -EBADF; > > paired with > > fput(req->file); > > right after out: in aio_poll() is needed... Am I missing something > > obvious here? Christoph? > > In more details - normally IOCB_CMD_POLL handling looks so: > > 1) io_submit(2) allocates aio_kiocb instance and passes it to aio_poll() > 2) aio_poll() resolves the descriptor to struct file by > req->file = fget(iocb->aio_fildes) > 3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that > aio_kiocb to 2 (bumps by 1, that is). > 4) aio_poll() calls vfs_poll(). After sanity checks (basically, "poll_wait() > had been called and only once") it locks the queue. That's what the extra > reference to iocb had been for - we know we can safely access it. > 5) With queue locked, we check if ->woken has already been set to true > (by aio_poll_wake()) and, if it had been, we unlock the queue, drop > a reference to aio_kiocb and bugger off - at that point it's > a responsibility to aio_poll_wake() and the stuff called/scheduled by > it. That code will drop the reference to file in req->file, along > with the other reference to our aio_kiocb. > 6) otherwise, we see whether we need to wait. If we do, we unlock > the queue, drop one reference to aio_kiocb and go away - eventual > wakeup (or cancel) will deal with the reference to file and with the > other reference to aio_kiocb > 7) otherwise we remove ourselves from waitqueue (still under the queue > lock), so that wakeup won't get us. No async activity will be happening, > so we can safely drop req->file and iocb ourselves. > > If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb > won't get freed under us, so we can do all the checks and locking safely. > And we don't touch ->file if we detect that case. > > However, vfs_poll() most certainly *does* touch the file it had been > given. So wakeup coming while we are still in ->poll() might end up > doing fput() on that file. That case is not too rare, and usually > we are saved by the still present reference from descriptor table - > that fput() is not the final one. But if another thread closes that > descriptor right after our fget() and wakeup does happen before > ->poll() returns, we are in trouble - final fput() done while we > are in the middle of a method. > > What we need is to hold on to the file reference the same way we do with > aio_kiocb. No need to store the reference to what we'd got in a separate > variable - req->file is never reassigned and we'd already made sure that > req won't be freed under us, so dropping the extra reference with > fput(req->file) is fine in all cases. > > Fixes: bfe4037e722ec > Cc: stable@vger.kernel.org > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Please add the Reported-by tag from the report. If you don't add it, then either: 1. somebody (you) will need to remember to later go and associate fix the the report with "#syz fix" command 2. or the bug will stay open and syzbot will never report use-after-frees in unix_dgram_poll again (it's still the same already reported bug) 3. or worse somebody will spend time re-debugging this bug just to find that you already fixed it but did not include the tag Thanks > --- > diff --git a/fs/aio.c b/fs/aio.c > index 3083180a54c8..7e88bfabdac2 100644 > --- a/fs/aio.c > +++ b/fs/aio.c > @@ -1767,6 +1767,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) > > /* one for removal from waitqueue, one for this function */ > refcount_set(&aiocb->ki_refcnt, 2); > + get_file(req->file); > > mask = vfs_poll(req->file, &apt.pt) & req->events; > if (unlikely(!req->head)) { > @@ -1793,6 +1794,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) > spin_unlock_irq(&ctx->ctx_lock); > > out: > + fput(req->file); > if (unlikely(apt.error)) { > fput(req->file); > return apt.error; > > -- > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20190303151846.GQ2217%40ZenIV.linux.org.uk. > For more options, visit https://groups.google.com/d/optout.
On Sun, Mar 3, 2019 at 6:36 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > OK, having dug through the archives, the reasons were not strong. > So that part is OK... I've committed the patch. However, I didn't actually do the separate and independent cleanups: > > @@ -1060,6 +1071,8 @@ static inline void iocb_put(struct aio_kiocb *iocb) > > { > > if (refcount_read(&iocb->ki_refcnt) == 0 || > > refcount_dec_and_test(&iocb->ki_refcnt)) { > > + if (iocb->ki_filp) > > + fput(iocb->ki_filp); > > percpu_ref_put(&iocb->ki_ctx->reqs); > > kmem_cache_free(kiocb_cachep, iocb); > > } > > That reminds me - refcount_t here is rather ridiculous; what we > have is > * everything other than aio_poll: ki_refcnt is 0 all along > * aio_poll: originally 0, then set to 2, then two iocb_put() > are done (either both synchronous to aio_poll(), or one sync and one > async). > > That's not a refcount at all. It's a flag, set only for aio_poll ones. > And that test in iocb_put() is "if flag is set, clear it and bugger off". > > What's worse, AFAICS the callers in aio_poll() are buggered (see below) Ok. Suggestions? > > - if (unlikely(apt.error)) { > > - fput(req->file); > > + if (unlikely(apt.error)) > > return apt.error; > > - } > > > > if (mask) > > aio_poll_complete(aiocb, mask); > > Looking at that thing... How does it manage to avoid leaks > when we try to use it on e.g. /dev/tty, which has > poll_wait(file, &tty->read_wait, wait); > poll_wait(file, &tty->write_wait, wait); > in n_tty_poll()? I don't think that's he only case that uses multiple poll entries. It's now the poll() machinery is designed to be used, after all. Although maybe everybody ends up using just a single wait-queue.. > AFAICS, we'll succeed adding it to the first queue, then have > aio_poll_queue_proc() fail and set apt.error to -EINVAL. Yeah, that's bogus, but documented. I guess nobody really expects to use aio_poll on anything but a socket or something? But your refcount leak looks valid. Hmm. So yes, that whole ki_refcnt looks a bit odd and pointless. And apparently buggy too. > Comments? Can we just (a) remove ki_refcnt entirely (b) remove the "iocb_put(aiocb);" from aio_poll()? Every actual successful io submission should end in aio_complete(), or we free the aio iocb in the "out_put_req" error case. There's no point in doing the refcount as you pointed out, and it seems to be actively buggy. Anyway, all of this (and your suggestion to just remove "aio_poll_complete()" and just use "aio_complete()") is independent of the file refcounting part, I feel. Which is why I just committed that patch as-is (84c4e1f89fef "aio: simplify - and fix - fget/fput for io_submit()"). Linus
diff --git a/fs/aio.c b/fs/aio.c index 3083180a54c8..7e88bfabdac2 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1767,6 +1767,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) /* one for removal from waitqueue, one for this function */ refcount_set(&aiocb->ki_refcnt, 2); + get_file(req->file); mask = vfs_poll(req->file, &apt.pt) & req->events; if (unlikely(!req->head)) { @@ -1793,6 +1794,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) spin_unlock_irq(&ctx->ctx_lock); out: + fput(req->file); if (unlikely(apt.error)) { fput(req->file); return apt.error;