diff mbox series

[1/8] aio: make sure file is pinned

Message ID 20190307000316.31133-1-viro@ZenIV.linux.org.uk
State Not Applicable
Delegated to: David Miller

Commit Message

Al Viro March 7, 2019, 12:03 a.m. UTC
From: Al Viro <viro@zeniv.linux.org.uk>

"aio: remove the extra get_file/fput pair in io_submit_one" was
too optimistic - not dereferencing file pointer after e.g.
->write_iter() returns is not enough; that reference might've been
the only thing that kept alive objects that are referenced
*before* the method returns.  Such as inode, for example...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 3 +++
 1 file changed, 3 insertions(+)
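
A minimal userspace sketch of the hazard described in the commit message, assuming toy stand-ins (toy_file, toy_inode, toy_get_file(), toy_fput(), toy_write_iter() are all hypothetical names, not kernel code): the request's file reference is dropped by an early completion while the method is still running, so the submitter takes its own reference around the call to keep the inode alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins for struct inode and struct file; not the kernel types. */
struct toy_inode {
	int i_size;
};

struct toy_file {
	int refcount;
	struct toy_inode *inode;
	bool *freed;		/* set once the last reference is dropped */
};

static struct toy_file *toy_open(int size, bool *freed)
{
	struct toy_file *f = malloc(sizeof(*f));
	struct toy_inode *inode = malloc(sizeof(*inode));

	if (!f || !inode)
		abort();
	inode->i_size = size;
	f->refcount = 1;	/* the reference owned by the request */
	f->inode = inode;
	f->freed = freed;
	*freed = false;
	return f;
}

static struct toy_file *toy_get_file(struct toy_file *f)
{
	f->refcount++;
	return f;
}

static void toy_fput(struct toy_file *f)
{
	if (--f->refcount == 0) {
		*f->freed = true;
		free(f->inode);
		free(f);
	}
}

/*
 * Models a method whose request completes before the method returns:
 * completion drops the request's reference, and only then does the
 * method look at the inode.
 */
static int toy_write_iter(struct toy_file *f)
{
	struct toy_inode *inode = f->inode;

	toy_fput(f);		/* early completion drops the request's ref */
	return inode->i_size;	/* valid only because the caller pinned f */
}

static int toy_submit(struct toy_file *req_file)
{
	struct toy_file *file = toy_get_file(req_file);	/* pin */
	int ret = toy_write_iter(req_file);

	toy_fput(file);		/* unpin; here this is the final put */
	return ret;
}
```

Without the toy_get_file()/toy_fput() pair in toy_submit(), the read of inode->i_size would touch freed memory even though the file pointer itself is never dereferenced after the method returns.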

Comments

Linus Torvalds March 7, 2019, 12:23 a.m. UTC | #1
On Wed, Mar 6, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> "aio: remove the extra get_file/fput pair in io_submit_one" was
> too optimistic - not dereferencing file pointer after e.g.
> ->write_iter() returns is not enough; that reference might've been
> the only thing that kept alive objects that are referenced
> *before* the method returns.  Such as inode, for example...

I still think that this is actually _worse_ than just having the
refcount on the req instead.

As it is, we have that completely insane "ref can go away from under
us", because nothing keeps that around, which then causes all those
other crazy issues with "woken" etc garbage.

I think we should be able to get rid of those entirely. Make the
poll() case just return zero if it has added the entry successfully to
poll queue.  No need for "woken", no need for all that odd "oh, but
now the req might no longer exist".

The refcount wasn't the problem. Everything *else* was the problem,
including only using the refcount for the poll case etc.

                       Linus
Al Viro March 7, 2019, 12:41 a.m. UTC | #2
On Wed, Mar 06, 2019 at 04:23:04PM -0800, Linus Torvalds wrote:
> On Wed, Mar 6, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > From: Al Viro <viro@zeniv.linux.org.uk>
> >
> > "aio: remove the extra get_file/fput pair in io_submit_one" was
> > too optimistic - not dereferencing file pointer after e.g.
> > ->write_iter() returns is not enough; that reference might've been
> > the only thing that kept alive objects that are referenced
> > *before* the method returns.  Such as inode, for example...
> 
> I still think that this is actually _worse_ than just having the
> refcount on the req instead.
> 
> As it is, we have that completely insane "ref can go away from under
> us", because nothing keeps that around, which then causes all those
> other crazy issues with "woken" etc garbage.
>
> I think we should be able to get rid of those entirely. Make the
> poll() case just return zero if it has added the entry successfully to
> poll queue.  No need for "woken", no need for all that odd "oh, but
> now the req might no longer exist".

Not really.  Sure, you can get rid of "might no longer exist"
considerations, but you still need to decide which way we want to
handle it.  There are 4 cases:
	* it's already taken up; don't put on the list for possible
cancel, don't call aio_complete().
	* will eventually be woken up; put on the list for possible
cancel, don't call aio_complete().
	* wanted to be on several queues, fortunately not woken up
yet.  Make sure it's gone from queue, return an error.
	* none of the above, and ->poll() has reported what we wanted
from the very beginning.  Remove from queue, call aio_complete().

You'll need some logics to handle that.  I can buy the "if we know
the req is still alive, we can check if it's still queued instead of
separate woken flag", but it won't win you much ;-/
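
The cases listed above can be sketched as a small classifier. This is only an illustration under stated assumptions: the struct and field names are hypothetical, the priority order between the cases is a guess, and -EINVAL is a placeholder error code; it is not the logic in fs/aio.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* What the submitter observes once vfs_poll() has returned (toy model). */
struct toy_poll_state {
	bool woken;		/* a wakeup has already taken the request */
	bool multi_queue;	/* ->poll() tried to use more than one waitqueue */
	unsigned int mask;	/* requested events already reported, if any */
};

struct toy_poll_decision {
	bool on_cancel_list;	/* put on the list for possible cancel */
	bool dequeue;		/* make sure it is gone from the waitqueue */
	bool complete;		/* call the completion ourselves */
	int error;		/* returned to the submitter */
};

static struct toy_poll_decision toy_classify(struct toy_poll_state s)
{
	struct toy_poll_decision d = { false, false, false, 0 };

	if (s.woken)
		return d;		/* already taken up: nothing left to do */
	if (s.multi_queue) {
		d.dequeue = true;	/* several queues, not woken yet */
		d.error = -EINVAL;	/* placeholder error code */
		return d;
	}
	if (s.mask) {
		d.dequeue = true;	/* ready from the very beginning */
		d.complete = true;
		return d;
	}
	d.on_cancel_list = true;	/* will eventually be woken up */
	return d;
}
```

Whether the "woken" input is a separate flag or derived from "is it still queued" does not change the shape of this decision; some such input is needed either way.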
Al Viro March 7, 2019, 12:48 a.m. UTC | #3
On Thu, Mar 07, 2019 at 12:41:59AM +0000, Al Viro wrote:
> On Wed, Mar 06, 2019 at 04:23:04PM -0800, Linus Torvalds wrote:
> > On Wed, Mar 6, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > >
> > > "aio: remove the extra get_file/fput pair in io_submit_one" was
> > > too optimistic - not dereferencing file pointer after e.g.
> > > ->write_iter() returns is not enough; that reference might've been
> > > the only thing that kept alive objects that are referenced
> > > *before* the method returns.  Such as inode, for example...
> > 
> > I still think that this is actually _worse_ than just having the
> > refcount on the req instead.
> > 
> > As it is, we have that completely insane "ref can go away from under
> > us", because nothing keeps that around, which then causes all those
> > other crazy issues with "woken" etc garbage.
> >
> > I think we should be able to get rid of those entirely. Make the
> > poll() case just return zero if it has added the entry successfully to
> > poll queue.  No need for "woken", no need for all that odd "oh, but
> > now the req might no longer exist".
> 
> Not really.  Sure, you can get rid of "might no longer exist"
> considerations, but you still need to decide which way we want to
> handle it.  There are 4 cases:
> 	* it's already taken up; don't put on the list for possible
> cancel, don't call aio_complete().
> 	* will eventually be woken up; put on the list for possible
> cancel, don't call aio_complete().
> 	* wanted to be on several queues, fortunately not woken up
> yet.  Make sure it's gone from queue, return an error.
> 	* none of the above, and ->poll() has reported what we wanted
> from the very beginning.  Remove from queue, call aio_complete().
> 
> You'll need some logics to handle that.  I can buy the "if we know
> the req is still alive, we can check if it's still queued instead of
> separate woken flag", but it won't win you much ;-/

If anything, the one good reason for refcount would be the risk that
some ->read_iter() or ->write_iter() will try to dereference iocb
after having decided to return -EIOCBQUEUED and submitted all bios.
I think that doesn't happen, but making sure it doesn't would be
a good argument in favour of that refcount.
Al Viro March 7, 2019, 1:20 a.m. UTC | #4
On Thu, Mar 07, 2019 at 12:48:28AM +0000, Al Viro wrote:
> On Thu, Mar 07, 2019 at 12:41:59AM +0000, Al Viro wrote:
> > On Wed, Mar 06, 2019 at 04:23:04PM -0800, Linus Torvalds wrote:
> > > On Wed, Mar 6, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > >
> > > > "aio: remove the extra get_file/fput pair in io_submit_one" was
> > > > too optimistic - not dereferencing file pointer after e.g.
> > > > ->write_iter() returns is not enough; that reference might've been
> > > > the only thing that kept alive objects that are referenced
> > > > *before* the method returns.  Such as inode, for example...
> > > 
> > > I still think that this is actually _worse_ than just having the
> > > refcount on the req instead.
> > > 
> > > As it is, we have that completely insane "ref can go away from under
> > > us", because nothing keeps that around, which then causes all those
> > > other crazy issues with "woken" etc garbage.
> > >
> > > I think we should be able to get rid of those entirely. Make the
> > > poll() case just return zero if it has added the entry successfully to
> > > poll queue.  No need for "woken", no need for all that odd "oh, but
> > > now the req might no longer exist".
> > 
> > Not really.  Sure, you can get rid of "might no longer exist"
> > considerations, but you still need to decide which way we want to
> > handle it.  There are 4 cases:
> > 	* it's already taken up; don't put on the list for possible
> > cancel, don't call aio_complete().
> > 	* will eventually be woken up; put on the list for possible
> > cancel, don't call aio_complete().
> > 	* wanted to be on several queues, fortunately not woken up
> > yet.  Make sure it's gone from queue, return an error.
> > 	* none of the above, and ->poll() has reported what we wanted
> > from the very beginning.  Remove from queue, call aio_complete().
> > 
> > You'll need some logics to handle that.  I can buy the "if we know
> > the req is still alive, we can check if it's still queued instead of
> > separate woken flag", but it won't win you much ;-/
> 
> If anything, the one good reason for refcount would be the risk that
> some ->read_iter() or ->write_iter() will try to dereference iocb
> after having decided to return -EIOCBQUEUED and submitted all bios.
> I think that doesn't happen, but making sure it doesn't would be
> a good argument in favour of that refcount.

*grumble*

It is a good argument, unfortunately ;-/  Proof that instances do not
step into that is rather subtle and won't take much to break.  OK...

I'll try to massage that series on top of your patch; I still hate the
post-vfs_poll() logics in aio_poll() ;-/  Give me about half an hour
and I'll have something to post.
Linus Torvalds March 7, 2019, 1:30 a.m. UTC | #5
On Wed, Mar 6, 2019 at 5:20 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> I'll try to massage that series on top of your patch; I still hate the
> post-vfs_poll() logics in aio_poll() ;-/  Give me about half an hour
> and I'll have something to post.

No inherent hurry, I sent the ping just to make sure it hadn't gotten lost.

And yeah, I think the post-vfs_poll() logic cannot possibly be
necessary. My gut feel is that *if* we have the refcounting right,
then we should be able to just let the wakeup come in at any later
point, and ordering shouldn't matter all that much, and we shouldn't
even need any locking.

I'd like to think that it can be done with something like "just 'or'
in the mask atomically" (so that we don't care about ordering between
the synchronous vfs_poll() and the async poll wakeup), together with
"when refcount goes to zero, finish the thing off and complete it" (so
that we don't care who finishes first).

No "woken" logic, no "who fired first" logic, no BS. Just make the
operations work regardless of ordering.

And maybe it can't be done. But the current model seems just so hacky
that it can't be the right model.

               Linus
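
A rough sketch of the model proposed above, assuming C11 atomics and hypothetical names (toy_req, toy_report(), toy_put()): each side ORs its events into a shared mask and drops its reference, and whoever drops the last reference completes the request. The demo runs sequentially and only shows that both orderings give the same result; it leaves out cancellation and the cancel list entirely.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_req {
	atomic_int refs;		/* one per party that may still touch the req */
	atomic_uint mask;		/* events reported so far */
	unsigned int completed_mask;	/* what the completion delivered */
	int completions;		/* how many times completion ran */
};

static void toy_req_init(struct toy_req *r, int refs)
{
	atomic_init(&r->refs, refs);
	atomic_init(&r->mask, 0u);
	r->completed_mask = 0;
	r->completions = 0;
}

/* Either side reports events by OR-ing them in; ordering does not matter. */
static void toy_report(struct toy_req *r, unsigned int events)
{
	atomic_fetch_or(&r->mask, events);
}

/* Whoever drops the last reference finishes the request. */
static void toy_put(struct toy_req *r)
{
	if (atomic_fetch_sub(&r->refs, 1) == 1) {
		r->completed_mask = atomic_load(&r->mask);
		r->completions++;
	}
}

/* Run the synchronous poll and the async wakeup in either order. */
static void toy_run(struct toy_req *r, int wakeup_first)
{
	toy_req_init(r, 2);		/* submitter + wakeup callback */
	if (wakeup_first) {
		toy_report(r, 0x4);	/* async wakeup */
		toy_put(r);
		toy_report(r, 0x1);	/* synchronous vfs_poll() result */
		toy_put(r);
	} else {
		toy_report(r, 0x1);	/* synchronous vfs_poll() result */
		toy_put(r);
		toy_report(r, 0x4);	/* async wakeup */
		toy_put(r);
	}
}
```

In both orders the completion runs exactly once and sees the union of the reported events, which is the "don't care who finishes first" property being asked for.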
Al Viro March 8, 2019, 3:36 a.m. UTC | #6
On Wed, Mar 06, 2019 at 05:30:21PM -0800, Linus Torvalds wrote:
> On Wed, Mar 6, 2019 at 5:20 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > I'll try to massage that series on top of your patch; I still hate the
> > post-vfs_poll() logics in aio_poll() ;-/  Give me about half an hour
> > and I'll have something to post.
> 
> No inherent hurry, I sent the ping just to make sure it hadn't gotten lost.
> 
> And yeah, I think the post-vfs_poll() logic cannot possibly be
> necessary. My gut feel is that *if* we have the refcounting right,
> then we should be able to just let the wakeup come in at any later
> point, and ordering shouldn't matter all that much, and we shouldn't
> even need any locking.
> 
> I'd like to think that it can be done with something like "just 'or'
> in the mask atomically" (so that we don't care about ordering between
> the synchronous vfs_poll() and the async poll wakeup), together with
> "when refcount goes to zero, finish the thing off and complete it" (so
> that we don't care who finishes first).
> 
> No "woken" logic, no "who fired first" logic, no BS. Just make the
> operations work regardless of ordering.
> 
> And maybe it can't be done. But the current model seems just so hacky
> that it can't be the right model.

Umm...  It is kinda-sorta doable; we do need something vaguely similar
to ->woken ("should we add it to the list of cancellables, or is the
async reference already gone?"), but other than that it seems to be
feasible.

See vfs.git#work.aio; the crucial bits are in these commits:
      keep io_event in aio_kiocb
      get rid of aio_complete() res/res2 arguments
      move aio_complete() to final iocb_put(), try to fix aio_poll() logics
The first two are preparations, the last is where the fixes (hopefully)
happen.

The logics in aio_poll() after vfs_poll():
	* we might want to steal the async reference (e.g. due to event
returned from the very beginning, or due to attempt to put on more than
one waitqueue, which makes results unreliable).  That's _NOT_ possible
if the thing had been put on a waitqueue, but currently isn't there.
It might be either due to early wakeup having done everything or the
same having scheduled aio_poll_complete_work().  In either case, the
best we can do is to ignore the return value of vfs_poll() and, in
case of error, mark the sucker cancelled.  We *can't* return an error
in that case.

	* if we want and can steal the async reference, rip it from
waitqueue; otherwise, put it on the "cancellable" list, unless it's
already gone or unless we are simulating the cancel ourselves.

	* if vfs_poll() has reported something we want and we have
successfully stolen the iocb, put it there, have the reference
we'd taken over dropped and return 0.

Comments?
Christoph Hellwig March 8, 2019, 3:50 p.m. UTC | #7
On Fri, Mar 08, 2019 at 03:36:50AM +0000, Al Viro wrote:
> See vfs.git#work.aio; the crucial bits are in these commits:
>       keep io_event in aio_kiocb
>       get rid of aio_complete() res/res2 arguments
>       move aio_complete() to final iocb_put(), try to fix aio_poll() logics
> The first two are preparations, the last is where the fixes (hopefully)
> happen.

Looks sensible.  I'll try to run the tests over it, and I've added
Avi so that maybe he can make sure that scylladb is also happy with it,
that was usually the best way to find aio poll bugs.
Al Viro March 10, 2019, 7:06 a.m. UTC | #8
On Fri, Mar 08, 2019 at 03:36:50AM +0000, Al Viro wrote:

> See vfs.git#work.aio; the crucial bits are in these commits:
>       keep io_event in aio_kiocb
>       get rid of aio_complete() res/res2 arguments
>       move aio_complete() to final iocb_put(), try to fix aio_poll() logics
> The first two are preparations, the last is where the fixes (hopefully)
> happen.

OK, refactored, cleaned up and force-pushed.  Current state:
Al Viro (7):
      keep io_event in aio_kiocb
      aio: store event at final iocb_put()
      Fix aio_poll() races
      make aio_read()/aio_write() return int
      move dropping ->ki_eventfd into iocb_destroy()
      deal with get_reqs_available() in aio_get_req() itself
      aio: move sanity checks and request allocation to io_submit_one()

Linus Torvalds (1):
      pin iocb through aio.

 fs/aio.c | 327 ++++++++++++++++++++++++++++-----------------------------------
 1 file changed, 146 insertions(+), 181 deletions(-)
Christoph Hellwig March 11, 2019, 7:41 p.m. UTC | #9
On Sun, Mar 10, 2019 at 07:06:18AM +0000, Al Viro wrote:
> OK, refactored, cleaned up and force-pushed.  Current state:

This survives the libaio test suite at least.
Patch

diff --git a/fs/aio.c b/fs/aio.c
index 3d9669d011b9..ea30b78187ed 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1790,6 +1790,7 @@  static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 			   struct iocb __user *user_iocb, bool compat)
 {
 	struct aio_kiocb *req;
+	struct file *file;
 	ssize_t ret;
 
 	/* enforce forwards compatibility on users */
@@ -1844,6 +1845,7 @@  static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 
 	req->ki_user_iocb = user_iocb;
 	req->ki_user_data = iocb->aio_data;
+	file = get_file(req->ki_filp);	/* req can die too early */
 
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
@@ -1872,6 +1874,7 @@  static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		ret = -EINVAL;
 		break;
 	}
+	fput(file);
 
 	/*
 	 * If ret is 0, we'd either done aio_complete() ourselves or have