diff mbox series

[v2,2/2] 9p: Add refcount to p9_req_t

Message ID 20180814174342.11068-2-tomasbortoli@gmail.com
State Not Applicable, archived
Delegated to: David Miller
Headers show
Series [v2,1/2] WIP: 9p: rename p9_free_req() function | expand

Commit Message

Tomas Bortoli Aug. 14, 2018, 5:43 p.m. UTC
To avoid use-after-free(s), use a refcount to keep track of the
usable references to any instantiated struct p9_req_t.

This commit adds p9_req_put(), p9_req_get() and p9_req_try_get() as
wrappers to kref_put(), kref_get() and kref_get_unless_zero().
These are used by the client and the transports to keep track of
valid requests' references.

p9_free_req() is added back and used as callback by kref_put().

Add SLAB_TYPESAFE_BY_RCU as it ensures that the memory freed by
kmem_cache_free() will not be reused for another type until the rcu
synchronisation period is over, so an address gotten under rcu read
lock is safe to inc_ref() without corrupting random memory while
the lock is held.

Co-developed-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
Reported-by: syzbot+467050c1ce275af2a5b8@syzkaller.appspotmail.com
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
---
 include/net/9p/client.h | 14 +++++++++++++
 net/9p/client.c         | 54 +++++++++++++++++++++++++++++++++++++++++++------
 net/9p/trans_fd.c       | 11 +++++++++-
 net/9p/trans_rdma.c     |  1 +
 4 files changed, 73 insertions(+), 7 deletions(-)

Comments

Dominique Martinet Aug. 27, 2018, 11:09 p.m. UTC | #1
Tomas Bortoli wrote on Tue, Aug 14, 2018:
> To avoid use-after-free(s), use a refcount to keep track of the
> usable references to any instantiated struct p9_req_t.
> 
> This commit adds p9_req_put(), p9_req_get() and p9_req_try_get() as
> wrappers to kref_put(), kref_get() and kref_get_unless_zero().
> These are used by the client and the transports to keep track of
> valid requests' references.
> 
> p9_free_req() is added back and used as callback by kref_put().
> 
> Add SLAB_TYPESAFE_BY_RCU as it ensures that the memory freed by
> kmem_cache_free() will not be reused for another type until the rcu
> synchronisation period is over, so an address gotten under rcu read
> lock is safe to inc_ref() without corrupting random memory while
> the lock is held.


FWIW, since 4.19-rc1 has been tagged I was going to push this and all
the perrequesites to linux-next, but I've managed to leak some requests
by interrupting them in trans_virtio.
I think I've found why (see below), so I'll push a fixed version after
some more testing and another thorough read -- at some point today, but
this hasn't been 'approved' explicitely so please review! :)

(Jun, I think you'll need to ask again to rename 'req' to 'rreq' if you
think it's important -- I think such a rename should go in a separate
patch anyway, there's plenty of time until the 4.20 merge window)


By "all the prerequesites" I mean this patch "serie":
* 9p: Use a slab for allocating requests
* 9p: Remove p9_idpool
* net/9p: embed fcall in req to round down buffer allocs
* net/9p: add a per-client fcall kmem_cache
* 9p: rename p9_free_req() function
* 9p: Add refcount to p9_req_t

All the other patchs have had some review though, I was just waiting for
the start of this cycle, but if someone has any issue with the above
patches now is a good time to say.


> diff --git a/net/9p/client.c b/net/9p/client.c
> index 7942c0bfcc5b..c9bb5d41afa4 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -716,6 +756,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
>  
>  	err = c->trans_mod->request(c, req);
>  	if (err < 0) {
> +		/* write won't happen */
> +		p9_req_put(req);
>  		if (err != -ERESTARTSYS && err != -EFAULT)
>  			c->status = Disconnected;
>  		goto recalc_sigpending;

p9_client_zc_rpc needs the same put if zc_request failed, I'm not sure
why it wasn't here in my draft
piaojun Aug. 28, 2018, 1:07 a.m. UTC | #2
Hi Dominique,

On 2018/8/28 7:09, Dominique Martinet wrote:
> Tomas Bortoli wrote on Tue, Aug 14, 2018:
>> To avoid use-after-free(s), use a refcount to keep track of the
>> usable references to any instantiated struct p9_req_t.
>>
>> This commit adds p9_req_put(), p9_req_get() and p9_req_try_get() as
>> wrappers to kref_put(), kref_get() and kref_get_unless_zero().
>> These are used by the client and the transports to keep track of
>> valid requests' references.
>>
>> p9_free_req() is added back and used as callback by kref_put().
>>
>> Add SLAB_TYPESAFE_BY_RCU as it ensures that the memory freed by
>> kmem_cache_free() will not be reused for another type until the rcu
>> synchronisation period is over, so an address gotten under rcu read
>> lock is safe to inc_ref() without corrupting random memory while
>> the lock is held.
> 
> 
> FWIW, since 4.19-rc1 has been tagged I was going to push this and all
> the perrequesites to linux-next, but I've managed to leak some requests
> by interrupting them in trans_virtio.
> I think I've found why (see below), so I'll push a fixed version after
> some more testing and another thorough read -- at some point today, but
> this hasn't been 'approved' explicitely so please review! :)
> 
> (Jun, I think you'll need to ask again to rename 'req' to 'rreq' if you
> think it's important -- I think such a rename should go in a separate
> patch anyway, there's plenty of time until the 4.20 merge window)
> 

I still think such a rename is necessary, and as you said, it will be
better go in another patch.

Thanks,
Jun

> 
> By "all the prerequesites" I mean this patch "serie":
> * 9p: Use a slab for allocating requests
> * 9p: Remove p9_idpool
> * net/9p: embed fcall in req to round down buffer allocs
> * net/9p: add a per-client fcall kmem_cache
> * 9p: rename p9_free_req() function
> * 9p: Add refcount to p9_req_t
> 
> All the other patchs have had some review though, I was just waiting for
> the start of this cycle, but if someone has any issue with the above
> patches now is a good time to say.
> 
> 
>> diff --git a/net/9p/client.c b/net/9p/client.c
>> index 7942c0bfcc5b..c9bb5d41afa4 100644
>> --- a/net/9p/client.c
>> +++ b/net/9p/client.c
>> @@ -716,6 +756,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
>>  
>>  	err = c->trans_mod->request(c, req);
>>  	if (err < 0) {
>> +		/* write won't happen */
>> +		p9_req_put(req);
>>  		if (err != -ERESTARTSYS && err != -EFAULT)
>>  			c->status = Disconnected;
>>  		goto recalc_sigpending;
> 
> p9_client_zc_rpc needs the same put if zc_request failed, I'm not sure
> why it wasn't here in my draft
>
Dominique Martinet Aug. 28, 2018, 1:57 a.m. UTC | #3
piaojun wrote on Tue, Aug 28, 2018:
> > (Jun, I think you'll need to ask again to rename 'req' to 'rreq' if you
> > think it's important -- I think such a rename should go in a separate
> > patch anyway, there's plenty of time until the 4.20 merge window)
> > 
> 
> I still think such a rename is necessary, and as you said, it will be
> better go in another patch.

Tomas can you send a patch for that please?
It's not very interesting, but might as well finish this properly :)

> >> diff --git a/net/9p/client.c b/net/9p/client.c
> >> index 7942c0bfcc5b..c9bb5d41afa4 100644
> >> --- a/net/9p/client.c
> >> +++ b/net/9p/client.c
> >> @@ -716,6 +756,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
> >>  
> >>  	err = c->trans_mod->request(c, req);
> >>  	if (err < 0) {
> >> +		/* write won't happen */
> >> +		p9_req_put(req);
> >>  		if (err != -ERESTARTSYS && err != -EFAULT)
> >>  			c->status = Disconnected;
> >>  		goto recalc_sigpending;
> > 
> > p9_client_zc_rpc needs the same put if zc_request failed, I'm not sure
> > why it wasn't here in my draft

Ah, I remember a bit better now, this is not as simple as adding the
same check after zc_request because the zc_request embeds the wait
itself to do its own cleanup after the reply came (unpin pages that were
sent to the server).

This brings in an interesting race condition that if the
wait_event_killable() in the zc_request is interrupted, the user data
pages that were pinned get unpinned and could potentially be moved
before the server replies... Even if they're not moved the user would be
told the read/write failed and could reuse the memory that would be
read/written later.
I'm not sure how this part works but it's probably not great.

Greg, do you have an opinion on this?


This is tricky, we cannot even rely on the refcounting for this as the
zc pages are likely user pages, so it'll be bad if we return from the
syscall and the memory gets accessed later.
On the other hand making that wait non-killable isn't a good solution
either, and we cannot use flush for virtio, so I don't have any idea for
this... Any magic virtio "take-back"?



Well, this would be for another patch anyway - for now I'll just do the
p9_req_put if it hasn't been kicked so that means something like the
following diff.. But my test bed is currently down so I'll wait for
tests to push:
-------8<----------------
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 7728b0acde09..36a1401c0722 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -404,6 +404,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 	struct scatterlist *sgs[4];
 	size_t offs;
 	int need_drop = 0;
+	int kicked = 0;
 
 	p9_debug(P9_DEBUG_TRANS, "virtio request\n");
 
@@ -498,6 +499,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 	}
 	virtqueue_kick(chan->vq);
 	spin_unlock_irqrestore(&chan->lock, flags);
+	kicked = 1;
 	p9_debug(P9_DEBUG_TRANS, "virtio request kicked\n");
 	err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
 	/*
@@ -518,6 +520,10 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 	}
 	kvfree(in_pages);
 	kvfree(out_pages);
+	if (!kicked) {
+		/* reply won't come */
+		p9_req_put(req);
+	}
 	return err;
 }
 
-------8<----------------
Dominique Martinet Aug. 29, 2018, 4:43 a.m. UTC | #4
Dominique Martinet wrote on Tue, Aug 28, 2018:
> I think I've found why (see below), so I'll push a fixed version after
> some more testing and another thorough read -- at some point today, but
> this hasn't been 'approved' explicitely so please review! :)

While the issue I pointed at was real, it wasn't what was causing the
refcount leak I was observing -- the problem is that we didn't drop a
ref when the request was successfully cancelled (e.g. the reply to the
flush came and the original request didn't get replied to)

The reason for this was that there were multiple versions of the patch
which alternated between doing the put in client.c after the cancelled
callback inconditionally, and doing the put in each transport's
cancelled() function, but virtio does not have this callback so that
didn't get added in the final version (codeveloping is hard); so I've
added an else() close to just issue a put if there is no callback.

(In the end, it felt better to have the req_put in the transport because
trans_fd is making refcounting difficult with its list handling, and
separating the put from the list removal would be more confusing than is
gained by sharing code)



Anyway, that's starting to be quite different from the v2 so I'll send a
v3 keeping Tomas as the author -- please check my edits are alright with
you, Tomas.

Meanwhile I'll keep running tests, I'm now confident about virtio but
want to spend more time on other transports again, so delaying the push
to linux-next for a few more days...
diff mbox series

Patch

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index 735f3979d559..947a570307a6 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -94,6 +94,7 @@  enum p9_req_status_t {
 struct p9_req_t {
 	int status;
 	int t_err;
+	struct kref refcount;
 	wait_queue_head_t wq;
 	struct p9_fcall tc;
 	struct p9_fcall rc;
@@ -233,6 +234,19 @@  int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status);
 int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *fl);
 void p9_fcall_fini(struct p9_fcall *fc);
 struct p9_req_t *p9_tag_lookup(struct p9_client *, u16);
+
+static inline void p9_req_get(struct p9_req_t *r)
+{
+	kref_get(&r->refcount);
+}
+
+static inline int p9_req_try_get(struct p9_req_t *r)
+{
+	return kref_get_unless_zero(&r->refcount);
+}
+
+int p9_req_put(struct p9_req_t *r);
+
 void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status);
 
 int p9_parse_header(struct p9_fcall *, int32_t *, int8_t *, int16_t *, int);
diff --git a/net/9p/client.c b/net/9p/client.c
index 7942c0bfcc5b..c9bb5d41afa4 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -310,6 +310,18 @@  p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
 	if (tag < 0)
 		goto free;
 
+	/*	Init ref to two because in the general case there is one ref
+	 *	that is put asynchronously by a writer thread, one ref
+	 *	temporarily given by p9_tag_lookup and put by p9_client_cb
+	 *	in the recv thread, and one ref put by p9_tag_remove in the
+	 *	main thread. The only exception is virtio that does not use
+	 *	p9_tag_lookup but does not have a writer thread either
+	 *	(the write happens synchronously in the request/zc_request
+	 *	callback), so p9_client_cb eats the second ref there
+	 *	as the pointer is duplicated directly by virtqueue_add_sgs()
+	 */
+	refcount_set(&req->refcount.refcount, 2);
+
 	return req;
 
 free:
@@ -333,10 +345,21 @@  struct p9_req_t *p9_tag_lookup(struct p9_client *c, u16 tag)
 	struct p9_req_t *req;
 
 	rcu_read_lock();
+again:
 	req = idr_find(&c->reqs, tag);
-	/* There's no refcount on the req; a malicious server could cause
-	 * us to dereference a NULL pointer
-	 */
+	if (req) {
+		/* We have to be careful with the req found under rcu_read_lock
+		 * Thanks to SLAB_TYPESAFE_BY_RCU we can safely try to get the
+		 * ref again without corrupting other data, then check again
+		 * that the tag matches once we have the ref
+		 */
+		if (!p9_req_try_get(req))
+			goto again;
+		if (req->tc.tag != tag) {
+			p9_req_put(req);
+			goto again;
+		}
+	}
 	rcu_read_unlock();
 
 	return req;
@@ -350,7 +373,7 @@  EXPORT_SYMBOL(p9_tag_lookup);
  *
  * Context: Any context.
  */
-static void p9_tag_remove(struct p9_client *c, struct p9_req_t *r)
+static int p9_tag_remove(struct p9_client *c, struct p9_req_t *r)
 {
 	unsigned long flags;
 	u16 tag = r->tc.tag;
@@ -359,11 +382,23 @@  static void p9_tag_remove(struct p9_client *c, struct p9_req_t *r)
 	spin_lock_irqsave(&c->lock, flags);
 	idr_remove(&c->reqs, tag);
 	spin_unlock_irqrestore(&c->lock, flags);
+	return p9_req_put(r);
+}
+
+static void p9_req_free(struct kref *ref)
+{
+	struct p9_req_t *r = container_of(ref, struct p9_req_t, refcount);
 	p9_fcall_fini(&r->tc);
 	p9_fcall_fini(&r->rc);
 	kmem_cache_free(p9_req_cache, r);
 }
 
+int p9_req_put(struct p9_req_t *r)
+{
+	return kref_put(&r->refcount, p9_req_free);
+}
+EXPORT_SYMBOL(p9_req_put);
+
 /**
  * p9_tag_cleanup - cleans up tags structure and reclaims resources
  * @c:  v9fs client struct
@@ -379,7 +414,9 @@  static void p9_tag_cleanup(struct p9_client *c)
 	rcu_read_lock();
 	idr_for_each_entry(&c->reqs, req, id) {
 		pr_info("Tag %d still in use\n", id);
-		p9_tag_remove(c, req);
+		if (p9_tag_remove(c, req) == 0)
+			pr_warn("Packet with tag %d has still references",
+				req->tc.tag);
 	}
 	rcu_read_unlock();
 }
@@ -403,6 +440,7 @@  void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
 
 	wake_up(&req->wq);
 	p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
+	p9_req_put(req);
 }
 EXPORT_SYMBOL(p9_client_cb);
 
@@ -682,6 +720,8 @@  static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 	return req;
 reterr:
 	p9_tag_remove(c, req);
+	/* We have to put also the 2nd reference as it won't be used */
+	p9_req_put(req);
 	return ERR_PTR(err);
 }
 
@@ -716,6 +756,8 @@  p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 
 	err = c->trans_mod->request(c, req);
 	if (err < 0) {
+		/* write won't happen */
+		p9_req_put(req);
 		if (err != -ERESTARTSYS && err != -EFAULT)
 			c->status = Disconnected;
 		goto recalc_sigpending;
@@ -2241,7 +2283,7 @@  EXPORT_SYMBOL(p9_client_readlink);
 
 int __init p9_client_init(void)
 {
-	p9_req_cache = KMEM_CACHE(p9_req_t, 0);
+	p9_req_cache = KMEM_CACHE(p9_req_t, SLAB_TYPESAFE_BY_RCU);
 	return p9_req_cache ? 0 : -ENOMEM;
 }
 
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 20f46f13fe83..686e24e355d0 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -132,6 +132,7 @@  struct p9_conn {
 	struct list_head req_list;
 	struct list_head unsent_req_list;
 	struct p9_req_t *req;
+	struct p9_req_t *wreq;
 	char tmp_buf[7];
 	struct p9_fcall rc;
 	int wpos;
@@ -383,6 +384,7 @@  static void p9_read_work(struct work_struct *work)
 		m->rc.sdata = NULL;
 		m->rc.offset = 0;
 		m->rc.capacity = 0;
+		p9_req_put(m->req);
 		m->req = NULL;
 	}
 
@@ -472,6 +474,8 @@  static void p9_write_work(struct work_struct *work)
 		m->wbuf = req->tc.sdata;
 		m->wsize = req->tc.size;
 		m->wpos = 0;
+		p9_req_get(req);
+		m->wreq = req;
 		spin_unlock(&m->client->lock);
 	}
 
@@ -492,8 +496,11 @@  static void p9_write_work(struct work_struct *work)
 	}
 
 	m->wpos += err;
-	if (m->wpos == m->wsize)
+	if (m->wpos == m->wsize) {
 		m->wpos = m->wsize = 0;
+		p9_req_put(m->wreq);
+		m->wreq = NULL;
+	}
 
 end_clear:
 	clear_bit(Wworksched, &m->wsched);
@@ -694,6 +701,7 @@  static int p9_fd_cancel(struct p9_client *client, struct p9_req_t *req)
 	if (req->status == REQ_STATUS_UNSENT) {
 		list_del(&req->req_list);
 		req->status = REQ_STATUS_FLSHD;
+		p9_req_put(req);
 		ret = 0;
 	}
 	spin_unlock(&client->lock);
@@ -711,6 +719,7 @@  static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req)
 	spin_lock(&client->lock);
 	list_del(&req->req_list);
 	spin_unlock(&client->lock);
+	p9_req_put(req);
 
 	return 0;
 }
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index c60655c90c9e..8cff368a11e3 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -365,6 +365,7 @@  send_done(struct ib_cq *cq, struct ib_wc *wc)
 			    c->busa, c->req->tc.size,
 			    DMA_TO_DEVICE);
 	up(&rdma->sq_sem);
+	p9_req_put(c->req);
 	kfree(c);
 }