diff mbox

unix: properly account for FDs passed over unix sockets

Message ID 20151228141435.GA13351@1wt.eu
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Willy Tarreau Dec. 28, 2015, 2:14 p.m. UTC
It is possible for a process to allocate and accumulate far more FDs than
the process' limit by sending them over a unix socket then closing them
to keep the process' fd count low.

This change addresses this problem by keeping track of the number of FDs
in flight per user and preventing non-privileged processes from having
more FDs in flight than their configured FD limit.

Reported-by: socketpair@gmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
It would be nice if (if accepted) it would be backported to -stable as the
issue is currently exploitable.
---
 include/linux/sched.h |  1 +
 net/unix/af_unix.c    | 24 ++++++++++++++++++++----
 net/unix/garbage.c    | 13 ++++++++-----
 3 files changed, 29 insertions(+), 9 deletions(-)

Comments

Hannes Frederic Sowa Dec. 29, 2015, 2:37 p.m. UTC | #1
On 28.12.2015 15:14, Willy Tarreau wrote:
> @@ -1528,10 +1546,8 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
>   	if (!UNIXCB(skb).fp)
>   		return -ENOMEM;
>
> -	if (unix_sock_count) {
> -		for (i = scm->fp->count - 1; i >= 0; i--)
> -			unix_inflight(scm->fp->fp[i]);
> -	}
> +	for (i = scm->fp->count - 1; i >= 0; i--)
> +		unix_inflight(scm->fp->fp[i]);
>   	return max_level;

You seem to be able to remove the unix_sock_count variable altogether now.

Bye,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Dec. 29, 2015, 2:48 p.m. UTC | #2
On 28.12.2015 15:14, Willy Tarreau wrote:
> It is possible for a process to allocate and accumulate far more FDs than
> the process' limit by sending them over a unix socket then closing them
> to keep the process' fd count low.
>
> This change addresses this problem by keeping track of the number of FDs
> in flight per user and preventing non-privileged processes from having
> more FDs in flight than their configured FD limit.
>
> Reported-by: socketpair@gmail.com
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Willy Tarreau <w@1wt.eu>

Thanks for the patch!

I think this does not close the DoS attack completely as we duplicate 
fds if the reader uses MSG_PEEK on the unix domain socket and thus 
clones the fd. Have I overlooked something?

Thanks,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Willy Tarreau Dec. 29, 2015, 8:35 p.m. UTC | #3
On Tue, Dec 29, 2015 at 03:48:45PM +0100, Hannes Frederic Sowa wrote:
> On 28.12.2015 15:14, Willy Tarreau wrote:
> >It is possible for a process to allocate and accumulate far more FDs than
> >the process' limit by sending them over a unix socket then closing them
> >to keep the process' fd count low.
> >
> >This change addresses this problem by keeping track of the number of FDs
> >in flight per user and preventing non-privileged processes from having
> >more FDs in flight than their configured FD limit.
> >
> >Reported-by: socketpair@gmail.com
> >Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> >Signed-off-by: Willy Tarreau <w@1wt.eu>
> 
> Thanks for the patch!
> 
> I think this does not close the DoS attack completely as we duplicate 
> fds if the reader uses MSG_PEEK on the unix domain socket and thus 
> clones the fd. Have I overlooked something?

I didn't know this behaviour. However, then the fd remains in flight, right ?
So as long as it's not removed from the queue, the sender cannot add more
than its FD limit. I may be missing something obvious though :-/

Thanks,
Willy

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Dec. 30, 2015, 8:58 a.m. UTC | #4
On 29.12.2015 21:35, Willy Tarreau wrote:
> On Tue, Dec 29, 2015 at 03:48:45PM +0100, Hannes Frederic Sowa wrote:
>> On 28.12.2015 15:14, Willy Tarreau wrote:
>>> It is possible for a process to allocate and accumulate far more FDs than
>>> the process' limit by sending them over a unix socket then closing them
>>> to keep the process' fd count low.
>>>
>>> This change addresses this problem by keeping track of the number of FDs
>>> in flight per user and preventing non-privileged processes from having
>>> more FDs in flight than their configured FD limit.
>>>
>>> Reported-by: socketpair@gmail.com
>>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>>> Signed-off-by: Willy Tarreau <w@1wt.eu>
>>
>> Thanks for the patch!
>>
>> I think this does not close the DoS attack completely as we duplicate
>> fds if the reader uses MSG_PEEK on the unix domain socket and thus
>> clones the fd. Have I overlooked something?
>
> I didn't know this behaviour. However, then the fd remains in flight, right ?
> So as long as it's not removed from the queue, the sender cannot add more
> than its FD limit. I may be missing something obvious though :-/

Yes, it remains in flight.

The MSG_PEEK code should not be harmful and the patch is good as is. I 
first understood from the published private thread, that it is possible 
for a program to exceed the rlimit of fds. But the DoS is only by 
keeping the fds in flight and not attaching them to any program.

__alloc_fd, called on the receiver side, does check for the rlimit 
maximum anyway, so I don't see a loophole anymore:

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Another idea would be to add the amount of memory used to manage the fds 
to sock_rmem/wmem but I don't see any advantages or disadvantages.

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Willy Tarreau Dec. 30, 2015, 11:23 a.m. UTC | #5
On Wed, Dec 30, 2015 at 09:58:42AM +0100, Hannes Frederic Sowa wrote:
> The MSG_PEEK code should not be harmful and the patch is good as is. I 
> first understood from the published private thread, that it is possible 
> for a program to exceed the rlimit of fds. But the DoS is only by 
> keeping the fds in flight and not attaching them to any program.

Exactly. The real issue is when these FDs become very expensive such as
pipes full of data.

> __alloc_fd, called on the receiver side, does check for the rlimit 
> maximum anyway, so I don't see a loophole anymore:
> 
> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks!

> Another idea would be to add the amount of memory used to manage the fds 
> to sock_rmem/wmem but I don't see any advantages or disadvantages.

Compared to the impact of the pending data in pipes themselves in flight,
this would remain fairly minimal.

Thanks,
Willy

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox Dec. 30, 2015, 1:14 p.m. UTC | #6
> > Another idea would be to add the amount of memory used to manage the fds 
> > to sock_rmem/wmem but I don't see any advantages or disadvantages.
> 
> Compared to the impact of the pending data in pipes themselves in flight,
> this would remain fairly minimal.

The true size of the memory pinned by a file handle cannot be determined
by the socket layer anyway. If the handle is for a device it may be
pinning lots of memory in the device driver. It's up to the rest of the
system to manage and contain those resource allocations correctly which a
fair bit of the code (except graphics) does properly.

The resource counting does create the reverse problem that user A running
a program written by hostile user B may find it takes all their resources,
stuffs them down an AF_UNIX socket and passes the last reference to the
socket via a setuid app somewhere where the user can't get it back.

For an end user case that's pretty irrelevant - you run hostile user B's
app then you already lost. For a service it's an interesting attack vector
but not one I can see any obvious ways to exploit because (with the
possible exception of dbus) services simply don't pass file handles down
sockets to untrusted targets.

Alan
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa Dec. 31, 2015, 6:08 a.m. UTC | #7
Willy Tarreau wrote:
> On Wed, Dec 30, 2015 at 09:58:42AM +0100, Hannes Frederic Sowa wrote:
> > The MSG_PEEK code should not be harmful and the patch is good as is. I 
> > first understood from the published private thread, that it is possible 
> > for a program to exceed the rlimit of fds. But the DoS is only by 
> > keeping the fds in flight and not attaching them to any program.
> 
> Exactly. The real issue is when these FDs become very expensive such as
> pipes full of data.
> 

As you wrote how to abuse this vulnerability which exists in Linux 2.0
and later kernel, I quote a short description from private thread.

  "an unprivileged user consumes all file descriptors so that other
  unprivileged user cannot work" and "an unprivileged user consumes all
  kernel memory so that the OOM killer kills almost all processes before
  the culprit process is killed (CVE-2013-4312)".

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Willy Tarreau Dec. 31, 2015, 7:12 a.m. UTC | #8
On Thu, Dec 31, 2015 at 03:08:53PM +0900, Tetsuo Handa wrote:
> Willy Tarreau wrote:
> > On Wed, Dec 30, 2015 at 09:58:42AM +0100, Hannes Frederic Sowa wrote:
> > > The MSG_PEEK code should not be harmful and the patch is good as is. I 
> > > first understood from the published private thread, that it is possible 
> > > for a program to exceed the rlimit of fds. But the DoS is only by 
> > > keeping the fds in flight and not attaching them to any program.
> > 
> > Exactly. The real issue is when these FDs become very expensive such as
> > pipes full of data.
> > 
> 
> As you wrote how to abuse this vulnerability which exists in Linux 2.0
> and later kernel, I quote a short description from private thread.
> 
>   "an unprivileged user consumes all file descriptors so that other
>   unprivileged user cannot work" and "an unprivileged user consumes all
>   kernel memory so that the OOM killer kills almost all processes before
>   the culprit process is killed (CVE-2013-4312)".
> 
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Mitigates: CVE-2013-4312 (Linux 2.0+)

Well I didn't reveal any secret as it was publicly reported first
in 2010, it's only that Mark sent us the proof of concept exploit
on the security list recently :-)

    https://bugzilla.kernel.org/show_bug.cgi?id=20402

Anyway I'll resend the patch with your reported-by, the CVE and
Hannes' ACK.

Thanks!
Willy

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox Dec. 31, 2015, 10:27 a.m. UTC | #9
On Thu, 31 Dec 2015 08:12:53 +0100
Willy Tarreau <w@1wt.eu> wrote:

> On Thu, Dec 31, 2015 at 03:08:53PM +0900, Tetsuo Handa wrote:
> > Willy Tarreau wrote:
> > > On Wed, Dec 30, 2015 at 09:58:42AM +0100, Hannes Frederic Sowa wrote:
> > > > The MSG_PEEK code should not be harmful and the patch is good as is. I 
> > > > first understood from the published private thread, that it is possible 
> > > > for a program to exceed the rlimit of fds. But the DoS is only by 
> > > > keeping the fds in flight and not attaching them to any program.
> > > 
> > > Exactly. The real issue is when these FDs become very expensive such as
> > > pipes full of data.
> > > 
> > 
> > As you wrote how to abuse this vulnerability which exists in Linux 2.0
> > and later kernel, I quote a short description from private thread.
> > 
> >   "an unprivileged user consumes all file descriptors so that other
> >   unprivileged user cannot work" and "an unprivileged user consumes all
> >   kernel memory so that the OOM killer kills almost all processes before
> >   the culprit process is killed (CVE-2013-4312)".
> > 
> > Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Mitigates: CVE-2013-4312 (Linux 2.0+)
> 
> Well I didn't reveal any secret as it was publicly reported first
> in 2010, it's only that Mark sent us the proof of concept exploit
> on the security list recently :-)

There were demonstrations of this bug posted for BSD unixes before Linux
even existed. It and "run the box out of socket buffers" are older than
Linux 8)

Alan
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 4, 2016, 9:44 p.m. UTC | #10
From: Willy Tarreau <w@1wt.eu>
Date: Mon, 28 Dec 2015 15:14:35 +0100

> It is possible for a process to allocate and accumulate far more FDs than
> the process' limit by sending them over a unix socket then closing them
> to keep the process' fd count low.
> 
> This change addresses this problem by keeping track of the number of FDs
> in flight per user and preventing non-privileged processes from having
> more FDs in flight than their configured FD limit.
> 
> Reported-by: socketpair@gmail.com
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
> It would be nice if (if accepted) it would be backported to -stable as the
> issue is currently exploitable.

As mentioned, please remove the unix_sock_count variable and
associated code as it is completely unused after this patch.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index edad7a4..fbf25f1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -830,6 +830,7 @@  struct user_struct {
 	unsigned long mq_bytes;	/* How many bytes can be allocated to mqueue? */
 #endif
 	unsigned long locked_shm; /* How many pages of mlocked shm ? */
+	unsigned long unix_inflight;	/* How many files in flight in unix sockets */
 
 #ifdef CONFIG_KEYS
 	struct key *uid_keyring;	/* UID specific keyring */
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 45aebd9..d6d7b43 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1499,6 +1499,21 @@  static void unix_destruct_scm(struct sk_buff *skb)
 	sock_wfree(skb);
 }
 
+/*
+ * The "user->unix_inflight" variable is protected by the garbage
+ * collection lock, and we just read it locklessly here. If you go
+ * over the limit, there might be a tiny race in actually noticing
+ * it across threads. Tough.
+ */
+static inline bool too_many_unix_fds(struct task_struct *p)
+{
+	struct user_struct *user = current_user();
+
+	if (unlikely(user->unix_inflight > task_rlimit(p, RLIMIT_NOFILE)))
+		return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN);
+	return false;
+}
+
 #define MAX_RECURSION_LEVEL 4
 
 static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
@@ -1507,6 +1522,9 @@  static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 	unsigned char max_level = 0;
 	int unix_sock_count = 0;
 
+	if (too_many_unix_fds(current))
+		return -ETOOMANYREFS;
+
 	for (i = scm->fp->count - 1; i >= 0; i--) {
 		struct sock *sk = unix_get_socket(scm->fp->fp[i]);
 
@@ -1528,10 +1546,8 @@  static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 	if (!UNIXCB(skb).fp)
 		return -ENOMEM;
 
-	if (unix_sock_count) {
-		for (i = scm->fp->count - 1; i >= 0; i--)
-			unix_inflight(scm->fp->fp[i]);
-	}
+	for (i = scm->fp->count - 1; i >= 0; i--)
+		unix_inflight(scm->fp->fp[i]);
 	return max_level;
 }
 
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index a73a226..8fcdc22 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -120,11 +120,11 @@  void unix_inflight(struct file *fp)
 {
 	struct sock *s = unix_get_socket(fp);
 
+	spin_lock(&unix_gc_lock);
+
 	if (s) {
 		struct unix_sock *u = unix_sk(s);
 
-		spin_lock(&unix_gc_lock);
-
 		if (atomic_long_inc_return(&u->inflight) == 1) {
 			BUG_ON(!list_empty(&u->link));
 			list_add_tail(&u->link, &gc_inflight_list);
@@ -132,25 +132,28 @@  void unix_inflight(struct file *fp)
 			BUG_ON(list_empty(&u->link));
 		}
 		unix_tot_inflight++;
-		spin_unlock(&unix_gc_lock);
 	}
+	fp->f_cred->user->unix_inflight++;
+	spin_unlock(&unix_gc_lock);
 }
 
 void unix_notinflight(struct file *fp)
 {
 	struct sock *s = unix_get_socket(fp);
 
+	spin_lock(&unix_gc_lock);
+
 	if (s) {
 		struct unix_sock *u = unix_sk(s);
 
-		spin_lock(&unix_gc_lock);
 		BUG_ON(list_empty(&u->link));
 
 		if (atomic_long_dec_and_test(&u->inflight))
 			list_del_init(&u->link);
 		unix_tot_inflight--;
-		spin_unlock(&unix_gc_lock);
 	}
+	fp->f_cred->user->unix_inflight--;
+	spin_unlock(&unix_gc_lock);
 }
 
 static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),