diff mbox

LSM: Add security_socket_post_accept() and security_socket_post_recv_datagram().

Message ID 200904141944.JFE64074.FHtOMOFQLFJOVS@I-love.SAKURA.ne.jp
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Tetsuo Handa April 14, 2009, 10:44 a.m. UTC
Hello.

The security_socket_post_accept() hook was recently removed because this hook
was not used by any in-tree modules and its existence continued to cause
problems by confusing people about what can be safely accomplished using this
hook. Now, TOMOYO became in-tree and TOMOYO wants to add network access control
in 2.6.31.

TOMOYO is not a label based access control and won't cause packet labeling
problem. TOMOYO won't care as long as packets are not copied to userspace.

Regards.
--------------------
Subject: LSM: Add security_socket_post_accept() and security_socket_post_recv_datagram().

This patch allows LSM modules to filter incoming connections/datagrams based on
the process's security context who is attempting to pick up.

There are already hooks for filtering incoming connections/datagrams based on
the socket's security context, but these hooks are not applicable when someone
wants to do TCP Wrapper-like filtering (e.g. App1 is permitted to accept TCP
connections from 192.168.0.0/16) because nobody can tell who picks them up
before the moment of accept()/recvmsg() request.

Precautions: These hooks have a side effect if improperly used.

If a socket is shared by multiple processes with different policy, the process
who should be able to accept this connection will not be able to accept this
connection because socket_post_accept() aborts this connection.
This is needed because the process who must not be able to accept this
connection will repeat accept() request (and consume CPU resource) forever
if socket_post_accept() doesn't abort this connection.

Similarly, if a socket is shared by multiple processes with different policy,
the process who should be able to pick up this datagram will not be able to
pick up this datagram because socket_post_recv_datagram() discards this
datagram.
This is needed because the process who must not be able to pick up this
datagram will repeat recvmsg() request (and consume CPU resource) forever
if socket_post_recv_datagram() doesn't discard this datagram.

Therefore, don't give different policy between processes who share one socket.
Otherwise, some connections/datagrams cannot be delivered to intended process.

Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp>

 include/linux/security.h |   39 +++++++++++++++++++++++++++++++++++++++
 net/core/datagram.c      |   29 ++++++++++++++++++++++++++++-
 net/socket.c             |    5 +++++
 security/capability.c    |   13 +++++++++++++
 security/security.c      |   11 +++++++++++
 5 files changed, 96 insertions(+), 1 deletion(-)

---

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Paul Moore April 14, 2009, 10:59 p.m. UTC | #1
On Tuesday 14 April 2009 06:44:35 am Tetsuo Handa wrote:
> Hello.
>
> The security_socket_post_accept() hook was recently removed because this
> hook was not used by any in-tree modules and its existence continued to
> cause problems by confusing people about what can be safely accomplished
> using this hook. Now, TOMOYO became in-tree and TOMOYO wants to add network
> access control in 2.6.31.
>
> TOMOYO is not a label based access control and won't cause packet labeling
> problem. TOMOYO won't care as long as packets are not copied to userspace.

We've discussed this issue several times on the mailing list so I won't go 
over all of the details again, but I will say that my main concern with the 
post_accept() hook is that you are rejecting a connection after is has already 
gone through the connection handshake.  I personally don't feel this is a good 
approach but I don't want to stand in your way if I am alone on this point.

I'm less concerned about the recv_datagram() hook although the issues you 
point out about sharing sockets across multiple processes does highlight what 
I believe are some fundamental issues regarding the TOMOYO network security 
model.  Once again, I'm not going to object if I am the only one.

Lastly, I think in order to review these patches we need to see how they are 
used.  Please submit a patch set that includes both this patch as well as a 
patch to TOMOYO which makes use of these changes; this way we can properly 
review your patches in context.  Regardless, I took a quick look and noticed a 
few things (below).

Thanks.

> --- security-testing-2.6.git.orig/net/core/datagram.c
> +++ security-testing-2.6.git/net/core/datagram.c
> @@ -55,6 +55,7 @@
>  #include <net/checksum.h>
>  #include <net/sock.h>
>  #include <net/tcp_states.h>
> +#include <linux/security.h>
>
>  /*
>   *	Is a socket 'connection oriented' ?
> @@ -148,6 +149,7 @@ struct sk_buff *__skb_recv_datagram(stru
>  {
>  	struct sk_buff *skb;
>  	long timeo;
> +	unsigned long cpu_flags;
>  	/*
>  	 * Caller is allowed not to check sk->sk_err before skb_recv_datagram()
>  	 */
> @@ -165,7 +167,6 @@ struct sk_buff *__skb_recv_datagram(stru
>  		 * Look at current nfs client by the way...
>  		 * However, this function was corrent in any case. 8)
>  		 */
> -		unsigned long cpu_flags;
>
>  		spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
>  		skb = skb_peek(&sk->sk_receive_queue);
> @@ -179,6 +180,14 @@ struct sk_buff *__skb_recv_datagram(stru
>  		}
>  		spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
>
> +		/* Filter packets from unwanted peers. */

It might be best to drop this comment, we don't want to speculate how the LSM 
might be making access controls decisions here.

> +		if (skb) {
> +			error = security_socket_post_recv_datagram(sk, skb,
> +								   flags);
> +			if (error)
> +				goto force_dequeue;
> +		}
> +
>  		if (skb)
>  			return skb;

Why check to see if skb != NULL twice?  I think you should be able to move the 
skb return statement into the body of the first if block.

> @@ -191,6 +200,24 @@ struct sk_buff *__skb_recv_datagram(stru
>
>  	return NULL;
>
> +force_dequeue:
> +	/* Drop this packet because LSM says "Don't pass it to the caller". */

Once again, remove this comment since we don't know what a LSM might decide.

> +	if (!(flags & MSG_PEEK))
> +		goto no_peek;
> +	/*
> +	 * If this packet is MSG_PEEK'ed, dequeue it forcibly
> +	 * so that this packet won't prevent the caller from picking up
> +	 * next packet.
> +	 */
> +	spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
> +	if (skb == skb_peek(&sk->sk_receive_queue)) {
> +		__skb_unlink(skb, &sk->sk_receive_queue);
> +		atomic_dec(&skb->users);
> +		/* Do I have something to do with skb->peeked ? */

I don't know but you should find out before this code is merged :)

> +	}
> +	spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
> +no_peek:
> +	kfree_skb(skb);
>  no_packet:
>  	*err = error;
>  	return NULL;
Tetsuo Handa April 15, 2009, 5:12 a.m. UTC | #2
Hello.

Paul Moore wrote:
> Please submit a patch set that includes both this patch as well as a
> patch to TOMOYO which makes use of these changes; this way we can properly
> review your patches in context.

I see.

I have some questions.

> > +	if (!(flags & MSG_PEEK))
> > +		goto no_peek;
> > +	/*
> > +	 * If this packet is MSG_PEEK'ed, dequeue it forcibly
> > +	 * so that this packet won't prevent the caller from picking up
> > +	 * next packet.
> > +	 */
> > +	spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
> > +	if (skb == skb_peek(&sk->sk_receive_queue)) {
> > +		__skb_unlink(skb, &sk->sk_receive_queue);
> > +		atomic_dec(&skb->users);
> > +		/* Do I have something to do with skb->peeked ? */
> 
> I don't know but you should find out before this code is merged :)

Q1: Can I use skb_kill_datagram() here?

    skb_kill_datagram() uses spin_lock_bh() while __skb_recv_datagram() uses
    spin_lock_irqsave(). Since this codepath is called inside
    __skb_recv_datagram(), I used spin_lock_irqsave() rather than calling
    skb_kill_datagram().
> 
> > +	}
> > +	spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
> > +no_peek:
> > +	kfree_skb(skb);

Q2: Do I need to use skb_free_datagram() here rather than kfree_skb()?

    In the past ( http://lkml.org/lkml/2007/11/16/406 ), there was no
    difference between skb_free_datagram() and kfree_skb().
    | void skb_free_datagram(struct sock *sk, struct sk_buff *skb)
    | {
    | 	kfree_skb(skb);
    | }
    But now (as of 2.6.30-rc2), there is a difference.
    | void skb_free_datagram(struct sock *sk, struct sk_buff *skb)
    | {
    | 	consume_skb(skb);
    | 	sk_mem_reclaim_partial(sk);
    | }

> >  no_packet:
> >  	*err = error;
> >  	return NULL;

Q3: Is __skb_recv_datagram() called from contexts that are not permitted to
    sleep?

    If so, TOMOYO has to check whether it is allowed to sleep, for TOMOYO will
    prompt the user "whether to allow App1 to read this datagram or not".

Q4: Is there a way to distinguish requests from userland programs and requests
    from kernel code?

    Some kernel code (e.g. NFS) sends/receives UDP packets to deal requests
    from userland program's requests. TOMOYO wants to distinguish "direct
    requests" (requests issued by userland programs, such as open()/read()/
    write() against files on NFS) and "indirect requests" (requests issued by
    reasons of kernel's own which are needed to handle "direct requests", such
    as fetching file data from NFS server). But currently, TOMOYO can't
    distinguish these requests. As a result, those who use NFS have to give
    permissions for sending/receiving UDP packets to/from NFS server to all
    userland programs.
    This means that TOMOYO allows userland programs to send/receive crafted
    packets to/from NFS server. I want to solve this problem.

Regards.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Moore April 16, 2009, 6:23 p.m. UTC | #3
On Wednesday 15 April 2009 01:12:41 am Tetsuo Handa wrote:
> Hello.
>
> Paul Moore wrote:
> > Please submit a patch set that includes both this patch as well as a
> > patch to TOMOYO which makes use of these changes; this way we can
> > properly review your patches in context.

Thank you for sending a patchset with both TOMOYO and LSM/stack changes; this 
should give other developers who may not be familiar with TOMOYO a chance to 
review your code.  My comments/concerns about the LSM changes still stand, if 
it is decided to merge these changes I'll be happy to review the TOMOYO 
changes further.

> > > +	if (!(flags & MSG_PEEK))
> > > +		goto no_peek;
> > > +	/*
> > > +	 * If this packet is MSG_PEEK'ed, dequeue it forcibly
> > > +	 * so that this packet won't prevent the caller from picking up
> > > +	 * next packet.
> > > +	 */
> > > +	spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
> > > +	if (skb == skb_peek(&sk->sk_receive_queue)) {
> > > +		__skb_unlink(skb, &sk->sk_receive_queue);
> > > +		atomic_dec(&skb->users);
> > > +		/* Do I have something to do with skb->peeked ? */
> >
> > I don't know but you should find out before this code is merged :)
>
> Q1: Can I use skb_kill_datagram() here?
>
>     skb_kill_datagram() uses spin_lock_bh() while __skb_recv_datagram()
> uses spin_lock_irqsave(). Since this codepath is called inside
>     __skb_recv_datagram(), I used spin_lock_irqsave() rather than calling
>     skb_kill_datagram().

Since __skb_recv_datagram() is already using spin_lock_irqsave() it seems 
reasonable to do the same in your changes.  As far as skb->peeked is concerned 
I don't think it matters much since you are destroying the skb anyway.

> > > +no_peek:
> > > +	kfree_skb(skb);
>
> Q2: Do I need to use skb_free_datagram() here rather than kfree_skb()?
>
>     In the past ( http://lkml.org/lkml/2007/11/16/406 ), there was no
>     difference between skb_free_datagram() and kfree_skb().
>
>     | void skb_free_datagram(struct sock *sk, struct sk_buff *skb)
>     | {
>     | 	kfree_skb(skb);
>     | }
>
>     But now (as of 2.6.30-rc2), there is a difference.
>
>     | void skb_free_datagram(struct sock *sk, struct sk_buff *skb)
>     | {
>     | 	consume_skb(skb);
>     | 	sk_mem_reclaim_partial(sk);
>     | }

I don't know for certain, I would have to go look at other users of 
skb_free_datagram(), but it does look like using skb_free_datagram() instead 
of skb_free() might be preferable.

> Q3: Is __skb_recv_datagram() called from contexts that are not permitted to
>     sleep?
>
>     If so, TOMOYO has to check whether it is allowed to sleep, for TOMOYO
> will prompt the user "whether to allow App1 to read this datagram or not".

I believe __skb_recv_datagram() is only called via userspace so sleeping 
should not be an issue.

> Q4: Is there a way to distinguish requests from userland programs and
> requests from kernel code?

I'm not sure if this is the 100% correct way to do it, but in the past I have 
always checked current->mm, for kernel threads this will be NULL.
Tetsuo Handa April 18, 2009, 8:34 a.m. UTC | #4
Hello.

Thank you for answering my questions.

Paul Moore wrote:
> > Q1: Can I use skb_kill_datagram() here?
> >
> >     skb_kill_datagram() uses spin_lock_bh() while __skb_recv_datagram()
> > uses spin_lock_irqsave(). Since this codepath is called inside
> >     __skb_recv_datagram(), I used spin_lock_irqsave() rather than calling
> >     skb_kill_datagram().
> 
> Since __skb_recv_datagram() is already using spin_lock_irqsave() it seems 
> reasonable to do the same in your changes.
I see. I'll use spin_lock_irqsave() here.

> > > > +no_peek:
> > > > +	kfree_skb(skb);
> >
> > Q2: Do I need to use skb_free_datagram() here rather than kfree_skb()?
> >
> >     In the past ( http://lkml.org/lkml/2007/11/16/406 ), there was no
> >     difference between skb_free_datagram() and kfree_skb().
> >
> >     | void skb_free_datagram(struct sock *sk, struct sk_buff *skb)
> >     | {
> >     | 	kfree_skb(skb);
> >     | }
> >
> >     But now (as of 2.6.30-rc2), there is a difference.
> >
> >     | void skb_free_datagram(struct sock *sk, struct sk_buff *skb)
> >     | {
> >     | 	consume_skb(skb);
> >     | 	sk_mem_reclaim_partial(sk);
> >     | }
> 
> I don't know for certain, I would have to go look at other users of 
> skb_free_datagram(), but it does look like using skb_free_datagram() instead
 
> of skb_free() might be preferable.
I see. I'll use skb_free_datagram() here.

Also, I'll need to protect skb_free_datagram() with lock_sock()/
release_sock().
(Thanks to Eric Dumazet.)

> > Q3: Is __skb_recv_datagram() called from contexts that are not permitted 
to
> >     sleep?
> >
> >     If so, TOMOYO has to check whether it is allowed to sleep, for TOMOYO
> > will prompt the user "whether to allow App1 to read this datagram or not".
> 
> I believe __skb_recv_datagram() is only called via userspace so sleeping 
> should not be an issue.
> 
NFS code needs to issue UDP send()/recv() requests from the 
kernel. Therefore,
I think __skb_recv_datagram() is called from kernel space.

I'm worrying that __skb_recv_datagram() is called with a 
spinlock held.

> > Q4: Is there a way to distinguish requests from userland programs and
> > requests from kernel code?
> 
> I'm not sure if this is the 100% correct way to do it, but in the past I 
have 
> always checked current->mm, for kernel threads this will be NULL.

Nowadays, it will be "current->mm && !(current->flags & PF_
KTHREAD)" because
get_task_mm() says a kernel workthread may temporarily have a 
user mm to do its
AIO.

Sorry for confusing question. What I wanted to know is not "how 
can I
distinguish kernel processes and userland processes". What I 
wanted to know is
"how can I distinguish requests issued for processing a request 
>from userland
process".

(a) If a file is on a simple filesystem like ext3, an open()/
read()/write()
    request from userspace application will not issue another
    open()/read()/write() request.

(b) If a file is on a stackable filesystem like unionfs, an open
()/create()/
    unlink() request from userspace application will issue 
another
    open()/create()/unlink() request from kernel space.

(c) If a file is on a networking filesystem like nfs, an open()/
create()/
    unlink() request from userspace application will issue send
()/recv()
    request from kernel space.

If (b) and (c) use a dedicated task_struct for handling requests
 from kernel
space, TOMOYO has nothing to do.
But unfortunately, (b) and (c) use a task_struct of the 
userspace application
for handling requests from kernel space. TOMOYO wants to 
distinguish
an open()/create()/unlink() request from userspace application 
and 
open()/create()/unlink()/send()/recv() requests needed for 
handling
the open()/create()/unlink() request from userspace application.
I wished that there is an indicator that tells TOMOYO that 
whether an
open()/create()/unlink()/send()/recv() request is issued for 
processing
a request from userland process.

I know little about NFS, but I think NFS does not use a kernel 
workthread for
sending/receiving UDP packets to handle an open()/read()/write()
 request from
a userspace application.

I'm afraid that checking "current->mm && !(current->flags & PF_
KTHREAD)" will
not help TOMOYO in distinguishing "direct request" and "indirect
 request".






By the way, I need to tell you one more thing about
security_socket_post_accept() hook's usage. Not now, but in 
future,
I want to introduce task's state variable which is used for 
dividing
permissions within a domain.

  # Example policy for /usr/sbin/sshd
  allow_network TCP accept 10.0.0.0-10.255.255.255.255 1024-
65535 ; set task.state[0]=1
  allow_network TCP accept 192.168.0.0-192.168.255.255 1024-
65535 ; set task.state[0]=2
  allow_execute /bin/bash if task.state[0]=1
  allow_execute /bin/rbash if task.state[0]=2

The above example policy allows an instance of /usr/sbin/sshd to
(a) execute /bin/bash if that instance accepted a TCP connection
 from
    10.0.0.0/8
(b) execute /bin/rbash if that instance accepted a TCP 
connection from
    192.168.0.0/16.
(c) abort TCP connections if that instance accepted a TCP 
connection from
    neither 10.0.0.0/8 nor 192.168.0.0/16.

The security_socket_post_accept() hook is used for not only 
aborting TCP
connections from unwanted peers but also associating client's 
information
with a process who handles that TCP connection. The task's state
 variable
definitely requires a LSM hook which is called after sock->ops->
accept() call.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Moore April 20, 2009, 10:22 p.m. UTC | #5
On Saturday 18 April 2009 04:34:02 am Tetsuo Handa wrote:
> Hello.
>
> Thank you for answering my questions.

No problem.

> > I believe __skb_recv_datagram() is only called via userspace so sleeping
> > should not be an issue.
>
> NFS code needs to issue UDP send()/recv() requests from the
> kernel. Therefore, I think __skb_recv_datagram() is called from kernel
> space.

My mistake, I trusted (or misread) the comment at the start of the function.

> I'm worrying that __skb_recv_datagram() is called with a
> spinlock held.

It looks like you've already solved that issue.

> > > Q4: Is there a way to distinguish requests from userland programs and
> > > requests from kernel code?
> >
> > I'm not sure if this is the 100% correct way to do it, but in the past I
> > have always checked current->mm, for kernel threads this will be NULL.
>
> Nowadays, it will be "current->mm && !(current->flags & PF_
> KTHREAD)" because get_task_mm() says a kernel workthread may temporarily
> have a user mm to do its AIO.

Thanks, that is good to know.

> Sorry for confusing question. What I wanted to know is not "how
> can I distinguish kernel processes and userland processes". What I
> wanted to know is "how can I distinguish requests issued for processing a
> request from userland process".

I do not know of a way but someone else reading this might.

> By the way, I need to tell you one more thing about
> security_socket_post_accept() hook's usage. Not now, but in future,
> I want to introduce task's state variable which is used for dividing
> permissions within a domain.
>
>   # Example policy for /usr/sbin/sshd
>   allow_network TCP accept 10.0.0.0-10.255.255.255.255 1024-
> 65535 ; set task.state[0]=1
>   allow_network TCP accept 192.168.0.0-192.168.255.255 1024-
> 65535 ; set task.state[0]=2
>   allow_execute /bin/bash if task.state[0]=1
>   allow_execute /bin/rbash if task.state[0]=2
>
> The above example policy allows an instance of /usr/sbin/sshd to
> (a) execute /bin/bash if that instance accepted a TCP connection
>  from
>     10.0.0.0/8
> (b) execute /bin/rbash if that instance accepted a TCP
> connection from
>     192.168.0.0/16.
> (c) abort TCP connections if that instance accepted a TCP
> connection from
>     neither 10.0.0.0/8 nor 192.168.0.0/16.
>
> The security_socket_post_accept() hook is used for not only
> aborting TCP connections from unwanted peers but also associating client's
> information with a process who handles that TCP connection. The task's state
> variable definitely requires a LSM hook which is called after sock->ops->
> accept() call.

I don't have a problem with using a socket_post_accept() hook to assign/modify 
state, however, I still not like the idea of using the socket_post_accept() 
hook to abort connections.
Tetsuo Handa April 21, 2009, 10:54 a.m. UTC | #6
Paul Moore wrote:
> > The security_socket_post_accept() hook is used for not only
> > aborting TCP connections from unwanted peers but also associating client's
> > information with a process who handles that TCP connection. The task's state
> > variable definitely requires a LSM hook which is called after sock->ops->
> > accept() call.
> 
> I don't have a problem with using a socket_post_accept() hook to assign/modify 
> state, however, I still not like the idea of using the socket_post_accept()
> hook to abort connections.

What do you think that assigning/modifying state at socket_post_accept() could
fail?

TOMOYO 1.x never fails since task's state variable is directly attached to
"struct task_struct" but TOMOYO 2.x can fail since task's state variable will
be indirectly attached via current->cred->security and memory allocation
failure for new credential could occur. Though it unlikely fails, I have to
abort connections by returning an error at socket_post_accept() hook if failed.

I think the socket_post_accept() hook anyway need to be able to abort
connections. (Or prepare new credential at socket_accept() and add a hook for
discarding prepared credential when sock->ops->accept() returned an error?)

Regards.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 21, 2009, 10:57 a.m. UTC | #7
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 21 Apr 2009 19:54:06 +0900

> Paul Moore wrote:
>> > The security_socket_post_accept() hook is used for not only
>> > aborting TCP connections from unwanted peers but also associating client's
>> > information with a process who handles that TCP connection. The task's state
>> > variable definitely requires a LSM hook which is called after sock->ops->
>> > accept() call.
>> 
>> I don't have a problem with using a socket_post_accept() hook to assign/modify 
>> state, however, I still not like the idea of using the socket_post_accept()
>> hook to abort connections.
> 
> What do you think that assigning/modifying state at socket_post_accept() could
> fail?

I have to agree with Paul.

There is no sane way for the user to handle this connection
being aborted, and there is no way to insert the connection
back into the listening socket queue once we get to this
point so we can't replay this situation either.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 21, 2009, 11:39 a.m. UTC | #8
David Miller wrote:
> There is no sane way for the user to handle this connection
> being aborted, and there is no way to insert the connection
> back into the listening socket queue once we get to this
> point so we can't replay this situation either.
> 
Excuse me, I couldn't catch.

I don't have a problem that there is no way to insert the connection back into
the listening socket queue once we get to this point. I want to drop the
connection rather than pushing the connection back to the queue.

I meant that "if we allow doing something after sock->ops->accept(), we need to
drop the connection if something returned an error". An example of something is
to assign/modify state. To modify task's state, I need to allocate memory for
new credential which is an operation that could fail.
Therefore, the socket_post_accept() hook should allow aborting the connection.

Paul doesn't like using the socket_post_accept() hook to abort connections.
But the connection is aborted if either newsock->ops->getname() or
move_addr_to_user() returned an error. I wonder what's so problematic with
using socket_post_accept() for dropping the connection.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 21, 2009, 11:40 a.m. UTC | #9
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 21 Apr 2009 20:39:08 +0900

> David Miller wrote:
>> There is no sane way for the user to handle this connection
>> being aborted, and there is no way to insert the connection
>> back into the listening socket queue once we get to this
>> point so we can't replay this situation either.
>> 
> Excuse me, I couldn't catch.
> 
> I don't have a problem that there is no way to insert the connection back into
> the listening socket queue once we get to this point. I want to drop the
> connection rather than pushing the connection back to the queue.

I am saying that you shouldn't be dropping connections like
this.

You've already accepted the packet, you cannot reneg on this.
You must either allow the socket to have the connection with
current labels or give it a label appropriate for the socket.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 21, 2009, 12:26 p.m. UTC | #10
David Miller wrote:
> I am saying that you shouldn't be dropping connections like
> this.
Please don't assume that we are talking about labeled networking.
I want to implement TCPwrapper-like filtering (which is automatically linked to
all processes).

> You've already accepted the packet, you cannot reneg on this.
TCPwrapper accepts the packet before it decides to drop the connection.

> You must either allow the socket to have the connection with
> current labels or give it a label appropriate for the socket.
The socket_post_accept() hook is not designed for modifying labels of a socket.
It is too late to modify because we have already accepted the packet.

+ * @socket_post_accept:
+ *	This hook allows a security module to filter connections from unwanted
+ *	peers based on the process accepting this connection.
+ *	The connection will be aborted if this hook returns nonzero.
+ *	This hook is not designed for updating security attributes of
+ *	an accept()ed socket, for the accept()ed socket has already sent
+ *	several packets (e.g. TCP's SYN/ACK packet and some ACK packets for
+ *	incoming data) before this hook is called.
+ *	@sock contains the listening socket structure.
+ *	@newsock contains the newly created server socket for connection.
+ *	Return 0 if permission is granted.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 21, 2009, 12:37 p.m. UTC | #11
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 21 Apr 2009 21:26:36 +0900

> It is too late to modify because we have already accepted the
> packet.

Then queue it up somewhere, and deliver your verdict and continue
packet processing later.

In fact, I do not like killing accept()'s like this at all.

If we tell the application, via poll() or similar, that connections
are available and no other thread can get in there to steal the
connection, we must deliver a connection to the listening socket when
it calls accept() except in very unusual circumstances (out of file
descriptors, memory allocation failure, etc.)!

I believe this idea is conceptually very broken, sorry.  The semantics
are absolutely horrible.

Put this stuff in userspace, where you say it already is.  The kernel
isn't the place to dump every cool feature you want to bring in from
userspace.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 21, 2009, 12:52 p.m. UTC | #12
David Miller wrote:
> Put this stuff in userspace, where you say it already is.  The kernel
> isn't the place to dump every cool feature you want to bring in from
> userspace.
If I can do this stuff in userspace, I already put this stuff in userspace.
Access control in userspace is easily bypassed, no good for policy based
mandatory access control. The reason I propose these hooks in kernel is that
nobody can bypass these hooks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 21, 2009, 1:04 p.m. UTC | #13
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 21 Apr 2009 21:52:57 +0900

> David Miller wrote:
>> Put this stuff in userspace, where you say it already is.  The kernel
>> isn't the place to dump every cool feature you want to bring in from
>> userspace.
> If I can do this stuff in userspace, I already put this stuff in userspace.
> Access control in userspace is easily bypassed, no good for policy based
> mandatory access control. The reason I propose these hooks in kernel is that
> nobody can bypass these hooks.

They you have to make your decisions when the packet arrives
in order to not break application expected semantics.

If poll() says to a listening socket that connections are
available, we MUST return a connection from accept() unless
there is a hard error.

The current proposal is not acceptable.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 22, 2009, 12:55 a.m. UTC | #14
David Miller wrote:
> If poll() says to a listening socket that connections are
> available, we MUST return a connection from accept() unless
> there is a hard error.

No application is permitted to assume that accept() returns a connection if
poll() says that connections are available even if there is an assumption that
none of sock->ops->accept(), newsock->ops->getname(), move_addr_to_user()
fails, for there is security_socket_accept() hook which can interrupt between
"poll()" and "accept()".

There is a possibility that LSM module's policy changes from "allow picking
the connection up from the listening socket" to "deny picking the connection
up from the listening socket" after poll() said "connections are available".
We allow this policy change and we tolerate breakage of the application
expected semantics.

> If we tell the application, via poll() or similar, that connections
> are available and no other thread can get in there to steal the
> connection, we must deliver a connection to the listening socket when
> it calls accept() except in very unusual circumstances (out of file
> descriptors, memory allocation failure, etc.)!

If you think policy changes between poll() and security_socket_accept() is
one of unusual circumstances, I think policy changes between
security_socket_accept() and security_socket_post_accept()
(the old policy before poll() said "connections are available" was "allow
picking the connection up from any client" while the new policy after
poll() said "connections are available" is "allow picking the connection
up from only 127.0.0.1") is also one of unusual circumstances.
The only difference between security_socket_accept() and
security_socket_post_accept() is that whether the connection remains in the
listening socket's queue or not. We cannot avoid breakage of the application
expected semantics from the beginning.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 22, 2009, 1:14 a.m. UTC | #15
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date: Wed, 22 Apr 2009 09:55:26 +0900

> David Miller wrote:
>> If poll() says to a listening socket that connections are
>> available, we MUST return a connection from accept() unless
>> there is a hard error.
> 
> No application is permitted to assume that accept() returns a connection if
> poll() says that connections are available even if there is an assumption that
> none of sock->ops->accept(), newsock->ops->getname(), move_addr_to_user()
> fails, for there is security_socket_accept() hook which can interrupt between
> "poll()" and "accept()".
> 
> There is a possibility that LSM module's policy changes from "allow picking
> the connection up from the listening socket" to "deny picking the connection
> up from the listening socket" after poll() said "connections are available".
> We allow this policy change and we tolerate breakage of the application
> expected semantics.

We had a similar situation with read()'s on UDP sockets.

When poll() says something, it has to stick.

This is not discussable.  If these semantics are "necessary", then
what you're doing is definitely misdesigned in my opinion.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 22, 2009, 1:49 a.m. UTC | #16
David Miller wrote:
> We had a similar situation with read()'s on UDP sockets.
> 
> When poll() says something, it has to stick.
To adhere what poll() said (i.e. "connections are ready" or "datagrams are
ready"), security_socket_accept() and security_socket_recvmsg() hooks must be
removed. Otherwise, LSM users cannot adhere what poll() said.

However, security_socket_accept() and security_socket_recvmsg() hooks remain
there. LSM users are already using semantics which may not adhere what poll()
said.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Lindahl April 22, 2009, 1:52 a.m. UTC | #17
On Tue, Apr 21, 2009 at 06:14:11PM -0700, David Miller wrote:

> We had a similar situation with read()'s on UDP sockets.
> 
> When poll() says something, it has to stick.

Isn't that completely different? Anyone who writes code that calls
accept() quickly finds out that in the real world it fails for all
kinds of reasons worth ignoring. As an example, a comment in ircd at
the only accept call (circa 1998):

   /*
   ** There may be many reasons for error return, but in otherwise
   ** correctly working environment the probable cause is running
   ** out of file descriptors (EMFILE, ENFILE or others?). The
   ** man pages for accept don't seem to list these as possible,
   ** although it's obvious that it may happen here.
   ** Thus no specific errors are tested at this point, just
   ** assume that connections cannot be accepted until some old
   ** is closed first.
   */

And it silently ignores EAGAIN, which of course is a can't happen when
used with select(). The recently-written only-runs-on-Linux system I'm
working on ignores EAGAIN, even though it's a can't happen with
epoll. I can ask the guy who wrote it, but he's probably ignoring it
because he was frequently seeing them.

I'd be surprised if you found much real-life code that didn't
gracefully tolerate accept failures. Can anyone come up with an
example?

-- greg

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 22, 2009, 4:22 a.m. UTC | #18
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date: Wed, 22 Apr 2009 10:49:42 +0900

> David Miller wrote:
>> We had a similar situation with read()'s on UDP sockets.
>> 
>> When poll() says something, it has to stick.
> To adhere what poll() said (i.e. "connections are ready" or "datagrams are
> ready"), security_socket_accept() and security_socket_recvmsg() hooks must be
> removed. Otherwise, LSM users cannot adhere what poll() said.
> 
> However, security_socket_accept() and security_socket_recvmsg() hooks remain
> there. LSM users are already using semantics which may not adhere what poll()
> said.

So what does your TOMOTO stuff do if the mapping changes again and
that incoming connection that became unacceptable is now acceptable?

We've lost the connection, and can never get it back.

These semantics don't make any sense at all, and the point at which
you make your choices here is totally arbitrary.

Read that carefully: the point at which you are making this
drop decision is arbitrary.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 22, 2009, 4:23 a.m. UTC | #19
From: Greg Lindahl <greg@blekko.com>
Date: Tue, 21 Apr 2009 18:52:28 -0700

> On Tue, Apr 21, 2009 at 06:14:11PM -0700, David Miller wrote:
> 
>> We had a similar situation with read()'s on UDP sockets.
>> 
>> When poll() says something, it has to stick.
> 
> Isn't that completely different? Anyone who writes code that calls
> accept() quickly finds out that in the real world it fails for all
> kinds of reasons worth ignoring. As an example, a comment in ircd at
> the only accept call (circa 1998):

I said explicitly that hard errors are allows (out of file
descriptors, memory allocation failure)

Feel free to ignore what I'm saying, and I'll feel free to
ignore you too.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 22, 2009, 5:02 a.m. UTC | #20
David Miller wrote:
> So what does your TOMOTO stuff do if the mapping changes again and
> that incoming connection that became unacceptable is now acceptable?
Nothing.

> We've lost the connection, and can never get it back.
TOMOYO won't keep the connection. That's fine.
TOMOYO thinks that dropping the connection/datagram is better than choking
the queue/buffer by keeping the connection/datagram for future policy changes.

I browsed the poll() logic for TCP/IPv4's listening socket.

SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds,
                long, timeout_msecs)
{
    int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds,
                    struct timespec *end_time)
    {
        static int do_poll(unsigned int nfds,  struct poll_list *list,
                           struct poll_wqueues *wait,
                           struct timespec *end_time)
        {
            static inline unsigned int do_pollfd(struct pollfd *pollfd,
                                                 poll_table *pwait)
            {
                file->f_op->poll(file, pwait);
            }
        }
    }
}

(file->f_op->poll == sock_poll in net/socket.c)

static unsigned int sock_poll(struct file *file, poll_table *wait)
{
    sock->ops->poll(file, sock, wait);
}

(sock->ops->poll == tcp_poll in net/ipv4/af_inet.c)

unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
{
    if (sk->sk_state == TCP_LISTEN) {
        static inline unsigned int inet_csk_listen_poll(const struct sock *sk)
        {
            return !reqsk_queue_empty(&inet_csk(sk)->icsk_accept_queue) ?
                                      (POLLIN | POLLRDNORM) : 0;
        }
    }
}

(and reqsk_queue_empty() is defined as)

static inline int reqsk_queue_empty(struct request_sock_queue *queue)
{
    return queue->rskq_accept_head == NULL;
}

If I read the code correctly, there is no LSM hook called for controlling
whether poll() is allowed to say "connections are ready" or not.
If security_socket_accept() rejects the accept() request, a connection remains
in the listening socket's queue. This will cause the subsequent poll() requests
to say "connections are ready" but the subsequent accept() requests to say
"you are not permitted to pick a connection up", and will continue consuming
100% of CPU resource until security_socket_accept() says "you are permitted to
pick a connection up".

Did I miss something?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 22, 2009, 5:07 a.m. UTC | #21
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date: Wed, 22 Apr 2009 14:02:16 +0900

> Did I miss something?

Do me a favor.  We have mechanisms by which you can queue up packets
(even into userspace) to make policy decisions.

You can make your decision there, and if appropriate, reinject the
packet back into the kernel input processing.

This is how netfilter ip_queue allows people to implement policies
in userspace, and you could use it in the kernel and then need
none of these ugly hooks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 22, 2009, 5:38 a.m. UTC | #22
David Miller wrote:
> Do me a favor.  We have mechanisms by which you can queue up packets
> (even into userspace) to make policy decisions.
I know.
> 
> You can make your decision there, and if appropriate, reinject the
> packet back into the kernel input processing.
No I can't.
> 
> This is how netfilter ip_queue allows people to implement policies
> in userspace, and you could use it in the kernel and then need
> none of these ugly hooks.
I was suggested to use that approach in the past discussion.

I want to make policy decisions based on the "task_struct" who picks up
the connection/datagram.

The netfilter can't predict which "task_struct" will pick up the
connection/datagram. Nobody can predict it. Even security_socket_accept()
and security_socket_recvmsg() can't predict it.

If TOMOYO is allowed to make policy decisions based on the "task_struct" who
picks up the connection/datagram, the only possible approach is to introduce
security_socket_post_accept() and security_socket_post_recv_datagram().
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 22, 2009, 5:52 a.m. UTC | #23
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date: Wed, 22 Apr 2009 14:38:04 +0900

> If TOMOYO is allowed to make policy decisions based on the "task_struct" who
> picks up the connection/datagram, the only possible approach is to introduce
> security_socket_post_accept() and security_socket_post_recv_datagram().

Fine.

FWIW I do not agree with TOMOYO conceptually.  But if people are
generally going to let such a scheme into the kernel, I guess
I have to accept these socket layer hacks :-/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Lindahl April 22, 2009, 6:10 a.m. UTC | #24
On Tue, Apr 21, 2009 at 09:23:42PM -0700, David Miller wrote:

> I said explicitly that hard errors are allows (out of file
> descriptors, memory allocation failure)

I have no idea what you meant by a "hard" error. Note that I also
discussed EAGAIN, which appears to happen commonly historically and
today, and appears to be what the security module folks would want to
have happen and you're rejecting. Do you consider that to be a hard
error? I'm betting not.

-- greg


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 22, 2009, 6:34 a.m. UTC | #25
From: Greg Lindahl <greg@blekko.com>
Date: Tue, 21 Apr 2009 23:10:06 -0700

> On Tue, Apr 21, 2009 at 09:23:42PM -0700, David Miller wrote:
> 
>> I said explicitly that hard errors are allows (out of file
>> descriptors, memory allocation failure)
> 
> I have no idea what you meant by a "hard" error. Note that I also
> discussed EAGAIN, which appears to happen commonly historically and
> today, and appears to be what the security module folks would want to
> have happen and you're rejecting. Do you consider that to be a hard
> error? I'm betting not.

People use poll() to avoid -EAGAIN and blocking, they expect the bits
to tell them what fd's they can work on to do real work.

But this whole task_struct based security is bogus from the start.

If I dup a file descriptor for a listening socket, and accept() in the
"wrong" task, the other task has no way to accept() that connection
even if it's security settings allow it.  The connection is lost
forever.

The realm of file descriptors overlaps that of tasks.  Trying to
pretend this isn't the case results in all kinds of crazy things like
we see being attempted here.

I consider TOMOYO conceptually broken in many regards.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Lindahl April 22, 2009, 6:41 a.m. UTC | #26
On Tue, Apr 21, 2009 at 11:34:03PM -0700, David Miller wrote:

> People use poll() to avoid -EAGAIN and blocking, they expect the bits
> to tell them what fd's they can work on to do real work.

My point is that EAGAIN happens already. So you can't claim that
returning it in accept() breaks the interface, when it's common enough
that today's user-level network code already handles it.

I have no opinion about TOMOYO. There are many reasons other than
EAGAIN from accept() to complain about.

-- greg

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 22, 2009, 6:46 a.m. UTC | #27
From: Greg Lindahl <greg@blekko.com>
Date: Tue, 21 Apr 2009 23:41:59 -0700

> On Tue, Apr 21, 2009 at 11:34:03PM -0700, David Miller wrote:
> 
>> People use poll() to avoid -EAGAIN and blocking, they expect the bits
>> to tell them what fd's they can work on to do real work.
> 
> My point is that EAGAIN happens already. So you can't claim that
> returning it in accept() breaks the interface, when it's common enough
> that today's user-level network code already handles it.

EAGAIN does not happen if the application calls poll(), gets
an indication that connections are available, and then
immediately calls accept() on the indicated FD.

It can only happen if it calls accept() multiple times after such a
poll() call _OR_ it does not control multi-threaded access to the
listen file descriptor.

This new behavior from TOMOYO would make accept() return -EAGAIN in
cases which are of no fault of the application.  It is definitely
unexpected behavior.

If overly anal apps "code for it" that is entirely besides the point.
What we have to be concerned for, from a kernel behavioral standpoint,
is that some apps "might not code for it".  This is why we don't
change behavior.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Lindahl April 22, 2009, 6:54 a.m. UTC | #28
On Tue, Apr 21, 2009 at 11:46:18PM -0700, David Miller wrote:

> EAGAIN does not happen if the application calls poll(), gets
> an indication that connections are available, and then
> immediately calls accept() on the indicated FD.

I have observed it recently and historically, and not by calling
accept() repeatedly. I don't know what you mean by "immediately" since I
don't think you're advocating race conditions; the other end can
always exit/reset/whatever between the poll() and the accept().

> If overly anal apps "code for it" that is entirely besides the point.
> What we have to be concerned for, from a kernel behavioral standpoint,
> is that some apps "might not code for it".  This is why we don't
> change behavior.

I am suggesting that you survey actual apps. If you find that they're
all overly anal, then maybe you've learned something about EAGAIN
already happening today. I assure you that the co-worker who stuck in
the "ignore EAGAIN without logging it" only did so because he saw it
fairly frequently. He's that way.

-- greg


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 22, 2009, 6:58 a.m. UTC | #29
From: Greg Lindahl <greg@blekko.com>
Date: Tue, 21 Apr 2009 23:54:43 -0700

> On Tue, Apr 21, 2009 at 11:46:18PM -0700, David Miller wrote:
> 
>> If overly anal apps "code for it" that is entirely besides the point.
>> What we have to be concerned for, from a kernel behavioral standpoint,
>> is that some apps "might not code for it".  This is why we don't
>> change behavior.
> 
> I am suggesting that you survey actual apps. If you find that they're
> all overly anal, then maybe you've learned something about EAGAIN
> already happening today. I assure you that the co-worker who stuck in
> the "ignore EAGAIN without logging it" only did so because he saw it
> fairly frequently. He's that way.

It's likely a bug and should have been reported.

Unexplainable behavior isn't something to ignore.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 22, 2009, 7:19 a.m. UTC | #30
David Miller wrote:
> If I dup a file descriptor for a listening socket, and accept() in the
> "wrong" task, the other task has no way to accept() that connection
> even if it's security settings allow it.  The connection is lost
> forever.
Why the connection gets lost? If two tasks' security settings are the same,
the process whichever reached sock->ops->accept() first will get the connetion.
If two tasks' security settings are not the same, I warned it on the patch
descripption.

> This new behavior from TOMOYO would make accept() return -EAGAIN in
> cases which are of no fault of the application.  It is definitely
> unexpected behavior.
TOMOYO will return -ECONNABORTED, which is also returned by failure of
newsock->ops->getname().

If there were some application which can't handle accept() returning
-ECONNABORTED error, we can simply disable this filtering (by giving such
application permission to accept connection from all addresses).
Applications should be able to handle accept() error other than -EAGAIN.
It is legal to return (for example) -ENOMEM, -EPERM. "man 2 accept" says:

ERRORS
       accept() shall fail if:

       EAGAIN or EWOULDBLOCK
              The socket is marked non-blocking and no connections are present to be accepted.

       EBADF  The descriptor is invalid.

       ECONNABORTED
              A connection has been aborted.

       EINTR  The system call was interrupted by a signal that was caught before a valid connection arrived.

       EINVAL Socket is not listening for connections, or addrlen is invalid (e.g., is negative).

       EMFILE The per-process limit of open file descriptors has been reached.

       ENFILE The system limit on the total number of open files has been reached.

       ENOTSOCK
              The descriptor references a file, not a socket.

       EOPNOTSUPP
              The referenced socket is not of type SOCK_STREAM.

       accept() may fail if:

       EFAULT The addr argument is not in a writable part of the user address space.

       ENOBUFS, ENOMEM
              Not  enough free memory.  This often means that the memory allocation is limited by the socket buffer limits, not by the system memory.

       EPROTO Protocol error.

       Linux accept() may fail if:

       EPERM  Firewall rules forbid connection.

       In addition, network errors for the new socket and as defined for the protocol may be returned. Various  Linux  kernels  can  return
       other errors such as ENOSR, ESOCKTNOSUPPORT, EPROTONOSUPPORT, ETIMEDOUT.  The value ERESTARTSYS may be seen during a trace.

Linux 2.6.7                       2004-06-17                         ACCEPT(2)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 23, 2009, 2 p.m. UTC | #31
David Miller wrote:
> FWIW I do not agree with TOMOYO conceptually.
That will be my fault. I haven't explained you the background of this proposal.
Would you please be patient and read below explanation?

Thanks.
----------
TOMOYO is a security module which focuses on behavior of a system. A process is
created to achieve something. TOMOYO lets each process declare behaviors and
resources needed to achieve its purpose (like an immigration officer) and
permits only declared behaviors and resources (like an operation watchdog).

TOMOYO has an unprecedented concept called "process invocation history" (in
short, PIH). TOMOYO utilizes the PIH for categorizing the purpose of a process.
The PIH is stored into current->cred->security and is defined as concatenation
of program's pathnames ever executed. For example, /sbin/init invoked from the
kernel is defined as "<kernel> /sbin/init", /etc/rc.d/rc.sysinit invoked from
/sbin/init invoked from the kernel is defined as
"<kernel> /sbin/init /etc/rc.d/rc.sysinit". (There are some exceptions, but I
omit explanation because exceptions have no linkage with this proposal.)

**TOMOYO's policy is PIH-driven.** For example,

  <kernel> /sbin/init
  allow_read /etc/inittab

means that any process with PIH "<kernel> /sbin/init" is allowed to open a file
named /etc/inittab for reading.

  <kernel> /usr/sbin/sshd
  allow_create /var/run/sshd.pid

means that any process with PIH "<kernel> /usr/sbin/sshd" is allowed to create
a file named /var/run/sshd.pid .

  <kernel> /usr/sbin/sshd /bin/bash /usr/bin/curl
  allow_network TCP connect 192.168.1.1 80
  allow_network UDP connect 192.168.1.2 53

means that any process with PIH
"<kernel> /usr/sbin/sshd /bin/bash /usr/bin/curl" is allowed to send TCP
connect() requests to 192.168.1.1 port 80 and is allowed to send UDP datagrams
to 192.168.1.2 port 53.

TOMOYO wants to allow writing policy for incoming connections/datagrams in the
same manner. For example,

  <kernel> /usr/sbin/sshd
  allow_network TCP accept 10.0.0.0-10.255.255.255 1024-65535

means that any process with PIH "<kernel> /usr/sbin/sshd" is allowed to pick up
TCP connections from 10.0.0.0/8 port 1024-65535.

To be able to write in the same manner, TOMOYO needs to know the PIH of a
process who is about to pick up the incoming connection/datagram.
The PIH (i.e. current->cred->security) is different from the security context
of a socket which is going to enqueue the incoming connection/datagram
(i.e. "struct sock"->sk_security). And LSM has no hooks which allow TOMOYO to
use current->cred->security for incoming connections/datagrams.

There could be some programs which get confused by accept()/recvmsg() returning
an error when poll() said "connections are ready" or "datagrams are ready".
If we find such programs, we can tell TOMOYO to disable filtering for such
programs.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 23, 2009, 2:10 p.m. UTC | #32
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 23 Apr 2009 23:00:09 +0900

> David Miller wrote:
>> FWIW I do not agree with TOMOYO conceptually.
> That will be my fault. I haven't explained you the background of this proposal.
> Would you please be patient and read below explanation?

Save your typing, I've read all of these descriptions, I still
do not like it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Samir Bellabes April 23, 2009, 2:47 p.m. UTC | #33
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> There could be some programs which get confused by accept()/recvmsg() returning
> an error when poll() said "connections are ready" or "datagrams are ready".
> If we find such programs, we can tell TOMOYO to disable filtering for such
> programs.

Hello Tetsuo,

this will introduce a way to bypass the security system (?)

If TOMOYO won't filter such programs, people may add this "poll()"
feature to their code, in order to escape the security system.

I think it's strange for a security system to allow some programs
because of specific code issue, and not because of security reasons.

sam
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 24, 2009, 2:07 a.m. UTC | #34
David Miller wrote:
> People use poll() to avoid -EAGAIN and blocking, they expect the bits
> to tell them what fd's they can work on to do real work.

I found that "man 2 select" says

  Under Linux, select() may report a socket file descriptor as "ready for
  reading", while nevertheless a subsequent read blocks. This could for example
  happen when data has arrived but upon examination has wrong checksum and is
  discarded. There may be other circumstances in which a file descriptor is
  spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on
  sockets that should not block.

  Linux 2.6.16                   2006-03-11                      SELECT(2)

People cannot use "poll()" to avoid blocking.
Applications had better not to completely depend on what poll() says.

> The current proposal is not acceptable.
You don't like TOMOYO's concept. I see.
But I don't see the reason you can't accept this proposal.
What does this proposal break? Please explain me.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 24, 2009, 4:35 a.m. UTC | #35
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date: Fri, 24 Apr 2009 11:07:28 +0900

> David Miller wrote:
>> People use poll() to avoid -EAGAIN and blocking, they expect the bits
>> to tell them what fd's they can work on to do real work.
> 
> I found that "man 2 select" says
> 
>   Under Linux, select() may report a socket file descriptor as "ready for
>   reading", while nevertheless a subsequent read blocks. This could for example
>   happen when data has arrived but upon examination has wrong checksum and is
>   discarded.

You won't give up will you?  If you're trying to irritate me, it's
working.

This behavior mentioned in that manpage snippet is a BUG which we
FIXED!  You're just proving my point even more!

The poll() paths now cause a bypass of the delayed checksum
verification, which used to cause that above mentioned incorrect
behavior.

Don't push man page crap at me.  Instead, actually understand what the
code does and how it behaves.

This man page snipped above is completely wrong and buggy.  Never
trust documentation over code.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 24, 2009, 4:55 a.m. UTC | #36
David Miller wrote:
> This behavior mentioned in that manpage snippet is a BUG which we
> FIXED!  You're just proving my point even more!
So, people can use poll() to avoid -EAGAIN and blocking. I see.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 24, 2009, 5:26 a.m. UTC | #37
OK. I understood that security_socket_post_recv_datagram() must not return
-EAGAIN if a process calls recvmsg() after poll() said "ready".
That will be also true for security_socket_recvmsg().


Is it OK for security_socket_recvmsg()/security_socket_accept() to return
an error other than -EAGAIN?
(In other words, security_socket_recvmsg()/security_socket_accept() errors are
one of "hard" errors?)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 24, 2009, 11:40 a.m. UTC | #38
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date: Fri, 24 Apr 2009 14:26:24 +0900

> OK. I understood that security_socket_post_recv_datagram() must not return
> -EAGAIN if a process calls recvmsg() after poll() said "ready".
> That will be also true for security_socket_recvmsg().
> 
> 
> Is it OK for security_socket_recvmsg()/security_socket_accept() to return
> an error other than -EAGAIN?
> (In other words, security_socket_recvmsg()/security_socket_accept() errors are
> one of "hard" errors?)

I think you really need to wrap your head around the fact that you
can't decide after you've accepted a packet, that it's no longer
acceptable.  Once it's in the socket's queue, and you tell the
application it's there at poll() time, you simply cannot reneg.

You just can't.

Otherwise you're breaking the whole premise upon which these UNIX
system calls are based.  This is how people use these things.

Are you beginning to understand the fundamental problems I have with
your work?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tetsuo Handa April 24, 2009, 1:57 p.m. UTC | #39
David Miller wrote:
> Are you beginning to understand the fundamental problems I have with
> your work?
You are saying that we must not discard what we have in the queue after we told
somebody that we have something in the queue unless we get "hard" errors.

> If poll() says to a listening socket that connections are
> available, we MUST return a connection from accept() unless
> there is a hard error.
I haven't understood what you meant by a "hard" error. What I want to know is,
are errors returned by security_socket_accept() and security_socket_recvmsg()
"hard" errors?

If these errors are not "hard" errors, security_socket_accept() and
security_socket_recvmsg() must not return an error because it will result in
"not returning a connection/datagram", for you are saying that only "hard"
errors can justify "not returning a connection/datagram".

If these errors are "hard" errors, security_socket_accept() and
security_socket_recvmsg() can justify "not returning a connection/datagram".
But errors by newsock->ops->getname() or move_addr_to_user() (these are also
"hard" errors, aren't they?) justified "discarding a connection/datagram".
Then, if a "hard" error was returned by LSM hooks after a connection/datagram
was dequeued, I think we can justify "discarding that connection/datagram".
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- security-testing-2.6.git.orig/include/linux/security.h
+++ security-testing-2.6.git/include/linux/security.h
@@ -880,6 +880,17 @@  static inline void security_free_mnt_opt
  *	@sock contains the listening socket structure.
  *	@newsock contains the newly created server socket for connection.
  *	Return 0 if permission is granted.
+ * @socket_post_accept:
+ *	This hook allows a security module to filter connections from unwanted
+ *	peers based on the process accepting this connection.
+ *	The connection will be aborted if this hook returns nonzero.
+ *	This hook is not designed for updating security attributes of
+ *	an accept()ed socket, for the accept()ed socket has already sent
+ *	several packets (e.g. TCP's SYN/ACK packet and some ACK packets for
+ *	incoming data) before this hook is called.
+ *	@sock contains the listening socket structure.
+ *	@newsock contains the newly created server socket for connection.
+ *	Return 0 if permission is granted.
  * @socket_sendmsg:
  *	Check permission before transmitting a message to another socket.
  *	@sock contains the socket structure.
@@ -893,6 +904,15 @@  static inline void security_free_mnt_opt
  *	@size contains the size of message structure.
  *	@flags contains the operational flags.
  *	Return 0 if permission is granted.
+ * @socket_post_recv_datagram:
+ *	Check permission after receiving a datagram.
+ *	This hook allows a security module to filter packets
+ *	from unwanted peers based on the process receiving this datagram.
+ *	The packet will be discarded if this hook returns nonzero.
+ *	@sk contains the socket.
+ *	@skb contains the socket buffer.
+ *	@flags contains the operational flags.
+ *	Return 0 if permission is granted.
  * @socket_getsockname:
  *	Check permission before the local address (name) of the socket object
  *	@sock is retrieved.
@@ -1549,10 +1569,13 @@  struct security_operations {
 			       struct sockaddr *address, int addrlen);
 	int (*socket_listen) (struct socket *sock, int backlog);
 	int (*socket_accept) (struct socket *sock, struct socket *newsock);
+	int (*socket_post_accept) (struct socket *sock, struct socket *newsock);
 	int (*socket_sendmsg) (struct socket *sock,
 			       struct msghdr *msg, int size);
 	int (*socket_recvmsg) (struct socket *sock,
 			       struct msghdr *msg, int size, int flags);
+	int (*socket_post_recv_datagram) (struct sock *sk, struct sk_buff *skb,
+					  unsigned int flags);
 	int (*socket_getsockname) (struct socket *sock);
 	int (*socket_getpeername) (struct socket *sock);
 	int (*socket_getsockopt) (struct socket *sock, int level, int optname);
@@ -2530,9 +2553,12 @@  int security_socket_bind(struct socket *
 int security_socket_connect(struct socket *sock, struct sockaddr *address, int addrlen);
 int security_socket_listen(struct socket *sock, int backlog);
 int security_socket_accept(struct socket *sock, struct socket *newsock);
+int security_socket_post_accept(struct socket *sock, struct socket *newsock);
 int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size);
 int security_socket_recvmsg(struct socket *sock, struct msghdr *msg,
 			    int size, int flags);
+int security_socket_post_recv_datagram(struct sock *sk, struct sk_buff *skb,
+				       unsigned int flags);
 int security_socket_getsockname(struct socket *sock);
 int security_socket_getpeername(struct socket *sock);
 int security_socket_getsockopt(struct socket *sock, int level, int optname);
@@ -2608,6 +2634,12 @@  static inline int security_socket_accept
 	return 0;
 }
 
+static inline int security_socket_post_accept(struct socket *sock,
+					      struct socket *newsock)
+{
+	return 0;
+}
+
 static inline int security_socket_sendmsg(struct socket *sock,
 					  struct msghdr *msg, int size)
 {
@@ -2621,6 +2653,13 @@  static inline int security_socket_recvms
 	return 0;
 }
 
+static inline int security_socket_post_recv_datagram(struct sock *sk,
+						     struct sk_buff *skb,
+						     unsigned int flags)
+{
+	return 0;
+}
+
 static inline int security_socket_getsockname(struct socket *sock)
 {
 	return 0;
--- security-testing-2.6.git.orig/net/core/datagram.c
+++ security-testing-2.6.git/net/core/datagram.c
@@ -55,6 +55,7 @@ 
 #include <net/checksum.h>
 #include <net/sock.h>
 #include <net/tcp_states.h>
+#include <linux/security.h>
 
 /*
  *	Is a socket 'connection oriented' ?
@@ -148,6 +149,7 @@  struct sk_buff *__skb_recv_datagram(stru
 {
 	struct sk_buff *skb;
 	long timeo;
+	unsigned long cpu_flags;
 	/*
 	 * Caller is allowed not to check sk->sk_err before skb_recv_datagram()
 	 */
@@ -165,7 +167,6 @@  struct sk_buff *__skb_recv_datagram(stru
 		 * Look at current nfs client by the way...
 		 * However, this function was corrent in any case. 8)
 		 */
-		unsigned long cpu_flags;
 
 		spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
 		skb = skb_peek(&sk->sk_receive_queue);
@@ -179,6 +180,14 @@  struct sk_buff *__skb_recv_datagram(stru
 		}
 		spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
 
+		/* Filter packets from unwanted peers. */
+		if (skb) {
+			error = security_socket_post_recv_datagram(sk, skb,
+								   flags);
+			if (error)
+				goto force_dequeue;
+		}
+
 		if (skb)
 			return skb;
 
@@ -191,6 +200,24 @@  struct sk_buff *__skb_recv_datagram(stru
 
 	return NULL;
 
+force_dequeue:
+	/* Drop this packet because LSM says "Don't pass it to the caller". */
+	if (!(flags & MSG_PEEK))
+		goto no_peek;
+	/*
+	 * If this packet is MSG_PEEK'ed, dequeue it forcibly
+	 * so that this packet won't prevent the caller from picking up
+	 * next packet.
+	 */
+	spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
+	if (skb == skb_peek(&sk->sk_receive_queue)) {
+		__skb_unlink(skb, &sk->sk_receive_queue);
+		atomic_dec(&skb->users);
+		/* Do I have something to do with skb->peeked ? */
+	}
+	spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
+no_peek:
+	kfree_skb(skb);
 no_packet:
 	*err = error;
 	return NULL;
--- security-testing-2.6.git.orig/net/socket.c
+++ security-testing-2.6.git/net/socket.c
@@ -1519,6 +1519,11 @@  SYSCALL_DEFINE4(accept4, int, fd, struct
 	if (err < 0)
 		goto out_fd;
 
+	/* Filter connections from unwanted peers. */
+	err = security_socket_post_accept(sock, newsock);
+	if (err)
+		goto out_fd;
+
 	if (upeer_sockaddr) {
 		if (newsock->ops->getname(newsock, (struct sockaddr *)&address,
 					  &len, 2) < 0) {
--- security-testing-2.6.git.orig/security/capability.c
+++ security-testing-2.6.git/security/capability.c
@@ -620,6 +620,11 @@  static int cap_socket_accept(struct sock
 	return 0;
 }
 
+static int cap_socket_post_accept(struct socket *sock, struct socket *newsock)
+{
+	return 0;
+}
+
 static int cap_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size)
 {
 	return 0;
@@ -631,6 +636,12 @@  static int cap_socket_recvmsg(struct soc
 	return 0;
 }
 
+static int cap_socket_post_recv_datagram(struct sock *sk, struct sk_buff *skb,
+					 unsigned int flags)
+{
+	return 0;
+}
+
 static int cap_socket_getsockname(struct socket *sock)
 {
 	return 0;
@@ -1010,8 +1021,10 @@  void security_fixup_ops(struct security_
 	set_to_cap_if_null(ops, socket_connect);
 	set_to_cap_if_null(ops, socket_listen);
 	set_to_cap_if_null(ops, socket_accept);
+	set_to_cap_if_null(ops, socket_post_accept);
 	set_to_cap_if_null(ops, socket_sendmsg);
 	set_to_cap_if_null(ops, socket_recvmsg);
+	set_to_cap_if_null(ops, socket_post_recv_datagram);
 	set_to_cap_if_null(ops, socket_getsockname);
 	set_to_cap_if_null(ops, socket_getpeername);
 	set_to_cap_if_null(ops, socket_setsockopt);
--- security-testing-2.6.git.orig/security/security.c
+++ security-testing-2.6.git/security/security.c
@@ -1007,6 +1007,11 @@  int security_socket_accept(struct socket
 	return security_ops->socket_accept(sock, newsock);
 }
 
+int security_socket_post_accept(struct socket *sock, struct socket *newsock)
+{
+	return security_ops->socket_post_accept(sock, newsock);
+}
+
 int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size)
 {
 	return security_ops->socket_sendmsg(sock, msg, size);
@@ -1018,6 +1023,12 @@  int security_socket_recvmsg(struct socke
 	return security_ops->socket_recvmsg(sock, msg, size, flags);
 }
 
+int security_socket_post_recv_datagram(struct sock *sk, struct sk_buff *skb,
+				       unsigned int flags)
+{
+	return security_ops->socket_post_recv_datagram(sk, skb, flags);
+}
+
 int security_socket_getsockname(struct socket *sock)
 {
 	return security_ops->socket_getsockname(sock);