Message ID | 1344523833.28967.996.camel@edumazet-glaptop |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
On Thursday, August 09, 2012 04:50:33 PM Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > commit be9f4a44e7d41cee (ipv4: tcp: remove per net tcp_sock) added a > selinux regression, reported and bisected by John Stultz > > selinux_ip_postroute_compat() expect to find a valid sk->sk_security > pointer, but this field is NULL for unicast_sock > > Fix this by adding a new 'kernel' parameter to security_sk_alloc(), > set to true if socket might already have a valid sk->sk_security > pointer. ip_send_unicast_reply() uses a percpu fake socket, so the first > call to security_sk_alloc() will populate sk->sk_security pointer, > subsequent ones will reuse existing context. > > Reported-by: John Stultz <johnstul@us.ibm.com> > Bisected-by: John Stultz <johnstul@us.ibm.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Paul Moore <paul@paul-moore.com> > Cc: Eric Paris <eparis@parisplace.org> > Cc: "Serge E. Hallyn" <serge@hallyn.com> ... > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 76dde25..b233d6e 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -1524,6 +1524,8 @@ void ip_send_unicast_reply(struct net *net, struct > sk_buff *skb, __be32 daddr, sk->sk_priority = skb->priority; > sk->sk_protocol = ip_hdr(skb)->protocol; > sk->sk_bound_dev_if = arg->bound_dev_if; > + if (security_sk_alloc(sk, PF_INET, GFP_ATOMIC, true)) > + goto out; > sock_net_set(sk, net); > __skb_queue_head_init(&sk->sk_write_queue); > sk->sk_sndbuf = sysctl_wmem_default; Is is possible to do the call to security_sk_alloc() in the ip_init() function or does the per-cpu nature of the socket make this a pain?
On Thu, 2012-08-09 at 11:07 -0400, Paul Moore wrote: > Is is possible to do the call to security_sk_alloc() in the ip_init() function > or does the per-cpu nature of the socket make this a pain? > Its a pain, if we want NUMA affinity. Here, each cpu should get memory from its closest node. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 9, 2012 at 11:36 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Thu, 2012-08-09 at 11:07 -0400, Paul Moore wrote: > >> Is is possible to do the call to security_sk_alloc() in the ip_init() function >> or does the per-cpu nature of the socket make this a pain? >> > > Its a pain, if we want NUMA affinity. > > Here, each cpu should get memory from its closest node. Okay, makes sense. Acked-by: Paul Moore <paul@paul-moore.com>
On Thu, Aug 9, 2012 at 11:36 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Thu, 2012-08-09 at 11:07 -0400, Paul Moore wrote: > >> Is is possible to do the call to security_sk_alloc() in the ip_init() function >> or does the per-cpu nature of the socket make this a pain? >> > > Its a pain, if we want NUMA affinity. > > Here, each cpu should get memory from its closest node. I really really don't like it. I won't say NAK, but it is the first and only place in the kernel where I believe we allocate an object and don't allocate the security blob until some random later point in time. If it is such a performance issue to have the security blob in the same numa node, isn't adding a number of branches and putting this function call on every output at least as bad? Aren't we discouraged from GFP_ATOMIC? In __init we can use GFP_KERNEL. This still doesn't fix these sockets entirely. We now have the security blob allocated, but it was never set to something useful. Paul, are you looking into this? This is a bandaide, not a fix.... -Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 9, 2012 at 12:05 PM, Eric Paris <eparis@parisplace.org> wrote:
> Paul, are you looking into this? This is a bandaide, not a fix....
Yep, I mentioned this a few times in the other thread. The problem is
there is not going to be an easy fix for the labeling so I'd rather we
see this patch, or something like it, go in now to resolve the kernel
panic, and fix the labeling later.
On Thu, 2012-08-09 at 12:05 -0400, Eric Paris wrote: > On Thu, Aug 9, 2012 at 11:36 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > On Thu, 2012-08-09 at 11:07 -0400, Paul Moore wrote: > > > >> Is is possible to do the call to security_sk_alloc() in the ip_init() function > >> or does the per-cpu nature of the socket make this a pain? > >> > > > > Its a pain, if we want NUMA affinity. > > > > Here, each cpu should get memory from its closest node. > > I really really don't like it. I won't say NAK, but it is the first > and only place in the kernel where I believe we allocate an object and > don't allocate the security blob until some random later point in > time. ... > If it is such a performance issue to have the security blob in > the same numa node, isn't adding a number of branches and putting this > function call on every output at least as bad? Aren't we discouraged > from GFP_ATOMIC? In __init we can use GFP_KERNEL. What a big deal. Its done _once_ time per cpu, and this is so small blob of memory you'll have to show us one single failure out of one million boots. If the security_sk_alloc() fails, we dont care. We are about sending a RESET or ACK packet. They can be lost by the network, or even skb allocation can fail. Nobody ever noticed and complained. Every time we accept() a new socket (and call security_sk_alloc()), its done under soft irq, thus GFP_ATOMIC, and you didn't complain yet, while a socket needs about 2 Kbytes of memory... > > This still doesn't fix these sockets entirely. We now have the > security blob allocated, but it was never set to something useful. > Paul, are you looking into this? This is a bandaide, not a fix.... > Please do so, on a followup patch, dont pretend I must fix all this stuff. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
NAK. I personally think commit be9f4a44e7d41cee should be reverted until it is fixed. Let me explain what all I believe it broke and how. Old callchain of the creation of the 'equivalent' socket previous to the patch in question just for reference: inet_ctl_sock_create sock_create_kern __sock_create pf->create (inet_create) sk_alloc sk_prot_alloc security_sk_alloc() This WAS working properly. All of it. The equivalent struct sock was being created and allocated in inet_create(), which called to sk_alloc->sk_prot_alloc->security_sk_alloc(). We all agree that failing to call security_sk_alloc() is the first regression introduced. The second regression was the labeling issue. There was a call to security_socket_post_create (from __sock_create) which was properly setting the labels on both the socket and sock. This new patch broke that as well. We don't expose an equivalent security_sock_post_create() interface in the LSM currently, and until we do, this can't be fixed. It's why I say it should be reverted. I have a patch I'm testing right now which takes care of the first part the way I like (and yes, I'm doing the allocation on the correct number node). It basically looks like so: + for_each_possible_cpu(cpu) { + sock = &per_cpu(unicast_sock, cpu); + rc = security_sk_alloc(&sock->sk, PF_INET, GFP_KERNEL, cpu_to_node(cpu)); + if (rc) + return rc; + } I'm going to work right now on exposing the equivalent struct sock LSM interface so we can call that as well. But it's going to take me a bit. Attached is the patch just to (hopefully untested) shut up the panic. -Eric On Thu, Aug 9, 2012 at 10:50 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > From: Eric Dumazet <edumazet@google.com> > > commit be9f4a44e7d41cee (ipv4: tcp: remove per net tcp_sock) added a > selinux regression, reported and bisected by John Stultz > > selinux_ip_postroute_compat() expect to find a valid sk->sk_security > pointer, but this field is NULL for unicast_sock > > Fix this by adding a new 'kernel' parameter to security_sk_alloc(), > set to true if socket might already have a valid sk->sk_security > pointer. ip_send_unicast_reply() uses a percpu fake socket, so the first > call to security_sk_alloc() will populate sk->sk_security pointer, > subsequent ones will reuse existing context. > > Reported-by: John Stultz <johnstul@us.ibm.com> > Bisected-by: John Stultz <johnstul@us.ibm.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Paul Moore <paul@paul-moore.com> > Cc: Eric Paris <eparis@parisplace.org> > Cc: "Serge E. Hallyn" <serge@hallyn.com> > --- > include/linux/security.h | 6 +++--- > net/core/sock.c | 2 +- > net/ipv4/ip_output.c | 4 +++- > security/security.c | 4 ++-- > security/selinux/hooks.c | 5 ++++- > security/smack/smack_lsm.c | 10 ++++++++-- > 6 files changed, 21 insertions(+), 10 deletions(-) > > diff --git a/include/linux/security.h b/include/linux/security.h > index 4e5a73c..4d8e454 100644 > --- a/include/linux/security.h > +++ b/include/linux/security.h > @@ -1601,7 +1601,7 @@ struct security_operations { > int (*socket_sock_rcv_skb) (struct sock *sk, struct sk_buff *skb); > int (*socket_getpeersec_stream) (struct socket *sock, char __user *optval, int __user *optlen, unsigned len); > int (*socket_getpeersec_dgram) (struct socket *sock, struct sk_buff *skb, u32 *secid); > - int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority); > + int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority, bool kernel); > void (*sk_free_security) (struct sock *sk); > void (*sk_clone_security) (const struct sock *sk, struct sock *newsk); > void (*sk_getsecid) (struct sock *sk, u32 *secid); > @@ -2539,7 +2539,7 @@ int security_sock_rcv_skb(struct sock *sk, struct sk_buff *skb); > int security_socket_getpeersec_stream(struct socket *sock, char __user *optval, > int __user *optlen, unsigned len); > int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, u32 *secid); > -int security_sk_alloc(struct sock *sk, int family, gfp_t priority); > +int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool kernel); > void security_sk_free(struct sock *sk); > void security_sk_clone(const struct sock *sk, struct sock *newsk); > void security_sk_classify_flow(struct sock *sk, struct flowi *fl); > @@ -2667,7 +2667,7 @@ static inline int security_socket_getpeersec_dgram(struct socket *sock, struct s > return -ENOPROTOOPT; > } > > -static inline int security_sk_alloc(struct sock *sk, int family, gfp_t priority) > +static inline int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool kernel) > { > return 0; > } > diff --git a/net/core/sock.c b/net/core/sock.c > index 8f67ced..e00cadf 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -1186,7 +1186,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority, > if (sk != NULL) { > kmemcheck_annotate_bitfield(sk, flags); > > - if (security_sk_alloc(sk, family, priority)) > + if (security_sk_alloc(sk, family, priority, false)) > goto out_free; > > if (!try_module_get(prot->owner)) > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 76dde25..b233d6e 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -1524,6 +1524,8 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr, > sk->sk_priority = skb->priority; > sk->sk_protocol = ip_hdr(skb)->protocol; > sk->sk_bound_dev_if = arg->bound_dev_if; > + if (security_sk_alloc(sk, PF_INET, GFP_ATOMIC, true)) > + goto out; > sock_net_set(sk, net); > __skb_queue_head_init(&sk->sk_write_queue); > sk->sk_sndbuf = sysctl_wmem_default; > @@ -1539,7 +1541,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr, > skb_set_queue_mapping(nskb, skb_get_queue_mapping(skb)); > ip_push_pending_frames(sk, &fl4); > } > - > +out: > put_cpu_var(unicast_sock); > > ip_rt_put(rt); > diff --git a/security/security.c b/security/security.c > index 860aeb3..23cf297 100644 > --- a/security/security.c > +++ b/security/security.c > @@ -1146,9 +1146,9 @@ int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, u > } > EXPORT_SYMBOL(security_socket_getpeersec_dgram); > > -int security_sk_alloc(struct sock *sk, int family, gfp_t priority) > +int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool kernel) > { > - return security_ops->sk_alloc_security(sk, family, priority); > + return security_ops->sk_alloc_security(sk, family, priority, kernel); > } > > void security_sk_free(struct sock *sk) > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > index 6c77f63..ccd4374 100644 > --- a/security/selinux/hooks.c > +++ b/security/selinux/hooks.c > @@ -4289,10 +4289,13 @@ out: > return 0; > } > > -static int selinux_sk_alloc_security(struct sock *sk, int family, gfp_t priority) > +static int selinux_sk_alloc_security(struct sock *sk, int family, gfp_t priority, bool kernel) > { > struct sk_security_struct *sksec; > > + if (kernel && sk->sk_security) > + return 0; > + > sksec = kzalloc(sizeof(*sksec), priority); > if (!sksec) > return -ENOMEM; > diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c > index 8221514..0b066d0 100644 > --- a/security/smack/smack_lsm.c > +++ b/security/smack/smack_lsm.c > @@ -1749,20 +1749,26 @@ static void smack_task_to_inode(struct task_struct *p, struct inode *inode) > * @sk: the socket > * @family: unused > * @gfp_flags: memory allocation flags > + * @kernel: true if we should check sk_security being already set > * > * Assign Smack pointers to current > * > * Returns 0 on success, -ENOMEM is there's no memory > */ > -static int smack_sk_alloc_security(struct sock *sk, int family, gfp_t gfp_flags) > +static int smack_sk_alloc_security(struct sock *sk, int family, gfp_t gfp_flags, bool kernel) > { > - char *csp = smk_of_current(); > + char *csp; > struct socket_smack *ssp; > > + if (kernel && sk->sk_security) > + return 0; > + > ssp = kzalloc(sizeof(struct socket_smack), gfp_flags); > if (ssp == NULL) > return -ENOMEM; > > + csp = kernel ? smack_net_ambient : smk_of_current(); > + > ssp->smk_in = csp; > ssp->smk_out = csp; > ssp->smk_packet = NULL; > >
On Thu, Aug 9, 2012 at 4:06 PM, Eric Paris <eparis@parisplace.org> wrote: > I'm going to work right now on exposing the equivalent struct sock LSM > interface so we can call that as well. But it's going to take me a > bit. Before you go too far down this path, can you elaborate on what exactly you mean by the above? I'm asking because I'm not convinced the labeling, either the old way or the new way, was 100% correct and I think we're going to need to change things regardless. I'm just not sure what the right solution is just yet.
diff --git a/include/linux/security.h b/include/linux/security.h index 4e5a73c..4d8e454 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1601,7 +1601,7 @@ struct security_operations { int (*socket_sock_rcv_skb) (struct sock *sk, struct sk_buff *skb); int (*socket_getpeersec_stream) (struct socket *sock, char __user *optval, int __user *optlen, unsigned len); int (*socket_getpeersec_dgram) (struct socket *sock, struct sk_buff *skb, u32 *secid); - int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority); + int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority, bool kernel); void (*sk_free_security) (struct sock *sk); void (*sk_clone_security) (const struct sock *sk, struct sock *newsk); void (*sk_getsecid) (struct sock *sk, u32 *secid); @@ -2539,7 +2539,7 @@ int security_sock_rcv_skb(struct sock *sk, struct sk_buff *skb); int security_socket_getpeersec_stream(struct socket *sock, char __user *optval, int __user *optlen, unsigned len); int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, u32 *secid); -int security_sk_alloc(struct sock *sk, int family, gfp_t priority); +int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool kernel); void security_sk_free(struct sock *sk); void security_sk_clone(const struct sock *sk, struct sock *newsk); void security_sk_classify_flow(struct sock *sk, struct flowi *fl); @@ -2667,7 +2667,7 @@ static inline int security_socket_getpeersec_dgram(struct socket *sock, struct s return -ENOPROTOOPT; } -static inline int security_sk_alloc(struct sock *sk, int family, gfp_t priority) +static inline int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool kernel) { return 0; } diff --git a/net/core/sock.c b/net/core/sock.c index 8f67ced..e00cadf 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1186,7 +1186,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority, if (sk != NULL) { kmemcheck_annotate_bitfield(sk, flags); - if (security_sk_alloc(sk, family, priority)) + if (security_sk_alloc(sk, family, priority, false)) goto out_free; if (!try_module_get(prot->owner)) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 76dde25..b233d6e 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1524,6 +1524,8 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr, sk->sk_priority = skb->priority; sk->sk_protocol = ip_hdr(skb)->protocol; sk->sk_bound_dev_if = arg->bound_dev_if; + if (security_sk_alloc(sk, PF_INET, GFP_ATOMIC, true)) + goto out; sock_net_set(sk, net); __skb_queue_head_init(&sk->sk_write_queue); sk->sk_sndbuf = sysctl_wmem_default; @@ -1539,7 +1541,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr, skb_set_queue_mapping(nskb, skb_get_queue_mapping(skb)); ip_push_pending_frames(sk, &fl4); } - +out: put_cpu_var(unicast_sock); ip_rt_put(rt); diff --git a/security/security.c b/security/security.c index 860aeb3..23cf297 100644 --- a/security/security.c +++ b/security/security.c @@ -1146,9 +1146,9 @@ int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, u } EXPORT_SYMBOL(security_socket_getpeersec_dgram); -int security_sk_alloc(struct sock *sk, int family, gfp_t priority) +int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool kernel) { - return security_ops->sk_alloc_security(sk, family, priority); + return security_ops->sk_alloc_security(sk, family, priority, kernel); } void security_sk_free(struct sock *sk) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 6c77f63..ccd4374 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4289,10 +4289,13 @@ out: return 0; } -static int selinux_sk_alloc_security(struct sock *sk, int family, gfp_t priority) +static int selinux_sk_alloc_security(struct sock *sk, int family, gfp_t priority, bool kernel) { struct sk_security_struct *sksec; + if (kernel && sk->sk_security) + return 0; + sksec = kzalloc(sizeof(*sksec), priority); if (!sksec) return -ENOMEM; diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index 8221514..0b066d0 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -1749,20 +1749,26 @@ static void smack_task_to_inode(struct task_struct *p, struct inode *inode) * @sk: the socket * @family: unused * @gfp_flags: memory allocation flags + * @kernel: true if we should check sk_security being already set * * Assign Smack pointers to current * * Returns 0 on success, -ENOMEM is there's no memory */ -static int smack_sk_alloc_security(struct sock *sk, int family, gfp_t gfp_flags) +static int smack_sk_alloc_security(struct sock *sk, int family, gfp_t gfp_flags, bool kernel) { - char *csp = smk_of_current(); + char *csp; struct socket_smack *ssp; + if (kernel && sk->sk_security) + return 0; + ssp = kzalloc(sizeof(struct socket_smack), gfp_flags); if (ssp == NULL) return -ENOMEM; + csp = kernel ? smack_net_ambient : smk_of_current(); + ssp->smk_in = csp; ssp->smk_out = csp; ssp->smk_packet = NULL;