Patchwork [2/2] net: Implement SO_PASSCGROUP to enable passing cgroup path

login
register
mail settings
Submitter Vivek Goyal
Date April 15, 2014, 9:15 p.m.
Message ID <1397596546-10153-3-git-send-email-vgoyal@redhat.com>
Download mbox | patch
Permalink /patch/339377/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

Vivek Goyal - April 15, 2014, 9:15 p.m.
This patch implements socket option SO_PASSCGROUP along the lines of
SO_PASSCRED.

If SO_PASSCGROUP is set, then recvmsg() will get a control message
SCM_CGROUP which will contain the cgroup path of sender. This cgroup
belongs to first mounted hierarchy in the sytem.

SCM_CGROUP control message can only be received and sender can not send
a SCM_CGROUP message. Kernel automatically generates one if receiver
chooses to receive one.

This works both for unix stream and datagram sockets.

cgroup information is passed only if either the sender or receiver has
SO_PASSCGROUP option set. This means for existing workloads they should
not see any significant performance impact of this change.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/alpha/include/uapi/asm/socket.h   |   1 +
 arch/avr32/include/uapi/asm/socket.h   |   1 +
 arch/cris/include/uapi/asm/socket.h    |   1 +
 arch/frv/include/uapi/asm/socket.h     |   1 +
 arch/ia64/include/uapi/asm/socket.h    |   1 +
 arch/m32r/include/uapi/asm/socket.h    |   1 +
 arch/mips/include/uapi/asm/socket.h    |   1 +
 arch/mn10300/include/uapi/asm/socket.h |   1 +
 arch/parisc/include/uapi/asm/socket.h  |   1 +
 arch/powerpc/include/uapi/asm/socket.h |   1 +
 arch/s390/include/uapi/asm/socket.h    |   1 +
 arch/sparc/include/uapi/asm/socket.h   |   1 +
 arch/xtensa/include/uapi/asm/socket.h  |   1 +
 include/linux/net.h                    |   1 +
 include/linux/socket.h                 |   1 +
 include/net/af_unix.h                  |   1 +
 include/net/scm.h                      |  26 +++++--
 include/uapi/asm-generic/socket.h      |   1 +
 net/core/sock.c                        |   7 ++
 net/unix/af_unix.c                     | 122 +++++++++++++++++++++++++++++++++
 20 files changed, 167 insertions(+), 5 deletions(-)
Andrew Lutomirski - April 15, 2014, 9:53 p.m.
On Tue, Apr 15, 2014 at 2:15 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> This patch implements socket option SO_PASSCGROUP along the lines of
> SO_PASSCRED.
>
> If SO_PASSCGROUP is set, then recvmsg() will get a control message
> SCM_CGROUP which will contain the cgroup path of sender. This cgroup
> belongs to first mounted hierarchy in the sytem.
>
> SCM_CGROUP control message can only be received and sender can not send
> a SCM_CGROUP message. Kernel automatically generates one if receiver
> chooses to receive one.
>
> This works both for unix stream and datagram sockets.
>
> cgroup information is passed only if either the sender or receiver has
> SO_PASSCGROUP option set. This means for existing workloads they should
> not see any significant performance impact of this change.

This is odd.  Shouldn't an SCM_CGROUP cmsg be generated when the
receiver has SO_PASSCGROUP set and the sender passes SCM_CGROUP to
sendmsg?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 15, 2014, 11:09 p.m.
On Tue, 2014-04-15 at 14:53 -0700, Andy Lutomirski wrote:
> On Tue, Apr 15, 2014 at 2:15 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > This patch implements socket option SO_PASSCGROUP along the lines of
> > SO_PASSCRED.
> >
> > If SO_PASSCGROUP is set, then recvmsg() will get a control message
> > SCM_CGROUP which will contain the cgroup path of sender. This cgroup
> > belongs to first mounted hierarchy in the sytem.
> >
> > SCM_CGROUP control message can only be received and sender can not send
> > a SCM_CGROUP message. Kernel automatically generates one if receiver
> > chooses to receive one.
> >
> > This works both for unix stream and datagram sockets.
> >
> > cgroup information is passed only if either the sender or receiver has
> > SO_PASSCGROUP option set. This means for existing workloads they should
> > not see any significant performance impact of this change.
> 
> This is odd.  Shouldn't an SCM_CGROUP cmsg be generated when the
> receiver has SO_PASSCGROUP set and the sender passes SCM_CGROUP to
> sendmsg?

What would be the point ?

Simo.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 16, 2014, 12:20 a.m.
On Tue, Apr 15, 2014 at 02:53:13PM -0700, Andy Lutomirski wrote:
> On Tue, Apr 15, 2014 at 2:15 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > This patch implements socket option SO_PASSCGROUP along the lines of
> > SO_PASSCRED.
> >
> > If SO_PASSCGROUP is set, then recvmsg() will get a control message
> > SCM_CGROUP which will contain the cgroup path of sender. This cgroup
> > belongs to first mounted hierarchy in the sytem.
> >
> > SCM_CGROUP control message can only be received and sender can not send
> > a SCM_CGROUP message. Kernel automatically generates one if receiver
> > chooses to receive one.
> >
> > This works both for unix stream and datagram sockets.
> >
> > cgroup information is passed only if either the sender or receiver has
> > SO_PASSCGROUP option set. This means for existing workloads they should
> > not see any significant performance impact of this change.
> 
> This is odd.  Shouldn't an SCM_CGROUP cmsg be generated when the
> receiver has SO_PASSCGROUP set and the sender passes SCM_CGROUP to
> sendmsg?

How can receiver trust the cgroup info generated by sender. It needs to
be generated by kernel so that receiver can trust it.

And if receiver needs to know cgroup of sender, receiver can just set
SO_PASSCGROUP on socket and receiver should get one SCM_CGROUP message
with each message received.

Thanks
Vivek

> 
> --Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - April 16, 2014, 1:05 a.m.
From: Vivek Goyal <vgoyal@redhat.com>
Date: Tue, 15 Apr 2014 20:20:10 -0400

> On Tue, Apr 15, 2014 at 02:53:13PM -0700, Andy Lutomirski wrote:
>> On Tue, Apr 15, 2014 at 2:15 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > This patch implements socket option SO_PASSCGROUP along the lines of
>> > SO_PASSCRED.
>> >
>> > If SO_PASSCGROUP is set, then recvmsg() will get a control message
>> > SCM_CGROUP which will contain the cgroup path of sender. This cgroup
>> > belongs to first mounted hierarchy in the sytem.
>> >
>> > SCM_CGROUP control message can only be received and sender can not send
>> > a SCM_CGROUP message. Kernel automatically generates one if receiver
>> > chooses to receive one.
>> >
>> > This works both for unix stream and datagram sockets.
>> >
>> > cgroup information is passed only if either the sender or receiver has
>> > SO_PASSCGROUP option set. This means for existing workloads they should
>> > not see any significant performance impact of this change.
>> 
>> This is odd.  Shouldn't an SCM_CGROUP cmsg be generated when the
>> receiver has SO_PASSCGROUP set and the sender passes SCM_CGROUP to
>> sendmsg?
> 
> How can receiver trust the cgroup info generated by sender. It needs to
> be generated by kernel so that receiver can trust it.
> 
> And if receiver needs to know cgroup of sender, receiver can just set
> SO_PASSCGROUP on socket and receiver should get one SCM_CGROUP message
> with each message received.

I completely agree.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 16, 2014, 3:47 a.m.
On Apr 15, 2014 5:20 PM, "Vivek Goyal" <vgoyal@redhat.com> wrote:
>
> On Tue, Apr 15, 2014 at 02:53:13PM -0700, Andy Lutomirski wrote:
> > On Tue, Apr 15, 2014 at 2:15 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > > This patch implements socket option SO_PASSCGROUP along the lines of
> > > SO_PASSCRED.
> > >
> > > If SO_PASSCGROUP is set, then recvmsg() will get a control message
> > > SCM_CGROUP which will contain the cgroup path of sender. This cgroup
> > > belongs to first mounted hierarchy in the sytem.
> > >
> > > SCM_CGROUP control message can only be received and sender can not send
> > > a SCM_CGROUP message. Kernel automatically generates one if receiver
> > > chooses to receive one.
> > >
> > > This works both for unix stream and datagram sockets.
> > >
> > > cgroup information is passed only if either the sender or receiver has
> > > SO_PASSCGROUP option set. This means for existing workloads they should
> > > not see any significant performance impact of this change.
> >
> > This is odd.  Shouldn't an SCM_CGROUP cmsg be generated when the
> > receiver has SO_PASSCGROUP set and the sender passes SCM_CGROUP to
> > sendmsg?
>
> How can receiver trust the cgroup info generated by sender. It needs to
> be generated by kernel so that receiver can trust it.
>
> And if receiver needs to know cgroup of sender, receiver can just set
> SO_PASSCGROUP on socket and receiver should get one SCM_CGROUP message
> with each message received.

I think the kernel should validate the data.

Here's an attack against SO_PEERCGROUP: if you create a container with
a super secret name, then every time you connect to any unix socket,
you leak the name.

Here's an attack against SO_PASSCGROUP, as you implemented it: connect
a socket and get someone else to write(2) to it.  This isn't very
hard.  Now you've impersonated.

I advocate for the following semantics: if sendmsg is passed a
SCM_CGROUP cmsg, and that cmsg has the right cgroup, and the receiver
has SO_PASSCGROUP set, then the receiver gets SCM_CGROUP.  If you try
to lie using SCM_CGROUP, you get -EPERM.  If you set SO_PASSCGROUP,
but your peer doesn't sent SCM_CREDS, you get nothing.

This is immune to both attacks.  It should be cheaper, too, since
there's no overhead for people who don't use it.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 16, 2014, 10:17 a.m.
On Tue, Apr 15, 2014 at 08:47:54PM -0700, Andy Lutomirski wrote:
> On Apr 15, 2014 5:20 PM, "Vivek Goyal" <vgoyal@redhat.com> wrote:
> >
> > On Tue, Apr 15, 2014 at 02:53:13PM -0700, Andy Lutomirski wrote:
> > > On Tue, Apr 15, 2014 at 2:15 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > This patch implements socket option SO_PASSCGROUP along the lines of
> > > > SO_PASSCRED.
> > > >
> > > > If SO_PASSCGROUP is set, then recvmsg() will get a control message
> > > > SCM_CGROUP which will contain the cgroup path of sender. This cgroup
> > > > belongs to first mounted hierarchy in the sytem.
> > > >
> > > > SCM_CGROUP control message can only be received and sender can not send
> > > > a SCM_CGROUP message. Kernel automatically generates one if receiver
> > > > chooses to receive one.
> > > >
> > > > This works both for unix stream and datagram sockets.
> > > >
> > > > cgroup information is passed only if either the sender or receiver has
> > > > SO_PASSCGROUP option set. This means for existing workloads they should
> > > > not see any significant performance impact of this change.
> > >
> > > This is odd.  Shouldn't an SCM_CGROUP cmsg be generated when the
> > > receiver has SO_PASSCGROUP set and the sender passes SCM_CGROUP to
> > > sendmsg?
> >
> > How can receiver trust the cgroup info generated by sender. It needs to
> > be generated by kernel so that receiver can trust it.
> >
> > And if receiver needs to know cgroup of sender, receiver can just set
> > SO_PASSCGROUP on socket and receiver should get one SCM_CGROUP message
> > with each message received.
> 
> I think the kernel should validate the data.
> 
> Here's an attack against SO_PEERCGROUP: if you create a container with
> a super secret name, then every time you connect to any unix socket,
> you leak the name.

One should be able to do that already today with SO_PASSCRED option and
then map pid to cgroup. Or if one is using user namespaces then go
through uid mappings and  figure out which container sent message.

> 
> Here's an attack against SO_PASSCGROUP, as you implemented it: connect
> a socket and get someone else to write(2) to it.  This isn't very
> hard.  Now you've impersonated.

If you can get another process to write to your socket and impersonate,
then what will stop from that process to also send SCM_CGROUP message
also? So I don't see how SCM_CGROUP from client will solve this problem.

Kernel cgroup verification will also not help in this case as sender
is sending his own cgroup.

> 
> I advocate for the following semantics: if sendmsg is passed a
> SCM_CGROUP cmsg, and that cmsg has the right cgroup, and the receiver
> has SO_PASSCGROUP set, then the receiver gets SCM_CGROUP.  If you try
> to lie using SCM_CGROUP, you get -EPERM.  If you set SO_PASSCGROUP,
> but your peer doesn't sent SCM_CREDS, you get nothing.
> 
> This is immune to both attacks.  It should be cheaper, too, since
> there's no overhead for people who don't use it.

I think you seem to be saying that a client's credentials should not be
visible to receiver until and unless client himself wants to reveal
those. IOW, it kind of looks like an anonymous mode of operation where
client connects to a socket but receiver client not want to reveal any of
the information about itself to receiver.

I am not sure how useful that mode really is. If it is really useful, I
think one could implement another socket option on client side to
deny passing cgroup information to receiver. Say SO_NOPASSCGROUP.

Before we even get there, I will question that what's so secret about
cgroup information that one would like to hide it from receiver. We don't
hide uid, pid, gid. 

Secondly, how would client know when to send SCM_CGROUP to receiver. For
the use case I mentioned that init wants to log cgroup of every message
going into journal. How would client know that every message needs to
have SCM_CGROUP. By automatically getting client information when receiver
needs it, simplifies the things a lot without any client modificaiton.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - April 16, 2014, 12:57 p.m.
Please, just stop.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 16, 2014, 2:34 p.m.
On Wed, Apr 16, 2014 at 3:17 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 15, 2014 at 08:47:54PM -0700, Andy Lutomirski wrote:
>> On Apr 15, 2014 5:20 PM, "Vivek Goyal" <vgoyal@redhat.com> wrote:
>> >
>> > On Tue, Apr 15, 2014 at 02:53:13PM -0700, Andy Lutomirski wrote:
>> > > On Tue, Apr 15, 2014 at 2:15 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > > > This patch implements socket option SO_PASSCGROUP along the lines of
>> > > > SO_PASSCRED.
>> > > >
>> > > > If SO_PASSCGROUP is set, then recvmsg() will get a control message
>> > > > SCM_CGROUP which will contain the cgroup path of sender. This cgroup
>> > > > belongs to first mounted hierarchy in the sytem.
>> > > >
>> > > > SCM_CGROUP control message can only be received and sender can not send
>> > > > a SCM_CGROUP message. Kernel automatically generates one if receiver
>> > > > chooses to receive one.
>> > > >
>> > > > This works both for unix stream and datagram sockets.
>> > > >
>> > > > cgroup information is passed only if either the sender or receiver has
>> > > > SO_PASSCGROUP option set. This means for existing workloads they should
>> > > > not see any significant performance impact of this change.
>> > >
>> > > This is odd.  Shouldn't an SCM_CGROUP cmsg be generated when the
>> > > receiver has SO_PASSCGROUP set and the sender passes SCM_CGROUP to
>> > > sendmsg?
>> >
>> > How can receiver trust the cgroup info generated by sender. It needs to
>> > be generated by kernel so that receiver can trust it.
>> >
>> > And if receiver needs to know cgroup of sender, receiver can just set
>> > SO_PASSCGROUP on socket and receiver should get one SCM_CGROUP message
>> > with each message received.
>>
>> I think the kernel should validate the data.
>>
>> Here's an attack against SO_PEERCGROUP: if you create a container with
>> a super secret name, then every time you connect to any unix socket,
>> you leak the name.
>
> One should be able to do that already today with SO_PASSCRED option and
> then map pid to cgroup. Or if one is using user namespaces then go
> through uid mappings and  figure out which container sent message.

Not if you've locked down proc, perhaps by using hidepid.

>
>>
>> Here's an attack against SO_PASSCGROUP, as you implemented it: connect
>> a socket and get someone else to write(2) to it.  This isn't very
>> hard.  Now you've impersonated.
>
> If you can get another process to write to your socket and impersonate,
> then what will stop from that process to also send SCM_CGROUP message
> also? So I don't see how SCM_CGROUP from client will solve this problem.
>

I can easily get other processed to write to my socket.  Passing that
socket as stderr to a setuid program is probably the easiest way.
Finding a service that accepts a socket using SCM_RIGHTS and writes to
it is another.  It is supposed to be safe to write(2) to an untrusted
file descriptor, or, at the very least, it is supposed to be a DoS at
worst.  In this case, it's also either an information leak.

It's true that SO_PASSCRED has the same problem.  I consider that to
be a mistake, and I suspect that there are a large number of
longstanding security problems caused by it.  Regardless, we shouldn't
exacerbate this problem.  There is no legacy code using SCM_CGROUP at
all right now, because the option has never been in a released kernel.
 So let's get the interface right the first time around.

If I find some time later today, I can try to write a variant of the
patch that only sends SCM_CGROUP when the sender requests it.

On an unrelated note, what happens when there are multiple cgroup hierarchies?

> Kernel cgroup verification will also not help in this case as sender
> is sending his own cgroup.

Sure it will -- the sender sticks a string into SCM_CGROUP and the
kernel checks it.

>
>>
>> I advocate for the following semantics: if sendmsg is passed a
>> SCM_CGROUP cmsg, and that cmsg has the right cgroup, and the receiver
>> has SO_PASSCGROUP set, then the receiver gets SCM_CGROUP.  If you try
>> to lie using SCM_CGROUP, you get -EPERM.  If you set SO_PASSCGROUP,
>> but your peer doesn't sent SCM_CREDS, you get nothing.
>>
>> This is immune to both attacks.  It should be cheaper, too, since
>> there's no overhead for people who don't use it.
>
> I think you seem to be saying that a client's credentials should not be
> visible to receiver until and unless client himself wants to reveal
> those. IOW, it kind of looks like an anonymous mode of operation where
> client connects to a socket but receiver client not want to reveal any of
> the information about itself to receiver.
>
> I am not sure how useful that mode really is. If it is really useful, I
> think one could implement another socket option on client side to
> deny passing cgroup information to receiver. Say SO_NOPASSCGROUP.

This won't help -- an attacker will simply not set that option, and
the program being attacked is certainly not going to set
SO_NOPASSCGROUP right before calling write.

>
> Before we even get there, I will question that what's so secret about
> cgroup information that one would like to hide it from receiver. We don't
> hide uid, pid, gid.
>

I think we should hide uid, gid, and pid too, but that ship has sailed.

The bigger issue isn't hiding so much as accidental assertions of
authority.  If a program accidentally leaks its uid, that's one thing.
 If a program, by accidentally leaking its uid, causes another program
to think that the sender wants some action to be taken on behalf of
its uid, then there's a real security problem.

> Secondly, how would client know when to send SCM_CGROUP to receiver. For
> the use case I mentioned that init wants to log cgroup of every message
> going into journal. How would client know that every message needs to
> have SCM_CGROUP. By automatically getting client information when receiver
> needs it, simplifies the things a lot without any client modificaiton.

I think that the client *should* be modified.  What if there's an
existing program that runs as a container's root but does not intend
to sign off on a message with its cgroup?

In any event, I still think that the journald case has no need for any
kernel changes at all.  From a very cursory inspection, the journal
code expects to find a socket in /run/systemd/journal/socket.  It
should be enough to stick a different socket into that location in
each container.  This will work on all kernels and may even work
without modifying any code in the container.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 16, 2014, 2:37 p.m.
On Wed, Apr 16, 2014 at 5:57 AM, David Miller <davem@davemloft.net> wrote:
>
> Please, just stop.

No.

This thread is proposing an ABI.  This means that, if the ABI ends up
in Linus's kernel, then it has to be supported forever.  Now is the
time to find and fix any issues with it before they become much harder
to fix.

This ABI is especially tricky because programs will use it even if
they don't explicitly try to.  So just adding the ABI may break
existing assumptions that are relevant to security or correctness.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 16, 2014, 3:10 p.m.
On Wed, Apr 16, 2014 at 07:34:41AM -0700, Andy Lutomirski wrote:
> On Wed, Apr 16, 2014 at 3:17 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Tue, Apr 15, 2014 at 08:47:54PM -0700, Andy Lutomirski wrote:
> >> On Apr 15, 2014 5:20 PM, "Vivek Goyal" <vgoyal@redhat.com> wrote:
> >> >
> >> > On Tue, Apr 15, 2014 at 02:53:13PM -0700, Andy Lutomirski wrote:
> >> > > On Tue, Apr 15, 2014 at 2:15 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > > > This patch implements socket option SO_PASSCGROUP along the lines of
> >> > > > SO_PASSCRED.
> >> > > >
> >> > > > If SO_PASSCGROUP is set, then recvmsg() will get a control message
> >> > > > SCM_CGROUP which will contain the cgroup path of sender. This cgroup
> >> > > > belongs to first mounted hierarchy in the sytem.
> >> > > >
> >> > > > SCM_CGROUP control message can only be received and sender can not send
> >> > > > a SCM_CGROUP message. Kernel automatically generates one if receiver
> >> > > > chooses to receive one.
> >> > > >
> >> > > > This works both for unix stream and datagram sockets.
> >> > > >
> >> > > > cgroup information is passed only if either the sender or receiver has
> >> > > > SO_PASSCGROUP option set. This means for existing workloads they should
> >> > > > not see any significant performance impact of this change.
> >> > >
> >> > > This is odd.  Shouldn't an SCM_CGROUP cmsg be generated when the
> >> > > receiver has SO_PASSCGROUP set and the sender passes SCM_CGROUP to
> >> > > sendmsg?
> >> >
> >> > How can receiver trust the cgroup info generated by sender. It needs to
> >> > be generated by kernel so that receiver can trust it.
> >> >
> >> > And if receiver needs to know cgroup of sender, receiver can just set
> >> > SO_PASSCGROUP on socket and receiver should get one SCM_CGROUP message
> >> > with each message received.
> >>
> >> I think the kernel should validate the data.
> >>
> >> Here's an attack against SO_PEERCGROUP: if you create a container with
> >> a super secret name, then every time you connect to any unix socket,
> >> you leak the name.
> >
> > One should be able to do that already today with SO_PASSCRED option and
> > then map pid to cgroup. Or if one is using user namespaces then go
> > through uid mappings and  figure out which container sent message.
> 
> Not if you've locked down proc, perhaps by using hidepid.
> 

I think before we dive into smaller details lets take a step back. Core of
your argument seems to be that exposing cgroup information of sender to
receiver is a security risk. Hence only sender should decide when to
send that information and when not to.

I find several issues here.

- Why exposing cgroup information of sender is a security risk?

- How would a sender decide when to send SCM_CGROUP info and when not to.

- Additional higher level protocols need to be established now between
  sender and receiver so that they agree upon that cgroup information need
  to be sent. This will just make the implementation complicated and 
  this should be done only if benefits outweigh the cons.  

> >
> >>
> >> Here's an attack against SO_PASSCGROUP, as you implemented it: connect
> >> a socket and get someone else to write(2) to it.  This isn't very
> >> hard.  Now you've impersonated.
> >
> > If you can get another process to write to your socket and impersonate,
> > then what will stop from that process to also send SCM_CGROUP message
> > also? So I don't see how SCM_CGROUP from client will solve this problem.
> >
> 
> I can easily get other processed to write to my socket.  Passing that
> socket as stderr to a setuid program is probably the easiest way.
> Finding a service that accepts a socket using SCM_RIGHTS and writes to
> it is another.  It is supposed to be safe to write(2) to an untrusted
> file descriptor, or, at the very least, it is supposed to be a DoS at
> worst.  In this case, it's also either an information leak.

Hold on, so what's the problem here.

- First of all, if you opened the connection SO_PEERCGROUP will still give
  the right information to receiver. Does not matter if later you got some
  priviliged process to write to that socket descriptor.

- Now if you passed fd to a privliged process as stderr, it is not asking
  for any services. And if it does, then there is a design issue on that
  privliged service side. 

I really don't see any issue here.

> 
> It's true that SO_PASSCRED has the same problem.  I consider that to
> be a mistake, and I suspect that there are a large number of
> longstanding security problems caused by it.  Regardless, we shouldn't
> exacerbate this problem.  There is no legacy code using SCM_CGROUP at
> all right now, because the option has never been in a released kernel.
>  So let's get the interface right the first time around.
> 
> If I find some time later today, I can try to write a variant of the
> patch that only sends SCM_CGROUP when the sender requests it.

I don't think implementation is the issue here. You need to provide a
stronger argument that why passing cgroup information is a security
risk.

> 
> On an unrelated note, what happens when there are multiple cgroup hierarchies?
> 

People are moving away from multiple cgroup hierarchiy model. In future a
single hiearchy is planned and multiple hierarchy will remain there only
for backward compatibility.

Passing cgroup of task in every hierarchy becomes too expensive. (4K per
hierarchy). And its  not clear how many hierarchies are present in the
system.

So there is not much point in passing cgroup of task in every hierarchy as
it is slowly being phased out.


> > Kernel cgroup verification will also not help in this case as sender
> > is sending his own cgroup.
> 
> Sure it will -- the sender sticks a string into SCM_CGROUP and the
> kernel checks it.

Kernel will check whether sender is sending right cgroup information or
not. So in your example, if you passed fd to a privliged process and
that process sends an SCM_CGROUP message, then kernel will just let
it go. SCM_CGROUP contains the cgroup of priviliged process and that's
the right information. 

If a priviliged process is asking for a service on a fd passed by 
unpriviliged process, that itself is a risk. And that risk will not
be mitigated by forcing the usage of SCM_CGROUP. 

> 
> >
> >>
> >> I advocate for the following semantics: if sendmsg is passed a
> >> SCM_CGROUP cmsg, and that cmsg has the right cgroup, and the receiver
> >> has SO_PASSCGROUP set, then the receiver gets SCM_CGROUP.  If you try
> >> to lie using SCM_CGROUP, you get -EPERM.  If you set SO_PASSCGROUP,
> >> but your peer doesn't sent SCM_CREDS, you get nothing.
> >>
> >> This is immune to both attacks.  It should be cheaper, too, since
> >> there's no overhead for people who don't use it.
> >
> > I think you seem to be saying that a client's credentials should not be
> > visible to receiver until and unless client himself wants to reveal
> > those. IOW, it kind of looks like an anonymous mode of operation where
> > client connects to a socket but receiver client not want to reveal any of
> > the information about itself to receiver.
> >
> > I am not sure how useful that mode really is. If it is really useful, I
> > think one could implement another socket option on client side to
> > deny passing cgroup information to receiver. Say SO_NOPASSCGROUP.
> 
> This won't help -- an attacker will simply not set that option, and
> the program being attacked is certainly not going to set
> SO_NOPASSCGROUP right before calling write.

This was for the case where you gave example of "super secret container"
which did not want to reveal its identity to receiver. 

For the case of priviliged process asking for service on an fd passed
by unpriviliged process, SCM_CGROUP will not help you either. So it is
a bad idea to begin with.


> 
> >
> > Before we even get there, I will question that what's so secret about
> > cgroup information that one would like to hide it from receiver. We don't
> > hide uid, pid, gid.
> >
> 
> I think we should hide uid, gid, and pid too, but that ship has sailed.
> 
> The bigger issue isn't hiding so much as accidental assertions of
> authority.  If a program accidentally leaks its uid, that's one thing.
>  If a program, by accidentally leaking its uid, causes another program
> to think that the sender wants some action to be taken on behalf of
> its uid, then there's a real security problem.
> 
> > Secondly, how would client know when to send SCM_CGROUP to receiver. For
> > the use case I mentioned that init wants to log cgroup of every message
> > going into journal. How would client know that every message needs to
> > have SCM_CGROUP. By automatically getting client information when receiver
> > needs it, simplifies the things a lot without any client modificaiton.
> 
> I think that the client *should* be modified.  What if there's an
> existing program that runs as a container's root but does not intend
> to sign off on a message with its cgroup?

That goes back to my previous question. Why revealing cgroup of sender
is a problem.

And in case we agree that it is a problem SO_NOPASSCGROUP will solve
that problem.

> 
> In any event, I still think that the journald case has no need for any
> kernel changes at all.  From a very cursory inspection, the journal
> code expects to find a socket in /run/systemd/journal/socket.  It
> should be enough to stick a different socket into that location in
> each container.  This will work on all kernels and may even work
> without modifying any code in the container.

This is a little generic discussion. No containers here. Think of various
services running in their own cgroups and logging messages into journal.

And that's only one use case.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 16, 2014, 4:13 p.m.
On Wed, 2014-04-16 at 07:37 -0700, Andy Lutomirski wrote:
> On Wed, Apr 16, 2014 at 5:57 AM, David Miller <davem@davemloft.net> wrote:
> >
> > Please, just stop.
> 
> No.
> 
> This thread is proposing an ABI.  This means that, if the ABI ends up
> in Linus's kernel, then it has to be supported forever.  Now is the
> time to find and fix any issues with it before they become much harder
> to fix.

Ok, but so far I haven't seen a single objection from you that has solid
grounds.

The only one that *may* be reasonable is the "secret" cgroup name one,
however nobody seem to come up with a reason why it is legitimate to
allow to keep cgroup names secret.

And if you can come up with such a good reason the SO_NOPASSCGROUP
option seem the right solution.

> This ABI is especially tricky because programs will use it even if
> they don't explicitly try to.  So just adding the ABI may break
> existing assumptions that are relevant to security or correctness.

It's not clear to me what you mean by this, either you explicitly use
SO_PASSCGROUP or not, it's not like you can involuntarily add a flag ...

Simo.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - April 16, 2014, 4:21 p.m.
Hello,

On Wed, Apr 16, 2014 at 12:13:57PM -0400, Simo Sorce wrote:
> The only one that *may* be reasonable is the "secret" cgroup name one,
> however nobody seem to come up with a reason why it is legitimate to
> allow to keep cgroup names secret.

Ugh, please don't play security games with cgroup names.  It is one of
the identifying properties of a task, like a pid, and will be used in
other parts of the kernel to match groups of tasks.  If we play
security peekaboo with cgroup names, it has to be transitive and puts
burdens on all its future uses.  Unless there are *REALLY* strong
rationales, which can also justify hiding pids, this isn't happening.

Thanks.
Andrew Lutomirski - April 16, 2014, 4:31 p.m.
On Wed, Apr 16, 2014 at 9:13 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Wed, 2014-04-16 at 07:37 -0700, Andy Lutomirski wrote:
>> On Wed, Apr 16, 2014 at 5:57 AM, David Miller <davem@davemloft.net> wrote:
>> >
>> > Please, just stop.
>>
>> No.
>>
>> This thread is proposing an ABI.  This means that, if the ABI ends up
>> in Linus's kernel, then it has to be supported forever.  Now is the
>> time to find and fix any issues with it before they become much harder
>> to fix.
>
> Ok, but so far I haven't seen a single objection from you that has solid
> grounds.

CVE-2013-1959 was caused by a new kernel feature causing a call to
write(2) to behave as though the caller was authenticating itself to
something else where, in previous kernels, write(2) did not
authenticate.

Admittedly cgroups aren't currently as important as uid, but if this
changes, then SO_PASSCGROUP, as currently written, will have *exactly*
the same problem.

>
> The only one that *may* be reasonable is the "secret" cgroup name one,
> however nobody seem to come up with a reason why it is legitimate to
> allow to keep cgroup names secret.
>
> And if you can come up with such a good reason the SO_NOPASSCGROUP
> option seem the right solution.
>
>> This ABI is especially tricky because programs will use it even if
>> they don't explicitly try to.  So just adding the ABI may break
>> existing assumptions that are relevant to security or correctness.
>
> It's not clear to me what you mean by this, either you explicitly use
> SO_PASSCGROUP or not, it's not like you can involuntarily add a flag ...
>

The issue here is that the receiver sets SO_(PASS|PEER)CGROUP, forcing
the sender to identify or authenticate itself.  The sender might not
want to identify itself.  Even if you don't buy any secrecy arguments,
the sender might not intend to authenticate.  Certainly no existing
callers of connect or write intend to authenticate using their cgroup,
since current kernels don't have those semantics.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 16, 2014, 4:54 p.m.
On Wed, 2014-04-16 at 12:21 -0400, Tejun Heo wrote:
> Hello,
> 
> On Wed, Apr 16, 2014 at 12:13:57PM -0400, Simo Sorce wrote:
> > The only one that *may* be reasonable is the "secret" cgroup name one,
> > however nobody seem to come up with a reason why it is legitimate to
> > allow to keep cgroup names secret.
> 
> Ugh, please don't play security games with cgroup names.  It is one of
> the identifying properties of a task, like a pid, and will be used in
> other parts of the kernel to match groups of tasks.  If we play
> security peekaboo with cgroup names, it has to be transitive and puts
> burdens on all its future uses.  Unless there are *REALLY* strong
> rationales, which can also justify hiding pids, this isn't happening.

FWIW, I totally agree with you, it's Andy Lutomirski that is coming up
with this "secret" cgropus name idea, nobody else (so far) seem to agree
it makes sense.

Simo.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 16, 2014, 5:02 p.m.
On Wed, 2014-04-16 at 09:31 -0700, Andy Lutomirski wrote:
> On Wed, Apr 16, 2014 at 9:13 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > On Wed, 2014-04-16 at 07:37 -0700, Andy Lutomirski wrote:
> >> On Wed, Apr 16, 2014 at 5:57 AM, David Miller <davem@davemloft.net> wrote:
> >> >
> >> > Please, just stop.
> >>
> >> No.
> >>
> >> This thread is proposing an ABI.  This means that, if the ABI ends up
> >> in Linus's kernel, then it has to be supported forever.  Now is the
> >> time to find and fix any issues with it before they become much harder
> >> to fix.
> >
> > Ok, but so far I haven't seen a single objection from you that has solid
> > grounds.
> 
> CVE-2013-1959 was caused by a new kernel feature causing a call to
> write(2) to behave as though the caller was authenticating itself to
> something else where, in previous kernels, write(2) did not
> authenticate.
> 
> Admittedly cgroups aren't currently as important as uid, but if this
> changes, then SO_PASSCGROUP, as currently written, will have *exactly*
> the same problem.

Which is easy to foil by using SO_PEERCGROUP and find out who originally
opened the socket, which is why that is also available!

> > The only one that *may* be reasonable is the "secret" cgroup name one,
> > however nobody seem to come up with a reason why it is legitimate to
> > allow to keep cgroup names secret.
> >
> > And if you can come up with such a good reason the SO_NOPASSCGROUP
> > option seem the right solution.
> >
> >> This ABI is especially tricky because programs will use it even if
> >> they don't explicitly try to.  So just adding the ABI may break
> >> existing assumptions that are relevant to security or correctness.
> >
> > It's not clear to me what you mean by this, either you explicitly use
> > SO_PASSCGROUP or not, it's not like you can involuntarily add a flag ...
> >
> 
> The issue here is that the receiver sets SO_(PASS|PEER)CGROUP, forcing
> the sender to identify or authenticate itself.  The sender might not
> want to identify itself.

You need to explain, why, on a host, when an application connects to
another, it is ok to make it anonymous. If you do not want to expose
yourself, do not talk to other applications in the first place.

>   Even if you don't buy any secrecy arguments,

I don't.

> the sender might not intend to authenticate.  Certainly no existing
> callers of connect or write intend to authenticate using their cgroup,
> since current kernels don't have those semantics.

This is a passive check, it's the receiver that wants to make decisions
about who is connecting, again if you do not want to "disclose"
information do not connect in the first place ?

In the rare case where you may have a legitimate reason to conceal a
process, it will be very simply for the admin to set up a proxy process
that will just terminate the original sender's identity and present only
the proxy identity to the receiver and perform message passing between 2
sockets.

So I rest my case.

Simo.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 16, 2014, 5:29 p.m.
On Wed, Apr 16, 2014 at 10:02 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Wed, 2014-04-16 at 09:31 -0700, Andy Lutomirski wrote:
>> On Wed, Apr 16, 2014 at 9:13 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> > On Wed, 2014-04-16 at 07:37 -0700, Andy Lutomirski wrote:
>> >> On Wed, Apr 16, 2014 at 5:57 AM, David Miller <davem@davemloft.net> wrote:
>> >> >
>> >> > Please, just stop.
>> >>
>> >> No.
>> >>
>> >> This thread is proposing an ABI.  This means that, if the ABI ends up
>> >> in Linus's kernel, then it has to be supported forever.  Now is the
>> >> time to find and fix any issues with it before they become much harder
>> >> to fix.
>> >
>> > Ok, but so far I haven't seen a single objection from you that has solid
>> > grounds.
>>
>> CVE-2013-1959 was caused by a new kernel feature causing a call to
>> write(2) to behave as though the caller was authenticating itself to
>> something else where, in previous kernels, write(2) did not
>> authenticate.
>>
>> Admittedly cgroups aren't currently as important as uid, but if this
>> changes, then SO_PASSCGROUP, as currently written, will have *exactly*
>> the same problem.
>
> Which is easy to foil by using SO_PEERCGROUP and find out who originally
> opened the socket, which is why that is also available!

Then please remove SO_PASSCGROUP.

>> The issue here is that the receiver sets SO_(PASS|PEER)CGROUP, forcing
>> the sender to identify or authenticate itself.  The sender might not
>> want to identify itself.
>
> You need to explain, why, on a host, when an application connects to
> another, it is ok to make it anonymous. If you do not want to expose
> yourself, do not talk to other applications in the first place.

Why is anonymity harmful here?  I don't have a great argument off the
top of my head for why it's necessary, but neither do I see why it's
bad.  And I think a lack of anonymity is asking for trouble (see
below).

>
>> the sender might not intend to authenticate.  Certainly no existing
>> callers of connect or write intend to authenticate using their cgroup,
>> since current kernels don't have those semantics.
>
> This is a passive check, it's the receiver that wants to make decisions
> about who is connecting, again if you do not want to "disclose"
> information do not connect in the first place ?
>

Let's say I have a receiver.  I'll call it journald, since I think
that's one of the eventual use cases.

Let's also say I some random program on my box.  It is willing to
answer requests on behalf of anyone else with the same uid, and it
will happily open a given unix socket and send the requester the file
descriptor.  Such a program is a bit odd, but it's perfectly safe
right now.

Now add a malicious program into the mix.  It asks the daemon to open
/run/systemd/journal/socket.  Then it starts writing to the fd.

Whoops, now the malicious program can impersonate the helper.  This
happens because SO_PEERCGROUP (and journald's use of SO_PEERCGROUP)
caused connect(2) to produce a descriptor that carries a permission
that the descriptor did not carry in the past, and because the caller
of connect(2) did not need to opt in to the new behavior.

I still haven't seen any explanation for what's wrong with requiring
senders to ask the kernel to transmit their cgroup.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 16, 2014, 5:34 p.m.
On Wed, 2014-04-16 at 10:29 -0700, Andy Lutomirski wrote:
> Then please remove SO_PASSCGROUP.

Can you stop demanding changes while demonstrating you haven't well
understood the needs, let alone the consequences ?

Take a day or 2, put your thoughts together and come back with a clear
description of your concerns if you still have any, please.

You clearly do not like this proposal, but you can't really articulate
why, all I hear at this point are the scratches of nails on the glass
wall you are trying to climb.

Seriously, if there are reasonable objections we all want to hear them,
but they need to be clear and circumstanced.

Simo.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 16, 2014, 5:53 p.m.
On Wed, Apr 16, 2014 at 10:34 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Wed, 2014-04-16 at 10:29 -0700, Andy Lutomirski wrote:
>> Then please remove SO_PASSCGROUP.
>
> Can you stop demanding changes while demonstrating you haven't well
> understood the needs, let alone the consequences ?
>
> Take a day or 2, put your thoughts together and come back with a clear
> description of your concerns if you still have any, please.

OK, if one of you can give a description of what the needs are.  So
far, I've heard:

SSSD, but there was already a longish thread on why SSSD could solve
all of its problems without any kernel changes at all.  I didn't see a
compelling answer there.

journald for things in containers, but this doesn't seem to me to
require any help from the kernel.

journald for things not in containers.  This one is legit, although it
can be solved, albeit with some races, in current kernels.  Is this
the only example?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 16, 2014, 6:06 p.m.
On Wed, Apr 16, 2014 at 09:31:25AM -0700, Andy Lutomirski wrote:
> On Wed, Apr 16, 2014 at 9:13 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > On Wed, 2014-04-16 at 07:37 -0700, Andy Lutomirski wrote:
> >> On Wed, Apr 16, 2014 at 5:57 AM, David Miller <davem@davemloft.net> wrote:
> >> >
> >> > Please, just stop.
> >>
> >> No.
> >>
> >> This thread is proposing an ABI.  This means that, if the ABI ends up
> >> in Linus's kernel, then it has to be supported forever.  Now is the
> >> time to find and fix any issues with it before they become much harder
> >> to fix.
> >
> > Ok, but so far I haven't seen a single objection from you that has solid
> > grounds.
> 
> CVE-2013-1959 was caused by a new kernel feature causing a call to
> write(2) to behave as though the caller was authenticating itself to
> something else where, in previous kernels, write(2) did not
> authenticate.
> 
> Admittedly cgroups aren't currently as important as uid, but if this
> changes, then SO_PASSCGROUP, as currently written, will have *exactly*
> the same problem.

I am not sure how same issue with happen with cgroups. In the case of
socket example, you are forcing a setuid program to write to standard
output and that setuid program will run in same cgroup as caller and
will have same cgroup as caller. So even if somebody was using cgroup
information for authentication, atleast in this particular case it
will not be a problem. Both unpriviliged and priviliged programs has
same cgroups.

> 
> >
> > The only one that *may* be reasonable is the "secret" cgroup name one,
> > however nobody seem to come up with a reason why it is legitimate to
> > allow to keep cgroup names secret.
> >
> > And if you can come up with such a good reason the SO_NOPASSCGROUP
> > option seem the right solution.
> >
> >> This ABI is especially tricky because programs will use it even if
> >> they don't explicitly try to.  So just adding the ABI may break
> >> existing assumptions that are relevant to security or correctness.
> >
> > It's not clear to me what you mean by this, either you explicitly use
> > SO_PASSCGROUP or not, it's not like you can involuntarily add a flag ...
> >
> 
> The issue here is that the receiver sets SO_(PASS|PEER)CGROUP, forcing
> the sender to identify or authenticate itself.  The sender might not
> want to identify itself.  Even if you don't buy any secrecy arguments,
> the sender might not intend to authenticate.  Certainly no existing
> callers of connect or write intend to authenticate using their cgroup,
> since current kernels don't have those semantics.

Ok, so passing cgroup information is not necessarily a problem as long
as it is not used for authentication. So say somebody is just logging
all the client request and which cgroup client was in, that should not
be a problem.

I agree that before somebody uses cgroup information for authentication
purposes, may be there needs to be a bigger debate whether this info
can be used safely for authentication purposes or not and in what
circumstances it is safe to use for authentication.

But that does not mean that API to pass the cgroup information around is
wrong.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 16, 2014, 6:13 p.m.
On Wed, Apr 16, 2014 at 11:06 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Apr 16, 2014 at 09:31:25AM -0700, Andy Lutomirski wrote:
> I am not sure how same issue with happen with cgroups. In the case of
> socket example, you are forcing a setuid program to write to standard
> output and that setuid program will run in same cgroup as caller and
> will have same cgroup as caller. So even if somebody was using cgroup
> information for authentication, atleast in this particular case it
> will not be a problem. Both unpriviliged and priviliged programs has
> same cgroups.
>

I'm not sure that there's an actual attackable program.  But I also
see no reason to be convinced that there isn't one, and the problem
can easily be avoided by requiring programs to explicitly ask to send
their cgroup.

>>
>> >
>> > The only one that *may* be reasonable is the "secret" cgroup name one,
>> > however nobody seem to come up with a reason why it is legitimate to
>> > allow to keep cgroup names secret.
>> >
>> > And if you can come up with such a good reason the SO_NOPASSCGROUP
>> > option seem the right solution.
>> >
>> >> This ABI is especially tricky because programs will use it even if
>> >> they don't explicitly try to.  So just adding the ABI may break
>> >> existing assumptions that are relevant to security or correctness.
>> >
>> > It's not clear to me what you mean by this, either you explicitly use
>> > SO_PASSCGROUP or not, it's not like you can involuntarily add a flag ...
>> >
>>
>> The issue here is that the receiver sets SO_(PASS|PEER)CGROUP, forcing
>> the sender to identify or authenticate itself.  The sender might not
>> want to identify itself.  Even if you don't buy any secrecy arguments,
>> the sender might not intend to authenticate.  Certainly no existing
>> callers of connect or write intend to authenticate using their cgroup,
>> since current kernels don't have those semantics.
>
> Ok, so passing cgroup information is not necessarily a problem as long
> as it is not used for authentication. So say somebody is just logging
> all the client request and which cgroup client was in, that should not
> be a problem.

Do you consider correct attribution of logging messages to be
important?  If so, then this is a kind of authentication, albeit one
where the impact of screwing it up is a bit lower.

>
> I agree that before somebody uses cgroup information for authentication
> purposes, may be there needs to be a bigger debate whether this info
> can be used safely for authentication purposes or not and in what
> circumstances it is safe to use for authentication.

I thought that the original intended user of these patches was SSSD.
I have no idea what SSSD wanted them for, but I think it may better.

>
> But that does not mean that API to pass the cgroup information around is
> wrong.
>

It may not be wrong, but it might be extremely difficult or impossible
to use it safely.  I think that's something to avoid.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 16, 2014, 6:25 p.m.
On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:

[..]
> > Ok, so passing cgroup information is not necessarily a problem as long
> > as it is not used for authentication. So say somebody is just logging
> > all the client request and which cgroup client was in, that should not
> > be a problem.
> 
> Do you consider correct attribution of logging messages to be
> important?  If so, then this is a kind of authentication, albeit one
> where the impact of screwing it up is a bit lower.

So not passing cgroup information makes attribution more correct. Just
logging of information is authentication how? Both kernel and user space
log message into /var/log/messages and kernel messages are prefixed with
"kernel". So this somehow becomes are sort of authentication. I don't
get it.

> 
> >
> > I agree that before somebody uses cgroup information for authentication
> > purposes, may be there needs to be a bigger debate whether this info
> > can be used safely for authentication purposes or not and in what
> > circumstances it is safe to use for authentication.
> 
> I thought that the original intended user of these patches was SSSD.
> I have no idea what SSSD wanted them for, but I think it may better.

SSSD wanted to use this information too. And I think this is a good time 
to revisit and discuss can cgroup information be used safely for
authentication or not.

> 
> >
> > But that does not mean that API to pass the cgroup information around is
> > wrong.
> >
> 
> It may not be wrong, but it might be extremely difficult or impossible
> to use it safely.  I think that's something to avoid.

Atleast I can't see a problem with logging example yet.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 16, 2014, 6:35 p.m.
On Wed, Apr 16, 2014 at 11:25 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:
>
> [..]
>> > Ok, so passing cgroup information is not necessarily a problem as long
>> > as it is not used for authentication. So say somebody is just logging
>> > all the client request and which cgroup client was in, that should not
>> > be a problem.
>>
>> Do you consider correct attribution of logging messages to be
>> important?  If so, then this is a kind of authentication, albeit one
>> where the impact of screwing it up is a bit lower.
>
> So not passing cgroup information makes attribution more correct. Just
> logging of information is authentication how? Both kernel and user space
> log message into /var/log/messages and kernel messages are prefixed with
> "kernel". So this somehow becomes are sort of authentication. I don't
> get it.

I did a bad job of explaining what I meant.

I think that, currently, log lines can be correctly attributed to the
kernel or to userspace, but determining where in userspace a log line
came from is a bit flaky.  One of the goals of these patches is to
make log attribution less flaky.  But if you want log attribution to
be completely correct, even in the presence of malicious programs,
then I think that the current patches aren't quite there.

Is the reason that you don't want to modify the senders because you
want users of syslog(3) to get the new behavior?  If so, I think it
would be nice to update glibc to fix that, but maybe the kernel should
cooperate, and maybe SO_PEERCGROUP is a decent way to handle this.

I still think that SO_PASSCGROUP, as currently designed, is problematic.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 16, 2014, 6:36 p.m.
On Wed, Apr 16, 2014 at 10:29:08AM -0700, Andy Lutomirski wrote:

[..]
> >> Admittedly cgroups aren't currently as important as uid, but if this
> >> changes, then SO_PASSCGROUP, as currently written, will have *exactly*
> >> the same problem.
> >
> > Which is easy to foil by using SO_PEERCGROUP and find out who originally
> > opened the socket, which is why that is also available!
> 
> Then please remove SO_PASSCGROUP.

SO_PASSCGROUP is important because SO_PEERCGROUP does not work with unix
datagram sockets.

Again going back to logging example, if some clients are logging to unix
datagram sockets, SO_PASSCGROUP is the only option to figure out cgroup
of client.

> 
> >> The issue here is that the receiver sets SO_(PASS|PEER)CGROUP, forcing
> >> the sender to identify or authenticate itself.  The sender might not
> >> want to identify itself.
> >
> > You need to explain, why, on a host, when an application connects to
> > another, it is ok to make it anonymous. If you do not want to expose
> > yourself, do not talk to other applications in the first place.
> 
> Why is anonymity harmful here?  I don't have a great argument off the
> top of my head for why it's necessary, but neither do I see why it's
> bad.

So once we have example where anonymity helps, we can implement
SO_NOPASSCGROUP to take care of it.

> And I think a lack of anonymity is asking for trouble (see
> below).

I read the example, below. I don't get where the *lack of anonymity*
is a problem.

You just seem to referring to the fact that SO_PEERCGROUP will not
be reliable if after opening a connection fd was passed to a program
in a different cgroup. And that's one reason why SCM_PASSCGROUP is
also implemented so that these kind of issues can be avoided.

So I fail to see what's the relation to *anonymity* in your example.

> 
> >
> >> the sender might not intend to authenticate.  Certainly no existing
> >> callers of connect or write intend to authenticate using their cgroup,
> >> since current kernels don't have those semantics.
> >
> > This is a passive check, it's the receiver that wants to make decisions
> > about who is connecting, again if you do not want to "disclose"
> > information do not connect in the first place ?
> >
> 
> Let's say I have a receiver.  I'll call it journald, since I think
> that's one of the eventual use cases.
> 
> Let's also say I some random program on my box.  It is willing to
> answer requests on behalf of anyone else with the same uid, and it
> will happily open a given unix socket and send the requester the file
> descriptor.  Such a program is a bit odd, but it's perfectly safe
> right now.
> 
> Now add a malicious program into the mix.  It asks the daemon to open
> /run/systemd/journal/socket.  Then it starts writing to the fd.
> 
> Whoops, now the malicious program can impersonate the helper.  This
> happens because SO_PEERCGROUP (and journald's use of SO_PEERCGROUP)
> caused connect(2) to produce a descriptor that carries a permission
> that the descriptor did not carry in the past, and because the caller
> of connect(2) did not need to opt in to the new behavior.
> 
> I still haven't seen any explanation for what's wrong with requiring
> senders to ask the kernel to transmit their cgroup.

If nothing else, additional complexity and ovhead. Extra pair of messages
need to be exchanged to request and then provide the information.

How would it work in logging example? Every time logger receives a
message, is it supposed to send another message to client to send
SCM_CGROUP? That does not sound right.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 16, 2014, 6:40 p.m.
On Wed, Apr 16, 2014 at 11:36 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Apr 16, 2014 at 10:29:08AM -0700, Andy Lutomirski wrote:
>
> [..]
>> >> Admittedly cgroups aren't currently as important as uid, but if this
>> >> changes, then SO_PASSCGROUP, as currently written, will have *exactly*
>> >> the same problem.
>> >
>> > Which is easy to foil by using SO_PEERCGROUP and find out who originally
>> > opened the socket, which is why that is also available!
>>
>> Then please remove SO_PASSCGROUP.
>
> SO_PASSCGROUP is important because SO_PEERCGROUP does not work with unix
> datagram sockets.

Right.  I forgot about that.

>
> Again going back to logging example, if some clients are logging to unix
> datagram sockets, SO_PASSCGROUP is the only option to figure out cgroup
> of client.

Hmm.  I think that, in your patch, the cgroup that is sent is the
cgroup of the caller of write/send/sendmsg.  What if you changed it to
use the same cgroup that SO_PEERCRED would use?  Would that still
work?

>>
>> I still haven't seen any explanation for what's wrong with requiring
>> senders to ask the kernel to transmit their cgroup.
>
> If nothing else, additional complexity and ovhead. Extra pair of messages
> need to be exchanged to request and then provide the information.
>
> How would it work in logging example? Every time logger receives a
> message, is it supposed to send another message to client to send
> SCM_CGROUP? That does not sound right.

No -- just have the logger send the cgroup with every message.  Yes,
it seems silly, but it's probably barely more expensive than with the
code in your patch.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 16, 2014, 6:51 p.m.
On Wed, Apr 16, 2014 at 11:40:44AM -0700, Andy Lutomirski wrote:
> On Wed, Apr 16, 2014 at 11:36 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Apr 16, 2014 at 10:29:08AM -0700, Andy Lutomirski wrote:
> >
> > [..]
> >> >> Admittedly cgroups aren't currently as important as uid, but if this
> >> >> changes, then SO_PASSCGROUP, as currently written, will have *exactly*
> >> >> the same problem.
> >> >
> >> > Which is easy to foil by using SO_PEERCGROUP and find out who originally
> >> > opened the socket, which is why that is also available!
> >>
> >> Then please remove SO_PASSCGROUP.
> >
> > SO_PASSCGROUP is important because SO_PEERCGROUP does not work with unix
> > datagram sockets.
> 
> Right.  I forgot about that.
> 
> >
> > Again going back to logging example, if some clients are logging to unix
> > datagram sockets, SO_PASSCGROUP is the only option to figure out cgroup
> > of client.
> 
> Hmm.  I think that, in your patch, the cgroup that is sent is the
> cgroup of the caller of write/send/sendmsg.  What if you changed it to
> use the same cgroup that SO_PEERCRED would use?  Would that still
> work?

No. SO_PEERCRED stores the cgroup information once at the time of
connect(). After that it never changes.

What if sender changes the cgroup. That information will not be captured.
Also what if multiple client use the same socket fd to writer to logger?
In that case too storing cgroup info in socket will not help.

Cgroup is sender task's property and not client side socket's property. 

> 
> >>
> >> I still haven't seen any explanation for what's wrong with requiring
> >> senders to ask the kernel to transmit their cgroup.
> >
> > If nothing else, additional complexity and ovhead. Extra pair of messages
> > need to be exchanged to request and then provide the information.
> >
> > How would it work in logging example? Every time logger receives a
> > message, is it supposed to send another message to client to send
> > SCM_CGROUP? That does not sound right.
> 
> No -- just have the logger send the cgroup with every message.  Yes,
> it seems silly, but it's probably barely more expensive than with the
> code in your patch.

So receiver gets the cgroup messages even if it might not want to. There
is no way to say "Hey don't send me SCM_CGROUP's messages".

Now all loggers need to be modifed to always send SCM_CGROUP messages. And
all other more complicated cases might need a different consideration and
clients and servers will need to be modified accordingly.

I think it is much simpler to allow passing of cgroup information and
once we figure out some concrete cases where passing of that info is
not desirabe, implement SO_NOPASSCGROUP and modify those *selected few
corner cases* to set this flag on sockets.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 16, 2014, 6:59 p.m.
On Wed, Apr 16, 2014 at 11:51 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Apr 16, 2014 at 11:40:44AM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 16, 2014 at 11:36 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Wed, Apr 16, 2014 at 10:29:08AM -0700, Andy Lutomirski wrote:
>> >
>> > [..]
>> >> >> Admittedly cgroups aren't currently as important as uid, but if this
>> >> >> changes, then SO_PASSCGROUP, as currently written, will have *exactly*
>> >> >> the same problem.
>> >> >
>> >> > Which is easy to foil by using SO_PEERCGROUP and find out who originally
>> >> > opened the socket, which is why that is also available!
>> >>
>> >> Then please remove SO_PASSCGROUP.
>> >
>> > SO_PASSCGROUP is important because SO_PEERCGROUP does not work with unix
>> > datagram sockets.
>>
>> Right.  I forgot about that.
>>
>> >
>> > Again going back to logging example, if some clients are logging to unix
>> > datagram sockets, SO_PASSCGROUP is the only option to figure out cgroup
>> > of client.
>>
>> Hmm.  I think that, in your patch, the cgroup that is sent is the
>> cgroup of the caller of write/send/sendmsg.  What if you changed it to
>> use the same cgroup that SO_PEERCRED would use?  Would that still
>> work?
>
> No. SO_PEERCRED stores the cgroup information once at the time of
> connect(). After that it never changes.
>
> What if sender changes the cgroup. That information will not be captured.
> Also what if multiple client use the same socket fd to writer to logger?
> In that case too storing cgroup info in socket will not help.


What is the use case of SO_PEERCRED, then?

Why can't clients that change cgroup reopen the socket?  They're
already cgroup-aware.  As far as I know, there is probably exactly one
client that actually changes cgroups and then tries to log without
execing first: systemd (or journald as used by systemd).  And this is
exactly the component that needs to change to use any new socket
option, no matter what.


>> > How would it work in logging example? Every time logger receives a
>> > message, is it supposed to send another message to client to send
>> > SCM_CGROUP? That does not sound right.
>>
>> No -- just have the logger send the cgroup with every message.  Yes,
>> it seems silly, but it's probably barely more expensive than with the
>> code in your patch.
>
> So receiver gets the cgroup messages even if it might not want to. There
> is no way to say "Hey don't send me SCM_CGROUP's messages".

The receiver would only get SCM_CGROUP messages if it set SO_PASSCGROUP.

>
> Now all loggers need to be modifed to always send SCM_CGROUP messages. And
> all other more complicated cases might need a different consideration and
> clients and servers will need to be modified accordingly.
>
> I think it is much simpler to allow passing of cgroup information and
> once we figure out some concrete cases where passing of that info is
> not desirabe, implement SO_NOPASSCGROUP and modify those *selected few
> corner cases* to set this flag on sockets.

The problem with SO_NOPASSCGROUP is that the programs that would need
to set it are probably already written and don't care about cgroups.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 16, 2014, 6:59 p.m.
On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:
> On Wed, Apr 16, 2014 at 11:06 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Apr 16, 2014 at 09:31:25AM -0700, Andy Lutomirski wrote:
> > I am not sure how same issue with happen with cgroups. In the case of
> > socket example, you are forcing a setuid program to write to standard
> > output and that setuid program will run in same cgroup as caller and
> > will have same cgroup as caller. So even if somebody was using cgroup
> > information for authentication, atleast in this particular case it
> > will not be a problem. Both unpriviliged and priviliged programs has
> > same cgroups.
> >
> 
> I'm not sure that there's an actual attackable program.  But I also
> see no reason to be convinced that there isn't one, and the problem
> can easily be avoided by requiring programs to explicitly ask to send
> their cgroup.

If you can't prove that there is something fundamentally wrong with
passing cgroup info to receiver, there is no reason to block these
patches either.

We can't fix the problems which we can't see. You are saying that I 
don't know what kind of problem can happen due to cgroup passing. Still
that does not mean none of the problems exist. So let us not pass cgroup
info by default and ask client to opt in.

I don't think this is a very convincing argument.

To me, if we can't see anything fundamentally wrong with passing cgroup
info, we should take these patches in. And once we figure out that there
is are problematic use cases, then implement SO_NOPASSCGROUP and
SO_NOPEERCRED  and allow problematic clients to opt out.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 16, 2014, 7:06 p.m.
On Wed, Apr 16, 2014 at 11:35:13AM -0700, Andy Lutomirski wrote:
> On Wed, Apr 16, 2014 at 11:25 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:
> >
> > [..]
> >> > Ok, so passing cgroup information is not necessarily a problem as long
> >> > as it is not used for authentication. So say somebody is just logging
> >> > all the client request and which cgroup client was in, that should not
> >> > be a problem.
> >>
> >> Do you consider correct attribution of logging messages to be
> >> important?  If so, then this is a kind of authentication, albeit one
> >> where the impact of screwing it up is a bit lower.
> >
> > So not passing cgroup information makes attribution more correct. Just
> > logging of information is authentication how? Both kernel and user space
> > log message into /var/log/messages and kernel messages are prefixed with
> > "kernel". So this somehow becomes are sort of authentication. I don't
> > get it.
> 
> I did a bad job of explaining what I meant.
> 
> I think that, currently, log lines can be correctly attributed to the
> kernel or to userspace, but determining where in userspace a log line
> came from is a bit flaky.  One of the goals of these patches is to
> make log attribution less flaky.  But if you want log attribution to
> be completely correct, even in the presence of malicious programs,
> then I think that the current patches aren't quite there.

Why do you think that current patches are not there yet in the presence of 
malicious program. In your example, one program opened the socket and
passed it to malicious program. And all the future messages are coming
from malicious program. As long as receiver checks for SO_PASSCGROUP, 
it covers your example.

How a malicious program will fake cgroup information?

And we want to take care of pid related races too.

> 
> Is the reason that you don't want to modify the senders because you
> want users of syslog(3) to get the new behavior?  If so, I think it
> would be nice to update glibc to fix that, but maybe the kernel should
> cooperate, and maybe SO_PEERCGROUP is a decent way to handle this.

Modifying every user of unix sockets to start passing cgroup information
will make sense only if it was deemed that passing cgroup information
is risky inherently and should be done only in selected cases.

I think so far our understanding is that we can't find anything 
inherently wrong with passing cgroup information, though there might
be some corner cases we can run into and which are not obivious right
now.

In this case, it does not make sense to force opt-in on clients. It
is more convinient to default to *opt-in* and implement *opt-out* when
need be.

> 
> I still think that SO_PASSCGROUP, as currently designed, is problematic.

And I still do not get why it is problematic.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 16, 2014, 7:13 p.m.
On Wed, Apr 16, 2014 at 12:06 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Apr 16, 2014 at 11:35:13AM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 16, 2014 at 11:25 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:
>> >
>> > [..]
>> >> > Ok, so passing cgroup information is not necessarily a problem as long
>> >> > as it is not used for authentication. So say somebody is just logging
>> >> > all the client request and which cgroup client was in, that should not
>> >> > be a problem.
>> >>
>> >> Do you consider correct attribution of logging messages to be
>> >> important?  If so, then this is a kind of authentication, albeit one
>> >> where the impact of screwing it up is a bit lower.
>> >
>> > So not passing cgroup information makes attribution more correct. Just
>> > logging of information is authentication how? Both kernel and user space
>> > log message into /var/log/messages and kernel messages are prefixed with
>> > "kernel". So this somehow becomes are sort of authentication. I don't
>> > get it.
>>
>> I did a bad job of explaining what I meant.
>>
>> I think that, currently, log lines can be correctly attributed to the
>> kernel or to userspace, but determining where in userspace a log line
>> came from is a bit flaky.  One of the goals of these patches is to
>> make log attribution less flaky.  But if you want log attribution to
>> be completely correct, even in the presence of malicious programs,
>> then I think that the current patches aren't quite there.
>
> Why do you think that current patches are not there yet in the presence of
> malicious program. In your example, one program opened the socket and
> passed it to malicious program. And all the future messages are coming
> from malicious program. As long as receiver checks for SO_PASSCGROUP,
> it covers your example.

This is backwards.  The malicious program opens the socket and passes
it to an unwitting non-malicious program.  That non-malicious program
sends messages, and the logging daemon things that the non-malicious
program actually intended for these messages to end up in the system
log.

>
>>
>> Is the reason that you don't want to modify the senders because you
>> want users of syslog(3) to get the new behavior?  If so, I think it
>> would be nice to update glibc to fix that, but maybe the kernel should
>> cooperate, and maybe SO_PEERCGROUP is a decent way to handle this.
>
> Modifying every user of unix sockets to start passing cgroup information
> will make sense only if it was deemed that passing cgroup information
> is risky inherently and should be done only in selected cases.
>
> I think so far our understanding is that we can't find anything
> inherently wrong with passing cgroup information, though there might
> be some corner cases we can run into and which are not obivious right
> now.

I think I've explained why causing write(2) to start transmitting the
caller's cgroup is problematic.  Your SO_PASSCGROUP patch does that.
Your SO_PEERCGROUP patch does not.

I'm still nervous about SO_PEERCGROUP and about a variant of
SO_PASSCGROUP that used the cgroup as of the time of connect(2), but
I'm much less convinced that there's an actual problem.  My
nervousness here is more that I don't like APIs that aren't clearly
safe.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 16, 2014, 7:39 p.m.
On Wed, Apr 16, 2014 at 12:13:21PM -0700, Andy Lutomirski wrote:
> On Wed, Apr 16, 2014 at 12:06 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Apr 16, 2014 at 11:35:13AM -0700, Andy Lutomirski wrote:
> >> On Wed, Apr 16, 2014 at 11:25 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:
> >> >
> >> > [..]
> >> >> > Ok, so passing cgroup information is not necessarily a problem as long
> >> >> > as it is not used for authentication. So say somebody is just logging
> >> >> > all the client request and which cgroup client was in, that should not
> >> >> > be a problem.
> >> >>
> >> >> Do you consider correct attribution of logging messages to be
> >> >> important?  If so, then this is a kind of authentication, albeit one
> >> >> where the impact of screwing it up is a bit lower.
> >> >
> >> > So not passing cgroup information makes attribution more correct. Just
> >> > logging of information is authentication how? Both kernel and user space
> >> > log message into /var/log/messages and kernel messages are prefixed with
> >> > "kernel". So this somehow becomes are sort of authentication. I don't
> >> > get it.
> >>
> >> I did a bad job of explaining what I meant.
> >>
> >> I think that, currently, log lines can be correctly attributed to the
> >> kernel or to userspace, but determining where in userspace a log line
> >> came from is a bit flaky.  One of the goals of these patches is to
> >> make log attribution less flaky.  But if you want log attribution to
> >> be completely correct, even in the presence of malicious programs,
> >> then I think that the current patches aren't quite there.
> >
> > Why do you think that current patches are not there yet in the presence of
> > malicious program. In your example, one program opened the socket and
> > passed it to malicious program. And all the future messages are coming
> > from malicious program. As long as receiver checks for SO_PASSCGROUP,
> > it covers your example.
> 
> This is backwards.  The malicious program opens the socket and passes
> it to an unwitting non-malicious program.  That non-malicious program
> sends messages, and the logging daemon things that the non-malicious
> program actually intended for these messages to end up in the system
> log.

Either way you look at it, I can't see the problem. Even without cgroup
info, in your example, a non-malicious programs error message will show
up at the receiver (Because malicious program passed that fd as stdout
to non-malicious program).

Are you complainig about this?

Or are you complaing that non-malicious program's cgroup info will show
up at the receiver. What's the problem with that. Receiver can use
SO_PASSCGROUP and get non-malicious programs cgroup and log it (along
with error message). I don't see where is the problem in that.

> 
> >
> >>
> >> Is the reason that you don't want to modify the senders because you
> >> want users of syslog(3) to get the new behavior?  If so, I think it
> >> would be nice to update glibc to fix that, but maybe the kernel should
> >> cooperate, and maybe SO_PEERCGROUP is a decent way to handle this.
> >
> > Modifying every user of unix sockets to start passing cgroup information
> > will make sense only if it was deemed that passing cgroup information
> > is risky inherently and should be done only in selected cases.
> >
> > I think so far our understanding is that we can't find anything
> > inherently wrong with passing cgroup information, though there might
> > be some corner cases we can run into and which are not obivious right
> > now.
> 
> I think I've explained why causing write(2) to start transmitting the
> caller's cgroup is problematic. Your SO_PASSCGROUP patch does that.
> Your SO_PEERCGROUP patch does not.

That's a generic statement. You have not given an specific example, where
it will be a problem. Two easy ways to mitigate this will be.

- Depending on use case, just look at SO_PEERCGROUP info. This will
  avoid all the use cases you mentioned about tricking a priviliged
  program to write something on a fd.

- Also depending on use case one could compare both SO_PEERCGROUP and
  SO_PASSCGROUP info and make sure cgroup remains the same otherwise
  deny the service.

> 
> I'm still nervous about SO_PEERCGROUP and about a variant of
> SO_PASSCGROUP that used the cgroup as of the time of connect(2), but
> I'm much less convinced that there's an actual problem.  My
> nervousness here is more that I don't like APIs that aren't clearly
> safe.

CVE you pointed to talks about SCM_CREDENTIAL vulnerability. That was
a bug and now it has been fixed. So what's unsafe about SCM_CREDENTIAL
now.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 16, 2014, 8:24 p.m.
On Wed, Apr 16, 2014 at 12:39 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Apr 16, 2014 at 12:13:21PM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 16, 2014 at 12:06 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Wed, Apr 16, 2014 at 11:35:13AM -0700, Andy Lutomirski wrote:
>> >> On Wed, Apr 16, 2014 at 11:25 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> > On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:
>> >> >
>> >> > [..]
>> >> >> > Ok, so passing cgroup information is not necessarily a problem as long
>> >> >> > as it is not used for authentication. So say somebody is just logging
>> >> >> > all the client request and which cgroup client was in, that should not
>> >> >> > be a problem.
>> >> >>
>> >> >> Do you consider correct attribution of logging messages to be
>> >> >> important?  If so, then this is a kind of authentication, albeit one
>> >> >> where the impact of screwing it up is a bit lower.
>> >> >
>> >> > So not passing cgroup information makes attribution more correct. Just
>> >> > logging of information is authentication how? Both kernel and user space
>> >> > log message into /var/log/messages and kernel messages are prefixed with
>> >> > "kernel". So this somehow becomes are sort of authentication. I don't
>> >> > get it.
>> >>
>> >> I did a bad job of explaining what I meant.
>> >>
>> >> I think that, currently, log lines can be correctly attributed to the
>> >> kernel or to userspace, but determining where in userspace a log line
>> >> came from is a bit flaky.  One of the goals of these patches is to
>> >> make log attribution less flaky.  But if you want log attribution to
>> >> be completely correct, even in the presence of malicious programs,
>> >> then I think that the current patches aren't quite there.
>> >
>> > Why do you think that current patches are not there yet in the presence of
>> > malicious program. In your example, one program opened the socket and
>> > passed it to malicious program. And all the future messages are coming
>> > from malicious program. As long as receiver checks for SO_PASSCGROUP,
>> > it covers your example.
>>
>> This is backwards.  The malicious program opens the socket and passes
>> it to an unwitting non-malicious program.  That non-malicious program
>> sends messages, and the logging daemon things that the non-malicious
>> program actually intended for these messages to end up in the system
>> log.
>
> Either way you look at it, I can't see the problem. Even without cgroup
> info, in your example, a non-malicious programs error message will show
> up at the receiver (Because malicious program passed that fd as stdout
> to non-malicious program).
>
> Are you complainig about this?
>
> Or are you complaing that non-malicious program's cgroup info will show
> up at the receiver. What's the problem with that. Receiver can use
> SO_PASSCGROUP and get non-malicious programs cgroup and log it (along
> with error message). I don't see where is the problem in that.

I'm not talking about the risk that someone learns someone's cgroup.
I'm talking about the risk that a malicious program can get a lot
entry like: "whatever planted text"
_SYSTEMD_UNIT=non-malicious.service.  That is, they've spoofed a log
line.

If you don't care about spoofing of log lines, then there's no point
to having the kernel validate them anyway.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 17, 2014, 1:41 p.m.
On Wed, Apr 16, 2014 at 01:24:32PM -0700, Andy Lutomirski wrote:

[..]
> I'm not talking about the risk that someone learns someone's cgroup.
> I'm talking about the risk that a malicious program can get a lot
> entry like: "whatever planted text"
> _SYSTEMD_UNIT=non-malicious.service.  That is, they've spoofed a log
> line.
> 
> If you don't care about spoofing of log lines, then there's no point
> to having the kernel validate them anyway.

What's wrong with this. A message came from a cgroup which maps to
a unit xyz and it got logged.  I can't see what's wrong here.

Anyway that message will get logged with unit information. These patches
will just make getting unit information race free and more reliable.

Thansk
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel J Walsh - April 17, 2014, 3:41 p.m.
On 04/16/2014 11:59 AM, Vivek Goyal wrote:
> On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 16, 2014 at 11:06 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>>> On Wed, Apr 16, 2014 at 09:31:25AM -0700, Andy Lutomirski wrote:
>>> I am not sure how same issue with happen with cgroups. In the case of
>>> socket example, you are forcing a setuid program to write to standard
>>> output and that setuid program will run in same cgroup as caller and
>>> will have same cgroup as caller. So even if somebody was using cgroup
>>> information for authentication, atleast in this particular case it
>>> will not be a problem. Both unpriviliged and priviliged programs has
>>> same cgroups.
>>>
>> I'm not sure that there's an actual attackable program.  But I also
>> see no reason to be convinced that there isn't one, and the problem
>> can easily be avoided by requiring programs to explicitly ask to send
>> their cgroup.
> If you can't prove that there is something fundamentally wrong with
> passing cgroup info to receiver, there is no reason to block these
> patches either.
>
> We can't fix the problems which we can't see. You are saying that I 
> don't know what kind of problem can happen due to cgroup passing. Still
> that does not mean none of the problems exist. So let us not pass cgroup
> info by default and ask client to opt in.
>
> I don't think this is a very convincing argument.
>
> To me, if we can't see anything fundamentally wrong with passing cgroup
> info, we should take these patches in. And once we figure out that there
> is are problematic use cases, then implement SO_NOPASSCGROUP and
> SO_NOPEERCRED  and allow problematic clients to opt out.
>
> Thanks
> Vivek
The two use cases for this patch are:

1 Logging, to make sure the cgroup information gets correctly attributed
to the caller. 

2 Potentially reveal different information to the caller based on the
cgroup information. 

Imagine you want to set up an apache web server that is going to use
sssd for authentication data.  You might want to reveal a limited set of
users to the apache service process based on the fact that it is running
in the apache.service.slice.  If the apache service does the equivalent
of getent passwd I want to give it different information then say sshd
did the same calls.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 17, 2014, 4:04 p.m.
On Thu, 2014-04-17 at 08:41 -0700, Daniel J Walsh wrote:
> On 04/16/2014 11:59 AM, Vivek Goyal wrote:
> > On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:
> >> On Wed, Apr 16, 2014 at 11:06 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >>> On Wed, Apr 16, 2014 at 09:31:25AM -0700, Andy Lutomirski wrote:
> >>> I am not sure how same issue with happen with cgroups. In the case of
> >>> socket example, you are forcing a setuid program to write to standard
> >>> output and that setuid program will run in same cgroup as caller and
> >>> will have same cgroup as caller. So even if somebody was using cgroup
> >>> information for authentication, atleast in this particular case it
> >>> will not be a problem. Both unpriviliged and priviliged programs has
> >>> same cgroups.
> >>>
> >> I'm not sure that there's an actual attackable program.  But I also
> >> see no reason to be convinced that there isn't one, and the problem
> >> can easily be avoided by requiring programs to explicitly ask to send
> >> their cgroup.
> > If you can't prove that there is something fundamentally wrong with
> > passing cgroup info to receiver, there is no reason to block these
> > patches either.
> >
> > We can't fix the problems which we can't see. You are saying that I 
> > don't know what kind of problem can happen due to cgroup passing. Still
> > that does not mean none of the problems exist. So let us not pass cgroup
> > info by default and ask client to opt in.
> >
> > I don't think this is a very convincing argument.
> >
> > To me, if we can't see anything fundamentally wrong with passing cgroup
> > info, we should take these patches in. And once we figure out that there
> > is are problematic use cases, then implement SO_NOPASSCGROUP and
> > SO_NOPEERCRED  and allow problematic clients to opt out.
> >
> > Thanks
> > Vivek
> The two use cases for this patch are:

Let me add some caveats to explain what is used, as the 2 cases map to
the 2 different new options.

> 1 Logging, to make sure the cgroup information gets correctly attributed
> to the caller. 

In here the logging system wants to know *who* logged, if the cgroups of
the process actually doing the logging changes, that's what the logging
system wants to know.
If somehow a setuid binary can change the cgroups, then the logging
system *wants* to know that these logs are coming from there, because
they sure are not coming from the original bounded process anymore.

This use case wants to use SO_PASSCGROUP as it wants to know who the
current writer is, not who opened the file descriptor.


> 2 Potentially reveal different information to the caller based on the
> cgroup information. 
> 
> Imagine you want to set up an apache web server that is going to use
> sssd for authentication data.  You might want to reveal a limited set of
> users to the apache service process based on the fact that it is running
> in the apache.service.slice.  If the apache service does the equivalent
> of getent passwd I want to give it different information then say sshd
> did the same calls.

This case is the inverse, we totally want to care only about who
*opened* the socket for communication, because that's when the kernel
does access control and tells us who is the *owner* of the information.
It doesn't matter at all if the owner turns around and passes the socket
to another process anywhere in the system. If that process does it, it
means it wants to disclose information to which it already have access
to and can ship to said process already and independently anyway.

This use case wants to use SO_PEERCGROUP for that reason.

I hope this makes the use cases more clear.

Simo.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 4:05 p.m.
On Thu, Apr 17, 2014 at 8:41 AM, Daniel J Walsh <dwalsh@redhat.com> wrote:
>
> On 04/16/2014 11:59 AM, Vivek Goyal wrote:
> The two use cases for this patch are:
>
> 1 Logging, to make sure the cgroup information gets correctly attributed
> to the caller.
>

I think that the cgroup information of the opener of the socket should
be used to avoid a whole class of issues where the opener gets someone
else to call write(2).

> 2 Potentially reveal different information to the caller based on the
> cgroup information.
>
> Imagine you want to set up an apache web server that is going to use
> sssd for authentication data.  You might want to reveal a limited set of
> users to the apache service process based on the fact that it is running
> in the apache.service.slice.  If the apache service does the equivalent
> of getent passwd I want to give it different information then say sshd
> did the same calls.

This sounds like it's asking for trouble.  cgroups are not a security
boundary.  ptrace, /proc, existing setuid programs, etc do not respect
cgroups.

In this regard, cgroups are a lot like seccomp.  You can *use* them to
block certain actions, just like you can use seccomp to block
essentially anything.  But I hope that no one ever tries to reveal
different getent information based on which seccomp filter is applied
to a process, and, similarly, it seems extremely risky to reveal
different getent information based on which cgroup is in use.

Cgroups may be even worse here, since the actual cgroup names may
change as systemd and whatever competing managers exist come up with
new and varied ways to use cgroups.

If you want to turn cgroups into a security boundary, it may be
possible to do so.  But I don't see the point.  We have user
namespaces and uids for *exactly* this reason.  User namespaces come
with mount and network namespaces, both of which can be used to
identify who connected to a socket by using different sockets.  User
namespaces also allow using uids for this.

If using different sockets has scaling problems, them having a way to
use cmsg or some similar mechanism to identify your network namespace
seems reasonable, too.

--Andy

P.S. I can flat-out guarantee that anything you do with cgroups will
fsck up completely on my production server, because I use my own
cgroup policy.  Some day I may have to reconcile my policy with
systemd (damn it single-hierarchy people and systemd people, you
should have an option that will make systemd stay the f*ck away from
the unified hierarchy; I *like* systemd in general, but this may be a
show-stopper), but I really don't think that SSSD should be hardcoding
assumptions about the cgroup hierarchy, unless those assumptions are
easily reconfigurable.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 4:11 p.m.
On Thu, Apr 17, 2014 at 9:04 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-04-17 at 08:41 -0700, Daniel J Walsh wrote:
>> On 04/16/2014 11:59 AM, Vivek Goyal wrote:
>> > On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:
>> >> On Wed, Apr 16, 2014 at 11:06 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >>> On Wed, Apr 16, 2014 at 09:31:25AM -0700, Andy Lutomirski wrote:
>> >>> I am not sure how same issue with happen with cgroups. In the case of
>> >>> socket example, you are forcing a setuid program to write to standard
>> >>> output and that setuid program will run in same cgroup as caller and
>> >>> will have same cgroup as caller. So even if somebody was using cgroup
>> >>> information for authentication, atleast in this particular case it
>> >>> will not be a problem. Both unpriviliged and priviliged programs has
>> >>> same cgroups.
>> >>>
>> >> I'm not sure that there's an actual attackable program.  But I also
>> >> see no reason to be convinced that there isn't one, and the problem
>> >> can easily be avoided by requiring programs to explicitly ask to send
>> >> their cgroup.
>> > If you can't prove that there is something fundamentally wrong with
>> > passing cgroup info to receiver, there is no reason to block these
>> > patches either.
>> >
>> > We can't fix the problems which we can't see. You are saying that I
>> > don't know what kind of problem can happen due to cgroup passing. Still
>> > that does not mean none of the problems exist. So let us not pass cgroup
>> > info by default and ask client to opt in.
>> >
>> > I don't think this is a very convincing argument.
>> >
>> > To me, if we can't see anything fundamentally wrong with passing cgroup
>> > info, we should take these patches in. And once we figure out that there
>> > is are problematic use cases, then implement SO_NOPASSCGROUP and
>> > SO_NOPEERCRED  and allow problematic clients to opt out.
>> >
>> > Thanks
>> > Vivek
>> The two use cases for this patch are:
>
> Let me add some caveats to explain what is used, as the 2 cases map to
> the 2 different new options.
>
>> 1 Logging, to make sure the cgroup information gets correctly attributed
>> to the caller.
>
> In here the logging system wants to know *who* logged, if the cgroups of
> the process actually doing the logging changes, that's what the logging
> system wants to know.
> If somehow a setuid binary can change the cgroups, then the logging
> system *wants* to know that these logs are coming from there, because
> they sure are not coming from the original bounded process anymore.
>
> This use case wants to use SO_PASSCGROUP as it wants to know who the
> current writer is, not who opened the file descriptor.

No.  The logging daemon thinks it wants to know who the writer is, but
the logging daemon is wrong.  It actually wants to know who composed a
log message destined to it.  The caller of write(2) may or may not be
the same entity.

If this form of SO_PASSCGROUP somehow makes it into a pull request for
Linus, I will ask him not to pull it and/or to revert it.  I think
he'll agree that write(2) MUST NOT care who called it.  Yes, I don't
see how this is exploitable on my machine, but it's a mistake for the
same reason that the netlink crap in CVE-2014-0181 is a mistake.

FWIW, there are a handful of people who think about security and
occasionally answer things on lkml.  Some of them will tell you once
that a patch is a problem, and then they'll watch you ignore it, and
then they'll pwn you later on.  This has happened to me.  I'm not one
of those people, though, but it's generally good policy to pay
attention when people tell you that your proposed API *cannot* be used
safely.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 17, 2014, 4:12 p.m.
On Thu, Apr 17, 2014 at 08:41:15AM -0700, Daniel J Walsh wrote:

[..]
> The two use cases for this patch are:
> 
> 1 Logging, to make sure the cgroup information gets correctly attributed
> to the caller. 
> 
> 2 Potentially reveal different information to the caller based on the
> cgroup information. 
> 
> Imagine you want to set up an apache web server that is going to use
> sssd for authentication data.  You might want to reveal a limited set of
> users to the apache service process based on the fact that it is running
> in the apache.service.slice.  If the apache service does the equivalent
> of getent passwd I want to give it different information then say sshd
> did the same calls.

Dan,

So in first case we should be fine with both SO_PEERCGROUP and
SO_PASSCGROUP. cgroup information is not being used as some sort of
security attribute and decide who can access what.

Second use case looks more like that cgroup information is being used
as some sort of security attribute to decide who gets to see what
information.

cgroup retrieved from SO_PEERCGROUP and used as an identifier to
decide who can access what should be safe. This is similar to doing
security checks at open(2) time.

Andy's contention is that cgroup information received using SO_PASSCGROUP
is not safe to use for some sort of authorization. And reason being that
socket file descriptors can be passed around and one can trick an 
priviliged process to write to that descriptor, making receiver believe
that priviilged process is asking for some info.

Now there are caveats to it. So far all the examples I have seen are
the ones where setuid program will run in same cgroup as client. So even
if we trick setuid program to write to socket fd, cgroup information
remains same.

I think problem will happen only in advance cases where priviliged
program is running in some other cgroup and if it accepts file
descritors and writes to them. I don't know if it is a common practice
or not.

May be we should just make it very clear that if cgroup information
is being used for any kind of authorization purposes, then use only
SO_PEERCGROUP. Limit SO_PASSCGROUP cgroup info for usages like logging.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 17, 2014, 4:24 p.m.
On Thu, 2014-04-17 at 09:11 -0700, Andy Lutomirski wrote:
> On Thu, Apr 17, 2014 at 9:04 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > On Thu, 2014-04-17 at 08:41 -0700, Daniel J Walsh wrote:
> >> On 04/16/2014 11:59 AM, Vivek Goyal wrote:
> >> > On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:
> >> >> On Wed, Apr 16, 2014 at 11:06 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> >>> On Wed, Apr 16, 2014 at 09:31:25AM -0700, Andy Lutomirski wrote:
> >> >>> I am not sure how same issue with happen with cgroups. In the case of
> >> >>> socket example, you are forcing a setuid program to write to standard
> >> >>> output and that setuid program will run in same cgroup as caller and
> >> >>> will have same cgroup as caller. So even if somebody was using cgroup
> >> >>> information for authentication, atleast in this particular case it
> >> >>> will not be a problem. Both unpriviliged and priviliged programs has
> >> >>> same cgroups.
> >> >>>
> >> >> I'm not sure that there's an actual attackable program.  But I also
> >> >> see no reason to be convinced that there isn't one, and the problem
> >> >> can easily be avoided by requiring programs to explicitly ask to send
> >> >> their cgroup.
> >> > If you can't prove that there is something fundamentally wrong with
> >> > passing cgroup info to receiver, there is no reason to block these
> >> > patches either.
> >> >
> >> > We can't fix the problems which we can't see. You are saying that I
> >> > don't know what kind of problem can happen due to cgroup passing. Still
> >> > that does not mean none of the problems exist. So let us not pass cgroup
> >> > info by default and ask client to opt in.
> >> >
> >> > I don't think this is a very convincing argument.
> >> >
> >> > To me, if we can't see anything fundamentally wrong with passing cgroup
> >> > info, we should take these patches in. And once we figure out that there
> >> > is are problematic use cases, then implement SO_NOPASSCGROUP and
> >> > SO_NOPEERCRED  and allow problematic clients to opt out.
> >> >
> >> > Thanks
> >> > Vivek
> >> The two use cases for this patch are:
> >
> > Let me add some caveats to explain what is used, as the 2 cases map to
> > the 2 different new options.
> >
> >> 1 Logging, to make sure the cgroup information gets correctly attributed
> >> to the caller.
> >
> > In here the logging system wants to know *who* logged, if the cgroups of
> > the process actually doing the logging changes, that's what the logging
> > system wants to know.
> > If somehow a setuid binary can change the cgroups, then the logging
> > system *wants* to know that these logs are coming from there, because
> > they sure are not coming from the original bounded process anymore.
> >
> > This use case wants to use SO_PASSCGROUP as it wants to know who the
> > current writer is, not who opened the file descriptor.
> 
> No.  The logging daemon thinks it wants to know who the writer is, but
> the logging daemon is wrong.  It actually wants to know who composed a
> log message destined to it.  The caller of write(2) may or may not be
> the same entity.

This works both ways, and doesn't really matter, you are *no* better off
w/o this interface.

> If this form of SO_PASSCGROUP somehow makes it into a pull request for
> Linus, I will ask him not to pull it and/or to revert it.  I think
> he'll agree that write(2) MUST NOT care who called it.

And write() does not, there is no access control check being performed
here. This call is the same as getting the pid of the process and
crawling /proc with that information, just more efficient and race-free.

I repeat, it is *not* access control.

> Yes, I don't see how this is exploitable on my machine, but it's a mistake for the
> same reason that the netlink crap in CVE-2014-0181 is a mistake.

That is a different matter, that is an access control decision.

> FWIW, there are a handful of people who think about security and
> occasionally answer things on lkml.  Some of them will tell you once
> that a patch is a problem, and then they'll watch you ignore it, and
> then they'll pwn you later on.  This has happened to me.  I'm not one
> of those people, though, but it's generally good policy to pay
> attention when people tell you that your proposed API *cannot* be used
> safely.

It is also important to understand why a mechanism is being proposed and
what is its intended usage. Cgroups information passed via SO_PASSCGROUP
is *not* to be used for access control, but logging is not access
control.
SO_PEERCGROUP instead checks for the cgroup determined at open() so it
could be used for access control.

Simo.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 4:37 p.m.
On Thu, Apr 17, 2014 at 9:24 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-04-17 at 09:11 -0700, Andy Lutomirski wrote:
>>
>> No.  The logging daemon thinks it wants to know who the writer is, but
>> the logging daemon is wrong.  It actually wants to know who composed a
>> log message destined to it.  The caller of write(2) may or may not be
>> the same entity.
>
> This works both ways, and doesn't really matter, you are *no* better off
> w/o this interface.
>
>> If this form of SO_PASSCGROUP somehow makes it into a pull request for
>> Linus, I will ask him not to pull it and/or to revert it.  I think
>> he'll agree that write(2) MUST NOT care who called it.
>
> And write() does not, there is no access control check being performed
> here. This call is the same as getting the pid of the process and
> crawling /proc with that information, just more efficient and race-free.

Doing it using the pid of writer is wrong.  So is doing it with the
cgroup of the writer.  The fact that it's even possible to use the pid
of the caller of write(2) is a mistake, but that particular mistake
is, unfortunately, well-enshrined in history.

>
> I repeat, it is *not* access control.
>

Sure it is.

Either correct attribution of logs doesn't matter, in which case it
makes little difference how you do it, or it does matter, in which
case it should be done right.

Here's a real world example from my industry.  Suppose I used journald
for logging on a production trading system.  The ability to write a
log line that says "I just bought 100000 EUR/USD for
such-and-such-price" attributed to a trading program is absolutely a
security-sensitive operation and must be subject to access control.

If Common Criteria doesn't say that audit logs need to be resistant to
spoofing, then that's just one more reason that Common Criteria is
broken.

I don't use journald for trading logs, and I'd be absolutely daft to
use ordinary syslog, because ordinary syslog doesn't even pretend to
be secure.  But if you're going to design something that claims to be
secure, "well, I can't see how this issue would be exploited" is not
good enough.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 17, 2014, 4:38 p.m.
On Thu, Apr 17, 2014 at 09:11:11AM -0700, Andy Lutomirski wrote:
> On Thu, Apr 17, 2014 at 9:04 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > On Thu, 2014-04-17 at 08:41 -0700, Daniel J Walsh wrote:
> >> On 04/16/2014 11:59 AM, Vivek Goyal wrote:
> >> > On Wed, Apr 16, 2014 at 11:13:31AM -0700, Andy Lutomirski wrote:
> >> >> On Wed, Apr 16, 2014 at 11:06 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> >>> On Wed, Apr 16, 2014 at 09:31:25AM -0700, Andy Lutomirski wrote:
> >> >>> I am not sure how same issue with happen with cgroups. In the case of
> >> >>> socket example, you are forcing a setuid program to write to standard
> >> >>> output and that setuid program will run in same cgroup as caller and
> >> >>> will have same cgroup as caller. So even if somebody was using cgroup
> >> >>> information for authentication, atleast in this particular case it
> >> >>> will not be a problem. Both unpriviliged and priviliged programs has
> >> >>> same cgroups.
> >> >>>
> >> >> I'm not sure that there's an actual attackable program.  But I also
> >> >> see no reason to be convinced that there isn't one, and the problem
> >> >> can easily be avoided by requiring programs to explicitly ask to send
> >> >> their cgroup.
> >> > If you can't prove that there is something fundamentally wrong with
> >> > passing cgroup info to receiver, there is no reason to block these
> >> > patches either.
> >> >
> >> > We can't fix the problems which we can't see. You are saying that I
> >> > don't know what kind of problem can happen due to cgroup passing. Still
> >> > that does not mean none of the problems exist. So let us not pass cgroup
> >> > info by default and ask client to opt in.
> >> >
> >> > I don't think this is a very convincing argument.
> >> >
> >> > To me, if we can't see anything fundamentally wrong with passing cgroup
> >> > info, we should take these patches in. And once we figure out that there
> >> > is are problematic use cases, then implement SO_NOPASSCGROUP and
> >> > SO_NOPEERCRED  and allow problematic clients to opt out.
> >> >
> >> > Thanks
> >> > Vivek
> >> The two use cases for this patch are:
> >
> > Let me add some caveats to explain what is used, as the 2 cases map to
> > the 2 different new options.
> >
> >> 1 Logging, to make sure the cgroup information gets correctly attributed
> >> to the caller.
> >
> > In here the logging system wants to know *who* logged, if the cgroups of
> > the process actually doing the logging changes, that's what the logging
> > system wants to know.
> > If somehow a setuid binary can change the cgroups, then the logging
> > system *wants* to know that these logs are coming from there, because
> > they sure are not coming from the original bounded process anymore.
> >
> > This use case wants to use SO_PASSCGROUP as it wants to know who the
> > current writer is, not who opened the file descriptor.
> 
> No.  The logging daemon thinks it wants to know who the writer is, but
> the logging daemon is wrong.  It actually wants to know who composed a
> log message destined to it.  The caller of write(2) may or may not be
> the same entity.

So say, a process A writes a message, passes that message to process B and
asks process B to send the message to logger daemon. You think logger
deamon is interested in knowing who originally wrote the message (process A)
instead of who sent the message(process B)? I think this is wrong.

Logger daemon is interested in logging who actually *sent* the message
and does not care who originally *wrote* the message.

> 
> If this form of SO_PASSCGROUP somehow makes it into a pull request for
> Linus, I will ask him not to pull it and/or to revert it.  I think
> he'll agree that write(2) MUST NOT care who called it.  Yes, I don't
> see how this is exploitable on my machine, but it's a mistake for the
> same reason that the netlink crap in CVE-2014-0181 is a mistake.

There is no issue with usage of SO_PASSCGROUP information with logger
example. You are assuming that people are going to use SO_PASSCGROUP
for security related stuff.

So to me, argument should be made with systemd or any other application
if they try to use it for any unsafe purposes. Logger is a very valid
use case and for that purpose SO_PASSCGROUP patchset should be go in.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 17, 2014, 4:48 p.m.
On Thu, 2014-04-17 at 09:37 -0700, Andy Lutomirski wrote:
> On Thu, Apr 17, 2014 at 9:24 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > On Thu, 2014-04-17 at 09:11 -0700, Andy Lutomirski wrote:
> >>
> >> No.  The logging daemon thinks it wants to know who the writer is, but
> >> the logging daemon is wrong.  It actually wants to know who composed a
> >> log message destined to it.  The caller of write(2) may or may not be
> >> the same entity.
> >
> > This works both ways, and doesn't really matter, you are *no* better off
> > w/o this interface.
> >
> >> If this form of SO_PASSCGROUP somehow makes it into a pull request for
> >> Linus, I will ask him not to pull it and/or to revert it.  I think
> >> he'll agree that write(2) MUST NOT care who called it.
> >
> > And write() does not, there is no access control check being performed
> > here. This call is the same as getting the pid of the process and
> > crawling /proc with that information, just more efficient and race-free.
> 
> Doing it using the pid of writer is wrong.  So is doing it with the
> cgroup of the writer.  The fact that it's even possible to use the pid
> of the caller of write(2) is a mistake, but that particular mistake
> is, unfortunately, well-enshrined in history.
> 
> >
> > I repeat, it is *not* access control.
> >
> 
> Sure it is.
> 
> Either correct attribution of logs doesn't matter, in which case it
> makes little difference how you do it, or it does matter, in which
> case it should be done right.

Well journald can *also* get  SO_PEERCGROUP and log anomalies if the 2
differ. That is if the log happens on a connected socket.

If the log happens on a unix datagram* then SO_PEERCGROUP is not
available because there is no connect(), however write() cannot be used
either, only sendmsg() AFAIK, so the "setuid" binary attack does not
apply.

> Here's a real world example from my industry.  Suppose I used journald
> for logging on a production trading system.  The ability to write a
> log line that says "I just bought 100000 EUR/USD for
> such-and-such-price" attributed to a trading program is absolutely a
> security-sensitive operation and must be subject to access control.

Eh well if SCM_CREDNTIALS passed the euid you'd se a different user in
the logs from the one that is supposed to be writing the log ... but
that send the real uid instead oups ... but I think the point is moot
for logs, given the previous explanation.

> If Common Criteria doesn't say that audit logs need to be resistant to
> spoofing, then that's just one more reason that Common Criteria is
> broken.

IT does say a lot about audit logs, but journald is not classified as an
audit log under CC, and I am not sure it can ever be.

> I don't use journald for trading logs, and I'd be absolutely daft to
> use ordinary syslog, because ordinary syslog doesn't even pretend to
> be secure.

Right.

>   But if you're going to design something that claims to be
> secure, "well, I can't see how this issue would be exploited" is not
> good enough.

Did anyone claim the journal is secure to the level you claim it should
be. Regardless, systemd can be that secure if it uses also SO_PEERCGROUP
in the vulnerable case (when you have a connected socket).


Simo.


* I think this is the case that matters for journald

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 4:55 p.m.
On Thu, Apr 17, 2014 at 9:48 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-04-17 at 09:37 -0700, Andy Lutomirski wrote:
>> On Thu, Apr 17, 2014 at 9:24 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> > On Thu, 2014-04-17 at 09:11 -0700, Andy Lutomirski wrote:
>> >>
>> >> No.  The logging daemon thinks it wants to know who the writer is, but
>> >> the logging daemon is wrong.  It actually wants to know who composed a
>> >> log message destined to it.  The caller of write(2) may or may not be
>> >> the same entity.
>> >
>> > This works both ways, and doesn't really matter, you are *no* better off
>> > w/o this interface.
>> >
>> >> If this form of SO_PASSCGROUP somehow makes it into a pull request for
>> >> Linus, I will ask him not to pull it and/or to revert it.  I think
>> >> he'll agree that write(2) MUST NOT care who called it.
>> >
>> > And write() does not, there is no access control check being performed
>> > here. This call is the same as getting the pid of the process and
>> > crawling /proc with that information, just more efficient and race-free.
>>
>> Doing it using the pid of writer is wrong.  So is doing it with the
>> cgroup of the writer.  The fact that it's even possible to use the pid
>> of the caller of write(2) is a mistake, but that particular mistake
>> is, unfortunately, well-enshrined in history.
>>
>> >
>> > I repeat, it is *not* access control.
>> >
>>
>> Sure it is.
>>
>> Either correct attribution of logs doesn't matter, in which case it
>> makes little difference how you do it, or it does matter, in which
>> case it should be done right.
>
> Well journald can *also* get  SO_PEERCGROUP and log anomalies if the 2
> differ. That is if the log happens on a connected socket.
>
> If the log happens on a unix datagram* then SO_PEERCGROUP is not
> available because there is no connect(), however write() cannot be used
> either, only sendmsg() AFAIK, so the "setuid" binary attack does not
> apply.
>

Or you could only send SCM_CGROUP when the writer asks sendmsg to send
it, in which case this whole problem goes away.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 17, 2014, 5:12 p.m.
On Thu, Apr 17, 2014 at 09:55:08AM -0700, Andy Lutomirski wrote:
> On Thu, Apr 17, 2014 at 9:48 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > On Thu, 2014-04-17 at 09:37 -0700, Andy Lutomirski wrote:
> >> On Thu, Apr 17, 2014 at 9:24 AM, Simo Sorce <ssorce@redhat.com> wrote:
> >> > On Thu, 2014-04-17 at 09:11 -0700, Andy Lutomirski wrote:
> >> >>
> >> >> No.  The logging daemon thinks it wants to know who the writer is, but
> >> >> the logging daemon is wrong.  It actually wants to know who composed a
> >> >> log message destined to it.  The caller of write(2) may or may not be
> >> >> the same entity.
> >> >
> >> > This works both ways, and doesn't really matter, you are *no* better off
> >> > w/o this interface.
> >> >
> >> >> If this form of SO_PASSCGROUP somehow makes it into a pull request for
> >> >> Linus, I will ask him not to pull it and/or to revert it.  I think
> >> >> he'll agree that write(2) MUST NOT care who called it.
> >> >
> >> > And write() does not, there is no access control check being performed
> >> > here. This call is the same as getting the pid of the process and
> >> > crawling /proc with that information, just more efficient and race-free.
> >>
> >> Doing it using the pid of writer is wrong.  So is doing it with the
> >> cgroup of the writer.  The fact that it's even possible to use the pid
> >> of the caller of write(2) is a mistake, but that particular mistake
> >> is, unfortunately, well-enshrined in history.
> >>
> >> >
> >> > I repeat, it is *not* access control.
> >> >
> >>
> >> Sure it is.
> >>
> >> Either correct attribution of logs doesn't matter, in which case it
> >> makes little difference how you do it, or it does matter, in which
> >> case it should be done right.
> >
> > Well journald can *also* get  SO_PEERCGROUP and log anomalies if the 2
> > differ. That is if the log happens on a connected socket.
> >
> > If the log happens on a unix datagram* then SO_PEERCGROUP is not
> > available because there is no connect(), however write() cannot be used
> > either, only sendmsg() AFAIK, so the "setuid" binary attack does not
> > apply.
> >
> 
> Or you could only send SCM_CGROUP when the writer asks sendmsg to send
> it, in which case this whole problem goes away.

Sending SCM_CGROUP explicitly is also sending cgroup info at write(2) time
and if receiver uses that info for access control, it can be problematic.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 5:26 p.m.
On Thu, Apr 17, 2014 at 10:12 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Apr 17, 2014 at 09:55:08AM -0700, Andy Lutomirski wrote:
>> On Thu, Apr 17, 2014 at 9:48 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> > On Thu, 2014-04-17 at 09:37 -0700, Andy Lutomirski wrote:
>> >> On Thu, Apr 17, 2014 at 9:24 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> >> > On Thu, 2014-04-17 at 09:11 -0700, Andy Lutomirski wrote:
>> >> >>
>> >> >> No.  The logging daemon thinks it wants to know who the writer is, but
>> >> >> the logging daemon is wrong.  It actually wants to know who composed a
>> >> >> log message destined to it.  The caller of write(2) may or may not be
>> >> >> the same entity.
>> >> >
>> >> > This works both ways, and doesn't really matter, you are *no* better off
>> >> > w/o this interface.
>> >> >
>> >> >> If this form of SO_PASSCGROUP somehow makes it into a pull request for
>> >> >> Linus, I will ask him not to pull it and/or to revert it.  I think
>> >> >> he'll agree that write(2) MUST NOT care who called it.
>> >> >
>> >> > And write() does not, there is no access control check being performed
>> >> > here. This call is the same as getting the pid of the process and
>> >> > crawling /proc with that information, just more efficient and race-free.
>> >>
>> >> Doing it using the pid of writer is wrong.  So is doing it with the
>> >> cgroup of the writer.  The fact that it's even possible to use the pid
>> >> of the caller of write(2) is a mistake, but that particular mistake
>> >> is, unfortunately, well-enshrined in history.
>> >>
>> >> >
>> >> > I repeat, it is *not* access control.
>> >> >
>> >>
>> >> Sure it is.
>> >>
>> >> Either correct attribution of logs doesn't matter, in which case it
>> >> makes little difference how you do it, or it does matter, in which
>> >> case it should be done right.
>> >
>> > Well journald can *also* get  SO_PEERCGROUP and log anomalies if the 2
>> > differ. That is if the log happens on a connected socket.
>> >
>> > If the log happens on a unix datagram* then SO_PEERCGROUP is not
>> > available because there is no connect(), however write() cannot be used
>> > either, only sendmsg() AFAIK, so the "setuid" binary attack does not
>> > apply.
>> >
>>
>> Or you could only send SCM_CGROUP when the writer asks sendmsg to send
>> it, in which case this whole problem goes away.
>
> Sending SCM_CGROUP explicitly is also sending cgroup info at write(2) time
> and if receiver uses that info for access control, it can be problematic.
>

Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
who supply SCM_CGROUP are explicitly indicating that they want their
cgroup associated with that message.  Callers of write(2) and send(2)
are simply indicating that they have some bytes that they want to
shove into whatever's at the other end of the fd.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 17, 2014, 5:33 p.m.
On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
> On Thu, Apr 17, 2014 at 10:12 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Apr 17, 2014 at 09:55:08AM -0700, Andy Lutomirski wrote:
> >> On Thu, Apr 17, 2014 at 9:48 AM, Simo Sorce <ssorce@redhat.com> wrote:
> >> > On Thu, 2014-04-17 at 09:37 -0700, Andy Lutomirski wrote:
> >> >> On Thu, Apr 17, 2014 at 9:24 AM, Simo Sorce <ssorce@redhat.com> wrote:
> >> >> > On Thu, 2014-04-17 at 09:11 -0700, Andy Lutomirski wrote:
> >> >> >>
> >> >> >> No.  The logging daemon thinks it wants to know who the writer is, but
> >> >> >> the logging daemon is wrong.  It actually wants to know who composed a
> >> >> >> log message destined to it.  The caller of write(2) may or may not be
> >> >> >> the same entity.
> >> >> >
> >> >> > This works both ways, and doesn't really matter, you are *no* better off
> >> >> > w/o this interface.
> >> >> >
> >> >> >> If this form of SO_PASSCGROUP somehow makes it into a pull request for
> >> >> >> Linus, I will ask him not to pull it and/or to revert it.  I think
> >> >> >> he'll agree that write(2) MUST NOT care who called it.
> >> >> >
> >> >> > And write() does not, there is no access control check being performed
> >> >> > here. This call is the same as getting the pid of the process and
> >> >> > crawling /proc with that information, just more efficient and race-free.
> >> >>
> >> >> Doing it using the pid of writer is wrong.  So is doing it with the
> >> >> cgroup of the writer.  The fact that it's even possible to use the pid
> >> >> of the caller of write(2) is a mistake, but that particular mistake
> >> >> is, unfortunately, well-enshrined in history.
> >> >>
> >> >> >
> >> >> > I repeat, it is *not* access control.
> >> >> >
> >> >>
> >> >> Sure it is.
> >> >>
> >> >> Either correct attribution of logs doesn't matter, in which case it
> >> >> makes little difference how you do it, or it does matter, in which
> >> >> case it should be done right.
> >> >
> >> > Well journald can *also* get  SO_PEERCGROUP and log anomalies if the 2
> >> > differ. That is if the log happens on a connected socket.
> >> >
> >> > If the log happens on a unix datagram* then SO_PEERCGROUP is not
> >> > available because there is no connect(), however write() cannot be used
> >> > either, only sendmsg() AFAIK, so the "setuid" binary attack does not
> >> > apply.
> >> >
> >>
> >> Or you could only send SCM_CGROUP when the writer asks sendmsg to send
> >> it, in which case this whole problem goes away.
> >
> > Sending SCM_CGROUP explicitly is also sending cgroup info at write(2) time
> > and if receiver uses that info for access control, it can be problematic.
> >
> 
> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
> who supply SCM_CGROUP are explicitly indicating that they want their
> cgroup associated with that message.  Callers of write(2) and send(2)
> are simply indicating that they have some bytes that they want to
> shove into whatever's at the other end of the fd.

But there is no attack vector that passes by tricking setuid binaries to
write to pre-opened file descriptors on sendmsg(), and for the other
cases (connected socket) journald can always cross check with
SO_PEERCGROUP, so why do we care again ?

Simo.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 5:35 p.m.
On Thu, Apr 17, 2014 at 10:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
>>
>> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
>> who supply SCM_CGROUP are explicitly indicating that they want their
>> cgroup associated with that message.  Callers of write(2) and send(2)
>> are simply indicating that they have some bytes that they want to
>> shove into whatever's at the other end of the fd.
>
> But there is no attack vector that passes by tricking setuid binaries to
> write to pre-opened file descriptors on sendmsg(), and for the other
> cases (connected socket) journald can always cross check with
> SO_PEERCGROUP, so why do we care again ?

Because the proposed code does not do what I described, at least as
far I as I can tell.

--Andy
Simo Sorce - April 17, 2014, 5:47 p.m.
On Thu, 2014-04-17 at 10:35 -0700, Andy Lutomirski wrote:
> On Thu, Apr 17, 2014 at 10:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
> >>
> >> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
> >> who supply SCM_CGROUP are explicitly indicating that they want their
> >> cgroup associated with that message.  Callers of write(2) and send(2)
> >> are simply indicating that they have some bytes that they want to
> >> shove into whatever's at the other end of the fd.
> >
> > But there is no attack vector that passes by tricking setuid binaries to
> > write to pre-opened file descriptors on sendmsg(), and for the other
> > cases (connected socket) journald can always cross check with
> > SO_PEERCGROUP, so why do we care again ?
> 
> Because the proposed code does not do what I described, at least as
> far I as I can tell.

You do realize that we have been speaking in hypothetical for a while
now ?

Even without doing the SO_PEERCRED, you are not going to fool the log,
as it gathers a ton of other info about the process, and cgroup is just
one of the infos used to classify the log.

There are also credentials, pid, and a lot of other things.
Even if a setuid binary could be tricked to send a message with an
"impostor" cgroup don't you think you'd see other things out of place ?
(wrong uid, wrong pid, etc...).

What I am telling you is that userspace has all the tools it needs to
not get fooled, as long as cgroup information retrieved via
SO_PASSCGROUP is not uniquely used to authenticate a peer process for
connected sockets.

Simo.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 17, 2014, 5:52 p.m.
On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
> On Thu, Apr 17, 2014 at 10:12 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Apr 17, 2014 at 09:55:08AM -0700, Andy Lutomirski wrote:
> >> On Thu, Apr 17, 2014 at 9:48 AM, Simo Sorce <ssorce@redhat.com> wrote:
> >> > On Thu, 2014-04-17 at 09:37 -0700, Andy Lutomirski wrote:
> >> >> On Thu, Apr 17, 2014 at 9:24 AM, Simo Sorce <ssorce@redhat.com> wrote:
> >> >> > On Thu, 2014-04-17 at 09:11 -0700, Andy Lutomirski wrote:
> >> >> >>
> >> >> >> No.  The logging daemon thinks it wants to know who the writer is, but
> >> >> >> the logging daemon is wrong.  It actually wants to know who composed a
> >> >> >> log message destined to it.  The caller of write(2) may or may not be
> >> >> >> the same entity.
> >> >> >
> >> >> > This works both ways, and doesn't really matter, you are *no* better off
> >> >> > w/o this interface.
> >> >> >
> >> >> >> If this form of SO_PASSCGROUP somehow makes it into a pull request for
> >> >> >> Linus, I will ask him not to pull it and/or to revert it.  I think
> >> >> >> he'll agree that write(2) MUST NOT care who called it.
> >> >> >
> >> >> > And write() does not, there is no access control check being performed
> >> >> > here. This call is the same as getting the pid of the process and
> >> >> > crawling /proc with that information, just more efficient and race-free.
> >> >>
> >> >> Doing it using the pid of writer is wrong.  So is doing it with the
> >> >> cgroup of the writer.  The fact that it's even possible to use the pid
> >> >> of the caller of write(2) is a mistake, but that particular mistake
> >> >> is, unfortunately, well-enshrined in history.
> >> >>
> >> >> >
> >> >> > I repeat, it is *not* access control.
> >> >> >
> >> >>
> >> >> Sure it is.
> >> >>
> >> >> Either correct attribution of logs doesn't matter, in which case it
> >> >> makes little difference how you do it, or it does matter, in which
> >> >> case it should be done right.
> >> >
> >> > Well journald can *also* get  SO_PEERCGROUP and log anomalies if the 2
> >> > differ. That is if the log happens on a connected socket.
> >> >
> >> > If the log happens on a unix datagram* then SO_PEERCGROUP is not
> >> > available because there is no connect(), however write() cannot be used
> >> > either, only sendmsg() AFAIK, so the "setuid" binary attack does not
> >> > apply.
> >> >
> >>
> >> Or you could only send SCM_CGROUP when the writer asks sendmsg to send
> >> it, in which case this whole problem goes away.
> >
> > Sending SCM_CGROUP explicitly is also sending cgroup info at write(2) time
> > and if receiver uses that info for access control, it can be problematic.
> >
> 
> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
> who supply SCM_CGROUP are explicitly indicating that they want their
> cgroup associated with that message.  Callers of write(2) and send(2)
> are simply indicating that they have some bytes that they want to
> shove into whatever's at the other end of the fd.

So you are telling me that you want to change all code that writes to
stderr to be changed to use sendmsg() instead of write() in order to get
that information ?

If you are using datagram sockets then the sender explicitly has to use
sendmsg() already and if a setuid binary can be convinced to send
arbitrary data to an arbitrary datagram sokcet you have bigger problems
in that binary, and said binary will send you whatever cgroup it is in.
You do have an opt out clause but for a case where you do not need it
(IMO).

If you are not using datagram sockets, then you can use SO_PEERCGROUP,
to cross check and augment whatever data you receive, and that should be
enough.

Simo.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 6:04 p.m.
On Thu, Apr 17, 2014 at 10:52 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
>> On Thu, Apr 17, 2014 at 10:12 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Thu, Apr 17, 2014 at 09:55:08AM -0700, Andy Lutomirski wrote:
>> >> On Thu, Apr 17, 2014 at 9:48 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> >> > On Thu, 2014-04-17 at 09:37 -0700, Andy Lutomirski wrote:
>> >> >> On Thu, Apr 17, 2014 at 9:24 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> >> >> > On Thu, 2014-04-17 at 09:11 -0700, Andy Lutomirski wrote:
>> >> >> >>
>> >> >> >> No.  The logging daemon thinks it wants to know who the writer is, but
>> >> >> >> the logging daemon is wrong.  It actually wants to know who composed a
>> >> >> >> log message destined to it.  The caller of write(2) may or may not be
>> >> >> >> the same entity.
>> >> >> >
>> >> >> > This works both ways, and doesn't really matter, you are *no* better off
>> >> >> > w/o this interface.
>> >> >> >
>> >> >> >> If this form of SO_PASSCGROUP somehow makes it into a pull request for
>> >> >> >> Linus, I will ask him not to pull it and/or to revert it.  I think
>> >> >> >> he'll agree that write(2) MUST NOT care who called it.
>> >> >> >
>> >> >> > And write() does not, there is no access control check being performed
>> >> >> > here. This call is the same as getting the pid of the process and
>> >> >> > crawling /proc with that information, just more efficient and race-free.
>> >> >>
>> >> >> Doing it using the pid of writer is wrong.  So is doing it with the
>> >> >> cgroup of the writer.  The fact that it's even possible to use the pid
>> >> >> of the caller of write(2) is a mistake, but that particular mistake
>> >> >> is, unfortunately, well-enshrined in history.
>> >> >>
>> >> >> >
>> >> >> > I repeat, it is *not* access control.
>> >> >> >
>> >> >>
>> >> >> Sure it is.
>> >> >>
>> >> >> Either correct attribution of logs doesn't matter, in which case it
>> >> >> makes little difference how you do it, or it does matter, in which
>> >> >> case it should be done right.
>> >> >
>> >> > Well journald can *also* get  SO_PEERCGROUP and log anomalies if the 2
>> >> > differ. That is if the log happens on a connected socket.
>> >> >
>> >> > If the log happens on a unix datagram* then SO_PEERCGROUP is not
>> >> > available because there is no connect(), however write() cannot be used
>> >> > either, only sendmsg() AFAIK, so the "setuid" binary attack does not
>> >> > apply.
>> >> >
>> >>
>> >> Or you could only send SCM_CGROUP when the writer asks sendmsg to send
>> >> it, in which case this whole problem goes away.
>> >
>> > Sending SCM_CGROUP explicitly is also sending cgroup info at write(2) time
>> > and if receiver uses that info for access control, it can be problematic.
>> >
>>
>> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
>> who supply SCM_CGROUP are explicitly indicating that they want their
>> cgroup associated with that message.  Callers of write(2) and send(2)
>> are simply indicating that they have some bytes that they want to
>> shove into whatever's at the other end of the fd.
>
> So you are telling me that you want to change all code that writes to
> stderr to be changed to use sendmsg() instead of write() in order to get
> that information ?

No.  I'm telling you that I want whoever writes the logging code to
change *the logging code* to use sendmsg.

>
> If you are using datagram sockets then the sender explicitly has to use
> sendmsg() already and if a setuid binary can be convinced to send
> arbitrary data to an arbitrary datagram sokcet you have bigger problems
> in that binary, and said binary will send you whatever cgroup it is in.

Really?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 6:05 p.m.
On Thu, Apr 17, 2014 at 10:47 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-04-17 at 10:35 -0700, Andy Lutomirski wrote:
>> On Thu, Apr 17, 2014 at 10:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> > On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
>> >>
>> >> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
>> >> who supply SCM_CGROUP are explicitly indicating that they want their
>> >> cgroup associated with that message.  Callers of write(2) and send(2)
>> >> are simply indicating that they have some bytes that they want to
>> >> shove into whatever's at the other end of the fd.
>> >
>> > But there is no attack vector that passes by tricking setuid binaries to
>> > write to pre-opened file descriptors on sendmsg(), and for the other
>> > cases (connected socket) journald can always cross check with
>> > SO_PEERCGROUP, so why do we care again ?
>>
>> Because the proposed code does not do what I described, at least as
>> far I as I can tell.
>
> You do realize that we have been speaking in hypothetical for a while
> now ?
>
> Even without doing the SO_PEERCRED, you are not going to fool the log,
> as it gathers a ton of other info about the process, and cgroup is just
> one of the infos used to classify the log.
>
> There are also credentials, pid, and a lot of other things.
> Even if a setuid binary could be tricked to send a message with an
> "impostor" cgroup don't you think you'd see other things out of place ?
> (wrong uid, wrong pid, etc...).

Credentials and pid have much the same problem because SCM_CREDENTIALS
is screwed up.  That's not an excuse to screw up SCM_CGROUP in the
same way.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 17, 2014, 6:23 p.m.
On Thu, 2014-04-17 at 10:35 -0700, Andy Lutomirski wrote:
> On Thu, Apr 17, 2014 at 10:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
> >>
> >> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
> >> who supply SCM_CGROUP are explicitly indicating that they want their
> >> cgroup associated with that message.  Callers of write(2) and send(2)
> >> are simply indicating that they have some bytes that they want to
> >> shove into whatever's at the other end of the fd.
> >
> > But there is no attack vector that passes by tricking setuid binaries to
> > write to pre-opened file descriptors on sendmsg(), and for the other
> > cases (connected socket) journald can always cross check with
> > SO_PEERCGROUP, so why do we care again ?
> 
> Because the proposed code does not do what I described, at least as
> far I as I can tell.

Ok let me backtrack, apparently if you explicitly use connect() on a
datagram socket then you *can* write() (thanks to Vivek for checking
this).

So you can trick something to write() to it but you can't do
SO_PEERCGROUP on the other side, because it is not really a connected
socket, the connection is only faked on the sender side by constructing
sendmsg() messages with the original address passed into connect().

So given this unfortunate circumstance, requiring the client to
explicitly pass cgroup data on unix datagram sockets may be an
acceptable request IMO.

Perhaps this could be done with a sendmsg() header flag or simplified
ancillary data even, rather than forcing the sender process to retrieve
and construct the whole information which is already available in
kernel.

Simo.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 17, 2014, 6:31 p.m.
On Thu, 2014-04-17 at 11:04 -0700, Andy Lutomirski wrote:
> On Thu, Apr 17, 2014 at 10:52 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
> >> On Thu, Apr 17, 2014 at 10:12 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > On Thu, Apr 17, 2014 at 09:55:08AM -0700, Andy Lutomirski wrote:
> >> >> On Thu, Apr 17, 2014 at 9:48 AM, Simo Sorce <ssorce@redhat.com> wrote:
> >> >> > On Thu, 2014-04-17 at 09:37 -0700, Andy Lutomirski wrote:
> >> >> >> On Thu, Apr 17, 2014 at 9:24 AM, Simo Sorce <ssorce@redhat.com> wrote:
> >> >> >> > On Thu, 2014-04-17 at 09:11 -0700, Andy Lutomirski wrote:
> >> >> >> >>
> >> >> >> >> No.  The logging daemon thinks it wants to know who the writer is, but
> >> >> >> >> the logging daemon is wrong.  It actually wants to know who composed a
> >> >> >> >> log message destined to it.  The caller of write(2) may or may not be
> >> >> >> >> the same entity.
> >> >> >> >
> >> >> >> > This works both ways, and doesn't really matter, you are *no* better off
> >> >> >> > w/o this interface.
> >> >> >> >
> >> >> >> >> If this form of SO_PASSCGROUP somehow makes it into a pull request for
> >> >> >> >> Linus, I will ask him not to pull it and/or to revert it.  I think
> >> >> >> >> he'll agree that write(2) MUST NOT care who called it.
> >> >> >> >
> >> >> >> > And write() does not, there is no access control check being performed
> >> >> >> > here. This call is the same as getting the pid of the process and
> >> >> >> > crawling /proc with that information, just more efficient and race-free.
> >> >> >>
> >> >> >> Doing it using the pid of writer is wrong.  So is doing it with the
> >> >> >> cgroup of the writer.  The fact that it's even possible to use the pid
> >> >> >> of the caller of write(2) is a mistake, but that particular mistake
> >> >> >> is, unfortunately, well-enshrined in history.
> >> >> >>
> >> >> >> >
> >> >> >> > I repeat, it is *not* access control.
> >> >> >> >
> >> >> >>
> >> >> >> Sure it is.
> >> >> >>
> >> >> >> Either correct attribution of logs doesn't matter, in which case it
> >> >> >> makes little difference how you do it, or it does matter, in which
> >> >> >> case it should be done right.
> >> >> >
> >> >> > Well journald can *also* get  SO_PEERCGROUP and log anomalies if the 2
> >> >> > differ. That is if the log happens on a connected socket.
> >> >> >
> >> >> > If the log happens on a unix datagram* then SO_PEERCGROUP is not
> >> >> > available because there is no connect(), however write() cannot be used
> >> >> > either, only sendmsg() AFAIK, so the "setuid" binary attack does not
> >> >> > apply.
> >> >> >
> >> >>
> >> >> Or you could only send SCM_CGROUP when the writer asks sendmsg to send
> >> >> it, in which case this whole problem goes away.
> >> >
> >> > Sending SCM_CGROUP explicitly is also sending cgroup info at write(2) time
> >> > and if receiver uses that info for access control, it can be problematic.
> >> >
> >>
> >> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
> >> who supply SCM_CGROUP are explicitly indicating that they want their
> >> cgroup associated with that message.  Callers of write(2) and send(2)
> >> are simply indicating that they have some bytes that they want to
> >> shove into whatever's at the other end of the fd.
> >
> > So you are telling me that you want to change all code that writes to
> > stderr to be changed to use sendmsg() instead of write() in order to get
> > that information ?
> 
> No.  I'm telling you that I want whoever writes the logging code to
> change *the logging code* to use sendmsg.
> 

> > If you are using datagram sockets then the sender explicitly has to use
> > sendmsg() already and if a setuid binary can be convinced to send
> > arbitrary data to an arbitrary datagram sokcet you have bigger problems
> > in that binary, and said binary will send you whatever cgroup it is in.
> 
> Really?

I want to retract my comment in light of the fact you can use connect()
on a datagram socket and then kernel will "helpfully" then allow you to
use write() on it, I forgot about this detail.

For the logging q. above I wouldn't see any issue unless the journald
people were planning on using connect() themselves to pie things like
stderr over a datagram socket.
If that were the case, well then ...

OTOH, I still need to understand how you attack a setuid binary to fake
a cgroup, it's not like uig/gid information that is changed by the
simple fact of invoking the setuid binary, and this is a brand new
contract, so it is not unreasonable to put in the security
considerations that a setuid binary should take extracare to where they
emit stdout/stderr message should they decide to change their cgroup ...

Simo.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 6:33 p.m.
On Thu, Apr 17, 2014 at 11:23 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-04-17 at 10:35 -0700, Andy Lutomirski wrote:
>> On Thu, Apr 17, 2014 at 10:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> > On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
>> >>
>> >> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
>> >> who supply SCM_CGROUP are explicitly indicating that they want their
>> >> cgroup associated with that message.  Callers of write(2) and send(2)
>> >> are simply indicating that they have some bytes that they want to
>> >> shove into whatever's at the other end of the fd.
>> >
>> > But there is no attack vector that passes by tricking setuid binaries to
>> > write to pre-opened file descriptors on sendmsg(), and for the other
>> > cases (connected socket) journald can always cross check with
>> > SO_PEERCGROUP, so why do we care again ?
>>
>> Because the proposed code does not do what I described, at least as
>> far I as I can tell.
>
> Ok let me backtrack, apparently if you explicitly use connect() on a
> datagram socket then you *can* write() (thanks to Vivek for checking
> this).
>
> So you can trick something to write() to it but you can't do
> SO_PEERCGROUP on the other side, because it is not really a connected
> socket, the connection is only faked on the sender side by constructing
> sendmsg() messages with the original address passed into connect().
>
> So given this unfortunate circumstance, requiring the client to
> explicitly pass cgroup data on unix datagram sockets may be an
> acceptable request IMO.
>
> Perhaps this could be done with a sendmsg() header flag or simplified
> ancillary data even, rather than forcing the sender process to retrieve
> and construct the whole information which is already available in
> kernel.

It seems reasonable to me to have the sender give a blank SCM_CGROUP
and to let the kernel populate it.

I'm still a bit vague on what happens if the sender is in two
different cgroups.  Which one wins?  Presumably the unified hierarchy
one if one of them is in use, but I haven't kept track of all the
changes there.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 17, 2014, 6:50 p.m.
On Thu, Apr 17, 2014 at 02:23:33PM -0400, Simo Sorce wrote:
> On Thu, 2014-04-17 at 10:35 -0700, Andy Lutomirski wrote:
> > On Thu, Apr 17, 2014 at 10:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > > On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
> > >>
> > >> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
> > >> who supply SCM_CGROUP are explicitly indicating that they want their
> > >> cgroup associated with that message.  Callers of write(2) and send(2)
> > >> are simply indicating that they have some bytes that they want to
> > >> shove into whatever's at the other end of the fd.
> > >
> > > But there is no attack vector that passes by tricking setuid binaries to
> > > write to pre-opened file descriptors on sendmsg(), and for the other
> > > cases (connected socket) journald can always cross check with
> > > SO_PEERCGROUP, so why do we care again ?
> > 
> > Because the proposed code does not do what I described, at least as
> > far I as I can tell.
> 
> Ok let me backtrack, apparently if you explicitly use connect() on a
> datagram socket then you *can* write() (thanks to Vivek for checking
> this).
> 
> So you can trick something to write() to it but you can't do
> SO_PEERCGROUP on the other side, because it is not really a connected
> socket, the connection is only faked on the sender side by constructing
> sendmsg() messages with the original address passed into connect().
> 
> So given this unfortunate circumstance, requiring the client to
> explicitly pass cgroup data on unix datagram sockets may be an
> acceptable request IMO.
> 
> Perhaps this could be done with a sendmsg() header flag or simplified
> ancillary data even, rather than forcing the sender process to retrieve
> and construct the whole information which is already available in
> kernel.

So what would be the protocol here? When should somebody send an
SCM_CGROUP message using sendmsg()?

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 17, 2014, 6:57 p.m.
On Thu, Apr 17, 2014 at 02:50:23PM -0400, Vivek Goyal wrote:
> On Thu, Apr 17, 2014 at 02:23:33PM -0400, Simo Sorce wrote:
> > On Thu, 2014-04-17 at 10:35 -0700, Andy Lutomirski wrote:
> > > On Thu, Apr 17, 2014 at 10:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > > > On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
> > > >>
> > > >> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
> > > >> who supply SCM_CGROUP are explicitly indicating that they want their
> > > >> cgroup associated with that message.  Callers of write(2) and send(2)
> > > >> are simply indicating that they have some bytes that they want to
> > > >> shove into whatever's at the other end of the fd.
> > > >
> > > > But there is no attack vector that passes by tricking setuid binaries to
> > > > write to pre-opened file descriptors on sendmsg(), and for the other
> > > > cases (connected socket) journald can always cross check with
> > > > SO_PEERCGROUP, so why do we care again ?
> > > 
> > > Because the proposed code does not do what I described, at least as
> > > far I as I can tell.
> > 
> > Ok let me backtrack, apparently if you explicitly use connect() on a
> > datagram socket then you *can* write() (thanks to Vivek for checking
> > this).
> > 
> > So you can trick something to write() to it but you can't do
> > SO_PEERCGROUP on the other side, because it is not really a connected
> > socket, the connection is only faked on the sender side by constructing
> > sendmsg() messages with the original address passed into connect().
> > 
> > So given this unfortunate circumstance, requiring the client to
> > explicitly pass cgroup data on unix datagram sockets may be an
> > acceptable request IMO.
> > 
> > Perhaps this could be done with a sendmsg() header flag or simplified
> > ancillary data even, rather than forcing the sender process to retrieve
> > and construct the whole information which is already available in
> > kernel.
> 
> So what would be the protocol here? When should somebody send an
> SCM_CGROUP message using sendmsg()?

I don't know how it will even be used for systemd logging case. systemd
provides various ways to connect stdout of services. So say a service's
stdout is connected to a connected datagram socket and all printf()
messages to stdout are being logged by receiver in journal. Now how
would sender know that it is supposed to send SCM_CGROUP? One needs
to modify printf() now?

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 7:06 p.m.
On Thu, Apr 17, 2014 at 11:57 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Apr 17, 2014 at 02:50:23PM -0400, Vivek Goyal wrote:
>> On Thu, Apr 17, 2014 at 02:23:33PM -0400, Simo Sorce wrote:
>> > On Thu, 2014-04-17 at 10:35 -0700, Andy Lutomirski wrote:
>> > > On Thu, Apr 17, 2014 at 10:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> > > > On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
>> > > >>
>> > > >> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
>> > > >> who supply SCM_CGROUP are explicitly indicating that they want their
>> > > >> cgroup associated with that message.  Callers of write(2) and send(2)
>> > > >> are simply indicating that they have some bytes that they want to
>> > > >> shove into whatever's at the other end of the fd.
>> > > >
>> > > > But there is no attack vector that passes by tricking setuid binaries to
>> > > > write to pre-opened file descriptors on sendmsg(), and for the other
>> > > > cases (connected socket) journald can always cross check with
>> > > > SO_PEERCGROUP, so why do we care again ?
>> > >
>> > > Because the proposed code does not do what I described, at least as
>> > > far I as I can tell.
>> >
>> > Ok let me backtrack, apparently if you explicitly use connect() on a
>> > datagram socket then you *can* write() (thanks to Vivek for checking
>> > this).
>> >
>> > So you can trick something to write() to it but you can't do
>> > SO_PEERCGROUP on the other side, because it is not really a connected
>> > socket, the connection is only faked on the sender side by constructing
>> > sendmsg() messages with the original address passed into connect().
>> >
>> > So given this unfortunate circumstance, requiring the client to
>> > explicitly pass cgroup data on unix datagram sockets may be an
>> > acceptable request IMO.
>> >
>> > Perhaps this could be done with a sendmsg() header flag or simplified
>> > ancillary data even, rather than forcing the sender process to retrieve
>> > and construct the whole information which is already available in
>> > kernel.
>>
>> So what would be the protocol here? When should somebody send an
>> SCM_CGROUP message using sendmsg()?
>
> I don't know how it will even be used for systemd logging case. systemd
> provides various ways to connect stdout of services. So say a service's
> stdout is connected to a connected datagram socket and all printf()
> messages to stdout are being logged by receiver in journal. Now how
> would sender know that it is supposed to send SCM_CGROUP? One needs
> to modify printf() now?

Does connecting stdout to a datagram socket really work well?  The
systemd function connect_logger_as looks like it's using stream
sockets, one per service, connected to /run/systemd/journal/stdout.
There's some rather strange logic in journald to authenticate the
thing that connects (using SO_PEERCRED!), but I don't see why this
code would even want to use SCM_CGROUP.

IOW, write(2) issues notwithstanding, I'm still wondering what the use
case for this whole thing is.

>
> Thanks
> Vivek
Simo Sorce - April 17, 2014, 7:10 p.m.
On Thu, 2014-04-17 at 14:50 -0400, Vivek Goyal wrote:
> On Thu, Apr 17, 2014 at 02:23:33PM -0400, Simo Sorce wrote:
> > On Thu, 2014-04-17 at 10:35 -0700, Andy Lutomirski wrote:
> > > On Thu, Apr 17, 2014 at 10:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > > > On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
> > > >>
> > > >> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
> > > >> who supply SCM_CGROUP are explicitly indicating that they want their
> > > >> cgroup associated with that message.  Callers of write(2) and send(2)
> > > >> are simply indicating that they have some bytes that they want to
> > > >> shove into whatever's at the other end of the fd.
> > > >
> > > > But there is no attack vector that passes by tricking setuid binaries to
> > > > write to pre-opened file descriptors on sendmsg(), and for the other
> > > > cases (connected socket) journald can always cross check with
> > > > SO_PEERCGROUP, so why do we care again ?
> > > 
> > > Because the proposed code does not do what I described, at least as
> > > far I as I can tell.
> > 
> > Ok let me backtrack, apparently if you explicitly use connect() on a
> > datagram socket then you *can* write() (thanks to Vivek for checking
> > this).
> > 
> > So you can trick something to write() to it but you can't do
> > SO_PEERCGROUP on the other side, because it is not really a connected
> > socket, the connection is only faked on the sender side by constructing
> > sendmsg() messages with the original address passed into connect().
> > 
> > So given this unfortunate circumstance, requiring the client to
> > explicitly pass cgroup data on unix datagram sockets may be an
> > acceptable request IMO.
> > 
> > Perhaps this could be done with a sendmsg() header flag or simplified
> > ancillary data even, rather than forcing the sender process to retrieve
> > and construct the whole information which is already available in
> > kernel.
> 
> So what would be the protocol here? When should somebody send an
> SCM_CGROUP message using sendmsg()?

Andy's point is that the sender application should explicitly decide to
send SCM_CGROUP data to the receiver, and if it doesn't do that then the
receiver will get no cgroup information at all even if it used
SO_PASSCGROUP to ask for it.

I am still not convinced that we should not allow to send the
information regardless, I think we should simply make it very clear that
there are ways this information may be used improperly so it shouldn't
be the unique factor in deciding anything.

After all, the receiving application can use SCM_CREDENTIALS to get the
pid and then try to check for the cgroup information in /proc, which is
terribly racy and expensive compared to unconditional SO_PASSCGROUP

For the log case, even faking cgroup information shouldn't be enough to
fool the logger, because it crosschecks information and takes in account
things like uids and pids.

Another option would be to allow unconditional SO_PASSCGROUP only if the
socket was opened (on the sender side) by a process with some explicit
privilege to do so ?  But I am not sure this really changes anything if
journald wants to use this mechanism on stderr.
And we are still talking about a hypothetical attack were a setuid
binary can be tricked to change its cgroup to another one chosen by the
attacker, and cause the same process to emit messages on stdout/stederr
in a format that does not give away something funny is going on, and
also counting on the fact that the logger will completely ignore that
the pid is different etc...

At this point I think journald people need to give a little bit more
details on how they plan to use SO_PASSCGROUP.

For my use cases I care only about streams and SO_PEERCGROUP that does
not have any of the (perceived) issues of SO_PASSCGROUP.

Simo.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 17, 2014, 7:15 p.m.
On Thu, 2014-04-17 at 12:06 -0700, Andy Lutomirski wrote:
> On Thu, Apr 17, 2014 at 11:57 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Apr 17, 2014 at 02:50:23PM -0400, Vivek Goyal wrote:
> >> On Thu, Apr 17, 2014 at 02:23:33PM -0400, Simo Sorce wrote:
> >> > On Thu, 2014-04-17 at 10:35 -0700, Andy Lutomirski wrote:
> >> > > On Thu, Apr 17, 2014 at 10:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
> >> > > > On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
> >> > > >>
> >> > > >> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
> >> > > >> who supply SCM_CGROUP are explicitly indicating that they want their
> >> > > >> cgroup associated with that message.  Callers of write(2) and send(2)
> >> > > >> are simply indicating that they have some bytes that they want to
> >> > > >> shove into whatever's at the other end of the fd.
> >> > > >
> >> > > > But there is no attack vector that passes by tricking setuid binaries to
> >> > > > write to pre-opened file descriptors on sendmsg(), and for the other
> >> > > > cases (connected socket) journald can always cross check with
> >> > > > SO_PEERCGROUP, so why do we care again ?
> >> > >
> >> > > Because the proposed code does not do what I described, at least as
> >> > > far I as I can tell.
> >> >
> >> > Ok let me backtrack, apparently if you explicitly use connect() on a
> >> > datagram socket then you *can* write() (thanks to Vivek for checking
> >> > this).
> >> >
> >> > So you can trick something to write() to it but you can't do
> >> > SO_PEERCGROUP on the other side, because it is not really a connected
> >> > socket, the connection is only faked on the sender side by constructing
> >> > sendmsg() messages with the original address passed into connect().
> >> >
> >> > So given this unfortunate circumstance, requiring the client to
> >> > explicitly pass cgroup data on unix datagram sockets may be an
> >> > acceptable request IMO.
> >> >
> >> > Perhaps this could be done with a sendmsg() header flag or simplified
> >> > ancillary data even, rather than forcing the sender process to retrieve
> >> > and construct the whole information which is already available in
> >> > kernel.
> >>
> >> So what would be the protocol here? When should somebody send an
> >> SCM_CGROUP message using sendmsg()?
> >
> > I don't know how it will even be used for systemd logging case. systemd
> > provides various ways to connect stdout of services. So say a service's
> > stdout is connected to a connected datagram socket and all printf()
> > messages to stdout are being logged by receiver in journal. Now how
> > would sender know that it is supposed to send SCM_CGROUP? One needs
> > to modify printf() now?
> 
> Does connecting stdout to a datagram socket really work well?  The
> systemd function connect_logger_as looks like it's using stream
> sockets, one per service, connected to /run/systemd/journal/stdout.
> There's some rather strange logic in journald to authenticate the
> thing that connects (using SO_PEERCRED!), but I don't see why this
> code would even want to use SCM_CGROUP.
> 
> IOW, write(2) issues notwithstanding, I'm still wondering what the use
> case for this whole thing is.

I "think" the use case is to aggregate all the logs that belong to a
specific service by using a cgroup name, then, as long as children do
not close stdout/stderr anything they emit would be captured and
properly filed with the rest of the logs from the other process of the
same control group, which has been made to mean "the service".

I also "think" using datagram sockets may be an attempt to reduce the
number of sockets that need to be kept open and polled on the receiving
side.

But I really haven't discussed the use case with them, we just happened
to come to similar needs wrt knowing information about cgroups, and it
seemed logical to combined all needs into a single patchset given they
are related from the kernel point of view.

Simo.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 17, 2014, 7:16 p.m.
On Thu, Apr 17, 2014 at 03:10:17PM -0400, Simo Sorce wrote:

[..]
> At this point I think journald people need to give a little bit more
> details on how they plan to use SO_PASSCGROUP.
> 
> For my use cases I care only about streams and SO_PEERCGROUP that does
> not have any of the (perceived) issues of SO_PASSCGROUP.

Ok, so we agree that SO_PEERCGROUP is not a problem. And it solves the
problem for some of the use cases.

And there is lot of contention on the SO_PASSCGROUP option.

So how about taking one step at a time. Get SO_PEERCGROUP in first and
then get into more details on how SO_PASSCGROUP will exactly be used and
then decide what to do.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 7:19 p.m.
On Thu, Apr 17, 2014 at 12:15 PM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-04-17 at 12:06 -0700, Andy Lutomirski wrote:
>> On Thu, Apr 17, 2014 at 11:57 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Thu, Apr 17, 2014 at 02:50:23PM -0400, Vivek Goyal wrote:
>> >> On Thu, Apr 17, 2014 at 02:23:33PM -0400, Simo Sorce wrote:
>> >> > On Thu, 2014-04-17 at 10:35 -0700, Andy Lutomirski wrote:
>> >> > > On Thu, Apr 17, 2014 at 10:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> >> > > > On Thu, 2014-04-17 at 10:26 -0700, Andy Lutomirski wrote:
>> >> > > >>
>> >> > > >> Not really.  write(2) can't send SCM_CGROUP.  Callers of sendmsg(2)
>> >> > > >> who supply SCM_CGROUP are explicitly indicating that they want their
>> >> > > >> cgroup associated with that message.  Callers of write(2) and send(2)
>> >> > > >> are simply indicating that they have some bytes that they want to
>> >> > > >> shove into whatever's at the other end of the fd.
>> >> > > >
>> >> > > > But there is no attack vector that passes by tricking setuid binaries to
>> >> > > > write to pre-opened file descriptors on sendmsg(), and for the other
>> >> > > > cases (connected socket) journald can always cross check with
>> >> > > > SO_PEERCGROUP, so why do we care again ?
>> >> > >
>> >> > > Because the proposed code does not do what I described, at least as
>> >> > > far I as I can tell.
>> >> >
>> >> > Ok let me backtrack, apparently if you explicitly use connect() on a
>> >> > datagram socket then you *can* write() (thanks to Vivek for checking
>> >> > this).
>> >> >
>> >> > So you can trick something to write() to it but you can't do
>> >> > SO_PEERCGROUP on the other side, because it is not really a connected
>> >> > socket, the connection is only faked on the sender side by constructing
>> >> > sendmsg() messages with the original address passed into connect().
>> >> >
>> >> > So given this unfortunate circumstance, requiring the client to
>> >> > explicitly pass cgroup data on unix datagram sockets may be an
>> >> > acceptable request IMO.
>> >> >
>> >> > Perhaps this could be done with a sendmsg() header flag or simplified
>> >> > ancillary data even, rather than forcing the sender process to retrieve
>> >> > and construct the whole information which is already available in
>> >> > kernel.
>> >>
>> >> So what would be the protocol here? When should somebody send an
>> >> SCM_CGROUP message using sendmsg()?
>> >
>> > I don't know how it will even be used for systemd logging case. systemd
>> > provides various ways to connect stdout of services. So say a service's
>> > stdout is connected to a connected datagram socket and all printf()
>> > messages to stdout are being logged by receiver in journal. Now how
>> > would sender know that it is supposed to send SCM_CGROUP? One needs
>> > to modify printf() now?
>>
>> Does connecting stdout to a datagram socket really work well?  The
>> systemd function connect_logger_as looks like it's using stream
>> sockets, one per service, connected to /run/systemd/journal/stdout.
>> There's some rather strange logic in journald to authenticate the
>> thing that connects (using SO_PEERCRED!), but I don't see why this
>> code would even want to use SCM_CGROUP.
>>
>> IOW, write(2) issues notwithstanding, I'm still wondering what the use
>> case for this whole thing is.
>
> I "think" the use case is to aggregate all the logs that belong to a
> specific service by using a cgroup name, then, as long as children do
> not close stdout/stderr anything they emit would be captured and
> properly filed with the rest of the logs from the other process of the
> same control group, which has been made to mean "the service".

Would it be worth asking the people who actually intend to use this
thing to comment, then?  As far as I can tell, journald already does
this by using one socket per service.

>
> I also "think" using datagram sockets may be an attempt to reduce the
> number of sockets that need to be kept open and polled on the receiving
> side.

I think this can be done today with recvfrom.  At service creation
time, systemd creates a new datagram socket, connects it, calls
getsockname, and records that somewhere.

The downside is that there is no notification that your peer is gone.
There's also no notification that a cgroup is gone, so that part makes
little difference.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 7:23 p.m.
On Thu, Apr 17, 2014 at 11:50 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Apr 17, 2014 at 02:23:33PM -0400, Simo Sorce wrote:
>> Perhaps this could be done with a sendmsg() header flag or simplified
>> ancillary data even, rather than forcing the sender process to retrieve
>> and construct the whole information which is already available in
>> kernel.
>
> So what would be the protocol here? When should somebody send an
> SCM_CGROUP message using sendmsg()?
>

Presumably whenever it knows talking to someone who cares about the
cgroup.  Since I don't understand why you would want to know someone's
cgroup when speaking a protocol that didn't previously require cgroup
infomation, I don't really have a good answer for you.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 17, 2014, 7:46 p.m.
On Thu, Apr 17, 2014 at 12:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Apr 17, 2014 at 03:10:17PM -0400, Simo Sorce wrote:
>
> [..]
>> At this point I think journald people need to give a little bit more
>> details on how they plan to use SO_PASSCGROUP.
>>
>> For my use cases I care only about streams and SO_PEERCGROUP that does
>> not have any of the (perceived) issues of SO_PASSCGROUP.
>
> Ok, so we agree that SO_PEERCGROUP is not a problem. And it solves the
> problem for some of the use cases.
>
> And there is lot of contention on the SO_PASSCGROUP option.
>
> So how about taking one step at a time. Get SO_PEERCGROUP in first and
> then get into more details on how SO_PASSCGROUP will exactly be used and
> then decide what to do.

My only objection to SO_PEERCGROUP is that I don't believe that a
legitimate use case exists.  I think the feature itself is safe to
add.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 21, 2014, 3:03 p.m.
On Thu, Apr 17, 2014 at 12:46:22PM -0700, Andy Lutomirski wrote:
> On Thu, Apr 17, 2014 at 12:16 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Apr 17, 2014 at 03:10:17PM -0400, Simo Sorce wrote:
> >
> > [..]
> >> At this point I think journald people need to give a little bit more
> >> details on how they plan to use SO_PASSCGROUP.
> >>
> >> For my use cases I care only about streams and SO_PEERCGROUP that does
> >> not have any of the (perceived) issues of SO_PASSCGROUP.
> >
> > Ok, so we agree that SO_PEERCGROUP is not a problem. And it solves the
> > problem for some of the use cases.
> >
> > And there is lot of contention on the SO_PASSCGROUP option.
> >
> > So how about taking one step at a time. Get SO_PEERCGROUP in first and
> > then get into more details on how SO_PASSCGROUP will exactly be used and
> > then decide what to do.
> 
> My only objection to SO_PEERCGROUP is that I don't believe that a
> legitimate use case exists.  I think the feature itself is safe to
> add.

So what happened to logger use case where logger accepts stream
connections and logs the cgroup of client too.

W.r.t systemd, looks like journald is accepting connections at
/run/systemd/journal/stdout. (stdout_stream_new() and
server_open_stdout_socket()).

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 21, 2014, 3:47 p.m.
On Mon, Apr 21, 2014 at 8:03 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> So what happened to logger use case where logger accepts stream
> connections and logs the cgroup of client too.
>
> W.r.t systemd, looks like journald is accepting connections at
> /run/systemd/journal/stdout. (stdout_stream_new() and
> server_open_stdout_socket()).

See stdout_stream_line.  As far as I can tell, journald already
implements this in mostly sensible manner, with no help from the
kernel required.

On my system, journalctl -f -o verbose says:

Mon 2014-04-21 08:34:52.732065 PDT
[s=4970edca25b4456d80b00e6e4cefd94b;i=2010;b=2d2454632c0f4f998a8d0158156ab743;m=66f5d274a;t=4f78f3d9a11a1;x=9902671f5a7e7bcc]
    _UID=0
    _BOOT_ID=2d2454632c0f4f998a8d0158156ab743
[...]
    _GID=500
    _AUDIT_SESSION=1
    _AUDIT_LOGINUID=500
    _SYSTEMD_CGROUP=/user.slice/user-500.slice/session-1.scope
    _SYSTEMD_SESSION=1
    _SYSTEMD_OWNER_UID=500
    _SYSTEMD_UNIT=session-1.scope
    _SYSTEMD_SLICE=user-500.slice
    SYSLOG_IDENTIFIER=sudo
    _COMM=sudo
    _EXE=/usr/bin/sudo
    _SELINUX_CONTEXT=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
    MESSAGE=luto : TTY=pts/1 ; PWD=/home/luto/apps/systemd ; USER=root
; COMMAND=/usr/bin/journalctl -f -a
    _PID=32393
    _CMDLINE=sudo journalctl -f -a
    _SOURCE_REALTIME_TIMESTAMP=1398094492732065

Unfortunately, the code in journald seems to be rather buggy and
prefers the unit that it derives from the (racy!) cg_path_get_unit
hack over the unit that is *already knows* (search the journald
sources for STDOUT_STREAM_UNIT_ID), but the right fix is the FIX THE
FSCKING JOURNALD BUG, not to change the kernel.

To summarize from my reading of how this crap words:

When a unit is created, systemd opens a stream socket pointing at
/run/systemd/journal/stdout.  It tells journald the unit, along with
lots of other useful information.  journald records this association
between the socket and the unit.  Systemd could tell journald the
cgroup here, too, if it wanted it.

Systemd then starts the unit, passing it the socket as stdout, if
configured to do so.

That unit logs something.  Journald then uses the crappy, racy ucred
mechanism to resolve the cgroup, login id, unit, etc.

Your proposals are to either (a) replace that with an almost-as-buggy
SO_PASSCGROUP option or to add SO_PEERCGROUP.  The latter would allow
journald to figure out the cgroup that opened the socket.  The problem
here is two-fold.  One: systemd already knows the cgroup it intends to
use, and it can tell journald without kernel help.  Two: Systemd seems
to open the stdout socket right before setting the cgroup, so the
kernel's idea of what cgroup opened the socket is crap.

The solution to all of this seems straightforward: fix journald to use
the information it already has, trusted, without races, from systemd.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 23, 2014, 3:07 p.m.
On Mon, Apr 21, 2014 at 08:47:51AM -0700, Andy Lutomirski wrote:

[..]
> To summarize from my reading of how this crap words:
> 
> When a unit is created, systemd opens a stream socket pointing at
> /run/systemd/journal/stdout.  It tells journald the unit, along with
> lots of other useful information.  journald records this association
> between the socket and the unit.  Systemd could tell journald the
> cgroup here, too, if it wanted it.

Ok, that's a fair point. I looked at  connect_logger_as() and I see
that systemd does connect() on behalf of service being launched and
sets up fd and passes bunch of information to journald. So cgroup could
be one of the information and that would act like SO_PEERCGROUP in
this specific case. Not sure why it is not done though. I will let
systemd folks comment on that. I don't have enough background here.

But this works in this specific case where there is a mechanism to 
pass meta information to receiver. What about SSSD use case where 
they want to know the cgroup of client and possibly provide differentiated
service based on client.

Also Dan Walsh mentioned that what if a process directly wants to open
journal socket and log something to journal. 

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski - April 23, 2014, 3:37 p.m.
On Wed, Apr 23, 2014 at 8:07 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Mon, Apr 21, 2014 at 08:47:51AM -0700, Andy Lutomirski wrote:
>
> [..]
>> To summarize from my reading of how this crap words:
>>
>> When a unit is created, systemd opens a stream socket pointing at
>> /run/systemd/journal/stdout.  It tells journald the unit, along with
>> lots of other useful information.  journald records this association
>> between the socket and the unit.  Systemd could tell journald the
>> cgroup here, too, if it wanted it.
>
> Ok, that's a fair point. I looked at  connect_logger_as() and I see
> that systemd does connect() on behalf of service being launched and
> sets up fd and passes bunch of information to journald. So cgroup could
> be one of the information and that would act like SO_PEERCGROUP in
> this specific case. Not sure why it is not done though. I will let
> systemd folks comment on that. I don't have enough background here.
>
> But this works in this specific case where there is a mechanism to
> pass meta information to receiver. What about SSSD use case where
> they want to know the cgroup of client and possibly provide differentiated
> service based on client.

Separate sockets sounds like it will work just fine, if not better, here, to me.

>
> Also Dan Walsh mentioned that what if a process directly wants to open
> journal socket and log something to journal.

This is a fair point.  I think that cgroup is a very odd thing to log,
but systemd unit isn't so strange.

As I've said, I won't object strongly to SO_PEERCGROUP.  I think that
applications that rely on it are likely to be annoying to their users,
but I think the API itself is okay.

I'd still rather see a good, general solution for sensibly identifying
yourself to your peer, but that can wait.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 23, 2014, 4:01 p.m.
On Wed, Apr 23, 2014 at 08:37:56AM -0700, Andy Lutomirski wrote:
> On Wed, Apr 23, 2014 at 8:07 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Mon, Apr 21, 2014 at 08:47:51AM -0700, Andy Lutomirski wrote:
> >
> > [..]
> >> To summarize from my reading of how this crap words:
> >>
> >> When a unit is created, systemd opens a stream socket pointing at
> >> /run/systemd/journal/stdout.  It tells journald the unit, along with
> >> lots of other useful information.  journald records this association
> >> between the socket and the unit.  Systemd could tell journald the
> >> cgroup here, too, if it wanted it.
> >
> > Ok, that's a fair point. I looked at  connect_logger_as() and I see
> > that systemd does connect() on behalf of service being launched and
> > sets up fd and passes bunch of information to journald. So cgroup could
> > be one of the information and that would act like SO_PEERCGROUP in
> > this specific case. Not sure why it is not done though. I will let
> > systemd folks comment on that. I don't have enough background here.
> >
> > But this works in this specific case where there is a mechanism to
> > pass meta information to receiver. What about SSSD use case where
> > they want to know the cgroup of client and possibly provide differentiated
> > service based on client.
> 
> Separate sockets sounds like it will work just fine, if not better, here, to me.

That's one way of doing it. But that can't be the argument to not provide
another way of doing same thing with the help of single socket instead
of maintaining multiple ones.

Setting up separate socket for each client will require knowing every
client in advance so that one can socket appropriately. But mutiplexing
everything on one socket, does not have that constraint and should make
life little bit easier.

I think they will be just two different ways of doing things and user
can choose one depending on what suits them.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 23, 2014, 4:45 p.m.
On Tue, Apr 15, 2014 at 08:47:54PM -0700, Andy Lutomirski wrote:

[..]
> Here's an attack against SO_PASSCGROUP, as you implemented it: connect
> a socket and get someone else to write(2) to it.  This isn't very
> hard.  Now you've impersonated.

If this is a problem then I think kernel requires fixing. Because kernel
will apply all resource management policies based on the cgroup at write(2)
time and not based on open() time.

For example, blkio throttling policies. If you get a process in other
cgroup to read/write to an fd, then IO throttling rules of that cgroup
are applied and it does not matter who opened fd in first place.

So SO_PASSCGROUP is not exactly same as SO_PASSCRED in that sense. If
there are issues w.r.t authorization/authentication etc, then that
should be a recommendation to user space that don't use cgroup info
for unsafe cases.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - April 23, 2014, 5:29 p.m.
From: Vivek Goyal <vgoyal@redhat.com>
Date: Wed, 23 Apr 2014 12:45:37 -0400

> On Tue, Apr 15, 2014 at 08:47:54PM -0700, Andy Lutomirski wrote:
> 
> [..]
>> Here's an attack against SO_PASSCGROUP, as you implemented it: connect
>> a socket and get someone else to write(2) to it.  This isn't very
>> hard.  Now you've impersonated.
> 
> If this is a problem then I think kernel requires fixing. Because kernel
> will apply all resource management policies based on the cgroup at write(2)
> time and not based on open() time.

Anyways, this is not even worth discussing.

We already agreed that the cgroup passed at write time with SO_PASSGROUP
enabled should be the socket creation time cgroup.

Just like SO_PASSCRED does.

The identity given is thus the one at open() time.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 24, 2014, 8:34 p.m.
On Wed, Apr 23, 2014 at 01:29:55PM -0400, David Miller wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> Date: Wed, 23 Apr 2014 12:45:37 -0400
> 
> > On Tue, Apr 15, 2014 at 08:47:54PM -0700, Andy Lutomirski wrote:
> > 
> > [..]
> >> Here's an attack against SO_PASSCGROUP, as you implemented it: connect
> >> a socket and get someone else to write(2) to it.  This isn't very
> >> hard.  Now you've impersonated.
> > 
> > If this is a problem then I think kernel requires fixing. Because kernel
> > will apply all resource management policies based on the cgroup at write(2)
> > time and not based on open() time.
> 
> Anyways, this is not even worth discussing.
> 
> We already agreed that the cgroup passed at write time with SO_PASSGROUP
> enabled should be the socket creation time cgroup.
> 
> Just like SO_PASSCRED does.
> 
> The identity given is thus the one at open() time.

Hi Dave,

I am not sure about few things. So I will ask.

By open() time you mean at socket() time or at connect() time? I guess
you mean connect() time as at socket() time there are no access control
checks and even if socket fd is passed around, it does not matter. So
connect() seems to be more close to open() as opposed to socket().

You also mentioned that you want SO_PEERCGROUP and SO_PASSCGROUP as pairs
like SO_PEERCRED and SO_PASSCRED. But to me, SO_PEERCRED and SO_PASSCRED
are not *exact* pairs and are little different in their semantics.
SO_PEERCRED gives us client creds at connect() time while SO_PASSCRED
client's real creds at sendmsg() time. SO_PASSCRED does not store client's
credential's at connect() time for datagram sockets.

So which semantics are we looking for. If we are looking for same
semantics as PEERCRED/PASSCRED, then I think my patches are already
there and don't need modifications.

But if we are looking for deviation SO_PASSCRED and store and pass
sender's cgroup info retrieved at connect() time, then that requires
rework of patches.
 
W.r.t privilige escalation by tricking setuid program to write to a 
descriptor, I don't think my patches are susceptible to that. setuid
programs don't change cgroups themselves, so even if caller tricks setuid
to write to an fd, receiver will still receive same cgroup info.

If caller launches setuid program in a different cgroup (using cgexec or some
other program), that means caller has the privilige to get into that
cgroup and there is no magic privilige escalation here because caller
has the permission to go in destination cgroup. IOW, caller can not
force/trick setuid program to run in a cgroup where caller does not
have permission to run itself.

Only caveat to this theory is what happens if setuid program changes
cgroup at his own. As we don't have any notion of real/effective cgroups,
I guess there needs to be a restriction on setuid programs to not change
cgroups at their own.

So which semantics you are looking for. Same as current SO_PASSCRED or
freeze cgroup info at connect() time and pass that info.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - April 24, 2014, 8:48 p.m.
From: Vivek Goyal <vgoyal@redhat.com>
Date: Thu, 24 Apr 2014 16:34:27 -0400

> By open() time you mean at socket() time or at connect() time?

I mean at all of the places at which init_peercred() occurs.

> You also mentioned that you want SO_PEERCGROUP and SO_PASSCGROUP as
> pairs like SO_PEERCRED and SO_PASSCRED.  But to me, SO_PEERCRED and
> SO_PASSCRED are not *exact* pairs and are little different in their
> semantics.  SO_PEERCRED gives us client creds at connect() time
> while SO_PASSCRED client's real creds at sendmsg() time. SO_PASSCRED
> does not store client's credential's at connect() time for datagram
> sockets.

Then you haven't been following the discussion.

The client's credentials at sendmsg()/write() time are "DO NOT CARE".

You cannot even guarentee the semantics in the logging example if
you ask for these "client identity at sendmsg() time" semantics.

What if the event occured when the client was in cgroup1, and the
log message goes out after it has been moved into cgroup2?

That is just proof that this whole idea is fundamentally flawed.

You guys need to come up with something else to achieve your goals,
this isn't it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal - April 24, 2014, 9:04 p.m.
On Thu, Apr 24, 2014 at 04:48:20PM -0400, David Miller wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> Date: Thu, 24 Apr 2014 16:34:27 -0400
> 
> > By open() time you mean at socket() time or at connect() time?
> 
> I mean at all of the places at which init_peercred() occurs.

init_peercred() is only used for stream sockets and not for datagram
sockets. Hence the confusion that what are cgroup semantics for datagram
sockets.

> 
> > You also mentioned that you want SO_PEERCGROUP and SO_PASSCGROUP as
> > pairs like SO_PEERCRED and SO_PASSCRED.  But to me, SO_PEERCRED and
> > SO_PASSCRED are not *exact* pairs and are little different in their
> > semantics.  SO_PEERCRED gives us client creds at connect() time
> > while SO_PASSCRED client's real creds at sendmsg() time. SO_PASSCRED
> > does not store client's credential's at connect() time for datagram
> > sockets.
> 
> Then you haven't been following the discussion.
> 
> The client's credentials at sendmsg()/write() time are "DO NOT CARE".
> 
> You cannot even guarentee the semantics in the logging example if
> you ask for these "client identity at sendmsg() time" semantics.
> 
> What if the event occured when the client was in cgroup1, and the
> log message goes out after it has been moved into cgroup2?

Does it really matter. If a task is changing cgroup while it is doing
other operations, I don't think one can guarantee which cgroup kernel
will effectively uses for a particular operation. It will depend on
what was cgroup when kernel actually made task_cgroup() call. 

> 
> That is just proof that this whole idea is fundamentally flawed.

I don't think anybody is looking for such strong semantics. For example,
same issue will exist if receiver gets the message and looks up
/proc/pid/cgroup to associate message with a cgroup. By then task might
have changed cgroup.

So I really don't think changing cgroup is a problem w.r.t API.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - April 24, 2014, 9:11 p.m.
From: Vivek Goyal <vgoyal@redhat.com>
Date: Thu, 24 Apr 2014 17:04:29 -0400

> Does it really matter.

Good question, if it doesn't matter, you might as well log garbage.

There are a lot of logical holes in this discussion.

Real UIDs are always reported at sendmsg() time, not effective ones
(the UID "at sendmsg() time").

Therefore if we were to add CGROUP passing at sendmsg() time it should
report the "real cgroup", the one the task has at the time the file
descriptor / socket was "created".
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simo Sorce - April 25, 2014, 12:29 a.m.
On Thu, 2014-04-24 at 17:11 -0400, David Miller wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> Date: Thu, 24 Apr 2014 17:04:29 -0400
> 
> > Does it really matter.
> 
> Good question, if it doesn't matter, you might as well log garbage.
> 
> There are a lot of logical holes in this discussion.
> 
> Real UIDs are always reported at sendmsg() time, not effective ones
> (the UID "at sendmsg() time").
> 
> Therefore if we were to add CGROUP passing at sendmsg() time it should
> report the "real cgroup", the one the task has at the time the file
> descriptor / socket was "created".

Given we are creating a new interface ... should we just send both the
cgroup at creation time and (optionally, if it differs) the cgroup the
process has at sendmsg() time, and let the peer sort out which one it is
interested in ?

Simo.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 7178353..8e67ddb 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -88,4 +88,5 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 486212b..71e795a 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -81,4 +81,5 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/uapi/asm/socket.h b/arch/cris/include/uapi/asm/socket.h
index 89a09e3..b339e52 100644
--- a/arch/cris/include/uapi/asm/socket.h
+++ b/arch/cris/include/uapi/asm/socket.h
@@ -83,6 +83,7 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index c4d90bc..4fc46fb 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -81,5 +81,6 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 62c196d..5e77320 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -90,5 +90,6 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 6e04a7d..aec9a78 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -81,4 +81,5 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index cfbd84b..30354ea 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -99,4 +99,5 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 73467fe..c68786d 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -81,4 +81,5 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 24d8913..6d3447a 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@ 
 #define SO_BPF_EXTENSIONS	0x4029
 
 #define SO_PEERCGROUP           0x402a
+#define SO_PASSCGROUP           0x402b
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index 50106be..89a55b8 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -88,4 +88,5 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 4ae2f3c..f5b10d8 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -87,4 +87,5 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 1056168..d1c5f33 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -77,6 +77,7 @@ 
 #define SO_BPF_EXTENSIONS	0x0032
 
 #define SO_PEERCGROUP           0x0033
+#define SO_PASSCGROUP           0x0034
 
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 947bc6e..47b3593 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -92,4 +92,5 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/linux/net.h b/include/linux/net.h
index 94734a6..5ec6d71 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -39,6 +39,7 @@  struct net;
 #define SOCK_PASSCRED		3
 #define SOCK_PASSSEC		4
 #define SOCK_EXTERNALLY_ALLOCATED 5
+#define SOCK_PASSCGROUP		6
 
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8e98297..9993d65 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -130,6 +130,7 @@  static inline struct cmsghdr * cmsg_nxthdr (struct msghdr *__msg, struct cmsghdr
 #define	SCM_RIGHTS	0x01		/* rw: access rights (array of int) */
 #define SCM_CREDENTIALS 0x02		/* rw: struct ucred		*/
 #define SCM_SECURITY	0x03		/* rw: security label		*/
+#define SCM_CGROUP	0x04		/* r: cgroup path of sender	*/
 
 struct ucred {
 	__u32	pid;
diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index a175ba4..7301371 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -36,6 +36,7 @@  struct unix_skb_parms {
 	u32			secid;		/* Security ID		*/
 #endif
 	u32			consumed;
+	char			*cgroup_path;
 };
 
 #define UNIXCB(skb) 	(*(struct unix_skb_parms *)&((skb)->cb))
diff --git a/include/net/scm.h b/include/net/scm.h
index 262532d..477c154 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -31,6 +31,7 @@  struct scm_cookie {
 #ifdef CONFIG_SECURITY_NETWORK
 	u32			secid;		/* Passed security ID 	*/
 #endif
+	char			*cgroup_path;
 };
 
 void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm);
@@ -64,11 +65,18 @@  static __inline__ void scm_destroy_cred(struct scm_cookie *scm)
 	scm->pid  = NULL;
 }
 
+static __inline__ void scm_free_cgroup_path(struct scm_cookie *scm)
+{
+	kfree(scm->cgroup_path);
+	scm->cgroup_path = NULL;
+}
+
 static __inline__ void scm_destroy(struct scm_cookie *scm)
 {
 	scm_destroy_cred(scm);
 	if (scm->fp)
 		__scm_destroy(scm);
+	scm_free_cgroup_path(scm);
 }
 
 static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
@@ -110,7 +118,8 @@  static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
 				struct scm_cookie *scm, int flags)
 {
 	if (!msg->msg_control) {
-		if (test_bit(SOCK_PASSCRED, &sock->flags) || scm->fp)
+		if (test_bit(SOCK_PASSCRED, &sock->flags) || scm->fp ||
+		    test_bit(SOCK_PASSCGROUP, &sock->flags))
 			msg->msg_flags |= MSG_CTRUNC;
 		scm_destroy(scm);
 		return;
@@ -130,10 +139,17 @@  static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
 
 	scm_passec(sock, msg, scm);
 
-	if (!scm->fp)
-		return;
-	
-	scm_detach_fds(msg, scm);
+	if (scm->fp)
+		scm_detach_fds(msg, scm);
+
+	if (scm->cgroup_path) {
+		if (test_bit(SOCK_PASSCGROUP, &sock->flags)) {
+			int len = strlen(scm->cgroup_path) + 1;
+			put_cmsg(msg, SOL_SOCKET, SCM_CGROUP, len,
+				 scm->cgroup_path);
+		}
+		scm_free_cgroup_path(scm);
+	}
 }
 
 
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index e86be5b..aad9ddb 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -83,5 +83,6 @@ 
 #define SO_BPF_EXTENSIONS	48
 
 #define SO_PEERCGROUP		49
+#define SO_PASSCGROUP		50
 
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 2926774..76ff2a6 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -779,6 +779,13 @@  set_rcvbuf:
 			clear_bit(SOCK_PASSCRED, &sock->flags);
 		break;
 
+	case SO_PASSCGROUP:
+		if (valbool)
+			set_bit(SOCK_PASSCGROUP, &sock->flags);
+		else
+			clear_bit(SOCK_PASSCGROUP, &sock->flags);
+		break;
+
 	case SO_TIMESTAMP:
 	case SO_TIMESTAMPNS:
 		if (valbool)  {
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 892ea50..85e1e4b 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -155,6 +155,23 @@  static inline void unix_set_secdata(struct scm_cookie *scm, struct sk_buff *skb)
 { }
 #endif /* CONFIG_SECURITY_NETWORK */
 
+#ifdef CONFIG_CGROUPS
+static inline void scm_set_cgroup_path(struct scm_cookie *scm,
+				       struct sk_buff *skb)
+{
+	if (!UNIXCB(skb).cgroup_path)
+		return;
+
+	/* Transfer the ownership of cgroup path buffer from skb to scm */
+	scm->cgroup_path = UNIXCB(skb).cgroup_path;
+	UNIXCB(skb).cgroup_path = NULL;
+}
+#else
+static inline void scm_set_cgroup_path(struct scm_cookie *scm,
+				       struct sk_buff *skb)
+{ }
+#endif
+
 /*
  *  SMP locking strategy:
  *    hash table is protected with spinlock unix_table_lock
@@ -1326,6 +1343,8 @@  static void unix_sock_inherit_flags(const struct socket *old,
 		set_bit(SOCK_PASSCRED, &new->flags);
 	if (test_bit(SOCK_PASSSEC, &old->flags))
 		set_bit(SOCK_PASSSEC, &new->flags);
+	if (test_bit(SOCK_PASSCGROUP, &old->flags))
+		set_bit(SOCK_PASSCGROUP, &new->flags);
 }
 
 static int unix_accept(struct socket *sock, struct socket *newsock, int flags)
@@ -1427,6 +1446,11 @@  static void unix_destruct_scm(struct sk_buff *skb)
 	if (UNIXCB(skb).fp)
 		unix_detach_fds(&scm, skb);
 
+	if (UNIXCB(skb).cgroup_path) {
+		scm.cgroup_path = UNIXCB(skb).cgroup_path;
+		UNIXCB(skb).cgroup_path = NULL;
+	}
+
 	/* Alas, it calls VFS */
 	/* So fscking what? fput() had been SMP-safe since the last Summer */
 	scm_destroy(&scm);
@@ -1480,6 +1504,7 @@  static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen
 	if (scm->fp && send_fds)
 		err = unix_attach_fds(scm, skb);
 
+	UNIXCB(skb).cgroup_path = NULL;
 	skb->destructor = unix_destruct_scm;
 	return err;
 }
@@ -1502,6 +1527,55 @@  static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock,
 	}
 }
 
+/* Should be called with "other" state spin lock held */
+static bool
+should_add_cgroup_path(const struct socket *sock, const struct sock *other)
+{
+#ifdef CONFIG_CGROUPS
+	/*
+	 * In stream sockets, it is possible that client starts sending
+	 * data (sendmsg()) before server has finished accept(). In that
+	 * case other->sk_socket will be null for a brief window and
+	 * will be set as soon as accept() completes.
+	 *
+	 * Send cgroup path for this small duration when other->sk_socket
+	 * is not set. Soon accept() should finish and
+	 * other->sk_socket->flags will decide whether to send cgroup
+	 * path or not.
+	 */
+	if (test_bit(SOCK_PASSCGROUP, &sock->flags) ||
+	    !other->sk_socket ||
+	    test_bit(SOCK_PASSCGROUP, &other->sk_socket->flags))
+		return true;
+#endif
+
+	return false;
+}
+
+static int skb_alloc_install_cgroup_path(struct sk_buff *skb)
+{
+
+#ifdef CONFIG_CGROUPS
+	char *cgroup_path, *path;
+
+	cgroup_path = kzalloc(PATH_MAX, GFP_KERNEL);
+	if (!cgroup_path)
+		return -ENOMEM;
+
+	path = task_cgroup_path(current, cgroup_path, PATH_MAX);
+	if (!path) {
+		kfree(cgroup_path);
+		return -ENAMETOOLONG;
+	}
+
+	if (path != cgroup_path)
+		memmove(cgroup_path, path, strlen(path) + 1);
+
+	UNIXCB(skb).cgroup_path = cgroup_path;
+#endif
+	return 0;
+}
+
 /*
  *	Send AF_UNIX data.
  */
@@ -1523,6 +1597,7 @@  static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	struct scm_cookie tmp_scm;
 	int max_level;
 	int data_len = 0;
+	bool need_cgroup_path = false;
 
 	if (NULL == siocb->scm)
 		siocb->scm = &tmp_scm;
@@ -1600,7 +1675,20 @@  restart:
 		goto out_free;
 	}
 
+alloc_cgroup:
+	if (need_cgroup_path && !UNIXCB(skb).cgroup_path) {
+		err = skb_alloc_install_cgroup_path(skb);
+		if (err)
+			goto out_free;
+	}
+
 	unix_state_lock(other);
+	need_cgroup_path = should_add_cgroup_path(sock, other);
+	if (need_cgroup_path && !UNIXCB(skb).cgroup_path) {
+		unix_state_unlock(other);
+		goto alloc_cgroup;
+	}
+
 	err = -EPERM;
 	if (!unix_may_send(sk, other))
 		goto out_unlock;
@@ -1698,6 +1786,7 @@  static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	bool fds_sent = false;
 	int max_level;
 	int data_len;
+	bool need_cgroup_path = false;
 
 	if (NULL == siocb->scm)
 		siocb->scm = &tmp_scm;
@@ -1759,12 +1848,26 @@  static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 			goto out_err;
 		}
 
+alloc_cgroup:
+		if (need_cgroup_path && !UNIXCB(skb).cgroup_path) {
+			err = skb_alloc_install_cgroup_path(skb);
+			if (err) {
+				kfree_skb(skb);
+				goto out_err;
+			}
+		}
+
 		unix_state_lock(other);
 
 		if (sock_flag(other, SOCK_DEAD) ||
 		    (other->sk_shutdown & RCV_SHUTDOWN))
 			goto pipe_err_free;
 
+		need_cgroup_path = should_add_cgroup_path(sock, other);
+		if (need_cgroup_path && !UNIXCB(skb).cgroup_path) {
+			unix_state_unlock(other);
+			goto alloc_cgroup;
+		}
 		maybe_add_creds(skb, sock, other);
 		skb_queue_tail(&other->sk_receive_queue, skb);
 		if (max_level > unix_sk(other)->recursion_level)
@@ -1897,6 +2000,8 @@  static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 	scm_set_cred(siocb->scm, UNIXCB(skb).pid, UNIXCB(skb).uid, UNIXCB(skb).gid);
 	unix_set_secdata(siocb->scm, skb);
 
+	scm_set_cgroup_path(siocb->scm, skb);
+
 	if (!(flags & MSG_PEEK)) {
 		if (UNIXCB(skb).fp)
 			unix_detach_fds(siocb->scm, skb);
@@ -1982,6 +2087,7 @@  static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 	int copied = 0;
 	int noblock = flags & MSG_DONTWAIT;
 	int check_creds = 0;
+	bool check_cgroups = false;
 	int target;
 	int err = 0;
 	long timeo;
@@ -2081,6 +2187,22 @@  again:
 			check_creds = 1;
 		}
 
+		/* Don't glue messages from writers in different cgroups */
+		if (check_cgroups) {
+			/* Previous skb had cgroup path and this one does not */
+			if (UNIXCB(skb).cgroup_path == NULL)
+				break;
+
+			if (strcmp(UNIXCB(skb).cgroup_path,
+			    siocb->scm->cgroup_path))
+				break;
+		} else if (test_bit(SOCK_PASSCGROUP, &sock->flags) &&
+			   UNIXCB(skb).cgroup_path != NULL) {
+			/* Copy cgroup path */
+			scm_set_cgroup_path(siocb->scm, skb);
+			check_cgroups = true;
+		}
+
 		/* Copy address just once */
 		if (sunaddr) {
 			unix_copy_addr(msg, skb->sk);