ipv4: Use standard iovec primitive in raw_probe_proto_opt

Message ID 20141106055023.GA28865@gondor.apana.org.au
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Herbert Xu Nov. 6, 2014, 5:50 a.m. UTC
On Thu, Nov 06, 2014 at 03:25:34AM +0000, Al Viro wrote:
>
> 	* there's some really weird stuff in there.  Just what is this
> static int raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg)
> {

It looks like newbie coding, that's all.  There's nothing tricky
here as far as I can tell.  We're just trying to fetch the ICMP
header to seed the IPsec lookup.

So how about this rewrite? I'm assuming that you're not going
to get rid of memcpy_fromiovecend/memcpy_toiovecend; if you
are, let me know and I'll redo this with iterators.

ipv4: Use standard iovec primitive in raw_probe_proto_opt

The function raw_probe_proto_opt tries to extract the first two
bytes from the user input in order to seed the IPsec lookup for
ICMP packets.  In doing so it's processing iovec by hand and
overcomplicating things.

This patch replaces the manual iovec processing with a call to
memcpy_fromiovecend.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


Cheers,

Comments

Al Viro Nov. 6, 2014, 6:43 a.m. UTC | #1
On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> +	/* We only need the first two bytes. */
> +	err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> +	if (err)
> +		return err;
> +
> +	fl4->fl4_icmp_type = icmph.type;
> +	fl4->fl4_icmp_code = icmph.code;

That's more readable, but that exposes another problem in there - we read
the same piece of userland data twice, with no promise whatsoever that we'll
get the same value both times...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Nov. 6, 2014, 6:46 a.m. UTC | #2
On Thu, Nov 06, 2014 at 06:43:18AM +0000, Al Viro wrote:
> On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> > +	/* We only need the first two bytes. */
> > +	err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> > +	if (err)
> > +		return err;
> > +
> > +	fl4->fl4_icmp_type = icmph.type;
> > +	fl4->fl4_icmp_code = icmph.code;
> 
> That's more readable, but that exposes another problem in there - we read
> the same piece of userland data twice, with no promise whatsoever that we'll
> get the same value both times...

Sure, but you have to be root anyway to write to raw sockets.

Patches are welcome :)

Cheers,
Al Viro Nov. 6, 2014, 7:11 a.m. UTC | #3
On Thu, Nov 06, 2014 at 02:46:29PM +0800, Herbert Xu wrote:
> On Thu, Nov 06, 2014 at 06:43:18AM +0000, Al Viro wrote:
> > On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> > > +	/* We only need the first two bytes. */
> > > +	err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	fl4->fl4_icmp_type = icmph.type;
> > > +	fl4->fl4_icmp_code = icmph.code;
> > 
> > That's more readable, but that exposes another problem in there - we read
> > the same piece of userland data twice, with no promise whatsoever that we'll
> > get the same value both times...
> 
> Sure, but you have to be root anyway to write to raw sockets.

Point, but that might very well be a pattern to watch for - there's at least
one more instance in TIPC (also not exploitable, according to TIPC folks)
and such bugs are easily repeated...

BTW, I've picked the tun and macvtap related bits from another part of old
queue; see vfs.git#untested-macvtap - it's on top of #iov_iter-net and it's
really completely untested.  Back then I was mostly interested in killing
as many ->aio_write() instances as I could, so it's only the "send" side of
things.
Jon Maloy Nov. 6, 2014, 9:55 a.m. UTC | #4
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Al Viro
> Sent: November-06-14 8:11 AM
> To: Herbert Xu
> Cc: David Miller; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> bcrl@kvack.org; Masahide Nakamura; Hideaki YOSHIFUJI
> Subject: Re: ipv4: Use standard iovec primitive in raw_probe_proto_opt
> 
> On Thu, Nov 06, 2014 at 02:46:29PM +0800, Herbert Xu wrote:
> > On Thu, Nov 06, 2014 at 06:43:18AM +0000, Al Viro wrote:
> > > On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> > > > +	/* We only need the first two bytes. */
> > > > +	err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> > > > +	if (err)
> > > > +		return err;
> > > > +
> > > > +	fl4->fl4_icmp_type = icmph.type;
> > > > +	fl4->fl4_icmp_code = icmph.code;
> > >
> > > That's more readable, but that exposes another problem in there - we
> > > read the same piece of userland data twice, with no promise
> > > whatsoever that we'll get the same value both times...
> >
> > Sure, but you have to be root anyway to write to raw sockets.
> 
> Point, but that might very well be a pattern to watch for - there's at least one
> more instance in TIPC (also not exploitable, according to TIPC folks) and such

I don't recall this, and I can't see where it would be either. Can you please
point to where it is?

///jon

> bugs are easily repeated...
> 
> BTW, I've picked the tun and macvtap related bits from another part of old
> queue; see vfs.git#untested-macvtap - it's on top of #iov_iter-net and it's
> really completely untested.  Back then I was mostly interested in killing as
> many ->aio_write() instances as I could, so it's only the "send" side of things.
David Miller Nov. 6, 2014, 9:28 p.m. UTC | #5
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 6 Nov 2014 14:46:29 +0800

> On Thu, Nov 06, 2014 at 06:43:18AM +0000, Al Viro wrote:
>> On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
>> > +	/* We only need the first two bytes. */
>> > +	err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
>> > +	if (err)
>> > +		return err;
>> > +
>> > +	fl4->fl4_icmp_type = icmph.type;
>> > +	fl4->fl4_icmp_code = icmph.code;
>> 
>> That's more readable, but that exposes another problem in there - we read
>> the same piece of userland data twice, with no promise whatsoever that we'll
>> get the same value both times...
> 
> Sure, but you have to be root anyway to write to raw sockets.
> 
> Patches are welcome :)

I'd agree with this root-only argument maybe 15 years ago, but with
containers and stuff like that we want to prevent root X from messing
up the machine for root Y.

This is a recurring topic, and I'd strongly like to avoid adding new
ways that these kinds of problems can happen.

For example, I'm still on the hook to address the AF_NETLINK mmap TX
code, which has a similarly abusable issue.
Al Viro Nov. 6, 2014, 10:16 p.m. UTC | #6
On Thu, Nov 06, 2014 at 09:55:31AM +0000, Jon Maloy wrote:
> > Point, but that might very well be a pattern to watch for - there's at least one
> > more instance in TIPC (also not exploitable, according to TIPC folks) and such
> 
> I don't recall this, and I can't see where it would be either. Can you please
> point to where it is?

The same dest_name_check() thing.  This
        if (copy_from_user(&hdr, m->msg_iov[0].iov_base, sizeof(hdr)))
                return -EFAULT;
        if ((ntohs(hdr.tcm_type) & 0xC000) && (!capable(CAP_NET_ADMIN)))
                return -EACCES;
is easily bypassed.  Suppose you want to send a packet with these two
bits in ->tcm_type not being 00, and you don't have CAP_NET_ADMIN.
Not a problem - spawn two threads sharing memory, have one trying to
call sendmsg() while another keeps flipping these two bits.  Sooner
or later you'll get the timing right and have these bits observed as 00
in dest_name_check() and 11 when it comes to memcpy_fromiovecend() actually
copying the whole thing.  And considering that the interval between those
two is much longer than the loop in the second thread would take on
each iteration, I'd expect the odds around 25% per attempted sendmsg().

IOW, this test is either pointless and can be removed completely, or there's
an exploitable race.  As far as I understand from your replies both back then
and in another branch of this thread, it's the former and the proper fix is
to remove at least that part of dest_name_check().  So this case is also
not something exploitable, but it certainly matches the same pattern.

My point was simply that this pattern is worth watching for - recurrent bug
classes like that have a good chance to spawn an instance that will be
exploitable.
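
The check-then-copy pattern described above can be modelled deterministically in userspace. The following is only a sketch under stated assumptions: `struct hdr`, `struct user_buf`, `fetch()` and both `send_*` helpers are hypothetical names invented for illustration, not TIPC or kernel code, and the "other thread" is simulated by a counter that flips the two top bits before a chosen fetch rather than by a real concurrent writer.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header; only the top two bits of `type` matter here. */
struct hdr {
    uint16_t type;              /* host byte order, for simplicity */
};

#define PRIV_BITS 0xC000u

/* Models user memory that another thread may rewrite between reads. */
struct user_buf {
    struct hdr data;
    int reads;                  /* fetches performed so far */
    int flip_after;             /* set PRIV_BITS just before fetch number N */
};

/* Stand-in for a copy from userland: every call re-reads the live buffer. */
static void fetch(struct hdr *dst, struct user_buf *u)
{
    if (u->reads == u->flip_after)
        u->data.type |= PRIV_BITS;      /* the "other thread" wins the race */
    memcpy(dst, &u->data, sizeof(*dst));
    u->reads++;
}

/* Racy: validate one copy, then fetch again and send the second copy. */
static int send_double_fetch(struct user_buf *u, int privileged,
                             struct hdr *sent)
{
    struct hdr check;

    fetch(&check, u);
    if ((check.type & PRIV_BITS) && !privileged)
        return -EACCES;
    fetch(sent, u);                     /* may differ from what was checked */
    return 0;
}

/* Safe: fetch once, then validate and send the same snapshot. */
static int send_single_fetch(struct user_buf *u, int privileged,
                             struct hdr *sent)
{
    struct hdr snap;

    fetch(&snap, u);
    if ((snap.type & PRIV_BITS) && !privileged)
        return -EACCES;
    *sent = snap;
    return 0;
}
```

With `flip_after = 1` the unprivileged caller passes the check in `send_double_fetch()` and still gets the privileged bits into the data that is sent. `send_single_fetch()` either sends the clean snapshot or fails with -EACCES, depending on when the flip lands. The model says nothing about how often a real race would succeed.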
Herbert Xu Nov. 7, 2014, 2 a.m. UTC | #7
On Thu, Nov 06, 2014 at 04:28:08PM -0500, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Thu, 6 Nov 2014 14:46:29 +0800
> 
> > On Thu, Nov 06, 2014 at 06:43:18AM +0000, Al Viro wrote:
> >> On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> >> > +	/* We only need the first two bytes. */
> >> > +	err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> >> > +	if (err)
> >> > +		return err;
> >> > +
> >> > +	fl4->fl4_icmp_type = icmph.type;
> >> > +	fl4->fl4_icmp_code = icmph.code;
> >> 
> >> That's more readable, but that exposes another problem in there - we read
> >> the same piece of userland data twice, with no promise whatsoever that we'll
> >> get the same value both times...
> > 
> > Sure, but you have to be root anyway to write to raw sockets.
> > 
> > Patches are welcome :)
> 
> I'd agree with this root-only argument maybe 15 years ago, but with
> containers and stuff like that we want to prevent root X from messing
> up the machine for root Y.
> 
> This is a recurring topic, and I'd strongly like to avoid adding new
> ways that these kinds of problems can happen.
> 
> For example, I'm still on the hook to address the AF_NETLINK mmap TX
> code, which has a similarly abusable issue.

Fair enough.  Even though the bug existed prior to my patch I'll
see if we could get rid of it.

Cheers,
Herbert Xu Nov. 7, 2014, 1:25 p.m. UTC | #8
Hi Dave:

This series rewrites the function raw_probe_proto_opt in a more
readable fashion, and then fixes the long-standing bug where we
read the probed bytes twice which means that what we're using to
probe may in fact be invalid.

Cheers,
David Miller Nov. 10, 2014, 7:26 p.m. UTC | #9
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 7 Nov 2014 21:25:53 +0800

> This series rewrites the function raw_probe_proto_opt in a more
> readable fashion, and then fixes the long-standing bug where we
> read the probed bytes twice which means that what we're using to
> probe may in fact be invalid.

Series applied to net-next, thanks Herbert.
Al Viro Nov. 28, 2014, 5:14 a.m. UTC | #10
On Thu, Nov 06, 2014 at 10:16:08PM +0000, Al Viro wrote:
> On Thu, Nov 06, 2014 at 09:55:31AM +0000, Jon Maloy wrote:
> > > Point, but that might very well be a pattern to watch for - there's at least one
> > > more instance in TIPC (also not exploitable, according to TIPC folks) and such
> > 
> > I don't recall this, and I can't see where it would be either. Can you please
> > point to where it is?
> 
> The same dest_name_check() thing.  This
>         if (copy_from_user(&hdr, m->msg_iov[0].iov_base, sizeof(hdr)))
>                 return -EFAULT;
>         if ((ntohs(hdr.tcm_type) & 0xC000) && (!capable(CAP_NET_ADMIN)))
>                 return -EACCES;
> is easily bypassed.  Suppose you want to send a packet with these two
> bits in ->tcm_type not being 00, and you don't have CAP_NET_ADMIN.
> Not a problem - spawn two threads sharing memory, have one trying to
> call sendmsg() while another keeps flipping these two bits.  Sooner
> or later you'll get the timing right and have these bits observed as 00
> in dest_name_check() and 11 when it comes to memcpy_fromiovecend() actually
> copying the whole thing.  And considering that the interval between those
> two is much longer than the loop in the second thread would take on
> each iteration, I'd expect the odds around 25% per attempted sendmsg().
> 
> IOW, this test is either pointless and can be removed completely, or there's
> an exploitable race.  As far as I understand from your replies both back then
> and in another branch of this thread, it's the former and the proper fix is
> to remove at least that part of dest_name_check().  So this case is also
> not something exploitable, but it certainly matches the same pattern.
> 
> My point was simply that this pattern is worth watching for - recurrent bug
> classes like that have a good chance to spawn an instance that will be
> exploitable.

Ping?  Can we simply remove dest_name_check() completely?  That's one of the
few remaining obstacles to making ->sendmsg() iov_iter-clean.  For now I'm
simply commenting its call out in tipc_sendmsg(); if it _is_ needed for
anything, we'll need to get rid of that double copying from userland.  I can
do that, but my impression from your comments back in April is that you
planned to remove the damn check anyway.

Another question: in tipc_send_stream() we have
        mtu = tsk->max_pkt;
        send = min_t(uint, dsz - sent, TIPC_MAX_USER_MSG_SIZE);
        __skb_queue_head_init(&head);
        rc = tipc_msg_build(mhdr, m, sent, send, mtu, &head);
        if (unlikely(rc < 0))
                goto exit;
        do {   
                if (likely(!tsk_conn_cong(tsk))) {
                        rc = tipc_link_xmit(&head, dnode, ref);
                        if (likely(!rc)) {
                                tsk->sent_unacked++;
                                sent += send;
                                if (sent == dsz)
                                        break;
                                goto next;
                        }
                        if (rc == -EMSGSIZE) {
                                tsk->max_pkt = tipc_node_get_mtu(dnode, ref);
                                goto next;
                        }

How can it hit that EMSGSIZE?  AFAICS, it can come only from
int __tipc_link_xmit(struct tipc_link *link, struct sk_buff_head *list)
{
        struct tipc_msg *msg = buf_msg(skb_peek(list));
        uint psz = msg_size(msg);
...
        uint mtu = link->max_pkt;
...
        /* Has valid packet limit been used ? */   
        if (unlikely(psz > mtu)) {
                __skb_queue_purge(list);
                return -EMSGSIZE;
        }

and msg_size() is basically the bits copied into skb by tipc_msg_build() and
set by msg_set_size() in there.  And unless I'm seriously misreading that
function, it can't be more than pktmax argument, i.e. mtu.  So unless something
manages to crap into our skb or change mtu right under us, it shouldn't be
possible.  And mtu (i.e. ->max_pkt) ought to be protected by lock_sock() there.

What's going on there?
Patch

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 739db31..04f67e1 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -422,48 +422,20 @@  error:
 
 static int raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg)
 {
-	struct iovec *iov;
-	u8 __user *type = NULL;
-	u8 __user *code = NULL;
-	int probed = 0;
-	unsigned int i;
+	struct icmphdr icmph;
+	int err;
 
-	if (!msg->msg_iov)
+	if (fl4->flowi4_proto != IPPROTO_ICMP)
 		return 0;
 
-	for (i = 0; i < msg->msg_iovlen; i++) {
-		iov = &msg->msg_iov[i];
-		if (!iov)
-			continue;
-
-		switch (fl4->flowi4_proto) {
-		case IPPROTO_ICMP:
-			/* check if one-byte field is readable or not. */
-			if (iov->iov_base && iov->iov_len < 1)
-				break;
-
-			if (!type) {
-				type = iov->iov_base;
-				/* check if code field is readable or not. */
-				if (iov->iov_len > 1)
-					code = type + 1;
-			} else if (!code)
-				code = iov->iov_base;
-
-			if (type && code) {
-				if (get_user(fl4->fl4_icmp_type, type) ||
-				    get_user(fl4->fl4_icmp_code, code))
-					return -EFAULT;
-				probed = 1;
-			}
-			break;
-		default:
-			probed = 1;
-			break;
-		}
-		if (probed)
-			break;
-	}
+	/* We only need the first two bytes. */
+	err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
+	if (err)
+		return err;
+
+	fl4->fl4_icmp_type = icmph.type;
+	fl4->fl4_icmp_code = icmph.code;
+
 	return 0;
 }
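
For readers without a kernel tree at hand, the idea behind the replacement (gather the first two bytes of the message regardless of how the caller split it across iovec segments) can be sketched in userspace. This is a minimal model, not the kernel helper: `copy_from_iovec()`, `struct icmp_probe` and `probe_icmp()` are hypothetical names, plain `memcpy()` stands in for the copy from userland, and returning -EFAULT for a too-short iovec is an assumption of this sketch rather than documented behaviour of memcpy_fromiovecend.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Userspace model: copy `len` bytes, starting at byte `offset` of the
 * logical stream described by iov[0..iovcnt), into kdata.
 * Returns 0 on success, -EFAULT if the iovec holds too few bytes
 * (an assumption of this sketch).
 */
static int copy_from_iovec(void *kdata, const struct iovec *iov, int iovcnt,
                           size_t offset, size_t len)
{
    unsigned char *dst = kdata;
    int i;

    for (i = 0; i < iovcnt && len > 0; i++) {
        size_t seg = iov[i].iov_len;
        size_t n;

        if (offset >= seg) {            /* segment lies before the offset */
            offset -= seg;
            continue;
        }
        n = seg - offset;
        if (n > len)
            n = len;
        memcpy(dst, (const unsigned char *)iov[i].iov_base + offset, n);
        dst += n;
        len -= n;
        offset = 0;                     /* later segments start at byte 0 */
    }
    return len ? -EFAULT : 0;
}

/* Hypothetical stand-in for the two leading fields of struct icmphdr. */
struct icmp_probe {
    unsigned char type;
    unsigned char code;
};

/* Fetch ICMP type and code, even if they sit in different segments. */
static int probe_icmp(struct icmp_probe *p, const struct iovec *iov,
                      int iovcnt)
{
    return copy_from_iovec(p, iov, iovcnt, 0, sizeof(*p));
}
```

A caller whose first segment holds one byte and whose second holds the rest gets both fields filled in by a single `probe_icmp()` call, which is the case the removed loop handled by tracking the `type` and `code` pointers by hand.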