Patchwork [net-next-2.6] netlink: netlink_recvmsg() fix

login
register
mail settings
Submitter Eric Dumazet
Date July 21, 2010, 8:43 a.m.
Message ID <1279701835.2452.17.camel@edumazet-laptop>
Download mbox | patch
Permalink /patch/59411/
State Accepted
Delegated to: David Miller
Headers show

Comments

Eric Dumazet - July 21, 2010, 8:43 a.m.
Le mercredi 21 juillet 2010 à 10:20 +0200, Eric Dumazet a écrit :

> Nothing can touch this skb at the same time but us and our friends
> (consumers that did a skb_recv_datagram( MSG_PEEK ) operation).
> 
> Oh well, I see skb_unshare() tests skb_cloned(). This is not what we
> want.
> 

Here is V3 of this patch.

Thanks

[PATCH net-next-2.6 v3] netlink: netlink_recvmsg() fix

commit 1dacc76d0014 
(net/compat/wext: send different messages to compat tasks)
introduced a race condition on netlink, in case MSG_PEEK is used.

An skb given by skb_recv_datagram() might be shared, we must copy it
before any modification, or risk fatal corruption.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/netlink/af_netlink.c |   22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - July 26, 2010, 4:55 a.m.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Jul 2010 10:43:55 +0200

> [PATCH net-next-2.6 v3] netlink: netlink_recvmsg() fix
> 
> commit 1dacc76d0014 
> (net/compat/wext: send different messages to compat tasks)
> introduced a race condition on netlink, in case MSG_PEEK is used.
> 
> An skb given by skb_recv_datagram() might be shared, we must copy it
> before any modification, or risk fatal corruption.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - July 26, 2010, 8:08 p.m.
From: David Miller <davem@davemloft.net>
Date: Sun, 25 Jul 2010 21:55:48 -0700 (PDT)

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 21 Jul 2010 10:43:55 +0200
> 
>> [PATCH net-next-2.6 v3] netlink: netlink_recvmsg() fix
>> 
>> commit 1dacc76d0014 
>> (net/compat/wext: send different messages to compat tasks)
>> introduced a race condition on netlink, in case MSG_PEEK is used.
>> 
>> An skb given by skb_recv_datagram() might be shared, we must copy it
>> before any modification, or risk fatal corruption.
>> 
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Applied, thanks Eric.

I bet you didn't compile test the code you modified at all, but it's
not your fault :-)

The code is protected by CONFIG_WIRELESS_EXT but that protection is
not valid.  It should be protected by something like CONFIG_WEXT_CORE
or similar.

The only way to get CONFIG_WIRELESS_EXT set it to enable one of a few
drivers, many of which are in staging.

Anyways, just a heads up, I'll fix this up.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - July 26, 2010, 8:39 p.m.
Le lundi 26 juillet 2010 à 13:08 -0700, David Miller a écrit :

> I bet you didn't compile test the code you modified at all, but it's
> not your fault :-)
> 
> The code is protected by CONFIG_WIRELESS_EXT but that protection is
> not valid.  It should be protected by something like CONFIG_WEXT_CORE
> or similar.
> 
> The only way to get CONFIG_WIRELESS_EXT set it to enable one of a few
> drivers, many of which are in staging.
> 
> Anyways, just a heads up, I'll fix this up.

Well, I recall having compiled (but not boot tested) patches,
on my x86_64 dev machine with a .config containing :

# grep WIRELESS_EXT .config
CONFIG_WIRELESS_EXT=y
CONFIG_WIRELESS_EXT_SYSFS=y
# grep COMPAT_NETLINK_MESSAGES .config
CONFIG_COMPAT_NETLINK_MESSAGES=y

Cannot remember if I compiled v3 of the patch, to be honest.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - July 26, 2010, 8:48 p.m.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 26 Jul 2010 22:39:03 +0200

> Cannot remember if I compiled v3 of the patch, to be honest.

It's impossible that you did so, there is a missing
closing parenthesis in the first if() statement added
by your patch :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - July 26, 2010, 8:55 p.m.
Le lundi 26 juillet 2010 à 13:48 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 26 Jul 2010 22:39:03 +0200
> 
> > Cannot remember if I compiled v3 of the patch, to be honest.
> 
> It's impossible that you did so, there is a missing
> closing parenthesis in the first if() statement added
> by your patch :-)

Oops

Yes, I realized this shortly after sending my previous mail :)

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg - Aug. 13, 2010, 2 p.m.
On Sun, 2010-07-25 at 21:55 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 21 Jul 2010 10:43:55 +0200
> 
> > [PATCH net-next-2.6 v3] netlink: netlink_recvmsg() fix
> > 
> > commit 1dacc76d0014 
> > (net/compat/wext: send different messages to compat tasks)
> > introduced a race condition on netlink, in case MSG_PEEK is used.
> > 
> > An skb given by skb_recv_datagram() might be shared, we must copy it
> > before any modification, or risk fatal corruption.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Applied, thanks Eric.

I keep getting errors like below in 2.6.35+wireless-testing. Not saying
that it's this patch's fault, but it is the only thing I remember
touching that area.

[10548.768754] =============================================================================
[10548.773474] BUG kmalloc-4096: Object already free
[10548.775342] -----------------------------------------------------------------------------
[10548.775344] 
[10548.778505] INFO: Allocated in wireless_send_event+0x1f2/0x410 age=1 cpu=0 pid=28521
[10548.778505] INFO: Freed in skb_release_data+0xd0/0xe0 age=2 cpu=2 pid=27730
[10548.778505] INFO: Slab 0xffffea0003fa4b80 objects=7 used=1 fp=0xffff880122f12090 flags=0x80000000000040c3
[10548.778505] INFO: Object 0xffff880122f12090 @offset=8336 fp=0xffff880122f11048
[10548.778505] 
[10548.778505] Bytes b4 0xffff880122f12080:  6c a3 0f 00 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a l�......ZZZZZZZZ
...
[10548.778505]   Object 0xffff880122f13080:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk�
[10548.778505]  Redzone 0xffff880122f13090:  bb bb bb bb bb bb bb bb                         ��������        
[10548.778505]  Padding 0xffff880122f130d0:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ        
[10548.778505] Pid: 916, comm: avahi-daemon Tainted: G        W   2.6.35-wl-67184-g1595c70-dirty #12
[10548.778505] Call Trace:
[10548.778505]  [<ffffffff81133993>] print_trailer+0x103/0x160
[10548.778505]  [<ffffffff81133a31>] object_err+0x41/0x50
[10548.778505]  [<ffffffff81135b0f>] __slab_free+0x24f/0x3b0
[10548.778505]  [<ffffffff81135da2>] kfree+0x132/0x1a0
[10548.778505]  [<ffffffff81383710>] skb_release_data+0xd0/0xe0
[10548.778505]  [<ffffffff813831be>] __kfree_skb+0x1e/0xa0
[10548.778505]  [<ffffffff813832c2>] kfree_skb+0x42/0xb0
[10548.778505]  [<ffffffff813ac3f2>] netlink_recvmsg+0x422/0x480
[10548.778505]  [<ffffffff8137afdd>] sock_recvmsg+0xfd/0x130
[10548.778505]  [<ffffffff8137c1e4>] __sys_recvmsg+0x144/0x2e0
[10548.778505]  [<ffffffff8137c629>] sys_recvmsg+0x49/0x80
[10548.778505]  [<ffffffff8100b132>] system_call_fastpath+0x16/0x1b
[10548.778505] FIX kmalloc-4096: Object at 0xffff880122f12090 not freed


johannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg - Aug. 13, 2010, 2:35 p.m.
On Fri, 2010-08-13 at 16:00 +0200, Johannes Berg wrote:
> On Sun, 2010-07-25 at 21:55 -0700, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 21 Jul 2010 10:43:55 +0200
> > 
> > > [PATCH net-next-2.6 v3] netlink: netlink_recvmsg() fix
> > > 
> > > commit 1dacc76d0014 
> > > (net/compat/wext: send different messages to compat tasks)
> > > introduced a race condition on netlink, in case MSG_PEEK is used.
> > > 
> > > An skb given by skb_recv_datagram() might be shared, we must copy it
> > > before any modification, or risk fatal corruption.
> > > 
> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > 
> > Applied, thanks Eric.
> 
> I keep getting errors like below in 2.6.35+wireless-testing. Not saying
> that it's this patch's fault, but it is the only thing I remember
> touching that area.

In fact, I think something's wrong with this patch, since my comment
(that right now unfortunately I no longer fully understand) says:

                 * If this skb has a frag_list, then here that means that
                 * we will have to use the frag_list skb for compat tasks
                 * and the regular skb for non-compat tasks.
                 *
                 * The skb might (and likely will) be cloned, so we can't
                 * just reset frag_list and go on with things -- we need to
                 * keep that. For the compat case that's easy -- simply get
                 * a reference to the compat skb and free the regular one
                 * including the frag. For the non-compat case, we need to
                 * avoid sending the frag to the user -- so assign NULL but
                 * restore it below before freeing the skb.

and that's no longer true, afaict.

johannes


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Aug. 13, 2010, 2:48 p.m.
Le vendredi 13 août 2010 à 16:35 +0200, Johannes Berg a écrit :
> On Fri, 2010-08-13 at 16:00 +0200, Johannes Berg wrote:
> > On Sun, 2010-07-25 at 21:55 -0700, David Miller wrote:
> > > From: Eric Dumazet <eric.dumazet@gmail.com>
> > > Date: Wed, 21 Jul 2010 10:43:55 +0200
> > > 
> > > > [PATCH net-next-2.6 v3] netlink: netlink_recvmsg() fix
> > > > 
> > > > commit 1dacc76d0014 
> > > > (net/compat/wext: send different messages to compat tasks)
> > > > introduced a race condition on netlink, in case MSG_PEEK is used.
> > > > 
> > > > An skb given by skb_recv_datagram() might be shared, we must copy it
> > > > before any modification, or risk fatal corruption.
> > > > 
> > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > 
> > > Applied, thanks Eric.
> > 
> > I keep getting errors like below in 2.6.35+wireless-testing. Not saying
> > that it's this patch's fault, but it is the only thing I remember
> > touching that area.
> 
> In fact, I think something's wrong with this patch, since my comment
> (that right now unfortunately I no longer fully understand) says:
> 
>                  * If this skb has a frag_list, then here that means that
>                  * we will have to use the frag_list skb for compat tasks
>                  * and the regular skb for non-compat tasks.
>                  *
>                  * The skb might (and likely will) be cloned, so we can't
>                  * just reset frag_list and go on with things -- we need to
>                  * keep that. For the compat case that's easy -- simply get
>                  * a reference to the compat skb and free the regular one
>                  * including the frag. For the non-compat case, we need to
>                  * avoid sending the frag to the user -- so assign NULL but
>                  * restore it below before freeing the skb.
> 
> and that's no longer true, afaict.
> 

Comment was not updated by the patch.

But do you agree temporarly setting frag_list to NULL was a bug ?

Unfortunatly I cannot test this path...

I assume reverting 1235f504aaba removes these errors ?


Its strange we have a double-free on a data part and not a skb_head.

maybe pskb_copy() has a problem with frag_list...


Maybe we can revert the patch and find another way to make sure two
process can not manipulate this skb in // (adding a mutex, or using
RTNL ?)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg - Aug. 13, 2010, 3:13 p.m.
On Fri, 2010-08-13 at 16:48 +0200, Eric Dumazet wrote:

> >                  * If this skb has a frag_list, then here that means that
> >                  * we will have to use the frag_list skb for compat tasks
> >                  * and the regular skb for non-compat tasks.
> >                  *
> >                  * The skb might (and likely will) be cloned, so we can't
> >                  * just reset frag_list and go on with things -- we need to
> >                  * keep that. For the compat case that's easy -- simply get
> >                  * a reference to the compat skb and free the regular one
> >                  * including the frag. For the non-compat case, we need to
> >                  * avoid sending the frag to the user -- so assign NULL but
> >                  * restore it below before freeing the skb.
> > 
> > and that's no longer true, afaict.
> > 
> 
> Comment was not updated by the patch.
> 
> But do you agree temporarly setting frag_list to NULL was a bug ?

No, that was actually intentional, as the comment describes?

> Unfortunatly I cannot test this path...

No wireless devices? :)

> I assume reverting 1235f504aaba removes these errors ?

I haven't tried yet, but it only happened very recently and I didn't
find any other candidate -- the error always points to
wireless_send_event too.

> Its strange we have a double-free on a data part and not a skb_head.

There also was an apparent use-after-free error.

> maybe pskb_copy() has a problem with frag_list...
> 
> 
> Maybe we can revert the patch and find another way to make sure two
> process can not manipulate this skb in // (adding a mutex, or using
> RTNL ?)

I'm getting confused. Why did I think the NULLing and then restoring was
not racy to start with?

johannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Aug. 15, 2010, 5:37 a.m.
From: Johannes Berg <johannes@sipsolutions.net>
Date: Fri, 13 Aug 2010 17:13:53 +0200

> On Fri, 2010-08-13 at 16:48 +0200, Eric Dumazet wrote:
> 
>> I assume reverting 1235f504aaba removes these errors ?
> 
> I haven't tried yet, but it only happened very recently and I didn't
> find any other candidate -- the error always points to
> wireless_send_event too.

Please test with the commit reverted and let us know if it helps.

The current situation is worse than what we were trying to fix
in that commit, so if a revert fixes your problem then as Eric
said we should do that first.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg - Aug. 16, 2010, 5:25 a.m.
On Sat, 2010-08-14 at 22:37 -0700, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Fri, 13 Aug 2010 17:13:53 +0200
> 
> > On Fri, 2010-08-13 at 16:48 +0200, Eric Dumazet wrote:
> > 
> >> I assume reverting 1235f504aaba removes these errors ?
> > 
> > I haven't tried yet, but it only happened very recently and I didn't
> > find any other candidate -- the error always points to
> > wireless_send_event too.
> 
> Please test with the commit reverted and let us know if it helps.
> 
> The current situation is worse than what we were trying to fix
> in that commit, so if a revert fixes your problem then as Eric
> said we should do that first.

I haven't gotten around to it, but Kalle had been running into the same
issue and said reverting it fixed it:

http://article.gmane.org/gmane.linux.kernel.wireless.general/54492

johannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Aug. 16, 2010, 6:10 a.m.
Le lundi 16 août 2010 à 07:25 +0200, Johannes Berg a écrit :

> I haven't gotten around to it, but Kalle had been running into the same
> issue and said reverting it fixed it:
> 
> http://article.gmane.org/gmane.linux.kernel.wireless.general/54492
> 

Thanks Johannes

Please David revert 1235f504.

We'll find another way to address the problem.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Aug. 16, 2010, 6:21 a.m.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 16 Aug 2010 08:10:02 +0200

> Le lundi 16 août 2010 à 07:25 +0200, Johannes Berg a écrit :
> 
>> I haven't gotten around to it, but Kalle had been running into the same
>> issue and said reverting it fixed it:
>> 
>> http://article.gmane.org/gmane.linux.kernel.wireless.general/54492
>> 
> 
> Thanks Johannes
> 
> Please David revert 1235f504.
> 
> We'll find another way to address the problem.

I will, thanks guys.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg - Aug. 16, 2010, 6:22 a.m.
On Mon, 2010-08-16 at 08:10 +0200, Eric Dumazet wrote:

> We'll find another way to address the problem.

For my understanding: The problem is that the save/restore of the
frag_list skb can race so a second reader will not see the frag_list skb
because it gets there while it's NULL. This happens with MSG_PEEK, or
with clones of the SKB since skb_shared_info is invariant across clones.

Correct?

johannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Aug. 16, 2010, 6:29 a.m.
Le lundi 16 août 2010 à 08:22 +0200, Johannes Berg a écrit :
> On Mon, 2010-08-16 at 08:10 +0200, Eric Dumazet wrote:
> 
> > We'll find another way to address the problem.
> 
> For my understanding: The problem is that the save/restore of the
> frag_list skb can race so a second reader will not see the frag_list skb
> because it gets there while it's NULL. This happens with MSG_PEEK, or
> with clones of the SKB since skb_shared_info is invariant across clones.
> 
> Correct?
> 

Yes

I believe we should have a mutual exclusion for the critical section.

(setting pointer to NULL, ...., setting pointer back to its orig value)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg - Aug. 16, 2010, 6:31 a.m.
On Mon, 2010-08-16 at 08:29 +0200, Eric Dumazet wrote:
> Le lundi 16 août 2010 à 08:22 +0200, Johannes Berg a écrit :
> > On Mon, 2010-08-16 at 08:10 +0200, Eric Dumazet wrote:
> > 
> > > We'll find another way to address the problem.
> > 
> > For my understanding: The problem is that the save/restore of the
> > frag_list skb can race so a second reader will not see the frag_list skb
> > because it gets there while it's NULL. This happens with MSG_PEEK, or
> > with clones of the SKB since skb_shared_info is invariant across clones.
> > 
> > Correct?
> > 
> 
> Yes
> 
> I believe we should have a mutual exclusion for the critical section.
> 
> (setting pointer to NULL, ...., setting pointer back to its orig value)

Yeah, that'd work, but I'm wondering now why I did this at all. The code
already restricts the data copying to skb->len, which will not include
the frag_list, so it should be OK to just leave it intact? I'll go and
look at that in more detail and try it.

johannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7aeaa83..0c86657 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1405,7 +1405,7 @@  static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 	struct netlink_sock *nlk = nlk_sk(sk);
 	int noblock = flags&MSG_DONTWAIT;
 	size_t copied;
-	struct sk_buff *skb, *frag __maybe_unused = NULL;
+	struct sk_buff *skb;
 	int err;
 
 	if (flags&MSG_OOB)
@@ -1440,7 +1440,21 @@  static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 			kfree_skb(skb);
 			skb = compskb;
 		} else {
-			frag = skb_shinfo(skb)->frag_list;
+			/*
+			 * Before setting frag_list to NULL, we must get a
+			 * private copy of skb if shared (because of MSG_PEEK)
+			 */
+			if (skb_shared(skb) {
+				struct sk_buff *nskb;
+
+				nskb = pskb_copy(skb, GFP_KERNEL);
+				kfree_skb(skb);
+				skb = nskb;
+				err = -ENOMEM;
+				if (!skb)
+					goto out;
+			}
+			kfree_skb(skb_shinfo(skb)->frag_list);
 			skb_shinfo(skb)->frag_list = NULL;
 		}
 	}
@@ -1477,10 +1491,6 @@  static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 	if (flags & MSG_TRUNC)
 		copied = skb->len;
 
-#ifdef CONFIG_COMPAT_NETLINK_MESSAGES
-	skb_shinfo(skb)->frag_list = frag;
-#endif
-
 	skb_free_datagram(sk, skb);
 
 	if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2)