Message ID   | 1267067593.16986.1583.camel@debian
-------------|-----------------------------------
State        | RFC, archived
Delegated to | David Miller
From: Zhu Yi <yi.zhu@intel.com>
Date: Thu, 25 Feb 2010 11:13:13 +0800

> @@ -1372,8 +1372,13 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  	bh_lock_sock(sk);
>  	if (!sock_owned_by_user(sk))
>  		rc = __udp_queue_rcv_skb(sk, skb);
> -	else
> +	else {
> +		if (atomic_read(&sk->sk_backlog.len) >= sk->sk_rcvbuf) {
> +			bh_unlock_sock(sk);
> +			goto drop;
> +		}
>  		sk_add_backlog(sk, skb);
> +	}

We have to address this issue, of course, but I bet this method of
handling it negatively impacts performance in normal cases.

Right now we can queue up a lot and still get it to the application
if it is slow getting scheduled onto a cpu, but if you put this limit
here it could result in lots of drops.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 25 Feb 2010 at 11:13 +0800, Zhu Yi wrote:
> Hi,
>
> We hit a system OOM while running some UDP netperf tests on the
> loopback device. The case is multiple senders streaming UDP packets
> to a single receiver via loopback on the local host. Of course, the
> receiver is not able to handle all the packets in time. But we were
> surprised to find that these packets were not discarded at the
> receiver's sk->sk_rcvbuf limit. Instead, they kept queuing to
> sk->sk_backlog and finally ate up all the memory. We believe this is
> a security hole that a non-privileged user can use to crash the
> system.
>
> The root cause is that when the receiver is in __release_sock()
> (i.e. after a userspace recv: udp_recvmsg -> skb_free_datagram_locked
> -> release_sock in the kernel), it moves skbs from the backlog to
> sk_receive_queue with softirqs enabled. In the above case, multiple
> busy senders turn this into a nearly endless loop, and the skbs in
> the backlog end up eating all the system memory.
>
> The patch fixes this by adding accounting for the socket backlog, so
> that the backlog size can be restricted at the protocol's discretion
> (e.g. UDP).
>
> Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> ---
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 3f1a480..2e003b9 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -253,6 +253,7 @@ struct sock {
>  	struct {
>  		struct sk_buff *head;
>  		struct sk_buff *tail;
> +		atomic_t len;

This adds a hole on 32bit arches.

I am pretty sure we don't need an atomic here, since we must own a
lock before manipulating sk_backlog.{head,tail,len}.

UDP/IPv6 should be addressed in your patch too.

Other questions raised by your discovery:

- What about other protocols that also use a backlog?
- __release_sock() could run forever with no preemption, even with a
  limit on the backlog.
On Thu, 2010-02-25 at 19:24 +0800, Eric Dumazet wrote:
> > @@ -253,6 +253,7 @@ struct sock {
> >  	struct {
> >  		struct sk_buff *head;
> >  		struct sk_buff *tail;
> > +		atomic_t len;
>
> This adds a hole on 32bit arches.
>
> I am pretty sure we don't need an atomic here, since we must own a
> lock before manipulating sk_backlog.{head,tail,len}.

Good point. bh_lock_sock is always held for backlog operations.

> UDP/IPv6 should be addressed in your patch too.

Will do; this is only an RFC anyway.

> Other questions raised by your discovery:
> - What about other protocols that also use a backlog?

I don't think protocols with flow/congestion control capability have
this issue. We have tested that TCP is immune. The other current
backlog users are dccp, sctp, tipc, x.25 and llc. We didn't test all
of them, but it looks like only llc is possibly (though unlikely)
affected.

> - __release_sock() could run forever with no preemption, even with a
>   limit on the backlog.

Yes, but there is no critical impact like memory exhaustion in that
case.

Thanks,
-yi
On Thu, 2010-02-25 at 16:31 +0800, David Miller wrote:
> > @@ -1372,8 +1372,13 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> >  	bh_lock_sock(sk);
> >  	if (!sock_owned_by_user(sk))
> >  		rc = __udp_queue_rcv_skb(sk, skb);
> > -	else
> > +	else {
> > +		if (atomic_read(&sk->sk_backlog.len) >= sk->sk_rcvbuf) {
> > +			bh_unlock_sock(sk);
> > +			goto drop;
> > +		}
> >  		sk_add_backlog(sk, skb);
> > +	}
>
> We have to address this issue, of course, but I bet this method of
> handling it negatively impacts performance in normal cases.

Eric mentioned that an atomic is not required here. I don't think
performance will be impacted any more by the above if clause. Right?

> Right now we can queue up a lot and still get it to the application
> if it is slow getting scheduled onto a cpu, but if you put this
> limit here it could result in lots of drops.

Or we can replace the sk->sk_rcvbuf limit with the backlog's own
limit. We can queue "a lot", but not endlessly. We have to have a
limit anyway.

Thanks,
-yi
From: Zhu Yi <yi.zhu@intel.com>
Date: Fri, 26 Feb 2010 10:44:13 +0800

> On Thu, 2010-02-25 at 16:31 +0800, David Miller wrote:
>> Right now we can queue up a lot and still get it to the application
>> if it is slow getting scheduled onto a cpu, but if you put this
>> limit here it could result in lots of drops.
>
> Or we can replace the sk->sk_rcvbuf limit with the backlog's own
> limit. We can queue "a lot", but not endlessly. We have to have a
> limit anyway.

Simply using (2 * sk->sk_rcvbuf) might be sufficient.
On Fri, 26 Feb 2010 at 10:34 +0800, Zhu Yi wrote:
> On Thu, 2010-02-25 at 19:24 +0800, Eric Dumazet wrote:
> > - What about other protocols that also use a backlog?
>
> I don't think protocols with flow/congestion control capability have
> this issue. We have tested that TCP is immune. The other current
> backlog users are dccp, sctp, tipc, x.25 and llc. We didn't test all
> of them, but it looks like only llc is possibly (though unlikely)
> affected.

I looked again at the TCP case and I see the same problem as UDP. If
the socket is locked by the user and a softirq flood happens, you can
fill the backlog with frames too (any frame with a correct tuple,
even with bad sequences), and exhaust memory.

> > - __release_sock() could run forever with no preemption, even with
> >   a limit on the backlog.
>
> Yes, but there is no critical impact like memory exhaustion in that
> case.

Well, if you have one processor and a process doesn't want to yield
the CPU (apart from softirqs, of course, which fill the backlog while
your process tries to empty it), your machine is dead. This is
critical too :)
On Fri, 2010-02-26 at 21:12 +0800, Eric Dumazet wrote:
> Well, if you have one processor and a process doesn't want to yield
> the CPU (apart from softirqs, of course, which fill the backlog
> while your process tries to empty it), your machine is dead. This is
> critical too :)

If you only have one CPU, this won't happen. While the receiver is
busy processing the backlog, no sender will get a chance to be
scheduled to transmit more. And with a limited backlog, it won't take
long for the receiver to finish processing all the frames in the
backlog.

Thanks,
-yi
On Mon, 1 Mar 2010 at 10:17 +0800, Zhu Yi wrote:
> On Fri, 2010-02-26 at 21:12 +0800, Eric Dumazet wrote:
> > Well, if you have one processor and a process doesn't want to
> > yield the CPU (apart from softirqs, of course, which fill the
> > backlog while your process tries to empty it), your machine is
> > dead. This is critical too :)
>
> If you only have one CPU, this won't happen. While the receiver is
> busy processing the backlog, no sender will get a chance to be
> scheduled to transmit more. And with a limited backlog, it won't
> take long for the receiver to finish processing all the frames in
> the backlog.

You focus on the case where you Intel guys discovered the flaw, using
the loopback interface. I am concerned with a DoS situation, where
some bad guy on your LAN floods your machine using a real 10Gb NIC.

I was concerned by two things:

- One process being stuck forever in __release_sock(), basically
  stopping an application from making progress. This is a DoS
  problem. Our kernel is potentially affected, and we should probably
  do something to avoid it. But I am _not_ saying this should be done
  by your patch; it can probably be addressed in an independent
  patch.

- One process being stuck in __release_sock() and not yielding to
  other processes. This is not the case, since we call
  cond_resched_softirq(), which permits other high-priority tasks to
  get the CPU.

One more point: if you remember my previous mail, I suggested
clearing the len field in __release_sock(). This way, you can provide
a first, protocol-agnostic patch, then a further patch for UDPv4,
another for UDPv6, etc., to have a clean path and make the resolution
of the bugs self-explanatory, instead of one big patch.

Thanks
On Mon, 2010-03-01 at 10:29 +0800, Eric Dumazet wrote:
> You focus on the case where you Intel guys discovered the flaw,
> using the loopback interface. I am concerned with a DoS situation,
> where some bad guy on your LAN floods your machine using a real
> 10Gb NIC. I was concerned by two things:
>
> - One process being stuck forever in __release_sock(), basically
>   stopping an application from making progress. This is a DoS
>   problem. Our kernel is potentially affected, and we should
>   probably do something to avoid it. But I am _not_ saying this
>   should be done by your patch; it can probably be addressed in an
>   independent patch.

Agreed. Unless you or someone else provides a better idea that solves
them both, I'd fix only one problem at a time.

> - One process being stuck in __release_sock() and not yielding to
>   other processes. This is not the case, since we call
>   cond_resched_softirq(), which permits other high-priority tasks to
>   get the CPU.
>
> One more point: if you remember my previous mail, I suggested
> clearing the len field in __release_sock(). This way, you can
> provide a first, protocol-agnostic patch, then a further patch for
> UDPv4, another for UDPv6, etc., to have a clean path and make the
> resolution of the bugs self-explanatory, instead of one big patch.

This was my original plan. But davem just mentioned we have to
protect against maliciously sent frames, which means all protocols
using the backlog. That makes me consider whether I should put the
check in the generic path.

Thanks,
-yi
On Mon, 1 Mar 2010 at 11:03 +0800, Zhu Yi wrote:
> This was my original plan. But davem just mentioned we have to
> protect against maliciously sent frames, which means all protocols
> using the backlog. That makes me consider whether I should put the
> check in the generic path.

Sure, the check could be done in sk_add_backlog() (including the
atomic_inc(&sk->sk_drops); and kfree_skb(skb);), but make sure a
status is returned so that we can increment SNMP counters at the
protocol level.
diff --git a/include/net/sock.h b/include/net/sock.h
index 3f1a480..2e003b9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -253,6 +253,7 @@ struct sock {
 	struct {
 		struct sk_buff *head;
 		struct sk_buff *tail;
+		atomic_t len;
 	} sk_backlog;
 	wait_queue_head_t	*sk_sleep;
 	struct dst_entry	*sk_dst_cache;
@@ -583,11 +584,13 @@ static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 		sk->sk_backlog.tail->next = skb;
 		sk->sk_backlog.tail = skb;
 	}
+	atomic_add(skb->truesize, &sk->sk_backlog.len);
 	skb->next = NULL;
 }
 
 static inline int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 {
+	atomic_sub(skb->truesize, &sk->sk_backlog.len);
 	return sk->sk_backlog_rcv(sk, skb);
 }
 
diff --git a/net/core/sock.c b/net/core/sock.c
index e1f6f22..3b988f2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1138,6 +1138,7 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
 		sock_lock_init(newsk);
 		bh_lock_sock(newsk);
 		newsk->sk_backlog.head = newsk->sk_backlog.tail = NULL;
+		atomic_set(&newsk->sk_backlog.len, 0);
 		atomic_set(&newsk->sk_rmem_alloc, 0);
 
 		/*
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f0126fd..e019067 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1372,8 +1372,13 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		rc = __udp_queue_rcv_skb(sk, skb);
-	else
+	else {
+		if (atomic_read(&sk->sk_backlog.len) >= sk->sk_rcvbuf) {
+			bh_unlock_sock(sk);
+			goto drop;
+		}
 		sk_add_backlog(sk, skb);
+	}
 	bh_unlock_sock(sk);
 
 	return rc;
Hi,

We hit a system OOM while running some UDP netperf tests on the
loopback device. The case is multiple senders streaming UDP packets
to a single receiver via loopback on the local host. Of course, the
receiver is not able to handle all the packets in time. But we were
surprised to find that these packets were not discarded at the
receiver's sk->sk_rcvbuf limit. Instead, they kept queuing to
sk->sk_backlog and finally ate up all the memory. We believe this is
a security hole that a non-privileged user can use to crash the
system.

The root cause is that when the receiver is in __release_sock()
(i.e. after a userspace recv: udp_recvmsg -> skb_free_datagram_locked
-> release_sock in the kernel), it moves skbs from the backlog to
sk_receive_queue with softirqs enabled. In the above case, multiple
busy senders turn this into a nearly endless loop, and the skbs in
the backlog end up eating all the system memory.

The patch fixes this by adding accounting for the socket backlog, so
that the backlog size can be restricted at the protocol's discretion
(e.g. UDP).

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---