Message ID | 551D1F86.8050200@fokus.fraunhofer.de |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Hi Mathias,

On 04/02/2015 12:52 PM, Mathias Kretschmer wrote:
> Dear all,
>
> we have encountered a problem where the send(MSG_DONTWAIT) call on a
> TX_RING is not fully non-blocking in cases where the device's sndBuf is
> full (i.e. we are trying to write faster than the device can handle).
>
> This is on a WLAN radio (so it's not that hard to achieve :).
>
> Comparing the TX_RING send() handler to the regular send() handler, the
> difference seems to be in the sock_alloc_send_skb() call, where the
> regular handler passes (flags & MSG_DONTWAIT), while the TX_RING
> handler always passes 0 (block).
>
> The attached patch changes this behavior by
>
> a) also passing (flags & MSG_DONTWAIT)
> b) adjusting the return code so that -ENOBUFS is returned if no frame
>    could be sent, or the number of bytes sent is returned if frame(s)
>    could be sent within this call.
>
> The proposed modification works fine for us and has been tested
> extensively with WLAN and Ethernet devices.
>
> Feel free to apply this patch if you agree with this solution.
> Of course, we're also open to other solutions / proposals / ideas.

Please send a proper patch with SOB (Signed-off-by), and no white space corruption (there are spaces instead of tabs).

+	if (skb == NULL) {
+		/* we assume the socket was initially writeable ... */
+		if (likely(len_sum > 0))
+			err = len_sum;
+		else
+			err = -ENOBUFS;
 		goto out_status;

What I'm a bit worried about is whether existing applications would be able to handle -ENOBUFS. Any reason you don't let the -EAGAIN from sock_alloc_send_skb() pass through?

Well, man 2 sendmsg clearly describes the -EAGAIN possibility as "the socket is marked nonblocking and the requested operation would block". So far it was apparently not returned, since here we would just have blocked. Strictly speaking, though, non-blocking applications need to be aware of -EAGAIN and should handle it, and that awareness might be more likely than for -ENOBUFS, imho.

What do you think?
Cheers,
Daniel
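To make the -EAGAIN versus -ENOBUFS question concrete, here is a minimal user-space sketch of how a non-blocking sender might classify the result of send(MSG_DONTWAIT). The enum and the helper classify_send_result() are hypothetical names made up for this illustration; they are not part of the kernel, libc, or the patch under discussion. The sketch assumes the application wants to treat both error codes as "try again later".

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical application-side helper, not a kernel or libc API. */
enum tx_action {
	TX_OK,          /* send() accepted some bytes */
	TX_RETRY_LATER, /* transient: wait for POLLOUT, then retry */
	TX_FATAL        /* anything else: report the error */
};

/*
 * ret is the return value of send(); saved_errno is errno captured
 * immediately after the call.
 */
static enum tx_action classify_send_result(ssize_t ret, int saved_errno)
{
	if (ret >= 0)
		return TX_OK;

	switch (saved_errno) {
	case EAGAIN:
#if EWOULDBLOCK != EAGAIN
	case EWOULDBLOCK:
#endif
	case ENOBUFS:
		/* Assumption: both codes mean "queue full, come back later". */
		return TX_RETRY_LATER;
	default:
		return TX_FATAL;
	}
}
```

An application that only checks for EAGAIN would fall into the TX_FATAL branch on -ENOBUFS, which is the compatibility concern raised above.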
Hi Jeff,

IMHO, the unlikely() makes perfect sense in the blocking case, while in the non-blocking case it depends on the scenario: what is more likely, user space writing faster than the device can handle, or vice versa?

The reason I removed the unlikely() is that I thought the situation is rather balanced in the non-blocking case.

Let's see: if we assume that in the non-blocking case we go through select()/poll()/epoll() first, it is likely() that we can write at least one frame, while we would break after the first unsuccessful skb alloc. Hence, if we can only write one frame, the chances are fifty-fifty => no likely()/unlikely(). If we assume we can typically write more than one frame, we probably should put the unlikely() back.

What do you think?

Cheers,

Mathias

On 04/05/2015 09:13 AM, Xin Zhou wrote:
> Hi Mathias,
>
> Just for a general discussion, could removing the unlikely have a
> performance impact on some applications or platforms?
>
> -	if (unlikely(skb == NULL))
> +	if (skb == NULL) {
> +		/* we assume the socket was initially writeable ... */
> +		if (likely(len_sum > 0))
> +			err = len_sum;
> +		else
> +			err = -ENOBUFS;
>  		goto out_status;
> -
> +	}
>
> Looking through the code in the do {} while loop of tpacket_snd(),
> the code is highly optimized with branch predictions.
>
> Is it possible the original intention is to pass noblock=0, and use
> "unlikely"?
>
> Thanks for discussion,
> Jeff
>
>
> On Thu, Apr 2, 2015 at 3:52 AM, Mathias Kretschmer
> <mathias.kretschmer@fokus.fraunhofer.de> wrote:
>
> > Dear all,
> >
> > we have encountered a problem where the send(MSG_DONTWAIT) call on
> > a TX_RING is not fully non-blocking in cases where the device's
> > sndBuf is full (i.e. we are trying to write faster than the device
> > can handle).
> >
> > This is on a WLAN radio (so it's not that hard to achieve :).
> >
> > Comparing the TX_RING send() handler to the regular send()
> > handler, the difference seems to be in the sock_alloc_send_skb()
> > call where, the regular handler passes a (flags & MSG_DONTWAIT),
> > while the TX_RING handler always passes a 0 (block).
> >
> > The attached patch changes this behavior by
> >
> > a) also passing (flags & MSG_DONTWAIT)
> > b) adjusting the return code so that -ENOBUFS is returned if no
> >    frame could be sent or to return the number of bytes sent, if
> >    frame(s) could be sent within this call.
> >
> > The proposed modification works fine for us and has been tested
> > extensively with WLAN and Ethernet device.
> >
> > Feel free to apply this patch if you agree with this solution.
> > Of course, we're also open to other solutions / proposals / ideas.
> >
> > Cheers,
> >
> > Mathias
> >
> > --
> > Dr. Mathias Kretschmer, Head of Competence Center
> > Fraunhofer FOKUS Network Research
> > A Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> > T +49-2241-14-3466, F +49-2241-14-1050
> > E mathias.kretschmer@fokus.fraunhofer.de
> > W http://www.fokus.fraunhofer.de/en/net
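For readers unfamiliar with the annotations being debated: in the kernel, likely() and unlikely() are thin wrappers around the compiler's __builtin_expect(), so they only influence code layout and never change which branch is taken. The sketch below is a user-space model assuming GCC or Clang. The function frames_until_first_failure() and its alloc_ok array are hypothetical stand-ins for the tpacket_snd() loop breaking on the first failed skb allocation.

```c
#include <stddef.h>

/* Same shape as the kernel's definitions; needs GCC or Clang. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical model of the send loop: alloc_ok[i] is non-zero if the
 * i-th buffer allocation would succeed. The loop stops at the first
 * failure, just as the patched code jumps to out_status.
 */
static size_t frames_until_first_failure(const int *alloc_ok, size_t n)
{
	size_t sent = 0;

	for (size_t i = 0; i < n; i++) {
		/* Hint only: the result is identical without unlikely(). */
		if (unlikely(!alloc_ok[i]))
			break;
		sent++;
	}
	return sent;
}
```

If a call typically sends many frames before the queue fills, the failure branch is taken once per call and the hint is accurate for most iterations. If a call typically sends a single frame, the branch is taken about every second iteration, which is the fifty-fifty case described above.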
diff -uNpr linux-3.16.7.orig/net/packet/af_packet.c linux-3.16.7/net/packet/af_packet.c
--- linux-3.16.7.orig/net/packet/af_packet.c	2014-10-30 16:41:01.000000000 +0000
+++ linux-3.16.7/net/packet/af_packet.c	2015-04-02 08:43:37.386617712 +0000
@@ -2285,17 +2285,22 @@ static int tpacket_snd(struct packet_soc
 				schedule();
 			continue;
 		}
-
+
 		status = TP_STATUS_SEND_REQUEST;
 		hlen = LL_RESERVED_SPACE(dev);
 		tlen = dev->needed_tailroom;
 		skb = sock_alloc_send_skb(&po->sk,
 				hlen + tlen + sizeof(struct sockaddr_ll),
-				0, &err);
+				!need_wait, &err);
 
-		if (unlikely(skb == NULL))
+		if (skb == NULL) {
+			/* we assume the socket was initially writeable ... */
+			if (likely(len_sum > 0))
+				err = len_sum;
+			else
+				err = -ENOBUFS;
 			goto out_status;
-
+		}
 		tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto,
 				addr, hlen);
 		if (tp_len > dev->mtu + dev->hard_header_len) {
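The return-code choice in this hunk can be modelled in isolation. The following is a user-space sketch, not kernel code: both function names are hypothetical, len_sum stands for the bytes already queued in the current call, and alloc_err stands for the negative error that sock_alloc_send_skb() reported. The second function is one possible reading of the pass-through alternative suggested in the review, assuming the allocation error is -EAGAIN in the non-blocking case.

```c
#include <errno.h>

/* Behaviour of the posted patch: partial progress wins, else -ENOBUFS. */
static int tx_status_as_patched(int len_sum)
{
	return len_sum > 0 ? len_sum : -ENOBUFS;
}

/*
 * Hypothetical alternative: keep the partial-progress rule, but hand
 * the allocator's own error code back to the caller unchanged.
 */
static int tx_status_passthrough(int len_sum, int alloc_err)
{
	return len_sum > 0 ? len_sum : alloc_err;
}
```

In both variants a call that managed to queue at least one frame reports the byte count, so the two differ only in what a caller sees when nothing could be sent.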