af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT)

Message ID 551D1F86.8050200@fokus.fraunhofer.de
State RFC, archived
Delegated to: David Miller

Commit Message

Kretschmer, Mathias April 2, 2015, 10:52 a.m. UTC
Dear all,

we have encountered a problem where the send(MSG_DONTWAIT) call on a TX_RING is not 
fully non-blocking in cases where the device's sndBuf is full (i.e. we are trying to 
write faster than the device can handle).

This is on a WLAN radio (so it's not that hard to achieve :).

Comparing the TX_RING send() handler to the regular send() handler, the difference 
seems to be in the sock_alloc_send_skb() call: the regular handler passes 
(flags & MSG_DONTWAIT), while the TX_RING handler always passes 0 (block).

The attached patch changes this behavior by

a) also passing (flags & MSG_DONTWAIT)
b) adjusting the return code so that -ENOBUFS is returned if no frame could be sent, 
and the number of bytes sent is returned if frame(s) could be sent within this call.
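
The two changes can be modelled in userspace as a minimal sketch. The helper names noblock_arg() and result_on_alloc_failure() are hypothetical and exist only for illustration; they are not kernel functions. The sketch assumes that len_sum counts the bytes already queued within the current call, as in the patch.

```c
#include <errno.h>
#include <sys/socket.h>   /* MSG_DONTWAIT */
#include <sys/types.h>    /* ssize_t */

/* Hypothetical helper (not a kernel function): the value to pass as the
 * "noblock" argument of sock_alloc_send_skb(). 0 means "may block". */
static int noblock_arg(int msg_flags)
{
	return (msg_flags & MSG_DONTWAIT) ? 1 : 0;
}

/* Hypothetical helper (not a kernel function): the result proposed in the
 * patch when skb allocation fails after len_sum bytes were already queued
 * within this call. */
static ssize_t result_on_alloc_failure(ssize_t len_sum)
{
	if (len_sum > 0)
		return len_sum;	/* partial success: report the bytes sent */
	return -ENOBUFS;	/* nothing could be sent in this call */
}
```

With this model, a call that queued 1500 bytes before the allocation failed reports 1500, while a call that queued nothing reports -ENOBUFS.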

The proposed modification works fine for us and has been tested extensively with 
WLAN and Ethernet devices.

Feel free to apply this patch if you agree with this solution.
Of course, we're also open to other solutions / proposals / ideas.

Cheers,

Mathias

Comments

Daniel Borkmann April 3, 2015, 10:22 p.m. UTC | #1
Hi Mathias,

Please send a proper patch with SOB, and no white space corruption
(there are spaces instead of tabs).

+		if (skb == NULL) {
+	                /* we assume the socket was initially writeable ... */
+                        if (likely(len_sum > 0))
+                        	err = len_sum;
+                	else
+                        	err = -ENOBUFS;
  			goto out_status;

What I'm a bit worried about is whether existing applications would be
able to handle -ENOBUFS. Any reason you don't let -EAGAIN from
sock_alloc_send_skb() pass through?

Well, man 2 sendmsg clearly describes the -EAGAIN possibility as
"the socket is marked nonblocking and the requested operation would
block". So far it was apparently never returned, since here we'd just
have blocked. Strictly speaking, though, non-blocking applications
need to be aware of -EAGAIN and should handle it, and that awareness
is probably more common than for -ENOBUFS, imho. What do you think?
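
For illustration, a minimal userspace sketch of a caller that tolerates either error code. The enum tx_action and the helper classify_send() are hypothetical names, not part of any existing API. The sketch assumes ret is the return value of send() and err is the errno value captured right after the call.

```c
#include <errno.h>
#include <sys/types.h>    /* ssize_t */

/* Hypothetical result type for this sketch. */
enum tx_action { TX_OK, TX_RETRY_LATER, TX_FAIL };

/* Hypothetical helper: decide what a non-blocking sender should do with
 * the outcome of send(fd, NULL, 0, MSG_DONTWAIT) on a TX_RING socket. */
static enum tx_action classify_send(ssize_t ret, int err)
{
	if (ret >= 0)
		return TX_OK;		/* ret bytes were handed to the device */

	switch (err) {
	case EAGAIN:			/* documented in man 2 sendmsg */
#if EWOULDBLOCK != EAGAIN
	case EWOULDBLOCK:
#endif
	case ENOBUFS:			/* what the proposed patch would return */
		return TX_RETRY_LATER;	/* wait for POLLOUT, then try again */
	default:
		return TX_FAIL;
	}
}
```

An application written this way would keep working whichever of the two error codes the kernel ends up returning.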

Cheers,
Daniel
Kretschmer, Mathias April 5, 2015, 8:35 a.m. UTC | #2
Hi Jeff,

IMHO, the unlikely() makes perfect sense in the blocking case while in 
the non-blocking case it depends on the scenario:
What's more likely, user space writing faster than the device can handle 
or vice versa ?
The reason I removed the unlikely() is that I thought the situation is 
rather balanced in the non-blocking case.

Let's see, if we assume that in the non-blocking case we go through 
select()/poll()/epoll() first, it is likely() that we can write, at 
least, one frame, while we would break after the first unsuccessful skb 
alloc.

Hence, if we can only write one frame, the chances are fifty-fifty => no 
likely()/unlikely().
If we assume we can typically write more than one frame, we probably 
should put the unlikely() back.
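
As a rough userspace sketch of what the hint does: likely() and unlikely() wrap the GCC/Clang builtin __builtin_expect(), which only influences code layout and never changes the result. The function queue_frames() and its slots_free parameter are hypothetical stand-ins for the send loop and for the number of skb allocations that succeed before the send buffer is full.

```c
#include <stddef.h>

/* Same shape as the kernel macros; requires GCC or Clang. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical model of the send loop: try to queue `frames` frames while
 * only `slots_free` allocations succeed. Returns the number queued. */
static size_t queue_frames(size_t frames, size_t slots_free)
{
	size_t sent = 0;

	while (sent < frames) {
		/* Hint: running out of buffer space is the rare case. The
		 * hint affects branch layout only, not the outcome. */
		if (unlikely(sent == slots_free))
			break;
		sent++;
	}
	return sent;
}
```

If the buffer typically has room for several frames, the break is taken once per call at most and the hint fits. If it typically has room for only one, the branch is taken about as often as not and the hint buys little.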

What do you think ?

Cheers,

Mathias

On 04/05/2015 09:13 AM, Xin Zhou wrote:
> Hi Mathias,
>
> Just for a general discussion, could removing the unlikely() have a 
> performance impact on some applications or platforms?
>
> -        if (unlikely(skb == NULL))
> +        if (skb == NULL) {
> +                    /* we assume the socket was initially writeable ... */
> +                        if (likely(len_sum > 0))
> +                            err = len_sum;
> +                    else
> +                            err = -ENOBUFS;
>              goto out_status;
> -
> +                }
>
> Looking through the code in the do {} while loop of API tpacket_snd(),
> the code is highly optimized with branch predictions.
>
> Is it possible the original intention is to pass noblock=0, and use 
> "unlikely"?
>
> Thanks for discussion,
> Jeff

Patch

diff -uNpr linux-3.16.7.orig/net/packet/af_packet.c linux-3.16.7/net/packet/af_packet.c
--- linux-3.16.7.orig/net/packet/af_packet.c	2014-10-30 16:41:01.000000000 +0000
+++ linux-3.16.7/net/packet/af_packet.c	2015-04-02 08:43:37.386617712 +0000
@@ -2285,17 +2285,22 @@  static int tpacket_snd(struct packet_soc
 				schedule();
 			continue;
 		}
-
+	
 		status = TP_STATUS_SEND_REQUEST;
 		hlen = LL_RESERVED_SPACE(dev);
 		tlen = dev->needed_tailroom;
 		skb = sock_alloc_send_skb(&po->sk,
 				hlen + tlen + sizeof(struct sockaddr_ll),
-				0, &err);
+				!need_wait, &err);
 
-		if (unlikely(skb == NULL))
+		if (skb == NULL) {
+	                /* we assume the socket was initially writeable ... */
+                        if (likely(len_sum > 0))
+                        	err = len_sum;
+                	else
+                        	err = -ENOBUFS;
 			goto out_status;
-
+                }
 		tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto,
 					  addr, hlen);
 		if (tp_len > dev->mtu + dev->hard_header_len) {