af_packet: don't pass empty blocks for PACKET_V3

Message ID 1423115891-3578-1-git-send-email-al.drozdov@gmail.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Alexander Drozdov Feb. 5, 2015, 5:58 a.m. UTC
Don't close an empty block on timeout. It's meaningless to
pass it to the user. Moreover, passing empty blocks wastes
CPU and buffer space, increasing the probability of packet
drops with small timeouts.

A side effect of this patch is an indefinite user-space wait
in poll on idle links. But I believe it's better to set a
timeout for poll(2) when needed than to get empty blocks
every millisecond when not needed.

Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
---
 net/packet/af_packet.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Willem de Bruijn Feb. 5, 2015, 8:01 p.m. UTC | #1
On Wed, Feb 4, 2015 at 9:58 PM, Alexander Drozdov <al.drozdov@gmail.com> wrote:
> Don't close an empty block on timeout. Its meaningless to
> pass it to the user. Moreover, passing empty blocks wastes
> CPU & buffer space increasing probability of packets
> dropping on small timeouts.
>
> Side effect of this patch is indefinite user-space wait
> in poll on idle links. But, I believe its better to set
> timeout for poll(2) when needed than to get empty blocks
> every millisecond when not needed.

This change would break existing applications that have come
to depend on the periodic signal.

I don't disagree with the argument that the data ready signal
should be sent only when a block is full or a timer expires and
at least some data is waiting, but that is moot at this point.

>
> Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
> ---
>  net/packet/af_packet.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 9cfe2e1..9a2f70a 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -698,6 +698,10 @@ static void prb_retire_rx_blk_timer_expired(unsigned long data)
>
>         if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
>                 if (!frozen) {
> +                       if (!BLOCK_NUM_PKTS(pbd)) {
> +                               /* An empty block. Just refresh the timer. */
> +                               goto refresh_timer;
> +                       }
>                         prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
>                         if (!prb_dispatch_next_block(pkc, po))
>                                 goto refresh_timer;
> @@ -798,7 +802,11 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1,
>                 h1->ts_last_pkt.ts_sec = last_pkt->tp_sec;
>                 h1->ts_last_pkt.ts_nsec = last_pkt->tp_nsec;
>         } else {
> -               /* Ok, we tmo'd - so get the current time */
> +               /* Ok, we tmo'd - so get the current time.
> +                *
> +                * It shouldn't really happen as we don't close empty
> +                * blocks. See prb_retire_rx_blk_timer_expired().
> +                */
>                 struct timespec ts;
>                 getnstimeofday(&ts);
>                 h1->ts_last_pkt.ts_sec = ts.tv_sec;
> --
> 1.9.1
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guy Harris Feb. 5, 2015, 9:16 p.m. UTC | #2
On Feb 5, 2015, at 12:01 PM, Willem de Bruijn <willemb@google.com> wrote:

> On Wed, Feb 4, 2015 at 9:58 PM, Alexander Drozdov <al.drozdov@gmail.com> wrote:
>> Don't close an empty block on timeout. Its meaningless to
>> pass it to the user. Moreover, passing empty blocks wastes
>> CPU & buffer space increasing probability of packets
>> dropping on small timeouts.
>> 
>> Side effect of this patch is indefinite user-space wait
>> in poll on idle links. But, I believe its better to set
>> timeout for poll(2) when needed than to get empty blocks
>> every millisecond when not needed.
> 
> This change would break existing applications that have come
> to depend on the periodic signal.
> 
> I don't disagree with the argument that the data ready signal
> should be sent only when a block is full or a timer expires and
> at least some data is waiting, but that is moot at this point.

For what it's worth, the BPF packet capture mechanism (which really needs a new name, to distinguish itself from the BPF packet filter language and its implementation(s), but I digress) has the same issue - when the timer expires, a wakeup is delivered even if there are no packets to read.

*However*, if there are no packets available, the buffers aren't rotated, so the empty buffer is left around to be filled up with packets, rather than being made the hold buffer.

Given that before the previous TPACKET_V3 change, wakeups were delivered when packets arrived rather than when a block was closed, presumably code using TPACKET_V3 was capable of dealing with wakeups being delivered when no new blocks had been made available to userland; could TPACKET_V3 work a bit more like BPF and deliver a wakeup when the timer expires *without* closing the empty block?
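
If wakeups can arrive without a newly closed block, as suggested in this message, a TPACKET_V3 reader has to inspect the block header before consuming anything. A rough sketch of that check follows, using the block descriptor layout from <linux/if_packet.h>. The enum and the function name classify_block are hypothetical, and a real reader would also need suitable memory barriers when reading block_status from the shared ring.

```c
#include <linux/if_packet.h>

/* Hypothetical classification of a ring block as seen after a wakeup. */
enum block_state {
	BLOCK_NOT_READY,	/* still owned by the kernel: nothing new */
	BLOCK_EMPTY,		/* handed to user space with zero packets */
	BLOCK_HAS_PACKETS	/* handed to user space with data to walk */
};

static enum block_state classify_block(const struct tpacket_block_desc *bd)
{
	const struct tpacket_hdr_v1 *h = &bd->hdr.bh1;

	/* TP_STATUS_USER is set once the kernel has closed the block. */
	if (!(h->block_status & TP_STATUS_USER))
		return BLOCK_NOT_READY;

	/* A block closed on timeout may carry no packets at all. */
	return h->num_pkts ? BLOCK_HAS_PACKETS : BLOCK_EMPTY;
}
```

With a check like this, a wakeup that finds BLOCK_NOT_READY is simply treated as "poll again", which is the behaviour the question above presumes existing readers already tolerate.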
Alexander Drozdov Feb. 6, 2015, 4:49 a.m. UTC | #3
On 06.02.2015 00:16:30 +0300 Guy Harris <guy@alum.mit.edu> wrote:
> On Feb 5, 2015, at 12:01 PM, Willem de Bruijn <willemb@google.com> wrote:
>
>> On Wed, Feb 4, 2015 at 9:58 PM, Alexander Drozdov <al.drozdov@gmail.com> wrote:
>>> Don't close an empty block on timeout. Its meaningless to
>>> pass it to the user. Moreover, passing empty blocks wastes
>>> CPU & buffer space increasing probability of packets
>>> dropping on small timeouts.
>>>
>>> Side effect of this patch is indefinite user-space wait
>>> in poll on idle links. But, I believe its better to set
>>> timeout for poll(2) when needed than to get empty blocks
>>> every millisecond when not needed.
>> This change would break existing applications that have come
>> to depend on the periodic signal.
>>
>> I don't disagree with the argument that the data ready signal
>> should be sent only when a block is full or a timer expires and
>> at least some data is waiting, but that is moot at this point.
> For what it's worth, the BPF packet capture mechanism (which really needs a new name, to distinguish itself from the BPF packet filter language and its implementation(s), but I digress) has the same issue - when the timer expires, a wakeup is delivered even if there are no packets to read.
>
> *However*, if there are no packets available, the buffers aren't rotated, so the empty buffer is left around to be filled up with packets, rather than being made the hold buffer.
>
> Given that before the previous TPACKET_V3 change, wakeups were delivered when packets arrived rather than when a block was closed, presumably code using TPACKET_V3 was capable of dealing with wakeups being delivered when no new blocks had been made available to userland; could TPACKET_V3 work a bit more like BPF and deliver a wakeup when the timer expires *without* closing the empty block?

Thank you all for your comments! I'll try to create two patches:
1. Wake up on timeout without closing the empty block
2. Allow disabling the wakeup on timeout (the feature must be explicitly requested by the user)
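
To keep the options in this thread apart, the toy model below tabulates what happens when the block timer expires under each approach. It is a user-space illustration only, not kernel code: the enum, struct, and function names are invented for this sketch, the frozen-queue path is ignored, and it assumes that a non-empty block is always closed and signalled on timeout.

```c
#include <stdbool.h>

/* Hypothetical names for the behaviours discussed in the thread. */
enum empty_block_policy {
	CLOSE_AND_WAKE,	/* empty block is closed and passed to the user */
	WAKE_ONLY,	/* idea 1: wake the reader, keep the block open */
	STAY_SILENT	/* original patch; idea 2 if made opt-in */
};

struct timer_action {
	bool close_block;	/* retire the current block */
	bool wake_reader;	/* signal data-ready to user space */
};

static struct timer_action on_block_timeout(enum empty_block_policy policy,
					     unsigned int num_pkts)
{
	struct timer_action act = { .close_block = true, .wake_reader = true };

	/* Assumption: blocks holding packets are treated alike everywhere. */
	if (num_pkts > 0)
		return act;

	switch (policy) {
	case CLOSE_AND_WAKE:
		break;
	case WAKE_ONLY:
		act.close_block = false;
		break;
	case STAY_SILENT:
		act.close_block = false;
		act.wake_reader = false;
		break;
	}
	return act;
}
```

Only STAY_SILENT leaves a reader blocked in poll() on an idle link, which is why the second proposed patch makes it something the user must request.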
Alexander Drozdov Feb. 6, 2015, 6:54 a.m. UTC | #4
On 05.02.2015 23:01:38 +0300 Willem de Bruijn wrote:
> On Wed, Feb 4, 2015 at 9:58 PM, Alexander Drozdov <al.drozdov@gmail.com> wrote:
>> Don't close an empty block on timeout. Its meaningless to
>> pass it to the user. Moreover, passing empty blocks wastes
>> CPU & buffer space increasing probability of packets
>> dropping on small timeouts.
>>
>> Side effect of this patch is indefinite user-space wait
>> in poll on idle links. But, I believe its better to set
>> timeout for poll(2) when needed than to get empty blocks
>> every millisecond when not needed.
> This change would break existing applications that have come
> to depend on the periodic signal.
>
> I don't disagree with the argument that the data ready signal
> should be sent only when a block is full or a timer expires and
> at least some data is waiting, but that is moot at this point.

I missed something. As Guy Harris <guy@alum.mit.edu> pointed out,
before the previous patch the periodic signal was not delivered. The previous patch
(da413eec729dae5dc by Dan Collins <dan@dcollins.co.nz>) is in the 3.19 kernel only.
Should we care about existing 3.19-only applications?
>
>> Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
>> ---
>>   net/packet/af_packet.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index 9cfe2e1..9a2f70a 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -698,6 +698,10 @@ static void prb_retire_rx_blk_timer_expired(unsigned long data)
>>
>>          if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
>>                  if (!frozen) {
>> +                       if (!BLOCK_NUM_PKTS(pbd)) {
>> +                               /* An empty block. Just refresh the timer. */
>> +                               goto refresh_timer;
>> +                       }
>>                          prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
>>                          if (!prb_dispatch_next_block(pkc, po))
>>                                  goto refresh_timer;
>> @@ -798,7 +802,11 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1,
>>                  h1->ts_last_pkt.ts_sec = last_pkt->tp_sec;
>>                  h1->ts_last_pkt.ts_nsec = last_pkt->tp_nsec;
>>          } else {
>> -               /* Ok, we tmo'd - so get the current time */
>> +               /* Ok, we tmo'd - so get the current time.
>> +                *
>> +                * It shouldn't really happen as we don't close empty
>> +                * blocks. See prb_retire_rx_blk_timer_expired().
>> +                */
>>                  struct timespec ts;
>>                  getnstimeofday(&ts);
>>                  h1->ts_last_pkt.ts_sec = ts.tv_sec;
>> --
>> 1.9.1
>>

Willem de Bruijn Feb. 7, 2015, 1:45 a.m. UTC | #5
On Thu, Feb 5, 2015 at 10:54 PM, Alexander Drozdov <al.drozdov@gmail.com> wrote:
> On 05.02.2015 23:01:38 +0300 Willem de Bruijn wrote:
>>
>> On Wed, Feb 4, 2015 at 9:58 PM, Alexander Drozdov <al.drozdov@gmail.com>
>> wrote:
>>>
>>> Don't close an empty block on timeout. Its meaningless to
>>> pass it to the user. Moreover, passing empty blocks wastes
>>> CPU & buffer space increasing probability of packets
>>> dropping on small timeouts.
>>>
>>> Side effect of this patch is indefinite user-space wait
>>> in poll on idle links. But, I believe its better to set
>>> timeout for poll(2) when needed than to get empty blocks
>>> every millisecond when not needed.
>>
>> This change would break existing applications that have come
>> to depend on the periodic signal.
>>
>> I don't disagree with the argument that the data ready signal
>> should be sent only when a block is full or a timer expires and
>> at least some data is waiting, but that is moot at this point.
>
> I missed something. As pointed by Guy Harris <guy@alum.mit.edu>,
> before the previous patch periodic signal was not delivered. The previous
> patch
> (da413eec729dae5dc by Dan Collins <dan@dcollins.co.nz>) is for 3.19 kernel
> only. Should we care about existing 3.19-only applications?

It does sound reasonable to expect processes to handle infinite sleep
on no data if that is the historical behavior of the interface.

>>
>>> Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
>>> ---
>>>   net/packet/af_packet.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>>> index 9cfe2e1..9a2f70a 100644
>>> --- a/net/packet/af_packet.c
>>> +++ b/net/packet/af_packet.c
>>> @@ -698,6 +698,10 @@ static void prb_retire_rx_blk_timer_expired(unsigned long data)
>>>
>>>          if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
>>>                  if (!frozen) {
>>> +                       if (!BLOCK_NUM_PKTS(pbd)) {
>>> +                               /* An empty block. Just refresh the timer. */
>>> +                               goto refresh_timer;
>>> +                       }
>>>                          prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
>>>                          if (!prb_dispatch_next_block(pkc, po))
>>>                                  goto refresh_timer;
>>> @@ -798,7 +802,11 @@ static void prb_close_block(struct tpacket_kbdq_core *pkc1,
>>>                  h1->ts_last_pkt.ts_sec = last_pkt->tp_sec;
>>>                  h1->ts_last_pkt.ts_nsec = last_pkt->tp_nsec;
>>>          } else {
>>> -               /* Ok, we tmo'd - so get the current time */
>>> +               /* Ok, we tmo'd - so get the current time.
>>> +                *
>>> +                * It shouldn't really happen as we don't close empty
>>> +                * blocks. See prb_retire_rx_blk_timer_expired().
>>> +                */
>>>                  struct timespec ts;
>>>                  getnstimeofday(&ts);
>>>                  h1->ts_last_pkt.ts_sec = ts.tv_sec;
>>> --
>>> 1.9.1
>>>
>

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 9cfe2e1..9a2f70a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -698,6 +698,10 @@  static void prb_retire_rx_blk_timer_expired(unsigned long data)
 
 	if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
 		if (!frozen) {
+			if (!BLOCK_NUM_PKTS(pbd)) {
+				/* An empty block. Just refresh the timer. */
+				goto refresh_timer;
+			}
 			prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
 			if (!prb_dispatch_next_block(pkc, po))
 				goto refresh_timer;
@@ -798,7 +802,11 @@  static void prb_close_block(struct tpacket_kbdq_core *pkc1,
 		h1->ts_last_pkt.ts_sec = last_pkt->tp_sec;
 		h1->ts_last_pkt.ts_nsec	= last_pkt->tp_nsec;
 	} else {
-		/* Ok, we tmo'd - so get the current time */
+		/* Ok, we tmo'd - so get the current time.
+		 *
+		 * It shouldn't really happen as we don't close empty
+		 * blocks. See prb_retire_rx_blk_timer_expired().
+		 */
 		struct timespec ts;
 		getnstimeofday(&ts);
 		h1->ts_last_pkt.ts_sec = ts.tv_sec;