diff mbox

[net-next,2/2] sunvnet: Re-check for a VIO_DESC_READY data descriptor after short udelay()

Message ID 20140829201809.GH14753@oracle.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Sowmini Varadhan Aug. 29, 2014, 8:18 p.m. UTC
Upon encountering the first !VIO_DESC_READY in vnet_walk_rx(),
it is frequently worthwhile to re-check the descriptor status
after a short microsecond delay, as a bursty sender could
be actively populating the descriptors, and the short udelay()
is far less expensive than rolling back to ldc_rx() and having
to wake up and read data on another LDC message.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Raghuram Kothakota <raghuram.kothakota@oracle.com>
---
 drivers/net/ethernet/sun/sunvnet.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

David Miller Sept. 2, 2014, 3:47 a.m. UTC | #1
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Fri, 29 Aug 2014 16:18:09 -0400

> @@ -398,6 +398,20 @@ again:
>  		err = vnet_walk_rx_one(port, dr, start, &ack);
>  		if (err == -ECONNRESET)
>  			return err;
> +		if (err != 0)  {
> +			/* The descriptor was not READY. Retry with a
> +			 * small delay, in case we have a bursty sender
> +			 * that is actively populating the descriptors, to
> +			 * reduce the overhead of stopping and re-entering
> +			 * which would involve expensive LDC messages.
> +			 */
> +			if (retries++ < 3) {
> +				udelay(4);
> +				goto again;
> +			} else {
> +				break;
> +			}
> +		}

I'm definitely not happy with this.

If there were no more packets coming, this is wasted useless polling
time in atomic context.

Everything should be event based, and we should not be compensating
and making sacrifices for a producer slower than we are as a consumer.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sowmini Varadhan Sept. 2, 2014, 10:27 a.m. UTC | #2
On 09/01/2014 11:47 PM, David Miller wrote:

>
> If there were no more packets coming, this is wasted useless polling
> time in atomic context.

turns out that this gives a 20-30% perf improvement for tests like
iperf.

when there are no more packets coming, the extra 12 microsecond
delay is not that big of a deal anyway. The point was that the extra
12 micro-second tax in the quiescent network state is less expensive
than exiting interrupt context, taking another interrupt and doing
another ldc_read, when there is actually a burst of packets.

notice that there are many other such udelay() loops elsewhere in
the code.

I can remove the retries and submit patch 1/1 again later today.

>
> Everything should be event based, and we should not be compensating
> and making sacrifices for a producer slower than we are as a consumer.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Raghuram Kothakota Sept. 2, 2014, 4:43 p.m. UTC | #3
On Sep 2, 2014, at 3:27 AM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:

> On 09/01/2014 11:47 PM, David Miller wrote:
> 
>> 
>> If there were no more packets coming, this is wasted useless polling
>> time in atomic context.
> 
> turns out that this gives a 20-30% perf improvement for tests like
> iperf.
> 
> when there are no more packets coming, the extra 12 microsecond
> delay is not that big of a deal anyway. The point was that the extra
> 12 micro-second tax in the quiescent network state is less expensive
> than exiting interrupt context, taking another interrupt and doing
> another ldc_read, when there is actually a burst of packets.
> 

We could optimize this a bit by not wait for normal traffic, I mean, non-burst traffic
and apply the retry  only  when we detect a stream of packets?
We could detect a stream based on how many packets are picked
up in this function, picking up 3 or more could be considered as a stream,
of course tune based on testing.

You probably tried it already, but checking to see if you tried with less number
of iterations, we could reduce the iterations if the numbers are equally good
with less iterations.

-Raghuram

> notice that there are many other such udelay() loops elsewhere in
> the code.
> 
> I can remove the retries and submit patch 1/1 again later today.
> 
>> 
>> Everything should be event based, and we should not be compensating
>> and making sacrifices for a producer slower than we are as a consumer.
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sowmini Varadhan Sept. 2, 2014, 4:56 p.m. UTC | #4
On (09/02/14 09:43), Raghuram Kothakota wrote:
> We could optimize this a bit by not wait for normal traffic, I mean,
> non-burst traffic
> and apply the retry  only  when we detect a stream of packets?

How could you tell the difference efficiently? You'd need to
track some kind of history/state for inter-packet arrival time.
All seems like over-kill (more useful to go and optimize other
parts of the system, such as do less work in interrupt context).

> We could detect a stream based on how many packets are picked
> up in this function, picking up 3 or more could be considered as a stream,
> of course tune based on testing.
> 
> You probably tried it already, but checking to see if you tried with
> less number
> of iterations, we could reduce the iterations if the numbers are equally good
> with less iterations.

yes, the 3 * 4 micro-seconds was arrived at heuristically.

--Sowmini

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Raghuram Kothakota Sept. 2, 2014, 5:33 p.m. UTC | #5
On Sep 2, 2014, at 9:56 AM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:

> On (09/02/14 09:43), Raghuram Kothakota wrote:
>> We could optimize this a bit by not wait for normal traffic, I mean,
>> non-burst traffic
>> and apply the retry  only  when we detect a stream of packets?
> 
> How could you tell the difference efficiently? You'd need to
> track some kind of history/state for inter-packet arrival time.
> All seems like over-kill (more useful to go and optimize other
> parts of the system, such as do less work in interrupt context).

From what I see, the vnet_walk_rx() picks up packets in a while loop,
we could count the number of packets picked up in that loop and use
that count as a method to determine if we need to apply this retry or not.
That is, retry only if that counter is > x,  that may avoid waiting for cases
where peer sent one packet only?   It may be worth trying it
and see if it still keeps up the improvement that you saw.

-Raghuram
> 
>> We could detect a stream based on how many packets are picked
>> up in this function, picking up 3 or more could be considered as a stream,
>> of course tune based on testing.
>> 
>> You probably tried it already, but checking to see if you tried with
>> less number
>> of iterations, we could reduce the iterations if the numbers are equally good
>> with less iterations.
> 
> yes, the 3 * 4 micro-seconds was arrived at heuristically.
> 
> --Sowmini
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Sept. 2, 2014, 8:59 p.m. UTC | #6
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Tue, 02 Sep 2014 06:27:54 -0400

> when there are no more packets coming, the extra 12 microsecond
> delay is not that big of a deal anyway.

How much other work could the cpu do in those 12 microseconds?

That's almost 3000 cpu cycles on a T4.

I understand your argument, and the fact that there are some existing
pieces of code doing this already, so I'll think about it some more.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sowmini Varadhan Sept. 3, 2014, 5:12 p.m. UTC | #7
On 09/02/2014 04:59 PM, David Miller wrote:
> From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Date: Tue, 02 Sep 2014 06:27:54 -0400
>
>> when there are no more packets coming, the extra 12 microsecond
>> delay is not that big of a deal anyway.
>
> How much other work could the cpu do in those 12 microseconds?
>
> That's almost 3000 cpu cycles on a T4.
>
> I understand your argument, and the fact that there are some existing
> pieces of code doing this already, so I'll think about it some more.
>
> Thanks.
>

Maybe we should just leave out patch 2/2 for now, and only retain 1/2.
I was always somewhat ambivalent about the fudge-factor anyway,
and there are plenty of other things we can do to get better
perf- we could revisit this one afterward, if it still makes
a difference.

--Sowmini

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index fc13b9c..7b1f320 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -398,6 +398,20 @@  again:
 		err = vnet_walk_rx_one(port, dr, start, &ack);
 		if (err == -ECONNRESET)
 			return err;
+		if (err != 0)  {
+			/* The descriptor was not READY. Retry with a
+			 * small delay, in case we have a bursty sender
+			 * that is actively populating the descriptors, to
+			 * reduce the overhead of stopping and re-entering
+			 * which would involve expensive LDC messages.
+			 */
+			if (retries++ < 3) {
+				udelay(4);
+				goto again;
+			} else {
+				break;
+			}
+		}
 		if (ack_start == -1)
 			ack_start = start;
 		ack_end = start;