[net-next,v4] rps: selective flow shedding during softnet overflow

Message ID 51773905.9030005@mojatatu.com
State RFC, archived
Delegated to: David Miller

Commit Message

Jamal Hadi Salim April 24, 2013, 1:44 a.m. UTC
On 13-04-23 09:32 PM, Eric Dumazet wrote:
> On Tue, 2013-04-23 at 21:25 -0400, Jamal Hadi Salim wrote:


> Not sure what you mean. The qdisc stuff would replace the 'cpu backlog',

Aha ;->
So you would have many little backlogs one per ring per cpu, correct?


> not be added to it. Think of having possibility to control backlog using
> standard qdiscs, like fq_codel ;)

Excellent. So this is not as big a surgery as it sounds then.
The backloglets just need to be exposed as netdevs.

> Yes, but the per cpu backlog is shared for all devices. We probably want
> different qdisc for gre tunnel, eth0, ...

Makes sense.

BTW, looking at __skb_get_rxhash(), if i had a driver that sets either
skb->rxhash (picks it off the dma descriptor), could i not use that 
instead of computing the hash? something like attached patch.

cheers,
jamal

Comments

Eric Dumazet April 24, 2013, 2:11 a.m. UTC | #1
On Tue, 2013-04-23 at 21:44 -0400, Jamal Hadi Salim wrote:

> 
> BTW, looking at __skb_get_rxhash(), if i had a driver that sets either
> skb->rxhash (picks it off the dma descriptor), could i not use that 
> instead of computing the hash? something like attached patch.
> 

The caller does this already ;)

static inline __u32 skb_get_rxhash(struct sk_buff *skb)
{
        if (!skb->l4_rxhash)
                __skb_get_rxhash(skb);

        return skb->rxhash;
}

Rationale being : if l4 rxhash was already provided, use it.

AFAIK, only bnx2x provides this.

For other cases, we prefer trying a software rxhash, as it gives us more
capabilities than the standard Toeplitz hash (which is not l4 for UDP
flows, for example)
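
To illustrate the remark above, here is a minimal userspace sketch of why
an address-only hash cannot separate UDP flows between the same two hosts,
while a hash that also covers the ports can. Everything in it is an
assumption made for illustration: struct flow4, hash_l3() and hash_l4() are
hypothetical names, and the FNV-1a mixing step is a stand-in, not Toeplitz
and not the hash the kernel's flow dissector uses.

```c
#include <stdint.h>

/* Hypothetical flow key for this sketch; not the kernel's struct flow_keys. */
struct flow4 {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* One FNV-1a step. Stand-in mixer only, not Toeplitz or the kernel hash. */
static uint32_t mix_byte(uint32_t h, uint8_t b)
{
	return (h ^ b) * 16777619u;
}

/* Feed a 32-bit value into the hash, least significant byte first. */
static uint32_t mix_u32(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		h = mix_byte(h, (uint8_t)(v >> (8 * i)));
	return h;
}

/* Addresses only: what a hash without l4 information gets to see. */
static uint32_t hash_l3(const struct flow4 *f)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	h = mix_u32(h, f->saddr);
	return mix_u32(h, f->daddr);
}

/* Addresses plus ports: an l4 hash, as a software rxhash can provide. */
static uint32_t hash_l4(const struct flow4 *f)
{
	uint32_t ports = ((uint32_t)f->sport << 16) | f->dport;

	return mix_u32(hash_l3(f), ports);
}
```

Two UDP flows that differ only in source port get the same hash_l3() value
and would therefore land on the same cpu, whereas hash_l4() tells them
apart.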



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim April 24, 2013, 1 p.m. UTC | #2
On 13-04-23 10:11 PM, Eric Dumazet wrote:

>
> The caller does this already ;)

[..]
>
> Rationale being : if l4 rxhash was already provided, use it.
>
> AFAIK, only bnx2x provides this.
>
> For other cases, we prefer trying a software rxhash, as it gives us more
> capabilities than the standard Toeplitz hash (Not l4 for UDP flows for
> example)
>


I forgot about the Toeplitz hash connection. I can see it makes sense here.

Let me clarify:
In the scenario i am thinking of, I have clever hardware which is smart
enough to deal with the details of identifying flow state (including
fragmentation etc.) and tagging it in a DMA descriptor with a 32-bit id.
I want to be able to take the tag produced by the hardware and use
that for rps cpu selection, i.e. assume the hardware has already done the
hashing and is giving me a 32-bit id. My initial thought was skb->rxhash
is the right spot to store this; then make get_rps_cpu() do the
selection based on this. l4 rxhash is 1 bit, which is too small.
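
For reference, a rough userspace sketch of how a 32-bit hash can be turned
into a cpu choice. To my recollection, get_rps_cpu() in kernels of this
period scales the hash with a multiply and shift of this form, but treat
that as an assumption and check the tree you are working on. The names
rps_map_sketch and pick_cpu() are hypothetical; the real code uses
struct rps_map and does more (flow tables, online-cpu checks).

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct rps_map. */
struct rps_map_sketch {
	unsigned int len;	/* number of candidate cpus, 1..8 here */
	uint16_t cpus[8];	/* cpu ids eligible for this rx queue */
};

/*
 * Scale a 32-bit hash into [0, len) without a modulo: multiply by len
 * and keep the upper 32 bits of the 64-bit product.
 */
static uint16_t pick_cpu(const struct rps_map_sketch *map, uint32_t hash)
{
	return map->cpus[((uint64_t)hash * map->len) >> 32];
}
```

With this scaling the full 32-bit value matters, so a hardware tag that
only varies in its low bits would spread poorly across cpus.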

cheers,
jamal
Eric Dumazet April 24, 2013, 2:41 p.m. UTC | #3
On Wed, 2013-04-24 at 09:00 -0400, Jamal Hadi Salim wrote:
> I forgot about the Toeplitz hash connection. I can see it makes sense here.
> 
> Let me clarify:
> In the scenario i am thinking of, I have clever hardware which is smart
> enough to deal with the details of identifying flow state (including
> fragmentation etc.) and tagging it in a DMA descriptor with a 32-bit id.
> I want to be able to take the tag produced by the hardware and use
> that for rps cpu selection, i.e. assume the hardware has already done the
> hashing and is giving me a 32-bit id. My initial thought was skb->rxhash
> is the right spot to store this; then make get_rps_cpu() do the
> selection based on this. l4 rxhash is 1 bit, which is too small.

Set skb->rxhash to the hash your hardware computed, and skb->l4_rxhash
to 1.

Then get_rps_cpu() will use skb->rxhash happily

(and other callers of skb_get_rxhash() as well)

Not clear what you mean by fragmentation: fragmented frames carry no
flow information (except in the first fragment)
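
The suggestion above can be sketched in userspace as follows. This is an
illustration under stated assumptions, not driver code: skb_sketch is a
cut-down stand-in for struct sk_buff holding only the two fields discussed,
rx_desc_sketch and its flow_tag field are hypothetical, and sw_get_rxhash()
merely stands in for __skb_get_rxhash() with an arbitrary value.

```c
#include <stdint.h>

/* Cut-down stand-in for struct sk_buff: only the two fields discussed. */
struct skb_sketch {
	uint32_t rxhash;
	unsigned int l4_rxhash:1;
};

/* Hypothetical rx descriptor carrying the hardware's 32-bit flow tag. */
struct rx_desc_sketch {
	uint32_t flow_tag;
};

static unsigned int sw_hash_calls;	/* counts software fallbacks */

/* Placeholder for __skb_get_rxhash(); the real one dissects the packet. */
static void sw_get_rxhash(struct skb_sketch *skb)
{
	sw_hash_calls++;
	skb->rxhash = 0x12345678u;	/* arbitrary value for the sketch */
}

/* Same shape as the skb_get_rxhash() quoted earlier in the thread. */
static uint32_t get_rxhash(struct skb_sketch *skb)
{
	if (!skb->l4_rxhash)
		sw_get_rxhash(skb);

	return skb->rxhash;
}

/* What a driver's rx path would do with the hardware tag. */
static void driver_fill_hash(struct skb_sketch *skb,
			     const struct rx_desc_sketch *desc)
{
	skb->rxhash = desc->flow_tag;
	skb->l4_rxhash = 1;	/* tells callers not to recompute */
}
```

Once the driver has set both fields, get_rxhash() returns the hardware tag
untouched and the software path is never entered; without the l4_rxhash
bit the tag would be overwritten by the software hash.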



Patch

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index e187bf0..a6abee0 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -159,8 +159,9 @@  void __skb_get_rxhash(struct sk_buff *skb)
 	struct flow_keys keys;
 	u32 hash;
 
-	if (!skb_flow_dissect(skb, &keys))
+	if (skb->rxhash || !skb_flow_dissect(skb, &keys)) {
 		return;
+	}
 
 	if (keys.ports)
 		skb->l4_rxhash = 1;