diff mbox

[RFC] net: Pass full skb hash to ndo_rx_flow_steer

Message ID 1416370137-2512-1-git-send-email-therbert@google.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Tom Herbert Nov. 19, 2014, 4:08 a.m. UTC
Currently, for aRFS the index into the flow table is passed to
ndo_rx_flow_steer as the flow ID of a connection. This is the skb->hash
& the table mask. It looks like the backend can accept the full
skb->hash as the flow ID which should reduce the number of collisions
in the hardware tables.

This patch provides the skb->hash to the driver for flow steering.
Expiration of HW steered flows was also updated.

With a hash collision in RFS, ndo_rx_flow_steer will continue to be
called with different CPUs, but now with different flow_ids. If this
is still too much device interaction, then it might make sense for the
driver to do its own lookup in its structure to see if a matching
filter is already installed for a given flow_id, an if it is just
refresh a timestamp to avoid expiration (based on looking at sfc
driver).

I don't currently have any HW to test this, if someone could try this
on hardware with aRFS and provide feedback that would be appreciated.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/core/dev.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

David Miller Nov. 19, 2014, 11:56 p.m. UTC | #1
From: Tom Herbert <therbert@google.com>
Date: Tue, 18 Nov 2014 20:08:57 -0800

> Currently, for aRFS the index into the flow table is passed to
> ndo_rx_flow_steer as the flow ID of a connection. This is the skb->hash
> & the table mask. It looks like the backend can accept the full
> skb->hash as the flow ID which should reduce the number of collisions
> in the hardware tables.
> 
> This patch provides the skb->hash to the driver for flow steering.
> Expiration of HW steered flows was also updated.
> 
> With a hash collision in RFS, ndo_rx_flow_steer will continue to be
> called with different CPUs, but now with different flow_ids. If this
> is still too much device interaction, then it might make sense for the
> driver to do its own lookup in its structure to see if a matching
> filter is already installed for a given flow_id, an if it is just
> refresh a timestamp to avoid expiration (based on looking at sfc
> driver).
> 
> I don't currently have any HW to test this, if someone could try this
> on hardware with aRFS and provide feedback that would be appreciated.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>

This seems legitimate to me.

The only thing a driver can reliably do with the flow_id is
pass it back into rps_may_expire_flow().  Therefore you can
pass it in whatever format you like.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Lutomirski Nov. 20, 2014, 12:21 a.m. UTC | #2
On Tue, Nov 18, 2014 at 8:08 PM, Tom Herbert <therbert@google.com> wrote:
> Currently, for aRFS the index into the flow table is passed to
> ndo_rx_flow_steer as the flow ID of a connection. This is the skb->hash
> & the table mask. It looks like the backend can accept the full
> skb->hash as the flow ID which should reduce the number of collisions
> in the hardware tables.
>
> This patch provides the skb->hash to the driver for flow steering.
> Expiration of HW steered flows was also updated.
>
> With a hash collision in RFS, ndo_rx_flow_steer will continue to be
> called with different CPUs, but now with different flow_ids. If this
> is still too much device interaction, then it might make sense for the
> driver to do its own lookup in its structure to see if a matching
> filter is already installed for a given flow_id, an if it is just
> refresh a timestamp to avoid expiration (based on looking at sfc
> driver).
>
> I don't currently have any HW to test this, if someone could try this
> on hardware with aRFS and provide feedback that would be appreciated.
>
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  net/core/dev.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1ab168e..cb1e06d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3057,7 +3057,8 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
>                         goto out;
>                 flow_id = skb_get_hash(skb) & flow_table->mask;
>                 rc = dev->netdev_ops->ndo_rx_flow_steer(dev, skb,
> -                                                       rxq_index, flow_id);
> +                                                       rxq_index,
> +                                                       skb_get_hash(skb));

Can gcc CSE this?  Not that it matters that much.

>                 if (rc < 0)
>                         goto out;
>                 old_rflow = rflow;
> @@ -3195,8 +3196,8 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
>
>         rcu_read_lock();
>         flow_table = rcu_dereference(rxqueue->rps_flow_table);
> -       if (flow_table && flow_id <= flow_table->mask) {
> -               rflow = &flow_table->flows[flow_id];
> +       if (flow_table) {
> +               rflow = &flow_table->flows[flow_id & flow_table->mask];

I think this is nicer, but why will it help?  If there's a collision
in the low bits of the hash, we'll still think that the flow is
expired.

Is there a real LRU-ish hash table that could be subbed in easily?
There ought to be a data structure that's barely more complicated than
this kind of hash table but that gives much better behavior when the
number of flows is smaller than the table size.

--Andy

>                 cpu = ACCESS_ONCE(rflow->cpu);
>                 if (rflow->filter == filter_id && cpu != RPS_NO_CPU &&
>                     ((int)(per_cpu(softnet_data, cpu).input_queue_head -
> --
> 2.1.0.rc2.206.gedb03e5
>
Tom Herbert Nov. 20, 2014, 1:04 a.m. UTC | #3
On Wed, Nov 19, 2014 at 4:21 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Tue, Nov 18, 2014 at 8:08 PM, Tom Herbert <therbert@google.com> wrote:
>> Currently, for aRFS the index into the flow table is passed to
>> ndo_rx_flow_steer as the flow ID of a connection. This is the skb->hash
>> & the table mask. It looks like the backend can accept the full
>> skb->hash as the flow ID which should reduce the number of collisions
>> in the hardware tables.
>>
>> This patch provides the skb->hash to the driver for flow steering.
>> Expiration of HW steered flows was also updated.
>>
>> With a hash collision in RFS, ndo_rx_flow_steer will continue to be
>> called with different CPUs, but now with different flow_ids. If this
>> is still too much device interaction, then it might make sense for the
>> driver to do its own lookup in its structure to see if a matching
>> filter is already installed for a given flow_id, an if it is just
>> refresh a timestamp to avoid expiration (based on looking at sfc
>> driver).
>>
>> I don't currently have any HW to test this, if someone could try this
>> on hardware with aRFS and provide feedback that would be appreciated.
>>
>> Signed-off-by: Tom Herbert <therbert@google.com>
>> ---
>>  net/core/dev.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 1ab168e..cb1e06d 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3057,7 +3057,8 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
>>                         goto out;
>>                 flow_id = skb_get_hash(skb) & flow_table->mask;
>>                 rc = dev->netdev_ops->ndo_rx_flow_steer(dev, skb,
>> -                                                       rxq_index, flow_id);
>> +                                                       rxq_index,
>> +                                                       skb_get_hash(skb));
>
> Can gcc CSE this?  Not that it matters that much.
>
>>                 if (rc < 0)
>>                         goto out;
>>                 old_rflow = rflow;
>> @@ -3195,8 +3196,8 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
>>
>>         rcu_read_lock();
>>         flow_table = rcu_dereference(rxqueue->rps_flow_table);
>> -       if (flow_table && flow_id <= flow_table->mask) {
>> -               rflow = &flow_table->flows[flow_id];
>> +       if (flow_table) {
>> +               rflow = &flow_table->flows[flow_id & flow_table->mask];
>
> I think this is nicer, but why will it help?  If there's a collision
> in the low bits of the hash, we'll still think that the flow is
> expired.
>
> Is there a real LRU-ish hash table that could be subbed in easily?

I think this does exist in sfc, or at least could potentially be
updated to do it. When a filter is being inserted, there is a lookup
on the spec to see if a matching filter already exists. If it is
found, -EBUSY is returned. I "think" that instead of returning -EBUSY,
the filter index could be returned and a timestamp in the soft data
structure could be refreshed. In the driver expiration code, the
timestamp could be checked before deciding to call
rps_may_expire_flow. In the case of a collision in the RPS flow table,
ndo_rx_flow_steer would be alternately called for the flows which
should continuously refresh the timestamps of the filters for each
flow. This should keep both flow filters active in the device, without
needing to whack the device for each call to ndo_rx_flow_steer.

> There ought to be a data structure that's barely more complicated than
> this kind of hash table but that gives much better behavior when the
> number of flows is smaller than the table size.
>
> --Andy
>
>>                 cpu = ACCESS_ONCE(rflow->cpu);
>>                 if (rflow->filter == filter_id && cpu != RPS_NO_CPU &&
>>                     ((int)(per_cpu(softnet_data, cpu).input_queue_head -
>> --
>> 2.1.0.rc2.206.gedb03e5
>>
>
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 1ab168e..cb1e06d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3057,7 +3057,8 @@  set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 			goto out;
 		flow_id = skb_get_hash(skb) & flow_table->mask;
 		rc = dev->netdev_ops->ndo_rx_flow_steer(dev, skb,
-							rxq_index, flow_id);
+							rxq_index,
+							skb_get_hash(skb));
 		if (rc < 0)
 			goto out;
 		old_rflow = rflow;
@@ -3195,8 +3196,8 @@  bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
 
 	rcu_read_lock();
 	flow_table = rcu_dereference(rxqueue->rps_flow_table);
-	if (flow_table && flow_id <= flow_table->mask) {
-		rflow = &flow_table->flows[flow_id];
+	if (flow_table) {
+		rflow = &flow_table->flows[flow_id & flow_table->mask];
 		cpu = ACCESS_ONCE(rflow->cpu);
 		if (rflow->filter == filter_id && cpu != RPS_NO_CPU &&
 		    ((int)(per_cpu(softnet_data, cpu).input_queue_head -