diff mbox series

[net-next,v2] route: Add a new fib_multipath_hash_policy base on cpu id for tunnel packet

Message ID 1550827209-23440-1-git-send-email-wenxu@ucloud.cn
State Changes Requested
Delegated to: David Miller
Headers show
Series [net-next,v2] route: Add a new fib_multipath_hash_policy base on cpu id for tunnel packet | expand

Commit Message

wenxu Feb. 22, 2019, 9:20 a.m. UTC
From: wenxu <wenxu@ucloud.cn>

Current fib_multipath_hash_policy can make hash based on the L3 or
L4. But it only work on the outer IP. So a specific tunnel always
has the same hash value. But a specific tunnel may contain so many
inner connections. However there is no good way for tunnel packet.
A specific tunnel route based on the percpu dst_cache. It will not
lookup route table for each packet.

This patch provide a based cpu id hash policy. The different
connection run on different cpu and there will be different hash
value for percpu dst_cache.

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 net/ipv4/route.c           | 6 ++++++
 net/ipv4/sysctl_net_ipv4.c | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

Comments

Nikolay Aleksandrov Feb. 22, 2019, 10:26 a.m. UTC | #1
On 22/02/2019 11:20, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> Current fib_multipath_hash_policy can make hash based on the L3 or
> L4. But it only work on the outer IP. So a specific tunnel always
> has the same hash value. But a specific tunnel may contain so many
> inner connections. However there is no good way for tunnel packet.
> A specific tunnel route based on the percpu dst_cache. It will not
> lookup route table for each packet.
> 
> This patch provide a based cpu id hash policy. The different
> connection run on different cpu and there will be different hash
> value for percpu dst_cache.
> 
> Signed-off-by: wenxu <wenxu@ucloud.cn>
> ---
>  net/ipv4/route.c           | 6 ++++++
>  net/ipv4/sysctl_net_ipv4.c | 2 +-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
Hi,
When we had the same issue in the bonding, we added a new mode which used
the flow dissector to get the inner headers and hash on them.
I believe that is even easier nowadays, but I think people wanted to
use different hash algorithms as well and last we discussed this we
were talking about yet another bpf use to generate the hash.
Now we got bpf_set_hash() which can be used to achieve this from various
places, if you use that with hash_policy=1 (L4) then skb->hash will
be used and you can achieve any hashing that you desire.

Cheers,
 Nik
wenxu Feb. 22, 2019, 12:50 p.m. UTC | #2
On 2019/2/22 下午6:26, Nikolay Aleksandrov wrote:
> On 22/02/2019 11:20, wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> Current fib_multipath_hash_policy can make hash based on the L3 or
>> L4. But it only work on the outer IP. So a specific tunnel always
>> has the same hash value. But a specific tunnel may contain so many
>> inner connections. However there is no good way for tunnel packet.
>> A specific tunnel route based on the percpu dst_cache. It will not
>> lookup route table for each packet.
>>
>> This patch provide a based cpu id hash policy. The different
>> connection run on different cpu and there will be different hash
>> value for percpu dst_cache.
>>
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>> ---
>>  net/ipv4/route.c           | 6 ++++++
>>  net/ipv4/sysctl_net_ipv4.c | 2 +-
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
> Hi,
> When we had the same issue in the bonding, we added a new mode which used
> the flow dissector to get the inner headers and hash on them.
> I believe that is even easier nowadays, but I think people wanted to
> use different hash algorithms as well and last we discussed this we
> were talking about yet another bpf use to generate the hash.
> Now we got bpf_set_hash() which can be used to achieve this from various
> places, if you use that with hash_policy=1 (L4) then skb->hash will
> be used and you can achieve any hashing that you desire.
>
> Cheers,
>  Nik
>
>
Yes, we can set the skb->hash. But ip tunnel lookup the route table without skb.

The most important for performance issue most tunnel send work with dst_cache.

The dst_cache is percpu cache. So the number of dst_entry of a specific tunnel is the cpu cores .

So It's more better hash based on outer and cpu id.
Nikolay Aleksandrov Feb. 22, 2019, 4:08 p.m. UTC | #3
On 22/02/2019 14:50, wenxu wrote:
> On 2019/2/22 下午6:26, Nikolay Aleksandrov wrote:
>> On 22/02/2019 11:20, wenxu@ucloud.cn wrote:
>>> From: wenxu <wenxu@ucloud.cn>
>>>
>>> Current fib_multipath_hash_policy can make hash based on the L3 or
>>> L4. But it only work on the outer IP. So a specific tunnel always
>>> has the same hash value. But a specific tunnel may contain so many
>>> inner connections. However there is no good way for tunnel packet.
>>> A specific tunnel route based on the percpu dst_cache. It will not
>>> lookup route table for each packet.
>>>
>>> This patch provide a based cpu id hash policy. The different
>>> connection run on different cpu and there will be different hash
>>> value for percpu dst_cache.
>>>
>>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>>> ---
>>>  net/ipv4/route.c           | 6 ++++++
>>>  net/ipv4/sysctl_net_ipv4.c | 2 +-
>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>
>> Hi,
>> When we had the same issue in the bonding, we added a new mode which used
>> the flow dissector to get the inner headers and hash on them.
>> I believe that is even easier nowadays, but I think people wanted to
>> use different hash algorithms as well and last we discussed this we
>> were talking about yet another bpf use to generate the hash.
>> Now we got bpf_set_hash() which can be used to achieve this from various
>> places, if you use that with hash_policy=1 (L4) then skb->hash will
>> be used and you can achieve any hashing that you desire.
>>
>> Cheers,
>>  Nik
>>
>>
> Yes, we can set the skb->hash. But ip tunnel lookup the route table without skb.
> 
> The most important for performance issue most tunnel send work with dst_cache.
> 
> The dst_cache is percpu cache. So the number of dst_entry of a specific tunnel is the cpu cores .
> 
> So It's more better hash based on outer and cpu id.
> 
> 

I was just hoping for a more generic solution, anyway if this would go forward
please add documentation for the new option in Documentation/networking/ip-sysctl.txt
there's a description of fib_multipath_hash_policy. Also how is this supposed to work
if skb is present & skb->hash is set ? I believe it will be ignored, so it really only
works for lookups without an skb (or without a generated hash).

A thought (unchecked/untested, may not make much sense, feel free to ignore):
Why don't you make it mix flowi4_mark in with the hash and add some tunnel option
which mixes the CPU in that field ?
That is the new '2' mode will mix flowi4_mark with the hash and if some specific
tunnel option is set, the tunnel code will mix smp_current_id() with fl4's mark
before calling ip_route_output(), this sounds more useful to me as users will be
able to affect the hash calculation via the mark from other places in different
ways.

Cheers,
 Nik
wenxu Feb. 23, 2019, 12:45 a.m. UTC | #4
On 2019/2/23 上午12:08, Nikolay Aleksandrov wrote:
> On 22/02/2019 14:50, wenxu wrote:
>> On 2019/2/22 下午6:26, Nikolay Aleksandrov wrote:
>>> On 22/02/2019 11:20, wenxu@ucloud.cn wrote:
>>>> From: wenxu <wenxu@ucloud.cn>
>>>>
>>>> Current fib_multipath_hash_policy can make hash based on the L3 or
>>>> L4. But it only work on the outer IP. So a specific tunnel always
>>>> has the same hash value. But a specific tunnel may contain so many
>>>> inner connections. However there is no good way for tunnel packet.
>>>> A specific tunnel route based on the percpu dst_cache. It will not
>>>> lookup route table for each packet.
>>>>
>>>> This patch provide a based cpu id hash policy. The different
>>>> connection run on different cpu and there will be different hash
>>>> value for percpu dst_cache.
>>>>
>>>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>>>> ---
>>>>  net/ipv4/route.c           | 6 ++++++
>>>>  net/ipv4/sysctl_net_ipv4.c | 2 +-
>>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>>
>>> Hi,
>>> When we had the same issue in the bonding, we added a new mode which used
>>> the flow dissector to get the inner headers and hash on them.
>>> I believe that is even easier nowadays, but I think people wanted to
>>> use different hash algorithms as well and last we discussed this we
>>> were talking about yet another bpf use to generate the hash.
>>> Now we got bpf_set_hash() which can be used to achieve this from various
>>> places, if you use that with hash_policy=1 (L4) then skb->hash will
>>> be used and you can achieve any hashing that you desire.
>>>
>>> Cheers,
>>>  Nik
>>>
>>>
>> Yes, we can set the skb->hash. But ip tunnel lookup the route table without skb.
>>
>> The most important for performance issue most tunnel send work with dst_cache.
>>
>> The dst_cache is percpu cache. So the number of dst_entry of a specific tunnel is the cpu cores .
>>
>> So It's more better hash based on outer and cpu id.
>>
>>
> I was just hoping for a more generic solution, anyway if this would go forward
> please add documentation for the new option in Documentation/networking/ip-sysctl.txt
> there's a description of fib_multipath_hash_policy. Also how is this supposed to work
> if skb is present & skb->hash is set ? I believe it will be ignored, so it really only
> works for lookups without an skb (or without a generated hash).
>
> A thought (unchecked/untested, may not make much sense, feel free to ignore):
> Why don't you make it mix flowi4_mark in with the hash and add some tunnel option
> which mixes the CPU in that field ?
> That is the new '2' mode will mix flowi4_mark with the hash and if some specific
> tunnel option is set, the tunnel code will mix smp_current_id() with fl4's mark
> before calling ip_route_output(), this sounds more useful to me as users will be
> able to affect the hash calculation via the mark from other places in different
> ways.
>
> Cheers,
>  Nik
>
>
Yes, I agree that it need a more generic solution. So It should consider the both percpu cache and not cache situations.

Maye add a fl4.fl4_inner_hash for support this? It always hash according to the inner if it is the tunnel packet(

set the inner_hash by tunnel)

even there is a percpu cache it's also make the connections on the same core with the same hash result.
diff mbox series

Patch

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index ecc12a7..6cf2fd4 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1821,6 +1821,7 @@  int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4,
 		       const struct sk_buff *skb, struct flow_keys *flkeys)
 {
 	struct flow_keys hash_keys;
+	u32 cpu = 0;
 	u32 mhash;
 
 	switch (net->ipv4.sysctl_fib_multipath_hash_policy) {
@@ -1834,6 +1835,8 @@  int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4,
 			hash_keys.addrs.v4addrs.dst = fl4->daddr;
 		}
 		break;
+	case 2:
+		cpu = smp_processor_id() + 1;
 	case 1:
 		/* skb is currently provided only when forwarding */
 		if (skb) {
@@ -1870,6 +1873,9 @@  int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4,
 	}
 	mhash = flow_hash_from_keys(&hash_keys);
 
+	if (cpu)
+		mhash = jhash_2words(mhash, cpu, 0);
+
 	return mhash >> 1;
 }
 #endif /* CONFIG_IP_ROUTE_MULTIPATH */
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index ba0fc4b..708bbbb 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -951,7 +951,7 @@  static int proc_fib_multipath_hash_policy(struct ctl_table *table, int write,
 		.mode		= 0644,
 		.proc_handler	= proc_fib_multipath_hash_policy,
 		.extra1		= &zero,
-		.extra2		= &one,
+		.extra2		= &two,
 	},
 #endif
 	{