diff mbox series

[net-next] net: slightly optimize eth_type_trans

Message ID 1542072871-21208-1-git-send-email-lirongqing@baidu.com
State Accepted, archived
Delegated to: David Miller
Headers show
Series [net-next] net: slightly optimize eth_type_trans | expand

Commit Message

Li RongQing Nov. 13, 2018, 1:34 a.m. UTC
netperf udp stream shows that eth_type_trans takes certain cpu,
so adjust the mac address check order, and firstly check if it
is device address, and only check if it is multicast address
only if not the device address.

After this change:
To unicast, and skb dst mac is device mac, this is most of time
reduce a comparision
To unicast, and skb dst mac is not device mac, nothing change
To multicast, increase a comparision

Before:
1.03%  [kernel]          [k] eth_type_trans

After:
0.78%  [kernel]          [k] eth_type_trans

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 net/ethernet/eth.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

Comments

David Miller Nov. 15, 2018, 11:11 p.m. UTC | #1
From: Li RongQing <lirongqing@baidu.com>
Date: Tue, 13 Nov 2018 09:34:31 +0800

> netperf udp stream shows that eth_type_trans takes certain cpu,
> so adjust the mac address check order, and firstly check if it
> is device address, and only check if it is multicast address
> only if not the device address.
> 
> After this change:
> To unicast, and skb dst mac is device mac, this is most of time
> reduce a comparision
> To unicast, and skb dst mac is not device mac, nothing change
> To multicast, increase a comparision
> 
> Before:
> 1.03%  [kernel]          [k] eth_type_trans
> 
> After:
> 0.78%  [kernel]          [k] eth_type_trans
> 
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>

Applied.
Eric Dumazet Nov. 17, 2018, 10:37 p.m. UTC | #2
On 11/15/2018 03:11 PM, David Miller wrote:

> 
> Applied.
> 

While reviewing this stuff, I found we have a bug.

If napi_reuse_skb() is called, we might inherit from prior skb->pkt_type value.

It seems that GRO could aggregate packets with pkt_type != PACKET_HOST, right ?

David, any objection if I submit the following fix ?

diff --git a/net/core/dev.c b/net/core/dev.c
index 5927f6a7c301ed90af21a4b82b443f30f00bb483..f2bfd2eda7b2734d29d30f0e82c1a48c1b5b166a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5664,6 +5664,10 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
        __vlan_hwaccel_clear_tag(skb);
        skb->dev = napi->dev;
        skb->skb_iif = 0;
+
+       /* eth_type_trans() assumes initial pkt_type is PACKET_HOST */
+       skb->pkt_type = PACKET_HOST;
+
        skb->encapsulation = 0;
        skb_shinfo(skb)->gso_type = 0;
        skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
David Miller Nov. 18, 2018, 12:51 a.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 17 Nov 2018 14:37:12 -0800

> 
> 
> On 11/15/2018 03:11 PM, David Miller wrote:
> 
>> 
>> Applied.
>> 
> 
> While reviewing this stuff, I found we have a bug.
> 
> If napi_reuse_skb() is called, we might inherit from prior skb->pkt_type value.
> 
> It seems that GRO could aggregate packets with pkt_type != PACKET_HOST, right ?
> 
> David, any objection if I submit the following fix ?

Oh weird, so we do GRO frags accumulation using SKB which never goes through
eth_type_trans()?

I don't understand how we can, in this circumstance, assume PACKET_HOST?

Because that is what your suggested patch does.

Frame could be UDP multicast, and we could legitimately GRO accumulate it.
In that situations setting PACKET_HOST doesn't seem correct.
Eric Dumazet Nov. 18, 2018, 1:19 a.m. UTC | #4
On 11/17/2018 04:51 PM, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sat, 17 Nov 2018 14:37:12 -0800
> 
>>
>>
>> On 11/15/2018 03:11 PM, David Miller wrote:
>>
>>>
>>> Applied.
>>>
>>
>> While reviewing this stuff, I found we have a bug.
>>
>> If napi_reuse_skb() is called, we might inherit from prior skb->pkt_type value.
>>
>> It seems that GRO could aggregate packets with pkt_type != PACKET_HOST, right ?
>>
>> David, any objection if I submit the following fix ?
> 
> Oh weird, so we do GRO frags accumulation using SKB which never goes through
> eth_type_trans()?
> 
> I don't understand how we can, in this circumstance, assume PACKET_HOST?
> 
> Because that is what your suggested patch does.
> 
> Frame could be UDP multicast, and we could legitimately GRO accumulate it.
> In that situations setting PACKET_HOST doesn't seem correct.
> 

I might have been not very clear.

Issue is that with macvlan and GRO-friend traffic we can receive the following packets :

P1-P2 with PACKET_OTHERHOST because eth_type_trans() detected the dst MAC is not the eth0 device mac address.

P2 has been aggregated to P1, so the sk_buff has been put to napi->skb via napi_reuse_skb()

Then we receive on same NAPI packet P3, for this host, reusing napi->skb that was saved (old P2 sk_buff).
skb->pkt_type is PACKET_OTHERHOST.

eth_type_trans() does not change skb->pkt_type because ethernet dst mac address is our ethernet mac address.

-> We feed the upper stack with P3, with incorrect pkt_type.

 -> packet is dropped because pkt_type != PACKET_HOST, for example in tcp_v4_rcv()
David Miller Nov. 18, 2018, 4:06 a.m. UTC | #5
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 17 Nov 2018 17:19:34 -0800

> I might have been not very clear.
> 
> Issue is that with macvlan and GRO-friend traffic we can receive the following packets :
> 
> P1-P2 with PACKET_OTHERHOST because eth_type_trans() detected the dst MAC is not the eth0 device mac address.
> 
> P2 has been aggregated to P1, so the sk_buff has been put to napi->skb via napi_reuse_skb()
> 
> Then we receive on same NAPI packet P3, for this host, reusing napi->skb that was saved (old P2 sk_buff).
> skb->pkt_type is PACKET_OTHERHOST.
> 
> eth_type_trans() does not change skb->pkt_type because ethernet dst mac address is our ethernet mac address.
> 
> -> We feed the upper stack with P3, with incorrect pkt_type.
> 
>  -> packet is dropped because pkt_type != PACKET_HOST, for example in tcp_v4_rcv()

Oh I see, it is about defaults when using skb freshly from cache for another new packet.

Yes, I completely agree with your suggested patch.

Thanks for explaining.
diff mbox series

Patch

diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index fd8faa0dfa61..1c88f5c5d5b1 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -165,15 +165,17 @@  __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
 	eth = (struct ethhdr *)skb->data;
 	skb_pull_inline(skb, ETH_HLEN);
 
-	if (unlikely(is_multicast_ether_addr_64bits(eth->h_dest))) {
-		if (ether_addr_equal_64bits(eth->h_dest, dev->broadcast))
-			skb->pkt_type = PACKET_BROADCAST;
-		else
-			skb->pkt_type = PACKET_MULTICAST;
+	if (unlikely(!ether_addr_equal_64bits(eth->h_dest,
+						   dev->dev_addr))) {
+		if (unlikely(is_multicast_ether_addr_64bits(eth->h_dest))) {
+			if (ether_addr_equal_64bits(eth->h_dest, dev->broadcast))
+				skb->pkt_type = PACKET_BROADCAST;
+			else
+				skb->pkt_type = PACKET_MULTICAST;
+		} else {
+			skb->pkt_type = PACKET_OTHERHOST;
+		}
 	}
-	else if (unlikely(!ether_addr_equal_64bits(eth->h_dest,
-						   dev->dev_addr)))
-		skb->pkt_type = PACKET_OTHERHOST;
 
 	/*
 	 * Some variants of DSA tagging don't have an ethertype field