diff mbox

[net,1/2] net: dev_queue_xmit_nit: fix skb->vlan_tci field value

Message ID 1357671093-9605-2-git-send-email-dborkman@redhat.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Daniel Borkmann Jan. 8, 2013, 6:51 p.m. UTC
VLAN packets that are locally injected through taps will loose their
skb->vlan_tci value when they pass dev_hard_start_xmit and get looped
back to a packet sniffer via dev_queue_xmit_nit. Besides others, this
meta data is used in Linux socket filtering for VLANs. Tested with a
VLAN ancillary ops filter.

Patch is based on a previous version by Jiri Pirko.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ani Sinha <ani@aristanetworks.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Reported-by: Paul Pearce <pearce@cs.berkeley.edu>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/core/dev.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Ani Sinha Jan. 8, 2013, 7:54 p.m. UTC | #1
Agreed with the fix. This fixes the issue introduced by this change :l


On Tue, Jan 8, 2013 at 10:51 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> VLAN packets that are locally injected through taps will loose their
> skb->vlan_tci value when they pass dev_hard_start_xmit and get looped
> back to a packet sniffer via dev_queue_xmit_nit. Besides others, this
> meta data is used in Linux socket filtering for VLANs. Tested with a
> VLAN ancillary ops filter.
>
> Patch is based on a previous version by Jiri Pirko.
>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Ani Sinha <ani@aristanetworks.com>
> Cc: Jiri Pirko <jpirko@redhat.com>
> Reported-by: Paul Pearce <pearce@cs.berkeley.edu>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>

Acked-by: Ani Sinha <ani@aristanetworks.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 8, 2013, 8:04 p.m. UTC | #2
On Tue, 2013-01-08 at 19:51 +0100, Daniel Borkmann wrote:
> VLAN packets that are locally injected through taps will loose their
> skb->vlan_tci value when they pass dev_hard_start_xmit and get looped
> back to a packet sniffer via dev_queue_xmit_nit. Besides others, this
> meta data is used in Linux socket filtering for VLANs. Tested with a
> VLAN ancillary ops filter.
> 
> Patch is based on a previous version by Jiri Pirko.
> 
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Ani Sinha <ani@aristanetworks.com>
> Cc: Jiri Pirko <jpirko@redhat.com>
> Reported-by: Paul Pearce <pearce@cs.berkeley.edu>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>  net/core/dev.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 515473e..723dcd0 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1775,6 +1775,19 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
>  	struct packet_type *ptype;
>  	struct sk_buff *skb2 = NULL;
>  	struct packet_type *pt_prev = NULL;
> +	struct ethhdr *ehdr;
> +
> +	/* Network taps could make use of skb->vlan_tci, which got wiped
> +	 * out. Hence, we need to reset it correctly.
> +	 */
> +	skb_reset_mac_header(skb);
> +	ehdr = eth_hdr(skb);
> +
> +	if (ehdr->h_proto == __constant_htons(ETH_P_8021Q)) {
> +		skb2 = vlan_untag(skb);
> +		if (likely(skb2))
> +			skb = skb2;
> +	}
>  
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(ptype, &ptype_all, list) {

This patch is wrong (it adds a leak), and not needed.

If a packet has no vlan_tci, its for a good reason.

We want sniffer see the packet content as is.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Jan. 8, 2013, 8:14 p.m. UTC | #3
Tue, Jan 08, 2013 at 07:51:32PM CET, dborkman@redhat.com wrote:
>VLAN packets that are locally injected through taps will loose their
>skb->vlan_tci value when they pass dev_hard_start_xmit and get looped
>back to a packet sniffer via dev_queue_xmit_nit. Besides others, this
>meta data is used in Linux socket filtering for VLANs. Tested with a
>VLAN ancillary ops filter.
>
>Patch is based on a previous version by Jiri Pirko.
>
>Cc: Eric Dumazet <eric.dumazet@gmail.com>
>Cc: Ani Sinha <ani@aristanetworks.com>
>Cc: Jiri Pirko <jpirko@redhat.com>
>Reported-by: Paul Pearce <pearce@cs.berkeley.edu>
>Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>---
> net/core/dev.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 515473e..723dcd0 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -1775,6 +1775,19 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
> 	struct packet_type *ptype;
> 	struct sk_buff *skb2 = NULL;
> 	struct packet_type *pt_prev = NULL;
>+	struct ethhdr *ehdr;
>+
>+	/* Network taps could make use of skb->vlan_tci, which got wiped
>+	 * out. Hence, we need to reset it correctly.
>+	 */
>+	skb_reset_mac_header(skb);
>+	ehdr = eth_hdr(skb);
>+
>+	if (ehdr->h_proto == __constant_htons(ETH_P_8021Q)) {
>+		skb2 = vlan_untag(skb);
>+		if (likely(skb2))
>+			skb = skb2;
>+	}

	Hmm, nitpick, I think that better would be to do:
		skb = vlan_untag(skb);
		if (unlikely(!skb))
			return;
	
	I believe that better is to deliver skbs in consistent way and
	to do not deliver at all in case of -ENOMEM

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Jan. 8, 2013, 8:22 p.m. UTC | #4
Tue, Jan 08, 2013 at 09:04:52PM CET, eric.dumazet@gmail.com wrote:
>On Tue, 2013-01-08 at 19:51 +0100, Daniel Borkmann wrote:
>> VLAN packets that are locally injected through taps will loose their
>> skb->vlan_tci value when they pass dev_hard_start_xmit and get looped
>> back to a packet sniffer via dev_queue_xmit_nit. Besides others, this
>> meta data is used in Linux socket filtering for VLANs. Tested with a
>> VLAN ancillary ops filter.
>> 
>> Patch is based on a previous version by Jiri Pirko.
>> 
>> Cc: Eric Dumazet <eric.dumazet@gmail.com>
>> Cc: Ani Sinha <ani@aristanetworks.com>
>> Cc: Jiri Pirko <jpirko@redhat.com>
>> Reported-by: Paul Pearce <pearce@cs.berkeley.edu>
>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>> ---
>>  net/core/dev.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>> 
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 515473e..723dcd0 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1775,6 +1775,19 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
>>  	struct packet_type *ptype;
>>  	struct sk_buff *skb2 = NULL;
>>  	struct packet_type *pt_prev = NULL;
>> +	struct ethhdr *ehdr;
>> +
>> +	/* Network taps could make use of skb->vlan_tci, which got wiped
>> +	 * out. Hence, we need to reset it correctly.
>> +	 */
>> +	skb_reset_mac_header(skb);
>> +	ehdr = eth_hdr(skb);
>> +
>> +	if (ehdr->h_proto == __constant_htons(ETH_P_8021Q)) {
>> +		skb2 = vlan_untag(skb);
>> +		if (likely(skb2))
>> +			skb = skb2;
>> +	}
>>  
>>  	rcu_read_lock();
>>  	list_for_each_entry_rcu(ptype, &ptype_all, list) {
>
>This patch is wrong (it adds a leak), and not needed.
>
>If a packet has no vlan_tci, its for a good reason.
>
>We want sniffer see the packet content as is.


The issue is that for exmaple in af_packet the function packet_rcv()
expects vlan_tci to be filled out as that is on RX path ensured by
__netif_receive_skb(). However on TX path, dev_queue_xmit_nit() is
called with vlan_tci cleaned in case the device does not have TX vlan
accel enabled. This patch is trying to fix this difference.

>
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 8, 2013, 8:42 p.m. UTC | #5
On Tue, 2013-01-08 at 21:22 +0100, Jiri Pirko wrote:

> 
> The issue is that for exmaple in af_packet the function packet_rcv()
> expects vlan_tci to be filled out as that is on RX path ensured by
> __netif_receive_skb(). However on TX path, dev_queue_xmit_nit() is
> called with vlan_tci cleaned in case the device does not have TX vlan
> accel enabled. This patch is trying to fix this difference.

I perfectly understood that, and I repeat :

I want to see the difference.

A filter cannot expect skb->vlan_tci is set for all packets.

If a device doesn't have TX vlan accel, vlan_tci is 0.

packet sniffing is already slow, we don't want to force an extra copy
with vlan_untag() killer.

If you want to fix/extend af_packet, do this in af_packet.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 515473e..723dcd0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1775,6 +1775,19 @@  static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 	struct packet_type *ptype;
 	struct sk_buff *skb2 = NULL;
 	struct packet_type *pt_prev = NULL;
+	struct ethhdr *ehdr;
+
+	/* Network taps could make use of skb->vlan_tci, which got wiped
+	 * out. Hence, we need to reset it correctly.
+	 */
+	skb_reset_mac_header(skb);
+	ehdr = eth_hdr(skb);
+
+	if (ehdr->h_proto == __constant_htons(ETH_P_8021Q)) {
+		skb2 = vlan_untag(skb);
+		if (likely(skb2))
+			skb = skb2;
+	}
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(ptype, &ptype_all, list) {