diff mbox

1/1 net: packet: Keep 802.1Q VLAN tag in packet on SOCK_DGRAM socket - resend

Message ID 001801ca8d1d$90c68de0$b253a9a0$@name
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Milan Dadok Jan. 4, 2010, 9:09 a.m. UTC
Keep 802.1Q VLAN tag on non HW vlan accelerated network card received to SOCK_DGRAM socket.

Signed-off-by: Milan Dadok <milan@dadok.name>

---
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Jan. 4, 2010, 10:36 a.m. UTC | #1
Le 04/01/2010 10:09, Milan Dadok a écrit :
> Keep 802.1Q VLAN tag on non HW vlan accelerated network card received to SOCK_DGRAM socket.
> 
> Signed-off-by: Milan Dadok <milan@dadok.name>
> 
> ---
> diff -urN af_packet.c.orig af_packet.c
> --- af_packet.c.orig    2009-12-26 12:34:15.000000000 +0100
> +++ af_packet.c 2009-12-28 14:31:14.000000000 +0100
> @@ -57,6 +57,8 @@
>  #include <linux/inet.h>
>  #include <linux/netdevice.h>
>  #include <linux/if_packet.h>
> +#include <linux/if_vlan.h>
> +#include "../8021q/vlan.h"

This is ugly, please move is_vlan_dev() in if_vlan.h instead ?

>  #include <linux/wireless.h>
>  #include <linux/kernel.h>
>  #include <linux/kmod.h>
> @@ -680,8 +682,28 @@
>                 if (sk->sk_type != SOCK_DGRAM)
>                         skb_push(skb, skb->data - skb_mac_header(skb));
>                 else if (skb->pkt_type == PACKET_OUTGOING) {
> -                       /* Special case: outgoing packets have ll header at head */
> -                       skb_pull(skb, skb_network_offset(skb));
> +                       /* Special case: outgoing packets have ll header at head
> +                        * but we must leave 802.1Q encapsulation etc. (only for non HW vlan accelerated)
> +                        * encasulation len = actual header_len minus hard_header_len
> +                        * packet outgoing from vlan1@eth0 on eth0 have skb_network_offset=18, hard_header_len=14
> +                        */
> +                       int hard_header_len;
> +                       struct net_device *pdev;
> +                       hard_header_len = dev->hard_header_len;
> +                       pdev = dev;
> +                       /* if dev is vlan device, hard_header_len contains 802.1Q encap, subtract it, recursively
> +                        * ie. vlan3@vlan2@vlan1@eth0
> +                        */
> +                       while (is_vlan_dev(pdev)) {
> +                               struct net_device *real_dev = vlan_dev_info(pdev)->real_dev;
A new line after variable definition is welcomed.
> +                               hard_header_len -= pdev->hard_header_len - real_dev->hard_header_len;
> +                               pdev = real_dev;
> +                       }
> +
> +                       skb_pull(skb, skb_network_offset(skb) -
> +                            (skb_network_offset(skb) - hard_header_len>0 ? skb_network_offset(skb) - hard_header_len : 0));
>                 }
>         }

I find this logic a bit hard to read, you might use this instead :

if (skb_network_offset(skb) > hard_header_len)
	skb_pull(skb, hard_header_len);
else
	skb_pull(skb, skb_network_offset(skb));




Also, could you please read Documentation/SubmittingPatches and
Documentation/email-clients.txt

Your patch has at least two problems :

1) It is not an unified diff
2) It has garbled tabulations (they are replaced by spaces)

It is not applicable as is.

Thank you
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Jan. 4, 2010, 1:52 p.m. UTC | #2
[please don't remove the mailing list from CC]

Milan Dadok wrote:
> Keep 802.1Q VLAN tag on non HW vlan accelerated network card received to SOCK_DGRAM socket.

Quoting from Documentation/networking/packet_mmap.txt:

fd= socket(PF_PACKET, mode, htons(ETH_P_ALL))

where mode is SOCK_RAW for the raw interface were link level
information can be captured or SOCK_DGRAM for the cooked
interface where link level information capture is not
supported and a link level pseudo-header is provided
by the kernel.

So not including the link layer header for SOCK_DGRAM sockets
seems to be the intended behaviour.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Milan Dadok Jan. 4, 2010, 3:28 p.m. UTC | #3
Patrick McHardy wrote:

>Milan Dadok wrote:
>> Keep 802.1Q VLAN tag on non HW vlan accelerated network card received to SOCK_DGRAM socket.

>So not including the link layer header for SOCK_DGRAM sockets
>seems to be the intended behaviour.

From my point of view i have question
Is 802.1Q encapsulation (or another type of encapsulation (IPSec?)) part of link level header or part of data packet?

Currently pseudo-header contains for OUTGOING packet on physical card (vlan10@eth1)
a) HW accelarated network card  
protocol = ethertype IPv4 (0x0800)
tci = vlan number = 10
and data starts with 4500 0028

b) non HW accelerated network card 
protocol = ethertype 802.1Q (0x8100)
tci = 0
and data starts with 4500 0028
vlan tci and real protocol number (ARP,IPV4,IPV6) of data is lost 

And with more nested vlans it is getting worse
for example

vlan1010@vlan10@eth1

a) HW accelarated network card  
protocol = ethertype IPv4 (0x8100)
tci = 10
and data starts with 4500 0028

the 4 bytes of real packet 03f2 0800 is lost too

b) non HW accelarated network card  
4 words of data packet are lost ...

I have no problems with received packets, only outgoing packet have problem.
I think that out packet on SOCK_DGRAM sockets MUST BE in same format as in (received) packet on same interface.
Can we agree on this?

Milan



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Jan. 4, 2010, 4:29 p.m. UTC | #4
Milan Dadok wrote:
> Patrick McHardy wrote:
> 
>> Milan Dadok wrote:
>>> Keep 802.1Q VLAN tag on non HW vlan accelerated network card received to SOCK_DGRAM socket.
> 
>> So not including the link layer header for SOCK_DGRAM sockets
>> seems to be the intended behaviour.
> 
>>From my point of view i have question
> Is 802.1Q encapsulation (or another type of encapsulation (IPSec?)) part of link level header or part of data packet?
> 
> Currently pseudo-header contains for OUTGOING packet on physical card (vlan10@eth1)
> a) HW accelarated network card  
> protocol = ethertype IPv4 (0x0800)
> tci = vlan number = 10
> and data starts with 4500 0028
> 
> b) non HW accelerated network card 
> protocol = ethertype 802.1Q (0x8100)
> tci = 0
> and data starts with 4500 0028
> vlan tci and real protocol number (ARP,IPV4,IPV6) of data is lost 

As mentioned in the text I quoted, this is apparently what is
intended for SOCK_DGRAM packet sockets. The accelerated case is
inconsistent and vlan_tci should be cleared I guess.

I agree that sll_protocol should reflect the network protocol
in this case however.

> And with more nested vlans it is getting worse
> for example
> 
> vlan1010@vlan10@eth1
> 
> a) HW accelarated network card  
> protocol = ethertype IPv4 (0x8100)
> tci = 10
> and data starts with 4500 0028
> 
> the 4 bytes of real packet 03f2 0800 is lost too
> 
> b) non HW accelarated network card  
> 4 words of data packet are lost ...
> 
> I have no problems with received packets, only outgoing packet have problem.
> I think that out packet on SOCK_DGRAM sockets MUST BE in same format as in (received) packet on same interface.
> Can we agree on this?

Yes, agreed.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Milan Dadok Jan. 4, 2010, 5:11 p.m. UTC | #5
>Patrick McHardy wrote:
>> Currently pseudo-header contains for OUTGOING packet on physical card (vlan10@eth1)
>> a) HW accelarated network card  
>> protocol = ethertype IPv4 (0x0800)
>> tci = vlan number = 10
>> and data starts with 4500 0028
>> 
>> b) non HW accelerated network card 
>> protocol = ethertype 802.1Q (0x8100)
>> tci = 0
>> and data starts with 4500 0028
>> vlan tci and real protocol number (ARP,IPV4,IPV6) of data is lost 
>>
>As mentioned in the text I quoted, this is apparently what is
>intended for SOCK_DGRAM packet sockets. The accelerated case is
>inconsistent and vlan_tci should be cleared I guess.
>
>I agree that sll_protocol should reflect the network protocol
>in this case however.

There are probablly two solution SOCK_DGRAM
A. sll_protocol will have outer network protokol and packet data will have all data after that network protokol field
	(sll tci will be <>0 only with VLAN acceleration)

B. ssl_protocol will have the most inner network protokol (Ipv4, ARP, Ipv6)
	and cooked sll will have fields for restore all encapsulation protocols
	(on eth1 I need to know, if packet is send with vlan 1010 in vlan 10 or as vlan 10 in vlan 1010)

>> I have no problems with received packets, only outgoing packet have problem.
>> I think that out packet on SOCK_DGRAM sockets MUST BE in same format as in (received) packet on same interface.
>> Can we agree on this?
>
>Yes, agreed.

Now receive path is working as in solution A (if I remember correctly from my tests)
- packet data with all 802.1Q tags are send throu SOCK_DGRAM in same format as in SOCK_RAW (expect MAC header)
  (on HW vlan accel first VLAN tag is in tci field of TPACKET_V2)

Outgoing packet in SOCK_RAW is same as in receive.
Outgoing packet in SOCK_DGRAM - solution B with invalid sll_protokol

My patch is trying to change behaviour of outgoing packet in SOCK_DGRAM to same format as in received SOCK_DGRAM or SOCK_RAW

PS. There is one BIG question - how can I write kernel filter for filter explicit vlan number,
which can be used at the same time on HW vlan accelerated card (outer VLAN is in tci) and
non HW vlan accelerated card (outer VLAN is at the beggining of packet data)

Only solution I found, is to use user level filter and recreate vlan tag from sll tci in userspace before running filter.

Milan







--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Jan. 5, 2010, 5:17 a.m. UTC | #6
Milan Dadok wrote:
>> Patrick McHardy wrote:
>>> Currently pseudo-header contains for OUTGOING packet on physical card (vlan10@eth1)
>>> a) HW accelarated network card  
>>> protocol = ethertype IPv4 (0x0800)
>>> tci = vlan number = 10
>>> and data starts with 4500 0028
>>>
>>> b) non HW accelerated network card 
>>> protocol = ethertype 802.1Q (0x8100)
>>> tci = 0
>>> and data starts with 4500 0028
>>> vlan tci and real protocol number (ARP,IPV4,IPV6) of data is lost 
>>>
>> As mentioned in the text I quoted, this is apparently what is
>> intended for SOCK_DGRAM packet sockets. The accelerated case is
>> inconsistent and vlan_tci should be cleared I guess.
>>
>> I agree that sll_protocol should reflect the network protocol
>> in this case however.
> 
> There are probablly two solution SOCK_DGRAM
> A. sll_protocol will have outer network protokol and packet data will have all data after that network protokol field
> 	(sll tci will be <>0 only with VLAN acceleration)
> 
> B. ssl_protocol will have the most inner network protokol (Ipv4, ARP, Ipv6)
> 	and cooked sll will have fields for restore all encapsulation protocols
> 	(on eth1 I need to know, if packet is send with vlan 1010 in vlan 10 or as vlan 10 in vlan 1010)

I think it should indicate the outer network protocol, IOW the
protocol of the header at offset 0. But as the data doesn't contain
any link layer headers, vlan_tci should always be zero for SOCK_DGRAM
sockets for consistency.

If you want the link layer data, you simply shouldn't be using
SOCK_DGRAM sockets. I don't know why libpcap uses them when no
device is specified, I guess it simply can't build proper BPF
filters without knowing the link layer type. Which actually
should be possible, this seems like a limitation in libpcap's
BPF compiler.

>>> I have no problems with received packets, only outgoing packet have problem.
>>> I think that out packet on SOCK_DGRAM sockets MUST BE in same format as in (received) packet on same interface.
>>> Can we agree on this?
>> Yes, agreed.
> 
> Now receive path is working as in solution A (if I remember correctly from my tests)
> - packet data with all 802.1Q tags are send throu SOCK_DGRAM in same format as in SOCK_RAW (expect MAC header)
>   (on HW vlan accel first VLAN tag is in tci field of TPACKET_V2)
> 
> Outgoing packet in SOCK_RAW is same as in receive.
> Outgoing packet in SOCK_DGRAM - solution B with invalid sll_protokol
> 
> My patch is trying to change behaviour of outgoing packet in SOCK_DGRAM to same format as in received SOCK_DGRAM or SOCK_RAW

So basically what seems to be missing is a) not including vlan_tci
and b) figuring out the higher layer protocol for inclusion in
sll_protocol. The second part should be easy to fix for nested
VLANs. It gets quite complicated however if you consider setups
using nested VLANs with non-VLAN devices in between, like
vlan10@gretap0@vlan100.

> PS. There is one BIG question - how can I write kernel filter for filter explicit vlan number,
> which can be used at the same time on HW vlan accelerated card (outer VLAN is in tci) and
> non HW vlan accelerated card (outer VLAN is at the beggining of packet data)
> 
> Only solution I found, is to use user level filter and recreate vlan tag from sll tci in userspace before running filter.

To do this in the kernel, you'd have to add a new ancillary data
filter type to net/core/filter.c using vlan_get_tag().
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff -urN af_packet.c.orig af_packet.c
--- af_packet.c.orig    2009-12-26 12:34:15.000000000 +0100
+++ af_packet.c 2009-12-28 14:31:14.000000000 +0100
@@ -57,6 +57,8 @@ 
 #include <linux/inet.h>
 #include <linux/netdevice.h>
 #include <linux/if_packet.h>
+#include <linux/if_vlan.h>
+#include "../8021q/vlan.h"
 #include <linux/wireless.h>
 #include <linux/kernel.h>
 #include <linux/kmod.h>
@@ -680,8 +682,28 @@ 
                if (sk->sk_type != SOCK_DGRAM)
                        skb_push(skb, skb->data - skb_mac_header(skb));
                else if (skb->pkt_type == PACKET_OUTGOING) {
-                       /* Special case: outgoing packets have ll header at head */
-                       skb_pull(skb, skb_network_offset(skb));
+                       /* Special case: outgoing packets have ll header at head
+                        * but we must leave 802.1Q encapsulation etc. (only for non HW vlan accelerated)
+                        * encasulation len = actual header_len minus hard_header_len
+                        * packet outgoing from vlan1@eth0 on eth0 have skb_network_offset=18, hard_header_len=14
+                        */
+                       int hard_header_len;
+                       struct net_device *pdev;
+                       hard_header_len = dev->hard_header_len;
+                       pdev = dev;
+                       /* if dev is vlan device, hard_header_len contains 802.1Q encap, subtract it, recursively
+                        * ie. vlan3@vlan2@vlan1@eth0
+                        */
+                       while (is_vlan_dev(pdev)) {
+                               struct net_device *real_dev = vlan_dev_info(pdev)->real_dev;
+                               hard_header_len -= pdev->hard_header_len - real_dev->hard_header_len;
+                               pdev = real_dev;
+                       }
+
+                       skb_pull(skb, skb_network_offset(skb) -
+                            (skb_network_offset(skb) - hard_header_len>0 ? skb_network_offset(skb) - hard_header_len : 0));
                }
        }

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org