diff mbox

Receiving raw packets (incl. VLAN tags) on raw sockets

Message ID e06dbb47-2d1c-03ca-4cd7-cc958b6a939e@gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Jan Grashöfer June 14, 2017, 12:47 p.m. UTC
Hello,

trying to ingest packets to a network monitoring tool using AF_Packet, I
noticed that VLAN tags are stripped off even if VLAN offloading is
disabled. As described in [1] the tags are stripped that early in the
receive path that raw sockets do not receive the packets as seen on the
wire. However, the man pages [2] state:

> SOCK_RAW packets are passed to and from the device driver without any
> changes in the packet data.

So at least the docs are wrong here. While the VLAN information is
present in the tpacket_hdr, the packets themselves are manipulated by
the kernel. In my case, for example, I need to pass on each packet (in
form of a packet) and thus would have to reinsert the tag manually,
which is definitely undesirable.

As I am dealing with a monitoring application, the following patch that
disables untagging VLANs in promisc mode fixes the issue for me:

                        goto out;
--

I tested the patch with tcpdump and suricata. As they are designed to
handle VLAN tags anyway, both work as expected. Still there are two
caveats regarding this patch: First, there might be software out there
that does not work as expected anymore. Second, the semantics of raw
sockets are even more cluttered than before. I could imagine the
following solutions:

1) Disable VLAN untagging done by the kernel or move it up the receive
path - probably least practicable.

2) Parse VLAN info into the tpacket_hdr but keep the packet as is (at
least in promisc mode) - still, seems clumsy and error-prone to me.

3) Introduce a new socket option to disable VLAN untagging for raw
sockets - maybe overkill.

I would be happy to receive some feedback!

Best regards,
Jan

[1] https://www.spinics.net/lists/netdev/msg244668.html
[2] http://man7.org/linux/man-pages/man7/ip.7.html

Comments

Eric W. Biederman June 14, 2017, 2:41 p.m. UTC | #1
Jan Grashöfer <jan.grashoefer@gmail.com> writes:

> Hello,
>
> trying to ingest packets to a network monitoring tool using AF_Packet, I
> noticed that VLAN tags are stripped off even if VLAN offloading is
> disabled. As described in [1] the tags are stripped that early in the
> receive path that raw sockets do not receive the packets as seen on the
> wire. However, the man pages [2] state:
>
>> SOCK_RAW packets are passed to and from the device driver without any
>> changes in the packet data.
>
> So at least the docs are wrong here. While the VLAN information is
> present in the tpacket_hdr, the packets themselves are manipulated by
> the kernel. In my case, for example, I need to pass on each packet (in
> form of a packet) and thus would have to reinsert the tag manually,
> which is definitely undesirable.

My sympathies but that is pretty much what you are going to need to do.

> As I am dealing with a monitoring application, the following patch that
> disables untagging VLANs in promisc mode fixes the issue for me:
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 54f8c16..ec5b1c5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4104,8 +4104,9 @@ static int __netif_receive_skb_core(struct sk_buff
> *skb, bool pfmemalloc)
>
>         __this_cpu_inc(softnet_data.processed);
>
> -       if (skb->protocol == cpu_to_be16(ETH_P_8021Q) ||
> -           skb->protocol == cpu_to_be16(ETH_P_8021AD)) {
> +       if ((skb->protocol == cpu_to_be16(ETH_P_8021Q) ||
> +           skb->protocol == cpu_to_be16(ETH_P_8021AD)) &&
> +           skb->dev->promiscuity < 1) {
>                 skb = skb_vlan_untag(skb);
>                 if (unlikely(!skb))
>                         goto out;

That does not work.  That is is just the software fallback for when
the device driver does not have a special case the processing
vlan tagged packets.

There was a major inconsistency that for a long time the hardware
network drivers were stripping tags and the software ones were not.

The code you are playing with is the fix for the rare slow path
that does not happen to strip the tags.  Disabling the rare slow path
might temporarily solve your symptoms but it will be much more painful
when you are entrenched in your ways and discover that high performance
hardware behaves differently than your software device.

> --
>
> I tested the patch with tcpdump and suricata. As they are designed to
> handle VLAN tags anyway, both work as expected. Still there are two
> caveats regarding this patch: First, there might be software out there
> that does not work as expected anymore. Second, the semantics of raw
> sockets are even more cluttered than before. I could imagine the
> following solutions:
>
> 1) Disable VLAN untagging done by the kernel or move it up the receive
> path - probably least practicable.
>
> 2) Parse VLAN info into the tpacket_hdr but keep the packet as is (at
> least in promisc mode) - still, seems clumsy and error-prone to me.
>
> 3) Introduce a new socket option to disable VLAN untagging for raw
> sockets - maybe overkill.

Thinking in terms of disabling vlan untagging is not going to get you a
solution.

Eric
Jan Grashöfer June 14, 2017, 4:03 p.m. UTC | #2
On 14/06/17 16:41, Eric W. Biederman wrote:
> That does not work.  That is is just the software fallback for when
> the device driver does not have a special case the processing
> vlan tagged packets.
> 
> There was a major inconsistency that for a long time the hardware
> network drivers were stripping tags and the software ones were not.
> 
> The code you are playing with is the fix for the rare slow path
> that does not happen to strip the tags.  Disabling the rare slow path
> might temporarily solve your symptoms but it will be much more painful
> when you are entrenched in your ways and discover that high performance
> hardware behaves differently than your software device.

Thanks for your reply! Actually, I was referring to COTS hardware that 
incorporates offloading features. But, when it comes to (security) 
monitoring, offloading is usually disabled [1,2] to process the packets 
as seen on the wire. Thus the "slow path" would be the default path for 
most monitoring applications. That is, what makes this situation kind of 
weird. After turning off the NIC's VLAN offloading, it took me some time 
to realize that now the kernel strips off VLAN tags. If someone decides 
that VLAN offloading is not needed, I think the kernel should not 
enforce it.

Jan

[1] 
https://redmine.openinfosecfoundation.org/projects/suricata/wiki/File_Extraction#NIC-offloading
[2] 
https://www.bro.org/documentation/faq.html#capture-loss-without-dropped-packets
Eric W. Biederman June 14, 2017, 4:54 p.m. UTC | #3
Jan Grashöfer <jan.grashoefer@gmail.com> writes:

> On 14/06/17 16:41, Eric W. Biederman wrote:
>> That does not work.  That is is just the software fallback for when
>> the device driver does not have a special case the processing
>> vlan tagged packets.
>>
>> There was a major inconsistency that for a long time the hardware
>> network drivers were stripping tags and the software ones were not.
>>
>> The code you are playing with is the fix for the rare slow path
>> that does not happen to strip the tags.  Disabling the rare slow path
>> might temporarily solve your symptoms but it will be much more painful
>> when you are entrenched in your ways and discover that high performance
>> hardware behaves differently than your software device.
>
> Thanks for your reply! Actually, I was referring to COTS hardware that
> incorporates offloading features. But, when it comes to (security)
> monitoring, offloading is usually disabled [1,2] to process the
> packets as seen on the wire. Thus the "slow path" would be the default
> path for most monitoring applications. That is, what makes this
> situation kind of weird. After turning off the NIC's VLAN offloading,
> it took me some time to realize that now the kernel strips off VLAN
> tags. If someone decides that VLAN offloading is not needed, I think
> the kernel should not enforce it.

In practice it is too too complicated to support both so we choose the
mode where vlan tags are always stripped.

I can imagine a tweak to pf_packet where it readds the vlan tag before
it gets to user space.  I can not image changing how we treat the vlan
tags internally to the kernel.

There were nasty kernel bugs before: 
bcc6d4790361 ("net: vlan: make non-hw-accel rx path similar to hw-accel")

I don't even want to contemplate opening that can of worms again.

Eric
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 54f8c16..ec5b1c5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4104,8 +4104,9 @@  static int __netif_receive_skb_core(struct sk_buff
*skb, bool pfmemalloc)

        __this_cpu_inc(softnet_data.processed);

-       if (skb->protocol == cpu_to_be16(ETH_P_8021Q) ||
-           skb->protocol == cpu_to_be16(ETH_P_8021AD)) {
+       if ((skb->protocol == cpu_to_be16(ETH_P_8021Q) ||
+           skb->protocol == cpu_to_be16(ETH_P_8021AD)) &&
+           skb->dev->promiscuity < 1) {
                skb = skb_vlan_untag(skb);
                if (unlikely(!skb))