Message ID | e06dbb47-2d1c-03ca-4cd7-cc958b6a939e@gmail.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Jan Grashöfer <jan.grashoefer@gmail.com> writes: > Hello, > > trying to ingest packets to a network monitoring tool using AF_Packet, I > noticed that VLAN tags are stripped off even if VLAN offloading is > disabled. As described in [1] the tags are stripped that early in the > receive path that raw sockets do not receive the packets as seen on the > wire. However, the man pages [2] state: > >> SOCK_RAW packets are passed to and from the device driver without any >> changes in the packet data. > > So at least the docs are wrong here. While the VLAN information is > present in the tpacket_hdr, the packets themselves are manipulated by > the kernel. In my case, for example, I need to pass on each packet (in > form of a packet) and thus would have to reinsert the tag manually, > which is definitely undesirable. My sympathies but that is pretty much what you are going to need to do. > As I am dealing with a monitoring application, the following patch that > disables untagging VLANs in promisc mode fixes the issue for me: > > diff --git a/net/core/dev.c b/net/core/dev.c > index 54f8c16..ec5b1c5 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4104,8 +4104,9 @@ static int __netif_receive_skb_core(struct sk_buff > *skb, bool pfmemalloc) > > __this_cpu_inc(softnet_data.processed); > > - if (skb->protocol == cpu_to_be16(ETH_P_8021Q) || > - skb->protocol == cpu_to_be16(ETH_P_8021AD)) { > + if ((skb->protocol == cpu_to_be16(ETH_P_8021Q) || > + skb->protocol == cpu_to_be16(ETH_P_8021AD)) && > + skb->dev->promiscuity < 1) { > skb = skb_vlan_untag(skb); > if (unlikely(!skb)) > goto out; That does not work. That is is just the software fallback for when the device driver does not have a special case the processing vlan tagged packets. There was a major inconsistency that for a long time the hardware network drivers were stripping tags and the software ones were not. The code you are playing with is the fix for the rare slow path that does not happen to strip the tags. Disabling the rare slow path might temporarily solve your symptoms but it will be much more painful when you are entrenched in your ways and discover that high performance hardware behaves differently than your software device. > -- > > I tested the patch with tcpdump and suricata. As they are designed to > handle VLAN tags anyway, both work as expected. Still there are two > caveats regarding this patch: First, there might be software out there > that does not work as expected anymore. Second, the semantics of raw > sockets are even more cluttered than before. I could imagine the > following solutions: > > 1) Disable VLAN untagging done by the kernel or move it up the receive > path - probably least practicable. > > 2) Parse VLAN info into the tpacket_hdr but keep the packet as is (at > least in promisc mode) - still, seems clumsy and error-prone to me. > > 3) Introduce a new socket option to disable VLAN untagging for raw > sockets - maybe overkill. Thinking in terms of disabling vlan untagging is not going to get you a solution. Eric
On 14/06/17 16:41, Eric W. Biederman wrote: > That does not work. That is is just the software fallback for when > the device driver does not have a special case the processing > vlan tagged packets. > > There was a major inconsistency that for a long time the hardware > network drivers were stripping tags and the software ones were not. > > The code you are playing with is the fix for the rare slow path > that does not happen to strip the tags. Disabling the rare slow path > might temporarily solve your symptoms but it will be much more painful > when you are entrenched in your ways and discover that high performance > hardware behaves differently than your software device. Thanks for your reply! Actually, I was referring to COTS hardware that incorporates offloading features. But, when it comes to (security) monitoring, offloading is usually disabled [1,2] to process the packets as seen on the wire. Thus the "slow path" would be the default path for most monitoring applications. That is, what makes this situation kind of weird. After turning off the NIC's VLAN offloading, it took me some time to realize that now the kernel strips off VLAN tags. If someone decides that VLAN offloading is not needed, I think the kernel should not enforce it. Jan [1] https://redmine.openinfosecfoundation.org/projects/suricata/wiki/File_Extraction#NIC-offloading [2] https://www.bro.org/documentation/faq.html#capture-loss-without-dropped-packets
Jan Grashöfer <jan.grashoefer@gmail.com> writes: > On 14/06/17 16:41, Eric W. Biederman wrote: >> That does not work. That is is just the software fallback for when >> the device driver does not have a special case the processing >> vlan tagged packets. >> >> There was a major inconsistency that for a long time the hardware >> network drivers were stripping tags and the software ones were not. >> >> The code you are playing with is the fix for the rare slow path >> that does not happen to strip the tags. Disabling the rare slow path >> might temporarily solve your symptoms but it will be much more painful >> when you are entrenched in your ways and discover that high performance >> hardware behaves differently than your software device. > > Thanks for your reply! Actually, I was referring to COTS hardware that > incorporates offloading features. But, when it comes to (security) > monitoring, offloading is usually disabled [1,2] to process the > packets as seen on the wire. Thus the "slow path" would be the default > path for most monitoring applications. That is, what makes this > situation kind of weird. After turning off the NIC's VLAN offloading, > it took me some time to realize that now the kernel strips off VLAN > tags. If someone decides that VLAN offloading is not needed, I think > the kernel should not enforce it. In practice it is too too complicated to support both so we choose the mode where vlan tags are always stripped. I can imagine a tweak to pf_packet where it readds the vlan tag before it gets to user space. I can not image changing how we treat the vlan tags internally to the kernel. There were nasty kernel bugs before: bcc6d4790361 ("net: vlan: make non-hw-accel rx path similar to hw-accel") I don't even want to contemplate opening that can of worms again. Eric
diff --git a/net/core/dev.c b/net/core/dev.c index 54f8c16..ec5b1c5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4104,8 +4104,9 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc) __this_cpu_inc(softnet_data.processed); - if (skb->protocol == cpu_to_be16(ETH_P_8021Q) || - skb->protocol == cpu_to_be16(ETH_P_8021AD)) { + if ((skb->protocol == cpu_to_be16(ETH_P_8021Q) || + skb->protocol == cpu_to_be16(ETH_P_8021AD)) && + skb->dev->promiscuity < 1) { skb = skb_vlan_untag(skb); if (unlikely(!skb))