diff mbox series

[v2] AF_PACKET doesnt strip VLAN information

Message ID 20200718091732.8761-1-srirakr2@cisco.com
State Changes Requested
Delegated to: David Miller
Headers show
Series [v2] AF_PACKET doesnt strip VLAN information | expand

Commit Message

Sriram Krishnan (srirakr2) July 18, 2020, 9:17 a.m. UTC
When an application sends with AF_PACKET and places a vlan header on
the raw packet; then the AF_PACKET needs to move the tag into the skb
so that it gets processed normally through the rest of the transmit
path.

This is particularly a problem on Hyper-V where the host only allows
vlan in the offload info.

Cc: xe-linux-external@cisco.com
Cc: Sriram Krishnan <srirakr2@cisco.com>
Signed-off-by: Sriram Krishnan <srirakr2@cisco.com>
---
 net/packet/af_packet.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

Comments

Willem de Bruijn July 18, 2020, 3:58 p.m. UTC | #1
On Sat, Jul 18, 2020 at 5:25 AM Sriram Krishnan <srirakr2@cisco.com> wrote:
>
> When an application sends with AF_PACKET and places a vlan header on
> the raw packet; then the AF_PACKET needs to move the tag into the skb
> so that it gets processed normally through the rest of the transmit
> path.
>
> This is particularly a problem on Hyper-V where the host only allows
> vlan in the offload info.

Shouldn't this be resolved in that driver, then?

Packet sockets are not the only way to introduce packets in the
kernel. Tuntap would be another.
kernel test robot July 19, 2020, 5:16 a.m. UTC | #2
Hi Sriram,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on hnaz-linux-mm/master]
[also build test WARNING on sparc-next/master linus/master v5.8-rc5 next-20200717]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sriram-Krishnan/AF_PACKET-doesnt-strip-VLAN-information/20200718-172707
base:   https://github.com/hnaz/linux-mm master
config: s390-randconfig-s031-20200719 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.2-49-g707c5017-dirty
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> net/packet/af_packet.c:1877:52: sparse: sparse: incorrect type in argument 2 (different base types) @@     expected unsigned short [usertype] *vlan_tci @@     got restricted __be16 * @@
>> net/packet/af_packet.c:1877:52: sparse:     expected unsigned short [usertype] *vlan_tci
>> net/packet/af_packet.c:1877:52: sparse:     got restricted __be16 *
>> net/packet/af_packet.c:1881:64: sparse: sparse: incorrect type in argument 3 (different base types) @@     expected unsigned short [usertype] vlan_tci @@     got restricted __be16 [addressable] [usertype] vlan_tci @@
   net/packet/af_packet.c:1881:64: sparse:     expected unsigned short [usertype] vlan_tci
>> net/packet/af_packet.c:1881:64: sparse:     got restricted __be16 [addressable] [usertype] vlan_tci

vim +1877 net/packet/af_packet.c

  1859	
  1860	static int packet_parse_headers(struct sk_buff *skb, struct socket *sock)
  1861	{
  1862		if ((!skb->protocol || skb->protocol == htons(ETH_P_ALL)) &&
  1863		    sock->type == SOCK_RAW) {
  1864			__be16 ethertype;
  1865	
  1866			skb_reset_mac_header(skb);
  1867	
  1868			ethertype = eth_hdr(skb)->h_proto;
  1869			/*
  1870			 * If Vlan tag is present in the packet
  1871			 *  move it to skb
  1872			*/
  1873			if (eth_type_vlan(ethertype)) {
  1874				int err;
  1875				__be16 vlan_tci;
  1876	
> 1877				err = __skb_vlan_pop(skb, &vlan_tci);
  1878				if (unlikely(err))
  1879					return err;
  1880	
> 1881				__vlan_hwaccel_put_tag(skb, ethertype, vlan_tci);
  1882			}
  1883	
  1884			skb->protocol = dev_parse_header_protocol(skb);
  1885		}
  1886	
  1887		skb_probe_transport_header(skb);
  1888		return 0;
  1889	}
  1890	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Willem de Bruijn July 20, 2020, 1:52 p.m. UTC | #3
On Mon, Jul 20, 2020 at 12:27 AM Sriram Krishnan (srirakr2)
<srirakr2@cisco.com> wrote:
>
> +Stephen Hemminger
>
> Hi Willem,
> Thanks for looking into the code, I understand that this is more of a generic problem wherein many of the filtering functions assume the vlan tag to be in the skb rather than in the packet. Hence we moved the fix from the driver to the common AF packet that our solution uses.
>
> I recall from the v1 of the patch you had mentioned other common areas where this fix might be relevant (such as tap/tun), but I'm afraid I cant comprehensively test those patches out. Please let me know your thoughts

Please use plain text to respond. HTML replies do not reach the list.

Can you be more precise in which other code besides the hyper-v driver
is affected? Do you have an example?

This is a resubmit of the original patch. My previous
questions/concerns remain valid:

- if the function can now fail, all callers must be updated to detect
and handle that

- any solution should probably address all inputs into the tx path:
packet sockets, tuntap, virtio-net

- this only addresses packet sockets with ETH_P_ALL/ETH_P_NONE. Not
sockets that set ETH_P_8021Q

- which code in the transmit stack requires the tag to be in the skb,
and does this problem after this patch still persist for Q-in-Q?
Sriram Krishnan (srirakr2) July 20, 2020, 5:01 p.m. UTC | #4
I have moved the code to the driver and pushed a new patch due to the below highlighted issues.

Stephen H,
Please let me know if you have any concerns localising the changes to the netvsc driver.


Thanks,
Sriram

´╗┐On 20/07/20, 7:23 PM, "Willem de Bruijn" <willemdebruijn.kernel@gmail.com> wrote:

    On Mon, Jul 20, 2020 at 12:27 AM Sriram Krishnan (srirakr2)
    <srirakr2@cisco.com> wrote:
    >
    > +Stephen Hemminger
    >
    > Hi Willem,
    > Thanks for looking into the code, I understand that this is more of a generic problem wherein many of the filtering functions assume the vlan tag to be in the skb rather than in the packet. Hence we moved the fix from the driver to the common AF packet that our solution uses.
    >
    > I recall from the v1 of the patch you had mentioned other common areas where this fix might be relevant (such as tap/tun), but I'm afraid I cant comprehensively test those patches out. Please let me know your thoughts

    Please use plain text to respond. HTML replies do not reach the list.

    Can you be more precise in which other code besides the hyper-v driver
    is affected? Do you have an example?

    This is a resubmit of the original patch. My previous
    questions/concerns remain valid:

    - if the function can now fail, all callers must be updated to detect
    and handle that

    - any solution should probably address all inputs into the tx path:
    packet sockets, tuntap, virtio-net

    - this only addresses packet sockets with ETH_P_ALL/ETH_P_NONE. Not
    sockets that set ETH_P_8021Q

    - which code in the transmit stack requires the tag to be in the skb,
    and does this problem after this patch still persist for Q-in-Q?
Stephen Hemminger July 20, 2020, 8:56 p.m. UTC | #5
On Mon, 20 Jul 2020 09:52:27 -0400
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:

> On Mon, Jul 20, 2020 at 12:27 AM Sriram Krishnan (srirakr2)
> <srirakr2@cisco.com> wrote:
> >
> > +Stephen Hemminger
> >
> > Hi Willem,
> > Thanks for looking into the code, I understand that this is more of a generic problem wherein many of the filtering functions assume the vlan tag to be in the skb rather than in the packet. Hence we moved the fix from the driver to the common AF packet that our solution uses.
> >
> > I recall from the v1 of the patch you had mentioned other common areas where this fix might be relevant (such as tap/tun), but I'm afraid I cant comprehensively test those patches out. Please let me know your thoughts  
> 
> Please use plain text to respond. HTML replies do not reach the list.
> 
> Can you be more precise in which other code besides the hyper-v driver
> is affected? Do you have an example?
> 
> This is a resubmit of the original patch. My previous
> questions/concerns remain valid:
> 
> - if the function can now fail, all callers must be updated to detect
> and handle that
> 
> - any solution should probably address all inputs into the tx path:
> packet sockets, tuntap, virtio-net
> 
> - this only addresses packet sockets with ETH_P_ALL/ETH_P_NONE. Not
> sockets that set ETH_P_8021Q
> 
> - which code in the transmit stack requires the tag to be in the skb,
> and does this problem after this patch still persist for Q-in-Q?

It matters because the problem is generic, not just to the netvsc driver.
For example, BPF programs and netfilter rules will see different packets
when send is through AF_PACKET than they would see for sends from the
kernel stack.

Presenting uniform data to the lower layers makes sense.
Willem de Bruijn July 20, 2020, 9:22 p.m. UTC | #6
On Mon, Jul 20, 2020 at 4:57 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 20 Jul 2020 09:52:27 -0400
> Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>
> > On Mon, Jul 20, 2020 at 12:27 AM Sriram Krishnan (srirakr2)
> > <srirakr2@cisco.com> wrote:
> > >
> > > +Stephen Hemminger
> > >
> > > Hi Willem,
> > > Thanks for looking into the code, I understand that this is more of a generic problem wherein many of the filtering functions assume the vlan tag to be in the skb rather than in the packet. Hence we moved the fix from the driver to the common AF packet that our solution uses.
> > >
> > > I recall from the v1 of the patch you had mentioned other common areas where this fix might be relevant (such as tap/tun), but I'm afraid I cant comprehensively test those patches out. Please let me know your thoughts
> >
> > Please use plain text to respond. HTML replies do not reach the list.
> >
> > Can you be more precise in which other code besides the hyper-v driver
> > is affected? Do you have an example?
> >
> > This is a resubmit of the original patch. My previous
> > questions/concerns remain valid:
> >
> > - if the function can now fail, all callers must be updated to detect
> > and handle that
> >
> > - any solution should probably address all inputs into the tx path:
> > packet sockets, tuntap, virtio-net
> >
> > - this only addresses packet sockets with ETH_P_ALL/ETH_P_NONE. Not
> > sockets that set ETH_P_8021Q
> >
> > - which code in the transmit stack requires the tag to be in the skb,
> > and does this problem after this patch still persist for Q-in-Q?
>
> It matters because the problem is generic, not just to the netvsc driver.
> For example, BPF programs and netfilter rules will see different packets
> when send is through AF_PACKET than they would see for sends from the
> kernel stack.
>
> Presenting uniform data to the lower layers makes sense.

Are all forwarded and locally generated packets guaranteed to always
have VLAN information in the tag (so that this is indeed only an issue
with input from userspace, through tuntap, virtio and packet sockets)?

I guess the first might be assured due to this in __netif_receive_skb_core:

        if (skb->protocol == cpu_to_be16(ETH_P_8021Q) ||
            skb->protocol == cpu_to_be16(ETH_P_8021AD)) {
                skb = skb_vlan_untag(skb);
                if (unlikely(!skb))
                        goto out;
        }

and the second by this in vlan_dev_hard_start_xmit:

        if (veth->h_vlan_proto != vlan->vlan_proto ||
            vlan->flags & VLAN_FLAG_REORDER_HDR) {
                u16 vlan_tci;
                vlan_tci = vlan->vlan_id;
                vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb->priority);
                __vlan_hwaccel_put_tag(skb, vlan->vlan_proto, vlan_tci);
        }

But I don't know this code very well, so that is based on a very
cursory glance only. Might well be missing other paths. (update: I
think pktgen is another example.)

Netfilter and BPF still need to handle tags in the packet for Q-in-Q,
right? So does this actually simplify their logic?

If the above holds and Q-in-Q is not a problem, then doing the same on
ingress from userspace may make sense. I don't know the kind of BPF
or netfilter programs what would be affected, and how.

Then it would be good to all those inputs at once to really plug the hole.
See also virtio_net_hdr_to_skb for another example of code that
applies to all of tuntap, virtio, pf_packet and uml.
Stephen Hemminger July 20, 2020, 10:03 p.m. UTC | #7
On Mon, 20 Jul 2020 17:22:49 -0400
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:

> On Mon, Jul 20, 2020 at 4:57 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Mon, 20 Jul 2020 09:52:27 -0400
> > Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> >  
> > > On Mon, Jul 20, 2020 at 12:27 AM Sriram Krishnan (srirakr2)
> > > <srirakr2@cisco.com> wrote:  
> > > >
> > > > +Stephen Hemminger
> > > >
> > > > Hi Willem,
> > > > Thanks for looking into the code, I understand that this is more of a generic problem wherein many of the filtering functions assume the vlan tag to be in the skb rather than in the packet. Hence we moved the fix from the driver to the common AF packet that our solution uses.
> > > >
> > > > I recall from the v1 of the patch you had mentioned other common areas where this fix might be relevant (such as tap/tun), but I'm afraid I cant comprehensively test those patches out. Please let me know your thoughts  
> > >
> > > Please use plain text to respond. HTML replies do not reach the list.
> > >
> > > Can you be more precise in which other code besides the hyper-v driver
> > > is affected? Do you have an example?
> > >
> > > This is a resubmit of the original patch. My previous
> > > questions/concerns remain valid:
> > >
> > > - if the function can now fail, all callers must be updated to detect
> > > and handle that
> > >
> > > - any solution should probably address all inputs into the tx path:
> > > packet sockets, tuntap, virtio-net
> > >
> > > - this only addresses packet sockets with ETH_P_ALL/ETH_P_NONE. Not
> > > sockets that set ETH_P_8021Q
> > >
> > > - which code in the transmit stack requires the tag to be in the skb,
> > > and does this problem after this patch still persist for Q-in-Q?  
> >
> > It matters because the problem is generic, not just to the netvsc driver.
> > For example, BPF programs and netfilter rules will see different packets
> > when send is through AF_PACKET than they would see for sends from the
> > kernel stack.
> >
> > Presenting uniform data to the lower layers makes sense.  
> 
> Are all forwarded and locally generated packets guaranteed to always
> have VLAN information in the tag (so that this is indeed only an issue
> with input from userspace, through tuntap, virtio and packet sockets)?
> 
> I guess the first might be assured due to this in __netif_receive_skb_core:
> 
>         if (skb->protocol == cpu_to_be16(ETH_P_8021Q) ||
>             skb->protocol == cpu_to_be16(ETH_P_8021AD)) {
>                 skb = skb_vlan_untag(skb);
>                 if (unlikely(!skb))
>                         goto out;
>         }
> 
> and the second by this in vlan_dev_hard_start_xmit:
> 
>         if (veth->h_vlan_proto != vlan->vlan_proto ||
>             vlan->flags & VLAN_FLAG_REORDER_HDR) {
>                 u16 vlan_tci;
>                 vlan_tci = vlan->vlan_id;
>                 vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb->priority);
>                 __vlan_hwaccel_put_tag(skb, vlan->vlan_proto, vlan_tci);
>         }
> 
> But I don't know this code very well, so that is based on a very
> cursory glance only. Might well be missing other paths. (update: I
> think pktgen is another example.)
> 
> Netfilter and BPF still need to handle tags in the packet for Q-in-Q,
> right? So does this actually simplify their logic?
> 
> If the above holds and Q-in-Q is not a problem, then doing the same on
> ingress from userspace may make sense. I don't know the kind of BPF
> or netfilter programs what would be affected, and how.
> 
> Then it would be good to all those inputs at once to really plug the hole.
> See also virtio_net_hdr_to_skb for another example of code that
> applies to all of tuntap, virtio, pf_packet and uml.


Older versions of Linux used to handle outer VLAN differentl
based on what the driver supported. It was a mess.
Some drivers and code paths would strip and put in meta-data, some
would leave it in skb data. But in recent (like 5 yrs), the kernel
has tried to be more uniform and only have vlan as skb tag.
It looks like AF_PACKET was overlooked at that time.
David Miller July 20, 2020, 11:36 p.m. UTC | #8
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Mon, 20 Jul 2020 13:56:50 -0700

> It matters because the problem is generic, not just to the netvsc driver.
> For example, BPF programs and netfilter rules will see different packets
> when send is through AF_PACKET than they would see for sends from the
> kernel stack.
> 
> Presenting uniform data to the lower layers makes sense.

The issue here is that for hyperv, vlan offloading is not a "may" but
a "must" and I've never understood it to have that meaning.

And I still haven't heard what is going to happen in Q-in-Q situations
even with the ugly hyperv driver hack that is currently under review.
diff mbox series

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 29bd405adbbd..10988639561e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1857,15 +1857,35 @@  static int packet_rcv_spkt(struct sk_buff *skb, struct net_device *dev,
 	return 0;
 }
 
-static void packet_parse_headers(struct sk_buff *skb, struct socket *sock)
+static int packet_parse_headers(struct sk_buff *skb, struct socket *sock)
 {
 	if ((!skb->protocol || skb->protocol == htons(ETH_P_ALL)) &&
 	    sock->type == SOCK_RAW) {
+		__be16 ethertype;
+
 		skb_reset_mac_header(skb);
+
+		ethertype = eth_hdr(skb)->h_proto;
+		/*
+		 * If Vlan tag is present in the packet
+		 *  move it to skb
+		*/
+		if (eth_type_vlan(ethertype)) {
+			int err;
+			__be16 vlan_tci;
+
+			err = __skb_vlan_pop(skb, &vlan_tci);
+			if (unlikely(err))
+				return err;
+
+			__vlan_hwaccel_put_tag(skb, ethertype, vlan_tci);
+		}
+
 		skb->protocol = dev_parse_header_protocol(skb);
 	}
 
 	skb_probe_transport_header(skb);
+	return 0;
 }
 
 /*
@@ -1987,7 +2007,9 @@  static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
 	if (unlikely(extra_len == 4))
 		skb->no_fcs = 1;
 
-	packet_parse_headers(skb, sock);
+	err = packet_parse_headers(skb, sock);
+	if (err)
+		goto out_unlock;
 
 	dev_queue_xmit(skb);
 	rcu_read_unlock();