diff mbox

filter: introduce SKF_AD_VLAN_PROTO BPF extension

Message ID 1425501718-12066-1-git-send-email-msekleta@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Michal Sekletar March 4, 2015, 8:41 p.m. UTC
This commit introduces new BPF extension. It makes possible to load value of
skb->vlan_proto (vlan tpid) to register A.

Currently, vlan header is removed from frame and information is available to
userspace only via tpacket interface. Hence, it is not possible to install
filter which uses value of vlan tpid field.

AFAICT only way how to filter based on tpid value is to reconstruct original
frame encapsulation and interpret BPF filter code in userspace. Doing that is
way slower than doing filtering in kernel.

Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Michal Sekletar <msekleta@redhat.com>
---
 Documentation/networking/filter.txt |  1 +
 arch/arm/net/bpf_jit_32.c           |  6 ++++++
 arch/mips/net/bpf_jit.c             |  6 ++++++
 arch/powerpc/net/bpf_jit_comp.c     |  5 +++++
 arch/s390/net/bpf_jit_comp.c        |  5 +++++
 arch/sparc/net/bpf_jit_comp.c       |  3 +++
 include/linux/filter.h              |  1 +
 include/uapi/linux/filter.h         |  3 ++-
 net/core/filter.c                   | 10 ++++++++++
 tools/net/bpf_exp.l                 |  1 +
 tools/net/bpf_exp.y                 | 11 ++++++++++-
 11 files changed, 50 insertions(+), 2 deletions(-)

Comments

Alexei Starovoitov March 4, 2015, 9:03 p.m. UTC | #1
On 3/4/15 12:41 PM, Michal Sekletar wrote:
> This commit introduces new BPF extension. It makes possible to load value of
> skb->vlan_proto (vlan tpid) to register A.
>
> Currently, vlan header is removed from frame and information is available to
> userspace only via tpacket interface. Hence, it is not possible to install
> filter which uses value of vlan tpid field.
>
> AFAICT only way how to filter based on tpid value is to reconstruct original
> frame encapsulation and interpret BPF filter code in userspace. Doing that is
> way slower than doing filtering in kernel.
>
> Cc: Alexei Starovoitov <ast@plumgrid.com>
> Cc: Jiri Pirko <jpirko@redhat.com>
> Signed-off-by: Michal Sekletar <msekleta@redhat.com>
> ---
> @@ -282,6 +282,7 @@ Possible BPF extensions are shown in the following table:
>     vlan_tci                              skb_vlan_tag_get(skb)
>     vlan_pr                               skb_vlan_tag_present(skb)
>     rand                                  prandom_u32()
> +  vlan_proto                            skb->vlan_proto

the patch is correct and looks clean, but I don't understand
the motivation for the patch.
There is already SKF_AD_VLAN_TAG_PRESENT. If it is set then only
two possible values of vlan_proto are ETH_P_8021Q or ETH_P_8021AD.
If there another vlan header inside the packet, it's AD.
So you can do the filtering already without adding new bpf extension...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guy Harris March 4, 2015, 9:14 p.m. UTC | #2
On Mar 4, 2015, at 1:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:

> the patch is correct and looks clean, but I don't understand
> the motivation for the patch.
> There is already SKF_AD_VLAN_TAG_PRESENT. If it is set then only
> two possible values of vlan_proto are ETH_P_8021Q or ETH_P_8021AD.
> If there another vlan header inside the packet, it's AD.
> So you can do the filtering already without adding new bpf extension...

I presume he's referring to

	https://github.com/the-tcpdump-group/libpcap/issues/397

or

	https://github.com/the-tcpdump-group/libpcap/issues/390

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov March 4, 2015, 11:47 p.m. UTC | #3
On 3/4/15 1:14 PM, Guy Harris wrote:
>
> On Mar 4, 2015, at 1:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>
>> the patch is correct and looks clean, but I don't understand
>> the motivation for the patch.
>> There is already SKF_AD_VLAN_TAG_PRESENT. If it is set then only
>> two possible values of vlan_proto are ETH_P_8021Q or ETH_P_8021AD.
>> If there another vlan header inside the packet, it's AD.
>> So you can do the filtering already without adding new bpf extension...
>
> I presume he's referring to
>
> 	https://github.com/the-tcpdump-group/libpcap/issues/397
>
> or
>
> 	https://github.com/the-tcpdump-group/libpcap/issues/390

ok. context is clear.
yet, it still sounds like something to fix inside libpcap.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshiaki Makita March 5, 2015, 2:28 a.m. UTC | #4
On 2015/03/05 6:03, Alexei Starovoitov wrote:
> On 3/4/15 12:41 PM, Michal Sekletar wrote:
>> This commit introduces new BPF extension. It makes possible to load
>> value of
>> skb->vlan_proto (vlan tpid) to register A.
>>
>> Currently, vlan header is removed from frame and information is
>> available to
>> userspace only via tpacket interface. Hence, it is not possible to
>> install
>> filter which uses value of vlan tpid field.
>>
>> AFAICT only way how to filter based on tpid value is to reconstruct
>> original
>> frame encapsulation and interpret BPF filter code in userspace. Doing
>> that is
>> way slower than doing filtering in kernel.
>>
>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>> Cc: Jiri Pirko <jpirko@redhat.com>
>> Signed-off-by: Michal Sekletar <msekleta@redhat.com>
>> ---
>> @@ -282,6 +282,7 @@ Possible BPF extensions are shown in the following
>> table:
>>     vlan_tci                              skb_vlan_tag_get(skb)
>>     vlan_pr                               skb_vlan_tag_present(skb)
>>     rand                                  prandom_u32()
>> +  vlan_proto                            skb->vlan_proto
> 
> the patch is correct and looks clean, but I don't understand
> the motivation for the patch.
> There is already SKF_AD_VLAN_TAG_PRESENT. If it is set then only
> two possible values of vlan_proto are ETH_P_8021Q or ETH_P_8021AD.
> If there another vlan header inside the packet, it's AD.

802.1ad can be used without an inner vlan tag.
Also, we can send/receive QinQ frames with 802.1Q outer tags.
(We can create vlan interface on 802.1Q vlan interface.)

Thanks,
Toshiaki Makita

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov March 5, 2015, 2:41 a.m. UTC | #5
On 3/4/15 6:28 PM, Toshiaki Makita wrote:
> On 2015/03/05 6:03, Alexei Starovoitov wrote:
>> On 3/4/15 12:41 PM, Michal Sekletar wrote:
>>> This commit introduces new BPF extension. It makes possible to load
>>> value of
>>> skb->vlan_proto (vlan tpid) to register A.
>>>
>>> Currently, vlan header is removed from frame and information is
>>> available to
>>> userspace only via tpacket interface. Hence, it is not possible to
>>> install
>>> filter which uses value of vlan tpid field.
>>>
>>> AFAICT only way how to filter based on tpid value is to reconstruct
>>> original
>>> frame encapsulation and interpret BPF filter code in userspace. Doing
>>> that is
>>> way slower than doing filtering in kernel.
>>>
>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>> Cc: Jiri Pirko <jpirko@redhat.com>
>>> Signed-off-by: Michal Sekletar <msekleta@redhat.com>
>>> ---
>>> @@ -282,6 +282,7 @@ Possible BPF extensions are shown in the following
>>> table:
>>>      vlan_tci                              skb_vlan_tag_get(skb)
>>>      vlan_pr                               skb_vlan_tag_present(skb)
>>>      rand                                  prandom_u32()
>>> +  vlan_proto                            skb->vlan_proto
>>
>> the patch is correct and looks clean, but I don't understand
>> the motivation for the patch.
>> There is already SKF_AD_VLAN_TAG_PRESENT. If it is set then only
>> two possible values of vlan_proto are ETH_P_8021Q or ETH_P_8021AD.
>> If there another vlan header inside the packet, it's AD.
>
> 802.1ad can be used without an inner vlan tag.
> Also, we can send/receive QinQ frames with 802.1Q outer tags.
> (We can create vlan interface on 802.1Q vlan interface.)

yes, but I think existing 'vlan_tag_present' would be enough
to address the issue mentioned in two bugs reports.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko March 5, 2015, 6:50 a.m. UTC | #6
Thu, Mar 05, 2015 at 12:47:06AM CET, ast@plumgrid.com wrote:
>On 3/4/15 1:14 PM, Guy Harris wrote:
>>
>>On Mar 4, 2015, at 1:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>
>>>the patch is correct and looks clean, but I don't understand
>>>the motivation for the patch.
>>>There is already SKF_AD_VLAN_TAG_PRESENT. If it is set then only
>>>two possible values of vlan_proto are ETH_P_8021Q or ETH_P_8021AD.
>>>If there another vlan header inside the packet, it's AD.
>>>So you can do the filtering already without adding new bpf extension...
>>
>>I presume he's referring to
>>
>>	https://github.com/the-tcpdump-group/libpcap/issues/397
>>
>>or
>>
>>	https://github.com/the-tcpdump-group/libpcap/issues/390
>
>ok. context is clear.
>yet, it still sounds like something to fix inside libpcap.

Libpcap need to somehow let kernel now what vlan proto it should filter on.
Also, it is not true that another vlan header inside packet is AD. You
can have packet with just AD header (or 2 AD or 2 Q, etc).

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov March 5, 2015, 7:23 a.m. UTC | #7
On 3/4/15 10:50 PM, Jiri Pirko wrote:
> Thu, Mar 05, 2015 at 12:47:06AM CET, ast@plumgrid.com wrote:
>> On 3/4/15 1:14 PM, Guy Harris wrote:
>>>
>>> On Mar 4, 2015, at 1:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>>
>>>> the patch is correct and looks clean, but I don't understand
>>>> the motivation for the patch.
>>>> There is already SKF_AD_VLAN_TAG_PRESENT. If it is set then only
>>>> two possible values of vlan_proto are ETH_P_8021Q or ETH_P_8021AD.
>>>> If there another vlan header inside the packet, it's AD.
>>>> So you can do the filtering already without adding new bpf extension...
>>>
>>> I presume he's referring to
>>>
>>> 	https://github.com/the-tcpdump-group/libpcap/issues/397
>>>
>>> or
>>>
>>> 	https://github.com/the-tcpdump-group/libpcap/issues/390
>>
>> ok. context is clear.
>> yet, it still sounds like something to fix inside libpcap.
>
> Libpcap need to somehow let kernel now what vlan proto it should filter on.

I don't think that's what bug report talking about. It's looking to
match on vlan id regardless of vlan_proto. Only vlan_tag_present is
needed to make it work.

> Also, it is not true that another vlan header inside packet is AD. You
> can have packet with just AD header (or 2 AD or 2 Q, etc).

yes. Correct, but then again it doesn't seem to be the goal of the bug 
report.
To make myself clear. I think the patch is ok, I'm only asking for
clear justification based on real use case and not invented one.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kubecek March 5, 2015, 7:24 a.m. UTC | #8
On Thu, Mar 05, 2015 at 07:50:53AM +0100, Jiri Pirko wrote:
> Thu, Mar 05, 2015 at 12:47:06AM CET, ast@plumgrid.com wrote:
> >On 3/4/15 1:14 PM, Guy Harris wrote:
> >>
> >>On Mar 4, 2015, at 1:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> >>
> >>>the patch is correct and looks clean, but I don't understand
> >>>the motivation for the patch.
> >>>There is already SKF_AD_VLAN_TAG_PRESENT. If it is set then only
> >>>two possible values of vlan_proto are ETH_P_8021Q or ETH_P_8021AD.
> >>>If there another vlan header inside the packet, it's AD.
> >>>So you can do the filtering already without adding new bpf extension...
> >>
> >>I presume he's referring to
> >>
> >>	https://github.com/the-tcpdump-group/libpcap/issues/397
> >>
> >>or
> >>
> >>	https://github.com/the-tcpdump-group/libpcap/issues/390
> >
> >ok. context is clear.
> >yet, it still sounds like something to fix inside libpcap.
> 
> Libpcap need to somehow let kernel now what vlan proto it should filter on.

To be more precise, it does not need it now as there is no syntax for
pcap filter on TPID, one can only filter by VID. But if someone wanted
to implement such feature, it could not work with in-kernel filtering at
the moment.

                                                          Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov March 5, 2015, 7:49 a.m. UTC | #9
On 3/4/15 11:24 PM, Michal Kubecek wrote:
> On Thu, Mar 05, 2015 at 07:50:53AM +0100, Jiri Pirko wrote:
>> Thu, Mar 05, 2015 at 12:47:06AM CET, ast@plumgrid.com wrote:
>>> On 3/4/15 1:14 PM, Guy Harris wrote:
>>>>
>>>> On Mar 4, 2015, at 1:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>>>
>>>>> the patch is correct and looks clean, but I don't understand
>>>>> the motivation for the patch.
>>>>> There is already SKF_AD_VLAN_TAG_PRESENT. If it is set then only
>>>>> two possible values of vlan_proto are ETH_P_8021Q or ETH_P_8021AD.
>>>>> If there another vlan header inside the packet, it's AD.
>>>>> So you can do the filtering already without adding new bpf extension...
>>>>
>>>> I presume he's referring to
>>>>
>>>> 	https://github.com/the-tcpdump-group/libpcap/issues/397
>>>>
>>>> or
>>>>
>>>> 	https://github.com/the-tcpdump-group/libpcap/issues/390
>>>
>>> ok. context is clear.
>>> yet, it still sounds like something to fix inside libpcap.
>>
>> Libpcap need to somehow let kernel now what vlan proto it should filter on.
>
> To be more precise, it does not need it now as there is no syntax for
> pcap filter on TPID, one can only filter by VID. But if someone wanted
> to implement such feature, it could not work with in-kernel filtering at
> the moment.

exactly my point. That's how we should evaluate the patch.
It does provide extra visibility for classic bpf programs and
it's not strongly required at the moment, but can be useful, no doubt.
At the same time it forever exposes skb->vlan_proto to user space,
so any refactoring of sk_buff would need to deal with that.
With the use case presented I don't have strong opinion one way or
the other.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guy Harris March 5, 2015, 8:35 a.m. UTC | #10
On Mar 4, 2015, at 11:24 PM, Michal Kubecek <mkubecek@suse.cz> wrote:

> To be more precise, it does not need it now as there is no syntax for
> pcap filter on TPID,

In the current version of libpcap (and the commit dates back to November 2011), the filter "vlan" checks both for 0x8100 and 0x9100 (it should perhaps also check for other TPID values), in userland and in the kernel on platforms other than Linux, as the generated code checks for both values.

If there is a way that an in-kernel BPF program can check for all VLAN TPID values, that would be helpful; that would be sufficient to implement the filter "vlan" in a fashion that works the same in the Linux kernel and in other contexts.  (I presume that one can, from the data delivered to a PF_PACKET SOCK_RAW socket, reconstruct the packet as received, with all the VLAN tags in place in the packet just as they were when the packet hit the adapter's transceiver from the network.  If not, that's a *separate* deficiency.)

I'm not currently worried about Backbone Service Instance Tag Control Information right now, as libpcap currently doesn't handle that, so there's no need to, for the generated code for "vlan", actually test what particular TPID value is in the packet, as long as we can test whether there's a VLAN header or not.  In the future, that might be useful.

(BTW, why did the IEEE remove 802.1Q from the IEEE Get program for IEEE 802 standards?

	http://standards.ieee.org/about/get/802/802.1.html

Are they waiting for 802.1Q-2014 to be published?

	http://www.ieee802.org/1/pages/802.1Q-2014.html

And what's the current status of 802.1ad?  Is QinQ deprecated or something?)--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko March 5, 2015, 8:52 a.m. UTC | #11
Wed, Mar 04, 2015 at 09:41:58PM CET, msekleta@redhat.com wrote:
>This commit introduces new BPF extension. It makes possible to load value of
>skb->vlan_proto (vlan tpid) to register A.
>
>Currently, vlan header is removed from frame and information is available to
>userspace only via tpacket interface. Hence, it is not possible to install
>filter which uses value of vlan tpid field.
>
>AFAICT only way how to filter based on tpid value is to reconstruct original
>frame encapsulation and interpret BPF filter code in userspace. Doing that is
>way slower than doing filtering in kernel.
>
>Cc: Alexei Starovoitov <ast@plumgrid.com>
>Cc: Jiri Pirko <jpirko@redhat.com>
>Signed-off-by: Michal Sekletar <msekleta@redhat.com>

Reviewed-by: Jiri Pirko <jiri@resnulli.us>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Denis Kirjanov March 5, 2015, 8:57 a.m. UTC | #12
On 3/4/15, Michal Sekletar <msekleta@redhat.com> wrote:
> This commit introduces new BPF extension. It makes possible to load value
> of
> skb->vlan_proto (vlan tpid) to register A.
>
> Currently, vlan header is removed from frame and information is available
> to
> userspace only via tpacket interface. Hence, it is not possible to install
> filter which uses value of vlan tpid field.
>
> AFAICT only way how to filter based on tpid value is to reconstruct
> original
> frame encapsulation and interpret BPF filter code in userspace. Doing that
> is
> way slower than doing filtering in kernel.
>
> Cc: Alexei Starovoitov <ast@plumgrid.com>
> Cc: Jiri Pirko <jpirko@redhat.com>
> Signed-off-by: Michal Sekletar <msekleta@redhat.com>
> ---
>  Documentation/networking/filter.txt |  1 +
>  arch/arm/net/bpf_jit_32.c           |  6 ++++++
>  arch/mips/net/bpf_jit.c             |  6 ++++++
>  arch/powerpc/net/bpf_jit_comp.c     |  5 +++++
>  arch/s390/net/bpf_jit_comp.c        |  5 +++++
>  arch/sparc/net/bpf_jit_comp.c       |  3 +++
>  include/linux/filter.h              |  1 +
>  include/uapi/linux/filter.h         |  3 ++-
>  net/core/filter.c                   | 10 ++++++++++
>  tools/net/bpf_exp.l                 |  1 +
>  tools/net/bpf_exp.y                 | 11 ++++++++++-
>  11 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/networking/filter.txt
> b/Documentation/networking/filter.txt
> index 9930ecfb..3155c4c 100644
> --- a/Documentation/networking/filter.txt
> +++ b/Documentation/networking/filter.txt
> @@ -282,6 +282,7 @@ Possible BPF extensions are shown in the following
> table:
>    vlan_tci                              skb_vlan_tag_get(skb)
>    vlan_pr                               skb_vlan_tag_present(skb)
>    rand                                  prandom_u32()
> +  vlan_proto                            skb->vlan_proto
>
>  These extensions can also be prefixed with '#'.
>  Examples for low-level BPF:
> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> index e1268f9..2954e05 100644
> --- a/arch/arm/net/bpf_jit_32.c
> +++ b/arch/arm/net/bpf_jit_32.c
> @@ -843,6 +843,12 @@ b_epilogue:
>  			else
>  				OP_IMM3(ARM_AND, r_A, r_A, VLAN_TAG_PRESENT, ctx);
>  			break;
> +		case BPF_ANC | SKF_AD_VLAN_PROTO:
> +			ctx->seen |= SEEN_SKB;
> +			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
> +			off = offsetof(struct sk_buff, vlan_proto);
> +			emit(ARM_LDR_I(r_A, r_skb, off), ctx);
> +			break;
>  		case BPF_ANC | SKF_AD_QUEUE:
>  			ctx->seen |= SEEN_SKB;
>  			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
> diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> index 5d61393..732fb1d 100644
> --- a/arch/mips/net/bpf_jit.c
> +++ b/arch/mips/net/bpf_jit.c
> @@ -1295,6 +1295,12 @@ jmp_cmp:
>  				emit_sltu(r_A, r_zero, r_A, ctx);
>  			}
>  			break;
> +		case BPF_ANC | SKF_AD_VLAN_PROTO:
> +			ctx->flags |= SEEN_SKB | SEEN_A;
> +			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
> +			off = offsetof(struct sk_buff, vlan_proto);
> +			emit_load(r_A, r_skb, off, ctx);
> +			break;
>  		case BPF_ANC | SKF_AD_PKTTYPE:
>  			ctx->flags |= SEEN_SKB;
>
> diff --git a/arch/powerpc/net/bpf_jit_comp.c
> b/arch/powerpc/net/bpf_jit_comp.c
> index 17cea18..acfa732 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -399,6 +399,11 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32
> *image,
>  				PPC_SRWI(r_A, r_A, 12);
>  			}
>  			break;
> +		case BPF_ANC | SKF_AD_VLAN_PROTO:
> +			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
> +			PPC_LWZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
> +							  vlan_proto));
You're going to load a halfword, so lhz has to be used

> +			break;
>  		case BPF_ANC | SKF_AD_QUEUE:
>  			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
>  						  queue_mapping) != 2);
> diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> index bbd1981..55b6db7 100644
> --- a/arch/s390/net/bpf_jit_comp.c
> +++ b/arch/s390/net/bpf_jit_comp.c
> @@ -722,6 +722,11 @@ call_fn:	/* lg %r1,<d(function)>(%r13) */
>  			EMIT4_DISP(0x88500000, 12);
>  		}
>  		break;
> +	case BPF_ANC | SKF_AD_VLAN_PROTO: /* A = skb->vlan_proto */
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
> +		/* l %r5,<d(vlan_proto)>(%r2) */
> +		EMIT4_DISP(0x58502000, offsetof(struct sk_buff, vlan_proto));
> +		break;
>  	case BPF_ANC | SKF_AD_PKTTYPE:
>  		/* lhi %r5,0 */
>  		EMIT4(0xa7580000);
> diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
> index 7931eee..cf3e6ac 100644
> --- a/arch/sparc/net/bpf_jit_comp.c
> +++ b/arch/sparc/net/bpf_jit_comp.c
> @@ -624,6 +624,9 @@ void bpf_jit_compile(struct bpf_prog *fp)
>  					emit_and(r_A, r_TMP, r_A);
>  				}
>  				break;
> +			case BPF_ANC | SKF_AD_VLAN_PROTO:
> +				emit_skb_load32(vlan_proto, r_A);
> +				break;
>  			case BPF_LD | BPF_W | BPF_LEN:
>  				emit_skb_load32(len, r_A);
>  				break;
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 9ee8c67..3ec42a1 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -454,6 +454,7 @@ static inline u16 bpf_anc_helper(const struct
> sock_filter *ftest)
>  		BPF_ANCILLARY(VLAN_TAG_PRESENT);
>  		BPF_ANCILLARY(PAY_OFFSET);
>  		BPF_ANCILLARY(RANDOM);
> +		BPF_ANCILLARY(VLAN_PROTO);
>  		}
>  		/* Fallthrough. */
>  	default:
> diff --git a/include/uapi/linux/filter.h b/include/uapi/linux/filter.h
> index 47785d5..aef103bf 100644
> --- a/include/uapi/linux/filter.h
> +++ b/include/uapi/linux/filter.h
> @@ -77,7 +77,8 @@ struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
>  #define SKF_AD_VLAN_TAG_PRESENT 48
>  #define SKF_AD_PAY_OFFSET	52
>  #define SKF_AD_RANDOM	56
> -#define SKF_AD_MAX	60
> +#define SKF_AD_VLAN_PROTO	60
> +#define SKF_AD_MAX	64
>  #define SKF_NET_OFF   (-0x100000)
>  #define SKF_LL_OFF    (-0x200000)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 7a4eb70..b14cc40 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -236,6 +236,16 @@ static bool convert_bpf_extensions(struct sock_filter
> *fp,
>  		}
>  		break;
>
> +	case SKF_AD_OFF + SKF_AD_VLAN_PROTO:
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
> +
> +		/* A = *(u16 *) (CTX + offsetof(vlan_proto)) */
> +		*insn++ = BPF_LDX_MEM(BPF_H, BPF_REG_A, BPF_REG_CTX,
> +				      offsetof(struct sk_buff, vlan_proto));
> +		/* A = ntohs(A) [emitting a nop or swap16] */
> +		*insn = BPF_ENDIAN(BPF_FROM_BE, BPF_REG_A, 16);
> +		break;
> +
>  	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
>  	case SKF_AD_OFF + SKF_AD_NLATTR:
>  	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
> diff --git a/tools/net/bpf_exp.l b/tools/net/bpf_exp.l
> index 833a966..0d4e72f 100644
> --- a/tools/net/bpf_exp.l
> +++ b/tools/net/bpf_exp.l
> @@ -93,6 +93,7 @@ extern void yyerror(const char *str);
>  "#"?("vlan_tci") { return K_VLANT; }
>  "#"?("vlan_pr")	{ return K_VLANP; }
>  "#"?("rand")	{ return K_RAND; }
> +"#"?("vlan_proto")	{ return K_VLANPR; }
>
>  ":"		{ return ':'; }
>  ","		{ return ','; }
> diff --git a/tools/net/bpf_exp.y b/tools/net/bpf_exp.y
> index e6306c5..9c7a0e7 100644
> --- a/tools/net/bpf_exp.y
> +++ b/tools/net/bpf_exp.y
> @@ -56,7 +56,7 @@ static void bpf_set_jmp_label(char *label, enum jmp_type
> type);
>  %token OP_LDXI
>
>  %token K_PKT_LEN K_PROTO K_TYPE K_NLATTR K_NLATTR_NEST K_MARK K_QUEUE
> K_HATYPE
> -%token K_RXHASH K_CPU K_IFIDX K_VLANT K_VLANP K_POFF K_RAND
> +%token K_RXHASH K_CPU K_IFIDX K_VLANT K_VLANP K_POFF K_RAND K_VLANPR
>
>  %token ':' ',' '[' ']' '(' ')' 'x' 'a' '+' 'M' '*' '&' '#' '%'
>
> @@ -167,6 +167,9 @@ ldb
>  	| OP_LDB K_RAND {
>  		bpf_set_curr_instr(BPF_LD | BPF_B | BPF_ABS, 0, 0,
>  				   SKF_AD_OFF + SKF_AD_RANDOM); }
> +	| OP_LDB K_VLANPR {
> +		bpf_set_curr_instr(BPF_LD | BPF_B | BPF_ABS, 0, 0,
> +				   SKF_AD_OFF + SKF_AD_VLAN_PROTO); }
>  	;
>
>  ldh
> @@ -218,6 +221,9 @@ ldh
>  	| OP_LDH K_RAND {
>  		bpf_set_curr_instr(BPF_LD | BPF_H | BPF_ABS, 0, 0,
>  				   SKF_AD_OFF + SKF_AD_RANDOM); }
> +	| OP_LDH K_VLANPR {
> +		bpf_set_curr_instr(BPF_LD | BPF_H | BPF_ABS, 0, 0,
> +				   SKF_AD_OFF + SKF_AD_VLAN_PROTO); }
>  	;
>
>  ldi
> @@ -274,6 +280,9 @@ ld
>  	| OP_LD K_RAND {
>  		bpf_set_curr_instr(BPF_LD | BPF_W | BPF_ABS, 0, 0,
>  				   SKF_AD_OFF + SKF_AD_RANDOM); }
> +	| OP_LD K_VLANPR {
> +		bpf_set_curr_instr(BPF_LD | BPF_W | BPF_ABS, 0, 0,
> +				   SKF_AD_OFF + SKF_AD_VLAN_PROTO); }
>  	| OP_LD 'M' '[' number ']' {
>  		bpf_set_curr_instr(BPF_LD | BPF_MEM, 0, 0, $4); }
>  	| OP_LD '[' 'x' '+' number ']' {
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kubecek March 5, 2015, 9:23 a.m. UTC | #13
On Thu, Mar 05, 2015 at 12:35:01AM -0800, Guy Harris wrote:
> 
> On Mar 4, 2015, at 11:24 PM, Michal Kubecek <mkubecek@suse.cz> wrote:
> 
> > To be more precise, it does not need it now as there is no syntax for
> > pcap filter on TPID,
> 
> In the current version of libpcap (and the commit dates back to November
> 2011), the filter "vlan" checks both for 0x8100 and 0x9100

Actually, there are four different pieces of code:

  - userspace checking packet contents
  - userspace checking packet metadata
  - kernel BPF checking packet contents
  - kernel BPF checking packet metadata

As far as I can see, only the BPF generated to check packet contents
does check TPID value, BPF checking metadata does not (neither in kernel
nor in userspace).

> (it should perhaps also check for other TPID values), in userland and
> in the kernel on platforms other than Linux, as the generated code
> checks for both values.
...
> (I presume that one can, from the data delivered to a PF_PACKET
> SOCK_RAW socket, reconstruct the packet as received, with all the VLAN
> tags in place in the packet just as they were when the packet hit the
> adapter's transceiver from the network.  If not, that's a *separate*
> deficiency.)

These two issues were addressed by

  https://github.com/the-tcpdump-group/libpcap/pull/351

but the code changed since then so I'll have to check the changes and
create a new pull request.

                                                       Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Sekletar March 5, 2015, 10:37 a.m. UTC | #14
On Wed, Mar 04, 2015 at 01:03:50PM -0800, Alexei Starovoitov wrote:
> On 3/4/15 12:41 PM, Michal Sekletar wrote:
> >This commit introduces new BPF extension. It makes possible to load value of
> >skb->vlan_proto (vlan tpid) to register A.
> >
> >Currently, vlan header is removed from frame and information is available to
> >userspace only via tpacket interface. Hence, it is not possible to install
> >filter which uses value of vlan tpid field.
> >
> >AFAICT only way how to filter based on tpid value is to reconstruct original
> >frame encapsulation and interpret BPF filter code in userspace. Doing that is
> >way slower than doing filtering in kernel.
> >
> >Cc: Alexei Starovoitov <ast@plumgrid.com>
> >Cc: Jiri Pirko <jpirko@redhat.com>
> >Signed-off-by: Michal Sekletar <msekleta@redhat.com>
> >---
> >@@ -282,6 +282,7 @@ Possible BPF extensions are shown in the following table:
> >    vlan_tci                              skb_vlan_tag_get(skb)
> >    vlan_pr                               skb_vlan_tag_present(skb)
> >    rand                                  prandom_u32()
> >+  vlan_proto                            skb->vlan_proto
> 
> the patch is correct and looks clean, but I don't understand
> the motivation for the patch.

Way how libpcap currently uses BPF extensions is not compatible with old
behavior where actual value of tpid field was checked. I wanted to address
that, i.e. if "vlan" keyword is used as filter expression, libpcap should
install a filter such that only ethernet frames having tpid value of 0x8100 or
0x9100 will pass. That is not the case with current libpcap git and 4.0-rc1
kernel.

Given that I broke libpcap as described above I tried to come up with the way
how to fix that. However I realized that with recent kernels there is no other
way than adding new BPF extension.

> There is already SKF_AD_VLAN_TAG_PRESENT. If it is set then only
> two possible values of vlan_proto are ETH_P_8021Q or ETH_P_8021AD.

Any reason why ETH_P_QINQ1, ETH_P_QINQ2, ETH_P_QINQ3 no longer works? If I
understand correctly, you are basically saying, that there is no point checking
for vlan tpid because PF_PACKET socket will never receive frame having other
tpid value than above two anyway.

So bottom line is that I wanted to grant userspace programs more flexibility,
and you are saying that it is pointless because for example if outer tpid is
0x9100 socket will never receive the frame. If that is the case then
disregard the patch.

> If there another vlan header inside the packet, it's AD.
> So you can do the filtering already without adding new bpf extension...

Cheers,

Michal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Sekletar March 5, 2015, 2:33 p.m. UTC | #15
On Thu, Mar 05, 2015 at 11:57:05AM +0300, Denis Kirjanov wrote:
> On 3/4/15, Michal Sekletar <msekleta@redhat.com> wrote:
> > This commit introduces new BPF extension. It makes possible to load value
> > of
> > skb->vlan_proto (vlan tpid) to register A.
> >
> > Currently, vlan header is removed from frame and information is available
> > to
> > userspace only via tpacket interface. Hence, it is not possible to install
> > filter which uses value of vlan tpid field.
> >
> > AFAICT only way how to filter based on tpid value is to reconstruct
> > original
> > frame encapsulation and interpret BPF filter code in userspace. Doing that
> > is
> > way slower than doing filtering in kernel.
> >
> > Cc: Alexei Starovoitov <ast@plumgrid.com>
> > Cc: Jiri Pirko <jpirko@redhat.com>
> > Signed-off-by: Michal Sekletar <msekleta@redhat.com>
> > ---
> >  Documentation/networking/filter.txt |  1 +
> >  arch/arm/net/bpf_jit_32.c           |  6 ++++++
> >  arch/mips/net/bpf_jit.c             |  6 ++++++
> >  arch/powerpc/net/bpf_jit_comp.c     |  5 +++++
> >  arch/s390/net/bpf_jit_comp.c        |  5 +++++
> >  arch/sparc/net/bpf_jit_comp.c       |  3 +++
> >  include/linux/filter.h              |  1 +
> >  include/uapi/linux/filter.h         |  3 ++-
> >  net/core/filter.c                   | 10 ++++++++++
> >  tools/net/bpf_exp.l                 |  1 +
> >  tools/net/bpf_exp.y                 | 11 ++++++++++-
> >  11 files changed, 50 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/networking/filter.txt
> > b/Documentation/networking/filter.txt
> > index 9930ecfb..3155c4c 100644
> > --- a/Documentation/networking/filter.txt
> > +++ b/Documentation/networking/filter.txt
> > @@ -282,6 +282,7 @@ Possible BPF extensions are shown in the following
> > table:
> >    vlan_tci                              skb_vlan_tag_get(skb)
> >    vlan_pr                               skb_vlan_tag_present(skb)
> >    rand                                  prandom_u32()
> > +  vlan_proto                            skb->vlan_proto
> >
> >  These extensions can also be prefixed with '#'.
> >  Examples for low-level BPF:
> > diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> > index e1268f9..2954e05 100644
> > --- a/arch/arm/net/bpf_jit_32.c
> > +++ b/arch/arm/net/bpf_jit_32.c
> > @@ -843,6 +843,12 @@ b_epilogue:
> >  			else
> >  				OP_IMM3(ARM_AND, r_A, r_A, VLAN_TAG_PRESENT, ctx);
> >  			break;
> > +		case BPF_ANC | SKF_AD_VLAN_PROTO:
> > +			ctx->seen |= SEEN_SKB;
> > +			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
> > +			off = offsetof(struct sk_buff, vlan_proto);
> > +			emit(ARM_LDR_I(r_A, r_skb, off), ctx);
> > +			break;
> >  		case BPF_ANC | SKF_AD_QUEUE:
> >  			ctx->seen |= SEEN_SKB;
> >  			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
> > diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> > index 5d61393..732fb1d 100644
> > --- a/arch/mips/net/bpf_jit.c
> > +++ b/arch/mips/net/bpf_jit.c
> > @@ -1295,6 +1295,12 @@ jmp_cmp:
> >  				emit_sltu(r_A, r_zero, r_A, ctx);
> >  			}
> >  			break;
> > +		case BPF_ANC | SKF_AD_VLAN_PROTO:
> > +			ctx->flags |= SEEN_SKB | SEEN_A;
> > +			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
> > +			off = offsetof(struct sk_buff, vlan_proto);
> > +			emit_load(r_A, r_skb, off, ctx);
> > +			break;
> >  		case BPF_ANC | SKF_AD_PKTTYPE:
> >  			ctx->flags |= SEEN_SKB;
> >
> > diff --git a/arch/powerpc/net/bpf_jit_comp.c
> > b/arch/powerpc/net/bpf_jit_comp.c
> > index 17cea18..acfa732 100644
> > --- a/arch/powerpc/net/bpf_jit_comp.c
> > +++ b/arch/powerpc/net/bpf_jit_comp.c
> > @@ -399,6 +399,11 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32
> > *image,
> >  				PPC_SRWI(r_A, r_A, 12);
> >  			}
> >  			break;
> > +		case BPF_ANC | SKF_AD_VLAN_PROTO:
> > +			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
> > +			PPC_LWZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
> > +							  vlan_proto));
> You're going to load a halfword, so lhz has to be used

I treated skb->vlan_proto as word instead of halfword on other arches
too. I fixed them all now, but I will not send out v2 just yet. First, I'd like
to hear back from Alexei.

> 
> > +			break;
> >  		case BPF_ANC | SKF_AD_QUEUE:
> >  			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
> >  						  queue_mapping) != 2);
> > diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> > index bbd1981..55b6db7 100644
> > --- a/arch/s390/net/bpf_jit_comp.c
> > +++ b/arch/s390/net/bpf_jit_comp.c
> > @@ -722,6 +722,11 @@ call_fn:	/* lg %r1,<d(function)>(%r13) */
> >  			EMIT4_DISP(0x88500000, 12);
> >  		}
> >  		break;
> > +	case BPF_ANC | SKF_AD_VLAN_PROTO: /* A = skb->vlan_proto */
> > +		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
> > +		/* l %r5,<d(vlan_proto)>(%r2) */
> > +		EMIT4_DISP(0x58502000, offsetof(struct sk_buff, vlan_proto));
> > +		break;
> >  	case BPF_ANC | SKF_AD_PKTTYPE:
> >  		/* lhi %r5,0 */
> >  		EMIT4(0xa7580000);
> > diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
> > index 7931eee..cf3e6ac 100644
> > --- a/arch/sparc/net/bpf_jit_comp.c
> > +++ b/arch/sparc/net/bpf_jit_comp.c
> > @@ -624,6 +624,9 @@ void bpf_jit_compile(struct bpf_prog *fp)
> >  					emit_and(r_A, r_TMP, r_A);
> >  				}
> >  				break;
> > +			case BPF_ANC | SKF_AD_VLAN_PROTO:
> > +				emit_skb_load32(vlan_proto, r_A);
> > +				break;
> >  			case BPF_LD | BPF_W | BPF_LEN:
> >  				emit_skb_load32(len, r_A);
> >  				break;
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index 9ee8c67..3ec42a1 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -454,6 +454,7 @@ static inline u16 bpf_anc_helper(const struct
> > sock_filter *ftest)
> >  		BPF_ANCILLARY(VLAN_TAG_PRESENT);
> >  		BPF_ANCILLARY(PAY_OFFSET);
> >  		BPF_ANCILLARY(RANDOM);
> > +		BPF_ANCILLARY(VLAN_PROTO);
> >  		}
> >  		/* Fallthrough. */
> >  	default:
> > diff --git a/include/uapi/linux/filter.h b/include/uapi/linux/filter.h
> > index 47785d5..aef103bf 100644
> > --- a/include/uapi/linux/filter.h
> > +++ b/include/uapi/linux/filter.h
> > @@ -77,7 +77,8 @@ struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
> >  #define SKF_AD_VLAN_TAG_PRESENT 48
> >  #define SKF_AD_PAY_OFFSET	52
> >  #define SKF_AD_RANDOM	56
> > -#define SKF_AD_MAX	60
> > +#define SKF_AD_VLAN_PROTO	60
> > +#define SKF_AD_MAX	64
> >  #define SKF_NET_OFF   (-0x100000)
> >  #define SKF_LL_OFF    (-0x200000)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 7a4eb70..b14cc40 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -236,6 +236,16 @@ static bool convert_bpf_extensions(struct sock_filter
> > *fp,
> >  		}
> >  		break;
> >
> > +	case SKF_AD_OFF + SKF_AD_VLAN_PROTO:
> > +		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
> > +
> > +		/* A = *(u16 *) (CTX + offsetof(vlan_proto)) */
> > +		*insn++ = BPF_LDX_MEM(BPF_H, BPF_REG_A, BPF_REG_CTX,
> > +				      offsetof(struct sk_buff, vlan_proto));
> > +		/* A = ntohs(A) [emitting a nop or swap16] */
> > +		*insn = BPF_ENDIAN(BPF_FROM_BE, BPF_REG_A, 16);
> > +		break;
> > +
> >  	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
> >  	case SKF_AD_OFF + SKF_AD_NLATTR:
> >  	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
> > diff --git a/tools/net/bpf_exp.l b/tools/net/bpf_exp.l
> > index 833a966..0d4e72f 100644
> > --- a/tools/net/bpf_exp.l
> > +++ b/tools/net/bpf_exp.l
> > @@ -93,6 +93,7 @@ extern void yyerror(const char *str);
> >  "#"?("vlan_tci") { return K_VLANT; }
> >  "#"?("vlan_pr")	{ return K_VLANP; }
> >  "#"?("rand")	{ return K_RAND; }
> > +"#"?("vlan_proto")	{ return K_VLANPR; }
> >
> >  ":"		{ return ':'; }
> >  ","		{ return ','; }
> > diff --git a/tools/net/bpf_exp.y b/tools/net/bpf_exp.y
> > index e6306c5..9c7a0e7 100644
> > --- a/tools/net/bpf_exp.y
> > +++ b/tools/net/bpf_exp.y
> > @@ -56,7 +56,7 @@ static void bpf_set_jmp_label(char *label, enum jmp_type
> > type);
> >  %token OP_LDXI
> >
> >  %token K_PKT_LEN K_PROTO K_TYPE K_NLATTR K_NLATTR_NEST K_MARK K_QUEUE
> > K_HATYPE
> > -%token K_RXHASH K_CPU K_IFIDX K_VLANT K_VLANP K_POFF K_RAND
> > +%token K_RXHASH K_CPU K_IFIDX K_VLANT K_VLANP K_POFF K_RAND K_VLANPR
> >
> >  %token ':' ',' '[' ']' '(' ')' 'x' 'a' '+' 'M' '*' '&' '#' '%'
> >
> > @@ -167,6 +167,9 @@ ldb
> >  	| OP_LDB K_RAND {
> >  		bpf_set_curr_instr(BPF_LD | BPF_B | BPF_ABS, 0, 0,
> >  				   SKF_AD_OFF + SKF_AD_RANDOM); }
> > +	| OP_LDB K_VLANPR {
> > +		bpf_set_curr_instr(BPF_LD | BPF_B | BPF_ABS, 0, 0,
> > +				   SKF_AD_OFF + SKF_AD_VLAN_PROTO); }
> >  	;
> >
> >  ldh
> > @@ -218,6 +221,9 @@ ldh
> >  	| OP_LDH K_RAND {
> >  		bpf_set_curr_instr(BPF_LD | BPF_H | BPF_ABS, 0, 0,
> >  				   SKF_AD_OFF + SKF_AD_RANDOM); }
> > +	| OP_LDH K_VLANPR {
> > +		bpf_set_curr_instr(BPF_LD | BPF_H | BPF_ABS, 0, 0,
> > +				   SKF_AD_OFF + SKF_AD_VLAN_PROTO); }
> >  	;
> >
> >  ldi
> > @@ -274,6 +280,9 @@ ld
> >  	| OP_LD K_RAND {
> >  		bpf_set_curr_instr(BPF_LD | BPF_W | BPF_ABS, 0, 0,
> >  				   SKF_AD_OFF + SKF_AD_RANDOM); }
> > +	| OP_LD K_VLANPR {
> > +		bpf_set_curr_instr(BPF_LD | BPF_W | BPF_ABS, 0, 0,
> > +				   SKF_AD_OFF + SKF_AD_VLAN_PROTO); }
> >  	| OP_LD 'M' '[' number ']' {
> >  		bpf_set_curr_instr(BPF_LD | BPF_MEM, 0, 0, $4); }
> >  	| OP_LD '[' 'x' '+' number ']' {
> > --
> > 1.8.3.1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov March 5, 2015, 4:52 p.m. UTC | #16
On 3/5/15 2:37 AM, Michal Sekletar wrote:
> On Wed, Mar 04, 2015 at 01:03:50PM -0800, Alexei Starovoitov wrote:
>> On 3/4/15 12:41 PM, Michal Sekletar wrote:
>>> This commit introduces new BPF extension. It makes possible to load value of
>>> skb->vlan_proto (vlan tpid) to register A.
>>>
>>> Currently, vlan header is removed from frame and information is available to
>>> userspace only via tpacket interface. Hence, it is not possible to install
>>> filter which uses value of vlan tpid field.
>>>
>>> AFAICT only way how to filter based on tpid value is to reconstruct original
>>> frame encapsulation and interpret BPF filter code in userspace. Doing that is
>>> way slower than doing filtering in kernel.
>>>
>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>> Cc: Jiri Pirko <jpirko@redhat.com>
>>> Signed-off-by: Michal Sekletar <msekleta@redhat.com>
>>> ---
>>> @@ -282,6 +282,7 @@ Possible BPF extensions are shown in the following table:
>>>     vlan_tci                              skb_vlan_tag_get(skb)
>>>     vlan_pr                               skb_vlan_tag_present(skb)
>>>     rand                                  prandom_u32()
>>> +  vlan_proto                            skb->vlan_proto
>>
>> the patch is correct and looks clean, but I don't understand
>> the motivation for the patch.
>
> Way how libpcap currently uses BPF extensions is not compatible with old
> behavior where actual value of tpid field was checked. I wanted to address
> that, i.e. if "vlan" keyword is used as filter expression, libpcap should
> install a filter such that only ethernet frames having tpid value of 0x8100 or
> 0x9100 will pass. That is not the case with current libpcap git and 4.0-rc1
> kernel.
>
> Given that I broke libpcap as described above I tried to come up with the way
> how to fix that. However I realized that with recent kernels there is no other
> way than adding new BPF extension.
>
>> There is already SKF_AD_VLAN_TAG_PRESENT. If it is set then only
>> two possible values of vlan_proto are ETH_P_8021Q or ETH_P_8021AD.
>
> Any reason why ETH_P_QINQ1, ETH_P_QINQ2, ETH_P_QINQ3 no longer works? If I
> understand correctly, you are basically saying, that there is no point checking
> for vlan tpid because PF_PACKET socket will never receive frame having other
> tpid value than above two anyway.
>
> So bottom line is that I wanted to grant userspace programs more flexibility,
> and you are saying that it is pointless because for example if outer tpid is
> 0x9100 socket will never receive the frame. If that is the case then
> disregard the patch.

steering towards vlan device happens only for ETH_P_8021Q and
ETH_P_8021AD. Non-standard 0x9100 and other tags won't be popped into
skb metadata and will stay as-is in the packet body.
If the meaning of 'vlan 100' in libpcap is to detect all possible
vlan tpid then bpf program would need to check VLAN_TAG_PRESENT
(that would mean vlan_proto is either 0x8100 or 0x88A8)
and parse packet body for tpids 0x9[123]00.
Whether we add access to skb->vlan_proto or not, the program would
still need to do the above steps, but instead of checking for
vlan_tag_present only, it would need to do vlan_proto==0x8100
or vlan_proto=0x88a8 and then parse the packet for tpid=0x9[123]00
so adding access to vlan_proto will not simplify libpcap job.
At this point I think it's up to Dave to decide whether we need
this patch (after fixing the issue pointed by Denis) or not.
imo there is a benefit of giving programs more visibility into
skb metadata.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller March 5, 2015, 6:12 p.m. UTC | #17
From: Denis Kirjanov <kda@linux-powerpc.org>
Date: Thu, 5 Mar 2015 11:57:05 +0300

> On 3/4/15, Michal Sekletar <msekleta@redhat.com> wrote:
>> --- a/arch/powerpc/net/bpf_jit_comp.c
>> +++ b/arch/powerpc/net/bpf_jit_comp.c
>> @@ -399,6 +399,11 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32
>> *image,
>>  				PPC_SRWI(r_A, r_A, 12);
>>  			}
>>  			break;
>> +		case BPF_ANC | SKF_AD_VLAN_PROTO:
>> +			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
>> +			PPC_LWZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
>> +							  vlan_proto));
> You're going to load a halfword, so lhz has to be used
> 
>> +			break;

Same bug in the sparc implementation too:

>> @@ -624,6 +624,9 @@ void bpf_jit_compile(struct bpf_prog *fp)
>>  					emit_and(r_A, r_TMP, r_A);
>>  				}
>>  				break;
>> +			case BPF_ANC | SKF_AD_VLAN_PROTO:
>> +				emit_skb_load32(vlan_proto, r_A);
>> +				break;
>>  			case BPF_LD | BPF_W | BPF_LEN:
>>  				emit_skb_load32(len, r_A);
>>  				break;

This needs to use emit_load16().

These JIT changes seem to have been done a little carelessly.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 9930ecfb..3155c4c 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -282,6 +282,7 @@  Possible BPF extensions are shown in the following table:
   vlan_tci                              skb_vlan_tag_get(skb)
   vlan_pr                               skb_vlan_tag_present(skb)
   rand                                  prandom_u32()
+  vlan_proto                            skb->vlan_proto
 
 These extensions can also be prefixed with '#'.
 Examples for low-level BPF:
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index e1268f9..2954e05 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -843,6 +843,12 @@  b_epilogue:
 			else
 				OP_IMM3(ARM_AND, r_A, r_A, VLAN_TAG_PRESENT, ctx);
 			break;
+		case BPF_ANC | SKF_AD_VLAN_PROTO:
+			ctx->seen |= SEEN_SKB;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
+			off = offsetof(struct sk_buff, vlan_proto);
+			emit(ARM_LDR_I(r_A, r_skb, off), ctx);
+			break;
 		case BPF_ANC | SKF_AD_QUEUE:
 			ctx->seen |= SEEN_SKB;
 			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
index 5d61393..732fb1d 100644
--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -1295,6 +1295,12 @@  jmp_cmp:
 				emit_sltu(r_A, r_zero, r_A, ctx);
 			}
 			break;
+		case BPF_ANC | SKF_AD_VLAN_PROTO:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
+			off = offsetof(struct sk_buff, vlan_proto);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
 		case BPF_ANC | SKF_AD_PKTTYPE:
 			ctx->flags |= SEEN_SKB;
 
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 17cea18..acfa732 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -399,6 +399,11 @@  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 				PPC_SRWI(r_A, r_A, 12);
 			}
 			break;
+		case BPF_ANC | SKF_AD_VLAN_PROTO:
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
+			PPC_LWZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
+							  vlan_proto));
+			break;
 		case BPF_ANC | SKF_AD_QUEUE:
 			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
 						  queue_mapping) != 2);
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index bbd1981..55b6db7 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -722,6 +722,11 @@  call_fn:	/* lg %r1,<d(function)>(%r13) */
 			EMIT4_DISP(0x88500000, 12);
 		}
 		break;
+	case BPF_ANC | SKF_AD_VLAN_PROTO: /* A = skb->vlan_proto */
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
+		/* l %r5,<d(vlan_proto)>(%r2) */
+		EMIT4_DISP(0x58502000, offsetof(struct sk_buff, vlan_proto));
+		break;
 	case BPF_ANC | SKF_AD_PKTTYPE:
 		/* lhi %r5,0 */
 		EMIT4(0xa7580000);
diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
index 7931eee..cf3e6ac 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp.c
@@ -624,6 +624,9 @@  void bpf_jit_compile(struct bpf_prog *fp)
 					emit_and(r_A, r_TMP, r_A);
 				}
 				break;
+			case BPF_ANC | SKF_AD_VLAN_PROTO:
+				emit_skb_load32(vlan_proto, r_A);
+				break;
 			case BPF_LD | BPF_W | BPF_LEN:
 				emit_skb_load32(len, r_A);
 				break;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9ee8c67..3ec42a1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -454,6 +454,7 @@  static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
 		BPF_ANCILLARY(VLAN_TAG_PRESENT);
 		BPF_ANCILLARY(PAY_OFFSET);
 		BPF_ANCILLARY(RANDOM);
+		BPF_ANCILLARY(VLAN_PROTO);
 		}
 		/* Fallthrough. */
 	default:
diff --git a/include/uapi/linux/filter.h b/include/uapi/linux/filter.h
index 47785d5..aef103bf 100644
--- a/include/uapi/linux/filter.h
+++ b/include/uapi/linux/filter.h
@@ -77,7 +77,8 @@  struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
 #define SKF_AD_VLAN_TAG_PRESENT 48
 #define SKF_AD_PAY_OFFSET	52
 #define SKF_AD_RANDOM	56
-#define SKF_AD_MAX	60
+#define SKF_AD_VLAN_PROTO	60
+#define SKF_AD_MAX	64
 #define SKF_NET_OFF   (-0x100000)
 #define SKF_LL_OFF    (-0x200000)
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 7a4eb70..b14cc40 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -236,6 +236,16 @@  static bool convert_bpf_extensions(struct sock_filter *fp,
 		}
 		break;
 
+	case SKF_AD_OFF + SKF_AD_VLAN_PROTO:
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_proto) != 2);
+
+		/* A = *(u16 *) (CTX + offsetof(vlan_proto)) */
+		*insn++ = BPF_LDX_MEM(BPF_H, BPF_REG_A, BPF_REG_CTX,
+				      offsetof(struct sk_buff, vlan_proto));
+		/* A = ntohs(A) [emitting a nop or swap16] */
+		*insn = BPF_ENDIAN(BPF_FROM_BE, BPF_REG_A, 16);
+		break;
+
 	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
 	case SKF_AD_OFF + SKF_AD_NLATTR:
 	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
diff --git a/tools/net/bpf_exp.l b/tools/net/bpf_exp.l
index 833a966..0d4e72f 100644
--- a/tools/net/bpf_exp.l
+++ b/tools/net/bpf_exp.l
@@ -93,6 +93,7 @@  extern void yyerror(const char *str);
 "#"?("vlan_tci") { return K_VLANT; }
 "#"?("vlan_pr")	{ return K_VLANP; }
 "#"?("rand")	{ return K_RAND; }
+"#"?("vlan_proto")	{ return K_VLANPR; }
 
 ":"		{ return ':'; }
 ","		{ return ','; }
diff --git a/tools/net/bpf_exp.y b/tools/net/bpf_exp.y
index e6306c5..9c7a0e7 100644
--- a/tools/net/bpf_exp.y
+++ b/tools/net/bpf_exp.y
@@ -56,7 +56,7 @@  static void bpf_set_jmp_label(char *label, enum jmp_type type);
 %token OP_LDXI
 
 %token K_PKT_LEN K_PROTO K_TYPE K_NLATTR K_NLATTR_NEST K_MARK K_QUEUE K_HATYPE
-%token K_RXHASH K_CPU K_IFIDX K_VLANT K_VLANP K_POFF K_RAND
+%token K_RXHASH K_CPU K_IFIDX K_VLANT K_VLANP K_POFF K_RAND K_VLANPR
 
 %token ':' ',' '[' ']' '(' ')' 'x' 'a' '+' 'M' '*' '&' '#' '%'
 
@@ -167,6 +167,9 @@  ldb
 	| OP_LDB K_RAND {
 		bpf_set_curr_instr(BPF_LD | BPF_B | BPF_ABS, 0, 0,
 				   SKF_AD_OFF + SKF_AD_RANDOM); }
+	| OP_LDB K_VLANPR {
+		bpf_set_curr_instr(BPF_LD | BPF_B | BPF_ABS, 0, 0,
+				   SKF_AD_OFF + SKF_AD_VLAN_PROTO); }
 	;
 
 ldh
@@ -218,6 +221,9 @@  ldh
 	| OP_LDH K_RAND {
 		bpf_set_curr_instr(BPF_LD | BPF_H | BPF_ABS, 0, 0,
 				   SKF_AD_OFF + SKF_AD_RANDOM); }
+	| OP_LDH K_VLANPR {
+		bpf_set_curr_instr(BPF_LD | BPF_H | BPF_ABS, 0, 0,
+				   SKF_AD_OFF + SKF_AD_VLAN_PROTO); }
 	;
 
 ldi
@@ -274,6 +280,9 @@  ld
 	| OP_LD K_RAND {
 		bpf_set_curr_instr(BPF_LD | BPF_W | BPF_ABS, 0, 0,
 				   SKF_AD_OFF + SKF_AD_RANDOM); }
+	| OP_LD K_VLANPR {
+		bpf_set_curr_instr(BPF_LD | BPF_W | BPF_ABS, 0, 0,
+				   SKF_AD_OFF + SKF_AD_VLAN_PROTO); }
 	| OP_LD 'M' '[' number ']' {
 		bpf_set_curr_instr(BPF_LD | BPF_MEM, 0, 0, $4); }
 	| OP_LD '[' 'x' '+' number ']' {