diff mbox series

[bpf,5/5] flow_dissector: document BPF flow dissector environment

Message ID 20190401205734.4400-6-sdf@google.com
State Accepted
Delegated to: BPF Maintainers
Series flow_dissector: lay groundwork for calling BPF hook from eth_get_headlen

Commit Message

Stanislav Fomichev April 1, 2019, 8:57 p.m. UTC
Short doc on what BPF flow dissector should expect in the input
__sk_buff and flow_keys.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../networking/bpf_flow_dissector.txt         | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 Documentation/networking/bpf_flow_dissector.txt

Comments

Petar Penkov April 2, 2019, 8:54 p.m. UTC | #1
On Mon, Apr 1, 2019 at 1:57 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> Short doc on what BPF flow dissector should expect in the input
> __sk_buff and flow_keys.
>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  .../networking/bpf_flow_dissector.txt         | 115 ++++++++++++++++++
>  1 file changed, 115 insertions(+)
>  create mode 100644 Documentation/networking/bpf_flow_dissector.txt
>
> diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt
> new file mode 100644
> index 000000000000..513be8e20afb
> --- /dev/null
> +++ b/Documentation/networking/bpf_flow_dissector.txt
> @@ -0,0 +1,115 @@
> +==================
> +BPF Flow Dissector
> +==================
> +
> +Overview
> +========
> +
> +Flow dissector is a routine that parses metadata out of the packets. It's
> +used in the various places in the networking subsystem (RFS, flow hash, etc).
> +
> +BPF flow dissector is an attempt to reimplement C-based flow dissector logic
> +in BPF to gain all the benefits of BPF verifier (namely, limits on the
> +number of instructions and tail calls).
> +
> +API
> +===
> +
> +BPF flow dissector programs operate on an __sk_buff. However, only the
> +limited set of fields is allowed: data, data_end and flow_keys. flow_keys
> +is 'struct bpf_flow_keys' and contains flow dissector input and
> +output arguments.
> +
> +The inputs are:
> +  * nhoff - initial offset of the networking header
> +  * thoff - initial offset of the transport header, initialized to nhoff
> +  * n_proto - L3 protocol type, parsed out of L2 header
> +
> +Flow dissector BPF program should fill out the rest of the 'struct
> +bpf_flow_keys' fields. Input arguments nhoff/thoff/n_proto should be also
> +adjusted accordingly.
> +
> +The return code of the BPF program is either BPF_OK to indicate successful
> +dissection, or BPF_DROP to indicate parsing error.
I don't think this is actually enforced. I believe the current code
just checks if the status is BPF_OK or not, rather than BPF_OK,
BPF_DROP, or neither.

Stanislav Fomichev April 2, 2019, 9 p.m. UTC | #2
On 04/02, Petar Penkov wrote:
> On Mon, Apr 1, 2019 at 1:57 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > Short doc on what BPF flow dissector should expect in the input
> > __sk_buff and flow_keys.
> >
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  .../networking/bpf_flow_dissector.txt         | 115 ++++++++++++++++++
> >  1 file changed, 115 insertions(+)
> >  create mode 100644 Documentation/networking/bpf_flow_dissector.txt
> >
> > diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt
> > new file mode 100644
> > index 000000000000..513be8e20afb
> > --- /dev/null
> > +++ b/Documentation/networking/bpf_flow_dissector.txt
> > @@ -0,0 +1,115 @@
> > +==================
> > +BPF Flow Dissector
> > +==================
> > +
> > +Overview
> > +========
> > +
> > +Flow dissector is a routine that parses metadata out of the packets. It's
> > +used in the various places in the networking subsystem (RFS, flow hash, etc).
> > +
> > +BPF flow dissector is an attempt to reimplement C-based flow dissector logic
> > +in BPF to gain all the benefits of BPF verifier (namely, limits on the
> > +number of instructions and tail calls).
> > +
> > +API
> > +===
> > +
> > +BPF flow dissector programs operate on an __sk_buff. However, only the
> > +limited set of fields is allowed: data, data_end and flow_keys. flow_keys
> > +is 'struct bpf_flow_keys' and contains flow dissector input and
> > +output arguments.
> > +
> > +The inputs are:
> > +  * nhoff - initial offset of the networking header
> > +  * thoff - initial offset of the transport header, initialized to nhoff
> > +  * n_proto - L3 protocol type, parsed out of L2 header
> > +
> > +Flow dissector BPF program should fill out the rest of the 'struct
> > +bpf_flow_keys' fields. Input arguments nhoff/thoff/n_proto should be also
> > +adjusted accordingly.
> > +
> > +The return code of the BPF program is either BPF_OK to indicate successful
> > +dissection, or BPF_DROP to indicate parsing error.
> I don't think this is actually enforced. I believe the current code
> just checks if the status is BPF_OK or not, rather than BPF_OK,
> BPF_DROP, or neither.
It's not universally enforced, but some codepaths in the kernel look at
the returned value (e.g. skb_get_poff and eth_get_headlen), so it's
better to set the expectations :-)
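
For illustration only, a minimal userspace sketch of the distinction discussed
above: the document asks programs to return BPF_OK or BPF_DROP, while the
caller is described as treating anything other than BPF_OK as a failure. The
numeric values mirror enum bpf_ret_code in include/uapi/linux/bpf.h; the
sketch_* names are hypothetical and this is not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the two return codes (values as in enum bpf_ret_code). */
enum sketch_ret_code {
	SKETCH_BPF_OK = 0,
	SKETCH_BPF_DROP = 2,
};

/* What the document asks a program to return: BPF_OK or BPF_DROP only. */
static bool sketch_is_documented_ret(uint32_t prog_ret)
{
	return prog_ret == SKETCH_BPF_OK || prog_ret == SKETCH_BPF_DROP;
}

/* What the caller is described as checking: success only on BPF_OK. */
static bool sketch_dissect_ok(uint32_t prog_ret)
{
	return prog_ret == SKETCH_BPF_OK;
}
```

Under this model any value other than BPF_OK, documented or not, ends up on
the failure path, which is why the stricter wording costs nothing.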

Jesper Dangaard Brouer April 3, 2019, 6:34 p.m. UTC | #3
On Mon,  1 Apr 2019 13:57:34 -0700
Stanislav Fomichev <sdf@google.com> wrote:

> diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt
> new file mode 100644
> index 000000000000..513be8e20afb
> --- /dev/null
> +++ b/Documentation/networking/bpf_flow_dissector.txt

It looks like you use the RST format, but you use suffix .txt and not .rst.

In case you don't know, these files get rendered at:
 https://www.kernel.org/doc/html/latest/bpf/index.html

GitHub also renders these files, e.g.:
 https://github.com/torvalds/linux/blob/master/Documentation/bpf/bpf_devel_QA.rst
 

Stanislav Fomichev April 3, 2019, 6:50 p.m. UTC | #4
On 04/03, Jesper Dangaard Brouer wrote:
> On Mon,  1 Apr 2019 13:57:34 -0700
> Stanislav Fomichev <sdf@google.com> wrote:
> 
> > diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt
> > new file mode 100644
> > index 000000000000..513be8e20afb
> > --- /dev/null
> > +++ b/Documentation/networking/bpf_flow_dissector.txt
> 
> It looks like you use the RST format, but you use suffix .txt and not .rst.
Thanks for the suggestion, let me try to rename it to .rst and build local
htmldocs to make sure it renders correctly (I'm not sure about the ascii art).
If that looks good, I'll follow up with a rename patch.

Patch

diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt
new file mode 100644
index 000000000000..513be8e20afb
--- /dev/null
+++ b/Documentation/networking/bpf_flow_dissector.txt
@@ -0,0 +1,115 @@ 
+==================
+BPF Flow Dissector
+==================
+
+Overview
+========
+
+Flow dissector is a routine that parses metadata out of packets. It's
+used in various places in the networking subsystem (RFS, flow hash, etc.).
+
+BPF flow dissector is an attempt to reimplement the C-based flow dissector
+logic in BPF to gain all the benefits of the BPF verifier (namely, limits on
+the number of instructions and tail calls).
+
+API
+===
+
+BPF flow dissector programs operate on an __sk_buff. However, only a
+limited set of fields is accessible: data, data_end and flow_keys. flow_keys
+is a 'struct bpf_flow_keys' and contains the flow dissector input and
+output arguments.
+
+The inputs are:
+  * nhoff - initial offset of the networking header
+  * thoff - initial offset of the transport header, initialized to nhoff
+  * n_proto - L3 protocol type, parsed out of L2 header
+
+The flow dissector BPF program should fill out the rest of the 'struct
+bpf_flow_keys' fields. The input arguments nhoff/thoff/n_proto should also
+be adjusted accordingly.
+
+The return code of the BPF program is either BPF_OK to indicate successful
+dissection, or BPF_DROP to indicate a parsing error.
+
+__sk_buff->data
+===============
+
+In the VLAN-less case, this is what the initial state of the BPF flow
+dissector looks like:
++------+------+------------+-----------+
+| DMAC | SMAC | ETHER_TYPE | L3_HEADER |
++------+------+------------+-----------+
+                            ^
+                            |
+                            +-- flow dissector starts here
+
+skb->data + flow_keys->nhoff points to the first byte of L3_HEADER.
+flow_keys->thoff = nhoff
+flow_keys->n_proto = ETHER_TYPE
+
+
+In the VLAN case, the flow dissector can be called in two different states.
+
+Pre-VLAN parsing:
++------+------+------+-----+-----------+-----------+
+| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
++------+------+------+-----+-----------+-----------+
+                      ^
+                      |
+                      +-- flow dissector starts here
+
+skb->data + flow_keys->nhoff points to the first byte of TCI.
+flow_keys->thoff = nhoff
+flow_keys->n_proto = TPID
+
+Please note that TPID can be 802.1AD and, hence, the BPF program would
+have to parse VLAN information twice for double-tagged packets.
+
+
+Post-VLAN parsing:
++------+------+------+-----+-----------+-----------+
+| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
++------+------+------+-----+-----------+-----------+
+                                        ^
+                                        |
+                                        +-- flow dissector starts here
+
+skb->data + flow_keys->nhoff points to the first byte of L3_HEADER.
+flow_keys->thoff = nhoff
+flow_keys->n_proto = ETHER_TYPE
+
+In this case the VLAN information has been processed before the flow
+dissector runs and the BPF flow dissector is not required to handle it.
+
+
+The takeaway is as follows: a BPF flow dissector program can be called with
+or without a VLAN header present and should gracefully handle both cases:
+when a single or double VLAN tag is present and when none is. The same
+program is invoked in both cases, so it has to be written carefully to
+cope with either starting state.
+
+
+Reference Implementation
+========================
+
+See tools/testing/selftests/bpf/progs/bpf_flow.c for the reference
+implementation and tools/testing/selftests/bpf/flow_dissector_load.[hc] for
+the loader. bpftool can be used to load a BPF flow dissector program as well.
+
+The reference implementation is organized as follows:
+* jmp_table map that contains sub-programs for each supported L3 protocol
+* _dissect routine - entry point; it parses the input n_proto and does a
+  bpf_tail_call to the appropriate L3 handler
+
+Since BPF at this point doesn't support looping (or any jumping back),
+jmp_table is used instead to handle multiple levels of encapsulation (and
+IPv6 options).
+
+
+Current Limitations
+===================
+BPF flow dissector doesn't support exporting all the metadata that the
+in-kernel C-based implementation can export. Notable examples are single VLAN
+(802.1Q) and double VLAN (802.1AD) tags. Please refer to 'struct bpf_flow_keys'
+for the set of information that can currently be exported from the BPF context.
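
As a reading aid for the three starting states described in the patch
(VLAN-less, pre-VLAN and post-VLAN), here is a minimal userspace sketch of the
tag-skipping step. It is not the reference implementation: struct
sketch_flow_keys, read_be16 and sketch_skip_vlan are hypothetical names, n_proto
is held in host byte order for readability, and the two-iteration loop is a
userspace convenience rather than how a BPF program of this era would be
structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP     0x0800  /* IPv4 */
#define ETH_P_8021Q  0x8100  /* single VLAN tag (802.1Q) */
#define ETH_P_8021AD 0x88A8  /* outer tag of a double-tagged frame (802.1AD) */

/* Hypothetical stand-in for the three input fields of struct bpf_flow_keys. */
struct sketch_flow_keys {
	uint16_t nhoff;
	uint16_t thoff;
	uint16_t n_proto;  /* host byte order in this sketch */
};

/* Read a big-endian 16-bit value from the packet buffer. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Skip up to two VLAN tags. In the pre-VLAN state nhoff points at the TCI
 * and n_proto holds the TPID; in the VLAN-less and post-VLAN states n_proto
 * is already the L3 EtherType and nothing is consumed.
 * Returns 0 on success, -1 on a truncated frame or more than two tags.
 */
static int sketch_skip_vlan(const uint8_t *data, const uint8_t *data_end,
			    struct sketch_flow_keys *keys)
{
	int tags;

	assert(data <= data_end);

	for (tags = 0; tags < 2; tags++) {
		if (keys->n_proto != ETH_P_8021Q &&
		    keys->n_proto != ETH_P_8021AD)
			break;
		/* TCI (2 bytes) + encapsulated EtherType (2 bytes) must fit. */
		if ((size_t)(data_end - data) < (size_t)keys->nhoff + 4)
			return -1;
		keys->n_proto = read_be16(data + keys->nhoff + 2);
		keys->nhoff += 4;
	}
	if (keys->n_proto == ETH_P_8021Q || keys->n_proto == ETH_P_8021AD)
		return -1;

	/* As in the initial state, thoff starts out equal to nhoff. */
	keys->thoff = keys->nhoff;
	return 0;
}
```

The same function is safe to call in all three states, which is the property
the takeaway paragraph asks for: it only consumes bytes when n_proto names a
VLAN TPID, and it checks every read against data_end first.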