[PATCH/RFC,net-next,1/2] flow dissector: ND support

Message ID	1486031855-10551-2-git-send-email-simon.horman@netronome.com
State	RFC, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> From: Simon Horman <simon.horman@netronome.com> To: David Miller <davem@davemloft.net> Cc: Dinan Gunawardena <dinan.gunawardena@netronome.com>, netdev@vger.kernel.org, oss-drivers@netronome.com, Simon Horman <simon.horman@netronome.com> Subject: [PATCH/RFC net-next 1/2] flow dissector: ND support Date: Thu, 2 Feb 2017 11:37:34 +0100 Message-Id: <1486031855-10551-2-git-send-email-simon.horman@netronome.com> In-Reply-To: <1486031855-10551-1-git-send-email-simon.horman@netronome.com> References: <1486031855-10551-1-git-send-email-simon.horman@netronome.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk

Simon Horman Feb. 2, 2017, 10:37 a.m. UTC

Allow dissection of Neighbour Discovery target IP, and source and
destination link-layer addresses for neighbour solicitation and
advertisement messages.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 include/net/flow_dissector.h | 14 ++++++++
 net/core/flow_dissector.c    | 83 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 96 insertions(+), 1 deletion(-)

Eric Dumazet Feb. 2, 2017, 12:31 p.m. UTC | #1

On Thu, 2017-02-02 at 11:37 +0100, Simon Horman wrote:
> Allow dissection of Neighbour Discovery target IP, and source and
> destination link-layer addresses for neighbour solicitation and
> advertisement messages.
> 
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
> ---

Hi Simon

Why is this needed ?

Any code added in flow dissector needs to be extra careful,
we had various packet of deaths errors recently in this area.

Thanks.

Simon Horman Feb. 2, 2017, 3:58 p.m. UTC | #2

[Repost due to gmail account problem]

On Thu, Feb 02, 2017 at 04:31:33AM -0800, Eric Dumazet wrote:
> On Thu, 2017-02-02 at 11:37 +0100, Simon Horman wrote:
> > Allow dissection of Neighbour Discovery target IP, and source and
> > destination link-layer addresses for neighbour solicitation and
> > advertisement messages.
> > 
> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
> > ---
> 
> Hi Simon
> 
> Why is this needed ?
> 
> Any code added in flow dissector needs to be extra careful,
> we had various packet of deaths errors recently in this area.

Hi Eric,

there some activity to allow programming of OvS flows in hardware via TC
with the flower classifier. As the ND fields in this patch are part of the
OvS flow key I would like them considered for additions to flower and thus
the dissector to allow compatibility with OvS.

I apologise if any 'deaths' have resulted from my recent work on the
dissector. I am of course very open to ideas on how to avoid any future
incidents.

Tom Herbert Feb. 2, 2017, 5:24 p.m. UTC | #3

On Thu, Feb 2, 2017 at 7:58 AM, Simon Horman <simon.horman@netronome.com> wrote:
> [Repost due to gmail account problem]
>
> On Thu, Feb 02, 2017 at 04:31:33AM -0800, Eric Dumazet wrote:
>> On Thu, 2017-02-02 at 11:37 +0100, Simon Horman wrote:
>> > Allow dissection of Neighbour Discovery target IP, and source and
>> > destination link-layer addresses for neighbour solicitation and
>> > advertisement messages.
>> >
>> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
>> > ---
>>
>> Hi Simon
>>
>> Why is this needed ?
>>
>> Any code added in flow dissector needs to be extra careful,
>> we had various packet of deaths errors recently in this area.
>
> Hi Eric,
>
> there some activity to allow programming of OvS flows in hardware via TC
> with the flower classifier. As the ND fields in this patch are part of the
> OvS flow key I would like them considered for additions to flower and thus
> the dissector to allow compatibility with OvS.
>
Given that ARP is already there it seems only "fair" to have ND also.
But Eric is correct, this is quite a sensitive area of code.

> I apologise if any 'deaths' have resulted from my recent work on the
> dissector. I am of course very open to ideas on how to avoid any future
> incidents.

That's a tough problem. flow_dissector started off as simple mechanism
to just identify actual flows (really just TCP and UDP packets) for
the purposes of packet steering. But given the benefits of its
location low in the stack and the open ended capabilities for parsing
it seems to have mushroomed into a general catchall to parse a whole
bunch of different protocols. A lot of these go beyond simply
identifying flows (ICMP parsing, ARP, or ND as in your patches). These
new use cases may be valid, but the result is a convoluted function (>
500 LOC by my count) and it seems to be quite easy to have subtle bugs
mostly in edge cases, several of which could have been exploited in
DDOS attacks.

At some point we need to stop adding new protocols to parse in
__skb_flow_dissect and push the processing back into the protocol
modules with a callout interface from flow_dissector (for instance if
we ever want VXLAN parsing in flow dissector this is the only
reasonable way to do it). That moves the complexity but doesn't solve
the problem of buggy code in this critical path. An alternative might
be to put a cap on flow_dissector and add a hook to BPF program to
allow parsing of new protocols. This has the advantage of providing an
constrained interface that could eliminate possibility of some types
of bugs we've seen. Also, this allows adding support for "user"
protocols that the kernel might not even know about (QUIC comes to
mind).

Tom

Jiri Pirko Feb. 2, 2017, 5:48 p.m. UTC | #4

Thu, Feb 02, 2017 at 06:24:40PM CET, tom@herbertland.com wrote:
>On Thu, Feb 2, 2017 at 7:58 AM, Simon Horman <simon.horman@netronome.com> wrote:
>> [Repost due to gmail account problem]
>>
>> On Thu, Feb 02, 2017 at 04:31:33AM -0800, Eric Dumazet wrote:
>>> On Thu, 2017-02-02 at 11:37 +0100, Simon Horman wrote:
>>> > Allow dissection of Neighbour Discovery target IP, and source and
>>> > destination link-layer addresses for neighbour solicitation and
>>> > advertisement messages.
>>> >
>>> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
>>> > ---
>>>
>>> Hi Simon
>>>
>>> Why is this needed ?
>>>
>>> Any code added in flow dissector needs to be extra careful,
>>> we had various packet of deaths errors recently in this area.
>>
>> Hi Eric,
>>
>> there some activity to allow programming of OvS flows in hardware via TC
>> with the flower classifier. As the ND fields in this patch are part of the
>> OvS flow key I would like them considered for additions to flower and thus
>> the dissector to allow compatibility with OvS.
>>
>Given that ARP is already there it seems only "fair" to have ND also.
>But Eric is correct, this is quite a sensitive area of code.
>
>> I apologise if any 'deaths' have resulted from my recent work on the
>> dissector. I am of course very open to ideas on how to avoid any future
>> incidents.
>
>That's a tough problem. flow_dissector started off as simple mechanism
>to just identify actual flows (really just TCP and UDP packets) for
>the purposes of packet steering. But given the benefits of its
>location low in the stack and the open ended capabilities for parsing
>it seems to have mushroomed into a general catchall to parse a whole
>bunch of different protocols. A lot of these go beyond simply
>identifying flows (ICMP parsing, ARP, or ND as in your patches). These
>new use cases may be valid, but the result is a convoluted function (>
>500 LOC by my count) and it seems to be quite easy to have subtle bugs
>mostly in edge cases, several of which could have been exploited in
>DDOS attacks.

Agreed that we probably came to a point when we need to split
__skb_flow_dissect into modular and pluggable pieces. Will not be
trivial though.

Also note that it depends on the __skb_flow_dissect user which code is
actually used or not. For the critical path, that keys are defined by:
flow_keys_dissector_keys

Most of the code Simon is adding is noop for non-flower usecase if:
dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) == false


>
>At some point we need to stop adding new protocols to parse in
>__skb_flow_dissect and push the processing back into the protocol
>modules with a callout interface from flow_dissector (for instance if
>we ever want VXLAN parsing in flow dissector this is the only
>reasonable way to do it). That moves the complexity but doesn't solve
>the problem of buggy code in this critical path. An alternative might
>be to put a cap on flow_dissector and add a hook to BPF program to

Loks like BPF is becoming an answer for everything these days :O


>allow parsing of new protocols. This has the advantage of providing an
>constrained interface that could eliminate possibility of some types
>of bugs we've seen. Also, this allows adding support for "user"
>protocols that the kernel might not even know about (QUIC comes to
>mind).

Not sure it is wise to make life easier for the proprietary
out-of-tree beasts...

Tom Herbert Feb. 2, 2017, 6:36 p.m. UTC | #5

On Thu, Feb 2, 2017 at 9:48 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, Feb 02, 2017 at 06:24:40PM CET, tom@herbertland.com wrote:
>>On Thu, Feb 2, 2017 at 7:58 AM, Simon Horman <simon.horman@netronome.com> wrote:
>>> [Repost due to gmail account problem]
>>>
>>> On Thu, Feb 02, 2017 at 04:31:33AM -0800, Eric Dumazet wrote:
>>>> On Thu, 2017-02-02 at 11:37 +0100, Simon Horman wrote:
>>>> > Allow dissection of Neighbour Discovery target IP, and source and
>>>> > destination link-layer addresses for neighbour solicitation and
>>>> > advertisement messages.
>>>> >
>>>> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
>>>> > ---
>>>>
>>>> Hi Simon
>>>>
>>>> Why is this needed ?
>>>>
>>>> Any code added in flow dissector needs to be extra careful,
>>>> we had various packet of deaths errors recently in this area.
>>>
>>> Hi Eric,
>>>
>>> there some activity to allow programming of OvS flows in hardware via TC
>>> with the flower classifier. As the ND fields in this patch are part of the
>>> OvS flow key I would like them considered for additions to flower and thus
>>> the dissector to allow compatibility with OvS.
>>>
>>Given that ARP is already there it seems only "fair" to have ND also.
>>But Eric is correct, this is quite a sensitive area of code.
>>
>>> I apologise if any 'deaths' have resulted from my recent work on the
>>> dissector. I am of course very open to ideas on how to avoid any future
>>> incidents.
>>
>>That's a tough problem. flow_dissector started off as simple mechanism
>>to just identify actual flows (really just TCP and UDP packets) for
>>the purposes of packet steering. But given the benefits of its
>>location low in the stack and the open ended capabilities for parsing
>>it seems to have mushroomed into a general catchall to parse a whole
>>bunch of different protocols. A lot of these go beyond simply
>>identifying flows (ICMP parsing, ARP, or ND as in your patches). These
>>new use cases may be valid, but the result is a convoluted function (>
>>500 LOC by my count) and it seems to be quite easy to have subtle bugs
>>mostly in edge cases, several of which could have been exploited in
>>DDOS attacks.
>
> Agreed that we probably came to a point when we need to split
> __skb_flow_dissect into modular and pluggable pieces. Will not be
> trivial though.
>
> Also note that it depends on the __skb_flow_dissect user which code is
> actually used or not. For the critical path, that keys are defined by:
> flow_keys_dissector_keys
>
True, but the code doesn't separate out the critical path from all
these extended features which is resulted in a jumbled mess with no
modularity to speak of :-(

> Most of the code Simon is adding is noop for non-flower usecase if:
> dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) == false
>
Sure, but that just makes this code corner cases which means it's hard
to maintain and harder to find bugs in the long run.

>
>>
>>At some point we need to stop adding new protocols to parse in
>>__skb_flow_dissect and push the processing back into the protocol
>>modules with a callout interface from flow_dissector (for instance if
>>we ever want VXLAN parsing in flow dissector this is the only
>>reasonable way to do it). That moves the complexity but doesn't solve
>>the problem of buggy code in this critical path. An alternative might
>>be to put a cap on flow_dissector and add a hook to BPF program to
>
> Loks like BPF is becoming an answer for everything these days :O
>
In this case it makes sense though, we can't just continue accepting
every poor little narrow use-case protocol that comes along into the
kernel :-)

>
>>allow parsing of new protocols. This has the advantage of providing an
>>constrained interface that could eliminate possibility of some types
>>of bugs we've seen. Also, this allows adding support for "user"
>>protocols that the kernel might not even know about (QUIC comes to
>>mind).
>
> Not sure it is wise to make life easier for the proprietary
> out-of-tree beasts...

It's going to be a problem with a whole host of application level
protocols especially those run over UDP. QUIC is a great example. The
actual protocol will probably only ever run in userspace, but it is
inevitable that we want to provide targeted kernel support for packet
steering. filtering, GRO/GSO if they have such things. Instead of
implementing this in a specialized QUIC module, it will most likely
make everyone happier to add these in a generic protocol-agnostic way.
From QUIC POV they want to minimize any dependencies on the kernel and
be able to iterate quickly, from a kernel POV we really don't want to
have to explicitly support an endless stream of protocols like this.

Tom

Eric Dumazet Feb. 2, 2017, 6:56 p.m. UTC | #6

On Thu, 2017-02-02 at 10:36 -0800, Tom Herbert wrote:

> It's going to be a problem with a whole host of application level
> protocols especially those run over UDP. QUIC is a great example. The
> actual protocol will probably only ever run in userspace, but it is
> inevitable that we want to provide targeted kernel support for packet
> steering. filtering, GRO/GSO if they have such things. Instead of
> implementing this in a specialized QUIC module, it will most likely
> make everyone happier to add these in a generic protocol-agnostic way.
> From QUIC POV they want to minimize any dependencies on the kernel and
> be able to iterate quickly, from a kernel POV we really don't want to
> have to explicitly support an endless stream of protocols like this.

Small note here : Google does not intend to add QUIC knowledge anywhere
in the kernel.

SO_ATTACH_REUSEPORT_EBPF can be used if someone wants to take a deep
look at QUIC header (and presumably hash the connection-id) to select
one socket among the group.

Thanks.

Tom Herbert Feb. 2, 2017, 7:19 p.m. UTC | #7

On Thu, Feb 2, 2017 at 10:56 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-02-02 at 10:36 -0800, Tom Herbert wrote:
>
>> It's going to be a problem with a whole host of application level
>> protocols especially those run over UDP. QUIC is a great example. The
>> actual protocol will probably only ever run in userspace, but it is
>> inevitable that we want to provide targeted kernel support for packet
>> steering. filtering, GRO/GSO if they have such things. Instead of
>> implementing this in a specialized QUIC module, it will most likely
>> make everyone happier to add these in a generic protocol-agnostic way.
>> From QUIC POV they want to minimize any dependencies on the kernel and
>> be able to iterate quickly, from a kernel POV we really don't want to
>> have to explicitly support an endless stream of protocols like this.
>
> Small note here : Google does not intend to add QUIC knowledge anywhere
> in the kernel.
>
Someone else might want to do that. It's also conceivable that we
could see NIC vendors trying to offload parts of QUIC and they would
want support in the kernel for that.

Tom

> SO_ATTACH_REUSEPORT_EBPF can be used if someone wants to take a deep
> look at QUIC header (and presumably hash the connection-id) to select
> one socket among the group.
>
> Thanks.
>
>
>

Simon Horman Feb. 6, 2017, 9:12 a.m. UTC | #8

On Thu, Feb 02, 2017 at 10:36:31AM -0800, Tom Herbert wrote:
> On Thu, Feb 2, 2017 at 9:48 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> > Thu, Feb 02, 2017 at 06:24:40PM CET, tom@herbertland.com wrote:
> >>On Thu, Feb 2, 2017 at 7:58 AM, Simon Horman <simon.horman@netronome.com> wrote:
> >>> [Repost due to gmail account problem]
> >>>
> >>> On Thu, Feb 02, 2017 at 04:31:33AM -0800, Eric Dumazet wrote:
> >>>> On Thu, 2017-02-02 at 11:37 +0100, Simon Horman wrote:
> >>>> > Allow dissection of Neighbour Discovery target IP, and source and
> >>>> > destination link-layer addresses for neighbour solicitation and
> >>>> > advertisement messages.
> >>>> >
> >>>> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
> >>>> > ---
> >>>>
> >>>> Hi Simon
> >>>>
> >>>> Why is this needed ?
> >>>>
> >>>> Any code added in flow dissector needs to be extra careful,
> >>>> we had various packet of deaths errors recently in this area.
> >>>
> >>> Hi Eric,
> >>>
> >>> there some activity to allow programming of OvS flows in hardware via TC
> >>> with the flower classifier. As the ND fields in this patch are part of the
> >>> OvS flow key I would like them considered for additions to flower and thus
> >>> the dissector to allow compatibility with OvS.
> >>>
> >>Given that ARP is already there it seems only "fair" to have ND also.
> >>But Eric is correct, this is quite a sensitive area of code.
> >>
> >>> I apologise if any 'deaths' have resulted from my recent work on the
> >>> dissector. I am of course very open to ideas on how to avoid any future
> >>> incidents.
> >>
> >>That's a tough problem. flow_dissector started off as simple mechanism
> >>to just identify actual flows (really just TCP and UDP packets) for
> >>the purposes of packet steering. But given the benefits of its
> >>location low in the stack and the open ended capabilities for parsing
> >>it seems to have mushroomed into a general catchall to parse a whole
> >>bunch of different protocols. A lot of these go beyond simply
> >>identifying flows (ICMP parsing, ARP, or ND as in your patches). These
> >>new use cases may be valid, but the result is a convoluted function (>
> >>500 LOC by my count) and it seems to be quite easy to have subtle bugs
> >>mostly in edge cases, several of which could have been exploited in
> >>DDOS attacks.
> >
> > Agreed that we probably came to a point when we need to split
> > __skb_flow_dissect into modular and pluggable pieces. Will not be
> > trivial though.
> >
> > Also note that it depends on the __skb_flow_dissect user which code is
> > actually used or not. For the critical path, that keys are defined by:
> > flow_keys_dissector_keys
> >
> True, but the code doesn't separate out the critical path from all
> these extended features which is resulted in a jumbled mess with no
> modularity to speak of :-(
> 
> > Most of the code Simon is adding is noop for non-flower usecase if:
> > dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) == false
> >
> Sure, but that just makes this code corner cases which means it's hard
> to maintain and harder to find bugs in the long run.

I think that there is a bit of tension here between having generic reusable
code on the one hand and having a small robust implementation for critical
code.

From my point of view having the flow dissector shared makes complete sense
when all the users need a similar set of keys. This was mostly the case
until one user, flower, started growing the number of keys it can use - I am
partly responsible for that. So now we have one user that is placing a
burden on the complexity and recently the robustness of code that is relied
on by many users.

I think that Jiri's point regarding dissector_uses_key() is the nub of the
issue. On the one hand I believe, given my recent updates to the code in
question, that covering more code by such conditionals - without necessarily
adding more instances of the conditionals - would lead to more robust code
for the current users. But as Tom points out paradoxically it may also lead
to less robust code for paths that aren't executed much - specifically
paths being added to support keys only used by flower.

I suspect a similar paradox would exists for other, likely more complex,
refactoring efforts.

From my side I think that using dissector_uses_key() to cover more code
makes sense as it allow features to be added to flower and for bugs in them
to be ironed out with lower risk of effecting other users along the way.

> >>At some point we need to stop adding new protocols to parse in
> >>__skb_flow_dissect and push the processing back into the protocol
> >>modules with a callout interface from flow_dissector (for instance if
> >>we ever want VXLAN parsing in flow dissector this is the only
> >>reasonable way to do it). That moves the complexity but doesn't solve
> >>the problem of buggy code in this critical path. An alternative might
> >>be to put a cap on flow_dissector and add a hook to BPF program to
> >
> > Loks like BPF is becoming an answer for everything these days :O
> >
> In this case it makes sense though, we can't just continue accepting
> every poor little narrow use-case protocol that comes along into the
> kernel :-)

Maybe. I'm not sure that I think we should exclude narrow cases. But
I have no strong feeling on that at this time.

To lay my intentions on the table: I am interested in enhancing flower and
by implication the flow dissector to support the header fields and
protocols supported by OvS. I can check the list but possibly ND is the
narrowest case protocol there.

> >>allow parsing of new protocols. This has the advantage of providing an
> >>constrained interface that could eliminate possibility of some types
> >>of bugs we've seen. Also, this allows adding support for "user"
> >>protocols that the kernel might not even know about (QUIC comes to
> >>mind).
> >
> > Not sure it is wise to make life easier for the proprietary
> > out-of-tree beasts...
> 
> It's going to be a problem with a whole host of application level
> protocols especially those run over UDP. QUIC is a great example. The
> actual protocol will probably only ever run in userspace, but it is
> inevitable that we want to provide targeted kernel support for packet
> steering. filtering, GRO/GSO if they have such things. Instead of
> implementing this in a specialized QUIC module, it will most likely
> make everyone happier to add these in a generic protocol-agnostic way.
> From QUIC POV they want to minimize any dependencies on the kernel and
> be able to iterate quickly, from a kernel POV we really don't want to
> have to explicitly support an endless stream of protocols like this.
> 
> Tom

Tom Herbert Feb. 7, 2017, 5:36 p.m. UTC | #9

On Mon, Feb 6, 2017 at 1:12 AM, Simon Horman <simon.horman@netronome.com> wrote:
> On Thu, Feb 02, 2017 at 10:36:31AM -0800, Tom Herbert wrote:
>> On Thu, Feb 2, 2017 at 9:48 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> > Thu, Feb 02, 2017 at 06:24:40PM CET, tom@herbertland.com wrote:
>> >>On Thu, Feb 2, 2017 at 7:58 AM, Simon Horman <simon.horman@netronome.com> wrote:
>> >>> [Repost due to gmail account problem]
>> >>>
>> >>> On Thu, Feb 02, 2017 at 04:31:33AM -0800, Eric Dumazet wrote:
>> >>>> On Thu, 2017-02-02 at 11:37 +0100, Simon Horman wrote:
>> >>>> > Allow dissection of Neighbour Discovery target IP, and source and
>> >>>> > destination link-layer addresses for neighbour solicitation and
>> >>>> > advertisement messages.
>> >>>> >
>> >>>> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
>> >>>> > ---
>> >>>>
>> >>>> Hi Simon
>> >>>>
>> >>>> Why is this needed ?
>> >>>>
>> >>>> Any code added in flow dissector needs to be extra careful,
>> >>>> we had various packet of deaths errors recently in this area.
>> >>>
>> >>> Hi Eric,
>> >>>
>> >>> there some activity to allow programming of OvS flows in hardware via TC
>> >>> with the flower classifier. As the ND fields in this patch are part of the
>> >>> OvS flow key I would like them considered for additions to flower and thus
>> >>> the dissector to allow compatibility with OvS.
>> >>>
>> >>Given that ARP is already there it seems only "fair" to have ND also.
>> >>But Eric is correct, this is quite a sensitive area of code.
>> >>
>> >>> I apologise if any 'deaths' have resulted from my recent work on the
>> >>> dissector. I am of course very open to ideas on how to avoid any future
>> >>> incidents.
>> >>
>> >>That's a tough problem. flow_dissector started off as simple mechanism
>> >>to just identify actual flows (really just TCP and UDP packets) for
>> >>the purposes of packet steering. But given the benefits of its
>> >>location low in the stack and the open ended capabilities for parsing
>> >>it seems to have mushroomed into a general catchall to parse a whole
>> >>bunch of different protocols. A lot of these go beyond simply
>> >>identifying flows (ICMP parsing, ARP, or ND as in your patches). These
>> >>new use cases may be valid, but the result is a convoluted function (>
>> >>500 LOC by my count) and it seems to be quite easy to have subtle bugs
>> >>mostly in edge cases, several of which could have been exploited in
>> >>DDOS attacks.
>> >
>> > Agreed that we probably came to a point when we need to split
>> > __skb_flow_dissect into modular and pluggable pieces. Will not be
>> > trivial though.
>> >
>> > Also note that it depends on the __skb_flow_dissect user which code is
>> > actually used or not. For the critical path, that keys are defined by:
>> > flow_keys_dissector_keys
>> >
>> True, but the code doesn't separate out the critical path from all
>> these extended features which is resulted in a jumbled mess with no
>> modularity to speak of :-(
>>
>> > Most of the code Simon is adding is noop for non-flower usecase if:
>> > dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) == false
>> >
>> Sure, but that just makes this code corner cases which means it's hard
>> to maintain and harder to find bugs in the long run.
>
> I think that there is a bit of tension here between having generic reusable
> code on the one hand and having a small robust implementation for critical
> code.
>
> From my point of view having the flow dissector shared makes complete sense
> when all the users need a similar set of keys. This was mostly the case
> until one user, flower, started growing the number of keys it can use - I am
> partly responsible for that. So now we have one user that is placing a
> burden on the complexity and recently the robustness of code that is relied
> on by many users.
>
> I think that Jiri's point regarding dissector_uses_key() is the nub of the
> issue. On the one hand I believe, given my recent updates to the code in
> question, that covering more code by such conditionals - without necessarily
> adding more instances of the conditionals - would lead to more robust code
> for the current users. But as Tom points out paradoxically it may also lead
> to less robust code for paths that aren't executed much - specifically
> paths being added to support keys only used by flower.
>
> I suspect a similar paradox would exists for other, likely more complex,
> refactoring efforts.
>
> From my side I think that using dissector_uses_key() to cover more code
> makes sense as it allow features to be added to flower and for bugs in them
> to be ironed out with lower risk of effecting other users along the way.
>
>> >>At some point we need to stop adding new protocols to parse in
>> >>__skb_flow_dissect and push the processing back into the protocol
>> >>modules with a callout interface from flow_dissector (for instance if
>> >>we ever want VXLAN parsing in flow dissector this is the only
>> >>reasonable way to do it). That moves the complexity but doesn't solve
>> >>the problem of buggy code in this critical path. An alternative might
>> >>be to put a cap on flow_dissector and add a hook to BPF program to
>> >
>> > Loks like BPF is becoming an answer for everything these days :O
>> >
>> In this case it makes sense though, we can't just continue accepting
>> every poor little narrow use-case protocol that comes along into the
>> kernel :-)
>
> Maybe. I'm not sure that I think we should exclude narrow cases. But
> I have no strong feeling on that at this time.
>
> To lay my intentions on the table: I am interested in enhancing flower and
> by implication the flow dissector to support the header fields and
> protocols supported by OvS. I can check the list but possibly ND is the
> narrowest case protocol there.
>
Okay, but can you give us an idea of how many more of these protocols
are going to be added to flow_dissector. TBH I'm not very enthused
about making more flow_dissector more complex for the benefit of OVS.

Tom

>> >>allow parsing of new protocols. This has the advantage of providing an
>> >>constrained interface that could eliminate possibility of some types
>> >>of bugs we've seen. Also, this allows adding support for "user"
>> >>protocols that the kernel might not even know about (QUIC comes to
>> >>mind).
>> >
>> > Not sure it is wise to make life easier for the proprietary
>> > out-of-tree beasts...
>>
>> It's going to be a problem with a whole host of application level
>> protocols especially those run over UDP. QUIC is a great example. The
>> actual protocol will probably only ever run in userspace, but it is
>> inevitable that we want to provide targeted kernel support for packet
>> steering. filtering, GRO/GSO if they have such things. Instead of
>> implementing this in a specialized QUIC module, it will most likely
>> make everyone happier to add these in a generic protocol-agnostic way.
>> From QUIC POV they want to minimize any dependencies on the kernel and
>> be able to iterate quickly, from a kernel POV we really don't want to
>> have to explicitly support an endless stream of protocols like this.
>>
>> Tom

David Miller Feb. 7, 2017, 5:38 p.m. UTC | #10

From: Tom Herbert <tom@herbertland.com>
Date: Tue, 7 Feb 2017 09:36:20 -0800

> Okay, but can you give us an idea of how many more of these protocols
> are going to be added to flow_dissector. TBH I'm not very enthused
> about making more flow_dissector more complex for the benefit of OVS.

Especially since the kernel datapath of OVS has been marked
experimental.

Simon Horman Feb. 8, 2017, 9:28 a.m. UTC | #11

On Tue, Feb 07, 2017 at 12:38:31PM -0500, David Miller wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Tue, 7 Feb 2017 09:36:20 -0800
> 
> > Okay, but can you give us an idea of how many more of these protocols
> > are going to be added to flow_dissector. TBH I'm not very enthused
> > about making more flow_dissector more complex for the benefit of OVS.
> 
> Especially since the kernel datapath of OVS has been marked
> experimental.

Hi Dave, Hi Tom,

Firstly I'd like to apologise for posing what has turned out to be a
somewhat divisive patch.

After looking through things a little more I think the simple answer to
Tom's question is only ND. But there are also some fields of already
supported protocols which are covered in the OvS flow key but not the flow
dissector.

My analysis of OvS flow key fields for non-tunnel packet data - which I
think is the extent of what is relevant to the flow dissector - yields the
following list:

  New protocols:
  * ND (this patch)

  Fields of already supported protocols:
  * MPLS.lse (currently the label of the LSE is handled by the dissector
              so this could also be described as MPLS.lse.{tc,s,ttl})
  * IPV4.tos
  * IPV4.ttl
  * IPV6.hlimit
  * IPV6.tclass

  I do expect that some of the above will not be appropriate for existing
  users of the flow dissector; e.g. IPV4.ttl does not seem much use when
  calculating the hash of a flow for steering purposes. And I do not yet
  have a good idea of how to approach that beyond using different dissector
  keys.

  Further support for to already supported fields.
  * IPV[46].frag (supported by flow dissector : 1st frag flag;
                  not supported by flow dissector: subsequent frag flag)

From my point of view some parts of what is above is more important than
others. In particular ND is probably not particularly important at least at
this time. I would be happy to withdraw this patchset if the complexity is
not deemed worth it at this time.

I'd like to take a moment to give a little background and restate that my
goal is not to create a burden for existing users of the flow dissector
- or any other part of the stack.

Netronome has worked on various approaches to an upstream offload of OvS.
One was hooks added to the OvS kernel datapath; an idea which was rejected
at least twice but none the less I'd be happy to revisit if there is
interest in it.

The result of trying various approaches is that it seems the most
acceptable is to use TC for programming flows into hardware which is where
my work on TC flower and the flow dissector is coming from.  A key part of
the reasoning being, as I understand it, that the flows could also be
programmed into hardware for non-OvS use-cases; technology that could be
useful e.g. in a post-OvS world.

I think the above paragraph gets back to Tom's original question regarding
making things more complex just for OvS (use-cases). Possibly ND is an edge
case even for OvS and on reflection my timing for posting it seems to have
been less than ideal.

Simon Horman Feb. 8, 2017, 4:43 p.m. UTC | #12

Responding to myself as my previous post doesn't seem to have hit netdev
for some reason.

On Wed, Feb 08, 2017 at 10:28:24AM +0100, Simon Horman wrote:
> On Tue, Feb 07, 2017 at 12:38:31PM -0500, David Miller wrote:
> > From: Tom Herbert <tom@herbertland.com>
> > Date: Tue, 7 Feb 2017 09:36:20 -0800
> > 
> > > Okay, but can you give us an idea of how many more of these protocols
> > > are going to be added to flow_dissector. TBH I'm not very enthused
> > > about making more flow_dissector more complex for the benefit of OVS.
> > 
> > Especially since the kernel datapath of OVS has been marked
> > experimental.
> 
> Hi Dave, Hi Tom,
> 
> Firstly I'd like to apologise for posing what has turned out to be a
> somewhat divisive patch.
> 
> After looking through things a little more I think the simple answer to
> Tom's question is only ND. But there are also some fields of already
> supported protocols which are covered in the OvS flow key but not the flow
> dissector.
> 
> My analysis of OvS flow key fields for non-tunnel packet data - which I
> think is the extent of what is relevant to the flow dissector - yields the
> following list:
> 
>   New protocols:
>   * ND (this patch)
> 
>   Fields of already supported protocols:
>   * MPLS.lse (currently the label of the LSE is handled by the dissector
>               so this could also be described as MPLS.lse.{tc,s,ttl})
>   * IPV4.tos
>   * IPV4.ttl
>   * IPV6.hlimit
>   * IPV6.tclass
> 
>   I do expect that some of the above will not be appropriate for existing
>   users of the flow dissector; e.g. IPV4.ttl does not seem much use when
>   calculating the hash of a flow for steering purposes. And I do not yet
>   have a good idea of how to approach that beyond using different dissector
>   keys.
> 
>   Further support for to already supported fields.
>   * IPV[46].frag (supported by flow dissector : 1st frag flag;
>                   not supported by flow dissector: subsequent frag flag)
> 
> From my point of view some parts of what is above is more important than
> others. In particular ND is probably not particularly important at least at
> this time. I would be happy to withdraw this patchset if the complexity is
> not deemed worth it at this time.
> 
> 
> I'd like to take a moment to give a little background and restate that my
> goal is not to create a burden for existing users of the flow dissector
> - or any other part of the stack.
> 
> Netronome has worked on various approaches to an upstream offload of OvS.
> One was hooks added to the OvS kernel datapath; an idea which was rejected
> at least twice but none the less I'd be happy to revisit if there is
> interest in it.
> 
> The result of trying various approaches is that it seems the most
> acceptable is to use TC for programming flows into hardware which is where
> my work on TC flower and the flow dissector is coming from.  A key part of
> the reasoning being, as I understand it, that the flows could also be
> programmed into hardware for non-OvS use-cases; technology that could be
> useful e.g. in a post-OvS world.
> 
> I think the above paragraph gets back to Tom's original question regarding
> making things more complex just for OvS (use-cases). Possibly ND is an edge
> case even for OvS and on reflection my timing for posting it seems to have
> been less than ideal.

Tom Herbert Feb. 8, 2017, 6:33 p.m. UTC | #13

On Wed, Feb 8, 2017 at 1:28 AM, Simon Horman <simon.horman@netronome.com> wrote:
> On Tue, Feb 07, 2017 at 12:38:31PM -0500, David Miller wrote:
>> From: Tom Herbert <tom@herbertland.com>
>> Date: Tue, 7 Feb 2017 09:36:20 -0800
>>
>> > Okay, but can you give us an idea of how many more of these protocols
>> > are going to be added to flow_dissector. TBH I'm not very enthused
>> > about making more flow_dissector more complex for the benefit of OVS.
>>
>> Especially since the kernel datapath of OVS has been marked
>> experimental.
>
> Hi Dave, Hi Tom,
>
> Firstly I'd like to apologise for posing what has turned out to be a
> somewhat divisive patch.
>
> After looking through things a little more I think the simple answer to
> Tom's question is only ND. But there are also some fields of already
> supported protocols which are covered in the OvS flow key but not the flow
> dissector.
>
> My analysis of OvS flow key fields for non-tunnel packet data - which I
> think is the extent of what is relevant to the flow dissector - yields the
> following list:
>
>   New protocols:
>   * ND (this patch)
>
>   Fields of already supported protocols:
>   * MPLS.lse (currently the label of the LSE is handled by the dissector
>               so this could also be described as MPLS.lse.{tc,s,ttl})
>   * IPV4.tos
>   * IPV4.ttl
>   * IPV6.hlimit
>   * IPV6.tclass
>
>   I do expect that some of the above will not be appropriate for existing
>   users of the flow dissector; e.g. IPV4.ttl does not seem much use when
>   calculating the hash of a flow for steering purposes. And I do not yet
>   have a good idea of how to approach that beyond using different dissector
>   keys.
>
>   Further support for to already supported fields.
>   * IPV[46].frag (supported by flow dissector : 1st frag flag;
>                   not supported by flow dissector: subsequent frag flag)
>
> From my point of view some parts of what is above is more important than
> others. In particular ND is probably not particularly important at least at
> this time. I would be happy to withdraw this patchset if the complexity is
> not deemed worth it at this time.
>
>
> I'd like to take a moment to give a little background and restate that my
> goal is not to create a burden for existing users of the flow dissector
> - or any other part of the stack.
>
> Netronome has worked on various approaches to an upstream offload of OvS.
> One was hooks added to the OvS kernel datapath; an idea which was rejected
> at least twice but none the less I'd be happy to revisit if there is
> interest in it.
>
> The result of trying various approaches is that it seems the most
> acceptable is to use TC for programming flows into hardware which is where
> my work on TC flower and the flow dissector is coming from.  A key part of
> the reasoning being, as I understand it, that the flows could also be
> programmed into hardware for non-OvS use-cases; technology that could be
> useful e.g. in a post-OvS world.
>
It's not an issue specific OVS, it's part of a broader problem in that
we are see more case where the core kernel paths are modified to
support somewhat narrow use case functionality (like OVS, advanced HW
accelerations, etc.). As we have seen recently this can make
maintainability and backports very difficult (see my comments about
backporting mlx5).

> I think the above paragraph gets back to Tom's original question regarding
> making things more complex just for OvS (use-cases). Possibly ND is an edge
> case even for OvS and on reflection my timing for posting it seems to have
> been less than ideal.

If it wasn't ND it would be something else... with all the activity
happening in networking features and HW this is a timely discussion.
Flow dissector presents a good example of a function that might become
a dumping ground for an endless stream of features if we don't figure
out how exercise some restraint.

Tom

David Miller Feb. 8, 2017, 6:54 p.m. UTC | #14

From: Tom Herbert <tom@herbertland.com>
Date: Wed, 8 Feb 2017 10:33:46 -0800

> On Wed, Feb 8, 2017 at 1:28 AM, Simon Horman <simon.horman@netronome.com> wrote:
>> I think the above paragraph gets back to Tom's original question regarding
>> making things more complex just for OvS (use-cases). Possibly ND is an edge
>> case even for OvS and on reflection my timing for posting it seems to have
>> been less than ideal.
> 
> If it wasn't ND it would be something else... with all the activity
> happening in networking features and HW this is a timely discussion.
> Flow dissector presents a good example of a function that might become
> a dumping ground for an endless stream of features if we don't figure
> out how exercise some restraint.

I agree on most points.

But, I would say that in this specific case, since we have ARP support in
there already it behooves us to support the ipv6 side in the form of ND
too.

Then we can put a line in the sand and say that future feature additions
in this area require serious discussion.

Ok Tom?

Tom Herbert Feb. 8, 2017, 7:10 p.m. UTC | #15

On Wed, Feb 8, 2017 at 10:54 AM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Wed, 8 Feb 2017 10:33:46 -0800
>
>> On Wed, Feb 8, 2017 at 1:28 AM, Simon Horman <simon.horman@netronome.com> wrote:
>>> I think the above paragraph gets back to Tom's original question regarding
>>> making things more complex just for OvS (use-cases). Possibly ND is an edge
>>> case even for OvS and on reflection my timing for posting it seems to have
>>> been less than ideal.
>>
>> If it wasn't ND it would be something else... with all the activity
>> happening in networking features and HW this is a timely discussion.
>> Flow dissector presents a good example of a function that might become
>> a dumping ground for an endless stream of features if we don't figure
>> out how exercise some restraint.
>
> I agree on most points.
>
> But, I would say that in this specific case, since we have ARP support in
> there already it behooves us to support the ipv6 side in the form of ND
> too.
>
> Then we can put a line in the sand and say that future feature additions
> in this area require serious discussion.
>
> Ok Tom?

Right, ND is okay on the basis that we already have ARP (although I
still may grumble from time to time that ARP, ND, and ICMP are being
identified as flows ;-) ).

I think there are two projects in the are that someone, maybe an
aspiring kernel network developer, might want to look into if they
have the time:

- Inevitably someone will want to support VXLAN or other UDP
encapsulations in flow dissector. The only correct way to do this is
going to be to do a lookup on UDP socket and have a flow_dissector
function related to the socket. This is the model for dealing with UDP
encapsulations in GRO that could be extended for flow dissection. We
cannot hard code port numbers in flow_dissector. The interesting part
here will be making a robust interface to avoid the pitfalls we've
seen in some of the protocols in flow_dissector.

- Allow calling a BPF function to do custom flow dissection. IIRC
there someone (Daniel?) had already implement flow_dissector in BPF
with pretty good results.

Tom

Jiri Pirko Feb. 8, 2017, 8:09 p.m. UTC | #16

Wed, Feb 08, 2017 at 07:54:15PM CET, davem@davemloft.net wrote:
>From: Tom Herbert <tom@herbertland.com>
>Date: Wed, 8 Feb 2017 10:33:46 -0800
>
>> On Wed, Feb 8, 2017 at 1:28 AM, Simon Horman <simon.horman@netronome.com> wrote:
>>> I think the above paragraph gets back to Tom's original question regarding
>>> making things more complex just for OvS (use-cases). Possibly ND is an edge
>>> case even for OvS and on reflection my timing for posting it seems to have
>>> been less than ideal.
>> 
>> If it wasn't ND it would be something else... with all the activity
>> happening in networking features and HW this is a timely discussion.
>> Flow dissector presents a good example of a function that might become
>> a dumping ground for an endless stream of features if we don't figure
>> out how exercise some restraint.
>
>I agree on most points.
>
>But, I would say that in this specific case, since we have ARP support in
>there already it behooves us to support the ipv6 side in the form of ND
>too.
>
>Then we can put a line in the sand and say that future feature additions
>in this area require serious discussion.

Yeah, well, and if there is a functinality that is unacceptable for any
reason to put into flow_dissector, we have to do a flow_dissector2?

Note that I originally had separate dissection in cls_flower, you
suggested to use the existing flow_dissector. And I still believe it was
the right way to do it.

I think that better is to make existing flow dissector more modular.
I'll look into this.

Jiri Pirko Feb. 8, 2017, 8:12 p.m. UTC | #17

Wed, Feb 08, 2017 at 08:10:06PM CET, tom@herbertland.com wrote:
>On Wed, Feb 8, 2017 at 10:54 AM, David Miller <davem@davemloft.net> wrote:
>> From: Tom Herbert <tom@herbertland.com>
>> Date: Wed, 8 Feb 2017 10:33:46 -0800
>>
>>> On Wed, Feb 8, 2017 at 1:28 AM, Simon Horman <simon.horman@netronome.com> wrote:
>>>> I think the above paragraph gets back to Tom's original question regarding
>>>> making things more complex just for OvS (use-cases). Possibly ND is an edge
>>>> case even for OvS and on reflection my timing for posting it seems to have
>>>> been less than ideal.
>>>
>>> If it wasn't ND it would be something else... with all the activity
>>> happening in networking features and HW this is a timely discussion.
>>> Flow dissector presents a good example of a function that might become
>>> a dumping ground for an endless stream of features if we don't figure
>>> out how exercise some restraint.
>>
>> I agree on most points.
>>
>> But, I would say that in this specific case, since we have ARP support in
>> there already it behooves us to support the ipv6 side in the form of ND
>> too.
>>
>> Then we can put a line in the sand and say that future feature additions
>> in this area require serious discussion.
>>
>> Ok Tom?
>
>Right, ND is okay on the basis that we already have ARP (although I
>still may grumble from time to time that ARP, ND, and ICMP are being
>identified as flows ;-) ).
>
>I think there are two projects in the are that someone, maybe an
>aspiring kernel network developer, might want to look into if they
>have the time:
>
>- Inevitably someone will want to support VXLAN or other UDP
>encapsulations in flow dissector. The only correct way to do this is
>going to be to do a lookup on UDP socket and have a flow_dissector
>function related to the socket. This is the model for dealing with UDP
>encapsulations in GRO that could be extended for flow dissection. We
>cannot hard code port numbers in flow_dissector. The interesting part
>here will be making a robust interface to avoid the pitfalls we've
>seen in some of the protocols in flow_dissector.
>
>- Allow calling a BPF function to do custom flow dissection. IIRC
>there someone (Daniel?) had already implement flow_dissector in BPF
>with pretty good results.

How will this help us for cls_flower case? Are you suggesting to put this
whole BPF occult to the next level and use it kernel-to-kernel? :D

Tom Herbert Feb. 8, 2017, 8:33 p.m. UTC | #18

On Wed, Feb 8, 2017 at 12:12 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Wed, Feb 08, 2017 at 08:10:06PM CET, tom@herbertland.com wrote:
>>On Wed, Feb 8, 2017 at 10:54 AM, David Miller <davem@davemloft.net> wrote:
>>> From: Tom Herbert <tom@herbertland.com>
>>> Date: Wed, 8 Feb 2017 10:33:46 -0800
>>>
>>>> On Wed, Feb 8, 2017 at 1:28 AM, Simon Horman <simon.horman@netronome.com> wrote:
>>>>> I think the above paragraph gets back to Tom's original question regarding
>>>>> making things more complex just for OvS (use-cases). Possibly ND is an edge
>>>>> case even for OvS and on reflection my timing for posting it seems to have
>>>>> been less than ideal.
>>>>
>>>> If it wasn't ND it would be something else... with all the activity
>>>> happening in networking features and HW this is a timely discussion.
>>>> Flow dissector presents a good example of a function that might become
>>>> a dumping ground for an endless stream of features if we don't figure
>>>> out how exercise some restraint.
>>>
>>> I agree on most points.
>>>
>>> But, I would say that in this specific case, since we have ARP support in
>>> there already it behooves us to support the ipv6 side in the form of ND
>>> too.
>>>
>>> Then we can put a line in the sand and say that future feature additions
>>> in this area require serious discussion.
>>>
>>> Ok Tom?
>>
>>Right, ND is okay on the basis that we already have ARP (although I
>>still may grumble from time to time that ARP, ND, and ICMP are being
>>identified as flows ;-) ).
>>
>>I think there are two projects in the are that someone, maybe an
>>aspiring kernel network developer, might want to look into if they
>>have the time:
>>
>>- Inevitably someone will want to support VXLAN or other UDP
>>encapsulations in flow dissector. The only correct way to do this is
>>going to be to do a lookup on UDP socket and have a flow_dissector
>>function related to the socket. This is the model for dealing with UDP
>>encapsulations in GRO that could be extended for flow dissection. We
>>cannot hard code port numbers in flow_dissector. The interesting part
>>here will be making a robust interface to avoid the pitfalls we've
>>seen in some of the protocols in flow_dissector.
>>
>>- Allow calling a BPF function to do custom flow dissection. IIRC
>>there someone (Daniel?) had already implement flow_dissector in BPF
>>with pretty good results.
>
> How will this help us for cls_flower case? Are you suggesting to put this
> whole BPF occult to the next level and use it kernel-to-kernel? :D
>
I am merely suggesting that BPF offers a user programmable interface
that could allow implementing protocol support in the kernel for
functions like flow_dissector for protocols that we don't really want
explicit kernel code for. SOREUSEPORT-BPF is the canonical example of
such a capability.

Tom

Simon Horman Feb. 9, 2017, 8:25 a.m. UTC | #19

On Wed, Feb 08, 2017 at 09:09:17PM +0100, Jiri Pirko wrote:
> Wed, Feb 08, 2017 at 07:54:15PM CET, davem@davemloft.net wrote:
> >From: Tom Herbert <tom@herbertland.com>
> >Date: Wed, 8 Feb 2017 10:33:46 -0800
> >
> >> On Wed, Feb 8, 2017 at 1:28 AM, Simon Horman <simon.horman@netronome.com> wrote:
> >>> I think the above paragraph gets back to Tom's original question regarding
> >>> making things more complex just for OvS (use-cases). Possibly ND is an edge
> >>> case even for OvS and on reflection my timing for posting it seems to have
> >>> been less than ideal.
> >> 
> >> If it wasn't ND it would be something else... with all the activity
> >> happening in networking features and HW this is a timely discussion.
> >> Flow dissector presents a good example of a function that might become
> >> a dumping ground for an endless stream of features if we don't figure
> >> out how exercise some restraint.
> >
> >I agree on most points.
> >
> >But, I would say that in this specific case, since we have ARP support in
> >there already it behooves us to support the ipv6 side in the form of ND
> >too.
> >
> >Then we can put a line in the sand and say that future feature additions
> >in this area require serious discussion.

I think this serious discussion is all about the long term and am
completely in favour of that. But in the short term it would help me to
know if the line in the sand excludes proposing enhancements to protocols
already supported by the flow dissector; in particular MPLS and IP fields I
listed earlier in this thread.

Perhaps the answer is that it depends. But if the line in
the sand has been fortified I'd rather avoid trying to cross it.

> Yeah, well, and if there is a functinality that is unacceptable for any
> reason to put into flow_dissector, we have to do a flow_dissector2?
> 
> Note that I originally had separate dissection in cls_flower, you
> suggested to use the existing flow_dissector. And I still believe it was
> the right way to do it.
> 
> I think that better is to make existing flow dissector more modular.
> I'll look into this.

Making the flow dissector more modular makes some sense to me.
It seems that it should be a way to address sharing common code while
allowing more flexibility for users, such as flower, that need it.

Elsewhere in this thread there has been some discussion of BPF. While I
also think that makes sense I think that for in-kernel users it makes
more sense to allow leveraging of the flow dissector using C code.

Such an approach well allow some of the complexity that was recently added
to the flow dissector to move into more peripheral code. Which may or may
not be a win as I think was discussed earlier in this thread.

Jiri Pirko Feb. 21, 2017, 2:31 p.m. UTC | #20

Thu, Feb 02, 2017 at 07:36:31PM CET, tom@herbertland.com wrote:
>On Thu, Feb 2, 2017 at 9:48 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Thu, Feb 02, 2017 at 06:24:40PM CET, tom@herbertland.com wrote:
>>>On Thu, Feb 2, 2017 at 7:58 AM, Simon Horman <simon.horman@netronome.com> wrote:
>>>> [Repost due to gmail account problem]
>>>>
>>>> On Thu, Feb 02, 2017 at 04:31:33AM -0800, Eric Dumazet wrote:
>>>>> On Thu, 2017-02-02 at 11:37 +0100, Simon Horman wrote:
>>>>> > Allow dissection of Neighbour Discovery target IP, and source and
>>>>> > destination link-layer addresses for neighbour solicitation and
>>>>> > advertisement messages.
>>>>> >
>>>>> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
>>>>> > ---
>>>>>
>>>>> Hi Simon
>>>>>
>>>>> Why is this needed ?
>>>>>
>>>>> Any code added in flow dissector needs to be extra careful,
>>>>> we had various packet of deaths errors recently in this area.
>>>>
>>>> Hi Eric,
>>>>
>>>> there some activity to allow programming of OvS flows in hardware via TC
>>>> with the flower classifier. As the ND fields in this patch are part of the
>>>> OvS flow key I would like them considered for additions to flower and thus
>>>> the dissector to allow compatibility with OvS.
>>>>
>>>Given that ARP is already there it seems only "fair" to have ND also.
>>>But Eric is correct, this is quite a sensitive area of code.
>>>
>>>> I apologise if any 'deaths' have resulted from my recent work on the
>>>> dissector. I am of course very open to ideas on how to avoid any future
>>>> incidents.
>>>
>>>That's a tough problem. flow_dissector started off as simple mechanism
>>>to just identify actual flows (really just TCP and UDP packets) for
>>>the purposes of packet steering. But given the benefits of its
>>>location low in the stack and the open ended capabilities for parsing
>>>it seems to have mushroomed into a general catchall to parse a whole
>>>bunch of different protocols. A lot of these go beyond simply
>>>identifying flows (ICMP parsing, ARP, or ND as in your patches). These
>>>new use cases may be valid, but the result is a convoluted function (>
>>>500 LOC by my count) and it seems to be quite easy to have subtle bugs
>>>mostly in edge cases, several of which could have been exploited in
>>>DDOS attacks.
>>
>> Agreed that we probably came to a point when we need to split
>> __skb_flow_dissect into modular and pluggable pieces. Will not be
>> trivial though.
>>
>> Also note that it depends on the __skb_flow_dissect user which code is
>> actually used or not. For the critical path, that keys are defined by:
>> flow_keys_dissector_keys
>>
>True, but the code doesn't separate out the critical path from all
>these extended features which is resulted in a jumbled mess with no
>modularity to speak of :-(

I'm looking at this right now. It is really not possible to split this
in some nice and efficient way. So what I did (patchset in reply to this
email) is I pushed the bits that are not needed for the default hash
dissection out to sub-functions. Also, the code is executed only when
the flow dissector user needs it - that is not the case of the default
hash dissection, so that should satisfy your concerns.

Every future dissection feature addition will be done like this. So
Simon can do it like that for ND and should be safe.


>
>> Most of the code Simon is adding is noop for non-flower usecase if:
>> dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) == false
>>
>Sure, but that just makes this code corner cases which means it's hard
>to maintain and harder to find bugs in the long run.

Sure, but you will never see the bugs in the default hash dissection.
And that is what you need. For cls_flower user, we'll experience it and
fix it. I see no problem.

Jiri Pirko Feb. 21, 2017, 3:28 p.m. UTC | #21

Thu, Feb 02, 2017 at 11:37:34AM CET, simon.horman@netronome.com wrote:
>Allow dissection of Neighbour Discovery target IP, and source and
>destination link-layer addresses for neighbour solicitation and
>advertisement messages.
>
>Signed-off-by: Simon Horman <simon.horman@netronome.com>
>---

[...]

>@@ -633,6 +702,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> 						     FLOW_DISSECTOR_KEY_ICMP,
> 						     target_container);
> 		key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, hlen);
>+
>+		if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) &&
>+		    ip_proto == IPPROTO_IPV6 && key_icmp->code == 0 &&

IPPROTO_IPV6 say "IPv6-in-IPv4 tunnelling". Please use "NEXTHDR_IPV6"
instead.


>+		    (key_icmp->type == NDISC_NEIGHBOUR_SOLICITATION ||
>+		     key_icmp->type == NDISC_NEIGHBOUR_ADVERTISEMENT)) {
>+			key_nd = skb_flow_dissector_target(flow_dissector,
>+							   FLOW_DISSECTOR_KEY_ND,
>+							   target_container);
>+			if (!(skb_flow_dissect_nd(skb, key_nd, data, nhoff,
>+						  hlen, ipv6_payload_len)))
>+				goto out_bad;
>+		}

You should put this under "switch (ip_proto) {"

Simon Horman March 10, 2017, 2:19 p.m. UTC | #22

On Tue, Feb 21, 2017 at 04:28:10PM +0100, Jiri Pirko wrote:
> Thu, Feb 02, 2017 at 11:37:34AM CET, simon.horman@netronome.com wrote:
> >Allow dissection of Neighbour Discovery target IP, and source and
> >destination link-layer addresses for neighbour solicitation and
> >advertisement messages.
> >
> >Signed-off-by: Simon Horman <simon.horman@netronome.com>
> >---
> 
> [...]
> 
> >@@ -633,6 +702,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> > 						     FLOW_DISSECTOR_KEY_ICMP,
> > 						     target_container);
> > 		key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, hlen);
> >+
> >+		if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) &&
> >+		    ip_proto == IPPROTO_IPV6 && key_icmp->code == 0 &&
> 
> IPPROTO_IPV6 say "IPv6-in-IPv4 tunnelling". Please use "NEXTHDR_IPV6"
> instead.

Thanks, will do.

> >+		    (key_icmp->type == NDISC_NEIGHBOUR_SOLICITATION ||
> >+		     key_icmp->type == NDISC_NEIGHBOUR_ADVERTISEMENT)) {
> >+			key_nd = skb_flow_dissector_target(flow_dissector,
> >+							   FLOW_DISSECTOR_KEY_ND,
> >+							   target_container);
> >+			if (!(skb_flow_dissect_nd(skb, key_nd, data, nhoff,
> >+						  hlen, ipv6_payload_len)))
> >+				goto out_bad;
> >+		}
> 
> You should put this under "switch (ip_proto) {"

I see that makes sense in terms of the check against ip_proto.
But I added it here to allow checking against key_icmp->code
and key_icmp->type as well.

Jiri Pirko March 10, 2017, 2:27 p.m. UTC | #23

Fri, Mar 10, 2017 at 03:19:13PM CET, simon.horman@netronome.com wrote:
>On Tue, Feb 21, 2017 at 04:28:10PM +0100, Jiri Pirko wrote:
>> Thu, Feb 02, 2017 at 11:37:34AM CET, simon.horman@netronome.com wrote:
>> >Allow dissection of Neighbour Discovery target IP, and source and
>> >destination link-layer addresses for neighbour solicitation and
>> >advertisement messages.
>> >
>> >Signed-off-by: Simon Horman <simon.horman@netronome.com>
>> >---
>> 
>> [...]
>> 
>> >@@ -633,6 +702,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>> > 						     FLOW_DISSECTOR_KEY_ICMP,
>> > 						     target_container);
>> > 		key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, hlen);
>> >+
>> >+		if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) &&
>> >+		    ip_proto == IPPROTO_IPV6 && key_icmp->code == 0 &&
>> 
>> IPPROTO_IPV6 say "IPv6-in-IPv4 tunnelling". Please use "NEXTHDR_IPV6"
>> instead.
>
>Thanks, will do.
>
>> >+		    (key_icmp->type == NDISC_NEIGHBOUR_SOLICITATION ||
>> >+		     key_icmp->type == NDISC_NEIGHBOUR_ADVERTISEMENT)) {
>> >+			key_nd = skb_flow_dissector_target(flow_dissector,
>> >+							   FLOW_DISSECTOR_KEY_ND,
>> >+							   target_container);
>> >+			if (!(skb_flow_dissect_nd(skb, key_nd, data, nhoff,
>> >+						  hlen, ipv6_payload_len)))
>> >+				goto out_bad;
>> >+		}
>> 
>> You should put this under "switch (ip_proto) {"
>
>I see that makes sense in terms of the check against ip_proto.
>But I added it here to allow checking against key_icmp->code
>and key_icmp->type as well.

Sure. Just add under "switch (ip_proto) {" and call a wrapper nd
function from there. In that function, you check dissector_uses_key and
other needed things.

Simon Horman March 10, 2017, 3:20 p.m. UTC | #24

On Fri, Mar 10, 2017 at 03:27:32PM +0100, Jiri Pirko wrote:
> Fri, Mar 10, 2017 at 03:19:13PM CET, simon.horman@netronome.com wrote:
> >On Tue, Feb 21, 2017 at 04:28:10PM +0100, Jiri Pirko wrote:
> >> Thu, Feb 02, 2017 at 11:37:34AM CET, simon.horman@netronome.com wrote:
> >> >Allow dissection of Neighbour Discovery target IP, and source and
> >> >destination link-layer addresses for neighbour solicitation and
> >> >advertisement messages.
> >> >
> >> >Signed-off-by: Simon Horman <simon.horman@netronome.com>
> >> >---
> >> 
> >> [...]
> >> 
> >> >@@ -633,6 +702,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> >> > 						     FLOW_DISSECTOR_KEY_ICMP,
> >> > 						     target_container);
> >> > 		key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, hlen);
> >> >+
> >> >+		if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) &&
> >> >+		    ip_proto == IPPROTO_IPV6 && key_icmp->code == 0 &&
> >> 
> >> IPPROTO_IPV6 say "IPv6-in-IPv4 tunnelling". Please use "NEXTHDR_IPV6"
> >> instead.
> >
> >Thanks, will do.
> >
> >> >+		    (key_icmp->type == NDISC_NEIGHBOUR_SOLICITATION ||
> >> >+		     key_icmp->type == NDISC_NEIGHBOUR_ADVERTISEMENT)) {
> >> >+			key_nd = skb_flow_dissector_target(flow_dissector,
> >> >+							   FLOW_DISSECTOR_KEY_ND,
> >> >+							   target_container);
> >> >+			if (!(skb_flow_dissect_nd(skb, key_nd, data, nhoff,
> >> >+						  hlen, ipv6_payload_len)))
> >> >+				goto out_bad;
> >> >+		}
> >> 
> >> You should put this under "switch (ip_proto) {"
> >
> >I see that makes sense in terms of the check against ip_proto.
> >But I added it here to allow checking against key_icmp->code
> >and key_icmp->type as well.
> 
> Sure. Just add under "switch (ip_proto) {" and call a wrapper nd
> function from there. In that function, you check dissector_uses_key and
> other needed things.

Sorry, but I'm still a little unclear on how that interacts with
the dissection of key_icmp.

Jiri Pirko March 10, 2017, 3:26 p.m. UTC | #25

Fri, Mar 10, 2017 at 04:20:21PM CET, simon.horman@netronome.com wrote:
>On Fri, Mar 10, 2017 at 03:27:32PM +0100, Jiri Pirko wrote:
>> Fri, Mar 10, 2017 at 03:19:13PM CET, simon.horman@netronome.com wrote:
>> >On Tue, Feb 21, 2017 at 04:28:10PM +0100, Jiri Pirko wrote:
>> >> Thu, Feb 02, 2017 at 11:37:34AM CET, simon.horman@netronome.com wrote:
>> >> >Allow dissection of Neighbour Discovery target IP, and source and
>> >> >destination link-layer addresses for neighbour solicitation and
>> >> >advertisement messages.
>> >> >
>> >> >Signed-off-by: Simon Horman <simon.horman@netronome.com>
>> >> >---
>> >> 
>> >> [...]
>> >> 
>> >> >@@ -633,6 +702,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>> >> > 						     FLOW_DISSECTOR_KEY_ICMP,
>> >> > 						     target_container);
>> >> > 		key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, hlen);
>> >> >+
>> >> >+		if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) &&
>> >> >+		    ip_proto == IPPROTO_IPV6 && key_icmp->code == 0 &&
>> >> 
>> >> IPPROTO_IPV6 say "IPv6-in-IPv4 tunnelling". Please use "NEXTHDR_IPV6"
>> >> instead.
>> >
>> >Thanks, will do.
>> >
>> >> >+		    (key_icmp->type == NDISC_NEIGHBOUR_SOLICITATION ||
>> >> >+		     key_icmp->type == NDISC_NEIGHBOUR_ADVERTISEMENT)) {
>> >> >+			key_nd = skb_flow_dissector_target(flow_dissector,
>> >> >+							   FLOW_DISSECTOR_KEY_ND,
>> >> >+							   target_container);
>> >> >+			if (!(skb_flow_dissect_nd(skb, key_nd, data, nhoff,
>> >> >+						  hlen, ipv6_payload_len)))
>> >> >+				goto out_bad;
>> >> >+		}
>> >> 
>> >> You should put this under "switch (ip_proto) {"
>> >
>> >I see that makes sense in terms of the check against ip_proto.
>> >But I added it here to allow checking against key_icmp->code
>> >and key_icmp->type as well.
>> 
>> Sure. Just add under "switch (ip_proto) {" and call a wrapper nd
>> function from there. In that function, you check dissector_uses_key and
>> other needed things.
>
>Sorry, but I'm still a little unclear on how that interacts with
>the dissection of key_icmp.

you do:
if (key_icmp->code != 0)
	return

Inside that function. Or something like that. Up to you. Just look at
__skb_flow_dissect_arp for example. First it checks dissector_uses_key,
then it does other checks.

Simon Horman March 13, 2017, 1:50 p.m. UTC | #26

On Fri, Mar 10, 2017 at 03:27:32PM +0100, Jiri Pirko wrote:
> Fri, Mar 10, 2017 at 03:19:13PM CET, simon.horman@netronome.com wrote:
> >On Tue, Feb 21, 2017 at 04:28:10PM +0100, Jiri Pirko wrote:
> >> Thu, Feb 02, 2017 at 11:37:34AM CET, simon.horman@netronome.com wrote:
> >> >Allow dissection of Neighbour Discovery target IP, and source and
> >> >destination link-layer addresses for neighbour solicitation and
> >> >advertisement messages.
> >> >
> >> >Signed-off-by: Simon Horman <simon.horman@netronome.com>
> >> >---
> >> 
> >> [...]
> >> 
> >> >@@ -633,6 +702,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> >> > 						     FLOW_DISSECTOR_KEY_ICMP,
> >> > 						     target_container);
> >> > 		key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, hlen);
> >> >+
> >> >+		if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) &&
> >> >+		    ip_proto == IPPROTO_IPV6 && key_icmp->code == 0 &&
> >> 
> >> IPPROTO_IPV6 say "IPv6-in-IPv4 tunnelling". Please use "NEXTHDR_IPV6"
> >> instead.
> >
> >Thanks, will do.
> >
> >> >+		    (key_icmp->type == NDISC_NEIGHBOUR_SOLICITATION ||
> >> >+		     key_icmp->type == NDISC_NEIGHBOUR_ADVERTISEMENT)) {
> >> >+			key_nd = skb_flow_dissector_target(flow_dissector,
> >> >+							   FLOW_DISSECTOR_KEY_ND,
> >> >+							   target_container);
> >> >+			if (!(skb_flow_dissect_nd(skb, key_nd, data, nhoff,
> >> >+						  hlen, ipv6_payload_len)))
> >> >+				goto out_bad;
> >> >+		}
> >> 
> >> You should put this under "switch (ip_proto) {"
> >
> >I see that makes sense in terms of the check against ip_proto.
> >But I added it here to allow checking against key_icmp->code
> >and key_icmp->type as well.
> 
> Sure. Just add under "switch (ip_proto) {" and call a wrapper nd
> function from there. In that function, you check dissector_uses_key and
> other needed things.

Hi Jiri,

I'm sorry but I'm having a bit of trouble understanding how best to
structure the extraction of ICMP and ND.

The way I see things is this:

* ICMP extraction may occur for IPv4 or IPv6 although currently neither
  IPv4 nor IPv6 is a condition of ICMP extraction.
* ND extraction may only occur for IPv6
* ND extraction may only occur for certain ICMP code/type values;
  thus ICMP extraction should occur before ND extraction.

I wonder if a good alternative to the approach I took above in my patch
would  be to provide:
* ICMP extraction conditional on IPv4 and;
* ICMP extraction conditional on IPv6 followed by
  ND extraction conditional on ICMP type and code

Jiri Pirko March 13, 2017, 1:53 p.m. UTC | #27

Mon, Mar 13, 2017 at 02:50:08PM CET, simon.horman@netronome.com wrote:
>On Fri, Mar 10, 2017 at 03:27:32PM +0100, Jiri Pirko wrote:
>> Fri, Mar 10, 2017 at 03:19:13PM CET, simon.horman@netronome.com wrote:
>> >On Tue, Feb 21, 2017 at 04:28:10PM +0100, Jiri Pirko wrote:
>> >> Thu, Feb 02, 2017 at 11:37:34AM CET, simon.horman@netronome.com wrote:
>> >> >Allow dissection of Neighbour Discovery target IP, and source and
>> >> >destination link-layer addresses for neighbour solicitation and
>> >> >advertisement messages.
>> >> >
>> >> >Signed-off-by: Simon Horman <simon.horman@netronome.com>
>> >> >---
>> >> 
>> >> [...]
>> >> 
>> >> >@@ -633,6 +702,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>> >> > 						     FLOW_DISSECTOR_KEY_ICMP,
>> >> > 						     target_container);
>> >> > 		key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, hlen);
>> >> >+
>> >> >+		if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ND) &&
>> >> >+		    ip_proto == IPPROTO_IPV6 && key_icmp->code == 0 &&
>> >> 
>> >> IPPROTO_IPV6 say "IPv6-in-IPv4 tunnelling". Please use "NEXTHDR_IPV6"
>> >> instead.
>> >
>> >Thanks, will do.
>> >
>> >> >+		    (key_icmp->type == NDISC_NEIGHBOUR_SOLICITATION ||
>> >> >+		     key_icmp->type == NDISC_NEIGHBOUR_ADVERTISEMENT)) {
>> >> >+			key_nd = skb_flow_dissector_target(flow_dissector,
>> >> >+							   FLOW_DISSECTOR_KEY_ND,
>> >> >+							   target_container);
>> >> >+			if (!(skb_flow_dissect_nd(skb, key_nd, data, nhoff,
>> >> >+						  hlen, ipv6_payload_len)))
>> >> >+				goto out_bad;
>> >> >+		}
>> >> 
>> >> You should put this under "switch (ip_proto) {"
>> >
>> >I see that makes sense in terms of the check against ip_proto.
>> >But I added it here to allow checking against key_icmp->code
>> >and key_icmp->type as well.
>> 
>> Sure. Just add under "switch (ip_proto) {" and call a wrapper nd
>> function from there. In that function, you check dissector_uses_key and
>> other needed things.
>
>Hi Jiri,
>
>I'm sorry but I'm having a bit of trouble understanding how best to
>structure the extraction of ICMP and ND.
>
>The way I see things is this:
>
>* ICMP extraction may occur for IPv4 or IPv6 although currently neither
>  IPv4 nor IPv6 is a condition of ICMP extraction.
>* ND extraction may only occur for IPv6
>* ND extraction may only occur for certain ICMP code/type values;
>  thus ICMP extraction should occur before ND extraction.
>
>I wonder if a good alternative to the approach I took above in my patch
>would  be to provide:
>* ICMP extraction conditional on IPv4 and;
>* ICMP extraction conditional on IPv6 followed by
>  ND extraction conditional on ICMP type and code

Makes sense to me.

[PATCH/RFC,net-next,1/2] flow dissector: ND support

Commit Message

Comments

Patch