diff mbox series

[net-next] net: openvswitch: Add support to lookup invalid packet in ct action.

Message ID 20201006083355.121018-1-nusiddiq@redhat.com
State Changes Requested
Delegated to: David Miller
Headers show
Series [net-next] net: openvswitch: Add support to lookup invalid packet in ct action. | expand

Commit Message

Numan Siddique Oct. 6, 2020, 8:33 a.m. UTC
From: Numan Siddique <nusiddiq@redhat.com>

For a tcp packet which is part of an existing committed connection,
nf_conntrack_in() will return err and set skb->_nfct to NULL if it is
out of tcp window. ct action for this packet will set the ct_state
to +inv which is as expected.

But a controller cannot add an OVS flow as

table=21,priority=100,ct_state=+inv, actions=drop

to drop such packets. That is because when ct action is executed on other
packets which are not part of existing committed connections, ct_state
can be set to invalid. Few such cases are:
   - ICMP reply packets.
   - TCP SYN/ACK packets during connection establishment.
   - SCTP INIT ACK, COOKIE ACK, DATA and DATA ACK packets.

To distinguish between an invalid packet part of committed connection
and others, this patch introduces as a new ct attribute
OVS_CT_ATTR_LOOKUP_INV. If this is set in the ct action (without commit),
it tries to find the ct entry and if present, sets the ct_state to
+inv,+trk and also sets the mark and labels associated with the
connection.

With this,  a controller can add flows like

....
....
table=20,ip, action=ct(table=21, lookup_invalid)
table=21,priority=100,ct_state=+inv+trk,ct_label=0x2/0x2 actions=drop
table=21,ip, actions=resubmit(,22)
....
....

CC: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
---

RFC -> PATCH
------
  * Changed the patch from RFC to a formal one. No other changes.

 include/uapi/linux/openvswitch.h |  4 +++
 net/openvswitch/conntrack.c      | 47 ++++++++++++++++++++++++--------
 2 files changed, 40 insertions(+), 11 deletions(-)

Comments

Florian Westphal Oct. 6, 2020, 11:16 a.m. UTC | #1
nusiddiq@redhat.com <nusiddiq@redhat.com> wrote:
> From: Numan Siddique <nusiddiq@redhat.com>
> 
> For a tcp packet which is part of an existing committed connection,
> nf_conntrack_in() will return err and set skb->_nfct to NULL if it is
> out of tcp window. ct action for this packet will set the ct_state
> to +inv which is as expected.

This is because from conntrack p.o.v., such TCP packet is NOT part of
the existing connection.

For example, because it is considered part of a previous incarnation
of the same connection.

> But a controller cannot add an OVS flow as
> 
> table=21,priority=100,ct_state=+inv, actions=drop
> 
> to drop such packets. That is because when ct action is executed on other
> packets which are not part of existing committed connections, ct_state
> can be set to invalid. Few such cases are:
>    - ICMP reply packets.

Can you elaborate? Echo reply should not be invalid. Conntrack should
mark it as established (unless such echo reply came out of the blue).

>    - TCP SYN/ACK packets during connection establishment.

SYN/ACK should also be established state.
INVALID should only be matched for packets that were never seen
by conntrack, or that are deemed out of date / corrupted.

> To distinguish between an invalid packet part of committed connection
> and others, this patch introduces as a new ct attribute
> OVS_CT_ATTR_LOOKUP_INV. If this is set in the ct action (without commit),
> it tries to find the ct entry and if present, sets the ct_state to
> +inv,+trk and also sets the mark and labels associated with the
> connection.
> 
> With this,  a controller can add flows like
> 
> ....
> ....
> table=20,ip, action=ct(table=21, lookup_invalid)
> table=21,priority=100,ct_state=+inv+trk,ct_label=0x2/0x2 actions=drop
> table=21,ip, actions=resubmit(,22)
> ....
> ....

What exactly is the feature/problem that needs to be solved?
I suspect this would help me to provide better feedback than the
semi-random comments below .... :-)

My only problem with how conntrack does things ATM is that the ruleset
cannot distinguish:

1. packet was not even seen by conntrack
2. packet matches existing connection, but is "bad", for example:
  - contradicting tcp flags
  - out of window
  - invalid checksum

There are a few sysctls to modify default behaviour, e.g. relax window
checks, or ignore/skip checksum validation.

The other problem i see (solveable for sure by yet-another-sysctl but i
see that as last-resort) is usual compatibility problem:

ct state invalid drop
ct mark gt 0 accept

If standard netfilter conntrack were to set skb->_nfct e.g. even if
state is invalid, we could still make the above work via some internal
flag.

But if you reverse it, you get different behaviour:

ct mark gt 0 accept
ct state invalid drop

First rule might now accept out-of-window packet even when "be_liberal"
sysctl is off.
Numan Siddique Oct. 6, 2020, 12:19 p.m. UTC | #2
On Tue, Oct 6, 2020 at 4:46 PM Florian Westphal <fw@strlen.de> wrote:
>
> nusiddiq@redhat.com <nusiddiq@redhat.com> wrote:
> > From: Numan Siddique <nusiddiq@redhat.com>
> >
> > For a tcp packet which is part of an existing committed connection,
> > nf_conntrack_in() will return err and set skb->_nfct to NULL if it is
> > out of tcp window. ct action for this packet will set the ct_state
> > to +inv which is as expected.
>
> This is because from conntrack p.o.v., such TCP packet is NOT part of
> the existing connection.
>
> For example, because it is considered part of a previous incarnation
> of the same connection.
>
> > But a controller cannot add an OVS flow as
> >
> > table=21,priority=100,ct_state=+inv, actions=drop
> >
> > to drop such packets. That is because when ct action is executed on other
> > packets which are not part of existing committed connections, ct_state
> > can be set to invalid. Few such cases are:
> >    - ICMP reply packets.
>
> Can you elaborate? Echo reply should not be invalid. Conntrack should
> mark it as established (unless such echo reply came out of the blue).

Hi Florian,

Thanks for providing the comments.

Sorry for not being very clear.

Let me brief about the present problem we see in OVN (which is a
controller using ovs)

When a VM/container sends a packet (in the ingress direction), we don't send all
the packets to conntrack. If a packet is destined to an OVN load
balancer virtual ip,
only then we send the packet to conntrack in the ingress direction and
then we do dnat
to the backend.

Eg. in the ingress direction

table=1, match = (ip && ip4.dst == VIP) action = ct(table=2)
tablle=2, ct_state=+new+trk && ip4.dst == VIP, action = ct(commit,
nat=BACKEND_IP)
...
..

However for the egress direction (when the packet is to be delivered
to the VM/container),
we send all the packets to conntrack and if the ct.est is set, we do
undnat before delivering
the packet to the VM/container.
...
table=40, match = ip, action = ct(table=41)
table=41, match = ct_state=+est+trk, action = ct(nat)
...

What I mean here is that, since we send all the packets in the egress
pipeline to conntrack,
we can't add a flow like - match = ct_state=+inv, action = drop.

i.e When a VM/container sends an ICMP request packet, it will not be
sent to conntrack, but
the reply ICMP will be sent to conntrack and it will be marked as invalid.

So is the case with TCP, the TCP SYN from the VM is not sent to
conntrack, but the SYN/ACK
from the server would be sent to conntrack and it will be marked as invalid.

>
> >    - TCP SYN/ACK packets during connection establishment.
>
> SYN/ACK should also be established state.
> INVALID should only be matched for packets that were never seen
> by conntrack, or that are deemed out of date / corrupted.
>
> > To distinguish between an invalid packet part of committed connection
> > and others, this patch introduces as a new ct attribute
> > OVS_CT_ATTR_LOOKUP_INV. If this is set in the ct action (without commit),
> > it tries to find the ct entry and if present, sets the ct_state to
> > +inv,+trk and also sets the mark and labels associated with the
> > connection.
> >
> > With this,  a controller can add flows like
> >
> > ....
> > ....
> > table=20,ip, action=ct(table=21, lookup_invalid)
> > table=21,priority=100,ct_state=+inv+trk,ct_label=0x2/0x2 actions=drop
> > table=21,ip, actions=resubmit(,22)
> > ....
> > ....
>
> What exactly is the feature/problem that needs to be solved?
> I suspect this would help me to provide better feedback than the
> semi-random comments below .... :-)
>
> My only problem with how conntrack does things ATM is that the ruleset
> cannot distinguish:
>
> 1. packet was not even seen by conntrack
> 2. packet matches existing connection, but is "bad", for example:
>   - contradicting tcp flags
>   - out of window
>   - invalid checksum

We want the below to be solved (using OVS flows) :
  - If the packet is marked as invalid due to (2) which you mentioned above,
    we would like to read the ct_mark and ct_label fields as the packet is
    part of existing connection, so that we can add an OVS flow like

ct_state=+inv+trk,ct_label=0x2 actions=drop

Right now it is not possible.

This patch does another lookup if skb->_nfct is NULL after
nf_conntrack_in() to check
if (2) is the case. If the lookup is successful, it updates the ct flow
key with the ct_mark and ct_label. This is made optional using a
netlink attribute.

I'm not sure if it's possible for nf_conntrack_in() to provide this
information for
its callers so that the caller can come to know that the state is
invalid because of (2).

I tested by setting 'be_liberal' sysctl flag and since skb->_nfct was
set for (2), OVS
datapath module set the ct_state to +est.

Thanks
Numan


>
> There are a few sysctls to modify default behaviour, e.g. relax window
> checks, or ignore/skip checksum validation.
>
> The other problem i see (solveable for sure by yet-another-sysctl but i
> see that as last-resort) is usual compatibility problem:
>
> ct state invalid drop
> ct mark gt 0 accept
>
> If standard netfilter conntrack were to set skb->_nfct e.g. even if
> state is invalid, we could still make the above work via some internal
> flag.
>
> But if you reverse it, you get different behaviour:
>
> ct mark gt 0 accept
> ct state invalid drop
>
> First rule might now accept out-of-window packet even when "be_liberal"
> sysctl is off.
>
Numan Siddique Oct. 6, 2020, 1:22 p.m. UTC | #3
On Tue, Oct 6, 2020 at 5:49 PM Numan Siddique <nusiddiq@redhat.com> wrote:
>
> On Tue, Oct 6, 2020 at 4:46 PM Florian Westphal <fw@strlen.de> wrote:
> >
> > nusiddiq@redhat.com <nusiddiq@redhat.com> wrote:
> > > From: Numan Siddique <nusiddiq@redhat.com>
> > >
> > > For a tcp packet which is part of an existing committed connection,
> > > nf_conntrack_in() will return err and set skb->_nfct to NULL if it is
> > > out of tcp window. ct action for this packet will set the ct_state
> > > to +inv which is as expected.
> >
> > This is because from conntrack p.o.v., such TCP packet is NOT part of
> > the existing connection.
> >
> > For example, because it is considered part of a previous incarnation
> > of the same connection.
> >
> > > But a controller cannot add an OVS flow as
> > >
> > > table=21,priority=100,ct_state=+inv, actions=drop
> > >
> > > to drop such packets. That is because when ct action is executed on other
> > > packets which are not part of existing committed connections, ct_state
> > > can be set to invalid. Few such cases are:
> > >    - ICMP reply packets.
> >
> > Can you elaborate? Echo reply should not be invalid. Conntrack should
> > mark it as established (unless such echo reply came out of the blue).
>
> Hi Florian,
>
> Thanks for providing the comments.
>
> Sorry for not being very clear.
>
> Let me brief about the present problem we see in OVN (which is a
> controller using ovs)
>
> When a VM/container sends a packet (in the ingress direction), we don't send all
> the packets to conntrack. If a packet is destined to an OVN load
> balancer virtual ip,
> only then we send the packet to conntrack in the ingress direction and
> then we do dnat
> to the backend.
>
> Eg. in the ingress direction
>
> table=1, match = (ip && ip4.dst == VIP) action = ct(table=2)
> tablle=2, ct_state=+new+trk && ip4.dst == VIP, action = ct(commit,
> nat=BACKEND_IP)
> ...
> ..
>
> However for the egress direction (when the packet is to be delivered
> to the VM/container),
> we send all the packets to conntrack and if the ct.est is set, we do
> undnat before delivering
> the packet to the VM/container.
> ...
> table=40, match = ip, action = ct(table=41)
> table=41, match = ct_state=+est+trk, action = ct(nat)
> ...
>
> What I mean here is that, since we send all the packets in the egress
> pipeline to conntrack,
> we can't add a flow like - match = ct_state=+inv, action = drop.
>
> i.e When a VM/container sends an ICMP request packet, it will not be
> sent to conntrack, but
> the reply ICMP will be sent to conntrack and it will be marked as invalid.
>
> So is the case with TCP, the TCP SYN from the VM is not sent to
> conntrack, but the SYN/ACK
> from the server would be sent to conntrack and it will be marked as invalid.
>
> >
> > >    - TCP SYN/ACK packets during connection establishment.
> >
> > SYN/ACK should also be established state.
> > INVALID should only be matched for packets that were never seen
> > by conntrack, or that are deemed out of date / corrupted.
> >
> > > To distinguish between an invalid packet part of committed connection
> > > and others, this patch introduces as a new ct attribute
> > > OVS_CT_ATTR_LOOKUP_INV. If this is set in the ct action (without commit),
> > > it tries to find the ct entry and if present, sets the ct_state to
> > > +inv,+trk and also sets the mark and labels associated with the
> > > connection.
> > >
> > > With this,  a controller can add flows like
> > >
> > > ....
> > > ....
> > > table=20,ip, action=ct(table=21, lookup_invalid)
> > > table=21,priority=100,ct_state=+inv+trk,ct_label=0x2/0x2 actions=drop
> > > table=21,ip, actions=resubmit(,22)
> > > ....
> > > ....
> >
> > What exactly is the feature/problem that needs to be solved?
> > I suspect this would help me to provide better feedback than the
> > semi-random comments below .... :-)
> >
> > My only problem with how conntrack does things ATM is that the ruleset
> > cannot distinguish:
> >
> > 1. packet was not even seen by conntrack
> > 2. packet matches existing connection, but is "bad", for example:
> >   - contradicting tcp flags
> >   - out of window
> >   - invalid checksum
>
> We want the below to be solved (using OVS flows) :
>   - If the packet is marked as invalid due to (2) which you mentioned above,
>     we would like to read the ct_mark and ct_label fields as the packet is
>     part of existing connection, so that we can add an OVS flow like
>
> ct_state=+inv+trk,ct_label=0x2 actions=drop
>
> Right now it is not possible.

I forgot to mention the side effect of it. Since the tcp out of window packet
is set as +inv, this packet is delivered to the VM/container without undnat
and because of this VM/container resets the connection.

Thanks
Numan



>
> This patch does another lookup if skb->_nfct is NULL after
> nf_conntrack_in() to check
> if (2) is the case. If the lookup is successful, it updates the ct flow
> key with the ct_mark and ct_label. This is made optional using a
> netlink attribute.
>
> I'm not sure if it's possible for nf_conntrack_in() to provide this
> information for
> its callers so that the caller can come to know that the state is
> invalid because of (2).
>
> I tested by setting 'be_liberal' sysctl flag and since skb->_nfct was
> set for (2), OVS
> datapath module set the ct_state to +est.
>
> Thanks
> Numan
>
>
> >
> > There are a few sysctls to modify default behaviour, e.g. relax window
> > checks, or ignore/skip checksum validation.
> >
> > The other problem i see (solveable for sure by yet-another-sysctl but i
> > see that as last-resort) is usual compatibility problem:
> >
> > ct state invalid drop
> > ct mark gt 0 accept
> >
> > If standard netfilter conntrack were to set skb->_nfct e.g. even if
> > state is invalid, we could still make the above work via some internal
> > flag.
> >
> > But if you reverse it, you get different behaviour:
> >
> > ct mark gt 0 accept
> > ct state invalid drop
> >
> > First rule might now accept out-of-window packet even when "be_liberal"
> > sysctl is off.
> >
Florian Westphal Oct. 6, 2020, 6:52 p.m. UTC | #4
Numan Siddique <nusiddiq@redhat.com> wrote:
> On Tue, Oct 6, 2020 at 4:46 PM Florian Westphal <fw@strlen.de> wrote:
> >
> > nusiddiq@redhat.com <nusiddiq@redhat.com> wrote:
> > > From: Numan Siddique <nusiddiq@redhat.com>
> > >
> > > For a tcp packet which is part of an existing committed connection,
> > > nf_conntrack_in() will return err and set skb->_nfct to NULL if it is
> > > out of tcp window. ct action for this packet will set the ct_state
> > > to +inv which is as expected.
> >
> > This is because from conntrack p.o.v., such TCP packet is NOT part of
> > the existing connection.
> >
> > For example, because it is considered part of a previous incarnation
> > of the same connection.
> >
> > > But a controller cannot add an OVS flow as
> > >
> > > table=21,priority=100,ct_state=+inv, actions=drop
> > >
> > > to drop such packets. That is because when ct action is executed on other
> > > packets which are not part of existing committed connections, ct_state
> > > can be set to invalid. Few such cases are:
> > >    - ICMP reply packets.
> >
> > Can you elaborate? Echo reply should not be invalid. Conntrack should
> > mark it as established (unless such echo reply came out of the blue).
> 
> Hi Florian,
> 
> Thanks for providing the comments.
> 
> Sorry for not being very clear.
> 
> Let me brief about the present problem we see in OVN (which is a
> controller using ovs)
> 
> When a VM/container sends a packet (in the ingress direction), we don't send all
> the packets to conntrack. If a packet is destined to an OVN load
> balancer virtual ip,
> only then we send the packet to conntrack in the ingress direction and
> then we do dnat
> to the backend.

Ah, okay.  That explains the INVALID then, but in that case I think this
patch/direction is less desirable from conntrack point of view.

More below.

> table=1, match = (ip && ip4.dst == VIP) action = ct(table=2)
> tablle=2, ct_state=+new+trk && ip4.dst == VIP, action = ct(commit,
> nat=BACKEND_IP)
> ...
> ..
> 
> However for the egress direction (when the packet is to be delivered
> to the VM/container),
> we send all the packets to conntrack and if the ct.est is set, we do
> undnat before delivering
> the packet to the VM/container.
> ...
> table=40, match = ip, action = ct(table=41)
> table=41, match = ct_state=+est+trk, action = ct(nat)
> ...
> 
> What I mean here is that, since we send all the packets in the egress
> pipeline to conntrack,
> we can't add a flow like - match = ct_state=+inv, action = drop.

Basically this is like a normal routing/netfitler (no ovs), where
the machine in question only sees unidirectional traffic (reply packets
taking different route).

> i.e When a VM/container sends an ICMP request packet, it will not be
> sent to conntrack, but
> the reply ICMP will be sent to conntrack and it will be marked as invalid.

Yes.  For plain netfilter, the solution would be to accept INVALID icmp
replies in the iptables/nftables ruleset.

> So is the case with TCP, the TCP SYN from the VM is not sent to
> conntrack, but the SYN/ACK
> from the server would be sent to conntrack and it will be marked as invalid.

Right.

> > 1. packet was not even seen by conntrack
> > 2. packet matches existing connection, but is "bad", for example:
> >   - contradicting tcp flags
> >   - out of window
> >   - invalid checksum
> 
> We want the below to be solved (using OVS flows) :
>   - If the packet is marked as invalid due to (2) which you mentioned above,
>     we would like to read the ct_mark and ct_label fields as the packet is
>     part of existing connection, so that we can add an OVS flow like
>
> ct_state=+inv+trk,ct_label=0x2 actions=drop

Wouldn't it make more sense to let tcp conntrack skip the in-window
check?  AFAIU thats the only problem and its identical to other cases
that we have at the moment, for example, conntrack supports mid-stream
pickup (on by default) which also disables tcp window checks.

> I tested by setting 'be_liberal' sysctl flag and since skb->_nfct was
> set for (2), OVS
> datapath module set the ct_state to +est.

Yes.  This flag can be set programatically on a per-ct basis.

See nft_flow_offload_eval() for example (BE_LIBERAL).
Given we now have multiple places that set this I think it would make
sense to add a helper, say, e.g.

void nf_ct_tcp_be_liberal(struct nf_conn *ct);
?

(Or perhaps nf_ct_tcp_disable_window_checks() , might be more
 clear/descriptive as to what this is doing).

Do you think that this resolves the OVN problem?
Numan Siddique Oct. 6, 2020, 8:02 p.m. UTC | #5
On Wed, Oct 7, 2020 at 12:22 AM Florian Westphal <fw@strlen.de> wrote:
>
> Numan Siddique <nusiddiq@redhat.com> wrote:
> > On Tue, Oct 6, 2020 at 4:46 PM Florian Westphal <fw@strlen.de> wrote:
> > >
> > > nusiddiq@redhat.com <nusiddiq@redhat.com> wrote:
> > > > From: Numan Siddique <nusiddiq@redhat.com>
> > > >
> > > > For a tcp packet which is part of an existing committed connection,
> > > > nf_conntrack_in() will return err and set skb->_nfct to NULL if it is
> > > > out of tcp window. ct action for this packet will set the ct_state
> > > > to +inv which is as expected.
> > >
> > > This is because from conntrack p.o.v., such TCP packet is NOT part of
> > > the existing connection.
> > >
> > > For example, because it is considered part of a previous incarnation
> > > of the same connection.
> > >
> > > > But a controller cannot add an OVS flow as
> > > >
> > > > table=21,priority=100,ct_state=+inv, actions=drop
> > > >
> > > > to drop such packets. That is because when ct action is executed on other
> > > > packets which are not part of existing committed connections, ct_state
> > > > can be set to invalid. Few such cases are:
> > > >    - ICMP reply packets.
> > >
> > > Can you elaborate? Echo reply should not be invalid. Conntrack should
> > > mark it as established (unless such echo reply came out of the blue).
> >
> > Hi Florian,
> >
> > Thanks for providing the comments.
> >
> > Sorry for not being very clear.
> >
> > Let me brief about the present problem we see in OVN (which is a
> > controller using ovs)
> >
> > When a VM/container sends a packet (in the ingress direction), we don't send all
> > the packets to conntrack. If a packet is destined to an OVN load
> > balancer virtual ip,
> > only then we send the packet to conntrack in the ingress direction and
> > then we do dnat
> > to the backend.
>
> Ah, okay.  That explains the INVALID then, but in that case I think this
> patch/direction is less desirable from conntrack point of view.
>
> More below.
>
> > table=1, match = (ip && ip4.dst == VIP) action = ct(table=2)
> > tablle=2, ct_state=+new+trk && ip4.dst == VIP, action = ct(commit,
> > nat=BACKEND_IP)
> > ...
> > ..
> >
> > However for the egress direction (when the packet is to be delivered
> > to the VM/container),
> > we send all the packets to conntrack and if the ct.est is set, we do
> > undnat before delivering
> > the packet to the VM/container.
> > ...
> > table=40, match = ip, action = ct(table=41)
> > table=41, match = ct_state=+est+trk, action = ct(nat)
> > ...
> >
> > What I mean here is that, since we send all the packets in the egress
> > pipeline to conntrack,
> > we can't add a flow like - match = ct_state=+inv, action = drop.
>
> Basically this is like a normal routing/netfitler (no ovs), where
> the machine in question only sees unidirectional traffic (reply packets
> taking different route).
>
> > i.e When a VM/container sends an ICMP request packet, it will not be
> > sent to conntrack, but
> > the reply ICMP will be sent to conntrack and it will be marked as invalid.
>
> Yes.  For plain netfilter, the solution would be to accept INVALID icmp
> replies in the iptables/nftables ruleset.
>
> > So is the case with TCP, the TCP SYN from the VM is not sent to
> > conntrack, but the SYN/ACK
> > from the server would be sent to conntrack and it will be marked as invalid.
>
> Right.
>
> > > 1. packet was not even seen by conntrack
> > > 2. packet matches existing connection, but is "bad", for example:
> > >   - contradicting tcp flags
> > >   - out of window
> > >   - invalid checksum
> >
> > We want the below to be solved (using OVS flows) :
> >   - If the packet is marked as invalid due to (2) which you mentioned above,
> >     we would like to read the ct_mark and ct_label fields as the packet is
> >     part of existing connection, so that we can add an OVS flow like
> >
> > ct_state=+inv+trk,ct_label=0x2 actions=drop
>
> Wouldn't it make more sense to let tcp conntrack skip the in-window
> check?  AFAIU thats the only problem and its identical to other cases
> that we have at the moment, for example, conntrack supports mid-stream
> pickup (on by default) which also disables tcp window checks.
>
> > I tested by setting 'be_liberal' sysctl flag and since skb->_nfct was
> > set for (2), OVS
> > datapath module set the ct_state to +est.
>
> Yes.  This flag can be set programatically on a per-ct basis.
>
> See nft_flow_offload_eval() for example (BE_LIBERAL).
> Given we now have multiple places that set this I think it would make
> sense to add a helper, say, e.g.
>
> void nf_ct_tcp_be_liberal(struct nf_conn *ct);
> ?
>
> (Or perhaps nf_ct_tcp_disable_window_checks() , might be more
>  clear/descriptive as to what this is doing).
>
> Do you think that this resolves the OVN problem?

Thanks for the comments. I think this should resolve the problem for OVN.

Numan

>
diff mbox series

Patch

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 8300cc29dec8..db942986c5b7 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -768,6 +768,9 @@  struct ovs_action_hash {
  * respectively.  Remaining bits control the changes for which an event is
  * delivered on the NFNLGRP_CONNTRACK_UPDATE group.
  * @OVS_CT_ATTR_TIMEOUT: Variable length string defining conntrack timeout.
+ * @OVS_CT_ATTR_LOOKUP_INV: If present, looks up and sets the state, mark and
+ * labels for an invalid packet (eg. out of tcp window) if it is part of
+ * committed connection.
  */
 enum ovs_ct_attr {
 	OVS_CT_ATTR_UNSPEC,
@@ -782,6 +785,7 @@  enum ovs_ct_attr {
 	OVS_CT_ATTR_EVENTMASK,  /* u32 mask of IPCT_* events. */
 	OVS_CT_ATTR_TIMEOUT,	/* Associate timeout with this connection for
 				 * fine-grain timeout tuning. */
+	OVS_CT_ATTR_LOOKUP_INV, /* No argument. */
 	__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index e6fe26a9c892..a6f96d9b4452 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -62,6 +62,7 @@  struct ovs_conntrack_info {
 	u8 nat : 3;                 /* enum ovs_ct_nat */
 	u8 force : 1;
 	u8 have_eventmask : 1;
+	u8 lookup_invalid : 1;
 	u16 family;
 	u32 eventmask;              /* Mask of 1 << IPCT_*. */
 	struct md_mark mark;
@@ -601,12 +602,13 @@  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
  *
  * Must be called with rcu_read_lock.
  *
- * On success, populates skb->_nfct and returns the connection.  Returns NULL
- * if there is no existing entry.
+ * On success, populates skb->_nfct if 'skb_set_ct' is true and returns the
+ * connection.  Returns NULL if there is no existing entry.
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
-		     u8 l3num, struct sk_buff *skb, bool natted)
+		     u8 l3num, struct sk_buff *skb, bool natted,
+		     bool skb_set_ct)
 {
 	struct nf_conntrack_tuple tuple;
 	struct nf_conntrack_tuple_hash *h;
@@ -636,14 +638,17 @@  ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 
 	ct = nf_ct_tuplehash_to_ctrack(h);
 
-	/* Inverted packet tuple matches the reverse direction conntrack tuple,
-	 * select the other tuplehash to get the right 'ctinfo' bits for this
-	 * packet.
-	 */
-	if (natted)
-		h = &ct->tuplehash[!h->tuple.dst.dir];
+	if (skb_set_ct) {
+		/* Inverted packet tuple matches the reverse direction
+		 * conntrack tuple, select the other tuplehash to get the
+		 * right 'ctinfo' bits for this packet.
+		 */
+		if (natted)
+			h = &ct->tuplehash[!h->tuple.dst.dir];
+
+		nf_ct_set(skb, ct, ovs_ct_get_info(h));
+	}
 
-	nf_ct_set(skb, ct, ovs_ct_get_info(h));
 	return ct;
 }
 
@@ -669,7 +674,7 @@  struct nf_conn *ovs_ct_executed(struct net *net,
 	if (*ct_executed || (!key->ct_state && info->force)) {
 		ct = ovs_ct_find_existing(net, &info->zone, info->family, skb,
 					  !!(key->ct_state &
-					  OVS_CS_F_NAT_MASK));
+					  OVS_CS_F_NAT_MASK), true);
 	}
 
 	return ct;
@@ -1033,6 +1038,20 @@  static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key,
 		    ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
 			return -EINVAL;
 		}
+	} else if (info->lookup_invalid) {
+		/* nf_conntrack_in() sets skb->_nfct to NULL if the packet is
+		 * invalid (eg. out of tcp window) even if it belongs to
+		 * an existing connection. Check if there is an existing entry
+		 * and if so, update the key with the mark and ct_labels.
+		 */
+		ct = ovs_ct_find_existing(net, &info->zone, info->family, skb,
+					  false, false);
+		if (ct) {
+			u8 state;
+
+			state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID;
+			__ovs_ct_update_key(key, state, &info->zone, ct);
+		}
 	}
 
 	return 0;
@@ -1602,6 +1621,9 @@  static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
 			}
 			break;
 #endif
+		case OVS_CT_ATTR_LOOKUP_INV:
+			info->lookup_invalid = true;
+			break;
 
 		default:
 			OVS_NLERR(log, "Unknown conntrack attr (%d)",
@@ -1819,6 +1841,9 @@  int ovs_ct_action_to_attr(const struct ovs_conntrack_info *ct_info,
 		if (nla_put_string(skb, OVS_CT_ATTR_TIMEOUT, ct_info->timeout))
 			return -EMSGSIZE;
 	}
+	if (ct_info->lookup_invalid &&
+	    nla_put_flag(skb, OVS_CT_ATTR_LOOKUP_INV))
+		return -EMSGSIZE;
 
 #if IS_ENABLED(CONFIG_NF_NAT)
 	if (ct_info->nat && !ovs_ct_nat_to_attr(ct_info, skb))