
[RFC,nf-next] netfilter: nf_conntrack_proto_tcp: propagate IP_CT_TCP_FLAG_BE_LIBERAL

Message ID 147695370184.31999.2434286995020619745.stgit@nfdev2.cica.es
State RFC
Delegated to: Pablo Neira

Commit Message

Arturo Borrero Gonzalez Oct. 20, 2016, 9 a.m. UTC
According to Mathew Heard, the IP_CT_TCP_FLAG_BE_LIBERAL flag is not
propagated properly when userspace conntrackd is used to replicate
connection states in a firewall cluster.

This change modifies the behaviour of the engine so that it is always
liberal in the reply direction if it is liberal in the original direction.

More info in the Netfilter bugzilla:
 https://bugzilla.netfilter.org/show_bug.cgi?id=1087

Suggested-by: Mathew Heard <mat999@gmail.com>
Signed-off-by: Arturo Borrero Gonzalez <arturo@debian.org>
---
RFC: I don't fully understand this patch. Specifically, I don't understand
why this can't be done from userspace, in conntrackd, when creating/updating
synced conntracks. We could just set the flags we want on the new/updated
conntrack, couldn't we?

Also, I don't fully understand the consequences of changing these flags
in the middle of tcp_packet().

So, please, review the patch and give us comments.

 net/netfilter/nf_conntrack_proto_tcp.c |    7 +++++++
 1 file changed, 7 insertions(+)


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Pablo Neira Ayuso Oct. 20, 2016, 6:14 p.m. UTC | #1
On Thu, Oct 20, 2016 at 11:00:49AM +0200, Arturo Borrero Gonzalez wrote:
> According to Mathew Heard, the IP_CT_TCP_FLAG_BE_LIBERAL flag is not
> propagated properly when userspace conntrackd is used to replicate
> connection states in a firewall cluster.
> 
> This change modifies the behaviour of the engine so that it is always
> liberal in the reply direction if it is liberal in the original direction.
> 
> More info in the Netfilter bugzilla:
>  https://bugzilla.netfilter.org/show_bug.cgi?id=1087
> 
> Suggested-by: Mathew Heard <mat999@gmail.com>
> Signed-off-by: Arturo Borrero Gonzalez <arturo@debian.org>
> ---
> RFC: I don't fully understand this patch. Specifically, I don't understand
> why this can't be done from userspace, in conntrackd, when creating/updating
> synced conntracks. We could just set the flags we want on the new/updated
> conntrack, couldn't we?
> 
> Also, I don't fully understand the consequences of changing these flags
> in the middle of tcp_packet().
> 
> So, please, review the patch and give us comments.

There is a 'TCPWindowTracking' option that you can turn on in the
configuration file.

Is that perhaps what Mathew needs?
Arturo Borrero Gonzalez Oct. 21, 2016, 7:15 a.m. UTC | #2
On 20 October 2016 at 20:14, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> There is a 'TCPWindowTracking' option that you can turn on in the
> configuration file.
>
> Is that perhaps what Mathew needs?

@Mathew, could you please share what problems you have with this
conntrackd code?


/* disable TCP window tracking for recovered connections if required */
if (nfct_attr_is_set(ct, ATTR_TCP_STATE)) {
    uint8_t flags = IP_CT_TCP_FLAG_SACK_PERM;

    if (!CONFIG(sync).tcp_window_tracking)
        flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
    else
        flags |= IP_CT_TCP_FLAG_WINDOW_SCALE;
[...]
Arturo Borrero Gonzalez Oct. 21, 2016, 7:22 a.m. UTC | #3
(please keep the netfilter-devel list in CC)

On 21 October 2016 at 09:18, Mathew Heard <mat999@gmail.com> wrote:
> That's been covered already.
>
> The problem with it is that only the ORIG side of the connection ends
> up set. REPLY does not.
>
> I don't know the fundamental reason why this occurs, only the effect.
>

In that same function, in conntrackd:
 http://git.netfilter.org/conntrack-tools/tree/src/netlink.c#n256

we set the same flags in both original and reply directions:

nfct_set_attr_u8(ct, ATTR_TCP_FLAGS_ORIG, flags);
nfct_set_attr_u8(ct, ATTR_TCP_MASK_ORIG, flags);
nfct_set_attr_u8(ct, ATTR_TCP_FLAGS_REPL, flags);
nfct_set_attr_u8(ct, ATTR_TCP_MASK_REPL, flags);
Mathew Heard Oct. 21, 2016, 7:26 a.m. UTC | #4
However, under testing, in practice it is not. This is covered in the bug.

Fields: CTA_IP_V4_DST, CTA_PROTOINFO_TCP_FLAGS_ORIGINAL &
CTA_PROTOINFO_TCP_FLAGS_REPLY
Result: "**.**.56.135: 10 3"

It's only being set on one side. I believe this is because the reply
side flags are being set/initialised after the fact (i.e where they
are initialised in that function for incoming connections would do it
too).
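
To make the "10 3" result easier to read, here is a small sketch that decodes a flags byte into names. It assumes the two numbers are the decimal values of the ORIGINAL and REPLY flags fields, in that order; the thread does not state the base or the order explicitly. The helper `describe_tcp_flags()` is hypothetical; the flag values come from the kernel's uapi header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values as defined in the kernel's uapi nf_conntrack_tcp.h. */
#define IP_CT_TCP_FLAG_WINDOW_SCALE        0x01
#define IP_CT_TCP_FLAG_SACK_PERM           0x02
#define IP_CT_TCP_FLAG_CLOSE_INIT          0x04
#define IP_CT_TCP_FLAG_BE_LIBERAL          0x08
#define IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED 0x10
#define IP_CT_TCP_FLAG_MAXACK_SET          0x20

/*
 * Hypothetical helper: write the names of the bits set in flags into
 * buf, separated by '|', and return buf. Output is truncated if buf
 * is too small.
 */
static const char *describe_tcp_flags(uint8_t flags, char *buf, size_t len)
{
    static const struct {
        uint8_t bit;
        const char *name;
    } names[] = {
        { IP_CT_TCP_FLAG_WINDOW_SCALE,        "WINDOW_SCALE" },
        { IP_CT_TCP_FLAG_SACK_PERM,           "SACK_PERM" },
        { IP_CT_TCP_FLAG_CLOSE_INIT,          "CLOSE_INIT" },
        { IP_CT_TCP_FLAG_BE_LIBERAL,          "BE_LIBERAL" },
        { IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED, "DATA_UNACKNOWLEDGED" },
        { IP_CT_TCP_FLAG_MAXACK_SET,          "MAXACK_SET" },
    };
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (!(flags & names[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, names[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

Under those assumptions, 10 decodes to SACK_PERM|BE_LIBERAL and 3 decodes to WINDOW_SCALE|SACK_PERM, which matches the description that only one side ends up liberal.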


On Fri, Oct 21, 2016 at 6:22 PM, Arturo Borrero Gonzalez
<arturo@debian.org> wrote:
> (please keep the netfilter-devel list in CC)
>
> On 21 October 2016 at 09:18, Mathew Heard <mat999@gmail.com> wrote:
>> That's been covered already.
>>
>> The problem with it is that only the ORIG side of the connection ends
>> up set. REPLY does not.
>>
>> I don't know the fundamental reason why this occurs, only the effect.
>>
>
> In that same function, in conntrackd:
>  http://git.netfilter.org/conntrack-tools/tree/src/netlink.c#n256
>
> we set the same flags in both original and reply directions:
>
> nfct_set_attr_u8(ct, ATTR_TCP_FLAGS_ORIG, flags);
> nfct_set_attr_u8(ct, ATTR_TCP_MASK_ORIG, flags);
> nfct_set_attr_u8(ct, ATTR_TCP_FLAGS_REPL, flags);
> nfct_set_attr_u8(ct, ATTR_TCP_MASK_REPL, flags);
Pablo Neira Ayuso Oct. 21, 2016, 9:56 a.m. UTC | #5
On Fri, Oct 21, 2016 at 06:26:28PM +1100, Mathew Heard wrote:
> However, under testing, in practice it is not. This is covered in the bug.
> 
> Fields: CTA_IP_V4_DST, CTA_PROTOINFO_TCP_FLAGS_ORIGINAL &
> CTA_PROTOINFO_TCP_FLAGS_REPLY
> Result: "**.**.56.135: 10 3"

From where are you printing this? userspace or kernel?

> It's only being set on one side. I believe this is because the reply
> side flags are being set/initialised after the fact (i.e where they
> are initialised in that function for incoming connections would do it
> too).

Please elaborate on this a bit more.

Is there anything we should know about your infrastructure? E.g. kernel
and library versions, and what architecture are you using?

Asking this because I found an old report on problems on ARM that the
submitter never confirmed to be fixed.

Thanks.
Mathew Heard Oct. 21, 2016, 10:15 a.m. UTC | #6
On Fri, Oct 21, 2016 at 8:56 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Fri, Oct 21, 2016 at 06:26:28PM +1100, Mathew Heard wrote:
>> However, under testing, in practice it is not. This is covered in the bug.
>>
>> Fields: CTA_IP_V4_DST, CTA_PROTOINFO_TCP_FLAGS_ORIGINAL &
>> CTA_PROTOINFO_TCP_FLAGS_REPLY
>> Result: "**.**.56.135: 10 3"
>
> From where are you printing this? userspace or kernel?
>

CTA_* comes from libnetfilter_conntrack, which is userspace.

I have however also printk'ed flags in the kernel during testing and
seen the same (further confirmed by the crude fix working).

>> It's only being set on one side. I believe this is because the reply
>> side flags are being set/initialised after the fact (i.e where they
>> are initialised in that function for incoming connections would do it
>> too).
>
> Please elaborate on this a bit more.
>
> Is there anything we should know about your infrastructure? E.g. kernel
> and library versions, and what architecture are you using?
>
> Asking this because I found an old report on problems on ARM that the
> submitter never confirmed to be fixed.
>
> Thanks.

AMD64 in both native (staging) and virtual (dev) environments.

Originally we found this issue with incoming connections; however,
because outgoing connections are simpler to test, I moved to testing those.

I hope this ASCII diagram survives the mail system. Test System:

[NAT Router A] ------\
       |              \
   Conntrackd          +---- Target Box
       |              /
[NAT Router B] ------/

The Target Box is connected via GRE (internal network range 10.x.x.x).
Routers A and B both have standard DNAT & SNAT rules to provide
connectivity & port forwards. To test, I just change the route of an
outgoing connection on the Target Box mid-connection (i.e. using
"ip rule").

With TCP window tracking disabled via sysctl, or with the crude patch,
this all works as expected.
Without the patch it does not, because the reply side does not carry
the correct TCP flags.

Inbound connections were tested similarly, but by moving BGP
announcements between routers.

I have also replicated the same results in our staging environment, but
that's substantially more complex.

Regards,
Mathew

Patch

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 69f6877..ed16acf 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -835,6 +835,13 @@  static int tcp_packet(struct nf_conn *ct,
 	new_state = tcp_conntracks[dir][index][old_state];
 	tuple = &ct->tuplehash[dir].tuple;
 
+	/* if we are liberal in the original direction, be so in reply too */
+	if (ct->proto.tcp.seen[IP_CT_DIR_ORIGINAL].flags &
+	    IP_CT_TCP_FLAG_BE_LIBERAL) {
+		ct->proto.tcp.seen[IP_CT_DIR_REPLY].flags |=
+			IP_CT_TCP_FLAG_BE_LIBERAL;
+	}
+
 	switch (new_state) {
 	case TCP_CONNTRACK_SYN_SENT:
 		if (old_state < TCP_CONNTRACK_TIME_WAIT)