
[RFC,nf_conntrack_tcp] Export ip_ct_tcp_state variables to userspace

Message ID 1387039571-6110-1-git-send-email-kristian.evensen@gmail.com
State Not Applicable

Commit Message

Kristian Evensen Dec. 14, 2013, 4:46 p.m. UTC
From: Kristian Evensen <kristian.evensen@gmail.com>

Several of the TCP state variables tracked by conntrack are interesting to
userspace applications. This patch adds netlink attributes that export the
remaining variables contained in the ip_ct_tcp_state struct, as well as
retrans from struct ip_ct_tcp. The size of the netlink message increases, but
as protoinfo is only included on get/dump, I don't think the increased buffer
requirement should be a problem.

One example of a use case for these variables is to make monitoring of TCP
connections on middleboxes easier and more efficient. Applications would no
longer have to detect and keep track of TCP connections themselves; they could
instead rely on the information provided by conntrack. By monitoring the
development of the sequence numbers and the window size, and using snapshots
of retrans, one could for example detect the type of TCP flow (thin/thick) and
say something about the quality of the link.

Would this functionality be useful, or is the cost of a bigger message too
large? If it is useful, what would be the best way to implement it? Now that
all of ip_ct_tcp_state is contained in the message, would it be better to
export the whole struct (similar to how it is done with tcp_info and
inet_diag)?

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
---
 include/uapi/linux/netfilter/nfnetlink_conntrack.h |  9 +++++++++
 net/netfilter/nf_conntrack_proto_tcp.c             | 20 +++++++++++++++++++-
 2 files changed, 28 insertions(+), 1 deletion(-)

Comments

Pablo Neira Ayuso Dec. 17, 2013, 1:16 p.m. UTC | #1
Hi Kristian,

On Sat, Dec 14, 2013 at 05:46:11PM +0100, Kristian Evensen wrote:
> From: Kristian Evensen <kristian.evensen@gmail.com>
> 
> Several of the TCP state variables tracked by conntrack are interesting to
> userspace applications. This patch adds netlink attributes that export the
> remaining variables contained in the ip_ct_tcp_state struct, as well as
> retrans from struct ip_ct_tcp. The size of the netlink message increases,
> but as protoinfo is only included on get/dump, I don't think the increased
> buffer requirement should be a problem.

We have limited bandwidth in the event path, and this information would
also be included in those messages.

> One example of a use case for these variables is to make monitoring of TCP
> connections on middleboxes easier and more efficient. Applications would no
> longer have to detect and keep track of TCP connections themselves; they
> could instead rely on the information provided by conntrack. By monitoring
> the development of the sequence numbers and the window size, and using
> snapshots of retrans, one could for example detect the type of TCP flow
> (thin/thick) and say something about the quality of the link.

Looking at the existing Netfilter userspace software that we have now,
I cannot see how we can benefit from this proposed feature.
Kristian Evensen Dec. 17, 2013, 2:22 p.m. UTC | #2
Hi Pablo,

Thank you very much for your reply.

On Tue, Dec 17, 2013 at 2:16 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> We have limited bandwidth in the event path, and this information would
> also be included in those messages.

Ah, I see, I missed that one. Thanks for letting me know.

> Looking at the existing Netfilter userspace software that we have now,
> I cannot see how we can benefit from this proposed feature.

One example application I thought of was a "middlebox/router version" of
ss, with some added logic for, for example, identifying TCP connections.

-Kristian
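A minimal sketch of the dump side of such a tool, built on the existing
libnetfilter_conntrack API; a real version would additionally read the
proposed CTA_PROTOINFO_TCP_* attributes instead of just pretty-printing
each entry:

#include <stdio.h>
#include <stdint.h>
#include <sys/socket.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack.h>

/* Called once per conntrack entry during the dump. */
static int dump_cb(enum nf_conntrack_msg_type type,
		   struct nf_conntrack *ct, void *data)
{
	char buf[1024];

	nfct_snprintf(buf, sizeof(buf), ct, NFCT_T_UNKNOWN,
		      NFCT_O_DEFAULT, 0);
	printf("%s\n", buf);
	return NFCT_CB_CONTINUE;
}

int main(void)
{
	struct nfct_handle *h = nfct_open(CONNTRACK, 0);
	uint32_t family = AF_INET;

	if (!h)
		return 1;
	nfct_callback_register(h, NFCT_T_ALL, dump_cb, NULL);
	nfct_query(h, NFCT_Q_DUMP, &family);
	nfct_close(h);
	return 0;
}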
Pablo Neira Ayuso Dec. 17, 2013, 2:30 p.m. UTC | #3
On Tue, Dec 17, 2013 at 03:22:50PM +0100, Kristian Evensen wrote:
> Hi Pablo,
> 
> Thank you very much for your reply.
> 
> On Tue, Dec 17, 2013 at 2:16 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > We have limited bandwidth in the event path, and this information would
> > also be included in those messages.
> 
> Ah, I see, I missed that one. Thanks for letting me know.
>
> > Looking at the existing Netfilter userspace software that we have now,
> > I cannot see how we can benefit from this proposed feature.
> 
> One example application I thought of was a "middlebox/router version" of
> ss, with some added logic for, for example, identifying TCP connections.

I guess the application you refer to could be the conntrack [1] userspace
tool.

I still don't see how that information can usefully be included in that
output, as it changes very fast; by polling you will only get stale
snapshots of what is actually happening in the TCP tracking subsystem.

[1] http://www.netfilter.org/projects/conntrack-tools/index.html
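For completeness, the event-driven alternative implied here already exists in
libnetfilter_conntrack: subscribe to conntrack update events rather than
polling. A sketch, reusing the dump_cb callback from the earlier snippet:

	/* Receive conntrack update events as they happen. */
	struct nfct_handle *h = nfct_open(CONNTRACK,
					  NF_NETLINK_CONNTRACK_UPDATE);

	nfct_callback_register(h, NFCT_T_UPDATE, dump_cb, NULL);
	nfct_catch(h);	/* blocks, invoking the callback once per event */
	nfct_close(h);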
Kristian Evensen Dec. 17, 2013, 2:50 p.m. UTC | #4
On Tue, Dec 17, 2013 at 3:30 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> I still don't see how that information can be useful to be included in
> that output as it changes very fast and by polling you will only get
> stale snapshots of what it's actually happening in the TCP tracking
> subsystem.

Yes, this tool would be most useful for somewhat long-lived TCP
connections. One idea I had was to correlate the development of max
ack/end with the number of TCP bytes sent/received over a single link.
Little movement in the sequence numbers combined with large amounts of
data transferred could be taken as an indication of a problematic link.
However, the more I think about it, the more this looks like a very
complicated and unreliable solution, at least when scaled up.
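In concrete terms, the correlation amounts to a ratio like the one below.
The function name is hypothetical and the numbers in the closing comment are
made up for illustration:

#include <stdint.h>

/* Fraction of the link's bytes that were new, acknowledged data. */
static double goodput_ratio(uint32_t maxack_prev, uint32_t maxack_now,
			    uint64_t link_bytes)
{
	/* Unsigned subtraction handles 32-bit sequence wrap-around. */
	uint32_t acked = maxack_now - maxack_prev;

	return link_bytes ? (double)acked / link_bytes : 0.0;
}

/*
 * Example: 64 KB acknowledged while 256 KB crossed the link gives a
 * ratio of 0.25, hinting at heavy retransmission on the path.
 */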

Thanks for your help, back to the drawing board for me :)

-Kristian

Patch

diff --git a/include/uapi/linux/netfilter/nfnetlink_conntrack.h b/include/uapi/linux/netfilter/nfnetlink_conntrack.h
index acad6c5..f27fa5f 100644
--- a/include/uapi/linux/netfilter/nfnetlink_conntrack.h
+++ b/include/uapi/linux/netfilter/nfnetlink_conntrack.h
@@ -106,6 +106,15 @@ enum ctattr_protoinfo_tcp {
 	CTA_PROTOINFO_TCP_WSCALE_REPLY,
 	CTA_PROTOINFO_TCP_FLAGS_ORIGINAL,
 	CTA_PROTOINFO_TCP_FLAGS_REPLY,
+	CTA_PROTOINFO_TCP_END_ORIGINAL,
+	CTA_PROTOINFO_TCP_END_REPLY,
+	CTA_PROTOINFO_TCP_MAXEND_ORIGINAL,
+	CTA_PROTOINFO_TCP_MAXEND_REPLY,
+	CTA_PROTOINFO_TCP_MAXWIN_ORIGINAL,
+	CTA_PROTOINFO_TCP_MAXWIN_REPLY,
+	CTA_PROTOINFO_TCP_MAXACK_ORIGINAL,
+	CTA_PROTOINFO_TCP_MAXACK_REPLY,
+	CTA_PROTOINFO_TCP_RETRANS,
 	__CTA_PROTOINFO_TCP_MAX
 };
 #define CTA_PROTOINFO_TCP_MAX (__CTA_PROTOINFO_TCP_MAX - 1)
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 44d1ea3..17e85e5 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -1166,7 +1166,25 @@ static int tcp_to_nlattr(struct sk_buff *skb, struct nlattr *nla,
 	    nla_put_u8(skb, CTA_PROTOINFO_TCP_WSCALE_ORIGINAL,
 		       ct->proto.tcp.seen[0].td_scale) ||
 	    nla_put_u8(skb, CTA_PROTOINFO_TCP_WSCALE_REPLY,
-		       ct->proto.tcp.seen[1].td_scale))
+		       ct->proto.tcp.seen[1].td_scale) ||
+	    nla_put_u32(skb, CTA_PROTOINFO_TCP_END_ORIGINAL,
+		       ct->proto.tcp.seen[0].td_end) ||
+	    nla_put_u32(skb, CTA_PROTOINFO_TCP_END_REPLY,
+		       ct->proto.tcp.seen[1].td_end) ||
+	    nla_put_u32(skb, CTA_PROTOINFO_TCP_MAXEND_ORIGINAL,
+		       ct->proto.tcp.seen[0].td_maxend) ||
+	    nla_put_u32(skb, CTA_PROTOINFO_TCP_MAXEND_REPLY,
+		       ct->proto.tcp.seen[1].td_maxend) ||
+	    nla_put_u32(skb, CTA_PROTOINFO_TCP_MAXWIN_ORIGINAL,
+		       ct->proto.tcp.seen[0].td_maxwin) ||
+	    nla_put_u32(skb, CTA_PROTOINFO_TCP_MAXWIN_REPLY,
+		       ct->proto.tcp.seen[1].td_maxwin) ||
+	    nla_put_u32(skb, CTA_PROTOINFO_TCP_MAXACK_ORIGINAL,
+		       ct->proto.tcp.seen[0].td_maxack) ||
+	    nla_put_u32(skb, CTA_PROTOINFO_TCP_MAXACK_REPLY,
+		       ct->proto.tcp.seen[1].td_maxack) ||
+	    nla_put_u8(skb, CTA_PROTOINFO_TCP_RETRANS,
+		       ct->proto.tcp.retrans))
 		goto nla_put_failure;
 
 	tmp.flags = ct->proto.tcp.seen[0].flags;
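
If the patch were applied, userspace could read the new attributes roughly as
follows. This is a sketch using libmnl, assuming the nested CTA_PROTOINFO_TCP
attribute has already been located and validated, and assuming the values are
host byte order since the patch uses nla_put_u32()/nla_put_u8(); the helper
names are hypothetical:

#include <stdio.h>
#include <libmnl/libmnl.h>
#include <linux/netfilter/nfnetlink_conntrack.h>

/* Index each nested attribute by type into the table passed as data. */
static int tcp_attr_cb(const struct nlattr *attr, void *data)
{
	const struct nlattr **tb = data;
	uint16_t type = mnl_attr_get_type(attr);

	if (type <= CTA_PROTOINFO_TCP_MAX)
		tb[type] = attr;
	return MNL_CB_OK;
}

static void print_tcp_state(const struct nlattr *nest)
{
	const struct nlattr *tb[CTA_PROTOINFO_TCP_MAX + 1] = {};

	mnl_attr_parse_nested(nest, tcp_attr_cb, tb);
	if (tb[CTA_PROTOINFO_TCP_END_ORIGINAL])
		printf("end(orig)=%u\n",
		       mnl_attr_get_u32(tb[CTA_PROTOINFO_TCP_END_ORIGINAL]));
	if (tb[CTA_PROTOINFO_TCP_RETRANS])
		printf("retrans=%u\n",
		       (unsigned)mnl_attr_get_u8(tb[CTA_PROTOINFO_TCP_RETRANS]));
}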