From patchwork Thu Oct 12 22:48:07 2017
X-Patchwork-Submitter: Cong Wang
X-Patchwork-Id: 825145
X-Patchwork-Delegate: davem@davemloft.net
From: Cong Wang
To: netdev@vger.kernel.org
Cc: Cong Wang , Eric Dumazet , Alexei Starovoitov , Hannes Frederic Sowa , Brendan Gregg , Neal Cardwell
Subject: [Patch net-next v2] tcp: add a tracepoint for tcp_retransmit_skb()
Date: Thu, 12 Oct 2017 15:48:07 -0700
Message-Id: <20171012224807.1669-1-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.9.4
X-Mailing-List: netdev@vger.kernel.org

We need real-time notification of TCP retransmissions for monitoring.

Of course, we could use ftrace to dynamically instrument this kernel function instead, but then we cannot retrieve the connection information at the same time. For example, perf-tools [1] reads /proc/net/tcp for socket details, which is slow when there are many connections.

Therefore, this patch adds a tracepoint for tcp_retransmit_skb() that exposes the source and destination IP addresses and ports of the connection. This also makes it easier to integrate with perf.
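The exposed fields can be pictured with a small userspace sketch of what TP_fast_assign stores for an IPv4 socket. It assumes only what the diff in this patch shows: ports are converted with ntohs(), the network-order IPv4 addresses are copied byte for byte into the 4-byte arrays, and the IPv6 arrays receive the v4-mapped form ::ffff:a.b.c.d. The struct and function names (tcp_retrans_record, set_v4mapped, fill_record_v4, format_record_v4) are hypothetical and the sk/skb pointers are left out, so this is an illustration rather than kernel code.

```c
/*
 * Userspace sketch of the record filled by TP_fast_assign for an IPv4
 * socket. All names here are hypothetical; the sk/skb pointers are omitted.
 */
#include <arpa/inet.h>   /* htons, ntohs, inet_ntop */
#include <netinet/in.h>  /* INET_ADDRSTRLEN */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>  /* AF_INET */

/* Mirrors the address/port part of TP_STRUCT__entry. */
struct tcp_retrans_record {
	uint16_t sport;        /* host byte order, after ntohs() */
	uint16_t dport;
	uint8_t saddr[4];      /* network byte order, copied as-is */
	uint8_t daddr[4];
	uint8_t saddr_v6[16];  /* v4-mapped: ::ffff:a.b.c.d */
	uint8_t daddr_v6[16];
};

/* Same byte layout that the kernel's ipv6_addr_set_v4mapped() produces. */
static void set_v4mapped(uint32_t be_addr, uint8_t out[16])
{
	memset(out, 0, 10);
	out[10] = 0xff;
	out[11] = 0xff;
	memcpy(out + 12, &be_addr, 4);
}

/* Inputs are network-order values, like inet_saddr and inet_sport. */
static void fill_record_v4(struct tcp_retrans_record *rec,
			   uint32_t be_saddr, uint32_t be_daddr,
			   uint16_t be_sport, uint16_t be_dport)
{
	rec->sport = ntohs(be_sport);
	rec->dport = ntohs(be_dport);
	memcpy(rec->saddr, &be_saddr, 4);
	memcpy(rec->daddr, &be_daddr, 4);
	set_v4mapped(be_saddr, rec->saddr_v6);
	set_v4mapped(be_daddr, rec->daddr_v6);
}

/* Formats the IPv4 fields roughly the way "%hu" and "%pI4" render them. */
static int format_record_v4(const struct tcp_retrans_record *rec,
			    char *buf, size_t len)
{
	char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];

	if (!inet_ntop(AF_INET, rec->saddr, s, sizeof(s)) ||
	    !inet_ntop(AF_INET, rec->daddr, d, sizeof(d)))
		return -1;
	return snprintf(buf, len, "sport=%hu dport=%hu saddr=%s daddr=%s",
			rec->sport, rec->dport, s, d);
}
```

For example, a connection 10.0.0.1:22 -> 192.168.1.2:54321 is rendered as "sport=22 dport=54321 saddr=10.0.0.1 daddr=192.168.1.2".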
Note that I expose both IPv4 and IPv6 addresses at the same time: for an IPv4 socket, the v4-mapped addresses are used as the IPv6 addresses; for an IPv6 socket, the IPv4 address fields are already filled with LOOPBACK4_IPV6 by the kernel. I also add the sk and skb pointers, as they are useful for BPF.

1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans

Cc: Eric Dumazet
Cc: Alexei Starovoitov
Cc: Hannes Frederic Sowa
Cc: Brendan Gregg
Cc: Neal Cardwell
Signed-off-by: Cong Wang
---
 include/trace/events/tcp.h | 68 ++++++++++++++++++++++++++++++++++++++++++++++
 net/core/net-traces.c      |  1 +
 net/ipv4/tcp_output.c      |  3 ++
 3 files changed, 72 insertions(+)
 create mode 100644 include/trace/events/tcp.h

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
new file mode 100644
index 000000000000..749f93c542ab
--- /dev/null
+++ b/include/trace/events/tcp.h
@@ -0,0 +1,68 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM tcp
+
+#if !defined(_TRACE_TCP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_TCP_H
+
+#include
+#include
+#include
+#include
+
+TRACE_EVENT(tcp_retransmit_skb,
+
+	TP_PROTO(struct sock *sk, struct sk_buff *skb, int segs),
+
+	TP_ARGS(sk, skb, segs),
+
+	TP_STRUCT__entry(
+		__field(void *, skbaddr)
+		__field(void *, skaddr)
+		__field(__u16, sport)
+		__field(__u16, dport)
+		__array(__u8, saddr, 4)
+		__array(__u8, daddr, 4)
+		__array(__u8, saddr_v6, 16)
+		__array(__u8, daddr_v6, 16)
+	),
+
+	TP_fast_assign(
+		struct ipv6_pinfo *np = inet6_sk(sk);
+		struct inet_sock *inet = inet_sk(sk);
+		struct in6_addr *pin6;
+		__be32 *p32;
+
+		__entry->skbaddr = skb;
+		__entry->skaddr = sk;
+
+		__entry->sport = ntohs(inet->inet_sport);
+		__entry->dport = ntohs(inet->inet_dport);
+
+		p32 = (__be32 *) __entry->saddr;
+		*p32 = inet->inet_saddr;
+
+		p32 = (__be32 *) __entry->daddr;
+		*p32 = inet->inet_daddr;
+
+		if (np) {
+			pin6 = (struct in6_addr *)__entry->saddr_v6;
+			*pin6 = np->saddr;
+			pin6 = (struct in6_addr *)__entry->daddr_v6;
+			*pin6 = *(np->daddr_cache);
+		} else {
+			pin6 = (struct in6_addr *)__entry->saddr_v6;
+			ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
+			pin6 = (struct in6_addr *)__entry->daddr_v6;
+			ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
+		}
+	),
+
+	TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6 daddrv6=%pI6",
+		  __entry->sport, __entry->dport, __entry->saddr, __entry->daddr,
+		  __entry->saddr_v6, __entry->daddr_v6)
+);
+
+#endif /* _TRACE_TCP_H */
+
+/* This part must be outside protection */
+#include
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index 1132820c8e62..f4e4fa2db505 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 696b0a168f16..e1e7410a5b60 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -42,6 +42,8 @@
 #include
 #include
 
+#include
+
 /* People can turn this off for buggy TCP's found in printers etc. */
 int sysctl_tcp_retrans_collapse __read_mostly = 1;
 
@@ -2875,6 +2877,7 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 
 	if (likely(!err)) {
 		TCP_SKB_CB(skb)->sacked |= TCPCB_EVER_RETRANS;
+		trace_tcp_retransmit_skb(sk, skb, segs);
 	} else if (err != -EBUSY) {
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRETRANSFAIL);
 	}
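With this event, a monitoring tool can read the connection details from the event text itself instead of scanning /proc/net/tcp. Below is a minimal parsing sketch that assumes only that the payload follows the TP_printk() format in this patch; whatever prefix trace_pipe prints before it (task, CPU, timestamp) is skipped by searching for "sport=". struct retrans_event and parse_retrans_event() are hypothetical names. The buffer sizes rely on %pI6 printing the uncompressed form (eight groups of four hex digits, 39 characters).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one parsed tcp_retransmit_skb event. */
struct retrans_event {
	unsigned short sport, dport;
	char saddr[16], daddr[16];        /* dotted quad from %pI4 */
	char saddr_v6[40], daddr_v6[40];  /* 39 chars from %pI6, plus NUL */
};

/*
 * Parse the text produced by the TP_printk() format above.
 * Anything before "sport=" is ignored. Returns 0 on success, -1 otherwise.
 */
static int parse_retrans_event(const char *line, struct retrans_event *ev)
{
	const char *p = strstr(line, "sport=");

	if (!p)
		return -1;
	if (sscanf(p, "sport=%hu dport=%hu saddr=%15s daddr=%15s "
		      "saddrv6=%39s daddrv6=%39s",
		   &ev->sport, &ev->dport, ev->saddr, ev->daddr,
		   ev->saddr_v6, ev->daddr_v6) != 6)
		return -1;
	return 0;
}
```

Keying on "sport=" rather than parsing the prefix keeps the sketch independent of how the tracing buffer decorates each line.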