mbox series

[net-next,00/10] drop_monitor: Capture dropped packets and metadata

Message ID 20190807103059.15270-1-idosch@idosch.org
Headers show
Series drop_monitor: Capture dropped packets and metadata | expand

Message

Ido Schimmel Aug. 7, 2019, 10:30 a.m. UTC
From: Ido Schimmel <idosch@mellanox.com>

So far drop monitor supported only one mode of operation in which a
summary of recent packet drops is periodically sent to user space as a
netlink event. The event only includes the drop location (program
counter) and number of drops in the last interval.

While this mode of operation allows one to understand if the system is
dropping packets, it is not sufficient if a more detailed analysis is
required. Both the packet itself and related metadata are missing.

This patchset extends drop monitor with another mode of operation where
the packet - potentially truncated - and metadata (e.g., drop location,
timestamp, netdev) are sent to user space as a netlink event. Thanks to
the extensible nature of netlink, more metadata can be added in the
future.

To avoid performing expensive operations in the context in which
kfree_skb() is called, the dropped skbs are cloned and queued on per-CPU
skb drop list. The list is then processed in process context (using a
workqueue), where the netlink messages are allocated, prepared and
finally sent to user space.

A follow-up patchset will integrate drop monitor with devlink and allow
the latter to call into drop monitor to report hardware drops. In the
future, XDP drops can be added as well, thereby making drop monitor the
go-to netlink channel for diagnosing all packet drops.

Example usage with patched dropwatch [1] can be found here [2]. Example
dissection of drop monitor netlink events with patched wireshark [3] can
be found here [4]. I will submit both changes upstream after the kernel
changes are accepted. Another change worth making is adding a dropmon
pseudo interface to libpcap, similar to the nflog interface [5]. This
will allow users to specifically listen on dropmon traffic instead of
capturing all netlink packets via the nlmon netdev.

Patches #1-#5 prepare the code towards the actual changes in later
patches.

Patch #6 adds another mode of operation to drop monitor in which the
dropped packet itself is notified to user space along with metadata.

Patch #7 allows users to truncate reported packets to a specific length,
in case only the headers are of interest. The original length of the
packet is added as metadata to the netlink notification.

Patch #8 allows user to query the current configuration of drop monitor
(e.g., alert mode, truncation length).

Patches #9-#10 allow users to tune the length of the per-CPU skb drop
list according to their needs.

Changes since RFC [6]:
* Limit the length of the per-CPU skb drop list and make it configurable
* Do not use the hysteresis timer in packet alert mode
* Introduce alert mode operations in a separate patch and only then
  introduce the new alert mode
* Use 'skb->skb_iif' instead of 'skb->dev' because the latter is inside
  a union with 'dev_scratch' and therefore not guaranteed to point to a
  valid netdev
* Return '-EBUSY' instead of '-EOPNOTSUPP' when trying to configure drop
  monitor while it is monitoring
* Did not change schedule_work() in favor of schedule_work_on() as I did
  not observe a change in number of tail drops

[1] https://github.com/idosch/dropwatch/tree/packet-mode
[2] https://gist.github.com/idosch/166b64384577174230fd2523866f6b1c#file-gistfile1-txt
[3] https://github.com/idosch/wireshark/tree/drop-monitor-v1
[4] https://gist.github.com/idosch/166b64384577174230fd2523866f6b1c#file-gistfile2-txt
[5] https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-netfilter-linux.c
[6] https://patchwork.ozlabs.org/cover/1135226/

Ido Schimmel (10):
  drop_monitor: Split tracing enable / disable to different functions
  drop_monitor: Initialize timer and work item upon tracing enable
  drop_monitor: Reset per-CPU data before starting to trace
  drop_monitor: Require CAP_NET_ADMIN for drop monitor configuration
  drop_monitor: Add alert mode operations
  drop_monitor: Add packet alert mode
  drop_monitor: Allow truncation of dropped packets
  drop_monitor: Add a command to query current configuration
  drop_monitor: Make drop queue length configurable
  drop_monitor: Expose tail drop counter

 include/uapi/linux/net_dropmon.h |  50 +++
 net/core/drop_monitor.c          | 594 +++++++++++++++++++++++++++++--
 2 files changed, 607 insertions(+), 37 deletions(-)

Comments

David Ahern Aug. 8, 2019, 9:08 p.m. UTC | #1
On 8/7/19 4:30 AM, Ido Schimmel wrote:
> Example usage with patched dropwatch [1] can be found here [2]. Example
> dissection of drop monitor netlink events with patched wireshark [3] can
> be found here [4]. I will submit both changes upstream after the kernel
> changes are accepted. Another change worth making is adding a dropmon
> pseudo interface to libpcap, similar to the nflog interface [5]. This
> will allow users to specifically listen on dropmon traffic instead of
> capturing all netlink packets via the nlmon netdev.

Nice work, Ido.

On top of your dropwatch changes I added the ability to print the
payload as hex. e.g.,

Issue Ctrl-C to stop monitoring
drop at: nf_hook_slow+0x59/0x98 (0xffffffff814ec532)
input port ifindex: 1
timestamp: Thu Aug  8 15:04:02 2019 360015026 nsec
length: 64
00 00 00 00 00 00 00 00  00 00 00 00 08 00 45 00      ........ ......E.
00 3c e7 50 40 00 40 06  55 69 7f 00 00 01 7f 00      .<.P@.@. Ui......
00 01 80 2c 30 39 74 b9  c7 4d 00 00 00 00 a0 02      ...,09t. .M......
ff d7 fe 30 00 00 02 04  ff d7 04 02 08 0a 53 79       ...0.... ......Sy
original length: 74


Seems like the skb protocol is also needed to properly parse the payload
- ie., to know it is an ethernet header, followed by ip and tcp.
Toke Høiland-Jørgensen Aug. 9, 2019, 8:41 a.m. UTC | #2
Ido Schimmel <idosch@idosch.org> writes:

> From: Ido Schimmel <idosch@mellanox.com>
>
> So far drop monitor supported only one mode of operation in which a
> summary of recent packet drops is periodically sent to user space as a
> netlink event. The event only includes the drop location (program
> counter) and number of drops in the last interval.
>
> While this mode of operation allows one to understand if the system is
> dropping packets, it is not sufficient if a more detailed analysis is
> required. Both the packet itself and related metadata are missing.
>
> This patchset extends drop monitor with another mode of operation where
> the packet - potentially truncated - and metadata (e.g., drop location,
> timestamp, netdev) are sent to user space as a netlink event. Thanks to
> the extensible nature of netlink, more metadata can be added in the
> future.
>
> To avoid performing expensive operations in the context in which
> kfree_skb() is called, the dropped skbs are cloned and queued on per-CPU
> skb drop list. The list is then processed in process context (using a
> workqueue), where the netlink messages are allocated, prepared and
> finally sent to user space.
>
> A follow-up patchset will integrate drop monitor with devlink and allow
> the latter to call into drop monitor to report hardware drops. In the
> future, XDP drops can be added as well, thereby making drop monitor the
> go-to netlink channel for diagnosing all packet drops.

This is great. Are you planning to add the XDP integration as well? :)

-Toke
Ido Schimmel Aug. 9, 2019, 12:38 p.m. UTC | #3
On Thu, Aug 08, 2019 at 03:08:25PM -0600, David Ahern wrote:
> On top of your dropwatch changes I added the ability to print the
> payload as hex. e.g.,
> 
> Issue Ctrl-C to stop monitoring
> drop at: nf_hook_slow+0x59/0x98 (0xffffffff814ec532)
> input port ifindex: 1
> timestamp: Thu Aug  8 15:04:02 2019 360015026 nsec
> length: 64
> 00 00 00 00 00 00 00 00  00 00 00 00 08 00 45 00      ........ ......E.
> 00 3c e7 50 40 00 40 06  55 69 7f 00 00 01 7f 00      .<.P@.@. Ui......
> 00 01 80 2c 30 39 74 b9  c7 4d 00 00 00 00 a0 02      ...,09t. .M......
> ff d7 fe 30 00 00 02 04  ff d7 04 02 08 0a 53 79       ...0.... ......Sy
> original length: 74

Nice! I'll Cc you on my pull request so that you could submit your
changes afterwards.

> Seems like the skb protocol is also needed to properly parse the payload
> - ie., to know it is an ethernet header, followed by ip and tcp.

Yes, you're right. Will add this in v2.

Thanks, David.
Ido Schimmel Aug. 9, 2019, 12:54 p.m. UTC | #4
On Fri, Aug 09, 2019 at 10:41:47AM +0200, Toke Høiland-Jørgensen wrote:
> This is great. Are you planning to add the XDP integration as well? :)

Thanks, Toke. From one of your previous replies I got the impression
that another hook needs to be added in order to catch 'XDP_DROP' as it
is not covered by the 'xdp_exception' tracepoint which only covers
'XDP_ABORTED'. If you can take care of that, I can look into the
integration with drop monitor.

I kept the XDP case in mind while working on the hardware originated
drops and I think that extending drop monitor for this use case should
be relatively straightforward. I will Cc you on the patchset that adds
monitoring of hardware drops and comment with how I think XDP
integration should look like.
Toke Høiland-Jørgensen Aug. 9, 2019, 6:15 p.m. UTC | #5
Ido Schimmel <idosch@idosch.org> writes:

> On Fri, Aug 09, 2019 at 10:41:47AM +0200, Toke Høiland-Jørgensen wrote:
>> This is great. Are you planning to add the XDP integration as well? :)
>
> Thanks, Toke. From one of your previous replies I got the impression
> that another hook needs to be added in order to catch 'XDP_DROP' as it
> is not covered by the 'xdp_exception' tracepoint which only covers
> 'XDP_ABORTED'. If you can take care of that, I can look into the
> integration with drop monitor.
>
> I kept the XDP case in mind while working on the hardware originated
> drops and I think that extending drop monitor for this use case should
> be relatively straightforward. I will Cc you on the patchset that adds
> monitoring of hardware drops and comment with how I think XDP
> integration should look like.

SGTM, thanks!

-Toke