[RFC,net-next,00/12] Add drop monitor for offloaded data paths

Message ID 20190528122136.30476-1-idosch@idosch.org

Ido Schimmel May 28, 2019, 12:21 p.m. UTC
From: Ido Schimmel <idosch@mellanox.com>

Users have several ways to debug the kernel and understand why a packet
was dropped, for example by using "drop monitor" or "perf". Both
utilities trace kfree_skb(), which is the function called when a packet
is freed as part of a failure. The information provided by these tools
is invaluable when trying to understand the cause of a packet loss.

In recent years, large portions of the kernel data path were offloaded
to capable devices. Today, it is possible to perform L2 and L3
forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
Different TC classifiers and actions are also offloaded to capable
devices, at both ingress and egress.

However, when the data path is offloaded it is not possible to achieve
the same level of introspection, since tools such as "perf" and "drop
monitor" become irrelevant.

This patchset aims to solve this by allowing users to monitor packets
that the underlying device decided to drop along with relevant metadata
such as the drop reason and ingress port.

The above is achieved by exposing a fundamental capability of devices
capable of data path offloading - packet trapping. While the common use
case for packet trapping is the trapping of packets required for the
correct functioning of the control plane (e.g., STP, BGP packets),
packets can also be trapped due to other reasons such as exceptions
(e.g., TTL error) and drops (e.g., blackhole route).

Given this ability is not specific to a port, but rather to a device, it
is exposed using devlink. Each capable driver is expected to register
its supported packet traps with devlink and report trapped packets to
devlink as they arrive. devlink will perform accounting of received
packets and bytes and will potentially generate an event to user space
using a new generic netlink multicast group.
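
The register/report/account flow described above can be sketched as a
small user-space model. This is only an illustration, not the kernel
implementation: the Trap and TrapRegistry names, their fields and the
event dictionary are hypothetical, and the model assumes that a driver
only reports packets that were actually trapped to the CPU.

```python
from dataclasses import dataclass


@dataclass
class Trap:
    """Hypothetical stand-in for a packet trap registered by a driver."""
    name: str
    group: str
    action: str = "drop"   # "drop" or "trap", as in the examples in this letter
    report: bool = False   # whether an event is generated to user space
    rx_packets: int = 0
    rx_bytes: int = 0


class TrapRegistry:
    """Hypothetical model of the per-device trap bookkeeping."""

    def __init__(self):
        self.traps = {}
        self.events = []   # stands in for the netlink multicast group

    def register(self, trap):
        """Register a trap once; duplicate names are rejected."""
        if trap.name in self.traps:
            raise ValueError(f"trap {trap.name!r} already registered")
        self.traps[trap.name] = trap

    def report_packet(self, name, length, input_port):
        """Called by the (modelled) driver for each trapped packet."""
        trap = self.traps[name]
        # Accounting always happens, independent of the report flag.
        trap.rx_packets += 1
        trap.rx_bytes += length
        # An event is only queued when reporting is enabled for the trap.
        if trap.report:
            self.events.append(
                {"name": trap.name, "group": trap.group,
                 "length": length, "input_port": input_port}
            )


registry = TrapRegistry()
registry.register(
    Trap("blackhole_route_drop", "l3_drops", action="trap", report=True)
)
registry.report_packet("blackhole_route_drop", 146, "eth0")
print(registry.traps["blackhole_route_drop"].rx_packets, len(registry.events))
```

Keeping the counters separate from the report flag mirrors the split in
the text: statistics are always maintained, while events are optional.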

While this patchset is concerned with traps corresponding to dropped
packets, the interface itself is generic and can be used to expose traps
corresponding to control packets in the future. The API is vendor
neutral and similar to the API exposed by SAI, which is already
implemented by several vendors.

The implementation in this patchset is on top of both mlxsw and
netdevsim so that people could experiment with the interface and provide
useful feedback.

Example
=======

Instantiate netdevsim
---------------------

# echo "10 1" > /sys/bus/netdevsim/new_device
# ip link set dev eth0 up

List supported traps
--------------------

# devlink trap show
netdevsim/netdevsim10:
  name ingress_smac_mc_drop type drop generic true report false action drop group l2_drops
  name ingress_vlan_tag_allow_drop type drop generic true report false action drop group l2_drops
  name ingress_vlan_filter_drop type drop generic true report false action drop group l2_drops
  name ingress_stp_filter_drop type drop generic true report false action drop group l2_drops
  name uc_empty_tx_list_drop type drop generic true report false action drop group l2_drops
  name mc_empty_tx_list_drop type drop generic true report false action drop group l2_drops
  name uc_loopback_filter_drop type drop generic true report false action drop group l2_drops
  name fid_miss_exception type exception generic false report false action trap group l2_drops
  name blackhole_route_drop type drop generic true report false action drop group l3_drops
  name ttl_error_exception type exception generic true report false action trap group l3_drops
  name tail_drop type drop generic true report false action drop group buffer_drops
  name early_drop type drop generic true report false action drop group buffer_drops
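
For scripting against this listing, the output can be treated as
alternating key/value tokens. The sketch below assumes exactly the
format shown above (a device handle ending in ":" followed by indented
trap lines); the function names are hypothetical and the format may
change in later revisions of the series.

```python
def parse_trap_line(line):
    """Parse one 'key value key value ...' trap line into a dict."""
    tokens = line.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected key/value pairs, got: {line!r}")
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    # Convert the two boolean attributes from text to bool.
    for key in ("generic", "report"):
        if key in fields:
            fields[key] = fields[key] == "true"
    return fields


def parse_trap_show(output):
    """Map each device handle to the list of traps printed under it."""
    devices = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and " " not in line:
            # Device handle line, e.g. "netdevsim/netdevsim10:".
            current = devices.setdefault(line[:-1], [])
        elif current is not None:
            current.append(parse_trap_line(line))
    return devices


sample = """\
netdevsim/netdevsim10:
  name fid_miss_exception type exception generic false report false action trap group l2_drops
  name blackhole_route_drop type drop generic true report false action drop group l3_drops
"""
traps = parse_trap_show(sample)["netdevsim/netdevsim10"]
# List the driver-specific (non-generic) traps.
print([t["name"] for t in traps if not t["generic"]])
```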

Enable a trap
-------------

# devlink trap set netdevsim/netdevsim10 trap blackhole_route_drop action trap report true

Query statistics
----------------

# devlink -s trap show netdevsim/netdevsim10 trap blackhole_route_drop
netdevsim/netdevsim10:
  name blackhole_route_drop type drop generic true report true action trap group l3_drops
    stats:
        rx:
          bytes 2272 packets 16
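
As a quick sanity check on such counters, the byte and packet totals
can be pulled out of the output and turned into an average packet size.
This is a minimal sketch that assumes the "bytes N packets M" layout
shown above; the helper names are hypothetical.

```python
import re

# Matches the rx counters line as printed in the example above.
STATS_RE = re.compile(r"bytes\s+(\d+)\s+packets\s+(\d+)")


def parse_rx_stats(text):
    """Return (bytes, packets) from 'devlink -s trap show' style output."""
    match = STATS_RE.search(text)
    if match is None:
        raise ValueError("no 'bytes N packets M' line found")
    return int(match.group(1)), int(match.group(2))


def average_packet_size(rx_bytes, rx_packets):
    """Average trapped packet size in bytes, 0.0 when nothing was counted."""
    return rx_bytes / rx_packets if rx_packets else 0.0


rx_bytes, rx_packets = parse_rx_stats("          bytes 2272 packets 16")
print(average_packet_size(rx_bytes, rx_packets))
```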

Monitor dropped packets
-----------------------

# devlink -v mon trap-report
[trap-report,report] netdevsim/netdevsim10: name blackhole_route_drop type drop group l3_drops length 146 timestamp Tue May 28 15:02:26 2019 153282944 nsec
  input_port:
    netdevsim/netdevsim10/0: type eth netdev eth0

TODO
====

* Add selftests
* Write a man page for devlink-trap

Future plans
============

* Write a Wireshark dissector
* Provide eBPF programs that show how drops are distributed between different
  flows
* Provide more drop reasons as well as more metadata

Ido Schimmel (12):
  devlink: Create helper to fill port type information
  devlink: Add packet trap infrastructure
  devlink: Add generic packet traps and groups
  Documentation: Add devlink-trap documentation
  netdevsim: Add devlink-trap support
  Documentation: Add description of netdevsim traps
  mlxsw: pci: Query and store PCIe bandwidth during init
  mlxsw: core: Add API to set trap action
  mlxsw: reg: Add new trap action
  mlxsw: Add layer 2 discard trap IDs
  mlxsw: Add trap group for layer 2 discards
  mlxsw: spectrum: Add devlink-trap support

 .../networking/devlink-trap-netdevsim.rst     |   20 +
 Documentation/networking/devlink-trap.rst     |  200 +++
 Documentation/networking/index.rst            |    2 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile  |    2 +-
 drivers/net/ethernet/mellanox/mlxsw/core.c    |   64 +
 drivers/net/ethernet/mellanox/mlxsw/core.h    |   13 +
 drivers/net/ethernet/mellanox/mlxsw/pci.c     |    2 +
 drivers/net/ethernet/mellanox/mlxsw/reg.h     |   10 +
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |   17 +
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |   13 +
 .../ethernet/mellanox/mlxsw/spectrum_trap.c   |  245 +++
 drivers/net/ethernet/mellanox/mlxsw/trap.h    |    7 +
 drivers/net/netdevsim/dev.c                   |  273 +++-
 drivers/net/netdevsim/netdevsim.h             |    1 +
 include/net/devlink.h                         |  188 +++
 include/uapi/linux/devlink.h                  |   68 +
 net/core/devlink.c                            | 1314 ++++++++++++++++-
 17 files changed, 2412 insertions(+), 27 deletions(-)
 create mode 100644 Documentation/networking/devlink-trap-netdevsim.rst
 create mode 100644 Documentation/networking/devlink-trap.rst
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c

Comments

Florian Fainelli May 29, 2019, 1:32 a.m. UTC | #1
On 5/28/2019 5:21 AM, Ido Schimmel wrote:
> The implementation in this patchset is on top of both mlxsw and
> netdevsim so that people could experiment with the interface and provide
> useful feedback.

This is not particularly useful feedback but I found very little to
comment on because you have covered a lot of ground here.

What you propose is entirely reasonable and seems perfectly adequate to
report the Broadcom tag reason code (RC) within DSA (there are only a
few reason codes). I don't know if other tagging formats may allow
similar information to be reported.

Looking forward to the non-RFC version!