[ovs-dev,v3,0/3] Add ovn drop debugging

Message ID 20220927073116.2166024-1-amorenoz@redhat.com

Adrián Moreno Sept. 27, 2022, 7:31 a.m. UTC
Very often, when troubleshooting networking issues in an OVN cluster, one
would like to know whether any packet (or a specific one) is being
dropped by OVN.

Currently, this cannot be known for two main reasons:

1 - Implicit drops: Some tables do not have a default action
(priority=0, match=1). In this case, a packet that does not match any
rule will be silently dropped.

2 - Even on explicit drops, we only know a packet was dropped. We lack
information about that packet.

In order to improve this, this series introduces a two-fold solution:

- First, create a debug-mode option. When enabled, it makes:
   - northd add a default (match = "1") "drop;" action to those tables
   that currently lack one.
   - ovn-controller add an explicit drop action to those tables that are
   not associated with logical flows (i.e., physical-to-logical mappings).

- Secondly, allow sampling of all drops. By introducing a new OVN
  action: "sample" (equivalent to OVS's), OVN can make OVS sample the
  packets as they are dropped. In order to be able to correlate those
  samples back to the exact rule that generated them, the user specifies
  an 8-bit observation_domain_id. Based on that, the samples contain
  the following fields:
  - obs_domain_id:
     - 8 most significant bits = the provided observation_domain_id.
     - 24 least significant bits = the datapath's tunnel key if the
       drop comes from a lflow or zero otherwise.
  - obs_point_id: the first 32 bits of the lflow's UUID (i.e., the
    cookie) if the drop comes from an lflow or the table number
    otherwise.
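
As a rough illustration of the field layout described above, the packing
could be sketched as follows. This is only a sketch in Python, not the
series' C implementation; the function names are hypothetical and exist
only for this example.

```python
import uuid


def obs_domain_id(domain_id: int, tunnel_key: int = 0) -> int:
    """Pack the user-provided 8-bit ID with the 24-bit datapath tunnel key.

    tunnel_key stays 0 when the drop does not come from an lflow.
    (Hypothetical helper, for illustration only.)
    """
    if not 0 <= domain_id <= 0xFF:
        raise ValueError("observation_domain_id must fit in 8 bits")
    if not 0 <= tunnel_key <= 0xFFFFFF:
        raise ValueError("tunnel key must fit in 24 bits")
    # 8 most significant bits: domain_id; 24 least significant: tunnel key.
    return (domain_id << 24) | tunnel_key


def obs_point_id_from_lflow(lflow_uuid: str) -> int:
    """Return the first 32 bits of the lflow's UUID (i.e., the cookie)."""
    # A UUID is 128 bits wide, so shifting right by 96 keeps the top 32.
    return uuid.UUID(lflow_uuid).int >> 96


print(hex(obs_domain_id(1, 9)))
print(hex(obs_point_id_from_lflow("d8dd23c7-1451-4ea3-add7-8d68b4be4691")))
```

For drops that do not come from an lflow, obs_point_id would instead carry
the table number and the tunnel key part of obs_domain_id would be zero.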

Based on the above changes in the flows, all of which are optional,
users can collect IPFIX samples of the packets that are dropped by OVN
which contain header information useful for debugging.

* Note on observation_domain_ids:
By allowing the user to specify only the 8 most significant bits of the
obs_domain_id and having OVN combine it with the datapath's tunnel key,
OVN could be extended to support more than one "sampling" application.
For instance, ACL sampling could be developed in the future and, by
specifying a different observation_domain_id, it could co-exist with the
drop sampling mode implemented in the current series while still
allowing the flow that created each sample to be uniquely identified.

* Notes on testing and usage:
Any IPFIX collector that parses ObservationPointID and
ObservationDomainID fields can be used. For instance, nfdump supports
these fields in its unicorn branch [1] (future nfdump 1.7). Example of
how to capture and analyze drops:
# Enable debug sampling:
$ ovn-nbctl set NB_Global . options:debug_drop_mode=true \
    options:debug_drop_collector_set=1 options:debug_drop_domain_id=1
# Start nfcapd:
$ nfcapd -p 2055 -l nfcap &
# Configure sampling on the OVS you want to inspect:
$ ovs-vsctl --id=@br get Bridge br-int -- --id=@i create IPFIX \
    targets=\"172.18.0.1:2055\" --  create Flow_Sample_Collector_Set \
    bridge=@br id=1
# Inspect samples and figure out what LogicalFlow caused them:
$ nfdump -r nfcap -o fmt:'%line %odid %opid'
Date first seen             Duration     Proto      Src IP Addr:Port      Dst IP Addr:Port   Packets    Bytes Flows obsDomainID   obsPointID
1970-01-01 01:09:36.000     00:00:00.000 UDP         172.18.0.1:49230 -> 239.255.255.250:1900        12     6356     1 0x001000009 0x00d8dd23c7
1970-01-01 01:01:34.000     00:00:00.000 UDP         172.18.0.1:5353  -> 224.0.0.251:5353       165    89257     1 0x001000009 0x00d8dd23c7
[...]
$ ovn-sbctl list Logical_Flow | grep -A 11 d8dd23c7
_uuid               : d8dd23c7-1451-4ea3-add7-8d68b4be4691
actions             : "sample(probability=65535,collector_set=1,obs_domain=1,obs_point=$cookie); /* drop */"
controller_meter    : []
external_ids        : {source="northd.c:12504", stage-name=lr_in_ip_input}
logical_datapath    : []
logical_dp_group    : 0dc1b195-c647-4277-aea0-0bad5e896f51
match               : "ip4.mcast || ip6.mcast"
pipeline            : ingress
priority            : 82
table_id            : 3
tags                : {}
hash                : 0
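
To make the correlation step in the example above explicit, here is a
small Python sketch that splits the obsDomainID and obsPointID values
printed by nfdump into their parts. The helper name is hypothetical, and
it assumes the drop came from an lflow, so that obsPointID holds the
cookie rather than a table number.

```python
def decode_sample(odid: str, opid: str) -> dict:
    """Split the hex fields printed by nfdump (%odid, %opid).

    Hypothetical helper, for illustration only.
    """
    domain = int(odid, 16)
    point = int(opid, 16)
    return {
        # 8 most significant bits: the configured debug_drop_domain_id.
        "observation_domain_id": (domain >> 24) & 0xFF,
        # 24 least significant bits: the datapath's tunnel key.
        "tunnel_key": domain & 0xFFFFFF,
        # First 32 bits of the lflow's UUID, usable as a grep pattern.
        "cookie": f"{point:08x}",
    }


info = decode_sample("0x001000009", "0x00d8dd23c7")
print(info)
```

With the values from the capture above, this yields domain ID 1, tunnel
key 9 and cookie d8dd23c7, which is the prefix passed to grep.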


[1] https://github.com/phaag/nfdump/tree/unicorn

V2 -> V3: Fix rebase problem on unit test

V1 -> V2:
- Rebased and addressed Mark's comments.
- Added NEWS section.

Adrian Moreno (3):
  actions: add sample action
  northd: add drop-debug-mode to add explicit drops
  northd: add drop sampling

 NEWS                        |   2 +
 controller/lflow.c          |   1 +
 controller/ovn-controller.c |  50 +++++++++
 controller/physical.c       |  80 ++++++++++++++-
 controller/physical.h       |   7 ++
 include/ovn/actions.h       |  16 +++
 lib/actions.c               | 120 ++++++++++++++++++++++
 northd/automake.mk          |   2 +
 northd/debug.c              | 107 +++++++++++++++++++
 northd/debug.h              |  41 ++++++++
 northd/northd.c             | 115 ++++++++++++++-------
 ovn-nb.xml                  |  32 ++++++
 tests/ovn-northd.at         |  75 ++++++++++++++
 tests/ovn.at                | 200 +++++++++++++++++++++++++++++++++++-
 tests/test-ovn.c            |   3 +
 utilities/ovn-trace.c       |   2 +
 16 files changed, 810 insertions(+), 43 deletions(-)
 create mode 100644 northd/debug.c
 create mode 100644 northd/debug.h