Message ID | 20221121161217.304094-1-amorenoz@redhat.com |
---|---|
Series | Add ovn drop debugging |
On 11/21/22 17:12, Adrian Moreno wrote:
> Very often when troubleshooting networking issues in an OVN cluster one
> would like to know if any packet (or a specific one) is being dropped by
> OVN.
>
> Currently, this cannot be known because of two main reasons:
>
> 1 - Implicit drops: Some tables do not have a default action
> (priority=0, match=1). In this case, a packet that does not match any
> rule will be silently dropped.
>
> 2 - Even on explicit drops, we only know a packet was dropped. We lack
> information about that packet.
>
> In order to improve this, this series introduces a two-fold solution:
>
> - First, make all drops explicit:
>   - northd adds a default (match = "1") "drop;" action to those tables
>     that currently lack one.
>   - ovn-controller adds an explicit drop action on those tables that are
>     not associated with logical flows (i.e., physical-to-logical mappings).
>
> - Secondly, allow sampling of all drops. By introducing a new OVN
>   action: "sample" (equivalent to OVS's), OVN can make OVS sample the
>   packets as they are dropped. In order to be able to correlate those
>   samples back to the exact rule that generated them, the user specifies
>   an 8-bit observation_domain_id. Based on that, the samples contain
>   the following fields:
>   - obs_domain_id:
>     - 8 most significant bits = the provided observation_domain_id.
>     - 24 least significant bits = the datapath's tunnel key if the
>       drop comes from an lflow or zero otherwise.
>   - obs_point_id: the first 32 bits of the lflow's UUID (i.e., the
>     cookie) if the drop comes from an lflow or the table number
>     otherwise.
>
> Based on the above changes in the flows, all of which are optional,
> users can collect IPFIX samples of the packets that are dropped by OVN
> which contain header information useful for debugging.
>
> * Note on observation_domain_ids:
> By allowing the user to specify only the 8 most significant bits of the
> obs_domain_id and having OVN combine it with the datapath's tunnel key,
> OVN could be extended to support more than one "sampling" application.
> For instance, ACL sampling could be developed in the future and, by
> specifying a different observation_domain_id, it could co-exist with the
> drop sampling mode implemented in the current series while still
> allowing the flow that created the sample to be uniquely identified.
>
> * Notes on testing and usage:
> Any IPFIX collector that parses ObservationPointID and
> ObservationDomainID fields can be used. For instance, nfdump 1.7
> supports these fields. Example of how to capture and analyze
> drops:
> # Enable debug sampling:
> $ ovn-nbctl set NB_Global . options:debug_drop_collector_set=1 options:debug_drop_domain_id=1
> # Start nfcapd:
> $ nfcapd -p 2055 -l nfcap &
> # Configure sampling on the OVS you want to inspect:
> $ ovs-vsctl --id=@br get Bridge br-int -- --id=@i create IPFIX
>   targets=\"172.18.0.1:2055\" -- create Flow_Sample_Collector_Set
>   bridge=@br id=1
> # Inspect samples and figure out what LogicalFlow caused them:
> $ nfdump -r nfcap -o fmt:'%line %odid %opid'
> Date first seen Duration Proto Src IP Addr:Port
> Dst IP Addr:Port Packets Bytes Flows obsDomainID obsPointID
> 1970-01-01 01:09:36.000 00:00:00.000 UDP 172.18.0.1:49230 ->
> 239.255.255.250:1900 12 6356 1 0x001000009 0x00d8dd23c7
> 1970-01-01 01:01:34.000 00:00:00.000 UDP 172.18.0.1:5353 ->
> 224.0.0.251:5353 165 89257 1 0x001000009 0x00d8dd23c7
> [...]
> $ ovn-sbctl list Logical_Flow | grep -A 11 d8dd23c7
> _uuid               : d8dd23c7-1451-4ea3-add7-8d68b4be4691
> actions             :
> "sample(probability=65535,collector_set=1,obs_domain=1,obs_point=$cookie);
> /* drop */"
> controller_meter    : []
> external_ids        : {source="northd.c:12504",
> stage-name=lr_in_ip_input}
> logical_datapath    : []
> logical_dp_group    : 0dc1b195-c647-4277-aea0-0bad5e896f51
> match               : "ip4.mcast || ip6.mcast"
> pipeline            : ingress
> priority            : 82
> table_id            : 3
> tags                : {}
> hash                : 0
>
> V5 -> V6: Addressed Dumitru's comments.
> V4 -> V5: Added documentation
> V3 -> V4: Make explicit drops the default behavior.
> V2 -> V3: Fix rebase problem on unit test
> V1 -> V2
> - Rebased and Addressed Mark's comments.
> - Added NEWS section.
>
> Adrian Moreno (3):
>   actions: add sample action
>   northd: make default drops explicit
>   northd: add drop sampling
>

Thanks Adrian, Ales, Mark, Numan!

It was quite easy to rebase this so I added Ales' "Reviewed-by" and
Numan's "Acked-by" and I pushed the series to the main branch.

Regards,
Dumitru
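
For readers following along, the ID layout described in the cover letter (8 most significant bits for the user-supplied observation_domain_id, 24 least significant bits for the datapath's tunnel key, and an obs_point_id equal to the first 32 bits of the lflow's UUID) can be checked by hand against the quoted nfdump output. The following is only a minimal Python sketch of that arithmetic; the function names are hypothetical and are not part of OVN, OVS, or nfdump.

```python
import uuid
from typing import Tuple


def decode_obs_domain_id(obs_domain_id: int) -> Tuple[int, int]:
    """Split a 32-bit obs_domain_id into (observation_domain_id, tunnel_key).

    Layout taken from the cover letter: top 8 bits are the user-provided
    observation_domain_id, low 24 bits are the datapath's tunnel key
    (or zero when the drop does not come from an lflow).
    """
    if not 0 <= obs_domain_id <= 0xFFFFFFFF:
        raise ValueError("obs_domain_id must fit in 32 bits")
    return obs_domain_id >> 24, obs_domain_id & 0xFFFFFF


def encode_obs_domain_id(domain_id: int, tunnel_key: int) -> int:
    """Inverse of decode_obs_domain_id (hypothetical helper for illustration)."""
    if not 0 <= domain_id <= 0xFF:
        raise ValueError("observation_domain_id must fit in 8 bits")
    if not 0 <= tunnel_key <= 0xFFFFFF:
        raise ValueError("tunnel key must fit in 24 bits")
    return (domain_id << 24) | tunnel_key


def lflow_matches(obs_point_id: int, lflow_uuid: str) -> bool:
    """True if obs_point_id equals the first 32 bits of the lflow's UUID."""
    return uuid.UUID(lflow_uuid).int >> 96 == obs_point_id


if __name__ == "__main__":
    # Values copied from the nfdump output and Logical_Flow listing above.
    obs_domain_id = int("0x001000009", 16)
    obs_point_id = int("0x00d8dd23c7", 16)

    domain_id, tunnel_key = decode_obs_domain_id(obs_domain_id)
    print(f"observation_domain_id={domain_id} tunnel_key={tunnel_key}")
    print(f"grep pattern for ovn-sbctl output: {obs_point_id:08x}")
    print(lflow_matches(obs_point_id, "d8dd23c7-1451-4ea3-add7-8d68b4be4691"))
```

With the sample values, the decoded observation_domain_id is 1, which agrees with `options:debug_drop_domain_id=1` in the example, and the obs_point_id formatted as eight hex digits is the string passed to `grep` in the `Logical_Flow` lookup.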
On 11/23/22 16:58, Dumitru Ceara wrote:
> On 11/21/22 17:12, Adrian Moreno wrote:
>> [...]
>
> Thanks Adrian, Ales, Mark, Numan!
>
> It was quite easy to rebase this so I added Ales' "Reviewed-by" and
> Numan's "Acked-by" and I pushed the series to the main branch.
>
> Regards,
> Dumitru
>

Thanks all.
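
The "implicit drop" problem from the cover letter (a table with no priority=0, match=1 flow silently drops whatever matches nothing) can be illustrated with a toy model. The Python below is a deliberately simplified sketch and not OVN code: the `Flow` class, the `lookup` and `with_default_drop` helpers, and the dictionary-based packets are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Packet = Dict[str, int]


@dataclass
class Flow:
    priority: int
    match: str                          # textual match, e.g. "ip4" or "1"
    predicate: Callable[[Packet], bool]  # toy stand-in for match evaluation
    actions: str


def lookup(table: List[Flow], pkt: Packet) -> Optional[Flow]:
    """Return the highest-priority flow matching pkt.

    None models the implicit drop: nothing matched, so nothing records
    that the packet was discarded.
    """
    for flow in sorted(table, key=lambda f: f.priority, reverse=True):
        if flow.predicate(pkt):
            return flow
    return None


def with_default_drop(table: List[Flow]) -> List[Flow]:
    """Return a copy of table with a priority-0, match "1", "drop;" flow
    appended, unless the table already has a priority-0 match-all flow."""
    if any(f.priority == 0 and f.match == "1" for f in table):
        return list(table)
    return list(table) + [Flow(0, "1", lambda pkt: True, "drop;")]


if __name__ == "__main__":
    table = [Flow(100, "ip4", lambda p: p.get("eth_type") == 0x0800, "next;")]
    arp = {"eth_type": 0x0806}

    print(lookup(table, arp))                             # implicit drop
    print(lookup(with_default_drop(table), arp).actions)  # explicit drop
```

Once every table ends in an explicit flow, a dropped packet always hits some flow, which is what gives the "sample" action a place to be attached.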