Message ID | 20230906093307.2960027-1-mheib@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev] pinctrl: DNS refuse inapplicable AAAA queries | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | fail | github build: failed |
Hi Mohammad, I have some comments. See below. On 9/6/23 05:33, Mohammad Heib wrote: > Currently ovn ignores DNS AAAA queries that has > record in the DNS tables but no ipv6 associated > with this record, this will incress the DNS processing > time for the custmer since they will keep waiting for > reply or a timeout. What does "ignores" mean in this case? Does this mean the DNS request is dropped by pinctrl? Or does it mean the DNS request is not replied by pinctrl, but the request continues through the next tables in br-int? My understanding is that it should be the latter. The packet should continue through br-int and eventually get routed towards the destination IP address of the DNS request. If I'm correct about this behavior, then I think the behavior added in this patch could potentially cause issues for some deployments. If they have relied on DNS requests slipping through OVN's cache, then they will be replied with a rejection now when they were not before. This also creeps the scope of OVN's DNS resolution. The idea of DNS resolution in OVN is to act as a cache of known addresses to speed up lookups of common domains (or local domains that are not in DNS). It's not meant to act as a full-featured DNS cache and certainly not a full-featured DNS server. IMO, the issue here is not in OVN but with DNS configuration on the host. I suspect the host is using default resolv.conf style DNS resolution. When this is done, no caching of DNS queries is done by the host. A query can take quite a long time to eventually result in an NXDOMAIN result. This is because of timeout and retry settings on the system. The issue specifically mentions that with ML2/OVS, they're using dnsmasq. If using a full-featured DNS forwarder like dnsmasq, then subsequent lookups of NXDOMAINs will be answered near-instantly by the forwarder instead of having to perform the lookup again. I think the proper fix is to use dnsmasq on the host (or some other caching DNS forwarder) to ensure that repeated lookups to non-existent domains are answered quickly. Now, if the domain only exists locally and an administrator knows for sure that no DNS server is ever going to give a useful reply to DNS requests, then a configuration option could be added to indicate that OVN is the authority for this domain and so we should be able to reply with this sort of rejection. And finally, my last top-level comment is that I think this same situation could happen with the opposite configuration. If an NBDB DNS record only has IPv6 addresses, and someone sends a DNS A record request, then I think the same problem would occur. If we are adding an option to DNS to say that OVN is the authority for a specific domain, then we need to address both scenarios. > > To improve the DNS processing time this patch will immediately > send a DNS reply with DNS RCODE flag set to 0x5 (server refuses > to perform the specified operation) and no DNS answers. > > Reported-at: https://issues.redhat.com/browse/FD-1211 Please cite the Bugzilla issue here (https://bugzilla.redhat.com/show_bug.cgi?id=1946662) instead of the FD Jira issue. The FD project in Jira is not visible to people outside of Red Hat. Red Hat is migrating from Bugzilla to Jira, and the confusingly named "FDP" project is the one that is visible to the public. The "FD" project just mirrors Bugzilla issues, so the Bugzilla issue is the source of truth for those types of issues. > Signed-off-by: Mohammad Heib <mheib@redhat.com> > --- > controller/pinctrl.c | 28 +++++++++++++++++++++++++--- > tests/ovn.at | 32 ++++++++++++++++++++++++++++++++ > 2 files changed, 57 insertions(+), 3 deletions(-) > > diff --git a/controller/pinctrl.c b/controller/pinctrl.c > index 1884e9f1b..3212900c5 100644 > --- a/controller/pinctrl.c > +++ b/controller/pinctrl.c > @@ -2815,6 +2815,7 @@ pinctrl_handle_dns_lookup( > enum ofputil_protocol proto = ofputil_protocol_from_ofp_version(version); > struct dp_packet *pkt_out_ptr = NULL; > uint32_t success = 0; > + bool send_aaaa_query_rejection = false; > > /* Parse result field. */ > const struct mf_field *f; > @@ -2972,6 +2973,18 @@ pinctrl_handle_dns_lookup( > ancount++; > } > } > + > + /* DNS is configured with a record for this domain with > + * an IPv4 only, so instead of ignoring this AAAA query, > + * we can reply with RCODE = 5 (server refuses) and that > + * will speed up the DNS process by not letting the customer > + * wait for a timeout. > + */ > + if (query_type == DNS_QUERY_TYPE_AAAA && !ancount) { > + ancount = 1; I'm guessing you set ancount to 1 here so that the "if (!ancount)" block below will not trigger. I think a better way to do this would be to leave ancount set to 0 here. Then, modify the later if statement to be: if (!ancount || !send_aaaa_query_rejection) { This way we do not have a misleading ancount variable. > + send_aaaa_query_rejection = true; > + } > + > destroy_lport_addresses(&ip_addrs); > } > > @@ -3009,15 +3022,24 @@ pinctrl_handle_dns_lookup( > out_dns_header->lo_flag |= 0x80; > > /* Set the answer RRs. */ > - out_dns_header->ancount = htons(ancount); > + if (!send_aaaa_query_rejection) { > + out_dns_header->ancount = htons(ancount); > + } else { > + /* set RCODE = 5 (server refuses). */ > + out_dns_header->ancount = 0; > + out_dns_header->hi_flag |= 0x5; > + ofpbuf_uninit(&dns_answer); > + } If you take my advice about not changing ancount to 1, then the above block can be rewritten as: out_dns_header->ancount = htons(ancount); if (send_aaaa_query_rejection) { out_dns_header->hi_flag |= 0x5; } I think that the ofpbuf_uninit() call should be removed from this block of code. It'll make a bit more sense when you see my next comment. Also, I recommend replacing "0x5" with a named constant. > out_dns_header->arcount = 0; > > /* Copy the Query section. */ > dp_packet_put(&pkt_out, dp_packet_data(pkt_in), dp_packet_size(pkt_in)); > > /* Copy the answer sections. */ > - dp_packet_put(&pkt_out, dns_answer.data, dns_answer.size); > - ofpbuf_uninit(&dns_answer); > + if (!send_aaaa_query_rejection) { > + dp_packet_put(&pkt_out, dns_answer.data, dns_answer.size); > + ofpbuf_uninit(&dns_answer); > + } If you leave the unconditional ofpbuf_uninit(&dns_answer) here, then it's more clear what the scope of dns_answer is. There's reduced risk of someone accidentally dereferencing it after it has been freed. The rewritten code would be: if (!send_aaaa_query_rejection) { dp_packet_put(&pkt_out, dns_answer.data, dns_answer.size); } ofpbuf_uninit(&dns_answer); > > out_udp->udp_len = htons(new_l4_size); > out_udp->udp_csum = 0; > diff --git a/tests/ovn.at b/tests/ovn.at > index c6c5f920f..7b90a991d 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -11290,6 +11290,38 @@ reset_pcap_file hv1-vif2 hv1/vif2 > rm -f 1.expected > rm -f 2.expected > > +# send AAAA query for a server known domain that don't have > +# any IPV6 address associated with this domain, and expected > +# server refused DNS reply to save the sender time of waiting for timeout. > +AS_BOX([Test IPv6 (AAAA records) NO timeout.]) > +# Add back the DNS options for ls1-lp1 without ipv6. > +ovn-nbctl --wait=hv remove DNS $DNS1 records vm1.ovn.org > +ovn-nbctl --wait=hv set DNS $DNS1 records:vm1.ovn.org="10.0.0.4" Two things: 1) Please add "check" before these ovn-nbctl calls. 2) You don't need to have "--wait=hv" on both ovn-nbctl calls. You only need to have this on the final ovn-nbctl call. > +ovn-sbctl list DNS > dns4 > +AT_CAPTURE_FILE([dns4]) > +ovn-sbctl dump-flows > sbflows4 > +AT_CAPTURE_FILE([sbflows4]) > + > +set_dns_params vm1_ipv6_only > +src_ip=`ip_to_hex 10 0 0 6` > +dst_ip=`ip_to_hex 10 0 0 1` > +dns_reply=1 > +test_dns 2 f00000000002 f000000000f0 $src_ip $dst_ip $dns_reply $dns_req_data $dns_resp_data > + > +# NXT_RESUMEs should be 5. > +OVS_WAIT_UNTIL([test 13 = `cat ofctl_monitor*.log | grep -c NXT_RESUME`]) > + > +$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/vif2-tx.pcap > 2.packets > +# dns hdr with server refuse RCODE > +echo "01028125" > expout > +#only check for the DNS HDR flags since we are not getting any DNS answer > +AT_CHECK([cat 2.packets | cut -c -92 | cut -c 85-], [0], [expout]) > + > +reset_pcap_file hv1-vif1 hv1/vif1 > +reset_pcap_file hv1-vif2 hv1/vif2 > +rm -f 1.expected > +rm -f 2.expected > + > OVN_CLEANUP([hv1]) > > AT_CLEANUP
diff --git a/controller/pinctrl.c b/controller/pinctrl.c index 1884e9f1b..3212900c5 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -2815,6 +2815,7 @@ pinctrl_handle_dns_lookup( enum ofputil_protocol proto = ofputil_protocol_from_ofp_version(version); struct dp_packet *pkt_out_ptr = NULL; uint32_t success = 0; + bool send_aaaa_query_rejection = false; /* Parse result field. */ const struct mf_field *f; @@ -2972,6 +2973,18 @@ pinctrl_handle_dns_lookup( ancount++; } } + + /* DNS is configured with a record for this domain with + * an IPv4 only, so instead of ignoring this AAAA query, + * we can reply with RCODE = 5 (server refuses) and that + * will speed up the DNS process by not letting the customer + * wait for a timeout. + */ + if (query_type == DNS_QUERY_TYPE_AAAA && !ancount) { + ancount = 1; + send_aaaa_query_rejection = true; + } + destroy_lport_addresses(&ip_addrs); } @@ -3009,15 +3022,24 @@ pinctrl_handle_dns_lookup( out_dns_header->lo_flag |= 0x80; /* Set the answer RRs. */ - out_dns_header->ancount = htons(ancount); + if (!send_aaaa_query_rejection) { + out_dns_header->ancount = htons(ancount); + } else { + /* set RCODE = 5 (server refuses). */ + out_dns_header->ancount = 0; + out_dns_header->hi_flag |= 0x5; + ofpbuf_uninit(&dns_answer); + } out_dns_header->arcount = 0; /* Copy the Query section. */ dp_packet_put(&pkt_out, dp_packet_data(pkt_in), dp_packet_size(pkt_in)); /* Copy the answer sections. */ - dp_packet_put(&pkt_out, dns_answer.data, dns_answer.size); - ofpbuf_uninit(&dns_answer); + if (!send_aaaa_query_rejection) { + dp_packet_put(&pkt_out, dns_answer.data, dns_answer.size); + ofpbuf_uninit(&dns_answer); + } out_udp->udp_len = htons(new_l4_size); out_udp->udp_csum = 0; diff --git a/tests/ovn.at b/tests/ovn.at index c6c5f920f..7b90a991d 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -11290,6 +11290,38 @@ reset_pcap_file hv1-vif2 hv1/vif2 rm -f 1.expected rm -f 2.expected +# send AAAA query for a server known domain that don't have +# any IPV6 address associated with this domain, and expected +# server refused DNS reply to save the sender time of waiting for timeout. +AS_BOX([Test IPv6 (AAAA records) NO timeout.]) +# Add back the DNS options for ls1-lp1 without ipv6. +ovn-nbctl --wait=hv remove DNS $DNS1 records vm1.ovn.org +ovn-nbctl --wait=hv set DNS $DNS1 records:vm1.ovn.org="10.0.0.4" +ovn-sbctl list DNS > dns4 +AT_CAPTURE_FILE([dns4]) +ovn-sbctl dump-flows > sbflows4 +AT_CAPTURE_FILE([sbflows4]) + +set_dns_params vm1_ipv6_only +src_ip=`ip_to_hex 10 0 0 6` +dst_ip=`ip_to_hex 10 0 0 1` +dns_reply=1 +test_dns 2 f00000000002 f000000000f0 $src_ip $dst_ip $dns_reply $dns_req_data $dns_resp_data + +# NXT_RESUMEs should be 5. +OVS_WAIT_UNTIL([test 13 = `cat ofctl_monitor*.log | grep -c NXT_RESUME`]) + +$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/vif2-tx.pcap > 2.packets +# dns hdr with server refuse RCODE +echo "01028125" > expout +#only check for the DNS HDR flags since we are not getting any DNS answer +AT_CHECK([cat 2.packets | cut -c -92 | cut -c 85-], [0], [expout]) + +reset_pcap_file hv1-vif1 hv1/vif1 +reset_pcap_file hv1-vif2 hv1/vif2 +rm -f 1.expected +rm -f 2.expected + OVN_CLEANUP([hv1]) AT_CLEANUP
Currently ovn ignores DNS AAAA queries that has record in the DNS tables but no ipv6 associated with this record, this will incress the DNS processing time for the custmer since they will keep waiting for reply or a timeout. To improve the DNS processing time this patch will immediately send a DNS reply with DNS RCODE flag set to 0x5 (server refuses to perform the specified operation) and no DNS answers. Reported-at: https://issues.redhat.com/browse/FD-1211 Signed-off-by: Mohammad Heib <mheib@redhat.com> --- controller/pinctrl.c | 28 +++++++++++++++++++++++++--- tests/ovn.at | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+), 3 deletions(-)