Patchwork resolver does not try other nameservers on SERVFAIL

Submitter aldot
Date March 27, 2014, 7:06 a.m.
Message ID <CAC1BbcThyOkzLvubn3AOu5qyLROpL3LrjNFvfrnaOzEnZ0yDqw@mail.gmail.com>
Permalink /patch/334235/
State Superseded

Comments

aldot - March 27, 2014, 7:06 a.m.
On 17 March 2014 20:23, Michel Stam <michel@reverze.net> wrote:
> Thanks Bernhard.
>
> Cheers
>
> Michel Stam
>
>
>> On 17 Mar 2014, at 20:19, "Bernhard Reutner-Fischer" <rep.dot.nop@gmail.com> wrote:
>>
>>> On 13 March 2014 11:43:05 Michel Stam <michel.uclibc@reverze.net> wrote:
>>>
>>> Dear mailing list,
>>>
>>> I have seen very little response on this topic. Would it be possible to apply this patch to trunk?
>>
>> I'm verifying the patch right now and will apply it afterwards. Will follow up in the push.
>>
>> Thanks!
>>>
>>> I would like to add that without this patch, the behaviour in uClibc differs from glibc.
>>>
>>> Best regards,
>>>
>>> Michel Stam
>>> On 03/03/2014 11:46 AM, Michel Stam wrote:
>>> > Commit e1420eca7374cd8f583e9d774c890645a205aaee fixed a bug where a
>>> > response code should mean the next server is tried. However, it tries
>>> > only the next search domain, and never skips to the next server.
>>> >
>>> > In my specific situation, I was using tmdns as a DNS -> mDNS bridge to resolve mDNS names. tmdns returns SERVFAIL on any domain that does not end in .local.
>>> >
>>> > uClibc then tries all the search domains in /etc/resolv.conf and gives up, not jumping to the next nameserver in the list (in my case the real nameserver). Thus, any non-.local domain never got resolved.
>>> >
>>> > My resolv.conf:
>>> > domain bla.net
>>> > search bla.net
>>> > nameserver 127.0.0.1
>>> > nameserver 172.16.1.1
>>> >
>>> > The patch I have attached basically allows SERVFAIL to go back to the case as it was before 0.9.32, except that search domains are still tried.
>>> >

I'm not entirely happy with that.
How about the attached instead? I have not looked at the size(1) implications yet.

Revert e1420eca7374cd8f583e9d774c890645a205aaee
Rephrase __dns_lookup rcode handling to fix bug 660 and fix ???? as
reported by Michel Stam.
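
The rephrased rcode handling can be read as a small decision function. The following is only a sketch derived from the hunk at the end of this message, not code from uClibc: the names classify_rcode, enum next_step and the RC_* constants are made up for illustration, and the RCODE numbers are the ones from RFC 1035.

```c
#include <assert.h>
#include <stdbool.h>

/* RCODE values from RFC 1035, section 4.1.1. The RC_ prefix is only
 * there to avoid clashing with the macros in <arpa/nameser.h>. */
enum {
	RC_NOERROR  = 0,
	RC_SERVFAIL = 2,
	RC_NXDOMAIN = 3,
	RC_REFUSED  = 5
};

/* Hypothetical names for what the lookup loop does after one response. */
enum next_step {
	STEP_PARSE_ANSWER,       /* rcode == 0: go on and decode the answer */
	STEP_NEXT_SERVER,        /* ask the next nameserver */
	STEP_NEXT_SEARCH_DOMAIN, /* same nameserver, next search suffix */
	STEP_HOST_NOT_FOUND      /* stop with h_errno = HOST_NOT_FOUND */
};

/* variant is -1 while the bare name is being tried and then counts up
 * through the search domains, as in the "variant:%d sdomains:%d" trace. */
static enum next_step classify_rcode(int rcode, bool ends_with_dot,
		int variant, int sdomains)
{
	assert(variant >= -1 && sdomains >= 0);

	if (rcode == RC_NOERROR)
		return STEP_PARSE_ANSWER;
	if (rcode == RC_REFUSED)
		return STEP_NEXT_SERVER;
	/* Any other error: walk the search list unless the caller asked
	 * for a fully qualified name (trailing dot). */
	if (!ends_with_dot && variant < sdomains - 1)
		return STEP_NEXT_SEARCH_DOMAIN;
	return STEP_HOST_NOT_FOUND;
}
```

In this reading, SERVFAIL is handled like NXDOMAIN: it walks the search list on the same server, and only REFUSED moves on to the next nameserver.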

Details:
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 193.170.61.4
nameserver 193.170.61.5
#nameserver 131.130.250.130
nameserver 192.168.100.42
search loc lane wifi wien berlin ac.at at
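
For reference, this file yields three nameservers (the commented-out one is skipped) and seven search domains, which matches the "nameservers = 3" and "adding search" lines in the trace. A minimal sketch of reading such a file follows. It is not the uClibc parser: struct resolv_conf, parse_resolv_conf and the MAX_* limits are assumptions for illustration, and the "domain" and "options" keywords are ignored.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NS     3   /* illustrative limits, not uClibc's */
#define MAX_SEARCH 8
#define MAX_TOKEN  64

struct resolv_conf {
	int n_ns;
	int n_search;
	char ns[MAX_NS][MAX_TOKEN];
	char search[MAX_SEARCH][MAX_TOKEN];
};

/* Parse resolv.conf text held in memory; only "nameserver" and
 * "search" lines are interpreted. */
static void parse_resolv_conf(const char *text, struct resolv_conf *rc)
{
	char line[256];

	assert(text != NULL && rc != NULL);
	memset(rc, 0, sizeof *rc);

	while (*text) {
		/* Copy one line (truncated if overlong) into a scratch buffer. */
		size_t len = strcspn(text, "\n");
		size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
		memcpy(line, text, n);
		line[n] = '\0';
		text += len;
		if (*text == '\n')
			text++;

		char *save = NULL;
		char *kw = strtok_r(line, " \t", &save);
		if (kw == NULL || kw[0] == '#' || kw[0] == ';')
			continue; /* blank line or comment */

		if (strcmp(kw, "nameserver") == 0) {
			char *addr = strtok_r(NULL, " \t", &save);
			if (addr != NULL && rc->n_ns < MAX_NS)
				snprintf(rc->ns[rc->n_ns++], MAX_TOKEN, "%s", addr);
		} else if (strcmp(kw, "search") == 0) {
			char *dom;
			rc->n_search = 0; /* a later search line replaces an earlier one */
			while ((dom = strtok_r(NULL, " \t", &save)) != NULL
					&& rc->n_search < MAX_SEARCH)
				snprintf(rc->search[rc->n_search++], MAX_TOKEN, "%s", dom);
		}
	}
}
```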

$ for i in nxdomain univie nic;do echo "# $i";./ghbn $i;echo "#############################################";done
# nxdomain
argc=2 argv=0x7fffd5941508 envp=0x7fffd5941520
ELF header=0x7f3dd946f000
First Dynamic section entry=0x7f3dd9678ea0
Scanning DYNAMIC section
Done scanning DYNAMIC section
About to do library loader relocations
Done relocating ldso; we can now use globals and make function calls!
_dl_get_ready_to_run:450: Cool, ldso survived making function calls
_dl_malloc:240: mmapping more memory
_dl_ldsopath_init:156: Lib Loader: (0xd946f000) /scratch/src/uClibc.push7/lib/ld64-uClibc.so.0: using path: /scratch/src/uClibc.push7/lib
_dl_load_elf_shared_library:772: Found TLS header for /scratch/src/uClibc.push7/lib/libc.so.0
_dl_load_elf_shared_library:799: Relocated TLS initial image from 0x2ca968 to 0x7f3dd9466968 (size = 0x8)
_dl_get_ready_to_run:1053: Loading: (0x7f3dd919c000) /scratch/src/uClibc.push7/lib/libc.so.0
_dl_get_ready_to_run:1053: Loading: (0x7f3dd946f000) /scratch/src/uClibc.push7/lib/ld64-uClibc.so.0
_dl_get_ready_to_run:1194: Calling init_tls()!
_dl_malloc:240: mmapping more memory
_dl_malloc:240: mmapping more memory
_dl_get_ready_to_run:1296: Beginning relocation fixups
_dl_get_ready_to_run:1326: Calling _dl_allocate_tls_init()!
transfering control to application @ 0x400550
Nothing found in /etc/hosts
Looking up type 1 answer for 'nxdomain'
adding search loc
adding search lane
adding search wifi
adding search wien
adding search berlin
adding search ac.at
adding search at
nameservers = 3
encoding header
lookup name: nxdomain
On try 8, sending query to 193.170.61.4, port 53
Xmit packet len:26 id:2 qr:0
len:26 id:2 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=0,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=0,rcode=5
encoding header
lookup name: nxdomain
On try 7, sending query to 193.170.61.5, port 53
Xmit packet len:26 id:3 qr:0
len:26 id:3 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=0,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=0,rcode=5
encoding header
lookup name: nxdomain
On try 6, sending query to 192.168.100.42, port 53
Xmit packet len:26 id:4 qr:0
len:101 id:4 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=1,rcode=3
variant:-1 sdomains:7
encoding header
lookup name: nxdomain.loc
On try 5, sending query to 192.168.100.42, port 53
Xmit packet len:30 id:5 qr:0
len:81 id:5 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=1,tc=0,rd=1,ra=1,rcode=3
variant:0 sdomains:7
encoding header
lookup name: nxdomain.lane
On try 4, sending query to 192.168.100.42, port 53
Xmit packet len:31 id:6 qr:0
len:85 id:6 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=1,tc=0,rd=1,ra=1,rcode=3
variant:1 sdomains:7
encoding header
lookup name: nxdomain.wifi
On try 3, sending query to 192.168.100.42, port 53
Xmit packet len:31 id:7 qr:0
len:85 id:7 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=1,tc=0,rd=1,ra=1,rcode=3
variant:2 sdomains:7
encoding header
lookup name: nxdomain.wien
On try 2, sending query to 192.168.100.42, port 53
Xmit packet len:31 id:8 qr:0
len:98 id:8 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=1,rcode=3
variant:3 sdomains:7
encoding header
lookup name: nxdomain.berlin
On try 1, sending query to 192.168.100.42, port 53
Xmit packet len:33 id:9 qr:0
len:100 id:9 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=1,rcode=3
variant:4 sdomains:7
encoding header
lookup name: nxdomain.ac.at
On try 0, sending query to 192.168.100.42, port 53
Xmit packet len:32 id:10 qr:0
len:96 id:10 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=1,rcode=3
variant:5 sdomains:7
__dns_lookup returned < 0
ERROR: : Resolver error
#############################################
# univie
argc=2 argv=0x7ffff9da32a8 envp=0x7ffff9da32c0
ELF header=0x7f517874d000
First Dynamic section entry=0x7f5178956ea0
Scanning DYNAMIC section
Done scanning DYNAMIC section
About to do library loader relocations
Done relocating ldso; we can now use globals and make function calls!
_dl_get_ready_to_run:450: Cool, ldso survived making function calls
_dl_malloc:240: mmapping more memory
_dl_ldsopath_init:156: Lib Loader: (0x7874d000) /scratch/src/uClibc.push7/lib/ld64-uClibc.so.0: using path: /scratch/src/uClibc.push7/lib
_dl_load_elf_shared_library:772: Found TLS header for /scratch/src/uClibc.push7/lib/libc.so.0
_dl_load_elf_shared_library:799: Relocated TLS initial image from 0x2ca968 to 0x7f5178744968 (size = 0x8)
_dl_get_ready_to_run:1053: Loading: (0x7f517847a000) /scratch/src/uClibc.push7/lib/libc.so.0
_dl_get_ready_to_run:1053: Loading: (0x7f517874d000) /scratch/src/uClibc.push7/lib/ld64-uClibc.so.0
_dl_get_ready_to_run:1194: Calling init_tls()!
_dl_malloc:240: mmapping more memory
_dl_malloc:240: mmapping more memory
_dl_get_ready_to_run:1296: Beginning relocation fixups
_dl_get_ready_to_run:1326: Calling _dl_allocate_tls_init()!
transfering control to application @ 0x400550
Nothing found in /etc/hosts
Looking up type 1 answer for 'univie'
adding search loc
adding search lane
adding search wifi
adding search wien
adding search berlin
adding search ac.at
adding search at
nameservers = 3
encoding header
lookup name: univie
On try 8, sending query to 193.170.61.4, port 53
Xmit packet len:24 id:2 qr:0
len:24 id:2 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=0,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=0,rcode=5
encoding header
lookup name: univie
On try 7, sending query to 193.170.61.5, port 53
Xmit packet len:24 id:3 qr:0
len:24 id:3 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=0,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=0,rcode=5
encoding header
lookup name: univie
On try 6, sending query to 192.168.100.42, port 53
Xmit packet len:24 id:4 qr:0
len:99 id:4 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=1,rcode=3
variant:-1 sdomains:7
encoding header
lookup name: univie.loc
On try 5, sending query to 192.168.100.42, port 53
Xmit packet len:28 id:5 qr:0
len:79 id:5 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=1,tc=0,rd=1,ra=1,rcode=3
variant:0 sdomains:7
encoding header
lookup name: univie.lane
On try 4, sending query to 192.168.100.42, port 53
Xmit packet len:29 id:6 qr:0
len:83 id:6 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=1,tc=0,rd=1,ra=1,rcode=3
variant:1 sdomains:7
encoding header
lookup name: univie.wifi
On try 3, sending query to 192.168.100.42, port 53
Xmit packet len:29 id:7 qr:0
len:83 id:7 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=1,tc=0,rd=1,ra=1,rcode=3
variant:2 sdomains:7
encoding header
lookup name: univie.wien
On try 2, sending query to 192.168.100.42, port 53
Xmit packet len:29 id:8 qr:0
len:96 id:8 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=1,rcode=3
variant:3 sdomains:7
encoding header
lookup name: univie.berlin
On try 1, sending query to 192.168.100.42, port 53
Xmit packet len:31 id:9 qr:0
len:98 id:9 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=1,rcode=3
variant:4 sdomains:7
encoding header
lookup name: univie.ac.at
On try 0, sending query to 192.168.100.42, port 53
Xmit packet len:30 id:10 qr:0
len:257 id:10 qr:1
Got response (i think)!
qrcount=1,ancount=1,nscount=13,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=1,rcode=0
Skipping question 0 at 12
Length of question 0 is 18
Decoding answer at pos 30
decode_answer(start): off 30, len 257
Total decode len = 2
i=2,rdlength=4
Answer name = |univie.ac.at|
Answer type = |1|
a.add_count:0 a.rdlength:4 a.rdata:0x63f03a
name   : univie.ac.at
fam    : 2
addrlen: 4
addr   : 131.130.70.17
total 1 addresses
aliases: univie
#############################################
# nic
argc=2 argv=0x7fff5356d078 envp=0x7fff5356d090
ELF header=0x7f2c1397f000
First Dynamic section entry=0x7f2c13b88ea0
Scanning DYNAMIC section
Done scanning DYNAMIC section
About to do library loader relocations
Done relocating ldso; we can now use globals and make function calls!
_dl_get_ready_to_run:450: Cool, ldso survived making function calls
_dl_malloc:240: mmapping more memory
_dl_ldsopath_init:156: Lib Loader: (0x1397f000) /scratch/src/uClibc.push7/lib/ld64-uClibc.so.0: using path: /scratch/src/uClibc.push7/lib
_dl_load_elf_shared_library:772: Found TLS header for /scratch/src/uClibc.push7/lib/libc.so.0
_dl_load_elf_shared_library:799: Relocated TLS initial image from 0x2ca968 to 0x7f2c13976968 (size = 0x8)
_dl_get_ready_to_run:1053: Loading: (0x7f2c136ac000) /scratch/src/uClibc.push7/lib/libc.so.0
_dl_get_ready_to_run:1053: Loading: (0x7f2c1397f000) /scratch/src/uClibc.push7/lib/ld64-uClibc.so.0
_dl_get_ready_to_run:1194: Calling init_tls()!
_dl_malloc:240: mmapping more memory
_dl_malloc:240: mmapping more memory
_dl_get_ready_to_run:1296: Beginning relocation fixups
_dl_get_ready_to_run:1326: Calling _dl_allocate_tls_init()!
transfering control to application @ 0x400550
Nothing found in /etc/hosts
Looking up type 1 answer for 'nic'
adding search loc
adding search lane
adding search wifi
adding search wien
adding search berlin
adding search ac.at
adding search at
nameservers = 3
encoding header
lookup name: nic
On try 8, sending query to 193.170.61.4, port 53
Xmit packet len:21 id:2 qr:0
len:21 id:2 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=0,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=0,rcode=5
encoding header
lookup name: nic
On try 7, sending query to 193.170.61.5, port 53
Xmit packet len:21 id:3 qr:0
len:21 id:3 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=0,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=0,rcode=5
encoding header
lookup name: nic
On try 6, sending query to 192.168.100.42, port 53
Xmit packet len:21 id:4 qr:0
len:96 id:4 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=0,tc=0,rd=1,ra=1,rcode=3
variant:-1 sdomains:7
encoding header
lookup name: nic.loc
On try 5, sending query to 192.168.100.42, port 53
Xmit packet len:25 id:5 qr:0
len:76 id:5 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=1,tc=0,rd=1,ra=1,rcode=3
variant:0 sdomains:7
encoding header
lookup name: nic.lane
On try 4, sending query to 192.168.100.42, port 53
Xmit packet len:26 id:6 qr:0
len:80 id:6 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=1,tc=0,rd=1,ra=1,rcode=3
variant:1 sdomains:7
encoding header
lookup name: nic.wifi
On try 3, sending query to 192.168.100.42, port 53
Xmit packet len:26 id:7 qr:0
len:80 id:7 qr:1
Got response (i think)!
qrcount=1,ancount=0,nscount=1,arcount=0
opcode=0,aa=1,tc=0,rd=1,ra=1,rcode=3
variant:2 sdomains:7
encoding header
lookup name: nic.wien
On try 2, sending query to 192.168.100.42, port 53
Xmit packet len:26 id:8 qr:0
len:198 id:8 qr:1
Got response (i think)!
qrcount=1,ancount=1,nscount=3,arcount=5
opcode=0,aa=0,tc=0,rd=1,ra=1,rcode=0
Skipping question 0 at 12
Length of question 0 is 14
Decoding answer at pos 26
decode_answer(start): off 26, len 198
Total decode len = 2
i=2,rdlength=4
Answer name = |nic.wien|
Answer type = |1|
a.add_count:0 a.rdlength:4 a.rdata:0x1d76036
name   : nic.wien
fam    : 2
addrlen: 4
addr   : 188.40.87.61
total 1 addresses
aliases: nic
#############################################

vs.

$ for i in nxdomain univie nic;do echo "# $i";./ghbn_glibc  $i;echo "#############################################";done
# nxdomain
ERROR: : Resolver Error 0 (no error)
#############################################
# univie
name   : univie.ac.at
fam    : 2
addrlen: 4
addr   : 131.130.70.17
total 1 addresses
aliases:
#############################################
# nic
name   : nic.wien
fam    : 2
addrlen: 4
addr   : 188.40.87.61
total 1 addresses
aliases:
#############################################

which leaves us with a diff for 'nxdomain' (uClibc reports "Resolver error", glibc reports "Resolver Error 0 (no error)"), which you might want to
rectify in an incremental update?
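
For anyone reading the uClibc traces: the "opcode=...,rcode=..." lines are the flag fields of the 12-byte DNS header. In the runs above, rcode=5 is REFUSED (the first two servers), rcode=3 is NXDOMAIN and rcode=0 is success. The sketch below decodes those fields from a raw packet following the RFC 1035 header layout. struct dns_flags, parse_dns_flags and rcode_name are illustrative names, not uClibc functions.

```c
#include <assert.h>
#include <stddef.h>

#define DNS_HEADER_LEN 12

/* Flag fields of a DNS header, named as in the trace output. */
struct dns_flags {
	unsigned qr, opcode, aa, tc, rd, ra, rcode;
};

/* Decode bytes 2 and 3 of a DNS message (RFC 1035, section 4.1.1).
 * Returns 0 on success, -1 if the packet is too short. */
static int parse_dns_flags(const unsigned char *pkt, size_t len,
		struct dns_flags *out)
{
	assert(out != NULL);
	if (pkt == NULL || len < DNS_HEADER_LEN)
		return -1;

	out->qr     = (pkt[2] >> 7) & 0x1;
	out->opcode = (pkt[2] >> 3) & 0xf;
	out->aa     = (pkt[2] >> 2) & 0x1;
	out->tc     = (pkt[2] >> 1) & 0x1;
	out->rd     =  pkt[2]       & 0x1;
	out->ra     = (pkt[3] >> 7) & 0x1;
	out->rcode  =  pkt[3]       & 0xf;
	return 0;
}

/* Names for the basic RCODE values defined in RFC 1035. */
static const char *rcode_name(unsigned rcode)
{
	switch (rcode) {
	case 0: return "NOERROR";
	case 1: return "FORMERR";
	case 2: return "SERVFAIL";
	case 3: return "NXDOMAIN";
	case 4: return "NOTIMP";
	case 5: return "REFUSED";
	default: return "other";
	}
}
```

A header whose third and fourth bytes are 0x81 0x83 decodes to qr=1, rd=1, ra=1, rcode=3, which is the shape of the NXDOMAIN responses from 192.168.100.42 in the trace.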

TIA && cheers,

Patch

diff --git a/libc/inet/resolv.c b/libc/inet/resolv.c
index 154734d..06c59ee 100644
--- a/libc/inet/resolv.c
+++ b/libc/inet/resolv.c
@@ -453,9 +453,9 @@  extern int __read_etc_hosts_r(parser_t *parser,
 		struct hostent **result,
 		int *h_errnop) attribute_hidden;
 extern int __dns_lookup(const char *name,
-		int type,
+		const int type,
 		unsigned char **outpacket,
-		struct resolv_answer *a) attribute_hidden;
+		struct resolv_answer *a) attribute_hidden __nonnull((1));
 extern int __encode_dotted(const char *dotted,
 		unsigned char *dest,
 		int maxlen) attribute_hidden;
@@ -1230,7 +1230,7 @@  static int __decode_answer(const unsigned char *message, /* packet */
  *      This is a malloced string. May be NULL because strdup failed.
  */
 int __dns_lookup(const char *name,
-		int type,
+		const int type,
 		unsigned char **outpacket,
 		struct resolv_answer *a)
 {
@@ -1240,7 +1240,7 @@  int __dns_lookup(const char *name,
 
 	int i, j, fd, rc;
 	int packet_len;
-	int name_len;
+	const size_t name_len = strlen(name);
 #ifdef USE_SELECT
 	struct timeval tv;
 	fd_set fds;
@@ -1258,18 +1258,16 @@  int __dns_lookup(const char *name,
 	int local_ns_num = -1; /* Nth server to use */
 	int local_id = local_id; /* for compiler */
 	int sdomains;
-	bool ends_with_dot;
+	const bool ends_with_dot = name[name_len - 1] == '.';
 	sockaddr46_t sa;
 
 	fd = -1;
 	lookup = NULL;
-	name_len = strlen(name);
 	if ((unsigned)name_len >= MAXDNAME - MAXLEN_searchdomain - 2)
 		goto fail; /* paranoia */
 	lookup = malloc(name_len + 1/*for '.'*/ + MAXLEN_searchdomain + 1);
 	if (!packet || !lookup || !name[0])
 		goto fail;
-	ends_with_dot = (name[name_len - 1] == '.');
 	/* no strcpy! paranoia, user might change name[] under us */
 	memcpy(lookup, name, name_len);
 
@@ -1456,32 +1454,38 @@  int __dns_lookup(const char *name,
 				h.qdcount, h.ancount, h.nscount, h.arcount);
 		DPRINTF("opcode=%d,aa=%d,tc=%d,rd=%d,ra=%d,rcode=%d\n",
 				h.opcode, h.aa, h.tc, h.rd, h.ra, h.rcode);
+		if (unlikely(h.rcode != 0)) {
+			/* bug 660 says we treat negative response as an error
+			 * and retry, which is, eh, an error. :) */
+
+			/* Try to keep latency and traffic low here, please! */
+			if (h.rcode == REFUSED) {
+				/* type not supported,
+				   try the next server, with fresh variants */
+				goto try_next_server;
+			}
+			/* Insert other non-fatal errors here, which do not warrant
+			 * switching to next nameserver */
 
-		/* bug 660 says we treat negative response as an error
-		 * and retry, which is, eh, an error. :)
-		 * We were incurring long delays because of this. */
-		if (h.rcode == NXDOMAIN || h.rcode == SERVFAIL) {
-			/* if possible, try next search domain */
 			if (!ends_with_dot) {
+				/* When not being asked for a specific delegation we can try
+				 * other variants of the given name */
 				DPRINTF("variant:%d sdomains:%d\n", variant, sdomains);
 				if (variant < sdomains - 1) {
-					/* next search domain */
+					/* if possible, try next search domain */
 					variant++;
 					continue;
 				}
-				/* no more search domains to try */
+			} else if (h.rcode == NXDOMAIN) {
+				/* Specific */
+				/* Fall through to
+				   Finally; dont loop, this is "no such host" situation
+				 */
 			}
-			/* dont loop, this is "no such host" situation */
+			/* Finally; dont loop, this is "no such host" situation */
 			h_errno = HOST_NOT_FOUND;
 			goto fail1;
 		}
-		/* Insert other non-fatal errors here, which do not warrant
-		 * switching to next nameserver */
-
-		/* Strange error, assuming this nameserver is feeling bad */
-		if (h.rcode != 0)
-			goto try_next_server;
-
 		/* Code below won't work correctly with h.ancount == 0, so... */
 		if (h.ancount <= 0) {
 			h_errno = NO_DATA; /* [is this correct code to check for?] */