Message ID | 0175978a-476c-e5f3-1da0-12bb78de7f54@kleine-koenig.org |
---|---|
State | New |
Series | nslookup failures with coarse CLOCK_MONOTONIC |
On Sat, Oct 08, 2022 at 01:04:25AM +0200, Uwe Kleine-König wrote:
> Hello,
>
> on a TP-Link RE200 v1 (platform = ramips/mt7620) I experience often:
>
> root@ares:~# nslookup www.openwrt.org
> Server: 127.0.0.1
> Address: 127.0.0.1:53
>
> Non-authoritative answer:
> www.openwrt.org canonical name = wiki-01.infra.openwrt.org
> Name: wiki-01.infra.openwrt.org
> Address: 2a03:b0c0:3:d0::1af1:1
>
> *** Can't find www.openwrt.org: No answer
>
> I narrowed the problem down to the following:
>
> nslookup creates and sends two queries (for A and AAAA) using
> res_mkquery(). Each query has a more or less random ID and nslookup
> matches the received responses using these IDs to the sent queries.

This was fixed for the libc stub resolver in commit
6c858d6fd4df8b5498ef2cae66c8f3c3eff1587b, which is not present in any
release yet but is in mainline git. However, it looks like you've hit
it with code directly using the res_* API, which would not get the fix.

> Looking at the sent queries using tcpdump, I saw that in the above
> scenario the two IDs are identical. Then nslookup matches the first
> received answer to the first query and discards the second reply, as
> it's matched to the already handled first query, too.
>
> In a few cases where both lookups succeed, I saw the following pairs of IDs:
>
> 17372 37373
> 40961 60961
> 45955 419
> 47302 1766
>
> Musl does the following to create the 16 bit ID:
>
> 	/* Make a reasonably unpredictable id */
> 	clock_gettime(CLOCK_REALTIME, &ts);
> 	id = ts.tv_nsec + ts.tv_nsec/65536UL & 0xffff;
> 	q[0] = id/256;
> 	q[1] = id;
>
> (from musl's src/network/res_mkquery.c) My hypothesis now is that
> the monotonic clock has a resolution of 20 µs only. So if the two
> res_mkquery() calls are called within the same 20 µs tick, the IDs
> end up being identical. If they happen in two consecutive ticks, the
> IDs have a delta of 20000 or 20001 which matches the four cases
> observed above.
>
> To improve the situation I suggest something like:
>
> diff --git a/src/network/res_mkquery.c b/src/network/res_mkquery.c
> index 614bf7864b48..78b3095fe959 100644
> --- a/src/network/res_mkquery.c
> +++ b/src/network/res_mkquery.c
> @@ -11,6 +11,7 @@ int __res_mkquery(int op, const char *dname, int class, int type,
>  	struct timespec ts;
>  	size_t l = strnlen(dname, 255);
>  	int n;
> +	static unsigned int querycnt;
>  
>  	if (l && dname[l-1]=='.') l--;
>  	if (l && dname[l-1]=='.') return -1;
> @@ -34,6 +35,8 @@ int __res_mkquery(int op, const char *dname, int class, int type,
>  
>  	/* Make a reasonably unpredictable id */
>  	clock_gettime(CLOCK_REALTIME, &ts);
> +	/* force a different ID if mkquery was called twice during the same tick */
> +	ts.tv_nsec += querycnt++;
>  	id = ts.tv_nsec + ts.tv_nsec/65536UL & 0xffff;
>  	q[0] = id/256;
>  	q[1] = id;
>
> Would that make sense?
>
> Note I'm not subscribed to the musl mailing list, so please Cc: me
> on replies.

This isn't acceptable as-is because it introduces a data race. That
could be fixed in various ways, but I'm not sure if it's even our
responsibility to fix it. If a caller of res_mkquery is going to send
multiple queries on the same source port, it really needs to be making
sure on its own that they have distinct query IDs. Any 16-bit
random-ish identity is going to have collisions given enough attempts,
even if the clock is not low-resolution. Being time-based probably
makes it slightly less bad than pure random here, but I still suspect
it's a problem.

The res_mkquery function simply doesn't and can't know when you're
going to use the results and what domain their IDs need to be unique
in. My view of the randomness here is that it wasn't put in to avoid
collisions (space is too small) but to help (along with random ports)
make spoofing results less successful.

Which implementation of nslookup is this? Busybox? It would probably
be useful to hear thoughts on it from their side.

Rich
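
The arithmetic behind the observed ID pairs can be checked in isolation. The following is a minimal sketch, not musl code: it applies the expression quoted from res_mkquery.c to a nanosecond value passed in by the caller instead of reading the clock. The names `query_id_from_nsec` and `id_delta` are hypothetical, and the 20000 ns tick is the reporter's hypothesis rather than a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the quoted snippet, with tv_nsec supplied by the
 * caller. '+' binds tighter than '&', so the mask applies to the sum. */
uint16_t query_id_from_nsec(long nsec)
{
	assert(nsec >= 0 && nsec < 1000000000L);
	unsigned long v = (unsigned long)nsec;
	return (uint16_t)((v + v / 65536UL) & 0xffff);
}

/* Distance from the first ID to the second, modulo 2^16. */
uint16_t id_delta(uint16_t first, uint16_t second)
{
	return (uint16_t)(second - first);
}
```

Two calls within one tick see the same `tv_nsec` and so produce the same ID. Two calls one tick apart differ by 20000, plus 1 whenever `tv_nsec/65536` crosses an integer boundary in between, which is consistent with the four pairs listed in the report (one delta of 20001 and three of 20000, modulo 65536).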
Hi,

> [...]
> Which implementation of nslookup is this? Busybox? It would probably
> be useful to hear thoughts on it from their side.

assuming the OP is using standard OpenWrt nslookup, it is the "big" busybox
nslookup implementation, which is using the res_*() api and name lookup logic
borrowed from musl libc instead of the original "small" version fiddling with
the `_res` state directly (and being broken on musl libc due to that).

The proper course of action here is likely adapting the solution in
6c858d6fd4df8b5498ef2cae66c8f3c3eff1587b and porting it to the busybox "big"
nslookup code itself.

I agree that musl libc itself cannot do much more to ensure uniqueness of the
IDs generated by res_mkquery() and that it should be solved in the application
code itself in this case.

Regards,
Jo
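
An application-side fix along these lines could adjust the IDs after the query packets have been built. The sketch below is an illustration only and is not claimed to match the referenced musl commit or any busybox code. The helper name `make_query_ids_distinct` is hypothetical, and it assumes each buffer already holds a DNS query whose first two bytes are the big-endian ID, as in the `q[0] = id/256; q[1] = id;` assignment quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: q[i] points at a built DNS query of at least
 * 2 bytes. Increments the ID of a later query until it differs from
 * the IDs of all earlier queries in the array. */
void make_query_ids_distinct(unsigned char *const q[], size_t n)
{
	assert(q != NULL || n == 0);
	assert(n <= 65536); /* only 2^16 distinct IDs exist */
	for (size_t i = 1; i < n; i++) {
		size_t j = 0;
		while (j < i) {
			if (q[i][0] == q[j][0] && q[i][1] == q[j][1]) {
				/* 16-bit increment with carry into the high byte,
				 * then recheck against all earlier queries */
				if (++q[i][1] == 0) q[i][0]++;
				j = 0;
			} else {
				j++;
			}
		}
	}
}
```

A caller sending an A and an AAAA query on the same socket would run this over both buffers before sending, so that replies can be matched by ID. Incrementing keeps the second ID predictable relative to the first, which may be acceptable given that, as noted in the thread, the ID space is too small to rely on for collision avoidance.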
On Sat, Oct 08, 2022 at 01:53:29AM +0200, Jo-Philipp Wich wrote:
> Hi,
>
> > [...]
> > Which implementation of nslookup is this? Busybox? It would probably
> > be useful to hear thoughts on it from their side.
> assuming the OP is using standard OpenWrt nslookup, it is the "big" busybox
> nslookup implementation, which is using the res_*() api and name lookup logic
> borrowed from musl libc instead of the original "small" version fiddling with
> the `_res` state directly (and being broken on musl libc due to that).
>
> The proper course of action here is likely adapting the solution in
> 6c858d6fd4df8b5498ef2cae66c8f3c3eff1587b and porting it to the busybox "big"
> nslookup code itself.
>
> I agree that musl libc itself cannot do much more to ensure uniqueness of the
> IDs generated by res_mkquery() and that it should be solved in the application
> code itself in this case.

While it won't be as fast (not parallel unless you do threads) it
might be worth just using res_send in the Busybox "big" nslookup. At
present neither busybox nor musl supports TCP fallback for large
records, but the next release of musl will, and the fact that it's
using its own query loop with UDP derived from the musl code means it
won't get that benefit. Alternatively, busybox could copy our parallel
TCP fallback code if they like. :-)

Rich
diff --git a/src/network/res_mkquery.c b/src/network/res_mkquery.c
index 614bf7864b48..78b3095fe959 100644
--- a/src/network/res_mkquery.c
+++ b/src/network/res_mkquery.c
@@ -11,6 +11,7 @@ int __res_mkquery(int op, const char *dname, int class, int type,
 	struct timespec ts;
 	size_t l = strnlen(dname, 255);
 	int n;
+	static unsigned int querycnt;
 
 	if (l && dname[l-1]=='.') l--;