Message ID | 20190208175245.2314-1-halves@canonical.com |
---|---|
State | New |
Series | [RFC] getaddrinfo: Force name resolution for AI_CANONNAME [BZ# 24182] |

* Heitor R. Alves de Siqueira:

> This patch forces name resolution if the AI_CANONNAME flag is set. Even
> if inet_aton_exact() identifies the input name as being a valid IPv4
> address, we will try name resolution in case it's a valid hostname. If
> no hostname is found after resolution, the input name is still copied
> to the ai_canonname field.

This is not correct because it sends queries for names such as 192.0.2.1 to the DNS root servers.

I'm sorry, but I still don't see how the general idea is useful. Which applications benefit if getaddrinfo returns a name in ai_canonname that will most likely resolve to a completely different set of addresses?

Do you have a bug report that requests a behavior change in this area? Which problem is this trying to address?

Thanks,
Florian
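
For readers who want to observe the behavior under discussion, the following is a minimal sketch rather than anything taken from the patch. It assumes a getaddrinfo implementation that handles IPv4 literals without consulting the network, and the helper name `canonname_for` is hypothetical. It asks for AI_CANONNAME on a numeric literal and reports what comes back in ai_canonname.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: look up NAME with AI_CANONNAME set and copy the
   reported canonical name into BUF.  Returns 0 on success, otherwise
   the getaddrinfo error code, or EAI_FAIL if no canonical name was
   returned or it does not fit into BUF.  */
static int
canonname_for (const char *name, char *buf, size_t buflen)
{
  struct addrinfo hints;
  struct addrinfo *res = NULL;

  assert (name != NULL && buf != NULL && buflen > 0);

  memset (&hints, 0, sizeof hints);
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_CANONNAME;

  int ret = getaddrinfo (name, NULL, &hints, &res);
  if (ret != 0)
    return ret;

  /* ai_canonname is only guaranteed on the first result.  */
  if (res->ai_canonname == NULL || strlen (res->ai_canonname) >= buflen)
    ret = EAI_FAIL;
  else
    strcpy (buf, res->ai_canonname);

  freeaddrinfo (res);
  return ret;
}
```

Calling `canonname_for ("192.0.2.1", buf, sizeof buf)` on an unpatched system is expected to fill `buf` with the literal itself, which is the "input name copied to ai_canonname" behavior described in the quoted patch summary.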

* Florian Weimer:

> This is not correct because it sends queries for names such as
> 192.0.2.1 to the DNS root servers.

Right, that can be avoided by setting the AI_NUMERICHOST flag, but it is a problem if we just set the AI_CANONNAME flag (and then getaddrinfo() wouldn't know if we meant the nodename to be an IPv4 address).

An alternative solution is to check if the nodename contains any '.' characters (e.g. using strchr) after it was identified by inet_aton_exact(). If it contains none, we could set ai_family to AF_UNSPEC and try name resolution, since the nodename could be either an IPv4 address in 32-bit format or a numeric hostname. Do you think that would be a better approach, Florian?

> I'm sorry, but I still don't see how the general idea is useful. Which
> applications benefit if getaddrinfo returns a name in ai_canonname that
> will most likely resolve to a completely different set of addresses?

Most applications that go through glibc for network connections would benefit from this. As an example, with the current behaviour we can't ssh to a machine called '12345' in our LAN even if we explicitly add its IP address to /etc/hosts:

    $ head -n2 /etc/hosts
    127.0.0.1 localhost
    10.188.133.187 12345.lan 12345

    $ ssh 12345
    ssh: connect to host 12345 port 22: Invalid argument

If we change getaddrinfo() to handle digits-only hostnames, then we can correctly reach the host:

    $ ssh 12345
    user@12345: Permission denied (publickey).

We could connect to these hosts using the FQDN according to the hosts file or DNS, but I think it's reasonable to expect the numeric host to resolve if it's set in the /etc/hosts file or equivalent. Changing getaddrinfo() so that it resolves numeric hostnames helps with this scenario not only for ssh, but also for other glibc-dependent programs.

Thanks!
Heitor
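
The strchr-based check proposed in this message can be sketched as follows. This is an illustration under stated assumptions, not the proposed glibc change: the public inet_aton() stands in for the internal inet_aton_exact() named in the thread, and both helper names are hypothetical. The second helper shows why the ssh example fails today: the name "12345" is taken as the 32-bit address 0.0.48.57.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: true if NAME parses as an IPv4 address but
   contains no '.', i.e. it is a bare number such as "12345".
   Assumption: inet_aton stands in for glibc's inet_aton_exact.  */
static bool
is_dotless_numeric (const char *name)
{
  struct in_addr addr;

  assert (name != NULL);
  return inet_aton (name, &addr) != 0 && strchr (name, '.') == NULL;
}

/* Hypothetical helper: write the dotted-quad form of the address that
   inet_aton derives from NAME into BUF.  Returns false if NAME is not
   accepted or BUF is too small.  */
static bool
numeric_to_dotted (const char *name, char *buf, size_t buflen)
{
  struct in_addr addr;

  assert (name != NULL && buf != NULL);
  if (inet_aton (name, &addr) == 0)
    return false;
  return inet_ntop (AF_INET, &addr, buf, buflen) != NULL;
}
```

With these helpers, `is_dotless_numeric ("12345")` is true while `is_dotless_numeric ("192.0.2.1")` is false, so only the first kind of name would be handed to name resolution under the proposal.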

* Heitor Alves de Siqueira:

> * Florian Weimer:
>
>> This is not correct because it sends queries for names such as 192.0.2.1
>> to the DNS root servers.
>
> Right, that can be avoided by setting the AI_NUMERICHOST flag, but it is a
> problem if we just set the AI_CANONNAME flag (and then getaddrinfo() wouldn't
> know if we meant the nodename to be an IPv4 address).

Just to be clear here, we need to avoid sending those queries in all cases, whether AI_NUMERICHOST is set or not.

> An alternative solution is to check if the nodename contains any '.' characters
> (e.g. using strchr) after it was identified by inet_aton_exact(). If it
> contains none, we could set ai_family to AF_UNSPEC and try name resolution,
> since the nodename could be either an IPv4 address in 32-bit format or a
> numeric hostname.
> Do you think that would be a better approach, Florian?

It is at least theoretically possible to attempt a host name lookup for a name that is a non-negative integer, and use the integer as an IPv4 address only as a fallback if name resolution through NSS does not deliver any results. This would still benefit from changes to the stub resolver that essentially make sure that these queries do not reach the root servers (related to bug 19634).

The question is whether it's worth this complexity, and the resulting lack of consistency with what other systems do (and older versions of glibc which have not backported this change).

>> I'm sorry, but I still don't see how the general idea is useful. Which
>> applications benefit if getaddrinfo returns a name in ai_canonname that
>> will most likely resolve to a completely different set of addresses?
>
> Most applications that go through glibc for network connections would benefit
> from this.
> As an example, with the current behaviour we can't ssh to a machine
> called '12345' in our LAN even if we explicitly add its IP address to
> /etc/hosts:
>
> $ head -n2 /etc/hosts
> 127.0.0.1 localhost
> 10.188.133.187 12345.lan 12345
>
> $ ssh 12345
> ssh: connect to host 12345 port 22: Invalid argument
>
> If we change getaddrinfo() to handle digits-only hostnames, then we can
> correctly reach the host:
>
> $ ssh 12345
> user@12345: Permission denied (publickey).

Is there *any* system that currently behaves this way?

I checked Windows 10 (native, not WSL, obviously), and it very closely matches the glibc behavior: octal parsing, and the hosts file does not override parsing as numeric domain names. This is not surprising, given the shared ancestry in the BIND stub resolver code.

Thanks,
Florian
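
The lookup order described above as theoretically possible (name lookup first for a bare integer, integer-as-address only as a fallback) might look roughly like the sketch below. Everything here is an assumption made for illustration: the callback type `name_lookup_fn` and the `fake_hosts_lookup` function are hypothetical stand-ins for NSS, `resolve_name_first` is a made-up name, and inet_aton() again substitutes for the internal parser. Dotted literals bypass the lookup entirely, so a name such as 192.0.2.1 never generates a query.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a name service lookup: returns true and
   fills *OUT if NAME is known.  */
typedef bool (*name_lookup_fn) (const char *name, struct in_addr *out);

/* Fake lookup that knows one hosts-file style entry, mirroring the
   "10.188.133.187 12345.lan 12345" line quoted in the thread.  */
static bool
fake_hosts_lookup (const char *name, struct in_addr *out)
{
  if (strcmp (name, "12345") != 0 && strcmp (name, "12345.lan") != 0)
    return false;
  return inet_pton (AF_INET, "10.188.133.187", out) == 1;
}

/* Resolve NAME, preferring the name lookup over the integer
   interpretation for dotless numeric input.  */
static bool
resolve_name_first (const char *name, name_lookup_fn lookup,
                    struct in_addr *out)
{
  struct in_addr literal;

  assert (name != NULL && lookup != NULL && out != NULL);

  bool numeric = inet_aton (name, &literal) != 0;

  /* Dotted literals are used directly and never looked up.  */
  if (numeric && strchr (name, '.') != NULL)
    {
      *out = literal;
      return true;
    }

  if (lookup (name, out))
    return true;

  /* Fallback: treat a bare integer as a 32-bit IPv4 address.  */
  if (numeric)
    {
      *out = literal;
      return true;
    }
  return false;
}
```

With the fake table, "12345" resolves to 10.188.133.187, while an unlisted integer such as "54321" falls back to its numeric reading. The sketch does not address the concern about queries reaching the root servers, which would need the stub resolver changes mentioned in the message.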

diff --git a/sysdeps/posix/getaddrinfo.c b/sysdeps/posix/getaddrinfo.c
index aa054b620f2a..fa9e2d6ad3b1 100644
--- a/sysdeps/posix/getaddrinfo.c
+++ b/sysdeps/posix/getaddrinfo.c
@@ -505,9 +505,6 @@ gaih_inet (const char *name, const struct gaih_service *service,
               result = -EAI_ADDRFAMILY;
               goto free_and_return;
             }
-
-          if (req->ai_flags & AI_CANONNAME)
-            canon = name;
         }
       else if (at->family == AF_UNSPEC)
         {
@@ -548,7 +545,8 @@ gaih_inet (const char *name, const struct gaih_service *service,
             }
         }
 
-      if (at->family == AF_UNSPEC && (req->ai_flags & AI_NUMERICHOST) == 0)
+      if ((at->family == AF_UNSPEC || (req->ai_flags & AI_CANONNAME))
+          && (req->ai_flags & AI_NUMERICHOST) == 0)
         {
           struct gaih_addrtuple **pat = &at;
           int no_data = 0;