[LEDE-DEV,v1] dnsmasq: use SIGUSR2 for dnssec time valid

Message ID 20180102152901.27154-1-ldir@darbyshire-bryant.me.uk
State Superseded
Headers show
Series
  • [LEDE-DEV,v1] dnsmasq: use SIGUSR2 for dnssec time valid
Related show

Commit Message

Kevin 'ldir' Darbyshire-Bryant Jan. 2, 2018, 3:29 p.m.
Move 'check dnssec timestamp enable' from SIGHUP handler to SIGUSR2.

Dnsmasq uses SIGHUP to do too many things: 1) set dnssec time validation
enabled, 2) bump SOA zone serial, 3) clear dns cache, 4) reload hosts
files, 5) reload resolvers/servers files.  SIGUSR2 is used to
re-open/re-start the logfile.  Default LEDE does not use logfile
functionality.

Many subsystems within LEDE can send SIGHUP to dnsmasq: 1) ntpd hotplug
(to indicate time is valid for dnssec) 2) odhcpd (to indicate a
new/removed host - typically DHCPv6 leases) 3) procd on interface state
changes 4) procd on system config state changes, 5) service reload.

If dnssec time validation is enabled before the system clock has been
set to a sensible time, name resolution will fail.  Because name
resolution fails, ntpd is unable to resolve time server names to
addresses, so is unable to set time.  Classic chicken/egg.

Since commits 23bba9cb330cd298739a16e350b0029ed9429eef (service reload) &
4f02285d8b4a66359a8fa46f22a3efde391b5419 (system config)  make it more
likely a SIGHUP will be sent for events other than 'ntpd has set time'
it is more likely that an errant 'name resolution is failing for
everything' situation will be encountered.

Ideally dnsmasq would have some other IPC mechanism for indicating 'time
is valid, go check dnssec timestamps', but until that time
(implementation is left as an exercise for the interested/competent
reader/bikeshedder) the next best thing is to move functionality from
the overloaded SIGHUP signal to the under-utilised SIGUSR2.

ntpd hotplug script updated to use SIGUSR2.

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
 package/network/services/dnsmasq/Makefile          |  2 +-
 .../services/dnsmasq/files/dnsmasqsec.hotplug      |  2 +-
 .../dnsmasq/patches/250-dnssec-SIGUSR2.patch       | 32 ++++++++++++++++++++++
 3 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 package/network/services/dnsmasq/patches/250-dnssec-SIGUSR2.patch

Comments

e9hack Jan. 5, 2018, 8:22 a.m. | #1
Am 02.01.2018 um 16:29 schrieb Kevin Darbyshire-Bryant:
> Move 'check dnssec timestamp enable' from SIGHUP handler to SIGUSR2.

Hi,

your patch fixes the DNS problem for me.

Now I get another ugly behaviour which is more related to ntpd from busybox. Ntpd answers to ntp request before it did
update the time of the router. I've connected a voip phone to the router, which uses the router as time server. In the
past, I did never see a wrong date/time on the phone. Now after a reboot/update of the router, the phone uses a wrong
time for around 6 hours. I add a few firewall logging rules to monitor ntp and dns traffic. After the reboot of the
router, the phone sends first dns request and a few seconds later the first ntp request. This first ntp request is send
a few seconds before ntpd sends ntp requests over the wan interface and does update the time of the router.

Regards,
Hartmut
Kevin Darbyshire-Bryant Jan. 5, 2018, 9:30 a.m. | #2
> On 5 Jan 2018, at 08:22, e9hack <e9hack@gmail.com> wrote:

> 

> Am 02.01.2018 um 16:29 schrieb Kevin Darbyshire-Bryant:

>> Move 'check dnssec timestamp enable' from SIGHUP handler to SIGUSR2.

> 

> Hi,

> 

> your patch fixes the DNS problem for me.

Good, as I suspected it would.
> 

> Now I get another ugly behaviour which is more related to ntpd from busybox. Ntpd answers to ntp request before it did

> update the time of the router. I've connected a voip phone to the router, which uses the router as time server. In the

> past, I did never see a wrong date/time on the phone. Now after a reboot/update of the router, the phone uses a wrong

> time for around 6 hours. I add a few firewall logging rules to monitor ntp and dns traffic. After the reboot of the

> router, the phone sends first dns request and a few seconds later the first ntp request. This first ntp request is send

> a few seconds before ntpd sends ntp requests over the wan interface and does update the time of the router.


I don’t have a magic patch for this problem.  It seems to me that ideally busybox ntpd shouldn’t serve time until it has sync’d.  A horrible hack idea: firewall rule to drop incoming ntp requests from clients… have an ntpd hotplug script that captures the stratum change event and removes the firewall rules.  And/or ntpd has a new command line switch implemented that does the same thing ie. ignore ntp requests until sync’d.

Failing that, maybe you really can’t have DNS and time at the same time ;-)   Now where did I put that chicken…...


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
Karl Palsson Jan. 5, 2018, 11:10 a.m. | #3
Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> I don’t have a magic patch for this problem. It seems to me
> that ideally busybox ntpd shouldn’t serve time until it has
> sync’d. A horrible hack idea: firewall rule to drop incoming
> ntp requests from clients… have an ntpd hotplug script that
> captures the stratum change event and removes the firewall
> rules. And/or ntpd has a new command line switch implemented
> that does the same thing ie. ignore ntp requests until sync’d.
> 
> Failing that, maybe you really can’t have DNS and time at the
> same time ;-) Now where did I put that chicken…...

The freshly released busybox 1.28 has a number of DNS related
fixes for NTP btw.

      ntpd: do run the script at least once in 11 minutes
      ntpd: improve treatment of DNS resolution failures
      ntpd: mention in help text that -d can be repeated
      ntpd: perform DNS resolution out of send/receive loop. Closes 10466
      ntpd: skip over setting next DNS resolution attempt if it is not needed


Cheers,
Karl P
Kevin Darbyshire-Bryant Jan. 9, 2018, 11:36 a.m. | #4
> On 5 Jan 2018, at 11:10, Karl Palsson <karlp@tweak.net.au> wrote:
> 
> 
> The freshly released busybox 1.28 has a number of DNS related
> fixes for NTP btw.
> 
>      ntpd: do run the script at least once in 11 minutes
>      ntpd: improve treatment of DNS resolution failures
>      ntpd: mention in help text that -d can be repeated
>      ntpd: perform DNS resolution out of send/receive loop. Closes 10466
>      ntpd: skip over setting next DNS resolution attempt if it is not needed
> 
> 
> Cheers,
> Karl P

Hi Karl,

Thanks for the pointers and thoughts, unfortunately none of the fixes can help.  It’s a sort of race problem.  For background (and I apologise if you know any of this already)

dnssec offers a means of validating dns data is correct within a zone.  Part of that validation includes a timestamp ‘this zone data is to be trusted until such a time’.  This implies that the requestor of the data has a good (enough) idea of the current time in order to work out if it should trust the response or not.

dnsmasq can validate whether a non dnssec zone should be a non dnssec zone (in other words, is someone faking to us that a dnssec zone is actually a non dnssec zone and thus can fake further responses within that zone) - it does so by getting a dnssec signed message that ‘the non dnssec zone is really a non dnssec zone’ with a timestamp (and hence needs time to be valid to verify)

dnsmasq can run with dnssec enabled (—dnssec) where it just checks dnssec signed zones (and hence requires valid times). It can also run with unsigned zone checking (—dnssec-check-unsigned) which would also required valid time set.

From the dnsmasq man page:

"DNSSEC signatures are only valid for specified time windows, and should be rejected outside those windows. This generates an interesting chicken-and-egg problem for machines which don't have a hardware real time clock. For these machines to determine the correct time typically requires use of NTP and therefore DNS, but validating DNS requires that the correct time is already known. Setting this flag removes the time-window checks (but not other DNSSEC validation.) only until the dnsmasq process receives SIGHUP. The intention is that dnsmasq should be started with this flag when the platform determines that reliable time is not currently available. As soon as reliable time is established, a SIGHUP should be sent to dnsmasq, which enables time checking, and purges the cache of DNS records which have not been throughly checked."

ntpd needs name resolution to work in order to turn ‘a.ntpservername.org’ into an address and query it for time.  name resolution needs the time to be set correctly in order to validate the returned address info - the ferocity/severity of checking non dnssec zones being determined by ‘—dnssec-check-unsigned’)

As described in the manpage, dnsmasq offers a way out of the chicken/egg problem by relaxing the time validation until dnsmasq is signalled that it should go into full paranoia mode (—dnssec-no-timecheck), the signal being sent a SIGHUP.

This makes SIGHUP a rather important signal, but unfortunately, dnsmasq uses SIGHUP to indicate other things as well as ‘time is valid for dnssec’.  These ‘other things’ are important to other LEDE processes (typically procd & odhcpd) and are being sent more frequently.  The problem with that is that the ‘other things’ are erroneously telling dnsmasq that time is valid, *before* ntpd has actually had a chance to set the correct system time.

This breaks name resolution for dnssec zone (—dnssec) and potentially completely (—dnssec-check-unsigned).  If you were checking only dnssec zones (ie no —dnssec-check-unsigned) AND your ntp server names resided in a non-dnssec zone, then eventually access to the dnssec zone would come good.

So, I’ve currently a patch in submission with dnsmasq to change which signal is used to indicate that time checking should be enabled.  It’s similar to the v1 patch here, but better ;-)  I intend to backport the upstream patch once it has settled.


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A

Patch

diff --git a/package/network/services/dnsmasq/Makefile b/package/network/services/dnsmasq/Makefile
index c6d2739f03..1224ad86f8 100644
--- a/package/network/services/dnsmasq/Makefile
+++ b/package/network/services/dnsmasq/Makefile
@@ -9,7 +9,7 @@  include $(TOPDIR)/rules.mk
 
 PKG_NAME:=dnsmasq
 PKG_VERSION:=2.78
-PKG_RELEASE:=7
+PKG_RELEASE:=8
 
 PKG_SOURCE:=$(PKG_NAME)-$(PKG_VERSION).tar.xz
 PKG_SOURCE_URL:=http://thekelleys.org.uk/dnsmasq/
diff --git a/package/network/services/dnsmasq/files/dnsmasqsec.hotplug b/package/network/services/dnsmasq/files/dnsmasqsec.hotplug
index a155eb0f6e..f9fb4b533d 100644
--- a/package/network/services/dnsmasq/files/dnsmasqsec.hotplug
+++ b/package/network/services/dnsmasq/files/dnsmasqsec.hotplug
@@ -9,6 +9,6 @@  TIMEVALIDFILE="/var/state/dnsmasqsec"
 [ -f "$TIMEVALIDFILE" ] || {
 	echo "ntpd says time is valid" >$TIMEVALIDFILE
 	/etc/init.d/dnsmasq enabled && {
-		procd_send_signal dnsmasq
+		procd_send_signal dnsmasq '*' USR2
 	}
 }
diff --git a/package/network/services/dnsmasq/patches/250-dnssec-SIGUSR2.patch b/package/network/services/dnsmasq/patches/250-dnssec-SIGUSR2.patch
new file mode 100644
index 0000000000..1c7ffa5123
--- /dev/null
+++ b/package/network/services/dnsmasq/patches/250-dnssec-SIGUSR2.patch
@@ -0,0 +1,32 @@ 
+--- a/src/dnsmasq.c
++++ b/src/dnsmasq.c
+@@ -1296,13 +1296,6 @@ static void async_event(int pipe, time_t
+       case EVENT_RELOAD:
+ 	daemon->soa_sn++; /* Bump zone serial, as it may have changed. */
+ 
+-#ifdef HAVE_DNSSEC
+-	if (daemon->dnssec_no_time_check && option_bool(OPT_DNSSEC_VALID) && option_bool(OPT_DNSSEC_TIME))
+-	  {
+-	    my_syslog(LOG_INFO, _("now checking DNSSEC signature timestamps"));
+-	    daemon->dnssec_no_time_check = 0;
+-	  } 
+-#endif
+ 	/* fall through */
+ 	
+       case EVENT_INIT:
+@@ -1399,6 +1392,15 @@ static void async_event(int pipe, time_t
+ 	   we leave them logging to the old file. */
+ 	if (daemon->log_file != NULL)
+ 	  log_reopen(daemon->log_file);
++
++#ifdef HAVE_DNSSEC
++	if (daemon->dnssec_no_time_check && option_bool(OPT_DNSSEC_VALID) && option_bool(OPT_DNSSEC_TIME))
++	  {
++	    my_syslog(LOG_INFO, _("now checking DNSSEC signature timestamps"));
++	    daemon->dnssec_no_time_check = 0;
++	    clear_cache_and_reload(now);
++	  }
++#endif
+ 	break;
+ 
+       case EVENT_NEWADDR: