diff mbox series

[OpenWrt-Devel,RFC] base-files: send informational UDP message each second waiting

Message ID 20191210142420.30748-1-fercerpav@gmail.com
State Accepted
Delegated to: Petr Štetiar
Headers show
Series [OpenWrt-Devel,RFC] base-files: send informational UDP message each second waiting | expand

Commit Message

Paul Fertser Dec. 10, 2019, 2:24 p.m. UTC
The preinit network initialisation and failsafe informational message
are inherently racy as the interface takes some time to become
functional after "ip link set $pi_ifname up" command.

Consider this timing:

[   12.002713] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[   12.008819] IPv6: ADDRCONF(NETDEV_UP): eth1.1: link is not ready
[   12.118877] random: procd: uninitialized urandom read (4 bytes read)
[   13.068614] eth1: link up (1000Mbps/Full duplex)
[   13.073309] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[   13.080445] IPv6: ADDRCONF(NETDEV_CHANGE): eth1.1: link becomes ready

Since the UDP message was sent prior to link becoming ready, it was
never seen on the wire.

The default failsafe timeout is set to 2 seconds, so with this patch
there are two attempts to send the message, one spent in vain, and the
other visible in tcpdump on an attached host. Of course, in cases when
the interface is brought up faster it leads to two messages, however it
should be harmless. This patch (almost) doesn't affect normal boot time
while still allowing to enter failsafe reliably with a single button
press, matching the official "generic failsafe" documentation.

Signed-off-by: Paul Fertser <fercerpav@gmail.com>
---
 package/base-files/files/lib/preinit/30_failsafe_wait | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Petr Štetiar Dec. 10, 2019, 2:42 p.m. UTC | #1
Paul Fertser <fercerpav@gmail.com> [2019-12-10 17:24:20]:

Hi,

> in cases when the interface is brought up faster it leads to two messages

in cases when the interface is brought up slower it leads to no message.

To me it just seems like a workaround to fix your use case, not a proper fix.

-- ynezz
Paul Fertser Dec. 11, 2019, 11:03 a.m. UTC | #2
Hi,

On Tue, Dec 10, 2019 at 03:42:13PM +0100, Petr Štetiar wrote:
> Paul Fertser <fercerpav@gmail.com> [2019-12-10 17:24:20]:
> > in cases when the interface is brought up faster it leads to two messages
> 
> in cases when the interface is brought up slower it leads to no message.
> 
> To me it just seems like a workaround to fix your use case, not a proper fix.

You're right, I mentioned "inherently racy" in the commit message
exactly because of that.

Waiting for LOWER_UP there without a timeout is not a solution because
in the normal bootup case there might be nothing attached to the LAN
so the boot will be effectively halted forever. Waiting with a timeout
poses a question of what that timeout should be set to; and if that's
reasonable to extend current 2 seconds with any significant amount.

Current documentation says a message should be sent. Current code
works for some devices and fails for other devices. My patch improves
the situation without adding any code complexity (indeed, it's even
removing one line) or wasting boot time.

Do you have any other possible solution in mind?
Petr Štetiar Dec. 11, 2019, 11:47 a.m. UTC | #3
Paul Fertser <fercerpav@gmail.com> [2019-12-11 14:03:53]:

> Waiting with a timeout poses a question of what that timeout should be set
> to; and if that's reasonable to extend current 2 seconds with any
> significant amount.

As you've witnessed this default value doesn't work well in some cases.

> Current documentation says a message should be sent.

So now, this would need to be rephrased from current 0 or 1, to 0 to N, where
N depends on config settings (timeout value).

> Do you have any other possible solution in mind?

(nope, just thinking aloud)

If the fixed value doesn't work for all use cases, then its not fixed variable
anymore and should be variable, thus config option, probably finetuned per
target/device.

Question is, if it's worth the hassle for a feature which is targeted more
towards the expert users.

-- ynezz
Paul Fertser Dec. 11, 2019, 11:56 a.m. UTC | #4
On Wed, Dec 11, 2019 at 12:47:22PM +0100, Petr Štetiar wrote:
> Paul Fertser <fercerpav@gmail.com> [2019-12-11 14:03:53]:
> > Waiting with a timeout poses a question of what that timeout should be set
> > to; and if that's reasonable to extend current 2 seconds with any
> > significant amount.
> 
> As you've witnessed this default value doesn't work well in some cases.

The problem I faced was because the message is sent only once _before_
timeout starts ticking, so no matter what it's set to, the message
wouldn't be there.

> > Current documentation says a message should be sent.
> 
> So now, this would need to be rephrased from current 0 or 1, to 0 to N, where
> N depends on config settings (timeout value).

Documentation [0] says "Wait (with a packet sniffer) for a special
broadcast packet and press a button". So I think with this patch
documentation changes are not strictly necessary, it still works the
same way, you wait for the packet and press a button.

> > Do you have any other possible solution in mind?
> 
> (nope, just thinking aloud)
> 
> If the fixed value doesn't work for all use cases, then its not fixed variable
> anymore and should be variable, thus config option, probably finetuned per
> target/device.

Currently all devices wait for 2 seconds during normal bootup no
matter what. Without my patch any value for that timeout wouldn't help
with the UDP message.

[0] https://openwrt.org/docs/guide-user/troubleshooting/failsafe_and_factory_reset
Jo-Philipp Wich Dec. 11, 2019, 11:57 a.m. UTC | #5
Hi,

> Question is, if it's worth the hassle for a feature which is targeted more
> towards the expert users.

from my pov - it is not worth overengineering this feature. The proposed
patch is more than adequate. It increases the probability of the message
getting delivered without additional code complexity.

~ Jo
diff mbox series

Patch

diff --git a/package/base-files/files/lib/preinit/30_failsafe_wait b/package/base-files/files/lib/preinit/30_failsafe_wait
index dd9c7e2b59..85dca398fa 100644
--- a/package/base-files/files/lib/preinit/30_failsafe_wait
+++ b/package/base-files/files/lib/preinit/30_failsafe_wait
@@ -31,6 +31,8 @@  fs_wait_for_key () {
 	lock $keypress_wait
 	{
 		while [ $timer -gt 0 ]; do
+			pi_failsafe_net_message=true \
+				preinit_net_echo "Please press button now to enter failsafe"
 			echo "$timer" >$keypress_sec
 			timer=$(($timer - 1))
 			sleep 1
@@ -88,9 +90,6 @@  failsafe_wait() {
 	}
 	grep -q 'failsafe=' /proc/cmdline && FAILSAFE=true && export FAILSAFE
 	if [ "$FAILSAFE" != "true" ]; then
-		pi_failsafe_net_message=true
-		preinit_net_echo "Please press button now to enter failsafe"
-		pi_failsafe_net_message=false
 		fs_wait_for_key f 'to enter failsafe mode' $fs_failsafe_wait_timeout && FAILSAFE=true
 		[ -f "/tmp/failsafe_button" ] && FAILSAFE=true && echo "- failsafe button "`cat /tmp/failsafe_button`" was pressed -"
 		[ "$FAILSAFE" = "true" ] && export FAILSAFE && touch /tmp/failsafe