Message ID | 20191210142420.30748-1-fercerpav@gmail.com |
---|---|
State | Accepted |
Delegated to: | Petr Štetiar |
Headers | show |
Series | [OpenWrt-Devel,RFC] base-files: send informational UDP message each second waiting | expand |
Paul Fertser <fercerpav@gmail.com> [2019-12-10 17:24:20]:
Hi,
> in cases when the interface is brought up faster it leads to two messages
in cases when the interface is brought up slower it leads to no message.
To me it just seems like a workaround to fix your use case, not a proper fix.
-- ynezz
Hi, On Tue, Dec 10, 2019 at 03:42:13PM +0100, Petr Štetiar wrote: > Paul Fertser <fercerpav@gmail.com> [2019-12-10 17:24:20]: > > in cases when the interface is brought up faster it leads to two messages > > in cases when the interface is brought up slower it leads to no message. > > To me it just seems like a workaround to fix your use case, not a proper fix. You're right, I mentioned "inherently racy" in the commit message exactly because of that. Waiting for LOWER_UP there without a timeout is not a solution because in the normal bootup case there might be nothing attached to the LAN so the boot will be effectively halted forever. Waiting with a timeout poses a question of what that timeout should be set to; and if that's reasonable to extend current 2 seconds with any significant amount. Current documentation says a message should be sent. Current code works for some devices and fails for other devices. My patch improves the situation without adding any code complexity (indeed, it's even removing one line) or wasting boot time. Do you have any other possible solution in mind?
Paul Fertser <fercerpav@gmail.com> [2019-12-11 14:03:53]: > Waiting with a timeout poses a question of what that timeout should be set > to; and if that's reasonable to extend current 2 seconds with any > significant amount. As you've witnessed this default value doesn't work well in some cases. > Current documentation says a message should be sent. So now, this would need to be rephrased from current 0 or 1, to 0 to N, where N depends on config settings (timeout value). > Do you have any other possible solution in mind? (nope, just thinking aloud) If the fixed value doesn't work for all use cases, then its not fixed variable anymore and should be variable, thus config option, probably finetuned per target/device. Question is, if it's worth the hassle for a feature which is targeted more towards the expert users. -- ynezz
On Wed, Dec 11, 2019 at 12:47:22PM +0100, Petr Štetiar wrote: > Paul Fertser <fercerpav@gmail.com> [2019-12-11 14:03:53]: > > Waiting with a timeout poses a question of what that timeout should be set > > to; and if that's reasonable to extend current 2 seconds with any > > significant amount. > > As you've witnessed this default value doesn't work well in some cases. The problem I faced was because the message is sent only once _before_ timeout starts ticking, so no matter what it's set to, the message wouldn't be there. > > Current documentation says a message should be sent. > > So now, this would need to be rephrased from current 0 or 1, to 0 to N, where > N depends on config settings (timeout value). Documentation [0] says "Wait (with a packet sniffer) for a special broadcast packet and press a button". So I think with this patch documentation changes are not strictly necessary, it still works the same way, you wait for the packet and press a button. > > Do you have any other possible solution in mind? > > (nope, just thinking aloud) > > If the fixed value doesn't work for all use cases, then its not fixed variable > anymore and should be variable, thus config option, probably finetuned per > target/device. Currently all devices wait for 2 seconds during normal bootup no matter what. Without my patch any value for that timeout wouldn't help with the UDP message. [0] https://openwrt.org/docs/guide-user/troubleshooting/failsafe_and_factory_reset
Hi, > Question is, if it's worth the hassle for a feature which is targeted more > towards the expert users. from my pov - it is not worth overengineering this feature. The proposed patch is more than adequate. It increases the probability of the message getting delivered without additional code complexity. ~ Jo
diff --git a/package/base-files/files/lib/preinit/30_failsafe_wait b/package/base-files/files/lib/preinit/30_failsafe_wait index dd9c7e2b59..85dca398fa 100644 --- a/package/base-files/files/lib/preinit/30_failsafe_wait +++ b/package/base-files/files/lib/preinit/30_failsafe_wait @@ -31,6 +31,8 @@ fs_wait_for_key () { lock $keypress_wait { while [ $timer -gt 0 ]; do + pi_failsafe_net_message=true \ + preinit_net_echo "Please press button now to enter failsafe" echo "$timer" >$keypress_sec timer=$(($timer - 1)) sleep 1 @@ -88,9 +90,6 @@ failsafe_wait() { } grep -q 'failsafe=' /proc/cmdline && FAILSAFE=true && export FAILSAFE if [ "$FAILSAFE" != "true" ]; then - pi_failsafe_net_message=true - preinit_net_echo "Please press button now to enter failsafe" - pi_failsafe_net_message=false fs_wait_for_key f 'to enter failsafe mode' $fs_failsafe_wait_timeout && FAILSAFE=true [ -f "/tmp/failsafe_button" ] && FAILSAFE=true && echo "- failsafe button "`cat /tmp/failsafe_button`" was pressed -" [ "$FAILSAFE" = "true" ] && export FAILSAFE && touch /tmp/failsafe
The preinit network initialisation and failsafe informational message are inherently racy as the interface takes some time to become functional after "ip link set $pi_ifname up" command. Consider this timing: [ 12.002713] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready [ 12.008819] IPv6: ADDRCONF(NETDEV_UP): eth1.1: link is not ready [ 12.118877] random: procd: uninitialized urandom read (4 bytes read) [ 13.068614] eth1: link up (1000Mbps/Full duplex) [ 13.073309] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready [ 13.080445] IPv6: ADDRCONF(NETDEV_CHANGE): eth1.1: link becomes ready Since the UDP message was sent prior to link becoming ready, it was never seen on the wire. The default failsafe timeout is set to 2 seconds, so with this patch there are two attempts to send the message, one spent in vain, and the other visible in tcpdump on an attached host. Of course, in cases when the interface is brought up faster it leads to two messages, however it should be harmless. This patch (almost) doesn't affect normal boot time while still allowing to enter failsafe reliably with a single button press, matching the official "generic failsafe" documentation. Signed-off-by: Paul Fertser <fercerpav@gmail.com> --- package/base-files/files/lib/preinit/30_failsafe_wait | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)