WPA Supplicant causes timeouts

Message ID 20161113163410.GA3404@w1.fi
State Accepted

Commit Message

Jouni Malinen Nov. 13, 2016, 4:34 p.m. UTC
On Sun, Oct 30, 2016 at 08:01:19PM +0100, Nickolai Dobrynin wrote:
> Last February, the distro I was using - Gentoo - switched
> from WPA Supplicant v. 2.4 to v. 2.5, and recently to 2.6.
> 
> After the upgrade, I started having frequent (every 2-4 mins)
> transient timeouts in all of my web browsers.  I looked at the
> logs and found this:
> 
> Oct 30 03:08:17 wpa_supplicant[335]: nl80211: Associated on 5500 MHz
> Oct 30 03:08:17 wpa_supplicant[335]: nl80211: Associated on 2462 MHz
> 
> Why would it associate and instantly reassociate with the same AP on a different
> frequency?  I am sending the relevant log snippet.  Is there any way to stop
> these timeouts from happening?

> Oct 30 03:08:17 wpa_supplicant[335]: wlp3s0: Selecting BSS from priority group 0
> Oct 30 03:08:17 wpa_supplicant[335]: wlp3s0: 0: *** ssid='***'
> wpa_ie_len=0 rsn_ie_len=20 caps=0x411 level=-57 freq=2462  wps
> Oct 30 03:08:17 wpa_supplicant[335]: wlp3s0:    selected based on RSN IE

This seems to indicate that the best BSS result in the last scan was
from the 2.4 GHz band.

> Oct 30 03:08:17 wpa_supplicant[335]: wlp3s0: Considering within-ESS
> reassociation
> Oct 30 03:08:17 wpa_supplicant[335]: wlp3s0: Current BSS: ***
> level=-68 snr=24 est_throughput=390001
> Oct 30 03:08:17 wpa_supplicant[335]: wlp3s0: Selected BSS: ***
> level=-57 snr=32 est_throughput=65000

However, the current BSS (on the 5 GHz band) has better estimated
throughput even though its signal strength is significantly lower.

> Oct 30 03:08:17 wpa_supplicant[335]: wlp3s0: Considering connect
> request: reassociate: 0  selected: ***  bssid: ***  pending:

But roaming to the selected BSS is allowed. This does not really
look reasonable in this specific case. It might be useful to check what
triggered the current BSS being selected (i.e., go through a longer log
snippet showing multiple roams between the BSSs). My guess is that there
would be "Allow reassociation - selected BSS has better estimated
throughput" somewhere there. This case was added to make it easier to
roam from a less capable AP (e.g., 2.4 GHz HT) to a more capable AP
(e.g., 5 GHz VHT) even if the signal strength would not allow that on
its own.

It looks like this change only applied in the lower-throughput to
higher-throughput direction. That could result in cases where a
significantly higher signal strength from the AP that has the lower
throughput estimate would allow roaming back to the previous AP on the
next scan, then back to the higher-throughput one on the scan after
that, and so on.
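
As a rough illustration of this back-and-forth, the sketch below replays
the two BSS entries from the log above (level=-68 / est_throughput=390001
on 5 GHz, level=-57 / est_throughput=65000 on 2.4 GHz) through a heavily
simplified decision function. Everything in it is an assumption made for
illustration: struct bss_model, model_need_to_roam() and count_roams() are
hypothetical names, the signal threshold is fixed at 5 dB, and the
candidate on every scan is simply the other BSS. It is not the
wpa_supplicant code.

```c
/* Hypothetical, simplified BSS record; not wpa_supplicant's struct wpa_bss. */
struct bss_model {
	const char *name;
	int level;          /* signal level in dBm */
	int est_throughput; /* same unit as est_throughput= in the log */
};

/*
 * Simplified roaming decision (assumption: fixed 5 dB signal threshold).
 * block_slower == 0: throughput can only allow a roam.
 * block_slower == 1: a clearly slower candidate is also rejected.
 */
static int model_need_to_roam(const struct bss_model *cur,
			      const struct bss_model *sel, int block_slower)
{
	const int min_diff = 5;

	if (sel->est_throughput > cur->est_throughput + 5000)
		return 1; /* candidate is clearly faster */
	if (block_slower && cur->est_throughput > sel->est_throughput + 5000)
		return 0; /* candidate is clearly slower */
	/* Otherwise roam only if the candidate is stronger by min_diff dB. */
	return sel->level - cur->level >= min_diff;
}

/* Count roams over a number of scans; the candidate is always the other BSS. */
static int count_roams(int block_slower, int scans)
{
	struct bss_model bss[2] = {
		{ "5 GHz", -68, 390001 },
		{ "2.4 GHz", -57, 65000 },
	};
	int cur = 0, roams = 0, i;

	for (i = 0; i < scans; i++) {
		if (model_need_to_roam(&bss[cur], &bss[1 - cur], block_slower)) {
			cur = 1 - cur;
			roams++;
		}
	}
	return roams;
}
```

In this model, count_roams(0, 6) roams on every one of the six scans,
while count_roams(1, 6) never leaves the 5 GHz BSS.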

The following changes could be tried to avoid this:

Use estimated throughput to avoid signal based roaming decision

Previously, the estimated throughput was used to enable roaming to a
better AP. However, this information was not used when considering a
roam to an AP that has better signal strength, but smaller estimated
throughput. This could result in allowing roaming from 5 GHz band to 2.4
GHz band in cases where 2.4 GHz band has significantly higher signal
strength, but still a lower throughput estimate.

Make this less likely to happen by increasing/reducing the minimum
required signal strength difference based on the estimated throughputs
of the current and selected AP. In addition, add more details about the
selection process to the debug log to make it easier to determine what
happened and why.

Signed-off-by: Jouni Malinen <j@w1.fi>
---
 wpa_supplicant/events.c | 52 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 10 deletions(-)
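
To make the threshold adjustment described above easier to follow, here
is a minimal sketch that pulls the increase/reduce steps out into a
standalone helper. The helper name adjust_min_diff() and its parameters
are hypothetical. The multipliers and step sizes are copied from the
patch. The sketch assumes the current signal level is negative (the patch
only applies the increase in that case) and does not model the separate
"+ 5000" throughput check that runs before it.

```c
/*
 * Hypothetical helper mirroring the min_diff adjustment in the patch.
 * base_min_diff: value already chosen from the current signal level
 * cur_est/sel_est: estimated throughput of the current/selected BSS
 * to_5ghz: nonzero when the roam would move to the 5 GHz band
 */
static int adjust_min_diff(int base_min_diff, int cur_est, int sel_est,
			   int to_5ghz)
{
	int min_diff = base_min_diff;

	/* Require a larger signal gain when the current BSS is faster. */
	if (cur_est > sel_est * 1.5)
		min_diff += 10;
	else if (cur_est > sel_est * 1.2)
		min_diff += 5;
	else if (cur_est > sel_est * 1.1)
		min_diff += 2;
	else if (cur_est > sel_est)
		min_diff++;

	if (to_5ghz) {
		int reduce = 2;

		/* Make it easier to move to 5 GHz, more so if it is faster. */
		if (sel_est > cur_est * 1.5)
			reduce = 5;
		else if (sel_est > cur_est * 1.2)
			reduce = 4;
		else if (sel_est > cur_est * 1.1)
			reduce = 3;

		if (min_diff > reduce)
			min_diff -= reduce;
		else
			min_diff = 0;
	}

	return min_diff;
}
```

For example, with a base of 5, a current estimate of 9000 and a selected
estimate of 5000, the required difference becomes 15. With the same base
and a roam to 5 GHz where the selected estimate is 9000 against a current
5000, it drops to 0.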

Comments

Nickolai Dobrynin Nov. 13, 2016, 8:54 p.m. UTC | #1
Which version will have this patch?

Also, regarding the original problem, I thought I would
add a few more details.  With the 5500MHz frequency
turned off, timeouts are nearly gone.  I *am* still seeing
occasional ones: 1 timeout every 5-6-7 days.

To answer Dan's q on whether both frequencies were
coming from the same AP, in my setup - the answer
is yes.

Many thx.

Jouni Malinen Nov. 19, 2016, 3:32 p.m. UTC | #2
On Sun, Nov 13, 2016 at 09:54:42PM +0100, Nickolai Dobrynin wrote:
> Which version will have this patch?

It is now in the master branch so assuming no issues show up, it should
be included in the next release (v2.7).

> Also, regarding the original problem, I thought I would
> add a few more details.  With the 5500MHz frequency
> turned off, timeouts are nearly gone.  I *am* still seeing
> occasional ones: 1 timeout every 5-6-7 days.

Could you please provide some more detail on what exactly you mean by
"timeouts"? I tried to interpret the earlier description on this, but
I'm not sure I understood it correctly. Is this referring to some upper
layer protocol timing out due to a short disconnection at link layer or
is the full WLAN connection down for a longer time?

Nickolai Dobrynin Nov. 19, 2016, 5:30 p.m. UTC | #3
Jouni,

On Sat, Nov 19, 2016 at 4:32 PM, Jouni Malinen <j@w1.fi> wrote:
> On Sun, Nov 13, 2016 at 09:54:42PM +0100, Nickolai Dobrynin wrote:
>> Which version will have this patch?
>
> It is now in the master branch so assuming no issues show up, it should
> be included in the next release (v2.7).
>
>> Also, regarding the original problem, I thought I would
>> add a few more details.  With the 5500MHz frequency
>> turned off, timeouts are nearly gone.  I *am* still seeing
>> occasional ones: 1 timeout every 5-6-7 days.
>
> Could you please provide some more detail on what exactly you mean by
> "timeouts"? I tried to interpret the earlier description on this, but
> I'm not sure I understood it correctly. Is this referring to some upper
> layer protocol timing out due to a short disconnection at link layer or
> is the full WLAN connection down for a longer time?

My web browsers were timing out badly.  In some cases, the browser
would explicitly say "Connection timed out". But most of the time, the
browser tab would show that spinny thing as if the page was loading.
The spinny thing would be around for a long time (30 seconds, or so),
then would disappear, and the page would not be loaded.  I've had GMail
refuse to send my messages, complaining about connectivity problems.
And many other symptoms.

The reason I asked the version question was because I wanted to know
when to re-enable the 5500MHz frequency on my router (see previous
messages). Sounds like I'll have to wait until 2.7 is out.

Many thanks,
Nickolai Dobrynin

Patch

diff --git a/wpa_supplicant/events.c b/wpa_supplicant/events.c
index 17f057a..210b45c 100644
--- a/wpa_supplicant/events.c
+++ b/wpa_supplicant/events.c
@@ -1396,8 +1396,9 @@  static int wpa_supplicant_need_to_roam(struct wpa_supplicant *wpa_s,
 {
 	struct wpa_bss *current_bss = NULL;
 #ifndef CONFIG_NO_ROAMING
-	int min_diff;
+	int min_diff, diff;
 	int to_5ghz;
+	int cur_est, sel_est;
 #endif /* CONFIG_NO_ROAMING */
 
 	if (wpa_s->reassociate)
@@ -1431,12 +1432,13 @@  static int wpa_supplicant_need_to_roam(struct wpa_supplicant *wpa_s,
 #ifndef CONFIG_NO_ROAMING
 	wpa_dbg(wpa_s, MSG_DEBUG, "Considering within-ESS reassociation");
 	wpa_dbg(wpa_s, MSG_DEBUG, "Current BSS: " MACSTR
-		" level=%d snr=%d est_throughput=%u",
-		MAC2STR(current_bss->bssid), current_bss->level,
+		" freq=%d level=%d snr=%d est_throughput=%u",
+		MAC2STR(current_bss->bssid),
+		current_bss->freq, current_bss->level,
 		current_bss->snr, current_bss->est_throughput);
 	wpa_dbg(wpa_s, MSG_DEBUG, "Selected BSS: " MACSTR
-		" level=%d snr=%d est_throughput=%u",
-		MAC2STR(selected->bssid), selected->level,
+		" freq=%d level=%d snr=%d est_throughput=%u",
+		MAC2STR(selected->bssid), selected->freq, selected->level,
 		selected->snr, selected->est_throughput);
 
 	if (wpa_s->current_ssid->bssid_set &&
@@ -1462,6 +1464,14 @@  static int wpa_supplicant_need_to_roam(struct wpa_supplicant *wpa_s,
 		return 0;
 	}
 
+	if (current_bss->est_throughput > selected->est_throughput + 5000) {
+		wpa_dbg(wpa_s, MSG_DEBUG,
+			"Skip roam - Current BSS has better estimated throughput");
+		return 0;
+	}
+
+	cur_est = current_bss->est_throughput;
+	sel_est = selected->est_throughput;
 	min_diff = 2;
 	if (current_bss->level < 0) {
 		if (current_bss->level < -85)
@@ -1474,20 +1484,42 @@  static int wpa_supplicant_need_to_roam(struct wpa_supplicant *wpa_s,
 			min_diff = 4;
 		else
 			min_diff = 5;
+		if (cur_est > sel_est * 1.5)
+			min_diff += 10;
+		else if (cur_est > sel_est * 1.2)
+			min_diff += 5;
+		else if (cur_est > sel_est * 1.1)
+			min_diff += 2;
+		else if (cur_est > sel_est)
+			min_diff++;
 	}
 	if (to_5ghz) {
+		int reduce = 2;
+
 		/* Make it easier to move to 5 GHz band */
-		if (min_diff > 2)
-			min_diff -= 2;
+		if (sel_est > cur_est * 1.5)
+			reduce = 5;
+		else if (sel_est > cur_est * 1.2)
+			reduce = 4;
+		else if (sel_est > cur_est * 1.1)
+			reduce = 3;
+
+		if (min_diff > reduce)
+			min_diff -= reduce;
 		else
 			min_diff = 0;
 	}
-	if (abs(current_bss->level - selected->level) < min_diff) {
-		wpa_dbg(wpa_s, MSG_DEBUG, "Skip roam - too small difference "
-			"in signal level");
+	diff = abs(current_bss->level - selected->level);
+	if (diff < min_diff) {
+		wpa_dbg(wpa_s, MSG_DEBUG,
+			"Skip roam - too small difference in signal level (%d < %d)",
+			diff, min_diff);
 		return 0;
 	}
 
+	wpa_dbg(wpa_s, MSG_DEBUG,
+		"Allow reassociation due to difference in signal level (%d >= %d)",
+		diff, min_diff);
 	return 1;
 #else /* CONFIG_NO_ROAMING */
 	return 0;