diff mbox

Roaming on android blacklists incorrect bss

Message ID 57762B95.2090309@sonymobile.com
State Superseded
Headers show

Commit Message

Mikael Kanstrup July 1, 2016, 8:36 a.m. UTC
Hi Jouni and all,

On Android M we've seen cases where Android's way of roaming sometimes 
end up blacklisting incorrect bss.

Connected to BSSID1 to roam to another AP the following sequence of 
commands are used:
SET_NETWORK 0 bssid <BSSID2>
ENABLE_NETWORK 0
REASSOCIATE

Most of the time this works just fine, though if authentication timer 
times out (probably due to auth/assoc/eapol packet loss) the BSSID 
roamed away from gets blacklisted (BSSID1), not the one failing to 
reassociate with (BSSID2).

Interesting lines from the log look like this:

wlan0: Considering connect request: reassociate: 1  selected: <BSSID2>  
bssid: <BBSID1>  pending: 00:00:00:00:00:00 wpa_state: COMPLETED  
ssid=<SSID>  current_ssid=<SSID>
wlan0: Request association with <BSSID2>
wlan0: Re-association to the same ESS
...
wlan0: Add radio work 'connect'@0x7f9769c230
wlan0: First radio work item in the queue - schedule start immediately
wlan0: Starting radio work 'connect'@0x7f9769c230 after 0.000144 second 
wait
wlan0: Trying to associate with SSID <SSID>
...
wlan0: State: COMPLETED -> ASSOCIATING
...
Limit connection to BSSID <BBSID2> freq=5180 MHz based on scan results 
(bssid_set=1)
...
nl80211: Connect (ifindex=6)
   * bssid=<BSSID2>
   * bssid_hint=<BSSID2>
...
nl80211: Connect request send successfully
wlan0: Setting authentication timeout: 10 sec 0 usec
...
wlan0: Authentication with <BSSID1> timed out.
Added BSSID <BSSID1> into blacklist
TDLS: Remove peers on disassociation
wlan0: WPA: Clear old PMK and PTK
wlan0: Request to deauthenticate - bssid=<BSSID1> 
pending_bssid=00:00:00:00:00:00 reason=3 state=ASSOCIATING

Question is, is this way of using the REASSOCIATE command to perform 
roam operation valid?
I worked on a patch that solved this specific case but had to apply some 
hacks to reproduce it with hwsim tests. It would be great with some 
feedback on the scenario and attached patches. I think not all of them 
should really be applied but should help discussing the problem seen.

Thanks
Mikael Kanstrup

Comments

Mikael Kanstrup Aug. 22, 2016, 4:35 p.m. UTC | #1
Hi Jouni,

Wanted to ask if you've had the time look at this. The fix is patch 5/5, 
the others are there only to reproduce the error with hwsim tests. If it 
helps I can resend the patches with git send-mail.

Regards
Mikael Kanstrup

On 07/01/2016 10:36 AM, Mikael Kanstrup wrote:
> Hi Jouni and all,
>
> On Android M we've seen cases where Android's way of roaming sometimes 
> end up blacklisting incorrect bss.
>
> Connected to BSSID1 to roam to another AP the following sequence of 
> commands are used:
> SET_NETWORK 0 bssid <BSSID2>
> ENABLE_NETWORK 0
> REASSOCIATE
>
> Most of the time this works just fine, though if authentication timer 
> times out (probably due to auth/assoc/eapol packet loss) the BSSID 
> roamed away from gets blacklisted (BSSID1), not the one failing to 
> reassociate with (BSSID2).
>
> Interesting lines from the log look like this:
>
> wlan0: Considering connect request: reassociate: 1  selected: 
> <BSSID2>  bssid: <BBSID1>  pending: 00:00:00:00:00:00 wpa_state: 
> COMPLETED  ssid=<SSID>  current_ssid=<SSID>
> wlan0: Request association with <BSSID2>
> wlan0: Re-association to the same ESS
> ...
> wlan0: Add radio work 'connect'@0x7f9769c230
> wlan0: First radio work item in the queue - schedule start immediately
> wlan0: Starting radio work 'connect'@0x7f9769c230 after 0.000144 
> second wait
> wlan0: Trying to associate with SSID <SSID>
> ...
> wlan0: State: COMPLETED -> ASSOCIATING
> ...
> Limit connection to BSSID <BBSID2> freq=5180 MHz based on scan results 
> (bssid_set=1)
> ...
> nl80211: Connect (ifindex=6)
>   * bssid=<BSSID2>
>   * bssid_hint=<BSSID2>
> ...
> nl80211: Connect request send successfully
> wlan0: Setting authentication timeout: 10 sec 0 usec
> ...
> wlan0: Authentication with <BSSID1> timed out.
> Added BSSID <BSSID1> into blacklist
> TDLS: Remove peers on disassociation
> wlan0: WPA: Clear old PMK and PTK
> wlan0: Request to deauthenticate - bssid=<BSSID1> 
> pending_bssid=00:00:00:00:00:00 reason=3 state=ASSOCIATING
>
> Question is, is this way of using the REASSOCIATE command to perform 
> roam operation valid?
> I worked on a patch that solved this specific case but had to apply 
> some hacks to reproduce it with hwsim tests. It would be great with 
> some feedback on the scenario and attached patches. I think not all of 
> them should really be applied but should help discussing the problem 
> seen.
>
> Thanks
> Mikael Kanstrup
>
>
>
Jouni Malinen Sept. 10, 2016, 6:30 p.m. UTC | #2
On Fri, Jul 01, 2016 at 10:36:37AM +0200, Mikael Kanstrup wrote:
> On Android M we've seen cases where Android's way of roaming
> sometimes end up blacklisting incorrect bss.
> 
> Connected to BSSID1 to roam to another AP the following sequence of
> commands are used:
> SET_NETWORK 0 bssid <BSSID2>
> ENABLE_NETWORK 0
> REASSOCIATE
> 
> Most of the time this works just fine, though if authentication
> timer times out (probably due to auth/assoc/eapol packet loss) the
> BSSID roamed away from gets blacklisted (BSSID1), not the one
> failing to reassociate with (BSSID2).

Hmm.. Is that really the case for lost EAPOL frames? That would be
post-association and wpa_s->bssid is set to the new BSSID when
completing association. The auth/assoc case sounds more likely to be an
issue, though.

> Question is, is this way of using the REASSOCIATE command to perform
> roam operation valid?

Yes, even if not really the most optimized way of doing that.

> I worked on a patch that solved this specific case but had to apply
> some hacks to reproduce it with hwsim tests. It would be great with
> some feedback on the scenario and attached patches. I think not all
> of them should really be applied but should help discussing the
> problem seen.

The testing patches 1-4 would need some cleanup to be acceptable, e.g.,
by making the wpa_s->ignore_auth_resp behavior protected with #ifdef
CONFIG_TESTING_OPTIONS. In addition, I'm not sure why 2/5 is needed
since the driver parameter is parsed with strstr searches of substrings
from the full parameter string. Instead of adding IGNORE_AUTH_RESP, I'd
use the existing SET control interface command. And the new test case
would need to reset IGNORE_AUTH_RESP in all error cases (e.g., try ..
finally) or make FLUSH clear this in wpa_supplicant_ctrl_iface_flush().

As far as the actual fix patch 5/5 is concerned, I don't think I like
the use of wpa_s->pending_bssid to override wpa_s->bssid in
wpa_supplicant_timeout() since this function can be called during EAPOL
timeout (i.e., that post-association case mentioned above). While this
may work now for most cases, this does not sound like the correct thing
to do. It might be fine to do so in wpa_state ==
WPA_AUTHENTICATING/ASSOCIATING just like the
wpa_supplicant_deauthenticate() does.
Mikael Kanstrup Sept. 19, 2016, 11:17 a.m. UTC | #3
Hi Jouni,

Thanks a lot for feedback!

2016-09-10 20:30 GMT+02:00 Jouni Malinen <j@w1.fi>:
> Hmm.. Is that really the case for lost EAPOL frames? That would be
> post-association and wpa_s->bssid is set to the new BSSID when
> completing association. The auth/assoc case sounds more likely to be an
> issue, though.

Yes you're right this is not about lost EAPOL frames. We lack air
sniffer logs so this was a bit of guessing.

> The testing patches 1-4 would need some cleanup to be acceptable, e.g.,
> by making the wpa_s->ignore_auth_resp behavior protected with #ifdef
> CONFIG_TESTING_OPTIONS.

OK I'll update the patches.

> In addition, I'm not sure why 2/5 is needed
> since the driver parameter is parsed with strstr searches of substrings
> from the full parameter string.

Thanks, yes this patch is not needed.

>Instead of adding IGNORE_AUTH_RESP, I'd
> use the existing SET control interface command. And the new test case
> would need to reset IGNORE_AUTH_RESP in all error cases (e.g., try ..
> finally) or make FLUSH clear this in wpa_supplicant_ctrl_iface_flush().

OK

> As far as the actual fix patch 5/5 is concerned, I don't think I like
> the use of wpa_s->pending_bssid to override wpa_s->bssid in
> wpa_supplicant_timeout() since this function can be called during EAPOL
> timeout (i.e., that post-association case mentioned above). While this
> may work now for most cases, this does not sound like the correct thing
> to do. It might be fine to do so in wpa_state ==
> WPA_AUTHENTICATING/ASSOCIATING just like the
> wpa_supplicant_deauthenticate() does.

Thanks, I will add a check for AUTHENTICATING/ASSOCIATING state.

/Mikael
diff mbox

Patch

From bef2433754014e13143f10ed2a778c8bbba0a518 Mon Sep 17 00:00:00 2001
From: Mikael Kanstrup <mikael.kanstrup@sonymobile.com>
Date: Wed, 29 Jun 2016 15:44:19 +0200
Subject: [PATCH 5/5] Blacklist correct bssid on auth timeout if bssid_set

If authentication times out while performing reassociate with
bssid_set=1 incorrect bssid end up being blacklisted. Use pending_bss
field on auth timeout and deauth to ensure correct AP get blacklisted.

Change-Id: I11eec4f5bf05c6512486307c5afae969cdde4e02
---
 wpa_supplicant/wpa_supplicant.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/wpa_supplicant/wpa_supplicant.c b/wpa_supplicant/wpa_supplicant.c
index 57881e8..73316db 100644
--- a/wpa_supplicant/wpa_supplicant.c
+++ b/wpa_supplicant/wpa_supplicant.c
@@ -192,7 +192,7 @@  static void wpa_supplicant_timeout(void *eloop_ctx, void *timeout_ctx)
 {
 	struct wpa_supplicant *wpa_s = eloop_ctx;
 	const u8 *bssid = wpa_s->bssid;
-	if (is_zero_ether_addr(bssid))
+	if (!is_zero_ether_addr(wpa_s->pending_bssid))
 		bssid = wpa_s->pending_bssid;
 	wpa_msg(wpa_s, MSG_INFO, "Authentication with " MACSTR " timed out.",
 		MAC2STR(bssid));
@@ -2156,7 +2156,10 @@  static void wpas_start_assoc_cb(struct wpa_radio_work *work, int deinit)
 	} else {
 		wpa_msg(wpa_s, MSG_INFO, "Trying to associate with SSID '%s'",
 			wpa_ssid_txt(ssid->ssid, ssid->ssid_len));
-		os_memset(wpa_s->pending_bssid, 0, ETH_ALEN);
+		if (bss && ssid->bssid_set)
+			os_memcpy(wpa_s->pending_bssid, bss->bssid, ETH_ALEN);
+		else
+			os_memset(wpa_s->pending_bssid, 0, ETH_ALEN);
 	}
 	if (!wpa_s->pno)
 		wpa_supplicant_cancel_sched_scan(wpa_s);
@@ -2685,12 +2688,12 @@  void wpa_supplicant_deauthenticate(struct wpa_supplicant *wpa_s,
 		MAC2STR(wpa_s->bssid), MAC2STR(wpa_s->pending_bssid),
 		reason_code, wpa_supplicant_state_txt(wpa_s->wpa_state));
 
-	if (!is_zero_ether_addr(wpa_s->bssid))
-		addr = wpa_s->bssid;
-	else if (!is_zero_ether_addr(wpa_s->pending_bssid) &&
+	if (!is_zero_ether_addr(wpa_s->pending_bssid) &&
 		 (wpa_s->wpa_state == WPA_AUTHENTICATING ||
 		  wpa_s->wpa_state == WPA_ASSOCIATING))
 		addr = wpa_s->pending_bssid;
+	else if (!is_zero_ether_addr(wpa_s->bssid))
+		addr = wpa_s->bssid;
 	else if (wpa_s->wpa_state == WPA_ASSOCIATING) {
 		/*
 		 * When using driver-based BSS selection, we may not know the
-- 
2.4.2