Patchwork [RFC] pmksa: don't evict active entry when adding new ones

login
register
mail settings
Submitter Dan Williams
Date Aug. 6, 2012, 4:30 p.m.
Message ID <1344270602.11053.9.camel@dcbw.foobar.com>
Download mbox | patch
Permalink /patch/175422/
State Accepted
Commit 0e502f97c5848ec30d9a822c03f05c57471abbed
Headers show

Comments

Dan Williams - Aug. 6, 2012, 4:30 p.m.
If the PMKSA cache is full (ie, 32 candidates have been seen in scan
results and have not yet expired) then any additional entries can
potentially evict the current/active entry (if it is the first entry),
which triggers a pointless local deauthentication.  The supplicant
shouldn't replace the current/active entry if it is still valid, but
instead the oldest entry that is *not* the current/active one.

1343242112.312194: New scan results available
1343242112.312282: bgscan simple: scan result notification
1343242112.312343: RSN: Consider 00:xx:xx:xx:xx:40 for OKC
1343242112.312408: RSN: removed the oldest PMKSA cache entry (for
00:xx:xx:xx:xx:90) to make room for new one
1343242112.312454: RSN: removed current PMKSA entry
1343242112.312490: wpa_driver_nl80211_deauthenticate
1343242112.336808: State: COMPLETED -> DISCONNECTED   (oops...)

This is a bug at least as far back as 0.7, so the patch is desired for
0.7 (if you're ever updating that again), 1.0, and git master.

Signed-hostap: Dan Williams <dcbw@redhat.com>
---
Does this patch look correct?  I haven't runtime tested it yet, but
that's in the process of being done.  Somebody double-check my
linked-list logic, please :)
Jouni Malinen - Aug. 10, 2012, 3:11 p.m.
On Mon, Aug 06, 2012 at 11:30:02AM -0500, Dan Williams wrote:
> If the PMKSA cache is full (ie, 32 candidates have been seen in scan
> results and have not yet expired) then any additional entries can
> potentially evict the current/active entry (if it is the first entry),
> which triggers a pointless local deauthentication.  The supplicant
> shouldn't replace the current/active entry if it is still valid, but
> instead the oldest entry that is *not* the current/active one.

Agreed.

> Does this patch look correct?  I haven't runtime tested it yet, but
> that's in the process of being done.  Somebody double-check my
> linked-list logic, please :)

List handling was fine, but this patch is not enough on its own since
sm->cur_pmksa may be NULL and the check here for an active entry could
have failed. I applied this and then another commit that updates
sm->cur_pmksa when adding the initial SA entry. This seemed to address
the issue.
Dan Williams - Aug. 13, 2012, 5:51 p.m.
On Fri, 2012-08-10 at 18:11 +0300, Jouni Malinen wrote:
> On Mon, Aug 06, 2012 at 11:30:02AM -0500, Dan Williams wrote:
> > If the PMKSA cache is full (ie, 32 candidates have been seen in scan
> > results and have not yet expired) then any additional entries can
> > potentially evict the current/active entry (if it is the first entry),
> > which triggers a pointless local deauthentication.  The supplicant
> > shouldn't replace the current/active entry if it is still valid, but
> > instead the oldest entry that is *not* the current/active one.
> 
> Agreed.
> 
> > Does this patch look correct?  I haven't runtime tested it yet, but
> > that's in the process of being done.  Somebody double-check my
> > linked-list logic, please :)
> 
> List handling was fine, but this patch is not enough on its own since
> sm->cur_pmksa may be NULL and the check here for an active entry could
> have failed. I applied this and then another commit that updates
> sm->cur_pmksa when adding the initial SA entry. This seemed to address
> the issue.

So one problem with the current code in my testing was that when an
entry is removed, the check inwpa_sm_pmksa_free_cb() that determines
whether or not to deauthenticate checks either the cache entry, or the
PMK itself:

	if (sm->cur_pmksa == entry ||
	    (sm->pmk_len == entry->pmk_len &&
	     os_memcmp(sm->pmk, entry->pmk, sm->pmk_len) == 0)) {

and in my testing, a number of cache entries had the *same* PMK:

1344603539.406323: Selected BSS: 00:xx:xx:xx:xx:90 level=-63
...
1344603570.520225: RSN: Consider 00:xx:xx:xx:xx:10 for OKC
1344603570.520281: RSN: oldest PMKSA cache entry is 0x19ff450 00:xx:xx:xx:xx:20 PMK 341b7... (32)
1344603570.520332: RSN: current PMKSA cache entry is  0x19def10 00:xx:xx:xx:xx:90 PMK 341b7... (32)
1344603570.520379: RSN: current PMKSA is 00:xx:xx:xx:xx:90 PMK 341b7... (32)
1344603570.520429: RSN: removed the oldest idle PMKSA cache entry (for 00:xx:xx:xx:xx:30) to make room for new one
1344603570.520460: RSN: removed current PMKSA entry

clearly the entry for :30 is neither the current entry, nor is the entry
for the current BSSID (which is :90).  Yet the snippet in
wpa_sm_pmksa_free_cb() just compares the PMK and decides to
deauthenticate.

So my questions here are:

1) is it normal for multiple cache entries to have the same PMK?  It
seems the answer is "yes", given pmksa_cache_clone_entry() and the
opportunistic caching logic.

2) Shouldn't pmksa_cache_set_current() be honoring the BSSID that's
passed in before falling back to just checking the PMK?  It seems to me
the code should be searching from more specific to less specific.  If
we're passed *both* a PMKID and a BSSID, shouldn't the code make an
effort to find one that matches both, instead of just the PMKID?
Otherwise we end up with sm->cur_pmksa pointing to a BSSID that we're
not associated with.

Something like this maybe?  I'd argue that if both the PMKID and BSSID
are passed, and an exact match is not found, that's a failure.  If only
one of the two is passed, then we can't find an exact match, so try for
something close.  But give the BSSID priority since that's most likely
what we're currently associated with?  I don't pretend to know much
about OKC operationally though.

	sm->cur_pmksa = NULL;
	if (pmkid && bssid) {
		/* exact match requested; if we cannot provide it we fail */
		sm->cur_pmksa = pmksa_cache_get(pmksa, bssid, pmkid,
						network_ctx);
	} else {
		/* Fuzzy matching; one of BSSID or PMKID not known */
		if (bssid)
			sm->cur_pmksa = pmksa_cache_get(pmksa, bssid, NULL,
							network_ctx);
		if (sm->cur_pmksa == NULL && try_opportunistic && bssid)
			sm->cur_pmksa = pmksa_cache_get_opportunistic(pmksa,
								      network_ctx,
								      bssid);
		if (sm->cur_pmksa == NULL && pmkid)
			sm->cur_pmksa = pmksa_cache_get(pmksa, NULL, pmkid,
							network_ctx);
	}

Dan
Jouni Malinen - Nov. 25, 2012, 8:23 p.m.
On Mon, Aug 13, 2012 at 12:51:24PM -0500, Dan Williams wrote:
> So one problem with the current code in my testing was that when an
> entry is removed, the check inwpa_sm_pmksa_free_cb() that determines
> whether or not to deauthenticate checks either the cache entry, or the
> PMK itself:
> 
> 	if (sm->cur_pmksa == entry ||
> 	    (sm->pmk_len == entry->pmk_len &&
> 	     os_memcmp(sm->pmk, entry->pmk, sm->pmk_len) == 0)) {

There shouldn't really be need for checking the actual PMK value, but
until we can trust on sm->cur_pmksa being always set correctly, it is
better to keep that there to make sure PTK gets expired when the PMK
does.

> 1) is it normal for multiple cache entries to have the same PMK?  It
> seems the answer is "yes", given pmksa_cache_clone_entry() and the
> opportunistic caching logic.

Yes, with OKC.

> 2) Shouldn't pmksa_cache_set_current() be honoring the BSSID that's
> passed in before falling back to just checking the PMK?  It seems to me
> the code should be searching from more specific to less specific.  If
> we're passed *both* a PMKID and a BSSID, shouldn't the code make an
> effort to find one that matches both, instead of just the PMKID?
> Otherwise we end up with sm->cur_pmksa pointing to a BSSID that we're
> not associated with.

This function is a bit ugly due to history, i.e., number of different
places were setting cur_pmksa and I just collected them into that single
function when adding some more separation between the core
wpa_supplicant code and RSN/PMKSA cache implementation. This is designed
to search mainly based on PMKID for the association case (which does not
actually even pass the BSSID in the current caller), but since PMKID is
derived based on AA (= BSSID), this will, in practice, match both.

Do you have clear cases where sm->cur_pmksa is really pointing to
another BSSID? The way the PMKID is derived should make that not happen
in practice..

> Something like this maybe?  I'd argue that if both the PMKID and BSSID
> are passed, and an exact match is not found, that's a failure.  If only
> one of the two is passed, then we can't find an exact match, so try for
> something close.  But give the BSSID priority since that's most likely
> what we're currently associated with?  I don't pretend to know much
> about OKC operationally though.
> 
> 	sm->cur_pmksa = NULL;
> 	if (pmkid && bssid) {
> 		/* exact match requested; if we cannot provide it we fail */
> 		sm->cur_pmksa = pmksa_cache_get(pmksa, bssid, pmkid,
> 						network_ctx);
> 	} else {
> 		/* Fuzzy matching; one of BSSID or PMKID not known */
> 		if (bssid)
> 			sm->cur_pmksa = pmksa_cache_get(pmksa, bssid, NULL,
> 							network_ctx);
> 		if (sm->cur_pmksa == NULL && try_opportunistic && bssid)
> 			sm->cur_pmksa = pmksa_cache_get_opportunistic(pmksa,
> 								      network_ctx,
> 								      bssid);
> 		if (sm->cur_pmksa == NULL && pmkid)
> 			sm->cur_pmksa = pmksa_cache_get(pmksa, NULL, pmkid,
> 							network_ctx);
> 	}

I'm not sure I would do this now, i.e., it would be better to go through
the callers first and make sure they pass proper information for this or
well, maybe just split pmksa_cache_set_current() into couple of
functions based on the use cases to make this somewhat cleaner.

Patch

diff -up wpa_supplicant-0.7.3/src/rsn_supp/pmksa_cache.c.foo wpa_supplicant-0.7.3/src/rsn_supp/pmksa_cache.c
--- wpa_supplicant-0.7.3/src/rsn_supp/pmksa_cache.c.foo	2012-08-05 23:34:38.230809262 -0500
+++ wpa_supplicant-0.7.3/src/rsn_supp/pmksa_cache.c	2012-08-05 23:41:10.862900686 -0500
@@ -203,11 +203,23 @@  pmksa_cache_add(struct rsn_pmksa_cache *
 	if (pmksa->pmksa_count >= pmksa_cache_max_entries && pmksa->pmksa) {
 		/* Remove the oldest entry to make room for the new entry */
 		pos = pmksa->pmksa;
-		pmksa->pmksa = pos->next;
-		wpa_printf(MSG_DEBUG, "RSN: removed the oldest PMKSA cache "
-			   "entry (for " MACSTR ") to make room for new one",
-			   MAC2STR(pos->aa));
-		pmksa_cache_free_entry(pmksa, pos, 0);
+
+		/* Never remove the current PMKSA cache entry, since it's
+		 * in use, and removing it triggers a needless deauthentication.
+		 */
+		if (pos == pmksa->sm->cur_pmksa) {
+			pos = pos->next;
+			pmksa->pmksa->next = pos ? pos->next : NULL;
+		} else
+			pmksa->pmksa = pos->next;
+
+		if (pos) {
+			wpa_printf(MSG_DEBUG, "RSN: removed the oldest idle "
+				   "PMKSA cache entry (for " MACSTR ") to make "
+				   "room for new one",
+				   MAC2STR(pos->aa));
+			pmksa_cache_free_entry(pmksa, pos, 0);
+		}
 	}
 
 	/* Add the new entry; order by expiration time */