From patchwork Wed Sep 27 18:16:23 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Wetzel
X-Patchwork-Id: 820291
Subject: unicast rekey fundamental flawed (was: connection hangs after wpa_supplicant re-key)
To: hostap@lists.infradead.org
From: Alexander Wetzel
Message-ID: <27736598-ba37-0d61-009f-a4443355bbbe@web.de>
Date: Wed, 27 Sep 2017 20:16:23 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0
Content-Language: en-GB

Hello,

>> As above, I can work around the problem by increasing
>> dot11RSNAConfigPMKLifetime in the config file. I also tried setting
>> "fast_reauth=0" but that did not have an impact. With
>> "dot11RSNAConfigPMKLifetime=31536000" I've seen a solid connection for
>> multiple days.
>>
>> Any ideas on how I can further debug/fix this?
>
> Some notes above on what this would take.. Either debug from AP or
> sniffer capture and all the needed keys for analysis.
>
> Using a larger dot11RSNAConfigPMKLifetime value sounds like a reasonable
> workaround for this, though. All it does here is give the AP full
> control on when to force PMK rekeying (i.e., in practice, when to force
> EAP reauthentication).

This seems to be the same issue I had in the past and reported/debugged
(also with WLAN captures) here:
https://patchwork.kernel.org/patch/6449291/
and here:
https://dev.openwrt.org/ticket/18966

The short version is that unicast rekeys are inherently dangerous when
offloading the encryption to the card and using mac80211 from the Linux
kernel. (Group rekeys are not affected and are fine.)
The root of the evil is directly in the IEEE 802.11 spec and was only
"fixed" in 802.11-2012. The fix has not been implemented in any WLAN
stack I'm aware of, though. (At least Windows seems to have code to
handle the issue as a special case when connected to a Linux AP using
rekeys. There the WLAN also freezes, but recovers within ~1s.)

Here is how I currently understand the issue (this can be wrong and/or
incomplete):

When changing the unicast key without having a new key ID to switch
over to, we are racing the hardware of the WLAN card. It can happen (in
my test environment 100% of the time) that mac80211 hands a frame over
to the WLAN card for encryption with a packet number (PN) belonging to
the then still current old key. While this packet is queued in the WLAN
card, the unicast key is updated and installed in the card. The packet
with the old PN is then encrypted with the new key and sent out. The
other end receives the packet, decrypts it successfully with the new key
and then sets the PN for the new key to the value from the packet. That
value is of course way too high, since it belongs to the old key. One or
two packets later the correct PN is being used, but the replay
protection now drops the packets until we reach the PN of the old key
(pretty unlikely to ever happen) or the key is rolled over again, which
resets the highest seen PN to zero.

The result is that a rekey only works if the WLAN is idle at the
critical time, so that no packets are queued when we replace the key.
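
The receive-side effect described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions:
struct rx_state, install_new_key() and rx_accept() are hypothetical
names chosen for illustration and are not mac80211 symbols, and the
replay check is reduced to a single TID with a "strictly greater than
the last PN" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PN_LEN 6 /* CCMP packet numbers are 48 bits wide */

/* Hypothetical receiver state: last accepted PN for one TID. */
struct rx_state {
	uint8_t rx_pn[PN_LEN];
};

/* Store a PN big-endian so that memcmp() orders PNs numerically. */
static void pn_from_u64(uint8_t pn[PN_LEN], uint64_t v)
{
	for (int i = PN_LEN - 1; i >= 0; i--) {
		pn[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Installing a new key resets the replay counter to zero. */
static void install_new_key(struct rx_state *rx)
{
	assert(rx != NULL);
	memset(rx->rx_pn, 0, PN_LEN);
}

/* Simplified replay check: accept only a PN above the last one seen. */
static bool rx_accept(struct rx_state *rx, uint64_t pn_val)
{
	uint8_t pn[PN_LEN];

	pn_from_u64(pn, pn_val);
	if (memcmp(pn, rx->rx_pn, PN_LEN) <= 0)
		return false; /* dropped as a replay */
	memcpy(rx->rx_pn, pn, PN_LEN);
	return true;
}
```

In this model, calling install_new_key() and then rx_accept() with a
stale PN such as 0x476b succeeds, and every following frame that uses
the fresh counter (1, 2, ...) is rejected until the PN exceeds 0x476b
or a new key is installed again.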

Switching your WLAN card to software encryption prevents the issue for
Linux systems, but chances are you have to do that on both the AP and
the client to really prevent the freezes, at least when both are running
Linux and mac80211. (We then no longer race the WLAN hardware, which
prevents key and PN from running out of sync.)

I'm currently back looking at the issue and trying to get an acceptable
patch together to start a new discussion on linux-wireless. Since that
will probably still take some time, I've attached an older but tested
interim version of the new kernel patch I'm working on.

The patch will not prevent sending the broken packets; it will just
detect and handle them for the most probable case (TID=0) on the
receiving end. Preventing the issue altogether seems to be very hard and
expensive, and is for sure still above my current understanding and
coding skills.

At least in my setup both systems, the AP and the station, must be
patched, or the WLAN freezes during rekey if there is a data transfer
ongoing. Since I'm normally testing with flood ping and therefore have
the same packet load in both directions, that's expected.

The patch will print out
"HACK: -RESCUE- new key packet with old pn mitigated" when encountering
and handling a problematic packet.

Here is a quick sample of how a mitigated WLAN freeze looks with the
attached patch:

Sep 10 21:24:21.557801 perry kernel: HACK: virgin key detected, enable HACK code path!
Sep 10 21:24:21.557925 perry kernel: HACK cnt: 00 00 00 00 00 00
Sep 10 21:24:21.557961 perry kernel: HACK old_cnt: 00 00 00 00 47 69
Sep 10 21:24:21.557986 perry kernel: HACK pn: 00 00 00 00 47 6b
Sep 10 21:24:21.558016 perry kernel: HACK: -RESCUE- new key packet with old pn mitigated
Sep 10 21:24:21.617804 perry kernel: HACK: virgin key detected, enable HACK code path!
Sep 10 21:24:21.617941 perry kernel: HACK cnt: 00 00 00 00 00 00
Sep 10 21:24:21.617970 perry kernel: HACK old_cnt: 00 00 00 00 47 6b
Sep 10 21:24:21.618007 perry kernel: HACK pn: 00 00 00 00 00 01
Sep 10 21:24:21.618034 perry kernel: HACK: Switching key over to normal counter

I hope that helps and makes this really hard-to-debug issue more widely
known. As it is, only a small percentage of Linux users will be able to
tie the freezes to rekeys. And even finding that out does not help much,
since there is absolutely nothing in any debug log or even a kernel
trace. (I tried all of that before giving up and finally patching
Wireshark to be able to look at the interesting encrypted packets.)
So besides using one of the patches, you will only be able to see the
issue in a WLAN capture, and only when looking for it.

Alexander Wetzel

diff -ur linux-4.13.0-gentoo_/net/mac80211/key.c linux-4.13.0-gentoo/net/mac80211/key.c
--- linux-4.13.0-gentoo_/net/mac80211/key.c	2017-09-03 22:56:17.000000000 +0200
+++ linux-4.13.0-gentoo/net/mac80211/key.c	2017-09-10 21:02:23.822346404 +0200
@@ -626,9 +626,21 @@
 
 	mutex_lock(&sdata->local->key_mtx);
 
-	if (sta && pairwise)
+	if (sta && pairwise) {
 		old_key = key_mtx_dereference(sdata->local, sta->ptk[idx]);
-	else if (sta)
+		if (old_key)
+			switch (key->conf.cipher) {
+			/* For now we only fix the issue for CCMP */
+			case WLAN_CIPHER_SUITE_CCMP:
+				/* Only TID=0 seems to be relevant, but that assumption may be wrong... */
+				memcpy(&key->u.ccmp.rx_pn_old, old_key->u.ccmp.rx_pn[0], IEEE80211_CCMP_PN_LEN);
+				key->check_pn_old = true;
+				break;
+			}
+		else
+			/* No old key, bypass hack code */
+			key->check_pn_old = false;
+	} else if (sta)
 		old_key = key_mtx_dereference(sdata->local, sta->gtk[idx]);
 	else
 		old_key = key_mtx_dereference(sdata->local, sdata->keys[idx]);
diff -ur linux-4.13.0-gentoo_/net/mac80211/key.h linux-4.13.0-gentoo/net/mac80211/key.h
--- linux-4.13.0-gentoo_/net/mac80211/key.h	2017-09-03 22:56:17.000000000 +0200
+++ linux-4.13.0-gentoo/net/mac80211/key.h	2017-09-10 21:02:54.752438385 +0200
@@ -59,6 +59,7 @@
 	struct ieee80211_local *local;
 	struct ieee80211_sub_if_data *sdata;
 	struct sta_info *sta;
+	bool check_pn_old;
 
 	/* for sdata list */
 	struct list_head list;
@@ -88,6 +89,7 @@
 			 * Management frames.
 			 */
 			u8 rx_pn[IEEE80211_NUM_TIDS + 1][IEEE80211_CCMP_PN_LEN];
+			u8 rx_pn_old[IEEE80211_CMAC_PN_LEN];
 			struct crypto_aead *tfm;
 			u32 replays; /* dot11RSNAStatsCCMPReplays */
 		} ccmp;
diff -ur linux-4.13.0-gentoo_/net/mac80211/wpa.c linux-4.13.0-gentoo/net/mac80211/wpa.c
--- linux-4.13.0-gentoo_/net/mac80211/wpa.c	2017-09-03 22:56:17.000000000 +0200
+++ linux-4.13.0-gentoo/net/mac80211/wpa.c	2017-09-10 21:08:04.203331545 +0200
@@ -532,6 +532,31 @@
 			key->u.ccmp.replays++;
 			return RX_DROP_UNUSABLE;
 		}
+		if (unlikely(key->check_pn_old)) {
+			/* Code only handles TID=0, which seems to be the only relevant TID for the race */
+			if (queue == 0) {
+				printk ("HACK: virgin key detected, enable HACK code path!");
+				print_hex_dump_debug("HACK cnt: ", DUMP_PREFIX_NONE, IEEE80211_CCMP_PN_LEN, 6, key->u.ccmp.rx_pn[queue], IEEE80211_CCMP_PN_LEN, false);
+				print_hex_dump_debug("HACK old_cnt: ", DUMP_PREFIX_NONE, IEEE80211_CCMP_PN_LEN, 6, key->u.ccmp.rx_pn_old, IEEE80211_CCMP_PN_LEN, false);
+				print_hex_dump_debug("HACK pn: ", DUMP_PREFIX_NONE, IEEE80211_CCMP_PN_LEN, 6, pn, IEEE80211_CCMP_PN_LEN, false);
+
+				if (memcmp(pn, key->u.ccmp.rx_pn_old, IEEE80211_CCMP_PN_LEN) < 0 ||
+				    memcmp(key->u.ccmp.rx_pn[queue], key->u.ccmp.rx_pn_old, IEEE80211_CCMP_PN_LEN) == 0 ) {
+					/* pn is < the pn from old key or rx_pn_old and rx_pn are identical, complete switch to new key */
+					printk ("HACK: Switching key over to normal counter\n");
+					memcpy(key->u.ccmp.rx_pn[queue], pn, IEEE80211_CCMP_PN_LEN);
+					key->check_pn_old = false;
+				} else {
+					/* This case would freeze the wlan on an unpatched kernel */
+					printk ("HACK: -RESCUE- new key packet with old pn mitigated\n");
+					memcpy(key->u.ccmp.rx_pn_old, pn, IEEE80211_CCMP_PN_LEN);
+				}
+			} else {
+				printk ("HACK: Sanity ERROR - Found a key with check_pn_old set were TID!=0");
+			}
+		} else {
+			memcpy(key->u.ccmp.rx_pn[queue], pn, IEEE80211_CCMP_PN_LEN);
+		}
 
 		if (!(status->flag & RX_FLAG_DECRYPTED)) {
 			u8 aad[2 * AES_BLOCK_SIZE];
@@ -546,8 +571,6 @@
 				    skb->data + skb->len - mic_len, mic_len))
 				return RX_DROP_UNUSABLE;
 		}
-
-		memcpy(key->u.ccmp.rx_pn[queue], pn, IEEE80211_CCMP_PN_LEN);
 	}
 
 	/* Remove CCMP header and MIC */
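
For readers who want to follow the receive-side decision logic of the
patch above without building a kernel, here is a minimal userspace
sketch of it. It is an approximation under stated assumptions: struct
rx_key, install_new_key() and rx_accept() are hypothetical names, only
a single TID is modelled, and the hardware/MIC handling of the real
receive path is left out. Only the field names rx_pn, rx_pn_old and
check_pn_old are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PN_LEN 6 /* CCMP packet numbers are 48 bits wide */

/* Hypothetical per-key receive state, modelled on the patched fields. */
struct rx_key {
	uint8_t rx_pn[PN_LEN];     /* replay counter of this key */
	uint8_t rx_pn_old[PN_LEN]; /* last PN seen with the previous key */
	bool check_pn_old;         /* still in the rekey transition? */
};

/* Store a PN big-endian so that memcmp() orders PNs numerically. */
static void pn_from_u64(uint8_t pn[PN_LEN], uint64_t v)
{
	for (int i = PN_LEN - 1; i >= 0; i--) {
		pn[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Install a key; remember the old key's PN if there was an old key. */
static void install_new_key(struct rx_key *key, const struct rx_key *old)
{
	assert(key != NULL);
	memset(key, 0, sizeof(*key));
	if (old) {
		memcpy(key->rx_pn_old, old->rx_pn, PN_LEN);
		key->check_pn_old = true;
	}
}

/* Replay check with the transition handling sketched from the patch. */
static bool rx_accept(struct rx_key *key, uint64_t pn_val)
{
	uint8_t pn[PN_LEN];

	pn_from_u64(pn, pn_val);
	if (memcmp(pn, key->rx_pn, PN_LEN) <= 0)
		return false; /* dropped as a replay */

	if (!key->check_pn_old) {
		memcpy(key->rx_pn, pn, PN_LEN);
	} else if (memcmp(pn, key->rx_pn_old, PN_LEN) < 0 ||
		   memcmp(key->rx_pn, key->rx_pn_old, PN_LEN) == 0) {
		/* PN is from the fresh counter: switch to normal checking. */
		memcpy(key->rx_pn, pn, PN_LEN);
		key->check_pn_old = false;
	} else {
		/* Stale PN from the old key: track it, keep rx_pn untouched. */
		memcpy(key->rx_pn_old, pn, PN_LEN);
	}
	return true;
}
```

Feeding this model the values from the log sample (old counter 0x4769,
stale frame 0x476b, then fresh frame 1) takes the "stale PN" branch
once and then switches to normal checking, so later frames with PN 2,
3, ... are accepted instead of being dropped.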