diff mbox

Association race when acting as AP?

Message ID CA+BoTQm==AM3F2Fxq6T_+uY+jq0yXc9JHFT0iwpe2XJrmxkSsg@mail.gmail.com
State Changes Requested
Headers show

Commit Message

Michal Kazior July 2, 2015, 8:24 a.m. UTC
Hello,

I've been recently trying to figure out why I'm seeing messages like the
following in dmesg:

 [  135.866308] p2p-wlan1-0: authenticate with 02:03:7f:91:53:51
 [  135.869745] p2p-wlan1-0: send auth to 02:03:7f:91:53:51 (try 1/3)
 [  135.877538] p2p-wlan1-0: authenticated
 [  135.888029] p2p-wlan1-0: associate with 02:03:7f:91:53:51 (try 1/3)
 [  135.912461] p2p-wlan1-0: RX AssocResp from 02:03:7f:91:53:51
(capab=0x411 status=0 aid=1)
 [  135.916226] p2p-wlan1-0: associated
 [  135.918038] p2p-wlan1-0: deauthenticated from 02:03:7f:91:53:51
(Reason: 7=CLASS3_FRAME_FROM_NONASSOC_STA)

This gets repeated a few times. Sometimes the connection succeeds after a
few cycles, sometimes it doesn't. I've seen this mostly while testing P2P.

After looking into hostapd code I noticed something strange and I wonder if
anyone else is already aware of this problem:

 1. AP starts
 2. STA->AP auth OTA
 3. AP->STA auth OTA
 4. STA->AP assoc req OTA
 5. AP->STA assoc resp OTA
 6. STA sends NullFunc with "STA will go to sleep" bit set
 7. AP driver/device sees a frame from with unknown TA/SA and issues Deauth
w/ Reason 7
   (this Deauth doesn't originate from hostapd; it comes from the device FW
in my case)
 8. AP sees TX_STATUS for (5) so it just now installs station entry to
device/driver
 9. AP attempts to send EAPOL but STA is no longer there

I'm able to reproduce this quite easily with QCA6174 (ath10k) acting as P2P
GO and Intel 7260 (iwlmvm) as P2P Client.

This also suggests it's not P2P specific.

To me this looks like a race in hostapd. The station should be installed to
driver _before_ sending Assoc Resp frame, not after. My quick-n-dirty hack
seems to help:

                           strerror(errno));
@@ -2561,7 +2568,6 @@ void ieee802_11_mgmt_cb(struct hostapd_data *hapd,
const u8 *buf, size_t len,
                break;
        case WLAN_FC_STYPE_ASSOC_RESP:
                wpa_printf(MSG_DEBUG, "mgmt::assoc_resp cb");
-               handle_assoc_cb(hapd, mgmt, len, 0, ok);
                break;
        case WLAN_FC_STYPE_REASSOC_RESP:
                wpa_printf(MSG_DEBUG, "mgmt::reassoc_resp cb");


Obviously this is whitespace damaged and incomplete as it doesn't cover all
the possible fail cases. It's just a proof-of-concept for the purpose of
discussion.

Is anyone aware of this problem already? Anyone working on it? Any gotchas
I should be aware of before I go into fixing this in a proper way? Or am I
missing something and this isn't actually a problem?


Michał

Comments

Johannes Berg July 2, 2015, 8:38 a.m. UTC | #1
[please try to send w/o html if you're CC'ing the linux-wireless list]

> To me this looks like a race in hostapd. The station should be 
> installed to driver _before_ sending Assoc Resp frame, not after. My 
> quick-n-dirty hack seems to help:
> 
[...]
> Is anyone aware of this problem already? Anyone working on it? Any 
> gotchas I should be aware of before I go into fixing this in a proper 
> way? Or am I missing something and this isn't actually a problem?

The TI folks had a similar patch that broke open networks, not sure
what was wrong there.

Ultimately, depending on the nl80211 capabilities, the station should
in fact be added (as unauthenticated) before even sending the
authentication response frame, and then stepping through the stages
appropriately.

It should also react to errors by sending a negative association
response I guess.

johannes
Eliad Peller July 2, 2015, 8:43 a.m. UTC | #2
On Thu, Jul 2, 2015 at 11:38 AM, Johannes Berg <johannes@sipsolutions.net>
wrote:

> [please try to send w/o html if you're CC'ing the linux-wireless list]
>
> > To me this looks like a race in hostapd. The station should be
> > installed to driver _before_ sending Assoc Resp frame, not after. My
> > quick-n-dirty hack seems to help:
> >
> [...]
> > Is anyone aware of this problem already? Anyone working on it? Any
> > gotchas I should be aware of before I go into fixing this in a proper
> > way? Or am I missing something and this isn't actually a problem?
>
> The TI folks had a similar patch that broke open networks, not sure
> what was wrong there.
>
> there was some implementation error. it was fixed later on.

also, take a look at this thread:
http://thread.gmane.org/gmane.linux.kernel.wireless.general/88421

Eliad.
Michal Kazior July 2, 2015, 10:28 a.m. UTC | #3
On 2 July 2015 at 10:38, Johannes Berg <johannes@sipsolutions.net> wrote:
> [please try to send w/o html if you're CC'ing the linux-wireless list]

Ah, sorry. I suspect the "plain text mode" in gmail/www got disabled
for some reason for that e-mail..


>> To me this looks like a race in hostapd. The station should be
>> installed to driver _before_ sending Assoc Resp frame, not after. My
>> quick-n-dirty hack seems to help:
>>
> [...]
>> Is anyone aware of this problem already? Anyone working on it? Any
>> gotchas I should be aware of before I go into fixing this in a proper
>> way? Or am I missing something and this isn't actually a problem?
>
> The TI folks had a similar patch that broke open networks, not sure
> what was wrong there.
>
> Ultimately, depending on the nl80211 capabilities, the station should
> in fact be added (as unauthenticated) before even sending the
> authentication response frame, and then stepping through the stages
> appropriately.

While I think it does make sense (I thought of this too, sounds
desirable) I think it wouldn't solve the race problem entirely. The
station might no longer be rejected with Deauth but may end up
confusing AP's internal/offloaded STA powersave state depending on
implementation detail (what do you do when you receive NullFunc from a
station that you don't know assoc id of or isn't fully initialized as
associated?). I.e. station should be transitioned to Assoc state
before sending the Assoc Resp frame.


> It should also react to errors by sending a negative association
> response I guess.

Good point.


Michał
Michal Kazior July 2, 2015, 10:39 a.m. UTC | #4
On 2 July 2015 at 10:43, Eliad Peller <eliad@wizery.com> wrote:
> On Thu, Jul 2, 2015 at 11:38 AM, Johannes Berg <johannes@sipsolutions.net>
> wrote:
>>
>> [please try to send w/o html if you're CC'ing the linux-wireless list]
>>
>> > To me this looks like a race in hostapd. The station should be
>> > installed to driver _before_ sending Assoc Resp frame, not after. My
>> > quick-n-dirty hack seems to help:
>> >
>> [...]
>> > Is anyone aware of this problem already? Anyone working on it? Any
>> > gotchas I should be aware of before I go into fixing this in a proper
>> > way? Or am I missing something and this isn't actually a problem?
>>
>> The TI folks had a similar patch that broke open networks, not sure
>> what was wrong there.
>>
> there was some implementation error. it was fixed later on.
>
> also, take a look at this thread:
> http://thread.gmane.org/gmane.linux.kernel.wireless.general/88421

Thanks! Looks like you guys hit this with EAPOL frames. I'll look into it more.

Is there any particular reason why these TI-OpenLink patches weren't
submitted/merged to hostap?


Michał
Johannes Berg July 2, 2015, 11:41 a.m. UTC | #5
On Thu, 2015-07-02 at 12:28 +0200, Michal Kazior wrote:

> > Ultimately, depending on the nl80211 capabilities, the station 
> > should
> > in fact be added (as unauthenticated) before even sending the
> > authentication response frame, and then stepping through the stages
> > appropriately.
> 
> While I think it does make sense (I thought of this too, sounds
> desirable) I think it wouldn't solve the race problem entirely. The
> station might no longer be rejected with Deauth but may end up
> confusing AP's internal/offloaded STA powersave state depending on
> implementation detail (what do you do when you receive NullFunc from 
> a
> station that you don't know assoc id of or isn't fully initialized as
> associated?). 

We'd send a deauth with "class 3 frame from unassociated STA" reason :)

> I.e. station should be transitioned to Assoc state
> before sending the Assoc Resp frame.

Yeah, I guess that's still true, but it doesn't preclude adding the
station before auth response and sending an auth response depending on
whether it could be added; perhaps we need to set it to authenticated
just before sending the frame as well though.


johannes
Eliad Peller July 2, 2015, 12:45 p.m. UTC | #6
On Thu, Jul 2, 2015 at 1:39 PM, Michal Kazior <michal.kazior@tieto.com> wrote:
> On 2 July 2015 at 10:43, Eliad Peller <eliad@wizery.com> wrote:
>> On Thu, Jul 2, 2015 at 11:38 AM, Johannes Berg <johannes@sipsolutions.net>
>> wrote:
>>>
>>> [please try to send w/o html if you're CC'ing the linux-wireless list]
>>>
>>> > To me this looks like a race in hostapd. The station should be
>>> > installed to driver _before_ sending Assoc Resp frame, not after. My
>>> > quick-n-dirty hack seems to help:
>>> >
>>> [...]
>>> > Is anyone aware of this problem already? Anyone working on it? Any
>>> > gotchas I should be aware of before I go into fixing this in a proper
>>> > way? Or am I missing something and this isn't actually a problem?
>>>
>>> The TI folks had a similar patch that broke open networks, not sure
>>> what was wrong there.
>>>
>> there was some implementation error. it was fixed later on.
>>
>> also, take a look at this thread:
>> http://thread.gmane.org/gmane.linux.kernel.wireless.general/88421
>
> Thanks! Looks like you guys hit this with EAPOL frames. I'll look into it more.
>
> Is there any particular reason why these TI-OpenLink patches weren't
> submitted/merged to hostap?
>
nothing particular, AFAIR.
they needed further review/cleanup before upstream, and we just didn't
get to it.

Eliad.
Jouni Malinen July 7, 2015, 3 p.m. UTC | #7
On Thu, Jul 02, 2015 at 10:24:33AM +0200, Michal Kazior wrote:
> After looking into hostapd code I noticed something strange and I wonder if
> anyone else is already aware of this problem:
> 
>  1. AP starts
>  2. STA->AP auth OTA
>  3. AP->STA auth OTA
>  4. STA->AP assoc req OTA
>  5. AP->STA assoc resp OTA
>  6. STA sends NullFunc with "STA will go to sleep" bit set
>  7. AP driver/device sees a frame from with unknown TA/SA and issues Deauth
> w/ Reason 7
>    (this Deauth doesn't originate from hostapd; it comes from the device FW
> in my case)

If there is a driver or firmware design that sends these
Deauthentication frames on their own, they better be able to handle race
conditions and enable this functionality at the correct time.. Sure,
cfg80211 and hostapd may need modifications to make this work better,
but this needs to be done for things to work properly. There's a good
reason for hostapd having code to check the internal STA associated
flag before triggering deauthentication based on EVENT_RX_FROM_UNKNOWN
events..

> To me this looks like a race in hostapd. The station should be installed to
> driver _before_ sending Assoc Resp frame, not after. My quick-n-dirty hack
> seems to help:

Adding a STA entry before sending Association Response frame would be
fine, but this change would do more: it would claim that STA entry to be
in associated state. That is not correct from the IEEE 802.11 standard
view point. On the AP side, a STA becomes associated when an ACK frame
to the (Re)Association Response frame is received by the AP.
Michal Kazior July 9, 2015, 12:42 p.m. UTC | #8
On 7 July 2015 at 17:00, Jouni Malinen <j@w1.fi> wrote:
> On Thu, Jul 02, 2015 at 10:24:33AM +0200, Michal Kazior wrote:
>> After looking into hostapd code I noticed something strange and I wonder if
>> anyone else is already aware of this problem:
>>
>>  1. AP starts
>>  2. STA->AP auth OTA
>>  3. AP->STA auth OTA
>>  4. STA->AP assoc req OTA
>>  5. AP->STA assoc resp OTA
>>  6. STA sends NullFunc with "STA will go to sleep" bit set
>>  7. AP driver/device sees a frame from with unknown TA/SA and issues Deauth
>> w/ Reason 7
>>    (this Deauth doesn't originate from hostapd; it comes from the device FW
>> in my case)
>
> If there is a driver or firmware design that sends these
> Deauthentication frames on their own, they better be able to handle race
> conditions and enable this functionality at the correct time..

At least one ath10k qca61x4 firmware seems to send Deauth when it gets
a NullFunc from DA which doesn't have an internal peer entry. It
doesn't seem to care whether peer is authenticated or associated
though. Probably having none->auth, auth->assoc state transitions
(instead of a single none->assoc) would be enough - at least for
ath10k today.


> Sure,
> cfg80211 and hostapd may need modifications to make this work better,
> but this needs to be done for things to work properly. There's a good
> reason for hostapd having code to check the internal STA associated
> flag before triggering deauthentication based on EVENT_RX_FROM_UNKNOWN
> events..
>
>> To me this looks like a race in hostapd. The station should be installed to
>> driver _before_ sending Assoc Resp frame, not after. My quick-n-dirty hack
>> seems to help:
>
> Adding a STA entry before sending Association Response frame would be
> fine, but this change would do more: it would claim that STA entry to be
> in associated state. That is not correct from the IEEE 802.11 standard
> view point. On the AP side, a STA becomes associated when an ACK frame
> to the (Re)Association Response frame is received by the AP.

This seems pretty racy inherently. Driver/firmware actually needs to
be aware of this requirement and make sure to avoid reordering Tx ACK
processing and Rx. Since Tx/Rx engines are often (if not always?)
independent of each other I imagine this could be a common problem
across many devices today but perhaps not easily observable due to
narrow time window this could happen in.

Anyway - thank you for your great insight!


Michał
diff mbox

Patch

--- a/src/ap/ieee802_11.c
+++ b/src/ap/ieee802_11.c
@@ -42,6 +42,11 @@ 
 #include "dfs.h"


+static void handle_assoc_cb(struct hostapd_data *hapd,
+                           const struct ieee80211_mgmt *mgmt,
+                           size_t len, int reassoc, int ok);
+
+
 u8 * hostapd_eid_supp_rates(struct hostapd_data *hapd, u8 *eid)
 {
        u8 *pos = eid;
@@ -1675,6 +1680,8 @@  static void send_assoc_resp(struct hostapd_data
*hapd, struct sta_info *sta,

        send_len += p - reply->u.assoc_resp.variable;

+       handle_assoc_cb(hapd, reply, send_len, 0, 1);
+
        if (hostapd_drv_send_mlme(hapd, reply, send_len, 0) < 0)
                wpa_printf(MSG_INFO, "Failed to send assoc resp: %s",