diff mbox series

wpa_supplicant 2.10 - Scan failed

Message ID d9eb10a4-31dc-b87f-2d75-90e9a9e1a361@mail.com
State Not Applicable
Headers show
Series wpa_supplicant 2.10 - Scan failed | expand

Commit Message

Étienne Morice Jan. 27, 2022, 12:06 p.m. UTC
Hi,

Following the update from 2.9 to 2.10 in arch, we had some scan failures reported.

Original report is here:

https://bugs.archlinux.org/task/73495

Upon updating and restarting wpa_supplicant, scan starts to fail with a log of this kind:

> systemd[1]: Started WPA supplicant daemon (interface-specific version).
> Jan 26 22:33:45 ... wpa_supplicant[422]: Successfully initialized wpa_supplicant
> Jan 26 22:33:45 ... wpa_supplicant[422]: wlp3s0: CTRL-EVENT-SCAN-FAILED ret=-22 retry=1
> Jan 26 22:33:46 ... wpa_supplicant[422]: wlp3s0: CTRL-EVENT-SCAN-FAILED ret=-22 retry=1

In one case at least bisecting the problem was possible, here are the conclusions pasted from the arch thread:

=====================================

For me the first commits that make it stop working are:
a11804724 MSCS: Add support to send MSCS Request frames
c005283c4 SCS: Sending of SCS Request frames

These introduced in wpa_supplicant/wpa_supplicant.c, in wpas_ext_capab_byte, two new flags:
```
case 6: /* Bits 48-55 */
		*pos |= 0x40; /* Bit 54 - SCS */
```
and
```
case 10: /* Bits 80-87 */
		*pos |= 0x20; /* Bit 85 - Mirrored SCS */
```

If I comment these out, or set mscs = scs = false at the beginning of the function:
```
I get it to work again. I'm posting from hostap_2_10 with this modification right now.

Also, commenting the rest of the first breaking commit made no
difference, so it seems to me that it's really these flags that cause
the problem for me and not the rest of code depending on them introduced
at the same time.

Now the problem is, I have no idea what these actually do and how they cause the scan to fail.

Most crucially, I don't know at all if wpa_supplicant is doing something
wrong here, or if that code is actually correct but these flags are
somehow propagated down the stack to components (drivers ?) that cannot
handle them yet. So I'm not even sure if this should be fixed upstream
here or in other packages.

======================================

Any idea how to proceed from here ?

Best,

Étienne Morice

Comments

Étienne Morice Jan. 29, 2022, 11:25 p.m. UTC | #1
Update as I've been doing a few more checks.

I should specify these tests are with the broadcom-wl 6.30.223.271-357
driver on a Broadcom BCM43142 controller.

First, I checked why the new SCS/MSCS was making the scan fail.

The reason is, it causes nl80211_scan_common in
src/drivers/driver_nl80211_scan.c to add some extra IEs in the scan
request, via

 > nla_put(msg, NL80211_ATTR_IE, params->extra_ies_len, params->extra_ies)

(commenting out this block also makes the scans succeed again)

and these extra_ies cause the cfg80211 kernel module to reject the scan
request with EINVAL based on the following check in
linux/net/wireless/nl80211.c in nl80211_trigger_scan :

 >  if (ie_len > wiphy->max_scan_ie_len)
 > return -EINVAL;

In this case, ie_len is set to 13 in the scan request and
max_scan_ie_len is set (default initialized) to 0 in the wl driver,
hence the reported behavior.

To double check I recompiled my broadcom-wl driver with this added
initialization line:

 > wdev->wiphy->max_scan_ie_len = 1024;

(meaning the driver still completely ignores the extra IEs, but it
declares that it accepts some)

and indeed the scan also started to work again with the original 2.10
release.

So my takeaway is that one of these two should happen:
  1) wpa_supplicant should check if the driver accepts extra IEs in the
scan request before actually putting any.
  2) drivers should be patched to make sure they accept extra IEs in the
scan request even if they then silently discard them.

Then I'm not qualified to say which of these two (or something else
entirely) is the proper fix. However the fact that the cfg80211 module
has an explicit check for it makes me guess that drivers are allowed to
limit extra IEs and that wpa_supplicant should honor this limit ?

Best,
Étienne Morice

On 1/27/22 13:06, Étienne Morice wrote:
> Hi,
>
> Following the update from 2.9 to 2.10 in arch, we had some scan failures
> reported.
>
> Original report is here:
>
> https://bugs.archlinux.org/task/73495
>
> Upon updating and restarting wpa_supplicant, scan starts to fail with a
> log of this kind:
>
>> systemd[1]: Started WPA supplicant daemon (interface-specific version).
>> Jan 26 22:33:45 ... wpa_supplicant[422]: Successfully initialized
>> wpa_supplicant
>> Jan 26 22:33:45 ... wpa_supplicant[422]: wlp3s0:
>> CTRL-EVENT-SCAN-FAILED ret=-22 retry=1
>> Jan 26 22:33:46 ... wpa_supplicant[422]: wlp3s0:
>> CTRL-EVENT-SCAN-FAILED ret=-22 retry=1
>
> In one case at least bisecting the problem was possible, here are the
> conclusions pasted from the arch thread:
>
> =====================================
>
> For me the first commits that make it stop working are:
> a11804724 MSCS: Add support to send MSCS Request frames
> c005283c4 SCS: Sending of SCS Request frames
>
> These introduced in wpa_supplicant/wpa_supplicant.c, in
> wpas_ext_capab_byte, two new flags:
> ```
> case 6: /* Bits 48-55 */
>          *pos |= 0x40; /* Bit 54 - SCS */
> ```
> and
> ```
> case 10: /* Bits 80-87 */
>          *pos |= 0x20; /* Bit 85 - Mirrored SCS */
> ```
>
> If I comment these out, or set mscs = scs = false at the beginning of
> the function:
> ```
> diff --git a/wpa_supplicant/wpa_supplicant.c
> b/wpa_supplicant/wpa_supplicant.c
> index d37a994f9..561e5f3ed 100644
> --- a/wpa_supplicant/wpa_supplicant.c
> +++ b/wpa_supplicant/wpa_supplicant.c
> @@ -1883,7 +1883,7 @@ int wpa_supplicant_set_suites(struct
> wpa_supplicant *wpa_s,
>
>   static void wpas_ext_capab_byte(struct wpa_supplicant *wpa_s, u8 *pos,
> int idx)
>   {
> -       bool scs = true, mscs = true;
> +       bool scs = false, mscs = false;
>
>          *pos = 0x00;
> ```
> I get it to work again. I'm posting from hostap_2_10 with this
> modification right now.
>
> Also, commenting the rest of the first breaking commit made no
> difference, so it seems to me that it's really these flags that cause
> the problem for me and not the rest of code depending on them introduced
> at the same time.
>
> Now the problem is, I have no idea what these actually do and how they
> cause the scan to fail.
>
> Most crucially, I don't know at all if wpa_supplicant is doing something
> wrong here, or if that code is actually correct but these flags are
> somehow propagated down the stack to components (drivers ?) that cannot
> handle them yet. So I'm not even sure if this should be fixed upstream
> here or in other packages.
>
> ======================================
>
> Any idea how to proceed from here ?
>
> Best,
>
> Étienne Morice
>
>
David Bauer Jan. 30, 2022, 12:49 p.m. UTC | #2
Hi Etienne,

On 1/30/22 00:25, Étienne Morice wrote:
> Update as I've been doing a few more checks.
> 
> I should specify these tests are with the broadcom-wl 6.30.223.271-357
> driver on a Broadcom BCM43142 controller.
> 
> First, I checked why the new SCS/MSCS was making the scan fail.
> 
> The reason is, it causes nl80211_scan_common in
> src/drivers/driver_nl80211_scan.c to add some extra IEs in the scan
> request, via

As a sidenote: Similar breakage was observed a while back with broadcom-wl
when Debian and Arch enabled CONFIG_WNM for their wpa_supplicant
distributions [0].

The kernel exposes the maximum supported bytes to userspace with
which can be retrieved in wiphy_info_handler.

See the attached patch. Does scanning work with thid patch applied to v2.10?

[0] https://bugs.archlinux.org/task/65588

Best
David

> 
>  > nla_put(msg, NL80211_ATTR_IE, params->extra_ies_len, params->extra_ies)
> 
> (commenting out this block also makes the scans succeed again)
> 
> and these extra_ies cause the cfg80211 kernel module to reject the scan
> request with EINVAL based on the following check in
> linux/net/wireless/nl80211.c in nl80211_trigger_scan :
> 
>  >  if (ie_len > wiphy->max_scan_ie_len)
>  > return -EINVAL;
> 
> In this case, ie_len is set to 13 in the scan request and
> max_scan_ie_len is set (default initialized) to 0 in the wl driver,
> hence the reported behavior.
> 
> To double check I recompiled my broadcom-wl driver with this added
> initialization line:
> 
>  > wdev->wiphy->max_scan_ie_len = 1024;
> 
> (meaning the driver still completely ignores the extra IEs, but it
> declares that it accepts some)
> 
> and indeed the scan also started to work again with the original 2.10
> release.
> 
> So my takeaway is that one of these two should happen:
>   1) wpa_supplicant should check if the driver accepts extra IEs in the
> scan request before actually putting any.
>   2) drivers should be patched to make sure they accept extra IEs in the
> scan request even if they then silently discard them.
> 
> Then I'm not qualified to say which of these two (or something else
> entirely) is the proper fix. However the fact that the cfg80211 module
> has an explicit check for it makes me guess that drivers are allowed to
> limit extra IEs and that wpa_supplicant should honor this limit ?
> 
> Best,
> Étienne Morice
> 
> On 1/27/22 13:06, Étienne Morice wrote:
>> Hi,
>>
>> Following the update from 2.9 to 2.10 in arch, we had some scan failures
>> reported.
>>
>> Original report is here:
>>
>> https://bugs.archlinux.org/task/73495
>>
>> Upon updating and restarting wpa_supplicant, scan starts to fail with a
>> log of this kind:
>>
>>> systemd[1]: Started WPA supplicant daemon (interface-specific version).
>>> Jan 26 22:33:45 ... wpa_supplicant[422]: Successfully initialized
>>> wpa_supplicant
>>> Jan 26 22:33:45 ... wpa_supplicant[422]: wlp3s0:
>>> CTRL-EVENT-SCAN-FAILED ret=-22 retry=1
>>> Jan 26 22:33:46 ... wpa_supplicant[422]: wlp3s0:
>>> CTRL-EVENT-SCAN-FAILED ret=-22 retry=1
>>
>> In one case at least bisecting the problem was possible, here are the
>> conclusions pasted from the arch thread:
>>
>> =====================================
>>
>> For me the first commits that make it stop working are:
>> a11804724 MSCS: Add support to send MSCS Request frames
>> c005283c4 SCS: Sending of SCS Request frames
>>
>> These introduced in wpa_supplicant/wpa_supplicant.c, in
>> wpas_ext_capab_byte, two new flags:
>> ```
>> case 6: /* Bits 48-55 */
>>          *pos |= 0x40; /* Bit 54 - SCS */
>> ```
>> and
>> ```
>> case 10: /* Bits 80-87 */
>>          *pos |= 0x20; /* Bit 85 - Mirrored SCS */
>> ```
>>
>> If I comment these out, or set mscs = scs = false at the beginning of
>> the function:
>> ```
>> diff --git a/wpa_supplicant/wpa_supplicant.c
>> b/wpa_supplicant/wpa_supplicant.c
>> index d37a994f9..561e5f3ed 100644
>> --- a/wpa_supplicant/wpa_supplicant.c
>> +++ b/wpa_supplicant/wpa_supplicant.c
>> @@ -1883,7 +1883,7 @@ int wpa_supplicant_set_suites(struct
>> wpa_supplicant *wpa_s,
>>
>>   static void wpas_ext_capab_byte(struct wpa_supplicant *wpa_s, u8 *pos,
>> int idx)
>>   {
>> -       bool scs = true, mscs = true;
>> +       bool scs = false, mscs = false;
>>
>>          *pos = 0x00;
>> ```
>> I get it to work again. I'm posting from hostap_2_10 with this
>> modification right now.
>>
>> Also, commenting the rest of the first breaking commit made no
>> difference, so it seems to me that it's really these flags that cause
>> the problem for me and not the rest of code depending on them introduced
>> at the same time.
>>
>> Now the problem is, I have no idea what these actually do and how they
>> cause the scan to fail.
>>
>> Most crucially, I don't know at all if wpa_supplicant is doing something
>> wrong here, or if that code is actually correct but these flags are
>> somehow propagated down the stack to components (drivers ?) that cannot
>> handle them yet. So I'm not even sure if this should be fixed upstream
>> here or in other packages.
>>
>> ======================================
>>
>> Any idea how to proceed from here ?
>>
>> Best,
>>
>> Étienne Morice
>>
>>
> 
> _______________________________________________
> Hostap mailing list
> Hostap@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/hostap
Étienne Morice Jan. 30, 2022, 2 p.m. UTC | #3
Hi David,

On 1/30/22 13:49, David Bauer wrote:

 >
 > The kernel exposes the maximum supported bytes to userspace with
 > which can be retrieved in wiphy_info_handler.
 >
 > See the attached patch. Does scanning work with thid patch applied to
 > v2.10?
 >
 > [0] https://bugs.archlinux.org/task/65588
 >
 > Best
 > David

Thanks ! This almost works. The patch doesn't fix the problem as-is, but
I suspect it's just a typo in the direction of the comparison in the
check in nl80211_scan_common. If I flip the direction as attached, it
then works with the orginial patch + the modification applied to 2.10.

Best,
Étienne
David Bauer Jan. 30, 2022, 7:01 p.m. UTC | #4
Hi Etienne,

On 1/30/22 15:00, Étienne Morice wrote:
> Hi David,
> 
> On 1/30/22 13:49, David Bauer wrote:
> 
>  >
>  > The kernel exposes the maximum supported bytes to userspace with
>  > which can be retrieved in wiphy_info_handler.
>  >
>  > See the attached patch. Does scanning work with thid patch applied to
>  > v2.10?
>  >
>  > [0] https://bugs.archlinux.org/task/65588
>  >
>  > Best
>  > David
> 
> Thanks ! This almost works. The patch doesn't fix the problem as-is, but
> I suspect it's just a typo in the direction of the comparison in the
> check in nl80211_scan_common. If I flip the direction as attached, it
> then works with the orginial patch + the modification applied to 2.10.

Great to hear!

One additional question (As i do not have the hardware): Does scan & association
work with CONFIG_WNM enabled? As indicated in my last name, this broke broadcom-wl
on 2.9, from my understanding the patch should also solve this.

Best
David

> 
> Best,
> Étienne
Étienne Morice Jan. 30, 2022, 9:46 p.m. UTC | #5
Hi David,

On 1/30/22 20:01, David Bauer wrote:
> Hi Etienne,
>
> On 1/30/22 15:00, Étienne Morice wrote:
>> Hi David,
>>
>> On 1/30/22 13:49, David Bauer wrote:
>>
>>  >
>>  > The kernel exposes the maximum supported bytes to userspace with
>>  > which can be retrieved in wiphy_info_handler.
>>  >
>>  > See the attached patch. Does scanning work with thid patch applied to
>>  > v2.10?
>>  >
>>  > [0] https://bugs.archlinux.org/task/65588
>>  >
>>  > Best
>>  > David
>>
>> Thanks ! This almost works. The patch doesn't fix the problem as-is, but
>> I suspect it's just a typo in the direction of the comparison in the
>> check in nl80211_scan_common. If I flip the direction as attached, it
>> then works with the orginial patch + the modification applied to 2.10.
>
> Great to hear!
>
> One additional question (As i do not have the hardware): Does scan &
> association
> work with CONFIG_WNM enabled? As indicated in my last name, this broke
> broadcom-wl
> on 2.9, from my understanding the patch should also solve this.
>
> Best
> David

Yes, it works. I can indeed reproduce the same problem on 2.10 if I set
the [m]scs flags to false in wpas_wpas_ext_capab_byte (which by itself
works) but turn on CONFIG_WNM=y, and then I get the same failure with
ie_len=6 getting passed down to cfg80211. With the patch on top it scans
and associates correctly with CONFIG_WNM=y (both with and without the
[m]scs flags).

Best,
Étienne.
diff mbox series

Patch

diff --git a/wpa_supplicant/wpa_supplicant.c b/wpa_supplicant/wpa_supplicant.c
index d37a994f9..561e5f3ed 100644
--- a/wpa_supplicant/wpa_supplicant.c
+++ b/wpa_supplicant/wpa_supplicant.c
@@ -1883,7 +1883,7 @@  int wpa_supplicant_set_suites(struct wpa_supplicant *wpa_s,

  static void wpas_ext_capab_byte(struct wpa_supplicant *wpa_s, u8 *pos, int idx)
  {
-       bool scs = true, mscs = true;
+       bool scs = false, mscs = false;

         *pos = 0x00;
```