Patchwork I always need a miracle to connect with iwlwifi

Submitter Felipe Contreras
Date Nov. 8, 2013, 2:18 p.m.
Message ID <CAMP44s0h836BZ+g9-g_t78bKuV_m_xAuvPYRCHaFdJ58eTKthw@mail.gmail.com>
Permalink /patch/289844/
State Not Applicable

Comments

Felipe Contreras - Nov. 8, 2013, 2:18 p.m.
On Fri, Nov 8, 2013 at 8:06 AM, Krishna Chaitanya
<chaitanya.mgit@gmail.com> wrote:
> On Fri, Nov 8, 2013 at 6:44 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> On Fri, Nov 8, 2013 at 2:35 AM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> On Sat, Nov 2, 2013 at 2:05 PM, Krishna Chaitanya
>>> <chaitanya.mgit@gmail.com> wrote:
>>>
>>>> Also one more thing you said N900 uses mac80211 and it has no issues, but as
>>>> its a embedded device it might running an older kernel where the
>>>> handling might be
>>>> different, so we need to try with the same kernel you are facing an
>>>> issue with the
>>>> a driver which advertises IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
>>>
>>> Yes it was running an older kernel, but I just compiled v3.12 and ran
>>> it on the N900, and still everything works fine.
>>>
>>>> (or) if you a have a compilation environment try commenting the advertisement of
>>>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM driver and
>>>> try to reproduce the issue.
>>>
>>> After commenting that flag everything works fine :)
>
> Oh, great. That was just to corner the problem, that means we are not getting
> the required beacon  before the association, but we only wait for 1 beacon here
> may be we to wait for some number of beacons before giving up the association??
>
> Johannes??

But we are receiving 0 beacons, waiting for more than 1 won't help.
BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
before the association?

>>> What are the next steps?
>>
>> I tried to add some debugging to see what's going on, and indeed the
>> beacon packets are lost, I added debugging as low in the chain as I
>> could (iwlagn_rx_reply_rx()), and I don't see them there. However,
>> when I enable the monitor mode, I see them. What's going on?
>
> In the captures you shared all the beacons are malformed, so
> probably they failed the CRC check. iwlwifi drops all the CRC failed
> packets. (both MVM and DVM)

Before iwlagn_rx_reply_rx()?

> Not sure how you are receiving the beacons in the monitor mode.

I don't know what kismet does, but I can see my debugging is printing them.

> BTW did you tried capturing the beacons in other devices and see if they
> are really malformed, or is it just iwlwifi interpreting them wrongly.?

I haven't managed to do that yet.

This is what I'm doing:
Chaitanya TK - Nov. 8, 2013, 8:30 p.m.
On Fri, Nov 8, 2013 at 7:48 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Fri, Nov 8, 2013 at 8:06 AM, Krishna Chaitanya
> <chaitanya.mgit@gmail.com> wrote:
>> On Fri, Nov 8, 2013 at 6:44 PM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> On Fri, Nov 8, 2013 at 2:35 AM, Felipe Contreras
>>> <felipe.contreras@gmail.com> wrote:
>>>> On Sat, Nov 2, 2013 at 2:05 PM, Krishna Chaitanya
>>>> <chaitanya.mgit@gmail.com> wrote:
>>>>
>>>>> Also one more thing you said N900 uses mac80211 and it has no issues, but as
>>>>> its a embedded device it might running an older kernel where the
>>>>> handling might be
>>>>> different, so we need to try with the same kernel you are facing an
>>>>> issue with the
>>>>> a driver which advertises IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
>>>>
>>>> Yes it was running an older kernel, but I just compiled v3.12 and ran
>>>> it on the N900, and still everything works fine.
>>>>
>>>>> (or) if you a have a compilation environment try commenting the advertisement of
>>>>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM driver and
>>>>> try to reproduce the issue.
>>>>
>>>> After commenting that flag everything works fine :)
>>
>> Oh, great. That was just to corner the problem, that means we are not getting
>> the required beacon  before the association, but we only wait for 1 beacon here
>> may be we to wait for some number of beacons before giving up the association??
>>
>> Johannes??
>
> But we are receiving 0 beacons, waiting for more than 1 won't help.
> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
> before the association?
>
>>>> What are the next steps?
>>>
>>> I tried to add some debugging to see what's going on, and indeed the
>>> beacon packets are lost, I added debugging as low in the chain as I
>>> could (iwlagn_rx_reply_rx()), and I don't see them there. However,
>>> when I enable the monitor mode, I see them. What's going on?
>>
>> In the captures you shared all the beacons are malformed, so
>> probably they failed the CRC check. iwlwifi drops all the CRC failed
>> packets. (both MVM and DVM)
>
> Before iwlagn_rx_reply_rx()?
>
>> Not sure how you are receiving the beacons in the monitor mode.
>
> I don't know what kismet does, but I can see my debugging is printing them.
>
>> BTW did you tried capturing the beacons in other devices and see if they
>> are really malformed, or is it just iwlwifi interpreting them wrongly.?
>
> I haven't managed to do that yet.
>
> This is what I'm doing:
>
> --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
> +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
> @@ -919,6 +919,11 @@ static int iwlagn_rx_reply_rx(struct iwl_priv *priv,
>         ampdu_status = iwlagn_translate_rx_status(priv,
>                                                   le32_to_cpu(rx_pkt_status));
>
> +       if (ieee80211_is_beacon(header->frame_control)) {
> +               print_hex_dump(KERN_INFO, "iwlwifi: dump: ", DUMP_PREFIX_OFFSET,
> +                               16, 1, header, len, true);
> +       }
> +
>         if ((unlikely(phy_res->cfg_phy_cnt > 20))) {
>                 IWL_DEBUG_DROP(priv, "dsp size out of range [0,20]: %d\n",
>                                 phy_res->cfg_phy_cnt);
>
Oops... you just missed it. Right after your print there is a check
that drops frames with a bad CRC :-).
See line 928.

    ampdu_status = iwlagn_translate_rx_status(priv,
                          le32_to_cpu(rx_pkt_status));

    if ((unlikely(phy_res->cfg_phy_cnt > 20))) {
        IWL_DEBUG_DROP(priv, "dsp size out of range [0,20]: %d\n",
                phy_res->cfg_phy_cnt);
        return 0;
    }

    if (!(rx_pkt_status & RX_RES_STATUS_NO_CRC32_ERROR) ||
        !(rx_pkt_status & RX_RES_STATUS_NO_RXE_OVERFLOW)) {
        IWL_DEBUG_RX(priv, "Bad CRC or FIFO: 0x%08X.\n",
                le32_to_cpu(rx_pkt_status));
        return 0;
    }
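
The drop check above can be tried outside the kernel. This is only a minimal
userspace sketch: the SKETCH_* bit positions and the function name are
assumptions made for illustration, not the driver's definitions, and the
status word is taken in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions (assumption); the driver defines its own. */
#define SKETCH_STATUS_NO_CRC32_ERROR  (1u << 0)
#define SKETCH_STATUS_NO_RXE_OVERFLOW (1u << 1)

/*
 * Hypothetical helper: returns true when a frame would survive the
 * "Bad CRC or FIFO" check, i.e. when both "no error" bits are set.
 */
static bool sketch_frame_passes_status(uint32_t rx_pkt_status)
{
    if (!(rx_pkt_status & SKETCH_STATUS_NO_CRC32_ERROR) ||
        !(rx_pkt_status & SKETCH_STATUS_NO_RXE_OVERFLOW))
        return false; /* dropped: CRC failed or RX FIFO overflowed */

    return true;
}
```

A frame is kept only when both bits are set, so a beacon with a failed CRC
that reaches this point is dropped here.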
Felipe Contreras - Nov. 8, 2013, 8:52 p.m.
On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
<chaitanya.mgit@gmail.com> wrote:
> On Fri, Nov 8, 2013 at 7:48 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> On Fri, Nov 8, 2013 at 8:06 AM, Krishna Chaitanya
>> <chaitanya.mgit@gmail.com> wrote:
>>> On Fri, Nov 8, 2013 at 6:44 PM, Felipe Contreras
>>> <felipe.contreras@gmail.com> wrote:
>>>> On Fri, Nov 8, 2013 at 2:35 AM, Felipe Contreras
>>>> <felipe.contreras@gmail.com> wrote:
>>>>> On Sat, Nov 2, 2013 at 2:05 PM, Krishna Chaitanya
>>>>> <chaitanya.mgit@gmail.com> wrote:
>>>>>
>>>>>> Also one more thing you said N900 uses mac80211 and it has no issues, but as
>>>>>> its a embedded device it might running an older kernel where the
>>>>>> handling might be
>>>>>> different, so we need to try with the same kernel you are facing an
>>>>>> issue with the
>>>>>> a driver which advertises IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
>>>>>
>>>>> Yes it was running an older kernel, but I just compiled v3.12 and ran
>>>>> it on the N900, and still everything works fine.
>>>>>
>>>>>> (or) if you a have a compilation environment try commenting the advertisement of
>>>>>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM driver and
>>>>>> try to reproduce the issue.
>>>>>
>>>>> After commenting that flag everything works fine :)
>>>
>>> Oh, great. That was just to corner the problem, that means we are not getting
>>> the required beacon  before the association, but we only wait for 1 beacon here
>>> may be we to wait for some number of beacons before giving up the association??
>>>
>>> Johannes??
>>
>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>> before the association?
>>
>>>>> What are the next steps?
>>>>
>>>> I tried to add some debugging to see what's going on, and indeed the
>>>> beacon packets are lost, I added debugging as low in the chain as I
>>>> could (iwlagn_rx_reply_rx()), and I don't see them there. However,
>>>> when I enable the monitor mode, I see them. What's going on?
>>>
>>> In the captures you shared all the beacons are malformed, so
>>> probably they failed the CRC check. iwlwifi drops all the CRC failed
>>> packets. (both MVM and DVM)
>>
>> Before iwlagn_rx_reply_rx()?
>>
>>> Not sure how you are receiving the beacons in the monitor mode.
>>
>> I don't know what kismet does, but I can see my debugging is printing them.
>>
>>> BTW did you tried capturing the beacons in other devices and see if they
>>> are really malformed, or is it just iwlwifi interpreting them wrongly.?
>>
>> I haven't managed to do that yet.
>>
>> This is what I'm doing:
>>
>> --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
>> +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
>> @@ -919,6 +919,11 @@ static int iwlagn_rx_reply_rx(struct iwl_priv *priv,
>>         ampdu_status = iwlagn_translate_rx_status(priv,
>>                                                   le32_to_cpu(rx_pkt_status));
>>
>> +       if (ieee80211_is_beacon(header->frame_control)) {
>> +               print_hex_dump(KERN_INFO, "iwlwifi: dump: ", DUMP_PREFIX_OFFSET,
>> +                               16, 1, header, len, true);
>> +       }
>> +
>>         if ((unlikely(phy_res->cfg_phy_cnt > 20))) {
>>                 IWL_DEBUG_DROP(priv, "dsp size out of range [0,20]: %d\n",
>>                                 phy_res->cfg_phy_cnt);
>>
> Oops...you just missed, Right after your print there is a check to
> drop frames with BAD CRC :-).

That's why I put the print before that check. Since I don't see the
print, that means the check was never executed. iwlagn_rx_reply_rx()
was never called for the beacon frame.
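
To make the ordering argument concrete, here is a small userspace sketch. It
is not driver code: the struct, the function names and the crc_ok parameter
are made up for illustration, and only the frame-control masks come from
IEEE 802.11. A counter stands in for the debug print.

```c
#include <stdbool.h>
#include <stdint.h>

/* IEEE 802.11 frame-control masks, in host byte order for this sketch. */
#define SKETCH_FCTL_FTYPE   0x000c
#define SKETCH_FCTL_STYPE   0x00f0
#define SKETCH_FTYPE_MGMT   0x0000
#define SKETCH_STYPE_BEACON 0x0080

/* A beacon is a management frame whose subtype is "beacon". */
static bool sketch_is_beacon(uint16_t fc)
{
    return (fc & (SKETCH_FCTL_FTYPE | SKETCH_FCTL_STYPE)) ==
           (SKETCH_FTYPE_MGMT | SKETCH_STYPE_BEACON);
}

/* Hypothetical counters standing in for the debug output. */
struct sketch_rx_stats {
    unsigned int beacons_seen;     /* incremented where the print sits */
    unsigned int frames_passed_up; /* frames that survived the drop check */
};

/* crc_ok stands in for the status-word check that follows the print. */
static void sketch_rx_handler(struct sketch_rx_stats *st, uint16_t fc,
                              bool crc_ok)
{
    if (sketch_is_beacon(fc))
        st->beacons_seen++; /* runs before any drop decision */

    if (!crc_ok)
        return; /* bad CRC: dropped, but already counted above */

    st->frames_passed_up++;
}
```

If the handler were called for a corrupted beacon, beacons_seen would still
go up, so an empty log means the handler was never reached for that frame.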
Chaitanya TK - Nov. 9, 2013, 1:10 p.m.
On Sat, Nov 9, 2013 at 2:22 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
> <chaitanya.mgit@gmail.com> wrote:
>> On Fri, Nov 8, 2013 at 7:48 PM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> On Fri, Nov 8, 2013 at 8:06 AM, Krishna Chaitanya
>>> <chaitanya.mgit@gmail.com> wrote:
>>>> On Fri, Nov 8, 2013 at 6:44 PM, Felipe Contreras
>>>> <felipe.contreras@gmail.com> wrote:
>>>>> On Fri, Nov 8, 2013 at 2:35 AM, Felipe Contreras
>>>>> <felipe.contreras@gmail.com> wrote:
>>>>>> On Sat, Nov 2, 2013 at 2:05 PM, Krishna Chaitanya
>>>>>> <chaitanya.mgit@gmail.com> wrote:
>>>>>>
>>>>>>> Also one more thing you said N900 uses mac80211 and it has no issues, but as
>>>>>>> its a embedded device it might running an older kernel where the
>>>>>>> handling might be
>>>>>>> different, so we need to try with the same kernel you are facing an
>>>>>>> issue with the
>>>>>>> a driver which advertises IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.
>>>>>>
>>>>>> Yes it was running an older kernel, but I just compiled v3.12 and ran
>>>>>> it on the N900, and still everything works fine.
>>>>>>
>>>>>>> (or) if you a have a compilation environment try commenting the advertisement of
>>>>>>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC in the iwlwifi DVM driver and
>>>>>>> try to reproduce the issue.
>>>>>>
>>>>>> After commenting that flag everything works fine :)
>>>>
>>>> Oh, great. That was just to corner the problem, that means we are not getting
>>>> the required beacon  before the association, but we only wait for 1 beacon here
>>>> may be we to wait for some number of beacons before giving up the association??
>>>>
>>>> Johannes??
>>>
>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>> before the association?
>>>
This is not just about your case, but a more general note. Regarding
the flag, I am not too sure either, but I guess some hardware needs to
know the DTIM to set the wakeup schedule after the association?
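
As a rough illustration of why a wakeup schedule depends on the DTIM: the
time between DTIM beacons is the beacon interval times the DTIM period, with
the beacon interval expressed in time units (TU) of 1024 microseconds. The
helper name below is hypothetical, and this is plain arithmetic, not what any
firmware actually does.

```c
#include <stdint.h>

/* One 802.11 time unit (TU) is 1024 microseconds. */
#define SKETCH_TU_IN_USEC 1024u

/*
 * Hypothetical helper: time between two DTIM beacons in microseconds,
 * given the beacon interval in TU and the DTIM period in beacons.
 */
static uint64_t sketch_dtim_interval_usec(uint16_t beacon_int_tu,
                                          uint8_t dtim_period)
{
    return (uint64_t)beacon_int_tu * dtim_period * SKETCH_TU_IN_USEC;
}
```

With a beacon interval of 100 TU and a DTIM period of 3, a sleeping station
would have to wake every 307200 microseconds to catch each DTIM beacon. Both
values are carried in the beacon, which is presumably why a device that
schedules wakeups wants to see one first.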

>>>>>> What are the next steps?
>>>>>
>>>>> I tried to add some debugging to see what's going on, and indeed the
>>>>> beacon packets are lost, I added debugging as low in the chain as I
>>>>> could (iwlagn_rx_reply_rx()), and I don't see them there. However,
>>>>> when I enable the monitor mode, I see them. What's going on?
>>>>
>>>> In the captures you shared all the beacons are malformed, so
>>>> probably they failed the CRC check. iwlwifi drops all the CRC failed
>>>> packets. (both MVM and DVM)
>>>
>>> Before iwlagn_rx_reply_rx()?
>>>
>>>> Not sure how you are receiving the beacons in the monitor mode.
>>>
>>> I don't know what kismet does, but I can see my debugging is printing them.
>>>
>>>> BTW did you tried capturing the beacons in other devices and see if they
>>>> are really malformed, or is it just iwlwifi interpreting them wrongly.?
>>>
>>> I haven't managed to do that yet.
>>>
>>> This is what I'm doing:
>>>
>>> --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
>>> +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
>>> @@ -919,6 +919,11 @@ static int iwlagn_rx_reply_rx(struct iwl_priv *priv,
>>>         ampdu_status = iwlagn_translate_rx_status(priv,
>>>                                                   le32_to_cpu(rx_pkt_status));
>>>
>>> +       if (ieee80211_is_beacon(header->frame_control)) {
>>> +               print_hex_dump(KERN_INFO, "iwlwifi: dump: ", DUMP_PREFIX_OFFSET,
>>> +                               16, 1, header, len, true);
>>> +       }
>>> +
>>>         if ((unlikely(phy_res->cfg_phy_cnt > 20))) {
>>>                 IWL_DEBUG_DROP(priv, "dsp size out of range [0,20]: %d\n",
>>>                                 phy_res->cfg_phy_cnt);
>>>
>> Oops...you just missed, Right after your print there is a check to
>> drop frames with BAD CRC :-).
>
> That's why I put the print before that check. Since I don't see the
> print, that means the check was never executed. iwlagn_rx_reply_rx()
> was never called for the beacon frame.
>
OK. So when we disable advertising of that flag in the driver, you said
things work fine. In that scenario, are you seeing the beacons after
the connection?

I just want to understand whether the problem is there throughout or only
before association. If the driver itself is not getting the beacons, then
our debugging ends there, and someone from Intel should guide you through
the FW debugging.
Felipe Contreras - Nov. 9, 2013, 4:52 p.m.
On Sat, Nov 9, 2013 at 7:10 AM, Krishna Chaitanya
<chaitanya.mgit@gmail.com> wrote:
> On Sat, Nov 9, 2013 at 2:22 AM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
>> <chaitanya.mgit@gmail.com> wrote:

>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>> before the association?
>>>>
> This is not just for your case but rather on a generic note. Regarding
> the flag even i am not
> too sure but i guess some hardware need to know the DTIM to set the
> wakeup schedule
> after the association?

But not this hardware? Because everything works fine.

>>> Oops...you just missed, Right after your print there is a check to
>>> drop frames with BAD CRC :-).
>>
>> That's why I put the print before that check. Since I don't see the
>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>> was never called for the beacon frame.
>>
> Ok. So when we disable advertising of that flag in the driver you said things
> are working fine.

Yes, everything works perfectly.

> So in that scenario after the connection are you
> seeing the beacons?

No, there are no beacons ever, at least from this AP.

It seems to me all the beacon frames are dropped by the firmware
before they are passed to the driver. The driver never gets them, so it
cannot parse them and do something sensible even though they are
corrupted.

> Just want to understand the problem is throughout or just before association.
> If the driver itself it not getting the beacons then our debugging ends there,
> some one from intel should guide you through the FW debugging.

Not really, part of the debugging ends there, but we can still do something.

What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
*need* this? Why fail the association completely, if we don't need to?

Also, I realized that after rebooting the router, the beacon frames
are not corrupted any more, so it's a compound problem, yet even in
the corrupted case, the driver can work just fine, if only it didn't
*require* the DTIM unnecessarily, as apparently all hardware and even
other OS'es on this hardware do.

Cheers.
Chaitanya TK - Nov. 9, 2013, 7:10 p.m.
On Sat, Nov 9, 2013 at 10:22 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Sat, Nov 9, 2013 at 7:10 AM, Krishna Chaitanya
> <chaitanya.mgit@gmail.com> wrote:
>> On Sat, Nov 9, 2013 at 2:22 AM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
>>> <chaitanya.mgit@gmail.com> wrote:
>
>>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>>> before the association?
>>>>>
>> This is not just for your case but rather on a generic note. Regarding
>> the flag even i am not
>> too sure but i guess some hardware need to know the DTIM to set the
>> wakeup schedule
>> after the association?
>
> But not this hardware? Because everything works fine.
>
>>>> Oops...you just missed, Right after your print there is a check to
>>>> drop frames with BAD CRC :-).
>>>
>>> That's why I put the print before that check. Since I don't see the
>>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>>> was never called for the beacon frame.
>>>
>> Ok. So when we disable advertising of that flag in the driver you said things
>> are working fine.
>
> Yes, everything works perfectly.
>
>> So in that scenario after the connection are you
>> seeing the beacons?
>
> No, there are no beacons ever, at least from this AP
Oh OK, that's interesting. Are you not seeing any disconnects due
to beacon loss triggers?

Also, can you add some debugging to iwlagn_rx_beacon_notif()
(the beacon RX handler)?
>
> It seems to me all the beacon frames are dropped by the firmware
> before passing them to the driver, so the driver cannot parse them and
> do something sensible even though they are corrupted, the driver never
> gets them.
>
>> Just want to understand the problem is throughout or just before association.
>> If the driver itself it not getting the beacons then our debugging ends there,
>> some one from intel should guide you through the FW debugging.
>
> Not really, part of the debugging ends there, but we can still do something.
>
> What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
> *need* this? Why fail the association completely, if we don't need to?
>
> Also, I realized that after rebooting the router, the beacon frames
> are not corrupted any more, so it's a compound problem, yet even in
> the corrupted case, the driver can work just fine, if only it didn't
> *require* the DTIM unnecessarily,

Yeah, that's more of a design question; the underlying problem is still
not being able to Rx the beacons. We need to understand why the iwlwifi
driver sets this flag.

>as apparently all hardware and even
> other OS'es on this hardware do.

That's the reason this flag is a _HW_ flag: not all hardware requires
this, but Intel does.

I don't know the background of this flag; Johannes is the right guy
to answer this.
Felipe Contreras - Nov. 9, 2013, 9:24 p.m.
On Sat, Nov 9, 2013 at 1:10 PM, Krishna Chaitanya
<chaitanya.mgit@gmail.com> wrote:
> On Sat, Nov 9, 2013 at 10:22 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> On Sat, Nov 9, 2013 at 7:10 AM, Krishna Chaitanya
>> <chaitanya.mgit@gmail.com> wrote:
>>> On Sat, Nov 9, 2013 at 2:22 AM, Felipe Contreras
>>> <felipe.contreras@gmail.com> wrote:
>>>> On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
>>>> <chaitanya.mgit@gmail.com> wrote:
>>
>>>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>>>> before the association?
>>>>>>
>>> This is not just for your case but rather on a generic note. Regarding
>>> the flag even i am not
>>> too sure but i guess some hardware need to know the DTIM to set the
>>> wakeup schedule
>>> after the association?
>>
>> But not this hardware? Because everything works fine.
>>
>>>>> Oops...you just missed, Right after your print there is a check to
>>>>> drop frames with BAD CRC :-).
>>>>
>>>> That's why I put the print before that check. Since I don't see the
>>>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>>>> was never called for the beacon frame.
>>>>
>>> Ok. So when we disable advertising of that flag in the driver you said things
>>> are working fine.
>>
>> Yes, everything works perfectly.
>>
>>> So in that scenario after the connection are you
>>> seeing the beacons?
>>
>> No, there are no beacons ever, at least from this AP

> Oh ok, thats interesting. Are you not seeing any disconnects due
> to beacon loss triggers?

I see some disconnects now and then, but I don't know why. Before
trying to tackle those problems I would like to be able to connect
reliably.

> Also can you add some debugging to the iwlagn_rx_beacon_notif
> (the beacon RX handler)?

All right, I've added debugging there, but so far I see nothing.

>> It seems to me all the beacon frames are dropped by the firmware
>> before passing them to the driver, so the driver cannot parse them and
>> do something sensible even though they are corrupted, the driver never
>> gets them.
>>
>>> Just want to understand the problem is throughout or just before association.
>>> If the driver itself it not getting the beacons then our debugging ends there,
>>> some one from intel should guide you through the FW debugging.
>>
>> Not really, part of the debugging ends there, but we can still do something.
>>
>> What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
>> *need* this? Why fail the association completely, if we don't need to?
>>
>> Also, I realized that after rebooting the router, the beacon frames
>> are not corrupted any more, so it's a compound problem, yet even in
>> the corrupted case, the driver can work just fine, if only it didn't
>> *require* the DTIM unnecessarily,
>
> Yeah, that's more of design query with the problem being not able to
> Rx the beacons? We need to understand the reason for this flag being
> set by the iwlwifi driver.

Indeed.

>>as apparently all hardware and even
>> other OS'es on this hardware do.
>
> Thats the reason this flag is a _HW_ not all hardwares requrie this
> but intel does.

But it doesn't, my hardware is Intel, and it works fine without it.
Chaitanya TK - Nov. 9, 2013, 9:37 p.m.
On Sun, Nov 10, 2013 at 2:54 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Sat, Nov 9, 2013 at 1:10 PM, Krishna Chaitanya
> <chaitanya.mgit@gmail.com> wrote:
>> On Sat, Nov 9, 2013 at 10:22 PM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> On Sat, Nov 9, 2013 at 7:10 AM, Krishna Chaitanya
>>> <chaitanya.mgit@gmail.com> wrote:
>>>> On Sat, Nov 9, 2013 at 2:22 AM, Felipe Contreras
>>>> <felipe.contreras@gmail.com> wrote:
>>>>> On Fri, Nov 8, 2013 at 2:30 PM, Krishna Chaitanya
>>>>> <chaitanya.mgit@gmail.com> wrote:
>>>
>>>>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>>>>> before the association?
>>>>>>>
>>>> This is not just for your case but rather on a generic note. Regarding
>>>> the flag even i am not
>>>> too sure but i guess some hardware need to know the DTIM to set the
>>>> wakeup schedule
>>>> after the association?
>>>
>>> But not this hardware? Because everything works fine.
>>>
>>>>>> Oops...you just missed, Right after your print there is a check to
>>>>>> drop frames with BAD CRC :-).
>>>>>
>>>>> That's why I put the print before that check. Since I don't see the
>>>>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>>>>> was never called for the beacon frame.
>>>>>
>>>> Ok. So when we disable advertising of that flag in the driver you said things
>>>> are working fine.
>>>
>>> Yes, everything works perfectly.
>>>
>>>> So in that scenario after the connection are you
>>>> seeing the beacons?
>>>
>>> No, there are no beacons ever, at least from this AP
>
>> Oh ok, thats interesting. Are you not seeing any disconnects due
>> to beacon loss triggers?
>
> I see some disconnects now and then, but I don't know why. Before
> trying to tackle those problems I would like to be able to connect
> reliably.

It's probably the beacon loss that is triggering the disconnects, so
both problems have the same cause. It's the beacon reception that
we need to figure out.

Adding some Intel guys explicitly.

>> Also can you add some debugging to the iwlagn_rx_beacon_notif
>> (the beacon RX handler)?
>
> All right, I've added debugging there, but so far I see nothing.
>

Hmm...dead end this side too.

>>> It seems to me all the beacon frames are dropped by the firmware
>>> before passing them to the driver, so the driver cannot parse them and
>>> do something sensible even though they are corrupted, the driver never
>>> gets them.
>>>
>>>> Just want to understand the problem is throughout or just before association.
>>>> If the driver itself it not getting the beacons then our debugging ends there,
>>>> some one from intel should guide you through the FW debugging.
>>>
>>> Not really, part of the debugging ends there, but we can still do something.
>>>
>>> What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
>>> *need* this? Why fail the association completely, if we don't need to?
>>>
>>> Also, I realized that after rebooting the router, the beacon frames
>>> are not corrupted any more, so it's a compound problem, yet even in
>>> the corrupted case, the driver can work just fine, if only it didn't
>>> *require* the DTIM unnecessarily,
>>
>> Yeah, that's more of design query with the problem being not able to
>> Rx the beacons? We need to understand the reason for this flag being
>> set by the iwlwifi driver.
>
> Indeed.
>
>>>as apparently all hardware and even
>>> other OS'es on this hardware do.
>>
>> Thats the reason this flag is a _HW_ not all hardwares requrie this
>> but intel does.
>
> But it doesn't, my hardware is Intel, and it works fine without it.
>
Yeah, so far so good. But there should be a reason why they are
specifically advertising this flag. Also, DTIM matters for
multicast + powersave, which is a rare combination, so we might not hit
that too often.
Felipe Contreras - Nov. 10, 2013, 4:26 p.m.
On Sun, Nov 10, 2013 at 12:31 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>>>>>>>> But we are receiving 0 beacons, waiting for more than 1 won't help.
>>>>>>>>> BTW, why NEED_DTIM_BEFORE_ASSOC if the device doesn't *need* the DTIM
>>>>>>>>> before the association?
>>>>>>>>>
>>>>>> This is not just for your case but rather on a generic note. Regarding
>>>>>> the flag even i am not
>>>>>> too sure but i guess some hardware need to know the DTIM to set the
>>>>>> wakeup schedule
>>>>>> after the association?
>>>>>
>
> Right - we need to send the beacon interval to the device *before* we
> configure the device to be associated.

But what do you mean "need"? If I remove the flag the association works fine.

>>>>> But not this hardware? Because everything works fine.
>>>>>
>>>>>>>> Oops...you just missed, Right after your print there is a check to
>>>>>>>> drop frames with BAD CRC :-).
>>>>>>>
>>>>>>> That's why I put the print before that check. Since I don't see the
>>>>>>> print, that means the check was never executed. iwlagn_rx_reply_rx()
>>>>>>> was never called for the beacon frame.
>>>>>>>
>
> That won't help since the firmware will drop frames with bad CRC,
> unless you are in monitor mode.

And apparently ad-hoc mode too.

Either way that's not helping. Ideally those corrupted beacons should
be parsed by the driver: it will see they are corrupted, but still do
something sensible.

>>>>>> Ok. So when we disable advertising of that flag in the driver you said things
>>>>>> are working fine.
>>>>>
>>>>> Yes, everything works perfectly.
>>>>>
>>>>>> So in that scenario after the connection are you
>>>>>> seeing the beacons?
>>>>>
>>>>> No, there are no beacons ever, at least from this AP
>>>
>>>> Oh ok, thats interesting. Are you not seeing any disconnects due
>>>> to beacon loss triggers?
>>>
>>> I see some disconnects now and then, but I don't know why. Before
>>> trying to tackle those problems I would like to be able to connect
>>> reliably.
>
> No wonder. If we can't receive any beacons you can expect issues....
> PS will be completely broken and that is only the first on the list...

That's OK, it's better to connect with issues rather than not connect at all.

>> Its probably the beacons loss that triggering the disconnects, so
>> both the problem have the same cause. Its the beacon reception
>> we need to figure it out.
>>
>> Adding some intel guys explicitly.
>>
>>>> Also can you add some debugging to the iwlagn_rx_beacon_notif
>>>> (the beacon RX handler)?
>>>
>>> All right, I've added debugging there, but so far I see nothing.
>>>
>>
>> Hmm...dead end this side too.
>>
>>>>> It seems to me all the beacon frames are dropped by the firmware
>>>>> before passing them to the driver, so the driver cannot parse them and
>>>>> do something sensible even though they are corrupted, the driver never
>>>>> gets them.
>>>>>
>>>>>> Just want to understand the problem is throughout or just before association.
>>>>>> If the driver itself it not getting the beacons then our debugging ends there,
>>>>>> some one from intel should guide you through the FW debugging.
>>>>>
>>>>> Not really, part of the debugging ends there, but we can still do something.
>>>>>
>>>>> What is the meaning of NEED_DTIM_BEFORE_ASSOC, if the driver doesn't
>>>>> *need* this? Why fail the association completely, if we don't need to?
>>>>>
>
> As I explained, the firmware needs to. This is for configuring the PS
> state machine. But since your AP is completely broken, PS is likely not
> to work at all anyway....

I don't use powersave anyway.

> And my small experience in WiFi leads me to the conclusion that if a
> driver cannot rely on the AP sending beacon, it is really in trouble.

Somehow every device in this house doesn't seem to have a problem.
Even this device in Windows works fine.

> We can cope with buggy AP, but not associate to microwaves.
> Other devices will work, granted. But they can't go to sleep then, and
> need to poke the AP from time to time to make sure it hasn't
> disappeared.

That's better than not associating at all, ever.

> Note that this is true regardless of the design / HW whatever. Ok, the
> Windows driver on the same device works with this "AP". Fine. But it
> theoretically can't work well. Nor can any other WiFi device
> that can't hear the beacon. Now - maybe we have an issue in the Linux
> driver that mangles the beacons (PHY stuff) - that's possible. But
> since you haven't sent a sniffer capture of the AP with another
> device, we can't know.

That's right, I tried to do that with an N900 but the monitor mode
doesn't work. I'll keep trying.

>>>>> Also, I realized that after rebooting the router, the beacon frames
>>>>> are not corrupted any more, so it's a compound problem, yet even in
>>>>> the corrupted case, the driver can work just fine, if only it didn't
>>>>> *require* the DTIM unnecessarily,
>>>>
>>>> Yeah, that's more of design query with the problem being not able to
>>>> Rx the beacons? We need to understand the reason for this flag being
>>>> set by the iwlwifi driver.
>>>
>>> Indeed.
>>>
>>>>>as apparently all hardware and even
>>>>> other OS'es on this hardware do.
>>>>
>>>> Thats the reason this flag is a _HW_ not all hardwares requrie this
>>>> but intel does.
>>>
>>> But it doesn't, my hardware is Intel, and it works fine without it.
>>>
>> Yeah, so far so good. But there should be a reason why they are
>> specifically advertising this flag? Also DTIM is Multicast+Powersave
>> so a rare thing, we might no hit that too often.
>
> Hmm... well... N/M.

Wouldn't it make sense to timeout if there's no DTIM, and still
associate? It's better than not associating ever. Plus, if you already
know that power saving wouldn't work in this case, merely disable
powersave.
Felipe Contreras - Nov. 10, 2013, 8:32 p.m.
Emmanuel Grumbach wrote:
> On 11/10/2013 06:26 PM, Felipe Contreras wrote:

> > That's better than not associating at all, ever.
> 
> No because it would break the driver against all the working APs which
> are fortunately enough more common. Maybe you can rewrite mac80211 /
> iwlwifi to make things work differently so that PS would still work with
> good APs and association would work with yours. Fair enough. Go ahead.

Challenge accepted:

http://article.gmane.org/gmane.linux.network/290256

> > Wouldn't it make sense to timeout if there's no DTIM, and still
> > associate? It's better than not associating ever. Plus, if you already
> > know that power saving wouldn't work in this case, merely disable
> > powersave.
> 
> I can't wait for your patch.

Good, because I already sent it.

With my patch, if the AP sends the beacon correctly, power saving is enabled;
if not, association still works, but power saving is disabled.
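
The behaviour described above can be sketched as a small decision function.
This is not the patch from the link; the struct, the names and the three
inputs are all made up for illustration, and the code only encodes the rule
stated in the previous paragraph.

```c
#include <stdbool.h>

/* Hypothetical outcome of the association attempt. */
struct sketch_assoc_decision {
    bool associate;         /* proceed with the association now */
    bool powersave_allowed; /* whether power saving may be enabled */
};

/*
 * need_dtim: the hardware advertises that it wants a DTIM first
 * have_dtim: a beacon carrying the DTIM information was received
 * timed_out: the wait for such a beacon has expired
 */
static struct sketch_assoc_decision
sketch_decide_assoc(bool need_dtim, bool have_dtim, bool timed_out)
{
    struct sketch_assoc_decision d = { false, false };

    if (!need_dtim || have_dtim) {
        /* Normal case: nothing is missing, power saving can stay on. */
        d.associate = true;
        d.powersave_allowed = true;
    } else if (timed_out) {
        /* No beacon ever arrived: associate anyway, without power saving. */
        d.associate = true;
        d.powersave_allowed = false;
    }
    /* Otherwise keep waiting for a beacon. */

    return d;
}
```

The only case that changes is the one where no beacon ever arrives:
instead of failing, the association goes ahead with power saving off.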

How you could not imagine such a patch is beyond me.

Patch

--- a/drivers/net/wireless/iwlwifi/dvm/rx.c
+++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
@@ -919,6 +919,11 @@  static int iwlagn_rx_reply_rx(struct iwl_priv *priv,
        ampdu_status = iwlagn_translate_rx_status(priv,
                                                  le32_to_cpu(rx_pkt_status));

+       if (ieee80211_is_beacon(header->frame_control)) {
+               print_hex_dump(KERN_INFO, "iwlwifi: dump: ", DUMP_PREFIX_OFFSET,
+                               16, 1, header, len, true);
+       }
+
        if ((unlikely(phy_res->cfg_phy_cnt > 20))) {
                IWL_DEBUG_DROP(priv, "dsp size out of range [0,20]: %d\n",
                                phy_res->cfg_phy_cnt);