diff mbox

Fix scanning state when sched_scan is stopped explicitly

Message ID 20140409225048.F13EA140263@ushik.mtv.corp.google.com
State Superseded
Headers show

Commit Message

Dmitry Shmidt April 9, 2014, 10:48 p.m. UTC
Change-Id: I6deca387f7c64e4125e85ecfd585e1cff6931ab1
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
---
 wpa_supplicant/scan.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Jouni Malinen April 11, 2014, 4:09 p.m. UTC | #1
On Wed, Apr 09, 2014 at 03:48:07PM -0700, Dmitry Shmidt wrote:
> --- a/wpa_supplicant/scan.c
> +++ b/wpa_supplicant/scan.c
> @@ -261,6 +261,9 @@ int wpa_supplicant_stop_sched_scan(struct wpa_supplicant *wpa_s)
>  		wpa_dbg(wpa_s, MSG_DEBUG, "stopping sched_scan failed!");
>  		/* TODO: what to do if stopping fails? */
>  		return -1;
> +	} else {
> +		wpa_s->sched_scanning = 0;
> +		wpa_supplicant_notify_scanning(wpa_s, 0);
>  	}

It looks like the current implementation assumes that there will be a
driver event to indicate that sched_scan was stopped and
wpa_s->sched_scanning is cleared to 0 when that event is processed. Is
there a driver that does not end up indicating that event?

I would assume there was a reason for postponing, but then again, I now
remember that there were open questions on a related area.. I think
there was a request to revert commit
cf70d2981df1be7005fb90c2532e737ab39bc119 and handle some of the changes
in mac80211 to make operations synchronous. I guess we should get open
item resolved as well.. I'm not completely sure whether these two
changes have any dependencies to each other, though.
Dmitry Shmidt April 11, 2014, 4:51 p.m. UTC | #2
On Fri, Apr 11, 2014 at 9:09 AM, Jouni Malinen <j@w1.fi> wrote:
> On Wed, Apr 09, 2014 at 03:48:07PM -0700, Dmitry Shmidt wrote:
>> --- a/wpa_supplicant/scan.c
>> +++ b/wpa_supplicant/scan.c
>> @@ -261,6 +261,9 @@ int wpa_supplicant_stop_sched_scan(struct wpa_supplicant *wpa_s)
>>               wpa_dbg(wpa_s, MSG_DEBUG, "stopping sched_scan failed!");
>>               /* TODO: what to do if stopping fails? */
>>               return -1;
>> +     } else {
>> +             wpa_s->sched_scanning = 0;
>> +             wpa_supplicant_notify_scanning(wpa_s, 0);
>>       }
>
> It looks like the current implementation assumes that there will be a
> driver event to indicate that sched_scan was stopped and
> wpa_s->sched_scanning is cleared to 0 when that event is processed. Is
> there a driver that does not end up indicating that event?

Indeed, driver is not calling cfg80211_sched_scan_stopped() to send
NL80211_CMD_SCHED_SCAN_STOPPED message.
It is calling cfg80211_scan_done(aborted) to send
NL80211_CMD_SCAN_ABORTED that is doing EVENT_SCAN_RESULTS
and finally cleans wpa_s->scanning state.

However, I tried to resolve next problem that is not disappearing even if we
send message above from the driver:

wpas_stop_pno()  {
     wpa_supplicant_stop_sched_scan()
      if (wpa_s->wpa_state == WPA_SCANNING)
                wpa_supplicant_req_scan(wpa_s, 0, 0);  <<< usual scan
}

wpa_supplicant_scan() {
...
        if (wpa_s->scanning) {
                ...
                wpa_supplicant_req_scan(wpa_s, 1, 0);  <<< postponing
scan for 1 sec
                return;
        }
...
}

Scan is postponed for 1 sec and I just tried to eliminate it. It is
slightly different
from problem below where Raja tried to sync between two sched scan requests.

>
> I would assume there was a reason for postponing, but then again, I now
> remember that there were open questions on a related area.. I think
> there was a request to revert commit
> cf70d2981df1be7005fb90c2532e737ab39bc119 and handle some of the changes
> in mac80211 to make operations synchronous. I guess we should get open
> item resolved as well.. I'm not completely sure whether these two
> changes have any dependencies to each other, though.

I think Raja's change is related to "coordination" between sched scan
by wpa_supplicant
and explicit one.

>
> --
> Jouni Malinen                                            PGP id EFC895FA
Jouni Malinen March 7, 2015, 7:46 p.m. UTC | #3
On Fri, Apr 11, 2014 at 09:51:14AM -0700, Dmitry Shmidt wrote:
> On Fri, Apr 11, 2014 at 9:09 AM, Jouni Malinen <j@w1.fi> wrote:
> > On Wed, Apr 09, 2014 at 03:48:07PM -0700, Dmitry Shmidt wrote:
> >> --- a/wpa_supplicant/scan.c
> >> +++ b/wpa_supplicant/scan.c
> >> @@ -261,6 +261,9 @@ int wpa_supplicant_stop_sched_scan(struct wpa_supplicant *wpa_s)
> >>               wpa_dbg(wpa_s, MSG_DEBUG, "stopping sched_scan failed!");
> >>               /* TODO: what to do if stopping fails? */
> >>               return -1;
> >> +     } else {
> >> +             wpa_s->sched_scanning = 0;
> >> +             wpa_supplicant_notify_scanning(wpa_s, 0);

I have this in my pending queue, but I'm not completely sure what to do
about this.

Is there still a case where this would be needed? If so, I guess I can
apply this since I have not really had a chance to do any real
sched_scan testing myself and no one has come up with additional
justification for delaying the notification until the driver has
actually completed stopping of sched_scan.

> > I would assume there was a reason for postponing, but then again, I now
> > remember that there were open questions on a related area.. I think
> > there was a request to revert commit
> > cf70d2981df1be7005fb90c2532e737ab39bc119 and handle some of the changes
> > in mac80211 to make operations synchronous. I guess we should get open
> > item resolved as well.. I'm not completely sure whether these two
> > changes have any dependencies to each other, though.
> 
> I think Raja's change is related to "coordination" between sched scan
> by wpa_supplicant
> and explicit one.

This revert request has not gone anywhere either.. Would someone still
remember what the issue here is and whether there is still need to
either revert that commit cf70d2981df1be7005fb90c2532e737ab39bc119
('wpa_supplicant: Schedule PNO on completion of ongoing sched_scan') or
fix something in it?

It would be nice to see a debug log showing the issue since I don't have
very convenient test setup for sched_scan.
Dmitry Shmidt March 9, 2015, 6:29 p.m. UTC | #4
On Sat, Mar 7, 2015 at 11:46 AM, Jouni Malinen <j@w1.fi> wrote:
> On Fri, Apr 11, 2014 at 09:51:14AM -0700, Dmitry Shmidt wrote:
>> On Fri, Apr 11, 2014 at 9:09 AM, Jouni Malinen <j@w1.fi> wrote:
>> > On Wed, Apr 09, 2014 at 03:48:07PM -0700, Dmitry Shmidt wrote:
>> >> --- a/wpa_supplicant/scan.c
>> >> +++ b/wpa_supplicant/scan.c
>> >> @@ -261,6 +261,9 @@ int wpa_supplicant_stop_sched_scan(struct wpa_supplicant *wpa_s)
>> >>               wpa_dbg(wpa_s, MSG_DEBUG, "stopping sched_scan failed!");
>> >>               /* TODO: what to do if stopping fails? */
>> >>               return -1;
>> >> +     } else {
>> >> +             wpa_s->sched_scanning = 0;
>> >> +             wpa_supplicant_notify_scanning(wpa_s, 0);
>
> I have this in my pending queue, but I'm not completely sure what to do
> about this.
>
> Is there still a case where this would be needed? If so, I guess I can
> apply this since I have not really had a chance to do any real
> sched_scan testing myself and no one has come up with additional
> justification for delaying the notification until the driver has
> actually completed stopping of sched_scan.

When wifi manager explicitly stops PNO (by set pno 0), wpa_supplicant sends
stop_pno command to the driver. Theoretically it is "sync" call and
PNO is stopped
immediately. However, it takes time for event to propagate back:
  nl80211_stop_sched_scan() -> __cfg80211_stop_sched_scan() ->
  nl80211_send_sched_scan(rdev, dev, NL80211_CMD_SCHED_SCAN_STOPPED);
so when wpa_supplciant gets NL80211_CMD_SCHED_SCAN_STOPPED
that will be translated to EVENT_SCHED_SCAN_STOPPED to clear
  wpa_s->scanning
it will be too late - code in scan function will postpone scan for 1 sec:
        if (wpa_s->scanning) {
                /*
                 * If we are already in scanning state, we shall reschedule the
                 * the incoming scan request.
                 */
                wpa_dbg(wpa_s, MSG_DEBUG, "Already scanning -
Reschedule the incoming scan req");
                wpa_supplicant_req_scan(wpa_s, 1, 0);
                return;
        }

So unless you see some problem with this patch, I would recommend to take it
to avoid 1 sec delay after PNO termination.

>
>> > I would assume there was a reason for postponing, but then again, I now
>> > remember that there were open questions on a related area.. I think
>> > there was a request to revert commit
>> > cf70d2981df1be7005fb90c2532e737ab39bc119 and handle some of the changes
>> > in mac80211 to make operations synchronous. I guess we should get open
>> > item resolved as well.. I'm not completely sure whether these two
>> > changes have any dependencies to each other, though.
>>
>> I think Raja's change is related to "coordination" between sched scan
>> by wpa_supplicant
>> and explicit one.
>
> This revert request has not gone anywhere either.. Would someone still
> remember what the issue here is and whether there is still need to
> either revert that commit cf70d2981df1be7005fb90c2532e737ab39bc119
> ('wpa_supplicant: Schedule PNO on completion of ongoing sched_scan') or
> fix something in it?
>
> It would be nice to see a debug log showing the issue since I don't have
> very convenient test setup for sched_scan.
>
> --
> Jouni Malinen                                            PGP id EFC895FA
> _______________________________________________
> HostAP mailing list
> HostAP@lists.shmoo.com
> http://lists.shmoo.com/mailman/listinfo/hostap
Johannes Berg March 9, 2015, 7:43 p.m. UTC | #5
On Mon, 2015-03-09 at 11:29 -0700, Dmitry Shmidt wrote:
> On Sat, Mar 7, 2015 at 11:46 AM, Jouni Malinen <j@w1.fi> wrote:
> > On Fri, Apr 11, 2014 at 09:51:14AM -0700, Dmitry Shmidt wrote:
> >> On Fri, Apr 11, 2014 at 9:09 AM, Jouni Malinen <j@w1.fi> wrote:
> >> > On Wed, Apr 09, 2014 at 03:48:07PM -0700, Dmitry Shmidt wrote:
> >> >> --- a/wpa_supplicant/scan.c
> >> >> +++ b/wpa_supplicant/scan.c
> >> >> @@ -261,6 +261,9 @@ int wpa_supplicant_stop_sched_scan(struct wpa_supplicant *wpa_s)
> >> >>               wpa_dbg(wpa_s, MSG_DEBUG, "stopping sched_scan failed!");
> >> >>               /* TODO: what to do if stopping fails? */
> >> >>               return -1;
> >> >> +     } else {
> >> >> +             wpa_s->sched_scanning = 0;
> >> >> +             wpa_supplicant_notify_scanning(wpa_s, 0);
> >
> > I have this in my pending queue, but I'm not completely sure what to do
> > about this.
> >
> > Is there still a case where this would be needed? If so, I guess I can
> > apply this since I have not really had a chance to do any real
> > sched_scan testing myself and no one has come up with additional
> > justification for delaying the notification until the driver has
> > actually completed stopping of sched_scan.
> 
> When wifi manager explicitly stops PNO (by set pno 0), wpa_supplicant sends
> stop_pno command to the driver. Theoretically it is "sync" call and
> PNO is stopped
> immediately. However, it takes time for event to propagate back:
>   nl80211_stop_sched_scan() -> __cfg80211_stop_sched_scan() ->
>   nl80211_send_sched_scan(rdev, dev, NL80211_CMD_SCHED_SCAN_STOPPED);
> so when wpa_supplciant gets NL80211_CMD_SCHED_SCAN_STOPPED
> that will be translated to EVENT_SCHED_SCAN_STOPPED to clear
>   wpa_s->scanning
> it will be too late

Which driver are you using? ISTR fixing a number of issues in this area.

johannes
Dmitry Shmidt March 9, 2015, 7:46 p.m. UTC | #6
On Mon, Mar 9, 2015 at 12:43 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Mon, 2015-03-09 at 11:29 -0700, Dmitry Shmidt wrote:
>> On Sat, Mar 7, 2015 at 11:46 AM, Jouni Malinen <j@w1.fi> wrote:
>> > On Fri, Apr 11, 2014 at 09:51:14AM -0700, Dmitry Shmidt wrote:
>> >> On Fri, Apr 11, 2014 at 9:09 AM, Jouni Malinen <j@w1.fi> wrote:
>> >> > On Wed, Apr 09, 2014 at 03:48:07PM -0700, Dmitry Shmidt wrote:
>> >> >> --- a/wpa_supplicant/scan.c
>> >> >> +++ b/wpa_supplicant/scan.c
>> >> >> @@ -261,6 +261,9 @@ int wpa_supplicant_stop_sched_scan(struct wpa_supplicant *wpa_s)
>> >> >>               wpa_dbg(wpa_s, MSG_DEBUG, "stopping sched_scan failed!");
>> >> >>               /* TODO: what to do if stopping fails? */
>> >> >>               return -1;
>> >> >> +     } else {
>> >> >> +             wpa_s->sched_scanning = 0;
>> >> >> +             wpa_supplicant_notify_scanning(wpa_s, 0);
>> >
>> > I have this in my pending queue, but I'm not completely sure what to do
>> > about this.
>> >
>> > Is there still a case where this would be needed? If so, I guess I can
>> > apply this since I have not really had a chance to do any real
>> > sched_scan testing myself and no one has come up with additional
>> > justification for delaying the notification until the driver has
>> > actually completed stopping of sched_scan.
>>
>> When wifi manager explicitly stops PNO (by set pno 0), wpa_supplicant sends
>> stop_pno command to the driver. Theoretically it is "sync" call and
>> PNO is stopped
>> immediately. However, it takes time for event to propagate back:
>>   nl80211_stop_sched_scan() -> __cfg80211_stop_sched_scan() ->
>>   nl80211_send_sched_scan(rdev, dev, NL80211_CMD_SCHED_SCAN_STOPPED);
>> so when wpa_supplciant gets NL80211_CMD_SCHED_SCAN_STOPPED
>> that will be translated to EVENT_SCHED_SCAN_STOPPED to clear
>>   wpa_s->scanning
>> it will be too late
>
> Which driver are you using? ISTR fixing a number of issues in this area.

We are using Broadcom and Qualcomm drivers. However, I am not sure that
driver is relevant here - all the calls I described above are done by kernel
wireless stack and wpa_supplicant.

>
> johannes
>
Johannes Berg March 9, 2015, 7:56 p.m. UTC | #7
On Mon, 2015-03-09 at 12:46 -0700, Dmitry Shmidt wrote:
> >> immediately. However, it takes time for event to propagate back:
> >>   nl80211_stop_sched_scan() -> __cfg80211_stop_sched_scan() ->
> >>   nl80211_send_sched_scan(rdev, dev, NL80211_CMD_SCHED_SCAN_STOPPED);
> >> so when wpa_supplciant gets NL80211_CMD_SCHED_SCAN_STOPPED
> >> that will be translated to EVENT_SCHED_SCAN_STOPPED to clear
> >>   wpa_s->scanning
> >> it will be too late
> >
> > Which driver are you using? ISTR fixing a number of issues in this area.
> 
> We are using Broadcom and Qualcomm drivers. However, I am not sure that
> driver is relevant here - all the calls I described above are done by kernel
> wireless stack and wpa_supplicant.

Right but I think we fixed the stack to report the event back before the
stop_sched_scan() call can return? But my memory of this is very vague,
I might very well be mistaken and would have to check the code.

johannes
Dmitry Shmidt March 9, 2015, 8:07 p.m. UTC | #8
On Mon, Mar 9, 2015 at 12:56 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Mon, 2015-03-09 at 12:46 -0700, Dmitry Shmidt wrote:
>> >> immediately. However, it takes time for event to propagate back:
>> >>   nl80211_stop_sched_scan() -> __cfg80211_stop_sched_scan() ->
>> >>   nl80211_send_sched_scan(rdev, dev, NL80211_CMD_SCHED_SCAN_STOPPED);
>> >> so when wpa_supplciant gets NL80211_CMD_SCHED_SCAN_STOPPED
>> >> that will be translated to EVENT_SCHED_SCAN_STOPPED to clear
>> >>   wpa_s->scanning
>> >> it will be too late
>> >
>> > Which driver are you using? ISTR fixing a number of issues in this area.
>>
>> We are using Broadcom and Qualcomm drivers. However, I am not sure that
>> driver is relevant here - all the calls I described above are done by kernel
>> wireless stack and wpa_supplicant.
>
> Right but I think we fixed the stack to report the event back before the
> stop_sched_scan() call can return? But my memory of this is very vague,
> I might very well be mistaken and would have to check the code.

If you can hint me what this change was about, I'll try to find it.
Meanwhile the only change I see between 3.19 and 3.4 in
__cfg80211_stop_sched_scan() is to use rtnl_lock instead of sched_scan_mtx.
Other than that the logic is the same - send stop command to the driver and send
 NL80211_CMD_SCHED_SCAN_STOPPED back to user space.
And I see (I didn't try 3.19) that it will come "too late".

>
> johannes
>
Johannes Berg March 9, 2015, 8:11 p.m. UTC | #9
On Mon, 2015-03-09 at 13:07 -0700, Dmitry Shmidt wrote:

> If you can hint me what this change was about, I'll try to find it.
> Meanwhile the only change I see between 3.19 and 3.4 in
> __cfg80211_stop_sched_scan() is to use rtnl_lock instead of sched_scan_mtx.
> Other than that the logic is the same - send stop command to the driver and send
>  NL80211_CMD_SCHED_SCAN_STOPPED back to user space.
> And I see (I didn't try 3.19) that it will come "too late".

Looks like cfg80211 could do this correctly, but mac80211 insists on
doing the scheduled work. Given that locks the RTNL it's not going to be
simple to actually implement it.

It therefore seems far more likely then that I'm just confused and was
thinking about something else, sorry about that.

johannes
Arik Nemtsov March 9, 2015, 8:17 p.m. UTC | #10
On Mon, Mar 9, 2015 at 10:11 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
>
> On Mon, 2015-03-09 at 13:07 -0700, Dmitry Shmidt wrote:
>
> > If you can hint me what this change was about, I'll try to find it.
> > Meanwhile the only change I see between 3.19 and 3.4 in
> > __cfg80211_stop_sched_scan() is to use rtnl_lock instead of sched_scan_mtx.
> > Other than that the logic is the same - send stop command to the driver and send
> >  NL80211_CMD_SCHED_SCAN_STOPPED back to user space.
> > And I see (I didn't try 3.19) that it will come "too late".
>
> Looks like cfg80211 could do this correctly, but mac80211 insists on
> doing the scheduled work. Given that locks the RTNL it's not going to be
> simple to actually implement it.
>

Is this really about kernel-mode though? wpa_s is single threaded
(eloop), so even if the notification comes up right away, the event is
still only queued on the thread..

That said, I think Dmitry's version might hurt some state. It's safer
to just schedule the scan on the eloop with a very small delay (even
0), so that the pending sched-scan event has a chance to get
processed.

Arik
Dmitry Shmidt March 13, 2015, 6:16 p.m. UTC | #11
On Mon, Mar 9, 2015 at 1:17 PM, Arik Nemtsov <arik@wizery.com> wrote:
> On Mon, Mar 9, 2015 at 10:11 PM, Johannes Berg
> <johannes@sipsolutions.net> wrote:
>>
>> On Mon, 2015-03-09 at 13:07 -0700, Dmitry Shmidt wrote:
>>
>> > If you can hint me what this change was about, I'll try to find it.
>> > Meanwhile the only change I see between 3.19 and 3.4 in
>> > __cfg80211_stop_sched_scan() is to use rtnl_lock instead of sched_scan_mtx.
>> > Other than that the logic is the same - send stop command to the driver and send
>> >  NL80211_CMD_SCHED_SCAN_STOPPED back to user space.
>> > And I see (I didn't try 3.19) that it will come "too late".
>>
>> Looks like cfg80211 could do this correctly, but mac80211 insists on
>> doing the scheduled work. Given that locks the RTNL it's not going to be
>> simple to actually implement it.
>>
>
> Is this really about kernel-mode though? wpa_s is single threaded
> (eloop), so even if the notification comes up right away, the event is
> still only queued on the thread..

This is what I see as well.

>
> That said, I think Dmitry's version might hurt some state. It's safer
> to just schedule the scan on the eloop with a very small delay (even
> 0), so that the pending sched-scan event has a chance to get
> processed.

Changing scanning function may affect others, while resetting PNO
state should not. Moreover later EVENT_SCHED_SCAN_STOPPED
will do nothing if sched_scan_timed_out or pno_sched_pending will not
be set.
               if (wpa_s->sched_scan_timed_out) {
                        wpa_supplicant_req_sched_scan(wpa_s);
                } else if (wpa_s->pno_sched_pending) {
                        wpa_s->pno_sched_pending = 0;
                        wpas_start_pno(wpa_s);
                }


>
> Arik
Arik Nemtsov March 16, 2015, 11:39 a.m. UTC | #12
On Fri, Mar 13, 2015 at 8:16 PM, Dmitry Shmidt <dimitrysh@google.com> wrote:
> Changing scanning function may affect others, while resetting PNO
> state should not. Moreover later EVENT_SCHED_SCAN_STOPPED
> will do nothing if sched_scan_timed_out or pno_sched_pending will not
> be set.
>                if (wpa_s->sched_scan_timed_out) {
>                         wpa_supplicant_req_sched_scan(wpa_s);
>                 } else if (wpa_s->pno_sched_pending) {
>                         wpa_s->pno_sched_pending = 0;
>                         wpas_start_pno(wpa_s);
>                 }

Your change can hurt sched-scan users. For instance if
wpa_s->sched_scan_timed_out is set, then your patch will introduce a
time-window where a regular scan might come in before the
EVENT_SCHED_SCAN_STOPPED has a chance to arrive.

You introduce a similar race for wpa_s->pno_sched_pending (you even
guarantee it there I believe?).

I guess the easiest fix is to have another pno related flag
(wpa_s->restart_scan_after_pno ?) and checking it in
EVENT_SCHED_SCAN_STOPPED for restarting the regular scan.

Arik
diff mbox

Patch

diff --git a/wpa_supplicant/scan.c b/wpa_supplicant/scan.c
index 1d4e6e5..af34887 100644
--- a/wpa_supplicant/scan.c
+++ b/wpa_supplicant/scan.c
@@ -261,6 +261,9 @@  int wpa_supplicant_stop_sched_scan(struct wpa_supplicant *wpa_s)
 		wpa_dbg(wpa_s, MSG_DEBUG, "stopping sched_scan failed!");
 		/* TODO: what to do if stopping fails? */
 		return -1;
+	} else {
+		wpa_s->sched_scanning = 0;
+		wpa_supplicant_notify_scanning(wpa_s, 0);
 	}
 
 	return ret;