diff mbox

[7/8] GAS: End remain-on-channel due to delayed GAS comeback request

Message ID 1449744969-14895-7-git-send-email-ilan.peer@intel.com
State Accepted
Headers show

Commit Message

Ilan Peer Dec. 10, 2015, 10:56 a.m. UTC
From: Matti Gottlieb <matti.gottlieb@intel.com>

During the sequence of exchanging GAS frames with the AP, the AP can
request to comeback in X amount of time and resend the GAS request.

Currently the wpa_supplicant does not initiate the termination of the
remain-on-channel session, but rather waits until the requested comeback
delay has expired, and then tries to send the GAS frame (potentially
to save the time that is required to schedule a new remain on channel
flow).

This might cause unnecessary idle time (can be close to 1000 ms) in which the
device might be off-channel. Ending the current remain-on-channel session and
then rescheduling makes better usage of the time in this case.

End remain-on-channel session due to receiving a delayed GAS comeback request
from the AP.

Signed-off-by: Matti Gottlieb <matti.gottlieb@intel.com>
---
 wpa_supplicant/gas_query.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jouni Malinen Dec. 18, 2015, 8:33 p.m. UTC | #1
On Thu, Dec 10, 2015 at 12:56:08PM +0200, Ilan Peer wrote:
> During the sequence of exchanging GAS frames with the AP, the AP can
> request to comeback in X amount of time and resend the GAS request.
> 
> Currently the wpa_supplicant does not initiate the termination of the
> remain-on-channel session, but rather waits until the requested comeback
> delay has expired, and then tries to send the GAS frame (potentially
> to save the time that is required to schedule a new remain on channel
> flow).
> 
> This might cause unnecessary idle time (can be close to 1000 ms) in which the
> device might be off-channel. Ending the current remain-on-channel session and
> then rescheduling makes better usage of the time in this case.
> 
> End remain-on-channel session due to receiving a delayed GAS comeback request
> from the AP.

Thanks, applied.
Ilan Peer Dec. 20, 2015, 8:52 a.m. UTC | #2
> -----Original Message-----
> From: Jouni Malinen [mailto:j@w1.fi]
> Sent: Friday, December 18, 2015 22:34
> To: Peer, Ilan
> Cc: hostap@lists.infradead.org; Gottlieb, Matti
> Subject: Re: [PATCH 7/8] GAS: End remain-on-channel due to delayed GAS
> comeback request
> 
> On Thu, Dec 10, 2015 at 12:56:08PM +0200, Ilan Peer wrote:
> > During the sequence of exchanging GAS frames with the AP, the AP can
> > request to comeback in X amount of time and resend the GAS request.
> >
> > Currently the wpa_supplicant does not initiate the termination of the
> > remain-on-channel session, but rather waits until the requested
> > comeback delay has expired, and then tries to send the GAS frame
> > (potentially to save the time that is required to schedule a new
> > remain on channel flow).
> >
> > This might cause unnecessary idle time (can be close to 1000 ms) in
> > which the device might be off-channel. Ending the current
> > remain-on-channel session and then rescheduling makes better usage of
> the time in this case.
> >
> > End remain-on-channel session due to receiving a delayed GAS comeback
> > request from the AP.
> 
> Thanks, applied.
> 

Thanks. Let me know if there are issues with the last patch.

Ilan.
Jouni Malinen Dec. 20, 2015, 10:14 a.m. UTC | #3
On Sun, Dec 20, 2015 at 08:52:37AM +0000, Peer, Ilan wrote:
> Thanks. Let me know if there are issues with the last patch.

It does increase the risk of failures with deployed APs. I'm not at all
convinced that they can reply quickly enough to GAS comeback requests in
all cases.. I had actually experimented with this some time earlier but
with even a shorted timeout (100 ms). wpa_supplicant does not currently
retry GAS comeback requests at higher layer and it may be necessary to
add such a retry mechanism before dropping the wait timeout on each
individual frame.

Other than that, the change itself is clearly of significant benefit for
the cases where the first wait cannot be extended and mac80211 ends up
waiting for significant amount of time between each attempt. I just
don't want to optimize this at the cost of reliability to already
deployed use cases.
Ilan Peer Dec. 20, 2015, 12:57 p.m. UTC | #4
Hi Jouni,

> -----Original Message-----
> From: Jouni Malinen [mailto:j@w1.fi]
> Sent: Sunday, December 20, 2015 12:14
> To: Peer, Ilan
> Cc: hostap@lists.infradead.org; Gottlieb, Matti
> Subject: Re: [PATCH 7/8] GAS: End remain-on-channel due to delayed GAS
> comeback request
> 
> On Sun, Dec 20, 2015 at 08:52:37AM +0000, Peer, Ilan wrote:
> > Thanks. Let me know if there are issues with the last patch.
> 
> It does increase the risk of failures with deployed APs. I'm not at all convinced
> that they can reply quickly enough to GAS comeback requests in all cases.. I
> had actually experimented with this some time earlier but with even a shorted
> timeout (100 ms). wpa_supplicant does not currently retry GAS comeback

We did not think that this would be an issue as we assumed that after the GAS
initial request/response exchange all the comeback requests are negotiated directly
with the AP, without involving the advertisement server, so the exchange should be
fast enough to complete in 200 msec.

FWIW, in our testing setups we also used 100 msec which was also ok, however,
these are only testing setup, so we could still might issues in real deployments :)

> requests at higher layer and it may be necessary to add such a retry
> mechanism before dropping the wait timeout on each individual frame.
> 
> Other than that, the change itself is clearly of significant benefit for the cases
> where the first wait cannot be extended and mac80211 ends up waiting for
> significant amount of time between each attempt. I just don't want to optimize
> this at the cost of reliability to already deployed use cases.
> 

In case that all the wait times are equal, the first wait would never be extended, so 
eventually we will always need to pay the wait time between ROCs. As an alternative we
also considered to always cancel the previous running ROC before starting a new one, but this
has the disadvantage that scheduling a new ROC can once again incur additional delays,
so we decided to go with the approach in patch 8/8. We can revert to this approach
if you think that it is safer in terms of inter-op.

Regards,

Ilan.
Jouni Malinen Dec. 20, 2015, 5:55 p.m. UTC | #5
On Sun, Dec 20, 2015 at 12:57:36PM +0000, Peer, Ilan wrote:
> We did not think that this would be an issue as we assumed that after the GAS
> initial request/response exchange all the comeback requests are negotiated directly
> with the AP, without involving the advertisement server, so the exchange should be
> fast enough to complete in 200 msec.

Well, if things were perfect, sure, but.. There are bad AP
implementations and there are environments where it can be difficult to
get a long, fragmented GAS exchange through. On a busy 2.4 GHz channel,
it can take a while to get a chance to transmit a frame and the
likelihood of getting multiple 1500 byte frames through at 1 Mbps with
some interference drops quite a bit with interference. I've been to lab
environments where it was very difficult to complete fragmented GAS
exchanges reliable; never mind trying to do this in a way that each
frame has a maximum of 200 ms to make it through..

> FWIW, in our testing setups we also used 100 msec which was also ok, however,
> these are only testing setup, so we could still might issues in real deployments :)

I don't know what to expect in practical deployments, but I picked
semi-randomly a value between these: 150 ms. Or well, it was not really
that randomly, since it happened to be the value that made the existing
hwsim test case pass with MCC enabled.. :)

> In case that all the wait times are equal, the first wait would never be extended, so 
> eventually we will always need to pay the wait time between ROCs. As an alternative we
> also considered to always cancel the previous running ROC before starting a new one, but this
> has the disadvantage that scheduling a new ROC can once again incur additional delays,
> so we decided to go with the approach in patch 8/8. We can revert to this approach
> if you think that it is safer in terms of inter-op.

This patch 8/8 has a bug caused by patch 7/8, i.e., it does not really
do what you describe here.. Because of 7/8 terminating the first
offchannel wait (the only one with the longer wait time), the first
comeback request would start a new ROC with the shorter wait time and
every following comeback request would use that same wait time and
without ROC extension, that would result in the exact same issue.. Just
the wait time is shorter (200 vs. 1000 ms in these patches).

I fixed that by keeping the query->offchannel_tx_started tracking
up-to-date with patch 7/8 behavior and using the longer wait time for
the first comeback request if the initial wait time had been canceled
(which it really is in every single case now, but that could be modified
to consider the fragmentation-without-wait case with very short
comeback delay to skip stopping the initial ROC). This provides
significant further speedup when both patches 7 and 8 are applied.

To make it acceptable to test with shorter wait time first, I added a
mechanism to retry full GAS sequence if any waits for a comeback
response fail. This second attempt will use the old timeout of 1000 ms.
With this, the end result is actually more robust than the previous
design and significantly faster for the fragmented case with drivers
that cannot extend pending ROC. I haven't yet pushed this into the
master branch, but if nothing unexpected shows up, I'll probably do so.
Jouni Malinen Dec. 20, 2015, 8:35 p.m. UTC | #6
On Sun, Dec 20, 2015 at 07:55:48PM +0200, Jouni Malinen wrote:
> I fixed that by keeping the query->offchannel_tx_started tracking
> up-to-date with patch 7/8 behavior and using the longer wait time for
> the first comeback request if the initial wait time had been canceled
> (which it really is in every single case now, but that could be modified
> to consider the fragmentation-without-wait case with very short
> comeback delay to skip stopping the initial ROC). This provides
> significant further speedup when both patches 7 and 8 are applied.

That special case of fragmentation (comeback delay 1) is now handled
without stopping the initial ROC.

> To make it acceptable to test with shorter wait time first, I added a
> mechanism to retry full GAS sequence if any waits for a comeback
> response fail. This second attempt will use the old timeout of 1000 ms.
> With this, the end result is actually more robust than the previous
> design and significantly faster for the fragmented case with drivers
> that cannot extend pending ROC. I haven't yet pushed this into the
> master branch, but if nothing unexpected shows up, I'll probably do so.

I pushed this with a fixed patch 8/8 into the master branch. It is not
exactly perfect since an exchange taking more than 850 ms will end up in
the state where each new TX operation hits the issue of the pending ROC
ending up sooner. That said, this is significantly faster than what was
there previously and I'm not sure there would be sufficient
justification for the extra complexity within wpa_supplicant to try to
track the remaining ROC duration within kernel (i.e., if someone wants
to optimize that, it might be better to spend the effort working on
kernel side making it possible to extend the ongoing ROC and/or
providing more convenient events to user space to indicate when the wait
for an offloaded TX operation has completed in a way that would benefit
from wpa_supplicant requesting a longer duration for the next TX
command).
Ilan Peer Dec. 21, 2015, 8:12 a.m. UTC | #7
> -----Original Message-----
> From: Jouni Malinen [mailto:j@w1.fi]
> Sent: Sunday, December 20, 2015 19:56
> To: Peer, Ilan
> Cc: hostap@lists.infradead.org; Gottlieb, Matti
> Subject: Re: [PATCH 7/8] GAS: End remain-on-channel due to delayed GAS
> comeback request
> 
> On Sun, Dec 20, 2015 at 12:57:36PM +0000, Peer, Ilan wrote:
> > We did not think that this would be an issue as we assumed that after
> > the GAS initial request/response exchange all the comeback requests
> > are negotiated directly with the AP, without involving the
> > advertisement server, so the exchange should be fast enough to complete
> in 200 msec.
> 
> Well, if things were perfect, sure, but.. There are bad AP implementations and
> there are environments where it can be difficult to get a long, fragmented
> GAS exchange through. On a busy 2.4 GHz channel, it can take a while to get a
> chance to transmit a frame and the likelihood of getting multiple 1500 byte
> frames through at 1 Mbps with some interference drops quite a bit with
> interference. I've been to lab environments where it was very difficult to
> complete fragmented GAS exchanges reliable; never mind trying to do this in
> a way that each frame has a maximum of 200 ms to make it through..
> 
> > FWIW, in our testing setups we also used 100 msec which was also ok,
> > however, these are only testing setup, so we could still might issues
> > in real deployments :)
> 
> I don't know what to expect in practical deployments, but I picked semi-
> randomly a value between these: 150 ms. Or well, it was not really that
> randomly, since it happened to be the value that made the existing hwsim test
> case pass with MCC enabled.. :)
> 

hehe ... 150 does not really sound random to begin with :)

> > In case that all the wait times are equal, the first wait would never
> > be extended, so eventually we will always need to pay the wait time
> > between ROCs. As an alternative we also considered to always cancel
> > the previous running ROC before starting a new one, but this has the
> > disadvantage that scheduling a new ROC can once again incur additional
> > delays, so we decided to go with the approach in patch 8/8. We can revert
> to this approach if you think that it is safer in terms of inter-op.
> 
> This patch 8/8 has a bug caused by patch 7/8, i.e., it does not really do what
> you describe here.. Because of 7/8 terminating the first offchannel wait (the
> only one with the longer wait time), the first comeback request would start a
> new ROC with the shorter wait time and every following comeback request
> would use that same wait time and without ROC extension, that would result in
> the exact same issue.. Just the wait time is shorter (200 vs. 1000 ms in these
> patches).
> 

Missed that fact that comeback delay is also set for initial response.

> I fixed that by keeping the query->offchannel_tx_started tracking up-to-date
> with patch 7/8 behavior and using the longer wait time for the first comeback
> request if the initial wait time had been canceled (which it really is in every
> single case now, but that could be modified to consider the fragmentation-
> without-wait case with very short comeback delay to skip stopping the initial
> ROC). This provides significant further speedup when both patches 7 and 8
> are applied.
> 
> To make it acceptable to test with shorter wait time first, I added a mechanism
> to retry full GAS sequence if any waits for a comeback response fail. This
> second attempt will use the old timeout of 1000 ms.
> With this, the end result is actually more robust than the previous design and
> significantly faster for the fragmented case with drivers that cannot extend
> pending ROC. I haven't yet pushed this into the master branch, but if nothing
> unexpected shows up, I'll probably do so.
> 

Thanks,

Ilan.
Ilan Peer Dec. 21, 2015, 8:40 a.m. UTC | #8
> -----Original Message-----
> From: Jouni Malinen [mailto:j@w1.fi]
> Sent: Sunday, December 20, 2015 22:36
> To: Peer, Ilan
> Cc: hostap@lists.infradead.org; Gottlieb, Matti
> Subject: Re: [PATCH 7/8] GAS: End remain-on-channel due to delayed GAS
> comeback request
> 
> On Sun, Dec 20, 2015 at 07:55:48PM +0200, Jouni Malinen wrote:
> > I fixed that by keeping the query->offchannel_tx_started tracking
> > up-to-date with patch 7/8 behavior and using the longer wait time for
> > the first comeback request if the initial wait time had been canceled
> > (which it really is in every single case now, but that could be
> > modified to consider the fragmentation-without-wait case with very
> > short comeback delay to skip stopping the initial ROC). This provides
> > significant further speedup when both patches 7 and 8 are applied.
> 
> That special case of fragmentation (comeback delay 1) is now handled without
> stopping the initial ROC.
> 
> > To make it acceptable to test with shorter wait time first, I added a
> > mechanism to retry full GAS sequence if any waits for a comeback
> > response fail. This second attempt will use the old timeout of 1000 ms.
> > With this, the end result is actually more robust than the previous
> > design and significantly faster for the fragmented case with drivers
> > that cannot extend pending ROC. I haven't yet pushed this into the
> > master branch, but if nothing unexpected shows up, I'll probably do so.
> 
> I pushed this with a fixed patch 8/8 into the master branch. It is not exactly
> perfect since an exchange taking more than 850 ms will end up in the state
> where each new TX operation hits the issue of the pending ROC ending up
> sooner. That said, this is significantly faster than what was there previously and
> I'm not sure there would be sufficient justification for the extra complexity
> within wpa_supplicant to try to track the remaining ROC duration within
> kernel (i.e., if someone wants to optimize that, it might be better to spend the
> effort working on kernel side making it possible to extend the ongoing ROC
> and/or providing more convenient events to user space to indicate when the
> wait for an offloaded TX operation has completed in a way that would benefit
> from wpa_supplicant requesting a longer duration for the next TX command).
> 

I do not think that this should be handled in wpa_supplicant, as even today drivers behave
differently when it comes for handling ROC operations, i.e., mac80211 SW ROC vs. HW ROC,
and this would over complicate wpa_s. I think that that if this would be needed a better apparoch
would be moving the ROC scheduling policy further down to the hardware were possible (for
example for HWs already need to handle MCC etc.).

Thanks for optimizing this :)

Ilan.
diff mbox

Patch

diff --git a/wpa_supplicant/gas_query.c b/wpa_supplicant/gas_query.c
index 10ecce7..65dec2b 100644
--- a/wpa_supplicant/gas_query.c
+++ b/wpa_supplicant/gas_query.c
@@ -319,6 +319,8 @@  static void gas_query_tx_comeback_req_delay(struct gas_query *gas,
 {
 	unsigned int secs, usecs;
 
+	offchannel_send_action_done(gas->wpa_s);
+
 	secs = (comeback_delay * 1024) / 1000000;
 	usecs = comeback_delay * 1024 - secs * 1000000;
 	wpa_printf(MSG_DEBUG, "GAS: Send comeback request to " MACSTR