diff mbox series

nl80211: Do not clear send_frame_cookie on TX wait expiration

Message ID 20230101125949.283037-1-andrei.otcheretianski@intel.com
State Changes Requested
Headers show
Series nl80211: Do not clear send_frame_cookie on TX wait expiration | expand

Commit Message

Andrei Otcheretianski Jan. 1, 2023, 12:59 p.m. UTC
From: Ilan Peer <ilan.peer@intel.com>

With mac80211 based drivers the NL80211_CMD_FRAME_WAIT_CANCEL can be
notified to user space while the driver is still handling the frame.
In such cases, when the driver is done handling the frame it would
indicate the frame handling status to mac80211, which would
trigger NL80211_CMD_FRAME_TX_STATUS notification to user space.

However, the nl80211 driver logic handling for
NL80211_CMD_FRAME_WAIT_CANCEL clears the send_frame_cookie, so when
the NL80211_CMD_FRAME_TX_STATUS arrives it is being ignored.
As a result, wpa_supplicant flows, e.g., P2P GoN confirm transmission
etc., are stuck (since Tx wait expiration is not handled by the
wpa_supplicant other than in the context of DPP).

Fix this issue by not clearing the send_frame_cookie as part of the
NL80211_CMD_FRAME_WAIT_CANCEL handling.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
---
 src/drivers/driver_nl80211_event.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Jouni Malinen Feb. 22, 2023, 4:42 p.m. UTC | #1
On Sun, Jan 01, 2023 at 02:59:49PM +0200, Andrei Otcheretianski wrote:
> With mac80211 based drivers the NL80211_CMD_FRAME_WAIT_CANCEL can be
> notified to user space while the driver is still handling the frame.
> In such cases, when the driver is done handling the frame it would
> indicate the frame handling status to mac80211, which would
> trigger NL80211_CMD_FRAME_TX_STATUS notification to user space.

This feels a bit strange.. Would you happen to have a wpa_supplicant
debug log with timestamps showing such a case?

NL80211_CMD_FRAME_WAIT_CANCEL is documented to be used as an event
"whenever the driver has completed the off-channel wait time" which is
supposed to be talking about the wait for the response. Why would
the driver be still processing the _transmitted_ request frame if it has
already indicated that it is not waiting for a response anymore? Is
there really a case where the TX status gets delayed so much that this
continues beyond the end of the maximum wait for the response?

> However, the nl80211 driver logic handling for
> NL80211_CMD_FRAME_WAIT_CANCEL clears the send_frame_cookie, so when
> the NL80211_CMD_FRAME_TX_STATUS arrives it is being ignored.
> As a result, wpa_supplicant flows, e.g., P2P GoN confirm transmission
> etc., are stuck (since Tx wait expiration is not handled by the
> wpa_supplicant other than in the context of DPP).
> 
> Fix this issue by not clearing the send_frame_cookie as part of the
> NL80211_CMD_FRAME_WAIT_CANCEL handling.

This clearing was added here:
http://w1.fi/cgit/hostap/commit/?id=7e6f59c7027a08dad4dd022da975502fd53857d1

The proposed changes here seem to partially revert that commit. Is it
really correct to not revert more if this new change is indeed needed?
What about the change in wpa_driver_nl80211_send_action_cancel_wait()?

Would there be a way to achieve both the cleanup in that earlier commit
and this strangely delayed TX status event handling? If not, the new
commit message would at least need to explain that this results in those
old confusing log entries starting to show up again.
Peer, Ilan Feb. 23, 2023, 3:23 p.m. UTC | #2
Hi Jouni,

> -----Original Message-----
> From: Jouni Malinen <j@w1.fi>
> Sent: Wednesday, February 22, 2023 18:43
> To: Otcheretianski, Andrei <andrei.otcheretianski@intel.com>
> Cc: hostap@lists.infradead.org; Peer, Ilan <ilan.peer@intel.com>
> Subject: Re: [PATCH] nl80211: Do not clear send_frame_cookie on TX wait
> expiration
> 
> On Sun, Jan 01, 2023 at 02:59:49PM +0200, Andrei Otcheretianski wrote:
> > With mac80211 based drivers the NL80211_CMD_FRAME_WAIT_CANCEL
> can be
> > notified to user space while the driver is still handling the frame.
> > In such cases, when the driver is done handling the frame it would
> > indicate the frame handling status to mac80211, which would trigger
> > NL80211_CMD_FRAME_TX_STATUS notification to user space.
> 
> This feels a bit strange.. Would you happen to have a wpa_supplicant debug
> log with timestamps showing such a case?
> 

The log file is attached. Note that this is using an innternal hostap version we are maintaining.

> NL80211_CMD_FRAME_WAIT_CANCEL is documented to be used as an
> event "whenever the driver has completed the off-channel wait time" which
> is supposed to be talking about the wait for the response. Why would the
> driver be still processing the _transmitted_ request frame if it has already
> indicated that it is not waiting for a response anymore? Is there really a case
> where the TX status gets delayed so much that this continues beyond the
> end of the maximum wait for the response?
> 

In this case the FW was not able to transmit the frame since the medium was very
busy. In this specific case the handling of the wait end event and the Tx complete status are
done asynchronously, where the remain on channel is handled before the Tx complete,
thus resulting with the scenario described here.  

> > However, the nl80211 driver logic handling for
> > NL80211_CMD_FRAME_WAIT_CANCEL clears the send_frame_cookie, so
> when
> > the NL80211_CMD_FRAME_TX_STATUS arrives it is being ignored.
> > As a result, wpa_supplicant flows, e.g., P2P GoN confirm transmission
> > etc., are stuck (since Tx wait expiration is not handled by the
> > wpa_supplicant other than in the context of DPP).
> >
> > Fix this issue by not clearing the send_frame_cookie as part of the
> > NL80211_CMD_FRAME_WAIT_CANCEL handling.
> 
> This clearing was added here:
> http://w1.fi/cgit/hostap/commit/?id=7e6f59c7027a08dad4dd022da975502fd
> 53857d1
> 
> The proposed changes here seem to partially revert that commit. Is it really
> correct to not revert more if this new change is indeed needed?
> What about the change in wpa_driver_nl80211_send_action_cancel_wait()?
> 

I do not think so. The change in  wpa_driver_nl80211_send_action_cancel_wait()
seems valid to me, as it initiates cancelling of wait operation only if it has a valid
cookie. This makes sense to me. 

> Would there be a way to achieve both the cleanup in that earlier commit and
> this strangely delayed TX status event handling? If not, the new commit
> message would at least need to explain that this results in those old
> confusing log entries starting to show up again.

I considered adding handling for EVENT_TX_WAIT_EXPIRE event in the wpa_supplicant, where
the offchannel logic would clear the current ' pending_action_tx'. Would this make more sense?

Regards,

Ilan.
diff mbox series

Patch

diff --git a/src/drivers/driver_nl80211_event.c b/src/drivers/driver_nl80211_event.c
index 29613161b9..c958eb66cc 100644
--- a/src/drivers/driver_nl80211_event.c
+++ b/src/drivers/driver_nl80211_event.c
@@ -3347,9 +3347,12 @@  static void nl80211_frame_wait_cancel(struct wpa_driver_nl80211_data *drv,
 		   (long long unsigned int) cookie,
 		   match ? " (match)" : "",
 		   drv->send_frame_cookie == cookie ? " (match-saved)" : "");
-	if (drv->send_frame_cookie == cookie)
-		drv->send_frame_cookie = (u64) -1;
-	if (!match)
+
+	/*
+	 * In case that the cookie matches the saved frame cookie the driver is
+	 * expected to indicate TX status
+	 */
+	if (!match || drv->send_frame_cookie == cookie)
 		return;
 
 	if (i < drv->num_send_frame_cookies - 1)