diff mbox series

Bypass QDISC for control frames on Linux

Message ID 20190912192002.6105-1-alexander@wetzel-home.de
State Changes Requested
Headers show
Series Bypass QDISC for control frames on Linux | expand

Commit Message

Alexander Wetzel Sept. 12, 2019, 7:20 p.m. UTC
Let wpa_supplicant/hostapd generated control frames bypass the QDISC
when the linux kernel supports the feature, to make sure the frames
are reaching the driver and are not delayed or dropped by QDISC.

This fixes a potential race when rekeying a connection under load:
Without that an EAPOL#4 frame send out by wpa_supplicant may not have
reached the driver when switching over to the new key, causing the AP to
disconnect the STA.

Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
---

The intend of this patch is to make sure the eapol#4 frame has at least
reached the driver when we replace the key.

Now this is is just the quick & easy fix to make sure EAPOL#4 is at
least in the driver queues when we replace the key, as discussed here:
http://lists.infradead.org/pipermail/hostap/2019-September/040514.html

I've cleaned up the tested hack a bit and added the same bypass for
hostapd. I don't like the two independent eapol paths and think it would
make sense to let nl80211 handle eapol frames also for wpa_supplicant
and not just for hostapd. But the changes to get that implemented are
quite far reaching and not directly linked to the problem at hand.

I've also started looking how to unify the two paths, but this will be
quite invasive and nothing I see in the near future.

The patch here is therefore just enabling PACKET_QDISC_BYPASS on the
existing sockets we use to transmit eapol frames: When we can't bypass
the QDISC - probably the kernel is < 3.14 - we simply continue anyhow,
thus keeping the existing behavior. (A warning here seems a bit like
overkill, but that's just my gut feeling...)

Technically it would be sufficient to only apply the second part of the
patch (l2_packet_linux.c): Really critical is only the eapol#4 frame. 
But having some qdisc drop any 802.11 management frames - and then ONLY
eapol frames - did not look like a good idea.

 src/drivers/driver_nl80211.c    | 8 ++++++--
 src/l2_packet/l2_packet_linux.c | 9 +++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

Comments

Jouni Malinen Dec. 28, 2019, 3:40 p.m. UTC | #1
On Thu, Sep 12, 2019 at 09:20:02PM +0200, Alexander Wetzel wrote:
> Let wpa_supplicant/hostapd generated control frames bypass the QDISC
> when the linux kernel supports the feature, to make sure the frames
> are reaching the driver and are not delayed or dropped by QDISC.

What impact would this have on reliability of sending out the EAPOL
frames in cases where there is significant traffic load on the
interface? Documentation for PACKET_QDISC_BYPASS seems to imply that
this option is targeted for traffic testing applications and as a side
effect, this increases packet drops when transmit queues are busy. Would
all those drops be frames that go through qdisc or would this also apply
for the frames that go out through the socket that uses
PACKET_QDISC_BYPASS (i.e., these EAPOL frames)?

> This fixes a potential race when rekeying a connection under load:
> Without that an EAPOL#4 frame send out by wpa_supplicant may not have
> reached the driver when switching over to the new key, causing the AP to
> disconnect the STA.

It would seem to me that it would be better to send and receive EAPOL
frames using nl80211 commands/events (i.e., same mechanism that is used
to perform authentication/association and key configuration) to get
matching behavior for all these management operations.

> The intend of this patch is to make sure the eapol#4 frame has at least
> reached the driver when we replace the key.

Based on PACKET_QDISC_BYPASS documentation, I'm not convinced this
"makes sure" of that.. Using nl80211 commands for sending EAPOL-Key msg
4/4 and setting the key on the other hand would make that pretty clear
(or would provide clear error reporting if that is not the case).

> Now this is is just the quick & easy fix to make sure EAPOL#4 is at
> least in the driver queues when we replace the key, as discussed here:
> http://lists.infradead.org/pipermail/hostap/2019-September/040514.html

That other patch, i.e., EAPOL-over-nl80211 would seem to be a better
direction to finally get working. Was anyone still working on that or
can I assume that no one is going to address my comment on it (use it
more consistently for all EAPOL frames rather than have a special case
for EAPOL-Key)?
Alexander Wetzel Dec. 29, 2019, 1:40 p.m. UTC | #2
> On Thu, Sep 12, 2019 at 09:20:02PM +0200, Alexander Wetzel wrote:
>> Let wpa_supplicant/hostapd generated control frames bypass the QDISC
>> when the linux kernel supports the feature, to make sure the frames
>> are reaching the driver and are not delayed or dropped by QDISC.
> 
> What impact would this have on reliability of sending out the EAPOL
> frames in cases where there is significant traffic load on the
> interface? Documentation for PACKET_QDISC_BYPASS seems to imply that
> this option is targeted for traffic testing applications and as a side
> effect, this increases packet drops when transmit queues are busy. Would
> all those drops be frames that go through qdisc or would this also apply
> for the frames that go out through the socket that uses
> PACKET_QDISC_BYPASS (i.e., these EAPOL frames)?
> 

First, I do not think the patch should be merged as it is. PTK rekeying 
is at the moment mostly broken and this patch is just replacing one 
broken solution with a less broken one. I was hoping for some discussion 
here how to generally handle that. That did not happen so far.

Second, I'll soon have a quite radical patch ready to address the PTK 
rekey issues with what seems to be the only solution: Make rekeying 
optional and use reconnects by default as replacement. There are just 
too many broken implementations in the wild which in at least one case 
enable plain text attacks on the outgoing PTK: Ath9k with mac80211 from 
a kernel < 4.20 can repeat lost MPDUs but skips the encryption...
Generally all cards using HW crypto and mac80211 likely have some issues 
with a kernel 4.20, which more or less added proper rekey support, 
handling the corner cases. (ath10k is the known exception so far.)
But quite many authors seem to have missed at least some of the corner 
cases PTK rekeying with only keyid 0 has I invite anyone believing that 
rekey is working in his setup to set the rekey intervall to 30s and try 
a download. Even with an AP able to handle that error free many devices 
- probably most if my samples are representative - will have have 
issues. Most common are download aborts, caused to to a reconnect due to 
an error during rekeying. Making that to the default action and allow 
users knowing what they do to override it seems to be the best way out 
of the PTK rekey mess.

But pack to this patch here:
The patch would disconnect a STA when the socket refuse to accept the 
MPDU. Now the easy fix would be to simply ignore send errors and trust 
that one of the retransmits will be accepted and can get through.
But this should only be a fallback strategy when we can't use CONTROL PORT.

That said the current solution using QDISC is probably worse then just 
hoping the HW is accepting the first send attempt. Short of a really 
loaded connection the driver should be able to accept the MPDU. And when 
the driver can't only the drives with special PTK rekeying hacks can 
hope to rekey the PTK when we are using QDISC.

The price of using QDISC is, that when the send call returns one skb has 
probably been send out, but next to sure not our EAPOL frame we just 
added at the end of the queue. Assuming die QDISC queues are filled it 
may take quite some time the MPDU is OTA and has therefore a very good 
chance to get encrypted with the new key. Which will never be decrypted 
by the AP and also cause a disconnect.

Setting PACKET_QDISC_BYPASS on the socket is just "jumping" QDISC.
- We lose retransmits handling when the HW is busy and not accepting 
skb's. (Which is kind of dangerous to outsource in the first place)
+ We get the guarantee that the skb is at least in a driver buffer when 
the send call returns (instead of a QDISC bufer)


>> This fixes a potential race when rekeying a connection under load:
>> Without that an EAPOL#4 frame send out by wpa_supplicant may not have
>> reached the driver when switching over to the new key, causing the AP to
>> disconnect the STA.
> 
> It would seem to me that it would be better to send and receive EAPOL
> frames using nl80211 commands/events (i.e., same mechanism that is used
> to perform authentication/association and key configuration) to get
> matching behavior for all these management operations.
> 

Yes, but CONTROL PORT is optionally and many drivers are not 
implementing it.

>> The intend of this patch is to make sure the eapol#4 frame has at least
>> reached the driver when we replace the key.
> 
> Based on PACKET_QDISC_BYPASS documentation, I'm not convinced this
> "makes sure" of that.. Using nl80211 commands for sending EAPOL-Key msg
> 4/4 and setting the key on the other hand would make that pretty clear
> (or would provide clear error reporting if that is not the case).
> 

I checked the code, not the documentation. The ioctl can only return 
when the skb has been either accepted by the driver or hit an error 
somewhere.

>> Now this is is just the quick & easy fix to make sure EAPOL#4 is at
>> least in the driver queues when we replace the key, as discussed here:
>> http://lists.infradead.org/pipermail/hostap/2019-September/040514.html
> 
> That other patch, i.e., EAPOL-over-nl80211 would seem to be a better
> direction to finally get working. Was anyone still working on that or
> can I assume that no one is going to address my comment on it (use it
> more consistently for all EAPOL frames rather than have a special case
> for EAPOL-Key)?
> 

If nobody is else is I'll sooner or more likely later pick that up.
I just fear the final patch will be quite invasive when done proper and 
is merging the eapol paths from hostapd and wpa_supplicant.

And while it should be the preferred path when available so far only 
mac80211 is supporting it from the in-tree drivers.

Alexander
Krishna Chaitanya Dec. 30, 2019, 9:47 a.m. UTC | #3
On Sun, Dec 29, 2019 at 7:10 PM Alexander Wetzel
<alexander@wetzel-home.de> wrote:
>
> > On Thu, Sep 12, 2019 at 09:20:02PM +0200, Alexander Wetzel wrote:
> >> Let wpa_supplicant/hostapd generated control frames bypass the QDISC
> >> when the linux kernel supports the feature, to make sure the frames
> >> are reaching the driver and are not delayed or dropped by QDISC.
> >
> > What impact would this have on reliability of sending out the EAPOL
> > frames in cases where there is significant traffic load on the
> > interface? Documentation for PACKET_QDISC_BYPASS seems to imply that
> > this option is targeted for traffic testing applications and as a side
> > effect, this increases packet drops when transmit queues are busy. Would
> > all those drops be frames that go through qdisc or would this also apply
> > for the frames that go out through the socket that uses
> > PACKET_QDISC_BYPASS (i.e., these EAPOL frames)?
> >
>
> First, I do not think the patch should be merged as it is. PTK rekeying
> is at the moment mostly broken and this patch is just replacing one
> broken solution with a less broken one. I was hoping for some discussion
> here how to generally handle that. That did not happen so far.
>
> Second, I'll soon have a quite radical patch ready to address the PTK
> rekey issues with what seems to be the only solution: Make rekeying
> optional and use reconnects by default as replacement. There are just
> too many broken implementations in the wild which in at least one case
> enable plain text attacks on the outgoing PTK: Ath9k with mac80211 from
> a kernel < 4.20 can repeat lost MPDUs but skips the encryption...
> Generally all cards using HW crypto and mac80211 likely have some issues
> with a kernel 4.20, which more or less added proper rekey support,
> handling the corner cases. (ath10k is the known exception so far.)
> But quite many authors seem to have missed at least some of the corner
> cases PTK rekeying with only keyid 0 has I invite anyone believing that
> rekey is working in his setup to set the rekey intervall to 30s and try
> a download. Even with an AP able to handle that error free many devices
> - probably most if my samples are representative - will have have
> issues. Most common are download aborts, caused to to a reconnect due to
> an error during rekeying. Making that to the default action and allow
> users knowing what they do to override it seems to be the best way out
> of the PTK rekey mess.
>
> But pack to this patch here:
> The patch would disconnect a STA when the socket refuse to accept the
> MPDU. Now the easy fix would be to simply ignore send errors and trust
> that one of the retransmits will be accepted and can get through.
> But this should only be a fallback strategy when we can't use CONTROL PORT.
>
> That said the current solution using QDISC is probably worse then just
> hoping the HW is accepting the first send attempt. Short of a really
> loaded connection the driver should be able to accept the MPDU. And when
> the driver can't only the drives with special PTK rekeying hacks can
> hope to rekey the PTK when we are using QDISC.
>
> The price of using QDISC is, that when the send call returns one skb has
> probably been send out, but next to sure not our EAPOL frame we just
> added at the end of the queue. Assuming die QDISC queues are filled it
> may take quite some time the MPDU is OTA and has therefore a very good
> chance to get encrypted with the new key. Which will never be decrypted
> by the AP and also cause a disconnect.
>
> Setting PACKET_QDISC_BYPASS on the socket is just "jumping" QDISC.
> - We lose retransmits handling when the HW is busy and not accepting
> skb's. (Which is kind of dangerous to outsource in the first place)
> + We get the guarantee that the skb is at least in a driver buffer when
> the send call returns (instead of a QDISC bufer)
>
>
> >> This fixes a potential race when rekeying a connection under load:
> >> Without that an EAPOL#4 frame send out by wpa_supplicant may not have
> >> reached the driver when switching over to the new key, causing the AP to
> >> disconnect the STA.
> >
> > It would seem to me that it would be better to send and receive EAPOL
> > frames using nl80211 commands/events (i.e., same mechanism that is used
> > to perform authentication/association and key configuration) to get
> > matching behavior for all these management operations.
> >
>
> Yes, but CONTROL PORT is optionally and many drivers are not
> implementing it.
>
> >> The intend of this patch is to make sure the eapol#4 frame has at least
> >> reached the driver when we replace the key.
> >
> > Based on PACKET_QDISC_BYPASS documentation, I'm not convinced this
> > "makes sure" of that.. Using nl80211 commands for sending EAPOL-Key msg
> > 4/4 and setting the key on the other hand would make that pretty clear
> > (or would provide clear error reporting if that is not the case).
> >
>
> I checked the code, not the documentation. The ioctl can only return
> when the skb has been either accepted by the driver or hit an error
> somewhere.
>
> >> Now this is is just the quick & easy fix to make sure EAPOL#4 is at
> >> least in the driver queues when we replace the key, as discussed here:
> >> http://lists.infradead.org/pipermail/hostap/2019-September/040514.html
> >
> > That other patch, i.e., EAPOL-over-nl80211 would seem to be a better
> > direction to finally get working. Was anyone still working on that or
> > can I assume that no one is going to address my comment on it (use it
> > more consistently for all EAPOL frames rather than have a special case
> > for EAPOL-Key)?
> >
>
> If nobody is else is I'll sooner or more likely later pick that up.
> I just fear the final patch will be quite invasive when done proper and
> is merging the eapol paths from hostapd and wpa_supplicant.
Sorry, it went out of my radar, I can pick that up. This patch + flushing EAPoL
in the driver rekeying works fine.

> And while it should be the preferred path when available so far only
> mac80211 is supporting it from the in-tree drivers.
diff mbox series

Patch

diff --git a/src/drivers/driver_nl80211.c b/src/drivers/driver_nl80211.c
index f699fb53f..a8372cfc0 100644
--- a/src/drivers/driver_nl80211.c
+++ b/src/drivers/driver_nl80211.c
@@ -1992,6 +1992,7 @@  static void * wpa_driver_nl80211_drv_init(void *ctx, const char *ifname,
 {
 	struct wpa_driver_nl80211_data *drv;
 	struct i802_bss *bss;
+	int enabled = 1;
 
 	if (global_priv == NULL)
 		return NULL;
@@ -2038,9 +2039,12 @@  static void * wpa_driver_nl80211_drv_init(void *ctx, const char *ifname,
 	if (drv->eapol_tx_sock < 0)
 		goto failed;
 
-	if (drv->data_tx_status) {
-		int enabled = 1;
+	/* The qdisc could delay/drop frames. Bypass it when supported. */
+	setsockopt(drv->eapol_tx_sock, SOL_PACKET, PACKET_QDISC_BYPASS,
+		   &enabled, sizeof(enabled));
 
+	if (drv->data_tx_status) {
+		enabled = 1;
 		if (setsockopt(drv->eapol_tx_sock, SOL_SOCKET, SO_WIFI_STATUS,
 			       &enabled, sizeof(enabled)) < 0) {
 			wpa_printf(MSG_DEBUG,
diff --git a/src/l2_packet/l2_packet_linux.c b/src/l2_packet/l2_packet_linux.c
index 291c9dd26..1bd7d15d1 100644
--- a/src/l2_packet/l2_packet_linux.c
+++ b/src/l2_packet/l2_packet_linux.c
@@ -274,6 +274,7 @@  struct l2_packet_data * l2_packet_init(
 	struct l2_packet_data *l2;
 	struct ifreq ifr;
 	struct sockaddr_ll ll;
+	int enabled = 1;
 
 	l2 = os_zalloc(sizeof(struct l2_packet_data));
 	if (l2 == NULL)
@@ -294,6 +295,14 @@  struct l2_packet_data * l2_packet_init(
 		os_free(l2);
 		return NULL;
 	}
+
+	/* The qdisc could delay/drop the eapol #3 frame, causing the frame to
+	 * be encrypted with the new key or not to be send out at all when
+	 * rekeying a busy link. Bypass it when supported.
+	 */
+	setsockopt(l2->fd, SOL_PACKET, PACKET_QDISC_BYPASS,
+                   &enabled, sizeof(enabled));
+
 	os_memset(&ifr, 0, sizeof(ifr));
 	os_strlcpy(ifr.ifr_name, l2->ifname, sizeof(ifr.ifr_name));
 	if (ioctl(l2->fd, SIOCGIFINDEX, &ifr) < 0) {