diff mbox series

wpa_supplicant: Send EAPoL-Key frames over NL80211 where available

Message ID 20190531065426.25654-1-brendan.jackman@bluwireless.co.uk
State Superseded
Headers show
Series wpa_supplicant: Send EAPoL-Key frames over NL80211 where available | expand

Commit Message

Brendan Jackman May 31, 2019, 6:54 a.m. UTC
Linux kernel v4.17 added the ability to request sending control port
frames via NL80211 instead of a normal network socket. Doing this provides
the device driver with ordering information between the control port
frames and the installation of keys. This empowers it to avoid race
conditions between, for example, PTK replacement and the sending of frame
4 of the 4-way rekeying handshake in an RSNA.

This patch adds a TX_CONTROL_PORT flag to the hostap driver API to report
that it supports, for a given device, a new operation called
tx_control_port. This operation is exactly like an ethernet send except
for the extra ordering information it provides for device drivers. The
nl80211 driver is updated to support this operation when the device
reports the CONTROL_PORT_OVER_NL80211 extended feature. Finally the RSN
supplicant system is updated to use this new operation for sending
EAPoL-Key frames when the driver reports that it is available; otherwise
falling back to a normal ethernet TX.

There may be other cases than these EAPoL-Key frames that would benefit
from using the new operation but I do not know of them.

I have tested this on DMG (11ad/ay) devices with an out-of-tree Linux
driver that does not use mac80211. Without this patch I consistently see
PTK rekeying fail if message 4/4 shares a stream with other in-flight
traffic. With this patch, and the driver updated to flush the relevant TX
queue before overwriting a PTK (knowing, now, that if there was a message
4/4 related to the key installation, it has already entered the driver
queue), rekeying is reliable.

There is still data loss surrounding key installation - this problem is
alluded to in 802.11-2016 12.6.21, where extended Key ID support is
described as the eventual solution. This patch aims to at least prevent
rekeying from totally breaking the association, in a way that works on
kernels as far back as 4.17 (as per Alexander Wetzel extended Key ID
support should be possible on 5.2).

See http://lists.infradead.org/pipermail/hostap/2019-May/040089.html for a
little more context.

Cc: Chaitanya Tata <chaitanya.tata@bluwireless.co.uk>
Cc: Antony King <antony.king@bluwireless.co.uk>
Signed-off-by: Brendan Jackman <brendan.jackman@bluwireless.co.uk>
---
 src/drivers/driver.h              | 26 +++++++++++++
 src/drivers/driver_nl80211.c      | 39 ++++++++++++++++++++
 src/drivers/driver_nl80211_capa.c |  4 ++
 src/rsn_supp/wpa.c                |  2 +-
 src/rsn_supp/wpa.h                |  2 +
 src/rsn_supp/wpa_i.h              |  7 ++++
 wpa_supplicant/driver_i.h         |  8 ++++
 wpa_supplicant/ibss_rsn.c         | 18 +++++++++
 wpa_supplicant/wpas_glue.c        | 61 ++++++++++++++++++++++++-------
 9 files changed, 153 insertions(+), 14 deletions(-)

Comments

Brendan Jackman Aug. 14, 2019, 6:20 a.m. UTC | #1
Hi all,

Any chance I could get a review on this? I'm leaving my current job soon so it would be great to get it merged/rejected before I lose access to the test HW!

Cheers,
Brendan

On 31/05/2019 13:54, Brendan Jackman wrote:
> Linux kernel v4.17 added the ability to request sending control port
> frames via NL80211 instead of a normal network socket. Doing this provides
> the device driver with ordering information between the control port
> frames and the installation of keys. This empowers it to avoid race
> conditions between, for example, PTK replacement and the sending of frame
> 4 of the 4-way rekeying handshake in an RSNA.
>
> This patch adds a TX_CONTROL_PORT flag to the hostap driver API to report
> that it supports, for a given device, a new operation called
> tx_control_port. This operation is exactly like an ethernet send except
> for the extra ordering information it provides for device drivers. The
> nl80211 driver is updated to support this operation when the device
> reports the CONTROL_PORT_OVER_NL80211 extended feature. Finally the RSN
> supplicant system is updated to use this new operation for sending
> EAPoL-Key frames when the driver reports that it is available; otherwise
> falling back to a normal ethernet TX.
>
> There may be other cases than these EAPoL-Key frames that would benefit
> from using the new operation but I do not know of them.
>
> I have tested this on DMG (11ad/ay) devices with an out-of-tree Linux
> driver that does not use mac80211. Without this patch I consistently see
> PTK rekeying fail if message 4/4 shares a stream with other in-flight
> traffic. With this patch, and the driver updated to flush the relevant TX
> queue before overwriting a PTK (knowing, now, that if there was a message
> 4/4 related to the key installation, it has already entered the driver
> queue), rekeying is reliable.
>
> There is still data loss surrounding key installation - this problem is
> alluded to in 802.11-2016 12.6.21, where extended Key ID support is
> described as the eventual solution. This patch aims to at least prevent
> rekeying from totally breaking the association, in a way that works on
> kernels as far back as 4.17 (as per Alexander Wetzel extended Key ID
> support should be possible on 5.2).
>
> See http://lists.infradead.org/pipermail/hostap/2019-May/040089.html for a
> little more context.
>
> Cc: Chaitanya Tata <chaitanya.tata@bluwireless.co.uk>
> Cc: Antony King <antony.king@bluwireless.co.uk>
> Signed-off-by: Brendan Jackman <brendan.jackman@bluwireless.co.uk>
> ---
>   src/drivers/driver.h              | 26 +++++++++++++
>   src/drivers/driver_nl80211.c      | 39 ++++++++++++++++++++
>   src/drivers/driver_nl80211_capa.c |  4 ++
>   src/rsn_supp/wpa.c                |  2 +-
>   src/rsn_supp/wpa.h                |  2 +
>   src/rsn_supp/wpa_i.h              |  7 ++++
>   wpa_supplicant/driver_i.h         |  8 ++++
>   wpa_supplicant/ibss_rsn.c         | 18 +++++++++
>   wpa_supplicant/wpas_glue.c        | 61 ++++++++++++++++++++++++-------
>   9 files changed, 153 insertions(+), 14 deletions(-)
>
> diff --git a/src/drivers/driver.h b/src/drivers/driver.h
> index 496bd522e..7cf1582ce 100644
> --- a/src/drivers/driver.h
> +++ b/src/drivers/driver.h
> @@ -1625,6 +1625,8 @@ struct wpa_driver_capa {
>   #define WPA_DRIVER_FLAGS_FTM_RESPONDER		0x0100000000000000ULL
>   /** Driver support 4-way handshake offload for WPA-Personal */
>   #define WPA_DRIVER_FLAGS_4WAY_HANDSHAKE_PSK	0x0200000000000000ULL
> +/** Driver has working tx_control_port */
> +#define WPA_DRIVER_FLAGS_TX_CONTROL_PORT	0x0400000000000000ULL
>   	u64 flags;
>   
>   #define FULL_AP_CLIENT_STATE_SUPP(drv_flags) \
> @@ -2280,6 +2282,30 @@ struct wpa_driver_ops {
>   		       const u8 *seq, size_t seq_len,
>   		       const u8 *key, size_t key_len);
>   
> +	/**
> +	 * tx_control_port - Send a frame over the 802.1X controlled port
> +	 * @priv: private driver interface data
> +	 * @dest: Destination MAC address
> +	 * @proto: Ethertype in host byte order
> +	 * @buf: Frame payload starting from IEEE 802.1X header
> +	 * @len: Frame payload length
> +	 *
> +	 * Returns 0 on success, else an error
> +	 *
> +	 * This is like a normal ethernet send except that the OS device driver
> +	 * is aware (by other means than the ethertype) that this frame is
> +	 * ~special~, and more importantly it gains an ordering between the
> +	 * transmission of the frame and other driver operations such as key
> +	 * installations. This can be used to work around known limitations in
> +	 * 802.11 protocols such as race conditions between 802.1X rekeying
> +	 * handshake message 4/4 and a PTK being overwritten.
> +	 *
> +	 * This function is only implemented for a given interface if the driver
> +	 * instance reports WPA_DRIVER_FLAGS_TX_CONTROL_PORT. Otherwise API
> +	 * users should fall back to sending the frame via a normal socket.
> +	 */
> +	int (*tx_control_port)(void *priv, const u8 *dest,
> +			       u16 proto, const u8 *buf, size_t len);
>   	/**
>   	 * init - Initialize driver interface
>   	 * @ctx: context to be used when calling wpa_supplicant functions,
> diff --git a/src/drivers/driver_nl80211.c b/src/drivers/driver_nl80211.c
> index 3556b6d69..b1d0220e9 100644
> --- a/src/drivers/driver_nl80211.c
> +++ b/src/drivers/driver_nl80211.c
> @@ -3007,6 +3007,44 @@ static int nl80211_set_pmk(struct wpa_driver_nl80211_data *drv,
>   }
>   
>   
> +static int driver_nl80211_tx_control_port(void *priv, const u8 *dest,
> +					  u16 proto, const u8 *buf, size_t len)
> +{
> +	struct i802_bss *bss = priv;
> +	struct nl_msg *msg = NULL;
> +	int ifindex = if_nametoindex(bss->ifname);
> +	int ret = 0;
> +
> +	wpa_printf(MSG_DEBUG,
> +		   "nl80211: tx_control_port "MACSTR" proto=0x%04x len=%zu",
> +		   MAC2STR(dest), proto, len);
> +
> +	msg = nl80211_ifindex_msg(bss->drv, ifindex, 0,
> +				  NL80211_CMD_CONTROL_PORT_FRAME);
> +	if (!msg)
> +		return -ENOBUFS;
> +
> +	if (nla_put_u16(msg, NL80211_ATTR_CONTROL_PORT_ETHERTYPE, proto))
> +		goto fail;
> +	if (nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN, dest))
> +		goto fail;
> +	if (nla_put(msg, NL80211_ATTR_FRAME, len, buf))
> +		goto fail;
> +
> +	ret = send_and_recv_msgs(bss->drv, msg, NULL, NULL);
> +	if (ret)
> +		wpa_printf(MSG_DEBUG,
> +			   "nl80211: tx_control_port failed: ret=%d (%s)",
> +			   ret, strerror(ret));
> +
> +	return ret;
> +
> +fail:
> +	nl80211_nlmsg_clear(msg);
> +	nlmsg_free(msg);
> +	return -ENOBUFS;
> +}
> +
>   static int wpa_driver_nl80211_set_key(const char *ifname, struct i802_bss *bss,
>   				      enum wpa_alg alg, const u8 *addr,
>   				      int key_idx, int set_tx,
> @@ -10940,6 +10978,7 @@ const struct wpa_driver_ops wpa_driver_nl80211_ops = {
>   	.get_bssid = wpa_driver_nl80211_get_bssid,
>   	.get_ssid = wpa_driver_nl80211_get_ssid,
>   	.set_key = driver_nl80211_set_key,
> +	.tx_control_port = driver_nl80211_tx_control_port,
>   	.scan2 = driver_nl80211_scan2,
>   	.sched_scan = wpa_driver_nl80211_sched_scan,
>   	.stop_sched_scan = wpa_driver_nl80211_stop_sched_scan,
> diff --git a/src/drivers/driver_nl80211_capa.c b/src/drivers/driver_nl80211_capa.c
> index a90a55db8..f87621003 100644
> --- a/src/drivers/driver_nl80211_capa.c
> +++ b/src/drivers/driver_nl80211_capa.c
> @@ -433,6 +433,10 @@ static void wiphy_info_ext_feature_flags(struct wiphy_info_data *info,
>   	if (ext_feature_isset(ext_features, len,
>   			      NL80211_EXT_FEATURE_ENABLE_FTM_RESPONDER))
>   		capa->flags |= WPA_DRIVER_FLAGS_FTM_RESPONDER;
> +
> +	if (ext_feature_isset(ext_features, len,
> +			      NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211))
> +		capa->flags |= WPA_DRIVER_FLAGS_TX_CONTROL_PORT;
>   }
>   
>   
> diff --git a/src/rsn_supp/wpa.c b/src/rsn_supp/wpa.c
> index e0039fac0..aecabac0b 100644
> --- a/src/rsn_supp/wpa.c
> +++ b/src/rsn_supp/wpa.c
> @@ -158,7 +158,7 @@ int wpa_eapol_key_send(struct wpa_sm *sm, struct wpa_ptk *ptk,
>   	}
>   
>   	wpa_hexdump(MSG_MSGDUMP, "WPA: TX EAPOL-Key", msg, msg_len);
> -	ret = wpa_sm_ether_send(sm, dest, proto, msg, msg_len);
> +	ret = wpa_sm_tx_control_port(sm, dest, proto, msg, msg_len);
>   	eapol_sm_notify_tx_eapol_key(sm->eapol);
>   out:
>   	os_free(msg);
> diff --git a/src/rsn_supp/wpa.h b/src/rsn_supp/wpa.h
> index ae9cd6484..145d57988 100644
> --- a/src/rsn_supp/wpa.h
> +++ b/src/rsn_supp/wpa.h
> @@ -33,6 +33,8 @@ struct wpa_sm_ctx {
>   		       const u8 *key, size_t key_len);
>   	void * (*get_network_ctx)(void *ctx);
>   	int (*get_bssid)(void *ctx, u8 *bssid);
> +	int (*tx_control_port)(void *ctx, const u8 *dest, u16 proto,
> +			       const u8 *buf, size_t len);
>   	int (*ether_send)(void *ctx, const u8 *dest, u16 proto, const u8 *buf,
>   			  size_t len);
>   	int (*get_beacon_ie)(void *ctx);
> diff --git a/src/rsn_supp/wpa_i.h b/src/rsn_supp/wpa_i.h
> index d86734b0d..908af3d25 100644
> --- a/src/rsn_supp/wpa_i.h
> +++ b/src/rsn_supp/wpa_i.h
> @@ -216,6 +216,13 @@ static inline int wpa_sm_get_bssid(struct wpa_sm *sm, u8 *bssid)
>   	return sm->ctx->get_bssid(sm->ctx->ctx, bssid);
>   }
>   
> +static inline int wpa_sm_tx_control_port(struct wpa_sm *sm, const u8 *dest,
> +					 u16 proto, const u8 *buf, size_t len)
> +{
> +	WPA_ASSERT(sm->ctx->tx_control_port);
> +	return sm->ctx->tx_control_port(sm->ctx->ctx, dest, proto, buf, len);
> +}
> +
>   static inline int wpa_sm_ether_send(struct wpa_sm *sm, const u8 *dest,
>   				    u16 proto, const u8 *buf, size_t len)
>   {
> diff --git a/wpa_supplicant/driver_i.h b/wpa_supplicant/driver_i.h
> index f073b8a6d..2372344fa 100644
> --- a/wpa_supplicant/driver_i.h
> +++ b/wpa_supplicant/driver_i.h
> @@ -138,6 +138,14 @@ static inline int wpa_drv_get_ssid(struct wpa_supplicant *wpa_s, u8 *ssid)
>   	return -1;
>   }
>   
> +static inline int wpa_drv_tx_contol_port(struct wpa_supplicant *wpa_s,
> +					 const u8 *dest,
> +					 u16 proto, const u8 *buf, size_t len)
> +{
> +	return wpa_s->driver->tx_control_port(wpa_s->drv_priv,
> +					      dest, proto, buf, len);
> +}
> +
>   static inline int wpa_drv_set_key(struct wpa_supplicant *wpa_s,
>   				  enum wpa_alg alg, const u8 *addr,
>   				  int key_idx, int set_tx,
> diff --git a/wpa_supplicant/ibss_rsn.c b/wpa_supplicant/ibss_rsn.c
> index 6934c4725..cdadadee1 100644
> --- a/wpa_supplicant/ibss_rsn.c
> +++ b/wpa_supplicant/ibss_rsn.c
> @@ -76,6 +76,23 @@ static int supp_ether_send(void *ctx, const u8 *dest, u16 proto, const u8 *buf,
>   }
>   
>   
> +static int supp_tx_control_port(void *ctx, const u8 *dest, u16 proto,
> +				const u8 *buf, size_t len)
> +{
> +	struct ibss_rsn_peer *peer = ctx;
> +	struct wpa_supplicant *wpa_s = peer->ibss_rsn->wpa_s;
> +
> +	wpa_printf(MSG_DEBUG, "SUPP: %s(dest=" MACSTR " proto=0x%04x "
> +		   "len=%lu)",
> +		   __func__, MAC2STR(dest), proto, (unsigned long) len);
> +
> +	if (wpa_s->drv_flags & WPA_DRIVER_FLAGS_TX_CONTROL_PORT)
> +		return wpa_drv_tx_contol_port(wpa_s, dest, proto, buf, len);
> +	else
> +		return supp_ether_send(wpa_s, dest, proto, buf, len);
> +}
> +
> +
>   static u8 * supp_alloc_eapol(void *ctx, u8 type, const void *data,
>   			     u16 data_len, size_t *msg_len, void **data_pos)
>   {
> @@ -211,6 +228,7 @@ static int ibss_rsn_supp_init(struct ibss_rsn_peer *peer, const u8 *own_addr,
>   	ctx->set_state = supp_set_state;
>   	ctx->get_state = supp_get_state;
>   	ctx->ether_send = supp_ether_send;
> +	ctx->tx_control_port = supp_tx_control_port;
>   	ctx->get_beacon_ie = supp_get_beacon_ie;
>   	ctx->alloc_eapol = supp_alloc_eapol;
>   	ctx->set_key = supp_set_key;
> diff --git a/wpa_supplicant/wpas_glue.c b/wpa_supplicant/wpas_glue.c
> index e98bf1147..6f5c301f8 100644
> --- a/wpa_supplicant/wpas_glue.c
> +++ b/wpa_supplicant/wpas_glue.c
> @@ -85,17 +85,9 @@ static u8 * wpa_alloc_eapol(const struct wpa_supplicant *wpa_s, u8 type,
>   }
>   
>   
> -/**
> - * wpa_ether_send - Send Ethernet frame
> - * @wpa_s: Pointer to wpa_supplicant data
> - * @dest: Destination MAC address
> - * @proto: Ethertype in host byte order
> - * @buf: Frame payload starting from IEEE 802.1X header
> - * @len: Frame payload length
> - * Returns: >=0 on success, <0 on failure
> - */
> -static int wpa_ether_send(struct wpa_supplicant *wpa_s, const u8 *dest,
> -			  u16 proto, const u8 *buf, size_t len)
> +static inline int ext_eapol_frame_io_notify_tx(struct wpa_supplicant *wpa_s,
> +					       const u8 *dest, u16 proto,
> +					       const u8 *buf, size_t len)
>   {
>   #ifdef CONFIG_TESTING_OPTIONS
>   	if (wpa_s->ext_eapol_frame_io && proto == ETH_P_EAPOL) {
> @@ -108,18 +100,60 @@ static int wpa_ether_send(struct wpa_supplicant *wpa_s, const u8 *dest,
>   		wpa_msg(wpa_s, MSG_INFO, "EAPOL-TX " MACSTR " %s",
>   			MAC2STR(dest), hex);
>   		os_free(hex);
> -		return 0;
> +		return -1;
>   	}
>   #endif /* CONFIG_TESTING_OPTIONS */
>   
> +	return 0;
> +}
> +
> +/**
> + * wpa_ether_send - Send Ethernet frame
> + * @wpa_s: Pointer to wpa_supplicant data
> + * @dest: Destination MAC address
> + * @proto: Ethertype in host byte order
> + * @buf: Frame payload starting from IEEE 802.1X header
> + * @len: Frame payload length
> + * Returns: >=0 on success, <0 on failure
> + */
> +static int wpa_ether_send(struct wpa_supplicant *wpa_s, const u8 *dest,
> +			  u16 proto, const u8 *buf, size_t len)
> +{
> +	if (ext_eapol_frame_io_notify_tx(wpa_s, dest, proto, buf, len))
> +		return 0;
> +
>   	if (wpa_s->l2) {
>   		return l2_packet_send(wpa_s->l2, dest, proto, buf, len);
>   	}
>   
>   	return -1;
>   }
> -#endif /* IEEE8021X_EAPOL || !CONFIG_NO_WPA */
>   
> +/**
> + * wpa_supplicant_tx_control_port - Send Ethernet frame over 802.1X control port
> + * @wpa_s: Pointer to wpa_supplicant data
> + * @dest: Destination MAC address
> + * @proto: Ethertype in host byte order
> + * @buf: Frame payload starting from IEEE 802.1X header
> + * @len: Frame payload length
> + * Just like wpa_ether_send, but when this function is used the driver may be
> + * able to handle control port frames specially.
> + * Returns: >=0 on success, <0 on failure
> + */
> +static int wpa_supplicant_tx_control_port(void *ctx, const u8 *dest,
> +					  u16 proto, const u8 *buf, size_t len)
> +{
> +	struct wpa_supplicant *wpa_s = ctx;
> +
> +	if (wpa_s->drv_flags & WPA_DRIVER_FLAGS_TX_CONTROL_PORT) {
> +		if (ext_eapol_frame_io_notify_tx(wpa_s, dest, proto, buf, len))
> +			return 0;
> +		return wpa_drv_tx_contol_port(wpa_s, dest, proto, buf, len);
> +	} else {
> +		return wpa_ether_send(wpa_s, dest, proto, buf, len);
> +	}
> +}
> +#endif /* IEEE8021X_EAPOL || !CONFIG_NO_WPA */
>   
>   #ifdef IEEE8021X_EAPOL
>   
> @@ -1214,6 +1248,7 @@ int wpa_supplicant_init_wpa(struct wpa_supplicant *wpa_s)
>   	ctx->get_network_ctx = wpa_supplicant_get_network_ctx;
>   	ctx->get_bssid = wpa_supplicant_get_bssid;
>   	ctx->ether_send = _wpa_ether_send;
> +	ctx->tx_control_port = wpa_supplicant_tx_control_port;
>   	ctx->get_beacon_ie = wpa_supplicant_get_beacon_ie;
>   	ctx->alloc_eapol = _wpa_alloc_eapol;
>   	ctx->cancel_auth_timeout = _wpa_supplicant_cancel_auth_timeout;
Johannes Berg Aug. 23, 2019, 10:58 a.m. UTC | #2
On Fri, 2019-05-31 at 06:54 +0000, Brendan Jackman wrote:
> 
> +static int driver_nl80211_tx_control_port(void *priv, const u8 *dest,
> +					  u16 proto, const u8 *buf, size_t len)
> +{
> +	struct i802_bss *bss = priv;
> +	struct nl_msg *msg = NULL;
> +	int ifindex = if_nametoindex(bss->ifname);
> +	int ret = 0;
> +
> +	wpa_printf(MSG_DEBUG,
> +		   "nl80211: tx_control_port "MACSTR" proto=0x%04x len=%zu",
> +		   MAC2STR(dest), proto, len);
> +
> +	msg = nl80211_ifindex_msg(bss->drv, ifindex, 0,
> +				  NL80211_CMD_CONTROL_PORT_FRAME);
> +	if (!msg)
> +		return -ENOBUFS;
> +
> +	if (nla_put_u16(msg, NL80211_ATTR_CONTROL_PORT_ETHERTYPE, proto))
> +		goto fail;
> +	if (nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN, dest))
> +		goto fail;
> +	if (nla_put(msg, NL80211_ATTR_FRAME, len, buf))
> +		goto fail;

One of the reasons to have this path was to enable/disable encryption on
this particular frame, IIRC. Shouldn't
NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT get set here at least in some
cases?

> +static int supp_tx_control_port(void *ctx, const u8 *dest, u16 proto,
> +				const u8 *buf, size_t len)
> +{
> +	struct ibss_rsn_peer *peer = ctx;
> +	struct wpa_supplicant *wpa_s = peer->ibss_rsn->wpa_s;
> +
> +	wpa_printf(MSG_DEBUG, "SUPP: %s(dest=" MACSTR " proto=0x%04x "
> +		   "len=%lu)",
> +		   __func__, MAC2STR(dest), proto, (unsigned long) len);
> +
> +	if (wpa_s->drv_flags & WPA_DRIVER_FLAGS_TX_CONTROL_PORT)
> +		return wpa_drv_tx_contol_port(wpa_s, dest, proto, buf, len);


Typo - "control". Must be in at least two other places too, otherwise it
wouldn't compile :-)

Otherwise LGTM!

johannes
Brendan Jackman Aug. 27, 2019, 6:54 a.m. UTC | #3
Hi Johannes,

Thanks a lot for the review :)

On 23/08/2019 17:58, Johannes Berg wrote:
 > On Fri, 2019-05-31 at 06:54 +0000, Brendan Jackman wrote:
 >> +static int driver_nl80211_tx_control_port(void *priv, const u8 *dest,
 >> +                      u16 proto, const u8 *buf, size_t len)
 >> +{
 >> +    struct i802_bss *bss = priv;
 >> +    struct nl_msg *msg = NULL;
 >> +    int ifindex = if_nametoindex(bss->ifname);
 >> +    int ret = 0;
 >> +
 >> +    wpa_printf(MSG_DEBUG,
 >> +           "nl80211: tx_control_port "MACSTR" proto=0x%04x len=%zu",
 >> +           MAC2STR(dest), proto, len);
 >> +
 >> +    msg = nl80211_ifindex_msg(bss->drv, ifindex, 0,
 >> +                  NL80211_CMD_CONTROL_PORT_FRAME);
 >> +    if (!msg)
 >> +        return -ENOBUFS;
 >> +
 >> +    if (nla_put_u16(msg, NL80211_ATTR_CONTROL_PORT_ETHERTYPE, proto))
 >> +        goto fail;
 >> +    if (nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN, dest))
 >> +        goto fail;
 >> +    if (nla_put(msg, NL80211_ATTR_FRAME, len, buf))
 >> +        goto fail;
 > One of the reasons to have this path was to enable/disable encryption on
 > this particular frame, IIRC. Shouldn't
 > NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT get set here at least in some
 > cases?

Well actually my goal is strictly to _disable_ encryption for this frame. In our
driver we currently have a double-awkward workaround for the key 802.11 key
installation flaw: we explicitly disable encyption for the initial handshake, to
avoid the MAC inadvertently encrypting handshake packets before the peer has
installed the PTK... but for rekeying handshakes we then have to _stop_
explicitly disabling encryption because if we send unencapsulated packets during
an RSNA they will have no packet number. Handling that properly would require
more workarounds in the GCMP replay detection code on the RX side. But now that
we are encrypting handshake packets we are back to having the race condition:
now we can end up inadvertently encrypting frames with a key that the peer has
not yet installed. As I mentioned in the original thread, the _proper_ solution
for this is Alexander Wetzel's Extended Key ID support, which means we don't
have to overwrite the key during rekeying. But that is only available on
5.3+. This is an interim solution.

The goal of this patch is ultimately to let the driver know that when it is
asked to install a new PTK to the MAC, the handshake frames have already entered
its TX ring. So if we flush the rings before installing a key, we avoid the race
condition.

Hope that's clear... been a while since I had all these details paged in!

On the other hand, this now makes me worry that there could be other drivers out
there that implement .tx_control_port, but do not take advantage of it to avoid
the key installation race... in that case this patch could break handshakes for
those drivers. On the other-other hand I don't know why you'd implement
.tx_control_port except for this very reason.

 >> +static int supp_tx_control_port(void *ctx, const u8 *dest, u16 proto,
 >> +                const u8 *buf, size_t len)
 >> +{
 >> +    struct ibss_rsn_peer *peer = ctx;
 >> +    struct wpa_supplicant *wpa_s = peer->ibss_rsn->wpa_s;
 >> +
 >> +    wpa_printf(MSG_DEBUG, "SUPP: %s(dest=" MACSTR " proto=0x%04x "
 >> +           "len=%lu)",
 >> +           __func__, MAC2STR(dest), proto, (unsigned long) len);
 >> +
 >> +    if (wpa_s->drv_flags & WPA_DRIVER_FLAGS_TX_CONTROL_PORT)
 >> +        return wpa_drv_tx_contol_port(wpa_s, dest, proto, buf, len);
 >
 > Typo - "control". Must be in at least two other places too, otherwise it
 > wouldn't compile

Oops, well spotted - will fix for v2 (after above discussions is settled).

 >
 > Otherwise LGTM!
 >

Thanks again,
Brendan
Johannes Berg Aug. 27, 2019, 7:34 a.m. UTC | #4
Hi Brendan,

>  >> +    if (nla_put(msg, NL80211_ATTR_FRAME, len, buf))
>  >> +        goto fail;
>  > One of the reasons to have this path was to enable/disable encryption on
>  > this particular frame, IIRC. Shouldn't
>  > NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT get set here at least in some
>  > cases?
> 
> Well actually my goal is strictly to _disable_ encryption for this frame.

Right, I think you mentioned that.

> In our
> driver we currently have a double-awkward workaround for the key 802.11 key
> installation flaw: we explicitly disable encyption for the initial handshake, to
> avoid the MAC inadvertently encrypting handshake packets before the peer has
> installed the PTK... but for rekeying handshakes we then have to _stop_
> explicitly disabling encryption because if we send unencapsulated packets during
> an RSNA they will have no packet number. Handling that properly would require
> more workarounds in the GCMP replay detection code on the RX side. But now that
> we are encrypting handshake packets we are back to having the race condition:
> now we can end up inadvertently encrypting frames with a key that the peer has
> not yet installed.

It's not clear to me what this change gives you though. At least in
theory, nothing stops you from detecting that it's an EAPOL packet, and
then stopping the queues and all in a similar fashion to what you'd do
when you get the EAPOL packet through nl80211.

One of the things we did want to have here was the ability to *control*
whether the frames were encrypted, there are some protocols (WAPI?) that
require encryption, while the normal case (WPA/RSN) require _no_
encryption. Hence my comment regarding being able to control it in the
supplicant, although I guess since it doesn't actually support any paths
that want encryption it doesn't matter for now (and for upstream).

> The goal of this patch is ultimately to let the driver know that when it is
> asked to install a new PTK to the MAC, the handshake frames have already entered
> its TX ring. So if we flush the rings before installing a key, we avoid the race
> condition.

So if I understand correctly: you want the EAPOL over nl80211 so you
don't have a race between sending the EAPOL and installing the key? I
guess that makes more sense now.

> On the other hand, this now makes me worry that there could be other drivers out
> there that implement .tx_control_port, but do not take advantage of it to avoid
> the key installation race... in that case this patch could break handshakes for
> those drivers. On the other-other hand I don't know why you'd implement
> .tx_control_port except for this very reason.

I'm not sure how it'd break anything, it'd just behave as before? But
this is new, nobody really implements it yet. So I don't think we have
to worry about that even if I'm not sure I understand your concern.

johannes
Brendan Jackman Aug. 27, 2019, 7:52 a.m. UTC | #5
On 27/08/2019 14:34, Johannes Berg wrote:
 > Hi Brendan,
 >
 >>  >> +    if (nla_put(msg, NL80211_ATTR_FRAME, len, buf))
 >>  >> +        goto fail;
 >>  > One of the reasons to have this path was to enable/disable encryption on
 >>  > this particular frame, IIRC. Shouldn't
 >>  > NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT get set here at least in some
 >>  > cases?
 >>
 >> Well actually my goal is strictly to _disable_ encryption for this frame.
 > Right, I think you mentioned that.
 >
 >> In our
 >> driver we currently have a double-awkward workaround for the key 802.11 key
 >> installation flaw: we explicitly disable encyption for the initial handshake, to
 >> avoid the MAC inadvertently encrypting handshake packets before the peer has
 >> installed the PTK... but for rekeying handshakes we then have to _stop_
 >> explicitly disabling encryption because if we send unencapsulated packets during
 >> an RSNA they will have no packet number. Handling that properly would require
 >> more workarounds in the GCMP replay detection code on the RX side. But now that
 >> we are encrypting handshake packets we are back to having the race condition:
 >> now we can end up inadvertently encrypting frames with a key that the peer has
 >> not yet installed.
 > It's not clear to me what this change gives you though. At least in
 > theory, nothing stops you from detecting that it's an EAPOL packet, and
 > then stopping the queues and all in a similar fashion to what you'd do
 > when you get the EAPOL packet through nl80211.

We don't flush the queues when we get the EAPOL packet, we flush the queue when
we get the .add_key cfg80211 call, so we can be sure that the EAPOL packet has
already been encapsulated by the MAC before we install the new key.

We can already do this ring flush in .add_key, but we don't really know for
certain that the handshake packet has reached the driver's ring yet (at least, I
do not understand the kernel's network stack well enough to convince myself that
it must have). With .tx_control_port, we know for sure that it has.

We could do some insane hacks with existing information, like "we've seen an
EAPOL frame of type XYZ, so we know an .add_key call should be on the way (or
maybe we've already had it, because the frame got delayed somehow in the kernel
network stack?) so we can somehow order the actioning of the .add_key vs the
transmission of the EAPOL" but I hope you'll agree with my characterisation as
"insane" :D

 > One of the things we did want to have here was the ability to *control*
 > whether the frames were encrypted, there are some protocols (WAPI?) that
 > require encryption, while the normal case (WPA/RSN) require _no_
 > encryption. Hence my comment regarding being able to control it in the
 > supplicant, although I guess since it doesn't actually support any paths
 > that want encryption it doesn't matter for now (and for upstream).
 >
 >> The goal of this patch is ultimately to let the driver know that when it is
 >> asked to install a new PTK to the MAC, the handshake frames have already entered
 >> its TX ring. So if we flush the rings before installing a key, we avoid the race
 >> condition.
 > So if I understand correctly: you want the EAPOL over nl80211 so you
 > don't have a race between sending the EAPOL and installing the key? I
 > guess that makes more sense now.

Yes, or more exactly so it's _possible_ for the driver to avoid the race (by
flushing the TX rings before installing the key in cfg80211 .add_key, because it
now knows for sure any handshake frames have already gone into the ring).

 >> On the other hand, this now makes me worry that there could be other drivers out
 >> there that implement .tx_control_port, but do not take advantage of it to avoid
 >> the key installation race... in that case this patch could break handshakes for
 >> those drivers. On the other-other hand I don't know why you'd implement
 >> .tx_control_port except for this very reason.
 > I'm not sure how it'd break anything, it'd just behave as before? But
 > this is new, nobody really implements it yet. So I don't think we have
 > to worry about that even if I'm not sure I understand your concern.

Well my fear was that there could be a driver that relies on something like:

     if (skb->protocol == htons(ETH_P_PAE))
         /* explicitly disable encryption for this frame */


in its .ndo_start_xmit, to avoid the race condition vs key installation. But
then it implements .tx_control_port and respects the noencrypt flag (but doesn't
flush its TX queues in .add_key). So this hostap patch would end up changing
that driver's behaviour because it would set noencrypt to 0 and the driver would
no longer be explicitly disabling encryption for the frame.

But yes, I think you're right that we don't need to worry about it, I don't
think there's any reason to believe such a driver exists.

Hope that's a bit more clear...
Brendan
Rafał Miłecki Aug. 27, 2019, 8:10 a.m. UTC | #6
On Tue, 27 Aug 2019 at 09:52, Brendan Jackman
<brendan.jackman@bluwireless.com> wrote:
> On 27/08/2019 14:34, Johannes Berg wrote:
>  >> On the other hand, this now makes me worry that there could be other drivers out
>  >> there that implement .tx_control_port, but do not take advantage of it to avoid
>  >> the key installation race... in that case this patch could break handshakes for
>  >> those drivers. On the other-other hand I don't know why you'd implement
>  >> .tx_control_port except for this very reason.
>  > I'm not sure how it'd break anything, it'd just behave as before? But
>  > this is new, nobody really implements it yet. So I don't think we have
>  > to worry about that even if I'm not sure I understand your concern.
>
> Well my fear was that there could be a driver that relies on something like:
>
>      if (skb->protocol == htons(ETH_P_PAE))
>          /* explicitly disable encryption for this frame */
>
>
> in its .ndo_start_xmit, to avoid the race condition vs key installation. But
> then it implements .tx_control_port and respects the noencrypt flag (but doesn't
> flush its TX queues in .add_key).

FWIW, brcmfmac has brcmf_netdev_wait_pend8021x() that gets called from
.add_key and .del_key. It does nothing else than just waiting for zero
ETH_P_PAE packets queued. In .ndo_start_xmit it keeps counting queued
ETH_P_PAE packets.

It doesn't do any encryption/.tx_control_port trickery though, so I
assume it's all safe.
Brendan Jackman Aug. 27, 2019, 8:19 a.m. UTC | #7
On 27/08/2019 15:10, Rafał Miłecki wrote:
 > On Tue, 27 Aug 2019 at 09:52, Brendan Jackman
 > <brendan.jackman@bluwireless.com> wrote:
 >> On 27/08/2019 14:34, Johannes Berg wrote:
 >>  >> On the other hand, this now makes me worry that there could be other drivers out
 >>  >> there that implement .tx_control_port, but do not take advantage of it to avoid
 >>  >> the key installation race... in that case this patch could break handshakes for
 >>  >> those drivers. On the other-other hand I don't know why you'd implement
 >>  >> .tx_control_port except for this very reason.
 >>  > I'm not sure how it'd break anything, it'd just behave as before? But
 >>  > this is new, nobody really implements it yet. So I don't think we have
 >>  > to worry about that even if I'm not sure I understand your concern.
 >>
 >> Well my fear was that there could be a driver that relies on something like:
 >>
 >>      if (skb->protocol == htons(ETH_P_PAE))
 >>          /* explicitly disable encryption for this frame */
 >>
 >>
 >> in its .ndo_start_xmit, to avoid the race condition vs key installation. But
 >> then it implements .tx_control_port and respects the noencrypt flag (but doesn't
 >> flush its TX queues in .add_key).
 >
 > FWIW, brcmfmac has brcmf_netdev_wait_pend8021x() that gets called from
 > .add_key and .del_key. It does nothing else than just waiting for zero
 > ETH_P_PAE packets queued. In .ndo_start_xmit it keeps counting queued
 > ETH_P_PAE packets.
 >
 > It doesn't do any encryption/.tx_control_port trickery though, so I

 > assume it's all safe.

Ah, that's interesting. I actually considered implementing such a trick in our
driver, but I could not convince myself it was a complete solution, because the
EAPOL frame might not be in the ring before .add_key. (Maybe I am wrong about
that).
Johannes Berg Aug. 27, 2019, 8:56 a.m. UTC | #8
Hi,

> We don't flush the queues when we get the EAPOL packet, we flush the queue when
> we get the .add_key cfg80211 call, so we can be sure that the EAPOL packet has
> already been encapsulated by the MAC before we install the new key.
> 
> We can already do this ring flush in .add_key, but we don't really know for
> certain that the handshake packet has reached the driver's ring yet (at least, I
> do not understand the kernel's network stack well enough to convince myself that
> it must have). With .tx_control_port, we know for sure that it has.
> 
> We could do some insane hacks with existing information, like "we've seen an
> EAPOL frame of type XYZ, so we know an .add_key call should be on the way (or
> maybe we've already had it, because the frame got delayed somehow in the kernel
> network stack?) so we can somehow order the actioning of the .add_key vs the
> transmission of the EAPOL" but I hope you'll agree with my characterisation as
> "insane" :D

Right, that's basically what I was thinking of when I said there were
race conditions. Yes, I agree :)

> Well my fear was that there could be a driver that relies on something like:
> 
>      if (skb->protocol == htons(ETH_P_PAE))
>          /* explicitly disable encryption for this frame */
> 
> 
> in its .ndo_start_xmit, to avoid the race condition vs key installation.

That's basically every driver.

I guess you have this problem only because you can't tag that specific
frame for "no encryption" or something like that?

> But
> then it implements .tx_control_port and respects the noencrypt flag (but doesn't
> flush its TX queues in .add_key).

I don't think any implement it now, really. I'm also not sure why the
flush queues should be important.

Ohh. You're thinking of the *other* rekey race now, because of the reuse
of the key index ... but as you said, that one is fundamentally
unsolvable, we have to get the extended key ID support for that.


So I guess I think this is fine. I do think somebody will eventually
need the ability to *NOT* set the NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT
flag in the future, but I guess they can worry about that.

johannes
Johannes Berg Aug. 27, 2019, 8:56 a.m. UTC | #9
On Tue, 2019-08-27 at 06:54 +0000, Brendan Jackman wrote:
>  > Shouldn't
>  > NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT get set here at least in some
>  > cases?
> 
> Well actually my goal is strictly to _disable_ encryption for this frame.

Wait though - that means you *do* need to (always) set
NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT.

Your driver might not care right now, but others will, and that might
end up a problem if you don't set it, no?

johannes
Brendan Jackman Aug. 27, 2019, 9:01 a.m. UTC | #10
On 27/08/2019 15:56, Johannes Berg wrote:
 > On Tue, 2019-08-27 at 06:54 +0000, Brendan Jackman wrote:
 >>  > Shouldn't
 >>  > NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT get set here at least in some
 >>  > cases?
 >>
 >> Well actually my goal is strictly to _disable_ encryption for this frame.
 > Wait though - that means you *do* need to (always) set
 > NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT.
 >
 > Your driver might not care right now, but others will, and that might
 > end up a problem if you don't set it, no?
 >

Agh! I meant to say _enable_ encryption. i.e. "disable" the bit of code that
"disables" encryption for the frame specficially. Sorry!
Johannes Berg Aug. 27, 2019, 9:02 a.m. UTC | #11
On Tue, 2019-08-27 at 09:01 +0000, Brendan Jackman wrote:
> On 27/08/2019 15:56, Johannes Berg wrote:
>  > On Tue, 2019-08-27 at 06:54 +0000, Brendan Jackman wrote:
>  >>  > Shouldn't
>  >>  > NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT get set here at least in some
>  >>  > cases?
>  >>
>  >> Well actually my goal is strictly to _disable_ encryption for this frame.
>  > Wait though - that means you *do* need to (always) set
>  > NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT.
>  >
>  > Your driver might not care right now, but others will, and that might
>  > end up a problem if you don't set it, no?
>  >
> 
> Agh! I meant to say _enable_ encryption. i.e. "disable" the bit of code that
> "disables" encryption for the frame specficially. Sorry!

Hmm, but why? 802.11 explicitly says EAPOL frames are _not_ to be
encrypted, IIRC?

johannes
Johannes Berg Aug. 27, 2019, 9:05 a.m. UTC | #12
On Tue, 2019-08-27 at 11:02 +0200, Johannes Berg wrote:
> 
> > Agh! I meant to say _enable_ encryption. i.e. "disable" the bit of code that
> > "disables" encryption for the frame specficially. Sorry!
> 
> Hmm, but why? 802.11 explicitly says EAPOL frames are _not_ to be
> encrypted, IIRC?

No, I'm confused now, let me think about this. There's something with
WPA1 and WPA2 as well, I think? wpa_s only sets
NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT for non-WPA though ...

Or maybe WAPI was the one that said it should *not* be encrypted.

johannes
Brendan Jackman Aug. 27, 2019, 9:07 a.m. UTC | #13
On 27/08/2019 16:05, Johannes Berg wrote:
> On Tue, 2019-08-27 at 11:02 +0200, Johannes Berg wrote:
>>> Agh! I meant to say _enable_ encryption. i.e. "disable" the bit of code that
>>> "disables" encryption for the frame specficially. Sorry!
>> Hmm, but why? 802.11 explicitly says EAPOL frames are _not_ to be
>> encrypted, IIRC?
> No, I'm confused now, let me think about this. There's something with
> WPA1 and WPA2 as well, I think? wpa_s only sets
> NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT for non-WPA though ...
>
> Or maybe WAPI was the one that said it should *not* be encrypted.
Some quotes from 802.11-2016...

12.6.9:

     The MAC does not distinguish between MSDUs for the Controlled Port, and
     MSDUs for the Uncontrolled Port. In other words, EAPOL-Start frames and
     EAPOL-Key frames are encrypted only after invocation of the
     MLME-SETPROTECTION.request primitive.


4.10.3.2

     Installing the PTK [...] causes the MAC to encrypt and decrypt all
     subsequent MSDUs irrespective of their path through the controlled or
     uncontrolled ports

My understanding is that the only reason to explicitly disable encryption for
EAPoL is to workaround the race conditions.
Johannes Berg Aug. 27, 2019, 9:10 a.m. UTC | #14
On Tue, 2019-08-27 at 09:07 +0000, Brendan Jackman wrote:
> 
> My understanding is that the only reason to explicitly disable encryption for
> EAPoL is to workaround the race conditions.

Oh. Yes, maybe we wanted to be able to do something like

 1) install key
 2) send EAPOL frame(s)

in that order, but still send the frames unencrypted even though the key
is already available.

But if we actually were to do that, would it break your scheme?

johannes
Brendan Jackman Aug. 27, 2019, 9:38 a.m. UTC | #15
On 27/08/2019 16:10, Johannes Berg wrote:
 > On Tue, 2019-08-27 at 09:07 +0000, Brendan Jackman wrote:
 >>
 >> My understanding is that the only reason to explicitly disable encryption for
 >> EAPoL is to workaround the race conditions.
 >
 > Oh. Yes, maybe we wanted to be able to do something like
 >
 >  1) install key
 >  2) send EAPOL frame(s)
 >
 > in that order, but still send the frames unencrypted even though the key
 > is already available.
 >
 > But if we actually were to do that, would it break your scheme?
 >
 > johannes
 >

I'm a little confused now so let me lay my scheme out really verbosely and maybe
I will answer your question...

The correct sequence for the initial key setup is:

1. Complete 4-way handshake
2. Install PTK

The problem is that one of those processes takes place in the data plane and the
other in the control plane. So we don't really know that part 1 is complete
before we do part 2. So 4/4 can get encrypted with a key that the peer has not
installed.

As you mentioned basically all drivers (including ours) work around this by just
specifically saying that the packets in part 1 are not encrypted even if there
is a key (I'll call this NOCRYPT). So even if part 2 interferes and the key gets
installed before 4/4 goes out, it won't get encrypted.

So far so good.

However now we get to rekeying. The ideal sequence would be:

1. Complete 4-way handshake, encrypting EAPoL frames with key 0, establishing key 1
2. Install key 1

Because we specify that key 0 is used to encrypt the handshake, even if
installing key 1 races against the sending of those frames, the peer can still
decapsulate them using key 0.

But in reality we only get to use a single key (for <5.3), so the sequence is:

1. Complete 4-way handshake to establish key 0'
2. Install key 0' over the top of key 0

My problem boils down to the fact that if, in part 1, we encrypt the handshake
frames using key 0 (this is what the spec tells us to do), then if we hit the
installation race then 4/4 might get encrypted with 0'. I think up to this point
we are already on the same page...

So, what if we just don't encrypt the rekeying handshake frames either
(i.e. keep th NOCRYPT workaround going)? The problem is that the peer should
reject them according to the spec. So let's say we just have the peer allow
unencrypted frames during an RSNA as long as they are EAPoL. This is easier said
than done (at least in my case - perhaps I am a special case here?) because the
MAC layer whose job is to discard unencrypted frames has no business peering at
the ethertype or whatever. Similarly for the part where we discard GCMP replays
(unencrypted frames don't have a packet number at all!).

So in our driver we use the NOCRYPT workaround for the initial handshake, but
then _don't_ use it for the rekeying handshake (i.e. we _do_ encrypt EAPoL
frames during rekeying). But that means we are vulnerable to the race condition
and indeed we see rekeying fail if there is heavy traffic on the link.

So the idea is that by moving part 1. into the control plane to some degree
(i.e. into nl80211) we can fully complete establishment of key 0', using
handshake frames encrypted with key 0, before we install key 0' (by flushing the
TX ring in part 2). So now the NOCRYPT workaround is not necessary to avoid lost
handshake packets, neither for the initial nor for rekeying handshakes.
Johannes Berg Aug. 27, 2019, 12:11 p.m. UTC | #16
> I'm a little confused now so let me lay my scheme out really verbosely and maybe
> I will answer your question...
> 
> The correct sequence for the initial key setup is:
> 
> 1. Complete 4-way handshake
> 2. Install PTK

Yes, I think so. I'd have to check the exact installation point(s)
mentioned in the spec again though, not entirely sure they're really all
_after_ the whole handshake. Doesn't matter in practice though.

> The problem is that one of those processes takes place in the data plane and the
> other in the control plane. So we don't really know that part 1 is complete
> before we do part 2. So 4/4 can get encrypted with a key that the peer has not
> installed.

Right.

> As you mentioned basically all drivers (including ours) work around this by just
> specifically saying that the packets in part 1 are not encrypted even if there
> is a key (I'll call this NOCRYPT). So even if part 2 interferes and the key gets
> installed before 4/4 goes out, it won't get encrypted.
> 
> So far so good.

Right.

> However now we get to rekeying. The ideal sequence would be:
> 
> 1. Complete 4-way handshake, encrypting EAPoL frames with key 0, establishing key 1
> 2. Install key 1
> 
> Because we specify that key 0 is used to encrypt the handshake, even if
> installing key 1 races against the sending of those frames, the peer can still
> decapsulate them using key 0.

Sure, this is fundamentally racy/broken, as long as you don't have
everything in a single place/entity, which is unrealistic.

> But in reality we only get to use a single key (for <5.3), so the sequence is:
> 
> 1. Complete 4-way handshake to establish key 0'
> 2. Install key 0' over the top of key 0
> 
> My problem boils down to the fact that if, in part 1, we encrypt the handshake
> frames using key 0 (this is what the spec tells us to do), then if we hit the
> installation race then 4/4 might get encrypted with 0'. I think up to this point
> we are already on the same page...

I think so too.

> So, what if we just don't encrypt the rekeying handshake frames either
> (i.e. keep th NOCRYPT workaround going)? The problem is that the peer should
> reject them according to the spec. So let's say we just have the peer allow
> unencrypted frames during an RSNA as long as they are EAPoL. This is easier said
> than done (at least in my case - perhaps I am a special case here?) because the
> MAC layer whose job is to discard unencrypted frames has no business peering at
> the ethertype or whatever. Similarly for the part where we discard GCMP replays
> (unencrypted frames don't have a packet number at all!).

In mac80211, we specifically allow unencrypted EAPOL frames to be
received, because also some WPA1 APs allow them, and WAPI requires this
(IIRC), and for various other reasons. We have to look at the ethertype
anyway at some point, so not sure I agree with that part of your
statement.

In general though, yes, I think you're right. We just allow them
unencrypted for interoperability, but I'm not sure we actually transmit
them unencrypted.

> So in our driver we use the NOCRYPT workaround for the initial handshake, but
> then _don't_ use it for the rekeying handshake (i.e. we _do_ encrypt EAPoL
> frames during rekeying). But that means we are vulnerable to the race condition
> and indeed we see rekeying fail if there is heavy traffic on the link.
> 
> So the idea is that by moving part 1. into the control plane to some degree
> (i.e. into nl80211) we can fully complete establishment of key 0', using
> handshake frames encrypted with key 0, before we install key 0' (by flushing the
> TX ring in part 2). So now the NOCRYPT workaround is not necessary to avoid lost
> handshake packets, neither for the initial nor for rekeying handshakes.

Right, OK. I forgot (again) you were worried about the rekeying mostly.

Nevertheless, I think this means that we do need to have the NO_ENCRYPT
flag set to keep the *initial* handshake working, no? Or actually fix
the race conditions you pointed out above for it. While for the
rekeying, well, it basically cannot work and you just move it to the
control plane to have more control over it, which also makes sense.

johannes
Brendan Jackman Aug. 27, 2019, 12:49 p.m. UTC | #17
On 27/08/2019 19:11, Johannes Berg wrote:

 >> So, what if we just don't encrypt the rekeying handshake frames either
 >> (i.e. keep th NOCRYPT workaround going)? The problem is that the peer should
 >> reject them according to the spec. So let's say we just have the peer allow
 >> unencrypted frames during an RSNA as long as they are EAPoL. This is easier said
 >> than done (at least in my case - perhaps I am a special case here?) because the
 >> MAC layer whose job is to discard unencrypted frames has no business peering at
 >> the ethertype or whatever. Similarly for the part where we discard GCMP replays
 >> (unencrypted frames don't have a packet number at all!).
 >
 > In mac80211, we specifically allow unencrypted EAPOL frames to be
 > received, because also some WPA1 APs allow them, and WAPI requires this
 > (IIRC), and for various other reasons. We have to look at the ethertype
 > anyway at some point, so not sure I agree with that part of your
 > statement.

When I say "has no business peering" I'm not just talking about a philosophical
distaste for it - I actually mean in our current architecure the components that
discard frames for RSNA reasons don't have physical access (or it would be
costly to performance) to the ethernet header. So we'd basically need to change
those components so that where currently they cause the frame to be discarded,
they instead cause it to be passed on up and just set a flag somewhere to say
"discard that frame if it doesn't turn out to be EAPoL". Or, more accurately
"discard all the MSDUs in that MPDU that don't turn out to be EAPoL".

I must admit: perhaps we'll need to do this in the end anyway for interop
reasons. And perhaps this is a totally unique issue for our architecture which
doesn't really justify changing generic code. But anyway perhaps you can see why
I've been hesitant to do it.

 > In general though, yes, I think you're right. We just allow them
 > unencrypted for interoperability, but I'm not sure we actually transmit
 > them unencrypted.
 >
 >> So in our driver we use the NOCRYPT workaround for the initial handshake, but
 >> then _don't_ use it for the rekeying handshake (i.e. we _do_ encrypt EAPoL
 >> frames during rekeying). But that means we are vulnerable to the race condition
 >> and indeed we see rekeying fail if there is heavy traffic on the link.
 >>
 >> So the idea is that by moving part 1. into the control plane to some degree
 >> (i.e. into nl80211) we can fully complete establishment of key 0', using
 >> handshake frames encrypted with key 0, before we install key 0' (by flushing the
 >> TX ring in part 2). So now the NOCRYPT workaround is not necessary to avoid lost
 >> handshake packets, neither for the initial nor for rekeying handshakes.
 >
 > Right, OK. I forgot (again) you were worried about the rekeying mostly.
 >
 > Nevertheless, I think this means that we do need to have the NO_ENCRYPT
 > flag set to keep the *initial* handshake working, no? Or actually fix
 > the race conditions you pointed out above for it. While for the
 > rekeying, well, it basically cannot work and you just move it to the
 > control plane to have more control over it, which also makes sense.

.tx_control_port + a TX ring flush in .add_key should fix both the initial and
rekeying handshake without any need for NO_ENCRYPT. The handshake frames will
always go out before the key (re)installation takes place, which means:

- The handshake frames for the initial key setup will all go out unencrypted
   (because the MAC has no keys).

- The handshake frames for rekeying will all go out encrypted with the old key,
   before the new key gets installed. (If there is data flowing, some will
   probably be lost at this point, but the EAPoL frames will be fine which is the
   most important thing).

I should also say that with Alexander Wetzels' work on Extended Key ID support
going on as it is, my patch here would only end up really mattering for users of
kernels between v4.19 and v5.3 (I think). So yeah, if you do not think my
justifications are strong enough, I would understand.

Cheers,
Brendan
Alexander Wetzel Aug. 27, 2019, 6:50 p.m. UTC | #18
On 8/27/19 2:49 PM, Brendan Jackman wrote:
> 
> On 27/08/2019 19:11, Johannes Berg wrote:
> 
>   >> So, what if we just don't encrypt the rekeying handshake frames either
>   >> (i.e. keep th NOCRYPT workaround going)? The problem is that the peer should
>   >> reject them according to the spec. So let's say we just have the peer allow
>   >> unencrypted frames during an RSNA as long as they are EAPoL. This is easier said
>   >> than done (at least in my case - perhaps I am a special case here?) because the
>   >> MAC layer whose job is to discard unencrypted frames has no business peering at
>   >> the ethertype or whatever. Similarly for the part where we discard GCMP replays
>   >> (unencrypted frames don't have a packet number at all!).
>   >
>   > In mac80211, we specifically allow unencrypted EAPOL frames to be
>   > received, because also some WPA1 APs allow them, and WAPI requires this
>   > (IIRC), and for various other reasons. We have to look at the ethertype
>   > anyway at some point, so not sure I agree with that part of your
>   > statement.
> 

I've never seen unencrypted eapol frames with mac80211 or any other 
card. And I don't think disabling encryption for eapol during rekey can 
work. It would probably just cause even more issues when (trying) to 
rekey a connection.
I see why accepting unencrypted eapol frames at the initial connect 
makes sense. But for rekeying - regardless of the technical challenges - 
we would need some new RSN feature flag and only do that when both 
support it.

> When I say "has no business peering" I'm not just talking about a philosophical
> distaste for it - I actually mean in our current architecure the components that
> discard frames for RSNA reasons don't have physical access (or it would be
> costly to performance) to the ethernet header. So we'd basically need to change
> those components so that where currently they cause the frame to be discarded,
> they instead cause it to be passed on up and just set a flag somewhere to say
> "discard that frame if it doesn't turn out to be EAPoL". Or, more accurately
> "discard all the MSDUs in that MPDU that don't turn out to be EAPoL".
> 
> I must admit: perhaps we'll need to do this in the end anyway for interop
> reasons. And perhaps this is a totally unique issue for our architecture which
> doesn't really justify changing generic code. But anyway perhaps you can see why
> I've been hesitant to do it.
> 
>   > In general though, yes, I think you're right. We just allow them
>   > unencrypted for interoperability, but I'm not sure we actually transmit
>   > them unencrypted.
>   >
>   >> So in our driver we use the NOCRYPT workaround for the initial handshake, but
>   >> then _don't_ use it for the rekeying handshake (i.e. we _do_ encrypt EAPoL
>   >> frames during rekeying). But that means we are vulnerable to the race condition
>   >> and indeed we see rekeying fail if there is heavy traffic on the link.
>   >>

All this nowwcrypto talk may miss the point: This is a workaround 
working at the initial connect but that was it. (Looks like 90% of the 
cards/drivers are not looking behind the initial connect and mess up the 
rekey. So far I found only two devices which can rekey correctly...) 
Assuming the driver is supporting Extended Key ID but we can't use it we 
could achieve the same by installing the key prior to sending out eapol 
#4 RX_ONLY and wait till the frame is acknowledged to allow the key to 
be used for TX. But it has the same issue as the no-encrypt idea: When 
we rekey it's not working.

The PTK0 patch in mac80211 attempts to solve the issue in the following way:

Normally there are next no buffers in the tx path. When you hand over a 
frame to the kernel it's - according what I remember when investigating 
the issue - an atomic operation till the skb reaches mac80211 or 
whatever other driver we use. And the few "buffers" we have are fully 
driver controlled and we basically can just make sure those are flushed 
when we replace the key. The only potential issue I see are traffic 
shapers but then I think I eapol frames bypass them already.
Of course using the control port for the frames would be nicer. It 
should not be needed, through.

So mac80211 just makes sure that once the key replacement code is 
running any new frames handed over to it are dropped. The drivers then 
have to make sure they flush all queues, making sure the eapol #4 frame 
is also send out prior when it's instructed to delete an unicast key.

The key deletion therefore serves as sync. When mac80211 set_key call 
returns we know all frames queued to us prior to the call from user 
space to replace the key have been send.

The price of that are some dropped frames (which also could be queued) 
which would likely have been caught by the key mismatch between the STAs 
anyhow.

That same procedure should also work for you, or I must have a serious 
flaw in my understanding. And has the additional benefit to also work 
correctly when mac80211 generates the PN (IV) and not the card. Down 
sides are the dropped frames and the needed queue flush.

>   >> So the idea is that by moving part 1. into the control plane to some degree
>   >> (i.e. into nl80211) we can fully complete establishment of key 0', using
>   >> handshake frames encrypted with key 0, before we install key 0' (by flushing the
>   >> TX ring in part 2). So now the NOCRYPT workaround is not necessary to avoid lost
>   >> handshake packets, neither for the initial nor for rekeying handshakes.
>   >
>   > Right, OK. I forgot (again) you were worried about the rekeying mostly.
>   >
>   > Nevertheless, I think this means that we do need to have the NO_ENCRYPT
>   > flag set to keep the *initial* handshake working, no? Or actually fix
>   > the race conditions you pointed out above for it. While for the
>   > rekeying, well, it basically cannot work and you just move it to the
>   > control plane to have more control over it, which also makes sense.
> 
> .tx_control_port + a TX ring flush in .add_key should fix both the initial and
> rekeying handshake without any need for NO_ENCRYPT. The handshake frames will
> always go out before the key (re)installation takes place, which means:
> 
> - The handshake frames for the initial key setup will all go out unencrypted
>     (because the MAC has no keys).
> 
> - The handshake frames for rekeying will all go out encrypted with the old key,
>     before the new key gets installed. (If there is data flowing, some will
>     probably be lost at this point, but the EAPoL frames will be fine which is the
>     most important thing).
> > I should also say that with Alexander Wetzels' work on Extended Key 
ID support
> going on as it is, my patch here would only end up really mattering for users of
> kernels between v4.19 and v5.3 (I think). So yeah, if you do not think my
> justifications are strong enough, I would understand.

I think that's way too optimistic. Extended Key ID will not play any 
meaningful role for probably decades, maybe even never.
Extended Key ID should have been the only way from the beginning, it's 
now quite late to force everyone to switch.
At the moment this is only something which can work in a careful 
controlled environment, regardless which kernel you are using. This may 
be something to make mandatory in WPA4 or so. But since it will force 
many vendors to redesign their chips I don't hold my breath...

> 
> Cheers,
> Brendan

Alexander
Brendan Jackman Aug. 28, 2019, 5:22 a.m. UTC | #19
On 28/08/2019 01:50, Alexander Wetzel wrote:
 > On 8/27/19 2:49 PM, Brendan Jackman wrote:
 >>
 >> On 27/08/2019 19:11, Johannes Berg wrote:
 >>
 >>   >> So, what if we just don't encrypt the rekeying handshake frames either
 >>   >> (i.e. keep th NOCRYPT workaround going)? The problem is that the peer should
 >>   >> reject them according to the spec. So let's say we just have the peer allow
 >>   >> unencrypted frames during an RSNA as long as they are EAPoL. This is easier said
 >>   >> than done (at least in my case - perhaps I am a special case here?) because the
 >>   >> MAC layer whose job is to discard unencrypted frames has no business peering at
 >>   >> the ethertype or whatever. Similarly for the part where we discard GCMP replays
 >>   >> (unencrypted frames don't have a packet number at all!).
 >>   >
 >>   > In mac80211, we specifically allow unencrypted EAPOL frames to be
 >>   > received, because also some WPA1 APs allow them, and WAPI requires this
 >>   > (IIRC), and for various other reasons. We have to look at the ethertype
 >>   > anyway at some point, so not sure I agree with that part of your
 >>   > statement.
 >>
 >
 > I've never seen unencrypted eapol frames with mac80211 or any other card. And
 > I don't think disabling encryption for eapol during rekey can work. It would
 > probably just cause even more issues when (trying) to rekey a connection.  I
 > see why accepting unencrypted eapol frames at the initial connect makes
 > sense. But for rekeying - regardless of the technical challenges - we would
 > need some new RSN feature flag and only do that when both support it.
 >
 >> When I say "has no business peering" I'm not just talking about a philosophical
 >> distaste for it - I actually mean in our current architecure the components that
 >> discard frames for RSNA reasons don't have physical access (or it would be
 >> costly to performance) to the ethernet header. So we'd basically need to change
 >> those components so that where currently they cause the frame to be discarded,
 >> they instead cause it to be passed on up and just set a flag somewhere to say
 >> "discard that frame if it doesn't turn out to be EAPoL". Or, more accurately
 >> "discard all the MSDUs in that MPDU that don't turn out to be EAPoL".
 >>
 >> I must admit: perhaps we'll need to do this in the end anyway for interop
 >> reasons. And perhaps this is a totally unique issue for our architecture which
 >> doesn't really justify changing generic code. But anyway perhaps you can see why
 >> I've been hesitant to do it.
 >>
 >>   > In general though, yes, I think you're right. We just allow them
 >>   > unencrypted for interoperability, but I'm not sure we actually transmit
 >>   > them unencrypted.
 >>   >
 >>   >> So in our driver we use the NOCRYPT workaround for the initial handshake, but
 >>   >> then _don't_ use it for the rekeying handshake (i.e. we _do_ encrypt EAPoL
 >>   >> frames during rekeying). But that means we are vulnerable to the race condition
 >>   >> and indeed we see rekeying fail if there is heavy traffic on the link.
 >>   >>
 >
 > All this nowwcrypto talk may miss the point: This is a workaround working at
 > the initial connect but that was it. (Looks like 90% of the cards/drivers are
 > not looking behind the initial connect and mess up the rekey. So far I found
 > only two devices which can rekey correctly...) Assuming the driver is
 > supporting Extended Key ID but we can't use it we could achieve the same by
 > installing the key prior to sending out eapol #4 RX_ONLY and wait till the
 > frame is acknowledged to allow the key to be used for TX. But it has the same
 > issue as the no-encrypt idea: When we rekey it's not working.
 >
 > The PTK0 patch in mac80211 attempts to solve the issue in the following way:
 >
 > Normally there are next no buffers in the tx path. When you hand over a frame
 > to the kernel it's - according what I remember when investigating the issue -
 > an atomic operation till the skb reaches mac80211 or whatever other driver we
 > use. And the few "buffers" we have are fully driver controlled and we
 > basically can just make sure those are flushed when we replace the key. The
 > only potential issue I see are traffic shapers but then I think I eapol frames
 > bypass them already.  Of course using the control port for the frames would be
 > nicer. It should not be needed, through.
 >
 > So mac80211 just makes sure that once the key replacement code is running any
 > new frames handed over to it are dropped. The drivers then have to make sure
 > they flush all queues, making sure the eapol #4 frame is also send out prior
 > when it's instructed to delete an unicast key.
 >
 > The key deletion therefore serves as sync. When mac80211 set_key call returns
 > we know all frames queued to us prior to the call from user space to replace
 > the key have been send.
 >
 > The price of that are some dropped frames (which also could be queued) which
 > would likely have been caught by the key mismatch between the STAs anyhow.
 >
 > That same procedure should also work for you, or I must have a serious flaw in
 > my understanding. And has the additional benefit to also work correctly when
 > mac80211 generates the PN (IV) and not the card. Down sides are the dropped
 > frames and the needed queue flush.

This is interesting. To make sure I understand you: you are saying that my fear
of the .add_key call coming before the .ndo_start_xmit for frame 4/4 is
unfounded, 4/4 will always reach the driver before the .add_key call? So just
ensuring all frames queued before .add_key are sent before key
installation/replacement should fix the problem, without any need to change
userspace.

(I actually tried this a few months ago, and indeed I found that just flushing
  the TX rings in .add_key made rekeying work. But seeing it happen in my testing
  did not convince me that it would work in every situation)

Do you consider this is Linux net behaviour that can be relied upon or just
happenstance of today's implementation (maybe because of all the bufferbloat
work)? Perhaps this hostap patch is still not a bad idea, even if the current
kernel behaviour means that it is not strictly necessary?

 >>   >> So the idea is that by moving part 1. into the control plane to some degree
 >>   >> (i.e. into nl80211) we can fully complete establishment of key 0', using
 >>   >> handshake frames encrypted with key 0, before we install key 0' (by flushing the
 >>   >> TX ring in part 2). So now the NOCRYPT workaround is not necessary to avoid lost
 >>   >> handshake packets, neither for the initial nor for rekeying handshakes.
 >>   >
 >>   > Right, OK. I forgot (again) you were worried about the rekeying mostly.
 >>   >
 >>   > Nevertheless, I think this means that we do need to have the NO_ENCRYPT
 >>   > flag set to keep the *initial* handshake working, no? Or actually fix
 >>   > the race conditions you pointed out above for it. While for the
 >>   > rekeying, well, it basically cannot work and you just move it to the
 >>   > control plane to have more control over it, which also makes sense.
 >>
 >> .tx_control_port + a TX ring flush in .add_key should fix both the initial and
 >> rekeying handshake without any need for NO_ENCRYPT. The handshake frames will
 >> always go out before the key (re)installation takes place, which means:
 >>
 >> - The handshake frames for the initial key setup will all go out unencrypted
 >>     (because the MAC has no keys).
 >>
 >> - The handshake frames for rekeying will all go out encrypted with the old key,
 >>     before the new key gets installed. (If there is data flowing, some will
 >>     probably be lost at this point, but the EAPoL frames will be fine which is the
 >>     most important thing).
 >> > I should also say that with Alexander Wetzels' work on Extended Key
 > ID support
 >> going on as it is, my patch here would only end up really mattering for users of
 >> kernels between v4.19 and v5.3 (I think). So yeah, if you do not think my
 >> justifications are strong enough, I would understand.
 >
 > I think that's way too optimistic. Extended Key ID will not play any
 > meaningful role for probably decades, maybe even never. Extended Key ID
 > should have been the only way from the beginning, it's now quite late to force
 > everyone to switch.  At the moment this is only something which can work in a
 > careful controlled environment, regardless which kernel you are using. This
 > may be something to make mandatory in WPA4 or so. But since it will force many
 > vendors to redesign their chips I don't hold my breath...

Hmm, that's a good point, I hadn't thought of it this way because our device
supports multiple key IDs, and current use cases tend to be our devices talking
to each other.

Thanks for the input,
Brendan
Krishna Chaitanya Aug. 28, 2019, 7:42 a.m. UTC | #20
On Tue, Aug 27, 2019 at 10:23 PM Alexander Wetzel
<alexander@wetzel-home.de> wrote:
>
> On 8/27/19 2:49 PM, Brendan Jackman wrote:
> >
> > On 27/08/2019 19:11, Johannes Berg wrote:
> >
> >   >> So, what if we just don't encrypt the rekeying handshake frames either
> >   >> (i.e. keep th NOCRYPT workaround going)? The problem is that the peer should
> >   >> reject them according to the spec. So let's say we just have the peer allow
> >   >> unencrypted frames during an RSNA as long as they are EAPoL. This is easier said
> >   >> than done (at least in my case - perhaps I am a special case here?) because the
> >   >> MAC layer whose job is to discard unencrypted frames has no business peering at
> >   >> the ethertype or whatever. Similarly for the part where we discard GCMP replays
> >   >> (unencrypted frames don't have a packet number at all!).
> >   >
> >   > In mac80211, we specifically allow unencrypted EAPOL frames to be
> >   > received, because also some WPA1 APs allow them, and WAPI requires this
> >   > (IIRC), and for various other reasons. We have to look at the ethertype
> >   > anyway at some point, so not sure I agree with that part of your
> >   > statement.
> >
>
> I've never seen unencrypted eapol frames with mac80211 or any other
> card. And I don't think disabling encryption for eapol during rekey can
> work. It would probably just cause even more issues when (trying) to
> rekey a connection.
> I see why accepting unencrypted eapol frames at the initial connect
> makes sense. But for rekeying - regardless of the technical challenges -
> we would need some new RSN feature flag and only do that when both
> support it.
>
> > When I say "has no business peering" I'm not just talking about a philosophical
> > distaste for it - I actually mean in our current architecure the components that
> > discard frames for RSNA reasons don't have physical access (or it would be
> > costly to performance) to the ethernet header. So we'd basically need to change
> > those components so that where currently they cause the frame to be discarded,
> > they instead cause it to be passed on up and just set a flag somewhere to say
> > "discard that frame if it doesn't turn out to be EAPoL". Or, more accurately
> > "discard all the MSDUs in that MPDU that don't turn out to be EAPoL".
> >
> > I must admit: perhaps we'll need to do this in the end anyway for interop
> > reasons. And perhaps this is a totally unique issue for our architecture which
> > doesn't really justify changing generic code. But anyway perhaps you can see why
> > I've been hesitant to do it.
> >
> >   > In general though, yes, I think you're right. We just allow them
> >   > unencrypted for interoperability, but I'm not sure we actually transmit
> >   > them unencrypted.
> >   >
> >   >> So in our driver we use the NOCRYPT workaround for the initial handshake, but
> >   >> then _don't_ use it for the rekeying handshake (i.e. we _do_ encrypt EAPoL
> >   >> frames during rekeying). But that means we are vulnerable to the race condition
> >   >> and indeed we see rekeying fail if there is heavy traffic on the link.
> >   >>
>
> All this nowwcrypto talk may miss the point: This is a workaround
> working at the initial connect but that was it. (Looks like 90% of the
> cards/drivers are not looking behind the initial connect and mess up the
> rekey. So far I found only two devices which can rekey correctly...)
> Assuming the driver is supporting Extended Key ID but we can't use it we
> could achieve the same by installing the key prior to sending out eapol
> #4 RX_ONLY and wait till the frame is acknowledged to allow the key to
> be used for TX. But it has the same issue as the no-encrypt idea: When
> we rekey it's not working.
>
> The PTK0 patch in mac80211 attempts to solve the issue in the following way:
>
> Normally there are next no buffers in the tx path. When you hand over a
> frame to the kernel it's - according what I remember when investigating
> the issue - an atomic operation till the skb reaches mac80211 or
> whatever other driver we use. And the few "buffers" we have are fully
> driver controlled and we basically can just make sure those are flushed
> when we replace the key. The only potential issue I see are traffic
> shapers but then I think I eapol frames bypass them already.
the issue of race b/w set_key and send_m4 probably caused the delays in
socket_queues/QDISC for M4, can you please elaborate on how EAPoL
bypasses QDISC?
I don't see PACKET_QDISC_BYPASS being set for l2_packet_send.
> Of course using the control port for the frames would be nicer. It
> should not be needed, through.
>
> So mac80211 just makes sure that once the key replacement code is
> running any new frames handed over to it are dropped. The drivers then
> have to make sure they flush all queues, making sure the eapol #4 frame
> is also send out prior when it's instructed to delete an unicast key.
>
> The key deletion therefore serves as sync. When mac80211 set_key call
> returns we know all frames queued to us prior to the call from user
> space to replace the key have been send.
>
> The price of that are some dropped frames (which also could be queued)
> which would likely have been caught by the key mismatch between the STAs
> anyhow.
>
> That same procedure should also work for you, or I must have a serious
> flaw in my understanding. And has the additional benefit to also work
> correctly when mac80211 generates the PN (IV) and not the card. Down
> sides are the dropped frames and the needed queue flush.
>
> >   >> So the idea is that by moving part 1. into the control plane to some degree
> >   >> (i.e. into nl80211) we can fully complete establishment of key 0', using
> >   >> handshake frames encrypted with key 0, before we install key 0' (by flushing the
> >   >> TX ring in part 2). So now the NOCRYPT workaround is not necessary to avoid lost
> >   >> handshake packets, neither for the initial nor for rekeying handshakes.
> >   >
> >   > Right, OK. I forgot (again) you were worried about the rekeying mostly.
> >   >
> >   > Nevertheless, I think this means that we do need to have the NO_ENCRYPT
> >   > flag set to keep the *initial* handshake working, no? Or actually fix
> >   > the race conditions you pointed out above for it. While for the
> >   > rekeying, well, it basically cannot work and you just move it to the
> >   > control plane to have more control over it, which also makes sense.
> >
> > .tx_control_port + a TX ring flush in .add_key should fix both the initial and
> > rekeying handshake without any need for NO_ENCRYPT. The handshake frames will
> > always go out before the key (re)installation takes place, which means:
> >
> > - The handshake frames for the initial key setup will all go out unencrypted
> >     (because the MAC has no keys).
> >
> > - The handshake frames for rekeying will all go out encrypted with the old key,
> >     before the new key gets installed. (If there is data flowing, some will
> >     probably be lost at this point, but the EAPoL frames will be fine which is the
> >     most important thing).
> > > I should also say that with Alexander Wetzels' work on Extended Key
> ID support
> > going on as it is, my patch here would only end up really mattering for users of
> > kernels between v4.19 and v5.3 (I think). So yeah, if you do not think my
> > justifications are strong enough, I would understand.
>
> I think that's way too optimistic. Extended Key ID will not play any
> meaningful role for probably decades, maybe even never.
> Extended Key ID should have been the only way from the beginning, it's
> now quite late to force everyone to switch.
> At the moment this is only something which can work in a careful
> controlled environment, regardless which kernel you are using. This may
> be something to make mandatory in WPA4 or so. But since it will force
> many vendors to redesign their chips I don't hold my breath...
>
> >
> > Cheers,
> > Brendan
>
> Alexander
>
> _______________________________________________
> Hostap mailing list
> Hostap@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/hostap
Alexander Wetzel Aug. 30, 2019, 9:42 p.m. UTC | #21
Am 28.08.19 um 07:22 schrieb Brendan Jackman:
> 
> On 28/08/2019 01:50, Alexander Wetzel wrote:
>   > On 8/27/19 2:49 PM, Brendan Jackman wrote:
>   >>
>   >> On 27/08/2019 19:11, Johannes Berg wrote:
>   >>
>   >>   >> So, what if we just don't encrypt the rekeying handshake frames either
>   >>   >> (i.e. keep th NOCRYPT workaround going)? The problem is that the peer should
>   >>   >> reject them according to the spec. So let's say we just have the peer allow
>   >>   >> unencrypted frames during an RSNA as long as they are EAPoL. This is easier said
>   >>   >> than done (at least in my case - perhaps I am a special case here?) because the
>   >>   >> MAC layer whose job is to discard unencrypted frames has no business peering at
>   >>   >> the ethertype or whatever. Similarly for the part where we discard GCMP replays
>   >>   >> (unencrypted frames don't have a packet number at all!).
>   >>   >
>   >>   > In mac80211, we specifically allow unencrypted EAPOL frames to be
>   >>   > received, because also some WPA1 APs allow them, and WAPI requires this
>   >>   > (IIRC), and for various other reasons. We have to look at the ethertype
>   >>   > anyway at some point, so not sure I agree with that part of your
>   >>   > statement.
>   >>
>   >
>   > I've never seen unencrypted eapol frames with mac80211 or any other card. And
>   > I don't think disabling encryption for eapol during rekey can work. It would
>   > probably just cause even more issues when (trying) to rekey a connection.  I
>   > see why accepting unencrypted eapol frames at the initial connect makes
>   > sense. But for rekeying - regardless of the technical challenges - we would
>   > need some new RSN feature flag and only do that when both support it.
>   >
>   >> When I say "has no business peering" I'm not just talking about a philosophical
>   >> distaste for it - I actually mean in our current architecure the components that
>   >> discard frames for RSNA reasons don't have physical access (or it would be
>   >> costly to performance) to the ethernet header. So we'd basically need to change
>   >> those components so that where currently they cause the frame to be discarded,
>   >> they instead cause it to be passed on up and just set a flag somewhere to say
>   >> "discard that frame if it doesn't turn out to be EAPoL". Or, more accurately
>   >> "discard all the MSDUs in that MPDU that don't turn out to be EAPoL".
>   >>
>   >> I must admit: perhaps we'll need to do this in the end anyway for interop
>   >> reasons. And perhaps this is a totally unique issue for our architecture which
>   >> doesn't really justify changing generic code. But anyway perhaps you can see why
>   >> I've been hesitant to do it.
>   >>
>   >>   > In general though, yes, I think you're right. We just allow them
>   >>   > unencrypted for interoperability, but I'm not sure we actually transmit
>   >>   > them unencrypted.
>   >>   >
>   >>   >> So in our driver we use the NOCRYPT workaround for the initial handshake, but
>   >>   >> then _don't_ use it for the rekeying handshake (i.e. we _do_ encrypt EAPoL
>   >>   >> frames during rekeying). But that means we are vulnerable to the race condition
>   >>   >> and indeed we see rekeying fail if there is heavy traffic on the link.
>   >>   >>
>   >
>   > All this nowwcrypto talk may miss the point: This is a workaround working at
>   > the initial connect but that was it. (Looks like 90% of the cards/drivers are
>   > not looking behind the initial connect and mess up the rekey. So far I found
>   > only two devices which can rekey correctly...) Assuming the driver is
>   > supporting Extended Key ID but we can't use it we could achieve the same by
>   > installing the key prior to sending out eapol #4 RX_ONLY and wait till the
>   > frame is acknowledged to allow the key to be used for TX. But it has the same
>   > issue as the no-encrypt idea: When we rekey it's not working.
>   >
>   > The PTK0 patch in mac80211 attempts to solve the issue in the following way:
>   >
>   > Normally there are next no buffers in the tx path. When you hand over a frame
>   > to the kernel it's - according what I remember when investigating the issue -
>   > an atomic operation till the skb reaches mac80211 or whatever other driver we
>   > use. And the few "buffers" we have are fully driver controlled and we
>   > basically can just make sure those are flushed when we replace the key. The
>   > only potential issue I see are traffic shapers but then I think I eapol frames
>   > bypass them already.  Of course using the control port for the frames would be
>   > nicer. It should not be needed, through.
>   >
>   > So mac80211 just makes sure that once the key replacement code is running any
>   > new frames handed over to it are dropped. The drivers then have to make sure
>   > they flush all queues, making sure the eapol #4 frame is also send out prior
>   > when it's instructed to delete an unicast key.
>   >
>   > The key deletion therefore serves as sync. When mac80211 set_key call returns
>   > we know all frames queued to us prior to the call from user space to replace
>   > the key have been send.
>   >
>   > The price of that are some dropped frames (which also could be queued) which
>   > would likely have been caught by the key mismatch between the STAs anyhow.
>   >
>   > That same procedure should also work for you, or I must have a serious flaw in
>   > my understanding. And has the additional benefit to also work correctly when
>   > mac80211 generates the PN (IV) and not the card. Down sides are the dropped
>   > frames and the needed queue flush.
> 
> This is interesting. To make sure I understand you: you are saying that my fear
> of the .add_key call coming before the .ndo_start_xmit for frame 4/4 is
> unfounded, 4/4 will always reach the driver before the .add_key call? So just
> ensuring all frames queued before .add_key are sent before key
> installation/replacement should fix the problem, without any need to change
> userspace.
> 
> (I actually tried this a few months ago, and indeed I found that just flushing
>    the TX rings in .add_key made rekeying work. But seeing it happen in my testing
>    did not convince me that it would work in every situation)
> 
> Do you consider this is Linux net behaviour that can be relied upon or just
> happenstance of today's implementation (maybe because of all the bufferbloat
> work)? Perhaps this hostap patch is still not a bad idea, even if the current
> kernel behaviour means that it is not strictly necessary?

I had a look at that when working on the mac80211 PTK0 rekey patch.
Now that was just a side quest besides many and I did not spend the time 
needed to work out the exact path(s)...

I'll trying to figure it out now but I'm next to sure that the eapol 
frames go through the qdisc. Bypassing that via the control path makes 
definitely sense for me.
As of today I expect to find out that the qdisc code is the only 
dangerous point which could queue (or drop!) the eapol frames prior to 
handing it over to the driver.

> 
>   >>   >> So the idea is that by moving part 1. into the control plane to some degree
>   >>   >> (i.e. into nl80211) we can fully complete establishment of key 0', using
>   >>   >> handshake frames encrypted with key 0, before we install key 0' (by flushing the
>   >>   >> TX ring in part 2). So now the NOCRYPT workaround is not necessary to avoid lost
>   >>   >> handshake packets, neither for the initial nor for rekeying handshakes.
>   >>   >
>   >>   > Right, OK. I forgot (again) you were worried about the rekeying mostly.
>   >>   >
>   >>   > Nevertheless, I think this means that we do need to have the NO_ENCRYPT
>   >>   > flag set to keep the *initial* handshake working, no? Or actually fix
>   >>   > the race conditions you pointed out above for it. While for the
>   >>   > rekeying, well, it basically cannot work and you just move it to the
>   >>   > control plane to have more control over it, which also makes sense.
>   >>
>   >> .tx_control_port + a TX ring flush in .add_key should fix both the initial and
>   >> rekeying handshake without any need for NO_ENCRYPT. The handshake frames will
>   >> always go out before the key (re)installation takes place, which means:
>   >>
>   >> - The handshake frames for the initial key setup will all go out unencrypted
>   >>     (because the MAC has no keys).
>   >>
>   >> - The handshake frames for rekeying will all go out encrypted with the old key,
>   >>     before the new key gets installed. (If there is data flowing, some will
>   >>     probably be lost at this point, but the EAPoL frames will be fine which is the
>   >>     most important thing).
>   >> > I should also say that with Alexander Wetzels' work on Extended Key
>   > ID support
>   >> going on as it is, my patch here would only end up really mattering for users of
>   >> kernels between v4.19 and v5.3 (I think). So yeah, if you do not think my
>   >> justifications are strong enough, I would understand.
>   >
>   > I think that's way too optimistic. Extended Key ID will not play any
>   > meaningful role for probably decades, maybe even never. Extended Key ID
>   > should have been the only way from the beginning, it's now quite late to force
>   > everyone to switch.  At the moment this is only something which can work in a
>   > careful controlled environment, regardless which kernel you are using. This
>   > may be something to make mandatory in WPA4 or so. But since it will force many
>   > vendors to redesign their chips I don't hold my breath...
> 
> Hmm, that's a good point, I hadn't thought of it this way because our device
> supports multiple key IDs, and current use cases tend to be our devices talking
> to each other.
>

To real "fix" - or as close as we can get to it - I have in mind has the 
following steps:

1) Extended Key ID as ultimate solution
2) PTK0 rekey flag in drivers able to rekey correctly, empowering
3) hostapd/wpa_supplicant to make a fast disconnect/reconnect instead of 
rekeying the connection when PTK0 rekey is not supported.

Of course hostapd/wpa_supplicant will also get a config setting to force 
reconnects for any rekeys, regardless if they are initiated locally or 
by the remote STA. Not sure what we want to use as the default setting 
for that, but first we have to get there:-)

I have most of the pieces in place now and even a POC patch doing 3)
It's just not fast the rest is working. (Probably I just have to prevent 
the BSS table to get flushed when reconnecting to a AP due to a rekey 
request.)

So in the end we have three group of cards. All able to rekey somehow. 
Some just better with no impact (Extended Key ID) or quite fast (PTK0 
rekey) or at least stable if with a comparable huge dropout (reconnect 
instead re-key).

Now there is not much we can do with cards/drivers not able to rekey 
PTK0 but still (trying) to do it. This would require a update to 802.11, 
forcing all drivers able to rekey PTK0 to also set a (RSN?) flag.
With such a flag either end of the connection could "override" a rekey 
attempt with a disconnect. Hopefully followed by a more or less fast 
reconnect. (And this may well be > 10s, if my tests with wpa_supplicant).


Alexander
Alexander Wetzel Sept. 1, 2019, 10:45 a.m. UTC | #22
>> Normally there are next no buffers in the tx path. When you hand over a
>> frame to the kernel it's - according what I remember when investigating
>> the issue - an atomic operation till the skb reaches mac80211 or
>> whatever other driver we use. And the few "buffers" we have are fully
>> driver controlled and we basically can just make sure those are flushed
>> when we replace the key. The only potential issue I see are traffic
>> shapers but then I think I eapol frames bypass them already.

> the issue of race b/w set_key and send_m4 probably caused the delays in
> socket_queues/QDISC for M4, can you please elaborate on how EAPoL
> bypasses QDISC?
> I don't see PACKET_QDISC_BYPASS being set for l2_packet_send.

I had some hope that eapol packets would have a build in fast lane but 
you are right: They get added to the QDISC queue and processed like any 
other packets.

But why do we not just set PACKET_QDISC_BYPASS for eapol frames?
Just tried it and it seems to work as it should: fine.

We could allow this setsockopt() call to fail, so all kernels < 3.14 
would continue to push the eapol packets through the QDISC. But starting 
with 3.14 drivers flushing their queues prior to deleting a key can be 
sure the eapol frame has been send.

The only potential down side I can find is the - required - lack of 
buffers. The driver could therefore refuse to accept the eapol packet 
which should cause a disconnect instead a queued and later send eapol frame.
Using the control port for eapol would of course still better, but using 
PACKET_QDISC_BYPASS when this is not working looks like a good fallback 
and an improvement compared to as things are today.

Alexander
Krishna Chaitanya Sept. 1, 2019, 3:20 p.m. UTC | #23
On Sun, Sep 1, 2019 at 4:15 PM Alexander Wetzel
<alexander@wetzel-home.de> wrote:
>
>
> >> Normally there are next no buffers in the tx path. When you hand over a
> >> frame to the kernel it's - according what I remember when investigating
> >> the issue - an atomic operation till the skb reaches mac80211 or
> >> whatever other driver we use. And the few "buffers" we have are fully
> >> driver controlled and we basically can just make sure those are flushed
> >> when we replace the key. The only potential issue I see are traffic
> >> shapers but then I think I eapol frames bypass them already.
>
> > the issue of race b/w set_key and send_m4 probably caused the delays in
> > socket_queues/QDISC for M4, can you please elaborate on how EAPoL
> > bypasses QDISC?
> > I don't see PACKET_QDISC_BYPASS being set for l2_packet_send.
>
> I had some hope that eapol packets would have a build in fast lane but
> you are right: They get added to the QDISC queue and processed like any
> other packets.
>
> But why do we not just set PACKET_QDISC_BYPASS for eapol frames?
> Just tried it and it seems to work as it should: fine.
>
> We could allow this setsockopt() call to fail, so all kernels < 3.14
> would continue to push the eapol packets through the QDISC. But starting
> with 3.14 drivers flushing their queues prior to deleting a key can be
> sure the eapol frame has been send.
>
> The only potential down side I can find is the - required - lack of
> buffers. The driver could therefore refuse to accept the eapol packet
> which should cause a disconnect instead a queued and later send eapol frame.
> Using the control port for eapol would of course still better, but using
> PACKET_QDISC_BYPASS when this is not working looks like a good fallback
> and an improvement compared to as things are today.
>
I agree using that option to bypass QDISC is a definite improvment.
Krishna Chaitanya Oct. 16, 2019, 10:34 a.m. UTC | #24
On Sun, Sep 1, 2019 at 8:50 PM Krishna Chaitanya
<chaitanya.mgit@gmail.com> wrote:
>
> On Sun, Sep 1, 2019 at 4:15 PM Alexander Wetzel
> <alexander@wetzel-home.de> wrote:
> >
> >
> > >> Normally there are next no buffers in the tx path. When you hand over a
> > >> frame to the kernel it's - according what I remember when investigating
> > >> the issue - an atomic operation till the skb reaches mac80211 or
> > >> whatever other driver we use. And the few "buffers" we have are fully
> > >> driver controlled and we basically can just make sure those are flushed
> > >> when we replace the key. The only potential issue I see are traffic
> > >> shapers but then I think I eapol frames bypass them already.
> >
> > > the issue of race b/w set_key and send_m4 probably caused the delays in
> > > socket_queues/QDISC for M4, can you please elaborate on how EAPoL
> > > bypasses QDISC?
> > > I don't see PACKET_QDISC_BYPASS being set for l2_packet_send.
> >
> > I had some hope that eapol packets would have a build in fast lane but
> > you are right: They get added to the QDISC queue and processed like any
> > other packets.
> >
> > But why do we not just set PACKET_QDISC_BYPASS for eapol frames?
> > Just tried it and it seems to work as it should: fine.
> >
> > We could allow this setsockopt() call to fail, so all kernels < 3.14
> > would continue to push the eapol packets through the QDISC. But starting
> > with 3.14 drivers flushing their queues prior to deleting a key can be
> > sure the eapol frame has been send.
> >
> > The only potential down side I can find is the - required - lack of
> > buffers. The driver could therefore refuse to accept the eapol packet
> > which should cause a disconnect instead a queued and later send eapol frame.
> > Using the control port for eapol would of course still better, but using
> > PACKET_QDISC_BYPASS when this is not working looks like a good fallback
> > and an improvement compared to as things are today.
> >
> I agree using that option to bypass QDISC is a definite improvment.

To summarize: Patching hostap/wpa_s to use PACKET_QDISC_BYPASS is enough
(along with driver stalling the set_key till EAPoL is flushed) and we
don't need this patch
anymore? Please confirm.
Alexander Wetzel Oct. 18, 2019, 7:44 p.m. UTC | #25
Am 16.10.19 um 12:34 schrieb Krishna Chaitanya:
> On Sun, Sep 1, 2019 at 8:50 PM Krishna Chaitanya
> <chaitanya.mgit@gmail.com> wrote:
>>
>> On Sun, Sep 1, 2019 at 4:15 PM Alexander Wetzel
>> <alexander@wetzel-home.de> wrote:
>>>
>>>
>>>>> Normally there are next no buffers in the tx path. When you hand over a
>>>>> frame to the kernel it's - according what I remember when investigating
>>>>> the issue - an atomic operation till the skb reaches mac80211 or
>>>>> whatever other driver we use. And the few "buffers" we have are fully
>>>>> driver controlled and we basically can just make sure those are flushed
>>>>> when we replace the key. The only potential issue I see are traffic
>>>>> shapers but then I think I eapol frames bypass them already.
>>>
>>>> the issue of race b/w set_key and send_m4 probably caused the delays in
>>>> socket_queues/QDISC for M4, can you please elaborate on how EAPoL
>>>> bypasses QDISC?
>>>> I don't see PACKET_QDISC_BYPASS being set for l2_packet_send.
>>>
>>> I had some hope that eapol packets would have a build in fast lane but
>>> you are right: They get added to the QDISC queue and processed like any
>>> other packets.
>>>
>>> But why do we not just set PACKET_QDISC_BYPASS for eapol frames?
>>> Just tried it and it seems to work as it should: fine.
>>>
>>> We could allow this setsockopt() call to fail, so all kernels < 3.14
>>> would continue to push the eapol packets through the QDISC. But starting
>>> with 3.14 drivers flushing their queues prior to deleting a key can be
>>> sure the eapol frame has been send.
>>>
>>> The only potential down side I can find is the - required - lack of
>>> buffers. The driver could therefore refuse to accept the eapol packet
>>> which should cause a disconnect instead a queued and later send eapol frame.
>>> Using the control port for eapol would of course still better, but using
>>> PACKET_QDISC_BYPASS when this is not working looks like a good fallback
>>> and an improvement compared to as things are today.
>>>
>> I agree using that option to bypass QDISC is a definite improvment.
> 
> To summarize: Patching hostap/wpa_s to use PACKET_QDISC_BYPASS is enough
> (along with driver stalling the set_key till EAPoL is flushed) and we
> don't need this patch
> anymore? Please confirm.
> 
That is my understanding, yes.

But the "simple" patch (https://patchwork.ozlabs.org/patch/1161750/) to 
enable PACKET_QDISC_BYPASS in hostapd is little more than a POC and 
doing it right will be quite invasive.

And using CONTROL_PORT has some benefits we can't get otherwise. Here 
the ones I see:
- The drivers may not have to flush the queues in all constellations
- EAPOL MPDUs can be send with the basic rate
- EAPOL MPDUs will normally be a separate TID

Denis may have more, so far I only had passing contact with CONTROL_PORT.

We discussed the the different solutions recently here on linux wireless:
https://marc.info/?l=linux-wireless&m=156873433609504&w=2

In short CONTROL_PORT was designed to handle this kind of races and 
since the API is available should be the preferred solution.

Regardless of using CONTROL_PORT or queue flushes the drivers/cards will 
have to make sure they can rekey correctly. There are many pitfalls and 
nearly all drivers have some oversights here.

Alexander
Krishna Chaitanya Oct. 20, 2019, 3:43 p.m. UTC | #26
On Sat, Oct 19, 2019 at 1:14 AM Alexander Wetzel
<alexander@wetzel-home.de> wrote:
>
> Am 16.10.19 um 12:34 schrieb Krishna Chaitanya:
> > On Sun, Sep 1, 2019 at 8:50 PM Krishna Chaitanya
> > <chaitanya.mgit@gmail.com> wrote:
> >>
> >> On Sun, Sep 1, 2019 at 4:15 PM Alexander Wetzel
> >> <alexander@wetzel-home.de> wrote:
> >>>
> >>>
> >>>>> Normally there are next no buffers in the tx path. When you hand over a
> >>>>> frame to the kernel it's - according what I remember when investigating
> >>>>> the issue - an atomic operation till the skb reaches mac80211 or
> >>>>> whatever other driver we use. And the few "buffers" we have are fully
> >>>>> driver controlled and we basically can just make sure those are flushed
> >>>>> when we replace the key. The only potential issue I see are traffic
> >>>>> shapers but then I think I eapol frames bypass them already.
> >>>
> >>>> the issue of race b/w set_key and send_m4 probably caused the delays in
> >>>> socket_queues/QDISC for M4, can you please elaborate on how EAPoL
> >>>> bypasses QDISC?
> >>>> I don't see PACKET_QDISC_BYPASS being set for l2_packet_send.
> >>>
> >>> I had some hope that eapol packets would have a build in fast lane but
> >>> you are right: They get added to the QDISC queue and processed like any
> >>> other packets.
> >>>
> >>> But why do we not just set PACKET_QDISC_BYPASS for eapol frames?
> >>> Just tried it and it seems to work as it should: fine.
> >>>
> >>> We could allow this setsockopt() call to fail, so all kernels < 3.14
> >>> would continue to push the eapol packets through the QDISC. But starting
> >>> with 3.14 drivers flushing their queues prior to deleting a key can be
> >>> sure the eapol frame has been send.
> >>>
> >>> The only potential down side I can find is the - required - lack of
> >>> buffers. The driver could therefore refuse to accept the eapol packet
> >>> which should cause a disconnect instead a queued and later send eapol frame.
> >>> Using the control port for eapol would of course still better, but using
> >>> PACKET_QDISC_BYPASS when this is not working looks like a good fallback
> >>> and an improvement compared to as things are today.
> >>>
> >> I agree using that option to bypass QDISC is a definite improvment.
> >
> > To summarize: Patching hostap/wpa_s to use PACKET_QDISC_BYPASS is enough
> > (along with driver stalling the set_key till EAPoL is flushed) and we
> > don't need this patch
> > anymore? Please confirm.
> >
> That is my understanding, yes.
>
> But the "simple" patch (https://patchwork.ozlabs.org/patch/1161750/) to
> enable PACKET_QDISC_BYPASS in hostapd is little more than a POC and
> doing it right will be quite invasive.
>
> And using CONTROL_PORT has some benefits we can't get otherwise. Here
> the ones I see:
> - The drivers may not have to flush the queues in all constellations
> - EAPOL MPDUs can be send with the basic rate
> - EAPOL MPDUs will normally be a separate TID
Agree, I guess most of the drivers/FW do look for EAPoL and handle it
specially,
but good to have control_port.

> Denis may have more, so far I only had passing contact with CONTROL_PORT.
>
> We discussed the the different solutions recently here on linux wireless:
> https://marc.info/?l=linux-wireless&m=156873433609504&w=2
>
> In short CONTROL_PORT was designed to handle this kind of races and
> since the API is available should be the preferred solution.
>
> Regardless of using CONTROL_PORT or queue flushes the drivers/cards will
> have to make sure they can rekey correctly. There are many pitfalls and
> nearly all drivers have some oversights here.

It does require multiple knobs to set the right to get initial and
rekeying working.
So, I guess we should try to get this patch along with bypassing QDISC
patch in and
set expectations to the drivers to make sure they flush EAPoL properly.
diff mbox series

Patch

diff --git a/src/drivers/driver.h b/src/drivers/driver.h
index 496bd522e..7cf1582ce 100644
--- a/src/drivers/driver.h
+++ b/src/drivers/driver.h
@@ -1625,6 +1625,8 @@  struct wpa_driver_capa {
 #define WPA_DRIVER_FLAGS_FTM_RESPONDER		0x0100000000000000ULL
 /** Driver support 4-way handshake offload for WPA-Personal */
 #define WPA_DRIVER_FLAGS_4WAY_HANDSHAKE_PSK	0x0200000000000000ULL
+/** Driver has working tx_control_port */
+#define WPA_DRIVER_FLAGS_TX_CONTROL_PORT	0x0400000000000000ULL
 	u64 flags;
 
 #define FULL_AP_CLIENT_STATE_SUPP(drv_flags) \
@@ -2280,6 +2282,30 @@  struct wpa_driver_ops {
 		       const u8 *seq, size_t seq_len,
 		       const u8 *key, size_t key_len);
 
+	/**
+	 * tx_control_port - Send a frame over the 802.1X controlled port
+	 * @priv: private driver interface data
+	 * @dest: Destination MAC address
+	 * @proto: Ethertype in host byte order
+	 * @buf: Frame payload starting from IEEE 802.1X header
+	 * @len: Frame payload length
+	 *
+	 * Returns 0 on success, else an error
+	 *
+	 * This is like a normal ethernet send except that the OS device driver
+	 * is aware (by other means than the ethertype) that this frame is
+	 * ~special~, and more importantly it gains an ordering between the
+	 * transmission of the frame and other driver operations such as key
+	 * installations. This can be used to work around known limitations in
+	 * 802.11 protocols such as race conditions between 802.1X rekeying
+	 * handshake message 4/4 and a PTK being overwritten.
+	 *
+	 * This function is only implemented for a given interface if the driver
+	 * instance reports WPA_DRIVER_FLAGS_TX_CONTROL_PORT. Otherwise API
+	 * users should fall back to sending the frame via a normal socket.
+	 */
+	int (*tx_control_port)(void *priv, const u8 *dest,
+			       u16 proto, const u8 *buf, size_t len);
 	/**
 	 * init - Initialize driver interface
 	 * @ctx: context to be used when calling wpa_supplicant functions,
diff --git a/src/drivers/driver_nl80211.c b/src/drivers/driver_nl80211.c
index 3556b6d69..b1d0220e9 100644
--- a/src/drivers/driver_nl80211.c
+++ b/src/drivers/driver_nl80211.c
@@ -3007,6 +3007,44 @@  static int nl80211_set_pmk(struct wpa_driver_nl80211_data *drv,
 }
 
 
+static int driver_nl80211_tx_control_port(void *priv, const u8 *dest,
+					  u16 proto, const u8 *buf, size_t len)
+{
+	struct i802_bss *bss = priv;
+	struct nl_msg *msg = NULL;
+	int ifindex = if_nametoindex(bss->ifname);
+	int ret = 0;
+
+	wpa_printf(MSG_DEBUG,
+		   "nl80211: tx_control_port "MACSTR" proto=0x%04x len=%zu",
+		   MAC2STR(dest), proto, len);
+
+	msg = nl80211_ifindex_msg(bss->drv, ifindex, 0,
+				  NL80211_CMD_CONTROL_PORT_FRAME);
+	if (!msg)
+		return -ENOBUFS;
+
+	if (nla_put_u16(msg, NL80211_ATTR_CONTROL_PORT_ETHERTYPE, proto))
+		goto fail;
+	if (nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN, dest))
+		goto fail;
+	if (nla_put(msg, NL80211_ATTR_FRAME, len, buf))
+		goto fail;
+
+	ret = send_and_recv_msgs(bss->drv, msg, NULL, NULL);
+	if (ret)
+		wpa_printf(MSG_DEBUG,
+			   "nl80211: tx_control_port failed: ret=%d (%s)",
+			   ret, strerror(ret));
+
+	return ret;
+
+fail:
+	nl80211_nlmsg_clear(msg);
+	nlmsg_free(msg);
+	return -ENOBUFS;
+}
+
 static int wpa_driver_nl80211_set_key(const char *ifname, struct i802_bss *bss,
 				      enum wpa_alg alg, const u8 *addr,
 				      int key_idx, int set_tx,
@@ -10940,6 +10978,7 @@  const struct wpa_driver_ops wpa_driver_nl80211_ops = {
 	.get_bssid = wpa_driver_nl80211_get_bssid,
 	.get_ssid = wpa_driver_nl80211_get_ssid,
 	.set_key = driver_nl80211_set_key,
+	.tx_control_port = driver_nl80211_tx_control_port,
 	.scan2 = driver_nl80211_scan2,
 	.sched_scan = wpa_driver_nl80211_sched_scan,
 	.stop_sched_scan = wpa_driver_nl80211_stop_sched_scan,
diff --git a/src/drivers/driver_nl80211_capa.c b/src/drivers/driver_nl80211_capa.c
index a90a55db8..f87621003 100644
--- a/src/drivers/driver_nl80211_capa.c
+++ b/src/drivers/driver_nl80211_capa.c
@@ -433,6 +433,10 @@  static void wiphy_info_ext_feature_flags(struct wiphy_info_data *info,
 	if (ext_feature_isset(ext_features, len,
 			      NL80211_EXT_FEATURE_ENABLE_FTM_RESPONDER))
 		capa->flags |= WPA_DRIVER_FLAGS_FTM_RESPONDER;
+
+	if (ext_feature_isset(ext_features, len,
+			      NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211))
+		capa->flags |= WPA_DRIVER_FLAGS_TX_CONTROL_PORT;
 }
 
 
diff --git a/src/rsn_supp/wpa.c b/src/rsn_supp/wpa.c
index e0039fac0..aecabac0b 100644
--- a/src/rsn_supp/wpa.c
+++ b/src/rsn_supp/wpa.c
@@ -158,7 +158,7 @@  int wpa_eapol_key_send(struct wpa_sm *sm, struct wpa_ptk *ptk,
 	}
 
 	wpa_hexdump(MSG_MSGDUMP, "WPA: TX EAPOL-Key", msg, msg_len);
-	ret = wpa_sm_ether_send(sm, dest, proto, msg, msg_len);
+	ret = wpa_sm_tx_control_port(sm, dest, proto, msg, msg_len);
 	eapol_sm_notify_tx_eapol_key(sm->eapol);
 out:
 	os_free(msg);
diff --git a/src/rsn_supp/wpa.h b/src/rsn_supp/wpa.h
index ae9cd6484..145d57988 100644
--- a/src/rsn_supp/wpa.h
+++ b/src/rsn_supp/wpa.h
@@ -33,6 +33,8 @@  struct wpa_sm_ctx {
 		       const u8 *key, size_t key_len);
 	void * (*get_network_ctx)(void *ctx);
 	int (*get_bssid)(void *ctx, u8 *bssid);
+	int (*tx_control_port)(void *ctx, const u8 *dest, u16 proto,
+			       const u8 *buf, size_t len);
 	int (*ether_send)(void *ctx, const u8 *dest, u16 proto, const u8 *buf,
 			  size_t len);
 	int (*get_beacon_ie)(void *ctx);
diff --git a/src/rsn_supp/wpa_i.h b/src/rsn_supp/wpa_i.h
index d86734b0d..908af3d25 100644
--- a/src/rsn_supp/wpa_i.h
+++ b/src/rsn_supp/wpa_i.h
@@ -216,6 +216,13 @@  static inline int wpa_sm_get_bssid(struct wpa_sm *sm, u8 *bssid)
 	return sm->ctx->get_bssid(sm->ctx->ctx, bssid);
 }
 
+static inline int wpa_sm_tx_control_port(struct wpa_sm *sm, const u8 *dest,
+					 u16 proto, const u8 *buf, size_t len)
+{
+	WPA_ASSERT(sm->ctx->tx_control_port);
+	return sm->ctx->tx_control_port(sm->ctx->ctx, dest, proto, buf, len);
+}
+
 static inline int wpa_sm_ether_send(struct wpa_sm *sm, const u8 *dest,
 				    u16 proto, const u8 *buf, size_t len)
 {
diff --git a/wpa_supplicant/driver_i.h b/wpa_supplicant/driver_i.h
index f073b8a6d..2372344fa 100644
--- a/wpa_supplicant/driver_i.h
+++ b/wpa_supplicant/driver_i.h
@@ -138,6 +138,14 @@  static inline int wpa_drv_get_ssid(struct wpa_supplicant *wpa_s, u8 *ssid)
 	return -1;
 }
 
+static inline int wpa_drv_tx_contol_port(struct wpa_supplicant *wpa_s,
+					 const u8 *dest,
+					 u16 proto, const u8 *buf, size_t len)
+{
+	return wpa_s->driver->tx_control_port(wpa_s->drv_priv,
+					      dest, proto, buf, len);
+}
+
 static inline int wpa_drv_set_key(struct wpa_supplicant *wpa_s,
 				  enum wpa_alg alg, const u8 *addr,
 				  int key_idx, int set_tx,
diff --git a/wpa_supplicant/ibss_rsn.c b/wpa_supplicant/ibss_rsn.c
index 6934c4725..cdadadee1 100644
--- a/wpa_supplicant/ibss_rsn.c
+++ b/wpa_supplicant/ibss_rsn.c
@@ -76,6 +76,23 @@  static int supp_ether_send(void *ctx, const u8 *dest, u16 proto, const u8 *buf,
 }
 
 
+static int supp_tx_control_port(void *ctx, const u8 *dest, u16 proto,
+				const u8 *buf, size_t len)
+{
+	struct ibss_rsn_peer *peer = ctx;
+	struct wpa_supplicant *wpa_s = peer->ibss_rsn->wpa_s;
+
+	wpa_printf(MSG_DEBUG, "SUPP: %s(dest=" MACSTR " proto=0x%04x "
+		   "len=%lu)",
+		   __func__, MAC2STR(dest), proto, (unsigned long) len);
+
+	if (wpa_s->drv_flags & WPA_DRIVER_FLAGS_TX_CONTROL_PORT)
+		return wpa_drv_tx_contol_port(wpa_s, dest, proto, buf, len);
+	else
+		return supp_ether_send(wpa_s, dest, proto, buf, len);
+}
+
+
 static u8 * supp_alloc_eapol(void *ctx, u8 type, const void *data,
 			     u16 data_len, size_t *msg_len, void **data_pos)
 {
@@ -211,6 +228,7 @@  static int ibss_rsn_supp_init(struct ibss_rsn_peer *peer, const u8 *own_addr,
 	ctx->set_state = supp_set_state;
 	ctx->get_state = supp_get_state;
 	ctx->ether_send = supp_ether_send;
+	ctx->tx_control_port = supp_tx_control_port;
 	ctx->get_beacon_ie = supp_get_beacon_ie;
 	ctx->alloc_eapol = supp_alloc_eapol;
 	ctx->set_key = supp_set_key;
diff --git a/wpa_supplicant/wpas_glue.c b/wpa_supplicant/wpas_glue.c
index e98bf1147..6f5c301f8 100644
--- a/wpa_supplicant/wpas_glue.c
+++ b/wpa_supplicant/wpas_glue.c
@@ -85,17 +85,9 @@  static u8 * wpa_alloc_eapol(const struct wpa_supplicant *wpa_s, u8 type,
 }
 
 
-/**
- * wpa_ether_send - Send Ethernet frame
- * @wpa_s: Pointer to wpa_supplicant data
- * @dest: Destination MAC address
- * @proto: Ethertype in host byte order
- * @buf: Frame payload starting from IEEE 802.1X header
- * @len: Frame payload length
- * Returns: >=0 on success, <0 on failure
- */
-static int wpa_ether_send(struct wpa_supplicant *wpa_s, const u8 *dest,
-			  u16 proto, const u8 *buf, size_t len)
+static inline int ext_eapol_frame_io_notify_tx(struct wpa_supplicant *wpa_s,
+					       const u8 *dest, u16 proto,
+					       const u8 *buf, size_t len)
 {
 #ifdef CONFIG_TESTING_OPTIONS
 	if (wpa_s->ext_eapol_frame_io && proto == ETH_P_EAPOL) {
@@ -108,18 +100,60 @@  static int wpa_ether_send(struct wpa_supplicant *wpa_s, const u8 *dest,
 		wpa_msg(wpa_s, MSG_INFO, "EAPOL-TX " MACSTR " %s",
 			MAC2STR(dest), hex);
 		os_free(hex);
-		return 0;
+		return -1;
 	}
 #endif /* CONFIG_TESTING_OPTIONS */
 
+	return 0;
+}
+
+/**
+ * wpa_ether_send - Send Ethernet frame
+ * @wpa_s: Pointer to wpa_supplicant data
+ * @dest: Destination MAC address
+ * @proto: Ethertype in host byte order
+ * @buf: Frame payload starting from IEEE 802.1X header
+ * @len: Frame payload length
+ * Returns: >=0 on success, <0 on failure
+ */
+static int wpa_ether_send(struct wpa_supplicant *wpa_s, const u8 *dest,
+			  u16 proto, const u8 *buf, size_t len)
+{
+	if (ext_eapol_frame_io_notify_tx(wpa_s, dest, proto, buf, len))
+		return 0;
+
 	if (wpa_s->l2) {
 		return l2_packet_send(wpa_s->l2, dest, proto, buf, len);
 	}
 
 	return -1;
 }
-#endif /* IEEE8021X_EAPOL || !CONFIG_NO_WPA */
 
+/**
+ * wpa_supplicant_tx_control_port - Send Ethernet frame over 802.1X control port
+ * @wpa_s: Pointer to wpa_supplicant data
+ * @dest: Destination MAC address
+ * @proto: Ethertype in host byte order
+ * @buf: Frame payload starting from IEEE 802.1X header
+ * @len: Frame payload length
+ * Just like wpa_ether_send, but when this function is used the driver may be
+ * able to handle control port frames specially.
+ * Returns: >=0 on success, <0 on failure
+ */
+static int wpa_supplicant_tx_control_port(void *ctx, const u8 *dest,
+					  u16 proto, const u8 *buf, size_t len)
+{
+	struct wpa_supplicant *wpa_s = ctx;
+
+	if (wpa_s->drv_flags & WPA_DRIVER_FLAGS_TX_CONTROL_PORT) {
+		if (ext_eapol_frame_io_notify_tx(wpa_s, dest, proto, buf, len))
+			return 0;
+		return wpa_drv_tx_contol_port(wpa_s, dest, proto, buf, len);
+	} else {
+		return wpa_ether_send(wpa_s, dest, proto, buf, len);
+	}
+}
+#endif /* IEEE8021X_EAPOL || !CONFIG_NO_WPA */
 
 #ifdef IEEE8021X_EAPOL
 
@@ -1214,6 +1248,7 @@  int wpa_supplicant_init_wpa(struct wpa_supplicant *wpa_s)
 	ctx->get_network_ctx = wpa_supplicant_get_network_ctx;
 	ctx->get_bssid = wpa_supplicant_get_bssid;
 	ctx->ether_send = _wpa_ether_send;
+	ctx->tx_control_port = wpa_supplicant_tx_control_port;
 	ctx->get_beacon_ie = wpa_supplicant_get_beacon_ie;
 	ctx->alloc_eapol = _wpa_alloc_eapol;
 	ctx->cancel_auth_timeout = _wpa_supplicant_cancel_auth_timeout;