
[1/2] Client Taxonomy

Message ID 1470869553-23547-2-git-send-email-dgentry@google.com
State Superseded

Commit Message

Denton Gentry Aug. 10, 2016, 10:52 p.m. UTC
Implement the signature mechanism described in the paper
"Passive Taxonomy of Wifi Clients using MLME Frame Contents"
published by Denton Gentry and Avery Pennarun.

http://research.google.com/pubs/pub45429.html
https://arxiv.org/abs/1608.01725

This involves:
1. Store the IEs from the Probe Request and Association Request in struct sta_info.
2. Implement code to extract the ID of each Information Element,
   plus selected fields and bitmasks from certain IEs, into a
   descriptive text string. This is done in a new source file,
   src/ap/taxonomy.c.
3. Add a client_taxonomy=[0|1] option to hostapd.conf (disabled by
   default). Enabling taxonomy incurs a memory overhead of up to several
   kilobytes per associated station.
4. Add a SIGNATURE control interface command, and a matching
   "signature qq:rr:ss:tt:uu:vv" command in hostapd_cli, to retrieve
   the signature.
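A minimal, standalone sketch of the tag-list half of step 2. It assumes IEs are laid out as a 1-byte ID, a 1-byte length and the payload. demo_ie_tags() is a hypothetical name: it simplifies what ie_to_string() in src/ap/taxonomy.c does, is not a replacement for it, and skips the htcap/vhtcap/extcap/wps field extraction.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified tag-list renderer: walk a buffer of 802.11
 * Information Elements (1-byte ID, 1-byte length, payload) and write
 * each ID to out, comma-separated. Vendor Specific elements (ID 221)
 * are written as 221(OUI,type). Field extraction is omitted.
 */
static void demo_ie_tags(char *out, size_t out_len,
                         const uint8_t *ie, size_t ie_len)
{
    size_t used = 0;
    int first = 1;

    if (out_len == 0)
        return;
    out[0] = '\0';

    while (ie_len >= 2) {
        uint8_t id = ie[0];
        uint8_t elen = ie[1];
        int n;

        ie += 2;
        ie_len -= 2;
        if (elen > ie_len)
            break; /* truncated element: stop, as the patch does */

        if (id == 221 && elen >= 4)
            n = snprintf(out + used, out_len - used, "%s%d(%02x%02x%02x,%d)",
                         first ? "" : ",", id, ie[0], ie[1], ie[2], ie[3]);
        else
            n = snprintf(out + used, out_len - used, "%s%d",
                         first ? "" : ",", id);
        if (n < 0 || (size_t) n >= out_len - used)
            break; /* out is full; snprintf left it NUL-terminated */
        used += (size_t) n;
        first = 0;

        ie += elen;
        ie_len -= elen;
    }
}
```

For example, an SSID element, a Supported Rates element, an HT Capabilities element and a vendor element with OUI 00:50:f2 and type 2 render as 0,1,45,221(0050f2,2).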

Signatures take the form of a text string. For example, a signature
for the Nexus 5X is:
  wifi4|probe:0,1,127,45,191,htcap:01ef,htagg:03,htmcs:0000ffff,vhtcap:338061b2,
  vhtrxmcs:030cfffa,vhttxmcs:030cfffa,extcap:00000a0201000040|assoc:0,1,48,45,
  221(0050f2,2),191,127,htcap:01ef,htagg:03,htmcs:0000ffff,vhtcap:339071b2,
  vhtrxmcs:030cfffa,vhttxmcs:030cfffa,extcap:0000000000000040
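A consumer of this string can split it on '|' into the version tag, the probe section and the assoc section, then pick comma-separated key:value fields out of one section. The helper below is a hypothetical sketch of such a consumer; its name and behaviour are assumptions and not part of this patch. The 221(0050f2,2) tag itself contains a comma, which is harmless here because only fields that begin with "key:" are matched.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value of "key:" from the given section
 * ("probe" or "assoc") of a "wifi4|probe:...|assoc:..." signature.
 * Returns 0 on success, -1 if the section or key is absent or the
 * value does not fit in val.
 */
static int demo_sig_field(const char *sig, const char *section,
                          const char *key, char *val, size_t val_len)
{
    char sec_tag[32], key_tag[32];
    const char *start, *end, *p;
    size_t key_tag_len;

    snprintf(sec_tag, sizeof(sec_tag), "|%s:", section);
    snprintf(key_tag, sizeof(key_tag), "%s:", key);
    key_tag_len = strlen(key_tag);

    start = strstr(sig, sec_tag);
    if (!start)
        return -1;
    start += strlen(sec_tag);
    end = strchr(start, '|'); /* a section ends at the next '|' or NUL */
    if (!end)
        end = start + strlen(start);

    /* Fields within a section are comma-separated. */
    for (p = start; p < end;) {
        const char *comma = memchr(p, ',', (size_t) (end - p));
        const char *field_end = comma ? comma : end;
        size_t field_len = (size_t) (field_end - p);

        if (field_len > key_tag_len &&
            strncmp(p, key_tag, key_tag_len) == 0) {
            size_t n = field_len - key_tag_len;

            if (n >= val_len)
                return -1;
            memcpy(val, p + key_tag_len, n);
            val[n] = '\0';
            return 0;
        }
        p = comma ? comma + 1 : end;
    }
    return -1;
}
```

With the Nexus 5X signature above, asking for vhtcap in the probe section yields 338061b2, and in the assoc section 339071b2.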

Signed-off-by: dgentry@google.com (Denton Gentry)
Signed-off-by: denny@geekhold.com (Denton Gentry)
---
 hostapd/Makefile       |   1 +
 hostapd/config_file.c  |   2 +
 hostapd/ctrl_iface.c   |   3 +
 hostapd/hostapd.conf   |   7 ++
 hostapd/hostapd_cli.c  |  16 +++
 src/ap/ap_config.h     |   2 +
 src/ap/beacon.c        |   8 ++
 src/ap/ctrl_iface_ap.c |  33 ++++++
 src/ap/ctrl_iface_ap.h |   3 +
 src/ap/ieee802_11.c    |   3 +
 src/ap/sta_info.c      |  11 ++
 src/ap/sta_info.h      |   5 +
 src/ap/taxonomy.c      | 302 +++++++++++++++++++++++++++++++++++++++++++++++++
 src/ap/taxonomy.h      |  19 ++++
 14 files changed, 415 insertions(+)
 create mode 100644 src/ap/taxonomy.c
 create mode 100644 src/ap/taxonomy.h

Comments

Johannes Berg Aug. 11, 2016, 6:35 a.m. UTC | #1
> +##### Client Taxonomy #########################################################
> +#
> +# Has the AP retain the Probe Request and Association Request MLME frames from
> +# a client, from which a signature can be produced which can identify the model
> +# of client device like "Nexus 6P" or "iPhone 5s"
> +# client_taxonomy=1
> 
This being a fairly niche feature, perhaps it should get a build time
option so the code can be excluded? Even things almost everybody wants
like 11N have build time options, and this one seems to be much more
likely to not be desired in all builds. Thoughts?

johannes
Arran Cudbard-Bell Aug. 13, 2016, 9:18 p.m. UTC | #2
> On 11 Aug 2016, at 08:35, Johannes Berg <johannes@sipsolutions.net> wrote:
> 
> 
>> +##### Client Taxonomy #########################################################
>> +#
>> +# Has the AP retain the Probe Request and Association Request MLME frames from
>> +# a client, from which a signature can be produced which can identify the model
>> +# of client device like "Nexus 6P" or "iPhone 5s"
>> +# client_taxonomy=1
>> 
> This being a fairly niche feature, perhaps it should get a build time
> option so the code can be excluded? Even things almost everybody wants
> like 11N have build time options, and this one seems to be much more
> likely to not be desired in all builds. Thoughts?

I think the techniques Avery described in his presentation (https://www.youtube.com/watch?v=yZcHbD84j5Y) are equally useful in Education, Enterprise and Carrier deployments and are not at all niche.

If signature definitions were bundled, and the determination/confidence info were inserted into an attribute like Connect-Info (or a hostapd VSA - I’m sure Alan DeKok will comment on appropriate attribute usage), you’d see very widespread adoption and use.  It’d represent an ultra low barrier to the sort of analysis Google are doing on their ISP network.

If there’s interest, and the Client Taxonomy patches go in, we (FreeRADIUS) would definitely be up for submitting patches to add RADIUS support.

-Arran

Arran Cudbard-Bell <a.cudbardb@freeradius.org>
FreeRADIUS Development Team

FD31 3077 42EC 7FCD 32FE 5EE2 56CF 27F9 30A8 CAA2
Denton Gentry Aug. 14, 2016, 4:17 p.m. UTC | #3
On Sat, Aug 13, 2016 at 2:18 PM, Arran Cudbard-Bell
<a.cudbardb@freeradius.org> wrote:
> I think the techniques Avery described in his presentation (https://www.youtube.com/watch?v=yZcHbD84j5Y) are equally useful in Education, Enterprise and Carrier deployments and are not at all niche.
>
> If signature definitions were bundled, and the determination/confidence info were inserted into an attribute like Connect-Info (or a hostapd VSA - I’m sure Alan DeKok will comment on appropriate attribute usage), you’d see very widespread adoption and use.  It’d represent an ultra low barrier to the sort of analysis Google are doing on their ISP network.
>
> If there’s interest, and the Client Taxonomy patches go in, we (FreeRADIUS) would definitely be up for submitting patches to add RADIUS support.

The signature database we've assembled is available:
https://gfiber.googlesource.com/vendor/google/platform/+/master/taxonomy/wifi.py

We intend to extract it out of the git repository it is currently in
(shared with a number of other tools we use) and into a github repo of
its own. That would let signature submissions be handled as pull
requests. We also need to revamp how it gets its inputs. Previously we
had hostapd writing directly to files, and the current signature lookup
code expects to use those files.

However, using it from hostapd at the time of sending the RADIUS report
would be challenging. A number of the signatures supplement the
information from the MLME frames with information from DHCP, and the
DHCP exchange happens later. We talk about this in the paper
https://arxiv.org/pdf/1608.01725v1.pdf in sections labelled
"Supplemental Information" about OUIs and DHCP.

There are a number of signatures where we could switch from DHCP to
rely on OUIs, but some of the important ones would be difficult. For
example we use the DHCP signature of iOS for the various Apple
devices. Apple's production volume is such that they consume OUIs
every couple weeks, faster than we can keep up.
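For the OUI-based supplemental information mentioned above, the OUI is simply the first three octets of the station address. The following is a hypothetical sketch (the function name is invented) that parses the same qq:rr:ss:tt:uu:vv form the SIGNATURE command accepts.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse a "qq:rr:ss:tt:uu:vv" station address and
 * return its 24-bit OUI (the first three octets), or -1 if the string
 * is not exactly a colon-separated MAC address.
 */
static long demo_mac_oui(const char *txt)
{
    unsigned int b[6];
    char trailing;

    /* %c only matches if characters remain after the sixth octet. */
    if (sscanf(txt, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &trailing) != 6)
        return -1;
    return ((long) b[0] << 16) | ((long) b[1] << 8) | (long) b[2];
}
```

The lookup side would then compare this value against a vendor OUI table, which is the part Denton describes as hard to keep current.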
Arran Cudbard-Bell Aug. 14, 2016, 4:48 p.m. UTC | #4
> 
> The signature database we've assembled is available:
> https://gfiber.googlesource.com/vendor/google/platform/+/master/taxonomy/wifi.py
> 
> We intend to extract it out of the git repository it is currently in
> (shared with a number of other tools we use) and into a github repo of
> its own. That would let signature submissions be handled as pull
> requests. We also need to revamp how it gets its inputs. Previously we
> had hostapd writing directly to files, the current signature lookup
> code expects to use those files.
> 
> However using it from hostapd at the time of sending the RADIUS report
> would be challenging. A number of the signatures supplement the
> information from the MLME frames with information from DHCP, and the
> DHCP exchange happens later. We talk about this in the paper
> https://arxiv.org/pdf/1608.01725v1.pdf in sections labelled
> "Supplemental Information" about OUIs and DHCP.
> 
> There are a number of signatures where we could switch from DHCP to
> rely on OUIs, but some of the important ones would be difficult.

It’s reasonably common in commercial equipment that supports DHCP Snooping for RADIUS Interim-Update packets to be sent as soon as the AP learns the IP of the STA.  We could do something similar here. It’s fine for additional data to be added in later accounting packets so long as the Acct-Session-ID attribute stays consistent.

Forwarding the data learned from 802.11 frames to the RADIUS server for aggregation and correlation with DHCP data would also be an option, but I think Interim-Updates would be simpler and easier for people to use.

> For
> example we use the DHCP signature of iOS for the various Apple
> devices. Apple's production volume is such that they consume OUIs
> every couple weeks, faster than we can keep up.

Wow, that’s pretty crazy!

-Arran

Nick Lowe Aug. 14, 2016, 8:09 p.m. UTC | #5
> It’s reasonably common in commercial equipment that supports DHCP Snooping for RADIUS Interim-Update packets to be sent as soon as the AP learns the IP of the STA.  We could do something similar here. It’s fine for additional data to be added in later accounting packets so long as the Acct-Session-ID attribute stays consistent.

At Aerohive, we have implemented asynchronous, 'immediate'
Interim-Updates in our recent software releases so that as a
client/station's IP becomes known or changes, we send an immediate
Interim-Update with that information. If the information is available
at the time the Start is sent, we will not send an
asynchronous Interim-Update.
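The policy described above can be modelled as a small state machine. This is a hypothetical sketch written only from the description in this message, not Aerohive's or hostapd's code. The struct and function names are invented, and an address of 0 stands for "not yet known".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical accounting state for one station. reported_ip is the
 * address carried in the last accounting packet sent (0 = unknown).
 */
struct demo_acct_state {
    bool start_sent;
    uint32_t reported_ip;
};

/* Called when the Accounting-Start goes out with the current address. */
static void demo_on_start_sent(struct demo_acct_state *st, uint32_t current_ip)
{
    st->start_sent = true;
    st->reported_ip = current_ip;
}

/*
 * Called when DHCP snooping learns an address. Returns true if an
 * immediate Interim-Update should be sent now.
 */
static bool demo_on_ip_learned(struct demo_acct_state *st, uint32_t ip)
{
    if (ip == 0 || ip == st->reported_ip)
        return false; /* nothing new to report */
    if (!st->start_sent)
        return false; /* the Start, when it is sent, will carry it */
    st->reported_ip = ip;
    return true;
}
```

The Acct-Session-Id would stay the same across the Start and any Interim-Updates, which is what lets the server attach the late-arriving data to the right session.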

(FWIW, we also now ensure that we only use DHCP snooped information to
populate the Framed-IP-Address AVP, rather than ARP sourced
information so that it's not unreliable / trivially spoofable.)
Avery Pennarun Aug. 14, 2016, 11:28 p.m. UTC | #6
On Sat, Aug 13, 2016 at 5:18 PM, Arran Cudbard-Bell
<a.cudbardb@freeradius.org> wrote:
>> On 11 Aug 2016, at 08:35, Johannes Berg <johannes@sipsolutions.net> wrote:
>> This being a fairly niche feature, perhaps it should get a build time
>> option so the code can be excluded? Even things almost everybody wants
>> like 11N have build time options, and this one seems to be much more
>> likely to not be desired in all builds. Thoughts?
>
> I think the techniques Avery described in his presentation (https://www.youtube.com/watch?v=yZcHbD84j5Y)
> are equally useful in Education, Enterprise and Carrier deployments and are not at all niche.
>
> If signature definitions were bundled, and the determination/confidence info were inserted into an attribute like
> Connect-Info (or a hostapd VSA - I’m sure Alan DeKok will comment on appropriate attribute usage), you’d
> see very widespread adoption and use.  It’d represent an ultra low barrier to the sort of analysis Google are
> doing on their ISP network.

Although I agree that the feature is awesome, I also agree that not
everyone will want it, so it still makes sense to make it optional I
guess :)

Have fun,

Avery
Johannes Berg Aug. 15, 2016, 5:38 a.m. UTC | #7
On Sat, 2016-08-13 at 23:18 +0200, Arran Cudbard-Bell wrote:

> I think the techniques Avery described in his presentation
> (https://www.youtube.com/watch?v=yZcHbD84j5Y) are equally useful in
> Education, Enterprise and Carrier deployments and are not at all niche.
> 

Maybe so, perhaps my wording was misleading. My main concern is that
wpa_supplicant (and to some extent hostapd) is also often deployed in
small IoT devices, firmware, smartphones, etc., where none of this will
matter at all.

If wpa_supplicant is compiled with AP/P2P support, then all the code
under src/ap/ will be used, but the taxonomy feature could never be.

In any case, I see that Denton just posted a new version with ifdefs,
thanks :)

johannes
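A build-time option of the kind Johannes suggests would typically gate both the object file in the Makefile and the entry points declared in taxonomy.h. The sketch below shows only the C side and uses a hypothetical macro, DEMO_CONFIG_TAXONOMY; the name chosen in the revised patch may differ. With the macro undefined, callers still compile but get an empty reply.

```c
#include <stddef.h>
#include <stdio.h>

#ifdef DEMO_CONFIG_TAXONOMY
static int demo_retrieve_taxonomy(char *buf, size_t buflen)
{
    /* Placeholder: a real build renders the stored IEs here, as
     * retrieve_sta_taxonomy() in src/ap/taxonomy.c does. */
    int ret = snprintf(buf, buflen, "wifi4|probe:|assoc:");

    return (ret < 0 || (size_t) ret >= buflen) ? 0 : ret;
}
#else /* DEMO_CONFIG_TAXONOMY */
static int demo_retrieve_taxonomy(char *buf, size_t buflen)
{
    /* Feature compiled out: no taxonomy code, empty reply. */
    (void) buf;
    (void) buflen;
    return 0;
}
#endif /* DEMO_CONFIG_TAXONOMY */
```

In hostapd such a stub would usually be a static inline function in the header, so that builds without the option carry no taxonomy code at all.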

Patch

diff --git a/hostapd/Makefile b/hostapd/Makefile
index ba094ba..9662b65 100644
--- a/hostapd/Makefile
+++ b/hostapd/Makefile
@@ -86,6 +86,7 @@  OBJS += ../src/ap/beacon.o
 OBJS += ../src/ap/bss_load.o
 OBJS += ../src/ap/neighbor_db.o
 OBJS += ../src/ap/rrm.o
+OBJS += ../src/ap/taxonomy.o
 
 OBJS_c = hostapd_cli.o
 OBJS_c += ../src/common/wpa_ctrl.o
diff --git a/hostapd/config_file.c b/hostapd/config_file.c
index 6c53151..c2330fe 100644
--- a/hostapd/config_file.c
+++ b/hostapd/config_file.c
@@ -3486,6 +3486,8 @@  static int hostapd_config_fill(struct hostapd_config *conf,
 				WLAN_RRM_CAPS_NEIGHBOR_REPORT;
 	} else if (os_strcmp(buf, "gas_address3") == 0) {
 		bss->gas_address3 = atoi(pos);
+	} else if (os_strcmp(buf, "client_taxonomy") == 0) {
+		conf->client_taxonomy = atoi(pos);
 	} else {
 		wpa_printf(MSG_ERROR,
 			   "Line %d: unknown configuration item '%s'",
diff --git a/hostapd/ctrl_iface.c b/hostapd/ctrl_iface.c
index 4e7b58e..99ceaa3 100644
--- a/hostapd/ctrl_iface.c
+++ b/hostapd/ctrl_iface.c
@@ -2364,6 +2364,9 @@  static int hostapd_ctrl_iface_receive_process(struct hostapd_data *hapd,
 	} else if (os_strncmp(buf, "DISASSOCIATE ", 13) == 0) {
 		if (hostapd_ctrl_iface_disassociate(hapd, buf + 13))
 			reply_len = -1;
+	} else if (os_strncmp(buf, "SIGNATURE ", 10) == 0) {
+		reply_len = hostapd_ctrl_iface_signature(hapd, buf + 10,
+							 reply, reply_size);
 	} else if (os_strncmp(buf, "POLL_STA ", 9) == 0) {
 		if (hostapd_ctrl_iface_poll_sta(hapd, buf + 9))
 			reply_len = -1;
diff --git a/hostapd/hostapd.conf b/hostapd/hostapd.conf
index a310c05..33a3140 100644
--- a/hostapd/hostapd.conf
+++ b/hostapd/hostapd.conf
@@ -1909,6 +1909,13 @@  own_ip_addr=127.0.0.1
 # Enable neighbor report via radio measurements
 #rrm_neighbor_report=1
 
+##### Client Taxonomy #########################################################
+#
+# Make the AP retain the Probe Request and Association Request MLME frames
+# from each client so that a signature can be produced that may identify the
+# model of the client device, such as "Nexus 6P" or "iPhone 5s".
+#client_taxonomy=1
+
 ##### TESTING OPTIONS #########################################################
 #
 # The options in this section are only available when the build configuration
diff --git a/hostapd/hostapd_cli.c b/hostapd/hostapd_cli.c
index 04819d1..ec2cf7e 100644
--- a/hostapd/hostapd_cli.c
+++ b/hostapd/hostapd_cli.c
@@ -366,6 +366,20 @@  static char ** hostapd_complete_disassociate(const char *str, int pos)
 }
 
 
+static int hostapd_cli_cmd_signature(struct wpa_ctrl *ctrl, int argc,
+				     char *argv[])
+{
+	char buf[64];
+	if (argc != 1) {
+		printf("Invalid 'signature' command - exactly one "
+		       "argument, STA address, is required.\n");
+		return -1;
+	}
+	os_snprintf(buf, sizeof(buf), "SIGNATURE %s", argv[0]);
+	return wpa_ctrl_command(ctrl, buf);
+}
+
+
 #ifdef CONFIG_IEEE80211W
 static int hostapd_cli_cmd_sa_query(struct wpa_ctrl *ctrl, int argc,
 				    char *argv[])
@@ -1271,6 +1285,8 @@  static const struct hostapd_cli_cmd hostapd_cli_commands[] = {
 	{ "disassociate", hostapd_cli_cmd_disassociate,
 	  hostapd_complete_disassociate,
 	  "<addr> = disassociate a station" },
+	{ "signature", hostapd_cli_cmd_signature, NULL,
+	  "<addr> = get taxonomy signature for a station" },
 #ifdef CONFIG_IEEE80211W
 	{ "sa_query", hostapd_cli_cmd_sa_query, NULL,
 	  "<addr> = send SA Query to a station" },
diff --git a/src/ap/ap_config.h b/src/ap/ap_config.h
index 64daf4c..9c169ab 100644
--- a/src/ap/ap_config.h
+++ b/src/ap/ap_config.h
@@ -703,6 +703,8 @@  struct hostapd_config {
 
 	struct wpabuf *lci;
 	struct wpabuf *civic;
+
+	int client_taxonomy;
 };
 
 
diff --git a/src/ap/beacon.c b/src/ap/beacon.c
index 0570ab7..7cfc7f2 100644
--- a/src/ap/beacon.c
+++ b/src/ap/beacon.c
@@ -29,6 +29,7 @@ 
 #include "beacon.h"
 #include "hs20.h"
 #include "dfs.h"
+#include "taxonomy.h"
 
 
 #ifdef NEED_AP_MLME
@@ -782,6 +783,13 @@  void handle_probe_req(struct hostapd_data *hapd,
 	}
 #endif /* CONFIG_P2P */
 
+	{
+		struct sta_info *sta = ap_get_sta(hapd, mgmt->sa);
+		if (sta) {
+			hostapd_taxonomy_probe_req(hapd, sta, ie, ie_len);
+		}
+	}
+
 	res = ssid_match(hapd, elems.ssid, elems.ssid_len,
 			 elems.ssid_list, elems.ssid_list_len);
 	if (res == NO_SSID_MATCH) {
diff --git a/src/ap/ctrl_iface_ap.c b/src/ap/ctrl_iface_ap.c
index 17a3ea4..8c6d915 100644
--- a/src/ap/ctrl_iface_ap.c
+++ b/src/ap/ctrl_iface_ap.c
@@ -23,6 +23,7 @@ 
 #include "ctrl_iface_ap.h"
 #include "ap_drv_ops.h"
 #include "mbo_ap.h"
+#include "taxonomy.h"
 
 
 static int hostapd_get_sta_tx_rx(struct hostapd_data *hapd,
@@ -429,6 +430,38 @@  int hostapd_ctrl_iface_disassociate(struct hostapd_data *hapd,
 }
 
 
+int hostapd_ctrl_iface_signature(struct hostapd_data *hapd,
+				 const char *txtaddr,
+				 char *buf, size_t buflen)
+{
+	u8 addr[ETH_ALEN];
+	int ret;
+	struct sta_info *sta;
+
+	wpa_dbg(hapd->msg_ctx, MSG_DEBUG, "CTRL_IFACE SIGNATURE %s", txtaddr);
+
+	if (hwaddr_aton(txtaddr, addr)) {
+		ret = os_snprintf(buf, buflen, "FAIL\n");
+		if (os_snprintf_error(buflen, ret))
+			return 0;
+		return ret;
+	}
+
+	sta = ap_get_sta(hapd, addr);
+	if (sta == NULL)
+		return -1;
+
+	if (!hapd->iconf->client_taxonomy) {
+		ret = os_snprintf(buf, buflen, "DISABLED\n");
+		if (os_snprintf_error(buflen, ret))
+			return 0;
+		return ret;
+	}
+
+	return retrieve_sta_taxonomy(hapd, sta, buf, buflen);
+}
+
+
 int hostapd_ctrl_iface_poll_sta(struct hostapd_data *hapd,
 				const char *txtaddr)
 {
diff --git a/src/ap/ctrl_iface_ap.h b/src/ap/ctrl_iface_ap.h
index 6095d7d..4f99680 100644
--- a/src/ap/ctrl_iface_ap.h
+++ b/src/ap/ctrl_iface_ap.h
@@ -19,6 +19,9 @@  int hostapd_ctrl_iface_deauthenticate(struct hostapd_data *hapd,
 				      const char *txtaddr);
 int hostapd_ctrl_iface_disassociate(struct hostapd_data *hapd,
 				    const char *txtaddr);
+int hostapd_ctrl_iface_signature(struct hostapd_data *hapd,
+				 const char *txtaddr,
+				 char *buf, size_t buflen);
 int hostapd_ctrl_iface_poll_sta(struct hostapd_data *hapd,
 				const char *txtaddr);
 int hostapd_ctrl_iface_status(struct hostapd_data *hapd, char *buf,
diff --git a/src/ap/ieee802_11.c b/src/ap/ieee802_11.c
index 555a731..52413cc 100644
--- a/src/ap/ieee802_11.c
+++ b/src/ap/ieee802_11.c
@@ -44,6 +44,7 @@ 
 #include "dfs.h"
 #include "mbo_ap.h"
 #include "rrm.h"
+#include "taxonomy.h"
 
 
 u8 * hostapd_eid_supp_rates(struct hostapd_data *hapd, u8 *eid)
@@ -2250,6 +2251,8 @@  static void handle_assoc(struct hostapd_data *hapd,
 	 * remove the STA immediately. */
 	sta->timeout_next = STA_NULLFUNC;
 
+	hostapd_taxonomy_assoc_req(hapd, sta, pos, left);
+
  fail:
 	/*
 	 * In case of a successful response, add the station to the driver.
diff --git a/src/ap/sta_info.c b/src/ap/sta_info.c
index c36842b..ed321f7 100644
--- a/src/ap/sta_info.c
+++ b/src/ap/sta_info.c
@@ -222,6 +222,17 @@  void ap_free_sta(struct hostapd_data *hapd, struct sta_info *sta)
 		hapd->iface->num_sta_ht_20mhz--;
 	}
 
+	if (sta->probe_ie_taxonomy) {
+		os_free((void *)sta->probe_ie_taxonomy);
+		sta->probe_ie_taxonomy = NULL;
+		sta->probe_ie_taxonomy_len = 0;
+	}
+	if (sta->assoc_ie_taxonomy) {
+		os_free((void *)sta->assoc_ie_taxonomy);
+		sta->assoc_ie_taxonomy = NULL;
+		sta->assoc_ie_taxonomy_len = 0;
+	}
+
 #ifdef CONFIG_IEEE80211N
 	ht40_intolerant_remove(hapd->iface, sta);
 #endif /* CONFIG_IEEE80211N */
diff --git a/src/ap/sta_info.h b/src/ap/sta_info.h
index cf3fbb1..4a2014a 100644
--- a/src/ap/sta_info.h
+++ b/src/ap/sta_info.h
@@ -214,6 +214,11 @@  struct sta_info {
 			      * received, starting from the Length field */
 
 	u8 rrm_enabled_capa[5];
+
+	const u8 *probe_ie_taxonomy;
+	size_t probe_ie_taxonomy_len;
+	const u8 *assoc_ie_taxonomy;
+	size_t assoc_ie_taxonomy_len;
 };
 
 
diff --git a/src/ap/taxonomy.c b/src/ap/taxonomy.c
new file mode 100644
index 0000000..71f1645
--- /dev/null
+++ b/src/ap/taxonomy.c
@@ -0,0 +1,302 @@ 
+/*
+ * hostapd / Client taxonomy
+ * Copyright (c) 2015 Google, Inc.
+ *
+ * This software may be distributed under the terms of the BSD license.
+ * See README for more details.
+ */
+
+/*
+ * Parse a series of IEs, as in Probe or Association packets,
+ * and render them to a descriptive string. The tag number of
+ * standard options is written to the string, while the vendor
+ * ID and subtag are written for vendor options.
+ *
+ * Example strings:
+ * 0,1,50,45,221(00904c,51)
+ * 0,1,33,36,48,45,221(00904c,51),221(0050f2,2)
+ */
+
+#include "includes.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+#include <dirent.h>
+#include "common/wpa_ctrl.h"
+#include "utils/includes.h"
+#include "utils/common.h"
+#include "hostapd.h"
+#include "sta_info.h"
+
+
+/* Copy a string with no funny schtuff allowed; only alphanumerics. */
+static void no_mischief_strncpy(char *dst, const char *src, size_t n)
+{
+	size_t i;
+	for (i = 0; i < n; i++) {
+		unsigned char s = src[i];
+		int is_lower = (s >= 'a' && s <= 'z');
+		int is_upper = (s >= 'A' && s <= 'Z');
+		int is_digit = (s >= '0' && s <= '9');
+		if (is_lower || is_upper || is_digit) {
+			/* TODO: if any manufacturer uses Unicode within the
+			 * WPS header, it will get mangled here. */
+			dst[i] = s;
+		} else {
+			/* note that even spaces will be transformed to underscores,
+			 * so 'Nexus 7' will turn into 'Nexus_7'. This is deliberate,
+			 * to make the string easier to parse. */
+			dst[i] = '_';
+		}
+	}
+}
+
+static int get_wps_name(char *name, size_t name_len,
+		const u8 *data, size_t data_len)
+{
+	/* Inside the WPS IE are a series of sub-IEs, using two byte IDs
+	 * and two byte lengths. We're looking for the model name, if
+	 * present. */
+	while (data_len >= 4) {
+		u16 id, elen;
+		id = (data[0] << 8) | data[1];
+		elen = (data[2] << 8) | data[3];
+		data += 4;
+		data_len -= 4;
+
+		if (elen > data_len) {
+			return 0;
+		}
+
+		if (id == 0x1023) {
+			/* Model name, like 'Nexus 7' */
+			size_t n = (elen < name_len) ? elen : name_len;
+			no_mischief_strncpy(name, (const char *)data, n);
+			return n;
+		}
+
+		data += elen;
+		data_len -= elen;
+	}
+
+	return 0;
+}
+
+static void ie_to_string(char *fstr, size_t fstr_len,
+                         const u8 *ie, size_t ie_len)
+{
+	size_t flen = fstr_len - 1;
+	char htcap[7 + 4 + 1];  // ",htcap:" + %04hx + trailing NUL
+	char htagg[7 + 2 + 1];  // ",htagg:" + %02hx + trailing NUL
+	char htmcs[7 + 8 + 1];  // ",htmcs:" + %08x + trailing NUL
+	char vhtcap[8 + 8 + 1];  // ",vhtcap:" + %08x + trailing NUL
+	char vhtrxmcs[10 + 8 + 1];  // ",vhtrxmcs:" + %08x + trailing NUL
+	char vhttxmcs[10 + 8 + 1];  // ",vhttxmcs:" + %08x + trailing NUL
+	#define MAX_EXTCAP	254
+	char extcap[8 + (2 * MAX_EXTCAP) + 1];  // ",extcap:" + hex + trailing NUL
+	char txpow[7 + 4 + 1];  // ",txpow:" + %04hx + trailing NUL
+	#define WPS_NAME_LEN		32
+	char wps[WPS_NAME_LEN + 5 + 1];  // room to prepend ",wps:" + trailing NUL
+	int num = 0;
+
+	memset(htcap, 0, sizeof(htcap));
+	memset(htagg, 0, sizeof(htagg));
+	memset(htmcs, 0, sizeof(htmcs));
+	memset(vhtcap, 0, sizeof(vhtcap));
+	memset(vhtrxmcs, 0, sizeof(vhtrxmcs));
+	memset(vhttxmcs, 0, sizeof(vhttxmcs));
+	memset(extcap, 0, sizeof(extcap));
+	memset(txpow, 0, sizeof(txpow));
+	memset(wps, 0, sizeof(wps));
+	fstr[0] = '\0';
+
+	while (ie_len >= 2) {
+		u8 id, elen;
+		char tagbuf[32];
+		char *sep = (num++ == 0) ? "" : ",";
+
+		id = *ie++;
+		elen = *ie++;
+		ie_len -= 2;
+
+		if (elen > ie_len) {
+			break;
+		}
+
+		if ((id == 221) && (elen >= 4)) {
+			/* Vendor specific */
+			int is_MSFT = (ie[0] == 0x00 && ie[1] == 0x50 && ie[2] == 0xf2);
+			if (is_MSFT && ie[3] == 0x04) {
+				/* WPS */
+				char model_name[WPS_NAME_LEN + 1];
+				const u8 *data = &ie[4];
+				size_t data_len = elen - 4;
+				memset(model_name, 0, sizeof(model_name));
+				if (get_wps_name(model_name, WPS_NAME_LEN, data, data_len)) {
+					snprintf(wps, sizeof(wps), ",wps:%s", model_name);
+				}
+			}
+
+			snprintf(tagbuf, sizeof(tagbuf), "%s%d(%02x%02x%02x,%d)",
+			         sep, id, ie[0], ie[1], ie[2], ie[3]);
+		} else {
+			if ((id == 45) && (elen >= 2)) {
+				/* HT Capabilities (802.11n) */
+				u16 cap;
+				memcpy(&cap, ie, sizeof(cap));
+				snprintf(htcap, sizeof(htcap), ",htcap:%04hx",
+				         le_to_host16(cap));
+			}
+			if ((id == 45) && (elen >= 3)) {
+				/* HT Capabilities (802.11n), A-MPDU information */
+				u8 agg;
+				memcpy(&agg, ie + 2, sizeof(agg));
+				snprintf(htagg, sizeof(htagg), ",htagg:%02hx", (u16)agg);
+			}
+			if ((id == 45) && (elen >= 7)) {
+				/* HT Capabilities (802.11n), MCS information */
+				u32 mcs;
+				memcpy(&mcs, ie + 3, sizeof(mcs));
+				snprintf(htmcs, sizeof(htmcs), ",htmcs:%08x",
+				         le_to_host32(mcs));
+			}
+			if ((id == 191) && (elen >= 4)) {
+				/* VHT Capabilities (802.11ac) */
+				u32 cap;
+				memcpy(&cap, ie, sizeof(cap));
+				snprintf(vhtcap, sizeof(vhtcap), ",vhtcap:%08x",
+				         le_to_host32(cap));
+			}
+			if ((id == 191) && (elen >= 8)) {
+				/* VHT Capabilities (802.11ac), RX MCS information */
+				u32 mcs;
+				memcpy(&mcs, ie + 4, sizeof(mcs));
+				snprintf(vhtrxmcs, sizeof(vhtrxmcs), ",vhtrxmcs:%08x",
+				         le_to_host32(mcs));
+			}
+			if ((id == 191) && (elen >= 12)) {
+				/* VHT Capabilities (802.11ac), TX MCS information */
+				u32 mcs;
+				memcpy(&mcs, ie + 8, sizeof(mcs));
+				snprintf(vhttxmcs, sizeof(vhttxmcs), ",vhttxmcs:%08x",
+				         le_to_host32(mcs));
+			}
+			if (id == 127) {
+				/* Extended Capabilities */
+				int i;
+				int len = (elen < MAX_EXTCAP) ? elen : MAX_EXTCAP;
+				char *p = extcap;
+
+				p += snprintf(extcap, sizeof(extcap), ",extcap:");
+				for (i = 0; i < len; ++i) {
+					int lim = sizeof(extcap) - strlen(extcap);
+					p += snprintf(p, lim, "%02x", *(ie + i));
+				}
+			}
+			if ((id == 33) && (elen == 2)) {
+				/* TX Power */
+				u16 p;
+				memcpy(&p, ie, sizeof(p));
+				snprintf(txpow, sizeof(txpow), ",txpow:%04hx",
+				         le_to_host16(p));
+			}
+
+			snprintf(tagbuf, sizeof(tagbuf), "%s%d", sep, id);
+		}
+
+		strncat(fstr, tagbuf, flen);
+		flen = fstr_len - strlen(fstr) - 1;
+
+		ie += elen;
+		ie_len -= elen;
+	}
+
+	if (strlen(htcap)) {
+		strncat(fstr, htcap, flen);
+		flen = fstr_len - strlen(fstr) - 1;
+	}
+	if (strlen(htagg)) {
+		strncat(fstr, htagg, flen);
+		flen = fstr_len - strlen(fstr) - 1;
+	}
+	if (strlen(htmcs)) {
+		strncat(fstr, htmcs, flen);
+		flen = fstr_len - strlen(fstr) - 1;
+	}
+	if (strlen(vhtcap)) {
+		strncat(fstr, vhtcap, flen);
+		flen = fstr_len - strlen(fstr) - 1;
+	}
+	if (strlen(vhtrxmcs)) {
+		strncat(fstr, vhtrxmcs, flen);
+		flen = fstr_len - strlen(fstr) - 1;
+	}
+	if (strlen(vhttxmcs)) {
+		strncat(fstr, vhttxmcs, flen);
+		flen = fstr_len - strlen(fstr) - 1;
+	}
+	if (strlen(txpow)) {
+		strncat(fstr, txpow, flen);
+		flen = fstr_len - strlen(fstr) - 1;
+	}
+	if (strlen(extcap)) {
+		strncat(fstr, extcap, flen);
+		flen = fstr_len - strlen(fstr) - 1;
+	}
+	if (strlen(wps)) {
+		strncat(fstr, wps, flen);
+		flen = fstr_len - strlen(fstr) - 1;
+	}
+
+	fstr[fstr_len - 1] = '\0';
+}
+
+int retrieve_sta_taxonomy(const struct hostapd_data *hapd,
+	struct sta_info *sta, char *buf, size_t buflen)
+{
+	if (sta->probe_ie_taxonomy && sta->assoc_ie_taxonomy) {
+		int ret;
+		char probe_signature[768];
+		char assoc_signature[768];
+		ie_to_string(probe_signature, sizeof(probe_signature),
+			sta->probe_ie_taxonomy, sta->probe_ie_taxonomy_len);
+		ie_to_string(assoc_signature, sizeof(assoc_signature),
+			sta->assoc_ie_taxonomy, sta->assoc_ie_taxonomy_len);
+		ret = os_snprintf(buf, buflen, "wifi4|probe:%s|assoc:%s",
+				  probe_signature, assoc_signature);
+		if (os_snprintf_error(buflen, ret))
+			return 0;
+		return ret;
+	}
+	return 0;
+}
+
+void hostapd_taxonomy_probe_req(const struct hostapd_data *hapd,
+	struct sta_info *sta, const u8 *ie, size_t ie_len)
+{
+	u8 *copy = NULL;
+
+	if (hapd->iconf->client_taxonomy) {
+		copy = os_malloc(ie_len);
+		if (copy)
+			os_memcpy(copy, ie, ie_len);
+	}
+	os_free((void *) sta->probe_ie_taxonomy);
+	sta->probe_ie_taxonomy = copy;
+	sta->probe_ie_taxonomy_len = copy ? ie_len : 0;
+}
+
+void hostapd_taxonomy_assoc_req(const struct hostapd_data *hapd,
+	struct sta_info *sta, const u8 *ie, size_t ie_len)
+{
+	u8 *copy = NULL;
+
+	if (hapd->iconf->client_taxonomy) {
+		copy = os_malloc(ie_len);
+		if (copy)
+			os_memcpy(copy, ie, ie_len);
+	}
+	os_free((void *) sta->assoc_ie_taxonomy);
+	sta->assoc_ie_taxonomy = copy;
+	sta->assoc_ie_taxonomy_len = copy ? ie_len : 0;
+}
diff --git a/src/ap/taxonomy.h b/src/ap/taxonomy.h
new file mode 100644
index 0000000..6d2ec39
--- /dev/null
+++ b/src/ap/taxonomy.h
@@ -0,0 +1,19 @@ 
+/*
+ * hostapd / Station client taxonomy
+ * Copyright (c) 2015 Google, Inc.
+ *
+ * This software may be distributed under the terms of the BSD license.
+ * See README for more details.
+ */
+
+#ifndef TAXONOMY_H
+#define TAXONOMY_H
+
+void hostapd_taxonomy_probe_req(const struct hostapd_data *hapd,
+	struct sta_info *sta, const u8 *ie, size_t ie_len);
+void hostapd_taxonomy_assoc_req(const struct hostapd_data *hapd,
+	struct sta_info *sta, const u8 *ie, size_t ie_len);
+int retrieve_sta_taxonomy(const struct hostapd_data *hapd,
+	struct sta_info *sta, char *buf, size_t buflen);
+
+#endif /* TAXONOMY_H */
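The WPS Model Name lookup done by get_wps_name() and no_mischief_strncpy() can be exercised on its own with a standalone version like the one below. It is a sketch with a hypothetical name, demo_wps_model_name(). Like the patch, it assumes the buffer starts just after the 4-byte OUI/type header of the WPS vendor element. Unlike the patch, it NUL-terminates name itself instead of relying on the caller to zero the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical standalone version of the WPS Model Name lookup. WPS
 * attributes use 2-byte big-endian IDs and lengths; 0x1023 is Model
 * Name. Bytes that are not ASCII letters or digits become '_'.
 * name must hold name_len + 1 bytes. Returns the number of characters
 * copied, or 0 if the attribute is missing or malformed.
 */
static size_t demo_wps_model_name(char *name, size_t name_len,
                                  const uint8_t *data, size_t data_len)
{
    while (data_len >= 4) {
        uint16_t id = (uint16_t) ((data[0] << 8) | data[1]);
        uint16_t alen = (uint16_t) ((data[2] << 8) | data[3]);
        size_t i, n;

        data += 4;
        data_len -= 4;
        if (alen > data_len)
            return 0; /* attribute runs past the end of the element */

        if (id == 0x1023) {
            n = alen < name_len ? alen : name_len;
            for (i = 0; i < n; i++) {
                uint8_t c = data[i];
                int ok = (c >= 'a' && c <= 'z') ||
                         (c >= 'A' && c <= 'Z') ||
                         (c >= '0' && c <= '9');
                name[i] = ok ? (char) c : '_';
            }
            name[n] = '\0';
            return n;
        }
        data += alen;
        data_len -= alen;
    }
    return 0;
}
```

A Model Name attribute containing "Nexus 7" comes out as Nexus_7, matching the behaviour described in the no_mischief_strncpy() comment.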