diff mbox

Revert "nl80211: Use global netlink rtm event object"

Message ID 1319816231.8931.14.camel@jlt3.sipsolutions.net
State Rejected, archived
Headers show

Commit Message

Johannes Berg Oct. 28, 2011, 3:37 p.m. UTC
From: Johannes Berg <johannes.berg@intel.com>

This reverts commit 36d84860bbe09641f782fcc21b09e5a6952b4629.

This commit completely broke P2P operation.

---
I'm very unhappy now as I just spent a long time tracking this down.

TEST YOUR PATCHES!

Comments

Ben Greear Oct. 28, 2011, 4:09 p.m. UTC | #1
On 10/28/2011 08:37 AM, Johannes Berg wrote:
> From: Johannes Berg<johannes.berg@intel.com>
>
> This reverts commit 36d84860bbe09641f782fcc21b09e5a6952b4629.
>
> This commit completely broke P2P operation.
>
> ---
> I'm very unhappy now as I just spent a long time tracking this down.
>
> TEST YOUR PATCHES!

I did test it, but not with p2p (and bundled with the other pending
global-netlink patches).  Any idea what is why
this breaks p2p?  Are there any directions anywhere for how
to set up a p2p environment for testing?

Thanks,
Ben
Jouni Malinen Oct. 28, 2011, 4:42 p.m. UTC | #2
On Fri, Oct 28, 2011 at 09:09:02AM -0700, Ben Greear wrote:
> I did test it, but not with p2p (and bundled with the other pending
> global-netlink patches).  Any idea what is why
> this breaks p2p?  Are there any directions anywhere for how
> to set up a p2p environment for testing?

To be more exact, this breaks AP (including P2P GO) mode operations
with wpa_supplicant because of EVENT_INTERFACE_DISABLED event getting
generated at a particularly inconvenient time. Currently, that makes
wpa_supplicant start scanning to find a station mode connection when the
interface "comes back" (i.e., immediately after that on the next netlink
event). This is obviously wrong behavior and should be fixed somehow
regardless.

There are instructions for P2P somewhere.. Anyway, the easiest way of
showing this particular regression is by adding following to
wpa_supplicant/.config:
CONFIG_AP=y
CONFIG_P2P=y

and using following wpa_supplicant.conf file:
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=admin
driver_param=use_p2p_group_interface=1

and run "wpa_cli p2p_group_add"

That will create p2p-wlan0-0 vif that is supposed to be in AP/GO mode,
but it gets moved to station mode when the interface-disabled event is
processed.
Ben Greear Oct. 28, 2011, 4:55 p.m. UTC | #3
On 10/28/2011 09:42 AM, Jouni Malinen wrote:
> On Fri, Oct 28, 2011 at 09:09:02AM -0700, Ben Greear wrote:
>> I did test it, but not with p2p (and bundled with the other pending
>> global-netlink patches).  Any idea what is why
>> this breaks p2p?  Are there any directions anywhere for how
>> to set up a p2p environment for testing?
>
> To be more exact, this breaks AP (including P2P GO) mode operations
> with wpa_supplicant because of EVENT_INTERFACE_DISABLED event getting
> generated at a particularly inconvenient time. Currently, that makes
> wpa_supplicant start scanning to find a station mode connection when the
> interface "comes back" (i.e., immediately after that on the next netlink
> event). This is obviously wrong behavior and should be fixed somehow
> regardless.
>
> There are instructions for P2P somewhere.. Anyway, the easiest way of
> showing this particular regression is by adding following to
> wpa_supplicant/.config:
> CONFIG_AP=y
> CONFIG_P2P=y
>
> and using following wpa_supplicant.conf file:
> ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=admin
> driver_param=use_p2p_group_interface=1
>
> and run "wpa_cli p2p_group_add"
>
> That will create p2p-wlan0-0 vif that is supposed to be in AP/GO mode,
> but it gets moved to station mode when the interface-disabled event is
> processed.

I don't see how that patch would change any behaviour like this,
but I'll work on reproducing the problem and see what I can learn.

Thanks for the details.

Ben
Ben Greear Oct. 28, 2011, 5:19 p.m. UTC | #4
On 10/28/2011 09:55 AM, Ben Greear wrote:
> On 10/28/2011 09:42 AM, Jouni Malinen wrote:
>> On Fri, Oct 28, 2011 at 09:09:02AM -0700, Ben Greear wrote:
>>> I did test it, but not with p2p (and bundled with the other pending
>>> global-netlink patches).  Any idea what is why
>>> this breaks p2p?  Are there any directions anywhere for how
>>> to set up a p2p environment for testing?
>>
>> To be more exact, this breaks AP (including P2P GO) mode operations
>> with wpa_supplicant because of EVENT_INTERFACE_DISABLED event getting
>> generated at a particularly inconvenient time. Currently, that makes
>> wpa_supplicant start scanning to find a station mode connection when the
>> interface "comes back" (i.e., immediately after that on the next netlink
>> event). This is obviously wrong behavior and should be fixed somehow
>> regardless.
>>
>> There are instructions for P2P somewhere.. Anyway, the easiest way of
>> showing this particular regression is by adding following to
>> wpa_supplicant/.config:
>> CONFIG_AP=y
>> CONFIG_P2P=y

Just FYI:  The build will break unless you also
have WPS enabled.  Not sure if WPS2 is required or not,
but seemed like a good idea to enable it.

# Wi-Fi Protected Setup (WPS)
CONFIG_WPS=y
# Enable WSC 2.0 support
CONFIG_WPS2=y

Thanks,
Ben
Jouni Malinen Oct. 28, 2011, 6:14 p.m. UTC | #5
On Fri, Oct 28, 2011 at 09:55:22AM -0700, Ben Greear wrote:
> > To be more exact, this breaks AP (including P2P GO) mode operations
> > with wpa_supplicant because of EVENT_INTERFACE_DISABLED event getting
> > generated at a particularly inconvenient time. Currently, that makes
> > wpa_supplicant start scanning to find a station mode connection when the
> > interface "comes back" (i.e., immediately after that on the next netlink
> > event). This is obviously wrong behavior and should be fixed somehow
> > regardless.

> I don't see how that patch would change any behaviour like this,
> but I'll work on reproducing the problem and see what I can learn.

Agreed, the netlink related patch is not really broken, but it is
highlighting another issue that has been there with wpa_supplicant AP
mode from the beginning. It was just not showing up frequently enough to
get enough energy to fixing it. Your patch made it happen on every
P2P GO start, so this kind of changes ;-).

Instead of reverting the patch, I added yet another workaround for the
netlink download processing: commit
59d249255c5ea484ac2050f19ecbe84a7d9323f0.
Jouni Malinen Oct. 28, 2011, 6:19 p.m. UTC | #6
On Fri, Oct 28, 2011 at 10:19:46AM -0700, Ben Greear wrote:
> Just FYI:  The build will break unless you also
> have WPS enabled.  Not sure if WPS2 is required or not,
> but seemed like a good idea to enable it.

Ah, yes, WPS is required for P2P. Strictly speaking, WPS2 is not, but
anyway, I would recommend enabling it. As far as the minimum requirement
is concerned, I made CONFIG_P2P=y enable CONFIG_WPS=y and CONFIG_AP=y
automatically.
Ben Greear Oct. 28, 2011, 6:37 p.m. UTC | #7
On 10/28/2011 11:14 AM, Jouni Malinen wrote:
> On Fri, Oct 28, 2011 at 09:55:22AM -0700, Ben Greear wrote:
>>> To be more exact, this breaks AP (including P2P GO) mode operations
>>> with wpa_supplicant because of EVENT_INTERFACE_DISABLED event getting
>>> generated at a particularly inconvenient time. Currently, that makes
>>> wpa_supplicant start scanning to find a station mode connection when the
>>> interface "comes back" (i.e., immediately after that on the next netlink
>>> event). This is obviously wrong behavior and should be fixed somehow
>>> regardless.
>
>> I don't see how that patch would change any behaviour like this,
>> but I'll work on reproducing the problem and see what I can learn.
>
> Agreed, the netlink related patch is not really broken, but it is
> highlighting another issue that has been there with wpa_supplicant AP
> mode from the beginning. It was just not showing up frequently enough to
> get enough energy to fixing it. Your patch made it happen on every
> P2P GO start, so this kind of changes ;-).
>
> Instead of reverting the patch, I added yet another workaround for the
> netlink download processing: commit
> 59d249255c5ea484ac2050f19ecbe84a7d9323f0.

Thanks.  I sync'd this down and verified it also fixes the problem
for me (I was able to reproduce it).

I also tested a small number of regular virtual stations and that
all seems good too.

Hopefully there is nothing that does a quick interface bounce and
actually wants the down/up state change logic to happen in hostap,
because if so, that work-around will probably break it.

Thanks,
Ben
Jouni Malinen Oct. 28, 2011, 7:06 p.m. UTC | #8
On Fri, Oct 28, 2011 at 11:37:17AM -0700, Ben Greear wrote:
> Hopefully there is nothing that does a quick interface bounce and
> actually wants the down/up state change logic to happen in hostap,
> because if so, that work-around will probably break it.

I'm perfectly fine with breaking such a hack ;-).
Johannes Berg Nov. 2, 2011, 10:22 a.m. UTC | #9
On Fri, 2011-10-28 at 09:09 -0700, Ben Greear wrote:
> On 10/28/2011 08:37 AM, Johannes Berg wrote:
> > From: Johannes Berg<johannes.berg@intel.com>
> >
> > This reverts commit 36d84860bbe09641f782fcc21b09e5a6952b4629.
> >
> > This commit completely broke P2P operation.
> >
> > ---
> > I'm very unhappy now as I just spent a long time tracking this down.
> >
> > TEST YOUR PATCHES!
> 
> I did test it, but not with p2p (and bundled with the other pending
> global-netlink patches).  Any idea what is why
> this breaks p2p?  Are there any directions anywhere for how
> to set up a p2p environment for testing?

I apologies for the rash email. I realise you can't possibly test every
combination of things, it just seemed a very common thing to me though I
guess you don't do P2P at all.

I see Jouni has fixed this -- essentially what I found later is what
broke it is we would before this patch not have listened to netlink
events about that interface, now we did before and got a DOWN event
while creating it.

johannes
Ben Greear Nov. 2, 2011, 3:48 p.m. UTC | #10
On 11/02/2011 03:22 AM, Johannes Berg wrote:
> On Fri, 2011-10-28 at 09:09 -0700, Ben Greear wrote:
>> On 10/28/2011 08:37 AM, Johannes Berg wrote:
>>> From: Johannes Berg<johannes.berg@intel.com>
>>>
>>> This reverts commit 36d84860bbe09641f782fcc21b09e5a6952b4629.
>>>
>>> This commit completely broke P2P operation.
>>>
>>> ---
>>> I'm very unhappy now as I just spent a long time tracking this down.
>>>
>>> TEST YOUR PATCHES!
>>
>> I did test it, but not with p2p (and bundled with the other pending
>> global-netlink patches).  Any idea what is why
>> this breaks p2p?  Are there any directions anywhere for how
>> to set up a p2p environment for testing?
>
> I apologies for the rash email. I realise you can't possibly test every
> combination of things, it just seemed a very common thing to me though I
> guess you don't do P2P at all.
>
> I see Jouni has fixed this -- essentially what I found later is what
> broke it is we would before this patch not have listened to netlink
> events about that interface, now we did before and got a DOWN event
> while creating it.

No problem.  I certainly understand the frustration!

I've had similar issues with netlink in my own code, and my
solution was to have an explicit 'slurpNetlinkSocket()' call
that read all events.  So, you could do something like:

set interface down
slurp
set interface up

Then you can dodge a lot of the transient down/up events.  But, it's
still a mess (in my code at least)...

The socket is also slurped as part of the normal select/poll loop
as well of course.

Thanks,
Ben
diff mbox

Patch

diff --git a/src/drivers/driver_nl80211.c b/src/drivers/driver_nl80211.c
index d962ac9..239f4b4 100644
--- a/src/drivers/driver_nl80211.c
+++ b/src/drivers/driver_nl80211.c
@@ -167,7 +167,6 @@  static void nl_destroy_handles(struct nl80211_handles *handles)
 struct nl80211_global {
 	struct dl_list interfaces;
 	int if_add_ifindex;
-	struct netlink_data *netlink;
 	struct nl_cb *nl_cb;
 	struct nl80211_handles nl;
 	struct genl_family *nl80211;
@@ -194,6 +193,7 @@  struct wpa_driver_nl80211_data {
 	u8 addr[ETH_ALEN];
 	char phyname[32];
 	void *ctx;
+	struct netlink_data *netlink;
 	int ifindex;
 	int if_removed;
 	int if_disabled;
@@ -561,32 +561,17 @@  static int wpa_driver_nl80211_own_ifindex(struct wpa_driver_nl80211_data *drv,
 }
 
 
-static struct wpa_driver_nl80211_data *
-nl80211_find_drv(struct nl80211_global *global, int idx, u8 *buf, size_t len)
-{
-	struct wpa_driver_nl80211_data *drv;
-	dl_list_for_each(drv, &global->interfaces,
-			 struct wpa_driver_nl80211_data, list) {
-		if (wpa_driver_nl80211_own_ifindex(drv, idx, buf, len) ||
-		    have_ifidx(drv, idx))
-			return drv;
-	}
-	return NULL;
-}
-
-
 static void wpa_driver_nl80211_event_rtm_newlink(void *ctx,
 						 struct ifinfomsg *ifi,
 						 u8 *buf, size_t len)
 {
-	struct nl80211_global *global = ctx;
-	struct wpa_driver_nl80211_data *drv;
+	struct wpa_driver_nl80211_data *drv = ctx;
 	int attrlen, rta_len;
 	struct rtattr *attr;
 	u32 brid = 0;
 
-	drv = nl80211_find_drv(global, ifi->ifi_index, buf, len);
-	if (!drv) {
+	if (!wpa_driver_nl80211_own_ifindex(drv, ifi->ifi_index, buf, len) &&
+	    !have_ifidx(drv, ifi->ifi_index)) {
 		wpa_printf(MSG_DEBUG, "nl80211: Ignore event for foreign "
 			   "ifindex %d", ifi->ifi_index);
 		return;
@@ -628,7 +613,7 @@  static void wpa_driver_nl80211_event_rtm_newlink(void *ctx,
 	if (drv->operstate == 1 &&
 	    (ifi->ifi_flags & (IFF_LOWER_UP | IFF_DORMANT)) == IFF_LOWER_UP &&
 	    !(ifi->ifi_flags & IFF_RUNNING))
-		netlink_send_oper_ifla(drv->global->netlink, drv->ifindex,
+		netlink_send_oper_ifla(drv->netlink, drv->ifindex,
 				       -1, IF_OPER_UP);
 
 	attrlen = len;
@@ -660,19 +645,11 @@  static void wpa_driver_nl80211_event_rtm_dellink(void *ctx,
 						 struct ifinfomsg *ifi,
 						 u8 *buf, size_t len)
 {
-	struct nl80211_global *global = ctx;
-	struct wpa_driver_nl80211_data *drv;
+	struct wpa_driver_nl80211_data *drv = ctx;
 	int attrlen, rta_len;
 	struct rtattr *attr;
 	u32 brid = 0;
 
-	drv = nl80211_find_drv(global, ifi->ifi_index, buf, len);
-	if (!drv) {
-		wpa_printf(MSG_DEBUG, "nl80211: Ignore dellink event for "
-			   "foreign ifindex %d", ifi->ifi_index);
-		return;
-	}
-
 	attrlen = len;
 	attr = (struct rtattr *) buf;
 
@@ -2185,6 +2162,7 @@  static void * wpa_driver_nl80211_init(void *ctx, const char *ifname,
 				      void *global_priv)
 {
 	struct wpa_driver_nl80211_data *drv;
+	struct netlink_config *cfg;
 	struct rfkill_config *rcfg;
 	struct i802_bss *bss;
 
@@ -2207,6 +2185,18 @@  static void * wpa_driver_nl80211_init(void *ctx, const char *ifname,
 
 	nl80211_get_phy_name(drv);
 
+	cfg = os_zalloc(sizeof(*cfg));
+	if (cfg == NULL)
+		goto failed;
+	cfg->ctx = drv;
+	cfg->newlink_cb = wpa_driver_nl80211_event_rtm_newlink;
+	cfg->dellink_cb = wpa_driver_nl80211_event_rtm_dellink;
+	drv->netlink = netlink_init(cfg);
+	if (drv->netlink == NULL) {
+		os_free(cfg);
+		goto failed;
+	}
+
 	rcfg = os_zalloc(sizeof(*rcfg));
 	if (rcfg == NULL)
 		goto failed;
@@ -2384,7 +2374,7 @@  wpa_driver_nl80211_finish_drv_init(struct wpa_driver_nl80211_data *drv)
 		}
 	}
 
-	netlink_send_oper_ifla(drv->global->netlink, drv->ifindex,
+	netlink_send_oper_ifla(drv->netlink, drv->ifindex,
 			       1, IF_OPER_DORMANT);
 #endif /* HOSTAPD */
 
@@ -2490,8 +2480,8 @@  static void wpa_driver_nl80211_deinit(void *priv)
 	if (drv->disable_11b_rates)
 		nl80211_disable_11b_rates(drv, drv->ifindex, 0);
 
-	netlink_send_oper_ifla(drv->global->netlink, drv->ifindex, 0,
-			       IF_OPER_UP);
+	netlink_send_oper_ifla(drv->netlink, drv->ifindex, 0, IF_OPER_UP);
+	netlink_deinit(drv->netlink);
 	rfkill_deinit(drv->rfkill);
 
 	eloop_cancel_timeout(wpa_driver_nl80211_scan_timeout, drv, drv->ctx);
@@ -5787,7 +5777,7 @@  static int wpa_driver_nl80211_set_operstate(void *priv, int state)
 	wpa_printf(MSG_DEBUG, "%s: operstate %d->%d (%s)",
 		   __func__, drv->operstate, state, state ? "UP" : "DORMANT");
 	drv->operstate = state;
-	return netlink_send_oper_ifla(drv->global->netlink, drv->ifindex, -1,
+	return netlink_send_oper_ifla(drv->netlink, drv->ifindex, -1,
 				      state ? IF_OPER_UP : IF_OPER_DORMANT);
 }
 
@@ -7167,8 +7157,6 @@  static int nl80211_set_param(void *priv, const char *param)
 static void * nl80211_global_init(void)
 {
 	struct nl80211_global *global;
-	struct netlink_config *cfg;
-
 	global = os_zalloc(sizeof(*global));
 	if (global == NULL)
 		return NULL;
@@ -7176,19 +7164,6 @@  static void * nl80211_global_init(void)
 	dl_list_init(&global->interfaces);
 	global->if_add_ifindex = -1;
 
-	cfg = os_zalloc(sizeof(*cfg));
-	if (cfg == NULL)
-		goto err;
-
-	cfg->ctx = global;
-	cfg->newlink_cb = wpa_driver_nl80211_event_rtm_newlink;
-	cfg->dellink_cb = wpa_driver_nl80211_event_rtm_dellink;
-	global->netlink = netlink_init(cfg);
-	if (global->netlink == NULL) {
-		os_free(cfg);
-		goto err;
-	}
-
 	if (wpa_driver_nl80211_init_nl_global(global) < 0)
 		goto err;
 
@@ -7217,9 +7192,6 @@  static void nl80211_global_deinit(void *priv)
 			   dl_list_len(&global->interfaces));
 	}
 
-	if (global->netlink)
-		netlink_deinit(global->netlink);
-
 	if (global->nl80211)
 		genl_family_put(global->nl80211);
 	nl_destroy_handles(&global->nl);