Message ID | 1414657490-22551-1-git-send-email-lkundrak@v3.sk |
---|---|
State | Changes Requested |
Headers | show |
On Thu, Oct 30, 2014 at 09:24:50AM +0100, Lubomir Rintel wrote: > It might be that the underlying infrastrucutre went away and the state is no > longer valid. We ought to reinitialize it once a device appears again. > > This is the case when the nl80211 devices disappear and cfg8011 is remoed > afterwards. The netlink handle is no longer valid (returns ENOENT) and a new > one needs to be open if it's loaded back. This is not the way the global_init/global_deinit was originally meant to be used, i.e., the assumption was more like this getting initialized once and then remain available even if no interfaces are enabled, e.g., for get_interfaces() call to work. > diff --git a/wpa_supplicant/wpa_supplicant.c b/wpa_supplicant/wpa_supplicant.c > @@ -3925,6 +3925,25 @@ static int wpa_supplicant_init_iface(struct wpa_supplicant *wpa_s, > +/* Deinitialize the driver if we're the last user. */ > +static void wpa_drv_cleanup(struct wpa_supplicant *wpa_s) > +{ > + struct wpa_global *global = wpa_s->global; > + struct wpa_supplicant *iface; > + int i; > + > + for (iface = global->ifaces; iface; iface = iface->next) > + if (iface != wpa_s && iface->driver == wpa_s->driver) > + return; > + > + for (i = 0; wpa_drivers[i]; i++) > + if (global->drv_priv[i] == wpa_s->global_drv_priv) > + global->drv_priv[i] = NULL; > + > + if (wpa_s->driver->global_deinit) That would be a NULL pointer dereference if interface initialization fails early enough (wpa_s->driver == NULL in that case). In addition to this, there is something wrong in how the global deinit gets handled, i.e., at least one more place can trigger a SIGSEGV when running through mac80211_hwsim test cases. As such, I cannot really apply this until those issues have been found and fixed. Then again, I'm not sure whether this is really the best way of handling the issue which I'm assuming is the main reason for this patch, i.e., need to re-initialize some socket for the case of cfg80211 getting reloaded.
On Sat, 2014-11-01 at 17:08 +0200, Jouni Malinen wrote: > On Thu, Oct 30, 2014 at 09:24:50AM +0100, Lubomir Rintel wrote: > > It might be that the underlying infrastrucutre went away and the state is no > > longer valid. We ought to reinitialize it once a device appears again. > > > > This is the case when the nl80211 devices disappear and cfg8011 is remoed > > afterwards. The netlink handle is no longer valid (returns ENOENT) and a new > > one needs to be open if it's loaded back. > > This is not the way the global_init/global_deinit was originally meant > to be used, i.e., the assumption was more like this getting initialized > once and then remain available even if no interfaces are enabled, e.g., > for get_interfaces() call to work. I guess this is something specific to Windows NDIS drivers (only one that seems to implement get_interfaces). It seems to me that get_interfaces works whether the driver is initialized or not; maybe it shouldn't even be passwd the global_priv structure as it's unused anyway? > > diff --git a/wpa_supplicant/wpa_supplicant.c b/wpa_supplicant/wpa_supplicant.c > > @@ -3925,6 +3925,25 @@ static int wpa_supplicant_init_iface(struct wpa_supplicant *wpa_s, > > +/* Deinitialize the driver if we're the last user. */ > > +static void wpa_drv_cleanup(struct wpa_supplicant *wpa_s) > > +{ > > + struct wpa_global *global = wpa_s->global; > > + struct wpa_supplicant *iface; > > + int i; > > + > > + for (iface = global->ifaces; iface; iface = iface->next) > > + if (iface != wpa_s && iface->driver == wpa_s->driver) > > + return; > > + > > + for (i = 0; wpa_drivers[i]; i++) > > + if (global->drv_priv[i] == wpa_s->global_drv_priv) > > + global->drv_priv[i] = NULL; > > + > > + if (wpa_s->driver->global_deinit) > > That would be a NULL pointer dereference if interface initialization > fails early enough (wpa_s->driver == NULL in that case). I see. Will fix. > In addition to this, there is something wrong in how the global deinit > gets handled, i.e., at least one more place can trigger a SIGSEGV when > running through mac80211_hwsim test cases. I've run the test suite. It runs now for hours, but I noticed no crashes so far. However I see a lot of timeout and FAIL messages. I'm wondering if some tests are expected to fail in master tree, or I need to fix my setup: [lkundrak@fedora20-2 hwsim]$ ./run-all.sh START offchannel_tx_roc_gas 1/764 Exception: ANQP_GET command failed FAIL offchannel_tx_roc_gas 0.283542 2014-11-11 13:28:00.406416 ... START p2p_device_group_remove 420/764 Exception: [Errno 111] Connection refused FAIL p2p_device_group_remove 0.104766 2014-11-11 13:58:50.080920 ... > As such, I cannot really > apply this until those issues have been found and fixed. Then again, I'm > not sure whether this is really the best way of handling the issue which > I'm assuming is the main reason for this patch, i.e., need to > re-initialize some socket for the case of cfg80211 getting reloaded. Correct. What would be the other possible fix? Address that in kernel? Lubo
On Tue, Nov 11, 2014 at 02:05:53PM +0100, Lubomir Rintel wrote: > On Sat, 2014-11-01 at 17:08 +0200, Jouni Malinen wrote: > > This is not the way the global_init/global_deinit was originally meant > > to be used, i.e., the assumption was more like this getting initialized > > once and then remain available even if no interfaces are enabled, e.g., > > for get_interfaces() call to work. > > I guess this is something specific to Windows NDIS drivers (only one > that seems to implement get_interfaces). > > It seems to me that get_interfaces works whether the driver is > initialized or not; maybe it shouldn't even be passwd the global_priv > structure as it's unused anyway? This is not really used currently, but that does not change the reasons for the design. Global context was expected to be available and remain available even if driver interfaces were not in use. > I've run the test suite. It runs now for hours, but I noticed no crashes > so far. However I see a lot of timeout and FAIL messages. I'm wondering > if some tests are expected to fail in master tree, or I need to fix my > setup: All the test cases pass in my tests (or well, if running under heavy load, couple of test cases may fail out of about 800 every now and then). > [lkundrak@fedora20-2 hwsim]$ ./run-all.sh > START offchannel_tx_roc_gas 1/764 > Exception: ANQP_GET command failed > FAIL offchannel_tx_roc_gas 0.283542 2014-11-11 13:28:00.406416 > ... > START p2p_device_group_remove 420/764 > Exception: [Errno 111] Connection refused > FAIL p2p_device_group_remove 0.104766 2014-11-11 13:58:50.080920 > ... That "Connection refused" is one of the most common indication of hostapd or wpa_supplicant having crashed during the test.. > > As such, I cannot really > > apply this until those issues have been found and fixed. Then again, I'm > > not sure whether this is really the best way of handling the issue which > > I'm assuming is the main reason for this patch, i.e., need to > > re-initialize some socket for the case of cfg80211 getting reloaded. > > Correct. What would be the other possible fix? Address that in kernel? It could be enough to just re-initialize the netlink sockets in global context within driver_nl80211.c if they start showing errors that could indicate that cfg80211 has been reloaded.
On Sun, Nov 23, 2014 at 09:55:24PM +0200, Jouni Malinen wrote: > On Tue, Nov 11, 2014 at 02:05:53PM +0100, Lubomir Rintel wrote: > > Correct. What would be the other possible fix? Address that in kernel? > > It could be enough to just re-initialize the netlink sockets in global > context within driver_nl80211.c if they start showing errors that could > indicate that cfg80211 has been reloaded. Actually, it may be easier to simply always deinit the nl80211_global allocated resources when the last interface gets removed from nl80211_global::interfaces list at the end of wpa_driver_nl80211_deinit() (and then re-init them on next interface add in wpa_driver_nl80211_drv_init()). I'd assume this would be pretty close to what you were trying to do, but limited to be within driver_nl80211.c and as such, not affecting any other uses of the global driver wrapper context.
diff --git a/wpa_supplicant/wpa_supplicant.c b/wpa_supplicant/wpa_supplicant.c index 5a4d8dc..2099ef8 100644 --- a/wpa_supplicant/wpa_supplicant.c +++ b/wpa_supplicant/wpa_supplicant.c @@ -3925,6 +3925,25 @@ static int wpa_supplicant_init_iface(struct wpa_supplicant *wpa_s, return 0; } +/* Deinitialize the driver if we're the last user. */ +static void wpa_drv_cleanup(struct wpa_supplicant *wpa_s) +{ + struct wpa_global *global = wpa_s->global; + struct wpa_supplicant *iface; + int i; + + for (iface = global->ifaces; iface; iface = iface->next) + if (iface != wpa_s && iface->driver == wpa_s->driver) + return; + + for (i = 0; wpa_drivers[i]; i++) + if (global->drv_priv[i] == wpa_s->global_drv_priv) + global->drv_priv[i] = NULL; + + if (wpa_s->driver->global_deinit) + wpa_s->driver->global_deinit (wpa_s->global_drv_priv); + wpa_s->global_drv_priv = NULL; +} static void wpa_supplicant_deinit_iface(struct wpa_supplicant *wpa_s, int notify, int terminate) @@ -3967,6 +3986,8 @@ static void wpa_supplicant_deinit_iface(struct wpa_supplicant *wpa_s, if (wpa_s->drv_priv) wpa_drv_deinit(wpa_s); + wpa_drv_cleanup(wpa_s); + if (notify) wpas_notify_iface_removed(wpa_s);
It might be that the underlying infrastrucutre went away and the state is no longer valid. We ought to reinitialize it once a device appears again. This is the case when the nl80211 devices disappear and cfg8011 is remoed afterwards. The netlink handle is no longer valid (returns ENOENT) and a new one needs to be open if it's loaded back. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> --- Changes since v1: - wpa_s->global_drv_priv unset moved to correct place wpa_supplicant/wpa_supplicant.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)