diff mbox

[RFC] Suspicious bug in module refcounting

Message ID 200902041418.09630.rusty@rustcorp.com.au
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Rusty Russell Feb. 4, 2009, 3:48 a.m. UTC
On Wednesday 04 February 2009 00:17:21 Karsten Keil wrote:
> The refcount is a per CPU atomic variable, module_refcount() simple add
> in a fully unprotected loop (not disabled irqs, not protected against
> scheduling) all per cpu values.

Hi Karsten,

   Yes, the BUG_ON() is overly aggressive.  And I really hate __module_get,
and it looks like most of the callers are completely bogus.  The watchdog
drivers use it to nail themselves in place in their open routines: this is
OK, if a bit weird.

   We should only use __module_get() when you *can't handle* failure;
otherwise you should accept that the admin did rmmod --wait and don't use the
module any further.

  dmaengine.c seems to be taking liberties like this.  AFAICT it can error
out, so why not just try_module_get() always?

  gameport.c, serio.c and input.c increment their own refcount, but to get
into those init functions someone must be holding a refcount already (ie. a
module depends on this module).  Ditto cyber2000fb.c, and MTD.

  mdio-bitbang.c should definitely use try_module_get.

  loop.c bumping its own refcount, Al might know why, but definitely can be
try_module_get() if it's valid at all.

  net/socket.c can also handle failure, so that's another try_module_get.

etc.

> I think we should replace all unprotected __module_get() calls with
> try_module_get(), or remove __module_get() completely.

Agreed.  We will need a "nail_module()" call for those legitimate uses (which
should clear mod->exit, rather than manipulating the refcount at all).

Meanwhile, I'll remove the BUG_ON for 2.6.29.

Thanks,
Rusty.

module: remove over-zealous check in __module_get()

module_refcount() isn't reliable outside stop_machine(), as demonstrated
by Karsten Keil <kkeil@suse.de>, networking can trigger it under load
(an inc on one cpu and dec on another while module_refcount() is tallying
 can give false results, for example).

Almost noone should be using __module_get, but that's another issue.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Russell King Feb. 4, 2009, 10:11 a.m. UTC | #1
On Wed, Feb 04, 2009 at 02:18:08PM +1030, Rusty Russell wrote:
>   gameport.c, serio.c and input.c increment their own refcount, but to get
> into those init functions someone must be holding a refcount already (ie. a
> module depends on this module).  Ditto cyber2000fb.c, and MTD.

Err, wrong.  cyber2000fb.c does it in its module initialization function
to prevent the module (when built for Shark) from being unloaded.  It
does this because it's from the days of 2.2 kernels and no one bothered
writing the module unload support for Shark.  I'm certainly not in a
position to do that.

Since you can't unload a module while its initialization function is
running, so someone else must be holding a refcount (the insmod process).

I'm not saying that it's the right solution, I'm saying that this is how
it's evolved.

If someone has an idea on what to do about it then patches will be given
due consideration.
Dan Williams Feb. 4, 2009, 4:33 p.m. UTC | #2
On Tue, Feb 3, 2009 at 8:48 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> On Wednesday 04 February 2009 00:17:21 Karsten Keil wrote:
>> The refcount is a per CPU atomic variable, module_refcount() simple add
>> in a fully unprotected loop (not disabled irqs, not protected against
>> scheduling) all per cpu values.
>
> Hi Karsten,
>
>   Yes, the BUG_ON() is overly aggressive.  And I really hate __module_get,
> and it looks like most of the callers are completely bogus.  The watchdog
> drivers use it to nail themselves in place in their open routines: this is
> OK, if a bit weird.
>
>   We should only use __module_get() when you *can't handle* failure;
> otherwise you should accept that the admin did rmmod --wait and don't use the
> module any further.
>
>  dmaengine.c seems to be taking liberties like this.  AFAICT it can error
> out, so why not just try_module_get() always?

Currently there is no feedback loop for clients calling
dmaengine_get().  It simply means "I want to do offload, pin any
offload resources you may see, and don't let the resource leave until
dmaengine_ref_count == 0".  Even if we always called try_module_get()
we would still need to wait until dmaengine_ref_count reached zero to
be sure no transactions are in flight, effectively ignoring module_get
failures.

However, dma-driver module removal is still in the central control of
the administrator as downing all network interfaces and unloading the
async_tx api (i.e. raid456) will kill all dmaengine references.  We
just have the caveat highlighted below:

modprobe raid456
ifup eth0
rmmod --wait ioat_dma &
ifup eth1
modprobe -r raid456
ifdown eth0  <-- module removal succeeds here in a perfect world
ifdown eth1 <-- module removal currently succeeds here

Regards,
Dan
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Karsten Keil Feb. 6, 2009, 10:41 p.m. UTC | #3
Hi Rusty,

On Wed, Feb 04, 2009 at 02:18:08PM +1030, Rusty Russell wrote:
> On Wednesday 04 February 2009 00:17:21 Karsten Keil wrote:
> > The refcount is a per CPU atomic variable, module_refcount() simple add
> > in a fully unprotected loop (not disabled irqs, not protected against
> > scheduling) all per cpu values.
> 
> Hi Karsten,
> 
>    Yes, the BUG_ON() is overly aggressive.  And I really hate __module_get,
> and it looks like most of the callers are completely bogus.  The watchdog
> drivers use it to nail themselves in place in their open routines: this is
> OK, if a bit weird.
> 

...
> 
> Meanwhile, I'll remove the BUG_ON for 2.6.29.
> 
> Thanks,
> Rusty.

Seems that this was not picked up yet for 2.6.29, but I think it really should
go in random triggering BUG() is not very nice, maybe it should also added to
the stable trees.

Can you please submit it again ?

> 
> module: remove over-zealous check in __module_get()
> 
> module_refcount() isn't reliable outside stop_machine(), as demonstrated
> by Karsten Keil <kkeil@suse.de>, networking can trigger it under load
> (an inc on one cpu and dec on another while module_refcount() is tallying
>  can give false results, for example).
> 
> Almost noone should be using __module_get, but that's another issue.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -407,7 +407,6 @@ static inline void __module_get(struct m
>  static inline void __module_get(struct module *module)
>  {
>  	if (module) {
> -		BUG_ON(module_refcount(module) == 0);
>  		local_inc(__module_ref_addr(module, get_cpu()));
>  		put_cpu();
>  	}
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
diff mbox

Patch

diff --git a/include/linux/module.h b/include/linux/module.h
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -407,7 +407,6 @@  static inline void __module_get(struct m
 static inline void __module_get(struct module *module)
 {
 	if (module) {
-		BUG_ON(module_refcount(module) == 0);
 		local_inc(__module_ref_addr(module, get_cpu()));
 		put_cpu();
 	}