diff mbox

gigaset: memory leak in gigaset_initcshw

Message ID CACT4Y+aNN8AAtRSvxxrP5RfsTZdd49odnQ_GRq86Y=jXr=pANw@mail.gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Dmitry Vyukov Feb. 5, 2016, 1:28 p.m. UTC
On Thu, Feb 4, 2016 at 4:06 PM, Paul Bolle <pebolle@tiscali.nl> wrote:
> On do, 2016-02-04 at 15:54 +0100, Dmitry Vyukov wrote:
>> One TIOCSETD is enough to trigger the leak.
>> I've tested with different line disciplines and only N_GIGASET_M101
>> triggers the leak.
>
> So things appear to be just on my plate now. I'll see what I can come up
> with. Feel free to prod me if I stay silent for too long.
>
> Thanks for narrowing things down!

> 0) It's way too late here. And I can't really reproduce, and too many
> things jumps to 100% when running your reproducer. Anyhow, this is all I
> came up with (for drivers/isdn/gigaset/common.c):
>
> @@ -427,7 +427,11 @@ exit:
>
>  static void free_cs(struct cardstate *cs)
>  {
> -       cs->flags = 0;
> +       unsigned long flags;
> +       struct gigaset_driver *drv = cs->driver;
> +       spin_lock_irqsave(&drv->lock, flags);
> +       cs->flags &= ~VALID_MINOR;
> +       spin_unlock_irqrestore(&drv->lock, flags);
>  }
>
>  static void make_valid(struct cardstate *cs, unsigned mask)
>
> 1) On my side the logs do seem more sensible, for what it's worth. But
> does this fix the leak?
>
> 2) Note that I fear your reproducer allows to DOS the box (even if the
> leak turns out to be fixed) so the mess might be getting much worse. I
> need a clear hear to think this through. Ie, I need some sleep.

No, it does not help.

I wonder why you don't see the leak I am seeing... are you suing qemu
or real hardware? I am using qemu.

I've added the following change:


and it does fire.
Can it be a case that free_cs() runs before gigaset_device_release()?
If that would happen, then cs can be reused while the previous
cs->hw.ser is not freed yet. Just a guess.

Comments

Paul Bolle Feb. 5, 2016, 4:06 p.m. UTC | #1
Hi Dmitry,

(If anyone is confused by this conversation: Dmitry replied to an off
list message.)

On vr, 2016-02-05 at 14:28 +0100, Dmitry Vyukov wrote:
> I wonder why you don't see the leak I am seeing...

So do I, for a few days now.

> are you suing qemu or real hardware? I am using qemu.

Real hardware (a ThinkPad). Probably less powerful that your VM.

What is the rate you're seeing leakage of a struct ser_cardstate? I'm
running your latest test at about 2.000 TIOCSETD's per second - which is
by itself not very useful for our driver - and notice no _obvious_
leakage when I do that for a few minutes. I do note the hardware
screaming to just keep up with the abuse, though.

> I've added the following change:
> 
> --- a/drivers/isdn/gigaset/ser-gigaset.c
> +++ b/drivers/isdn/gigaset/ser-gigaset.c
> @@ -396,6 +396,7 @@ static int gigaset_initcshw(struct cardstate *cs)
>                 pr_err("out of memory\n");
>                 return -ENOMEM;
>         }
> +       WARN_ON(cs->hw.ser != NULL);
>         cs->hw.ser = scs;
> 
>         cs->hw.ser->dev.name = GIGASET_MODULENAME;
> 
> and it does fire.
> Can it be a case that free_cs() runs before gigaset_device_release()?

gigaset_device_release() is the release operation that is run when our
struct device goes away. The core code is responsible for calling it, we
can't be certain when that will happen. At least, we should not expect
it to happen directly after calling platform_device_unregister().

(It was actually syzkaller that warned us that we did just that until
recently. See commit 4c5e354a9742 ("ser_gigaset: fix deallocation of
platform device structure").)

> If that would happen, then cs can be reused while the previous
> cs->hw.ser is not freed yet. Just a guess.

I'll have to ponder on that a bit, sorry.


Paul Bolle
Paul Bolle Feb. 11, 2016, 10:54 p.m. UTC | #2
Hi Dmitry,

On vr, 2016-02-05 at 17:06 +0100, Paul Bolle wrote:
> On vr, 2016-02-05 at 14:28 +0100, Dmitry Vyukov wrote:
> > I wonder why you don't see the leak I am seeing...
> 
> So do I, for a few days now.

0) I finally managed to reliably trigger this leak on an i686, single
core machine (yet another ThinkPad).

1) Note that on that machine the leak was noticeable under the kmalloc
-512 line (struct ser_cardstate is 456 bytes on that machine). I'm
_guessing_ the kmalloc-2048 line, which I stared at for quite some time,
is only relevant here for x86_64 and when there's a bit of
instrumentation, or whatever, added to the slab objects (as they are in
your VM?).

2) More important was that this i686 machine ran a tree that actually
included the offending commit:
	25cad69f21f5 ("base/platform: Fix platform drivers with no probe callback").

See, after staring at the gigaset code for way too long I decided to
just use brute force. Ie, I bisected this issue.

2) Anyhow, thanks again for the report. Now on to the next question: how
on earth does that commit make ser_gigaset leak struct ser_cardstate?

To be continued,


Paul Bolle
diff mbox

Patch

--- a/drivers/isdn/gigaset/ser-gigaset.c
+++ b/drivers/isdn/gigaset/ser-gigaset.c
@@ -396,6 +396,7 @@  static int gigaset_initcshw(struct cardstate *cs)
                pr_err("out of memory\n");
                return -ENOMEM;
        }
+       WARN_ON(cs->hw.ser != NULL);
        cs->hw.ser = scs;

        cs->hw.ser->dev.name = GIGASET_MODULENAME;