diff mbox

[4/7] hvc_console: Fix race between hvc_close and hvc_remove

Message ID 20100324121902.GJ15789@amit-x200.redhat.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Amit Shah March 24, 2010, 12:19 p.m. UTC
On (Sun) Mar 21 2010 [10:07:25], Amit Shah wrote:
> On (Sun) Mar 21 2010 [08:04:39], Benjamin Herrenschmidt wrote:
> > On Fri, 2010-03-19 at 08:18 -0700, Greg Kroah-Hartman wrote:
> > > From: Amit Shah <amit.shah@redhat.com>
> > > 
> > > Alan pointed out a race in the code where hvc_remove is invoked. The
> > > recent virtio_console work is the first user of hvc_remove().
> > 
> > This causes hangs during boot on pseries machines. Haven't had a chance
> > to track that down yet, but please revert
> > e74d098c66543d0731de62eb747ccd5b636a6f4c for now.
> 
> I tested this with the virtio-console driver with multiple consoles on
> Linux-2.6.34-rc2 inside a KVM VM. I didn't face any lockups / badness,
> but the following popped up once on -smp 4.
> 
> Could this be the same bug that's causing a lockup for you?
> 
> Since the virtio-console isn't the primary console on my VM, I didn't
> see a lockup but maybe powerpc uses this as the primary console and the
> lockup is a result of this?
> 
> I've not investigated this; I might get around to it next week, but
> CC'ing the sched people on this one.

I just looked at this:

> (My .config is after the oops.)
> 
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<ffffffff8101f4dc>] task_is_waking+0x1/0x1f
> PGD 3d261067 PUD 3d013067 PMD 0 
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> last sysfs file: /sys/devices/virtual/block/ram13/removable
> CPU 0 
> Modules linked in:
> 
> Pid: 573, comm: console_check Not tainted 2.6.34-rc2 #102 /Bochs
> RIP: 0010:[<ffffffff8101f4dc>]  [<ffffffff8101f4dc>]
> task_is_waking+0x1/0x1f
> RSP: 0018:ffff88003bdf5b48  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81646e30
> RDX: ffff88003bdf5b78 RSI: ffff88003bdf5ba0 RDI: 0000000000000000
> RBP: ffff88003bdf5b78 R08: 0000000000000000 R09: ffffffff81646e08
> R10: 0000000000000046 R11: 0000000000001130 R12: 00000000001d1d00
> R13: 0000000000000000 R14: ffff88003bdf5ba0 R15: 000000000000000f
> FS:  00007f330731b6f0(0000) GS:ffff880003800000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000003be78000 CR4: 00000000000006b0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process console_check (pid: 573, threadinfo ffff88003bdf4000, task
> ffff88003bc3a2d0)
> Stack:
>  ffff88003bdf5b78 ffffffff8102058e 0000000000000000 0000000000000000
> <0> 0000000000000000 0000000000000000 ffff88003bdf5bd8 ffffffff81026f03
> <0> ffff88003ead8cd8 ffff88003eb10490 ffff88003bdf5bd8 ffffffff8118cea9
> Call Trace:
>  [<ffffffff8102058e>] ? task_rq_lock+0x24/0x98
>  [<ffffffff81026f03>] try_to_wake_up+0x4b/0x33b
>  [<ffffffff8118cea9>] ? resize_console+0x25/0x95
>  [<ffffffff8102721f>] wake_up_process+0x10/0x12
>  [<ffffffff8118c18e>] hvc_kick+0x1a/0x1c
>  [<ffffffff8118cbb4>] hvc_open+0xf6/0x102

And this suggests that hvc_kick() is called before hvc_task is
initialised, ie, before hvc_init() is called.

Does this help?


		Amit

Comments

Anton Blanchard March 25, 2010, 11:30 p.m. UTC | #1
Hi,

> And this suggests that hvc_kick() is called before hvc_task is
> initialised, ie, before hvc_init() is called.
>
> Does this help?

Looks good, tests OK on my POWER5 box. Thanks!

Tested-by: Anton Blanchard <anton@samba.org>

> diff --git a/drivers/char/hvc_console.c b/drivers/char/hvc_console.c
> index ba55bba..50ac983 100644
> --- a/drivers/char/hvc_console.c
> +++ b/drivers/char/hvc_console.c
> @@ -285,6 +285,9 @@ EXPORT_SYMBOL_GPL(hvc_instantiate);
>  /* Wake the sleeping khvcd */
>  void hvc_kick(void)
>  {
> +       if (!hvc_task)
> +               return;
> +
>         hvc_kicked = 1;
>         wake_up_process(hvc_task);
>  }
> 
> 		Amit
Amit Shah March 26, 2010, 2:01 a.m. UTC | #2
On (Fri) Mar 26 2010 [10:30:31], Anton Blanchard wrote:
>  
> Hi,
> 
> > And this suggests that hvc_kick() is called before hvc_task is
> > initialised, ie, before hvc_init() is called.
> >
> > Does this help?
> 
> Looks good, tests OK on my POWER5 box. Thanks!
> 
> Tested-by: Anton Blanchard <anton@samba.org>

Thanks. Before I submit this patch, though, I'd like to see why
hvc_kick() gets called before hvc_init. If I can get to it real quick,
I'll do it, else I'll just queue this one up.

> > diff --git a/drivers/char/hvc_console.c b/drivers/char/hvc_console.c
> > index ba55bba..50ac983 100644
> > --- a/drivers/char/hvc_console.c
> > +++ b/drivers/char/hvc_console.c
> > @@ -285,6 +285,9 @@ EXPORT_SYMBOL_GPL(hvc_instantiate);
> >  /* Wake the sleeping khvcd */
> >  void hvc_kick(void)
> >  {
> > +       if (!hvc_task)
> > +               return;
> > +
> >         hvc_kicked = 1;
> >         wake_up_process(hvc_task);
> >  }

		Amit
Sachin P. Sant March 26, 2010, 9:13 a.m. UTC | #3
Amit Shah wrote:
> On (Sun) Mar 21 2010 [10:07:25], Amit Shah wrote:
>   
>> On (Sun) Mar 21 2010 [08:04:39], Benjamin Herrenschmidt wrote:
>>     
>>> On Fri, 2010-03-19 at 08:18 -0700, Greg Kroah-Hartman wrote:
>>>       
>>>> From: Amit Shah <amit.shah@redhat.com>
>>>>
>>>> Alan pointed out a race in the code where hvc_remove is invoked. The
>>>> recent virtio_console work is the first user of hvc_remove().
>>>>         
>>> This causes hangs during boot on pseries machines. Haven't had a chance
>>> to track that down yet, but please revert
>>> e74d098c66543d0731de62eb747ccd5b636a6f4c for now.
>>>       
<SNIP>

>
>
> And this suggests that hvc_kick() is called before hvc_task is
> initialised, ie, before hvc_init() is called.
>
> Does this help?
>   
Hi Amit,

I too ran into this issue as reported by Anton. Unfortunately in my
case the following suggested patch does not help boot the powerpc
box successfully. The boot still hangs with following messages

/dev/sda5: clean, 229739/2097152 files, 2485598/8377888 blocks (check in 4 mounts)
fsck succeeded. Mounting root device read-write.
Mounting root /dev/sda5
mount -o rw,acl,user_xattr -t ext3 /dev/sda5 /root
kjournald starting.  Commit interval 5 seconds
EXT3-fs (sda5): using internal journal
EXT3-fs (sda5): mounted filesystem with writeback data mode
mount: can't find /root/proc in /etc/fstab or /etc/mtab
/sbin/smart_agetty: can not determine 'console' speed


Just to confirm i too am facing same issue as reported by Anton,
reverted commit e74d098c66543d0731de62eb747ccd5b636a6f4c and that allowed
by box to boot successfully.

Thanks
-Sachin

> diff --git a/drivers/char/hvc_console.c b/drivers/char/hvc_console.c
> index ba55bba..50ac983 100644
> --- a/drivers/char/hvc_console.c
> +++ b/drivers/char/hvc_console.c
> @@ -285,6 +285,9 @@ EXPORT_SYMBOL_GPL(hvc_instantiate);
>  /* Wake the sleeping khvcd */
>  void hvc_kick(void)
>  {
> +       if (!hvc_task)
> +               return;
> +
>         hvc_kicked = 1;
>         wake_up_process(hvc_task);
>  }
>
> 		Amit
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
>
Amit Shah March 26, 2010, 9:58 a.m. UTC | #4
On (Fri) Mar 26 2010 [14:43:56], Sachin Sant wrote:
>> And this suggests that hvc_kick() is called before hvc_task is
>> initialised, ie, before hvc_init() is called.
>>
>> Does this help?
>>   
> Hi Amit,
>
> I too ran into this issue as reported by Anton. Unfortunately in my
> case the following suggested patch does not help boot the powerpc
> box successfully. The boot still hangs with following messages

Was the hang you saw before applying this patch at the same place / for
the same reason?

> /dev/sda5: clean, 229739/2097152 files, 2485598/8377888 blocks (check in 4 mounts)
> fsck succeeded. Mounting root device read-write.
> Mounting root /dev/sda5
> mount -o rw,acl,user_xattr -t ext3 /dev/sda5 /root
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs (sda5): using internal journal
> EXT3-fs (sda5): mounted filesystem with writeback data mode
> mount: can't find /root/proc in /etc/fstab or /etc/mtab
> /sbin/smart_agetty: can not determine 'console' speed

OK, this gets even murkier. I don't know why taking a few krefs modifies
hvc_console behaviour to such an extent.

> Just to confirm i too am facing same issue as reported by Anton,
> reverted commit e74d098c66543d0731de62eb747ccd5b636a6f4c and that allowed
> by box to boot successfully.

Which module uses hvc_console in your case?

		Amit
Stephen Rothwell March 26, 2010, 10:54 a.m. UTC | #5
Hi Amit,

On Fri, 26 Mar 2010 15:28:21 +0530 Amit Shah <amit.shah@redhat.com> wrote:
>
> On (Fri) Mar 26 2010 [14:43:56], Sachin Sant wrote:
> >
> > I too ran into this issue as reported by Anton. Unfortunately in my
> > case the following suggested patch does not help boot the powerpc
> > box successfully. The boot still hangs with following messages

I have been getting this on my linux-next boot tests for a while as well.

> Was the hang you saw before applying this patch at the same place / for
> the same reason?

I get the same hangs with and without the patch.  Paulus bisected it
(again) to the commit below and reverting that commit fixes the problem.

> OK, this gets even murkier. I don't know why taking a few krefs modifies
> hvc_console behaviour to such an extent.
> 
> > Just to confirm i too am facing same issue as reported by Anton,
> > reverted commit e74d098c66543d0731de62eb747ccd5b636a6f4c and that allowed
> > by box to boot successfully.
> 
> Which module uses hvc_console in your case?

I assume hcv_vio.

I also tested using "init=/bin/bash" as a kernel command line option and
the console worked while I was running bash, but everything was very
sluggish.  Then when I did "exec /sbin/init" the console hung completely
- sysrq (^O) didn't even work any more (it did while I was running bash).
Sachin P. Sant March 26, 2010, 11:42 a.m. UTC | #6
Amit Shah wrote:
> On (Fri) Mar 26 2010 [14:43:56], Sachin Sant wrote:
>   
>>> And this suggests that hvc_kick() is called before hvc_task is
>>> initialised, ie, before hvc_init() is called.
>>>
>>> Does this help?
>>>   
>>>       
>> Hi Amit,
>>
>> I too ran into this issue as reported by Anton. Unfortunately in my
>> case the following suggested patch does not help boot the powerpc
>> box successfully. The boot still hangs with following messages
>>     
>
> Was the hang you saw before applying this patch at the same place / for
> the same reason?
>   
Yes the machine hangs at the same place with or without the patch.

>> mount: can't find /root/proc in /etc/fstab or /etc/mtab
>> /sbin/smart_agetty: can not determine 'console' speed
>>     
>
> OK, this gets even murkier. I don't know why taking a few krefs modifies
> hvc_console behaviour to such an extent.
>   
Looking at the commit e74d098c66543d0731de62eb747ccd5b636a6f4c,
i see that for every tty_kref_get() there is a corresponding
tty_kref_put() except maybe for the one in the following patch snippet

        spin_lock_irqsave(&hp->lock, flags);
        /* Check and then increment for fast path open. */
        if (hp->count++ > 0) {
+               tty_kref_get(tty);
                spin_unlock_irqrestore(&hp->lock, flags);
                hvc_kick();
                return 0;

I don't know this code very well but we might be missing a
corresponding tty_kref_put() some place ?

If i comment out the above tty_kref_get() call, the machine
boots fine with or without the suggested patch.

Thanks
-Sachin


>   
>> Just to confirm i too am facing same issue as reported by Anton,
>> reverted commit e74d098c66543d0731de62eb747ccd5b636a6f4c and that allowed
>> by box to boot successfully.
>>     
>
> Which module uses hvc_console in your case?
>
> 		Amit
>
>
Alan Cox March 26, 2010, 11:52 a.m. UTC | #7
> >> mount: can't find /root/proc in /etc/fstab or /etc/mtab
> >> /sbin/smart_agetty: can not determine 'console' speed
> >>     
> >
> > OK, this gets even murkier. I don't know why taking a few krefs modifies
> > hvc_console behaviour to such an extent.

Can you see if the agetty is getting a hangup or open failing. I wonder
if its just a timing change ?

Alan
Amit Shah March 26, 2010, 12:43 p.m. UTC | #8
On (Fri) Mar 26 2010 [17:12:22], Sachin Sant wrote:
> Amit Shah wrote:
>> On (Fri) Mar 26 2010 [14:43:56], Sachin Sant wrote:
>>   
>>>> And this suggests that hvc_kick() is called before hvc_task is
>>>> initialised, ie, before hvc_init() is called.
>>>>
>>>> Does this help?
>>>>         
>>> Hi Amit,
>>>
>>> I too ran into this issue as reported by Anton. Unfortunately in my
>>> case the following suggested patch does not help boot the powerpc
>>> box successfully. The boot still hangs with following messages
>>>     
>>
>> Was the hang you saw before applying this patch at the same place / for
>> the same reason?
>>   
> Yes the machine hangs at the same place with or without the patch.

OK.

>>> mount: can't find /root/proc in /etc/fstab or /etc/mtab
>>> /sbin/smart_agetty: can not determine 'console' speed
>>>     
>>
>> OK, this gets even murkier. I don't know why taking a few krefs modifies
>> hvc_console behaviour to such an extent.
>>   
> Looking at the commit e74d098c66543d0731de62eb747ccd5b636a6f4c,
> i see that for every tty_kref_get() there is a corresponding
> tty_kref_put() except maybe for the one in the following patch snippet
>
>        spin_lock_irqsave(&hp->lock, flags);
>        /* Check and then increment for fast path open. */
>        if (hp->count++ > 0) {
> +               tty_kref_get(tty);
>                spin_unlock_irqrestore(&hp->lock, flags);
>                hvc_kick();
>                return 0;
>
> I don't know this code very well but we might be missing a
> corresponding tty_kref_put() some place ?

See hvc_hangup:

	temp_open_count = hp->count;
	...
	while(temp_open_count) {
		--temp_open_count;
		tty_kref_put(tty);
		kref_put(&hp->kref, destroy_hvc_struct);
	}

> If i comment out the above tty_kref_get() call, the machine
> boots fine with or without the suggested patch.

Hm, so as Alan suggests, it could be a timing issue in getting the ref
and returning back from hvc_open(). I don't know how the tty layer
works, but delays in getting the open() succeed results in badness?
Guess we'll have to poke at the other drivers to see how they handle
open() calls.

		Amit
Amit Shah March 26, 2010, 12:49 p.m. UTC | #9
On (Fri) Mar 26 2010 [11:52:33], Alan Cox wrote:
> > >> mount: can't find /root/proc in /etc/fstab or /etc/mtab
> > >> /sbin/smart_agetty: can not determine 'console' speed
> > >>     
> > >
> > > OK, this gets even murkier. I don't know why taking a few krefs modifies
> > > hvc_console behaviour to such an extent.
> 
> Can you see if the agetty is getting a hangup or open failing. I wonder
> if its just a timing change ?

Can someone please try this? Looks like it only reproduces on ppc, I
can't get it to reproduce in my kvm vm. I'm trying to get hold of a ppc
box, though. (btw ppc, ppc64, both should trigger this, right?)

		Amit
diff mbox

Patch

diff --git a/drivers/char/hvc_console.c b/drivers/char/hvc_console.c
index ba55bba..50ac983 100644
--- a/drivers/char/hvc_console.c
+++ b/drivers/char/hvc_console.c
@@ -285,6 +285,9 @@  EXPORT_SYMBOL_GPL(hvc_instantiate);
 /* Wake the sleeping khvcd */
 void hvc_kick(void)
 {
+       if (!hvc_task)
+               return;
+
        hvc_kicked = 1;
        wake_up_process(hvc_task);
 }