diff mbox

[RFC] isdn/capi: fix up CAPI subsystem workaround locking a bit

Message ID 20091003120657.2228911186C@xenon.ts.pxnet.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Tilman Schmidt Oct. 3, 2009, 12:06 p.m. UTC
Move calls to handle_minor_send() and handle_minor_recv() out of
the sections locked by workaround_lock.
- handle_minor_send() may call another CAPI function via the card
  driver, deadlocking by trying to take workaround_lock again.
- handle_minor_recv() calls the receive_buf method of the active
  line discipline which may sleep.

This fixes Bugzilla entries 11687 and 14305 but may reenlarge the
window of vulnerability for the races that were not-quite-fixed by
commit 053b47ff249b9e0a634dae807f81465205e7c228. To avoid one
specific race, read the mp->tty member of the capiminor structure
only once in handle_recv_skb().

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
---
I wasn't able to get any information on the nature of the problem fixed
by commit 053b47ff249b9e0a634dae807f81465205e7c228 from its author, nor
did my search of the LKML archives yield anything on it, so I went for
a minimally invasive approach. It works on my test machine, but a
complete overhaul of locking in capi.ko would of course be better.

 drivers/isdn/capi/capi.c |   33 +++++++++++++++++++--------------
 1 files changed, 19 insertions(+), 14 deletions(-)

Comments

Michael Buesch Oct. 3, 2009, 6:26 p.m. UTC | #1
On Saturday 03 October 2009 14:06:57 Tilman Schmidt wrote:
> Move calls to handle_minor_send() and handle_minor_recv() out of
> the sections locked by workaround_lock.
> - handle_minor_send() may call another CAPI function via the card
>   driver, deadlocking by trying to take workaround_lock again.
> - handle_minor_recv() calls the receive_buf method of the active
>   line discipline which may sleep.

I remember that handle_minor_send() and/or handle_minor_recv() showed up
in the crash backtraces. So if you move them out of the critical
section, you can as well remove the lock completely.
Michael Buesch Oct. 3, 2009, 6:35 p.m. UTC | #2
On Saturday 03 October 2009 20:26:22 Michael Buesch wrote:
> On Saturday 03 October 2009 14:06:57 Tilman Schmidt wrote:
> > Move calls to handle_minor_send() and handle_minor_recv() out of
> > the sections locked by workaround_lock.
> > - handle_minor_send() may call another CAPI function via the card
> >   driver, deadlocking by trying to take workaround_lock again.
> > - handle_minor_recv() calls the receive_buf method of the active
> >   line discipline which may sleep.
> 
> I remember that handle_minor_send() and/or handle_minor_recv() showed up
> in the crash backtraces. So if you move them out of the critical
> section, you can as well remove the lock completely.
> 

here's my original mail:
http://lkml.indiana.edu/hypermail/linux/kernel/0605.0/0455.html

Note the patch in that mail does _not_ fix the issue, as it turned out later.
Then I did the workaround-lock patch, which _did_ fix it.

But in that mail you can see the original crash backtraces, which might be
useful for finding out what _really_ happened.
Tilman Schmidt Oct. 5, 2009, 11:42 a.m. UTC | #3
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, 2009-10-03 20:35:19 +0200, Michael Buesch wrote:

>> I remember that handle_minor_send() and/or handle_minor_recv() showed up
>> in the crash backtraces. So if you move them out of the critical
>> section, you can as well remove the lock completely.
> 
> here's my original mail:
> http://lkml.indiana.edu/hypermail/linux/kernel/0605.0/0455.html
> 
> Note the patch in that mail does _not_ fix the issue, as it turned out later.
> Then I did the workaround-lock patch, which _did_ fix it.

Thanks for the info. So do I understand correctly that after:

commit 6aa65472d18703064898eefb5eb58f7ecd0d8912
Author: Michael Buesch <mb@bu3sch.de>
Date:   Mon Jun 26 00:25:30 2006 -0700

    [PATCH] CAPI crash / race condition

you were actually still seeing LIST_POISON2 Oopses in
capiminor_del_ack(), but after:

commit 053b47ff249b9e0a634dae807f81465205e7c228
Author: Michael Buesch <mb@bu3sch.de>
Date:   Mon Feb 12 00:53:26 2007 -0800

    [PATCH] Workaround CAPI subsystem locking issue

they were gone? That's interesting. I'll try to wrap my mind around
this. capiminor_del_ack() isn't that big, after all.

It's unfortunate that these crashes only seem to occur with one specific
device (FritzCard DSL) which I don't have. Can anyone shed some light on
what that device is doing differently from other ISDN cards?

Thanks,
Tilman

- --
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFKydulQ3+did9BuFsRAsYVAJ9e5azOPmNycMHZM6onfsDIY21a5wCffNh5
14WY2tYjla7wmmVgCHe+qDo=
=BgmY
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Buesch Oct. 5, 2009, 9:24 p.m. UTC | #4
On Monday 05 October 2009 13:42:29 Tilman Schmidt wrote:
> On Sat, 2009-10-03 20:35:19 +0200, Michael Buesch wrote:
> 
> >> I remember that handle_minor_send() and/or handle_minor_recv() showed up
> >> in the crash backtraces. So if you move them out of the critical
> >> section, you can as well remove the lock completely.
> >
> > here's my original mail:
> > http://lkml.indiana.edu/hypermail/linux/kernel/0605.0/0455.html
> >
> > Note the patch in that mail does _not_ fix the issue, as it turned out later.
> > Then I did the workaround-lock patch, which _did_ fix it.
> 
> Thanks for the info. So do I understand correctly that after:
> 
> commit 6aa65472d18703064898eefb5eb58f7ecd0d8912
> Author: Michael Buesch <mb@bu3sch.de>
> Date:   Mon Jun 26 00:25:30 2006 -0700
> 
>     [PATCH] CAPI crash / race condition
> 
> you were actually still seeing LIST_POISON2 Oopses in
> capiminor_del_ack(), but after:

Yeah well. The oops with LIST_POISON was with a patch that converted the
datahandle_queue to struct list_head, but without the spinlock_t ackqlock added.
Then I added the spinlock_t ackqlock and it first seemed to fix the problem. (That
is the patch from the mail).
But it did only shrink the race window, so the crash did still happen, but less often.
The crash was only "fixed" with the workaround_lock patch (but _without_ any of the
ackqueue patches applied.)

> commit 053b47ff249b9e0a634dae807f81465205e7c228
> Author: Michael Buesch <mb@bu3sch.de>
> Date:   Mon Feb 12 00:53:26 2007 -0800
> 
>     [PATCH] Workaround CAPI subsystem locking issue
> 
> they were gone? That's interesting. I'll try to wrap my mind around
> this.

Yeah, this sledgehammer lock did fix the crash while leaving the old non-list-head
queue in place (it should still be there today).

> It's unfortunate that these crashes only seem to occur with one specific
> device (FritzCard DSL) which I don't have.

I still have the device somewhere. If you want to have it, I can blow off the
dust and send it to you. If you don't want it, I'll throw it away soon.
I'd really like to send it to you to get rid of it. ;)

> Can anyone shed some light on 
> what that device is doing differently from other ISDN cards?

Well, it's a combined ISDN/DSL card, but I never used the ISDN part. So the crash
happened while transferring data over the DSL link.
The vendor driver is closed source with an open wrapper (like nvidia). It's a pretty
crappy unmaintained piece of software, but it ran stable with some patches applied
to the driver and the workaround-lock patch to the capi stack.
Tilman Schmidt Oct. 6, 2009, 5:52 p.m. UTC | #5
On Mon, 2009-10-05 23:24:07 +0200, Michael Buesch wrote:
> On Monday 05 October 2009 13:42:29 Tilman Schmidt wrote:

>> Thanks for the info. So do I understand correctly that after:
>>
>> commit 6aa65472d18703064898eefb5eb58f7ecd0d8912
>> Author: Michael Buesch <mb@bu3sch.de>
>> Date:   Mon Jun 26 00:25:30 2006 -0700
>>
>>     [PATCH] CAPI crash / race condition
>>
>> you were actually still seeing LIST_POISON2 Oopses in
>> capiminor_del_ack(), but after:
> 
> Yeah well. The oops with LIST_POISON was with a patch that converted the
> datahandle_queue to struct list_head, but without the spinlock_t ackqlock added.
> Then I added the spinlock_t ackqlock and it first seemed to fix the problem. (That
> is the patch from the mail).

Understood.

> But it did only shrink the race window, so the crash did still happen, but less often.

Ok.

> The crash was only "fixed" with the workaround_lock patch (but _without_ any of the
> ackqueue patches applied.)
> 
>> commit 053b47ff249b9e0a634dae807f81465205e7c228
>> Author: Michael Buesch <mb@bu3sch.de>
>> Date:   Mon Feb 12 00:53:26 2007 -0800
>>
>>     [PATCH] Workaround CAPI subsystem locking issue
>>
>> they were gone? That's interesting. I'll try to wrap my mind around
>> this.
> 
> Yeah, this sledgehammer lock did fix the crash while leaving the old non-list-head
> queue in place (it should still be there today).

That is not the case. In mainline, your second patch was applied on
top of the first one, so there is now struct list_head ackqueue and
spinlock_t ackqlock as well as static DEFINE_SPINLOCK(workaround_lock).

>> It's unfortunate that these crashes only seem to occur with one specific
>> device (FritzCard DSL) which I don't have.
> 
> I still have the device somewhere. If you want to have it, I can blow off the
> dust and send it to you. If you don't want it, I'll throw it away soon.
> I'd really like to send it to you to get rid of it. ;)

Feel free to do so. I'll send you my snailmail address by PM. I'm
making no promises about when I might get around to actually testing
it, though.

>> Can anyone shed some light on 
>> what that device is doing differently from other ISDN cards?
> 
> Well, it's a combined ISDN/DSL card, but I never used the ISDN part. So the crash
> happened while transferring data over the DSL link.

DSL over CAPI? Strange.

> The vendor driver is closed source with an open wrapper (like nvidia). It's a pretty
> crappy unmaintained piece of software, but it ran stable with some patches applied
> to the driver and the workaround-lock patch to the capi stack.

O dear. I wonder what I'll find.

Thanks,
Tilman
Michael Buesch Oct. 6, 2009, 9:01 p.m. UTC | #6
On Tuesday 06 October 2009 19:52:55 Tilman Schmidt wrote:
> > Yeah, this sledgehammer lock did fix the crash while leaving the old non-list-head
> > queue in place (it should still be there today).
> 
> That is not the case. In mainline, your second patch was applied on
> top of the first one, so there is now struct list_head ackqueue and
> spinlock_t ackqlock as well as static DEFINE_SPINLOCK(workaround_lock).

Ok, that's fine, too.
It's just that the ackqlock wasn't enough to fix it, as I said. It just shrinks the window.

> > I still have the device somewhere. If you want to have it, I can blow off the
> > dust and send it to you. If you don't want it, I'll throw it away soon.
> > I'd really like to send it to you to get rid of it. ;)
> 
> Feel free to do so. I'll send you my snailmail address by PM. I'm
> making no promises about when I might get around to actually testing
> it, though.

Thanks, I'll get it to the post office by the end of the week.

> >> Can anyone shed some light on 
> >> what that device is doing differently from other ISDN cards?
> > 
> > Well, it's a combined ISDN/DSL card, but I never used the ISDN part. So the crash
> > happened while transferring data over the DSL link.
> 
> DSL over CAPI? Strange.

Yeah, dunno. They did it that way... Pretty weird.

> > The vendor driver is closed source with an open wrapper (like nvidia). It's a pretty
> > crappy unmaintained piece of software, but it ran stable with some patches applied
> > to the driver and the workaround-lock patch to the capi stack.
> 
> O dear. I wonder what I'll find.

ftp://ftp.avm.de/cardware/fritzcrd.dsl/linux/suse.93/fcdsl-suse93-3.11-07.tar.gz

This is the driver that I used. It needs some modification to compile on a recent kernel.
Unfortunately I don't have the patch anymore. But it was rather trivial. It renamed some functions,
because they clashed with others and did some other things I don't remember.
Remember to get a bucket before looking at the code, because you will need it. ;)
diff mbox

Patch

diff --git a/drivers/isdn/capi/capi.c b/drivers/isdn/capi/capi.c
index 65bf91e..f348df2 100644
--- a/drivers/isdn/capi/capi.c
+++ b/drivers/isdn/capi/capi.c
@@ -452,18 +452,19 @@  static int handle_recv_skb(struct capiminor *mp, struct sk_buff *skb)
 	struct sk_buff *nskb;
 	int datalen;
 	u16 errcode, datahandle;
+	struct tty_struct *tty;
 	struct tty_ldisc *ld;
 	
 	datalen = skb->len - CAPIMSG_LEN(skb->data);
-	if (mp->tty == NULL)
-	{
+	tty = mp->tty;
+	if (tty == NULL) {
 #ifdef _DEBUG_DATAFLOW
 		printk(KERN_DEBUG "capi: currently no receiver\n");
 #endif
 		return -1;
 	}
 	
-	ld = tty_ldisc_ref(mp->tty);
+	ld = tty_ldisc_ref(tty);
 	if (ld == NULL)
 		return -1;
 	if (ld->ops->receive_buf == NULL) {
@@ -478,7 +479,7 @@  static int handle_recv_skb(struct capiminor *mp, struct sk_buff *skb)
 #endif
 		goto bad;
 	}
-	if (mp->tty->receive_room < datalen) {
+	if (tty->receive_room < datalen) {
 #if defined(_DEBUG_DATAFLOW) || defined(_DEBUG_TTYFUNCS)
 		printk(KERN_DEBUG "capi: no room in tty\n");
 #endif
@@ -501,7 +502,7 @@  static int handle_recv_skb(struct capiminor *mp, struct sk_buff *skb)
 	printk(KERN_DEBUG "capi: DATA_B3_RESP %u len=%d => ldisc\n",
 				datahandle, skb->len);
 #endif
-	ld->ops->receive_buf(mp->tty, skb->data, NULL, skb->len);
+	ld->ops->receive_buf(tty, skb->data, NULL, skb->len);
 	kfree_skb(skb);
 	tty_ldisc_deref(ld);
 	return 0;
@@ -653,7 +654,9 @@  static void capi_recv_message(struct capi20_appl *ap, struct sk_buff *skb)
 #endif
 		skb_queue_tail(&mp->inqueue, skb);
 		mp->inbytes += skb->len;
+		spin_unlock_irqrestore(&workaround_lock, flags);
 		handle_minor_recv(mp);
+		return;
 
 	} else if (CAPIMSG_SUBCOMMAND(skb->data) == CAPI_CONF) {
 
@@ -667,7 +670,9 @@  static void capi_recv_message(struct capi20_appl *ap, struct sk_buff *skb)
 		(void)capiminor_del_ack(mp, datahandle);
 		if (mp->tty)
 			tty_wakeup(mp->tty);
-		(void)handle_minor_send(mp);
+		spin_unlock_irqrestore(&workaround_lock, flags);
+		handle_minor_send(mp);
+		return;
 
 	} else {
 		/* ups, let capi application handle it :-) */
@@ -1042,8 +1047,8 @@  static int capinc_tty_open(struct tty_struct * tty, struct file * file)
 #ifdef _DEBUG_REFCOUNT
 	printk(KERN_DEBUG "capinc_tty_open ocount=%d\n", atomic_read(&mp->ttyopencount));
 #endif
-	handle_minor_recv(mp);
 	spin_unlock_irqrestore(&workaround_lock, flags);
+	handle_minor_recv(mp);
 	return 0;
 }
 
@@ -1110,9 +1115,9 @@  static int capinc_tty_write(struct tty_struct * tty,
 
 	skb_queue_tail(&mp->outqueue, skb);
 	mp->outbytes += skb->len;
-	(void)handle_minor_send(mp);
-	(void)handle_minor_recv(mp);
 	spin_unlock_irqrestore(&workaround_lock, flags);
+	handle_minor_send(mp);
+	handle_minor_recv(mp);
 	return count;
 }
 
@@ -1145,7 +1150,6 @@  static int capinc_tty_put_char(struct tty_struct *tty, unsigned char ch)
 		mp->ttyskb = NULL;
 		skb_queue_tail(&mp->outqueue, skb);
 		mp->outbytes += skb->len;
-		(void)handle_minor_send(mp);
 	}
 	skb = alloc_skb(CAPI_DATA_B3_REQ_LEN+CAPI_MAX_BLKSIZE, GFP_ATOMIC);
 	if (skb) {
@@ -1157,6 +1161,7 @@  static int capinc_tty_put_char(struct tty_struct *tty, unsigned char ch)
 		ret = 0;
 	}
 	spin_unlock_irqrestore(&workaround_lock, flags);
+	handle_minor_send(mp);
 	return ret;
 }
 
@@ -1183,10 +1188,10 @@  static void capinc_tty_flush_chars(struct tty_struct *tty)
 		mp->ttyskb = NULL;
 		skb_queue_tail(&mp->outqueue, skb);
 		mp->outbytes += skb->len;
-		(void)handle_minor_send(mp);
 	}
-	(void)handle_minor_recv(mp);
 	spin_unlock_irqrestore(&workaround_lock, flags);
+	handle_minor_send(mp);
+	handle_minor_recv(mp);
 }
 
 static int capinc_tty_write_room(struct tty_struct *tty)
@@ -1264,8 +1269,8 @@  static void capinc_tty_unthrottle(struct tty_struct * tty)
 	if (mp) {
 		spin_lock_irqsave(&workaround_lock, flags);
 		mp->ttyinstop = 0;
-		handle_minor_recv(mp);
 		spin_unlock_irqrestore(&workaround_lock, flags);
+		handle_minor_recv(mp);
 	}
 }
 
@@ -1290,8 +1295,8 @@  static void capinc_tty_start(struct tty_struct *tty)
 	if (mp) {
 		spin_lock_irqsave(&workaround_lock, flags);
 		mp->ttyoutstop = 0;
-		(void)handle_minor_send(mp);
 		spin_unlock_irqrestore(&workaround_lock, flags);
+		handle_minor_send(mp);
 	}
 }