Patchwork netconsole: queue console messages to send later

login
register
mail settings
Submitter Flavio Leitner
Date June 7, 2010, 7:24 p.m.
Message ID <1275938692-26997-1-git-send-email-fleitner@redhat.com>
Download mbox | patch
Permalink /patch/54890/
State Rejected
Delegated to: David Miller
Headers show

Comments

Flavio Leitner - June 7, 2010, 7:24 p.m.
There are some networking drivers that hold a lock in the
transmit path. Therefore, if a console message is printed
after that, netconsole will push it through the transmit path,
resulting in a deadlock.

This patch fixes the re-injection problem by queuing the console
messages in a preallocated circular buffer and then scheduling a
workqueue to send them later with another context.

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
---
 drivers/net/netconsole.c |  156 +++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 133 insertions(+), 23 deletions(-)
Matt Mackall - June 7, 2010, 7:50 p.m.
On Mon, 2010-06-07 at 16:24 -0300, Flavio Leitner wrote:
> There are some networking drivers that hold a lock in the
> transmit path. Therefore, if a console message is printed
> after that, netconsole will push it through the transmit path,
> resulting in a deadlock.

This is an ongoing pain we've known about since before introducing the
netpoll code to the tree.

My take has always been that any form of queueing is contrary to the
goal of netpoll: timely delivery of messages even during machine-killing
situations like oopses. There may never be a second chance to deliver
the message as the machine may be locked solid. And there may be no
other way to get the message out of the box in such situations. Adding
queueing is a throwing-the-baby-out-with-the-bathwater fix.

I think Dave agrees with me here, and I believe he's said in the past
that drivers trying to print messages in such contexts should be
considered buggy.
Stephen Hemminger - June 7, 2010, 8 p.m.
On Mon, 07 Jun 2010 14:50:48 -0500
Matt Mackall <mpm@selenic.com> wrote:

> On Mon, 2010-06-07 at 16:24 -0300, Flavio Leitner wrote:
> > There are some networking drivers that hold a lock in the
> > transmit path. Therefore, if a console message is printed
> > after that, netconsole will push it through the transmit path,
> > resulting in a deadlock.
> 
> This is an ongoing pain we've known about since before introducing the
> netpoll code to the tree.
> 
> My take has always been that any form of queueing is contrary to the
> goal of netpoll: timely delivery of messages even during machine-killing
> situations like oopses. There may never be a second chance to deliver
> the message as the machine may be locked solid. And there may be no
> other way to get the message out of the box in such situations. Adding
> queueing is a throwing-the-baby-out-with-the-bathwater fix.
> 
> I think Dave agrees with me here, and I believe he's said in the past
> that drivers trying to print messages in such contexts should be
> considered buggy.
> 

Because it to hard to fix all possible device configurations.
There should be any way to detect recursion and just drop the message to
avoid deadlock.
Matt Mackall - June 7, 2010, 8:21 p.m.
On Mon, 2010-06-07 at 13:00 -0700, Stephen Hemminger wrote:
> On Mon, 07 Jun 2010 14:50:48 -0500
> Matt Mackall <mpm@selenic.com> wrote:
> 
> > On Mon, 2010-06-07 at 16:24 -0300, Flavio Leitner wrote:
> > > There are some networking drivers that hold a lock in the
> > > transmit path. Therefore, if a console message is printed
> > > after that, netconsole will push it through the transmit path,
> > > resulting in a deadlock.
> > 
> > This is an ongoing pain we've known about since before introducing the
> > netpoll code to the tree.
> > 
> > My take has always been that any form of queueing is contrary to the
> > goal of netpoll: timely delivery of messages even during machine-killing
> > situations like oopses. There may never be a second chance to deliver
> > the message as the machine may be locked solid. And there may be no
> > other way to get the message out of the box in such situations. Adding
> > queueing is a throwing-the-baby-out-with-the-bathwater fix.
> > 
> > I think Dave agrees with me here, and I believe he's said in the past
> > that drivers trying to print messages in such contexts should be
> > considered buggy.
> > 
> 
> Because it to hard to fix all possible device configurations.
> There should be any way to detect recursion and just drop the message to
> avoid deadlock.

Open to suggestions. The locks in question are driver-internal. There
also may not be any actual recursion taking place:

driver path a takes private lock x
driver path a attempts printk
printk calls into netconsole
netconsole calls into driver path b
driver path b attempts to take lock x -> deadlock

So we can't even try to walk back the stack looking for such nonsense.
Though we could perhaps force queuing of all messages -from- the driver
bound to netconsole. Tricky, and not quite foolproof.
David Miller - June 7, 2010, 11:50 p.m.
From: Flavio Leitner <fleitner@redhat.com>
Date: Mon,  7 Jun 2010 16:24:52 -0300

> There are some networking drivers that hold a lock in the
> transmit path. Therefore, if a console message is printed
> after that, netconsole will push it through the transmit path,
> resulting in a deadlock.
> 
> This patch fixes the re-injection problem by queuing the console
> messages in a preallocated circular buffer and then scheduling a
> workqueue to send them later with another context.
> 
> Signed-off-by: Flavio Leitner <fleitner@redhat.com>

You absolutely and positively MUST NOT do this.  Otherwise netconsole
becomes completely useless.  Your idea has been proposed several times
as far back as 6 years ago, it was unacceptable then and it's
unacceptable now.

The whole point of netconsole is that we may be deep in an interrupt
or other atomic context, the machine is about to hard hang, and it's
absolutely essential that we get out any and all kernel logging
messages that we can, immediately.

There may not be another timer or workqueue able to execute after the
printk() we're trying to emit.  We may never get to that point.

So if we defer messages, that means we won't get the message and we
won't be able to debug the problem.

Fix the locking in the drivers or layers that cause the issue instead
of breaking netconsole.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - June 7, 2010, 11:52 p.m.
From: Matt Mackall <mpm@selenic.com>
Date: Mon, 07 Jun 2010 15:21:31 -0500

> Open to suggestions. The locks in question are driver-internal. There
> also may not be any actual recursion taking place:
> 
> driver path a takes private lock x
> driver path a attempts printk
> printk calls into netconsole
> netconsole calls into driver path b
> driver path b attempts to take lock x -> deadlock
> 
> So we can't even try to walk back the stack looking for such nonsense.
> Though we could perhaps force queuing of all messages -from- the driver
> bound to netconsole. Tricky, and not quite foolproof.

Look, this is all nonsense talk.

This is only coming about because of the recent discussions about
bonding, so let's fix bonding's locking.  I've made concrete
suggestions on converting it's rwlocks over to spinlocks and RCU to
fix the specific problem bonding has.

Every time we hit some new locking issue the knee jerk reaction is
to do something stupid to the generic netconsole code instead of
fixing the real source of the problem.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Flavio Leitner - June 8, 2010, 12:37 a.m.
On Mon, Jun 07, 2010 at 04:50:24PM -0700, David Miller wrote:
> From: Flavio Leitner <fleitner@redhat.com>
> Date: Mon,  7 Jun 2010 16:24:52 -0300
> 
> > There are some networking drivers that hold a lock in the
> > transmit path. Therefore, if a console message is printed
> > after that, netconsole will push it through the transmit path,
> > resulting in a deadlock.
> > 
> > This patch fixes the re-injection problem by queuing the console
> > messages in a preallocated circular buffer and then scheduling a
> > workqueue to send them later with another context.
> > 
> > Signed-off-by: Flavio Leitner <fleitner@redhat.com>
> 
> You absolutely and positively MUST NOT do this.  Otherwise netconsole
> becomes completely useless.  Your idea has been proposed several times
> as far back as 6 years ago, it was unacceptable then and it's
> unacceptable now.
> 
> The whole point of netconsole is that we may be deep in an interrupt
> or other atomic context, the machine is about to hard hang, and it's
> absolutely essential that we get out any and all kernel logging
> messages that we can, immediately.

Got it. I've never assumed that netconsole would work reliable on 
such situations, so I thought as we have better ways now it would
be helpful. See another idea below.

> There may not be another timer or workqueue able to execute after the
> printk() we're trying to emit.  We may never get to that point.

What if in the netpoll, before we push the skb to the driver, we check
for a bit saying that it's already pushing another skb. In this case,
queue the new skb inside of netpoll and soon as the first call returns
and try to clear the bit, it will send the next skb?

printk("message 1")
...
netconsole called
netpoll sets the flag bit
pushes to the bonding driver which does another printk("message 2")
netconsole called again
netpoll checks for the flag, queue the message, returns.
so, bonding can finish up to send the first message
netpoll is about to return, checks for new queued messages, and pushes them.
bonding finishes up to send the second message
....

No deadlocks, skbs are ordered and still under the same opportunity
to send something. Does it sound acceptable?
It's off the top of my head, so probably this idea has some problems.


> Fix the locking in the drivers or layers that cause the issue instead
> of breaking netconsole.

Someday, somewhere, I know because I did this before, someone will
use a debugging printk() and will see the entire box hanging with
absolutely no message in any console because of this problem. 
I'm not saying that fixing driver isn't the right way to go but
it seems not enough to me.
Amerigo Wang - June 8, 2010, 8:59 a.m.
Thanks for your fix, Flavio!

On 06/08/10 08:37, Flavio Leitner wrote:
>> There may not be another timer or workqueue able to execute after the
>> printk() we're trying to emit.  We may never get to that point.
>
> What if in the netpoll, before we push the skb to the driver, we check
> for a bit saying that it's already pushing another skb. In this case,
> queue the new skb inside of netpoll and soon as the first call returns
> and try to clear the bit, it will send the next skb?
>
> printk("message 1")
> ...
> netconsole called
> netpoll sets the flag bit
> pushes to the bonding driver which does another printk("message 2")
> netconsole called again
> netpoll checks for the flag, queue the message, returns.
> so, bonding can finish up to send the first message
> netpoll is about to return, checks for new queued messages, and pushes them.
> bonding finishes up to send the second message
> ....
>
> No deadlocks, skbs are ordered and still under the same opportunity
> to send something. Does it sound acceptable?
> It's off the top of my head, so probably this idea has some problems.
>


I am not a net expert, I am not sure if this solution really addresses
David's concern, but it makes sense for me.

>
>> Fix the locking in the drivers or layers that cause the issue instead
>> of breaking netconsole.
>
> Someday, somewhere, I know because I did this before, someone will
> use a debugging printk() and will see the entire box hanging with
> absolutely no message in any console because of this problem.
> I'm not saying that fixing driver isn't the right way to go but
> it seems not enough to me.

Well, I think netconsole is not alone, other console drivers could
have the same problem, printk() is not always available in some
situation like this.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index ca142c4..874376d 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -44,6 +44,8 @@ 
 #include <linux/netpoll.h>
 #include <linux/inet.h>
 #include <linux/configfs.h>
+#include <linux/workqueue.h>
+#include <linux/circ_buf.h>
 
 MODULE_AUTHOR("Maintainer: Matt Mackall <mpm@selenic.com>");
 MODULE_DESCRIPTION("Console driver for network interfaces");
@@ -56,6 +58,10 @@  static char config[MAX_PARAM_LENGTH];
 module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
 MODULE_PARM_DESC(netconsole, " netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]");
 
+static int logsize = PAGE_SIZE;
+module_param(logsize, int, 0444);
+MODULE_PARM_DESC(logsize, "netconsole buffer size");
+
 #ifndef	MODULE
 static int __init option_setup(char *opt)
 {
@@ -100,6 +106,75 @@  struct netconsole_target {
 	struct netpoll		np;
 };
 
+struct netconsole_msg_ctl {
+	struct circ_buf		messages;
+	unsigned long		ring_size;
+	struct page		*buffer_page;
+	struct work_struct 	tx_work;
+};
+static struct netconsole_msg_ctl *netconsole_ctl;
+
+#define RING_INC_POS(pos, inc, size) ((pos + inc) & (size - 1))
+
+static void netconsole_target_get(struct netconsole_target *nt);
+static void netconsole_target_put(struct netconsole_target *nt);
+
+static void netconsole_start_xmit(const char *msg, unsigned int length)
+{
+	int frag, left;
+	unsigned long flags;
+	struct netconsole_target *nt;
+	const char *tmp;
+
+	/* Avoid taking lock and disabling interrupts unnecessarily */
+	if (list_empty(&target_list))
+		return;
+
+	spin_lock_irqsave(&target_list_lock, flags);
+	list_for_each_entry(nt, &target_list, list) {
+		netconsole_target_get(nt);
+		if (nt->enabled && netif_running(nt->np.dev)) {
+			/*
+			 * We nest this inside the for-each-target loop above
+			 * so that we're able to get as much logging out to
+			 * at least one target if we die inside here, instead
+			 * of unnecessarily keeping all targets in lock-step.
+			 */
+			tmp = msg;
+			for (left = length; left;) {
+				frag = min(left, MAX_PRINT_CHUNK);
+				netpoll_send_udp(&nt->np, tmp, frag);
+				tmp += frag;
+				left -= frag;
+			}
+		}
+		netconsole_target_put(nt);
+	}
+	spin_unlock_irqrestore(&target_list_lock, flags);
+}
+
+static void netconsole_process_queue(struct work_struct *work)
+{
+	struct circ_buf *messages = &netconsole_ctl->messages;
+	unsigned long ring_size = netconsole_ctl->ring_size;
+	unsigned long head = ACCESS_ONCE(messages->head);
+	unsigned long len;
+
+	while (CIRC_CNT(head, messages->tail, ring_size) >= 1) {
+		/* read index before reading contents at that index */
+		smp_read_barrier_depends();
+
+		/* pick up a length that don't wrap in the middle */
+		len = CIRC_CNT_TO_END(head, messages->tail, ring_size);
+		netconsole_start_xmit(&messages->buf[messages->tail], len);
+
+		/* finish reading descriptor before incrementing tail */
+		smp_mb();
+		messages->tail = RING_INC_POS(messages->tail, len, ring_size);
+		head = ACCESS_ONCE(messages->head);
+	}
+}
+
 #ifdef	CONFIG_NETCONSOLE_DYNAMIC
 
 static struct configfs_subsystem netconsole_subsys;
@@ -702,38 +777,43 @@  static struct notifier_block netconsole_netdev_notifier = {
 	.notifier_call  = netconsole_netdev_event,
 };
 
+/* called with console sem, interrupts disabled */
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
-	int frag, left;
-	unsigned long flags;
-	struct netconsole_target *nt;
-	const char *tmp;
+	struct circ_buf *messages = &netconsole_ctl->messages;
+	unsigned long ring_size = netconsole_ctl->ring_size;
+	unsigned long tail = ACCESS_ONCE(messages->tail);
+	unsigned long left;
+	unsigned long end;
+	unsigned long pos;
 
 	/* Avoid taking lock and disabling interrupts unnecessarily */
 	if (list_empty(&target_list))
 		return;
 
-	spin_lock_irqsave(&target_list_lock, flags);
-	list_for_each_entry(nt, &target_list, list) {
-		netconsole_target_get(nt);
-		if (nt->enabled && netif_running(nt->np.dev)) {
-			/*
-			 * We nest this inside the for-each-target loop above
-			 * so that we're able to get as much logging out to
-			 * at least one target if we die inside here, instead
-			 * of unnecessarily keeping all targets in lock-step.
-			 */
-			tmp = msg;
-			for (left = len; left;) {
-				frag = min(left, MAX_PRINT_CHUNK);
-				netpoll_send_udp(&nt->np, tmp, frag);
-				tmp += frag;
-				left -= frag;
-			}
+	pos = 0;
+	left = len;
+	while (left && CIRC_SPACE(messages->head, tail, ring_size) >= 1) {
+		end = CIRC_SPACE_TO_END(messages->head, tail, ring_size);
+		/* fast path, no wrapping is needed */
+		if (end >= left) {
+			memcpy(&messages->buf[messages->head], &msg[pos], left);
+			smp_wmb(); 
+			messages->head = RING_INC_POS(messages->head, left, ring_size);
+			left = 0;
 		}
-		netconsole_target_put(nt);
+		else {
+			/* copy up to the end */
+			memcpy(&messages->buf[messages->head], &msg[pos], end);
+			smp_wmb(); 
+			messages->head = RING_INC_POS(messages->head, end, ring_size);
+			left -= end;
+			pos += end;
+		}
+
 	}
-	spin_unlock_irqrestore(&target_list_lock, flags);
+
+	schedule_work(&netconsole_ctl->tx_work);
 }
 
 static struct console netconsole = {
@@ -746,9 +826,25 @@  static int __init init_netconsole(void)
 {
 	int err;
 	struct netconsole_target *nt, *tmp;
+	struct circ_buf *messages;
 	unsigned long flags;
 	char *target_config;
 	char *input = config;
+	int order = get_order(logsize);
+
+	err = -ENOMEM;
+	netconsole_ctl = kzalloc(sizeof(*netconsole_ctl), GFP_KERNEL);
+	if (netconsole_ctl == NULL)
+		goto nomem;
+
+	netconsole_ctl->buffer_page = alloc_pages(GFP_KERNEL, order);
+	if (netconsole_ctl->buffer_page == NULL)
+		goto nopage;
+
+	netconsole_ctl->ring_size = (PAGE_SIZE << order);
+	messages = &netconsole_ctl->messages;
+	messages->buf = page_address(netconsole_ctl->buffer_page);
+	INIT_WORK(&netconsole_ctl->tx_work, netconsole_process_queue);
 
 	if (strnlen(input, MAX_PARAM_LENGTH)) {
 		while ((target_config = strsep(&input, ";"))) {
@@ -795,6 +891,11 @@  fail:
 		free_param_target(nt);
 	}
 
+	__free_pages(netconsole_ctl->buffer_page, order);
+nopage:
+	kfree(netconsole_ctl);
+
+nomem:
 	return err;
 }
 
@@ -806,6 +907,10 @@  static void __exit cleanup_netconsole(void)
 	dynamic_netconsole_exit();
 	unregister_netdevice_notifier(&netconsole_netdev_notifier);
 
+	flush_work(&netconsole_ctl->tx_work);
+	cancel_work_sync(&netconsole_ctl->tx_work);
+	netconsole_process_queue(NULL);
+
 	/*
 	 * Targets created via configfs pin references on our module
 	 * and would first be rmdir(2)'ed from userspace. We reach
@@ -818,6 +923,11 @@  static void __exit cleanup_netconsole(void)
 		list_del(&nt->list);
 		free_param_target(nt);
 	}
+
+	__free_pages(netconsole_ctl->buffer_page,
+			get_order(netconsole_ctl->ring_size));
+
+	kfree(netconsole_ctl);
 }
 
 module_init(init_netconsole);