diff mbox

netpoll: Fix carrier detection for drivers that are using phylib

Message ID alpine.LFD.2.01.0907081701050.3352@localhost.localdomain
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Linus Torvalds July 9, 2009, 12:01 a.m. UTC
On Thu, 9 Jul 2009, Anton Vorontsov wrote:
> 
> The netpoll code is using msleep() just a few lines below cond_resched(),
> so we won't make things worse. ;-)

Yeah. That function is definitely sleeping. It does things like 
kmalloc(GFP_KERNEL), rtnl_lock() and synchronize_rcu() etc too, so an 
added msleep() is the least of our problems.

Afaik, it's called from a bog-standard "module_init()", which happens late 
enough that everything works.

In fact, I wonder if we should set SYSTEM_RUNNING much earlier - _before_ 
doing the whole "do_initcalls()". By then we've set up all the core stuff 
and enabled interrupts, so we really _are_ running. We just don't 
necessarily have drivers, filesystems etc loaded yet. But anything that 
happens late enough to be an initcall should be largely considered to 
be during "normal code".

(The "early_initcall" cases are special - those really do happen pretty 
early).

So ACK on Anton's patch, but I wonder if we _also_ should do the 
following?

Looking at the people looking at SYSTEM_RUNNING, I do note some odd cases. 
Why the heck does kernel/perf_counter.c do it, for example?

		Linus

---
 init/main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller July 9, 2009, 3:08 a.m. UTC | #1
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 8 Jul 2009 17:01:14 -0700 (PDT)

> So ACK on Anton's patch,

I'll integrate it into my tree, thanks.

> but I wonder if we _also_ should do the following?

I think we should set SYSTEM_RUNNING earlier, makes sense
to me.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra July 9, 2009, 7:56 a.m. UTC | #2
On Wed, 2009-07-08 at 17:01 -0700, Linus Torvalds wrote:
> Looking at the people looking at SYSTEM_RUNNING, I do note some odd cases. 
> Why the heck does kernel/perf_counter.c do it, for example?

Ah, those are the swcounter and other probe entry points. I've had
several cases where we called into the perf counter code from those
points before it was initialized, getting in kernel segfaults due to
dereferencing uninitialized data etc..

I could keep a variable that tracked the perf_counter_init() state, and
use that instead if you prefer?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Mackall July 9, 2009, 12:56 p.m. UTC | #3
On Thu, 2009-07-09 at 09:56 +0200, Peter Zijlstra wrote:
> On Wed, 2009-07-08 at 17:01 -0700, Linus Torvalds wrote:
> > Looking at the people looking at SYSTEM_RUNNING, I do note some odd cases. 
> > Why the heck does kernel/perf_counter.c do it, for example?
> 
> Ah, those are the swcounter and other probe entry points. I've had
> several cases where we called into the perf counter code from those
> points before it was initialized, getting in kernel segfaults due to
> dereferencing uninitialized data etc..
> 
> I could keep a variable that tracked the perf_counter_init() state, and
> use that instead if you prefer?

Looks like that'd be more accurate. Linus's proposed patch might break
your current assumptions.
Matt Mackall July 9, 2009, 1:26 p.m. UTC | #4
On Wed, 2009-07-08 at 17:01 -0700, Linus Torvalds wrote:
> 
> On Thu, 9 Jul 2009, Anton Vorontsov wrote:
> > 
> > The netpoll code is using msleep() just a few lines below cond_resched(),
> > so we won't make things worse. ;-)
> 
> Yeah. That function is definitely sleeping. It does things like 
> kmalloc(GFP_KERNEL), rtnl_lock() and synchronize_rcu() etc too, so an 
> added msleep() is the least of our problems.
> 
> Afaik, it's called from a bog-standard "module_init()", which happens late 
> enough that everything works.
> 
> In fact, I wonder if we should set SYSTEM_RUNNING much earlier - _before_ 
> doing the whole "do_initcalls()".

Well there are two ways of consistently defining SYSTEM_RUNNING:

a) define it with reference to the well-understood notion of booting vs
running and don't switch it until handing off to init

b) define it with reference to its usage by an arbitrary user like
cond_resched()

In the latter case, we obviously need to move it to the earliest point
that scheduling is possible. But there are a number of things like 

http://lxr.linux.no/linux+v2.6.30/kernel/printk.c#L228

that assume the definition is actually (a). We're currently within a
couple lines of a strict definition of (a) already, so I actually think
cond_resched() is just wrong (and we already know it broke a
previously-working user). It should perhaps be using another private
flag that gets set as soon as scheduling is up and running.

But I'd actually go further and say that it's unfortunate to be checking
extra flags in such an important inline, especially since the check is
false for all but the first couple seconds of run time. Seems like we
could avoid adding an extra check by artificially elevating the preempt
count in early boot (or at compile time) then dropping it when
scheduling becomes available.
Peter Zijlstra July 9, 2009, 1:46 p.m. UTC | #5
On Thu, 2009-07-09 at 08:26 -0500, Matt Mackall wrote: 
> On Wed, 2009-07-08 at 17:01 -0700, Linus Torvalds wrote:
> > 
> > On Thu, 9 Jul 2009, Anton Vorontsov wrote:
> > > 
> > > The netpoll code is using msleep() just a few lines below cond_resched(),
> > > so we won't make things worse. ;-)
> > 
> > Yeah. That function is definitely sleeping. It does things like 
> > kmalloc(GFP_KERNEL), rtnl_lock() and synchronize_rcu() etc too, so an 
> > added msleep() is the least of our problems.
> > 
> > Afaik, it's called from a bog-standard "module_init()", which happens late 
> > enough that everything works.
> > 
> > In fact, I wonder if we should set SYSTEM_RUNNING much earlier - _before_ 
> > doing the whole "do_initcalls()".
> 
> Well there are two ways of consistently defining SYSTEM_RUNNING:
> 
> a) define it with reference to the well-understood notion of booting vs
> running and don't switch it until handing off to init

This makes the most sense IMHO.

> b) define it with reference to its usage by an arbitrary user like
> cond_resched()
> 
> In the latter case, we obviously need to move it to the earliest point
> that scheduling is possible. But there are a number of things like 
> 
> http://lxr.linux.no/linux+v2.6.30/kernel/printk.c#L228
> 
> that assume the definition is actually (a). We're currently within a
> couple lines of a strict definition of (a) already, so I actually think
> cond_resched() is just wrong (and we already know it broke a
> previously-working user). It should perhaps be using another private
> flag that gets set as soon as scheduling is up and running.

Right as mentioned before in this thread, we grew scheduler_running a
while back which could be used for this.

> But I'd actually go further and say that it's unfortunate to be checking
> extra flags in such an important inline, especially since the check is
> false for all but the first couple seconds of run time. Seems like we
> could avoid adding an extra check by artificially elevating the preempt
> count in early boot (or at compile time) then dropping it when
> scheduling becomes available.

Calling cond_resched() and co when !preemptable is an error so this
wouldn't actually work.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Mackall July 9, 2009, 2:18 p.m. UTC | #6
On Thu, 2009-07-09 at 15:46 +0200, Peter Zijlstra wrote:
> On Thu, 2009-07-09 at 08:26 -0500, Matt Mackall wrote: 
> > On Wed, 2009-07-08 at 17:01 -0700, Linus Torvalds wrote:
> > > 
> > > On Thu, 9 Jul 2009, Anton Vorontsov wrote:
> > > > 
> > > > The netpoll code is using msleep() just a few lines below cond_resched(),
> > > > so we won't make things worse. ;-)
> > > 
> > > Yeah. That function is definitely sleeping. It does things like 
> > > kmalloc(GFP_KERNEL), rtnl_lock() and synchronize_rcu() etc too, so an 
> > > added msleep() is the least of our problems.
> > > 
> > > Afaik, it's called from a bog-standard "module_init()", which happens late 
> > > enough that everything works.
> > > 
> > > In fact, I wonder if we should set SYSTEM_RUNNING much earlier - _before_ 
> > > doing the whole "do_initcalls()".
> > 
> > Well there are two ways of consistently defining SYSTEM_RUNNING:
> > 
> > a) define it with reference to the well-understood notion of booting vs
> > running and don't switch it until handing off to init
> 
> This makes the most sense IMHO.
> 
> > b) define it with reference to its usage by an arbitrary user like
> > cond_resched()
> > 
> > In the latter case, we obviously need to move it to the earliest point
> > that scheduling is possible. But there are a number of things like 
> > 
> > http://lxr.linux.no/linux+v2.6.30/kernel/printk.c#L228
> > 
> > that assume the definition is actually (a). We're currently within a
> > couple lines of a strict definition of (a) already, so I actually think
> > cond_resched() is just wrong (and we already know it broke a
> > previously-working user). It should perhaps be using another private
> > flag that gets set as soon as scheduling is up and running.
> 
> Right as mentioned before in this thread, we grew scheduler_running a
> while back which could be used for this.
> 
> > But I'd actually go further and say that it's unfortunate to be checking
> > extra flags in such an important inline, especially since the check is
> > false for all but the first couple seconds of run time. Seems like we
> > could avoid adding an extra check by artificially elevating the preempt
> > count in early boot (or at compile time) then dropping it when
> > scheduling becomes available.
> 
> Calling cond_resched() and co when !preemptable is an error so this
> wouldn't actually work.

Sorry if I was unclear. I'm suggesting setting the count so the existing
PREEMPT_ACTIVE test here fires:

int __sched _cond_resched(void)
{
        if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
                                        system_state == SYSTEM_RUNNING) {
                __cond_resched();
                return 1;
        }
        return 0;
}

That should be kosher.
diff mbox

Patch

diff --git a/init/main.c b/init/main.c
index 2c5ade7..f10d9cd 100644
--- a/init/main.c
+++ b/init/main.c
@@ -788,6 +788,7 @@  static void __init do_initcalls(void)
 {
 	initcall_t *call;
 
+	system_state = SYSTEM_RUNNING;
 	for (call = __early_initcall_end; call < __initcall_end; call++)
 		do_one_initcall(*call);
 
@@ -839,7 +840,6 @@  static noinline int init_post(void)
 	free_initmem();
 	unlock_kernel();
 	mark_rodata_ro();
-	system_state = SYSTEM_RUNNING;
 	numa_default_policy();
 
 	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)