diff mbox

Fix BBC I2C envctrl on SunBlade 2000

Message ID 201407271504.05258.cat.schulze@alice-dsl.net
State Changes Requested
Delegated to: David Miller
Headers show

Commit Message

Christopher Alexander Tobias Schulze July 27, 2014, 1:04 p.m. UTC
Boot bus I2C based temperature and fan control does not currently work 
on (at least) SunBlade 2000 systems. This is caused by the resource bbc_i2c_bussel
missing on the second BBC I2C bus the temp and fan sensors are attached to, only
bbc_i2c_regs is present. While older drivers (2.6.24 era) could cope well with that
situation, the present drivers refuse to attach. This seems to be a regression
caused by the conversion to pure OF drivers on 2008-08-30:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/sbus/char/bbc_i2c.c?id=e21e245bcd9d5244735799387d14421789b20557

(look for the check "if (edev->num_addrs == 2)")

This patch restores the old behavior by checking num_resources before trying to
remap the bbc_i2c_bussel resource. If num_resources != 2, only the bbc_i2c_regs 
resource will be mapped. (The env & temp driver in bbc_envctrl still works well
with i2c_bussel_reg set to NULL as it did before in older kernels.)

In addition, INIT_LIST_HEAD invocations are needed to set up bbc_cpu_temperature and
bbc_fan_control. The fact that these were missing was hidden by the fact that the
bbc_envctrl driver obviously did not load at all anymore on the systems where this
hardware exists due to the bbc_i2c_bussel problem as discussed above.

With these patches, the SunBlade 2000 seems to work fine.

The patch was originally developed against a 3.13 backport kernel from Debian.
Both the patch against 3.13 and a recent 3.16-rc6 are included below. Please note
that I could only test the 3.13 version as I do not have access to the affected
machine anymore.

PATCH 1 - KERNEL VERSION 3.13 #####################################################


Regards,
Alexander Schulze
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Christopher Alexander Tobias Schulze July 27, 2014, 3:31 p.m. UTC | #1
On the SunBlade 2000 system which was affected by the lockup bug as
discussed in my previous mail

	[PATCH] Installing invalid entries in TSB causes hard lockup on UltraSPARC III

we also experienced random segmentation faults in combination with
RSS counter warnings, which would show under similar circumstances
where the lockup bug hit (heavy disk I/O).

During testing the patch as described in my previous mail, I added
additional instrumentation to tsb_insert() to trigger in cases where
a TSB entry with PTE.VALID = 0 was to be installed. This instrumentation
dumped the TAG and PTE to the syslog, together with a stacktrace to show
the call chain, and was not included in the patch as presented in my previous
mail.

While the patch as presented in my previous mail works flawlessly (no more
lockups, no more segmentation faults and/or RSS counter errors, even under
heavy stress testing), we noticed that the instrumentation patch did not
prevent the segmentation fault and RSS counter error problems:

Jul 21 10:43:31 troi kernel: [  986.918478] sshd[4408]: segfault at 15fc ip 00000000f7453f7c (rpc 00000000f7453f18) sp 00000000fffc6760 error 30001 in libc-2.13.so[f7394000+172000]
Jul 21 10:43:31 troi kernel: [  987.291121] ------------[ cut here ]------------
Jul 21 10:43:31 troi kernel: [  987.352984] WARNING: CPU: 0 PID: 4408 at mm/mmap.c:2736 exit_mmap+0x138/0x160()
Jul 21 10:43:31 troi kernel: [  987.455605] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop snd_sun_cs4231 snd_pcm snd_page_alloc snd_timer snd soundcore ext4 crc16 mbcache 
Jul 21 10:43:31 troi kernel: [  987.977807] CPU: 0 PID: 4408 Comm: sshd Not tainted 3.13.10 #3
Jul 21 10:43:31 troi kernel: [  988.055697] Call Trace:
Jul 21 10:43:31 troi kernel: [  988.092701]  [0000000000521e58] exit_mmap+0x138/0x160
Jul 21 10:43:31 troi kernel: [  988.160972]  [000000000045be9c] mmput+0x5c/0x100
Jul 21 10:43:31 troi kernel: [  988.223853]  [000000000045fdbc] do_exit+0x21c/0x9a0
Jul 21 10:43:31 troi kernel: [  988.289832]  [00000000004605a4] do_group_exit+0x24/0xc0
Jul 21 10:43:31 troi kernel: [  988.359881]  [000000000046dc80] get_signal_to_deliver+0x220/0x560
Jul 21 10:43:31 troi kernel: [  988.440467]  [00000000004457f8] do_signal32+0x18/0xac0
Jul 21 10:43:31 troi kernel: [  988.509488]  [000000000042cc20] do_signal+0x2c0/0x520
Jul 21 10:43:31 troi kernel: [  988.577441]  [000000000042d680] do_notify_resume+0x40/0x60
Jul 21 10:43:31 troi kernel: [  988.650551]  [0000000000404ac4] __handle_signal+0xc/0x2c
Jul 21 10:43:31 troi kernel: [  988.721638] ---[ end trace 06eabdd105f65186 ]---
Jul 21 10:43:31 troi kernel: [  988.784414] BUG: Bad rss-counter state mm:fffffc003deae4a0 idx:1 val:3

The segmentation fault seems to occur in __fork() in GLIBC, shortly after the
fork syscall returned, and the value of 0x15fc seems to indicate that some registers
were corrupted (%g2 is set to the constant 0x15f8 in the two instructions preceding
the memory access that segfaults, and the code tries to fetch a value from
[%l7 + %g2], where %l7 was initialized before the fork syscall was performed). The
value of %rpc is also quite suspicious: It seems to be an address in __fork() itself,
where __fork() calls _IO_list_lock() (shortly before the fork syscall), so %rpc
seems to belong to the register window of _IO_list_lock() and not to the window that
_fork() should use.

The fact that this segmentation fault seems to be related to forking, that
installing TSB entries with VALID set to 0 was attempted in a code path
that involved forking, and the RSS count warning for the process where the
segfault occured makes me wonder whether this could be related to a
register window spill onto the userspace stack, perhaps causing a fault-in
of the stack page at a phase during forking where this might not be expected.

Note that dumping a stack trace also involves flushing all register windows
out to the stack, so the fact that we observed the segfault and RSS problem
only in combination with the instrumentation patch (but not with the patch
that just prevents invalid entries to be inserted into the TSB, with no further
actions) could also point in this direction.

Unfortunately, due to lack of direct access to an affected machine, I will not
be able to investigate any further. However, I hope that these observations might
help others to find and eliminate the cause of these segfaults and RSS counter
bugs, so that the affected UltraSPARC III system is usable again with newer
kernels. (Until then, my former colleague will be forced to run another SunBlade
under 2.6.24, which has been performing absolutely flawlessly, without any
such problems and with uptimes of > 1 year.)

Regards,
Alexander Schulze
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 28, 2014, 10:49 p.m. UTC | #2
From: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
Date: Sun, 27 Jul 2014 15:04:04 +0200

Thanks for all of your patches today.

Firstly, please format your Subject line correctly, it should
be of the form:

	[PATCH] ${subsystem}: Description.

Here subsystem could be "bbc-i2c: ".

> +       /* Need to initialize some fields of bbc_cpu_temperature */
> +       INIT_LIST_HEAD(&(tp->bp_list));
> +       INIT_LIST_HEAD(&(tp->glob_list));
> +

The innermost parenthesis are unnecessary, please remove them.

> +       /* Need to initialize some fields of bbc_fan_control */
> +       INIT_LIST_HEAD(&(fp->bp_list));
> +       INIT_LIST_HEAD(&(fp->glob_list));
> +

Likewise, and for the rest of your list initialization additions.

> -       bp->i2c_bussel_reg = of_ioremap(&op->resource[1], 0, 0x1, "bbc_i2c_bussel");
> -       if (!bp->i2c_bussel_reg)
> -               goto fail;
> +       if(op->num_resources == 2) {

Please put a space between "if" and the openning parenthesis of the
condition being tested.

> +       if(op->num_resources == 2) {

Likewise.

> +               bp->i2c_bussel_reg = of_ioremap(&op->resource[1], 0, 0x1, "bbc_i2c_bussel");
> +               if (!bp->i2c_bussel_reg) {
> +                       goto fail;
> +               }

Please do not use curly braces around single line basic blocks.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 28, 2014, 11:43 p.m. UTC | #3
From: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
Date: Sun, 27 Jul 2014 17:31:00 +0200

> Note that dumping a stack trace also involves flushing all register windows
> out to the stack, so the fact that we observed the segfault and RSS problem
> only in combination with the instrumentation patch (but not with the patch
> that just prevents invalid entries to be inserted into the TSB, with no further
> actions) could also point in this direction.

User register windows should be fully flushed to the stack before we try to
copy the address space, via the "flushw" done in the sys_vfork/sys_fork/sys_clone
assembler paths of arch/sparc/kernel/syscalls.S
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff -Naupr linux-source-3.13-orig/drivers/sbus/char/bbc_envctrl.c linux-source-3.13-patched/drivers/sbus/char/bbc_envctrl.c
--- linux-source-3.13-orig/drivers/sbus/char/bbc_envctrl.c      2014-04-14 15:48:24.000000000 +0200
+++ linux-source-3.13-patched/drivers/sbus/char/bbc_envctrl.c   2014-07-27 14:29:58.000000000 +0200
@@ -452,6 +452,10 @@  static void attach_one_temp(struct bbc_i
        if (!tp)
                return;
 
+       /* Need to initialize some fields of bbc_cpu_temperature */
+       INIT_LIST_HEAD(&(tp->bp_list));
+       INIT_LIST_HEAD(&(tp->glob_list));
+
        tp->client = bbc_i2c_attach(bp, op);
        if (!tp->client) {
                kfree(tp);
@@ -497,6 +501,10 @@  static void attach_one_fan(struct bbc_i2
        if (!fp)
                return;
 
+       /* Need to initialize some fields of bbc_fan_control */
+       INIT_LIST_HEAD(&(fp->bp_list));
+       INIT_LIST_HEAD(&(fp->glob_list));
+
        fp->client = bbc_i2c_attach(bp, op);
        if (!fp->client) {
                kfree(fp);
diff -Naupr linux-source-3.13-orig/drivers/sbus/char/bbc_i2c.c linux-source-3.13-patched/drivers/sbus/char/bbc_i2c.c
--- linux-source-3.13-orig/drivers/sbus/char/bbc_i2c.c  2014-04-14 15:48:24.000000000 +0200
+++ linux-source-3.13-patched/drivers/sbus/char/bbc_i2c.c       2014-07-27 14:29:58.000000000 +0200
@@ -301,13 +301,20 @@  static struct bbc_i2c_bus * attach_one_i
        if (!bp)
                return NULL;
 
+       /* Need to initialize some fields of bbc_i2c_bus */
+       INIT_LIST_HEAD(&(bp->temps));
+       INIT_LIST_HEAD(&(bp->fans));
+
        bp->i2c_control_regs = of_ioremap(&op->resource[0], 0, 0x2, "bbc_i2c_regs");
        if (!bp->i2c_control_regs)
                goto fail;
 
-       bp->i2c_bussel_reg = of_ioremap(&op->resource[1], 0, 0x1, "bbc_i2c_bussel");
-       if (!bp->i2c_bussel_reg)
-               goto fail;
+       if(op->num_resources == 2) {
+               bp->i2c_bussel_reg = of_ioremap(&op->resource[1], 0, 0x1, "bbc_i2c_bussel");
+               if (!bp->i2c_bussel_reg) {
+                       goto fail;
+               }
+       }
 
        bp->waiting = 0;
        init_waitqueue_head(&bp->wq);

PATCH 2 - KERNEL VERSION 3.16 #####################################################

diff -Naupr linux-3.16-rc6-orig/drivers/sbus/char/bbc_envctrl.c linux-3.16-rc6-patched/drivers/sbus/char/bbc_envctrl.c
--- linux-3.16-rc6-orig/drivers/sbus/char/bbc_envctrl.c 2014-07-27 11:49:21.000000000 +0200
+++ linux-3.16-rc6-patched/drivers/sbus/char/bbc_envctrl.c      2014-07-27 14:28:12.000000000 +0200
@@ -452,6 +452,10 @@  static void attach_one_temp(struct bbc_i
        if (!tp)
                return;
 
+       /* Need to initialize some fields of bbc_cpu_temperature */
+       INIT_LIST_HEAD(&(tp->bp_list));
+       INIT_LIST_HEAD(&(tp->glob_list));
+
        tp->client = bbc_i2c_attach(bp, op);
        if (!tp->client) {
                kfree(tp);
@@ -497,6 +501,10 @@  static void attach_one_fan(struct bbc_i2
        if (!fp)
                return;
 
+       /* Need to initialize some fields of bbc_fan_control */
+       INIT_LIST_HEAD(&(fp->bp_list));
+       INIT_LIST_HEAD(&(fp->glob_list));
+
        fp->client = bbc_i2c_attach(bp, op);
        if (!fp->client) {
                kfree(fp);
diff -Naupr linux-3.16-rc6-orig/drivers/sbus/char/bbc_i2c.c linux-3.16-rc6-patched/drivers/sbus/char/bbc_i2c.c
--- linux-3.16-rc6-orig/drivers/sbus/char/bbc_i2c.c     2014-07-27 11:49:45.000000000 +0200
+++ linux-3.16-rc6-patched/drivers/sbus/char/bbc_i2c.c  2014-07-27 14:28:12.000000000 +0200
@@ -300,13 +300,20 @@  static struct bbc_i2c_bus * attach_one_i
        if (!bp)
                return NULL;
 
+       /* Need to initialize some fields of bbc_i2c_bus */
+       INIT_LIST_HEAD(&(bp->temps));
+       INIT_LIST_HEAD(&(bp->fans));
+
        bp->i2c_control_regs = of_ioremap(&op->resource[0], 0, 0x2, "bbc_i2c_regs");
        if (!bp->i2c_control_regs)
                goto fail;
 
-       bp->i2c_bussel_reg = of_ioremap(&op->resource[1], 0, 0x1, "bbc_i2c_bussel");
-       if (!bp->i2c_bussel_reg)
-               goto fail;
+       if(op->num_resources == 2) {
+               bp->i2c_bussel_reg = of_ioremap(&op->resource[1], 0, 0x1, "bbc_i2c_bussel");
+               if (!bp->i2c_bussel_reg) {
+                       goto fail;
+               }
+       }
 
        bp->waiting = 0;
        init_waitqueue_head(&bp->wq);