Message ID | 52EF9FFE.4020405@nod.at |
---|---|
State | RFC |
Headers | show |
On Mon, 2014-02-03 at 14:56 +0100, Richard Weinberger wrote: > Am 03.02.2014 13:51, schrieb Wiedemer, Thorsten (Lawo AG): > > Hi, > > > > I can reproduce it fairly regularly, but not really "quickly". At the moment, I can use a setup of about identical 70 devices. > > A test over the last weekend resultet In 6 devices showing the bug. > > What we have are multiple processes which write in different intervals some data on the device and sync it, because this data should be available after a power cut. > > Perhaps I can force the error more often in writing test processes with shorter write/sync intervals. > > > > If I have further access to the "big" setup for some days, I will try to make a test without preemption. > > Hmm, ok. > Please also apply this patch, just in case... > > diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c > index 0e11671d..48fd2aa 100644 > --- a/drivers/mtd/ubi/eba.c > +++ b/drivers/mtd/ubi/eba.c > @@ -301,6 +301,7 @@ static void leb_write_unlock(struct ubi_device *ubi, int vol_id, int lnum) > > spin_lock(&ubi->ltree_lock); > le = ltree_lookup(ubi, vol_id, lnum); > + ubi_assert(le); > le->users -= 1; > ubi_assert(le->users >= 0); > up_write(&le->mutex); The UBI LEB locking is a bit over-designed, it could be simplified, may be this could help looking for the problem. The this report does really sound like there is something specific to Thorsten's system which corrupts memory. And it is difficult to debug this via the mailing list. Thorsten should start adding various checks like this and try to come closer to the root-cause.
Am 04.02.2014 08:22, schrieb Artem Bityutskiy: > On Mon, 2014-02-03 at 14:56 +0100, Richard Weinberger wrote: >> Am 03.02.2014 13:51, schrieb Wiedemer, Thorsten (Lawo AG): >>> Hi, >>> >>> I can reproduce it fairly regularly, but not really "quickly". At the moment, I can use a setup of about identical 70 devices. >>> A test over the last weekend resultet In 6 devices showing the bug. >>> What we have are multiple processes which write in different intervals some data on the device and sync it, because this data should be available after a power cut. >>> Perhaps I can force the error more often in writing test processes with shorter write/sync intervals. >>> >>> If I have further access to the "big" setup for some days, I will try to make a test without preemption. >> >> Hmm, ok. >> Please also apply this patch, just in case... >> >> diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c >> index 0e11671d..48fd2aa 100644 >> --- a/drivers/mtd/ubi/eba.c >> +++ b/drivers/mtd/ubi/eba.c >> @@ -301,6 +301,7 @@ static void leb_write_unlock(struct ubi_device *ubi, int vol_id, int lnum) >> >> spin_lock(&ubi->ltree_lock); >> le = ltree_lookup(ubi, vol_id, lnum); >> + ubi_assert(le); >> le->users -= 1; >> ubi_assert(le->users >= 0); >> up_write(&le->mutex); > > The UBI LEB locking is a bit over-designed, it could be simplified, may > be this could help looking for the problem. > > The this report does really sound like there is something specific to > Thorsten's system which corrupts memory. Thorsten sees: Dec 25 03:59:22 kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000000c (leb_write_unlock+0x74/0xf0) from [<c02d0d10>] (ubi_eba_write_leb+0x94/0x820 In July 2013 we got this report from a user: [ 300.554525] Unable to handle kernel NULL pointer dereference at virtual address 0000000c (leb_write_unlock+0xa0/0xf4) from [<802850e0>] (ubi_eba_write_leb+0x568/0x80c) In both cases we fault at address 0000000c and leb_write_unlock() was called by ubi_eba_write_leb(). Same user saw the issue also in the read path: [ 38.471134] Unable to handle kernel NULL pointer dereference at virtual address 00000000 (leb_read_unlock+0xa0/0xf4) from [<80285cdc>] (ubi_eba_read_leb+0x404/0x480) In that case the fault happened at 00000000 directly. A bit too deterministic for a memory corruption IMHO. > And it is difficult to debug this via the mailing list. Thorsten should > start adding various checks like this and try to come closer to the > root-cause. Yeah. We also need more oopses, maybe we can spot a pattern. Thanks, //richard
On Tue, 2014-02-04 at 08:46 +0100, Richard Weinberger wrote: > Am 04.02.2014 08:22, schrieb Artem Bityutskiy: > > On Mon, 2014-02-03 at 14:56 +0100, Richard Weinberger wrote: > >> Am 03.02.2014 13:51, schrieb Wiedemer, Thorsten (Lawo AG): > >>> Hi, > >>> > >>> I can reproduce it fairly regularly, but not really "quickly". At the moment, I can use a setup of about identical 70 devices. > >>> A test over the last weekend resultet In 6 devices showing the bug. > >>> What we have are multiple processes which write in different intervals some data on the device and sync it, because this data should be available after a power cut. > >>> Perhaps I can force the error more often in writing test processes with shorter write/sync intervals. > >>> > >>> If I have further access to the "big" setup for some days, I will try to make a test without preemption. > >> > >> Hmm, ok. > >> Please also apply this patch, just in case... > >> > >> diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c > >> index 0e11671d..48fd2aa 100644 > >> --- a/drivers/mtd/ubi/eba.c > >> +++ b/drivers/mtd/ubi/eba.c > >> @@ -301,6 +301,7 @@ static void leb_write_unlock(struct ubi_device *ubi, int vol_id, int lnum) > >> > >> spin_lock(&ubi->ltree_lock); > >> le = ltree_lookup(ubi, vol_id, lnum); > >> + ubi_assert(le); > >> le->users -= 1; > >> ubi_assert(le->users >= 0); > >> up_write(&le->mutex); > > > > The UBI LEB locking is a bit over-designed, it could be simplified, may > > be this could help looking for the problem. > > > > The this report does really sound like there is something specific to > > Thorsten's system which corrupts memory. May be. Although sometimes corruptions are also deterministics - a buffer over-run at the same place causes the same side effects etc. But in any case, the only way I know to deal with this issues is start putting various prints and assertions, and trying to come closer to the root-cause. Sometimes bisecting helps, but this case would be difficult to bisect because the reproducability is hard. Indeed, one may think that there is no failure duding a day, so the commit as 'good' while it may be actually 'bad', the bug just happen to not manifest itself quickly enough.
>>>> Am 03.02.2014 13:51, schrieb Wiedemer, Thorsten (Lawo AG): >>>>> I can reproduce it fairly regularly, but not really "quickly". At >>>>> the moment, I can use a setup of about identical 70 devices. A >>>>> test over the last weekend resultet In 6 devices showing the bug. >>>>> What we have are multiple processes which write in different >>>>> intervals some data on the device and sync it, because this data >>>>> should be available after a power cut. Perhaps I can force the >>>>> error more often in writing test processes with shorter write/sync >>>>> intervals. >>>>> >>>>> If I have further access to the "big" setup for some days, I will >>>>> try to make a test without preemption. >>> On Mon, 2014-02-03 at 14:56 +0100, Richard Weinberger wrote: >>>> Hmm, ok. >>>> Please also apply this patch, just in case... I don't think this patch will help. On 4 Feb 2014, dedekind1 at gmail.com wrote: > May be. Although sometimes corruptions are also deterministics - a > buffer over-run at the same place causes the same side effects etc. > But in any case, the only way I know to deal with this issues is start > putting various prints and assertions, and trying to come closer to the > root-cause. Sometimes bisecting helps, but this case would be difficult > to bisect because the reproducability is hard. Indeed, one may think > that there is no failure duding a day, so the commit as 'good' while it > may be actually 'bad', the bug just happen to not manifest itself > quickly enough. I have seen the same crash on a 2.6.36 system with all of the UbiFs/Ubi backported. It is also an IMX25 based system. We have, PC is at __up_write+0x3c/0x1a8 LR is at __up_write+0x24/0x1a8 pc : [<c0229400>] lr : [<c02293e8>] psr: a0000093 sp : c7225cc8 ip : 00020000 fp : c79fba00 r10: 00000523 r9 : 00000001 r8 : c7b33000 r7 : c793a800 r6 : c7bd473c r5 : c7bd4738 r4 : c7bd4720 r3 : 00000000 r2 : 00000002 r1 : 00000001 r0 : 00000002 Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Code: e4863004 e5953004 e1560003 0a00002a (e593200c) I run this, $ printf "\x04\x30\x86\xe4" "\x04\x30\x95\xe5" "\x03\x00\x56\xe1" "\x2a\x00\x00\x0a" "\x0c\x20\x93\xe5" > crash.dump $ objdump --disassemble-all -b binary -m arm crash.dump crash.dump: file format binary Disassembly of section .data: 00000000 <.data>: 0: e4863004 str r3, [r6], #4 4: e5953004 ldr r3, [r5, #4] 8: e1560003 cmp r6, r3 c: 0a00002a beq 0xbc 10: e593200c ldr r2, [r3, #12] The values 'r6' and 'r5' are pointers and they are far from non-NULL and look like good kernel data pointers. Something in the list is 'NULL'. There is a load of 'r3' as NULL and then the use of '[r3, #12]' which gives the crash address of '0xc'. Here is the objdump with source interspersed for my build, sem->activity = 0; 350: e3a0a000 mov sl, #0 354: e1a05000 mov r5, r0 358: e485a004 str sl, [r5], #4 * list_empty - tests whether a list is empty * @head: the list to test. */ static inline int list_empty(const struct list_head *head) { return head->next == head; 35c: e5903004 ldr r3, [r0, #4] if (!list_empty(&sem->wait_list)) 360: e1550003 cmp r5, r3 364: 0a00002b beq 418 <__up_write+0xfc> /* if we are allowed to wake writers try to grant a single write lock * if there's a writer at the front of the queue * - we leave the 'waiting count' incremented to signify potential * contention */ if (waiter->flags & RWSEM_WAITING_FOR_WRITE) { 368: e593200c ldr r2, [r3, #12] 36c: e2124002 ands r4, r2, #2 370: 0a000006 beq 390 <__up_write+0x74> 374: ea000034 b 44c <__up_write+0x130> The compiler picks different registers, r3/sl+r3, r5/r0, r6/r5 but the code is the same. The 'rw_semaphore' is struct rw_semaphore { __s32 activity; struct list_head wait_list; }; So, the 'wait_list' is non-NULL, the rw_semaphore is non-NULL, but 'wait_list->next' is NULL. This seems to be very consistent with this 'oops'. It seems that the "ltree_lock" doesn't protect the use of the ltree_lookup() versus insertions and deletions? Ie, ltree_lookup() may return non-NULL, but an interrupt/pagefault before a 'le->users +/- = 1;' may mean the node is released? On a UP system, does 'spin_lock()' actually do anything? The rw_semaphore uses spin_lock_irqsave(). However, that doesn't make sense as I think this occurs mainly on a ARM926 system. The ARM926 systems do not have proper 'lock free' idioms like 'ldrex/strex' and they try to do atomic operations by locking interrupts. I think that UbiFs/UBI maybe called on a 'data fault' or 'program fault' (in user space) when memory pressure is present. I have seen this occur in some sound drivers where the data source is coming from disk (or maybe the driver uses vmalloc() or something). So I think on occasion, the ltree_lookup() may not work or there is something weird with the atomic primatives and data/page faults. Fwiw, Bill Pringlemeir.
On 4 Feb 2014, bpringlemeir@nbsps.com wrote: > The ARM926 systems do not have proper 'lock free' idioms like > 'ldrex/strex' and they try to do atomic operations by locking > interrupts. I think that UbiFs/UBI maybe called on a 'data fault' or > 'program fault' (in user space) when memory pressure is present. I have > seen this occur in some sound drivers where the data source is coming > from disk (or maybe the driver uses vmalloc() or something). So I think > on occasion, the ltree_lookup() may not work or there is something weird > with the atomic primatives and data/page faults. https://www.google.ca/#q=site:infradead.org+leb_write_unlock+oops http://lists.infradead.org/pipermail/linux-mtd/2013-May/046907.html at91sam9g20 - arm926, different MTD driver. Linux 3.6.9 Code: e5903004 e58d2004 e1560003 0a00002a (e593200c) 0: e5903004 ldr r3, [r0, #4] 4: e58d2004 str r2, [sp, #4] 8: e1560003 cmp r6, r3 c: 0a00002a beq 0xbc 10: e593200c ldr r2, [r3, #12] The code sequence looks identical and the Oops trace, etc is the same. People from Pengutronix also indicated seeing the same type of Opps; I think they deal with the IMX, but maybe this was on another board. Regards, Bill Pringlemeir.
On 4 Feb 2014, bpringlemeir@nbsps.com wrote: > http://lists.infradead.org/pipermail/linux-mtd/2013-May/046907.html > > at91sam9g20 - arm926, different MTD driver. Linux 3.6.9 > > Code: e5903004 e58d2004 e1560003 0a00002a (e593200c) > > 0: e5903004 ldr r3, [r0, #4] > 4: e58d2004 str r2, [sp, #4] > 8: e1560003 cmp r6, r3 > c: 0a00002a beq 0xbc > 10: e593200c ldr r2, [r3, #12] > > The code sequence looks identical and the Oops trace, etc is the same. > People from Pengutronix also indicated seeing the same type of Opps; I > think they deal with the IMX, but maybe this was on another board. >>>> schrieb Wiedemer, Thorsten (Lawo AG): > Ehmm, OK, OK, even with the changes in kernel, ubi_assert() in > leb_write_unlock() wouldn't have triggered ... Another up_read() crash, http://lists.infradead.org/pipermail/linux-mtd/2013-July/047512.html Code: e1530001 0a000016 e3e01000 e5801000 (e8930003) 00000000 <.data>: 0: e1530001 cmp r3, r1 4: 0a000016 beq 0x64 8: e3e01000 mvn r1, #0 c: e5801000 str r1, [r0] 10: e8930003 ldm r3, {r0, r1} Thorsten's Oops, Code: e3e02000 e5842000 e59fc084 e59f0084 (e8930006) 00000000 <.data>: 0: e3e02000 mvn r2, #0 4: e5842000 str r2, [r4] 8: e59fc084 ldr ip, [pc, #132] ; 0x94 c: e59f0084 ldr r0, [pc, #132] ; 0x98 10: e8930006 ldm r3, {r1, r2} The registers are different, but the instruction sequence is similar. In my ARM926 build, the __up_read() is, static inline int list_empty(const struct list_head *head) { return head->next == head; 250: e1a01000 mov r1, r0 254: e5b12004 ldr r2, [r1, #4]! 258: e1520001 cmp r2, r1 25c: 0a000017 beq 2c0 <__up_read+0xb0> __rwsem_wake_one_writer(struct rw_semaphore *sem) { struct rwsem_waiter *waiter; struct task_struct *tsk; sem->activity = -1; 260: e3e01000 mvn r1, #0 264: e5801000 str r1, [r0] * in an undefined state. */ #ifndef CONFIG_DEBUG_LIST static inline void list_del(struct list_head *entry) { __list_del(entry->prev, entry->next); 268: e8920003 ldm r2, {r0, r1} * This is only for internal list manipulation where we know * the prev/next entries already! */ static inline void __list_del(struct list_head * prev, struct list_head * next) { next->prev = prev; 26c: e5801004 str r1, [r0, #4] prev->next = next; 270: e5810000 str r0, [r1] This is the same symptom, __rwsem_wake_one_writer(struct rw_semaphore *sem) { ... waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list); list_del(&waiter->list); The sem->wait_list is non-NULL, but the 'sem->wait_list.next' is NULL. I would suggest you try with 'DEBUG_LOCK_ALLOC' or something like this. The crash points are not the failure, it is when we insert a rw_semaphore of 'NULL' or use some memory that is already freed. Fwiw, Bill Pringlemeir.
Am 04.02.2014 20:57, schrieb Bill Pringlemeir: > On 4 Feb 2014, bpringlemeir@nbsps.com wrote: > >> http://lists.infradead.org/pipermail/linux-mtd/2013-May/046907.html >> >> at91sam9g20 - arm926, different MTD driver. Linux 3.6.9 >> >> Code: e5903004 e58d2004 e1560003 0a00002a (e593200c) >> >> 0: e5903004 ldr r3, [r0, #4] >> 4: e58d2004 str r2, [sp, #4] >> 8: e1560003 cmp r6, r3 >> c: 0a00002a beq 0xbc >> 10: e593200c ldr r2, [r3, #12] >> >> The code sequence looks identical and the Oops trace, etc is the same. >> People from Pengutronix also indicated seeing the same type of Opps; I >> think they deal with the IMX, but maybe this was on another board. > >>>>> schrieb Wiedemer, Thorsten (Lawo AG): > >> Ehmm, OK, OK, even with the changes in kernel, ubi_assert() in >> leb_write_unlock() wouldn't have triggered ... > > Another up_read() crash, > > http://lists.infradead.org/pipermail/linux-mtd/2013-July/047512.html > > Code: e1530001 0a000016 e3e01000 e5801000 (e8930003) > > 00000000 <.data>: > 0: e1530001 cmp r3, r1 > 4: 0a000016 beq 0x64 > 8: e3e01000 mvn r1, #0 > c: e5801000 str r1, [r0] > 10: e8930003 ldm r3, {r0, r1} > > Thorsten's Oops, > > Code: e3e02000 e5842000 e59fc084 e59f0084 (e8930006) > > 00000000 <.data>: > 0: e3e02000 mvn r2, #0 > 4: e5842000 str r2, [r4] > 8: e59fc084 ldr ip, [pc, #132] ; 0x94 > c: e59f0084 ldr r0, [pc, #132] ; 0x98 > 10: e8930006 ldm r3, {r1, r2} > > The registers are different, but the instruction sequence is similar. > In my ARM926 build, the __up_read() is, > > static inline int list_empty(const struct list_head *head) > { > return head->next == head; > 250: e1a01000 mov r1, r0 > 254: e5b12004 ldr r2, [r1, #4]! > 258: e1520001 cmp r2, r1 > 25c: 0a000017 beq 2c0 <__up_read+0xb0> > __rwsem_wake_one_writer(struct rw_semaphore *sem) > { > struct rwsem_waiter *waiter; > struct task_struct *tsk; > > sem->activity = -1; > 260: e3e01000 mvn r1, #0 > 264: e5801000 str r1, [r0] > * in an undefined state. > */ > #ifndef CONFIG_DEBUG_LIST > static inline void list_del(struct list_head *entry) > { > __list_del(entry->prev, entry->next); > 268: e8920003 ldm r2, {r0, r1} > * This is only for internal list manipulation where we know > * the prev/next entries already! > */ > static inline void __list_del(struct list_head * prev, struct list_head * next) > { > next->prev = prev; > 26c: e5801004 str r1, [r0, #4] > prev->next = next; > 270: e5810000 str r0, [r1] > > > This is the same symptom, > > __rwsem_wake_one_writer(struct rw_semaphore *sem) > { > ... > waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list); > list_del(&waiter->list); > > The sem->wait_list is non-NULL, but the 'sem->wait_list.next' is NULL. I > would suggest you try with 'DEBUG_LOCK_ALLOC' or something like this. > The crash points are not the failure, it is when we insert a > rw_semaphore of 'NULL' or use some memory that is already freed. CONFIG_DEBUG_LIST please. Thanks, //richard
diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c index 0e11671d..48fd2aa 100644 --- a/drivers/mtd/ubi/eba.c +++ b/drivers/mtd/ubi/eba.c @@ -301,6 +301,7 @@ static void leb_write_unlock(struct ubi_device *ubi, int vol_id, int lnum) spin_lock(&ubi->ltree_lock); le = ltree_lookup(ubi, vol_id, lnum); + ubi_assert(le); le->users -= 1; ubi_assert(le->users >= 0); up_write(&le->mutex);