Message ID | 20250513100730.12664-1-byungchul@sk.com |
---|---|
Headers | show |
Series | DEPT(DEPendency Tracker) | expand |
On Tue, May 13, 2025 at 07:06:47PM +0900, Byungchul Park wrote: > I'm happy to see that dept reported a real problem in practice. See: > > https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t > https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/ > > I added a document describing dept, that would help you understand what > dept is and how dept works. You can use dept just with CONFIG_DEPT on > and by checking dmesg in runtime. > > There are still false positives here and there and some of those are > already in progress to suppress and the efforts are essencial until it > gets more stable as lockdep experienced. > > It's worth noting that EXPERIMENTAL in Kconfig is tagged. I missed thanks for the support and contribution to: Harry Yoo <harry.yoo@oracle.com> Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Yunseong Kim <yskelg@gmail.com> Yeoreum Yun <yeoreum.yun@arm.com> Thank you :) Byungchul > --- > > Hi Linus and folks, > > I've been developing a tool for detecting deadlock possibilities by > tracking wait/event rather than lock acquisition order to try to cover > all synchonization machanisms. > > Benefits: > > 0. Works with all lock primitives. > 1. Works with wait_for_completion()/complete(). > 2. Works with PG_locked. > 3. Works with swait/wakeup. > 4. Works with waitqueue. > 5. Works with wait_bit. > 6. Multiple reports are allowed. > 7. Deduplication control on multiple reports. > 8. Withstand false positives thanks to 7. > 9. Easy to tag any wait/event. > > Future works: > > 0. To make it more stable. > 1. To separates dept from lockdep. > 2. To improves performance in terms of time and space. > 3. To use dept as a dependency engine for lockdep. > 4. To add any missing tags of wait/event in the kernel. > 5. To deduplicate memory space for stack traces. > > How to interpret reports: > (See the document in this patchset for more detail.) > > [S] the start of the event context > [W] the wait disturbing the event from being triggered > [E] the event that cannot be reachable > > Thanks. > > Byungchul > > --- > > Changes from v14: > 1. Rebase on the current latest, v6.15-rc6. > 2. Refactor dept code. > 3. With multi event sites for a single wait, even if an event > forms a circular dependency, the event can be recovered by > other event(or wake up) paths. Even though informing the > circular dependency is worthy but it should be suppressed > once informing it, if it doesn't lead an actual deadlock. So > introduce APIs to annotate the relationship between event > site and recover site, that are, event_site() and > dept_recover_event(). > 4. wait_for_completion() worked with dept map embedded in struct > completion. However, it generates a few false positves since > all the waits using the instance of struct completion, share > the map and key. To avoid the false positves, make it not to > share the map and key but each wait_for_completion() caller > have its own key by default. Of course, external maps also > can be used if needed. > 5. Fix a bug about hardirq on/off tracing. > 6. Implement basic unit test for dept. > 7. Add more supports for dma fence synchronization. > 8. Add emergency stop of dept e.g. on panic(). > 9. Fix false positives by mmu_notifier_invalidate_*(). > 10. Fix recursive call bug by DEPT_WARN_*() and DEPT_STOP(). > 11. Fix trivial bugs in DEPT_WARN_*() and DEPT_STOP(). > 12. Fix a bug that a spin lock, dept_pool_spin, is used in > both contexts of irq disabled and enabled without irq > disabled. > 13. Suppress reports with classes, any of that already have > been reported, even though they have different chains but > being barely meaningful. > 14. Print stacktrace of the wait that an event is now waking up, > not only stacktrace of the event. > 15. Make dept aware of lockdep_cmp_fn() that is used to avoid > false positives in lockdep so that dept can also avoid them. > 16. Do do_event() only if there are no ecxts have been > delimited. > 17. Fix a bug that was not synchronized for stage_m in struct > dept_task, using a spin lock, dept_task()->stage_lock. > 18. Fix a bug that dept didn't handle the case that multiple > ttwus for a single waiter can be called at the same time > e.i. a race issue. > 19. Distinguish each kernel context from others, not only by > system call but also by user oriented fault so that dept can > work with more accuracy information about kernel context. > That helps to avoid a few false positives. > 20. Limit dept's working to x86_64 and arm64. > > Changes from v13: > > 1. Rebase on the current latest version, v6.9-rc7. > 2. Add 'dept' documentation describing dept APIs. > > Changes from v12: > > 1. Refine the whole document for dept. > 2. Add 'Interpret dept report' section in the document, using a > deadlock report obtained in practice. Hope this version of > document helps guys understand dept better. > > https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t > https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/ > > Changes from v11: > > 1. Add 'dept' documentation describing the concept of dept. > 2. Rewrite the commit messages of the following commits for > using weaker lockdep annotation, for better description. > > fs/jbd2: Use a weaker annotation in journal handling > cpu/hotplug: Use a weaker annotation in AP thread > > (feedbacked by Thomas Gleixner) > > Changes from v10: > > 1. Fix noinstr warning when building kernel source. > 2. dept has been reporting some false positives due to the folio > lock's unfairness. Reflect it and make dept work based on > dept annotaions instead of just wait and wake up primitives. > 3. Remove the support for PG_writeback while working on 2. I > will add the support later if needed. > 4. dept didn't print stacktrace for [S] if the participant of a > deadlock is not lock mechanism but general wait and event. > However, it made hard to interpret the report in that case. > So add support to print stacktrace of the requestor who asked > the event context to run - usually a waiter of the event does > it just before going to wait state. > 5. Give up tracking raw_local_irq_{disable,enable}() since it > totally messed up dept's irq tracking. So make it work in the > same way as lockdep does. I will consider it once any false > positives by those are observed again. > 6. Change the manual rwsem_acquire_read(->j_trans_commit_map) > annotation in fs/jbd2/transaction.c to the try version so > that it works as much as it exactly needs. > 7. Remove unnecessary 'inline' keyword in dept.c and add > '__maybe_unused' to a needed place. > > Changes from v9: > > 1. Fix a bug. SDT tracking didn't work well because of my big > mistake that I should've used waiter's map to indentify its > class but it had been working with waker's one. FYI, > PG_locked and PG_writeback weren't affected. They still > worked well. (reported by YoungJun) > > Changes from v8: > > 1. Fix build error by adding EXPORT_SYMBOL(PG_locked_map) and > EXPORT_SYMBOL(PG_writeback_map) for kernel module build - > appologize for that. (reported by kernel test robot) > 2. Fix build error by removing header file's circular dependency > that was caused by "atomic.h", "kernel.h" and "irqflags.h", > which I introduced - appolgize for that. (reported by kernel > test robot) > > Changes from v7: > > 1. Fix a bug that cannot track rwlock dependency properly, > introduced in v7. (reported by Boqun and lockdep selftest) > 2. Track wait/event of PG_{locked,writeback} more aggressively > assuming that when a bit of PG_{locked,writeback} is cleared > there might be waits on the bit. (reported by Linus, Hillf > and syzbot) > 3. Fix and clean bad style code e.i. unnecessarily introduced > a randome pattern and so on. (pointed out by Linux) > 4. Clean code for applying dept to wait_for_completion(). > > Changes from v6: > > 1. Tie to task scheduler code to track sleep and try_to_wake_up() > assuming sleeps cause waits, try_to_wake_up()s would be the > events that those are waiting for, of course with proper dept > annotations, sdt_might_sleep_weak(), sdt_might_sleep_strong() > and so on. For these cases, class is classified at sleep > entrance rather than the synchronization initialization code. > Which would extremely reduce false alarms. > 2. Remove the dept associated instance in each page struct for > tracking dependencies by PG_locked and PG_writeback thanks to > the 1. work above. > 3. Introduce CONFIG_dept_AGGRESIVE_TIMEOUT_WAIT to suppress > reports that waits with timeout set are involved, for those > who don't like verbose reporting. > 4. Add a mechanism to refill the internal memory pools on > running out so that dept could keep working as long as free > memory is available in the system. > 5. Re-enable tracking hashed-waitqueue wait. That's going to no > longer generate false positives because class is classified > at sleep entrance rather than the waitqueue initailization. > 6. Refactor to make it easier to port onto each new version of > the kernel. > 7. Apply dept to dma fence. > 8. Do trivial optimizaitions. > > Changes from v5: > > 1. Use just pr_warn_once() rather than WARN_ONCE() on the lack > of internal resources because WARN_*() printing stacktrace is > too much for informing the lack. (feedback from Ted, Hyeonggon) > 2. Fix trivial bugs like missing initializing a struct before > using it. > 3. Assign a different class per task when handling onstack > variables for waitqueue or the like. Which makes dept > distinguish between onstack variables of different tasks so > as to prevent false positives. (reported by Hyeonggon) > 4. Make dept aware of even raw_local_irq_*() to prevent false > positives. (reported by Hyeonggon) > 5. Don't consider dependencies between the events that might be > triggered within __schedule() and the waits that requires > __schedule(), real ones. (reported by Hyeonggon) > 6. Unstage the staged wait that has prepare_to_wait_event()'ed > *and* yet to get to __schedule(), if we encounter __schedule() > in-between for another sleep, which is possible if e.g. a > mutex_lock() exists in 'condition' of ___wait_event(). > 7. Turn on CONFIG_PROVE_LOCKING when CONFIG_DEPT is on, to rely > on the hardirq and softirq entrance tracing to make dept more > portable for now. > > Changes from v4: > > 1. Fix some bugs that produce false alarms. > 2. Distinguish each syscall context from another *for arm64*. > 3. Make it not warn it but just print it in case dept ring > buffer gets exhausted. (feedback from Hyeonggon) > 4. Explicitely describe "EXPERIMENTAL" and "dept might produce > false positive reports" in Kconfig. (feedback from Ted) > > Changes from v3: > > 1. dept shouldn't create dependencies between different depths > of a class that were indicated by *_lock_nested(). dept > normally doesn't but it does once another lock class comes > in. So fixed it. (feedback from Hyeonggon) > 2. dept considered a wait as a real wait once getting to > __schedule() even if it has been set to TASK_RUNNING by wake > up sources in advance. Fixed it so that dept doesn't consider > the case as a real wait. (feedback from Jan Kara) > 3. Stop tracking dependencies with a map once the event > associated with the map has been handled. dept will start to > work with the map again, on the next sleep. > > Changes from v2: > > 1. Disable dept on bit_wait_table[] in sched/wait_bit.c > reporting a lot of false positives, which is my fault. > Wait/event for bit_wait_table[] should've been tagged in a > higher layer for better work, which is a future work. > (feedback from Jan Kara) > 2. Disable dept on crypto_larval's completion to prevent a false > positive. > > Changes from v1: > > 1. Fix coding style and typo. (feedback from Steven) > 2. Distinguish each work context from another in workqueue. > 3. Skip checking lock acquisition with nest_lock, which is about > correct lock usage that should be checked by lockdep. > > Changes from RFC(v0): > > 1. Prevent adding a wait tag at prepare_to_wait() but __schedule(). > (feedback from Linus and Matthew) > 2. Use try version at lockdep_acquire_cpus_lock() annotation. > 3. Distinguish each syscall context from another. > > Byungchul Park (43): > llist: move llist_{head,node} definition to types.h > dept: implement DEPT(DEPendency Tracker) > dept: add single event dependency tracker APIs > dept: add lock dependency tracker APIs > dept: tie to lockdep and IRQ tracing > dept: add proc knobs to show stats and dependency graph > dept: distinguish each kernel context from another > x86_64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to x86_64 > arm64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64 > dept: distinguish each work from another > dept: add a mechanism to refill the internal memory pools on running > out > dept: record the latest one out of consecutive waits of the same class > dept: apply sdt_might_sleep_{start,end}() to > wait_for_completion()/complete() > dept: apply sdt_might_sleep_{start,end}() to swait > dept: apply sdt_might_sleep_{start,end}() to waitqueue wait > dept: apply sdt_might_sleep_{start,end}() to hashed-waitqueue wait > dept: apply sdt_might_sleep_{start,end}() to dma fence > dept: track timeout waits separately with a new Kconfig > dept: apply timeout consideration to wait_for_completion()/complete() > dept: apply timeout consideration to swait > dept: apply timeout consideration to waitqueue wait > dept: apply timeout consideration to hashed-waitqueue wait > dept: apply timeout consideration to dma fence wait > dept: make dept able to work with an external wgen > dept: track PG_locked with dept > dept: print staged wait's stacktrace on report > locking/lockdep: prevent various lockdep assertions when > lockdep_off()'ed > dept: suppress reports with classes that have been already reported > dept: add documentation for dept > cpu/hotplug: use a weaker annotation in AP thread > fs/jbd2: use a weaker annotation in journal handling > dept: assign dept map to mmu notifier invalidation synchronization > dept: assign unique dept_key to each distinct dma fence caller > dept: make dept aware of lockdep_set_lock_cmp_fn() annotation > dept: make dept stop from working on debug_locks_off() > i2c: rename wait_for_completion callback to wait_for_completion_cb > dept: assign unique dept_key to each distinct wait_for_completion() > caller > completion, dept: introduce init_completion_dmap() API > dept: introduce a new type of dependency tracking between multi event > sites > dept: add module support for struct dept_event_site and > dept_event_site_dep > dept: introduce event_site() to disable event tracking if it's > recoverable > dept: implement a basic unit test for dept > dept: call dept_hardirqs_off() in local_irq_*() regardless of irq > state > > Documentation/dependency/dept.txt | 735 ++++++ > Documentation/dependency/dept_api.txt | 117 + > arch/arm64/Kconfig | 1 + > arch/arm64/kernel/syscall.c | 7 + > arch/arm64/mm/fault.c | 7 + > arch/x86/Kconfig | 1 + > arch/x86/entry/syscall_64.c | 7 + > arch/x86/mm/fault.c | 7 + > drivers/dma-buf/dma-fence.c | 17 +- > drivers/i2c/algos/i2c-algo-pca.c | 2 +- > drivers/i2c/busses/i2c-pca-isa.c | 2 +- > drivers/i2c/busses/i2c-pca-platform.c | 2 +- > fs/jbd2/transaction.c | 2 +- > include/asm-generic/vmlinux.lds.h | 13 +- > include/linux/completion.h | 124 +- > include/linux/dept.h | 625 +++++ > include/linux/dept_ldt.h | 77 + > include/linux/dept_sdt.h | 67 + > include/linux/dept_unit_test.h | 67 + > include/linux/dma-fence.h | 74 +- > include/linux/hardirq.h | 3 + > include/linux/i2c-algo-pca.h | 2 +- > include/linux/irqflags.h | 21 +- > include/linux/llist.h | 8 - > include/linux/local_lock_internal.h | 1 + > include/linux/lockdep.h | 105 +- > include/linux/lockdep_types.h | 3 + > include/linux/mm_types.h | 2 + > include/linux/mmu_notifier.h | 26 + > include/linux/module.h | 5 + > include/linux/mutex.h | 1 + > include/linux/page-flags.h | 125 +- > include/linux/pagemap.h | 7 +- > include/linux/percpu-rwsem.h | 2 +- > include/linux/rtmutex.h | 1 + > include/linux/rwlock_types.h | 1 + > include/linux/rwsem.h | 1 + > include/linux/sched.h | 120 +- > include/linux/seqlock.h | 2 +- > include/linux/spinlock_types_raw.h | 3 + > include/linux/srcu.h | 2 +- > include/linux/swait.h | 3 + > include/linux/types.h | 8 + > include/linux/wait.h | 3 + > include/linux/wait_bit.h | 3 + > init/init_task.c | 2 + > init/main.c | 2 + > kernel/Makefile | 1 + > kernel/cpu.c | 2 +- > kernel/dependency/Makefile | 5 + > kernel/dependency/dept.c | 3510 +++++++++++++++++++++++++ > kernel/dependency/dept_hash.h | 10 + > kernel/dependency/dept_internal.h | 64 + > kernel/dependency/dept_object.h | 13 + > kernel/dependency/dept_proc.c | 93 + > kernel/dependency/dept_unit_test.c | 173 ++ > kernel/exit.c | 1 + > kernel/fork.c | 2 + > kernel/locking/lockdep.c | 33 + > kernel/module/main.c | 19 + > kernel/sched/completion.c | 62 +- > kernel/sched/core.c | 8 + > kernel/workqueue.c | 3 + > lib/Kconfig.debug | 51 + > lib/debug_locks.c | 2 + > lib/locking-selftest.c | 2 + > mm/filemap.c | 26 + > mm/mm_init.c | 2 + > mm/mmu_notifier.c | 31 +- > 69 files changed, 6404 insertions(+), 125 deletions(-) > create mode 100644 Documentation/dependency/dept.txt > create mode 100644 Documentation/dependency/dept_api.txt > create mode 100644 include/linux/dept.h > create mode 100644 include/linux/dept_ldt.h > create mode 100644 include/linux/dept_sdt.h > create mode 100644 include/linux/dept_unit_test.h > create mode 100644 kernel/dependency/Makefile > create mode 100644 kernel/dependency/dept.c > create mode 100644 kernel/dependency/dept_hash.h > create mode 100644 kernel/dependency/dept_internal.h > create mode 100644 kernel/dependency/dept_object.h > create mode 100644 kernel/dependency/dept_proc.c > create mode 100644 kernel/dependency/dept_unit_test.c > > > base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3 > -- > 2.17.1