mbox series

[v4,00/24] DEPT(Dependency Tracker)

Message ID 1646377603-19730-1-git-send-email-byungchul.park@lge.com
Headers show
Series DEPT(Dependency Tracker) | expand

Message

Byungchul Park March 4, 2022, 7:06 a.m. UTC
Hi Linus and folks,

I've been developing a tool for detecting deadlock possibilities by
tracking wait/event rather than lock(?) acquisition order to try to
cover all synchonization machanisms. It's done on v5.17-rc1 tag.

https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1

Benifit:

	0. Works with all lock primitives.
	1. Works with wait_for_completion()/complete().
	2. Works with 'wait' on PG_locked.
	3. Works with 'wait' on PG_writeback.
	4. Works with swait/wakeup.
	5. Works with waitqueue.
	6. Multiple reports are allowed.
	7. Deduplication control on multiple reports.
	8. Withstand false positives thanks to 6.
	9. Easy to tag any wait/event.

Future work:

	0. To make it more stable.
	1. To separates Dept from Lockdep.
	2. To improves performance in terms of time and space.
	3. To use Dept as a dependency engine for Lockdep.
	4. To add any missing tags of wait/event in the kernel.
	5. To deduplicate stack trace.

How to interpret reports:

	1. E(event) in each context cannot be triggered because of the
	   W(wait) that cannot be woken.
	2. The stack trace helping find the problematic code is located
	   in each conext's detail.

Thanks,
Byungchul

---

Changes from v3:

	1. Dept shouldn't create dependencies between different depths
	   of a class that were indicated by *_lock_nested(). Dept
	   normally doesn't but it does once another lock class comes
	   in. So fixed it. (feedback from Hyeonggon)
	2. Dept considered a wait as a real wait once getting to
	   __schedule() even if it has been set to TASK_RUNNING by wake
	   up sources in advance. Fixed it so that Dept doesn't consider
	   the case as a real wait. (feedback from Jan Kara)
	3. Stop tracking dependencies with a map once the event
	   associated with the map has been handled. Dept will start to
	   work with the map again, on the next sleep.

Changes from v2:

	1. Disable Dept on bit_wait_table[] in sched/wait_bit.c
	   reporting a lot of false positives, which is my fault.
	   Wait/event for bit_wait_table[] should've been tagged in a
	   higher layer for better work, which is a future work.
	   (feedback from Jan Kara)
	2. Disable Dept on crypto_larval's completion to prevent a false
	   positive.

Changes from v1:

	1. Fix coding style and typo. (feedback from Steven)
	2. Distinguish each work context from another in workqueue.
	3. Skip checking lock acquisition with nest_lock, which is about
	   correct lock usage that should be checked by Lockdep.

Changes from RFC:

	1. Prevent adding a wait tag at prepare_to_wait() but __schedule().
	   (feedback from Linus and Matthew)
	2. Use try version at lockdep_acquire_cpus_lock() annotation.
	3. Distinguish each syscall context from another.

Byungchul Park (24):
  llist: Move llist_{head,node} definition to types.h
  dept: Implement Dept(Dependency Tracker)
  dept: Embed Dept data in Lockdep
  dept: Add a API for skipping dependency check temporarily
  dept: Apply Dept to spinlock
  dept: Apply Dept to mutex families
  dept: Apply Dept to rwlock
  dept: Apply Dept to wait_for_completion()/complete()
  dept: Apply Dept to seqlock
  dept: Apply Dept to rwsem
  dept: Add proc knobs to show stats and dependency graph
  dept: Introduce split map concept and new APIs for them
  dept: Apply Dept to wait/event of PG_{locked,writeback}
  dept: Apply SDT to swait
  dept: Apply SDT to wait(waitqueue)
  locking/lockdep, cpu/hotplus: Use a weaker annotation in AP thread
  dept: Distinguish each syscall context from another
  dept: Distinguish each work from another
  dept: Disable Dept within the wait_bit layer by default
  dept: Add nocheck version of init_completion()
  dept: Disable Dept on struct crypto_larval's completion for now
  dept: Don't create dependencies between different depths in any case
  dept: Let it work with real sleeps in __schedule()
  dept: Disable Dept on that map once it's been handled until next turn

 crypto/api.c                       |    7 +-
 include/linux/completion.h         |   50 +-
 include/linux/dept.h               |  535 +++++++
 include/linux/dept_page.h          |   78 ++
 include/linux/dept_sdt.h           |   62 +
 include/linux/hardirq.h            |    3 +
 include/linux/irqflags.h           |   33 +-
 include/linux/llist.h              |    8 -
 include/linux/lockdep.h            |  158 ++-
 include/linux/lockdep_types.h      |    3 +
 include/linux/mutex.h              |   33 +
 include/linux/page-flags.h         |   45 +-
 include/linux/pagemap.h            |    7 +-
 include/linux/percpu-rwsem.h       |   10 +-
 include/linux/rtmutex.h            |    7 +
 include/linux/rwlock.h             |   52 +
 include/linux/rwlock_api_smp.h     |    8 +-
 include/linux/rwlock_types.h       |    7 +
 include/linux/rwsem.h              |   33 +
 include/linux/sched.h              |    7 +
 include/linux/seqlock.h            |   59 +-
 include/linux/spinlock.h           |   26 +
 include/linux/spinlock_types_raw.h |   13 +
 include/linux/swait.h              |    4 +
 include/linux/types.h              |    8 +
 include/linux/wait.h               |    6 +-
 init/init_task.c                   |    2 +
 init/main.c                        |    4 +
 kernel/Makefile                    |    1 +
 kernel/cpu.c                       |    2 +-
 kernel/dependency/Makefile         |    4 +
 kernel/dependency/dept.c           | 2716 ++++++++++++++++++++++++++++++++++++
 kernel/dependency/dept_hash.h      |   10 +
 kernel/dependency/dept_internal.h  |   26 +
 kernel/dependency/dept_object.h    |   13 +
 kernel/dependency/dept_proc.c      |   92 ++
 kernel/entry/common.c              |    3 +
 kernel/exit.c                      |    1 +
 kernel/fork.c                      |    2 +
 kernel/locking/lockdep.c           |   12 +-
 kernel/module.c                    |    2 +
 kernel/sched/completion.c          |   12 +-
 kernel/sched/core.c                |    8 +
 kernel/sched/swait.c               |   10 +
 kernel/sched/wait.c                |   16 +
 kernel/sched/wait_bit.c            |    5 +-
 kernel/softirq.c                   |    6 +-
 kernel/trace/trace_preemptirq.c    |   19 +-
 kernel/workqueue.c                 |    3 +
 lib/Kconfig.debug                  |   21 +
 mm/filemap.c                       |   68 +
 mm/page_ext.c                      |    5 +
 52 files changed, 4266 insertions(+), 59 deletions(-)
 create mode 100644 include/linux/dept.h
 create mode 100644 include/linux/dept_page.h
 create mode 100644 include/linux/dept_sdt.h
 create mode 100644 kernel/dependency/Makefile
 create mode 100644 kernel/dependency/dept.c
 create mode 100644 kernel/dependency/dept_hash.h
 create mode 100644 kernel/dependency/dept_internal.h
 create mode 100644 kernel/dependency/dept_object.h
 create mode 100644 kernel/dependency/dept_proc.c

Comments

Hyeonggon Yoo March 12, 2022, 1:53 a.m. UTC | #1
On Fri, Mar 04, 2022 at 04:06:19PM +0900, Byungchul Park wrote:
> Hi Linus and folks,
> 
> I've been developing a tool for detecting deadlock possibilities by
> tracking wait/event rather than lock(?) acquisition order to try to
> cover all synchonization machanisms. It's done on v5.17-rc1 tag.
> 
> https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
>

Small feedback unrelated to thread:
I'm not sure "Need to expand the ring buffer" is something to call
WARN(). Is this stack trace useful for something?
========

Hello Byungchul. These are two warnings of DEPT on system.
Both cases look similar.

In what case DEPT says (unknown)?
I'm not sure we can properly debug this.

===================================================
DEPT: Circular dependency has been detected.
5.17.0-rc1+ #3 Tainted: G        W        
---------------------------------------------------
summary
---------------------------------------------------
*** AA DEADLOCK ***

context A
    [S] (unknown)(&vfork:0)
    [W] wait_for_completion_killable(&vfork:0)
    [E] complete(&vfork:0)

[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable
---------------------------------------------------
context A's detail
---------------------------------------------------
context A
    [S] (unknown)(&vfork:0)
    [W] wait_for_completion_killable(&vfork:0)
    [E] complete(&vfork:0)

[S] (unknown)(&vfork:0):
(N/A)

[W] wait_for_completion_killable(&vfork:0):
[<ffffffc00802204c>] kernel_clone+0x25c/0x2b8
stacktrace:
      dept_wait+0x74/0x88
      wait_for_completion_killable+0x60/0xa0
      kernel_clone+0x25c/0x2b8
      __do_sys_clone+0x5c/0x74
      __arm64_sys_clone+0x18/0x20
      invoke_syscall.constprop.0+0x78/0xc4
      do_el0_svc+0x98/0xd0
      el0_svc+0x44/0xe4
      el0t_64_sync_handler+0xb0/0x12c
      el0t_64_sync+0x158/0x15c

[E] complete(&vfork:0):
[<ffffffc00801f49c>] mm_release+0x7c/0x90
stacktrace:
      dept_event+0xe0/0x100
      complete+0x48/0x98
      mm_release+0x7c/0x90
      exit_mm_release+0xc/0x14
      do_exit+0x1b4/0x81c
      do_group_exit+0x30/0x9c
      __wake_up_parent+0x0/0x24
      invoke_syscall.constprop.0+0x78/0xc4
      do_el0_svc+0x98/0xd0
      el0_svc+0x44/0xe4
      el0t_64_sync_handler+0xb0/0x12c
      el0t_64_sync+0x158/0x15c
---------------------------------------------------
information that might be helpful
---------------------------------------------------
CPU: 6 PID: 229 Comm: start-stop-daem Tainted: G        W         5.17.0-rc1+ #3
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.0+0x9c/0xc4
 show_stack+0x14/0x28
 dump_stack_lvl+0x9c/0xcc
 dump_stack+0x14/0x2c
 print_circle+0x2d4/0x438
 cb_check_dl+0x44/0x70
 bfs+0x60/0x168
 add_dep+0x88/0x11c
 do_event.constprop.0+0x19c/0x2c0
 dept_event+0xe0/0x100
 complete+0x48/0x98
 mm_release+0x7c/0x90
 exit_mm_release+0xc/0x14
 do_exit+0x1b4/0x81c
 do_group_exit+0x30/0x9c
 __wake_up_parent+0x0/0x24
 invoke_syscall.constprop.0+0x78/0xc4
 do_el0_svc+0x98/0xd0
 el0_svc+0x44/0xe4
 el0t_64_sync_handler+0xb0/0x12c
 el0t_64_sync+0x158/0x15c




===================================================
DEPT: Circular dependency has been detected.
5.17.0-rc1+ #3 Tainted: G        W        
---------------------------------------------------
summary
---------------------------------------------------
*** AA DEADLOCK ***

context A
    [S] (unknown)(&try_completion:0)
    [W] wait_for_completion_timeout(&try_completion:0)
    [E] complete(&try_completion:0)

[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable
---------------------------------------------------
context A's detail
---------------------------------------------------
context A
    [S] (unknown)(&try_completion:0)
    [W] wait_for_completion_timeout(&try_completion:0)
    [E] complete(&try_completion:0)

[S] (unknown)(&try_completion:0):
(N/A)

[W] wait_for_completion_timeout(&try_completion:0):
[<ffffffc008166bf4>] kunit_try_catch_run+0xb4/0x160
stacktrace:
      dept_wait+0x74/0x88
      wait_for_completion_timeout+0x64/0xa0
      kunit_try_catch_run+0xb4/0x160
      kunit_test_try_catch_successful_try_no_catch+0x3c/0x98
      kunit_try_run_case+0x9c/0xa0
      kunit_generic_run_threadfn_adapter+0x1c/0x28
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20

[E] complete(&try_completion:0):
[<ffffffc00803dce4>] kthread_complete_and_exit+0x18/0x20
stacktrace:
      dept_event+0xe0/0x100
      complete+0x48/0x98
      kthread_complete_and_exit+0x18/0x20
      kunit_try_catch_throw+0x0/0x1c
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20

---------------------------------------------------
information that might be helpful
---------------------------------------------------
CPU: 15 PID: 132 Comm: kunit_try_catch Tainted: G        W         5.17.0-rc1+ #3
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.0+0x9c/0xc4
 show_stack+0x14/0x28
 dump_stack_lvl+0x9c/0xcc
 dump_stack+0x14/0x2c
 print_circle+0x2d4/0x438
 cb_check_dl+0x44/0x70
 bfs+0x60/0x168
 add_dep+0x88/0x11c
 do_event.constprop.0+0x19c/0x2c0
 dept_event+0xe0/0x100
 complete+0x48/0x98
 kthread_complete_and_exit+0x18/0x20
 kunit_try_catch_throw+0x0/0x1c
 kthread+0xd4/0xe4
 ret_from_fork+0x10/0x20


> Benifit:
> 
> 	0. Works with all lock primitives.
> 	1. Works with wait_for_completion()/complete().
> 	2. Works with 'wait' on PG_locked.
> 	3. Works with 'wait' on PG_writeback.
> 	4. Works with swait/wakeup.
> 	5. Works with waitqueue.
> 	6. Multiple reports are allowed.
> 	7. Deduplication control on multiple reports.
> 	8. Withstand false positives thanks to 6.
> 	9. Easy to tag any wait/event.
> 
> Future work:

[...]

> -- 
> 1.9.1
>
Byungchul Park March 14, 2022, 6:59 a.m. UTC | #2
On Sat, Mar 12, 2022 at 01:53:26AM +0000, Hyeonggon Yoo wrote:
> On Fri, Mar 04, 2022 at 04:06:19PM +0900, Byungchul Park wrote:
> > Hi Linus and folks,
> > 
> > I've been developing a tool for detecting deadlock possibilities by
> > tracking wait/event rather than lock(?) acquisition order to try to
> > cover all synchonization machanisms. It's done on v5.17-rc1 tag.
> > 
> > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
> >
> 
> Small feedback unrelated to thread:
> I'm not sure "Need to expand the ring buffer" is something to call
> WARN(). Is this stack trace useful for something?

Yeah. It seems to happen too often. I won't warn it. Thanks.

> ========
> 
> Hello Byungchul. These are two warnings of DEPT on system.
> Both cases look similar.
> 
> In what case DEPT says (unknown)?
> I'm not sure we can properly debug this.
> 
> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #3 Tainted: G        W        
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** AA DEADLOCK ***
> 
> context A
>     [S] (unknown)(&vfork:0)
>     [W] wait_for_completion_killable(&vfork:0)
>     [E] complete(&vfork:0)

All the reports look like having to do with kernel_clone(). I need to
check it more. Thank you very much.

You are awesome, Hyeonggon.

Thank you,
Byungchul

> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable
> ---------------------------------------------------
> context A's detail
> ---------------------------------------------------
> context A
>     [S] (unknown)(&vfork:0)
>     [W] wait_for_completion_killable(&vfork:0)
>     [E] complete(&vfork:0)
> 
> [S] (unknown)(&vfork:0):
> (N/A)
> 
> [W] wait_for_completion_killable(&vfork:0):
> [<ffffffc00802204c>] kernel_clone+0x25c/0x2b8
> stacktrace:
>       dept_wait+0x74/0x88
>       wait_for_completion_killable+0x60/0xa0
>       kernel_clone+0x25c/0x2b8
>       __do_sys_clone+0x5c/0x74
>       __arm64_sys_clone+0x18/0x20
>       invoke_syscall.constprop.0+0x78/0xc4
>       do_el0_svc+0x98/0xd0
>       el0_svc+0x44/0xe4
>       el0t_64_sync_handler+0xb0/0x12c
>       el0t_64_sync+0x158/0x15c
> 
> [E] complete(&vfork:0):
> [<ffffffc00801f49c>] mm_release+0x7c/0x90
> stacktrace:
>       dept_event+0xe0/0x100
>       complete+0x48/0x98
>       mm_release+0x7c/0x90
>       exit_mm_release+0xc/0x14
>       do_exit+0x1b4/0x81c
>       do_group_exit+0x30/0x9c
>       __wake_up_parent+0x0/0x24
>       invoke_syscall.constprop.0+0x78/0xc4
>       do_el0_svc+0x98/0xd0
>       el0_svc+0x44/0xe4
>       el0t_64_sync_handler+0xb0/0x12c
>       el0t_64_sync+0x158/0x15c
> ---------------------------------------------------
> information that might be helpful
> ---------------------------------------------------
> CPU: 6 PID: 229 Comm: start-stop-daem Tainted: G        W         5.17.0-rc1+ #3
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.0+0x9c/0xc4
>  show_stack+0x14/0x28
>  dump_stack_lvl+0x9c/0xcc
>  dump_stack+0x14/0x2c
>  print_circle+0x2d4/0x438
>  cb_check_dl+0x44/0x70
>  bfs+0x60/0x168
>  add_dep+0x88/0x11c
>  do_event.constprop.0+0x19c/0x2c0
>  dept_event+0xe0/0x100
>  complete+0x48/0x98
>  mm_release+0x7c/0x90
>  exit_mm_release+0xc/0x14
>  do_exit+0x1b4/0x81c
>  do_group_exit+0x30/0x9c
>  __wake_up_parent+0x0/0x24
>  invoke_syscall.constprop.0+0x78/0xc4
>  do_el0_svc+0x98/0xd0
>  el0_svc+0x44/0xe4
>  el0t_64_sync_handler+0xb0/0x12c
>  el0t_64_sync+0x158/0x15c
> 
> 
> 
> 
> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #3 Tainted: G        W        
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** AA DEADLOCK ***
> 
> context A
>     [S] (unknown)(&try_completion:0)
>     [W] wait_for_completion_timeout(&try_completion:0)
>     [E] complete(&try_completion:0)
> 
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable
> ---------------------------------------------------
> context A's detail
> ---------------------------------------------------
> context A
>     [S] (unknown)(&try_completion:0)
>     [W] wait_for_completion_timeout(&try_completion:0)
>     [E] complete(&try_completion:0)
> 
> [S] (unknown)(&try_completion:0):
> (N/A)
> 
> [W] wait_for_completion_timeout(&try_completion:0):
> [<ffffffc008166bf4>] kunit_try_catch_run+0xb4/0x160
> stacktrace:
>       dept_wait+0x74/0x88
>       wait_for_completion_timeout+0x64/0xa0
>       kunit_try_catch_run+0xb4/0x160
>       kunit_test_try_catch_successful_try_no_catch+0x3c/0x98
>       kunit_try_run_case+0x9c/0xa0
>       kunit_generic_run_threadfn_adapter+0x1c/0x28
>       kthread+0xd4/0xe4
>       ret_from_fork+0x10/0x20
> 
> [E] complete(&try_completion:0):
> [<ffffffc00803dce4>] kthread_complete_and_exit+0x18/0x20
> stacktrace:
>       dept_event+0xe0/0x100
>       complete+0x48/0x98
>       kthread_complete_and_exit+0x18/0x20
>       kunit_try_catch_throw+0x0/0x1c
>       kthread+0xd4/0xe4
>       ret_from_fork+0x10/0x20
> 
> ---------------------------------------------------
> information that might be helpful
> ---------------------------------------------------
> CPU: 15 PID: 132 Comm: kunit_try_catch Tainted: G        W         5.17.0-rc1+ #3
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.0+0x9c/0xc4
>  show_stack+0x14/0x28
>  dump_stack_lvl+0x9c/0xcc
>  dump_stack+0x14/0x2c
>  print_circle+0x2d4/0x438
>  cb_check_dl+0x44/0x70
>  bfs+0x60/0x168
>  add_dep+0x88/0x11c
>  do_event.constprop.0+0x19c/0x2c0
>  dept_event+0xe0/0x100
>  complete+0x48/0x98
>  kthread_complete_and_exit+0x18/0x20
>  kunit_try_catch_throw+0x0/0x1c
>  kthread+0xd4/0xe4
>  ret_from_fork+0x10/0x20


> -- 
> Thank you, You are awesome!
> Hyeonggon :-)
Hyeonggon Yoo March 15, 2022, 12:04 p.m. UTC | #3
On Mon, Mar 14, 2022 at 03:59:06PM +0900, Byungchul Park wrote:
> On Sat, Mar 12, 2022 at 01:53:26AM +0000, Hyeonggon Yoo wrote:
> > On Fri, Mar 04, 2022 at 04:06:19PM +0900, Byungchul Park wrote:
> > > Hi Linus and folks,
> > > 
> > > I've been developing a tool for detecting deadlock possibilities by
> > > tracking wait/event rather than lock(?) acquisition order to try to
> > > cover all synchonization machanisms. It's done on v5.17-rc1 tag.
> > > 
> > > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
> > >
> > 
> > Small feedback unrelated to thread:
> > I'm not sure "Need to expand the ring buffer" is something to call
> > WARN(). Is this stack trace useful for something?
> 
> Yeah. It seems to happen too often. I won't warn it. Thanks.

Thanks!

> > ========
> > 
> > Hello Byungchul. These are two warnings of DEPT on system.
> > Both cases look similar.
> > 
> > In what case DEPT says (unknown)?
> > I'm not sure we can properly debug this.
> > 
> > ===================================================
> > DEPT: Circular dependency has been detected.
> > 5.17.0-rc1+ #3 Tainted: G        W        
> > ---------------------------------------------------
> > summary
> > ---------------------------------------------------
> > *** AA DEADLOCK ***
> > 
> > context A
> >     [S] (unknown)(&vfork:0)
> >     [W] wait_for_completion_killable(&vfork:0)
> >     [E] complete(&vfork:0)
> 
> All the reports look like having to do with kernel_clone(). I need to
> check it more. Thank you very much.
> 
> You are awesome, Hyeonggon.
>

Thank you. Let me know if there is something I can help!

> Thank you,
> Byungchul
> 
> > [S]: start of the event context
> > [W]: the wait blocked
> > [E]: the event not reachable
> > ---------------------------------------------------
> > context A's detail
> > ---------------------------------------------------
> > context A
> >     [S] (unknown)(&vfork:0)
> >     [W] wait_for_completion_killable(&vfork:0)
> >     [E] complete(&vfork:0)
> > 
> > [S] (unknown)(&vfork:0):
> > (N/A)
> > 
> > [W] wait_for_completion_killable(&vfork:0):
> > [<ffffffc00802204c>] kernel_clone+0x25c/0x2b8
> > stacktrace:
> >       dept_wait+0x74/0x88
> >       wait_for_completion_killable+0x60/0xa0
> >       kernel_clone+0x25c/0x2b8
> >       __do_sys_clone+0x5c/0x74
> >       __arm64_sys_clone+0x18/0x20
> >       invoke_syscall.constprop.0+0x78/0xc4
> >       do_el0_svc+0x98/0xd0
> >       el0_svc+0x44/0xe4
> >       el0t_64_sync_handler+0xb0/0x12c
> >       el0t_64_sync+0x158/0x15c
> > 
> > [E] complete(&vfork:0):
> > [<ffffffc00801f49c>] mm_release+0x7c/0x90
> > stacktrace:
> >       dept_event+0xe0/0x100
> >       complete+0x48/0x98
> >       mm_release+0x7c/0x90
> >       exit_mm_release+0xc/0x14
> >       do_exit+0x1b4/0x81c
> >       do_group_exit+0x30/0x9c
> >       __wake_up_parent+0x0/0x24
> >       invoke_syscall.constprop.0+0x78/0xc4
> >       do_el0_svc+0x98/0xd0
> >       el0_svc+0x44/0xe4
> >       el0t_64_sync_handler+0xb0/0x12c
> >       el0t_64_sync+0x158/0x15c
> > ---------------------------------------------------
> > information that might be helpful
> > ---------------------------------------------------
> > CPU: 6 PID: 229 Comm: start-stop-daem Tainted: G        W         5.17.0-rc1+ #3
> > Hardware name: linux,dummy-virt (DT)
> > Call trace:
> >  dump_backtrace.part.0+0x9c/0xc4
> >  show_stack+0x14/0x28
> >  dump_stack_lvl+0x9c/0xcc
> >  dump_stack+0x14/0x2c
> >  print_circle+0x2d4/0x438
> >  cb_check_dl+0x44/0x70
> >  bfs+0x60/0x168
> >  add_dep+0x88/0x11c
> >  do_event.constprop.0+0x19c/0x2c0
> >  dept_event+0xe0/0x100
> >  complete+0x48/0x98
> >  mm_release+0x7c/0x90
> >  exit_mm_release+0xc/0x14
> >  do_exit+0x1b4/0x81c
> >  do_group_exit+0x30/0x9c
> >  __wake_up_parent+0x0/0x24
> >  invoke_syscall.constprop.0+0x78/0xc4
> >  do_el0_svc+0x98/0xd0
> >  el0_svc+0x44/0xe4
> >  el0t_64_sync_handler+0xb0/0x12c
> >  el0t_64_sync+0x158/0x15c
> > 
> > 
> > 
> > 
> > ===================================================
> > DEPT: Circular dependency has been detected.
> > 5.17.0-rc1+ #3 Tainted: G        W        
> > ---------------------------------------------------
> > summary
> > ---------------------------------------------------
> > *** AA DEADLOCK ***
> > 
> > context A
> >     [S] (unknown)(&try_completion:0)
> >     [W] wait_for_completion_timeout(&try_completion:0)
> >     [E] complete(&try_completion:0)
> > 
> > [S]: start of the event context
> > [W]: the wait blocked
> > [E]: the event not reachable
> > ---------------------------------------------------
> > context A's detail
> > ---------------------------------------------------
> > context A
> >     [S] (unknown)(&try_completion:0)
> >     [W] wait_for_completion_timeout(&try_completion:0)
> >     [E] complete(&try_completion:0)
> > 
> > [S] (unknown)(&try_completion:0):
> > (N/A)
> > 
> > [W] wait_for_completion_timeout(&try_completion:0):
> > [<ffffffc008166bf4>] kunit_try_catch_run+0xb4/0x160
> > stacktrace:
> >       dept_wait+0x74/0x88
> >       wait_for_completion_timeout+0x64/0xa0
> >       kunit_try_catch_run+0xb4/0x160
> >       kunit_test_try_catch_successful_try_no_catch+0x3c/0x98
> >       kunit_try_run_case+0x9c/0xa0
> >       kunit_generic_run_threadfn_adapter+0x1c/0x28
> >       kthread+0xd4/0xe4
> >       ret_from_fork+0x10/0x20
> > 
> > [E] complete(&try_completion:0):
> > [<ffffffc00803dce4>] kthread_complete_and_exit+0x18/0x20
> > stacktrace:
> >       dept_event+0xe0/0x100
> >       complete+0x48/0x98
> >       kthread_complete_and_exit+0x18/0x20
> >       kunit_try_catch_throw+0x0/0x1c
> >       kthread+0xd4/0xe4
> >       ret_from_fork+0x10/0x20
> > 
> > ---------------------------------------------------
> > information that might be helpful
> > ---------------------------------------------------
> > CPU: 15 PID: 132 Comm: kunit_try_catch Tainted: G        W         5.17.0-rc1+ #3
> > Hardware name: linux,dummy-virt (DT)
> > Call trace:
> >  dump_backtrace.part.0+0x9c/0xc4
> >  show_stack+0x14/0x28
> >  dump_stack_lvl+0x9c/0xcc
> >  dump_stack+0x14/0x2c
> >  print_circle+0x2d4/0x438
> >  cb_check_dl+0x44/0x70
> >  bfs+0x60/0x168
> >  add_dep+0x88/0x11c
> >  do_event.constprop.0+0x19c/0x2c0
> >  dept_event+0xe0/0x100
> >  complete+0x48/0x98
> >  kthread_complete_and_exit+0x18/0x20
> >  kunit_try_catch_throw+0x0/0x1c
> >  kthread+0xd4/0xe4
> >  ret_from_fork+0x10/0x20
> 
> 
> > -- 
> > Thank you, You are awesome!
> > Hyeonggon :-)
Byungchul Park March 16, 2022, 4:32 a.m. UTC | #4
On Sat, Mar 12, 2022 at 01:53:26AM +0000, Hyeonggon Yoo wrote:
> On Fri, Mar 04, 2022 at 04:06:19PM +0900, Byungchul Park wrote:
> > Hi Linus and folks,
> > 
> > I've been developing a tool for detecting deadlock possibilities by
> > tracking wait/event rather than lock(?) acquisition order to try to
> > cover all synchonization machanisms. It's done on v5.17-rc1 tag.
> > 
> > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
> >
> 
> Small feedback unrelated to thread:
> I'm not sure "Need to expand the ring buffer" is something to call
> WARN(). Is this stack trace useful for something?
> ========
> 
> Hello Byungchul. These are two warnings of DEPT on system.

Hi Hyeonggon,

Could you run scripts/decode_stacktrace.sh and share the result instead
of the raw format below if the reports still appear with PATCH v5? It'd
be appreciated (:

https://lkml.org/lkml/2022/3/15/1277
(or https://github.com/lgebyungchulpark/linux-dept/commits/dept1.18_on_v5.17-rc7)

Thank you very much!

--
Byungchul

> Both cases look similar.
> 
> In what case DEPT says (unknown)?
> I'm not sure we can properly debug this.
> 
> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #3 Tainted: G        W        
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** AA DEADLOCK ***
> 
> context A
>     [S] (unknown)(&vfork:0)
>     [W] wait_for_completion_killable(&vfork:0)
>     [E] complete(&vfork:0)
> 
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable
> ---------------------------------------------------
> context A's detail
> ---------------------------------------------------
> context A
>     [S] (unknown)(&vfork:0)
>     [W] wait_for_completion_killable(&vfork:0)
>     [E] complete(&vfork:0)
> 
> [S] (unknown)(&vfork:0):
> (N/A)
> 
> [W] wait_for_completion_killable(&vfork:0):
> [<ffffffc00802204c>] kernel_clone+0x25c/0x2b8
> stacktrace:
>       dept_wait+0x74/0x88
>       wait_for_completion_killable+0x60/0xa0
>       kernel_clone+0x25c/0x2b8
>       __do_sys_clone+0x5c/0x74
>       __arm64_sys_clone+0x18/0x20
>       invoke_syscall.constprop.0+0x78/0xc4
>       do_el0_svc+0x98/0xd0
>       el0_svc+0x44/0xe4
>       el0t_64_sync_handler+0xb0/0x12c
>       el0t_64_sync+0x158/0x15c
> 
> [E] complete(&vfork:0):
> [<ffffffc00801f49c>] mm_release+0x7c/0x90
> stacktrace:
>       dept_event+0xe0/0x100
>       complete+0x48/0x98
>       mm_release+0x7c/0x90
>       exit_mm_release+0xc/0x14
>       do_exit+0x1b4/0x81c
>       do_group_exit+0x30/0x9c
>       __wake_up_parent+0x0/0x24
>       invoke_syscall.constprop.0+0x78/0xc4
>       do_el0_svc+0x98/0xd0
>       el0_svc+0x44/0xe4
>       el0t_64_sync_handler+0xb0/0x12c
>       el0t_64_sync+0x158/0x15c
> ---------------------------------------------------
> information that might be helpful
> ---------------------------------------------------
> CPU: 6 PID: 229 Comm: start-stop-daem Tainted: G        W         5.17.0-rc1+ #3
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.0+0x9c/0xc4
>  show_stack+0x14/0x28
>  dump_stack_lvl+0x9c/0xcc
>  dump_stack+0x14/0x2c
>  print_circle+0x2d4/0x438
>  cb_check_dl+0x44/0x70
>  bfs+0x60/0x168
>  add_dep+0x88/0x11c
>  do_event.constprop.0+0x19c/0x2c0
>  dept_event+0xe0/0x100
>  complete+0x48/0x98
>  mm_release+0x7c/0x90
>  exit_mm_release+0xc/0x14
>  do_exit+0x1b4/0x81c
>  do_group_exit+0x30/0x9c
>  __wake_up_parent+0x0/0x24
>  invoke_syscall.constprop.0+0x78/0xc4
>  do_el0_svc+0x98/0xd0
>  el0_svc+0x44/0xe4
>  el0t_64_sync_handler+0xb0/0x12c
>  el0t_64_sync+0x158/0x15c
> 
> 
> 
> 
> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #3 Tainted: G        W        
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** AA DEADLOCK ***
> 
> context A
>     [S] (unknown)(&try_completion:0)
>     [W] wait_for_completion_timeout(&try_completion:0)
>     [E] complete(&try_completion:0)
> 
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable
> ---------------------------------------------------
> context A's detail
> ---------------------------------------------------
> context A
>     [S] (unknown)(&try_completion:0)
>     [W] wait_for_completion_timeout(&try_completion:0)
>     [E] complete(&try_completion:0)
> 
> [S] (unknown)(&try_completion:0):
> (N/A)
> 
> [W] wait_for_completion_timeout(&try_completion:0):
> [<ffffffc008166bf4>] kunit_try_catch_run+0xb4/0x160
> stacktrace:
>       dept_wait+0x74/0x88
>       wait_for_completion_timeout+0x64/0xa0
>       kunit_try_catch_run+0xb4/0x160
>       kunit_test_try_catch_successful_try_no_catch+0x3c/0x98
>       kunit_try_run_case+0x9c/0xa0
>       kunit_generic_run_threadfn_adapter+0x1c/0x28
>       kthread+0xd4/0xe4
>       ret_from_fork+0x10/0x20
> 
> [E] complete(&try_completion:0):
> [<ffffffc00803dce4>] kthread_complete_and_exit+0x18/0x20
> stacktrace:
>       dept_event+0xe0/0x100
>       complete+0x48/0x98
>       kthread_complete_and_exit+0x18/0x20
>       kunit_try_catch_throw+0x0/0x1c
>       kthread+0xd4/0xe4
>       ret_from_fork+0x10/0x20
> 
> ---------------------------------------------------
> information that might be helpful
> ---------------------------------------------------
> CPU: 15 PID: 132 Comm: kunit_try_catch Tainted: G        W         5.17.0-rc1+ #3
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.0+0x9c/0xc4
>  show_stack+0x14/0x28
>  dump_stack_lvl+0x9c/0xcc
>  dump_stack+0x14/0x2c
>  print_circle+0x2d4/0x438
>  cb_check_dl+0x44/0x70
>  bfs+0x60/0x168
>  add_dep+0x88/0x11c
>  do_event.constprop.0+0x19c/0x2c0
>  dept_event+0xe0/0x100
>  complete+0x48/0x98
>  kthread_complete_and_exit+0x18/0x20
>  kunit_try_catch_throw+0x0/0x1c
>  kthread+0xd4/0xe4
>  ret_from_fork+0x10/0x20
> 
> 
> > Benifit:
> > 
> > 	0. Works with all lock primitives.
> > 	1. Works with wait_for_completion()/complete().
> > 	2. Works with 'wait' on PG_locked.
> > 	3. Works with 'wait' on PG_writeback.
> > 	4. Works with swait/wakeup.
> > 	5. Works with waitqueue.
> > 	6. Multiple reports are allowed.
> > 	7. Deduplication control on multiple reports.
> > 	8. Withstand false positives thanks to 6.
> > 	9. Easy to tag any wait/event.
> > 
> > Future work:
> 
> [...]
> 
> > -- 
> > 1.9.1
> > 
> 
> -- 
> Thank you, You are awesome!
> Hyeonggon :-)
Hyeonggon Yoo March 16, 2022, 9:30 a.m. UTC | #5
On Wed, Mar 16, 2022 at 01:32:13PM +0900, Byungchul Park wrote:
> On Sat, Mar 12, 2022 at 01:53:26AM +0000, Hyeonggon Yoo wrote:
> > On Fri, Mar 04, 2022 at 04:06:19PM +0900, Byungchul Park wrote:
> > > Hi Linus and folks,
> > > 
> > > I've been developing a tool for detecting deadlock possibilities by
> > > tracking wait/event rather than lock(?) acquisition order to try to
> > > cover all synchonization machanisms. It's done on v5.17-rc1 tag.
> > > 
> > > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
> > >
> > 
> > Small feedback unrelated to thread:
> > I'm not sure "Need to expand the ring buffer" is something to call
> > WARN(). Is this stack trace useful for something?
> > ========
> > 
> > Hello Byungchul. These are two warnings of DEPT on system.
> 
> Hi Hyeonggon,
> 
> Could you run scripts/decode_stacktrace.sh and share the result instead
> of the raw format below if the reports still appear with PATCH v5? It'd
> be appreciated (:
>

Hi Byungchul.

on dept1.18_on_v5.17-rc7, the kernel_clone() warning has gone.
There is one warning remaining on my system:

It warns when running kunit-try-catch-test testcase.

===================================================
DEPT: Circular dependency has been detected.
5.17.0-rc7+ #4 Not tainted
---------------------------------------------------
summary
---------------------------------------------------
*** AA DEADLOCK ***

context A
[S] (unknown)(&try_completion:0)
[W] wait_for_completion_timeout(&try_completion:0)
[E] complete(&try_completion:0)

[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable
---------------------------------------------------
context A's detail
---------------------------------------------------
context A
[S] (unknown)(&try_completion:0)
[W] wait_for_completion_timeout(&try_completion:0)
[E] complete(&try_completion:0)

[S] (unknown)(&try_completion:0):
(N/A)

[W] wait_for_completion_timeout(&try_completion:0):
kunit_try_catch_run (lib/kunit/try-catch.c:78 (discriminator 1)) 
stacktrace:
dept_wait (kernel/dependency/dept.c:2149) 
wait_for_completion_timeout (kernel/sched/completion.c:119 (discriminator 4) kernel/sched/completion.c:165 (discriminator 4)) 
kunit_try_catch_run (lib/kunit/try-catch.c:78 (discriminator 1)) 
kunit_test_try_catch_successful_try_no_catch (lib/kunit/kunit-test.c:43) 
kunit_try_run_case (lib/kunit/test.c:333 lib/kunit/test.c:374) 
kunit_generic_run_threadfn_adapter (lib/kunit/try-catch.c:30) 
kthread (kernel/kthread.c:379) 
ret_from_fork (arch/arm64/kernel/entry.S:757)

[E] complete(&try_completion:0):
kthread_complete_and_exit (kernel/kthread.c:327) 
stacktrace:
dept_event (kernel/dependency/dept.c:2376 (discriminator 2)) 
complete (kernel/sched/completion.c:33 (discriminator 4)) 
kthread_complete_and_exit (kernel/kthread.c:327) 
kunit_try_catch_throw (lib/kunit/try-catch.c:18) 
kthread (kernel/kthread.c:379) 
ret_from_fork (arch/arm64/kernel/entry.S:757) 

---------------------------------------------------
information that might be helpful
---------------------------------------------------
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:186) 
show_stack (arch/arm64/kernel/stacktrace.c:193) 
dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
dump_stack (lib/dump_stack.c:114) 
print_circle (./arch/arm64/include/asm/atomic_ll_sc.h:112 ./arch/arm64/include/asm/atomic.h:30 ./include/linux/atomic/atomic-arch-fallback.h:511 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:140 kernel/dependency/dept.c:748) 
cb_check_dl (kernel/dependency/dept.c:1083 kernel/dependency/dept.c:1064) 
bfs (kernel/dependency/dept.c:833) 
add_dep (kernel/dependency/dept.c:1409) 
do_event (kernel/dependency/dept.c:175 kernel/dependency/dept.c:1644) 
dept_event (kernel/dependency/dept.c:2376 (discriminator 2)) 
complete (kernel/sched/completion.c:33 (discriminator 4)) 
kthread_complete_and_exit (kernel/kthread.c:327) 
kunit_try_catch_throw (lib/kunit/try-catch.c:18) 
kthread (kernel/kthread.c:379) 
ret_from_fork (arch/arm64/kernel/entry.S:757)
Byungchul Park March 18, 2022, 7:51 a.m. UTC | #6
On Wed, Mar 16, 2022 at 09:30:02AM +0000, Hyeonggon Yoo wrote:
> On Wed, Mar 16, 2022 at 01:32:13PM +0900, Byungchul Park wrote:
> > On Sat, Mar 12, 2022 at 01:53:26AM +0000, Hyeonggon Yoo wrote:
> > > On Fri, Mar 04, 2022 at 04:06:19PM +0900, Byungchul Park wrote:
> > > > Hi Linus and folks,
> > > > 
> > > > I've been developing a tool for detecting deadlock possibilities by
> > > > tracking wait/event rather than lock(?) acquisition order to try to
> > > > cover all synchonization machanisms. It's done on v5.17-rc1 tag.
> > > > 
> > > > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
> > > >
> > > 
> > > Small feedback unrelated to thread:
> > > I'm not sure "Need to expand the ring buffer" is something to call
> > > WARN(). Is this stack trace useful for something?
> > > ========
> > > 
> > > Hello Byungchul. These are two warnings of DEPT on system.
> > 
> > Hi Hyeonggon,
> > 
> > Could you run scripts/decode_stacktrace.sh and share the result instead
> > of the raw format below if the reports still appear with PATCH v5? It'd
> > be appreciated (:
> >
> 
> Hi Byungchul.
> 
> on dept1.18_on_v5.17-rc7, the kernel_clone() warning has gone.
> There is one warning remaining on my system:
> 
> It warns when running kunit-try-catch-test testcase.

Hi Hyeonggon,

I can reproduce it thanks to you. I will let you know on all works done.

Thanks,
Byungchul

> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc7+ #4 Not tainted
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** AA DEADLOCK ***
> 
> context A
> [S] (unknown)(&try_completion:0)
> [W] wait_for_completion_timeout(&try_completion:0)
> [E] complete(&try_completion:0)
> 
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable
> ---------------------------------------------------
> context A's detail
> ---------------------------------------------------
> context A
> [S] (unknown)(&try_completion:0)
> [W] wait_for_completion_timeout(&try_completion:0)
> [E] complete(&try_completion:0)
> 
> [S] (unknown)(&try_completion:0):
> (N/A)
> 
> [W] wait_for_completion_timeout(&try_completion:0):
> kunit_try_catch_run (lib/kunit/try-catch.c:78 (discriminator 1)) 
> stacktrace:
> dept_wait (kernel/dependency/dept.c:2149) 
> wait_for_completion_timeout (kernel/sched/completion.c:119 (discriminator 4) kernel/sched/completion.c:165 (discriminator 4)) 
> kunit_try_catch_run (lib/kunit/try-catch.c:78 (discriminator 1)) 
> kunit_test_try_catch_successful_try_no_catch (lib/kunit/kunit-test.c:43) 
> kunit_try_run_case (lib/kunit/test.c:333 lib/kunit/test.c:374) 
> kunit_generic_run_threadfn_adapter (lib/kunit/try-catch.c:30) 
> kthread (kernel/kthread.c:379) 
> ret_from_fork (arch/arm64/kernel/entry.S:757)
> 
> [E] complete(&try_completion:0):
> kthread_complete_and_exit (kernel/kthread.c:327) 
> stacktrace:
> dept_event (kernel/dependency/dept.c:2376 (discriminator 2)) 
> complete (kernel/sched/completion.c:33 (discriminator 4)) 
> kthread_complete_and_exit (kernel/kthread.c:327) 
> kunit_try_catch_throw (lib/kunit/try-catch.c:18) 
> kthread (kernel/kthread.c:379) 
> ret_from_fork (arch/arm64/kernel/entry.S:757) 
> 
> ---------------------------------------------------
> information that might be helpful
> ---------------------------------------------------
> Hardware name: linux,dummy-virt (DT)
> Call trace:
> dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:186) 
> show_stack (arch/arm64/kernel/stacktrace.c:193) 
> dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
> dump_stack (lib/dump_stack.c:114) 
> print_circle (./arch/arm64/include/asm/atomic_ll_sc.h:112 ./arch/arm64/include/asm/atomic.h:30 ./include/linux/atomic/atomic-arch-fallback.h:511 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:140 kernel/dependency/dept.c:748) 
> cb_check_dl (kernel/dependency/dept.c:1083 kernel/dependency/dept.c:1064) 
> bfs (kernel/dependency/dept.c:833) 
> add_dep (kernel/dependency/dept.c:1409) 
> do_event (kernel/dependency/dept.c:175 kernel/dependency/dept.c:1644) 
> dept_event (kernel/dependency/dept.c:2376 (discriminator 2)) 
> complete (kernel/sched/completion.c:33 (discriminator 4)) 
> kthread_complete_and_exit (kernel/kthread.c:327) 
> kunit_try_catch_throw (lib/kunit/try-catch.c:18) 
> kthread (kernel/kthread.c:379) 
> ret_from_fork (arch/arm64/kernel/entry.S:757)
> 
> -- 
> Thank you, You are awesome!
> Hyeonggon :-)
> 
> > https://lkml.org/lkml/2022/3/15/1277
> > (or https://github.com/lgebyungchulpark/linux-dept/commits/dept1.18_on_v5.17-rc7)
> > 
> > Thank you very much!
> > 
> > --
> > Byungchul
> > 
> > > Both cases look similar.
> > > 
> > > In what case DEPT says (unknown)?
> > > I'm not sure we can properly debug this.
> > > 
> > > ===================================================
> > > DEPT: Circular dependency has been detected.
> > > 5.17.0-rc1+ #3 Tainted: G        W        
> > > ---------------------------------------------------
> > > summary
> > > ---------------------------------------------------
> > > *** AA DEADLOCK ***
> > > 
> > > context A
> > >     [S] (unknown)(&vfork:0)
> > >     [W] wait_for_completion_killable(&vfork:0)
> > >     [E] complete(&vfork:0)
> > > 
> > > [S]: start of the event context
> > > [W]: the wait blocked
> > > [E]: the event not reachable
> > > ---------------------------------------------------
> > > context A's detail
> > > ---------------------------------------------------
> > > context A
> > >     [S] (unknown)(&vfork:0)
> > >     [W] wait_for_completion_killable(&vfork:0)
> > >     [E] complete(&vfork:0)
> > > 
> > > [S] (unknown)(&vfork:0):
> > > (N/A)
> > > 
> > > [W] wait_for_completion_killable(&vfork:0):
> > > [<ffffffc00802204c>] kernel_clone+0x25c/0x2b8
> > > stacktrace:
> > >       dept_wait+0x74/0x88
> > >       wait_for_completion_killable+0x60/0xa0
> > >       kernel_clone+0x25c/0x2b8
> > >       __do_sys_clone+0x5c/0x74
> > >       __arm64_sys_clone+0x18/0x20
> > >       invoke_syscall.constprop.0+0x78/0xc4
> > >       do_el0_svc+0x98/0xd0
> > >       el0_svc+0x44/0xe4
> > >       el0t_64_sync_handler+0xb0/0x12c
> > >       el0t_64_sync+0x158/0x15c
> > > 
> > > [E] complete(&vfork:0):
> > > [<ffffffc00801f49c>] mm_release+0x7c/0x90
> > > stacktrace:
> > >       dept_event+0xe0/0x100
> > >       complete+0x48/0x98
> > >       mm_release+0x7c/0x90
> > >       exit_mm_release+0xc/0x14
> > >       do_exit+0x1b4/0x81c
> > >       do_group_exit+0x30/0x9c
> > >       __wake_up_parent+0x0/0x24
> > >       invoke_syscall.constprop.0+0x78/0xc4
> > >       do_el0_svc+0x98/0xd0
> > >       el0_svc+0x44/0xe4
> > >       el0t_64_sync_handler+0xb0/0x12c
> > >       el0t_64_sync+0x158/0x15c
> > > ---------------------------------------------------
> > > information that might be helpful
> > > ---------------------------------------------------
> > > CPU: 6 PID: 229 Comm: start-stop-daem Tainted: G        W         5.17.0-rc1+ #3
> > > Hardware name: linux,dummy-virt (DT)
> > > Call trace:
> > >  dump_backtrace.part.0+0x9c/0xc4
> > >  show_stack+0x14/0x28
> > >  dump_stack_lvl+0x9c/0xcc
> > >  dump_stack+0x14/0x2c
> > >  print_circle+0x2d4/0x438
> > >  cb_check_dl+0x44/0x70
> > >  bfs+0x60/0x168
> > >  add_dep+0x88/0x11c
> > >  do_event.constprop.0+0x19c/0x2c0
> > >  dept_event+0xe0/0x100
> > >  complete+0x48/0x98
> > >  mm_release+0x7c/0x90
> > >  exit_mm_release+0xc/0x14
> > >  do_exit+0x1b4/0x81c
> > >  do_group_exit+0x30/0x9c
> > >  __wake_up_parent+0x0/0x24
> > >  invoke_syscall.constprop.0+0x78/0xc4
> > >  do_el0_svc+0x98/0xd0
> > >  el0_svc+0x44/0xe4
> > >  el0t_64_sync_handler+0xb0/0x12c
> > >  el0t_64_sync+0x158/0x15c
> > > 
> > > 
> > > 
> > > 
> > > ===================================================
> > > DEPT: Circular dependency has been detected.
> > > 5.17.0-rc1+ #3 Tainted: G        W        
> > > ---------------------------------------------------
> > > summary
> > > ---------------------------------------------------
> > > *** AA DEADLOCK ***
> > > 
> > > context A
> > >     [S] (unknown)(&try_completion:0)
> > >     [W] wait_for_completion_timeout(&try_completion:0)
> > >     [E] complete(&try_completion:0)
> > > 
> > > [S]: start of the event context
> > > [W]: the wait blocked
> > > [E]: the event not reachable
> > > ---------------------------------------------------
> > > context A's detail
> > > ---------------------------------------------------
> > > context A
> > >     [S] (unknown)(&try_completion:0)
> > >     [W] wait_for_completion_timeout(&try_completion:0)
> > >     [E] complete(&try_completion:0)
> > > 
> > > [S] (unknown)(&try_completion:0):
> > > (N/A)
> > > 
> > > [W] wait_for_completion_timeout(&try_completion:0):
> > > [<ffffffc008166bf4>] kunit_try_catch_run+0xb4/0x160
> > > stacktrace:
> > >       dept_wait+0x74/0x88
> > >       wait_for_completion_timeout+0x64/0xa0
> > >       kunit_try_catch_run+0xb4/0x160
> > >       kunit_test_try_catch_successful_try_no_catch+0x3c/0x98
> > >       kunit_try_run_case+0x9c/0xa0
> > >       kunit_generic_run_threadfn_adapter+0x1c/0x28
> > >       kthread+0xd4/0xe4
> > >       ret_from_fork+0x10/0x20
> > > 
> > > [E] complete(&try_completion:0):
> > > [<ffffffc00803dce4>] kthread_complete_and_exit+0x18/0x20
> > > stacktrace:
> > >       dept_event+0xe0/0x100
> > >       complete+0x48/0x98
> > >       kthread_complete_and_exit+0x18/0x20
> > >       kunit_try_catch_throw+0x0/0x1c
> > >       kthread+0xd4/0xe4
> > >       ret_from_fork+0x10/0x20
> > > 
> > > ---------------------------------------------------
> > > information that might be helpful
> > > ---------------------------------------------------
> > > CPU: 15 PID: 132 Comm: kunit_try_catch Tainted: G        W         5.17.0-rc1+ #3
> > > Hardware name: linux,dummy-virt (DT)
> > > Call trace:
> > >  dump_backtrace.part.0+0x9c/0xc4
> > >  show_stack+0x14/0x28
> > >  dump_stack_lvl+0x9c/0xcc
> > >  dump_stack+0x14/0x2c
> > >  print_circle+0x2d4/0x438
> > >  cb_check_dl+0x44/0x70
> > >  bfs+0x60/0x168
> > >  add_dep+0x88/0x11c
> > >  do_event.constprop.0+0x19c/0x2c0
> > >  dept_event+0xe0/0x100
> > >  complete+0x48/0x98
> > >  kthread_complete_and_exit+0x18/0x20
> > >  kunit_try_catch_throw+0x0/0x1c
> > >  kthread+0xd4/0xe4
> > >  ret_from_fork+0x10/0x20
> > > 
> > > 
> > > > Benifit:
> > > > 
> > > > 	0. Works with all lock primitives.
> > > > 	1. Works with wait_for_completion()/complete().
> > > > 	2. Works with 'wait' on PG_locked.
> > > > 	3. Works with 'wait' on PG_writeback.
> > > > 	4. Works with swait/wakeup.
> > > > 	5. Works with waitqueue.
> > > > 	6. Multiple reports are allowed.
> > > > 	7. Deduplication control on multiple reports.
> > > > 	8. Withstand false positives thanks to 6.
> > > > 	9. Easy to tag any wait/event.
> > > > 
> > > > Future work:
> > > 
> > > [...]
> > > 
> > > > -- 
> > > > 1.9.1
> > > > 
> > > 
> > > -- 
> > > Thank you, You are awesome!
> > > Hyeonggon :-)
Byungchul Park March 20, 2022, 10:57 a.m. UTC | #7
On Fri, Mar 18, 2022 at 04:51:29PM +0900, Byungchul Park wrote:
> On Wed, Mar 16, 2022 at 09:30:02AM +0000, Hyeonggon Yoo wrote:
> > On Wed, Mar 16, 2022 at 01:32:13PM +0900, Byungchul Park wrote:
> > > On Sat, Mar 12, 2022 at 01:53:26AM +0000, Hyeonggon Yoo wrote:
> > > > On Fri, Mar 04, 2022 at 04:06:19PM +0900, Byungchul Park wrote:
> > > > > Hi Linus and folks,
> > > > > 
> > > > > I've been developing a tool for detecting deadlock possibilities by
> > > > > tracking wait/event rather than lock(?) acquisition order to try to
> > > > > cover all synchonization machanisms. It's done on v5.17-rc1 tag.
> > > > > 
> > > > > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
> > > > >
> > > > 
> > > > Small feedback unrelated to thread:
> > > > I'm not sure "Need to expand the ring buffer" is something to call
> > > > WARN(). Is this stack trace useful for something?
> > > > ========
> > > > 
> > > > Hello Byungchul. These are two warnings of DEPT on system.
> > > 
> > > Hi Hyeonggon,
> > > 
> > > Could you run scripts/decode_stacktrace.sh and share the result instead
> > > of the raw format below if the reports still appear with PATCH v5? It'd
> > > be appreciated (:
> > >
> > 
> > Hi Byungchul.
> > 
> > on dept1.18_on_v5.17-rc7, the kernel_clone() warning has gone.
> > There is one warning remaining on my system:
> > 
> > It warns when running kunit-try-catch-test testcase.
> 
> Hi Hyeonggon,
> 
> I can reproduce it thanks to you. I will let you know on all works done.

Hi Hyeonggon,

All works wrt this issue have been done. I've just updated the same
branch.

https://github.com/lgebyungchulpark/linux-dept/commits/dept1.18_on_v5.17-rc7

This is just for your information.

Thanks,
Byungchul

> 
> Thanks,
> Byungchul
> 
> > ===================================================
> > DEPT: Circular dependency has been detected.
> > 5.17.0-rc7+ #4 Not tainted
> > ---------------------------------------------------
> > summary
> > ---------------------------------------------------
> > *** AA DEADLOCK ***
> > 
> > context A
> > [S] (unknown)(&try_completion:0)
> > [W] wait_for_completion_timeout(&try_completion:0)
> > [E] complete(&try_completion:0)
> > 
> > [S]: start of the event context
> > [W]: the wait blocked
> > [E]: the event not reachable
> > ---------------------------------------------------
> > context A's detail
> > ---------------------------------------------------
> > context A
> > [S] (unknown)(&try_completion:0)
> > [W] wait_for_completion_timeout(&try_completion:0)
> > [E] complete(&try_completion:0)
> > 
> > [S] (unknown)(&try_completion:0):
> > (N/A)
> > 
> > [W] wait_for_completion_timeout(&try_completion:0):
> > kunit_try_catch_run (lib/kunit/try-catch.c:78 (discriminator 1)) 
> > stacktrace:
> > dept_wait (kernel/dependency/dept.c:2149) 
> > wait_for_completion_timeout (kernel/sched/completion.c:119 (discriminator 4) kernel/sched/completion.c:165 (discriminator 4)) 
> > kunit_try_catch_run (lib/kunit/try-catch.c:78 (discriminator 1)) 
> > kunit_test_try_catch_successful_try_no_catch (lib/kunit/kunit-test.c:43) 
> > kunit_try_run_case (lib/kunit/test.c:333 lib/kunit/test.c:374) 
> > kunit_generic_run_threadfn_adapter (lib/kunit/try-catch.c:30) 
> > kthread (kernel/kthread.c:379) 
> > ret_from_fork (arch/arm64/kernel/entry.S:757)
> > 
> > [E] complete(&try_completion:0):
> > kthread_complete_and_exit (kernel/kthread.c:327) 
> > stacktrace:
> > dept_event (kernel/dependency/dept.c:2376 (discriminator 2)) 
> > complete (kernel/sched/completion.c:33 (discriminator 4)) 
> > kthread_complete_and_exit (kernel/kthread.c:327) 
> > kunit_try_catch_throw (lib/kunit/try-catch.c:18) 
> > kthread (kernel/kthread.c:379) 
> > ret_from_fork (arch/arm64/kernel/entry.S:757) 
> > 
> > ---------------------------------------------------
> > information that might be helpful
> > ---------------------------------------------------
> > Hardware name: linux,dummy-virt (DT)
> > Call trace:
> > dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:186) 
> > show_stack (arch/arm64/kernel/stacktrace.c:193) 
> > dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
> > dump_stack (lib/dump_stack.c:114) 
> > print_circle (./arch/arm64/include/asm/atomic_ll_sc.h:112 ./arch/arm64/include/asm/atomic.h:30 ./include/linux/atomic/atomic-arch-fallback.h:511 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:140 kernel/dependency/dept.c:748) 
> > cb_check_dl (kernel/dependency/dept.c:1083 kernel/dependency/dept.c:1064) 
> > bfs (kernel/dependency/dept.c:833) 
> > add_dep (kernel/dependency/dept.c:1409) 
> > do_event (kernel/dependency/dept.c:175 kernel/dependency/dept.c:1644) 
> > dept_event (kernel/dependency/dept.c:2376 (discriminator 2)) 
> > complete (kernel/sched/completion.c:33 (discriminator 4)) 
> > kthread_complete_and_exit (kernel/kthread.c:327) 
> > kunit_try_catch_throw (lib/kunit/try-catch.c:18) 
> > kthread (kernel/kthread.c:379) 
> > ret_from_fork (arch/arm64/kernel/entry.S:757)
> > 
> > -- 
> > Thank you, You are awesome!
> > Hyeonggon :-)
> > 
> > > https://lkml.org/lkml/2022/3/15/1277
> > > (or https://github.com/lgebyungchulpark/linux-dept/commits/dept1.18_on_v5.17-rc7)
> > > 
> > > Thank you very much!
> > > 
> > > --
> > > Byungchul
> > > 
> > > > Both cases look similar.
> > > > 
> > > > In what case DEPT says (unknown)?
> > > > I'm not sure we can properly debug this.
> > > > 
> > > > ===================================================
> > > > DEPT: Circular dependency has been detected.
> > > > 5.17.0-rc1+ #3 Tainted: G        W        
> > > > ---------------------------------------------------
> > > > summary
> > > > ---------------------------------------------------
> > > > *** AA DEADLOCK ***
> > > > 
> > > > context A
> > > >     [S] (unknown)(&vfork:0)
> > > >     [W] wait_for_completion_killable(&vfork:0)
> > > >     [E] complete(&vfork:0)
> > > > 
> > > > [S]: start of the event context
> > > > [W]: the wait blocked
> > > > [E]: the event not reachable
> > > > ---------------------------------------------------
> > > > context A's detail
> > > > ---------------------------------------------------
> > > > context A
> > > >     [S] (unknown)(&vfork:0)
> > > >     [W] wait_for_completion_killable(&vfork:0)
> > > >     [E] complete(&vfork:0)
> > > > 
> > > > [S] (unknown)(&vfork:0):
> > > > (N/A)
> > > > 
> > > > [W] wait_for_completion_killable(&vfork:0):
> > > > [<ffffffc00802204c>] kernel_clone+0x25c/0x2b8
> > > > stacktrace:
> > > >       dept_wait+0x74/0x88
> > > >       wait_for_completion_killable+0x60/0xa0
> > > >       kernel_clone+0x25c/0x2b8
> > > >       __do_sys_clone+0x5c/0x74
> > > >       __arm64_sys_clone+0x18/0x20
> > > >       invoke_syscall.constprop.0+0x78/0xc4
> > > >       do_el0_svc+0x98/0xd0
> > > >       el0_svc+0x44/0xe4
> > > >       el0t_64_sync_handler+0xb0/0x12c
> > > >       el0t_64_sync+0x158/0x15c
> > > > 
> > > > [E] complete(&vfork:0):
> > > > [<ffffffc00801f49c>] mm_release+0x7c/0x90
> > > > stacktrace:
> > > >       dept_event+0xe0/0x100
> > > >       complete+0x48/0x98
> > > >       mm_release+0x7c/0x90
> > > >       exit_mm_release+0xc/0x14
> > > >       do_exit+0x1b4/0x81c
> > > >       do_group_exit+0x30/0x9c
> > > >       __wake_up_parent+0x0/0x24
> > > >       invoke_syscall.constprop.0+0x78/0xc4
> > > >       do_el0_svc+0x98/0xd0
> > > >       el0_svc+0x44/0xe4
> > > >       el0t_64_sync_handler+0xb0/0x12c
> > > >       el0t_64_sync+0x158/0x15c
> > > > ---------------------------------------------------
> > > > information that might be helpful
> > > > ---------------------------------------------------
> > > > CPU: 6 PID: 229 Comm: start-stop-daem Tainted: G        W         5.17.0-rc1+ #3
> > > > Hardware name: linux,dummy-virt (DT)
> > > > Call trace:
> > > >  dump_backtrace.part.0+0x9c/0xc4
> > > >  show_stack+0x14/0x28
> > > >  dump_stack_lvl+0x9c/0xcc
> > > >  dump_stack+0x14/0x2c
> > > >  print_circle+0x2d4/0x438
> > > >  cb_check_dl+0x44/0x70
> > > >  bfs+0x60/0x168
> > > >  add_dep+0x88/0x11c
> > > >  do_event.constprop.0+0x19c/0x2c0
> > > >  dept_event+0xe0/0x100
> > > >  complete+0x48/0x98
> > > >  mm_release+0x7c/0x90
> > > >  exit_mm_release+0xc/0x14
> > > >  do_exit+0x1b4/0x81c
> > > >  do_group_exit+0x30/0x9c
> > > >  __wake_up_parent+0x0/0x24
> > > >  invoke_syscall.constprop.0+0x78/0xc4
> > > >  do_el0_svc+0x98/0xd0
> > > >  el0_svc+0x44/0xe4
> > > >  el0t_64_sync_handler+0xb0/0x12c
> > > >  el0t_64_sync+0x158/0x15c
> > > > 
> > > > 
> > > > 
> > > > 
> > > > ===================================================
> > > > DEPT: Circular dependency has been detected.
> > > > 5.17.0-rc1+ #3 Tainted: G        W        
> > > > ---------------------------------------------------
> > > > summary
> > > > ---------------------------------------------------
> > > > *** AA DEADLOCK ***
> > > > 
> > > > context A
> > > >     [S] (unknown)(&try_completion:0)
> > > >     [W] wait_for_completion_timeout(&try_completion:0)
> > > >     [E] complete(&try_completion:0)
> > > > 
> > > > [S]: start of the event context
> > > > [W]: the wait blocked
> > > > [E]: the event not reachable
> > > > ---------------------------------------------------
> > > > context A's detail
> > > > ---------------------------------------------------
> > > > context A
> > > >     [S] (unknown)(&try_completion:0)
> > > >     [W] wait_for_completion_timeout(&try_completion:0)
> > > >     [E] complete(&try_completion:0)
> > > > 
> > > > [S] (unknown)(&try_completion:0):
> > > > (N/A)
> > > > 
> > > > [W] wait_for_completion_timeout(&try_completion:0):
> > > > [<ffffffc008166bf4>] kunit_try_catch_run+0xb4/0x160
> > > > stacktrace:
> > > >       dept_wait+0x74/0x88
> > > >       wait_for_completion_timeout+0x64/0xa0
> > > >       kunit_try_catch_run+0xb4/0x160
> > > >       kunit_test_try_catch_successful_try_no_catch+0x3c/0x98
> > > >       kunit_try_run_case+0x9c/0xa0
> > > >       kunit_generic_run_threadfn_adapter+0x1c/0x28
> > > >       kthread+0xd4/0xe4
> > > >       ret_from_fork+0x10/0x20
> > > > 
> > > > [E] complete(&try_completion:0):
> > > > [<ffffffc00803dce4>] kthread_complete_and_exit+0x18/0x20
> > > > stacktrace:
> > > >       dept_event+0xe0/0x100
> > > >       complete+0x48/0x98
> > > >       kthread_complete_and_exit+0x18/0x20
> > > >       kunit_try_catch_throw+0x0/0x1c
> > > >       kthread+0xd4/0xe4
> > > >       ret_from_fork+0x10/0x20
> > > > 
> > > > ---------------------------------------------------
> > > > information that might be helpful
> > > > ---------------------------------------------------
> > > > CPU: 15 PID: 132 Comm: kunit_try_catch Tainted: G        W         5.17.0-rc1+ #3
> > > > Hardware name: linux,dummy-virt (DT)
> > > > Call trace:
> > > >  dump_backtrace.part.0+0x9c/0xc4
> > > >  show_stack+0x14/0x28
> > > >  dump_stack_lvl+0x9c/0xcc
> > > >  dump_stack+0x14/0x2c
> > > >  print_circle+0x2d4/0x438
> > > >  cb_check_dl+0x44/0x70
> > > >  bfs+0x60/0x168
> > > >  add_dep+0x88/0x11c
> > > >  do_event.constprop.0+0x19c/0x2c0
> > > >  dept_event+0xe0/0x100
> > > >  complete+0x48/0x98
> > > >  kthread_complete_and_exit+0x18/0x20
> > > >  kunit_try_catch_throw+0x0/0x1c
> > > >  kthread+0xd4/0xe4
> > > >  ret_from_fork+0x10/0x20
> > > > 
> > > > 
> > > > > Benifit:
> > > > > 
> > > > > 	0. Works with all lock primitives.
> > > > > 	1. Works with wait_for_completion()/complete().
> > > > > 	2. Works with 'wait' on PG_locked.
> > > > > 	3. Works with 'wait' on PG_writeback.
> > > > > 	4. Works with swait/wakeup.
> > > > > 	5. Works with waitqueue.
> > > > > 	6. Multiple reports are allowed.
> > > > > 	7. Deduplication control on multiple reports.
> > > > > 	8. Withstand false positives thanks to 6.
> > > > > 	9. Easy to tag any wait/event.
> > > > > 
> > > > > Future work:
> > > > 
> > > > [...]
> > > > 
> > > > > -- 
> > > > > 1.9.1
> > > > > 
> > > > 
> > > > -- 
> > > > Thank you, You are awesome!
> > > > Hyeonggon :-)