[00/16] ptrace: cleanups and calling do_cldstop with only siglock

Message ID 871qwq5ucx.fsf_-_@email.froward.int.ebiederm.org

Message

Eric W. Biederman May 18, 2022, 10:49 p.m. UTC
For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
ptrace_freeze_traced has completed successfully.  Which fundamentally
means the lock dance of dropping siglock and grabbing tasklist_lock does
not work on PREEMPT_RT.  So I have worked through what is necessary so
that tasklist_lock does not need to be grabbed in ptrace_stop after
siglock is dropped.

I have explored several alternate ways of getting there and along the
way I found a lot of small bug fixes/cleanups that don't necessarily
contribute to the final result but that are worthwhile on their own.  So
I have included those changes in this set of changes just so they don't
get lost.

In addition I had a conversation with Thomas Gleixner recently that
emphasized for me the need to reduce the hold times of tasklist_lock,
and that made me realize that in principle it is possible.
https://lkml.kernel.org/r/87mtfmhap2.fsf@email.froward.int.ebiederm.org

Which is a long way of saying that not taking tasklist_lock in
ptrace_stop is good not just for PREEMPT_RT but also for improving the
scalability of the kernel in general.

After this set of changes only cgroup_enter_frozen should remain a
stumbling block for PREEMPT_RT in the ptrace_stop path.

Eric W. Biederman (16):
      signal/alpha: Remove unused definition of TASK_REAL_PARENT
      signal/ia64: Remove unused definition of IA64_TASK_REAL_PARENT_OFFSET
      kdb: Use real_parent when displaying a list of processes
      powerpc/xmon:  Use real_parent when displaying a list of processes
      ptrace: Remove dead code from __ptrace_detach
      ptrace: Remove unnecessary locking in ptrace_(get|set)siginfo
      signal: Wake up the designated parent
      ptrace: Only populate last_siginfo from ptrace
      ptrace: In ptrace_setsiginfo deal with invalid si_signo
      ptrace: In ptrace_signal look at what the debugger did with siginfo
      ptrace: Use si_signo as the signal number to resume with
      ptrace: Stop protecting ptrace_set_signr with tasklist_lock
      ptrace: Document why ptrace_setoptions does not need a lock
      signal: Protect parent child relationships by childs siglock
      ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
      signal: Always call do_notify_parent_cldstop with siglock held

 arch/alpha/kernel/asm-offsets.c |   1 -
 arch/ia64/kernel/asm-offsets.c  |   1 -
 arch/powerpc/xmon/xmon.c        |   2 +-
 kernel/debug/kdb/kdb_main.c     |   2 +-
 kernel/exit.c                   |  23 +++-
 kernel/fork.c                   |  12 +-
 kernel/ptrace.c                 | 132 ++++++++----------
 kernel/signal.c                 | 296 ++++++++++++++++++++++++++--------------
 8 files changed, 279 insertions(+), 190 deletions(-)

Eric

Comments

Sebastian Andrzej Siewior May 19, 2022, 6:19 a.m. UTC | #1
On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> Is there a git branch somewhere I can pull to test this? It doesn't apply
> cleanly to Linus's tip.

https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19

> - Kyle

Sebastian
Eric W. Biederman May 19, 2022, 6:05 p.m. UTC | #2
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
>> Is there a git branch somewhere I can pull to test this? It doesn't apply
>> cleanly to Linus's tip.
>
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19

Yes that is the branch this all applies to.

This is my second round of cleanups this cycle for this code.
I just keep finding little things that deserve to be changed,
when I am working on the more substantial issues.

Eric
Kyle Huey May 20, 2022, 5:24 a.m. UTC | #3
On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>
> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
> >> cleanly to Linus's tip.
> >
> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
>
> Yes that is the branch this all applies to.
>
> This is my second round of cleanups this cycle for this code.
> I just keep finding little things that deserve to be changed,
> when I am working on the more substantial issues.
>
> Eric

When running the rr test suite, I see hangs like this

[  812.151505] watchdog: BUG: soft lockup - CPU#3 stuck for 548s! [condvar_stress-:12152]
[  812.151529] Modules linked in: snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio rfcomm cmac algif_hash algif_skcipher af_alg bnep dm_crypt intel_rapl_msr mei_hdcp snd_hda_codec_hdmi intel_rapl_common snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg snd_intel_sdw_acpi nls_iso8859_1 intel_powerclamp snd_hda_codec coretemp snd_hda_core snd_hwdep snd_pcm rtl8723be btcoexist snd_seq_midi snd_seq_midi_event rtl8723_common kvm_intel rtl_pci snd_rawmidi rtlwifi btusb btrtl btbcm snd_seq kvm mac80211 btintel btmtk snd_seq_device rapl bluetooth snd_timer intel_cstate hp_wmi cfg80211 serio_raw snd platform_profile ecdh_generic mei_me sparse_keymap efi_pstore wmi_bmof ee1004 joydev input_leds ecc libarc4 soundcore mei acpi_pad mac_hid sch_fq_codel ipmi_devintf ipmi_msghandler msr vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap vhci_hcd usbip_core parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress
[  812.151570]  libcrc32c hid_generic usbhid hid i915 drm_buddy i2c_algo_bit ttm drm_dp_helper cec rc_core crct10dif_pclmul drm_kms_helper crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect sysimgblt fb_sys_fops aesni_intel crypto_simd cryptd r8169 psmouse drm i2c_i801 realtek ahci i2c_smbus xhci_pci libahci xhci_pci_renesas wmi video
[  812.151584] CPU: 3 PID: 12152 Comm: condvar_stress- Tainted: G     I  L    5.18.0-rc1+ #2
[  812.151586] Hardware name: HP 750-280st/2B4B, BIOS A0.11 02/24/2016
[  812.151587] RIP: 0010:_raw_spin_unlock_irq+0x15/0x40
[  812.151591] Code: df e8 3f 1f 4a ff 90 5b 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 00 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 41 95 46 ff 65 8b 05 9a c1 9a 5f 85 c0 74 02 5d
[  812.151593] RSP: 0018:ffffa863c246bd70 EFLAGS: 00000246
[  812.151594] RAX: ffff8bc0913f6400 RBX: ffff8bc0913f6400 RCX: 0000000000000000
[  812.151595] RDX: 0000000000000002 RSI: 00000000000a0013 RDI: ffff8bc089b63180
[  812.151596] RBP: ffffa863c246bd70 R08: ffff8bc0811d6b40 R09: ffff8bc089b63180
[  812.151597] R10: 0000000000000000 R11: 0000000000000004 R12: ffff8bc0913f6400
[  812.151597] R13: ffff8bc089b63180 R14: ffff8bc0913f6400 R15: ffffa863c246be68
[  812.151598] FS:  00007f612dda5700(0000) GS:ffff8bc7e24c0000(0000) knlGS:0000000000000000
[  812.151599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  812.151600] CR2: 000055e70715692e CR3: 000000010b4e8005 CR4: 00000000003706e4
[  812.151601] Call Trace:
[  812.151602]  <TASK>
[  812.151604]  do_signal_stop+0x228/0x260
[  812.151606]  get_signal+0x43a/0x8e0
[  812.151608]  arch_do_signal_or_restart+0x37/0x7d0
[  812.151610]  ? __this_cpu_preempt_check+0x13/0x20
[  812.151612]  ? __perf_event_task_sched_in+0x81/0x230
[  812.151616]  ? __this_cpu_preempt_check+0x13/0x20
[  812.151617]  exit_to_user_mode_prepare+0x130/0x1a0
[  812.151620]  syscall_exit_to_user_mode+0x26/0x40
[  812.151621]  ret_from_fork+0x15/0x30
[  812.151623] RIP: 0033:0x7f612dfcd125
[  812.151625] Code: 48 85 ff 74 3d 48 85 f6 74 38 48 83 ee 10 48 89 4e 08 48 89 3e 48 89 d7 4c 89 c2 4d 89 c8 4c 8b 54 24 08 b8 38 00 00 00 0f 05 <48> 85 c0 7c 13 74 01 c3 31 ed 58 5f ff d0 48 89 c7 b8 3c 00 00 00
[  812.151626] RSP: 002b:00007f612dda4fb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  812.151628] RAX: 0000000000000000 RBX: 00007f612dda5700 RCX: ffffffffffffffff
[  812.151628] RDX: 00007f612dda59d0 RSI: 00007f612dda4fb0 RDI: 00000000003d0f00
[  812.151629] RBP: 00007ffd59ad20b0 R08: 00007f612dda5700 R09: 00007f612dda5700
[  812.151630] R10: 00007f612dda59d0 R11: 0000000000000246 R12: 00007ffd59ad20ae
[  812.151631] R13: 00007ffd59ad20af R14: 00007ffd59ad20b0 R15: 00007f612dda4fc0
[  812.151632]  </TASK>

- Kyle
Sebastian Andrzej Siewior May 20, 2022, 7:33 a.m. UTC | #4
On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> 
> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> ptrace_freeze_traced has completed successfully.  Which fundamentally
> means the lock dance of dropping siglock and grabbing tasklist_lock does
> not work on PREEMPT_RT.  So I have worked through what is necessary so
> that tasklist_lock does not need to be grabbed in ptrace_stop after
> siglock is dropped.
…
It took me a while to realise that this is a follow-up; I somehow assumed
that you had added a few patches on top. Might have been yesterday's
heat. b4 also refused to download this series because the v4 in this
thread looked newer… Anyway. Both series applied:

| =============================
| WARNING: suspicious RCU usage
| 5.18.0-rc7+ #16 Not tainted
| -----------------------------
| include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
|
| other info that might help us debug this:
|
| rcu_scheduler_active = 2, debug_locks = 1
| 2 locks held by ssdd/1734:
|  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
|  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
|
| stack backtrace:
| CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
| Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
| Call Trace:
|  <TASK>
|  dump_stack_lvl+0x45/0x5a
|  unlock_parents_siglocks+0xb6/0xc0
|  ptrace_stop+0xb9/0x390
|  get_signal+0x51c/0x8d0
|  arch_do_signal_or_restart+0x31/0x750
|  exit_to_user_mode_prepare+0x157/0x220
|  irqentry_exit_to_user_mode+0x5/0x50
|  asm_sysvec_apic_timer_interrupt+0x12/0x20

That is ptrace_parent() in unlock_parents_siglocks().

Sebastian
Sebastian Andrzej Siewior May 20, 2022, 9:19 a.m. UTC | #5
On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> After this set of changes only cgroup_enter_frozen should remain a
> stumbling block for PREEMPT_RT in the ptrace_stop path.

Yes, I can confirm that. I have no systemd-less system at hand which
means I can't boot a kernel without CGROUP support. But after removing
cgroup_{enter|leave}_frozen() in ptrace_stop() I don't see the problems
I saw earlier.

Sebastian
Eric W. Biederman May 20, 2022, 7:32 p.m. UTC | #6
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
>> 
>> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
>> ptrace_freeze_traced has completed successfully.  Which fundamentally
>> means the lock dance of dropping siglock and grabbing tasklist_lock does
>> not work on PREEMPT_RT.  So I have worked through what is necessary so
>> that tasklist_lock does not need to be grabbed in ptrace_stop after
>> siglock is dropped.
> …
> It took me a while to realise that this is a follow-up I somehow assumed
> that you added a few patches on top. Might have been the yesterday's
> heat. b4 also refused to download this series because the v4 in this
> thread looked newer… Anyway. Both series applied:
>
> | =============================
> | WARNING: suspicious RCU usage
> | 5.18.0-rc7+ #16 Not tainted
> | -----------------------------
> | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> |
> | other info that might help us debug this:
> |
> | rcu_scheduler_active = 2, debug_locks = 1
> | 2 locks held by ssdd/1734:
> |  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> |  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> |
> | stack backtrace:
> | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> | Call Trace:
> |  <TASK>
> |  dump_stack_lvl+0x45/0x5a
> |  unlock_parents_siglocks+0xb6/0xc0
> |  ptrace_stop+0xb9/0x390
> |  get_signal+0x51c/0x8d0
> |  arch_do_signal_or_restart+0x31/0x750
> |  exit_to_user_mode_prepare+0x157/0x220
> |  irqentry_exit_to_user_mode+0x5/0x50
> |  asm_sysvec_apic_timer_interrupt+0x12/0x20
>
> That is ptrace_parent() in unlock_parents_siglocks().

How odd.  I thought I had the appropriate lockdep config options enabled
in my test build to catch things like this.  I guess not.

Now I am trying to think how to tell it that holding the appropriate
siglock makes this ok.

Eric
Peter Zijlstra May 20, 2022, 7:58 p.m. UTC | #7
On Fri, May 20, 2022 at 02:32:24PM -0500, Eric W. Biederman wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> 
> > On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> >> 
> >> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> >> ptrace_freeze_traced has completed successfully.  Which fundamentally
> >> means the lock dance of dropping siglock and grabbing tasklist_lock does
> >> not work on PREEMPT_RT.  So I have worked through what is necessary so
> >> that tasklist_lock does not need to be grabbed in ptrace_stop after
> >> siglock is dropped.
> > …
> > It took me a while to realise that this is a follow-up I somehow assumed
> > that you added a few patches on top. Might have been the yesterday's
> > heat. b4 also refused to download this series because the v4 in this
> > thread looked newer… Anyway. Both series applied:
> >
> > | =============================
> > | WARNING: suspicious RCU usage
> > | 5.18.0-rc7+ #16 Not tainted
> > | -----------------------------
> > | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> > |
> > | other info that might help us debug this:
> > |
> > | rcu_scheduler_active = 2, debug_locks = 1
> > | 2 locks held by ssdd/1734:
> > |  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> > |  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> > |
> > | stack backtrace:
> > | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> > | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> > | Call Trace:
> > |  <TASK>
> > |  dump_stack_lvl+0x45/0x5a
> > |  unlock_parents_siglocks+0xb6/0xc0
> > |  ptrace_stop+0xb9/0x390
> > |  get_signal+0x51c/0x8d0
> > |  arch_do_signal_or_restart+0x31/0x750
> > |  exit_to_user_mode_prepare+0x157/0x220
> > |  irqentry_exit_to_user_mode+0x5/0x50
> > |  asm_sysvec_apic_timer_interrupt+0x12/0x20
> >
> > That is ptrace_parent() in unlock_parents_siglocks().
> 
> How odd.  I thought I had the appropriate lockdep config options enabled
> in my test build to catch things like this.  I guess not.
> 
> Now I am trying to think how to tell it that holding the appropriate
> siglock makes this ok.

The typical annotation is something like:

	rcu_dereference_protected(foo, lockdep_is_held(&bar))

Except in this case I think the problem is that bar depends on foo in
non-trivial ways. That is, foo is 'task->parent' and bar is
'task->parent->sighand->siglock' or something.

The other option is to use rcu_dereference_raw() in this one instance
and have a comment that explains the situation.
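
To make the two options concrete, here is a rough userspace model, not
kernel code. Every model_* name is hypothetical, the "held" flag is only
a single-threaded stand-in for what lockdep_is_held() tracks, and the
assumption that the parent's siglock is the protecting lock follows the
"or something" guess above rather than the actual patches. It only shows
why the checked form is circular: the condition has to read task->parent
before it can name the lock that is supposed to protect task->parent.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-ins only. Every model_* name is hypothetical; none of
 * this is the kernel's RCU or lockdep implementation.
 */
struct model_lock {
	pthread_mutex_t mutex;
	bool held;	/* crude, single-threaded substitute for lockdep_is_held() */
};

struct model_sighand {
	struct model_lock siglock;
};

struct model_task {
	struct model_task *parent;	/* "foo" in the message above */
	struct model_sighand *sighand;	/* sighand->siglock plays "bar" */
};

void model_lock_init(struct model_lock *l)
{
	pthread_mutex_init(&l->mutex, NULL);
	l->held = false;
}

void model_lock_acquire(struct model_lock *l)
{
	pthread_mutex_lock(&l->mutex);
	l->held = true;
}

void model_lock_release(struct model_lock *l)
{
	l->held = false;
	pthread_mutex_unlock(&l->mutex);
}

bool model_is_held(const struct model_lock *l)
{
	return l->held;
}

/* Checked read: trips an assertion when the stated condition is false. */
#define model_deref_protected(p, cond)	(assert(cond), (p))

/* Unchecked read: correctness rests on the comment at the call site. */
#define model_deref_raw(p)		(p)

/*
 * Circular form: the condition names the parent's siglock, so
 * task->parent is already read, unchecked, inside the condition itself.
 */
struct model_task *model_parent_checked(struct model_task *task)
{
	return model_deref_protected(task->parent,
		model_is_held(&task->parent->sighand->siglock));
}

/* Caller must hold the parent's siglock; nothing here verifies that. */
struct model_task *model_parent_raw(struct model_task *task)
{
	return model_deref_raw(task->parent);
}

/* Returns 0 if both accessors yield the parent while its siglock is held. */
int model_demo(void)
{
	struct model_sighand parent_sighand, child_sighand;
	struct model_task parent = { NULL, &parent_sighand };
	struct model_task child = { &parent, &child_sighand };
	int ok;

	model_lock_init(&parent_sighand.siglock);
	model_lock_init(&child_sighand.siglock);

	model_lock_acquire(&parent_sighand.siglock);
	ok = model_parent_checked(&child) == &parent &&
	     model_parent_raw(&child) == &parent;
	model_lock_release(&parent_sighand.siglock);

	return ok ? 0 : 1;
}
```

Calling model_demo() from a small main() exercises both accessors. In the
model the raw accessor is the simpler one, at the price of relying
entirely on its comment, which is the same trade-off as with
rcu_dereference_raw().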
Eric W. Biederman June 6, 2022, 4:12 p.m. UTC | #8
Kyle Huey <khuey@pernos.co> writes:

> On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>>
>> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
>> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
>> >> cleanly to Linus's tip.
>> >
>> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
>>
>> Yes that is the branch this all applies to.
>>
>> This is my second round of cleanups this cycle for this code.
>> I just keep finding little things that deserve to be changed,
>> when I am working on the more substantial issues.
>>
>> Eric
>
> When running the rr test suite, I see hangs like this

Thanks.  I will dig into this.

Is there an easy way I can run the rr test suite to see if I can
reproduce this myself?

Thanks,
Eric

Kyle Huey June 9, 2022, 7:59 p.m. UTC | #9
On Mon, Jun 6, 2022 at 9:12 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Kyle Huey <khuey@pernos.co> writes:
>
> > On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
> >>
> >> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> >>
> >> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> >> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
> >> >> cleanly to Linus's tip.
> >> >
> >> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
> >>
> >> Yes that is the branch this all applies to.
> >>
> >> This is my second round of cleanups this cycle for this code.
> >> I just keep finding little things that deserve to be changed,
> >> when I am working on the more substantial issues.
> >>
> >> Eric
> >
> > When running the rr test suite, I see hangs like this
>
> Thanks.  I will dig into this.
>
> Is there an easy way I can run the rr test suite to see if I can
> reproduce this myself?

It should be straightforward:
1. git clone https://github.com/rr-debugger/rr.git
2. mkdir obj-rr && cd obj-rr
3. cmake ../rr
4. make -jN
5. make check

If you have trouble with it feel free to email me off list.

- Kyle
