mbox series

[v3,0/6] mm: Further memory block device cleanups

Message ID 20190620183139.4352-1-david@redhat.com (mailing list archive)
Headers show
Series mm: Further memory block device cleanups | expand

Message

David Hildenbrand June 20, 2019, 6:31 p.m. UTC
@Andrew: Only patch 1, 4 and 6 changed compared to v1.

Some further cleanups around memory block devices. Especially, clean up
and simplify walk_memory_range(). Including some other minor cleanups.

Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.

v2 -> v3:
- "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
-- Avoid warning on ppc.
- "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
-- Fixup a comment regarding hinted devices.

v1 -> v2:
- "mm: Section numbers use the type "unsigned long""
-- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
- "drivers/base/memory.c: Get rid of find_memory_block_hinted("
-- Fix compilation error
-- Get rid of the "hint" parameter completely

David Hildenbrand (6):
  mm: Section numbers use the type "unsigned long"
  drivers/base/memory: Use "unsigned long" for block ids
  mm: Make register_mem_sect_under_node() static
  mm/memory_hotplug: Rename walk_memory_range() and pass start+size
    instead of pfns
  mm/memory_hotplug: Move and simplify walk_memory_blocks()
  drivers/base/memory.c: Get rid of find_memory_block_hinted()

 arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
 drivers/acpi/acpi_memhotplug.c            |  19 +---
 drivers/base/memory.c                     | 120 +++++++++++++---------
 drivers/base/node.c                       |   8 +-
 include/linux/memory.h                    |   5 +-
 include/linux/memory_hotplug.h            |   2 -
 include/linux/mmzone.h                    |   4 +-
 include/linux/node.h                      |   7 --
 mm/memory_hotplug.c                       |  57 +---------
 mm/sparse.c                               |  12 +--
 10 files changed, 106 insertions(+), 151 deletions(-)

Comments

Qian Cai June 21, 2019, 3:15 p.m. UTC | #1
On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
> @Andrew: Only patch 1, 4 and 6 changed compared to v1.
> 
> Some further cleanups around memory block devices. Especially, clean up
> and simplify walk_memory_range(). Including some other minor cleanups.
> 
> Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
> 
> v2 -> v3:
> - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
> -- Avoid warning on ppc.
> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
> -- Fixup a comment regarding hinted devices.
> 
> v1 -> v2:
> - "mm: Section numbers use the type "unsigned long""
> -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
> - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
> -- Fix compilation error
> -- Get rid of the "hint" parameter completely
> 
> David Hildenbrand (6):
>   mm: Section numbers use the type "unsigned long"
>   drivers/base/memory: Use "unsigned long" for block ids
>   mm: Make register_mem_sect_under_node() static
>   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
>     instead of pfns
>   mm/memory_hotplug: Move and simplify walk_memory_blocks()
>   drivers/base/memory.c: Get rid of find_memory_block_hinted()
> 
>  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
>  drivers/acpi/acpi_memhotplug.c            |  19 +---
>  drivers/base/memory.c                     | 120 +++++++++++++---------
>  drivers/base/node.c                       |   8 +-
>  include/linux/memory.h                    |   5 +-
>  include/linux/memory_hotplug.h            |   2 -
>  include/linux/mmzone.h                    |   4 +-
>  include/linux/node.h                      |   7 --
>  mm/memory_hotplug.c                       |  57 +---------
>  mm/sparse.c                               |  12 +--
>  10 files changed, 106 insertions(+), 151 deletions(-)
> 

This series causes a few machines are unable to boot triggering endless soft
lockups. Reverted those commits fixed the issue.

97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and pass
start+size instead of pfns"
c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
startsize-instead-of-pfns-fix"
34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify walk_memory_blocks()"
59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
find_memory_block_hinted()"
5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-find_memory_block_hinted-
v3"

[    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe ASPM,
so disable it
[    4.590405][    T1] ACPI: bus type PCI registered
[    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
0x80000000-0x8fffffff] (base 0x80000000)
[    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in
E820
[    4.601860][    T1] PCI: Using configuration type 1 for base access
[   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
[swapper/0:1]
[   28.671351][   C16] Modules linked in:
[   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-
next-20190621+ #1
[   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 03/09/2018
[   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
[   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 48 8b
55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d <65>
ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
[   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff13
[   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
ffffffffb6f2a3b8
[   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
ffff8882053d6138
[   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
ffffed1040a7ac27
[   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
0000000000000246
[   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
0000000000000000
[   28.761368][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
knlGS:0000000000000000
[   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
00000000001406a0
[   28.791333][   C16] Call Trace:
[   28.791374][   C16]  klist_next+0xd8/0x1c0
[   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
[   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
[   28.801370][   C16]  ? kobject_put+0x23/0x250
[   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
[   28.811353][   C16]  ? write_policy_show+0x40/0x40
[   28.821334][   C16]  link_mem_sections+0x7e/0xa0
[   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
[   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
[   28.831353][   C16]  topology_init+0xbf/0x126
[   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
[   28.841368][   C16]  do_one_initcall+0xfe/0x45a
[   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
[   28.851353][   C16]  ? kasan_check_write+0x14/0x20
[   28.861333][   C16]  ? up_write+0x75/0x140
[   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
[   28.871333][   C16]  ? rest_init+0x188/0x188
[   28.871353][   C16]  kernel_init+0x11/0x138
[   28.881363][   C16]  ? rest_init+0x188/0x188
[   28.881363][   C16]  ret_from_fork+0x22/0x40
[   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
[swapper/0:1]
[   56.671352][   C16] Modules linked in:
[   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
G             L    5.2.0-rc5-next-20190621+ #1
[   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 03/09/2018
[   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
[   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d 8b 7e
58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00 <75>
c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
[   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287 ORIG_RAX:
ffffffffffffff13
[   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
ffffffffb74c9dc1
[   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
ffff8888774ec3e0
[   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
ffffed1040a7ac27
[   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
0000000000085c1b
[   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
ffff8888774ec040
[   56.761372][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
knlGS:0000000000000000
[   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
00000000001406a0
[   56.791373][   C16] Call Trace:
[   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
[   56.801334][   C16]  ? kobject_put+0x23/0x250
[   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
[   56.811333][   C16]  ? write_policy_show+0x40/0x40
[   56.811353][   C16]  link_mem_sections+0x7e/0xa0
[   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
[   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
[   56.831333][   C16]  topology_init+0xbf/0x126
[   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
[   56.841334][   C16]  do_one_initcall+0xfe/0x45a
[   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
[   56.851333][   C16]  ? kasan_check_write+0x14/0x20
[   56.851354][   C16]  ? up_write+0x75/0x140
[   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
[   56.861333][   C16]  ? rest_init+0x188/0x188
[   56.861369][   C16]  kernel_init+0x11/0x138
[   56.871333][   C16]  ? rest_init+0x188/0x188
[   56.871354][   C16]  ret_from_fork+0x22/0x40
[   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
[   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
[   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
[   64.621334][   C16] NMI backtrace for cpu 16
[   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
G             L    5.2.0-rc5-next-20190621+ #1
[   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 03/09/2018
[   64.641371][   C16] Call Trace:
[   64.651337][   C16]  <IRQ>
[   64.651376][   C16]  dump_stack+0x62/0x9a
[   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
[   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
[   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
[   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
[   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
[   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
[   64.691336][   C16]  ? kasan_check_read+0x11/0x20
[   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
[   64.701336][   C16]  update_process_times+0x2f/0x60
[   64.701362][   C16]  tick_periodic+0x38/0xe0
[   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
[   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
[   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
[   64.721367][   C16]  </IRQ>
[   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
[   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53
David Hildenbrand June 21, 2019, 3:22 p.m. UTC | #2
On 21.06.19 17:15, Qian Cai wrote:
> On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
>> @Andrew: Only patch 1, 4 and 6 changed compared to v1.
>>
>> Some further cleanups around memory block devices. Especially, clean up
>> and simplify walk_memory_range(). Including some other minor cleanups.
>>
>> Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
>>
>> v2 -> v3:
>> - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
>> -- Avoid warning on ppc.
>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
>> -- Fixup a comment regarding hinted devices.
>>
>> v1 -> v2:
>> - "mm: Section numbers use the type "unsigned long""
>> -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
>> -- Fix compilation error
>> -- Get rid of the "hint" parameter completely
>>
>> David Hildenbrand (6):
>>   mm: Section numbers use the type "unsigned long"
>>   drivers/base/memory: Use "unsigned long" for block ids
>>   mm: Make register_mem_sect_under_node() static
>>   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
>>     instead of pfns
>>   mm/memory_hotplug: Move and simplify walk_memory_blocks()
>>   drivers/base/memory.c: Get rid of find_memory_block_hinted()
>>
>>  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
>>  drivers/acpi/acpi_memhotplug.c            |  19 +---
>>  drivers/base/memory.c                     | 120 +++++++++++++---------
>>  drivers/base/node.c                       |   8 +-
>>  include/linux/memory.h                    |   5 +-
>>  include/linux/memory_hotplug.h            |   2 -
>>  include/linux/mmzone.h                    |   4 +-
>>  include/linux/node.h                      |   7 --
>>  mm/memory_hotplug.c                       |  57 +---------
>>  mm/sparse.c                               |  12 +--
>>  10 files changed, 106 insertions(+), 151 deletions(-)
>>
> 
> This series causes a few machines are unable to boot triggering endless soft
> lockups. Reverted those commits fixed the issue.
> 
> 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and pass
> start+size instead of pfns"
> c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
> startsize-instead-of-pfns-fix"
> 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify walk_memory_blocks()"
> 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
> find_memory_block_hinted()"
> 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-find_memory_block_hinted-
> v3"
> 
> [    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe ASPM,
> so disable it
> [    4.590405][    T1] ACPI: bus type PCI registered
> [    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> 0x80000000-0x8fffffff] (base 0x80000000)
> [    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in
> E820
> [    4.601860][    T1] PCI: Using configuration type 1 for base access
> [   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> [swapper/0:1]
> [   28.671351][   C16] Modules linked in:
> [   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-
> next-20190621+ #1
> [   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> [   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 48 8b
> 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d <65>
> ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
> [   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff13
> [   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
> ffffffffb6f2a3b8
> [   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
> ffff8882053d6138
> [   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
> ffffed1040a7ac27
> [   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> 0000000000000246
> [   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
> 0000000000000000
> [   28.761368][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
> knlGS:0000000000000000
> [   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> 00000000001406a0
> [   28.791333][   C16] Call Trace:
> [   28.791374][   C16]  klist_next+0xd8/0x1c0
> [   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
> [   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
> [   28.801370][   C16]  ? kobject_put+0x23/0x250
> [   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
> [   28.811353][   C16]  ? write_policy_show+0x40/0x40
> [   28.821334][   C16]  link_mem_sections+0x7e/0xa0
> [   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> [   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
> [   28.831353][   C16]  topology_init+0xbf/0x126
> [   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> [   28.841368][   C16]  do_one_initcall+0xfe/0x45a
> [   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
> [   28.851353][   C16]  ? kasan_check_write+0x14/0x20
> [   28.861333][   C16]  ? up_write+0x75/0x140
> [   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
> [   28.871333][   C16]  ? rest_init+0x188/0x188
> [   28.871353][   C16]  kernel_init+0x11/0x138
> [   28.881363][   C16]  ? rest_init+0x188/0x188
> [   28.881363][   C16]  ret_from_fork+0x22/0x40
> [   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> [swapper/0:1]
> [   56.671352][   C16] Modules linked in:
> [   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> G             L    5.2.0-rc5-next-20190621+ #1
> [   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
> [   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d 8b 7e
> 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00 <75>
> c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
> [   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287 ORIG_RAX:
> ffffffffffffff13
> [   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
> ffffffffb74c9dc1
> [   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
> ffff8888774ec3e0
> [   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
> ffffed1040a7ac27
> [   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> 0000000000085c1b
> [   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
> ffff8888774ec040
> [   56.761372][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
> knlGS:0000000000000000
> [   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> 00000000001406a0
> [   56.791373][   C16] Call Trace:
> [   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
> [   56.801334][   C16]  ? kobject_put+0x23/0x250
> [   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
> [   56.811333][   C16]  ? write_policy_show+0x40/0x40
> [   56.811353][   C16]  link_mem_sections+0x7e/0xa0
> [   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> [   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
> [   56.831333][   C16]  topology_init+0xbf/0x126
> [   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> [   56.841334][   C16]  do_one_initcall+0xfe/0x45a
> [   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
> [   56.851333][   C16]  ? kasan_check_write+0x14/0x20
> [   56.851354][   C16]  ? up_write+0x75/0x140
> [   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
> [   56.861333][   C16]  ? rest_init+0x188/0x188
> [   56.861369][   C16]  kernel_init+0x11/0x138
> [   56.871333][   C16]  ? rest_init+0x188/0x188
> [   56.871354][   C16]  ret_from_fork+0x22/0x40
> [   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
> [   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
> idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
> [   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
> [   64.621334][   C16] NMI backtrace for cpu 16
> [   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> G             L    5.2.0-rc5-next-20190621+ #1
> [   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   64.641371][   C16] Call Trace:
> [   64.651337][   C16]  <IRQ>
> [   64.651376][   C16]  dump_stack+0x62/0x9a
> [   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
> [   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
> [   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
> [   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
> [   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
> [   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
> [   64.691336][   C16]  ? kasan_check_read+0x11/0x20
> [   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
> [   64.701336][   C16]  update_process_times+0x2f/0x60
> [   64.701362][   C16]  tick_periodic+0x38/0xe0
> [   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
> [   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
> [   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
> [   64.721367][   C16]  </IRQ>
> [   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> [   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 
> 

Thanks for the report. Man, this series is nastier than I thought. This
is making more noise than I was hoping for.

@Andrew can you revert patch 4-6 for now? I'll be on vacation soon and
don't want cleanups to constantly break things. Just nasty.
David Hildenbrand June 21, 2019, 6:24 p.m. UTC | #3
On 21.06.19 17:15, Qian Cai wrote:
> On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
>> @Andrew: Only patch 1, 4 and 6 changed compared to v1.
>>
>> Some further cleanups around memory block devices. Especially, clean up
>> and simplify walk_memory_range(). Including some other minor cleanups.
>>
>> Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
>>
>> v2 -> v3:
>> - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
>> -- Avoid warning on ppc.
>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
>> -- Fixup a comment regarding hinted devices.
>>
>> v1 -> v2:
>> - "mm: Section numbers use the type "unsigned long""
>> -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
>> -- Fix compilation error
>> -- Get rid of the "hint" parameter completely
>>
>> David Hildenbrand (6):
>>   mm: Section numbers use the type "unsigned long"
>>   drivers/base/memory: Use "unsigned long" for block ids
>>   mm: Make register_mem_sect_under_node() static
>>   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
>>     instead of pfns
>>   mm/memory_hotplug: Move and simplify walk_memory_blocks()
>>   drivers/base/memory.c: Get rid of find_memory_block_hinted()
>>
>>  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
>>  drivers/acpi/acpi_memhotplug.c            |  19 +---
>>  drivers/base/memory.c                     | 120 +++++++++++++---------
>>  drivers/base/node.c                       |   8 +-
>>  include/linux/memory.h                    |   5 +-
>>  include/linux/memory_hotplug.h            |   2 -
>>  include/linux/mmzone.h                    |   4 +-
>>  include/linux/node.h                      |   7 --
>>  mm/memory_hotplug.c                       |  57 +---------
>>  mm/sparse.c                               |  12 +--
>>  10 files changed, 106 insertions(+), 151 deletions(-)
>>
> 
> This series causes a few machines are unable to boot triggering endless soft
> lockups. Reverted those commits fixed the issue.
> 
> 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and pass
> start+size instead of pfns"
> c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
> startsize-instead-of-pfns-fix"
> 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify walk_memory_blocks()"
> 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
> find_memory_block_hinted()"
> 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-find_memory_block_hinted-
> v3"
> 
> [    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe ASPM,
> so disable it
> [    4.590405][    T1] ACPI: bus type PCI registered
> [    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> 0x80000000-0x8fffffff] (base 0x80000000)
> [    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in
> E820
> [    4.601860][    T1] PCI: Using configuration type 1 for base access
> [   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> [swapper/0:1]
> [   28.671351][   C16] Modules linked in:
> [   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-
> next-20190621+ #1
> [   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> [   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 48 8b
> 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d <65>
> ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
> [   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff13
> [   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
> ffffffffb6f2a3b8
> [   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
> ffff8882053d6138
> [   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
> ffffed1040a7ac27
> [   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> 0000000000000246
> [   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
> 0000000000000000
> [   28.761368][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
> knlGS:0000000000000000
> [   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> 00000000001406a0
> [   28.791333][   C16] Call Trace:
> [   28.791374][   C16]  klist_next+0xd8/0x1c0
> [   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
> [   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
> [   28.801370][   C16]  ? kobject_put+0x23/0x250
> [   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
> [   28.811353][   C16]  ? write_policy_show+0x40/0x40
> [   28.821334][   C16]  link_mem_sections+0x7e/0xa0
> [   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> [   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
> [   28.831353][   C16]  topology_init+0xbf/0x126
> [   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> [   28.841368][   C16]  do_one_initcall+0xfe/0x45a
> [   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
> [   28.851353][   C16]  ? kasan_check_write+0x14/0x20
> [   28.861333][   C16]  ? up_write+0x75/0x140
> [   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
> [   28.871333][   C16]  ? rest_init+0x188/0x188
> [   28.871353][   C16]  kernel_init+0x11/0x138
> [   28.881363][   C16]  ? rest_init+0x188/0x188
> [   28.881363][   C16]  ret_from_fork+0x22/0x40
> [   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> [swapper/0:1]
> [   56.671352][   C16] Modules linked in:
> [   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> G             L    5.2.0-rc5-next-20190621+ #1
> [   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
> [   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d 8b 7e
> 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00 <75>
> c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
> [   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287 ORIG_RAX:
> ffffffffffffff13
> [   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
> ffffffffb74c9dc1
> [   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
> ffff8888774ec3e0
> [   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
> ffffed1040a7ac27
> [   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> 0000000000085c1b
> [   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
> ffff8888774ec040
> [   56.761372][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
> knlGS:0000000000000000
> [   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> 00000000001406a0
> [   56.791373][   C16] Call Trace:
> [   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
> [   56.801334][   C16]  ? kobject_put+0x23/0x250
> [   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
> [   56.811333][   C16]  ? write_policy_show+0x40/0x40
> [   56.811353][   C16]  link_mem_sections+0x7e/0xa0
> [   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> [   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
> [   56.831333][   C16]  topology_init+0xbf/0x126
> [   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> [   56.841334][   C16]  do_one_initcall+0xfe/0x45a
> [   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
> [   56.851333][   C16]  ? kasan_check_write+0x14/0x20
> [   56.851354][   C16]  ? up_write+0x75/0x140
> [   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
> [   56.861333][   C16]  ? rest_init+0x188/0x188
> [   56.861369][   C16]  kernel_init+0x11/0x138
> [   56.871333][   C16]  ? rest_init+0x188/0x188
> [   56.871354][   C16]  ret_from_fork+0x22/0x40
> [   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
> [   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
> idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
> [   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
> [   64.621334][   C16] NMI backtrace for cpu 16
> [   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> G             L    5.2.0-rc5-next-20190621+ #1
> [   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
> Gen10, BIOS A40 03/09/2018
> [   64.641371][   C16] Call Trace:
> [   64.651337][   C16]  <IRQ>
> [   64.651376][   C16]  dump_stack+0x62/0x9a
> [   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
> [   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
> [   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
> [   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
> [   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
> [   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
> [   64.691336][   C16]  ? kasan_check_read+0x11/0x20
> [   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
> [   64.701336][   C16]  update_process_times+0x2f/0x60
> [   64.701362][   C16]  tick_periodic+0x38/0xe0
> [   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
> [   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
> [   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
> [   64.721367][   C16]  </IRQ>
> [   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> [   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 
> 

@Qian Cai, unfortunately I can't reproduce.

If you get the chance, it would be great if you could retry with

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 972c5336bebf..742f99ddd148 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned
long size,
        unsigned long block_id;
        int ret = 0;

+       if (!size)
+               return;
+
        for (block_id = start_block_id; block_id <= end_block_id;
block_id++) {
                mem = find_memory_block_by_id(block_id);
                if (!mem)



If both, start and size are 0, we would get a veeeery long loop. This
would mean that we have an online node that does not span any pages at
all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).
David Hildenbrand June 21, 2019, 6:56 p.m. UTC | #4
On 21.06.19 20:24, David Hildenbrand wrote:
> On 21.06.19 17:15, Qian Cai wrote:
>> On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
>>> @Andrew: Only patch 1, 4 and 6 changed compared to v1.
>>>
>>> Some further cleanups around memory block devices. Especially, clean up
>>> and simplify walk_memory_range(). Including some other minor cleanups.
>>>
>>> Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
>>>
>>> v2 -> v3:
>>> - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
>>> -- Avoid warning on ppc.
>>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
>>> -- Fixup a comment regarding hinted devices.
>>>
>>> v1 -> v2:
>>> - "mm: Section numbers use the type "unsigned long""
>>> -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
>>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
>>> -- Fix compilation error
>>> -- Get rid of the "hint" parameter completely
>>>
>>> David Hildenbrand (6):
>>>   mm: Section numbers use the type "unsigned long"
>>>   drivers/base/memory: Use "unsigned long" for block ids
>>>   mm: Make register_mem_sect_under_node() static
>>>   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
>>>     instead of pfns
>>>   mm/memory_hotplug: Move and simplify walk_memory_blocks()
>>>   drivers/base/memory.c: Get rid of find_memory_block_hinted()
>>>
>>>  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
>>>  drivers/acpi/acpi_memhotplug.c            |  19 +---
>>>  drivers/base/memory.c                     | 120 +++++++++++++---------
>>>  drivers/base/node.c                       |   8 +-
>>>  include/linux/memory.h                    |   5 +-
>>>  include/linux/memory_hotplug.h            |   2 -
>>>  include/linux/mmzone.h                    |   4 +-
>>>  include/linux/node.h                      |   7 --
>>>  mm/memory_hotplug.c                       |  57 +---------
>>>  mm/sparse.c                               |  12 +--
>>>  10 files changed, 106 insertions(+), 151 deletions(-)
>>>
>>
>> This series causes a few machines are unable to boot triggering endless soft
>> lockups. Reverted those commits fixed the issue.
>>
>> 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and pass
>> start+size instead of pfns"
>> c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
>> startsize-instead-of-pfns-fix"
>> 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify walk_memory_blocks()"
>> 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
>> find_memory_block_hinted()"
>> 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-find_memory_block_hinted-
>> v3"
>>
>> [    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe ASPM,
>> so disable it
>> [    4.590405][    T1] ACPI: bus type PCI registered
>> [    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
>> 0x80000000-0x8fffffff] (base 0x80000000)
>> [    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in
>> E820
>> [    4.601860][    T1] PCI: Using configuration type 1 for base access
>> [   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
>> [swapper/0:1]
>> [   28.671351][   C16] Modules linked in:
>> [   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-
>> next-20190621+ #1
>> [   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
>> Gen10, BIOS A40 03/09/2018
>> [   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
>> [   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 48 8b
>> 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d <65>
>> ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
>> [   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246 ORIG_RAX:
>> ffffffffffffff13
>> [   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
>> ffffffffb6f2a3b8
>> [   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
>> ffff8882053d6138
>> [   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
>> ffffed1040a7ac27
>> [   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
>> 0000000000000246
>> [   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
>> 0000000000000000
>> [   28.761368][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
>> knlGS:0000000000000000
>> [   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
>> 00000000001406a0
>> [   28.791333][   C16] Call Trace:
>> [   28.791374][   C16]  klist_next+0xd8/0x1c0
>> [   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
>> [   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
>> [   28.801370][   C16]  ? kobject_put+0x23/0x250
>> [   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
>> [   28.811353][   C16]  ? write_policy_show+0x40/0x40
>> [   28.821334][   C16]  link_mem_sections+0x7e/0xa0
>> [   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
>> [   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
>> [   28.831353][   C16]  topology_init+0xbf/0x126
>> [   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
>> [   28.841368][   C16]  do_one_initcall+0xfe/0x45a
>> [   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
>> [   28.851353][   C16]  ? kasan_check_write+0x14/0x20
>> [   28.861333][   C16]  ? up_write+0x75/0x140
>> [   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
>> [   28.871333][   C16]  ? rest_init+0x188/0x188
>> [   28.871353][   C16]  kernel_init+0x11/0x138
>> [   28.881363][   C16]  ? rest_init+0x188/0x188
>> [   28.881363][   C16]  ret_from_fork+0x22/0x40
>> [   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
>> [swapper/0:1]
>> [   56.671352][   C16] Modules linked in:
>> [   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
>> G             L    5.2.0-rc5-next-20190621+ #1
>> [   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
>> Gen10, BIOS A40 03/09/2018
>> [   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
>> [   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d 8b 7e
>> 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00 <75>
>> c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
>> [   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287 ORIG_RAX:
>> ffffffffffffff13
>> [   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
>> ffffffffb74c9dc1
>> [   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
>> ffff8888774ec3e0
>> [   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
>> ffffed1040a7ac27
>> [   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
>> 0000000000085c1b
>> [   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
>> ffff8888774ec040
>> [   56.761372][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
>> knlGS:0000000000000000
>> [   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
>> 00000000001406a0
>> [   56.791373][   C16] Call Trace:
>> [   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
>> [   56.801334][   C16]  ? kobject_put+0x23/0x250
>> [   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
>> [   56.811333][   C16]  ? write_policy_show+0x40/0x40
>> [   56.811353][   C16]  link_mem_sections+0x7e/0xa0
>> [   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
>> [   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
>> [   56.831333][   C16]  topology_init+0xbf/0x126
>> [   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
>> [   56.841334][   C16]  do_one_initcall+0xfe/0x45a
>> [   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
>> [   56.851333][   C16]  ? kasan_check_write+0x14/0x20
>> [   56.851354][   C16]  ? up_write+0x75/0x140
>> [   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
>> [   56.861333][   C16]  ? rest_init+0x188/0x188
>> [   56.861369][   C16]  kernel_init+0x11/0x138
>> [   56.871333][   C16]  ? rest_init+0x188/0x188
>> [   56.871354][   C16]  ret_from_fork+0x22/0x40
>> [   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
>> [   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
>> idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
>> [   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
>> [   64.621334][   C16] NMI backtrace for cpu 16
>> [   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
>> G             L    5.2.0-rc5-next-20190621+ #1
>> [   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
>> Gen10, BIOS A40 03/09/2018
>> [   64.641371][   C16] Call Trace:
>> [   64.651337][   C16]  <IRQ>
>> [   64.651376][   C16]  dump_stack+0x62/0x9a
>> [   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
>> [   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
>> [   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
>> [   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
>> [   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
>> [   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
>> [   64.691336][   C16]  ? kasan_check_read+0x11/0x20
>> [   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
>> [   64.701336][   C16]  update_process_times+0x2f/0x60
>> [   64.701362][   C16]  tick_periodic+0x38/0xe0
>> [   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
>> [   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
>> [   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
>> [   64.721367][   C16]  </IRQ>
>> [   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
>> [   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 
>>
> 
> @Qian Cai, unfortunately I can't reproduce.
> 
> If you get the chance, it would be great if you could retry with
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 972c5336bebf..742f99ddd148 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned
> long size,
>         unsigned long block_id;
>         int ret = 0;
> 
> +       if (!size)
> +               return;
> +
>         for (block_id = start_block_id; block_id <= end_block_id;
> block_id++) {
>                 mem = find_memory_block_by_id(block_id);
>                 if (!mem)
> 
> 
> 
> If both, start and size are 0, we would get a veeeery long loop. This
> would mean that we have an online node that does not span any pages at
> all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).
> 


...trying to reproduce with QEMU (setting 0MB for the second node):

qemu-system-x86_64 --enable-kvm -m 4G,maxmem=20G,slots=2 \
	-smp sockets=2,cores=1 \
	-numa node,nodeid=0,cpus=0,mem=4G \
	-numa node,nodeid=1,cpus=1,mem=0 ...

I can indeed see that the node is online and
"pgdat->node_start_pfn == 0 && start_pfn + pgdat->node_spanned_pages == 0".

However, the kernel segfaults in an unrelated code path, so I can't
verify if this solves this problem:

[    0.313284] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[    0.313479] #PF: supervisor read access in kernel mode
[    0.313479] #PF: error_code(0x0000) - not-present page
[    0.313479] PGD 0 P4D 0 
[    0.313479] Oops: 0000 [#1] SMP PTI
[    0.313479] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-20190620+ #56
[    0.313479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
[    0.313479] RIP: 0010:bus_add_device+0x59/0x110
[    0.313479] Code: 20 48 89 df e8 f8 b4 ff ff 41 89 c4 85 c0 0f 85 81 00 00 00 48 8b 53 50 48 85 d2 75 03 48 8b 135
[    0.313479] RSP: 0000:ffffb4a6c0013e20 EFLAGS: 00010246
[    0.313479] RAX: 0000000000000000 RBX: ffff8b61bac23800 RCX: 0000000000000000
[    0.313479] RDX: ffff8b61bac29038 RSI: ffff8b61bac23800 RDI: ffff8b61bac23800
[    0.313479] RBP: ffffffff9d2f4500 R08: 0000000000000000 R09: 0000000000000001
[    0.313479] R10: 0000000000000000 R11: ffff8b61bad20878 R12: 0000000000000000
[    0.313479] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.313479] FS:  0000000000000000(0000) GS:ffff8b61bba00000(0000) knlGS:0000000000000000
[    0.313479] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.313479] CR2: 00000000000000a0 CR3: 0000000013c24000 CR4: 00000000000006f0
[    0.313479] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.313479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.313479] Call Trace:
[    0.313479]  device_add+0x304/0x660
[    0.313479]  ? __init_waitqueue_head+0x31/0x50
[    0.313479]  __register_one_node+0x67/0x170
[    0.313479]  __try_online_node.cold+0x3e/0x78
[    0.313479]  try_online_node+0x25/0x40
[    0.313479]  do_cpu_up+0x36/0xc0
[    0.313479]  smp_init+0x59/0xb3
[    0.313479]  kernel_init_freeable+0x11a/0x247
[    0.313479]  ? rest_init+0x23f/0x23f
[    0.313479]  kernel_init+0x5/0xf1
[    0.313479]  ret_from_fork+0x3a/0x50
[    0.313479] Modules linked in:

Figuring out what goes wrong here (maybe QEMU creating a weird
system configuration) is a different journey :)
Qian Cai June 21, 2019, 7:07 p.m. UTC | #5
On Fri, 2019-06-21 at 20:56 +0200, David Hildenbrand wrote:
> On 21.06.19 20:24, David Hildenbrand wrote:
> > On 21.06.19 17:15, Qian Cai wrote:
> > > On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
> > > > @Andrew: Only patch 1, 4 and 6 changed compared to v1.
> > > > 
> > > > Some further cleanups around memory block devices. Especially, clean up
> > > > and simplify walk_memory_range(). Including some other minor cleanups.
> > > > 
> > > > Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
> > > > 
> > > > v2 -> v3:
> > > > - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
> > > > -- Avoid warning on ppc.
> > > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
> > > > -- Fixup a comment regarding hinted devices.
> > > > 
> > > > v1 -> v2:
> > > > - "mm: Section numbers use the type "unsigned long""
> > > > -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
> > > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
> > > > -- Fix compilation error
> > > > -- Get rid of the "hint" parameter completely
> > > > 
> > > > David Hildenbrand (6):
> > > >   mm: Section numbers use the type "unsigned long"
> > > >   drivers/base/memory: Use "unsigned long" for block ids
> > > >   mm: Make register_mem_sect_under_node() static
> > > >   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
> > > >     instead of pfns
> > > >   mm/memory_hotplug: Move and simplify walk_memory_blocks()
> > > >   drivers/base/memory.c: Get rid of find_memory_block_hinted()
> > > > 
> > > >  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
> > > >  drivers/acpi/acpi_memhotplug.c            |  19 +---
> > > >  drivers/base/memory.c                     | 120 +++++++++++++---------
> > > >  drivers/base/node.c                       |   8 +-
> > > >  include/linux/memory.h                    |   5 +-
> > > >  include/linux/memory_hotplug.h            |   2 -
> > > >  include/linux/mmzone.h                    |   4 +-
> > > >  include/linux/node.h                      |   7 --
> > > >  mm/memory_hotplug.c                       |  57 +---------
> > > >  mm/sparse.c                               |  12 +--
> > > >  10 files changed, 106 insertions(+), 151 deletions(-)
> > > > 
> > > 
> > > This series causes a few machines are unable to boot triggering endless
> > > soft
> > > lockups. Reverted those commits fixed the issue.
> > > 
> > > 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and
> > > pass
> > > start+size instead of pfns"
> > > c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
> > > startsize-instead-of-pfns-fix"
> > > 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify
> > > walk_memory_blocks()"
> > > 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
> > > find_memory_block_hinted()"
> > > 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-
> > > find_memory_block_hinted-
> > > v3"
> > > 
> > > [    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe
> > > ASPM,
> > > so disable it
> > > [    4.590405][    T1] ACPI: bus type PCI registered
> > > [    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> > > 0x80000000-0x8fffffff] (base 0x80000000)
> > > [    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff]
> > > reserved in
> > > E820
> > > [    4.601860][    T1] PCI: Using configuration type 1 for base access
> > > [   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> > > [swapper/0:1]
> > > [   28.671351][   C16] Modules linked in:
> > > [   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-
> > > rc5-
> > > next-20190621+ #1
> > > [   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > > [   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53
> > > 48 8b
> > > 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d
> > > <65>
> > > ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
> > > [   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246
> > > ORIG_RAX:
> > > ffffffffffffff13
> > > [   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
> > > ffffffffb6f2a3b8
> > > [   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
> > > ffff8882053d6138
> > > [   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
> > > ffffed1040a7ac27
> > > [   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> > > 0000000000000246
> > > [   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
> > > 0000000000000000
> > > [   28.761368][   C16] FS:  0000000000000000(0000)
> > > GS:ffff888454500000(0000)
> > > knlGS:0000000000000000
> > > [   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> > > 00000000001406a0
> > > [   28.791333][   C16] Call Trace:
> > > [   28.791374][   C16]  klist_next+0xd8/0x1c0
> > > [   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
> > > [   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
> > > [   28.801370][   C16]  ? kobject_put+0x23/0x250
> > > [   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
> > > [   28.811353][   C16]  ? write_policy_show+0x40/0x40
> > > [   28.821334][   C16]  link_mem_sections+0x7e/0xa0
> > > [   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> > > [   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
> > > [   28.831353][   C16]  topology_init+0xbf/0x126
> > > [   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> > > [   28.841368][   C16]  do_one_initcall+0xfe/0x45a
> > > [   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
> > > [   28.851353][   C16]  ? kasan_check_write+0x14/0x20
> > > [   28.861333][   C16]  ? up_write+0x75/0x140
> > > [   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
> > > [   28.871333][   C16]  ? rest_init+0x188/0x188
> > > [   28.871353][   C16]  kernel_init+0x11/0x138
> > > [   28.881363][   C16]  ? rest_init+0x188/0x188
> > > [   28.881363][   C16]  ret_from_fork+0x22/0x40
> > > [   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> > > [swapper/0:1]
> > > [   56.671352][   C16] Modules linked in:
> > > [   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> > > G             L    5.2.0-rc5-next-20190621+ #1
> > > [   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
> > > [   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d
> > > 8b 7e
> > > 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00
> > > <75>
> > > c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
> > > [   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287
> > > ORIG_RAX:
> > > ffffffffffffff13
> > > [   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
> > > ffffffffb74c9dc1
> > > [   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
> > > ffff8888774ec3e0
> > > [   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
> > > ffffed1040a7ac27
> > > [   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> > > 0000000000085c1b
> > > [   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
> > > ffff8888774ec040
> > > [   56.761372][   C16] FS:  0000000000000000(0000)
> > > GS:ffff888454500000(0000)
> > > knlGS:0000000000000000
> > > [   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> > > 00000000001406a0
> > > [   56.791373][   C16] Call Trace:
> > > [   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
> > > [   56.801334][   C16]  ? kobject_put+0x23/0x250
> > > [   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
> > > [   56.811333][   C16]  ? write_policy_show+0x40/0x40
> > > [   56.811353][   C16]  link_mem_sections+0x7e/0xa0
> > > [   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> > > [   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
> > > [   56.831333][   C16]  topology_init+0xbf/0x126
> > > [   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> > > [   56.841334][   C16]  do_one_initcall+0xfe/0x45a
> > > [   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
> > > [   56.851333][   C16]  ? kasan_check_write+0x14/0x20
> > > [   56.851354][   C16]  ? up_write+0x75/0x140
> > > [   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
> > > [   56.861333][   C16]  ? rest_init+0x188/0x188
> > > [   56.861369][   C16]  kernel_init+0x11/0x138
> > > [   56.871333][   C16]  ? rest_init+0x188/0x188
> > > [   56.871354][   C16]  ret_from_fork+0x22/0x40
> > > [   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
> > > [   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
> > > idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
> > > [   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
> > > [   64.621334][   C16] NMI backtrace for cpu 16
> > > [   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> > > G             L    5.2.0-rc5-next-20190621+ #1
> > > [   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > > DL385
> > > Gen10, BIOS A40 03/09/2018
> > > [   64.641371][   C16] Call Trace:
> > > [   64.651337][   C16]  <IRQ>
> > > [   64.651376][   C16]  dump_stack+0x62/0x9a
> > > [   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
> > > [   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
> > > [   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
> > > [   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
> > > [   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
> > > [   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
> > > [   64.691336][   C16]  ? kasan_check_read+0x11/0x20
> > > [   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
> > > [   64.701336][   C16]  update_process_times+0x2f/0x60
> > > [   64.701362][   C16]  tick_periodic+0x38/0xe0
> > > [   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
> > > [   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
> > > [   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
> > > [   64.721367][   C16]  </IRQ>
> > > [   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > > [   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 
> > > 
> > 
> > @Qian Cai, unfortunately I can't reproduce.
> > 
> > If you get the chance, it would be great if you could retry with
> > 
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index 972c5336bebf..742f99ddd148 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned
> > long size,
> >         unsigned long block_id;
> >         int ret = 0;
> > 
> > +       if (!size)
> > +               return;
> > +
> >         for (block_id = start_block_id; block_id <= end_block_id;
> > block_id++) {
> >                 mem = find_memory_block_by_id(block_id);
> >                 if (!mem)
> > 
> > 
> > 
> > If both, start and size are 0, we would get a veeeery long loop. This
> > would mean that we have an online node that does not span any pages at
> > all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).
> > 
> 
> 
> ...trying to reproduce with QEMU (setting 0MB for the second node):
> 
> qemu-system-x86_64 --enable-kvm -m 4G,maxmem=20G,slots=2 \
> 	-smp sockets=2,cores=1 \
> 	-numa node,nodeid=0,cpus=0,mem=4G \
> 	-numa node,nodeid=1,cpus=1,mem=0 ...
> 
> I can indeed see that the node is online and
> "pgdat->node_start_pfn == 0 && start_pfn + pgdat->node_spanned_pages == 0".
> 
> However, the kernel segfaults in an unrelated code path, so I can't
> verify if this solves this problem:
> 
> [    0.313284] BUG: kernel NULL pointer dereference, address: 00000000000000a0
> [    0.313479] #PF: supervisor read access in kernel mode
> [    0.313479] #PF: error_code(0x0000) - not-present page
> [    0.313479] PGD 0 P4D 0 
> [    0.313479] Oops: 0000 [#1] SMP PTI
> [    0.313479] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-
> 20190620+ #56
> [    0.313479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
> [    0.313479] RIP: 0010:bus_add_device+0x59/0x110
> [    0.313479] Code: 20 48 89 df e8 f8 b4 ff ff 41 89 c4 85 c0 0f 85 81 00 00
> 00 48 8b 53 50 48 85 d2 75 03 48 8b 135
> [    0.313479] RSP: 0000:ffffb4a6c0013e20 EFLAGS: 00010246
> [    0.313479] RAX: 0000000000000000 RBX: ffff8b61bac23800 RCX:
> 0000000000000000
> [    0.313479] RDX: ffff8b61bac29038 RSI: ffff8b61bac23800 RDI:
> ffff8b61bac23800
> [    0.313479] RBP: ffffffff9d2f4500 R08: 0000000000000000 R09:
> 0000000000000001
> [    0.313479] R10: 0000000000000000 R11: ffff8b61bad20878 R12:
> 0000000000000000
> [    0.313479] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> [    0.313479] FS:  0000000000000000(0000) GS:ffff8b61bba00000(0000)
> knlGS:0000000000000000
> [    0.313479] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.313479] CR2: 00000000000000a0 CR3: 0000000013c24000 CR4:
> 00000000000006f0
> [    0.313479] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [    0.313479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [    0.313479] Call Trace:
> [    0.313479]  device_add+0x304/0x660
> [    0.313479]  ? __init_waitqueue_head+0x31/0x50
> [    0.313479]  __register_one_node+0x67/0x170
> [    0.313479]  __try_online_node.cold+0x3e/0x78
> [    0.313479]  try_online_node+0x25/0x40
> [    0.313479]  do_cpu_up+0x36/0xc0
> [    0.313479]  smp_init+0x59/0xb3
> [    0.313479]  kernel_init_freeable+0x11a/0x247
> [    0.313479]  ? rest_init+0x23f/0x23f
> [    0.313479]  kernel_init+0x5/0xf1
> [    0.313479]  ret_from_fork+0x3a/0x50
> [    0.313479] Modules linked in:
> 
> Figuring out what goes wrong here (maybe QEMU creating a weird
> system configuration) is a different journey :)
> 

That is a separate issue need to revert,

"x86, numa: always initialize all possible nodes"

and then, you should be able to reproduce.
David Hildenbrand June 21, 2019, 7:25 p.m. UTC | #6
On 21.06.19 21:07, Qian Cai wrote:
> On Fri, 2019-06-21 at 20:56 +0200, David Hildenbrand wrote:
>> On 21.06.19 20:24, David Hildenbrand wrote:
>>> On 21.06.19 17:15, Qian Cai wrote:
>>>> On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
>>>>> @Andrew: Only patch 1, 4 and 6 changed compared to v1.
>>>>>
>>>>> Some further cleanups around memory block devices. Especially, clean up
>>>>> and simplify walk_memory_range(). Including some other minor cleanups.
>>>>>
>>>>> Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
>>>>>
>>>>> v2 -> v3:
>>>>> - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
>>>>> -- Avoid warning on ppc.
>>>>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
>>>>> -- Fixup a comment regarding hinted devices.
>>>>>
>>>>> v1 -> v2:
>>>>> - "mm: Section numbers use the type "unsigned long""
>>>>> -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
>>>>> - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
>>>>> -- Fix compilation error
>>>>> -- Get rid of the "hint" parameter completely
>>>>>
>>>>> David Hildenbrand (6):
>>>>>   mm: Section numbers use the type "unsigned long"
>>>>>   drivers/base/memory: Use "unsigned long" for block ids
>>>>>   mm: Make register_mem_sect_under_node() static
>>>>>   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
>>>>>     instead of pfns
>>>>>   mm/memory_hotplug: Move and simplify walk_memory_blocks()
>>>>>   drivers/base/memory.c: Get rid of find_memory_block_hinted()
>>>>>
>>>>>  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
>>>>>  drivers/acpi/acpi_memhotplug.c            |  19 +---
>>>>>  drivers/base/memory.c                     | 120 +++++++++++++---------
>>>>>  drivers/base/node.c                       |   8 +-
>>>>>  include/linux/memory.h                    |   5 +-
>>>>>  include/linux/memory_hotplug.h            |   2 -
>>>>>  include/linux/mmzone.h                    |   4 +-
>>>>>  include/linux/node.h                      |   7 --
>>>>>  mm/memory_hotplug.c                       |  57 +---------
>>>>>  mm/sparse.c                               |  12 +--
>>>>>  10 files changed, 106 insertions(+), 151 deletions(-)
>>>>>
>>>>
>>>> This series causes a few machines are unable to boot triggering endless
>>>> soft
>>>> lockups. Reverted those commits fixed the issue.
>>>>
>>>> 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and
>>>> pass
>>>> start+size instead of pfns"
>>>> c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
>>>> startsize-instead-of-pfns-fix"
>>>> 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify
>>>> walk_memory_blocks()"
>>>> 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
>>>> find_memory_block_hinted()"
>>>> 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-
>>>> find_memory_block_hinted-
>>>> v3"
>>>>
>>>> [    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe
>>>> ASPM,
>>>> so disable it
>>>> [    4.590405][    T1] ACPI: bus type PCI registered
>>>> [    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
>>>> 0x80000000-0x8fffffff] (base 0x80000000)
>>>> [    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff]
>>>> reserved in
>>>> E820
>>>> [    4.601860][    T1] PCI: Using configuration type 1 for base access
>>>> [   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
>>>> [swapper/0:1]
>>>> [   28.671351][   C16] Modules linked in:
>>>> [   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-
>>>> rc5-
>>>> next-20190621+ #1
>>>> [   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>> DL385
>>>> Gen10, BIOS A40 03/09/2018
>>>> [   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
>>>> [   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53
>>>> 48 8b
>>>> 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d
>>>> <65>
>>>> ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
>>>> [   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246
>>>> ORIG_RAX:
>>>> ffffffffffffff13
>>>> [   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
>>>> ffffffffb6f2a3b8
>>>> [   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
>>>> ffff8882053d6138
>>>> [   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
>>>> ffffed1040a7ac27
>>>> [   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
>>>> 0000000000000246
>>>> [   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
>>>> 0000000000000000
>>>> [   28.761368][   C16] FS:  0000000000000000(0000)
>>>> GS:ffff888454500000(0000)
>>>> knlGS:0000000000000000
>>>> [   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
>>>> 00000000001406a0
>>>> [   28.791333][   C16] Call Trace:
>>>> [   28.791374][   C16]  klist_next+0xd8/0x1c0
>>>> [   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
>>>> [   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
>>>> [   28.801370][   C16]  ? kobject_put+0x23/0x250
>>>> [   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
>>>> [   28.811353][   C16]  ? write_policy_show+0x40/0x40
>>>> [   28.821334][   C16]  link_mem_sections+0x7e/0xa0
>>>> [   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
>>>> [   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
>>>> [   28.831353][   C16]  topology_init+0xbf/0x126
>>>> [   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
>>>> [   28.841368][   C16]  do_one_initcall+0xfe/0x45a
>>>> [   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
>>>> [   28.851353][   C16]  ? kasan_check_write+0x14/0x20
>>>> [   28.861333][   C16]  ? up_write+0x75/0x140
>>>> [   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
>>>> [   28.871333][   C16]  ? rest_init+0x188/0x188
>>>> [   28.871353][   C16]  kernel_init+0x11/0x138
>>>> [   28.881363][   C16]  ? rest_init+0x188/0x188
>>>> [   28.881363][   C16]  ret_from_fork+0x22/0x40
>>>> [   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
>>>> [swapper/0:1]
>>>> [   56.671352][   C16] Modules linked in:
>>>> [   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
>>>> G             L    5.2.0-rc5-next-20190621+ #1
>>>> [   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>> DL385
>>>> Gen10, BIOS A40 03/09/2018
>>>> [   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
>>>> [   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d
>>>> 8b 7e
>>>> 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00
>>>> <75>
>>>> c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
>>>> [   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287
>>>> ORIG_RAX:
>>>> ffffffffffffff13
>>>> [   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
>>>> ffffffffb74c9dc1
>>>> [   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
>>>> ffff8888774ec3e0
>>>> [   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
>>>> ffffed1040a7ac27
>>>> [   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
>>>> 0000000000085c1b
>>>> [   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
>>>> ffff8888774ec040
>>>> [   56.761372][   C16] FS:  0000000000000000(0000)
>>>> GS:ffff888454500000(0000)
>>>> knlGS:0000000000000000
>>>> [   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
>>>> 00000000001406a0
>>>> [   56.791373][   C16] Call Trace:
>>>> [   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
>>>> [   56.801334][   C16]  ? kobject_put+0x23/0x250
>>>> [   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
>>>> [   56.811333][   C16]  ? write_policy_show+0x40/0x40
>>>> [   56.811353][   C16]  link_mem_sections+0x7e/0xa0
>>>> [   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
>>>> [   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
>>>> [   56.831333][   C16]  topology_init+0xbf/0x126
>>>> [   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
>>>> [   56.841334][   C16]  do_one_initcall+0xfe/0x45a
>>>> [   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
>>>> [   56.851333][   C16]  ? kasan_check_write+0x14/0x20
>>>> [   56.851354][   C16]  ? up_write+0x75/0x140
>>>> [   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
>>>> [   56.861333][   C16]  ? rest_init+0x188/0x188
>>>> [   56.861369][   C16]  kernel_init+0x11/0x138
>>>> [   56.871333][   C16]  ? rest_init+0x188/0x188
>>>> [   56.871354][   C16]  ret_from_fork+0x22/0x40
>>>> [   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
>>>> [   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
>>>> idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
>>>> [   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
>>>> [   64.621334][   C16] NMI backtrace for cpu 16
>>>> [   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
>>>> G             L    5.2.0-rc5-next-20190621+ #1
>>>> [   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
>>>> DL385
>>>> Gen10, BIOS A40 03/09/2018
>>>> [   64.641371][   C16] Call Trace:
>>>> [   64.651337][   C16]  <IRQ>
>>>> [   64.651376][   C16]  dump_stack+0x62/0x9a
>>>> [   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
>>>> [   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
>>>> [   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
>>>> [   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
>>>> [   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
>>>> [   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
>>>> [   64.691336][   C16]  ? kasan_check_read+0x11/0x20
>>>> [   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
>>>> [   64.701336][   C16]  update_process_times+0x2f/0x60
>>>> [   64.701362][   C16]  tick_periodic+0x38/0xe0
>>>> [   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
>>>> [   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
>>>> [   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
>>>> [   64.721367][   C16]  </IRQ>
>>>> [   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
>>>> [   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 
>>>>
>>>
>>> @Qian Cai, unfortunately I can't reproduce.
>>>
>>> If you get the chance, it would be great if you could retry with
>>>
>>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>>> index 972c5336bebf..742f99ddd148 100644
>>> --- a/drivers/base/memory.c
>>> +++ b/drivers/base/memory.c
>>> @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned
>>> long size,
>>>         unsigned long block_id;
>>>         int ret = 0;
>>>
>>> +       if (!size)
>>> +               return;
>>> +
>>>         for (block_id = start_block_id; block_id <= end_block_id;
>>> block_id++) {
>>>                 mem = find_memory_block_by_id(block_id);
>>>                 if (!mem)
>>>
>>>
>>>
>>> If both, start and size are 0, we would get a veeeery long loop. This
>>> would mean that we have an online node that does not span any pages at
>>> all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).
>>>
>>
>>
>> ...trying to reproduce with QEMU (setting 0MB for the second node):
>>
>> qemu-system-x86_64 --enable-kvm -m 4G,maxmem=20G,slots=2 \
>> 	-smp sockets=2,cores=1 \
>> 	-numa node,nodeid=0,cpus=0,mem=4G \
>> 	-numa node,nodeid=1,cpus=1,mem=0 ...
>>
>> I can indeed see that the node is online and
>> "pgdat->node_start_pfn == 0 && start_pfn + pgdat->node_spanned_pages == 0".
>>
>> However, the kernel segfaults in an unrelated code path, so I can't
>> verify if this solves this problem:
>>
>> [    0.313284] BUG: kernel NULL pointer dereference, address: 00000000000000a0
>> [    0.313479] #PF: supervisor read access in kernel mode
>> [    0.313479] #PF: error_code(0x0000) - not-present page
>> [    0.313479] PGD 0 P4D 0 
>> [    0.313479] Oops: 0000 [#1] SMP PTI
>> [    0.313479] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-
>> 20190620+ #56
>> [    0.313479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
>> [    0.313479] RIP: 0010:bus_add_device+0x59/0x110
>> [    0.313479] Code: 20 48 89 df e8 f8 b4 ff ff 41 89 c4 85 c0 0f 85 81 00 00
>> 00 48 8b 53 50 48 85 d2 75 03 48 8b 135
>> [    0.313479] RSP: 0000:ffffb4a6c0013e20 EFLAGS: 00010246
>> [    0.313479] RAX: 0000000000000000 RBX: ffff8b61bac23800 RCX:
>> 0000000000000000
>> [    0.313479] RDX: ffff8b61bac29038 RSI: ffff8b61bac23800 RDI:
>> ffff8b61bac23800
>> [    0.313479] RBP: ffffffff9d2f4500 R08: 0000000000000000 R09:
>> 0000000000000001
>> [    0.313479] R10: 0000000000000000 R11: ffff8b61bad20878 R12:
>> 0000000000000000
>> [    0.313479] R13: 0000000000000000 R14: 0000000000000000 R15:
>> 0000000000000000
>> [    0.313479] FS:  0000000000000000(0000) GS:ffff8b61bba00000(0000)
>> knlGS:0000000000000000
>> [    0.313479] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.313479] CR2: 00000000000000a0 CR3: 0000000013c24000 CR4:
>> 00000000000006f0
>> [    0.313479] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [    0.313479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>> 0000000000000400
>> [    0.313479] Call Trace:
>> [    0.313479]  device_add+0x304/0x660
>> [    0.313479]  ? __init_waitqueue_head+0x31/0x50
>> [    0.313479]  __register_one_node+0x67/0x170
>> [    0.313479]  __try_online_node.cold+0x3e/0x78
>> [    0.313479]  try_online_node+0x25/0x40
>> [    0.313479]  do_cpu_up+0x36/0xc0
>> [    0.313479]  smp_init+0x59/0xb3
>> [    0.313479]  kernel_init_freeable+0x11a/0x247
>> [    0.313479]  ? rest_init+0x23f/0x23f
>> [    0.313479]  kernel_init+0x5/0xf1
>> [    0.313479]  ret_from_fork+0x3a/0x50
>> [    0.313479] Modules linked in:
>>
>> Figuring out what goes wrong here (maybe QEMU creating a weird
>> system configuration) is a different journey :)
>>
> 
> That is a separate issue need to revert,
> 
> "x86, numa: always initialize all possible nodes"
> 
> and then, you should be able to reproduce.
> 

Thanks, reproduced and verified that this is the fix.
Qian Cai June 21, 2019, 7:29 p.m. UTC | #7
On Fri, 2019-06-21 at 20:24 +0200, David Hildenbrand wrote:
> On 21.06.19 17:15, Qian Cai wrote:
> > On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
> > > @Andrew: Only patch 1, 4 and 6 changed compared to v1.
> > > 
> > > Some further cleanups around memory block devices. Especially, clean up
> > > and simplify walk_memory_range(). Including some other minor cleanups.
> > > 
> > > Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
> > > 
> > > v2 -> v3:
> > > - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
> > > -- Avoid warning on ppc.
> > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
> > > -- Fixup a comment regarding hinted devices.
> > > 
> > > v1 -> v2:
> > > - "mm: Section numbers use the type "unsigned long""
> > > -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
> > > - "drivers/base/memory.c: Get rid of find_memory_block_hinted("
> > > -- Fix compilation error
> > > -- Get rid of the "hint" parameter completely
> > > 
> > > David Hildenbrand (6):
> > >   mm: Section numbers use the type "unsigned long"
> > >   drivers/base/memory: Use "unsigned long" for block ids
> > >   mm: Make register_mem_sect_under_node() static
> > >   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
> > >     instead of pfns
> > >   mm/memory_hotplug: Move and simplify walk_memory_blocks()
> > >   drivers/base/memory.c: Get rid of find_memory_block_hinted()
> > > 
> > >  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
> > >  drivers/acpi/acpi_memhotplug.c            |  19 +---
> > >  drivers/base/memory.c                     | 120 +++++++++++++---------
> > >  drivers/base/node.c                       |   8 +-
> > >  include/linux/memory.h                    |   5 +-
> > >  include/linux/memory_hotplug.h            |   2 -
> > >  include/linux/mmzone.h                    |   4 +-
> > >  include/linux/node.h                      |   7 --
> > >  mm/memory_hotplug.c                       |  57 +---------
> > >  mm/sparse.c                               |  12 +--
> > >  10 files changed, 106 insertions(+), 151 deletions(-)
> > > 
> > 
> > This series causes a few machines are unable to boot triggering endless soft
> > lockups. Reverted those commits fixed the issue.
> > 
> > 97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and pass
> > start+size instead of pfns"
> > c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-
> > startsize-instead-of-pfns-fix"
> > 34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify
> > walk_memory_blocks()"
> > 59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of
> > find_memory_block_hinted()"
> > 5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-
> > find_memory_block_hinted-
> > v3"
> > 
> > [    4.582081][    T1] ACPI FADT declares the system doesn't support PCIe
> > ASPM,
> > so disable it
> > [    4.590405][    T1] ACPI: bus type PCI registered
> > [    4.592908][    T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem
> > 0x80000000-0x8fffffff] (base 0x80000000)
> > [    4.601860][    T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved
> > in
> > E820
> > [    4.601860][    T1] PCI: Using configuration type 1 for base access
> > [   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> > [swapper/0:1]
> > [   28.671351][   C16] Modules linked in:
> > [   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-
> > next-20190621+ #1
> > [   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > DL385
> > Gen10, BIOS A40 03/09/2018
> > [   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > [   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 48
> > 8b
> > 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d
> > <65>
> > ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
> > [   28.711354][   C16] RSP: 0018:ffff888205b27bf8 EFLAGS: 00000246 ORIG_RAX:
> > ffffffffffffff13
> > [   28.721372][   C16] RAX: 0000000000000000 RBX: ffff8882053d6138 RCX:
> > ffffffffb6f2a3b8
> > [   28.731371][   C16] RDX: 1ffff11040a7ac27 RSI: dffffc0000000000 RDI:
> > ffff8882053d6138
> > [   28.741371][   C16] RBP: ffff888205b27c08 R08: ffffed1040a7ac28 R09:
> > ffffed1040a7ac27
> > [   28.751334][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> > 0000000000000246
> > [   28.751370][   C16] R13: ffff888205b27c98 R14: ffff8884504d0a20 R15:
> > 0000000000000000
> > [   28.761368][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
> > knlGS:0000000000000000
> > [   28.771373][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   28.781334][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> > 00000000001406a0
> > [   28.791333][   C16] Call Trace:
> > [   28.791374][   C16]  klist_next+0xd8/0x1c0
> > [   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
> > [   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
> > [   28.801370][   C16]  ? kobject_put+0x23/0x250
> > [   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
> > [   28.811353][   C16]  ? write_policy_show+0x40/0x40
> > [   28.821334][   C16]  link_mem_sections+0x7e/0xa0
> > [   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> > [   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
> > [   28.831353][   C16]  topology_init+0xbf/0x126
> > [   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> > [   28.841368][   C16]  do_one_initcall+0xfe/0x45a
> > [   28.851334][   C16]  ? initcall_blacklisted+0x150/0x150
> > [   28.851353][   C16]  ? kasan_check_write+0x14/0x20
> > [   28.861333][   C16]  ? up_write+0x75/0x140
> > [   28.861369][   C16]  kernel_init_freeable+0x619/0x6ac
> > [   28.871333][   C16]  ? rest_init+0x188/0x188
> > [   28.871353][   C16]  kernel_init+0x11/0x138
> > [   28.881363][   C16]  ? rest_init+0x188/0x188
> > [   28.881363][   C16]  ret_from_fork+0x22/0x40
> > [   56.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s!
> > [swapper/0:1]
> > [   56.671352][   C16] Modules linked in:
> > [   56.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> > G             L    5.2.0-rc5-next-20190621+ #1
> > [   56.681357][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > DL385
> > Gen10, BIOS A40 03/09/2018
> > [   56.691356][   C16] RIP: 0010:subsys_find_device_by_id+0x168/0x1f0
> > [   56.701334][   C16] Code: 48 85 c0 74 3e 48 8d 78 58 e8 14 77 ca ff 4d 8b
> > 7e
> > 58 4d 85 ff 74 2c 49 8d bf a0 03 00 00 e8 bf 75 ca ff 45 39 a7 a0 03 00 00
> > <75>
> > c9 4c 89 ff e8 0e 89 ff ff 48 85 c0 74 bc 48 89 df e8 21 3b 24
> > [   56.721333][   C16] RSP: 0018:ffff888205b27c68 EFLAGS: 00000287 ORIG_RAX:
> > ffffffffffffff13
> > [   56.721370][   C16] RAX: 0000000000000000 RBX: ffff888205b27c90 RCX:
> > ffffffffb74c9dc1
> > [   56.731370][   C16] RDX: 0000000000000003 RSI: dffffc0000000000 RDI:
> > ffff8888774ec3e0
> > [   56.741371][   C16] RBP: ffff888205b27cf8 R08: ffffed1040a7ac28 R09:
> > ffffed1040a7ac27
> > [   56.751335][   C16] R10: ffffed1040a7ac27 R11: ffff8882053d613b R12:
> > 0000000000085c1b
> > [   56.761334][   C16] R13: 1ffff11040b64f8e R14: ffff888450de4a20 R15:
> > ffff8888774ec040
> > [   56.761372][   C16] FS:  0000000000000000(0000) GS:ffff888454500000(0000)
> > knlGS:0000000000000000
> > [   56.771374][   C16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   56.781370][   C16] CR2: 0000000000000000 CR3: 00000007c9012000 CR4:
> > 00000000001406a0
> > [   56.791373][   C16] Call Trace:
> > [   56.791373][   C16]  ? bus_find_device_by_name+0x20/0x20
> > [   56.801334][   C16]  ? kobject_put+0x23/0x250
> > [   56.801334][   C16]  walk_memory_blocks+0x6c/0xb8
> > [   56.811333][   C16]  ? write_policy_show+0x40/0x40
> > [   56.811353][   C16]  link_mem_sections+0x7e/0xa0
> > [   56.811353][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
> > [   56.821333][   C16]  ? __register_one_node+0x3bd/0x600
> > [   56.831333][   C16]  topology_init+0xbf/0x126
> > [   56.831355][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
> > [   56.841334][   C16]  do_one_initcall+0xfe/0x45a
> > [   56.841334][   C16]  ? initcall_blacklisted+0x150/0x150
> > [   56.851333][   C16]  ? kasan_check_write+0x14/0x20
> > [   56.851354][   C16]  ? up_write+0x75/0x140
> > [   56.861333][   C16]  kernel_init_freeable+0x619/0x6ac
> > [   56.861333][   C16]  ? rest_init+0x188/0x188
> > [   56.861369][   C16]  kernel_init+0x11/0x138
> > [   56.871333][   C16]  ? rest_init+0x188/0x188
> > [   56.871354][   C16]  ret_from_fork+0x22/0x40
> > [   64.601362][   C16] rcu: INFO: rcu_sched self-detected stall on CPU
> > [   64.611335][   C16] rcu: 	16-....: (5958 ticks this GP)
> > idle=37e/1/0x4000000000000002 softirq=27/27 fqs=3000 
> > [   64.621334][   C16] 	(t=6002 jiffies g=-1079 q=25)
> > [   64.621334][   C16] NMI backtrace for cpu 16
> > [   64.621374][   C16] CPU: 16 PID: 1 Comm: swapper/0 Tainted:
> > G             L    5.2.0-rc5-next-20190621+ #1
> > [   64.631372][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
> > DL385
> > Gen10, BIOS A40 03/09/2018
> > [   64.641371][   C16] Call Trace:
> > [   64.651337][   C16]  <IRQ>
> > [   64.651376][   C16]  dump_stack+0x62/0x9a
> > [   64.651376][   C16]  nmi_cpu_backtrace.cold.0+0x2e/0x33
> > [   64.661337][   C16]  ? nmi_cpu_backtrace_handler+0x20/0x20
> > [   64.661337][   C16]  nmi_trigger_cpumask_backtrace+0x1a6/0x1b9
> > [   64.671353][   C16]  arch_trigger_cpumask_backtrace+0x19/0x20
> > [   64.681366][   C16]  rcu_dump_cpu_stacks+0x18b/0x1d6
> > [   64.681366][   C16]  rcu_sched_clock_irq.cold.64+0x368/0x791
> > [   64.691336][   C16]  ? kasan_check_read+0x11/0x20
> > [   64.691354][   C16]  ? __raise_softirq_irqoff+0x66/0x150
> > [   64.701336][   C16]  update_process_times+0x2f/0x60
> > [   64.701362][   C16]  tick_periodic+0x38/0xe0
> > [   64.711334][   C16]  tick_handle_periodic+0x2e/0x80
> > [   64.711353][   C16]  smp_apic_timer_interrupt+0xfb/0x370
> > [   64.721367][   C16]  apic_timer_interrupt+0xf/0x20
> > [   64.721367][   C16]  </IRQ>
> > [   64.721367][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
> > [   64.731370][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 
> > 
> 
> @Qian Cai, unfortunately I can't reproduce.
> 
> If you get the chance, it would be great if you could retry with
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 972c5336bebf..742f99ddd148 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned
> long size,
>         unsigned long block_id;
>         int ret = 0;
> 
> +       if (!size)
> +               return;
> +
>         for (block_id = start_block_id; block_id <= end_block_id;
> block_id++) {
>                 mem = find_memory_block_by_id(block_id);
>                 if (!mem)
> 
> 
> 
> If both, start and size are 0, we would get a veeeery long loop. This
> would mean that we have an online node that does not span any pages at
> all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).
> 

It works fine here.
Andrew Morton June 21, 2019, 11:42 p.m. UTC | #8
On Fri, 21 Jun 2019 20:24:59 +0200 David Hildenbrand <david@redhat.com> wrote:

> @Qian Cai, unfortunately I can't reproduce.
> 
> If you get the chance, it would be great if you could retry with
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 972c5336bebf..742f99ddd148 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned
> long size,
>         unsigned long block_id;
>         int ret = 0;
> 
> +       if (!size)
> +               return;
> +
>         for (block_id = start_block_id; block_id <= end_block_id;
> block_id++) {
>                 mem = find_memory_block_by_id(block_id);
>                 if (!mem)
> 
> 
> 
> If both, start and size are 0, we would get a veeeery long loop. This
> would mean that we have an online node that does not span any pages at
> all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).

I think I'll make that a `return 0' and I won't drop patches 4-6 for
now, as we appear to have this fixed.



From: David Hildenbrand <david@redhat.com>
Subject: drivers-base-memoryc-get-rid-of-find_memory_block_hinted-v3-fix

handle zero-length walks

Link: http://lkml.kernel.org/r/1c2edc22-afd7-2211-c4c7-40e54e5007e8@redhat.com
Reported-by: Qian Cai <cai@lca.pw>
Tested-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/base/memory.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/base/memory.c~drivers-base-memoryc-get-rid-of-find_memory_block_hinted-v3-fix
+++ a/drivers/base/memory.c
@@ -866,6 +866,9 @@ int walk_memory_blocks(unsigned long sta
 	unsigned long block_id;
 	int ret = 0;
 
+	if (!size)
+		return 0;
+
 	for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
 		mem = find_memory_block_by_id(block_id);
 		if (!mem)