mbox series

[v4,0/3] delay timer_new from init to realize to fix memleaks.

Message ID 20200305065422.12707-1-pannengyuan@huawei.com
Headers show
Series delay timer_new from init to realize to fix memleaks. | expand

Message

Pan Nengyuan March 5, 2020, 6:54 a.m. UTC
This series delay timer_new from init into realize to avoid memleaks when we call 'device_list_properties'.
And do timer_free only in s390x_cpu_finalize because it's hotplugable. However, mos6522_realize is never called 
at all due to the incorrect creation of it. So we aslo fix the incorrect creation in mac_via first, then move the
timer_new to mos6522_realize().

v1:
   - Delay timer_new() from init() to realize() to fix memleaks.
v2:
   - Similarly to other cleanups, move timer_new into realize in target/s390x/cpu.c (Suggested by Philippe Mathieu-Daudé).
   - Send these two patches as a series instead of send each as a single patch but with wrong subject in v1.
v3:
   - It's not valid in mos6522 if we move timer_new from init to realize, because it's never called at all.
     Thus, we remove null check in reset, and add calls to mos6522_realize() in mac_via_realize to make this move to be valid.
   - split patch by device to make it more clear.
v4:
   - Aslo do timer_free on the error path in realize() and fix some coding style. Then use device_class_set_parent_unrealize to declare unrealize. 
   - split the mos6522 patch into two, one to fix incorrect creation of mos6522, the other to fix memleak.

Pan Nengyuan (3):
  s390x: fix memleaks in cpu_finalize
  mac_via: fix incorrect creation of mos6522 device in mac_via
  hw/misc/mos6522: move timer_new from init() into realize() to avoid
    memleaks

 hw/misc/mac_via.c      | 43 +++++++++++++++++++++++++++++-------------
 hw/misc/mos6522.c      |  6 ++++++
 target/s390x/cpu-qom.h |  1 +
 target/s390x/cpu.c     | 41 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 74 insertions(+), 17 deletions(-)

Comments

no-reply@patchew.org March 5, 2020, 6:46 a.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/20200305065422.12707-1-pannengyuan@huawei.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v4 0/3] delay timer_new from init to realize to fix memleaks.
Message-id: 20200305065422.12707-1-pannengyuan@huawei.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20200305065422.12707-1-pannengyuan@huawei.com -> patchew/20200305065422.12707-1-pannengyuan@huawei.com
Switched to a new branch 'test'
e87aa99 hw/misc/mos6522: move timer_new from init() into realize() to avoid memleaks
0c6588b mac_via: fix incorrect creation of mos6522 device in mac_via
1afdb4e s390x: fix memleaks in cpu_finalize

=== OUTPUT BEGIN ===
1/3 Checking commit 1afdb4e181f1 (s390x: fix memleaks in cpu_finalize)
2/3 Checking commit 0c6588bb0991 (mac_via: fix incorrect creation of mos6522 device in mac_via)
ERROR: trailing whitespace
#71: FILE: hw/misc/mac_via.c:954:
+    object_initialize_child(OBJECT(m), "via1", &m->mos6522_via1, $

total: 1 errors, 0 warnings, 66 lines checked

Patch 2/3 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

3/3 Checking commit e87aa99376d6 (hw/misc/mos6522: move timer_new from init() into realize() to avoid memleaks)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200305065422.12707-1-pannengyuan@huawei.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Mark Cave-Ayland March 8, 2020, 11:58 a.m. UTC | #2
On 05/03/2020 06:54, Pan Nengyuan wrote:

> This series delay timer_new from init into realize to avoid memleaks when we call 'device_list_properties'.
> And do timer_free only in s390x_cpu_finalize because it's hotplugable. However, mos6522_realize is never called 
> at all due to the incorrect creation of it. So we aslo fix the incorrect creation in mac_via first, then move the
> timer_new to mos6522_realize().
> 
> v1:
>    - Delay timer_new() from init() to realize() to fix memleaks.
> v2:
>    - Similarly to other cleanups, move timer_new into realize in target/s390x/cpu.c (Suggested by Philippe Mathieu-Daudé).
>    - Send these two patches as a series instead of send each as a single patch but with wrong subject in v1.
> v3:
>    - It's not valid in mos6522 if we move timer_new from init to realize, because it's never called at all.
>      Thus, we remove null check in reset, and add calls to mos6522_realize() in mac_via_realize to make this move to be valid.
>    - split patch by device to make it more clear.
> v4:
>    - Aslo do timer_free on the error path in realize() and fix some coding style. Then use device_class_set_parent_unrealize to declare unrealize. 
>    - split the mos6522 patch into two, one to fix incorrect creation of mos6522, the other to fix memleak.
> 
> Pan Nengyuan (3):
>   s390x: fix memleaks in cpu_finalize
>   mac_via: fix incorrect creation of mos6522 device in mac_via
>   hw/misc/mos6522: move timer_new from init() into realize() to avoid
>     memleaks
> 
>  hw/misc/mac_via.c      | 43 +++++++++++++++++++++++++++++-------------
>  hw/misc/mos6522.c      |  6 ++++++
>  target/s390x/cpu-qom.h |  1 +
>  target/s390x/cpu.c     | 41 ++++++++++++++++++++++++++++++++++++----
>  4 files changed, 74 insertions(+), 17 deletions(-)

I just tried this patchset applied on top of git master and it causes qemu-system-ppc
to segfault on startup:

$ gdb --args ./qemu-system-ppc
...
...
Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
0x0000555555e7e38c in timer_del (ts=0x0) at util/qemu-timer.c:429
429         QEMUTimerList *timer_list = ts->timer_list;
(gdb) bt
#0  0x0000555555e7e38c in timer_del (ts=0x0) at util/qemu-timer.c:429
#1  0x0000555555b5d2c1 in mos6522_reset (dev=0x555556e0ac50) at hw/misc/mos6522.c:468
#2  0x0000555555b63570 in mos6522_cuda_reset (dev=0x555556e0ac50) at
hw/misc/macio/cuda.c:599
#3  0x0000555555ad9dd5 in device_transitional_reset (obj=0x555556e0ac50) at
hw/core/qdev.c:1136
#4  0x0000555555ae0755 in resettable_phase_hold (obj=0x555556e0ac50, opaque=0x0,
type=RESET_TYPE_COLD) at hw/core/resettable.c:182
#5  0x0000555555add5f8 in bus_reset_child_foreach (obj=0x555556a472a0,
cb=0x555555ae0605 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at
hw/core/bus.c:94
#6  0x0000555555ae0418 in resettable_child_foreach (rc=0x55555696af80,
obj=0x555556a472a0, cb=0x555555ae0605 <resettable_phase_hold>, opaque=0x0,
type=RESET_TYPE_COLD) at hw/core/resettable.c:96
#7  0x0000555555ae06db in resettable_phase_hold (obj=0x555556a472a0, opaque=0x0,
type=RESET_TYPE_COLD) at hw/core/resettable.c:173
#8  0x0000555555ae02ab in resettable_assert_reset (obj=0x555556a472a0,
type=RESET_TYPE_COLD) at hw/core/resettable.c:60
#9  0x0000555555ae01ef in resettable_reset (obj=0x555556a472a0, type=RESET_TYPE_COLD)
at hw/core/resettable.c:45
#10 0x0000555555ae0afa in resettable_cold_reset_fn (opaque=0x555556a472a0) at
hw/core/resettable.c:269
#11 0x0000555555ae13a0 in qemu_devices_reset () at hw/core/reset.c:69
#12 0x000055555597d54c in qemu_system_reset (reason=SHUTDOWN_CAUSE_NONE) at
/home/build/src/qemu/git/qemu/softmmu/vl.c:1393
#13 0x00005555559855bb in qemu_init (argc=1, argv=0x7fffffffea78,
envp=0x7fffffffea88) at /home/build/src/qemu/git/qemu/softmmu/vl.c:4418
#14 0x0000555555e1b646 in main (argc=1, argv=0x7fffffffea78, envp=0x7fffffffea88) at
/home/build/src/qemu/git/qemu/softmmu/main.c:48


Possibly related to some of the new reset changes?


ATB,

Mark.
Peter Maydell March 8, 2020, 1:39 p.m. UTC | #3
On Sun, 8 Mar 2020 at 11:58, Mark Cave-Ayland
<mark.cave-ayland@ilande.co.uk> wrote:
> I just tried this patchset applied on top of git master and it causes qemu-system-ppc
> to segfault on startup:
>
> $ gdb --args ./qemu-system-ppc
> ...
> ...
> Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
> 0x0000555555e7e38c in timer_del (ts=0x0) at util/qemu-timer.c:429
> 429         QEMUTimerList *timer_list = ts->timer_list;
> (gdb) bt
> #0  0x0000555555e7e38c in timer_del (ts=0x0) at util/qemu-timer.c:429
> #1  0x0000555555b5d2c1 in mos6522_reset (dev=0x555556e0ac50) at hw/misc/mos6522.c:468
> #2  0x0000555555b63570 in mos6522_cuda_reset (dev=0x555556e0ac50) at
> hw/misc/macio/cuda.c:599

It looks like we haven't caught all the cases of "somebody created a
MOS6522 (or one of its subclasses) but forgot to realize it". This
particular one I think is the s->cuda which is inited in macio_oldworld_init()
but not realized in macio_oldworld_realize(). I think that pmu_init() in
hw/misc/macio/pmu.c also has this bug. We need to go through and
audit all the places where we create TYPE_MOS6522 or any of its
subclasses and make sure they are also realizing the devices they create.
(The presence of the new 3-phase reset infrastructure in the backtrace
is a red herring here -- this would have crashed the same way with the
old code too.)

We should probably find some generic place in Device code where we
can stick an assert "are we trying to reset an unrealized device?"
because I bet we have other instances of this bug which we haven't
noticed because the reset function happens to not misbehave on
an inited-but-not-realized device...

thanks
-- PMM
Pan Nengyuan March 9, 2020, 12:49 a.m. UTC | #4
On 3/8/2020 9:39 PM, Peter Maydell wrote:
> On Sun, 8 Mar 2020 at 11:58, Mark Cave-Ayland
> <mark.cave-ayland@ilande.co.uk> wrote:
>> I just tried this patchset applied on top of git master and it causes qemu-system-ppc
>> to segfault on startup:
>>
>> $ gdb --args ./qemu-system-ppc
>> ...
>> ...
>> Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
>> 0x0000555555e7e38c in timer_del (ts=0x0) at util/qemu-timer.c:429
>> 429         QEMUTimerList *timer_list = ts->timer_list;
>> (gdb) bt
>> #0  0x0000555555e7e38c in timer_del (ts=0x0) at util/qemu-timer.c:429
>> #1  0x0000555555b5d2c1 in mos6522_reset (dev=0x555556e0ac50) at hw/misc/mos6522.c:468
>> #2  0x0000555555b63570 in mos6522_cuda_reset (dev=0x555556e0ac50) at
>> hw/misc/macio/cuda.c:599
> 
> It looks like we haven't caught all the cases of "somebody created a
> MOS6522 (or one of its subclasses) but forgot to realize it". This
> particular one I think is the s->cuda which is inited in macio_oldworld_init()
> but not realized in macio_oldworld_realize(). I think that pmu_init() in
> hw/misc/macio/pmu.c also has this bug. We need to go through and
> audit all the places where we create TYPE_MOS6522 or any of its
> subclasses and make sure they are also realizing the devices they create.
> (The presence of the new 3-phase reset infrastructure in the backtrace
> is a red herring here -- this would have crashed the same way with the
> old code too.)
> 

Yes, that's right, this series miss other cases, I will search and fix all
the cases about MOS6522 device.

Thanks.

> We should probably find some generic place in Device code where we
> can stick an assert "are we trying to reset an unrealized device?"
> because I bet we have other instances of this bug which we haven't
> noticed because the reset function happens to not misbehave on
> an inited-but-not-realized device...
> 
> thanks
> -- PMM
> .
>
Mark Cave-Ayland March 9, 2020, 4:14 p.m. UTC | #5
On 08/03/2020 13:39, Peter Maydell wrote:

> On Sun, 8 Mar 2020 at 11:58, Mark Cave-Ayland
> <mark.cave-ayland@ilande.co.uk> wrote:
>> I just tried this patchset applied on top of git master and it causes qemu-system-ppc
>> to segfault on startup:
>>
>> $ gdb --args ./qemu-system-ppc
>> ...
>> ...
>> Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
>> 0x0000555555e7e38c in timer_del (ts=0x0) at util/qemu-timer.c:429
>> 429         QEMUTimerList *timer_list = ts->timer_list;
>> (gdb) bt
>> #0  0x0000555555e7e38c in timer_del (ts=0x0) at util/qemu-timer.c:429
>> #1  0x0000555555b5d2c1 in mos6522_reset (dev=0x555556e0ac50) at hw/misc/mos6522.c:468
>> #2  0x0000555555b63570 in mos6522_cuda_reset (dev=0x555556e0ac50) at
>> hw/misc/macio/cuda.c:599
> 
> It looks like we haven't caught all the cases of "somebody created a
> MOS6522 (or one of its subclasses) but forgot to realize it". This
> particular one I think is the s->cuda which is inited in macio_oldworld_init()
> but not realized in macio_oldworld_realize(). I think that pmu_init() in
> hw/misc/macio/pmu.c also has this bug. We need to go through and
> audit all the places where we create TYPE_MOS6522 or any of its
> subclasses and make sure they are also realizing the devices they create.
> (The presence of the new 3-phase reset infrastructure in the backtrace
> is a red herring here -- this would have crashed the same way with the
> old code too.)
> 
> We should probably find some generic place in Device code where we
> can stick an assert "are we trying to reset an unrealized device?"
> because I bet we have other instances of this bug which we haven't
> noticed because the reset function happens to not misbehave on
> an inited-but-not-realized device...

Yeah that's probably my fault - I remember struggling quite a bit to get everything
to initialise correctly in the right order when I worked on this.

I tested first on cuda and then used the same pattern for pmu and mac_via so I'm not
surprised at all that the same problem appears in all three.


ATB,

Mark.