Message ID | 1343873409-8571-2-git-send-email-david@gibson.dropbear.id.au |
---|---|
State | New |
Headers | show |
David Gibson <david@gibson.dropbear.id.au> writes: > At present the qemu_system_reset() function always performs the same basic > actions on all machines. This includes running all the reset handler > hooks, however the order in which they run is not controlled by the board > logic. Let's be careful here in referring to "board logic". Reset should propagate--there's no argument there. Having all devices register in a list with no control over how reset is dispatched is wrong. It's workable because of qdev imposes per-bus reset functions and typically we have a small number of busses and ordering doesn't matter. > This is incorrect: any modern real hardware will generally have some sort > of reset controller, sometimes a full microcontroller or even service > processor which will control the order in which components on the board are > reset. I think this is true only with a loose definition of "modern", but okay. I don't think this is the right argument. The machine should serve as the entry point for reset. What components get reset in what order will need to be a machine hook for practical purposes (since not everything will be fully QOMized all at once). So I think this is the right approach for now. But all of the DT initialization stuff that is leading to this discussion in the first place is a gross hack to make a PV architecture work that took far too many short cuts. Building a DT in memory representing hardware instead of making things discoverable is not how hardware works. This sort of stuff is either (1) hard coded in a firmware/flashrom or (2) built dynamically in firmware. Let's not pretend like we're doing this because it's needed for real hardware. Regards, Anthony Liguori > For one specific example, the machine level reset handlers may need to > re-establish certain things in memory to meet the emulated platform's > specified entry conditions, before reexecuting the guest software. > re-establish the correct specified entry conditions for the emulated > platform. This must happen _after_ resetting peripheral devices, or they > could still be in the middle of DMA operations which would clobber the new > memory content. However with the normal ordering of reset hooks, the > machine is not able to register a hook which will run after the normal > qdev reset registered by generic code. > > This patch allows the machine to take control of the sequence of operations > during reset, by adding a new reset hook to the QEMUMachine. This hook is > optional - without it we just use the same reset sequence as always. That > logic is available in qemu_default_system_reset() to make it easy to > implement the case of the machine logic just needing to do some things > either before or after the normal reset. > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > --- > hw/boards.h | 3 +++ > sysemu.h | 1 + > vl.c | 10 +++++++++- > 3 files changed, 13 insertions(+), 1 deletion(-) > > diff --git a/hw/boards.h b/hw/boards.h > index 59c01d0..f2bfd3e 100644 > --- a/hw/boards.h > +++ b/hw/boards.h > @@ -12,11 +12,14 @@ typedef void QEMUMachineInitFunc(ram_addr_t ram_size, > const char *initrd_filename, > const char *cpu_model); > > +typedef void QEMUMachineResetFunc(bool report); > + > typedef struct QEMUMachine { > const char *name; > const char *alias; > const char *desc; > QEMUMachineInitFunc *init; > + QEMUMachineResetFunc *reset; > int use_scsi; > int max_cpus; > unsigned int no_serial:1, > diff --git a/sysemu.h b/sysemu.h > index 6540c79..f74319b 100644 > --- a/sysemu.h > +++ b/sysemu.h > @@ -62,6 +62,7 @@ int qemu_powerdown_requested(void); > void qemu_system_killed(int signal, pid_t pid); > void qemu_kill_report(void); > extern qemu_irq qemu_system_powerdown; > +void qemu_default_system_reset(bool report); > void qemu_system_reset(bool report); > > void qemu_add_exit_notifier(Notifier *notify); > diff --git a/vl.c b/vl.c > index 9fea320..ac47a7c 100644 > --- a/vl.c > +++ b/vl.c > @@ -1396,7 +1396,7 @@ void qemu_unregister_reset(QEMUResetHandler *func, void *opaque) > } > } > > -void qemu_system_reset(bool report) > +void qemu_default_system_reset(bool report) > { > QEMUResetEntry *re, *nre; > > @@ -1410,6 +1410,14 @@ void qemu_system_reset(bool report) > cpu_synchronize_all_post_reset(); > } Not a huge fan of the naming here. How about qemu_devices_reset()? > +void qemu_system_reset(bool report) > +{ > + if (current_machine->reset) > + current_machine->reset(report); > + else > + qemu_default_system_reset(report); > +} Missing curly braces. Regards, Anthony Liguori > + > void qemu_system_reset_request(void) > { > if (no_reboot) { > -- > 1.7.10.4
On Wed, 2012-08-01 at 21:37 -0500, Anthony Liguori wrote: > > But all of the DT initialization stuff that is leading to this > discussion in the first place is a gross hack to make a PV > architecture > work that took far too many short cuts. > > Building a DT in memory representing hardware instead of making things > discoverable is not how hardware works. It depends :-) Take a Power7 machine for example... the system & processor are actually initialized by the service processor which loads some kind of similar data structure into memory before starting the initial firmware :-) So from the P7 itself point of view, it starts with a memory based data structure pre-build. Also some processors have a pretty much turing complete "POR engine" which executes code from some kind of EEPROM to initialize the system, cores & busses on reset, which can involve writing things to memory as well (though it generally doesn't). > This sort of stuff is either > (1) hard coded in a firmware/flashrom or (2) built dynamically in > firmware. Let's not pretend like we're doing this because it's needed > for real hardware. Doesn't matter, we do things like -kernel which means pre-loading the kernel from qemu, even on x86. That doesn't match real HW either, but it's convenient to have. But overall there are real HW reasons to control the reset as well so the hook makes sense both ways. Cheers, Ben.
David Gibson writes: [...] > diff --git a/vl.c b/vl.c > index 9fea320..ac47a7c 100644 > --- a/vl.c > +++ b/vl.c > @@ -1410,6 +1410,14 @@ void qemu_system_reset(bool report) > cpu_synchronize_all_post_reset(); > } > +void qemu_system_reset(bool report) > +{ > + if (current_machine->reset) > + current_machine->reset(report); > + else > + qemu_default_system_reset(report); > +} > + > void qemu_system_reset_request(void) > { > if (no_reboot) { Nitpick: you're missing the braces. Lluis
Il 02/08/2012 05:50, Benjamin Herrenschmidt ha scritto: > >> > This sort of stuff is either >> > (1) hard coded in a firmware/flashrom or (2) built dynamically in >> > firmware. Let's not pretend like we're doing this because it's needed >> > for real hardware. > Doesn't matter, we do things like -kernel which means pre-loading the > kernel from qemu, even on x86. That doesn't match real HW either, but > it's convenient to have. But overall there are real HW reasons to > control the reset as well so the hook makes sense both ways. On x86 we do not pre-load the kernel. Neither the kernel/initramfs and the option ROM that loads the kernel are written in memory, they are passed to the guest via fw_cfg. Then the option ROM is loaded by the BIOS, and loads the kernel/initramfs. Paolo
On Thu, 2012-08-02 at 17:17 +0200, Paolo Bonzini wrote: > > On x86 we do not pre-load the kernel. Neither the kernel/initramfs > and > the option ROM that loads the kernel are written in memory, they are > passed to the guest via fw_cfg. Then the option ROM is loaded by the > BIOS, and loads the kernel/initramfs. How does it do ? IE. how does it access a random pair of files on the host side ? Via some magic ports ? Cheers, Ben.
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes: > On Thu, 2012-08-02 at 17:17 +0200, Paolo Bonzini wrote: >> >> On x86 we do not pre-load the kernel. Neither the kernel/initramfs >> and >> the option ROM that loads the kernel are written in memory, they are >> passed to the guest via fw_cfg. Then the option ROM is loaded by the >> BIOS, and loads the kernel/initramfs. > > How does it do ? IE. how does it access a random pair of files on the > host side ? Via some magic ports ? Yes, see hw/fw_cfg.c Regards, Anthony Liguori > > Cheers, > Ben.
On Thu, Aug 02, 2012 at 06:00:06PM +0300, Lluís Vilanova wrote: > David Gibson writes: > [...] > > diff --git a/vl.c b/vl.c > > index 9fea320..ac47a7c 100644 > > --- a/vl.c > > +++ b/vl.c > > @@ -1410,6 +1410,14 @@ void qemu_system_reset(bool report) > > cpu_synchronize_all_post_reset(); > > } > > > +void qemu_system_reset(bool report) > > +{ > > + if (current_machine->reset) > > + current_machine->reset(report); > > + else > > + qemu_default_system_reset(report); > > +} > > + > > void qemu_system_reset_request(void) > > { > > if (no_reboot) { > > Nitpick: you're missing the braces. Ah, oops, thank you.
On Thu, Aug 02, 2012 at 05:17:27PM +0200, Paolo Bonzini wrote: > Il 02/08/2012 05:50, Benjamin Herrenschmidt ha scritto: > > > >> > This sort of stuff is either > >> > (1) hard coded in a firmware/flashrom or (2) built dynamically in > >> > firmware. Let's not pretend like we're doing this because it's needed > >> > for real hardware. > > Doesn't matter, we do things like -kernel which means pre-loading the > > kernel from qemu, even on x86. That doesn't match real HW either, but > > it's convenient to have. But overall there are real HW reasons to > > control the reset as well so the hook makes sense both ways. > > On x86 we do not pre-load the kernel. Neither the kernel/initramfs and > the option ROM that loads the kernel are written in memory, they are > passed to the guest via fw_cfg. Then the option ROM is loaded by the > BIOS, and loads the kernel/initramfs. On x86 (or rather PC platform), yes. But spapr is far from the only one to load the kernel images direct into RAM - arm_load_kernel() does it too, so does milkymist, sun4[mu] and a bunch of others.
On Wed, Aug 01, 2012 at 09:37:39PM -0500, Anthony Liguori wrote: > David Gibson <david@gibson.dropbear.id.au> writes: > > > At present the qemu_system_reset() function always performs the same basic > > actions on all machines. This includes running all the reset handler > > hooks, however the order in which they run is not controlled by the board > > logic. > > Let's be careful here in referring to "board logic". > > Reset should propagate--there's no argument there. Having all devices > register in a list with no control over how reset is dispatched is > wrong. It's workable because of qdev imposes per-bus reset functions > and typically we have a small number of busses and ordering doesn't > matter. > > > This is incorrect: any modern real hardware will generally have some sort > > of reset controller, sometimes a full microcontroller or even service > > processor which will control the order in which components on the board are > > reset. > > I think this is true only with a loose definition of "modern", but okay. > > I don't think this is the right argument. The machine should serve as > the entry point for reset. What components get reset in what order will > need to be a machine hook for practical purposes (since not everything > will be fully QOMized all at once). > > So I think this is the right approach for now. > > But all of the DT initialization stuff that is leading to this > discussion in the first place is a gross hack to make a PV architecture > work that took far too many short cuts. Hrm. The "short cuts" you seem to refer to is the fact that we do a bunch of our machine initialization direct from qemu, rather than from firmware executed within the guest. Doing it this way is both easier to write, easier to follow and faster to execute, so if that counts as a gross hack then I think your design priorities need re-consideration. Because of the arcane history of the PC boot sequence I can see why executing a BIOS image within the guest is the preferred model there, but that's no reason to put the same restriction on platforms that have a clear and well documented firmware to OS handoff state. [snip] > > -void qemu_system_reset(bool report) > > +void qemu_default_system_reset(bool report) > > { > > QEMUResetEntry *re, *nre; > > > > @@ -1410,6 +1410,14 @@ void qemu_system_reset(bool report) > > cpu_synchronize_all_post_reset(); > > } > > Not a huge fan of the naming here. How about qemu_devices_reset()? Ok, I'll change that.
diff --git a/hw/boards.h b/hw/boards.h index 59c01d0..f2bfd3e 100644 --- a/hw/boards.h +++ b/hw/boards.h @@ -12,11 +12,14 @@ typedef void QEMUMachineInitFunc(ram_addr_t ram_size, const char *initrd_filename, const char *cpu_model); +typedef void QEMUMachineResetFunc(bool report); + typedef struct QEMUMachine { const char *name; const char *alias; const char *desc; QEMUMachineInitFunc *init; + QEMUMachineResetFunc *reset; int use_scsi; int max_cpus; unsigned int no_serial:1, diff --git a/sysemu.h b/sysemu.h index 6540c79..f74319b 100644 --- a/sysemu.h +++ b/sysemu.h @@ -62,6 +62,7 @@ int qemu_powerdown_requested(void); void qemu_system_killed(int signal, pid_t pid); void qemu_kill_report(void); extern qemu_irq qemu_system_powerdown; +void qemu_default_system_reset(bool report); void qemu_system_reset(bool report); void qemu_add_exit_notifier(Notifier *notify); diff --git a/vl.c b/vl.c index 9fea320..ac47a7c 100644 --- a/vl.c +++ b/vl.c @@ -1396,7 +1396,7 @@ void qemu_unregister_reset(QEMUResetHandler *func, void *opaque) } } -void qemu_system_reset(bool report) +void qemu_default_system_reset(bool report) { QEMUResetEntry *re, *nre; @@ -1410,6 +1410,14 @@ void qemu_system_reset(bool report) cpu_synchronize_all_post_reset(); } +void qemu_system_reset(bool report) +{ + if (current_machine->reset) + current_machine->reset(report); + else + qemu_default_system_reset(report); +} + void qemu_system_reset_request(void) { if (no_reboot) {
At present the qemu_system_reset() function always performs the same basic actions on all machines. This includes running all the reset handler hooks, however the order in which they run is not controlled by the board logic. This is incorrect: any modern real hardware will generally have some sort of reset controller, sometimes a full microcontroller or even service processor which will control the order in which components on the board are reset. For one specific example, the machine level reset handlers may need to re-establish certain things in memory to meet the emulated platform's specified entry conditions, before reexecuting the guest software. re-establish the correct specified entry conditions for the emulated platform. This must happen _after_ resetting peripheral devices, or they could still be in the middle of DMA operations which would clobber the new memory content. However with the normal ordering of reset hooks, the machine is not able to register a hook which will run after the normal qdev reset registered by generic code. This patch allows the machine to take control of the sequence of operations during reset, by adding a new reset hook to the QEMUMachine. This hook is optional - without it we just use the same reset sequence as always. That logic is available in qemu_default_system_reset() to make it easy to implement the case of the machine logic just needing to do some things either before or after the normal reset. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- hw/boards.h | 3 +++ sysemu.h | 1 + vl.c | 10 +++++++++- 3 files changed, 13 insertions(+), 1 deletion(-)