Patchwork qdev: Reset hotplugged devices

login
register
mail settings
Submitter Alex Williamson
Date Aug. 3, 2010, 4:19 p.m.
Message ID <20100803161914.15514.59304.stgit@localhost6.localdomain6>
Download mbox | patch
Permalink /patch/60778/
State New
Headers show

Comments

Alex Williamson - Aug. 3, 2010, 4:19 p.m.
Several devices rely on their reset() function being called to
initialize device state, e1000 and rtl8139 in particular.  When
the device is hot added, the reset doesn't occur, often leaving
the device in an unusable state.  Adding a call to reset() after
init() for hotplugged devices puts the device in the expected
state for the guest.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 0.13 candidate?

 hw/qdev.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)
Glauber Costa - Aug. 3, 2010, 5:41 p.m.
On Tue, Aug 03, 2010 at 10:19:47AM -0600, Alex Williamson wrote:
> Several devices rely on their reset() function being called to
> initialize device state, e1000 and rtl8139 in particular.  When
> the device is hot added, the reset doesn't occur, often leaving
> the device in an unusable state.  Adding a call to reset() after
> init() for hotplugged devices puts the device in the expected
> state for the guest.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
I like this one better.
Markus Armbruster - Aug. 20, 2010, 9 a.m.
Alex Williamson <alex.williamson@redhat.com> writes:

> Several devices rely on their reset() function being called to
> initialize device state, e1000 and rtl8139 in particular.  When
> the device is hot added, the reset doesn't occur, often leaving
> the device in an unusable state.  Adding a call to reset() after
> init() for hotplugged devices puts the device in the expected
> state for the guest.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>
>  0.13 candidate?
>
>  hw/qdev.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/hw/qdev.c b/hw/qdev.c
> index e99c73f..b156272 100644
> --- a/hw/qdev.c
> +++ b/hw/qdev.c
> @@ -278,6 +278,9 @@ int qdev_init(DeviceState *dev)
>          qdev_free(dev);
>          return rc;
>      }
> +    if (dev->hotplugged) {
> +        qdev_reset(dev);
> +    }
>      qemu_register_reset(qdev_reset, dev);
>      if (dev->info->vmsd) {
>          vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,

qdev_reset() isn't necessary when !dev->hotplugged, because then
qemu_system_reset() will run shortly, which will call qdev_reset().
Correct?
Alex Williamson - Aug. 20, 2010, 12:41 p.m.
On Fri, 2010-08-20 at 11:00 +0200, Markus Armbruster wrote:
> Alex Williamson <alex.williamson@redhat.com> writes:
> 
> > Several devices rely on their reset() function being called to
> > initialize device state, e1000 and rtl8139 in particular.  When
> > the device is hot added, the reset doesn't occur, often leaving
> > the device in an unusable state.  Adding a call to reset() after
> > init() for hotplugged devices puts the device in the expected
> > state for the guest.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >
> >  0.13 candidate?
> >
> >  hw/qdev.c |    3 +++
> >  1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/hw/qdev.c b/hw/qdev.c
> > index e99c73f..b156272 100644
> > --- a/hw/qdev.c
> > +++ b/hw/qdev.c
> > @@ -278,6 +278,9 @@ int qdev_init(DeviceState *dev)
> >          qdev_free(dev);
> >          return rc;
> >      }
> > +    if (dev->hotplugged) {
> > +        qdev_reset(dev);
> > +    }
> >      qemu_register_reset(qdev_reset, dev);
> >      if (dev->info->vmsd) {
> >          vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
> 
> qdev_reset() isn't necessary when !dev->hotplugged, because then
> qemu_system_reset() will run shortly, which will call qdev_reset().
> Correct?

Yes, exactly.
Markus Armbruster - Aug. 20, 2010, 3:47 p.m.
Alex Williamson <alex.williamson@redhat.com> writes:

> On Fri, 2010-08-20 at 11:00 +0200, Markus Armbruster wrote:
>> Alex Williamson <alex.williamson@redhat.com> writes:
>> 
>> > Several devices rely on their reset() function being called to
>> > initialize device state, e1000 and rtl8139 in particular.  When
>> > the device is hot added, the reset doesn't occur, often leaving
>> > the device in an unusable state.  Adding a call to reset() after
>> > init() for hotplugged devices puts the device in the expected
>> > state for the guest.
>> >
>> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>> > ---
>> >
>> >  0.13 candidate?
>> >
>> >  hw/qdev.c |    3 +++
>> >  1 files changed, 3 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/hw/qdev.c b/hw/qdev.c
>> > index e99c73f..b156272 100644
>> > --- a/hw/qdev.c
>> > +++ b/hw/qdev.c
>> > @@ -278,6 +278,9 @@ int qdev_init(DeviceState *dev)
>> >          qdev_free(dev);
>> >          return rc;
>> >      }
>> > +    if (dev->hotplugged) {
>> > +        qdev_reset(dev);
>> > +    }
>> >      qemu_register_reset(qdev_reset, dev);
>> >      if (dev->info->vmsd) {
>> >          vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
>> 
>> qdev_reset() isn't necessary when !dev->hotplugged, because then
>> qemu_system_reset() will run shortly, which will call qdev_reset().
>> Correct?
>
> Yes, exactly.

Hmm.  "qdev_init() automatically calls qdev_reset() if hotplug" feels
needlessly complicated.  I'd prefer qdev_init() to call it always or
never.

If "always", we reset twice for cold-plug.  Is that bad?

If "never", we need to reset somewhere else for hot-plug.  What about
do_device_add()?
Anthony Liguori - Aug. 20, 2010, 3:56 p.m.
On 08/20/2010 10:47 AM, Markus Armbruster wrote:
> Alex Williamson<alex.williamson@redhat.com>  writes:
>
>    
>> On Fri, 2010-08-20 at 11:00 +0200, Markus Armbruster wrote:
>>      
>>> Alex Williamson<alex.williamson@redhat.com>  writes:
>>>
>>>        
>>>> Several devices rely on their reset() function being called to
>>>> initialize device state, e1000 and rtl8139 in particular.  When
>>>> the device is hot added, the reset doesn't occur, often leaving
>>>> the device in an unusable state.  Adding a call to reset() after
>>>> init() for hotplugged devices puts the device in the expected
>>>> state for the guest.
>>>>
>>>> Signed-off-by: Alex Williamson<alex.williamson@redhat.com>
>>>> ---
>>>>
>>>>   0.13 candidate?
>>>>
>>>>   hw/qdev.c |    3 +++
>>>>   1 files changed, 3 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/hw/qdev.c b/hw/qdev.c
>>>> index e99c73f..b156272 100644
>>>> --- a/hw/qdev.c
>>>> +++ b/hw/qdev.c
>>>> @@ -278,6 +278,9 @@ int qdev_init(DeviceState *dev)
>>>>           qdev_free(dev);
>>>>           return rc;
>>>>       }
>>>> +    if (dev->hotplugged) {
>>>> +        qdev_reset(dev);
>>>> +    }
>>>>       qemu_register_reset(qdev_reset, dev);
>>>>       if (dev->info->vmsd) {
>>>>           vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
>>>>          
>>> qdev_reset() isn't necessary when !dev->hotplugged, because then
>>> qemu_system_reset() will run shortly, which will call qdev_reset().
>>> Correct?
>>>        
>> Yes, exactly.
>>      
> Hmm.  "qdev_init() automatically calls qdev_reset() if hotplug" feels
> needlessly complicated.  I'd prefer qdev_init() to call it always or
> never.
>
> If "always", we reset twice for cold-plug.  Is that bad?
>
> If "never", we need to reset somewhere else for hot-plug.  What about
> do_device_add()?
>    

The real problem is how we do reset.  We shouldn't register a reset 
handler for every qdev device but rather register a single reset handler 
that walks the device tree and calls reset on every reachable device.

Then we can always call reset in init() and there's no need to have a 
dev->hotplugged check.  The qdev device tree reset handler should not be 
registered until *after* we call qemu_system_reset() after creating the 
device model which will ensure that we don't do a double reset.

Regards,

Anthony Liguori
Markus Armbruster - Aug. 20, 2010, 4:14 p.m.
Anthony Liguori <anthony@codemonkey.ws> writes:

> On 08/20/2010 10:47 AM, Markus Armbruster wrote:
>> Alex Williamson<alex.williamson@redhat.com>  writes:
>>
>>    
>>> On Fri, 2010-08-20 at 11:00 +0200, Markus Armbruster wrote:
>>>      
>>>> Alex Williamson<alex.williamson@redhat.com>  writes:
>>>>
>>>>        
>>>>> Several devices rely on their reset() function being called to
>>>>> initialize device state, e1000 and rtl8139 in particular.  When
>>>>> the device is hot added, the reset doesn't occur, often leaving
>>>>> the device in an unusable state.  Adding a call to reset() after
>>>>> init() for hotplugged devices puts the device in the expected
>>>>> state for the guest.
>>>>>
>>>>> Signed-off-by: Alex Williamson<alex.williamson@redhat.com>
>>>>> ---
>>>>>
>>>>>   0.13 candidate?
>>>>>
>>>>>   hw/qdev.c |    3 +++
>>>>>   1 files changed, 3 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/hw/qdev.c b/hw/qdev.c
>>>>> index e99c73f..b156272 100644
>>>>> --- a/hw/qdev.c
>>>>> +++ b/hw/qdev.c
>>>>> @@ -278,6 +278,9 @@ int qdev_init(DeviceState *dev)
>>>>>           qdev_free(dev);
>>>>>           return rc;
>>>>>       }
>>>>> +    if (dev->hotplugged) {
>>>>> +        qdev_reset(dev);
>>>>> +    }
>>>>>       qemu_register_reset(qdev_reset, dev);
>>>>>       if (dev->info->vmsd) {
>>>>>           vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
>>>>>          
>>>> qdev_reset() isn't necessary when !dev->hotplugged, because then
>>>> qemu_system_reset() will run shortly, which will call qdev_reset().
>>>> Correct?
>>>>        
>>> Yes, exactly.
>>>      
>> Hmm.  "qdev_init() automatically calls qdev_reset() if hotplug" feels
>> needlessly complicated.  I'd prefer qdev_init() to call it always or
>> never.
>>
>> If "always", we reset twice for cold-plug.  Is that bad?
>>
>> If "never", we need to reset somewhere else for hot-plug.  What about
>> do_device_add()?
>>    
>
> The real problem is how we do reset.  We shouldn't register a reset
> handler for every qdev device but rather register a single reset
> handler that walks the device tree and calls reset on every reachable
> device.
>
> Then we can always call reset in init() and there's no need to have a
> dev->hotplugged check.  The qdev device tree reset handler should not
> be registered until *after* we call qemu_system_reset() after creating
> the device model which will ensure that we don't do a double reset.

Fine with me.

But we need to merge something short term (pre 0.13) to fix hot plug of
e1000 et al.  Use Alex's patch as such a stop-gap?
Avi Kivity - Aug. 23, 2010, noon
On 08/20/2010 12:00 PM, Markus Armbruster wrote:
> Alex Williamson<alex.williamson@redhat.com>  writes:
>
>> Several devices rely on their reset() function being called to
>> initialize device state, e1000 and rtl8139 in particular.  When
>> the device is hot added, the reset doesn't occur, often leaving
>> the device in an unusable state.  Adding a call to reset() after
>> init() for hotplugged devices puts the device in the expected
>> state for the guest.
>>
>> Signed-off-by: Alex Williamson<alex.williamson@redhat.com>
>> ---
>>
>>   0.13 candidate?
>>
>>   hw/qdev.c |    3 +++
>>   1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/hw/qdev.c b/hw/qdev.c
>> index e99c73f..b156272 100644
>> --- a/hw/qdev.c
>> +++ b/hw/qdev.c
>> @@ -278,6 +278,9 @@ int qdev_init(DeviceState *dev)
>>           qdev_free(dev);
>>           return rc;
>>       }
>> +    if (dev->hotplugged) {
>> +        qdev_reset(dev);
>> +    }
>>       qemu_register_reset(qdev_reset, dev);
>>       if (dev->info->vmsd) {
>>           vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,
> qdev_reset() isn't necessary when !dev->hotplugged, because then
> qemu_system_reset() will run shortly, which will call qdev_reset().
> Correct?

Off-topic, but what's the reason for dev->hotplugged's existence?  A 
device is either plugged or not, it is either hotpluggable or not, but 
is there a way to tell, from looking at a plugged device, whether it has 
been hotplugged in the past?

AFAICT it is redundant state and should be removed.
Anthony Liguori - Aug. 23, 2010, 12:21 p.m.
On 08/23/2010 07:00 AM, Avi Kivity wrote:
> Off-topic, but what's the reason for dev->hotplugged's existence?  A 
> device is either plugged or not, it is either hotpluggable or not, but 
> is there a way to tell, from looking at a plugged device, whether it 
> has been hotplugged in the past?
>
> AFAICT it is redundant state and should be removed.

I've got that in my qdev tree.

http://repo.or.cz/w/qemu/aliguori.git/commit/672720c5c0315d2a5b43f58fd75822cc4a390ae9

Regards,

Anthony Liguori
Isaku Yamahata - Aug. 25, 2010, 3:07 a.m.
On Fri, Aug 20, 2010 at 10:56:57AM -0500, Anthony Liguori wrote:
> The real problem is how we do reset.  We shouldn't register a reset  
> handler for every qdev device but rather register a single reset handler  
> that walks the device tree and calls reset on every reachable device.
>
> Then we can always call reset in init() and there's no need to have a  
> dev->hotplugged check.  The qdev device tree reset handler should not be  
> registered until *after* we call qemu_system_reset() after creating the  
> device model which will ensure that we don't do a double reset.

I don't see why you re-invent the patch.
Please see the patches I sent out on August 5.
http://lists.nongnu.org/archive/html/qemu-devel/2010-08/msg00377.html
http://lists.nongnu.org/archive/html/qemu-devel/2010-08/msg00383.html

Maybe we can merge the patches.
As for your patch, I have some comment.
- bus itself may want its own handler. At lease pci bus needs it.
  And propagating reset signal to children is up to the bus controller.

- child devices should be reset before parent.
  This doesn't matter much though.
Anthony Liguori - Aug. 25, 2010, 12:55 p.m.
On 08/24/2010 10:07 PM, Isaku Yamahata wrote:
> On Fri, Aug 20, 2010 at 10:56:57AM -0500, Anthony Liguori wrote:
>    
>> The real problem is how we do reset.  We shouldn't register a reset
>> handler for every qdev device but rather register a single reset handler
>> that walks the device tree and calls reset on every reachable device.
>>
>> Then we can always call reset in init() and there's no need to have a
>> dev->hotplugged check.  The qdev device tree reset handler should not be
>> registered until *after* we call qemu_system_reset() after creating the
>> device model which will ensure that we don't do a double reset.
>>      
> I don't see why you re-invent the patch.
> Please see the patches I sent out on August 5.
> http://lists.nongnu.org/archive/html/qemu-devel/2010-08/msg00377.html
> http://lists.nongnu.org/archive/html/qemu-devel/2010-08/msg00383.html
>    

I honestly didn't connect the two but it's now obvious to me that they 
overlap significantly.

> Maybe we can merge the patches.
> As for your patch, I have some comment.
> - bus itself may want its own handler. At lease pci bus needs it.
>    And propagating reset signal to children is up to the bus controller.
>    

I disagree.  Reset should be equivalent to power off + init and it's not 
something that can be selectively propagated.

There are different types of bus level resets and it may make sense to 
model that in the PCI layer but the qdev reset is a power cycle 
semantically.

I think using a walker pattern is a stronger design than having each 
reset function do it's own transversal because the later has the 
potential to introduce bugs.

Regards,

Anthony Liguori

> - child devices should be reset before parent.
>    This doesn't matter much though.
>
>
Isaku Yamahata - Aug. 25, 2010, 3:17 p.m.
On Wed, Aug 25, 2010 at 07:55:27AM -0500, Anthony Liguori wrote:
>> Maybe we can merge the patches.
>> As for your patch, I have some comment.
>> - bus itself may want its own handler. At lease pci bus needs it.
>>    And propagating reset signal to children is up to the bus controller.
>>    
>
> I disagree.  Reset should be equivalent to power off + init and it's not  
> something that can be selectively propagated.
>
> There are different types of bus level resets and it may make sense to  
> model that in the PCI layer but the qdev reset is a power cycle  
> semantically.
>
> I think using a walker pattern is a stronger design than having each  
> reset function do it's own transversal because the later has the  
> potential to introduce bugs.

Warm reset is different from cold reset, so reset isn't equivalent to
a power cycle. Typically registers which report errors must retain the
contents across warm reset, but doesn't across a power cycle.
I can see the reset call back is called when both power on and warm reset,
but I don't see why qdev_reset() is a power cycle semantically.

Although your current patch might work for me at the moment,
the claim that qdev_reset() == a power cycle worries me.

How about adding a argument to distinguish reset type?
Something like QEMU_RESET_COLD, QEMU_RESET_WARM.
Anthony Liguori - Aug. 25, 2010, 4:49 p.m.
On 08/25/2010 10:17 AM, Isaku Yamahata wrote:
> On Wed, Aug 25, 2010 at 07:55:27AM -0500, Anthony Liguori wrote:
>    
>>> Maybe we can merge the patches.
>>> As for your patch, I have some comment.
>>> - bus itself may want its own handler. At lease pci bus needs it.
>>>     And propagating reset signal to children is up to the bus controller.
>>>
>>>        
>> I disagree.  Reset should be equivalent to power off + init and it's not
>> something that can be selectively propagated.
>>
>> There are different types of bus level resets and it may make sense to
>> model that in the PCI layer but the qdev reset is a power cycle
>> semantically.
>>
>> I think using a walker pattern is a stronger design than having each
>> reset function do it's own transversal because the later has the
>> potential to introduce bugs.
>>      
> Warm reset is different from cold reset, so reset isn't equivalent to
> a power cycle.

But qdev reset is not warm reset.  It's cold reset.  IOW, it's called 
when the device is first powered up.

>   Typically registers which report errors must retain the
> contents across warm reset, but doesn't across a power cycle.
> I can see the reset call back is called when both power on and warm reset,
> but I don't see why qdev_reset() is a power cycle semantically.
>
> Although your current patch might work for me at the moment,
> the claim that qdev_reset() == a power cycle worries me.
>
> How about adding a argument to distinguish reset type?
> Something like QEMU_RESET_COLD, QEMU_RESET_WARM.
>    

Before approaching this, I think we ought to better understand exactly 
what type of reset semantics we would need to support in the future.

I think that starts by understanding exactly what's guaranteed and 
understanding what the use cases are for it.

Regards,

Anthony Liguori
Isaku Yamahata - Aug. 26, 2010, 8:38 a.m.
On Wed, Aug 25, 2010 at 11:49:19AM -0500, Anthony Liguori wrote:
> On 08/25/2010 10:17 AM, Isaku Yamahata wrote:
>> On Wed, Aug 25, 2010 at 07:55:27AM -0500, Anthony Liguori wrote:
>>    
>>>> Maybe we can merge the patches.
>>>> As for your patch, I have some comment.
>>>> - bus itself may want its own handler. At lease pci bus needs it.
>>>>     And propagating reset signal to children is up to the bus controller.
>>>>
>>>>        
>>> I disagree.  Reset should be equivalent to power off + init and it's not
>>> something that can be selectively propagated.
>>>
>>> There are different types of bus level resets and it may make sense to
>>> model that in the PCI layer but the qdev reset is a power cycle
>>> semantically.
>>>
>>> I think using a walker pattern is a stronger design than having each
>>> reset function do it's own transversal because the later has the
>>> potential to introduce bugs.
>>>      
>> Warm reset is different from cold reset, so reset isn't equivalent to
>> a power cycle.
>
> But qdev reset is not warm reset.  It's cold reset.  IOW, it's called  
> when the device is first powered up.

No. It's not true.
Surely qemu_system_reset() is called on qemu startup from main().
But it is also called from main_loop() as warm reset.
qemu_system_reset_request() is called when warm reset is requested.
That is, qemu_system_reset() doesn't distinguish warm reset from
cold reset. Thus qdev_reset() doesn't.


>>   Typically registers which report errors must retain the
>> contents across warm reset, but doesn't across a power cycle.
>> I can see the reset call back is called when both power on and warm reset,
>> but I don't see why qdev_reset() is a power cycle semantically.
>>
>> Although your current patch might work for me at the moment,
>> the claim that qdev_reset() == a power cycle worries me.
>>
>> How about adding a argument to distinguish reset type?
>> Something like QEMU_RESET_COLD, QEMU_RESET_WARM.
>>    
>
> Before approaching this, I think we ought to better understand exactly  
> what type of reset semantics we would need to support in the future.
>
> I think that starts by understanding exactly what's guaranteed and  
> understanding what the use cases are for it.

Fair enough. How about the followings?
This is just a starting point. I borrowed terminology pci/pcie spec.


reset
	Bring the state of hardware state to consistent state.
	(some state might be left unknown.)
	

system reset
	a hardware mechanism for setting or returning all hardware states
	to the initial conditions.
	Use case:
	In qemu, system_system_reset().


cold reset(power on reset)
	system reset following the application of power.
	Use case:
	In qemu, system_reset()	in main().
	We might want to use this as a power cycle.
	When a device is hot plugged, the device should be cold reset too.
	This is your motivation.
	QEMU_RESET_COLD
	Guarantee:
	The internal status must be same to qdev_init() + qdev_reset()

	
warm reset
	system reset without cycling the supplied power.
	Use case:
	In qemu, system_reset() in main_loop(). There are many places
	which calls qemu_system_reset_request().
	Some state are retained across warm reset. Like PCIe AER, error
	reporting registers need to keep its contents across warm reset
	as OS would examine them and report it when hardware errors caused
	warm reset.
	QEMU_RESET_WARM
	

bus reset
	Reset bus and devices on the bus.
	Bus reset is usually triggered when cold reset, warm reset and
	commanding the bus controller to reset the child bus.
	When bus reset is triggered as command to bus controller,
	the effect is usually same to warm reset on devices on the bus.

	Typically on parallel bus, bus reset is started by asserting
	a designated signal.
	Example: PCI RST#, ATA RESET-, SCSI RST
	
	Use case:
	bus reset as result of programming bus controller. 
	Qemu is currently missing it which I'd like to fill for pci bus.
	ATA and SCSI could benefit from this.
	QEMU_RESET_WARM with bus.
	Guarantee:
	device state under the bus is same as warm reset.


device/function reset:
	Reset triggered by sending reset command to a device.
	This is bus/device specific.
	There might be many reset commands whose effects are different.
	Example: PCI FLR, ATA DEVICE RESET command,
                 scsi bus device reset message.

	This reset is bus specific, so it wouldn't be suitable for qdev
	frame work and could be handled by each bus level.

	
hot reset:
	I just put it here for completeness because pcie defines hot reset.
	A reset propagated in-band across a Link using a Physical Layer
	mechanism. 
	Qemu doesn't emulate physical layer, so we don't care it.
	From software point of view, hot reset has same effect to warm reset.
Anthony Liguori - Aug. 26, 2010, 1:02 p.m.
On 08/26/2010 03:38 AM, Isaku Yamahata wrote:
>
>> I think that starts by understanding exactly what's guaranteed and
>> understanding what the use cases are for it.
>>      
> Fair enough. How about the followings?
>    

Thanks for enumerating.

> This is just a starting point. I borrowed terminology pci/pcie spec.
>
>
> reset
> 	Bring the state of hardware state to consistent state.
> 	(some state might be left unknown.)
> 	
>
> system reset
> 	a hardware mechanism for setting or returning all hardware states
> 	to the initial conditions.
> 	Use case:
> 	In qemu, system_system_reset().
>
>
> cold reset(power on reset)
> 	system reset following the application of power.
> 	Use case:
> 	In qemu, system_reset()	in main().
> 	We might want to use this as a power cycle.
> 	When a device is hot plugged, the device should be cold reset too.
> 	This is your motivation.
> 	QEMU_RESET_COLD
> 	Guarantee:
> 	The internal status must be same to qdev_init() + qdev_reset()
>    

This is what we do today in QEMU and from a functional perspective it 
covers the type of function we need today.

> 	
> warm reset
> 	system reset without cycling the supplied power.
> 	Use case:
> 	In qemu, system_reset() in main_loop(). There are many places
> 	which calls qemu_system_reset_request().
> 	Some state are retained across warm reset. Like PCIe AER, error
> 	reporting registers need to keep its contents across warm reset
> 	as OS would examine them and report it when hardware errors caused
> 	warm reset.
> 	QEMU_RESET_WARM
>    

With AER, I can't imagine that this matters that much unless we're doing 
PCI passthrough, right?

So maybe the way we should frame this discussion is, what's the type of 
reset semantics that we need to support for PCI passthrough?  The next 
question after that is how do we achieve the different types of reset 
for passthrough devices?

BTW, if you could transfer some of this discussion to a wiki page on 
qemu.org, I think that would be extremely valuable.

Regards,

Anthony Liguori

> bus reset
> 	Reset bus and devices on the bus.
> 	Bus reset is usually triggered when cold reset, warm reset and
> 	commanding the bus controller to reset the child bus.
> 	When bus reset is triggered as command to bus controller,
> 	the effect is usually same to warm reset on devices on the bus.
>
> 	Typically on parallel bus, bus reset is started by asserting
> 	a designated signal.
> 	Example: PCI RST#, ATA RESET-, SCSI RST
> 	
> 	Use case:
> 	bus reset as result of programming bus controller.
> 	Qemu is currently missing it which I'd like to fill for pci bus.
> 	ATA and SCSI could benefit from this.
> 	QEMU_RESET_WARM with bus.
> 	Guarantee:
> 	device state under the bus is same as warm reset.
>
>
> device/function reset:
> 	Reset triggered by sending reset command to a device.
> 	This is bus/device specific.
> 	There might be many reset commands whose effects are different.
> 	Example: PCI FLR, ATA DEVICE RESET command,
>                   scsi bus device reset message.
>
> 	This reset is bus specific, so it wouldn't be suitable for qdev
> 	frame work and could be handled by each bus level.
>
> 	
> hot reset:
> 	I just put it here for completeness because pcie defines hot reset.
> 	A reset propagated in-band across a Link using a Physical Layer
> 	mechanism.
> 	Qemu doesn't emulate physical layer, so we don't care it.
> 	From software point of view, hot reset has same effect to warm reset.
>
>
Anthony Liguori - Aug. 26, 2010, 1:04 p.m.
On 08/26/2010 03:38 AM, Isaku Yamahata wrote:
>
> 	QEMU_RESET_COLD
>    

BTW, just from a implementation perspective, I'd rather have multiple 
reset callbacks in qdev instead of having a single callback with a type 
flag.  A type flag implies that every callback has to handle all cases 
whereas with separate callbacks, if a device doesn't implement 
warm_reset we can easily default it to reset (which is a cold reset).

Regards,

Anthony Liguori

> 	Guarantee:
> 	The internal status must be same to qdev_init() + qdev_reset()
>
> 	
> warm reset
> 	system reset without cycling the supplied power.
> 	Use case:
> 	In qemu, system_reset() in main_loop(). There are many places
> 	which calls qemu_system_reset_request().
> 	Some state are retained across warm reset. Like PCIe AER, error
> 	reporting registers need to keep its contents across warm reset
> 	as OS would examine them and report it when hardware errors caused
> 	warm reset.
> 	QEMU_RESET_WARM
> 	
>
> bus reset
> 	Reset bus and devices on the bus.
> 	Bus reset is usually triggered when cold reset, warm reset and
> 	commanding the bus controller to reset the child bus.
> 	When bus reset is triggered as command to bus controller,
> 	the effect is usually same to warm reset on devices on the bus.
>
> 	Typically on parallel bus, bus reset is started by asserting
> 	a designated signal.
> 	Example: PCI RST#, ATA RESET-, SCSI RST
> 	
> 	Use case:
> 	bus reset as result of programming bus controller.
> 	Qemu is currently missing it which I'd like to fill for pci bus.
> 	ATA and SCSI could benefit from this.
> 	QEMU_RESET_WARM with bus.
> 	Guarantee:
> 	device state under the bus is same as warm reset.
>
>
> device/function reset:
> 	Reset triggered by sending reset command to a device.
> 	This is bus/device specific.
> 	There might be many reset commands whose effects are different.
> 	Example: PCI FLR, ATA DEVICE RESET command,
>                   scsi bus device reset message.
>
> 	This reset is bus specific, so it wouldn't be suitable for qdev
> 	frame work and could be handled by each bus level.
>
> 	
> hot reset:
> 	I just put it here for completeness because pcie defines hot reset.
> 	A reset propagated in-band across a Link using a Physical Layer
> 	mechanism.
> 	Qemu doesn't emulate physical layer, so we don't care it.
> 	From software point of view, hot reset has same effect to warm reset.
>
>
Avi Kivity - Aug. 26, 2010, 1:15 p.m.
On 08/25/2010 03:55 PM, Anthony Liguori wrote:
>
>> Maybe we can merge the patches.
>> As for your patch, I have some comment.
>> - bus itself may want its own handler. At lease pci bus needs it.
>>    And propagating reset signal to children is up to the bus controller.
>
> I disagree.  Reset should be equivalent to power off + init and it's 
> not something that can be selectively propagated.

Not all busses propagate reset - SCSI is an example (I think).
Anthony Liguori - Aug. 26, 2010, 1:25 p.m.
On 08/26/2010 08:15 AM, Avi Kivity wrote:
>  On 08/25/2010 03:55 PM, Anthony Liguori wrote:
>>
>>> Maybe we can merge the patches.
>>> As for your patch, I have some comment.
>>> - bus itself may want its own handler. At lease pci bus needs it.
>>>    And propagating reset signal to children is up to the bus 
>>> controller.
>>
>> I disagree.  Reset should be equivalent to power off + init and it's 
>> not something that can be selectively propagated.
>
> Not all busses propagate reset - SCSI is an example (I think).
>

We're talking about cold reset vs. warm reset.

In the absence of passthrough, I'm struggling to see a useful use-case 
with warm reset.  However, there are many useful things we can do 
assuming a cold reset (like MADV_DONTNEED memory on reboot).

That's not saying we shouldn't do a warm reset, but I'd like to see that 
as an incremental addition to what we have today (like introducing a 
propagating warm reset callback) and thinking through what the actual 
behavior should and shouldn't be.

Regards,

Anthony Liguori
Avi Kivity - Aug. 26, 2010, 2:29 p.m.
On 08/26/2010 04:25 PM, Anthony Liguori wrote:
> On 08/26/2010 08:15 AM, Avi Kivity wrote:
>>  On 08/25/2010 03:55 PM, Anthony Liguori wrote:
>>>
>>>> Maybe we can merge the patches.
>>>> As for your patch, I have some comment.
>>>> - bus itself may want its own handler. At lease pci bus needs it.
>>>>    And propagating reset signal to children is up to the bus 
>>>> controller.
>>>
>>> I disagree.  Reset should be equivalent to power off + init and it's 
>>> not something that can be selectively propagated.
>>
>> Not all busses propagate reset - SCSI is an example (I think).
>>
>
> We're talking about cold reset vs. warm reset.
>
> In the absence of passthrough, I'm struggling to see a useful use-case 
> with warm reset.  However, there are many useful things we can do 
> assuming a cold reset (like MADV_DONTNEED memory on reboot).
>
> That's not saying we shouldn't do a warm reset, but I'd like to see 
> that as an incremental addition to what we have today (like 
> introducing a propagating warm reset callback) and thinking through 
> what the actual behavior should and shouldn't be.

Pressing the reset button is a warm reset on real machines, therefore it 
should be a warm reset in qemu.
Blue Swirl - Aug. 26, 2010, 5:39 p.m.
On Thu, Aug 26, 2010 at 2:29 PM, Avi Kivity <avi@redhat.com> wrote:
>  On 08/26/2010 04:25 PM, Anthony Liguori wrote:
>>
>> On 08/26/2010 08:15 AM, Avi Kivity wrote:
>>>
>>>  On 08/25/2010 03:55 PM, Anthony Liguori wrote:
>>>>
>>>>> Maybe we can merge the patches.
>>>>> As for your patch, I have some comment.
>>>>> - bus itself may want its own handler. At lease pci bus needs it.
>>>>>   And propagating reset signal to children is up to the bus controller.
>>>>
>>>> I disagree.  Reset should be equivalent to power off + init and it's not
>>>> something that can be selectively propagated.
>>>
>>> Not all busses propagate reset - SCSI is an example (I think).
>>>
>>
>> We're talking about cold reset vs. warm reset.
>>
>> In the absence of passthrough, I'm struggling to see a useful use-case
>> with warm reset.  However, there are many useful things we can do assuming a
>> cold reset (like MADV_DONTNEED memory on reboot).
>>
>> That's not saying we shouldn't do a warm reset, but I'd like to see that
>> as an incremental addition to what we have today (like introducing a
>> propagating warm reset callback) and thinking through what the actual
>> behavior should and shouldn't be.
>
> Pressing the reset button is a warm reset on real machines, therefore it
> should be a warm reset in qemu.

I agree. For example Sparc64 CPU uses different reset vector for warm
and cold reset, so system_reset acts like a reset button.

But most real life chips have only one reset signal pin, CPUs may be
the only cases where multiple reset signals are used. I think current
system_reset matches the need well. For most devices, system_reset
initiates the same procedure for cold or warm reset. Those devices
which care about warm reset can count the number of the resets. The
only problem I see is that there is no way to signal cold reset or
power cycle.

Then there are devices which don't have reset signals at all, like
RAM. Their behavior at reset is different compared to power cycling.
Stateless hardware like ROMs do not care about either.
Isaku Yamahata - Aug. 27, 2010, 3:52 a.m.
I added CC for those who might be interested in this discussion.

On Thu, Aug 26, 2010 at 08:02:38AM -0500, Anthony Liguori wrote:
> On 08/26/2010 03:38 AM, Isaku Yamahata wrote:
>>
>>> I think that starts by understanding exactly what's guaranteed and
>>> understanding what the use cases are for it.
>>>      
>> Fair enough. How about the followings?
>>    
>
> Thanks for enumerating.
>
>> This is just a starting point. I borrowed terminology pci/pcie spec.
>>
>>
>> reset
>> 	Bring the state of hardware state to consistent state.
>> 	(some state might be left unknown.)
>> 	
>>
>> system reset
>> 	a hardware mechanism for setting or returning all hardware states
>> 	to the initial conditions.
>> 	Use case:
>> 	In qemu, system_system_reset().
>>
>>
>> cold reset(power on reset)
>> 	system reset following the application of power.
>> 	Use case:
>> 	In qemu, system_reset()	in main().
>> 	We might want to use this as a power cycle.
>> 	When a device is hot plugged, the device should be cold reset too.
>> 	This is your motivation.
>> 	QEMU_RESET_COLD
>> 	Guarantee:
>> 	The internal status must be same to qdev_init() + qdev_reset()
>>    
>
> This is what we do today in QEMU and from a functional perspective it  
> covers the type of function we need today.
>
>> 	
>> warm reset
>> 	system reset without cycling the supplied power.
>> 	Use case:
>> 	In qemu, system_reset() in main_loop(). There are many places
>> 	which calls qemu_system_reset_request().
>> 	Some state are retained across warm reset. Like PCIe AER, error
>> 	reporting registers need to keep its contents across warm reset
>> 	as OS would examine them and report it when hardware errors caused
>> 	warm reset.
>> 	QEMU_RESET_WARM
>>    
>
> With AER, I can't imagine that this matters that much unless we're doing  
> PCI passthrough, right?

Even without PCI passthrough, AER errors can be injected into emulated pci/pcie
devices in a virtual pcie bus hierarchy from qemu command line.
This is useful to test guest OS AER handler.


> So maybe the way we should frame this discussion is, what's the type of  
> reset semantics that we need to support for PCI passthrough?  The next  
> question after that is how do we achieve the different types of reset  
> for passthrough devices?

What I want is hot reset in pcie terminology on a virtual bus as
PCI_BRIDGE_CTL_BUS_RESET emulation and propagated reset on devices/child
buses which might be a directly assigned.
In direct assigned device case, device-assignment code would be notified the reset.

As hot reset has same effect to warm reset in functionality sense
(the difference is the way to signal it in physical/signal layer which
qemu doesn't care),
I'd like to implement pci bus reset as triggering warm reset on a
(virtual) bus by utilizing qdev frame work.
This would be applicable to ata, scsi, I suppose.


It's another story how to virtualize hot reset on a given directly assigned
pci function or a pcie bus hierarchy. For example, as PCI device assignment
is done per function basis, co-existing functions in the same card shouldn't
be reset.
Another example is, virtual pci bus hierarchy might be reset, but it would
be difficult problem how to map the virtual bus topology to host bus topology.

thanks,

> BTW, if you could transfer some of this discussion to a wiki page on  
> qemu.org, I think that would be extremely valuable.
>
> Regards,
>
> Anthony Liguori
>
>> bus reset
>> 	Reset bus and devices on the bus.
>> 	Bus reset is usually triggered when cold reset, warm reset and
>> 	commanding the bus controller to reset the child bus.
>> 	When bus reset is triggered as command to bus controller,
>> 	the effect is usually same to warm reset on devices on the bus.
>>
>> 	Typically on parallel bus, bus reset is started by asserting
>> 	a designated signal.
>> 	Example: PCI RST#, ATA RESET-, SCSI RST
>> 	
>> 	Use case:
>> 	bus reset as result of programming bus controller.
>> 	Qemu is currently missing it which I'd like to fill for pci bus.
>> 	ATA and SCSI could benefit from this.
>> 	QEMU_RESET_WARM with bus.
>> 	Guarantee:
>> 	device state under the bus is same as warm reset.
>>
>>
>> device/function reset:
>> 	Reset triggered by sending reset command to a device.
>> 	This is bus/device specific.
>> 	There might be many reset commands whose effects are different.
>> 	Example: PCI FLR, ATA DEVICE RESET command,
>>                   scsi bus device reset message.
>>
>> 	This reset is bus specific, so it wouldn't be suitable for qdev
>> 	frame work and could be handled by each bus level.
>>
>> 	
>> hot reset:
>> 	I just put it here for completeness because pcie defines hot reset.
>> 	A reset propagated in-band across a Link using a Physical Layer
>> 	mechanism.
>> 	Qemu doesn't emulate physical layer, so we don't care it.
>> 	From software point of view, hot reset has same effect to warm reset.
>>
>>    
>
Isaku Yamahata - Aug. 27, 2010, 7:28 a.m.
On Thu, Aug 26, 2010 at 08:02:38AM -0500, Anthony Liguori wrote:
> BTW, if you could transfer some of this discussion to a wiki page on  
> qemu.org, I think that would be extremely valuable.

http://wiki.qemu.org/Features/ResetAPI
Wei Xu - Aug. 27, 2010, 5:43 p.m.
Isaku and Anthony:

This is excellent discussion! Thanks for forwarding.

Wei Xu

wexu2@cisco.com


On 8/26/10 8:52 PM, "Isaku Yamahata" <yamahata@valinux.co.jp> wrote:

> I added CC for those who might be interested in this discussion.
> 
> On Thu, Aug 26, 2010 at 08:02:38AM -0500, Anthony Liguori wrote:
>> On 08/26/2010 03:38 AM, Isaku Yamahata wrote:
>>> 
>>>> I think that starts by understanding exactly what's guaranteed and
>>>> understanding what the use cases are for it.
>>>>      
>>> Fair enough. How about the followings?
>>>    
>> 
>> Thanks for enumerating.
>> 
>>> This is just a starting point. I borrowed terminology pci/pcie spec.
>>> 
>>> 
>>> reset
>>> Bring the state of hardware state to consistent state.
>>> (some state might be left unknown.)
>>> 
>>> 
>>> system reset
>>> a hardware mechanism for setting or returning all hardware states
>>> to the initial conditions.
>>> Use case:
>>> In qemu, system_system_reset().
>>> 
>>> 
>>> cold reset(power on reset)
>>> system reset following the application of power.
>>> Use case:
>>> In qemu, system_reset() in main().
>>> We might want to use this as a power cycle.
>>> When a device is hot plugged, the device should be cold reset too.
>>> This is your motivation.
>>> QEMU_RESET_COLD
>>> Guarantee:
>>> The internal status must be same to qdev_init() + qdev_reset()
>>>    
>> 
>> This is what we do today in QEMU and from a functional perspective it
>> covers the type of function we need today.
>> 
>>> 
>>> warm reset
>>> system reset without cycling the supplied power.
>>> Use case:
>>> In qemu, system_reset() in main_loop(). There are many places
>>> which calls qemu_system_reset_request().
>>> Some state are retained across warm reset. Like PCIe AER, error
>>> reporting registers need to keep its contents across warm reset
>>> as OS would examine them and report it when hardware errors caused
>>> warm reset.
>>> QEMU_RESET_WARM
>>>    
>> 
>> With AER, I can't imagine that this matters that much unless we're doing
>> PCI passthrough, right?
> 
> Even without PCI passthrough, AER errors can be injected into emulated
> pci/pcie
> devices in a virtual pcie bus hierarchy from qemu command line.
> This is useful to test guest OS AER handler.
> 
> 
>> So maybe the way we should frame this discussion is, what's the type of
>> reset semantics that we need to support for PCI passthrough?  The next
>> question after that is how do we achieve the different types of reset
>> for passthrough devices?
> 
> What I want is hot reset in pcie terminology on a virtual bus as
> PCI_BRIDGE_CTL_BUS_RESET emulation and propagated reset on devices/child
> buses which might be a directly assigned.
> In direct assigned device case, device-assignment code would be notified the
> reset.
> 
> As hot reset has same effect to warm reset in functionality sense
> (the difference is the way to signal it in physical/signal layer which
> qemu doesn't care),
> I'd like to implement pci bus reset as triggering warm reset on a
> (virtual) bus by utilizing qdev frame work.
> This would be applicable to ata, scsi, I suppose.
> 
> 
> It's another story how to virtualize hot reset on a given directly assigned
> pci function or a pcie bus hierarchy. For example, as PCI device assignment
> is done per function basis, co-existing functions in the same card shouldn't
> be reset.
> Another example is, virtual pci bus hierarchy might be reset, but it would
> be difficult problem how to map the virtual bus topology to host bus topology.
> 
> thanks,
> 
>> BTW, if you could transfer some of this discussion to a wiki page on
>> qemu.org, I think that would be extremely valuable.
>> 
>> Regards,
>> 
>> Anthony Liguori
>> 
>>> bus reset
>>> Reset bus and devices on the bus.
>>> Bus reset is usually triggered when cold reset, warm reset and
>>> commanding the bus controller to reset the child bus.
>>> When bus reset is triggered as command to bus controller,
>>> the effect is usually same to warm reset on devices on the bus.
>>> 
>>> Typically on parallel bus, bus reset is started by asserting
>>> a designated signal.
>>> Example: PCI RST#, ATA RESET-, SCSI RST
>>> 
>>> Use case:
>>> bus reset as result of programming bus controller.
>>> Qemu is currently missing it which I'd like to fill for pci bus.
>>> ATA and SCSI could benefit from this.
>>> QEMU_RESET_WARM with bus.
>>> Guarantee:
>>> device state under the bus is same as warm reset.
>>> 
>>> 
>>> device/function reset:
>>> Reset triggered by sending reset command to a device.
>>> This is bus/device specific.
>>> There might be many reset commands whose effects are different.
>>> Example: PCI FLR, ATA DEVICE RESET command,
>>>                   scsi bus device reset message.
>>> 
>>> This reset is bus specific, so it wouldn't be suitable for qdev
>>> frame work and could be handled by each bus level.
>>> 
>>> 
>>> hot reset:
>>> I just put it here for completeness because pcie defines hot reset.
>>> A reset propagated in-band across a Link using a Physical Layer
>>> mechanism.
>>> Qemu doesn't emulate physical layer, so we don't care it.
>>> From software point of view, hot reset has same effect to warm reset.
>>> 
>>>    
>>

Patch

diff --git a/hw/qdev.c b/hw/qdev.c
index e99c73f..b156272 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -278,6 +278,9 @@  int qdev_init(DeviceState *dev)
         qdev_free(dev);
         return rc;
     }
+    if (dev->hotplugged) {
+        qdev_reset(dev);
+    }
     qemu_register_reset(qdev_reset, dev);
     if (dev->info->vmsd) {
         vmstate_register_with_alias_id(dev, -1, dev->info->vmsd, dev,