Patchwork [1/2] pci: Add pci_device_get_host_irq

login
register
mail settings
Submitter Michael S. Tsirkin
Date May 30, 2012, 1:34 p.m.
Message ID <20120530133415.GA27788@redhat.com>
Download mbox | patch
Permalink /patch/161982/
State New
Headers show

Comments

Michael S. Tsirkin - May 30, 2012, 1:34 p.m.
On Wed, May 30, 2012 at 02:11:58PM +0200, Jan Kiszka wrote:
> On 2012-05-21 23:03, Michael S. Tsirkin wrote:
> > On Mon, May 21, 2012 at 05:35:34PM -0300, Jan Kiszka wrote:
> >> On 2012-05-21 16:05, Michael S. Tsirkin wrote:
> >>> On Mon, May 21, 2012 at 10:13:47AM -0300, Jan Kiszka wrote:
> >>>> @@ -386,6 +387,14 @@ static void piix3_set_irq(void *opaque, int pirq, int level)
> >>>>      piix3_set_irq_level(piix3, pirq, level);
> >>>>  }
> >>>>  
> >>>> +static int piix3_map_host_irq(void *opaque, int pci_intx)
> >>>> +{
> >>>> +    PIIX3State *piix3 = opaque;
> >>>> +    int host_irq = piix3->dev.config[PIIX_PIRQC + pci_intx];
> >>>> +
> >>>> +    return host_irq < PIIX_NUM_PIC_IRQS ? host_irq : -1;
> >>>> +}
> >>>> +
> >>>>  /* irq routing is changed. so rebuild bitmap */
> >>>>  static void piix3_update_irq_levels(PIIX3State *piix3)
> >>>>  {
> >>>
> >>>
> >>> So, instead of special API just for assignment,
> >>> I would like to see map_irq in piix being reworked
> >>> to take dev config into account.
> >>> I think piix is almost unique in this but need to check,
> >>
> >> Maybe it is the only host bridge with routing that we have in QEMU right
> >> now, but it is surely not unique (e.g. all later Intel chipsets support
> >> this).
> > 
> > Yes. APIs for this should be in place. Just saying
> > instead of failing we can easily just make it work
> > if there are no remappings.
> > 
> >>> if not fix other host buses that are programmable.
> >>> PCI bridges are all fixed routing.
> >>>
> >>> Then we can drop set_irq callback.
> >>
> >> set_irq may do more than IRQ routing. It may also flip polarity (see
> >> bonito.c).
> > 
> > In that case host_map_irq is not a good API?
> > Need to investigate.
> > 
> >> I guess we need to analyze the requirements of all in-tree
> >> host bridges first.
> > 
> > Yes.
> > 
> >>>
> >>> Finally all devices can cache the irq#,
> >>> and piix would scan devices behind it
> >>> and update the irq.
> >>>
> >>> Assignment then just needs a notifier, since
> >>> it owns the device just a pointer in device is
> >>> enough.
> >>
> >> I cannot completely follow your ideas here yet. Do you want to pass the
> >> mapping information along the notifier, or why "just ... a notifier"?
> > 
> > 
> > Each device would resolve IRQs that it uses.
> > 
> > 
> >>>
> >>> Could you look at doing this please?
> >>> If no I can give it a stub.
> >>>
> >>
> >> Unless we can converge over a final API quickly, I would suggest to base
> >> these refactorings on top of what I propose here. We will have to touch
> >> all host bridges more invasively for this anyway. If you can explain to
> >> me how simple the refactoring can be (but I'm a bit skeptical ;) ), I
> >> could do this, otherwise I would prefer to focus on the device
> >> assignment topic which provide some more nuts.
> >>
> >> Jan
> > 
> > Yes it's simple. I'll post in a couple of days (lots of
> > meetings tomorrow). I think you'll
> > see it's easier and cleaner.
> 
> I looked into this again and it appears to me that, in fact, more work
> is required to bypass the current interrupt routing on delivery:
> 
> The PIIX2 tracks the interrupt level of each device on its bus with the
> help of PCIBus::irq_count. On routing changes via its config space,
> those levels are currently used to generate the new host IRQ states.
> But, once we bypass that level state tracking, we need to do something
> else, and that better for _all_ devices: clear all outputs and let the
> devices issue an update. The assigned device could provide this based on
> the INTx status bit, for all others we need to track the level generically.
> 
> Did you start working on this topic already?
> 
> Jan

I will need a bit of time to understand what you are saying above.

Unfortunately I got distracted by various end of quarter
activities.

Below's as far as I got - hopefully this
explains what I had in mind: for deep hierarchies
this will save a bus scan so I think this makes sense
even in the absense of kvm irqchip.

Warning: completely untested and known to be incomplete.
Please judge whether this is going in a direction that
is helpful for your efforts. If you respond in the positive,
I hope to be able to get back to this next week.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jan Kiszka - May 30, 2012, 2:38 p.m.
On 2012-05-30 15:34, Michael S. Tsirkin wrote:
> Below's as far as I got - hopefully this
> explains what I had in mind: for deep hierarchies
> this will save a bus scan so I think this makes sense
> even in the absense of kvm irqchip.
> 
> Warning: completely untested and known to be incomplete.
> Please judge whether this is going in a direction that
> is helpful for your efforts. If you respond in the positive,
> I hope to be able to get back to this next week.

We need to
 - account for polarity changes along the route
 - get rid of irq_count as it is no longer updated in the fast path,
   thus useless on routing updates

So it's a bit more complicated and requires a some broader refactorings.
But there is likely some room for simplifications in the end, true.

Jan
Michael S. Tsirkin - May 30, 2012, 2:42 p.m.
On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
> On 2012-05-30 15:34, Michael S. Tsirkin wrote:
> > Below's as far as I got - hopefully this
> > explains what I had in mind: for deep hierarchies
> > this will save a bus scan so I think this makes sense
> > even in the absense of kvm irqchip.
> > 
> > Warning: completely untested and known to be incomplete.
> > Please judge whether this is going in a direction that
> > is helpful for your efforts. If you respond in the positive,
> > I hope to be able to get back to this next week.
> 
> We need to
>  - account for polarity changes along the route
>  - get rid of irq_count as it is no longer updated in the fast path,
>    thus useless on routing updates

I'll need to consider this more to understand what you mean here.

> So it's a bit more complicated and requires a some broader refactorings.

Is this one good as a first step then?
If not I'll drop it for now.

> But there is likely some room for simplifications in the end, true.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Jan Kiszka - May 30, 2012, 2:46 p.m.
On 2012-05-30 16:42, Michael S. Tsirkin wrote:
> On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
>> On 2012-05-30 15:34, Michael S. Tsirkin wrote:
>>> Below's as far as I got - hopefully this
>>> explains what I had in mind: for deep hierarchies
>>> this will save a bus scan so I think this makes sense
>>> even in the absense of kvm irqchip.
>>>
>>> Warning: completely untested and known to be incomplete.
>>> Please judge whether this is going in a direction that
>>> is helpful for your efforts. If you respond in the positive,
>>> I hope to be able to get back to this next week.
>>
>> We need to
>>  - account for polarity changes along the route
>>  - get rid of irq_count as it is no longer updated in the fast path,
>>    thus useless on routing updates
> 
> I'll need to consider this more to understand what you mean here.

If, e.g., the host bridge is configured to flip the polarity of some
interrupt on delivery, the fast path must be able to explore this in
order to do the same.

Then you may want to have a look at how irq_count is maintained and when
it is used.

> 
>> So it's a bit more complicated and requires a some broader refactorings.
> 
> Is this one good as a first step then?
> If not I'll drop it for now.

I think we can't reliably implement this fast path delivery without
solving the above issues, so we can't do it stepwise, at least not with
this as first one.

Jan
Michael S. Tsirkin - May 30, 2012, 2:59 p.m.
On Wed, May 30, 2012 at 04:46:14PM +0200, Jan Kiszka wrote:
> On 2012-05-30 16:42, Michael S. Tsirkin wrote:
> > On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
> >> On 2012-05-30 15:34, Michael S. Tsirkin wrote:
> >>> Below's as far as I got - hopefully this
> >>> explains what I had in mind: for deep hierarchies
> >>> this will save a bus scan so I think this makes sense
> >>> even in the absense of kvm irqchip.
> >>>
> >>> Warning: completely untested and known to be incomplete.
> >>> Please judge whether this is going in a direction that
> >>> is helpful for your efforts. If you respond in the positive,
> >>> I hope to be able to get back to this next week.
> >>
> >> We need to
> >>  - account for polarity changes along the route
> >>  - get rid of irq_count as it is no longer updated in the fast path,
> >>    thus useless on routing updates
> > 
> > I'll need to consider this more to understand what you mean here.
> 
> If, e.g., the host bridge is configured to flip the polarity of some
> interrupt on delivery, the fast path must be able to explore this in
> order to do the same.

This can be solved incrementally I think - handle polarity
change like mapping change, no?

> Then you may want to have a look at how irq_count is maintained and when
> it is used.

In my patch it simply there to OR irq levels of all devices
connected to a specific pin.

> > 
> >> So it's a bit more complicated and requires a some broader refactorings.
> > 
> > Is this one good as a first step then?
> > If not I'll drop it for now.
> 
> I think we can't reliably implement this fast path delivery without
> solving the above issues, so we can't do it stepwise, at least not with
> this as first one.
> 
> Jan


> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Jan Kiszka - May 30, 2012, 3:47 p.m.
On 2012-05-30 16:59, Michael S. Tsirkin wrote:
> On Wed, May 30, 2012 at 04:46:14PM +0200, Jan Kiszka wrote:
>> On 2012-05-30 16:42, Michael S. Tsirkin wrote:
>>> On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
>>>> On 2012-05-30 15:34, Michael S. Tsirkin wrote:
>>>>> Below's as far as I got - hopefully this
>>>>> explains what I had in mind: for deep hierarchies
>>>>> this will save a bus scan so I think this makes sense
>>>>> even in the absense of kvm irqchip.
>>>>>
>>>>> Warning: completely untested and known to be incomplete.
>>>>> Please judge whether this is going in a direction that
>>>>> is helpful for your efforts. If you respond in the positive,
>>>>> I hope to be able to get back to this next week.
>>>>
>>>> We need to
>>>>  - account for polarity changes along the route
>>>>  - get rid of irq_count as it is no longer updated in the fast path,
>>>>    thus useless on routing updates
>>>
>>> I'll need to consider this more to understand what you mean here.
>>
>> If, e.g., the host bridge is configured to flip the polarity of some
>> interrupt on delivery, the fast path must be able to explore this in
>> order to do the same.
> 
> This can be solved incrementally I think - handle polarity
> change like mapping change, no?

Both belong together if we want to do it generically, IMHO.

> 
>> Then you may want to have a look at how irq_count is maintained and when
>> it is used.
> 
> In my patch it simply there to OR irq levels of all devices
> connected to a specific pin.

I think what we want is to avoid any walks and intermediate state
updates for all IRQ deliveries. That would be beneficial for any user,
not just device assignment. It would basically make the special case the
normal one, reducing code complexity. As there are still some problems
to solve for this, I'm just unsure if this step goes in the right
direction and will be reusable.

Jan
Michael S. Tsirkin - May 30, 2012, 4:17 p.m.
On Wed, May 30, 2012 at 05:47:45PM +0200, Jan Kiszka wrote:
> On 2012-05-30 16:59, Michael S. Tsirkin wrote:
> > On Wed, May 30, 2012 at 04:46:14PM +0200, Jan Kiszka wrote:
> >> On 2012-05-30 16:42, Michael S. Tsirkin wrote:
> >>> On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
> >>>> On 2012-05-30 15:34, Michael S. Tsirkin wrote:
> >>>>> Below's as far as I got - hopefully this
> >>>>> explains what I had in mind: for deep hierarchies
> >>>>> this will save a bus scan so I think this makes sense
> >>>>> even in the absense of kvm irqchip.
> >>>>>
> >>>>> Warning: completely untested and known to be incomplete.
> >>>>> Please judge whether this is going in a direction that
> >>>>> is helpful for your efforts. If you respond in the positive,
> >>>>> I hope to be able to get back to this next week.
> >>>>
> >>>> We need to
> >>>>  - account for polarity changes along the route
> >>>>  - get rid of irq_count as it is no longer updated in the fast path,
> >>>>    thus useless on routing updates
> >>>
> >>> I'll need to consider this more to understand what you mean here.
> >>
> >> If, e.g., the host bridge is configured to flip the polarity of some
> >> interrupt on delivery, the fast path must be able to explore this in
> >> order to do the same.
> > 
> > This can be solved incrementally I think - handle polarity
> > change like mapping change, no?
> 
> Both belong together if we want to do it generically, IMHO.

So irq_polarity callback like we have for map_irq ATM?  But at least pci
bridges never do this flip.  Maybe this is the property of the interrupt
controller after all?

> > 
> >> Then you may want to have a look at how irq_count is maintained and when
> >> it is used.
> > 
> > In my patch it simply there to OR irq levels of all devices
> > connected to a specific pin.
> 
> I think what we want is to avoid any walks and intermediate state
> updates for all IRQ deliveries.

Well the bus is not an intermediate state ATM as piix
only has a bit per IRQ so it can't OR devices together.
So you want to move the counter over to piix?

> That would be beneficial for any user,
> not just device assignment. It would basically make the special case the
> normal one, reducing code complexity. As there are still some problems
> to solve for this, I'm just unsure if this step goes in the right
> direction and will be reusable.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Jan Kiszka - May 30, 2012, 4:46 p.m.
On 2012-05-30 18:17, Michael S. Tsirkin wrote:
> On Wed, May 30, 2012 at 05:47:45PM +0200, Jan Kiszka wrote:
>> On 2012-05-30 16:59, Michael S. Tsirkin wrote:
>>> On Wed, May 30, 2012 at 04:46:14PM +0200, Jan Kiszka wrote:
>>>> On 2012-05-30 16:42, Michael S. Tsirkin wrote:
>>>>> On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
>>>>>> On 2012-05-30 15:34, Michael S. Tsirkin wrote:
>>>>>>> Below's as far as I got - hopefully this
>>>>>>> explains what I had in mind: for deep hierarchies
>>>>>>> this will save a bus scan so I think this makes sense
>>>>>>> even in the absense of kvm irqchip.
>>>>>>>
>>>>>>> Warning: completely untested and known to be incomplete.
>>>>>>> Please judge whether this is going in a direction that
>>>>>>> is helpful for your efforts. If you respond in the positive,
>>>>>>> I hope to be able to get back to this next week.
>>>>>>
>>>>>> We need to
>>>>>>  - account for polarity changes along the route
>>>>>>  - get rid of irq_count as it is no longer updated in the fast path,
>>>>>>    thus useless on routing updates
>>>>>
>>>>> I'll need to consider this more to understand what you mean here.
>>>>
>>>> If, e.g., the host bridge is configured to flip the polarity of some
>>>> interrupt on delivery, the fast path must be able to explore this in
>>>> order to do the same.
>>>
>>> This can be solved incrementally I think - handle polarity
>>> change like mapping change, no?
>>
>> Both belong together if we want to do it generically, IMHO.
> 
> So irq_polarity callback like we have for map_irq ATM?  But at least pci
> bridges never do this flip.  Maybe this is the property of the interrupt
> controller after all?

It is a property of some host bridges apparently (e.g. bonito). In
theory, someone could also add a PCI-PCI bridge that does this (or does
the spec disallow it?).

> 
>>>
>>>> Then you may want to have a look at how irq_count is maintained and when
>>>> it is used.
>>>
>>> In my patch it simply there to OR irq levels of all devices
>>> connected to a specific pin.
>>
>> I think what we want is to avoid any walks and intermediate state
>> updates for all IRQ deliveries.
> 
> Well the bus is not an intermediate state ATM as piix
> only has a bit per IRQ so it can't OR devices together.
> So you want to move the counter over to piix?

irq_count can't be moved logically as it is part of the vmstate. But it
should only be generated for saving the state by polling all devices
(and bridges).

For that we need is an optional callback for devices via which we can
ask them to update PCIDevice::irq_state in case they don't trigger
pci_set_irq regularly. And pci_set_irq could simply change the input
line of the IRQ controller according to the cached route and polarity
mapping.

That's basically what I have in mind for any bus. But we could first try
it out on PCI, then later on generalize the design to make it useable
for all IRQ routings. I'm afraid there is no simpler way to introduce
direct IRQ delivery for PCI (unless ignoring corner cases like in qemu-kvm).

Jan
Michael S. Tsirkin - May 30, 2012, 5:20 p.m.
On Wed, May 30, 2012 at 06:46:03PM +0200, Jan Kiszka wrote:
> On 2012-05-30 18:17, Michael S. Tsirkin wrote:
> > On Wed, May 30, 2012 at 05:47:45PM +0200, Jan Kiszka wrote:
> >> On 2012-05-30 16:59, Michael S. Tsirkin wrote:
> >>> On Wed, May 30, 2012 at 04:46:14PM +0200, Jan Kiszka wrote:
> >>>> On 2012-05-30 16:42, Michael S. Tsirkin wrote:
> >>>>> On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
> >>>>>> On 2012-05-30 15:34, Michael S. Tsirkin wrote:
> >>>>>>> Below's as far as I got - hopefully this
> >>>>>>> explains what I had in mind: for deep hierarchies
> >>>>>>> this will save a bus scan so I think this makes sense
> >>>>>>> even in the absense of kvm irqchip.
> >>>>>>>
> >>>>>>> Warning: completely untested and known to be incomplete.
> >>>>>>> Please judge whether this is going in a direction that
> >>>>>>> is helpful for your efforts. If you respond in the positive,
> >>>>>>> I hope to be able to get back to this next week.
> >>>>>>
> >>>>>> We need to
> >>>>>>  - account for polarity changes along the route
> >>>>>>  - get rid of irq_count as it is no longer updated in the fast path,
> >>>>>>    thus useless on routing updates
> >>>>>
> >>>>> I'll need to consider this more to understand what you mean here.
> >>>>
> >>>> If, e.g., the host bridge is configured to flip the polarity of some
> >>>> interrupt on delivery, the fast path must be able to explore this in
> >>>> order to do the same.
> >>>
> >>> This can be solved incrementally I think - handle polarity
> >>> change like mapping change, no?
> >>
> >> Both belong together if we want to do it generically, IMHO.
> > 
> > So irq_polarity callback like we have for map_irq ATM?  But at least pci
> > bridges never do this flip.  Maybe this is the property of the interrupt
> > controller after all?
> 
> It is a property of some host bridges apparently (e.g. bonito).

So I'm not sure it's worth it to abstract that but I don't mind either.

> In
> theory, someone could also add a PCI-PCI bridge that does this (or does
> the spec disallow it?).

It seems to disallow it.

> > 
> >>>
> >>>> Then you may want to have a look at how irq_count is maintained and when
> >>>> it is used.
> >>>
> >>> In my patch it simply there to OR irq levels of all devices
> >>> connected to a specific pin.
> >>
> >> I think what we want is to avoid any walks and intermediate state
> >> updates for all IRQ deliveries.
> > 
> > Well the bus is not an intermediate state ATM as piix
> > only has a bit per IRQ so it can't OR devices together.
> > So you want to move the counter over to piix?
> 
> irq_count can't be moved logically as it is part of the vmstate. But it
> should only be generated for saving the state by polling all devices
> (and bridges).
> 
> For that we need is an optional callback for devices via which we can
> ask them to update PCIDevice::irq_state in case they don't trigger
> pci_set_irq regularly.

Let's worry about migration compatibility separately.

> And pci_set_irq could simply change the input
> line of the IRQ controller according to the cached route and polarity
> mapping.

But the line is shared between multiple devices.
You need to perform a logical OR function between them all,
this is what the counter does.

> That's basically what I have in mind for any bus. But we could first try
> it out on PCI, then later on generalize the design to make it useable
> for all IRQ routings. I'm afraid there is no simpler way to introduce
> direct IRQ delivery for PCI (unless ignoring corner cases like in qemu-kvm).
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Jan Kiszka - May 30, 2012, 5:29 p.m.
On 2012-05-30 19:20, Michael S. Tsirkin wrote:
> On Wed, May 30, 2012 at 06:46:03PM +0200, Jan Kiszka wrote:
>> On 2012-05-30 18:17, Michael S. Tsirkin wrote:
>>> On Wed, May 30, 2012 at 05:47:45PM +0200, Jan Kiszka wrote:
>>>> On 2012-05-30 16:59, Michael S. Tsirkin wrote:
>>>>> On Wed, May 30, 2012 at 04:46:14PM +0200, Jan Kiszka wrote:
>>>>>> On 2012-05-30 16:42, Michael S. Tsirkin wrote:
>>>>>>> On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
>>>>>>>> On 2012-05-30 15:34, Michael S. Tsirkin wrote:
>>>>>>>>> Below's as far as I got - hopefully this
>>>>>>>>> explains what I had in mind: for deep hierarchies
>>>>>>>>> this will save a bus scan so I think this makes sense
>>>>>>>>> even in the absense of kvm irqchip.
>>>>>>>>>
>>>>>>>>> Warning: completely untested and known to be incomplete.
>>>>>>>>> Please judge whether this is going in a direction that
>>>>>>>>> is helpful for your efforts. If you respond in the positive,
>>>>>>>>> I hope to be able to get back to this next week.
>>>>>>>>
>>>>>>>> We need to
>>>>>>>>  - account for polarity changes along the route
>>>>>>>>  - get rid of irq_count as it is no longer updated in the fast path,
>>>>>>>>    thus useless on routing updates
>>>>>>>
>>>>>>> I'll need to consider this more to understand what you mean here.
>>>>>>
>>>>>> If, e.g., the host bridge is configured to flip the polarity of some
>>>>>> interrupt on delivery, the fast path must be able to explore this in
>>>>>> order to do the same.
>>>>>
>>>>> This can be solved incrementally I think - handle polarity
>>>>> change like mapping change, no?
>>>>
>>>> Both belong together if we want to do it generically, IMHO.
>>>
>>> So irq_polarity callback like we have for map_irq ATM?  But at least pci
>>> bridges never do this flip.  Maybe this is the property of the interrupt
>>> controller after all?
>>
>> It is a property of some host bridges apparently (e.g. bonito).
> 
> So I'm not sure it's worth it to abstract that but I don't mind either.

It is must for a generic solution. We cannot modeling after a single
host bridge called PIIX3.

> 
>> In
>> theory, someone could also add a PCI-PCI bridge that does this (or does
>> the spec disallow it?).
> 
> It seems to disallow it.
> 
>>>
>>>>>
>>>>>> Then you may want to have a look at how irq_count is maintained and when
>>>>>> it is used.
>>>>>
>>>>> In my patch it simply there to OR irq levels of all devices
>>>>> connected to a specific pin.
>>>>
>>>> I think what we want is to avoid any walks and intermediate state
>>>> updates for all IRQ deliveries.
>>>
>>> Well the bus is not an intermediate state ATM as piix
>>> only has a bit per IRQ so it can't OR devices together.
>>> So you want to move the counter over to piix?
>>
>> irq_count can't be moved logically as it is part of the vmstate. But it
>> should only be generated for saving the state by polling all devices
>> (and bridges).
>>
>> For that we need is an optional callback for devices via which we can
>> ask them to update PCIDevice::irq_state in case they don't trigger
>> pci_set_irq regularly.
> 
> Let's worry about migration compatibility separately.
> 
>> And pci_set_irq could simply change the input
>> line of the IRQ controller according to the cached route and polarity
>> mapping.
> 
> But the line is shared between multiple devices.
> You need to perform a logical OR function between them all,
> this is what the counter does.

Needs to be solved at a different level (at the final output of the fast
path, surely not per PCI bus).

Jan
Michael S. Tsirkin - May 30, 2012, 5:41 p.m.
On Wed, May 30, 2012 at 07:29:52PM +0200, Jan Kiszka wrote:
> On 2012-05-30 19:20, Michael S. Tsirkin wrote:
> > On Wed, May 30, 2012 at 06:46:03PM +0200, Jan Kiszka wrote:
> >> On 2012-05-30 18:17, Michael S. Tsirkin wrote:
> >>> On Wed, May 30, 2012 at 05:47:45PM +0200, Jan Kiszka wrote:
> >>>> On 2012-05-30 16:59, Michael S. Tsirkin wrote:
> >>>>> On Wed, May 30, 2012 at 04:46:14PM +0200, Jan Kiszka wrote:
> >>>>>> On 2012-05-30 16:42, Michael S. Tsirkin wrote:
> >>>>>>> On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
> >>>>>>>> On 2012-05-30 15:34, Michael S. Tsirkin wrote:
> >>>>>>>>> Below's as far as I got - hopefully this
> >>>>>>>>> explains what I had in mind: for deep hierarchies
> >>>>>>>>> this will save a bus scan so I think this makes sense
> >>>>>>>>> even in the absense of kvm irqchip.
> >>>>>>>>>
> >>>>>>>>> Warning: completely untested and known to be incomplete.
> >>>>>>>>> Please judge whether this is going in a direction that
> >>>>>>>>> is helpful for your efforts. If you respond in the positive,
> >>>>>>>>> I hope to be able to get back to this next week.
> >>>>>>>>
> >>>>>>>> We need to
> >>>>>>>>  - account for polarity changes along the route
> >>>>>>>>  - get rid of irq_count as it is no longer updated in the fast path,
> >>>>>>>>    thus useless on routing updates
> >>>>>>>
> >>>>>>> I'll need to consider this more to understand what you mean here.
> >>>>>>
> >>>>>> If, e.g., the host bridge is configured to flip the polarity of some
> >>>>>> interrupt on delivery, the fast path must be able to explore this in
> >>>>>> order to do the same.
> >>>>>
> >>>>> This can be solved incrementally I think - handle polarity
> >>>>> change like mapping change, no?
> >>>>
> >>>> Both belong together if we want to do it generically, IMHO.
> >>>
> >>> So irq_polarity callback like we have for map_irq ATM?  But at least pci
> >>> bridges never do this flip.  Maybe this is the property of the interrupt
> >>> controller after all?
> >>
> >> It is a property of some host bridges apparently (e.g. bonito).
> > 
> > So I'm not sure it's worth it to abstract that but I don't mind either.
> 
> It is must for a generic solution. We cannot modeling after a single
> host bridge called PIIX3.
> 
> > 
> >> In
> >> theory, someone could also add a PCI-PCI bridge that does this (or does
> >> the spec disallow it?).
> > 
> > It seems to disallow it.
> > 
> >>>
> >>>>>
> >>>>>> Then you may want to have a look at how irq_count is maintained and when
> >>>>>> it is used.
> >>>>>
> >>>>> In my patch it simply there to OR irq levels of all devices
> >>>>> connected to a specific pin.
> >>>>
> >>>> I think what we want is to avoid any walks and intermediate state
> >>>> updates for all IRQ deliveries.
> >>>
> >>> Well the bus is not an intermediate state ATM as piix
> >>> only has a bit per IRQ so it can't OR devices together.
> >>> So you want to move the counter over to piix?
> >>
> >> irq_count can't be moved logically as it is part of the vmstate. But it
> >> should only be generated for saving the state by polling all devices
> >> (and bridges).
> >>
> >> For that we need is an optional callback for devices via which we can
> >> ask them to update PCIDevice::irq_state in case they don't trigger
> >> pci_set_irq regularly.
> > 
> > Let's worry about migration compatibility separately.
> > 
> >> And pci_set_irq could simply change the input
> >> line of the IRQ controller according to the cached route and polarity
> >> mapping.
> > 
> > But the line is shared between multiple devices.
> > You need to perform a logical OR function between them all,
> > this is what the counter does.
> 
> Needs to be solved at a different level (at the final output of the fast
> path, surely not per PCI bus).
> 
> Jan

Well the final output is in kvm :) What happens with assigned devices is
that kvm performs the logical OR using source id.  But the number of
available source IDs is limited (up to BIT_PER_LONG currently).

So it's not reasonable to allocate one per emulated device.

So assigned devices will have to be special-cased somehow.
Given that, is there really a point in moving these bits
around?

Well, I'll stop ranting here.  The patch that I sent is not intrusive
and simply gives you a simple way to implement pci_device_get_host_irq,
also optimizing emulated devices somewhat.  So if you think you need
pci_device_get_host_irq I think this is a reasonable way to support
that.  But if you changed your mind, I don't mind.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Jan Kiszka - May 30, 2012, 5:51 p.m.
On 2012-05-30 19:41, Michael S. Tsirkin wrote:
> On Wed, May 30, 2012 at 07:29:52PM +0200, Jan Kiszka wrote:
>> On 2012-05-30 19:20, Michael S. Tsirkin wrote:
>>> On Wed, May 30, 2012 at 06:46:03PM +0200, Jan Kiszka wrote:
>>>> On 2012-05-30 18:17, Michael S. Tsirkin wrote:
>>>>> On Wed, May 30, 2012 at 05:47:45PM +0200, Jan Kiszka wrote:
>>>>>> On 2012-05-30 16:59, Michael S. Tsirkin wrote:
>>>>>>> On Wed, May 30, 2012 at 04:46:14PM +0200, Jan Kiszka wrote:
>>>>>>>> On 2012-05-30 16:42, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
>>>>>>>>>> On 2012-05-30 15:34, Michael S. Tsirkin wrote:
>>>>>>>>>>> Below's as far as I got - hopefully this
>>>>>>>>>>> explains what I had in mind: for deep hierarchies
>>>>>>>>>>> this will save a bus scan so I think this makes sense
>>>>>>>>>>> even in the absense of kvm irqchip.
>>>>>>>>>>>
>>>>>>>>>>> Warning: completely untested and known to be incomplete.
>>>>>>>>>>> Please judge whether this is going in a direction that
>>>>>>>>>>> is helpful for your efforts. If you respond in the positive,
>>>>>>>>>>> I hope to be able to get back to this next week.
>>>>>>>>>>
>>>>>>>>>> We need to
>>>>>>>>>>  - account for polarity changes along the route
>>>>>>>>>>  - get rid of irq_count as it is no longer updated in the fast path,
>>>>>>>>>>    thus useless on routing updates
>>>>>>>>>
>>>>>>>>> I'll need to consider this more to understand what you mean here.
>>>>>>>>
>>>>>>>> If, e.g., the host bridge is configured to flip the polarity of some
>>>>>>>> interrupt on delivery, the fast path must be able to explore this in
>>>>>>>> order to do the same.
>>>>>>>
>>>>>>> This can be solved incrementally I think - handle polarity
>>>>>>> change like mapping change, no?
>>>>>>
>>>>>> Both belong together if we want to do it generically, IMHO.
>>>>>
>>>>> So irq_polarity callback like we have for map_irq ATM?  But at least pci
>>>>> bridges never do this flip.  Maybe this is the property of the interrupt
>>>>> controller after all?
>>>>
>>>> It is a property of some host bridges apparently (e.g. bonito).
>>>
>>> So I'm not sure it's worth it to abstract that but I don't mind either.
>>
>> It is must for a generic solution. We cannot modeling after a single
>> host bridge called PIIX3.
>>
>>>
>>>> In
>>>> theory, someone could also add a PCI-PCI bridge that does this (or does
>>>> the spec disallow it?).
>>>
>>> It seems to disallow it.
>>>
>>>>>
>>>>>>>
>>>>>>>> Then you may want to have a look at how irq_count is maintained and when
>>>>>>>> it is used.
>>>>>>>
>>>>>>> In my patch it simply there to OR irq levels of all devices
>>>>>>> connected to a specific pin.
>>>>>>
>>>>>> I think what we want is to avoid any walks and intermediate state
>>>>>> updates for all IRQ deliveries.
>>>>>
>>>>> Well the bus is not an intermediate state ATM as piix
>>>>> only has a bit per IRQ so it can't OR devices together.
>>>>> So you want to move the counter over to piix?
>>>>
>>>> irq_count can't be moved logically as it is part of the vmstate. But it
>>>> should only be generated for saving the state by polling all devices
>>>> (and bridges).
>>>>
>>>> For that we need is an optional callback for devices via which we can
>>>> ask them to update PCIDevice::irq_state in case they don't trigger
>>>> pci_set_irq regularly.
>>>
>>> Let's worry about migration compatibility separately.
>>>
>>>> And pci_set_irq could simply change the input
>>>> line of the IRQ controller according to the cached route and polarity
>>>> mapping.
>>>
>>> But the line is shared between multiple devices.
>>> You need to perform a logical OR function between them all,
>>> this is what the counter does.
>>
>> Needs to be solved at a different level (at the final output of the fast
>> path, surely not per PCI bus).
>>
>> Jan
> 
> Well the final output is in kvm :)

One final output. Userspace originated IRQs can be merged in userspace
before flipping KVM's [IOA]PIC input line.

> What happens with assigned devices is
> that kvm performs the logical OR using source id.  But the number of
> available source IDs is limited (up to BIT_PER_LONG currently).
> 
> So it's not reasonable to allocate one per emulated device.
> 
> So assigned devices will have to be special-cased somehow.
> Given that, is there really a point in moving these bits
> around?
> 
> Well, I'll stop ranting here.  The patch that I sent is not intrusive
> and simply gives you a simple way to implement pci_device_get_host_irq,
> also optimizing emulated devices somewhat.  So if you think you need
> pci_device_get_host_irq I think this is a reasonable way to support
> that.  But if you changed your mind, I don't mind.

Sorry, your patch doesn't help me in any way.

Jan
Jan Kiszka - May 30, 2012, 5:57 p.m.
On 2012-05-30 19:51, Jan Kiszka wrote:
>> Well, I'll stop ranting here.  The patch that I sent is not intrusive
>> and simply gives you a simple way to implement pci_device_get_host_irq,
>> also optimizing emulated devices somewhat.  So if you think you need
>> pci_device_get_host_irq I think this is a reasonable way to support
>> that.  But if you changed your mind, I don't mind.
> 
> Sorry, your patch doesn't help me in any way.

[to finish the sentence]

...as it doesn't handle the final routing step in the host bridge. I
still need to look this up and provide that via pci_device_get_host_irq.
For that I need the additional callback for host bridges. But I also
need to solve the other problems discussed in the past hours. I'm back
at the drawing board.

Jan
Michael S. Tsirkin - May 30, 2012, 6:23 p.m.
On Wed, May 30, 2012 at 07:57:04PM +0200, Jan Kiszka wrote:
> On 2012-05-30 19:51, Jan Kiszka wrote:
> >> Well, I'll stop ranting here.  The patch that I sent is not intrusive
> >> and simply gives you a simple way to implement pci_device_get_host_irq,
> >> also optimizing emulated devices somewhat.  So if you think you need
> >> pci_device_get_host_irq I think this is a reasonable way to support
> >> that.  But if you changed your mind, I don't mind.
> > 
> > Sorry, your patch doesn't help me in any way.
> 
> [to finish the sentence]
> 
> ...as it doesn't handle the final routing step in the host bridge.

I think you mean the logic in piix3_set_irq_level?

True. I suggest we make piix3_set_irq_level use the map_irq
infrastructure somehow: generalize it not to rely on device->bus
relationship.

Then something like my patch will solve the problem completely.


> I
> still need to look this up and provide that via pci_device_get_host_irq.
> For that I need the additional callback for host bridges. But I also
> need to solve the other problems discussed in the past hours. I'm back
> at the drawing board.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Michael S. Tsirkin - May 30, 2012, 6:29 p.m.
On Wed, May 30, 2012 at 09:23:56PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 30, 2012 at 07:57:04PM +0200, Jan Kiszka wrote:
> > On 2012-05-30 19:51, Jan Kiszka wrote:
> > >> Well, I'll stop ranting here.  The patch that I sent is not intrusive
> > >> and simply gives you a simple way to implement pci_device_get_host_irq,
> > >> also optimizing emulated devices somewhat.  So if you think you need
> > >> pci_device_get_host_irq I think this is a reasonable way to support
> > >> that.  But if you changed your mind, I don't mind.
> > > 
> > > Sorry, your patch doesn't help me in any way.
> > 
> > [to finish the sentence]
> > 
> > ...as it doesn't handle the final routing step in the host bridge.
> 
> I think you mean the logic in piix3_set_irq_level?
> 
> True. I suggest we make piix3_set_irq_level use the map_irq
> infrastructure somehow:

BTW, can't we simply override map_irq and make it read from
piix3->dev.config[PIIX_PIRQC + pirq]?

The root bus is part of root complex I don't see why
doesn't it output an IRQ# that is directly useful
to feed into the CPU.

> generalize it not to rely on device->bus
> relationship.
> 
> Then something like my patch will solve the problem completely.
> 
> 
> > I
> > still need to look this up and provide that via pci_device_get_host_irq.
> > For that I need the additional callback for host bridges. But I also
> > need to solve the other problems discussed in the past hours. I'm back
> > at the drawing board.
> > 
> > Jan
> > 
> > -- 
> > Siemens AG, Corporate Technology, CT T DE IT 1
> > Corporate Competence Center Embedded Linux
Michael S. Tsirkin - May 30, 2012, 6:51 p.m.
On Wed, May 30, 2012 at 09:29:13PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 30, 2012 at 09:23:56PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 30, 2012 at 07:57:04PM +0200, Jan Kiszka wrote:
> > > On 2012-05-30 19:51, Jan Kiszka wrote:
> > > >> Well, I'll stop ranting here.  The patch that I sent is not intrusive
> > > >> and simply gives you a simple way to implement pci_device_get_host_irq,
> > > >> also optimizing emulated devices somewhat.  So if you think you need
> > > >> pci_device_get_host_irq I think this is a reasonable way to support
> > > >> that.  But if you changed your mind, I don't mind.
> > > > 
> > > > Sorry, your patch doesn't help me in any way.
> > > 
> > > [to finish the sentence]
> > > 
> > > ...as it doesn't handle the final routing step in the host bridge.
> > 
> > I think you mean the logic in piix3_set_irq_level?
> > 
> > True. I suggest we make piix3_set_irq_level use the map_irq
> > infrastructure somehow:
> 
> BTW, can't we simply override map_irq and make it read from
> piix3->dev.config[PIIX_PIRQC + pirq]?

Basically move
    pic_irq = piix3->dev.config[PIIX_PIRQC + pirq];
to pci_slot_get_pirq.

> The root bus is part of root complex I don't see why
> doesn't it output an IRQ# that is directly useful
> to feed into the CPU.
> 
> > generalize it not to rely on device->bus
> > relationship.
> > 
> > Then something like my patch will solve the problem completely.
> > 
> > 
> > > I
> > > still need to look this up and provide that via pci_device_get_host_irq.
> > > For that I need the additional callback for host bridges. But I also
> > > need to solve the other problems discussed in the past hours. I'm back
> > > at the drawing board.
> > > 
> > > Jan
> > > 
> > > -- 
> > > Siemens AG, Corporate Technology, CT T DE IT 1
> > > Corporate Competence Center Embedded Linux
Jan Kiszka - May 30, 2012, 7:06 p.m.
On 2012-05-30 20:51, Michael S. Tsirkin wrote:
> On Wed, May 30, 2012 at 09:29:13PM +0300, Michael S. Tsirkin wrote:
>> On Wed, May 30, 2012 at 09:23:56PM +0300, Michael S. Tsirkin wrote:
>>> On Wed, May 30, 2012 at 07:57:04PM +0200, Jan Kiszka wrote:
>>>> On 2012-05-30 19:51, Jan Kiszka wrote:
>>>>>> Well, I'll stop ranting here.  The patch that I sent is not intrusive
>>>>>> and simply gives you a simple way to implement pci_device_get_host_irq,
>>>>>> also optimizing emulated devices somewhat.  So if you think you need
>>>>>> pci_device_get_host_irq I think this is a reasonable way to support
>>>>>> that.  But if you changed your mind, I don't mind.
>>>>>
>>>>> Sorry, your patch doesn't help me in any way.
>>>>
>>>> [to finish the sentence]
>>>>
>>>> ...as it doesn't handle the final routing step in the host bridge.
>>>
>>> I think you mean the logic in piix3_set_irq_level?
>>>
>>> True. I suggest we make piix3_set_irq_level use the map_irq
>>> infrastructure somehow:
>>
>> BTW, can't we simply override map_irq and make it read from
>> piix3->dev.config[PIIX_PIRQC + pirq]?
> 
> Basically move
>     pic_irq = piix3->dev.config[PIIX_PIRQC + pirq];
> to pci_slot_get_pirq.

map_irq might be reused, but not that easily as the return value is used
as irq_count index.

Jan
Michael S. Tsirkin - May 30, 2012, 7:30 p.m.
On Wed, May 30, 2012 at 09:06:25PM +0200, Jan Kiszka wrote:
> On 2012-05-30 20:51, Michael S. Tsirkin wrote:
> > On Wed, May 30, 2012 at 09:29:13PM +0300, Michael S. Tsirkin wrote:
> >> On Wed, May 30, 2012 at 09:23:56PM +0300, Michael S. Tsirkin wrote:
> >>> On Wed, May 30, 2012 at 07:57:04PM +0200, Jan Kiszka wrote:
> >>>> On 2012-05-30 19:51, Jan Kiszka wrote:
> >>>>>> Well, I'll stop ranting here.  The patch that I sent is not intrusive
> >>>>>> and simply gives you a simple way to implement pci_device_get_host_irq,
> >>>>>> also optimizing emulated devices somewhat.  So if you think you need
> >>>>>> pci_device_get_host_irq I think this is a reasonable way to support
> >>>>>> that.  But if you changed your mind, I don't mind.
> >>>>>
> >>>>> Sorry, your patch doesn't help me in any way.
> >>>>
> >>>> [to finish the sentence]
> >>>>
> >>>> ...as it doesn't handle the final routing step in the host bridge.
> >>>
> >>> I think you mean the logic in piix3_set_irq_level?
> >>>
> >>> True. I suggest we make piix3_set_irq_level use the map_irq
> >>> infrastructure somehow:
> >>
> >> BTW, can't we simply override map_irq and make it read from
> >> piix3->dev.config[PIIX_PIRQC + pirq]?
> > 
> > Basically move
> >     pic_irq = piix3->dev.config[PIIX_PIRQC + pirq];
> > to pci_slot_get_pirq.
> 
> map_irq might be reused, but not that easily as the return value is used
> as irq_count index.
> 
> Jan

So we'll just have PIIX_NUM_PIC_IRQS entries there and use
irq_count instead of the pic_levels bitmap.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Jan Kiszka - May 30, 2012, 8:23 p.m.
On 2012-05-30 21:30, Michael S. Tsirkin wrote:
> On Wed, May 30, 2012 at 09:06:25PM +0200, Jan Kiszka wrote:
>> On 2012-05-30 20:51, Michael S. Tsirkin wrote:
>>> On Wed, May 30, 2012 at 09:29:13PM +0300, Michael S. Tsirkin wrote:
>>>> On Wed, May 30, 2012 at 09:23:56PM +0300, Michael S. Tsirkin wrote:
>>>>> On Wed, May 30, 2012 at 07:57:04PM +0200, Jan Kiszka wrote:
>>>>>> On 2012-05-30 19:51, Jan Kiszka wrote:
>>>>>>>> Well, I'll stop ranting here.  The patch that I sent is not intrusive
>>>>>>>> and simply gives you a simple way to implement pci_device_get_host_irq,
>>>>>>>> also optimizing emulated devices somewhat.  So if you think you need
>>>>>>>> pci_device_get_host_irq I think this is a reasonable way to support
>>>>>>>> that.  But if you changed your mind, I don't mind.
>>>>>>>
>>>>>>> Sorry, your patch doesn't help me in any way.
>>>>>>
>>>>>> [to finish the sentence]
>>>>>>
>>>>>> ...as it doesn't handle the final routing step in the host bridge.
>>>>>
>>>>> I think you mean the logic in piix3_set_irq_level?
>>>>>
>>>>> True. I suggest we make piix3_set_irq_level use the map_irq
>>>>> infrastructure somehow:
>>>>
>>>> BTW, can't we simply override map_irq and make it read from
>>>> piix3->dev.config[PIIX_PIRQC + pirq]?
>>>
>>> Basically move
>>>     pic_irq = piix3->dev.config[PIIX_PIRQC + pirq];
>>> to pci_slot_get_pirq.
>>
>> map_irq might be reused, but not that easily as the return value is used
>> as irq_count index.
>>
>> Jan
> 
> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
> irq_count instead of the pic_levels bitmap.

Just that this affects generic PCI code, not only PIIX-specific things.
And that we need to save/restore some irq_count field according to the
old semantics. As I said: not that easy. :)

Jan
Michael S. Tsirkin - May 30, 2012, 8:31 p.m.
On Wed, May 30, 2012 at 10:23:16PM +0200, Jan Kiszka wrote:
> On 2012-05-30 21:30, Michael S. Tsirkin wrote:
> > On Wed, May 30, 2012 at 09:06:25PM +0200, Jan Kiszka wrote:
> >> On 2012-05-30 20:51, Michael S. Tsirkin wrote:
> >>> On Wed, May 30, 2012 at 09:29:13PM +0300, Michael S. Tsirkin wrote:
> >>>> On Wed, May 30, 2012 at 09:23:56PM +0300, Michael S. Tsirkin wrote:
> >>>>> On Wed, May 30, 2012 at 07:57:04PM +0200, Jan Kiszka wrote:
> >>>>>> On 2012-05-30 19:51, Jan Kiszka wrote:
> >>>>>>>> Well, I'll stop ranting here.  The patch that I sent is not intrusive
> >>>>>>>> and simply gives you a simple way to implement pci_device_get_host_irq,
> >>>>>>>> also optimizing emulated devices somewhat.  So if you think you need
> >>>>>>>> pci_device_get_host_irq I think this is a reasonable way to support
> >>>>>>>> that.  But if you changed your mind, I don't mind.
> >>>>>>>
> >>>>>>> Sorry, your patch doesn't help me in any way.
> >>>>>>
> >>>>>> [to finish the sentence]
> >>>>>>
> >>>>>> ...as it doesn't handle the final routing step in the host bridge.
> >>>>>
> >>>>> I think you mean the logic in piix3_set_irq_level?
> >>>>>
> >>>>> True. I suggest we make piix3_set_irq_level use the map_irq
> >>>>> infrastructure somehow:
> >>>>
> >>>> BTW, can't we simply override map_irq and make it read from
> >>>> piix3->dev.config[PIIX_PIRQC + pirq]?
> >>>
> >>> Basically move
> >>>     pic_irq = piix3->dev.config[PIIX_PIRQC + pirq];
> >>> to pci_slot_get_pirq.
> >>
> >> map_irq might be reused, but not that easily as the return value is used
> >> as irq_count index.
> >>
> >> Jan
> > 
> > So we'll just have PIIX_NUM_PIC_IRQS entries there and use
> > irq_count instead of the pic_levels bitmap.
> 
> Just that this affects generic PCI code, not only PIIX-specific things.

Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.

> And that we need to save/restore some irq_count field according to the
> old semantics.

Well, it's a bug: this is redundant info we should not have exposed it.

Anyway, let's make the rest work properly and cleanly first, add a FIXME
for now, then we'll find a hack making it work for migration.

> As I said: not that easy. :)
> 
> Jan
>
Jan Kiszka - June 1, 2012, 12:52 p.m.
On 2012-05-30 22:31, Michael S. Tsirkin wrote:
>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
>>> irq_count instead of the pic_levels bitmap.
>>
>> Just that this affects generic PCI code, not only PIIX-specific things.
> 
> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.
> 
>> And that we need to save/restore some irq_count field according to the
>> old semantics.
> 
> Well, it's a bug: this is redundant info we should not have exposed it.
> 
> Anyway, let's make the rest work properly and cleanly first, add a FIXME
> for now, then we'll find a hack making it work for migration.

It remains non-trivial: I got your patch working (a minor init issue),
but yet without changing the number of IRQs for PIIX3, so keeping the
irq_count semantics for this host bridge.

Now I'm facing three possibilities of how to proceed:

1. Give up on the (currently broken) feature to write a vmstate for
   older QEMU versions.

   This will allow to declare the irq_count field in vmstate_pcibus
   unused, and we would have to restore it on vmload step-wise via the
   PCI devices. It would also allow to change its semantics for PIIX3,
   mapping directly to PIC IRQs.

2. Keep writing a legacy irq_count field.

   This will require quite a few new APIs so that host bridges that
   want to change their nirq can still generate a compatible irq_count
   vmstate field. Namely:
    - A function to set up vmstate_irq_count and define a callback that
      the core will invoke to prepare the vmstate_irq_count before
      vmsave.
    - A function to obtain the IRQ mapping /without/ the final host
      bridge step. This is required so that the callback above can
      calculate the old state like in the PIIX3 case.

3. Keep irq_count and nirq as is, introduce additional map_host_irq.

   This is simpler than 2 and more compatible than 1. It would also
   allow to introduce the polarity and masking information more
   smoothly as we won't have to add it to existing map_irq callbacks
   then.

Any other suggestions?

Jan
Michael S. Tsirkin - June 1, 2012, 1:27 p.m.
On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
> On 2012-05-30 22:31, Michael S. Tsirkin wrote:
> >>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
> >>> irq_count instead of the pic_levels bitmap.
> >>
> >> Just that this affects generic PCI code, not only PIIX-specific things.
> > 
> > Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.
> > 
> >> And that we need to save/restore some irq_count field according to the
> >> old semantics.
> > 
> > Well, it's a bug: this is redundant info we should not have exposed it.
> > 
> > Anyway, let's make the rest work properly and cleanly first, add a FIXME
> > for now, then we'll find a hack making it work for migration.
> 
> It remains non-trivial: I got your patch working (a minor init issue),
> but yet without changing the number of IRQs for PIIX3, so keeping the
> irq_count semantics for this host bridge.
> 
> Now I'm facing three possibilities of how to proceed:

They all look OK I think :) Some comments below.

> 1. Give up on the (currently broken) feature to write a vmstate for
>    older QEMU versions.
> 
>    This will allow to declare the irq_count field in vmstate_pcibus
>    unused, and we would have to restore it on vmload step-wise via the
>    PCI devices. It would also allow to change its semantics for PIIX3,
>    mapping directly to PIC IRQs.

I think that's okay too simply because these things are usually
easy to fix after the fact when the rest of the issues are addressed.

> 2. Keep writing a legacy irq_count field.
> 
>    This will require quite a few new APIs so that host bridges that
>    want to change their nirq can still generate a compatible irq_count
>    vmstate field. Namely:
>     - A function to set up vmstate_irq_count and define a callback that
>       the core will invoke to prepare the vmstate_irq_count before
>       vmsave.
>     - A function to obtain the IRQ mapping /without/ the final host
>       bridge step. This is required so that the callback above can
>       calculate the old state like in the PIIX3 case.

Does this really need to be so complex? It seems that we just need
pci_get_irq_count(bus, irq) which can use the existing map_irq API, no?
Then invoke that before save.

> 3. Keep irq_count and nirq as is, introduce additional map_host_irq.
> 
>    This is simpler than 2 and more compatible than 1. It would also
>    allow to introduce the polarity and masking information more
>    smoothly as we won't have to add it to existing map_irq callbacks
>    then.

So what does it map, and to what?
Maybe we can make the name imply that somehow.

> Any other suggestions?
> 
> Jan
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Michael S. Tsirkin - June 1, 2012, 1:28 p.m.
On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
> On 2012-05-30 22:31, Michael S. Tsirkin wrote:
> >>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
> >>> irq_count instead of the pic_levels bitmap.
> >>
> >> Just that this affects generic PCI code, not only PIIX-specific things.
> > 
> > Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.
> > 
> >> And that we need to save/restore some irq_count field according to the
> >> old semantics.
> > 
> > Well, it's a bug: this is redundant info we should not have exposed it.
> > 
> > Anyway, let's make the rest work properly and cleanly first, add a FIXME
> > for now, then we'll find a hack making it work for migration.
> 
> It remains non-trivial: I got your patch working (a minor init issue),
> but yet without changing the number of IRQs for PIIX3, so keeping the
> irq_count semantics for this host bridge.

BTW can you post the fixed version please in case
others want to play with it?
Jan Kiszka - June 1, 2012, 1:43 p.m.
On 2012-06-01 15:28, Michael S. Tsirkin wrote:
> On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
>> On 2012-05-30 22:31, Michael S. Tsirkin wrote:
>>>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
>>>>> irq_count instead of the pic_levels bitmap.
>>>>
>>>> Just that this affects generic PCI code, not only PIIX-specific things.
>>>
>>> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.
>>>
>>>> And that we need to save/restore some irq_count field according to the
>>>> old semantics.
>>>
>>> Well, it's a bug: this is redundant info we should not have exposed it.
>>>
>>> Anyway, let's make the rest work properly and cleanly first, add a FIXME
>>> for now, then we'll find a hack making it work for migration.
>>
>> It remains non-trivial: I got your patch working (a minor init issue),
>> but yet without changing the number of IRQs for PIIX3, so keeping the
>> irq_count semantics for this host bridge.
> 
> BTW can you post the fixed version please in case
> others want to play with it?

Pushed a snapshot to git://git.kiszka.org/qemu.git queues/pci. That
version survived simple tests.

Jan
Jan Kiszka - June 1, 2012, 1:57 p.m.
On 2012-06-01 15:27, Michael S. Tsirkin wrote:
> On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
>> On 2012-05-30 22:31, Michael S. Tsirkin wrote:
>>>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
>>>>> irq_count instead of the pic_levels bitmap.
>>>>
>>>> Just that this affects generic PCI code, not only PIIX-specific things.
>>>
>>> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.
>>>
>>>> And that we need to save/restore some irq_count field according to the
>>>> old semantics.
>>>
>>> Well, it's a bug: this is redundant info we should not have exposed it.
>>>
>>> Anyway, let's make the rest work properly and cleanly first, add a FIXME
>>> for now, then we'll find a hack making it work for migration.
>>
>> It remains non-trivial: I got your patch working (a minor init issue),
>> but yet without changing the number of IRQs for PIIX3, so keeping the
>> irq_count semantics for this host bridge.
>>
>> Now I'm facing three possibilities of how to proceed:
> 
> They all look OK I think :) Some comments below.
> 
>> 1. Give up on the (currently broken) feature to write a vmstate for
>>    older QEMU versions.
>>
>>    This will allow to declare the irq_count field in vmstate_pcibus
>>    unused, and we would have to restore it on vmload step-wise via the
>>    PCI devices. It would also allow to change its semantics for PIIX3,
>>    mapping directly to PIC IRQs.
> 
> I think that's okay too simply because these things are usually
> easy to fix after the fact when the rest of the issues are addressed.

Don't get what you mean with "fixed". If we fix the vmstate generation
in making it backward-compatible again, we enter option 2. Option 1 is
explicitly about giving this up.

> 
>> 2. Keep writing a legacy irq_count field.
>>
>>    This will require quite a few new APIs so that host bridges that
>>    want to change their nirq can still generate a compatible irq_count
>>    vmstate field. Namely:
>>     - A function to set up vmstate_irq_count and define a callback that
>>       the core will invoke to prepare the vmstate_irq_count before
>>       vmsave.
>>     - A function to obtain the IRQ mapping /without/ the final host
>>       bridge step. This is required so that the callback above can
>>       calculate the old state like in the PIIX3 case.
> 
> Does this really need to be so complex? It seems that we just need
> pci_get_irq_count(bus, irq) which can use the existing map_irq API, no?
> Then invoke that before save.

No, because the new map_irq of the PIIX3 bridge will also include the
host bridge routing (or masking) according to the PIRQx routoing
registers of the PIIX3. Moreover, the fixup of the written legacy
irq_count state has to happen in the PCI layer, which therefore has to
query the host bridge for fixup information, not the other way around
(because the PCI bus vmstate is separate from the PIIX3 host bridge).

> 
>> 3. Keep irq_count and nirq as is, introduce additional map_host_irq.
>>
>>    This is simpler than 2 and more compatible than 1. It would also
>>    allow to introduce the polarity and masking information more
>>    smoothly as we won't have to add it to existing map_irq callbacks
>>    then.
> 
> So what does it map, and to what?

PCI bus IRQ to *host* IRQ. It is supposed to explore the routing of the
host bridge between the root bus and the host's interrupt controller
(i.e. the step that is currently missing the cached chain).

> Maybe we can make the name imply that somehow.

Better suggestions for this handler and maybe also the existing map_irq
are welcome to make the difference clearer.

Jan
Michael S. Tsirkin - June 1, 2012, 2:34 p.m.
On Fri, Jun 01, 2012 at 03:57:01PM +0200, Jan Kiszka wrote:
> On 2012-06-01 15:27, Michael S. Tsirkin wrote:
> > On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
> >> On 2012-05-30 22:31, Michael S. Tsirkin wrote:
> >>>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
> >>>>> irq_count instead of the pic_levels bitmap.
> >>>>
> >>>> Just that this affects generic PCI code, not only PIIX-specific things.
> >>>
> >>> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.
> >>>
> >>>> And that we need to save/restore some irq_count field according to the
> >>>> old semantics.
> >>>
> >>> Well, it's a bug: this is redundant info we should not have exposed it.
> >>>
> >>> Anyway, let's make the rest work properly and cleanly first, add a FIXME
> >>> for now, then we'll find a hack making it work for migration.
> >>
> >> It remains non-trivial: I got your patch working (a minor init issue),
> >> but yet without changing the number of IRQs for PIIX3, so keeping the
> >> irq_count semantics for this host bridge.
> >>
> >> Now I'm facing three possibilities of how to proceed:
> > 
> > They all look OK I think :) Some comments below.
> > 
> >> 1. Give up on the (currently broken) feature to write a vmstate for
> >>    older QEMU versions.
> >>
> >>    This will allow to declare the irq_count field in vmstate_pcibus
> >>    unused, and we would have to restore it on vmload step-wise via the
> >>    PCI devices. It would also allow to change its semantics for PIIX3,
> >>    mapping directly to PIC IRQs.
> > 
> > I think that's okay too simply because these things are usually
> > easy to fix after the fact when the rest of the issues are addressed.
> 
> Don't get what you mean with "fixed". If we fix the vmstate generation
> in making it backward-compatible again, we enter option 2. Option 1 is
> explicitly about giving this up.

What I really mean is I think I see how 2 can be added without much
pain. So let's focus on 1 for now and worst case we break migration.

> > 
> >> 2. Keep writing a legacy irq_count field.
> >>
> >>    This will require quite a few new APIs so that host bridges that
> >>    want to change their nirq can still generate a compatible irq_count
> >>    vmstate field. Namely:
> >>     - A function to set up vmstate_irq_count and define a callback that
> >>       the core will invoke to prepare the vmstate_irq_count before
> >>       vmsave.
> >>     - A function to obtain the IRQ mapping /without/ the final host
> >>       bridge step. This is required so that the callback above can
> >>       calculate the old state like in the PIIX3 case.
> > 
> > Does this really need to be so complex? It seems that we just need
> > pci_get_irq_count(bus, irq) which can use the existing map_irq API, no?
> > Then invoke that before save.
> 
> No, because the new map_irq of the PIIX3 bridge will also include the
> host bridge routing (or masking) according to the PIRQx routoing
> registers of the PIIX3. Moreover, the fixup of the written legacy
> irq_count state has to happen in the PCI layer, which therefore has to
> query the host bridge for fixup information, not the other way around
> (because the PCI bus vmstate is separate from the PIIX3 host bridge).
> 
> > 
> >> 3. Keep irq_count and nirq as is, introduce additional map_host_irq.
> >>
> >>    This is simpler than 2 and more compatible than 1. It would also
> >>    allow to introduce the polarity and masking information more
> >>    smoothly as we won't have to add it to existing map_irq callbacks
> >>    then.
> > 
> > So what does it map, and to what?
> 
> PCI bus IRQ to *host* IRQ. It is supposed to explore the routing of the
> host bridge between the root bus and the host's interrupt controller
> (i.e. the step that is currently missing the cached chain).
> 
> > Maybe we can make the name imply that somehow.
> 
> Better suggestions for this handler and maybe also the existing map_irq
> are welcome to make the difference clearer.
> 
> Jan

So I won't object to adding a new API but if we do
it properly this won't help compatibility :(

Let's formulate what these do exactly, this will
also help us come up with sensible names.

1. The difference is that pci bridges route interrupt pins. So it gets
interrupt pin on device and returns interrupt pin on connector. All
attributes are standard PCI.  We should remove all mentions of "irq"
really.


2. The pci root (yes it's a host bridge but let's
not use the term host if we can) routes
an interrupt pin on device to a host irq. It can also
do more things like invert polarity.

So yes we can add 2 to piix but we really should
remove 1 from it.

Wrt names - do you object to long names?
How about route_interrupt_pin for 1
and route_interrupt_pin_to_irq for 2?


> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Jan Kiszka - June 1, 2012, 3:15 p.m.
On 2012-06-01 16:34, Michael S. Tsirkin wrote:
> On Fri, Jun 01, 2012 at 03:57:01PM +0200, Jan Kiszka wrote:
>> On 2012-06-01 15:27, Michael S. Tsirkin wrote:
>>> On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
>>>> On 2012-05-30 22:31, Michael S. Tsirkin wrote:
>>>>>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
>>>>>>> irq_count instead of the pic_levels bitmap.
>>>>>>
>>>>>> Just that this affects generic PCI code, not only PIIX-specific things.
>>>>>
>>>>> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.
>>>>>
>>>>>> And that we need to save/restore some irq_count field according to the
>>>>>> old semantics.
>>>>>
>>>>> Well, it's a bug: this is redundant info we should not have exposed it.
>>>>>
>>>>> Anyway, let's make the rest work properly and cleanly first, add a FIXME
>>>>> for now, then we'll find a hack making it work for migration.
>>>>
>>>> It remains non-trivial: I got your patch working (a minor init issue),
>>>> but yet without changing the number of IRQs for PIIX3, so keeping the
>>>> irq_count semantics for this host bridge.
>>>>
>>>> Now I'm facing three possibilities of how to proceed:
>>>
>>> They all look OK I think :) Some comments below.
>>>
>>>> 1. Give up on the (currently broken) feature to write a vmstate for
>>>>    older QEMU versions.
>>>>
>>>>    This will allow to declare the irq_count field in vmstate_pcibus
>>>>    unused, and we would have to restore it on vmload step-wise via the
>>>>    PCI devices. It would also allow to change its semantics for PIIX3,
>>>>    mapping directly to PIC IRQs.
>>>
>>> I think that's okay too simply because these things are usually
>>> easy to fix after the fact when the rest of the issues are addressed.
>>
>> Don't get what you mean with "fixed". If we fix the vmstate generation
>> in making it backward-compatible again, we enter option 2. Option 1 is
>> explicitly about giving this up.
> 
> What I really mean is I think I see how 2 can be added without much
> pain. So let's focus on 1 for now and worst case we break migration.

I'd like to avoid planing for this worst case as long as there are also
statements [1] that this is not acceptable for QEMU in general. It
doesn't to create a beautiful architecture initially about which we
already know that it will become more complex than alternatives in the end.

> 
>>>
>>>> 2. Keep writing a legacy irq_count field.
>>>>
>>>>    This will require quite a few new APIs so that host bridges that
>>>>    want to change their nirq can still generate a compatible irq_count
>>>>    vmstate field. Namely:
>>>>     - A function to set up vmstate_irq_count and define a callback that
>>>>       the core will invoke to prepare the vmstate_irq_count before
>>>>       vmsave.
>>>>     - A function to obtain the IRQ mapping /without/ the final host
>>>>       bridge step. This is required so that the callback above can
>>>>       calculate the old state like in the PIIX3 case.
>>>
>>> Does this really need to be so complex? It seems that we just need
>>> pci_get_irq_count(bus, irq) which can use the existing map_irq API, no?
>>> Then invoke that before save.
>>
>> No, because the new map_irq of the PIIX3 bridge will also include the
>> host bridge routing (or masking) according to the PIRQx routoing
>> registers of the PIIX3. Moreover, the fixup of the written legacy
>> irq_count state has to happen in the PCI layer, which therefore has to
>> query the host bridge for fixup information, not the other way around
>> (because the PCI bus vmstate is separate from the PIIX3 host bridge).
>>
>>>
>>>> 3. Keep irq_count and nirq as is, introduce additional map_host_irq.
>>>>
>>>>    This is simpler than 2 and more compatible than 1. It would also
>>>>    allow to introduce the polarity and masking information more
>>>>    smoothly as we won't have to add it to existing map_irq callbacks
>>>>    then.
>>>
>>> So what does it map, and to what?
>>
>> PCI bus IRQ to *host* IRQ. It is supposed to explore the routing of the
>> host bridge between the root bus and the host's interrupt controller
>> (i.e. the step that is currently missing the cached chain).
>>
>>> Maybe we can make the name imply that somehow.
>>
>> Better suggestions for this handler and maybe also the existing map_irq
>> are welcome to make the difference clearer.
>>
>> Jan
> 
> So I won't object to adding a new API but if we do
> it properly this won't help compatibility :(

It will as this API does not touch the parts that affect the vmstate
(ie. semantics of irq_count won't change).

> 
> Let's formulate what these do exactly, this will
> also help us come up with sensible names.
> 
> 1. The difference is that pci bridges route interrupt pins. So it gets
> interrupt pin on device and returns interrupt pin on connector. All
> attributes are standard PCI.  We should remove all mentions of "irq"
> really.
> 
> 
> 2. The pci root (yes it's a host bridge but let's
> not use the term host if we can) routes
> an interrupt pin on device to a host irq. It can also
> do more things like invert polarity.
> 
> So yes we can add 2 to piix but we really should
> remove 1 from it.
> 
> Wrt names - do you object to long names?
> How about route_interrupt_pin for 1
> and route_interrupt_pin_to_irq for 2?

I'm fine with this.

Jan

[1] http://permalink.gmane.org/gmane.comp.emulators.qemu/153357
Michael S. Tsirkin - June 1, 2012, 3:26 p.m.
On Fri, Jun 01, 2012 at 05:34:14PM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 01, 2012 at 03:57:01PM +0200, Jan Kiszka wrote:
> > On 2012-06-01 15:27, Michael S. Tsirkin wrote:
> > > On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
> > >> On 2012-05-30 22:31, Michael S. Tsirkin wrote:
> > >>>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
> > >>>>> irq_count instead of the pic_levels bitmap.
> > >>>>
> > >>>> Just that this affects generic PCI code, not only PIIX-specific things.
> > >>>
> > >>> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.
> > >>>
> > >>>> And that we need to save/restore some irq_count field according to the
> > >>>> old semantics.
> > >>>
> > >>> Well, it's a bug: this is redundant info we should not have exposed it.
> > >>>
> > >>> Anyway, let's make the rest work properly and cleanly first, add a FIXME
> > >>> for now, then we'll find a hack making it work for migration.
> > >>
> > >> It remains non-trivial: I got your patch working (a minor init issue),
> > >> but yet without changing the number of IRQs for PIIX3, so keeping the
> > >> irq_count semantics for this host bridge.
> > >>
> > >> Now I'm facing three possibilities of how to proceed:
> > > 
> > > They all look OK I think :) Some comments below.
> > > 
> > >> 1. Give up on the (currently broken) feature to write a vmstate for
> > >>    older QEMU versions.
> > >>
> > >>    This will allow to declare the irq_count field in vmstate_pcibus
> > >>    unused, and we would have to restore it on vmload step-wise via the
> > >>    PCI devices. It would also allow to change its semantics for PIIX3,
> > >>    mapping directly to PIC IRQs.
> > > 
> > > I think that's okay too simply because these things are usually
> > > easy to fix after the fact when the rest of the issues are addressed.
> > 
> > Don't get what you mean with "fixed". If we fix the vmstate generation
> > in making it backward-compatible again, we enter option 2. Option 1 is
> > explicitly about giving this up.
> 
> What I really mean is I think I see how 2 can be added without much
> pain. So let's focus on 1 for now and worst case we break migration.
> 
> > > 
> > >> 2. Keep writing a legacy irq_count field.
> > >>
> > >>    This will require quite a few new APIs so that host bridges that
> > >>    want to change their nirq can still generate a compatible irq_count
> > >>    vmstate field. Namely:
> > >>     - A function to set up vmstate_irq_count and define a callback that
> > >>       the core will invoke to prepare the vmstate_irq_count before
> > >>       vmsave.
> > >>     - A function to obtain the IRQ mapping /without/ the final host
> > >>       bridge step. This is required so that the callback above can
> > >>       calculate the old state like in the PIIX3 case.
> > > 
> > > Does this really need to be so complex? It seems that we just need
> > > pci_get_irq_count(bus, irq) which can use the existing map_irq API, no?
> > > Then invoke that before save.
> > 
> > No, because the new map_irq of the PIIX3 bridge will also include the
> > host bridge routing (or masking) according to the PIRQx routoing
> > registers of the PIIX3. Moreover, the fixup of the written legacy
> > irq_count state has to happen in the PCI layer, which therefore has to
> > query the host bridge for fixup information, not the other way around
> > (because the PCI bus vmstate is separate from the PIIX3 host bridge).
> > 
> > > 
> > >> 3. Keep irq_count and nirq as is, introduce additional map_host_irq.
> > >>
> > >>    This is simpler than 2 and more compatible than 1. It would also
> > >>    allow to introduce the polarity and masking information more
> > >>    smoothly as we won't have to add it to existing map_irq callbacks
> > >>    then.
> > > 
> > > So what does it map, and to what?
> > 
> > PCI bus IRQ to *host* IRQ. It is supposed to explore the routing of the
> > host bridge between the root bus and the host's interrupt controller
> > (i.e. the step that is currently missing the cached chain).
> > 
> > > Maybe we can make the name imply that somehow.
> > 
> > Better suggestions for this handler and maybe also the existing map_irq
> > are welcome to make the difference clearer.
> > 
> > Jan
> 
> So I won't object to adding a new API but if we do
> it properly this won't help compatibility :(
> 
> Let's formulate what these do exactly, this will
> also help us come up with sensible names.
> 
> 1. The difference is that pci bridges route interrupt pins. So it gets
> interrupt pin on device and returns interrupt pin on connector. All
> attributes are standard PCI.  We should remove all mentions of "irq"
> really.
> 
> 
> 2. The pci root (yes it's a host bridge but let's
> not use the term host if we can) routes
> an interrupt pin on device to a host irq. It can also
> do more things like invert polarity.
> 
> So yes we can add 2 to piix but we really should
> remove 1 from it.
> 
> Wrt names - do you object to long names?
> How about route_interrupt_pin for 1
> and route_interrupt_pin_to_irq for 2?

A small clarification.
What I tried to say above is that pci bus does not
route irqs. pci bridge routes irqs: either host or
pci to pci bridge.

But as long as we thought the functionality is roughly
the same it made sense to say that bus has this
function simply because both bridge types have a bus.

But if not then the clean thing is a callback in p2p bridge
and another one in a host bridge. Having both these
and one in the bus looks weird.

> > -- 
> > Siemens AG, Corporate Technology, CT T DE IT 1
> > Corporate Competence Center Embedded Linux
Michael S. Tsirkin - June 1, 2012, 3:28 p.m.
On Fri, Jun 01, 2012 at 05:15:42PM +0200, Jan Kiszka wrote:
> On 2012-06-01 16:34, Michael S. Tsirkin wrote:
> > On Fri, Jun 01, 2012 at 03:57:01PM +0200, Jan Kiszka wrote:
> >> On 2012-06-01 15:27, Michael S. Tsirkin wrote:
> >>> On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
> >>>> On 2012-05-30 22:31, Michael S. Tsirkin wrote:
> >>>>>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
> >>>>>>> irq_count instead of the pic_levels bitmap.
> >>>>>>
> >>>>>> Just that this affects generic PCI code, not only PIIX-specific things.
> >>>>>
> >>>>> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.
> >>>>>
> >>>>>> And that we need to save/restore some irq_count field according to the
> >>>>>> old semantics.
> >>>>>
> >>>>> Well, it's a bug: this is redundant info we should not have exposed it.
> >>>>>
> >>>>> Anyway, let's make the rest work properly and cleanly first, add a FIXME
> >>>>> for now, then we'll find a hack making it work for migration.
> >>>>
> >>>> It remains non-trivial: I got your patch working (a minor init issue),
> >>>> but yet without changing the number of IRQs for PIIX3, so keeping the
> >>>> irq_count semantics for this host bridge.
> >>>>
> >>>> Now I'm facing three possibilities of how to proceed:
> >>>
> >>> They all look OK I think :) Some comments below.
> >>>
> >>>> 1. Give up on the (currently broken) feature to write a vmstate for
> >>>>    older QEMU versions.
> >>>>
> >>>>    This will allow to declare the irq_count field in vmstate_pcibus
> >>>>    unused, and we would have to restore it on vmload step-wise via the
> >>>>    PCI devices. It would also allow to change its semantics for PIIX3,
> >>>>    mapping directly to PIC IRQs.
> >>>
> >>> I think that's okay too simply because these things are usually
> >>> easy to fix after the fact when the rest of the issues are addressed.
> >>
> >> Don't get what you mean with "fixed". If we fix the vmstate generation
> >> in making it backward-compatible again, we enter option 2. Option 1 is
> >> explicitly about giving this up.
> > 
> > What I really mean is I think I see how 2 can be added without much
> > pain. So let's focus on 1 for now and worst case we break migration.
> 
> I'd like to avoid planing for this worst case as long as there are also
> statements [1] that this is not acceptable for QEMU in general. It
> doesn't to create a beautiful architecture initially about which we
> already know that it will become more complex than alternatives in the end.
> 
> > 
> >>>
> >>>> 2. Keep writing a legacy irq_count field.
> >>>>
> >>>>    This will require quite a few new APIs so that host bridges that
> >>>>    want to change their nirq can still generate a compatible irq_count
> >>>>    vmstate field. Namely:
> >>>>     - A function to set up vmstate_irq_count and define a callback that
> >>>>       the core will invoke to prepare the vmstate_irq_count before
> >>>>       vmsave.
> >>>>     - A function to obtain the IRQ mapping /without/ the final host
> >>>>       bridge step. This is required so that the callback above can
> >>>>       calculate the old state like in the PIIX3 case.
> >>>
> >>> Does this really need to be so complex? It seems that we just need
> >>> pci_get_irq_count(bus, irq) which can use the existing map_irq API, no?
> >>> Then invoke that before save.
> >>
> >> No, because the new map_irq of the PIIX3 bridge will also include the
> >> host bridge routing (or masking) according to the PIRQx routoing
> >> registers of the PIIX3. Moreover, the fixup of the written legacy
> >> irq_count state has to happen in the PCI layer, which therefore has to
> >> query the host bridge for fixup information, not the other way around
> >> (because the PCI bus vmstate is separate from the PIIX3 host bridge).
> >>
> >>>
> >>>> 3. Keep irq_count and nirq as is, introduce additional map_host_irq.
> >>>>
> >>>>    This is simpler than 2 and more compatible than 1. It would also
> >>>>    allow to introduce the polarity and masking information more
> >>>>    smoothly as we won't have to add it to existing map_irq callbacks
> >>>>    then.
> >>>
> >>> So what does it map, and to what?
> >>
> >> PCI bus IRQ to *host* IRQ. It is supposed to explore the routing of the
> >> host bridge between the root bus and the host's interrupt controller
> >> (i.e. the step that is currently missing the cached chain).
> >>
> >>> Maybe we can make the name imply that somehow.
> >>
> >> Better suggestions for this handler and maybe also the existing map_irq
> >> are welcome to make the difference clearer.
> >>
> >> Jan
> > 
> > So I won't object to adding a new API but if we do
> > it properly this won't help compatibility :(
> 
> It will as this API does not touch the parts that affect the vmstate
> (ie. semantics of irq_count won't change).

Yes but irq_count in vmstate is a bug. IMO even if we do
not change anything we should ignore irq_count on
load and recalculate it from what the devices supply.

> > 
> > Let's formulate what these do exactly, this will
> > also help us come up with sensible names.
> > 
> > 1. The difference is that pci bridges route interrupt pins. So it gets
> > interrupt pin on device and returns interrupt pin on connector. All
> > attributes are standard PCI.  We should remove all mentions of "irq"
> > really.
> > 
> > 
> > 2. The pci root (yes it's a host bridge but let's
> > not use the term host if we can) routes
> > an interrupt pin on device to a host irq. It can also
> > do more things like invert polarity.
> > 
> > So yes we can add 2 to piix but we really should
> > remove 1 from it.
> > 
> > Wrt names - do you object to long names?
> > How about route_interrupt_pin for 1
> > and route_interrupt_pin_to_irq for 2?
> 
> I'm fine with this.
> 
> Jan
> 
> [1] http://permalink.gmane.org/gmane.comp.emulators.qemu/153357
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Michael S. Tsirkin - June 1, 2012, 3:30 p.m.
On Fri, Jun 01, 2012 at 05:15:42PM +0200, Jan Kiszka wrote:
> On 2012-06-01 16:34, Michael S. Tsirkin wrote:
> > On Fri, Jun 01, 2012 at 03:57:01PM +0200, Jan Kiszka wrote:
> >> On 2012-06-01 15:27, Michael S. Tsirkin wrote:
> >>> On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
> >>>> On 2012-05-30 22:31, Michael S. Tsirkin wrote:
> >>>>>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
> >>>>>>> irq_count instead of the pic_levels bitmap.
> >>>>>>
> >>>>>> Just that this affects generic PCI code, not only PIIX-specific things.
> >>>>>
> >>>>> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.
> >>>>>
> >>>>>> And that we need to save/restore some irq_count field according to the
> >>>>>> old semantics.
> >>>>>
> >>>>> Well, it's a bug: this is redundant info we should not have exposed it.
> >>>>>
> >>>>> Anyway, let's make the rest work properly and cleanly first, add a FIXME
> >>>>> for now, then we'll find a hack making it work for migration.
> >>>>
> >>>> It remains non-trivial: I got your patch working (a minor init issue),
> >>>> but yet without changing the number of IRQs for PIIX3, so keeping the
> >>>> irq_count semantics for this host bridge.
> >>>>
> >>>> Now I'm facing three possibilities of how to proceed:
> >>>
> >>> They all look OK I think :) Some comments below.
> >>>
> >>>> 1. Give up on the (currently broken) feature to write a vmstate for
> >>>>    older QEMU versions.
> >>>>
> >>>>    This will allow to declare the irq_count field in vmstate_pcibus
> >>>>    unused, and we would have to restore it on vmload step-wise via the
> >>>>    PCI devices. It would also allow to change its semantics for PIIX3,
> >>>>    mapping directly to PIC IRQs.
> >>>
> >>> I think that's okay too simply because these things are usually
> >>> easy to fix after the fact when the rest of the issues are addressed.
> >>
> >> Don't get what you mean with "fixed". If we fix the vmstate generation
> >> in making it backward-compatible again, we enter option 2. Option 1 is
> >> explicitly about giving this up.
> > 
> > What I really mean is I think I see how 2 can be added without much
> > pain. So let's focus on 1 for now and worst case we break migration.
> 
> I'd like to avoid planing for this worst case as long as there are also
> statements [1] that this is not acceptable for QEMU in general. It
> doesn't to create a beautiful architecture initially about which we
> already know that it will become more complex than alternatives in the end.

1 and 2 are same really except 2 adds a hack for compatibility, no?
Jan Kiszka - June 1, 2012, 3:54 p.m.
On 2012-06-01 17:28, Michael S. Tsirkin wrote:
>>> So I won't object to adding a new API but if we do
>>> it properly this won't help compatibility :(
>>
>> It will as this API does not touch the parts that affect the vmstate
>> (ie. semantics of irq_count won't change).
> 
> Yes but irq_count in vmstate is a bug. IMO even if we do
> not change anything we should ignore irq_count on
> load and recalculate it from what the devices supply.

I don't disagree. But this will only allow keeping backward migration
support if we preserve the semantics of current map_irq somewhere - to
keep the chance of calculating the legacy values for vmsave as well.

Jan
Jan Kiszka - June 1, 2012, 3:59 p.m.
On 2012-06-01 17:30, Michael S. Tsirkin wrote:
> On Fri, Jun 01, 2012 at 05:15:42PM +0200, Jan Kiszka wrote:
>> On 2012-06-01 16:34, Michael S. Tsirkin wrote:
>>> On Fri, Jun 01, 2012 at 03:57:01PM +0200, Jan Kiszka wrote:
>>>> On 2012-06-01 15:27, Michael S. Tsirkin wrote:
>>>>> On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
>>>>>> On 2012-05-30 22:31, Michael S. Tsirkin wrote:
>>>>>>>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use
>>>>>>>>> irq_count instead of the pic_levels bitmap.
>>>>>>>>
>>>>>>>> Just that this affects generic PCI code, not only PIIX-specific things.
>>>>>>>
>>>>>>> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs.
>>>>>>>
>>>>>>>> And that we need to save/restore some irq_count field according to the
>>>>>>>> old semantics.
>>>>>>>
>>>>>>> Well, it's a bug: this is redundant info we should not have exposed it.
>>>>>>>
>>>>>>> Anyway, let's make the rest work properly and cleanly first, add a FIXME
>>>>>>> for now, then we'll find a hack making it work for migration.
>>>>>>
>>>>>> It remains non-trivial: I got your patch working (a minor init issue),
>>>>>> but yet without changing the number of IRQs for PIIX3, so keeping the
>>>>>> irq_count semantics for this host bridge.
>>>>>>
>>>>>> Now I'm facing three possibilities of how to proceed:
>>>>>
>>>>> They all look OK I think :) Some comments below.
>>>>>
>>>>>> 1. Give up on the (currently broken) feature to write a vmstate for
>>>>>>    older QEMU versions.
>>>>>>
>>>>>>    This will allow to declare the irq_count field in vmstate_pcibus
>>>>>>    unused, and we would have to restore it on vmload step-wise via the
>>>>>>    PCI devices. It would also allow to change its semantics for PIIX3,
>>>>>>    mapping directly to PIC IRQs.
>>>>>
>>>>> I think that's okay too simply because these things are usually
>>>>> easy to fix after the fact when the rest of the issues are addressed.
>>>>
>>>> Don't get what you mean with "fixed". If we fix the vmstate generation
>>>> in making it backward-compatible again, we enter option 2. Option 1 is
>>>> explicitly about giving this up.
>>>
>>> What I really mean is I think I see how 2 can be added without much
>>> pain. So let's focus on 1 for now and worst case we break migration.
>>
>> I'd like to avoid planing for this worst case as long as there are also
>> statements [1] that this is not acceptable for QEMU in general. It
>> doesn't to create a beautiful architecture initially about which we
>> already know that it will become more complex than alternatives in the end.
> 
> 1 and 2 are same really except 2 adds a hack for compatibility, no?

Yes, 2 would allow us to maintain irq_count in different ways as well,
specifically using it calculate shared output IRQ lines before reporting
them to the PIC generically. This approach has both a charming and an
ugly face.

Jan
Michael S. Tsirkin - June 1, 2012, 4:05 p.m.
On Fri, Jun 01, 2012 at 05:54:15PM +0200, Jan Kiszka wrote:
> On 2012-06-01 17:28, Michael S. Tsirkin wrote:
> >>> So I won't object to adding a new API but if we do
> >>> it properly this won't help compatibility :(
> >>
> >> It will as this API does not touch the parts that affect the vmstate
> >> (ie. semantics of irq_count won't change).
> > 
> > Yes but irq_count in vmstate is a bug. IMO even if we do
> > not change anything we should ignore irq_count on
> > load and recalculate it from what the devices supply.
> 
> I don't disagree. But this will only allow keeping backward migration
> support if we preserve the semantics of current map_irq somewhere - to
> keep the chance of calculating the legacy values for vmsave as well.
> 
> Jan

We don't need to preserve it, right? We can calculate it before
savevm.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Jan Kiszka - June 1, 2012, 4:17 p.m.
On 2012-06-01 18:05, Michael S. Tsirkin wrote:
> On Fri, Jun 01, 2012 at 05:54:15PM +0200, Jan Kiszka wrote:
>> On 2012-06-01 17:28, Michael S. Tsirkin wrote:
>>>>> So I won't object to adding a new API but if we do
>>>>> it properly this won't help compatibility :(
>>>>
>>>> It will as this API does not touch the parts that affect the vmstate
>>>> (ie. semantics of irq_count won't change).
>>>
>>> Yes but irq_count in vmstate is a bug. IMO even if we do
>>> not change anything we should ignore irq_count on
>>> load and recalculate it from what the devices supply.
>>
>> I don't disagree. But this will only allow keeping backward migration
>> support if we preserve the semantics of current map_irq somewhere - to
>> keep the chance of calculating the legacy values for vmsave as well.
>>
>> Jan
> 
> We don't need to preserve it, right? We can calculate it before
> savevm.

We can't calculate it without something like the additional
infrastructure I listed.

Jan

Patch

diff --git a/hw/pci.c b/hw/pci.c
index c1ebdde..313abe1 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -112,18 +112,33 @@  static inline void pci_set_irq_state(PCIDevice *d, int irq_num, int level)
 	d->irq_state |= level << irq_num;
 }
 
-static void pci_change_irq_level(PCIDevice *pci_dev, int irq_num, int change)
+static void pci_update_irq_maps_for_device(PCIBus *unused, PCIDevice *dev)
 {
     PCIBus *bus;
-    for (;;) {
-        bus = pci_dev->bus;
-        irq_num = bus->map_irq(pci_dev, irq_num);
-        if (bus->set_irq)
-            break;
-        pci_dev = bus->parent_dev;
+    PCIDevice *pci_dev;
+    int i, irq_num;
+    for (i = 0; i < PCI_NUM_PINS; ++i) {
+        pci_dev = dev;
+        irq_num = i;
+        for (;;) {
+            bus = pci_dev->bus;
+            irq_num = bus->map_irq(pci_dev, irq_num);
+            if (bus->set_irq)
+                break;
+            pci_dev = bus->parent_dev;
+        }
+        dev->irq_num[i] = irq_num;
+        bus->irq_count[i] += pci_irq_state(dev, i);
+        dev->root_bus = bus;
     }
-    bus->irq_count[irq_num] += change;
-    bus->set_irq(bus->irq_opaque, irq_num, bus->irq_count[irq_num] != 0);
+}
+
+static void pci_change_irq_level(PCIDevice *pci_dev, int irq_num, int change)
+{
+    PCIBus *bus = pci_dev->root_bus;
+    int i = pci_dev->irq_num[irq_num];
+    bus->irq_count[i] += change;
+    bus->set_irq(bus->irq_opaque, i, bus->irq_count[i] != 0);
 }
 
 int pci_bus_get_irq_level(PCIBus *bus, int irq_num)
@@ -805,6 +820,7 @@  static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
     bus->devices[devfn] = pci_dev;
     pci_dev->irq = qemu_allocate_irqs(pci_set_irq, pci_dev, PCI_NUM_PINS);
     pci_dev->version_id = 2; /* Current pci device vmstate version */
+    pci_update_irq_maps_for_device(NULL, pci_dev);
     return pci_dev;
 }
 
@@ -1140,6 +1156,18 @@  static void pci_for_each_device_under_bus(PCIBus *bus,
     }
 }
 
+/* Update per-device IRQ mappings after bus mappings changed. */
+void pci_update_irq_maps(PCIBus *bus)
+{
+    int i;
+    /* FIXME: this only scans one level.
+     * Need to scan child buses recursively. */
+    for (i = 0; i <= bus->nirq; ++i)
+        bus->irq_count[i] = 0;
+    pci_for_each_device_under_bus(bus, pci_update_irq_maps_for_device);
+}
+
+
 void pci_for_each_device(PCIBus *bus, int bus_num,
                          void (*fn)(PCIBus *b, PCIDevice *d))
 {
diff --git a/hw/pci.h b/hw/pci.h
index 8d0aa49..4102c44 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -207,6 +207,12 @@  struct PCIDevice {
     /* Current IRQ levels.  Used internally by the generic PCI code.  */
     uint8_t irq_state;
 
+    /* The root bus.  Used internally by the generic PCI code.  */
+    PCIBus *root_bus;
+
+    /* IRQ num at the root bus.  Used internally by the generic PCI code.  */
+    int irq_num[PCI_NUM_PINS];
+
     /* Capability bits */
     uint32_t cap_present;
 
@@ -305,6 +311,7 @@  PCIDevice *pci_nic_init_nofail(NICInfo *nd, const char *default_model,
                                const char *default_devaddr);
 int pci_bus_num(PCIBus *s);
 void pci_for_each_device(PCIBus *bus, int bus_num, void (*fn)(PCIBus *bus, PCIDevice *d));
+void pci_update_irq_maps(PCIBus *bus);
 PCIBus *pci_find_root_bus(int domain);
 int pci_find_domain(const PCIBus *bus);
 PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn);
diff --git a/hw/piix_pci.c b/hw/piix_pci.c
index 09e84f5..ac71294 100644
--- a/hw/piix_pci.c
+++ b/hw/piix_pci.c
@@ -391,6 +391,7 @@  static void piix3_update_irq_levels(PIIX3State *piix3)
 {
     int pirq;
 
+    pci_update_irq_maps(piix3->dev.bus);
     piix3->pic_levels = 0;
     for (pirq = 0; pirq < PIIX_NUM_PIRQS; pirq++) {
         piix3_set_irq_level(piix3, pirq,