Message ID | 20120530133415.GA27788@redhat.com |
---|---
State | New |
On 2012-05-30 15:34, Michael S. Tsirkin wrote:
> Below's as far as I got - hopefully this explains what I had in mind:
> for deep hierarchies this will save a bus scan, so I think this makes
> sense even in the absence of kvm irqchip.
>
> Warning: completely untested and known to be incomplete. Please judge
> whether this is going in a direction that is helpful for your efforts.
> If you respond in the positive, I hope to be able to get back to this
> next week.

We need to
- account for polarity changes along the route
- get rid of irq_count as it is no longer updated in the fast path,
  thus useless on routing updates

So it's a bit more complicated and requires some broader refactorings.
But there is likely some room for simplifications in the end, true.

Jan
On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
> We need to
> - account for polarity changes along the route
> - get rid of irq_count as it is no longer updated in the fast path,
>   thus useless on routing updates

I'll need to consider this more to understand what you mean here.

> So it's a bit more complicated and requires some broader refactorings.

Is this one good as a first step then? If not, I'll drop it for now.

> But there is likely some room for simplifications in the end, true.
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
On 2012-05-30 16:42, Michael S. Tsirkin wrote:
> On Wed, May 30, 2012 at 04:38:25PM +0200, Jan Kiszka wrote:
>> We need to
>> - account for polarity changes along the route
>> - get rid of irq_count as it is no longer updated in the fast path,
>>   thus useless on routing updates
>
> I'll need to consider this more to understand what you mean here.

If, e.g., the host bridge is configured to flip the polarity of some
interrupt on delivery, the fast path must be able to explore this in
order to do the same.

Then you may want to have a look at how irq_count is maintained and
when it is used.

> Is this one good as a first step then? If not, I'll drop it for now.

I think we can't reliably implement this fast-path delivery without
solving the above issues, so we can't do it stepwise, at least not with
this as the first one.

Jan
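[Editorial note] The route caching Michael proposes and the polarity tracking Jan asks for can be sketched together as a single walk up the bus hierarchy, performed once when the routing configuration changes rather than on every delivery. The following is a simplified standalone model, not the actual QEMU types: `ModelBus`, `IrqRoute`, `route_irq`, and the `invert_polarity` flag are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model only -- not the actual QEMU data structures.
 * Each bus level remaps a device's INTx pin via map_irq(); some host
 * bridges may additionally invert polarity on delivery. */
typedef struct ModelBus ModelBus;
struct ModelBus {
    ModelBus *parent;                    /* NULL at the host bridge */
    int (*map_irq)(int devfn, int pin);  /* pin remapping at this level */
    bool invert_polarity;                /* hypothetical polarity flip */
};

typedef struct {
    int host_pin;    /* pin as seen at the host bridge */
    bool inverted;   /* accumulated polarity change along the route */
} IrqRoute;

/* Walk the hierarchy once and cache the result; the fast path can then
 * deliver directly instead of rescanning every bus on each IRQ. */
static IrqRoute route_irq(ModelBus *bus, int devfn, int pin)
{
    IrqRoute r = { .host_pin = pin, .inverted = false };
    while (bus) {
        r.host_pin = bus->map_irq(devfn, r.host_pin);
        r.inverted ^= bus->invert_polarity;
        bus = bus->parent;
    }
    return r;
}

/* Classic PCI interrupt swizzle: the pin rotates with the slot number. */
static int swizzle(int devfn, int pin)
{
    return (pin + devfn) & 3;
}
```

For a device behind a bridge, the cached `IrqRoute` replaces two `map_irq` calls per delivery; only a routing-register or polarity change needs to invalidate it.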
On Wed, May 30, 2012 at 04:46:14PM +0200, Jan Kiszka wrote:
> If, e.g., the host bridge is configured to flip the polarity of some
> interrupt on delivery, the fast path must be able to explore this in
> order to do the same.

This can be solved incrementally I think - handle a polarity change
like a mapping change, no?

> Then you may want to have a look at how irq_count is maintained and
> when it is used.

In my patch it is simply there to OR the irq levels of all devices
connected to a specific pin.

> I think we can't reliably implement this fast-path delivery without
> solving the above issues, so we can't do it stepwise, at least not
> with this as the first one.
>
> Jan
On 2012-05-30 16:59, Michael S. Tsirkin wrote:
> This can be solved incrementally I think - handle a polarity change
> like a mapping change, no?

Both belong together if we want to do it generically, IMHO.

> In my patch it is simply there to OR the irq levels of all devices
> connected to a specific pin.

I think what we want is to avoid any walks and intermediate state
updates for all IRQ deliveries. That would be beneficial for any user,
not just device assignment. It would basically make the special case
the normal one, reducing code complexity. As there are still some
problems to solve for this, I'm just unsure whether this step goes in
the right direction and will be reusable.

Jan
On Wed, May 30, 2012 at 05:47:45PM +0200, Jan Kiszka wrote:
> On 2012-05-30 16:59, Michael S. Tsirkin wrote:
>> This can be solved incrementally I think - handle a polarity change
>> like a mapping change, no?
>
> Both belong together if we want to do it generically, IMHO.

So an irq_polarity callback like we have for map_irq ATM? But at least
PCI bridges never do this flip. Maybe this is a property of the
interrupt controller after all?

>> In my patch it is simply there to OR the irq levels of all devices
>> connected to a specific pin.
>
> I think what we want is to avoid any walks and intermediate state
> updates for all IRQ deliveries.

Well, the bus is not an intermediate state ATM, as piix only has a bit
per IRQ, so it can't OR devices together. So you want to move the
counter over to piix?

> That would be beneficial for any user, not just device assignment. It
> would basically make the special case the normal one, reducing code
> complexity. As there are still some problems to solve for this, I'm
> just unsure whether this step goes in the right direction and will be
> reusable.
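[Editorial note] The counter under discussion implements a wired-OR of all devices sharing a pin: each device contributes +1 when it raises its level and -1 when it lowers it, and the line to the interrupt controller is asserted while the count is non-zero. A minimal sketch of the mechanism (all names invented for illustration, not the actual QEMU code):

```c
#include <assert.h>

/* Toy model of the per-pin counter that ORs shared IRQ sources. */
#define MODEL_NUM_PINS 4

typedef struct {
    int irq_count[MODEL_NUM_PINS];   /* count of devices asserting each pin */
    int line_level[MODEL_NUM_PINS];  /* resulting level at the controller */
} SharedIrqModel;

/* 'change' is +1 when a device raises its level 0->1, -1 for 1->0. */
static void model_change_irq_level(SharedIrqModel *s, int pin, int change)
{
    s->irq_count[pin] += change;
    /* The line is the logical OR of all sources: high while any device
     * still drives it. */
    s->line_level[pin] = s->irq_count[pin] > 0;
}
```

The failure mode Jan points at follows directly: if a fast path flips the controller input without going through this function, `irq_count` goes stale and can no longer be trusted on routing updates.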
On 2012-05-30 18:17, Michael S. Tsirkin wrote:
> So an irq_polarity callback like we have for map_irq ATM? But at
> least PCI bridges never do this flip. Maybe this is a property of the
> interrupt controller after all?

It is a property of some host bridges apparently (e.g. bonito). In
theory, someone could also add a PCI-PCI bridge that does this (or does
the spec disallow it?).

> Well, the bus is not an intermediate state ATM, as piix only has a
> bit per IRQ, so it can't OR devices together. So you want to move the
> counter over to piix?

irq_count can't be moved logically, as it is part of the vmstate. But
it should only be generated for saving the state, by polling all
devices (and bridges).

What we need for that is an optional callback for devices via which we
can ask them to update PCIDevice::irq_state in case they don't trigger
pci_set_irq regularly. And pci_set_irq could simply change the input
line of the IRQ controller according to the cached route and polarity
mapping.

That's basically what I have in mind for any bus. But we could first
try it out on PCI, then later generalize the design to make it usable
for all IRQ routings. I'm afraid there is no simpler way to introduce
direct IRQ delivery for PCI (unless we ignore corner cases like
qemu-kvm does).

Jan
On Wed, May 30, 2012 at 06:46:03PM +0200, Jan Kiszka wrote:
> It is a property of some host bridges apparently (e.g. bonito).

So I'm not sure it's worth it to abstract that, but I don't mind
either.

> In theory, someone could also add a PCI-PCI bridge that does this (or
> does the spec disallow it?).

It seems to disallow it.

> irq_count can't be moved logically, as it is part of the vmstate. But
> it should only be generated for saving the state, by polling all
> devices (and bridges).
>
> What we need for that is an optional callback for devices via which
> we can ask them to update PCIDevice::irq_state in case they don't
> trigger pci_set_irq regularly.

Let's worry about migration compatibility separately.

> And pci_set_irq could simply change the input line of the IRQ
> controller according to the cached route and polarity mapping.

But the line is shared between multiple devices. You need to perform a
logical OR between them all; this is what the counter does.
On 2012-05-30 19:20, Michael S. Tsirkin wrote:
>> It is a property of some host bridges apparently (e.g. bonito).
>
> So I'm not sure it's worth it to abstract that, but I don't mind
> either.

It is a must for a generic solution. We cannot model everything after
a single host bridge called PIIX3.

> But the line is shared between multiple devices. You need to perform
> a logical OR between them all; this is what the counter does.

Needs to be solved at a different level (at the final output of the
fast path, surely not per PCI bus).

Jan
On Wed, May 30, 2012 at 07:29:52PM +0200, Jan Kiszka wrote:
> It is a must for a generic solution. We cannot model everything after
> a single host bridge called PIIX3.
>
>> But the line is shared between multiple devices. You need to perform
>> a logical OR between them all; this is what the counter does.
>
> Needs to be solved at a different level (at the final output of the
> fast path, surely not per PCI bus).

Well, the final output is in kvm :)

What happens with assigned devices is that kvm performs the logical OR
using source IDs. But the number of available source IDs is limited (up
to BITS_PER_LONG currently), so it's not reasonable to allocate one per
emulated device. Assigned devices will therefore have to be
special-cased somehow. Given that, is there really a point in moving
these bits around?

Well, I'll stop ranting here. The patch that I sent is not intrusive
and simply gives you a simple way to implement pci_device_get_host_irq,
also optimizing emulated devices somewhat. So if you think you need
pci_device_get_host_irq, I think this is a reasonable way to support
that. But if you changed your mind, I don't mind.
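[Editorial note] The source-ID limitation Michael mentions can be illustrated with a toy model: the kernel keeps, per interrupt line, one bit per source ID in a fixed-size word, and the line stays high while any bit is set. The number of distinguishable sources is therefore bounded by the word width. This is a sketch with invented names, not the actual KVM code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: per-GSI word with one bit per source id; the GSI is high
 * while any source bit is set, so at most 64 sources (one per bit of
 * the word) can be told apart per line. */
#define MODEL_NUM_GSI 24

typedef struct {
    uint64_t sources[MODEL_NUM_GSI]; /* bit i set: source i drives the GSI */
} ModelIrqState;

/* Returns the resulting level of the line: the OR of all sources. */
static int model_set_irq(ModelIrqState *s, int gsi, int source_id, int level)
{
    if (level) {
        s->sources[gsi] |= UINT64_C(1) << source_id;
    } else {
        s->sources[gsi] &= ~(UINT64_C(1) << source_id);
    }
    return s->sources[gsi] != 0;
}
```

This makes Michael's point concrete: with only a word's worth of IDs, giving every emulated device its own source ID does not scale, so emulated devices must share (and be ORed in userspace) while assigned devices get dedicated IDs.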
On 2012-05-30 19:41, Michael S. Tsirkin wrote:
>> Needs to be solved at a different level (at the final output of the
>> fast path, surely not per PCI bus).
>
> Well, the final output is in kvm :)

One final output. Userspace-originated IRQs can be merged in userspace
before flipping KVM's [IOA]PIC input line.

> What happens with assigned devices is that kvm performs the logical
> OR using source IDs. But the number of available source IDs is
> limited (up to BITS_PER_LONG currently), so it's not reasonable to
> allocate one per emulated device. Assigned devices will therefore
> have to be special-cased somehow. Given that, is there really a point
> in moving these bits around?
>
> Well, I'll stop ranting here. The patch that I sent is not intrusive
> and simply gives you a simple way to implement
> pci_device_get_host_irq, also optimizing emulated devices somewhat.
> So if you think you need pci_device_get_host_irq, I think this is a
> reasonable way to support that. But if you changed your mind, I don't
> mind.

Sorry, your patch doesn't help me in any way.

Jan
On 2012-05-30 19:51, Jan Kiszka wrote:
>> Well, I'll stop ranting here. The patch that I sent is not intrusive
>> and simply gives you a simple way to implement
>> pci_device_get_host_irq, also optimizing emulated devices somewhat.
>> So if you think you need pci_device_get_host_irq, I think this is a
>> reasonable way to support that. But if you changed your mind, I
>> don't mind.
>
> Sorry, your patch doesn't help me in any way.

[to finish the sentence]

...as it doesn't handle the final routing step in the host bridge. I
still need to look this up and provide it via pci_device_get_host_irq.
For that I need the additional callback for host bridges. But I also
need to solve the other problems discussed in the past hours. I'm back
at the drawing board.

Jan
On Wed, May 30, 2012 at 07:57:04PM +0200, Jan Kiszka wrote:
>> Sorry, your patch doesn't help me in any way.
>
> [to finish the sentence]
>
> ...as it doesn't handle the final routing step in the host bridge.

I think you mean the logic in piix3_set_irq_level?

True. I suggest we make piix3_set_irq_level use the map_irq
infrastructure somehow: generalize it not to rely on the device->bus
relationship. Then something like my patch will solve the problem
completely.
On Wed, May 30, 2012 at 09:23:56PM +0300, Michael S. Tsirkin wrote:
> I think you mean the logic in piix3_set_irq_level?
>
> True. I suggest we make piix3_set_irq_level use the map_irq
> infrastructure somehow:

BTW, can't we simply override map_irq and make it read from
piix3->dev.config[PIIX_PIRQC + pirq]? The root bus is part of the root
complex; I don't see why it doesn't output an IRQ# that is directly
useful to feed into the CPU.
On Wed, May 30, 2012 at 09:29:13PM +0300, Michael S. Tsirkin wrote:
> BTW, can't we simply override map_irq and make it read from
> piix3->dev.config[PIIX_PIRQC + pirq]?

Basically move

    pic_irq = piix3->dev.config[PIIX_PIRQC + pirq];

to pci_slot_get_pirq.
On 2012-05-30 20:51, Michael S. Tsirkin wrote:
>> BTW, can't we simply override map_irq and make it read from
>> piix3->dev.config[PIIX_PIRQC + pirq]?
>
> Basically move
>     pic_irq = piix3->dev.config[PIIX_PIRQC + pirq];
> to pci_slot_get_pirq.

map_irq might be reused, but not that easily, as the return value is
used as an irq_count index.

Jan
On Wed, May 30, 2012 at 09:06:25PM +0200, Jan Kiszka wrote:
> map_irq might be reused, but not that easily, as the return value is
> used as an irq_count index.
>
> Jan

So we'll just have PIIX_NUM_PIC_IRQS entries there and use irq_count
instead of the pic_levels bitmap.
On 2012-05-30 21:30, Michael S. Tsirkin wrote:
> So we'll just have PIIX_NUM_PIC_IRQS entries there and use irq_count
> instead of the pic_levels bitmap.

Just that this affects generic PCI code, not only PIIX-specific things.
And we need to save/restore some irq_count field according to the old
semantics. As I said: not that easy. :)

Jan
On Wed, May 30, 2012 at 10:23:16PM +0200, Jan Kiszka wrote:
> Just that this affects generic PCI code, not only PIIX-specific things.

Yes, but that's not a problem - pci_bus_irqs sets the map function and
nirqs.

> And we need to save/restore some irq_count field according to the old
> semantics.

Well, that's a bug: this is redundant info; we should not have exposed
it.

Anyway, let's make the rest work properly and cleanly first, add a FIXME
for now, and then we'll find a hack making it work for migration.

> As I said: not that easy. :)
>
> Jan
On 2012-05-30 22:31, Michael S. Tsirkin wrote:
> Well, that's a bug: this is redundant info; we should not have exposed
> it.
>
> Anyway, let's make the rest work properly and cleanly first, add a
> FIXME for now, and then we'll find a hack making it work for migration.

It remains non-trivial: I got your patch working (a minor init issue),
but so far without changing the number of IRQs for PIIX3, i.e. keeping
the irq_count semantics for this host bridge.

Now I'm facing three possibilities of how to proceed:

1. Give up on the (currently broken) feature to write a vmstate for
   older QEMU versions.

   This would allow us to declare the irq_count field in vmstate_pcibus
   unused, and we would have to restore it on vmload step-wise via the
   PCI devices. It would also allow changing its semantics for PIIX3,
   mapping directly to PIC IRQs.

2. Keep writing a legacy irq_count field.

   This will require quite a few new APIs so that host bridges that
   want to change their nirq can still generate a compatible irq_count
   vmstate field. Namely:
   - A function to set up vmstate_irq_count and define a callback that
     the core will invoke to prepare vmstate_irq_count before vmsave.
   - A function to obtain the IRQ mapping /without/ the final host
     bridge step. This is required so that the callback above can
     calculate the old state, as in the PIIX3 case.

3. Keep irq_count and nirq as is, introduce an additional map_host_irq.

   This is simpler than 2 and more compatible than 1. It would also
   allow introducing the polarity and masking information more
   smoothly, as we won't have to add it to existing map_irq callbacks.

Any other suggestions?

Jan
On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
> It remains non-trivial: I got your patch working (a minor init issue),
> but so far without changing the number of IRQs for PIIX3, i.e. keeping
> the irq_count semantics for this host bridge.
>
> Now I'm facing three possibilities of how to proceed:

They all look OK, I think. :) Some comments below.

> 1. Give up on the (currently broken) feature to write a vmstate for
>    older QEMU versions.

I think that's okay too, simply because these things are usually easy to
fix after the fact, once the rest of the issues are addressed.

> 2. Keep writing a legacy irq_count field.
> [...]
>    - A function to obtain the IRQ mapping /without/ the final host
>      bridge step. This is required so that the callback above can
>      calculate the old state, as in the PIIX3 case.

Does this really need to be so complex? It seems that we just need
pci_get_irq_count(bus, irq), which can use the existing map_irq API, no?
Then invoke that before save.

> 3. Keep irq_count and nirq as is, introduce an additional map_host_irq.

So what does it map, and to what? Maybe we can make the name imply that
somehow.

> Any other suggestions?
On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote:
> It remains non-trivial: I got your patch working (a minor init issue),
> but so far without changing the number of IRQs for PIIX3, i.e. keeping
> the irq_count semantics for this host bridge.

BTW, can you post the fixed version please, in case others want to play
with it?
On 2012-06-01 15:28, Michael S. Tsirkin wrote:
> BTW, can you post the fixed version please, in case others want to play
> with it?

Pushed a snapshot to git://git.kiszka.org/qemu.git queues/pci. That
version survived simple tests.

Jan
On 2012-06-01 15:27, Michael S. Tsirkin wrote:
>> 1. Give up on the (currently broken) feature to write a vmstate for
>>    older QEMU versions.
>
> I think that's okay too, simply because these things are usually easy
> to fix after the fact, once the rest of the issues are addressed.

Don't get what you mean with "fixed". If we fix the vmstate generation
by making it backward-compatible again, we enter option 2. Option 1 is
explicitly about giving this up.

>> 2. Keep writing a legacy irq_count field.
> [...]
> Does this really need to be so complex? It seems that we just need
> pci_get_irq_count(bus, irq), which can use the existing map_irq API,
> no? Then invoke that before save.

No, because the new map_irq of the PIIX3 bridge will also include the
host bridge routing (or masking) according to the PIRQx routing
registers of the PIIX3. Moreover, the fixup of the written legacy
irq_count state has to happen in the PCI layer, which therefore has to
query the host bridge for fixup information, not the other way around
(because the PCI bus vmstate is separate from the PIIX3 host bridge).

>> 3. Keep irq_count and nirq as is, introduce an additional
>>    map_host_irq.
>
> So what does it map, and to what?

PCI bus IRQ to *host* IRQ. It is supposed to explore the routing of the
host bridge between the root bus and the host's interrupt controller
(i.e. the step that is currently missing in the cached chain).

> Maybe we can make the name imply that somehow.

Better suggestions for this handler, and maybe also for the existing
map_irq, are welcome to make the difference clearer.

Jan
On Fri, Jun 01, 2012 at 03:57:01PM +0200, Jan Kiszka wrote:
> Don't get what you mean with "fixed". If we fix the vmstate generation
> by making it backward-compatible again, we enter option 2. Option 1 is
> explicitly about giving this up.

What I really mean is that I think I see how 2 can be added without much
pain. So let's focus on 1 for now; worst case, we break migration.

> PCI bus IRQ to *host* IRQ. It is supposed to explore the routing of the
> host bridge between the root bus and the host's interrupt controller
> (i.e. the step that is currently missing in the cached chain).
>
> Better suggestions for this handler, and maybe also for the existing
> map_irq, are welcome to make the difference clearer.

So I won't object to adding a new API, but if we do it properly, this
won't help compatibility :(

Let's formulate what these do exactly; this will also help us come up
with sensible names.

1. The difference is that PCI bridges route interrupt pins. So it gets
   an interrupt pin on the device and returns an interrupt pin on the
   connector. All attributes are standard PCI. We should remove all
   mentions of "irq", really.

2. The PCI root (yes, it's a host bridge, but let's not use the term
   "host" if we can) routes an interrupt pin on a device to a host IRQ.
   It can also do more things, like invert polarity.

So yes, we can add 2 to the PIIX, but we really should remove 1 from it.

Wrt names - do you object to long names? How about route_interrupt_pin
for 1 and route_interrupt_pin_to_irq for 2?
On 2012-06-01 16:34, Michael S. Tsirkin wrote:
> What I really mean is that I think I see how 2 can be added without
> much pain. So let's focus on 1 for now; worst case, we break migration.

I'd like to avoid planning for this worst case as long as there are also
statements [1] that this is not acceptable for QEMU in general. It
doesn't help to create a beautiful architecture initially about which we
already know that it will become more complex than the alternatives in
the end.

> So I won't object to adding a new API, but if we do it properly, this
> won't help compatibility :(

It will, as this API does not touch the parts that affect the vmstate
(i.e. the semantics of irq_count won't change).

> Let's formulate what these do exactly; this will also help us come up
> with sensible names.
>
> 1. The difference is that PCI bridges route interrupt pins. So it gets
>    an interrupt pin on the device and returns an interrupt pin on the
>    connector. All attributes are standard PCI. We should remove all
>    mentions of "irq", really.
>
> 2. The PCI root (yes, it's a host bridge, but let's not use the term
>    "host" if we can) routes an interrupt pin on a device to a host IRQ.
>    It can also do more things, like invert polarity.
>
> So yes, we can add 2 to the PIIX, but we really should remove 1 from
> it.
>
> Wrt names - do you object to long names? How about route_interrupt_pin
> for 1 and route_interrupt_pin_to_irq for 2?

I'm fine with this.

Jan

[1] http://permalink.gmane.org/gmane.comp.emulators.qemu/153357
On Fri, Jun 01, 2012 at 05:34:14PM +0300, Michael S. Tsirkin wrote:
> Let's formulate what these do exactly; this will also help us come up
> with sensible names.
>
> 1. The difference is that PCI bridges route interrupt pins. [...]
>
> 2. The PCI root (yes, it's a host bridge, but let's not use the term
>    "host" if we can) routes an interrupt pin on a device to a host
>    IRQ. [...]
>
> Wrt names - do you object to long names? How about route_interrupt_pin
> for 1 and route_interrupt_pin_to_irq for 2?

A small clarification. What I tried to say above is that a PCI bus does
not route IRQs; a PCI bridge routes IRQs, either a host bridge or a
PCI-to-PCI bridge. As long as we thought the functionality was roughly
the same, it made sense to say that the bus has this function, simply
because both bridge types have a bus. But if not, then the clean thing
is a callback in the p2p bridge and another one in the host bridge.
Having both of these plus one in the bus looks weird.
On Fri, Jun 01, 2012 at 05:15:42PM +0200, Jan Kiszka wrote: > On 2012-06-01 16:34, Michael S. Tsirkin wrote: > > On Fri, Jun 01, 2012 at 03:57:01PM +0200, Jan Kiszka wrote: > >> On 2012-06-01 15:27, Michael S. Tsirkin wrote: > >>> On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote: > >>>> On 2012-05-30 22:31, Michael S. Tsirkin wrote: > >>>>>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use > >>>>>>> irq_count instead of the pic_levels bitmap. > >>>>>> > >>>>>> Just that this affects generic PCI code, not only PIIX-specific things. > >>>>> > >>>>> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs. > >>>>> > >>>>>> And that we need to save/restore some irq_count field according to the > >>>>>> old semantics. > >>>>> > >>>>> Well, it's a bug: this is redundant info we should not have exposed it. > >>>>> > >>>>> Anyway, let's make the rest work properly and cleanly first, add a FIXME > >>>>> for now, then we'll find a hack making it work for migration. > >>>> > >>>> It remains non-trivial: I got your patch working (a minor init issue), > >>>> but yet without changing the number of IRQs for PIIX3, so keeping the > >>>> irq_count semantics for this host bridge. > >>>> > >>>> Now I'm facing three possibilities of how to proceed: > >>> > >>> They all look OK I think :) Some comments below. > >>> > >>>> 1. Give up on the (currently broken) feature to write a vmstate for > >>>> older QEMU versions. > >>>> > >>>> This will allow to declare the irq_count field in vmstate_pcibus > >>>> unused, and we would have to restore it on vmload step-wise via the > >>>> PCI devices. It would also allow to change its semantics for PIIX3, > >>>> mapping directly to PIC IRQs. > >>> > >>> I think that's okay too simply because these things are usually > >>> easy to fix after the fact when the rest of the issues are addressed. > >> > >> Don't get what you mean with "fixed". 
If we fix the vmstate generation > >> in making it backward-compatible again, we enter option 2. Option 1 is > >> explicitly about giving this up. > > > > What I really mean is I think I see how 2 can be added without much > > pain. So let's focus on 1 for now and worst case we break migration. > > I'd like to avoid planning for this worst case as long as there are also > statements [1] that this is not acceptable for QEMU in general. It > doesn't help to create a beautiful architecture initially about which we > already know that it will become more complex than alternatives in the end. > > > > >>> > >>>> 2. Keep writing a legacy irq_count field. > >>>> > >>>> This will require quite a few new APIs so that host bridges that > >>>> want to change their nirq can still generate a compatible irq_count > >>>> vmstate field. Namely: > >>>> - A function to set up vmstate_irq_count and define a callback that > >>>> the core will invoke to prepare the vmstate_irq_count before > >>>> vmsave. > >>>> - A function to obtain the IRQ mapping /without/ the final host > >>>> bridge step. This is required so that the callback above can > >>>> calculate the old state like in the PIIX3 case. > >>> > >>> Does this really need to be so complex? It seems that we just need > >>> pci_get_irq_count(bus, irq) which can use the existing map_irq API, no? > >>> Then invoke that before save. > >> > >> No, because the new map_irq of the PIIX3 bridge will also include the > >> host bridge routing (or masking) according to the PIRQx routing > >> registers of the PIIX3. Moreover, the fixup of the written legacy > >> irq_count state has to happen in the PCI layer, which therefore has to > >> query the host bridge for fixup information, not the other way around > >> (because the PCI bus vmstate is separate from the PIIX3 host bridge). > >> > >>> > >>>> 3. Keep irq_count and nirq as is, introduce additional map_host_irq. > >>>> > >>>> This is simpler than 2 and more compatible than 1. 
It would also > >>>> allow to introduce the polarity and masking information more > >>>> smoothly as we won't have to add it to existing map_irq callbacks > >>>> then. > >>> > >>> So what does it map, and to what? > >> > >> PCI bus IRQ to *host* IRQ. It is supposed to explore the routing of the > >> host bridge between the root bus and the host's interrupt controller > >> (i.e. the step that is currently missing the cached chain). > >> > >>> Maybe we can make the name imply that somehow. > >> > >> Better suggestions for this handler and maybe also the existing map_irq > >> are welcome to make the difference clearer. > >> > >> Jan > > > > So I won't object to adding a new API but if we do > > it properly this won't help compatibility :( > > It will as this API does not touch the parts that affect the vmstate > (ie. semantics of irq_count won't change). Yes but irq_count in vmstate is a bug. IMO even if we do not change anything we should ignore irq_count on load and recalculate it from what the devices supply. > > > > Let's formulate what these do exactly, this will > > also help us come up with sensible names. > > > > 1. The difference is that pci bridges route interrupt pins. So it gets > > interrupt pin on device and returns interrupt pin on connector. All > > attributes are standard PCI. We should remove all mentions of "irq" > > really. > > > > > > 2. The pci root (yes it's a host bridge but let's > > not use the term host if we can) routes > > an interrupt pin on device to a host irq. It can also > > do more things like invert polarity. > > > > So yes we can add 2 to piix but we really should > > remove 1 from it. > > > > Wrt names - do you object to long names? > > How about route_interrupt_pin for 1 > > and route_interrupt_pin_to_irq for 2? > > I'm fine with this. > > Jan > > [1] http://permalink.gmane.org/gmane.comp.emulators.qemu/153357 > > -- > Siemens AG, Corporate Technology, CT T DE IT 1 > Corporate Competence Center Embedded Linux
On Fri, Jun 01, 2012 at 05:15:42PM +0200, Jan Kiszka wrote: > On 2012-06-01 16:34, Michael S. Tsirkin wrote: > > On Fri, Jun 01, 2012 at 03:57:01PM +0200, Jan Kiszka wrote: > >> On 2012-06-01 15:27, Michael S. Tsirkin wrote: > >>> On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote: > >>>> On 2012-05-30 22:31, Michael S. Tsirkin wrote: > >>>>>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use > >>>>>>> irq_count instead of the pic_levels bitmap. > >>>>>> > >>>>>> Just that this affects generic PCI code, not only PIIX-specific things. > >>>>> > >>>>> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs. > >>>>> > >>>>>> And that we need to save/restore some irq_count field according to the > >>>>>> old semantics. > >>>>> > >>>>> Well, it's a bug: this is redundant info we should not have exposed it. > >>>>> > >>>>> Anyway, let's make the rest work properly and cleanly first, add a FIXME > >>>>> for now, then we'll find a hack making it work for migration. > >>>> > >>>> It remains non-trivial: I got your patch working (a minor init issue), > >>>> but yet without changing the number of IRQs for PIIX3, so keeping the > >>>> irq_count semantics for this host bridge. > >>>> > >>>> Now I'm facing three possibilities of how to proceed: > >>> > >>> They all look OK I think :) Some comments below. > >>> > >>>> 1. Give up on the (currently broken) feature to write a vmstate for > >>>> older QEMU versions. > >>>> > >>>> This will allow to declare the irq_count field in vmstate_pcibus > >>>> unused, and we would have to restore it on vmload step-wise via the > >>>> PCI devices. It would also allow to change its semantics for PIIX3, > >>>> mapping directly to PIC IRQs. > >>> > >>> I think that's okay too simply because these things are usually > >>> easy to fix after the fact when the rest of the issues are addressed. > >> > >> Don't get what you mean with "fixed". 
If we fix the vmstate generation > >> in making it backward-compatible again, we enter option 2. Option 1 is > >> explicitly about giving this up. > > > > What I really mean is I think I see how 2 can be added without much > > pain. So let's focus on 1 for now and worst case we break migration. > > I'd like to avoid planning for this worst case as long as there are also > statements [1] that this is not acceptable for QEMU in general. It > doesn't help to create a beautiful architecture initially about which we > already know that it will become more complex than alternatives in the end. 1 and 2 are same really except 2 adds a hack for compatibility, no?
On 2012-06-01 17:28, Michael S. Tsirkin wrote: >>> So I won't object to adding a new API but if we do >>> it properly this won't help compatibility :( >> >> It will as this API does not touch the parts that affect the vmstate >> (ie. semantics of irq_count won't change). > > Yes but irq_count in vmstate is a bug. IMO even if we do > not change anything we should ignore irq_count on > load and recalculate it from what the devices supply. I don't disagree. But this will only allow keeping backward migration support if we preserve the semantics of current map_irq somewhere - to keep the chance of calculating the legacy values for vmsave as well. Jan
On 2012-06-01 17:30, Michael S. Tsirkin wrote: > On Fri, Jun 01, 2012 at 05:15:42PM +0200, Jan Kiszka wrote: >> On 2012-06-01 16:34, Michael S. Tsirkin wrote: >>> On Fri, Jun 01, 2012 at 03:57:01PM +0200, Jan Kiszka wrote: >>>> On 2012-06-01 15:27, Michael S. Tsirkin wrote: >>>>> On Fri, Jun 01, 2012 at 02:52:56PM +0200, Jan Kiszka wrote: >>>>>> On 2012-05-30 22:31, Michael S. Tsirkin wrote: >>>>>>>>> So we'll just have PIIX_NUM_PIC_IRQS entries there and use >>>>>>>>> irq_count instead of the pic_levels bitmap. >>>>>>>> >>>>>>>> Just that this affects generic PCI code, not only PIIX-specific things. >>>>>>> >>>>>>> Yes but it's not a problem - pci_bus_irqs sets the map function and nirqs. >>>>>>> >>>>>>>> And that we need to save/restore some irq_count field according to the >>>>>>>> old semantics. >>>>>>> >>>>>>> Well, it's a bug: this is redundant info we should not have exposed it. >>>>>>> >>>>>>> Anyway, let's make the rest work properly and cleanly first, add a FIXME >>>>>>> for now, then we'll find a hack making it work for migration. >>>>>> >>>>>> It remains non-trivial: I got your patch working (a minor init issue), >>>>>> but yet without changing the number of IRQs for PIIX3, so keeping the >>>>>> irq_count semantics for this host bridge. >>>>>> >>>>>> Now I'm facing three possibilities of how to proceed: >>>>> >>>>> They all look OK I think :) Some comments below. >>>>> >>>>>> 1. Give up on the (currently broken) feature to write a vmstate for >>>>>> older QEMU versions. >>>>>> >>>>>> This will allow to declare the irq_count field in vmstate_pcibus >>>>>> unused, and we would have to restore it on vmload step-wise via the >>>>>> PCI devices. It would also allow to change its semantics for PIIX3, >>>>>> mapping directly to PIC IRQs. >>>>> >>>>> I think that's okay too simply because these things are usually >>>>> easy to fix after the fact when the rest of the issues are addressed. >>>> >>>> Don't get what you mean with "fixed". 
If we fix the vmstate generation >>>> in making it backward-compatible again, we enter option 2. Option 1 is >>>> explicitly about giving this up. >>> >>> What I really mean is I think I see how 2 can be added without much >>> pain. So let's focus on 1 for now and worst case we break migration. >> >> I'd like to avoid planning for this worst case as long as there are also >> statements [1] that this is not acceptable for QEMU in general. It >> doesn't help to create a beautiful architecture initially about which we >> already know that it will become more complex than alternatives in the end. > > 1 and 2 are same really except 2 adds a hack for compatibility, no? Yes, 2 would allow us to maintain irq_count in different ways as well, specifically using it to calculate shared output IRQ lines before reporting them to the PIC generically. This approach has both a charming and an ugly face. Jan
On Fri, Jun 01, 2012 at 05:54:15PM +0200, Jan Kiszka wrote: > On 2012-06-01 17:28, Michael S. Tsirkin wrote: > >>> So I won't object to adding a new API but if we do > >>> it properly this won't help compatibility :( > >> > >> It will as this API does not touch the parts that affect the vmstate > >> (ie. semantics of irq_count won't change). > > > > Yes but irq_count in vmstate is a bug. IMO even if we do > > not change anything we should ignore irq_count on > > load and recalculate it from what the devices supply. > > I don't disagree. But this will only allow keeping backward migration > support if we preserve the semantics of current map_irq somewhere - to > keep the chance of calculating the legacy values for vmsave as well. > > Jan We don't need to preserve it, right? We can calculate it before savevm. > -- > Siemens AG, Corporate Technology, CT T DE IT 1 > Corporate Competence Center Embedded Linux
On 2012-06-01 18:05, Michael S. Tsirkin wrote: > On Fri, Jun 01, 2012 at 05:54:15PM +0200, Jan Kiszka wrote: >> On 2012-06-01 17:28, Michael S. Tsirkin wrote: >>>>> So I won't object to adding a new API but if we do >>>>> it properly this won't help compatibility :( >>>> >>>> It will as this API does not touch the parts that affect the vmstate >>>> (ie. semantics of irq_count won't change). >>> >>> Yes but irq_count in vmstate is a bug. IMO even if we do >>> not change anything we should ignore irq_count on >>> load and recalculate it from what the devices supply. >> >> I don't disagree. But this will only allow keeping backward migration >> support if we preserve the semantics of current map_irq somewhere - to >> keep the chance of calculating the legacy values for vmsave as well. >> >> Jan > > We don't need to preserve it, right? We can calculate it before > savevm. We can't calculate it without something like the additional infrastructure I listed. Jan
diff --git a/hw/pci.c b/hw/pci.c index c1ebdde..313abe1 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -112,18 +112,33 @@ static inline void pci_set_irq_state(PCIDevice *d, int irq_num, int level) d->irq_state |= level << irq_num; } -static void pci_change_irq_level(PCIDevice *pci_dev, int irq_num, int change) +static void pci_update_irq_maps_for_device(PCIBus *unused, PCIDevice *dev) { PCIBus *bus; - for (;;) { - bus = pci_dev->bus; - irq_num = bus->map_irq(pci_dev, irq_num); - if (bus->set_irq) - break; - pci_dev = bus->parent_dev; + PCIDevice *pci_dev; + int i, irq_num; + for (i = 0; i < PCI_NUM_PINS; ++i) { + pci_dev = dev; + irq_num = i; + for (;;) { + bus = pci_dev->bus; + irq_num = bus->map_irq(pci_dev, irq_num); + if (bus->set_irq) + break; + pci_dev = bus->parent_dev; + } + dev->irq_num[i] = irq_num; + bus->irq_count[irq_num] += pci_irq_state(dev, i); + dev->root_bus = bus; } - bus->irq_count[irq_num] += change; - bus->set_irq(bus->irq_opaque, irq_num, bus->irq_count[irq_num] != 0); +} + +static void pci_change_irq_level(PCIDevice *pci_dev, int irq_num, int change) +{ + PCIBus *bus = pci_dev->root_bus; + int i = pci_dev->irq_num[irq_num]; + bus->irq_count[i] += change; + bus->set_irq(bus->irq_opaque, i, bus->irq_count[i] != 0); } int pci_bus_get_irq_level(PCIBus *bus, int irq_num) @@ -805,6 +820,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, bus->devices[devfn] = pci_dev; pci_dev->irq = qemu_allocate_irqs(pci_set_irq, pci_dev, PCI_NUM_PINS); pci_dev->version_id = 2; /* Current pci device vmstate version */ + pci_update_irq_maps_for_device(NULL, pci_dev); return pci_dev; } @@ -1140,6 +1156,18 @@ static void pci_for_each_device_under_bus(PCIBus *bus, } } +/* Update per-device IRQ mappings after bus mappings changed. */ +void pci_update_irq_maps(PCIBus *bus) +{ + int i; + /* FIXME: this only scans one level. + * Need to scan child buses recursively. 
*/ + for (i = 0; i < bus->nirq; ++i) + bus->irq_count[i] = 0; + pci_for_each_device_under_bus(bus, pci_update_irq_maps_for_device); +} + + void pci_for_each_device(PCIBus *bus, int bus_num, void (*fn)(PCIBus *b, PCIDevice *d)) { diff --git a/hw/pci.h b/hw/pci.h index 8d0aa49..4102c44 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -207,6 +207,12 @@ struct PCIDevice { /* Current IRQ levels. Used internally by the generic PCI code. */ uint8_t irq_state; + /* The root bus. Used internally by the generic PCI code. */ + PCIBus *root_bus; + + /* IRQ num at the root bus. Used internally by the generic PCI code. */ + int irq_num[PCI_NUM_PINS]; + /* Capability bits */ uint32_t cap_present; @@ -305,6 +311,7 @@ PCIDevice *pci_nic_init_nofail(NICInfo *nd, const char *default_model, const char *default_devaddr); int pci_bus_num(PCIBus *s); void pci_for_each_device(PCIBus *bus, int bus_num, void (*fn)(PCIBus *bus, PCIDevice *d)); +void pci_update_irq_maps(PCIBus *bus); PCIBus *pci_find_root_bus(int domain); int pci_find_domain(const PCIBus *bus); PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn); diff --git a/hw/piix_pci.c b/hw/piix_pci.c index 09e84f5..ac71294 100644 --- a/hw/piix_pci.c +++ b/hw/piix_pci.c @@ -391,6 +391,7 @@ static void piix3_update_irq_levels(PIIX3State *piix3) { int pirq; + pci_update_irq_maps(piix3->dev.bus); piix3->pic_levels = 0; for (pirq = 0; pirq < PIIX_NUM_PIRQS; pirq++) { piix3_set_irq_level(piix3, pirq,