Message ID | 20180801035357.7804-1-david@gibson.dropbear.id.au |
---|---|
State | New |
Headers | show |
On 1 August 2018 at 04:53, David Gibson <david@gibson.dropbear.id.au> wrote: > The following changes since commit f7502360397d291be04bc040e9f96c92ff2d8030: > > Update version for v3.0.0-rc3 release (2018-07-31 19:30:17 +0100) > > are available in the Git repository at: > > git://github.com/dgibson/qemu.git tags/ppc-for-3.0-20180801 > > for you to fetch changes up to 6484ab3dffadc79020a71376010f517d60b81b83: > > sam460ex: Fix PCI interrupts with multiple devices (2018-08-01 11:01:38 +1000) > > ---------------------------------------------------------------- > ppc patch queue for 2018-08-01 > > Here are a final couple of fixes for the 3.0 release. So, we've just put out rc3, which in an ideal world is our final release candidate for 3.0. Are these bugs regressions from 2.12 ? thanks -- PMM
On Wed, 1 Aug 2018, Peter Maydell wrote: > On 1 August 2018 at 04:53, David Gibson <david@gibson.dropbear.id.au> wrote: >> The following changes since commit f7502360397d291be04bc040e9f96c92ff2d8030: >> >> Update version for v3.0.0-rc3 release (2018-07-31 19:30:17 +0100) >> >> are available in the Git repository at: >> >> git://github.com/dgibson/qemu.git tags/ppc-for-3.0-20180801 >> >> for you to fetch changes up to 6484ab3dffadc79020a71376010f517d60b81b83: >> >> sam460ex: Fix PCI interrupts with multiple devices (2018-08-01 11:01:38 +1000) >> >> ---------------------------------------------------------------- >> ppc patch queue for 2018-08-01 >> >> Here are a final couple of fixes for the 3.0 release. > > So, we've just put out rc3, which in an ideal world is our > final release candidate for 3.0. Are these bugs regressions from > 2.12 ? I don't know about the macio one but the sam460ex PCI interrupts were broken in 2.12 too. However it's a fix for a device only used in sam460ex which is now fixed by this patch so including it is not high risk for breaking anything else than sam460ex which is known to be not finished yet so I would not worry too much. But which is better? Releasing 3.0 with a known bug or including this fix without an rc4? In case of machines that are in QEMU for a long time, widely used and other infrastructure is dependent upon it's good to have strict rules to avoid regressions but for the newly added sam460ex which is really only expected to be first usable in 3.0 and not many people depend on it yet delaying a fix for a known problem until December just to follow a rule looks counterproductive. So I would say it's OK to include this fix but skip rc4 and if it makes sam460ex worse than it is now then at worst we would have a broken 3.0 that will need to be fixed in next release. But we have a good chance to release a better 3.0 as opposed to not include the fix and release a known broken 3.0 that surely will have to be fixed in next release. (This fix may improve sound and network support so it would be good to have it in 3.0. A lot of potential users can only use release and rc versions because they can't build QEMU from source so they will only get this fix in next release otherwise.) But if it's not possible to have this in 3.0 without rc4 and there won't be other changes needing an rc4 we can live with this not getting in 3.0 but due to the above it might delay this for the general public by a few months. Regards, BALATON Zoltan
On 1 August 2018 at 12:24, BALATON Zoltan <balaton@eik.bme.hu> wrote: > On Wed, 1 Aug 2018, Peter Maydell wrote: >> So, we've just put out rc3, which in an ideal world is our >> final release candidate for 3.0. Are these bugs regressions from >> 2.12 ? > > > I don't know about the macio one but the sam460ex PCI interrupts were broken > in 2.12 too. However it's a fix for a device only used in sam460ex which is > now fixed by this patch so including it is not high risk for breaking > anything else than sam460ex which is known to be not finished yet so I would > not worry too much. But which is better? Releasing 3.0 with a known bug or > including this fix without an rc4? The problem with continuing to delay 3.0 while we have known bugs is that bugs generally come in at an even rate, so we *always* have known bugs, and so "we found another bug, let's delay 3.0 again to put in a fix for it" is a recipe for never doing a release. That's why we gradually wind up the bar for "should this go in", from "any bug" to "regressions" to "really really serious showstopper regressions". We never do a final release without a last rc (it is too risky), so that is not an option. thanks -- PMM
On Wed, Aug 01, 2018 at 02:04:04PM +0100, Peter Maydell wrote: > On 1 August 2018 at 12:24, BALATON Zoltan <balaton@eik.bme.hu> wrote: > > On Wed, 1 Aug 2018, Peter Maydell wrote: > >> So, we've just put out rc3, which in an ideal world is our > >> final release candidate for 3.0. Are these bugs regressions from > >> 2.12 ? > > > > > > I don't know about the macio one but the sam460ex PCI interrupts were broken > > in 2.12 too. However it's a fix for a device only used in sam460ex which is > > now fixed by this patch so including it is not high risk for breaking > > anything else than sam460ex which is known to be not finished yet so I would > > not worry too much. But which is better? Releasing 3.0 with a known bug or > > including this fix without an rc4? > > The problem with continuing to delay 3.0 while we have known bugs > is that bugs generally come in at an even rate, so we *always* > have known bugs, and so "we found another bug, let's delay 3.0 > again to put in a fix for it" is a recipe for never doing a release. > That's why we gradually wind up the bar for "should this go in", > from "any bug" to "regressions" to "really really serious showstopper > regressions". > > We never do a final release without a last rc (it is too risky), > so that is not an option. Ugh, sorry about this. I made an off-by-one error in my mental calculation of whether it would be ok to include non-regression bugfixes aet this point. Also, I originally meant to send this before the last -rc, but stuff happened. So, the sam460 fix is, indeed, not a regression and I'm happy to punt it to 3.1. The macio fix, however, *is* a regression from 2.12. Whether it's severe enough to warrant another -rc, I'm not sure. It is a bad pointer access which is, well, bad. It doesn't seem to bite obviously, needing valgrind to pick it up, but possibly that's just luck.
On 2 August 2018 at 08:08, David Gibson <david@gibson.dropbear.id.au> wrote: > The macio fix, however, *is* a regression from 2.12. Whether it's > severe enough to warrant another -rc, I'm not sure. It is a bad > pointer access which is, well, bad. It doesn't seem to bite > obviously, needing valgrind to pick it up, but possibly that's just > luck. I thought those introspection-bugs like the macio ones weren't regressions ? Certainly the devices have been leaking a refcount in their init methods for a long time. I don't view these as very important bugs, though, as they only manifest if you try to introspect fixed-always-present device/SoC objects, which in practice I don't think anybody does except as part of "test every device via introspection" test cases. thanks -- PMM
On Thu, Aug 02, 2018 at 10:16:32AM +0100, Peter Maydell wrote: > On 2 August 2018 at 08:08, David Gibson <david@gibson.dropbear.id.au> wrote: > > The macio fix, however, *is* a regression from 2.12. Whether it's > > severe enough to warrant another -rc, I'm not sure. It is a bad > > pointer access which is, well, bad. It doesn't seem to bite > > obviously, needing valgrind to pick it up, but possibly that's just > > luck. > > I thought those introspection-bugs like the macio ones weren't > regressions ? Well, I ran Thomas's testcase on master and it generates several valgrind warnings, which don't appear on either 2.12 or master+the patch. > Certainly the devices have been leaking a refcount > in their init methods for a long time. I don't view these as > very important bugs, though, as they only manifest if you > try to introspect fixed-always-present device/SoC objects, > which in practice I don't think anybody does except as > part of "test every device via introspection" test cases. > > thanks > -- PMM >
On 08/02/2018 04:07 PM, David Gibson wrote: > On Thu, Aug 02, 2018 at 10:16:32AM +0100, Peter Maydell wrote: >> On 2 August 2018 at 08:08, David Gibson <david@gibson.dropbear.id.au> wrote: >>> The macio fix, however, *is* a regression from 2.12. Whether it's >>> severe enough to warrant another -rc, I'm not sure. It is a bad >>> pointer access which is, well, bad. It doesn't seem to bite >>> obviously, needing valgrind to pick it up, but possibly that's just >>> luck. >> >> I thought those introspection-bugs like the macio ones weren't >> regressions ? > > Well, I ran Thomas's testcase on master and it generates several > valgrind warnings, which don't appear on either 2.12 or master+the > patch. Maybe the macio bug is something new, but we had plenty of these introspetion bugs in the other code (mainly the ARM code) which were clearly there since a looong time already and nobody ever complained. So it seems quite unusual that upper layer tools / the users are using the introspection feature of QEMU. Thus I'd say this bug is not important enough to block the release. We could fix it in the stable branch instead. Thomas
On Fri, Aug 03, 2018 at 07:49:12AM +0200, Thomas Huth wrote: > On 08/02/2018 04:07 PM, David Gibson wrote: > > On Thu, Aug 02, 2018 at 10:16:32AM +0100, Peter Maydell wrote: > >> On 2 August 2018 at 08:08, David Gibson <david@gibson.dropbear.id.au> wrote: > >>> The macio fix, however, *is* a regression from 2.12. Whether it's > >>> severe enough to warrant another -rc, I'm not sure. It is a bad > >>> pointer access which is, well, bad. It doesn't seem to bite > >>> obviously, needing valgrind to pick it up, but possibly that's just > >>> luck. > >> > >> I thought those introspection-bugs like the macio ones weren't > >> regressions ? > > > > Well, I ran Thomas's testcase on master and it generates several > > valgrind warnings, which don't appear on either 2.12 or master+the > > patch. > > Maybe the macio bug is something new, but we had plenty of these > introspetion bugs in the other code (mainly the ARM code) which were > clearly there since a looong time already and nobody ever complained. So > it seems quite unusual that upper layer tools / the users are using the > introspection feature of QEMU. Thus I'd say this bug is not important > enough to block the release. We could fix it in the stable branch > instead. Understood, I'll punt the patch to my 3.1 staging tree.
On Wed, 1 Aug 2018, Peter Maydell wrote: > On 1 August 2018 at 12:24, BALATON Zoltan <balaton@eik.bme.hu> wrote: >> On Wed, 1 Aug 2018, Peter Maydell wrote: >>> So, we've just put out rc3, which in an ideal world is our >>> final release candidate for 3.0. Are these bugs regressions from >>> 2.12 ? >> >> >> I don't know about the macio one but the sam460ex PCI interrupts were broken >> in 2.12 too. However it's a fix for a device only used in sam460ex which is >> now fixed by this patch so including it is not high risk for breaking >> anything else than sam460ex which is known to be not finished yet so I would >> not worry too much. But which is better? Releasing 3.0 with a known bug or >> including this fix without an rc4? > > The problem with continuing to delay 3.0 while we have known bugs > is that bugs generally come in at an even rate, so we *always* > have known bugs, and so "we found another bug, let's delay 3.0 > again to put in a fix for it" is a recipe for never doing a release. > That's why we gradually wind up the bar for "should this go in", > from "any bug" to "regressions" to "really really serious showstopper > regressions". > > We never do a final release without a last rc (it is too risky), > so that is not an option. Now that it looks like we'll have an rc4 due to other fixes can these be included as well despite not being regressions? These may not have been serious enough to fix when we wouldn't have rc4 otherwise but holding on to broken implementation just because they were also broken in the previous release does not make sense if we'll have another rc anyway. Regards, BALATON Zoltan
On 5 August 2018 at 16:38, BALATON Zoltan <balaton@eik.bme.hu> wrote: > Now that it looks like we'll have an rc4 due to other fixes can these be > included as well despite not being regressions? These may not have been > serious enough to fix when we wouldn't have rc4 otherwise but holding on to > broken implementation just because they were also broken in the previous > release does not make sense if we'll have another rc anyway. I'm considering it, yes, but it increases risk, so it is not a completely trivial decision. (All changes later in the release cycle increase the risk of something unexpected breaking as a result of the change.) thanks -- PMM
On 1 August 2018 at 04:53, David Gibson <david@gibson.dropbear.id.au> wrote: > The following changes since commit f7502360397d291be04bc040e9f96c92ff2d8030: > > Update version for v3.0.0-rc3 release (2018-07-31 19:30:17 +0100) > > are available in the Git repository at: > > git://github.com/dgibson/qemu.git tags/ppc-for-3.0-20180801 > > for you to fetch changes up to 6484ab3dffadc79020a71376010f517d60b81b83: > > sam460ex: Fix PCI interrupts with multiple devices (2018-08-01 11:01:38 +1000) > > ---------------------------------------------------------------- > ppc patch queue for 2018-08-01 > > Here are a final couple of fixes for the 3.0 release. > > ---------------------------------------------------------------- > BALATON Zoltan (1): > sam460ex: Fix PCI interrupts with multiple devices > > Thomas Huth (1): > hw/misc/macio: Fix device introspection problems in macio devices Since we needed an rc4, I've applied this to master (the bug fixes are not strict regressions but they are limited in scope to particular machines). thanks -- PMM