Message ID | 20180408064939.18879-1-npiggin@gmail.com |
---|---|
Headers | show |
Series | next round of OPAL debugging improvements | expand |
Nicholas Piggin <npiggin@gmail.com> writes: > The r13 clobber bug has been fixed in Linux upstream now, the next > bit is the skiboot stack corruption caused by re-entry from the OS. > > Patches 1-3 have been posted here before in various RFCs. They give > nicer back traces, make quiescing a bit more robust, and stop the > stack from being destroyed, and make quiescing a bit more robust. > > Patch 4 is one I've found useful when testing with sreset injection > into various parts of skiboot code. > > We can in general interrupt and resume OPAL after this series. > Many specific cases are unlikely to work for other reasons, but > we have a cut down sub-set of OPAL API that we allow through, so > the task of making those reasonably robust should be manageable. Neat. I haven't hammered the series with sreset and fun like that, but we should probably add something along those lines to op-test (hit it a lot more when we had more bugs :) When I build with STACK_CHECK=1, it seems that we never get below 10k of free stack, but I'm not quite bold enough to suggest "8k is enough for everyone", and just eating the extra memory for stacks is probably okay. Series merged to master as of 87f55507195a166fb12ed1240ed3221964f11f40
On Thu, 19 Apr 2018 16:24:06 +1000 Stewart Smith <stewart@linux.ibm.com> wrote: > Nicholas Piggin <npiggin@gmail.com> writes: > > The r13 clobber bug has been fixed in Linux upstream now, the next > > bit is the skiboot stack corruption caused by re-entry from the OS. > > > > Patches 1-3 have been posted here before in various RFCs. They give > > nicer back traces, make quiescing a bit more robust, and stop the > > stack from being destroyed, and make quiescing a bit more robust. > > > > Patch 4 is one I've found useful when testing with sreset injection > > into various parts of skiboot code. > > > > We can in general interrupt and resume OPAL after this series. > > Many specific cases are unlikely to work for other reasons, but > > we have a cut down sub-set of OPAL API that we allow through, so > > the task of making those reasonably robust should be manageable. > > Neat. > > I haven't hammered the series with sreset and fun like that, but we > should probably add something along those lines to op-test (hit it a lot > more when we had more bugs :) Yeah, I've mostly been injecting in mambo, so far. We should be able to add some tests. I'll think about it too. > > When I build with STACK_CHECK=1, it seems that we never get below 10k of > free stack, but I'm not quite bold enough to suggest "8k is enough for > everyone", and just eating the extra memory for stacks is probably okay. Yeah I don't see it's a really big problem. Worth the added robustness. > Series merged to master as of 87f55507195a166fb12ed1240ed3221964f11f40 Cool. I think this brings us to a *reasonably* good place in terms of coping with NMIs and re-entry from Linux. Maybe a few minor tweaks, but we should be getting more solid. Next chapter is some kind of OPAL_DEBUG API we've talked about, but that'll be for another day. Concentrate on getting all this well tested and stable first. Thanks, Nick
Nicholas Piggin <npiggin@gmail.com> writes: > Cool. I think this brings us to a *reasonably* good place in terms of > coping with NMIs and re-entry from Linux. Maybe a few minor tweaks, but > we should be getting more solid. > > Next chapter is some kind of OPAL_DEBUG API we've talked about, but > that'll be for another day. Concentrate on getting all this well tested > and stable first. I compare to stability of turning on stop4/5 and think this is the most solid thing in the entire universe. (i think that was meant to be my inside voice....)