Message ID: 1448986767-28016-1-git-send-email-rth@twiddle.net
State: New
On 1 December 2015 at 16:19, Richard Henderson <rth@twiddle.net> wrote:
> If there are a lot of guest memory ops in the TB, the amount of
> code generated by tcg_out_tb_finalize could be well more than 1k.
> In the short term, increase the reservation larger than any TB
> seen in practice.
>
> Reported-by: Aurelien Jarno <aurelien@aurel32.net>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>
> Reported and discussed with Aurelien on IRC yesterday.  This seems
> to be the easiest fix for the upcoming release.  I will fix this
> properly (by modifying every backend's finalize routines) for 2.6.

What would be the result of our hitting this bug? I ask because
there's a report on qemu-discuss about a qemu-i386-on-ARM-host
bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
and the debug log (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
suggests we're segfaulting in translation on the TB shortly
after we (successfully) translate a TB whose final 'out' size
is 1100 and which has 64 guest writes in it. So I'm wondering
if that's actually the same bug this is fixing...

thanks
-- PMM
On 2015-12-01 08:19, Richard Henderson wrote:
> If there are a lot of guest memory ops in the TB, the amount of
> code generated by tcg_out_tb_finalize could be well more than 1k.
> In the short term, increase the reservation larger than any TB
> seen in practice.
>
> Reported-by: Aurelien Jarno <aurelien@aurel32.net>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Thanks! I was testing the same patch locally after our discussion,
and I confirm it fixes the issue.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
On 2015-12-01 16:28, Peter Maydell wrote:
> On 1 December 2015 at 16:19, Richard Henderson <rth@twiddle.net> wrote:
> > If there are a lot of guest memory ops in the TB, the amount of
> > code generated by tcg_out_tb_finalize could be well more than 1k.
> > In the short term, increase the reservation larger than any TB
> > seen in practice.
> >
> > Reported-by: Aurelien Jarno <aurelien@aurel32.net>
> > Signed-off-by: Richard Henderson <rth@twiddle.net>
> > ---
> >
> > Reported and discussed with Aurelien on IRC yesterday.  This seems
> > to be the easiest fix for the upcoming release.  I will fix this
> > properly (by modifying every backend's finalize routines) for 2.6.
>
> What would be the result of our hitting this bug? I ask because
> there's a report on qemu-discuss about a qemu-i386-on-ARM-host
> bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
> and the debug log (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
> suggests we're segfaulting in translation on the TB shortly
> after we (successfully) translate a TB whose final 'out' size
> is 1100 and which has 64 guest writes in it. So I'm wondering
> if that's actually the same bug this is fixing...

I don't think this is the same bug. The problem happens because the
slow path of the softmmu load/store access is written at the end of
the TB. In user mode, there is no slow path, so nothing is written at
the end.
On 2015-12-01 17:34, Aurelien Jarno wrote:
> On 2015-12-01 16:28, Peter Maydell wrote:
> > On 1 December 2015 at 16:19, Richard Henderson <rth@twiddle.net> wrote:
> > > If there are a lot of guest memory ops in the TB, the amount of
> > > code generated by tcg_out_tb_finalize could be well more than 1k.
> > > In the short term, increase the reservation larger than any TB
> > > seen in practice.
> > >
> > > Reported-by: Aurelien Jarno <aurelien@aurel32.net>
> > > Signed-off-by: Richard Henderson <rth@twiddle.net>
> > > ---
> > >
> > > Reported and discussed with Aurelien on IRC yesterday.  This seems
> > > to be the easiest fix for the upcoming release.  I will fix this
> > > properly (by modifying every backend's finalize routines) for 2.6.
> >
> > What would be the result of our hitting this bug? I ask because
> > there's a report on qemu-discuss about a qemu-i386-on-ARM-host
> > bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
> > and the debug log (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
> > suggests we're segfaulting in translation on the TB shortly
> > after we (successfully) translate a TB whose final 'out' size
> > is 1100 and which has 64 guest writes in it. So I'm wondering
> > if that's actually the same bug this is fixing...
>
> I don't think this is the same bug. The problem happens because the slow
> path of the softmmu load/store access is written at the end of the TB.
> In user mode, there is no slow path, so nothing is written at the end.

That said, the problem reported is likely fixed by this commit, which
went in just after the report:

commit 644da9b39e477caa80bab69d2847dfcb468f0d33
Author: John Clarke <johnc@kirriwa.net>
Date:   Thu Nov 19 10:30:50 2015 +0100

    tcg: Fix highwater check

    A simple typo in the variable to use when comparing vs the
    highwater mark.  Reports are that qemu can in fact segfault
    occasionally due to this mistake.

    Signed-off-by: John Clarke <johnc@kirriwa.net>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
On 12/01/2015 08:28 AM, Peter Maydell wrote:
> On 1 December 2015 at 16:19, Richard Henderson <rth@twiddle.net> wrote:
>> If there are a lot of guest memory ops in the TB, the amount of
>> code generated by tcg_out_tb_finalize could be well more than 1k.
>> In the short term, increase the reservation larger than any TB
>> seen in practice.
>>
>> Reported-by: Aurelien Jarno <aurelien@aurel32.net>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>
>> Reported and discussed with Aurelien on IRC yesterday.  This seems
>> to be the easiest fix for the upcoming release.  I will fix this
>> properly (by modifying every backend's finalize routines) for 2.6.
>
> What would be the result of our hitting this bug?

A segfault, writing to the guard page for the code_gen buffer.

> I ask because
> there's a report on qemu-discuss about a qemu-i386-on-ARM-host
> bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
> and the debug log (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
> suggests we're segfaulting in translation on the TB shortly
> after we (successfully) translate a TB whose final 'out' size
> is 1100 and which has 64 guest writes in it. So I'm wondering
> if that's actually the same bug this is fixing...

It's plausible.  The maximum 32-bit memory op for arm requires 9 insns
in the slow path.  Times 64 that's 2304 bytes, which exceeds the
current highwater buffer space.

The new 64k buffer allows for 1820 (arm backend) writes before
exceeding the highwater buffer.  Which is significantly more than
TCG_MAX_INSNS (512), though not even close to OPC_MAX_SIZE (170240),
which would require > 6MB in highwater space.

r~
On 12/01/2015 08:40 AM, Aurelien Jarno wrote:
> On 2015-12-01 17:34, Aurelien Jarno wrote:
>> On 2015-12-01 16:28, Peter Maydell wrote:
>>> On 1 December 2015 at 16:19, Richard Henderson <rth@twiddle.net> wrote:
>>>> If there are a lot of guest memory ops in the TB, the amount of
>>>> code generated by tcg_out_tb_finalize could be well more than 1k.
>>>> In the short term, increase the reservation larger than any TB
>>>> seen in practice.
>>>>
>>>> Reported-by: Aurelien Jarno <aurelien@aurel32.net>
>>>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>>>> ---
>>>>
>>>> Reported and discussed with Aurelien on IRC yesterday.  This seems
>>>> to be the easiest fix for the upcoming release.  I will fix this
>>>> properly (by modifying every backend's finalize routines) for 2.6.
>>>
>>> What would be the result of our hitting this bug? I ask because
>>> there's a report on qemu-discuss about a qemu-i386-on-ARM-host
>>> bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
>>> and the debug log (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
>>> suggests we're segfaulting in translation on the TB shortly
>>> after we (successfully) translate a TB whose final 'out' size
>>> is 1100 and which has 64 guest writes in it. So I'm wondering
>>> if that's actually the same bug this is fixing...
>>
>> I don't think this is the same bug. The problem happens because the slow
>> path of the softmmu load/store access is written at the end of the TB.
>> In user mode, there is no slow path, so nothing is written at the end.

Oh quite right.  Duh.

> That said the problem reported is likely fixed by this commit that went
> just after it has been reported:

It does seem likely, but I don't see how we can know that the out size
is 1100 in that situation.  The disassembler dump doesn't happen until
after we've done all of the writes that would have resulted in a
highwater overflow segv.

I don't see how this can be the same bug.

r~
On 1 December 2015 at 23:06, Richard Henderson <rth@twiddle.net> wrote:
> On 12/01/2015 08:40 AM, Aurelien Jarno wrote:
>> That said the problem reported is likely fixed by this commit that went
>> just after it has been reported:
>
> It does seem likely, but I don't see how we can know that the out size is
> 1100 in that situation.  The disassembler dump doesn't happen until after
> we've done all of the writes that would have resulted in a highwater
> overflow segv.

Yeah, if we always cleanly segv immediately on highwater overflow (as
opposed to corrupting something so a later translation crashes) then
this can't be the bug that's reported for qemu-i386. The actual TB
that we never finish translating is quite small:

IN:
0x419552e0:  push   %ebp
0x419552e1:  mov    %esp,%ebp
0x419552e3:  sub    $0x18,%esp
0x419552e6:  fldl   0x8(%ebp)
0x419552e9:  fstpl  -0x8(%ebp)
0x419552ec:  movl   $0x14000000,0x4(%esp)
0x419552f4:  movl   $0x2,(%esp)
0x419552fb:  call   0x41954b96

thanks
-- PMM
diff --git a/tcg/tcg.c b/tcg/tcg.c
index b20ed19..a163541 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -388,7 +388,11 @@ void tcg_prologue_init(TCGContext *s)
     /* Compute a high-water mark, at which we voluntarily flush the buffer
        and start over.  The size here is arbitrary, significantly larger
        than we expect the code generation for any one opcode to require.  */
-    s->code_gen_highwater = s->code_gen_buffer + (total_size - 1024);
+    /* ??? We currently have no good estimate for, or checks in,
+       tcg_out_tb_finalize.  If there are quite a lot of guest memory ops,
+       the number of out-of-line fragments could be quite high.  In the
+       short-term, increase the highwater buffer.  */
+    s->code_gen_highwater = s->code_gen_buffer + (total_size - 64*1024);

     tcg_register_jit(s->code_gen_buffer, total_size);
If there are a lot of guest memory ops in the TB, the amount of
code generated by tcg_out_tb_finalize could be well more than 1k.
In the short term, increase the reservation larger than any TB
seen in practice.

Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---

Reported and discussed with Aurelien on IRC yesterday.  This seems
to be the easiest fix for the upcoming release.  I will fix this
properly (by modifying every backend's finalize routines) for 2.6.

r~

---
 tcg/tcg.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)