
[for-2.5] tcg: Increase the highwater reservation

Message ID 1448986767-28016-1-git-send-email-rth@twiddle.net
State New

Commit Message

Richard Henderson Dec. 1, 2015, 4:19 p.m. UTC
If there are a lot of guest memory ops in the TB, the amount of
code generated by tcg_out_tb_finalize could be well more than 1k.
In the short term, increase the reservation to be larger than any TB
seen in practice.

Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---

Reported and discussed with Aurelien on IRC yesterday.  This seems
to be the easiest fix for the upcoming release.  I will fix this
properly (by modifying every backend's finalize routines) for 2.6.


r~
---
 tcg/tcg.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
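
[Editor's note: for context, the highwater mark is a soft limit near the end of
the code_gen buffer: translation emits code without per-byte bounds checks and
only afterwards compares the output pointer against the mark, flushing the
whole buffer and retrying if it was crossed.  The reservation therefore has to
be at least as large as anything a single TB can emit past the last check.
Below is a minimal, self-contained model of that scheme; buffer sizes, names
and structure are invented for illustration (only the 64k figure comes from
this patch), so it is a sketch, not QEMU code.]

/* Model of the TCG highwater scheme: emit first, check afterwards.
 * The reservation at the end of the buffer absorbs whatever one TB
 * writes past the check point.  Illustration only, not QEMU code. */
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE     (256 * 1024)
#define HIGHWATER_RES   (64 * 1024)      /* the reservation this patch picks */

static unsigned char buffer[BUFFER_SIZE];
static unsigned char *code_ptr;
static unsigned char *highwater = buffer + BUFFER_SIZE - HIGHWATER_RES;

static void flush_buffer(void)           /* stand-in for tb_flush(): start over */
{
    code_ptr = buffer;
}

/* Emit one TB of tb_size bytes; only then check the highwater mark.
 * Returns -1 if the mark was crossed, in which case the caller flushes
 * and retries.  Safe only while tb_size <= HIGHWATER_RES. */
static int gen_tb(size_t tb_size)
{
    unsigned char *start = code_ptr;
    memset(code_ptr, 0x90, tb_size);     /* stand-in for generated host code */
    code_ptr += tb_size;
    if (code_ptr > highwater) {
        code_ptr = start;
        return -1;
    }
    return 0;
}

int main(void)
{
    flush_buffer();
    for (int i = 0; i < 1000; i++) {
        if (gen_tb(2500) < 0) {          /* a TB bigger than the old 1k reserve */
            flush_buffer();
            gen_tb(2500);                /* always fits in an empty buffer */
        }
    }
    printf("buffer in use: %zu bytes\n", (size_t)(code_ptr - buffer));
    return 0;
}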

Comments

Peter Maydell Dec. 1, 2015, 4:28 p.m. UTC | #1
On 1 December 2015 at 16:19, Richard Henderson <rth@twiddle.net> wrote:
> If there are a lot of guest memory ops in the TB, the amount of
> code generated by tcg_out_tb_finalize could be well more than 1k.
> In the short term, increase the reservation larger than any TB
> seen in practice.
>
> Reported-by: Aurelien Jarno <aurelien@aurel32.net>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>
> Reported and discussed with Aurelien on IRC yesterday.  This seems
> to be the easiest fix for the upcoming release.  I will fix this
> properly (by modifying every backend's finalize routines) for 2.6.

What would be the result of our hitting this bug? I ask because
there's a report on qemu-discuss about a qemu-i386-on-ARM-host
bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
and the debug log (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
suggests we're segfaulting in translation on the TB shortly
after we (successfully) translate a TB whose final 'out' size
is 1100 and which has 64 guest writes in it. So I'm wondering
if that's actually the same bug this is fixing...

thanks
-- PMM
Aurelien Jarno Dec. 1, 2015, 4:32 p.m. UTC | #2
On 2015-12-01 08:19, Richard Henderson wrote:
> If there are a lot of guest memory ops in the TB, the amount of
> code generated by tcg_out_tb_finalize could be well more than 1k.
> In the short term, increase the reservation larger than any TB
> seen in practice.
> 
> Reported-by: Aurelien Jarno <aurelien@aurel32.net>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Thanks! I was testing the same patch locally after our discussion, and I
confirm it fixes the issue.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Aurelien Jarno Dec. 1, 2015, 4:34 p.m. UTC | #3
On 2015-12-01 16:28, Peter Maydell wrote:
> On 1 December 2015 at 16:19, Richard Henderson <rth@twiddle.net> wrote:
> > If there are a lot of guest memory ops in the TB, the amount of
> > code generated by tcg_out_tb_finalize could be well more than 1k.
> > In the short term, increase the reservation larger than any TB
> > seen in practice.
> >
> > Reported-by: Aurelien Jarno <aurelien@aurel32.net>
> > Signed-off-by: Richard Henderson <rth@twiddle.net>
> > ---
> >
> > Reported and discussed with Aurelien on IRC yesterday.  This seems
> > to be the easiest fix for the upcoming release.  I will fix this
> > properly (by modifying every backend's finalize routines) for 2.6.
> 
> What would be the result of our hitting this bug? I ask because
> there's a report on qemu-discuss about a qemu-i386-on-ARM-host
> bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
> and the debug log (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
> suggests we're segfaulting in translation on the TB shortly
> after we (successfully) translate a TB whose final 'out' size
> is 1100 and which has 64 guest writes in it. So I'm wondering
> if that's actually the same bug this is fixing...

I don't think this is the same bug. The problem happens because the slow
path of the softmmu load/store access is written at the end of the TB.
In user mode, there is no slow path, so nothing is written at the end.
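
[Editor's note: to make this concrete, in softmmu mode each guest load/store
records a label while the TB body is emitted, and a finalize step then appends
one out-of-line slow-path fragment per label after the body, with no size check
of its own.  The toy model below shows that shape; the fast-path size is an
assumption and the 9-insn slow-path figure is quoted from later in the thread,
so treat it as a sketch, not QEMU code.]

/* Toy model of "the slow path is written at the end of the TB":
 * the body records one label per guest memory op, and finalize later
 * appends a fixed-size fragment per label, unchecked.  Byte counts are
 * assumptions for illustration, not measured QEMU output. */
#include <stdio.h>

#define SLOW_PATH_BYTES (9 * 4)   /* worst-case arm slow path quoted later */
#define FAST_PATH_BYTES (7 * 4)   /* assumed inline fast-path size */

typedef struct {
    int    nb_ldst;      /* labels recorded while emitting the body */
    size_t body_bytes;   /* inline code emitted so far */
} TBGen;

static void emit_guest_store(TBGen *tb)
{
    tb->body_bytes += FAST_PATH_BYTES;   /* TLB hit path goes inline */
    tb->nb_ldst++;                       /* slow path still owed */
}

static size_t finalize(const TBGen *tb)
{
    /* Conceptually what tcg_out_tb_finalize() does: append one
     * out-of-line fragment per recorded label after the TB body. */
    return (size_t)tb->nb_ldst * SLOW_PATH_BYTES;
}

int main(void)
{
    TBGen tb = { 0, 0 };
    for (int i = 0; i < 64; i++) {       /* the 64 guest writes in the report */
        emit_guest_store(&tb);
    }
    printf("inline body: %zu bytes, appended slow paths: %zu bytes\n",
           tb.body_bytes, finalize(&tb));     /* 1792 + 2304 bytes */
    return 0;
}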
Aurelien Jarno Dec. 1, 2015, 4:40 p.m. UTC | #4
On 2015-12-01 17:34, Aurelien Jarno wrote:
> On 2015-12-01 16:28, Peter Maydell wrote:
> > On 1 December 2015 at 16:19, Richard Henderson <rth@twiddle.net> wrote:
> > > If there are a lot of guest memory ops in the TB, the amount of
> > > code generated by tcg_out_tb_finalize could be well more than 1k.
> > > In the short term, increase the reservation larger than any TB
> > > seen in practice.
> > >
> > > Reported-by: Aurelien Jarno <aurelien@aurel32.net>
> > > Signed-off-by: Richard Henderson <rth@twiddle.net>
> > > ---
> > >
> > > Reported and discussed with Aurelien on IRC yesterday.  This seems
> > > to be the easiest fix for the upcoming release.  I will fix this
> > > properly (by modifying every backend's finalize routines) for 2.6.
> > 
> > What would be the result of our hitting this bug? I ask because
> > there's a report on qemu-discuss about a qemu-i386-on-ARM-host
> > bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
> > and the debug log (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
> > suggests we're segfaulting in translation on the TB shortly
> > after we (successfully) translate a TB whose final 'out' size
> > is 1100 and which has 64 guest writes in it. So I'm wondering
> > if that's actually the same bug this is fixing...
> 
> I don't think this is the same bug. The problem happens because the slow
> path of the softmmu load/store access is written at the end of the TB.
> In user mode, there is no slow path, so nothing is written at the end.

That said, the problem reported is likely fixed by this commit, which went
in just after it was reported:

commit 644da9b39e477caa80bab69d2847dfcb468f0d33
Author: John Clarke <johnc@kirriwa.net>
Date:   Thu Nov 19 10:30:50 2015 +0100

    tcg: Fix highwater check
    
    A simple typo in the variable to use when comparing vs the highwater mark.
    Reports are that qemu can in fact segfault occasionally due to this mistake.
    
    Signed-off-by: John Clarke <johnc@kirriwa.net>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
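
[Editor's note: the fix above is about which pointer gets compared against the
highwater mark.  Here is a toy illustration of why that single variable
matters; the names are made up for the example, not taken from the commit.]

/* Comparing a pointer that does not advance during emission against the
 * highwater mark never fires; only the live output pointer can observe
 * the overflow.  Variable names are illustrative, not from the commit. */
#include <stdio.h>

int main(void)
{
    char  buffer[1024];
    char *highwater = buffer + 1024 - 128;   /* reservation at the end      */
    char *tb_start  = buffer + 850;          /* where the current TB began  */
    char *out_ptr   = buffer + 950;          /* where emission actually is  */

    /* Buggy shape: the compared pointer is stale, so no flush happens. */
    printf("stale pointer crossed the mark:  %s\n",
           tb_start > highwater ? "yes" : "no");

    /* Fixed shape: the live output pointer is past the mark, flush+retry. */
    printf("output pointer crossed the mark: %s\n",
           out_ptr > highwater ? "yes" : "no");
    return 0;
}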
Richard Henderson Dec. 1, 2015, 4:46 p.m. UTC | #5
On 12/01/2015 08:28 AM, Peter Maydell wrote:
> On 1 December 2015 at 16:19, Richard Henderson <rth@twiddle.net> wrote:
>> If there are a lot of guest memory ops in the TB, the amount of
>> code generated by tcg_out_tb_finalize could be well more than 1k.
>> In the short term, increase the reservation larger than any TB
>> seen in practice.
>>
>> Reported-by: Aurelien Jarno <aurelien@aurel32.net>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>
>> Reported and discussed with Aurelien on IRC yesterday.  This seems
>> to be the easiest fix for the upcoming release.  I will fix this
>> properly (by modifying every backend's finalize routines) for 2.6.
>
> What would be the result of our hitting this bug?

A segfault, writing to the guard page for the code_gen buffer.

> I ask because
> there's a report on qemu-discuss about a qemu-i386-on-ARM-host
> bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
> and the debug log (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
> suggests we're segfaulting in translation on the TB shortly
> after we (successfully) translate a TB whose final 'out' size
> is 1100 and which has 64 guest writes in it. So I'm wondering
> if that's actually the same bug this is fixing...

It's plausible.

The maximum 32-bit memory op for arm requires 9 insns (36 bytes) in the slow 
path.  Times 64 that's 2304 bytes, which exceeds the current highwater buffer space.

The new 64k buffer allows for 1820 (arm backend) writes before exceeding the 
highwater buffer.  That is significantly more than TCG_MAX_INSNS (512), though 
not even close to OPC_MAX_SIZE (170240), which would require > 6MB of highwater 
space.


r~
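
[Editor's note: for anyone double-checking the figures in the message above,
the arithmetic works out as below, using the 9-insn, 4-bytes-per-insn arm
worst case.  The TCG_MAX_INSNS and OPC_MAX_SIZE values are taken from the
message itself, not looked up in the source tree.]

/* Reproduces the sizes quoted above: 9 insns * 4 bytes per arm slow
 * path; constants are those cited in the thread. */
#include <stdio.h>

int main(void)
{
    const int slow_path = 9 * 4;                              /* bytes/op  */
    printf("64 memory ops:   %d bytes\n", 64 * slow_path);    /* 2304      */
    printf("64k reservation: %d ops\n",
           (64 * 1024) / slow_path);                          /* 1820      */
    printf("OPC_MAX_SIZE:    %.2f MB\n",
           170240 * slow_path / 1e6);                         /* ~6.13     */
    return 0;
}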
Richard Henderson Dec. 1, 2015, 11:06 p.m. UTC | #6
On 12/01/2015 08:40 AM, Aurelien Jarno wrote:
> On 2015-12-01 17:34, Aurelien Jarno wrote:
>> On 2015-12-01 16:28, Peter Maydell wrote:
>>> On 1 December 2015 at 16:19, Richard Henderson <rth@twiddle.net> wrote:
>>>> If there are a lot of guest memory ops in the TB, the amount of
>>>> code generated by tcg_out_tb_finalize could be well more than 1k.
>>>> In the short term, increase the reservation larger than any TB
>>>> seen in practice.
>>>>
>>>> Reported-by: Aurelien Jarno <aurelien@aurel32.net>
>>>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>>>> ---
>>>>
>>>> Reported and discussed with Aurelien on IRC yesterday.  This seems
>>>> to be the easiest fix for the upcoming release.  I will fix this
>>>> properly (by modifying every backend's finalize routines) for 2.6.
>>>
>>> What would be the result of our hitting this bug? I ask because
>>> there's a report on qemu-discuss about a qemu-i386-on-ARM-host
>>> bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
>>> and the debug log (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
>>> suggests we're segfaulting in translation on the TB shortly
>>> after we (successfully) translate a TB whose final 'out' size
>>> is 1100 and which has 64 guest writes in it. So I'm wondering
>>> if that's actually the same bug this is fixing...
>>
>> I don't think this is the same bug. The problem happens because the slow
>> path of the softmmu load/store access is written at the end of the TB.
>> In user mode, there is no slow path, so nothing is written at the end.

Oh quite right.  Duh.

> That said, the problem reported is likely fixed by this commit, which went
> in just after it was reported:

It does seem likely, but I don't see how we can know that the out size is 1100 
in that situation.  The disassembler dump doesn't happen until after we've done 
all of the writes that would have resulted in a highwater overflow segv.

I don't see how this can be the same bug.


r~
Peter Maydell Dec. 1, 2015, 11:20 p.m. UTC | #7
On 1 December 2015 at 23:06, Richard Henderson <rth@twiddle.net> wrote:
> On 12/01/2015 08:40 AM, Aurelien Jarno wrote:
>> That said the problem reported is likely fixed by this commit that went
>> just after it has been reported:
>
>
> It does seem likely, but I don't see how we can know that the out size is
> 1100 in that situation.  The disassembler dump doesn't happen until after
> we've done all of the writes that would have resulted in a highwater
> overflow segv.

Yeah, if we always cleanly segv immediately on highwater overflow
(as opposed to corrupting something so a later translation crashes)
then this can't be the bug that's reported for qemu-i386. The
actual TB that we never finish translating is quite small:

IN:
0x419552e0:  push   %ebp
0x419552e1:  mov    %esp,%ebp
0x419552e3:  sub    $0x18,%esp
0x419552e6:  fldl   0x8(%ebp)
0x419552e9:  fstpl  -0x8(%ebp)
0x419552ec:  movl   $0x14000000,0x4(%esp)
0x419552f4:  movl   $0x2,(%esp)
0x419552fb:  call   0x41954b96

thanks
-- PMM

Patch

diff --git a/tcg/tcg.c b/tcg/tcg.c
index b20ed19..a163541 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -388,7 +388,11 @@  void tcg_prologue_init(TCGContext *s)
     /* Compute a high-water mark, at which we voluntarily flush the buffer
        and start over.  The size here is arbitrary, significantly larger
        than we expect the code generation for any one opcode to require.  */
-    s->code_gen_highwater = s->code_gen_buffer + (total_size - 1024);
+    /* ??? We currently have no good estimate for, or checks in,
+       tcg_out_tb_finalize.  If there are quite a lot of guest memory ops,
+       the number of out-of-line fragments could be quite high.  In the
+       short-term, increase the highwater buffer.  */
+    s->code_gen_highwater = s->code_gen_buffer + (total_size - 64*1024);
 
     tcg_register_jit(s->code_gen_buffer, total_size);