diff mbox

arm: fix TB alignment check

Message ID 20141021121453.7268.529.stgit@PASHA-ISP
State New
Headers show

Commit Message

Pavel Dovgalyuk Oct. 21, 2014, 12:14 p.m. UTC
Sometimes page faults happen during the translation of the target instructions.
To avoid the faults in the middle of the TB we have to stop translation at
the end of the page. Current implementation of ARM translation assumes that
instructions are aligned to their own size (4 or 2 bytes). But in thumb2 mode
4-byte instruction can be aligned to 2 bytes. In some cases such an alignment
leads to page fault.
This patch adds check that allows translation of such instructions only in
the beginning of the TB.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 target-arm/translate.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

Comments

Peter Maydell Oct. 23, 2014, 1:18 p.m. UTC | #1
On 21 October 2014 13:14, Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> wrote:
> Sometimes page faults happen during the translation of the target instructions.
> To avoid the faults in the middle of the TB we have to stop translation at
> the end of the page. Current implementation of ARM translation assumes that
> instructions are aligned to their own size (4 or 2 bytes). But in thumb2 mode
> 4-byte instruction can be aligned to 2 bytes. In some cases such an alignment
> leads to page fault.
> This patch adds check that allows translation of such instructions only in
> the beginning of the TB.
>
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

Richard, could I ask you to review this one? I don't really
understand how TCG frontend support for instructions that
span page boundaries is supposed to work :-(

> ---
>  target-arm/translate.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 2c0b1de..bc3a16b 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -11124,7 +11124,8 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
>               !cs->singlestep_enabled &&
>               !singlestep &&
>               !dc->ss_active &&
> -             dc->pc < next_page_start &&
> +             /* +3 is for unaligned Thumb2 instructions */
> +             dc->pc + 3 < next_page_start &&
>               num_insns < max_insns);
>
>      if (tb->cflags & CF_LAST_IO) {
>

thanks
-- PMM
Laurent Desnogues Oct. 23, 2014, 1:42 p.m. UTC | #2
Hello,

On Tue, Oct 21, 2014 at 2:14 PM, Pavel Dovgalyuk
<Pavel.Dovgaluk@ispras.ru> wrote:
> Sometimes page faults happen during the translation of the target instructions.
> To avoid the faults in the middle of the TB we have to stop translation at
> the end of the page. Current implementation of ARM translation assumes that
> instructions are aligned to their own size (4 or 2 bytes). But in thumb2 mode
> 4-byte instruction can be aligned to 2 bytes. In some cases such an alignment
> leads to page fault.
> This patch adds check that allows translation of such instructions only in
> the beginning of the TB.
>
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---
>  target-arm/translate.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 2c0b1de..bc3a16b 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -11124,7 +11124,8 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
>               !cs->singlestep_enabled &&
>               !singlestep &&
>               !dc->ss_active &&
> -             dc->pc < next_page_start &&
> +             /* +3 is for unaligned Thumb2 instructions */
> +             dc->pc + 3 < next_page_start &&

Won't this end prematurely a (4KB) block which last instruction
is a Thumb 16-bit one?

Thanks,

Laurent

>               num_insns < max_insns);
>
>      if (tb->cflags & CF_LAST_IO) {
>
>
Richard Henderson Oct. 23, 2014, 4:15 p.m. UTC | #3
On 10/21/2014 05:14 AM, Pavel Dovgalyuk wrote:
> Sometimes page faults happen during the translation of the target instructions.
> To avoid the faults in the middle of the TB we have to stop translation at
> the end of the page. Current implementation of ARM translation assumes that
> instructions are aligned to their own size (4 or 2 bytes). But in thumb2 mode
> 4-byte instruction can be aligned to 2 bytes. In some cases such an alignment
> leads to page fault.
> This patch adds check that allows translation of such instructions only in
> the beginning of the TB.
> 
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---
>  target-arm/translate.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 2c0b1de..bc3a16b 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -11124,7 +11124,8 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
>               !cs->singlestep_enabled &&
>               !singlestep &&
>               !dc->ss_active &&
> -             dc->pc < next_page_start &&
> +             /* +3 is for unaligned Thumb2 instructions */
> +             dc->pc + 3 < next_page_start &&

I'd like to know more about the problem you're seeing here.

QEMU can handle TB's that span two full pages.  The poster child for this
support is of course the i386 port.  There, the test we use is

   (pc_ptr - pc_start) >= (TARGET_PAGE_SIZE - 32)

Since pc_start can be anywhere at all, this test allows a 4k sliding window
across two pages.  The -32 slop there is presumably intended to prevent us from
starting that one last instruction that might push us to 3 pages[1].

And, by "handle", I mean invalidate such TB's when either page gets
overwritten, signalling that they need to be retranslated.

My guess is that you are talking about adjusting the point at which an
exception is recognized when executing a sequence that falls off the end of a
mapping.

To be honest qemu doesn't attempt to care about that much for targets that have
insns that span pages.  We've proven[2] that there are no branches or forced
exceptions that would change control flow earlier, therefore any time execution
begins at pc_start, it will likely fall through to the next page.

That said, for the special case of an isa limited to 2 and 4 byte insns[3],
this is probably a good change.  After we terminate this TB, execute it, and
start translating the next, we'll have three cases:

  (1) The next thumb insn is 2 bytes, and it gets a TB all by itself.
  (2) The next thumb insn is 4 bytes,
      (a) the next page is mapped, and it gets a TB to itself,
      (b) the next page is unmapped, and we signal it immediately.

But 2b is perfect because that's exactly when the signal would be delivered by
hardware.  We could improve things by allowing case 2a to continue translation
in the new page, but that may be rare enough not be worth the extra checks.

I will say that it's probably worth moving the sense of the check.  One can
achieve the same results by subtracting from next_page_start when it is
initialized.  That moves the arithmetic out of the loop, and also allows you to
write a larger comment about the logic and not have to place it in the middle
of this rather large while condition.


r~


[1] Why 32 when the maximum insn size is more like 15 bytes, I don't know.  But
it likely doesn't matter since I'd expect such large TB's to fill up the opcode
buffer first.  There would have to be a lot of nops on that page.

[2] Or ought to have done so, for a well written translator.

[3] Hello, MIPS.  Leon, the test for mips is (now) incorrect:

        if ((ctx.pc & (TARGET_PAGE_SIZE - 1)) == 0)
            break;

may never succeed for mips16 and micromips.
Peter Maydell Oct. 23, 2014, 4:25 p.m. UTC | #4
On 23 October 2014 17:15, Richard Henderson <rth@twiddle.net> wrote:
> [1] Why 32 when the maximum insn size is more like 15 bytes, I don't know.  But
> it likely doesn't matter since I'd expect such large TB's to fill up the opcode
> buffer first.  There would have to be a lot of nops on that page.

Do we actually correctly GPF if the guest hands us an instruction
with a huge long set of prefix bytes? I can't see anything obviously
in the code that catches this case...

-- PMM
Richard Henderson Oct. 23, 2014, 4:33 p.m. UTC | #5
On 10/23/2014 09:25 AM, Peter Maydell wrote:
> On 23 October 2014 17:15, Richard Henderson <rth@twiddle.net> wrote:
>> [1] Why 32 when the maximum insn size is more like 15 bytes, I don't know.  But
>> it likely doesn't matter since I'd expect such large TB's to fill up the opcode
>> buffer first.  There would have to be a lot of nops on that page.
> 
> Do we actually correctly GPF if the guest hands us an instruction
> with a huge long set of prefix bytes? I can't see anything obviously
> in the code that catches this case...

No, I don't think we check for that at all.


r~
Pavel Dovgalyuk Oct. 24, 2014, 5:24 a.m. UTC | #6
> From: Richard Henderson [mailto:rth7680@gmail.com] On Behalf Of Richard Henderson
> On 10/21/2014 05:14 AM, Pavel Dovgalyuk wrote:
> > Sometimes page faults happen during the translation of the target instructions.
> > To avoid the faults in the middle of the TB we have to stop translation at
> > the end of the page. Current implementation of ARM translation assumes that
> > instructions are aligned to their own size (4 or 2 bytes). But in thumb2 mode
> > 4-byte instruction can be aligned to 2 bytes. In some cases such an alignment
> > leads to page fault.
> > This patch adds check that allows translation of such instructions only in
> > the beginning of the TB.
> >
> > Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> > ---
> >  target-arm/translate.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/target-arm/translate.c b/target-arm/translate.c
> > index 2c0b1de..bc3a16b 100644
> > --- a/target-arm/translate.c
> > +++ b/target-arm/translate.c
> > @@ -11124,7 +11124,8 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
> >               !cs->singlestep_enabled &&
> >               !singlestep &&
> >               !dc->ss_active &&
> > -             dc->pc < next_page_start &&
> > +             /* +3 is for unaligned Thumb2 instructions */
> > +             dc->pc + 3 < next_page_start &&
> 
> I'd like to know more about the problem you're seeing here.
> 

The problem is coupled with the execution determinism.
Consider the following two cases:

      0x0000ffa insn0
TB1   0x0000ffc insn1
TB2   0x0000ffe insn2
      0x0001002 insn3

In different execution scenarios TB may start from address TB1 or TB2.
when the page 0x1000 is not mapped, translation of insn2 will cause an exception.
In first case exception appears after executing insn0 and in second case - after insn1.

This means that execution of the code is not deterministic and depends on
translation order.

Pavel Dovgalyuk
Leon Alrae Oct. 24, 2014, 4:08 p.m. UTC | #7
On 23/10/2014 17:15, Richard Henderson wrote:
> [3] Hello, MIPS.  Leon, the test for mips is (now) incorrect:
> 
>         if ((ctx.pc & (TARGET_PAGE_SIZE - 1)) == 0)
>             break;
> 
> may never succeed for mips16 and micromips.

Indeed, this test doesn't look right (although I'm not sure what exactly
are the consequences if the test doesn't succeed and we span two or more
pages). Thanks for spotting this in MIPS.

Regards,
Leon
Peter Maydell Sept. 19, 2015, 10 a.m. UTC | #8
On 21 October 2014 at 13:14, Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> wrote:
> Sometimes page faults happen during the translation of the target instructions.
> To avoid the faults in the middle of the TB we have to stop translation at
> the end of the page. Current implementation of ARM translation assumes that
> instructions are aligned to their own size (4 or 2 bytes). But in thumb2 mode
> 4-byte instruction can be aligned to 2 bytes. In some cases such an alignment
> leads to page fault.
> This patch adds check that allows translation of such instructions only in
> the beginning of the TB.
>
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---
>  target-arm/translate.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 2c0b1de..bc3a16b 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -11124,7 +11124,8 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
>               !cs->singlestep_enabled &&
>               !singlestep &&
>               !dc->ss_active &&
> -             dc->pc < next_page_start &&
> +             /* +3 is for unaligned Thumb2 instructions */
> +             dc->pc + 3 < next_page_start &&
>               num_insns < max_insns);
>
>      if (tb->cflags & CF_LAST_IO) {

I finally got round to looking again at this patch from last year, which
I didn't really understand the first time round. Having investigated
more closely, this is a correctness issue (we report the page fault
with the wrong guest PC). Those get my attention much quicker if
clearly described as such :-)

As Laurent suggested, this code will give an awkward stub TB with
a single 16bit Thumb insn in it if the last insn in the page
happens to be a 16 bit Thumb insn. I'll post a patch shortly which
avoids this effect (and which has more commentary on why we need
to do this).

Incidentally, this remark by rth:
> To be honest qemu doesn't attempt to care about that much for targets that have
> insns that span pages.  We've proven[2] that there are no branches or forced
> exceptions that would change control flow earlier, therefore any time execution
> begins at pc_start, it will likely fall through to the next page.

fails to consider the case of non-forced exceptions (like those
on loads or stores). I think every target has to care about this
otherwise for situations like:
 load/store insn that should fault
 other insn that spans page boundary into a non-executable page

we will report the fault for the execution on the non-executable
page, when we should have reported the fault for the load/store
(and we'll get the faulting PC wrong too).

thanks
-- PMM
diff mbox

Patch

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 2c0b1de..bc3a16b 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -11124,7 +11124,8 @@  static inline void gen_intermediate_code_internal(ARMCPU *cpu,
              !cs->singlestep_enabled &&
              !singlestep &&
              !dc->ss_active &&
-             dc->pc < next_page_start &&
+             /* +3 is for unaligned Thumb2 instructions */
+             dc->pc + 3 < next_page_start &&
              num_insns < max_insns);
 
     if (tb->cflags & CF_LAST_IO) {