
[PULL,23/32] tcg: Support MMU protection regions smaller than TARGET_PAGE_SIZE

Message ID 20180626165658.31394-24-peter.maydell@linaro.org
State New
Series [PULL,01/32] aspeed/smc: fix dummy cycles count when in dual IO mode

Commit Message

Peter Maydell June 26, 2018, 4:56 p.m. UTC
Add support for MMU protection regions that are smaller than
TARGET_PAGE_SIZE. We do this by marking the TLB entry for those
pages with a flag TLB_RECHECK. This flag causes us to always
take the slow-path for accesses. In the slow path we can then
special case them to always call tlb_fill() again, so we have
the correct information for the exact address being accessed.

This change allows us to handle reading and writing from small
regions; we cannot deal with execution from the small region.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180620130619.11362-2-peter.maydell@linaro.org
---
 accel/tcg/softmmu_template.h |  24 ++++---
 include/exec/cpu-all.h       |   5 +-
 accel/tcg/cputlb.c           | 131 +++++++++++++++++++++++++++++------
 3 files changed, 130 insertions(+), 30 deletions(-)
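
As a rough sketch of the mechanism described above (condensed from the
cputlb.c changes in the patch below; control flow abbreviated):

    /* When filling a TLB entry for a region smaller than a page, tag
     * the entry so that every access through it takes the slow path:
     */
    if (size < TARGET_PAGE_SIZE) {
        address |= TLB_RECHECK;
    }

    /* In the slow-path load/store helpers, a tagged entry forces a
     * fresh MMU lookup for the exact address being accessed:
     */
    if (tlb_addr & TLB_RECHECK) {
        tlb_fill(cpu, addr, size, access_type, mmu_idx, retaddr);
        /* ...then re-read the (now exact) TLB entry and finish the
         * access as RAM or IO, depending on what the refill produced.
         */
    }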

Comments

Laurent Vivier June 28, 2018, 1:03 p.m. UTC | #1
On 26/06/2018 at 18:56, Peter Maydell wrote:
> Add support for MMU protection regions that are smaller than
> TARGET_PAGE_SIZE. We do this by marking the TLB entry for those
> pages with a flag TLB_RECHECK. This flag causes us to always
> take the slow-path for accesses. In the slow path we can then
> special case them to always call tlb_fill() again, so we have
> the correct information for the exact address being accessed.
> 
> This change allows us to handle reading and writing from small
> regions; we cannot deal with execution from the small region.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Message-id: 20180620130619.11362-2-peter.maydell@linaro.org
> ---
>  accel/tcg/softmmu_template.h |  24 ++++---
>  include/exec/cpu-all.h       |   5 +-
>  accel/tcg/cputlb.c           | 131 +++++++++++++++++++++++++++++------
>  3 files changed, 130 insertions(+), 30 deletions(-)

This patch breaks Quadra 800 emulation, any idea why?

ABCFGHIJK
qemu: fatal: Unable to handle guest executing from RAM within a small
MPU region at 0x0014cb5a
D0 = 0000006a   A0 = 002d8a19   F0 = 7fff ffffffffffffffff  (         nan)
D1 = 00000010   A1 = 002d8a19   F1 = 7fff ffffffffffffffff  (         nan)
D2 = 000003e0   A2 = 00332310   F2 = 7fff ffffffffffffffff  (         nan)
D3 = 00000000   A3 = 00331f98   F3 = 7fff ffffffffffffffff  (         nan)
D4 = 0036da87   A4 = 0036daa3   F4 = 7fff ffffffffffffffff  (         nan)
D5 = 000003e0   A5 = 0036de67   F5 = 7fff ffffffffffffffff  (         nan)
D6 = 002d8a18   A6 = 002d8a1a   F6 = 7fff ffffffffffffffff  (         nan)
D7 = 0014ac46   A7 = 00331ed8   F7 = 7fff ffffffffffffffff  (         nan)
PC = 0014cb5a   SR = 2700 T:0 I:7 SI -----
FPSR = 00000000 ---- -------- -----  FPCR = 0000 X RN --------
  A7(MSP) = 00000000   A7(USP) = 00000000 ->A7(ISP) = 00331f38
VBR = 0x00364528
SFC = 0 DFC 0
SSW 00000000 TCR 00008000 URP 00000000 SRP 00001000
DTTR0/1: 00000000/f807a040 ITTR0/1: 00000000/f807a040
MMUSR 00000000, fault at 00000000
Aborted (core dumped)

Laurent
Peter Maydell June 28, 2018, 1:23 p.m. UTC | #2
On 28 June 2018 at 14:03, Laurent Vivier <laurent@vivier.eu> wrote:
> On 26/06/2018 at 18:56, Peter Maydell wrote:
>> Add support for MMU protection regions that are smaller than
>> TARGET_PAGE_SIZE. We do this by marking the TLB entry for those
>> pages with a flag TLB_RECHECK. This flag causes us to always
>> take the slow-path for accesses. In the slow path we can then
>> special case them to always call tlb_fill() again, so we have
>> the correct information for the exact address being accessed.
>>
>> This change allows us to handle reading and writing from small
>> regions; we cannot deal with execution from the small region.
>>
>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>> Message-id: 20180620130619.11362-2-peter.maydell@linaro.org
>> ---
>>  accel/tcg/softmmu_template.h |  24 ++++---
>>  include/exec/cpu-all.h       |   5 +-
>>  accel/tcg/cputlb.c           | 131 +++++++++++++++++++++++++++++------
>>  3 files changed, 130 insertions(+), 30 deletions(-)
>
> This patch breaks Quadra 800 emulation, any idea why?
>
> ABCFGHIJK
> qemu: fatal: Unable to handle guest executing from RAM within a small
> MPU region at 0x0014cb5a

Hmm, that shouldn't happen unless your target code was
incorrectly returning a too-small page size. (I say
"incorrectly" because before this patchseries that was
unsupported and would have had weird effects depending on
exactly what the order of guest accesses to the page was.)

You could look at whether the m68k code is calling tlb_set_page()
with a wrong page_size value and why that happens. You can
get back the old behaviour by having your code do
   if (page_size < TARGET_PAGE_SIZE) {
       page_size = TARGET_PAGE_SIZE;
   }

but that is definitely a bit of a hack.

Does the m68k MMU let you specify permissions and mappings
for sub-page sizes?

I do notice an oddity:
in m68k_cpu_handle_mmu_fault() we call get_physical_address()
but then ignore the page_size it returns when we call tlb_set_page()
and instead use TARGET_PAGE_SIZE. But in the ptest helper function
we use the page_size from get_physical_address() directly.
Are these bits of code deliberately different?
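
For illustration, the difference looks roughly like this (an
approximation of the code in target/m68k/helper.c, not a verbatim
quote; exact signatures may differ):

    /* m68k_cpu_handle_mmu_fault(): page_size is computed but ignored */
    ret = get_physical_address(env, &physical, &prot, address,
                               access_type, &page_size);
    if (ret == 0) {
        tlb_set_page(cs, address & TARGET_PAGE_MASK,
                     physical & TARGET_PAGE_MASK, prot,
                     mmu_idx, TARGET_PAGE_SIZE);   /* <-- not page_size */
    }

    /* HELPER(ptest): the same lookup, but page_size is used directly */
    tlb_set_page(cs, addr & TARGET_PAGE_MASK,
                 physical & TARGET_PAGE_MASK, prot,
                 mmu_idx, page_size);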

In fact it's not clear to me at all that PTEST should be
updating the QEMU TLB: it only needs to update the MMU
status registers. (The 68030 manual I have says that in
hardware PTEST doesn't update the ATC, which is the h/w
equivalent to doing a TLB update.)

thanks
-- PMM
Laurent Vivier June 28, 2018, 7:23 p.m. UTC | #3
On 28/06/2018 at 15:23, Peter Maydell wrote:
> On 28 June 2018 at 14:03, Laurent Vivier <laurent@vivier.eu> wrote:
>> On 26/06/2018 at 18:56, Peter Maydell wrote:
>>> Add support for MMU protection regions that are smaller than
>>> TARGET_PAGE_SIZE. We do this by marking the TLB entry for those
>>> pages with a flag TLB_RECHECK. This flag causes us to always
>>> take the slow-path for accesses. In the slow path we can then
>>> special case them to always call tlb_fill() again, so we have
>>> the correct information for the exact address being accessed.
>>>
>>> This change allows us to handle reading and writing from small
>>> regions; we cannot deal with execution from the small region.
>>>
>>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>> Message-id: 20180620130619.11362-2-peter.maydell@linaro.org
>>> ---
>>>  accel/tcg/softmmu_template.h |  24 ++++---
>>>  include/exec/cpu-all.h       |   5 +-
>>>  accel/tcg/cputlb.c           | 131 +++++++++++++++++++++++++++++------
>>>  3 files changed, 130 insertions(+), 30 deletions(-)
>>
>> This patch breaks Quadra 800 emulation, any idea why?
>>
>> ABCFGHIJK
>> qemu: fatal: Unable to handle guest executing from RAM within a small
>> MPU region at 0x0014cb5a
> 
> Hmm, that shouldn't happen unless your target code was
> incorrectly returning a too-small page size. (I say
> "incorrectly" because before this patchseries that was
> unsupported and would have had weird effects depending on
> exactly what the order of guest accesses to the page was.)
> 
> You could look at whether the m68k code is calling tlb_set_page()
> with a wrong page_size value and why that happens. You can
> get back the old behaviour by having your code do
>    if (page_size < TARGET_PAGE_SIZE) {
>        page_size = TARGET_PAGE_SIZE;
>    }
> 
> but that is definitely a bit of a hack.

Thank you for taking a look at this.

I've added traces and tlb_set_page() is always called with page_size ==
TARGET_PAGE_SIZE.

The m68k Linux kernel always uses 4 kB pages, which is the value of
TARGET_PAGE_SIZE. The 68040 MMU can also use 8 kB pages, but in our
case it doesn't (and of course 8 kB > TARGET_PAGE_SIZE).

> Does the m68k MMU let you specify permissions and mappings
> for sub-page sizes?

I'm not aware of subpages in the m68k MMU, but we do have TLB entries
that are separate for code and data: does that change anything in your
code? Could accessing an address first as a data access and then as an
instruction access look like a TLB_RECHECK?

> I do notice an oddity:
> in m68k_cpu_handle_mmu_fault() we call get_physical_address()
> but then ignore the page_size it returns when we call tlb_set_page()
> and instead use TARGET_PAGE_SIZE. But in the ptest helper function
> we use the page_size from get_physical_address() directly.
> Are these bits of code deliberately different?

I remember I had trouble making this work. But I think you're right,
it should be page_size everywhere. But I guess it's not the cause of
my problem (I tried :) )...

> In fact it's not clear to me at all that PTEST should be
> updating the QEMU TLB: it only needs to update the MMU
> status registers. (The 68030 manual I have says that in
> hardware PTEST doesn't update the ATC, which is the h/w
> equivalent to doing a TLB update.)

In QEMU, we emulate the 68040 MMU for the moment, and PTEST for the
68040 is not defined the same way as for the 68030.

For 68040, we have:

"A matching entry in the address translation cache (data or instruction)
specified by the function code will be flushed by PTEST. Completion of
PTEST results in the creation of a new address translation cache entry"

Thanks,
Laurent
Peter Maydell June 28, 2018, 8:05 p.m. UTC | #4
On 28 June 2018 at 20:23, Laurent Vivier <laurent@vivier.eu> wrote:
> On 28/06/2018 at 15:23, Peter Maydell wrote:
>> On 28 June 2018 at 14:03, Laurent Vivier <laurent@vivier.eu> wrote:
>>> On 26/06/2018 at 18:56, Peter Maydell wrote:
>>>> Add support for MMU protection regions that are smaller than
>>>> TARGET_PAGE_SIZE. We do this by marking the TLB entry for those
>>>> pages with a flag TLB_RECHECK. This flag causes us to always
>>>> take the slow-path for accesses. In the slow path we can then
>>>> special case them to always call tlb_fill() again, so we have
>>>> the correct information for the exact address being accessed.
>>>>
>>>> This change allows us to handle reading and writing from small
>>>> regions; we cannot deal with execution from the small region.
>>>>
>>>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>>>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>>> Message-id: 20180620130619.11362-2-peter.maydell@linaro.org
>>>> ---
>>>>  accel/tcg/softmmu_template.h |  24 ++++---
>>>>  include/exec/cpu-all.h       |   5 +-
>>>>  accel/tcg/cputlb.c           | 131 +++++++++++++++++++++++++++++------
>>>>  3 files changed, 130 insertions(+), 30 deletions(-)
>>>
>>> This patch breaks Quadra 800 emulation, any idea why?
>>>
>>> ABCFGHIJK
>>> qemu: fatal: Unable to handle guest executing from RAM within a small
>>> MPU region at 0x0014cb5a
>>
>> Hmm, that shouldn't happen unless your target code was
>> incorrectly returning a too-small page size. (I say
>> "incorrectly" because before this patchseries that was
>> unsupported and would have had weird effects depending on
>> exactly what the order of guest accesses to the page was.)
>>
>> You could look at whether the m68k code is calling tlb_set_page()
>> with a wrong page_size value and why that happens. You can
>> get back the old behaviour by having your code do
>>    if (page_size < TARGET_PAGE_SIZE) {
>>        page_size = TARGET_PAGE_SIZE;
>>    }
>>
>> but that is definitely a bit of a hack.
>
> Thank you for taking a look at this.
>
> I've added traces and tlb_set_page() is always called with page_size ==
> TARGET_PAGE_SIZE.
>
> The m68k Linux kernel always uses 4 kB pages, which is the value of
> TARGET_PAGE_SIZE. The 68040 MMU can also use 8 kB pages, but in our
> case it doesn't (and of course 8 kB > TARGET_PAGE_SIZE).

> I'm not aware of subpages in the m68k MMU, but we do have TLB entries
> that are separate for code and data: does that change anything in your
> code? Could accessing an address first as a data access and then as an
> instruction access look like a TLB_RECHECK?

If you never pass a page_size < TARGET_PAGE_SIZE to
tlb_set_page() then we should never mark anything as TLB_RECHECK:
the theory was no behaviour change for the currently-being-used case.

Do you have a repro case (images, command line) that I can
use to investigate?

>> In fact it's not clear to me at all that PTEST should be
>> updating the QEMU TLB: it only needs to update the MMU
>> status registers. (The 68030 manual I have says that in
>> hardware PTEST doesn't update the ATC, which is the h/w
>> equivalent to doing a TLB update.)
>
> In QEMU, we emulate the 68040 MMU for the moment, and PTEST for the
> 68040 is not defined the same way as for the 68030.
>
> For 68040, we have:
>
> "A matching entry in the address translation cache (data or instruction)
> specified by the function code will be flushed by PTEST. Completion of
> PTEST results in the creation of a new address translation cache entry"

Oh, OK. Since the QEMU TLB isn't really the same as the hardware
TLB, it isn't strictly required to update our TLB here, but
if the hardware does that then it doesn't hurt.

thanks
-- PMM
Laurent Vivier June 28, 2018, 10:26 p.m. UTC | #5
On 28/06/2018 at 22:05, Peter Maydell wrote:
> On 28 June 2018 at 20:23, Laurent Vivier <laurent@vivier.eu> wrote:
>> On 28/06/2018 at 15:23, Peter Maydell wrote:
>>> On 28 June 2018 at 14:03, Laurent Vivier <laurent@vivier.eu> wrote:
>>>> On 26/06/2018 at 18:56, Peter Maydell wrote:
>>>>> Add support for MMU protection regions that are smaller than
>>>>> TARGET_PAGE_SIZE. We do this by marking the TLB entry for those
>>>>> pages with a flag TLB_RECHECK. This flag causes us to always
>>>>> take the slow-path for accesses. In the slow path we can then
>>>>> special case them to always call tlb_fill() again, so we have
>>>>> the correct information for the exact address being accessed.
>>>>>
>>>>> This change allows us to handle reading and writing from small
>>>>> regions; we cannot deal with execution from the small region.
>>>>>
>>>>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>>>>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>>>> Message-id: 20180620130619.11362-2-peter.maydell@linaro.org
>>>>> ---
>>>>>  accel/tcg/softmmu_template.h |  24 ++++---
>>>>>  include/exec/cpu-all.h       |   5 +-
>>>>>  accel/tcg/cputlb.c           | 131 +++++++++++++++++++++++++++++------
>>>>>  3 files changed, 130 insertions(+), 30 deletions(-)
>>>>
>>>> This patch breaks Quadra 800 emulation, any idea why?
>>>>
>>>> ABCFGHIJK
>>>> qemu: fatal: Unable to handle guest executing from RAM within a small
>>>> MPU region at 0x0014cb5a
>>>
>>> Hmm, that shouldn't happen unless your target code was
>>> incorrectly returning a too-small page size. (I say
>>> "incorrectly" because before this patchseries that was
>>> unsupported and would have had weird effects depending on
>>> exactly what the order of guest accesses to the page was.)
>>>
>>> You could look at whether the m68k code is calling tlb_set_page()
>>> with a wrong page_size value and why that happens. You can
>>> get back the old behaviour by having your code do
>>>    if (page_size < TARGET_PAGE_SIZE) {
>>>        page_size = TARGET_PAGE_SIZE;
>>>    }
>>>
>>> but that is definitely a bit of a hack.
>>
>> Thank you for taking a look at this.
>>
>> I've added traces and tlb_set_page() is always called with page_size ==
>> TARGET_PAGE_SIZE.
>>
>> The m68k Linux kernel always uses 4 kB pages, which is the value of
>> TARGET_PAGE_SIZE. The 68040 MMU can also use 8 kB pages, but in our
>> case it doesn't (and of course 8 kB > TARGET_PAGE_SIZE).
> 
>> I'm not aware of subpages in the m68k MMU, but we do have TLB entries
>> that are separate for code and data: does that change anything in your
>> code? Could accessing an address first as a data access and then as an
>> instruction access look like a TLB_RECHECK?
> 
> If you never pass a page_size < TARGET_PAGE_SIZE to
> tlb_set_page() then we should never mark anything as TLB_RECHECK:
> the theory was no behaviour change for the currently-being-used case.
> 
> Do you have a repro case (images, command line) that I can
> use to investigate?

- checkout the branch q800-dev-part1 from
  git://github.com/vivier/qemu-m68k.git

- configure and build

 './configure' '--target-list=m68k-softmmu' '--enable-debug' \
               '--enable-debug-tcg' '--enable-debug-info'

  my gcc is from Fedora 27, version 7.3.1 20180303 (Red Hat 7.3.1-5)

- get the kernel from the debian installer:

wget
https://cdimage.debian.org/mirror/cdimage/ports/9.0/m68k/iso-cd/debian-9.0-m68k-NETINST-1.iso

guestfish --add debian-9.0-m68k-NETINST-1.iso --ro \
          --mount /dev/sda:/ <<_EOF_
copy-out /install/kernels/vmlinux-4.15.0-2-m68k .
_EOF_

- and run

./m68k-softmmu/qemu-system-m68k -M q800 \
    -serial none -serial mon:stdio \
    -kernel vmlinux-4.15.0-2-m68k \
    -nographic

ABCFGHIJK
qemu: fatal: Unable to handle guest executing from RAM within a small
MPU region at 0x0029bb2c
D0 = 003ca111   A0 = 003ca111   F0 = 7fff ffffffffffffffff  (         nan)
D1 = 00000000   A1 = 0000000a   F1 = 7fff ffffffffffffffff  (         nan)
D2 = 00000000   A2 = 00395314   F2 = 7fff ffffffffffffffff  (         nan)
D3 = 00000001   A3 = 003ca110   F3 = 7fff ffffffffffffffff  (         nan)
D4 = 000003e0   A4 = 003ca4e8   F4 = 7fff ffffffffffffffff  (         nan)
D5 = 00393fc8   A5 = 0033d77b   F5 = 7fff ffffffffffffffff  (         nan)
D6 = 003ca108   A6 = 00393fc4   F6 = 7fff ffffffffffffffff  (         nan)
D7 = 00000002   A7 = 00393ef8   F7 = 7fff ffffffffffffffff  (         nan)
PC = 0029bb2c   SR = 2700 T:0 I:7 SI -----
FPSR = 00000000 ---- -------- -----  FPCR = 0000 X RN --------
  A7(MSP) = 00000000   A7(USP) = 00000000 ->A7(ISP) = 00393f68
VBR = 0x003bfce8
SFC = 5 DFC 5
SSW 00000000 TCR 00008000 URP 00000000 SRP 00001000
DTTR0/1: 00000000/f807a040 ITTR0/1: 00000000/f807a040
MMUSR 00000000, fault at 00000000
Aborted (core dumped)

Thanks,
Laurent
Peter Maydell June 29, 2018, 12:14 p.m. UTC | #6
On 28 June 2018 at 23:26, Laurent Vivier <laurent@vivier.eu> wrote:
> ./m68k-softmmu/qemu-system-m68k -M q800 \
>     -serial none -serial mon:stdio \
>     -kernel vmlinux-4.15.0-2-m68k \
>     -nographic

Thanks for the test case. I'm still investigating, but there
are a couple of things happening here.

First, there's a bug in get_page_addr_code()'s "is this a
TLB miss?" condition which was introduced in commit 71b9a45330fe22:

    if (unlikely(env->tlb_table[mmu_idx][index].addr_code !=
                 (addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK)))) {

takes a (not necessarily page aligned) address, and masks out
everything but the page-aligned top half (good) and the
TLB_INVALID bit (not good, because that could be either 0 or 1
depending on the address). This means sometimes we'll incorrectly
decide we got a miss in the TLB and do an unnecessary refill.
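
To make that concrete, here is a worked example (my numbers; this
assumes TARGET_PAGE_BITS is 12, so TARGET_PAGE_MASK is 0xfffff000 and
TLB_INVALID_MASK is 1 << 11, i.e. 0x800):

    /* A valid TLB entry stores the page-aligned code address:
     *   addr_code == 0x0029b000
     *
     * Fetch at addr == 0x0029b2c0 (bit 11 clear):
     *   addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK) == 0x0029b000
     *   -> matches addr_code, correctly treated as a hit
     *
     * Fetch at addr == 0x0029bb2c (bit 11 set, as in the abort
     * address from the backtrace above):
     *   addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK) == 0x0029b800
     *   -> can never match addr_code, so we take a spurious miss and
     *      refill even though the entry was valid
     */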

The second thing that's going on here is that the m68k target
code writes TLB entries for the same address with different
prot bits without doing a flush in between:

tlb_set_page_with_attrs: vaddr=0029b000 paddr=0x000000000029b000 prot=3 idx=0
tlb_set_page_with_attrs: vaddr=0029b000 paddr=0x000000000029b000 prot=7 idx=0

The tlb_set_page_with_attrs() code isn't expecting this, so
we end up with two TLB entries for the same address, one in
the main TLB and one in the victim cache TLB. The bug above
means that we get this sequence of events:
 * fill main TLB entry with prot=3 entry
 * later, fill main TLB with prot=7 entry, and evict prot=3
   entry to victim cache
 * hit on the prot=7 entry in the main TLB
 * refill condition incorrectly fails, but we hit in the victim cache
 * so we pull the prot=3 entry from victim to main TLB
 * prot=3 means "addr_code == -1", so the check of the TLB_RECHECK
   bit succeeds
 * in the TLB_RECHECK code we do a tlb_fill()
 * that fills in the main TLB with a prot=7 entry again, bouncing
   the prot=3 entry back out to the victim cache
 * prot=7 means the addr_code is correct, so we find ourselves in
   the "TLB_RECHECK but this is RAM" abort code path

I'm not sure whether it's supposed to be the responsibility
of the target code or the common accel/tcg code to ensure
that we don't have multiple TLB entries for the same address.

thanks
-- PMM
Alex Bennée June 29, 2018, 2:07 p.m. UTC | #7
Peter Maydell <peter.maydell@linaro.org> writes:

> On 28 June 2018 at 23:26, Laurent Vivier <laurent@vivier.eu> wrote:
>> ./m68k-softmmu/qemu-system-m68k -M q800 \
>>     -serial none -serial mon:stdio \
>>     -kernel vmlinux-4.15.0-2-m68k \
>>     -nographic
>
> Thanks for the test case. I'm still investigating, but there
> are a couple of things happening here.
>
> First, there's a bug in get_page_addr_code()'s "is this a
> TLB miss?" condition which was introduced in commit 71b9a45330fe22:
>
>     if (unlikely(env->tlb_table[mmu_idx][index].addr_code !=
>                  (addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK)))) {
>
> takes a (not necessarily page aligned) address, and masks out
> everything but the page-aligned top half (good) and the
> TLB_INVALID bit (not good, because that could be either 0 or 1
> depending on the address). This means sometimes we'll incorrectly
> decide we got a miss in the TLB and do an unnecessary refill.
>
> The second thing that's going on here is that the m68k target
> code writes TLB entries for the same address with different
> prot bits without doing a flush in between:
>
> tlb_set_page_with_attrs: vaddr=0029b000 paddr=0x000000000029b000 prot=3 idx=0
> tlb_set_page_with_attrs: vaddr=0029b000 paddr=0x000000000029b000 prot=7 idx=0
>
> The tlb_set_page_with_attrs() code isn't expecting this, so
> we end up with two TLB entries for the same address, one in
> the main TLB and one in the victim cache TLB. The bug above
> means that we get this sequence of events:
>  * fill main TLB entry with prot=3 entry
>  * later, fill main TLB with prot=7 entry, and evict prot=3
>    entry to victim cache
>  * hit on the prot=7 entry in the main TLB
>  * refill condition incorrectly fails, but we hit in the victim cache
>  * so we pull the prot=3 entry from victim to main TLB
>  * prot=3 means "addr_code == -1", so the check of the TLB_RECHECK
>    bit succeeds
>  * in the TLB_RECHECK code we do a tlb_fill()
>  * that fills in the main TLB with a prot=7 entry again, bouncing
>    the prot=3 entry back out to the victim cache
>  * prot=7 means the addr_code is correct, so we find ourselves in
>    the "TLB_RECHECK but this is RAM" abort code path
>
> I'm not sure whether it's supposed to be the responsibility
> of the target code or the common accel/tcg code to ensure
> that we don't have multiple TLB entries for the same address.

My gut feeling is we should fail safely in the case of the guest writing
two mostly identical page entries in a row. We can check for aliasing
when we update and either evict to the victim cache or reset the vtlb
entry.
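
Something like this, say, placed just before the existing eviction of
te into the victim cache in tlb_set_page_with_attrs() (a sketch only:
the tlb_entry_matches() helper is made up, and the comparison mirrors
the miss check quoted above):

    static inline bool tlb_entry_matches(const CPUTLBEntry *te,
                                         target_ulong vaddr_page)
    {
        target_ulong mask = TARGET_PAGE_MASK | TLB_INVALID_MASK;

        return vaddr_page == (te->addr_read & mask) ||
               vaddr_page == (te->addr_write & mask) ||
               vaddr_page == (te->addr_code & mask);
    }

    ...

    if (tlb_entry_matches(te, vaddr_page)) {
        /* The old entry maps the page we are refilling: overwrite it
         * below instead of keeping a stale alias of the same page
         * alive in the victim TLB.
         */
    } else {
        /* Different page: evict the old entry into the victim cache */
        copy_tlb_helper(tv, te, true);
        env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
    }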

>
> thanks
> -- PMM


--
Alex Bennée
Peter Maydell June 29, 2018, 3:28 p.m. UTC | #8
On 28 June 2018 at 23:26, Laurent Vivier <laurent@vivier.eu> wrote:
> On 28/06/2018 at 22:05, Peter Maydell wrote:
>> Do you have a repro case (images, command line) that I can
>> use to investigate?
> - checkout the branch q800-dev-part1 from
>   git://github.com/vivier/qemu-m68k.git
>
> - configure and build
>
>  './configure' '--target-list=m68k-softmmu' '--enable-debug' \
>                '--enable-debug-tcg' '--enable-debug-info'
>
>   my gcc is from Fedora 27, version 7.3.1 20180303 (Red Hat 7.3.1-5)
>
> - get the kernel from the debian installer:
>
> wget
> https://cdimage.debian.org/mirror/cdimage/ports/9.0/m68k/iso-cd/debian-9.0-m68k-NETINST-1.iso
>
> guestfish --add debian-9.0-m68k-NETINST-1.iso --ro \
>           --mount /dev/sda:/ <<_EOF_
> copy-out /install/kernels/vmlinux-4.15.0-2-m68k .
> _EOF_
>
> - and run
>
> ./m68k-softmmu/qemu-system-m68k -M q800 \
>     -serial none -serial mon:stdio \
>     -kernel vmlinux-4.15.0-2-m68k \
>     -nographic

What is this testcase supposed to print when it works?
I tried reverting 55df6fcf5476b44bc1b9, but that just prints
"ABCFGHIJK" and then nothing else.

thanks
-- PMM
Laurent Vivier June 29, 2018, 3:52 p.m. UTC | #9
On 29/06/2018 at 17:28, Peter Maydell wrote:
> On 28 June 2018 at 23:26, Laurent Vivier <laurent@vivier.eu> wrote:
>> On 28/06/2018 at 22:05, Peter Maydell wrote:
>>> Do you have a repro case (images, command line) that I can
>>> use to investigate?
>> - checkout the branch q800-dev-part1 from
>>   git://github.com/vivier/qemu-m68k.git
>>
>> - configure and build
>>
>>  './configure' '--target-list=m68k-softmmu' '--enable-debug' \
>>                '--enable-debug-tcg' '--enable-debug-info'
>>
>>   my gcc is from Fedora 27, version 7.3.1 20180303 (Red Hat 7.3.1-5)
>>
>> - get the kernel from the debian installer:
>>
>> wget
>> https://cdimage.debian.org/mirror/cdimage/ports/9.0/m68k/iso-cd/debian-9.0-m68k-NETINST-1.iso
>>
>> guestfish --add debian-9.0-m68k-NETINST-1.iso --ro \
>>           --mount /dev/sda:/ <<_EOF_
>> copy-out /install/kernels/vmlinux-4.15.0-2-m68k .
>> _EOF_
>>
>> - and run
>>
>> ./m68k-softmmu/qemu-system-m68k -M q800 \
>>     -serial none -serial mon:stdio \
>>     -kernel vmlinux-4.15.0-2-m68k \
>>     -nographic
> 
> What is this testcase supposed to print when it works?
> I tried reverting 55df6fcf5476b44bc1b9, but that just prints
> "ABCFGHIJK" and then nothing else.

At this point, you can either remove -nographic or add -append
"console=ttyS0 vga=off" to get the kernel boot logs.

If you want it to start a userspace command, add the initrd from
the CD:

guestfish --add debian-9.0-m68k-NETINST-1.iso --ro \
          --mount /dev/sda:/ <<_EOF_
copy-out /install/cdrom/initrd.gz .
_EOF_

Thanks,
Laurent

Patch

diff --git a/accel/tcg/softmmu_template.h b/accel/tcg/softmmu_template.h
index 239ea6692b4..c47591c9709 100644
--- a/accel/tcg/softmmu_template.h
+++ b/accel/tcg/softmmu_template.h
@@ -98,10 +98,12 @@ 
 static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
                                               size_t mmu_idx, size_t index,
                                               target_ulong addr,
-                                              uintptr_t retaddr)
+                                              uintptr_t retaddr,
+                                              bool recheck)
 {
     CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
-    return io_readx(env, iotlbentry, mmu_idx, addr, retaddr, DATA_SIZE);
+    return io_readx(env, iotlbentry, mmu_idx, addr, retaddr, recheck,
+                    DATA_SIZE);
 }
 #endif
 
@@ -138,7 +140,8 @@  WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr,
 
         /* ??? Note that the io helpers always read data in the target
            byte ordering.  We should push the LE/BE request down into io.  */
-        res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr);
+        res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr,
+                                    tlb_addr & TLB_RECHECK);
         res = TGT_LE(res);
         return res;
     }
@@ -205,7 +208,8 @@  WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr,
 
         /* ??? Note that the io helpers always read data in the target
            byte ordering.  We should push the LE/BE request down into io.  */
-        res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr);
+        res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr,
+                                    tlb_addr & TLB_RECHECK);
         res = TGT_BE(res);
         return res;
     }
@@ -259,10 +263,12 @@  static inline void glue(io_write, SUFFIX)(CPUArchState *env,
                                           size_t mmu_idx, size_t index,
                                           DATA_TYPE val,
                                           target_ulong addr,
-                                          uintptr_t retaddr)
+                                          uintptr_t retaddr,
+                                          bool recheck)
 {
     CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
-    return io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr, DATA_SIZE);
+    return io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
+                     recheck, DATA_SIZE);
 }
 
 void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
@@ -298,7 +304,8 @@  void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         /* ??? Note that the io helpers always read data in the target
            byte ordering.  We should push the LE/BE request down into io.  */
         val = TGT_LE(val);
-        glue(io_write, SUFFIX)(env, mmu_idx, index, val, addr, retaddr);
+        glue(io_write, SUFFIX)(env, mmu_idx, index, val, addr,
+                               retaddr, tlb_addr & TLB_RECHECK);
         return;
     }
 
@@ -375,7 +382,8 @@  void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
         /* ??? Note that the io helpers always read data in the target
            byte ordering.  We should push the LE/BE request down into io.  */
         val = TGT_BE(val);
-        glue(io_write, SUFFIX)(env, mmu_idx, index, val, addr, retaddr);
+        glue(io_write, SUFFIX)(env, mmu_idx, index, val, addr, retaddr,
+                               tlb_addr & TLB_RECHECK);
         return;
     }
 
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 7fa726b8e36..7338f57062f 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -330,11 +330,14 @@  CPUArchState *cpu_copy(CPUArchState *env);
 #define TLB_NOTDIRTY        (1 << (TARGET_PAGE_BITS - 2))
 /* Set if TLB entry is an IO callback.  */
 #define TLB_MMIO            (1 << (TARGET_PAGE_BITS - 3))
+/* Set if TLB entry must have MMU lookup repeated for every access */
+#define TLB_RECHECK         (1 << (TARGET_PAGE_BITS - 4))
 
 /* Use this mask to check interception with an alignment mask
  * in a TCG backend.
  */
-#define TLB_FLAGS_MASK  (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO)
+#define TLB_FLAGS_MASK  (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO \
+                         | TLB_RECHECK)
 
 void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf);
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 719cca2268b..eebe97dabb7 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -613,27 +613,42 @@  void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     target_ulong code_address;
     uintptr_t addend;
     CPUTLBEntry *te, *tv, tn;
-    hwaddr iotlb, xlat, sz;
+    hwaddr iotlb, xlat, sz, paddr_page;
+    target_ulong vaddr_page;
     unsigned vidx = env->vtlb_index++ % CPU_VTLB_SIZE;
     int asidx = cpu_asidx_from_attrs(cpu, attrs);
 
     assert_cpu_is_self(cpu);
-    assert(size >= TARGET_PAGE_SIZE);
-    if (size != TARGET_PAGE_SIZE) {
-        tlb_add_large_page(env, vaddr, size);
-    }
 
-    sz = size;
-    section = address_space_translate_for_iotlb(cpu, asidx, paddr, &xlat, &sz,
-                                                attrs, &prot);
+    if (size < TARGET_PAGE_SIZE) {
+        sz = TARGET_PAGE_SIZE;
+    } else {
+        if (size > TARGET_PAGE_SIZE) {
+            tlb_add_large_page(env, vaddr, size);
+        }
+        sz = size;
+    }
+    vaddr_page = vaddr & TARGET_PAGE_MASK;
+    paddr_page = paddr & TARGET_PAGE_MASK;
+
+    section = address_space_translate_for_iotlb(cpu, asidx, paddr_page,
+                                                &xlat, &sz, attrs, &prot);
     assert(sz >= TARGET_PAGE_SIZE);
 
     tlb_debug("vaddr=" TARGET_FMT_lx " paddr=0x" TARGET_FMT_plx
               " prot=%x idx=%d\n",
               vaddr, paddr, prot, mmu_idx);
 
-    address = vaddr;
-    if (!memory_region_is_ram(section->mr) && !memory_region_is_romd(section->mr)) {
+    address = vaddr_page;
+    if (size < TARGET_PAGE_SIZE) {
+        /*
+         * Slow-path the TLB entries; we will repeat the MMU check and TLB
+         * fill on every access.
+         */
+        address |= TLB_RECHECK;
+    }
+    if (!memory_region_is_ram(section->mr) &&
+        !memory_region_is_romd(section->mr)) {
         /* IO memory case */
         address |= TLB_MMIO;
         addend = 0;
@@ -643,10 +658,10 @@  void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     }
 
     code_address = address;
-    iotlb = memory_region_section_get_iotlb(cpu, section, vaddr, paddr, xlat,
-                                            prot, &address);
+    iotlb = memory_region_section_get_iotlb(cpu, section, vaddr_page,
+                                            paddr_page, xlat, prot, &address);
 
-    index = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    index = (vaddr_page >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     te = &env->tlb_table[mmu_idx][index];
     /* do not discard the translation in te, evict it into a victim tlb */
     tv = &env->tlb_v_table[mmu_idx][vidx];
@@ -662,18 +677,18 @@  void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
      * TARGET_PAGE_BITS, and either
      *  + the ram_addr_t of the page base of the target RAM (if NOTDIRTY or ROM)
      *  + the offset within section->mr of the page base (otherwise)
-     * We subtract the vaddr (which is page aligned and thus won't
+     * We subtract the vaddr_page (which is page aligned and thus won't
      * disturb the low bits) to give an offset which can be added to the
      * (non-page-aligned) vaddr of the eventual memory access to get
      * the MemoryRegion offset for the access. Note that the vaddr we
      * subtract here is that of the page base, and not the same as the
      * vaddr we add back in io_readx()/io_writex()/get_page_addr_code().
      */
-    env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
+    env->iotlb[mmu_idx][index].addr = iotlb - vaddr_page;
     env->iotlb[mmu_idx][index].attrs = attrs;
 
     /* Now calculate the new entry */
-    tn.addend = addend - vaddr;
+    tn.addend = addend - vaddr_page;
     if (prot & PAGE_READ) {
         tn.addr_read = address;
     } else {
@@ -694,7 +709,7 @@  void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
             tn.addr_write = address | TLB_MMIO;
         } else if (memory_region_is_ram(section->mr)
                    && cpu_physical_memory_is_clean(
-                        memory_region_get_ram_addr(section->mr) + xlat)) {
+                       memory_region_get_ram_addr(section->mr) + xlat)) {
             tn.addr_write = address | TLB_NOTDIRTY;
         } else {
             tn.addr_write = address;
@@ -767,7 +782,8 @@  static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
 
 static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
                          int mmu_idx,
-                         target_ulong addr, uintptr_t retaddr, int size)
+                         target_ulong addr, uintptr_t retaddr,
+                         bool recheck, int size)
 {
     CPUState *cpu = ENV_GET_CPU(env);
     hwaddr mr_offset;
@@ -777,6 +793,29 @@  static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     bool locked = false;
     MemTxResult r;
 
+    if (recheck) {
+        /*
+         * This is a TLB_RECHECK access, where the MMU protection
+         * covers a smaller range than a target page, and we must
+         * repeat the MMU check here. This tlb_fill() call might
+         * longjump out if this access should cause a guest exception.
+         */
+        int index;
+        target_ulong tlb_addr;
+
+        tlb_fill(cpu, addr, size, MMU_DATA_LOAD, mmu_idx, retaddr);
+
+        index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+        tlb_addr = env->tlb_table[mmu_idx][index].addr_read;
+        if (!(tlb_addr & ~(TARGET_PAGE_MASK | TLB_RECHECK))) {
+            /* RAM access */
+            uintptr_t haddr = addr + env->tlb_table[mmu_idx][index].addend;
+
+            return ldn_p((void *)haddr, size);
+        }
+        /* Fall through for handling IO accesses */
+    }
+
     section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
     mr = section->mr;
     mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
@@ -811,7 +850,7 @@  static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
 static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
                       int mmu_idx,
                       uint64_t val, target_ulong addr,
-                      uintptr_t retaddr, int size)
+                      uintptr_t retaddr, bool recheck, int size)
 {
     CPUState *cpu = ENV_GET_CPU(env);
     hwaddr mr_offset;
@@ -820,6 +859,30 @@  static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     bool locked = false;
     MemTxResult r;
 
+    if (recheck) {
+        /*
+         * This is a TLB_RECHECK access, where the MMU protection
+         * covers a smaller range than a target page, and we must
+         * repeat the MMU check here. This tlb_fill() call might
+         * longjump out if this access should cause a guest exception.
+         */
+        int index;
+        target_ulong tlb_addr;
+
+        tlb_fill(cpu, addr, size, MMU_DATA_STORE, mmu_idx, retaddr);
+
+        index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+        tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+        if (!(tlb_addr & ~(TARGET_PAGE_MASK | TLB_RECHECK))) {
+            /* RAM access */
+            uintptr_t haddr = addr + env->tlb_table[mmu_idx][index].addend;
+
+            stn_p((void *)haddr, size, val);
+            return;
+        }
+        /* Fall through for handling IO accesses */
+    }
+
     section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
     mr = section->mr;
     mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
@@ -903,6 +966,32 @@  tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
             tlb_fill(ENV_GET_CPU(env), addr, 0, MMU_INST_FETCH, mmu_idx, 0);
         }
     }
+
+    if (unlikely(env->tlb_table[mmu_idx][index].addr_code & TLB_RECHECK)) {
+        /*
+         * This is a TLB_RECHECK access, where the MMU protection
+         * covers a smaller range than a target page, and we must
+         * repeat the MMU check here. This tlb_fill() call might
+         * longjump out if this access should cause a guest exception.
+         */
+        int index;
+        target_ulong tlb_addr;
+
+        tlb_fill(cpu, addr, 0, MMU_INST_FETCH, mmu_idx, 0);
+
+        index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+        tlb_addr = env->tlb_table[mmu_idx][index].addr_code;
+        if (!(tlb_addr & ~(TARGET_PAGE_MASK | TLB_RECHECK))) {
+            /* RAM access. We can't handle this, so for now just stop */
+            cpu_abort(cpu, "Unable to handle guest executing from RAM within "
+                      "a small MPU region at 0x" TARGET_FMT_lx, addr);
+        }
+        /*
+         * Fall through to handle IO accesses (which will almost certainly
+         * also result in failure)
+         */
+    }
+
     iotlbentry = &env->iotlb[mmu_idx][index];
     section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
     mr = section->mr;
@@ -1011,8 +1100,8 @@  static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
         tlb_addr = tlbe->addr_write & ~TLB_INVALID_MASK;
     }
 
-    /* Notice an IO access  */
-    if (unlikely(tlb_addr & TLB_MMIO)) {
+    /* Notice an IO access or a needs-MMU-lookup access */
+    if (unlikely(tlb_addr & (TLB_MMIO | TLB_RECHECK))) {
         /* There's really nothing that can be done to
            support this apart from stop-the-world.  */
         goto stop_the_world;