diff mbox series

[1/4] util/cacheflush: fix illegal instruction on windows-arm64

Message ID 20230213161352.17199-2-pierrick.bouvier@linaro.org
State New
Headers show
Series Adds support for running QEMU natively on windows-arm64 | expand

Commit Message

Pierrick Bouvier Feb. 13, 2023, 4:13 p.m. UTC
mrs instruction fails as an illegal instruction.
For now, no cache information is retrieved for this platform.
It could be specialized later, using Windows API.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 util/cacheflush.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Peter Maydell Feb. 14, 2023, 4:44 p.m. UTC | #1
On Mon, 13 Feb 2023 at 20:50, Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> mrs instruction fails as an illegal instruction.
> For now, no cache information is retrieved for this platform.
> It could be specialized later, using Windows API.

Unless I'm misreading the code, there's a sys_cache_info()
implementation that's only guarded by if defined(_WIN32), so
presumably we're using that on AArch64 also. Does it return
sensible values ?

> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>  util/cacheflush.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/util/cacheflush.c b/util/cacheflush.c
> index 2c2c73e085..149f103d32 100644
> --- a/util/cacheflush.c
> +++ b/util/cacheflush.c
> @@ -121,8 +121,9 @@ static void sys_cache_info(int *isize, int *dsize)
>  static bool have_coherent_icache;
>  #endif
>
> -#if defined(__aarch64__) && !defined(CONFIG_DARWIN)
> +#if defined(__aarch64__) && !defined(CONFIG_DARWIN) && !defined(CONFIG_WIN32)
>  /* Apple does not expose CTR_EL0, so we must use system interfaces. */
> +/* Does not work on windows-arm64, illegal instruction using mrs */

This comment should be better integrated with the previous, because
the reason for the illegal instruction exception on Windows is the
same as for macos -- the OS doesn't expose CTR_EL0 to userspace.

>  static uint64_t save_ctr_el0;
>  static void arch_cache_info(int *isize, int *dsize)
>  {
> @@ -225,7 +226,7 @@ static void __attribute__((constructor)) init_cache_info(void)
>
>  /* Caches are coherent and do not require flushing; symbol inline. */
>
> -#elif defined(__aarch64__)
> +#elif defined(__aarch64__) && !defined(CONFIG_WIN32)

This will cause us to not use the generic aarch64 flush_idcache_range(),
which uses DC CVAU and IC IVAU. Does that not work on Windows?

If it doesn't then I think the ifdeffery would be more clearly
structured as

#elif defined(__aarch64__)

ifdef CONFIG_DARWIN
[macos implementation of flush_idcache_range]
#elif defined(CONFIG_WIN32)
/* Explanation here of why the generic version doesn't work */
#else
/* generic version */
#endif

#elif defined(__mips__)

etc

thanks
-- PMM
Richard Henderson Feb. 14, 2023, 5:02 p.m. UTC | #2
On 2/14/23 06:44, Peter Maydell wrote:
> This will cause us to not use the generic aarch64 flush_idcache_range(),
> which uses DC CVAU and IC IVAU. Does that not work on Windows?
> 
> If it doesn't then I think the ifdeffery would be more clearly
> structured as
> 
> #elif defined(__aarch64__)
> 
> ifdef CONFIG_DARWIN
> [macos implementation of flush_idcache_range]
> #elif defined(CONFIG_WIN32)
> /* Explanation here of why the generic version doesn't work */
> #else
> /* generic version */
> #endif
> 
> #elif defined(__mips__)

More specifically, there *must* be a replacement, or TCG will not work.
It appears as if FlushInstructionCache is the right syscall on windows.

I cannot find documentation for a data cache flush.  But until there is also a way to 
allocate split w^x memory regions (tcg/region.c, alloc_code_gen_buffer_splitwx), you need 
not worry about that.  You could reasonably assert(rx == rw) there.


r~
Pierrick Bouvier Feb. 15, 2023, 12:49 p.m. UTC | #3
On 2/14/23 17:44, Peter Maydell wrote:
> On Mon, 13 Feb 2023 at 20:50, Pierrick Bouvier
> <pierrick.bouvier@linaro.org> wrote:
>>
>> mrs instruction fails as an illegal instruction.
>> For now, no cache information is retrieved for this platform.
>> It could be specialized later, using Windows API.
> 
> Unless I'm misreading the code, there's a sys_cache_info()
> implementation that's only guarded by if defined(_WIN32), so
> presumably we're using that on AArch64 also. Does it return
> sensible values ?
> 

Yes, it works on arm64, and I checked it was returned by the expected 
"windows version" function sys_cache_info.
On my machine, having a snapdragon 8cx gen3 processor, it returns (64, 64).

It's on par with what I can see from a WSL1 environment (Linux, without 
VM) on the same machine:
$ getconf LEVEL1_DCACHE_LINESIZE
64
$ getconf LEVEL1_ICACHE_LINESIZE
64

>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>>   util/cacheflush.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/util/cacheflush.c b/util/cacheflush.c
>> index 2c2c73e085..149f103d32 100644
>> --- a/util/cacheflush.c
>> +++ b/util/cacheflush.c
>> @@ -121,8 +121,9 @@ static void sys_cache_info(int *isize, int *dsize)
>>   static bool have_coherent_icache;
>>   #endif
>>
>> -#if defined(__aarch64__) && !defined(CONFIG_DARWIN)
>> +#if defined(__aarch64__) && !defined(CONFIG_DARWIN) && !defined(CONFIG_WIN32)
>>   /* Apple does not expose CTR_EL0, so we must use system interfaces. */
>> +/* Does not work on windows-arm64, illegal instruction using mrs */
> 
> This comment should be better integrated with the previous, because
> the reason for the illegal instruction exception on Windows is the
> same as for macos -- the OS doesn't expose CTR_EL0 to userspace.
> 

I'll update the comment 👍

>>   static uint64_t save_ctr_el0;
>>   static void arch_cache_info(int *isize, int *dsize)
>>   {
>> @@ -225,7 +226,7 @@ static void __attribute__((constructor)) init_cache_info(void)
>>
>>   /* Caches are coherent and do not require flushing; symbol inline. */
>>
>> -#elif defined(__aarch64__)
>> +#elif defined(__aarch64__) && !defined(CONFIG_WIN32)
> 
> This will cause us to not use the generic aarch64 flush_idcache_range(),
> which uses DC CVAU and IC IVAU. Does that not work on Windows?
> 
> If it doesn't then I think the ifdeffery would be more clearly
> structured as
> 
> #elif defined(__aarch64__)
> 
> ifdef CONFIG_DARWIN
> [macos implementation of flush_idcache_range]
> #elif defined(CONFIG_WIN32)
> /* Explanation here of why the generic version doesn't work */
> #else
> /* generic version */
> #endif
> 
> #elif defined(__mips__)
> 
> etc
>

For now, the generic flush_idcache_range, using __builtin___clear_cache 
is used. Richard mentioned 'there *must* be a replacement, or TCG will 
not work'.

As said in the cover letter, I could successfully install and boot an 
arm64 and x64 vm.

I'm not an expert on this area, but I imagine that booting a full VM 
will force TCG to emit code at the same address twice (after having 
generated enough translated blocks), which shows that generic 
flush_idcache_range works. Is that reasoning correct?

Do you think we still need a specialized version for windows-arm64?

> thanks
> -- PMM
Richard Henderson Feb. 15, 2023, 6:22 p.m. UTC | #4
On 2/15/23 02:49, Pierrick Bouvier wrote:
> I'm not an expert on this area, but I imagine that booting a full VM will force TCG to 
> emit code at the same address twice (after having generated enough translated blocks), 
> which shows that generic flush_idcache_range works. Is that reasoning correct?
> 
> Do you think we still need a specialized version for windows-arm64?

No, I don't think so.  It would be ideal if we were able to read CTR_EL0, because we make 
use of the IDC field, but 0 is a safe default.


r~
Pierrick Bouvier Feb. 16, 2023, 12:53 p.m. UTC | #5
After some investigation on this, I found that even faking ctr_el0 
content does not work. Indeed, "dc cvau" and "ic ivau" both are 
privileged too (and fail with illegal instruction).

I started looking how other projects are handling this. In the case of 
firefox js engine, they simply perform a FlushInstructionCache [1], and 
nothing for data cache. From what I found on various websites, there is 
no way to perform any data cache flush from userspace under Windows.

Out of curiosity, I looked in llvm how __builtin___clear_cache was 
implemented, and for windows-arm64, it's simply a call to the same 
function, FlushInstructionCache [2].

This explains why the generic implementation of flush_idcache_range does 
the correct thing for this platform, and why tests I ran worked. Thus, 
it's probably the best approach we can have for now.

[1] 
https://hg.mozilla.org/mozilla-central/file/tip/js/src/jit/arm64/vixl/MozCpu-vixl.cpp#l110
[2] 
https://github.com/llvm/llvm-project/blob/574e417460cdfebb8157fbe61b6a015e44856f64/compiler-rt/lib/builtins/clear_cache.c#L66

On 2/15/23 19:22, Richard Henderson wrote:
> On 2/15/23 02:49, Pierrick Bouvier wrote:
>> I'm not an expert on this area, but I imagine that booting a full VM will force TCG to
>> emit code at the same address twice (after having generated enough translated blocks),
>> which shows that generic flush_idcache_range works. Is that reasoning correct?
>>
>> Do you think we still need a specialized version for windows-arm64?
> 
> No, I don't think so.  It would be ideal if we were able to read CTR_EL0, because we make
> use of the IDC field, but 0 is a safe default.
> 
> 
> r~
diff mbox series

Patch

diff --git a/util/cacheflush.c b/util/cacheflush.c
index 2c2c73e085..149f103d32 100644
--- a/util/cacheflush.c
+++ b/util/cacheflush.c
@@ -121,8 +121,9 @@  static void sys_cache_info(int *isize, int *dsize)
 static bool have_coherent_icache;
 #endif
 
-#if defined(__aarch64__) && !defined(CONFIG_DARWIN)
+#if defined(__aarch64__) && !defined(CONFIG_DARWIN) && !defined(CONFIG_WIN32)
 /* Apple does not expose CTR_EL0, so we must use system interfaces. */
+/* Does not work on windows-arm64, illegal instruction using mrs */
 static uint64_t save_ctr_el0;
 static void arch_cache_info(int *isize, int *dsize)
 {
@@ -225,7 +226,7 @@  static void __attribute__((constructor)) init_cache_info(void)
 
 /* Caches are coherent and do not require flushing; symbol inline. */
 
-#elif defined(__aarch64__)
+#elif defined(__aarch64__) && !defined(CONFIG_WIN32)
 
 #ifdef CONFIG_DARWIN
 /* Apple does not expose CTR_EL0, so we must use system interfaces. */