
[v6,00/25] target/riscv: MSTATUS_SUM + cleanups

Message ID 20230325105429.1142530-1-richard.henderson@linaro.org

Message

Richard Henderson March 25, 2023, 10:54 a.m. UTC
This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.

  * Reclaim 5 TB_FLAGS bits, since we nearly ran out.

  * Using cpu_mmu_index(env, true) is insufficient to implement
    HLVX properly.  While that chooses the correct mmu_idx, it
    does not perform the read with execute permission.
    I add a new tcg interface to perform a read-for-execute with
    an arbitrary mmu_idx.  This is still not 100% compliant, but
    it's closer.

  * Handle mstatus.MPV in cpu_mmu_index.
  * Use vsstatus.SUM when required for MMUIdx_S_SUM.
  * Cleanups for get_physical_address.

While this passes check-avocado, I'm sure that's insufficient.
Please have a close look.


r~


Fei Wu (2):
  target/riscv: Separate priv from mmu_idx
  target/riscv: Reduce overhead of MSTATUS_SUM change

LIU Zhiwei (4):
  target/riscv: Extract virt enabled state from tb flags
  target/riscv: Add a general status enum for extensions
  target/riscv: Encode the FS and VS on a normal way for tb flags
  target/riscv: Add a tb flags field for vstart

Richard Henderson (19):
  target/riscv: Remove mstatus_hs_{fs,vs} from tb_flags
  accel/tcg: Add cpu_ld*_code_mmu
  target/riscv: Use cpu_ld*_code_mmu for HLVX
  target/riscv: Handle HLV, HSV via helpers
  target/riscv: Rename MMU_HYP_ACCESS_BIT to MMU_2STAGE_BIT
  target/riscv: Introduce mmuidx_sum
  target/riscv: Introduce mmuidx_priv
  target/riscv: Introduce mmuidx_2stage
  target/riscv: Move hstatus.spvp check to check_access_hlsv
  target/riscv: Set MMU_2STAGE_BIT in riscv_cpu_mmu_index
  target/riscv: Check SUM in the correct register
  target/riscv: Hoist second stage mode change to callers
  target/riscv: Hoist pbmte and hade out of the level loop
  target/riscv: Move leaf pte processing out of level loop
  target/riscv: Suppress pte update with is_debug
  target/riscv: Don't modify SUM with is_debug
  target/riscv: Merge checks for reserved pte flags
  target/riscv: Reorg access check in get_physical_address
  target/riscv: Reorg sum check in get_physical_address

 include/exec/cpu_ldst.h                       |   9 +
 target/riscv/cpu.h                            |  47 ++-
 target/riscv/cpu_bits.h                       |  12 +-
 target/riscv/helper.h                         |  12 +-
 target/riscv/internals.h                      |  35 ++
 accel/tcg/cputlb.c                            |  48 +++
 accel/tcg/user-exec.c                         |  58 +++
 target/riscv/cpu.c                            |   2 +-
 target/riscv/cpu_helper.c                     | 393 +++++++++---------
 target/riscv/csr.c                            |  21 +-
 target/riscv/op_helper.c                      | 113 ++++-
 target/riscv/translate.c                      |  72 ++--
 .../riscv/insn_trans/trans_privileged.c.inc   |   2 +-
 target/riscv/insn_trans/trans_rvf.c.inc       |   2 +-
 target/riscv/insn_trans/trans_rvh.c.inc       | 135 +++---
 target/riscv/insn_trans/trans_rvv.c.inc       |  22 +-
 target/riscv/insn_trans/trans_xthead.c.inc    |   7 +-
 17 files changed, 595 insertions(+), 395 deletions(-)

Comments

Richard Henderson March 26, 2023, 5:17 a.m. UTC | #1
On 3/25/23 03:54, Richard Henderson wrote:
> This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.
> 
>    * Reclaim 5 TB_FLAGS bits, since we nearly ran out.
> 
>    * Using cpu_mmu_index(env, true) is insufficient to implement
>      HLVX properly.  While that chooses the correct mmu_idx, it
>      does not perform the read with execute permission.
>      I add a new tcg interface to perform a read-for-execute with
>      an arbitrary mmu_idx.  This is still not 100% compliant, but
>      it's closer.
> 
>    * Handle mstatus.MPV in cpu_mmu_index.
>    * Use vsstatus.SUM when required for MMUIdx_S_SUM.
>    * Cleanups for get_physical_address.
> 
> While this passes check-avocado, I'm sure that's insufficient.
> Please have a close look.

Somewhere after either patch 16 or 17, when env->virt is considered in riscv_cpu_mmu_index 
and a few other bugs are fixed, we can do

--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -591,11 +591,6 @@ void riscv_cpu_set_virt_enabled
          return;
      }

-    /* Flush the TLB on all virt mode changes. */
-    if (get_field(env->virt, VIRT_ONOFF) != enable) {
-        tlb_flush(env_cpu(env));
-    }
-
      env->virt = set_field(env->virt, VIRT_ONOFF, enable);

      if (enable) {
-- %< --

This works because we're no longer trying to overlap the VS and HS TLBs on the same mmu_idx.

I have been pondering additional changes that would avoid flushing on MXR changes, so
that user lookups from M-mode firmware get the same advantage that SUM has just received.
But this is complicated by needing 12 (!) more mmuidx, which cannot currently be
represented, nor does it even seem like the best idea.


r~
Weiwei Li March 26, 2023, 2:18 p.m. UTC | #2
On 2023/3/25 18:54, Richard Henderson wrote:
> This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.
>
>    * Reclaim 5 TB_FLAGS bits, since we nearly ran out.
>
>    * Using cpu_mmu_index(env, true) is insufficient to implement
>      HLVX properly.  While that chooses the correct mmu_idx, it
>      does not perform the read with execute permission.
>      I add a new tcg interface to perform a read-for-execute with
>      an arbitrary mmu_idx.  This is still not 100% compliant, but
>      it's closer.
>
>    * Handle mstatus.MPV in cpu_mmu_index.
>    * Use vsstatus.SUM when required for MMUIdx_S_SUM.
>    * Cleanups for get_physical_address.
>
> While this passes check-avocado, I'm sure that's insufficient.
> Please have a close look.

This patchset LGTM.  Patch 16 seems to fix a bug in the two_stage
parameter of raise_mmu_exception, which is called when translation
fails: the original implementation didn't take MPV into consideration.

Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn>

Weiwei Li

Daniel Henrique Barboza March 27, 2023, 4:43 p.m. UTC | #3
On 3/25/23 07:54, Richard Henderson wrote:
> This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.
> 
>    * Reclaim 5 TB_FLAGS bits, since we nearly ran out.
> 
>    * Using cpu_mmu_index(env, true) is insufficient to implement
>      HLVX properly.  While that chooses the correct mmu_idx, it
>      does not perform the read with execute permission.
>      I add a new tcg interface to perform a read-for-execute with
>      an arbitrary mmu_idx.  This is still not 100% compliant, but
>      it's closer.
> 
>    * Handle mstatus.MPV in cpu_mmu_index.
>    * Use vsstatus.SUM when required for MMUIdx_S_SUM.
>    * Cleanups for get_physical_address.
> 
> While this passes check-avocado, I'm sure that's insufficient.
> Please have a close look.

Tested fine on my end with some buildroot tests and 'stress-ng' on a 'virt'
machine running Ubuntu.

Tested-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>

Wu, Fei March 28, 2023, 1:22 a.m. UTC | #4
On 3/28/2023 12:43 AM, Daniel Henrique Barboza wrote:
> 
> 
> On 3/25/23 07:54, Richard Henderson wrote:
>> This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.
>>
>>    * Reclaim 5 TB_FLAGS bits, since we nearly ran out.
>>
>>    * Using cpu_mmu_index(env, true) is insufficient to implement
>>      HLVX properly.  While that chooses the correct mmu_idx, it
>>      does not perform the read with execute permission.
>>      I add a new tcg interface to perform a read-for-execute with
>>      an arbitrary mmu_idx.  This is still not 100% compliant, but
>>      it's closer.
>>
>>    * Handle mstatus.MPV in cpu_mmu_index.
>>    * Use vsstatus.SUM when required for MMUIdx_S_SUM.
>>    * Cleanups for get_physical_address.
>>
>> While this passes check-avocado, I'm sure that's insufficient.
>> Please have a close look.
> 
> Tested fine in my end with some buildroot tests and 'stress-ng' in a 'virt'
> machine with Ubuntu.
> 
> Tested-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> 
Great. I suppose the 'os' class in stress-ng should see a performance boost too.

BTW, is there a public URL where we can check QEMU regression and
performance data?

Thanks,
Fei.

Wu, Fei April 4, 2023, 6:42 a.m. UTC | #5
On 3/25/2023 6:54 PM, Richard Henderson wrote:
> This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.
> 
>   * Reclaim 5 TB_FLAGS bits, since we nearly ran out.
> 
>   * Using cpu_mmu_index(env, true) is insufficient to implement
>     HLVX properly.  While that chooses the correct mmu_idx, it
>     does not perform the read with execute permission.
>     I add a new tcg interface to perform a read-for-execute with
>     an arbitrary mmu_idx.  This is still not 100% compliant, but
>     it's closer.
> 
>   * Handle mstatus.MPV in cpu_mmu_index.
>   * Use vsstatus.SUM when required for MMUIdx_S_SUM.
>   * Cleanups for get_physical_address.
> 
> While this passes check-avocado, I'm sure that's insufficient.
> Please have a close look.
> 
I tested stress-ng to get a feel for the performance gain, although
stress-ng is not designed to be a performance workload. BTW, I had to
revert commit 0ee342256af9, which is unrelated to this series; otherwise
QEMU exited during the test. The command was:
    ./stress-ng --timeout 5 --metrics-brief --class os --sequential 1

Here are the results. In general, most of the tests benefit from this
series, but please note that the results are not necessarily consistent
across multiple runs, and some of the regressions may not be real; I
haven't checked them one by one.

stressor      master(60ca584b)   master + this     speedup
                    bogo ops/s      bogo ops/s
                (usr+sys time)  (usr+sys time)
sigsuspend            19430.09      1492746.34     76.8265
utime                  8779.64       271023.89     30.8696
chmod                  1728.26        27050.50     15.6519
vdso               23527136.74    246955742.76     10.4966
signal               584521.13      5470775.44     9.35941
sigtrap              822935.76      7190973.63     8.7382
signest              802706.93      6969509.05     8.68251
sockpair             501188.08      4242275.08     8.46444
msg                 1627863.38     13557215.89     8.32823
sigpending           551174.68      4575836.91     8.30197
locka               1447750.95     11727762.91     8.10068
lockofd             1460020.77     11562178.66     7.91919
sigsegv              718492.57      5673228.57     7.89602
getrandom            129004.90      1006544.31     7.80237
sigq                 892062.12      6828556.43     7.6548
chdir                    13.39          100.66     7.51755
timerfd             2074142.37     15395307.29     7.42249
mq                   916620.00      6208148.59     6.77287
mutex               1124306.59      7285459.79     6.47996
urandom              104868.58       678510.46     6.4701
pipe                2243935.71     14391093.39     6.41333
loadavg              463874.30      2936816.17     6.33106
fifo                 423415.43      2632734.32     6.21785
vm                    16726.91        99928.62     5.97412
handle               199246.08      1131172.45     5.67726
fstat                  2383.12        13479.35     5.65618
sigrt                405007.13      2143758.11     5.29314
access                 8449.17        44145.10     5.22479
sigfd               1506073.95      7408089.06     4.91881
sysinfo               11711.47        54868.08     4.68499
sigio               1672452.59      7564833.33     4.5232
rlimit                26771.83       119476.12     4.46276
xattr                   772.25         3412.81     4.41931
udp                  595733.08      2495239.72     4.18852
sockfd               260825.22      1061910.05     4.07135
get                   13169.56        52788.06     4.00834
getdent              141465.81       564471.43     3.99016
rename                61771.74       246277.28     3.98689
chown                 54946.74       212353.58     3.86472
dev                    3555.80        13596.14     3.82365
mincore                6617.92        25215.66     3.81021
file-ioctl           105919.35       398122.29     3.75873
link                     15.45           56.02     3.62589
splice               239841.25       865390.06     3.60818
io-uring              45798.90       157006.17     3.42816
filename               7795.98        26238.75     3.36568
sock                   1746.96         5850.73     3.34909
vm-splice            953550.50      3188724.62     3.34405
schedpolicy          231915.33       773655.76     3.33594
clock                 21878.02        72400.21     3.30927
fcntl                 76122.11       245817.92     3.22926
dentry                79533.95       247610.80     3.11327
fpunch                11895.30        36608.97     3.0776
revio                866066.56      2596187.53     2.99768
null                2351038.37      6984334.92     2.97074
mknod                 71145.05       203284.26     2.85732
symlink                  12.40           35.41     2.85565
fiemap                45437.02       128983.69     2.83874
sleep                100093.89       282540.81     2.82276
dir                   99154.72       272727.21     2.75052
timer                126419.44       344857.10     2.72788
set                   70640.29       192423.96     2.724
udp-flood            662581.75      1782759.62     2.69063
ioprio                 7030.55        18807.67     2.67513
epoll                147525.39       387861.58     2.62912
vm-rw                  1437.12         3774.13     2.62618
kill                 234075.90       613281.66     2.62001
hdd                   99017.45       257841.08     2.604
rtc                   57639.55       149363.61     2.59134
dirmany              127249.90       323667.85     2.54356
sem-sysv            1096787.78      2753588.88     2.51059
close                194579.21       482854.54     2.48153
dnotify               15125.16        37097.94     2.45273
dccp                   7554.97        18429.65     2.43941
lease                285588.09       692990.31     2.42654
eventfd              282256.72       681576.60     2.41474
sockdiag           14803911.93     34934756.45     2.35983
memfd                  3632.45         8513.45     2.34372
tee                  124239.86       290298.68     2.3366
alarm                 78757.48       181210.40     2.30087
poll                 128638.34       292293.31     2.27221
open                 189323.41       418865.86     2.21244
sigpipe              222534.69       486854.87     2.18777
pty                      18.95           39.13     2.06491
futex               1333749.78      2742935.07     2.05656
lockf                648732.25      1321326.88     2.03678
kcmp                 734152.03      1452613.12     1.97863
procfs                 7378.58        14503.74     1.96565
sockmany              94910.81       180132.46     1.89791
dirdeep               10330.82        19390.08     1.87692
touch                 97843.94       182585.97     1.86609
chattr                 2952.98         5426.15     1.83752
mmaphuge                430.84          738.17     1.71333
sem                  649644.88      1107290.70     1.70446
ptrace              1010862.41      1677555.44     1.65953
vfork                244944.97       403514.39     1.64737
nanosleep             23147.04        38097.83     1.64591
mprotect            1068863.24      1729245.09     1.61784
pipeherd             720787.08      1157261.92     1.60555
pthread               48395.68        76169.49     1.57389
enosys                 8271.11        12705.37     1.53611
sockabuse              2825.44         4251.52     1.50473
af-alg               620270.87       916118.93     1.47697
fork                  10583.97        15363.15     1.45155
copy-file              6675.07         9389.54     1.40666
resched             1730236.55      2421449.49     1.39949
msync                 93196.18       122263.64     1.3119
vforkmany            239372.56       304313.41     1.2713
vm-segv               11918.23        14981.24     1.257
readahead            261489.55       321372.13     1.22901
sendfile             147043.77       174971.03     1.18992
dynlib                 8526.99        10078.23     1.18192
fault                 86430.63       100320.47     1.16071
dup                    9829.71        11264.11     1.14592
full                 473749.38       541801.33     1.14365
mmapaddr             315772.34       351766.42     1.11399
spawn                  3937.57         4384.92     1.11361
io                   371206.67       409205.80     1.10237
munmap                64162.14        70473.66     1.09837
exit-group             5990.95         6522.70     1.08876
pidfd                 37614.16        40687.85     1.08172
flock              14069057.61     15117799.43     1.07454
wait                 106334.40       113658.40     1.06888
mmapfork                  1.81            1.93     1.0663
daemon              1161091.36      1234795.43     1.06348
bigheap              185514.46       195279.13     1.05264
mmapfixed               319.65          333.70     1.04395
brk                 1410050.59      1456025.25     1.0326
sigabrt               12129.51        12520.45     1.03223
sysfs                   806.78          831.54     1.03069
dev-shm                  40.30           41.37     1.02655
bad-altstack           7310.53         7493.23     1.02499
shm                     823.73          842.50     1.02279
shm-sysv               1132.54         1151.86     1.01706
mmapmany             400323.77       406078.50     1.01438
session               12096.44        12228.64     1.01093
madvise                 116.81          117.96     1.00985
clone                 28152.35        28414.20     1.0093
msyncmany              2220.25         2238.88     1.00839
pageswap             205651.13       207367.84     1.00835
unshare                 637.92          642.98     1.00793
remap                   373.18          375.69     1.00673
personality         1698012.68      1706642.92     1.00508
reboot               117234.02       117421.91     1.0016
itimer                24962.64        24971.19     1.00034
sync-file                 0.00            0.00     1
sigfpe                    0.00            0.00     1
seek                      0.00            0.00     1
inode-flags               0.00            0.00     1
env                       0.00            0.00     1
prctl                 11805.81        11772.73     0.997198
malloc               991487.43       987061.41     0.995536
mmap                     14.48           14.39     0.993785
zombie                33753.24        33539.75     0.993675
rmap                    625.84          620.94     0.992171
tlb-shootdown           358.25          355.33     0.991849
switch              1251701.93      1240818.57     0.991305
zero                 127112.38       125254.50     0.985384
resources               685.62          674.89     0.98435
yield               4184626.17      4117860.34     0.984045
mlock                494527.50       485733.90     0.982218
fallocate             32711.39        32067.69     0.980322
sigchld               46289.82        44914.65     0.970292
inotify                3013.11         2879.87     0.95578
opcode                11315.78        10538.58     0.931317
nice                 154327.30       136797.63     0.886412
mremap                  225.29          198.82     0.882507
exec                   4118.89         3282.85     0.797023
vm-addr                 214.25          166.69     0.778016
landlock                950.00          722.74     0.760779

Thanks,
Fei.
LIU Zhiwei April 4, 2023, 7:11 a.m. UTC | #6
On 2023/4/4 14:42, Wu, Fei wrote:
> On 3/25/2023 6:54 PM, Richard Henderson wrote:
>> This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.
>>
>>    * Reclaim 5 TB_FLAGS bits, since we nearly ran out.
>>
>>    * Using cpu_mmu_index(env, true) is insufficient to implement
>>      HLVX properly.  While that chooses the correct mmu_idx, it
>>      does not perform the read with execute permission.
>>      I add a new tcg interface to perform a read-for-execute with
>>      an arbitrary mmu_idx.  This is still not 100% compliant, but
>>      it's closer.
>>
>>    * Handle mstatus.MPV in cpu_mmu_index.
>>    * Use vsstatus.SUM when required for MMUIdx_S_SUM.
>>    * Cleanups for get_physical_address.
>>
>> While this passes check-avocado, I'm sure that's insufficient.
>> Please have a close look.
>>
> I tested stress-ng to get the feeling of performance gain, although
> stress-ng is not designed to be a performance workload. btw, I had to
> revert commit 0ee342256af9 which is unrelated to this series, or qemu
> exited during the test.
>      ./stress-ng --timeout 5 --metrics-brief --class os --sequential 1
>
> Here is the result, in general most of the tests benefit from these
> series, but please note that not all the results are necessary to be
> consistent across multiple runs, and some regressions are not real but I
> haven't checked it one by one.
>
>               master(60ca584b)   master + this      speedup
>
> stressor           bogo ops/s      bogo ops/s
>                 (usr+sys time)  (usr+sys time)
> sigsuspend            19430.09      1492746.34     76.8265
> utime                  8779.64       271023.89     30.8696
> chmod                  1728.26        27050.50     15.6519
> vdso               23527136.74    246955742.76     10.4966
> signal               584521.13      5470775.44     9.35941
> sigtrap              822935.76      7190973.63     8.7382
> signest              802706.93      6969509.05     8.68251
> sockpair             501188.08      4242275.08     8.46444
> msg                 1627863.38     13557215.89     8.32823
> sigpending           551174.68      4575836.91     8.30197
> locka               1447750.95     11727762.91     8.10068
> lockofd             1460020.77     11562178.66     7.91919
> sigsegv              718492.57      5673228.57     7.89602
> getrandom            129004.90      1006544.31     7.80237
> sigq                 892062.12      6828556.43     7.6548
> chdir                    13.39          100.66     7.51755
> timerfd             2074142.37     15395307.29     7.42249
> mq                   916620.00      6208148.59     6.77287
> mutex               1124306.59      7285459.79     6.47996
> urandom              104868.58       678510.46     6.4701
> pipe                2243935.71     14391093.39     6.41333
> loadavg              463874.30      2936816.17     6.33106
> fifo                 423415.43      2632734.32     6.21785
> vm                    16726.91        99928.62     5.97412
> handle               199246.08      1131172.45     5.67726
> fstat                  2383.12        13479.35     5.65618
> sigrt                405007.13      2143758.11     5.29314
> access                 8449.17        44145.10     5.22479
> sigfd               1506073.95      7408089.06     4.91881
> sysinfo               11711.47        54868.08     4.68499
> sigio               1672452.59      7564833.33     4.5232
> rlimit                26771.83       119476.12     4.46276
> xattr                   772.25         3412.81     4.41931
> udp                  595733.08      2495239.72     4.18852
> sockfd               260825.22      1061910.05     4.07135
> get                   13169.56        52788.06     4.00834
> getdent              141465.81       564471.43     3.99016
> rename                61771.74       246277.28     3.98689
> chown                 54946.74       212353.58     3.86472
> dev                    3555.80        13596.14     3.82365
> mincore                6617.92        25215.66     3.81021
> file-ioctl           105919.35       398122.29     3.75873
> link                     15.45           56.02     3.62589
> splice               239841.25       865390.06     3.60818
> io-uring              45798.90       157006.17     3.42816
> filename               7795.98        26238.75     3.36568
> sock                   1746.96         5850.73     3.34909
> vm-splice            953550.50      3188724.62     3.34405
> schedpolicy          231915.33       773655.76     3.33594
> clock                 21878.02        72400.21     3.30927
> fcntl                 76122.11       245817.92     3.22926
> dentry                79533.95       247610.80     3.11327
> fpunch                11895.30        36608.97     3.0776
> revio                866066.56      2596187.53     2.99768
> null                2351038.37      6984334.92     2.97074
> mknod                 71145.05       203284.26     2.85732
> symlink                  12.40           35.41     2.85565
> fiemap                45437.02       128983.69     2.83874
> sleep                100093.89       282540.81     2.82276
> dir                   99154.72       272727.21     2.75052
> timer                126419.44       344857.10     2.72788
> set                   70640.29       192423.96     2.724
> udp-flood            662581.75      1782759.62     2.69063
> ioprio                 7030.55        18807.67     2.67513
> epoll                147525.39       387861.58     2.62912
> vm-rw                  1437.12         3774.13     2.62618
> kill                 234075.90       613281.66     2.62001
> hdd                   99017.45       257841.08     2.604
> rtc                   57639.55       149363.61     2.59134
> dirmany              127249.90       323667.85     2.54356
> sem-sysv            1096787.78      2753588.88     2.51059
> close                194579.21       482854.54     2.48153
> dnotify               15125.16        37097.94     2.45273
> dccp                   7554.97        18429.65     2.43941
> lease                285588.09       692990.31     2.42654
> eventfd              282256.72       681576.60     2.41474
> sockdiag           14803911.93     34934756.45     2.35983
> memfd                  3632.45         8513.45     2.34372
> tee                  124239.86       290298.68     2.3366
> alarm                 78757.48       181210.40     2.30087
> poll                 128638.34       292293.31     2.27221
> open                 189323.41       418865.86     2.21244
> sigpipe              222534.69       486854.87     2.18777
> pty                      18.95           39.13     2.06491
> futex               1333749.78      2742935.07     2.05656
> lockf                648732.25      1321326.88     2.03678
> kcmp                 734152.03      1452613.12     1.97863
> procfs                 7378.58        14503.74     1.96565
> sockmany              94910.81       180132.46     1.89791
> dirdeep               10330.82        19390.08     1.87692
> touch                 97843.94       182585.97     1.86609
> chattr                 2952.98         5426.15     1.83752
> mmaphuge                430.84          738.17     1.71333
> sem                  649644.88      1107290.70     1.70446
> ptrace              1010862.41      1677555.44     1.65953
> vfork                244944.97       403514.39     1.64737
> nanosleep             23147.04        38097.83     1.64591
> mprotect            1068863.24      1729245.09     1.61784
> pipeherd             720787.08      1157261.92     1.60555
> pthread               48395.68        76169.49     1.57389
> enosys                 8271.11        12705.37     1.53611
> sockabuse              2825.44         4251.52     1.50473
> af-alg               620270.87       916118.93     1.47697
> fork                  10583.97        15363.15     1.45155
> copy-file              6675.07         9389.54     1.40666
> resched             1730236.55      2421449.49     1.39949
> msync                 93196.18       122263.64     1.3119
> vforkmany            239372.56       304313.41     1.2713
> vm-segv               11918.23        14981.24     1.257
> readahead            261489.55       321372.13     1.22901
> sendfile             147043.77       174971.03     1.18992
> dynlib                 8526.99        10078.23     1.18192
> fault                 86430.63       100320.47     1.16071
> dup                    9829.71        11264.11     1.14592
> full                 473749.38       541801.33     1.14365
> mmapaddr             315772.34       351766.42     1.11399
> spawn                  3937.57         4384.92     1.11361
> io                   371206.67       409205.80     1.10237
> munmap                64162.14        70473.66     1.09837
> exit-group             5990.95         6522.70     1.08876
> pidfd                 37614.16        40687.85     1.08172
> flock              14069057.61     15117799.43     1.07454
> wait                 106334.40       113658.40     1.06888
> mmapfork                  1.81            1.93     1.0663
> daemon              1161091.36      1234795.43     1.06348
> bigheap              185514.46       195279.13     1.05264
> mmapfixed               319.65          333.70     1.04395
> brk                 1410050.59      1456025.25     1.0326
> sigabrt               12129.51        12520.45     1.03223
> sysfs                   806.78          831.54     1.03069
> dev-shm                  40.30           41.37     1.02655
> bad-altstack           7310.53         7493.23     1.02499
> shm                     823.73          842.50     1.02279
> shm-sysv               1132.54         1151.86     1.01706
> mmapmany             400323.77       406078.50     1.01438
> session               12096.44        12228.64     1.01093
> madvise                 116.81          117.96     1.00985
> clone                 28152.35        28414.20     1.0093
> msyncmany              2220.25         2238.88     1.00839
> pageswap             205651.13       207367.84     1.00835
> unshare                 637.92          642.98     1.00793
> remap                   373.18          375.69     1.00673
> personality         1698012.68      1706642.92     1.00508
> reboot               117234.02       117421.91     1.0016
> itimer                24962.64        24971.19     1.00034
> sync-file                 0.00            0.00     1
> sigfpe                    0.00            0.00     1
> seek                      0.00            0.00     1
> inode-flags               0.00            0.00     1
> env                       0.00            0.00     1
> prctl                 11805.81        11772.73     0.997198
> malloc               991487.43       987061.41     0.995536
> mmap                     14.48           14.39     0.993785
> zombie                33753.24        33539.75     0.993675
> rmap                    625.84          620.94     0.992171
> tlb-shootdown           358.25          355.33     0.991849
> switch              1251701.93      1240818.57     0.991305
> zero                 127112.38       125254.50     0.985384
> resources               685.62          674.89     0.98435
> yield               4184626.17      4117860.34     0.984045
> mlock                494527.50       485733.90     0.982218
> fallocate             32711.39        32067.69     0.980322
> sigchld               46289.82        44914.65     0.970292
> inotify                3013.11         2879.87     0.95578
> opcode                11315.78        10538.58     0.931317
> nice                 154327.30       136797.63     0.886412
> mremap                  225.29          198.82     0.882507
> exec                   4118.89         3282.85     0.797023
> vm-addr                 214.25          166.69     0.778016
> landlock                950.00          722.74     0.760779

Thanks for testing. Have you analyzed the cases with worse performance,
given that this series is meant to be an optimization?
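
One way to start is to list every stressor whose speedup is below 1. The speedup column in the table above is the master + series throughput divided by the master throughput, and stressors reporting 0.00 on both sides are listed as 1. The following is a minimal Python sketch, not part of the series or of stress-ng. `ROWS`, `speedup` and `regressions` are hypothetical names, and only a handful of rows are copied in:

```python
# Sketch: list the stressors whose throughput dropped with the series applied.
# Rows are (stressor, master bogo ops/s, master + series bogo ops/s), copied
# from a few lines of the quoted table; extend ROWS with the rest as needed.
ROWS = [
    ("utime", 8779.64, 271023.89),
    ("sync-file", 0.00, 0.00),
    ("sigchld", 46289.82, 44914.65),
    ("inotify", 3013.11, 2879.87),
    ("exec", 4118.89, 3282.85),
    ("landlock", 950.00, 722.74),
]


def speedup(master, patched):
    """Ratio used in the table's speedup column: patched / master.

    Stressors that report 0.00 on both sides are treated as 1.0 (no change),
    matching how the table lists them.
    """
    if master == 0.0:
        return 1.0 if patched == 0.0 else float("inf")
    return patched / master


def regressions(rows, threshold=1.0):
    """Return (stressor, speedup) pairs below `threshold`, worst first."""
    slower = [(name, speedup(m, p)) for name, m, p in rows
              if speedup(m, p) < threshold]
    return sorted(slower, key=lambda item: item[1])


if __name__ == "__main__":
    for name, ratio in regressions(ROWS):
        print(f"{name:10s} {ratio:.4f}")
```

Sorting worst-first puts the largest regressions (here landlock and exec) at the top, which is where a per-stressor investigation would begin.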

Thanks,
Zhiwei

> Thanks,
> Fei.
>> r~
>>
>>
>> Fei Wu (2):
>>    target/riscv: Separate priv from mmu_idx
>>    target/riscv: Reduce overhead of MSTATUS_SUM change
>>
>> LIU Zhiwei (4):
>>    target/riscv: Extract virt enabled state from tb flags
>>    target/riscv: Add a general status enum for extensions
>>    target/riscv: Encode the FS and VS on a normal way for tb flags
>>    target/riscv: Add a tb flags field for vstart
>>
>> Richard Henderson (19):
>>    target/riscv: Remove mstatus_hs_{fs,vs} from tb_flags
>>    accel/tcg: Add cpu_ld*_code_mmu
>>    target/riscv: Use cpu_ld*_code_mmu for HLVX
>>    target/riscv: Handle HLV, HSV via helpers
>>    target/riscv: Rename MMU_HYP_ACCESS_BIT to MMU_2STAGE_BIT
>>    target/riscv: Introduce mmuidx_sum
>>    target/riscv: Introduce mmuidx_priv
>>    target/riscv: Introduce mmuidx_2stage
>>    target/riscv: Move hstatus.spvp check to check_access_hlsv
>>    target/riscv: Set MMU_2STAGE_BIT in riscv_cpu_mmu_index
>>    target/riscv: Check SUM in the correct register
>>    target/riscv: Hoist second stage mode change to callers
>>    target/riscv: Hoist pbmte and hade out of the level loop
>>    target/riscv: Move leaf pte processing out of level loop
>>    target/riscv: Suppress pte update with is_debug
>>    target/riscv: Don't modify SUM with is_debug
>>    target/riscv: Merge checks for reserved pte flags
>>    target/riscv: Reorg access check in get_physical_address
>>    target/riscv: Reorg sum check in get_physical_address
>>
>>   include/exec/cpu_ldst.h                       |   9 +
>>   target/riscv/cpu.h                            |  47 ++-
>>   target/riscv/cpu_bits.h                       |  12 +-
>>   target/riscv/helper.h                         |  12 +-
>>   target/riscv/internals.h                      |  35 ++
>>   accel/tcg/cputlb.c                            |  48 +++
>>   accel/tcg/user-exec.c                         |  58 +++
>>   target/riscv/cpu.c                            |   2 +-
>>   target/riscv/cpu_helper.c                     | 393 +++++++++---------
>>   target/riscv/csr.c                            |  21 +-
>>   target/riscv/op_helper.c                      | 113 ++++-
>>   target/riscv/translate.c                      |  72 ++--
>>   .../riscv/insn_trans/trans_privileged.c.inc   |   2 +-
>>   target/riscv/insn_trans/trans_rvf.c.inc       |   2 +-
>>   target/riscv/insn_trans/trans_rvh.c.inc       | 135 +++---
>>   target/riscv/insn_trans/trans_rvv.c.inc       |  22 +-
>>   target/riscv/insn_trans/trans_xthead.c.inc    |   7 +-
>>   17 files changed, 595 insertions(+), 395 deletions(-)
>>
Wu, Fei April 4, 2023, 7:23 a.m. UTC | #7
On 4/4/2023 3:11 PM, LIU Zhiwei wrote:
> 
> On 2023/4/4 14:42, Wu, Fei wrote:
>> On 3/25/2023 6:54 PM, Richard Henderson wrote:
>>> This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.
>>>
>>>    * Reclaim 5 TB_FLAGS bits, since we nearly ran out.
>>>
>>>    * Using cpu_mmu_index(env, true) is insufficient to implement
>>>      HLVX properly.  While that chooses the correct mmu_idx, it
>>>      does not perform the read with execute permission.
>>>      I add a new tcg interface to perform a read-for-execute with
>>>      an arbitrary mmu_idx.  This is still not 100% compliant, but
>>>      it's closer.
>>>
>>>    * Handle mstatus.MPV in cpu_mmu_index.
>>>    * Use vsstatus.SUM when required for MMUIdx_S_SUM.
>>>    * Cleanups for get_physical_address.
>>>
>>> While this passes check-avocado, I'm sure that's insufficient.
>>> Please have a close look.
>>>
>> I tested stress-ng to get a feel for the performance gain, although
>> stress-ng is not designed to be a performance workload. BTW, I had to
>> revert commit 0ee342256af9, which is unrelated to this series; otherwise
>> QEMU exited during the test.
>>      ./stress-ng --timeout 5 --metrics-brief --class os --sequential 1
>>
>> Here is the result. In general, most of the tests benefit from this
>> series, but please note that not all results are necessarily consistent
>> across multiple runs, and some regressions may not be real; I haven't
>> checked them one by one.
>>
>>               master(60ca584b)   master + this      speedup
>>
>> stressor           bogo ops/s      bogo ops/s
>>                 (usr+sys time)  (usr+sys time)
>> sigsuspend            19430.09      1492746.34     76.8265
>> utime                  8779.64       271023.89     30.8696
>> [...]
>> opcode                11315.78        10538.58     0.931317
>> nice                 154327.30       136797.63     0.886412
>> mremap                  225.29          198.82     0.882507
>> exec                   4118.89         3282.85     0.797023
>> vm-addr                 214.25          166.69     0.778016
>> landlock                950.00          722.74     0.760779
> 
> Thanks for testing. Have you analyzed the cases with worse performance,
> given that this series is meant to be an optimization?
> 
In the first run, 'io' showed the worst regression, but it turned out not
to be a real regression, or at least not a consistent one: it went away
when I ran it again.

      master(60ca584b)   this series, run 1        this series, run 2
stressor    bogo ops/s   bogo ops/s    speedup     bogo ops/s  speedup
        (usr+sys time) (usr+sys time)            (usr+sys time)
fallocate     32711.39     33794.28    1.0331        32067.69  0.980322
sigchld       46289.82     42975.50    0.928401      44914.65  0.970292
inotify        3013.11      3511.21    1.16531        2879.87  0.95578
opcode        11315.78     10084.42    0.891182      10538.58  0.931317
nice         154327.30    186649.43    1.20944      136797.63  0.886412
mremap          225.29       237.39    1.05371         198.82  0.882507
exec           4118.89      4248.12    1.03137        3282.85  0.797023
vm-addr         214.25       268.60    1.25368         166.69  0.778016
landlock        950.00       791.12    0.832758        722.74  0.760779

io           371206.67    205232.61    0.55288      409205.80  1.10237
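
To separate consistent regressions from noise in a comparison like this, one simple check is whether both patched runs move in the same direction relative to master. The following is a minimal Python sketch, not part of the series or of stress-ng. `RUNS` and `classify` are hypothetical names, and the figures are copied from a few rows of the table above:

```python
# Sketch: check whether the two patched runs agree on the direction of change.
# Each entry is (stressor, master, run 1, run 2), all in bogo ops/s,
# copied from the table above.
RUNS = [
    ("fallocate", 32711.39, 33794.28, 32067.69),
    ("sigchld", 46289.82, 42975.50, 44914.65),
    ("exec", 4118.89, 4248.12, 3282.85),
    ("landlock", 950.00, 791.12, 722.74),
    ("io", 371206.67, 205232.61, 409205.80),
]


def classify(master, run1, run2):
    """Label a stressor by the sign of its change in both patched runs.

    There is only one master run, so this says nothing about noise in the
    master baseline itself.
    """
    s1, s2 = run1 / master, run2 / master
    if s1 < 1.0 and s2 < 1.0:
        return "regressed in both runs"
    if s1 >= 1.0 and s2 >= 1.0:
        return "improved in both runs"
    return "inconsistent"


if __name__ == "__main__":
    for name, master, run1, run2 in RUNS:
        print(f"{name:10s} {classify(master, run1, run2)}")
```

With only one master run, a stressor flagged as regressing in both runs (sigchld and landlock here) is a better candidate for a closer look than one flagged as inconsistent (such as io). Rerunning master several times would still be needed to tell a real regression from run-to-run variance.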

Thanks,
Fei.

Alistair Francis April 11, 2023, 5:38 a.m. UTC | #8
On Sat, Mar 25, 2023 at 9:58 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.
>
>   * Reclaim 5 TB_FLAGS bits, since we nearly ran out.
>
>   * Using cpu_mmu_index(env, true) is insufficient to implement
>     HLVX properly.  While that chooses the correct mmu_idx, it
>     does not perform the read with execute permission.
>     I add a new tcg interface to perform a read-for-execute with
>     an arbitrary mmu_idx.  This is still not 100% compliant, but
>     it's closer.
>
>   * Handle mstatus.MPV in cpu_mmu_index.
>   * Use vsstatus.SUM when required for MMUIdx_S_SUM.
>   * Cleanups for get_physical_address.
>
> While this passes check-avocado, I'm sure that's insufficient.
> Please have a close look.
>
>
> r~
>
>
> Fei Wu (2):
>   target/riscv: Separate priv from mmu_idx
>   target/riscv: Reduce overhead of MSTATUS_SUM change
>
> LIU Zhiwei (4):
>   target/riscv: Extract virt enabled state from tb flags
>   target/riscv: Add a general status enum for extensions
>   target/riscv: Encode the FS and VS on a normal way for tb flags
>   target/riscv: Add a tb flags field for vstart
>
> Richard Henderson (19):
>   target/riscv: Remove mstatus_hs_{fs,vs} from tb_flags
>   accel/tcg: Add cpu_ld*_code_mmu
>   target/riscv: Use cpu_ld*_code_mmu for HLVX
>   target/riscv: Handle HLV, HSV via helpers
>   target/riscv: Rename MMU_HYP_ACCESS_BIT to MMU_2STAGE_BIT
>   target/riscv: Introduce mmuidx_sum
>   target/riscv: Introduce mmuidx_priv
>   target/riscv: Introduce mmuidx_2stage
>   target/riscv: Move hstatus.spvp check to check_access_hlsv
>   target/riscv: Set MMU_2STAGE_BIT in riscv_cpu_mmu_index
>   target/riscv: Check SUM in the correct register
>   target/riscv: Hoist second stage mode change to callers
>   target/riscv: Hoist pbmte and hade out of the level loop
>   target/riscv: Move leaf pte processing out of level loop
>   target/riscv: Suppress pte update with is_debug
>   target/riscv: Don't modify SUM with is_debug
>   target/riscv: Merge checks for reserved pte flags
>   target/riscv: Reorg access check in get_physical_address
>   target/riscv: Reorg sum check in get_physical_address

Thanks for the patches!

This has been reviewed and tested. Do you mind sending a v7 rebased on
https://github.com/alistair23/qemu/tree/riscv-to-apply.next?

Alistair
