[00/12] Rework stub syscall and page table handling

Message ID 20240418092327.860135-1-benjamin@sipsolutions.net

Message

Benjamin Berg April 18, 2024, 9:23 a.m. UTC
From: Benjamin Berg <benjamin.berg@intel.com>

This patchset reworks the stub syscall handling and also redoes how page
table updates are tracked and synchronized. Some of this originated in
the SECCOMP patchset, but it became clear that these refactorings make
sense independently as they result in considerably fewer page faults.

As an example, these changes bring the runtime of one of the hostapd
hwsim test modules from 121 seconds down to 103 seconds. Actual results
will depend heavily on how page-fault-heavy the workload is.

The number of page faults drops because PTEs that are set (and were
previously unset) are now synced proactively, and because
synchronization is delayed so that multiple updates can be done in one
step rather than requiring multiple task switches.
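
To illustrate the batching idea, here is a minimal sketch and not the
code from this series. All names (mem_op, op_queue, queue_add,
queue_flush) are hypothetical, and the merge rule is simplified: a real
mapping request would also have to match on FD, file offset and
protection before two ranges could be combined. Adjacent operations of
the same kind are merged as they are queued, and the whole queue is
handed over in one step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation kinds; the series itself queues stub syscalls. */
enum op_kind { OP_MAP, OP_UNMAP };

struct mem_op {
	enum op_kind kind;
	unsigned long addr;	/* start of the range */
	unsigned long len;	/* length in bytes */
};

#define MAX_OPS 16

struct op_queue {
	struct mem_op ops[MAX_OPS];
	size_t count;		/* number of queued entries */
	unsigned int flushes;	/* how often the queue was handed over */
};

/* Stand-in for executing all queued operations in a single step. */
void queue_flush(struct op_queue *q)
{
	if (q->count == 0)
		return;
	q->flushes++;
	q->count = 0;
}

/*
 * Queue an operation. If it has the same kind as the previous entry
 * and starts exactly where that entry ends, extend the previous entry
 * instead of adding a new one.
 */
void queue_add(struct op_queue *q, enum op_kind kind,
	       unsigned long addr, unsigned long len)
{
	assert(len > 0);

	if (q->count > 0) {
		struct mem_op *last = &q->ops[q->count - 1];

		if (last->kind == kind && last->addr + last->len == addr) {
			last->len += len;
			return;
		}
	}

	/* No room left: hand over what is queued before adding more. */
	if (q->count == MAX_OPS)
		queue_flush(q);

	q->ops[q->count].kind = kind;
	q->ops[q->count].addr = addr;
	q->ops[q->count].len = len;
	q->count++;
}
```

In this sketch, unmapping two neighbouring pages costs one queue entry
rather than two, and nothing is executed until queue_flush() is called.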

This refactoring also fixes various odd corner cases in how UML was
handling memory and cloning of MMs. As part of this work, support for
LDTs has been dropped. My expectation is that this is not a problem as
it should only be needed by legacy applications.

Benjamin Berg (12):
  um: Remove stub-data.h include from common-offsets.h
  um: Create signal stack memory assignment in stub_data
  um: Add generic stub_syscall6 function
  um: Rework syscall handling
  um: compress memory related stub syscalls while adding them
  um: remove LDT support
  um: remove copy_context_skas0
  um: Delay flushing syscalls until the thread is restarted
  um: Do not flush MM in flush_thread
  um: remove force_flush_all from fork_handler
  um: simplify and consolidate TLB updates
  um: refactor TLB update handling

 arch/um/include/asm/mmu.h               |  10 +-
 arch/um/include/asm/mmu_context.h       |   2 -
 arch/um/include/asm/pgtable.h           |  32 ++
 arch/um/include/asm/tlbflush.h          |  46 +-
 arch/um/include/shared/as-layout.h      |   2 +-
 arch/um/include/shared/common-offsets.h |   5 -
 arch/um/include/shared/os.h             |  27 +-
 arch/um/include/shared/skas/mm_id.h     |   2 +-
 arch/um/include/shared/skas/skas.h      |   2 +
 arch/um/include/shared/skas/stub-data.h |  35 +-
 arch/um/include/shared/user.h           |   8 +
 arch/um/kernel/exec.c                   |   9 -
 arch/um/kernel/process.c                |   2 -
 arch/um/kernel/skas/Makefile            |   9 +-
 arch/um/kernel/skas/clone.c             |  48 ---
 arch/um/kernel/skas/mmu.c               |  33 +-
 arch/um/kernel/skas/process.c           |  18 +
 arch/um/kernel/skas/stub.c              |  69 +++
 arch/um/kernel/tlb.c                    | 551 ++++--------------------
 arch/um/kernel/trap.c                   |  15 +-
 arch/um/os-Linux/skas/mem.c             | 248 ++++++-----
 arch/um/os-Linux/skas/process.c         | 121 +-----
 arch/um/os-Linux/start_up.c             |   1 +
 arch/x86/um/Makefile                    |   4 +-
 arch/x86/um/asm/mm_context.h            |  70 ---
 arch/x86/um/ldt.c                       | 380 ----------------
 arch/x86/um/shared/sysdep/stub.h        |   1 +
 arch/x86/um/shared/sysdep/stub_32.h     |  29 +-
 arch/x86/um/shared/sysdep/stub_64.h     |  23 +-
 arch/x86/um/stub_32.S                   |  56 ---
 arch/x86/um/stub_64.S                   |  50 ---
 arch/x86/um/tls_32.c                    |   1 +
 32 files changed, 543 insertions(+), 1366 deletions(-)
 delete mode 100644 arch/um/kernel/skas/clone.c
 create mode 100644 arch/um/kernel/skas/stub.c
 delete mode 100644 arch/x86/um/asm/mm_context.h
 delete mode 100644 arch/x86/um/ldt.c
 delete mode 100644 arch/x86/um/stub_32.S
 delete mode 100644 arch/x86/um/stub_64.S

Comments

Tiwei Bie April 22, 2024, 2:35 a.m. UTC | #1
Hello Benjamin,

On 4/18/24 5:23 PM, benjamin@sipsolutions.net wrote:
> From: Benjamin Berg <benjamin.berg@intel.com>
> 
> This patchset reworks the stub syscall handling and also redoes how page
> table updates are tracked and synchronized. Some of this originated in
> the SECCOMP patchset, but it became clear that these refactorings make
> sense independently as they result in considerably fewer page faults.

I saw your SECCOMP patchset. It's pretty cool! Just wondering if you're about
to post a new version soon. :)

PS. Just FYI, gVisor also implemented a SECCOMP based platform which is
also very interesting:

https://gvisor.dev/blog/2023/04/28/systrap-release/
https://github.com/google/gvisor/tree/master/pkg/sentry/platform/systrap

Regards,
Tiwei

Benjamin Berg April 22, 2024, 7:41 a.m. UTC | #2
Hi Tiwei,

On Mon, 2024-04-22 at 10:35 +0800, Tiwei Bie wrote:
> On 4/18/24 5:23 PM, benjamin@sipsolutions.net wrote:
> > From: Benjamin Berg <benjamin.berg@intel.com>
> > 
> > This patchset reworks the stub syscall handling and also redoes how page
> > table updates are tracked and synchronized. Some of this originated in
> > the SECCOMP patchset, but it became clear that these refactorings make
> > sense independently as they result in considerably fewer page faults.
> 
> I saw your SECCOMP patchset. It's pretty cool! Just wondering if you're about
> to post a new version soon. :)

I am planning to work on it again, but it is not very high on my
priority list. So, could be quite soon or some months :-)

In the ARM support thread ("UML for arm64"), there were some ideas to
use FD passing in order to protect memory mappings better. Doing that
should allow the SECCOMP approach to scale to SMP and will also
simplify the security model.

Making those changes will take a bit of thought and experimentation.
Nothing really big though, it pretty much boils down to using sockets
for (some of) the synchronization and replacing mprotect with mmap so
the FD can authorize the operation.
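
For reference, the generic building blocks mentioned here look roughly
like the sketch below. It is not taken from any UML patch: the helper
names (send_fd, recv_fd, map_fd_at) are made up for illustration, and
how UML would actually wire this up is an open design question. The
sketch passes an FD over an AF_UNIX socket with SCM_RIGHTS and then
establishes a mapping with mmap() and MAP_FIXED, where the FD must be
presented, instead of changing protections with mprotect().

```c
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>	/* close(), pipe(), read(), write() for callers */

/* Control buffer with room for exactly one FD, suitably aligned. */
union fd_ctrl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one FD over a connected AF_UNIX socket. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data must be sent */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one FD sent by send_fd(). Returns the new FD or -1. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/*
 * Replace whatever is mapped at addr with a shared mapping of fd.
 * Unlike mprotect(), the caller must hold the FD to do this.
 * Returns MAP_FAILED on error.
 */
void *map_fd_at(void *addr, size_t len, int prot, int fd, off_t offset)
{
	return mmap(addr, len, prot, MAP_SHARED | MAP_FIXED, fd, offset);
}
```

The receiver gets its own descriptor referring to the same open file,
so it should close() it once the mapping has been set up.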

> PS. Just FYI, gVisor also implemented a SECCOMP based platform which is
> also very interesting:
> 
> https://gvisor.dev/blog/2023/04/28/systrap-release/
> https://github.com/google/gvisor/tree/master/pkg/sentry/platform/systrap

It is a good choice if you want to catch syscalls and do some custom
handling in userspace. Just annoying that it is still stuck on classic
BPF :-)

Benjamin

Tiwei Bie April 22, 2024, 12:08 p.m. UTC | #3
On 4/22/24 3:41 PM, Benjamin Berg wrote:
> On Mon, 2024-04-22 at 10:35 +0800, Tiwei Bie wrote:
>> On 4/18/24 5:23 PM, benjamin@sipsolutions.net wrote:
>>> From: Benjamin Berg <benjamin.berg@intel.com>
>>>
>>> This patchset reworks the stub syscall handling and also redoes how page
>>> table updates are tracked and synchronized. Some of this originated in
>>> the SECCOMP patchset, but it became clear that these refactorings make
>>> sense independently as they result in considerably fewer page faults.
>>
>> I saw your SECCOMP patchset. It's pretty cool! Just wondering if you're about
>> to post a new version soon. :)
> 
> I am planning to work on it again, but it is not very high on my
> priority list. So, could be quite soon or some months :-)
> 
> In the ARM support thread ("UML for arm64"), there were some ideas to
> use FD passing in order to protect memory mappings better. Doing that
> should allow the SECCOMP approach to scale to SMP and will also
> simplify the security model.
> 
> Making those changes will take a bit of thought and experimentation.
> Nothing really big though, it pretty much boils down to using sockets
> for (some of) the synchronization and replacing mprotect with mmap so
> the FD can authorize the operation.

Cool. Thanks for the details! Looking forward to your new version. :)

Regards,
Tiwei