Message ID | 20240418092327.860135-1-benjamin@sipsolutions.net |
---|---|
Series | Rework stub syscall and page table handling |
Hello Benjamin,

On 4/18/24 5:23 PM, benjamin@sipsolutions.net wrote:
> From: Benjamin Berg <benjamin.berg@intel.com>
>
> This patchset reworks the stub syscall handling and also redos how page
> table updates are tracked and synchronized. Some of this originated in
> the SECCOMP patchset, but it became clear that these refactorings make
> sense independently as they result in a considerably fewer page faults.

I saw your SECCOMP patchset. It's pretty cool! Just wondering if you're about
to post a new version soon. :)

PS. Just FYI, gVisor also implemented a SECCOMP based platform which is
also very interesting:

https://gvisor.dev/blog/2023/04/28/systrap-release/
https://github.com/google/gvisor/tree/master/pkg/sentry/platform/systrap

Regards,
Tiwei
Hi Tiwei,

On Mon, 2024-04-22 at 10:35 +0800, Tiwei Bie wrote:
> On 4/18/24 5:23 PM, benjamin@sipsolutions.net wrote:
> > From: Benjamin Berg <benjamin.berg@intel.com>
> >
> > This patchset reworks the stub syscall handling and also redos how page
> > table updates are tracked and synchronized. Some of this originated in
> > the SECCOMP patchset, but it became clear that these refactorings make
> > sense independently as they result in a considerably fewer page faults.
>
> I saw your SECCOMP patchset. It's pretty cool! Just wondering if you're about
> to post a new version soon. :)

I am planning to work on it again, but it is not very high on my
priority list. So, could be quite soon or some months :-)

In the ARM support thread ("UML for arm64"), there were some ideas to
use FD passing in order to protect memory mappings better. Doing that
should allow the SECCOMP approach to scale to SMP and will also
simplify the security model.

Making those changes will take a bit of thought and experimentation.
Nothing really big though, it pretty much boils down to using sockets
for (some of) the synchronization and replacing mprotect with mmap so
the FD can authorize the operation.

> PS. Just FYI, gVisor also implemented a SECCOMP based platform which is
> also very interesting:
>
> https://gvisor.dev/blog/2023/04/28/systrap-release/
> https://github.com/google/gvisor/tree/master/pkg/sentry/platform/systrap

It is a good choice if you want to catch syscalls and do some custom
handling in userspace. Just annoying that it is still stuck on classic
BPF :-)

Benjamin
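For readers unfamiliar with the idea mentioned above, here is a minimal userspace sketch of FD passing over an AF_UNIX socket (SCM_RIGHTS), followed by changing a mapping's protection with mmap(MAP_FIXED) on the received FD instead of calling mprotect. It is not taken from the patchset or from the arm64 thread. The helper names (send_fd, recv_fd, demo_fd_authorized_remap) and the temporary-file backing are assumptions made purely for illustration, and both ends run in one process only to keep the example short.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Properly aligned buffer for one SCM_RIGHTS control message. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical helper: send one FD over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one FD; returns the new FD or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}

/* Pass an FD, map it read-only, then "upgrade" it by remapping in place. */
static int demo_fd_authorized_remap(void)
{
	char path[] = "/tmp/fdpass-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	int sv[2], fd, rfd, ret = -1;
	char *p, c = 0;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, page) < 0 ||
	    socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
		close(fd);
		return -1;
	}

	if (send_fd(sv[0], fd) == 0 && (rfd = recv_fd(sv[1])) >= 0) {
		p = mmap(NULL, page, PROT_READ, MAP_SHARED, rfd, 0);
		if (p != MAP_FAILED) {
			/* Instead of mprotect(): map the same page again, writable. */
			if (mmap(p, page, PROT_READ | PROT_WRITE,
				 MAP_SHARED | MAP_FIXED, rfd, 0) == (void *)p) {
				p[0] = 42;
				/* The write is visible through the original FD. */
				if (pread(fd, &c, 1, 0) == 1 && c == 42)
					ret = 0;
			}
			munmap(p, page);
		}
		close(rfd);
	}
	close(sv[0]);
	close(sv[1]);
	close(fd);
	return ret;
}
```

The point of the remap is that the permission change needs the FD: a process that never received it cannot establish the mapping, whereas mprotect needs no such token.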
On 4/22/24 3:41 PM, Benjamin Berg wrote:
> On Mon, 2024-04-22 at 10:35 +0800, Tiwei Bie wrote:
>> On 4/18/24 5:23 PM, benjamin@sipsolutions.net wrote:
>>> From: Benjamin Berg <benjamin.berg@intel.com>
>>>
>>> This patchset reworks the stub syscall handling and also redos how page
>>> table updates are tracked and synchronized. Some of this originated in
>>> the SECCOMP patchset, but it became clear that these refactorings make
>>> sense independently as they result in a considerably fewer page faults.
>>
>> I saw your SECCOMP patchset. It's pretty cool! Just wondering if you're about
>> to post a new version soon. :)
>
> I am planning to work on it again, but it is not very high on my
> priority list. So, could be quite soon or some months :-)
>
> In the ARM support thread ("UML for arm64"), there were some ideas to
> use FD passing in order to protect memory mappings better. Doing that
> should allow the SECCOMP approach to scale to SMP and will also
> simplify the security model.
>
> Making those changes will take a bit of thought and experimentation.
> Nothing really big though, it pretty much boils down to using sockets
> for (some of) the synchronization and replacing mprotect with mmap so
> the FD can authorize the operation.

Cool. Thanks for the details! Looking forward to your new version. :)

Regards,
Tiwei
From: Benjamin Berg <benjamin.berg@intel.com>

This patchset reworks the stub syscall handling and also redoes how page
table updates are tracked and synchronized. Some of this originated in
the SECCOMP patchset, but it became clear that these refactorings make
sense independently, as they result in considerably fewer page faults.

As an example, these changes bring the runtime of one of the hostapd
hwsim test modules from 121 seconds down to 103 seconds. Actual results
will depend heavily on how page-fault-heavy the workload is.

The reduction in page faults comes from proactively syncing PTEs that
are set (and were previously unset), and from delaying synchronization
so that multiple updates can be done in one step rather than requiring
multiple task switches.

This refactoring also fixes various odd corner cases in how UML was
handling memory and cloning of MMs.

As part of this work, support for LDTs has been dropped. My expectation
is that this is not a problem, as it should only be needed by legacy
applications.
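To illustrate the batching idea described above, here is a small standalone sketch of a queue that merges adjacent memory operations as they are added and hands them over in a single flush. It is not the code from this series. The names (struct op_queue, queue_add, queue_flush), the queue size, and the merge rule are hypothetical, and the flush only counts hand-overs instead of running anything in a stub.

```c
#include <stddef.h>

/* Hypothetical operation types, loosely modelled on map/unmap requests. */
enum op_type { OP_MAP, OP_UNMAP };

struct mem_op {
	enum op_type type;
	unsigned long addr;
	unsigned long len;
	int prot;		/* only meaningful for OP_MAP */
};

#define QUEUE_MAX 16		/* arbitrary size chosen for this sketch */

struct op_queue {
	struct mem_op ops[QUEUE_MAX];
	size_t count;		/* operations waiting to be executed */
	unsigned long flushes;	/* how often the batch was handed over */
};

/* Stand-in for "run all queued operations in one step". */
static void queue_flush(struct op_queue *q)
{
	if (q->count == 0)
		return;
	q->flushes++;
	q->count = 0;
}

/*
 * Queue an operation. If it directly extends the previous one (same type,
 * same protection, contiguous addresses), grow that entry instead of adding
 * a new one. A real map operation would also need contiguous backing
 * offsets before it could be merged; that is omitted here.
 */
static void queue_add(struct op_queue *q, enum op_type type,
		      unsigned long addr, unsigned long len, int prot)
{
	if (q->count > 0) {
		struct mem_op *last = &q->ops[q->count - 1];

		if (last->type == type && last->prot == prot &&
		    last->addr + last->len == addr) {
			last->len += len;
			return;
		}
	}

	/* Only a full queue forces an early flush. */
	if (q->count == QUEUE_MAX)
		queue_flush(q);

	q->ops[q->count].type = type;
	q->ops[q->count].addr = addr;
	q->ops[q->count].len = len;
	q->ops[q->count].prot = prot;
	q->count++;
}
```

With this shape, unmapping three consecutive pages costs one queue entry and one hand-over, instead of three separate round trips.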
Benjamin Berg (12):
  um: Remove stub-data.h include from common-offsets.h
  um: Create signal stack memory assignment in stub_data
  um: Add generic stub_syscall6 function
  um: Rework syscall handling
  um: compress memory related stub syscalls while adding them
  um: remove LDT support
  um: remove copy_context_skas0
  um: Delay flushing syscalls until the thread is restarted
  um: Do not flush MM in flush_thread
  um: remove force_flush_all from fork_handler
  um: simplify and consolidate TLB updates
  um: refactor TLB update handling

 arch/um/include/asm/mmu.h               |  10 +-
 arch/um/include/asm/mmu_context.h       |   2 -
 arch/um/include/asm/pgtable.h           |  32 ++
 arch/um/include/asm/tlbflush.h          |  46 +-
 arch/um/include/shared/as-layout.h      |   2 +-
 arch/um/include/shared/common-offsets.h |   5 -
 arch/um/include/shared/os.h             |  27 +-
 arch/um/include/shared/skas/mm_id.h     |   2 +-
 arch/um/include/shared/skas/skas.h      |   2 +
 arch/um/include/shared/skas/stub-data.h |  35 +-
 arch/um/include/shared/user.h           |   8 +
 arch/um/kernel/exec.c                   |   9 -
 arch/um/kernel/process.c                |   2 -
 arch/um/kernel/skas/Makefile            |   9 +-
 arch/um/kernel/skas/clone.c             |  48 ---
 arch/um/kernel/skas/mmu.c               |  33 +-
 arch/um/kernel/skas/process.c           |  18 +
 arch/um/kernel/skas/stub.c              |  69 +++
 arch/um/kernel/tlb.c                    | 551 ++++--------------------
 arch/um/kernel/trap.c                   |  15 +-
 arch/um/os-Linux/skas/mem.c             | 248 ++++++-----
 arch/um/os-Linux/skas/process.c         | 121 +-----
 arch/um/os-Linux/start_up.c             |   1 +
 arch/x86/um/Makefile                    |   4 +-
 arch/x86/um/asm/mm_context.h            |  70 ---
 arch/x86/um/ldt.c                       | 380 ----------------
 arch/x86/um/shared/sysdep/stub.h        |   1 +
 arch/x86/um/shared/sysdep/stub_32.h     |  29 +-
 arch/x86/um/shared/sysdep/stub_64.h     |  23 +-
 arch/x86/um/stub_32.S                   |  56 ---
 arch/x86/um/stub_64.S                   |  50 ---
 arch/x86/um/tls_32.c                    |   1 +
 32 files changed, 543 insertions(+), 1366 deletions(-)
 delete mode 100644 arch/um/kernel/skas/clone.c
 create mode 100644 arch/um/kernel/skas/stub.c
 delete mode 100644 arch/x86/um/asm/mm_context.h
 delete mode 100644 arch/x86/um/ldt.c
 delete mode 100644 arch/x86/um/stub_32.S
 delete mode 100644 arch/x86/um/stub_64.S