Message ID | 1504198860-12951-20-git-send-email-Dave.Martin@arm.com |
---|---|
State | New |
Series | ARM Scalable Vector Extension (SVE) |
Hi Dave,

I am an engineer of the postK computer from Fujitsu.

When I tried to read "max_vl" by ptrace with this patch on our local SVE
simulator, it was read as zero.
I think the cause of this incident is that "max_vl" is set as "header->vl"
only on warning case in sve_init_header_from_task().
"max_vl" should be set up also on normal case, like the following patch.


--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -755,6 +755,8 @@ static void sve_init_header_from_task(struct user_sve_header *header,

 	if (WARN_ON(!sve_vl_valid(sve_max_vl)))
 		header->max_vl = header->vl;
+	else
+		header->max_vl = sve_max_vl;

 	header->size = SVE_PT_SIZE(vq, header->flags);
 	header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl),


Best regards,
Takayuki Okamoto

-----Original Message-----
From: gdb-owner@sourceware.org [mailto:gdb-owner@sourceware.org] On Behalf Of Dave Martin
Sent: Friday, September 1, 2017 2:01 AM
To: linux-arm-kernel@lists.infradead.org
Cc: Catalin Marinas <catalin.marinas@arm.com>; Will Deacon <will.deacon@arm.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>; Alex Bennée <alex.bennee@linaro.org>; Szabolcs Nagy <szabolcs.nagy@arm.com>; Richard Sandiford <richard.sandiford@arm.com>; kvmarm@lists.cs.columbia.edu; libc-alpha@sourceware.org; linux-arch@vger.kernel.org; gdb@sourceware.org; Alan Hayward <alan.hayward@arm.com>; Yao Qi <Yao.Qi@arm.com>; Oleg Nesterov <oleg@redhat.com>; Alexander Viro <viro@zeniv.linux.org.uk>
Subject: [PATCH v2 19/28] arm64/sve: ptrace and ELF coredump support

This patch defines and implements a new regset NT_ARM_SVE, which
describes a thread's SVE register state.  This allows a debugger to
manipulate the SVE state, as well as being included in ELF
coredumps for post-mortem debugging.

Because the regset size and layout are dependent on the thread's
current vector length, it is not possible to define a C struct to
describe the regset contents as is done for existing regsets.
Instead, and for the same reasons, NT_ARM_SVE is based on the
freeform variable-layout approach used for the SVE signal frame.

Additionally, to reduce debug overhead when debugging threads that
might or might not have live SVE register state, NT_ARM_SVE may be
presented in one of two different formats: the old struct
user_fpsimd_state format is embedded for describing the state of a
thread with no live SVE state, whereas a new variable-layout
structure is embedded for describing live SVE state.  This avoids a
debugger needing to poll NT_PRFPREG in addition to NT_ARM_SVE, and
allows existing userspace code to handle the non-SVE case without
too much modification.

For this to work, NT_ARM_SVE is defined with a fixed-format header
of type struct user_sve_header, which the recipient can use to
figure out the content, size and layout of the rest of the regset.
Accessor macros are defined to allow the vector-length-dependent
parts of the regset to be manipulated.

Signed-off-by: Alan Hayward <alan.hayward@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>

---

Changes since v1
----------------

Other changes related to Alex Bennée's comments:

 * Migrate to SVE_VQ_BYTES instead of magic numbers.

Requested by Alex Bennée:

 * Thin out BUG_ON()s:
   Redundant BUG_ON()s and ones that just check invariants are removed.
   Important sanity-checks are migrated to WARN_ON()s, with some
   minimal best-effort patch-up code.

Other:

 * [ABI fix] Bail out with -EIO if attempting to set the
   SVE regs for an unsupported VL, instead of misparsing the regset data.

 * Replace some in-kernel open-coded arithmetic with ALIGN()/
   DIV_ROUND_UP().
--- arch/arm64/include/asm/fpsimd.h | 13 +- arch/arm64/include/uapi/asm/ptrace.h | 135 ++++++++++++++++++ arch/arm64/kernel/fpsimd.c | 40 +++++- arch/arm64/kernel/ptrace.c | 270 +++++++++++++++++++++++++++++++++-- include/uapi/linux/elf.h | 1 + 5 files changed, 449 insertions(+), 10 deletions(-) diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 6c22624..2723cca 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -38,13 +38,16 @@ struct fpsimd_state { __uint128_t vregs[32]; u32 fpsr; u32 fpcr; + /* + * For ptrace compatibility, pad to next 128-bit + * boundary here if extending this struct. + */ }; }; /* the id of the last cpu to have restored this state */ unsigned int cpu; }; - #if defined(__KERNEL__) && defined(CONFIG_COMPAT) /* Masks for extracting the FPSR and FPCR from the FPSCR */ #define VFP_FPSCR_STAT_MASK 0xf800009f @@ -89,6 +92,10 @@ extern void sve_alloc(struct task_struct *task); extern void fpsimd_release_thread(struct task_struct *task); extern void fpsimd_dup_sve(struct task_struct *dst, struct task_struct const *src); +extern void fpsimd_sync_to_sve(struct task_struct *task); +extern void sve_sync_to_fpsimd(struct task_struct *task); +extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task); + extern int sve_set_vector_length(struct task_struct *task, unsigned long vl, unsigned long flags); @@ -103,6 +110,10 @@ static void __maybe_unused sve_alloc(struct task_struct *task) { } static void __maybe_unused fpsimd_release_thread(struct task_struct *task) { } static void __maybe_unused fpsimd_dup_sve(struct task_struct *dst, struct task_struct const *src) { } +static void __maybe_unused sve_sync_to_fpsimd(struct task_struct *task) { } +static void __maybe_unused sve_sync_from_fpsimd_zeropad( + struct task_struct *task) { } + static void __maybe_unused sve_init_vq_map(void) { } static void __maybe_unused sve_update_vq_map(void) { } static int __maybe_unused 
sve_verify_vq_map(void) { return 0; } diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h index d1ff83d..1915ab0 100644 --- a/arch/arm64/include/uapi/asm/ptrace.h +++ b/arch/arm64/include/uapi/asm/ptrace.h @@ -22,6 +22,7 @@ #include <linux/types.h> #include <asm/hwcap.h> +#include <asm/sigcontext.h> /* @@ -63,6 +64,8 @@ #ifndef __ASSEMBLY__ +#include <linux/prctl.h> + /* * User structures for general purpose, floating point and debug registers. */ @@ -90,6 +93,138 @@ struct user_hwdebug_state { } dbg_regs[16]; }; +/* SVE/FP/SIMD state (NT_ARM_SVE) */ + +struct user_sve_header { + __u32 size; /* total meaningful regset content in bytes */ + __u32 max_size; /* maxmium possible size for this thread */ + __u16 vl; /* current vector length */ + __u16 max_vl; /* maximum possible vector length */ + __u16 flags; + __u16 __reserved; +}; + +/* Definitions for user_sve_header.flags: */ +#define SVE_PT_REGS_MASK (1 << 0) + +/* Flags: must be kept in sync with prctl interface in <linux/ptrace.h> */ +#define SVE_PT_REGS_FPSIMD 0 +#define SVE_PT_REGS_SVE SVE_PT_REGS_MASK + +#define SVE_PT_VL_INHERIT (PR_SVE_VL_INHERIT >> 16) +#define SVE_PT_VL_ONEXEC (PR_SVE_SET_VL_ONEXEC >> 16) + + +/* + * The remainder of the SVE state follows struct user_sve_header. The + * total size of the SVE state (including header) depends on the + * metadata in the header: SVE_PT_SIZE(vq, flags) gives the total size + * of the state in bytes, including the header. + * + * Refer to <asm/sigcontext.h> for details of how to pass the correct + * "vq" argument to these macros. + */ + +/* Offset from the start of struct user_sve_header to the register data */ +#define SVE_PT_REGS_OFFSET \ + ((sizeof(struct sve_context) + (SVE_VQ_BYTES - 1)) \ + / SVE_VQ_BYTES * SVE_VQ_BYTES) + +/* + * The register data content and layout depends on the value of the + * flags field. 
+ */ + +/* + * (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD case: + * + * The payload starts at offset SVE_PT_FPSIMD_OFFSET, and is of type + * struct user_fpsimd_state. Additional data might be appended in the + * future: use SVE_PT_FPSIMD_SIZE(vq, flags) to compute the total size. + * SVE_PT_FPSIMD_SIZE(vq, flags) will never be less than + * sizeof(struct user_fpsimd_state). + */ + +#define SVE_PT_FPSIMD_OFFSET SVE_PT_REGS_OFFSET + +#define SVE_PT_FPSIMD_SIZE(vq, flags) (sizeof(struct user_fpsimd_state)) + +/* + * (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE case: + * + * The payload starts at offset SVE_PT_SVE_OFFSET, and is of size + * SVE_PT_SVE_SIZE(vq, flags). + * + * Additional macros describe the contents and layout of the payload. + * For each, SVE_PT_SVE_x_OFFSET(args) is the start offset relative to + * the start of struct user_sve_header, and SVE_PT_SVE_x_SIZE(args) is + * the size in bytes: + * + * x type description + * - ---- ----------- + * ZREGS \ + * ZREG | + * PREGS | refer to <asm/sigcontext.h> + * PREG | + * FFR / + * + * FPSR uint32_t FPSR + * FPCR uint32_t FPCR + * + * Additional data might be appended in the future. 
+ */ + +#define SVE_PT_SVE_ZREG_SIZE(vq) SVE_SIG_ZREG_SIZE(vq) +#define SVE_PT_SVE_PREG_SIZE(vq) SVE_SIG_PREG_SIZE(vq) +#define SVE_PT_SVE_FFR_SIZE(vq) SVE_SIG_FFR_SIZE(vq) +#define SVE_PT_SVE_FPSR_SIZE sizeof(__u32) +#define SVE_PT_SVE_FPCR_SIZE sizeof(__u32) + +#define __SVE_SIG_TO_PT(offset) \ + ((offset) - SVE_SIG_REGS_OFFSET + SVE_PT_REGS_OFFSET) + +#define SVE_PT_SVE_OFFSET SVE_PT_REGS_OFFSET + +#define SVE_PT_SVE_ZREGS_OFFSET \ + __SVE_SIG_TO_PT(SVE_SIG_ZREGS_OFFSET) +#define SVE_PT_SVE_ZREG_OFFSET(vq, n) \ + __SVE_SIG_TO_PT(SVE_SIG_ZREG_OFFSET(vq, n)) +#define SVE_PT_SVE_ZREGS_SIZE(vq) \ + (SVE_PT_SVE_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_PT_SVE_ZREGS_OFFSET) + +#define SVE_PT_SVE_PREGS_OFFSET(vq) \ + __SVE_SIG_TO_PT(SVE_SIG_PREGS_OFFSET(vq)) +#define SVE_PT_SVE_PREG_OFFSET(vq, n) \ + __SVE_SIG_TO_PT(SVE_SIG_PREG_OFFSET(vq, n)) +#define SVE_PT_SVE_PREGS_SIZE(vq) \ + (SVE_PT_SVE_PREG_OFFSET(vq, SVE_NUM_PREGS) - \ + SVE_PT_SVE_PREGS_OFFSET(vq)) + +#define SVE_PT_SVE_FFR_OFFSET(vq) \ + __SVE_SIG_TO_PT(SVE_SIG_FFR_OFFSET(vq)) + +#define SVE_PT_SVE_FPSR_OFFSET(vq) \ + ((SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq) + \ + (SVE_VQ_BYTES - 1)) \ + / SVE_VQ_BYTES * SVE_VQ_BYTES) +#define SVE_PT_SVE_FPCR_OFFSET(vq) \ + (SVE_PT_SVE_FPSR_OFFSET(vq) + SVE_PT_SVE_FPSR_SIZE) + +/* + * Any future extension appended after FPCR must be aligned to the next + * 128-bit boundary. + */ + +#define SVE_PT_SVE_SIZE(vq, flags) \ + ((SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE \ + - SVE_PT_SVE_OFFSET + (SVE_VQ_BYTES - 1)) \ + / SVE_VQ_BYTES * SVE_VQ_BYTES) + +#define SVE_PT_SIZE(vq, flags) \ + (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ? 
\ + SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags) \ + : SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags)) + #endif /* __ASSEMBLY__ */ #endif /* _UAPI__ASM_PTRACE_H */ diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index fff9fcf..361c019 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -303,6 +303,37 @@ void sve_alloc(struct task_struct *task) BUG_ON(!task->thread.sve_state); } +void fpsimd_sync_to_sve(struct task_struct *task) +{ + if (!test_tsk_thread_flag(task, TIF_SVE)) + fpsimd_to_sve(task); +} + +void sve_sync_to_fpsimd(struct task_struct *task) +{ + if (test_tsk_thread_flag(task, TIF_SVE)) + sve_to_fpsimd(task); +} + +void sve_sync_from_fpsimd_zeropad(struct task_struct *task) +{ + unsigned int vq; + void *sst = task->thread.sve_state; + struct fpsimd_state const *fst = &task->thread.fpsimd_state; + unsigned int i; + + if (!test_tsk_thread_flag(task, TIF_SVE)) + return; + + vq = sve_vq_from_vl(task->thread.sve_vl); + + memset(sst, 0, SVE_SIG_REGS_SIZE(vq)); + + for (i = 0; i < 32; ++i) + memcpy(ZREG(sst, vq, i), &fst->vregs[i], + sizeof(fst->vregs[i])); +} + /* * Handle SVE state across fork(): * @@ -459,10 +490,17 @@ static void __init sve_efi_setup(void) * This is evidence of a crippled system and we are returning void, * so no attempt is made to handle this situation here. 
*/ - BUG_ON(!sve_vl_valid(sve_max_vl)); + if (!sve_vl_valid(sve_max_vl)) + goto fail; + efi_sve_state = __alloc_percpu( SVE_SIG_REGS_SIZE(sve_vq_from_vl(sve_max_vl)), SVE_VQ_BYTES); if (!efi_sve_state) + goto fail; + + return; + +fail: panic("Cannot allocate percpu memory for EFI SVE save/restore"); } diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index 9cbb612..5ef4735b 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -32,6 +32,7 @@ #include <linux/security.h> #include <linux/init.h> #include <linux/signal.h> +#include <linux/string.h> #include <linux/uaccess.h> #include <linux/perf_event.h> #include <linux/hw_breakpoint.h> @@ -40,6 +41,7 @@ #include <linux/elf.h> #include <asm/compat.h> +#include <asm/cpufeature.h> #include <asm/debug-monitors.h> #include <asm/pgtable.h> #include <asm/stacktrace.h> @@ -618,33 +620,66 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset, /* * TODO: update fp accessors for lazy context switching (sync/flush hwstate) */ -static int fpr_get(struct task_struct *target, const struct user_regset *regset, - unsigned int pos, unsigned int count, - void *kbuf, void __user *ubuf) +static int __fpr_get(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + void *kbuf, void __user *ubuf, unsigned int start_pos) { struct user_fpsimd_state *uregs; + + sve_sync_to_fpsimd(target); + uregs = &target->thread.fpsimd_state.user_fpsimd; + return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, + start_pos, start_pos + sizeof(*uregs)); +} + +static int fpr_get(struct task_struct *target, const struct user_regset *regset, + unsigned int pos, unsigned int count, + void *kbuf, void __user *ubuf) +{ if (target == current) fpsimd_preserve_current_state(); - return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, 0, -1); + return __fpr_get(target, regset, pos, count, kbuf, ubuf, 0); } -static int fpr_set(struct 
task_struct *target, const struct user_regset *regset, - unsigned int pos, unsigned int count, - const void *kbuf, const void __user *ubuf) +static int __fpr_set(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf, + unsigned int start_pos) { int ret; struct user_fpsimd_state newstate = target->thread.fpsimd_state.user_fpsimd; - ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, 0, -1); + sve_sync_to_fpsimd(target); + + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, + start_pos, start_pos + sizeof(newstate)); if (ret) return ret; target->thread.fpsimd_state.user_fpsimd = newstate; + + return ret; +} + +static int fpr_set(struct task_struct *target, const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf) +{ + int ret; + + ret = __fpr_set(target, regset, pos, count, kbuf, ubuf, 0); + if (ret) + return ret; + + sve_sync_from_fpsimd_zeropad(target); fpsimd_flush_task_state(target); + return ret; } @@ -702,6 +737,210 @@ static int system_call_set(struct task_struct *target, return ret; } +#ifdef CONFIG_ARM64_SVE + +static void sve_init_header_from_task(struct user_sve_header *header, + struct task_struct *target) +{ + unsigned int vq; + + memset(header, 0, sizeof(*header)); + + header->flags = test_tsk_thread_flag(target, TIF_SVE) ? 
+ SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD; + if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT)) + header->flags |= SVE_PT_VL_INHERIT; + + header->vl = target->thread.sve_vl; + vq = sve_vq_from_vl(header->vl); + + if (WARN_ON(!sve_vl_valid(sve_max_vl))) + header->max_vl = header->vl; + + header->size = SVE_PT_SIZE(vq, header->flags); + header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl), + SVE_PT_REGS_SVE); +} + +static unsigned int sve_size_from_header(struct user_sve_header const *header) +{ + return ALIGN(header->size, SVE_VQ_BYTES); +} + +static unsigned int sve_get_size(struct task_struct *target, + const struct user_regset *regset) +{ + struct user_sve_header header; + + if (!system_supports_sve()) + return 0; + + sve_init_header_from_task(&header, target); + return sve_size_from_header(&header); +} + +static int sve_get(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + void *kbuf, void __user *ubuf) +{ + int ret; + struct user_sve_header header; + unsigned int vq; + unsigned long start, end; + + if (!system_supports_sve()) + return -EINVAL; + + /* Header */ + sve_init_header_from_task(&header, target); + vq = sve_vq_from_vl(header.vl); + + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &header, + 0, sizeof(header)); + if (ret) + return ret; + + if (target == current) + fpsimd_preserve_current_state(); + + /* Registers: FPSIMD-only case */ + + BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header)); + if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) + return __fpr_get(target, regset, pos, count, kbuf, ubuf, + SVE_PT_FPSIMD_OFFSET); + + /* Otherwise: full SVE case */ + + BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header)); + start = SVE_PT_SVE_OFFSET; + end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq); + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, + target->thread.sve_state, + start, end); + if (ret) + return ret; + + start = end; + end = 
SVE_PT_SVE_FPSR_OFFSET(vq); + ret = user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf, + start, end); + if (ret) + return ret; + + /* + * Copy fpsr, and fpcr which must follow contiguously in + * struct fpsimd_state: + */ + start = end; + end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE; + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, + &target->thread.fpsimd_state.fpsr, + start, end); + if (ret) + return ret; + + start = end; + end = sve_size_from_header(&header); + return user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf, + start, end); +} + +static int sve_set(struct task_struct *target, + const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf) +{ + int ret; + struct user_sve_header header; + unsigned int vq; + unsigned long start, end; + + if (!system_supports_sve()) + return -EINVAL; + + /* Header */ + if (count < sizeof(header)) + return -EINVAL; + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &header, + 0, sizeof(header)); + if (ret) + goto out; + + /* + * Apart from PT_SVE_REGS_MASK, all PT_SVE_* flags are consumed by + * sve_set_vector_length(), which will also validate them for us: + */ + ret = sve_set_vector_length(target, header.vl, + header.flags & ~SVE_PT_REGS_MASK); + if (ret) + goto out; + + /* Actual VL set may be less than the user asked for: */ + vq = sve_vq_from_vl(target->thread.sve_vl); + + /* Registers: FPSIMD-only case */ + + BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header)); + if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) { + sve_sync_to_fpsimd(target); + + ret = __fpr_set(target, regset, pos, count, kbuf, ubuf, + SVE_PT_FPSIMD_OFFSET); + clear_tsk_thread_flag(target, TIF_SVE); + goto out; + } + + /* Otherwise: full SVE case */ + + /* + * If setting a different VL from the requested VL and there is + * register data, the data layout will be wrong: don't even + * try to set the registers in this case. 
+ */ + if (count && vq != sve_vq_from_vl(header.vl)) { + ret = -EIO; + goto out; + } + + sve_alloc(target); + fpsimd_sync_to_sve(target); + set_tsk_thread_flag(target, TIF_SVE); + + BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header)); + start = SVE_PT_SVE_OFFSET; + end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq); + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, + target->thread.sve_state, + start, end); + if (ret) + goto out; + + start = end; + end = SVE_PT_SVE_FPSR_OFFSET(vq); + ret = user_regset_copyin_ignore(&pos, &count, &kbuf, &ubuf, + start, end); + if (ret) + goto out; + + /* + * Copy fpsr, and fpcr which must follow contiguously in + * struct fpsimd_state: + */ + start = end; + end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE; + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, + &target->thread.fpsimd_state.fpsr, + start, end); + +out: + fpsimd_flush_task_state(target); + return ret; +} + +#endif /* CONFIG_ARM64_SVE */ + enum aarch64_regset { REGSET_GPR, REGSET_FPR, @@ -711,6 +950,9 @@ enum aarch64_regset { REGSET_HW_WATCH, #endif REGSET_SYSTEM_CALL, +#ifdef CONFIG_ARM64_SVE + REGSET_SVE, +#endif }; static const struct user_regset aarch64_regsets[] = { @@ -768,6 +1010,18 @@ static const struct user_regset aarch64_regsets[] = { .get = system_call_get, .set = system_call_set, }, +#ifdef CONFIG_ARM64_SVE + [REGSET_SVE] = { /* Scalable Vector Extension */ + .core_note_type = NT_ARM_SVE, + .n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE), + SVE_VQ_BYTES), + .size = SVE_VQ_BYTES, + .align = SVE_VQ_BYTES, + .get = sve_get, + .set = sve_set, + .get_size = sve_get_size, + }, +#endif }; static const struct user_regset_view user_aarch64_view = { diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index b5280db..735b8f4 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -416,6 +416,7 @@ typedef struct elf64_shdr { #define NT_ARM_HW_BREAK 0x402 /* ARM hardware breakpoint registers */ #define 
NT_ARM_HW_WATCH 0x403 /* ARM hardware watchpoint registers */ #define NT_ARM_SYSTEM_CALL 0x404 /* ARM system call number */ +#define NT_ARM_SVE 0x405 /* ARM Scalable Vector Extension registers */ #define NT_METAG_CBUF 0x500 /* Metag catch buffer registers */ #define NT_METAG_RPIPE 0x501 /* Metag read pipeline state */ #define NT_METAG_TLS 0x502 /* Metag TLS pointer */ -- 2.1.4
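
As a rough illustration of the NT_ARM_SVE layout that the patch above describes, the following userspace sketch recomputes the header-relative offsets for the SVE_PT_REGS_SVE case. It is not the kernel's code. All `sketch_`/`SKETCH_` names are hypothetical, and the constants (16 bytes per quadword, 32 Z registers, 16 P registers, P registers and FFR at one eighth of a Z register, register data starting 16 bytes after the start of the header) are assumptions based on a reading of <asm/sigcontext.h>, which this patch does not show.

```c
/* Assumed constants; the real definitions live in <asm/sigcontext.h>. */
#define SKETCH_VQ_BYTES    16u  /* bytes per 128-bit quadword */
#define SKETCH_NUM_ZREGS   32u  /* assumed number of Z registers */
#define SKETCH_NUM_PREGS   16u  /* assumed number of P registers */
#define SKETCH_REGS_OFFSET 16u  /* assumed value of SVE_PT_REGS_OFFSET */

/* Round x up to the next multiple of 16 bytes. */
unsigned int sketch_align16(unsigned int x)
{
	return (x + SKETCH_VQ_BYTES - 1) / SKETCH_VQ_BYTES * SKETCH_VQ_BYTES;
}

/* Convert a vector length in bytes to a count of quadwords. */
unsigned int sketch_vq_from_vl(unsigned int vl)
{
	return vl / SKETCH_VQ_BYTES;
}

/* Offset of Z register n, relative to the start of the header. */
unsigned int sketch_zreg_offset(unsigned int vq, unsigned int n)
{
	return SKETCH_REGS_OFFSET + n * vq * SKETCH_VQ_BYTES;
}

/* Offset of P register n; the P registers follow the last Z register. */
unsigned int sketch_preg_offset(unsigned int vq, unsigned int n)
{
	return sketch_zreg_offset(vq, SKETCH_NUM_ZREGS)
		+ n * vq * (SKETCH_VQ_BYTES / 8);
}

/* FFR follows the last P register and has the size of one P register. */
unsigned int sketch_ffr_offset(unsigned int vq)
{
	return sketch_preg_offset(vq, SKETCH_NUM_PREGS);
}

/* FPSR starts at the next 16-byte boundary after FFR. */
unsigned int sketch_fpsr_offset(unsigned int vq)
{
	return sketch_align16(sketch_ffr_offset(vq)
			      + vq * (SKETCH_VQ_BYTES / 8));
}

/* FPCR follows FPSR directly; both are 32 bits wide. */
unsigned int sketch_fpcr_offset(unsigned int vq)
{
	return sketch_fpsr_offset(vq) + 4;
}

/* Total regset size, header included, mirroring SVE_PT_SIZE() for SVE data. */
unsigned int sketch_sve_total_size(unsigned int vq)
{
	unsigned int payload = sketch_fpcr_offset(vq) + 4 - SKETCH_REGS_OFFSET;

	return SKETCH_REGS_OFFSET + sketch_align16(payload);
}
```

Under these assumptions, a 128-bit vector length (vl = 16, vq = 1) puts FPSR at offset 576 and gives a total regset size of 592 bytes. A debugger would normally use the SVE_PT_* macros from the uapi header rather than redoing this arithmetic by hand.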
On Wed, Sep 06, 2017 at 04:21:50PM +0000, Okamoto, Takayuki wrote:
> Hi Dave,
>
> I am an engineer of the postK computer from Fujitsu.
>
> When I tried to read "max_vl" by ptrace with this patch on our local SVE
> simulator, it was read as zero.
> I think the cause of this incident is that "max_vl" is set as "header->vl"
> only on warning case in sve_init_header_from_task().
> "max_vl" should be set up also on normal case, like the following patch.
>
>
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -755,6 +755,8 @@ static void sve_init_header_from_task(struct user_sve_header *header,
>
>  	if (WARN_ON(!sve_vl_valid(sve_max_vl)))
>  		header->max_vl = header->vl;
> +	else
> +		header->max_vl = sve_max_vl;
>
>  	header->size = SVE_PT_SIZE(vq, header->flags);
>  	header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl),

Hi, thanks for reporting this.

It looks like a refactoring mistake I made while removing BUG_ON()s,
which I missed in my testing.

Your fix looks correct and seems to work.  For stylistic reasons, I may
write it like this instead, but the effect should be the same:

	header->max_vl = sve_max_vl;
	if (WARN_ON(!sve_vl_valid(sve_max_vl)))
		header->max_vl = header->vl;

Cheers
---Dave

>
>
> Best regards,
> Takayuki Okamoto
>
> -----Original Message-----
> From: gdb-owner@sourceware.org [mailto:gdb-owner@sourceware.org] On Behalf Of Dave Martin
> Sent: Friday, September 1, 2017 2:01 AM
> To: linux-arm-kernel@lists.infradead.org
> Cc: Catalin Marinas <catalin.marinas@arm.com>; Will Deacon <will.deacon@arm.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>; Alex Bennée <alex.bennee@linaro.org>; Szabolcs Nagy <szabolcs.nagy@arm.com>; Richard Sandiford <richard.sandiford@arm.com>; kvmarm@lists.cs.columbia.edu; libc-alpha@sourceware.org; linux-arch@vger.kernel.org; gdb@sourceware.org; Alan Hayward <alan.hayward@arm.com>; Yao Qi <Yao.Qi@arm.com>; Oleg Nesterov <oleg@redhat.com>; Alexander Viro <viro@zeniv.linux.org.uk>
> Subject: [PATCH v2 19/28] arm64/sve: ptrace and ELF coredump support
>
> [...]

> @@ -702,6 +737,210 @@ static int system_call_set(struct task_struct *target,
>  	return ret;
>  }
>
> +#ifdef CONFIG_ARM64_SVE
> +
> +static void sve_init_header_from_task(struct user_sve_header *header,
> +				      struct task_struct *target)
> +{
> +	unsigned int vq;
> +
> +	memset(header, 0, sizeof(*header));
> +
> +	header->flags = test_tsk_thread_flag(target, TIF_SVE) ?
> +		SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD;
> +	if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
> +		header->flags |= SVE_PT_VL_INHERIT;
> +
> +	header->vl = target->thread.sve_vl;
> +	vq = sve_vq_from_vl(header->vl);
> +
> +	if (WARN_ON(!sve_vl_valid(sve_max_vl)))
> +		header->max_vl = header->vl;
> +
> +	header->size = SVE_PT_SIZE(vq, header->flags);
> +	header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl),
> +				      SVE_PT_REGS_SVE);
> +}

[...]
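
The exchange above can be reproduced outside the kernel with a small userspace model. This is only a sketch: the `sketch_` names are hypothetical, the struct merely mirrors the fields of struct user_sve_header shown in the patch, and `sketch_vl_valid()` is an assumed simplification of sve_vl_valid() (a non-zero multiple of 16 bytes), whose real definition is not part of this patch. It shows why max_vl reads back as zero in v2: the header is cleared first, and max_vl is then assigned only on the warning path.

```c
#include <string.h>

/* Hypothetical stand-in with the same fields as struct user_sve_header. */
struct sketch_sve_header {
	unsigned int size;
	unsigned int max_size;
	unsigned short vl;
	unsigned short max_vl;
	unsigned short flags;
	unsigned short reserved;
};

/* Assumed validity rule: a non-zero multiple of 16 bytes. */
int sketch_vl_valid(unsigned int vl)
{
	return vl != 0 && vl % 16 == 0;
}

/* Models the v2 code: max_vl is set only when the system value is bad. */
void sketch_init_header_v2(struct sketch_sve_header *h,
			   unsigned int task_vl, unsigned int system_max_vl)
{
	memset(h, 0, sizeof(*h));
	h->vl = task_vl;

	if (!sketch_vl_valid(system_max_vl))	/* WARN_ON() in the kernel */
		h->max_vl = h->vl;
	/* In the normal case max_vl is never assigned and stays 0. */
}

/* Models the suggested fix: assign first, then patch up if invalid. */
void sketch_init_header_fixed(struct sketch_sve_header *h,
			      unsigned int task_vl, unsigned int system_max_vl)
{
	memset(h, 0, sizeof(*h));
	h->vl = task_vl;

	h->max_vl = system_max_vl;
	if (!sketch_vl_valid(system_max_vl))	/* WARN_ON() in the kernel */
		h->max_vl = h->vl;
}
```

With a valid system maximum, the v2 model leaves max_vl at zero while the fixed model reports the system maximum. Both fall back to the task's own vector length when the system maximum is invalid.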
Hi Dave,

Thank you for your reply.

> Your fix looks correct and seems to work.  For stylistic reasons, I may
> write it like this instead, but the effect should be the same:
>
> 	header->max_vl = sve_max_vl;
> 	if (WARN_ON(!sve_vl_valid(sve_max_vl)))
> 		header->max_vl = header->vl;

It is better than my fix.
Please, apply it at next version.

Best regards,
Takayuki Okamoto

> -----Original Message-----
> From: linux-arm-kernel [mailto:linux-arm-kernel-bounces@lists.infradead.org] On Behalf Of Dave Martin
> Sent: Thursday, September 7, 2017 3:17 AM
> To: Okamoto, Takayuki <tokamoto@jp.fujitsu.com>
> Cc: linux-arch@vger.kernel.org; libc-alpha@sourceware.org; Ard Biesheuvel <ard.biesheuvel@linaro.org>; Szabolcs Nagy <szabolcs.nagy@arm.com>; gdb@sourceware.org; Yao Qi <Yao.Qi@arm.com>; Alan Hayward <alan.hayward@arm.com>; Will Deacon <will.deacon@arm.com>; Oleg Nesterov <oleg@redhat.com>; Richard Sandiford <richard.sandiford@arm.com>; Alexander Viro <viro@zeniv.linux.org.uk>; Catalin Marinas <catalin.marinas@arm.com>; Alex Bennée <alex.bennee@linaro.org>; kvmarm@lists.cs.columbia.edu; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH v2 19/28] arm64/sve: ptrace and ELF coredump support
>
> [...]
On Thu, Sep 07, 2017 at 05:11:45AM +0000, Okamoto, Takayuki wrote:
> Hi Dave,
>
> Thank you for your reply.
>
> > Your fix looks correct and seems to work.  For stylistic reasons, I may
> > write it like this instead, but the effect should be the same:
> >
> > 	header->max_vl = sve_max_vl;
> > 	if (WARN_ON(!sve_vl_valid(sve_max_vl)))
> > 		header->max_vl = header->vl;
>
> It is better than my fix.
> Please, apply it at next version.

I've rebased to v4.13 and pushed a branch to track fixes against v2,
here:

 * http://linux-arm.org/git?p=linux-dm.git;a=shortlog;h=refs/heads/sve/v2%2Bfixes

 * git://linux-arm.org/linux-dm.git sve/v2+fixes

Cheers
---Dave
Dave Martin <Dave.Martin@arm.com> writes:

> This patch defines and implements a new regset NT_ARM_SVE, which
> describes a thread's SVE register state.  This allows a debugger to
> manipulate the SVE state, as well as being included in ELF
> coredumps for post-mortem debugging.
>
> Because the regset size and layout are dependent on the thread's
> current vector length, it is not possible to define a C struct to
> describe the regset contents as is done for existing regsets.
> Instead, and for the same reasons, NT_ARM_SVE is based on the
> freeform variable-layout approach used for the SVE signal frame.
>
> Additionally, to reduce debug overhead when debugging threads that
> might or might not have live SVE register state, NT_ARM_SVE may be
> presented in one of two different formats: the old struct
> user_fpsimd_state format is embedded for describing the state of a
> thread with no live SVE state, whereas a new variable-layout
> structure is embedded for describing live SVE state.  This avoids a
> debugger needing to poll NT_PRFPREG in addition to NT_ARM_SVE, and
> allows existing userspace code to handle the non-SVE case without
> too much modification.
>
> For this to work, NT_ARM_SVE is defined with a fixed-format header
> of type struct user_sve_header, which the recipient can use to
> figure out the content, size and layout of the rest of the regset.
> Accessor macros are defined to allow the vector-length-dependent
> parts of the regset to be manipulated.
>
> Signed-off-by: Alan Hayward <alan.hayward@arm.com>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Cc: Alex Bennée <alex.bennee@linaro.org>
>
> ---
>
> Changes since v1
> ----------------
>
> Other changes related to Alex Bennée's comments:
>
>  * Migrate to SVE_VQ_BYTES instead of magic numbers.
>
> Requested by Alex Bennée:
>
>  * Thin out BUG_ON()s:
>    Redundant BUG_ON()s and ones that just check invariants are removed.
>    Important sanity-checks are migrated to WARN_ON()s, with some
>    minimal best-effort patch-up code.
>
> Other:
>
>  * [ABI fix] Bail out with -EIO if attempting to set the
>    SVE regs for an unsupported VL, instead of misparsing the regset data.
>
>  * Replace some in-kernel open-coded arithmetic with ALIGN()/
>    DIV_ROUND_UP().
> ---
>  arch/arm64/include/asm/fpsimd.h      |  13 +-
>  arch/arm64/include/uapi/asm/ptrace.h | 135 ++++++++++++++++++
>  arch/arm64/kernel/fpsimd.c           |  40 +++++-
>  arch/arm64/kernel/ptrace.c           | 270 +++++++++++++++++++++++++++++++++--
>  include/uapi/linux/elf.h             |   1 +
>  5 files changed, 449 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 6c22624..2723cca 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -38,13 +38,16 @@ struct fpsimd_state {
>  			__uint128_t vregs[32];
>  			u32 fpsr;
>  			u32 fpcr;
> +			/*
> +			 * For ptrace compatibility, pad to next 128-bit
> +			 * boundary here if extending this struct.
> + */ > }; > }; > /* the id of the last cpu to have restored this state */ > unsigned int cpu; > }; > > - > #if defined(__KERNEL__) && defined(CONFIG_COMPAT) > /* Masks for extracting the FPSR and FPCR from the FPSCR */ > #define VFP_FPSCR_STAT_MASK 0xf800009f > @@ -89,6 +92,10 @@ extern void sve_alloc(struct task_struct *task); > extern void fpsimd_release_thread(struct task_struct *task); > extern void fpsimd_dup_sve(struct task_struct *dst, > struct task_struct const *src); > +extern void fpsimd_sync_to_sve(struct task_struct *task); > +extern void sve_sync_to_fpsimd(struct task_struct *task); > +extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task); > + > extern int sve_set_vector_length(struct task_struct *task, > unsigned long vl, unsigned long flags); > > @@ -103,6 +110,10 @@ static void __maybe_unused sve_alloc(struct task_struct *task) { } > static void __maybe_unused fpsimd_release_thread(struct task_struct *task) { } > static void __maybe_unused fpsimd_dup_sve(struct task_struct *dst, > struct task_struct const *src) { } > +static void __maybe_unused sve_sync_to_fpsimd(struct task_struct *task) { } > +static void __maybe_unused sve_sync_from_fpsimd_zeropad( > + struct task_struct *task) { } > + > static void __maybe_unused sve_init_vq_map(void) { } > static void __maybe_unused sve_update_vq_map(void) { } > static int __maybe_unused sve_verify_vq_map(void) { return 0; } > diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h > index d1ff83d..1915ab0 100644 > --- a/arch/arm64/include/uapi/asm/ptrace.h > +++ b/arch/arm64/include/uapi/asm/ptrace.h > @@ -22,6 +22,7 @@ > #include <linux/types.h> > > #include <asm/hwcap.h> > +#include <asm/sigcontext.h> > > > /* > @@ -63,6 +64,8 @@ > > #ifndef __ASSEMBLY__ > > +#include <linux/prctl.h> > + > /* > * User structures for general purpose, floating point and debug registers. 
> */ > @@ -90,6 +93,138 @@ struct user_hwdebug_state { > } dbg_regs[16]; > }; > > +/* SVE/FP/SIMD state (NT_ARM_SVE) */ > + > +struct user_sve_header { > + __u32 size; /* total meaningful regset content in bytes */ > + __u32 max_size; /* maxmium possible size for this thread */ > + __u16 vl; /* current vector length */ > + __u16 max_vl; /* maximum possible vector length */ > + __u16 flags; > + __u16 __reserved; > +}; > + > +/* Definitions for user_sve_header.flags: */ > +#define SVE_PT_REGS_MASK (1 << 0) > + > +/* Flags: must be kept in sync with prctl interface in > <linux/ptrace.h> */ Which flags? We base some on PR_foo flags but we seem to shift them anyway so where is the requirement for them to match from? > +#define SVE_PT_REGS_FPSIMD 0 > +#define SVE_PT_REGS_SVE SVE_PT_REGS_MASK > + > +#define SVE_PT_VL_INHERIT (PR_SVE_VL_INHERIT >> 16) > +#define SVE_PT_VL_ONEXEC (PR_SVE_SET_VL_ONEXEC >> 16) > + > + > +/* > + * The remainder of the SVE state follows struct user_sve_header. The > + * total size of the SVE state (including header) depends on the > + * metadata in the header: SVE_PT_SIZE(vq, flags) gives the total size > + * of the state in bytes, including the header. > + * > + * Refer to <asm/sigcontext.h> for details of how to pass the correct > + * "vq" argument to these macros. > + */ > + > +/* Offset from the start of struct user_sve_header to the register data */ > +#define SVE_PT_REGS_OFFSET \ > + ((sizeof(struct sve_context) + (SVE_VQ_BYTES - 1)) \ > + / SVE_VQ_BYTES * SVE_VQ_BYTES) > + > +/* > + * The register data content and layout depends on the value of the > + * flags field. > + */ > + > +/* > + * (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD case: > + * > + * The payload starts at offset SVE_PT_FPSIMD_OFFSET, and is of type > + * struct user_fpsimd_state. Additional data might be appended in the > + * future: use SVE_PT_FPSIMD_SIZE(vq, flags) to compute the total size. 
> + * SVE_PT_FPSIMD_SIZE(vq, flags) will never be less than > + * sizeof(struct user_fpsimd_state). > + */ > + > +#define SVE_PT_FPSIMD_OFFSET SVE_PT_REGS_OFFSET > + > +#define SVE_PT_FPSIMD_SIZE(vq, flags) (sizeof(struct user_fpsimd_state)) > + > +/* > + * (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE case: > + * > + * The payload starts at offset SVE_PT_SVE_OFFSET, and is of size > + * SVE_PT_SVE_SIZE(vq, flags). > + * > + * Additional macros describe the contents and layout of the payload. > + * For each, SVE_PT_SVE_x_OFFSET(args) is the start offset relative to > + * the start of struct user_sve_header, and SVE_PT_SVE_x_SIZE(args) is > + * the size in bytes: > + * > + * x type description > + * - ---- ----------- > + * ZREGS \ > + * ZREG | > + * PREGS | refer to <asm/sigcontext.h> > + * PREG | > + * FFR / > + * > + * FPSR uint32_t FPSR > + * FPCR uint32_t FPCR > + * > + * Additional data might be appended in the future. > + */ > + > +#define SVE_PT_SVE_ZREG_SIZE(vq) SVE_SIG_ZREG_SIZE(vq) > +#define SVE_PT_SVE_PREG_SIZE(vq) SVE_SIG_PREG_SIZE(vq) > +#define SVE_PT_SVE_FFR_SIZE(vq) SVE_SIG_FFR_SIZE(vq) > +#define SVE_PT_SVE_FPSR_SIZE sizeof(__u32) > +#define SVE_PT_SVE_FPCR_SIZE sizeof(__u32) > + > +#define __SVE_SIG_TO_PT(offset) \ > + ((offset) - SVE_SIG_REGS_OFFSET + SVE_PT_REGS_OFFSET) > + > +#define SVE_PT_SVE_OFFSET SVE_PT_REGS_OFFSET > + > +#define SVE_PT_SVE_ZREGS_OFFSET \ > + __SVE_SIG_TO_PT(SVE_SIG_ZREGS_OFFSET) > +#define SVE_PT_SVE_ZREG_OFFSET(vq, n) \ > + __SVE_SIG_TO_PT(SVE_SIG_ZREG_OFFSET(vq, n)) > +#define SVE_PT_SVE_ZREGS_SIZE(vq) \ > + (SVE_PT_SVE_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_PT_SVE_ZREGS_OFFSET) > + > +#define SVE_PT_SVE_PREGS_OFFSET(vq) \ > + __SVE_SIG_TO_PT(SVE_SIG_PREGS_OFFSET(vq)) > +#define SVE_PT_SVE_PREG_OFFSET(vq, n) \ > + __SVE_SIG_TO_PT(SVE_SIG_PREG_OFFSET(vq, n)) > +#define SVE_PT_SVE_PREGS_SIZE(vq) \ > + (SVE_PT_SVE_PREG_OFFSET(vq, SVE_NUM_PREGS) - \ > + SVE_PT_SVE_PREGS_OFFSET(vq)) > + > +#define SVE_PT_SVE_FFR_OFFSET(vq) \ 
> + __SVE_SIG_TO_PT(SVE_SIG_FFR_OFFSET(vq)) > + > +#define SVE_PT_SVE_FPSR_OFFSET(vq) \ > + ((SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq) + \ > + (SVE_VQ_BYTES - 1)) \ > + / SVE_VQ_BYTES * SVE_VQ_BYTES) > +#define SVE_PT_SVE_FPCR_OFFSET(vq) \ > + (SVE_PT_SVE_FPSR_OFFSET(vq) + SVE_PT_SVE_FPSR_SIZE) > + > +/* > + * Any future extension appended after FPCR must be aligned to the next > + * 128-bit boundary. > + */ > + > +#define SVE_PT_SVE_SIZE(vq, flags) \ > + ((SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE \ > + - SVE_PT_SVE_OFFSET + (SVE_VQ_BYTES - 1)) \ > + / SVE_VQ_BYTES * SVE_VQ_BYTES) > + > +#define SVE_PT_SIZE(vq, flags) \ > + (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ? \ > + SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags) \ > + : SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags)) > + > #endif /* __ASSEMBLY__ */ > > #endif /* _UAPI__ASM_PTRACE_H */ > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c > index fff9fcf..361c019 100644 > --- a/arch/arm64/kernel/fpsimd.c > +++ b/arch/arm64/kernel/fpsimd.c > @@ -303,6 +303,37 @@ void sve_alloc(struct task_struct *task) > BUG_ON(!task->thread.sve_state); > } > > +void fpsimd_sync_to_sve(struct task_struct *task) > +{ > + if (!test_tsk_thread_flag(task, TIF_SVE)) > + fpsimd_to_sve(task); > +} > + > +void sve_sync_to_fpsimd(struct task_struct *task) > +{ > + if (test_tsk_thread_flag(task, TIF_SVE)) > + sve_to_fpsimd(task); > +} > + > +void sve_sync_from_fpsimd_zeropad(struct task_struct *task) > +{ > + unsigned int vq; > + void *sst = task->thread.sve_state; > + struct fpsimd_state const *fst = &task->thread.fpsimd_state; > + unsigned int i; > + > + if (!test_tsk_thread_flag(task, TIF_SVE)) > + return; > + > + vq = sve_vq_from_vl(task->thread.sve_vl); > + > + memset(sst, 0, SVE_SIG_REGS_SIZE(vq)); > + > + for (i = 0; i < 32; ++i) > + memcpy(ZREG(sst, vq, i), &fst->vregs[i], > + sizeof(fst->vregs[i])); > +} > + > /* > * Handle SVE state across fork(): > * > @@ -459,10 +490,17 @@ 
static void __init sve_efi_setup(void) > * This is evidence of a crippled system and we are returning void, > * so no attempt is made to handle this situation here. > */ > - BUG_ON(!sve_vl_valid(sve_max_vl)); > + if (!sve_vl_valid(sve_max_vl)) > + goto fail; > + > efi_sve_state = __alloc_percpu( > SVE_SIG_REGS_SIZE(sve_vq_from_vl(sve_max_vl)), SVE_VQ_BYTES); > if (!efi_sve_state) > + goto fail; > + > + return; > + > +fail: > panic("Cannot allocate percpu memory for EFI SVE save/restore"); > } > > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c > index 9cbb612..5ef4735b 100644 > --- a/arch/arm64/kernel/ptrace.c > +++ b/arch/arm64/kernel/ptrace.c > @@ -32,6 +32,7 @@ > #include <linux/security.h> > #include <linux/init.h> > #include <linux/signal.h> > +#include <linux/string.h> > #include <linux/uaccess.h> > #include <linux/perf_event.h> > #include <linux/hw_breakpoint.h> > @@ -40,6 +41,7 @@ > #include <linux/elf.h> > > #include <asm/compat.h> > +#include <asm/cpufeature.h> > #include <asm/debug-monitors.h> > #include <asm/pgtable.h> > #include <asm/stacktrace.h> > @@ -618,33 +620,66 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset, > /* > * TODO: update fp accessors for lazy context switching (sync/flush hwstate) > */ > -static int fpr_get(struct task_struct *target, const struct user_regset *regset, > - unsigned int pos, unsigned int count, > - void *kbuf, void __user *ubuf) > +static int __fpr_get(struct task_struct *target, > + const struct user_regset *regset, > + unsigned int pos, unsigned int count, > + void *kbuf, void __user *ubuf, unsigned int start_pos) > { > struct user_fpsimd_state *uregs; > + > + sve_sync_to_fpsimd(target); > + > uregs = &target->thread.fpsimd_state.user_fpsimd; > > + return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, > + start_pos, start_pos + sizeof(*uregs)); > +} > + > +static int fpr_get(struct task_struct *target, const struct user_regset *regset, > + unsigned int 
pos, unsigned int count, > + void *kbuf, void __user *ubuf) > +{ > if (target == current) > fpsimd_preserve_current_state(); > > - return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, 0, -1); > + return __fpr_get(target, regset, pos, count, kbuf, ubuf, 0); > } > > -static int fpr_set(struct task_struct *target, const struct user_regset *regset, > - unsigned int pos, unsigned int count, > - const void *kbuf, const void __user *ubuf) > +static int __fpr_set(struct task_struct *target, > + const struct user_regset *regset, > + unsigned int pos, unsigned int count, > + const void *kbuf, const void __user *ubuf, > + unsigned int start_pos) > { > int ret; > struct user_fpsimd_state newstate = > target->thread.fpsimd_state.user_fpsimd; > > - ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, 0, -1); > + sve_sync_to_fpsimd(target); > + > + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, > + start_pos, start_pos + sizeof(newstate)); > if (ret) > return ret; > > target->thread.fpsimd_state.user_fpsimd = newstate; > + > + return ret; > +} > + > +static int fpr_set(struct task_struct *target, const struct user_regset *regset, > + unsigned int pos, unsigned int count, > + const void *kbuf, const void __user *ubuf) > +{ > + int ret; > + > + ret = __fpr_set(target, regset, pos, count, kbuf, ubuf, 0); > + if (ret) > + return ret; > + > + sve_sync_from_fpsimd_zeropad(target); > fpsimd_flush_task_state(target); > + > return ret; > } > > @@ -702,6 +737,210 @@ static int system_call_set(struct task_struct *target, > return ret; > } > > +#ifdef CONFIG_ARM64_SVE > + > +static void sve_init_header_from_task(struct user_sve_header *header, > + struct task_struct *target) > +{ > + unsigned int vq; > + > + memset(header, 0, sizeof(*header)); > + > + header->flags = test_tsk_thread_flag(target, TIF_SVE) ? 
> + SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD; > + if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT)) > + header->flags |= SVE_PT_VL_INHERIT; > + > + header->vl = target->thread.sve_vl; > + vq = sve_vq_from_vl(header->vl); > + > + if (WARN_ON(!sve_vl_valid(sve_max_vl))) > + header->max_vl = header->vl; > + > + header->size = SVE_PT_SIZE(vq, header->flags); > + header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl), > + SVE_PT_REGS_SVE); > +} > + > +static unsigned int sve_size_from_header(struct user_sve_header const *header) > +{ > + return ALIGN(header->size, SVE_VQ_BYTES); > +} > + > +static unsigned int sve_get_size(struct task_struct *target, > + const struct user_regset *regset) > +{ > + struct user_sve_header header; > + > + if (!system_supports_sve()) > + return 0; > + > + sve_init_header_from_task(&header, target); > + return sve_size_from_header(&header); > +} > + > +static int sve_get(struct task_struct *target, > + const struct user_regset *regset, > + unsigned int pos, unsigned int count, > + void *kbuf, void __user *ubuf) > +{ > + int ret; > + struct user_sve_header header; > + unsigned int vq; > + unsigned long start, end; > + > + if (!system_supports_sve()) > + return -EINVAL; > + > + /* Header */ > + sve_init_header_from_task(&header, target); > + vq = sve_vq_from_vl(header.vl); > + > + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &header, > + 0, sizeof(header)); > + if (ret) > + return ret; > + > + if (target == current) > + fpsimd_preserve_current_state(); > + > + /* Registers: FPSIMD-only case */ > + > + BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header)); > + if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) > + return __fpr_get(target, regset, pos, count, kbuf, ubuf, > + SVE_PT_FPSIMD_OFFSET); > + > + /* Otherwise: full SVE case */ > + > + BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header)); > + start = SVE_PT_SVE_OFFSET; > + end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq); > + ret = user_regset_copyout(&pos, 
&count, &kbuf, &ubuf, > + target->thread.sve_state, > + start, end); > + if (ret) > + return ret; > + > + start = end; > + end = SVE_PT_SVE_FPSR_OFFSET(vq); > + ret = user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf, > + start, end); > + if (ret) > + return ret; > + > + /* > + * Copy fpsr, and fpcr which must follow contiguously in > + * struct fpsimd_state: > + */ > + start = end; > + end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE; > + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, > + &target->thread.fpsimd_state.fpsr, > + start, end); > + if (ret) > + return ret; > + > + start = end; > + end = sve_size_from_header(&header); > + return user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf, > + start, end); > +} > + > +static int sve_set(struct task_struct *target, > + const struct user_regset *regset, > + unsigned int pos, unsigned int count, > + const void *kbuf, const void __user *ubuf) > +{ > + int ret; > + struct user_sve_header header; > + unsigned int vq; > + unsigned long start, end; > + > + if (!system_supports_sve()) > + return -EINVAL; > + > + /* Header */ > + if (count < sizeof(header)) > + return -EINVAL; > + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &header, > + 0, sizeof(header)); > + if (ret) > + goto out; > + > + /* > + * Apart from PT_SVE_REGS_MASK, all PT_SVE_* flags are consumed by > + * sve_set_vector_length(), which will also validate them for us: > + */ > + ret = sve_set_vector_length(target, header.vl, > + header.flags & ~SVE_PT_REGS_MASK); > + if (ret) > + goto out; > + > + /* Actual VL set may be less than the user asked for: */ > + vq = sve_vq_from_vl(target->thread.sve_vl); > + > + /* Registers: FPSIMD-only case */ > + > + BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header)); > + if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) { > + sve_sync_to_fpsimd(target); > + > + ret = __fpr_set(target, regset, pos, count, kbuf, ubuf, > + SVE_PT_FPSIMD_OFFSET); > + clear_tsk_thread_flag(target, TIF_SVE); > + 
goto out; > + } > + > + /* Otherwise: full SVE case */ > + > + /* > + * If setting a different VL from the requested VL and there is > + * register data, the data layout will be wrong: don't even > + * try to set the registers in this case. > + */ > + if (count && vq != sve_vq_from_vl(header.vl)) { > + ret = -EIO; > + goto out; > + } > + > + sve_alloc(target); > + fpsimd_sync_to_sve(target); > + set_tsk_thread_flag(target, TIF_SVE); > + > + BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header)); > + start = SVE_PT_SVE_OFFSET; > + end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq); > + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, > + target->thread.sve_state, > + start, end); > + if (ret) > + goto out; > + > + start = end; > + end = SVE_PT_SVE_FPSR_OFFSET(vq); > + ret = user_regset_copyin_ignore(&pos, &count, &kbuf, &ubuf, > + start, end); > + if (ret) > + goto out; > + > + /* > + * Copy fpsr, and fpcr which must follow contiguously in > + * struct fpsimd_state: > + */ > + start = end; > + end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE; > + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, > + &target->thread.fpsimd_state.fpsr, > + start, end); > + > +out: > + fpsimd_flush_task_state(target); > + return ret; > +} > + > +#endif /* CONFIG_ARM64_SVE */ > + > enum aarch64_regset { > REGSET_GPR, > REGSET_FPR, > @@ -711,6 +950,9 @@ enum aarch64_regset { > REGSET_HW_WATCH, > #endif > REGSET_SYSTEM_CALL, > +#ifdef CONFIG_ARM64_SVE > + REGSET_SVE, > +#endif > }; > > static const struct user_regset aarch64_regsets[] = { > @@ -768,6 +1010,18 @@ static const struct user_regset aarch64_regsets[] = { > .get = system_call_get, > .set = system_call_set, > }, > +#ifdef CONFIG_ARM64_SVE > + [REGSET_SVE] = { /* Scalable Vector Extension */ > + .core_note_type = NT_ARM_SVE, > + .n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE), > + SVE_VQ_BYTES), > + .size = SVE_VQ_BYTES, > + .align = SVE_VQ_BYTES, > + .get = sve_get, > + .set = sve_set, > + .get_size = 
sve_get_size, > + }, > +#endif > }; > > static const struct user_regset_view user_aarch64_view = { > diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h > index b5280db..735b8f4 100644 > --- a/include/uapi/linux/elf.h > +++ b/include/uapi/linux/elf.h > @@ -416,6 +416,7 @@ typedef struct elf64_shdr { > #define NT_ARM_HW_BREAK 0x402 /* ARM hardware breakpoint registers */ > #define NT_ARM_HW_WATCH 0x403 /* ARM hardware watchpoint registers */ > #define NT_ARM_SYSTEM_CALL 0x404 /* ARM system call number */ > +#define NT_ARM_SVE 0x405 /* ARM Scalable Vector Extension registers */ > #define NT_METAG_CBUF 0x500 /* Metag catch buffer registers */ > #define NT_METAG_RPIPE 0x501 /* Metag read pipeline state */ > #define NT_METAG_TLS 0x502 /* Metag TLS pointer */ Otherwise: Reviewed-by: Alex Bennée <alex.bennee@linaro.org> -- Alex Bennée
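For readers following the SVE_PT_* accessor macros quoted above, the resulting regset layout can be worked through by hand. The following is only a rough sketch, not the kernel's code: it re-derives the offsets with plain arithmetic instead of including the UAPI headers. The constants (16 bytes per quadword, 32 Z registers, 16 P registers, a 16-byte header, a 528-byte struct user_fpsimd_state) are assumptions based on my reading of <asm/sigcontext.h> and <asm/ptrace.h>, and the names sve_pt_layout, sve_pt_layout_for_vl() and fpsimd_pt_size() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed constants: believed to match the arm64 UAPI headers, but not
 * spelled out in this patch. Treat them as illustration only.
 */
#define VQ_BYTES	16u	/* SVE_VQ_BYTES: bytes per 128-bit quadword */
#define NUM_ZREGS	32u	/* SVE_NUM_ZREGS */
#define NUM_PREGS	16u	/* SVE_NUM_PREGS */
#define HEADER_SIZE	16u	/* sizeof(struct user_sve_header), == SVE_PT_REGS_OFFSET */
#define FPSIMD_SIZE	528u	/* sizeof(struct user_fpsimd_state), incl. tail padding */

/* Hypothetical helper type: byte offsets from the start of the header. */
struct sve_pt_layout {
	size_t zregs;	/* Z0 */
	size_t pregs;	/* P0 */
	size_t ffr;	/* FFR */
	size_t fpsr;	/* FPSR, aligned up to a 128-bit boundary */
	size_t fpcr;	/* FPCR, directly after FPSR */
	size_t total;	/* equivalent of SVE_PT_SIZE(vq, SVE_PT_REGS_SVE) */
};

/* Round x up to the next multiple of VQ_BYTES. */
static size_t round_up_vq(size_t x)
{
	return (x + VQ_BYTES - 1) / VQ_BYTES * VQ_BYTES;
}

/* Layout of the full-SVE payload for a vector length given in bytes. */
static struct sve_pt_layout sve_pt_layout_for_vl(unsigned int vl)
{
	unsigned int vq = vl / VQ_BYTES;	/* vector length in quadwords */
	size_t zreg_size = (size_t)vq * VQ_BYTES;
	size_t preg_size = (size_t)vq * (VQ_BYTES / 8);	/* one predicate bit per byte */
	struct sve_pt_layout l;

	l.zregs = HEADER_SIZE;
	l.pregs = l.zregs + NUM_ZREGS * zreg_size;
	l.ffr = l.pregs + NUM_PREGS * preg_size;
	l.fpsr = round_up_vq(l.ffr + preg_size);	/* FFR has the size of one P register */
	l.fpcr = l.fpsr + sizeof(uint32_t);
	l.total = HEADER_SIZE +
		  round_up_vq(l.fpcr + sizeof(uint32_t) - HEADER_SIZE);
	return l;
}

/* Equivalent of SVE_PT_SIZE(vq, SVE_PT_REGS_FPSIMD): independent of vq. */
static size_t fpsimd_pt_size(void)
{
	return HEADER_SIZE + FPSIMD_SIZE;
}
```

Under these assumptions a 128-bit vector length (vl = 16) puts FPSR at offset 576 and gives a total regset size of 592 bytes, while the FPSIMD-only format is always 544 bytes. A debugger would normally use the real macros; the arithmetic is shown only to make the padding between FFR and FPSR visible.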
On Thu, Sep 14, 2017 at 01:57:08PM +0100, Alex Bennée wrote: > > Dave Martin <Dave.Martin@arm.com> writes: > > > This patch defines and implements a new regset NT_ARM_SVE, which > > describes a thread's SVE register state. This allows a debugger to > > manipulate the SVE state, as well as being included in ELF > > coredumps for post-mortem debugging. > > > > Because the regset size and layout are dependent on the thread's > > current vector length, it is not possible to define a C struct to > > describe the regset contents as is done for existing regsets. > > Instead, and for the same reasons, NT_ARM_SVE is based on the > > freeform variable-layout approach used for the SVE signal frame. > > > > Additionally, to reduce debug overhead when debugging threads that > > might or might not have live SVE register state, NT_ARM_SVE may be > > presented in one of two different formats: the old struct > > user_fpsimd_state format is embedded for describing the state of a > > thread with no live SVE state, whereas a new variable-layout > > structure is embedded for describing live SVE state. This avoids a > > debugger needing to poll NT_PRFPREG in addition to NT_ARM_SVE, and > > allows existing userspace code to handle the non-SVE case without > > too much modification. > > > > For this to work, NT_ARM_SVE is defined with a fixed-format header > > of type struct user_sve_header, which the recipient can use to > > figure out the content, size and layout of the reset of the regset. > > Accessor macros are defined to allow the vector-length-dependent > > parts of the regset to be manipulated. > > > > Signed-off-by: Alan Hayward <alan.hayward@arm.com> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com> > > Cc: Alex Bennée <alex.bennee@linaro.org> > > > > --- > > > > Changes since v1 > > ---------------- > > > > Other changes related to Alex Bennée's comments: > > > > * Migrate to SVE_VQ_BYTES instead of magic numbers. 
> > > > Requested by Alex Bennée: > > > > * Thin out BUG_ON()s: > > Redundant BUG_ON()s and ones that just check invariants are removed. > > Important sanity-checks are migrated to WARN_ON()s, with some > > minimal best-effort patch-up code. > > > > Other: > > > > * [ABI fix] Bail out with -EIO if attempting to set the > > SVE regs for an unsupported VL, instead of misparsing the regset data. > > > > * Replace some in-kernel open-coded arithmetic with ALIGN()/ > > DIV_ROUND_UP(). > > --- [...] > > diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h > > index d1ff83d..1915ab0 100644 > > --- a/arch/arm64/include/uapi/asm/ptrace.h > > +++ b/arch/arm64/include/uapi/asm/ptrace.h > > @@ -22,6 +22,7 @@ > > #include <linux/types.h> > > > > #include <asm/hwcap.h> > > +#include <asm/sigcontext.h> > > > > > > /* > > @@ -63,6 +64,8 @@ > > > > #ifndef __ASSEMBLY__ > > > > +#include <linux/prctl.h> > > + > > /* > > * User structures for general purpose, floating point and debug registers. > > */ > > @@ -90,6 +93,138 @@ struct user_hwdebug_state { > > } dbg_regs[16]; > > }; > > > > +/* SVE/FP/SIMD state (NT_ARM_SVE) */ > > + > > +struct user_sve_header { > > + __u32 size; /* total meaningful regset content in bytes */ > > + __u32 max_size; /* maxmium possible size for this thread */ > > + __u16 vl; /* current vector length */ > > + __u16 max_vl; /* maximum possible vector length */ > > + __u16 flags; > > + __u16 __reserved; > > +}; > > + > > +/* Definitions for user_sve_header.flags: */ > > +#define SVE_PT_REGS_MASK (1 << 0) > > + > > +/* Flags: must be kept in sync with prctl interface in > > <linux/ptrace.h> */ > > Which flags? We base some on PR_foo flags but we seem to shift them All the prctl flags that have equivalents here, because they're part of the internal API to sve_set_vector_length(). It didn't quite seem appropriate to document that in a userspace header, but it's probably better to say something here than not. 
I'll improve the comment.

> anyway so where is the requirement for them to match from?

There is a bug here though: sve_set() in ptrace.c is supposed to shift
the flags from header.flags (which is a u16) back into the
PR_SVE_SET_VL position (<< 16) for the flags argument of
sve_set_vector_length(). But this isn't done, so attempting to set (or
restore) those flags through ptrace may result in EINVALs from
sve_set_vector_length().

I'll write a test for this case and implement a fix, something like...

-8<-

static int sve_set(struct task_struct *target,
[...]
 	ret = sve_set_vector_length(target, header.vl,
-				    header.flags & ~SVE_PT_REGS_MASK);
+		(header.flags & ~SVE_PT_REGS_MASK) << 16UL);

->8-

What do you think?

[...]

> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Again, I'll wait for your feedback first.

Cheers
---Dave
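To make the shift described in this reply concrete, here is a minimal userspace-style sketch of the conversion between the 16-bit ptrace header flags and the prctl-style flags. It is not the kernel fix itself. The PR_SVE_* values are an assumption (they are what I believe <linux/prctl.h> defines, but they are not quoted in this thread), and the helper names sve_pt_flags_to_prctl() and sve_prctl_flags_to_pt() are hypothetical.

```c
#include <stdint.h>

/* Assumed prctl flag values; not quoted anywhere in this thread. */
#define PR_SVE_VL_INHERIT	(1UL << 17)
#define PR_SVE_SET_VL_ONEXEC	(1UL << 18)

/* ptrace header flags, as defined in the patch under review. */
#define SVE_PT_REGS_MASK	(1u << 0)
#define SVE_PT_REGS_SVE		SVE_PT_REGS_MASK
#define SVE_PT_VL_INHERIT	(PR_SVE_VL_INHERIT >> 16)
#define SVE_PT_VL_ONEXEC	(PR_SVE_SET_VL_ONEXEC >> 16)

/*
 * Hypothetical helper: drop the register-format bit, then move the
 * remaining flags back up into the position the prctl interface uses.
 * Widening to unsigned long before shifting avoids shifting a promoted
 * int into its sign bit.
 */
static unsigned long sve_pt_flags_to_prctl(uint16_t pt_flags)
{
	unsigned long flags = pt_flags;

	return (flags & ~(unsigned long)SVE_PT_REGS_MASK) << 16;
}

/* Hypothetical inverse: prctl-style flags down into the u16 header field. */
static uint16_t sve_prctl_flags_to_pt(unsigned long pr_flags)
{
	return (uint16_t)(pr_flags >> 16);
}
```

Without the shift, a header carrying SVE_PT_VL_INHERIT would hand the value 2 to a function that expects bit 17, which is the mismatch that leads to the EINVAL described above.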
On Thu, Sep 14, 2017 at 01:57:08PM +0100, Alex Bennée wrote: > > Dave Martin <Dave.Martin@arm.com> writes: > > > This patch defines and implements a new regset NT_ARM_SVE, which > > describes a thread's SVE register state. This allows a debugger to > > manipulate the SVE state, as well as being included in ELF > > coredumps for post-mortem debugging. > > > > Because the regset size and layout are dependent on the thread's > > current vector length, it is not possible to define a C struct to > > describe the regset contents as is done for existing regsets. > > Instead, and for the same reasons, NT_ARM_SVE is based on the > > freeform variable-layout approach used for the SVE signal frame. > > > > Additionally, to reduce debug overhead when debugging threads that > > might or might not have live SVE register state, NT_ARM_SVE may be > > presented in one of two different formats: the old struct > > user_fpsimd_state format is embedded for describing the state of a > > thread with no live SVE state, whereas a new variable-layout > > structure is embedded for describing live SVE state. This avoids a > > debugger needing to poll NT_PRFPREG in addition to NT_ARM_SVE, and > > allows existing userspace code to handle the non-SVE case without > > too much modification. > > > > For this to work, NT_ARM_SVE is defined with a fixed-format header > > of type struct user_sve_header, which the recipient can use to > > figure out the content, size and layout of the reset of the regset. > > Accessor macros are defined to allow the vector-length-dependent > > parts of the regset to be manipulated. > > > > Signed-off-by: Alan Hayward <alan.hayward@arm.com> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com> > > Cc: Alex Bennée <alex.bennee@linaro.org> > > > > --- > > > > Changes since v1 > > ---------------- > > > > Other changes related to Alex Bennée's comments: > > > > * Migrate to SVE_VQ_BYTES instead of magic numbers. 
> >
> > Requested by Alex Bennée:
> >
> > * Thin out BUG_ON()s:
> > Redundant BUG_ON()s and ones that just check invariants are removed.
> > Important sanity-checks are migrated to WARN_ON()s, with some
> > minimal best-effort patch-up code.
> >
> > Other:
> >
> > * [ABI fix] Bail out with -EIO if attempting to set the
> > SVE regs for an unsupported VL, instead of misparsing the regset data.
> >
> > * Replace some in-kernel open-coded arithmetic with ALIGN()/
> > DIV_ROUND_UP().
> > ---

[...]

> > diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h

[...]

> > +/* Definitions for user_sve_header.flags: */
> > +#define SVE_PT_REGS_MASK (1 << 0)
> > +
> > +/* Flags: must be kept in sync with prctl interface in
> > <linux/ptrace.h> */
>
> Which flags? We base some on PR_foo flags but we seem to shift them
> anyway so where is the requirement for them to match from?

I've rearranged this as:

-8<-

/* Definitions for user_sve_header.flags: */
#define SVE_PT_REGS_MASK		(1 << 0)

#define SVE_PT_REGS_FPSIMD		0
#define SVE_PT_REGS_SVE			SVE_PT_REGS_MASK

/*
 * Common SVE_PT_* flags:
 * These must be kept in sync with prctl interface in <linux/ptrace.h>
 */
#define SVE_PT_VL_INHERIT		(PR_SVE_VL_INHERIT >> 16)
#define SVE_PT_VL_ONEXEC		(PR_SVE_SET_VL_ONEXEC >> 16)

->8-

This avoids the suggestion that SVE_PT_REGS_{MASK,FPSIMD,SVE} are
supposed to have prctl counterparts.

I don't really want to write more, in case it is misinterpreted as
specification of behaviour. This comment is really only meant as a
reminder to maintainers that they should go look at prctl.h too, before
blindly making changes here.

Any good? If you have a different suggestion, I'm all ears...

[...]

Cheers
---Dave
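As an aside on the "keep in sync" reminder, one way such a constraint could be backed up mechanically is with compile-time checks. This is a hedged sketch only, not something proposed in the thread. The PR_SVE_* values are assumed (believed to match <linux/prctl.h>), and SVE_PT_COMMON_FLAGS and sve_pt_flags_known() are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Assumed prctl flag values; not quoted anywhere in this thread. */
#define PR_SVE_SET_VL_ONEXEC	(1 << 18)
#define PR_SVE_VL_INHERIT	(1 << 17)

/* Definitions as rearranged in the reply above. */
#define SVE_PT_REGS_MASK	(1 << 0)
#define SVE_PT_VL_INHERIT	(PR_SVE_VL_INHERIT >> 16)
#define SVE_PT_VL_ONEXEC	(PR_SVE_SET_VL_ONEXEC >> 16)

/* Hypothetical: every flag that is shared with the prctl interface. */
#define SVE_PT_COMMON_FLAGS	(SVE_PT_VL_INHERIT | SVE_PT_VL_ONEXEC)

/* The shared flags must not collide with the register-format bit... */
_Static_assert((SVE_PT_COMMON_FLAGS & SVE_PT_REGS_MASK) == 0,
	       "prctl-derived flags overlap SVE_PT_REGS_MASK");
/* ...must fit in the u16 flags field of struct user_sve_header... */
_Static_assert(SVE_PT_COMMON_FLAGS <= UINT16_MAX,
	       "prctl-derived flags do not fit in 16 bits");
/* ...and must lose no bits when shifted down by 16. */
_Static_assert(((PR_SVE_VL_INHERIT | PR_SVE_SET_VL_ONEXEC) & 0xffff) == 0,
	       "prctl flags have bits below bit 16");

/* Hypothetical runtime check: reject header flags nobody has defined. */
static int sve_pt_flags_known(uint16_t flags)
{
	return (flags & ~(SVE_PT_REGS_MASK | SVE_PT_COMMON_FLAGS)) == 0;
}
```

If a future prctl flag were allocated below bit 16 or above bit 31, the build would fail rather than silently producing a ptrace flag that overlaps SVE_PT_REGS_MASK or is truncated.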
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 6c22624..2723cca 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -38,13 +38,16 @@ struct fpsimd_state { __uint128_t vregs[32]; u32 fpsr; u32 fpcr; + /* + * For ptrace compatibility, pad to next 128-bit + * boundary here if extending this struct. + */ }; }; /* the id of the last cpu to have restored this state */ unsigned int cpu; }; - #if defined(__KERNEL__) && defined(CONFIG_COMPAT) /* Masks for extracting the FPSR and FPCR from the FPSCR */ #define VFP_FPSCR_STAT_MASK 0xf800009f @@ -89,6 +92,10 @@ extern void sve_alloc(struct task_struct *task); extern void fpsimd_release_thread(struct task_struct *task); extern void fpsimd_dup_sve(struct task_struct *dst, struct task_struct const *src); +extern void fpsimd_sync_to_sve(struct task_struct *task); +extern void sve_sync_to_fpsimd(struct task_struct *task); +extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task); + extern int sve_set_vector_length(struct task_struct *task, unsigned long vl, unsigned long flags); @@ -103,6 +110,10 @@ static void __maybe_unused sve_alloc(struct task_struct *task) { } static void __maybe_unused fpsimd_release_thread(struct task_struct *task) { } static void __maybe_unused fpsimd_dup_sve(struct task_struct *dst, struct task_struct const *src) { } +static void __maybe_unused sve_sync_to_fpsimd(struct task_struct *task) { } +static void __maybe_unused sve_sync_from_fpsimd_zeropad( + struct task_struct *task) { } + static void __maybe_unused sve_init_vq_map(void) { } static void __maybe_unused sve_update_vq_map(void) { } static int __maybe_unused sve_verify_vq_map(void) { return 0; } diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h index d1ff83d..1915ab0 100644 --- a/arch/arm64/include/uapi/asm/ptrace.h +++ b/arch/arm64/include/uapi/asm/ptrace.h @@ -22,6 +22,7 @@ #include <linux/types.h> #include <asm/hwcap.h> 
+#include <asm/sigcontext.h> /* @@ -63,6 +64,8 @@ #ifndef __ASSEMBLY__ +#include <linux/prctl.h> + /* * User structures for general purpose, floating point and debug registers. */ @@ -90,6 +93,138 @@ struct user_hwdebug_state { } dbg_regs[16]; }; +/* SVE/FP/SIMD state (NT_ARM_SVE) */ + +struct user_sve_header { + __u32 size; /* total meaningful regset content in bytes */ + __u32 max_size; /* maxmium possible size for this thread */ + __u16 vl; /* current vector length */ + __u16 max_vl; /* maximum possible vector length */ + __u16 flags; + __u16 __reserved; +}; + +/* Definitions for user_sve_header.flags: */ +#define SVE_PT_REGS_MASK (1 << 0) + +/* Flags: must be kept in sync with prctl interface in <linux/ptrace.h> */ +#define SVE_PT_REGS_FPSIMD 0 +#define SVE_PT_REGS_SVE SVE_PT_REGS_MASK + +#define SVE_PT_VL_INHERIT (PR_SVE_VL_INHERIT >> 16) +#define SVE_PT_VL_ONEXEC (PR_SVE_SET_VL_ONEXEC >> 16) + + +/* + * The remainder of the SVE state follows struct user_sve_header. The + * total size of the SVE state (including header) depends on the + * metadata in the header: SVE_PT_SIZE(vq, flags) gives the total size + * of the state in bytes, including the header. + * + * Refer to <asm/sigcontext.h> for details of how to pass the correct + * "vq" argument to these macros. + */ + +/* Offset from the start of struct user_sve_header to the register data */ +#define SVE_PT_REGS_OFFSET \ + ((sizeof(struct sve_context) + (SVE_VQ_BYTES - 1)) \ + / SVE_VQ_BYTES * SVE_VQ_BYTES) + +/* + * The register data content and layout depends on the value of the + * flags field. + */ + +/* + * (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD case: + * + * The payload starts at offset SVE_PT_FPSIMD_OFFSET, and is of type + * struct user_fpsimd_state. Additional data might be appended in the + * future: use SVE_PT_FPSIMD_SIZE(vq, flags) to compute the total size. + * SVE_PT_FPSIMD_SIZE(vq, flags) will never be less than + * sizeof(struct user_fpsimd_state). 
+ */
+
+#define SVE_PT_FPSIMD_OFFSET SVE_PT_REGS_OFFSET
+
+#define SVE_PT_FPSIMD_SIZE(vq, flags) (sizeof(struct user_fpsimd_state))
+
+/*
+ * (flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE case:
+ *
+ * The payload starts at offset SVE_PT_SVE_OFFSET, and is of size
+ * SVE_PT_SVE_SIZE(vq, flags).
+ *
+ * Additional macros describe the contents and layout of the payload.
+ * For each, SVE_PT_SVE_x_OFFSET(args) is the start offset relative to
+ * the start of struct user_sve_header, and SVE_PT_SVE_x_SIZE(args) is
+ * the size in bytes:
+ *
+ *	x	type				description
+ *	-	----				-----------
+ *	ZREGS		\
+ *	ZREG		|
+ *	PREGS		| refer to <asm/sigcontext.h>
+ *	PREG		|
+ *	FFR		/
+ *
+ *	FPSR	uint32_t			FPSR
+ *	FPCR	uint32_t			FPCR
+ *
+ * Additional data might be appended in the future.
+ */
+
+#define SVE_PT_SVE_ZREG_SIZE(vq) SVE_SIG_ZREG_SIZE(vq)
+#define SVE_PT_SVE_PREG_SIZE(vq) SVE_SIG_PREG_SIZE(vq)
+#define SVE_PT_SVE_FFR_SIZE(vq) SVE_SIG_FFR_SIZE(vq)
+#define SVE_PT_SVE_FPSR_SIZE sizeof(__u32)
+#define SVE_PT_SVE_FPCR_SIZE sizeof(__u32)
+
+#define __SVE_SIG_TO_PT(offset) \
+	((offset) - SVE_SIG_REGS_OFFSET + SVE_PT_REGS_OFFSET)
+
+#define SVE_PT_SVE_OFFSET SVE_PT_REGS_OFFSET
+
+#define SVE_PT_SVE_ZREGS_OFFSET \
+	__SVE_SIG_TO_PT(SVE_SIG_ZREGS_OFFSET)
+#define SVE_PT_SVE_ZREG_OFFSET(vq, n) \
+	__SVE_SIG_TO_PT(SVE_SIG_ZREG_OFFSET(vq, n))
+#define SVE_PT_SVE_ZREGS_SIZE(vq) \
+	(SVE_PT_SVE_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_PT_SVE_ZREGS_OFFSET)
+
+#define SVE_PT_SVE_PREGS_OFFSET(vq) \
+	__SVE_SIG_TO_PT(SVE_SIG_PREGS_OFFSET(vq))
+#define SVE_PT_SVE_PREG_OFFSET(vq, n) \
+	__SVE_SIG_TO_PT(SVE_SIG_PREG_OFFSET(vq, n))
+#define SVE_PT_SVE_PREGS_SIZE(vq) \
+	(SVE_PT_SVE_PREG_OFFSET(vq, SVE_NUM_PREGS) - \
+		SVE_PT_SVE_PREGS_OFFSET(vq))
+
+#define SVE_PT_SVE_FFR_OFFSET(vq) \
+	__SVE_SIG_TO_PT(SVE_SIG_FFR_OFFSET(vq))
+
+#define SVE_PT_SVE_FPSR_OFFSET(vq) \
+	((SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq) + \
+			(SVE_VQ_BYTES - 1)) \
+		/ SVE_VQ_BYTES * SVE_VQ_BYTES)
+#define SVE_PT_SVE_FPCR_OFFSET(vq) \
+	(SVE_PT_SVE_FPSR_OFFSET(vq) + SVE_PT_SVE_FPSR_SIZE)
+
+/*
+ * Any future extension appended after FPCR must be aligned to the next
+ * 128-bit boundary.
+ */
+
+#define SVE_PT_SVE_SIZE(vq, flags) \
+	((SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE \
+			- SVE_PT_SVE_OFFSET + (SVE_VQ_BYTES - 1)) \
+		/ SVE_VQ_BYTES * SVE_VQ_BYTES)
+
+#define SVE_PT_SIZE(vq, flags) \
+	(((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ? \
+		  SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags) \
+		: SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags))
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _UAPI__ASM_PTRACE_H */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index fff9fcf..361c019 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -303,6 +303,37 @@ void sve_alloc(struct task_struct *task)
 	BUG_ON(!task->thread.sve_state);
 }
 
+void fpsimd_sync_to_sve(struct task_struct *task)
+{
+	if (!test_tsk_thread_flag(task, TIF_SVE))
+		fpsimd_to_sve(task);
+}
+
+void sve_sync_to_fpsimd(struct task_struct *task)
+{
+	if (test_tsk_thread_flag(task, TIF_SVE))
+		sve_to_fpsimd(task);
+}
+
+void sve_sync_from_fpsimd_zeropad(struct task_struct *task)
+{
+	unsigned int vq;
+	void *sst = task->thread.sve_state;
+	struct fpsimd_state const *fst = &task->thread.fpsimd_state;
+	unsigned int i;
+
+	if (!test_tsk_thread_flag(task, TIF_SVE))
+		return;
+
+	vq = sve_vq_from_vl(task->thread.sve_vl);
+
+	memset(sst, 0, SVE_SIG_REGS_SIZE(vq));
+
+	for (i = 0; i < 32; ++i)
+		memcpy(ZREG(sst, vq, i), &fst->vregs[i],
+		       sizeof(fst->vregs[i]));
+}
+
 /*
  * Handle SVE state across fork():
  *
@@ -459,10 +490,17 @@ static void __init sve_efi_setup(void)
 	 * This is evidence of a crippled system and we are returning void,
 	 * so no attempt is made to handle this situation here.
 	 */
-	BUG_ON(!sve_vl_valid(sve_max_vl));
+	if (!sve_vl_valid(sve_max_vl))
+		goto fail;
+
 	efi_sve_state = __alloc_percpu(
 		SVE_SIG_REGS_SIZE(sve_vq_from_vl(sve_max_vl)), SVE_VQ_BYTES);
 	if (!efi_sve_state)
+		goto fail;
+
+	return;
+
+fail:
 	panic("Cannot allocate percpu memory for EFI SVE save/restore");
 }
 
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 9cbb612..5ef4735b 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <linux/security.h>
 #include <linux/init.h>
 #include <linux/signal.h>
+#include <linux/string.h>
 #include <linux/uaccess.h>
 #include <linux/perf_event.h>
 #include <linux/hw_breakpoint.h>
@@ -40,6 +41,7 @@
 #include <linux/elf.h>
 
 #include <asm/compat.h>
+#include <asm/cpufeature.h>
 #include <asm/debug-monitors.h>
 #include <asm/pgtable.h>
 #include <asm/stacktrace.h>
@@ -618,33 +620,66 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset,
 /*
  * TODO: update fp accessors for lazy context switching (sync/flush hwstate)
  */
-static int fpr_get(struct task_struct *target, const struct user_regset *regset,
-		   unsigned int pos, unsigned int count,
-		   void *kbuf, void __user *ubuf)
+static int __fpr_get(struct task_struct *target,
+		     const struct user_regset *regset,
+		     unsigned int pos, unsigned int count,
+		     void *kbuf, void __user *ubuf, unsigned int start_pos)
 {
 	struct user_fpsimd_state *uregs;
+
+	sve_sync_to_fpsimd(target);
+
 	uregs = &target->thread.fpsimd_state.user_fpsimd;
 
+	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs,
+				   start_pos, start_pos + sizeof(*uregs));
+}
+
+static int fpr_get(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   void *kbuf, void __user *ubuf)
+{
 	if (target == current)
 		fpsimd_preserve_current_state();
 
-	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, 0, -1);
+	return __fpr_get(target, regset, pos, count, kbuf, ubuf, 0);
 }
 
-static int fpr_set(struct task_struct *target, const struct user_regset *regset,
-		   unsigned int pos, unsigned int count,
-		   const void *kbuf, const void __user *ubuf)
+static int __fpr_set(struct task_struct *target,
+		     const struct user_regset *regset,
+		     unsigned int pos, unsigned int count,
+		     const void *kbuf, const void __user *ubuf,
+		     unsigned int start_pos)
 {
 	int ret;
 	struct user_fpsimd_state newstate =
 		target->thread.fpsimd_state.user_fpsimd;
 
-	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, 0, -1);
+	sve_sync_to_fpsimd(target);
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate,
+				 start_pos, start_pos + sizeof(newstate));
 	if (ret)
 		return ret;
 
 	target->thread.fpsimd_state.user_fpsimd = newstate;
+
+	return ret;
+}
+
+static int fpr_set(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+
+	ret = __fpr_set(target, regset, pos, count, kbuf, ubuf, 0);
+	if (ret)
+		return ret;
+
+	sve_sync_from_fpsimd_zeropad(target);
 	fpsimd_flush_task_state(target);
+
 	return ret;
 }
 
@@ -702,6 +737,210 @@ static int system_call_set(struct task_struct *target,
 	return ret;
 }
 
+#ifdef CONFIG_ARM64_SVE
+
+static void sve_init_header_from_task(struct user_sve_header *header,
+				      struct task_struct *target)
+{
+	unsigned int vq;
+
+	memset(header, 0, sizeof(*header));
+
+	header->flags = test_tsk_thread_flag(target, TIF_SVE) ?
+		SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD;
+	if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
+		header->flags |= SVE_PT_VL_INHERIT;
+
+	header->vl = target->thread.sve_vl;
+	vq = sve_vq_from_vl(header->vl);
+
+	if (WARN_ON(!sve_vl_valid(sve_max_vl)))
+		header->max_vl = header->vl;
+
+	header->size = SVE_PT_SIZE(vq, header->flags);
+	header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl),
+				      SVE_PT_REGS_SVE);
+}
+
+static unsigned int sve_size_from_header(struct user_sve_header const *header)
+{
+	return ALIGN(header->size, SVE_VQ_BYTES);
+}
+
+static unsigned int sve_get_size(struct task_struct *target,
+				 const struct user_regset *regset)
+{
+	struct user_sve_header header;
+
+	if (!system_supports_sve())
+		return 0;
+
+	sve_init_header_from_task(&header, target);
+	return sve_size_from_header(&header);
+}
+
+static int sve_get(struct task_struct *target,
+		   const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   void *kbuf, void __user *ubuf)
+{
+	int ret;
+	struct user_sve_header header;
+	unsigned int vq;
+	unsigned long start, end;
+
+	if (!system_supports_sve())
+		return -EINVAL;
+
+	/* Header */
+	sve_init_header_from_task(&header, target);
+	vq = sve_vq_from_vl(header.vl);
+
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &header,
+				  0, sizeof(header));
+	if (ret)
+		return ret;
+
+	if (target == current)
+		fpsimd_preserve_current_state();
+
+	/* Registers: FPSIMD-only case */
+
+	BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header));
+	if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD)
+		return __fpr_get(target, regset, pos, count, kbuf, ubuf,
+				 SVE_PT_FPSIMD_OFFSET);
+
+	/* Otherwise: full SVE case */
+
+	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
+	start = SVE_PT_SVE_OFFSET;
+	end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				  target->thread.sve_state,
+				  start, end);
+	if (ret)
+		return ret;
+
+	start = end;
+	end = SVE_PT_SVE_FPSR_OFFSET(vq);
+	ret = user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf,
+				       start, end);
+	if (ret)
+		return ret;
+
+	/*
+	 * Copy fpsr, and fpcr which must follow contiguously in
+	 * struct fpsimd_state:
+	 */
+	start = end;
+	end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				  &target->thread.fpsimd_state.fpsr,
+				  start, end);
+	if (ret)
+		return ret;
+
+	start = end;
+	end = sve_size_from_header(&header);
+	return user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf,
+					start, end);
+}
+
+static int sve_set(struct task_struct *target,
+		   const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+	struct user_sve_header header;
+	unsigned int vq;
+	unsigned long start, end;
+
+	if (!system_supports_sve())
+		return -EINVAL;
+
+	/* Header */
+	if (count < sizeof(header))
+		return -EINVAL;
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &header,
+				 0, sizeof(header));
+	if (ret)
+		goto out;
+
+	/*
+	 * Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are consumed by
+	 * sve_set_vector_length(), which will also validate them for us:
+	 */
+	ret = sve_set_vector_length(target, header.vl,
+				    header.flags & ~SVE_PT_REGS_MASK);
+	if (ret)
+		goto out;
+
+	/* Actual VL set may be less than the user asked for: */
+	vq = sve_vq_from_vl(target->thread.sve_vl);
+
+	/* Registers: FPSIMD-only case */
+
+	BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header));
+	if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) {
+		sve_sync_to_fpsimd(target);
+
+		ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
+				SVE_PT_FPSIMD_OFFSET);
+		clear_tsk_thread_flag(target, TIF_SVE);
+		goto out;
+	}
+
+	/* Otherwise: full SVE case */
+
+	/*
+	 * If setting a different VL from the requested VL and there is
+	 * register data, the data layout will be wrong: don't even
+	 * try to set the registers in this case.
+	 */
+	if (count && vq != sve_vq_from_vl(header.vl)) {
+		ret = -EIO;
+		goto out;
+	}
+
+	sve_alloc(target);
+	fpsimd_sync_to_sve(target);
+	set_tsk_thread_flag(target, TIF_SVE);
+
+	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
+	start = SVE_PT_SVE_OFFSET;
+	end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 target->thread.sve_state,
+				 start, end);
+	if (ret)
+		goto out;
+
+	start = end;
+	end = SVE_PT_SVE_FPSR_OFFSET(vq);
+	ret = user_regset_copyin_ignore(&pos, &count, &kbuf, &ubuf,
+					start, end);
+	if (ret)
+		goto out;
+
+	/*
+	 * Copy fpsr, and fpcr which must follow contiguously in
+	 * struct fpsimd_state:
+	 */
+	start = end;
+	end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 &target->thread.fpsimd_state.fpsr,
+				 start, end);
+
+out:
+	fpsimd_flush_task_state(target);
+	return ret;
+}
+
+#endif /* CONFIG_ARM64_SVE */
+
 enum aarch64_regset {
 	REGSET_GPR,
 	REGSET_FPR,
@@ -711,6 +950,9 @@ enum aarch64_regset {
 	REGSET_HW_WATCH,
 #endif
 	REGSET_SYSTEM_CALL,
+#ifdef CONFIG_ARM64_SVE
+	REGSET_SVE,
+#endif
 };
 
 static const struct user_regset aarch64_regsets[] = {
@@ -768,6 +1010,18 @@ static const struct user_regset aarch64_regsets[] = {
 		.get = system_call_get,
 		.set = system_call_set,
 	},
+#ifdef CONFIG_ARM64_SVE
+	[REGSET_SVE] = { /* Scalable Vector Extension */
+		.core_note_type = NT_ARM_SVE,
+		.n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE),
+				  SVE_VQ_BYTES),
+		.size = SVE_VQ_BYTES,
+		.align = SVE_VQ_BYTES,
+		.get = sve_get,
+		.set = sve_set,
+		.get_size = sve_get_size,
+	},
+#endif
 };
 
 static const struct user_regset_view user_aarch64_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index b5280db..735b8f4 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -416,6 +416,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_HW_BREAK 0x402 /* ARM hardware breakpoint registers */
 #define NT_ARM_HW_WATCH 0x403 /* ARM hardware watchpoint registers */
 #define NT_ARM_SYSTEM_CALL 0x404 /* ARM system call number */
+#define NT_ARM_SVE 0x405 /* ARM Scalable Vector Extension registers */
 #define NT_METAG_CBUF 0x500 /* Metag catch buffer registers */
 #define NT_METAG_RPIPE 0x501 /* Metag read pipeline state */
 #define NT_METAG_TLS 0x502 /* Metag TLS pointer */
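
For anyone checking header.size or header.max_size by hand, the layout arithmetic behind SVE_PT_SIZE(vq, flags) can be sketched in plain userspace C as below. This is only an illustration, not part of the patch: every sketch_* / SKETCH_* name is hypothetical, and the constants are assumptions taken from <asm/sigcontext.h> in this series (16 bytes per vq, 32 Z registers of vq * 16 bytes, 16 P registers and FFR of vq * 2 bytes each) plus an assumed 528-byte struct user_fpsimd_state. It also sizes the header with a local 16-byte struct rather than the sizeof(struct sve_context) used by SVE_PT_REGS_OFFSET above, on the assumption that both are 16 bytes.

```c
#include <assert.h>	/* static_assert */
#include <stddef.h>
#include <stdint.h>

/* Assumed values mirroring <asm/sigcontext.h>; not taken from this hunk. */
#define SKETCH_VQ_BYTES		16u	/* bytes per 128-bit quadword */
#define SKETCH_NUM_ZREGS	32u
#define SKETCH_NUM_PREGS	16u
#define SKETCH_FPSIMD_SIZE	528u	/* assumed sizeof(struct user_fpsimd_state) */

#define SKETCH_REGS_MASK	(1u << 0)
#define SKETCH_REGS_FPSIMD	0u
#define SKETCH_REGS_SVE		SKETCH_REGS_MASK

/* Local stand-in with the same field layout as struct user_sve_header. */
struct sketch_sve_header {
	uint32_t size;
	uint32_t max_size;
	uint16_t vl;
	uint16_t max_vl;
	uint16_t flags;
	uint16_t reserved;
};
static_assert(sizeof(struct sketch_sve_header) == 16,
	      "header is expected to be exactly one quadword");

/* Round n up to the next multiple of SKETCH_VQ_BYTES. */
static inline size_t sketch_round_up_vq(size_t n)
{
	return (n + (SKETCH_VQ_BYTES - 1)) / SKETCH_VQ_BYTES * SKETCH_VQ_BYTES;
}

/* Counterpart of SVE_PT_REGS_OFFSET: register data follows the header. */
static inline size_t sketch_regs_offset(void)
{
	return sketch_round_up_vq(sizeof(struct sketch_sve_header));
}

/* Counterpart of SVE_PT_SVE_FFR_OFFSET(vq): FFR follows Z0-Z31 and P0-P15. */
static inline size_t sketch_ffr_offset(unsigned int vq)
{
	return sketch_regs_offset()
		+ (size_t)SKETCH_NUM_ZREGS * vq * 16
		+ (size_t)SKETCH_NUM_PREGS * vq * 2;
}

/* Counterpart of SVE_PT_SVE_FPSR_OFFSET(vq): end of FFR, 128-bit aligned. */
static inline size_t sketch_fpsr_offset(unsigned int vq)
{
	return sketch_round_up_vq(sketch_ffr_offset(vq) + (size_t)vq * 2);
}

/* Counterpart of SVE_PT_SVE_FPCR_OFFSET(vq): FPCR directly follows FPSR. */
static inline size_t sketch_fpcr_offset(unsigned int vq)
{
	return sketch_fpsr_offset(vq) + sizeof(uint32_t);
}

/* Counterpart of SVE_PT_SIZE(vq, flags): total regset size incl. header. */
static inline size_t sketch_pt_size(unsigned int vq, unsigned int flags)
{
	if ((flags & SKETCH_REGS_MASK) == SKETCH_REGS_SVE) {
		size_t payload = sketch_fpcr_offset(vq) + sizeof(uint32_t)
				 - sketch_regs_offset();

		return sketch_regs_offset() + sketch_round_up_vq(payload);
	}

	return sketch_regs_offset() + SKETCH_FPSIMD_SIZE;
}
```

Under those assumptions, sketch_pt_size(1, SKETCH_REGS_SVE) comes to 592 bytes (vl = 16) and sketch_pt_size(1, SKETCH_REGS_FPSIMD) to 544 bytes. Since max_size is computed from max_vl, a max_vl that reads back as zero would also make max_size unreliable, so the two fields are worth checking together.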