[RFC,v2,00/31] Upstream kvx Linux port

Message ID 20230120141002.2442-1-ysionneau@kalray.eu

Yann Sionneau Jan. 20, 2023, 2:09 p.m. UTC
This patch series adds support for the kv3-1 CPU architecture of the kvx family
found in Kalray's Coolidge (aka MPPA3-80) SoC.

This is an RFC: kvx support is not yet upstreamed into gcc/binutils, so this
patch series cannot be merged into Linux for now.

The goal is to get preliminary reviews and to fix problems early.

The Kalray VLIW processor family (kvx) has the following features:
* 32/64-bit execution modes
* 6-issue VLIW architecture
* 64 x 64-bit general-purpose registers
* SIMD instructions
* little-endian
* deep learning co-processor

The Kalray kv3-1 core, the third core of the kvx family, is embedded in the
Kalray Coolidge SoC, currently used on K200 and K200-LP boards.

The Coolidge SoC contains 5 clusters, each of which is made of:
* 4MiB of on-chip memory (SMEM)
* 1 dedicated safety/security core (kv3-1 core)
* 16 PEs (Processing Elements) (kv3-1 cores)
* 16 co-processors (one per PE)
* 2 crypto accelerators

The Coolidge SoC provides the following features:
* 5 Clusters
* 2 100G Ethernet controllers
* 8 PCIe GEN4 controllers (Root Complex and Endpoint capable)
* 2 USB 2.0 controllers
* 1 Octal SPI-NOR flash controller
* 1 eMMC controller
* 3 Quad SPI controllers
* 6 UARTs
* 5 I2C controllers (3 of which are SMBus capable)
* 4 CAN controllers
* 1 OTP memory

A kvx toolchain can be built using:
# install dependencies: texinfo bison flex libgmp-dev libmpc-dev libmpfr-dev
$ git clone https://github.com/kalray/build-scripts
$ cd build-scripts
$ source last.refs
$ ./build-kvx-xgcc.sh output

The kvx toolchain will be installed in the "output" directory.

You can also find prebuilt toolchains at: https://github.com/kalray/build-scripts/releases/tag/v4.11.1
They are built for Ubuntu 18.04, 20.04 and 22.04 (named 'latest').

A buildroot image (kernel+rootfs) and toolchain can be built using:
$ git clone -b coolidge-for-upstream-v2 https://github.com/kalray/buildroot
$ cd buildroot
$ make O=build_kvx kvx_defconfig
$ make O=build_kvx

The vmlinux image can be found in buildroot/build_kvx/images/vmlinux.

If you are only interested in building the Linux kernel (without a rootfs),
you can use the kvx-elf- toolchain directly:
$ make ARCH=kvx O=build_kvx CROSS_COMPILE=kvx-elf- defconfig
$ make ARCH=kvx O=build_kvx CROSS_COMPILE=kvx-elf- -j$(($(nproc) + 1))

The vmlinux ELF can be run with qemu by doing:
# install dependencies: ninja pkg-config libglib-2.0-dev cmake libfdt-dev libpixman-1-dev zlib1g-dev
$ git clone https://github.com/kalray/qemu-builder
$ cd qemu-builder
$ git submodule update --init
$ make -j$(($(nproc) + 1))
$ ./qemu-system-kvx -m 1024 -nographic -kernel <path/to/vmlinux>

V1 -> V2:
 - Rebase on 6.1.6
 - Removed features that are not necessary for the basic port to boot: kgdb, l2 cache driver, jump label, ftrace, hw breakpoints, gdb python helpers and perf monitors
 - Split dt bindings in separate patches
 - Split irqchip drivers: 1 driver == 1 patch
 - Fixed typos in arch/kvx/Kconfig
 - Rewrote ASM atomic helpers in C using builtins
 - Documentation has been rewritten in RST format and included in the documentation build system
 - Renamed default_defconfig to defconfig
 - Removed arch-specific __access_ok to use generic code
 - Fixed make clean issue caused by LIBGCC definition in arch/kvx/Makefile

Note that not all remarks on the V1 patchset have been addressed yet. We are still working on some items,
such as removing legacy syscalls, using the generic entry code, reworking smp.c and publishing the kv3-1 ABI specification.

Jules Maselbas (11):
  Documentation: Add binding for kalray,kv3-1-core-intc
  Documentation: Add binding for kalray,kv3-1-apic-gic
  Documentation: Add binding for kalray,kv3-1-apic-mailbox
  Documentation: Add binding for kalray,coolidge-itgen
  Documentation: Add binding for kalray,kv3-1-ipi-ctrl
  Documentation: Add binding for kalray,kv3-1-pwr-ctrl
  irqchip: Add irq-kvx-itgen driver
  irqchip: Add irq-kvx-apic-mailbox driver
  irqchip: Add kvx-core-intc core interrupt controller driver
  kvx: Add power controller driver
  kvx: Add IPI driver

Yann Sionneau (20):
  Documentation: kvx: Add basic documentation
  kvx: Add ELF-related definitions
  kvx: Add build infrastructure
  kvx: Add CPU definition headers
  kvx: Add atomic/locking headers
  kvx: Add other common headers
  kvx: Add boot and setup routines
  kvx: Add exception/interrupt handling
  irqchip: Add irq-kvx-apic-gic driver
  kvx: Add process management
  kvx: Add memory management
  kvx: Add system call support
  kvx: Add signal handling support
  kvx: Add ELF relocations and module support
  kvx: Add misc common routines
  kvx: Add some library functions
  kvx: Add multi-processor (SMP) support
  kvx: Add kvx default config file
  kvx: Add debugging related support
  kvx: Add support for cpuinfo

 Documentation/arch.rst                        |    1 +
 .../kalray,coolidge-itgen.yaml                |   48 +
 .../kalray,kv3-1-apic-gic.yaml                |   66 +
 .../kalray,kv3-1-apic-mailbox.yaml            |   75 +
 .../kalray,kv3-1-core-intc.yaml               |   46 +
 .../kalray/kalray,kv3-1-ipi-ctrl.yaml         |   44 +
 .../kalray/kalray,kv3-1-pwr-ctrl.yaml         |   29 +
 Documentation/kvx/index.rst                   |   17 +
 Documentation/kvx/kvx-exceptions.rst          |  256 +
 Documentation/kvx/kvx-iommu.rst               |  191 +
 Documentation/kvx/kvx-mmu.rst                 |  287 +
 Documentation/kvx/kvx-smp.rst                 |   39 +
 Documentation/kvx/kvx.rst                     |  273 +
 arch/kvx/Kconfig                              |  224 +
 arch/kvx/Kconfig.debug                        |   70 +
 arch/kvx/Makefile                             |   53 +
 arch/kvx/configs/defconfig                    |  127 +
 arch/kvx/include/asm/Kbuild                   |   20 +
 arch/kvx/include/asm/asm-prototypes.h         |   14 +
 arch/kvx/include/asm/atomic.h                 |  104 +
 arch/kvx/include/asm/barrier.h                |   15 +
 arch/kvx/include/asm/bitops.h                 |  115 +
 arch/kvx/include/asm/bitrev.h                 |   32 +
 arch/kvx/include/asm/break_hook.h             |   69 +
 arch/kvx/include/asm/bug.h                    |   67 +
 arch/kvx/include/asm/cache.h                  |   43 +
 arch/kvx/include/asm/cacheflush.h             |  158 +
 arch/kvx/include/asm/clocksource.h            |   17 +
 arch/kvx/include/asm/cmpxchg.h                |  170 +
 arch/kvx/include/asm/current.h                |   22 +
 arch/kvx/include/asm/dame.h                   |   31 +
 arch/kvx/include/asm/debug.h                  |   35 +
 arch/kvx/include/asm/elf.h                    |  155 +
 arch/kvx/include/asm/fixmap.h                 |   47 +
 arch/kvx/include/asm/futex.h                  |  141 +
 arch/kvx/include/asm/hardirq.h                |   14 +
 arch/kvx/include/asm/hugetlb.h                |   36 +
 arch/kvx/include/asm/hw_irq.h                 |   14 +
 arch/kvx/include/asm/insns.h                  |   16 +
 arch/kvx/include/asm/insns_defs.h             |  197 +
 arch/kvx/include/asm/io.h                     |   34 +
 arch/kvx/include/asm/ipi.h                    |   16 +
 arch/kvx/include/asm/irqflags.h               |   58 +
 arch/kvx/include/asm/linkage.h                |   13 +
 arch/kvx/include/asm/mem_map.h                |   44 +
 arch/kvx/include/asm/mmu.h                    |  289 +
 arch/kvx/include/asm/mmu_context.h            |  156 +
 arch/kvx/include/asm/mmu_stats.h              |   38 +
 arch/kvx/include/asm/page.h                   |  187 +
 arch/kvx/include/asm/page_size.h              |   29 +
 arch/kvx/include/asm/pci.h                    |   36 +
 arch/kvx/include/asm/pgalloc.h                |  101 +
 arch/kvx/include/asm/pgtable-bits.h           |  102 +
 arch/kvx/include/asm/pgtable.h                |  451 ++
 arch/kvx/include/asm/privilege.h              |  211 +
 arch/kvx/include/asm/processor.h              |  172 +
 arch/kvx/include/asm/ptrace.h                 |  217 +
 arch/kvx/include/asm/pwr_ctrl.h               |   45 +
 arch/kvx/include/asm/rm_fw.h                  |   16 +
 arch/kvx/include/asm/sections.h               |   18 +
 arch/kvx/include/asm/setup.h                  |   29 +
 arch/kvx/include/asm/sfr.h                    |  107 +
 arch/kvx/include/asm/sfr_defs.h               | 5028 +++++++++++++++++
 arch/kvx/include/asm/smp.h                    |   42 +
 arch/kvx/include/asm/sparsemem.h              |   15 +
 arch/kvx/include/asm/spinlock.h               |   16 +
 arch/kvx/include/asm/spinlock_types.h         |   17 +
 arch/kvx/include/asm/stackprotector.h         |   47 +
 arch/kvx/include/asm/stacktrace.h             |   44 +
 arch/kvx/include/asm/string.h                 |   20 +
 arch/kvx/include/asm/swab.h                   |   48 +
 arch/kvx/include/asm/switch_to.h              |   21 +
 arch/kvx/include/asm/symbols.h                |   16 +
 arch/kvx/include/asm/sys_arch.h               |   51 +
 arch/kvx/include/asm/syscall.h                |   73 +
 arch/kvx/include/asm/syscalls.h               |   21 +
 arch/kvx/include/asm/thread_info.h            |   78 +
 arch/kvx/include/asm/timex.h                  |   20 +
 arch/kvx/include/asm/tlb.h                    |   24 +
 arch/kvx/include/asm/tlb_defs.h               |  131 +
 arch/kvx/include/asm/tlbflush.h               |   58 +
 arch/kvx/include/asm/traps.h                  |   76 +
 arch/kvx/include/asm/types.h                  |   12 +
 arch/kvx/include/asm/uaccess.h                |  317 ++
 arch/kvx/include/asm/unistd.h                 |   11 +
 arch/kvx/include/asm/vermagic.h               |   12 +
 arch/kvx/include/asm/vmalloc.h                |   10 +
 arch/kvx/include/uapi/asm/Kbuild              |    1 +
 arch/kvx/include/uapi/asm/bitsperlong.h       |   14 +
 arch/kvx/include/uapi/asm/byteorder.h         |   12 +
 arch/kvx/include/uapi/asm/cachectl.h          |   25 +
 arch/kvx/include/uapi/asm/ptrace.h            |  114 +
 arch/kvx/include/uapi/asm/sigcontext.h        |   16 +
 arch/kvx/include/uapi/asm/unistd.h            |   16 +
 arch/kvx/kernel/Makefile                      |   15 +
 arch/kvx/kernel/asm-offsets.c                 |  157 +
 arch/kvx/kernel/break_hook.c                  |   76 +
 arch/kvx/kernel/common.c                      |   11 +
 arch/kvx/kernel/cpuinfo.c                     |   96 +
 arch/kvx/kernel/dame_handler.c                |  113 +
 arch/kvx/kernel/debug.c                       |   64 +
 arch/kvx/kernel/entry.S                       | 1759 ++++++
 arch/kvx/kernel/head.S                        |  568 ++
 arch/kvx/kernel/insns.c                       |  144 +
 arch/kvx/kernel/io.c                          |   96 +
 arch/kvx/kernel/irq.c                         |   78 +
 arch/kvx/kernel/kvx_ksyms.c                   |   29 +
 arch/kvx/kernel/module.c                      |  148 +
 arch/kvx/kernel/process.c                     |  203 +
 arch/kvx/kernel/prom.c                        |   24 +
 arch/kvx/kernel/ptrace.c                      |  276 +
 arch/kvx/kernel/reset.c                       |   37 +
 arch/kvx/kernel/setup.c                       |  177 +
 arch/kvx/kernel/signal.c                      |  265 +
 arch/kvx/kernel/smp.c                         |  110 +
 arch/kvx/kernel/smpboot.c                     |  127 +
 arch/kvx/kernel/stacktrace.c                  |  173 +
 arch/kvx/kernel/sys_kvx.c                     |   58 +
 arch/kvx/kernel/syscall_table.c               |   19 +
 arch/kvx/kernel/time.c                        |  242 +
 arch/kvx/kernel/traps.c                       |  243 +
 arch/kvx/kernel/vdso.c                        |   87 +
 arch/kvx/kernel/vmlinux.lds.S                 |  150 +
 arch/kvx/lib/Makefile                         |    6 +
 arch/kvx/lib/clear_page.S                     |   40 +
 arch/kvx/lib/copy_page.S                      |   90 +
 arch/kvx/lib/delay.c                          |   39 +
 arch/kvx/lib/memcpy.c                         |   70 +
 arch/kvx/lib/memset.S                         |  351 ++
 arch/kvx/lib/strlen.S                         |  122 +
 arch/kvx/lib/usercopy.S                       |   90 +
 arch/kvx/mm/Makefile                          |    8 +
 arch/kvx/mm/cacheflush.c                      |  154 +
 arch/kvx/mm/dma-mapping.c                     |   85 +
 arch/kvx/mm/extable.c                         |   24 +
 arch/kvx/mm/fault.c                           |  264 +
 arch/kvx/mm/init.c                            |  277 +
 arch/kvx/mm/mmap.c                            |   31 +
 arch/kvx/mm/mmu.c                             |  202 +
 arch/kvx/mm/mmu_stats.c                       |   94 +
 arch/kvx/mm/tlb.c                             |  433 ++
 arch/kvx/platform/Makefile                    |    7 +
 arch/kvx/platform/ipi.c                       |  108 +
 arch/kvx/platform/pwr_ctrl.c                  |   91 +
 drivers/irqchip/Kconfig                       |   27 +
 drivers/irqchip/Makefile                      |    4 +
 drivers/irqchip/irq-kvx-apic-gic.c            |  356 ++
 drivers/irqchip/irq-kvx-apic-mailbox.c        |  480 ++
 drivers/irqchip/irq-kvx-core-intc.c           |   80 +
 drivers/irqchip/irq-kvx-itgen.c               |  236 +
 include/linux/cpuhotplug.h                    |    2 +
 include/uapi/linux/audit.h                    |    1 +
 include/uapi/linux/elf-em.h                   |    1 +
 include/uapi/linux/elf.h                      |    1 +
 tools/include/uapi/asm/bitsperlong.h          |    2 +
 155 files changed, 21474 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/kalray,coolidge-itgen.yaml
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/kalray,kv3-1-apic-gic.yaml
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/kalray,kv3-1-apic-mailbox.yaml
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/kalray,kv3-1-core-intc.yaml
 create mode 100644 Documentation/devicetree/bindings/kalray/kalray,kv3-1-ipi-ctrl.yaml
 create mode 100644 Documentation/devicetree/bindings/kalray/kalray,kv3-1-pwr-ctrl.yaml
 create mode 100644 Documentation/kvx/index.rst
 create mode 100644 Documentation/kvx/kvx-exceptions.rst
 create mode 100644 Documentation/kvx/kvx-iommu.rst
 create mode 100644 Documentation/kvx/kvx-mmu.rst
 create mode 100644 Documentation/kvx/kvx-smp.rst
 create mode 100644 Documentation/kvx/kvx.rst
 create mode 100644 arch/kvx/Kconfig
 create mode 100644 arch/kvx/Kconfig.debug
 create mode 100644 arch/kvx/Makefile
 create mode 100644 arch/kvx/configs/defconfig
 create mode 100644 arch/kvx/include/asm/Kbuild
 create mode 100644 arch/kvx/include/asm/asm-prototypes.h
 create mode 100644 arch/kvx/include/asm/atomic.h
 create mode 100644 arch/kvx/include/asm/barrier.h
 create mode 100644 arch/kvx/include/asm/bitops.h
 create mode 100644 arch/kvx/include/asm/bitrev.h
 create mode 100644 arch/kvx/include/asm/break_hook.h
 create mode 100644 arch/kvx/include/asm/bug.h
 create mode 100644 arch/kvx/include/asm/cache.h
 create mode 100644 arch/kvx/include/asm/cacheflush.h
 create mode 100644 arch/kvx/include/asm/clocksource.h
 create mode 100644 arch/kvx/include/asm/cmpxchg.h
 create mode 100644 arch/kvx/include/asm/current.h
 create mode 100644 arch/kvx/include/asm/dame.h
 create mode 100644 arch/kvx/include/asm/debug.h
 create mode 100644 arch/kvx/include/asm/elf.h
 create mode 100644 arch/kvx/include/asm/fixmap.h
 create mode 100644 arch/kvx/include/asm/futex.h
 create mode 100644 arch/kvx/include/asm/hardirq.h
 create mode 100644 arch/kvx/include/asm/hugetlb.h
 create mode 100644 arch/kvx/include/asm/hw_irq.h
 create mode 100644 arch/kvx/include/asm/insns.h
 create mode 100644 arch/kvx/include/asm/insns_defs.h
 create mode 100644 arch/kvx/include/asm/io.h
 create mode 100644 arch/kvx/include/asm/ipi.h
 create mode 100644 arch/kvx/include/asm/irqflags.h
 create mode 100644 arch/kvx/include/asm/linkage.h
 create mode 100644 arch/kvx/include/asm/mem_map.h
 create mode 100644 arch/kvx/include/asm/mmu.h
 create mode 100644 arch/kvx/include/asm/mmu_context.h
 create mode 100644 arch/kvx/include/asm/mmu_stats.h
 create mode 100644 arch/kvx/include/asm/page.h
 create mode 100644 arch/kvx/include/asm/page_size.h
 create mode 100644 arch/kvx/include/asm/pci.h
 create mode 100644 arch/kvx/include/asm/pgalloc.h
 create mode 100644 arch/kvx/include/asm/pgtable-bits.h
 create mode 100644 arch/kvx/include/asm/pgtable.h
 create mode 100644 arch/kvx/include/asm/privilege.h
 create mode 100644 arch/kvx/include/asm/processor.h
 create mode 100644 arch/kvx/include/asm/ptrace.h
 create mode 100644 arch/kvx/include/asm/pwr_ctrl.h
 create mode 100644 arch/kvx/include/asm/rm_fw.h
 create mode 100644 arch/kvx/include/asm/sections.h
 create mode 100644 arch/kvx/include/asm/setup.h
 create mode 100644 arch/kvx/include/asm/sfr.h
 create mode 100644 arch/kvx/include/asm/sfr_defs.h
 create mode 100644 arch/kvx/include/asm/smp.h
 create mode 100644 arch/kvx/include/asm/sparsemem.h
 create mode 100644 arch/kvx/include/asm/spinlock.h
 create mode 100644 arch/kvx/include/asm/spinlock_types.h
 create mode 100644 arch/kvx/include/asm/stackprotector.h
 create mode 100644 arch/kvx/include/asm/stacktrace.h
 create mode 100644 arch/kvx/include/asm/string.h
 create mode 100644 arch/kvx/include/asm/swab.h
 create mode 100644 arch/kvx/include/asm/switch_to.h
 create mode 100644 arch/kvx/include/asm/symbols.h
 create mode 100644 arch/kvx/include/asm/sys_arch.h
 create mode 100644 arch/kvx/include/asm/syscall.h
 create mode 100644 arch/kvx/include/asm/syscalls.h
 create mode 100644 arch/kvx/include/asm/thread_info.h
 create mode 100644 arch/kvx/include/asm/timex.h
 create mode 100644 arch/kvx/include/asm/tlb.h
 create mode 100644 arch/kvx/include/asm/tlb_defs.h
 create mode 100644 arch/kvx/include/asm/tlbflush.h
 create mode 100644 arch/kvx/include/asm/traps.h
 create mode 100644 arch/kvx/include/asm/types.h
 create mode 100644 arch/kvx/include/asm/uaccess.h
 create mode 100644 arch/kvx/include/asm/unistd.h
 create mode 100644 arch/kvx/include/asm/vermagic.h
 create mode 100644 arch/kvx/include/asm/vmalloc.h
 create mode 100644 arch/kvx/include/uapi/asm/Kbuild
 create mode 100644 arch/kvx/include/uapi/asm/bitsperlong.h
 create mode 100644 arch/kvx/include/uapi/asm/byteorder.h
 create mode 100644 arch/kvx/include/uapi/asm/cachectl.h
 create mode 100644 arch/kvx/include/uapi/asm/ptrace.h
 create mode 100644 arch/kvx/include/uapi/asm/sigcontext.h
 create mode 100644 arch/kvx/include/uapi/asm/unistd.h
 create mode 100644 arch/kvx/kernel/Makefile
 create mode 100644 arch/kvx/kernel/asm-offsets.c
 create mode 100644 arch/kvx/kernel/break_hook.c
 create mode 100644 arch/kvx/kernel/common.c
 create mode 100644 arch/kvx/kernel/cpuinfo.c
 create mode 100644 arch/kvx/kernel/dame_handler.c
 create mode 100644 arch/kvx/kernel/debug.c
 create mode 100644 arch/kvx/kernel/entry.S
 create mode 100644 arch/kvx/kernel/head.S
 create mode 100644 arch/kvx/kernel/insns.c
 create mode 100644 arch/kvx/kernel/io.c
 create mode 100644 arch/kvx/kernel/irq.c
 create mode 100644 arch/kvx/kernel/kvx_ksyms.c
 create mode 100644 arch/kvx/kernel/module.c
 create mode 100644 arch/kvx/kernel/process.c
 create mode 100644 arch/kvx/kernel/prom.c
 create mode 100644 arch/kvx/kernel/ptrace.c
 create mode 100644 arch/kvx/kernel/reset.c
 create mode 100644 arch/kvx/kernel/setup.c
 create mode 100644 arch/kvx/kernel/signal.c
 create mode 100644 arch/kvx/kernel/smp.c
 create mode 100644 arch/kvx/kernel/smpboot.c
 create mode 100644 arch/kvx/kernel/stacktrace.c
 create mode 100644 arch/kvx/kernel/sys_kvx.c
 create mode 100644 arch/kvx/kernel/syscall_table.c
 create mode 100644 arch/kvx/kernel/time.c
 create mode 100644 arch/kvx/kernel/traps.c
 create mode 100644 arch/kvx/kernel/vdso.c
 create mode 100644 arch/kvx/kernel/vmlinux.lds.S
 create mode 100644 arch/kvx/lib/Makefile
 create mode 100644 arch/kvx/lib/clear_page.S
 create mode 100644 arch/kvx/lib/copy_page.S
 create mode 100644 arch/kvx/lib/delay.c
 create mode 100644 arch/kvx/lib/memcpy.c
 create mode 100644 arch/kvx/lib/memset.S
 create mode 100644 arch/kvx/lib/strlen.S
 create mode 100644 arch/kvx/lib/usercopy.S
 create mode 100644 arch/kvx/mm/Makefile
 create mode 100644 arch/kvx/mm/cacheflush.c
 create mode 100644 arch/kvx/mm/dma-mapping.c
 create mode 100644 arch/kvx/mm/extable.c
 create mode 100644 arch/kvx/mm/fault.c
 create mode 100644 arch/kvx/mm/init.c
 create mode 100644 arch/kvx/mm/mmap.c
 create mode 100644 arch/kvx/mm/mmu.c
 create mode 100644 arch/kvx/mm/mmu_stats.c
 create mode 100644 arch/kvx/mm/tlb.c
 create mode 100644 arch/kvx/platform/Makefile
 create mode 100644 arch/kvx/platform/ipi.c
 create mode 100644 arch/kvx/platform/pwr_ctrl.c
 create mode 100644 drivers/irqchip/irq-kvx-apic-gic.c
 create mode 100644 drivers/irqchip/irq-kvx-apic-mailbox.c
 create mode 100644 drivers/irqchip/irq-kvx-core-intc.c
 create mode 100644 drivers/irqchip/irq-kvx-itgen.c

base-commit: 21e996306a6afaae88295858de0ffb8955173a15

Comments

Jason A. Donenfeld Jan. 20, 2023, 2:29 p.m. UTC | #1
Hi Yann,

On Fri, Jan 20, 2023 at 03:09:43PM +0100, Yann Sionneau wrote:
> +#include <linux/random.h>
> +#include <linux/version.h>
> +
> +extern unsigned long __stack_chk_guard;
> +
> +/*
> + * Initialize the stackprotector canary value.
> + *
> + * NOTE: this must only be called from functions that never return,
> + * and it must always be inlined.
> + */
> +static __always_inline void boot_init_stack_canary(void)
> +{
> +	unsigned long canary;
> +
> +	/* Try to get a semi random initial value. */
> +	get_random_bytes(&canary, sizeof(canary));
> +	canary ^= LINUX_VERSION_CODE;
> +	canary &= CANARY_MASK;
> +
> +	current->stack_canary = canary;
> +	__stack_chk_guard = current->stack_canary;
> +}


You should rewrite this as:

    current->stack_canary = get_random_canary();
    __stack_chk_guard = current->stack_canary;

which is what the other archs all now do. (They didn't used to, and this
looks like it's simply based on older code.)
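
With that, the whole helper reduces to something like this (a sketch
only; get_random_canary() already applies CANARY_MASK internally, so
the LINUX_VERSION_CODE mixing and masking above simply go away):

    static __always_inline void boot_init_stack_canary(void)
    {
            current->stack_canary = get_random_canary();
            __stack_chk_guard = current->stack_canary;
    }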

> +#define get_cycles get_cycles
> +
> +#include <asm/sfr.h>
> +#include <asm-generic/timex.h>
> +
> +static inline cycles_t get_cycles(void)
> +{
> +	return kvx_sfr_get(PM0);
> +}

Glad to see this CPU has a cycle counter. Out of curiosity, what is
its resolution?

Also, related, does this CPU happen to have a "RDRAND"-like instruction?
(I don't know anything about kvx or even what it is.)

Jason
Arnd Bergmann Jan. 20, 2023, 2:39 p.m. UTC | #2
On Fri, Jan 20, 2023, at 15:09, Yann Sionneau wrote:
>      - Fix clean target raising an error from gcc (LIBGCC)

I had not noticed this on v1 but:

> +# Link with libgcc to get __div* builtins.
> +LIBGCC	:= $(shell $(CC) $(DEFAULT_OPTS) --print-libgcc-file-name)

It's better to copy the bits of libgcc that you actually need
than to include the whole thing. The kernel is in a weird
state that is neither freestanding nor the normal libc based
environment, so we generally want full control over what is
used. This is particularly important for 32-bit architectures
that do not want the 64-bit division, but there are probably
enough other cases as well.

     Arnd
Jules Maselbas Jan. 20, 2023, 2:53 p.m. UTC | #3
On Fri, Jan 20, 2023 at 03:39:22PM +0100, Arnd Bergmann wrote:
> On Fri, Jan 20, 2023, at 15:09, Yann Sionneau wrote:
> >      - Fix clean target raising an error from gcc (LIBGCC)
> 
> I had not noticed this on v1 but:
> 
> > +# Link with libgcc to get __div* builtins.
> > +LIBGCC	:= $(shell $(CC) $(DEFAULT_OPTS) --print-libgcc-file-name)
> 
> It's better to copy the bits of libgcc that you actually need
> than to include the whole thing. The kernel is in a weird
It was initially using KCONFIG_CFLAGS, which does not contain valid options
when invoking the clean target.

I am not exactly sure what gcc needs for --print-libgcc-file-name;
my guess is that only the -march option matters. I will double-check
internally with the compiler people.
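
(For illustration, assuming only the machine flags matter, the query
would boil down to something like:

$ kvx-elf-gcc -march=kv3-1 --print-libgcc-file-name

where the exact -march value is an assumption to be confirmed.)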

> state that is neither freestanding nor the normal libc based
> environment, so we generally want full control over what is
> used. This is particularly important for 32-bit architectures
> that do not want the 64-bit division, but there are probably
> enough other cases as well.
> 
>      Arnd
Arnd Bergmann Jan. 20, 2023, 3:01 p.m. UTC | #4
On Fri, Jan 20, 2023, at 15:53, Jules Maselbas wrote:
> On Fri, Jan 20, 2023 at 03:39:22PM +0100, Arnd Bergmann wrote:
>> On Fri, Jan 20, 2023, at 15:09, Yann Sionneau wrote:
>> >      - Fix clean target raising an error from gcc (LIBGCC)
>> 
>> I had not noticed this on v1 but:
>> 
>> > +# Link with libgcc to get __div* builtins.
>> > +LIBGCC	:= $(shell $(CC) $(DEFAULT_OPTS) --print-libgcc-file-name)
>> 
>> It's better to copy the bits of libgcc that you actually need
>> than to include the whole thing. The kernel is in a weird
> It was initialy using KCONFIG_CFLAGS which do not contains valid options
> when invoking the clean target.
>
> I am not exactly sure what's needed by gcc for --print-libgcc-file-name,
> my guess is that only the -march option matters, I will double check
> internally with compiler peoples.
>
>> state that is neither freestanding nor the normal libc based
>> environment, so we generally want full control over what is
>> used. This is particularly important for 32-bit architectures
>> that do not want the 64-bit division, but there are probably
>> enough other cases as well.

To clarify: I meant you should not include libgcc.a at all but
add the minimum set of required files as arch/kvx/lib/*.S.
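
For illustration only (file names are hypothetical): drop the LIBGCC
link entirely and carry the few needed helpers in-tree, e.g. in
arch/kvx/lib/Makefile:

    # division helpers previously pulled in from libgcc.a
    lib-y += divdi3.o udivdi3.o moddi3.o umoddi3.o

as several other architectures already do.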

     Arnd
Jules Maselbas Jan. 20, 2023, 3:03 p.m. UTC | #5
On Fri, Jan 20, 2023 at 04:01:11PM +0100, Arnd Bergmann wrote:
> On Fri, Jan 20, 2023, at 15:53, Jules Maselbas wrote:
> > On Fri, Jan 20, 2023 at 03:39:22PM +0100, Arnd Bergmann wrote:
> >> On Fri, Jan 20, 2023, at 15:09, Yann Sionneau wrote:
> >> >      - Fix clean target raising an error from gcc (LIBGCC)
> >> 
> >> I had not noticed this on v1 but:
> >> 
> >> > +# Link with libgcc to get __div* builtins.
> >> > +LIBGCC	:= $(shell $(CC) $(DEFAULT_OPTS) --print-libgcc-file-name)
> >> 
> >> It's better to copy the bits of libgcc that you actually need
> >> than to include the whole thing. The kernel is in a weird
> > It was initialy using KCONFIG_CFLAGS which do not contains valid options
> > when invoking the clean target.
> >
> > I am not exactly sure what's needed by gcc for --print-libgcc-file-name,
> > my guess is that only the -march option matters, I will double check
> > internally with compiler peoples.
> >
> >> state that is neither freestanding nor the normal libc based
> >> environment, so we generally want full control over what is
> >> used. This is particularly important for 32-bit architectures
> >> that do not want the 64-bit division, but there are probably
> >> enough other cases as well.
> 
> To clarify: I meant you should not include libgcc.a at all but
> add the minimum set of required files as arch/kvx/lib/*.S.
Thanks for clarifying :)


-- Jules
Mark Rutland Jan. 20, 2023, 3:18 p.m. UTC | #6
On Fri, Jan 20, 2023 at 03:09:42PM +0100, Yann Sionneau wrote:
> Add common headers (atomic, bitops, barrier and locking) for basic
> kvx support.
> 
> Co-developed-by: Clement Leger <clement@clement-leger.fr>
> Signed-off-by: Clement Leger <clement@clement-leger.fr>
> Co-developed-by: Jules Maselbas <jmaselbas@kalray.eu>
> Signed-off-by: Jules Maselbas <jmaselbas@kalray.eu>
> Co-developed-by: Julian Vetter <jvetter@kalray.eu>
> Signed-off-by: Julian Vetter <jvetter@kalray.eu>
> Co-developed-by: Julien Villette <jvillette@kalray.eu>
> Signed-off-by: Julien Villette <jvillette@kalray.eu>
> Co-developed-by: Yann Sionneau <ysionneau@kalray.eu>
> Signed-off-by: Yann Sionneau <ysionneau@kalray.eu>
> ---
> 
> Notes:
>     V1 -> V2:
>      - use {READ,WRITE}_ONCE for arch_atomic64_{read,set}
>      - use asm-generic/bitops/atomic.h instead of __test_and_*_bit
>      - removed duplicated includes
>      - rewrite xchg and cmpxchg in C using builtins for acswap insn

Thanks for those changes. I see one issue below (instantiated a few times), but
other than that this looks good to me.

[...]

> +#define ATOMIC64_RETURN_OP(op, c_op)					\
> +static inline long arch_atomic64_##op##_return(long i, atomic64_t *v)	\
> +{									\
> +	long new, old, ret;						\
> +									\
> +	do {								\
> +		old = v->counter;					\

This should be arch_atomic64_read(v), in order to avoid the potential for the
compiler to replay the access and introduce ABA races and other such problems.

For details, see:

  https://lore.kernel.org/lkml/Y70SWXHDmOc3RhMd@osiris/
  https://lore.kernel.org/lkml/Y71LoCIl+IFdy9D8@FVFF77S0Q05N/

I see that the generic 32-bit atomic code suffers from that issue, and we
should fix it.
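
Concretely, each of these loops would become (sketch):

	do {
		old = arch_atomic64_read(v);
		new = old c_op i;
		ret = arch_cmpxchg(&v->counter, old, new);
	} while (ret != old);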

> +		new = old c_op i;					\
> +		ret = arch_cmpxchg(&v->counter, old, new);		\
> +	} while (ret != old);						\
> +									\
> +	return new;							\
> +}
> +
> +#define ATOMIC64_OP(op, c_op)						\
> +static inline void arch_atomic64_##op(long i, atomic64_t *v)		\
> +{									\
> +	long new, old, ret;						\
> +									\
> +	do {								\
> +		old = v->counter;					\

Likewise, arch_atomic64_read(v) here.

> +		new = old c_op i;					\
> +		ret = arch_cmpxchg(&v->counter, old, new);		\
> +	} while (ret != old);						\
> +}
> +
> +#define ATOMIC64_FETCH_OP(op, c_op)					\
> +static inline long arch_atomic64_fetch_##op(long i, atomic64_t *v)	\
> +{									\
> +	long new, old, ret;						\
> +									\
> +	do {								\
> +		old = v->counter;					\

Likewise, arch_atomic64_read(v) here.

> +		new = old c_op i;					\
> +		ret = arch_cmpxchg(&v->counter, old, new);		\
> +	} while (ret != old);						\
> +									\
> +	return old;							\
> +}
> +
> +#define ATOMIC64_OPS(op, c_op)						\
> +	ATOMIC64_OP(op, c_op)						\
> +	ATOMIC64_RETURN_OP(op, c_op)					\
> +	ATOMIC64_FETCH_OP(op, c_op)
> +
> +ATOMIC64_OPS(and, &)
> +ATOMIC64_OPS(or, |)
> +ATOMIC64_OPS(xor, ^)
> +ATOMIC64_OPS(add, +)
> +ATOMIC64_OPS(sub, -)
> +
> +#undef ATOMIC64_OPS
> +#undef ATOMIC64_FETCH_OP
> +#undef ATOMIC64_OP
> +
> +static inline int arch_atomic_add_return(int i, atomic_t *v)
> +{
> +	int new, old, ret;
> +
> +	do {
> +		old = v->counter;

Likewise, arch_atomic64_read(v) here.

> +		new = old + i;
> +		ret = arch_cmpxchg(&v->counter, old, new);
> +	} while (ret != old);
> +
> +	return new;
> +}
> +
> +static inline int arch_atomic_sub_return(int i, atomic_t *v)
> +{
> +	return arch_atomic_add_return(-i, v);
> +}
> +
> +#include <asm-generic/atomic.h>
> +
> +#endif	/* _ASM_KVX_ATOMIC_H */

Otherwise, the atomics look good to me.

Thanks,
Mark.
Krzysztof Kozlowski Jan. 22, 2023, 11:54 a.m. UTC | #7
On 20/01/2023 15:10, Yann Sionneau wrote:
> From: Jules Maselbas <jmaselbas@kalray.eu>
> 
> The Power Controller (pwr-ctrl) control cores reset and wake-up
> procedure.


> +
> +static struct device_node * __init get_pwr_ctrl_node(void)
> +{
> +	const phandle *ph;
> +	struct device_node *cpu;
> +	struct device_node *node;
> +
> +	cpu = of_get_cpu_node(raw_smp_processor_id(), NULL);
> +	if (!cpu) {
> +		pr_err("Failed to get CPU node\n");
> +		return NULL;
> +	}
> +
> +	ph = of_get_property(cpu, "power-controller", NULL);
> +	if (!ph) {
> +		pr_err("Failed to get power-controller phandle\n");
> +		return NULL;
> +	}
> +
> +	node = of_find_node_by_phandle(be32_to_cpup(ph));
> +	if (!node) {
> +		pr_err("Failed to get power-controller node\n");
> +		return NULL;
> +	}
> +
> +	return node;
> +}
> +
> +int __init kvx_pwr_ctrl_probe(void)
> +{
> +	struct device_node *ctrl;
> +
> +	ctrl = get_pwr_ctrl_node();
> +	if (!ctrl) {
> +		pr_err("Failed to get power controller node\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!of_device_is_compatible(ctrl, "kalray,kvx-pwr-ctrl")) {
> +		pr_err("Failed to get power controller node\n");

No. Drivers go to drivers, not to arch directory. This should be a
proper driver instead of some fake stub doing its own driver matching.
You need to rework this.
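
For illustration, the conventional shape would be a regular platform
driver that lets the driver core do the matching (a sketch only; the
probe body and file placement are assumptions):

    static int kvx_pwr_ctrl_probe(struct platform_device *pdev)
    {
            /* map registers, set up the CPU wake-up ops, ... */
            return 0;
    }

    static const struct of_device_id kvx_pwr_ctrl_of_match[] = {
            { .compatible = "kalray,kvx-pwr-ctrl" },
            { /* sentinel */ }
    };

    static struct platform_driver kvx_pwr_ctrl_driver = {
            .probe = kvx_pwr_ctrl_probe,
            .driver = {
                    .name = "kvx-pwr-ctrl",
                    .of_match_table = kvx_pwr_ctrl_of_match,
            },
    };
    builtin_platform_driver(kvx_pwr_ctrl_driver);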

Best regards,
Krzysztof
Krzysztof Kozlowski Jan. 22, 2023, 11:54 a.m. UTC | #8
On 20/01/2023 15:10, Yann Sionneau wrote:
> +
> +int __init kvx_ipi_ctrl_probe(irqreturn_t (*ipi_irq_handler)(int, void *))
> +{
> +	struct device_node *np;
> +	int ret;
> +	unsigned int ipi_irq;
> +	void __iomem *ipi_base;
> +
> +	np = of_find_compatible_node(NULL, NULL, "kalray,kvx-ipi-ctrl");

Nope, big no.

Drivers go to drivers, not to arch code. Use proper driver infrastructure.

Best regards,
Krzysztof
Krzysztof Kozlowski Jan. 22, 2023, 11:57 a.m. UTC | #9
On 20/01/2023 15:10, Yann Sionneau wrote:
> +static int __init setup_cpuinfo(void)
> +{
> +	int cpu;
> +	struct clk *clk;
> +	unsigned long cpu_freq = 1000000000;
> +	struct device_node *node = of_get_cpu_node(0, NULL);
> +
> +	clk = of_clk_get(node, 0);
> +	if (IS_ERR(clk)) {
> +		printk(KERN_WARNING
> +		       "Device tree missing CPU 'clock' parameter. Assuming frequency is 1GHZ");
> +		goto setup_cpu_freq;
> +	}
> +
> +	cpu_freq = clk_get_rate(clk);

What about cpufreq? I don't think this is useful.

Best regards,
Krzysztof
Krzysztof Kozlowski Jan. 22, 2023, 11:58 a.m. UTC | #10
On 20/01/2023 15:09, Yann Sionneau wrote:
> Add a default config file for kvx based Coolidge SoC.
> 
> Co-developed-by: Ashley Lesdalons <alesdalons@kalray.eu>
> Signed-off-by: Ashley Lesdalons <alesdalons@kalray.eu>
> Co-developed-by: Benjamin Mugnier <mugnier.benjamin@gmail.com>
> Signed-off-by: Benjamin Mugnier <mugnier.benjamin@gmail.com>
> Co-developed-by: Clement Leger <clement@clement-leger.fr>
> Signed-off-by: Clement Leger <clement@clement-leger.fr>
> Co-developed-by: Guillaume Thouvenin <gthouvenin@kalray.eu>
> Signed-off-by: Guillaume Thouvenin <gthouvenin@kalray.eu>
> Co-developed-by: Jules Maselbas <jmaselbas@kalray.eu>
> Signed-off-by: Jules Maselbas <jmaselbas@kalray.eu>
> Co-developed-by: Julian Vetter <jvetter@kalray.eu>
> Signed-off-by: Julian Vetter <jvetter@kalray.eu>
> Co-developed-by: Samuel Jones <sjones@kalray.eu>
> Signed-off-by: Samuel Jones <sjones@kalray.eu>
> Co-developed-by: Thomas Costis <tcostis@kalray.eu>
> Signed-off-by: Thomas Costis <tcostis@kalray.eu>
> Co-developed-by: Vincent Chardon <vincent.chardon@elsys-design.com>
> Signed-off-by: Vincent Chardon <vincent.chardon@elsys-design.com>
> Co-developed-by: Yann Sionneau <ysionneau@kalray.eu>
> Signed-off-by: Yann Sionneau <ysionneau@kalray.eu>
> ---
> 
> Notes:
>     V1 -> V2: default_defconfig renamed to defconfig
> 
>  arch/kvx/configs/defconfig | 127 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 127 insertions(+)
>  create mode 100644 arch/kvx/configs/defconfig
> 
> diff --git a/arch/kvx/configs/defconfig b/arch/kvx/configs/defconfig
> new file mode 100644
> index 000000000000..960784da0b1b
> --- /dev/null
> +++ b/arch/kvx/configs/defconfig
> @@ -0,0 +1,127 @@
> +CONFIG_DEFAULT_HOSTNAME="KVXlinux"
> +CONFIG_SERIAL_KVX_SCALL_COMM=y
> +CONFIG_CONFIGFS_FS=y
> +CONFIG_DEBUG_KERNEL=y
> +CONFIG_DEBUG_INFO=y
> +CONFIG_DEBUG_INFO_DWARF4=y
> +CONFIG_PRINTK_TIME=y
> +CONFIG_CONSOLE_LOGLEVEL_DEFAULT=15
> +CONFIG_MESSAGE_LOGLEVEL_DEFAULT=7
> +CONFIG_PANIC_TIMEOUT=-1
> +CONFIG_BLK_DEV_INITRD=y
> +CONFIG_GDB_SCRIPTS=y
> +CONFIG_FRAME_POINTER=y
> +CONFIG_HZ_100=y
> +CONFIG_SERIAL_EARLYCON=y
> +CONFIG_HOTPLUG_PCI_PCIE=y
> +CONFIG_PCIEAER=y
> +CONFIG_PCIE_DPC=y
> +CONFIG_HOTPLUG_PCI=y
> +CONFIG_SERIAL_8250=y

Are you sure this is the result of savedefconfig? Order looks a bit odd
in several places, so I want to double check.
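
(For reference, the canonical way to regenerate it is something like:

$ make ARCH=kvx CROSS_COMPILE=kvx-elf- defconfig
$ make ARCH=kvx CROSS_COMPILE=kvx-elf- savedefconfig
$ mv defconfig arch/kvx/configs/defconfig

which rewrites the entries in canonical Kconfig order.)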

Best regards,
Krzysztof
Mike Rapoport Jan. 22, 2023, 3:02 p.m. UTC | #11
Hi,

While reviewing kvx-mmu.rst I've spotted several places where the
documentation does not match the code; please make sure that the
documentation in this patch actually reflects what the code is doing.

Also, the sectioning looked a bit strange to me; make sure to take a look at
the generated HTML and see if it is rendered as intended.

On Fri, Jan 20, 2023 at 03:09:32PM +0100, Yann Sionneau wrote:
> Add some documentation for the kvx architecture and its Linux port.
> 
> Co-developed-by: Clement Leger <clement@clement-leger.fr>
> Signed-off-by: Clement Leger <clement@clement-leger.fr>
> Co-developed-by: Jules Maselbas <jmaselbas@kalray.eu>
> Signed-off-by: Jules Maselbas <jmaselbas@kalray.eu>
> Co-developed-by: Guillaume Thouvenin <gthouvenin@kalray.eu>
> Signed-off-by: Guillaume Thouvenin <gthouvenin@kalray.eu>
> Signed-off-by: Yann Sionneau <ysionneau@kalray.eu>
> ---
> 
> Notes:
>     V1 -> V2:
>      - converted to .rst, typos and formatting fixes
>      - reword few paragraphs
> 
>  Documentation/arch.rst               |   1 +
>  Documentation/kvx/index.rst          |  17 ++
>  Documentation/kvx/kvx-exceptions.rst | 256 ++++++++++++++++++++++++
>  Documentation/kvx/kvx-iommu.rst      | 191 ++++++++++++++++++
>  Documentation/kvx/kvx-mmu.rst        | 287 +++++++++++++++++++++++++++
>  Documentation/kvx/kvx-smp.rst        |  39 ++++
>  Documentation/kvx/kvx.rst            | 273 +++++++++++++++++++++++++
>  7 files changed, 1064 insertions(+)
>  create mode 100644 Documentation/kvx/index.rst
>  create mode 100644 Documentation/kvx/kvx-exceptions.rst
>  create mode 100644 Documentation/kvx/kvx-iommu.rst
>  create mode 100644 Documentation/kvx/kvx-mmu.rst
>  create mode 100644 Documentation/kvx/kvx-smp.rst
>  create mode 100644 Documentation/kvx/kvx.rst
> 
> diff --git a/Documentation/arch.rst b/Documentation/arch.rst
> index 41a66a8b38e4..1ccda8ef6eef 100644
> --- a/Documentation/arch.rst
> +++ b/Documentation/arch.rst
> @@ -13,6 +13,7 @@ implementation.
>     arm/index
>     arm64/index
>     ia64/index
> +   kvx/index
>     loongarch/index
>     m68k/index
>     mips/index

...

> diff --git a/Documentation/kvx/kvx-mmu.rst b/Documentation/kvx/kvx-mmu.rst
> new file mode 100644
> index 000000000000..b7186331396c
> --- /dev/null
> +++ b/Documentation/kvx/kvx-mmu.rst
> @@ -0,0 +1,287 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===
> +MMU
> +===
> +
> +Virtual addresses are on 41 bits for kvx when using 64-bit mode.

There is only 64-bit support at the moment, so no need to mention it.

> +To differentiate kernel from user space, we use the high order bit
> +(bit 40). When bit 40 is set, then the higher remaining bits must also be
> +set to 1. The virtual address must be extended with 1 when the bit 40 is set,
> +if not the address must be zero extended. Bit 40 is set for kernel space
> +mappings and not set for user space mappings.

I'd reorder this paragraph to first mention that bit 40 is set for kernel
space and then write about sign extension.

> +
> +Memory Map
> +----------
> +
> +In Linux physical memories are arranged into banks according to the cost of an
> +access in term of distance to a memory. As we are UMA architecture we only have
> +one bank and thus one node.
> +
> +A node is divided into several kind of zone. For example if DMA can only access
> +a specific area in the physical memory we will define a ZONE_DMA for this purpose.
> +In our case we are considering that DMA can access all DDR so we don't have a specific
> +zone for this. On 64 bit architecture all DDR can be mapped in virtual kernel space
> +so there is no need for a ZONE_HIGHMEM. That means that in our case there is
> +only one ZONE_NORMAL. This will be updated if DMA cannot access all memory.

These two paragraphs have nothing to do with the virtual memory map. Please
drop them.

> +
> +Currently, the memory mapping is the following for 4KB page:
> +
> +  ======================== ======================= ====== ======= ==============
> +  Start                    End                     Attr   Size    Name
> +  ======================== ======================= ====== ======= ==============
> +  0000 0000 0000 0000      0000 003F FFFF FFFF     ---    256GB    User
> +  0000 0040 0000 0000      0000 007F FFFF FFFF     ---    256GB     MMAP
> +  0000 0080 0000 0000      FFFF FF7F FFFF FFFF     ---    ---      Gap
> +  FFFF FF80 0000 0000      FFFF FFFF FFFF FFFF     ---    512GB    Kernel
> +    FFFF FF80 0000 0000     FFFF FF8F FFFF FFFF    RWX    64GB      Direct Map
> +    FFFF FF90 0000 0000     FFFF FF90 3FFF FFFF    RWX    1GB       Vmalloc
> +    FFFF FF90 4000 0000     FFFF FFFF FFFF FFFF    RW     447GB     Free area
> +  ======================== ======================= ====== ======= ==============
> +
> +Enable the MMU
> +--------------
> +
> +All kernel functions and symbols are in virtual memory except for kvx_start()
> +function which is loaded at 0x0 in physical memory.
> +To be able to switch from physical addresses to virtual addresses we choose to
> +setup the TLB at the very beginning of the boot process to be able to map both
> +pieces of code. For this we added two entries in the LTLB. The first one,
> +LTLB[0], contains the mapping between virtual memory and DDR. Its size is 512MB.
> +The second entry, LTLB[1], contains a flat mapping of the first 2MB of the SMEM.
> +Once those two entries are present we can enable the MMU. LTLB[1] will be
> +removed during paging_init() because once we are really running in virtual space
> +it will not be used anymore.
> +In order to access more than 512MB DDR memory, the remaining memory (> 512MB) is
> +refill using a comparison in kernel_perf_refill that does not walk the kernel
> +page table, thus having a faster refill time for kernel. These entries are
> +inserted into the LTLB for easier computation (4 LTLB entries). The drawback of
> +this approach is that mapped entries are using RWX protection attributes,
> +leading to no protection at all.
> +
> +Kernel strict RWX
> +-----------------
> +
> +``CONFIG_STRICT_KERNEL_RWX`` is enabled by default in defconfig.

This is not what I see in the current patchset. Please re-add this section
along with CONFIG_STRICT_KERNEL_RWX support, otherwise it's misleading.

> +Once booted, if ``CONFIG_STRICT_KERNEL_RWX`` is enable, the kernel text and memory
> +will be mapped in the init_mm page table. Once mapped, the refill routine for
> +the kernel is patched to always do a page table walk, bypassing the faster
> +comparison but enforcing page protection attributes when refilling.
> +Finally, the LTLB[0] entry is replaced by a 4K one, mapping only exceptions with
> +RX protection. It allows us to never trigger nomapping on nomapping refill
> +routine which would (obviously) not work... Once this is done, we can flush the
> +4 LTLB entries for kernel refill in order to be sure there is no stalled
> +entries and that new entries inserted in JTLB will apply.
> +
> +By default, the following policy is applied on vmlinux sections:
> +
> + - init_data: RW
> + - init_text: RX (or RWX if parameter rodata=off)
> + - text: RX (or RWX if parameter rodata=off)
> + - rodata: RW before init, RO after init
> + - sdata: RW
> +
> +Kernel RWX mode can then be switched on/off using /sys/kvx/kernel_rwx file.
> +
> +Privilege Level
> +---------------
> +
> +Since we are using privilege levels on kvx, we make use of the virtual
> +spaces to be in the same space as the user. The kernel will have the

I'm failing to parse this. What will be in the same space as user?

> +$ps.mmup set in kernel (PL1) and unset for user (PL2).
> +As said in kvx documentation, we have two cases when the kernel is
> +booted:
> +
> + - Either we have been booted by someone (bootloader, hypervisor, etc)
> + - Or we are alone (boot from flash)
> +
> +In both cases, we will use the virtual space 0. Indeed, if we are alone
> +on the core, then it means nobody is using the MMU and we can take the
> +first virtual space. If not alone, then when writing an entry to the tlb
> +using writetlb instruction, the hypervisor will catch it and change the
> +virtual space accordingly.
> +
> +Memblock
> +========
> +
> +When the kernel starts there is no memory allocator available. One of the first
> +step in the kernel is to detect the amount of DDR available by getting this
> +information in the device tree and initialize the low-level "memblock" allocator.
> +
> +We start by reserving memory for the whole kernel. For instance with a device
> +tree containing 512MB of DDR you could see the following boot messages::
> +
> +  setup_bootmem: Memory  : 0x100000000 - 0x120000000
> +  setup_bootmem: Reserved: 0x10001f000 - 0x1002d1bc0
> +
> +During the paging init we need to set:
> +
> + - min_low_pfn that is the lowest PFN available in the system
> + - max_low_pfn that indicates the end if NORMAL zone
> + - max_pfn that is the number of pages in the system
> +
> +This setting is used for dividing memory into pages and for configuring the
> +zone. See the memory map section for more information about ZONE.
> +
> +Zones are configured in free_area_init_core(). During start_kernel() other
> +allocations are done for command line, cpu areas, PID hash table, different
> +caches for VFS. This allocator is used until mem_init() is called.
> +
> +mem_init() is provided by the architecture. For MPPA we just call
> +free_all_bootmem() that will go through all pages that are not used by the
> +low level allocator and mark them as not used. So physical pages that are
> +reserved for the kernel are still used and remain in physical memory. All pages
> +released will now be used by the buddy allocator.

There is nothing kvx-specific in the memblock section. Please strive to
keep Documentation/kvx about architecture-specific details.

> +Peripherals
> +-----------
> +
> +Peripherals are mapped using standard ioremap infrastructure, therefore
> +mapped addresses are located in the vmalloc space.
> +
> +LTLB Usage
> +----------
> +
> +LTLB is used to add resident mapping which allows for faster MMU lookup.
> +Currently, the LTLB is used to map some mandatory kernel pages and to allow fast
> +accesses to l2 cache (mailbox and registers).
> +When CONFIG_STRICT_KERNEL_RWX is disabled, 4 entries are reserved for kernel

Please remove CONFIG_STRICT_KERNEL_RWX documentation for now. This should
be a part of a patchset that enables CONFIG_STRICT_KERNEL_RWX.

> +TLB refill using 512MB pages. When CONFIG_STRICT_KERNEL_RWX is enabled, these
> +entries are unused since kernel is paginated using the same mecanism than for
> +user (page walking and entries in JTLB)
> +
> +Page Table
> +==========
> +
> +We only support three levels for the page table and 4KB for page size.
> +
> +3 levels page table
> +-------------------
> +
> +::
> +
> +  ...-----+--------+--------+--------+--------+--------+
> +        40|39    32|31    24|23    16|15     8|7      0|
> +  ...-----++-------+--+-----+---+----+----+---+--------+
> +           |          |         |         |
> +           |          |         |         +--->  [11:0] Offset (12 bits)
> +           |          |         +------------->  [20:12] PTE offset (9 bits)
> +           |          +----------------------->  [29:21] PMD offset (9 bits)
> +           +---------------------------------->  [39:30] PGD offset (10 bits)
> +
> +Bits 40 to 64 are signed extended according to bit 39. If bit 39 is equal to 1
> +we are in kernel space.

Previously you've mentioned that bit 40 is used to differentiate kernel and
user space. Which one is correct?

> +
> +As 10 bits are used for PGD we need to allocate 2 pages.
> +
> +PTE format
> +==========

Shouldn't PTE format section be the same level as "3 levels page table"?

> +
> +About the format of the PTE entry, as we are not forced by hardware for choices,
> +we choose to follow the format described in the RiscV implementation as a
> +starting point::
> +
> +   +---------+--------+----+--------+---+---+---+---+---+---+------+---+---+
> +   | 63..23  | 22..13 | 12 | 11..10 | 9 | 8 | 7 | 6 | 5 | 4 | 3..2 | 1 | 0 |
> +   +---------+--------+----+--------+---+---+---+---+---+---+------+---+---+
> +       PFN     Unused   S    PageSZ   H   G   X   W   R   D    CP    A   P
> +         where:
> +          P: Present
> +          A: Accessed
> +          CP: Cache policy
> +          D: Dirty
> +          R: Read
> +          W: Write
> +          X: Executable
> +          G: Global
> +          H: Huge page
> +          PageSZ: Page size as set in TLB format (0:4KB, 1:64KB, 2:2MB, 3:512MB)
> +          S: Soft/Special
> +          PFN: Page frame number (depends on page size)
> +
> +Huge bit must be somewhere in the first 12 bits to be able to detect it
> +when reading the PMD entry.
> +
> +PageSZ must be on bit 10 and 11 because it matches the TEL.PS bits. And
> +by doing that it is easier in assembly to set the TEL.PS to PageSZ.
> +
> +Fast TLB refill
> +===============
> +
> +kvx core does not feature a hardware page walker. This work must be done
> +by the core in software. In order to optimize TLB refill, a special fast
> +path is taken when entering in kernel space.
> +In order to speed up the process, the following actions are taken:
> +
> + 1. Save some registers in a per process scratchpad
> + 2. If the trap is a nomapping then try the fastpath
> + 3. Save some more registers for this fastpath
> + 4. Check if faulting address is a memory direct mapping one.
> +
> +    * If entry is a direct mapping one and RWX is not enabled, add an entry into LTLB
> +    * If not, continue
> +
> + 5. Try to walk the page table
> +
> +    * If entry is not present, take the slowpath (do_page_fault)
> +
> + 6. Refill the tlb properly
> + 7. Exit by restoring only a few registers
> +
> +ASN Handling
> +============
> +
> +Disclaimer: Some part of this are taken from ARC architecture.
> +
> +kvx MMU provides 9-bit ASN (Address Space Number) in order to tag TLB entries.
> +It allows for multiple process with the same virtual space to cohabit without
> +the need to flush TLB everytime we context switch.
> +kvx implementation to use them is based on other architectures (such as arc
> +or xtensa) and uses a wrapping ASN counter containing both cycle/generation and
> +asn.
> +
> +::
> +
> +  +---------+--------+
> +  |63     10|9      0|
> +  +---------+--------+
> +    Cycle      ASN
> +
> +This ASN counter is incremented monotonously to allocate new ASNs. When the
> +counter reaches 511 (9 bit), TLB is completely flushed and a new cycle is
> +started. A new allocation cycle, post rollover, could potentially reassign an
> +ASN to a different task. Thus the rule is to reassign an ASN when the current
> +context cycles does not match the allocation cycle.
> +The 64 bit @cpu_asn_cache (and mm->asn) have 9 bits MMU ASN and rest 55 bits
> +serve as cycle/generation indicator and natural 64 bit unsigned math
> +automagically increments the generation when lower 9 bits rollover.
> +When the counter completely wraps, we reset the counter to first cycle value
> +(ie cycle = 1). This allows to distinguish context without any ASN and old cycle
> +generated value with the same operation (XOR on cycle).
> +
> +Huge page
> +=========
> +
> +Currently only 3 level page table has been implemented for 4KB base page size.
> +So the page shift is 12 bits, the pmd shift is 21 and the pgdir shift is 30 bits.
> +This choice implies that for 4KB base page size if we use a PMD as a huge
> +page the size will be 2MB and if we use a PUD as a huge page it will be 1GB.
> +
> +To support other huge page sizes (64KB and 512MB) we need to use several
> +contiguous entries in the page table. For huge page of 64KB we will need to
> +use 16 entries in the PTE and for a huge page of 512MB it means that 256
> +entries in PMD will be used.
> +
> +Debug
> +=====
> +
> +In order to debug the page table and tlb entries, gdb scripts contains commands
> +which allows to dump the page table:
> +
> +:``lx-kvx-page-table-walk``: Display the current process page table by default
> +:``lx-kvx-tlb-decode``: Display the content of $tel and $teh into something readable
> +
> +Other commands available in kvx-gdb are the following:
> +
> +:``mppa-dump-tlb``: Display the content of TLBs (JTLB and LTLB)
> +:``mppa-lookup-addr``: Find physical address matching a virtual one
Mike Rapoport Jan. 22, 2023, 4:09 p.m. UTC | #12
On Fri, Jan 20, 2023 at 03:09:51PM +0100, Yann Sionneau wrote:
> Add memory management support for kvx, including: cache and tlb
> management, page fault handling, ioremap/mmap and streaming dma support.
> 
> Co-developed-by: Clement Leger <clement@clement-leger.fr>
> Signed-off-by: Clement Leger <clement@clement-leger.fr>
> Co-developed-by: Guillaume Thouvenin <gthouvenin@kalray.eu>
> Signed-off-by: Guillaume Thouvenin <gthouvenin@kalray.eu>
> Co-developed-by: Jean-Christophe Pince <jcpince@gmail.com>
> Signed-off-by: Jean-Christophe Pince <jcpince@gmail.com>
> Co-developed-by: Jules Maselbas <jmaselbas@kalray.eu>
> Signed-off-by: Jules Maselbas <jmaselbas@kalray.eu>
> Co-developed-by: Julian Vetter <jvetter@kalray.eu>
> Signed-off-by: Julian Vetter <jvetter@kalray.eu>
> Co-developed-by: Julien Hascoet <jhascoet@kalray.eu>
> Signed-off-by: Julien Hascoet <jhascoet@kalray.eu>
> Co-developed-by: Louis Morhet <lmorhet@kalray.eu>
> Signed-off-by: Louis Morhet <lmorhet@kalray.eu>
> Co-developed-by: Marc Poulhiès <dkm@kataplop.net>
> Signed-off-by: Marc Poulhiès <dkm@kataplop.net>
> Co-developed-by: Marius Gligor <mgligor@kalray.eu>
> Signed-off-by: Marius Gligor <mgligor@kalray.eu>
> Co-developed-by: Vincent Chardon <vincent.chardon@elsys-design.com>
> Signed-off-by: Vincent Chardon <vincent.chardon@elsys-design.com>
> Co-developed-by: Yann Sionneau <ysionneau@kalray.eu>
> Signed-off-by: Yann Sionneau <ysionneau@kalray.eu>
> ---
> 
> Notes:
>     V1 -> V2: removed L2 cache management
> 
>  arch/kvx/include/asm/cache.h        |  43 +++
>  arch/kvx/include/asm/cacheflush.h   | 158 ++++++++++
>  arch/kvx/include/asm/fixmap.h       |  47 +++
>  arch/kvx/include/asm/hugetlb.h      |  36 +++
>  arch/kvx/include/asm/mem_map.h      |  44 +++
>  arch/kvx/include/asm/mmu.h          | 289 ++++++++++++++++++
>  arch/kvx/include/asm/mmu_context.h  | 156 ++++++++++
>  arch/kvx/include/asm/mmu_stats.h    |  38 +++
>  arch/kvx/include/asm/page.h         | 187 ++++++++++++
>  arch/kvx/include/asm/page_size.h    |  29 ++
>  arch/kvx/include/asm/pgalloc.h      | 101 +++++++
>  arch/kvx/include/asm/pgtable-bits.h | 102 +++++++
>  arch/kvx/include/asm/pgtable.h      | 451 ++++++++++++++++++++++++++++
>  arch/kvx/include/asm/rm_fw.h        |  16 +
>  arch/kvx/include/asm/sparsemem.h    |  15 +
>  arch/kvx/include/asm/symbols.h      |  16 +
>  arch/kvx/include/asm/tlb.h          |  24 ++
>  arch/kvx/include/asm/tlb_defs.h     | 131 ++++++++
>  arch/kvx/include/asm/tlbflush.h     |  58 ++++
>  arch/kvx/include/asm/vmalloc.h      |  10 +
>  arch/kvx/mm/cacheflush.c            | 154 ++++++++++
>  arch/kvx/mm/dma-mapping.c           |  85 ++++++
>  arch/kvx/mm/extable.c               |  24 ++
>  arch/kvx/mm/fault.c                 | 264 ++++++++++++++++
>  arch/kvx/mm/init.c                  | 277 +++++++++++++++++
>  arch/kvx/mm/mmap.c                  |  31 ++
>  arch/kvx/mm/mmu.c                   | 202 +++++++++++++
>  arch/kvx/mm/mmu_stats.c             |  94 ++++++
>  arch/kvx/mm/tlb.c                   | 433 ++++++++++++++++++++++++++
>  29 files changed, 3515 insertions(+)
>  create mode 100644 arch/kvx/include/asm/cache.h
>  create mode 100644 arch/kvx/include/asm/cacheflush.h
>  create mode 100644 arch/kvx/include/asm/fixmap.h
>  create mode 100644 arch/kvx/include/asm/hugetlb.h
>  create mode 100644 arch/kvx/include/asm/mem_map.h
>  create mode 100644 arch/kvx/include/asm/mmu.h
>  create mode 100644 arch/kvx/include/asm/mmu_context.h
>  create mode 100644 arch/kvx/include/asm/mmu_stats.h
>  create mode 100644 arch/kvx/include/asm/page.h
>  create mode 100644 arch/kvx/include/asm/page_size.h
>  create mode 100644 arch/kvx/include/asm/pgalloc.h
>  create mode 100644 arch/kvx/include/asm/pgtable-bits.h
>  create mode 100644 arch/kvx/include/asm/pgtable.h
>  create mode 100644 arch/kvx/include/asm/rm_fw.h
>  create mode 100644 arch/kvx/include/asm/sparsemem.h
>  create mode 100644 arch/kvx/include/asm/symbols.h
>  create mode 100644 arch/kvx/include/asm/tlb.h
>  create mode 100644 arch/kvx/include/asm/tlb_defs.h
>  create mode 100644 arch/kvx/include/asm/tlbflush.h
>  create mode 100644 arch/kvx/include/asm/vmalloc.h
>  create mode 100644 arch/kvx/mm/cacheflush.c
>  create mode 100644 arch/kvx/mm/dma-mapping.c
>  create mode 100644 arch/kvx/mm/extable.c
>  create mode 100644 arch/kvx/mm/fault.c
>  create mode 100644 arch/kvx/mm/init.c
>  create mode 100644 arch/kvx/mm/mmap.c
>  create mode 100644 arch/kvx/mm/mmu.c
>  create mode 100644 arch/kvx/mm/mmu_stats.c
>  create mode 100644 arch/kvx/mm/tlb.c

... 

> diff --git a/arch/kvx/include/asm/mmu.h b/arch/kvx/include/asm/mmu.h
> new file mode 100644
> index 000000000000..09f3fdd66a34
> --- /dev/null
> +++ b/arch/kvx/include/asm/mmu.h
> @@ -0,0 +1,289 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Guillaume Thouvenin
> + *            Clement Leger
> + *            Marc Poulhiès
> + */
> +
> +#ifndef _ASM_KVX_MMU_H
> +#define _ASM_KVX_MMU_H
> +
> +#include <linux/bug.h>
> +#include <linux/types.h>
> +#include <linux/threads.h>
> +
> +#include <asm/page.h>
> +#include <asm/sfr.h>
> +#include <asm/page.h>
> +#include <asm/pgtable-bits.h>
> +#include <asm/tlb_defs.h>
> +
> +/* Virtual addresses can use at most 41 bits */
> +#define MMU_VIRT_BITS		41
> +
> +/*
> + * See Documentation/kvx/kvx-mmu.txt for details about the division of the
> + * virtual memory space.
> + */
> +#if defined(CONFIG_KVX_4K_PAGES)
> +#define MMU_USR_ADDR_BITS	39
> +#else
> +#error "Only 4KB page size is supported at this time"
> +#endif
> +
> +typedef struct mm_context {
> +	unsigned long end_brk;
> +	unsigned long asn[NR_CPUS];
> +	unsigned long sigpage;
> +} mm_context_t;
> +
> +struct __packed tlb_entry_low {
> +	unsigned int es:2;       /* Entry Status */
> +	unsigned int cp:2;       /* Cache Policy */
> +	unsigned int pa:4;       /* Protection Attributes */
> +	unsigned int r:2;        /* Reserved */
> +	unsigned int ps:2;       /* Page Size */
> +	unsigned int fn:28;      /* Frame Number */
> +};
> +
> +struct __packed tlb_entry_high {
> +	unsigned int asn:9;  /* Address Space Number */
> +	unsigned int g:1;    /* Global Indicator */
> +	unsigned int vs:2;   /* Virtual Space */
> +	unsigned int pn:29;  /* Page Number */
> +};
> +
> +struct kvx_tlb_format {
> +	union {
> +		struct tlb_entry_low tel;
> +		uint64_t tel_val;
> +	};
> +	union {
> +		struct tlb_entry_high teh;
> +		uint64_t teh_val;
> +	};
> +};

I believe tlb_entry is a better name, and unless I've missed something this
struct is only used internally in arch/kvx/mm, so it'd be better to declare
it in a header under arch/kvx/mm.

Generally, it's better to move internal definitions out of include/asm and
have them near the .c files that use them. For instance, I randomly picked
several functions below, e.g. kvx_mmu_probetlb(), tlb_entry_overlaps(), and
it seems they are only used in arch/kvx/mm.
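
Something along these lines, as a sketch (the arch/kvx/mm/mmu.h location and
the tlb_entry name are only my suggestions, not existing files):

	/* arch/kvx/mm/mmu.h -- internal to the mm code */
	struct tlb_entry {
		union {
			struct tlb_entry_low tel;
			uint64_t tel_val;
		};
		union {
			struct tlb_entry_high teh;
			uint64_t teh_val;
		};
	};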

> +
> +#define KVX_EMPTY_TLB_ENTRY { .tel_val = 0x0, .teh_val = 0x0 }
> +
> +/* Bits [0:39] of the TLB format correspond to the TLB entry low (TEL) */
> +/* Bits [40:80] of the TLB format correspond to the TLB entry high (TEH) */
> +#define kvx_mmu_set_tlb_entry(tlbf) do { \
> +	kvx_sfr_set(TEL, (uint64_t) tlbf.tel_val); \
> +	kvx_sfr_set(TEH, (uint64_t) tlbf.teh_val); \
> +} while (0)
> +
> +#define kvx_mmu_get_tlb_entry(tlbf) do { \
> +	tlbf.tel_val = kvx_sfr_get(TEL); \
> +	tlbf.teh_val = kvx_sfr_get(TEH); \
> +} while (0)
> +
> +/* Use kvx_mmc_ to read a field from MMC value passed as parameter */
> +#define __kvx_mmc(mmc_reg, field) \
> +	kvx_sfr_field_val(mmc_reg, MMC, field)
> +

...

> diff --git a/arch/kvx/include/asm/mmu_context.h b/arch/kvx/include/asm/mmu_context.h
> new file mode 100644
> index 000000000000..39fa92f1506b
> --- /dev/null
> +++ b/arch/kvx/include/asm/mmu_context.h
> @@ -0,0 +1,156 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + *            Guillaume Thouvenin
> + */
> +
> +#ifndef __ASM_KVX_MMU_CONTEXT_H
> +#define __ASM_KVX_MMU_CONTEXT_H
> +
> +/*
> + * Management of the Address Space Number:
> + * The Coolidge architecture provides a 9-bit ASN to tag TLB entries. This
> + * allows several entries with the same virtual address (i.e. from different
> + * processes) to be in the TLB at the same time, which means the TLB does not
> + * necessarily need to be flushed on context switch, improving performance.
> + */
> +#include <linux/smp.h>
> +
> +#include <asm/mmu.h>
> +#include <asm/sfr_defs.h>
> +#include <asm/tlbflush.h>
> +
> +#include <asm-generic/mm_hooks.h>

...

> +/**
> + * Redefining the generic hooks that are:
> + *   - activate_mm
> + *   - deactivate_mm
> + *   - enter_lazy_tlb
> + *   - init_new_context
> + *   - destroy_context
> + *   - switch_mm
> + */

Please drop this comment; it does not add any information
> +
> +#define activate_mm(prev, next) switch_mm((prev), (next), NULL)
> +#define deactivate_mm(tsk, mm) do { } while (0)
> +#define enter_lazy_tlb(mm, tsk) do { } while (0)
> +

...

> +#endif /* __ASM_KVX_MMU_CONTEXT_H */
> diff --git a/arch/kvx/include/asm/mmu_stats.h b/arch/kvx/include/asm/mmu_stats.h
> new file mode 100644
> index 000000000000..999352dbc1ce

Looks like this entire header can be moved to arch/kvx/mm

> --- /dev/null
> +++ b/arch/kvx/include/asm/mmu_stats.h
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + */
> +
> +#ifndef _ASM_KVX_MMU_STATS_H
> +#define _ASM_KVX_MMU_STATS_H
> +
> +#ifdef CONFIG_KVX_MMU_STATS
> +#include <linux/percpu.h>
> +
> +struct mmu_refill_stats {
> +	unsigned long count;
> +	unsigned long total;
> +	unsigned long min;
> +	unsigned long max;
> +};
> +
> +enum mmu_refill_type {
> +	MMU_REFILL_TYPE_USER,
> +	MMU_REFILL_TYPE_KERNEL,
> +	MMU_REFILL_TYPE_KERNEL_DIRECT,
> +	MMU_REFILL_TYPE_COUNT,
> +};
> +
> +struct mmu_stats {
> +	struct mmu_refill_stats refill[MMU_REFILL_TYPE_COUNT];
> +	/* keep these fields ordered this way for assembly */
> +	unsigned long cycles_between_refill;
> +	unsigned long last_refill;
> +	unsigned long tlb_flush_all;
> +};
> +
> +DECLARE_PER_CPU(struct mmu_stats, mmu_stats);
> +#endif
> +
> +#endif /* _ASM_KVX_MMU_STATS_H */

...

> diff --git a/arch/kvx/include/asm/pgalloc.h b/arch/kvx/include/asm/pgalloc.h
> new file mode 100644
> index 000000000000..0e654dd1a072
> --- /dev/null
> +++ b/arch/kvx/include/asm/pgalloc.h
> @@ -0,0 +1,101 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Guillaume Thouvenin
> + *            Clement Leger
> + */
> +
> +#ifndef _ASM_KVX_PGALLOC_H
> +#define _ASM_KVX_PGALLOC_H
> +
> +#include <linux/mm.h>
> +#include <asm/tlb.h>
> +
> +#define __HAVE_ARCH_PGD_FREE
> +#include <asm-generic/pgalloc.h>	/* for pte_{alloc,free}_one */
> +
> +static inline void check_pgt_cache(void)
> +{
> +	/*
> +	 * check_pgt_cache() is called to check watermarks from counters that
> +	 * compute the number of pages allocated by the cached allocation
> +	 * functions pmd_alloc_one_fast() and pte_alloc_one_fast().
> +	 * Currently we just skip this test.
> +	 */

It seems this function is never called.

> +}
> +
> +/**
> + * PGD
> + */

Please drop these comments (here and for the lower levels as well); they
don't add information and only take space.

> +
> +static inline void
> +pgd_free(struct mm_struct *mm, pgd_t *pgd)
> +{
> +	free_pages((unsigned long) pgd, PAGES_PER_PGD);
> +}
> +
> +static inline
> +pgd_t *pgd_alloc(struct mm_struct *mm)
> +{
> +	pgd_t *pgd;
> +
> +	pgd = (pgd_t *) __get_free_pages(GFP_KERNEL, PAGES_PER_PGD);
> +	if (unlikely(pgd == NULL))
> +		return NULL;
> +
> +	memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
> +
> +	/* Copy kernel mappings */
> +	memcpy(pgd + USER_PTRS_PER_PGD,
> +	       init_mm.pgd + USER_PTRS_PER_PGD,
> +	       (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
> +
> +	return pgd;
> +}
> +
> +/**
> + * PUD
> + */
> +
> +static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
> +{
> +	unsigned long pfn = virt_to_pfn(pmd);
> +
> +	set_pud(pud, __pud((unsigned long)pfn << PAGE_SHIFT));
> +}
> +
> +/**
> + * PMD
> + */
> +
> +static inline void pmd_populate_kernel(struct mm_struct *mm,
> +	pmd_t *pmd, pte_t *pte)
> +{
> +	unsigned long pfn = virt_to_pfn(pte);
> +
> +	set_pmd(pmd, __pmd((unsigned long)pfn << PAGE_SHIFT));
> +}
> +
> +static inline void pmd_populate(struct mm_struct *mm,
> +	pmd_t *pmd, pgtable_t pte)
> +{
> +	unsigned long pfn = virt_to_pfn(page_address(pte));
> +
> +	set_pmd(pmd, __pmd((unsigned long)pfn << PAGE_SHIFT));
> +}
> +
> +#if CONFIG_PGTABLE_LEVELS > 2
> +#define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd)
> +#endif /* CONFIG_PGTABLE_LEVELS > 2 */
> +
> +/**
> + * PTE
> + */
> +
> +#define __pte_free_tlb(tlb, pte, buf)   \
> +do {                                    \
> +	pgtable_pte_page_dtor(pte);         \
> +	tlb_remove_page((tlb), pte);    \
> +} while (0)
> +
> +#endif /* _ASM_KVX_PGALLOC_H */

...

> diff --git a/arch/kvx/include/asm/pgtable.h b/arch/kvx/include/asm/pgtable.h
> new file mode 100644
> index 000000000000..9e36db4d98a7
> --- /dev/null
> +++ b/arch/kvx/include/asm/pgtable.h
> @@ -0,0 +1,451 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Guillaume Thouvenin
> + *            Clement Leger
> + *            Marius Gligor
> + *            Yann Sionneau
> + */
> +
> +#ifndef _ASM_KVX_PGTABLE_H
> +#define _ASM_KVX_PGTABLE_H
> +

...

> +
> +/*
> + * PGD definitions:
> + *   - pgd_ERROR
> + */

The macro name pgd_ERROR already makes this self-explanatory. Please drop
the comment.

> +#define pgd_ERROR(e) \
> +	pr_err("%s:%d: bad pgd %016lx.\n", __FILE__, __LINE__, pgd_val(e))
> +
> +/*
> + * PUD
> + *
> + * As we manage a three-level page table, set_pud() is used to fill the
> + * PGD.
> + */
> +static inline void set_pud(pud_t *pudp, pud_t pud)
> +{
> +	*pudp = pud;
> +}
> +
> +static inline int pud_none(pud_t pud)
> +{
> +	return pud_val(pud) == 0;
> +}
> +
> +static inline int pud_bad(pud_t pud)
> +{
> +	return pud_none(pud);
> +}
> +
> +static inline int pud_present(pud_t pud)
> +{
> +	return pud_val(pud) != 0;
> +}
> +
> +static inline void pud_clear(pud_t *pud)
> +{
> +	set_pud(pud, __pud(0));
> +}
> +
> +/*
> + * PMD definitions:
> + *   - set_pmd
> + *   - pmd_present
> + *   - pmd_none
> + *   - pmd_bad
> + *   - pmd_clear
> + *   - pmd_page
> + */

Ditto

> +
> +static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
> +{
> +	*pmdp = pmd;
> +}
> +
> +/* Returns 1 if entry is present */
> +static inline int pmd_present(pmd_t pmd)
> +{
> +	return pmd_val(pmd) != 0;
> +}
> +
> +/* Returns 1 if the corresponding entry has the value 0 */
> +static inline int pmd_none(pmd_t pmd)
> +{
> +	return pmd_val(pmd) == 0;
> +}
> +
> +/* Used to check that a page middle directory entry is valid */
> +static inline int pmd_bad(pmd_t pmd)
> +{
> +	return pmd_none(pmd);
> +}
> +
> +/* Clears the entry to prevent the process from using the linear address
> + * that mapped it.
> + */
> +static inline void pmd_clear(pmd_t *pmdp)
> +{
> +	set_pmd(pmdp, __pmd(0));
> +}
> +
> +/*
> + * Returns the address of the descriptor of the page table referred by the
> + * PMD entry.
> + */
> +static inline struct page *pmd_page(pmd_t pmd)
> +{
> +	if (pmd_val(pmd) & _PAGE_HUGE)
> +		return pfn_to_page(
> +				(pmd_val(pmd) & KVX_PFN_MASK) >> KVX_PFN_SHIFT);
> +
> +	return pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT);
> +}
> +
> +#define pmd_ERROR(e) \
> +	pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))
> +
> +static inline pmd_t *pud_pgtable(pud_t pud)
> +{
> +	return (pmd_t *)pfn_to_virt(pud_val(pud) >> PAGE_SHIFT);
> +}
> +
> +static inline struct page *pud_page(pud_t pud)
> +{
> +	return pfn_to_page(pud_val(pud) >> PAGE_SHIFT);
> +}
> +
> +/*
> + * PTE definitions:
> + *   - set_pte
> + *   - set_pte_at
> + *   - pte_clear
> + *   - pte_page
> + *   - pte_pfn
> + *   - pte_present
> + *   - pte_none
> + *   - pte_write
> + *   - pte_dirty
> + *   - pte_young
> + *   - pte_special
> + *   - pte_mkdirty
> + *   - pte_mkwrite
> + *   - pte_mkclean
> + *   - pte_mkyoung
> + *   - pte_mkold
> + *   - pte_mkspecial
> + *   - pte_wrprotect
> + */

Ditto

> +
> +static inline void set_pte(pte_t *ptep, pte_t pteval)
> +{
> +	*ptep = pteval;
> +}
> +
> +static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> +			      pte_t *ptep, pte_t pteval)
> +{
> +	set_pte(ptep, pteval);
> +}
> +
> +#define pte_clear(mm, addr, ptep) set_pte(ptep, __pte(0))
> +
> +/* Constructs a page table entry */
> +static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
> +{
> +	return __pte(((pfn << KVX_PFN_SHIFT) & KVX_PFN_MASK) |
> +		     pgprot_val(prot));
> +}
> +
> +/* Builds a page table entry by combining a page descriptor and a group of
> + * access rights.
> + */
> +#define mk_pte(page, prot)	(pfn_pte(page_to_pfn(page), prot))
> +
> +/* Modifies page access rights */
> +static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
> +{
> +	return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
> +}
> +
> +#define pte_page(x)     pfn_to_page(pte_pfn(x))
> +
> +static inline unsigned long pmd_page_vaddr(pmd_t pmd)
> +{
> +	return (unsigned long)pfn_to_virt(pmd_val(pmd) >> PAGE_SHIFT);
> +}
> +
> +/* Yields the page frame number (PFN) of a page table entry */
> +static inline unsigned long pte_pfn(pte_t pte)
> +{
> +	return ((pte_val(pte) & KVX_PFN_MASK) >> KVX_PFN_SHIFT);
> +}
> +
> +static inline int pte_present(pte_t pte)
> +{
> +	return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_NONE));
> +}
> +
> +static inline int pte_none(pte_t pte)
> +{
> +	return (pte_val(pte) == 0);
> +}
> +
> +static inline int pte_write(pte_t pte)
> +{
> +	return pte_val(pte) & _PAGE_WRITE;
> +}
> +
> +static inline int pte_dirty(pte_t pte)
> +{
> +	return pte_val(pte) & _PAGE_DIRTY;
> +}
> +
> +static inline int pte_young(pte_t pte)
> +{
> +	return pte_val(pte) & _PAGE_ACCESSED;
> +}
> +
> +static inline int pte_special(pte_t pte)
> +{
> +	return pte_val(pte) & _PAGE_SPECIAL;
> +}
> +
> +static inline int pte_huge(pte_t pte)
> +{
> +	return pte_val(pte) & _PAGE_HUGE;
> +}
> +
> +static inline pte_t pte_mkdirty(pte_t pte)
> +{
> +	return __pte(pte_val(pte) | _PAGE_DIRTY);
> +}
> +
> +static inline pte_t pte_mkwrite(pte_t pte)
> +{
> +	return __pte(pte_val(pte) | _PAGE_WRITE);
> +}
> +
> +static inline pte_t pte_mkclean(pte_t pte)
> +{
> +	return __pte(pte_val(pte) & ~(_PAGE_DIRTY));
> +}
> +
> +static inline pte_t pte_mkyoung(pte_t pte)
> +{
> +	return __pte(pte_val(pte) | _PAGE_ACCESSED);
> +}
> +
> +static inline pte_t pte_mkold(pte_t pte)
> +{
> +	return __pte(pte_val(pte) & ~(_PAGE_ACCESSED));
> +}
> +
> +static inline pte_t pte_mkspecial(pte_t pte)
> +{
> +	return __pte(pte_val(pte) | _PAGE_SPECIAL);
> +}
> +
> +static inline pte_t pte_wrprotect(pte_t pte)
> +{
> +	return __pte(pte_val(pte) & ~(_PAGE_WRITE));
> +}
> +
> +static inline pte_t pte_mkhuge(pte_t pte)
> +{
> +	return __pte(pte_val(pte) | _PAGE_HUGE);
> +}
> +
> +static inline pte_t pte_of_pmd(pmd_t pmd)
> +{
> +	return __pte(pmd_val(pmd));
> +}
> +
> +#define pmd_pfn(pmd)       pte_pfn(pte_of_pmd(pmd))
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE

I don't see HAVE_ARCH_TRANSPARENT_HUGEPAGE in arch/kvx/Kconfig. Please
remove this part for now.

> +
> +#define pmdp_establish pmdp_establish
> +static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
> +		unsigned long address, pmd_t *pmdp, pmd_t pmd)
> +{
> +	return __pmd(xchg(&pmd_val(*pmdp), pmd_val(pmd)));
> +}
> +
> +static inline int pmd_trans_huge(pmd_t pmd)
> +{
> +	return !!(pmd_val(pmd) & _PAGE_HUGE);
> +}
> +
> +static inline pmd_t pmd_of_pte(pte_t pte)
> +{
> +	return __pmd(pte_val(pte));
> +}
> +
> +#define pmd_mkclean(pmd)      pmd_of_pte(pte_mkclean(pte_of_pmd(pmd)))
> +#define pmd_mkdirty(pmd)      pmd_of_pte(pte_mkdirty(pte_of_pmd(pmd)))
> +#define pmd_mkold(pmd)	      pmd_of_pte(pte_mkold(pte_of_pmd(pmd)))
> +#define pmd_mkwrite(pmd)      pmd_of_pte(pte_mkwrite(pte_of_pmd(pmd)))
> +#define pmd_mkyoung(pmd)      pmd_of_pte(pte_mkyoung(pte_of_pmd(pmd)))
> +#define pmd_modify(pmd, prot) pmd_of_pte(pte_modify(pte_of_pmd(pmd), prot))
> +#define pmd_wrprotect(pmd)    pmd_of_pte(pte_wrprotect(pte_of_pmd(pmd)))
> +
> +static inline pmd_t pmd_mkhuge(pmd_t pmd)
> +{
> +	/* Creating a huge page at PMD level implies a size of 2 MB */
> +	return __pmd(pmd_val(pmd) |
> +			_PAGE_HUGE | (TLB_PS_2M << KVX_PAGE_SZ_SHIFT));
> +}
> +
> +static inline pmd_t pmd_mkinvalid(pmd_t pmd)
> +{
> +	pmd_val(pmd) &= ~(_PAGE_PRESENT);
> +
> +	return pmd;
> +}
> +
> +#define pmd_dirty(pmd)     pte_dirty(pte_of_pmd(pmd))
> +#define pmd_write(pmd)     pte_write(pte_of_pmd(pmd))
> +#define pmd_young(pmd)     pte_young(pte_of_pmd(pmd))
> +
> +#define mk_pmd(page, prot)  pmd_of_pte(mk_pte(page, prot))
> +
> +static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t prot)
> +{
> +	return __pmd(((pfn << KVX_PFN_SHIFT) & KVX_PFN_MASK) |
> +			pgprot_val(prot));
> +}
> +
> +static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> +			      pmd_t *pmdp, pmd_t pmd)
> +{
> +	*pmdp = pmd;
> +}
> +
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> +
> +#endif	/* _ASM_KVX_PGTABLE_H */
> diff --git a/arch/kvx/include/asm/rm_fw.h b/arch/kvx/include/asm/rm_fw.h
> new file mode 100644
> index 000000000000..f89bdd5915ed
> --- /dev/null
> +++ b/arch/kvx/include/asm/rm_fw.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + */
> +
> +#ifndef _ASM_KVX_RM_FW_H
> +#define _ASM_KVX_RM_FW_H
> +
> +#include <linux/sizes.h>
> +
> +#define KVX_RM_ID	16
> +
> +#define RM_FIRMWARE_REGS_SIZE	(SZ_4K)
> +
> +#endif /* _ASM_KVX_RM_FW_H */
> diff --git a/arch/kvx/include/asm/sparsemem.h b/arch/kvx/include/asm/sparsemem.h
> new file mode 100644
> index 000000000000..2f35743f20fb
> --- /dev/null
> +++ b/arch/kvx/include/asm/sparsemem.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + */
> +
> +#ifndef _ASM_KVX_SPARSEMEM_H
> +#define _ASM_KVX_SPARSEMEM_H

Does kvx support holes between memory banks? If not, FLATMEM should do.

> +
> +#ifdef CONFIG_SPARSEMEM
> +#define MAX_PHYSMEM_BITS	40
> +#define SECTION_SIZE_BITS	30
> +#endif /* CONFIG_SPARSEMEM */
> +
> +#endif /* _ASM_KVX_SPARSEMEM_H */
> diff --git a/arch/kvx/include/asm/symbols.h b/arch/kvx/include/asm/symbols.h
> new file mode 100644
> index 000000000000..a53c1607979f
> --- /dev/null
> +++ b/arch/kvx/include/asm/symbols.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + */
> +
> +#ifndef _ASM_KVX_SYMBOLS_H
> +#define _ASM_KVX_SYMBOLS_H
> +
> +/* Symbols to patch TLB refill handler */
> +extern char kvx_perf_tlb_refill[], kvx_std_tlb_refill[];
> +
> +/* Entry point of the ELF, used to start other PEs in SMP */
> +extern int kvx_start[];
> +
> +#endif
> diff --git a/arch/kvx/include/asm/tlb.h b/arch/kvx/include/asm/tlb.h
> new file mode 100644
> index 000000000000..190b682e1819
> --- /dev/null
> +++ b/arch/kvx/include/asm/tlb.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Guillaume Thouvenin
> + *            Clement Leger
> + */
> +
> +#ifndef _ASM_KVX_TLB_H
> +#define _ASM_KVX_TLB_H
> +
> +struct mmu_gather;
> +
> +static void tlb_flush(struct mmu_gather *tlb);
> +
> +int clear_ltlb_entry(unsigned long vaddr);
> +
> +#include <asm-generic/tlb.h>
> +
> +static inline unsigned int pgprot_cache_policy(unsigned long flags)
> +{
> +	return (flags & KVX_PAGE_CP_MASK) >> KVX_PAGE_CP_SHIFT;
> +}
> +
> +#endif /* _ASM_KVX_TLB_H */
> diff --git a/arch/kvx/include/asm/tlb_defs.h b/arch/kvx/include/asm/tlb_defs.h
> new file mode 100644
> index 000000000000..3f5b29cd529e
> --- /dev/null
> +++ b/arch/kvx/include/asm/tlb_defs.h

It looks like this header can be moved to arch/kvx/mm

> @@ -0,0 +1,131 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + *            Julian Vetter
> + *            Guillaume Thouvenin
> + *            Marius Gligor
> + */
> +
> +#ifndef _ASM_KVX_TLB_DEFS_H
> +#define _ASM_KVX_TLB_DEFS_H
> +

...

> diff --git a/arch/kvx/mm/init.c b/arch/kvx/mm/init.c
> new file mode 100644
> index 000000000000..bac34bc09eb5
> --- /dev/null
> +++ b/arch/kvx/mm/init.c
> @@ -0,0 +1,277 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + *            Guillaume Thouvenin
> + */
> +
> +/* Memblock header depends on types.h but does not include it! */
> +#include <linux/types.h>
> +#include <linux/memblock.h>
> +#include <linux/mmzone.h>
> +#include <linux/of_fdt.h>
> +#include <linux/sched.h>
> +#include <linux/sizes.h>
> +#include <linux/init.h>
> +#include <linux/initrd.h>
> +#include <linux/pfn.h>
> +#include <linux/mm.h>
> +
> +#include <asm/sections.h>
> +#include <asm/tlb_defs.h>
> +#include <asm/tlbflush.h>
> +#include <asm/fixmap.h>
> +#include <asm/page.h>
> +
> +/*
> + * On kvx, the memory map aliases the first 2G of DDR.
> + * The full contiguous DDR is located at [4G - 68G].
> + * However, to make this DDR accessible in 32-bit mode, its first 2G is
> + * mirrored from 4G down to 2G.
> + * These first 2G are accessible from all DMAs (including 32-bit ones).
> + *
> + * Hence, the memory map is the following:
> + *
> + * (68G) 0x1100000000-> +-------------+
> + *                      |             |
> + *              66G     |(ZONE_NORMAL)|
> + *                      |             |
> + *   (6G) 0x180000000-> +-------------+
> + *                      |             |
> + *              2G      |(ZONE_DMA32) |
> + *                      |             |
> + *   (4G) 0x100000000-> +-------------+ +--+
> + *                      |             |    |
> + *              2G      |   (Alias)   |    | 2G Alias
> + *                      |             |    |
> + *    (2G) 0x80000000-> +-------------+ <--+
> + *
> + * The 64-bit -> 32-bit translation can then be done using the dma-ranges
> + * property in device trees.
> + */
> +
> +#define DDR_64BIT_START		(4ULL * SZ_1G)
> +#define DDR_32BIT_ALIAS_SIZE	(2ULL * SZ_1G)
> +
> +#define MAX_DMA32_PFN	PHYS_PFN(DDR_64BIT_START + DDR_32BIT_ALIAS_SIZE)
> +
> +pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
> +
> +/*
> + * empty_zero_page is a special page that is used for zero-initialized data and
> + * COW.
> + */
> +struct page *empty_zero_page;
> +EXPORT_SYMBOL(empty_zero_page);
> +
> +extern char _start[];
> +extern char __kernel_smem_code_start[];
> +extern char __kernel_smem_code_end[];
> +
> +struct kernel_section {
> +	phys_addr_t start;
> +	phys_addr_t end;
> +};
> +
> +struct kernel_section kernel_sections[] = {
> +	{
> +		.start = (phys_addr_t)__kernel_smem_code_start,
> +		.end = (phys_addr_t)__kernel_smem_code_end
> +	},
> +	{
> +		.start = __pa(_start),
> +		.end = __pa(_end)
> +	}
> +};
> +
> +static void __init zone_sizes_init(void)
> +{
> +	unsigned long zones_size[MAX_NR_ZONES];
> +
> +	memset(zones_size, 0, sizeof(zones_size));
> +
> +	zones_size[ZONE_DMA32] = min(MAX_DMA32_PFN, max_low_pfn);
> +	zones_size[ZONE_NORMAL] = max_low_pfn;
> +
> +	free_area_init(zones_size);
> +}
> +
> +#ifdef CONFIG_BLK_DEV_INITRD
> +static void __init setup_initrd(void)
> +{
> +	u64 base = phys_initrd_start;
> +	u64 end = phys_initrd_start + phys_initrd_size;
> +
> +	if (phys_initrd_size == 0) {
> +		pr_info("initrd not found or empty\n");
> +		return;
> +	}
> +
> +	if (base < memblock_start_of_DRAM() || end > memblock_end_of_DRAM()) {
> +		pr_err("initrd not in accessible memory, disabling it\n");
> +		phys_initrd_size = 0;
> +		return;
> +	}
> +
> +	pr_info("initrd: 0x%llx - 0x%llx\n", base, end);
> +
> +	memblock_reserve(phys_initrd_start, phys_initrd_size);
> +
> +	/* the generic initrd code expects virtual addresses */
> +	initrd_start = (unsigned long) __va(base);
> +	initrd_end = initrd_start + phys_initrd_size;
> +}
> +#endif
> +
> +static phys_addr_t memory_limit = PHYS_ADDR_MAX;
> +
> +static int __init early_mem(char *p)
> +{
> +	if (!p)
> +		return 1;
> +
> +	memory_limit = memparse(p, &p) & PAGE_MASK;
> +	pr_notice("Memory limited to %lldMB\n", memory_limit >> 20);
> +
> +	return 0;
> +}
> +early_param("mem", early_mem);
> +
> +static void __init setup_bootmem(void)
> +{
> +	phys_addr_t kernel_start, kernel_end;
> +	phys_addr_t start, end = 0;
> +	u64 i;
> +
> +	init_mm.start_code = (unsigned long)_stext;
> +	init_mm.end_code = (unsigned long)_etext;
> +	init_mm.end_data = (unsigned long)_edata;
> +	init_mm.brk = (unsigned long)_end;
> +
> +	for (i = 0; i < ARRAY_SIZE(kernel_sections); i++) {
> +		kernel_start = kernel_sections[i].start;
> +		kernel_end = kernel_sections[i].end;
> +
> +		memblock_reserve(kernel_start, kernel_end - kernel_start);
> +	}
> +
> +	for_each_mem_range(i, &start, &end) {
> +		pr_info("%15s: memory  : 0x%lx - 0x%lx\n", __func__,
> +			(unsigned long)start,
> +			(unsigned long)end);
> +	}
> +
> +	/* min_low_pfn is the lowest PFN available in the system */
> +	min_low_pfn = PFN_UP(memblock_start_of_DRAM());
> +
> +	/* max_low_pfn indicates the end of the NORMAL zone */
> +	max_low_pfn = PFN_DOWN(memblock_end_of_DRAM());

There is also max_pfn that's used by various pfn walkers. In your case it
should be the same as max_low_pfn.
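
Something like the following in setup_bootmem() should do (untested sketch):

	max_pfn = max_low_pfn = PFN_DOWN(memblock_end_of_DRAM());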

> +
> +	/* Set the maximum number of pages in the system */
> +	set_max_mapnr(max_low_pfn - min_low_pfn);
> +
> +#ifdef CONFIG_BLK_DEV_INITRD
> +	setup_initrd();
> +#endif
> +
> +	if (memory_limit != PHYS_ADDR_MAX)
> +		memblock_mem_limit_remove_map(memory_limit);

This may cut off the initrd memory. It'd be better to cap the memory before
setting up the initrd.
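
E.g. (a sketch that just reorders the existing calls):

	if (memory_limit != PHYS_ADDR_MAX)
		memblock_mem_limit_remove_map(memory_limit);

#ifdef CONFIG_BLK_DEV_INITRD
	setup_initrd();
#endif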

> +
> +	/* Don't reserve the device tree if it's built-in */
> +	if (!is_kernel_rodata((unsigned long) initial_boot_params))
> +		early_init_fdt_reserve_self();
> +	early_init_fdt_scan_reserved_mem();
> +
> +	memblock_allow_resize();
> +	memblock_dump_all();
> +}
> +
> +static pmd_t fixmap_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
> +static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss __maybe_unused;
> +
> +void __init early_fixmap_init(void)
> +{
> +	unsigned long vaddr;
> +	pgd_t *pgd;
> +	p4d_t *p4d;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +
> +	/*
> +	 * Fixed mappings:
> +	 */
> +	vaddr = __fix_to_virt(__end_of_fixed_addresses - 1);
> +	pgd = pgd_offset_pgd(swapper_pg_dir, vaddr);
> +	set_pgd(pgd, __pgd(__pa_symbol(fixmap_pmd)));
> +
> +	p4d = p4d_offset(pgd, vaddr);
> +	pud = pud_offset(p4d, vaddr);
> +	pmd = pmd_offset(pud, vaddr);
> +	set_pmd(pmd, __pmd(__pa_symbol(fixmap_pte)));
> +}
> +
> +void __init setup_arch_memory(void)
> +{
> +	setup_bootmem();
> +	sparse_init();
> +	zone_sizes_init();
> +}
> +
> +void __init mem_init(void)
> +{
> +	memblock_free_all();
> +
> +	/* allocate the zero page */
> +	empty_zero_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!empty_zero_page)
> +		panic("Failed to allocate the empty_zero_page");
> +}
> +
> +void free_initmem(void)
> +{
> +#ifdef CONFIG_POISON_INITMEM
> +	free_initmem_default(0x0);
> +#else
> +	free_initmem_default(-1);
> +#endif

Any reason not to use default implementation in init/main.c?
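
For reference, the generic weak implementation in init/main.c is, if I'm not
mistaken:

	void __weak free_initmem(void)
	{
		free_initmem_default(POISON_FREE_INITMEM);
	}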

> +}
> +
> +void __set_fixmap(enum fixed_addresses idx,
> +				phys_addr_t phys, pgprot_t flags)
> +{
> +	unsigned long addr = __fix_to_virt(idx);
> +	pte_t *pte;
> +
> +	BUG_ON(idx >= __end_of_fixed_addresses);
> +
> +	pte = &fixmap_pte[pte_index(addr)];
> +
> +	if (pgprot_val(flags)) {
> +		set_pte(pte, pfn_pte(phys_to_pfn(phys), flags));
> +	} else {
> +		/* Remove the fixmap */
> +		pte_clear(&init_mm, addr, pte);
> +	}
> +	local_flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +}
> +
> +static const pgprot_t protection_map[16] = {
> +	[VM_NONE]					= PAGE_NONE,
> +	[VM_READ]					= PAGE_READ,
> +	[VM_WRITE]					= PAGE_READ,
> +	[VM_WRITE | VM_READ]				= PAGE_READ,
> +	[VM_EXEC]					= PAGE_READ_EXEC,
> +	[VM_EXEC | VM_READ]				= PAGE_READ_EXEC,
> +	[VM_EXEC | VM_WRITE]				= PAGE_READ_EXEC,
> +	[VM_EXEC | VM_WRITE | VM_READ]			= PAGE_READ_EXEC,
> +	[VM_SHARED]					= PAGE_NONE,
> +	[VM_SHARED | VM_READ]				= PAGE_READ,
> +	[VM_SHARED | VM_WRITE]				= PAGE_READ_WRITE,
> +	[VM_SHARED | VM_WRITE | VM_READ]		= PAGE_READ_WRITE,
> +	[VM_SHARED | VM_EXEC]				= PAGE_READ_EXEC,
> +	[VM_SHARED | VM_EXEC | VM_READ]			= PAGE_READ_EXEC,
> +	[VM_SHARED | VM_EXEC | VM_WRITE]		= PAGE_READ_WRITE_EXEC,
> +	[VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]	= PAGE_READ_WRITE_EXEC
> +};
> +DECLARE_VM_GET_PAGE_PROT
> diff --git a/arch/kvx/mm/mmap.c b/arch/kvx/mm/mmap.c
> new file mode 100644
> index 000000000000..a2225db64438
> --- /dev/null
> +++ b/arch/kvx/mm/mmap.c
> @@ -0,0 +1,31 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * derived from arch/arm64/mm/mmap.c
> + *
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + */
> +
> +#ifdef CONFIG_STRICT_DEVMEM
> +
> +#include <linux/mm.h>
> +#include <linux/ioport.h>
> +
> +#include <asm/page.h>
> +
> +/*
> + * devmem_is_allowed() checks to see if /dev/mem access to a certain address
> + * is valid. The argument is a physical page number.  We mimic x86 here by
> + * disallowing access to system RAM as well as device-exclusive MMIO regions.
> + * This effectively disables read()/write() on /dev/mem.
> + */
> +int devmem_is_allowed(unsigned long pfn)
> +{
> +	if (iomem_is_exclusive(pfn << PAGE_SHIFT))
> +		return 0;
> +	if (!page_is_ram(pfn))

This won't work because it relies on "System RAM" in the resource tree and
you don't set this up.
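
A minimal sketch of what's missing, assuming a helper called from your arch
setup code (the function name is only illustrative):

	static void __init request_ram_resources(void)
	{
		struct resource *res;
		phys_addr_t start, end;
		u64 i;

		for_each_mem_range(i, &start, &end) {
			res = memblock_alloc(sizeof(*res), SMP_CACHE_BYTES);
			if (!res)
				panic("%s: out of memory\n", __func__);
			res->name = "System RAM";
			res->start = start;
			res->end = end - 1;
			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
			request_resource(&iomem_resource, res);
		}
	}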

> +		return 1;
> +	return 0;
> +}
> +
> +#endif
> diff --git a/arch/kvx/mm/mmu.c b/arch/kvx/mm/mmu.c
> new file mode 100644
> index 000000000000..9cb11bd2dfdf
> --- /dev/null
> +++ b/arch/kvx/mm/mmu.c
> @@ -0,0 +1,202 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + *            Guillaume Thouvenin
> + *            Vincent Chardon
> + *            Jules Maselbas
> + */
> +
> +#include <linux/cache.h>
> +#include <linux/types.h>
> +#include <linux/irqflags.h>
> +#include <linux/printk.h>
> +#include <linux/percpu.h>
> +#include <linux/kernel.h>
> +#include <linux/spinlock.h>
> +#include <linux/spinlock_types.h>
> +
> +#include <asm/mmu.h>
> +#include <asm/tlb.h>
> +#include <asm/page_size.h>
> +#include <asm/mmu_context.h>
> +
> +#define DUMP_LTLB 0
> +#define DUMP_JTLB 1
> +
> +DEFINE_PER_CPU_ALIGNED(uint8_t[MMU_JTLB_SETS], jtlb_current_set_way);
> +static struct kvx_tlb_format ltlb_entries[MMU_LTLB_WAYS];
> +static unsigned long ltlb_entries_bmp;
> +
> +static int kvx_mmu_ltlb_overlaps(struct kvx_tlb_format tlbe)
> +{
> +	int bit = LTLB_ENTRY_FIXED_COUNT;
> +
> +	for_each_set_bit_from(bit, &ltlb_entries_bmp, MMU_LTLB_WAYS) {
> +		if (tlb_entry_overlaps(tlbe, ltlb_entries[bit]))
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * kvx_mmu_ltlb_add_entry - Add a kernel entry in the LTLB
> + *
> + * In order to lock some entries in the LTLB so that they are always mapped,
> + * this function can be called with a physical address, a virtual address and
> + * protection attributes to add an entry into the LTLB. This is mainly for
> + * performance, since there won't be any NOMAPPING traps for these pages.
> + *
> + * @vaddr: Virtual address for the entry (must be aligned according to tlb_ps)
> + * @paddr: Physical address for the entry (must be aligned according to tlb_ps)
> + * @flags: Protection attributes
> + * @tlb_ps: Page size attribute for TLB (TLB_PS_*)
> + */
> +void kvx_mmu_ltlb_add_entry(unsigned long vaddr, phys_addr_t paddr,
> +			    pgprot_t flags, unsigned long tlb_ps)
> +{
> +	unsigned int cp;
> +	unsigned int idx;
> +	unsigned long irqflags;
> +	struct kvx_tlb_format tlbe;
> +	u64 page_size = BIT(get_page_size_shift(tlb_ps));
> +
> +	BUG_ON(!IS_ALIGNED(vaddr, page_size) || !IS_ALIGNED(paddr, page_size));
> +
> +	cp = pgprot_cache_policy(pgprot_val(flags));
> +
> +	tlbe = tlb_mk_entry(
> +		(void *) paddr,
> +		(void *) vaddr,

All occurrences of tlb_mk_entry() cast the paddr and vaddr parameters, and
tlb_mk_entry() essentially treats them as unsigned longs. Isn't it
better to declare tlb_mk_entry as

	tlb_mk_entry(phys_addr_t paddr, unsigned long vaddr, ... 
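
The call sites then lose the casts, e.g. (sketch based on this call site):

	tlbe = tlb_mk_entry(paddr, vaddr, tlb_ps, TLB_G_GLOBAL,
			    TLB_PA_NA_RW, cp, 0, TLB_ES_A_MODIFIED);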

> +		tlb_ps,
> +		TLB_G_GLOBAL,
> +		TLB_PA_NA_RW,
> +		cp,
> +		0,
> +		TLB_ES_A_MODIFIED);

Please avoid having one parameter per line.

> +
> +	local_irq_save(irqflags);
> +
> +	if (IS_ENABLED(CONFIG_KVX_DEBUG_TLB_WRITE) &&
> +	    kvx_mmu_ltlb_overlaps(tlbe))
> +		panic("VA %lx overlaps with an existing LTLB mapping", vaddr);
> +
> +	idx = find_next_zero_bit(&ltlb_entries_bmp, MMU_LTLB_WAYS,
> +				 LTLB_ENTRY_FIXED_COUNT);
> +	/* This should never happen */
> +	BUG_ON(idx >= MMU_LTLB_WAYS);
> +	__set_bit(idx, &ltlb_entries_bmp);
> +	ltlb_entries[idx] = tlbe;
> +	kvx_mmu_add_entry(MMC_SB_LTLB, idx, tlbe);
> +
> +	if (kvx_mmc_error(kvx_sfr_get(MMC)))
> +		panic("Failed to write entry to the LTLB");
> +
> +	local_irq_restore(irqflags);
> +}
> +
> +void kvx_mmu_ltlb_remove_entry(unsigned long vaddr)
> +{
> +	int ret, bit = LTLB_ENTRY_FIXED_COUNT;
> +	struct kvx_tlb_format tlbe;
> +
> +	for_each_set_bit_from(bit, &ltlb_entries_bmp, MMU_LTLB_WAYS) {
> +		tlbe = ltlb_entries[bit];
> +		if (tlb_entry_match_addr(tlbe, vaddr)) {
> +			__clear_bit(bit, &ltlb_entries_bmp);
> +			break;
> +		}
> +	}
> +
> +	if (bit == MMU_LTLB_WAYS)
> +		panic("Trying to remove non-existent LTLB entry for addr %lx\n",
> +		      vaddr);
> +
> +	ret = clear_ltlb_entry(vaddr);
> +	if (ret)
> +		panic("Failed to remove LTLB entry for addr %lx\n", vaddr);
> +}
> +
> +/**
> + * kvx_mmu_jtlb_add_entry - Add an entry into JTLB
> + *
> + * JTLB is used both for kernel and user entries.
> + *
> + * @address: Virtual address for the entry (must be aligned according to tlb_ps)
> + * @ptep: pte entry pointer
> + * @asn: ASN (if pte entry is not global)
> + */
> +void kvx_mmu_jtlb_add_entry(unsigned long address, pte_t *ptep,
> +			    unsigned int asn)
> +{
> +	unsigned int shifted_addr, set, way;
> +	unsigned long flags, pte_val;
> +	struct kvx_tlb_format tlbe;
> +	unsigned int pa, cp, ps;
> +	phys_addr_t pfn;
> +
> +	pte_val = pte_val(*ptep);
> +
> +	pfn = (phys_addr_t)pte_pfn(*ptep);
> +
> +	asn &= MM_CTXT_ASN_MASK;
> +
> +	/* Set page as accessed */
> +	pte_val(*ptep) |= _PAGE_ACCESSED;
> +
> +	BUILD_BUG_ON(KVX_PAGE_SZ_SHIFT != KVX_SFR_TEL_PS_SHIFT);
> +
> +	ps = (pte_val & KVX_PAGE_SZ_MASK) >> KVX_PAGE_SZ_SHIFT;
> +	pa = get_page_access_perms(KVX_ACCESS_PERMS_INDEX(pte_val));
> +	cp = pgprot_cache_policy(pte_val);
> +
> +	tlbe = tlb_mk_entry(
> +		(void *)pfn_to_phys(pfn),
> +		(void *)address,
> +		ps,
> +		(pte_val & _PAGE_GLOBAL) ? TLB_G_GLOBAL : TLB_G_USE_ASN,
> +		pa,
> +		cp,
> +		asn,
> +		TLB_ES_A_MODIFIED);

Ditto

> +
> +	shifted_addr = address >> get_page_size_shift(ps);
> +	set = shifted_addr & MMU_JTLB_SET_MASK;
> +
> +	local_irq_save(flags);
> +
> +	if (IS_ENABLED(CONFIG_KVX_DEBUG_TLB_WRITE) &&
> +	    kvx_mmu_ltlb_overlaps(tlbe))
> +		panic("VA %lx overlaps with an existing LTLB mapping", address);
> +
> +	way = get_cpu_var(jtlb_current_set_way)[set]++;
> +	put_cpu_var(jtlb_current_set_way);
> +
> +	way &= MMU_JTLB_WAY_MASK;
> +
> +	kvx_mmu_add_entry(MMC_SB_JTLB, way, tlbe);
> +
> +	if (IS_ENABLED(CONFIG_KVX_DEBUG_TLB_WRITE) &&
> +	    kvx_mmc_error(kvx_sfr_get(MMC)))
> +		panic("Failed to write entry to the JTLB (in update_mmu_cache)");
> +
> +	local_irq_restore(flags);
> +}
> +
> +void __init kvx_mmu_early_setup(void)
> +{
> +	int bit;
> +	struct kvx_tlb_format tlbe;
> +
> +	kvx_mmu_remove_ltlb_entry(LTLB_ENTRY_EARLY_SMEM);
> +
> +	if (raw_smp_processor_id() != 0) {
> +		/* Apply existing LTLB entries starting from the first free one */
> +		bit = LTLB_ENTRY_FIXED_COUNT;
> +		for_each_set_bit_from(bit, &ltlb_entries_bmp, MMU_LTLB_WAYS) {
> +			tlbe = ltlb_entries[bit];
> +			kvx_mmu_add_entry(MMC_SB_LTLB, bit, tlbe);
> +		}
> +	}
> +}

--
Sincerely yours,
Mike.
Jules Maselbas Jan. 25, 2023, 6:28 p.m. UTC | #13
Hi Bagas,

Thanks for taking your time and effort to improve the documentation.
We not only need to clean the documentation syntax and wording but also
its content. I am tempted to apply all your proposed changes first and
then work on improving and correcting the documentation.

However, I am not very sure how to integrate your changes and give
proper contribution attribution. Any insight on this would be greatly
appreciated.

Thanks
-- Jules
Jules Maselbas Jan. 25, 2023, 9:55 p.m. UTC | #14
Hi Jason,

On Fri, Jan 20, 2023 at 03:29:14PM +0100, Jason A. Donenfeld wrote:
> Hi Yann,
> 
> On Fri, Jan 20, 2023 at 03:09:43PM +0100, Yann Sionneau wrote:
> > +#include <linux/random.h>
> > +#include <linux/version.h>
> > +
> > +extern unsigned long __stack_chk_guard;
> > +
> > +/*
> > + * Initialize the stackprotector canary value.
> > + *
> > + * NOTE: this must only be called from functions that never return,
> > + * and it must always be inlined.
> > + */
> > +static __always_inline void boot_init_stack_canary(void)
> > +{
> > +	unsigned long canary;
> > +
> > +	/* Try to get a semi random initial value. */
> > +	get_random_bytes(&canary, sizeof(canary));
> > +	canary ^= LINUX_VERSION_CODE;
> > +	canary &= CANARY_MASK;
> > +
> > +	current->stack_canary = canary;
> > +	__stack_chk_guard = current->stack_canary;
> > +}
> 
> 
> You should rewrite this as:
> 
>     current->stack_canary = get_random_canary();
>     __stack_chk_guard = current->stack_canary;
> 
> which is what the other archs all now do. (They didn't used to, and this
> looks like it's simply based on older code.)
Thanks for the suggestion, this will be in v3.

> > +#define get_cycles get_cycles
> > +
> > +#include <asm/sfr.h>
> > +#include <asm-generic/timex.h>
> > +
> > +static inline cycles_t get_cycles(void)
> > +{
> > +	return kvx_sfr_get(PM0);
> > +}
> 
> Glad to see this CPU has a cycle counter. Out of curiosity, what is
> its resolution?
This CPU has 4 performance monitors (PM); the first one is reserved for
counting cycles, and it is cycle accurate.

> Also, related, does this CPU happen to have a "RDRAND"-like instruction?
I didn't know about the RDRAND instruction, but no, this CPU does not have
an instruction like that.

> (I don't know anything about kvx or even what it is.)
It's a VLIW core, a bit like Itanium; there is currently no publicly
available documentation. We have started a discussion internally at
Kalray about sharing more information regarding this CPU and its ABI.

A very crude instruction listing can be found in our fork of
gdb-binutils: https://raw.githubusercontent.com/kalray/binutils/binutils-2_35_2/coolidge/opcodes/kv3-opc.c

Best regards,
-- Jules
Bagas Sanjaya Jan. 26, 2023, 2:23 a.m. UTC | #15
On Wed, Jan 25, 2023 at 07:28:20PM +0100, Jules Maselbas wrote:
> Hi Bagas,
> 
> Thanks for taking your time and effort to improve the documentation.
> We not only need to clean the documention syntax and wording but also
> its content. I am tempted to apply all your proposed changes first and
> then work on improving and correcting the documentation.
> 
> However I am not very sure on how to integrate your changes and give
> proper contribution attributions. Any insights on this would be greatly
> appreciated.
> 

Hi Jules,

The reword diff can be squashed into the doc patch (here, [01/31]).

For the attribution, since the reword is significant,

Co-developed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>

Thanks.
Jules Maselbas Jan. 26, 2023, 9:57 a.m. UTC | #16
Hi Mark,

On Fri, Jan 20, 2023 at 03:18:48PM +0000, Mark Rutland wrote:
> On Fri, Jan 20, 2023 at 03:09:42PM +0100, Yann Sionneau wrote:
> > Add common headers (atomic, bitops, barrier and locking) for basic
> > kvx support.
> > 
> > Co-developed-by: Clement Leger <clement@clement-leger.fr>
> > Signed-off-by: Clement Leger <clement@clement-leger.fr>
> > Co-developed-by: Jules Maselbas <jmaselbas@kalray.eu>
> > Signed-off-by: Jules Maselbas <jmaselbas@kalray.eu>
> > Co-developed-by: Julian Vetter <jvetter@kalray.eu>
> > Signed-off-by: Julian Vetter <jvetter@kalray.eu>
> > Co-developed-by: Julien Villette <jvillette@kalray.eu>
> > Signed-off-by: Julien Villette <jvillette@kalray.eu>
> > Co-developed-by: Yann Sionneau <ysionneau@kalray.eu>
> > Signed-off-by: Yann Sionneau <ysionneau@kalray.eu>
> > ---
> > 
> > Notes:
> >     V1 -> V2:
> >      - use {READ,WRITE}_ONCE for arch_atomic64_{read,set}
> >      - use asm-generic/bitops/atomic.h instead of __test_and_*_bit
> >      - removed duplicated includes
> >      - rewrite xchg and cmpxchg in C using builtins for acswap insn
> 
> Thanks for those changes. I see one issue below (instantiated a few times), but
> other than that this looks good to me.
> 
> [...]
> 
> > +#define ATOMIC64_RETURN_OP(op, c_op)					\
> > +static inline long arch_atomic64_##op##_return(long i, atomic64_t *v)	\
> > +{									\
> > +	long new, old, ret;						\
> > +									\
> > +	do {								\
> > +		old = v->counter;					\
> 
> This should be arch_atomic64_read(v), in order to avoid the potential for the
> compiler to replay the access and introduce ABA races and other such problems.
Thanks for the suggestion, this will be in v3.
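
For reference, the loop then becomes (sketch of the planned v3 change):

	do {
		old = arch_atomic64_read(v);
		new = old c_op i;
		ret = arch_cmpxchg(&v->counter, old, new);
	} while (ret != old);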

> For details, see:
> 
>   https://lore.kernel.org/lkml/Y70SWXHDmOc3RhMd@osiris/
>   https://lore.kernel.org/lkml/Y71LoCIl+IFdy9D8@FVFF77S0Q05N/
> 
> I see that the generic 32-bit atomic code suffers from that issue, and we
> should fix it.
I took a look at the generic 32-bit atomics, but I am unsure whether this
needs to be done for both the SMP and non-SMP implementations. I can send
a first patch and we can discuss from there.

> > +		new = old c_op i;					\
> > +		ret = arch_cmpxchg(&v->counter, old, new);		\
> > +	} while (ret != old);						\
> > +									\
> > +	return new;							\
> > +}
> > +
> > +#define ATOMIC64_OP(op, c_op)						\
> > +static inline void arch_atomic64_##op(long i, atomic64_t *v)		\
> > +{									\
> > +	long new, old, ret;						\
> > +									\
> > +	do {								\
> > +		old = v->counter;					\
> 
> Likewise, arch_atomic64_read(v) here.
ack

> > +		new = old c_op i;					\
> > +		ret = arch_cmpxchg(&v->counter, old, new);		\
> > +	} while (ret != old);						\
> > +}
> > +
> > +#define ATOMIC64_FETCH_OP(op, c_op)					\
> > +static inline long arch_atomic64_fetch_##op(long i, atomic64_t *v)	\
> > +{									\
> > +	long new, old, ret;						\
> > +									\
> > +	do {								\
> > +		old = v->counter;					\
> 
> Likewise, arch_atomic64_read(v) here.
ack

> > +		new = old c_op i;					\
> > +		ret = arch_cmpxchg(&v->counter, old, new);		\
> > +	} while (ret != old);						\
> > +									\
> > +	return old;							\
> > +}
> > +
> > +#define ATOMIC64_OPS(op, c_op)						\
> > +	ATOMIC64_OP(op, c_op)						\
> > +	ATOMIC64_RETURN_OP(op, c_op)					\
> > +	ATOMIC64_FETCH_OP(op, c_op)
> > +
> > +ATOMIC64_OPS(and, &)
> > +ATOMIC64_OPS(or, |)
> > +ATOMIC64_OPS(xor, ^)
> > +ATOMIC64_OPS(add, +)
> > +ATOMIC64_OPS(sub, -)
> > +
> > +#undef ATOMIC64_OPS
> > +#undef ATOMIC64_FETCH_OP
> > +#undef ATOMIC64_OP
> > +
> > +static inline int arch_atomic_add_return(int i, atomic_t *v)
> > +{
> > +	int new, old, ret;
> > +
> > +	do {
> > +		old = v->counter;
> 
> Likewise, arch_atomic64_read(v) here.
Ack, this will be arch_atomic_read(v) here since this is not an atomic64_t.


Thanks
-- Jules
Mark Rutland Jan. 26, 2023, 11:15 a.m. UTC | #17
Hi Jules,

On Thu, Jan 26, 2023 at 10:57:20AM +0100, Jules Maselbas wrote:
> Hi Mark,
> 
> On Fri, Jan 20, 2023 at 03:18:48PM +0000, Mark Rutland wrote:
> > On Fri, Jan 20, 2023 at 03:09:42PM +0100, Yann Sionneau wrote:
> > > +#define ATOMIC64_RETURN_OP(op, c_op)					\
> > > +static inline long arch_atomic64_##op##_return(long i, atomic64_t *v)	\
> > > +{									\
> > > +	long new, old, ret;						\
> > > +									\
> > > +	do {								\
> > > +		old = v->counter;					\
> > 
> > This should be arch_atomic64_read(v), in order to avoid the potential for the
> > compiler to replay the access and introduce ABA races and other such problems.
> Thanks for the suggestion, this will be into v3.
> 
> > For details, see:
> > 
> >   https://lore.kernel.org/lkml/Y70SWXHDmOc3RhMd@osiris/
> >   https://lore.kernel.org/lkml/Y71LoCIl+IFdy9D8@FVFF77S0Q05N/
> > 
> > I see that the generic 32-bit atomic code suffers from that issue, and we
> > should fix it.
> I took a look at the generic 32-bit atomic, but I am unsure if this
> needs to be done for both the SMP and non-SMP implementations. But I
> can send a first patch and we can discuss from there.

Sounds good to me; thanks!

[...]

> > > +static inline int arch_atomic_add_return(int i, atomic_t *v)
> > > +{
> > > +	int new, old, ret;
> > > +
> > > +	do {
> > > +		old = v->counter;
> > 
> > Likewise, arch_atomic64_read(v) here.
> ack, this will bt arch_atomic_read(v) here since this is not atomic64_t
> here.

Ah, yes, my bad!

Thanks,
Mark.
Jules Maselbas Jan. 26, 2023, 11:19 a.m. UTC | #18
On Thu, Jan 26, 2023 at 10:57:20AM +0100, Jules Maselbas wrote:
> Hi Mark,
...

> > > +static inline int arch_atomic_add_return(int i, atomic_t *v)
> > > +{
> > > +	int new, old, ret;
> > > +
> > > +	do {
> > > +		old = v->counter;
> > 
> > Likewise, arch_atomic64_read(v) here.
> ack, this will bt arch_atomic_read(v) here since this is not atomic64_t
> here.
I took a second look at this and I think we are not doing the right
thing: we do not need to define arch_atomic_add_return at all, since
we include the generic atomic header right after, which will define
the macro arch_atomic_add_return as generic_atomic_add_return.
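
Indeed, quoting include/asm-generic/atomic.h from memory:

	#define arch_atomic_add_return generic_atomic_add_return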

> 
> Thanks
> -- Jules
> 
Guo Ren Jan. 29, 2023, 11:50 a.m. UTC | #19
On Fri, Jan 20, 2023 at 10:13 PM Yann Sionneau <ysionneau@kalray.eu> wrote:
>
> Add common headers (atomic, bitops, barrier and locking) for basic
> kvx support.
>
> Co-developed-by: Clement Leger <clement@clement-leger.fr>
> Signed-off-by: Clement Leger <clement@clement-leger.fr>
> Co-developed-by: Jules Maselbas <jmaselbas@kalray.eu>
> Signed-off-by: Jules Maselbas <jmaselbas@kalray.eu>
> Co-developed-by: Julian Vetter <jvetter@kalray.eu>
> Signed-off-by: Julian Vetter <jvetter@kalray.eu>
> Co-developed-by: Julien Villette <jvillette@kalray.eu>
> Signed-off-by: Julien Villette <jvillette@kalray.eu>
> Co-developed-by: Yann Sionneau <ysionneau@kalray.eu>
> Signed-off-by: Yann Sionneau <ysionneau@kalray.eu>
> ---
>
> Notes:
>     V1 -> V2:
>      - use {READ,WRITE}_ONCE for arch_atomic64_{read,set}
>      - use asm-generic/bitops/atomic.h instead of __test_and_*_bit
>      - removed duplicated includes
>      - rewrite xchg and cmpxchg in C using builtins for acswap insn
>
>  arch/kvx/include/asm/atomic.h  | 104 ++++++++++++++++++++
>  arch/kvx/include/asm/barrier.h |  15 +++
>  arch/kvx/include/asm/bitops.h  | 115 ++++++++++++++++++++++
>  arch/kvx/include/asm/bitrev.h  |  32 +++++++
>  arch/kvx/include/asm/cmpxchg.h | 170 +++++++++++++++++++++++++++++++++
>  5 files changed, 436 insertions(+)
>  create mode 100644 arch/kvx/include/asm/atomic.h
>  create mode 100644 arch/kvx/include/asm/barrier.h
>  create mode 100644 arch/kvx/include/asm/bitops.h
>  create mode 100644 arch/kvx/include/asm/bitrev.h
>  create mode 100644 arch/kvx/include/asm/cmpxchg.h
>
> diff --git a/arch/kvx/include/asm/atomic.h b/arch/kvx/include/asm/atomic.h
> new file mode 100644
> index 000000000000..bea3d70785b1
> --- /dev/null
> +++ b/arch/kvx/include/asm/atomic.h
> @@ -0,0 +1,104 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + */
> +
> +#ifndef _ASM_KVX_ATOMIC_H
> +#define _ASM_KVX_ATOMIC_H
> +
> +#include <linux/types.h>
> +
> +#include <asm/cmpxchg.h>
> +
> +#define ATOMIC64_INIT(i)     { (i) }
> +
> +#define arch_atomic64_cmpxchg(v, old, new) (arch_cmpxchg(&((v)->counter), old, new))
> +#define arch_atomic64_xchg(v, new) (arch_xchg(&((v)->counter), new))
> +
> +static inline long arch_atomic64_read(const atomic64_t *v)
> +{
> +       return READ_ONCE(v->counter);
> +}
> +
> +static inline void arch_atomic64_set(atomic64_t *v, long i)
> +{
> +       WRITE_ONCE(v->counter, i);
> +}
> +
> +#define ATOMIC64_RETURN_OP(op, c_op)                                   \
> +static inline long arch_atomic64_##op##_return(long i, atomic64_t *v)  \
> +{                                                                      \
> +       long new, old, ret;                                             \
> +                                                                       \
> +       do {                                                            \
> +               old = v->counter;                                       \
> +               new = old c_op i;                                       \
> +               ret = arch_cmpxchg(&v->counter, old, new);              \
> +       } while (ret != old);                                           \
> +                                                                       \
> +       return new;                                                     \
> +}
> +
> +#define ATOMIC64_OP(op, c_op)                                          \
> +static inline void arch_atomic64_##op(long i, atomic64_t *v)           \
> +{                                                                      \
> +       long new, old, ret;                                             \
> +                                                                       \
> +       do {                                                            \
> +               old = v->counter;                                       \
> +               new = old c_op i;                                       \
> +               ret = arch_cmpxchg(&v->counter, old, new);              \
> +       } while (ret != old);                                           \
> +}
> +
> +#define ATOMIC64_FETCH_OP(op, c_op)                                    \
> +static inline long arch_atomic64_fetch_##op(long i, atomic64_t *v)     \
> +{                                                                      \
> +       long new, old, ret;                                             \
> +                                                                       \
> +       do {                                                            \
> +               old = v->counter;                                       \
> +               new = old c_op i;                                       \
> +               ret = arch_cmpxchg(&v->counter, old, new);              \
> +       } while (ret != old);                                           \
> +                                                                       \
> +       return old;                                                     \
> +}
> +
> +#define ATOMIC64_OPS(op, c_op)                                         \
> +       ATOMIC64_OP(op, c_op)                                           \
> +       ATOMIC64_RETURN_OP(op, c_op)                                    \
> +       ATOMIC64_FETCH_OP(op, c_op)
> +
> +ATOMIC64_OPS(and, &)
> +ATOMIC64_OPS(or, |)
> +ATOMIC64_OPS(xor, ^)
> +ATOMIC64_OPS(add, +)
> +ATOMIC64_OPS(sub, -)
> +
> +#undef ATOMIC64_OPS
> +#undef ATOMIC64_FETCH_OP
> +#undef ATOMIC64_OP
> +
> +static inline int arch_atomic_add_return(int i, atomic_t *v)
> +{
> +       int new, old, ret;
> +
> +       do {
> +               old = v->counter;
> +               new = old + i;
> +               ret = arch_cmpxchg(&v->counter, old, new);
> +       } while (ret != old);
> +
> +       return new;
> +}
> +
> +static inline int arch_atomic_sub_return(int i, atomic_t *v)
> +{
> +       return arch_atomic_add_return(-i, v);
> +}
> +
> +#include <asm-generic/atomic.h>
> +
> +#endif /* _ASM_KVX_ATOMIC_H */
> diff --git a/arch/kvx/include/asm/barrier.h b/arch/kvx/include/asm/barrier.h
> new file mode 100644
> index 000000000000..371f1c70746d
> --- /dev/null
> +++ b/arch/kvx/include/asm/barrier.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + */
> +
> +#ifndef _ASM_KVX_BARRIER_H
> +#define _ASM_KVX_BARRIER_H
> +
> +/* fence is sufficient to guarantee write ordering */
> +#define mb()   __builtin_kvx_fence()
> +
> +#include <asm-generic/barrier.h>
> +
> +#endif /* _ASM_KVX_BARRIER_H */
> diff --git a/arch/kvx/include/asm/bitops.h b/arch/kvx/include/asm/bitops.h
> new file mode 100644
> index 000000000000..c643f4765059
> --- /dev/null
> +++ b/arch/kvx/include/asm/bitops.h
> @@ -0,0 +1,115 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + *            Yann Sionneau
> + */
> +
> +#ifndef _ASM_KVX_BITOPS_H
> +#define _ASM_KVX_BITOPS_H
> +
> +#ifdef __KERNEL__
> +
> +#ifndef _LINUX_BITOPS_H
> +#error only <linux/bitops.h> can be included directly
> +#endif
> +
> +#include <asm/cmpxchg.h>
> +
> +static inline int fls(int x)
> +{
> +       return 32 - __builtin_kvx_clzw(x);
> +}
> +
> +static inline int fls64(__u64 x)
> +{
> +       return 64 - __builtin_kvx_clzd(x);
> +}
> +
> +/**
> + * __ffs - find first set bit in word
> + * @word: The word to search
> + *
> + * Undefined if no set bit exists, so code should check against 0 first.
> + */
> +static inline unsigned long __ffs(unsigned long word)
> +{
> +       return __builtin_kvx_ctzd(word);
> +}
> +
> +/**
> + * __fls - find last set bit in word
> + * @word: The word to search
> + *
> + * Undefined if no set bit exists, so code should check against 0 first.
> + */
> +static inline unsigned long __fls(unsigned long word)
> +{
> +       return 63 - __builtin_kvx_clzd(word);
> +}
> +
> +
> +/**
> + * ffs - find first set bit in word
> + * @x: the word to search
> + *
> + * This is defined the same way as the libc and compiler builtin ffs
> + * routines, therefore differs in spirit from the other bitops.
> + *
> + * ffs(value) returns 0 if value is 0 or the position of the first
> + * set bit if value is nonzero. The first (least significant) bit
> + * is at position 1.
> + */
> +static inline int ffs(int x)
> +{
> +       if (!x)
> +               return 0;
> +       return __builtin_kvx_ctzw(x) + 1;
> +}
> +
> +static inline unsigned int __arch_hweight32(unsigned int w)
> +{
> +       unsigned int count;
> +
> +       asm volatile ("cbsw %0 = %1\n\t;;"
> +       : "=r" (count)
> +       : "r" (w));
> +
> +       return count;
> +}
> +
> +static inline unsigned int __arch_hweight64(__u64 w)
> +{
> +       unsigned int count;
> +
> +       asm volatile ("cbsd %0 = %1\n\t;;"
> +       : "=r" (count)
> +       : "r" (w));
> +
> +       return count;
> +}
> +
> +static inline unsigned int __arch_hweight16(unsigned int w)
> +{
> +       return __arch_hweight32(w & 0xffff);
> +}
> +
> +static inline unsigned int __arch_hweight8(unsigned int w)
> +{
> +       return __arch_hweight32(w & 0xff);
> +}
> +
> +#include <asm-generic/bitops/ffz.h>
> +
> +#include <asm-generic/bitops/sched.h>
> +#include <asm-generic/bitops/const_hweight.h>
> +
> +#include <asm-generic/bitops/atomic.h>
> +#include <asm-generic/bitops/non-atomic.h>
> +#include <asm-generic/bitops/lock.h>
> +#include <asm-generic/bitops/le.h>
> +#include <asm-generic/bitops/ext2-atomic.h>
> +
> +#endif
> +
> +#endif
> diff --git a/arch/kvx/include/asm/bitrev.h b/arch/kvx/include/asm/bitrev.h
> new file mode 100644
> index 000000000000..79865081905a
> --- /dev/null
> +++ b/arch/kvx/include/asm/bitrev.h
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + */
> +
> +#ifndef _ASM_KVX_BITREV_H
> +#define _ASM_KVX_BITREV_H
> +
> +#include <linux/swab.h>
> +
> +/* Bit reversal constant for matrix multiply */
> +#define BIT_REVERSE 0x0102040810204080ULL
> +
> +static __always_inline __attribute_const__ u32 __arch_bitrev32(u32 x)
> +{
> +       /* Reverse all bits of each byte and then byte-reverse the 32 LSB */
> +       return swab32(__builtin_kvx_sbmm8(BIT_REVERSE, x));
> +}
> +
> +static __always_inline __attribute_const__ u16 __arch_bitrev16(u16 x)
> +{
> +       /* Reverse all bits of each byte and then byte-reverse the 16 LSB */
> +       return swab16(__builtin_kvx_sbmm8(BIT_REVERSE, x));
> +}
> +
> +static __always_inline __attribute_const__ u8 __arch_bitrev8(u8 x)
> +{
> +       return __builtin_kvx_sbmm8(BIT_REVERSE, x);
> +}
> +
> +#endif
> diff --git a/arch/kvx/include/asm/cmpxchg.h b/arch/kvx/include/asm/cmpxchg.h
> new file mode 100644
> index 000000000000..51ccb83757cc
> --- /dev/null
> +++ b/arch/kvx/include/asm/cmpxchg.h
> @@ -0,0 +1,170 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2017-2023 Kalray Inc.
> + * Author(s): Clement Leger
> + *            Yann Sionneau
> + *            Jules Maselbas
> + */
> +
> +#ifndef _ASM_KVX_CMPXCHG_H
> +#define _ASM_KVX_CMPXCHG_H
> +
> +#include <linux/bits.h>
> +#include <linux/types.h>
> +#include <linux/align.h>
> +#include <linux/build_bug.h>
> +
> +/*
> + * On kvx, we have a boolean compare and swap, which means that the operation
> + * returns only whether it succeeded.
> + * If the operation succeeds, this is simple: we just return the provided old
> + * value. However, if it fails, we need to load the current value to return
> + * it to the caller. If the loaded value is different from the "old" provided
> + * by the caller, we can return it, since that means the operation failed.
> + * However, if for some reason the value we read is equal to the old value
> + * provided by the caller, we can't simply return it or the caller will think
> + * the operation succeeded. So in that case we try again until either we
> + * succeed or we fail with a different value than the provided one.
> + */
> +
> +static inline unsigned int __cmpxchg_u32(unsigned int old, unsigned int new,
> +                                        volatile unsigned int *ptr)
> +{
> +       unsigned int exp = old;
> +
> +       __builtin_kvx_fence();
> +       while (exp == old) {
> +               if (__builtin_kvx_acswapw((void *)ptr, new, exp))
What's the acswapw/d machine code? Seems all RMW-atomic operations are
based on it.

> +                       break; /* acswap succeeded */
> +               exp = *ptr;
> +       }
> +
> +       return exp;
> +}
> +
> +static inline unsigned long __cmpxchg_u64(unsigned long old, unsigned long new,
> +                                         volatile unsigned long *ptr)
> +{
> +       unsigned long exp = old;
> +
> +       __builtin_kvx_fence();
> +       while (exp == old) {
> +               if (__builtin_kvx_acswapd((void *)ptr, new, exp))
> +                       break; /* acswap succeeded */
> +               exp = *ptr;
> +       }
> +
> +       return exp;
> +}
> +
> +extern unsigned long __cmpxchg_called_with_bad_pointer(void)
> +       __compiletime_error("Bad argument size for cmpxchg");
> +
> +static __always_inline unsigned long __cmpxchg(unsigned long old,
> +                                              unsigned long new,
> +                                              volatile void *ptr, int size)
> +{
> +       switch (size) {
> +       case 4:
> +               return __cmpxchg_u32(old, new, ptr);
> +       case 8:
> +               return __cmpxchg_u64(old, new, ptr);
> +       default:
> +               return __cmpxchg_called_with_bad_pointer();
> +       }
> +}
> +
> +#define arch_cmpxchg(ptr, old, new)                                    \
> +       ((__typeof__(*(ptr))) __cmpxchg(                                \
> +               (unsigned long)(old), (unsigned long)(new),             \
> +               (ptr), sizeof(*(ptr))))
> +
> +/*
> + * In order to optimize the 16-bit xchg, we can use insf/extfz if we know
> + * the bounds. This way, we only take one more bundle than the standard
> + * xchg. We simply do a read-modify-acswap on the containing 32-bit word.
> + */
> +
> +#define __kvx_insf(org, val, start, stop) __asm__ __volatile__(        \
> +               "insf %[_org] = %[_val], %[_stop], %[_start]\n\t;;"     \
> +               : [_org]"+r"(org)                                       \
> +               : [_val]"r"(val), [_stop]"i"(stop), [_start]"i"(start))
> +
> +#define __kvx_extfz(out, val, start, stop) __asm__ __volatile__(       \
> +               "extfz %[_out] = %[_val], %[_stop], %[_start]\n\t;;"    \
> +               : [_out]"=r"(out)                                       \
> +               : [_val]"r"(val), [_stop]"i"(stop), [_start]"i"(start))
> +
> +/* Needed for generic qspinlock implementation */
> +static inline unsigned int __xchg_u16(unsigned int old, unsigned int new,
> +                                     volatile unsigned int *ptr)
> +{
> +       unsigned int off = ((unsigned long)ptr) % sizeof(unsigned int);
> +       unsigned int val;
> +
> +       ptr = PTR_ALIGN_DOWN(ptr, sizeof(unsigned int));
> +       __builtin_kvx_fence();
> +       do {
> +               old = *ptr;
> +               val = old;
> +               if (off == 0)
> +                       __kvx_insf(val, new, 0, 15);
> +               else
> +                       __kvx_insf(val, new, 16, 31);
> +       } while (!__builtin_kvx_acswapw((void *)ptr, val, old));
> +
> +       if (off == 0)
> +               __kvx_extfz(old, old, 0, 15);
> +       else
> +               __kvx_extfz(old, old, 16, 31);
> +
> +       return old;
> +}
> +
> +static inline unsigned int __xchg_u32(unsigned int old, unsigned int new,
> +                                     volatile unsigned int *ptr)
> +{
> +       __builtin_kvx_fence();
> +       do
> +               old = *ptr;
> +       while (!__builtin_kvx_acswapw((void *)ptr, new, old));
> +
> +       return old;
> +}
> +
> +static inline unsigned long __xchg_u64(unsigned long old, unsigned long new,
> +                                      volatile unsigned long *ptr)
> +{
> +       __builtin_kvx_fence();
> +       do
> +               old = *ptr;
> +       while (!__builtin_kvx_acswapd((void *)ptr, new, old));
> +
> +       return old;
> +}
> +
> +extern unsigned long __xchg_called_with_bad_pointer(void)
> +       __compiletime_error("Bad argument size for xchg");
> +
> +static __always_inline unsigned long __xchg(unsigned long val,
> +                                           volatile void *ptr, int size)
> +{
> +       switch (size) {
> +       case 2:
> +               return __xchg_u16(0, val, ptr);
> +       case 4:
> +               return __xchg_u32(0, val, ptr);
> +       case 8:
> +               return __xchg_u64(0, val, ptr);
> +       default:
> +               return __xchg_called_with_bad_pointer();
> +       }
> +}
> +
> +#define arch_xchg(ptr, val)                                            \
> +       ((__typeof__(*(ptr))) __xchg(                                   \
> +               (unsigned long)(val),                                   \
> +               (ptr), sizeof(*(ptr))))
> +
> +#endif
> --
> 2.37.2
>
Yann Sionneau Jan. 31, 2024, 9:52 a.m. UTC | #20
Hello Krzysztof,

On 22/01/2023 12:54, Krzysztof Kozlowski wrote:
> On 20/01/2023 15:10, Yann Sionneau wrote:
>> +
>> +int __init kvx_ipi_ctrl_probe(irqreturn_t (*ipi_irq_handler)(int, void *))
>> +{
>> +	struct device_node *np;
>> +	int ret;
>> +	unsigned int ipi_irq;
>> +	void __iomem *ipi_base;
>> +
>> +	np = of_find_compatible_node(NULL, NULL, "kalray,kvx-ipi-ctrl");
> Nope, big no.
>
> Drivers go to drivers, not to arch code. Use proper driver infrastructure.
Thank you for your review.

It raises questions on our side about how to handle this change.

First let me describe the hardware:

The Coolidge IPI controller handles IPI communication between CPUs
inside a cluster.

Each CPU has 8 dedicated irq lines (24 to 31) hard-wired to the IPI controller.
The IPI controller has 8 sets of 2 registers:
- a 17-bit "interrupt" register
- a 17-bit "mask" register

Each register pair is dedicated to one of the 8 irq lines.
Each of the 17 bits in the interrupt/mask registers
identifies a CPU (cores 0 to 15 plus the secure core).
Writing bit i in the interrupt register sends an irq to cpu i, subject to the
mask in the mask register.
Writing to interrupt/mask register pair N targets irq line N of the core.

- The IPI controller raises an interrupt to the CPU when a message is ready.
- Messages are delivered via AXI.
- The IPI controller has no interrupt input lines.


  +---------------+   irq       axi_w
  |         |  i  |<--/--- ipi <------
  | CPU     |  n  |  x8
  |  core0  |  t  |
  |         |  c  |  irq          irq         msi
  |         |  t  |<--/--- apic <----- mbox <-------
  |         |  l  |  x4
  +---------------+
  with intctl = core-irq controller
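
To make this concrete, here is a minimal sketch of what sending an IPI could
look like with this register layout (all names and offsets below are
hypothetical, for illustration only):

#include <linux/bits.h>
#include <linux/io.h>

static void __iomem *ipi_base;	/* ioremap()ed IPI controller base */

/* one (interrupt, mask) register pair per irq line; offsets are made up */
#define IPI_INTERRUPT(line)	((line) * 0x8)
#define IPI_MASK(line)		((line) * 0x8 + 0x4)

/* assumes the mask registers were set up at probe time to enable all cores */
static void kvx_ipi_send(unsigned int cpu, unsigned int line)
{
	/* writing bit i of the interrupt register raises irq "line" on cpu i */
	writel(BIT(cpu), ipi_base + IPI_INTERRUPT(line));
}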
    

We analyzed how other Linux ports handle this situation (IPI); here are several possible solutions:

1/ put everything in smp.c like what loongarch is doing.
  * Except that IPIs on loongarch seem to involve writing to a special-purpose CPU register, not doing a memory-mapped write like on kvx.

2/ write a device driver in drivers/xxx/ with the content from ipi.c
  * the probe would just ioremap the reg from DT and register the irq using request_percpu_irq()
  * it would export a function "kvx_ipi_send()" that would directly be called by smp.c
  * Question: where would this driver be placed in drivers/? drivers/irqchip/? Even if this is not per se an interrupt-controller driver?

3/ write a "dummy" interrupt-controller driver in drivers/irqchip/:
  * it would create a dummy irq_domain, ioremap the reg, request per_cpu irq
  * declare a struct irq_chip with only the ipi_send_mask() callback set, so that the generic IPI code in kernel/irq/ipi.c (__ipi_send_single()) would work.
  * This would make use of the generic IPI code like what mips and risc-v are doing.

4/ consider our "ipi device" as a mailbox and write a mailbox driver in drivers/mailbox/

5/ consider it as an msi-controller since it transforms an AXI write into an IRQ. The solution would look a bit like 3/

6/ consider the ipi as "part of the core_intc" and add the content of ipi.c in drivers/irqchip/irq-kvx-core-intc.c

7/ Do like OpenRISC and CSKY:
  * declare a function pointer in smp.c (see smp_cross_call() from OpenRISC https://elixir.bootlin.com/linux/latest/source/arch/openrisc/kernel/smp.c#L28)
  * declare a "setter" function in smp.c (see set_send_ipi() from OpenRISC https://elixir.bootlin.com/linux/latest/source/arch/openrisc/kernel/smp.c#L202)
  * write a driver in drivers/irqchip/ which ends up calling the setter function (see irq-ompic.c: https://elixir.bootlin.com/linux/latest/source/drivers/irqchip/irq-ompic.c#L191)


I would tend to exclude solution 1/ because it does not exactly fit our arch (core register vs AXI write); otherwise we would have to do an ioremap() from inside smp.c. Is this acceptable?
I would exclude 3/ because it feels a bit dirty to hack up a dummy interrupt-controller... our IPI is not an interrupt controller: there are no input irqs. It's more like a device that generates an IRQ.
4/ and 5/ feel a bit over-engineered.
6/ I guess this would work, since irqchips are initialized early from init_IRQ(), but it does not reflect our hardware very well: each CPU has one core_intc, whereas the IPI is global to the cluster and is accessed over AXI.

Having considered all of this, I would tend to end up with solution 7/, but it honestly does not feel much cleaner than our current proposal. The function pointer dance feels a bit hackish.
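
For reference, a minimal sketch of what the solution 7/ "function pointer
dance" could look like on kvx (hypothetical names, modeled on the OpenRISC
code linked above):

/* arch/kvx/kernel/smp.c -- sketch only */
#include <linux/cpumask.h>

static void (*kvx_ipi_send_fn)(const struct cpumask *mask);

void set_ipi_send(void (*fn)(const struct cpumask *mask))
{
	kvx_ipi_send_fn = fn;
}

void arch_send_call_function_ipi_mask(const struct cpumask *mask)
{
	kvx_ipi_send_fn(mask);
}

/* the driver in drivers/irqchip/ would then call, from its init function:
 *	set_ipi_send(kvx_ipi_ctrl_send);
 */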

What would you prefer?

Regards,
Krzysztof Kozlowski Jan. 31, 2024, 10:12 a.m. UTC | #21
On 31/01/2024 10:52, Yann Sionneau wrote:
> Hello Krzysztof,
> 
> On 22/01/2023 12:54, Krzysztof Kozlowski wrote:
>> On 20/01/2023 15:10, Yann Sionneau wrote:
>>> +
>>> +int __init kvx_ipi_ctrl_probe(irqreturn_t (*ipi_irq_handler)(int, void *))
>>> +{
>>> +	struct device_node *np;
>>> +	int ret;
>>> +	unsigned int ipi_irq;
>>> +	void __iomem *ipi_base;
>>> +
>>> +	np = of_find_compatible_node(NULL, NULL, "kalray,kvx-ipi-ctrl");
>> Nope, big no.
>>
>> Drivers go to drivers, not to arch code. Use proper driver infrastructure.
> Thank you for your review.
> 
> It raises questions on our side about how to handle this change.

I am sorry, but responding with one page of hardware description is
totally unrelated to the code I am questioning here and does not make it
easier for me to respond. I understand that you want me to learn an entire
new KVX architecture to be able to provide a good review, but it is just
not possible, sorry. We all have quite limited time around here, so we
all expect concise and precise answers.

Best regards,
Krzysztof
Arnd Bergmann Jan. 31, 2024, 10:28 a.m. UTC | #22
On Wed, Jan 31, 2024, at 10:52, Yann Sionneau wrote:
> On 22/01/2023 12:54, Krzysztof Kozlowski wrote:
>> On 20/01/2023 15:10, Yann Sionneau wrote:
>>> +
>>> +int __init kvx_ipi_ctrl_probe(irqreturn_t (*ipi_irq_handler)(int, void *))
>>> +{
>>> +	struct device_node *np;
>>> +	int ret;
>>> +	unsigned int ipi_irq;
>>> +	void __iomem *ipi_base;
>>> +
>>> +	np = of_find_compatible_node(NULL, NULL, "kalray,kvx-ipi-ctrl");
>> Nope, big no.
>>
>> Drivers go to drivers, not to arch code. Use proper driver infrastructure.
> Thank you for your review.
>
> It raises questions on our side about how to handle this change.
>
> First let me describe the hardware:
>
> The Coolidge IPI controller handles IPI communication between CPUs
> inside a cluster.
>
> Each CPU has 8 dedicated irq lines (24 to 31) hard-wired to the IPI controller.
> The IPI controller has 8 sets of 2 registers:
> - a 17-bit "interrupt" register
> - a 17-bit "mask" register
>
> Each register pair is dedicated to one of the 8 irq lines.
> Each of the 17 bits in the interrupt/mask registers
> identifies a CPU (cores 0 to 15 plus the secure core).
> Writing bit i in the interrupt register sends an irq to cpu i, subject to the
> mask in the mask register.
> Writing to interrupt/mask register pair N targets irq line N of the core.
>
> - The IPI controller raises an interrupt to the CPU when a message is ready.
> - Messages are delivered via AXI.
> - The IPI controller has no interrupt input lines.
>
>
>   +---------------+   irq       axi_w
>   |         |  i  |<--/--- ipi <------
>   | CPU     |  n  |  x8
>   |  core0  |  t  |
>   |         |  c  |  irq          irq         msi
>   |         |  t  |<--/--- apic <----- mbox <-------
>   |         |  l  |  x4
>   +---------------+
>   with intctl = core-irq controller
>    
>
> We analyzed how other Linux ports handle this situation (IPI); here
> are several possible solutions:
>
> 1/ put everything in smp.c like what loongarch is doing.
>   * Except that IPIs on loongarch seem to involve writing to a
> special-purpose CPU register, not doing a memory-mapped write like on kvx.
>
> 2/ write a device driver in drivers/xxx/ with the content from ipi.c
>   * the probe would just ioremap the reg from DT and register the irq 
> using request_percpu_irq()
>   * it would export a function "kvx_ipi_send()" that would directly be 
> called by smp.c
>   * Question : where would this driver be placed in drivers/ ? 
> drivers/irqchip/ ? Even if this is not per-se an interrupt-controller 
> driver?

This looks like it's close enough to the irqchip driver
that you can just have it in the same file as the 'intctl'
portion.

Top-level irqchip implementations tend to be rather architecture
specific, as does the IPI mechanism. Depending on the register
layout, I think you can have a single devicetree node for
the combination of the core-irq (for managing your
own interrupts) and the ipi (for receiving interrupts from
others), and then have a driver in drivers/irqchip to
deal with both. For the ipi mechanism, trying to abstract
it too much generally makes it too slow, so I would not
go through a nested irqchip or a mailbox driver etc.
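
As a rough illustration of that shape (sketch only, reusing the hypothetical
ipi_base/IPI_INTERRUPT() names from your description), the send path could
live next to the intctl code and hook into the generic IPI support:

#include <linux/irq.h>

static void kvx_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
{
	unsigned int cpu;
	u32 bits = 0;

	for_each_cpu(cpu, mask)
		bits |= BIT(cpu);

	/* d->hwirq selects which of the 8 (interrupt, mask) pairs to write */
	writel(bits, ipi_base + IPI_INTERRUPT(d->hwirq));
}

static struct irq_chip kvx_ipi_chip = {
	.name		= "kvx-ipi",
	.ipi_send_mask	= kvx_ipi_send_mask,
};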

I don't know what the 'apic' in your diagram is, so that
would be either a nested irqchip or could be part of
the same driver as well.

     Arnd
Yann Sionneau April 15, 2024, 2:08 p.m. UTC | #23
Hello Krzysztof, Arnd, all,

On 1/22/23 12:54, Krzysztof Kozlowski wrote:
> On 20/01/2023 15:10, Yann Sionneau wrote:
>> From: Jules Maselbas <jmaselbas@kalray.eu>
>>
>> The Power Controller (pwr-ctrl) control cores reset and wake-up
>> procedure.
>> +
>> +int __init kvx_pwr_ctrl_probe(void)
>> +{
>> +	struct device_node *ctrl;
>> +
>> +	ctrl = get_pwr_ctrl_node();
>> +	if (!ctrl) {
>> +		pr_err("Failed to get power controller node\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!of_device_is_compatible(ctrl, "kalray,kvx-pwr-ctrl")) {
>> +		pr_err("Failed to get power controller node\n");
> No. Drivers go to drivers, not to arch directory. This should be a
> proper driver instead of some fake stub doing its own driver matching.
> You need to rework this.

I am working on a v3 patchset, and therefore on a solution for this
"pwr-ctrl" driver, which needs to go somewhere other than arch/kvx/.

The purpose of this "driver" is just to expose a void 
kvx_pwr_ctrl_cpu_poweron(unsigned int cpu) function, used by 
kernel/smpboot.c function __cpu_up() in order to start secondary CPUs in 
SMP config.

Doing this, on our SoC, requires writing 3 registers in a memory-mapped 
device named "power controller".

I did some research in drivers/ but I am not yet sure what would be a good
place for what our device does (booting secondary CPUs).

* drivers/power/reset seems to be for resetting the entire SoC

* drivers/power/supply seems to be for controlling power supply ICs/peripherals.

* drivers/reset seems to be for device reset

* drivers/pmdomain maybe ?

* drivers/soc ?

* drivers/platform ?

* drivers/misc ?

What do you think?

Thanks.

Regards,
Arnd Bergmann April 15, 2024, 3:30 p.m. UTC | #24
On Mon, Apr 15, 2024, at 16:08, Yann Sionneau wrote:
> On 1/22/23 12:54, Krzysztof Kozlowski wrote:
>> On 20/01/2023 15:10, Yann Sionneau wrote:
>>> From: Jules Maselbas <jmaselbas@kalray.eu>
>>>
>>> The Power Controller (pwr-ctrl) control cores reset and wake-up
>>> procedure.
>>> +
>>> +int __init kvx_pwr_ctrl_probe(void)
>>> +{
>>> +	struct device_node *ctrl;
>>> +
>>> +	ctrl = get_pwr_ctrl_node();
>>> +	if (!ctrl) {
>>> +		pr_err("Failed to get power controller node\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (!of_device_is_compatible(ctrl, "kalray,kvx-pwr-ctrl")) {
>>> +		pr_err("Failed to get power controller node\n");
>> No. Drivers go to drivers, not to arch directory. This should be a
>> proper driver instead of some fake stub doing its own driver matching.
>> You need to rework this.
>
> I am working on a v3 patchset, and therefore on a solution for this
> "pwr-ctrl" driver, which needs to go somewhere other than arch/kvx/.
>
> The purpose of this "driver" is just to expose a void
> kvx_pwr_ctrl_cpu_poweron(unsigned int cpu) function, used by the
> __cpu_up() function in kernel/smpboot.c to start secondary CPUs in
> an SMP configuration.
>
> Doing this, on our SoC, requires writing 3 registers in a memory-mapped 
> device named "power controller".
>
> I did some research in drivers/ but I am not yet sure what would be a good
> place for what our device does (booting secondary CPUs).
>
> * drivers/power/reset seems to be for resetting the entire SoC
>
> * drivers/power/supply seems to be for controlling power supply ICs/peripherals.
>
> * drivers/reset seems to be for device reset
>
> * drivers/pmdomain maybe ?

Right, I don't think any of the above are appropriate

> * drivers/soc ?
>
> * drivers/platform ?
>
> * drivers/misc ?

Not drivers/misc, that is mainly for things with a user-space
interface.

drivers/soc is mainly for drivers used by other drivers, but
this would work, especially if you expect to have multiple
SoC variants that all use the same architecture code but
incompatible register layouts.

drivers/platform is really for things outside of the SoC
that are used for managing the system, especially across
architectures, so I don't think that is a good fit.

Traditionally we had this code in arch/{arm,mips,powerpc,sh,x86}
and we never created a driver subsystem for it since newer
targets (arm64, riscv, newer arm, most x86) all use a method
that is specified as part of the ISA or firmware interface.

The question is what you expect to see with future hardware
iterations: if you think all arch/kvx/ hardware will use the
same code for maybe at least the next five years, I would
suggest you keep it in arch/kvx/kernel/smp.c, but if you
know or expect other implementations to be needed, I can
merge it as a driver through drivers/soc/.

     Arnd
Krzysztof Kozlowski April 17, 2024, 7:20 p.m. UTC | #25
On 15/04/2024 16:08, Yann Sionneau wrote:
> Hello Krzysztof, Arnd, all,
> 
> On 1/22/23 12:54, Krzysztof Kozlowski wrote:
>> On 20/01/2023 15:10, Yann Sionneau wrote:
>>> From: Jules Maselbas <jmaselbas@kalray.eu>
>>>
>>> The Power Controller (pwr-ctrl) control cores reset and wake-up
>>> procedure.
>>> +
>>> +int __init kvx_pwr_ctrl_probe(void)
>>> +{
>>> +	struct device_node *ctrl;
>>> +
>>> +	ctrl = get_pwr_ctrl_node();
>>> +	if (!ctrl) {
>>> +		pr_err("Failed to get power controller node\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (!of_device_is_compatible(ctrl, "kalray,kvx-pwr-ctrl")) {
>>> +		pr_err("Failed to get power controller node\n");
>> No. Drivers go to drivers, not to arch directory. This should be a
>> proper driver instead of some fake stub doing its own driver matching.
>> You need to rework this.
> 
> I am working on a v3 patchset, and therefore on a solution for this
> "pwr-ctrl" driver, which needs to go somewhere other than arch/kvx/.
> 
> The purpose of this "driver" is just to expose a void
> kvx_pwr_ctrl_cpu_poweron(unsigned int cpu) function, used by the
> __cpu_up() function in kernel/smpboot.c to start secondary CPUs in
> an SMP configuration.

I might be missing some bigger picture here and maybe my original
comment was not appropriate, but IIUC, you might now create dependencies
between arch code and drivers. That's also fragile.

> 
> Doing this, on our SoC, requires writing 3 registers in a memory-mapped 
> device named "power controller".
> 
> I did some research in drivers/ but I am not yet sure what would be a good
> place for what our device does (booting secondary CPUs).
> 
> * drivers/power/reset seems to be for resetting the entire SoC
> 
> * drivers/power/supply seems to be for controlling power supply ICs/peripherals.
> 
> * drivers/reset seems to be for device reset
> 
> * drivers/pmdomain maybe ?
> 
> * drivers/soc ?
> 

CPU bringup? Then I would vote for this one (drivers/soc). You also
have an existing example: r9a06g032-smp.c

But anyway, the point is to make it clear: either it is a driver or core
code, not both. The original code did not look like any other CPU
bringup code.

Best regards,
Krzysztof