Message ID | cover.1536757532.git.ren_guo@c-sky.com |
---|---|
Headers | show |
Series | C-SKY(csky) Linux Kernel Port | expand |
On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: > > Signed-off-by: Guo Ren <ren_guo@c-sky.com> > diff --git a/arch/csky/boot/dts/qemu.dts b/arch/csky/boot/dts/qemu.dts > new file mode 100644 > index 0000000..c6643b1 > --- /dev/null > +++ b/arch/csky/boot/dts/qemu.dts > @@ -0,0 +1,77 @@ Did you consider renaming the file as I suggested? I think that would help in the long run when you get new qemu versions that have a different set of devices and that may provide their own dtbs. > diff --git a/arch/csky/include/asm/compat.h b/arch/csky/include/asm/compat.h > new file mode 100644 > index 0000000..59f9297 > --- /dev/null > +++ b/arch/csky/include/asm/compat.h > @@ -0,0 +1,11 @@ > +// SPDX-License-Identifier: GPL-2.0 > +// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd. > + > +#ifndef __ASM_CSKY_COMPAT_H > +#define __ASM_CSKY_COMPAT_H > + > +#ifdef CONFIG_COMPAT > +#define COMPAT_UTS_MACHINE "csky\0\0" > +#endif > + > +#endif /* __ASM_CSKY_COMPAT_H */ Just noticed this one. I assume there won't be a csky64 architecture, so you can probably just delete this header. Arnd
On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: > > This is the 3th version patchset to add the Linux kernel port for C-SKY(csky). > Thanks to everyone who provided feedback on the previous version. > > This patchset adds architecture support to Linux for C-SKY's 32-bit embedded > CPU cores and the patches are based on linux-4.18.4 > > There are two ABI versions with several CPU cores in this patchset: > ABIv1: ck610 (16-bit instruction, 32-bit data path, VIPT Cache ...) > ABIv2: ck807 ck810 ck860 (16/32-bit variable length instruction, PIPT Cache, > SMP ...) > > More information: http://en.c-sky.com This looks good to me overall. I think a good next step would be to get the port included in linux-next, by preparing a git tree with all the patches and asking Stephen Rothwell to include it there. Further comments on the architecture port itself can be done on top of the existing patches. I would suggest you base the git tree on an -rc release (either 4.19-rc1 or 4.19-rc3) and then never rebase again. You have included a couple of drivers in the submission: two timer and two irqchip drivers. Please leave those out for the moment, and either have them merged through the respective subsystem trees, or get an Ack from the maintainers to merge them through your tree. I notice that a lot of the patches have no changeset comments on them. You should fix that and make a habit of describing every single patch with a few sentences, even if it seems obvious to you. Have a look at the changeset descriptions for the nds32 and riscv architectures when they got merged. One big question for me is what to do about time_t. Deepa and I are in the process of finalizing the system call ABI for 32-bit architectures with 64-bit time_t, but we are not done yet and it won't be complete for 4.20. If you target 4.21, that could be a chance to make csky the first architecture to only need the 64-bit time_t interface, with the corresponding user space changes. Arnd
On Wed, Sep 12, 2018 at 09:24:45PM +0800, Guo Ren wrote: > +#define ATOMIC_OP(op, c_op) \ > +static inline void atomic_##op(int i, atomic_t *v) \ > +{ \ > + unsigned long tmp; \ > + \ > + smp_mb(); \ > + asm volatile ( \ > + "1: ldex.w %0, (%2) \n" \ > + " " #op " %0, %1 \n" \ > + " stex.w %0, (%2) \n" \ > + " bez %0, 1b \n" \ > + : "=&r" (tmp) \ > + : "r" (i), "r"(&v->counter) \ > + : "memory"); \ > + smp_mb(); \ > +} ATOMIC_OP doesn't need to imply any smp_mb()'s what so ever. > +#define ATOMIC_OP_RETURN(op, c_op) \ > +static inline int atomic_##op##_return(int i, atomic_t *v) \ > +{ \ > + unsigned long tmp, ret; \ > + \ > + smp_mb(); \ > + asm volatile ( \ > + "1: ldex.w %0, (%3) \n" \ > + " " #op " %0, %2 \n" \ > + " mov %1, %0 \n" \ > + " stex.w %0, (%3) \n" \ > + " bez %0, 1b \n" \ > + : "=&r" (tmp), "=&r" (ret) \ > + : "r" (i), "r"(&v->counter) \ > + : "memory"); \ > + smp_mb(); \ > + \ > + return ret; \ > +} > + > +#define ATOMIC_FETCH_OP(op, c_op) \ > +static inline int atomic_fetch_##op(int i, atomic_t *v) \ > +{ \ > + unsigned long tmp, ret; \ > + \ > + smp_mb(); \ > + asm volatile ( \ > + "1: ldex.w %0, (%3) \n" \ > + " mov %1, %0 \n" \ > + " " #op " %0, %2 \n" \ > + " stex.w %0, (%3) \n" \ > + " bez %0, 1b \n" \ > + : "=&r" (tmp), "=&r" (ret) \ > + : "r" (i), "r"(&v->counter) \ > + : "memory"); \ > + smp_mb(); \ > + \ > + return ret; \ > +} For these you could generate _relaxed variants and not provide smp_mb() inside them. > +#else /* CONFIG_CPU_HAS_LDSTEX */ > + > +#include <linux/irqflags.h> > + > +#define ATOMIC_OP(op, c_op) \ > +static inline void atomic_##op(int i, atomic_t *v) \ > +{ \ > + unsigned long tmp, flags; \ > + \ > + raw_local_irq_save(flags); \ > + \ > + asm volatile ( \ > + " ldw %0, (%2) \n" \ > + " " #op " %0, %1 \n" \ > + " stw %0, (%2) \n" \ > + : "=&r" (tmp) \ > + : "r" (i), "r"(&v->counter) \ > + : "memory"); \ > + \ > + raw_local_irq_restore(flags); \ > +} Is this really 'better' than the generic UP fallback implementation? > diff --git a/arch/csky/include/asm/spinlock.h b/arch/csky/include/asm/spinlock.h > new file mode 100644 > index 0000000..f1081bb > --- /dev/null > +++ b/arch/csky/include/asm/spinlock.h > @@ -0,0 +1,286 @@ > +#ifndef __ASM_CSKY_SPINLOCK_H > +#define __ASM_CSKY_SPINLOCK_H > + > +#include <linux/spinlock_types.h> > +#include <asm/barrier.h> > + > +#ifdef CONFIG_QUEUED_RWLOCKS > + > +/* > + * Ticket-based spin-locking. > + */ > +static inline void arch_spin_lock(arch_spinlock_t *lock) > +{ > + arch_spinlock_t lockval; > + u32 ticket_next = 1 << TICKET_NEXT; > + u32 *p = &lock->lock; > + u32 tmp; > + > + smp_mb(); spin_lock() doesn't need smp_mb() before. > + asm volatile ( > + "1: ldex.w %0, (%2) \n" > + " mov %1, %0 \n" > + " add %0, %3 \n" > + " stex.w %0, (%2) \n" > + " bez %0, 1b \n" > + : "=&r" (tmp), "=&r" (lockval) > + : "r"(p), "r"(ticket_next) > + : "cc"); > + > + while (lockval.tickets.next != lockval.tickets.owner) { > + lockval.tickets.owner = READ_ONCE(lock->tickets.owner); > + } > + > + smp_mb(); > +} > + > +static inline int arch_spin_trylock(arch_spinlock_t *lock) > +{ > + u32 tmp, contended, res; > + u32 ticket_next = 1 << TICKET_NEXT; > + u32 *p = &lock->lock; > + > + smp_mb(); idem. > + do { > + asm volatile ( > + " ldex.w %0, (%3) \n" > + " movi %2, 1 \n" > + " rotli %1, %0, 16 \n" > + " cmpne %1, %0 \n" > + " bt 1f \n" > + " movi %2, 0 \n" > + " add %0, %0, %4 \n" > + " stex.w %0, (%3) \n" > + "1: \n" > + : "=&r" (res), "=&r" (tmp), "=&r" (contended) > + : "r"(p), "r"(ticket_next) > + : "cc"); > + } while (!res); > + > + if (!contended) > + smp_mb(); > + > + return !contended; > +} > + > +static inline void arch_spin_unlock(arch_spinlock_t *lock) > +{ > + smp_mb(); > + lock->tickets.owner++; > + smp_mb(); spin_unlock() doesn't need smp_mb() after. > +} > + > +static inline int arch_spin_value_unlocked(arch_spinlock_t lock) > +{ > + return lock.tickets.owner == lock.tickets.next; > +} > + > +static inline int arch_spin_is_locked(arch_spinlock_t *lock) > +{ > + return !arch_spin_value_unlocked(READ_ONCE(*lock)); > +} > + > +static inline int arch_spin_is_contended(arch_spinlock_t *lock) > +{ > + struct __raw_tickets tickets = READ_ONCE(lock->tickets); > + return (tickets.next - tickets.owner) > 1; > +} > +#define arch_spin_is_contended arch_spin_is_contended > + > +#include <asm/qrwlock.h> > + > +/* See include/linux/spinlock.h */ > +#define smp_mb__after_spinlock() smp_mb() > + > +#else /* CONFIG_QUEUED_RWLOCKS */ > + > +/* > + * Test-and-set spin-locking. > + */ Why retain that? same comments; it has far too many smp_mb()s in. > +#endif /* CONFIG_QUEUED_RWLOCKS */ > +#endif /* __ASM_CSKY_SPINLOCK_H */ > diff --git a/arch/csky/include/asm/spinlock_types.h b/arch/csky/include/asm/spinlock_types.h > new file mode 100644 > index 0000000..7e825c2 > --- /dev/null > +++ b/arch/csky/include/asm/spinlock_types.h > @@ -0,0 +1,35 @@ > +#ifndef __ASM_CSKY_SPINLOCK_TYPES_H > +#define __ASM_CSKY_SPINLOCK_TYPES_H > + > +#ifndef __LINUX_SPINLOCK_TYPES_H > +# error "please don't include this file directly" > +#endif > + > +#define TICKET_NEXT 16 > + > +typedef struct { > + union { > + u32 lock; > + struct __raw_tickets { > + /* little endian */ > + u16 owner; > + u16 next; > + } tickets; > + }; > +} arch_spinlock_t; > + > +#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } } > + > +#ifdef CONFIG_QUEUED_RWLOCKS > +#include <asm-generic/qrwlock_types.h> > + > +#else /* CONFIG_NR_CPUS > 2 */ > + > +typedef struct { > + u32 lock; > +} arch_rwlock_t; > + > +#define __ARCH_RW_LOCK_UNLOCKED { 0 } > + > +#endif /* CONFIG_QUEUED_RWLOCKS */ > +#endif /* __ASM_CSKY_SPINLOCK_TYPES_H */
On Wed, Sep 12, 2018 at 04:30:36PM +0200, Arnd Bergmann wrote: > On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: > > > > This is the 3th version patchset to add the Linux kernel port for C-SKY(csky). > > Thanks to everyone who provided feedback on the previous version. > > > > This patchset adds architecture support to Linux for C-SKY's 32-bit embedded > > CPU cores and the patches are based on linux-4.18.4 > > > > There are two ABI versions with several CPU cores in this patchset: > > ABIv1: ck610 (16-bit instruction, 32-bit data path, VIPT Cache ...) > > ABIv2: ck807 ck810 ck860 (16/32-bit variable length instruction, PIPT Cache, > > SMP ...) > > > > More information: http://en.c-sky.com > > This looks good to me overall. I think a good next step would be to get the port > included in linux-next, by preparing a git tree with all the patches and asking > Stephen Rothwell to include it there. Further comments on the architecture > port itself can be done on top of the existing patches. I would suggest you > base the git tree on an -rc release (either 4.19-rc1 or 4.19-rc3) and then never > rebase again. Nice :) I'll follow the rules. > > You have included a couple of drivers in the submission: two timer and > two irqchip drivers. Please leave those out for the moment, and either have > them merged through the respective subsystem trees, or get an Ack > from the maintainers to merge them through your tree. Ok. > > I notice that a lot of the patches have no changeset comments on them. > You should fix that and make a habit of describing every single patch > with a few sentences, even if it seems obvious to you. Have a look at > the changeset descriptions for the nds32 and riscv architectures when > they got merged. Ok, I'll fixup them. > > One big question for me is what to do about time_t. Deepa and I are > in the process of finalizing the system call ABI for 32-bit architectures > with 64-bit time_t, but we are not done yet and it won't be complete > for 4.20. If you target 4.21, that could be a chance to make csky the > first architecture to only need the 64-bit time_t interface, with the > corresponding user space changes. y2038 is very important and csky32 has the issue. But 4.21 is too late for us, we really want to get into kernel.org as soon as possible. We could remove 32-bit time_t in future. Best Regards Guo Ren
On Fri, Sep 14, 2018 at 4:37 PM Guo Ren <ren_guo@c-sky.com> wrote: > On Wed, Sep 12, 2018 at 04:30:36PM +0200, Arnd Bergmann wrote: > > On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: > > > > One big question for me is what to do about time_t. Deepa and I are > > in the process of finalizing the system call ABI for 32-bit architectures > > with 64-bit time_t, but we are not done yet and it won't be complete > > for 4.20. If you target 4.21, that could be a chance to make csky the > > first architecture to only need the 64-bit time_t interface, with the > > corresponding user space changes. > y2038 is very important and csky32 has the issue. But 4.21 is too late for > us, we really want to get into kernel.org as soon as possible. > We could remove 32-bit time_t in future. Not really: the way we deal with user-visible APIs in the kernel, it's practically impossible to remove something that was working before, since there may always be users relying on it. This is why it is so important that we get the ABI right at the first try. We can always add new ABIs later, and that's what we're doing with all the other 32-bit architectures as well: each system call that takes a 32-bit time_t argument also needs to get the corresponding 64-bit replacement, and then we have to keep them both around. However, once the 64-bit syscalls are there, you don't need to use them, so if you merge the glibc port after both csky and the time64 syscalls are merged upstream, you can choose to only support the time64 syscalls rather than making it a compile time decision in each application. If you merge the glibc port before migrating to 64-bit syscalls, glibc will also have to support both indefinitely for compatibility with existing binaries. One level below that, you can of course choose to build a distro with only 64-bit time_t regardless of whether there is still support for 32-bit time_t in kernel and glibc or not. Arnd
On Fri, Sep 14, 2018 at 04:46:56PM +0200, Arnd Bergmann wrote: > On Fri, Sep 14, 2018 at 4:37 PM Guo Ren <ren_guo@c-sky.com> wrote: > > On Wed, Sep 12, 2018 at 04:30:36PM +0200, Arnd Bergmann wrote: > > > On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: > > > > > > One big question for me is what to do about time_t. Deepa and I are > > > in the process of finalizing the system call ABI for 32-bit architectures > > > with 64-bit time_t, but we are not done yet and it won't be complete > > > for 4.20. If you target 4.21, that could be a chance to make csky the > > > first architecture to only need the 64-bit time_t interface, with the > > > corresponding user space changes. > > y2038 is very important and csky32 has the issue. But 4.21 is too late for > > us, we really want to get into kernel.org as soon as possible. > > We could remove 32-bit time_t in future. > > Not really: the way we deal with user-visible APIs in the kernel, it's > practically impossible to remove something that was working before, > since there may always be users relying on it. This is why it is so > important that we get the ABI right at the first try. > > We can always add new ABIs later, and that's what we're doing with > all the other 32-bit architectures as well: each system call that takes a > 32-bit time_t argument also needs to get the corresponding 64-bit > replacement, and then we have to keep them both around. > > However, once the 64-bit syscalls are there, you don't need to > use them, so if you merge the glibc port after both csky and > the time64 syscalls are merged upstream, you can choose > to only support the time64 syscalls rather than making it a compile > time decision in each application. If you merge the glibc port > before migrating to 64-bit syscalls, glibc will also have to support > both indefinitely for compatibility with existing binaries. > > One level below that, you can of course choose to build > a distro with only 64-bit time_t regardless of whether there > is still support for 32-bit time_t in kernel and glibc or not. Do you mean I could merge kernel port into linux-4.19 or linux-4.20. But let glibc merged after linux-4.21 and csky glibc only support time64 syscalls ? Best Regards Guo Ren
On Fri, Sep 14, 2018 at 6:02 PM Guo Ren <ren_guo@c-sky.com> wrote: > > On Fri, Sep 14, 2018 at 04:46:56PM +0200, Arnd Bergmann wrote: > > On Fri, Sep 14, 2018 at 4:37 PM Guo Ren <ren_guo@c-sky.com> wrote: > > > One level below that, you can of course choose to build > > a distro with only 64-bit time_t regardless of whether there > > is still support for 32-bit time_t in kernel and glibc or not. > Do you mean I could merge kernel port into linux-4.19 or linux-4.20. > But let glibc merged after linux-4.21 and csky glibc only support > time64 syscalls ? Yes, that's what I meant. Note that it won't get merged into 4.19: the merge window for that was after the 4.18 release and no new features are getting merged for 4.19 that have not made it into 4.19-rc1 already. Getting your code into linux-next now is the prerequisite for you sending a pull request during the 4.20 window. Arnd
On Fri, Sep 14, 2018 at 06:09:35PM +0200, Arnd Bergmann wrote: > On Fri, Sep 14, 2018 at 6:02 PM Guo Ren <ren_guo@c-sky.com> wrote: > > > > On Fri, Sep 14, 2018 at 04:46:56PM +0200, Arnd Bergmann wrote: > > > On Fri, Sep 14, 2018 at 4:37 PM Guo Ren <ren_guo@c-sky.com> wrote: > > > > > One level below that, you can of course choose to build > > > a distro with only 64-bit time_t regardless of whether there > > > is still support for 32-bit time_t in kernel and glibc or not. > > Do you mean I could merge kernel port into linux-4.19 or linux-4.20. > > But let glibc merged after linux-4.21 and csky glibc only support > > time64 syscalls ? > > Yes, that's what I meant. Note that it won't get merged into 4.19: > the merge window for that was after the 4.18 release and no > new features are getting merged for 4.19 that have not made it > into 4.19-rc1 already. > > Getting your code into linux-next now is the prerequisite for you > sending a pull request during the 4.20 window. Got it, thank you very much. Guo Ren
Thx for the review, that's very helpful. On Wed, Sep 12, 2018 at 05:55:14PM +0200, Peter Zijlstra wrote: > On Wed, Sep 12, 2018 at 09:24:45PM +0800, Guo Ren wrote: > > > +#define ATOMIC_OP(op, c_op) \ > > +static inline void atomic_##op(int i, atomic_t *v) \ > > +{ \ > > + unsigned long tmp; \ > > + \ > > + smp_mb(); \ > > + asm volatile ( \ > > + "1: ldex.w %0, (%2) \n" \ > > + " " #op " %0, %1 \n" \ > > + " stex.w %0, (%2) \n" \ > > + " bez %0, 1b \n" \ > > + : "=&r" (tmp) \ > > + : "r" (i), "r"(&v->counter) \ > > + : "memory"); \ > > + smp_mb(); \ > > +} > > ATOMIC_OP doesn't need to imply any smp_mb()'s what so ever. Ok. > > +#define ATOMIC_OP_RETURN(op, c_op) \ > > +static inline int atomic_##op##_return(int i, atomic_t *v) \ > > +{ \ > > + unsigned long tmp, ret; \ > > + \ > > + smp_mb(); \ > > + asm volatile ( \ > > + "1: ldex.w %0, (%3) \n" \ > > + " " #op " %0, %2 \n" \ > > + " mov %1, %0 \n" \ > > + " stex.w %0, (%3) \n" \ > > + " bez %0, 1b \n" \ > > + : "=&r" (tmp), "=&r" (ret) \ > > + : "r" (i), "r"(&v->counter) \ > > + : "memory"); \ > > + smp_mb(); \ > > + \ > > + return ret; \ > > +} > > + > > +#define ATOMIC_FETCH_OP(op, c_op) \ > > +static inline int atomic_fetch_##op(int i, atomic_t *v) \ > > +{ \ > > + unsigned long tmp, ret; \ > > + \ > > + smp_mb(); \ > > + asm volatile ( \ > > + "1: ldex.w %0, (%3) \n" \ > > + " mov %1, %0 \n" \ > > + " " #op " %0, %2 \n" \ > > + " stex.w %0, (%3) \n" \ > > + " bez %0, 1b \n" \ > > + : "=&r" (tmp), "=&r" (ret) \ > > + : "r" (i), "r"(&v->counter) \ > > + : "memory"); \ > > + smp_mb(); \ > > + \ > > + return ret; \ > > +} > > For these you could generate _relaxed variants and not provide smp_mb() > inside them. Ok, but I'll modify it in next commit. > > +#else /* CONFIG_CPU_HAS_LDSTEX */ > > + > > +#include <linux/irqflags.h> > > + > > > +#define ATOMIC_OP(op, c_op) \ > > +static inline void atomic_##op(int i, atomic_t *v) \ > > +{ \ > > + unsigned long tmp, flags; \ > > + \ > > + raw_local_irq_save(flags); \ > > + \ > > + asm volatile ( \ > > + " ldw %0, (%2) \n" \ > > + " " #op " %0, %1 \n" \ > > + " stw %0, (%2) \n" \ > > + : "=&r" (tmp) \ > > + : "r" (i), "r"(&v->counter) \ > > + : "memory"); \ > > + \ > > + raw_local_irq_restore(flags); \ > > +} > > Is this really 'better' than the generic UP fallback implementation? There is a lock irq instruction "idly4" with out irq_save. eg: asm volatile ( \ " idly4 \n" \ " ldw %0, (%2) \n" \ " " #op " %0, %1 \n" \ " stw %0, (%2) \n" \ I'll change to that after full tested. > > +static inline void arch_spin_lock(arch_spinlock_t *lock) > > +{ > > + arch_spinlock_t lockval; > > + u32 ticket_next = 1 << TICKET_NEXT; > > + u32 *p = &lock->lock; > > + u32 tmp; > > + > > + smp_mb(); > > spin_lock() doesn't need smp_mb() before. read_lock and write_lock also needn't smp_mb() before, isn't it? > > + > > +static inline void arch_spin_unlock(arch_spinlock_t *lock) > > +{ > > + smp_mb(); > > + lock->tickets.owner++; > > + smp_mb(); > > spin_unlock() doesn't need smp_mb() after. read_unlock and write_unlock also needn't smp_mb() after, isn't it? > > +#else /* CONFIG_QUEUED_RWLOCKS */ > > + > > +/* > > + * Test-and-set spin-locking. > > + */ > > Why retain that? > > same comments; it has far too many smp_mb()s in. I'm not sure about queued_rwlocks and just for 2-cores-smp test-and-set is faster and simpler, isn't it? Best Regards Guo Ren
On Wed, Sep 12, 2018 at 04:30:36PM +0200, Arnd Bergmann wrote: > On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: > > > > This is the 3th version patchset to add the Linux kernel port for C-SKY(csky). > > Thanks to everyone who provided feedback on the previous version. > > > > This patchset adds architecture support to Linux for C-SKY's 32-bit embedded > > CPU cores and the patches are based on linux-4.18.4 > > > > There are two ABI versions with several CPU cores in this patchset: > > ABIv1: ck610 (16-bit instruction, 32-bit data path, VIPT Cache ...) > > ABIv2: ck807 ck810 ck860 (16/32-bit variable length instruction, PIPT Cache, > > SMP ...) > > > > More information: http://en.c-sky.com > > This looks good to me overall. I think a good next step would be to get the port > included in linux-next, by preparing a git tree with all the patches and asking > Stephen Rothwell to include it there. Further comments on the architecture > port itself can be done on top of the existing patches. I would suggest you > base the git tree on an -rc release (either 4.19-rc1 or 4.19-rc3) and then never > rebase again. Another question: Could I add "Acked-by: Arnd Bergmann <arnd@arndb.de>" in all my comments? Best Regards Guo Ren
Hello Stephen, I'm Guo Ren from C-SKY and I'm working on csky linux port upstream. I've prepared my git-tree based on linux-4.19-rc3: git clone -b linux-next https://github.com/c-sky/csky-linux.git Here is the pre-built cross compiler for fast test from our CI: https://gitlab.com/c-sky/buildroot/-/jobs/97941896/artifacts/file/output/images/csky_toolchain_csky_ck860_platform_defconfig_72371bf75a51f27ea59fc34eeaf236e06b75bf69.tar.xz You can also build newest gcc, binutils and they are upstreamed but not released on gnu.org. Glibc is uptreaming now. Please have a look and any feed back is welcome. Best Regards Guo Ren On Wed, Sep 12, 2018 at 04:30:36PM +0200, Arnd Bergmann wrote: > On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: > > > > This is the 3th version patchset to add the Linux kernel port for C-SKY(csky). > > Thanks to everyone who provided feedback on the previous version. > > > > This patchset adds architecture support to Linux for C-SKY's 32-bit embedded > > CPU cores and the patches are based on linux-4.18.4 > > > > There are two ABI versions with several CPU cores in this patchset: > > ABIv1: ck610 (16-bit instruction, 32-bit data path, VIPT Cache ...) > > ABIv2: ck807 ck810 ck860 (16/32-bit variable length instruction, PIPT Cache, > > SMP ...) > > > > More information: http://en.c-sky.com > > This looks good to me overall. I think a good next step would be to get the port > included in linux-next, by preparing a git tree with all the patches and asking > Stephen Rothwell to include it there. Further comments on the architecture > port itself can be done on top of the existing patches. I would suggest you > base the git tree on an -rc release (either 4.19-rc1 or 4.19-rc3) and then never > rebase again. > > You have included a couple of drivers in the submission: two timer and > two irqchip drivers. Please leave those out for the moment, and either have > them merged through the respective subsystem trees, or get an Ack > from the maintainers to merge them through your tree. > > I notice that a lot of the patches have no changeset comments on them. > You should fix that and make a habit of describing every single patch > with a few sentences, even if it seems obvious to you. Have a look at > the changeset descriptions for the nds32 and riscv architectures when > they got merged. > > One big question for me is what to do about time_t. Deepa and I are > in the process of finalizing the system call ABI for 32-bit architectures > with 64-bit time_t, but we are not done yet and it won't be complete > for 4.20. If you target 4.21, that could be a chance to make csky the > first architecture to only need the 64-bit time_t interface, with the > corresponding user space changes. > > Arnd
On Sat, Sep 15, 2018 at 10:55:13PM +0800, Guo Ren wrote: > > > +#define ATOMIC_OP_RETURN(op, c_op) \ > > > +#define ATOMIC_FETCH_OP(op, c_op) \ > > For these you could generate _relaxed variants and not provide smp_mb() > > inside them. > Ok, but I'll modify it in next commit. That's fine. Just wanted to let you know about _relaxed() since it will benefit your platform. > > > +#define ATOMIC_OP(op, c_op) \ > > > +static inline void atomic_##op(int i, atomic_t *v) \ > > > +{ \ > > > + unsigned long tmp, flags; \ > > > + \ > > > + raw_local_irq_save(flags); \ > > > + \ > > > + asm volatile ( \ > > > + " ldw %0, (%2) \n" \ > > > + " " #op " %0, %1 \n" \ > > > + " stw %0, (%2) \n" \ > > > + : "=&r" (tmp) \ > > > + : "r" (i), "r"(&v->counter) \ > > > + : "memory"); \ > > > + \ > > > + raw_local_irq_restore(flags); \ > > > +} > > > > Is this really 'better' than the generic UP fallback implementation? > There is a lock irq instruction "idly4" with out irq_save. eg: > asm volatile ( \ > " idly4 \n" \ > " ldw %0, (%2) \n" \ > " " #op " %0, %1 \n" \ > " stw %0, (%2) \n" \ > I'll change to that after full tested. That is pretty nifty, could you explain (or reference me to a arch doc that does) the exact semantics of that "idly4" instruction? > > > +static inline void arch_spin_lock(arch_spinlock_t *lock) > > > +{ > > > + arch_spinlock_t lockval; > > > + u32 ticket_next = 1 << TICKET_NEXT; > > > + u32 *p = &lock->lock; > > > + u32 tmp; > > > + > > > + smp_mb(); > > > > spin_lock() doesn't need smp_mb() before. > read_lock and write_lock also needn't smp_mb() before, isn't it? Correct. The various *_lock() functions only need imply an ACQUIRE barrier, such that the critical section happens after the lock is taken. > > > + > > > +static inline void arch_spin_unlock(arch_spinlock_t *lock) > > > +{ > > > + smp_mb(); > > > + lock->tickets.owner++; > > > + smp_mb(); > > > > spin_unlock() doesn't need smp_mb() after. > read_unlock and write_unlock also needn't smp_mb() after, isn't it? Indeed so, the various *_unlock() functions only need imply a RELEASE barrier, such that the critical section happend before the lock is released. In both cases (lock and unlock) there is a great amount of subtle details, but most of that is irrelevant if all you have is smp_mb(). > > > +/* > > > + * Test-and-set spin-locking. > > > + */ > > > > Why retain that? > > > > same comments; it has far too many smp_mb()s in. > I'm not sure about queued_rwlocks and just for 2-cores-smp test-and-set is > faster and simpler, isn't it? Even on 2 cores I think you can create starvation cases with test-and-set spinlocks. And the maintenace overhead of carrying two lock implementations is non trivial. As to performance; I cannot say, but the ticket lock isn't very expensive, you could benchmark of course.
Hi Guo, On Sun, 16 Sep 2018 12:53:26 +0800 Guo Ren <ren_guo@c-sky.com> wrote: > > I'm Guo Ren from C-SKY and I'm working on csky linux port upstream. > I've prepared my git-tree based on linux-4.19-rc3: > git clone -b linux-next https://github.com/c-sky/csky-linux.git > > Here is the pre-built cross compiler for fast test from our CI: > https://gitlab.com/c-sky/buildroot/-/jobs/97941896/artifacts/file/output/images/csky_toolchain_csky_ck860_platform_defconfig_72371bf75a51f27ea59fc34eeaf236e06b75bf69.tar.xz > > You can also build newest gcc, binutils and they are upstreamed but not > released on gnu.org. Glibc is uptreaming now. > > Please have a look and any feed back is welcome. Added from today (called "csky"). Thanks for adding your subsystem tree as a participant of linux-next. As you may know, this is not a judgement of your code. The purpose of linux-next is for integration testing and to lower the impact of conflicts between subsystems in the next merge window. You will need to ensure that the patches/commits in your tree/series have been: * submitted under GPL v2 (or later) and include the Contributor's Signed-off-by, * posted to the relevant mailing list, * reviewed by you (or another maintainer of your subsystem tree), * successfully unit tested, and * destined for the current or next Linux merge window. Basically, this should be just what you would send to Linus (or ask him to fetch). It is allowed to be rebased if you deem it necessary.
Hi Guo, On Mon, 17 Sep 2018 21:54:43 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote: > > Added from today (called "csky"). Actually from tomorrow :-)
On Mon, Sep 17, 2018 at 09:54:43PM +1000, Stephen Rothwell wrote: > Hi Guo, > > On Sun, 16 Sep 2018 12:53:26 +0800 Guo Ren <ren_guo@c-sky.com> wrote: > > > > I'm Guo Ren from C-SKY and I'm working on csky linux port upstream. > > I've prepared my git-tree based on linux-4.19-rc3: > > git clone -b linux-next https://github.com/c-sky/csky-linux.git > > > > Here is the pre-built cross compiler for fast test from our CI: > > https://gitlab.com/c-sky/buildroot/-/jobs/97941896/artifacts/file/output/images/csky_toolchain_csky_ck860_platform_defconfig_72371bf75a51f27ea59fc34eeaf236e06b75bf69.tar.xz > > > > You can also build newest gcc, binutils and they are upstreamed but not > > released on gnu.org. Glibc is uptreaming now. > > > > Please have a look and any feed back is welcome. > > Added from today (called "csky"). > > Thanks for adding your subsystem tree as a participant of linux-next. As > you may know, this is not a judgement of your code. The purpose of > linux-next is for integration testing and to lower the impact of > conflicts between subsystems in the next merge window. > > You will need to ensure that the patches/commits in your tree/series have > been: > * submitted under GPL v2 (or later) and include the Contributor's > Signed-off-by, > * posted to the relevant mailing list, > * reviewed by you (or another maintainer of your subsystem tree), > * successfully unit tested, and > * destined for the current or next Linux merge window. > > Basically, this should be just what you would send to Linus (or ask him > to fetch). It is allowed to be rebased if you deem it necessary. I am very excited to receive your mail. I have been preparing for this for nearly 3 years. - All files are under GPL v2 - All Contributors have been Signed-off. - All have been reviewed. - For develop detail you can find in: https://github.com/c-sky/csky-linux It records from 'Dec 2015' to 'Sep 2018'. - Here is the newest ltp test result: https://gitlab.com/c-sky/buildroot/-/jobs/98061317 Best Regards Guo Ren
On Mon, Sep 17, 2018 at 10:03:57PM +1000, Stephen Rothwell wrote: > Hi Guo, > > On Mon, 17 Sep 2018 21:54:43 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote: > > > > Added from today (called "csky"). > > Actually from tomorrow :-) Ok.
On Mon, Sep 17, 2018 at 10:17:55AM +0200, Peter Zijlstra wrote: > On Sat, Sep 15, 2018 at 10:55:13PM +0800, Guo Ren wrote: > > > > +#define ATOMIC_OP_RETURN(op, c_op) \ > > > > > +#define ATOMIC_FETCH_OP(op, c_op) \ > > > > For these you could generate _relaxed variants and not provide smp_mb() > > > inside them. > > Ok, but I'll modify it in next commit. > > That's fine. Just wanted to let you know about _relaxed() since it will > benefit your platform. Thank you. > > > > +#define ATOMIC_OP(op, c_op) \ > > > > +static inline void atomic_##op(int i, atomic_t *v) \ > > > > +{ \ > > > > + unsigned long tmp, flags; \ > > > > + \ > > > > + raw_local_irq_save(flags); \ > > > > + \ > > > > + asm volatile ( \ > > > > + " ldw %0, (%2) \n" \ > > > > + " " #op " %0, %1 \n" \ > > > > + " stw %0, (%2) \n" \ > > > > + : "=&r" (tmp) \ > > > > + : "r" (i), "r"(&v->counter) \ > > > > + : "memory"); \ > > > > + \ > > > > + raw_local_irq_restore(flags); \ > > > > +} > > > > > > Is this really 'better' than the generic UP fallback implementation? > > There is a lock irq instruction "idly4" with out irq_save. eg: > > asm volatile ( \ > > " idly4 \n" \ > > " ldw %0, (%2) \n" \ > > " " #op " %0, %1 \n" \ > > " stw %0, (%2) \n" \ > > I'll change to that after full tested. > > That is pretty nifty, could you explain (or reference me to a arch doc > that does) the exact semantics of that "idly4" instruction? The idly4 allows the 4 instructions behind it to not respond to interrupts. When ldw got exception, it will cause the carry to be 1. So I need prepare the assemble like this: 1: cmpne r0, r0 idly4 ldw %0, (%2) bt 1b " #op " ... stw ... I need more stress test on it and then I'll change to it. > > > > +static inline void arch_spin_lock(arch_spinlock_t *lock) > > > > +{ > > > > + arch_spinlock_t lockval; > > > > + u32 ticket_next = 1 << TICKET_NEXT; > > > > + u32 *p = &lock->lock; > > > > + u32 tmp; > > > > + > > > > + smp_mb(); > > > > > > spin_lock() doesn't need smp_mb() before. > > read_lock and write_lock also needn't smp_mb() before, isn't it? > > Correct. The various *_lock() functions only need imply an ACQUIRE > barrier, such that the critical section happens after the lock is taken. > > > > > + > > > > +static inline void arch_spin_unlock(arch_spinlock_t *lock) > > > > +{ > > > > + smp_mb(); > > > > + lock->tickets.owner++; > > > > + smp_mb(); > > > > > > spin_unlock() doesn't need smp_mb() after. > > read_unlock and write_unlock also needn't smp_mb() after, isn't it? > > Indeed so, the various *_unlock() functions only need imply a RELEASE > barrier, such that the critical section happend before the lock is > released. > > In both cases (lock and unlock) there is a great amount of subtle > details, but most of that is irrelevant if all you have is smp_mb(). Got it, Thx for the explanation. > > > > > > +/* > > > > + * Test-and-set spin-locking. > > > > + */ > > > > > > Why retain that? > > > > > > same comments; it has far too many smp_mb()s in. > > I'm not sure about queued_rwlocks and just for 2-cores-smp test-and-set is > > faster and simpler, isn't it? > > Even on 2 cores I think you can create starvation cases with > test-and-set spinlocks. And the maintenace overhead of carrying two lock > implementations is non trivial. > > As to performance; I cannot say, but the ticket lock isn't very > expensive, you could benchmark of course. Ticket lock is good. But How about queued_rwlocks v.s my_test_set_rwlock? I'm not sure about the queued_rwlocks. I just implement the ticket-spinlock. Best Regards Guo Ren
On Fri, 14 Sep 2018 07:37:20 PDT (-0700), ren_guo@c-sky.com wrote: > On Wed, Sep 12, 2018 at 04:30:36PM +0200, Arnd Bergmann wrote: >> On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: >> > >> > This is the 3th version patchset to add the Linux kernel port for C-SKY(csky). >> > Thanks to everyone who provided feedback on the previous version. >> > >> > This patchset adds architecture support to Linux for C-SKY's 32-bit embedded >> > CPU cores and the patches are based on linux-4.18.4 >> > >> > There are two ABI versions with several CPU cores in this patchset: >> > ABIv1: ck610 (16-bit instruction, 32-bit data path, VIPT Cache ...) >> > ABIv2: ck807 ck810 ck860 (16/32-bit variable length instruction, PIPT Cache, >> > SMP ...) >> > >> > More information: http://en.c-sky.com >> >> This looks good to me overall. I think a good next step would be to get the port >> included in linux-next, by preparing a git tree with all the patches and asking >> Stephen Rothwell to include it there. Further comments on the architecture >> port itself can be done on top of the existing patches. I would suggest you >> base the git tree on an -rc release (either 4.19-rc1 or 4.19-rc3) and then never >> rebase again. > Nice :) I'll follow the rules. > >> >> You have included a couple of drivers in the submission: two timer and >> two irqchip drivers. Please leave those out for the moment, and either have >> them merged through the respective subsystem trees, or get an Ack >> from the maintainers to merge them through your tree. > Ok. > >> >> I notice that a lot of the patches have no changeset comments on them. >> You should fix that and make a habit of describing every single patch >> with a few sentences, even if it seems obvious to you. Have a look at >> the changeset descriptions for the nds32 and riscv architectures when >> they got merged. > Ok, I'll fixup them. > >> >> One big question for me is what to do about time_t. Deepa and I are >> in the process of finalizing the system call ABI for 32-bit architectures >> with 64-bit time_t, but we are not done yet and it won't be complete >> for 4.20. If you target 4.21, that could be a chance to make csky the >> first architecture to only need the 64-bit time_t interface, with the >> corresponding user space changes. > y2038 is very important and csky32 has the issue. But 4.21 is too late for > us, we really want to get into kernel.org as soon as possible. > We could remove 32-bit time_t in future. I don't want to hijack this thread, but in RISC-V land we were hoping to have a user ABI free of 32-bit time_t. Our 32-bit glibc ABI hasn't been finalized yet, and when I talked to the glibc guys a few weeks ago they were happy to let us wait until 32-bit time_t can be removed before we stabilize the ABI. We've been maintaining out-of-tree glibc patches for a while now, so I'd really like to get them into the next glibc release. Mapping out the schedule more explicitly, as I'm terrible with dates: * 4.19-rc4 was 2018-09-16 * 4.19 should be 2018-10-21 * 4.20 should be 2019-01-13 (skipping 2 weeks for the holidays) * 4.21 merge window should close 2019-01-27 * glibc 2.29 is scheduled for 2019-02-01 That's very tight, but assuming we at least have a prototype of the API so we can get the rv32i glibc patches in much earlier it might be OK. There was some talk of being able to use some workarounds to do a 32-bit time_t user ABI without the cooresponding kernel ABI, so we could always go down that route to start and then decide to deprecate or not deprecate the 32-bit kernel ABI at the last minute -- not something I'm fond of doing, but an option. How close to done do you think the 32-bit time_t will be by the end of the 4.20 merge window? If it's close enough to start our glibc push then that might be OK.
On Thu, Sep 20, 2018 at 10:52 AM Palmer Dabbelt <palmer@sifive.com> wrote: > > On Fri, 14 Sep 2018 07:37:20 PDT (-0700), ren_guo@c-sky.com wrote: > > On Wed, Sep 12, 2018 at 04:30:36PM +0200, Arnd Bergmann wrote: > >> On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: > I don't want to hijack this thread, but in RISC-V land we were hoping to have a > user ABI free of 32-bit time_t. Our 32-bit glibc ABI hasn't been finalized > yet, and when I talked to the glibc guys a few weeks ago they were happy to let > us wait until 32-bit time_t can be removed before we stabilize the ABI. We've > been maintaining out-of-tree glibc patches for a while now, so I'd really like > to get them into the next glibc release. > > Mapping out the schedule more explicitly, as I'm terrible with dates: > > * 4.19-rc4 was 2018-09-16 > * 4.19 should be 2018-10-21 > * 4.20 should be 2019-01-13 (skipping 2 weeks for the holidays) > * 4.21 merge window should close 2019-01-27 > * glibc 2.29 is scheduled for 2019-02-01 > > That's very tight, but assuming we at least have a prototype of the API so we > can get the rv32i glibc patches in much earlier it might be OK. There was some > talk of being able to use some workarounds to do a 32-bit time_t user ABI > without the cooresponding kernel ABI, so we could always go down that route to > start and then decide to deprecate or not deprecate the 32-bit kernel ABI at > the last minute -- not something I'm fond of doing, but an option. > > How close to done do you think the 32-bit time_t will be by the end of the 4.20 > merge window? If it's close enough to start our glibc push then that might be > OK. It will be a bit of a stretch, but it's possible. Most syscalls are done in linux-next, I have a few more pending, and only clock_adjtime is really missing now (I had some earlier patches that I could revive). My plan was to get that all into 4.20, and then have a conversation about the actual syscall table changes in 4.21. If we need it for both csky and rv32, we might just change the generic syscall table that way in 4.21 without changing all the other ones along with them. I don't want to drag things out over too many merge windows though, and my plan was to do all architectures together to simplify the version checks in the libc code to only have to check for a single version. Arnd
On Thu, Sep 20, 2018 at 10:18:51PM -0700, Arnd Bergmann wrote: > On Thu, Sep 20, 2018 at 10:52 AM Palmer Dabbelt <palmer@sifive.com> wrote: > > > > On Fri, 14 Sep 2018 07:37:20 PDT (-0700), ren_guo@c-sky.com wrote: > > > On Wed, Sep 12, 2018 at 04:30:36PM +0200, Arnd Bergmann wrote: > > >> On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: > > I don't want to hijack this thread, but in RISC-V land we were hoping to have a > > user ABI free of 32-bit time_t. Our 32-bit glibc ABI hasn't been finalized > > yet, and when I talked to the glibc guys a few weeks ago they were happy to let > > us wait until 32-bit time_t can be removed before we stabilize the ABI. We've > > been maintaining out-of-tree glibc patches for a while now, so I'd really like > > to get them into the next glibc release. > > > > Mapping out the schedule more explicitly, as I'm terrible with dates: > > > > * 4.19-rc4 was 2018-09-16 > > * 4.19 should be 2018-10-21 > > * 4.20 should be 2019-01-13 (skipping 2 weeks for the holidays) > > * 4.21 merge window should close 2019-01-27 > > * glibc 2.29 is scheduled for 2019-02-01 Thx for the schedule info. > > > > That's very tight, but assuming we at least have a prototype of the API so we > > can get the rv32i glibc patches in much earlier it might be OK. There was some > > talk of being able to use some workarounds to do a 32-bit time_t user ABI > > without the cooresponding kernel ABI, so we could always go down that route to > > start and then decide to deprecate or not deprecate the 32-bit kernel ABI at > > the last minute -- not something I'm fond of doing, but an option. > > > > How close to done do you think the 32-bit time_t will be by the end of the 4.20 > > merge window? If it's close enough to start our glibc push then that might be > > OK. > > It will be a bit of a stretch, but it's possible. Most syscalls are > done in linux-next, > I have a few more pending, and only clock_adjtime is really missing now (I had > some earlier patches that I could revive). Seems time schedule is OK. If we make csky get into linux-4.20, then csky glibc port could remove 32-bit time_t in patchset before glibc 2.29 release. > My plan was to get that all into 4.20, and then have a conversation about the > actual syscall table changes in 4.21. If we need it for both csky and rv32, > we might just change the generic syscall table that way in 4.21 without > changing all the other ones along with them. I don't want to drag things out > over too many merge windows though, and my plan was to do all architectures > together to simplify the version checks in the libc code to only have to check > for a single version. Seems that's no problem. Best Regards Guo Ren
Hi Arnd, On Fri, Sep 21, 2018 at 7:19 AM Arnd Bergmann <arnd@arndb.de> wrote: > On Thu, Sep 20, 2018 at 10:52 AM Palmer Dabbelt <palmer@sifive.com> wrote: > > On Fri, 14 Sep 2018 07:37:20 PDT (-0700), ren_guo@c-sky.com wrote: > > > On Wed, Sep 12, 2018 at 04:30:36PM +0200, Arnd Bergmann wrote: > > >> On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: > My plan was to get that all into 4.20, and then have a conversation about the > actual syscall table changes in 4.21. If we need it for both csky and rv32, > we might just change the generic syscall table that way in 4.21 without > changing all the other ones along with them. I don't want to drag things out > over too many merge windows though, and my plan was to do all architectures > together to simplify the version checks in the libc code to only have to check > for a single version. What happens with the version checks if it is backported to stable? Gr{oetje,eeting}s, Geert
On Mon, Sep 24, 2018 at 9:21 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Arnd, > > On Fri, Sep 21, 2018 at 7:19 AM Arnd Bergmann <arnd@arndb.de> wrote: > > On Thu, Sep 20, 2018 at 10:52 AM Palmer Dabbelt <palmer@sifive.com> wrote: > > > On Fri, 14 Sep 2018 07:37:20 PDT (-0700), ren_guo@c-sky.com wrote: > > > > On Wed, Sep 12, 2018 at 04:30:36PM +0200, Arnd Bergmann wrote: > > > >> On Wed, Sep 12, 2018 at 3:25 PM Guo Ren <ren_guo@c-sky.com> wrote: > > My plan was to get that all into 4.20, and then have a conversation about the > > actual syscall table changes in 4.21. If we need it for both csky and rv32, > > we might just change the generic syscall table that way in 4.21 without > > changing all the other ones along with them. I don't want to drag things out > > over too many merge windows though, and my plan was to do all architectures > > together to simplify the version checks in the libc code to only have to check > > for a single version. > > What happens with the version checks if it is backported to stable? They end up as dead code. The way these version checks are done in glibc, we always try the new interface first before falling back to the old version, but don't check the kernel version at runtime. The compile-time check optimizes out the fallback if glibc is built for a recent-enough minimum kernel version that is guaranteed to have the new version. Arnd