Message ID | 20200115173638.31514-1-juergh@canonical.com |
---|---|
Headers | show |
Series | Fix ThunderX boot issues | expand |
On Wed, Jan 15, 2020 at 06:36:36PM +0100, Juerg Haefliger wrote: > Two trivial KPTI patches to fix a nasty ThunderX boot issues. > > The 'arm64: add sentinel to kpti_safe_list' patch might not be a > contributor here but it certainly fixes a real issue since on ThunderX the > list is traversed all the way to the end and not stopped due to the > missing sentinel. > > The real problem is fixed by 'arm64: Check for errata before evaluating > cpu features'. The kernel has a check in place for a certain Cavium errata. > If that errata is enabled, KPTI is turned off (the code comment mentions > I-cache clobbering if both are enabled at the same time). Some earlier KPTI > commit changed the order in which features and erratas are evaluated and > enabled resulting in a short period of time where both KPTI and that errata > are enabled. The fix reverses that order. > > Dirk Mueller (1): > arm64: Check for errata before evaluating cpu features > > Mark Rutland (1): > arm64: add sentinel to kpti_safe_list > > arch/arm64/kernel/cpufeature.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) The two patches are trivial and they look correct to me. Acked-by: Andrea Righi <andrea.righi@canonical.com>
On 2020-01-15 18:36:36 , Juerg Haefliger wrote: > Two trivial KPTI patches to fix a nasty ThunderX boot issues. > > The 'arm64: add sentinel to kpti_safe_list' patch might not be a > contributor here but it certainly fixes a real issue since on ThunderX the > list is traversed all the way to the end and not stopped due to the > missing sentinel. > > The real problem is fixed by 'arm64: Check for errata before evaluating > cpu features'. The kernel has a check in place for a certain Cavium errata. > If that errata is enabled, KPTI is turned off (the code comment mentions > I-cache clobbering if both are enabled at the same time). Some earlier KPTI > commit changed the order in which features and erratas are evaluated and > enabled resulting in a short period of time where both KPTI and that errata > are enabled. The fix reverses that order. > > Dirk Mueller (1): > arm64: Check for errata before evaluating cpu features > > Mark Rutland (1): > arm64: add sentinel to kpti_safe_list > > arch/arm64/kernel/cpufeature.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > -- > 2.20.1 > > Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
On 2020-01-15 18:36:36, Juerg Haefliger wrote: > Two trivial KPTI patches to fix a nasty ThunderX boot issues. > > The 'arm64: add sentinel to kpti_safe_list' patch might not be a > contributor here but it certainly fixes a real issue since on ThunderX the > list is traversed all the way to the end and not stopped due to the > missing sentinel. > > The real problem is fixed by 'arm64: Check for errata before evaluating > cpu features'. The kernel has a check in place for a certain Cavium errata. > If that errata is enabled, KPTI is turned off (the code comment mentions > I-cache clobbering if both are enabled at the same time). Some earlier KPTI > commit changed the order in which features and erratas are evaluated and > enabled resulting in a short period of time where both KPTI and that errata > are enabled. The fix reverses that order. Both patches look correct. Acked-by: Tyler Hicks <tyhicks@canonical.com> Tyler > > Dirk Mueller (1): > arm64: Check for errata before evaluating cpu features > > Mark Rutland (1): > arm64: add sentinel to kpti_safe_list > > arch/arm64/kernel/cpufeature.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > -- > 2.20.1 > > > -- > kernel-team mailing list > kernel-team@lists.ubuntu.com > https://lists.ubuntu.com/mailman/listinfo/kernel-team
On Wed, Jan 15, 2020 at 10:37 AM Juerg Haefliger <juerg.haefliger@canonical.com> wrote: > > Two trivial KPTI patches to fix a nasty ThunderX boot issues. > > The 'arm64: add sentinel to kpti_safe_list' patch might not be a > contributor here but it certainly fixes a real issue since on ThunderX the > list is traversed all the way to the end and not stopped due to the > missing sentinel. > > The real problem is fixed by 'arm64: Check for errata before evaluating > cpu features'. The kernel has a check in place for a certain Cavium errata. > If that errata is enabled, KPTI is turned off (the code comment mentions > I-cache clobbering if both are enabled at the same time). Some earlier KPTI > commit changed the order in which features and erratas are evaluated and > enabled resulting in a short period of time where both KPTI and that errata > are enabled. The fix reverses that order. > > Dirk Mueller (1): > arm64: Check for errata before evaluating cpu features > > Mark Rutland (1): > arm64: add sentinel to kpti_safe_list Both fixes look correct to me. Indeed, I had noticed that the "kernel page table isolation" message was printed very early in unaffected kernels logs: [ 0.000000] CPU features: kernel page table isolation forced OFF by ARM64_WORKAROUND_CAVIUM_27456 While affected kernels weren't printing it until much later, just before enabling non-boot CPUs: [ 0.035287] smp: Bringing up secondary CPUs ... [ 0.035637] CPU features: kernel page table isolation forced OFF by ARM64_WORKAROUND_CAVIUM_27456 With these fixes, these messages are now again emitted at [ 0.000000]. When I saw crashes before, they were always pretty early - either during boot, or immediately after when I started a kernel build. I ran tests overnight on 3 machines (build kernel & reboot loop) with 4.15.0-74.84 +, these patches and some other candidates. 1 machine survived the night, another survived for several hours until it was released by automation, and a 3rd failed - but due to an SEA which might be an actual hardware failure. I've restarted the same testing on 4 machines today using latest bionic/master-next + just these 2 patches. They've all survived at least 2 iterations so far. I'll leave it running but, after surviving >16 passes, I believe either the bug is squashed, or Juerg has scared it back into hiding :) Acked-by: dann frazier <dann.frazier@canonical.com>
On Wed, 15 Jan 2020 18:36:36 +0100 Juerg Haefliger <juerg.haefliger@canonical.com> wrote: > Two trivial KPTI patches to fix a nasty ThunderX boot issues. > > The 'arm64: add sentinel to kpti_safe_list' patch might not be a > contributor here but it certainly fixes a real issue since on ThunderX the > list is traversed all the way to the end and not stopped due to the > missing sentinel. > > The real problem is fixed by 'arm64: Check for errata before evaluating > cpu features'. The kernel has a check in place for a certain Cavium errata. > If that errata is enabled, KPTI is turned off (the code comment mentions > I-cache clobbering if both are enabled at the same time). Some earlier KPTI > commit changed the order in which features and erratas are evaluated and > enabled resulting in a short period of time where both KPTI and that errata > are enabled. The fix reverses that order. Correction, KPTI and the erratum are enabled persistently and not just for a brief period of time. ...Juerg > Dirk Mueller (1): > arm64: Check for errata before evaluating cpu features > > Mark Rutland (1): > arm64: add sentinel to kpti_safe_list > > arch/arm64/kernel/cpufeature.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) >
That one was already applied last cycle. Sorry for the delay.