Message ID | 20231029124548.12198-2-sourabhjain@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | powerpc: make fadump resilient with memory add/remove events | expand |
Hi Sourabh, This seems like a good change to the design, but I'm confused by some things, more below ... Sourabh Jain <sourabhjain@linux.ibm.com> writes: > ... > > Table 1 below illustrates kernel's ability to collect dump if either the > first/crashed kernel or the second/fadump kernel does not have the > changes introduced here. Consider the 'old kernel' as the kernel without > this patch, and the 'new kernel' as the kernel with this patch included. > > +----------+------------------------+------------------------+-------+ > | scenario | first/crashed kernel | second/fadump kernel | Dump | > +----------+------------------------+------------------------+-------+ > | 1 | old kernel | new kernel | Yes | > +----------+------------------------+------------------------+-------+ > | 2 | new kernel | old kernel | No | > +----------+------------------------+------------------------+-------+ > > Table 1 > > Scenario 1: > ----------- > Since the magic number of fadump header is updated, the second kernel > can differentiate the crashed kernel is of type 'new kernel' or > 'old kernel' and act accordingly. In this scenario, since the crashed > kernel is of type 'old kernel,' the fadump kernel skips elfcorehdr > creation and uses the one prepared in the first kernel itself to collect > the dump. > > Scenario 2: > ----------- > Since 'old kernel' as the fadump kernel is NOT capable of processing > fadump header with updated magic number from 'new kernel' hence it > gracefully fails with the below error and dump collection fails in this > scenario. > > [ 0.007365] rtas fadump: Crash info header is not valid. > > Add a version field to the fadump_crash_info_header structure to avoid > the need to change its magic number in the future. Adding a version > field to the fadump header was one of the TODO items listed in > Documentation/powerpc/firmware-assisted-dump.rst. This is a good analysis of the issues with different kernel versions, and seems like an OK trade off, ie. that old kernels can't process dumps from new kernels. But do we actually support using different kernel versions for the crash/dump kernel? Because AFAICS the fadump_crash_info_header is not safe across kernel versions, in its current form or with your changes. > diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h > index 27f9e11eda28..7be3d8894520 100644 > --- a/arch/powerpc/include/asm/fadump-internal.h > +++ b/arch/powerpc/include/asm/fadump-internal.h > @@ -42,7 +42,25 @@ static inline u64 fadump_str_to_u64(const char *str) > > #define FADUMP_CPU_UNKNOWN (~((u32)0)) > > -#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPINF") > +/* > + * The introduction of new fields in the fadump crash info header has > + * led to a change in the magic key, from `FADMPINF` to `FADMPSIG`. > + * This alteration ensures backward compatibility, enabling the kernel > + * with the updated fadump crash info to handle kernel dumps from older > + * kernels. > + * > + * To prevent the need for further changes to the magic number in the > + * event of future modifications to the fadump header, a version field > + * has been introduced to track the fadump crash info header version. > + * > + * Historically, there was no connection between the magic number and > + * the fadump crash info header version. However, moving forward, the > + * `FADMPINF` magic number in header will be treated as version 0, while > + * the `FADMPSIG` magic number in header will include a version field to > + * determine its version. > + */ > +#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") > +#define FADUMP_VERSION 1 > > /* fadump crash info structure */ > struct fadump_crash_info_header { > @@ -51,6 +69,10 @@ struct fadump_crash_info_header { > struct fadump_crash_info_header { u64 magic_number; u64 elfcorehdr_addr; > u32 crashing_cpu; > struct pt_regs regs; > struct cpumask cpu_mask; > + u32 version; > + u64 elfcorehdr_size; > + u64 vmcoreinfo_raddr; > + u64 vmcoreinfo_size; > }; The reason I say it's not safe is because pt_regs and especially cpumask can change size depending on the kernel configuration. pt_regs probably doesn't change size in practice for common kernel configurations, but some of the fields are under #ifdef. cpumask on the other hand is directly controlled by CONFIG_NR_CPUS. So if the first and second kernel have a different value for NR_CPUS they will disagree on the size of the struct. That has presumably worked OK so far because folks tend to use the same, or similar kernels for the first/second kernel. And also the cpumask is the last element of the struct, so a disagreement about it size doesn't affect the location of other fields. However with your new design the version field will move based on the cpumask size, which will potentially break detection of the version by the second kernel. At least that's how it looks to me, I don't see any checks anywhere that will save us, or did I miss something? cheers
Hello Michael, On 09/11/23 17:44, Michael Ellerman wrote: > Hi Sourabh, > > This seems like a good change to the design, but I'm confused by > some things, more below ... Thanks. > Sourabh Jain <sourabhjain@linux.ibm.com> writes: > ... >> Table 1 below illustrates kernel's ability to collect dump if either the >> first/crashed kernel or the second/fadump kernel does not have the >> changes introduced here. Consider the 'old kernel' as the kernel without >> this patch, and the 'new kernel' as the kernel with this patch included. >> >> +----------+------------------------+------------------------+-------+ >> | scenario | first/crashed kernel | second/fadump kernel | Dump | >> +----------+------------------------+------------------------+-------+ >> | 1 | old kernel | new kernel | Yes | >> +----------+------------------------+------------------------+-------+ >> | 2 | new kernel | old kernel | No | >> +----------+------------------------+------------------------+-------+ >> >> Table 1 >> >> Scenario 1: >> ----------- >> Since the magic number of fadump header is updated, the second kernel >> can differentiate the crashed kernel is of type 'new kernel' or >> 'old kernel' and act accordingly. In this scenario, since the crashed >> kernel is of type 'old kernel,' the fadump kernel skips elfcorehdr >> creation and uses the one prepared in the first kernel itself to collect >> the dump. >> >> Scenario 2: >> ----------- >> Since 'old kernel' as the fadump kernel is NOT capable of processing >> fadump header with updated magic number from 'new kernel' hence it >> gracefully fails with the below error and dump collection fails in this >> scenario. >> >> [ 0.007365] rtas fadump: Crash info header is not valid. >> >> Add a version field to the fadump_crash_info_header structure to avoid >> the need to change its magic number in the future. Adding a version >> field to the fadump header was one of the TODO items listed in >> Documentation/powerpc/firmware-assisted-dump.rst. > This is a good analysis of the issues with different kernel versions, > and seems like an OK trade off, ie. that old kernels can't process dumps > from new kernels. > > But do we actually support using different kernel versions for the > crash/dump kernel? > > Because AFAICS the fadump_crash_info_header is not safe across kernel > versions, in its current form or with your changes. Yeah, I was also under the impression that it is not supported, but I was not aware that the size of pt_regs and cpumask can change based on the configuration. > >> diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h >> index 27f9e11eda28..7be3d8894520 100644 >> --- a/arch/powerpc/include/asm/fadump-internal.h >> +++ b/arch/powerpc/include/asm/fadump-internal.h >> @@ -42,7 +42,25 @@ static inline u64 fadump_str_to_u64(const char *str) >> >> #define FADUMP_CPU_UNKNOWN (~((u32)0)) >> >> -#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPINF") >> +/* >> + * The introduction of new fields in the fadump crash info header has >> + * led to a change in the magic key, from `FADMPINF` to `FADMPSIG`. >> + * This alteration ensures backward compatibility, enabling the kernel >> + * with the updated fadump crash info to handle kernel dumps from older >> + * kernels. >> + * >> + * To prevent the need for further changes to the magic number in the >> + * event of future modifications to the fadump header, a version field >> + * has been introduced to track the fadump crash info header version. >> + * >> + * Historically, there was no connection between the magic number and >> + * the fadump crash info header version. However, moving forward, the >> + * `FADMPINF` magic number in header will be treated as version 0, while >> + * the `FADMPSIG` magic number in header will include a version field to >> + * determine its version. >> + */ >> +#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") >> +#define FADUMP_VERSION 1 >> >> /* fadump crash info structure */ >> struct fadump_crash_info_header { >> @@ -51,6 +69,10 @@ struct fadump_crash_info_header { >> > struct fadump_crash_info_header { > u64 magic_number; > u64 elfcorehdr_addr; > >> u32 crashing_cpu; >> struct pt_regs regs; >> struct cpumask cpu_mask; >> + u32 version; >> + u64 elfcorehdr_size; >> + u64 vmcoreinfo_raddr; >> + u64 vmcoreinfo_size; >> }; > The reason I say it's not safe is because pt_regs and especially cpumask > can change size depending on the kernel configuration. > > pt_regs probably doesn't change size in practice for common kernel > configurations, but some of the fields are under #ifdef. > > cpumask on the other hand is directly controlled by CONFIG_NR_CPUS. So > if the first and second kernel have a different value for NR_CPUS they > will disagree on the size of the struct. > > That has presumably worked OK so far because folks tend to use the same, or > similar kernels for the first/second kernel. And also the cpumask is the > last element of the struct, so a disagreement about it size doesn't > affect the location of other fields. > > However with your new design the version field will move based on the > cpumask size, which will potentially break detection of the version by > the second kernel. > > At least that's how it looks to me, I don't see any checks anywhere that > will save us, or did I miss something? I agree with you. If the sizes of pt_regs and cpu_mask are different between the first/crash kernel and the fadump/second kernel, the dump collection might not work correctly. Given that dump capture is not supported across kernel versions, I think it is better to keep new attributes introduced in this patch to struct fadump_crash_info_header before pt_regs and cpumask. For example: /* fadump crash info structure */ struct fadump_crash_info_header { u64 magic_number; + u32 version; + u64 vmcoreinfo_raddr; + u64 vmcoreinfo_size; u64 elfcorehdr_addr; + u64 elfcorehdr_size; u32 crashing_cpu; struct pt_regs regs; struct cpumask cpu_mask; }; Kernel with this patch included will not process the dump if fadump_crash_info_header holds old magic number. Something like: if (fdh->magic_number == fadump_str_to_u64("FADMPINF")) { pr_err("Incompatible fadump header, can't process the dump"); } Does this approach look good to you? Thanks, Sourabh
Sourabh Jain <sourabhjain@linux.ibm.com> writes: .... > diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h > index 27f9e11eda28..7be3d8894520 100644 > --- a/arch/powerpc/include/asm/fadump-internal.h > +++ b/arch/powerpc/include/asm/fadump-internal.h > @@ -42,7 +42,25 @@ static inline u64 fadump_str_to_u64(const char *str) > > #define FADUMP_CPU_UNKNOWN (~((u32)0)) > > -#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPINF") > +/* > + * The introduction of new fields in the fadump crash info header has > + * led to a change in the magic key, from `FADMPINF` to `FADMPSIG`. > + * This alteration ensures backward compatibility, enabling the kernel > + * with the updated fadump crash info to handle kernel dumps from older > + * kernels. > + * > + * To prevent the need for further changes to the magic number in the > + * event of future modifications to the fadump header, a version field > + * has been introduced to track the fadump crash info header version. > + * > + * Historically, there was no connection between the magic number and > + * the fadump crash info header version. However, moving forward, the > + * `FADMPINF` magic number in header will be treated as version 0, while > + * the `FADMPSIG` magic number in header will include a version field to > + * determine its version. > + */ > +#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") > +#define FADUMP_VERSION 1 > Can we keep the old magic details as #define FADUMP_CRASH_INFO_MAGIC_OLD fadump_str_to_u64("FADMPINF") #define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") Also considering the struct need not be backward compatible, can we just do struct fadump_crash_info_header { u64 magic_number; u32 crashing_cpu; u64 elfcorehdr_addr; u64 elfcorehdr_size; u64 vmcoreinfo_raddr; u64 vmcoreinfo_size; struct pt_regs regs; struct cpumask cpu_mask; }; static inline bool fadump_compatible(struct fadump_crash_info_header *fdh) { return (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC) } and fail fadump if we find it not compatible? -aneesh
Hi Aneesh, Thanks for reviewing the patch. On 15/11/23 10:14, Aneesh Kumar K.V wrote: > Sourabh Jain<sourabhjain@linux.ibm.com> writes: > > .... > >> diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h >> index 27f9e11eda28..7be3d8894520 100644 >> --- a/arch/powerpc/include/asm/fadump-internal.h >> +++ b/arch/powerpc/include/asm/fadump-internal.h >> @@ -42,7 +42,25 @@ static inline u64 fadump_str_to_u64(const char *str) >> >> #define FADUMP_CPU_UNKNOWN (~((u32)0)) >> >> -#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPINF") >> +/* >> + * The introduction of new fields in the fadump crash info header has >> + * led to a change in the magic key, from `FADMPINF` to `FADMPSIG`. >> + * This alteration ensures backward compatibility, enabling the kernel >> + * with the updated fadump crash info to handle kernel dumps from older >> + * kernels. >> + * >> + * To prevent the need for further changes to the magic number in the >> + * event of future modifications to the fadump header, a version field >> + * has been introduced to track the fadump crash info header version. >> + * >> + * Historically, there was no connection between the magic number and >> + * the fadump crash info header version. However, moving forward, the >> + * `FADMPINF` magic number in header will be treated as version 0, while >> + * the `FADMPSIG` magic number in header will include a version field to >> + * determine its version. >> + */ >> +#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") >> +#define FADUMP_VERSION 1 >> > Can we keep the old magic details as > > #define FADUMP_CRASH_INFO_MAGIC_OLD fadump_str_to_u64("FADMPINF") > #define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") Sure. > Also considering the struct need not be backward compatible, can we just > do > > struct fadump_crash_info_header { > u64 magic_number; > u32 crashing_cpu; > u64 elfcorehdr_addr; > u64 elfcorehdr_size; > u64 vmcoreinfo_raddr; > u64 vmcoreinfo_size; > struct pt_regs regs; > struct cpumask cpu_mask; > }; > static inline bool fadump_compatible(struct fadump_crash_info_header > *fdh) > { > return (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC) > } > > and fail fadump if we find it not compatible? Agree that it is unsafe to collect a dump with an incompatible fadump crash info header. Given that I am updating the fadump crash info header, we can make a few arrangements like adding a size filed for the dynamic size attribute like pt_regs and cpumask to ensure better compatibility in the future. Additionally, let's introduce a version field to the fadump crash info header to avoid changing the magic number in the future. Thanks, Sourabh
On 11/17/23 10:03 AM, Sourabh Jain wrote: > Hi Aneesh, > > Thanks for reviewing the patch. > > On 15/11/23 10:14, Aneesh Kumar K.V wrote: >> Sourabh Jain<sourabhjain@linux.ibm.com> writes: >> >> .... >> >>> diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h >>> index 27f9e11eda28..7be3d8894520 100644 >>> --- a/arch/powerpc/include/asm/fadump-internal.h >>> +++ b/arch/powerpc/include/asm/fadump-internal.h >>> @@ -42,7 +42,25 @@ static inline u64 fadump_str_to_u64(const char *str) >>> #define FADUMP_CPU_UNKNOWN (~((u32)0)) >>> -#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPINF") >>> +/* >>> + * The introduction of new fields in the fadump crash info header has >>> + * led to a change in the magic key, from `FADMPINF` to `FADMPSIG`. >>> + * This alteration ensures backward compatibility, enabling the kernel >>> + * with the updated fadump crash info to handle kernel dumps from older >>> + * kernels. >>> + * >>> + * To prevent the need for further changes to the magic number in the >>> + * event of future modifications to the fadump header, a version field >>> + * has been introduced to track the fadump crash info header version. >>> + * >>> + * Historically, there was no connection between the magic number and >>> + * the fadump crash info header version. However, moving forward, the >>> + * `FADMPINF` magic number in header will be treated as version 0, while >>> + * the `FADMPSIG` magic number in header will include a version field to >>> + * determine its version. >>> + */ >>> +#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") >>> +#define FADUMP_VERSION 1 >>> >> Can we keep the old magic details as >> >> #define FADUMP_CRASH_INFO_MAGIC_OLD fadump_str_to_u64("FADMPINF") >> #define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") > > Sure. > >> Also considering the struct need not be backward compatible, can we just >> do >> >> struct fadump_crash_info_header { >> u64 magic_number; >> u32 crashing_cpu; >> u64 elfcorehdr_addr; >> u64 elfcorehdr_size; >> u64 vmcoreinfo_raddr; >> u64 vmcoreinfo_size; >> struct pt_regs regs; >> struct cpumask cpu_mask; >> }; >> static inline bool fadump_compatible(struct fadump_crash_info_header >> *fdh) >> { >> return (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC) >> } >> >> and fail fadump if we find it not compatible? > > Agree that it is unsafe to collect a dump with an incompatible fadump crash info header. > > Given that I am updating the fadump crash info header, we can make a few arrangements > like adding a size filed for the dynamic size attribute like pt_regs and cpumask to ensure > better compatibility in the future. > > Additionally, let's introduce a version field to the fadump crash info header to avoid changing > the magic number in the future. > I am not sure whether we need to add all the complexity to enable supporting different fadump kernel version. Is that even a possible use case with fadump? Can't we always assume that with fadump the crash kernel and fadump kernel will be same version? if yes we can simply fail with a magic number mismatch because that indicates an user config error? -aneesh
On 17/11/23 11:01 am, Aneesh Kumar K V wrote: > On 11/17/23 10:03 AM, Sourabh Jain wrote: >> Hi Aneesh, >> >> Thanks for reviewing the patch. >> >> On 15/11/23 10:14, Aneesh Kumar K.V wrote: >>> Sourabh Jain<sourabhjain@linux.ibm.com> writes: >>> >>> .... >>> >>>> diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h >>>> index 27f9e11eda28..7be3d8894520 100644 >>>> --- a/arch/powerpc/include/asm/fadump-internal.h >>>> +++ b/arch/powerpc/include/asm/fadump-internal.h >>>> @@ -42,7 +42,25 @@ static inline u64 fadump_str_to_u64(const char *str) >>>> #define FADUMP_CPU_UNKNOWN (~((u32)0)) >>>> -#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPINF") >>>> +/* >>>> + * The introduction of new fields in the fadump crash info header has >>>> + * led to a change in the magic key, from `FADMPINF` to `FADMPSIG`. >>>> + * This alteration ensures backward compatibility, enabling the kernel >>>> + * with the updated fadump crash info to handle kernel dumps from older >>>> + * kernels. >>>> + * >>>> + * To prevent the need for further changes to the magic number in the >>>> + * event of future modifications to the fadump header, a version field >>>> + * has been introduced to track the fadump crash info header version. >>>> + * >>>> + * Historically, there was no connection between the magic number and >>>> + * the fadump crash info header version. However, moving forward, the >>>> + * `FADMPINF` magic number in header will be treated as version 0, while >>>> + * the `FADMPSIG` magic number in header will include a version field to >>>> + * determine its version. >>>> + */ >>>> +#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") >>>> +#define FADUMP_VERSION 1 >>>> >>> Can we keep the old magic details as >>> >>> #define FADUMP_CRASH_INFO_MAGIC_OLD fadump_str_to_u64("FADMPINF") >>> #define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") >> >> Sure. >> >>> Also considering the struct need not be backward compatible, can we just >>> do >>> >>> struct fadump_crash_info_header { >>> u64 magic_number; >>> u32 crashing_cpu; >>> u64 elfcorehdr_addr; >>> u64 elfcorehdr_size; >>> u64 vmcoreinfo_raddr; >>> u64 vmcoreinfo_size; >>> struct pt_regs regs; >>> struct cpumask cpu_mask; >>> }; >>> static inline bool fadump_compatible(struct fadump_crash_info_header >>> *fdh) >>> { >>> return (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC) >>> } >>> >>> and fail fadump if we find it not compatible? >> >> Agree that it is unsafe to collect a dump with an incompatible fadump crash info header. >> >> Given that I am updating the fadump crash info header, we can make a few arrangements >> like adding a size filed for the dynamic size attribute like pt_regs and cpumask to ensure >> better compatibility in the future. >> >> Additionally, let's introduce a version field to the fadump crash info header to avoid changing >> the magic number in the future. >> > > I am not sure whether we need to add all the complexity to enable supporting different fadump kernel > version. Is that even a possible use case with fadump? Can't we always assume that with fadump the > crash kernel and fadump kernel will be same version? if yes we can simply fail with a magic number > mismatch because that indicates an user config error? If we decide not to support different kernel versions for production kernel and capture kernel, We can make that implicit by adding kernel version info of production kernel in the header and bailing out if there is kernel version mismatch as magic could still match for two different kernel versions. I would personally prefer something like the below though: struct fadump_crash_info_header { u64 magic_number; u32 version u32 crashing_cpu; u64 elfcorehdr_addr; u64 elfcorehdr_size; u64 vmcoreinfo_raddr; u64 vmcoreinfo_size; u8 kernel_version[]; u32 pt_regs_sz; struct pt_regs regs; u32 cpu_mask_sz; struct cpumask cpu_mask; }; if (magic_number != new_magic) goto err; /* Error out */ if (kernel_version != capture_kernel_version) { if (cpu_mask_sz == sizeof(struct pt_regs) && cpu_mask_sz == sizeof(struct cpumask)) /* * Warn about the kernel version mismatch and how data can be different * across kernel versions and proceed anyway! */ else goto err; /* Error out */ } This ensures we warn and proceed in cases where it is less likely to have issues capturing kernel dump. This helps in dev environment where we are trying to debug an early boot crash - in which case capture kernel can't be the same kernel as it would likely hit the same problem while booting.. Thanks Hari
Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes: ... > > I am not sure whether we need to add all the complexity to enable supporting different fadump kernel > version. Is that even a possible use case with fadump? Can't we always assume that with fadump the > crash kernel and fadump kernel will be same version? How sure are we of that? Don't we go through grub when we boot into the 2nd kernel. And so couldn't it choose to boot a different kernel, for whatever reason. I don't think we need to support different pt_reg / cpumask sizes, but requiring the exact same kernel version is too strict I think. But maybe I'm wrong. Would be good to hear what distro folks think. cheers
Hello Michael, On 22/11/23 10:47, Michael Ellerman wrote: > Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes: > ... >> I am not sure whether we need to add all the complexity to enable supporting different fadump kernel >> version. Is that even a possible use case with fadump? Can't we always assume that with fadump the >> crash kernel and fadump kernel will be same version? > How sure are we of that? > > Don't we go through grub when we boot into the 2nd kernel. And so > couldn't it choose to boot a different kernel, for whatever reason. > > I don't think we need to support different pt_reg / cpumask sizes, but > requiring the exact same kernel version is too strict I think. Agree. > > But maybe I'm wrong. Would be good to hear what distro folks think. How about checking fadump crash info header compatibility in the following way? static bool is_fadump_header_compatible(struct fadump_crash_info_header *fdh) { if (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC_OLD) { pr_err("Old magic number, can't process the dump."); return false; } if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { pr_err("Fadump header is corrupted."); return false; } /* * If the kernel version of the first/crashed kernel and the second/fadump * kernel is not same, then only collect the dump if the size of all * non-primitive type members of the fadump header is the same across kernels. */ if (strcmp(fdh->kernel_version, init_uts_ns.name.release)) { if (fdh->pt_regs_sz != sizeof(struct pt_regs) || fdh->cpu_mask_sz != sizeof(struct cpumask)) { pr_err("Fadump header size mismatch.\n") return false; } else pr_warn("Kernel version mismatch; dump data is unreliable.\n"); } return true; } And the new fadump crash info header will be: As suggested by Hari. /* fadump crash info structure */ struct fadump_crash_info_header { u64 magic_number; + u32 version; u32 crashing_cpu; u64 elfcorehdr_addr; + u64 elfcorehdr_size; + u64 vmcoreinfo_raddr; + u64 vmcoreinfo_size; + u8 kernel_version[__NEW_UTS_LEN + 1]; + u32 pt_regs_sz; struct pt_regs regs; + u32 cpu_mask_sz; struct cpumask cpu_mask; }; Thanks, Sourabh Jain
On 11/22/23 4:05 PM, Sourabh Jain wrote: > Hello Michael, > > > On 22/11/23 10:47, Michael Ellerman wrote: >> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes: >> ... >>> I am not sure whether we need to add all the complexity to enable supporting different fadump kernel >>> version. Is that even a possible use case with fadump? Can't we always assume that with fadump the >>> crash kernel and fadump kernel will be same version? >> How sure are we of that? >> >> Don't we go through grub when we boot into the 2nd kernel. And so >> couldn't it choose to boot a different kernel, for whatever reason. >> >> I don't think we need to support different pt_reg / cpumask sizes, but >> requiring the exact same kernel version is too strict I think. > Agree. >> >> But maybe I'm wrong. Would be good to hear what distro folks think. > > How about checking fadump crash info header compatibility in the following way? > > static bool is_fadump_header_compatible(struct fadump_crash_info_header *fdh) > { > if (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC_OLD) { > pr_err("Old magic number, can't process the dump."); > return false; > } > > if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { > pr_err("Fadump header is corrupted."); > return false; > } > > /* > * If the kernel version of the first/crashed kernel and the second/fadump > * kernel is not same, then only collect the dump if the size of all > * non-primitive type members of the fadump header is the same across kernels. > */ > if (strcmp(fdh->kernel_version, init_uts_ns.name.release)) { > if (fdh->pt_regs_sz != sizeof(struct pt_regs) || fdh->cpu_mask_sz != sizeof(struct cpumask)) { > pr_err("Fadump header size mismatch.\n") > return false; > } else > pr_warn("Kernel version mismatch; dump data is unreliable.\n"); > } > You also want a fdh->version check? I am still not sure you need the kernel_version check. IMHO that is too strict and can hit that check where you have kexec kernel which is not installed in /boot crashing? > return true; > } > > And the new fadump crash info header will be: As suggested by Hari. > > /* fadump crash info structure */ > struct fadump_crash_info_header { > u64 magic_number; > + u32 version; > u32 crashing_cpu; > u64 elfcorehdr_addr; > + u64 elfcorehdr_size; > + u64 vmcoreinfo_raddr; > + u64 vmcoreinfo_size; > + u8 kernel_version[__NEW_UTS_LEN + 1]; > + u32 pt_regs_sz; > struct pt_regs regs; > + u32 cpu_mask_sz; > struct cpumask cpu_mask; > }; > > Thanks, > Sourabh Jain
Sourabh Jain <sourabhjain@linux.ibm.com> writes: > On 22/11/23 10:47, Michael Ellerman wrote: >> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes: >> ... >>> I am not sure whether we need to add all the complexity to enable supporting different fadump kernel >>> version. Is that even a possible use case with fadump? Can't we always assume that with fadump the >>> crash kernel and fadump kernel will be same version? >> How sure are we of that? >> >> Don't we go through grub when we boot into the 2nd kernel. And so >> couldn't it choose to boot a different kernel, for whatever reason. >> >> I don't think we need to support different pt_reg / cpumask sizes, but >> requiring the exact same kernel version is too strict I think. > Agree. >> >> But maybe I'm wrong. Would be good to hear what distro folks think. > > How about checking fadump crash info header compatibility in the > following way? > > static bool is_fadump_header_compatible(struct fadump_crash_info_header > *fdh) > { > if (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC_OLD) { > pr_err("Old magic number, can't process the dump."); > return false; > } > > if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { > pr_err("Fadump header is corrupted."); > return false; > } > > /* > * If the kernel version of the first/crashed kernel and the > second/fadump > * kernel is not same, then only collect the dump if the size of all > * non-primitive type members of the fadump header is the same > across kernels. > */ > if (strcmp(fdh->kernel_version, init_uts_ns.name.release)) { I don't think the kernel version check is necessary? > if (fdh->pt_regs_sz != sizeof(struct pt_regs) || fdh->cpu_mask_sz != sizeof(struct cpumask)) { > pr_err("Fadump header size mismatch.\n") > return false; Yeah I think that works. > } else > pr_warn("Kernel version mismatch; dump data is unreliable.\n"); > } > > return true; > } > > And the new fadump crash info header will be: As suggested by Hari. > > /* fadump crash info structure */ > struct fadump_crash_info_header { > u64 magic_number; > + u32 version; Do we need version? We're effectively using the magic_number as a version already. Having an actual version would allow us to make backward compatible changes in future, but it's not clear we'll need or want to do that. > u32 crashing_cpu; > u64 elfcorehdr_addr; > + u64 elfcorehdr_size; > + u64 vmcoreinfo_raddr; > + u64 vmcoreinfo_size; > + u8 kernel_version[__NEW_UTS_LEN + 1]; > + u32 pt_regs_sz; > struct pt_regs regs; > + u32 cpu_mask_sz; Typically you would put all the size fields before the variable sized fields, because otherwise the later size fields may not be where you expect them to be. But because we're bailing out if any of the sizes don't match it doesn't actually matter. > struct cpumask cpu_mask; > }; The other issue is endian. I assume we're just declaring that the 1st/2nd kernel must be the same endian? We should still make that explicit though. To make it clearer that this struct is somewhat an ABI, I think we should move the definition into arch/powerpc/include/uapi/asm/fadump.h We don't expect userspace to actually use the header, but it will hopefully remind everyone that the struct needs to be changed with care :) cheers
Hello Aneesh, On 22/11/23 17:50, Aneesh Kumar K V wrote: > On 11/22/23 4:05 PM, Sourabh Jain wrote: >> Hello Michael, >> >> >> On 22/11/23 10:47, Michael Ellerman wrote: >>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes: >>> ... >>>> I am not sure whether we need to add all the complexity to enable supporting different fadump kernel >>>> version. Is that even a possible use case with fadump? Can't we always assume that with fadump the >>>> crash kernel and fadump kernel will be same version? >>> How sure are we of that? >>> >>> Don't we go through grub when we boot into the 2nd kernel. And so >>> couldn't it choose to boot a different kernel, for whatever reason. >>> >>> I don't think we need to support different pt_reg / cpumask sizes, but >>> requiring the exact same kernel version is too strict I think. >> Agree. >>> But maybe I'm wrong. Would be good to hear what distro folks think. >> How about checking fadump crash info header compatibility in the following way? >> >> static bool is_fadump_header_compatible(struct fadump_crash_info_header *fdh) >> { >> if (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC_OLD) { >> pr_err("Old magic number, can't process the dump."); >> return false; >> } >> >> if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { >> pr_err("Fadump header is corrupted."); >> return false; >> } >> >> /* >> * If the kernel version of the first/crashed kernel and the second/fadump >> * kernel is not same, then only collect the dump if the size of all >> * non-primitive type members of the fadump header is the same across kernels. >> */ >> if (strcmp(fdh->kernel_version, init_uts_ns.name.release)) { >> if (fdh->pt_regs_sz != sizeof(struct pt_regs) || fdh->cpu_mask_sz != sizeof(struct cpumask)) { >> pr_err("Fadump header size mismatch.\n") >> return false; >> } else >> pr_warn("Kernel version mismatch; dump data is unreliable.\n"); >> } >> > You also want a fdh->version check? Even though we don't have any action against an fdh->version right now, I think I should check the fadump header version. Currently, if the version doesn't match, it means the header is corrupted. > I am still not sure you need the kernel_version check. IMHO that is too strict > and can hit that check where you have kexec kernel which is not installed in /boot crashing? If the kernel versions mismatch, we still collect the dump if the `pt_regs` and `cpu_mask` sizes are the same across the kernels. The kernel version check is just to warn users that the collected dump may be unreliable. Should I remove the kernel_version filed from fadump crash info header and remove the the kernel version check while processing the kernel dump? Thanks, Sourabh Jain
Hello Michael, On 22/11/23 18:22, Michael Ellerman wrote: > Sourabh Jain <sourabhjain@linux.ibm.com> writes: >> On 22/11/23 10:47, Michael Ellerman wrote: >>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes: >>> ... >>>> I am not sure whether we need to add all the complexity to enable supporting different fadump kernel >>>> version. Is that even a possible use case with fadump? Can't we always assume that with fadump the >>>> crash kernel and fadump kernel will be same version? >>> How sure are we of that? >>> >>> Don't we go through grub when we boot into the 2nd kernel. And so >>> couldn't it choose to boot a different kernel, for whatever reason. >>> >>> I don't think we need to support different pt_reg / cpumask sizes, but >>> requiring the exact same kernel version is too strict I think. >> Agree. >>> But maybe I'm wrong. Would be good to hear what distro folks think. >> How about checking fadump crash info header compatibility in the >> following way? >> >> static bool is_fadump_header_compatible(struct fadump_crash_info_header >> *fdh) >> { >> if (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC_OLD) { >> pr_err("Old magic number, can't process the dump."); >> return false; >> } >> >> if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { >> pr_err("Fadump header is corrupted."); >> return false; >> } >> >> /* >> * If the kernel version of the first/crashed kernel and the >> second/fadump >> * kernel is not same, then only collect the dump if the size of all >> * non-primitive type members of the fadump header is the same >> across kernels. >> */ >> if (strcmp(fdh->kernel_version, init_uts_ns.name.release)) { > > I don't think the kernel version check is necessary? I didn't find a place where pt_regs members are accessed to take a decision in fadump kernel. we just copy the pt_regs in fadump kernel. So I think as long as size is same across kernels, we are good. > >> if (fdh->pt_regs_sz != sizeof(struct pt_regs) || fdh->cpu_mask_sz != sizeof(struct cpumask)) { >> pr_err("Fadump header size mismatch.\n") >> return false; > Yeah I think that works. > >> } else >> pr_warn("Kernel version mismatch; dump data is unreliable.\n"); >> } >> >> return true; >> } >> >> And the new fadump crash info header will be: As suggested by Hari. >> >> /* fadump crash info structure */ >> struct fadump_crash_info_header { >> u64 magic_number; >> + u32 version; > > Do we need version? We're effectively using the magic_number as a > version already. Having an actual version would allow us to make > backward compatible changes in future, but it's not clear we'll need or > want to do that. Agree that currently version field is not used but I added a version field to make the fadump header structure compatible with future changes without changing the magic number. I will add a comment on how version field should be utilized if one changes fadump header in future. > >> u32 crashing_cpu; >> u64 elfcorehdr_addr; >> + u64 elfcorehdr_size; >> + u64 vmcoreinfo_raddr; >> + u64 vmcoreinfo_size; >> + u8 kernel_version[__NEW_UTS_LEN + 1]; >> + u32 pt_regs_sz; >> struct pt_regs regs; >> + u32 cpu_mask_sz; > > Typically you would put all the size fields before the variable sized > fields, because otherwise the later size fields may not be where you > expect them to be. But because we're bailing out if any of the sizes > don't match it doesn't actually matter. Yeah, but I will reorganize fadump header and put the size fields before the variable sized fields. > >> struct cpumask cpu_mask; >> }; > The other issue is endian. I assume we're just declaring that the > 1st/2nd kernel must be the same endian? We should still make that > explicit though. A comment is fine or should we add a explicit check and error out with relevant error message if endianness is not same across the kernels? Something like: if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { if (fdh->magic_number == swab64(FADUMP_CRASH_INFO_MAGIC)) { pr_err("Endianness mismatch"); } else { pr_err("Fadump header is corrupted."); } return false; } > > To make it clearer that this struct is somewhat an ABI, I think we > should move the definition into arch/powerpc/include/uapi/asm/fadump.h Sure > > We don't expect userspace to actually use the header, but it will > hopefully remind everyone that the struct needs to be changed with care :) Agree Thanks, Sourabh Jain
diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h index 27f9e11eda28..7be3d8894520 100644 --- a/arch/powerpc/include/asm/fadump-internal.h +++ b/arch/powerpc/include/asm/fadump-internal.h @@ -42,7 +42,25 @@ static inline u64 fadump_str_to_u64(const char *str) #define FADUMP_CPU_UNKNOWN (~((u32)0)) -#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPINF") +/* + * The introduction of new fields in the fadump crash info header has + * led to a change in the magic key, from `FADMPINF` to `FADMPSIG`. + * This alteration ensures backward compatibility, enabling the kernel + * with the updated fadump crash info to handle kernel dumps from older + * kernels. + * + * To prevent the need for further changes to the magic number in the + * event of future modifications to the fadump header, a version field + * has been introduced to track the fadump crash info header version. + * + * Historically, there was no connection between the magic number and + * the fadump crash info header version. However, moving forward, the + * `FADMPINF` magic number in header will be treated as version 0, while + * the `FADMPSIG` magic number in header will include a version field to + * determine its version. + */ +#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG") +#define FADUMP_VERSION 1 /* fadump crash info structure */ struct fadump_crash_info_header { @@ -51,6 +69,10 @@ struct fadump_crash_info_header { u32 crashing_cpu; struct pt_regs regs; struct cpumask cpu_mask; + u32 version; + u64 elfcorehdr_size; + u64 vmcoreinfo_raddr; + u64 vmcoreinfo_size; }; struct fadump_memory_range { diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 3ff2da7b120b..8fd90afe886e 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -53,8 +53,6 @@ static struct kobject *fadump_kobj; static atomic_t cpus_in_fadump; static DEFINE_MUTEX(fadump_mutex); -static struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0, false }; - #define RESERVED_RNGS_SZ 16384 /* 16K - 128 entries */ #define RESERVED_RNGS_CNT (RESERVED_RNGS_SZ / \ sizeof(struct fadump_memory_range)) @@ -373,12 +371,6 @@ static unsigned long __init get_fadump_area_size(void) size = PAGE_ALIGN(size); size += fw_dump.boot_memory_size; size += sizeof(struct fadump_crash_info_header); - size += sizeof(struct elfhdr); /* ELF core header.*/ - size += sizeof(struct elf_phdr); /* place holder for cpu notes */ - /* Program headers for crash memory regions. */ - size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2); - - size = PAGE_ALIGN(size); /* This is to hold kernel metadata on platforms that support it */ size += (fw_dump.ops->fadump_get_metadata_size ? @@ -931,36 +923,6 @@ static inline int fadump_add_mem_range(struct fadump_mrange_info *mrange_info, return 0; } -static int fadump_exclude_reserved_area(u64 start, u64 end) -{ - u64 ra_start, ra_end; - int ret = 0; - - ra_start = fw_dump.reserve_dump_area_start; - ra_end = ra_start + fw_dump.reserve_dump_area_size; - - if ((ra_start < end) && (ra_end > start)) { - if ((start < ra_start) && (end > ra_end)) { - ret = fadump_add_mem_range(&crash_mrange_info, - start, ra_start); - if (ret) - return ret; - - ret = fadump_add_mem_range(&crash_mrange_info, - ra_end, end); - } else if (start < ra_start) { - ret = fadump_add_mem_range(&crash_mrange_info, - start, ra_start); - } else if (ra_end < end) { - ret = fadump_add_mem_range(&crash_mrange_info, - ra_end, end); - } - } else - ret = fadump_add_mem_range(&crash_mrange_info, start, end); - - return ret; -} - static int fadump_init_elfcore_header(char *bufp) { struct elfhdr *elf; @@ -997,52 +959,6 @@ static int fadump_init_elfcore_header(char *bufp) return 0; } -/* - * Traverse through memblock structure and setup crash memory ranges. These - * ranges will be used create PT_LOAD program headers in elfcore header. - */ -static int fadump_setup_crash_memory_ranges(void) -{ - u64 i, start, end; - int ret; - - pr_debug("Setup crash memory ranges.\n"); - crash_mrange_info.mem_range_cnt = 0; - - /* - * Boot memory region(s) registered with firmware are moved to - * different location at the time of crash. Create separate program - * header(s) for this memory chunk(s) with the correct offset. - */ - for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) { - start = fw_dump.boot_mem_addr[i]; - end = start + fw_dump.boot_mem_sz[i]; - ret = fadump_add_mem_range(&crash_mrange_info, start, end); - if (ret) - return ret; - } - - for_each_mem_range(i, &start, &end) { - /* - * skip the memory chunk that is already added - * (0 through boot_memory_top). - */ - if (start < fw_dump.boot_mem_top) { - if (end > fw_dump.boot_mem_top) - start = fw_dump.boot_mem_top; - else - continue; - } - - /* add this range excluding the reserved dump area. */ - ret = fadump_exclude_reserved_area(start, end); - if (ret) - return ret; - } - - return 0; -} - /* * If the given physical address falls within the boot memory region then * return the relocated address that points to the dump region reserved @@ -1073,13 +989,28 @@ static inline unsigned long fadump_relocate(unsigned long paddr) return raddr; } -static int fadump_create_elfcore_headers(char *bufp) +static void populate_elf_pt_load(struct elf_phdr *phdr, u64 start, + u64 size, unsigned long long offset) { - unsigned long long raddr, offset; - struct elf_phdr *phdr; + phdr->p_align = 0; + phdr->p_memsz = size; + phdr->p_filesz = size; + phdr->p_paddr = start; + phdr->p_offset = offset; + phdr->p_type = PT_LOAD; + phdr->p_flags = PF_R|PF_W|PF_X; + phdr->p_vaddr = (unsigned long)__va(start); +} + +static void __init fadump_populate_elfcorehdr(struct fadump_crash_info_header *fdh) +{ + char *bufp; struct elfhdr *elf; - int i, j; + struct elf_phdr *phdr; + u64 boot_mem_dest_offset; + unsigned long long i, ra_start, ra_end, ra_size, mstart, mend; + bufp = (char *) __va(fdh->elfcorehdr_addr); fadump_init_elfcore_header(bufp); elf = (struct elfhdr *)bufp; bufp += sizeof(struct elfhdr); @@ -1113,54 +1044,68 @@ static int fadump_create_elfcore_headers(char *bufp) phdr->p_vaddr = 0; phdr->p_align = 0; - phdr->p_paddr = fadump_relocate(paddr_vmcoreinfo_note()); - phdr->p_offset = phdr->p_paddr; - phdr->p_memsz = phdr->p_filesz = VMCOREINFO_NOTE_SIZE; + phdr->p_paddr = phdr->p_offset = fdh->vmcoreinfo_raddr; + phdr->p_memsz = phdr->p_filesz = fdh->vmcoreinfo_size; /* Increment number of program headers. */ (elf->e_phnum)++; - /* setup PT_LOAD sections. */ - j = 0; - offset = 0; - raddr = fw_dump.boot_mem_addr[0]; - for (i = 0; i < crash_mrange_info.mem_range_cnt; i++) { - u64 mbase, msize; - - mbase = crash_mrange_info.mem_ranges[i].base; - msize = crash_mrange_info.mem_ranges[i].size; - if (!msize) - continue; - + /* + * setup PT_LOAD sections. + * first include boot memory regions and then add rest of the memory + * regions + */ + boot_mem_dest_offset = fw_dump.boot_mem_dest_addr; + for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) { phdr = (struct elf_phdr *)bufp; bufp += sizeof(struct elf_phdr); - phdr->p_type = PT_LOAD; - phdr->p_flags = PF_R|PF_W|PF_X; - phdr->p_offset = mbase; - - if (mbase == raddr) { - /* - * The entire real memory region will be moved by - * firmware to the specified destination_address. - * Hence set the correct offset. - */ - phdr->p_offset = fw_dump.boot_mem_dest_addr + offset; - if (j < (fw_dump.boot_mem_regs_cnt - 1)) { - offset += fw_dump.boot_mem_sz[j]; - raddr = fw_dump.boot_mem_addr[++j]; - } + populate_elf_pt_load(phdr, fw_dump.boot_mem_addr[i], + fw_dump.boot_mem_sz[i], + boot_mem_dest_offset); + /* Increment number of program headers. */ + (elf->e_phnum)++; + boot_mem_dest_offset += fw_dump.boot_mem_sz[i]; + } + + /* Memory reserved for fadump in first kernel */ + ra_start = fw_dump.reserve_dump_area_start; + ra_size = get_fadump_area_size(); + ra_end = ra_start + ra_size; + + phdr = (struct elf_phdr *)bufp; + for_each_mem_range(i, &mstart, &mend) { + /* + * Skip the memory regions that already added + * to elfcorehdr. + */ + if (mstart < fw_dump.boot_mem_top) { + if (mend > fw_dump.boot_mem_top) + mstart = fw_dump.boot_mem_top; + else + continue; } - phdr->p_paddr = mbase; - phdr->p_vaddr = (unsigned long)__va(mbase); - phdr->p_filesz = msize; - phdr->p_memsz = msize; - phdr->p_align = 0; + /* Handle memblock regions overlaps with reserved area */ + if ((ra_start < mend) && (ra_end > mstart)) { + if ((mstart < ra_start) && (mend > ra_end)) { + populate_elf_pt_load(phdr, mstart, ra_start - mstart, mstart); + elf->e_phnum++; + bufp += sizeof(struct elf_phdr); + phdr = (struct elf_phdr *)bufp; + populate_elf_pt_load(phdr, ra_end, mend - ra_end, ra_end); + } else if (mstart < ra_start) { + populate_elf_pt_load(phdr, mstart, ra_start - mstart, mstart); + } else if (ra_end < mend) { + populate_elf_pt_load(phdr, ra_end, mend - ra_end, ra_end); + } + } else { + populate_elf_pt_load(phdr, mstart, mend - mstart, mstart); + } - /* Increment number of program headers. */ - (elf->e_phnum)++; + elf->e_phnum++; + bufp += sizeof(struct elf_phdr); + phdr = (struct elf_phdr *) bufp; } - return 0; } static unsigned long init_fadump_header(unsigned long addr) @@ -1175,9 +1120,16 @@ static unsigned long init_fadump_header(unsigned long addr) memset(fdh, 0, sizeof(struct fadump_crash_info_header)); fdh->magic_number = FADUMP_CRASH_INFO_MAGIC; - fdh->elfcorehdr_addr = addr; + /* Elfcorehdr will now be prepared in the second kernel. */ + fdh->elfcorehdr_addr = 0; /* We will set the crashing cpu id in crash_fadump() during crash. */ fdh->crashing_cpu = FADUMP_CPU_UNKNOWN; + fdh->version = FADUMP_VERSION; + + /* Need below vmcoreinfo in second kernel to prepare elfcorehdr. */ + fdh->vmcoreinfo_raddr = fadump_relocate(paddr_vmcoreinfo_note()); + fdh->vmcoreinfo_size = VMCOREINFO_NOTE_SIZE; + /* * When LPAR is terminated by PYHP, ensure all possible CPUs' * register data is processed while exporting the vmcore. @@ -1190,8 +1142,6 @@ static unsigned long init_fadump_header(unsigned long addr) static int register_fadump(void) { unsigned long addr; - void *vaddr; - int ret; /* * If no memory is reserved then we can not register for firmware- @@ -1200,18 +1150,10 @@ static int register_fadump(void) if (!fw_dump.reserve_dump_area_size) return -ENODEV; - ret = fadump_setup_crash_memory_ranges(); - if (ret) - return ret; - addr = fw_dump.fadumphdr_addr; /* Initialize fadump crash info header. */ addr = init_fadump_header(addr); - vaddr = __va(addr); - - pr_debug("Creating ELF core headers at %#016lx\n", addr); - fadump_create_elfcore_headers(vaddr); /* register the future kernel dump with firmware. */ pr_debug("Registering for firmware-assisted kernel dump...\n"); @@ -1230,7 +1172,6 @@ void fadump_cleanup(void) } else if (fw_dump.dump_registered) { /* Un-register Firmware-assisted dump if it was registered. */ fw_dump.ops->fadump_unregister(&fw_dump); - fadump_free_mem_ranges(&crash_mrange_info); } if (fw_dump.ops->fadump_cleanup) @@ -1416,6 +1357,54 @@ static void fadump_release_memory(u64 begin, u64 end) fadump_release_reserved_area(tstart, end); } +/* + * To determine fadump header version, the following conditions apply + * - If magic_number is FADMPINF, return 0 + * - If magic_number is FADMPSIG, return the version value. + * - Otherwise, return -E (header is not valid). + */ +static u32 get_fadump_crash_header_version(struct fadump_crash_info_header *fdh) +{ + if (fdh->magic_number == fadump_str_to_u64("FADMPINF")) + return 0; + else if (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC) + return fdh->version; + + return -EINVAL; +} + +static void fadump_free_elfcorehdr_buf(void) +{ + struct fadump_crash_info_header *fdh; + unsigned long elfcorehdr_vaddr; + + if (!fw_dump.fadumphdr_addr) + return; + + fdh = (struct fadump_crash_info_header *) __va(fw_dump.fadumphdr_addr); + + /* + * For fadump version 1 the memory for elfcorehdr is dynamically + * allocated in second kernel so free it. + */ + if (get_fadump_crash_header_version(fdh) != 1) + return; + + if (fdh->elfcorehdr_addr == 0 || fdh->elfcorehdr_size == 0) + return; + + /* + * Reset the global `elfcorehdr_addr` before freeing `elfcorehdr` + * memory so that other kernel modules, like `vmcore`, do not + * access the invalid memory. + */ + elfcorehdr_addr = ELFCORE_ADDR_MAX; + elfcorehdr_vaddr = (unsigned long) __va(fdh->elfcorehdr_addr); + fadump_free_buffer(elfcorehdr_vaddr, fdh->elfcorehdr_size); + fdh->elfcorehdr_addr = 0; + fdh->elfcorehdr_size = 0; +} + static void fadump_invalidate_release_mem(void) { mutex_lock(&fadump_mutex); @@ -1424,6 +1413,8 @@ static void fadump_invalidate_release_mem(void) return; } + /* Free elfcorehdr buf before cleaning up the fadump reserved area. */ + fadump_free_elfcorehdr_buf(); fadump_cleanup(); mutex_unlock(&fadump_mutex); @@ -1632,6 +1623,93 @@ static void __init fadump_init_files(void) return; } +static int __init fadump_setup_elfcorehdr_buf(struct fadump_crash_info_header *fdh) +{ + char *bufp; + int elf_phdr_cnt; + unsigned long elfcorehdr_size; + + /* + * Program header for CPU notes comes first, followed by one for + * vmcoreinfo, and the remaining program headers correspond to + * memory regions. + */ + elf_phdr_cnt = 2 + fw_dump.boot_mem_regs_cnt + memblock_num_regions(memory); + elfcorehdr_size = sizeof(struct elfhdr) + (elf_phdr_cnt * sizeof(struct elf_phdr)); + elfcorehdr_size = PAGE_ALIGN(elfcorehdr_size); + bufp = (char *)fadump_alloc_buffer(elfcorehdr_size); + if (!bufp) { + pr_err("Failed to allocate %lu bytes for elfcorehdr\n", + elfcorehdr_size); + return -ENOMEM; + } + fdh->elfcorehdr_addr = virt_to_phys(bufp); + fdh->elfcorehdr_size = elfcorehdr_size; + return 0; +} + +/* + * Process an active dump in four steps. First, verify the crash info header + * signature/magic number for integrity and accuracy. Second, if the fadump + * version is greater than 0, prepare the elfcorehdr; for fadump version 0, + * it's already created in the first kernel as part of the fadump reserved + * area. Third, let the platform update CPU notes in elfcorehdr. Finally, + * set elfcorehdr_addr so that the vmcore module can export the elfcore + * header through '/proc/vmcore'. + */ +static void process_fadump(void) +{ + int fadump_version; + struct fadump_crash_info_header *fdh; + + fdh = (struct fadump_crash_info_header *) __va(fw_dump.fadumphdr_addr); + if (!fdh) { + pr_err("Crash info header is empty.\n"); + goto err_out; + } + + fadump_version = get_fadump_crash_header_version(fdh); + + /* + * Negative fadump version indicates that fadump crash info header + * is corrupted. + */ + if (fadump_version < 0) { + pr_err("Crash info header is not valid.\n"); + goto err_out; + } + + /* + * If crashed kernel's fadump crash info header version is 1, + * then create elfcorehdr here. + */ + if (fadump_version >= 1) { + if (fadump_setup_elfcorehdr_buf(fdh)) + goto err_out; + + fadump_populate_elfcorehdr(fdh); + } + + /* + * If dump process fails then invalidate the registration + * and release memory before proceeding for re-registration. + */ + if (fw_dump.ops->fadump_process(&fw_dump) < 0) + goto err_out; + + /* + * elfcore header is now ready to be exported. + * + * set elfcorehdr_addr so that vmcore module will export the + * elfcore header through '/proc/vmcore'. + */ + elfcorehdr_addr = fdh->elfcorehdr_addr; + return; + +err_out: + fadump_invalidate_release_mem(); +} + /* * Prepare for firmware-assisted dump. */ @@ -1651,12 +1729,7 @@ int __init setup_fadump(void) * saving it to the disk. */ if (fw_dump.dump_active) { - /* - * if dump process fails then invalidate the registration - * and release memory before proceeding for re-registration. - */ - if (fw_dump.ops->fadump_process(&fw_dump) < 0) - fadump_invalidate_release_mem(); + process_fadump(); } /* Initialize the kernel dump memory structure and register with f/w */ else if (fw_dump.reserve_dump_area_size) { diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c index 964f464b1b0e..a44d75cefc9e 100644 --- a/arch/powerpc/platforms/powernv/opal-fadump.c +++ b/arch/powerpc/platforms/powernv/opal-fadump.c @@ -526,12 +526,7 @@ static int __init opal_fadump_process(struct fw_dump *fadump_conf) if (!opal_fdm_active || !fadump_conf->fadumphdr_addr) return rc; - /* Validate the fadump crash info header */ fdh = __va(fadump_conf->fadumphdr_addr); - if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { - pr_err("Crash info header is not valid.\n"); - return rc; - } #ifdef CONFIG_OPAL_CORE /* @@ -545,18 +540,7 @@ static int __init opal_fadump_process(struct fw_dump *fadump_conf) kernel_initiated = true; #endif - rc = opal_fadump_build_cpu_notes(fadump_conf, fdh); - if (rc) - return rc; - - /* - * We are done validating dump info and elfcore header is now ready - * to be exported. set elfcorehdr_addr so that vmcore module will - * export the elfcore header through '/proc/vmcore'. - */ - elfcorehdr_addr = fdh->elfcorehdr_addr; - - return rc; + return opal_fadump_build_cpu_notes(fadump_conf, fdh); } static void opal_fadump_region_show(struct fw_dump *fadump_conf, diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadump.c index b5853e9fcc3c..3366082705d5 100644 --- a/arch/powerpc/platforms/pseries/rtas-fadump.c +++ b/arch/powerpc/platforms/pseries/rtas-fadump.c @@ -394,9 +394,6 @@ static int __init rtas_fadump_build_cpu_notes(struct fw_dump *fadump_conf) */ static int __init rtas_fadump_process(struct fw_dump *fadump_conf) { - struct fadump_crash_info_header *fdh; - int rc = 0; - if (!fdm_active || !fadump_conf->fadumphdr_addr) return -EINVAL; @@ -415,25 +412,7 @@ static int __init rtas_fadump_process(struct fw_dump *fadump_conf) return -EINVAL; } - /* Validate the fadump crash info header */ - fdh = __va(fadump_conf->fadumphdr_addr); - if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { - pr_err("Crash info header is not valid.\n"); - return -EINVAL; - } - - rc = rtas_fadump_build_cpu_notes(fadump_conf); - if (rc) - return rc; - - /* - * We are done validating dump info and elfcore header is now ready - * to be exported. set elfcorehdr_addr so that vmcore module will - * export the elfcore header through '/proc/vmcore'. - */ - elfcorehdr_addr = fdh->elfcorehdr_addr; - - return 0; + return rtas_fadump_build_cpu_notes(fadump_conf); } static void rtas_fadump_region_show(struct fw_dump *fadump_conf,
Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection. Memory hotplug or online/offline events is referred as memory add/remove events in reset of the commit message. The current solution to address the aforementioned issue is as follows: Monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information. There are several notable issues associated with re-registering fadump for every memory add/remove events. 1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, it creates a wide window during which fadump is inactive until all memory add/remove events are settled. 2. Re-registering fadump for every memory add/remove event is inefficient. 3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with to additional memblock regions, its size will increase, potentially leading to memory corruption. Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events. At present, the first kernel prepares the elfcorehdr and stores it in the fadump reserved area. Subsequently, it includes the start address of the elfcorehdr in the elfcorehdr_addr attribute of the fadump header. The fadump header holds information like elfcorehdr address, crashing CPU details, etc. The first kernel prepares the fadump header and stores it in the fadump reserved area. In the event of first kernel crash the second boots and access the fadump header prepared by first kernel and do the following in a platform-specific function [rtas|opal]_fadump_process: 1. Sanity check for fadump header 2. Update CPU notes in elfcorehdr 3. Set the global variable elfcorehdr_addr to the address of the fadump header's elfcorehdr. For vmcore module to process it later on. Along with the above, update the setup_fadump()/fadump.c to create elfcorehdr in second/fadump kernel. Section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel if it's not already. To create elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges. At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes for that either. However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and size to the fadump kernel. Table 1 below illustrates kernel's ability to collect dump if either the first/crashed kernel or the second/fadump kernel does not have the changes introduced here. Consider the 'old kernel' as the kernel without this patch, and the 'new kernel' as the kernel with this patch included. +----------+------------------------+------------------------+-------+ | scenario | first/crashed kernel | second/fadump kernel | Dump | +----------+------------------------+------------------------+-------+ | 1 | old kernel | new kernel | Yes | +----------+------------------------+------------------------+-------+ | 2 | new kernel | old kernel | No | +----------+------------------------+------------------------+-------+ Table 1 Scenario 1: ----------- Since the magic number of fadump header is updated, the second kernel can differentiate the crashed kernel is of type 'new kernel' or 'old kernel' and act accordingly. In this scenario, since the crashed kernel is of type 'old kernel,' the fadump kernel skips elfcorehdr creation and uses the one prepared in the first kernel itself to collect the dump. Scenario 2: ----------- Since 'old kernel' as the fadump kernel is NOT capable of processing fadump header with updated magic number from 'new kernel' hence it gracefully fails with the below error and dump collection fails in this scenario. [ 0.007365] rtas fadump: Crash info header is not valid. Add a version field to the fadump_crash_info_header structure to avoid the need to change its magic number in the future. Adding a version field to the fadump header was one of the TODO items listed in Documentation/powerpc/firmware-assisted-dump.rst. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> --- arch/powerpc/include/asm/fadump-internal.h | 24 +- arch/powerpc/kernel/fadump.c | 361 +++++++++++-------- arch/powerpc/platforms/powernv/opal-fadump.c | 18 +- arch/powerpc/platforms/pseries/rtas-fadump.c | 23 +- 4 files changed, 242 insertions(+), 184 deletions(-)