diff mbox

[U-Boot,RFC] rewrite doc/README.arm-unaligned-accesses

Message ID 1392727962-22039-1-git-send-email-albert.u.boot@aribaud.net
State RFC
Delegated to: Albert ARIBAUD
Headers show

Commit Message

Albert ARIBAUD Feb. 18, 2014, 12:52 p.m. UTC
Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
---
There has been a few back-and-forths (and sideways too) about how
unaligned accesses are considered in ARM U-Boot. This post is to (try
and) get us together in one place, get things straight about what is
currently done and why as far as alignment is concerned, and to get
doc/README.arm-unaligned-accesses is clear and consistent with it.

Please, in discussing this topic, always elaborate on your opinions
on whatever you are commenting. Avoid saying "this is wrong" or "you
should do XYZ instead" without also telling what exactly is wrong and
why, or what you compare XYZ to and what are the pros and cons of each;
your reasoning may seem obvious to you, but it probably is not to the
person whose post you are commenting, and won't be to the readers of
the resulting doc/README-arm-unaligned-accesses file if it is to
include your contribution.

I also suggest giving concrete examples, code excerpts, possibly even
generated assembly language if required to make a point; just make sure
that any generated code is actually generated (as opposed to manually
constructed) and how it was produced (including compiler options), so
that people can reproduce it.

And yes, this goes for me as well :) so don't hesitate to point out
parts of this text which require improvement... and why they do. :)

 doc/README.arm-unaligned-accesses | 379 +++++++++++++++++++++++++++++---------
 1 file changed, 289 insertions(+), 90 deletions(-)
diff mbox

Patch

diff --git a/doc/README.arm-unaligned-accesses b/doc/README.arm-unaligned-accesses
index c37d135..5873689 100644
--- a/doc/README.arm-unaligned-accesses
+++ b/doc/README.arm-unaligned-accesses
@@ -1,55 +1,229 @@ 
-If you are reading this because of a data abort: the following MIGHT
-be relevant to your abort, if it was caused by an alignment violation.
-In order to determine this, use the PC from the abort dump along with
-an objdump -s -S of the u-boot ELF binary to locate the function where
-the abort happened; then compare this function with the examples below.
-If they match, then you've been hit with a compiler generated unaligned
-access, and you should rewrite your code or add -mno-unaligned-access
-to the command line of the offending file.
+Notes:
 
-Note that the PC shown in the abort message is relocated. In order to
-be able to match it to an address in the ELF binary dump, you will need
-to know the relocation offset. If your target defines CONFIG_CMD_BDI
-and if you can get to the prompt and enter commands before the abort
-happens, then command "bdinfo" will give you the offset. Otherwise you
-will need to try a build with DEBUG set, which will display the offset,
-or use a debugger and set a breakpoint at relocate_code() to see the
-offset (passed as an argument).
-
-*
-
-Since U-Boot runs on a variety of hardware, some only able to perform
-unaligned accesses with a strong penalty, some unable to perform them
-at all, the policy regarding unaligned accesses is to not perform any,
-unless absolutely necessary because of hardware or standards.
-
-Also, on hardware which permits it, the core is configured to throw
-data abort exceptions on unaligned accesses in order to catch these
-unallowed accesses as early as possible.
-
-Until version 4.7, the gcc default for performing unaligned accesses
-(-mno-unaligned-access) is to emulate unaligned accesses using aligned
-loads and stores plus shifts and masks. Emulated unaligned accesses
-will not be caught by hardware. These accesses may be costly and may
-be actually unnecessary. In order to catch these accesses and remove
-or optimize them, option -munaligned-access is explicitly set for all
-versions of gcc which support it.
-
-From gcc 4.7 onward starting at armv7 architectures, the default for
-performing unaligned accesses is to use unaligned native loads and
-stores (-munaligned-access), because the cost of unaligned accesses
-has dropped on armv7 and beyond. This should not affect U-Boot's
-policy of controlling unaligned accesses, however the compiler may
-generate uncontrolled unaligned accesses on its own in at least one
-known case: when declaring a local initialized char array, e.g.
+i) If you are reading this because of a data abort: section 10 is what
+you're looking for, but please read the whole document.
 
-function foo()
-{
-	char buffer[] = "initial value";
-/* or */
-	char buffer[] = { 'i', 'n', 'i', 't', 0 };
-	...
-}
+ii) In order to make sure the following is self-sufficient, it goes
+through the basics of alignment and assumes only good, not expert,
+knowledge of the C language.
+
+1. C99 alignment requirements
+
+The C99 standard [1] (henceforth: 'C99') defines alignment requirements
+as a "requirement that objects of a particular type be located on
+storage boundaries with addresses that are particular multiples of a
+byte address".
+
+In C99, unaligned accesses (those which which do not respect alignment
+requirements of the object type being accessed) are deemed 'undefined'.
+This means programs which contain unaligned accesses might build and
+execute as if there were no alignment constraints, or build and execute
+but not as expected, or build and crash at execution, or not build
+at all--or even crash the compiler.
+
+2. Implementation alignment requirements
+
+While C99 does define alignment requirements, it does not lay out any
+actual alignment requirements, because these depend greatly on the
+hardware involved; they are thus defined by C99 implementations.
+
+For ARM, the C alignment requirements are laid out in the ARM EABI [2].
+For instance, is is the ARM EABI which sets the alignment constraint of
+type 'long' to four-byte boundaries.
+
+Alignment requirements for a given architecture may differ from
+hardware capabilities, i.e. they might be stricter than what the
+hardware can actually do. One example is (32-bit) x86, which can do
+unaligned accesses at the hardware level at some performance cost, but
+has stricter requirement analogous to the ARM EABI ones. ARM is a mixed
+bag: some older ARM hardware cannot perform unaligned access at all;
+some can but at a cost; some can at virtually no cost.
+
+Further, even when a given architecture is capable of emitting such
+unaligned accesses at the core or CPU level, at a higher (system) level
+they might be forbidden because the target address falls within a
+region in which only aligned accesses are possible.
+
+3. Native vs emulated unaligned accesses.
+
+There is a different between the alignment of accesses in a C program
+source code and the alignment of accesses at the core or CPU level. In
+the following, translated accesses (core or CPU accesses) will be
+qualified as /native/ accesses to distinguish them from (untranslated) C
+program accesses.
+
+A C99 implementation might translate an unaligned access into a native
+unaligned access, or it might emulate it by a combination of native
+aligned accesses. 
+
+4. Alignment requirements in U-Boot
+
+As U-Boot runs on a variety of hardware using a variety of C99
+implementations, alignment requirements for the U-Boot code are the
+strictest possible so that all implementations building U-Boot will have
+their requirements satisfied. For instance, aligning longs to four-byte
+boundaries is compatible with all implementations and is thus the
+requirement for U-Boot.
+
+[U-Boot alignment requirements are actually stricter than just C99: in
+U-Boot, at least some accesses of a given width must not be broken into
+smaller accesses or lumped into a wider access, because they are done
+to or from a device for which each physical access may have side
+effects. C99 does not explicitly state such a constraint except in
+general terms, that a C99 implementation .]
+
+The strict alignment requirements imply that all type declarations in
+U-Boot should respect alignment requirements; especially, that no
+structure should contain unaligned members.
+
+5. Purposeful unaligned accesses in U-Boot
+
+However, there might be cases where some data representation may need
+to include objects which are not aligned to their requirements. This
+happens in structs representing protocols such as IP or USB, for
+instance, or de facto standards such as FAT. Taking FAT as an example,
+here is the start of the FAT header, expressed as a succession of C
+types from position 0 in the first disk sector:
+
+        0       3       jump instruction
+        3       8       Name of the tool which formatted the FAT
+        11      2       Number of bytes per sector
+
+Since the third field is a short, as per ARM EABI it should be
+even-aligned. However, it is not: its offset is 11, an odd value.
+Therefore, a struct representing the FAT header (examples below are
+adapted from actual code in fs/fdos/dos.h) would either have to split
+the field in two smaller bytes...
+
+    typedef struct bootsector
+    {
+        unsigned char jump [3];  /* 0  Jump to boot code */
+        char banner[BANNER_LG];	 /* 3  OEM name & version */
+        unsigned char secsiz_lo; /* 11 Bytes per sector LO */
+        unsigned char secsiz_hi; /* 12 Bytes per sector HI */
+        [...]
+    } Directory_t;
+
+... but this is awkward and makes accesses to the field more
+complicated to express and read. The alternative is to use a short
+field, made unaligned by way of a 'packed' attribute on the struct:
+
+    typedef struct bootsector
+    {
+        unsigned char jump [3]; /* 0  Jump to boot code */
+        char banner[BANNER_LG];	/* 3  OEM name & version */
+        unsigned short secsiz;  /* 11 Bytes per sector */
+        [...]
+    } __attribute__ ((packed))  Directory_t;
+
+The latter is what is done in the U-Boot code, and the access is then
+explicitly turned from one unaligned 16-bit to two aligned 8-bit ones
+by using the macro __le16_to_cpu (which is slightly below U-Boot
+requirements, as we should be using put_unaligned_le16 instead).
+
+6. Unintended unaligned accesses in U-Boot
+
+As stated in section 4, U-Boot code should be conforming enough that it
+would not perform accesses which are unaligned as per ARM EABI, and
+access to intentionally unaligned C objects in U-Boot should be done by
+emulating through get_unaligned/put_unaligned macros.
+
+However, mistakes do happen, and therefore a patch might slip past
+review, or a developer might write new code before submitting it, which
+could cause some unintended unaligned access.
+
+Mostly, the cause of unintended unaligned accesses fall into two
+categories: needlessly packed structs and pointer arithmetic mistakes,
+the latter including compile-time mistakes and run-time mistakes.
+
+Such unaligned accesses might only actually happen on some architecture
+different from the one the developer is working on, which will delay
+discovery of the issue and complicate its analysis and resolution.
+As we cannot test for all targets at each release, it might even happen
+that some release of U-Boot out there would fail to work for a given
+target because an unintended unaligned access was not detected early.
+
+Early detection of unaligned accesses is therefore important, even on
+architecture which could handle such unaligned accesses natively or
+by emulation, for the benefit of architectures which could not.
+
+7. ARM: detecting unintentional unaligned accesses by hardware
+
+[This section assumes that unaligned accesses are translated into native
+unaligned accesses rather than emulated. For a discussion of this
+assumption, see next section.]
+
+In order to catch unintended unaligned accesses on ARM platforms, we can
+use the A bit in SRC: setting it will cause the core to throw a Data
+Abort when a native unaligned access occurs.
+
+This will catch pointer arithmetic errors when they result in badly
+aligned pointers, as well as accesses to unaligned fields in packed
+structures of -munaligned-access is in effect (see next section).
+
+8. ARM: detecting unintentional unaligned accesses by software
+
+In the previous section, we have chosen to configure the ARM hardware
+to catch native unaligned accesses. However, whether unaligned accesses
+are translated into native or emulated accesses depends, on ARM, on a
+compiler option: -m[no-]unaligned-access.
+
+With -munaligned-access, the compiler is allowed to translate unaligned
+accesses from a C program into native unaligned accesses at the core
+level.
+
+With -mno-unaligned-access, the compiler is *not* allowed to translate
+unaligned accesses from a C program into native unaligned accesses at
+the core level, and must therefore emulate them with a combination of
+smaller aligned native accesses.
+
+At first sight, -mno-unaligne-access seems to solve our problem as it
+prevents the core from emitting native unaligned accesses; and
+-munaligned-access appears counter-intuitive as it allows the compiler
+to emit native unaligned accesses, which we precisely want to avoid.
+
+However, if -mno-unaligned-access prevents native unaligned accesses on
+ARM, it has no effect on other architectures for which it does not even
+always have an equivalent; thus, a C program which builds and runs
+without error with -mno-unaligned-access on ARM platforms might fail to
+build, or build but fail to run properly, on other platforms.
+
+On the other hand, -munaligned-access only causes natie unaligned
+accesses if the C program has unaligned accesses to begin with; for C
+programs in which unaligned accesses are all intentional, the compiler
+will not generate native unaligned accesses (except in very specific
+situations which can be controlled, see next section).
+
+We can see that while -mno-unaligned-accesses prevents the ill effects
+of unintended unaligned accesses by turning them into smaller native
+aligned ones, this translation also prevents detecting unaligned
+accesses early, whereas -munaligned-access does.
+
+Again, building ARM U-Boot with -munaligned-access might seem
+contradictory with setting the SRC A bit to catch native unaligned
+accesses; but this is because what we want is not to get rid of native
+unaligned accesses /per se/; what we want is to find and fix unintended
+unaligned accesses in the source code, and we use -munaligned-access
+for this because it makes a better job of translating unintended
+unaligned accesses into native unaligned accesses, and thus at getting
+the SCR.A bit setting to detect them, than -mno-unaligned-access does. 
+
+Again: if we had wanted to just not emit native unaligned accesses, then
+globally turning -mno-unaligned-access on would have been our option of
+choice; but this is not what we want.
+
+9. Corner cases: array initializations
+
+There is actually one single corner case where gcc might generate
+native unaligned accesses from a C program which does not at first
+sight appear to contain unaligned accesses at all. Consider the
+following:
+
+    function foo()
+    {
+        char buffer[] = "initial value";
+        ...
+    }
 
 Under -munaligned-accesses with optimizations on, this declaration
 causes the compiler to generate native loads from the literal string
@@ -57,8 +231,12 @@  and native stores to the buffer, and the literal string alignment
 cannot be controlled. If it is misaligned, then the core will throw
 a data abort exception.
 
-Quite probably the same might happen for 16-bit array initializations
-where the constant is aligned on a boundary which is a multiple of 2
+[under -munaligned-accesses *without* optimizations, there is no issue
+as the compiler emits the initialization as a memcpy() which does not
+cause any native unaligned access]
+
+The same might happen for 16-bit array initializations where the
+literal constant array is aligned on a boundary which is a multiple of 2
 but not of 4:
 
 function foo()
@@ -67,12 +245,17 @@  function foo()
 	...
 }
 
-The long term solution to this issue is to add an option to gcc to
-allow controlling the general alignment of data, including constant
-initialization values.
+Under -munaligned-accesses with optimizations on, the compiler will
+use unaligned long native accesses to copy the literal into the
+variable.
+
+The long term solution to this issue is to modify the compiler in
+order to get finer control on this particular initialization either by
+controlling literal alignment, or by controlling use of native
+unaligned accesses in this case.
 
-However this will only apply to the version of gcc which will have such
-an option. For other versions, there are four workarounds:
+However this will at best only apply to a future version of gcc. For
+existing versions, there are four workarounds:
 
 a) Enforce as a rule that array initializations as described above
    are forbidden. This is generally not acceptable as they are valid,
@@ -80,43 +263,59 @@  a) Enforce as a rule that array initializations as described above
    is when they actually equate to a const char* declaration, i.e. the
    array is initialized and never modified in the function's scope.
 
-b) Drop the requirement on unaligned accesses at least for ARMv7,
-   i.e. do not throw a data abort exception upon unaligned accesses.
-   But that will allow adding badly aligned code to U-Boot, only for
-   it to fail when re-used with a stricter target, possibly once the
-   bad code is already in mainline.
+b) Drop the hardware requirement on unaligned accesses at least for
+   ARMv7, i.e. do not throw a data abort exception upon unaligned
+   accesses. But that will allow adding badly aligned code to U-Boot,
+   only for it to fail when re-used with a stricter target, possibly
+   once the bad code is already in mainline.
 
 c) Relax the -munaligned-access rule globally. This will prevent native
-   unaligned accesses of course, but that will also hide any bug caused
-   by a bad unaligned access, making it much harder to diagnose it. It
-   is actually what already happens when building ARM targets with a
-   pre-4.7 gcc, and it may actually already hide some bugs yet unseen
-   until the target gets compiled with -munaligned-access.
-
-d) Relax the -munaligned-access rule only for for files susceptible to
-   the local initialized array issue and for armv7 architectures and
+   unaligned accesses from being generated of course, but that will also
+   hide any bug caused by an unwanted unaligned access, making it much
+   harder    to diagnose it. It is actually what already happens
+   when building ARM targets with a pre-4.7 gcc, and it may actually
+   already hide some bugs yet unseen until the target gets compiled
+   with -munaligned-access.
+
+d) Relax the -munaligned-access rule only for files susceptible to the
+   local initialized array issue and for armv7 architectures and
    beyond. This minimizes the quantity of code which can hide unwanted
    misaligned accesses.
 
 The option retained is d).
 
-Considering that actual occurrences of the issue are rare (as of this
-writing, 5 files out of 7840 in U-Boot, or .3%, contain an initialized
-local char array which cannot actually be replaced with a const char*),
-contributors should not be required to systematically try and detect
-the issue in their patches.
-
-Detecting files susceptible to the issue can be automated through a
-filter installed as a hook in .git which recognizes local char array
-initializations. Automation should err on the false positive side, for
-instance flagging non-local arrays as if they were local if they cannot
-be told apart.
-
-In any case, detection shall not prevent committing the patch, but
-shall pre-populate the commit message with a note to the effect that
-this patch contains an initialized local char or 16-bit array and thus
-should be protected from the gcc 4.7 issue.
-
-Upon a positive detection, either $(PLATFORM_NO_UNALIGNED) should be
-added to CFLAGS for the affected file(s), or if the array is a pseudo
-const char*, it should be replaced by an actual one.
+10. Alignment issues and Data Aborts
+
+If you are reading this file because of the message displayed during a
+data abort: the following MIGHT be relevant to your abort, if it was
+caused by an alignment requirement violation. This may have been caused
+by bad pointer arithmetic, or improper access to unaligned fields in a
+packed struct, or a corner case as described in the previous section,
+or an unlikely new corner case.
+
+In order to determine which is which, you must first determine whether
+the data abort is an alignment abort. Refer to the ARM architecture
+reference manual to determine the exact cause of the abort.
+
+If the abort is indeed an alignment one, then use the PC from the abort
+dump along with an objdump -d -S of the u-boot ELF binary to locate the
+function where the abort happened; then compare this function with the
+examples of this file. If they match, then you've been hit with a
+compiler-generated unaligned access, and you should rewrite your code
+or add -mno-unaligned-access to the command line of the offending file.
+Otherwise, you're probably having a bad pointer arithmetic case.
+
+Note that the PC shown in the abort message is relocated. In order to
+be able to match it to an address in the ELF binary dump, you will need
+to know the relocation offset. If your target defines CONFIG_CMD_BDI
+and if you can get to the prompt and enter commands before the abort
+happens, then command "bdinfo" will give you the offset. Otherwise you
+will need to try a build with DEBUG set, which will display the offset
+(but you'll have to witness the abort again on the DEBUG build, as the
+abort PC will have changed), or use a debugger and set a breakpoint at
+relocate_code() to see the offset (passed as an argument).
+
+11. References
+
+[1] ISO/IEC 9899:1999
+[2] ARM IHI 0042E (AAPCS)