Patchwork Committed, libgcc MMIX: implement static marking of program and data memory

Submitter Hans-Peter Nilsson
Date Oct. 21, 2012, 3:21 a.m.
Message ID <alpine.BSF.2.00.1210202317430.23248@dair.pair.com>
Permalink /patch/192982/
State New

Comments

Hans-Peter Nilsson - Oct. 21, 2012, 3:21 a.m.
With a simulator that doesn't just allocate zeros on any access, it's
necessary to tell the simulator the bounds of defined memory, for both
static and dynamically allocated memory.  This patch implements static
code and data allocation; zeroed data and constants may otherwise not
be loaded.  A patch to newlib is about to be committed to mark
dynamically allocated memory.

The attached mmix-sim.ch (a "literate programming" / ctangle "change
file"; the equivalent of a patch) implements such marking and memory
checking.  Put it into the untarred mmix-20110831.tar.gz (may work
with other versions) before compiling the "mmix" simulator.  Beware
that the distribution terms of the mmix simulator require (in my
layman's interpretation) that the resulting program not be distributed
as any part of the original mmixware package.

There was some fallout from this new checking, but I believe all
needed patches have been at least submitted, though not all approved
and committed.

libgcc:
	* config/mmix/crti.S: Mark program and data addresses using PRELD.
	Remove typo'd and unnecessary alignment-LOC for .data.  Remove
	no-longer-needed LDBU insns.


brgds, H-P
Copyright 2012 Hans-Peter Nilsson.  This file may be freely copied and
distributed, provided that no changes whatsoever are made.  Ah, just
kidding: you may change it as you like, except that this paragraph,
including the above attribution, must be kept unmodified and the
distribution terms not limited.  Add your own attribution below if you
change anything, so people don't blame me.  Special permission is
granted for the copyright owner of the original package to which this
file is a "change file", to include any part or function of this file
into the original package without any attribution or other
distribution terms than those of the original package.

Change file for the MMIXware "mmix" simulator, implementing an add-on:

Unless the added command-line option -R (for Relaxed) is given, the
program must follow certain requirements.  If a program breaks the
requirements, simulation halts and the simulator exits with exit code
127, unless interactive mode is set or the -I option is used.  This
automatic halt-on-break mode is only active as long as no ordinary
breakpoints are set.  (For an otherwise unchanged mmix simulator,
breakpoints cannot be set in non-interactive mode, so this matters
only when there are other local changes.)

On the original unchanged mmix simulator, the only effect of these
requirements is the additional execution time of the annotating
instructions.  The requirements are:

- Control flow goes to an aligned tetra (from instructions GO, PUSHGO,
POP).

- Besides the initial non-zero program image contents, memory is
marked as allocated using the PRELD instruction, with operands
covering the range of addresses being allocated, as if "pre-loaded"
per the documented PRELD semantics.

- Memory for the register stack is explicitly allocated before reads
from previously unused locations, e.g. when using UNSAVE.  (Memory is
still automatically allocated for register stack writes.)

- The downward-growing conventional stack, i.e. the stack-pointer
address in register $254, is initialized in Main, before the first
PUSHJ or equivalent.

- Dynamically loaded code is "registered" as code, using SYNCID
instructions, for the range of instructions to be executed, similar to
the documented semantics of that instruction.

- All executable code for the original program image (i.e. not
including dynamically loaded code) is below address 0x20<<56.
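
As a rough illustration only (not part of the change file), the two
address requirements in the list above can be written as C
predicates.  The helper names are made up for this sketch; the
alignment test mirrors the (w.l&3) != 0 check in the change file, and
the limit is the 0x20<<56 bound from the last item.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a tetra is 4 bytes, so a valid jump target
   has its two lowest address bits clear. */
static bool is_aligned_tetra(uint64_t addr)
{
  return (addr & 3) == 0;
}

/* Hypothetical helper: code of the original program image is
   required to lie below address 0x20<<56. */
static bool is_below_code_limit(uint64_t addr)
{
  return addr < ((uint64_t)0x20 << 56);
}
```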

The goal is to simplify debugging buggy programs and to avoid the
negative effects of automatically allocated memory, i.e. that buggy
programs run forever or in lucky cases (unlucky for some) until they
eat up all memory on the host machine.

Beware that zeros due to source-code data are sometimes not entered in
the program image: the program should make sure to always "allocate"
static data if that may happen.  (N.B.: PRELD has no effect on
already-allocated data.)  Data normally allocated to the "common" or
.bss section (which holds zero-initialized data without explicit
initialization and may also hold some data with *explicit*
zero-initialization) is no exception.  Thus, using a gcc version
before 4.8 (the first version that does the required PRELD for
zero-initialized data and has non-zero data at the beginning and end
of initialized data in its support code) and a newlib version before
1.21 (the first that does PRELD on dynamically allocated data, e.g. as
a result of malloc) will likely result in spurious failures.

This is not intended as a fool-proof mechanism.  For example,
neighboring data in the same 512-tetra block may be automatically
"allocated" too.  This mechanism also interferes with the breakpoint
mechanism in that the disallowed accesses are mapped to the ordinary
breakpoints, and will show up as such with the 'B' command (it's a
feature).  This will not otherwise interfere with explicitly set or
cleared ordinary breakpoints.  To wit, they will not be cleared by
explicit-access by SYNCID or PRELD.
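
To make the block granularity concrete, here is a small sketch
assuming 4-byte tetras, i.e. blocks of 512*4 = 2048 bytes aligned on
2048-byte boundaries (this matches the 0xfffff800 mask used in
init_new_mem).  The helper names are hypothetical and not part of the
simulator.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_TETRAS 512u                 /* tetras per memory block */
#define BLOCK_BYTES  (BLOCK_TETRAS * 4u)  /* 2048 bytes */

/* Hypothetical helper: start address of the block containing addr. */
static uint64_t block_start(uint64_t addr)
{
  return addr & ~(uint64_t)(BLOCK_BYTES - 1);
}

/* Hypothetical helper: do two addresses share one block? */
static bool same_block(uint64_t a, uint64_t b)
{
  return block_start(a) == block_start(b);
}
```

Two addresses for which same_block() holds may thus become accessible
together, which is why the checking is not fool-proof.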


Some apologies for the level of literacy of the changes (and some
whining) seem appropriate.  Literate programming doesn't include
literate patching...  We have to present .ch-changes in the order
found in the .w file, due to limitations in ctangle (should in theory
be fixable).  This order differs from the preferred order in which
the changes are presented at the literate level.  Regarding the
code-address sanity-check duplication, we also can't nest
section-macros.  A @d-macro gets too long for one line, and as macros
can't span multiple lines, we settle for code duplication, which leads
to slightly uglier code than intended.  (Ok, the last one is not
thoroughly investigated, I just gave up.)


First the code-address sanity checks.  We do them at PUSHGO, POP and
GO, see (a), (b) and (c) below.

Then memory contents checking.  The extra checking is conditional on
options and simulation context, so we need a few more global
variables.  They're at (d) below.

We also need a few more bits.  We piggy-back on the trace/breakpoint
machinery, but with special marking so we can tell them from ordinary
breakpoints when needed.  The definitions are at (e) below.

At certain points and for certain conditions, we turn the machinery
on and off.  See (f) for command-line options and (g) for when it's
being armed.  In (n) and (o) we handle writes to the register stack.

We initialize these extra bits when memory is (implicitly) allocated,
i.e. at each new_mem call, we call a new function init_new_mem once we
know the address of the block.  You see init_new_mem at (m) and the top
call at (h).  The only other call needed, when memory is implicitly
allocated, is at (i).  Note that we don't need any changes at the
memory accesses themselves to get the checking we need, as we ride for
free (i.e. with no added execution-time penalty) on the execute
breakpoint and the read and write breakpoints.
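
The decision made in init_new_mem can be paraphrased with plain
64-bit addresses instead of the simulator's octa halves.  This is a
sketch only: the struct and function names are invented for
illustration, and the parameters stand in for
break_on_undefined_memory_armed, m->loc, high_stack_address and
g[254].

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
  bool data_ok;  /* reads and writes allowed without a break */
  bool exec_ok;  /* execution allowed without a break */
} block_access;

/* Hypothetical helper: high_stack > block_loc >= sp, where sp is
   rounded down to the start of its 2048-byte block. */
static bool in_stack_window(uint64_t block_loc, uint64_t high_stack,
                            uint64_t sp)
{
  return block_loc < high_stack && block_loc >= (sp & ~(uint64_t)0x7ff);
}

/* Hypothetical helper: default access for a freshly allocated block. */
static block_access default_block_access(bool armed, uint64_t block_loc,
                                         uint64_t high_stack, uint64_t sp)
{
  block_access a = { true, true };  /* not armed: everything allowed */
  if (armed) {
    a.exec_ok = false;              /* never executable by default */
    a.data_ok = in_stack_window(block_loc, high_stack, sp);
  }
  return a;
}
```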

Memory is validated where the SYNCID and PRELD instructions are
handled, see part (j).

The mechanism for setting a breakpoint also clears it.  We drop the
special bits for a memory location if it's touched by a breakpoint
command, so explicitly set or cleared ordinary breakpoints work as
before.  See (k).
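
The interplay of the marking bits with ordinary breakpoints can be
tried out in isolation.  The sketch below restates the allow_exec and
allow_access macros from (e) and the assignment from (k) as plain C
functions on a single flag byte.  The function names are hypothetical,
and the values of the three low bits (exec, write, read) are an
assumption here; only trace_bit, unexec_bit and inaccess_bit are
spelled out in this change file.

```c
enum {
  EXEC_BIT     = 1 << 0,  /* assumed value */
  WRITE_BIT    = 1 << 1,  /* assumed value */
  READ_BIT     = 1 << 2,  /* assumed value */
  TRACE_BIT    = 1 << 3,
  UNEXEC_BIT   = 1 << 4,  /* distinct from an exec breakpoint */
  INACCESS_BIT = 1 << 5   /* distinct from a read/write breakpoint */
};

/* As done for SYNCID: drop the marking, leave ordinary breakpoints. */
static unsigned char allow_exec(unsigned char bkpt)
{
  if (bkpt & UNEXEC_BIT)
    bkpt = (unsigned char)(bkpt & ~(UNEXEC_BIT | EXEC_BIT));
  return bkpt;
}

/* As done for PRELD: drop the marking, leave ordinary breakpoints. */
static unsigned char allow_access(unsigned char bkpt)
{
  if (bkpt & INACCESS_BIT)
    bkpt = (unsigned char)(bkpt & ~(INACCESS_BIT | READ_BIT | WRITE_BIT));
  return bkpt;
}

/* A breakpoint command with new low bits k also drops the marking. */
static unsigned char set_breakpoint(unsigned char bkpt, unsigned char k)
{
  return (unsigned char)((bkpt & -8 & ~(UNEXEC_BIT | INACCESS_BIT)) | k);
}
```

Note that allow_access leaves a location alone when inaccess_bit is
not set, so an ordinary read or write breakpoint survives a PRELD.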

The one final detail is the updated help-text, also suitable as a
feature-presence signature, at (l), and handling of accessibility
breakpoints making simulation go to interactive mode, at (p).

@x Part (m), accessibility of implicitly allocated memory.
mem_node* new_mem @,@,@[ARGS((void))@];@+@t}\6{@>
@y
static void init_new_mem @,@,@[ARGS((mem_node*))@];@+@t}\6{@>
static void init_new_mem(m)
  mem_node* m;
{
 unsigned char default_access=0;
 unsigned i;
 if (break_on_undefined_memory_armed) {
   default_access=unexec_bit|exec_bit;
   /* traditional stack access ok (high_stack_address > loc >= g[254]) */
   if ((m->loc.h<high_stack_address.h ||
        (m->loc.h==high_stack_address.h && m->loc.l<high_stack_address.l)) &&
       (m->loc.h>g[254].h ||
        /* remember that m->loc points at the *start* of the block */
        (m->loc.h==g[254].h && m->loc.l>=(g[254].l&0xfffff800))))
     ;
   else default_access|=inaccess_bit|read_bit|write_bit;
 }
 for (i=0;i<sizeof(m->dat)/sizeof(m->dat[0]);i++)
   m->dat[i].bkpt = default_access;
}

mem_node* new_mem @,@,@[ARGS((void))@];@+@t}\6{@>
@z

Part (h).
@x At top, creating the root memory node.
mem_root->loc.h=0x40000000;
@y
mem_root->loc.h=0x40000000;
init_new_mem(mem_root);
@z

Part (d), extra variables for memory validity checking.
@x
mem_node *mem_root; /* root of the treap */
@y
mem_node *mem_root; /* root of the treap */
bool relaxed_memory_model; /* auto-define memory and execute anything? */
bool break_on_undefined_memory_armed; /* break, or define memory automatically? */
bool undefined_memory_accessed; /* it happened */
bool auto_nonint_bkpt_cont; /* if true, simulation continues */
octa high_stack_address; /* at first PUSHJ/PUSHGO this must be initialized */
@z

@x Part (o), strictness exceptions when handling the register stack.
mem_tetra* mem_find @,@,@[ARGS((octa))@];@+@t}\6{@>
@y
mem_tetra* mem_find @,@,@[ARGS((octa))@];@+@t}\6{@>
static mem_tetra* mem_find_regstack @,@,@[ARGS((octa))@];@+@t}\6{@>
static mem_tetra* mem_find_regstack(addr)
  octa addr;
{
 /* We just roll back to the -R rules, temporarily. */
 /* FIXME: we could do better, like disallowing execution. */
 bool saved_break_on_undefined_memory=break_on_undefined_memory_armed;
 mem_tetra* m;
 break_on_undefined_memory_armed=false;
 m=mem_find(addr);
 break_on_undefined_memory_armed=saved_break_on_undefined_memory;
 return m;
}
@z

Part (i).
@x In mem_find.
  *q=new_mem();
  (*q)->loc=key;
@y
  {
   mem_node *qp = new_mem();
   qp->loc=key;
   init_new_mem(qp);
   *q=qp;
  }
@z

Part (e), memory checking macros.
@x Extra breakpoint flags for asserting valid memory and executability.

@d trace_bit (1<<3)
@y

@d trace_bit (1<<3)
@d inaccess_bit (1<<5) /* distinct from a read/write breakpoint */
@d unexec_bit (1<<4) /* distinct from an exec breakpoint */
@d allow_exec(m) if ((m)->bkpt&unexec_bit) (m)->bkpt&=~(unexec_bit|exec_bit)
@d allow_access(m) if ((m)->bkpt&inaccess_bit) (m)->bkpt&=~(inaccess_bit|read_bit|write_bit)
@z

@x Part (n), exceptions when handling the register-stack.
  register mem_tetra *ll=mem_find(g[rS]);
@y
  register mem_tetra *ll=mem_find_regstack(g[rS]);
@z

This is part (a) in checking sanity of code-addresses.
@x Check PUSHGO addresses.  We don't need to check the limited-range PUSHJ.
case PUSHGO: case PUSHGOI: inst_ptr=w;@+goto push;
@y
case PUSHGO: case PUSHGOI:
 if (!relaxed_memory_model && (w.l&3) != 0) {
   sprintf(lhs,"!misaligned pointer to");
   goto break_inst;
 }
 inst_ptr=w;
 goto push;
@z

Part (g), going strict for the validity checking.
@x "Arm" the defined-memory-checking machinery on the first PUSHJ/PUSHGO.
push:@+if (xx>=G) {
@y
push:
 if (!relaxed_memory_model) {
   if (!break_on_undefined_memory_armed)
     high_stack_address=g[254];
   break_on_undefined_memory_armed=true;
 }
 if (xx>=G) {
@z

This is part (b) checking code-address sanity.
@x For POP, we may have problems displaying the caller address...
 y=g[rJ];@+ z.l=yz<<2;@+ inst_ptr=oplus(y,z);
@y
 y=g[rJ];@+ z.l=yz<<2;
 w=oplus(y,z);
 if (!relaxed_memory_model && (w.l&3) != 0) {
   sprintf(lhs,"!misaligned pointer to");
   goto break_inst;
 }
 inst_ptr=w;
@z

@x Part (j), SYNCID and PRELD setting accessibility.
case SYNCID: case SYNCIDI: case PREST: case PRESTI:
case SYNCD: case SYNCDI: case PREGO: case PREGOI:
case PRELD: case PRELDI: x=incr(w,xx);@+break;
@y
case SYNCID: case SYNCIDI:
 for (i=0;i<=xx;i+=4) {
   x=incr(w,i);
   ll=mem_find(x);
   allow_exec(ll);
 }
case PREST: case PRESTI: case SYNCD: case SYNCDI:
case PREGO: case PREGOI: x=incr(w,xx);@+break;
case PRELD: case PRELDI:
 for (i=0;i<=xx;i+=4) {
   x=incr(w,i);
   ll=mem_find(x);
   allow_access(ll);
 }
 x=incr(w,xx);@+break;
@z

This is part (c) in the code-address sanity-checking.
@x Check GO similarly to how we check PUSHGO.
case GO: case GOI: x=inst_ptr;@+inst_ptr=w;@+goto store_x;
@y
case GO: case GOI:
 if (!relaxed_memory_model && (w.l&3) != 0) {
   sprintf(lhs,"!misaligned pointer to");
   goto break_inst;
 }
 x=inst_ptr;@+inst_ptr=w;@+goto store_x;
@z

@x Part (p), halting after a strictness breakpoint.
    if (interact_after_break) interacting=true, interact_after_break=false;
@y
    if (interact_after_break) interacting=true, interact_after_break=false;
    else if (breakpoint&&!auto_nonint_bkpt_cont&&!interacting) break;
@z

@x Cont'd.; exit with non-zero exit code for a strictness halt.
  return g[255].l; /* provide rudimentary feedback for non-interactive runs */
@y
  return breakpoint&&!interacting&&!auto_nonint_bkpt_cont&&!halted?
   127 : g[255].l; /* provide rudimentary feedback for non-interactive runs */
@z

@x Part (f), command-line option handling.
 case 'r': stack_tracing=true;@+return;
@y
 case 'r': stack_tracing=true;@+return;
 case 'R': relaxed_memory_model=auto_nonint_bkpt_cont=true;@+return;
@z

@x Part (l), update for the help-text.
"-r    trace hidden details of the register stack\n",@|
@y
"-r    trace hidden details of the register stack\n",@|
"-R    relaxed memory model with implicit memory allocation etc.\n",@|
@z

@x Part (k), clearing the accessibility marking for ordinary breakpoints.
   ll->bkpt=(ll->bkpt&-8)|k;
@y
   ll->bkpt=(ll->bkpt&-8&~(unexec_bit|inaccess_bit))|k;
   auto_nonint_bkpt_cont=true;
@z

Patch

Index: crti.S
===================================================================
--- crti.S	(revision 192353)
+++ crti.S	(working copy)
@@ -35,20 +35,25 @@  see the files COPYING3 and COPYING.RUNTI
 % respectively, so the compiler can switch between them pretending they're
 % segments.

-% This little treasure is here so the 32 lowest address bits of user data
-% will not be zero.  Because of truncation, that would cause testcase
-% gcc.c-torture/execute/980701-1.c to incorrectly fail.
+% This little treasure (some contents) is required so the 32 lowest
+% address bits of user data will not be zero.  Because of truncation,
+% that would cause testcase gcc.c-torture/execute/980701-1.c to
+% incorrectly fail.

 	.data	! mmixal:= 8H LOC Data_Segment
 	.p2align 3
-	LOC @+(8-@)@7
-	OCTA 2009
+dstart	OCTA 2009

 	.text	! mmixal:= 9H LOC 8B; LOC #100
 	.global Main

 % The __Stack_start symbol is provided by the link script.
 stackpp	OCTA __Stack_start
+crtstxt	OCTA _init	% Assumed to be the lowest executed address.
+	OCTA __etext	% Assumed to be beyond the highest executed address.
+
+crtsdat	OCTA dstart	% Assumed to be the lowest accessed address.
+	OCTA _end	% Assumed to be beyond the highest accessed address.

 % "Main" is the magic symbol the simulator jumps to.  We want to go
 % on to "main".
@@ -56,16 +61,47 @@  stackpp	OCTA __Stack_start
 Main	SETL	$255,32
 	PUT	rG,$255

+% Make sure we have valid memory for addresses in .text and .data (and
+% .bss, but we include this in .data), for the benefit of mmo-using
+% simulators that require validation of addresses for which contents
+% is not present.  Due to its implicit-zero nature, zeros in contents
+% may be left out in the mmo format, but we don't know the boundaries
+% of those zero-chunks; for mmo files from binutils, they correspond
+% to the beginning and end of sections in objects before linking.  We
+% validate the contents by executing PRELD (0; one byte) on each
+% 2048-byte-boundary of our .text .data, and we assume this size
+% matches the magic lowest-denominator chunk-size for all
+% validation-requiring simulators.  The effect of the PRELD (any size)
+% is assumed to be the same as initial loading of the contents, as
+% long as the PRELD happens before the first PUSHJ/PUSHGO.  If it
+% happens after that, we'll need to distinguish between
+% access-for-execution and read/write access.
+
+	GETA	$255,crtstxt
+	LDOU	$2,$255,0
+	ANDNL	$2,#7ff		% Align the start at a 2048-boundary.
+	LDOU	$3,$255,8
+	SETL	$4,2048
+0H	PRELD	0,$2,0
+	ADDU	$2,$2,$4
+	CMP	$255,$2,$3
+	BN	$255,0B
+
+	GETA	$255,crtsdat
+	LDOU	$2,$255,0
+	ANDNL	$2,#7ff
+	LDOU	$3,$255,8
+0H	PRELD	0,$2,0
+	ADDU	$2,$2,$4
+	CMP	$255,$2,$3
+	BN	$255,0B
+
 % Initialize the stack pointer.  It is supposedly made a global
 % zero-initialized (allowed to change) register in crtn.S; we use the
 % explicit number.
 	GETA	$255,stackpp
 	LDOU	$254,$255,0

-% Make sure we get more than one mem, to simplify counting cycles.
-	LDBU	$255,$1,0
-	LDBU	$255,$1,1
-
 	PUSHJ	$2,_init

 #ifdef __MMIX_ABI_GNU__