diff mbox

[23/27] Documentation: the "intro" subdirectory

Message ID 1414774977-25605-23-git-send-email-dmalcolm@redhat.com
State New
Headers show

Commit Message

David Malcolm Oct. 31, 2014, 5:02 p.m. UTC
The "intro" subdirectory of gcc/jit/docs consists of a 4-part tutorial
aimed at experienced developers who are new users of the library, taking
them from beginning use all the way up to adding JIT-compilation to an
interpreter.

gcc/jit/
	* docs/intro/factorial.png: New.
	* docs/intro/index.rst: New.
	* docs/intro/sum-of-squares.png: New.
	* docs/intro/tutorial01.rst: New.
	* docs/intro/tutorial02.rst: New.
	* docs/intro/tutorial03.rst: New.
	* docs/intro/tutorial04.rst: New.
---
 gcc/jit/docs/intro/factorial.png      |  Bin 0 -> 183838 bytes
 gcc/jit/docs/intro/index.rst          |   27 +
 gcc/jit/docs/intro/sum-of-squares.png |  Bin 0 -> 22839 bytes
 gcc/jit/docs/intro/tutorial01.rst     |   52 ++
 gcc/jit/docs/intro/tutorial02.rst     |  349 +++++++++++
 gcc/jit/docs/intro/tutorial03.rst     |  378 +++++++++++
 gcc/jit/docs/intro/tutorial04.rst     | 1108 +++++++++++++++++++++++++++++++++
 7 files changed, 1914 insertions(+)
 create mode 100644 gcc/jit/docs/intro/factorial.png
 create mode 100644 gcc/jit/docs/intro/index.rst
 create mode 100644 gcc/jit/docs/intro/sum-of-squares.png
 create mode 100644 gcc/jit/docs/intro/tutorial01.rst
 create mode 100644 gcc/jit/docs/intro/tutorial02.rst
 create mode 100644 gcc/jit/docs/intro/tutorial03.rst
 create mode 100644 gcc/jit/docs/intro/tutorial04.rst

Comments

Jeff Law Oct. 31, 2014, 9:29 p.m. UTC | #1
On 10/31/14 11:02, David Malcolm wrote:
> The "intro" subdirectory of gcc/jit/docs consists of a 4-part tutorial
> aimed at experienced developers who are new users of the library, taking
> them from beginning use all the way up to adding JIT-compilation to an
> interpreter.
>
> gcc/jit/
> 	* docs/intro/factorial.png: New.
> 	* docs/intro/index.rst: New.
> 	* docs/intro/sum-of-squares.png: New.
> 	* docs/intro/tutorial01.rst: New.
> 	* docs/intro/tutorial02.rst: New.
> 	* docs/intro/tutorial03.rst: New.
> 	* docs/intro/tutorial04.rst: New.
I'm not actually reviewing the .rst bits, but instead looking at the 
generate docs -- and I'm mostly just looking at them from a higher level 
(ie, what belongs, what doesn't and such).  Given you've addressed the 
issues raised earlier I don't expect further changes in this area.

Patches 23, 24, 25 are all OK.

I don't see #26 or #27 in the series?  Were they particularly large?


jeff
David Malcolm Oct. 31, 2014, 9:44 p.m. UTC | #2
On Fri, 2014-10-31 at 15:29 -0600, Jeff Law wrote:
> On 10/31/14 11:02, David Malcolm wrote:
> > The "intro" subdirectory of gcc/jit/docs consists of a 4-part tutorial
> > aimed at experienced developers who are new users of the library, taking
> > them from beginning use all the way up to adding JIT-compilation to an
> > interpreter.
> >
> > gcc/jit/
> > 	* docs/intro/factorial.png: New.
> > 	* docs/intro/index.rst: New.
> > 	* docs/intro/sum-of-squares.png: New.
> > 	* docs/intro/tutorial01.rst: New.
> > 	* docs/intro/tutorial02.rst: New.
> > 	* docs/intro/tutorial03.rst: New.
> > 	* docs/intro/tutorial04.rst: New.
> I'm not actually reviewing the .rst bits, but instead looking at the 
> generate docs -- and I'm mostly just looking at them from a higher level 
> (ie, what belongs, what doesn't and such).  Given you've addressed the 
> issues raised earlier I don't expect further changes in this area.
> 
> Patches 23, 24, 25 are all OK.
> 
> I don't see #26 or #27 in the series?  Were they particularly large?

Yes: I deliberately didn't send 19, 26, 27 as they were large (170k and
up), and are essentially already approved:

-rw-rw-r--. 1 david david 267546 Oct 31 12:35 0019-Testsuite-for-the-JIT.patch
-rw-rw-r--. 1 david david 177604 Oct 31 12:35 0026-Prebuilt-texinfo-documentation-for-the-JIT-library.patch
-rw-rw-r--. 1 david david 188093 Oct 31 12:35 0027-ChangeLog-files.patch

(and it's all in git, fwiw).

Some notes on each of these:

re [PATCH 19/27] Testsuite for the JIT

An earlier version of this was posted as:
  "[PATCH 07/10] Testsuite for the JIT (Re: Patches 5-10 of jit merger (was: Re: [PATCH 0/5] Merger of jit branch (v2)))"
    https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01248.html

which you approved in
  https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01413.html
with:
> OK when rest of JIT bits are approved.
> I just did some very rough sanity checking -- the details of the
> testsuite I'm totally leaving in your hands :-)

The only changes have been to jit.exp, to handle running the
testsuite without needing to install.



re [PATCH 26/27] Prebuilt texinfo documentation for the JIT library

This is for the benefit of those without Sphinx installed,
and is autogenerated by running "make texinfo" in the gcc/jit/docs
subdirectory.

I posted an earlier version of this as:
  "[PATCH 09/10] Prebuilt texinfo documentation for the JIT library (Re: Patches 5-10 of jit merger)"
    https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01395.html

which you approved in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01415.html
> Assuming these are built from the sphinx stuff, these are fine
> once the rest of the JIT stuff is approved -- as it updating these
> things if there's any updates necessary to the source sphinx
> documentation.

which preapproves this updated version.



re [PATCH 27/27] ChangeLog files

The dmalcolm/jit branch has ChangeLog.jit files describing the
history of the branch.  Add them to trunk for the record.

Also, there will be a gcc/jit/ChangeLog.

I posted an earlier version of this as:
  "[PATCH 10/10] ChangeLog files (Re: Patches 5-10 of jit merger (was: Re: [PATCH 0/5] Merger of jit branch (v2)))"
    https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01250.html

which you approved in
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01416.html :
> OK once the rest of the JIT stuff is approved.
> Not going to look at these in any detail.

Relative to that, this version merely adds new entries for the work
I've done on the branch since the original merger request, and is
thus presumably an obvious change.


Do you need me to post anything more for these three?

Thanks
Dave
Jeff Law Nov. 3, 2014, 4:23 p.m. UTC | #3
On 10/31/14 15:44, David Malcolm wrote:
>>
>> I don't see #26 or #27 in the series?  Were they particularly large?
>
> Yes: I deliberately didn't send 19, 26, 27 as they were large (170k and
> up), and are essentially already approved:
OK.  No worries then.

Jeff
diff mbox

Patch

diff --git a/gcc/jit/docs/intro/factorial.png b/gcc/jit/docs/intro/factorial.png
new file mode 100644
index 0000000..dff47ce
Binary files /dev/null and b/gcc/jit/docs/intro/factorial.png differ
diff --git a/gcc/jit/docs/intro/index.rst b/gcc/jit/docs/intro/index.rst
new file mode 100644
index 0000000..d3bcec9
--- /dev/null
+++ b/gcc/jit/docs/intro/index.rst
@@ -0,0 +1,27 @@ 
+.. Copyright (C) 2014 Free Software Foundation, Inc.
+   Originally contributed by David Malcolm <dmalcolm@redhat.com>
+
+   This is free software: you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see
+   <http://www.gnu.org/licenses/>.
+
+Tutorial
+========
+
+.. toctree::
+   :maxdepth: 2
+
+   tutorial01.rst
+   tutorial02.rst
+   tutorial03.rst
+   tutorial04.rst
diff --git a/gcc/jit/docs/intro/sum-of-squares.png b/gcc/jit/docs/intro/sum-of-squares.png
new file mode 100644
index 0000000..7a3b4af
Binary files /dev/null and b/gcc/jit/docs/intro/sum-of-squares.png differ
diff --git a/gcc/jit/docs/intro/tutorial01.rst b/gcc/jit/docs/intro/tutorial01.rst
new file mode 100644
index 0000000..b1a5128
--- /dev/null
+++ b/gcc/jit/docs/intro/tutorial01.rst
@@ -0,0 +1,52 @@ 
+.. Copyright (C) 2014 Free Software Foundation, Inc.
+   Originally contributed by David Malcolm <dmalcolm@redhat.com>
+
+   This is free software: you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see
+   <http://www.gnu.org/licenses/>.
+
+.. default-domain:: c
+
+Tutorial part 1: "Hello world"
+==============================
+
+Before we look at the details of the API, let's look at building and
+running programs that use the library.
+
+Here's a toy "hello world" program that uses the library to synthesize
+a call to `printf` and uses it to write a message to stdout.
+
+Don't worry about the content of the program for now; we'll cover
+the details in later parts of this tutorial.
+
+   .. literalinclude:: ../examples/tut01-hello-world.c
+    :language: c
+
+Copy the above to `tut01-hello-world.c`.
+
+Assuming you have the jit library installed, build the test program
+using:
+
+.. code-block:: console
+
+  $ gcc \
+      tut01-hello-world.c \
+      -o tut01-hello-world \
+      -lgccjit
+
+You should then be able to run the built program:
+
+.. code-block:: console
+
+  $ ./tut01-hello-world
+  hello world
diff --git a/gcc/jit/docs/intro/tutorial02.rst b/gcc/jit/docs/intro/tutorial02.rst
new file mode 100644
index 0000000..f52499e
--- /dev/null
+++ b/gcc/jit/docs/intro/tutorial02.rst
@@ -0,0 +1,349 @@ 
+.. Copyright (C) 2014 Free Software Foundation, Inc.
+   Originally contributed by David Malcolm <dmalcolm@redhat.com>
+
+   This is free software: you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see
+   <http://www.gnu.org/licenses/>.
+
+.. default-domain:: c
+
+Tutorial part 2: Creating a trivial machine code function
+---------------------------------------------------------
+
+Consider this C function:
+
+.. code-block:: c
+
+   int square (int i)
+   {
+     return i * i;
+   }
+
+How can we construct this at run-time using libgccjit?
+
+First we need to include the relevant header:
+
+.. code-block:: c
+
+  #include <libgccjit.h>
+
+All state associated with compilation is associated with a
+:c:type:`gcc_jit_context *`.
+
+Create one using :c:func:`gcc_jit_context_acquire`:
+
+.. code-block:: c
+
+  gcc_jit_context *ctxt;
+  ctxt = gcc_jit_context_acquire ();
+
+The JIT library has a system of types.  It is statically-typed: every
+expression is of a specific type, fixed at compile-time.  In our example,
+all of the expressions are of the C `int` type, so let's obtain this from
+the context, as a :c:type:`gcc_jit_type *`, using
+:c:func:`gcc_jit_context_get_type`:
+
+.. code-block:: c
+
+  gcc_jit_type *int_type =
+    gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
+
+:c:type:`gcc_jit_type *` is an example of a "contextual" object: every
+entity in the API is associated with a :c:type:`gcc_jit_context *`.
+
+Memory management is easy: all such "contextual" objects are automatically
+cleaned up for you when the context is released, using
+:c:func:`gcc_jit_context_release`:
+
+.. code-block:: c
+
+  gcc_jit_context_release (ctxt);
+
+so you don't need to manually track and cleanup all objects, just the
+contexts.
+
+Although the API is C-based, there is a form of class hierarchy, which
+looks like this::
+
+  +- gcc_jit_object
+      +- gcc_jit_location
+      +- gcc_jit_type
+         +- gcc_jit_struct
+      +- gcc_jit_field
+      +- gcc_jit_function
+      +- gcc_jit_block
+      +- gcc_jit_rvalue
+          +- gcc_jit_lvalue
+             +- gcc_jit_param
+
+There are casting methods for upcasting from subclasses to parent classes.
+For example, :c:func:`gcc_jit_type_as_object`:
+
+.. code-block:: c
+
+   gcc_jit_object *obj = gcc_jit_type_as_object (int_type);
+
+One thing you can do with a :c:type:`gcc_jit_object *` is
+to ask it for a human-readable description, using
+:c:func:`gcc_jit_object_get_debug_string`:
+
+.. code-block:: c
+
+   printf ("obj: %s\n", gcc_jit_object_get_debug_string (obj));
+
+giving this text on stdout:
+
+.. code-block:: bash
+
+   obj: int
+
+This is invaluable when debugging.
+
+Let's create the function.  To do so, we first need to construct
+its single parameter, specifying its type and giving it a name,
+using :c:func:`gcc_jit_context_new_param`:
+
+.. code-block:: c
+
+  gcc_jit_param *param_i =
+    gcc_jit_context_new_param (ctxt, NULL, int_type, "i");
+
+Now we can create the function, using
+:c:func:`gcc_jit_context_new_function`:
+
+.. code-block:: c
+
+  gcc_jit_function *func =
+    gcc_jit_context_new_function (ctxt, NULL,
+                                  GCC_JIT_FUNCTION_EXPORTED,
+                                  int_type,
+                                  "square",
+                                  1, &param_i,
+                                  0);
+
+To define the code within the function, we must create basic blocks
+containing statements.
+
+Every basic block contains a list of statements, eventually terminated
+by a statement that either returns, or jumps to another basic block.
+
+Our function has no control-flow, so we just need one basic block:
+
+.. code-block:: c
+
+  gcc_jit_block *block = gcc_jit_function_new_block (func, NULL);
+
+Our basic block is relatively simple: it immediately terminates by
+returning the value of an expression.
+
+We can build the expression using :c:func:`gcc_jit_context_new_binary_op`:
+
+.. code-block:: c
+
+   gcc_jit_rvalue *expr =
+     gcc_jit_context_new_binary_op (
+       ctxt, NULL,
+       GCC_JIT_BINARY_OP_MULT, int_type,
+       gcc_jit_param_as_rvalue (param_i),
+       gcc_jit_param_as_rvalue (param_i));
+
+A :c:type:`gcc_jit_rvalue *` is another example of a
+:c:type:`gcc_jit_object *` subclass.  We can upcast it using
+:c:func:`gcc_jit_rvalue_as_object` and as before print it with
+:c:func:`gcc_jit_object_get_debug_string`.
+
+.. code-block:: c
+
+   printf ("expr: %s\n",
+           gcc_jit_object_get_debug_string (
+             gcc_jit_rvalue_as_object (expr)));
+
+giving this output:
+
+.. code-block:: bash
+
+   expr: i * i
+
+Creating the expression in itself doesn't do anything; we have to add
+this expression to a statement within the block.  In this case, we use it
+to build a return statement, which terminates the basic block:
+
+.. code-block:: c
+
+  gcc_jit_block_end_with_return (block, NULL, expr);
+
+OK, we've populated the context.  We can now compile it using
+:c:func:`gcc_jit_context_compile`:
+
+.. code-block:: c
+
+   gcc_jit_result *result;
+   result = gcc_jit_context_compile (ctxt);
+
+and get a :c:type:`gcc_jit_result *`.
+
+We can now use :c:func:`gcc_jit_result_get_code` to look up a specific
+machine code routine within the result, in this case, the function we
+created above.
+
+.. code-block:: c
+
+   void *fn_ptr = gcc_jit_result_get_code (result, "square");
+   if (!fn_ptr)
+     {
+       fprintf (stderr, "NULL fn_ptr");
+       goto error;
+     }
+
+We can now cast the pointer to an appropriate function pointer type, and
+then call it:
+
+.. code-block:: c
+
+  typedef int (*fn_type) (int);
+  fn_type square = (fn_type)fn_ptr;
+  printf ("result: %d", square (5));
+
+.. code-block:: bash
+
+  result: 25
+
+
+Options
+*******
+
+To get more information on what's going on, you can set debugging flags
+on the context using :c:func:`gcc_jit_context_set_bool_option`.
+
+.. (I'm deliberately not mentioning
+    :c:macro:`GCC_JIT_BOOL_OPTION_DUMP_INITIAL_TREE` here since I think
+    it's probably more of use to implementors than to users)
+
+Setting :c:macro:`GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE` will dump a
+C-like representation to stderr when you compile (GCC's "GIMPLE"
+representation):
+
+.. code-block:: c
+
+   gcc_jit_context_set_bool_option (
+     ctxt,
+     GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE,
+     1);
+   result = gcc_jit_context_compile (ctxt);
+
+.. code-block:: c
+
+  square (signed int i)
+  {
+    signed int D.260;
+
+    entry:
+    D.260 = i * i;
+    return D.260;
+  }
+
+We can see the generated machine code in assembler form (on stderr) by
+setting :c:macro:`GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE` on the context
+before compiling:
+
+.. code-block:: c
+
+  gcc_jit_context_set_bool_option (
+    ctxt,
+    GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE,
+    1);
+  result = gcc_jit_context_compile (ctxt);
+
+.. code-block:: gas
+
+        .file   "fake.c"
+        .text
+        .globl  square
+        .type   square, @function
+  square:
+  .LFB6:
+        .cfi_startproc
+        pushq   %rbp
+        .cfi_def_cfa_offset 16
+        .cfi_offset 6, -16
+        movq    %rsp, %rbp
+        .cfi_def_cfa_register 6
+        movl    %edi, -4(%rbp)
+  .L14:
+        movl    -4(%rbp), %eax
+        imull   -4(%rbp), %eax
+        popq    %rbp
+        .cfi_def_cfa 7, 8
+        ret
+        .cfi_endproc
+  .LFE6:
+        .size   square, .-square
+        .ident  "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-0.5.1920c315ff984892399893b380305ab36e07b455.fc20)"
+        .section       .note.GNU-stack,"",@progbits
+
+By default, no optimizations are performed, the equivalent of GCC's
+`-O0` option.  We can turn things up to e.g. `-O3` by calling
+:c:func:`gcc_jit_context_set_int_option` with
+:c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`:
+
+.. code-block:: c
+
+  gcc_jit_context_set_int_option (
+    ctxt,
+    GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL,
+    3);
+
+.. code-block:: gas
+
+        .file   "fake.c"
+        .text
+        .p2align 4,,15
+        .globl  square
+        .type   square, @function
+  square:
+  .LFB7:
+        .cfi_startproc
+  .L16:
+        movl    %edi, %eax
+        imull   %edi, %eax
+        ret
+        .cfi_endproc
+  .LFE7:
+        .size   square, .-square
+        .ident  "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-0.5.1920c315ff984892399893b380305ab36e07b455.fc20)"
+        .section        .note.GNU-stack,"",@progbits
+
+Naturally this has only a small effect on such a trivial function.
+
+
+Full example
+************
+
+Here's what the above looks like as a complete program:
+
+   .. literalinclude:: ../examples/tut02-square.c
+    :lines: 1-
+    :language: c
+
+Building and running it:
+
+.. code-block:: console
+
+  $ gcc \
+      tut02-square.c \
+      -o tut02-square \
+      -lgccjit
+
+  # Run the built program:
+  $ ./tut02-square
+  result: 25
diff --git a/gcc/jit/docs/intro/tutorial03.rst b/gcc/jit/docs/intro/tutorial03.rst
new file mode 100644
index 0000000..0845150
--- /dev/null
+++ b/gcc/jit/docs/intro/tutorial03.rst
@@ -0,0 +1,378 @@ 
+.. Copyright (C) 2014 Free Software Foundation, Inc.
+   Originally contributed by David Malcolm <dmalcolm@redhat.com>
+
+   This is free software: you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see
+   <http://www.gnu.org/licenses/>.
+
+Tutorial part 3: Loops and variables
+------------------------------------
+Consider this C function:
+
+ .. code-block:: c
+
+  int loop_test (int n)
+  {
+    int sum = 0;
+    for (int i = 0; i < n; i++)
+      sum += i * i;
+    return sum;
+  }
+
+This example demonstrates some more features of libgccjit, with local
+variables and a loop.
+
+To break this down into libgccjit terms, it's usually easier to reword
+the `for` loop as a `while` loop, giving:
+
+ .. code-block:: c
+
+  int loop_test (int n)
+  {
+    int sum = 0;
+    int i = 0;
+    while (i < n)
+    {
+      sum += i * i;
+      i++;
+    }
+    return sum;
+  }
+
+Here's what the final control flow graph will look like:
+
+    .. figure:: sum-of-squares.png
+      :alt: image of a control flow graph
+
+As before, we include the libgccjit header and make a
+:c:type:`gcc_jit_context *`.
+
+.. code-block:: c
+
+  #include <libgccjit.h>
+
+  void test (void)
+  {
+    gcc_jit_context *ctxt;
+    ctxt = gcc_jit_context_acquire ();
+
+The function works with the C `int` type:
+
+.. code-block:: c
+
+  gcc_jit_type *the_type =
+    gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
+  gcc_jit_type *return_type = the_type;
+
+though we could equally well make it work on, say, `double`:
+
+.. code-block:: c
+
+  gcc_jit_type *the_type =
+    gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_DOUBLE);
+
+Let's build the function:
+
+.. code-block:: c
+
+  gcc_jit_param *n =
+    gcc_jit_context_new_param (ctxt, NULL, the_type, "n");
+  gcc_jit_param *params[1] = {n};
+  gcc_jit_function *func =
+    gcc_jit_context_new_function (ctxt, NULL,
+				  GCC_JIT_FUNCTION_EXPORTED,
+				  return_type,
+				  "loop_test",
+				  1, params, 0);
+
+Expressions: lvalues and rvalues
+********************************
+
+The base class of expression is the :c:type:`gcc_jit_rvalue *`,
+representing an expression that can be on the *right*-hand side of
+an assignment: a value that can be computed somehow, and assigned
+*to* a storage area (such as a variable).  It has a specific
+:c:type:`gcc_jit_type *`.
+
+Anothe important class is :c:type:`gcc_jit_lvalue *`.
+A :c:type:`gcc_jit_lvalue *`. is something that can of the *left*-hand
+side of an assignment: a storage area (such as a variable).
+
+In other words, every assignment can be thought of as:
+
+.. code-block:: c
+
+   LVALUE = RVALUE;
+
+Note that :c:type:`gcc_jit_lvalue *` is a subclass of
+:c:type:`gcc_jit_rvalue *`, where in an assignment of the form:
+
+.. code-block:: c
+
+   LVALUE_A = LVALUE_B;
+
+the `LVALUE_B` implies reading the current value of that storage
+area, assigning it into the `LVALUE_A`.
+
+So far the only expressions we've seen are `i * i`:
+
+.. code-block:: c
+
+   gcc_jit_rvalue *expr =
+     gcc_jit_context_new_binary_op (
+       ctxt, NULL,
+       GCC_JIT_BINARY_OP_MULT, int_type,
+       gcc_jit_param_as_rvalue (param_i),
+       gcc_jit_param_as_rvalue (param_i));
+
+which is a :c:type:`gcc_jit_rvalue *`, and the various function
+parameters: `param_i` and `param_n`, instances of
+:c:type:`gcc_jit_param *`, which is a subclass of
+:c:type:`gcc_jit_lvalue *` (and, in turn, of :c:type:`gcc_jit_rvalue *`):
+we can both read from and write to function parameters within the
+body of a function.
+
+Our new example has a couple of local variables.  We create them by
+calling :c:func:`gcc_jit_function_new_local`, supplying a type and a
+name:
+
+.. code-block:: c
+
+  /* Build locals:  */
+  gcc_jit_lvalue *i =
+    gcc_jit_function_new_local (func, NULL, the_type, "i");
+  gcc_jit_lvalue *sum =
+    gcc_jit_function_new_local (func, NULL, the_type, "sum");
+
+These are instances of :c:type:`gcc_jit_lvalue *` - they can be read from
+and written to.
+
+Note that there is no precanned way to create *and* initialize a variable
+like in C:
+
+.. code-block:: c
+
+   int i = 0;
+
+Instead, having added the local to the function, we have to separately add
+an assignment of `0` to `local_i` at the beginning of the function.
+
+Control flow
+************
+
+This function has a loop, so we need to build some basic blocks to
+handle the control flow.  In this case, we need 4 blocks:
+
+1. before the loop (initializing the locals)
+2. the conditional at the top of the loop (comparing `i < n`)
+3. the body of the loop
+4. after the loop terminates (`return sum`)
+
+so we create these as :c:type:`gcc_jit_block *` instances within the
+:c:type:`gcc_jit_function *`:
+
+.. code-block:: c
+
+  gcc_jit_block *b_initial =
+    gcc_jit_function_new_block (func, "initial");
+  gcc_jit_block *b_loop_cond =
+    gcc_jit_function_new_block (func, "loop_cond");
+  gcc_jit_block *b_loop_body =
+    gcc_jit_function_new_block (func, "loop_body");
+  gcc_jit_block *b_after_loop =
+    gcc_jit_function_new_block (func, "after_loop");
+
+We now populate each block with statements.
+
+The entry block `b_initial` consists of initializations followed by a jump
+to the conditional.  We assign `0` to `i` and to `sum`, using
+:c:func:`gcc_jit_block_add_assignment` to add
+an assignment statement, and using :c:func:`gcc_jit_context_zero` to get
+the constant value `0` for the relevant type for the right-hand side of
+the assignment:
+
+.. code-block:: c
+
+  /* sum = 0; */
+  gcc_jit_block_add_assignment (
+    b_initial, NULL,
+    sum,
+    gcc_jit_context_zero (ctxt, the_type));
+
+  /* i = 0; */
+  gcc_jit_block_add_assignment (
+    b_initial, NULL,
+    i,
+    gcc_jit_context_zero (ctxt, the_type));
+
+We can then terminate the entry block by jumping to the conditional:
+
+.. code-block:: c
+
+  gcc_jit_block_end_with_jump (b_initial, NULL, b_loop_cond);
+
+The conditional block is equivalent to the line `while (i < n)` from our
+C example. It contains a single statement: a conditional, which jumps to
+one of two destination blocks depending on a boolean
+:c:type:`gcc_jit_rvalue *`, in this case the comparison of `i` and `n`.
+We build the comparison using :c:func:`gcc_jit_context_new_comparison`:
+
+.. code-block:: c
+
+   gcc_jit_rvalue *guard =
+     gcc_jit_context_new_comparison (
+       ctxt, NULL,
+       GCC_JIT_COMPARISON_GE,
+       gcc_jit_lvalue_as_rvalue (i),
+       gcc_jit_param_as_rvalue (n));
+
+and can then use this to add `b_loop_cond`'s sole statement, via
+:c:func:`gcc_jit_block_end_with_conditional`:
+
+.. code-block:: c
+
+  gcc_jit_block_end_with_conditional (b_loop_cond, NULL, guard);
+
+Next, we populate the body of the loop.
+
+The C statement `sum += i * i;` is an assignment operation, where an
+lvalue is modified "in-place".  We use
+:c:func:`gcc_jit_block_add_assignment_op` to handle these operations:
+
+.. code-block:: c
+
+  /* sum += i * i */
+  gcc_jit_block_add_assignment_op (
+    b_loop_body, NULL,
+    sum,
+    GCC_JIT_BINARY_OP_PLUS,
+    gcc_jit_context_new_binary_op (
+      ctxt, NULL,
+      GCC_JIT_BINARY_OP_MULT, the_type,
+      gcc_jit_lvalue_as_rvalue (i),
+      gcc_jit_lvalue_as_rvalue (i)));
+
+The `i++` can be thought of as `i += 1`, and can thus be handled in
+a similar way.  We use :c:func:`gcc_jit_context_one` to get the constant
+value `1` (for the relevant type) for the right-hand side
+of the assignment.
+
+.. code-block:: c
+
+  /* i++ */
+  gcc_jit_block_add_assignment_op (
+    b_loop_body, NULL,
+    i,
+    GCC_JIT_BINARY_OP_PLUS,
+    gcc_jit_context_one (ctxt, the_type));
+
+.. note::
+
+  For numeric constants other than 0 or 1, we could use
+  :c:func:`gcc_jit_context_new_rvalue_from_int` and
+  :c:func:`gcc_jit_context_new_rvalue_from_double`.
+
+The loop body completes by jumping back to the conditional:
+
+.. code-block:: c
+
+  gcc_jit_block_end_with_jump (b_loop_body, NULL, b_loop_cond);
+
+Finally, we populate the `b_after_loop` block, reached when the loop
+conditional is false.  We want to generate the equivalent of:
+
+.. code-block:: c
+
+   return sum;
+
+so the block is just one statement:
+
+.. code-block:: c
+
+  /* return sum */
+  gcc_jit_block_end_with_return (
+    b_after_loop,
+    NULL,
+    gcc_jit_lvalue_as_rvalue (sum));
+
+.. note::
+
+   You can intermingle block creation with statement creation,
+   but given that the terminator statements generally include references
+   to other blocks, I find it's clearer to create all the blocks,
+   *then* all the statements.
+
+We've finished populating the function.  As before, we can now compile it
+to machine code:
+
+.. code-block:: c
+
+   gcc_jit_result *result;
+   result = gcc_jit_context_compile (ctxt);
+
+   typedef int (*loop_test_fn_type) (int);
+   loop_test_fn_type loop_test =
+    (loop_test_fn_type)gcc_jit_result_get_code (result, "loop_test");
+   if (!loop_test)
+     goto error;
+   printf ("result: %d", loop_test (10));
+
+.. code-block:: bash
+
+   result: 285
+
+
+Visualizing the control flow graph
+**********************************
+
+You can see the control flow graph of a function using
+:c:func:`gcc_jit_function_dump_to_dot`:
+
+.. code-block:: c
+
+  gcc_jit_function_dump_to_dot (func, "/tmp/sum-of-squares.dot");
+
+giving a .dot file in GraphViz format.
+
+You can convert this to an image using `dot`:
+
+.. code-block:: bash
+
+   $ dot -Tpng /tmp/sum-of-squares.dot -o /tmp/sum-of-squares.png
+
+or use a viewer (my preferred one is xdot.py; see
+https://github.com/jrfonseca/xdot.py; on Fedora you can
+install it with `yum install python-xdot`):
+
+    .. figure:: sum-of-squares.png
+      :alt: image of a control flow graph
+
+Full example
+************
+
+   .. literalinclude:: ../examples/tut03-sum-of-squares.c
+    :lines: 1-
+    :language: c
+
+Building and running it:
+
+.. code-block:: console
+
+  $ gcc \
+      tut03-sum-of-squares.c \
+      -o tut03-sum-of-squares \
+      -lgccjit
+
+  # Run the built program:
+  $ ./tut03-sum-of-squares
+  loop_test returned: 285
diff --git a/gcc/jit/docs/intro/tutorial04.rst b/gcc/jit/docs/intro/tutorial04.rst
new file mode 100644
index 0000000..cafdddb
--- /dev/null
+++ b/gcc/jit/docs/intro/tutorial04.rst
@@ -0,0 +1,1108 @@ 
+.. Copyright (C) 2014 Free Software Foundation, Inc.
+   Originally contributed by David Malcolm <dmalcolm@redhat.com>
+
+   This is free software: you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see
+   <http://www.gnu.org/licenses/>.
+
+Tutorial part 4: Adding JIT-compilation to a toy interpreter
+------------------------------------------------------------
+In this example we construct a "toy" interpreter, and add JIT-compilation
+to it.
+
+Our toy interpreter
+*******************
+
+It's a stack-based interpreter, and is intended as a (very simple) example
+of the kind of bytecode interpreter seen in dynamic languages such as
+Python, Ruby etc.
+
+For the sake of simplicity, our toy virtual machine is very limited:
+
+  * The only data type is `int`
+
+  * It can only work on one function at a time (so that the only
+    function call that can be made is to recurse).
+
+  * Functions can only take one parameter.
+
+  * Functions have a stack of `int` values.
+
+  * We'll implement function call within the interpreter by calling a
+    function in our implementation, rather than implementing our own
+    frame stack.
+
+  * The parser is only good enough to get the examples to work.
+
+Naturally, a real interpreter would be much more complicated that this.
+
+The following operations are supported:
+
+====================== ======================== =============== ==============
+Operation              Meaning                  Old Stack       New Stack
+====================== ======================== =============== ==============
+DUP                    Duplicate top of stack.  ``[..., x]``    ``[..., x, x]``
+ROT                    Swap top two elements    ``[..., x, y]`` ``[..., y, x]``
+                       of stack.
+BINARY_ADD             Add the top two elements ``[..., x, y]`` ``[..., (x+y)]``
+                       on the stack.
+BINARY_SUBTRACT        Likewise, but subtract.  ``[..., x, y]`` ``[..., (x-y)]``
+BINARY_MULT            Likewise, but multiply.  ``[..., x, y]`` ``[..., (x*y)]``
+BINARY_COMPARE_LT      Compare the top two      ``[..., x, y]`` ``[..., (x<y)]``
+                       elements on the stack
+                       and push a nonzero/zero
+                       if (x<y).
+RECURSE                Recurse, passing the top ``[..., x]``    ``[..., fn(x)]``
+                       of the stack, and
+                       popping the result.
+RETURN                 Return the top of the    ``[x]``         ``[]``
+                       stack.
+PUSH_CONST `arg`       Push an int const.       ``[...]``       ``[..., arg]``
+JUMP_ABS_IF_TRUE `arg` Pop; if top of stack was ``[..., x]``    ``[...]``
+                       nonzero, jump to
+                       ``arg``.
+====================== ======================== =============== ==============
+
+Programs can be interpreted, disassembled, and compiled to machine code.
+
+The interpreter reads ``.toy`` scripts.  Here's what a simple recursive
+factorial program looks like, the script ``factorial.toy``.
+The parser ignores lines beginning with a `#`.
+
+   .. literalinclude:: ../examples/tut04-toyvm/factorial.toy
+    :lines: 1-
+    :language: c
+
+The interpreter is a simple infinite loop with a big ``switch`` statement
+based on what the next opcode is:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Execute the given function.  */
+    :end-before: /* JIT compilation.  */
+    :language: c
+
+Compiling to machine code
+*************************
+We want to generate machine code that can be cast to this type and
+then directly executed in-process:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Functions are compiled to this function ptr type.  */
+    :end-before: enum opcode
+    :language: c
+
+Our compiler isn't very sophisticated; it takes the implementation of
+each opcode above, and maps it directly to the operations supported by
+the libgccjit API.
+
+How should we handle the stack?  In theory we could calculate what the
+stack depth will be at each opcode, and optimize away the stack
+manipulation "by hand".  We'll see below that libgccjit is able to do
+this for us, so we'll implement stack manipulation
+in a direct way, by creating a ``stack`` array and ``stack_depth``
+variables, local within the generated function, equivalent to this C code:
+
+.. code-block:: c
+
+  int stack_depth;
+  int stack[MAX_STACK_DEPTH];
+
+We'll also have local variables ``x`` and ``y`` for use when implementing
+the opcodes, equivalent to this:
+
+.. code-block:: c
+
+  int x;
+  int y;
+
+This means our compiler has the following state:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* JIT compilation.  */
+    :end-before: /* Stack manipulation.  */
+    :language: c
+
+Setting things up
+*****************
+
+First we create our types:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Create types.  */
+    :end-before: /* The constant value 1.  */
+    :language: c
+
+along with extracting a useful `int` constant:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* The constant value 1.  */
+    :end-before: /* Create locations.  */
+    :language: c
+
+We'll implement push and pop in terms of the ``stack`` array and
+``stack_depth``.  Here are helper functions for adding statements to
+a block, implementing pushing and popping values:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Stack manipulation.  */
+    :end-before: /* The main compilation hook.  */
+    :language: c
+
+We will support single-stepping through the generated code in the
+debugger, so we need to create :c:type:`gcc_jit_location` instances, one
+per operation in the source code.  These will reference the lines of
+e.g. ``factorial.toy``.
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Create locations.  */
+    :end-before: /* Creating the function.  */
+    :language: c
+
+Let's create the function itself.  As usual, we create its parameter
+first, then use the parameter to create the function:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Creating the function.  */
+    :end-before: /* Create stack lvalues.  */
+    :language: c
+
+We create the locals within the function.
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Create stack lvalues.  */
+    :end-before: /* 1st pass: create blocks, one per opcode.
+    :language: c
+
+Populating the function
+***********************
+
+There's some one-time initialization, and the API treats the first block
+you create as the entrypoint of the function, so we need to create that
+block first:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: first.  */
+    :end-before: /* Create a block per operation.  */
+    :language: c
+
+We can now create blocks for each of the operations.  Most of these will
+be consolidated into larger blocks when the optimizer runs.
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Create a block per operation.  */
+    :end-before: /* Populate the initial block.  */
+    :language: c
+
+Now that we have a block it can jump to when it's done, we can populate
+the initial block:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Populate the initial block.  */
+    :end-before: /* 2nd pass: fill in instructions.  */
+    :language: c
+
+We can now populate the blocks for the individual operations.  We loop
+through them, adding instructions to their blocks:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* 2nd pass: fill in instructions.  */
+    :end-before: /* Helper macros.  */
+    :language: c
+
+We're going to have another big ``switch`` statement for implementing
+the opcodes, this time for compiling them, rather than interpreting
+them.  It's helpful to have macros for implementing push and pop, so that
+we can make the ``switch`` statement that's coming up look as much as
+possible like the one above within the interpreter:
+
+.. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Helper macros.  */
+    :end-before: gcc_jit_block_add_comment
+    :language: c
+
+.. note::
+
+   A particularly clever implementation would have an *identical*
+   ``switch`` statement shared by the interpreter and the compiler, with
+   some preprocessor "magic".  We're not doing that here, for the sake
+   of simplicity.
+
+When I first implemented this compiler, I accidentally missed an edit
+when copying and pasting the ``Y_EQUALS_POP`` macro, so that popping the
+stack into ``y`` instead erroneously assigned it to ``x``, leaving ``y``
+uninitialized.
+
+To track this kind of thing down, we can use
+:c:func:`gcc_jit_block_add_comment` to add descriptive comments
+to the internal representation.  This is invaluable when looking through
+the generated IR for, say ``factorial``:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: PUSH_RVALUE (gcc_jit_lvalue_as_rvalue (state.y))
+    :end-before: /* Handle the individual opcodes.  */
+    :language: c
+
+We can now write the big ``switch`` statement that implements the
+individual opcodes, populating the relevant block with statements:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Handle the individual opcodes.  */
+    :end-before: /* Go to the next block.  */
+    :language: c
+
+Every block must be terminated, via a call to one of the
+``gcc_jit_block_end_with_`` entrypoints.  This has been done for two
+of the opcodes, but we need to do it for the other ones, by jumping
+to the next block.
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* Go to the next block.  */
+    :end-before: /* end of loop on PC locations.  */
+    :language: c
+
+This is analogous to simply incrementing the program counter.
+
+Verifying the control flow graph
+********************************
+Having finished looping over the blocks, the context is complete.
+
+As before, we can verify that the control flow and statements are sane by
+using :c:func:`gcc_jit_function_dump_to_dot`:
+
+.. code-block:: c
+
+  gcc_jit_function_dump_to_dot (state.fn, "/tmp/factorial.dot");
+
+and viewing the result.  Note how the label names, comments, and
+variable names show up in the dump, to make it easier to spot
+errors in our compiler.
+
+  .. figure:: factorial.png
+    :alt: image of a control flow graph
+
+Compiling the context
+*********************
+Having finished looping over the blocks and populating them with
+statements, the context is complete.
+
+We can now compile it, and extract machine code from the result:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* We've now finished populating the context.  Compile it.  */
+    :end-before: /* (this leaks "result" and "funcname") */
+    :language: c
+
+We can now run the result:
+
+   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
+    :start-after: /* JIT-compilation.  */
+    :end-before: return 0;
+    :language: c
+
+Single-stepping through the generated code
+******************************************
+
+It's possible to debug the generated code.  To do this we need to both:
+
+  * Set up source code locations for our statements, so that we can
+    meaningfully step through the code.  We did this above by
+    calling :c:func:`gcc_jit_context_new_location` and using the
+    results.
+
+  * Enable the generation of debugging information, by setting
+    :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` on the
+    :c:type:`gcc_jit_context` via
+    :c:func:`gcc_jit_context_set_bool_option`:
+
+    .. code-block:: c
+
+     gcc_jit_context_set_bool_option (
+       ctxt,
+       GCC_JIT_BOOL_OPTION_DEBUGINFO,
+       1);
+
+Having done this, we can put a breakpoint on the generated function:
+
+.. code-block:: console
+
+  $ gdb --args ./toyvm factorial.toy 10
+  (gdb) break factorial
+  Function "factorial" not defined.
+  Make breakpoint pending on future shared library load? (y or [n]) y
+  Breakpoint 1 (factorial) pending.
+  (gdb) run
+  Breakpoint 1, factorial (arg=10) at factorial.toy:14
+  14	DUP
+
+We've set up location information, which references ``factorial.toy``.
+This allows us to use e.g. ``list`` to see where we are in the script:
+
+.. code-block:: console
+
+  (gdb) list
+  9
+  10    # Initial state:
+  11    # stack: [arg]
+  12
+  13    # 0:
+  14    DUP
+  15    # stack: [arg, arg]
+  16
+  17    # 1:
+  18    PUSH_CONST 2
+
+and to step through the function, examining the data:
+
+.. code-block:: console
+
+  (gdb) n
+  18    PUSH_CONST 2
+  (gdb) n
+  22    BINARY_COMPARE_LT
+  (gdb) print stack
+  $5 = {10, 10, 2, 0, -7152, 32767, 0, 0}
+  (gdb) print stack_depth
+  $6 = 3
+
+You'll see that the parts of the ``stack`` array that haven't been
+touched yet are uninitialized.
+
+.. note::
+
+   Turning on optimizations may lead to unpredictable results when
+   stepping through the generated code: the execution may appear to
+   "jump around" the source code.  This is analogous to turning up the
+   optimization level in a regular compiler.
+
+Examining the generated code
+****************************
+
+How good is the optimized code?
+
+We can turn up optimizations, by calling
+:c:func:`gcc_jit_context_set_int_option` with
+:c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`:
+
+.. code-block:: c
+
+  gcc_jit_context_set_int_option (
+    ctxt,
+    GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL,
+    3);
+
+One of GCC's internal representations is called "gimple".  A dump of the
+initial gimple representation of the code can be seen by setting:
+
+.. code-block:: c
+
+  gcc_jit_context_set_bool_option (ctxt,
+                                   GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE,
+                                   1);
+
+With optimization on and source locations displayed, this gives:
+
+.. We'll use "c" for gimple dumps
+
+.. code-block:: c
+
+  factorial (signed int arg)
+  {
+    <unnamed type> D.80;
+    signed int D.81;
+    signed int D.82;
+    signed int D.83;
+    signed int D.84;
+    signed int D.85;
+    signed int y;
+    signed int x;
+    signed int stack_depth;
+    signed int stack[8];
+
+    try
+      {
+        initial:
+        stack_depth = 0;
+        stack[stack_depth] = arg;
+        stack_depth = stack_depth + 1;
+        goto instr0;
+        instr0:
+        /* DUP */:
+        stack_depth = stack_depth + -1;
+        x = stack[stack_depth];
+        stack[stack_depth] = x;
+        stack_depth = stack_depth + 1;
+        stack[stack_depth] = x;
+        stack_depth = stack_depth + 1;
+        goto instr1;
+        instr1:
+        /* PUSH_CONST */:
+        stack[stack_depth] = 2;
+        stack_depth = stack_depth + 1;
+        goto instr2;
+
+        /* etc */
+
+You can see the generated machine code in assembly form via:
+
+.. code-block:: c
+
+  gcc_jit_context_set_bool_option (
+    ctxt,
+    GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE,
+    1);
+  result = gcc_jit_context_compile (ctxt);
+
+which shows that (on this x86_64 box) the compiler has unrolled the loop
+and is using MMX instructions to perform several multiplications
+simultaneously:
+
+.. code-block:: gas
+
+          .file   "fake.c"
+          .text
+  .Ltext0:
+          .p2align 4,,15
+          .globl  factorial
+          .type   factorial, @function
+  factorial:
+  .LFB0:
+          .file 1 "factorial.toy"
+          .loc 1 14 0
+          .cfi_startproc
+  .LVL0:
+  .L2:
+          .loc 1 26 0
+          cmpl    $1, %edi
+          jle     .L13
+          leal    -1(%rdi), %edx
+          movl    %edx, %ecx
+          shrl    $2, %ecx
+          leal    0(,%rcx,4), %esi
+          testl   %esi, %esi
+          je      .L14
+          cmpl    $9, %edx
+          jbe     .L14
+          leal    -2(%rdi), %eax
+          movl    %eax, -16(%rsp)
+          leal    -3(%rdi), %eax
+          movd    -16(%rsp), %xmm0
+          movl    %edi, -16(%rsp)
+          movl    %eax, -12(%rsp)
+          movd    -16(%rsp), %xmm1
+          xorl    %eax, %eax
+          movl    %edx, -16(%rsp)
+          movd    -12(%rsp), %xmm4
+          movd    -16(%rsp), %xmm6
+          punpckldq       %xmm4, %xmm0
+          movdqa  .LC1(%rip), %xmm4
+          punpckldq       %xmm6, %xmm1
+          punpcklqdq      %xmm0, %xmm1
+          movdqa  .LC0(%rip), %xmm0
+          jmp     .L5
+          # etc - edited for brevity
+
+This is clearly overkill for a function that will likely overflow the
+``int`` type before the vectorization is worthwhile - but then again, this
+is a toy example.
+
+Turning down the optimization level to 2:
+
+.. code-block:: c
+
+  gcc_jit_context_set_int_option (
+    ctxt,
+    GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL,
+    3);
+
+yields this code, which is simple enough to quote in its entirety:
+
+.. code-block:: gas
+
+          .file   "fake.c"
+          .text
+          .p2align 4,,15
+          .globl  factorial
+          .type   factorial, @function
+  factorial:
+  .LFB0:
+          .cfi_startproc
+  .L2:
+          cmpl    $1, %edi
+          jle     .L8
+          movl    $1, %edx
+          jmp     .L4
+          .p2align 4,,10
+          .p2align 3
+  .L6:
+          movl    %eax, %edi
+  .L4:
+  .L5:
+          leal    -1(%rdi), %eax
+          imull   %edi, %edx
+          cmpl    $1, %eax
+          jne     .L6
+  .L3:
+  .L7:
+          imull   %edx, %eax
+          ret
+  .L8:
+          movl    %edi, %eax
+          movl    $1, %edx
+          jmp     .L7
+          .cfi_endproc
+  .LFE0:
+          .size   factorial, .-factorial
+          .ident  "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-%{gcc_release})"
+          .section        .note.GNU-stack,"",@progbits
+
+Note that the stack pushing and popping have been eliminated, as has the
+recursive call (in favor of an iteration).
+
+Putting it all together
+***********************
+
+The complete example can be seen in the source tree at
+``gcc/jit/docs/examples/tut04-toyvm/toyvm.c``
+
+along with a Makefile and a couple of sample .toy scripts:
+
+.. code-block:: console
+
+  $ ls -al
+  drwxrwxr-x. 2 david david   4096 Sep 19 17:46 .
+  drwxrwxr-x. 3 david david   4096 Sep 19 15:26 ..
+  -rw-rw-r--. 1 david david    615 Sep 19 12:43 factorial.toy
+  -rw-rw-r--. 1 david david    834 Sep 19 13:08 fibonacci.toy
+  -rw-rw-r--. 1 david david    238 Sep 19 14:22 Makefile
+  -rw-rw-r--. 1 david david  16457 Sep 19 17:07 toyvm.c
+
+  $ make toyvm
+  g++ -Wall -g -o toyvm toyvm.c -lgccjit
+
+  $ ./toyvm factorial.toy 10
+  interpreter result: 3628800
+  compiler result: 3628800
+
+  $ ./toyvm fibonacci.toy 10
+  interpreter result: 55
+  compiler result: 55
+
+Behind the curtain: How does our code get optimized?
+****************************************************
+
+Our example is done, but you may be wondering about exactly how the
+compiler turned what we gave it into the machine code seen above.
+
+We can examine what the compiler is doing in detail by setting:
+
+.. code-block:: c
+
+  gcc_jit_context_set_bool_option (state.ctxt,
+                                   GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING,
+                                   1);
+  gcc_jit_context_set_bool_option (state.ctxt,
+                                   GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES,
+                                   1);
+
+This will dump detailed information about the compiler's state to a
+directory under ``/tmp``, and keep it from being cleaned up.
+
+The precise names and their formats of these files is subject to change.
+Higher optimization levels lead to more files.
+Here's what I saw (edited for brevity; there were almost 200 files):
+
+.. code-block:: console
+
+  intermediate files written to /tmp/libgccjit-KPQbGw
+  $ ls /tmp/libgccjit-KPQbGw/
+  fake.c.000i.cgraph
+  fake.c.000i.type-inheritance
+  fake.c.004t.gimple
+  fake.c.007t.omplower
+  fake.c.008t.lower
+  fake.c.011t.eh
+  fake.c.012t.cfg
+  fake.c.014i.visibility
+  fake.c.015i.early_local_cleanups
+  fake.c.016t.ssa
+  # etc
+
+The gimple code is converted into Static Single Assignment form,
+with annotations for use when generating the debuginfo:
+
+.. code-block:: console
+
+  $ less /tmp/libgccjit-KPQbGw/fake.c.016t.ssa
+
+.. code-block:: c
+
+  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
+
+  factorial (signed int arg)
+  {
+    signed int stack[8];
+    signed int stack_depth;
+    signed int x;
+    signed int y;
+    <unnamed type> _20;
+    signed int _21;
+    signed int _38;
+    signed int _44;
+    signed int _51;
+    signed int _56;
+
+  initial:
+    stack_depth_3 = 0;
+    # DEBUG stack_depth => stack_depth_3
+    stack[stack_depth_3] = arg_5(D);
+    stack_depth_7 = stack_depth_3 + 1;
+    # DEBUG stack_depth => stack_depth_7
+    # DEBUG instr0 => NULL
+    # DEBUG /* DUP */ => NULL
+    stack_depth_8 = stack_depth_7 + -1;
+    # DEBUG stack_depth => stack_depth_8
+    x_9 = stack[stack_depth_8];
+    # DEBUG x => x_9
+    stack[stack_depth_8] = x_9;
+    stack_depth_11 = stack_depth_8 + 1;
+    # DEBUG stack_depth => stack_depth_11
+    stack[stack_depth_11] = x_9;
+    stack_depth_13 = stack_depth_11 + 1;
+    # DEBUG stack_depth => stack_depth_13
+    # DEBUG instr1 => NULL
+    # DEBUG /* PUSH_CONST */ => NULL
+    stack[stack_depth_13] = 2;
+
+    /* etc; edited for brevity */
+
+We can perhaps better see the code by turning off
+:c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` to suppress all those ``DEBUG``
+statements, giving:
+
+.. code-block:: console
+
+  $ less /tmp/libgccjit-1Hywc0/fake.c.016t.ssa
+
+.. code-block:: c
+
+  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
+
+  factorial (signed int arg)
+  {
+    signed int stack[8];
+    signed int stack_depth;
+    signed int x;
+    signed int y;
+    <unnamed type> _20;
+    signed int _21;
+    signed int _38;
+    signed int _44;
+    signed int _51;
+    signed int _56;
+
+  initial:
+    stack_depth_3 = 0;
+    stack[stack_depth_3] = arg_5(D);
+    stack_depth_7 = stack_depth_3 + 1;
+    stack_depth_8 = stack_depth_7 + -1;
+    x_9 = stack[stack_depth_8];
+    stack[stack_depth_8] = x_9;
+    stack_depth_11 = stack_depth_8 + 1;
+    stack[stack_depth_11] = x_9;
+    stack_depth_13 = stack_depth_11 + 1;
+    stack[stack_depth_13] = 2;
+    stack_depth_15 = stack_depth_13 + 1;
+    stack_depth_16 = stack_depth_15 + -1;
+    y_17 = stack[stack_depth_16];
+    stack_depth_18 = stack_depth_16 + -1;
+    x_19 = stack[stack_depth_18];
+    _20 = x_19 < y_17;
+    _21 = (signed int) _20;
+    stack[stack_depth_18] = _21;
+    stack_depth_23 = stack_depth_18 + 1;
+    stack_depth_24 = stack_depth_23 + -1;
+    x_25 = stack[stack_depth_24];
+    if (x_25 != 0)
+      goto <bb 4> (instr9);
+    else
+      goto <bb 3> (instr4);
+
+  instr4:
+  /* DUP */:
+    stack_depth_26 = stack_depth_24 + -1;
+    x_27 = stack[stack_depth_26];
+    stack[stack_depth_26] = x_27;
+    stack_depth_29 = stack_depth_26 + 1;
+    stack[stack_depth_29] = x_27;
+    stack_depth_31 = stack_depth_29 + 1;
+    stack[stack_depth_31] = 1;
+    stack_depth_33 = stack_depth_31 + 1;
+    stack_depth_34 = stack_depth_33 + -1;
+    y_35 = stack[stack_depth_34];
+    stack_depth_36 = stack_depth_34 + -1;
+    x_37 = stack[stack_depth_36];
+    _38 = x_37 - y_35;
+    stack[stack_depth_36] = _38;
+    stack_depth_40 = stack_depth_36 + 1;
+    stack_depth_41 = stack_depth_40 + -1;
+    x_42 = stack[stack_depth_41];
+    _44 = factorial (x_42);
+    stack[stack_depth_41] = _44;
+    stack_depth_46 = stack_depth_41 + 1;
+    stack_depth_47 = stack_depth_46 + -1;
+    y_48 = stack[stack_depth_47];
+    stack_depth_49 = stack_depth_47 + -1;
+    x_50 = stack[stack_depth_49];
+    _51 = x_50 * y_48;
+    stack[stack_depth_49] = _51;
+    stack_depth_53 = stack_depth_49 + 1;
+
+    # stack_depth_1 = PHI <stack_depth_24(2), stack_depth_53(3)>
+  instr9:
+  /* RETURN */:
+    stack_depth_54 = stack_depth_1 + -1;
+    x_55 = stack[stack_depth_54];
+    _56 = x_55;
+    stack ={v} {CLOBBER};
+    return _56;
+
+  }
+
+Note in the above how all the :c:type:`gcc_jit_block` instances we
+created have been consolidated into just 3 blocks in GCC's internal
+representation: ``initial``, ``instr4`` and ``instr9``.
+
+Optimizing away stack manipulation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Recall our simple implementation of stack operations.  Let's examine
+how the stack operations are optimized away.
+
+After a pass of constant-propagation, the depth of the stack at each
+opcode can be determined at compile-time:
+
+.. code-block:: console
+
+  $ less /tmp/libgccjit-1Hywc0/fake.c.021t.ccp1
+
+.. code-block:: c
+
+  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
+
+  factorial (signed int arg)
+  {
+    signed int stack[8];
+    signed int stack_depth;
+    signed int x;
+    signed int y;
+    <unnamed type> _20;
+    signed int _21;
+    signed int _38;
+    signed int _44;
+    signed int _51;
+
+  initial:
+    stack[0] = arg_5(D);
+    x_9 = stack[0];
+    stack[0] = x_9;
+    stack[1] = x_9;
+    stack[2] = 2;
+    y_17 = stack[2];
+    x_19 = stack[1];
+    _20 = x_19 < y_17;
+    _21 = (signed int) _20;
+    stack[1] = _21;
+    x_25 = stack[1];
+    if (x_25 != 0)
+      goto <bb 4> (instr9);
+    else
+      goto <bb 3> (instr4);
+
+  instr4:
+  /* DUP */:
+    x_27 = stack[0];
+    stack[0] = x_27;
+    stack[1] = x_27;
+    stack[2] = 1;
+    y_35 = stack[2];
+    x_37 = stack[1];
+    _38 = x_37 - y_35;
+    stack[1] = _38;
+    x_42 = stack[1];
+    _44 = factorial (x_42);
+    stack[1] = _44;
+    y_48 = stack[1];
+    x_50 = stack[0];
+    _51 = x_50 * y_48;
+    stack[0] = _51;
+
+  instr9:
+  /* RETURN */:
+    x_55 = stack[0];
+    x_56 = x_55;
+    stack ={v} {CLOBBER};
+    return x_56;
+
+  }
+
+Note how, in the above, all those ``stack_depth`` values are now just
+constants: we're accessing specific stack locations at each opcode.
+
+The "esra" pass ("Early Scalar Replacement of Aggregates") breaks
+out our "stack" array into individual elements:
+
+.. code-block:: console
+
+  $ less /tmp/libgccjit-1Hywc0/fake.c.024t.esra
+
+.. code-block:: c
+
+  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
+
+  Created a replacement for stack offset: 0, size: 32: stack$0
+  Created a replacement for stack offset: 32, size: 32: stack$1
+  Created a replacement for stack offset: 64, size: 32: stack$2
+
+  Symbols to be put in SSA form
+  { D.89 D.90 D.91 }
+  Incremental SSA update started at block: 0
+  Number of blocks in CFG: 5
+  Number of blocks to update: 4 ( 80%)
+
+
+  factorial (signed int arg)
+  {
+    signed int stack$2;
+    signed int stack$1;
+    signed int stack$0;
+    signed int stack[8];
+    signed int stack_depth;
+    signed int x;
+    signed int y;
+    <unnamed type> _20;
+    signed int _21;
+    signed int _38;
+    signed int _44;
+    signed int _51;
+
+  initial:
+    stack$0_45 = arg_5(D);
+    x_9 = stack$0_45;
+    stack$0_39 = x_9;
+    stack$1_32 = x_9;
+    stack$2_30 = 2;
+    y_17 = stack$2_30;
+    x_19 = stack$1_32;
+    _20 = x_19 < y_17;
+    _21 = (signed int) _20;
+    stack$1_28 = _21;
+    x_25 = stack$1_28;
+    if (x_25 != 0)
+      goto <bb 4> (instr9);
+    else
+      goto <bb 3> (instr4);
+
+  instr4:
+  /* DUP */:
+    x_27 = stack$0_39;
+    stack$0_22 = x_27;
+    stack$1_14 = x_27;
+    stack$2_12 = 1;
+    y_35 = stack$2_12;
+    x_37 = stack$1_14;
+    _38 = x_37 - y_35;
+    stack$1_10 = _38;
+    x_42 = stack$1_10;
+    _44 = factorial (x_42);
+    stack$1_6 = _44;
+    y_48 = stack$1_6;
+    x_50 = stack$0_22;
+    _51 = x_50 * y_48;
+    stack$0_1 = _51;
+
+    # stack$0_52 = PHI <stack$0_39(2), stack$0_1(3)>
+  instr9:
+  /* RETURN */:
+    x_55 = stack$0_52;
+    x_56 = x_55;
+    stack ={v} {CLOBBER};
+    return x_56;
+
+  }
+
+Hence at this point, all those pushes and pops of the stack are now
+simply assignments to specific temporary variables.
+
+After some copy propagation, the stack manipulation has been completely
+optimized away:
+
+.. code-block:: console
+
+  $ less /tmp/libgccjit-1Hywc0/fake.c.026t.copyprop1
+
+.. code-block:: c
+
+  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
+
+  factorial (signed int arg)
+  {
+    signed int stack$2;
+    signed int stack$1;
+    signed int stack$0;
+    signed int stack[8];
+    signed int stack_depth;
+    signed int x;
+    signed int y;
+    <unnamed type> _20;
+    signed int _21;
+    signed int _38;
+    signed int _44;
+    signed int _51;
+
+  initial:
+    stack$0_39 = arg_5(D);
+    _20 = arg_5(D) <= 1;
+    _21 = (signed int) _20;
+    if (_21 != 0)
+      goto <bb 4> (instr9);
+    else
+      goto <bb 3> (instr4);
+
+  instr4:
+  /* DUP */:
+    _38 = arg_5(D) + -1;
+    _44 = factorial (_38);
+    _51 = arg_5(D) * _44;
+    stack$0_1 = _51;
+
+    # stack$0_52 = PHI <arg_5(D)(2), _51(3)>
+  instr9:
+  /* RETURN */:
+    stack ={v} {CLOBBER};
+    return stack$0_52;
+
+  }
+
+Later on, another pass finally eliminated ``stack_depth`` local and the
+unused parts of the `stack`` array altogether:
+
+.. code-block:: console
+
+  $ less /tmp/libgccjit-1Hywc0/fake.c.036t.release_ssa
+
+.. code-block:: c
+
+  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
+
+  Released 44 names, 314.29%, removed 44 holes
+  factorial (signed int arg)
+  {
+    signed int stack$0;
+    signed int mult_acc_1;
+    <unnamed type> _5;
+    signed int _6;
+    signed int _7;
+    signed int mul_tmp_10;
+    signed int mult_acc_11;
+    signed int mult_acc_13;
+
+    # arg_9 = PHI <arg_8(D)(0)>
+    # mult_acc_13 = PHI <1(0)>
+  initial:
+
+    <bb 5>:
+    # arg_4 = PHI <arg_9(2), _7(3)>
+    # mult_acc_1 = PHI <mult_acc_13(2), mult_acc_11(3)>
+    _5 = arg_4 <= 1;
+    _6 = (signed int) _5;
+    if (_6 != 0)
+      goto <bb 4> (instr9);
+    else
+      goto <bb 3> (instr4);
+
+  instr4:
+  /* DUP */:
+    _7 = arg_4 + -1;
+    mult_acc_11 = mult_acc_1 * arg_4;
+    goto <bb 5>;
+
+    # stack$0_12 = PHI <arg_4(5)>
+  instr9:
+  /* RETURN */:
+    mul_tmp_10 = mult_acc_1 * stack$0_12;
+    return mul_tmp_10;
+
+  }
+
+
+Elimination of tail recursion
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Another significant optimization is the detection that the call to
+``factorial`` is tail recursion, which can be eliminated in favor of
+an iteration:
+
+.. code-block:: console
+
+  $ less /tmp/libgccjit-1Hywc0/fake.c.030t.tailr1
+
+.. code-block:: c
+
+  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
+
+
+  Symbols to be put in SSA form
+  { D.88 }
+  Incremental SSA update started at block: 0
+  Number of blocks in CFG: 5
+  Number of blocks to update: 4 ( 80%)
+
+
+  factorial (signed int arg)
+  {
+    signed int stack$2;
+    signed int stack$1;
+    signed int stack$0;
+    signed int stack[8];
+    signed int stack_depth;
+    signed int x;
+    signed int y;
+    signed int mult_acc_1;
+    <unnamed type> _20;
+    signed int _21;
+    signed int _38;
+    signed int mul_tmp_44;
+    signed int mult_acc_51;
+
+    # arg_5 = PHI <arg_39(D)(0), _38(3)>
+    # mult_acc_1 = PHI <1(0), mult_acc_51(3)>
+  initial:
+    _20 = arg_5 <= 1;
+    _21 = (signed int) _20;
+    if (_21 != 0)
+      goto <bb 4> (instr9);
+    else
+      goto <bb 3> (instr4);
+
+  instr4:
+  /* DUP */:
+    _38 = arg_5 + -1;
+    mult_acc_51 = mult_acc_1 * arg_5;
+    goto <bb 2> (initial);
+
+    # stack$0_52 = PHI <arg_5(2)>
+  instr9:
+  /* RETURN */:
+    stack ={v} {CLOBBER};
+    mul_tmp_44 = mult_acc_1 * stack$0_52;
+    return mul_tmp_44;
+
+  }