diff mbox

[docs,1/5] Update the first section of md.texi

Message ID 1420543302-11008-2-git-send-email-james.greenhalgh@arm.com
State New
Headers show

Commit Message

James Greenhalgh Jan. 6, 2015, 11:21 a.m. UTC
Hi,

This patch updates the introduction to, and first section of, md.texi.

I was aiming to:

  * Remove outdated details of the compiler.
  * Remove long or obscure words that, while accurate, only served to
    obfuscate a simple idea.
  * Refer to similar things in a consistent fashion - in particular
    trying to keep consistent use of "insn" and "pattern".
  * Remove superflous examples, or waffling.

OK?

Thanks,
James

---
2015-01-06  James Greenhalgh  <james.greenhalgh@arm.com>

	* doc/md.texi (Machine Descriptions): Update text.
	(menu): Likewise.
	(Overview): Likewise.

Comments

James Greenhalgh Jan. 6, 2015, 4:56 p.m. UTC | #1
As requested, please find a paragraph-by-paragraph justification for
the changes in this patch below.

I hope this aids review.

Of particular interest for this patch is the removal of the text
describing the semicolon for a comment syntax. I believe that a
description of the syntax of a .md file does not belong in the
introduction. I don't have a good suggestion as to where it should
move to.

Thanks,
James

On Tue, Jan 06, 2015 at 11:21:38AM +0000, James Greenhalgh wrote:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 7f0426c..0277f14 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -7,39 +7,39 @@
>  @chapter Machine Descriptions
>  @cindex machine descriptions
>  
> -A machine description has two parts: a file of instruction patterns
> -(@file{.md} file) and a C header file of macro definitions.
> +A machine description has two parts: one or more files describing the
> +instructions available on the target (@file{.md} files) and one or more
> +C/C++ source files implementing target hooks and macros.  These are
> +described in the next chapter (@pxref{Target Macros}).

The original text was inaccurate. .md files can be included in one
another, so it is wrong to say "a file". Real-world backends are not
just C header files of macro definitions.

Rather than just mentioning the next chapter, cross-reference it.

> -The @file{.md} file for a target machine contains a pattern for each
> -instruction that the target machine supports (or at least each instruction
> -that is worth telling the compiler about).  It may also contain comments.
> -A semicolon causes the rest of the line to be a comment, unless the semicolon
> -is inside a quoted string.

A detailed description of the syntax of a .md file does not belong in an
introduction. It is a deficiency of this patch set that this text is not
added back elsewhere. Do you have any suggestions of where the text could
sensibly go?

> -See the next chapter for information on the C header file.

Moved above.

> +This chapter describes the @file{.md} files for a target machine.  These
> +contain the instruction patterns which are used by the compiler when
> +expanding gimple to RTL, during the RTL optimization passes, and when
> +emitting assembler code.

New text. Give a high level overview of what is in an md file and why
we want it.

>  
>  @menu
>  * Overview::            How the machine description is used.
>  * Patterns::            How to write instruction patterns.
>  * Example::             An explained example of a @code{define_insn} pattern.
> -* RTL Template::        The RTL template defines what insns match a pattern.
> +* RTL Template::        The RTL template defines which insns may match a
> +                        pattern.

s/what/which/ -- This reads better to me
s/match/may match/ -- Many things can result in an RTL Template not
matching; pattern ordering, predicates, failing conditions, etc.

>  * Output Template::     The output template says how to make assembler code
> -                        from such an insn.
> +                        from an insn matched to an instruction pattern.

The previous text did not make sense. "such an insn" is not bound to
anything if I am reading the contents non-linearly (as I am likely to do).

>  * Output Statement::    For more generality, write C code to output
>                          the assembler code.
> -* Predicates::          Controlling what kinds of operands can be used
> -                        for an insn.
> +* Predicates::          Controlling which kinds of operands can be used
> +                        when matching an insn to an instruction pattern.

s/what/which/ -- Reads better to me.

Add text qualifying when predicates are used - predicates can't be used
to instruct register allocation after pattern matching, they are only
used in pattern matching.

>  * Constraints::         Fine-tuning operand selection.
>  * Standard Names::      Names mark patterns to use for code generation.
>  * Pattern Ordering::    When the order of patterns makes a difference.
>  * Dependent Patterns::  Having one pattern may make you need another.
>  * Jump Patterns::       Special considerations for patterns for jump insns.
>  * Looping Patterns::    How to define patterns for special looping insns.
> -* Insn Canonicalizations::Canonicalization of Instructions
> +* Insn Canonicalizations::Canonicalization of instructions

We don't use title case elsewhere in the descriptions.

>  * Expander Definitions::Generating a sequence of several RTL insns
>                          for a standard operation.
> -* Insn Splitting::      Splitting Instructions into Multiple Instructions.
> -* Including Patterns::  Including Patterns in Machine Descriptions.
> +* Insn Splitting::      Splitting insns into multiple insns.
> +* Including Patterns::  Including patterns in machine descriptions.

Insn splitting acts on insns (a compiler concept) not instructions (a
machine concept).

We don't use title case elsewhere in the descriptions.

>  * Peephole Definitions::Defining machine-specific peephole optimizations.
>  * Insn Attributes::     Specifying the value of attributes for generated insns.
>  * Conditional Execution::Generating @code{define_insn} patterns for
> @@ -54,50 +54,60 @@ See the next chapter for information on the C header file.
>  @node Overview
>  @section Overview of How the Machine Description is Used
>  
> -There are three main conversions that happen in the compiler:
> +There are four main conversions that happen in the compiler:

The previous text is inaccurate.

>  
>  @enumerate
>  
>  @item
> -The front end reads the source code and builds a parse tree.
> +The front end reads the source code and converts it to an intermediate,
> +front end specific, tree based representation.

The previous text is imprecise.

>  @item
> -The parse tree is used to generate an RTL insn list based on named
> -instruction patterns.
> +This tree based representation is lowered (gimplified) to a gimple
> +representation.

The previous text is inaccurate.

>  
>  @item
> -The insn list is matched against the RTL templates to produce assembler
> -code.
> +The gimple representation is transformed (expanded) to an RTL
> +representation.  This RTL representation is a doubly-linked chain of
> +objects called insns, known as the insn list.

New text describing expansion from gimple to RTL.

> +
> +@item
> +The insn list is optimized, and the optimized insn list is matched
> +against @code{define_insn} instruction patterns in the machine description
> +to produce assembler code.

Add text describing that the insn list is further optimized before
instruction selection.

>  
>  @end enumerate
>  
> -For the generate pass, only the names of the insns matter, from either a
> -named @code{define_insn} or a @code{define_expand}.  The compiler will
> +When expanding from gimple to RTL, only named @code{define_insn}
> +constructs and @code{define_expand} constructs are used.  The compiler will
>  choose the pattern with the right name and apply the operands according
>  to the documentation later in this chapter, without regard for the RTL
> -template or operand constraints.  Note that the names the compiler looks
> -for are hard-coded in the compiler---it will ignore unnamed patterns and
> -patterns with names it doesn't know about, but if you don't provide a
> -named pattern it needs, it will abort.
> +template or operand constraints.  Note that the names which are used for
> +expansion are hard-coded in the compiler---unnamed patterns and patterns
> +with names which do not have a standard meaning are ignored during
> +expansion.  If you don't provide a named pattern that the compiler is
> +trying to expand, it may try a different expansion strategy.  If no
> +other expansion strategies are possible, the compiler will abort.

This is a general cleanup to make the text less casual.  This could do
with a second pass over it to remove the repeated use of "you" and the
anthropomorphism of the compiler.

I prefer the text below. We lose "without regard for the RTL template or
operand constraints", but this is contradictory anyway, as the RTL template
is used when expanding a define_insn.

  When expanding from gimple to RTL, only named @code{define_insn}
  constructs and @code{define_expand} constructs are used.  The set of
  names which are meaningful during expansion are detailed later in this
  chapter (@pxref{Standard Names}.  Note that the names which are used for
  expansion are hard-coded in the compiler---unnamed patterns and patterns
  with names which do not have a standard meaning are ignored during
  expansion.  If a named pattern that the compiler is trying to expand,
  is not provided, it may try a different expansion strategy.  If no
  other expansion strategies are possible, the compiler will abort.



>  
>  If a @code{define_insn} is used, the template given is inserted into the
> -insn list.  If a @code{define_expand} is used, one of three things
> -happens, based on the condition logic.  The condition logic may manually
> -create new insns for the insn list, say via @code{emit_insn()}, and
> -invoke @code{DONE}.  For certain named patterns, it may invoke @code{FAIL} to tell the
> -compiler to use an alternate way of performing that task.  If it invokes
> -neither @code{DONE} nor @code{FAIL}, the template given in the pattern
> +insn list.
> +
> +If a @code{define_expand} is used, one of three things happens, encoded in
> +the condition logic.  The condition logic may manually create new insns
> +for the insn list, say via @code{emit_insn ()}, and invoke @code{DONE},
> +indicating a successful expansion.  If the standard pattern name permits
> +it, the condition logic may invoke @code{FAIL} to express that an alternate
> +strategy should be used to performing the expansion.  If the condition logic
> +invokes neither @code{DONE} nor @code{FAIL}, the template given in the pattern
>  is inserted, as if the @code{define_expand} were a @code{define_insn}.

Split one paragraph to two. One dealing with define_insn expressions, one
for define_expand expressions. This aids readability.

s/based on the condition logic/encoded in the condition logic/ -- I
don't like the use of "based on" to describe code, the code explicitly
says what to do, the compiler doesn't "base" what to do on the code.

I dislike that the condition logic is the actor in the paragraph, but I
struggled to rewrite it, so I gave up and just replaced the unclear use of
"it" with "the condition logic".

>  Once the insn list is generated, various optimization passes convert,
> -replace, and rearrange the insns in the insn list.  This is where the
> -@code{define_split} and @code{define_peephole} patterns get used, for
> -example.

These examples don't add anything useful and introduce new keywords we've
not mentioned before.

> -Finally, the insn list's RTL is matched up with the RTL templates in the
> -@code{define_insn} patterns, and those patterns are used to emit the
> -final assembly code.  For this purpose, each named @code{define_insn}
> -acts like it's unnamed, since the names are ignored.
> +replace, and rearrange the insns in the insn list.
> +
> +Finally, the insn list's RTL is matched with the RTL templates from the
> +@code{define_insn} instruction patterns, and those patterns are used to
> +emit the final assembly code.  For this purpose, names are ignored.  All
> +@code{define_insn} are considered for matching.

Just a rewording. The previous text is unnecessarily verbose.
Joseph Myers Jan. 6, 2015, 6:55 p.m. UTC | #2
On Tue, 6 Jan 2015, James Greenhalgh wrote:

> > -The @file{.md} file for a target machine contains a pattern for each
> > -instruction that the target machine supports (or at least each instruction
> > -that is worth telling the compiler about).  It may also contain comments.
> > -A semicolon causes the rest of the line to be a comment, unless the semicolon
> > -is inside a quoted string.
> 
> A detailed description of the syntax of a .md file does not belong in an
> introduction. It is a deficiency of this patch set that this text is not
> added back elsewhere. Do you have any suggestions of where the text could
> sensibly go?

A section that describes the lexical structure and syntax of .md files?

> > @@ -54,50 +54,60 @@ See the next chapter for information on the C header file.
> >  @node Overview
> >  @section Overview of How the Machine Description is Used
> >  
> > -There are three main conversions that happen in the compiler:
> > +There are four main conversions that happen in the compiler:
> 
> The previous text is inaccurate.

It would seem better not to give a specific number (just as we've tried to 
keep down the number of references in the web pages to a particular 
version control system).

> I prefer the text below. We lose "without regard for the RTL template or
> operand constraints", but this is contradictory anyway, as the RTL template
> is used when expanding a define_insn.

I believe the point of "without regard for the RTL template or operand 
constraints" is that when e.g. expanding addition of two SImode values, 
the RTL template from the addsi3 pattern is inserted blindly, even if that 
RTL looks nothing like RTL for an addition; nothing looks for an (add:SI) 
pattern, or generates such RTL if it's not what the RTL for addsi3 looks 
like.

I should point out, if giving paragraph-by-paragraph rationale, that *each 
revision of the patch* needs to give the paragraph-by-paragraph rationale 
corresponding to that revision of the patch - that a piece passed over for 
one revision by one reviewer may be reviewed in more detail later or by 
another reviewer.  This may be easier with a finer-grained division of the 
patch submission.
Jeff Law Jan. 8, 2015, 9:43 p.m. UTC | #3
On 01/06/15 11:55, Joseph Myers wrote:
> On Tue, 6 Jan 2015, James Greenhalgh wrote:
>
>>> -The @file{.md} file for a target machine contains a pattern for each
>>> -instruction that the target machine supports (or at least each instruction
>>> -that is worth telling the compiler about).  It may also contain comments.
>>> -A semicolon causes the rest of the line to be a comment, unless the semicolon
>>> -is inside a quoted string.
>>
>> A detailed description of the syntax of a .md file does not belong in an
>> introduction. It is a deficiency of this patch set that this text is not
>> added back elsewhere. Do you have any suggestions of where the text could
>> sensibly go?
>
> A section that describes the lexical structure and syntax of .md files?
Seems reasonable. However, is that too heavyweight?

ISTM we have, effectively 3 items.

Directives that start with an open-paren and end with their associated 
close paren.  ANything in between those parens is governed by the 
documentation for the specific directive.

Outside a directive we can lines with are empty or contain just 
whitespace.  These lines are effectively ignored.

Also outside a directive are lines that start with ';;'. From the ';;' 
to the end of line are comments and ignored.



>
>>> @@ -54,50 +54,60 @@ See the next chapter for information on the C header file.
>>>   @node Overview
>>>   @section Overview of How the Machine Description is Used
>>>
>>> -There are three main conversions that happen in the compiler:
>>> +There are four main conversions that happen in the compiler:
>>
>> The previous text is inaccurate.
>
> It would seem better not to give a specific number (just as we've tried to
> keep down the number of references in the web pages to a particular
> version control system).
Agreed. It's bound to get out of date as some point.


>
>> I prefer the text below. We lose "without regard for the RTL template or
>> operand constraints", but this is contradictory anyway, as the RTL template
>> is used when expanding a define_insn.
>
> I believe the point of "without regard for the RTL template or operand
> constraints" is that when e.g. expanding addition of two SImode values,
> the RTL template from the addsi3 pattern is inserted blindly, even if that
> RTL looks nothing like RTL for an addition; nothing looks for an (add:SI)
> pattern, or generates such RTL if it's not what the RTL for addsi3 looks
> like.
Right, but the next paragraph expands on things a bit and I think is 
sufficient that we can drop the "without regard for the RTL template". 
So perhaps take that out, but leave the "without regard for the operand 
constraints".

Jeff
Segher Boessenkool Jan. 9, 2015, 2:10 a.m. UTC | #4
On Thu, Jan 08, 2015 at 02:43:27PM -0700, Jeff Law wrote:
> >A section that describes the lexical structure and syntax of .md files?
> Seems reasonable. However, is that too heavyweight?
> 
> ISTM we have, effectively 3 items.
> 
> Directives that start with an open-paren and end with their associated 
> close paren.  ANything in between those parens is governed by the 
> documentation for the specific directive.
> 
> Outside a directive we can lines with are empty or contain just 
> whitespace.  These lines are effectively ignored.
> 
> Also outside a directive are lines that start with ';;'. From the ';;' 
> to the end of line are comments and ignored.

Empty lines are whitespace.  Comments are whitespace (two kinds of
comments: ";comment<EOL/EOF>" (single semi-colon!), as well as C-style
/*comments*/).

At the top level, you have <optional whitespace> '(' <opt ws> <name>
<whatever that kind of thing wants> <opt ws> ')'; repeat; <opt ws>.

<whatever> is constrained by just a few things: anything starting with
'"' is parsed as a string (i.e. until a non-quoted second '"'); everything
starting with '{' is parsed as a C block (i.e. until a matching '}'),
and outside of those the parens have to match.

Oh, and <name> can contain much more than just alphanumerics, but we
really shouldn't document that ;-)


Segher
Richard Sandiford Jan. 12, 2015, 10:19 p.m. UTC | #5
James Greenhalgh <james.greenhalgh@arm.com> writes:
> -For the generate pass, only the names of the insns matter, from either a
> -named @code{define_insn} or a @code{define_expand}.  The compiler will
> +When expanding from gimple to RTL, only named @code{define_insn}
> +constructs and @code{define_expand} constructs are used.  The compiler will
>  choose the pattern with the right name and apply the operands according
>  to the documentation later in this chapter, without regard for the RTL
> -template or operand constraints.  Note that the names the compiler looks
> -for are hard-coded in the compiler---it will ignore unnamed patterns and
> -patterns with names it doesn't know about, but if you don't provide a
> -named pattern it needs, it will abort.
> +template or operand constraints.  Note that the names which are used for
> +expansion are hard-coded in the compiler---unnamed patterns and patterns
> +with names which do not have a standard meaning are ignored during
> +expansion.  If you don't provide a named pattern that the compiler is
> +trying to expand, it may try a different expansion strategy.  If no
> +other expansion strategies are possible, the compiler will abort.

Minor nit, I think both "which"es here should be "that"s.

>  If a @code{define_insn} is used, the template given is inserted into the
> -insn list.  If a @code{define_expand} is used, one of three things
> -happens, based on the condition logic.  The condition logic may manually
> -create new insns for the insn list, say via @code{emit_insn()}, and
> -invoke @code{DONE}.  For certain named patterns, it may invoke @code{FAIL} to tell the
> -compiler to use an alternate way of performing that task.  If it invokes
> -neither @code{DONE} nor @code{FAIL}, the template given in the pattern
> +insn list.
> +
> +If a @code{define_expand} is used, one of three things happens, encoded in
> +the condition logic.  The condition logic may manually create new insns
> +for the insn list, say via @code{emit_insn ()}, and invoke @code{DONE},
> +indicating a successful expansion.  If the standard pattern name permits
> +it, the condition logic may invoke @code{FAIL} to express that an alternate
> +strategy should be used to performing the expansion.  If the condition logic
> +invokes neither @code{DONE} nor @code{FAIL}, the template given in the pattern

Can we s/alternate/alternative/ at the same time?  I think "alternative"
is still correct in US English.

s/to performing/to perform/.

Thanks,
Richard
diff mbox

Patch

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 7f0426c..0277f14 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7,39 +7,39 @@ 
 @chapter Machine Descriptions
 @cindex machine descriptions
 
-A machine description has two parts: a file of instruction patterns
-(@file{.md} file) and a C header file of macro definitions.
+A machine description has two parts: one or more files describing the
+instructions available on the target (@file{.md} files) and one or more
+C/C++ source files implementing target hooks and macros.  These are
+described in the next chapter (@pxref{Target Macros}).
 
-The @file{.md} file for a target machine contains a pattern for each
-instruction that the target machine supports (or at least each instruction
-that is worth telling the compiler about).  It may also contain comments.
-A semicolon causes the rest of the line to be a comment, unless the semicolon
-is inside a quoted string.
-
-See the next chapter for information on the C header file.
+This chapter describes the @file{.md} files for a target machine.  These
+contain the instruction patterns which are used by the compiler when
+expanding gimple to RTL, during the RTL optimization passes, and when
+emitting assembler code.
 
 @menu
 * Overview::            How the machine description is used.
 * Patterns::            How to write instruction patterns.
 * Example::             An explained example of a @code{define_insn} pattern.
-* RTL Template::        The RTL template defines what insns match a pattern.
+* RTL Template::        The RTL template defines which insns may match a
+                        pattern.
 * Output Template::     The output template says how to make assembler code
-                        from such an insn.
+                        from an insn matched to an instruction pattern.
 * Output Statement::    For more generality, write C code to output
                         the assembler code.
-* Predicates::          Controlling what kinds of operands can be used
-                        for an insn.
+* Predicates::          Controlling which kinds of operands can be used
+                        when matching an insn to an instruction pattern.
 * Constraints::         Fine-tuning operand selection.
 * Standard Names::      Names mark patterns to use for code generation.
 * Pattern Ordering::    When the order of patterns makes a difference.
 * Dependent Patterns::  Having one pattern may make you need another.
 * Jump Patterns::       Special considerations for patterns for jump insns.
 * Looping Patterns::    How to define patterns for special looping insns.
-* Insn Canonicalizations::Canonicalization of Instructions
+* Insn Canonicalizations::Canonicalization of instructions
 * Expander Definitions::Generating a sequence of several RTL insns
                         for a standard operation.
-* Insn Splitting::      Splitting Instructions into Multiple Instructions.
-* Including Patterns::  Including Patterns in Machine Descriptions.
+* Insn Splitting::      Splitting insns into multiple insns.
+* Including Patterns::  Including patterns in machine descriptions.
 * Peephole Definitions::Defining machine-specific peephole optimizations.
 * Insn Attributes::     Specifying the value of attributes for generated insns.
 * Conditional Execution::Generating @code{define_insn} patterns for
@@ -54,50 +54,60 @@  See the next chapter for information on the C header file.
 @node Overview
 @section Overview of How the Machine Description is Used
 
-There are three main conversions that happen in the compiler:
+There are four main conversions that happen in the compiler:
 
 @enumerate
 
 @item
-The front end reads the source code and builds a parse tree.
+The front end reads the source code and converts it to an intermediate,
+front end specific, tree based representation.
 
 @item
-The parse tree is used to generate an RTL insn list based on named
-instruction patterns.
+This tree based representation is lowered (gimplified) to a gimple
+representation.
 
 @item
-The insn list is matched against the RTL templates to produce assembler
-code.
+The gimple representation is transformed (expanded) to an RTL
+representation.  This RTL representation is a doubly-linked chain of
+objects called insns, known as the insn list.
+
+@item
+The insn list is optimized, and the optimized insn list is matched
+against @code{define_insn} instruction patterns in the machine description
+to produce assembler code.
 
 @end enumerate
 
-For the generate pass, only the names of the insns matter, from either a
-named @code{define_insn} or a @code{define_expand}.  The compiler will
+When expanding from gimple to RTL, only named @code{define_insn}
+constructs and @code{define_expand} constructs are used.  The compiler will
 choose the pattern with the right name and apply the operands according
 to the documentation later in this chapter, without regard for the RTL
-template or operand constraints.  Note that the names the compiler looks
-for are hard-coded in the compiler---it will ignore unnamed patterns and
-patterns with names it doesn't know about, but if you don't provide a
-named pattern it needs, it will abort.
+template or operand constraints.  Note that the names which are used for
+expansion are hard-coded in the compiler---unnamed patterns and patterns
+with names which do not have a standard meaning are ignored during
+expansion.  If you don't provide a named pattern that the compiler is
+trying to expand, it may try a different expansion strategy.  If no
+other expansion strategies are possible, the compiler will abort.
 
 If a @code{define_insn} is used, the template given is inserted into the
-insn list.  If a @code{define_expand} is used, one of three things
-happens, based on the condition logic.  The condition logic may manually
-create new insns for the insn list, say via @code{emit_insn()}, and
-invoke @code{DONE}.  For certain named patterns, it may invoke @code{FAIL} to tell the
-compiler to use an alternate way of performing that task.  If it invokes
-neither @code{DONE} nor @code{FAIL}, the template given in the pattern
+insn list.
+
+If a @code{define_expand} is used, one of three things happens, encoded in
+the condition logic.  The condition logic may manually create new insns
+for the insn list, say via @code{emit_insn ()}, and invoke @code{DONE},
+indicating a successful expansion.  If the standard pattern name permits
+it, the condition logic may invoke @code{FAIL} to express that an alternate
+strategy should be used to performing the expansion.  If the condition logic
+invokes neither @code{DONE} nor @code{FAIL}, the template given in the pattern
 is inserted, as if the @code{define_expand} were a @code{define_insn}.
 
 Once the insn list is generated, various optimization passes convert,
-replace, and rearrange the insns in the insn list.  This is where the
-@code{define_split} and @code{define_peephole} patterns get used, for
-example.
-
-Finally, the insn list's RTL is matched up with the RTL templates in the
-@code{define_insn} patterns, and those patterns are used to emit the
-final assembly code.  For this purpose, each named @code{define_insn}
-acts like it's unnamed, since the names are ignored.
+replace, and rearrange the insns in the insn list.
+
+Finally, the insn list's RTL is matched with the RTL templates from the
+@code{define_insn} instruction patterns, and those patterns are used to
+emit the final assembly code.  For this purpose, names are ignored.  All
+@code{define_insn} are considered for matching.
 
 @node Patterns
 @section Everything about Instruction Patterns