diff mbox

Add null identifiers to genmatch

Message ID 87lhaaosh5.fsf@e105548-lin.cambridge.arm.com
State New
Headers show

Commit Message

Richard Sandiford Nov. 7, 2015, 1:21 p.m. UTC
This patch adds a null identifier that can never match anything and
can never be generated.  It is only valid in operator lists and fors.
Later patches will add uses of it.

The idea is to allow operator lists for maths functions that have
four entries:

- float built-in
- double built-in
- long double built-in
- internal function

Not all maths functions have an associated internal function,
and for those the final operator will be "null".  Any simplification
that tries to use a null substitution will be skipped.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
	* doc/match-and-simplify.texi: Document the "null" identifier.
	* genmatch.c (id_base::NULL_ID): New kind.
	(null_id): New variable.
	(get_operator): Add a parameter that says whether null identifiers
	are allowed.
	(contains_id): New function.
	(lower_for): Skip substitutions that would have a null_id in
	either the match or the result.
	(parser::parse_for): Allow the null identifier to be used.
	(parser::parse_operator_list): Likewise.
	(main): Initialize null_id.

Comments

Pedro Alves Nov. 7, 2015, 2:31 p.m. UTC | #1
Hi Richard,

Passerby comment below.

On 11/07/2015 01:21 PM, Richard Sandiford wrote:
> -/* Lookup the identifier ID.  */
> +/* Lookup the identifier ID.  Allow "null" if ALLOW_NULL.  */
>  
>  id_base *
> -get_operator (const char *id)
> +get_operator (const char *id, bool allow_null = false)
>  {
> +  if (allow_null && strcmp (id, "null") == 0)
> +    return null_id;
> +
>    id_base tem (id_base::CODE, id);

Boolean params are best avoided if possible, IMO.  In this case,
it seems this could instead be a new wrapper function, like:

id_base *
get_operator_allow_null (const char *id)
{
  if (strcmp (id, "null") == 0)
    return null_id;
  return get_operator (id);
}

Then callers are more obvious as you no longer have to know
what true/false mean:

       const char *id = get_ident ();
-      if (get_operator (id) != NULL)
+      if (get_operator_allow_null (id) != NULL)
 	fatal_at (token, "operator already defined");


vs:

       const char *id = get_ident ();
-      if (get_operator (id) != NULL)
+      if (get_operator (id, true) != NULL)
 	fatal_at (token, "operator already defined");


Thanks,
Pedro Alves
Jeff Law Nov. 8, 2015, 11:17 p.m. UTC | #2
On 11/07/2015 07:31 AM, Pedro Alves wrote:
> Hi Richard,
>
> Passerby comment below.
>
> On 11/07/2015 01:21 PM, Richard Sandiford wrote:
>> -/* Lookup the identifier ID.  */
>> +/* Lookup the identifier ID.  Allow "null" if ALLOW_NULL.  */
>>
>>   id_base *
>> -get_operator (const char *id)
>> +get_operator (const char *id, bool allow_null = false)
>>   {
>> +  if (allow_null && strcmp (id, "null") == 0)
>> +    return null_id;
>> +
>>     id_base tem (id_base::CODE, id);
>
> Boolean params are best avoided if possible, IMO.  In this case,
> it seems this could instead be a new wrapper function, like:
This hasn't been something we've required for GCC.    I've come across 
this recommendation a few times over the last several months as I 
continue to look at refactoring and best practices for codebases such as 
GCC.

By encoding the boolean in the function's signature, it (IMHO) does make 
the code a bit easier to read, primarily because you don't have to go 
lookup the tense of the boolean).  The problem is when the boolean is 
telling us some property an argument, but there's more than one argument 
and other similar situations.

I wonder if the real benefit is in the refactoring necessary to do 
things in this way without a ton of code duplication.

Jeff
Richard Biener Nov. 9, 2015, 2:56 p.m. UTC | #3
On Mon, Nov 9, 2015 at 12:17 AM, Jeff Law <law@redhat.com> wrote:
> On 11/07/2015 07:31 AM, Pedro Alves wrote:
>>
>> Hi Richard,
>>
>> Passerby comment below.
>>
>> On 11/07/2015 01:21 PM, Richard Sandiford wrote:
>>>
>>> -/* Lookup the identifier ID.  */
>>> +/* Lookup the identifier ID.  Allow "null" if ALLOW_NULL.  */
>>>
>>>   id_base *
>>> -get_operator (const char *id)
>>> +get_operator (const char *id, bool allow_null = false)
>>>   {
>>> +  if (allow_null && strcmp (id, "null") == 0)
>>> +    return null_id;
>>> +
>>>     id_base tem (id_base::CODE, id);
>>
>>
>> Boolean params are best avoided if possible, IMO.  In this case,
>> it seems this could instead be a new wrapper function, like:
>
> This hasn't been something we've required for GCC.    I've come across this
> recommendation a few times over the last several months as I continue to
> look at refactoring and best practices for codebases such as GCC.
>
> By encoding the boolean in the function's signature, it (IMHO) does make the
> code a bit easier to read, primarily because you don't have to go lookup the
> tense of the boolean).  The problem is when the boolean is telling us some
> property an argument, but there's more than one argument and other similar
> situations.
>
> I wonder if the real benefit is in the refactoring necessary to do things in
> this way without a ton of code duplication.

I think the patch is ok as-is.

Thus ok.

Thanks,
Richard.

> Jeff
>
>
Pedro Alves Nov. 16, 2015, 8:15 p.m. UTC | #4
Hi Jeff,

(Sorry I didn't reply sooner, I was OOO.)

On 11/08/2015 11:17 PM, Jeff Law wrote:
> On 11/07/2015 07:31 AM, Pedro Alves wrote:
>> Hi Richard,
>>
>> Passerby comment below.
>>
>> On 11/07/2015 01:21 PM, Richard Sandiford wrote:
>>> -/* Lookup the identifier ID.  */
>>> +/* Lookup the identifier ID.  Allow "null" if ALLOW_NULL.  */
>>>
>>>   id_base *
>>> -get_operator (const char *id)
>>> +get_operator (const char *id, bool allow_null = false)
>>>   {
>>> +  if (allow_null && strcmp (id, "null") == 0)
>>> +    return null_id;
>>> +
>>>     id_base tem (id_base::CODE, id);
>>
>> Boolean params are best avoided if possible, IMO.  In this case,
>> it seems this could instead be a new wrapper function, like:
> This hasn't been something we've required for GCC.    I've come across 
> this recommendation a few times over the last several months as I 
> continue to look at refactoring and best practices for codebases such as 
> GCC.

FWIW, GDB is in progress of converting to C++ and we're pulling in
GCC's C++ conventions as reference, so I thought I'd see what the GCC
community thought here.

> 
> By encoding the boolean in the function's signature, it (IMHO) does make 
> the code a bit easier to read, primarily because you don't have to go 
> lookup the tense of the boolean).  The problem is when the boolean is 
> telling us some property an argument, but there's more than one argument 
> and other similar situations.

There's more than one way to avoid boolean parameters.
If you have more than one of those, the boolean trap is even
worse IMO.  In such cases, enums can help make things clearer for
the reader.  E.g.:

 foo (true, false);
 foo (false, true);

vs:

 foo (whatever::value1, bar::in_style);

I think that internal helper functions defined close to
their usage end up being OK to use booleans, while if you have
a API consumed by other modules it pays off more to try to
avoid the boolean trap.

Anyway, sorry for the noise.  I know there are bigger fish to fry.

Thanks,
Pedro Alves
Jeff Law Nov. 16, 2015, 8:48 p.m. UTC | #5
On 11/16/2015 01:15 PM, Pedro Alves wrote:
> Hi Jeff,
>
> (Sorry I didn't reply sooner, I was OOO.)
No worries.

>>>
>>> Boolean params are best avoided if possible, IMO.  In this case,
>>> it seems this could instead be a new wrapper function, like:
>> This hasn't been something we've required for GCC.    I've come across
>> this recommendation a few times over the last several months as I
>> continue to look at refactoring and best practices for codebases such as
>> GCC.
>
> FWIW, GDB is in progress of converting to C++ and we're pulling in
> GCC's C++ conventions as reference, so I thought I'd see what the GCC
> community thought here.
FWIW, I often look at GCC's conventions, google's conventions, then 
start doing google searches around this kind of stuff.  And as always, 
the quality of the latter can vary wildly :-)

>
>>
>> By encoding the boolean in the function's signature, it (IMHO) does make
>> the code a bit easier to read, primarily because you don't have to go
>> lookup the tense of the boolean).  The problem is when the boolean is
>> telling us some property an argument, but there's more than one argument
>> and other similar situations.
>
> There's more than one way to avoid boolean parameters.
> If you have more than one of those, the boolean trap is even
> worse IMO.  In such cases, enums can help make things clearer for
> the reader.  E.g.:
>
>   foo (true, false);
>   foo (false, true);
>
> vs:
>
>   foo (whatever::value1, bar::in_style);
Yea, I saw the latter at some point as well.  In general I don't think 
we've used enums as well as we could/should have in GCC through the years.


>
> I think that internal helper functions defined close to
> their usage end up being OK to use booleans, while if you have
> a API consumed by other modules it pays off more to try to
> avoid the boolean trap.
>
> Anyway, sorry for the noise.  I know there are bigger fish to fry.
Not noise at all -- no need to apologize.

jeff
Richard Sandiford Nov. 16, 2015, 9:52 p.m. UTC | #6
Jeff Law <law@redhat.com> writes:
>>>> Boolean params are best avoided if possible, IMO.  In this case,
>>>> it seems this could instead be a new wrapper function, like:
>>> This hasn't been something we've required for GCC.    I've come across
>>> this recommendation a few times over the last several months as I
>>> continue to look at refactoring and best practices for codebases such as
>>> GCC.
>>
>> FWIW, GDB is in progress of converting to C++ and we're pulling in
>> GCC's C++ conventions as reference, so I thought I'd see what the GCC
>> community thought here.
> FWIW, I often look at GCC's conventions, google's conventions, then 
> start doing google searches around this kind of stuff.  And as always, 
> the quality of the latter can vary wildly :-)
>
>>
>>>
>>> By encoding the boolean in the function's signature, it (IMHO) does make
>>> the code a bit easier to read, primarily because you don't have to go
>>> lookup the tense of the boolean).  The problem is when the boolean is
>>> telling us some property an argument, but there's more than one argument
>>> and other similar situations.
>>
>> There's more than one way to avoid boolean parameters.
>> If you have more than one of those, the boolean trap is even
>> worse IMO.  In such cases, enums can help make things clearer for
>> the reader.  E.g.:
>>
>>   foo (true, false);
>>   foo (false, true);
>>
>> vs:
>>
>>   foo (whatever::value1, bar::in_style);
> Yea, I saw the latter at some point as well.  In general I don't think 
> we've used enums as well as we could/should have in GCC through the years.

Yeah.  Kenny was adamant that for wide-int we should have an UNSIGNED/SIGNED
enum rather than a boolean flag.  And I think that does make things clearer.
I always have to remind myself whether "true" means "unsigned" or "signed",
especially for RTL functions.

I certainly prefer the enum to separate functions though.  They can get
messy if a new call site is added that needs a variable parameter.

Thanks,
Richard
Mike Stump Nov. 16, 2015, 11:14 p.m. UTC | #7
On Nov 16, 2015, at 1:52 PM, Richard Sandiford <rdsandiford@gmail.com> wrote:
> 
> Yeah.  Kenny was adamant that for wide-int we should have an UNSIGNED/SIGNED

Yeah, you can blame me.  I think (, UNSIGNED) conveys more than (,true) or (,false).  The sad part is, this has always been true.

> enum rather than a boolean flag.  And I think that does make things clearer.
> I always have to remind myself whether "true" means "unsigned" or "signed",
> especially for RTL functions.

Certainly any api that uses boolean for signed/unsigned, can be switched to signop (aka SIGNED, UNSIGNED).  It was put in as a general feature disconnected from wide-int for a reason.  :-)

Here we shall salute the last of the hold outs:

base_type_for_mode (machine_mode mode, bool unsignedp)
convert_extracted_bit_field (rtx x, machine_mode mode, machine_mode tmode, bool unsignedp)
gimple_signed_or_unsigned_type (bool unsignedp, tree type)
avoid_expensive_constant (machine_mode mode, optab binoptab,
                          int opn, rtx x, bool unsignedp)
get_rtx_code (enum tree_code tcode, bool unsignedp)
vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
                    bool unsignedp, enum insn_code icode)
create_expand_operand (struct expand_operand *op,
                       enum expand_operand_type type,
                       rtx value, machine_mode mode,
                       bool unsigned_p)
create_convert_operand_to (struct expand_operand *op, rtx value,
                           machine_mode mode, bool unsigned_p)
create_convert_operand_from (struct expand_operand *op, rtx value,
                             machine_mode mode, bool unsigned_p)

but, it was designed with uses like:

  bool unsigned_p = false;
 
in mind as well.  I will note:

bool signed_p;

and:

shorten_into_mode (struct rtx_iv *iv, machine_mode mode,
                   enum rtx_code cond, bool signed_p, struct niter_desc *desc)

are the odd man out.  Their value is inverted from the value signop uses.

> I certainly prefer the enum to separate functions though.  They can get
> messy if a new call site is added that needs a variable parameter.

To me it’s a numbers game.  When there are 200 simple calls and 5 complex ones, I prefer 200 simple calls without the extra parameters, and 5 with the horror that is the argument list.
Richard Biener Nov. 17, 2015, 10:01 a.m. UTC | #8
On Tue, Nov 17, 2015 at 12:14 AM, Mike Stump <mikestump@comcast.net> wrote:
> On Nov 16, 2015, at 1:52 PM, Richard Sandiford <rdsandiford@gmail.com> wrote:
>>
>> Yeah.  Kenny was adamant that for wide-int we should have an UNSIGNED/SIGNED
>
> Yeah, you can blame me.  I think (, UNSIGNED) conveys more than (,true) or (,false).  The sad part is, this has always been true.
>
>> enum rather than a boolean flag.  And I think that does make things clearer.
>> I always have to remind myself whether "true" means "unsigned" or "signed",
>> especially for RTL functions.
>
> Certainly any api that uses boolean for signed/unsigned, can be switched to signop (aka SIGNED, UNSIGNED).  It was put in as a general feature disconnected from wide-int for a reason.  :-)
>
> Here we shall salute the last of the hold outs:
>
> base_type_for_mode (machine_mode mode, bool unsignedp)
> convert_extracted_bit_field (rtx x, machine_mode mode, machine_mode tmode, bool unsignedp)
> gimple_signed_or_unsigned_type (bool unsignedp, tree type)
> avoid_expensive_constant (machine_mode mode, optab binoptab,
>                           int opn, rtx x, bool unsignedp)
> get_rtx_code (enum tree_code tcode, bool unsignedp)
> vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
>                     bool unsignedp, enum insn_code icode)
> create_expand_operand (struct expand_operand *op,
>                        enum expand_operand_type type,
>                        rtx value, machine_mode mode,
>                        bool unsigned_p)
> create_convert_operand_to (struct expand_operand *op, rtx value,
>                            machine_mode mode, bool unsigned_p)
> create_convert_operand_from (struct expand_operand *op, rtx value,
>                              machine_mode mode, bool unsigned_p)

Err, you miss a few '[un]signed unsigned[_]p' cases ;)

> but, it was designed with uses like:
>
>   bool unsigned_p = false;
>
> in mind as well.  I will note:
>
> bool signed_p;
>
> and:
>
> shorten_into_mode (struct rtx_iv *iv, machine_mode mode,
>                    enum rtx_code cond, bool signed_p, struct niter_desc *desc)
>
> are the odd man out.  Their value is inverted from the value signop uses.
>
>> I certainly prefer the enum to separate functions though.  They can get
>> messy if a new call site is added that needs a variable parameter.
>
> To me it’s a numbers game.  When there are 200 simple calls and 5 complex ones, I prefer 200 simple calls without the extra parameters, and 5 with the horror that is the argument list.
diff mbox

Patch

diff --git a/gcc/doc/match-and-simplify.texi b/gcc/doc/match-and-simplify.texi
index c5c2b7e..db6519d 100644
--- a/gcc/doc/match-and-simplify.texi
+++ b/gcc/doc/match-and-simplify.texi
@@ -323,6 +323,11 @@  is the same as
   (POW (abs @@0) (mult @@1 @{ built_real (TREE_TYPE (@@1), dconsthalf); @}))))
 @end smallexample
 
+@code{for}s and operator lists can include the special identifier
+@code{null} that matches nothing and can never be generated.  This can
+be used to pad an operator list so that it has a standard form,
+even if there isn't a suitable operator for every form.
+
 Another building block are @code{with} expressions in the
 result expression which nest the generated code in a new C block
 followed by its argument:
diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index c7ab4a4..cff32b0 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -297,7 +297,7 @@  commutative_ternary_tree_code (enum tree_code code)
 
 struct id_base : nofree_ptr_hash<id_base>
 {
-  enum id_kind { CODE, FN, PREDICATE, USER } kind;
+  enum id_kind { CODE, FN, PREDICATE, USER, NULL_ID } kind;
 
   id_base (id_kind, const char *, int = -1);
 
@@ -324,6 +324,9 @@  id_base::equal (const id_base *op1,
 	  && strcmp (op1->id, op2->id) == 0);
 }
 
+/* The special id "null", which matches nothing.  */
+static id_base *null_id;
+
 /* Hashtable of known pattern operators.  This is pre-seeded from
    all known tree codes and all known builtin function ids.  */
 static hash_table<id_base> *operators;
@@ -479,11 +482,14 @@  operator==(id_base &id, enum tree_code code)
   return false;
 }
 
-/* Lookup the identifier ID.  */
+/* Lookup the identifier ID.  Allow "null" if ALLOW_NULL.  */
 
 id_base *
-get_operator (const char *id)
+get_operator (const char *id, bool allow_null = false)
 {
+  if (allow_null && strcmp (id, "null") == 0)
+    return null_id;
+
   id_base tem (id_base::CODE, id);
 
   id_base *op = operators->find_with_hash (&tem, tem.hashval);
@@ -1115,6 +1121,40 @@  lower_cond (simplify *s, vec<simplify *>& simplifiers)
     }
 }
 
+/* Return true if O refers to ID.  */
+
+bool
+contains_id (operand *o, user_id *id)
+{
+  if (capture *c = dyn_cast<capture *> (o))
+    return c->what && contains_id (c->what, id);
+
+  if (expr *e = dyn_cast<expr *> (o))
+    {
+      if (e->operation == id)
+	return true;
+      for (unsigned i = 0; i < e->ops.length (); ++i)
+	if (contains_id (e->ops[i], id))
+	  return true;
+      return false;
+    }
+
+  if (with_expr *w = dyn_cast <with_expr *> (o))
+    return (contains_id (w->with, id)
+	    || contains_id (w->subexpr, id));
+
+  if (if_expr *ife = dyn_cast <if_expr *> (o))
+    return (contains_id (ife->cond, id)
+	    || contains_id (ife->trueexpr, id)
+	    || (ife->falseexpr && contains_id (ife->falseexpr, id)));
+
+  if (c_expr *ce = dyn_cast<c_expr *> (o))
+    return ce->capture_ids && ce->capture_ids->get (id->id);
+
+  return false;
+}
+
+
 /* In AST operand O replace operator ID with operator WITH.  */
 
 operand *
@@ -1270,16 +1310,29 @@  lower_for (simplify *sin, vec<simplify *>& simplifiers)
 	      operand *result_op = s->result;
 	      vec<std::pair<user_id *, id_base *> > subst;
 	      subst.create (n_ids);
+	      bool skip = false;
 	      for (unsigned i = 0; i < n_ids; ++i)
 		{
 		  user_id *id = ids[i];
 		  id_base *oper = id->substitutes[j % id->substitutes.length ()];
+		  if (oper == null_id
+		      && (contains_id (match_op, id)
+			  || contains_id (result_op, id)))
+		    {
+		      skip = true;
+		      break;
+		    }
 		  subst.quick_push (std::make_pair (id, oper));
 		  match_op = replace_id (match_op, id, oper);
 		  if (result_op
 		      && !can_delay_subst)
 		    result_op = replace_id (result_op, id, oper);
 		}
+	      if (skip)
+		{
+		  subst.release ();
+		  continue;
+		}
 	      simplify *ns = new simplify (s->kind, match_op, result_op,
 					   vNULL, s->capture_ids);
 	      ns->for_subst_vec.safe_splice (s->for_subst_vec);
@@ -4242,7 +4295,7 @@  parser::parse_for (source_location)
 
       /* Insert the user defined operators into the operator hash.  */
       const char *id = get_ident ();
-      if (get_operator (id) != NULL)
+      if (get_operator (id, true) != NULL)
 	fatal_at (token, "operator already defined");
       user_id *op = new user_id (id);
       id_base **slot = operators->find_slot_with_hash (op, op->hashval, INSERT);
@@ -4256,7 +4309,7 @@  parser::parse_for (source_location)
       while ((token = peek_ident ()) != 0)
 	{
 	  const char *oper = get_ident ();
-	  id_base *idb = get_operator (oper);
+	  id_base *idb = get_operator (oper, true);
 	  if (idb == NULL)
 	    fatal_at (token, "no such operator '%s'", oper);
 	  if (*idb == CONVERT0 || *idb == CONVERT1 || *idb == CONVERT2
@@ -4346,7 +4399,7 @@  parser::parse_operator_list (source_location)
   const cpp_token *token = peek (); 
   const char *id = get_ident ();
 
-  if (get_operator (id) != 0)
+  if (get_operator (id, true) != 0)
     fatal_at (token, "operator %s already defined", id);
 
   user_id *op = new user_id (id, true);
@@ -4356,7 +4409,7 @@  parser::parse_operator_list (source_location)
     {
       token = peek (); 
       const char *oper = get_ident ();
-      id_base *idb = get_operator (oper);
+      id_base *idb = get_operator (oper, true);
       
       if (idb == 0)
 	fatal_at (token, "no such operator '%s'", oper);
@@ -4590,6 +4643,8 @@  main (int argc, char **argv)
   cpp_define (r, gimple ? "GIMPLE=1": "GENERIC=1");
   cpp_define (r, gimple ? "GENERIC=0": "GIMPLE=0");
 
+  null_id = new id_base (id_base::NULL_ID, "null");
+
   /* Pre-seed operators.  */
   operators = new hash_table<id_base> (1024);
 #define DEFTREECODE(SYM, STRING, TYPE, NARGS) \