diff mbox series

libcpp: Fix up padding handling in funlike_invocation_p [PR104147]

Message ID 20220131190349.GM2646553@tucnak
State New
Headers show
Series libcpp: Fix up padding handling in funlike_invocation_p [PR104147] | expand

Commit Message

Jakub Jelinek Jan. 31, 2022, 7:03 p.m. UTC
Hi!

As mentioned in the PR, in some cases we preprocess incorrectly when we
encounter an identifier which is defined as function-like macro, followed
by at least 2 CPP_PADDING tokens and then some other identifier.
On the following testcase, the problem is in the 3rd funlike_invocation_p,
the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared token),
CPP_PADDING (one created with padding_token, val.source is non-NULL and
val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME.
funlike_invocation_p remembers there was a padding token, but remembers the
first one because of its condition, then the next token is the CPP_NAME,
which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we
can't easily backup more tokens, it pushes into a new context the padding
token (the pfile->avoid_paste one).  The net effect is that when Y is not
defined as fun-like macro, we read Y, avoid_paste, padding_token, Y,
while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y
(the second avoid_paste is because that is how we handle end of a context).
Now, for stringify_arg that is unfortunately a significant difference,
which handles CPP_PADDING tokens with:
      if (token->type == CPP_PADDING)
        {
          if (source == NULL
              || (!(source->flags & PREV_WHITE)
                  && token->val.source == NULL))
            source = token->val.source;
          continue;
        }
and later on
      /* Leading white space?  */
      if (dest - 1 != BUFF_FRONT (pfile->u_buff))
        {
          if (source == NULL)
            source = token;
          if (source->flags & PREV_WHITE)
            *dest++ = ' ';
        }
      source = NULL;
(and c-ppoutput.cc has similar code).
So, when Y is not fun-like macro, ' ' is added because padding_token's
val.source->flags & PREV_WHITE is non-zero, while when it is fun-like
macro, we don't add ' ' in between, because source is NULL and so
used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.

Now, the funlike_invocation_p condition
       if (padding == NULL
           || (!(padding->flags & PREV_WHITE) && token->val.source == NULL))
        padding = token;
looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume
the intent was to prefer do the same thing and pick the right padding.
But there are significant differences.  Both stringify_arg and c-ppoutput.cc
don't remember the CPP_PADDING token, but its val.source instead, while
in funlike_invocation_p we want to remember the padding token that has the
significant information for stringify_arg/c-ppoutput.cc.
So, IMHO we want to overwrite padding if:
1) padding == NULL (remember that there was any padding at all)
2) padding->val.source == NULL (this matches the source == NULL
   case in stringify_arg)
3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL
   (this matches the !(source->flags & PREV_WHITE) && token->val.source == NULL
   case in stringify_arg)

Bootstrapped/regtested on powerpc64le-linux, ok for trunk
(and after a while for some release branches)?

2022-01-31  Jakub Jelinek  <jakub@redhat.com>

	PR preprocessor/104147
	* macro.cc (funlike_invocation_p): For padding prefer a token
	with val.source non-NULL especially if it has PREV_WHITE set
	on val.source->flags.

	* c-c++-common/cpp/pr104147.c: New test.


	Jakub

Comments

Jason Merrill Jan. 31, 2022, 10:12 p.m. UTC | #1
On 1/31/22 14:03, Jakub Jelinek wrote:
> Hi!
> 
> As mentioned in the PR, in some cases we preprocess incorrectly when we
> encounter an identifier which is defined as function-like macro, followed
> by at least 2 CPP_PADDING tokens and then some other identifier.
> On the following testcase, the problem is in the 3rd funlike_invocation_p,
> the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared token),
> CPP_PADDING (one created with padding_token, val.source is non-NULL and
> val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME.
> funlike_invocation_p remembers there was a padding token, but remembers the
> first one because of its condition, then the next token is the CPP_NAME,
> which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we
> can't easily backup more tokens, it pushes into a new context the padding
> token (the pfile->avoid_paste one).  The net effect is that when Y is not
> defined as fun-like macro, we read Y, avoid_paste, padding_token, Y,
> while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y
> (the second avoid_paste is because that is how we handle end of a context).
> Now, for stringify_arg that is unfortunately a significant difference,
> which handles CPP_PADDING tokens with:
>        if (token->type == CPP_PADDING)
>          {
>            if (source == NULL
>                || (!(source->flags & PREV_WHITE)
>                    && token->val.source == NULL))
>              source = token->val.source;
>            continue;
>          }
> and later on
>        /* Leading white space?  */
>        if (dest - 1 != BUFF_FRONT (pfile->u_buff))
>          {
>            if (source == NULL)
>              source = token;
>            if (source->flags & PREV_WHITE)
>              *dest++ = ' ';
>          }
>        source = NULL;
> (and c-ppoutput.cc has similar code).
> So, when Y is not fun-like macro, ' ' is added because padding_token's
> val.source->flags & PREV_WHITE is non-zero, while when it is fun-like
> macro, we don't add ' ' in between, because source is NULL and so
> used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.
> 
> Now, the funlike_invocation_p condition
>         if (padding == NULL
>             || (!(padding->flags & PREV_WHITE) && token->val.source == NULL))
>          padding = token;
> looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume
> the intent was to prefer do the same thing and pick the right padding.
> But there are significant differences.  Both stringify_arg and c-ppoutput.cc
> don't remember the CPP_PADDING token, but its val.source instead, while
> in funlike_invocation_p we want to remember the padding token that has the
> significant information for stringify_arg/c-ppoutput.cc.
> So, IMHO we want to overwrite padding if:
> 1) padding == NULL (remember that there was any padding at all)
> 2) padding->val.source == NULL (this matches the source == NULL
>     case in stringify_arg)
> 3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL
>     (this matches the !(source->flags & PREV_WHITE) && token->val.source == NULL
>     case in stringify_arg)
> 
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk
> (and after a while for some release branches)?
> 
> 2022-01-31  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR preprocessor/104147
> 	* macro.cc (funlike_invocation_p): For padding prefer a token
> 	with val.source non-NULL especially if it has PREV_WHITE set
> 	on val.source->flags.
> 
> 	* c-c++-common/cpp/pr104147.c: New test.
> 
> --- libcpp/macro.cc.jj	2022-01-18 11:59:00.277972128 +0100
> +++ libcpp/macro.cc	2022-01-31 15:44:49.208847524 +0100
> @@ -1374,7 +1374,9 @@ funlike_invocation_p (cpp_reader *pfile,
>         if (token->type != CPP_PADDING)
>   	break;
>         if (padding == NULL
> -	  || (!(padding->flags & PREV_WHITE) && token->val.source == NULL))
> +	  || padding->val.source == NULL
> +	  || (!(padding->val.source->flags & PREV_WHITE)
> +	      && token->val.source == NULL))

What if padding->val.source == NULL and padding->flags & PREV_WHITE? 
Have you tried asserting that's not possible?

>   	padding = token;
>       }
>   
> --- gcc/testsuite/c-c++-common/cpp/pr104147.c.jj	2022-01-31 15:54:18.113847934 +0100
> +++ gcc/testsuite/c-c++-common/cpp/pr104147.c	2022-01-31 15:54:04.530038941 +0100
> @@ -0,0 +1,27 @@
> +/* PR preprocessor/104147 */
> +/* { dg-do run } */
> +
> +#define X(x,y) 	x y
> +#define STR_(x) #x
> +#define STR(x) 	STR_(x)
> +const char *str =
> +STR(X(Y,Y))
> +#define Y()
> +STR(X(Y,Y))
> +#undef Y
> +STR(X(Y,Y))
> +#define Y()
> +STR(X(Y,Y))
> +STR(X(Y,
> +Y))
> +STR(X(Y
> +,Y))
> +;
> +
> +int
> +main ()
> +{
> +  if (__builtin_strcmp (str, "Y YY YY YY YY YY Y") != 0)
> +    __builtin_abort ();
> +  return 0;
> +}
> 
> 	Jakub
>
Jakub Jelinek Jan. 31, 2022, 11:28 p.m. UTC | #2
On Mon, Jan 31, 2022 at 05:12:23PM -0500, Jason Merrill wrote:
> > --- libcpp/macro.cc.jj	2022-01-18 11:59:00.277972128 +0100
> > +++ libcpp/macro.cc	2022-01-31 15:44:49.208847524 +0100
> > @@ -1374,7 +1374,9 @@ funlike_invocation_p (cpp_reader *pfile,
> >         if (token->type != CPP_PADDING)
> >   	break;
> >         if (padding == NULL
> > -	  || (!(padding->flags & PREV_WHITE) && token->val.source == NULL))
> > +	  || padding->val.source == NULL
> > +	  || (!(padding->val.source->flags & PREV_WHITE)
> > +	      && token->val.source == NULL))
> 
> What if padding->val.source == NULL and padding->flags & PREV_WHITE? Have
> you tried asserting that's not possible?

I haven't, but will do so now.
That said, I think at least for the shared avoid_paste token one should
better not change any flags on it.
And, at least the stringify_arg and c-ppoutput.cc (token_streamer::stream)
code doesn't care at all about PREV_WHITE flags on CPP_PADDING tokens, it
only cares about PREV_WHITE flags on its val.source.
The only other spot in c-ppoutput.cc that checks PREV_WHITE is cb_include
where the tokens must be CPP_COMMENT, glue_header_name after checking
PREV_WHITE on token->flags calls cpp_spell_token which for CPP_PADDING
errors out, etc.
More importantly, the setting of PREV_WHITE bit (flags |= PREV_WHITE etc.)
I see only in cpp_lex_direct which shouldn't produce CPP_PADDING tokens,
then paste_tokens copies it from old lhs to the pasted token, but the old
lhs goes through cpp_spell_token, so can't be CPP_PADDING unless an error
is diagnosed, create_iso_definition can copy it from CPP_HASH token
to CPP_MACRO_ARG or CPP_NAME, and that's it.

	Jakub
Jakub Jelinek Feb. 1, 2022, 9:03 a.m. UTC | #3
On Tue, Feb 01, 2022 at 12:28:57AM +0100, Jakub Jelinek via Gcc-patches wrote:
> I haven't, but will do so now.

So, I've done another bootstrap/regtest with incremental
--- gcc/c-family/c-lex.cc.jj	2022-01-18 00:18:02.558747051 +0100
+++ gcc/c-family/c-lex.cc	2022-02-01 00:39:47.314183266 +0100
@@ -297,6 +297,7 @@ get_token_no_padding (cpp_reader *pfile)
       ret = cpp_get_token (pfile);
       if (ret->type != CPP_PADDING)
 	return ret;
+      gcc_assert ((ret->flags & PREV_WHITE) == 0);
     }
 }
 
@@ -487,6 +488,7 @@ c_lex_with_flags (tree *value, location_
   switch (type)
     {
     case CPP_PADDING:
+      gcc_assert ((tok->flags & PREV_WHITE) == 0);
       goto retry;
 
     case CPP_NAME:
@@ -1267,6 +1269,7 @@ lex_string (const cpp_token *tok, tree *
   switch (tok->type)
     {
     case CPP_PADDING:
+      gcc_assert ((tok->flags & PREV_WHITE) == 0);
       goto retry;
     case CPP_ATSIGN:
       if (objc_string)
--- libcpp/macro.cc.jj	2022-01-31 22:11:34.000000000 +0100
+++ libcpp/macro.cc	2022-02-01 00:28:30.717339868 +0100
@@ -1373,6 +1373,7 @@ funlike_invocation_p (cpp_reader *pfile,
       token = cpp_get_token (pfile);
       if (token->type != CPP_PADDING)
 	break;
+      gcc_assert ((token->flags & PREV_WHITE) == 0);
       if (padding == NULL
 	  || padding->val.source == NULL
 	  || (!(padding->val.source->flags & PREV_WHITE)

patch.  The funlike_invocation_p macro never triggered, the other
asserts did on some tests, see below for a full list.
This seems to be caused by #pragma/_Pragma handling.
do_pragma does:
          pfile->directive_result.src_loc = pragma_token_virt_loc;
          pfile->directive_result.type = CPP_PRAGMA;
          pfile->directive_result.flags = pragma_token->flags;
          pfile->directive_result.val.pragma = p->u.ident;
when it sees a pragma, while start_directive does:
  pfile->directive_result.type = CPP_PADDING;
and so does _cpp_do__Pragma.
Now, for #pragma lex.cc will just ignore directive_result if
it has CPP_PADDING type:
              if (_cpp_handle_directive (pfile, result->flags & PREV_WHITE))
                {
                  if (pfile->directive_result.type == CPP_PADDING)
                    continue;
                  result = &pfile->directive_result;
                }
but destringize_and_run does not:
  if (pfile->directive_result.type == CPP_PRAGMA)
    {
...
    }
  else
    {
      count = 1;
      toks = XNEW (cpp_token);
      toks[0] = pfile->directive_result;
and from there it will copy type member of CPP_PADDING, but all the
other members from the last CPP_PRAGMA before it.
Small testcase for it with no option (at least no -fopenmp or -fopenmp-simd).
#pragma GCC push_options
#pragma GCC ignored "-Wformat"
#pragma GCC pop_options
void
foo ()
{
  _Pragma ("omp simd")
  for (int i = 0; i < 64; i++)
    ;
}
I wonder if we shouldn't replace that
      toks[0] = pfile->directive_result;
line with
      toks[0] = pfile->avoid_paste;
or even replace those
      toks = XNEW (cpp_token);
      toks[0] = pfile->directive_result;
lines with
      toks = &pfile->avoid_paste;
(dunno how about the memory management whether something frees
the tokens or not, but e.g. funlike_invocation_p certainly uses the same
_cpp_push_token_context and pushes through that quite often
&pfile->avoid_paste).

+FAIL: 20_util/specialized_algorithms/pstl/uninitialized_construct.cc (test for excess errors)
+FAIL: 20_util/specialized_algorithms/pstl/uninitialized_copy_move.cc (test for excess errors)
+FAIL: 20_util/specialized_algorithms/pstl/uninitialized_fill_destroy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_merge/inplace_merge.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_merge/merge.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/copy_if.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/copy_move.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/fill.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/generate.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/is_partitioned.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/partition.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/partition_copy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/remove.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/remove_copy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/replace.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/replace_copy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/rotate.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/swap_ranges.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/transform_binary.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/transform_unary.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/unique.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/unique_copy_equal.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/adjacent_find.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/all_of.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/any_of.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/count.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/equal.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/find.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/find_end.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/find_first_of.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/find_if.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/for_each.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/mismatch.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/none_of.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/nth_element.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/reverse.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/reverse_copy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_nonmodifying/search_n.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_sorting/includes.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_sorting/is_heap.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_sorting/is_sorted.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_sorting/lexicographical_compare.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_sorting/minmax_element.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_sorting/partial_sort.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_sorting/partial_sort_copy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_sorting/set.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_sorting/sort.cc (test for excess errors)
+FAIL: 26_numerics/pstl/numeric_ops/adjacent_difference.cc (test for excess errors)
+FAIL: 26_numerics/pstl/numeric_ops/reduce.cc (test for excess errors)
+FAIL: 26_numerics/pstl/numeric_ops/scan.cc (test for excess errors)
+FAIL: 26_numerics/pstl/numeric_ops/transform_reduce.cc (test for excess errors)
+FAIL: 26_numerics/pstl/numeric_ops/transform_scan.cc (test for excess errors)
(+ corresponding UNRESOLVED).

	Jakub
diff mbox series

Patch

--- libcpp/macro.cc.jj	2022-01-18 11:59:00.277972128 +0100
+++ libcpp/macro.cc	2022-01-31 15:44:49.208847524 +0100
@@ -1374,7 +1374,9 @@  funlike_invocation_p (cpp_reader *pfile,
       if (token->type != CPP_PADDING)
 	break;
       if (padding == NULL
-	  || (!(padding->flags & PREV_WHITE) && token->val.source == NULL))
+	  || padding->val.source == NULL
+	  || (!(padding->val.source->flags & PREV_WHITE)
+	      && token->val.source == NULL))
 	padding = token;
     }
 
--- gcc/testsuite/c-c++-common/cpp/pr104147.c.jj	2022-01-31 15:54:18.113847934 +0100
+++ gcc/testsuite/c-c++-common/cpp/pr104147.c	2022-01-31 15:54:04.530038941 +0100
@@ -0,0 +1,27 @@ 
+/* PR preprocessor/104147 */
+/* { dg-do run } */
+
+#define X(x,y) 	x y
+#define STR_(x) #x
+#define STR(x) 	STR_(x)
+const char *str =
+STR(X(Y,Y))
+#define Y()
+STR(X(Y,Y))
+#undef Y
+STR(X(Y,Y))
+#define Y()
+STR(X(Y,Y))
+STR(X(Y,
+Y))
+STR(X(Y
+,Y))
+;
+
+int
+main ()
+{
+  if (__builtin_strcmp (str, "Y YY YY YY YY YY Y") != 0)
+    __builtin_abort ();
+  return 0;
+}