From: Diego Novillo
Date: Tue, 2 Nov 2010 15:52:08 -0400
To: gcc-patches@gcc.gnu.org, Lawrence Crowl
Subject: [pph] Pre-tokenized headers, libcpp changes [1/3]
Message-ID: <20101102195205.GA15524@google.com>
X-Patchwork-Id: 69921

This
patch, mostly from Lawrence, implements all the changes needed to
support pre-tokenized headers in libcpp.

The bulk of the changes support the light symbol tables used for
validating hunks of tokens.  When we decide to cache a set of tokens
(a token hunk), the hunk contains its own symbol table with
pre-processor symbols.  These are validated as we try to apply the
hunks.  This implementation is mostly in symtab.[ch] (cpp_lt_*).

The other set of changes deals with pre-processing starting from
arbitrary offsets into the text stream (cpp_get_pos and cpp_set_pos).

There is a bug fix in here that is only really needed when using the
repositioning code.  In the libcpp reader structure, there is a field
(macro_buffer) acting as a string buffer used when
cpp_macro_definition() is called to return the string value for the
macro.  However, during macro argument processing, libcpp also uses
that field to buffer identifier values.  It then overwrites them with
other values and, when it is done, restores the original value from
the macro_buffer field.

The problem was that, during argument processing, libcpp needs to
look up identifiers.  Since we have modified the lookup process to
capture macro values as well, we call cpp_macro_definition(), which
clobbers this buffer.

The patch introduces a new field (param_buffer) in the reader
structure to do proper buffering for identifier values during macro
argument processing.


2010-11-02  Lawrence Crowl
	    Diego Novillo

	* symtab.c: Include internal.h.
	(ht_calc_hash): Rename from calc_hash.  Declare extern.
	Adjust all users.
	(cpp_lt_exchange): New.
	(cpp_lt_create): New.
	(cpp_lt_statistics): New.
	(cpp_lt_destroy): New.
	(cpp_lt_num_entries): New.
	(cpp_lt_max_length): New.
	(cpp_lt_take_strings): New.
	(cpp_lt_forall): New.
	(lt_query_macro): New.
	(lt_macro_value): New.
	(cpp_lt_capture): New.
	(cpp_lt_verify): New.
	(cpp_lt_define_syntax): New.
	(cpp_lt_replay): New.
	(LT_MASK): Define.
	(LT_FIRST): Define.
	(LT_NEXT): Define.
	(lt_resize): New.
	(lt_lookup): New.
	* macro.c (_cpp_save_parameter): Re-write to use new
	param_buffer field to avoid clobbering macro_buffer.
	* include/cpplib.h (lexer_state): Define.
	(enum include_type): Define.
	(cpp_reset_lexer_state): Define.
	(cpp_restore_lexer_state): Define.
	(struct cpp_offset): Define.
	(cpp_buffer_start): Declare.
	(cpp_buffer_end): Declare.
	(cpp_lookup_with_hash): Declare.
	(cpp_peek_sym): Declare.
	(cpp_dump_identifier): Declare.
	(cpp_debug_identifier): Declare.
	(cpp_dump_identifiers): Declare.
	(cpp_debug_identifiers): Declare.
	(cpp_push_include_type): Declare.
	(cpp_get_pos): Declare.
	(cpp_set_pos): Declare.
	(cpp_return_at_eof): Declare.
	* include/symtab.h (cpp_lookaside): Define.
	(struct cpp_ident_use): Declare.
	(struct cpp_idents_used): Declare.
	(cpp_lt_exchange): Declare.
	(cpp_lt_create): Declare.
	(cpp_lt_destroy): Declare.
	(cpp_lt_capture): Declare.
	(cpp_lt_verify): Declare.
	(cpp_lt_replay): Declare.
	(cpp_lt_num_entries): Declare.
	(cpp_lt_max_length): Declare.
	(cpp_lt_take_strings): Declare.
	(*cpp_lookback): Declare.
	(cpp_lt_forall): Declare.
	(cpp_lt_statistics): Declare.
	* files.c (cpp_buffer_start): Define.
	(cpp_buffer_end): Define.
	(_cpp_stack_include): Add argument dname.  Adjust all users.
	(cpp_push_include_type): New.
	(cpp_get_pos): New.
	(cpp_set_pos): New.
	(cpp_return_at_eof): New.
	* init.c (cpp_create_reader): Initialize field lookaside_table.
	(cpp_destroy): Free field param_buffer, if needed.
	* identifiers.c (cpp_lookup_with_hash): New.
	(cpp_lookup): Call it.
	(cpp_peek_sym): New.
	(cpp_dump_identifier): New.
	(cpp_debug_identifier): New.
	(cpp_dump_identifiers_r): New.
	(cpp_dump_identifiers): New.
	(cpp_debug_identifiers): New.
	* internal.h (revision 166136):
	(struct lae): Declare.
	(struct cpp_lookaside): Declare.
	(lt_lookup): Declare.
	(ht_calc_hash): Declare.
	(struct cpp_reader): Add fields param_buffer, param_buffer_len
	and lookaside_table.
	(_cpp_stack_include): Add new const char * argument.
	* lex.c (lex_identifier_intern): Call cpp_lookup_with_hash.
	(lex_identifier): Likewise.
	(_cpp_lex_token): Initialize result to NULL.
	(cpp_reset_lexer_state): New.
	(cpp_restore_lexer_state): New.
	* charset.c (_cpp_interpret_identifier): Call cpp_lookup.

Index: libcpp/symtab.c
===================================================================
--- libcpp/symtab.c	(revision 166136)
+++ libcpp/symtab.c	(working copy)
@@ -23,6 +23,7 @@ along with this program; see the file CO
 #include "config.h"
 #include "system.h"
 #include "symtab.h"
+#include "internal.h"
 
 /* The code below is a specialization of Vladimir Makarov's expandable
    hash tables (see libiberty/hashtab.c).  The abstraction penalty was
@@ -30,7 +31,6 @@ along with this program; see the file CO
    intrinsically how to calculate a hash value, and how to compare an
    existing entry with a potential new one.  */
 
-static unsigned int calc_hash (const unsigned char *, size_t);
 static void ht_expand (hash_table *);
 static double approx_sqrt (double);
 
@@ -39,8 +39,8 @@ static double approx_sqrt (double);
 
 /* Calculate the hash of the string STR of length LEN.  */
 
-static unsigned int
-calc_hash (const unsigned char *str, size_t len)
+unsigned int
+ht_calc_hash (const unsigned char *str, size_t len)
 {
   size_t n = len;
   unsigned int r = 0;
@@ -94,7 +94,7 @@ hashnode
 ht_lookup (hash_table *table, const unsigned char *str, size_t len,
            enum ht_lookup_option insert)
 {
-  return ht_lookup_with_hash (table, str, len, calc_hash (str, len),
+  return ht_lookup_with_hash (table, str, len, ht_calc_hash (str, len),
                               insert);
 }
 
@@ -361,3 +361,551 @@ approx_sqrt (double x)
   while (d > .0001);
   return s;
 }
+
+
+/* Lookaside Identifier Hash Table */
+
+cpp_lookaside *
+cpp_lt_exchange (cpp_reader *pfile, cpp_lookaside *desired)
+{
+  cpp_lookaside *current = pfile->lookaside_table;
+  pfile->lookaside_table = desired;
+  return current;
+}
+
+cpp_lookaside *
+cpp_lt_create (unsigned int order, unsigned int debug)
+{
+  unsigned int slots = 1 << order;
+  cpp_lookaside *table = XCNEW (cpp_lookaside);
+  table->entries = XCNEWVEC (struct lae, slots);
+  table->order = order;
+  table->active = 0;
+
+  table->max_length = 0;
+  table->strings = XCNEW (struct obstack);
+  /* Strings need no alignment.  */
+  _obstack_begin (table->strings, 0, 0,
+                  (void *(*) (long)) xmalloc,
+                  (void (*) (void *)) free);
+  obstack_alignment_mask (table->strings) = 0;
+
+  table->searches = 0;
+  table->comparisons = 0;
+  table->strcmps = 0;
+  table->collisions = 0;
+  table->misses = 0;
+  table->insertions = 0;
+  table->macrovalue = 0;
+  table->resizes = 0;
+  table->bumps = 0;
+  table->iterations = 0;
+  table->empties = 0;
+
+  table->flag_pth_debug = debug;
+
+  return table;
+}
+
+void
+cpp_lt_statistics (cpp_reader *pfile)
+{
+  struct cpp_lookaside *table = pfile->lookaside_table;
+  fprintf (stderr, "lookaside ");
+  fprintf (stderr, "order=%u, ", table->order);
+  fprintf (stderr, "active=%u, ", table->active);
+  fprintf (stderr, "search=%llu, ", table->searches);
+  fprintf (stderr, "compare=%llu, ", table->comparisons);
+  fprintf (stderr, "strcmp=%llu, ", table->strcmps);
+  fprintf (stderr, "collide=%llu, ", table->collisions);
+  fprintf (stderr, "miss=%llu, ", table->misses);
+  fprintf (stderr, "insert=%llu, ", table->insertions);
+  fprintf (stderr, "macro=%llu, ", table->macrovalue);
+  fprintf (stderr, "resize=%llu, ", table->resizes);
+  fprintf (stderr, "bump=%llu, ", table->bumps);
+  fprintf (stderr, "iterations=%llu, ", table->iterations);
+  fprintf (stderr, "empties=%llu\n", table->empties);
+  table->searches = 0;
+  table->comparisons = 0;
+  table->strcmps = 0;
+  table->collisions = 0;
+  table->misses = 0;
+  table->insertions = 0;
+  table->macrovalue = 0;
+  table->resizes = 0;
+  table->bumps = 0;
+  table->iterations = 0;
+  table->empties = 0;
+}
+
+void
+cpp_lt_destroy (cpp_lookaside *table)
+{
+  if (table->strings)
+    {
+      obstack_free (table->strings, NULL);
+      free (table->strings);
+    }
+  free (table->entries);
+  free (table);
+}
+
+unsigned int
+cpp_lt_num_entries (cpp_lookaside *table)
+{
+  return table->active;
+}
+
+unsigned int
+cpp_lt_max_length (cpp_lookaside *table)
+{
+  return table->max_length;
+}
+
+struct obstack *
+cpp_lt_take_strings (cpp_lookaside *table)
+{
+  struct obstack *strings = table->strings;
+  table->strings = NULL;
+  return strings;
+}
+
+void
+cpp_lt_forall (cpp_lookaside *table, cpp_lookback grok, void *passthru)
+{
+  unsigned int slots = 1 << table->order;
+  struct lae *entries = table->entries;
+  unsigned int index;
+  for (index = 0; index < slots ; ++index)
+    {
+      hashnode node = entries[index].node;
+      if (node)
+        grok (passthru, (const char *)node->str, node->len,
+              entries[index].value, entries[index].length);
+    }
+}
+
+/* Query a CPP_NODE for its macro value from PFILE.  */
+
+static const char *
+lt_query_macro (cpp_reader *pfile, cpp_hashnode *cpp_node)
+{
+  const char *definition = NULL;
+  if (cpp_node->flags & NODE_BUILTIN)
+    {
+      const char *str = (const char *)cpp_node->ident.str;
+      if (   strcmp(str, "__DATE__") == 0
+          || strcmp(str, "__TIME__") == 0
+          || strcmp(str, "__FILE__") == 0
+          || strcmp(str, "__LINE__") == 0)
+        definition = str;
+      else
+        {
+          static char *string = 0;
+          static unsigned int space = 0;
+          unsigned int front, back, needed;
+          const char *value;
+
+          value = (const char *)_cpp_builtin_macro_text (pfile, cpp_node);
+          front = strlen (str);
+          back = strlen (value);
+          needed = front + 1 + back + 1;
+          if (space < needed)
+            {
+              if (string != NULL)
+                free (string);
+              string = XCNEWVEC (char, needed);
+              space = needed;
+            }
+          strcpy (string, str);
+          string[front] = '=';
+          strcpy (string + front + 1, value);
+
+          definition = string;
+        }
+    }
+  else
+    definition = (const char *) cpp_macro_definition (pfile, cpp_node);
+
+  if (pfile->lookaside_table->flag_pth_debug >= 3)
+    fprintf (stderr, "PTH: macro %s is %s\n",
+             (const char *)cpp_node->ident.str,
+             definition);
+
+  return definition;
+}
+
+/* Capture the current STRING definition of a macro for the
+   libcpp NODE and store it in the look ASIDE table of the PFILE.  */
+
+static unsigned int
+lt_macro_value (const char** string, cpp_lookaside *aside,
+                cpp_reader *pfile, cpp_hashnode *cpp_node)
+{
+  const char *definition = lt_query_macro (pfile, cpp_node);
+  size_t macro_len = strlen (definition);
+  *string = (const char *) obstack_copy0 (aside->strings, definition, macro_len);
+  if (macro_len > aside->max_length)
+    aside->max_length = macro_len;
+  ++aside->macrovalue;
+  return macro_len;
+}
+
+/* Capture the identifier state in the lookaside table of PFILE
+   and then empty the lookaside table.  */
+
+cpp_idents_used
+cpp_lt_capture (cpp_reader *pfile)
+{
+  cpp_idents_used used;
+  cpp_lookaside *aside = pfile->lookaside_table;
+  unsigned int num_entries = aside->active;
+  unsigned int slots = 1 << aside->order;
+  unsigned int table_index;
+  unsigned int summary_index = 0;
+
+  used.num_entries = aside->active;
+  used.entries = XCNEWVEC (cpp_ident_use, num_entries);
+
+  for (table_index = 0; table_index < slots ; ++table_index)
+    {
+      struct lae *table_entry = aside->entries + table_index;
+      hashnode node = table_entry->node;
+      if (node)
+        {
+          cpp_ident_use *summary_entry;
+          cpp_hashnode *cpp_node;
+
+          summary_entry = used.entries + summary_index++;
+          summary_entry->ident_len = node->len;
+          summary_entry->ident_str = (const char *)node->str;
+          summary_entry->before_len = table_entry->length;
+          summary_entry->before_str = table_entry->value;
+
+          /* Capture any macro value.  */
+          cpp_node = CPP_HASHNODE (node);
+          if (cpp_node->type == NT_MACRO)
+            summary_entry->after_len = lt_macro_value
+              (&summary_entry->after_str, aside, pfile, cpp_node);
+          /* else .after_str and .after_len are still zero initialized.  */
+        }
+    }
+
+  /* Now empty out the lookaside table.  */
+  memset (aside->entries, 0, slots * sizeof (struct lae));
+  aside->active = 0;
+
+  /* Take the strings from the table and give to the summary.  */
+  used.strings = aside->strings;
+  aside->strings = NULL;
+  used.max_length = aside->max_length;
+
+  /* Create a new string table.  */
+  aside->max_length = 0;
+  aside->strings = XCNEW (struct obstack);
+  /* Strings need no alignment.  */
+  _obstack_begin (aside->strings, 0, 0,
+                  (void *(*) (long)) xmalloc,
+                  (void (*) (void *)) free);
+  obstack_alignment_mask (aside->strings) = 0;
+
+  aside->iterations += slots;
+  ++aside->empties;
+
+  return used;
+}
+
+/* Verify that the IDENTIFIERS have before states that are consistent
+   with the current identifier definitions in the READER.
+   If not, set the BAD_USE and CUR_DEF to indicate the first
+   inconsistency.  A null means 'not a macro'.  */
+
+bool
+cpp_lt_verify (cpp_reader *reader, cpp_idents_used* identifiers,
+               cpp_ident_use **bad_use, const char **cur_def)
+{
+  unsigned int i;
+  unsigned int num_entries = identifiers->num_entries;
+  cpp_ident_use *entries = identifiers->entries;
+
+  *bad_use = NULL;
+  *cur_def = NULL;
+
+  for (i = 0; i < num_entries; ++i)
+    {
+      cpp_hashnode *cpp_node;
+      cpp_ident_use *entry = entries + i;
+      const char *ident_str = entry->ident_str;
+      unsigned int ident_len = entry->ident_len;
+      const char *before_str = entry->before_str;
+      unsigned int before_len = entry->before_len;
+      cpp_node = cpp_peek_sym (reader, (const unsigned char *)ident_str,
+                               ident_len);
+      if (cpp_node == NULL)
+        {
+          /* The symbol used to exist, but it doesn't now.  */
+          if (before_str != NULL)
+            {
+              *bad_use = entry;
+              *cur_def = NULL;
+              goto fail;
+            }
+        }
+      else if (before_len == -1U)
+        {
+          /* It was not saved as a macro.  */
+          if (cpp_node->type == NT_MACRO)
+            {
+              /* But it is a macro now!  */
+              *bad_use = entry;
+              *cur_def = (const char*) lt_query_macro (reader, cpp_node);
+              goto fail;
+            }
+          /* Otherwise, both agree it is not a macro.  */
+        }
+      else
+        {
+          /* It was saved as a macro.  */
+          const char *definition;
+
+          if (cpp_node->type != NT_MACRO)
+            {
+              /* But it is not a macro now!  */
+              *bad_use = entry;
+              *cur_def = NULL;
+              goto fail;
+            }
+          /* Otherwise, both agree it is a macro.  */
+          definition = lt_query_macro (reader, cpp_node);
+          /* strlen is required to avoid the prefix problem.  */
+          if (definition == NULL
+              || before_len != strlen (definition)
+              || memcmp (definition, before_str, before_len) != 0)
+            {
+              /* They do not have the same value.  */
+              *bad_use = entry;
+              *cur_def = definition;
+              goto fail;
+            }
+        }
+    }
+/* pass: */
+  *bad_use = NULL;
+  *cur_def = NULL;
+  return true;
+
+fail:
+  return false;
+}
+
+/* Produce the macro definition syntax NEEDED by cpp_define from
+   the syntax GIVEN by cpp_macro_definition.  */
+
+static void
+cpp_lt_define_syntax (char *needed, const char *given)
+{
+  char c;
+
+  c = *given++;
+
+  /* Copy over macro identifier.  */
+  while (   ('0' <= c && c <= '9')
+         || ('A' <= c && c <= 'Z')
+         || ('a' <= c && c <= 'z')
+         || (c == '_'))
+    {
+      *needed++ = c;
+      c = *given++;
+    }
+
+  if (c == '(')
+    {
+      /* Copy over parameter list.  */
+      while (c != ')')
+        {
+          *needed++ = c;
+          c = *given++;
+        }
+
+      /* Copy over trailing parenthesis.  */
+      *needed++ = c;
+      c = *given++;
+    }
+
+  /* Replace definition space by assignment.  */
+  /* (c == ' ') */
+  *needed++ = '=';
+  c = *given++;
+
+  /* Copy over macro body.  */
+  while (c != '\0')
+    {
+      *needed++ = c;
+      c = *given++;
+    }
+
+  *needed++ = '\0';
+}
+
+/* Replay the macro definitions captured by the table of IDENTIFIERS
+   into the READER state.  */
+
+void
+cpp_lt_replay (cpp_reader *reader, cpp_idents_used* identifiers)
+{
+  unsigned int i;
+  unsigned int num_entries = identifiers->num_entries;
+  cpp_ident_use *entries = identifiers->entries;
+  char *buffer = XCNEWVEC (char, identifiers->max_length + 1);
+
+  /* Prevent the lexer from invalidating the tokens we've read so far.  */
+  reader->keep_tokens++;
+
+  for (i = 0; i < num_entries; ++i)
+    {
+      cpp_ident_use *entry = entries + i;
+      const char *ident_str = entry->ident_str;
+      const char *before_str = entry->before_str;
+      const char *after_str = entry->after_str;
+      if (before_str == NULL)
+        {
+          if (after_str != NULL)
+            {
+              cpp_lt_define_syntax (buffer, after_str);
+              cpp_define (reader, buffer);
+            }
+          /* else consistently not macros */
+        }
+      else
+        {
+          if (after_str == NULL)
+            {
+              cpp_undef (reader, ident_str);
+            }
+          else if (strcmp (before_str, after_str) != 0)
+            {
+              cpp_undef (reader, ident_str);
+              cpp_lt_define_syntax (buffer, after_str);
+              cpp_define (reader, buffer);
+            }
+          /* else macro with the same definition */
+        }
+    }
+
+  reader->keep_tokens--;
+
+  free (buffer);
+}
+
+/* Mappings from hash to index.  */
+#define LT_MASK(order) (~(~0 << (order)))
+#define LT_FIRST(hash, order, mask) (((hash) ^ ((hash) >> (order))) & (mask))
+#define LT_NEXT(index, mask) (((index) + 1) & (mask))
+/* Linear probing.  */
+
+static void
+lt_resize (cpp_lookaside *aside, unsigned int old_order, unsigned int new_order)
+{
+  unsigned int old_index;
+  unsigned int old_slots = 1 << old_order;
+  unsigned int new_slots = 1 << new_order;
+  unsigned int new_mask = LT_MASK (new_order);
+  struct lae *old_entries = aside->entries;
+  struct lae *new_entries = XCNEWVEC (struct lae, new_slots);
+  for ( old_index = 0; old_index < old_slots; ++old_index )
+    {
+      hashnode node = old_entries[old_index].node;
+      if (node)
+        {
+          unsigned int hash = old_entries[old_index].hash;
+          unsigned int new_index = LT_FIRST (hash, new_order, new_mask);
+          hashnode probe = new_entries[new_index].node;
+          while (probe)
+            {
+              new_index = LT_NEXT (new_index, new_mask);
+              probe = new_entries[new_index].node;
+              ++aside->bumps;
+            }
+          new_entries[new_index].node = node;
+          new_entries[new_index].hash = hash;
+          new_entries[new_index].length = old_entries[old_index].length;
+          new_entries[new_index].value = old_entries[old_index].value;
+        }
+    }
+  free (old_entries);
+  aside->entries = new_entries;
+  aside->order = new_order;
+  ++aside->resizes;
+}
+
+cpp_hashnode *
+lt_lookup (cpp_reader *pfile,
+           const unsigned char *identifier,
+           size_t length,
+           unsigned int hash)
+{
+  cpp_lookaside *aside = pfile->lookaside_table;
+  /* Compress the hash to an index.
+     Assume there is sufficient entropy in the lowest 2*order bits.  */
+  unsigned int order = aside->order;
+  unsigned int mask = LT_MASK (order);
+  unsigned int index = LT_FIRST (hash, order, mask);
+  cpp_hashnode *cpp_node;
+
+  /* Search the lookaside table.  */
+  struct lae *entries = aside->entries;
+  hashnode node = entries[index].node;
+  ++aside->searches;
+
+  /* Hashes have no sentinel value, so an entry is empty iff there is
+     a null node value.  */
+  while (node)
+    {
+      if (entries[index].hash == hash)
+        {
+          ++aside->comparisons;
+          if (node->len == length)
+            {
+              ++aside->strcmps;
+              if (memcmp (node->str, identifier, length) == 0)
+                return CPP_HASHNODE (node);
+            }
+        }
+
+      ++aside->collisions;
+      index = LT_NEXT (index, mask);
+      node = entries[index].node;
+    }
+
+  ++aside->misses;
+
+  node = ht_lookup_with_hash
+           (pfile->hash_table, identifier, length, hash, HT_ALLOC);
+  cpp_node = CPP_HASHNODE(node);
+
+  /* Do not save macro parameter names; they don't affect verification.  */
+  if (cpp_node->flags & NODE_MACRO_ARG)
+    return cpp_node;
+
+  ++aside->insertions;
+
+  /* Fill out new entry.  */
+  ++aside->active;
+  entries[index].node = node;
+  entries[index].hash = hash;
+  if (length > aside->max_length)
+    aside->max_length = length;
+
+  /* Capture any macro value.  */
+  if (cpp_node->type == NT_MACRO)
+    entries[index].length = lt_macro_value
+      (&entries[index].value, aside, pfile, cpp_node);
+  /* else .value and .length are still zero from initialization.  */
+
+  /* Check table load factor.  */
+  if (aside->active >= (unsigned)(1 << (order - 1)))
+    /* Table is at least half full; double it.  */
+    lt_resize (aside, order, order + 1);
+
+  return cpp_node;
+}
Index: libcpp/macro.c
===================================================================
--- libcpp/macro.c	(revision 166136)
+++ libcpp/macro.c	(working copy)
@@ -1515,15 +1515,14 @@ _cpp_save_parameter (cpp_reader *pfile, 
 
   ((cpp_hashnode **) BUFF_FRONT (pfile->a_buff))[macro->paramc++] = node;
   node->flags |= NODE_MACRO_ARG;
-  len = macro->paramc * sizeof (union _cpp_hashnode_value);
-  if (len > pfile->macro_buffer_len)
+  len = macro->paramc;
+  if (len > pfile->param_buffer_len)
     {
-      pfile->macro_buffer = XRESIZEVEC (unsigned char, pfile->macro_buffer,
-                                        len);
-      pfile->macro_buffer_len = len;
+      pfile->param_buffer = XRESIZEVEC (union _cpp_hashnode_value,
+                                        pfile->param_buffer, len);
+      pfile->param_buffer_len = len;
     }
-  ((union _cpp_hashnode_value *) pfile->macro_buffer)[macro->paramc - 1]
-    = node->value;
+  pfile->param_buffer[macro->paramc - 1] = node->value;
 
   node->value.arg_index = macro->paramc;
   return false;
@@ -1888,7 +1887,7 @@ _cpp_create_definition (cpp_reader *pfil
     {
       struct cpp_hashnode *node = macro->params[i];
       node->flags &= ~ NODE_MACRO_ARG;
-      node->value = ((union _cpp_hashnode_value *) pfile->macro_buffer)[i];
+      node->value = pfile->param_buffer[i];
     }
 
   if (!ok)
Index: libcpp/directives.c
===================================================================
--- libcpp/directives.c	(revision 166136)
+++ libcpp/directives.c	(working copy)
@@ -788,7 +788,7 @@ do_include_common (cpp_reader *pfile, en
              pfile->directive->name, fname, angle_brackets,
              buf);
 
-      _cpp_stack_include (pfile, fname, angle_brackets, type);
+      _cpp_stack_include (pfile, NULL, fname, angle_brackets, type);
     }
 
   XDELETEVEC (fname);
Index: libcpp/include/cpplib.h
===================================================================
--- libcpp/include/cpplib.h	(revision 166136)
+++ libcpp/include/cpplib.h	(working copy)
@@ -37,6 +37,7 @@ typedef struct cpp_hashnode cpp_hashnode
 typedef struct cpp_macro cpp_macro;
 typedef struct cpp_callbacks cpp_callbacks;
 typedef struct cpp_dir cpp_dir;
+typedef struct lexer_state lexer_state;
 
 struct answer;
 struct _cpp_file;
@@ -607,6 +608,9 @@ enum cpp_builtin_type
   BT_LAST_USER = BT_FIRST_USER + 31
 };
 
+/* #include types.  */
+enum include_type {IT_INCLUDE, IT_INCLUDE_NEXT, IT_IMPORT, IT_CMDLINE};
+
 #define CPP_HASHNODE(HNODE) ((cpp_hashnode *) (HNODE))
 #define HT_NODE(NODE) ((ht_identifier *) (NODE))
 #define NODE_LEN(NODE) HT_LEN (&(NODE)->ident)
@@ -915,6 +919,8 @@ extern const char *cpp_type2name (enum c
    string literal.  Handles all relevant diagnostics.  */
 extern cppchar_t cpp_parse_escape (cpp_reader *, const unsigned char ** pstr,
                                    const unsigned char *limit, int wide);
+extern lexer_state *cpp_reset_lexer_state (cpp_reader *);
+extern void cpp_restore_lexer_state (cpp_reader *, lexer_state *);
 
 /* Structure used to hold a comment block at a given location in the
    source code.  */
@@ -942,6 +948,27 @@ typedef struct
   int allocated;
 } cpp_comment_table;
 
+/* Structure describing an offset into a cpp_buffer.  */
+
+typedef struct GTY(()) cpp_offset
+{
+  /* Distance, in bytes, from the start of the buffer to the current
+     character location.  */
+  size_t cur;
+
+  /* Distance, in bytes, from the start of the buffer to the start of
+     the current physical line.  */
+  size_t line_base;
+
+  /* Distance, in bytes, from the start of the buffer to the start of
+     the next logical line (the start of the to-be-cleaned line).  */
+  size_t next_line;
+} cpp_offset;
+
+/* Constants for cpp_get_pos and cpp_set_pos.  */
+extern const cpp_offset cpp_buffer_start;
+extern const cpp_offset cpp_buffer_end;
+
 /* Returns the table of comments encountered by the preprocessor. This
    table is only populated when pfile->state.save_comments is true. */
 extern cpp_comment_table *cpp_get_comments (cpp_reader *);
@@ -952,9 +979,18 @@ extern cpp_comment_table *cpp_get_commen
    table if it is not already there.  */
 extern cpp_hashnode *cpp_lookup (cpp_reader *, const unsigned char *,
                                  unsigned int);
+extern cpp_hashnode *cpp_lookup_with_hash
+  (cpp_reader *, const unsigned char *, unsigned int, unsigned int);
+extern cpp_hashnode *cpp_peek_sym (cpp_reader *, const unsigned char *,
+                                   unsigned int);
 
+/* In identifiers.c */
 typedef int (*cpp_cb) (cpp_reader *, cpp_hashnode *, void *);
 extern void cpp_forall_identifiers (cpp_reader *, cpp_cb, void *);
+extern void cpp_dump_identifier (cpp_reader *, FILE *, cpp_hashnode *);
+extern void cpp_debug_identifier (cpp_reader *, cpp_hashnode *);
+extern void cpp_dump_identifiers (cpp_reader *, FILE *);
+extern void cpp_debug_identifiers (cpp_reader *);
 
 /* In macro.c */
 extern void cpp_scan_nooutput (cpp_reader *);
@@ -967,6 +1003,8 @@ extern bool cpp_included (cpp_reader *, 
 extern bool cpp_included_before (cpp_reader *, const char *, source_location);
 extern void cpp_make_system_header (cpp_reader *, int, int);
 extern bool cpp_push_include (cpp_reader *, const char *);
+extern bool cpp_push_include_type (cpp_reader *, const char *, const char *,
+                                   bool, enum include_type);
 extern void cpp_change_file (cpp_reader *, enum lc_reason, const char *);
 extern const char *cpp_get_path (struct _cpp_file *);
 extern cpp_dir *cpp_get_dir (struct _cpp_file *);
@@ -974,6 +1012,9 @@ extern cpp_buffer *cpp_get_buffer (cpp_r
 extern struct _cpp_file *cpp_get_file (cpp_buffer *);
 extern cpp_buffer *cpp_get_prev (cpp_buffer *);
 extern void cpp_clear_file_cache (cpp_reader *);
+extern cpp_offset cpp_get_pos (cpp_buffer *);
+extern void cpp_set_pos (cpp_buffer *, cpp_offset);
+extern void cpp_return_at_eof (cpp_buffer *, bool);
 
 /* In pch.c */
 struct save_macro_data;
Index: libcpp/include/symtab.h
===================================================================
--- libcpp/include/symtab.h	(revision 166136)
+++ libcpp/include/symtab.h	(working copy)
@@ -101,4 +101,87 @@ extern void ht_load (hash_table *ht, has
 /* Dump allocation statistics to stderr.  */
 extern void ht_dump_statistics (hash_table *);
 
+
+/* A lookaside identifier table for subsets of the token stream.  */
+
+typedef struct cpp_lookaside cpp_lookaside;
+
+/* A summary of the identifier uses captured by the lookaside table.  */
+
+typedef struct GTY(()) cpp_ident_use
+{
+  unsigned int ident_len;
+  const char *ident_str;
+  unsigned int before_len;
+  const char *before_str;
+  unsigned int after_len;
+  const char *after_str;
+} cpp_ident_use;
+
+typedef struct GTY(()) cpp_idents_used
+{
+  unsigned int max_length;
+  unsigned int num_entries;
+  cpp_ident_use *entries;
+  struct obstack * GTY((skip)) strings;
+} cpp_idents_used;
+
+/* Exchange the reader's current lookaside table with a new table.
+   To deactivate the lookaside table, set it to NULL.
+   The current table is the return value.
+   Clients are responsible for creating and destroying the tables.  */
+cpp_lookaside *
+cpp_lt_exchange (struct cpp_reader *pfile, cpp_lookaside *desired);
+
+/* Create the lookaside table.  */
+cpp_lookaside *
+cpp_lt_create (unsigned int order, unsigned int debug);
+
+/* Frees all memory associated with a lookaside table.  */
+void
+cpp_lt_destroy (cpp_lookaside *table);
+
+/* Captures the current state of the lookaside table,
+   together with macro definition state before and after the table,
+   and then empties the table.  */
+cpp_idents_used
+cpp_lt_capture (struct cpp_reader *pfile);
+
+/* Verifies that the previously captured identifiers
+   are consistent with the current state of the reader.
+   If not, set the bad_use and cur_def to indicate the first
+   inconsistency.  A null means 'not a macro'.  */
+bool
+cpp_lt_verify (struct cpp_reader *reader, cpp_idents_used* identifiers,
+               cpp_ident_use **bad_use, const char **cur_def);
+
+/* Replay the macro definitions captured by the table of identifiers used
+   into the reader state.  */
+void
+cpp_lt_replay (struct cpp_reader *reader, cpp_idents_used* identifiers);
+
+/* Query the number of entries in the lookaside table.  */
+unsigned int
+cpp_lt_num_entries (cpp_lookaside *table);
+
+/* Query the maximum string length in the lookaside table.  */
+unsigned int
+cpp_lt_max_length (cpp_lookaside *table);
+
+/* Take ownership of the obstack holding strings in the lookaside table.  */
+struct obstack *
+cpp_lt_take_strings (cpp_lookaside *table);
+
+/* Visit all the entries in the lookaside table.  */
+typedef void (*cpp_lookback) (void *passthru,
+                              const char *ident_str, unsigned int ident_len,
+                              const char *macro_str, unsigned int macro_len);
+void
+cpp_lt_forall (cpp_lookaside *table, cpp_lookback grok, void *passthru);
+
+/* Dump the lookaside table statistics to stderr.  */
+void
+cpp_lt_statistics (struct cpp_reader *pfile);
+
+
 #endif /* LIBCPP_SYMTAB_H */
Index: libcpp/files.c
===================================================================
--- libcpp/files.c	(revision 166136)
+++ libcpp/files.c	(working copy)
@@ -32,6 +32,9 @@ along with this program; see the file CO
 #include "md5.h"
 #include
 
+const cpp_offset cpp_buffer_start = {0, 0, 0};
+const cpp_offset cpp_buffer_end = {-1, -1, -1};
+
 /* Variable length record files on VMS will have a stat size that includes
    record control characters that won't be included in the read size.  */
 #ifdef VMS
@@ -174,8 +177,9 @@ static bool find_file_in_dir (cpp_reader
 static bool read_file_guts (cpp_reader *pfile, _cpp_file *file);
 static bool read_file (cpp_reader *pfile, _cpp_file *file);
 static bool should_stack_file (cpp_reader *, _cpp_file *file, bool import);
-static struct cpp_dir *search_path_head (cpp_reader *, const char *fname,
-                                         int angle_brackets, enum include_type);
+static struct cpp_dir *search_path_head (cpp_reader *, const char *dname,
+                                         const char *fname, int angle_brackets,
+                                         enum include_type);
 static const char *dir_name_of_file (_cpp_file *file);
 static void open_file_failed (cpp_reader *pfile, _cpp_file *file, int);
 static struct file_hash_entry *search_cache (struct file_hash_entry *head,
@@ -843,11 +847,12 @@ _cpp_mark_file_once_only (cpp_reader *pf
 }
 
 /* Return the directory from which searching for FNAME should start,
-   considering the directive TYPE and ANGLE_BRACKETS.  If there is
-   nothing left in the path, returns NULL.  */
+   considering the directive TYPE and ANGLE_BRACKETS.  If ANGLE_BRACKETS
+   is 0, TYPE is IT_INCLUDE and DNAME is given, it returns a directory
+   entry for DNAME.  If there is nothing left in the path, returns NULL.  */
 static struct cpp_dir *
-search_path_head (cpp_reader *pfile, const char *fname, int angle_brackets,
-                  enum include_type type)
+search_path_head (cpp_reader *pfile, const char *dname, const char *fname,
+                  int angle_brackets, enum include_type type)
 {
   cpp_dir *dir;
   _cpp_file *file;
@@ -873,7 +878,7 @@ search_path_head (cpp_reader *pfile, con
   else if (pfile->quote_ignores_source_dir)
     dir = pfile->quote_include;
   else
-    return make_cpp_dir (pfile, dir_name_of_file (file),
+    return make_cpp_dir (pfile, (dname) ? dname : dir_name_of_file (file),
                          pfile->buffer ? pfile->buffer->sysp : 0);
 
   if (dir == NULL)
@@ -906,13 +911,13 @@ dir_name_of_file (_cpp_file *file)
    including HEADER, and the command line -imacros and -include.
    Returns true if a buffer was stacked.  */
 bool
-_cpp_stack_include (cpp_reader *pfile, const char *fname, int angle_brackets,
-                    enum include_type type)
+_cpp_stack_include (cpp_reader *pfile, const char *dname, const char *fname,
+                    int angle_brackets, enum include_type type)
 {
   struct cpp_dir *dir;
   _cpp_file *file;
 
-  dir = search_path_head (pfile, fname, angle_brackets, type);
+  dir = search_path_head (pfile, dname, fname, angle_brackets, type);
   if (!dir)
     return false;
 
@@ -1325,7 +1330,7 @@ _cpp_compare_file_date (cpp_reader *pfil
   _cpp_file *file;
   struct cpp_dir *dir;
 
-  dir = search_path_head (pfile, fname, angle_brackets, IT_INCLUDE);
+  dir = search_path_head (pfile, NULL, fname, angle_brackets, IT_INCLUDE);
   if (!dir)
     return -1;
 
@@ -1347,7 +1352,20 @@ _cpp_compare_file_date (cpp_reader *pfil
 bool
 cpp_push_include (cpp_reader *pfile, const char *fname)
 {
-  return _cpp_stack_include (pfile, fname, false, IT_CMDLINE);
+  return _cpp_stack_include (pfile, NULL, fname, false, IT_CMDLINE);
+}
+
+/* Pushes the given file onto the buffer stack.  Returns true if
+   successful.  This is similar to cpp_push_include, but it also
+   allows specifying whether a #include, #include_next or #import
+   should be used.  If ANGLE_BRACKETS is true, it searches for
+   the file in the system include path.  */
+bool
+cpp_push_include_type (cpp_reader *pfile, const char *dname,
+                       const char *fname, bool angle_brackets,
+                       enum include_type itype)
+{
+  return _cpp_stack_include (pfile, dname, fname, angle_brackets, itype);
 }
 
 /* Do appropriate cleanup when a file INC's buffer is popped off the
@@ -1825,3 +1843,53 @@ check_file_against_entries (cpp_reader *
   return bsearch (&d, pchf->entries, pchf->count, sizeof (struct pchf_entry),
                   pchf_compare) != NULL;
 }
+
+
+/* Return the current position (in bytes) into BUFFER where the next
+   token will be read from.  */
+
+cpp_offset
+cpp_get_pos (cpp_buffer *buffer)
+{
+  cpp_offset pos;
+
+  pos.cur = (buffer->cur) ? buffer->cur - buffer->buf : 0;
+  pos.line_base = (buffer->line_base) ? buffer->line_base - buffer->buf : 0;
+  pos.next_line = (buffer->next_line) ? buffer->next_line - buffer->buf : 0;
+
+  return pos;
+}
+
+
+/* Set the current position (in bytes) into BUFFER where the next
+   token should be read from.  */
+
+void
+cpp_set_pos (cpp_buffer *buffer, cpp_offset pos)
+{
+  if (pos.cur == cpp_buffer_end.cur)
+    buffer->cur = buffer->line_base = buffer->next_line = buffer->rlimit;
+  else if (pos.cur == cpp_buffer_start.cur)
+    {
+      buffer->cur = buffer->line_base = NULL;
+      buffer->next_line = buffer->buf;
+    }
+  else
+    {
+      buffer->cur = buffer->buf + pos.cur;
+      buffer->line_base = buffer->buf + pos.line_base;
+      buffer->next_line = buffer->buf + pos.next_line;
+    }
+}
+
+
+/* Set the return-at-eof marker for BUFFER to VAL.  If VAL is true, a
+   CPP_EOF token will be returned when the reader finds the end of
+   file.  Otherwise, the reader will transparently continue reading the
+   including file.  */
+
+void
+cpp_return_at_eof (cpp_buffer *buffer, bool val)
+{
+  buffer->return_at_eof = val;
+}
Index: libcpp/init.c
===================================================================
--- libcpp/init.c	(revision 166136)
+++ libcpp/init.c	(working copy)
@@ -231,10 +231,12 @@ cpp_create_reader (enum c_lang lang, has
   _cpp_init_files (pfile);
 
   _cpp_init_hashtable (pfile, table);
+  pfile->lookaside_table = NULL;
 
   return pfile;
 }
 
+
 /* Set the line_table entry in PFILE.  This is called after reading a
    PCH file, as the old line_table will be incorrect.
*/ void @@ -268,6 +270,13 @@ cpp_destroy (cpp_reader *pfile) pfile->macro_buffer_len = 0; } + if (pfile->param_buffer) + { + free (pfile->param_buffer); + pfile->param_buffer = NULL; + pfile->param_buffer_len = 0; + } + if (pfile->deps) deps_free (pfile->deps); obstack_free (&pfile->buffer_ob, 0); Index: libcpp/identifiers.c =================================================================== --- libcpp/identifiers.c (revision 166136) +++ libcpp/identifiers.c (working copy) @@ -86,22 +86,48 @@ _cpp_destroy_hashtable (cpp_reader *pfil } } +/* Returns the hash entry for the STR of length LEN with hash HASH, + creating one if necessary. The return is not NULL. */ +cpp_hashnode * +cpp_lookup_with_hash (cpp_reader *pfile, + const unsigned char *str, unsigned int len, + unsigned int hash) +{ + cpp_hashnode *n; + + if (pfile->lookaside_table) + n = lt_lookup (pfile, str, len, hash); + else + n = CPP_HASHNODE (ht_lookup_with_hash (pfile->hash_table, str, len, + hash, HT_ALLOC)); + + return n; +} + /* Returns the hash entry for the STR of length LEN, creating one - if necessary. */ + if necessary. The return is not NULL. */ cpp_hashnode * cpp_lookup (cpp_reader *pfile, const unsigned char *str, unsigned int len) { - /* ht_lookup cannot return NULL. */ - return CPP_HASHNODE (ht_lookup (pfile->hash_table, str, len, HT_ALLOC)); + unsigned int hash = ht_calc_hash (str, len); + return cpp_lookup_with_hash (pfile, str, len, hash); +} + +/* Returns the hash entry for STR of length LEN from PFILE's symbol + table. If no entry exists, it returns NULL. */ +cpp_hashnode * +cpp_peek_sym (cpp_reader *pfile, const unsigned char *str, unsigned int len) +{ + cpp_hashnode *node; + node = CPP_HASHNODE (ht_lookup (pfile->hash_table, str, len, HT_NO_INSERT)); + return node; } /* Determine whether the str STR, of length LEN, is a defined macro. 
*/ int cpp_defined (cpp_reader *pfile, const unsigned char *str, int len) { - cpp_hashnode *node; - - node = CPP_HASHNODE (ht_lookup (pfile->hash_table, str, len, HT_NO_INSERT)); + cpp_hashnode *node = cpp_peek_sym (pfile, str, len); /* If it's of type NT_MACRO, it cannot be poisoned. */ return node && node->type == NT_MACRO; @@ -119,3 +145,56 @@ cpp_forall_identifiers (cpp_reader *pfil { ht_forall (pfile->hash_table, (ht_cb) cb, v); } + +/* Dump a single identifier in PFILE to FILE. */ +void +cpp_dump_identifier (cpp_reader *pfile, FILE *file, cpp_hashnode *node) +{ + const unsigned char *name; + unsigned int len; + + name = NODE_NAME (node); + len = NODE_LEN (node); + + fprintf (file, "%.*s ", len, name); + if (node->is_directive) + fprintf (file, " [directive]"); + + if (node->type == NT_MACRO) + fprintf (file, " = %s", cpp_macro_definition (pfile, node)); + + fprintf (file, "\n"); +} + + +/* Dump a single identifier in PFILE to stderr. */ +void +cpp_debug_identifier (cpp_reader *pfile, cpp_hashnode *node) +{ + cpp_dump_identifier (pfile, stderr, node); +} + + +/* Callback for cpp_dump_identifiers. */ +static int +cpp_dump_identifiers_r (cpp_reader *pfile, cpp_hashnode *node, void *data) +{ + cpp_dump_identifier (pfile, (FILE *) data, node); + return 1; +} + + +/* Dump all identifiers in PFILE to FILE. */ +void +cpp_dump_identifiers (cpp_reader *pfile, FILE *file) +{ + cpp_forall_identifiers (pfile, cpp_dump_identifiers_r, file); +} + + +/* Dump all identifiers in PFILE to stderr. */ +void +cpp_debug_identifiers (cpp_reader *pfile) +{ + cpp_dump_identifiers (pfile, stderr); +} Index: libcpp/internal.h =================================================================== --- libcpp/internal.h (revision 166136) +++ libcpp/internal.h (working copy) @@ -115,9 +115,6 @@ extern unsigned char *_cpp_unaligned_all #define BUFF_FRONT(BUFF) ((BUFF)->cur) #define BUFF_LIMIT(BUFF) ((BUFF)->limit) -/* #include types. 
*/ -enum include_type {IT_INCLUDE, IT_INCLUDE_NEXT, IT_IMPORT, IT_CMDLINE}; - union utoken { const cpp_token *token; @@ -326,6 +323,57 @@ struct def_pragma_macro { unsigned int is_undef : 1; }; + +/* A lookaside identifier table for subsets of the token stream. */ + +/* The lookaside entry. */ +struct lae { + hashnode node; /* The entry in hash_table. */ + unsigned int hash; /* Hash value. */ + unsigned int length; /* Macro value length. */ + const char *value; /* Macro value string. */ +}; + +/* The lookaside table. */ +struct cpp_lookaside { + struct lae *entries; /* The entry storage. */ + unsigned int order; /* 2^order slots in the entries array. */ + unsigned int active; /* Number of active entries. */ + struct obstack *strings; /* For macro value storage. */ + unsigned int max_length; /* Largest string encountered. */ + + /* Table usage statistics. */ + unsigned long long searches; /* Number of calls to lt_lookup. */ + unsigned long long comparisons; /* Key comparisons. */ + unsigned long long strcmps; /* Key comparisons using strcmp. */ + unsigned long long collisions; /* Found unwanted hash or key. */ + unsigned long long misses; /* Searches not found in table. */ + unsigned long long insertions; /* Number insertions in table. */ + unsigned long long macrovalue; /* Number of macro values computed. */ + unsigned long long resizes; /* Had to resize (grow) the table. */ + unsigned long long bumps; /* Collisions in the resize process. */ + unsigned long long iterations; /* Cells iterated over table. */ + unsigned long long empties; /* Number of table empty/capture. */ + + /* Table debugging. */ + unsigned int flag_pth_debug; +}; + +/* Lookup an identifer in the lookaside table, + and failing that, lookup in the main hash table. + The return will always be non-null. 
*/ +cpp_hashnode * +lt_lookup (cpp_reader *pfile, + const unsigned char *identifier, + size_t length, + unsigned int hash); + +/* The hash parameter is obtained with the following function, + or with the HT_... macros in include/symtab.h. */ +unsigned int +ht_calc_hash (const unsigned char *str, size_t len); + + /* A cpp_reader encapsulates the "state" of a pre-processor run. Applying cpp_get_token repeatedly yields a stream of pre-processor tokens. Usually, there is only one cpp_reader object active. */ @@ -413,6 +461,10 @@ struct cpp_reader unsigned char *macro_buffer; unsigned int macro_buffer_len; + /* Buffer to save parameter values during macro parameter processing. */ + union _cpp_hashnode_value *param_buffer; + unsigned int param_buffer_len; + /* Descriptor for converting from the source character set to the execution character set. */ struct cset_converter narrow_cset_desc; @@ -461,6 +513,7 @@ struct cpp_reader /* Identifier hash table. */ struct ht *hash_table; + cpp_lookaside *lookaside_table; /* Expression parser stack. 
*/ struct op *op_stack, *op_limit; @@ -574,7 +627,7 @@ extern bool _cpp_find_failed (_cpp_file extern void _cpp_mark_file_once_only (cpp_reader *, struct _cpp_file *); extern void _cpp_fake_include (cpp_reader *, const char *); extern bool _cpp_stack_file (cpp_reader *, _cpp_file*, bool); -extern bool _cpp_stack_include (cpp_reader *, const char *, int, +extern bool _cpp_stack_include (cpp_reader *, const char *, const char *, int, enum include_type); extern int _cpp_compare_file_date (cpp_reader *, const char *, int); extern void _cpp_report_missing_guards (cpp_reader *); Index: libcpp/lex.c =================================================================== --- libcpp/lex.c (revision 166136) +++ libcpp/lex.c (working copy) @@ -1075,8 +1075,7 @@ lex_identifier_intern (cpp_reader *pfile } len = cur - base; hash = HT_HASHFINISH (hash, len); - result = CPP_HASHNODE (ht_lookup_with_hash (pfile->hash_table, - base, len, hash, HT_ALLOC)); + result = cpp_lookup_with_hash (pfile, base, len, hash); /* Rarely, identifiers require diagnostics when lexed. */ if (__builtin_expect ((result->flags & NODE_DIAGNOSTIC) @@ -1151,8 +1150,7 @@ lex_identifier (cpp_reader *pfile, const len = cur - base; hash = HT_HASHFINISH (hash, len); - result = CPP_HASHNODE (ht_lookup_with_hash (pfile->hash_table, - base, len, hash, HT_ALLOC)); + result = cpp_lookup_with_hash (pfile, base, len, hash); } /* Rarely, identifiers require diagnostics when lexed. */ @@ -1795,6 +1793,7 @@ _cpp_lex_token (cpp_reader *pfile) { cpp_token *result; + result = NULL; for (;;) { if (pfile->cur_token == pfile->cur_run->limit) @@ -2836,3 +2835,27 @@ cpp_token_val_index (cpp_token *tok) return CPP_TOKEN_FLD_NONE; } } + +/* Reset the lexer state in PFILE and return its previous setting. 
*/ + +lexer_state * +cpp_reset_lexer_state (cpp_reader *pfile) +{ + lexer_state *s; + + s = (lexer_state *) xmalloc (sizeof (lexer_state)); + memcpy (s, &pfile->state, sizeof (pfile->state)); + memset (&pfile->state, 0, sizeof (pfile->state)); + + return s; +} + + +/* Restore the lexer state in PFILE to S. */ + +void +cpp_restore_lexer_state (cpp_reader *pfile, lexer_state *s) +{ + memcpy (&pfile->state, s, sizeof (pfile->state)); + free (s); +} Index: libcpp/charset.c =================================================================== --- libcpp/charset.c (revision 166136) +++ libcpp/charset.c (working copy) @@ -1676,8 +1676,7 @@ _cpp_interpret_identifier (cpp_reader *p } } - return CPP_HASHNODE (ht_lookup (pfile->hash_table, - buf, bufp - buf, HT_ALLOC)); + return cpp_lookup (pfile, buf, bufp - buf); } /* Convert an input buffer (containing the complete contents of one