Message ID | 20230809221414.2849878-5-lhyatt@gmail.com |
---|---|
State | New |
Headers | show |
Series | diagnostics: libcpp: Overhaul locations for _Pragma tokens | expand |
On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote: > This patch enhances location_get_source_line(), which is the primary > interface provided by the diagnostics infrastructure to obtain the line of > source code corresponding to a given location, so that it understands > generated data locations in addition to normal file-based locations. This > involves changing the argument to location_get_source_line() from a plain > file name, to a source_id object that can represent either type of location. > > gcc/ChangeLog: > > * input.cc (class data_cache_slot): New class. > (file_cache::lookup_data): New function. > (diagnostics_file_cache_forcibly_evict_data): New function. > (file_cache::forcibly_evict_data): New function. > (file_cache::evicted_cache_tab_entry): Generalize (via a template) > to work for both file_cache_slot and data_cache_slot. > (file_cache::add_file): Adapt for new interface to > evicted_cache_tab_entry. > (file_cache::add_data): New function. > (data_cache_slot::create): New function. > (file_cache::file_cache): Support the new m_data_slots member. > (file_cache::~file_cache): Likewise. > (file_cache::lookup_or_add_data): New function. > (file_cache::lookup_or_add): New function that calls either > lookup_or_add_data or lookup_or_add_file as appropriate. > (location_get_source_line): Change the FILE_PATH argument to a > source_id SRC, and use it to support obtaining source lines from > generated data as well as from files. > (location_compute_display_column): Support generated data using the > new features of location_get_source_line. > (dump_location_info): Likewise. > * input.h (location_get_source_line): Adjust prototype. Add a new > convenience overload taking an expanded_location. > (class cache_data_source): Declare. > (class data_cache_slot): Declare. > (class file_cache): Declare new members. > (diagnostics_file_cache_forcibly_evict_data): Declare. > --- > gcc/input.cc | 171 ++++++++++++++++++++++++++++++++++++++++----------- > gcc/input.h | 23 +++++-- > 2 files changed, 153 insertions(+), 41 deletions(-) > > diff --git a/gcc/input.cc b/gcc/input.cc > index 9377020b460..790279d4273 100644 > --- a/gcc/input.cc > +++ b/gcc/input.cc > @@ -207,6 +207,28 @@ private: > void maybe_grow (); > }; > > +/* This is the implementation of cache_data_source for generated > + data that is already in memory. */ > +class data_cache_slot final : public cache_data_source It occurred to me: why are we caching accessing a buffer that's already in memory - but we're also caching the line-splitting information, and providing the line-splitting algorithm with a consistent interface to the data, right? [...snip...] > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path) > global_dc->m_file_cache->forcibly_evict_file (file_path); > } > > +void > +diagnostics_file_cache_forcibly_evict_data (const char *data, > + unsigned int data_len) > +{ > + if (!global_dc->m_file_cache) > + return; > + global_dc->m_file_cache->forcibly_evict_data (data, data_len); Maybe we should rename diagnostic_context's m_file_cache to m_source_cache? (and class file_cache for that matter?) But if so, that can/should be a followup/separate patch. [...snip...] > @@ -525,10 +582,22 @@ file_cache_slot::create (const file_cache::input_context &in_context, > return true; > } > > +void > +data_cache_slot::create (const char *data, unsigned int data_len, > + unsigned int highest_use_count) > +{ > + reset (); > + on_create (highest_use_count + 1, > + total_lines_num (source_id {data, data_len})); > + m_data_begin = data; > + m_data_end = data + data_len; > +} > + > /* file_cache's ctor. */ > > file_cache::file_cache () > -: m_file_slots (new file_cache_slot[num_file_slots]) > + : m_file_slots (new file_cache_slot[num_file_slots]), > + m_data_slots (new data_cache_slot[num_file_slots]) Should "num_file_slots" be renamed to "num_slots"? I assume you're using the same value for both kinds of slot since the file_cache::evicted_cache_tab_entry template uses this. I suppose the number could be passed in as an argument to that function if we wanted to have different sizes for the two kinds, but I don't think it matters. [...snip...] > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t line_num, > If the function fails, a NULL char_span is returned. */ > > char_span > -location_get_source_line (const char *file_path, int line) > +location_get_source_line (source_id src, int line) > { > - const char *buffer = NULL; > - ssize_t len; > - > - if (line == 0) > - return char_span (NULL, 0); > - > - if (file_path == NULL) > - return char_span (NULL, 0); > + const char_span fail (nullptr, 0); > + if (!src || line <= 0) > + return fail; Looking at source_id's operator bool, are there effectively three kinds of source_id? (a) file names (b) generated buffer (c) NULL == m_filename_or_buffer What does (c) mean? Is it a "something's gone wrong/error" state? Or is this more a special-case of (a)? (in that the m_len for such a case would be zero) Should source_id's 2-param ctor have an assert that the ptr is non- NULL? [...snip...] The patch is OK for trunk as-is, but note the question about the source_id ctor above. Thanks Dave
On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote: > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote: > > This patch enhances location_get_source_line(), which is the primary > > interface provided by the diagnostics infrastructure to obtain the line of > > source code corresponding to a given location, so that it understands > > generated data locations in addition to normal file-based locations. This > > involves changing the argument to location_get_source_line() from a plain > > file name, to a source_id object that can represent either type of location. > > > > gcc/ChangeLog: > > > > * input.cc (class data_cache_slot): New class. > > (file_cache::lookup_data): New function. > > (diagnostics_file_cache_forcibly_evict_data): New function. > > (file_cache::forcibly_evict_data): New function. > > (file_cache::evicted_cache_tab_entry): Generalize (via a template) > > to work for both file_cache_slot and data_cache_slot. > > (file_cache::add_file): Adapt for new interface to > > evicted_cache_tab_entry. > > (file_cache::add_data): New function. > > (data_cache_slot::create): New function. > > (file_cache::file_cache): Support the new m_data_slots member. > > (file_cache::~file_cache): Likewise. > > (file_cache::lookup_or_add_data): New function. > > (file_cache::lookup_or_add): New function that calls either > > lookup_or_add_data or lookup_or_add_file as appropriate. > > (location_get_source_line): Change the FILE_PATH argument to a > > source_id SRC, and use it to support obtaining source lines from > > generated data as well as from files. > > (location_compute_display_column): Support generated data using the > > new features of location_get_source_line. > > (dump_location_info): Likewise. > > * input.h (location_get_source_line): Adjust prototype. Add a new > > convenience overload taking an expanded_location. > > (class cache_data_source): Declare. > > (class data_cache_slot): Declare. > > (class file_cache): Declare new members. > > (diagnostics_file_cache_forcibly_evict_data): Declare. > > --- > > gcc/input.cc | 171 ++++++++++++++++++++++++++++++++++++++++----------- > > gcc/input.h | 23 +++++-- > > 2 files changed, 153 insertions(+), 41 deletions(-) > > > > diff --git a/gcc/input.cc b/gcc/input.cc > > index 9377020b460..790279d4273 100644 > > --- a/gcc/input.cc > > +++ b/gcc/input.cc > > @@ -207,6 +207,28 @@ private: > > void maybe_grow (); > > }; > > > > +/* This is the implementation of cache_data_source for generated > > + data that is already in memory. */ > > +class data_cache_slot final : public cache_data_source > > It occurred to me: why are we caching accessing a buffer that's already > in memory - but we're also caching the line-splitting information, and > providing the line-splitting algorithm with a consistent interface to > the data, right? > Yeah, for the current _Pragma use case, multi-line buffers are not going to be common, but they can occur. I was mainly motivated by the consistent interface, and by the assumption that the overhead is not critical given a diagnostic is being issued. > [...snip...] > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path) > > global_dc->m_file_cache->forcibly_evict_file (file_path); > > } > > > > +void > > +diagnostics_file_cache_forcibly_evict_data (const char *data, > > + unsigned int data_len) > > +{ > > + if (!global_dc->m_file_cache) > > + return; > > + global_dc->m_file_cache->forcibly_evict_data (data, data_len); > > Maybe we should rename diagnostic_context's m_file_cache to > m_source_cache? (and class file_cache for that matter?) But if so, > that can/should be a followup/separate patch. > Yes, we should. Believe it or not, I was trying to minimize the size of the patch :) So I didn't make such changes, but they will make things more clear. > [...snip...] > > > @@ -525,10 +582,22 @@ file_cache_slot::create (const file_cache::input_context &in_context, > > return true; > > } > > > > +void > > +data_cache_slot::create (const char *data, unsigned int data_len, > > + unsigned int highest_use_count) > > +{ > > + reset (); > > + on_create (highest_use_count + 1, > > + total_lines_num (source_id {data, data_len})); > > + m_data_begin = data; > > + m_data_end = data + data_len; > > +} > > + > > /* file_cache's ctor. */ > > > > file_cache::file_cache () > > -: m_file_slots (new file_cache_slot[num_file_slots]) > > + : m_file_slots (new file_cache_slot[num_file_slots]), > > + m_data_slots (new data_cache_slot[num_file_slots]) > > Should "num_file_slots" be renamed to "num_slots"? > > I assume you're using the same value for both kinds of slot since the > file_cache::evicted_cache_tab_entry template uses this. I suppose the > number could be passed in as an argument to that function if we wanted > to have different sizes for the two kinds, but I don't think it > matters. > Yes that's right... would rename num_file_slots too. > [...snip...] > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t line_num, > > If the function fails, a NULL char_span is returned. */ > > > > char_span > > -location_get_source_line (const char *file_path, int line) > > +location_get_source_line (source_id src, int line) > > { > > - const char *buffer = NULL; > > - ssize_t len; > > - > > - if (line == 0) > > - return char_span (NULL, 0); > > - > > - if (file_path == NULL) > > - return char_span (NULL, 0); > > + const char_span fail (nullptr, 0); > > + if (!src || line <= 0) > > + return fail; > > Looking at source_id's operator bool, are there effectively three kinds > of source_id? > > (a) file names > (b) generated buffer > (c) NULL == m_filename_or_buffer > > What does (c) mean? Is it a "something's gone wrong/error" state? Or > is this more a special-case of (a)? (in that the m_len for such a case > would be zero) > > Should source_id's 2-param ctor have an assert that the ptr is non- > NULL? > > [...snip...] > > The patch is OK for trunk as-is, but note the question about the > source_id ctor above. > Thanks. (c) has the same meaning as a NULL file name currently does, so a default-constructed source_id is not an in-memory buffer, but is rather a NULL filename. linemap_add() for instance, will interpret a NULL filename for an LC_LEAVE map, as a request to copy it from the natural values being returned to. I think the source_id constructor needs to accept a NULL filename to remain backwards compatible. With the current design of source_id, it is safe always to change a 'const char*' file name argument to a source_id argument instead; it will work just how it did before because it has an implicit constructor. But if the constructor would assert on a non-NULL pointer, that would necessitate changing all call sites that currently expect they can pass a NULL pointer there. (For example, there are several calls to _cpp_do_file_change() within libcpp that take advantage of being able to pass a NULL filename to linemap_add.) -Lewis
On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote: > On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote: > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote: > > > This patch enhances location_get_source_line(), which is the > > > primary > > > interface provided by the diagnostics infrastructure to obtain > > > the line of > > > source code corresponding to a given location, so that it > > > understands > > > generated data locations in addition to normal file-based > > > locations. This > > > involves changing the argument to location_get_source_line() from > > > a plain > > > file name, to a source_id object that can represent either type > > > of location. > > > [...] > > > > > > > > > diff --git a/gcc/input.cc b/gcc/input.cc > > > index 9377020b460..790279d4273 100644 > > > --- a/gcc/input.cc > > > +++ b/gcc/input.cc > > > @@ -207,6 +207,28 @@ private: > > > void maybe_grow (); > > > }; > > > > > > +/* This is the implementation of cache_data_source for generated > > > + data that is already in memory. */ > > > +class data_cache_slot final : public cache_data_source > > > > It occurred to me: why are we caching accessing a buffer that's > > already > > in memory - but we're also caching the line-splitting information, > > and > > providing the line-splitting algorithm with a consistent interface > > to > > the data, right? > > > > Yeah, for the current _Pragma use case, multi-line buffers are not > going to > be common, but they can occur. I was mainly motivated by the > consistent > interface, and by the assumption that the overhead is not critical > given a > diagnostic is being issued. (nods) > > > [...snip...] > > > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file > > > (const char *file_path) > > > global_dc->m_file_cache->forcibly_evict_file (file_path); > > > } > > > > > > +void > > > +diagnostics_file_cache_forcibly_evict_data (const char *data, > > > + unsigned int > > > data_len) > > > +{ > > > + if (!global_dc->m_file_cache) > > > + return; > > > + global_dc->m_file_cache->forcibly_evict_data (data, data_len); > > > > Maybe we should rename diagnostic_context's m_file_cache to > > m_source_cache? (and class file_cache for that matter?) But if > > so, > > that can/should be a followup/separate patch. > > > > Yes, we should. Believe it or not, I was trying to minimize the size > of the > patch :) :) Thanks for splitting it up, BTW. [...] > > > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t > > > line_num, > > > If the function fails, a NULL char_span is returned. */ > > > > > > char_span > > > -location_get_source_line (const char *file_path, int line) > > > +location_get_source_line (source_id src, int line) > > > { > > > - const char *buffer = NULL; > > > - ssize_t len; > > > - > > > - if (line == 0) > > > - return char_span (NULL, 0); > > > - > > > - if (file_path == NULL) > > > - return char_span (NULL, 0); > > > + const char_span fail (nullptr, 0); > > > + if (!src || line <= 0) > > > + return fail; > > > > Looking at source_id's operator bool, are there effectively three > > kinds > > of source_id? > > > > (a) file names > > (b) generated buffer > > (c) NULL == m_filename_or_buffer > > > > What does (c) mean? Is it a "something's gone wrong/error" state? > > Or > > is this more a special-case of (a)? (in that the m_len for such a > > case > > would be zero) > > > > Should source_id's 2-param ctor have an assert that the ptr is non- > > NULL? > > > > [...snip...] > > > > The patch is OK for trunk as-is, but note the question about the > > source_id ctor above. > > > > Thanks. (c) has the same meaning as a NULL file name currently does, > so a > default-constructed source_id is not an in-memory buffer, but is > rather a > NULL filename. linemap_add() for instance, will interpret a NULL > filename > for an LC_LEAVE map, as a request to copy it from the natural values > being > returned to. I think the source_id constructor needs to accept a NULL > filename to remain backwards compatible. With the current design of > source_id, it is safe always to change a 'const char*' file name > argument to > a source_id argument instead; it will work just how it did before > because it > has an implicit constructor. But if the constructor would assert on a > non-NULL pointer, that would necessitate changing all call sites that > currently expect they can pass a NULL pointer there. (For example, > there are > several calls to _cpp_do_file_change() within libcpp that take > advantage of > being able to pass a NULL filename to linemap_add.) Yes, it's OK for this ctor to accept NULL; source_id (const char *filename = nullptr) and I see you added the default arg. I was referring to this ctor: source_id (const char *buffer, unsigned buffer_len) Is it ever OK for "buffer" to be NULL in this 2-param ctor, or can we assert that it's non-NULL in this ctor? Does the generated data case ever return NULL? This is more of a patch 1 thing, of course. Thanks Dave
On Tue, Aug 15, 2023 at 3:46 PM David Malcolm <dmalcolm@redhat.com> wrote: > > On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote: > > On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote: > > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote: > > > > This patch enhances location_get_source_line(), which is the > > > > primary > > > > interface provided by the diagnostics infrastructure to obtain > > > > the line of > > > > source code corresponding to a given location, so that it > > > > understands > > > > generated data locations in addition to normal file-based > > > > locations. This > > > > involves changing the argument to location_get_source_line() from > > > > a plain > > > > file name, to a source_id object that can represent either type > > > > of location. > > > > > > [...] > > > > > > > > > > > > > diff --git a/gcc/input.cc b/gcc/input.cc > > > > index 9377020b460..790279d4273 100644 > > > > --- a/gcc/input.cc > > > > +++ b/gcc/input.cc > > > > @@ -207,6 +207,28 @@ private: > > > > void maybe_grow (); > > > > }; > > > > > > > > +/* This is the implementation of cache_data_source for generated > > > > + data that is already in memory. */ > > > > +class data_cache_slot final : public cache_data_source > > > > > > It occurred to me: why are we caching accessing a buffer that's > > > already > > > in memory - but we're also caching the line-splitting information, > > > and > > > providing the line-splitting algorithm with a consistent interface > > > to > > > the data, right? > > > > > > > Yeah, for the current _Pragma use case, multi-line buffers are not > > going to > > be common, but they can occur. I was mainly motivated by the > > consistent > > interface, and by the assumption that the overhead is not critical > > given a > > diagnostic is being issued. > > (nods) > > > > > > [...snip...] > > > > > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file > > > > (const char *file_path) > > > > global_dc->m_file_cache->forcibly_evict_file (file_path); > > > > } > > > > > > > > +void > > > > +diagnostics_file_cache_forcibly_evict_data (const char *data, > > > > + unsigned int > > > > data_len) > > > > +{ > > > > + if (!global_dc->m_file_cache) > > > > + return; > > > > + global_dc->m_file_cache->forcibly_evict_data (data, data_len); > > > > > > Maybe we should rename diagnostic_context's m_file_cache to > > > m_source_cache? (and class file_cache for that matter?) But if > > > so, > > > that can/should be a followup/separate patch. > > > > > > > Yes, we should. Believe it or not, I was trying to minimize the size > > of the > > patch :) > > :) > > Thanks for splitting it up, BTW. > > [...] > > > > > > > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t > > > > line_num, > > > > If the function fails, a NULL char_span is returned. */ > > > > > > > > char_span > > > > -location_get_source_line (const char *file_path, int line) > > > > +location_get_source_line (source_id src, int line) > > > > { > > > > - const char *buffer = NULL; > > > > - ssize_t len; > > > > - > > > > - if (line == 0) > > > > - return char_span (NULL, 0); > > > > - > > > > - if (file_path == NULL) > > > > - return char_span (NULL, 0); > > > > + const char_span fail (nullptr, 0); > > > > + if (!src || line <= 0) > > > > + return fail; > > > > > > Looking at source_id's operator bool, are there effectively three > > > kinds > > > of source_id? > > > > > > (a) file names > > > (b) generated buffer > > > (c) NULL == m_filename_or_buffer > > > > > > What does (c) mean? Is it a "something's gone wrong/error" state? > > > Or > > > is this more a special-case of (a)? (in that the m_len for such a > > > case > > > would be zero) > > > > > > Should source_id's 2-param ctor have an assert that the ptr is non- > > > NULL? > > > > > > [...snip...] > > > > > > The patch is OK for trunk as-is, but note the question about the > > > source_id ctor above. > > > > > > > Thanks. (c) has the same meaning as a NULL file name currently does, > > so a > > default-constructed source_id is not an in-memory buffer, but is > > rather a > > NULL filename. linemap_add() for instance, will interpret a NULL > > filename > > for an LC_LEAVE map, as a request to copy it from the natural values > > being > > returned to. I think the source_id constructor needs to accept a NULL > > filename to remain backwards compatible. With the current design of > > source_id, it is safe always to change a 'const char*' file name > > argument to > > a source_id argument instead; it will work just how it did before > > because it > > has an implicit constructor. But if the constructor would assert on a > > non-NULL pointer, that would necessitate changing all call sites that > > currently expect they can pass a NULL pointer there. (For example, > > there are > > several calls to _cpp_do_file_change() within libcpp that take > > advantage of > > being able to pass a NULL filename to linemap_add.) > > Yes, it's OK for this ctor to accept NULL; > source_id (const char *filename = nullptr) > and I see you added the default arg. > > I was referring to this ctor: > source_id (const char *buffer, unsigned buffer_len) > Is it ever OK for "buffer" to be NULL in this 2-param ctor, or can we > assert that it's non-NULL in this ctor? Does the generated data case > ever return NULL? > Oh, I see. This should never be NULL and I can add an assert for that. -Lewis
On Tue, Aug 15, 2023 at 04:08:47PM -0400, Lewis Hyatt wrote: > On Tue, Aug 15, 2023 at 3:46 PM David Malcolm <dmalcolm@redhat.com> wrote: > > > > On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote: > > > On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote: > > > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote: > > > > > This patch enhances location_get_source_line(), which is the > > > > > primary > > > > > interface provided by the diagnostics infrastructure to obtain > > > > > the line of > > > > > source code corresponding to a given location, so that it > > > > > understands > > > > > generated data locations in addition to normal file-based > > > > > locations. This > > > > > involves changing the argument to location_get_source_line() from > > > > > a plain > > > > > file name, to a source_id object that can represent either type > > > > > of location. > > > > > > > > > [...] > > > > > > > > > > > > > > > > > diff --git a/gcc/input.cc b/gcc/input.cc > > > > > index 9377020b460..790279d4273 100644 > > > > > --- a/gcc/input.cc > > > > > +++ b/gcc/input.cc > > > > > @@ -207,6 +207,28 @@ private: > > > > > void maybe_grow (); > > > > > }; > > > > > > > > > > +/* This is the implementation of cache_data_source for generated > > > > > + data that is already in memory. */ > > > > > +class data_cache_slot final : public cache_data_source > > > > > > > > It occurred to me: why are we caching accessing a buffer that's > > > > already > > > > in memory - but we're also caching the line-splitting information, > > > > and > > > > providing the line-splitting algorithm with a consistent interface > > > > to > > > > the data, right? > > > > > > > > > > Yeah, for the current _Pragma use case, multi-line buffers are not > > > going to > > > be common, but they can occur. I was mainly motivated by the > > > consistent > > > interface, and by the assumption that the overhead is not critical > > > given a > > > diagnostic is being issued. > > > > (nods) > > > > > > > > > [...snip...] > > > > > > > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file > > > > > (const char *file_path) > > > > > global_dc->m_file_cache->forcibly_evict_file (file_path); > > > > > } > > > > > > > > > > +void > > > > > +diagnostics_file_cache_forcibly_evict_data (const char *data, > > > > > + unsigned int > > > > > data_len) > > > > > +{ > > > > > + if (!global_dc->m_file_cache) > > > > > + return; > > > > > + global_dc->m_file_cache->forcibly_evict_data (data, data_len); > > > > > > > > Maybe we should rename diagnostic_context's m_file_cache to > > > > m_source_cache? (and class file_cache for that matter?) But if > > > > so, > > > > that can/should be a followup/separate patch. > > > > > > > > > > Yes, we should. Believe it or not, I was trying to minimize the size > > > of the > > > patch :) > > > > :) > > > > Thanks for splitting it up, BTW. > > > > [...] > > > > > > > > > > > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t > > > > > line_num, > > > > > If the function fails, a NULL char_span is returned. */ > > > > > > > > > > char_span > > > > > -location_get_source_line (const char *file_path, int line) > > > > > +location_get_source_line (source_id src, int line) > > > > > { > > > > > - const char *buffer = NULL; > > > > > - ssize_t len; > > > > > - > > > > > - if (line == 0) > > > > > - return char_span (NULL, 0); > > > > > - > > > > > - if (file_path == NULL) > > > > > - return char_span (NULL, 0); > > > > > + const char_span fail (nullptr, 0); > > > > > + if (!src || line <= 0) > > > > > + return fail; > > > > > > > > Looking at source_id's operator bool, are there effectively three > > > > kinds > > > > of source_id? > > > > > > > > (a) file names > > > > (b) generated buffer > > > > (c) NULL == m_filename_or_buffer > > > > > > > > What does (c) mean? Is it a "something's gone wrong/error" state? > > > > Or > > > > is this more a special-case of (a)? (in that the m_len for such a > > > > case > > > > would be zero) > > > > > > > > Should source_id's 2-param ctor have an assert that the ptr is non- > > > > NULL? > > > > > > > > [...snip...] > > > > > > > > The patch is OK for trunk as-is, but note the question about the > > > > source_id ctor above. > > > > > > > > > > Thanks. (c) has the same meaning as a NULL file name currently does, > > > so a > > > default-constructed source_id is not an in-memory buffer, but is > > > rather a > > > NULL filename. linemap_add() for instance, will interpret a NULL > > > filename > > > for an LC_LEAVE map, as a request to copy it from the natural values > > > being > > > returned to. I think the source_id constructor needs to accept a NULL > > > filename to remain backwards compatible. With the current design of > > > source_id, it is safe always to change a 'const char*' file name > > > argument to > > > a source_id argument instead; it will work just how it did before > > > because it > > > has an implicit constructor. But if the constructor would assert on a > > > non-NULL pointer, that would necessitate changing all call sites that > > > currently expect they can pass a NULL pointer there. (For example, > > > there are > > > several calls to _cpp_do_file_change() within libcpp that take > > > advantage of > > > being able to pass a NULL filename to linemap_add.) > > > > Yes, it's OK for this ctor to accept NULL; > > source_id (const char *filename = nullptr) > > and I see you added the default arg. > > > > I was referring to this ctor: > > source_id (const char *buffer, unsigned buffer_len) > > Is it ever OK for "buffer" to be NULL in this 2-param ctor, or can we > > assert that it's non-NULL in this ctor? Does the generated data case > > ever return NULL? > > > > Oh, I see. This should never be NULL and I can add an assert for that. > This tweak (incremental to patch 1/8) accomplishes that. -Lewis -- >8 -- diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h index 395c4612dbe..ad20b140cce 100644 --- a/libcpp/include/line-map.h +++ b/libcpp/include/line-map.h @@ -604,7 +604,7 @@ public: : m_filename_or_buffer (buffer), m_len (buffer_len) { - linemap_assert (buffer_len > 0); + linemap_assert (buffer && buffer_len > 0); } explicit operator bool () const { return m_filename_or_buffer; }
diff --git a/gcc/input.cc b/gcc/input.cc index 9377020b460..790279d4273 100644 --- a/gcc/input.cc +++ b/gcc/input.cc @@ -207,6 +207,28 @@ private: void maybe_grow (); }; +/* This is the implementation of cache_data_source for generated + data that is already in memory. */ +class data_cache_slot final : public cache_data_source +{ +public: + void create (const char *data, unsigned int data_len, + unsigned int highest_use_count); + bool represents_data (const char *data, unsigned int) const + { + /* We can just use pointer equality here since the generated data lives in + memory in one persistent place. It isn't anticipated there would be + several generated data buffers with the same content, so we don't mind + that in such a case we will store it twice. */ + return m_data_begin == data; + } + +protected: + /* In contrast to file_cache_slot, we do not own a buffer. The buffer + passed to create() needs to outlive this object. */ + bool get_more_data () override { return false; } +}; + /* Current position in real source file. */ location_t input_location = UNKNOWN_LOCATION; @@ -382,6 +404,21 @@ file_cache::lookup_file (const char *file_path) return r; } +data_cache_slot * +file_cache::lookup_data (const char *data, unsigned int data_len) +{ + for (unsigned int i = 0; i != num_file_slots; ++i) + { + const auto slot = m_data_slots + i; + if (slot->represents_data (data, data_len)) + { + slot->inc_use_count (); + return slot; + } + } + return nullptr; +} + /* Purge any mention of FILENAME from the cache of files used for printing source code. For use in selftests when working with tempfiles. */ @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path) global_dc->m_file_cache->forcibly_evict_file (file_path); } +void +diagnostics_file_cache_forcibly_evict_data (const char *data, + unsigned int data_len) +{ + if (!global_dc->m_file_cache) + return; + global_dc->m_file_cache->forcibly_evict_data (data, data_len); +} + void file_cache::forcibly_evict_file (const char *file_path) { @@ -410,36 +456,36 @@ file_cache::forcibly_evict_file (const char *file_path) r->reset (); } +void +file_cache::forcibly_evict_data (const char *data, unsigned int data_len) +{ + if (auto r = lookup_data (data, data_len)) + r->reset (); +} + /* Return the cache that has been less used, recently, or the first empty one. If HIGHEST_USE_COUNT is non-null, *HIGHEST_USE_COUNT is set to the highest use count of the entries in the cache table. */ -file_cache_slot* -file_cache::evicted_cache_tab_entry (unsigned *highest_use_count) +template <class Slot> +Slot * +file_cache::evicted_cache_tab_entry (Slot *slots, + unsigned int *highest_use_count) { - diagnostic_file_cache_init (); - - file_cache_slot *to_evict = &m_file_slots[0]; + auto to_evict = &slots[0]; unsigned huc = to_evict->get_use_count (); for (unsigned i = 1; i < num_file_slots; ++i) { - file_cache_slot *c = &m_file_slots[i]; - bool c_is_empty = (c->get_file_path () == NULL); - + auto c = &slots[i]; if (c->get_use_count () < to_evict->get_use_count () - || (to_evict->get_file_path () && c_is_empty)) + || (!to_evict->unused () && c->unused ())) /* We evict C because it's either an entry with a lower use count or one that is empty. */ to_evict = c; if (huc < c->get_use_count ()) huc = c->get_use_count (); - - if (c_is_empty) - /* We've reached the end of the cache; subsequent elements are - all empty. */ - break; } if (highest_use_count) @@ -463,12 +509,23 @@ file_cache::add_file (const char *file_path) return NULL; unsigned highest_use_count = 0; - file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count); + file_cache_slot *r = evicted_cache_tab_entry (m_file_slots, + &highest_use_count); if (!r->create (in_context, file_path, fp, highest_use_count)) return NULL; return r; } +data_cache_slot * +file_cache::add_data (const char *data, unsigned int data_len) +{ + unsigned int highest_use_count = 0; + data_cache_slot *r = evicted_cache_tab_entry (m_data_slots, + &highest_use_count); + r->create (data, data_len, highest_use_count); + return r; +} + /* Get a borrowed char_span to the full content of this file as decoded according to the input charset, encoded as UTF-8. */ @@ -525,10 +582,22 @@ file_cache_slot::create (const file_cache::input_context &in_context, return true; } +void +data_cache_slot::create (const char *data, unsigned int data_len, + unsigned int highest_use_count) +{ + reset (); + on_create (highest_use_count + 1, + total_lines_num (source_id {data, data_len})); + m_data_begin = data; + m_data_end = data + data_len; +} + /* file_cache's ctor. */ file_cache::file_cache () -: m_file_slots (new file_cache_slot[num_file_slots]) + : m_file_slots (new file_cache_slot[num_file_slots]), + m_data_slots (new data_cache_slot[num_file_slots]) { initialize_input_context (nullptr, false); } @@ -537,6 +606,7 @@ file_cache::file_cache () file_cache::~file_cache () { + delete[] m_data_slots; delete[] m_file_slots; } @@ -554,6 +624,24 @@ file_cache::lookup_or_add_file (const char *file_path) return r; } +data_cache_slot * +file_cache::lookup_or_add_data (const char *data, unsigned int data_len) +{ + data_cache_slot *r = lookup_data (data, data_len); + if (!r) + r = add_data (data, data_len); + return r; +} + +cache_data_source * +file_cache::lookup_or_add (source_id src) +{ + if (src.is_buffer ()) + return lookup_or_add_data (src.get_filename_or_buffer (), + src.get_buffer_len ()); + return src ? lookup_or_add_file (src.get_filename_or_buffer ()) : nullptr; +} + cache_data_source::cache_data_source () : m_data_begin (nullptr), m_data_end (nullptr), m_use_count (0), @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t line_num, If the function fails, a NULL char_span is returned. */ char_span -location_get_source_line (const char *file_path, int line) +location_get_source_line (source_id src, int line) { - const char *buffer = NULL; - ssize_t len; - - if (line == 0) - return char_span (NULL, 0); - - if (file_path == NULL) - return char_span (NULL, 0); + const char_span fail (nullptr, 0); + if (!src || line <= 0) + return fail; diagnostic_file_cache_init (); + const auto c = global_dc->m_file_cache->lookup_or_add (src); + if (!c) + return fail; - file_cache_slot *c = global_dc->m_file_cache->lookup_or_add_file (file_path); - if (c == NULL) - return char_span (NULL, 0); - + const char *buffer = NULL; + ssize_t len; bool read = c->read_line_num (line, &buffer, &len); if (!read) - return char_span (NULL, 0); + return fail; return char_span (buffer, len); } @@ -1193,9 +1277,9 @@ int location_compute_display_column (expanded_location exploc, const cpp_char_column_policy &policy) { - if (!(exploc.file && *exploc.file && exploc.line && exploc.column)) + if (!(exploc.src && exploc.line && exploc.column)) return exploc.column; - char_span line = location_get_source_line (exploc.file, exploc.line); + char_span line = location_get_source_line (exploc); /* If line is NULL, this function returns exploc.column which is the desired fallback. */ return cpp_byte_column_to_display_column (line.get_buffer (), line.length (), @@ -1425,13 +1509,26 @@ dump_location_info (FILE *stream) { /* Beginning of a new source line: draw the line. */ - char_span line_text = location_get_source_line (exploc.file, - exploc.line); + char_span line_text = location_get_source_line (exploc); if (!line_text) break; + + const char *fn1, *fn2; + if (exploc.src.is_buffer ()) + { + fn1 = ORDINARY_MAP_CONTAINING_FILE_NAME (line_table, map); + fn2 = special_fname_generated (); + } + else + { + fn1 = exploc.file; + fn2 = ""; + } + fprintf (stream, - "%s:%3i|loc:%5i|%.*s\n", - exploc.file, exploc.line, + "%s%s:%3i|loc:%5i|%.*s\n", + fn1, fn2, + exploc.line, loc, (int)line_text.length (), line_text.get_buffer ()); @@ -1450,7 +1547,7 @@ dump_location_info (FILE *stream) if (len_loc < 5) len_loc = 5; - int indent = 6 + strlen (exploc.file) + len_lnum + len_loc; + int indent = 6 + strlen (fn1) + strlen (fn2) + len_lnum + len_loc; /* Thousands. */ if (end_location > 999) diff --git a/gcc/input.h b/gcc/input.h index 5c578f1a9de..d30673f1089 100644 --- a/gcc/input.h +++ b/gcc/input.h @@ -114,15 +114,21 @@ class char_span size_t m_n_elts; }; -extern char_span location_get_source_line (const char *file_path, int line); +extern char_span location_get_source_line (source_id src, int line); +inline char_span location_get_source_line (expanded_location exploc) +{ + return location_get_source_line (exploc.src, exploc.line); +} extern char *get_source_text_between (location_t, location_t); extern char_span get_source_file_content (const char *file_path); extern bool location_missing_trailing_newline (const char *file_path); -/* Forward decl of slot within file_cache, so that the definition doesn't +/* Forward decl of slots within file_cache, so that the definition doesn't need to be in this header. */ +class cache_data_source; class file_cache_slot; +class data_cache_slot; /* A cache of source files for use when emitting diagnostics (and in a few places in the C/C++ frontends). @@ -140,7 +146,10 @@ class file_cache ~file_cache (); file_cache_slot *lookup_or_add_file (const char *file_path); + data_cache_slot *lookup_or_add_data (const char *data, unsigned int data_len); + cache_data_source *lookup_or_add (source_id src); void forcibly_evict_file (const char *file_path); + void forcibly_evict_data (const char *data, unsigned int data_len); /* See comments in diagnostic.h about the input conversion context. */ struct input_context @@ -152,13 +161,17 @@ class file_cache bool should_skip_bom); private: - file_cache_slot *evicted_cache_tab_entry (unsigned *highest_use_count); + template <class Slot> + Slot *evicted_cache_tab_entry (Slot *slots, unsigned int *highest_use_count); + file_cache_slot *add_file (const char *file_path); + data_cache_slot *add_data (const char *data, unsigned int data_len); file_cache_slot *lookup_file (const char *file_path); + data_cache_slot *lookup_data (const char *data, unsigned int data_len); - private: static const size_t num_file_slots = 16; file_cache_slot *m_file_slots; + data_cache_slot *m_data_slots; input_context in_context; }; @@ -256,6 +269,8 @@ void dump_location_info (FILE *stream); void diagnostics_file_cache_fini (void); void diagnostics_file_cache_forcibly_evict_file (const char *file_path); +void diagnostics_file_cache_forcibly_evict_data (const char *data, + unsigned int data_len); class GTY(()) string_concat {