diff mbox

[1/3] (v2) On-demand locations within string-literals

Message ID 1469629820.17384.17.camel@redhat.com
State New
Headers show

Commit Message

David Malcolm July 27, 2016, 2:30 p.m. UTC
On Tue, 2016-07-26 at 19:05 +0100, Manuel López-Ibáñez wrote:
> On 26/07/16 18:11, David Malcolm wrote:
> 
> > gcc/ChangeLog:
> > 	* gcc.c (cpp_options): Rename string to...
> > 	(cpp_options_): ...this, to avoid clashing with struct in
> > 	cpplib.h.
> 
> It seems to me that you need this because  now gcc.c includes
> cpplib.h via 
> input.h, which seems wrong.
> 
> input.h was FE-independent (it depends on line-map.h but it is an
> accident of 
> history that line-map.h is in libcpp since it doesn't depend on
> anything from 
> libcpp [*]). Note that input.h is included in coretypes.h, so this
> means that 
> now cpplib.h is included almost everywhere! [**]
> 
> There is the following in coretypes.h:
> 
> /* Provide forward struct declaration so that we don't have to
> include
>     all of cpplib.h whenever a random prototype includes a pointer.
>     Note that the cpp_reader and cpp_token typedefs remain part of
>     cpplib.h.  */
> 
> struct cpp_reader;
> struct cpp_token;
> 
> precisely to avoid including cpplib.h.
> 
> 
> If I understand correctly, cpplib.h is needed in input.h because of
> this 
> declaration:
> 
> +extern const char *get_source_range_for_substring (cpp_reader
> *pfile,
> +						   string_concat_db
> *concats,
> +						   location_t
> strloc,
> +						   enum cpp_ttype
> type,
> +						   int start_idx,
> int end_idx,
> +						   source_range
> *out_range);
> 
> 
> Does this really need to be in input.h ?  It seems something that
> only C-family 
> languages will be able to use. Note that you need a reader to use
> this 
> function, and for that, you need to already include cpplib.h.

Fair point; the attached modification to patch 1 compiles cleanly, and
moves it to a new header.

> Perhaps it could live for now in c-format.c, since it is the only
> place using it?

Martin Sebor [CC-ed] wants to use it from the middle-end:
  https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01088.html
so it's unclear to me that c-format.c would be a better location.

There are various places it could live; but getting it working took a
lot of effort to achieve - the currently proposed mixture of libcpp,
input.c and c-format.c for the locations of the various pieces works
(for example, auto_vec isn't available in libcpp).

Given that both Martin and I have candidate patches that are touching
the same area, I'd prefer to focus on getting this code in to trunk,
rather than rewrite it out-of-tree, so that we can at least have the
improvement to location-handling for Wformat.  Once the code is in the
tree, it should be easier to figure out how to access it from the
middle-end.

> Cheers,
> 
> 	Manuel.
> 
> [*] In an ideal world, we would have a language-agnostic diagnostics
> library 
> that would include line-map and that would be used by libcpp and the
> rest of 
> GCC, so that we can remove all the error-routines in libcpp and the
> awkward 
> glue code that ties it into diagnostics.c.,

Agreed, though that may have to wait until gcc 8 at this point.
(Given that the proposed diagnostics library would use line maps, and
would be used by libcpp, would it make sense to move the diagnostics
into libcpp itself?  Diagnostics would seem to be intimately related to
location-tracking)

> [**] And it seems that we are slowly undoing all the work that was
> done by 
> Andrew MacLeod to clean up the .h web and remove dependencies 
> (https://gcc.gnu.org/wiki/rearch).
> 
>

Comments

Manuel López-Ibáñez July 27, 2016, 10:41 p.m. UTC | #1
On 27 July 2016 at 15:30, David Malcolm <dmalcolm@redhat.com> wrote:
>> Perhaps it could live for now in c-format.c, since it is the only
>> place using it?
>
> Martin Sebor [CC-ed] wants to use it from the middle-end:
>   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01088.html
> so it's unclear to me that c-format.c would be a better location.

Fine. He will have to figure out how to get a cpp_reader from the
middle-end, though.

> There are various places it could live; but getting it working took a
> lot of effort to achieve - the currently proposed mixture of libcpp,
> input.c and c-format.c for the locations of the various pieces works
> (for example, auto_vec isn't available in libcpp).

I don't doubt it. I tried to do something similar in the past and I
failed, this is why I ended up with the poor approximation that was in
place until now. This is a significant step forward.

Is libcpp still C? When would be the time to move it to C++ already
and start using common utilities?

Also, moving vec.h, sbitmap, etc to their own directory/library so
that they can be used by other parts of the compiler (hey! maybe even
by other parts of the toolchain?) is desirable. Richard has said in
the past that he supports such moves. Did I understand correctly
Richard?

> Given that both Martin and I have candidate patches that are touching
> the same area, I'd prefer to focus on getting this code in to trunk,
> rather than rewrite it out-of-tree, so that we can at least have the
> improvement to location-handling for Wformat.  Once the code is in the
> tree, it should be easier to figure out how to access it from the
> middle-end.

Sure, I think this version is fine. I'm a big proponent of
step-by-step, even if the steps are only approximations to the optimal
solution :)
It may be enough to motivate someone else more capable to improve over
my poor approximations ;-)


>> [*] In an ideal world, we would have a language-agnostic diagnostics
>> library
>> that would include line-map and that would be used by libcpp and the
>> rest of
>> GCC, so that we can remove all the error-routines in libcpp and the
>> awkward
>> glue code that ties it into diagnostics.c.,
>
> Agreed, though that may have to wait until gcc 8 at this point.
> (Given that the proposed diagnostics library would use line maps, and
> would be used by libcpp, would it make sense to move the diagnostics
> into libcpp itself?  Diagnostics would seem to be intimately related to
> location-tracking)

I don't think so. There is nothing in diagnostic.* pretty-print.*
input.* line-map.* that requires libcpp (and only two mentions of tree
that could be easily abstracted out). This was a deliberate design
goal of Gabriel and followed by most of us later working on
diagnostics. Of course, cpp may make use of the new library, but also
other parts of the toolchain (GAS?). The main obstacle I faced when
trying to do this move was the build machinery to make both libcpp and
gcc build and statically link with this new library.

Once that move is done, the main abstraction challenge to remove the
glue is that libcpp has its own flags for options and diagnostics that
are independent from those of gcc (see c_cpp_error in c-common.c). It
would be great if libcpp used the common flags, but then one would
have to figure out a way to reorder things so that the diagnostic
library, libcpp and gcc can use (or avoid being dependent on) the same
flags.

Cheers,

Manuel.
David Malcolm July 28, 2016, 8:12 p.m. UTC | #2
On Wed, 2016-07-27 at 23:41 +0100, Manuel López-Ibáñez wrote:
> On 27 July 2016 at 15:30, David Malcolm <dmalcolm@redhat.com> wrote:
> > > Perhaps it could live for now in c-format.c, since it is the only
> > > place using it?
> > 
> > Martin Sebor [CC-ed] wants to use it from the middle-end:
> >   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01088.html
> > so it's unclear to me that c-format.c would be a better location.
> 
> Fine. He will have to figure out how to get a cpp_reader from the
> middle-end, though.

It seems to me that on-demand reconstruction of source locations for
STRING_CST nodes is inherently frontend-specific: unless we have the
frontend record the information in some fe-independent way (which I
assume we *don't* want to do, for space-efficiency), we need to be able
to effectively re-run part of the frontend.

So maybe this needs to be a langhook; the c-family can use the global
cpp_reader * there, and everything else can return a "not supported"
code if a diagnostic requests substring location information (and the
diagnostic needs to be able to cope with that).

> > There are various places it could live; but getting it working took
> > a
> > lot of effort to achieve - the currently proposed mixture of
> > libcpp,
> > input.c and c-format.c for the locations of the various pieces
> > works
> > (for example, auto_vec isn't available in libcpp).
> 
> I don't doubt it. I tried to do something similar in the past and I
> failed, this is why I ended up with the poor approximation that was
> in
> place until now. This is a significant step forward.

Thanks (for the current implementation, and for the kind words).

> Is libcpp still C? When would be the time to move it to C++ already
> and start using common utilities?

libcpp is very much C++: I converted the linemap types to use
inheritance as part of gcc 6 (and it helped a lot when implementing the
range-tracking stuff).

> Also, moving vec.h, sbitmap, etc to their own directory/library so
> that they can be used by other parts of the compiler (hey! maybe even
> by other parts of the toolchain?) is desirable. Richard has said in
> the past that he supports such moves. Did I understand correctly
> Richard?

FWIW, I'd want the selftest framework there too; part of the reason
things are in input.c rather than libcpp in the current patch is that
selftests aren't yet available from libcpp (and reworking that seems
orthogonal).

> > Given that both Martin and I have candidate patches that are
> > touching
> > the same area, I'd prefer to focus on getting this code in to
> > trunk,
> > rather than rewrite it out-of-tree, so that we can at least have
> > the
> > improvement to location-handling for Wformat.  Once the code is in
> > the
> > tree, it should be easier to figure out how to access it from the
> > middle-end.
> 
> Sure, I think this version is fine. I'm a big proponent of
> step-by-step, even if the steps are only approximations to the
> optimal
> solution :)
> It may be enough to motivate someone else more capable to improve
> over
> my poor approximations ;-)

:)

> > > [*] In an ideal world, we would have a language-agnostic
> > > diagnostics
> > > library
> > > that would include line-map and that would be used by libcpp and
> > > the
> > > rest of
> > > GCC, so that we can remove all the error-routines in libcpp and
> > > the
> > > awkward
> > > glue code that ties it into diagnostics.c.,
> > 
> > Agreed, though that may have to wait until gcc 8 at this point.
> > (Given that the proposed diagnostics library would use line maps,
> > and
> > would be used by libcpp, would it make sense to move the
> > diagnostics
> > into libcpp itself?  Diagnostics would seem to be intimately
> > related to
> > location-tracking)
> 
> I don't think so. There is nothing in diagnostic.* pretty-print.*
> input.* line-map.* that requires libcpp (and only two mentions of
> tree
> that could be easily abstracted out). This was a deliberate design
> goal of Gabriel and followed by most of us later working on
> diagnostics. Of course, cpp may make use of the new library, but also
> other parts of the toolchain (GAS?). The main obstacle I faced when
> trying to do this move was the build machinery to make both libcpp
> and
> gcc build and statically link with this new library.
> 
> Once that move is done, the main abstraction challenge to remove the
> glue is that libcpp has its own flags for options and diagnostics
> that
> are independent from those of gcc (see c_cpp_error in c-common.c). It
> would be great if libcpp used the common flags, but then one would
> have to figure out a way to reorder things so that the diagnostic
> library, libcpp and gcc can use (or avoid being dependent on) the
> same
> flags.

Thanks.

Dave
Martin Sebor July 28, 2016, 8:38 p.m. UTC | #3
On 07/28/2016 02:12 PM, David Malcolm wrote:
> On Wed, 2016-07-27 at 23:41 +0100, Manuel López-Ibáñez wrote:
>> On 27 July 2016 at 15:30, David Malcolm <dmalcolm@redhat.com> wrote:
>>>> Perhaps it could live for now in c-format.c, since it is the only
>>>> place using it?
>>>
>>> Martin Sebor [CC-ed] wants to use it from the middle-end:
>>>    https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01088.html
>>> so it's unclear to me that c-format.c would be a better location.
>>
>> Fine. He will have to figure out how to get a cpp_reader from the
>> middle-end, though.
>
> It seems to me that on-demand reconstruction of source locations for
> STRING_CST nodes is inherently frontend-specific: unless we have the
> frontend record the information in some fe-independent way (which I
> assume we *don't* want to do, for space-efficiency), we need to be able
> to effectively re-run part of the frontend.
>
> So maybe this needs to be a langhook; the c-family can use the global
> cpp_reader * there, and everything else can return a "not supported"
> code if a diagnostic requests substring location information (and the
> diagnostic needs to be able to cope with that).

The problem with the lanhook approach, as I learned from my first
-Wformat-length attempt, is that it doesn't make the front end
implementation available to LTO.  So passes that run late enough
with LTO (like the latest version of the -Wformat-length pass
does) would not be bale to make use of it.

Martin
Martin Sebor July 28, 2016, 9:16 p.m. UTC | #4
On 07/28/2016 02:38 PM, Martin Sebor wrote:
> On 07/28/2016 02:12 PM, David Malcolm wrote:
>> On Wed, 2016-07-27 at 23:41 +0100, Manuel López-Ibáñez wrote:
>>> On 27 July 2016 at 15:30, David Malcolm <dmalcolm@redhat.com> wrote:
>>>>> Perhaps it could live for now in c-format.c, since it is the only
>>>>> place using it?
>>>>
>>>> Martin Sebor [CC-ed] wants to use it from the middle-end:
>>>>    https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01088.html
>>>> so it's unclear to me that c-format.c would be a better location.
>>>
>>> Fine. He will have to figure out how to get a cpp_reader from the
>>> middle-end, though.
>>
>> It seems to me that on-demand reconstruction of source locations for
>> STRING_CST nodes is inherently frontend-specific: unless we have the
>> frontend record the information in some fe-independent way (which I
>> assume we *don't* want to do, for space-efficiency), we need to be able
>> to effectively re-run part of the frontend.
>>
>> So maybe this needs to be a langhook; the c-family can use the global
>> cpp_reader * there, and everything else can return a "not supported"
>> code if a diagnostic requests substring location information (and the
>> diagnostic needs to be able to cope with that).
>
> The problem with the lanhook approach, as I learned from my first
> -Wformat-length attempt, is that it doesn't make the front end
> implementation available to LTO.  So passes that run late enough
> with LTO (like the latest version of the -Wformat-length pass
> does) would not be bale to make use of it.

I'm sorry, I didn't mean to sound like I was dismissing the idea.
I agree that string processing is language and front-end specific.
Having the middle end call back into the front-end also seems like
the right thing to do, not just to make this case work, but others
like it as well.  So perhaps the problem to solve is how to teach
LTO to talk to the front end.  One way to do it would be to build
the front ends as shared libraries.

Martin
David Malcolm July 29, 2016, 12:36 p.m. UTC | #5
On Thu, 2016-07-28 at 15:16 -0600, Martin Sebor wrote:
> On 07/28/2016 02:38 PM, Martin Sebor wrote:
> > On 07/28/2016 02:12 PM, David Malcolm wrote:
> > > On Wed, 2016-07-27 at 23:41 +0100, Manuel López-Ibáñez wrote:
> > > > On 27 July 2016 at 15:30, David Malcolm <dmalcolm@redhat.com>
> > > > wrote:
> > > > > > Perhaps it could live for now in c-format.c, since it is
> > > > > > the only
> > > > > > place using it?
> > > > > 
> > > > > Martin Sebor [CC-ed] wants to use it from the middle-end:
> > > > >    https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01088.html
> > > > > so it's unclear to me that c-format.c would be a better
> > > > > location.
> > > > 
> > > > Fine. He will have to figure out how to get a cpp_reader from
> > > > the
> > > > middle-end, though.
> > > 
> > > It seems to me that on-demand reconstruction of source locations
> > > for
> > > STRING_CST nodes is inherently frontend-specific: unless we have
> > > the
> > > frontend record the information in some fe-independent way (which
> > > I
> > > assume we *don't* want to do, for space-efficiency), we need to
> > > be able
> > > to effectively re-run part of the frontend.
> > > 
> > > So maybe this needs to be a langhook; the c-family can use the
> > > global
> > > cpp_reader * there, and everything else can return a "not
> > > supported"
> > > code if a diagnostic requests substring location information (and
> > > the
> > > diagnostic needs to be able to cope with that).
> > 
> > The problem with the lanhook approach, as I learned from my first
> > -Wformat-length attempt, is that it doesn't make the front end
> > implementation available to LTO.  So passes that run late enough
> > with LTO (like the latest version of the -Wformat-length pass
> > does) would not be bale to make use of it.
> 
> I'm sorry, I didn't mean to sound like I was dismissing the idea.
> I agree that string processing is language and front-end specific.
> Having the middle end call back into the front-end also seems like
> the right thing to do, not just to make this case work, but others
> like it as well.  So perhaps the problem to solve is how to teach
> LTO to talk to the front end.  One way to do it would be to build
> the front ends as shared libraries.

Turning frontends into shared libraries as a prerequisite would seem to
be imposing a significant burden on the patch.

Currently all that we need from the C family of frontends is the
cpp_reader and the string concatenation records.  I think we can
reconstruct the cpp_reader if we have the options, though presumably
that's per TU, so to support all this we'd need to capture e.g. the per
-TU encoding information in the LTO records, for the case where one TU
is UTF-8 encoded source to UTF-8 execution, and another TU is EBCDIC
-encoded source to UCS-4 execution (or whatever).  And there's an issue
if different TUs compiled the same header with different encoding
options.

Or... we could not bother.  This is a Quality of Implementation thing,
for improving diagnostics, and in each case, the diagnostic is required
to cope with substring location information not being available (and
the code I posted in patch 2 of the kit makes it trivial to handle that
case from a diagnostic).  So we could simply have LTO use the
fallback mode.

There are two high-level approaches I've tried:

(a) capture the substring location information in the lexer/parser in
the frontend as it runs, and store it somehow.

(b) regenerate it "on-demand" when a diagnostic needs it.

Approach (b) is inherently going to be prone to the LTO issues you
describe, but it avoids adding to the CPU cycles/memory consumption for
the common case of not needing the information. [1]

Is approach (b) acceptable?

Thanks
Dave

[1] with the exception of the string concatenation records, but I
believe those are tiny
Martin Sebor July 29, 2016, 2:22 p.m. UTC | #6
> Currently all that we need from the C family of frontends is the
> cpp_reader and the string concatenation records.  I think we can
> reconstruct the cpp_reader if we have the options, though presumably
> that's per TU, so to support all this we'd need to capture e.g. the per
> -TU encoding information in the LTO records, for the case where one TU
> is UTF-8 encoded source to UTF-8 execution, and another TU is EBCDIC
> -encoded source to UCS-4 execution (or whatever).  And there's an issue
> if different TUs compiled the same header with different encoding
> options.
>
> Or... we could not bother.  This is a Quality of Implementation thing,
> for improving diagnostics, and in each case, the diagnostic is required
> to cope with substring location information not being available (and
> the code I posted in patch 2 of the kit makes it trivial to handle that
> case from a diagnostic).  So we could simply have LTO use the
> fallback mode.
>
> There are two high-level approaches I've tried:
>
> (a) capture the substring location information in the lexer/parser in
> the frontend as it runs, and store it somehow.
>
> (b) regenerate it "on-demand" when a diagnostic needs it.
>
> Approach (b) is inherently going to be prone to the LTO issues you
> describe, but it avoids adding to the CPU cycles/memory consumption for
> the common case of not needing the information. [1]
>
> Is approach (b) acceptable?

If (b) means potentially reduced quality of the location ranges
in the -Wformat-length pass (e.g., with funky C++ format strings)
then I don't think that's enough of a problem to worry about, at
least not for this warning.

If it means not being able to use the solution you're working
on in the middle end  at all (unless I misunderstood that doesn't
seem to be what you're implying, but just to be sure) then that
would seem like a serious shortcoming.  I would continue to use
the code I copied from c-format.c (assuming that will still work),
but as more warnings are implemented in later passes it would
lead to duplicating code or reinventing the wheel just to get
around the limitation (or simply worse quality diagnostics).

Martin

>
> Thanks
> Dave
>
> [1] with the exception of the string concatenation records, but I
> believe those are tiny
>
David Malcolm July 29, 2016, 2:46 p.m. UTC | #7
On Fri, 2016-07-29 at 08:22 -0600, Martin Sebor wrote:
> > Currently all that we need from the C family of frontends is the
> > cpp_reader and the string concatenation records.  I think we can
> > reconstruct the cpp_reader if we have the options, though
> > presumably
> > that's per TU, so to support all this we'd need to capture e.g. the
> > per
> > -TU encoding information in the LTO records, for the case where one
> > TU
> > is UTF-8 encoded source to UTF-8 execution, and another TU is
> > EBCDIC
> > -encoded source to UCS-4 execution (or whatever).  And there's an
> > issue
> > if different TUs compiled the same header with different encoding
> > options.
> > 
> > Or... we could not bother.  This is a Quality of Implementation
> > thing,
> > for improving diagnostics, and in each case, the diagnostic is
> > required
> > to cope with substring location information not being available
> > (and
> > the code I posted in patch 2 of the kit makes it trivial to handle
> > that
> > case from a diagnostic).  So we could simply have LTO use the
> > fallback mode.
> > 
> > There are two high-level approaches I've tried:
> > 
> > (a) capture the substring location information in the lexer/parser
> > in
> > the frontend as it runs, and store it somehow.
> > 
> > (b) regenerate it "on-demand" when a diagnostic needs it.
> > 
> > Approach (b) is inherently going to be prone to the LTO issues you
> > describe, but it avoids adding to the CPU cycles/memory consumption
> > for
> > the common case of not needing the information. [1]
> > 
> > Is approach (b) acceptable?
> 
> If (b) means potentially reduced quality of the location ranges
> in the -Wformat-length pass (e.g., with funky C++ format strings)
> then I don't think that's enough of a problem to worry about, at
> least not for this warning.
> 
> If it means not being able to use the solution you're working
> on in the middle end  at all (unless I misunderstood that doesn't
> seem to be what you're implying, but just to be sure) then that
> would seem like a serious shortcoming.  I would continue to use
> the code I copied from c-format.c (assuming that will still work),
> but as more warnings are implemented in later passes it would
> lead to duplicating code or reinventing the wheel just to get
> around the limitation (or simply worse quality diagnostics).

It'll work fine for the middle-end within cc1 and cc1plus.

I'm specifically referring to LTO here, and it would be fixable from
LTO if we can encode information about the TU encoding options into the
LTO data stream, and capture the string concatenation records there too
(but that would be followup work).

> Martin
> 
> > 
> > Thanks
> > Dave
> > 
> > [1] with the exception of the string concatenation records, but I
> > believe those are tiny
> > 
>
David Malcolm July 29, 2016, 3:25 p.m. UTC | #8
On Fri, 2016-07-29 at 10:46 -0400, David Malcolm wrote:
> On Fri, 2016-07-29 at 08:22 -0600, Martin Sebor wrote:
> > > Currently all that we need from the C family of frontends is the
> > > cpp_reader and the string concatenation records.  I think we can
> > > reconstruct the cpp_reader if we have the options, though
> > > presumably
> > > that's per TU, so to support all this we'd need to capture e.g.
> > > the
> > > per
> > > -TU encoding information in the LTO records, for the case where
> > > one
> > > TU
> > > is UTF-8 encoded source to UTF-8 execution, and another TU is
> > > EBCDIC
> > > -encoded source to UCS-4 execution (or whatever).  And there's an
> > > issue
> > > if different TUs compiled the same header with different encoding
> > > options.
> > > 
> > > Or... we could not bother.  This is a Quality of Implementation
> > > thing,
> > > for improving diagnostics, and in each case, the diagnostic is
> > > required
> > > to cope with substring location information not being available
> > > (and
> > > the code I posted in patch 2 of the kit makes it trivial to
> > > handle
> > > that
> > > case from a diagnostic).  So we could simply have LTO use the
> > > fallback mode.
> > > 
> > > There are two high-level approaches I've tried:
> > > 
> > > (a) capture the substring location information in the
> > > lexer/parser
> > > in
> > > the frontend as it runs, and store it somehow.
> > > 
> > > (b) regenerate it "on-demand" when a diagnostic needs it.
> > > 
> > > Approach (b) is inherently going to be prone to the LTO issues
> > > you
> > > describe, but it avoids adding to the CPU cycles/memory
> > > consumption
> > > for
> > > the common case of not needing the information. [1]
> > > 
> > > Is approach (b) acceptable?
> > 
> > If (b) means potentially reduced quality of the location ranges
> > in the -Wformat-length pass (e.g., with funky C++ format strings)
> > then I don't think that's enough of a problem to worry about, at
> > least not for this warning.
> > 
> > If it means not being able to use the solution you're working
> > on in the middle end  at all (unless I misunderstood that doesn't
> > seem to be what you're implying, but just to be sure) then that
> > would seem like a serious shortcoming.  I would continue to use
> > the code I copied from c-format.c (assuming that will still work),
> > but as more warnings are implemented in later passes it would
> > lead to duplicating code or reinventing the wheel just to get
> > around the limitation (or simply worse quality diagnostics).
> 
> It'll work fine for the middle-end within cc1 and cc1plus.
> 
> I'm specifically referring to LTO here, and it would be fixable from
> LTO if we can encode information about the TU encoding options into
> the
> LTO data stream, and capture the string concatenation records there
> too
> (but that would be followup work).

FWIW, it appears that clang uses the on-demand approach; the relevant
code appears to be StringLiteral::getLocationOfByte:
http://clang.llvm.org/doxygen/Expr_8cpp_source.html#l01008


> 
> > Martin
> > 
> > > 
> > > Thanks
> > > Dave
> > > 
> > > [1] with the exception of the string concatenation records, but I
> > > believe those are tiny
> > > 
> >
Manuel López-Ibáñez July 29, 2016, 4:53 p.m. UTC | #9
On 29 July 2016 at 16:25, David Malcolm <dmalcolm@redhat.com> wrote:
>
> FWIW, it appears that clang uses the on-demand approach; the relevant
> code appears to be StringLiteral::getLocationOfByte:
> http://clang.llvm.org/doxygen/Expr_8cpp_source.html#l01008

As far as I know, llvm doesn't do language diagnostics from the
middle-end/LTO. Thus, they do not have those problems.

Cheers,

Manuel.
David Malcolm July 29, 2016, 5:27 p.m. UTC | #10
On Fri, 2016-07-29 at 17:53 +0100, Manuel López-Ibáñez wrote:
> On 29 July 2016 at 16:25, David Malcolm <dmalcolm@redhat.com> wrote:
> > 
> > FWIW, it appears that clang uses the on-demand approach; the
> > relevant
> > code appears to be StringLiteral::getLocationOfByte:
> > http://clang.llvm.org/doxygen/Expr_8cpp_source.html#l01008
> 
> As far as I know, llvm doesn't do language diagnostics from the
> middle-end/LTO. Thus, they do not have those problems.

If you really want to have middle-end diagnostics from LTO, I can make
the on-demand approach work.

I can also do the stored-location approach, but it would mean rewriting
all the patches again, I think, would be less efficient.

I would prefer the on-demand approach.

Who is empowered to make a decision here?
Manuel López-Ibáñez July 30, 2016, 1:17 a.m. UTC | #11
On 29 July 2016 at 18:27, David Malcolm <dmalcolm@redhat.com> wrote:
> On Fri, 2016-07-29 at 17:53 +0100, Manuel López-Ibáñez wrote:
>> On 29 July 2016 at 16:25, David Malcolm <dmalcolm@redhat.com> wrote:
>> >
>> > FWIW, it appears that clang uses the on-demand approach; the
>> > relevant
>> > code appears to be StringLiteral::getLocationOfByte:
>> > http://clang.llvm.org/doxygen/Expr_8cpp_source.html#l01008
>>
>> As far as I know, llvm doesn't do language diagnostics from the
>> middle-end/LTO. Thus, they do not have those problems.
>
> If you really want to have middle-end diagnostics from LTO, I can make
> the on-demand approach work.

Personally, I'm happy with having this work only on the FEs. I haven't
had time to look at what Martin is doing, so he may prefer otherwise.

In any case, making it work from LTO could be done as a follow-up, no?

> I can also do the stored-location approach, but it would mean rewriting
> all the patches again, I think, would be less efficient.

Agreed, FWIW.

> I would prefer the on-demand approach.
>
> Who is empowered to make a decision here?

I thought you were the diagnostics maintainer ;-)

Cheers,

Manuel.
Jeff Law Aug. 3, 2016, 3:56 p.m. UTC | #12
On 07/29/2016 11:27 AM, David Malcolm wrote:
> On Fri, 2016-07-29 at 17:53 +0100, Manuel López-Ibáñez wrote:
>> On 29 July 2016 at 16:25, David Malcolm <dmalcolm@redhat.com> wrote:
>>>
>>> FWIW, it appears that clang uses the on-demand approach; the
>>> relevant
>>> code appears to be StringLiteral::getLocationOfByte:
>>> http://clang.llvm.org/doxygen/Expr_8cpp_source.html#l01008
>>
>> As far as I know, llvm doesn't do language diagnostics from the
>> middle-end/LTO. Thus, they do not have those problems.
>
> If you really want to have middle-end diagnostics from LTO, I can make
> the on-demand approach work.
>
> I can also do the stored-location approach, but it would mean rewriting
> all the patches again, I think, would be less efficient.
>
> I would prefer the on-demand approach.
>
> Who is empowered to make a decision here?
ISTM we've got a bit of a deadlock here with the two intertwined 
patches.  I'm wondering if we can move both forward, perhaps without the 
higher quality diagnostics for Martin's work initially.  Then iterate on 
what's in-tree to add the higher quality diagnostics, then figure out 
how to deal with some of the issues we have in the LTO space.

Martin's model of running early or late depending on flags is, IMHO, the 
right approach.  And more generally its a good solution for other 
problems in this space.  With that in mind, finding a way to get at the 
diagnostics framework from within the middle end and eventually LTO is, 
IMHO, important.

Given that the diagnostics are the uncommon case, I would strongly 
prefer an on-demand approach rather than recording a ton of stuff in the 
front-end for the unlikely case that we're going to want a diagnostic in 
the middle-end or LTO.

Jeff
diff mbox

Patch

From 09824cb27c0e817b29de1c7eb9b53c603116f13e Mon Sep 17 00:00:00 2001
From: David Malcolm <dmalcolm@redhat.com>
Date: Wed, 27 Jul 2016 10:33:52 -0400
Subject: [PATCH] Avoid including cpplib.h from input.h

gcc/c-family/ChangeLog:
	* c-common.c: Include "substring-locations.h".

gcc/ChangeLog:
	* input.h: Don't include cpplib.h.
	(get_source_range_for_substring): Move to...
	* substring-locations.h: New header.
---
 gcc/c-family/c-common.c   |  1 +
 gcc/input.h               |  8 --------
 gcc/substring-locations.h | 30 ++++++++++++++++++++++++++++++
 3 files changed, 31 insertions(+), 8 deletions(-)
 create mode 100644 gcc/substring-locations.h

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index f4ffc0e..c4843db 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -45,6 +45,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-iterator.h"
 #include "opts.h"
 #include "gimplify.h"
+#include "substring-locations.h"
 
 cpp_reader *parse_in;		/* Declared in c-pragma.h.  */
 
diff --git a/gcc/input.h b/gcc/input.h
index 24d9115..c17e440 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -22,7 +22,6 @@  along with GCC; see the file COPYING3.  If not see
 #define GCC_INPUT_H
 
 #include "line-map.h"
-#include <cpplib.h>
 
 extern GTY(()) struct line_maps *line_table;
 
@@ -131,11 +130,4 @@  class GTY(()) string_concat_db
   hash_map <location_hash, string_concat *> *m_table;
 };
 
-extern const char *get_source_range_for_substring (cpp_reader *pfile,
-						   string_concat_db *concats,
-						   location_t strloc,
-						   enum cpp_ttype type,
-						   int start_idx, int end_idx,
-						   source_range *out_range);
-
 #endif
diff --git a/gcc/substring-locations.h b/gcc/substring-locations.h
new file mode 100644
index 0000000..274ebbe
--- /dev/null
+++ b/gcc/substring-locations.h
@@ -0,0 +1,30 @@ 
+/* Source locations within string literals.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SUBSTRING_LOCATIONS_H
+#define GCC_SUBSTRING_LOCATIONS_H
+
+extern const char *get_source_range_for_substring (cpp_reader *pfile,
+						   string_concat_db *concats,
+						   location_t strloc,
+						   enum cpp_ttype type,
+						   int start_idx, int end_idx,
+						   source_range *out_range);
+
+#endif /* ! GCC_SUBSTRING_LOCATIONS_H */
-- 
1.8.5.3