[00/32] C++ 20 Modules

Message ID 94392c65-0a81-17c3-c5d3-f15a5e91dd79@acm.org

Message

Nathan Sidwell Nov. 3, 2020, 9:12 p.m. UTC
Here is the implementation of C++20 modules that I have been developing 
on the devel/c++-modules branch over the last few years.

It is not a complete implementation.  The major missing pieces are:

1) Private Module Fragment
   The syntax is recognized and a sorry is emitted.

2) Textually parsing a duplicate global module definition when a 
definition has already been read from a header-unit.  (The converse is 
supported.)

3) Complete type (in)visibility when provided in implementation 
partitions that are imported into the primary interface.  Users will see 
the type as complete.

4) Internal linkage reachability rules from exported entities.  We're 
likely to accept ill-formed programs.  This will not cause us to reject 
well-formed programs.

It is some 25K new lines of code (plus testsuite).  There are about 48 
FIXMEs, nearly all in module.cc and the remainder in name-lookup.c.  Of 
these, 12 are QOI comments.  The remaining 36 probably fall into the 
following categories:
1/3 are repeating a FIXME mentioned elsewhere
1/3 are already resolved, or have become irrelevant
1/3 are defects (an above missing feature, a QOI issue, or something else).

I believe there is time in stage 1 to address the most significant ones.

I have bootstrapped and tested on:
x86_64-linux
aarch64-linux
powerpc8le-linux
powerpc8-aix

Iain Sandoe has been regularly bootstrapping on x86_64-darwin.  Joseph 
Myers graciously built for i686-mingw host.  We eventually ran into 
compilation errors in the analyzer, as it seemed unprepared for an 
IL32P64 host.

I have attempted to break the patches apart into coherent pieces.  But 
they are somewhat interconnected.

01-langhooks.diff           New langhooks
02-cpp-line-maps.diff       line-map pieces
03-cpp-callbacks.diff       Preprocessor callbacks
04-cpp-lexer.diff           There are new lexing requirements
05-cpp-files.diff           ... and file reading functionality
06-cpp-macro.diff           ... and macro expansion rules
07-cpp-main.diff            Main file reading
08-cpp-mkdeps.diff          Dependency generation
09-core-diag.diff           Core diagnostics
10-core-config.diff         Autoconf
11-core-parmtime.diff       parameters and time instrumentation
12-core-doc.diff            User documentation
13-family-options.diff      New options
14-family-keywords.diff     New keyword
15-c++-lexer.diff           New C++ lexing
16-c++-infra.diff           C++ infrastructure interfaces
17-c++-infra-constexpr.diff new constexpr interfacing
18-c++-infra-template.diff  new template interfacing
19-global-trees.diff        Global tree ordering
20-c++-dynctor.diff         Dynamic constructor generation
21-core-rawbits.diff        Some core node bits
22-c++-otherbits.diff       Miscellaneous C++ changes
23-libcody.diff             Libcody
24-c++-mapper.diff          Module Mapper
25-c++-modules.diff         The Modules file
26-c++-name-lookup.diff     Name lookup changes
27-c++-parser.diff          Parser changes
28-c++-langhooks.diff       Lang hooks implementation
29-c++-make.diff            Makefile changes
30-test-harness.diff        Testharness changes
31-testsuite.diff           The testsuite
32-aix-fixincl.diff         AIX fixinclude

Nearly all of this is within the gcc/cp and libcpp/ directories.  There 
are a few changes to gcc/ and more changes in gcc/c-family/.  It is 
likely that this patchset will cause breakages, for which I apologize 
(please try the modules branch and report early).

My understanding is that a Global Maintainer's approval is needed for 
such a large patchset.  It'd be good to get this in as early in stage 3 
as possible (if stage 1 expires).

Definitely the most important event of today :)  But don't forget to vote.

nathan

Comments

Hans-Peter Nilsson Nov. 4, 2020, 3:14 a.m. UTC | #1
On Tue, 3 Nov 2020, Nathan Sidwell wrote:

> Here is the implementation of C++20 modules that I have been developing on the
> devel/c++-modules branch over the last few years.

Ow.

> I have bootstrapped and tested on:
> x86_64-linux
> aarch64-linux
> powerpc8le-linux
> powerpc8-aix
>
> Iain Sandoe has been regularly bootstrapping on x86_64-darwin.  Joseph Myers
> graciously built for i686-mingw host.  We eventually ran into compilation
> errors in the analyzer, as it seemed unprepared for an IL32P64 host.

(So not actually tested there.)

Are any of the powerpc targets you tested ILP32, such that the
patchset is completely tested for such a target?

brgds, H-P
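
[Editor's note: the data-model names used above (ILP32, IL32P64) refer to
the widths of int, long and pointers.  The following is only a minimal
sketch for checking which model a given host compiler uses; the function
name data_model is hypothetical and not part of the patchset.]

```cpp
#include <string>

// Classify the host data model from the sizes of int, long and void*.
// Only the models mentioned in this thread (plus LP64) are named.
std::string data_model()
{
    constexpr auto i = sizeof(int);
    constexpr auto l = sizeof(long);
    constexpr auto p = sizeof(void*);

    if (i == 4 && l == 4 && p == 4) return "ILP32";    // e.g. i686-linux
    if (i == 4 && l == 4 && p == 8) return "IL32P64";  // also called LLP64
    if (i == 4 && l == 8 && p == 8) return "LP64";     // e.g. x86_64-linux
    return "other";
}
```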
Nathan Sidwell Nov. 4, 2020, 12:30 p.m. UTC | #2
On 11/3/20 10:14 PM, Hans-Peter Nilsson wrote:
> On Tue, 3 Nov 2020, Nathan Sidwell wrote:

>> I have bootstrapped and tested on:
>> x86_64-linux
>> aarch64-linux
>> powerpc8le-linux
>> powerpc8-aix
>>
>> Iain Sandoe has been regularly bootstrapping on x86_64-darwin.  Joseph Myers
>> graciously built for i686-mingw host.  We eventually ran into compilation
>> errors in the analyzer, as it seemed unprepared for an IL32P64 host.
> 
> (So not actually tested there.)
> 
> Are any of the powerpc targets you tested ILP32, such that the
> patchset is completely tested for such a target?

No.  I tried building on one of the compile farm mips machines but it 
ran out of memory compiling some of the generated expanders (or something).

rechecking the compile-farm page, I see gcc45 is a 686 machine, I'll try 
that.

nathan
Nathan Sidwell Nov. 4, 2020, 1:50 p.m. UTC | #3
On 11/4/20 7:30 AM, Nathan Sidwell wrote:

> rechecking the compile-farm page, I see gcc45 is a 686 machine, I'll try 
> that.

Yeah, that didn't work.  There are compilation errors in
../../../src/gcc/config/i386/x86-tune-costs.h about missing 
initializers.  And then ...

In file included from 
/usr/lib/gcc/i586-linux-gnu/4.9/include/xmmintrin.h:34:0,
                  from 
/usr/lib/gcc/i586-linux-gnu/4.9/include/x86intrin.h:31,
                  from 
/usr/include/i386-linux-gnu/c++/4.9/bits/opt_random.h:33,
                  from /usr/include/c++/4.9/random:50,
                  from /usr/include/c++/4.9/bits/stl_algo.h:66,
                  from /usr/include/c++/4.9/algorithm:62,
                  from ../../../src/gcc/cp/mapper-resolver.cc:26:
./mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
      return malloc (__size);
             ^
Makefile:1127: recipe for target 'cp/mapper-resolver.o' failed

it's a little unfortunate we can't use the standard library :(  I'll see 
what I can do about avoiding algorithm.

nathan
Jason Merrill Nov. 4, 2020, 2:15 p.m. UTC | #4
On Wed, Nov 4, 2020 at 8:50 AM Nathan Sidwell <nathan@acm.org> wrote:

> On 11/4/20 7:30 AM, Nathan Sidwell wrote:
>
> > rechecking the compile-farm page, I see gcc45 is a 686 machine, I'll try
> > that.
>
> yeah, that didn't work.  There's compilation errors in
> ../../../src/gcc/config/i386/x86-tune-costs.h about missing
> initializers.  and then ...
>
> In file included from
> /usr/lib/gcc/i586-linux-gnu/4.9/include/xmmintrin.h:34:0,
>                   from
> /usr/lib/gcc/i586-linux-gnu/4.9/include/x86intrin.h:31,
>                   from
> /usr/include/i386-linux-gnu/c++/4.9/bits/opt_random.h:33,
>                   from /usr/include/c++/4.9/random:50,
>                   from /usr/include/c++/4.9/bits/stl_algo.h:66,
>                   from /usr/include/c++/4.9/algorithm:62,
>                   from ../../../src/gcc/cp/mapper-resolver.cc:26:
> ./mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
>       return malloc (__size);
>              ^
> Makefile:1127: recipe for target 'cp/mapper-resolver.o' failed
>
> it's a little unfortunate we can't use the standard library :(  I'll see
> what I can do about avoiding algorithm.
>

We can; apparently the necessary incantation is to

#define INCLUDE_ALGORITHM

before

#include "system.h"

Jason
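
[Editor's note: a sketch of why the ordering matters, assuming a
simplified stand-in for system.h.  GCC's real system.h is much larger;
the point illustrated here is only that standard headers have to be
included before identifiers such as malloc are poisoned.  The function
smallest_of is a hypothetical example and not from the patchset.]

```cpp
// Request <algorithm> before the stand-in "system.h" section below.
#define INCLUDE_ALGORITHM

// ---- begin simplified stand-in for #include "system.h" ----
#include <cstdlib>
#ifdef INCLUDE_ALGORITHM
#include <algorithm>   // pulled in while malloc may still be mentioned
#endif
#if defined(__GNUC__)
// From here on, any mention of these identifiers is a hard error.
#pragma GCC poison malloc realloc
#endif
// ---- end stand-in ----

// Client code: std::sort is usable because <algorithm> was included
// before the poisoning took effect.
int smallest_of(int a, int b, int c)
{
    int v[] = {a, b, c};
    std::sort(v, v + 3);
    return v[0];
}
```

Including <algorithm> directly after the poisoning is what produced the
"attempt to use poisoned" error quoted above.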
Nathan Sidwell Nov. 4, 2020, 3:06 p.m. UTC | #5
On 11/4/20 9:15 AM, Jason Merrill wrote:
> On Wed, Nov 4, 2020 at 8:50 AM Nathan Sidwell <nathan@acm.org> wrote:
>
> 
> We can; apparently the necessary incantation is to
> 
> #define INCLUDE_ALGORITHM

Thanks, that's fixed the build problem.  And working around the i386 
error, I get a working toolchain.  The modules tests all pass except a 
trivial one detecting that va_list looks different.  I must have messed 
up the target check there.

So i686-linux is now known good.

nathan
Boris Kolpackov Nov. 5, 2020, 7:08 a.m. UTC | #6
Nathan Sidwell <nathan@acm.org> writes:

> Here is the implementation of C++20 modules that I have been developing 
> on the devel/c++-modules branch over the last few years.

Congrats on reaching this point.


> It is not a complete implementation.  The major missing pieces are:
> 
> [...]

Building C++20 modules requires non-trivial integration between the
compiler and the build system. This patch set introduces a module
mapper, a novel mechanism for such integration. Has it been tried
by any non-toy build system and on any real project?

If the answer is "no", then by shipping modules in GCC 11 are we
making any likely changes in this area impossible or unnecessarily
difficult?

To give an example of such a likely change, currently the mapper
has a notion of the central module repository directory that is
used to resolve all the relative CMI (compiled module interface[1])
paths (even paths like ./foo.gcm). However, this model will not
apply to all build systems. For example, in build2 (the build
system I am involved with), there can be no such central place
since a project can pull dependencies that are built in other
places. Currently, the only way to disable this repository
semantics is to use absolute CMI paths throughout.

Also, FWIW, I attempted such a build system integration with
build2 back in 2019. While the overall idea of the module mapper
worked well, I had to make substantial extensions in my own
branch[2] of Nathan's c++-modules (also described in this[3]
WG21 paper). AFAIK, these extensions haven't yet been considered
for merging into c++-modules.

[1] BTW, SG15 seems to have settled on the BMI (built module
    interface) term instead of CMI:

    https://github.com/cplusplus/modules-ecosystem-tr/blob/master/definitions.tex

[2] https://github.com/boris-kolpackov/gcc-cxx-modules-ex

    The branch used to live on gcc.gnu.org/git but was dropped as
    part of the svn-to-git migration.

[3] https://wg21.link/P1842
Richard Biener Nov. 5, 2020, 1:33 p.m. UTC | #7
On Tue, Nov 3, 2020 at 10:12 PM Nathan Sidwell <nathan@acm.org> wrote:
>
> Here is the implementation of C++20 modules that I have been developing
> on the devel/c++-modules branch over the last few years.
>
> It is not a complete implementation.  The major missing pieces are:
>
> 1) Private Module Fragment
>    The syntax is recognized and a sorry emitted
>
> 2) textually parsing a duplicate global module definition when a
> definition has already been read from a header-unit.  (the converse is
> supported)
>
> 3) Complete type (in)visibility when provided in implementation
> partitions that are imported into the primary interface.  Users will see
> the type as complete.
>
> 4) Internal linkage reachability rules from exported entities.  We're
> likely to accept ill-formed programs.  This will not cause us to reject
> well-formed programs.
>
> It is some 25K new lines of code (plus testsuite).  There are about 48
> FIXMEs, nearly all in module.cc and the remaining in name-lookup.c. Of
> these 12 are QOI comments.  The remaining 36 probably fall into the
> following categories:
> 1/3 are repeating a FIXME mentioned elsewhere
> 1/3 are already resolved, or have become irrelevant
> 1/3 are defects (an above missing feature, a QOI issue, or something else).
>
> I believe there is time in stage 1 to address the most significant ones.
>
> I have bootstrapped and tested on:
> x86_64-linux
> aarch64-linux
> powerpc8le-linux
> powerpc8-aix
>
> Iain Sandoe has been regularly bootstrapping on x86_64-darwin.  Joseph
> Myers graciously built for i686-mingw host.  We eventually ran into
> compilation errors in the analyzer, as it seemed unprepared for an
> IL32P64 host.
>
> I have attempted to break the patches apart into coherent pieces.  But
> they are somewhat interconnected.
>
> 01-langhooks.diff       New langhooks
> 02-cpp-line-maps.diff   line-map pieces
> 03-cpp-callbacks.diff   Preprocessor callbacks
> 04-cpp-lexer.diff       There are new lexing requirements
> 05-cpp-files.diff       ... and file reading functionality
> 06-cpp-macro.diff       ... and macro expansion rules
> 07-cpp-main.diff        Main file reading
> 08-cpp-mkdeps.diff      Dependency generation
> 09-core-diag.diff       Core diagnostics
> 10-core-config.diff     Autoconf
> 11-core-parmtime.diff   parameters and time instrumentation
> 12-core-doc.diff        User documentation
> 13-family-options.diff  New options
> 14-family-keywords.diff New keyword
> 15-c++-lexer.diff       New C++ lexing
> 16-c++-infra.diff       C++ infrastructure interfaces
> 17-c++-infra-constexpr.diff new constexpr interfacing
> 18-c++-infra-template.diff  new template interfacing
> 19-global-trees.diff    Global tree ordering
> 20-c++-dynctor.diff     Dynamic constructor generation
> 21-core-rawbits.diff    Some core node bits
> 22-c++-otherbits.diff   Miscellaneous C++ changes
> 23-libcody.diff         Libcody
> 24-c++-mapper.diff      Module Mapper
> 25-c++-modules.diff     The Modules file
> 26-c++-name-lookup.diff Name lookup changes
> 27-c++-parser.diff      Parser changes
> 28-c++-langhooks.diff   Lang hooks implementation
> 29-c++-make.diff        Makefile changes
> 30-test-harness.diff    Testharness changes
> 31-testsuite.diff       The testsuite
> 32-aix-fixincl.diff     AIX fixinclude
>
> Nearly all of this is within gcc/cp and libcpp/ directories.  There are
> a few changes to gcc/ and more changes in gcc/c-family/  It is likely
> that this patchset will cause breakages, for that I apologize (please
> try the modules branch and report early).
>
> My understanding is that a Global Maintainer's approval is needed for
> such a large patchset.  It's be good to get this in as early in stage 3
> as possible (if stage 1 expires).

From an RM perspective this is OK if merging doesn't drag itself out
too long.  Expect build & install fallout from the more weird hosts &
targets we have though.

Moving the module mapper to a more easily (build-)testable location
and to a place where host dependences can be more easily fixed
& customized than in a bootstrapped directory would be nice.  Thus,
I think the module mapper should be in the toplevel somehow
and independently buildable.

Richard.

> Definitely the most important event of today :)  But don't forget to vote.
>
> nathan
>
> --
> Nathan Sidwell
David Malcolm Nov. 5, 2020, 2:30 p.m. UTC | #8
On Tue, 2020-11-03 at 16:12 -0500, Nathan Sidwell wrote:

[...]

[CCing Joseph]

> I have bootstrapped and tested on:
> x86_64-linux
> aarch64-linux
> powerpc8le-linux
> powerpc8-aix
> 
> Iain Sandoe has been regularly bootstrapping on x86_64-darwin.  Joseph 
> Myers graciously built for i686-mingw host.  We eventually ran into 
> compilation errors in the analyzer, as it seemed unprepared for an 
> IL32P64 host.

Sorry about the issues with the analyzer with IL32P64 hosts.  I pushed
Markus Böck's fix for PR 96608 to master on 2020-10-27 as
942086bf73ee2ba6cfd7fdacc552940048437a6e.

Is anyone still seeing build issues with the analyzer?  (and is there a
machine in the compile farm I can test them out on?).

Dave
Nathan Sidwell Nov. 5, 2020, 3:25 p.m. UTC | #9
On 11/5/20 8:33 AM, Richard Biener wrote:

> Moving the module mapper to a more easily (build-)testable location
> and to a place where host dependences can be more easily fixed
> & customized than in a bootstrapped directory would be nice.  Thus,
> I think the module mapper should be in the toplevel somehow
> and independently buildable.

Ok, that makes sense.  It is where it is, because originally it was much 
more tightly coupled with cc1plus.

The mapper-server and cc1plus do share some (maybe just one?) obj files. 
The in-process resolving and the server's default have the same 
functionality.

For bootstrap cc1plus needs them, so I guess they should remain in 
gcc/cp/?  The alternative would be to put them in a new mapper-server 
dir and have it provide some kind of library that cc1plus could link 
with.  However that'll probably mess up bootstrap.

Having a --with-module-mapper configure option seems sensible.

nathan
Nathan Sidwell Nov. 5, 2020, 5:17 p.m. UTC | #10
On 11/5/20 2:08 AM, Boris Kolpackov wrote:

> 
> To give an example of such a likely change, currently the mapper
> has a notion of the central module repository directory that is
> used to resolve all the relative CMI (compiled module interface[1])
> paths (even paths like ./foo.gcm). However, this model will not
> apply to all build systems. For example, in build2 (the build
> system I am involved with), there can be no such central place
> since a project can pull dependencies that are built in other
> places. Currently, the only way to disable this repository
> semantics is to use absolute CMI paths throughout.

The repo is providing a mechanism by which two processes can synchronize 
on a fixed location in the file system that is not /.  You need such a 
capability as the file system is the bulk transfer mechanism.

The alternatives are to always use absolute paths, or require the two 
ends of the communication to have the same working directory, or have 
one end of the communication map file system locations into the other 
end's view.  That'll probably require knowing some fixed point, which 
you have to figure out how to synchronize, and we're back to defining 
more fixed points in the file system.

The location of the repo is entirely under the mapper-server's control. 
Set it to / if you want.

nathan
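
[Editor's note: a minimal sketch of the path rule discussed above, as
described in this thread: relative CMI paths are resolved against the
repository directory, absolute ones are used unchanged.  The function
resolve_cmi and the example paths are hypothetical; this is not the
mapper's actual code.]

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Resolve a CMI path the way the thread describes the repo semantics:
// an absolute path bypasses the repo, a relative one (even "./foo.gcm")
// is taken relative to the repo directory rather than the CWD.
fs::path resolve_cmi(const fs::path& repo, const fs::path& cmi)
{
    if (cmi.is_absolute())
        return cmi.lexically_normal();
    return (repo / cmi).lexically_normal();
}
```

With repo set to "/", a relative name such as foo.gcm resolves to
/foo.gcm, which is the behaviour objected to in the follow-up below.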
Richard Biener Nov. 5, 2020, 6:45 p.m. UTC | #11
On November 5, 2020 4:25:23 PM GMT+01:00, Nathan Sidwell <nathan@acm.org> wrote:
>On 11/5/20 8:33 AM, Richard Biener wrote:
>
>> Moving the module mapper to a more easily (build-)testable location
>> and to a place where host dependences can be more easily fixed
>> & customized than in a bootstrapped directory would be nice.  Thus,
>> I think the module mapper should be in the toplevel somehow
>> and independently buildable.
>
>Ok, that makes sense.  It is where it is, because originally it was
>much more tightly coupled with cc1plus.
>
>The mapper-server and cc1plus do share some (maybe just one?) obj
>files.  The in-process resolving and the server's default have the
>same functionality.
>
>For bootstrap cc1plus needs them, so I guess they should remain in 
>gcc/cp/?  The alternative would be to put them in new mapper-server dir

I guess some file you can include from the mapper dir (and thus build twice) would work?  I'm not suggesting another static library, though maybe libiberty if the thing is remotely generic.

>and have it provide somekind of library that cc1plus could link with. 
>However that'll probably mess up bootstrap.
>
>Having a --with-module-mapper configure option seems sensible.
>
>nathan
Boris Kolpackov Nov. 6, 2020, 2:26 p.m. UTC | #12
Nathan Sidwell <nathan@acm.org> writes:

> The repo is providing a mechanism by which two processes can synchronize 
> on a fixed location in the file system that is not /.  You need such a 
> capability as the file system is the bulk transfer mechanism.
> 
> The alternatives are to always use absolute paths, or require the two 
> ends of the communication to have the same working directory [...]

Isn't the latter pretty much the norm for a build system that spawns
the compiler?


> The location of the repo is entirely under the mapper-server's control. 
> Set it to / if you want.

Except that now all my relative paths are relative to / and not CWD.

I find the current semantics heavily skewed towards the mapper operating
outside the build system (like the builtin mapper) while I expect most
non-toy/legacy build systems that wish to support C++ modules to have
an integrated mapper (build2 certainly does it this way). I think there
should at least be a way for the mapper to opt out of this repository
functionality.


Also, you mentioning synchronization reminded me of this part from
Invoking GCC/C++ Modules:

> When creating an output CMI any missing directory components are
> created in a manner that is safe for concurrent builds creating
> multiple, different, CMIs within a common subdirectory tree.
>
> CMI contents are written to a temporary file, which is then atomically
> renamed.  Observers will either see old contents (if there is an
> existing file), or complete new contents.  They will not observe the CMI
> during its creation.

This works atomically on POSIX but not on Windows. Also, from experience,
on Windows creating a temporary file and then renaming it often causes
more problems than creating it in the final destination from the outset.
That's because on Windows you cannot (re)move a file that is open by
another process. And there are various processes on Windows (anti-virus/
malware, indexers, IDEs, etc) that routinely scan the filesystem.
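
[Editor's note: the write-then-rename scheme quoted from the manual can
be sketched as below.  This is an illustration under assumptions, not
GCC's implementation: the function name write_cmi_atomically and the
".tmp" suffix are hypothetical, and truly concurrent writers of the same
CMI would need unique temporary names.]

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Write contents to a temporary file next to dest, then rename it into
// place.  On POSIX the rename replaces dest atomically, so readers see
// either the old file or the complete new one.
bool write_cmi_atomically(const fs::path& dest, const std::string& contents)
{
    std::error_code ec;

    // Create any missing directory components first.
    if (!dest.parent_path().empty()) {
        fs::create_directories(dest.parent_path(), ec);
        if (ec) return false;
    }

    fs::path tmp = dest;
    tmp += ".tmp";   // hypothetical temporary name
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << contents;
        out.flush();
        if (!out) return false;
    }   // stream closed here, before the rename

    fs::rename(tmp, dest, ec);
    if (ec) {
        // This is the step that may fail on Windows while another
        // process holds the file open, as described above.
        std::error_code ignored;
        fs::remove(tmp, ignored);
        return false;
    }
    return true;
}
```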
Mike Stump Nov. 13, 2020, 9:55 p.m. UTC | #13
On Nov 3, 2020, at 1:12 PM, Nathan Sidwell <nathan@acm.org> wrote:
> 
> Here is the implementation of C++20 modules that I have been developing on the devel/c++-modules branch over the last few years.

I was just recently wondering about this.  Congratulations.

> It is some 25K new lines of code (plus testsuite).

> Definitely the most important event of today :)

I agree.

> don't forget to vote.

I vote yes; although, I didn't know we had switched to voting patches in.

:-)
Boris Kolpackov Nov. 16, 2020, 8:50 a.m. UTC | #14
Nathan Sidwell <nathan@acm.org> writes:

> It is not a complete implementation.  The major missing pieces are: [...]

Would now be a good time to start reporting bugs in bugzilla so that
things don't fall through the cracks? If so, would it make sense to
add a "c++ modules" component to bugzilla?