diff mbox series

varasm.c: Always output flags in merged .section for LLVM assembler compatibility [PR97827]

Message ID e5299514-9286-67a1-2050-2cac5c12cb37@codesourcery.com
State New
Headers show
Series varasm.c: Always output flags in merged .section for LLVM assembler compatibility [PR97827] | expand

Commit Message

Tobias Burnus Nov. 18, 2020, 11:51 a.m. UTC
As noted by Matthias when bootstrapping with AMD GCN support [PR97827]:
Assembler source code generated by GCC might no longer assembly with
LLVM's 'mc' since LLVM 11.

The reason is that GCC generates on purpose first the section with
the flags, e.g. (via mergeable_constant_section)
    .section        .rodata.cst8,"aM",@progbits,8
and then for subsequent uses, it does not repeat the flags:
    .section        .rodata.cst8

GNU assembler warns (and with as >=2.35 gives an error) if the flags
do not match, but not if the attributes/flags are left in the other
same-named sections (as above) – just if they are specified and different.

LLVM since February (in git) and released with LLVM 11 (12 Oct 2020)
does a similar check – but without the no-error-if-no-flag exception:
   strtod.s:4472:2: error: changed section flags for .rodata.cst8, expected: 0x12
   strtod.s:4472:2: error: changed section entsize for .rodata.cst8, expected: 8


The solution done by the attached patch is to emit the full flags also
for SECTION_MERGE.

Side note: For AMD GCN, we rely on LLVM as "GNU as" does not handle
this target, yet; still, also in general, it makes sense to be
compatible with llvm-mc.

OK?

Tobias

References:
- GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97827
   with some analysis and quotes from LLVM MC and GNU AS
- Filled https://bugs.llvm.org/show_bug.cgi?id=48201 about this
   issue but no one commented on that issue so far
- Added to LLVM with commit https://reviews.llvm.org/D73999
- Some alignment was tried with GNU as (see all three links
   above) but the only "if (attr != 0)" then do check of
   gas/config/obj-elf.c was missed when implementing the
     if (Section->getType() != Type)
   in LLVM MC.

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

Comments

Jakub Jelinek Nov. 18, 2020, 12:02 p.m. UTC | #1
On Wed, Nov 18, 2020 at 12:51:02PM +0100, Tobias Burnus wrote:
> As noted by Matthias when bootstrapping with AMD GCN support [PR97827]:
> Assembler source code generated by GCC might no longer assembly with
> LLVM's 'mc' since LLVM 11.
> 
> The reason is that GCC generates on purpose first the section with
> the flags, e.g. (via mergeable_constant_section)
>    .section        .rodata.cst8,"aM",@progbits,8
> and then for subsequent uses, it does not repeat the flags:
>    .section        .rodata.cst8
> 
> GNU assembler warns (and with as >=2.35 gives an error) if the flags
> do not match, but not if the attributes/flags are left in the other
> same-named sections (as above) – just if they are specified and different.
> 
> LLVM since February (in git) and released with LLVM 11 (12 Oct 2020)
> does a similar check – but without the no-error-if-no-flag exception:
>   strtod.s:4472:2: error: changed section flags for .rodata.cst8, expected: 0x12
>   strtod.s:4472:2: error: changed section entsize for .rodata.cst8, expected: 8
> 
> 
> The solution done by the attached patch is to emit the full flags also
> for SECTION_MERGE.
> 
> Side note: For AMD GCN, we rely on LLVM as "GNU as" does not handle
> this target, yet; still, also in general, it makes sense to be
> compatible with llvm-mc.
> 
> OK?

I think we shouldn't do this except when targetting the (buggy) llvm
assembler.
Specifying section flags just on first .section directive and not others
is correct, there is no point repeating that and GNU as (but I think many
other assemblers) has been supporting it that way forever.
The only time one needs to specify the section flags again is for comdat
sections because then the section name is not unique, one needs section name
and comdat pair...

	Jakub
Richard Biener Nov. 18, 2020, 2:33 p.m. UTC | #2
On Wed, Nov 18, 2020 at 1:04 PM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Wed, Nov 18, 2020 at 12:51:02PM +0100, Tobias Burnus wrote:
> > As noted by Matthias when bootstrapping with AMD GCN support [PR97827]:
> > Assembler source code generated by GCC might no longer assembly with
> > LLVM's 'mc' since LLVM 11.
> >
> > The reason is that GCC generates on purpose first the section with
> > the flags, e.g. (via mergeable_constant_section)
> >    .section        .rodata.cst8,"aM",@progbits,8
> > and then for subsequent uses, it does not repeat the flags:
> >    .section        .rodata.cst8
> >
> > GNU assembler warns (and with as >=2.35 gives an error) if the flags
> > do not match, but not if the attributes/flags are left in the other
> > same-named sections (as above) – just if they are specified and different.
> >
> > LLVM since February (in git) and released with LLVM 11 (12 Oct 2020)
> > does a similar check – but without the no-error-if-no-flag exception:
> >   strtod.s:4472:2: error: changed section flags for .rodata.cst8, expected: 0x12
> >   strtod.s:4472:2: error: changed section entsize for .rodata.cst8, expected: 8
> >
> >
> > The solution done by the attached patch is to emit the full flags also
> > for SECTION_MERGE.
> >
> > Side note: For AMD GCN, we rely on LLVM as "GNU as" does not handle
> > this target, yet; still, also in general, it makes sense to be
> > compatible with llvm-mc.
> >
> > OK?
>
> I think we shouldn't do this except when targetting the (buggy) llvm
> assembler.
> Specifying section flags just on first .section directive and not others
> is correct, there is no point repeating that and GNU as (but I think many
> other assemblers) has been supporting it that way forever.
> The only time one needs to specify the section flags again is for comdat
> sections because then the section name is not unique, one needs section name
> and comdat pair...

The branch might need similar updates.

Richard.

>
>         Jakub
>
Richard Sandiford Nov. 19, 2020, 9:51 a.m. UTC | #3
Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Wed, Nov 18, 2020 at 12:51:02PM +0100, Tobias Burnus wrote:
>> As noted by Matthias when bootstrapping with AMD GCN support [PR97827]:
>> Assembler source code generated by GCC might no longer assembly with
>> LLVM's 'mc' since LLVM 11.
>> 
>> The reason is that GCC generates on purpose first the section with
>> the flags, e.g. (via mergeable_constant_section)
>>    .section        .rodata.cst8,"aM",@progbits,8
>> and then for subsequent uses, it does not repeat the flags:
>>    .section        .rodata.cst8
>> 
>> GNU assembler warns (and with as >=2.35 gives an error) if the flags
>> do not match, but not if the attributes/flags are left in the other
>> same-named sections (as above) – just if they are specified and different.
>> 
>> LLVM since February (in git) and released with LLVM 11 (12 Oct 2020)
>> does a similar check – but without the no-error-if-no-flag exception:
>>   strtod.s:4472:2: error: changed section flags for .rodata.cst8, expected: 0x12
>>   strtod.s:4472:2: error: changed section entsize for .rodata.cst8, expected: 8
>> 
>> 
>> The solution done by the attached patch is to emit the full flags also
>> for SECTION_MERGE.
>> 
>> Side note: For AMD GCN, we rely on LLVM as "GNU as" does not handle
>> this target, yet; still, also in general, it makes sense to be
>> compatible with llvm-mc.
>> 
>> OK?
>
> I think we shouldn't do this except when targetting the (buggy) llvm
> assembler.
> Specifying section flags just on first .section directive and not others
> is correct, there is no point repeating that and GNU as (but I think many
> other assemblers) has been supporting it that way forever.

But are there any negative effects with specifying the flags multiple
times for GNU as?  If not, then it seems simpler to generate the form
that “all” assemblers accept.

Richard
Jakub Jelinek Nov. 19, 2020, 9:54 a.m. UTC | #4
On Thu, Nov 19, 2020 at 09:51:41AM +0000, Richard Sandiford wrote:
> > I think we shouldn't do this except when targetting the (buggy) llvm
> > assembler.
> > Specifying section flags just on first .section directive and not others
> > is correct, there is no point repeating that and GNU as (but I think many
> > other assemblers) has been supporting it that way forever.
> 
> But are there any negative effects with specifying the flags multiple
> times for GNU as?  If not, then it seems simpler to generate the form
> that “all” assemblers accept.

It makes the assembler files unnecessarily larger and harder to read.

	Jakub
Richard Sandiford Nov. 19, 2020, 10:12 a.m. UTC | #5
Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Thu, Nov 19, 2020 at 09:51:41AM +0000, Richard Sandiford wrote:
>> > I think we shouldn't do this except when targetting the (buggy) llvm
>> > assembler.
>> > Specifying section flags just on first .section directive and not others
>> > is correct, there is no point repeating that and GNU as (but I think many
>> > other assemblers) has been supporting it that way forever.
>> 
>> But are there any negative effects with specifying the flags multiple
>> times for GNU as?  If not, then it seems simpler to generate the form
>> that “all” assemblers accept.
>
> It makes the assembler files unnecessarily larger and harder to read.

It certainly makes them larger, but surely not by an amount that would
cause anyone difficulty in practice.  Not sure about harder to read
though.  Personally I've never found reading code that happens to be
after the first switch to a section any harder to read than subsequent
switches to the section.

The problem with switching based on what assembler we think people
are using is that we don't always know.  It can often be useful to
try something with a different tool for comparison purposes, and that
includes using llvm-mc instead of gas.  (Certainly done that myself
a few times, and I know others have.)

I'm not saying we should bend over backwards to support difficult
quirks.  But here we're talking about a choice between (a) doing
something that works “everywhere” unconditionally (and keeping things
simple) vs. (b) having both code that takes a shortcut and code that
doesn't take a shortcut and trying to predict which one we should do.

Thanks,
Richard
Tobias Burnus Dec. 14, 2020, 6:44 p.m. UTC | #6
Regarding this discussion, see also the comments from an LLVM reviewer
at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97827 * — to keep them
at one place, I won't quote the discussion here.

It would be great if this could be solved one way or the other...

Tobias

*  for the LLVM side, see also https://reviews.llvm.org/D92052

On 19.11.20 11:12, Richard Sandiford wrote:
> Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On Thu, Nov 19, 2020 at 09:51:41AM +0000, Richard Sandiford wrote:
>>>> I think we shouldn't do this except when targetting the (buggy) llvm
>>>> assembler.
>>>> Specifying section flags just on first .section directive and not others
>>>> is correct, there is no point repeating that and GNU as (but I think many
>>>> other assemblers) has been supporting it that way forever.
>>> But are there any negative effects with specifying the flags multiple
>>> times for GNU as?  If not, then it seems simpler to generate the form
>>> that “all” assemblers accept.
>> It makes the assembler files unnecessarily larger and harder to read.
> It certainly makes them larger, but surely not by an amount that would
> cause anyone difficulty in practice.  Not sure about harder to read
> though.  Personally I've never found reading code that happens to be
> after the first switch to a section any harder to read than subsequent
> switches to the section.
>
> The problem with switching based on what assembler we think people
> are using is that we don't always know.  It can often be useful to
> try something with a different tool for comparison purposes, and that
> includes using llvm-mc instead of gas.  (Certainly done that myself
> a few times, and I know others have.)
>
> I'm not saying we should bend over backwards to support difficult
> quirks.  But here we're talking about a choice between (a) doing
> something that works “everywhere” unconditionally (and keeping things
> simple) vs. (b) having both code that takes a shortcut and code that
> doesn't take a shortcut and trying to predict which one we should do.
>
> Thanks,
> Richard
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Jakub Jelinek Dec. 14, 2020, 7:02 p.m. UTC | #7
On Mon, Dec 14, 2020 at 07:44:12PM +0100, Tobias Burnus wrote:
> Regarding this discussion, see also the comments from an LLVM reviewer
> at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97827 * — to keep them
> at one place, I won't quote the discussion here.

I think the LLVM people must be completely ignoring hand-written assembler,
the behavior of GNU as is not something specific to GNU as, but to all other
assemblers GCC uses except for this recent LLVM as change, and has been like
that for decades, so I'm sure there is a lot of hand-written assembler that
relies on that.

So, I'm against adding workarounds for bad LLVM decisions.

	Jakub
diff mbox series

Patch

varasm.c: Always output flags in merged .section for LLVM assembler compatibility [PR97827]

For compatibility with LLVM 11's 'mc' assembler, the flags have to be
repeated every time. See also LLVM Bug 48201 for this issue and
https://reviews.llvm.org/D73999 for the patch causing the issue.

gcc/
	PR target/97827
	* varasm.c (default_elf_asm_named_section): Always output all
	flags if SECTION_MERGE, even if already declared before.

diff --git a/gcc/varasm.c b/gcc/varasm.c
index ada99940f65..51a507393a8 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6738,9 +6738,11 @@  default_elf_asm_named_section (const char *name, unsigned int flags,
   /* If we have already declared this section, we can use an
      abbreviated form to switch back to it -- unless this section is
      part of a COMDAT groups, in which case GAS requires the full
-     declaration every time.  */
+     declaration every time.  LLVM's MC linker requires that the
+     flags are identical, thus avoid the abbreviated form with MERGE.  */
   if (!(HAVE_COMDAT_GROUP && (flags & SECTION_LINKONCE))
-      && (flags & SECTION_DECLARED))
+      && (flags & SECTION_DECLARED)
+      && !(flags & SECTION_MERGE))
     {
       fprintf (asm_out_file, "\t.section\t%s\n", name);
       return;