Patchwork PPC: Fix linker scripts on ppc hosts

login
register
mail settings
Submitter Alexander Graf
Date Dec. 12, 2011, 9:36 p.m.
Message ID <1323725761-5629-1-git-send-email-agraf@suse.de>
Download mbox | patch
Permalink /patch/130932/
State New
Headers show

Comments

Alexander Graf - Dec. 12, 2011, 9:36 p.m.
When compiling qemu statically with multilib on PPC, we hit the
same issue that commit 845f2c2812d9ed24b36c02a3d06ee83aeafe8b49
is fixing. Do the same here.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 ppc.ld   |   16 ++++++++++++++--
 ppc64.ld |   16 ++++++++++++++--
 2 files changed, 28 insertions(+), 4 deletions(-)
Richard Henderson - Dec. 13, 2011, 12:55 a.m.
On 12/12/2011 01:36 PM, Alexander Graf wrote:
> When compiling qemu statically with multilib on PPC, we hit the
> same issue that commit 845f2c2812d9ed24b36c02a3d06ee83aeafe8b49
> is fixing. Do the same here.

How many of these ld files can we get rid of if we use -Ttext-segment instead?
Generally all we're really caring about is moving the program base around so
that it doesn't conflict with the address space we want to use for the client.


r~
Alexander Graf - Dec. 13, 2011, 1:05 a.m.
On 13.12.2011, at 01:55, Richard Henderson wrote:

> On 12/12/2011 01:36 PM, Alexander Graf wrote:
>> When compiling qemu statically with multilib on PPC, we hit the
>> same issue that commit 845f2c2812d9ed24b36c02a3d06ee83aeafe8b49
>> is fixing. Do the same here.
> 
> How many of these ld files can we get rid of if we use -Ttext-segment instead?
> Generally all we're really caring about is moving the program base around so
> that it doesn't conflict with the address space we want to use for the client.

I tried to play with that as well but couldn't get it to work. If you do, I'd be more than happy to get rid of them :)


Alex
Paul Brook - Dec. 13, 2011, 6:19 a.m.
> > When compiling qemu statically with multilib on PPC, we hit the
> > same issue that commit 845f2c2812d9ed24b36c02a3d06ee83aeafe8b49
> > is fixing. Do the same here.
> 
> How many of these ld files can we get rid of if we use -Ttext-segment
> instead? Generally all we're really caring about is moving the program
> base around so that it doesn't conflict with the address space we want to
> use for the client.

Now that we have the automatic GUEST_BASE stuff you shouldn't need to do 
either.

Paul
Alexander Graf - Dec. 13, 2011, 7:45 a.m.
On 13.12.2011, at 07:19, Paul Brook <paul@codesourcery.com> wrote:

>>> When compiling qemu statically with multilib on PPC, we hit the
>>> same issue that commit 845f2c2812d9ed24b36c02a3d06ee83aeafe8b49
>>> is fixing. Do the same here.
>> 
>> How many of these ld files can we get rid of if we use -Ttext-segment
>> instead? Generally all we're really caring about is moving the program
>> base around so that it doesn't conflict with the address space we want to
>> use for the client.
> 
> Now that we have the automatic GUEST_BASE stuff you shouldn't need to do 
> either.

If it was working, yes :)

Alex
Peter Maydell - Dec. 13, 2011, 8:13 a.m.
On 13 December 2011 06:19, Paul Brook <paul@codesourcery.com> wrote:
>> > When compiling qemu statically with multilib on PPC, we hit the
>> > same issue that commit 845f2c2812d9ed24b36c02a3d06ee83aeafe8b49
>> > is fixing. Do the same here.
>>
>> How many of these ld files can we get rid of if we use -Ttext-segment
>> instead? Generally all we're really caring about is moving the program
>> base around so that it doesn't conflict with the address space we want to
>> use for the client.
>
> Now that we have the automatic GUEST_BASE stuff you shouldn't need to do
> either.

...which reminds me, are we ever going to add guest_base support to
the SPARC TCG targets? configure says that's the only one that doesn't
currently support it...

-- PMM
Paul Brook - Dec. 13, 2011, 4:31 p.m.
> On 13.12.2011, at 07:19, Paul Brook <paul@codesourcery.com> wrote:
> >>> When compiling qemu statically with multilib on PPC, we hit the
> >>> same issue that commit 845f2c2812d9ed24b36c02a3d06ee83aeafe8b49
> >>> is fixing. Do the same here.
> >> 
> >> How many of these ld files can we get rid of if we use -Ttext-segment
> >> instead? Generally all we're really caring about is moving the program
> >> base around so that it doesn't conflict with the address space we want
> >> to use for the client.
> > 
> > Now that we have the automatic GUEST_BASE stuff you shouldn't need to do
> > either.
> 
> If it was working, yes :)

What doesn't work?  I put a fair amout of effort into making it automatically 
pick a sensible value.  If there's some reason that won't work then you 
probably want to be using -R.

Paul
Alexander Graf - Dec. 13, 2011, 9:59 p.m.
On 13.12.2011, at 17:31, Paul Brook <paul@codesourcery.com> wrote:

>> On 13.12.2011, at 07:19, Paul Brook <paul@codesourcery.com> wrote:
>>>>> When compiling qemu statically with multilib on PPC, we hit the
>>>>> same issue that commit 845f2c2812d9ed24b36c02a3d06ee83aeafe8b49
>>>>> is fixing. Do the same here.
>>>> 
>>>> How many of these ld files can we get rid of if we use -Ttext-segment
>>>> instead? Generally all we're really caring about is moving the program
>>>> base around so that it doesn't conflict with the address space we want
>>>> to use for the client.
>>> 
>>> Now that we have the automatic GUEST_BASE stuff you shouldn't need to do
>>> either.
>> 
>> If it was working, yes :)
> 
> What doesn't work?  I put a fair amout of effort into making it automatically 
> pick a sensible value.  If there's some reason that won't work then you 
> probably want to be using -R.

IIRC mmap'ing files would break with 32-on-64, but I'd have to check up on the details. I ended up passing MAP_32BIT to all linux-user mmap calla for 32-on-x86_64, but that doesn't work with -R.

But either way, this patch fixes an immediate build failure on ppc hosts for us and should regardless be applied to 1.0-stable. I would also vote for adding it to HEAD now and go for the removal of all linker scripts later, if we're sure we don't have any regressions.


Alex
Paul Brook - Dec. 14, 2011, 12:30 a.m.
> >>>>> When compiling qemu statically with multilib on PPC, we hit the
> >>>>> same issue that commit 845f2c2812d9ed24b36c02a3d06ee83aeafe8b49
> >>>>> is fixing. Do the same here.
> >>>> 
> >>>> How many of these ld files can we get rid of if we use -Ttext-segment
> >>>> instead? Generally all we're really caring about is moving the program
> >>>> base around so that it doesn't conflict with the address space we want
> >>>> to use for the client.
> >>> 
> >>> Now that we have the automatic GUEST_BASE stuff you shouldn't need to
> >>> do either.
> >> 
> >> If it was working, yes :)
> > 
> > What doesn't work?  I put a fair amout of effort into making it
> > automatically pick a sensible value.  If there's some reason that won't
> > work then you probably want to be using -R.
> 
> IIRC mmap'ing files would break with 32-on-64, but I'd have to check up on
> the details. I ended up passing MAP_32BIT to all linux-user mmap calla for
> 32-on-x86_64, but that doesn't work with -R.

Hmm, I thought we'd fixed that.  It's the reason h2g_valid exists.

Either way it should definitely work with -R.  I specifically added that to 
avoid problems with the host mmap picking inconvenient addresse.

MAP_32BIT is an unconsionable hack, and doesn't exist on other 64-bit hosts.

Paul
Peter Maydell - Dec. 14, 2011, 8:53 a.m.
On 14 December 2011 00:30, Paul Brook <paul@codesourcery.com> wrote:
>> IIRC mmap'ing files would break with 32-on-64, but I'd have to check up on
>> the details. I ended up passing MAP_32BIT to all linux-user mmap calla for
>> 32-on-x86_64, but that doesn't work with -R.
>
> Hmm, I thought we'd fixed that.  It's the reason h2g_valid exists

A lot of the problem is that linux-user/mmap.c isn't very clever. What
happens, IIRC, is something like this:
 * we pick a guest base, and happily start to hand out memory from there
 * at some point, we hit a host shared library or whatever, so the
   kernel can't use our hinted preferred address, and picks one itself.
   On 64 bit kernels it seems to usually like to skip way ahead into
   the >4GB bit of the virtual address space, even if there's still
   plenty of space below 4GB
 * mmap_find_vma() wrongly assumes this means there's no more memory
   to be had below 4GB, and starts again with a hint address at the
   bottom of memory
 * that address is typically already used (by host lib or by a previous
   guest mmap). The kernel hands us back the same useless >4GB address.
 * mmap_find_vma() says "ooh, same as last time" and decides this means
   we're out of memory.

The effect is that on a 32-on-64 config we will fail mmap() unnecessarily
and in a lot of cases which work fine on 32-on-32.

The cheesy solution is to use MAP_32BIT, which I agree is a nasty hack.
The proper solution would be to rewrite mmap.c to be smarter (perhaps
by looking at /proc/self/maps and reserving a lot of space with PROT_NONE
mappings at startup and then managing it itself), but so far nobody's
done that, and MAP_32BIT is a much smaller change that improves matters
in the 99% situation (ie "host is x86-64").

-- PMM
Paul Brook - Dec. 14, 2011, 12:04 p.m.
> The proper solution would be to rewrite mmap.c to be smarter (perhaps
> by looking at /proc/self/maps and reserving a lot of space with PROT_NONE
> mappings at startup and then managing it itself), but so far nobody's
> done that

Yes they have. That's what -R does.

We used to try and parse /proc/self/maps.  This caused more problems than it 
solved.  It doesn't cover things like mmap_min_addr, and you have to re-parse 
it before every allocation in case the host libc allocated something new in 
between.

Paul
Peter Maydell - Dec. 14, 2011, 12:21 p.m.
On 14 December 2011 12:04, Paul Brook <paul@codesourcery.com> wrote:
>> The proper solution would be to rewrite mmap.c to be smarter (perhaps
>> by looking at /proc/self/maps and reserving a lot of space with PROT_NONE
>> mappings at startup and then managing it itself), but so far nobody's
>> done that
>
> Yes they have. That's what -R does.

-R doesn't happen by default, it requires you to specify how much you
want, and it insists that the space all be in one chunk.

> We used to try and parse /proc/self/maps.  This caused more problems than it
> solved.  It doesn't cover things like mmap_min_addr, and you have to re-parse
> it before every allocation in case the host libc allocated something new in
> between.

If you've used a PROT_NONE mapping to claim the space at startup, host libc
doesn't override that mapping, does it?

-- PMM
Paul Brook - Dec. 14, 2011, 5:34 p.m.
> >> The proper solution would be to rewrite mmap.c to be smarter (perhaps
> >> by looking at /proc/self/maps and reserving a lot of space with
> >> PROT_NONE mappings at startup and then managing it itself), but so far
> >> nobody's done that
> > 
> > Yes they have. That's what -R does.
> 
> -R doesn't happen by default, it requires you to specify how much you
> want, and it insists that the space all be in one chunk.

I've covered the defaults elsewhere in this thread.

If your 64-bit host can't find a contiguous 4G block of address space then 
you've much more serious issues.  System policies preventing applications 
allocating that much address space are a different problem, and splitting the 
into chunks would not help.

> > We used to try and parse /proc/self/maps.  This caused more problems than
> > it solved.  It doesn't cover things like mmap_min_addr, and you have to
> > re-parse it before every allocation in case the host libc allocated
> > something new in between.
> 
> If you've used a PROT_NONE mapping to claim the space at startup, host libc
> doesn't override that mapping, does it?

Ah, I see what you mean.

If you're solving problems other than 32-on-64 then there's some argument for 
allowing discontiguous blocks.  I'm not convinced there's much point parsing 
/proc/self/maps though.  Just keep calling mmap for sensible sized blocks 
until you either have enough address space or it fails.  In the latter case 
maybe free some back immediately to give the host libc room to work.

The hard bit is coming up with heuristics for "sensibe size block" and "have 
enough".  /proc/self/maps only tells you which areas of the host VM are 
currently mapped.  What we really want to know is which areas are available to 
be mapped.  On 32-bit hosts this may be this may be less than half of the gaps 
in /proc/self/maps. On 64-bit hosts it's many orders of magnitude smaller.  
For example x86-64 only has 47-bits of usable virtual address space.

Nested qemu is also going to make a complete mess of /proc/self/maps, though I 
admit you're probably going to trip over other bugs first :-)

Paul

Patch

diff --git a/ppc.ld b/ppc.ld
index 69aa3f2..2a0dcad 100644
--- a/ppc.ld
+++ b/ppc.ld
@@ -49,8 +49,20 @@  SECTIONS
   .rela.sbss2     : { *(.rela.sbss2 .rela.sbss2.* .rela.gnu.linkonce.sb2.*) }
   .rel.bss        : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) }
   .rela.bss       : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) }
-  .rel.plt        : { *(.rel.plt) }
-  .rela.plt       : { *(.rela.plt) }
+  .rel.plt      :
+  {
+    *(.rel.plt)
+    PROVIDE (__rel_iplt_start = .);
+    *(.rel.iplt)
+    PROVIDE (__rel_iplt_end = .);
+  }
+  .rela.plt       :
+  {
+    *(.rela.plt)
+    PROVIDE (__rela_iplt_start = .);
+    *(.rela.iplt)
+    PROVIDE (__rela_iplt_end = .);
+  }
   .init           :
   {
     KEEP (*(.init))
diff --git a/ppc64.ld b/ppc64.ld
index 0a7c0dd..e2dafa0 100644
--- a/ppc64.ld
+++ b/ppc64.ld
@@ -54,8 +54,20 @@  SECTIONS
       *(.rela.sbss2 .rela.sbss2.* .rela.gnu.linkonce.sb2.*)
       *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*)
     }
-  .rel.plt        : { *(.rel.plt) }
-  .rela.plt       : { *(.rela.plt) }
+  .rel.plt      :
+  {
+    *(.rel.plt)
+    PROVIDE (__rel_iplt_start = .);
+    *(.rel.iplt)
+    PROVIDE (__rel_iplt_end = .);
+  }
+  .rela.plt       :
+  {
+    *(.rela.plt)
+    PROVIDE (__rela_iplt_start = .);
+    *(.rela.iplt)
+    PROVIDE (__rela_iplt_end = .);
+  }
   .rela.tocbss	  : { *(.rela.tocbss) }
   .init           :
   {