[v2] Build *-user targets as PIE

Message ID 1251980117-10089-1-git-send-email-kirill@shutemov.name
State Superseded

Commit Message

Kirill A. Shutemov Sept. 3, 2009, 12:15 p.m. UTC
Now we can drop the link hack for i386 and fix text relocations on i386 hosts.

v2:
  - Add configure options to enable/disable PIE for usermode targets.
    Disabling can be useful if you build using a toolchain which has
    broken PIE support. PIE for usermode targets is enabled by default.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
 Makefile          |   10 +---------
 Makefile.target   |   19 +++++++++++++++----
 configure         |   26 +++++++++++++++-----------
 linux-user/main.c |   20 --------------------
 4 files changed, 31 insertions(+), 44 deletions(-)

Comments

Reimar Döffinger Sept. 3, 2009, noon UTC | #1
On Thu, Sep 03, 2009 at 03:15:17PM +0300, Kirill A. Shutemov wrote:
> Now we can drop the link hack for i386 and fix text relocations on i386 hosts.
> 
> v2:
>   - Add configure options to enable/disable PIE for usermode targets.
>     Disabling can be useful if you build using a toolchain which has
>     broken PIE support. PIE for usermode targets is enabled by default.

Hm. Would be nice if the commit message said more about the "why". What
is the advantage of PIE (I mean, is there something special about qemu
that makes it particularly useful)? Is there any measurable speed
difference between PIE and no PIE?
(sorry if it was explained for v1, I must have missed that one)
Kirill A. Shutemov Sept. 3, 2009, 12:51 p.m. UTC | #2
On Thu, Sep 3, 2009 at 3:07 PM, Juan Quintela<quintela@trasno.org> wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>> Now we can drop the link hack for i386 and fix text relocations on i386
>> hosts.
>
> Still not good enough :(
>
> Fedora 11 here.  I got this error:
>
>   LINK  arm-linux-user/qemu-arm
> /usr/bin/ld: main.o: relocation R_X86_64_TPOFF32 against `thread_env' can not be used when making a shared object; recompile with -fPIC
> main.o: could not read symbols: Bad value
> collect2: ld returned 1 exit status
> make[1]: *** [qemu-arm] Error 1
> make: *** [subdir-arm-linux-user] Error 2
>
> (I got it for all the -linux-user targets)

What version of binutils do you have? It seems your binutils is buggy.
Kirill A. Shutemov Sept. 3, 2009, 2:21 p.m. UTC | #3
On Thu, Sep 3, 2009 at 3:00 PM, Reimar
Döffinger<Reimar.Doeffinger@gmx.de> wrote:
> On Thu, Sep 03, 2009 at 03:15:17PM +0300, Kirill A. Shutemov wrote:
>> Now we can drop the link hack for i386 and fix text relocations on i386 hosts.
>>
>> v2:
>>   - Add configure options to enable/disable PIE for usermode targets.
>>     Disabling can be useful if you build using a toolchain which has
>>     broken PIE support. PIE for usermode targets is enabled by default.
>
> Hm. Would be nice if the commit message said more about the "why". What
> is the advantage of PIE (I mean, is there something special about qemu
> that makes it particularly useful)?

The main advantage is that we can drop the linking hack for i386 (and keep
qemu self-virtualizable) and get rid of text relocations.

The other advantage is security. Since qemu is a PIE, the kernel can load it
at a random position in memory, which makes many types of attacks on qemu
harder.

> Is there any measurable speed
> difference between PIE and no PIE?

Actually, I have no numbers for qemu.

PIE code is usually a bit slower: approximately 1% on i386 according to
some tests. RISC architectures should be affected less, since they have
more registers. On the other hand, we are getting rid of text relocations
on i386, which make executable loading slower. So...
Paul Brook Sept. 3, 2009, 2:38 p.m. UTC | #4
> PIE code is usually a bit slower: approximately 1% on i386 according to
> some tests. RISC architectures should be affected less, since they have
> more registers. On the other hand, we are getting rid of text relocations
> on i386, which make executable loading slower. So...

I think you've got that backwards.
A traditional (fixed address) executable requires no load-time relocation for
internal references because all addresses are known at static link time. A PIE
requires the dynamic linker to adjust all absolute addresses.

Paul
Paul Brook Sept. 3, 2009, 2:39 p.m. UTC | #5
On Thursday 03 September 2009, Kirill A. Shutemov wrote:
> Now we can drop the link hack for i386 and fix text relocations on i386 hosts.
>
> v2:
>   - Add configure options to enable/disable PIE for usermode targets.
>     Disabling can be useful if you build using a toolchain which has
>     broken PIE support. PIE for usermode targets is enabled by default.

This isn't as useful as you might think.

How do you stop the host dynamic linker loading qemu where the guest 
application expects to be loaded?

Paul
Paolo Bonzini Sept. 3, 2009, 2:52 p.m. UTC | #6
On 09/03/2009 04:38 PM, Paul Brook wrote:
>> PIE code is usually a bit slower: approximately 1% on i386 according to
>> some tests. RISC architectures should be affected less, since they have
>> more registers. On the other hand, we are getting rid of text relocations
>> on i386, which make executable loading slower. So...
>
> I think you've got that backwards.
> A traditional (fixed address) executable requires no load-time relocation for
> internal references because all addresses are known at static link time. A PIE
> requires the dynamic linker to adjust all absolute addresses.

Yes, but since it's also compiled as PIE, there are no absolute 
addresses.  Previously QEMU was linked -shared but compiled as 
non-position independent code.  I am not sure whether only the 
self-virtualized machine would be subject to relocation, or also the 
outer one (maybe address space virtualization would also have to be 
taken into account?).

Anyway, as far as text relocations are concerned Kirill's patch cannot
make things worse.

Paolo
Kirill A. Shutemov Sept. 3, 2009, 3:07 p.m. UTC | #7
On Thu, Sep 3, 2009 at 5:38 PM, Paul Brook<paul@codesourcery.com> wrote:
>> PIE code is usually a bit slower: approximately 1% on i386 according to
>> some tests. RISC architectures should be affected less, since they have
>> more registers. On the other hand, we are getting rid of text relocations
>> on i386, which make executable loading slower. So...
>
> I think you've got that backwards.
> A traditional (fixed address) executable requires no load-time relocation for
> internal references because all addresses are known at static link time. A PIE
> requires the dynamic linker to adjust all absolute addresses.

Usermode qemu on i386 is not a traditional executable, since it uses
-Wl,-shared for linking. As a result we get an executable which looks
like a PIE, but the dynamic linker has to resolve text relocations. I think
the best way is to create a true PIE without text relocations.
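
Whether a given build still carries text relocations can be checked from its
dynamic section. A minimal sketch, assuming binutils' readelf is installed;
the helper names dynamic_has_textrel and report_textrel are made up for
illustration and are not part of the patch:

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): reads `readelf -d` output on
# stdin and succeeds if the dynamic section mentions text relocations, either
# as a DT_TEXTREL entry or as the TEXTREL bit in DT_FLAGS.
dynamic_has_textrel() {
    grep -q 'TEXTREL'
}

# Hypothetical wrapper: prints a one-line verdict for the ELF file in $1.
report_textrel() {
    if [ ! -r "$1" ]; then
        echo "$1: cannot read file" >&2
        return 2
    fi
    if readelf -d "$1" | dynamic_has_textrel; then
        echo "$1: has text relocations"
    else
        echo "$1: no text relocations"
    fi
}
```

For example, `report_textrel arm-linux-user/qemu-arm` could be run once on a
build with the -Wl,-shared hack and once on a build made with -fpie/-pie to
compare the two.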

P.S. I pressed "reply" instead of "reply all" the first time.
Kirill A. Shutemov Sept. 3, 2009, 3:08 p.m. UTC | #8
On Thu, Sep 3, 2009 at 5:39 PM, Paul Brook<paul@codesourcery.com> wrote:
> On Thursday 03 September 2009, Kirill A. Shutemov wrote:
>> Now we can drop the link hack for i386 and fix text relocations on i386 hosts.
>>
>> v2:
>>   - Add configure options to enable/disable PIE for usermode targets.
>>     Disabling can be useful if you build using a toolchain which has
>>     broken PIE support. PIE for usermode targets is enabled by default.
>
> This isn't as useful as you might think.
>
> How do you stop the host dynamic linker loading qemu where the guest
> application expects to be loaded?

At least it is not worse than it was. To the kernel, qemu with the linking
hack looks like a PIE, so the kernel can load it at a random address, can't it?
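
One way to see what the kernel sees is the e_type field of the ELF header: a
binary linked with -Wl,-shared and a real PIE are both ET_DYN, while a
fixed-address executable is ET_EXEC. A small sketch, assuming binutils'
readelf is available; elf_type_from_header is a hypothetical name used only
for this illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads `readelf -h` output on stdin and prints the
# short ELF type token (for example EXEC or DYN) taken from the "Type:" line.
elf_type_from_header() {
    awk '$1 == "Type:" { print $2; exit }'
}
```

Usage would look like `readelf -h arm-linux-user/qemu-arm | elf_type_from_header`;
a result of DYN means the kernel's ELF loader is free to choose the load
address.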

P.S. I pressed "reply" instead of "reply all" the first time.
Reimar Döffinger Sept. 3, 2009, 5:17 p.m. UTC | #9
On Thu, Sep 03, 2009 at 06:07:21PM +0300, Kirill A. Shutemov wrote:
> On Thu, Sep 3, 2009 at 5:38 PM, Paul Brook<paul@codesourcery.com> wrote:
> >> PIE code is usually a bit slower: approximately 1% on i386 according to
> >> some tests. RISC architectures should be affected less, since they have
> >> more registers. On the other hand, we are getting rid of text relocations
> >> on i386, which make executable loading slower. So...
> >
> > I think you've got that backwards.
> > A traditional (fixed address) executable requires no load-time relocation for
> > internal references because all addresses are known at static link time. A PIE
> > requires the dynamic linker to adjust all absolute addresses.
> 
> Usermode qemu on i386 is not a traditional executable, since it uses
> -Wl,-shared for linking. As a result we get an executable which looks
> like a PIE, but the dynamic linker has to resolve text relocations. I think
> the best way is to create a true PIE without text relocations.

It is close to getting off topic, but since you state it, why try so hard
to avoid text relocations?
Sure, there are advantages (the biggest one is fewer issues with
mis-/insufficiently configured SELinux, I think), possibly better sharing
of pages when many instances are run and better delayed loading, but on
x86/i386 that doesn't sound like a clear advantage compared to the speed
loss, which in some cases is quite relevant.
Also, since this patch adds --disable-pie, isn't the hack currently used
still necessary for that case?
Or is --disable-pie supposed to disable self-hosting? Then maybe the
option should be named --disable-self-hosting (and if that is indeed the
only side-effect it might be better to disable it by default)?
Kirill A. Shutemov Sept. 4, 2009, 4:33 a.m. UTC | #10
On Thu, Sep 3, 2009 at 8:17 PM, Reimar
Döffinger<Reimar.Doeffinger@gmx.de> wrote:
> On Thu, Sep 03, 2009 at 06:07:21PM +0300, Kirill A. Shutemov wrote:
>> On Thu, Sep 3, 2009 at 5:38 PM, Paul Brook<paul@codesourcery.com> wrote:
>> >> PIE code is usually a bit slower: approximately 1% on i386 according to
>> >> some tests. RISC architectures should be affected less, since they have
>> >> more registers. On the other hand, we are getting rid of text relocations
>> >> on i386, which make executable loading slower. So...
>> >
>> > I think you've got that backwards.
>> > A traditional (fixed address) executable requires no load-time relocation for
>> > internal references because all addresses are known at static link time. A PIE
>> > requires the dynamic linker to adjust all absolute addresses.
>>
>> Usermode qemu on i386 is not a traditional executable, since it uses
>> -Wl,-shared for linking. As a result we get an executable which looks
>> like a PIE, but the dynamic linker has to resolve text relocations. I think
>> the best way is to create a true PIE without text relocations.
>
> It is close to getting off topic, but since you state it, why try so hard
> to avoid text relocations?
> Sure, there are advantages (the biggest one is fewer issues with
> mis-/insufficiently configured SELinux, I think), possibly better sharing
> of pages when many instances are run and better delayed loading, but on
> x86/i386 that doesn't sound like a clear advantage compared to the speed
> loss, which in some cases is quite relevant.

Do you have any numbers about speed loss?

> Also, since this patch adds --disable-pie, isn't the hack currently used
> still necessary for that case?
> Or is --disable-pie supposed to disable self-hosting? Then maybe the
> option should be named --disable-self-hosting (and if that is indeed the
> only side-effect it might be better to disable it by default)?
>
Reimar Döffinger Sept. 4, 2009, 7:51 a.m. UTC | #11
On Fri, Sep 04, 2009 at 07:33:25AM +0300, Kirill A. Shutemov wrote:
> On Thu, Sep 3, 2009 at 8:17 PM, Reimar
> Döffinger<Reimar.Doeffinger@gmx.de> wrote:
> > It is close to getting off topic, but since you state it, why try so hard
> > to avoid text relocations?
> > Sure, there are advantages (the biggest one is fewer issues with
> > mis-/insufficiently configured SELinux, I think), possibly better sharing
> > of pages when many instances are run and better delayed loading, but on
> > x86/i386 that doesn't sound like a clear advantage compared to the speed
> > loss, which in some cases is quite relevant.
> 
> Do you have any numbers about speed loss?

No, I was getting a bit off-topic. At least with KVM I doubt there
is any relevant speed loss for qemu, though for MPlayer/FFmpeg (very different
situation) it could be about 10 % when I last did some tests.
Kirill A. Shutemov Sept. 4, 2009, 8:03 a.m. UTC | #12
On Fri, Sep 4, 2009 at 10:51 AM, Reimar
Döffinger<Reimar.Doeffinger@gmx.de> wrote:
> On Fri, Sep 04, 2009 at 07:33:25AM +0300, Kirill A. Shutemov wrote:
>> On Thu, Sep 3, 2009 at 8:17 PM, Reimar
>> Döffinger<Reimar.Doeffinger@gmx.de> wrote:
>> > It is close to getting off topic, but since you state it, why try so hard
>> > to avoid text relocations?
>> > Sure, there are advantages (the biggest one is fewer issues with
>> > mis-/insufficiently configured SELinux, I think), possibly better sharing
>> > of pages when many instances are run and better delayed loading, but on
>> > x86/i386 that doesn't sound like a clear advantage compared to the speed
>> > loss, which in some cases is quite relevant.
>>
>> Do you have any numbers about speed loss?
>
> No, I was getting a bit off-topic. At least with KVM I doubt there
> is any relevant speed loss for qemu, though for MPlayer/FFmpeg (very different
> situation) it could be about 10 % when I last did some tests.
>

My patch compiles only usermode targets as PIE, so it will not affect KVM.
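
The condition the patch adds to configure can be restated as a standalone
function. This is only a sketch for illustration: user_pie_flags and its
positional arguments are hypothetical, while the three values tested mirror
$target_user_only, $static and $user_pie from the patch:

```shell
#!/bin/sh
# Hypothetical restatement of the configure check: prints the extra flags
# for one target, or nothing when PIE does not apply to it.
#   $1 = target_user_only, $2 = static, $3 = user_pie  (each "yes" or "no")
user_pie_flags() {
    if [ "$1" = "yes" ] && [ "$2" = "no" ] && [ "$3" = "yes" ]; then
        echo "cflags=-fpie ldflags=-pie"
    fi
}
```

A softmmu target passes "no" as the first argument and therefore gets no
extra flags, which is why system emulation (and with it KVM) is untouched;
a static build or --disable-user-pie suppresses the flags as well.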

Patch

diff --git a/Makefile b/Makefile
index bdac9b3..634ea81 100644
--- a/Makefile
+++ b/Makefile
@@ -39,8 +39,6 @@  subdir-%:
 	$(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) -C $* V="$(V)" TARGET_DIR="$*/" all,)
 
 $(filter %-softmmu,$(SUBDIR_RULES)): libqemu_common.a
-$(filter %-user,$(SUBDIR_RULES)): libqemu_user.a
-
 
 ROMSUBDIR_RULES=$(patsubst %,romsubdir-%, $(ROMS))
 romsubdir-%:
@@ -74,7 +72,7 @@  block-obj-y +=  $(addprefix block/, $(block-nested-y))
 # CPUs and machines.
 
 obj-y = $(block-obj-y)
-obj-y += readline.o console.o host-utils.o
+obj-y += readline.o console.o
 
 obj-y += irq.o ptimer.o
 obj-y += i2c.o smbus.o smbus_eeprom.o max7310.o max111x.o wm8750.o
@@ -161,12 +159,6 @@  bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS)
 
 libqemu_common.a: $(obj-y)
 
-#######################################################################
-# user-obj-y is code used by qemu userspace emulation
-user-obj-y = cutils.o cache-utils.o path.o envlist.o host-utils.o
-
-libqemu_user.a: $(user-obj-y)
-
 ######################################################################
 
 qemu-img.o: qemu-img-cmds.h
diff --git a/Makefile.target b/Makefile.target
index f7d1919..f738617 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -31,7 +31,7 @@  all: $(PROGS)
 
 #########################################################
 # cpu emulator library
-libobj-y = exec.o translate-all.o cpu-exec.o translate.o
+libobj-y = exec.o translate-all.o cpu-exec.o translate.o host-utils.o
 libobj-y += tcg/tcg.o tcg/tcg-runtime.o
 libobj-$(CONFIG_SOFTFLOAT) += fpu/softfloat.o
 libobj-$(CONFIG_NOSOFTFLOAT) += fpu/softfloat-native.o
@@ -80,9 +80,9 @@  ifdef CONFIG_LINUX_USER
 
 VPATH+=:$(SRC_PATH)/linux-user:$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR)
 QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user -I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR)
-
 obj-y = main.o syscall.o strace.o mmap.o signal.o thunk.o \
       elfload.o linuxload.o uaccess.o gdbstub.o gdbstub-xml.o
+obj-y += envlist.o path.o
 
 obj-$(TARGET_HAS_BFLT) += flatload.o
 obj-$(TARGET_HAS_ELFLOAD32) += elfload32.o
@@ -98,7 +98,7 @@  obj-arm-y += arm-semi.o
 
 obj-m68k-y += m68k-sim.o m68k-semi.o
 
-ARLIBS=../libqemu_user.a libqemu.a
+ARLIBS=libqemu.a
 endif #CONFIG_LINUX_USER
 
 #########################################################
@@ -116,6 +116,7 @@  LIBS+=-lmx
 
 obj-y = main.o commpage.o machload.o mmap.o signal.o syscall.o thunk.o \
         gdbstub.o gdbstub-xml.o
+obj-y += envlist.o path.o
 
 obj-i386-y += ioport-user.o
 
@@ -133,13 +134,23 @@  QEMU_CFLAGS+=-I$(SRC_PATH)/bsd-user -I$(SRC_PATH)/bsd-user/$(TARGET_ARCH)
 
 obj-y = main.o bsdload.o elfload.o mmap.o signal.o strace.o syscall.o \
         gdbstub.o gdbstub-xml.o uaccess.o
+obj-y += envlist.o path.o
 
 obj-i386-y += ioport-user.o
 
-ARLIBS=libqemu.a ../libqemu_user.a
+ARLIBS=libqemu.a
 
 endif #CONFIG_BSD_USER
 
+ifdef CONFIG_USER_ONLY
+# hack to compile with -fpie for *-user targets
+obj-y += cutils-user.o cache-utils-user.o
+cutils-user.c cache-utils-user.c:
+	@echo "  LN	$(TARGET_DIR)$@"
+	@ln -s $(SRC_PATH)/$(@:%-user.c=%.c) $@
+endif
+
+
 #########################################################
 # System emulator target
 ifdef CONFIG_SOFTMMU
diff --git a/configure b/configure
index 0d0162a..4f5850c 100755
--- a/configure
+++ b/configure
@@ -221,6 +221,7 @@  kerneldir=""
 aix="no"
 blobs="yes"
 pkgversion=""
+user_pie="yes"
 
 # OS specific
 if check_define __linux__ ; then
@@ -498,6 +499,10 @@  for opt do
   ;;
   --disable-guest-base) guest_base="no"
   ;;
+  --enable-user-pie) user_pie="yes"
+  ;;
+  --disable-user-pie) user_pie="no"
+  ;;
   --enable-uname-release=*) uname_release="$optarg"
   ;;
   --sparc_cpu=*)
@@ -672,6 +677,8 @@  echo "  --disable-bsd-user       disable all BSD usermode emulation targets"
 echo "  --enable-guest-base      enable GUEST_BASE support for usermode"
 echo "                           emulation targets"
 echo "  --disable-guest-base     disable GUEST_BASE support"
+echo "  --enable-user-pie        build usermode emulation targets as PIE"
+echo "  --disable-user-pie       do not build usermode emulation targets as PIE"
 echo "  --fmod-lib               path to FMOD library"
 echo "  --fmod-inc               path to FMOD includes"
 echo "  --oss-lib                path to OSS library"
@@ -1678,6 +1685,7 @@  echo "Documentation     $docs"
 echo "uname -r          $uname_release"
 echo "NPTL support      $nptl"
 echo "GUEST_BASE        $guest_base"
+echo "PIE user targets  $user_pie"
 echo "vde support       $vde"
 echo "IO thread         $io_thread"
 echo "Linux AIO support $linux_aio"
@@ -2302,6 +2310,12 @@  if test "$target_softmmu" = "yes" ; then
   esac
 fi
 
+if test "$target_user_only" = "yes" -a "$static" = "no" -a \
+	"$user_pie" = "yes" ; then
+  cflags="-fpie $cflags"
+  ldflags="-pie $ldflags"
+fi
+
 if test "$target_softmmu" = "yes" -a \( \
         "$TARGET_ARCH" = "microblaze" -o \
         "$TARGET_ARCH" = "cris" \) ; then
@@ -2323,16 +2337,6 @@  fi
 linker_script="-Wl,-T../config-host.ld -Wl,-T,\$(SRC_PATH)/\$(ARCH).ld"
 if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
   case "$ARCH" in
-  i386)
-    if test "$gprof" = "yes" -o "$static" = "yes" ; then
-      ldflags="$linker_script $ldflags"
-    else
-      # WARNING: this LDFLAGS is _very_ tricky : qemu is an ELF shared object
-      # that the kernel ELF loader considers as an executable. I think this
-      # is the simplest way to make it self virtualizable!
-      ldflags="-Wl,-shared $ldflags"
-    fi
-    ;;
   sparc)
     # -static is used to avoid g1/g3 usage by the dynamic linker
     ldflags="$linker_script -static $ldflags"
@@ -2340,7 +2344,7 @@  if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
   ia64)
     ldflags="-Wl,-G0 $linker_script -static $ldflags"
     ;;
-  x86_64|ppc|ppc64|s390|sparc64|alpha|arm|m68k|mips|mips64)
+  i386|x86_64|ppc|ppc64|s390|sparc64|alpha|arm|m68k|mips|mips64)
     ldflags="$linker_script $ldflags"
     ;;
   esac
diff --git a/linux-user/main.c b/linux-user/main.c
index a628c01..d3af2e2 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -54,26 +54,6 @@  const char *qemu_uname_release = CONFIG_UNAME_RELEASE;
 const char interp[] __attribute__((section(".interp"))) = "/lib/ld-linux.so.2";
 #endif
 
-/* for recent libc, we add these dummy symbols which are not declared
-   when generating a linked object (bug in ld ?) */
-#if (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 3)) && !defined(CONFIG_STATIC)
-asm(".globl __preinit_array_start\n"
-    ".globl __preinit_array_end\n"
-    ".globl __init_array_start\n"
-    ".globl __init_array_end\n"
-    ".globl __fini_array_start\n"
-    ".globl __fini_array_end\n"
-    ".section \".rodata\"\n"
-    "__preinit_array_start:\n"
-    "__preinit_array_end:\n"
-    "__init_array_start:\n"
-    "__init_array_end:\n"
-    "__fini_array_start:\n"
-    "__fini_array_end:\n"
-    ".long 0\n"
-    ".previous\n");
-#endif
-
 /* XXX: on x86 MAP_GROWSDOWN only works if ESP <= address + 32, so
    we allocate a bigger stack. Need a better solution, for example
    by remapping the process stack directly at the right place */