diff mbox

[libstdc++] Make empty std::string storage readonly

Message ID 503CE088.70208@salomon.at
State New
Headers show

Commit Message

Michael Haubenwallner Aug. 28, 2012, 3:15 p.m. UTC
Hi,

in some old, large, originally C-written application (using gcc-4.2.4 still)
I did have to find a bug that boils down to something like this:

   std::string x;
   strcpy( (char*) x.c_str(), "abc");

Any subsequent empty std::string instance did contain "abc" instead of "",
which was the issue showing up to the user.

My idea what could have helped out here was to make the empty string _Rep
object readonly (ie. const), to get a segmentation fault along the strcpy.

As the string impl doesn't write to the empty string _Rep object any more[1],
are there still reasons not to make it readonly, like in attached patch?

[1] http://gcc.gnu.org/PR40518

Thank you!
/haubi/

Comments

Jonathan Wakely Aug. 28, 2012, 4:46 p.m. UTC | #1
On 28 August 2012 16:15, Michael Haubenwallner wrote:
> Hi,
>
> in some old, large, originally C-written application (using gcc-4.2.4 still)
> I did have to find a bug that boils down to something like this:
>
>    std::string x;
>    strcpy( (char*) x.c_str(), "abc");
>
> Any subsequent empty std::string instance did contain "abc" instead of "",
> which was the issue showing up to the user.
>
> My idea what could have helped out here was to make the empty string _Rep
> object readonly (ie. const), to get a segmentation fault along the strcpy.

Does it actually produce a segfault? I suppose it might on some
platforms, but not all, so I'm not sure it's worth changing.

(It seems easier to simply track down and shoot the programmer who
thought it was OK to cast away const on the string! ;-)
Michael Haubenwallner Aug. 28, 2012, 5:27 p.m. UTC | #2
On 08/28/2012 06:46 PM, Jonathan Wakely wrote:
> On 28 August 2012 16:15, Michael Haubenwallner wrote:
>> Hi,
>>
>> in some old, large, originally C-written application (using gcc-4.2.4 still)
>> I did have to find a bug that boils down to something like this:
>>
>>    std::string x;
>>    strcpy( (char*) x.c_str(), "abc");
>>
>> Any subsequent empty std::string instance did contain "abc" instead of "",
>> which was the issue showing up to the user.
>>
>> My idea what could have helped out here was to make the empty string _Rep
>> object readonly (ie. const), to get a segmentation fault along the strcpy.
> 
> Does it actually produce a segfault? I suppose it might on some
> platforms, but not all, so I'm not sure it's worth changing.

It does segfault here on (32bit each):
 i686-pc-linux-gnu
 ia64-hp-hpux11.31
 i386-pc-solaris2.10
 sparc-sun-solaris2.10
 powerpc-ibm-aix5.3.0.0
 powerpc-ibm-aix6.1.0.0
 powerpc-ibm-aix7.1.0.0

It does not segfault here on:
 hppa2.0n-hp-hpux11.31
 i586-pc-interix5.2
 i586-pc-winnt5.2 (using MSVC)

Maybe it could be made segfault on hppa2.0n-hp-hpux11.31 too using some linker flag,
but that's a deprecated platform anyway.

As long as the major development platform (Linux) does segfault, it feels worth
changing - especially as string.clear() to write the '\0' back again won't help
as quick'n dirty workaround since gcc-4.4.4 any more.

Also, a segfault is always better than random behaviour, even in production.

> (It seems easier to simply track down and shoot the programmer who
> thought it was OK to cast away const on the string! ;-)

I'm in doubt there isn't just one occurrence/developer with this bug...

Thank you!
/haubi/
Jonathan Wakely Aug. 28, 2012, 6:12 p.m. UTC | #3
On 28 August 2012 18:27, Michael Haubenwallner wrote:
>>
>> Does it actually produce a segfault? I suppose it might on some
>> platforms, but not all, so I'm not sure it's worth changing.
>
> It does segfault here on (32bit each):
>  i686-pc-linux-gnu
>  ia64-hp-hpux11.31
>  i386-pc-solaris2.10
>  sparc-sun-solaris2.10
>  powerpc-ibm-aix5.3.0.0
>  powerpc-ibm-aix6.1.0.0
>  powerpc-ibm-aix7.1.0.0
>
> It does not segfault here on:
>  hppa2.0n-hp-hpux11.31
>  i586-pc-interix5.2
>  i586-pc-winnt5.2 (using MSVC)
>
> Maybe it could be made segfault on hppa2.0n-hp-hpux11.31 too using some linker flag,
> but that's a deprecated platform anyway.
>
> As long as the major development platform (Linux) does segfault, it feels worth
> changing - especially as string.clear() to write the '\0' back again won't help
> as quick'n dirty workaround since gcc-4.4.4 any more.

Hmm, I tested it on x86_64-unknown-linux-gnu without getting a
segfault - but I might have messed up my test.
Michael Haubenwallner Aug. 29, 2012, 12:25 p.m. UTC | #4
On 08/28/2012 08:12 PM, Jonathan Wakely wrote:
> On 28 August 2012 18:27, Michael Haubenwallner wrote:
>>>
>>> Does it actually produce a segfault? I suppose it might on some
>>> platforms, but not all, so I'm not sure it's worth changing.
>>
>> It does segfault here on (32bit each):
>>  i686-pc-linux-gnu
>>  ia64-hp-hpux11.31
>>  i386-pc-solaris2.10
>>  sparc-sun-solaris2.10
>>  powerpc-ibm-aix5.3.0.0
>>  powerpc-ibm-aix6.1.0.0
>>  powerpc-ibm-aix7.1.0.0
>>
>> It does not segfault here on:
>>  hppa2.0n-hp-hpux11.31
>>  i586-pc-interix5.2
>>  i586-pc-winnt5.2 (using MSVC)
>>
>> Maybe it could be made segfault on hppa2.0n-hp-hpux11.31 too using some linker flag,
>> but that's a deprecated platform anyway.
>>
>> As long as the major development platform (Linux) does segfault, it feels worth
>> changing - especially as string.clear() to write the '\0' back again won't help
>> as quick'n dirty workaround since gcc-4.4.4 any more.
> 
> Hmm, I tested it on x86_64-unknown-linux-gnu without getting a
> segfault - but I might have messed up my test.

Using this patch on my x86_64 Gentoo Linux Desktop with gcc-4.7.1 does segfault
as expected - when I make sure the correct libstdc++ is used at runtime,
having the '_S_empty_rep_storage' symbol in the .rodata section rather than .bss.

/haubi/
Jonathan Wakely Aug. 30, 2012, 9:45 a.m. UTC | #5
On 29 August 2012 13:25, Michael Haubenwallner wrote:
>
> On 08/28/2012 08:12 PM, Jonathan Wakely wrote:
>> On 28 August 2012 18:27, Michael Haubenwallner wrote:
>>>>
>>>> Does it actually produce a segfault? I suppose it might on some
>>>> platforms, but not all, so I'm not sure it's worth changing.
>>>
>>> It does segfault here on (32bit each):
>>>  i686-pc-linux-gnu
>>>  ia64-hp-hpux11.31
>>>  i386-pc-solaris2.10
>>>  sparc-sun-solaris2.10
>>>  powerpc-ibm-aix5.3.0.0
>>>  powerpc-ibm-aix6.1.0.0
>>>  powerpc-ibm-aix7.1.0.0
>>>
>>> It does not segfault here on:
>>>  hppa2.0n-hp-hpux11.31
>>>  i586-pc-interix5.2
>>>  i586-pc-winnt5.2 (using MSVC)
>>>
>>> Maybe it could be made segfault on hppa2.0n-hp-hpux11.31 too using some linker flag,
>>> but that's a deprecated platform anyway.
>>>
>>> As long as the major development platform (Linux) does segfault, it feels worth
>>> changing - especially as string.clear() to write the '\0' back again won't help
>>> as quick'n dirty workaround since gcc-4.4.4 any more.
>>
>> Hmm, I tested it on x86_64-unknown-linux-gnu without getting a
>> segfault - but I might have messed up my test.
>
> Using this patch on my x86_64 Gentoo Linux Desktop with gcc-4.7.1 does segfault
> as expected - when I make sure the correct libstdc++ is used at runtime,
> having the '_S_empty_rep_storage' symbol in the .rodata section rather than .bss.

Bah, I did mess up my test, not correctly disabling the extern
template instantiations in the library.

If it works reliably on x86_64 then I think the patch is worth considering.

I'm on holiday for a week, so maybe one of the other maintainers will
deal with it first.
Michael Haubenwallner Oct. 30, 2012, 9:05 a.m. UTC | #6
On 08/30/2012 11:45 AM, Jonathan Wakely wrote:
> On 29 August 2012 13:25, Michael Haubenwallner wrote:
>> On 08/28/2012 08:12 PM, Jonathan Wakely wrote:
>>> On 28 August 2012 18:27, Michael Haubenwallner wrote:
>>>>>
>>>>> Does it actually produce a segfault? I suppose it might on some
>>>>> platforms, but not all, so I'm not sure it's worth changing.
>>
>> Using this patch on my x86_64 Gentoo Linux Desktop with gcc-4.7.1 does segfault
>> as expected - when I make sure the correct libstdc++ is used at runtime,
>> having the '_S_empty_rep_storage' symbol in the .rodata section rather than .bss.
> 
> If it works reliably on x86_64 then I think the patch is worth considering.
> 
> I'm on holiday for a week, so maybe one of the other maintainers will
> deal with it first.

Any chance to get this in for 4.8?

Thank you!
/haubi/
Jonathan Wakely Oct. 30, 2012, 9:28 a.m. UTC | #7
On 30 October 2012 09:05, Michael Haubenwallner wrote:
> Any chance to get this in for 4.8?

I'm looking into it today.
Jonathan Wakely Oct. 30, 2012, 10:11 a.m. UTC | #8
On 30 October 2012 09:28, Jonathan Wakely wrote:
> On 30 October 2012 09:05, Michael Haubenwallner wrote:
>> Any chance to get this in for 4.8?
>
> I'm looking into it today.

Consider the case where one object file containing
std::string().erase() is built with an older GCC without the fix for
PR 40518, then it's linked to a new libstdc++.so where the empty rep
is read-only.  The program will attempt to write to the empty rep, but
now it's read-only and will crash.  I don't think we can apply it
unless we change the library ABI so that no pre-PR40518 objects can
link to a libstdc++.so containing a read-only empty rep.
Jonathan Wakely Oct. 30, 2012, 10:36 a.m. UTC | #9
On 30 October 2012 10:11, Jonathan Wakely wrote:
> On 30 October 2012 09:28, Jonathan Wakely wrote:
>> On 30 October 2012 09:05, Michael Haubenwallner wrote:
>>> Any chance to get this in for 4.8?
>>
>> I'm looking into it today.
>
> Consider the case where one object file containing
> std::string().erase() is built with an older GCC without the fix for
> PR 40518, then it's linked to a new libstdc++.so where the empty rep
> is read-only.  The program will attempt to write to the empty rep, but
> now it's read-only and will crash.  I don't think we can apply it
> unless we change the library ABI so that no pre-PR40518 objects can
> link to a libstdc++.so containing a read-only empty rep.

If I can convince myself this can't happen then I'll commit it, but I
need to look into it further this evening.
diff mbox

Patch

From d16c5d0cabbe9b308f852534be0a209b5b2c5cfd Mon Sep 17 00:00:00 2001
From: Michael Haubenwallner <michael.haubenwallner@salomon.at>
Date: Tue, 28 Aug 2012 17:05:48 +0200
Subject: [PATCH] Make default string storage readonly.

To help finding bugs that write to the empty string storage using C
string manipulation functions like strcpy, define the empty string
storage as const, resulting in a segmentation fault with such strcpy,
which is ways better than having subsequent empty strings non-empty.
---
 libstdc++-v3/ChangeLog                     |    6 ++++++
 libstdc++-v3/include/bits/basic_string.h   |    4 ++--
 libstdc++-v3/include/bits/basic_string.tcc |    4 ++--
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index c2fcddc..86a5781 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,9 @@ 
+2012-08-28  Michael Haubenwallner <michael.haubenwallner@salomon.at>
+
+	(_S_empty_rep_storage): Make default string storage readonly.
+	* include/bits/basic_string.h: Declare as const.
+	* include/bits/basic_string.tcc: Define the const value.
+
 2012-08-27  Ulrich Drepper  <drepper@gmail.com>
 
 	Add interfaces to retrieve random numbers in bulk.
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 24562c4..2d9b742 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -177,7 +177,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 	// The following storage is init'd to 0 by the linker, resulting
         // (carefully) in an empty string with one reference.
-        static size_type _S_empty_rep_storage[];
+        static size_type const _S_empty_rep_storage[];
 
         static _Rep&
         _S_empty_rep()
@@ -185,7 +185,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  // NB: Mild hack to avoid strict-aliasing warnings.  Note that
 	  // _S_empty_rep_storage is never modified and the punning should
 	  // be reasonably safe in this case.
-	  void* __p = reinterpret_cast<void*>(&_S_empty_rep_storage);
+	  void* __p = const_cast<void*>(reinterpret_cast<void const*>(&_S_empty_rep_storage));
 	  return *reinterpret_cast<_Rep*>(__p);
 	}
 
diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc
index 7eff818..945ce1c 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -64,10 +64,10 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Linker sets _S_empty_rep_storage to all 0s (one reference, empty string)
   // at static init time (before static ctors are run).
   template<typename _CharT, typename _Traits, typename _Alloc>
-    typename basic_string<_CharT, _Traits, _Alloc>::size_type
+    typename basic_string<_CharT, _Traits, _Alloc>::size_type const
     basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_empty_rep_storage[
     (sizeof(_Rep_base) + sizeof(_CharT) + sizeof(size_type) - 1) /
-      sizeof(size_type)];
+      sizeof(size_type)] = {};
 
   // NB: This is the special case for Input Iterators, used in
   // istreambuf_iterators, etc.
-- 
1.7.3.4