Improve DST handling (Bug 23102, Bug 21942, Bug 18018, Bug, 23259, CVE-2011-0536 ).

Message ID 9cf43cb6-511c-ec6c-9a87-e89a467238d9@redhat.com
State New
Headers show
Series
  • Improve DST handling (Bug 23102, Bug 21942, Bug 18018, Bug, 23259, CVE-2011-0536 ).
Related show

Commit Message

Carlos O'Donell June 6, 2018, 5:02 a.m.
This commit improves DST handling significantly in the following
ways: firstly is_dst () is overhauled to correctly process DST
sequences that would be accepted given the ELF gABI. This means that
we actually now accept slightly more sequences than before.  Now we
accept $ORIGIN$ORIGIN, but in the past we accepted only $ORIGIN\0 or
$ORIGIN/..., but this kind of behaviour results in unexpected
and uninterpreted DST sequences being used as literal search paths
leading to security defects. Therefore the first step in correcting
this defect is making is_dst () properly account for all DSTs
and making the function context free in the sense that it counts
DSTs without knowledge of path, or AT_SECURE. Next, _dl_dst_count ()
is also simplified to count all DSTs regardless of context.
Then in _dl_dst_substitute () we reintroduce context-dependent
processing for such things as AT_SECURE handling. At the level of
_dl_dst_substitute we can have access to things like the true start
of the string sequence to validate $ORIGIN-based paths rooted in
trusted directories.  Lastly, callers of _dl_dst_substitute () are
adjusted to pass in the start of their string sequences, this includes
expand_dynamic_string_token () and fillin_rpath ().

Verified with a sequence of 47 tests on x86_64 that cover
non-AT_SECURE and AT_SECURE testing using a sysroot (requires root
to run). The tests cover cases for bug 23102, bug 21942, bug 18018,
and bug 23259. These tests are not yet appropriate for the glibc
regression testsuite, but with the upcoming test-in-container testing
framework it should be possible to include these tests upstream soon.

See the mailing list for the tests:
[insert final URL of post containing reference to swbz23259.tar.gz (test cases)]

OK to checkin?

---
 ChangeLog                  |  20 +++++++
 NEWS                       |  10 ++++
 elf/dl-deps.c              |   2 +-
 elf/dl-dst.h               |   3 +-
 elf/dl-load.c              | 141 +++++++++++++++++++++++++++++++--------------
 sysdeps/generic/ldsodefs.h |   5 +-
 6 files changed, 133 insertions(+), 48 deletions(-)

Comments

Florian Weimer June 6, 2018, 2 p.m. | #1
On 06/06/2018 07:02 AM, Carlos O'Donell wrote:
> diff --git a/elf/dl-load.c b/elf/dl-load.c
> index 431236920f..13263212d5 100644
> --- a/elf/dl-load.c
> +++ b/elf/dl-load.c
> @@ -177,63 +177,89 @@ is_trusted_path_normalize (const char *path, size_t len)
>     return false;
>   }
>   
> +/* Given a substring starting at NAME, just after the DST '$' start
> +   token, determine if NAME contains dynamic string token STR,
> +   following the ELF gABI rules for dynamic string tokens:
>   
> +   * Longest possible sequence using the rules (greedy).
> +
> +   * Must start with a $ (enforced by caller).
> +
> +   * Must follow $ with one underscore or ASCII [A-Za-z] (enforced by
> +     caller via STR comparison) or '{' (start curly quoted name).

“enforced by caller via STR comparison”: I don't see this in the 
remaining code.  So $ORIGIN and $PRIGIN are currently treated the same?

> +   * Must follow first two characters with zero or more [A-Za-z0-9_]
> +     (enforced by caller) or '}' (end curly quoted name).
> +
> +   If the sequence is a dynamic string token matching STR then
> +   the length of the DST is returned, otherwise 0.  */
>   static size_t
> -is_dst (const char *start, const char *name, const char *str, int secure)
> +is_dst (const char *name, const char *str)

I'm not entirely happy about the choice of parameter names. 
name/reference or input/name would work better IMHO.

>         /* Point again at the beginning of the name.  */
>         --name;

This assignment is now dead, and the comment is wrong.

(Still need to look at the rest of the patch.)

Thanks,
Florian
Florian Weimer June 6, 2018, 2:15 p.m. | #2
On 06/06/2018 07:02 AM, Carlos O'Donell wrote:
> +/* Passed the start of a DST sequence at the first '$' occurrence.
> +   See the DL_DST_COUNT macro which inlines the strchr to find the
> +   first occurrence of '$' and optimizes that likely case that there
> +   is no DST.  If there is a DST we call into _dl_dst_count to count
> +   the number of DSTs.  We count all known DSTs regardless of
> +   __libc_enable_secure; the caller is responsible for enforcing
> +   the security of the substitution rules (usually
> +   _dl_dst_substitute).  */

Maybe kill DL_DST_COUNT?  It doesn't look useful to me.

> +      /* All DSTs must follow ELF gABI rules, see is_dst ().  */
> +      if ((len = is_dst (name, "ORIGIN")) != 0
> +	  || (len = is_dst (name, "PLATFORM")) != 0
> +	  || (len = is_dst (name, "LIB")) != 0)
>   	++cnt;

len is never read, so you can remove the variable.

Thanks,
Florian
Carlos O'Donell June 6, 2018, 2:55 p.m. | #3
On 06/06/2018 10:00 AM, Florian Weimer wrote:
> On 06/06/2018 07:02 AM, Carlos O'Donell wrote:
>> diff --git a/elf/dl-load.c b/elf/dl-load.c
>> index 431236920f..13263212d5 100644
>> --- a/elf/dl-load.c
>> +++ b/elf/dl-load.c
>> @@ -177,63 +177,89 @@ is_trusted_path_normalize (const char *path, size_t len)
>>     return false;
>>   }
>>   +/* Given a substring starting at NAME, just after the DST '$' start
>> +   token, determine if NAME contains dynamic string token STR,
>> +   following the ELF gABI rules for dynamic string tokens:
>>   +   * Longest possible sequence using the rules (greedy).
>> +
>> +   * Must start with a $ (enforced by caller).
>> +
>> +   * Must follow $ with one underscore or ASCII [A-Za-z] (enforced by
>> +     caller via STR comparison) or '{' (start curly quoted name).
> 
> “enforced by caller via STR comparison”: I don't see this in the remaining code.  So $ORIGIN and $PRIGIN are currently treated the same?

This is a mistake when I moved the code from my test harness into is_dst.

The clarified comment is this:

   * Must follow $ with one underscore or ASCII [A-Za-z] (caller
     follows these rules for STR and we enforce the compariso)
     or '{' (start curly quoted name).

The updated code is this:

  /* Compare the DST (no strncmp this early in startup).  */
  len = 0;
  while (name[len] == str[len] && len < nlen)
    ++len;

  /* Characters of name don't match DST.  */
  if (len != nlen)
    return 0;

Very similar to the original code, but using the computed length
nlen (greedy length of name).

>> +   * Must follow first two characters with zero or more [A-Za-z0-9_]
>> +     (enforced by caller) or '}' (end curly quoted name).
>> +
>> +   If the sequence is a dynamic string token matching STR then
>> +   the length of the DST is returned, otherwise 0.  */
>>   static size_t
>> -is_dst (const char *start, const char *name, const char *str, int secure)
>> +is_dst (const char *name, const char *str)
> 
> I'm not entirely happy about the choice of parameter names. name/reference or input/name would work better IMHO.

Yeah, me neither, I tried to standardize across the board as name/str.

I will change everything over to use:

input - Incoming input byte stream with possible DSTs.
ref - Reference DST for comparison.

>>         /* Point again at the beginning of the name.  */
>>         --name;
> 
> This assignment is now dead, and the comment is wrong.

Good catch. Removed.

> (Still need to look at the rest of the patch.)

Thanks.

Cheers,
Carlos.
Andreas Schwab June 6, 2018, 3:47 p.m. | #4
On Jun 06 2018, Carlos O'Donell <carlos@redhat.com> wrote:

> diff --git a/elf/dl-load.c b/elf/dl-load.c
> index 431236920f..13263212d5 100644
> --- a/elf/dl-load.c
> +++ b/elf/dl-load.c
> @@ -177,63 +177,89 @@ is_trusted_path_normalize (const char *path, size_t len)
>    return false;
>  }
>  
> +/* Given a substring starting at NAME, just after the DST '$' start
> +   token, determine if NAME contains dynamic string token STR,
> +   following the ELF gABI rules for dynamic string tokens:
>  
> +   * Longest possible sequence using the rules (greedy).
> +
> +   * Must start with a $ (enforced by caller).
> +
> +   * Must follow $ with one underscore or ASCII [A-Za-z] (enforced by
> +     caller via STR comparison) or '{' (start curly quoted name).
> +
> +   * Must follow first two characters with zero or more [A-Za-z0-9_]
> +     (enforced by caller) or '}' (end curly quoted name).
> +
> +   If the sequence is a dynamic string token matching STR then
> +   the length of the DST is returned, otherwise 0.  */
>  static size_t
> -is_dst (const char *start, const char *name, const char *str, int secure)
> +is_dst (const char *name, const char *str)
>  {
> -  size_t len;
> +  size_t nlen, slen;
>    bool is_curly = false;
>  
> +  /* Is a ${...} name sequence?  */
>    if (name[0] == '{')
>      {
>        is_curly = true;
>        ++name;
>      }
>  
> -  len = 0;
> -  while (name[len] == str[len] && name[len] != '\0')
> -    ++len;
> +  /* Find longest valid name sequence.  */
> +  nlen = 0;
> +  while ((name[nlen] >= 'A' && name[nlen] <= 'Z')
> +	 || (name[nlen] >= 'a' && name[nlen] <= 'z')
> +	 || (name[nlen] >= '0' && name[nlen] <= '9')
> +	 || (name[nlen] == '_'))
> +    ++nlen;
> +
> +  slen = strlen (str);

You are completely ignoring the contents of str now.  That doesn't make
sense.

Andreas.
Carlos O'Donell June 6, 2018, 3:59 p.m. | #5
On 06/06/2018 10:15 AM, Florian Weimer wrote:
> On 06/06/2018 07:02 AM, Carlos O'Donell wrote:
>> +/* Passed the start of a DST sequence at the first '$' occurrence.
>> +   See the DL_DST_COUNT macro which inlines the strchr to find the
>> +   first occurrence of '$' and optimizes that likely case that there
>> +   is no DST.  If there is a DST we call into _dl_dst_count to count
>> +   the number of DSTs.  We count all known DSTs regardless of
>> +   __libc_enable_secure; the caller is responsible for enforcing
>> +   the security of the substitution rules (usually
>> +   _dl_dst_substitute).  */
> 
> Maybe kill DL_DST_COUNT?  It doesn't look useful to me.

Killed.

>> +      /* All DSTs must follow ELF gABI rules, see is_dst ().  */
>> +      if ((len = is_dst (name, "ORIGIN")) != 0
>> +      || (len = is_dst (name, "PLATFORM")) != 0
>> +      || (len = is_dst (name, "LIB")) != 0)
>>       ++cnt;
> 
> len is never read, so you can remove the variable.

Removed.

v2 patch coming up.

Cheers,
Carlos
Carlos O'Donell June 6, 2018, 4:01 p.m. | #6
On 06/06/2018 11:47 AM, Andreas Schwab wrote:
> You are completely ignoring the contents of str now.  That doesn't make
> sense.

Correct, it was a mistake in the posted patch.

v2 coming with a fix, and a new test case to ensure I don't cut-n-paste
the wrong thing again.

Cheers,
Carlos.
Carlos O'Donell June 6, 2018, 4:10 p.m. | #7
On 06/06/2018 01:02 AM, Carlos O'Donell wrote:
> This commit improves DST handling significantly in the following

v2

- Fix is_dst() by adding back string comparison, and clarify comment.
  Added extra test testcases (49 now) to cover this error.
- Renamed parameters as 'input' for data from ELF file, and 'ref' as
  reference DST to compare against.
- Removed dead update to name in is_dst().
- Killed DL_DST_COUNT, not needed really, and a weak optimization.
- Did not remove len from _dl_dst_count because we use it to advance
  input more quickly to the next DST.

This commit improves DST handling significantly in the following
ways: firstly is_dst () is overhauled to correctly process DST
sequences that would be accepted given the ELF gABI. This means that
we actually now accept slightly more sequences than before.  Now we
accept $ORIGIN$ORIGIN, but in the past we accepted only $ORIGIN\0 or
$ORIGIN/..., but this kind of behaviour results in unexpected
and uninterpreted DST sequences being used as literal search paths
leading to security defects. Therefore the first step in correcting
this defect is making is_dst () properly account for all DSTs
and making the function context free in the sense that it counts
DSTs without knowledge of path, or AT_SECURE. Next, _dl_dst_count ()
is also simplified to count all DSTs regardless of context.
Then in _dl_dst_substitute () we reintroduce context-dependent
processing for such things as AT_SECURE handling. At the level of
_dl_dst_substitute we can have access to things like the true start
of the string sequence to validate $ORIGIN-based paths rooted in
trusted directories.  Lastly, callers of _dl_dst_substitute () are
adjusted to pass in the start of their string sequences, this includes
expand_dynamic_string_token () and fillin_rpath (). Lastly, after this
commit we tighten up the accepted sequences in AT_SECURE, and avoid
leaving unexpanded DSTs, this is noted in the NEWS entry.

Verified with a sequence of 49 tests on x86_64 that cover
non-AT_SECURE and AT_SECURE testing using a sysroot (requires root
to run). The tests cover cases for bug 23102, bug 21942, bug 18018,
and bug 23259. These tests are not yet appropriate for the glibc
regression testsuite, but with the upcoming test-in-container testing
framework it should be possible to include these tests upstream soon.

See the mailing list for the tests:
https://www.sourceware.org/ml/libc-alpha/2018-06/msg00073.html
---
 ChangeLog                  |  20 ++++++
 NEWS                       |  10 +++
 elf/dl-deps.c              |   4 +-
 elf/dl-dst.h               |  13 ----
 elf/dl-load.c              | 174 +++++++++++++++++++++++++++++++--------------
 sysdeps/generic/ldsodefs.h |   5 +-
 6 files changed, 155 insertions(+), 71 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index a3bc2bf31e..f6dbc961e2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,23 @@
+2018-06-06  Carlos O'Donell  <carlos@redhat.com>
+	    Andreas Schwab  <schwab@suse.de>
+	    Dmitry V. Levin  <ldv@altlinux.org>
+
+	[BZ #23102]
+	[BZ #21942]
+	[BZ #18018]
+	[BZ #23259]
+	CVE-2011-0536
+	* elf/dl-load.c (is_dst): Comment.  Support ELF gABI.
+	(_dl_dst_count): Comment.  Simplify and count DSTs.
+	(_dl_dst_substitute): Comment.  Support __libc_enable_secure handling.
+	Add string start to arguments.
+	(expand_dybamic_string_token): Comment. Accept path start.
+	(fillin_rpath): Pass string start to expand_dynamic_string_token.
+	* sysdeps/generic/ldsodefs.h: _dl_dst_substitute takes additiional
+	string start argument.
+	* elf/dl-deps.c (expand_dst): Adjust call to _dl_dst_substitute.
+	* elf/dl-dst.h: Remove DL_DST_COUNT.
+
 2018-06-05  Joseph Myers  <joseph@codesourcery.com>
 
 	* sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h (HWCAP_DIT): New
diff --git a/NEWS b/NEWS
index e2a6f45121..0d0bc9ad4c 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,16 @@ Major new features:
   NI_IDN_ALLOW_UNASSIGNED, NI_IDN_USE_STD3_ASCII_RULES) have been
   deprecated.  They no longer have any effect.
 
+* Parsing of dynamic string tokens in DT_RPATH, DT_RUNPATH, and DT_NEEDED
+  has been expanded to support the full range of ELF gABI expressions
+  including such constructs as '$ORIGIN$ORIGIN' (if valid).  For SUID/GUID
+  applications the rules have been further restricted, and where in the
+  past a dynamic string token sequence may have been interpreted as a
+  literal string it will now cause a load failure.  These load failures
+  were always considered unspecified behaviour from the perspective of the
+  dynamic loader, and for safety are now load errors e.g. /foo/${ORIGIN}.so
+  in DT_NEEDED results in a load failure now.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The nonstandard header files <libio.h> and <_G_config.h> are no longer
diff --git a/elf/dl-deps.c b/elf/dl-deps.c
index c975fcffd7..07b0859456 100644
--- a/elf/dl-deps.c
+++ b/elf/dl-deps.c
@@ -100,7 +100,7 @@ struct list
   ({									      \
     const char *__str = (str);						      \
     const char *__result = __str;					      \
-    size_t __dst_cnt = DL_DST_COUNT (__str);				      \
+    size_t __dst_cnt = _dl_dst_count (__str);				      \
 									      \
     if (__dst_cnt != 0)							      \
       {									      \
@@ -114,7 +114,7 @@ DST not allowed in SUID/SGID programs"));				      \
 	__newp = (char *) alloca (DL_DST_REQUIRED (l, __str, strlen (__str),  \
 						   __dst_cnt));		      \
 									      \
-	__result = _dl_dst_substitute (l, __str, __newp);		      \
+	__result = _dl_dst_substitute (l, __str, __str, __newp);	      \
 									      \
 	if (*__result == '\0')						      \
 	  {								      \
diff --git a/elf/dl-dst.h b/elf/dl-dst.h
index 32de5d225a..859032be0d 100644
--- a/elf/dl-dst.h
+++ b/elf/dl-dst.h
@@ -18,19 +18,6 @@
 
 #include "trusted-dirs.h"
 
-/* Determine the number of DST elements in the name.  Only if IS_PATH is
-   nonzero paths are recognized (i.e., multiple, ':' separated filenames).  */
-#define DL_DST_COUNT(name) \
-  ({									      \
-    size_t __cnt = 0;							      \
-    const char *__sf = strchr (name, '$');				      \
-									      \
-    if (__glibc_unlikely (__sf != NULL))				      \
-      __cnt = _dl_dst_count (__sf);					      \
-									      \
-    __cnt; })
-
-
 #ifdef SHARED
 # define IS_RTLD(l) (l) == &GL(dl_rtld_map)
 #else
diff --git a/elf/dl-load.c b/elf/dl-load.c
index 431236920f..a3f0b3f435 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -177,76 +177,115 @@ is_trusted_path_normalize (const char *path, size_t len)
   return false;
 }
 
+/* Given a substring starting at INPUT, just after the DST '$' start
+   token, determine if INPUT contains DST token REF, following the
+   ELF gABI rules for DSTs:
 
+   * Longest possible sequence using the rules (greedy).
+
+   * Must start with a $ (enforced by caller).
+
+   * Must follow $ with one underscore or ASCII [A-Za-z] (caller
+     follows these rules for REF and we enforce the comparison)
+     or '{' (start curly quoted name).
+
+   * Must follow first two characters with zero or more [A-Za-z0-9_]
+     (enforced by caller) or '}' (end curly quoted name).
+
+   If the sequence is a DST matching REF then the length of the DST is
+   returned, otherwise 0.  */
 static size_t
-is_dst (const char *start, const char *name, const char *str, int secure)
+is_dst (const char *input, const char *ref)
 {
-  size_t len;
+  size_t len, ilen, rlen;
   bool is_curly = false;
 
-  if (name[0] == '{')
+  /* Is a ${...} input sequence?  */
+  if (input[0] == '{')
     {
       is_curly = true;
-      ++name;
+      ++input;
     }
 
+  /* Find longest valid input sequence.  */
+  ilen = 0;
+  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
+	 || (input[ilen] >= 'a' && input[ilen] <= 'z')
+	 || (input[ilen] >= '0' && input[ilen] <= '9')
+	 || (input[ilen] == '_'))
+    ++ilen;
+
+  rlen = strlen (ref);
+
+  /* Can't be the DST we are looking for.  */
+  if (rlen != ilen)
+    return 0;
+
+  /* Compare the DST (no strncmp this early in startup).  */
   len = 0;
-  while (name[len] == str[len] && name[len] != '\0')
+  while (input[len] == ref[len] && len < ilen)
     ++len;
 
+  /* Characters of input don't match DST.  */
+  if (len != ilen)
+    return 0;
+
   if (is_curly)
     {
-      if (name[len] != '}')
+      /* Invalid curly sequence!  */
+      if (input[ilen] != '}')
 	return 0;
 
-      /* Point again at the beginning of the name.  */
-      --name;
-      /* Skip over closing curly brace and adjust for the --name.  */
-      len += 2;
+      /* Count the two curly braces.  */
+      ilen += 2;
     }
-  else if (name[len] != '\0' && name[len] != '/')
-    return 0;
 
-  if (__glibc_unlikely (secure)
-      && ((name[len] != '\0' && name[len] != '/')
-	  || (name != start + 1)))
-    return 0;
-
-  return len;
+  return ilen;
 }
 
-
+/* INPUT is the start of a DST sequence at the first '$' occurrence.
+   If there is a DST we call into _dl_dst_count to count the number of
+   DSTs.  We count all known DSTs regardless of __libc_enable_secure;
+   the caller is responsible for enforcing the security of the
+   substitution rules (usually _dl_dst_substitute).  */
 size_t
-_dl_dst_count (const char *name)
+_dl_dst_count (const char *input)
 {
-  const char *const start = name;
   size_t cnt = 0;
 
+  input = strchr (input, '$');
+
+  /* Most likely there is no DST.  */
+  if (__glibc_likely (input == NULL))
+    return 0;
+
   do
     {
       size_t len;
 
-      /* $ORIGIN is not expanded for SUID/GUID programs (except if it
-	 is $ORIGIN alone) and it must always appear first in path.  */
-      ++name;
-      if ((len = is_dst (start, name, "ORIGIN", __libc_enable_secure)) != 0
-	  || (len = is_dst (start, name, "PLATFORM", 0)) != 0
-	  || (len = is_dst (start, name, "LIB", 0)) != 0)
+      ++input;
+      /* All DSTs must follow ELF gABI rules, see is_dst ().  */
+      if ((len = is_dst (input, "ORIGIN")) != 0
+	  || (len = is_dst (input, "PLATFORM")) != 0
+	  || (len = is_dst (input, "LIB")) != 0)
 	++cnt;
 
-      name = strchr (name + len, '$');
+      /* There may be more than one DST in the input.  */
+      input = strchr (input + len, '$');
     }
-  while (name != NULL);
+  while (input != NULL);
 
   return cnt;
 }
 
-
+/* Process INPUT for DSTs and store in RESULT using the information from
+   link map L to resolve the DSTs.  The value of START must equal the
+   start of the parent string if INPUT is a substring sequence being
+   parsed with path separators e.g. $ORIGIN:$PLATFORM.  */
 char *
-_dl_dst_substitute (struct link_map *l, const char *name, char *result)
+_dl_dst_substitute (struct link_map *l, const char *start,
+		    const char *input, char *result)
 {
-  const char *const start = name;
-
   /* Now fill the result path.  While copying over the string we keep
      track of the start of the last path element.  When we come across
      a DST we copy over the value or (if the value is not available)
@@ -257,32 +296,54 @@ _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 
   do
     {
-      if (__glibc_unlikely (*name == '$'))
+      if (__glibc_unlikely (*input == '$'))
 	{
 	  const char *repl = NULL;
 	  size_t len;
 
-	  ++name;
-	  if ((len = is_dst (start, name, "ORIGIN", __libc_enable_secure)) != 0)
+	  ++input;
+	  if ((len = is_dst (input, "ORIGIN")) != 0)
 	    {
-	      repl = l->l_origin;
+	      /* For SUID/GUID programs we normally ignore the path with
+		 a DST in DT_RUNPATH, or DT_RPATH.  However, there is one
+		 exception to this rule, and it is:
+
+		   * $ORIGIN appears first in the path element.
+		   * The path element is rooted in a trusted directory.
+
+		 This exception allows such programs to reference
+		 shared libraries in subdirectories of trusted
+		 directories.  The use case is one of general
+		 organization and deployment flexibility.
+		 Trusted directories are usually such paths as "/lib64"
+		 or "/lib".  */
+	      if (__glibc_unlikely (__libc_enable_secure)
+		  && ((input[len] != '\0' && input[len] != '/'
+		       && input[len] != ':')
+		      || (input != start + 1
+			  || (input > start + 2 && input[-2] != ':'))))
+		repl = (const char *) -1;
+	      else
+	        repl = l->l_origin;
+
 	      check_for_trusted = (__libc_enable_secure
 				   && l->l_type == lt_executable);
 	    }
-	  else if ((len = is_dst (start, name, "PLATFORM", 0)) != 0)
+	  else if ((len = is_dst (input, "PLATFORM")) != 0)
 	    repl = GLRO(dl_platform);
-	  else if ((len = is_dst (start, name, "LIB", 0)) != 0)
+	  else if ((len = is_dst (input, "LIB")) != 0)
 	    repl = DL_DST_LIB;
 
 	  if (repl != NULL && repl != (const char *) -1)
 	    {
 	      wp = __stpcpy (wp, repl);
-	      name += len;
+	      input += len;
 	    }
 	  else if (len > 1)
 	    {
 	      /* We cannot use this path element, the value of the
 		 replacement is unknown.  */
+	      check_for_trusted = false;
 	      wp = last_elem;
 	      break;
 	    }
@@ -292,10 +353,10 @@ _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 	}
       else
 	{
-	  *wp++ = *name++;
+	  *wp++ = *input++;
 	}
     }
-  while (*name != '\0');
+  while (*input != '\0');
 
   /* In SUID/SGID programs, after $ORIGIN expansion the normalized
      path must be rooted in one of the trusted directories.  */
@@ -309,13 +370,17 @@ _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 }
 
 
-/* Return copy of argument with all recognized dynamic string tokens
-   ($ORIGIN and $PLATFORM for now) replaced.  On some platforms it
-   might not be possible to determine the path from which the object
-   belonging to the map is loaded.  In this case the path element
-   containing $ORIGIN is left out.  */
+/* Return a malloc allocated copy of INPUT with all recognized DSTs
+   ($ORIGIN and $PLATFORM for now) replaced.  The value of START must
+   equal the start of the parent string if INPUT is a substring
+   sequence being parsed with path separators e.g. $ORIGIN:$PLATFORM.
+   On some platforms it might not be possible to determine the path
+   from which the object belonging to the map is loaded.  In this case
+   the path element containing $ORIGIN is left out.  On error NULL is
+   returned. */
 static char *
-expand_dynamic_string_token (struct link_map *l, const char *s)
+expand_dynamic_string_token (struct link_map *l, const char *start,
+			     const char *input)
 {
   /* We make two runs over the string.  First we determine how large the
      resulting string is and then we copy it over.  Since this is no
@@ -326,21 +391,21 @@ expand_dynamic_string_token (struct link_map *l, const char *s)
   char *result;
 
   /* Determine the number of DST elements.  */
-  cnt = DL_DST_COUNT (s);
+  cnt = _dl_dst_count (input);
 
   /* If we do not have to replace anything simply copy the string.  */
   if (__glibc_likely (cnt == 0))
-    return __strdup (s);
+    return __strdup (input);
 
   /* Determine the length of the substituted string.  */
-  total = DL_DST_REQUIRED (l, s, strlen (s), cnt);
+  total = DL_DST_REQUIRED (l, input, strlen (input), cnt);
 
   /* Allocate the necessary memory.  */
   result = (char *) malloc (total + 1);
   if (result == NULL)
     return NULL;
 
-  return _dl_dst_substitute (l, s, result);
+  return _dl_dst_substitute (l, start, input, result);
 }
 
 
@@ -387,6 +452,7 @@ fillin_rpath (char *rpath, struct r_search_path_elem **result, const char *sep,
 	      const char *what, const char *where, struct link_map *l)
 {
   char *cp;
+  char *start = rpath;
   size_t nelems = 0;
 
   while ((cp = __strsep (&rpath, sep)) != NULL)
@@ -398,7 +464,7 @@ fillin_rpath (char *rpath, struct r_search_path_elem **result, const char *sep,
       /* `strsep' can pass an empty string.  */
       if (*cp != '\0')
 	{
-	  to_free = cp = expand_dynamic_string_token (l, cp);
+	  to_free = cp = expand_dynamic_string_token (l, start, cp);
 
 	  /* expand_dynamic_string_token can return NULL in case of empty
 	     path or memory allocation failure.  */
@@ -2091,7 +2157,7 @@ _dl_map_object (struct link_map *loader, const char *name,
     {
       /* The path may contain dynamic string tokens.  */
       realname = (loader
-		  ? expand_dynamic_string_token (loader, name)
+		  ? expand_dynamic_string_token (loader, name, name)
 		  : __strdup (name));
       if (realname == NULL)
 	fd = -1;
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 95dc87519b..688ff60785 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -1108,8 +1108,9 @@ extern const char *_dl_get_origin (void) attribute_hidden;
 extern size_t _dl_dst_count (const char *name) attribute_hidden;
 
 /* Substitute DST values.  */
-extern char *_dl_dst_substitute (struct link_map *l, const char *name,
-				 char *result) attribute_hidden;
+extern char *_dl_dst_substitute (struct link_map *l, const char *start,
+				 const char *name, char *result)
+     attribute_hidden;
 
 /* Open the shared object NAME, relocate it, and run its initializer if it
    hasn't already been run.  MODE is as for `dlopen' (see <dlfcn.h>).  If
Andreas Schwab June 6, 2018, 4:30 p.m. | #8
On Jun 06 2018, Carlos O'Donell <carlos@redhat.com> wrote:

> +  /* Find longest valid input sequence.  */
> +  ilen = 0;
> +  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
> +	 || (input[ilen] >= 'a' && input[ilen] <= 'z')
> +	 || (input[ilen] >= '0' && input[ilen] <= '9')
> +	 || (input[ilen] == '_'))
> +    ++ilen;
> +
> +  rlen = strlen (ref);
> +
> +  /* Can't be the DST we are looking for.  */
> +  if (rlen != ilen)
> +    return 0;

Why do you need that?  Just compare, then check the next character.

Andreas.
Carlos O'Donell June 6, 2018, 5:10 p.m. | #9
On 06/06/2018 12:30 PM, Andreas Schwab wrote:
> On Jun 06 2018, Carlos O'Donell <carlos@redhat.com> wrote:
> 
>> +  /* Find longest valid input sequence.  */
>> +  ilen = 0;
>> +  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
>> +	 || (input[ilen] >= 'a' && input[ilen] <= 'z')
>> +	 || (input[ilen] >= '0' && input[ilen] <= '9')
>> +	 || (input[ilen] == '_'))
>> +    ++ilen;
>> +
>> +  rlen = strlen (ref);
>> +
>> +  /* Can't be the DST we are looking for.  */
>> +  if (rlen != ilen)
>> +    return 0;
> 
> Why do you need that?  Just compare, then check the next character.

Are you suggesting that:
~~~
rlen = strlen (ref);

/* Can't be the DST we are looking for.  */
if (rlen != ilen)
  return 0;
~~~
Can be dropped because we are going to compare the strings anyway?

I can do that.

Cheers,
Carlos.
Florian Weimer June 6, 2018, 5:28 p.m. | #10
On 06/06/2018 07:10 PM, Carlos O'Donell wrote:
> On 06/06/2018 12:30 PM, Andreas Schwab wrote:
>> On Jun 06 2018, Carlos O'Donell <carlos@redhat.com> wrote:
>>
>>> +  /* Find longest valid input sequence.  */
>>> +  ilen = 0;
>>> +  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
>>> +	 || (input[ilen] >= 'a' && input[ilen] <= 'z')
>>> +	 || (input[ilen] >= '0' && input[ilen] <= '9')
>>> +	 || (input[ilen] == '_'))
>>> +    ++ilen;
>>> +
>>> +  rlen = strlen (ref);
>>> +
>>> +  /* Can't be the DST we are looking for.  */
>>> +  if (rlen != ilen)
>>> +    return 0;
>>
>> Why do you need that?  Just compare, then check the next character.
> 
> Are you suggesting that:
> ~~~
> rlen = strlen (ref);
> 
> /* Can't be the DST we are looking for.  */
> if (rlen != ilen)
>    return 0;
> ~~~
> Can be dropped because we are going to compare the strings anyway?
> 
> I can do that.

If you compare the lengths first, yo ucan use memcmp.

You could compute the length of ref in an inline wrapper function, so 
that GCC will turn it into a compile-time constant.

Thanks,
Florian
Carlos O'Donell June 6, 2018, 6:49 p.m. | #11
On 06/06/2018 01:28 PM, Florian Weimer wrote:
> On 06/06/2018 07:10 PM, Carlos O'Donell wrote:
>> On 06/06/2018 12:30 PM, Andreas Schwab wrote:
>>> On Jun 06 2018, Carlos O'Donell <carlos@redhat.com> wrote:
>>>
>>>> +  /* Find longest valid input sequence.  */
>>>> +  ilen = 0;
>>>> +  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
>>>> +     || (input[ilen] >= 'a' && input[ilen] <= 'z')
>>>> +     || (input[ilen] >= '0' && input[ilen] <= '9')
>>>> +     || (input[ilen] == '_'))
>>>> +    ++ilen;
>>>> +
>>>> +  rlen = strlen (ref);
>>>> +
>>>> +  /* Can't be the DST we are looking for.  */
>>>> +  if (rlen != ilen)
>>>> +    return 0;
>>>
>>> Why do you need that?  Just compare, then check the next character.
>>
>> Are you suggesting that:
>> ~~~
>> rlen = strlen (ref);
>>
>> /* Can't be the DST we are looking for.  */
>> if (rlen != ilen)
>>    return 0;
>> ~~~
>> Can be dropped because we are going to compare the strings anyway?
>>
>> I can do that.
> 
> If you compare the lengths first, yo ucan use memcmp.
> 
> You could compute the length of ref in an inline wrapper function, so that GCC will turn it into a compile-time constant.

The memcmp is a good suggestion.

Like so?

 
-  len = 0;
-  while (name[len] == str[len] && name[len] != '\0')
-    ++len;
+  /* Find longest valid input sequence.  */
+  ilen = 0;
+  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
+        || (input[ilen] >= 'a' && input[ilen] <= 'z')
+        || (input[ilen] >= '0' && input[ilen] <= '9')
+        || (input[ilen] == '_'))
+    ++ilen;
+
+  rlen = strlen (ref);
+
+  /* Can't be the DST we are looking for.  */
+  if (rlen != ilen)
+    return 0;
+
+  /* Compare the DST (no strncmp this early in startup).  */
+  if (memcmp (input, ref, ilen) != 0)
+    return 0;


The inline wrapper seems overkill for the rare is_dst() case
continuing past the strchr for '$'.

Cheers,
Carlos.
Florian Weimer June 6, 2018, 6:56 p.m. | #12
On 06/06/2018 08:49 PM, Carlos O'Donell wrote:
> On 06/06/2018 01:28 PM, Florian Weimer wrote:
>> On 06/06/2018 07:10 PM, Carlos O'Donell wrote:
>>> On 06/06/2018 12:30 PM, Andreas Schwab wrote:
>>>> On Jun 06 2018, Carlos O'Donell <carlos@redhat.com> wrote:
>>>>
>>>>> +  /* Find longest valid input sequence.  */
>>>>> +  ilen = 0;
>>>>> +  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
>>>>> +     || (input[ilen] >= 'a' && input[ilen] <= 'z')
>>>>> +     || (input[ilen] >= '0' && input[ilen] <= '9')
>>>>> +     || (input[ilen] == '_'))
>>>>> +    ++ilen;
>>>>> +
>>>>> +  rlen = strlen (ref);
>>>>> +
>>>>> +  /* Can't be the DST we are looking for.  */
>>>>> +  if (rlen != ilen)
>>>>> +    return 0;
>>>>
>>>> Why do you need that?  Just compare, then check the next character.
>>>
>>> Are you suggesting that:
>>> ~~~
>>> rlen = strlen (ref);
>>>
>>> /* Can't be the DST we are looking for.  */
>>> if (rlen != ilen)
>>>     return 0;
>>> ~~~
>>> Can be dropped because we are going to compare the strings anyway?
>>>
>>> I can do that.
>>
>> If you compare the lengths first, yo ucan use memcmp.
>>
>> You could compute the length of ref in an inline wrapper function, so that GCC will turn it into a compile-time constant.
> 
> The memcmp is a good suggestion.
> 
> Like so?
> 
>   
> -  len = 0;
> -  while (name[len] == str[len] && name[len] != '\0')
> -    ++len;
> +  /* Find longest valid input sequence.  */
> +  ilen = 0;
> +  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
> +        || (input[ilen] >= 'a' && input[ilen] <= 'z')
> +        || (input[ilen] >= '0' && input[ilen] <= '9')
> +        || (input[ilen] == '_'))
> +    ++ilen;
> +
> +  rlen = strlen (ref);
> +
> +  /* Can't be the DST we are looking for.  */
> +  if (rlen != ilen)
> +    return 0;
> +
> +  /* Compare the DST (no strncmp this early in startup).  */
> +  if (memcmp (input, ref, ilen) != 0)
> +    return 0;

Yes, that is what I had in mind.

> The inline wrapper seems overkill for the rare is_dst() case
> continuing past the strchr for '$'.

I have no strong preference either way.

Thanks,
Florian
Carlos O'Donell June 6, 2018, 8:03 p.m. | #13
On 06/06/2018 02:56 PM, Florian Weimer wrote:
> On 06/06/2018 08:49 PM, Carlos O'Donell wrote:
>> On 06/06/2018 01:28 PM, Florian Weimer wrote:
>>> On 06/06/2018 07:10 PM, Carlos O'Donell wrote:
>>>> On 06/06/2018 12:30 PM, Andreas Schwab wrote:
>>>>> On Jun 06 2018, Carlos O'Donell <carlos@redhat.com> wrote:
>>>>>
>>>>>> +  /* Find longest valid input sequence.  */
>>>>>> +  ilen = 0;
>>>>>> +  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
>>>>>> +     || (input[ilen] >= 'a' && input[ilen] <= 'z')
>>>>>> +     || (input[ilen] >= '0' && input[ilen] <= '9')
>>>>>> +     || (input[ilen] == '_'))
>>>>>> +    ++ilen;
>>>>>> +
>>>>>> +  rlen = strlen (ref);
>>>>>> +
>>>>>> +  /* Can't be the DST we are looking for.  */
>>>>>> +  if (rlen != ilen)
>>>>>> +    return 0;
>>>>>
>>>>> Why do you need that?  Just compare, then check the next character.
>>>>
>>>> Are you suggesting that:
>>>> ~~~
>>>> rlen = strlen (ref);
>>>>
>>>> /* Can't be the DST we are looking for.  */
>>>> if (rlen != ilen)
>>>>     return 0;
>>>> ~~~
>>>> Can be dropped because we are going to compare the strings anyway?
>>>>
>>>> I can do that.
>>>
>>> If you compare the lengths first, yo ucan use memcmp.
>>>
>>> You could compute the length of ref in an inline wrapper function, so that GCC will turn it into a compile-time constant.
>>
>> The memcmp is a good suggestion.
>>
>> Like so?
>>
>>   -  len = 0;
>> -  while (name[len] == str[len] && name[len] != '\0')
>> -    ++len;
>> +  /* Find longest valid input sequence.  */
>> +  ilen = 0;
>> +  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
>> +        || (input[ilen] >= 'a' && input[ilen] <= 'z')
>> +        || (input[ilen] >= '0' && input[ilen] <= '9')
>> +        || (input[ilen] == '_'))
>> +    ++ilen;
>> +
>> +  rlen = strlen (ref);
>> +
>> +  /* Can't be the DST we are looking for.  */
>> +  if (rlen != ilen)
>> +    return 0;
>> +
>> +  /* Compare the DST (no strncmp this early in startup).  */
>> +  if (memcmp (input, ref, ilen) != 0)
>> +    return 0;
> 
> Yes, that is what I had in mind.
> 
>> The inline wrapper seems overkill for the rare is_dst() case
>> continuing past the strchr for '$'.
> 
> I have no strong preference either way.

Well, this turned up another issue to consider.

Test 17  [SUID]: Verify ${} (invalid) in DT_NEEDED discards the DT_NEEDED.
     13496:     _dl_dst_count ${}liborigin.so (4003a9)
     13496:     is_dst input={}liborigin.so (begin)
     13496:     is_dst ilen=0 rlen=6
     13496:     is_dst input={}liborigin.so (begin)
     13496:     is_dst ilen=0 rlen=8
     13496:     is_dst input={}liborigin.so (begin)
     13496:     is_dst ilen=0 rlen=3
     13496:     _dl_dst_count next {}liborigin.so
     13496:     _dl_dst_count cnt 0
     13496:     security = 1
     13496:     security = 1
     13496:     security = 1
     13496:     security = 1
origin: Function called.
FAIL: Incorrectly allowed DT_NEEDED with ${}.

For ilen == 0 cases like ${} or '$' the DST is not counted, since
it's invalid, has length 0, and doesn't meet the ELF gABI for being
considered a DST, and we invoke:
~~~
If a dollar sign is not immediately followed by a name or a 
brace-enclosed name, the behavior of the dynamic linker is unspecified. 
~~~

In theory ${} or '$' is an invalid name sequence.

Similar invalid name sequences like '${libnoendcurly.so' are possible and
they will appear unsubstituted in the output for AT_SECURE.

The question is:

There is a difference between a invalid DST sequence (one
which doesn't follow the ELF gABI), and one which does conform but
for which there is no substitution (unknown name).

So currently *valid* DST sequences that have no substitution are
counted and removed entirely (skipped) or can cause load errors.

However, invalid sequences, as above, are allowed unsubstituted
to pass into the search paths as literals.

Should, once we detect $ as the start of the DST sequences, cause
the whole sequence to be skipped even if it's an invalid sequence?

I think we need to be conservative here and just let these invalid
sequences pass through.

Cheers,
Carlos.
Carlos O'Donell June 6, 2018, 8:18 p.m. | #14
On 06/06/2018 12:10 PM, Carlos O'Donell wrote:
> On 06/06/2018 01:02 AM, Carlos O'Donell wrote:
>> This commit improves DST handling significantly in the following
> 
> v2
> 
> - Fix is_dst() by adding back string comparison, and clarify comment.
>   Added extra test testcases (49 now) to cover this error.
> - Renamed parameters as 'input' for data from ELF file, and 'ref' as
>   reference DST to compare against.
> - Removed dead update to name in is_dst().
> - Killed DL_DST_COUNT, not needed really, and a weak optimization.
> - Did not remove len from _dl_dst_count because we use it to advance
>   input more quickly to the next DST.
> 

v3
- Use memcmp in is_dst().
- In _dl_dst_substitute use 'len != 0' conditional to clarify that
  all we are looking to do is distinguish between a valid DST
  for which we don't recognize the DST, and an invalid DST which
  may just be stray characters which we will copy to the result.
- Added a few more tests: Small invalid sequence e.g. ${}, and
  large valid sequence with unknown DST.

This commit improves DST handling significantly in the following
ways: firstly is_dst () is overhauled to correctly process DST
sequences that would be accepted given the ELF gABI. This means that
we actually now accept slightly more sequences than before.  Now we
accept $ORIGIN$ORIGIN, but in the past we accepted only $ORIGIN\0 or
$ORIGIN/..., but this kind of behaviour results in unexpected
and uninterpreted DST sequences being used as literal search paths
leading to security defects. Therefore the first step in correcting
this defect is making is_dst () properly account for all DSTs
and making the function context free in the sense that it counts
DSTs without knowledge of path, or AT_SECURE. Next, _dl_dst_count ()
is also simplified to count all DSTs regardless of context.
Then in _dl_dst_substitute () we reintroduce context-dependent
processing for such things as AT_SECURE handling. At the level of
_dl_dst_substitute we can have access to things like the true start
of the string sequence to validate $ORIGIN-based paths rooted in
trusted directories.  Lastly, callers of _dl_dst_substitute () are
adjusted to pass in the start of their string sequences, this includes
expand_dynamic_string_token () and fillin_rpath (). Lastly, after this
commit we tighten up the accepted sequences in AT_SECURE, and avoid
leaving unexpanded DSTs, this is noted in the NEWS entry.

Verified with a sequence of 56 tests on x86_64 that cover
non-AT_SECURE and AT_SECURE testing using a sysroot (requires root
to run). The tests cover cases for bug 23102, bug 21942, bug 18018,
and bug 23259. These tests are not yet appropriate for the glibc
regression testsuite, but with the upcoming test-in-container testing
framework it should be possible to include these tests upstream soon.

See the mailing list for the tests:
https://www.sourceware.org/ml/libc-alpha/2018-06/msg00073.html
---
 ChangeLog                  |  20 ++++++
 NEWS                       |  10 +++
 elf/dl-deps.c              |   4 +-
 elf/dl-dst.h               |  13 ----
 elf/dl-load.c              | 175 ++++++++++++++++++++++++++++++---------------
 sysdeps/generic/ldsodefs.h |   5 +-
 6 files changed, 153 insertions(+), 74 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index a3bc2bf31e..f6dbc961e2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,23 @@
+2018-06-06  Carlos O'Donell  <carlos@redhat.com>
+	    Andreas Schwab  <schwab@suse.de>
+	    Dmitry V. Levin  <ldv@altlinux.org>
+
+	[BZ #23102]
+	[BZ #21942]
+	[BZ #18018]
+	[BZ #23259]
+	CVE-2011-0536
+	* elf/dl-load.c (is_dst): Comment.  Support ELF gABI.
+	(_dl_dst_count): Comment.  Simplify and count DSTs.
+	(_dl_dst_substitute): Comment.  Support __libc_enable_secure handling.
+	Add string start to arguments.
+	(expand_dybamic_string_token): Comment. Accept path start.
+	(fillin_rpath): Pass string start to expand_dynamic_string_token.
+	* sysdeps/generic/ldsodefs.h: _dl_dst_substitute takes additiional
+	string start argument.
+	* elf/dl-deps.c (expand_dst): Adjust call to _dl_dst_substitute.
+	* elf/dl-dst.h: Remove DL_DST_COUNT.
+
 2018-06-05  Joseph Myers  <joseph@codesourcery.com>
 
 	* sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h (HWCAP_DIT): New
diff --git a/NEWS b/NEWS
index e2a6f45121..0d0bc9ad4c 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,16 @@ Major new features:
   NI_IDN_ALLOW_UNASSIGNED, NI_IDN_USE_STD3_ASCII_RULES) have been
   deprecated.  They no longer have any effect.
 
+* Parsing of dynamic string tokens in DT_RPATH, DT_RUNPATH, and DT_NEEDED
+  has been expanded to support the full range of ELF gABI expressions
+  including such constructs as '$ORIGIN$ORIGIN' (if valid).  For SUID/GUID
+  applications the rules have been further restricted, and where in the
+  past a dynamic string token sequence may have been interpreted as a
+  literal string it will now cause a load failure.  These load failures
+  were always considered unspecified behaviour from the perspective of the
+  dynamic loader, and for safety are now load errors e.g. /foo/${ORIGIN}.so
+  in DT_NEEDED results in a load failure now.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The nonstandard header files <libio.h> and <_G_config.h> are no longer
diff --git a/elf/dl-deps.c b/elf/dl-deps.c
index c975fcffd7..07b0859456 100644
--- a/elf/dl-deps.c
+++ b/elf/dl-deps.c
@@ -100,7 +100,7 @@ struct list
   ({									      \
     const char *__str = (str);						      \
     const char *__result = __str;					      \
-    size_t __dst_cnt = DL_DST_COUNT (__str);				      \
+    size_t __dst_cnt = _dl_dst_count (__str);				      \
 									      \
     if (__dst_cnt != 0)							      \
       {									      \
@@ -114,7 +114,7 @@ DST not allowed in SUID/SGID programs"));				      \
 	__newp = (char *) alloca (DL_DST_REQUIRED (l, __str, strlen (__str),  \
 						   __dst_cnt));		      \
 									      \
-	__result = _dl_dst_substitute (l, __str, __newp);		      \
+	__result = _dl_dst_substitute (l, __str, __str, __newp);	      \
 									      \
 	if (*__result == '\0')						      \
 	  {								      \
diff --git a/elf/dl-dst.h b/elf/dl-dst.h
index 32de5d225a..859032be0d 100644
--- a/elf/dl-dst.h
+++ b/elf/dl-dst.h
@@ -18,19 +18,6 @@
 
 #include "trusted-dirs.h"
 
-/* Determine the number of DST elements in the name.  Only if IS_PATH is
-   nonzero paths are recognized (i.e., multiple, ':' separated filenames).  */
-#define DL_DST_COUNT(name) \
-  ({									      \
-    size_t __cnt = 0;							      \
-    const char *__sf = strchr (name, '$');				      \
-									      \
-    if (__glibc_unlikely (__sf != NULL))				      \
-      __cnt = _dl_dst_count (__sf);					      \
-									      \
-    __cnt; })
-
-
 #ifdef SHARED
 # define IS_RTLD(l) (l) == &GL(dl_rtld_map)
 #else
diff --git a/elf/dl-load.c b/elf/dl-load.c
index 431236920f..ff7a95bee2 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -177,76 +177,110 @@ is_trusted_path_normalize (const char *path, size_t len)
   return false;
 }
 
+/* Given a substring starting at INPUT, just after the DST '$' start
+   token, determine if INPUT contains DST token REF, following the
+   ELF gABI rules for DSTs:
 
+   * Longest possible sequence using the rules (greedy).
+
+   * Must start with a $ (enforced by caller).
+
+   * Must follow $ with one underscore or ASCII [A-Za-z] (caller
+     follows these rules for REF and we enforce the comparison)
+     or '{' (start curly quoted name).
+
+   * Must follow first two characters with zero or more [A-Za-z0-9_]
+     (enforced by caller) or '}' (end curly quoted name).
+
+   If the sequence is a DST matching REF then the length of the DST is
+   returned, otherwise 0.  */
 static size_t
-is_dst (const char *start, const char *name, const char *str, int secure)
+is_dst (const char *input, const char *ref)
 {
-  size_t len;
+  size_t ilen, rlen;
   bool is_curly = false;
 
-  if (name[0] == '{')
+  /* Is a ${...} input sequence?  */
+  if (input[0] == '{')
     {
       is_curly = true;
-      ++name;
+      ++input;
     }
 
-  len = 0;
-  while (name[len] == str[len] && name[len] != '\0')
-    ++len;
+  /* Find longest valid input sequence.  */
+  ilen = 0;
+  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
+	 || (input[ilen] >= 'a' && input[ilen] <= 'z')
+	 || (input[ilen] >= '0' && input[ilen] <= '9')
+	 || (input[ilen] == '_'))
+    ++ilen;
+
+  rlen = strlen (ref);
+
+  /* Can't be the DST we are looking for.  */
+  if (rlen != ilen)
+    return 0;
+
+  /* Compare the DST (no strncmp this early in startup).  */
+  if (memcmp (input, ref, ilen) != 0)
+    return 0;
 
   if (is_curly)
     {
-      if (name[len] != '}')
+      /* Invalid curly sequence!  */
+      if (input[ilen] != '}')
 	return 0;
 
-      /* Point again at the beginning of the name.  */
-      --name;
-      /* Skip over closing curly brace and adjust for the --name.  */
-      len += 2;
+      /* Count the two curly braces.  */
+      ilen += 2;
     }
-  else if (name[len] != '\0' && name[len] != '/')
-    return 0;
 
-  if (__glibc_unlikely (secure)
-      && ((name[len] != '\0' && name[len] != '/')
-	  || (name != start + 1)))
-    return 0;
-
-  return len;
+  return ilen;
 }
 
-
+/* INPUT is the start of a DST sequence at the first '$' occurrence.
+   If there is a DST we call into _dl_dst_count to count the number of
+   DSTs.  We count all known DSTs regardless of __libc_enable_secure;
+   the caller is responsible for enforcing the security of the
+   substitution rules (usually _dl_dst_substitute).  */
 size_t
-_dl_dst_count (const char *name)
+_dl_dst_count (const char *input)
 {
-  const char *const start = name;
   size_t cnt = 0;
 
+  input = strchr (input, '$');
+
+  /* Most likely there is no DST.  */
+  if (__glibc_likely (input == NULL))
+    return 0;
+
   do
     {
       size_t len;
 
-      /* $ORIGIN is not expanded for SUID/GUID programs (except if it
-	 is $ORIGIN alone) and it must always appear first in path.  */
-      ++name;
-      if ((len = is_dst (start, name, "ORIGIN", __libc_enable_secure)) != 0
-	  || (len = is_dst (start, name, "PLATFORM", 0)) != 0
-	  || (len = is_dst (start, name, "LIB", 0)) != 0)
+      ++input;
+      /* All DSTs must follow ELF gABI rules, see is_dst ().  */
+      if ((len = is_dst (input, "ORIGIN")) != 0
+	  || (len = is_dst (input, "PLATFORM")) != 0
+	  || (len = is_dst (input, "LIB")) != 0)
 	++cnt;
 
-      name = strchr (name + len, '$');
+      /* There may be more than one DST in the input.  */
+      input = strchr (input + len, '$');
     }
-  while (name != NULL);
+  while (input != NULL);
 
   return cnt;
 }
 
-
+/* Process INPUT for DSTs and store in RESULT using the information from
+   link map L to resolve the DSTs.  The value of START must equal the
+   start of the parent string if INPUT is a substring sequence being
+   parsed with path separators e.g. $ORIGIN:$PLATFORM.  */
 char *
-_dl_dst_substitute (struct link_map *l, const char *name, char *result)
+_dl_dst_substitute (struct link_map *l, const char *start,
+		    const char *input, char *result)
 {
-  const char *const start = name;
-
   /* Now fill the result path.  While copying over the string we keep
      track of the start of the last path element.  When we come across
      a DST we copy over the value or (if the value is not available)
@@ -257,32 +291,54 @@ _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 
   do
     {
-      if (__glibc_unlikely (*name == '$'))
+      if (__glibc_unlikely (*input == '$'))
 	{
 	  const char *repl = NULL;
 	  size_t len;
 
-	  ++name;
-	  if ((len = is_dst (start, name, "ORIGIN", __libc_enable_secure)) != 0)
+	  ++input;
+	  if ((len = is_dst (input, "ORIGIN")) != 0)
 	    {
-	      repl = l->l_origin;
+	      /* For SUID/GUID programs we normally ignore the path with
+		 a DST in DT_RUNPATH, or DT_RPATH.  However, there is one
+		 exception to this rule, and it is:
+
+		   * $ORIGIN appears first in the path element.
+		   * The path element is rooted in a trusted directory.
+
+		 This exception allows such programs to reference
+		 shared libraries in subdirectories of trusted
+		 directories.  The use case is one of general
+		 organization and deployment flexibility.
+		 Trusted directories are usually such paths as "/lib64"
+		 or "/lib".  */
+	      if (__glibc_unlikely (__libc_enable_secure)
+		  && ((input[len] != '\0' && input[len] != '/'
+		       && input[len] != ':')
+		      || (input != start + 1
+			  || (input > start + 2 && input[-2] != ':'))))
+		repl = (const char *) -1;
+	      else
+	        repl = l->l_origin;
+
 	      check_for_trusted = (__libc_enable_secure
 				   && l->l_type == lt_executable);
 	    }
-	  else if ((len = is_dst (start, name, "PLATFORM", 0)) != 0)
+	  else if ((len = is_dst (input, "PLATFORM")) != 0)
 	    repl = GLRO(dl_platform);
-	  else if ((len = is_dst (start, name, "LIB", 0)) != 0)
+	  else if ((len = is_dst (input, "LIB")) != 0)
 	    repl = DL_DST_LIB;
 
 	  if (repl != NULL && repl != (const char *) -1)
 	    {
 	      wp = __stpcpy (wp, repl);
-	      name += len;
+	      input += len;
 	    }
-	  else if (len > 1)
+	  else if (len != 0)
 	    {
 	      /* We cannot use this path element, the value of the
 		 replacement is unknown.  */
+	      check_for_trusted = false;
 	      wp = last_elem;
 	      break;
 	    }
@@ -292,10 +348,10 @@ _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 	}
       else
 	{
-	  *wp++ = *name++;
+	  *wp++ = *input++;
 	}
     }
-  while (*name != '\0');
+  while (*input != '\0');
 
   /* In SUID/SGID programs, after $ORIGIN expansion the normalized
      path must be rooted in one of the trusted directories.  */
@@ -309,13 +365,17 @@ _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 }
 
 
-/* Return copy of argument with all recognized dynamic string tokens
-   ($ORIGIN and $PLATFORM for now) replaced.  On some platforms it
-   might not be possible to determine the path from which the object
-   belonging to the map is loaded.  In this case the path element
-   containing $ORIGIN is left out.  */
+/* Return a malloc allocated copy of INPUT with all recognized DSTs
+   ($ORIGIN and $PLATFORM for now) replaced.  The value of START must
+   equal the start of the parent string if INPUT is a substring
+   sequence being parsed with path separators e.g. $ORIGIN:$PLATFORM.
+   On some platforms it might not be possible to determine the path
+   from which the object belonging to the map is loaded.  In this case
+   the path element containing $ORIGIN is left out.  On error NULL is
+   returned. */
 static char *
-expand_dynamic_string_token (struct link_map *l, const char *s)
+expand_dynamic_string_token (struct link_map *l, const char *start,
+			     const char *input)
 {
   /* We make two runs over the string.  First we determine how large the
      resulting string is and then we copy it over.  Since this is no
@@ -326,21 +386,21 @@ expand_dynamic_string_token (struct link_map *l, const char *s)
   char *result;
 
   /* Determine the number of DST elements.  */
-  cnt = DL_DST_COUNT (s);
+  cnt = _dl_dst_count (input);
 
   /* If we do not have to replace anything simply copy the string.  */
   if (__glibc_likely (cnt == 0))
-    return __strdup (s);
+    return __strdup (input);
 
   /* Determine the length of the substituted string.  */
-  total = DL_DST_REQUIRED (l, s, strlen (s), cnt);
+  total = DL_DST_REQUIRED (l, input, strlen (input), cnt);
 
   /* Allocate the necessary memory.  */
   result = (char *) malloc (total + 1);
   if (result == NULL)
     return NULL;
 
-  return _dl_dst_substitute (l, s, result);
+  return _dl_dst_substitute (l, start, input, result);
 }
 
 
@@ -387,6 +447,7 @@ fillin_rpath (char *rpath, struct r_search_path_elem **result, const char *sep,
 	      const char *what, const char *where, struct link_map *l)
 {
   char *cp;
+  char *start = rpath;
   size_t nelems = 0;
 
   while ((cp = __strsep (&rpath, sep)) != NULL)
@@ -398,7 +459,7 @@ fillin_rpath (char *rpath, struct r_search_path_elem **result, const char *sep,
       /* `strsep' can pass an empty string.  */
       if (*cp != '\0')
 	{
-	  to_free = cp = expand_dynamic_string_token (l, cp);
+	  to_free = cp = expand_dynamic_string_token (l, start, cp);
 
 	  /* expand_dynamic_string_token can return NULL in case of empty
 	     path or memory allocation failure.  */
@@ -2091,7 +2152,7 @@ _dl_map_object (struct link_map *loader, const char *name,
     {
       /* The path may contain dynamic string tokens.  */
       realname = (loader
-		  ? expand_dynamic_string_token (loader, name)
+		  ? expand_dynamic_string_token (loader, name, name)
 		  : __strdup (name));
       if (realname == NULL)
 	fd = -1;
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 95dc87519b..688ff60785 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -1108,8 +1108,9 @@ extern const char *_dl_get_origin (void) attribute_hidden;
 extern size_t _dl_dst_count (const char *name) attribute_hidden;
 
 /* Substitute DST values.  */
-extern char *_dl_dst_substitute (struct link_map *l, const char *name,
-				 char *result) attribute_hidden;
+extern char *_dl_dst_substitute (struct link_map *l, const char *start,
+				 const char *name, char *result)
+     attribute_hidden;
 
 /* Open the shared object NAME, relocate it, and run its initializer if it
    hasn't already been run.  MODE is as for `dlopen' (see <dlfcn.h>).  If
Andreas Schwab June 7, 2018, 6:48 a.m. | #15
On Jun 06 2018, Carlos O'Donell <carlos@redhat.com> wrote:

> On 06/06/2018 12:30 PM, Andreas Schwab wrote:
>> On Jun 06 2018, Carlos O'Donell <carlos@redhat.com> wrote:
>> 
>>> +  /* Find longest valid input sequence.  */
>>> +  ilen = 0;
>>> +  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
>>> +	 || (input[ilen] >= 'a' && input[ilen] <= 'z')
>>> +	 || (input[ilen] >= '0' && input[ilen] <= '9')
>>> +	 || (input[ilen] == '_'))
>>> +    ++ilen;
>>> +
>>> +  rlen = strlen (ref);
>>> +
>>> +  /* Can't be the DST we are looking for.  */
>>> +  if (rlen != ilen)
>>> +    return 0;
>> 
>> Why do you need that?  Just compare, then check the next character.
>
> Are you suggesting that:
> ~~~
> rlen = strlen (ref);
>
> /* Can't be the DST we are looking for.  */
> if (rlen != ilen)
>   return 0;
> ~~~
> Can be dropped because we are going to compare the strings anyway?

Drop the whole part.  When you have compared the string you know that it
is valid so far, so what's the value of validating it twice?

Andreas.
Florian Weimer June 7, 2018, 11:38 a.m. | #16
On 06/06/2018 10:18 PM, Carlos O'Donell wrote:
> +is_dst (const char *input, const char *ref)

Maybe we should fix the strncmp bug and use that?

In rtld, the actual definitions are not built with the proper name, but 
aliases for use in IFUNCs, which we don't use in rtld.

The attached patch fixes that for x86-64 and i686 (multi-arch in both 
cases).  I don't think other architectures are affected (verified for 
ppc64le, build-many-glibcs.py is still running).

If you want to use strncmp to express more clearly what the code is 
about, I can commit the fix separately (with a proper ChangeLog entry etc.).

Thanks,
Florian
diff --git a/elf/dl-load.c b/elf/dl-load.c
index ff7a95bee2..ff271c0bdf 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -192,12 +192,12 @@ is_trusted_path_normalize (const char *path, size_t len)
    * Must follow first two characters with zero or more [A-Za-z0-9_]
      (enforced by caller) or '}' (end curly quoted name).
 
-   If the sequence is a DST matching REF then the length of the DST is
+   If the sequence is a DST matching REF then the length of the DST
+   (excluding the $ sign but including curly braces, if any) is
    returned, otherwise 0.  */
 static size_t
 is_dst (const char *input, const char *ref)
 {
-  size_t ilen, rlen;
   bool is_curly = false;
 
   /* Is a ${...} input sequence?  */
@@ -207,35 +207,23 @@ is_dst (const char *input, const char *ref)
       ++input;
     }
 
-  /* Find longest valid input sequence.  */
-  ilen = 0;
-  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
-	 || (input[ilen] >= 'a' && input[ilen] <= 'z')
-	 || (input[ilen] >= '0' && input[ilen] <= '9')
-	 || (input[ilen] == '_'))
-    ++ilen;
-
-  rlen = strlen (ref);
-
-  /* Can't be the DST we are looking for.  */
-  if (rlen != ilen)
-    return 0;
-
-  /* Compare the DST (no strncmp this early in startup).  */
-  if (memcmp (input, ref, ilen) != 0)
+  /* Check for matching name, following closing curly brace (if
+     required), or trailing characters which are part of an
+     identifier.  */
+  size_t rlen = strlen (ref);
+  if (strncmp (input, ref, rlen) != 0
+      || (is_curly && input[rlen] != '}')
+      || ((input[rlen] >= 'A' && input[rlen] <= 'Z')
+	  || (input[rlen] >= 'a' && input[rlen] <= 'z')
+	  || (input[rlen] >= '0' && input[rlen] <= '9')
+	  || (input[rlen] == '_')))
     return 0;
 
   if (is_curly)
-    {
-      /* Invalid curly sequence!  */
-      if (input[ilen] != '}')
-	return 0;
-
-      /* Count the two curly braces.  */
-      ilen += 2;
-    }
-
-  return ilen;
+    /* Count the two curly braces.  */
+    return rlen + 2;
+  else
+    return rlen;
 }
 
 /* INPUT is the start of a DST sequence at the first '$' occurrence.
diff --git a/sysdeps/i386/i686/multiarch/strncmp-c.c b/sysdeps/i386/i686/multiarch/strncmp-c.c
index cc059da494..2e3eca9b2b 100644
--- a/sysdeps/i386/i686/multiarch/strncmp-c.c
+++ b/sysdeps/i386/i686/multiarch/strncmp-c.c
@@ -1,4 +1,4 @@
-#ifdef SHARED
+#if defined (SHARED) && IS_IN (libc)
 # define STRNCMP __strncmp_ia32
 # undef libc_hidden_builtin_def
 # define libc_hidden_builtin_def(name)  \
diff --git a/sysdeps/x86_64/multiarch/strncmp-sse2.S b/sysdeps/x86_64/multiarch/strncmp-sse2.S
index cc5252d826..0e1f52dea9 100644
--- a/sysdeps/x86_64/multiarch/strncmp-sse2.S
+++ b/sysdeps/x86_64/multiarch/strncmp-sse2.S
@@ -18,10 +18,13 @@
 
 #include <sysdep.h>
 
-#define STRCMP	__strncmp_sse2
-
-#undef libc_hidden_builtin_def
-#define libc_hidden_builtin_def(strcmp)
+#if IS_IN (libc)
+# define STRCMP	__strncmp_sse2
+# undef libc_hidden_builtin_def
+# define libc_hidden_builtin_def(strcmp)
+#else
+# define STRCMP strncmp
+#endif
 
 #define USE_AS_STRNCMP
 #include <sysdeps/x86_64/strcmp.S>
Florian Weimer June 7, 2018, 12:16 p.m. | #17
On 06/06/2018 10:18 PM, Carlos O'Donell wrote:
> +/* Process INPUT for DSTs and store in RESULT using the information from
> +   link map L to resolve the DSTs.  The value of START must equal the
> +   start of the parent string if INPUT is a substring sequence being
> +   parsed with path separators e.g. $ORIGIN:$PLATFORM.  */
>   char *
> -_dl_dst_substitute (struct link_map *l, const char *name, char *result)
> +_dl_dst_substitute (struct link_map *l, const char *start,
> +		    const char *input, char *result)

The comment should describe the storage requirements for RESULT.

I'm a bit worried about this:

	  else if (len != 0)
	    {
	      /* We cannot use this path element, the value of the
		 replacement is unknown.  */
	      check_for_trusted = false;
	      wp = last_elem;
	      break;
	    }

Does this really do the right thing for $ORIGIN/../$LIB:/foo/$ORIGIN?  I 
would have expected a trusted path check for the first component in this 
case.

Thanks,
Florian
Florian Weimer June 7, 2018, 12:43 p.m. | #18
On 06/06/2018 10:18 PM, Carlos O'Donell wrote:
> +	      if (__glibc_unlikely (__libc_enable_secure)
> +		  && ((input[len] != '\0' && input[len] != '/'
> +		       && input[len] != ':')
> +		      || (input != start + 1
> +			  || (input > start + 2 && input[-2] != ':'))))

Is the ':' check really the right thing here?

Didn't we change the code so that _dl_dst_substitute is only called with 
a single component as an argument?

fillin_rpath splits the string at :/:.  The callers in dl-deps.c

I also suggest to use struct alloc_buffer, to make the code more 
obviously correct.

Thanks,
Florian
Carlos O'Donell June 8, 2018, 2:08 a.m. | #19
On 06/07/2018 02:48 AM, Andreas Schwab wrote:
> On Jun 06 2018, Carlos O'Donell <carlos@redhat.com> wrote:
> 
>> On 06/06/2018 12:30 PM, Andreas Schwab wrote:
>>> On Jun 06 2018, Carlos O'Donell <carlos@redhat.com> wrote:
>>>
>>>> +  /* Find longest valid input sequence.  */
>>>> +  ilen = 0;
>>>> +  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
>>>> +	 || (input[ilen] >= 'a' && input[ilen] <= 'z')
>>>> +	 || (input[ilen] >= '0' && input[ilen] <= '9')
>>>> +	 || (input[ilen] == '_'))
>>>> +    ++ilen;
>>>> +
>>>> +  rlen = strlen (ref);
>>>> +
>>>> +  /* Can't be the DST we are looking for.  */
>>>> +  if (rlen != ilen)
>>>> +    return 0;
>>>
>>> Why do you need that?  Just compare, then check the next character.
>>
>> Are you suggesting that:
>> ~~~
>> rlen = strlen (ref);
>>
>> /* Can't be the DST we are looking for.  */
>> if (rlen != ilen)
>>   return 0;
>> ~~~
>> Can be dropped because we are going to compare the strings anyway?
> 
> Drop the whole part.  When you have compared the string you know that it
> is valid so far, so what's the value of validating it twice?

Careful, is_dst () takes as input the start of a DST sequence,
but that sequence is not validated yet.

We must compare the longest legal sequence with that of the DST
comparison. This is different from comparing *both* sequences.
The old glibc code was wrong in this regard.

In the old code we had:

  len = 0;
  while (name[len] == str[len] && name[len] != '\0')
    ++len;

What if name was '$ORIGIN-' but str was '$ORIGIN'?

With the above code we reject '$ORIGIN-' != '$ORIGIN', but this is
not true.

I have added a test for this in my test case:

With system glibc:
Test 1e: Test that RPATH of $ORIGIN- loads from /root- as expected.
FAIL: Wrong primary library.

With patched glibc:
Test 1e: Test that RPATH of $ORIGIN- loads from /root- as expected.
     24538:	expand_dynamic_string_token input=$ORIGIN-, start=$ORIGIN-
     24538:	_dl_dst_count $ORIGIN- (7f7934f38d20)
     24538:	is_dst input=ORIGIN- (begin)
     24538:	is_dst ilen=6 rlen=6
     24538:	is_dst ilen=6 input=ORIGIN- (7f7934f38d21) (ref=ORIGIN)
     24538:	_dl_dst_count next -
     24538:	_dl_dst_count cnt 1
     24538:	DL_DST_REQUIRED l_origin 7f7934f38d40
     24538:	_dl_dst_substitute input=$ORIGIN- start=$ORIGIN-
     24538:	is_dst input=ORIGIN- (begin)
     24538:	is_dst ilen=6 rlen=6
     24538:	is_dst ilen=6 input=ORIGIN- (7f7934f38d21) (ref=ORIGIN)
     24538:	_dl_dst_substitute 7f7934f38d40
     24538:	security = 0
     24538:	security = 0
     24538:	security = 0
     24538:	security = 0
PASS: Correct primary library.

The above code also makes is_dst () robust against mistakes if the caller
violates the ELF gABI specification by trying to use a DST which
doesn't conform to the [A-Za-z0-9_] requirement.

To do that the code does:

* Follow the ELF gABI rules to get the longest name.

* Compare the longest name to the DST we are looking to find.

If we didn't do the "longest name" computation we might allow both
glibc to use invalid DST names, and applications to get away with using
them without error and have them work.

With my patch the developer would have to knowingly adjust the allowed
longest name sequence, and that's a conscious change which is good
to avoid mistakes.

There is a bit of a performance cost to pay here, but I've not seen
any argument that this code is highly performance sensitive. If it
is really performance sensitive then we can talk about many other
ways to make it faster.

Again, this is a change from previous behaviour, but brings us more
sane parsing of the DSTs IMO.

Therefore I don't think this code has to change.

Cheers,
Carlos.
Carlos O'Donell June 8, 2018, 2:13 a.m. | #20
On 06/07/2018 07:38 AM, Florian Weimer wrote:
> On 06/06/2018 10:18 PM, Carlos O'Donell wrote:
>> +is_dst (const char *input, const char *ref)
> 
> Maybe we should fix the strncmp bug and use that?
> 
> In rtld, the actual definitions are not built with the proper name,
> but aliases for use in IFUNCs, which we don't use in rtld.
> 
> The attached patch fixes that for x86-64 and i686 (multi-arch in both
> cases).  I don't think other architectures are affected (verified for
> ppc64le, build-many-glibcs.py is still running).
> 
> If you want to use strncmp to express more clearly what the code is
> about, I can commit the fix separately (with a proper ChangeLog entry
> etc.).

I would like to exclude this for now.

I am happy to discuss using strncmp *after* we get the first patch in
which we all agree implements the semantics we want.

Note, I just took your new comment bit "(excluding the $ sign but 
including curly braces, if any)" and added it to my v4 patch which
I'll post after going through your other review. Thanks for that.

Cheers,
Carlos.
Carlos O'Donell June 8, 2018, 4:14 a.m. | #21
On 06/07/2018 08:16 AM, Florian Weimer wrote:
> On 06/06/2018 10:18 PM, Carlos O'Donell wrote:
>> +/* Process INPUT for DSTs and store in RESULT using the information from
>> +   link map L to resolve the DSTs.  The value of START must equal the
>> +   start of the parent string if INPUT is a substring sequence being
>> +   parsed with path separators e.g. $ORIGIN:$PLATFORM.  */
>>   char *
>> -_dl_dst_substitute (struct link_map *l, const char *name, char *result)
>> +_dl_dst_substitute (struct link_map *l, const char *start,
>> +            const char *input, char *result)
> 
> The comment should describe the storage requirements for RESULT.

Done, I'll add that to v4.

The answer is: As much as DL_DST_REQUIRED tells you you need.

We use DL_DST_REQUIRED in both expand_dst (used for NEEDED, AUX, and FILTER),
and in expand_dynamic_string_token (used for RPATH from fillin_rpath).

> I'm a bit worried about this:
> 
>       else if (len != 0)
>         {
>           /* We cannot use this path element, the value of the
>          replacement is unknown.  */
>           check_for_trusted = false;
>           wp = last_elem;
>           break;
>         }
> 
> Does this really do the right thing for $ORIGIN/../$LIB:/foo/$ORIGIN?
> I would have expected a trusted path check for the first component in
> this case.

That depends from which input this came from.

There are two ways to this code:

(1) _dl_map_object -> DT_RPATH/RUNPATH -> fillin_rpath -> _dl_dst_substitute

(2) _dl_map_object_deps -> DT_NEEDED/AUXILIARY/FILTER -> expand_dst -> _dl_dst_substitute

I assume that you intend (1), and that ':' is to be interpreted as the
path separator in a valid list-of-paths sequence.

I also assume we are talking about SUID/SGID.

The alternative is that you mean (2), and that you have a really odd
name for the dependency of the binary, or it's a mistake made by the
developer which might be exploited by an attacker. We can talk about
this in your other email regarding ':'-splitting in fillin_rpath.

Given that you mean (1), and in fillin_rpath we split on ':' then the
sequence arriving at _dl_dst_substitute is [$ORIGIN/../$LIB], and we
can substitute for $ORIGIN and $LIB without error, both of them return
a valid repl value and length, and we never set check_for_trusted to
false. Therefore the first component *does* get a trusted path check.

Verified here:

Test 18  [SUID]: Verify RPATH $ORIGIN/../$LIB:/foo/$ORIGIN loads from /lib64.
      6277:	decompose_rpath: Call fillin_rpath
      6277:	fillin_rpath (begin)
      6277:	expand_dynamic_string_token input=$ORIGIN/../$LIB, start=$ORIGIN/../$LIB
      6277:	_dl_dst_count $ORIGIN/../$LIB (7fa91c1d4d20)
      6277:	is_dst input=ORIGIN/../$LIB (begin)
      6277:	is_dst ilen=6 rlen=6
      6277:	is_dst ilen=6 input=ORIGIN/../$LIB (7fa91c1d4d21) (ref=ORIGIN)
      6277:	_dl_dst_count next /../$LIB
      6277:	is_dst input=LIB (begin)
      6277:	is_dst ilen=3 rlen=6
      6277:	is_dst input=LIB (begin)
      6277:	is_dst ilen=3 rlen=8
      6277:	is_dst input=LIB (begin)
      6277:	is_dst ilen=3 rlen=3
      6277:	is_dst ilen=3 input=LIB (7fa91c1d4d2c) (ref=LIB)
      6277:	_dl_dst_count next 
      6277:	_dl_dst_count cnt 2
      6277:	DL_DST_REQUIRED l_origin 7fa91c1d4d60
      6277:	_dl_dst_substitute input=$ORIGIN/../$LIB start=$ORIGIN/../$LIB
      6277:	is_dst input=ORIGIN/../$LIB (begin)
      6277:	is_dst ilen=6 rlen=6
      6277:	is_dst ilen=6 input=ORIGIN/../$LIB (7fa91c1d4d21) (ref=ORIGIN)
      6277:	_dl_dst_substitute 7fa91c1d4d60
      6277:	is_dst input=LIB (begin)
      6277:	is_dst ilen=3 rlen=6
      6277:	is_dst input=LIB (begin)
      6277:	is_dst ilen=3 rlen=8
      6277:	is_dst input=LIB (begin)
      6277:	is_dst ilen=3 rlen=3
      6277:	is_dst ilen=3 input=LIB (7fa91c1d4d2c) (ref=LIB)
      6277:	_dl_dst_substitute 7fa91bfcc457
      6277:	is_trusted_path_normalize: /root/../lib64 14
      6277:	is_trusted_path_normalize: Transformed path /lib64/
      6277:	is_trusted_path_normalize: Path IS trusted.
      6277:	expand_dynamic_string_token input=/foo/$ORIGIN, start=$ORIGIN/../$LIB
      6277:	_dl_dst_count $ORIGIN (7fa91c1d4d35)
      6277:	is_dst input=ORIGIN (begin)
      6277:	is_dst ilen=6 rlen=6
      6277:	is_dst ilen=6 input=ORIGIN (7fa91c1d4d36) (ref=ORIGIN)
      6277:	_dl_dst_count next 
      6277:	_dl_dst_count cnt 1
      6277:	_dl_dst_substitute input=/foo/$ORIGIN start=$ORIGIN/../$LIB
      6277:	is_dst input=ORIGIN (begin)
      6277:	is_dst ilen=6 rlen=6
      6277:	is_dst ilen=6 input=ORIGIN (7fa91c1d4d36) (ref=ORIGIN)
      6277:	_dl_dst_substitute Invalid DST!
      6277:	_dl_dst_substitute ffffffffffffffff
      6277:	security = 1
      6277:	security = 1
      6277:	security = 1
      6277:	security = 1
PASS: Correct primary library.

The 'Invalid DST!' comes from a debug in the security checking code which
notices '/foo/$ORIGIN' doesn't meet the rules for SUID/SGID (that $ORIGIN
is first and it's rooted in a trusted direcotry) and sets repl to -1.

Therefore this works as expected for the case you cite.

What about [$ORIGIN/../$FOO], which is one valid, and one invalid DST?

This might be what you intended to write above, but I'd rather just cover
both cases to save the round-trip on the review.

     12187:	decompose_rpath: Call fillin_rpath
     12187:	fillin_rpath (begin)
     12187:	expand_dynamic_string_token input=$ORIGIN/../$FOO, start=$ORIGIN/../$FOO
     12187:	_dl_dst_count $ORIGIN/../$FOO (7fbf5ff90d20)
     12187:	is_dst input=ORIGIN/../$FOO (begin)
     12187:	is_dst ilen=6 rlen=6
     12187:	is_dst ilen=6 input=ORIGIN/../$FOO (7fbf5ff90d21) (ref=ORIGIN)
     12187:	_dl_dst_count next /../$FOO
     12187:	is_dst input=FOO (begin)
     12187:	is_dst ilen=3 rlen=6
     12187:	is_dst input=FOO (begin)
     12187:	is_dst ilen=3 rlen=8
     12187:	is_dst input=FOO (begin)
     12187:	is_dst ilen=3 rlen=3
     12187:	is_dst memcmp failed.
     12187:	_dl_dst_count next FOO
     12187:	_dl_dst_count cnt 1
     12187:	DL_DST_REQUIRED l_origin 7fbf5ff90d60
     12187:	_dl_dst_substitute input=$ORIGIN/../$FOO start=$ORIGIN/../$FOO
     12187:	is_dst input=ORIGIN/../$FOO (begin)
     12187:	is_dst ilen=6 rlen=6
     12187:	is_dst ilen=6 input=ORIGIN/../$FOO (7fbf5ff90d21) (ref=ORIGIN)
     12187:	_dl_dst_substitute 7fbf5ff90d60
     12187:	is_dst input=FOO (begin)
     12187:	is_dst ilen=3 rlen=6
     12187:	is_dst input=FOO (begin)
     12187:	is_dst ilen=3 rlen=8
     12187:	is_dst input=FOO (begin)
     12187:	is_dst ilen=3 rlen=3
     12187:	is_dst memcmp failed.
     12187:	_dl_dst_substitute 0
     12187:	is_trusted_path_normalize: /root/../$FOO 13
     12187:	is_trusted_path_normalize: Transformed path /$FOO/
     12187:	is_trusted_path_normalize: Path is NOT trusted.
     12187:	expand_dynamic_string_token input=/foo/$ORIGIN, start=$ORIGIN/../$FOO
     12187:	_dl_dst_count $ORIGIN (7fbf5ff90d35)
     12187:	is_dst input=ORIGIN (begin)
     12187:	is_dst ilen=6 rlen=6
     12187:	is_dst ilen=6 input=ORIGIN (7fbf5ff90d36) (ref=ORIGIN)
     12187:	_dl_dst_count next 
     12187:	_dl_dst_count cnt 1
     12187:	_dl_dst_substitute input=/foo/$ORIGIN start=$ORIGIN/../$FOO
     12187:	is_dst input=ORIGIN (begin)
     12187:	is_dst ilen=6 rlen=6
     12187:	is_dst ilen=6 input=ORIGIN (7fbf5ff90d36) (ref=ORIGIN)
     12187:	_dl_dst_substitute Invalid DST!
     12187:	_dl_dst_substitute ffffffffffffffff
     12187:	security = 1
     12187:	security = 1
     12187:	security = 1
     12187:	security = 1

Here, the $FOO is left unexpanded, we don't have a matching DST,
and so it's copied directly to the output, and has to go through
a trusted directory check which it fails.

The only time the code you quote is executed, this code:

 338           else if (len != 0)
 339             {
 340               /* We cannot use this path element, the value of the
 341                  replacement is unknown.  */
 342               check_for_trusted = false;
 343               wp = last_elem;
 344               break;
 345             }

Is when we find a DST we know, say $LIB, but DL_DST_LIB is invalid
e.g. set to -1, indicating that $LIB's value is unknown, in which case
[$ORIGIN/../$LIB] is entirely considered unknown, and *discarded* (which
is what 'wp = last_elem' does).

For v4 I'm going to clean up _dl_dst_substitute to point out that we
only take individual path elements of a multi-path sequence.

I believe this answers your question. Please clarify if I have not.

Cheers,
Carlos.
Florian Weimer June 8, 2018, 5:21 a.m. | #22
On 06/08/2018 06:14 AM, Carlos O'Donell wrote:
> The only time the code you quote is executed, this code:
> 
>   338           else if (len != 0)
>   339             {
>   340               /* We cannot use this path element, the value of the
>   341                  replacement is unknown.  */
>   342               check_for_trusted = false;
>   343               wp = last_elem;
>   344               break;
>   345             }
> 
> Is when we find a DST we know, say $LIB, but DL_DST_LIB is invalid
> e.g. set to -1, indicating that $LIB's value is unknown, in which case
> [$ORIGIN/../$LIB] is entirely considered unknown, and*discarded*  (which
> is what 'wp = last_elem' does).
> 
> For v4 I'm going to clean up _dl_dst_substitute to point out that we
> only take individual path elements of a multi-path sequence.
> 
> I believe this answers your question. Please clarify if I have not.

Yes, it does.  What the quoted code actually does is something like 
this, right?

   /* Return an empty string to tell the caller to drop the element.  */
   *result = '\0';
   return;

Thanks,
Florian
Carlos O'Donell June 8, 2018, 5:37 a.m. | #23
On 06/07/2018 08:43 AM, Florian Weimer wrote:
> On 06/06/2018 10:18 PM, Carlos O'Donell wrote:
>> +          if (__glibc_unlikely (__libc_enable_secure)
>> +          && ((input[len] != '\0' && input[len] != '/'
>> +               && input[len] != ':')
>> +              || (input != start + 1
>> +              || (input > start + 2 && input[-2] != ':'))))
> 
> Is the ':' check really the right thing here?

No. It's superflous. We'll never see it (except with DT_NEEDED, but in that case
for SUID/SGID we already rejected the load, and !SUID/!SGID we just should interpret
as literal). However, I realized something else is wrong.

> Didn't we change the code so that _dl_dst_substitute is only called with a single component as an argument?

We did.
 
> fillin_rpath splits the string at :/:.  The callers in dl-deps.c

Correct.

I'm going to remove the ':' check *and* cleanup the code to mention explicitly
that we only accept single path components from a :-separated path.

The real problem I failed to notice is that strsep in fillin_rpath destroys the
":" in the string so "intput[-2] != ':'" is always true. Therefore a non-first
component in the multi-component list is always discarded even if it starts with
$ORIGIN and is rooted in a trusted directory.

I have removed the double-negation and fixed the code to read:

 324                    * $ORIGIN appears first in the path element, and is
 325                      the only thing in the element or is immediately
 326                      followed by a path separator and the rest of the
 327                      path.

 320               if (__glibc_unlikely (__libc_enable_secure)
 321                   && !((input == start + 1
 322                         || (input > start + 1 && input[-2] == '\0'))
 323                        && (input[len] == '\0' || input[len] == '/')))
 324                 repl = (const char *) -1;
 325               else
 326                 repl = l->l_origin;


Such that the second path e.g.

[$ORIGIN/../$LIB\0$ORIGIN/../$LIB/subdir]
 ^--- start        ^--- input

... now works.  I verified this with a test.

Test 19  [SUID]: Verify RPATH $ORIGIN/../$LIB:$ORIGIN/../$LIB/subdir loads from /lib64/subdir.
     10833:	decompose_rpath: Call fillin_rpath
     10833:	fillin_rpath (begin)
     10833:	expand_dynamic_string_token input=$ORIGIN/../$LIB, start=$ORIGIN/../$LIB
     10833:	_dl_dst_count $ORIGIN/../$LIB (7f84ae18bd20)
     10833:	is_dst input=ORIGIN/../$LIB (begin)
     10833:	is_dst ilen=6 rlen=6
     10833:	is_dst ilen=6 input=ORIGIN/../$LIB (7f84ae18bd21) (ref=ORIGIN)
     10833:	_dl_dst_count next /../$LIB
     10833:	is_dst input=LIB (begin)
     10833:	is_dst ilen=3 rlen=6
     10833:	is_dst input=LIB (begin)
     10833:	is_dst ilen=3 rlen=8
     10833:	is_dst input=LIB (begin)
     10833:	is_dst ilen=3 rlen=3
     10833:	is_dst ilen=3 input=LIB (7f84ae18bd2c) (ref=LIB)
     10833:	_dl_dst_count next 
     10833:	_dl_dst_count cnt 2
     10833:	DL_DST_REQUIRED l_origin 7f84ae18bd70
     10833:	_dl_dst_substitute input=$ORIGIN/../$LIB start=$ORIGIN/../$LIB
     10833:	is_dst input=ORIGIN/../$LIB (begin)
     10833:	is_dst ilen=6 rlen=6
     10833:	is_dst ilen=6 input=ORIGIN/../$LIB (7f84ae18bd21) (ref=ORIGIN)
     10833:	_dl_dst_substitute 7f84ae18bd70
     10833:	is_dst input=LIB (begin)
     10833:	is_dst ilen=3 rlen=6
     10833:	is_dst input=LIB (begin)
     10833:	is_dst ilen=3 rlen=8
     10833:	is_dst input=LIB (begin)
     10833:	is_dst ilen=3 rlen=3
     10833:	is_dst ilen=3 input=LIB (7f84ae18bd2c) (ref=LIB)
     10833:	_dl_dst_substitute 7f84adf83457
     10833:	is_trusted_path_normalize: /root/../lib64 14
     10833:	is_trusted_path_normalize: Transformed path /lib64/
     10833:	is_trusted_path_normalize: Path IS trusted.
     10833:	expand_dynamic_string_token input=$ORIGIN/../$LIB/subdir, start=$ORIGIN/../$LIB
     10833:	_dl_dst_count $ORIGIN/../$LIB/subdir (7f84ae18bd30)
     10833:	is_dst input=ORIGIN/../$LIB/subdir (begin)
     10833:	is_dst ilen=6 rlen=6
     10833:	is_dst ilen=6 input=ORIGIN/../$LIB/subdir (7f84ae18bd31) (ref=ORIGIN)
     10833:	_dl_dst_count next /../$LIB/subdir
     10833:	is_dst input=LIB/subdir (begin)
     10833:	is_dst ilen=3 rlen=6
     10833:	is_dst input=LIB/subdir (begin)
     10833:	is_dst ilen=3 rlen=8
     10833:	is_dst input=LIB/subdir (begin)
     10833:	is_dst ilen=3 rlen=3
     10833:	is_dst ilen=3 input=LIB/subdir (7f84ae18bd3c) (ref=LIB)
     10833:	_dl_dst_count next /subdir
     10833:	_dl_dst_count cnt 2
     10833:	_dl_dst_substitute input=$ORIGIN/../$LIB/subdir start=$ORIGIN/../$LIB
     10833:	is_dst input=ORIGIN/../$LIB/subdir (begin)
     10833:	is_dst ilen=6 rlen=6
     10833:	is_dst ilen=6 input=ORIGIN/../$LIB/subdir (7f84ae18bd31) (ref=ORIGIN)
     10833:	_dl_dst_substitute 7f84ae18bd70
     10833:	is_dst input=LIB/subdir (begin)
     10833:	is_dst ilen=3 rlen=6
     10833:	is_dst input=LIB/subdir (begin)
     10833:	is_dst ilen=3 rlen=8
     10833:	is_dst input=LIB/subdir (begin)
     10833:	is_dst ilen=3 rlen=3
     10833:	is_dst ilen=3 input=LIB/subdir (7f84ae18bd3c) (ref=LIB)
     10833:	_dl_dst_substitute 7f84adf83457
     10833:	is_trusted_path_normalize: /root/../lib64/subdir 21
     10833:	is_trusted_path_normalize: Transformed path /lib64/subdir/
     10833:	is_trusted_path_normalize: Path IS trusted.
     10833:	security = 1
     10833:	security = 1
     10833:	security = 1
     10833:	security = 1
PASS: Correct primary library.
PASS: Correctly used second valid RPATH component.

Lastly, we could get here via DT_NEEDED paths, and those could have weird SONAMES
which have ':' in them, but we should not treat those as special in any way.
It is the correct abstraction to have fillin_rpath do splitting, while also having
expand_dst() simply pass a full SONAME (which may have colons in it).

Now with this update _dl_dst_substitute doesn't care about colons.

> I also suggest to use struct alloc_buffer, to make the code more obviously correct.

I would like to leave this for a subsequent cleanup, just to get to an acceptable
set of semantics with the current code.

I'll post v4 of my patch with new tests shortly.

Cheers,
Carlos.
Carlos O'Donell June 8, 2018, 5:45 a.m. | #24
On 06/06/2018 04:18 PM, Carlos O'Donell wrote:
> On 06/06/2018 12:10 PM, Carlos O'Donell wrote:
>> On 06/06/2018 01:02 AM, Carlos O'Donell wrote:
>>> This commit improves DST handling significantly in the following
>>
>> v2
>>
>> - Fix is_dst() by adding back string comparison, and clarify comment.
>>   Added extra test testcases (49 now) to cover this error.
>> - Renamed parameters as 'input' for data from ELF file, and 'ref' as
>>   reference DST to compare against.
>> - Removed dead update to name in is_dst().
>> - Killed DL_DST_COUNT, not needed really, and a weak optimization.
>> - Did not remove len from _dl_dst_count because we use it to advance
>>   input more quickly to the next DST.
>>
> 
> v3
> - Use memcmp in is_dst().
> - In _dl_dst_substitute use 'len != 0' conditional to clarify that
>   all we are looking to do is distinguish between a valid DST
>   for which we don't recognize the DST, and an invalid DST which
>   may just be stray characters which we will copy to the result.
> - Added a few more tests: Small invalid sequence e.g. ${}, and
>   large valid sequence with unknown DST.
> 

v4
- Remove ":" logic from _dl_dst_substitute, and rewrite exception
  logic without the double negative, and in general cleanup the
  function to make it clear we accept only single path elements
  for processing.
- Cleanup comments regarding RESULT size and what is required.
- Adjust is_dst() comment to note that $ is not counted in length
  returned.
- Fix _dl_dst_substitute exception logic to check for NULL separator
  between non-first path since fillin_rpath and other callers will
  use strsep to split colon separated path elements. This fixes
  the case tested by test 19.

This commit improves DST handling significantly in the following
ways: firstly is_dst () is overhauled to correctly process DST
sequences that would be accepted given the ELF gABI. This means that
we actually now accept slightly more sequences than before.  Now we
accept $ORIGIN$ORIGIN, but in the past we accepted only $ORIGIN\0 or
$ORIGIN/..., but this kind of behaviour results in unexpected
and uninterpreted DST sequences being used as literal search paths
leading to security defects. Therefore the first step in correcting
this defect is making is_dst () properly account for all DSTs
and making the function context free in the sense that it counts
DSTs without knowledge of path, or AT_SECURE. Next, _dl_dst_count ()
is also simplified to count all DSTs regardless of context.
Then in _dl_dst_substitute () we reintroduce context-dependent
processing for such things as AT_SECURE handling. At the level of
_dl_dst_substitute we can have access to things like the true start
of the string sequence to validate $ORIGIN-based paths rooted in
trusted directories.  Lastly, callers of _dl_dst_substitute () are
adjusted to pass in the start of their string sequences, this includes
expand_dynamic_string_token () and fillin_rpath (). Lastly, after this
commit we tighten up the accepted sequences in AT_SECURE, and avoid
leaving unexpanded DSTs, this is noted in the NEWS entry.

Verified with a sequence of 63 tests on x86_64 that cover
non-AT_SECURE and AT_SECURE testing using a sysroot (requires root
to run). The tests cover cases for bug 23102, bug 21942, bug 18018,
and bug 23259. These tests are not yet appropriate for the glibc
regression testsuite, but with the upcoming test-in-container testing
framework it should be possible to include these tests upstream soon.

See the mailing list for the tests:
https://www.sourceware.org/ml/libc-alpha/2018-06/msg00073.html
---
 ChangeLog                  |  20 +++++
 NEWS                       |  10 +++
 elf/dl-deps.c              |   4 +-
 elf/dl-dst.h               |  13 ---
 elf/dl-load.c              | 210 ++++++++++++++++++++++++++++++---------------
 sysdeps/generic/ldsodefs.h |   5 +-
 6 files changed, 177 insertions(+), 85 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index a3bc2bf31e..f6dbc961e2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,23 @@
+2018-06-06  Carlos O'Donell  <carlos@redhat.com>
+	    Andreas Schwab  <schwab@suse.de>
+	    Dmitry V. Levin  <ldv@altlinux.org>
+
+	[BZ #23102]
+	[BZ #21942]
+	[BZ #18018]
+	[BZ #23259]
+	CVE-2011-0536
+	* elf/dl-load.c (is_dst): Comment.  Support ELF gABI.
+	(_dl_dst_count): Comment.  Simplify and count DSTs.
+	(_dl_dst_substitute): Comment.  Support __libc_enable_secure handling.
+	Add string start to arguments.
+	(expand_dybamic_string_token): Comment. Accept path start.
+	(fillin_rpath): Pass string start to expand_dynamic_string_token.
+	* sysdeps/generic/ldsodefs.h: _dl_dst_substitute takes additiional
+	string start argument.
+	* elf/dl-deps.c (expand_dst): Adjust call to _dl_dst_substitute.
+	* elf/dl-dst.h: Remove DL_DST_COUNT.
+
 2018-06-05  Joseph Myers  <joseph@codesourcery.com>
 
 	* sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h (HWCAP_DIT): New
diff --git a/NEWS b/NEWS
index e2a6f45121..0d0bc9ad4c 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,16 @@ Major new features:
   NI_IDN_ALLOW_UNASSIGNED, NI_IDN_USE_STD3_ASCII_RULES) have been
   deprecated.  They no longer have any effect.
 
+* Parsing of dynamic string tokens in DT_RPATH, DT_RUNPATH, and DT_NEEDED
+  has been expanded to support the full range of ELF gABI expressions
+  including such constructs as '$ORIGIN$ORIGIN' (if valid).  For SUID/GUID
+  applications the rules have been further restricted, and where in the
+  past a dynamic string token sequence may have been interpreted as a
+  literal string it will now cause a load failure.  These load failures
+  were always considered unspecified behaviour from the perspective of the
+  dynamic loader, and for safety are now load errors e.g. /foo/${ORIGIN}.so
+  in DT_NEEDED results in a load failure now.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The nonstandard header files <libio.h> and <_G_config.h> are no longer
diff --git a/elf/dl-deps.c b/elf/dl-deps.c
index c975fcffd7..07b0859456 100644
--- a/elf/dl-deps.c
+++ b/elf/dl-deps.c
@@ -100,7 +100,7 @@ struct list
   ({									      \
     const char *__str = (str);						      \
     const char *__result = __str;					      \
-    size_t __dst_cnt = DL_DST_COUNT (__str);				      \
+    size_t __dst_cnt = _dl_dst_count (__str);				      \
 									      \
     if (__dst_cnt != 0)							      \
       {									      \
@@ -114,7 +114,7 @@ DST not allowed in SUID/SGID programs"));				      \
 	__newp = (char *) alloca (DL_DST_REQUIRED (l, __str, strlen (__str),  \
 						   __dst_cnt));		      \
 									      \
-	__result = _dl_dst_substitute (l, __str, __newp);		      \
+	__result = _dl_dst_substitute (l, __str, __str, __newp);	      \
 									      \
 	if (*__result == '\0')						      \
 	  {								      \
diff --git a/elf/dl-dst.h b/elf/dl-dst.h
index 32de5d225a..859032be0d 100644
--- a/elf/dl-dst.h
+++ b/elf/dl-dst.h
@@ -18,19 +18,6 @@
 
 #include "trusted-dirs.h"
 
-/* Determine the number of DST elements in the name.  Only if IS_PATH is
-   nonzero paths are recognized (i.e., multiple, ':' separated filenames).  */
-#define DL_DST_COUNT(name) \
-  ({									      \
-    size_t __cnt = 0;							      \
-    const char *__sf = strchr (name, '$');				      \
-									      \
-    if (__glibc_unlikely (__sf != NULL))				      \
-      __cnt = _dl_dst_count (__sf);					      \
-									      \
-    __cnt; })
-
-
 #ifdef SHARED
 # define IS_RTLD(l) (l) == &GL(dl_rtld_map)
 #else
diff --git a/elf/dl-load.c b/elf/dl-load.c
index 431236920f..efcfa13454 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -177,114 +177,181 @@ is_trusted_path_normalize (const char *path, size_t len)
   return false;
 }
 
+/* Given a substring starting at INPUT, just after the DST '$' start
+   token, determine if INPUT contains DST token REF, following the
+   ELF gABI rules for DSTs:
 
+   * Longest possible sequence using the rules (greedy).
+
+   * Must start with a $ (enforced by caller).
+
+   * Must follow $ with one underscore or ASCII [A-Za-z] (caller
+     follows these rules for REF and we enforce the comparison)
+     or '{' (start curly quoted name).
+
+   * Must follow first two characters with zero or more [A-Za-z0-9_]
+     (enforced by caller) or '}' (end curly quoted name).
+
+   If the sequence is a DST matching REF then the length of the DST
+   (excluding the $ sign but including curly braces, if any) is
+   returned, otherwise 0.  */
 static size_t
-is_dst (const char *start, const char *name, const char *str, int secure)
+is_dst (const char *input, const char *ref)
 {
-  size_t len;
+  size_t ilen, rlen;
   bool is_curly = false;
 
-  if (name[0] == '{')
+  /* Is a ${...} input sequence?  */
+  if (input[0] == '{')
     {
       is_curly = true;
-      ++name;
+      ++input;
     }
 
-  len = 0;
-  while (name[len] == str[len] && name[len] != '\0')
-    ++len;
+  /* Find longest valid input sequence.  */
+  ilen = 0;
+  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
+	 || (input[ilen] >= 'a' && input[ilen] <= 'z')
+	 || (input[ilen] >= '0' && input[ilen] <= '9')
+	 || (input[ilen] == '_'))
+    ++ilen;
+
+  rlen = strlen (ref);
+
+  /* Can't be the DST we are looking for.  */
+  if (rlen != ilen)
+    return 0;
+
+  /* Compare the DST (no strncmp this early in startup).  */
+  if (memcmp (input, ref, ilen) != 0)
+    return 0;
 
   if (is_curly)
     {
-      if (name[len] != '}')
+      /* Invalid curly sequence!  */
+      if (input[ilen] != '}')
 	return 0;
 
-      /* Point again at the beginning of the name.  */
-      --name;
-      /* Skip over closing curly brace and adjust for the --name.  */
-      len += 2;
+      /* Count the two curly braces.  */
+      ilen += 2;
     }
-  else if (name[len] != '\0' && name[len] != '/')
-    return 0;
-
-  if (__glibc_unlikely (secure)
-      && ((name[len] != '\0' && name[len] != '/')
-	  || (name != start + 1)))
-    return 0;
 
-  return len;
+  return ilen;
 }
 
-
+/* INPUT is the start of a DST sequence at the first '$' occurrence.
+   If there is a DST we call into _dl_dst_count to count the number of
+   DSTs.  We count all known DSTs regardless of __libc_enable_secure;
+   the caller is responsible for enforcing the security of the
+   substitution rules (usually _dl_dst_substitute).  */
 size_t
-_dl_dst_count (const char *name)
+_dl_dst_count (const char *input)
 {
-  const char *const start = name;
   size_t cnt = 0;
 
+  input = strchr (input, '$');
+
+  /* Most likely there is no DST.  */
+  if (__glibc_likely (input == NULL))
+    return 0;
+
   do
     {
       size_t len;
 
-      /* $ORIGIN is not expanded for SUID/GUID programs (except if it
-	 is $ORIGIN alone) and it must always appear first in path.  */
-      ++name;
-      if ((len = is_dst (start, name, "ORIGIN", __libc_enable_secure)) != 0
-	  || (len = is_dst (start, name, "PLATFORM", 0)) != 0
-	  || (len = is_dst (start, name, "LIB", 0)) != 0)
+      ++input;
+      /* All DSTs must follow ELF gABI rules, see is_dst ().  */
+      if ((len = is_dst (input, "ORIGIN")) != 0
+	  || (len = is_dst (input, "PLATFORM")) != 0
+	  || (len = is_dst (input, "LIB")) != 0)
 	++cnt;
 
-      name = strchr (name + len, '$');
+      /* There may be more than one DST in the input.  */
+      input = strchr (input + len, '$');
     }
-  while (name != NULL);
+  while (input != NULL);
 
   return cnt;
 }
 
-
+/* Process INPUT for DSTs and store in RESULT using the information
+   from link map L to resolve the DSTs.  The value of START must equal
+   the start of the parent string if INPUT is a substring sequence
+   being parsed with a NULL separator e.g. $ORIGIN\0$PLATFORM.  This
+   function only handles one path element and does not handle
+   ":"-separated path elements (see fillin_rpath ()).  Lastly the size
+   of result in bytes should be at least equal to the value returned
+   by DL_DST_REQUIRED.  Note that it is possible for a DT_NEEDED,
+   DT_AUXILIARY, and DT_FILTER entries to have colons, but we treat
+   those as literal colons here, not as path delimeters.  */
 char *
-_dl_dst_substitute (struct link_map *l, const char *name, char *result)
+_dl_dst_substitute (struct link_map *l, const char *start,
+		    const char *input, char *result)
 {
-  const char *const start = name;
-
   /* Now fill the result path.  While copying over the string we keep
-     track of the start of the last path element.  When we come across
-     a DST we copy over the value or (if the value is not available)
-     leave the entire path element out.  */
+     track of the start of the start of the path element i.e. begin.
+     When we come across a DST we copy over the value or (if the value
+     is not available) if the value is not available we discard the
+     path.  */
   char *wp = result;
-  char *last_elem = result;
+  char *begin = result;
   bool check_for_trusted = false;
 
   do
     {
-      if (__glibc_unlikely (*name == '$'))
+      if (__glibc_unlikely (*input == '$'))
 	{
 	  const char *repl = NULL;
 	  size_t len;
 
-	  ++name;
-	  if ((len = is_dst (start, name, "ORIGIN", __libc_enable_secure)) != 0)
+	  ++input;
+	  if ((len = is_dst (input, "ORIGIN")) != 0)
 	    {
-	      repl = l->l_origin;
+	      /* For SUID/GUID programs we normally ignore the path with
+		 a DST in DT_RUNPATH, or DT_RPATH.  However, there is
+		 one exception to this rule, and it is:
+
+		   * $ORIGIN appears first in the path element, and is
+		     the only thing in the element or is immediately
+		     followed by a path separator and the rest of the
+		     path.
+
+		   * The path element is rooted in a trusted directory.
+
+		 This exception allows such programs to reference
+		 shared libraries in subdirectories of trusted
+		 directories.  The use case is one of general
+		 organization and deployment flexibility.
+		 Trusted directories are usually such paths as "/lib64"
+		 or "/lib".  */
+	      if (__glibc_unlikely (__libc_enable_secure)
+		  && !((input == start + 1
+			|| (input > start + 1 && input[-2] == '\0'))
+		       && (input[len] == '\0' || input[len] == '/')))
+		repl = (const char *) -1;
+	      else
+	        repl = l->l_origin;
+
 	      check_for_trusted = (__libc_enable_secure
 				   && l->l_type == lt_executable);
 	    }
-	  else if ((len = is_dst (start, name, "PLATFORM", 0)) != 0)
+	  else if ((len = is_dst (input, "PLATFORM")) != 0)
 	    repl = GLRO(dl_platform);
-	  else if ((len = is_dst (start, name, "LIB", 0)) != 0)
+	  else if ((len = is_dst (input, "LIB")) != 0)
 	    repl = DL_DST_LIB;
 
 	  if (repl != NULL && repl != (const char *) -1)
 	    {
 	      wp = __stpcpy (wp, repl);
-	      name += len;
+	      input += len;
 	    }
-	  else if (len > 1)
+	  else if (len != 0)
 	    {
-	      /* We cannot use this path element, the value of the
-		 replacement is unknown.  */
-	      wp = last_elem;
-	      break;
+	      /* We found a valid DST that we know about, but we could
+	         not find a replacement value for it, therefore we
+		 cannot use this path element and discard it.  */
+	      *begin = '\0';
+	      return result;
 	    }
 	  else
 	    /* No DST we recognize.  */
@@ -292,16 +359,19 @@ _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 	}
       else
 	{
-	  *wp++ = *name++;
+	  *wp++ = *input++;
 	}
     }
-  while (*name != '\0');
+  while (*input != '\0');
 
-  /* In SUID/SGID programs, after $ORIGIN expansion the normalized
+  /* In SUID/SGID programs, after DST expansion the normalized
      path must be rooted in one of the trusted directories.  */
   if (__glibc_unlikely (check_for_trusted)
-      && !is_trusted_path_normalize (last_elem, wp - last_elem))
-    wp = last_elem;
+      && !is_trusted_path_normalize (begin, wp - begin))
+    {
+      *begin = '\0';
+      return result;
+    }
 
   *wp = '\0';
 
@@ -309,13 +379,16 @@ _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 }
 
 
-/* Return copy of argument with all recognized dynamic string tokens
-   ($ORIGIN and $PLATFORM for now) replaced.  On some platforms it
-   might not be possible to determine the path from which the object
-   belonging to the map is loaded.  In this case the path element
-   containing $ORIGIN is left out.  */
+/* Return a malloc allocated copy of INPUT with all recognized DSTs
+   replaced.  The value of START must equal the start of the parent
+   string if INPUT is a substring sequence being parsed with NULL
+   separators e.g. $ORIGIN\0$PLATFORM.  On some platforms it might not
+   be possible to determine the path from which the object belonging to
+   the map is loaded.  In this case the path element containing the DST
+   is left out.  On error NULL is returned. */
 static char *
-expand_dynamic_string_token (struct link_map *l, const char *s)
+expand_dynamic_string_token (struct link_map *l, const char *start,
+			     const char *input)
 {
   /* We make two runs over the string.  First we determine how large the
      resulting string is and then we copy it over.  Since this is no
@@ -326,21 +399,21 @@ expand_dynamic_string_token (struct link_map *l, const char *s)
   char *result;
 
   /* Determine the number of DST elements.  */
-  cnt = DL_DST_COUNT (s);
+  cnt = _dl_dst_count (input);
 
   /* If we do not have to replace anything simply copy the string.  */
   if (__glibc_likely (cnt == 0))
-    return __strdup (s);
+    return __strdup (input);
 
   /* Determine the length of the substituted string.  */
-  total = DL_DST_REQUIRED (l, s, strlen (s), cnt);
+  total = DL_DST_REQUIRED (l, input, strlen (input), cnt);
 
   /* Allocate the necessary memory.  */
   result = (char *) malloc (total + 1);
   if (result == NULL)
     return NULL;
 
-  return _dl_dst_substitute (l, s, result);
+  return _dl_dst_substitute (l, start, input, result);
 }
 
 
@@ -387,6 +460,7 @@ fillin_rpath (char *rpath, struct r_search_path_elem **result, const char *sep,
 	      const char *what, const char *where, struct link_map *l)
 {
   char *cp;
+  char *start = rpath;
   size_t nelems = 0;
 
   while ((cp = __strsep (&rpath, sep)) != NULL)
@@ -398,7 +472,7 @@ fillin_rpath (char *rpath, struct r_search_path_elem **result, const char *sep,
       /* `strsep' can pass an empty string.  */
       if (*cp != '\0')
 	{
-	  to_free = cp = expand_dynamic_string_token (l, cp);
+	  to_free = cp = expand_dynamic_string_token (l, start, cp);
 
 	  /* expand_dynamic_string_token can return NULL in case of empty
 	     path or memory allocation failure.  */
@@ -2091,7 +2165,7 @@ _dl_map_object (struct link_map *loader, const char *name,
     {
       /* The path may contain dynamic string tokens.  */
       realname = (loader
-		  ? expand_dynamic_string_token (loader, name)
+		  ? expand_dynamic_string_token (loader, name, name)
 		  : __strdup (name));
       if (realname == NULL)
 	fd = -1;
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 95dc87519b..688ff60785 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -1108,8 +1108,9 @@ extern const char *_dl_get_origin (void) attribute_hidden;
 extern size_t _dl_dst_count (const char *name) attribute_hidden;
 
 /* Substitute DST values.  */
-extern char *_dl_dst_substitute (struct link_map *l, const char *name,
-				 char *result) attribute_hidden;
+extern char *_dl_dst_substitute (struct link_map *l, const char *start,
+				 const char *name, char *result)
+     attribute_hidden;
 
 /* Open the shared object NAME, relocate it, and run its initializer if it
    hasn't already been run.  MODE is as for `dlopen' (see <dlfcn.h>).  If
Carlos O'Donell June 8, 2018, 5:46 a.m. | #25
On 06/08/2018 01:21 AM, Florian Weimer wrote:
> On 06/08/2018 06:14 AM, Carlos O'Donell wrote:
>> The only time the code you quote is executed, this code:
>>
>>   338           else if (len != 0)
>>   339             {
>>   340               /* We cannot use this path element, the value of the
>>   341                  replacement is unknown.  */
>>   342               check_for_trusted = false;
>>   343               wp = last_elem;
>>   344               break;
>>   345             }
>>
>> Is when we find a DST we know, say $LIB, but DL_DST_LIB is invalid
>> e.g. set to -1, indicating that $LIB's value is unknown, in which case
>> [$ORIGIN/../$LIB] is entirely considered unknown, and*discarded*  (which
>> is what 'wp = last_elem' does).
>>
>> For v4 I'm going to clean up _dl_dst_substitute to point out that we
>> only take individual path elements of a multi-path sequence.
>>
>> I believe this answers your question. Please clarify if I have not.
> 
> Yes, it does.  What the quoted code actually does is something like this, right?
> 
>   /* Return an empty string to tell the caller to drop the element.  */
>   *result = '\0';
>   return;

Exactly, and you'll see I do just that in v4 patch to make things clearer.

            {
-             /* We cannot use this path element, the value of the
-                replacement is unknown.  */
-             wp = last_elem;
-             break;
+             /* We found a valid DST that we know about, but we could
+                not find a replacement value for it, therefore we
+                cannot use this path element and discard it.  */
+             *begin = '\0';
+             return result;
            }

Cheers,
Carlos.
Carlos O'Donell June 8, 2018, 5:50 a.m. | #26
On 06/08/2018 01:46 AM, Carlos O'Donell wrote:
> On 06/08/2018 01:21 AM, Florian Weimer wrote:
>> On 06/08/2018 06:14 AM, Carlos O'Donell wrote:
>>> The only time the code you quote is executed, this code:
>>>
>>>   338           else if (len != 0)
>>>   339             {
>>>   340               /* We cannot use this path element, the value of the
>>>   341                  replacement is unknown.  */
>>>   342               check_for_trusted = false;
>>>   343               wp = last_elem;
>>>   344               break;
>>>   345             }
>>>
>>> Is when we find a DST we know, say $LIB, but DL_DST_LIB is invalid
>>> e.g. set to -1, indicating that $LIB's value is unknown, in which case
>>> [$ORIGIN/../$LIB] is entirely considered unknown, and*discarded*  (which
>>> is what 'wp = last_elem' does).
>>>
>>> For v4 I'm going to clean up _dl_dst_substitute to point out that we
>>> only take individual path elements of a multi-path sequence.
>>>
>>> I believe this answers your question. Please clarify if I have not.
>>
>> Yes, it does.  What the quoted code actually does is something like this, right?
>>
>>   /* Return an empty string to tell the caller to drop the element.  */
>>   *result = '\0';
>>   return;
> 
> Exactly, and you'll see I do just that in v4 patch to make things clearer.
> 
>             {
> -             /* We cannot use this path element, the value of the
> -                replacement is unknown.  */
> -             wp = last_elem;
> -             break;
> +             /* We found a valid DST that we know about, but we could
> +                not find a replacement value for it, therefore we
> +                cannot use this path element and discard it.  */
> +             *begin = '\0';
> +             return result;
>             }

... and as soon as I see this I realize that we can just use result
and should remove begin.

Cheers,
Carlos.
Florian Weimer June 8, 2018, 5:51 a.m. | #27
On 06/08/2018 07:45 AM, Carlos O'Donell wrote:
> +	      /* For SUID/GUID programs we normally ignore the path with
> +		 a DST in DT_RUNPATH, or DT_RPATH.  However, there is
> +		 one exception to this rule, and it is:
> +
> +		   * $ORIGIN appears first in the path element, and is
> +		     the only thing in the element or is immediately
> +		     followed by a path separator and the rest of the
> +		     path.
> +
> +		   * The path element is rooted in a trusted directory.
> +
> +		 This exception allows such programs to reference
> +		 shared libraries in subdirectories of trusted
> +		 directories.  The use case is one of general
> +		 organization and deployment flexibility.
> +		 Trusted directories are usually such paths as "/lib64"
> +		 or "/lib".  */
> +	      if (__glibc_unlikely (__libc_enable_secure)
> +		  && !((input == start + 1
> +			|| (input > start + 1 && input[-2] == '\0'))
> +		       && (input[len] == '\0' || input[len] == '/')))
> +		repl = (const char *) -1;

The comment does not match the code: The code checks that $ORIGIN comes 
first in the *path*, not *path element* (hence the need for the start 
variable).  I'm not sure what the right behavior is here.  Going by path 
element seems more correct.

(The begin variable doesn't seem to add much value, as you noted.)

Thanks,
Florian
Carlos O'Donell June 8, 2018, 6:03 a.m. | #28
On 06/08/2018 01:51 AM, Florian Weimer wrote:
> On 06/08/2018 07:45 AM, Carlos O'Donell wrote:
>> +          /* For SUID/GUID programs we normally ignore the path with
>> +         a DST in DT_RUNPATH, or DT_RPATH.  However, there is
>> +         one exception to this rule, and it is:
>> +
>> +           * $ORIGIN appears first in the path element, and is
>> +             the only thing in the element or is immediately
>> +             followed by a path separator and the rest of the
>> +             path.
>> +
>> +           * The path element is rooted in a trusted directory.
>> +
>> +         This exception allows such programs to reference
>> +         shared libraries in subdirectories of trusted
>> +         directories.  The use case is one of general
>> +         organization and deployment flexibility.
>> +         Trusted directories are usually such paths as "/lib64"
>> +         or "/lib".  */
>> +          if (__glibc_unlikely (__libc_enable_secure)
>> +          && !((input == start + 1
>> +            || (input > start + 1 && input[-2] == '\0'))
>> +               && (input[len] == '\0' || input[len] == '/')))
>> +        repl = (const char *) -1;
> 
> The comment does not match the code: The code checks that $ORIGIN
> comes first in the *path*, not *path element* (hence the need for the
> start variable).  I'm not sure what the right behavior is here.
> Going by path element seems more correct.
Sorry, there is a bit of confusion regarding terminology here.

When you have multiple colon separated paths in DT_RUNPATH or
DT_RPATH, what is each path called? Are they path elements?

Or are we calling path elements only those things that are
between slashes in a singular path?

It seems context dependent, but I understand the confusion.

[/path/dir:/path/dir1:/path/dir2]
 ^^^^^^^^^ --- What do we call this in the context of multiple paths?

Just a 'path' ?

I consider the following valid:

[$ORIGIN/../$LIB]

I do *not* consider the following valid:

[$LIB/../$ORIGIN/../$LIB]

Even though they are the same path.

I believe the glibc exception was a very *narrow* exception which
allowed only $ORIGIN-starting paths rooted in trusted directories
to be allowed for SUID/SGID. It's just easier to explain to
developers.

In the abstract all we really care about is that the result be
rooted in a trusted directory. So in that case the appearance
of *any* DST would enable check_for_trusted. That would be an
expansion of what is allowed for SUID/SGID though, and I'm not
sure I want to do that any further.

Thoughts?

Cheers,
Carlos.
Florian Weimer June 8, 2018, 6:25 a.m. | #29
On 06/08/2018 08:03 AM, Carlos O'Donell wrote:
> On 06/08/2018 01:51 AM, Florian Weimer wrote:
>> On 06/08/2018 07:45 AM, Carlos O'Donell wrote:
>>> +          /* For SUID/GUID programs we normally ignore the path with
>>> +         a DST in DT_RUNPATH, or DT_RPATH.  However, there is
>>> +         one exception to this rule, and it is:
>>> +
>>> +           * $ORIGIN appears first in the path element, and is
>>> +             the only thing in the element or is immediately
>>> +             followed by a path separator and the rest of the
>>> +             path.
>>> +
>>> +           * The path element is rooted in a trusted directory.
>>> +
>>> +         This exception allows such programs to reference
>>> +         shared libraries in subdirectories of trusted
>>> +         directories.  The use case is one of general
>>> +         organization and deployment flexibility.
>>> +         Trusted directories are usually such paths as "/lib64"
>>> +         or "/lib".  */
>>> +          if (__glibc_unlikely (__libc_enable_secure)
>>> +          && !((input == start + 1
>>> +            || (input > start + 1 && input[-2] == '\0'))
>>> +               && (input[len] == '\0' || input[len] == '/')))
>>> +        repl = (const char *) -1;
>>
>> The comment does not match the code: The code checks that $ORIGIN
>> comes first in the *path*, not *path element* (hence the need for the
>> start variable).  I'm not sure what the right behavior is here.
>> Going by path element seems more correct.
> Sorry, there is a bit of confusion regarding terminology here.
> 
> When you have multiple colon separated paths in DT_RUNPATH or
> DT_RPATH, what is each path called? Are they path elements?
> 
> Or are we calling path elements only those things that are
> between slashes in a singular path?
> 
> It seems context dependent, but I understand the confusion.
> 
> [/path/dir:/path/dir1:/path/dir2]
>   ^^^^^^^^^ --- What do we call this in the context of multiple paths?
> 
> Just a 'path' ?
> 
> I consider the following valid:
> 
> [$ORIGIN/../$LIB]

I'm actually asking about this:

[$LIB:$ORIGIN/../$LIB]

Doesn't the current code reject this?

(Some of the callers split at :, not /.)

Thanks,
Florian
Carlos O'Donell June 11, 2018, 2:54 a.m. | #30
On 06/08/2018 02:25 AM, Florian Weimer wrote:
> On 06/08/2018 08:03 AM, Carlos O'Donell wrote:
>> On 06/08/2018 01:51 AM, Florian Weimer wrote:
>>> On 06/08/2018 07:45 AM, Carlos O'Donell wrote:
>>>> +          /* For SUID/GUID programs we normally ignore the path with
>>>> +         a DST in DT_RUNPATH, or DT_RPATH.  However, there is
>>>> +         one exception to this rule, and it is:
>>>> +
>>>> +           * $ORIGIN appears first in the path element, and is
>>>> +             the only thing in the element or is immediately
>>>> +             followed by a path separator and the rest of the
>>>> +             path.
>>>> +
>>>> +           * The path element is rooted in a trusted directory.
>>>> +
>>>> +         This exception allows such programs to reference
>>>> +         shared libraries in subdirectories of trusted
>>>> +         directories.  The use case is one of general
>>>> +         organization and deployment flexibility.
>>>> +         Trusted directories are usually such paths as "/lib64"
>>>> +         or "/lib".  */
>>>> +          if (__glibc_unlikely (__libc_enable_secure)
>>>> +          && !((input == start + 1
>>>> +            || (input > start + 1 && input[-2] == '\0'))
>>>> +               && (input[len] == '\0' || input[len] == '/')))
>>>> +        repl = (const char *) -1;
>>>
>>> The comment does not match the code: The code checks that $ORIGIN
>>> comes first in the *path*, not *path element* (hence the need for the
>>> start variable).  I'm not sure what the right behavior is here.
>>> Going by path element seems more correct.
>> Sorry, there is a bit of confusion regarding terminology here.
>>
>> When you have multiple colon separated paths in DT_RUNPATH or
>> DT_RPATH, what is each path called? Are they path elements?
>>
>> Or are we calling path elements only those things that are
>> between slashes in a singular path?
>>
>> It seems context dependent, but I understand the confusion.
>>
>> [/path/dir:/path/dir1:/path/dir2]
>>   ^^^^^^^^^ --- What do we call this in the context of multiple paths?
>>
>> Just a 'path' ?
>>
>> I consider the following valid:
>>
>> [$ORIGIN/../$LIB]
> 
> I'm actually asking about this:
> 
> [$LIB:$ORIGIN/../$LIB]
> 
> Doesn't the current code reject this?

The current code does not reject this. In fact it accepts $LIB in RPATH
et. al. because there is no code to reject it. Only $ORIGIN has any
restrictions.

The patched code continues to accept [$LIB:$ORIGIN/../$LIB].

> (Some of the callers split at :, not /.)

The fillin_rpath caller splits at ':', generally for DT_RUNPATH and
DT_RPATH. While the other callers do no splitting. Do you see a caller
that splits at '/'?

In summary, the current patch doesn't change the handling of *just* $LIB
for SUID/SGID binaries.

Should all paths get tested for trusted paths for a SUID/SGID binary?
It seems like that's the right idea.

@@ -365,7 +383,8 @@ _dl_dst_substitute (struct link_map *l, const char *start,
 
   /* In SUID/SGID programs, after DST expansion the normalized
      path must be rooted in one of the trusted directories.  */
-  if (__glibc_unlikely (check_for_trusted)
+  if (__glibc_unlikely (__libc_enable_secure)
+      && l->l_type == lt_executable
       && !is_trusted_path_normalize (result, wp - result))
     {
       *result = '\0';

Just drop check_for_trusted, and execute the is_trusted_path_normalize
check for all SUID/SGID paths?

Cheers,
Carlos.
Florian Weimer June 11, 2018, 7:28 a.m. | #31
On 06/11/2018 04:54 AM, Carlos O'Donell wrote:
>>> I consider the following valid:
>>>
>>> [$ORIGIN/../$LIB]
>> I'm actually asking about this:
>>
>> [$LIB:$ORIGIN/../$LIB]
>>
>> Doesn't the current code reject this?
> The current code does not reject this. In fact it accepts $LIB in RPATH
> et. al. because there is no code to reject it. Only $ORIGIN has any
> restrictions.

I meant the $ORIGN part.  As far as I understand it, start will not be 
close to input for the second part containing origin (after the :), so 
this check in _dl_dst_substitute should reject it:

		      || (input != start + 1
			  || (input > start + 2 && input[-2] != ':'))))

I'm not sure that this is right.

> Should all paths get tested for trusted paths for a SUID/SGID binary?
> It seems like that's the right idea.
> 
> @@ -365,7 +383,8 @@ _dl_dst_substitute (struct link_map *l, const char *start,
>  
>    /* In SUID/SGID programs, after DST expansion the normalized
>       path must be rooted in one of the trusted directories.  */
> -  if (__glibc_unlikely (check_for_trusted)
> +  if (__glibc_unlikely (__libc_enable_secure)
> +      && l->l_type == lt_executable
>        && !is_trusted_path_normalize (result, wp - result))
>      {
>        *result = '\0';
> 
> Just drop check_for_trusted, and execute the is_trusted_path_normalize
> check for all SUID/SGID paths?

No, $ORIGIN in a trusted path is itself trusted (because you cannot 
manipulate it using hard links).  I think there are installations out 
there which depend on this.

Thanks,
Florian
Andreas Schwab June 11, 2018, 8:04 a.m. | #32
On Jun 07 2018, Carlos O'Donell <carlos@redhat.com> wrote:

> Careful, is_dst () takes as input the start of a DST sequence,
> but that sequence is not validated yet.

You have already validated the prefix by comparing it with the string,
and you only have to check the next character.

> What if name was '$ORIGIN-' but str was '$ORIGIN'?

That would match of course, since `-' is not part of the DST (with the
relaxed rules).

Andreas.
Carlos O'Donell June 11, 2018, 2:44 p.m. | #33
On 06/11/2018 03:28 AM, Florian Weimer wrote:
> On 06/11/2018 04:54 AM, Carlos O'Donell wrote:
>>>> I consider the following valid:
>>>>
>>>> [$ORIGIN/../$LIB]
>>> I'm actually asking about this:
>>>
>>> [$LIB:$ORIGIN/../$LIB]
>>>
>>> Doesn't the current code reject this?
>> The current code does not reject this. In fact it accepts $LIB in RPATH
>> et. al. because there is no code to reject it. Only $ORIGIN has any
>> restrictions.
> 
> I meant the $ORIGN part.  As far as I understand it, start will not
> be close to input for the second part containing origin (after the
> :), so this check in _dl_dst_substitute should reject it:

The only code that splits on ':' is fillin_rpath, and it passes start
as the start of the whole DT_RUNPATH/DT_RPATH, and uses strsep so there
is no ':' to find.
 
>               || (input != start + 1
>               || (input > start + 2 && input[-2] != ':'))))
> 
> I'm not sure that this is right.

You are correct, and I fixed this for you based on earlier comments when
I sent out v4. Notice the v4 patch calls this out and I added a test
 case for it when you commented on it.

- Fix _dl_dst_substitute exception logic to check for NULL separator
  between non-first path since fillin_rpath and other callers will
  use strsep to split colon separated path elements. This fixes
  the case tested by test 19.

+             /* For SUID/GUID programs we normally ignore the path with
+                a DST in DT_RUNPATH, or DT_RPATH.  However, there is
+                one exception to this rule, and it is:
+
+                  * $ORIGIN appears first in the path element, and is
+                    the only thing in the element or is immediately
+                    followed by a path separator and the rest of the
+                    path.
+
+                  * The path element is rooted in a trusted directory.
+
+                This exception allows such programs to reference
+                shared libraries in subdirectories of trusted
+                directories.  The use case is one of general
+                organization and deployment flexibility.
+                Trusted directories are usually such paths as "/lib64"
+                or "/lib".  */
+             if (__glibc_unlikely (__libc_enable_secure)
+                 && !((input == start + 1
+                       || (input > start + 1 && input[-2] == '\0'))
+                      && (input[len] == '\0' || input[len] == '/')))

The sequence in question:

+                 && !((input == start + 1
+                       || (input > start + 1 && input[-2] == '\0'))

Will match:

[$ORIGIN\0...]
 ^--- start
  ^--- input

input == start + 1

or

[...........\0$ORIGIN]
 ^--- start    ^--- input

input > start + 1 (not the first path element)
&& input[-2] == '\0' (first in the path element)

Does that answer your question?

However, I see your point now that start itself can also be deleted.

If we are only ever handling single paths, we can remove start from the API
and just track the start of the singular sequence.

I will clean up the code further.

I will also remove a ":" check in is_trusted_path_normalize().

>> Should all paths get tested for trusted paths for a SUID/SGID binary?
>> It seems like that's the right idea.
>>
>> @@ -365,7 +383,8 @@ _dl_dst_substitute (struct link_map *l, const char *start,
>>  
>>    /* In SUID/SGID programs, after DST expansion the normalized
>>       path must be rooted in one of the trusted directories.  */
>> -  if (__glibc_unlikely (check_for_trusted)
>> +  if (__glibc_unlikely (__libc_enable_secure)
>> +      && l->l_type == lt_executable
>>        && !is_trusted_path_normalize (result, wp - result))
>>      {
>>        *result = '\0';
>>
>> Just drop check_for_trusted, and execute the is_trusted_path_normalize
>> check for all SUID/SGID paths?
> 
> No, $ORIGIN in a trusted path is itself trusted (because you cannot
> manipulate it using hard links).  I think there are installations out
> there which depend on this.

Understood.

I think a clearer statement would be like this:

The $LIB and $PLATFORM DST cannot in any way be manipulated by the caller
because they are fixed values that are set by the dynamic loader and therefore
any paths using just $LIB or $PLATFORM need not be checked for trust, the
authors of the binaries themselves are trusted to have designed this correctly.
Only $ORIGIN is tested in this way because it may be manipulated in some ways
with hard links.

I will add something like this as a comment.

Cheers,
Carlos.
Carlos O'Donell June 12, 2018, 3:08 a.m. | #34
On 06/11/2018 04:04 AM, Andreas Schwab wrote:
> On Jun 07 2018, Carlos O'Donell <carlos@redhat.com> wrote:
> 
>> Careful, is_dst () takes as input the start of a DST sequence,
>> but that sequence is not validated yet.
> 
> You have already validated the prefix by comparing it with the string,
> and you only have to check the next character.

By validating it against the abstract definition of the ELF gABI spec 
for a DST name it prevents any future caller from accidentally straying
outside of that contract without having to make two changes: (1) change 
the comparison DST and (2) change what we accept in is_dst().

Also I find the code ends up being simpler. Right now we compute the
longest DST name possible and then the checks are much easier after
that.

Is there a strong performance argument to be made for not doing the
validation?

Cheers,
Carlos.
Carlos O'Donell June 12, 2018, 3:46 a.m. | #35
On 06/08/2018 01:45 AM, Carlos O'Donell wrote:
> On 06/06/2018 04:18 PM, Carlos O'Donell wrote:
>> On 06/06/2018 12:10 PM, Carlos O'Donell wrote:
>>> On 06/06/2018 01:02 AM, Carlos O'Donell wrote:
>>>> This commit improves DST handling significantly in the following
>>>
>>> v2
>>>
>>> - Fix is_dst() by adding back string comparison, and clarify comment.
>>>   Added extra test testcases (49 now) to cover this error.
>>> - Renamed parameters as 'input' for data from ELF file, and 'ref' as
>>>   reference DST to compare against.
>>> - Removed dead update to name in is_dst().
>>> - Killed DL_DST_COUNT, not needed really, and a weak optimization.
>>> - Did not remove len from _dl_dst_count because we use it to advance
>>>   input more quickly to the next DST.
>>>
>>
>> v3
>> - Use memcmp in is_dst().
>> - In _dl_dst_substitute use 'len != 0' conditional to clarify that
>>   all we are looking to do is distinguish between a valid DST
>>   for which we don't recognize the DST, and an invalid DST which
>>   may just be stray characters which we will copy to the result.
>> - Added a few more tests: Small invalid sequence e.g. ${}, and
>>   large valid sequence with unknown DST.
>>
> 
> v4
> - Remove ":" logic from _dl_dst_substitute, and rewrite exception
>   logic without the double negative, and in general cleanup the
>   function to make it clear we accept only single path elements
>   for processing.
> - Cleanup comments regarding RESULT size and what is required.
> - Adjust is_dst() comment to note that $ is not counted in length
>   returned.
> - Fix _dl_dst_substitute exception logic to check for NULL separator
>   between non-first path since fillin_rpath and other callers will
>   use strsep to split colon separated path elements. This fixes
>   the case tested by test 19.
>

v5
- Remove ":" logic from is_trusted_path_normalize(), we don't need
  to process any colons because we only get single paths from 
  _dl_dst_substitute.
- Remove 'start' argument from all API interfaces since we don't
  handle colon-separated path lists in _dl_dst_substitute, we can
  just simply record the start of the input for the appropriate
  SUID/SGID check to see if $ORIGIN is at the start of the input
  (Florian Weimer's suggested simplification).
- Remove all instances of the comment "path element" and instead
  use the more appropriate "path" or "path list" (colon separated
  path list).
- Passes all 67 tests. Attached as swbz23259-v5.tar.gz.

This commit improves DST handling significantly in the following
ways: firstly is_dst () is overhauled to correctly process DST
sequences that would be accepted given the ELF gABI. This means that
we actually now accept slightly more sequences than before.  Now we
accept $ORIGIN$ORIGIN, but in the past we accepted only $ORIGIN\0 or
$ORIGIN/..., but this kind of behaviour results in unexpected
and uninterpreted DST sequences being used as literal search paths
leading to security defects. Therefore the first step in correcting
this defect is making is_dst () properly account for all DSTs
and making the function context free in the sense that it counts
DSTs without knowledge of path, or AT_SECURE. Next, _dl_dst_count ()
is also simplified to count all DSTs regardless of context.
Then in _dl_dst_substitute () we reintroduce context-dependent
processing for such things as AT_SECURE handling. At the level of
_dl_dst_substitute we can have access to things like the true start
of the string sequence to validate $ORIGIN-based paths rooted in
trusted directories.  Lastly, callers of _dl_dst_substitute () are
adjusted to pass in the start of their string sequences, this includes
expand_dynamic_string_token () and fillin_rpath (). Lastly, after this
commit we tighten up the accepted sequences in AT_SECURE, and avoid
leaving unexpanded DSTs, this is noted in the NEWS entry.

Verified with a sequence of 67 tests on x86_64 that cover
non-AT_SECURE and AT_SECURE testing using a sysroot (requires root
to run). The tests cover cases for bug 23102, bug 21942, bug 18018,
and bug 23259. These tests are not yet appropriate for the glibc
regression testsuite, but with the upcoming test-in-container testing
framework it should be possible to include these tests upstream soon.

See the mailing list for the tests:
https://www.sourceware.org/ml/libc-alpha/2018-06/msg00073.html
---
 ChangeLog     |  18 +++++
 NEWS          |  11 +++
 elf/dl-deps.c |   2 +-
 elf/dl-dst.h  |  13 ----
 elf/dl-load.c | 217 ++++++++++++++++++++++++++++++++++++++--------------------
 5 files changed, 173 insertions(+), 88 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index a3bc2bf31e..cc7620e35a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,21 @@
+2018-06-06  Carlos O'Donell  <carlos@redhat.com>
+	    Andreas Schwab  <schwab@suse.de>
+	    Dmitry V. Levin  <ldv@altlinux.org>
+
+	[BZ #23102]
+	[BZ #21942]
+	[BZ #18018]
+	[BZ #23259]
+	CVE-2011-0536
+	* elf/dl-dst.h: Remove DL_DST_COUNT.
+	* elf/dl-deps.c (expand_dst): Call _dl_dst_count.
+	* elf/dl-load.c (is_trusted_path_normalize): Don't handle colons.
+	(is_dst): Comment.  Support ELF gABI.
+	(_dl_dst_count): Comment.  Simplify and count DSTs.
+	(_dl_dst_substitute): Comment.  Support __libc_enable_secure handling.
+	(expand_dybamic_string_token): Comment. Call _dl_dst_count. Rename
+	locals.
+
 2018-06-05  Joseph Myers  <joseph@codesourcery.com>
 
 	* sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h (HWCAP_DIT): New
diff --git a/NEWS b/NEWS
index e2a6f45121..de6577c4b6 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,17 @@ Major new features:
   NI_IDN_ALLOW_UNASSIGNED, NI_IDN_USE_STD3_ASCII_RULES) have been
   deprecated.  They no longer have any effect.
 
+* Parsing of dynamic string tokens in DT_RPATH, DT_RUNPATH, DT_NEEDED,
+  DT_AUXILIARY, and DT_FILTER has been expanded to support the full
+  range of ELF gABI expressions including such constructs as
+  '$ORIGIN$ORIGIN' (if valid).  For SUID/GUID applications the rules
+  have been further restricted, and where in the past a dynamic string
+  token sequence may have been interpreted as a literal string it will
+  now cause a load failure.  These load failures were always considered
+  unspecified behaviour from the perspective of the dynamic loader, and
+  for safety are now load errors e.g. /foo/${ORIGIN}.so in DT_NEEDED
+  results in a load failure now.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The nonstandard header files <libio.h> and <_G_config.h> are no longer
diff --git a/elf/dl-deps.c b/elf/dl-deps.c
index c975fcffd7..20b8e94f2e 100644
--- a/elf/dl-deps.c
+++ b/elf/dl-deps.c
@@ -100,7 +100,7 @@ struct list
   ({									      \
     const char *__str = (str);						      \
     const char *__result = __str;					      \
-    size_t __dst_cnt = DL_DST_COUNT (__str);				      \
+    size_t __dst_cnt = _dl_dst_count (__str);				      \
 									      \
     if (__dst_cnt != 0)							      \
       {									      \
diff --git a/elf/dl-dst.h b/elf/dl-dst.h
index 32de5d225a..859032be0d 100644
--- a/elf/dl-dst.h
+++ b/elf/dl-dst.h
@@ -18,19 +18,6 @@
 
 #include "trusted-dirs.h"
 
-/* Determine the number of DST elements in the name.  Only if IS_PATH is
-   nonzero paths are recognized (i.e., multiple, ':' separated filenames).  */
-#define DL_DST_COUNT(name) \
-  ({									      \
-    size_t __cnt = 0;							      \
-    const char *__sf = strchr (name, '$');				      \
-									      \
-    if (__glibc_unlikely (__sf != NULL))				      \
-      __cnt = _dl_dst_count (__sf);					      \
-									      \
-    __cnt; })
-
-
 #ifdef SHARED
 # define IS_RTLD(l) (l) == &GL(dl_rtld_map)
 #else
diff --git a/elf/dl-load.c b/elf/dl-load.c
index 431236920f..e99701d10f 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -121,12 +121,6 @@ is_trusted_path_normalize (const char *path, size_t len)
   if (len == 0)
     return false;
 
-  if (*path == ':')
-    {
-      ++path;
-      --len;
-    }
-
   char *npath = (char *) alloca (len + 2);
   char *wnp = npath;
   while (*path != '\0')
@@ -177,114 +171,179 @@ is_trusted_path_normalize (const char *path, size_t len)
   return false;
 }
 
+/* Given a substring starting at INPUT, just after the DST '$' start
+   token, determine if INPUT contains DST token REF, following the
+   ELF gABI rules for DSTs:
+
+   * Longest possible sequence using the rules (greedy).
+
+   * Must start with a $ (enforced by caller).
+
+   * Must follow $ with one underscore or ASCII [A-Za-z] (caller
+     follows these rules for REF and we enforce the comparison)
+     or '{' (start curly quoted name).
+
+   * Must follow first two characters with zero or more [A-Za-z0-9_]
+     (enforced by caller) or '}' (end curly quoted name).
 
+   If the sequence is a DST matching REF then the length of the DST
+   (excluding the $ sign but including curly braces, if any) is
+   returned, otherwise 0.  */
 static size_t
-is_dst (const char *start, const char *name, const char *str, int secure)
+is_dst (const char *input, const char *ref)
 {
-  size_t len;
+  size_t ilen, rlen;
   bool is_curly = false;
 
-  if (name[0] == '{')
+  /* Is a ${...} input sequence?  */
+  if (input[0] == '{')
     {
       is_curly = true;
-      ++name;
+      ++input;
     }
 
-  len = 0;
-  while (name[len] == str[len] && name[len] != '\0')
-    ++len;
+  /* Find longest valid input sequence.  */
+  ilen = 0;
+  while ((input[ilen] >= 'A' && input[ilen] <= 'Z')
+	 || (input[ilen] >= 'a' && input[ilen] <= 'z')
+	 || (input[ilen] >= '0' && input[ilen] <= '9')
+	 || (input[ilen] == '_'))
+    ++ilen;
+
+  rlen = strlen (ref);
+
+  /* Can't be the DST we are looking for.  */
+  if (rlen != ilen)
+    return 0;
+
+  /* Compare the DST (no strncmp this early in startup).  */
+  if (memcmp (input, ref, ilen) != 0)
+    return 0;
 
   if (is_curly)
     {
-      if (name[len] != '}')
+      /* Invalid curly sequence!  */
+      if (input[ilen] != '}')
 	return 0;
 
-      /* Point again at the beginning of the name.  */
-      --name;
-      /* Skip over closing curly brace and adjust for the --name.  */
-      len += 2;
+      /* Count the two curly braces.  */
+      ilen += 2;
     }
-  else if (name[len] != '\0' && name[len] != '/')
-    return 0;
 
-  if (__glibc_unlikely (secure)
-      && ((name[len] != '\0' && name[len] != '/')
-	  || (name != start + 1)))
-    return 0;
-
-  return len;
+  return ilen;
 }
 
-
+/* INPUT is the start of a DST sequence at the first '$' occurrence.
+   If there is a DST we call into _dl_dst_count to count the number of
+   DSTs.  We count all known DSTs regardless of __libc_enable_secure;
+   the caller is responsible for enforcing the security of the
+   substitution rules (usually _dl_dst_substitute).  */
 size_t
-_dl_dst_count (const char *name)
+_dl_dst_count (const char *input)
 {
-  const char *const start = name;
   size_t cnt = 0;
 
+  input = strchr (input, '$');
+
+  /* Most likely there is no DST.  */
+  if (__glibc_likely (input == NULL))
+    return 0;
+
   do
     {
       size_t len;
 
-      /* $ORIGIN is not expanded for SUID/GUID programs (except if it
-	 is $ORIGIN alone) and it must always appear first in path.  */
-      ++name;
-      if ((len = is_dst (start, name, "ORIGIN", __libc_enable_secure)) != 0
-	  || (len = is_dst (start, name, "PLATFORM", 0)) != 0
-	  || (len = is_dst (start, name, "LIB", 0)) != 0)
+      ++input;
+      /* All DSTs must follow ELF gABI rules, see is_dst ().  */
+      if ((len = is_dst (input, "ORIGIN")) != 0
+	  || (len = is_dst (input, "PLATFORM")) != 0
+	  || (len = is_dst (input, "LIB")) != 0)
 	++cnt;
 
-      name = strchr (name + len, '$');
+      /* There may be more than one DST in the input.  */
+      input = strchr (input + len, '$');
     }
-  while (name != NULL);
+  while (input != NULL);
 
   return cnt;
 }
 
-
+/* Process INPUT for DSTs and store in RESULT using the information
+   from link map L to resolve the DSTs. This function only handles one
+   path at a time and does not handle colon-separated path lists (see
+   fillin_rpath ()).  Lastly the size of result in bytes should be at
+   least equal to the value returned by DL_DST_REQUIRED.  Note that it
+   is possible for a DT_NEEDED, DT_AUXILIARY, and DT_FILTER entries to
+   have colons, but we treat those as literal colons here, not as path
+   list delimeters.  */
 char *
-_dl_dst_substitute (struct link_map *l, const char *name, char *result)
+_dl_dst_substitute (struct link_map *l, const char *input, char *result)
 {
-  const char *const start = name;
-
-  /* Now fill the result path.  While copying over the string we keep
-     track of the start of the last path element.  When we come across
-     a DST we copy over the value or (if the value is not available)
-     leave the entire path element out.  */
+  /* Copy character-by-character from input into the working pointer
+     looking for any DSTs.  We track the start of input and if we are
+     going to check for trusted paths, all of which are part of $ORIGIN
+     handling in SUID/SGID cases (see below).  In some cases, like when
+     a DST cannot be replaced, we may set result to an empty string and
+     return.  */
   char *wp = result;
-  char *last_elem = result;
+  const char *start = input;
   bool check_for_trusted = false;
 
   do
     {
-      if (__glibc_unlikely (*name == '$'))
+      if (__glibc_unlikely (*input == '$'))
 	{
 	  const char *repl = NULL;
 	  size_t len;
 
-	  ++name;
-	  if ((len = is_dst (start, name, "ORIGIN", __libc_enable_secure)) != 0)
+	  ++input;
+	  if ((len = is_dst (input, "ORIGIN")) != 0)
 	    {
-	      repl = l->l_origin;
+	      /* For SUID/GUID programs we normally ignore the path with
+		 $ORIGIN in DT_RUNPATH, or DT_RPATH.  However, there is
+		 one exception to this rule, and it is:
+
+		   * $ORIGIN appears as the first path element, and is
+		     the only string in the path or is immediately
+		     followed by a path separator and the rest of the
+		     path.
+
+		   * The path is rooted in a trusted directory.
+
+		 This exception allows such programs to reference
+		 shared libraries in subdirectories of trusted
+		 directories.  The use case is one of general
+		 organization and deployment flexibility.
+		 Trusted directories are usually such paths as "/lib64"
+		 or "/usr/lib64", and the usual RPATHs take the form of
+		 [$ORIGIN/../$LIB/somedir].  */
+	      if (__glibc_unlikely (__libc_enable_secure)
+		  && !(input == start + 1
+		       && (input[len] == '\0' || input[len] == '/')))
+		repl = (const char *) -1;
+	      else
+	        repl = l->l_origin;
+
 	      check_for_trusted = (__libc_enable_secure
 				   && l->l_type == lt_executable);
 	    }
-	  else if ((len = is_dst (start, name, "PLATFORM", 0)) != 0)
+	  else if ((len = is_dst (input, "PLATFORM")) != 0)
 	    repl = GLRO(dl_platform);
-	  else if ((len = is_dst (start, name, "LIB", 0)) != 0)
+	  else if ((len = is_dst (input, "LIB")) != 0)
 	    repl = DL_DST_LIB;
 
 	  if (repl != NULL && repl != (const char *) -1)
 	    {
 	      wp = __stpcpy (wp, repl);
-	      name += len;
+	      input += len;
 	    }
-	  else if (len > 1)
+	  else if (len != 0)
 	    {
-	      /* We cannot use this path element, the value of the
-		 replacement is unknown.  */
-	      wp = last_elem;
-	      break;
+	      /* We found a valid DST that we know about, but we could
+	         not find a replacement value for it, therefore we
+		 cannot use this path and discard it.  */
+	      *result = '\0';
+	      return result;
 	    }
 	  else
 	    /* No DST we recognize.  */
@@ -292,16 +351,26 @@ _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 	}
       else
 	{
-	  *wp++ = *name++;
+	  *wp++ = *input++;
 	}
     }
-  while (*name != '\0');
+  while (*input != '\0');
 
   /* In SUID/SGID programs, after $ORIGIN expansion the normalized
-     path must be rooted in one of the trusted directories.  */
+     path must be rooted in one of the trusted directories.  The $LIB
+     and $PLATFORM DST cannot in any way be manipulated by the caller
+     because they are fixed values that are set by the dynamic loader
+     and therefore any paths using just $LIB or $PLATFORM need not be
+     checked for trust, the authors of the binaries themselves are
+     trusted to have designed this correctly.  Only $ORIGIN is tested in
+     this way because it may be manipulated in some ways with hard
+     links.  */
   if (__glibc_unlikely (check_for_trusted)
-      && !is_trusted_path_normalize (last_elem, wp - last_elem))
-    wp = last_elem;
+      && !is_trusted_path_normalize (result, wp - result))
+    {
+      *result = '\0';
+      return result;
+    }
 
   *wp = '\0';
 
@@ -309,13 +378,13 @@ _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 }
 
 
-/* Return copy of argument with all recognized dynamic string tokens
-   ($ORIGIN and $PLATFORM for now) replaced.  On some platforms it
-   might not be possible to determine the path from which the object
-   belonging to the map is loaded.  In this case the path element
-   containing $ORIGIN is left out.  */
+/* Return a malloc allocated copy of INPUT with all recognized DSTs
+   replaced. On some platforms it might not be possible to determine the
+   path from which the object belonging to the map is loaded.  In this
+   case the path containing the DST is left out.  On error NULL
+   is returned.  */
 static char *
-expand_dynamic_string_token (struct link_map *l, const char *s)
+expand_dynamic_string_token (struct link_map *l, const char *input)
 {
   /* We make two runs over the string.  First we determine how large the
      resulting string is and then we copy it over.  Since this is no
@@ -325,22 +394,22 @@ expand_dynamic_string_token (struct link_map *l, const char *s)
   size_t total;
   char *result;
 
-  /* Determine the number of DST elements.  */
-  cnt = DL_DST_COUNT (s);
+  /* Determine the number of DSTs.  */
+  cnt = _dl_dst_count (input);
 
   /* If we do not have to replace anything simply copy the string.  */
   if (__glibc_likely (cnt == 0))
-    return __strdup (s);
+    return __strdup (input);
 
   /* Determine the length of the substituted string.  */
-  total = DL_DST_REQUIRED (l, s, strlen (s), cnt);
+  total = DL_DST_REQUIRED (l, input, strlen (input), cnt);
 
   /* Allocate the necessary memory.  */
   result = (char *) malloc (total + 1);
   if (result == NULL)
     return NULL;
 
-  return _dl_dst_substitute (l, s, result);
+  return _dl_dst_substitute (l, input, result);
 }
Andreas Schwab June 12, 2018, 7:31 a.m. | #36
On Jun 11 2018, Carlos O'Donell <carlos@redhat.com> wrote:

> By validating it against the abstract definition of the ELF gABI spec 
> for a DST name it prevents any future caller from accidentally straying
> outside of that contract without having to make two changes: (1) change 
> the comparison DST and (2) change what we accept in is_dst().

Why would we add any use of is_dst with an invalid DST?  That doesn't
make sense.

> Also I find the code ends up being simpler.

I don't agree.  It is overly complex for no reason.

Andreas.
Carlos O'Donell June 12, 2018, 12:45 p.m. | #37
On 06/12/2018 03:31 AM, Andreas Schwab wrote:
> On Jun 11 2018, Carlos O'Donell <carlos@redhat.com> wrote:
> 
>> By validating it against the abstract definition of the ELF gABI spec 
>> for a DST name it prevents any future caller from accidentally straying
>> outside of that contract without having to make two changes: (1) change 
>> the comparison DST and (2) change what we accept in is_dst().
> 
> Why would we add any use of is_dst with an invalid DST?  That doesn't
> make sense.

An invalid DST would be a mistake. The point is to catch mistakes by
enforcing the rules for valid DSTs.

>> Also I find the code ends up being simpler.
> 
> I don't agree.  It is overly complex for no reason.

There is a reason. You don't think the reason has any value though,
and that's different. That's also OK.

I'll post a v6 with the validation removed, I appreciate your point
of view. I'll work with Florian to get strncmp usage in place.

Would you be OK with something like this?

  /* Is a ${...} input sequence?  */
  if (input[0] == '{')
    {
      is_curly = true;
      ++input;
    }

  /* Check for matching name, following closing curly brace (if
     required), or trailing characters which are part of an
     identifier.  */
  size_t rlen = strlen (ref);
  if (strncmp (input, ref, rlen) != 0
      || (is_curly && input[rlen] != '}')
      || ((input[rlen] >= 'A' && input[rlen] <= 'Z')
          || (input[rlen] >= 'a' && input[rlen] <= 'z')
          || (input[rlen] >= '0' && input[rlen] <= '9')
          || (input[rlen] == '_')))
    return 0;
  
  if (is_curly)
    /* Count the two curly braces.  */
    return ilen + 2;
  else
    return ilen;

Cheers,
Carlos.g
Andreas Schwab June 12, 2018, 1:02 p.m. | #38
On Jun 12 2018, Carlos O'Donell <carlos@redhat.com> wrote:

> An invalid DST would be a mistake. The point is to catch mistakes by
> enforcing the rules for valid DSTs.

The chance for that is minuscule.

Andreas.
Carlos O'Donell June 12, 2018, 1:03 p.m. | #39
On 06/12/2018 09:02 AM, Andreas Schwab wrote:
> On Jun 12 2018, Carlos O'Donell <carlos@redhat.com> wrote:
> 
>> An invalid DST would be a mistake. The point is to catch mistakes by
>> enforcing the rules for valid DSTs.
> 
> The chance for that is minuscule.

Agreed, which is why I appreciate your input.

What did you think of the suggested final version?

Cheers,
Carlos.

Patch

diff --git a/ChangeLog b/ChangeLog
index a3bc2bf31e..5ece817a39 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,23 @@ 
+2018-06-06  Carlos O'Donell  <carlos@redhat.com>
+	    Andreas Schwab  <schwab@suse.de>
+	    Dmitry V. Levin  <ldv@altlinux.org> 
+
+	[BZ #23102]
+	[BZ #21942]
+	[BZ #18018]
+	[BZ #23259]
+	CVE-2011-0536
+	* elf/dl-load.c (is_dst): Comment.  Support ELF gABI. 
+	(_dl_dst_count): Comment.  Simplify and count DSTs.
+	(_dl_dst_substitute): Comment.  Support __libc_enable_secure handling.
+	Add string start to arguments.
+	(expand_dybamic_string_token): Comment. Accept path start.
+	(fillin_rpath): Pass string start to expand_dynamic_string_token. 
+	* sysdeps/generic/ldsodefs.h: _dl_dst_substitute takes additiional
+	string start argument.
+	* elf/dl-deps.c (expand_dst): Adjust call to _dl_dst_substitute.
+	* elf/dl-dst.h: Fix comment.
+
 2018-06-05  Joseph Myers  <joseph@codesourcery.com>
 
 	* sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h (HWCAP_DIT): New
diff --git a/NEWS b/NEWS
index e2a6f45121..0d0bc9ad4c 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,16 @@  Major new features:
   NI_IDN_ALLOW_UNASSIGNED, NI_IDN_USE_STD3_ASCII_RULES) have been
   deprecated.  They no longer have any effect.
 
+* Parsing of dynamic string tokens in DT_RPATH, DT_RUNPATH, and DT_NEEDED
+  has been expanded to support the full range of ELF gABI expressions
+  including such constructs as '$ORIGIN$ORIGIN' (if valid).  For SUID/GUID
+  applications the rules have been further restricted, and where in the
+  past a dynamic string token sequence may have been interpreted as a
+  literal string it will now cause a load failure.  These load failures
+  were always considered unspecified behaviour from the perspective of the
+  dynamic loader, and for safety are now load errors e.g. /foo/${ORIGIN}.so
+  in DT_NEEDED results in a load failure now.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The nonstandard header files <libio.h> and <_G_config.h> are no longer
diff --git a/elf/dl-deps.c b/elf/dl-deps.c
index c975fcffd7..2ec434a1ff 100644
--- a/elf/dl-deps.c
+++ b/elf/dl-deps.c
@@ -114,7 +114,7 @@  DST not allowed in SUID/SGID programs"));				      \
 	__newp = (char *) alloca (DL_DST_REQUIRED (l, __str, strlen (__str),  \
 						   __dst_cnt));		      \
 									      \
-	__result = _dl_dst_substitute (l, __str, __newp);		      \
+	__result = _dl_dst_substitute (l, __str, __str, __newp);	      \
 									      \
 	if (*__result == '\0')						      \
 	  {								      \
diff --git a/elf/dl-dst.h b/elf/dl-dst.h
index 32de5d225a..ee7254f3c3 100644
--- a/elf/dl-dst.h
+++ b/elf/dl-dst.h
@@ -18,8 +18,7 @@ 
 
 #include "trusted-dirs.h"
 
-/* Determine the number of DST elements in the name.  Only if IS_PATH is
-   nonzero paths are recognized (i.e., multiple, ':' separated filenames).  */
+/* Determine the number of DST elements in the name.  */
 #define DL_DST_COUNT(name) \
   ({									      \
     size_t __cnt = 0;							      \
diff --git a/elf/dl-load.c b/elf/dl-load.c
index 431236920f..13263212d5 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -177,63 +177,89 @@  is_trusted_path_normalize (const char *path, size_t len)
   return false;
 }
 
+/* Given a substring starting at NAME, just after the DST '$' start
+   token, determine if NAME contains dynamic string token STR,
+   following the ELF gABI rules for dynamic string tokens:
 
+   * Longest possible sequence using the rules (greedy).
+
+   * Must start with a $ (enforced by caller).
+
+   * Must follow $ with one underscore or ASCII [A-Za-z] (enforced by
+     caller via STR comparison) or '{' (start curly quoted name).
+
+   * Must follow first two characters with zero or more [A-Za-z0-9_]
+     (enforced by caller) or '}' (end curly quoted name).
+
+   If the sequence is a dynamic string token matching STR then
+   the length of the DST is returned, otherwise 0.  */
 static size_t
-is_dst (const char *start, const char *name, const char *str, int secure)
+is_dst (const char *name, const char *str)
 {
-  size_t len;
+  size_t nlen, slen;
   bool is_curly = false;
 
+  /* Is a ${...} name sequence?  */
   if (name[0] == '{')
     {
       is_curly = true;
       ++name;
     }
 
-  len = 0;
-  while (name[len] == str[len] && name[len] != '\0')
-    ++len;
+  /* Find longest valid name sequence.  */
+  nlen = 0;
+  while ((name[nlen] >= 'A' && name[nlen] <= 'Z')
+	 || (name[nlen] >= 'a' && name[nlen] <= 'z')
+	 || (name[nlen] >= '0' && name[nlen] <= '9')
+	 || (name[nlen] == '_'))
+    ++nlen;
+
+  slen = strlen (str);
+
+  /* Can't be the DST we are looking for.  */
+  if (slen != nlen)
+    return 0;
 
   if (is_curly)
     {
-      if (name[len] != '}')
+      /* Invalid curly sequence!  */
+      if (name[nlen] != '}')
 	return 0;
 
       /* Point again at the beginning of the name.  */
       --name;
-      /* Skip over closing curly brace and adjust for the --name.  */
-      len += 2;
+      /* Count the two curly braces.  */
+      nlen += 2;
     }
-  else if (name[len] != '\0' && name[len] != '/')
-    return 0;
 
-  if (__glibc_unlikely (secure)
-      && ((name[len] != '\0' && name[len] != '/')
-	  || (name != start + 1)))
-    return 0;
-
-  return len;
+  return nlen;
 }
 
-
+/* Passed the start of a DST sequence at the first '$' occurrence.
+   See the DL_DST_COUNT macro which inlines the strchr to find the
+   first occurrence of '$' and optimizes that likely case that there
+   is no DST.  If there is a DST we call into _dl_dst_count to count
+   the number of DSTs.  We count all known DSTs regardless of
+   __libc_enable_secure; the caller is responsible for enforcing
+   the security of the substitution rules (usually
+   _dl_dst_substitute).  */
 size_t
 _dl_dst_count (const char *name)
 {
-  const char *const start = name;
   size_t cnt = 0;
 
   do
     {
       size_t len;
 
-      /* $ORIGIN is not expanded for SUID/GUID programs (except if it
-	 is $ORIGIN alone) and it must always appear first in path.  */
       ++name;
-      if ((len = is_dst (start, name, "ORIGIN", __libc_enable_secure)) != 0
-	  || (len = is_dst (start, name, "PLATFORM", 0)) != 0
-	  || (len = is_dst (start, name, "LIB", 0)) != 0)
+      /* All DSTs must follow ELF gABI rules, see is_dst ().  */
+      if ((len = is_dst (name, "ORIGIN")) != 0
+	  || (len = is_dst (name, "PLATFORM")) != 0
+	  || (len = is_dst (name, "LIB")) != 0)
 	++cnt;
 
+      /* There may be more than one DST in the sequence.  */
       name = strchr (name + len, '$');
     }
   while (name != NULL);
@@ -241,12 +267,14 @@  _dl_dst_count (const char *name)
   return cnt;
 }
 
-
+/* Process NAME for DSTs and store in RESULT using the information from
+   link map L to resolve the DSTs.  The value of START must equal the
+   start of the parent string if NAME is a substring sequence being
+   parsed with path separators e.g. $ORIGIN:$PLATFORM.  */
 char *
-_dl_dst_substitute (struct link_map *l, const char *name, char *result)
+_dl_dst_substitute (struct link_map *l, const char *start,
+		    const char *name, char *result)
 {
-  const char *const start = name;
-
   /* Now fill the result path.  While copying over the string we keep
      track of the start of the last path element.  When we come across
      a DST we copy over the value or (if the value is not available)
@@ -263,15 +291,36 @@  _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 	  size_t len;
 
 	  ++name;
-	  if ((len = is_dst (start, name, "ORIGIN", __libc_enable_secure)) != 0)
+	  if ((len = is_dst (name, "ORIGIN")) != 0)
 	    {
-	      repl = l->l_origin;
+	      /* For SUID/GUID programs we normally ignore the path with
+		 a DST in DT_RUNPATH, or DT_RPATH.  However, there is one
+		 exception to this rule, and it is:
+
+		   * $ORIGIN appears first in the path element.
+		   * The path element is rooted in a trusted directory.
+
+		 This exception allows such programs to reference
+		 shared libraries in subdirectories of trusted
+		 directories.  The use case is one of general
+		 organization and deployment flexibility.
+		 Trusted directories are usually such paths as "/lib64"
+		 or "/lib".  */
+	      if (__glibc_unlikely (__libc_enable_secure)
+		  && ((name[len] != '\0' && name[len] != '/'
+		       && name[len] != ':')
+		      || (name != start + 1
+			  || (name > start + 2 && name[-2] != ':'))))
+		repl = (const char *) -1;
+	      else
+	        repl = l->l_origin;
+
 	      check_for_trusted = (__libc_enable_secure
 				   && l->l_type == lt_executable);
 	    }
-	  else if ((len = is_dst (start, name, "PLATFORM", 0)) != 0)
+	  else if ((len = is_dst (name, "PLATFORM")) != 0)
 	    repl = GLRO(dl_platform);
-	  else if ((len = is_dst (start, name, "LIB", 0)) != 0)
+	  else if ((len = is_dst (name, "LIB")) != 0)
 	    repl = DL_DST_LIB;
 
 	  if (repl != NULL && repl != (const char *) -1)
@@ -283,6 +332,7 @@  _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 	    {
 	      /* We cannot use this path element, the value of the
 		 replacement is unknown.  */
+	      check_for_trusted = false;
 	      wp = last_elem;
 	      break;
 	    }
@@ -309,13 +359,17 @@  _dl_dst_substitute (struct link_map *l, const char *name, char *result)
 }
 
 
-/* Return copy of argument with all recognized dynamic string tokens
-   ($ORIGIN and $PLATFORM for now) replaced.  On some platforms it
-   might not be possible to determine the path from which the object
-   belonging to the map is loaded.  In this case the path element
-   containing $ORIGIN is left out.  */
+/* Return a malloc allocated copy of NAME with all recognized DSTs
+   ($ORIGIN and $PLATFORM for now) replaced.  The value of START must
+   equal the start of the parent string if NAME is a substring
+   sequence being parsed with path separators e.g. $ORIGIN:$PLATFORM.
+   On some platforms it might not be possible to determine the path
+   from which the object belonging to the map is loaded.  In this case
+   the path element containing $ORIGIN is left out.  On error NULL is
+   returned.  */
 static char *
-expand_dynamic_string_token (struct link_map *l, const char *s)
+expand_dynamic_string_token (struct link_map *l, const char *start,
+			     const char *name)
 {
   /* We make two runs over the string.  First we determine how large the
      resulting string is and then we copy it over.  Since this is no
@@ -326,21 +380,21 @@  expand_dynamic_string_token (struct link_map *l, const char *s)
   char *result;
 
   /* Determine the number of DST elements.  */
-  cnt = DL_DST_COUNT (s);
+  cnt = DL_DST_COUNT (name);
 
   /* If we do not have to replace anything simply copy the string.  */
   if (__glibc_likely (cnt == 0))
-    return __strdup (s);
+    return __strdup (name);
 
   /* Determine the length of the substituted string.  */
-  total = DL_DST_REQUIRED (l, s, strlen (s), cnt);
+  total = DL_DST_REQUIRED (l, name, strlen (name), cnt);
 
   /* Allocate the necessary memory.  */
   result = (char *) malloc (total + 1);
   if (result == NULL)
     return NULL;
 
-  return _dl_dst_substitute (l, s, result);
+  return _dl_dst_substitute (l, start, name, result);
 }
 
 
@@ -387,6 +441,7 @@  fillin_rpath (char *rpath, struct r_search_path_elem **result, const char *sep,
 	      const char *what, const char *where, struct link_map *l)
 {
   char *cp;
+  char *start = rpath;
   size_t nelems = 0;
 
   while ((cp = __strsep (&rpath, sep)) != NULL)
@@ -398,7 +453,7 @@  fillin_rpath (char *rpath, struct r_search_path_elem **result, const char *sep,
       /* `strsep' can pass an empty string.  */
       if (*cp != '\0')
 	{
-	  to_free = cp = expand_dynamic_string_token (l, cp);
+	  to_free = cp = expand_dynamic_string_token (l, start, cp);
 
 	  /* expand_dynamic_string_token can return NULL in case of empty
 	     path or memory allocation failure.  */
@@ -2091,7 +2146,7 @@  _dl_map_object (struct link_map *loader, const char *name,
     {
       /* The path may contain dynamic string tokens.  */
       realname = (loader
-		  ? expand_dynamic_string_token (loader, name)
+		  ? expand_dynamic_string_token (loader, name, name)
 		  : __strdup (name));
       if (realname == NULL)
 	fd = -1;
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 95dc87519b..688ff60785 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -1108,8 +1108,9 @@  extern const char *_dl_get_origin (void) attribute_hidden;
 extern size_t _dl_dst_count (const char *name) attribute_hidden;
 
 /* Substitute DST values.  */
-extern char *_dl_dst_substitute (struct link_map *l, const char *name,
-				 char *result) attribute_hidden;
+extern char *_dl_dst_substitute (struct link_map *l, const char *start,
+				 const char *name, char *result)
+     attribute_hidden;
 
 /* Open the shared object NAME, relocate it, and run its initializer if it
    hasn't already been run.  MODE is as for `dlopen' (see <dlfcn.h>).  If