diff mbox

Fix handling of overlength pathname in AF_UNIX sun_path

Message ID 4F8D497F.8060601@gmail.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Michael Kerrisk \(man-pages\) April 17, 2012, 10:44 a.m. UTC
Tetsuo Handa sent me a patch to document the kernel's odd 
behavior when asked to create a UNIX domain socket address where the 
pathname completely fills the sun_path field without including a null 
terminator [1]. 

One of the consequences of the current kernel behavior is that 
when a socket address is returned to userspace (via getsockname(), 
getpeername(), accept(), recvfrom()), applications can't reliably 
do things such as:

printf("%s\n", addr.sun_path);

Instead one must either write:

printf("%.*s\n", sizeof(addr.sun_path), addr.sun_path);

or, when retrieving a socket address structure, employ a buffer whose
size is:

sizeof(struct sockaddr_un) + 1

(This ensures that there is enough space to hold the null terminator
for the case where a socket was bound to a sun_path with non-NUL characters
in all 108 bytes. But it entails some casting.)

Tetsuo initially considered there might be a kernel bug here, 
but an attempt to change the kernel behavior met resistance [2].

The patch at the end of this message is a slightly different 
fix for the same problem. There are a few reasons why I think 
it (or some variation) should be considered:

1. Changing the kernel behavior prevents userspace having 
   to go through the sort of contortions described above
   in order to handle the 108-non-NUL-bytes-in-sun_path case.
2. POSIX says that sun_path is a pathname. By (POSIX) definition, 
   a pathname is null terminated.
3. Considering these two sets:

   (a) [applications that rely on the assumption that there
        is a null terminator inside sizeof(sun_path) bytes]
   (b) [applications that would break if the kernel behavior changed]

   I suspect that set (a) is rather larger than set (b)--or, more 
   likely still, applications ensure they go for the lowest common
   denominator limit of 92 (HP-UX) or 104 (historical BSD limit)
   bytes, and so avoid this issue completely.

The accompanying patch changes unix_mkname() to ensure that a terminating 
null byte is always located within the first 108 bytes of sun_path. 
It does change the ABI for the former case where a pathname ran to 108
bytes without a null terminator: for that case, the call now fails with
the error -EINVAL. What are people's thoughts on applying this?

[1] http://thread.gmane.org/gmane.linux.network/174473/focus=1861
[2] http://thread.gmane.org/gmane.linux.kernel/291038

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>

---

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Willy Tarreau April 17, 2012, 10:51 a.m. UTC | #1
Hi Michael,

On Tue, Apr 17, 2012 at 10:44:15PM +1200, Michael Kerrisk wrote:
(...)
> The accompanying patch changes unix_mkname() to ensure that a terminating 
> null byte is always located within the first 108 bytes of sun_path. 
> It does change the ABI for the former case where a pathname ran to 108
> bytes without a null terminator: for that case, the call now fails with
> the error -EINVAL. What are people's thoughts on applying this?

My personal opinion is that (as you said), the risk of breaking existing
apps is already fairly low, but we must not deliberately break existing
apps. Eventhough there are currently a log, this is exactly what sysctls
are made for. I would personally like to have a default limit to 107 chars 
+ one zero, with a sysctl option to revert to current behaviour if ever it
broke an application. In my opinion it's exactly comparable to the risk of
breaking apps with mmap_min_addr : very low risk but must be covered by a
workaround (sysctl).

Just my 2 cents,
Willy

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Carlos O'Donell April 17, 2012, 3:43 p.m. UTC | #2
On Tue, Apr 17, 2012 at 6:51 AM, Willy Tarreau <w@1wt.eu> wrote:
> Hi Michael,
>
> On Tue, Apr 17, 2012 at 10:44:15PM +1200, Michael Kerrisk wrote:
> (...)
>> The accompanying patch changes unix_mkname() to ensure that a terminating
>> null byte is always located within the first 108 bytes of sun_path.
>> It does change the ABI for the former case where a pathname ran to 108
>> bytes without a null terminator: for that case, the call now fails with
>> the error -EINVAL. What are people's thoughts on applying this?
>
> My personal opinion is that (as you said), the risk of breaking existing
> apps is already fairly low, but we must not deliberately break existing
> apps. Eventhough there are currently a log, this is exactly what sysctls
> are made for. I would personally like to have a default limit to 107 chars
> + one zero, with a sysctl option to revert to current behaviour if ever it
> broke an application. In my opinion it's exactly comparable to the risk of
> breaking apps with mmap_min_addr : very low risk but must be covered by a
> workaround (sysctl).

To further the opinion that the risk is low:

The Open Solaris SUN_LEN macro uses strlen to compute the length.

The glibc SUN_LEN macro uses strlen to compute the length.

The Mac OS X libc SUN_LEN macro uses strlen to compute the length.

The linux man-pages project sets the user expectation that it is a
`null-terminated string' (circular argument).

I see every expectation from userspace/glibc that it should be 107 chars + '\0'.

Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 18, 2012, 2:36 a.m. UTC | #3
From: Michael Kerrisk <mtk.manpages@gmail.com>
Date: Tue, 17 Apr 2012 22:44:15 +1200

> 1. Changing the kernel behavior prevents userspace having 
>    to go through the sort of contortions described above
>    in order to handle the 108-non-NUL-bytes-in-sun_path case.

The problem with this logic is that it ignores every single Linux
system that exists right now.

You need to code this logic into your application unless you don't
want it to run properly on every Linux system that actually exists.

Sorry, we're not making this change.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox April 18, 2012, 7:44 p.m. UTC | #4
I think its pointless.

Yes there may have been a case years ago to nul terminate but thats now
how 4BSD defined the API so that's how the world looks.

It helps nobody on the app side because they'll be defensively coding for
the existing API for another ten years for enterprise distros anyway.

Alan
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- net/unix/af_unix.c.orig	2012-04-17 09:50:07.383459311 +1200
+++ net/unix/af_unix.c	2012-04-17 19:49:56.077852639 +1200
@@ -207,14 +207,16 @@  static int unix_mkname(struct sockaddr_u
 	if (!sunaddr || sunaddr->sun_family != AF_UNIX)
 		return -EINVAL;
 	if (sunaddr->sun_path[0]) {
-		/*
-		 * This may look like an off by one error but it is a bit more
-		 * subtle. 108 is the longest valid AF_UNIX path for a binding.
-		 * sun_path[108] doesn't as such exist.  However in kernel space
-		 * we are guaranteed that it is a valid memory location in our
-		 * kernel address buffer.
-		 */
-		((char *)sunaddr)[len] = 0;
+		if (len == sizeof(*sunaddr)) {
+			/*
+			 * If 'sun_path' is completely filled, the user
+			 * must include a terminator
+			 */
+			if (!memchr(sunaddr->sun_path, '\0',
+					sizeof(sunaddr->sun_path)))
+				return -EINVAL;
+		} else
+			((char *)sunaddr)[len] = 0;
 		len = strlen(sunaddr->sun_path)+1+sizeof(short);
 		return len;
 	}