diff mbox

[v2] net: handle error more gracefully in socketpair()

Message ID 1385979146-13825-1-git-send-email-ydroneaud@opteya.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Yann Droneaud Dec. 2, 2013, 10:12 a.m. UTC
socketpair() error paths can be simplified to not call
heavy-weight sys_close().

This patch makes socketpair() use of error paths which do not
rely on heavy-weight call to sys_lose(): it's better to try
to push the file descriptor to userspace before installing
the socket file to the file descriptor, so that errors are
catched earlier and being easier to handle.

Three distinct error paths are needed since calling fput()
on file structure returned by sock_alloc_file() will
implicitly call sock_release() on the associated socket
structure.

Cc: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=1385977019-12282-1-git-send-email-ydroneaud@opteya.com
---

Hi,

Please discard my previous patch "[PATCH] net: handle error
more gracefully in socketpair()" [1] as I haven't double checked
the underlying reason for the not-so-complicated error paths.

This one is perhaps less appealing. But I think it's still
a good thing to write the file descriptor to userspace before
calling fd_install(): it's making error recovery easier.

Changes from v1 [1]:

- use three different error path to handle file allocated through
  sock_alloc_file().

Regards.

[1] http://mid.gmane.org/1385977019-12282-1-git-send-email-ydroneaud@opteya.com

 net/socket.c | 49 +++++++++++++++++++++++++++++++------------------
 1 file changed, 31 insertions(+), 18 deletions(-)

Comments

David Miller Dec. 5, 2013, 9:23 p.m. UTC | #1
From: Yann Droneaud <ydroneaud@opteya.com>
Date: Mon,  2 Dec 2013 11:12:26 +0100

> socketpair() error paths can be simplified to not call
> heavy-weight sys_close().
> 
> This patch makes socketpair() use of error paths which do not
> rely on heavy-weight call to sys_lose(): it's better to try
> to push the file descriptor to userspace before installing
> the socket file to the file descriptor, so that errors are
> catched earlier and being easier to handle.
> 
> Three distinct error paths are needed since calling fput()
> on file structure returned by sock_alloc_file() will
> implicitly call sock_release() on the associated socket
> structure.
> 
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
> Link: http://marc.info/?i=1385977019-12282-1-git-send-email-ydroneaud@opteya.com

I know that sys_close() is expensive, _but_ erroring out on
the put_user() is extremely rare and logically it makes
a ton more sense to fully install a file descriptor before
writing it's numeric value into userspace.

Sorry, I think the current code is fine and I'm not going
to apply this, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yann Droneaud Dec. 5, 2013, 11:15 p.m. UTC | #2
Hi,

Le jeudi 05 décembre 2013 à 16:23 -0500, David Miller a écrit :
> From: Yann Droneaud <ydroneaud@opteya.com>
> Date: Mon,  2 Dec 2013 11:12:26 +0100
> 
> > socketpair() error paths can be simplified to not call
> > heavy-weight sys_close().
> > 
> > This patch makes socketpair() use of error paths which do not
> > rely on heavy-weight call to sys_lose(): it's better to try
> > to push the file descriptor to userspace before installing
> > the socket file to the file descriptor, so that errors are
> > catched earlier and being easier to handle.
> > 
> > Three distinct error paths are needed since calling fput()
> > on file structure returned by sock_alloc_file() will
> > implicitly call sock_release() on the associated socket
> > structure.
> > 
> > Cc: David S. Miller <davem@davemloft.net>
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
> > Link: http://marc.info/?i=1385977019-12282-1-git-send-email-ydroneaud@opteya.com
> 
> I know that sys_close() is expensive, _but_ erroring out on
> the put_user() is extremely rare and logically it makes
> a ton more sense to fully install a file descriptor before
> writing it's numeric value into userspace.
> 
> Sorry, I think the current code is fine and I'm not going
> to apply this, thanks.

Thanks for the review.

AFAIK, using sys_close() seems to be the exception, and writing the file
descriptor before installing it is the more or less the norm.

I've made a review of each subsystem using get_unused_fd{,_flags} and
anon_inode_get{fd,file}: most of the time, error handling involve fput()
and put_unused_fd() and not sys_close().

Looking at the places where sys_close() is used make it pretty obvious:

- autofs_dev_ioctl_closemount(), dev-ioctl.c:298 [1]
  [just a "fancy" way of doing close() through an ioctl]
- load_misc_binary(),            binfmt_misc.c:208 [2]
  [could probably made use fput(),put_unused_fd()]
- change_floppy(),               do_mounts.c:490,501 [3]
- handle_initrd(),               do_mounts_initrd.c:113 [4]
- md_setup_drive(),              do_mounts_md.c:193,245,249 [5]
- autodetect_raid(),             do_mounts_md.c:299 [6]
- rd_load_image(),               do_mounts_rd.c:266,292,294 [7]
- do_copy(),                     initramfs.c:350 [8]
- clean_rootfs(),                initramfs.c:549,577 [9]
- populate_rootfs(),             initramfs.c:607 [10]
- socketpair(),                  socket.c:1486,1487 [11]
  [our target]

The majority of sys_close() users are in init code, which is code
behaving like userspace code.

This make socketpair() usage of sys_close() quite unusual.

It deserve to be replaced by the more common pattern of fput() /
put_unused_fd().

Regards.

Links:

[1]
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/autofs4/dev-ioctl.c?id=v3.13-rc2#n298

[2]
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_misc.c?id=v3.13-rc2#n208

[3]
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts.c?id=v3.13-rc2#n490

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts.c?id=v3.13-rc2#n501

[4]
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_initrd.c?id=v3.13-rc2#n113

[5]
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_md.c?id=v3.13-rc2#n193

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_md.c?id=v3.13-rc2#n245

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_md.c?id=v3.13-rc2#n249

[6]
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_md.c?id=v3.13-rc2#n299

[7]
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_rd.c?id=v3.13-rc2#n266

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_rd.c?id=v3.13-rc2#n292

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_rd.c?id=v3.13-rc2#n294

[8]
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/initramfs.c?id=v3.13-rc2#n350

[9]
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/initramfs.c?id=v3.13-rc2#n549

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/initramfs.c?id=v3.13-rc2#n577

[10]
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/initramfs.c?id=v3.13-rc2#n607

[11]
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/socket.c?id=v3.13-rc2#n1486

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/socket.c?id=v3.13-rc2#n1487


Aside: I will submit "soon" a patchset that add some sys_close() in
error paths of code relying on anon_inode_getfd() or using some layer 
which call get_unused_fd{,flags}()/fd_install() deep in the call chain,
mostly in kvm, drm, iio.

Regards.
David Miller Dec. 6, 2013, 12:43 a.m. UTC | #3
From: Yann Droneaud <ydroneaud@opteya.com>
Date: Fri, 06 Dec 2013 00:15:31 +0100

> AFAIK, using sys_close() seems to be the exception, and writing the file
> descriptor before installing it is the more or less the norm.

What other system call in the kernel writes a file descriptor's value
into the address space of a user process before the file descriptor
is actually usable?

That's really terrible semantically.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yann Droneaud Dec. 6, 2013, 10:10 a.m. UTC | #4
Hi, 

Le jeudi 05 décembre 2013 à 19:43 -0500, David Miller a écrit :
> From: Yann Droneaud <ydroneaud@opteya.com>
> Date: Fri, 06 Dec 2013 00:15:31 +0100
> 
> > AFAIK, using sys_close() seems to be the exception, and writing the file
> > descriptor before installing it is the more or less the norm.
> 
> What other system call in the kernel writes a file descriptor's value
> into the address space of a user process before the file descriptor
> is actually usable?
> 

Some carefully chosen examples:

- recvmsg() through
  - netlink_recvmsg() / unix_dgram_recvmsg() / unix_stream_recvmsg()
  - scm_recv()
  - scm_detach_fds(), scm.c:280
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/core/scm.c?id=v3.13-rc2#n280
  - scm_detach_fds_compat(), compat.c:296
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/compat.c?id=v3.13-rc2#n296

- pipe() / pipe2(), pipe.c:969, through
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/pipe.c?id=v3.13-rc2#n969
  - __do_pipe_flags, pipe.c:919
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/pipe.c?id=v3.13-rc2#n919


> That's really terrible semantically.

In fact, it's better:

1) get_unused_fd_flags() mark a file descriptor as 'reserved',
   so no other syscall would be able to allocate it.

   At this point, even if another userspace thread guess the correct
   file descriptor number, trying to use it will give it -EBADFD,
   since the file descriptor is not tied to a file (see __close_fd(), 
   open.c:589 for example)

2) file descriptor is wrote in userspace memory, using 
   put_user()/copy_to_user().

   At this point, the userspace (another thread) could potentially
   read the memory location and use the file descriptor, but here
   again, it will give -EBADFD for the very same reason.

3) fd_install() link the file descriptor to the file.

   Now, the userspace (another thread) could read the memory location
   and use the file descriptor even if the syscall which create it 
   haven't return to the userspace caller yet.

If we arrange to put the call to fd_install() in the very last step
of the syscall, the window where the file descriptor is usable by
userspace but not yet validly returned to userspace is very tiny.

In the other hand, when writing the file descriptor in the last step,
eg. doing 1), 3) and 2), it makes possible for userspace to guess and
manipulate the file descriptor while the syscall is not near completion,
eg. kernel could have some things to do on the file, device, socket, ...
before returning to userspace.
In such case, bad things might happen if another userspace thread is
trying to play a bad game with the file descriptor.
BTW I'm not aware of any security implication of this issue, but I think
it's better (safer) to have fd_install() being the very last step of a
syscall. It should be the default good habit when dealing with file
descriptor creation.

Regards.
Al Viro Dec. 6, 2013, 10:48 a.m. UTC | #5
On Thu, Dec 05, 2013 at 07:43:55PM -0500, David Miller wrote:
> From: Yann Droneaud <ydroneaud@opteya.com>
> Date: Fri, 06 Dec 2013 00:15:31 +0100
> 
> > AFAIK, using sys_close() seems to be the exception, and writing the file
> > descriptor before installing it is the more or less the norm.
> 
> What other system call in the kernel writes a file descriptor's value
> into the address space of a user process before the file descriptor
> is actually usable?
> 
> That's really terrible semantically.

What's the problem with that?  If nothing else, shared descriptor table is
a lot more visible to other threads than two-element array, most likely
in stack frame of whoever makes that syscall...

As for your question, how about pipe(2)?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 6, 2013, 5:14 p.m. UTC | #6
From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Fri, 6 Dec 2013 10:48:05 +0000

> On Thu, Dec 05, 2013 at 07:43:55PM -0500, David Miller wrote:
>> From: Yann Droneaud <ydroneaud@opteya.com>
>> Date: Fri, 06 Dec 2013 00:15:31 +0100
>> 
>> > AFAIK, using sys_close() seems to be the exception, and writing the file
>> > descriptor before installing it is the more or less the norm.
>> 
>> What other system call in the kernel writes a file descriptor's value
>> into the address space of a user process before the file descriptor
>> is actually usable?
>> 
>> That's really terrible semantically.
> 
> What's the problem with that?  If nothing else, shared descriptor table is
> a lot more visible to other threads than two-element array, most likely
> in stack frame of whoever makes that syscall...
> 
> As for your question, how about pipe(2)?

Fair enough, Yann please resubmit your patch.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/socket.c b/net/socket.c
index 0b18693f2be6..72bf3c41f4f0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1445,48 +1445,61 @@  SYSCALL_DEFINE4(socketpair, int, family, int, type, int, protocol,
 		err = fd1;
 		goto out_release_both;
 	}
+
 	fd2 = get_unused_fd_flags(flags);
 	if (unlikely(fd2 < 0)) {
 		err = fd2;
-		put_unused_fd(fd1);
-		goto out_release_both;
+		goto out_put_unused_1;
 	}
 
 	newfile1 = sock_alloc_file(sock1, flags, NULL);
 	if (unlikely(IS_ERR(newfile1))) {
 		err = PTR_ERR(newfile1);
-		put_unused_fd(fd1);
-		put_unused_fd(fd2);
-		goto out_release_both;
+		goto out_put_unused_both;
 	}
 
 	newfile2 = sock_alloc_file(sock2, flags, NULL);
 	if (IS_ERR(newfile2)) {
 		err = PTR_ERR(newfile2);
-		fput(newfile1);
-		put_unused_fd(fd1);
-		put_unused_fd(fd2);
-		sock_release(sock2);
-		goto out;
+		goto out_fput_1;
 	}
 
+	err = put_user(fd1, &usockvec[0]);
+	if (err)
+		goto out_fput_both;
+
+	err = put_user(fd2, &usockvec[1]);
+	if (err)
+		goto out_fput_both;
+
 	audit_fd_pair(fd1, fd2);
+
 	fd_install(fd1, newfile1);
 	fd_install(fd2, newfile2);
 	/* fd1 and fd2 may be already another descriptors.
 	 * Not kernel problem.
 	 */
 
-	err = put_user(fd1, &usockvec[0]);
-	if (!err)
-		err = put_user(fd2, &usockvec[1]);
-	if (!err)
-		return 0;
+	return 0;
 
-	sys_close(fd2);
-	sys_close(fd1);
-	return err;
+out_fput_both:
+	fput(newfile2);
+	fput(newfile1);
+	put_unused_fd(fd2);
+	put_unused_fd(fd1);
+	goto out;
+
+out_fput_1:
+	fput(newfile1);
+	put_unused_fd(fd2);
+	put_unused_fd(fd1);
+	sock_release(sock2);
+	goto out;
 
+out_put_unused_both:
+	put_unused_fd(fd2);
+out_put_unused_1:
+	put_unused_fd(fd1);
 out_release_both:
 	sock_release(sock2);
 out_release_1: