Message ID | AANLkTi=V93A660+YS8C2TvC13kGUcJpFgjPHUvONd_WW@mail.gmail.com |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
From: Linus Torvalds <torvalds@linux-foundation.org> Date: Thu, 21 Oct 2010 17:31:12 -0700 > We already (long ago) decided that POSIX.1 compatibility be damned, > and that reading and writing more than 2GB in a single system call is > bogus: so normal write calls will actually limit size_t arguments to > MAX_INT, exactly so that various filesystems don't have to worry about > overflow and can keep length arguments in an "int". Maybe the filesystem paths are this way, but the bulk of the socket paths properly use size_t when touching anything even related to an I/O length. I know that TCP can do a >= 4GB write just fine right now. In fact if you look I recently removed the last obstacle to this based upon a bug report from a user trying to do a 4GB write (which ended up getting truncated to zero): commit 01db403cf99f739f86903314a489fb420e0e254f Author: David S. Miller <davem@davemloft.net> Date: Mon Sep 27 20:24:54 2010 -0700 tcp: Fix >4GB writes on 64-bit. Fixes kernel bugzilla #16603 tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write zero bytes, for example. There is also the problem higher up of how verify_iovec() works. It wants to prevent the total length from looking like an error return value. However it does this using 'int', but syscalls return 'long' (and thus signed 64-bit on 64-bit machines). So it could trigger false-positives on 64-bit as written. So fix it to use 'long'. Reported-by: Olaf Bonorden <bono@onlinehome.de> Reported-by: Daniel Büse <dbuese@gmx.de> Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Anyways, my point is that not only is the socket layer entirely ready for this, it is also the case that while 2GB may seem big today in most places, some day it might not be. :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Oct 24, 2010 at 7:14 PM, David Miller <davem@davemloft.net> wrote: > > Maybe the filesystem paths are this way, but the bulk of the socket > paths properly use size_t when touching anything even related > to an I/O length. Umm. "Bulk" is not "all". Which is the whole point. Most filesystems have no trouble either. But when a mistake is a security issue, that's not enough. > I know that TCP can do a >= 4GB write just fine right now. Again - totally irrelevant. Plus anybody who relies on doing 4GB writes in one go would be broken _anyway_. In other words, what you argue for has zero upsides, and it has downsides. As shown by the fact that TIPC was buggy. > In fact if you look I recently removed the last obstacle to this based > upon a bug report from a user trying to do a 4GB write (which ended up > getting truncated to zero): .. and if you looked at my suggested patch, you would have seen that it would have avoided that, and still worked fine (exactly because it doesn't truncate anything). David - the issue is _security_. The way to fix security problems is not to say "most things handle this correctly". The way to avoid them is to have several layers of handling things correctly, so that even when one turns out to be broken, the others still protect it. Linus -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Linus Torvalds <torvalds@linux-foundation.org> Date: Sun, 24 Oct 2010 20:42:39 -0700 > David - the issue is _security_. The way to fix security problems is > not to say "most things handle this correctly". The way to avoid them > is to have several layers of handling things correctly, so that even > when one turns out to be broken, the others still protect it. Fair enough. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Linus Torvalds <torvalds@linux-foundation.org> Date: Thu, 21 Oct 2010 17:31:12 -0700 > Something like the appended UNTESTED patch. NOTE! it actually makes > "verify_iovec()" *change* the iovec if it grows too big. Ok I thought a bit about this patch. How we can behave here depends upon the socket type. For example, for a stream socket such as TCP a partial write is OK and we could truncate the iovec like this. That's fine. But for a datagram socket, we have to have a one-to-one correspondance between write() calls and packets on the wire. So we'd either need to accept the entire write() length or fail it with an error. verify_iovec() (currently) doesn't have the socket type information available, so it's not able to key off of that right now. I agree that cutting off these cases at a high level would be the thing to do long term, but right now verify_iovec() isn't positioned such that we can do it just yet. For now I'm going to look into specifically fixing the TIPC case and also think longer term about another way to address this at a high level. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 27, 2010 at 10:29 AM, David Miller <davem@davemloft.net> wrote: > > But for a datagram socket, we have to have a one-to-one correspondance > between write() calls and packets on the wire. So we'd either need to > accept the entire write() length or fail it with an error. I disagree. We had that exact issue with regular file read/write: in theory, POSIX says that you should never do a partial write to a regular file. And the thing is, WE SIMPLY DON'T CARE. If somebody does a 2GB+ IO, they damn well need to accept that it's not going to be some atomic single event. It doesn't matter _how_ much actual real memory you have, it's just stupid to even care about that situation. It's not something any real app actually can reasonably ever expect to work, so rather than say "we have to do it right or error out", you should just see it as a "it's a stupid situation, we can do whatever the hell we want, because anybody who cares is a f*cking moron that we don't care about". If you _really_ care deeply, then some packet-oriented protocol can just have its own private packet size limit (which would be way less than 2GB), and then just look at the total size and say "oh, the total size is bigger than my limit, so I'll just error out". Then, the fact that verify_iovec() may have truncated the message to 2GB-1 doesn't matter at all. (Practically speaking, I bet all packet-oriented protocols already have a limit that is enforced by simply allocation patterns, so I don't think it's actually a problem even now) Linus -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
net/core/iovec.c | 17 ++++++++--------- 1 files changed, 8 insertions(+), 9 deletions(-) diff --git a/net/core/iovec.c b/net/core/iovec.c index e6b133b..b489fd6 100644 --- a/net/core/iovec.c +++ b/net/core/iovec.c @@ -38,7 +38,7 @@ long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode) { int size, ct; - long err; + size_t err; if (m->msg_namelen) { if (mode == VERIFY_READ) { @@ -60,14 +60,13 @@ long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, err = 0; for (ct = 0; ct < m->msg_iovlen; ct++) { - err += iov[ct].iov_len; - /* - * Goal is not to verify user data, but to prevent returning - * negative value, which is interpreted as errno. - * Overflow is still possible, but it is harmless. - */ - if (err < 0) - return -EMSGSIZE; + size_t len = iov[ct].iov_len; + + if (len > INT_MAX - err) { + len = INT_MAX - err; + iov[ct].iov_len = len; + } + err += len; } return err;