Message ID | 6d52098964b54d848cbfd1957f093bd8@AcuMS.aculab.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net-next] net/ipv4/raw Optimise ipv4 raw sends when IP_HDRINCL set. | expand |
From: David Laight <David.Laight@ACULAB.COM> Date: Sun, 10 May 2020 16:00:41 +0000 > The final routing for ipv4 packets may be done with the IP address > from the message header not that from the address buffer. > If the addresses are different FLOWI_FLAG_KNOWN_NH must be set so > that a temporary 'struct rtable' entry is created to send the message. > However the allocate + free (under RCU) is relatively expensive > and can be avoided by a quick check shows the addresses match. > > Signed-off-by: David Laight <david.laight@aculab.com> The user can change the daddr field in userspace between when you do this test and when the iphdr is copied into the sk_buff. Also, you are obfuscating what you are doing in the way you have coded this check. You extract 4 bytes from a magic offset (16), which is hard to understand. Just explicitly code out the fact that you are accessing the daddr field of an ip header. But nonetheless you have to solve the "modified in userspace meanwhile" problem, as this is a bug we fix often in the kernel so we don't want to add new instances. Thanks.
From: David Miller > Sent: 11 May 2020 21:50 > To: David Laight <David.Laight@ACULAB.COM> > Cc: netdev@vger.kernel.org > Subject: Re: [PATCH net-next] net/ipv4/raw Optimise ipv4 raw sends when IP_HDRINCL set. > > From: David Laight <David.Laight@ACULAB.COM> > Date: Sun, 10 May 2020 16:00:41 +0000 > > > The final routing for ipv4 packets may be done with the IP address > > from the message header not that from the address buffer. > > If the addresses are different FLOWI_FLAG_KNOWN_NH must be set so > > that a temporary 'struct rtable' entry is created to send the message. > > However the allocate + free (under RCU) is relatively expensive > > and can be avoided by a quick check shows the addresses match. > > > > Signed-off-by: David Laight <david.laight@aculab.com> > > The user can change the daddr field in userspace between when you do > this test and when the iphdr is copied into the sk_buff. > > Also, you are obfuscating what you are doing in the way you have coded > this check. You extract 4 bytes from a magic offset (16), which is > hard to understand. Ok, that should at least be the structure offset. > Just explicitly code out the fact that you are accessing the daddr > field of an ip header. > > But nonetheless you have to solve the "modified in userspace > meanwhile" problem, as this is a bug we fix often in the kernel so we > don't want to add new instances. In this case the "modified in userspace meanwhile" just breaks the application - it isn't any kind of security issue. The problem is that you can't read the data into an skb until you have the offset - which is got by looking up the destination address. But you need the actual destination (from the packet data) to match the address buffer if you don't want to create a temporary rtable entry. I didn't find the commit that make rtable entries shared. I though the same table was used for routes and arps - but it looks like they got separated at some point. The code could put the address it read back into the skb, but that would look even worse. At the moment the performance is horrid when hundreds of the rtable entries get deleted under rcu all together. They are also added to a single locked linked list. I'm running 500 RTP streams which each send one UDP message every 20ms. It really needs to used the cached rtable entries. The only sane way to send the data is through a raw socket and to get the UDP checksum set you have to sort out the both IP addresses. (A UPD socket would be given rx data - which needs to go elsewhere.) David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
From: David Laight <David.Laight@ACULAB.COM> Date: Mon, 11 May 2020 21:28:18 +0000 > In this case the "modified in userspace meanwhile" just breaks the > application - it isn't any kind of security issue. The kernel must provide correct behavior based upon the stable IP header that it copies into userspace. I'm not moving on this requirement, sorry. I'm sure you have great reasons why you can't use normal UDP sockets for RTP traffic, but that's how you will get a cached route and avoid this exact problem.
From: David Miller <davem@davemloft.net> > Sent: 12 May 2020 00:10 > From: David Laight <David.Laight@ACULAB.COM> > Date: Mon, 11 May 2020 21:28:18 +0000 > > > In this case the "modified in userspace meanwhile" just breaks the > > application - it isn't any kind of security issue. > > The kernel must provide correct behavior based upon the stable IP > header that it copies into userspace. I'm not moving on this > requirement, sorry. > > I'm sure you have great reasons why you can't use normal UDP sockets > for RTP traffic, but that's how you will get a cached route and avoid > this exact problem. Not unless you can tell me how to create a UDP socket that doesn't receive data. Even if there is a corresponding RTP receive flow there is no reason why it should use the same port numbers and IP addresses. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 3183413..0a81376 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -495,6 +495,27 @@ static int raw_getfrag(void *from, char *to, int offset, int len, int odd, return ip_generic_getfrag(rfv->msg, to, offset, len, odd, skb); } +static bool raw_msg_addr_matches(struct msghdr *msg, __be32 daddr) +{ + const struct iovec *iov; + __be32 msg_daddr; + + /* Check common case of user buffer with header in the first fragment. + * If we return false the message is still sent. + */ + + if (!iter_is_iovec(&msg->msg_iter)) + return false; + iov = msg->msg_iter.iov; + if (!iov || iov->iov_len < 20) + return false; + + if (get_user(msg_daddr, (__be32 __user *)(iov->iov_base + 16))) + return false; + + return daddr == msg_daddr; +} + static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct inet_sock *inet = inet_sk(sk); @@ -626,9 +647,14 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) flowi4_init_output(&fl4, ipc.oif, ipc.sockc.mark, tos, RT_SCOPE_UNIVERSE, hdrincl ? IPPROTO_RAW : sk->sk_protocol, - inet_sk_flowi_flags(sk) | - (hdrincl ? FLOWI_FLAG_KNOWN_NH : 0), + inet_sk_flowi_flags(sk), daddr, saddr, 0, 0, sk->sk_uid); + /* The final message routing may be done with the destination address + * in the user-supplied ipv4 header. If this differs from 'daddr' then + * a temporary destination table entry has to be created. + */ + if (hdrincl && !raw_msg_addr_matches(msg, daddr)) + fl4.flowi4_flags |= FLOWI_FLAG_KNOWN_NH; if (!hdrincl) { rfv.msg = msg;
The final routing for ipv4 packets may be done with the IP address from the message header not that from the address buffer. If the addresses are different FLOWI_FLAG_KNOWN_NH must be set so that a temporary 'struct rtable' entry is created to send the message. However the allocate + free (under RCU) is relatively expensive and can be avoided by a quick check shows the addresses match. Signed-off-by: David Laight <david.laight@aculab.com> --- This makes a considerable difference when we are sending a lot of RTP streams from a raw socket. IP_HDRINCL has to be set so that the calculated UDP checksum is right. net/ipv4/raw.c | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-)