net: implement IP_RECVHDRS option to get full headers through recvmsg cmsg.

Message ID 20180401054314.33578-1-zenczykowski@gmail.com
State Rejected, archived
Delegated to: David Miller

Commit Message

Maciej Żenczykowski April 1, 2018, 5:43 a.m. UTC
From: Luigi Rizzo <lrizzo@google.com>

We have all sorts of different ways to fetch pre-UDP payload metadata:
  IP_RECVTOS
  IP_RECVTTL
  IP_RECVOPTS
  IP_RETOPTS

But nothing generic which simply allows you to receive the entire packet header.

This is in a similar vein to TCP_SAVE_SYN but for UDP and other datagram sockets.

This is envisioned as a way to get GUE extension metadata for encapsulated
packets, but implemented in a way to be much more future proof.

(Implemented by Luigi, who asked me to send it upstream)

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 include/net/inet_sock.h |  1 +
 include/uapi/linux/in.h |  1 +
 net/ipv4/ip_sockglue.c  | 26 ++++++++++++++++++++++++++
 3 files changed, 28 insertions(+)
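
For readers who want to see how the proposed option would look from userspace, here is a minimal sketch. It assumes the option number 26 from this patch; the option was never merged, so on a real kernel that number may be unassigned or may belong to a different option. The names IP_RECVHDRS_SKETCH, enable_recvhdrs() and find_recvhdrs() are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Assumption: value proposed by this (rejected) patch. It is not a
 * mainline constant and may mean something else on a given kernel. */
#define IP_RECVHDRS_SKETCH 26

/* Ask the kernel to attach the packet headers to each received
 * datagram. Returns 0 on success, -1 with errno set otherwise. */
int enable_recvhdrs(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_IP, IP_RECVHDRS_SKETCH,
			  &one, sizeof(one));
}

/* Walk the ancillary data filled in by recvmsg() and return a pointer
 * to the header bytes, storing their length in *len. Returns NULL if
 * the message carries no such control message.
 * IPPROTO_IP has the same value as the SOL_IP level used by the patch. */
const unsigned char *find_recvhdrs(struct msghdr *msg, size_t *len)
{
	struct cmsghdr *cm;

	assert(msg != NULL && len != NULL);

	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == IPPROTO_IP &&
		    cm->cmsg_type == IP_RECVHDRS_SKETCH) {
			/* The payload is everything after the cmsg header. */
			*len = cm->cmsg_len - CMSG_LEN(0);
			return CMSG_DATA(cm);
		}
	}
	return NULL;
}
```

A caller would run enable_recvhdrs() once after creating the socket, pass a control buffer to recvmsg(), and then hand the filled-in msghdr to find_recvhdrs(). With the patch as posted, the returned bytes would start at the link-layer header rather than at the IPv4 header.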

Comments

Eric Dumazet April 1, 2018, 6:51 a.m. UTC | #1
On 03/31/2018 10:43 PM, Maciej Żenczykowski wrote:
> From: Luigi Rizzo <lrizzo@google.com>
> 
> We have all sorts of different ways to fetch pre-UDP payload metadata:
>   IP_RECVTOS
>   IP_RECVTTL
>   IP_RECVOPTS
>   IP_RETOPTS
> 
> But nothing generic which simply allows you to receive the entire packet header.
> 
> This is in a similar vein to TCP_SAVE_SYN but for UDP and other datagram sockets.
> 
> This is envisioned as a way to get GUE extension metadata for encapsulated
> packets, but implemented in a way to be much more future proof.
> 
> (Implemented by Luigi, who asked me to send it upstream)
> 

Hmm... what happened to IPv6? ;)

David Miller April 2, 2018, 2:42 p.m. UTC | #2
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Sat, 31 Mar 2018 22:43:14 -0700

> From: Luigi Rizzo <lrizzo@google.com>
> 
> We have all sorts of different ways to fetch pre-UDP payload metadata:
>   IP_RECVTOS
>   IP_RECVTTL
>   IP_RECVOPTS
>   IP_RETOPTS
> 
> But nothing generic which simply allows you to receive the entire packet header.
> 
> This is in a similar vein to TCP_SAVE_SYN but for UDP and other datagram sockets.
> 
> This is envisioned as a way to get GUE extension metadata for encapsulated
> packets, but implemented in a way to be much more future proof.
> 
> (Implemented by Luigi, who asked me to send it upstream)
> 
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Luigi Rizzo <lrizzo@google.com>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>

This is an ipv4 level socket option, so why are you copying in the MAC
header(s)?

That part I don't like at all.

First of all, you have no idea what the link level protocol is for that
MAC header, therefore how could you even begin to interpret its contents
correctly?

Second of all, MAC level details do not belong in AF_INET socket interfaces.

Thank you.
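
To make the objection concrete, the following is a small userspace model, not kernel code, of the length computation in ip_cmsg_recv_headers(). struct fake_skb and both helper functions are hypothetical. The model assumes that skb->data already points at the UDP payload when the cmsg is built. Starting at the network header is only one possible way to address the comment; the thread does not confirm that a respin would do this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff offsets that matter here,
 * expressed as byte offsets from the start of the packet buffer. */
struct fake_skb {
	size_t mac_off;   /* where the link-layer header starts */
	size_t net_off;   /* where the IPv4 header starts */
	size_t data_off;  /* where skb->data points (assumed: payload) */
};

/* What the patch computes: skb->data - skb_mac_header(skb).
 * The result includes the link-layer header, whose format the
 * receiver of an AF_INET socket cannot know. */
size_t hdrs_len_from_mac(const struct fake_skb *skb)
{
	assert(skb->data_off >= skb->mac_off);
	return skb->data_off - skb->mac_off;
}

/* Alternative that stays at the IPv4 level: count from the network
 * header, which would correspond to skb_network_header(skb). */
size_t hdrs_len_from_network(const struct fake_skb *skb)
{
	assert(skb->data_off >= skb->net_off);
	return skb->data_off - skb->net_off;
}
```

For an Ethernet frame carrying IPv4 without options and UDP (offsets 0, 14 and 42), the first helper yields 42 bytes and the second yields 28, so the 14 bytes of difference are exactly the MAC header the review objects to.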

Patch

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 0a671c32d6b9..4299750c3bea 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -237,6 +237,7 @@  struct inet_sock {
 #define IP_CMSG_ORIGDSTADDR	BIT(6)
 #define IP_CMSG_CHECKSUM	BIT(7)
 #define IP_CMSG_RECVFRAGSIZE	BIT(8)
+#define IP_CMSG_RECVHDRS	BIT(9)
 
 /**
  * sk_to_full_sk - Access to a full socket
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index 48e8a225b985..6dae3e1023cc 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -119,6 +119,7 @@  struct in_addr {
 #define IP_CHECKSUM	23
 #define IP_BIND_ADDRESS_NO_PORT	24
 #define IP_RECVFRAGSIZE	25
+#define IP_RECVHDRS	26
 
 /* IP_MTU_DISCOVER values */
 #define IP_PMTUDISC_DONT		0	/* Never send DF frames */
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 5ad2d8ed3a3f..35c5f70daea9 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -71,6 +71,14 @@  static void ip_cmsg_recv_tos(struct msghdr *msg, struct sk_buff *skb)
 	put_cmsg(msg, SOL_IP, IP_TOS, 1, &ip_hdr(skb)->tos);
 }
 
+/* Return all headers */
+static void ip_cmsg_recv_headers(struct msghdr *msg, struct sk_buff *skb)
+{
+	int len = skb->data - skb_mac_header(skb);
+
+	put_cmsg(msg, SOL_IP, IP_RECVHDRS, len, eth_hdr(skb));
+}
+
 static void ip_cmsg_recv_opts(struct msghdr *msg, struct sk_buff *skb)
 {
 	if (IPCB(skb)->opt.optlen == 0)
@@ -205,6 +213,14 @@  void ip_cmsg_recv_offset(struct msghdr *msg, struct sock *sk,
 			return;
 	}
 
+	if (flags & IP_CMSG_RECVHDRS) {
+		ip_cmsg_recv_headers(msg, skb);
+
+		flags &= ~IP_CMSG_RECVHDRS;
+		if (!flags)
+			return;
+	}
+
 	if (flags & IP_CMSG_RETOPTS) {
 		ip_cmsg_recv_retopts(sock_net(sk), msg, skb);
 
@@ -597,6 +613,7 @@  static int do_ip_setsockopt(struct sock *sk, int level,
 	case IP_PKTINFO:
 	case IP_RECVTTL:
 	case IP_RECVOPTS:
+	case IP_RECVHDRS:
 	case IP_RECVTOS:
 	case IP_RETOPTS:
 	case IP_TOS:
@@ -701,6 +718,12 @@  static int do_ip_setsockopt(struct sock *sk, int level,
 		else
 			inet->cmsg_flags &= ~IP_CMSG_RECVOPTS;
 		break;
+	case IP_RECVHDRS:
+		if (val)
+			inet->cmsg_flags |=  IP_CMSG_RECVHDRS;
+		else
+			inet->cmsg_flags &= ~IP_CMSG_RECVHDRS;
+		break;
 	case IP_RETOPTS:
 		if (val)
 			inet->cmsg_flags |= IP_CMSG_RETOPTS;
@@ -1362,6 +1385,9 @@  static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 	case IP_RECVOPTS:
 		val = (inet->cmsg_flags & IP_CMSG_RECVOPTS) != 0;
 		break;
+	case IP_RECVHDRS:
+		val = (inet->cmsg_flags & IP_CMSG_RECVHDRS) != 0;
+		break;
 	case IP_RETOPTS:
 		val = (inet->cmsg_flags & IP_CMSG_RETOPTS) != 0;
 		break;