diff mbox

[RFC,1/2] ipv6: add the IPV6_FL_F_REPLY flag to IPV6_FL_A_GET

Message ID 1386844697-31169-1-git-send-email-florent.fourcot@enst-bretagne.fr
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Florent Fourcot Dec. 12, 2013, 10:38 a.m. UTC
With this option, the socket will reply with the flow label value read
on received packets.

The goal is to have a connection with the same flow label in both
direction of the communication.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
---
 include/linux/ipv6.h     |  1 +
 include/uapi/linux/in6.h |  1 +
 net/ipv6/ip6_flowlabel.c | 16 ++++++++++++++++
 net/ipv6/tcp_ipv6.c      | 10 ++++++++++
 4 files changed, 28 insertions(+)

Comments

Florent Fourcot Dec. 12, 2013, 10:39 a.m. UTC | #1
There two patches extend the flow API. It adds three new actions with
IPV6_FLOWLABEL_MGR:

 * setsockopt with IPV6_FL_A_GET and IPV6_FL_F_REPLY set:
      enable the reply of flow label, packet are sent with the
      last low label read in packet sent by remote device
 * setsockopt with IPV6_FL_A_PUT and IPV6_FL_F_REPLY set:
      disable the flow label reply
 * getsockopt with IPV6_FL_A_GET and IPV6_FL_F_REPLY set:
      get the flow label send by the remote device


I'm not sure of the best way to do that. First, should we allocate a
real flow label in the global flow label? I don't think so, and it could
add hard issues with the system of permission/sharing.

Second, I was not sure of the best way for the coverage for all TCP states.

Regards,

Florent.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Dec. 14, 2013, 9:56 a.m. UTC | #2
On Thu, Dec 12, 2013 at 11:39:30AM +0100, Florent Fourcot wrote:
> There two patches extend the flow API. It adds three new actions with
> IPV6_FLOWLABEL_MGR:
> 
>  * setsockopt with IPV6_FL_A_GET and IPV6_FL_F_REPLY set:
>       enable the reply of flow label, packet are sent with the
>       last low label read in packet sent by remote device
>  * setsockopt with IPV6_FL_A_PUT and IPV6_FL_F_REPLY set:
>       disable the flow label reply
>  * getsockopt with IPV6_FL_A_GET and IPV6_FL_F_REPLY set:
>       get the flow label send by the remote device

Sorry, I am not up to date with flowlabel RFCs and applications. Couldn't
such a feature get in the way of RSVP flow control if a remote side
can dictate the label to use for the outgoing connection (I know, your
implementation is opt-in, but I don't want to give application developers
something at hand which could make their software difficult to operate
in future).

> I'm not sure of the best way to do that. First, should we allocate a
> real flow label in the global flow label? I don't think so, and it could
> add hard issues with the system of permission/sharing.

I wouldn't do that either. OTOH we are emitting frames with a flowlabel the
flowmgr does not know about. :/

> Second, I was not sure of the best way for the coverage for all TCP states.

And why would you only support TCP? Wouldn't this work for connected UDP
sockets, too?

Greetings,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florent Fourcot Dec. 16, 2013, 6:24 p.m. UTC | #3
Hello,

> Sorry, I am not up to date with flowlabel RFCs and applications. Couldn't
> such a feature get in the way of RSVP flow control if a remote side
> can dictate the label to use for the outgoing connection (I know, your
> implementation is opt-in, but I don't want to give application developers
> something at hand which could make their software difficult to operate
> in future).
> 

You are right, it could be a problem. But Flow Label are not only
specified for Quality of Service, and RFC's does not forbid this use case.
Since this option is opt-in, it should not be a problem. It is already
possible to set flow label with random values in an application. A
developper can do getsocketopt(FLOWINFO_SEND), and set the flow label to
the result (but it lost first packets of TCP handshakes).

>> I'm not sure of the best way to do that. First, should we allocate a
>> real flow label in the global flow label? I don't think so, and it could
>> add hard issues with the system of permission/sharing.
> 
> I wouldn't do that either. OTOH we are emitting frames with a flowlabel the
> flowmgr does not know about. :/
> 

Hum, what is the best way in your opinion? It is not hard to allocate
the flow label at the first received packet, but there are for me two
issues.
The most important problem is the "sharing" of flow label: what happens
if we receive a packet with a already reserved label? Should we return
an error to the user (and them, how to do that?), or can we ignore it.

The second issue is the renew/release of label, more particularly if the
label sent by the remote device change.


>> Second, I was not sure of the best way for the coverage for all TCP states.
> 
> And why would you only support TCP? Wouldn't this work for connected UDP
> sockets, too?
> 

Actually, I began with TCP, but of course others transport protocols
could support this feature. For the UDP case, it could be implemented in
user space with recvmg/sendmsg (because there are no extra SYN/ACK/etc
packets), but add this option in the kernel space is a good idea.

Thanks for you feedbacks,

Florent.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Dec. 17, 2013, 7:51 a.m. UTC | #4
On Mon, Dec 16, 2013 at 07:24:07PM +0100, Florent Fourcot wrote:
> Hello,
> 
> > Sorry, I am not up to date with flowlabel RFCs and applications. Couldn't
> > such a feature get in the way of RSVP flow control if a remote side
> > can dictate the label to use for the outgoing connection (I know, your
> > implementation is opt-in, but I don't want to give application developers
> > something at hand which could make their software difficult to operate
> > in future).
> > 
> 
> You are right, it could be a problem. But Flow Label are not only
> specified for Quality of Service, and RFC's does not forbid this use case.
> Since this option is opt-in, it should not be a problem. It is already
> possible to set flow label with random values in an application. A
> developper can do getsocketopt(FLOWINFO_SEND), and set the flow label to
> the result (but it lost first packets of TCP handshakes).

But the application would have to create the label prior to sending and
this could result in an error the application would get at label creation
time it has to deal with.

The flowinfo in the sockaddr_in6 structure will be checked against the flowmgr
cache at connect time. So this path is ok.

> >> I'm not sure of the best way to do that. First, should we allocate a
> >> real flow label in the global flow label? I don't think so, and it could
> >> add hard issues with the system of permission/sharing.
> > 
> > I wouldn't do that either. OTOH we are emitting frames with a flowlabel the
> > flowmgr does not know about. :/
> > 
> 
> Hum, what is the best way in your opinion? It is not hard to allocate
> the flow label at the first received packet, but there are for me two
> issues.

Actually it is hard IMHO. I don't think flowlabel cache has been in use
a lot and we would have to make sure it does have all the protections
current hash tables need (hash keying, eviction strategy hardening etc.).

It is not simple to insert items from the network into a globally in-kernel
data structure, especially if it was used locally only before. It may result
in DoS attacks or one could try to evict a label which is in need by another
application which maybe could bring another applications syscalls to fail.

So, I wouldn't do that until the flowlabel data structures have seen an
audit with that in mind.

I haven't looked at it in details but that are my concerns.

> The most important problem is the "sharing" of flow label: what happens
> if we receive a packet with a already reserved label? Should we return
> an error to the user (and them, how to do that?), or can we ignore it.
> 
> The second issue is the renew/release of label, more particularly if the
> label sent by the remote device change.

I am not a fan of lots of tunables, but maybe we can restrict this
with a sysctl. If you only care about the reflection feature, maybe
a setsockopt-IPPROTO_IPV6, IPV6_FLOWLABEL_REFLECT, on the listening
socket, which only succeeds if a sysctl has been switched from off to
on would do the trick. The documentation for the sysctl should state,
that this bypasses flowmgr and system-wide uniquness is not guaranteed
any more under any circumstance (EXCL flag does not hold any more). Am
I seeing this correctly?

> >> Second, I was not sure of the best way for the coverage for all TCP states.
> > 
> > And why would you only support TCP? Wouldn't this work for connected UDP
> > sockets, too?
> > 
> 
> Actually, I began with TCP, but of course others transport protocols
> could support this feature. For the UDP case, it could be implemented in
> user space with recvmg/sendmsg (because there are no extra SYN/ACK/etc
> packets), but add this option in the kernel space is a good idea.

It is ok to start with TCP only and add other protocols later.

Greetings,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 3fde066..f3dce48 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -191,6 +191,7 @@  struct ipv6_pinfo {
 	/* sockopt flags */
 	__u16			recverr:1,
 	                        sndflow:1,
+				repflow:1,
 				pmtudisc:2,
 				ipv6only:1,
 				srcprefs:3,	/* 001: prefer temporary address
diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
index 440d5c4..4686c1f 100644
--- a/include/uapi/linux/in6.h
+++ b/include/uapi/linux/in6.h
@@ -85,6 +85,7 @@  struct in6_flowlabel_req {
 
 #define IPV6_FL_F_CREATE	1
 #define IPV6_FL_F_EXCL		2
+#define IPV6_FL_F_REPLY 	4
 
 #define IPV6_FL_S_NONE		0
 #define IPV6_FL_S_EXCL		1
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index e7fb710..6bc17dc 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -527,6 +527,15 @@  int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
 
 	switch (freq.flr_action) {
 	case IPV6_FL_A_PUT:
+		if (freq.flr_flags & IPV6_FL_F_REPLY) {
+			if (sk->sk_protocol != IPPROTO_TCP)
+				return -ENOPROTOOPT;
+			if (!np->repflow)
+				return -ESRCH;
+			np->flow_label = 0;
+			np->repflow = 0;
+			return 0;
+		}
 		spin_lock_bh(&ip6_sk_fl_lock);
 		for (sflp = &np->ipv6_fl_list;
 		     (sfl = rcu_dereference(*sflp))!=NULL;
@@ -567,6 +576,13 @@  int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
 		return -ESRCH;
 
 	case IPV6_FL_A_GET:
+		if (freq.flr_flags & IPV6_FL_F_REPLY) {
+			if (sk->sk_protocol != IPPROTO_TCP)
+				return -ENOPROTOOPT;
+			np->repflow = 1;
+			return 0;
+		}
+
 		if (freq.flr_label & ~IPV6_FLOWLABEL_MASK)
 			return -EINVAL;
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index da046a5..798c4bc 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -480,6 +480,8 @@  static int tcp_v6_send_synack(struct sock *sk, struct dst_entry *dst,
 				    &ireq->ir_v6_rmt_addr);
 
 		fl6->daddr = ireq->ir_v6_rmt_addr;
+		if (np->repflow)
+			fl6->flowlabel = np->flow_label;
 		skb_set_queue_mapping(skb, queue_mapping);
 		err = ip6_xmit(sk, skb, fl6, np->opt, np->tclass);
 		err = net_xmit_eval(err);
@@ -997,6 +999,8 @@  static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	ireq = inet_rsk(req);
 	ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr;
 	ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr;
+	if (np->repflow)
+		np->flow_label = ip6_flowlabel(ipv6_hdr(skb));
 	if (!want_cookie || tmp_opt.tstamp_ok)
 		TCP_ECN_create_request(req, skb, sock_net(sk));
 
@@ -1135,6 +1139,8 @@  static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 		newnp->mcast_oif   = inet6_iif(skb);
 		newnp->mcast_hops  = ipv6_hdr(skb)->hop_limit;
 		newnp->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(skb));
+		if (np->repflow)
+			newnp->flow_label = ip6_flowlabel(ipv6_hdr(skb));
 
 		/*
 		 * No need to charge this sock to the relevant IPv6 refcnt debug socks count
@@ -1215,6 +1221,8 @@  static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 	newnp->mcast_oif  = inet6_iif(skb);
 	newnp->mcast_hops = ipv6_hdr(skb)->hop_limit;
 	newnp->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(skb));
+	if (np->repflow)
+		newnp->flow_label = ip6_flowlabel(ipv6_hdr(skb));
 
 	/* Clone native IPv6 options from listening socket (if any)
 
@@ -1426,6 +1434,8 @@  ipv6_pktoptions:
 			np->mcast_hops = ipv6_hdr(opt_skb)->hop_limit;
 		if (np->rxopt.bits.rxflow || np->rxopt.bits.rxtclass)
 			np->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(opt_skb));
+		if (np->repflow)
+			np->flow_label = ip6_flowlabel(ipv6_hdr(opt_skb));
 		if (ipv6_opt_accepted(sk, opt_skb)) {
 			skb_set_owner_r(opt_skb, sk);
 			opt_skb = xchg(&np->pktoptions, opt_skb);