Message ID | 1320711791-11005-1-git-send-email-zenczykowski@gmail.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
FYI, This obviously requires a follow up patch which will add TOS and TCLASS info to appropriate dual-stack sockets (for example listening tcp v6 non-ipv6only). On Mon, Nov 7, 2011 at 4:23 PM, Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > From: Maciej Żenczykowski <maze@google.com> > > commit 3ceca749668a52bd795585e0f71c6f0b04814f7b added a TOS attribute. > > Unfortunately TOS and TCLASS are both present in a dual-stack v6 socket, > furthermore they can have different values. As such one cannot in a > sane way expose both through a single attribute. > > Signed-off-by: Maciej Żenczyowski <maze@google.com> > CC: Murali Raja <muralira@google.com> > CC: Stephen Hemminger <shemminger@vyatta.com> > CC: Eric Dumazet <eric.dumazet@gmail.com> > CC: David S. Miller <davem@davemloft.net> > --- > include/linux/inet_diag.h | 3 ++- > net/ipv4/inet_diag.c | 4 ++-- > 2 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h > index 80b480c..abf5028 100644 > --- a/include/linux/inet_diag.h > +++ b/include/linux/inet_diag.h > @@ -98,9 +98,10 @@ enum { > INET_DIAG_VEGASINFO, > INET_DIAG_CONG, > INET_DIAG_TOS, > + INET_DIAG_TCLASS, > }; > > -#define INET_DIAG_MAX INET_DIAG_TOS > +#define INET_DIAG_MAX INET_DIAG_TCLASS > > > /* INET_DIAG_MEM */ > diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c > index f5e2bda..68e8ac5 100644 > --- a/net/ipv4/inet_diag.c > +++ b/net/ipv4/inet_diag.c > @@ -133,8 +133,8 @@ static int inet_csk_diag_fill(struct sock *sk, > &np->rcv_saddr); > ipv6_addr_copy((struct in6_addr *)r->id.idiag_dst, > &np->daddr); > - if (ext & (1 << (INET_DIAG_TOS - 1))) > - RTA_PUT_U8(skb, INET_DIAG_TOS, np->tclass); > + if (ext & (1 << (INET_DIAG_TCLASS - 1))) > + RTA_PUT_U8(skb, INET_DIAG_TCLASS, np->tclass); > } > #endif > > -- > 1.7.3.1 > >
On Mon, 7 Nov 2011 16:23:11 -0800 Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > commit 3ceca749668a52bd795585e0f71c6f0b04814f7b added a TOS attribute. > > Unfortunately TOS and TCLASS are both present in a dual-stack v6 socket, > furthermore they can have different values. As such one cannot in a > sane way expose both through a single attribute. Do we really want to continue to expose that as a supported API. I would argue it was a mistake in the original implementation. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Do we really want to continue to expose that as a supported > API. I would argue it was a mistake in the original implementation. Yes, that's an interesting question... However, can this be changed at this point in time? Theoretically network fabric hardware could interpret v4 and v6 dscp code points differently (don't think anyone sane would do really want to do this though...) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Maciej Żenczykowski <zenczykowski@gmail.com> Date: Mon, 7 Nov 2011 16:23:11 -0800 > From: Maciej Żenczykowski <maze@google.com> > > commit 3ceca749668a52bd795585e0f71c6f0b04814f7b added a TOS attribute. > > Unfortunately TOS and TCLASS are both present in a dual-stack v6 socket, > furthermore they can have different values. As such one cannot in a > sane way expose both through a single attribute. > > Signed-off-by: Maciej Żenczyowski <maze@google.com> I can't see how an ipv6 mapped socket can even set the inet->tos value. As far as I can see, only net/ipv4/ip_sockglue.c:ip_setsockopt() provides the interface to change inet->tos. And ipv6 sockets, of any type, are provided no such vector by which to get at those interfaces. So inet->tos is always left at it's default value for ipv6 mapped sockets, and therefore I see no reason to report TCLASS vs. TOS separately. In fact, what I would suggest is to do something about the lack of ability to set inet->tos, and the best way to do that seems to be to simply propagate the npinfo->tclass setting into inet->tos. Performaing any munging if necessary. I'm not applying this patch. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> I can't see how an ipv6 mapped socket can even set the inet->tos value. > > As far as I can see, only net/ipv4/ip_sockglue.c:ip_setsockopt() provides > the interface to change inet->tos. Yes, but an ipv6 socket is permitted to setsockopt SOL_IP in addition to SOL_IPV6. As such, one can simply setsockopt(ipv6_socket, SOL_IP, IP_TOS, value). This is just as well, because inet->tos is what determines the ipv4 tos value of any ipv4 packets generated by an ipv6 socket, ie. when connect(ed/ing) to an ipv4 mapped ipv6 address. > And ipv6 sockets, of any type, are provided no such vector by which to > get at those interfaces. > So inet->tos is always left at it's default value for ipv6 mapped sockets, > and therefore I see no reason to report TCLASS vs. TOS separately. > In fact, what I would suggest is to do something about the lack of > ability to set inet->tos, and the best way to do that seems to be to > simply propagate the npinfo->tclass setting into inet->tos. Performaing > any munging if necessary. > I'm not applying this patch. If people are in agreement that inet->tos vs ipv6->tclass being separately settable is not desirable, then I'm willing to go through and remove the tclass field (and send out a patch to do that). However, this is _QUITE DEFINITELY_ a user visible change in API and application visible behaviour. It is however an annoying corner case, so perhaps would be best fixed by getting rid of it? OTOH, I'm not sure if people aren't perhaps relying on the ability to separately set ipv4 dscp vs ipv6 tclass due to some internal network constraints... Personally, I'd prefer the simplification of not having to deal with IP_TOS vs IPV6_TCLASS discrepancies on ipv6 dual-stack sockets. - Maciej -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Maciej Żenczykowski <zenczykowski@gmail.com> Date: Wed, 9 Nov 2011 17:50:27 -0800 > Yes, but an ipv6 socket is permitted to setsockopt SOL_IP in addition > to SOL_IPV6. > As such, one can simply setsockopt(ipv6_socket, SOL_IP, IP_TOS, value). That's my whole point, this operation isn't currently possible, the socket ops for ipv4 setsockopt() aren't hooked up to ipv6 mapped sockets in the kernel right now. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2011/11/13 David Miller <davem@davemloft.net>: >>>> Yes, but an ipv6 socket is permitted to setsockopt SOL_IP in addition to SOL_IPV6. >>>> As such, one can simply setsockopt(ipv6_socket, SOL_IP, IP_TOS, value). >>> >>> That's my whole point, this operation isn't currently possible, the >>> socket ops for ipv4 setsockopt() aren't hooked up to ipv6 mapped >>> sockets in the kernel right now. >> >> http://lxr.linux.no/linux+v3.1/net/ipv6/ipv6_sockglue.c#L822 > > That definitely destroys the basis for all of my arguments :-) > > I'm going to add the TCLASS inet_diag patch, but can you find a cleaner > way to make sure that all types of sockets have the value initialized > to something sane? The one you posted wasn't... optimal :-) > > Thanks. I only realized just now that I had replied only to you and not to all. I hope you don't mind that I'm adding everyone back into the loop. (a) if we keep tos/tclass distinct. We should make sure to get this patch (and the as yet unwritten followup) into 3.2-final and not 3.3-rc1 (ie. net/master and not net-next/master). Since it does slightly change userspace visible API (thankfully, it was just added so it's not yet too late to change it). [FYI, I already see this patch in net/master - thanks!] As for the followup, I can see three ways to proceed. - The first involves figuring out what type of socket we're dealing with when we look at it (ie. something like the RFC patch I posted - I don't think it's going to be much prettier no matter what one does - there's simply a lot of complexity...) - The second involves keeping some sort of type-of-ip-socket field/bitmap (none, ipv4 only, ipv4/ipv6 dual stack, ipv6 only) directly in the socket. This is probably much harder to get right (changes all over the stack), but may have long term benefits??? - The third involves punting on the problem, we handle the simple cases in kernel and sometimes return excess information to userspace (ie. we take the RFC patch and strip out the address comparisons and error on the side of returning 'true') -> it's userspace's problem to sift through the excess garbage. (b) personally I would still prefer to just merge tos and tclass and get rid of the distinction... is this minor-but-userspace-visible-api change out of the question? - Maciej -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h index 80b480c..abf5028 100644 --- a/include/linux/inet_diag.h +++ b/include/linux/inet_diag.h @@ -98,9 +98,10 @@ enum { INET_DIAG_VEGASINFO, INET_DIAG_CONG, INET_DIAG_TOS, + INET_DIAG_TCLASS, }; -#define INET_DIAG_MAX INET_DIAG_TOS +#define INET_DIAG_MAX INET_DIAG_TCLASS /* INET_DIAG_MEM */ diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index f5e2bda..68e8ac5 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -133,8 +133,8 @@ static int inet_csk_diag_fill(struct sock *sk, &np->rcv_saddr); ipv6_addr_copy((struct in6_addr *)r->id.idiag_dst, &np->daddr); - if (ext & (1 << (INET_DIAG_TOS - 1))) - RTA_PUT_U8(skb, INET_DIAG_TOS, np->tclass); + if (ext & (1 << (INET_DIAG_TCLASS - 1))) + RTA_PUT_U8(skb, INET_DIAG_TCLASS, np->tclass); } #endif