diff mbox

tcp: make urg+gso work for real this time

Message ID 20081217114138.GA20976@gondor.apana.org.au
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Herbert Xu Dec. 17, 2008, 11:41 a.m. UTC
On Wed, Dec 17, 2008 at 10:26:00PM +1100, Herbert Xu wrote:
>
> I just checked NetBSD and guess what, they do exactly what I've
> suggested:

Actually the code quoted is from OpenBSD but I just checked
again and NetBSD does the same thing, while FreeBSD doesn't
check for overflows at all (Oops!).

> 	if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
> 		u_int32_t urp = tp->snd_up - tp->snd_nxt;
> 		if (urp > IP_MAXPACKET)
> 			urp = IP_MAXPACKET;
> 		th->th_urp = htons((u_int16_t)urp);
> 		th->th_flags |= TH_URG;
> 	} else  
> 
> So if you think this is bad then it's been there for decades :)

In case you still think you can rely on the urgent behaviour
you should check out this document:

http://tools.ietf.org/html/draft-gont-tcpm-urgent-data-00

So I think given all of the above setting the urg_ptr to 0xffff
is a good compromise.  Yes it does have the downside that the
recipient may misbehave by taking things out-of-band but an app
that doesn't deal with this is broken anyway since that's exactly
what'll happen if you run it under BSD.  On the plus side, we'll
actually signal urgent mode as soon as possible which is about
the only sensible thing to do for urgent data.

In other words what we do currently potentially makes urgent
mode completely useless since it may delay the sending of the
urgent notification indefinitely.  While the BSD behaviour would
only break "broken" applications.

Anyway, here's the patch:

tcp: Set urgent pointer even if we're more than 64K from the end

Our TCP stack does not set the urgent flag if the urgent pointer
does not fit in 16 bits, i.e., if it is more than 64K from the
sequence number of a packet.

This behaviour is different from NetBSD and OpenBSD, and clearly
contradicts the purpose of urgent mode, which is to send the
notification (though not necessarily the associated data) as soon
as possible.  Our current behaviour may in fact delay the urgent
message indefinitely if the receiver window does not open up.

This patch modifies our behaviour to align with that of the BSDs.

The potential downside is that applications that does not cope
with out-of-band data correctly may be confused by originally
in-band stuff going out-of-band.  However, such applications
are seriously broken anyway because this can occur when it runs
on any BSD or if urgent notifications are merged.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

 
Cheers,

Comments

Alexey Kuznetsov Dec. 17, 2008, 12:55 p.m. UTC | #1
Hello!

> In other words what we do currently potentially makes urgent
> mode completely useless since it may delay the sending of the
> urgent notification indefinitely.  While the BSD behaviour would
> only break "broken" applications.

This is true.

It definitely will work with correct application (those which
use SO_OOBINLINE), and surely will kill those which treat
urg data in _default_ __BSD__("BSD behaviour", you say, huh:-))
way and withdraw octet at urgent pointer. 

Probably, netbsd/openbsd have some another quirk to work this around.
I do not see how, though.

I decided that this is a fatal conflict, which can be solved cleanly
only in a painful way like you suggested. Current approach delaying
URG is the way which does not spoil any data, breaking only in the
case of aborts.

If you are serious about this, also it would be a good idea
to make SO_OOBINLINE default option. Consider this. At least this
will allow to trap applications which will corrupt data stream
earlier.

Alexey
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Petr Tesarik Dec. 17, 2008, 1:31 p.m. UTC | #2
Alexey Kuznetsov píše v St 17. 12. 2008 v 15:55 +0300:
> Hello!
> 
> > In other words what we do currently potentially makes urgent
> > mode completely useless since it may delay the sending of the
> > urgent notification indefinitely.  While the BSD behaviour would
> > only break "broken" applications.
> 
> This is true.
> 
> It definitely will work with correct application (those which
> use SO_OOBINLINE), and surely will kill those which treat
> urg data in _default_ __BSD__("BSD behaviour", you say, huh:-))
> way and withdraw octet at urgent pointer. 
> 
> Probably, netbsd/openbsd have some another quirk to work this around.
> I do not see how, though.
> 
> I decided that this is a fatal conflict, which can be solved cleanly
> only in a painful way like you suggested. Current approach delaying
> URG is the way which does not spoil any data, breaking only in the
> case of aborts.
> 
> If you are serious about this, also it would be a good idea
> to make SO_OOBINLINE default option. Consider this. At least this
> will allow to trap applications which will corrupt data stream
> earlier.

This might turn out to be a dangerous decision. I am aware at least of
one big application which uses urgent data and doesn't use SO_OOBINLINE:
Oracle.

Yes, of course it's broken, but do we really want to join the much-hated
folks who are stuck with an only-we-do-it-correctly approach?

Just my 2 cents,
Petr Tesarik


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey Kuznetsov Dec. 17, 2008, 2:21 p.m. UTC | #3
Hello!

> This might turn out to be a dangerous decision. I am aware at least of
> one big application which uses urgent data and doesn't use SO_OOBINLINE:
> Oracle.

Yes, this is not some "rlogin"...


> Yes, of course it's broken, but do we really want to join the much-hated
> folks who are stuck with an only-we-do-it-correctly approach?

Well, we do not want to return to that company. :-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey Kuznetsov Dec. 17, 2008, 3:30 p.m. UTC | #4
Hello!

> Probably, netbsd/openbsd have some another quirk to work this around.
> I do not see how, though.

F.e. this could be totally safe if we do this only in the case
when octet seq+0xFFFF has not yet been sent.

When we send segment with 0xFFFF inside urg_ptr is advanced and so on,
so that receiver has no chances to corrupt stream.

I.e. we set URG and urg_ptr to fake scb->seq + 0xFFFF only when:

	after(scb->seq + 0xFFFF, tp->snd_nxt)

My brains are rusty, so take this critically. :-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Petr Tesarik Dec. 17, 2008, 4:04 p.m. UTC | #5
Alexey Kuznetsov píše v St 17. 12. 2008 v 18:30 +0300:
> Hello!
> 
> > Probably, netbsd/openbsd have some another quirk to work this around.
> > I do not see how, though.
> 
> F.e. this could be totally safe if we do this only in the case
> when octet seq+0xFFFF has not yet been sent.
> 
> When we send segment with 0xFFFF inside urg_ptr is advanced and so on,
> so that receiver has no chances to corrupt stream.
> 
> I.e. we set URG and urg_ptr to fake scb->seq + 0xFFFF only when:
> 
> 	after(scb->seq + 0xFFFF, tp->snd_nxt)
> 
> My brains are rusty, so take this critically. :-)

This is all nice, but it still does not solve those series of SIGURGs on
the receiving side. My suggestion is to not generate a new SIGURG until
the data for the latest one have arrived. There can be only one byte of
urgent data, so if somebody sends more than one, it cannot be handled by
the receiver, anyway, so that use case is broken and need not be taken
into account.

Oh, and since SIGURG is not a real-time signal, there is no requirement
to send as many of them as there were urgent data. Of course, this does
not help other OSes, but I'm not sure how they behave in these
situations.

Petr

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Dec. 17, 2008, 9:27 p.m. UTC | #6
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:
> 
> If you are serious about this, also it would be a good idea
> to make SO_OOBINLINE default option. Consider this. At least this
> will allow to trap applications which will corrupt data stream
> earlier.

Well enabling SO_OOBINLINE would contradict POSIX.  Much as I'd
like to have it on by default, I don't think it's really worth it.

In any case such applications will break right now because URG
notifications can be merged due to out-of-order delivery, which
would cause OOB data to go in-band.

Cheers,
Herbert Xu Dec. 17, 2008, 9:29 p.m. UTC | #7
On Wed, Dec 17, 2008 at 06:30:15PM +0300, Alexey Kuznetsov wrote:
> Hello!
> 
> > Probably, netbsd/openbsd have some another quirk to work this around.
> > I do not see how, though.
> 
> F.e. this could be totally safe if we do this only in the case
> when octet seq+0xFFFF has not yet been sent.
> 
> When we send segment with 0xFFFF inside urg_ptr is advanced and so on,
> so that receiver has no chances to corrupt stream.
> 
> I.e. we set URG and urg_ptr to fake scb->seq + 0xFFFF only when:
> 
> 	after(scb->seq + 0xFFFF, tp->snd_nxt)
> 
> My brains are rusty, so take this critically. :-)

Indeed that's a great idea!

Thanks,
Herbert Xu Dec. 17, 2008, 9:30 p.m. UTC | #8
On Wed, Dec 17, 2008 at 05:04:48PM +0100, Petr Tesarik wrote:
>
> This is all nice, but it still does not solve those series of SIGURGs on
> the receiving side. My suggestion is to not generate a new SIGURG until
> the data for the latest one have arrived. There can be only one byte of
> urgent data, so if somebody sends more than one, it cannot be handled by
> the receiver, anyway, so that use case is broken and need not be taken
> into account.

No we need to send a SIGURG as soon as we enter urgent mode.  The
whole point of urgent mode is to inform the receiver ASAP.  As the
urgent data itself cannot be expedited due to the in-order nature
of TCP, setting the flag and sending the signal is the best we can
do.

Cheers,
Herbert Xu Dec. 17, 2008, 10:16 p.m. UTC | #9
On Wed, Dec 17, 2008 at 03:55:05PM +0300, Alexey Kuznetsov wrote:
>
> Probably, netbsd/openbsd have some another quirk to work this around.
> I do not see how, though.

Indeed they do have more quirks.  I only saw it after reading it
again, but they always use snd_nxt instead of the current sequence
number:

	if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
		u_int32_t urp = tp->snd_up - tp->snd_nxt;
		if (urp > IP_MAXPACKET)
			urp = IP_MAXPACKET;
		th->th_urp = htons((u_int16_t)urp);
		th->th_flags |= TH_URG;

So on the one hand this means that for a fresh transmission we
only use 0xffff when the urgent data is not in the packet.

However, for retransmissions the urgent pointer is comletely
bogus!

Again this shows that any application that's relying on the urgent
data to be out-of-band is just broken.  The only sane way to use
urgent mode is as a notification.

Cheers,
Herbert Xu Dec. 17, 2008, 10:52 p.m. UTC | #10
On Thu, Dec 18, 2008 at 08:29:07AM +1100, Herbert Xu wrote:
>
> > I.e. we set URG and urg_ptr to fake scb->seq + 0xFFFF only when:
> > 
> > 	after(scb->seq + 0xFFFF, tp->snd_nxt)
> > 
> > My brains are rusty, so take this critically. :-)
> 
> Indeed that's a great idea!

OK, I think this is what we should do:

1) Fix fresh transmissions of urgent mode as Alexey suggested.

This is the safest way of ensuring that the urgent notification
is not delayed.  This is still not as "fast" as the BSD behaviour
but it is much safer with respect to broken^H^H^H^H^H^Hlegacy
applications.

2) Continue to use the current test for retransmissions.

3) Apply my original GSO patch and revert the existing URG fixes.

With Alexey's fix, the scenario that Ilpo identified would only
occur on retransmissions.  As we disable TSO on retransmissions,
it no longer affects my patch.

Thanks,
David Miller Dec. 17, 2008, 11:17 p.m. UTC | #11
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 18 Dec 2008 09:16:37 +1100

> However, for retransmissions the urgent pointer is comletely
> bogus!
> 
> Again this shows that any application that's relying on the urgent
> data to be out-of-band is just broken.  The only sane way to use
> urgent mode is as a notification.

I don't think this is a completely fair assesment, to be
honest.

Showing that the BSD guys have some half-working hacks in
this area is pretty immaterial, in the grand scheme of things.
The fact that retransmits will corrupt the URG pointer is
merely an indication of how few people use BSD and URG in
this combination of circumstances.

Nothing more.

And it also shows that our strange scheme, where we don't
signal URG until it's within the 16-bit sequence offset
limit, at least reports an accurate URG value.

Also, what if we're using IPV6 jumbo frames and we advertise this
special 0xffff URG offset turd?  That points to a real byte in the
packet, but it might not be the right one since we can only accurately
point up to 0xffff into there.

In TCP/IP Volume 2, the original BSD code is even more ignorant of
this situation, it simply sets the difference into the field as-is,
not being mindful at all of overflow:

	ti->ti_urp = htons((u_short) (tp->snd_up - tp->snd_nxt));

which probably prompted the workaround you see in the BSD code these
days.  Which, as we've seen, it buggy.

Given all of this I still think our current compromise is likely the
best one.  We never will advertise an URG pointer that is not pointing
to where the URG data will be in the sequence space.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Dec. 17, 2008, 11:25 p.m. UTC | #12
On Wed, Dec 17, 2008 at 03:17:13PM -0800, David Miller wrote:
> 
> Also, what if we're using IPV6 jumbo frames and we advertise this
> special 0xffff URG offset turd?  That points to a real byte in the
> packet, but it might not be the right one since we can only accurately
> point up to 0xffff into there.

1) We don't support that right now.

2) Even if we did this would be broken either way because just
where do you put the urgent pointer if your packet is > 64K and
the urgent pointer lies in the packet but beyond 64K to the start?

In any case we can easily fix this up with an additional check
so I don't think it's central to the issue overall.

> Given all of this I still think our current compromise is likely the
> best one.  We never will advertise an URG pointer that is not pointing
> to where the URG data will be in the sequence space.

I agree to the extent that urgent mode is hopelessly broken in
so many ways so one more isn't going to make that much of a
difference.

However, I still think that the approach suggested by Alexey is
better than the status quo:

1) It's unlikely to break any existing apps (all it does is send SIGURG
multiple times).

2) For those apps that use urgent mode in the first place, they're
likely to want to see the urgent notification ASAP, rather than having
it delayed due to the receiver being busy.  So this is an improvement
over what we have.

Cheers,
David Miller Dec. 17, 2008, 11:34 p.m. UTC | #13
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 18 Dec 2008 10:25:54 +1100

> 1) We don't support that right now.

I guess this is what the people writing RFC793 were thinking
as well :-)

> 2) Even if we did this would be broken either way because just
> where do you put the urgent pointer if your packet is > 64K and
> the urgent pointer lies in the packet but beyond 64K to the start?

You would need to chop up the packet to allow proper specification of
the URG pointer.

> > Given all of this I still think our current compromise is likely the
> > best one.  We never will advertise an URG pointer that is not pointing
> > to where the URG data will be in the sequence space.
> 
> I agree to the extent that urgent mode is hopelessly broken in
> so many ways so one more isn't going to make that much of a
> difference.
> 
> However, I still think that the approach suggested by Alexey is
> better than the status quo:
> 
> 1) It's unlikely to break any existing apps (all it does is send SIGURG
> multiple times).
> 
> 2) For those apps that use urgent mode in the first place, they're
> likely to want to see the urgent notification ASAP, rather than having
> it delayed due to the receiver being busy.  So this is an improvement
> over what we have.

I agree that prompt signalling of the URG condition is desirable.

None of the RFCs seem to give any real guidance in this area, and that
explains the plethora of ways we see various stacks handling this
situation.

I'll think about your patch some more, but no promises.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Dec. 17, 2008, 11:41 p.m. UTC | #14
On Wed, Dec 17, 2008 at 03:34:42PM -0800, David Miller wrote:
>
> > 2) Even if we did this would be broken either way because just
> > where do you put the urgent pointer if your packet is > 64K and
> > the urgent pointer lies in the packet but beyond 64K to the start?
> 
> You would need to chop up the packet to allow proper specification of
> the URG pointer.

Right, and it'd be just as easy to chop up the packet so that we
can set the urgent pointer to 0xffff :)

> I agree that prompt signalling of the URG condition is desirable.
> 
> None of the RFCs seem to give any real guidance in this area, and that
> explains the plethora of ways we see various stacks handling this
> situation.
> 
> I'll think about your patch some more, but no promises.

This is definitely something that we don't want to rush into.

Thanks,
Petr Tesarik Dec. 18, 2008, 8:58 a.m. UTC | #15
Herbert Xu píše v Čt 18. 12. 2008 v 08:30 +1100:
> On Wed, Dec 17, 2008 at 05:04:48PM +0100, Petr Tesarik wrote:
> >
> > This is all nice, but it still does not solve those series of SIGURGs on
> > the receiving side. My suggestion is to not generate a new SIGURG until
> > the data for the latest one have arrived. There can be only one byte of
> > urgent data, so if somebody sends more than one, it cannot be handled by
> > the receiver, anyway, so that use case is broken and need not be taken
> > into account.
> 
> No we need to send a SIGURG as soon as we enter urgent mode.  The
> whole point of urgent mode is to inform the receiver ASAP.  As the
> urgent data itself cannot be expedited due to the in-order nature
> of TCP, setting the flag and sending the signal is the best we can
> do.

That doesn't contradict my suggestion, which is: "Send the SIGURG as
soon as possible, but do not send _more_ SIGURGs until the urgent data
arrives."  I'm going to send a patch to make it clear.

Petr Tesarik


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Dec. 18, 2008, 9:06 a.m. UTC | #16
On Thu, Dec 18, 2008 at 09:58:12AM +0100, Petr Tesarik wrote:
>
> That doesn't contradict my suggestion, which is: "Send the SIGURG as
> soon as possible, but do not send _more_ SIGURGs until the urgent data
> arrives."  I'm going to send a patch to make it clear.

OK that does make sense.

Thanks,
Ilpo Järvinen Dec. 18, 2008, 7:45 p.m. UTC | #17
On Thu, 18 Dec 2008, Herbert Xu wrote:

> On Thu, Dec 18, 2008 at 08:29:07AM +1100, Herbert Xu wrote:
> >
> > > I.e. we set URG and urg_ptr to fake scb->seq + 0xFFFF only when:
> > > 
> > > 	after(scb->seq + 0xFFFF, tp->snd_nxt)
> > > 
> > > My brains are rusty, so take this critically. :-)
> > 
> > Indeed that's a great idea!
> 
> OK, I think this is what we should do:
> 
> 1) Fix fresh transmissions of urgent mode as Alexey suggested.
> 
> This is the safest way of ensuring that the urgent notification
> is not delayed.  This is still not as "fast" as the BSD behaviour
> but it is much safer with respect to broken^H^H^H^H^H^Hlegacy
> applications.
> 
> 2) Continue to use the current test for retransmissions.
> 
> 3) Apply my original GSO patch and revert the existing URG fixes.
> 
> With Alexey's fix, the scenario that Ilpo identified would only
> occur on retransmissions.  As we disable TSO on retransmissions,
> it no longer affects my patch.

No, would you use that 0xffff hack with gso you would create even worse 
problem in case of reordering / losses. And that's just because the very 
scenario still remains fully unsolved. I'm not sure if you understood my 
scenario at all (I was hopeful that you did :-))? It's basically that 
(with the terms of this hack) some segments in a super-skb would need to 
set 0xffff while some < 0xffff. ...But you just can't send too large urg 
ptr in any segment or you're asking for trouble.

Also, I think even without gso but with reordering / enough losses will 
make it possible that not-yet-sent condition of Alexey's proposal (it's 
techically not-yet-received) will no longer hold from _the receiver's_ 
point of view and havoc breaks loose.
Alexey Kuznetsov Dec. 19, 2008, 1:46 p.m. UTC | #18
Hello!

> Indeed they do have more quirks.  I only saw it after reading it
> again, but they always use snd_nxt instead of the current sequence
> number:

Their snd_nxt and our snd_nxt are different things.
Analog of our snd_nxt used to be called snd_max.

BSD snd_nxt points to current transmission head,
which is rewinded to snd_una when it starts to retransmit.


> 	if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
> 		u_int32_t urp = tp->snd_up - tp->snd_nxt;
> 		if (urp > IP_MAXPACKET)
> 			urp = IP_MAXPACKET;
> 		th->th_urp = htons((u_int16_t)urp);
> 		th->th_flags |= TH_URG;

Obviously, somewhere around this place they do th->th_seq = tp->snd_nxt.


BTW, that line of code which I sent was logically (not practically) wrong,
Dave spotted this instantly. :-) Logically correct way would be
to check against future snd_nxt which will be advanced
after the packet is submitted. Practically, it does not matter,
the difference is only for jumbo frames, which are inhibited
for tcp.

Alexey
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Dec. 19, 2008, 7:59 p.m. UTC | #19
On Fri, Dec 19, 2008 at 04:46:44PM +0300, Alexey Kuznetsov wrote:
>
> BSD snd_nxt points to current transmission head,
> which is rewinded to snd_una when it starts to retransmit.

Right, but it would seem that it's broken for sack retransmits
at least.  This is from NetBSD but the others are similar:

	/*
	 * If we are doing retransmissions, then snd_nxt will
	 * not reflect the first unsent octet.  For ACK only
	 * packets, we do not want the sequence number of the
	 * retransmitted packet, we want the sequence number
	 * of the next unsent octet.  So, if there is no data
	 * (and no SYN or FIN), use snd_max instead of snd_nxt
	 * when filling in ti_seq.  But if we are in persist
	 * state, snd_max might reflect one byte beyond the
	 * right edge of the window, so use snd_nxt in that
	 * case, since we know we aren't doing a retransmission.
	 * (retransmit and persist are mutually exclusive...)
	 */
	if (TCP_SACK_ENABLED(tp) && sack_rxmit) {
		th->th_seq = htonl(p->rxmit);
		p->rxmit += len;
	} else {
		if (len || (flags & (TH_SYN|TH_FIN)) ||
		    TCP_TIMER_ISARMED(tp, TCPT_PERSIST))
			th->th_seq = htonl(tp->snd_nxt);
		else
			th->th_seq = htonl(tp->snd_max);
	}
 
> BTW, that line of code which I sent was logically (not practically) wrong,
> Dave spotted this instantly. :-) Logically correct way would be
> to check against future snd_nxt which will be advanced
> after the packet is submitted. Practically, it does not matter,
> the difference is only for jumbo frames, which are inhibited
> for tcp.

Actually I think the unadvanced snd_nxt makes sense too.  What we're
asking there is has the urgent data been transmitted before, i.e.,
is the current skb a retransmit (since we've already established
that the urgent pointer is forward of the current packet).

Yes this would be broken if we ever do jumbo frames, but it'd
be broken for the same reason that the current test'd be brokens,
that is, we need to split the packet in urgent mode.

Thanks,
David Miller Dec. 26, 2008, 1:11 a.m. UTC | #20
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 18 Dec 2008 21:45:13 +0200 (EET)

> No, would you use that 0xffff hack with gso you would create even worse 
> problem in case of reordering / losses. And that's just because the very 
> scenario still remains fully unsolved. I'm not sure if you understood my 
> scenario at all (I was hopeful that you did :-))? It's basically that 
> (with the terms of this hack) some segments in a super-skb would need to 
> set 0xffff while some < 0xffff. ...But you just can't send too large urg 
> ptr in any segment or you're asking for trouble.

Hmmm, since URG is relative it means that GSO (or in-hardware TSO)
would need to adjust the URG offset for each segment.

But if we stick this 0xffff thing there, it _can't_ do that properly
because it cannot know where the correct URG pointer actually is.

In fact, the straightforward URG offset adjust one would expect a TSO
engine to make would corrupt the URG pointer when we put this
"infinity" value there.

The URG pointer might be >= 0xffff from the initial sequence number
of the starting TSO segment, but the URG offset might be in range
of one of the non-initial MSS segmented frames.

Anyways, we skip TSO/GSO for any time URG is indicated making all of
this particular concern moot. ;-)  But if at some point we want to
allow it, we'd need to consider this problem.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Dec. 26, 2008, 1:47 a.m. UTC | #21
On Thu, Dec 25, 2008 at 05:11:46PM -0800, David Miller wrote:
>
> Hmmm, since URG is relative it means that GSO (or in-hardware TSO)
> would need to adjust the URG offset for each segment.
> 
> But if we stick this 0xffff thing there, it _can't_ do that properly
> because it cannot know where the correct URG pointer actually is.
> 
> In fact, the straightforward URG offset adjust one would expect a TSO
> engine to make would corrupt the URG pointer when we put this
> "infinity" value there.

With Alexey's proposal 0xffff is not infinity, what it is is a
position that we have not yet transmited.  The point is that due
to the way urgent mode works, the remote side will not exit urgent
mode or elide the data until it sees the actual data pointed to
by the urgent pointer.

When we do finally transmit the data pointed to by 0xffff, it'll
be accompanied with a new urgent pointer either pointing to the real
urgent data or 0xffff (which is again untransmitted).

> The URG pointer might be >= 0xffff from the initial sequence number
> of the starting TSO segment, but the URG offset might be in range
> of one of the non-initial MSS segmented frames.

It doesn't really matter if it's in range or not, the point is that
whether we point to the 0xffff point or the real urgent data, the
data at that sequence number is yet to be transmitted.  So by the
time the remote peer actually gets to that point, it would have
received an updated urgent pointer.

If you're talking about the case where we didn't set the urgent
pointer on the TSO packet, but we should set it on the final
segment generated from it, then that can't happen because we
don't do TSO on retransmissions.

Here's an example, we have a TSO packet with 1440 * 45 = 64800 bytes
of data, the MSS is 1440, let the initial sequence number be zero
Let's assume that the urgent data is at 66240.

So the TCP stack will send the TSO packet with urgent pointer set
to 65535.  The first segmented TCP packet will also have 65535 as
the urgent pointer, the one after it will have 65535 - 1440 = 64095,
etc.  The final one will have 65535 - 1440 * 44 = 2175.

The TCP stack then transmits the next packet (let's assume it's 1440
bytes long for the sake of simplicity), which has the urgent pointer
set to 66240 - 64800 = 1440.

If all these packets are delivered in order, then what'll happen
is that RCV.UP starts out at 65535, and once the final segment is
delivered it'll be updated to 66240.

If out-of-order delivery occurs and the final segment arrives earlier,
then all that'll happen is that RCV.UP will hit 66240 earlier, which
is not an issue since all subsequent urgent pointers will be less and
hence ignored.

Cheers,
David Miller Dec. 26, 2008, 2 a.m. UTC | #22
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 26 Dec 2008 12:47:39 +1100

> If all these packets are delivered in order, then what'll happen
> is that RCV.UP starts out at 65535, and once the final segment is
> delivered it'll be updated to 66240.
> 
> If out-of-order delivery occurs and the final segment arrives earlier,
> then all that'll happen is that RCV.UP will hit 66240 earlier, which
> is not an issue since all subsequent urgent pointers will be less and
> hence ignored.

So it all hinges on how the receiver updates his URG pointer.  This is
what legitimizes a sender advertising a changing URG pointer.

So it works, but it isn't the nicest thing in the world :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Dec. 26, 2008, 2:38 a.m. UTC | #23
On Thu, Dec 25, 2008 at 06:00:51PM -0800, David Miller wrote:
> 
> So it all hinges on how the receiver updates his URG pointer.  This is
> what legitimizes a sender advertising a changing URG pointer.

Pretty much.

> So it works, but it isn't the nicest thing in the world :-)

It would've been nice and simple had the original implementors
strictly followed RFC793 and done the out-of-band notifications
instead of coming up with their fanciful notions of out-of-band
data :)

Cheers,
diff mbox

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fe3b4bd..2964c77 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -663,10 +663,12 @@  static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	th->urg_ptr		= 0;
 
 	/* The urg_mode check is necessary during a below snd_una win probe */
-	if (unlikely(tcp_urg_mode(tp) &&
-		     between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF))) {
-		th->urg_ptr		= htons(tp->snd_up - tcb->seq);
+	if (unlikely(tcp_urg_mode(tp))) {
 		th->urg			= 1;
+		if (between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF))
+			th->urg_ptr = htons(tp->snd_up - tcb->seq);
+		else
+			th->urg_ptr = 0xFFFF;
 	}
 
 	tcp_options_write((__be32 *)(th + 1), tp, &opts, &md5_hash_location);