diff mbox

[v6] sctp: be more restrictive in transport selection on bundled sacks

Message ID 1341061466-4186-1-git-send-email-nhorman@tuxdriver.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Neil Horman June 30, 2012, 1:04 p.m. UTC
It was noticed recently that when we send data on a transport, its possible that
we might bundle a sack that arrived on a different transport.  While this isn't
a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
2960:

 An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
   etc.) to the same destination transport address from which it
   received the DATA or control chunk to which it is replying.  This
   rule should also be followed if the endpoint is bundling DATA chunks
   together with the reply chunk.

This patch seeks to correct that.  It restricts the bundling of sack operations
to only those transports which have moved the ctsn of the association forward
since the last sack.  By doing this we guarantee that we only bundle outbound
saks on a transport that has received a chunk since the last sack.  This brings
us into stricter compliance with the RFC.

Vlad had initially suggested that we strictly allow only sack bundling on the
transport that last moved the ctsn forward.  While this makes sense, I was
concerned that doing so prevented us from bundling in the case where we had
received chunks that moved the ctsn on multiple transports.  In those cases, the
RFC allows us to select any of the transports having received chunks to bundle
the sack on.  so I've modified the approach to allow for that, by adding a state
variable to each transport that tracks weather it has moved the ctsn since the
last sack.  This I think keeps our behavior (and performance), close enough to
our current profile that I think we can do this without a sysctl knob to
enable/disable it.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yaseivch <vyasevich@gmail.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Reported-by: Michele Baldessari <michele@redhat.com>
Reported-by: sorin serban <sserban@redhat.com>

---
Change Notes:
V2)
	* Removed unused variable as per Dave M. Request
	* Delayed rwnd adjustment until we are sure we will sack (Vlad Y.)
V3)
	* Switched test to use pkt->transport rather than chunk->transport
	* Modified detection of sacka-able transport.  Instead of just setting
	  and clearning a flag, we now mark each transport and association with
	  a sack generation tag.  We increment the associations generation on
	  every sack, and assign that generation tag to every transport that
	  updates the ctsn.  This prevents us from having to iterate over a for
	  loop on every sack, which is much more scalable.
V4)
	* Fixed up wrapping comment and logic
V5)
	* Simplified wrap logic further per request from vlad
V6)
	* Changed some style point as per request from Dave M.
---
 include/net/sctp/structs.h |    4 ++++
 include/net/sctp/tsnmap.h  |    3 ++-
 net/sctp/associola.c       |    1 +
 net/sctp/output.c          |    5 +++++
 net/sctp/sm_make_chunk.c   |   16 ++++++++++++++++
 net/sctp/sm_sideeffect.c   |    2 +-
 net/sctp/transport.c       |    2 ++
 net/sctp/tsnmap.c          |    6 +++++-
 net/sctp/ulpevent.c        |    3 ++-
 net/sctp/ulpqueue.c        |    2 +-
 10 files changed, 39 insertions(+), 5 deletions(-)

Comments

David Miller July 1, 2012, 12:39 a.m. UTC | #1
From: Neil Horman <nhorman@tuxdriver.com>
Date: Sat, 30 Jun 2012 09:04:26 -0400

> It was noticed recently that when we send data on a transport, its possible that
> we might bundle a sack that arrived on a different transport.  While this isn't
> a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
> 2960:
> 
>  An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
>    etc.) to the same destination transport address from which it
>    received the DATA or control chunk to which it is replying.  This
>    rule should also be followed if the endpoint is bundling DATA chunks
>    together with the reply chunk.
> 
> This patch seeks to correct that.  It restricts the bundling of sack operations
> to only those transports which have moved the ctsn of the association forward
> since the last sack.  By doing this we guarantee that we only bundle outbound
> saks on a transport that has received a chunk since the last sack.  This brings
> us into stricter compliance with the RFC.
> 
> Vlad had initially suggested that we strictly allow only sack bundling on the
> transport that last moved the ctsn forward.  While this makes sense, I was
> concerned that doing so prevented us from bundling in the case where we had
> received chunks that moved the ctsn on multiple transports.  In those cases, the
> RFC allows us to select any of the transports having received chunks to bundle
> the sack on.  so I've modified the approach to allow for that, by adding a state
> variable to each transport that tracks weather it has moved the ctsn since the
> last sack.  This I think keeps our behavior (and performance), close enough to
> our current profile that I think we can do this without a sysctl knob to
> enable/disable it.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Vlad Yaseivch <vyasevich@gmail.com>
> CC: David S. Miller <davem@davemloft.net>
> CC: linux-sctp@vger.kernel.org
> Reported-by: Michele Baldessari <michele@redhat.com>
> Reported-by: sorin serban <sserban@redhat.com>

Once this has Vlad's ACK I'll apply it.

There has to be a better way to handle this situation, wherein the
responsible party has ACK'd the patch but I just ask for a few coding
style fixups and whatnot.  As it stands now I have to twiddle my
thumbs waiting for the new ACK.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vladislav Yasevich July 1, 2012, 3:17 a.m. UTC | #2
David Miller <davem@davemloft.net> wrote:

>From: Neil Horman <nhorman@tuxdriver.com>
>Date: Sat, 30 Jun 2012 09:04:26 -0400
>
>> It was noticed recently that when we send data on a transport, its
>possible that
>> we might bundle a sack that arrived on a different transport.  While
>this isn't
>> a major problem, it does go against the SHOULDAcm requirement in section
>6.4 of RFC
>> 2960:
>> 
>>  An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
>>    etc.) to the same destination transport address from which it
>>    received the DATA or control chunk to which it is replying.  This
>>    rule should also be followed if the endpoint is bundling DATA
>chunks
>>    together with the reply chunk.
>> 
>> This patch seeks to correct that.  It restricts the bundling of sack
>operations
>> to only those transports which have moved the ctsn of the association
>forward
>> since the last sack.  By doing this we guarantee that we only bundle
>outbound
>> saks on a transport that has received a chunk since the last sack. 
>This brings
>> us into stricter compliance with the RFC.
>> 
>> Vlad had initially suggested that we strictly allow only sack
>bundling on the
>> transport that last moved the ctsn forward.  While this makes sense,
>I was
>> concerned that doing so prevented us from bundling in the case where
>we had
>> received chunks that moved the ctsn on multiple transports.  In those
>cases, the
>> RFC allows us to select any of the transports having received chunks
>to bundle
>> the sack on.  so I've modified the approach to allow for that, by
>adding a state
>> variable to each transport that tracks weather it has moved the ctsn
>since the
>> last sack.  This I think keeps our behavior (and performance), close
>enough to
>> our current profile that I think we can do this without a sysctl knob
>to
>> enable/disable it.
>> 
>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>> CC: Vlad Yaseivch <vyasevich@gmail.com>
>> CC: David S. Miller <davem@davemloft.net>
>> CC: linux-sctp@vger.kernel.org
>> Reported-by: Michele Baldessari <michele@redhat.com>
>> Reported-by: sorin serban <sserban@redhat.com>
>
>Once this has Vlad's ACK I'll apply it.
>

Acked-by: Vlad Yasevich <vyasevich@gmail.com>

Sorry for the delay.

-vlad

>There has to be a better way to handle this situation, wherein the
>responsible party has ACK'd the patch but I just ask for a few coding
>style fixups and whatnot.  As it stands now I have to twiddle my
>thumbs waiting for the new ACK.
David Miller July 1, 2012, 5:44 a.m. UTC | #3
From: Vlad Yasevich <vyasevich@gmail.com>
Date: Sat, 30 Jun 2012 23:17:52 -0400

> David Miller <davem@davemloft.net> wrote:
> 
>>Once this has Vlad's ACK I'll apply it.
> 
> Acked-by: Vlad Yasevich <vyasevich@gmail.com>

Applied, thanks everyone.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman July 1, 2012, 12:47 p.m. UTC | #4
On Sat, Jun 30, 2012 at 05:39:45PM -0700, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Sat, 30 Jun 2012 09:04:26 -0400
> 
> > It was noticed recently that when we send data on a transport, its possible that
> > we might bundle a sack that arrived on a different transport.  While this isn't
> > a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
> > 2960:
> > 
> >  An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
> >    etc.) to the same destination transport address from which it
> >    received the DATA or control chunk to which it is replying.  This
> >    rule should also be followed if the endpoint is bundling DATA chunks
> >    together with the reply chunk.
> > 
> > This patch seeks to correct that.  It restricts the bundling of sack operations
> > to only those transports which have moved the ctsn of the association forward
> > since the last sack.  By doing this we guarantee that we only bundle outbound
> > saks on a transport that has received a chunk since the last sack.  This brings
> > us into stricter compliance with the RFC.
> > 
> > Vlad had initially suggested that we strictly allow only sack bundling on the
> > transport that last moved the ctsn forward.  While this makes sense, I was
> > concerned that doing so prevented us from bundling in the case where we had
> > received chunks that moved the ctsn on multiple transports.  In those cases, the
> > RFC allows us to select any of the transports having received chunks to bundle
> > the sack on.  so I've modified the approach to allow for that, by adding a state
> > variable to each transport that tracks weather it has moved the ctsn since the
> > last sack.  This I think keeps our behavior (and performance), close enough to
> > our current profile that I think we can do this without a sysctl knob to
> > enable/disable it.
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: Vlad Yaseivch <vyasevich@gmail.com>
> > CC: David S. Miller <davem@davemloft.net>
> > CC: linux-sctp@vger.kernel.org
> > Reported-by: Michele Baldessari <michele@redhat.com>
> > Reported-by: sorin serban <sserban@redhat.com>
> 
> Once this has Vlad's ACK I'll apply it.
> 
Thanks!
> There has to be a better way to handle this situation, wherein the
> responsible party has ACK'd the patch but I just ask for a few coding
> style fixups and whatnot.  As it stands now I have to twiddle my
> thumbs waiting for the new ACK.
> 
Perhaps we could modify the SubmittingPatches document to indicate that an
Acked-by from a subsystem maintainer implicitly confers authority on the
upstream receiver to request reasonable stylistic changes that don't affect the
functionality of the patch in the interests of maintaining coding conventions.

i.e. Since vlad applied an acked-by to v5 of this patch, that would give you the
right to carry that ack forward to v6 because you only asked for style changes.

Thoughts?
Neil

> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 1, 2012, 9:43 p.m. UTC | #5
From: Neil Horman <nhorman@tuxdriver.com>
Date: Sun, 1 Jul 2012 08:47:50 -0400

> Perhaps we could modify the SubmittingPatches document to indicate that an
> Acked-by from a subsystem maintainer implicitly confers authority on the
> upstream receiver to request reasonable stylistic changes that don't affect the
> functionality of the patch in the interests of maintaining coding conventions.

Yes, that would make sense.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman July 1, 2012, 11:44 p.m. UTC | #6
On Sun, Jul 01, 2012 at 02:43:19PM -0700, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Sun, 1 Jul 2012 08:47:50 -0400
> 
> > Perhaps we could modify the SubmittingPatches document to indicate that an
> > Acked-by from a subsystem maintainer implicitly confers authority on the
> > upstream receiver to request reasonable stylistic changes that don't affect the
> > functionality of the patch in the interests of maintaining coding conventions.
> 
> Yes, that would make sense.
> 


I'll propose it in a few days.
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index e4652fe..fecdf31 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -912,6 +912,9 @@  struct sctp_transport {
 		/* Is this structure kfree()able? */
 		malloced:1;
 
+	/* Has this transport moved the ctsn since we last sacked */
+	__u32 sack_generation;
+
 	struct flowi fl;
 
 	/* This is the peer's IP address and port. */
@@ -1584,6 +1587,7 @@  struct sctp_association {
 		 */
 		__u8    sack_needed;     /* Do we need to sack the peer? */
 		__u32	sack_cnt;
+		__u32	sack_generation;
 
 		/* These are capabilities which our peer advertised.  */
 		__u8	ecn_capable:1,	    /* Can peer do ECN? */
diff --git a/include/net/sctp/tsnmap.h b/include/net/sctp/tsnmap.h
index e7728bc..2c5d2b4 100644
--- a/include/net/sctp/tsnmap.h
+++ b/include/net/sctp/tsnmap.h
@@ -117,7 +117,8 @@  void sctp_tsnmap_free(struct sctp_tsnmap *map);
 int sctp_tsnmap_check(const struct sctp_tsnmap *, __u32 tsn);
 
 /* Mark this TSN as seen.  */
-int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn);
+int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn,
+		     struct sctp_transport *trans);
 
 /* Mark this TSN and all lower as seen. */
 void sctp_tsnmap_skip(struct sctp_tsnmap *map, __u32 tsn);
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 5bc9ab1..b16517e 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -271,6 +271,7 @@  static struct sctp_association *sctp_association_init(struct sctp_association *a
 	 */
 	asoc->peer.sack_needed = 1;
 	asoc->peer.sack_cnt = 0;
+	asoc->peer.sack_generation = 1;
 
 	/* Assume that the peer will tell us if he recognizes ASCONF
 	 * as part of INIT exchange.
diff --git a/net/sctp/output.c b/net/sctp/output.c
index f1b7d4b..6ae47ac 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -248,6 +248,11 @@  static sctp_xmit_t sctp_packet_bundle_sack(struct sctp_packet *pkt,
 		/* If the SACK timer is running, we have a pending SACK */
 		if (timer_pending(timer)) {
 			struct sctp_chunk *sack;
+
+			if (pkt->transport->sack_generation !=
+			    pkt->transport->asoc->peer.sack_generation)
+				return retval;
+
 			asoc->a_rwnd = asoc->rwnd;
 			sack = sctp_make_sack(asoc);
 			if (sack) {
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index a85eeeb..098cff5 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -734,8 +734,10 @@  struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
 	int len;
 	__u32 ctsn;
 	__u16 num_gabs, num_dup_tsns;
+	struct sctp_association *aptr = (struct sctp_association *)asoc;
 	struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
 	struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
+	struct sctp_transport *trans;
 
 	memset(gabs, 0, sizeof(gabs));
 	ctsn = sctp_tsnmap_get_ctsn(map);
@@ -805,6 +807,20 @@  struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
 		sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
 				 sctp_tsnmap_get_dups(map));
 
+	/* Once we have a sack generated, check to see what our sack
+	 * generation is, if its 0, reset the transports to 0, and reset
+	 * the association generation to 1
+	 *
+	 * The idea is that zero is never used as a valid generation for the
+	 * association so no transport will match after a wrap event like this,
+	 * Until the next sack
+	 */ 
+	if (++aptr->peer.sack_generation == 0) {
+		list_for_each_entry(trans, &asoc->peer.transport_addr_list,
+				    transports)
+			trans->sack_generation = 0;
+		aptr->peer.sack_generation = 1;
+	}
 nodata:
 	return retval;
 }
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index c96d1a8..8716da1 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1268,7 +1268,7 @@  static int sctp_cmd_interpreter(sctp_event_t event_type,
 		case SCTP_CMD_REPORT_TSN:
 			/* Record the arrival of a TSN.  */
 			error = sctp_tsnmap_mark(&asoc->peer.tsn_map,
-						 cmd->obj.u32);
+						 cmd->obj.u32, NULL);
 			break;
 
 		case SCTP_CMD_REPORT_FWDTSN:
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index b026ba0..1dcceb6 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -68,6 +68,8 @@  static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer,
 	peer->af_specific = sctp_get_af_specific(addr->sa.sa_family);
 	memset(&peer->saddr, 0, sizeof(union sctp_addr));
 
+	peer->sack_generation = 0;
+
 	/* From 6.3.1 RTO Calculation:
 	 *
 	 * C1) Until an RTT measurement has been made for a packet sent to the
diff --git a/net/sctp/tsnmap.c b/net/sctp/tsnmap.c
index f1e40ceb..b5fb7c4 100644
--- a/net/sctp/tsnmap.c
+++ b/net/sctp/tsnmap.c
@@ -114,7 +114,8 @@  int sctp_tsnmap_check(const struct sctp_tsnmap *map, __u32 tsn)
 
 
 /* Mark this TSN as seen.  */
-int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
+int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn,
+		     struct sctp_transport *trans)
 {
 	u16 gap;
 
@@ -133,6 +134,9 @@  int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
 		 */
 		map->max_tsn_seen++;
 		map->cumulative_tsn_ack_point++;
+		if (trans)
+			trans->sack_generation =
+				trans->asoc->peer.sack_generation;
 		map->base_tsn++;
 	} else {
 		/* Either we already have a gap, or about to record a gap, so
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8a84017..33d8947 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -715,7 +715,8 @@  struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
 	 * can mark it as received so the tsn_map is updated correctly.
 	 */
 	if (sctp_tsnmap_mark(&asoc->peer.tsn_map,
-			     ntohl(chunk->subh.data_hdr->tsn)))
+			     ntohl(chunk->subh.data_hdr->tsn),
+			     chunk->transport))
 		goto fail_mark;
 
 	/* First calculate the padding, so we don't inadvertently
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index f2d1de7..f5a6a4f 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -1051,7 +1051,7 @@  void sctp_ulpq_renege(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk,
 	if (chunk && (freed >= needed)) {
 		__u32 tsn;
 		tsn = ntohl(chunk->subh.data_hdr->tsn);
-		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
+		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn, chunk->transport);
 		sctp_ulpq_tail_data(ulpq, chunk, gfp);
 
 		sctp_ulpq_partial_delivery(ulpq, chunk, gfp);