Message ID: 1396964037-13245-1-git-send-email-dborkman@redhat.com
State: Changes Requested, archived
Delegated to: David Miller
On 04/08/2014 09:33 AM, Daniel Borkmann wrote:
> SCTP charges chunks for wmem accounting via skb->truesize in
> sctp_set_owner_w(), and sctp_wfree() respectively as the
> reverse operation. If a sender runs out of wmem, it needs to
> wait via sctp_wait_for_sndbuf(), and gets woken up by a call
> to __sctp_write_space() mostly via sctp_wfree().
>
> __sctp_write_space() is being called per association. Although
> we assign sk->sk_write_space() to sctp_write_space(), which
> is then being done per socket, it is only used if send space
> is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
> is set and therefore not invoked in sock_wfree().
>
> Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf
> again when transmitting packet") fixed an issue where in case
> sctp_packet_transmit() manages to queue up more than sndbuf
> bytes, sctp_wait_for_sndbuf() will never be woken up again
> unless it is interrupted by a signal. However, a still
> remaining issue is that if net.sctp.sndbuf_policy=0, that is
> accounting per socket, and one-to-many sockets are in use,
> the reclaimed write space from sctp_wfree() is 'unfairly'
> handed back on the server to the association that is the lucky
> one to be woken up again via __sctp_write_space(), while
> the remaining associations are never be woken up again
> (unless by a signal).
>
> The effect disappears with net.sctp.sndbuf_policy=1, that
> is wmem accounting per association, as it guarantees a fair
> share of wmem among associations.
>
> Therefore, if we have reclaimed memory in case of per socket
> accouting, wake all related associations to a socket in a
> fair manner, that is, traverse the socket association list
> starting from the current neighbour of the association and
> issue a __sctp_write_space() to everyone until we end up
> waking ourselves. This guarantees that no association is
> preferred over another and even if more associations are
> taken into the one-to-many session, all receivers will get
> messages from the server and are not stalled forever on
> high load. This setting still leaves the advantage of per
> socket accounting in touch as an association can still use
> up global limits if unused by others.
>
> Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.")
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> ---
> [ When net-next opens up again, we need to think how
>   we can ideally make a new list interface and simplify
>   both open-coded list traversals. ]
>
>  net/sctp/socket.c | 31 ++++++++++++++++++++++++++++++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 981aaf8..a4c8c1f 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -6593,6 +6593,35 @@ static void __sctp_write_space(struct sctp_association *asoc)
>  	}
>  }
>
> +static void sctp_wake_up_waiters(struct sock *sk,
> +				 struct sctp_association *asoc)
> +{
> +	struct sctp_association *tmp = asoc;
> +
> +	/* We do accounting for the sndbuf space per association,
> +	 * so we only need to wake our own association.
> +	 */
> +	if (asoc->ep->sndbuf_policy)
> +		return __sctp_write_space(asoc);
> +
> +	/* Accounting for the sndbuf space is per socket, so we need
> +	 * to wake up others, try to be fair and in case of other
> +	 * associations, let them have a go first instead of just
> +	 * doing a sctp_write_space() call.
> +	 */

May be a note saying that we are here only when association frees
queued up chunks and thus we are under lock and the list is guaranteed
not to change.

> +	for (tmp = list_next_entry(tmp, asocs); 1;

Why not change the stop condition to tmp == asoc. It should work
since it will not be head pointer.

-vlad

> +	     tmp = list_next_entry(tmp, asocs)) {
> +		/* Manually skip the head element. */
> +		if (&tmp->asocs == &((sctp_sk(sk))->ep->asocs))
> +			continue;
> +		/* Wake up association. */
> +		__sctp_write_space(tmp);
> +		/* We've reached the end. */
> +		if (tmp == asoc)
> +			break;
> +	}
> +}
> +
>  /* Do accounting for the sndbuf space.
>   * Decrement the used sndbuf space of the corresponding association by the
>   * data size which was just transmitted(freed).
> @@ -6620,7 +6649,7 @@ static void sctp_wfree(struct sk_buff *skb)
>  	sk_mem_uncharge(sk, skb->truesize);
>
>  	sock_wfree(skb);
> -	__sctp_write_space(asoc);
> +	sctp_wake_up_waiters(sk, asoc);
>
>  	sctp_association_put(asoc);
>  }

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/08/2014 04:41 PM, Vlad Yasevich wrote:
> On 04/08/2014 09:33 AM, Daniel Borkmann wrote:
>> SCTP charges chunks for wmem accounting via skb->truesize in
>> sctp_set_owner_w(), and sctp_wfree() respectively as the
>> reverse operation. If a sender runs out of wmem, it needs to
>> wait via sctp_wait_for_sndbuf(), and gets woken up by a call
>> to __sctp_write_space() mostly via sctp_wfree().
>>
>> __sctp_write_space() is being called per association. Although
>> we assign sk->sk_write_space() to sctp_write_space(), which
>> is then being done per socket, it is only used if send space
>> is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
>> is set and therefore not invoked in sock_wfree().
>>
>> Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf
>> again when transmitting packet") fixed an issue where in case
>> sctp_packet_transmit() manages to queue up more than sndbuf
>> bytes, sctp_wait_for_sndbuf() will never be woken up again
>> unless it is interrupted by a signal. However, a still
>> remaining issue is that if net.sctp.sndbuf_policy=0, that is
>> accounting per socket, and one-to-many sockets are in use,
>> the reclaimed write space from sctp_wfree() is 'unfairly'
>> handed back on the server to the association that is the lucky
>> one to be woken up again via __sctp_write_space(), while
>> the remaining associations are never be woken up again
>> (unless by a signal).
>>
>> The effect disappears with net.sctp.sndbuf_policy=1, that
>> is wmem accounting per association, as it guarantees a fair
>> share of wmem among associations.
>>
>> Therefore, if we have reclaimed memory in case of per socket
>> accouting, wake all related associations to a socket in a
>> fair manner, that is, traverse the socket association list
>> starting from the current neighbour of the association and
>> issue a __sctp_write_space() to everyone until we end up
>> waking ourselves. This guarantees that no association is
>> preferred over another and even if more associations are
>> taken into the one-to-many session, all receivers will get
>> messages from the server and are not stalled forever on
>> high load. This setting still leaves the advantage of per
>> socket accounting in touch as an association can still use
>> up global limits if unused by others.
>>
>> Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.")
>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>> Cc: Thomas Graf <tgraf@suug.ch>
>> Cc: Neil Horman <nhorman@tuxdriver.com>
>> ---
>> [ When net-next opens up again, we need to think how
>>   we can ideally make a new list interface and simplify
>>   both open-coded list traversals. ]
>>
>>  net/sctp/socket.c | 31 ++++++++++++++++++++++++++++++-
>>  1 file changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index 981aaf8..a4c8c1f 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -6593,6 +6593,35 @@ static void __sctp_write_space(struct sctp_association *asoc)
>>  	}
>>  }
>>
>> +static void sctp_wake_up_waiters(struct sock *sk,
>> +				 struct sctp_association *asoc)
>> +{
>> +	struct sctp_association *tmp = asoc;
>> +
>> +	/* We do accounting for the sndbuf space per association,
>> +	 * so we only need to wake our own association.
>> +	 */
>> +	if (asoc->ep->sndbuf_policy)
>> +		return __sctp_write_space(asoc);
>> +
>> +	/* Accounting for the sndbuf space is per socket, so we need
>> +	 * to wake up others, try to be fair and in case of other
>> +	 * associations, let them have a go first instead of just
>> +	 * doing a sctp_write_space() call.
>> +	 */
>
> May be a note saying that we are here only when association frees
> queued up chunks and thus we are under lock and the list is guaranteed
> not to change.

Ok, will add that to the comment and respin, thanks Vlad.

>> +	for (tmp = list_next_entry(tmp, asocs); 1;
>
> Why not change the stop condition to tmp == asoc. It should work
> since it will not be head pointer.

If I see this correctly, wouldn't we then exclude to eventually
call __sctp_write_space(tmp) on ourselves as we also need to make
sure to wake us up?

> -vlad
>
>> +	     tmp = list_next_entry(tmp, asocs)) {
>> +		/* Manually skip the head element. */
>> +		if (&tmp->asocs == &((sctp_sk(sk))->ep->asocs))
>> +			continue;
>> +		/* Wake up association. */
>> +		__sctp_write_space(tmp);
>> +		/* We've reached the end. */
>> +		if (tmp == asoc)
>> +			break;
>> +	}
>> +}
>> +
>>  /* Do accounting for the sndbuf space.
>>   * Decrement the used sndbuf space of the corresponding association by the
>>   * data size which was just transmitted(freed).
>> @@ -6620,7 +6649,7 @@ static void sctp_wfree(struct sk_buff *skb)
>>  	sk_mem_uncharge(sk, skb->truesize);
>>
>>  	sock_wfree(skb);
>> -	__sctp_write_space(asoc);
>> +	sctp_wake_up_waiters(sk, asoc);
>>
>>  	sctp_association_put(asoc);
>>  }
On 04/08/2014 10:52 AM, Daniel Borkmann wrote:
> On 04/08/2014 04:41 PM, Vlad Yasevich wrote:
>> On 04/08/2014 09:33 AM, Daniel Borkmann wrote:
>>> SCTP charges chunks for wmem accounting via skb->truesize in
>>> sctp_set_owner_w(), and sctp_wfree() respectively as the
>>> reverse operation. If a sender runs out of wmem, it needs to
>>> wait via sctp_wait_for_sndbuf(), and gets woken up by a call
>>> to __sctp_write_space() mostly via sctp_wfree().
>>>
>>> __sctp_write_space() is being called per association. Although
>>> we assign sk->sk_write_space() to sctp_write_space(), which
>>> is then being done per socket, it is only used if send space
>>> is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
>>> is set and therefore not invoked in sock_wfree().
>>>
>>> Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf
>>> again when transmitting packet") fixed an issue where in case
>>> sctp_packet_transmit() manages to queue up more than sndbuf
>>> bytes, sctp_wait_for_sndbuf() will never be woken up again
>>> unless it is interrupted by a signal. However, a still
>>> remaining issue is that if net.sctp.sndbuf_policy=0, that is
>>> accounting per socket, and one-to-many sockets are in use,
>>> the reclaimed write space from sctp_wfree() is 'unfairly'
>>> handed back on the server to the association that is the lucky
>>> one to be woken up again via __sctp_write_space(), while
>>> the remaining associations are never be woken up again
>>> (unless by a signal).
>>>
>>> The effect disappears with net.sctp.sndbuf_policy=1, that
>>> is wmem accounting per association, as it guarantees a fair
>>> share of wmem among associations.
>>>
>>> Therefore, if we have reclaimed memory in case of per socket
>>> accouting, wake all related associations to a socket in a
>>> fair manner, that is, traverse the socket association list
>>> starting from the current neighbour of the association and
>>> issue a __sctp_write_space() to everyone until we end up
>>> waking ourselves. This guarantees that no association is
>>> preferred over another and even if more associations are
>>> taken into the one-to-many session, all receivers will get
>>> messages from the server and are not stalled forever on
>>> high load. This setting still leaves the advantage of per
>>> socket accounting in touch as an association can still use
>>> up global limits if unused by others.
>>>
>>> Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.")
>>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>>> Cc: Thomas Graf <tgraf@suug.ch>
>>> Cc: Neil Horman <nhorman@tuxdriver.com>
>>> ---
>>> [ When net-next opens up again, we need to think how
>>>   we can ideally make a new list interface and simplify
>>>   both open-coded list traversals. ]
>>>
>>>  net/sctp/socket.c | 31 ++++++++++++++++++++++++++++++-
>>>  1 file changed, 30 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>> index 981aaf8..a4c8c1f 100644
>>> --- a/net/sctp/socket.c
>>> +++ b/net/sctp/socket.c
>>> @@ -6593,6 +6593,35 @@ static void __sctp_write_space(struct sctp_association *asoc)
>>>  	}
>>>  }
>>>
>>> +static void sctp_wake_up_waiters(struct sock *sk,
>>> +				 struct sctp_association *asoc)
>>> +{
>>> +	struct sctp_association *tmp = asoc;
>>> +
>>> +	/* We do accounting for the sndbuf space per association,
>>> +	 * so we only need to wake our own association.
>>> +	 */
>>> +	if (asoc->ep->sndbuf_policy)
>>> +		return __sctp_write_space(asoc);
>>> +
>>> +	/* Accounting for the sndbuf space is per socket, so we need
>>> +	 * to wake up others, try to be fair and in case of other
>>> +	 * associations, let them have a go first instead of just
>>> +	 * doing a sctp_write_space() call.
>>> +	 */
>>
>> May be a note saying that we are here only when association frees
>> queued up chunks and thus we are under lock and the list is guaranteed
>> not to change.
>
> Ok, will add that to the comment and respin, thanks Vlad.
>
>>> +	for (tmp = list_next_entry(tmp, asocs); 1;
>>
>> Why not change the stop condition to tmp == asoc. It should work
>> since it will not be head pointer.
>
> If I see this correctly, wouldn't we then exclude to eventually
> call __sctp_write_space(tmp) on ourselves as we also need to make
> sure to wake us up?
>

Ahh, yes. You are right.

This is yet another list traversal with skip_head. I am going to
resurrect that code for net-next.

-vlad

>> -vlad
>>
>>> +	     tmp = list_next_entry(tmp, asocs)) {
>>> +		/* Manually skip the head element. */
>>> +		if (&tmp->asocs == &((sctp_sk(sk))->ep->asocs))
>>> +			continue;
>>> +		/* Wake up association. */
>>> +		__sctp_write_space(tmp);
>>> +		/* We've reached the end. */
>>> +		if (tmp == asoc)
>>> +			break;
>>> +	}
>>> +}
>>> +
>>>  /* Do accounting for the sndbuf space.
>>>   * Decrement the used sndbuf space of the corresponding association by the
>>>   * data size which was just transmitted(freed).
>>> @@ -6620,7 +6649,7 @@ static void sctp_wfree(struct sk_buff *skb)
>>>  	sk_mem_uncharge(sk, skb->truesize);
>>>
>>>  	sock_wfree(skb);
>>> -	__sctp_write_space(asoc);
>>> +	sctp_wake_up_waiters(sk, asoc);
>>>
>>>  	sctp_association_put(asoc);
>>>  }
On 04/08/2014 05:26 PM, Vlad Yasevich wrote:
...
> This is yet another list traversal with skip_head. I am going to
> resurrect that code for net-next.

Sounds good! I actually had a look a longer time ago at squeezing all
that logic into a list macro, but so far I didn't find something clean
without appending an if-clause to it that tests for the head element -
and that I don't like, as users could unintentionally run into a
dangling else. In the end it all looked more complicated than the
current code, so I wasn't really satisfied and therefore didn't post a
patch. ;)

Cheers,

Daniel
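[ Editor's note: for illustration only, here is one way such a traversal
helper could look as a user-space C sketch. All names are hypothetical
(`list_rr_next`, `list_for_each_entry_rr`, `struct asoc`), the list
types are a minimal stand-in for the kernel's `<linux/list.h>`, and this
is not the interface the thread ultimately settled on. The idea: start
at the current entry's neighbour, transparently skip the list head when
wrapping, and visit the starting entry last. ]

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Next entry on a circular list, transparently skipping the head. */
#define list_rr_next(pos, head, member)					\
	container_of((pos)->member.next == (head) ?			\
			     (head)->next : (pos)->member.next,		\
		     __typeof__(*(pos)), member)

/* Hypothetical macro: iterate starting at start's neighbour, wrap
 * past the head, and visit 'start' itself last; pos becomes NULL
 * once 'start' has been handled, which terminates the loop. */
#define list_for_each_entry_rr(pos, start, head, member)		\
	for (pos = list_rr_next(start, head, member);			\
	     pos != NULL;						\
	     pos = ((pos) == (start)) ? NULL				\
				      : list_rr_next(pos, head, member))

struct asoc {
	int id;
	struct list_head asocs;
};

/* Record the visit order into out[]; returns the number of entries
 * seen. Stands in for calling __sctp_write_space() on each one. */
static int visit_order(struct asoc *start, struct list_head *head,
		       int *out)
{
	struct asoc *pos;
	int n = 0;

	list_for_each_entry_rr(pos, start, head, asocs)
		out[n++] = pos->id;
	return n;
}
```

With associations 1, 2 and 3 on the list and 2 as the current one, the
visit order comes out as 3, 1, 2: everyone else first, ourselves last,
and the head element never touched.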
From: Daniel Borkmann <dborkman@redhat.com>
Date: Tue, 8 Apr 2014 15:33:57 +0200

> [ When net-next opens up again, we need to think how
>   we can ideally make a new list interface and simplify
>   both open-coded list traversals. ]

I don't think you need a new list interface.

Instead, just have a back pointer to the sctp_sk() from the asoc, then
you can do normal list traversals over the asocs.
On 04/08/2014 12:50 PM, David Miller wrote:
> From: Daniel Borkmann <dborkman@redhat.com>
> Date: Tue, 8 Apr 2014 15:33:57 +0200
>
>> [ When net-next opens up again, we need to think how
>>   we can ideally make a new list interface and simplify
>>   both open-coded list traversals. ]
>
> I don't think you need a new list interface.
>
> Instead, just have a back pointer to the sctp_sk() from the asoc, then
> you can do normal list traversals over the asocs.

We already have that, but doing so would give preference to the first
association and possibly starve the last one.

This starvation avoidance is the whole point of having per-association
accounting.

-vlad
From: Vlad Yasevich <vyasevich@gmail.com>
Date: Tue, 08 Apr 2014 13:13:46 -0400

> On 04/08/2014 12:50 PM, David Miller wrote:
>> From: Daniel Borkmann <dborkman@redhat.com>
>> Date: Tue, 8 Apr 2014 15:33:57 +0200
>>
>>> [ When net-next opens up again, we need to think how
>>>   we can ideally make a new list interface and simplify
>>>   both open-coded list traversals. ]
>>
>> I don't think you need a new list interface.
>>
>> Instead, just have a back pointer to the sctp_sk() from the asoc, then
>> you can do normal list traversals over the asocs.
>
> We already have that, but doing so will give preference to first
> association and possibly starving the last one.
>
> This starvation avoidance is the whole point of having a per-association
> accounting.

I see, thanks for explaining.
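[ Editor's note: the starvation Vlad describes can be seen in a toy
user-space model. This is not kernel code and all names are invented:
one unit of send space rotates among N always-busy associations, and
the wake policy decides who grabs the freed space first. Waking in
plain list order hands the space back to the head association every
time, while starting at the freeing association's neighbour, as
sctp_wake_up_waiters() does, rotates it fairly. ]

```c
#include <assert.h>

#define N 3	/* associations sharing one socket's sndbuf */

/* Toy model: each round the current owner frees the single unit of
 * send space and one waiter is woken to grab it. rr == 0 always wakes
 * starting from the head of the list (the policy David suggested, and
 * that Vlad objects to); rr == 1 starts at the owner's neighbour, as
 * in the patch. sends[] counts how often each association transmits. */
static void simulate(int rounds, int rr, int sends[N])
{
	int owner = 0, r;

	for (r = 0; r < N; r++)
		sends[r] = 0;
	sends[owner]++;				/* initial grab by assoc 0 */

	for (r = 1; r < rounds; r++) {
		int winner = rr ? (owner + 1) % N	/* fair rotation */
				: 0;			/* head always wins */
		owner = winner;
		sends[owner]++;
	}
}
```

Over 9 rounds, list-order wakeups give association 0 all 9 transmit
slots while 1 and 2 starve; neighbour-first wakeups give each
association 3.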
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 981aaf8..a4c8c1f 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6593,6 +6593,35 @@ static void __sctp_write_space(struct sctp_association *asoc)
 	}
 }
 
+static void sctp_wake_up_waiters(struct sock *sk,
+				 struct sctp_association *asoc)
+{
+	struct sctp_association *tmp = asoc;
+
+	/* We do accounting for the sndbuf space per association,
+	 * so we only need to wake our own association.
+	 */
+	if (asoc->ep->sndbuf_policy)
+		return __sctp_write_space(asoc);
+
+	/* Accounting for the sndbuf space is per socket, so we need
+	 * to wake up others, try to be fair and in case of other
+	 * associations, let them have a go first instead of just
+	 * doing a sctp_write_space() call.
+	 */
+	for (tmp = list_next_entry(tmp, asocs); 1;
+	     tmp = list_next_entry(tmp, asocs)) {
+		/* Manually skip the head element. */
+		if (&tmp->asocs == &((sctp_sk(sk))->ep->asocs))
+			continue;
+		/* Wake up association. */
+		__sctp_write_space(tmp);
+		/* We've reached the end. */
+		if (tmp == asoc)
+			break;
+	}
+}
+
 /* Do accounting for the sndbuf space.
  * Decrement the used sndbuf space of the corresponding association by the
  * data size which was just transmitted(freed).
@@ -6620,7 +6649,7 @@ static void sctp_wfree(struct sk_buff *skb)
 	sk_mem_uncharge(sk, skb->truesize);
 
 	sock_wfree(skb);
-	__sctp_write_space(asoc);
+	sctp_wake_up_waiters(sk, asoc);
 
 	sctp_association_put(asoc);
 }
SCTP charges chunks for wmem accounting via skb->truesize in
sctp_set_owner_w(), and sctp_wfree() respectively as the reverse
operation. If a sender runs out of wmem, it needs to wait via
sctp_wait_for_sndbuf(), and gets woken up by a call to
__sctp_write_space() mostly via sctp_wfree().

__sctp_write_space() is being called per association. Although we
assign sk->sk_write_space() to sctp_write_space(), which is then being
done per socket, it is only used if send space is increased per socket
option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE is set and therefore not
invoked in sock_wfree().

Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf again when
transmitting packet") fixed an issue where, in case
sctp_packet_transmit() manages to queue up more than sndbuf bytes,
sctp_wait_for_sndbuf() will never be woken up again unless it is
interrupted by a signal. However, a still remaining issue is that if
net.sctp.sndbuf_policy=0, that is, accounting per socket, and
one-to-many sockets are in use, the reclaimed write space from
sctp_wfree() is 'unfairly' handed back on the server to the association
that is the lucky one to be woken up again via __sctp_write_space(),
while the remaining associations are never woken up again (unless by a
signal).

The effect disappears with net.sctp.sndbuf_policy=1, that is, wmem
accounting per association, as it guarantees a fair share of wmem among
associations.

Therefore, if we have reclaimed memory in case of per socket
accounting, wake all associations related to a socket in a fair manner,
that is, traverse the socket association list starting from the current
neighbour of the association and issue a __sctp_write_space() to
everyone until we end up waking ourselves. This guarantees that no
association is preferred over another, and even if more associations
are added to the one-to-many session, all receivers will get messages
from the server and are not stalled forever on high load. This setting
still leaves the advantage of per socket accounting intact, as an
association can still use up global limits if unused by others.

Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Neil Horman <nhorman@tuxdriver.com>
---
[ When net-next opens up again, we need to think how we can ideally
  make a new list interface and simplify both open-coded list
  traversals. ]

 net/sctp/socket.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)