| Message ID | 1610005408-25724-1-git-send-email-lirongqing@baidu.com |
|---|---|
| State | Rejected |
| Headers | show |
| Series | [ovs-dev,v2] conntrack: Fix conntrack tw expiration | expand |
Li RongQing <lirongqing@baidu.com> writes: > a connection will enter timewait status when a reset packet > reached after a fin is received/sent, But the expiration > time is not updated, still is the previous expiration > time. this maybe causes connection table overflow due to long > expiration time > > Fixes: a489b16854b59 ("conntrack: New userspace connection tracker.") > Signed-off-by: Li RongQing <lirongqing@baidu.com> > Co-authored-by: Mao YingMing <maoyingming@baidu.com> > Signed-off-by: Mao YingMing <maoyingming@baidu.com> > --- > resend with maoyingming signature > > lib/conntrack-tcp.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c > index 18a2aa7c7..f1595af7a 100644 > --- a/lib/conntrack-tcp.c > +++ b/lib/conntrack-tcp.c > @@ -406,6 +406,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, > > if (tcp_flags & TCP_RST) { > src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; > + conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSED, now); > } > } else { > COVERAGE_INC(conntrack_tcp_seq_chk_failed); Does it make more sense to move the entire conn_update_expiration block in the previous condition to a common place after the 3 cases here (seq_chk, shotgun-syn, and in-window blocks)? That should cover this case, and the fin assignment as well. Then again, there are only two assignments in this block. Alternatively, I think the fin check earlier will need a similar conn_update_expiration - do we have a case where FIN|ACK on learning an already established connection could keep us with an older timeout?
struct conntrack *ct, struct conn > > *conn_, > > > > if (tcp_flags & TCP_RST) { > > src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; > > + conn_update_expiration(ct, &conn->up, > CT_TM_TCP_CLOSED, > > + now); > > } > > } else { > > COVERAGE_INC(conntrack_tcp_seq_chk_failed); > > Does it make more sense to move the entire conn_update_expiration block in > the previous condition to a common place after the 3 cases here (seq_chk, > shotgun-syn, and in-window blocks)? > > That should cover this case, and the fin assignment as well. Then again, there > are only two assignments in this block. > Do you mean we should change like below; I think your suggestion make sense, if no one has objection, I will send a new version diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c index 18a2aa7c7..b827eab62 100644 --- a/lib/conntrack-tcp.c +++ b/lib/conntrack-tcp.c @@ -338,21 +338,6 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; } - if (src->state >= CT_DPIF_TCPS_FIN_WAIT_2 - && dst->state >= CT_DPIF_TCPS_FIN_WAIT_2) { - conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSED, now); - } else if (src->state >= CT_DPIF_TCPS_CLOSING - && dst->state >= CT_DPIF_TCPS_CLOSING) { - conn_update_expiration(ct, &conn->up, CT_TM_TCP_FIN_WAIT, now); - } else if (src->state < CT_DPIF_TCPS_ESTABLISHED - || dst->state < CT_DPIF_TCPS_ESTABLISHED) { - conn_update_expiration(ct, &conn->up, CT_TM_TCP_OPENING, now); - } else if (src->state >= CT_DPIF_TCPS_CLOSING - || dst->state >= CT_DPIF_TCPS_CLOSING) { - conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSING, now); - } else { - conn_update_expiration(ct, &conn->up, CT_TM_TCP_ESTABLISHED, now); - } } else if ((dst->state < CT_DPIF_TCPS_SYN_SENT || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2 || src->state >= CT_DPIF_TCPS_FIN_WAIT_2) @@ -412,6 +397,22 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, return CT_UPDATE_INVALID; } + if (src->state >= CT_DPIF_TCPS_FIN_WAIT_2 + && dst->state >= CT_DPIF_TCPS_FIN_WAIT_2) { + conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSED, now); + } else if (src->state >= CT_DPIF_TCPS_CLOSING + && dst->state >= CT_DPIF_TCPS_CLOSING) { + conn_update_expiration(ct, &conn->up, CT_TM_TCP_FIN_WAIT, now); + } else if (src->state < CT_DPIF_TCPS_ESTABLISHED + || dst->state < CT_DPIF_TCPS_ESTABLISHED) { + conn_update_expiration(ct, &conn->up, CT_TM_TCP_OPENING, now); + } else if (src->state >= CT_DPIF_TCPS_CLOSING + || dst->state >= CT_DPIF_TCPS_CLOSING) { + conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSING, now); + } else { + conn_update_expiration(ct, &conn->up, CT_TM_TCP_ESTABLISHED, now); + } + return CT_UPDATE_VALID; } -Li > Alternatively, I think the fin check earlier will need a similar > conn_update_expiration - do we have a case where FIN|ACK on learning an > already established connection could keep us with an older timeout?
"Li,Rongqing" <lirongqing@baidu.com> writes: > struct conntrack *ct, struct conn >> > *conn_, >> > >> > if (tcp_flags & TCP_RST) { >> > src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; >> > + conn_update_expiration(ct, &conn->up, >> CT_TM_TCP_CLOSED, >> > + now); >> > } >> > } else { >> > COVERAGE_INC(conntrack_tcp_seq_chk_failed); >> >> Does it make more sense to move the entire conn_update_expiration block in >> the previous condition to a common place after the 3 cases here (seq_chk, >> shotgun-syn, and in-window blocks)? >> >> That should cover this case, and the fin assignment as well. Then again, there >> are only two assignments in this block. >> > > > Do you mean we should change like below; I think your suggestion make sense, if no one has objection, I will send a new version > > diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c > index 18a2aa7c7..b827eab62 100644 > --- a/lib/conntrack-tcp.c > +++ b/lib/conntrack-tcp.c > @@ -338,21 +338,6 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, > src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; > } > > - if (src->state >= CT_DPIF_TCPS_FIN_WAIT_2 > - && dst->state >= CT_DPIF_TCPS_FIN_WAIT_2) { > - conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSED, now); > - } else if (src->state >= CT_DPIF_TCPS_CLOSING > - && dst->state >= CT_DPIF_TCPS_CLOSING) { > - conn_update_expiration(ct, &conn->up, CT_TM_TCP_FIN_WAIT, now); > - } else if (src->state < CT_DPIF_TCPS_ESTABLISHED > - || dst->state < CT_DPIF_TCPS_ESTABLISHED) { > - conn_update_expiration(ct, &conn->up, CT_TM_TCP_OPENING, now); > - } else if (src->state >= CT_DPIF_TCPS_CLOSING > - || dst->state >= CT_DPIF_TCPS_CLOSING) { > - conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSING, now); > - } else { > - conn_update_expiration(ct, &conn->up, CT_TM_TCP_ESTABLISHED, now); > - } > } else if ((dst->state < CT_DPIF_TCPS_SYN_SENT > || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2 > || src->state >= CT_DPIF_TCPS_FIN_WAIT_2) > @@ -412,6 +397,22 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, > return CT_UPDATE_INVALID; > } > > + if (src->state >= CT_DPIF_TCPS_FIN_WAIT_2 > + && dst->state >= CT_DPIF_TCPS_FIN_WAIT_2) { > + conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSED, now); > + } else if (src->state >= CT_DPIF_TCPS_CLOSING > + && dst->state >= CT_DPIF_TCPS_CLOSING) { > + conn_update_expiration(ct, &conn->up, CT_TM_TCP_FIN_WAIT, now); > + } else if (src->state < CT_DPIF_TCPS_ESTABLISHED > + || dst->state < CT_DPIF_TCPS_ESTABLISHED) { > + conn_update_expiration(ct, &conn->up, CT_TM_TCP_OPENING, now); > + } else if (src->state >= CT_DPIF_TCPS_CLOSING > + || dst->state >= CT_DPIF_TCPS_CLOSING) { > + conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSING, now); > + } else { > + conn_update_expiration(ct, &conn->up, CT_TM_TCP_ESTABLISHED, now); > + } > + > return CT_UPDATE_VALID; > } I'm much less sure now, re-reading the comments, especially: * This must be a little more careful than the above code * since packet floods will also be caught here. We don't * update the TTL here to mitigate the damage of a packet * flood and so the same code can handle awkward establishment * and a loosened connection close. * In the establishment case, a correct peer response will * validate the connection, go through the normal state code * and keep updating the state TTL. I think given that, it would be wrong to make any change to the connection expiration time in this branch. I guess you're hitting the 'loosened connection close' case, which will not change the TTL and keep the connection in the table longer. Looking at other PF based firewalls, none of them ever update the xon timeout in this fall-through case (I looked at openbsd, dragonfly bsd, and freebsd). Have you looked at using the CT timeout policy framework to setup a timeout policy? I think that would make more sense than modifying this branch. > > > -Li > >> Alternatively, I think the fin check earlier will need a similar >> conn_update_expiration - do we have a case where FIN|ACK on learning an >> already established connection could keep us with an older timeout?
"Li,Rongqing" <lirongqing@baidu.com> writes: > > > > >> -----Original Message----- > >> From: Aaron Conole [mailto:aconole@redhat.com] > >> Sent: Friday, January 08, 2021 11:15 PM > >> To: Li,Rongqing <lirongqing@baidu.com> > >> Cc: ovs-dev@openvswitch.org; William Tu <u9012063@gmail.com> > >> Subject: Re: [ovs-dev] [PATCH][v2] conntrack: Fix conntrack tw expiration > >> > >> "Li,Rongqing" <lirongqing@baidu.com> writes: > >> > >> > struct conntrack *ct, struct conn > >> >> > *conn_, > >> >> > > >> >> > if (tcp_flags & TCP_RST) { > >> >> > src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; > >> >> > + conn_update_expiration(ct, &conn->up, > >> >> CT_TM_TCP_CLOSED, > >> >> > + now); > >> >> > } > >> >> > } else { > >> >> > COVERAGE_INC(conntrack_tcp_seq_chk_failed); > >> >> > >> >> Does it make more sense to move the entire conn_update_expiration > >> >> block in the previous condition to a common place after the 3 cases > >> >> here (seq_chk, shotgun-syn, and in-window blocks)? > >> >> > >> >> That should cover this case, and the fin assignment as well. Then > >> >> again, there are only two assignments in this block. > >> >> > >> > > >> > > >> > Do you mean we should change like below; I think your suggestion make > >> > sense, if no one has objection, I will send a new version > >> > > >> > diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c index > >> > 18a2aa7c7..b827eab62 100644 > >> > --- a/lib/conntrack-tcp.c > >> > +++ b/lib/conntrack-tcp.c > >> > @@ -338,21 +338,6 @@ tcp_conn_update(struct conntrack *ct, struct conn > >> *conn_, > >> > src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; > >> > } > >> > > >> > - if (src->state >= CT_DPIF_TCPS_FIN_WAIT_2 > >> > - && dst->state >= CT_DPIF_TCPS_FIN_WAIT_2) { > >> > - conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSED, > >> now); > >> > - } else if (src->state >= CT_DPIF_TCPS_CLOSING > >> > - && dst->state >= CT_DPIF_TCPS_CLOSING) { > >> > - conn_update_expiration(ct, &conn->up, > >> CT_TM_TCP_FIN_WAIT, now); > >> > - } else if (src->state < CT_DPIF_TCPS_ESTABLISHED > >> > - || dst->state < CT_DPIF_TCPS_ESTABLISHED) { > >> > - conn_update_expiration(ct, &conn->up, > >> CT_TM_TCP_OPENING, now); > >> > - } else if (src->state >= CT_DPIF_TCPS_CLOSING > >> > - || dst->state >= CT_DPIF_TCPS_CLOSING) { > >> > - conn_update_expiration(ct, &conn->up, > >> CT_TM_TCP_CLOSING, now); > >> > - } else { > >> > - conn_update_expiration(ct, &conn->up, > >> CT_TM_TCP_ESTABLISHED, now); > >> > - } > >> > } else if ((dst->state < CT_DPIF_TCPS_SYN_SENT > >> > || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2 > >> > || src->state >= CT_DPIF_TCPS_FIN_WAIT_2) @@ > >> -412,6 > >> > +397,22 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, > >> > return CT_UPDATE_INVALID; > >> > } > >> > > >> > + if (src->state >= CT_DPIF_TCPS_FIN_WAIT_2 > >> > + && dst->state >= CT_DPIF_TCPS_FIN_WAIT_2) { > >> > + conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSED, > >> now); > >> > + } else if (src->state >= CT_DPIF_TCPS_CLOSING > >> > + && dst->state >= CT_DPIF_TCPS_CLOSING) { > >> > + conn_update_expiration(ct, &conn->up, CT_TM_TCP_FIN_WAIT, > >> now); > >> > + } else if (src->state < CT_DPIF_TCPS_ESTABLISHED > >> > + || dst->state < CT_DPIF_TCPS_ESTABLISHED) { > >> > + conn_update_expiration(ct, &conn->up, CT_TM_TCP_OPENING, > >> now); > >> > + } else if (src->state >= CT_DPIF_TCPS_CLOSING > >> > + || dst->state >= CT_DPIF_TCPS_CLOSING) { > >> > + conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSING, > >> now); > >> > + } else { > >> > + conn_update_expiration(ct, &conn->up, > >> CT_TM_TCP_ESTABLISHED, now); > >> > + } > >> > + > >> > return CT_UPDATE_VALID; > >> > } > >> > >> I'm much less sure now, re-reading the comments, especially: > >> > >> * This must be a little more careful than the above code > >> * since packet floods will also be caught here. We don't > >> * update the TTL here to mitigate the damage of a packet > >> * flood and so the same code can handle awkward establishment > >> * and a loosened connection close. > >> * In the establishment case, a correct peer response will > >> * validate the connection, go through the normal state code > >> * and keep updating the state TTL. > > > > > > Sorry, I am not clear how these codes related to TLL? As I understand the code, the state TTL is the expiration time for the state. >> > >> I think given that, it would be wrong to make any change to the connection > >> expiration time in this branch. > >> > >> I guess you're hitting the 'loosened connection close' case, which will not > > > > > > the connection that I see is blow: > > > > * > > > > > >> change the TTL and keep the connection in the table longer. Looking at other > >> PF based firewalls, none of them ever update the xon timeout in this > >> fall-through case (I looked at openbsd, dragonfly bsd, and freebsd). > >> > > Is it possible this is a common issue for BSD? I did not find the similar logic in linux > It is possible that such issues don't impact linux, but it could be for other reasons. If you have a traffic capture that you can post we can look at it. Best would be to create a test case using sendpkt.py to send the packets - that would demonstrate the issue between the two (and help us to keep it cohesive in the future). I think we do want to keep the datapaths as similar as possible, so I would welcome this test to be added. > >> Have you looked at using the CT timeout policy framework to setup a timeout > > > > > > What is CT timeout policy? Could you give some links, thanks I guess there isn't a whole lot of documentation. We should fix that. From ovs-vsctl(8): [--may-exist] add-zone-tp datapath zone=zone_id policies Creates a conntrack zone timeout policy with zone_id in data‐ path. The policies consist of key=value pairs, separated by spaces. For example, icmp_first=30 icmp_reply=60 specifies a 30-second timeout policy for the first ICMP packet and a 60-sec‐ ond policy for ICMP reply packets. See the CT_Timeout_Policy table in ovs-vswitchd.conf.db(5) for the supported keys. Without --may-exist, attempting to add a zone_id that already exists is an error. With --may-exist, this command does nothing if zone_id already exists. From the CT_Timeout_Policy section of ovs-vswitchd.conf.db(5): CT_Timeout_Policy TABLE Connection tracking timeout policy configuration Summary: Timeouts: timeouts map of string-integer pairs, key one of icmp_first, icmp_reply, tcp_close, tcp_close_wait, tcp_established, tcp_fin_wait, tcp_last_ack, tcp_retrans‐ mit, tcp_syn_recv, tcp_syn_sent2, tcp_syn_sent, tcp_time_wait, tcp_unack, udp_first, udp_multiple, or udp_single, value in range 0 to 4,294,967,295 TCP Timeouts: timeouts : tcp_syn_sent optional integer, in range 0 to 4,294,967,295 timeouts : tcp_syn_recv optional integer, in range 0 to 4,294,967,295 timeouts : tcp_established optional integer, in range 0 to 4,294,967,295 timeouts : tcp_fin_wait optional integer, in range 0 to 4,294,967,295 timeouts : tcp_close_wait optional integer, in range 0 to 4,294,967,295 timeouts : tcp_last_ack optional integer, in range 0 to 4,294,967,295 timeouts : tcp_time_wait optional integer, in range 0 to 4,294,967,295 timeouts : tcp_close optional integer, in range 0 to 4,294,967,295 timeouts : tcp_syn_sent2 optional integer, in range 0 to 4,294,967,295 timeouts : tcp_retransmit optional integer, in range 0 to 4,294,967,295 timeouts : tcp_unack optional integer, in range 0 to 4,294,967,295 .... As an example (just illustration - I didn't test any of this): ovs-vsctl --may-exist add-zone-tp netdev zone=0 tcp_time_wait=30 \ tcp_close=30 tcp_last_ack=30 tcp_fin_wait=30 > > > Thanks > > > > -Li
> -----Original Message----- > From: Aaron Conole [mailto:aconole@redhat.com] > Sent: Thursday, January 14, 2021 11:23 PM > To: Li,Rongqing <lirongqing@baidu.com> > Cc: ovs-dev@openvswitch.org; William Tu <u9012063@gmail.com> > Subject: Re: [ovs-dev] [PATCH][v2] conntrack: Fix conntrack tw expiration > > "Li,Rongqing" <lirongqing@baidu.com> writes: > > > > > > > > > > >> -----Original Message----- > > > > > >> I think given that, it would be wrong to make any change to the > >> connection > > > >> expiration time in this branch. > > > >> > > > >> I guess you're hitting the 'loosened connection close' case, which > >> will not > > > > > > > > > > > > the connection that I see is blow: > > > > > > > > * > > > > > > > > > > > >> change the TTL and keep the connection in the table longer. Looking > >> at other > > > >> PF based firewalls, none of them ever update the xon timeout in this > > > >> fall-through case (I looked at openbsd, dragonfly bsd, and freebsd). > > > >> > > > > Is it possible this is a common issue for BSD? I did not find the > > similar logic in linux > > > > It is possible that such issues don't impact linux, but it could be for other > reasons. If you have a traffic capture that you can post we can look at it. > Best would be to create a test case using sendpkt.py to send the packets - that > would demonstrate the issue between the two (and help us to keep it cohesive > in the future). > Ok, I will try to test it > I think we do want to keep the datapaths as similar as possible, so I would > welcome this test to be added. > > > > >> Have you looked at using the CT timeout policy framework to setup a > >> timeout > > > > > > > > > > > > What is CT timeout policy? Could you give some links, thanks > > I guess there isn't a whole lot of documentation. We should fix that. > > From ovs-vsctl(8): > [--may-exist] add-zone-tp datapath zone=zone_id policies > Creates a conntrack zone timeout policy with zone_id in > data‐ > path. The policies consist of key=value pairs, > separated by > spaces. For example, icmp_first=30 icmp_reply=60 > specifies a > 30-second timeout policy for the first ICMP packet and a > 60-sec‐ > ond policy for ICMP reply packets. See the > CT_Timeout_Policy > table in ovs-vswitchd.conf.db(5) for the supported keys. > > Without --may-exist, attempting to add a zone_id that > already > exists is an error. With --may-exist, this command does > nothing > if zone_id already exists. > > From the CT_Timeout_Policy section of ovs-vswitchd.conf.db(5): > > CT_Timeout_Policy TABLE > Connection tracking timeout policy configuration > > Summary: > Timeouts: > timeouts map of string-integer pairs, key > one of > icmp_first, icmp_reply, > tcp_close, > tcp_close_wait, > tcp_established, > tcp_fin_wait, tcp_last_ack, > tcp_retrans‐ > mit, tcp_syn_recv, > tcp_syn_sent2, > tcp_syn_sent, tcp_time_wait, > tcp_unack, > udp_first, udp_multiple, or > udp_single, > value in range 0 to 4,294,967,295 > TCP Timeouts: > timeouts : tcp_syn_sent optional integer, in range > 0 to > 4,294,967,295 > timeouts : tcp_syn_recv optional integer, in range > 0 to > 4,294,967,295 > timeouts : tcp_established > optional integer, in > range 0 to > 4,294,967,295 > timeouts : tcp_fin_wait optional integer, in range > 0 to > 4,294,967,295 > timeouts : tcp_close_wait > optional integer, in > range 0 to > 4,294,967,295 > timeouts : tcp_last_ack optional integer, in range > 0 to > 4,294,967,295 > timeouts : tcp_time_wait optional integer, in range > 0 to > 4,294,967,295 > timeouts : tcp_close optional integer, in range > 0 to > 4,294,967,295 > timeouts : tcp_syn_sent2 optional integer, in range > 0 to > 4,294,967,295 > timeouts : tcp_retransmit > optional integer, in > range 0 to > 4,294,967,295 > timeouts : tcp_unack optional integer, in range > 0 to > 4,294,967,295 > > .... > > As an example (just illustration - I didn't test any of this): > > ovs-vsctl --may-exist add-zone-tp netdev zone=0 tcp_time_wait=30 \ > tcp_close=30 tcp_last_ack=30 tcp_fin_wait=30 > I see, Thank you -Li > > > > > > Thanks > > > > > > > > -Li
diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c index 18a2aa7c7..f1595af7a 100644 --- a/lib/conntrack-tcp.c +++ b/lib/conntrack-tcp.c @@ -406,6 +406,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, if (tcp_flags & TCP_RST) { src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; + conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSED, now); } } else { COVERAGE_INC(conntrack_tcp_seq_chk_failed);
a connection will enter timewait status when a reset packet reached after a fin is received/sent, But the expiration time is not updated, still is the previous expiration time. this maybe causes connection table overflow due to long expiration time Fixes: a489b16854b59 ("conntrack: New userspace connection tracker.") Signed-off-by: Li RongQing <lirongqing@baidu.com> Co-authored-by: Mao YingMing <maoyingming@baidu.com> Signed-off-by: Mao YingMing <maoyingming@baidu.com> --- resend with maoyingming signature lib/conntrack-tcp.c | 1 + 1 file changed, 1 insertion(+)