Message ID | CAOkSjBh2KtYcSEQ7=h_h+R9CE53pwGtBAiQ_OZF0aTX0fhY0fw@mail.gmail.com |
---|---|
State | RFC |
Delegated to: | Pablo Neira |
Headers | show |
On Thu, Aug 27, 2015 at 12:57:42PM +0200, Arturo Borrero Gonzalez wrote: > Hi, > > The documentation about DisableExternalCache reads: > > <<< > [...] > You can also use this option with the NOTRACK and ALARM modes. > This increases CPU consumption in the backup firewall but now you do not > need to commit the flow-states during the master failures since they are > already in the in-kernel Connection Tracking table. Moreover, you save > memory in the backup firewall since you do not need to store the > foreign flow-states anymore. > >>> > > However, the config parser doesn't allows it. Patch seems rather trivial: > > diff --git a/src/read_config_yy.y b/src/read_config_yy.y > index 73fabbf..d53aa70 100644 > --- a/src/read_config_yy.y > +++ b/src/read_config_yy.y > @@ -908,6 +908,7 @@ sync_mode_alarm_line: refreshtime > | purge > | relax_transitions > | delay_destroy_msgs > + | disable_external_cache > ; > > sync_mode_ftfw_list: > > > However, there seems to be some missing bits somewhere, the backup > node prints this in the logs: > > [...] > [Thu Aug 27 12:49:46 2015] (pid=15176) [ERROR] inject-add2: No such > file or directory > Thu Aug 27 12:49:46 2015 tcp 6 17949 ESTABLISHED > src=192.162.26.14 dst=192.168.5.134 sport=39089 dport=2015 [ASSURED] > mark=0 > [Thu Aug 27 12:49:56 2015] (pid=15176) [ERROR] inject-add2: No such > file or directory > Thu Aug 27 12:49:56 2015 tcp 6 17949 ESTABLISHED > src=192.162.26.14 dst=192.168.5.134 sport=39089 dport=2015 [ASSURED] > mark=0 > [...] > > Note, always the same connection. In my busy test environment, this > ENOENT happens every few seconds Perhaps a race condition somewhere? > > I would appreciate any hint/advice/pointer. Are these FTP data flows? I'm asking this because the master connection (control flow) may be missing in the conntrack table, thus the ENOENT error. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 28 August 2015 at 18:49, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > Are these FTP data flows? I'm asking this because the master > connection (control flow) may be missing in the conntrack table, thus > the ENOENT error. Makes sense. Would you consider accepting the patch?
On Mon, Aug 31, 2015 at 09:55:23AM +0200, Arturo Borrero Gonzalez wrote: > On 28 August 2015 at 18:49, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > > Are these FTP data flows? I'm asking this because the master > > connection (control flow) may be missing in the conntrack table, thus > > the ENOENT error. > > Makes sense. > > Would you consider accepting the patch? I think you can fix the problem that you're noticing by extending timer_add(). Note that the general idea behind the alarm approach is to insert a timer with a random time wait in unit from 0..M seconds. A quick workaround to address this is that master conntracks are placed in the time range from 0..N/2, then you place non-master conntrack entries in the range from N/2+1 to M. Then, unless there is packet loss, most likely the expected conntrack will find the master one. It should be just a few more extra lines away from this oneliner. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 1 September 2015 at 18:44, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> It should be just a few more extra lines away from this oneliner.
I have no time to immerse myself in such a sensitive codebase and do
it right :-(
However I am more than willing to test a patch from you or any other person.
On 1 September 2015 at 18:44, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Mon, Aug 31, 2015 at 09:55:23AM +0200, Arturo Borrero Gonzalez wrote: >> On 28 August 2015 at 18:49, Pablo Neira Ayuso <pablo@netfilter.org> wrote: >> > >> > Are these FTP data flows? I'm asking this because the master >> > connection (control flow) may be missing in the conntrack table, thus >> > the ENOENT error. >> >> Makes sense. >> >> Would you consider accepting the patch? > > I think you can fix the problem that you're noticing by extending > timer_add(). > > Note that the general idea behind the alarm approach is to insert a > timer with a random time wait in unit from 0..M seconds. > > A quick workaround to address this is that master conntracks are > placed in the time range from 0..N/2, then you place non-master > conntrack entries in the range from N/2+1 to M. Then, unless there is > packet loss, most likely the expected conntrack will find the master > one. > > It should be just a few more extra lines away from this oneliner. I keep thinking about this, and I have a few questions: timer_add() seems to manage timers for both internal and external caches. Is it okay to include the workaround you mentioned for *both* internal and external caches? This is what I've understood: the master conntrack in the internal cache of the sending node is vanishing before the non-master conntrack, and then the non-master conntrack is being synced again to the receiving node without the associated master conntrack. Is this assumption true? Other possibility is that both master and non-master conntracks are sycned, but then the receiving node fails to somehow inject to the kernel in the required order. Could you please give me additional hints? regards.
On Fri, Sep 25, 2015 at 01:38:38PM +0200, Arturo Borrero Gonzalez wrote: > On 1 September 2015 at 18:44, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Mon, Aug 31, 2015 at 09:55:23AM +0200, Arturo Borrero Gonzalez wrote: > >> On 28 August 2015 at 18:49, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > >> > > >> > Are these FTP data flows? I'm asking this because the master > >> > connection (control flow) may be missing in the conntrack table, thus > >> > the ENOENT error. > >> > >> Makes sense. > >> > >> Would you consider accepting the patch? > > > > I think you can fix the problem that you're noticing by extending > > timer_add(). > > > > Note that the general idea behind the alarm approach is to insert a > > timer with a random time wait in unit from 0..M seconds. > > > > A quick workaround to address this is that master conntracks are > > placed in the time range from 0..N/2, then you place non-master > > conntrack entries in the range from N/2+1 to M. Then, unless there is > > packet loss, most likely the expected conntrack will find the master > > one. > > > > It should be just a few more extra lines away from this oneliner. > > I keep thinking about this, and I have a few questions: > > timer_add() seems to manage timers for both internal and external caches. Only for external: struct sync_mode sync_alarm = { .internal_cache_flags = NO_FEATURES, .external_cache_flags = TIMER, ... > Is it okay to include the workaround you mentioned for *both* internal > and external caches? You only have to make it from the internal cache, the relevant timer bits are in the cache_alarm_extra code, more specifically the cache_alarm_add(), cache_alarm_update() and refresher() functions. These functions tell when to send the sync message to the backup firewall. > This is what I've understood: the master conntrack in the internal > cache of the sending node is vanishing before the non-master > conntrack, and then the non-master conntrack is being synced again to > the receiving node without the associated master conntrack. Is this > assumption true? The problem is that the non-master conntrack gets to the back before the master conntrack. With the workaround above, you can make sure the master always comes in first place (unless there is sync message loss, in that case the assumption above is no longer true). Another a bit more complex solution, but nicer, would be to look up in the internal cache for the master in case the IPS_EXPECTED status bit is set, then queue from alarm_enqueue() the master conntrack object before the expected conntrack. With this procedure, the master and the expected conntrack will be transmited in the same sync message, coming the master in first place as the backup firewall needs (the order matters). > Other possibility is that both master and non-master conntracks are > sycned, but then the receiving node fails to somehow inject to the > kernel in the required order. Not sure what you mean, both master and non-master (expected) conntracks are sync'ed, the problem is that the timer value is random, so one may get scheduled to be sent via the refresher() before another. Cheers, Pablo -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/src/read_config_yy.y b/src/read_config_yy.y index 73fabbf..d53aa70 100644 --- a/src/read_config_yy.y +++ b/src/read_config_yy.y @@ -908,6 +908,7 @@ sync_mode_alarm_line: refreshtime | purge | relax_transitions | delay_destroy_msgs + | disable_external_cache ; sync_mode_ftfw_list:
Hi, The documentation about DisableExternalCache reads: <<< [...] You can also use this option with the NOTRACK and ALARM modes. This increases CPU consumption in the backup firewall but now you do not need to commit the flow-states during the master failures since they are already in the in-kernel Connection Tracking table. Moreover, you save memory in the backup firewall since you do not need to store the foreign flow-states anymore. >>> However, the config parser doesn't allows it. Patch seems rather trivial: However, there seems to be some missing bits somewhere, the backup node prints this in the logs: [...] [Thu Aug 27 12:49:46 2015] (pid=15176) [ERROR] inject-add2: No such file or directory Thu Aug 27 12:49:46 2015 tcp 6 17949 ESTABLISHED src=192.162.26.14 dst=192.168.5.134 sport=39089 dport=2015 [ASSURED] mark=0 [Thu Aug 27 12:49:56 2015] (pid=15176) [ERROR] inject-add2: No such file or directory Thu Aug 27 12:49:56 2015 tcp 6 17949 ESTABLISHED src=192.162.26.14 dst=192.168.5.134 sport=39089 dport=2015 [ASSURED] mark=0 [...] Note, always the same connection. In my busy test environment, this ENOENT happens every few seconds Perhaps a race condition somewhere? I would appreciate any hint/advice/pointer.