Patchwork bridge: make bridge-nf-call-*tables default configurable

login
register
mail settings
Submitter Mark McLoughlin
Date June 30, 2009, 4:27 p.m.
Message ID <1246379267.3749.42.camel@blaa>
Download mbox | patch
Permalink /patch/29319/
State Rejected
Delegated to: David Miller
Headers show

Comments

Mark McLoughlin - June 30, 2009, 4:27 p.m.
With BRIDGE_NETFILTER enabled, bridge traffic is passed through
netfilter as it is forwarded across the bridge. This is a useful
feature in specialized cases where the admin wishes to filter bridge
traffic based on higher-level protocol headers.

However, in a lot of cases, it causes a large amount of confusion
since it is so counter-intuitive - nobody expects their IP firewall
rules to also apply to traffic on their bridges.

This is especially true for virtualization, where users create a
bridge and find that some types of traffic work and others don't, and
it can take quite some time to identify iptables as the culprit. Users
are often recommended to configure their iptables rules to ACCEPT
"physdev-is-bridged" in order to avoid this confusion.

However, because nf_conntrack introduces an skb_orphan(), it is now
recommended that bridge-nf-call-iptables be disabled completely so as
to ensure features like TUNSETSNDBUF work as expected.

For these reasons, it makes sense to allow distributions to disable
netfilter on the bridge by default and require those specialized users
to enable it explicitly via sysctl.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 net/Kconfig               |   12 ++++++++++++
 net/bridge/br_netfilter.c |    6 ++++++
 2 files changed, 18 insertions(+), 0 deletions(-)
Herbert Xu - June 30, 2009, 5 p.m.
On Tue, Jun 30, 2009 at 05:27:47PM +0100, Mark McLoughlin wrote:
> 
> However, because nf_conntrack introduces an skb_orphan(), it is now
> recommended that bridge-nf-call-iptables be disabled completely so as
> to ensure features like TUNSETSNDBUF work as expected.

Patrick, does conntrack ever make sense for bridging? Perhaps
we should get rid of that completely?

Cheers,
David Miller - June 30, 2009, 7:06 p.m.
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 1 Jul 2009 01:00:27 +0800

Adding appropriate lists and persons to CC:

> On Tue, Jun 30, 2009 at 05:27:47PM +0100, Mark McLoughlin wrote:
>> 
>> However, because nf_conntrack introduces an skb_orphan(), it is now
>> recommended that bridge-nf-call-iptables be disabled completely so as
>> to ensure features like TUNSETSNDBUF work as expected.
> 
> Patrick, does conntrack ever make sense for bridging? Perhaps
> we should get rid of that completely?
> 
> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt - June 30, 2009, 8:16 p.m.
On Tuesday 2009-06-30 21:06, David Miller wrote:
>Adding appropriate lists and persons to CC:
>
>> On Tue, Jun 30, 2009 at 05:27:47PM +0100, Mark McLoughlin wrote:
>>> 
>>> However, because nf_conntrack introduces an skb_orphan(), it is now
>>> recommended that bridge-nf-call-iptables be disabled completely so as
>>> to ensure features like TUNSETSNDBUF work as expected.
>> 
>> Patrick, does conntrack ever make sense for bridging? Perhaps
>> we should get rid of that completely?

It makes sense absolutely. Consider:

* packet enters bridge
* NF_HOOK(PF_INET6, NF_INET_PRE_ROUTING, ...) is called by nr_netfilter.c
* (connection tracking entry is set up)
* let bridging decision be "local delivery"
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Smith - June 30, 2009, 8:57 p.m.
Hi,

On Tue, 30 Jun 2009 22:16:35 +0200 (CEST)
Jan Engelhardt <jengelh@medozas.de> wrote:

> 
> On Tuesday 2009-06-30 21:06, David Miller wrote:
> >Adding appropriate lists and persons to CC:
> >
> >> On Tue, Jun 30, 2009 at 05:27:47PM +0100, Mark McLoughlin wrote:
> >>> 
> >>> However, because nf_conntrack introduces an skb_orphan(), it is now
> >>> recommended that bridge-nf-call-iptables be disabled completely so as
> >>> to ensure features like TUNSETSNDBUF work as expected.
> >> 
> >> Patrick, does conntrack ever make sense for bridging? Perhaps
> >> we should get rid of that completely?
> 
> It makes sense absolutely. Consider:
> 
> * packet enters bridge
> * NF_HOOK(PF_INET6, NF_INET_PRE_ROUTING, ...) is called by nr_netfilter.c
> * (connection tracking entry is set up)
> * let bridging decision be "local delivery"

I really like this feature, although it is only because I've spent
time thinking about it, and it's usefulness, after having been burnt
quite a lot by it, due to it's quite strange side effects on traffic.
e.g. it'll defragment bridged IP packets, and then, if the outbound
bridge interface MTU is big enough, will send off large single ethernet
frames. If you're not expecting that, or don't work out what is
going on, you'll believe you're seeing the input traffic in the form
it arrived, and you'll believe it for quite a long time :-(

I'm not sure if it supposed to work on IP traffic carried within
bridge PPPoE/PPP, but it does - and that was very, very confusing to
work out what had happened too. PPPoE comes up, as does PPP and IPCP,
but forwarded IP packets are dropped unless there is a FORWARD
iptables rule.

I do agree though, either it should default to off, or the behaviour be
made far more prominent somehow.

Regards,
Mark.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu - July 1, 2009, 1:15 a.m.
On Tue, Jun 30, 2009 at 10:16:35PM +0200, Jan Engelhardt wrote:
>
> It makes sense absolutely. Consider:
> 
> * packet enters bridge
> * NF_HOOK(PF_INET6, NF_INET_PRE_ROUTING, ...) is called by nr_netfilter.c
> * (connection tracking entry is set up)
> * let bridging decision be "local delivery"

No, my question is does it ever make sense to use conntrack as
part of bridge netfilter.  That is, do you ever want to test it
in your rules that are run as part of bridge netfilter.

conntrack is inherently a security hole when used as part of
bridging, because it ignores the Ethernet header so two unrelated
connections can be tracked as one.

It used to be an even bigger security hole when we ignored VLAN
and PPPOE headers but at least that's now off by default.

Cheers,
Herbert Xu - July 1, 2009, 1:48 a.m.
On Wed, Jul 01, 2009 at 06:27:37AM +0930, Mark Smith wrote:
>
> I'm not sure if it supposed to work on IP traffic carried within
> bridge PPPoE/PPP, but it does - and that was very, very confusing to
> work out what had happened too. PPPoE comes up, as does PPP and IPCP,
> but forwarded IP packets are dropped unless there is a FORWARD
> iptables rule.

At least this bit should be off by default now, along as VLAN
decoding which was a gaping hole.

Cheers,
David Miller - July 1, 2009, 3:16 a.m.
From: Mark McLoughlin <markmc@redhat.com>
Date: Tue, 30 Jun 2009 17:27:47 +0100

> For these reasons, it makes sense to allow distributions to disable
> netfilter on the bridge by default and require those specialized users
> to enable it explicitly via sysctl.

I heard that distributions ship some file, what's it called...
something like /etc/sysctl.conf :-)

Really, if someone thinkgs the default stinks and dists don't like it
for their users, they can use sysctl.conf to set it how they please.

Notwithstanding that changing this default can break working
setups and scripts.  Yes they can "change", but they were just
(rightly) using the kernel as it came to them.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt - July 1, 2009, 3:50 a.m.
On Wednesday 2009-07-01 03:15, Herbert Xu wrote:
>On Tue, Jun 30, 2009 at 10:16:35PM +0200, Jan Engelhardt wrote:
>>
>> It makes sense absolutely. Consider:
>> 
>> * packet enters bridge
>> * NF_HOOK(PF_INET6, NF_INET_PRE_ROUTING, ...) is called by nr_netfilter.c
>> * (connection tracking entry is set up)
>> * let bridging decision be "local delivery"
>
>No, my question is does it ever make sense to use conntrack as
>part of bridge netfilter.  That is, do you ever want to test it
>in your rules that are run as part of bridge netfilter.

There is the possibility that some users have -m conntrack in their
mangle table in the PREROUTING chain. However, I am pretty sure that
if there are such users, they do it because of the layer-3/4/5/7 part
and not care about bridge so much.

>conntrack is inherently a security hole when used as part of
>bridging, because it ignores the Ethernet header so two unrelated
>connections can be tracked as one.

On secondary thought, one could also argue that because conntrack 
ignores the interface, two unrelated connections happening to be routed 
through the same machine(*) are tracked as one, too.




(*)
ip rule iif eth0 table 7
ip rule iif eth1 table 7
ip rule iif eth2 table 8
ip rule iif eth3 table 8
ip route add 2003::1/128 dev eth0 table 7
ip route add 2003::2/128 dev eth1 table 7
ip route add 2003::1/128 dev eth2 table 8
ip route add 2003::2/128 dev eth3 table 8
(four single nets, four single hosts)
In Unicode:

    A
    0
  ┌─║───┐
B1══╝   │
  │   ╔══2C
  └───║─┘
      3
      D
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu - July 1, 2009, 4:10 a.m.
On Wed, Jul 01, 2009 at 05:50:18AM +0200, Jan Engelhardt wrote:
> 
> On secondary thought, one could also argue that because conntrack 
> ignores the interface, two unrelated connections happening to be routed 
> through the same machine(*) are tracked as one, too.

Good point.  We really should make these risks much more explicit.

However, I still think the risk with bridging is higher, especially
in the presence of virtualisation.  Consider the scenario where you
have to VMs on the one host, each with a dedicated bridge with the
intention that neither should know anything about the other's
traffic.

With conntrack running as part of bridging, the traffic can now
cross over which is a serious security hole.

Cheers,
Ben Greear - July 1, 2009, 4:14 a.m.
Jan Engelhardt wrote:
> On Wednesday 2009-07-01 03:15, Herbert Xu wrote:
>   
>> On Tue, Jun 30, 2009 at 10:16:35PM +0200, Jan Engelhardt wrote:
>>     
>>> It makes sense absolutely. Consider:
>>>
>>> * packet enters bridge
>>> * NF_HOOK(PF_INET6, NF_INET_PRE_ROUTING, ...) is called by nr_netfilter.c
>>> * (connection tracking entry is set up)
>>> * let bridging decision be "local delivery"
>>>       
>> No, my question is does it ever make sense to use conntrack as
>> part of bridge netfilter.  That is, do you ever want to test it
>> in your rules that are run as part of bridge netfilter.
>>     
>
> There is the possibility that some users have -m conntrack in their
> mangle table in the PREROUTING chain. However, I am pretty sure that
> if there are such users, they do it because of the layer-3/4/5/7 part
> and not care about bridge so much.
>
>   
>> conntrack is inherently a security hole when used as part of
>> bridging, because it ignores the Ethernet header so two unrelated
>> connections can be tracked as one.
>>     
>
> On secondary thought, one could also argue that because conntrack 
> ignores the interface, two unrelated connections happening to be routed 
> through the same machine(*) are tracked as one, too.
>   
I had a similar problem while trying to implement virtual routers using 
different
routing tables and something like 'veth' to connect them.

My solution was to add a 'mark' field to the netdevice and allow 
user-space to set
the mark on the device.  This mark was included as part of the 
connection identifier.

The mark is set before the pkt hits the bridge code on ingress, so a pkt 
entering from
eth1 can get a different connection hash from a pkt entering eth2, even 
if all other
data in the packet is the same.

I'll dig it out of my monster patch if something like this is deemed 
useful upstream.

Thanks,
Ben
Patrick McHardy - July 1, 2009, 8:56 a.m.
Mark McLoughlin wrote:
> With BRIDGE_NETFILTER enabled, bridge traffic is passed through
> netfilter as it is forwarded across the bridge. This is a useful
> feature in specialized cases where the admin wishes to filter bridge
> traffic based on higher-level protocol headers.
> 
> However, in a lot of cases, it causes a large amount of confusion
> since it is so counter-intuitive - nobody expects their IP firewall
> rules to also apply to traffic on their bridges.
> 
> This is especially true for virtualization, where users create a
> bridge and find that some types of traffic work and others don't, and
> it can take quite some time to identify iptables as the culprit. Users
> are often recommended to configure their iptables rules to ACCEPT
> "physdev-is-bridged" in order to avoid this confusion.
> 
> However, because nf_conntrack introduces an skb_orphan(), it is now
> recommended that bridge-nf-call-iptables be disabled completely so as
> to ensure features like TUNSETSNDBUF work as expected.
> 
> For these reasons, it makes sense to allow distributions to disable
> netfilter on the bridge by default and require those specialized users
> to enable it explicitly via sysctl.

I agree that this makes sense, at least temporarily. Mid-term
we should really fix the defaults, so it would be good to have a
feature-removal-schedule and maybe a runtime warning stating that
these defaults will change.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy - July 1, 2009, 8:57 a.m.
Herbert Xu wrote:
> On Tue, Jun 30, 2009 at 05:27:47PM +0100, Mark McLoughlin wrote:
>> However, because nf_conntrack introduces an skb_orphan(), it is now
>> recommended that bridge-nf-call-iptables be disabled completely so as
>> to ensure features like TUNSETSNDBUF work as expected.
> 
> Patrick, does conntrack ever make sense for bridging? Perhaps
> we should get rid of that completely?

People are apparently using this for stateful tracking and even
NAT on bridges, so I'm afraid we can't get rid of it completely.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy - July 1, 2009, 9:01 a.m.
Herbert Xu wrote:
> On Tue, Jun 30, 2009 at 10:16:35PM +0200, Jan Engelhardt wrote:
>> It makes sense absolutely. Consider:
>>
>> * packet enters bridge
>> * NF_HOOK(PF_INET6, NF_INET_PRE_ROUTING, ...) is called by nr_netfilter.c
>> * (connection tracking entry is set up)
>> * let bridging decision be "local delivery"
> 
> No, my question is does it ever make sense to use conntrack as
> part of bridge netfilter.  That is, do you ever want to test it
> in your rules that are run as part of bridge netfilter.

Probably not, but thats not how its used currently. The packets are
passed to IP netfilter, which performs connection tracking. I'm not
sure how we could avoid the negative effects while still allowing this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu - July 1, 2009, 9:07 a.m.
On Wed, Jul 01, 2009 at 10:57:02AM +0200, Patrick McHardy wrote:
>
> People are apparently using this for stateful tracking and even
> NAT on bridges, so I'm afraid we can't get rid of it completely.

I think we should print out a nasty message warning the user
if we detect that conntrack is used by bridge netfilter for the
first time.  If the user persists then at least they can't sue
us for not warning them :)

Cheers,
Patrick McHardy - July 1, 2009, 9:21 a.m.
Herbert Xu wrote:
> On Wed, Jul 01, 2009 at 10:57:02AM +0200, Patrick McHardy wrote:
>> People are apparently using this for stateful tracking and even
>> NAT on bridges, so I'm afraid we can't get rid of it completely.
> 
> I think we should print out a nasty message warning the user
> if we detect that conntrack is used by bridge netfilter for the
> first time.  If the user persists then at least they can't sue
> us for not warning them :)

Agreed, at least as long as this is still the default behaviour.
Mark, could you add this to your patch? br_nf_pre_routing_finish()
looks like a good place to print a warning when skb->nfct != NULL.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark McLoughlin - July 1, 2009, 10:45 a.m.
On Tue, 2009-06-30 at 20:16 -0700, David Miller wrote:
> From: Mark McLoughlin <markmc@redhat.com>
> Date: Tue, 30 Jun 2009 17:27:47 +0100
> 
> > For these reasons, it makes sense to allow distributions to disable
> > netfilter on the bridge by default and require those specialized users
> > to enable it explicitly via sysctl.
> 
> I heard that distributions ship some file, what's it called...
> something like /etc/sysctl.conf :-)
> 
> Really, if someone thinkgs the default stinks and dists don't like it
> for their users, they can use sysctl.conf to set it how they please.

If upstream agrees the default stinks, then upstream could start making
moves towards rectifying it :-)

I like Patrick's idea of adding it to feature-removal-schedule

> Notwithstanding that changing this default can break working
> setups and scripts.  Yes they can "change", but they were just
> (rightly) using the kernel as it came to them.

Yep, that's a valid concern.

Weighing up the impact on the small number of people who use it versus
the ongoing impact on everyone else, I think it's best to make the
change.

Cheers,
Mark.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy - July 1, 2009, 10:51 a.m.
Mark McLoughlin wrote:
> On Tue, 2009-06-30 at 20:16 -0700, David Miller wrote:
>> From: Mark McLoughlin <markmc@redhat.com>
>> Date: Tue, 30 Jun 2009 17:27:47 +0100
>>
>>> For these reasons, it makes sense to allow distributions to disable
>>> netfilter on the bridge by default and require those specialized users
>>> to enable it explicitly via sysctl.
>> I heard that distributions ship some file, what's it called...
>> something like /etc/sysctl.conf :-)
>>
>> Really, if someone thinkgs the default stinks and dists don't like it
>> for their users, they can use sysctl.conf to set it how they please.
> 
> If upstream agrees the default stinks, then upstream could start making
> moves towards rectifying it :-)
> 
> I like Patrick's idea of adding it to feature-removal-schedule
> 
>> Notwithstanding that changing this default can break working
>> setups and scripts.  Yes they can "change", but they were just
>> (rightly) using the kernel as it came to them.
> 
> Yep, that's a valid concern.
> 
> Weighing up the impact on the small number of people who use it versus
> the ongoing impact on everyone else, I think it's best to make the
> change.

I agree, this has already caused an endless amount of problems for
users due to the unexpected behaviour and, in some cases, bugs.

Dave's point is certainly valid as well, until we change the defaults,
distributions can use sysctl.conf. But I think we should move towards
changing the defaults.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - July 1, 2009, 4:02 p.m.
From: Mark McLoughlin <markmc@redhat.com>
Date: Wed, 01 Jul 2009 11:45:58 +0100

> Weighing up the impact on the small number of people who use it versus
> the ongoing impact on everyone else, I think it's best to make the
> change.

Consider the posibility that the people using it aren't saying
anything because things just-work for them right now.

If you change this default, I guarentee we will hear from them.

"everyone else" is making noise only because the default isn't
what they want.  There is no other reason.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - July 1, 2009, 4:02 p.m.
From: Patrick McHardy <kaber@trash.net>
Date: Wed, 01 Jul 2009 12:51:06 +0200

> I agree, this has already caused an endless amount of problems for
> users due to the unexpected behaviour and, in some cases, bugs.
> 
> Dave's point is certainly valid as well, until we change the defaults,
> distributions can use sysctl.conf. But I think we should move towards
> changing the defaults.

You must do this via something like your suggestion, the
feature-removal-schedule.txt thing.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy - July 1, 2009, 4:05 p.m.
David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Wed, 01 Jul 2009 12:51:06 +0200
> 
>> I agree, this has already caused an endless amount of problems for
>> users due to the unexpected behaviour and, in some cases, bugs.
>>
>> Dave's point is certainly valid as well, until we change the defaults,
>> distributions can use sysctl.conf. But I think we should move towards
>> changing the defaults.
> 
> You must do this via something like your suggestion, the
> feature-removal-schedule.txt thing.

Yes, of course. But I prefer the runtime warning (at least on top),
my feeling is not many people actually read feature-removal-schedule.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - July 1, 2009, 4:08 p.m.
From: Patrick McHardy <kaber@trash.net>
Date: Wed, 01 Jul 2009 18:05:40 +0200

> David Miller wrote:
>> From: Patrick McHardy <kaber@trash.net>
>> Date: Wed, 01 Jul 2009 12:51:06 +0200
>> 
>>> I agree, this has already caused an endless amount of problems for
>>> users due to the unexpected behaviour and, in some cases, bugs.
>>>
>>> Dave's point is certainly valid as well, until we change the defaults,
>>> distributions can use sysctl.conf. But I think we should move towards
>>> changing the defaults.
>> You must do this via something like your suggestion, the
>> feature-removal-schedule.txt thing.
> 
> Yes, of course. But I prefer the runtime warning (at least on top),
> my feeling is not many people actually read feature-removal-schedule.

Ok.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu - July 1, 2009, 4:26 p.m.
On Wed, Jul 01, 2009 at 09:02:02AM -0700, David Miller wrote:
> 
> Consider the posibility that the people using it aren't saying
> anything because things just-work for them right now.
> 
> If you change this default, I guarentee we will hear from them.
> 
> "everyone else" is making noise only because the default isn't
> what they want.  There is no other reason.

FWIW I don't really care what we have as the default for bridge
netfilter.  I just want to make sure that people who do have
bridge netfilter (and in particular, conntrack + bridge) active
on their machines are aware of the security implications.  Otherwise
we'd be negligent.

As you said distros can change the default regardless of what
the kernel does.

Cheers,
Herbert Xu - July 1, 2009, 4:33 p.m.
On Wed, Jul 01, 2009 at 11:21:44AM +0200, Patrick McHardy wrote:
>
> Agreed, at least as long as this is still the default behaviour.
> Mark, could you add this to your patch? br_nf_pre_routing_finish()
> looks like a good place to print a warning when skb->nfct != NULL.

Here's a suggestion:

Can we add another field to the conntrack tuple? This would be used
to ensure that every bridge's conntrack is distinct from each other,
as well as that of the system IP stack.

Cheers,
Patrick McHardy - July 1, 2009, 5:01 p.m.
Herbert Xu wrote:
> On Wed, Jul 01, 2009 at 11:21:44AM +0200, Patrick McHardy wrote:
>> Agreed, at least as long as this is still the default behaviour.
>> Mark, could you add this to your patch? br_nf_pre_routing_finish()
>> looks like a good place to print a warning when skb->nfct != NULL.
> 
> Here's a suggestion:
> 
> Can we add another field to the conntrack tuple? This would be used
> to ensure that every bridge's conntrack is distinct from each other,
> as well as that of the system IP stack.

We would need a key that can be uniquely determined at all points
and that can be inverted (taking into account ebtables NAT, NAT to
a different bridge etc) - I can't think of a suitable one right now.
But besides the conntrack size increase, I don't think this is the
correct solution for this problem.

Defragmentation (before conntrack) would still allow fragments to
cross boundaries, unless we key the defragmentation queues using the
same key. And generally defragmenting bridged packets by default,
possibly passing them through NAT, IP routing etc. is simply wrong
and only (somewhat) works in certain scenarios. Helpers might get
confused when the same packet is flooded to multiple output ports,
IPsec policies might magically get applied, etc etc. The best way
to make people aware of all these implications and avoid unsuspecting
people running into this again and again would be to change the
defaults and have people think before they use this. Long term I
think this needs to be completely redesigned.

And for the record, I don't believe that this is used a lot and we're
just not aware because it simply works. The fact is it always had major
problems that we fixed as good as possible over the years, but I'm
pretty certain I've heard from just about every user of this at least
once :)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Smith - July 1, 2009, 9:18 p.m.
On Wed, 01 Jul 2009 18:05:40 +0200
Patrick McHardy <kaber@trash.net> wrote:

> David Miller wrote:
> > From: Patrick McHardy <kaber@trash.net>
> > Date: Wed, 01 Jul 2009 12:51:06 +0200
> > 
> >> I agree, this has already caused an endless amount of problems for
> >> users due to the unexpected behaviour and, in some cases, bugs.
> >>
> >> Dave's point is certainly valid as well, until we change the defaults,
> >> distributions can use sysctl.conf. But I think we should move towards
> >> changing the defaults.
> > 
> > You must do this via something like your suggestion, the
> > feature-removal-schedule.txt thing.
> 
> Yes, of course. But I prefer the runtime warning (at least on top),
> my feeling is not many people actually read feature-removal-schedule.
> 

Reviewing my earlier email, I realised I didn't say why I liked it.
Yes, I've been burned by it quite a bit, but that was because I wasn't
aware of it.

I do see a lot of value in it for "layer 3 transparent" firewalling.
Adding a firewall to a network can be a bit of an effort as it may
involve changing the networks routing configuration, and consequently
all the things that involves e.g. renumbering hosts, spitting up
subnets or adding new ones. Being able to insert a layer 3
transparent firewalling device between the upstream router and the
downstream hosts would be far, far easier.

With it being able to firewall bridged PPPoE/PPP traffic, potentially
made it even more useful, although in less common cases. For example, I
have a number of devices at home that are themselves running PPPoE/PPP,
rather than having a single upstream router running it. If I wasn't
confident of the firewalling capabilities of each of those devices, I
could insert a layer 3 transparent iptables firewall, and add another
level of firewalling to the PPPoE/PPP encapsulated traffic.

So, I'd certainly like the feature to stay. It just needs to either not
be on by default, or the default made more obvious and a method added
to make it easy to switch off.

Thanks,
Mark.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/net/Kconfig b/net/Kconfig
index 7051b97..b5f4379 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -153,6 +153,18 @@  config BRIDGE_NETFILTER
 
 	  If unsure, say N.
 
+config BRIDGE_NETFILTER_DEFAULT_ON
+	def_bool y
+	prompt "Enable netfilter on the bridge by default"
+	depends on BRIDGE_NETFILTER && SYSCTL
+	---help---
+	  Selecting this option will enable netfilter iptables
+	  etc. rules on bridges by default. This means that netfilter
+	  iptables rules will apply to frames forwarded across the
+	  bridge. If this option is not selected, it can be enabled at
+	  runtime using the net.bridge.bridge-nf-call-*tables sysctl
+	  settings.
+
 source "net/netfilter/Kconfig"
 source "net/ipv4/netfilter/Kconfig"
 source "net/ipv6/netfilter/Kconfig"
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index d22f611..ed53e21 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -55,9 +55,15 @@ 
 
 #ifdef CONFIG_SYSCTL
 static struct ctl_table_header *brnf_sysctl_header;
+#ifdef CONFIG_BRIDGE_NETFILTER_DEFAULT_ON
 static int brnf_call_iptables __read_mostly = 1;
 static int brnf_call_ip6tables __read_mostly = 1;
 static int brnf_call_arptables __read_mostly = 1;
+#else
+static int brnf_call_iptables __read_mostly = 0;
+static int brnf_call_ip6tables __read_mostly = 0;
+static int brnf_call_arptables __read_mostly = 0;
+#endif
 static int brnf_filter_vlan_tagged __read_mostly = 0;
 static int brnf_filter_pppoe_tagged __read_mostly = 0;
 #else