Message ID | 20190730122534.30687-1-rdong.ge@gmail.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | bridge:fragmented packets dropped by bridge | expand |
Rundong Ge <rdong.ge@gmail.com> wrote: > Given following setup: > -modprobe br_netfilter > -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables > -brctl addbr br0 > -brctl addif br0 enp2s0 > -brctl addif br0 enp3s0 > -brctl addif br0 enp6s0 > -ifconfig enp2s0 mtu 1300 > -ifconfig enp3s0 mtu 1500 > -ifconfig enp6s0 mtu 1500 > -ifconfig br0 up > > multi-port > mtu1500 - mtu1500|bridge|1500 - mtu1500 > A | B > mtu1300 How can a bridge forward a frame from A/B to mtu1300? > With netfilter defragmentation/conntrack enabled, fragmented > packets from A will be defragmented in prerouting, and refragmented > at postrouting. Yes, but I don't see how that relates to the problem at hand. > But in this scenario the bridge found the frag_max_size(1500) is > larger than the dst mtu stored in the fake_rtable whitch is > always equal to the bridge's mtu 1300, then packets will be dopped. What happens without netfilter or non-fragmented packets? > This modifies ip_skb_dst_mtu to use the out dev's mtu instead > of bridge's mtu in bridge refragment. It seems quite a hack? The above setup should use a router, not a bridge.
On 30/07/2019 15:25, Rundong Ge wrote: > Given following setup: > -modprobe br_netfilter > -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables > -brctl addbr br0 > -brctl addif br0 enp2s0 > -brctl addif br0 enp3s0 > -brctl addif br0 enp6s0 > -ifconfig enp2s0 mtu 1300 > -ifconfig enp3s0 mtu 1500 > -ifconfig enp6s0 mtu 1500 > -ifconfig br0 up > > multi-port > mtu1500 - mtu1500|bridge|1500 - mtu1500 > A | B > mtu1300 > > With netfilter defragmentation/conntrack enabled, fragmented > packets from A will be defragmented in prerouting, and refragmented > at postrouting. > But in this scenario the bridge found the frag_max_size(1500) is > larger than the dst mtu stored in the fake_rtable whitch is > always equal to the bridge's mtu 1300, then packets will be dopped. > > This modifies ip_skb_dst_mtu to use the out dev's mtu instead > of bridge's mtu in bridge refragment. > > Signed-off-by: Rundong Ge <rdong.ge@gmail.com> > --- > include/net/ip.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/include/net/ip.h b/include/net/ip.h > index 29d89de..0512de3 100644 > --- a/include/net/ip.h > +++ b/include/net/ip.h > @@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst, > static inline unsigned int ip_skb_dst_mtu(struct sock *sk, > const struct sk_buff *skb) > { > + if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev) > + return min(skb->dev->mtu, IP_MAX_MTU); > if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) { > bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED; > > I don't think this is correct, there's a reason why the bridge chooses the smallest possible MTU out of its members and this is simply a hack to circumvent it. If you really like to do so just set the bridge MTU manually, we've added support so it won't change automatically to the smallest, but then how do you pass packets 1500 -> 1300 in this setup ? You're talking about the frag_size check in br_nf_ip_fragment(), right ?
> How can a bridge forward a frame from A/B to mtu1300? It is free for user to set different MTU for bridge ports. In our case only tcp traffic between A/B and mtu 1300, and mss negotiation can make packets less than 1300. > What happens without netfilter or non-fragmented packets? Without br_netfilter it works fine, there is no defragmentation and refragmentation, fragmented packets will egress directly. Florian Westphal <fw@strlen.de> 于2019年7月30日周二 下午8:35写道: > > Rundong Ge <rdong.ge@gmail.com> wrote: > > Given following setup: > > -modprobe br_netfilter > > -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables > > -brctl addbr br0 > > -brctl addif br0 enp2s0 > > -brctl addif br0 enp3s0 > > -brctl addif br0 enp6s0 > > -ifconfig enp2s0 mtu 1300 > > -ifconfig enp3s0 mtu 1500 > > -ifconfig enp6s0 mtu 1500 > > -ifconfig br0 up > > > > multi-port > > mtu1500 - mtu1500|bridge|1500 - mtu1500 > > A | B > > mtu1300 > > How can a bridge forward a frame from A/B to mtu1300? > > > With netfilter defragmentation/conntrack enabled, fragmented > > packets from A will be defragmented in prerouting, and refragmented > > at postrouting. > > Yes, but I don't see how that relates to the problem at hand. > > > But in this scenario the bridge found the frag_max_size(1500) is > > larger than the dst mtu stored in the fake_rtable whitch is > > always equal to the bridge's mtu 1300, then packets will be dopped. > > What happens without netfilter or non-fragmented packets? > > > This modifies ip_skb_dst_mtu to use the out dev's mtu instead > > of bridge's mtu in bridge refragment. > > It seems quite a hack? The above setup should use a router, not a bridge.
Yes it is about the frag_size check in br_nf_ip_fragment(). As i said without br_netfilter the packets forwarding is fine. And I feel it is weird that br_nf_dev_queue_xmit() use out dev's mtu to decide whether to do the fragmentation, but then br_nf_ip_fragment() use bridge's mtu to do the actual fragmentation. And in this case fragmented packets fit the out dev mtu but were dropped, I think it is not right. Nikolay Aleksandrov <nikolay@cumulusnetworks.com> 于2019年7月30日周二 下午8:41写道: > > On 30/07/2019 15:25, Rundong Ge wrote: > > Given following setup: > > -modprobe br_netfilter > > -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables > > -brctl addbr br0 > > -brctl addif br0 enp2s0 > > -brctl addif br0 enp3s0 > > -brctl addif br0 enp6s0 > > -ifconfig enp2s0 mtu 1300 > > -ifconfig enp3s0 mtu 1500 > > -ifconfig enp6s0 mtu 1500 > > -ifconfig br0 up > > > > multi-port > > mtu1500 - mtu1500|bridge|1500 - mtu1500 > > A | B > > mtu1300 > > > > With netfilter defragmentation/conntrack enabled, fragmented > > packets from A will be defragmented in prerouting, and refragmented > > at postrouting. > > But in this scenario the bridge found the frag_max_size(1500) is > > larger than the dst mtu stored in the fake_rtable whitch is > > always equal to the bridge's mtu 1300, then packets will be dopped. > > > > This modifies ip_skb_dst_mtu to use the out dev's mtu instead > > of bridge's mtu in bridge refragment. > > > > Signed-off-by: Rundong Ge <rdong.ge@gmail.com> > > --- > > include/net/ip.h | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/include/net/ip.h b/include/net/ip.h > > index 29d89de..0512de3 100644 > > --- a/include/net/ip.h > > +++ b/include/net/ip.h > > @@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst, > > static inline unsigned int ip_skb_dst_mtu(struct sock *sk, > > const struct sk_buff *skb) > > { > > + if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev) > > + return min(skb->dev->mtu, IP_MAX_MTU); > > if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) { > > bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED; > > > > > > I don't think this is correct, there's a reason why the bridge chooses the smallest > possible MTU out of its members and this is simply a hack to circumvent it. > If you really like to do so just set the bridge MTU manually, we've added support > so it won't change automatically to the smallest, but then how do you pass packets > 1500 -> 1300 in this setup ? > > You're talking about the frag_size check in br_nf_ip_fragment(), right ? >
On Tue, Jul 30, 2019 at 8:41 PM Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote: > > On 30/07/2019 15:25, Rundong Ge wrote: > > Given following setup: > > -modprobe br_netfilter > > -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables > > -brctl addbr br0 > > -brctl addif br0 enp2s0 > > -brctl addif br0 enp3s0 > > -brctl addif br0 enp6s0 > > -ifconfig enp2s0 mtu 1300 > > -ifconfig enp3s0 mtu 1500 > > -ifconfig enp6s0 mtu 1500 > > -ifconfig br0 up > > > > multi-port > > mtu1500 - mtu1500|bridge|1500 - mtu1500 > > A | B > > mtu1300 > > > > With netfilter defragmentation/conntrack enabled, fragmented > > packets from A will be defragmented in prerouting, and refragmented > > at postrouting. > > But in this scenario the bridge found the frag_max_size(1500) is > > larger than the dst mtu stored in the fake_rtable whitch is > > always equal to the bridge's mtu 1300, then packets will be dopped. > > > > This modifies ip_skb_dst_mtu to use the out dev's mtu instead > > of bridge's mtu in bridge refragment. > > > > Signed-off-by: Rundong Ge <rdong.ge@gmail.com> > > --- > > include/net/ip.h | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/include/net/ip.h b/include/net/ip.h > > index 29d89de..0512de3 100644 > > --- a/include/net/ip.h > > +++ b/include/net/ip.h > > @@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst, > > static inline unsigned int ip_skb_dst_mtu(struct sock *sk, > > const struct sk_buff *skb) > > { > > + if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev) > > + return min(skb->dev->mtu, IP_MAX_MTU); > > if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) { > > bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED; > > > > > > I don't think this is correct, there's a reason why the bridge chooses the smallest > possible MTU out of its members and this is simply a hack to circumvent it. > If you really like to do so just set the bridge MTU manually, we've added support > so it won't change automatically to the smallest, but then how do you pass packets > 1500 -> 1300 in this setup ? > > You're talking about the frag_size check in br_nf_ip_fragment(), right ? > Hi Nikolay My setup may not be common. And may I know if there is any reason to use output port's MTU to do the re-fragment check but then use the bridge's MTU to do the re-fragment? Is it the expected behavior that the bridge's MTU will affect the FORWARD traffic re-fragment, because I used to think the bridge's MTU will only effect the OUTPUT traffic sent from "br0". And the modification in this patch will replace the MTU in the fake_rtable which is only used in the FORWARD re-fragment and won't affect the local traffic from "br0". TKS Raydodn
On Tuesday 2019-07-30 14:35, Florian Westphal wrote: >Rundong Ge <rdong.ge@gmail.com> wrote: >> Given following setup: >> -modprobe br_netfilter >> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables >> -brctl addbr br0 >> -brctl addif br0 enp2s0 >> -brctl addif br0 enp3s0 >> -brctl addif br0 enp6s0 >> -ifconfig enp2s0 mtu 1300 >> -ifconfig enp3s0 mtu 1500 >> -ifconfig enp6s0 mtu 1500 >> -ifconfig br0 up >> >> multi-port >> mtu1500 - mtu1500|bridge|1500 - mtu1500 >> A | B >> mtu1300 > >How can a bridge forward a frame from A/B to mtu1300? There might be a misunderstanding here judging from the shortness of this thread. I understood it such that the bridge ports (eth0,eth1) have MTU 1500, yet br0 (in essence the third bridge port if you so wish) itself has MTU 1300. Therefore, frame forwarding from eth0 to eth1 should succeed, since the 1300-byte MTU is only relevant if the bridge decides the packet needs to be locally delivered.
Jan Engelhardt <jengelh@inai.de> 于2019年8月26日周一 下午3:59写道: > > > On Tuesday 2019-07-30 14:35, Florian Westphal wrote: > >Rundong Ge <rdong.ge@gmail.com> wrote: > >> Given following setup: > >> -modprobe br_netfilter > >> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables > >> -brctl addbr br0 > >> -brctl addif br0 enp2s0 > >> -brctl addif br0 enp3s0 > >> -brctl addif br0 enp6s0 > >> -ifconfig enp2s0 mtu 1300 > >> -ifconfig enp3s0 mtu 1500 > >> -ifconfig enp6s0 mtu 1500 > >> -ifconfig br0 up > >> > >> multi-port > >> mtu1500 - mtu1500|bridge|1500 - mtu1500 > >> A | B > >> mtu1300 > > > >How can a bridge forward a frame from A/B to mtu1300? > > There might be a misunderstanding here judging from the shortness of this > thread. > > I understood it such that the bridge ports (eth0,eth1) have MTU 1500, yet br0 > (in essence the third bridge port if you so wish) itself has MTU 1300. > > Therefore, frame forwarding from eth0 to eth1 should succeed, since the > 1300-byte MTU is only relevant if the bridge decides the packet needs to be > locally delivered. Under this setup when I do "ping B -l 2000" from A, the fragmented packets will be dropped by bridge. When the "/proc/sys/net/bridge/bridge-nf-call-iptables" is on, bridge will do defragment at PREROUTING and re-fragment at POSTROUTING. At the re-fragment bridge will check if the max frag size is larger than the bridge's MTU in br_nf_ip_fragment(), if it is true packets will be dropped. And this patch use the outdev's MTU instead of the bridge's MTU to do the br_nf_ip_fragment.
diff --git a/include/net/ip.h b/include/net/ip.h index 29d89de..0512de3 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst, static inline unsigned int ip_skb_dst_mtu(struct sock *sk, const struct sk_buff *skb) { + if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev) + return min(skb->dev->mtu, IP_MAX_MTU); if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) { bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
Given following setup: -modprobe br_netfilter -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables -brctl addbr br0 -brctl addif br0 enp2s0 -brctl addif br0 enp3s0 -brctl addif br0 enp6s0 -ifconfig enp2s0 mtu 1300 -ifconfig enp3s0 mtu 1500 -ifconfig enp6s0 mtu 1500 -ifconfig br0 up multi-port mtu1500 - mtu1500|bridge|1500 - mtu1500 A | B mtu1300 With netfilter defragmentation/conntrack enabled, fragmented packets from A will be defragmented in prerouting, and refragmented at postrouting. But in this scenario the bridge found the frag_max_size(1500) is larger than the dst mtu stored in the fake_rtable whitch is always equal to the bridge's mtu 1300, then packets will be dopped. This modifies ip_skb_dst_mtu to use the out dev's mtu instead of bridge's mtu in bridge refragment. Signed-off-by: Rundong Ge <rdong.ge@gmail.com> --- include/net/ip.h | 2 ++ 1 file changed, 2 insertions(+)