Message ID | 1358889648-1684-1-git-send-email-pshelar@nicira.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, 2013-01-22 at 13:20 -0800, Pravin B Shelar wrote: > Make copy of skb sharable data by linearizing skb. so that > csum remain consistent when skb is actually transmitted. > > Signed-off-by: Pravin B Shelar <pshelar@nicira.com> > --- > net/ipv4/ip_gre.c | 22 ++++++++++++++++++---- > 1 file changed, 18 insertions(+), 4 deletions(-) > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > index 7cdf1fe..20d3d37 100644 > --- a/net/ipv4/ip_gre.c > +++ b/net/ipv4/ip_gre.c > @@ -738,7 +738,7 @@ drop: > static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev) > { > struct ip_tunnel *tunnel = netdev_priv(dev); > - const struct iphdr *old_iph = ip_hdr(skb); > + const struct iphdr *old_iph; > const struct iphdr *tiph; > struct flowi4 fl4; > u8 tos; > @@ -752,9 +752,23 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev > int mtu; > u8 ttl; > > - if (skb->ip_summed == CHECKSUM_PARTIAL && > - skb_checksum_help(skb)) > - goto tx_error; > + if (skb->ip_summed == CHECKSUM_PARTIAL) { > + int err; > + > + /* Pages aren't locked and could change at any time. > + * If this happens after we compute the checksum, the > + * checksum will be wrong. We linearize now to avoid > + * this problem. > + */ > + err = __skb_linearize(skb); > + if (unlikely(err)) > + goto tx_error; > + > + err = skb_checksum_help(skb); > + if (unlikely(err)) > + goto tx_error; > + } > + old_iph = ip_hdr(skb); > > if (dev->type == ARPHRD_ETHER) > IPCB(skb)->flags = 0; This sounds bogus to me. If user cant cope with changes on sent data, just disable GSO on GRE device. An application changing data provided on a sendfile() or vmsplice() cant really expect data integrity being respected, even if checksum are done by the NIC (TSO) So if data integrity is not respected, just send a bogus TX checksum. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 22 Jan 2013 14:03:19 -0800 > An application changing data provided on a sendfile() or vmsplice() cant > really expect data integrity being respected, even if checksum are done > by the NIC (TSO) > > So if data integrity is not respected, just send a bogus TX checksum. I disagree. As a quality of implementation decision, we should always compute correct checksums, ragardless of whether the page contents can be modified asynchronously during the packet transmit. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-01-22 at 17:08 -0500, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 22 Jan 2013 14:03:19 -0800 > > > An application changing data provided on a sendfile() or vmsplice() cant > > really expect data integrity being respected, even if checksum are done > > by the NIC (TSO) > > > > So if data integrity is not respected, just send a bogus TX checksum. > > I disagree. > > As a quality of implementation decision, we should always compute > correct checksums, ragardless of whether the page contents can be > modified asynchronously during the packet transmit. The retransmits will compute the correct checksums. For a very unlikely operation from the user, we want to disable GSO on GRE. If so, just revert my GSO patch. This is plain stupid to cook GSO packets and then linearize them, adding an extra copy and a point of failure, as allocating a 128K skb->head will probably fail very easily. Now, tcp_sendmsg() cooked skb are fine, and this patch assumes they are not. This patch is absolutely wrong. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 22, 2013 at 2:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2013-01-22 at 17:08 -0500, David Miller wrote: >> From: Eric Dumazet <eric.dumazet@gmail.com> >> Date: Tue, 22 Jan 2013 14:03:19 -0800 >> >> > An application changing data provided on a sendfile() or vmsplice() cant >> > really expect data integrity being respected, even if checksum are done >> > by the NIC (TSO) >> > >> > So if data integrity is not respected, just send a bogus TX checksum. >> >> I disagree. >> >> As a quality of implementation decision, we should always compute >> correct checksums, ragardless of whether the page contents can be >> modified asynchronously during the packet transmit. > > The retransmits will compute the correct checksums. > > For a very unlikely operation from the user, we want to disable GSO on > GRE. > > If so, just revert my GSO patch. > > This is plain stupid to cook GSO packets and then linearize them, adding > an extra copy and a point of failure, as allocating a 128K skb->head > will probably fail very easily. > OK. This overhead can be avoided with GRE segmentation offload. I will send that series soon. > Now, tcp_sendmsg() cooked skb are fine, and this patch assumes they are > not. > > This patch is absolutely wrong. > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 22, 2013 at 2:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2013-01-22 at 17:08 -0500, David Miller wrote: >> From: Eric Dumazet <eric.dumazet@gmail.com> >> Date: Tue, 22 Jan 2013 14:03:19 -0800 >> >> > An application changing data provided on a sendfile() or vmsplice() cant >> > really expect data integrity being respected, even if checksum are done >> > by the NIC (TSO) >> > >> > So if data integrity is not respected, just send a bogus TX checksum. >> >> I disagree. >> >> As a quality of implementation decision, we should always compute >> correct checksums, ragardless of whether the page contents can be >> modified asynchronously during the packet transmit. > > The retransmits will compute the correct checksums. > > For a very unlikely operation from the user, we want to disable GSO on > GRE. We're currently enforcing this assumption in the rest of the network stack - it's why we mask out scatter/gather capability in the NIC if it isn't capable of checksumming the packet. Packets with asynchronous changes may come from VMs, so it isn't necessarily reasonable to tell people that they need to disable offloads with certain use cases. As Pravin said, pushing down the GSO to the lowest layer is the best way to solve the problem. However, I would argue that the current behavior is not correct. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-01-22 at 14:38 -0800, Jesse Gross wrote: > > We're currently enforcing this assumption in the rest of the network > stack - it's why we mask out scatter/gather capability in the NIC if > it isn't capable of checksumming the packet. > > Packets with asynchronous changes may come from VMs, so it isn't > necessarily reasonable to tell people that they need to disable > offloads with certain use cases. > > As Pravin said, pushing down the GSO to the lowest layer is the best > way to solve the problem. However, I would argue that the current > behavior is not correct. You do understand this problem is generic to GSO ? You basically are saying GSO should be removed. If you really care, please find another way to address the problem. Frames build by tcp_sendmsg() are fine : their content cannot be changed by the user. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 22 Jan 2013 14:44:13 -0800 > On Tue, 2013-01-22 at 14:38 -0800, Jesse Gross wrote: > >> >> We're currently enforcing this assumption in the rest of the network >> stack - it's why we mask out scatter/gather capability in the NIC if >> it isn't capable of checksumming the packet. >> >> Packets with asynchronous changes may come from VMs, so it isn't >> necessarily reasonable to tell people that they need to disable >> offloads with certain use cases. >> >> As Pravin said, pushing down the GSO to the lowest layer is the best >> way to solve the problem. However, I would argue that the current >> behavior is not correct. > > You do understand this problem is generic to GSO ? > > You basically are saying GSO should be removed. > > If you really care, please find another way to address the problem. > > Frames build by tcp_sendmsg() are fine : their content cannot be changed > by the user. We don't emit crap onto the wire knowingly. Jesse is right. If you want to do software GSO in situations where we know that the paged data cannot be modified asynchronously, you'll have to explicitly support that. I will not accept us saying that allowing the emission of bad checksums is OK. It never is. That's terrible behavior, and creates impossible to disagnose problems. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-01-22 at 19:41 -0500, David Miller wrote: > We don't emit crap onto the wire knowingly. Jesse is right. > > If you want to do software GSO in situations where we know that the > paged data cannot be modified asynchronously, you'll have to > explicitly support that. > > I will not accept us saying that allowing the emission of bad > checksums is OK. It never is. That's terrible behavior, and creates > impossible to disagnose problems. How is this related to GRE only ? We have dozen of skb_checksum_help() calls that basically have the same issue. Am I missing some obvious thing ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 22 Jan 2013 17:08:13 -0800 > On Tue, 2013-01-22 at 19:41 -0500, David Miller wrote: > >> We don't emit crap onto the wire knowingly. Jesse is right. >> >> If you want to do software GSO in situations where we know that the >> paged data cannot be modified asynchronously, you'll have to >> explicitly support that. >> >> I will not accept us saying that allowing the emission of bad >> checksums is OK. It never is. That's terrible behavior, and creates >> impossible to disagnose problems. > > How is this related to GRE only ? > > We have dozen of skb_checksum_help() calls that basically have the same > issue. > > Am I missing some obvious thing ? Then all of them need fixing. We can't emit bogus checksums on the wire when this is completely under our control. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: David Miller <davem@davemloft.net> Date: Tue, 22 Jan 2013 20:46:29 -0500 (EST) > We can't emit bogus checksums on the wire when this is completely > under our control. BTW consider a legitimate use case where an application has a shared memory page between threads that keeps track of statistic counters, and it uses sendfile() over an fd backing that mmap()'d area to send the statistics out remotely over TCP. That's completely legitimate. As is an application constantly overwriting a page in the filesystem page cache, and another app doing sendfile() over TCP from that area. In such a situation everyone of your magic retransmits will have a bad checksum too. You can't say that this behavior is anything other than broken, and I don't want to be responsible for a system that's doing broken things like that. If your GSO over GRE tunnels stuff fundamentally depends upon this magic retransmit fantasy working, then yes we have to disable it completely. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-01-22 at 23:22 -0500, David Miller wrote: > From: David Miller <davem@davemloft.net> > Date: Tue, 22 Jan 2013 20:46:29 -0500 (EST) > > > We can't emit bogus checksums on the wire when this is completely > > under our control. > > BTW consider a legitimate use case where an application has a shared > memory page between threads that keeps track of statistic counters, > and it uses sendfile() over an fd backing that mmap()'d area to > send the statistics out remotely over TCP. > > That's completely legitimate. > > As is an application constantly overwriting a page in the filesystem > page cache, and another app doing sendfile() over TCP from that > area. > > In such a situation everyone of your magic retransmits will have a > bad checksum too. > > You can't say that this behavior is anything other than broken, and > I don't want to be responsible for a system that's doing broken > things like that. > > If your GSO over GRE tunnels stuff fundamentally depends upon this > magic retransmit fantasy working, then yes we have to disable it > completely. > -- But then GSO should be disabled right now, GRE or not. We have a generic problem. GSO was a bad idea from the very beginning, if you want such use case. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 22 Jan 2013 21:02:53 -0800 > GSO was a bad idea from the very beginning, if you want > such use case. It's perfectly fine when the card checksums the packet. Because it will checksum the exact data that will hit the wire and thus the checksum will be correct always. It's only buggy when we do pure software GSO and have page frags. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2013-01-23 at 00:19 -0500, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 22 Jan 2013 21:02:53 -0800 > > > GSO was a bad idea from the very beginning, if you want > > such use case. > > It's perfectly fine when the card checksums the packet. Because it > will checksum the exact data that will hit the wire and thus the > checksum will be correct always. > > It's only buggy when we do pure software GSO and have page frags. Right. We could add a new SKB_GSO_SHARED_FRAG bit for that, and force an extra copy of frags, instead of a linearize call. Basically, splice(pipe -> socket)/sendfile()/vmsplice() should set this flag. Its a bit late here, I'll do that tomorrow. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 22 Jan 2013 21:30:50 -0800 > We could add a new SKB_GSO_SHARED_FRAG bit for that, and force an extra > copy of frags, instead of a linearize call. > > Basically, splice(pipe -> socket)/sendfile()/vmsplice() should set this > flag. Right. > Its a bit late here, I'll do that tomorrow. Thank you. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 7cdf1fe..20d3d37 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -738,7 +738,7 @@ drop: static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev) { struct ip_tunnel *tunnel = netdev_priv(dev); - const struct iphdr *old_iph = ip_hdr(skb); + const struct iphdr *old_iph; const struct iphdr *tiph; struct flowi4 fl4; u8 tos; @@ -752,9 +752,23 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev int mtu; u8 ttl; - if (skb->ip_summed == CHECKSUM_PARTIAL && - skb_checksum_help(skb)) - goto tx_error; + if (skb->ip_summed == CHECKSUM_PARTIAL) { + int err; + + /* Pages aren't locked and could change at any time. + * If this happens after we compute the checksum, the + * checksum will be wrong. We linearize now to avoid + * this problem. + */ + err = __skb_linearize(skb); + if (unlikely(err)) + goto tx_error; + + err = skb_checksum_help(skb); + if (unlikely(err)) + goto tx_error; + } + old_iph = ip_hdr(skb); if (dev->type == ARPHRD_ETHER) IPCB(skb)->flags = 0;
Make copy of skb sharable data by linearizing skb. so that csum remain consistent when skb is actually transmitted. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> --- net/ipv4/ip_gre.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-)