diff mbox

[2/2] IP_GRE: Linearize skb before csum.

Message ID 1358889648-1684-1-git-send-email-pshelar@nicira.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Pravin B Shelar Jan. 22, 2013, 9:20 p.m. UTC
Make copy of skb sharable data by linearizing skb. so that
csum remain consistent when skb is actually transmitted.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/ipv4/ip_gre.c |   22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

Comments

Eric Dumazet Jan. 22, 2013, 10:03 p.m. UTC | #1
On Tue, 2013-01-22 at 13:20 -0800, Pravin B Shelar wrote:
> Make copy of skb sharable data by linearizing skb. so that
> csum remain consistent when skb is actually transmitted.
> 
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
> ---
>  net/ipv4/ip_gre.c |   22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 7cdf1fe..20d3d37 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -738,7 +738,7 @@ drop:
>  static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct ip_tunnel *tunnel = netdev_priv(dev);
> -	const struct iphdr  *old_iph = ip_hdr(skb);
> +	const struct iphdr  *old_iph;
>  	const struct iphdr  *tiph;
>  	struct flowi4 fl4;
>  	u8     tos;
> @@ -752,9 +752,23 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
>  	int    mtu;
>  	u8     ttl;
>  
> -	if (skb->ip_summed == CHECKSUM_PARTIAL &&
> -	    skb_checksum_help(skb))
> -		goto tx_error;
> +	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +		int err;
> +
> +		/* Pages aren't locked and could change at any time.
> +		 * If this happens after we compute the checksum, the
> +		 * checksum will be wrong.  We linearize now to avoid
> +		 * this problem.
> +		 */
> +		err = __skb_linearize(skb);
> +		if (unlikely(err))
> +			goto tx_error;
> +
> +		err = skb_checksum_help(skb);
> +		if (unlikely(err))
> +			goto tx_error;
> +	}
> +	old_iph = ip_hdr(skb);
>  
>  	if (dev->type == ARPHRD_ETHER)
>  		IPCB(skb)->flags = 0;

This sounds bogus to me.

If user cant cope with changes on sent data, just disable GSO on GRE
device.

An application changing data provided on a sendfile() or vmsplice() cant
really expect data integrity being respected, even if checksum are done
by the NIC (TSO)

So if data integrity is not respected, just send a bogus TX checksum.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 22, 2013, 10:08 p.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Jan 2013 14:03:19 -0800

> An application changing data provided on a sendfile() or vmsplice() cant
> really expect data integrity being respected, even if checksum are done
> by the NIC (TSO)
> 
> So if data integrity is not respected, just send a bogus TX checksum.

I disagree.

As a quality of implementation decision, we should always compute
correct checksums, ragardless of whether the page contents can be
modified asynchronously during the packet transmit.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 22, 2013, 10:15 p.m. UTC | #3
On Tue, 2013-01-22 at 17:08 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 22 Jan 2013 14:03:19 -0800
> 
> > An application changing data provided on a sendfile() or vmsplice() cant
> > really expect data integrity being respected, even if checksum are done
> > by the NIC (TSO)
> > 
> > So if data integrity is not respected, just send a bogus TX checksum.
> 
> I disagree.
> 
> As a quality of implementation decision, we should always compute
> correct checksums, ragardless of whether the page contents can be
> modified asynchronously during the packet transmit.

The retransmits will compute the correct checksums.

For a very unlikely operation from the user, we want to disable GSO on
GRE.

If so, just revert my GSO patch.

This is plain stupid to cook GSO packets and then linearize them, adding
an extra copy and a point of failure, as allocating a 128K skb->head
will probably fail very easily.

Now, tcp_sendmsg() cooked skb are fine, and this patch assumes they are
not.

This patch is absolutely wrong.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pravin B Shelar Jan. 22, 2013, 10:30 p.m. UTC | #4
On Tue, Jan 22, 2013 at 2:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2013-01-22 at 17:08 -0500, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Tue, 22 Jan 2013 14:03:19 -0800
>>
>> > An application changing data provided on a sendfile() or vmsplice() cant
>> > really expect data integrity being respected, even if checksum are done
>> > by the NIC (TSO)
>> >
>> > So if data integrity is not respected, just send a bogus TX checksum.
>>
>> I disagree.
>>
>> As a quality of implementation decision, we should always compute
>> correct checksums, ragardless of whether the page contents can be
>> modified asynchronously during the packet transmit.
>
> The retransmits will compute the correct checksums.
>
> For a very unlikely operation from the user, we want to disable GSO on
> GRE.
>
> If so, just revert my GSO patch.
>
> This is plain stupid to cook GSO packets and then linearize them, adding
> an extra copy and a point of failure, as allocating a 128K skb->head
> will probably fail very easily.
>
OK.
This overhead can be avoided with GRE segmentation offload. I will
send that series soon.

> Now, tcp_sendmsg() cooked skb are fine, and this patch assumes they are
> not.
>
> This patch is absolutely wrong.
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Gross Jan. 22, 2013, 10:38 p.m. UTC | #5
On Tue, Jan 22, 2013 at 2:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2013-01-22 at 17:08 -0500, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Tue, 22 Jan 2013 14:03:19 -0800
>>
>> > An application changing data provided on a sendfile() or vmsplice() cant
>> > really expect data integrity being respected, even if checksum are done
>> > by the NIC (TSO)
>> >
>> > So if data integrity is not respected, just send a bogus TX checksum.
>>
>> I disagree.
>>
>> As a quality of implementation decision, we should always compute
>> correct checksums, ragardless of whether the page contents can be
>> modified asynchronously during the packet transmit.
>
> The retransmits will compute the correct checksums.
>
> For a very unlikely operation from the user, we want to disable GSO on
> GRE.

We're currently enforcing this assumption in the rest of the network
stack - it's why we mask out scatter/gather capability in the NIC if
it isn't capable of checksumming the packet.

Packets with asynchronous changes may come from VMs, so it isn't
necessarily reasonable to tell people that they need to disable
offloads with certain use cases.

As Pravin said, pushing down the GSO to the lowest layer is the best
way to solve the problem.  However, I would argue that the current
behavior is not correct.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 22, 2013, 10:44 p.m. UTC | #6
On Tue, 2013-01-22 at 14:38 -0800, Jesse Gross wrote:

> 
> We're currently enforcing this assumption in the rest of the network
> stack - it's why we mask out scatter/gather capability in the NIC if
> it isn't capable of checksumming the packet.
> 
> Packets with asynchronous changes may come from VMs, so it isn't
> necessarily reasonable to tell people that they need to disable
> offloads with certain use cases.
> 
> As Pravin said, pushing down the GSO to the lowest layer is the best
> way to solve the problem.  However, I would argue that the current
> behavior is not correct.

You do understand this problem is generic to GSO ?

You basically are saying GSO should be removed.

If you really care, please find another way to address the problem.

Frames build by tcp_sendmsg() are fine : their content cannot be changed
by the user.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 23, 2013, 12:41 a.m. UTC | #7
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Jan 2013 14:44:13 -0800

> On Tue, 2013-01-22 at 14:38 -0800, Jesse Gross wrote:
> 
>> 
>> We're currently enforcing this assumption in the rest of the network
>> stack - it's why we mask out scatter/gather capability in the NIC if
>> it isn't capable of checksumming the packet.
>> 
>> Packets with asynchronous changes may come from VMs, so it isn't
>> necessarily reasonable to tell people that they need to disable
>> offloads with certain use cases.
>> 
>> As Pravin said, pushing down the GSO to the lowest layer is the best
>> way to solve the problem.  However, I would argue that the current
>> behavior is not correct.
> 
> You do understand this problem is generic to GSO ?
> 
> You basically are saying GSO should be removed.
> 
> If you really care, please find another way to address the problem.
> 
> Frames build by tcp_sendmsg() are fine : their content cannot be changed
> by the user.

We don't emit crap onto the wire knowingly.  Jesse is right.

If you want to do software GSO in situations where we know that the
paged data cannot be modified asynchronously, you'll have to
explicitly support that.

I will not accept us saying that allowing the emission of bad
checksums is OK.  It never is.  That's terrible behavior, and creates
impossible to disagnose problems.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 23, 2013, 1:08 a.m. UTC | #8
On Tue, 2013-01-22 at 19:41 -0500, David Miller wrote:

> We don't emit crap onto the wire knowingly.  Jesse is right.
> 
> If you want to do software GSO in situations where we know that the
> paged data cannot be modified asynchronously, you'll have to
> explicitly support that.
> 
> I will not accept us saying that allowing the emission of bad
> checksums is OK.  It never is.  That's terrible behavior, and creates
> impossible to disagnose problems.

How is this related to GRE only ?

We have dozen of skb_checksum_help() calls that basically have the same
issue.

Am I missing some obvious thing ?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 23, 2013, 1:46 a.m. UTC | #9
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Jan 2013 17:08:13 -0800

> On Tue, 2013-01-22 at 19:41 -0500, David Miller wrote:
> 
>> We don't emit crap onto the wire knowingly.  Jesse is right.
>> 
>> If you want to do software GSO in situations where we know that the
>> paged data cannot be modified asynchronously, you'll have to
>> explicitly support that.
>> 
>> I will not accept us saying that allowing the emission of bad
>> checksums is OK.  It never is.  That's terrible behavior, and creates
>> impossible to disagnose problems.
> 
> How is this related to GRE only ?
> 
> We have dozen of skb_checksum_help() calls that basically have the same
> issue.
> 
> Am I missing some obvious thing ?

Then all of them need fixing.

We can't emit bogus checksums on the wire when this is completely
under our control.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 23, 2013, 4:22 a.m. UTC | #10
From: David Miller <davem@davemloft.net>
Date: Tue, 22 Jan 2013 20:46:29 -0500 (EST)

> We can't emit bogus checksums on the wire when this is completely
> under our control.

BTW consider a legitimate use case where an application has a shared
memory page between threads that keeps track of statistic counters,
and it uses sendfile() over an fd backing that mmap()'d area to
send the statistics out remotely over TCP.

That's completely legitimate.

As is an application constantly overwriting a page in the filesystem
page cache, and another app doing sendfile() over TCP from that
area.

In such a situation everyone of your magic retransmits will have a
bad checksum too.

You can't say that this behavior is anything other than broken, and
I don't want to be responsible for a system that's doing broken
things like that.

If your GSO over GRE tunnels stuff fundamentally depends upon this
magic retransmit fantasy working, then yes we have to disable it
completely.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 23, 2013, 5:02 a.m. UTC | #11
On Tue, 2013-01-22 at 23:22 -0500, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 22 Jan 2013 20:46:29 -0500 (EST)
> 
> > We can't emit bogus checksums on the wire when this is completely
> > under our control.
> 
> BTW consider a legitimate use case where an application has a shared
> memory page between threads that keeps track of statistic counters,
> and it uses sendfile() over an fd backing that mmap()'d area to
> send the statistics out remotely over TCP.
> 
> That's completely legitimate.
> 
> As is an application constantly overwriting a page in the filesystem
> page cache, and another app doing sendfile() over TCP from that
> area.
> 
> In such a situation everyone of your magic retransmits will have a
> bad checksum too.
> 
> You can't say that this behavior is anything other than broken, and
> I don't want to be responsible for a system that's doing broken
> things like that.
> 
> If your GSO over GRE tunnels stuff fundamentally depends upon this
> magic retransmit fantasy working, then yes we have to disable it
> completely.
> --

But then GSO should be disabled right now, GRE or not.

We have a generic problem.

GSO was a bad idea from the very beginning, if you want
such use case. 



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 23, 2013, 5:19 a.m. UTC | #12
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Jan 2013 21:02:53 -0800

> GSO was a bad idea from the very beginning, if you want
> such use case. 

It's perfectly fine when the card checksums the packet.  Because it
will checksum the exact data that will hit the wire and thus the
checksum will be correct always.

It's only buggy when we do pure software GSO and have page frags.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Jan. 23, 2013, 5:30 a.m. UTC | #13
On Wed, 2013-01-23 at 00:19 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 22 Jan 2013 21:02:53 -0800
> 
> > GSO was a bad idea from the very beginning, if you want
> > such use case. 
> 
> It's perfectly fine when the card checksums the packet.  Because it
> will checksum the exact data that will hit the wire and thus the
> checksum will be correct always.
> 
> It's only buggy when we do pure software GSO and have page frags.

Right.

We could add a new SKB_GSO_SHARED_FRAG bit for that, and force an extra
copy of frags, instead of a linearize call.

Basically, splice(pipe -> socket)/sendfile()/vmsplice() should set this
flag.

Its a bit late here, I'll do that tomorrow.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 23, 2013, 5:36 a.m. UTC | #14
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Jan 2013 21:30:50 -0800

> We could add a new SKB_GSO_SHARED_FRAG bit for that, and force an extra
> copy of frags, instead of a linearize call.
> 
> Basically, splice(pipe -> socket)/sendfile()/vmsplice() should set this
> flag.

Right.

> Its a bit late here, I'll do that tomorrow.

Thank you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 7cdf1fe..20d3d37 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -738,7 +738,7 @@  drop:
 static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
-	const struct iphdr  *old_iph = ip_hdr(skb);
+	const struct iphdr  *old_iph;
 	const struct iphdr  *tiph;
 	struct flowi4 fl4;
 	u8     tos;
@@ -752,9 +752,23 @@  static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 	int    mtu;
 	u8     ttl;
 
-	if (skb->ip_summed == CHECKSUM_PARTIAL &&
-	    skb_checksum_help(skb))
-		goto tx_error;
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		int err;
+
+		/* Pages aren't locked and could change at any time.
+		 * If this happens after we compute the checksum, the
+		 * checksum will be wrong.  We linearize now to avoid
+		 * this problem.
+		 */
+		err = __skb_linearize(skb);
+		if (unlikely(err))
+			goto tx_error;
+
+		err = skb_checksum_help(skb);
+		if (unlikely(err))
+			goto tx_error;
+	}
+	old_iph = ip_hdr(skb);
 
 	if (dev->type == ARPHRD_ETHER)
 		IPCB(skb)->flags = 0;