
[3.2,085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

Message ID 572155F4.10405@candelatech.com
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Ben Greear April 28, 2016, 12:14 a.m. UTC
On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
> Hi Ben,
>
> On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
>> On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
>>> On 04/26/2016 04:02 PM, Ben Hutchings wrote:
>>>>
>>>> 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
>>> I would be careful about this.  It causes regressions when sending
>>> PACKET_SOCKET buffers from user-space to veth devices.
>>>
>>> There was a proposed upstream fix for the regression, but it has not gone
>>> into the tree as far as I know.
>>>
>>> http://www.spinics.net/lists/netdev/msg370436.html
>> [...]
>>
>> OK, I'll drop this for now.
>
> The fall out from not having this patch is in my opinion a bigger
> fallout than not having this patch. This patch fixes silent data
> corruption vs. the problem Ben Greear is talking about, which might not
> be that a common usage.
>
> What do others think?
>
> Bye,
> Hannes
>

This patch from Cong Wang seems to fix the regression for me. I think it should be added and
tested in the main tree, and then both patches should go to stable as a pair.

http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a




Thanks,
Ben

Comments

Sabrina Dubroca April 28, 2016, 10:29 a.m. UTC | #1
Hello,

2016-04-27, 17:14:44 -0700, Ben Greear wrote:
> On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
> > Hi Ben,
> > 
> > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
> > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > > > > 
> > > > > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
> > > > I would be careful about this.  It causes regressions when sending
> > > > PACKET_SOCKET buffers from user-space to veth devices.
> > > > 
> > > > There was a proposed upstream fix for the regression, but it has not gone
> > > > into the tree as far as I know.
> > > > 
> > > > http://www.spinics.net/lists/netdev/msg370436.html
> > > [...]
> > > 
> > > OK, I'll drop this for now.
> > 
> > The fall out from not having this patch is in my opinion a bigger
> > fallout than not having this patch. This patch fixes silent data
> > corruption vs. the problem Ben Greear is talking about, which might not
> > be that a common usage.
> > 
> > What do others think?
> > 
> > Bye,
> > Hannes
> > 
> 
> This patch from Cong Wang seems to fix the regression for me, I think it should be added and
> tested in the main tree, and then apply them to stable as a pair.
> 
> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a

Actually, no, this is not really a regression.

If you capture packets on a device with checksum offloading enabled,
the TCP/UDP checksum isn't filled.  veth also behaves that way.  What
the "veth: don't modify ip_summed" patch does is enable proper
checksum validation on veth.  This really was a bug in veth.

Cong's patch would also break cases where we choose to inject packets
with invalid checksums, and they would now be accepted as correct.

Your use case is invalid; it just happened to work because of a
bug.  If you want the stack to fill in checksums so that you can capture
and reinject packets, you have to disable checksum offloading (or
compute the checksum yourself in userspace).
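For concreteness, the userspace computation is just the RFC 1071 one's-complement
sum; a minimal C sketch (assuming the caller accumulates the TCP/UDP pseudo-header
into 'sum' and stores the result into the packet with htons()):

#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over 'len' bytes, folded to 16 bits.
 * 'sum' lets the caller pre-accumulate the TCP/UDP pseudo-header. */
static uint16_t inet_csum(const void *buf, size_t len, uint32_t sum)
{
        const uint8_t *p = buf;

        while (len > 1) {
                sum += ((uint32_t)p[0] << 8) | p[1];
                p += 2;
                len -= 2;
        }
        if (len)                        /* odd trailing byte */
                sum += (uint32_t)p[0] << 8;
        while (sum >> 16)               /* fold the carries back in */
                sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)~sum;
}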

Thanks.
Ben Greear April 28, 2016, 1:45 p.m. UTC | #2
On 04/28/2016 03:29 AM, Sabrina Dubroca wrote:
> Hello,
>
> 2016-04-27, 17:14:44 -0700, Ben Greear wrote:
>> On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
>>> Hi Ben,
>>>
>>> On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
>>>> On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
>>>>> On 04/26/2016 04:02 PM, Ben Hutchings wrote:
>>>>>>
>>>>>> 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
>>>>> I would be careful about this.  It causes regressions when sending
>>>>> PACKET_SOCKET buffers from user-space to veth devices.
>>>>>
>>>>> There was a proposed upstream fix for the regression, but it has not gone
>>>>> into the tree as far as I know.
>>>>>
>>>>> http://www.spinics.net/lists/netdev/msg370436.html
>>>> [...]
>>>>
>>>> OK, I'll drop this for now.
>>>
>>> The fall out from not having this patch is in my opinion a bigger
>>> fallout than not having this patch. This patch fixes silent data
>>> corruption vs. the problem Ben Greear is talking about, which might not
>>> be that a common usage.
>>>
>>> What do others think?
>>>
>>> Bye,
>>> Hannes
>>>
>>
>> This patch from Cong Wang seems to fix the regression for me, I think it should be added and
>> tested in the main tree, and then apply them to stable as a pair.
>>
>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>
> Actually, no, this is not really a regression.
>
> If you capture packets on a device with checksum offloading enabled,
> the TCP/UDP checksum isn't filled.  veth also behaves that way.  What
> the "veth: don't modify ip_summed" patch does is enable proper
> checksum validation on veth.  This really was a bug in veth.
>
> Cong's patch would also break cases where we choose to inject packets
> with invalid checksums, and they would now be accepted as correct.
>
> Your use case is invalid, it just happened to work because of a
> bug.  If you want the stack to fill checksums so that you want capture
> and reinject packets, you have to disable checksum offloading (or
> compute the checksum yourself in userspace).

Disabling checksum offloading or computing in user-space (and then
recomputing in veth to verify the checksum?) is a huge performance loss.

Maybe we could add a socket option to enable Cong's patch on a per-socket
basis?  That way my use-case can still work and you can have this new
behaviour by default?

Thanks,
Ben
Ben Hutchings April 30, 2016, 6:33 p.m. UTC | #3
On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
> Hello,
> 
> 2016-04-27, 17:14:44 -0700, Ben Greear wrote:
> > 
> > On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
> > > 
> > > Hi Ben,
> > > 
> > > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
> > > > 
> > > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> > > > > 
> > > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > > > > > 
> > > > > > 
> > > > > > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
> > > > > I would be careful about this.  It causes regressions when sending
> > > > > PACKET_SOCKET buffers from user-space to veth devices.
> > > > > 
> > > > > There was a proposed upstream fix for the regression, but it has not gone
> > > > > into the tree as far as I know.
> > > > > 
> > > > > http://www.spinics.net/lists/netdev/msg370436.html
> > > > [...]
> > > > 
> > > > OK, I'll drop this for now.
> > > The fall out from not having this patch is in my opinion a bigger
> > > fallout than not having this patch. This patch fixes silent data
> > > corruption vs. the problem Ben Greear is talking about, which might not
> > > be that a common usage.
> > > 
> > > What do others think?
> > > 
> > > Bye,
> > > Hannes
> > > 
> > This patch from Cong Wang seems to fix the regression for me, I think it should be added and
> > tested in the main tree, and then apply them to stable as a pair.
> > 
> > http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
> Actually, no, this is not really a regression.
[...]

It really is.  Even though the old behaviour was a bug (raw packets
should not be changed), if there are real applications that depend on
that then we have to keep those applications working somehow.

Ben.
Ben Hutchings April 30, 2016, 7:18 p.m. UTC | #4
On Thu, 2016-04-28 at 06:45 -0700, Ben Greear wrote:
> 
> On 04/28/2016 03:29 AM, Sabrina Dubroca wrote:
[...]
> > Your use case is invalid, it just happened to work because of a
> > bug.  If you want the stack to fill checksums so that you want capture
> > and reinject packets, you have to disable checksum offloading (or
> > compute the checksum yourself in userspace).
> Disabling checksum offloading or computing in user-space (and then
> recomputing in veth to verify the checksum?) is a huge performance loss.
> 
> Maybe we could add a socket option to enable Cong's patch on a per-socket
> basis?  That way my use-case can still work and you can have this new
> behaviour by default?

It does sound like a useful option to have.  If there are other
applications that depend on veth's checksum-fixing behaviour and are
being distributed in binary form, then a per-device option might be
necessary so users can keep those applications working before they're
updated.

Ben.
Ben Greear April 30, 2016, 7:40 p.m. UTC | #5
On 04/30/2016 11:33 AM, Ben Hutchings wrote:
> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>> Hello,

>>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>> Actually, no, this is not really a regression.
> [...]
>
> It really is.  Even though the old behaviour was a bug (raw packets
> should not be changed), if there are real applications that depend on
> that then we have to keep those applications working somehow.

To be honest, I fail to see why the old behaviour is a bug when sending
raw packets from user-space.  If raw packets should not be changed, then
we need some way to specify what the checksum setting is to begin with;
otherwise, user-space does not have enough control.

A socket option for new programs, and sysctl-configurable defaults for raw sockets
for old binary programs, would be sufficient, I think.

Thanks,
Ben
Tom Herbert April 30, 2016, 7:54 p.m. UTC | #6
We've put considerable effort into cleaning up the checksum interface
to make it as unambiguous as possible, please be very careful to
follow it. Broken checksum processing is really hard to detect and
debug.

CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
(indicated by csum_level) have been verified to be correct in a
packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
checksum it would refer to has not been verified and is incorrect this
is a major bug.
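As a rough illustration of that rule, a hypothetical driver receive-path
fragment (not code from this thread; the hardware-status flag is an assumed
input):

#include <linux/skbuff.h>

/* Hypothetical sketch: a receive path may only claim CHECKSUM_UNNECESSARY
 * for a checksum its hardware actually verified; otherwise it leaves the
 * packet for the stack to verify in software. */
static void rx_mark_csum(struct sk_buff *skb, bool l4_csum_verified)
{
        if (l4_csum_verified) {
                skb->ip_summed = CHECKSUM_UNNECESSARY;
                skb->csum_level = 0;    /* only the outermost checksum was verified */
        } else {
                skb->ip_summed = CHECKSUM_NONE; /* nothing verified yet */
        }
}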

Tom

On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear <greearb@candelatech.com> wrote:
>
>
> On 04/30/2016 11:33 AM, Ben Hutchings wrote:
>>
>> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>>>
>>> Hello,
>
>
>>>>
>>>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>>>
>>> Actually, no, this is not really a regression.
>>
>> [...]
>>
>> It really is.  Even though the old behaviour was a bug (raw packets
>> should not be changed), if there are real applications that depend on
>> that then we have to keep those applications working somehow.
>
>
> To be honest, I fail to see why the old behaviour is a bug when sending
> raw packets from user-space.  If raw packets should not be changed, then
> we need some way to specify what the checksum setting is to begin with,
> otherwise, user-space has not enough control.
>
> A socket option for new programs, and sysctl configurable defaults for raw
> sockets
> for old binary programs would be sufficient I think.
>
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
Vijay Pandurangan April 30, 2016, 8:15 p.m. UTC | #7
[oops – resending this because I was using gmail in HTML mode before
by accident]

There was a discussion on a separate thread about this. I agree with
Sabrina fully. I believe veth should provide an abstraction layer that
correctly emulates a physical network in all ways.

Consider an environment where we have multiple physical computers.
Each computer runs one or more containers, each of which has a
publicly routable ip address.  When adding a new app to the cluster, a
scheduler might decide to run this container on any physical machine
of its choice, assuming that apps have a way of routing traffic to
their backends (we did something similar at Google >10 years ago). This
is something we might imagine happening with docker and ipv6 for
instance.

If you have an app, A, which sends raw ethernet frames (the simplest
case I could imagine) with TCP data that has invalid checksums to app
B, which is receiving it, the behaviour of the system _will be
different_ depending upon whether app B is scheduled to run on the
same machine as app A or not. This seems like a clear bug and a broken
abstraction (especially as the default case), and something we should
endeavour to avoid.

I do see Ben's point about enabling sw checksum verification as
potentially incurring a huge performance penalty (I haven't had a
chance to measure it) that is completely wasteful in the vast majority
of cases.

Unfortunately I just don't see how we can solve this problem in a way
that preserves a correct abstraction layer while also avoiding excess
work. I guess the first piece of data that would be helpful is to
determine just how big of a performance penalty this is. If it's
small, then maybe it doesn't matter.




On Thu, Apr 28, 2016 at 6:29 AM, Sabrina Dubroca <sd@queasysnail.net> wrote:
> Hello,
>
> 2016-04-27, 17:14:44 -0700, Ben Greear wrote:
>> On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
>> > Hi Ben,
>> >
>> > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
>> > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
>> > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
>> > > > >
>> > > > > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
>> > > > I would be careful about this.  It causes regressions when sending
>> > > > PACKET_SOCKET buffers from user-space to veth devices.
>> > > >
>> > > > There was a proposed upstream fix for the regression, but it has not gone
>> > > > into the tree as far as I know.
>> > > >
>> > > > http://www.spinics.net/lists/netdev/msg370436.html
>> > > [...]
>> > >
>> > > OK, I'll drop this for now.
>> >
>> > The fall out from not having this patch is in my opinion a bigger
>> > fallout than not having this patch. This patch fixes silent data
>> > corruption vs. the problem Ben Greear is talking about, which might not
>> > be that a common usage.
>> >
>> > What do others think?
>> >
>> > Bye,
>> > Hannes
>> >
>>
>> This patch from Cong Wang seems to fix the regression for me, I think it should be added and
>> tested in the main tree, and then apply them to stable as a pair.
>>
>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>
> Actually, no, this is not really a regression.
>
> If you capture packets on a device with checksum offloading enabled,
> the TCP/UDP checksum isn't filled.  veth also behaves that way.  What
> the "veth: don't modify ip_summed" patch does is enable proper
> checksum validation on veth.  This really was a bug in veth.
>
> Cong's patch would also break cases where we choose to inject packets
> with invalid checksums, and they would now be accepted as correct.
>
> Your use case is invalid, it just happened to work because of a
> bug.  If you want the stack to fill checksums so that you want capture
> and reinject packets, you have to disable checksum offloading (or
> compute the checksum yourself in userspace).
>
> Thanks.
>
> --
> Sabrina
Ben Greear April 30, 2016, 8:59 p.m. UTC | #8
On 04/30/2016 12:54 PM, Tom Herbert wrote:
> We've put considerable effort into cleaning up the checksum interface
> to make it as unambiguous as possible, please be very careful to
> follow it. Broken checksum processing is really hard to detect and
> debug.
>
> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
> (indicated by csum_level) have been verified to be correct in a
> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
> checksum it would refer to has not been verified and is incorrect this
> is a major bug.

Suppose I know that the packet received on a packet-socket has
already been verified by a NIC that supports hardware checksumming.

Then, I want to transmit it on a veth interface using a second
packet socket.  I do not want veth to recalculate the checksum on
transmit, nor to validate it on the peer veth on receive, because I do
not want to waste the CPU cycles.  I am assuming that my app is not
accidentally corrupting frames, so the checksum can never be bad.

How should the checksumming be configured for the packets going into
the packet-socket from user-space?

Also, I might want to send raw frames that do have
broken checksums (let's assume a real NIC, not veth), and I want them
to hit the wire with those bad checksums.

How do I configure the checksumming in this case?


Thanks,
Ben


>
> Tom
>
> On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear <greearb@candelatech.com> wrote:
>>
>>
>> On 04/30/2016 11:33 AM, Ben Hutchings wrote:
>>>
>>> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>>>>
>>>> Hello,
>>
>>
>>>>>
>>>>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>>>>
>>>> Actually, no, this is not really a regression.
>>>
>>> [...]
>>>
>>> It really is.  Even though the old behaviour was a bug (raw packets
>>> should not be changed), if there are real applications that depend on
>>> that then we have to keep those applications working somehow.
>>
>>
>> To be honest, I fail to see why the old behaviour is a bug when sending
>> raw packets from user-space.  If raw packets should not be changed, then
>> we need some way to specify what the checksum setting is to begin with,
>> otherwise, user-space has not enough control.
>>
>> A socket option for new programs, and sysctl configurable defaults for raw
>> sockets
>> for old binary programs would be sufficient I think.
>>
>>
>> Thanks,
>> Ben
>>
>> --
>> Ben Greear <greearb@candelatech.com>
>> Candela Technologies Inc  http://www.candelatech.com
>
Vijay Pandurangan April 30, 2016, 9:13 p.m. UTC | #9
On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear <greearb@candelatech.com> wrote:
>
>
> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>
>> We've put considerable effort into cleaning up the checksum interface
>> to make it as unambiguous as possible, please be very careful to
>> follow it. Broken checksum processing is really hard to detect and
>> debug.
>>
>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>> (indicated by csum_level) have been verified to be correct in a
>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>> checksum it would refer to has not been verified and is incorrect this
>> is a major bug.
>
>
> Suppose I know that the packet received on a packet-socket has
> already been verified by a NIC that supports hardware checksumming.
>
> Then, I want to transmit it on a veth interface using a second
> packet socket.  I do not want veth to recalculate the checksum on
> transmit, nor to validate it on the peer veth on receive, because I do
> not want to waste the CPU cycles.  I am assuming that my app is not
> accidentally corrupting frames, so the checksum can never be bad.
>
> How should the checksumming be configured for the packets going into
> the packet-socket from user-space?


It seems that only the receiver, not the sender, should decide whether or not to
checksum packets on the veth.

How about:

We could add a receiving socket option for "don't checksum packets
received from a veth when the other side has marked them as
elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
socket option for "mark all data sent via this socket to a veth as
elide-checksum-suggested".

So the process would be:

Writer:
1. open read socket
2. open write socket, with option elide-checksum-for-veth-suggested
3. write data

Reader:
1. open read socket with "follow-elide-checksum-suggestions-on-veth"
2. read data

The kernel / module would then need to persist the flag on all packets
that traverse a veth, and drop the flag when the packets leave the veth
module.
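A userspace sketch of what that pair of options might look like; this is purely
hypothetical, and PACKET_ELIDE_CSUM_TX / PACKET_ELIDE_CSUM_RX and their values
are invented names for illustration only:

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <sys/socket.h>

/* Invented option names/values -- nothing like this exists in the kernel. */
#define PACKET_ELIDE_CSUM_TX 100
#define PACKET_ELIDE_CSUM_RX 101

static int open_writer(void)
{
        int one = 1;
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

        if (fd < 0)
                return -1;
        /* mark frames sent here as "checksum already trusted, elide it" */
        setsockopt(fd, SOL_PACKET, PACKET_ELIDE_CSUM_TX, &one, sizeof(one));
        return fd;
}

static int open_reader(void)
{
        int one = 1;
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

        if (fd < 0)
                return -1;
        /* agree to honour that mark on frames arriving from a veth */
        setsockopt(fd, SOL_PACKET, PACKET_ELIDE_CSUM_RX, &one, sizeof(one));
        return fd;
}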


>
>
> Also, I might want to send raw frames that do have
> broken checksums (lets assume a real NIC, not veth), and I want them
> to hit the wire with those bad checksums.
>
>
> How do I configure the checksumming in this case?


Correct me if I'm wrong, but I think this is already possible now. You
can have packets with incorrect checksums hitting the wire as is. What
you cannot do is instruct the receiving end to ignore the checksum
from the sending end when using a physical device (and that is something
I think we should mimic on the sending device).


>
>
>
> Thanks,
> Ben
>
>
>
>>
>> Tom
>>
>> On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>
>>>
>>>
>>> On 04/30/2016 11:33 AM, Ben Hutchings wrote:
>>>>
>>>>
>>>> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>>>>>
>>>>>
>>>>> Hello,
>>>
>>>
>>>
>>>>>>
>>>>>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>>>>>
>>>>>
>>>>> Actually, no, this is not really a regression.
>>>>
>>>>
>>>> [...]
>>>>
>>>> It really is.  Even though the old behaviour was a bug (raw packets
>>>> should not be changed), if there are real applications that depend on
>>>> that then we have to keep those applications working somehow.
>>>
>>>
>>>
>>> To be honest, I fail to see why the old behaviour is a bug when sending
>>> raw packets from user-space.  If raw packets should not be changed, then
>>> we need some way to specify what the checksum setting is to begin with,
>>> otherwise, user-space has not enough control.
>>>
>>> A socket option for new programs, and sysctl configurable defaults for raw
>>> sockets
>>> for old binary programs would be sufficient I think.
>>>
>>>
>>> Thanks,
>>> Ben
>>>
>>> --
>>> Ben Greear <greearb@candelatech.com>
>>> Candela Technologies Inc  http://www.candelatech.com
>>
>>
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
Ben Greear April 30, 2016, 9:29 p.m. UTC | #10
On 04/30/2016 02:13 PM, Vijay Pandurangan wrote:
> On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear <greearb@candelatech.com> wrote:
>>
>>
>> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>>
>>> We've put considerable effort into cleaning up the checksum interface
>>> to make it as unambiguous as possible, please be very careful to
>>> follow it. Broken checksum processing is really hard to detect and
>>> debug.
>>>
>>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>>> (indicated by csum_level) have been verified to be correct in a
>>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>>> checksum it would refer to has not been verified and is incorrect this
>>> is a major bug.
>>
>>
>> Suppose I know that the packet received on a packet-socket has
>> already been verified by a NIC that supports hardware checksumming.
>>
>> Then, I want to transmit it on a veth interface using a second
>> packet socket.  I do not want veth to recalculate the checksum on
>> transmit, nor to validate it on the peer veth on receive, because I do
>> not want to waste the CPU cycles.  I am assuming that my app is not
>> accidentally corrupting frames, so the checksum can never be bad.
>>
>> How should the checksumming be configured for the packets going into
>> the packet-socket from user-space?
>
>
> It seems like that only the receiver should decide whether or not to
> checksum packets on the veth, not the sender.
>
> How about:
>
> We could add a receiving socket option for "don't checksum packets
> received from a veth when the other side has marked them as
> elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
> socket option for "mark all data sent via this socket to a veth as
> elide-checksum-suggested".
>
> So the process would be:
>
> Writer:
> 1. open read socket
> 2. open write socket, with option elide-checksum-for-veth-suggested
> 3. write data
>
> Reader:
> 1. open read socket with "follow-elide-checksum-suggestions-on-veth"
> 2. read data
>
> The kernel / module would then need to persist the flag on all packets
> that traverse a veth, and drop these data when they leave the veth
> module.

I'm not sure this works completely.  In my app, the packet flow might be:

eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB <-> [kernel router/bridge logic ...] <-> eth1

There may be no sockets on the vethB port.  And reader/writer is not
a good way to look at it since I am implementing a bi-directional bridge in
user-space and each packet-socket is for both rx and tx.

>> Also, I might want to send raw frames that do have
>> broken checksums (lets assume a real NIC, not veth), and I want them
>> to hit the wire with those bad checksums.
>>
>>
>> How do I configure the checksumming in this case?
>
>
> Correct me if I'm wrong but I think this is already possible now. You
> can have packets with incorrect checksum hitting the wire as is. What
> you cannot do is instruct the receiving end to ignore the checksum
> from the sending end when using a physical device (and something I
> think we should mimic on the sending device).

Yes, it does work currently (or, last I checked)...I just want to make sure it keeps working.

Thanks,
Ben
Vijay Pandurangan April 30, 2016, 9:36 p.m. UTC | #11
On Sat, Apr 30, 2016 at 5:29 PM, Ben Greear <greearb@candelatech.com> wrote:
>
>
> On 04/30/2016 02:13 PM, Vijay Pandurangan wrote:
>>
>> On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear <greearb@candelatech.com>
>> wrote:
>>>
>>>
>>>
>>> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>>>
>>>>
>>>> We've put considerable effort into cleaning up the checksum interface
>>>> to make it as unambiguous as possible, please be very careful to
>>>> follow it. Broken checksum processing is really hard to detect and
>>>> debug.
>>>>
>>>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>>>> (indicated by csum_level) have been verified to be correct in a
>>>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>>>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>>>> checksum it would refer to has not been verified and is incorrect this
>>>> is a major bug.
>>>
>>>
>>>
>>> Suppose I know that the packet received on a packet-socket has
>>> already been verified by a NIC that supports hardware checksumming.
>>>
>>> Then, I want to transmit it on a veth interface using a second
>>> packet socket.  I do not want veth to recalculate the checksum on
>>> transmit, nor to validate it on the peer veth on receive, because I do
>>> not want to waste the CPU cycles.  I am assuming that my app is not
>>> accidentally corrupting frames, so the checksum can never be bad.
>>>
>>> How should the checksumming be configured for the packets going into
>>> the packet-socket from user-space?
>>
>>
>>
>> It seems like that only the receiver should decide whether or not to
>> checksum packets on the veth, not the sender.
>>
>> How about:
>>
>> We could add a receiving socket option for "don't checksum packets
>> received from a veth when the other side has marked them as
>> elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
>> socket option for "mark all data sent via this socket to a veth as
>> elide-checksum-suggested".
>>
>> So the process would be:
>>
>> Writer:
>> 1. open read socket
>> 2. open write socket, with option elide-checksum-for-veth-suggested
>> 3. write data
>>
>> Reader:
>> 1. open read socket with "follow-elide-checksum-suggestions-on-veth"
>> 2. read data
>>
>> The kernel / module would then need to persist the flag on all packets
>> that traverse a veth, and drop these data when they leave the veth
>> module.
>
>
> I'm not sure this works completely.  In my app, the packet flow might be:
>
> eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB
> <-> [kernel router/bridge logic ...] <-> eth1

Good point, so if you had:

eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
userspace-stub <->eth1

and user-space hub enabled this elide flag, things would work, right?
Then, it seems like what we need is a way to tell the kernel
router/bridge logic to follow elide signals in packets coming from
veth. I'm not sure what the best way to do this is because I'm less
familiar with conventions in that part of the kernel, but assuming
there's a way to do this, would it be acceptable?



>
> There may be no sockets on the vethB port.  And reader/writer is not
> a good way to look at it since I am implementing a bi-directional bridge in
> user-space and each packet-socket is for both rx and tx.

Sure, but we could model a bidirectional connection as two
unidirectional sockets for our discussions here, right?



>
>>> Also, I might want to send raw frames that do have
>>> broken checksums (lets assume a real NIC, not veth), and I want them
>>> to hit the wire with those bad checksums.
>>>
>>>
>>> How do I configure the checksumming in this case?
>>
>>
>>
>> Correct me if I'm wrong but I think this is already possible now. You
>> can have packets with incorrect checksum hitting the wire as is. What
>> you cannot do is instruct the receiving end to ignore the checksum
>> from the sending end when using a physical device (and something I
>> think we should mimic on the sending device).
>
>
> Yes, it does work currently (or, last I checked)...I just want to make sure
> it keeps working.

Yeah, good point. It would be great if we could write a test or, at
the very least, a list of invariants somewhere about what kinds of
things should work.

>
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
Ben Greear April 30, 2016, 9:52 p.m. UTC | #12
On 04/30/2016 02:36 PM, Vijay Pandurangan wrote:
> On Sat, Apr 30, 2016 at 5:29 PM, Ben Greear <greearb@candelatech.com> wrote:
>>
>>
>> On 04/30/2016 02:13 PM, Vijay Pandurangan wrote:
>>>
>>> On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear <greearb@candelatech.com>
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>>>>
>>>>>
>>>>> We've put considerable effort into cleaning up the checksum interface
>>>>> to make it as unambiguous as possible, please be very careful to
>>>>> follow it. Broken checksum processing is really hard to detect and
>>>>> debug.
>>>>>
>>>>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>>>>> (indicated by csum_level) have been verified to be correct in a
>>>>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>>>>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>>>>> checksum it would refer to has not been verified and is incorrect this
>>>>> is a major bug.
>>>>
>>>>
>>>>
>>>> Suppose I know that the packet received on a packet-socket has
>>>> already been verified by a NIC that supports hardware checksumming.
>>>>
>>>> Then, I want to transmit it on a veth interface using a second
>>>> packet socket.  I do not want veth to recalculate the checksum on
>>>> transmit, nor to validate it on the peer veth on receive, because I do
>>>> not want to waste the CPU cycles.  I am assuming that my app is not
>>>> accidentally corrupting frames, so the checksum can never be bad.
>>>>
>>>> How should the checksumming be configured for the packets going into
>>>> the packet-socket from user-space?
>>>
>>>
>>>
>>> It seems like that only the receiver should decide whether or not to
>>> checksum packets on the veth, not the sender.
>>>
>>> How about:
>>>
>>> We could add a receiving socket option for "don't checksum packets
>>> received from a veth when the other side has marked them as
>>> elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
>>> socket option for "mark all data sent via this socket to a veth as
>>> elide-checksum-suggested".
>>>
>>> So the process would be:
>>>
>>> Writer:
>>> 1. open read socket
>>> 2. open write socket, with option elide-checksum-for-veth-suggested
>>> 3. write data
>>>
>>> Reader:
>>> 1. open read socket with "follow-elide-checksum-suggestions-on-veth"
>>> 2. read data
>>>
>>> The kernel / module would then need to persist the flag on all packets
>>> that traverse a veth, and drop these data when they leave the veth
>>> module.
>>
>>
>> I'm not sure this works completely.  In my app, the packet flow might be:
>>
>> eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB
>> <-> [kernel router/bridge logic ...] <-> eth1
>
> Good point, so if you had:
>
> eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
> userspace-stub <->eth1
>
> and user-space hub enabled this elide flag, things would work, right?
> Then, it seems like what we need is a way to tell the kernel
> router/bridge logic to follow elide signals in packets coming from
> veth. I'm not sure what the best way to do this is because I'm less
> familiar with conventions in that part of the kernel, but assuming
> there's a way to do this, would it be acceptable?

You cannot receive on one veth without transmitting on the other, so
I think the elide-csum logic can go on the raw socket and apply to packets
in the transmit-from-user-space direction.  Just allowing the socket to make
the veth behave like it used to before the patch in question should be good
enough, since that worked for us for years.  So, just an option to modify the
ip_summed for packets sent on a socket is probably sufficient.
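In kernel terms the idea is roughly the following hypothetical fragment
(elide_csum and the setsockopt() that would set it are invented names; compare
the unconditional assignments in the patch at the bottom of this page):

#include <linux/skbuff.h>

/* Hypothetical sketch: packet_snd() and friends would only override
 * ip_summed when the sending socket asked for it, instead of doing so
 * unconditionally as in the patch below. */
static void packet_maybe_elide_csum(struct sk_buff *skb, bool elide_csum)
{
        if (elide_csum)
                skb->ip_summed = CHECKSUM_UNNECESSARY;  /* old veth behaviour */
        /* else leave ip_summed alone so receivers still validate checksums */
}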

>> There may be no sockets on the vethB port.  And reader/writer is not
>> a good way to look at it since I am implementing a bi-directional bridge in
>> user-space and each packet-socket is for both rx and tx.
>
> Sure, but we could model a bidrectional connection as two
> unidirectional sockets for our discussions here, right?

Best not to, I think; you want to make sure that one socket can
correctly handle tx and rx.  As long as that works, then using
uni-directional sockets should work too.

Thanks,
Ben
Vijay Pandurangan April 30, 2016, 10:01 p.m. UTC | #13
On Sat, Apr 30, 2016 at 5:52 PM, Ben Greear <greearb@candelatech.com> wrote:
>>
>> Good point, so if you had:
>>
>> eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
>> userspace-stub <->eth1
>>
>> and user-space hub enabled this elide flag, things would work, right?
>> Then, it seems like what we need is a way to tell the kernel
>> router/bridge logic to follow elide signals in packets coming from
>> veth. I'm not sure what the best way to do this is because I'm less
>> familiar with conventions in that part of the kernel, but assuming
>> there's a way to do this, would it be acceptable?
>
>
> You cannot receive on one veth without transmitting on the other, so
> I think the elide csum logic can go on the raw-socket, and apply to packets
> in the transmit-from-user-space direction.  Just allowing the socket to make
> the veth behave like it used to before this patch in question should be good
> enough, since that worked for us for years.  So, just an option to modify
> the
> ip_summed for pkts sent on a socket is probably sufficient.

I don't think this is right. Consider:

- App A  sends out corrupt packets 50% of the time and discards inbound data.
- App B doesn't care about corrupt packets and is happy to receive
them and has some way of dealing with them (special case)
- App C is a regular app, say nc or something.

In your world, where A decides what happens to data it transmits,
then
A<--veth-->B and A<---wire-->B will have the same behaviour

but

A<-- veth --> C and A<-- wire --> C will have _different_ behaviour: C
will behave incorrectly if it's connected over veth but correctly if
connected with a wire. That is a bug.

Since A cannot know what the app it's talking to will desire, I argue
that both sides of a message must opt in to this optimization.






>
>>> There may be no sockets on the vethB port.  And reader/writer is not
>>> a good way to look at it since I am implementing a bi-directional bridge
>>> in
>>> user-space and each packet-socket is for both rx and tx.
>>
>>
>> Sure, but we could model a bidrectional connection as two
>> unidirectional sockets for our discussions here, right?
>
>
> Best not to I think, you want to make sure that one socket can
> correctly handle tx and rx.  As long as that works, then using
> uni-directional sockets should work too.
>
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
Tom Herbert April 30, 2016, 10:42 p.m. UTC | #14
On Sat, Apr 30, 2016 at 1:59 PM, Ben Greear <greearb@candelatech.com> wrote:
>
> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>
>> We've put considerable effort into cleaning up the checksum interface
>> to make it as unambiguous as possible, please be very careful to
>> follow it. Broken checksum processing is really hard to detect and
>> debug.
>>
>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>> (indicated by csum_level) have been verified to be correct in a
>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>> checksum it would refer to has not been verified and is incorrect this
>> is a major bug.
>
>
> Suppose I know that the packet received on a packet-socket has
> already been verified by a NIC that supports hardware checksumming.
>
> Then, I want to transmit it on a veth interface using a second
> packet socket.  I do not want veth to recalculate the checksum on
> transmit, nor to validate it on the peer veth on receive, because I do
> not want to waste the CPU cycles.  I am assuming that my app is not
> accidentally corrupting frames, so the checksum can never be bad.
>
> How should the checksumming be configured for the packets going into
> the packet-socket from user-space?
>
CHECKSUM_UNNECESSARY-to-CHECKSUM_COMPLETE conversion should be done,
and then the computed value can be sent to user space. If you want to
take advantage of the value on transmit then we would need to change the
interface. But I'm not sure why recalculating the checksum in the
host is even a problem. With all the advances in checksum offload,
LCO, and RCO, it seems like that shouldn't be a common case anyway.

> Also, I might want to send raw frames that do have
> broken checksums (lets assume a real NIC, not veth), and I want them
> to hit the wire with those bad checksums.
>
> How do I configure the checksumming in this case?

Just send raw packets with bad checksums (no checksum offload).
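For example, something along these lines already does that today (a minimal
sketch; "eth0" and the caller-built frame, including whatever checksum it
carries, are placeholders):

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Transmit a prebuilt Ethernet frame as-is on a plain AF_PACKET socket.
 * No checksum offload is requested, so the frame (including a deliberately
 * bad L4 checksum, if that is what the caller built) hits the wire untouched. */
static int send_frame(const void *frame, size_t len)
{
        struct sockaddr_ll sll;
        ssize_t ret;
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

        if (fd < 0)
                return -1;

        memset(&sll, 0, sizeof(sll));
        sll.sll_family = AF_PACKET;
        sll.sll_protocol = htons(ETH_P_ALL);
        sll.sll_ifindex = if_nametoindex("eth0");       /* placeholder NIC */

        ret = sendto(fd, frame, len, 0, (struct sockaddr *)&sll, sizeof(sll));
        close(fd);
        return ret < 0 ? -1 : 0;
}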

Tom

>
>
> Thanks,
> Ben
>
>
>
>>
>> Tom
>>
>> On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear <greearb@candelatech.com>
>> wrote:
>>>
>>>
>>>
>>> On 04/30/2016 11:33 AM, Ben Hutchings wrote:
>>>>
>>>>
>>>> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>>>>>
>>>>>
>>>>> Hello,
>>>
>>>
>>>
>>>>>>
>>>>>>
>>>>>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>>>>>
>>>>>
>>>>> Actually, no, this is not really a regression.
>>>>
>>>>
>>>> [...]
>>>>
>>>> It really is.  Even though the old behaviour was a bug (raw packets
>>>> should not be changed), if there are real applications that depend on
>>>> that then we have to keep those applications working somehow.
>>>
>>>
>>>
>>> To be honest, I fail to see why the old behaviour is a bug when sending
>>> raw packets from user-space.  If raw packets should not be changed, then
>>> we need some way to specify what the checksum setting is to begin with,
>>> otherwise, user-space has not enough control.
>>>
>>> A socket option for new programs, and sysctl configurable defaults for
>>> raw
>>> sockets
>>> for old binary programs would be sufficient I think.
>>>
>>>
>>> Thanks,
>>> Ben
>>>
>>> --
>>> Ben Greear <greearb@candelatech.com>
>>> Candela Technologies Inc  http://www.candelatech.com
>>
>>
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
Ben Greear April 30, 2016, 10:43 p.m. UTC | #15
On 04/30/2016 03:01 PM, Vijay Pandurangan wrote:
> On Sat, Apr 30, 2016 at 5:52 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>
>>> Good point, so if you had:
>>>
>>> eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
>>> userspace-stub <->eth1
>>>
>>> and user-space hub enabled this elide flag, things would work, right?
>>> Then, it seems like what we need is a way to tell the kernel
>>> router/bridge logic to follow elide signals in packets coming from
>>> veth. I'm not sure what the best way to do this is because I'm less
>>> familiar with conventions in that part of the kernel, but assuming
>>> there's a way to do this, would it be acceptable?
>>
>>
>> You cannot receive on one veth without transmitting on the other, so
>> I think the elide csum logic can go on the raw-socket, and apply to packets
>> in the transmit-from-user-space direction.  Just allowing the socket to make
>> the veth behave like it used to before this patch in question should be good
>> enough, since that worked for us for years.  So, just an option to modify
>> the
>> ip_summed for pkts sent on a socket is probably sufficient.
>
> I don't think this is right. Consider:
>
> - App A  sends out corrupt packets 50% of the time and discards inbound data.
> - App B doesn't care about corrupt packets and is happy to receive
> them and has some way of dealing with them (special case)
> - App C is a regular app, say nc or something.
>
> In your world, where A decides what happens to data it transmits,
> then
> A<--veth-->B and A<---wire-->B will have the same behaviour
>
> but
>
> A<-- veth --> C and A<-- wire --> C will have _different_ behaviour: C
> will behave incorrectly if it's connected over veth but correctly if
> connected with a wire. That is a bug.
>
> Since A cannot know what the app it's talking to will desire, I argue
> that both sides of a message must be opted in to this optimization.

How can you make a generic app C know how to do this?  The path could be,
for instance:

eth0 <-> user-space-A <-> vethA <-> vethB <-> { kernel routing logic } <-> vethC <-> vethD <-> appC

There are no sockets on vethB, but it does need to have special behaviour to elide
csums.  Even if appC is hacked to know how to twiddle something on its veth port,
mucking with vethD will have no effect on vethB.

With regard to your example above, why would A corrupt packets?  My guess:

1)  It has bugs (so, fix the bugs, it could equally create incorrect data with proper checksums,
     so just enabling checksumming adds no useful protection.)

2)  It means to corrupt frames.  In that case, someone must expect that C should receive incorrect
     frames, otherwise why bother making App-A corrupt them in the first place?

3)  You are explicitly trying to test the kernel checksum logic, so you want the kernel to
     detect the bad checksum and throw away the packet.  In this case, just don't set the socket
     option in appA to elide checksums and the packet will be thrown away.

Any other cases you can think of?

Thanks,
Ben
Willy Tarreau May 1, 2016, 5:30 a.m. UTC | #16
On Sat, Apr 30, 2016 at 03:43:51PM -0700, Ben Greear wrote:
> On 04/30/2016 03:01 PM, Vijay Pandurangan wrote:
> > Consider:
> > 
> > - App A  sends out corrupt packets 50% of the time and discards inbound data.
(...)
> How can you make a generic app C know how to do this?  The path could be,
> for instance:
> 
> eth0 <-> user-space-A <-> vethA <-> vethB <-> { kernel routing logic } <-> vethC <-> vethD <-> appC
> 
> There are no sockets on vethB, but it does need to have special behaviour to elide
> csums.  Even if appC is hacked to know how to twiddle some thing on it's veth port,
> mucking with vethD will have no effect on vethB.
> 
> With regard to your example above, why would A corrupt packets?  My guess:
> 
> 1)  It has bugs (so, fix the bugs, it could equally create incorrect data with proper checksums,
>     so just enabling checksumming adds no useful protection.)

I agree with Ben here: what he needs is the ability for userspace to be
trusted when *forwarding* a packet. Ideally you'd only want to receive
the csum status per packet on the packet socket and pass the same value
on the vethA interface, with this status being kept when the packet
reaches vethB.

If A purposely corrupts packets, that's A's problem. It's similar to designing
a NIC which intentionally corrupts packets and reports "checksum good".

The real issue is that in order to do things right, the userspace bridge
(here, "A") would really need to pass this status. In Ben's case as he
says, bad checksum packets are dropped before reaching A, so that
simplifies the process quite a bit and that might be what causes some
confusion, but ideally we'd rather have recvmsg() and sendmsg() with
these flags.
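The receive half of this already exists on packet sockets: with PACKET_AUXDATA
enabled, each frame carries tp_status flags such as TP_STATUS_CSUM_VALID and
TP_STATUS_CSUMNOTREADY (assuming headers new enough to define them); what is
missing is a matching way to hand that status back on sendmsg(). A sketch of
the receive side, where fd is assumed to be an already-bound AF_PACKET socket:

#include <linux/if_packet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Read one frame and report the per-packet checksum status the kernel
 * already exposes to packet sockets via the PACKET_AUXDATA cmsg. */
static void print_csum_status(int fd)
{
        char frame[2048];
        char cbuf[CMSG_SPACE(sizeof(struct tpacket_auxdata))];
        struct iovec iov = { .iov_base = frame, .iov_len = sizeof(frame) };
        struct msghdr msg = {
                .msg_iov = &iov, .msg_iovlen = 1,
                .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
        };
        struct cmsghdr *cmsg;
        int one = 1;

        setsockopt(fd, SOL_PACKET, PACKET_AUXDATA, &one, sizeof(one));
        if (recvmsg(fd, &msg, 0) < 0)
                return;

        for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
                struct tpacket_auxdata aux;

                if (cmsg->cmsg_level != SOL_PACKET ||
                    cmsg->cmsg_type != PACKET_AUXDATA)
                        continue;
                memcpy(&aux, CMSG_DATA(cmsg), sizeof(aux));
                if (aux.tp_status & TP_STATUS_CSUM_VALID)
                        printf("checksum already verified\n");
                else if (aux.tp_status & TP_STATUS_CSUMNOTREADY)
                        printf("checksum not yet filled in (offloaded)\n");
        }
}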

I faced the exact same issue 3 years ago when playing with netmap: it was
slow as hell because it would lose all checksum information when packets
were passing through userland, resulting in GRO/GSO etc. being disabled,
and I had to modify it to let userland preserve that information. That's especially
important when you have to deal with possibly corrupted packets not yet
detected in the chain because the NIC did not validate their checksums.

Willy
Ben Greear May 13, 2016, 4:57 p.m. UTC | #17
Mr Miller:

How do you feel about a new socket-option to allow a socket to
request the old veth behaviour?

Thanks,
Ben

On 04/30/2016 10:30 PM, Willy Tarreau wrote:
> On Sat, Apr 30, 2016 at 03:43:51PM -0700, Ben Greear wrote:
>> On 04/30/2016 03:01 PM, Vijay Pandurangan wrote:
>>> Consider:
>>>
>>> - App A  sends out corrupt packets 50% of the time and discards inbound data.
> (...)
>> How can you make a generic app C know how to do this?  The path could be,
>> for instance:
>>
>> eth0 <-> user-space-A <-> vethA <-> vethB <-> { kernel routing logic } <-> vethC <-> vethD <-> appC
>>
>> There are no sockets on vethB, but it does need to have special behaviour to elide
>> csums.  Even if appC is hacked to know how to twiddle some thing on it's veth port,
>> mucking with vethD will have no effect on vethB.
>>
>> With regard to your example above, why would A corrupt packets?  My guess:
>>
>> 1)  It has bugs (so, fix the bugs, it could equally create incorrect data with proper checksums,
>>      so just enabling checksumming adds no useful protection.)
>
> I agree with Ben here, what he needs is the ability for userspace to be
> trusted when *forwarding* a packet. Ideally you'd only want to receive
> the csum status per packet on the packet socket and pass the same value
> on the vethA interface, with this status being kept when the packet
> reaches vethB.
>
> If A purposely corrupts packet, it's A's problem. It's similar to designing
> a NIC which intentionally corrupts packets and reports "checksum good".
>
> The real issue is that in order to do things right, the userspace bridge
> (here, "A") would really need to pass this status. In Ben's case as he
> says, bad checksum packets are dropped before reaching A, so that
> simplifies the process quite a bit and that might be what causes some
> confusion, but ideally we'd rather have recvmsg() and sendmsg() with
> these flags.
>
> I faced the exact same issue 3 years ago when playing with netmap, it was
> slow as hell because it would lose all checksum information when packets
> were passing through userland, resulting in GRO/GSO etc being disabled,
> and had to modify it to let userland preserve it. That's especially
> important when you have to deal with possibly corrupted packets not yet
> detected in the chain because the NIC did not validate their checksums.
>
> Willy
>
David Miller May 13, 2016, 6:21 p.m. UTC | #18
From: Ben Greear <greearb@candelatech.com>
Date: Fri, 13 May 2016 09:57:19 -0700

> How do you feel about a new socket-option to allow a socket to
> request the old veth behaviour?

I depend upon the opinions of the experts who work upstream on and
maintain these components, since it is an area I am not so familiar
with.

Generally speaking, asking me directly for opinions on matters like
this isn't the way to go; in fact, I kind of find it irritating.  It
can't all be on me.
Ben Greear May 13, 2016, 6:23 p.m. UTC | #19
On 05/13/2016 11:21 AM, David Miller wrote:
> From: Ben Greear <greearb@candelatech.com>
> Date: Fri, 13 May 2016 09:57:19 -0700
>
>> How do you feel about a new socket-option to allow a socket to
>> request the old veth behaviour?
>
> I depend upon the opinions of the experts who work upstream on and
> maintain these components, since it is an area I am not so familiar
> with.
>
> Generally speaking asking me directly for opinions on matters like
> this isn't the way to go, in fact I kind of find it irritating.  It
> can't all be on me.
>

Fair enough, thanks for your time.

Ben

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index da1ae0e..f8cc758 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1926,6 +1926,7 @@  retry:
                 goto out_unlock;
         }

+       skb->ip_summed = CHECKSUM_UNNECESSARY;
         skb->protocol = proto;
         skb->dev = dev;
         skb->priority = sk->sk_priority;
@@ -2352,6 +2353,7 @@  static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,

         ph.raw = frame;

+       skb->ip_summed = CHECKSUM_UNNECESSARY;
         skb->protocol = proto;
         skb->dev = dev;
         skb->priority = po->sk.sk_priority;
@@ -2776,6 +2778,7 @@  static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
                 goto out_free;
         }

+       skb->ip_summed = CHECKSUM_UNNECESSARY;
         skb->protocol = proto;
         skb->dev = dev;
         skb->priority = sk->sk_priority;