diff mbox

[net] igb: pass the correct maxlen for eth_get_headlen()

Message ID 1429724760-10075-1-git-send-email-xiyou.wangcong@gmail.com
State Changes Requested
Headers show

Commit Message

Cong Wang April 22, 2015, 5:45 p.m. UTC
The second parameter of eth_get_headlen() is the length of
the frame buffer, not the header length of skb.

Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Cong Wang April 22, 2015, 8:14 p.m. UTC | #1
On Wed, Apr 22, 2015 at 12:43 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On 04/22/2015 10:45 AM, Cong Wang wrote:
>> The second parameter of eth_get_headlen() is the length of
>> the frame buffer, not the header length of skb.
>>
>> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>> ---
>>  drivers/net/ethernet/intel/igb/igb_main.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>> index a0a9b1f..7b3a370 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -6852,7 +6852,9 @@ static void igb_pull_tail(struct igb_ring *rx_ring,
>>       /* we need the header to contain the greater of either ETH_HLEN or
>>        * 60 bytes if the skb->len is less than 60 for skb_pad.
>>        */
>> -     pull_len = eth_get_headlen(va, IGB_RX_HDR_LEN);
>> +     pull_len = eth_get_headlen(va, skb_frag_size(frag));
>> +     if (unlikely(pull_len > IGB_RX_HDR_LEN))
>> +             pull_len = IGB_RX_HDR_LEN;
>>
>>       /* align pull length to size of long to optimize memcpy performance */
>>       skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
>
> You have this part right.  The length represents the maximum length we
> are willing to traverse in the buffer.  So if we 100% want to get the
> entire header regardless of what we can copy into then we could follow
> your approach.  However, since the allocated space in the skb is only
> IGB_RX_HDR_LEN we only really want to traverse up to that length.  Then
> that is all we copy out of the header.

But the frag size could be smaller than IGB_RX_HDR_LEN which is
the case this patch tries to fix.

>
> As a result we don't need the extra code for putting the upper limit on
> pull_len since that is factored in by passing IGB_RX_HDR_LEN as the
> maximum traversal length.

That is why I put unlikely() there, even with tunnel headers we unlikely
pull header longer than IGB_RX_HDR_LEN, the code is just for
completeness.
Alexander H Duyck April 22, 2015, 8:21 p.m. UTC | #2
On 04/22/2015 01:14 PM, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 12:43 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On 04/22/2015 10:45 AM, Cong Wang wrote:
>>> The second parameter of eth_get_headlen() is the length of
>>> the frame buffer, not the header length of skb.
>>>
>>> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>>> ---
>>>  drivers/net/ethernet/intel/igb/igb_main.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>>> index a0a9b1f..7b3a370 100644
>>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>>> @@ -6852,7 +6852,9 @@ static void igb_pull_tail(struct igb_ring *rx_ring,
>>>       /* we need the header to contain the greater of either ETH_HLEN or
>>>        * 60 bytes if the skb->len is less than 60 for skb_pad.
>>>        */
>>> -     pull_len = eth_get_headlen(va, IGB_RX_HDR_LEN);
>>> +     pull_len = eth_get_headlen(va, skb_frag_size(frag));
>>> +     if (unlikely(pull_len > IGB_RX_HDR_LEN))
>>> +             pull_len = IGB_RX_HDR_LEN;
>>>
>>>       /* align pull length to size of long to optimize memcpy performance */
>>>       skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
>> You have this part right.  The length represents the maximum length we
>> are willing to traverse in the buffer.  So if we 100% want to get the
>> entire header regardless of what we can copy into then we could follow
>> your approach.  However, since the allocated space in the skb is only
>> IGB_RX_HDR_LEN we only really want to traverse up to that length.  Then
>> that is all we copy out of the header.
> But the frag size could be smaller than IGB_RX_HDR_LEN which is
> the case this patch tries to fix.

No it can't.  We only add frags if the first frag is larger than
IGB_RX_HDR_LEN.  Go look at the code where this driver calls
add_rx_frag.  You should find there is copybreak code in there that
kicks in for anything less than or equal to IGB_RX_HDR_LEN in size on
the first frag.  It is there to allow us to avoid having to perform an
atomic inc/dec on the page.

>> As a result we don't need the extra code for putting the upper limit on
>> pull_len since that is factored in by passing IGB_RX_HDR_LEN as the
>> maximum traversal length.
> That is why I put unlikely() there, even with tunnel headers we unlikely
> pull header longer than IGB_RX_HDR_LEN, the code is just for
> completeness.

The code is fine as is.  I suggest you go look at the code where frags
are attached to the sk_buff.

- Alex
Cong Wang April 22, 2015, 8:33 p.m. UTC | #3
On Wed, Apr 22, 2015 at 1:21 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On 04/22/2015 01:14 PM, Cong Wang wrote:
>> On Wed, Apr 22, 2015 at 12:43 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On 04/22/2015 10:45 AM, Cong Wang wrote:
>>>> The second parameter of eth_get_headlen() is the length of
>>>> the frame buffer, not the header length of skb.
>>>>
>>>> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>>>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>>>> ---
>>>>  drivers/net/ethernet/intel/igb/igb_main.c | 4 +++-
>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>>>> index a0a9b1f..7b3a370 100644
>>>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>>>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>>>> @@ -6852,7 +6852,9 @@ static void igb_pull_tail(struct igb_ring *rx_ring,
>>>>       /* we need the header to contain the greater of either ETH_HLEN or
>>>>        * 60 bytes if the skb->len is less than 60 for skb_pad.
>>>>        */
>>>> -     pull_len = eth_get_headlen(va, IGB_RX_HDR_LEN);
>>>> +     pull_len = eth_get_headlen(va, skb_frag_size(frag));
>>>> +     if (unlikely(pull_len > IGB_RX_HDR_LEN))
>>>> +             pull_len = IGB_RX_HDR_LEN;
>>>>
>>>>       /* align pull length to size of long to optimize memcpy performance */
>>>>       skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
>>> You have this part right.  The length represents the maximum length we
>>> are willing to traverse in the buffer.  So if we 100% want to get the
>>> entire header regardless of what we can copy into then we could follow
>>> your approach.  However, since the allocated space in the skb is only
>>> IGB_RX_HDR_LEN we only really want to traverse up to that length.  Then
>>> that is all we copy out of the header.
>> But the frag size could be smaller than IGB_RX_HDR_LEN which is
>> the case this patch tries to fix.
>
> No it can't.  We only add frags if the first frag is larger than
> IGB_RX_HDR_LEN.  Go look at the code where this driver calls
> add_rx_frag.  You should find there is copybreak code in there that
> kicks in for anything less than or equal to IGB_RX_HDR_LEN in size on
> the first frag.  It is there to allow us to avoid having to perform an
> atomic inc/dec on the page.
>

First, make sure you don't miss the TSIP case right above:

The frag starting pointer and its size are advanced by:

skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
...
va += IGB_TS_HDR_LEN;

even though we unlikely pull header longer than
IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.


Second, the check you mentioned above is:

if ((size <= IGB_RX_HDR_LEN) && !skb_is_nonlinear(skb))

skb is nonlinear _after_ the first igb_add_rx_frag(), a second
igb_add_rx_frag() is possible since igb_is_non_eop() could
return true.
Alexander H Duyck April 22, 2015, 9:42 p.m. UTC | #4
On 04/22/2015 01:33 PM, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 1:21 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On 04/22/2015 01:14 PM, Cong Wang wrote:
>>> On Wed, Apr 22, 2015 at 12:43 PM, Alexander Duyck
>>> <alexander.duyck@gmail.com> wrote:
>>>> On 04/22/2015 10:45 AM, Cong Wang wrote:
>>>>> The second parameter of eth_get_headlen() is the length of
>>>>> the frame buffer, not the header length of skb.
>>>>>
>>>>> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>>>>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>>>>> ---
>>>>>  drivers/net/ethernet/intel/igb/igb_main.c | 4 +++-
>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>>>>> index a0a9b1f..7b3a370 100644
>>>>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>>>>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>>>>> @@ -6852,7 +6852,9 @@ static void igb_pull_tail(struct igb_ring *rx_ring,
>>>>>       /* we need the header to contain the greater of either ETH_HLEN or
>>>>>        * 60 bytes if the skb->len is less than 60 for skb_pad.
>>>>>        */
>>>>> -     pull_len = eth_get_headlen(va, IGB_RX_HDR_LEN);
>>>>> +     pull_len = eth_get_headlen(va, skb_frag_size(frag));
>>>>> +     if (unlikely(pull_len > IGB_RX_HDR_LEN))
>>>>> +             pull_len = IGB_RX_HDR_LEN;
>>>>>
>>>>>       /* align pull length to size of long to optimize memcpy performance */
>>>>>       skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
>>>> You have this part right.  The length represents the maximum length we
>>>> are willing to traverse in the buffer.  So if we 100% want to get the
>>>> entire header regardless of what we can copy into then we could follow
>>>> your approach.  However, since the allocated space in the skb is only
>>>> IGB_RX_HDR_LEN we only really want to traverse up to that length.  Then
>>>> that is all we copy out of the header.
>>> But the frag size could be smaller than IGB_RX_HDR_LEN which is
>>> the case this patch tries to fix.
>> No it can't.  We only add frags if the first frag is larger than
>> IGB_RX_HDR_LEN.  Go look at the code where this driver calls
>> add_rx_frag.  You should find there is copybreak code in there that
>> kicks in for anything less than or equal to IGB_RX_HDR_LEN in size on
>> the first frag.  It is there to allow us to avoid having to perform an
>> atomic inc/dec on the page.
>>
> First, make sure you don't miss the TSIP case right above:
>
> The frag starting pointer and its size are advanced by:
>
> skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
> ...
> va += IGB_TS_HDR_LEN;
>
> even though we unlikely pull header longer than
> IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.

So I believe this is a possible bug, one heck of a corner case to get
into though.  It requires timestamp in packet, size 240 - 256, and a
malformed header.

The proper fix would probably be to pull the timestamp out of the packet
before we add it to the frame.  I'll submit a patch to address this.

>
> Second, the check you mentioned above is:
>
> if ((size <= IGB_RX_HDR_LEN) && !skb_is_nonlinear(skb))
>
> skb is nonlinear _after_ the first igb_add_rx_frag(), a second
> igb_add_rx_frag() is possible since igb_is_non_eop() could
> return true.

I'm not sure this part makes any sense.  We pull the data out of the
first fragment always.  If skb_is_nonlinear is set then we should have
at least 2K - 16B in the case of igb.  We will never have a second
fragment without at least 2K of data being given in the first.

- Alex
Cong Wang April 22, 2015, 9:56 p.m. UTC | #5
On Wed, Apr 22, 2015 at 2:42 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On 04/22/2015 01:33 PM, Cong Wang wrote:
>> First, make sure you don't miss the TSIP case right above:
>>
>> The frag starting pointer and its size are advanced by:
>>
>> skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
>> ...
>> va += IGB_TS_HDR_LEN;
>>
>> even though we unlikely pull header longer than
>> IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.
>
> So I believe this is a possible bug, one heck of a corner case to get
> into though.  It requires timestamp in packet, size 240 - 256, and a
> malformed header.
>
> The proper fix would probably be to pull the timestamp out of the packet
> before we add it to the frame.  I'll submit a patch to address this.
>


Huh? Doesn't my patch already fix this? skb_frag_size() is always
up to date. Or you mean another different problem?

>>
>> Second, the check you mentioned above is:
>>
>> if ((size <= IGB_RX_HDR_LEN) && !skb_is_nonlinear(skb))
>>
>> skb is nonlinear _after_ the first igb_add_rx_frag(), a second
>> igb_add_rx_frag() is possible since igb_is_non_eop() could
>> return true.
>
> I'm not sure this part makes any sense.  We pull the data out of the
> first fragment always.  If skb_is_nonlinear is set then we should have
> at least 2K - 16B in the case of igb.  We will never have a second
> fragment without at least 2K of data being given in the first.

Apparently my igb knowledge isn't enough to verify this, I just did
logical analysis.
Alexander H Duyck April 22, 2015, 10:34 p.m. UTC | #6
On 04/22/2015 02:56 PM, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 2:42 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On 04/22/2015 01:33 PM, Cong Wang wrote:
>>> First, make sure you don't miss the TSIP case right above:
>>>
>>> The frag starting pointer and its size are advanced by:
>>>
>>> skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
>>> ...
>>> va += IGB_TS_HDR_LEN;
>>>
>>> even though we unlikely pull header longer than
>>> IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.
>> So I believe this is a possible bug, one heck of a corner case to get
>> into though.  It requires timestamp in packet, size 240 - 256, and a
>> malformed header.
>>
>> The proper fix would probably be to pull the timestamp out of the packet
>> before we add it to the frame.  I'll submit a patch to address this.
>>
>
> Huh? Doesn't my patch already fix this? skb_frag_size() is always
> up to date. Or you mean another different problem?

Your patch has other issues.  I am still NAKing your patch, but there is
an issue with igb that you have pointed out.  The proper fix would be to
deal with the timestamp before we add the page fragment to the skb.

>
>>> Second, the check you mentioned above is:
>>>
>>> if ((size <= IGB_RX_HDR_LEN) && !skb_is_nonlinear(skb))
>>>
>>> skb is nonlinear _after_ the first igb_add_rx_frag(), a second
>>> igb_add_rx_frag() is possible since igb_is_non_eop() could
>>> return true.
>> I'm not sure this part makes any sense.  We pull the data out of the
>> first fragment always.  If skb_is_nonlinear is set then we should have
>> at least 2K - 16B in the case of igb.  We will never have a second
>> fragment without at least 2K of data being given in the first.
> Apparently my igb knowledge isn't enough to verify this, I just did
> logical analysis.

The logic with igb is that if skb_is_nonlinear it means that a page has
already been added.  The way the hardware works is that it will break a
frame up into 2K segments until the entire frame is written.  So the
only way to get a second page fragment is if the first has used the
entire 2K buffer it was given.

- Alex
Cong Wang April 22, 2015, 11:23 p.m. UTC | #7
On Wed, Apr 22, 2015 at 3:34 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On 04/22/2015 02:56 PM, Cong Wang wrote:
>> On Wed, Apr 22, 2015 at 2:42 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On 04/22/2015 01:33 PM, Cong Wang wrote:
>>>> First, make sure you don't miss the TSIP case right above:
>>>>
>>>> The frag starting pointer and its size are advanced by:
>>>>
>>>> skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
>>>> ...
>>>> va += IGB_TS_HDR_LEN;
>>>>
>>>> even though we unlikely pull header longer than
>>>> IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.
>>> So I believe this is a possible bug, one heck of a corner case to get
>>> into though.  It requires timestamp in packet, size 240 - 256, and a
>>> malformed header.
>>>
>>> The proper fix would probably be to pull the timestamp out of the packet
>>> before we add it to the frame.  I'll submit a patch to address this.
>>>
>>
>> Huh? Doesn't my patch already fix this? skb_frag_size() is always
>> up to date. Or you mean another different problem?
>
> Your patch has other issues.  I am still NAKing your patch, but there is
> an issue with igb that you have pointed out.  The proper fix would be to
> deal with the timestamp before we add the page fragment to the skb.
>

If the first frag is always 2K, then this is not a problem either.
IGB_TS_HDR_LEN + IGB_RX_HDR_LEN < 2K.


>>
>>>> Second, the check you mentioned above is:
>>>>
>>>> if ((size <= IGB_RX_HDR_LEN) && !skb_is_nonlinear(skb))
>>>>
>>>> skb is nonlinear _after_ the first igb_add_rx_frag(), a second
>>>> igb_add_rx_frag() is possible since igb_is_non_eop() could
>>>> return true.
>>> I'm not sure this part makes any sense.  We pull the data out of the
>>> first fragment always.  If skb_is_nonlinear is set then we should have
>>> at least 2K - 16B in the case of igb.  We will never have a second
>>> fragment without at least 2K of data being given in the first.
>> Apparently my igb knowledge isn't enough to verify this, I just did
>> logical analysis.
>
> The logic with igb is that if skb_is_nonlinear it means that a page has
> already been added.  The way the hardware works is that it will break a
> frame up into 2K segments until the entire frame is written.  So the
> only way to get a second page fragment is if the first has used the
> entire 2K buffer it was given.

So the first frag is always 2K, at least more than IGB_RX_HDR_LEN
then we are fine even for TSIP case.

The code looks correct to me now, except it is suspicious skb->len
is not updated after skb_copy_to_linear_data() while skb->tail is
advanced already. I need to think more before submitting a patch.
Alexander H Duyck April 23, 2015, 3:40 a.m. UTC | #8
On 04/22/2015 04:23 PM, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 3:34 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On 04/22/2015 02:56 PM, Cong Wang wrote:
>>> On Wed, Apr 22, 2015 at 2:42 PM, Alexander Duyck
>>> <alexander.duyck@gmail.com> wrote:
>>>> On 04/22/2015 01:33 PM, Cong Wang wrote:
>>>>> First, make sure you don't miss the TSIP case right above:
>>>>>
>>>>> The frag starting pointer and its size are advanced by:
>>>>>
>>>>> skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
>>>>> ...
>>>>> va += IGB_TS_HDR_LEN;
>>>>>
>>>>> even though we unlikely pull header longer than
>>>>> IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.
>>>> So I believe this is a possible bug, one heck of a corner case to get
>>>> into though.  It requires timestamp in packet, size 240 - 256, and a
>>>> malformed header.
>>>>
>>>> The proper fix would probably be to pull the timestamp out of the packet
>>>> before we add it to the frame.  I'll submit a patch to address this.
>>>>
>>> Huh? Doesn't my patch already fix this? skb_frag_size() is always
>>> up to date. Or you mean another different problem?
>> Your patch has other issues.  I am still NAKing your patch, but there is
>> an issue with igb that you have pointed out.  The proper fix would be to
>> deal with the timestamp before we add the page fragment to the skb.
>>
> If the first frag is always 2K, then this is not a problem either.
> IGB_TS_HDR_LEN + IGB_RX_HDR_LEN < 2K.

The problem is skb->tail will get screwed up.

>
>>>>> Second, the check you mentioned above is:
>>>>>
>>>>> if ((size <= IGB_RX_HDR_LEN) && !skb_is_nonlinear(skb))
>>>>>
>>>>> skb is nonlinear _after_ the first igb_add_rx_frag(), a second
>>>>> igb_add_rx_frag() is possible since igb_is_non_eop() could
>>>>> return true.
>>>> I'm not sure this part makes any sense.  We pull the data out of the
>>>> first fragment always.  If skb_is_nonlinear is set then we should have
>>>> at least 2K - 16B in the case of igb.  We will never have a second
>>>> fragment without at least 2K of data being given in the first.
>>> Apparently my igb knowledge isn't enough to verify this, I just did
>>> logical analysis.
>> The logic with igb is that if skb_is_nonlinear it means that a page has
>> already been added.  The way the hardware works is that it will break a
>> frame up into 2K segments until the entire frame is written.  So the
>> only way to get a second page fragment is if the first has used the
>> entire 2K buffer it was given.
> So the first frag is always 2K, at least more than IGB_RX_HDR_LEN
> then we are fine even for TSIP case.
>
> The code looks correct to me now, except it is suspicious skb->len
> is not updated after skb_copy_to_linear_data() while skb->tail is
> advanced already. I need to think more before submitting a patch.

The first frag will be 2K in size only if there are multiple frags.  The
problem in the TSIP case is that we will put the size reported by the
descriptor into the fragment, and then try to pull the size we
determined via eth_get_headlen.  So there is a window where that value
can be wrong and result in data_len going negative and tail being moved
beyond the end of the actual data.

- Alex
Cong Wang April 23, 2015, 6:06 p.m. UTC | #9
On Wed, Apr 22, 2015 at 8:40 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On 04/22/2015 04:23 PM, Cong Wang wrote:
>> On Wed, Apr 22, 2015 at 3:34 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On 04/22/2015 02:56 PM, Cong Wang wrote:
>>>> On Wed, Apr 22, 2015 at 2:42 PM, Alexander Duyck
>>>> <alexander.duyck@gmail.com> wrote:
>>>>> On 04/22/2015 01:33 PM, Cong Wang wrote:
>>>>>> First, make sure you don't miss the TSIP case right above:
>>>>>>
>>>>>> The frag starting pointer and its size are advanced by:
>>>>>>
>>>>>> skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
>>>>>> ...
>>>>>> va += IGB_TS_HDR_LEN;
>>>>>>
>>>>>> even though we unlikely pull header longer than
>>>>>> IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.
>>>>> So I believe this is a possible bug, one heck of a corner case to get
>>>>> into though.  It requires timestamp in packet, size 240 - 256, and a
>>>>> malformed header.
>>>>>
>>>>> The proper fix would probably be to pull the timestamp out of the packet
>>>>> before we add it to the frame.  I'll submit a patch to address this.
>>>>>
>>>> Huh? Doesn't my patch already fix this? skb_frag_size() is always
>>>> up to date. Or you mean another different problem?
>>> Your patch has other issues.  I am still NAKing your patch, but there is
>>> an issue with igb that you have pointed out.  The proper fix would be to
>>> deal with the timestamp before we add the page fragment to the skb.
>>>
>> If the first frag is always 2K, then this is not a problem either.
>> IGB_TS_HDR_LEN + IGB_RX_HDR_LEN < 2K.
>
> The problem is skb->tail will get screwed up.
>

Why? We don't copy timestamp into skb header, instead it is saved
in shared_info, why skb->tail matters here in TSIP case?


[...]

>
> The first frag will be 2K in size only if there are multiple frags.  The


Or it doesn't have frags at all since we just copy it to skb header, right?


> problem in the TSIP case is that we will put the size reported by the
> descriptor into the fragment, and then try to pull the size we


That size is saved when adding the frag, in TSIP case we just sub it
by IGB_TS_HDR_LEN, which seems correct.


> determined via eth_get_headlen.  So there is a window where that value
> can be wrong and result in data_len going negative and tail being moved
> beyond the end of the actual data.

This sounds like a different problem if you are saying we should sub
the size in the descriptor too. Therefore I don't see why your patch
could replace mine.
Alexander H Duyck April 23, 2015, 6:40 p.m. UTC | #10
On 04/23/2015 11:06 AM, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 8:40 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On 04/22/2015 04:23 PM, Cong Wang wrote:
>>> On Wed, Apr 22, 2015 at 3:34 PM, Alexander Duyck
>>> <alexander.duyck@gmail.com> wrote:
>>>> On 04/22/2015 02:56 PM, Cong Wang wrote:
>>>>> On Wed, Apr 22, 2015 at 2:42 PM, Alexander Duyck
>>>>> <alexander.duyck@gmail.com> wrote:
>>>>>> On 04/22/2015 01:33 PM, Cong Wang wrote:
>>>>>>> First, make sure you don't miss the TSIP case right above:
>>>>>>>
>>>>>>> The frag starting pointer and its size are advanced by:
>>>>>>>
>>>>>>> skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
>>>>>>> ...
>>>>>>> va += IGB_TS_HDR_LEN;
>>>>>>>
>>>>>>> even though we unlikely pull header longer than
>>>>>>> IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.
>>>>>> So I believe this is a possible bug, one heck of a corner case to get
>>>>>> into though.  It requires timestamp in packet, size 240 - 256, and a
>>>>>> malformed header.
>>>>>>
>>>>>> The proper fix would probably be to pull the timestamp out of the packet
>>>>>> before we add it to the frame.  I'll submit a patch to address this.
>>>>>>
>>>>> Huh? Doesn't my patch already fix this? skb_frag_size() is always
>>>>> up to date. Or you mean another different problem?
>>>> Your patch has other issues.  I am still NAKing your patch, but there is
>>>> an issue with igb that you have pointed out.  The proper fix would be to
>>>> deal with the timestamp before we add the page fragment to the skb.
>>>>
>>> If the first frag is always 2K, then this is not a problem either.
>>> IGB_TS_HDR_LEN + IGB_RX_HDR_LEN < 2K.
>> The problem is skb->tail will get screwed up.
>>
> Why? We don't copy timestamp into skb header, instead it is saved
> in shared_info, why skb->tail matters here in TSIP case?

TSIP only matters if you have a network header with bad data in it that
indicates the size is actually larger than it actually is.

>> The first frag will be 2K in size only if there are multiple frags.  The
>
> Or it doesn't have frags at all since we just copy it to skb header, right?

That is moot since this code only gets called if the skb is nonlinear

>> problem in the TSIP case is that we will put the size reported by the
>> descriptor into the fragment, and then try to pull the size we
>
> That size is saved when adding the frag, in TSIP case we just sub it
> by IGB_TS_HDR_LEN, which seems correct.
>

Yes, the size is saved.  But with your solution we could pull the whole
fragment but not free it which isn't correct as we shouldn't be left
with any 0 sized fragments.

>> determined via eth_get_headlen.  So there is a window where that value
>> can be wrong and result in data_len going negative and tail being moved
>> beyond the end of the actual data.
> This sounds like a different problem if you are saying we should sub
> the size in the descriptor too. Therefore I don't see why your patch
> could replace mine.

Your patches sort-of fixed the problem, but they introduced other issues
in the process.

The thing I don't think you are getting is that the code was meant to be
mutually exclusive w/ the copy-break code.  Either the frag is less than
IGB_RX_HDR_SIZE in which case we copy-break and don't attached the
fragment, or we should be pulling the header and leaving at least 1 byte
in the fragment.  The problem with your solution is that you potentially
pull the entire fragment in, but you don't free it if the size drops to
0.  That is why in the other mail I told you the better solution for igb
would likely be to just pass IGB_RX_HDR_SIZE - IGB_TS_HDR_SIZE as that
way you end up with at least 1 byte left in the fragment after you pull
the header.

- Alex
Cong Wang April 23, 2015, 7:14 p.m. UTC | #11
On Thu, Apr 23, 2015 at 11:40 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On 04/23/2015 11:06 AM, Cong Wang wrote:
>> On Wed, Apr 22, 2015 at 8:40 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On 04/22/2015 04:23 PM, Cong Wang wrote:
>>>> On Wed, Apr 22, 2015 at 3:34 PM, Alexander Duyck
>>>> <alexander.duyck@gmail.com> wrote:
>>>>> On 04/22/2015 02:56 PM, Cong Wang wrote:
>>>>>> On Wed, Apr 22, 2015 at 2:42 PM, Alexander Duyck
>>>>>> <alexander.duyck@gmail.com> wrote:
>>>>>>> On 04/22/2015 01:33 PM, Cong Wang wrote:
>>>>>>>> First, make sure you don't miss the TSIP case right above:
>>>>>>>>
>>>>>>>> The frag starting pointer and its size are advanced by:
>>>>>>>>
>>>>>>>> skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
>>>>>>>> ...
>>>>>>>> va += IGB_TS_HDR_LEN;
>>>>>>>>
>>>>>>>> even though we unlikely pull header longer than
>>>>>>>> IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.
>>>>>>> So I believe this is a possible bug, one heck of a corner case to get
>>>>>>> into though.  It requires timestamp in packet, size 240 - 256, and a
>>>>>>> malformed header.
>>>>>>>
>>>>>>> The proper fix would probably be to pull the timestamp out of the packet
>>>>>>> before we add it to the frame.  I'll submit a patch to address this.
>>>>>>>
>>>>>> Huh? Doesn't my patch already fix this? skb_frag_size() is always
>>>>>> up to date. Or you mean another different problem?
>>>>> Your patch has other issues.  I am still NAKing your patch, but there is
>>>>> an issue with igb that you have pointed out.  The proper fix would be to
>>>>> deal with the timestamp before we add the page fragment to the skb.
>>>>>
>>>> If the first frag is always 2K, then this is not a problem either.
>>>> IGB_TS_HDR_LEN + IGB_RX_HDR_LEN < 2K.
>>> The problem is skb->tail will get screwed up.
>>>
>> Why? We don't copy timestamp into skb header, instead it is saved
>> in shared_info, why skb->tail matters here in TSIP case?
>
> TSIP only matters if you have a network header with bad data in it that
> indicates the size is actually larger than it actually is.


OK.

>
>>> The first frag will be 2K in size only if there are multiple frags.  The
>>
>> Or it doesn't have frags at all since we just copy it to skb header, right?
>
> That is moot since this code only gets called if the skb is nonlinear
>
>>> problem in the TSIP case is that we will put the size reported by the
>>> descriptor into the fragment, and then try to pull the size we
>>
>> That size is saved when adding the frag, in TSIP case we just sub it
>> by IGB_TS_HDR_LEN, which seems correct.
>>
>
> Yes, the size is saved.  But with your solution we could pull the whole
> fragment but not free it which isn't correct as we shouldn't be left
> with any 0 sized fragments.


Good point, then TSIP case should check frag_size == 0 case,
the rest handles 0-size case correctly, see below.

>
>>> determined via eth_get_headlen.  So there is a window where that value
>>> can be wrong and result in data_len going negative and tail being moved
>>> beyond the end of the actual data.
>> This sounds like a different problem if you are saying we should sub
>> the size in the descriptor too. Therefore I don't see why your patch
>> could replace mine.
>
> Your patches sort-of fixed the problem, but they introduced other issues
> in the process.
>
> The thing I don't think you are getting is that the code was meant to be
> mutually exclusive w/ the copy-break code.  Either the frag is less than
> IGB_RX_HDR_SIZE in which case we copy-break and don't attached the
> fragment, or we should be pulling the header and leaving at least 1 byte
> in the fragment.  The problem with your solution is that you potentially
> pull the entire fragment in, but you don't free it if the size drops to


Only if it is a malformed packet, and it is still safe as long as we don't
run over the buffer size. Also I have a upper limit check for
IGB_RX_HDR_SIZE after eth_get_headlen().

> 0.  That is why in the other mail I told you the better solution for igb

I am sure eth_get_headlen() handles 0 case correctly, it simply returns 0.

> would likely be to just pass IGB_RX_HDR_SIZE - IGB_TS_HDR_SIZE as that
> way you end up with at least 1 byte left in the fragment after you pull
> the header.
>

The code should already handle pull_len==0 correctly.
Kirsher, Jeffrey T April 25, 2015, 1:07 a.m. UTC | #12
On Wed, 2015-04-22 at 10:45 -0700, Cong Wang wrote:
> The second parameter of eth_get_headlen() is the length of
> the frame buffer, not the header length of skb.
> 
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Dropping this patch based on feedback from Alex.
diff mbox

Patch

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index a0a9b1f..7b3a370 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6852,7 +6852,9 @@  static void igb_pull_tail(struct igb_ring *rx_ring,
 	/* we need the header to contain the greater of either ETH_HLEN or
 	 * 60 bytes if the skb->len is less than 60 for skb_pad.
 	 */
-	pull_len = eth_get_headlen(va, IGB_RX_HDR_LEN);
+	pull_len = eth_get_headlen(va, skb_frag_size(frag));
+	if (unlikely(pull_len > IGB_RX_HDR_LEN))
+		pull_len = IGB_RX_HDR_LEN;
 
 	/* align pull length to size of long to optimize memcpy performance */
 	skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));