diff mbox

[3/3] net: fec: align IP header in hardware

Message ID 1474728139-9335-4-git-send-email-eric@nelint.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Nelson Sept. 24, 2016, 2:42 p.m. UTC
The FEC receive accelerator (RACC) supports shifting the data payload of
received packets by 16-bits, which aligns the payload (IP header) on a
4-byte boundary, which is, if not required, at least strongly suggested
by the Linux networking layer.

Without this patch, a huge number of alignment faults will be taken by the
IP stack, as seen in /proc/cpu/alignment:

	~/$ cat /proc/cpu/alignment
	User:		0
	System:		72645 (inet_gro_receive+0x104/0x27c)
	Skipped:	0
	Half:		0
	Word:		0
	DWord:		0
	Multi:		72645
	User faults:	3 (fixup+warn)

This patch was suggested by Andrew Lunn in this message to linux-netdev:
	http://marc.info/?l=linux-arm-kernel&m=147465452108384&w=2

and adapted from a patch by Russell King from 2014:
	http://git.arm.linux.org.uk/cgit/linux-arm.git/commit/?id=70d8a8a

Signed-off-by: Eric Nelson <eric@nelint.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

David Laight Sept. 26, 2016, 9:26 a.m. UTC | #1
From: Eric Nelson
> Sent: 24 September 2016 15:42
> The FEC receive accelerator (RACC) supports shifting the data payload of
> received packets by 16-bits, which aligns the payload (IP header) on a
> 4-byte boundary, which is, if not required, at least strongly suggested
> by the Linux networking layer.
...
> +		/* align IP header */
> +		val |= FEC_RACC_SHIFT16;

I can't help feeling that there needs to be corresponding
changes to increase the buffer size by 2 (maybe for large mtu)
and to discard two bytes from the frame length.

If probably ought to be predicated on NET_IP_ALIGN as well.

	David
Eric Nelson Sept. 26, 2016, 6:39 p.m. UTC | #2
Hi David,

On 09/26/2016 02:26 AM, David Laight wrote:
> From: Eric Nelson
>> Sent: 24 September 2016 15:42
>> The FEC receive accelerator (RACC) supports shifting the data payload of
>> received packets by 16-bits, which aligns the payload (IP header) on a
>> 4-byte boundary, which is, if not required, at least strongly suggested
>> by the Linux networking layer.
> ...
>> +		/* align IP header */
>> +		val |= FEC_RACC_SHIFT16;
> 
> I can't help feeling that there needs to be corresponding
> changes to increase the buffer size by 2 (maybe for large mtu)
> and to discard two bytes from the frame length.
> 

In the normal case, the fec driver over-allocates all receive packets to
be of size FEC_ENET_RX_FRSIZE (2048) minus the value of rx_align,
which is either 0x0f (ARM) or 0x03 (PPC).

If the frame length is less than rx_copybreak (typically 256), then
the frame length from the receive buffer descriptor is used to
control the allocation size for a copied buffer, and this will include
the two bytes of padding if RACC_SHIFT16 is set.

> If probably ought to be predicated on NET_IP_ALIGN as well.
> 
Can you elaborate?

Please advise,


Eric
David Laight Sept. 28, 2016, 4:42 p.m. UTC | #3
From: Eric Nelson
> Sent: 26 September 2016 19:40
> Hi David,
> 
> On 09/26/2016 02:26 AM, David Laight wrote:
> > From: Eric Nelson
> >> Sent: 24 September 2016 15:42
> >> The FEC receive accelerator (RACC) supports shifting the data payload of
> >> received packets by 16-bits, which aligns the payload (IP header) on a
> >> 4-byte boundary, which is, if not required, at least strongly suggested
> >> by the Linux networking layer.
> > ...
> >> +		/* align IP header */
> >> +		val |= FEC_RACC_SHIFT16;
> >
> > I can't help feeling that there needs to be corresponding
> > changes to increase the buffer size by 2 (maybe for large mtu)
> > and to discard two bytes from the frame length.
> >
> 
> In the normal case, the fec driver over-allocates all receive packets to
> be of size FEC_ENET_RX_FRSIZE (2048) minus the value of rx_align,
> which is either 0x0f (ARM) or 0x03 (PPC).
> 
> If the frame length is less than rx_copybreak (typically 256), then
> the frame length from the receive buffer descriptor is used to
> control the allocation size for a copied buffer, and this will include
> the two bytes of padding if RACC_SHIFT16 is set.
> 
> > If probably ought to be predicated on NET_IP_ALIGN as well.
> >
> Can you elaborate?

From reading this it seems that the effect of FEC_RACC_SHIFT16 is to
add two bytes of 'junk' to the start of every receive frame.

In the 'copybreak' case the new skb would need to be 2 bytes shorter
than the length reported by the hardware, and the data copied from
2 bytes into the dma buffer.

The extra 2 bytes also mean the that maximum mtu that can be received
into a buffer is two bytes less.
If someone sets the mtu to (say) 9k for jumbo frames this might matter.
Even with fixed 2048 byte buffers it reduces the maximum value the mtu
can be set to by 2.

Now if NET_IP_ALIGN is zero then it is fine for the rx frame to start
on a 4n boundary, and the skb are likely to be allocated that way.
In this case you don't want to extra two bytes of 'junk'.

OTOH if NET_IP_ALIGN is 2 then you need to 'fiddle' things so that
the data is dma'd to offset -2 in the skb and then ensure that the
end of frame is set correctly.

	David
Eric Nelson Sept. 28, 2016, 5:14 p.m. UTC | #4
Thanks David,

On 09/28/2016 09:42 AM, David Laight wrote:
> From: Eric Nelson
>> Sent: 26 September 2016 19:40
>> Hi David,
>>
>> On 09/26/2016 02:26 AM, David Laight wrote:
>>> From: Eric Nelson
>>>> Sent: 24 September 2016 15:42
>>>> The FEC receive accelerator (RACC) supports shifting the data payload of
>>>> received packets by 16-bits, which aligns the payload (IP header) on a
>>>> 4-byte boundary, which is, if not required, at least strongly suggested
>>>> by the Linux networking layer.
>>> ...
>>>> +		/* align IP header */
>>>> +		val |= FEC_RACC_SHIFT16;
>>>
>>> I can't help feeling that there needs to be corresponding
>>> changes to increase the buffer size by 2 (maybe for large mtu)
>>> and to discard two bytes from the frame length.
>>>
>>
>> In the normal case, the fec driver over-allocates all receive packets to
>> be of size FEC_ENET_RX_FRSIZE (2048) minus the value of rx_align,
>> which is either 0x0f (ARM) or 0x03 (PPC).
>>
>> If the frame length is less than rx_copybreak (typically 256), then
>> the frame length from the receive buffer descriptor is used to
>> control the allocation size for a copied buffer, and this will include
>> the two bytes of padding if RACC_SHIFT16 is set.
>>
>>> If probably ought to be predicated on NET_IP_ALIGN as well.
>>>
>> Can you elaborate?
> 
> From reading this it seems that the effect of FEC_RACC_SHIFT16 is to
> add two bytes of 'junk' to the start of every receive frame.
> 

That's right. Two bytes of junk between the MAC header and the
IP header.

> In the 'copybreak' case the new skb would need to be 2 bytes shorter
> than the length reported by the hardware, and the data copied from
> 2 bytes into the dma buffer.
> 

As it stands, the skb allocated by the copybreak routine will include
the two bytes of padding, and the call to skb_pull_inline will ignore
them.

> The extra 2 bytes also mean the that maximum mtu that can be received
> into a buffer is two bytes less.
>

Right, but I think the max is already high enough that this isn't a
problem.

> If someone sets the mtu to (say) 9k for jumbo frames this might matter.
> Even with fixed 2048 byte buffers it reduces the maximum value the mtu
> can be set to by 2.
> 

As far as I can tell, the fec driver doesn't support jumbo frames, and
the max frame length is currently hard-coded at PKT_MAXBUF_SIZE (1522).

This is well within the 2048-byte allocation, even with optional headers
for VLAN etc.

> Now if NET_IP_ALIGN is zero then it is fine for the rx frame to start
> on a 4n boundary, and the skb are likely to be allocated that way.
> In this case you don't want to extra two bytes of 'junk'.
> 
NET_IP_ALIGN is defaulting to 2 by the conditional in skbuff.h

> OTOH if NET_IP_ALIGN is 2 then you need to 'fiddle' things so that
> the data is dma'd to offset -2 in the skb and then ensure that the
> end of frame is set correctly.
> 

That's what the RACC SHIFT16 bit does.

The FEC hardware isn't capable of DMA'ing to an un-aligned address.
On ARM, it requires 64-bit alignment, but suggests 128-bit alignment.

On other (PPC?) architectures, it requires 32-bit alignment. This is
handled by the rx_align field.

Regards,


Eric
Russell King (Oracle) Sept. 28, 2016, 5:25 p.m. UTC | #5
On Wed, Sep 28, 2016 at 10:14:52AM -0700, Eric Nelson wrote:
> Thanks David,
> 
> On 09/28/2016 09:42 AM, David Laight wrote:
> > From reading this it seems that the effect of FEC_RACC_SHIFT16 is to
> > add two bytes of 'junk' to the start of every receive frame.
> 
> That's right. Two bytes of junk between the MAC header and the
> IP header.

That's wrong.  FEC_RACC_SHIFT16 adds two bytes to the _beginning_ of
the packet, not in the middle of the packet:

   7      RX FIFO Shift-16
 SHIFT16
          When this field is set, the actual frame data starts at bit 16
          of the first word read from the RX FIFO aligning the Ethernet
          payload on a 32-bit boundary.

          NOTE: This function only affects the FIFO storage and has no
                influence on the statistics, which use the actual length
                of the frame received.

          0    Disabled.
          1    Instructs the MAC to write two additional bytes in front
               of each frame received into the RX FIFO.

*in front* of the frame - that's before the Ethernet header.  Not between
the ethernet and IP headers.
Eric Nelson Sept. 28, 2016, 6:01 p.m. UTC | #6
Thanks Russell,

On 09/28/2016 10:25 AM, Russell King - ARM Linux wrote:
> On Wed, Sep 28, 2016 at 10:14:52AM -0700, Eric Nelson wrote:
>> Thanks David,
>>
>> On 09/28/2016 09:42 AM, David Laight wrote:
>>> From reading this it seems that the effect of FEC_RACC_SHIFT16 is to
>>> add two bytes of 'junk' to the start of every receive frame.
>>
>> That's right. Two bytes of junk between the MAC header and the
>> IP header.
> 
> That's wrong.  FEC_RACC_SHIFT16 adds two bytes to the _beginning_ of
> the packet, not in the middle of the packet:
> 
>    7      RX FIFO Shift-16
>  SHIFT16
>           When this field is set, the actual frame data starts at bit 16
>           of the first word read from the RX FIFO aligning the Ethernet
>           payload on a 32-bit boundary.
> 
>           NOTE: This function only affects the FIFO storage and has no
>                 influence on the statistics, which use the actual length
>                 of the frame received.
> 
>           0    Disabled.
>           1    Instructs the MAC to write two additional bytes in front
>                of each frame received into the RX FIFO.
> 
> *in front* of the frame - that's before the Ethernet header.  Not between
> the ethernet and IP headers.
> 

I obviously mis-read this, and haven't dumped any packets to straighten
myself out.
David Laight Sept. 29, 2016, 11:07 a.m. UTC | #7
From: Eric Nelson
> Sent: 28 September 2016 18:15
> On 09/28/2016 09:42 AM, David Laight wrote:
> > From: Eric Nelson
> >> Sent: 26 September 2016 19:40
> >> Hi David,
> >>
> >> On 09/26/2016 02:26 AM, David Laight wrote:
> >>> From: Eric Nelson
> >>>> Sent: 24 September 2016 15:42
> >>>> The FEC receive accelerator (RACC) supports shifting the data payload of
> >>>> received packets by 16-bits, which aligns the payload (IP header) on a
> >>>> 4-byte boundary, which is, if not required, at least strongly suggested
> >>>> by the Linux networking layer.
> >>> ...
> >>>> +		/* align IP header */
> >>>> +		val |= FEC_RACC_SHIFT16;
> >>>
> >>> I can't help feeling that there needs to be corresponding
> >>> changes to increase the buffer size by 2 (maybe for large mtu)
> >>> and to discard two bytes from the frame length.
> >>>
> >>
> >> In the normal case, the fec driver over-allocates all receive packets to
> >> be of size FEC_ENET_RX_FRSIZE (2048) minus the value of rx_align,
> >> which is either 0x0f (ARM) or 0x03 (PPC).
> >>
> >> If the frame length is less than rx_copybreak (typically 256), then
> >> the frame length from the receive buffer descriptor is used to
> >> control the allocation size for a copied buffer, and this will include
> >> the two bytes of padding if RACC_SHIFT16 is set.
> >>
> >>> If probably ought to be predicated on NET_IP_ALIGN as well.
> >>>
> >> Can you elaborate?
> >
> > From reading this it seems that the effect of FEC_RACC_SHIFT16 is to
> > add two bytes of 'junk' to the start of every receive frame.
> >
> 
> That's right. Two bytes of junk between the MAC header and the
> IP header.
> 
> > In the 'copybreak' case the new skb would need to be 2 bytes shorter
> > than the length reported by the hardware, and the data copied from
> > 2 bytes into the dma buffer.
> >
> 
> As it stands, the skb allocated by the copybreak routine will include
> the two bytes of padding, and the call to skb_pull_inline will ignore
> them.

Ok, I didn't see that call being added by this patch.

> > The extra 2 bytes also mean the that maximum mtu that can be received
> > into a buffer is two bytes less.
> >
> 
> Right, but I think the max is already high enough that this isn't a
> problem.
> 
> > If someone sets the mtu to (say) 9k for jumbo frames this might matter.
> > Even with fixed 2048 byte buffers it reduces the maximum value the mtu
> > can be set to by 2.
> >
> 
> As far as I can tell, the fec driver doesn't support jumbo frames, and
> the max frame length is currently hard-coded at PKT_MAXBUF_SIZE (1522).
> 
> This is well within the 2048-byte allocation, even with optional headers
> for VLAN etc.

Hmm...
That (probably) means all the skb the driver allocates are actually 4k.
It would be much better to reduce the size so that the entire skb
(with packet buffer) is less than 2k.

> > Now if NET_IP_ALIGN is zero then it is fine for the rx frame to start
> > on a 4n boundary, and the skb are likely to be allocated that way.
> > In this case you don't want to extra two bytes of 'junk'.
> >
> NET_IP_ALIGN is defaulting to 2 by the conditional in skbuff.h

Even though it is always currently set is isn't really ideal to have
a driver that breaks if it isn't set.
This could easily happen at some point in the future if the ethernet
logic is put with a different cpu.


> > OTOH if NET_IP_ALIGN is 2 then you need to 'fiddle' things so that
> > the data is dma'd to offset -2 in the skb and then ensure that the
> > end of frame is set correctly.
> >
> 
> That's what the RACC SHIFT16 bit does.

No, that causes the ethernet controller to add 2 bytes to the frame.
You then need to change the dma target address to match.
Otherwise if a new version of the silicon stops ignoring the low
address with the frame will be misaligned in the buffer.

The receive frame length will also (probably) be 2 larger than the
actual frame - so you need to set the end point correctly as well.
IP will probably ignore the 2 bytes of pad I think you are generating.

> The FEC hardware isn't capable of DMA'ing to an un-aligned address.
> On ARM, it requires 64-bit alignment, but suggests 128-bit alignment.
> 
> On other (PPC?) architectures, it requires 32-bit alignment. This is
> handled by the rx_align field.

That isn't entirely relevant.
If the kernel is being built with NET_IP_ALIGN set to 0 you should
align the destination mac address on a 4n boundary
(Or rather the skb are likely to be allocated that way).
If it causes misaligned memory reads later on that is a different problem.
The MAC driver has aligned the frames as it was told to.

	David
Eric Nelson Sept. 30, 2016, 1:27 p.m. UTC | #8
Thanks for the feedback David,

On 09/29/2016 04:07 AM, David Laight wrote:
> From: Eric Nelson
>> Sent: 28 September 2016 18:15
>> On 09/28/2016 09:42 AM, David Laight wrote:
>>> From: Eric Nelson
>>>> Sent: 26 September 2016 19:40
>>>> Hi David,
>>>>
>>>> On 09/26/2016 02:26 AM, David Laight wrote:
>>>>> From: Eric Nelson
>>>>>> Sent: 24 September 2016 15:42
>>>>>> The FEC receive accelerator (RACC) supports shifting the data payload of
>>>>>> received packets by 16-bits, which aligns the payload (IP header) on a
>>>>>> 4-byte boundary, which is, if not required, at least strongly suggested
>>>>>> by the Linux networking layer.
>>>>> ...
>>>>>> +		/* align IP header */
>>>>>> +		val |= FEC_RACC_SHIFT16;
>>>>>
>>>>> I can't help feeling that there needs to be corresponding
>>>>> changes to increase the buffer size by 2 (maybe for large mtu)
>>>>> and to discard two bytes from the frame length.
>>>>>
>>>>
>>>> In the normal case, the fec driver over-allocates all receive packets to
>>>> be of size FEC_ENET_RX_FRSIZE (2048) minus the value of rx_align,
>>>> which is either 0x0f (ARM) or 0x03 (PPC).
>>>>
>>>> If the frame length is less than rx_copybreak (typically 256), then
>>>> the frame length from the receive buffer descriptor is used to
>>>> control the allocation size for a copied buffer, and this will include
>>>> the two bytes of padding if RACC_SHIFT16 is set.
>>>>
>>>>> If probably ought to be predicated on NET_IP_ALIGN as well.
>>>>>
>>>> Can you elaborate?
>>>
>>> From reading this it seems that the effect of FEC_RACC_SHIFT16 is to
>>> add two bytes of 'junk' to the start of every receive frame.
>>>
>>
>> That's right. Two bytes of junk between the MAC header and the
>> IP header.
>>
>>> In the 'copybreak' case the new skb would need to be 2 bytes shorter
>>> than the length reported by the hardware, and the data copied from
>>> 2 bytes into the dma buffer.
>>>
>>
>> As it stands, the skb allocated by the copybreak routine will include
>> the two bytes of padding, and the call to skb_pull_inline will ignore
>> them.
> 
> Ok, I didn't see that call being added by this patch.
> 
>>> The extra 2 bytes also mean the that maximum mtu that can be received
>>> into a buffer is two bytes less.
>>>
>>
>> Right, but I think the max is already high enough that this isn't a
>> problem.
>>
>>> If someone sets the mtu to (say) 9k for jumbo frames this might matter.
>>> Even with fixed 2048 byte buffers it reduces the maximum value the mtu
>>> can be set to by 2.
>>>
>>
>> As far as I can tell, the fec driver doesn't support jumbo frames, and
>> the max frame length is currently hard-coded at PKT_MAXBUF_SIZE (1522).
>>
>> This is well within the 2048-byte allocation, even with optional headers
>> for VLAN etc.
> 
> Hmm...
>
> That (probably) means all the skb the driver allocates are actually 4k.
> It would be much better to reduce the size so that the entire skb
> (with packet buffer) is less than 2k.
>

That seems worthwhile, but un-related to this patch.

It appears to me that the received packets could be allocated as

PKT_MAXBUF_SIZE+fep->rx_align+NET_IP_ALIGN

(+2 if FEC_RACC_SHIFT16 is used)

>>> Now if NET_IP_ALIGN is zero then it is fine for the rx frame to start
>>> on a 4n boundary, and the skb are likely to be allocated that way.
>>> In this case you don't want to extra two bytes of 'junk'.
>>>
>> NET_IP_ALIGN is defaulting to 2 by the conditional in skbuff.h
> 
> Even though it is always currently set is isn't really ideal to have
> a driver that breaks if it isn't set.
> This could easily happen at some point in the future if the ethernet
> logic is put with a different cpu.
> 

After multiple reads, I'm confused about the meaning of NET_IP_ALIGN
and how it should be used.

From Documentation/unaligned-memory-access.txt, I take it that this
should be configured on a per-architecture basis, and it seems to be
set to zero on both PPC and x86.

I wonder if this is proper though. It seems that its' use might depend
on the I/O subsystem(s) in use as much as the architecture.

For example, it might be desirable to have a different value for a PCIe
interface than for an integrated MAC like the FEC.

Looking at the example of the 3c59x driver, I see a pattern of an
allocation that adds NET_IP_ALIGN followed by an skb->reserve()
of NET_IP_ALIGN before determining the target address to end
up with allocation with 4n+2 alignment.

This seems somewhat equivalent to this patch, except that we're
using the allocated address as the target and using skb_pull_inline
afterwards.

Andy, is the FEC used on any PPC SOCs?

If so, then this patch may cause a DMA of 2 extra bytes per frame
unecessarily although the driver doesn't special-case the allocation
to align the IP header, so this is still probably preferred.

>>> OTOH if NET_IP_ALIGN is 2 then you need to 'fiddle' things so that
>>> the data is dma'd to offset -2 in the skb and then ensure that the
>>> end of frame is set correctly.
>>>
>>
>> That's what the RACC SHIFT16 bit does.
> 
> No, that causes the ethernet controller to add 2 bytes to the frame.
> You then need to change the dma target address to match.
>

Or use skb_pull_inline to ignore the two bytes.

> Otherwise if a new version of the silicon stops ignoring the low
> address with the frame will be misaligned in the buffer.
> 

I'm not sure I understand this.

> The receive frame length will also (probably) be 2 larger than the
> actual frame - so you need to set the end point correctly as well.
> IP will probably ignore the 2 bytes of pad I think you are generating.
> 

The received frame length **is** 2 bytes longer, but these are
eaten by skb_pull_inline().

>> The FEC hardware isn't capable of DMA'ing to an un-aligned address.
>> On ARM, it requires 64-bit alignment, but suggests 128-bit alignment.
>>
>> On other (PPC?) architectures, it requires 32-bit alignment. This is
>> handled by the rx_align field.
> 
> That isn't entirely relevant.
>
> If the kernel is being built with NET_IP_ALIGN set to 0 you should
> align the destination mac address on a 4n boundary
> (Or rather the skb are likely to be allocated that way).

They're not currently allocated that way. The routine
fec_enet_alloc_rxq_buffers
forces the allocations to 32 or 128-bit alignment through the
routine fec_enet_new_rxbdp().

> If it causes misaligned memory reads later on that is a different problem.

That's the problem this patch is designed to address. Without this
patch, the IP header is always mis-aligned.

> The MAC driver has aligned the frames as it was told to.
> 
> 	David
> 
> 

Regards,


Eric
David Laight Sept. 30, 2016, 1:49 p.m. UTC | #9
From: Eric Nelson
> Sent: 30 September 2016 14:27
> Thanks for the feedback David,
> 
> On 09/29/2016 04:07 AM, David Laight wrote:
> > From: Eric Nelson
> >> Sent: 28 September 2016 18:15
> >> On 09/28/2016 09:42 AM, David Laight wrote:
> >>> From: Eric Nelson
> >>>> Sent: 26 September 2016 19:40
> >>>> Hi David,
> >>>>
> >>>> On 09/26/2016 02:26 AM, David Laight wrote:
> >>>>> From: Eric Nelson
> >>>>>> Sent: 24 September 2016 15:42
> >>>>>> The FEC receive accelerator (RACC) supports shifting the data payload of
> >>>>>> received packets by 16-bits, which aligns the payload (IP header) on a
> >>>>>> 4-byte boundary, which is, if not required, at least strongly suggested
> >>>>>> by the Linux networking layer.
> >>>>> ...
> >>>>>> +		/* align IP header */
> >>>>>> +		val |= FEC_RACC_SHIFT16;
> >>>>>
> >>>>> I can't help feeling that there needs to be corresponding
> >>>>> changes to increase the buffer size by 2 (maybe for large mtu)
> >>>>> and to discard two bytes from the frame length.
> >>>>>
> >>>>
> >>>> In the normal case, the fec driver over-allocates all receive packets to
> >>>> be of size FEC_ENET_RX_FRSIZE (2048) minus the value of rx_align,
> >>>> which is either 0x0f (ARM) or 0x03 (PPC).
> >>>>
> >>>> If the frame length is less than rx_copybreak (typically 256), then
> >>>> the frame length from the receive buffer descriptor is used to
> >>>> control the allocation size for a copied buffer, and this will include
> >>>> the two bytes of padding if RACC_SHIFT16 is set.
> >>>>
> >>>>> If probably ought to be predicated on NET_IP_ALIGN as well.
> >>>>>
> >>>> Can you elaborate?
> >>>
> >>> From reading this it seems that the effect of FEC_RACC_SHIFT16 is to
> >>> add two bytes of 'junk' to the start of every receive frame.
> >>>
> >>
> >> That's right. Two bytes of junk between the MAC header and the
> >> IP header.
> >>
> >>> In the 'copybreak' case the new skb would need to be 2 bytes shorter
> >>> than the length reported by the hardware, and the data copied from
> >>> 2 bytes into the dma buffer.
> >>>
> >>
> >> As it stands, the skb allocated by the copybreak routine will include
> >> the two bytes of padding, and the call to skb_pull_inline will ignore
> >> them.
> >
> > Ok, I didn't see that call being added by this patch.
> >
> >>> The extra 2 bytes also mean the that maximum mtu that can be received
> >>> into a buffer is two bytes less.
> >>>
> >>
> >> Right, but I think the max is already high enough that this isn't a
> >> problem.
> >>
> >>> If someone sets the mtu to (say) 9k for jumbo frames this might matter.
> >>> Even with fixed 2048 byte buffers it reduces the maximum value the mtu
> >>> can be set to by 2.
> >>>
> >>
> >> As far as I can tell, the fec driver doesn't support jumbo frames, and
> >> the max frame length is currently hard-coded at PKT_MAXBUF_SIZE (1522).
> >>
> >> This is well within the 2048-byte allocation, even with optional headers
> >> for VLAN etc.
> >
> > Hmm...
> >
> > That (probably) means all the skb the driver allocates are actually 4k.
> > It would be much better to reduce the size so that the entire skb
> > (with packet buffer) is less than 2k.
> >
> 
> That seems worthwhile, but un-related to this patch.

Indeed.

> It appears to me that the received packets could be allocated as
> 
> PKT_MAXBUF_SIZE+fep->rx_align+NET_IP_ALIGN
> 
> (+2 if FEC_RACC_SHIFT16 is used)

No.
The packet buffers need to be allocated NET_IP_ALIGN + PKT_MAXBUF_SIZE
byte long and (I assume) aligned on a fep->rx_align byte boundary.

If NET_IP_ALIGN is set (to 2) then FEC_RACC_SHIFT16 must also me set
so that the ethernet frame itself is 4n+2 aligned.

> >>> Now if NET_IP_ALIGN is zero then it is fine for the rx frame to start
> >>> on a 4n boundary, and the skb are likely to be allocated that way.
> >>> In this case you don't want to extra two bytes of 'junk'.
> >>>
> >> NET_IP_ALIGN is defaulting to 2 by the conditional in skbuff.h
> >
> > Even though it is always currently set is isn't really ideal to have
> > a driver that breaks if it isn't set.
> > This could easily happen at some point in the future if the ethernet
> > logic is put with a different cpu.
> >
> 
> After multiple reads, I'm confused about the meaning of NET_IP_ALIGN
> and how it should be used.
> 
> From Documentation/unaligned-memory-access.txt, I take it that this
> should be configured on a per-architecture basis, and it seems to be
> set to zero on both PPC and x86.
> 
> I wonder if this is proper though. It seems that its' use might depend
> on the I/O subsystem(s) in use as much as the architecture.
...

If the cpu cannot do misaligned memory cycles then NET_IP_ALIGN must be 2
and all receive frames must be aligned like that.

If the cpu can do misaligned memory cycles then the alignment of receive
ethernet frames doesn't matter that much.
NET_IP_ALIGN is likely to be set to zero because the cost of the cpu
doing misaligned transfers it likely to be a lot less than that of
un-optimised dma accesses to misaligned memory [1] [2].

If NET_IP_ALIGN is zero then I believe that ethernet drivers are
allowed to build skb that have the frame on a 4n+2 alignment.
This is probably sensible if the hardware can write the two bytes.
(DM might correct me there.)

	David

[1] The original sparc sbus 'DMA' part did multiple 16bit transfers instead
  of a burst of 32bit transfers. This meant the buffer had to be misaligned
  and a software copy done to align the frames. Fixed in the DMA+ part.

[2] PCIe writes are likely to be much faster if they contain entire cache
  lines of data.
Eric Nelson Sept. 30, 2016, 2:16 p.m. UTC | #10
Hi David,

On 09/30/2016 06:49 AM, David Laight wrote:
> From: Eric Nelson
>> Sent: 30 September 2016 14:27
>> Thanks for the feedback David,
>>
>> On 09/29/2016 04:07 AM, David Laight wrote:
>>> From: Eric Nelson
>>>> Sent: 28 September 2016 18:15
>>>> On 09/28/2016 09:42 AM, David Laight wrote:
>>>>> From: Eric Nelson
>>>>>> Sent: 26 September 2016 19:40
>>>>>> Hi David,
>>>>>>
>>>>>> On 09/26/2016 02:26 AM, David Laight wrote:
>>>>>>> From: Eric Nelson
>>>>>>>> Sent: 24 September 2016 15:42
>>>>>>>> The FEC receive accelerator (RACC) supports shifting the data payload of
>>>>>>>> received packets by 16-bits, which aligns the payload (IP header) on a
>>>>>>>> 4-byte boundary, which is, if not required, at least strongly suggested
>>>>>>>> by the Linux networking layer.
>>>>>>> ...
>>>>>>>> +		/* align IP header */
>>>>>>>> +		val |= FEC_RACC_SHIFT16;
>>>>>>>
>>>>>>> I can't help feeling that there needs to be corresponding
>>>>>>> changes to increase the buffer size by 2 (maybe for large mtu)
>>>>>>> and to discard two bytes from the frame length.
>>>>>>>
>>>>>>
>>>>>> In the normal case, the fec driver over-allocates all receive packets to
>>>>>> be of size FEC_ENET_RX_FRSIZE (2048) minus the value of rx_align,
>>>>>> which is either 0x0f (ARM) or 0x03 (PPC).
>>>>>>
>>>>>> If the frame length is less than rx_copybreak (typically 256), then
>>>>>> the frame length from the receive buffer descriptor is used to
>>>>>> control the allocation size for a copied buffer, and this will include
>>>>>> the two bytes of padding if RACC_SHIFT16 is set.
>>>>>>
>>>>>>> If probably ought to be predicated on NET_IP_ALIGN as well.
>>>>>>>
>>>>>> Can you elaborate?
>>>>>
>>>>> From reading this it seems that the effect of FEC_RACC_SHIFT16 is to
>>>>> add two bytes of 'junk' to the start of every receive frame.
>>>>>
>>>>
>>>> That's right. Two bytes of junk between the MAC header and the
>>>> IP header.
>>>>
>>>>> In the 'copybreak' case the new skb would need to be 2 bytes shorter
>>>>> than the length reported by the hardware, and the data copied from
>>>>> 2 bytes into the dma buffer.
>>>>>
>>>>
>>>> As it stands, the skb allocated by the copybreak routine will include
>>>> the two bytes of padding, and the call to skb_pull_inline will ignore
>>>> them.
>>>
>>> Ok, I didn't see that call being added by this patch.
>>>
>>>>> The extra 2 bytes also mean the that maximum mtu that can be received
>>>>> into a buffer is two bytes less.
>>>>>
>>>>
>>>> Right, but I think the max is already high enough that this isn't a
>>>> problem.
>>>>
>>>>> If someone sets the mtu to (say) 9k for jumbo frames this might matter.
>>>>> Even with fixed 2048 byte buffers it reduces the maximum value the mtu
>>>>> can be set to by 2.
>>>>>
>>>>
>>>> As far as I can tell, the fec driver doesn't support jumbo frames, and
>>>> the max frame length is currently hard-coded at PKT_MAXBUF_SIZE (1522).
>>>>
>>>> This is well within the 2048-byte allocation, even with optional headers
>>>> for VLAN etc.
>>>
>>> Hmm...
>>>
>>> That (probably) means all the skb the driver allocates are actually 4k.
>>> It would be much better to reduce the size so that the entire skb
>>> (with packet buffer) is less than 2k.
>>>
>>
>> That seems worthwhile, but un-related to this patch.
> 
> Indeed.
> 
>> It appears to me that the received packets could be allocated as
>>
>> PKT_MAXBUF_SIZE+fep->rx_align+NET_IP_ALIGN
>>
>> (+2 if FEC_RACC_SHIFT16 is used)
> 
> No.
> The packet buffers need to be allocated NET_IP_ALIGN + PKT_MAXBUF_SIZE
> byte long and (I assume) aligned on a fep->rx_align byte boundary.
> 

I think we're saying the same thing here, with the exception of the
+2 for FEC_RACC_SHIFT16.

> If NET_IP_ALIGN is set (to 2) then FEC_RACC_SHIFT16 must also me set
> so that the ethernet frame itself is 4n+2 aligned.
> 

This patch does this, but not with the beginning of the skb.

It also does this when NET_IP_ALIGN is zero though, and I believe this
is the right thing, so the IP header is aligned in a sensible way.

The driver can't handle a DMA to (4n+2) on any architecture.

>>>>> Now if NET_IP_ALIGN is zero then it is fine for the rx frame to start
>>>>> on a 4n boundary, and the skb are likely to be allocated that way.
>>>>> In this case you don't want to extra two bytes of 'junk'.
>>>>>
>>>> NET_IP_ALIGN is defaulting to 2 by the conditional in skbuff.h
>>>
>>> Even though it is always currently set is isn't really ideal to have
>>> a driver that breaks if it isn't set.
>>> This could easily happen at some point in the future if the ethernet
>>> logic is put with a different cpu.
>>>
>>
>> After multiple reads, I'm confused about the meaning of NET_IP_ALIGN
>> and how it should be used.
>>
>> From Documentation/unaligned-memory-access.txt, I take it that this
>> should be configured on a per-architecture basis, and it seems to be
>> set to zero on both PPC and x86.
>>
>> I wonder if this is proper though. It seems that its' use might depend
>> on the I/O subsystem(s) in use as much as the architecture.
> ...
> 
> If the cpu cannot do misaligned memory cycles then NET_IP_ALIGN must be 2
> and all receive frames must be aligned like that.
> 

On ARM, the CPU can't handle misaligned memory cycles without
taking an alignment fault and NET_IP_ALIGN is set to 2.

On PPC, NET_IP_ALIGN is set to zero.

I could use some help from NXP about whether the driver is used on
PPC, but I don't think it can DMA to 4n+2 addresses on any architecture
and the purpose of this patch is to align the frame on a (4n+2)
address.

> If the cpu can do misaligned memory cycles then the alignment of receive
> ethernet frames doesn't matter that much.
> NET_IP_ALIGN is likely to be set to zero because the cost of the cpu
> doing misaligned transfers it likely to be a lot less than that of
> un-optimised dma accesses to misaligned memory [1] [2].
> 

On ARM, we have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
but I find it hard to believe that taking alignment faults is more
efficient than adding two bytes to the start of the frame.

> If NET_IP_ALIGN is zero then I believe that ethernet drivers are
> allowed to build skb that have the frame on a 4n+2 alignment.
> This is probably sensible if the hardware can write the two bytes.
> (DM might correct me there.)
> 

Again, I don't think the FEC can do this, even if PPC does allow
DMA to 4n+2 addresses for other functions.

> 	David
> 
> [1] The original sparc sbus 'DMA' part did multiple 16bit transfers instead
>   of a burst of 32bit transfers. This meant the buffer had to be misaligned
>   and a software copy done to align the frames. Fixed in the DMA+ part.
> 
> [2] PCIe writes are likely to be much faster if they contain entire cache
>   lines of data.
>
Russell King (Oracle) Oct. 1, 2016, 7:52 p.m. UTC | #11
On Fri, Sep 30, 2016 at 07:16:12AM -0700, Eric Nelson wrote:
> On ARM, the CPU can't handle misaligned memory cycles without
> taking an alignment fault and NET_IP_ALIGN is set to 2.

Let's get this right...  With Linux on MMU parts:

On ARMv6+, unaligned memory cycles using the LDR, LDRH and corresponding
store instructions are handled in hardware without any exception being
raised.

On pre-ARMv6, such instructions raise an alignment exception, and we fix
up the load/store manually.

Where things behave the same is with the LDM (load multiple) and STM
(store multiple) instructions.  Hardware does not fix these up if they
are unaligned: it is expected that the base address will always be
aligned to a 32-bit word.

For some reason, the compiler guys have decided it's okay to use these
instructions as an optimisation, and I see no way to disable this
behaviour.

> On ARM, we have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
> but I find it hard to believe that taking alignment faults is more
> efficient than adding two bytes to the start of the frame.

... see above, hence why CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set.
It's only when the compiler decides to do silly things that things go
wrong.  However, net code does not care about that configuration
setting, so it's irrelevant to this discussion.

The issue with the networking layer is that it passes around structure
pointers which may not be "naturally aligned" - technically it goes
against the C standard specs.  However, you'll find it hard to argue
against this, so we have to accept that the networking people expect
it to work.

The optimisation that the C compiler uses (using LDM to access multiple
32-bit consecutive words) is legal and efficient when the structure
pointers are aligned as it expects, but that all breaks if the pointer
is not so aligned.  So, raising it as a bug against the C compiler isn't
going to work either.

What may work is to raise a feature request with compiler people to have
a mechanism to disable the LDM/STM optimisation for code where we know
that pointers may not be naturally aligned.
David Laight Oct. 3, 2016, 4:42 p.m. UTC | #12
From: Russell King - ARM Linux
> Sent: 01 October 2016 20:52
> On Fri, Sep 30, 2016 at 07:16:12AM -0700, Eric Nelson wrote:
> > On ARM, the CPU can't handle misaligned memory cycles without
> > taking an alignment fault and NET_IP_ALIGN is set to 2.
> 
> Let's get this right...  With Linux on MMU parts:
> 
> On ARMv6+, unaligned memory cycles using the LDR, LDRH and corresponding
> store instructions are handled in hardware without any exception being
> raised.
> 
> On pre-ARMv6, such instructions raise an alignment exception, and we fix
> up the load/store manually.

I'm not sure that is a good idea but...

> Where things behave the same is with the LDM (load multiple) and STM
> (store multiple) instructions.  Hardware does not fix these up if they
> are unaligned: it is expected that the base address will always be
> aligned to a 32-bit word.
> 
> For some reason, the compiler guys have decided it's okay to use these
> instructions as an optimisation, and I see no way to disable this
> behaviour.
...
> The issue with the networking layer is that it passes around structure
> pointers which may not be "naturally aligned" - technically it goes
> against the C standard specs.  However, you'll find it hard to argue
> against this, so we have to accept that the networking people expect
> it to work.

I think it 'only' casts misaligned pointers to structure on systems
where unaligned accesses are allowed.
It is almost impossible to do a 'realignment copy' on (for example) sparc.

> The optimisation that the C compiler uses (using LDM to access multiple
> 32-bit consecutive words) is legal and efficient when the structure
> pointers are aligned as it expects, but that all breaks if the pointer
> is not so aligned.  So, raising it as a bug against the C compiler isn't
> going to work either.
> 
> What may work is to raise a feature request with compiler people to have
> a mechanism to disable the LDM/STM optimisation for code where we know
> that pointers may not be naturally aligned.

What happens is the relevant structures are marked 'packed'?
If the compiler still generates LDM/STM that is a bug.
Assuming the compiler is targeting v6+ it shouldn't generate byte accesses.

	David
Eric Nelson Oct. 3, 2016, 6:48 p.m. UTC | #13
Hi Russell,

On 10/01/2016 09:52 PM, Russell King - ARM Linux wrote:
> On Fri, Sep 30, 2016 at 07:16:12AM -0700, Eric Nelson wrote:
>> On ARM, the CPU can't handle misaligned memory cycles without
>> taking an alignment fault and NET_IP_ALIGN is set to 2.
> 
> Let's get this right...  With Linux on MMU parts:
> 
> On ARMv6+, unaligned memory cycles using the LDR, LDRH and corresponding
> store instructions are handled in hardware without any exception being
> raised.
> 
> On pre-ARMv6, such instructions raise an alignment exception, and we fix
> up the load/store manually.
> 
> Where things behave the same is with the LDM (load multiple) and STM
> (store multiple) instructions.  Hardware does not fix these up if they
> are unaligned: it is expected that the base address will always be
> aligned to a 32-bit word.
> 

Thanks for the clarification. This helps me understand why I didn't
see the exceptions that Eric warned about:

https://www.spinics.net/lists/netdev/msg397012.html

> For some reason, the compiler guys have decided it's okay to use these
> instructions as an optimisation, and I see no way to disable this
> behaviour.
> 
>> On ARM, we have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
>> but I find it hard to believe that taking alignment faults is more
>> efficient than adding two bytes to the start of the frame.
> 
> ... see above, hence why CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set.
> It's only when the compiler decides to do silly things that things go
> wrong.  However, net code does not care about that configuration
> setting, so it's irrelevant to this discussion.
> 

The obfuscated optimization in ip_gro_receive doesn't help by
reading two 16-bit values as a __be32:

id = ntohl(*(__be32 *)&iph->id);
flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
id >>= 16;

> The issue with the networking layer is that it passes around structure
> pointers which may not be "naturally aligned" - technically it goes
> against the C standard specs.  However, you'll find it hard to argue
> against this, so we have to accept that the networking people expect
> it to work.
> 
> The optimisation that the C compiler uses (using LDM to access multiple
> 32-bit consecutive words) is legal and efficient when the structure
> pointers are aligned as it expects, but that all breaks if the pointer
> is not so aligned.  So, raising it as a bug against the C compiler isn't
> going to work either.
> 
> What may work is to raise a feature request with compiler people to have
> a mechanism to disable the LDM/STM optimisation for code where we know
> that pointers may not be naturally aligned.
> 

Agreed, but that's a long path even if the compiler folks agree
immediately.

It's probably to just fix the driver with known issues for now.

Did you have any comments on the patch? I tried to pull what I found
from your patch set, but only the enabling of the SHIFT16 bit.

Please advise,


Eric
Andy Duan Oct. 8, 2016, 2:44 a.m. UTC | #14
From: Eric Nelson <eric@nelint.com> Sent: Friday, September 30, 2016 9:27 PM
> To: David Laight <David.Laight@ACULAB.COM>; netdev@vger.kernel.org
> Cc: linux@arm.linux.org.uk; andrew@lunn.ch; Andy Duan
> <fugang.duan@nxp.com>; otavio@ossystems.com.br;
> edumazet@google.com; troy.kisky@boundarydevices.com;
> davem@davemloft.net; u.kleine-koenig@pengutronix.de
> Subject: Re: [PATCH 3/3] net: fec: align IP header in hardware
> 
> Thanks for the feedback David,
> 
> On 09/29/2016 04:07 AM, David Laight wrote:
> > From: Eric Nelson
> >> Sent: 28 September 2016 18:15
> >> On 09/28/2016 09:42 AM, David Laight wrote:
> >>> From: Eric Nelson
> >>>> Sent: 26 September 2016 19:40
> >>>> Hi David,
> >>>>
> >>>> On 09/26/2016 02:26 AM, David Laight wrote:
> >>>>> From: Eric Nelson
> >>>>>> Sent: 24 September 2016 15:42
> >>>>>> The FEC receive accelerator (RACC) supports shifting the data
> >>>>>> payload of received packets by 16-bits, which aligns the payload
> >>>>>> (IP header) on a 4-byte boundary, which is, if not required, at
> >>>>>> least strongly suggested by the Linux networking layer.
> >>>>> ...
> >>>>>> +		/* align IP header */
> >>>>>> +		val |= FEC_RACC_SHIFT16;
> >>>>>
> >>>>> I can't help feeling that there needs to be corresponding changes
> >>>>> to increase the buffer size by 2 (maybe for large mtu) and to
> >>>>> discard two bytes from the frame length.
> >>>>>
> >>>>
> >>>> In the normal case, the fec driver over-allocates all receive
> >>>> packets to be of size FEC_ENET_RX_FRSIZE (2048) minus the value of
> >>>> rx_align, which is either 0x0f (ARM) or 0x03 (PPC).
> >>>>
> >>>> If the frame length is less than rx_copybreak (typically 256), then
> >>>> the frame length from the receive buffer descriptor is used to
> >>>> control the allocation size for a copied buffer, and this will
> >>>> include the two bytes of padding if RACC_SHIFT16 is set.
> >>>>
> >>>>> If probably ought to be predicated on NET_IP_ALIGN as well.
> >>>>>
> >>>> Can you elaborate?
> >>>
> >>> From reading this it seems that the effect of FEC_RACC_SHIFT16 is to
> >>> add two bytes of 'junk' to the start of every receive frame.
> >>>
> >>
> >> That's right. Two bytes of junk between the MAC header and the IP
> >> header.
> >>
> >>> In the 'copybreak' case the new skb would need to be 2 bytes shorter
> >>> than the length reported by the hardware, and the data copied from
> >>> 2 bytes into the dma buffer.
> >>>
> >>
> >> As it stands, the skb allocated by the copybreak routine will include
> >> the two bytes of padding, and the call to skb_pull_inline will ignore
> >> them.
> >
> > Ok, I didn't see that call being added by this patch.
> >
> >>> The extra 2 bytes also mean the that maximum mtu that can be
> >>> received into a buffer is two bytes less.
> >>>
> >>
> >> Right, but I think the max is already high enough that this isn't a
> >> problem.
> >>
> >>> If someone sets the mtu to (say) 9k for jumbo frames this might matter.
> >>> Even with fixed 2048 byte buffers it reduces the maximum value the
> >>> mtu can be set to by 2.
> >>>
> >>
> >> As far as I can tell, the fec driver doesn't support jumbo frames,
> >> and the max frame length is currently hard-coded at PKT_MAXBUF_SIZE
> (1522).
> >>
> >> This is well within the 2048-byte allocation, even with optional
> >> headers for VLAN etc.
> >
> > Hmm...
> >
> > That (probably) means all the skb the driver allocates are actually 4k.
> > It would be much better to reduce the size so that the entire skb
> > (with packet buffer) is less than 2k.
> >
> 
> That seems worthwhile, but un-related to this patch.
> 
> It appears to me that the received packets could be allocated as
> 
> PKT_MAXBUF_SIZE+fep->rx_align+NET_IP_ALIGN
> 
> (+2 if FEC_RACC_SHIFT16 is used)
> 
> >>> Now if NET_IP_ALIGN is zero then it is fine for the rx frame to
> >>> start on a 4n boundary, and the skb are likely to be allocated that way.
> >>> In this case you don't want to extra two bytes of 'junk'.
> >>>
> >> NET_IP_ALIGN is defaulting to 2 by the conditional in skbuff.h
> >
> > Even though it is always currently set is isn't really ideal to have a
> > driver that breaks if it isn't set.
> > This could easily happen at some point in the future if the ethernet
> > logic is put with a different cpu.
> >
> 
> After multiple reads, I'm confused about the meaning of NET_IP_ALIGN and
> how it should be used.
> 
> From Documentation/unaligned-memory-access.txt, I take it that this should
> be configured on a per-architecture basis, and it seems to be set to zero on
> both PPC and x86.
> 
> I wonder if this is proper though. It seems that its' use might depend on the
> I/O subsystem(s) in use as much as the architecture.
> 
> For example, it might be desirable to have a different value for a PCIe
> interface than for an integrated MAC like the FEC.
> 
> Looking at the example of the 3c59x driver, I see a pattern of an allocation
> that adds NET_IP_ALIGN followed by an skb->reserve() of NET_IP_ALIGN
> before determining the target address to end up with allocation with 4n+2
> alignment.
> 
> This seems somewhat equivalent to this patch, except that we're using the
> allocated address as the target and using skb_pull_inline afterwards.
> 
> Andy, is the FEC used on any PPC SOCs?
> 
Sorry to reply the mail due to holiday.

Currently, i.MX and ColdFire like MCF5xxx series use the driver. And ColdFire series don't define FEC_QUIRK_HAS_RACC quirk flag, so the patch don't impact ColdFire.

The patch has no problem, has nothing related to DMA part.

> If so, then this patch may cause a DMA of 2 extra bytes per frame
> unecessarily although the driver doesn't special-case the allocation to align
> the IP header, so this is still probably preferred.
> 
> >>> OTOH if NET_IP_ALIGN is 2 then you need to 'fiddle' things so that
> >>> the data is dma'd to offset -2 in the skb and then ensure that the
> >>> end of frame is set correctly.
> >>>
> >>
> >> That's what the RACC SHIFT16 bit does.
> >
> > No, that causes the ethernet controller to add 2 bytes to the frame.
> > You then need to change the dma target address to match.
> >
> 
> Or use skb_pull_inline to ignore the two bytes.
> 
> > Otherwise if a new version of the silicon stops ignoring the low
> > address with the frame will be misaligned in the buffer.
> >
> 
> I'm not sure I understand this.
> 
> > The receive frame length will also (probably) be 2 larger than the
> > actual frame - so you need to set the end point correctly as well.
> > IP will probably ignore the 2 bytes of pad I think you are generating.
> >
> 
> The received frame length **is** 2 bytes longer, but these are eaten by
> skb_pull_inline().
> 
> >> The FEC hardware isn't capable of DMA'ing to an un-aligned address.
> >> On ARM, it requires 64-bit alignment, but suggests 128-bit alignment.
> >>
> >> On other (PPC?) architectures, it requires 32-bit alignment. This is
> >> handled by the rx_align field.
> >
> > That isn't entirely relevant.
> >
> > If the kernel is being built with NET_IP_ALIGN set to 0 you should
> > align the destination mac address on a 4n boundary (Or rather the skb
> > are likely to be allocated that way).
> 
> They're not currently allocated that way. The routine
> fec_enet_alloc_rxq_buffers forces the allocations to 32 or 128-bit alignment
> through the routine fec_enet_new_rxbdp().
> 
> > If it causes misaligned memory reads later on that is a different problem.
> 
> That's the problem this patch is designed to address. Without this patch, the
> IP header is always mis-aligned.
> 
> > The MAC driver has aligned the frames as it was told to.
> >
> > 	David
> >
> >
> 
> Regards,
> 
> 
> Eric
diff mbox

Patch

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 0219e79..1fa2d87 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -180,6 +180,7 @@  MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 /* FEC receive acceleration */
 #define FEC_RACC_IPDIS		(1 << 1)
 #define FEC_RACC_PRODIS		(1 << 2)
+#define FEC_RACC_SHIFT16	BIT(7)
 #define FEC_RACC_OPTIONS	(FEC_RACC_IPDIS | FEC_RACC_PRODIS)
 
 /*
@@ -945,9 +946,11 @@  fec_restart(struct net_device *ndev)
 
 #if !defined(CONFIG_M5272)
 	if (fep->quirks & FEC_QUIRK_HAS_RACC) {
-		/* set RX checksum */
 		val = readl(fep->hwp + FEC_RACC);
+		/* align IP header */
+		val |= FEC_RACC_SHIFT16;
 		if (fep->csum_flags & FLAG_RX_CSUM_ENABLED)
+			/* set RX checksum */
 			val |= FEC_RACC_OPTIONS;
 		else
 			val &= ~FEC_RACC_OPTIONS;
@@ -1428,6 +1431,12 @@  fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
 		prefetch(skb->data - NET_IP_ALIGN);
 		skb_put(skb, pkt_len - 4);
 		data = skb->data;
+
+#if !defined(CONFIG_M5272)
+		if (fep->quirks & FEC_QUIRK_HAS_RACC)
+			data = skb_pull_inline(skb, 2);
+#endif
+
 		if (!is_copybreak && need_swap)
 			swap_buffer(data, pkt_len);