diff mbox

ixgbe: drop zero length frame segments during a packet split rx

Message ID 1314972197-31557-1-git-send-email-nhorman@tuxdriver.com
State Awaiting Upstream, archived
Delegated to: David Miller
Headers show

Commit Message

Neil Horman Sept. 2, 2011, 2:03 p.m. UTC
This oops was reported recently no ppc64 hardware:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc0000000004dda0c
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_fi
lter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
nf_conntrack ip6table_filter ip6_tables ipv6 jsm ses enclosure sg ixgbe
mdio e1000 ehea ext4 jbd2 mbcache sd_mod crc_t10dif ipr dm_mod
NIP: c0000000004dda0c LR: c0000000004e3e50 CTR: c0000000004e3e20
REGS: c0000001bffeb8d0 TRAP: 0300   Not tainted  (3.1.0-rc2-10121-gab7e2db)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 28002042  XER: 20000000
CFAR: c000000000004d70
DAR: 0000000000000000, DSISR: 40000000
TASK = c000000000d548e0[0] 'swapper' THREAD: c000000000dfc000 CPU: 0
GPR04: c0000000010f4d80 c0000001bffebd80 0000000000000000 c0000001b18a8200
GPR08: 0000000000000280 c0000001bcc517a8 c0000001b18a7f80 0000000000000000
GPR12: d0000000047e5bb0 c000000001f10000 c0000001b19c8700 0000000000000000
GPR16: c0000001bffebd80 0000000000000083 c00000018f2447a0 0000000000000002
GPR20: 0000000000000000 c0000001ba860010 c0000001ba860000 d000000003d40000
GPR24: 0000000000000000 0000000000000083 d000000003d40000 0000000000000001
GPR28: c00000018f244780 c0000001b2b94310 c000000000da95f0 c0000001bcc51780
NIP [c0000000004dda0c] .skb_gro_reset_offset+0x5c/0xe0
LR [c0000000004e3e50] .napi_gro_receive+0x30/0x120
Call Trace:
[c0000001bffebb50] [c000000000da95f0] perf_callchain_user+0x0/0x10 (unreliable)
[c0000001bffebbf0] [d0000000047bd118] .ixgbe_clean_rx_irq+0x7a8/0x8a0 [ixgbe]
[c0000001bffebd10] [d0000000047bd414] .ixgbe_poll+0x64/0x160 [ixgbe]
[c0000001bffebdd0] [c0000000004e3358] .net_rx_action+0x108/0x2a0
[c0000001bffebea0] [c00000000009b220] .__do_softirq+0x110/0x2a0
[c0000001bffebf90] [c000000000023798] .call_do_softirq+0x14/0x24
[c000000000dff830] [c000000000011148] .do_softirq+0xf8/0x130
[c000000000dff8d0] [c00000000009aeb4] .irq_exit+0xb4/0xc0
[c000000000dff950] [c000000000011254] .do_IRQ+0xd4/0x300
[c000000000dffa10] [c000000000005024] hardware_interrupt_entry+0x18/0x74
--- Exception: 501 at .pseries_dedicated_idle_sleep+0xe4/0x210
LR = .pseries_dedicated_idle_sleep+0x8c/0x210
[c000000000dffd00] [c00000000005b194] .pseries_dedicated_idle_sleep+0x194/0x210
(unreliable)
[c000000000dffdc0] [c000000000018c84] .cpu_idle+0x164/0x210
[c000000000dffe70] [c00000000000b0d0] .rest_init+0x90/0xb0
[c000000000dffef0] [c000000000830bc0] .start_kernel+0x54c/0x56c
[c000000000dfff90] [c00000000000953c] .start_here_common+0x1c/0x60

Its caused when skb_gro_reset_offset attempts to call PageHighMem on
skb_shinfo(skb)->frags[0].page, when the frags array was left uninitalized.
This can happen in the ixgbe driver if the hardware reports a zero length rx
descriptor ni the middle of a packet split receive transaction.  I've consulted
with Jesse Brandeburg on this, who is attempting to root cause the issue at
Intel, but it seems prudent to add this check to the driver to discard frames of
that encounter this error to avoid the opps

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: John Fastabend <john.r.fastabend@intel.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   17 +++++++++++------
 1 files changed, 11 insertions(+), 6 deletions(-)

Comments

Kirsher, Jeffrey T Sept. 2, 2011, 4:04 p.m. UTC | #1
On Fri, 2011-09-02 at 07:03 -0700, Neil Horman wrote:
> This oops was reported recently no ppc64 hardware:
> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc0000000004dda0c
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA pSeries
> Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
> iptable_fi
> lter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> nf_conntrack ip6table_filter ip6_tables ipv6 jsm ses enclosure sg
> ixgbe
> mdio e1000 ehea ext4 jbd2 mbcache sd_mod crc_t10dif ipr dm_mod
> NIP: c0000000004dda0c LR: c0000000004e3e50 CTR: c0000000004e3e20
> REGS: c0000001bffeb8d0 TRAP: 0300   Not tainted
> (3.1.0-rc2-10121-gab7e2db)
> MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 28002042  XER: 20000000
> CFAR: c000000000004d70
> DAR: 0000000000000000, DSISR: 40000000
> TASK = c000000000d548e0[0] 'swapper' THREAD: c000000000dfc000 CPU: 0
> GPR04: c0000000010f4d80 c0000001bffebd80 0000000000000000
> c0000001b18a8200
> GPR08: 0000000000000280 c0000001bcc517a8 c0000001b18a7f80
> 0000000000000000
> GPR12: d0000000047e5bb0 c000000001f10000 c0000001b19c8700
> 0000000000000000
> GPR16: c0000001bffebd80 0000000000000083 c00000018f2447a0
> 0000000000000002
> GPR20: 0000000000000000 c0000001ba860010 c0000001ba860000
> d000000003d40000
> GPR24: 0000000000000000 0000000000000083 d000000003d40000
> 0000000000000001
> GPR28: c00000018f244780 c0000001b2b94310 c000000000da95f0
> c0000001bcc51780
> NIP [c0000000004dda0c] .skb_gro_reset_offset+0x5c/0xe0
> LR [c0000000004e3e50] .napi_gro_receive+0x30/0x120
> Call Trace:
> [c0000001bffebb50] [c000000000da95f0] perf_callchain_user+0x0/0x10
> (unreliable)
> [c0000001bffebbf0] [d0000000047bd118] .ixgbe_clean_rx_irq+0x7a8/0x8a0
> [ixgbe]
> [c0000001bffebd10] [d0000000047bd414] .ixgbe_poll+0x64/0x160 [ixgbe]
> [c0000001bffebdd0] [c0000000004e3358] .net_rx_action+0x108/0x2a0
> [c0000001bffebea0] [c00000000009b220] .__do_softirq+0x110/0x2a0
> [c0000001bffebf90] [c000000000023798] .call_do_softirq+0x14/0x24
> [c000000000dff830] [c000000000011148] .do_softirq+0xf8/0x130
> [c000000000dff8d0] [c00000000009aeb4] .irq_exit+0xb4/0xc0
> [c000000000dff950] [c000000000011254] .do_IRQ+0xd4/0x300
> [c000000000dffa10] [c000000000005024] hardware_interrupt_entry
> +0x18/0x74
> --- Exception: 501 at .pseries_dedicated_idle_sleep+0xe4/0x210
> LR = .pseries_dedicated_idle_sleep+0x8c/0x210
> [c000000000dffd00] [c00000000005b194] .pseries_dedicated_idle_sleep
> +0x194/0x210
> (unreliable)
> [c000000000dffdc0] [c000000000018c84] .cpu_idle+0x164/0x210
> [c000000000dffe70] [c00000000000b0d0] .rest_init+0x90/0xb0
> [c000000000dffef0] [c000000000830bc0] .start_kernel+0x54c/0x56c
> [c000000000dfff90] [c00000000000953c] .start_here_common+0x1c/0x60
> 
> Its caused when skb_gro_reset_offset attempts to call PageHighMem on
> skb_shinfo(skb)->frags[0].page, when the frags array was left
> uninitalized.
> This can happen in the ixgbe driver if the hardware reports a zero
> length rx
> descriptor ni the middle of a packet split receive transaction.  I've
> consulted
> with Jesse Brandeburg on this, who is attempting to root cause the
> issue at
> Intel, but it seems prudent to add this check to the driver to discard
> frames of
> that encounter this error to avoid the opps
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Thadeu Lima de Souza Cascardo
> <cascardo@linux.vnet.ibm.com>
> CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
> CC: Alexander Duyck <alexander.h.duyck@intel.com>
> CC: John Fastabend <john.r.fastabend@intel.com>
> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> CC: David S. Miller <davem@davemloft.net>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   17
> +++++++++++------
>  1 files changed, 11 insertions(+), 6 deletions(-) 

Thanks Neil, I have added the patch to my queue of ixgbe patches.

This patch was made against net-next, was this issue only seen on the
net-next kernel?
Duyck, Alexander H Sept. 2, 2011, 4:17 p.m. UTC | #2
This kind of fix just opens up a whole can of security related worms.  
If you are going to discard a packet you should do it after we have 
reached the EOP in the series.  My advice would be to determine what 
traits identify this packet and add those to the check for the 
IXGBE_RXDADV_ERR_FRAME_ERR_MASK check further down in the code.  Likely 
what you are seeing is skb_headlen(skb) will be equal to 0.

I'm suspecting this is some sort of read corruption.  It looks like in 
order to trigger it you have to either be reading rx_buffer_info->dma as 
0, or the header length is being read as 0.  Do you know if you actually 
have header split enabled when this is occuring?  Are you running with 
jumbo frames enabled to see the issue?  If not then packet split 
wouldn't be enabled.

Is this occurring on net-next or on an older kernel?  I just want to be 
sure since we added a read memory barrier in 2.6.34 to address the fact 
that the length and descriptor DD bits were being read in the wrong 
order resulting in the length being corrupted on PowerPC systems.  The 
fact that we are now seeing another length error on PowerPC seems very odd.

Thanks,

Alex

On 09/02/2011 07:03 AM, Neil Horman wrote:
> This oops was reported recently no ppc64 hardware:
> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc0000000004dda0c
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA pSeries
> Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
> iptable_fi
> lter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> nf_conntrack ip6table_filter ip6_tables ipv6 jsm ses enclosure sg ixgbe
> mdio e1000 ehea ext4 jbd2 mbcache sd_mod crc_t10dif ipr dm_mod
> NIP: c0000000004dda0c LR: c0000000004e3e50 CTR: c0000000004e3e20
> REGS: c0000001bffeb8d0 TRAP: 0300   Not tainted  (3.1.0-rc2-10121-gab7e2db)
> MSR: 8000000000009032<EE,ME,IR,DR>   CR: 28002042  XER: 20000000
> CFAR: c000000000004d70
> DAR: 0000000000000000, DSISR: 40000000
> TASK = c000000000d548e0[0] 'swapper' THREAD: c000000000dfc000 CPU: 0
> GPR04: c0000000010f4d80 c0000001bffebd80 0000000000000000 c0000001b18a8200
> GPR08: 0000000000000280 c0000001bcc517a8 c0000001b18a7f80 0000000000000000
> GPR12: d0000000047e5bb0 c000000001f10000 c0000001b19c8700 0000000000000000
> GPR16: c0000001bffebd80 0000000000000083 c00000018f2447a0 0000000000000002
> GPR20: 0000000000000000 c0000001ba860010 c0000001ba860000 d000000003d40000
> GPR24: 0000000000000000 0000000000000083 d000000003d40000 0000000000000001
> GPR28: c00000018f244780 c0000001b2b94310 c000000000da95f0 c0000001bcc51780
> NIP [c0000000004dda0c] .skb_gro_reset_offset+0x5c/0xe0
> LR [c0000000004e3e50] .napi_gro_receive+0x30/0x120
> Call Trace:
> [c0000001bffebb50] [c000000000da95f0] perf_callchain_user+0x0/0x10 (unreliable)
> [c0000001bffebbf0] [d0000000047bd118] .ixgbe_clean_rx_irq+0x7a8/0x8a0 [ixgbe]
> [c0000001bffebd10] [d0000000047bd414] .ixgbe_poll+0x64/0x160 [ixgbe]
> [c0000001bffebdd0] [c0000000004e3358] .net_rx_action+0x108/0x2a0
> [c0000001bffebea0] [c00000000009b220] .__do_softirq+0x110/0x2a0
> [c0000001bffebf90] [c000000000023798] .call_do_softirq+0x14/0x24
> [c000000000dff830] [c000000000011148] .do_softirq+0xf8/0x130
> [c000000000dff8d0] [c00000000009aeb4] .irq_exit+0xb4/0xc0
> [c000000000dff950] [c000000000011254] .do_IRQ+0xd4/0x300
> [c000000000dffa10] [c000000000005024] hardware_interrupt_entry+0x18/0x74
> --- Exception: 501 at .pseries_dedicated_idle_sleep+0xe4/0x210
> LR = .pseries_dedicated_idle_sleep+0x8c/0x210
> [c000000000dffd00] [c00000000005b194] .pseries_dedicated_idle_sleep+0x194/0x210
> (unreliable)
> [c000000000dffdc0] [c000000000018c84] .cpu_idle+0x164/0x210
> [c000000000dffe70] [c00000000000b0d0] .rest_init+0x90/0xb0
> [c000000000dffef0] [c000000000830bc0] .start_kernel+0x54c/0x56c
> [c000000000dfff90] [c00000000000953c] .start_here_common+0x1c/0x60
>
> Its caused when skb_gro_reset_offset attempts to call PageHighMem on
> skb_shinfo(skb)->frags[0].page, when the frags array was left uninitalized.
> This can happen in the ixgbe driver if the hardware reports a zero length rx
> descriptor ni the middle of a packet split receive transaction.  I've consulted
> with Jesse Brandeburg on this, who is attempting to root cause the issue at
> Intel, but it seems prudent to add this check to the driver to discard frames of
> that encounter this error to avoid the opps
>
> Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
> Signed-off-by: Thadeu Lima de Souza Cascardo<cascardo@linux.vnet.ibm.com>
> CC: Jesse Brandeburg<jesse.brandeburg@intel.com>
> CC: Alexander Duyck<alexander.h.duyck@intel.com>
> CC: John Fastabend<john.r.fastabend@intel.com>
> CC: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
> CC: David S. Miller<davem@davemloft.net>
> ---
>   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   17 +++++++++++------
>   1 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index d20e804..6d59185 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -1326,6 +1326,13 @@ static void ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>
>   		rx_buffer_info =&rx_ring->rx_buffer_info[i];
>
> +		i++;
> +		if (i == rx_ring->count)
> +			i = 0;
> +
> +		next_rxd = IXGBE_RX_DESC_ADV(rx_ring, i);
> +		prefetch(next_rxd);
> +
>   		skb = rx_buffer_info->skb;
>   		rx_buffer_info->skb = NULL;
>   		prefetch(skb->data);
> @@ -1367,6 +1374,10 @@ static void ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>   		} else {
>   			/* assume packet split since header is unmapped */
>   			upper_len = le16_to_cpu(rx_desc->wb.upper.length);
> +			if (!upper_len) {
> +				rx_buffer_info->skb = skb;
> +				goto next_desc;
> +			}
>   		}
>
>   		if (upper_len) {
> @@ -1391,12 +1402,6 @@ static void ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>   			skb->truesize += upper_len;
>   		}
>
> -		i++;
> -		if (i == rx_ring->count)
> -			i = 0;
> -
> -		next_rxd = IXGBE_RX_DESC_ADV(rx_ring, i);
> -		prefetch(next_rxd);
>   		cleaned_count++;
>
>   		if (pkt_is_rsc) {

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman Sept. 2, 2011, 4:43 p.m. UTC | #3
On Fri, Sep 02, 2011 at 09:04:53AM -0700, Jeff Kirsher wrote:
> On Fri, 2011-09-02 at 07:03 -0700, Neil Horman wrote:
> > This oops was reported recently no ppc64 hardware:
> > Unable to handle kernel paging request for data at address 0x00000000
> > Faulting instruction address: 0xc0000000004dda0c
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > SMP NR_CPUS=1024 NUMA pSeries
> > Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
> > iptable_fi
> > lter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> > nf_conntrack ip6table_filter ip6_tables ipv6 jsm ses enclosure sg
> > ixgbe
> > mdio e1000 ehea ext4 jbd2 mbcache sd_mod crc_t10dif ipr dm_mod
> > NIP: c0000000004dda0c LR: c0000000004e3e50 CTR: c0000000004e3e20
> > REGS: c0000001bffeb8d0 TRAP: 0300   Not tainted
> > (3.1.0-rc2-10121-gab7e2db)
> > MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 28002042  XER: 20000000
> > CFAR: c000000000004d70
> > DAR: 0000000000000000, DSISR: 40000000
> > TASK = c000000000d548e0[0] 'swapper' THREAD: c000000000dfc000 CPU: 0
> > GPR04: c0000000010f4d80 c0000001bffebd80 0000000000000000
> > c0000001b18a8200
> > GPR08: 0000000000000280 c0000001bcc517a8 c0000001b18a7f80
> > 0000000000000000
> > GPR12: d0000000047e5bb0 c000000001f10000 c0000001b19c8700
> > 0000000000000000
> > GPR16: c0000001bffebd80 0000000000000083 c00000018f2447a0
> > 0000000000000002
> > GPR20: 0000000000000000 c0000001ba860010 c0000001ba860000
> > d000000003d40000
> > GPR24: 0000000000000000 0000000000000083 d000000003d40000
> > 0000000000000001
> > GPR28: c00000018f244780 c0000001b2b94310 c000000000da95f0
> > c0000001bcc51780
> > NIP [c0000000004dda0c] .skb_gro_reset_offset+0x5c/0xe0
> > LR [c0000000004e3e50] .napi_gro_receive+0x30/0x120
> > Call Trace:
> > [c0000001bffebb50] [c000000000da95f0] perf_callchain_user+0x0/0x10
> > (unreliable)
> > [c0000001bffebbf0] [d0000000047bd118] .ixgbe_clean_rx_irq+0x7a8/0x8a0
> > [ixgbe]
> > [c0000001bffebd10] [d0000000047bd414] .ixgbe_poll+0x64/0x160 [ixgbe]
> > [c0000001bffebdd0] [c0000000004e3358] .net_rx_action+0x108/0x2a0
> > [c0000001bffebea0] [c00000000009b220] .__do_softirq+0x110/0x2a0
> > [c0000001bffebf90] [c000000000023798] .call_do_softirq+0x14/0x24
> > [c000000000dff830] [c000000000011148] .do_softirq+0xf8/0x130
> > [c000000000dff8d0] [c00000000009aeb4] .irq_exit+0xb4/0xc0
> > [c000000000dff950] [c000000000011254] .do_IRQ+0xd4/0x300
> > [c000000000dffa10] [c000000000005024] hardware_interrupt_entry
> > +0x18/0x74
> > --- Exception: 501 at .pseries_dedicated_idle_sleep+0xe4/0x210
> > LR = .pseries_dedicated_idle_sleep+0x8c/0x210
> > [c000000000dffd00] [c00000000005b194] .pseries_dedicated_idle_sleep
> > +0x194/0x210
> > (unreliable)
> > [c000000000dffdc0] [c000000000018c84] .cpu_idle+0x164/0x210
> > [c000000000dffe70] [c00000000000b0d0] .rest_init+0x90/0xb0
> > [c000000000dffef0] [c000000000830bc0] .start_kernel+0x54c/0x56c
> > [c000000000dfff90] [c00000000000953c] .start_here_common+0x1c/0x60
> > 
> > Its caused when skb_gro_reset_offset attempts to call PageHighMem on
> > skb_shinfo(skb)->frags[0].page, when the frags array was left
> > uninitalized.
> > This can happen in the ixgbe driver if the hardware reports a zero
> > length rx
> > descriptor ni the middle of a packet split receive transaction.  I've
> > consulted
> > with Jesse Brandeburg on this, who is attempting to root cause the
> > issue at
> > Intel, but it seems prudent to add this check to the driver to discard
> > frames of
> > that encounter this error to avoid the opps
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > Signed-off-by: Thadeu Lima de Souza Cascardo
> > <cascardo@linux.vnet.ibm.com>
> > CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > CC: Alexander Duyck <alexander.h.duyck@intel.com>
> > CC: John Fastabend <john.r.fastabend@intel.com>
> > CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > CC: David S. Miller <davem@davemloft.net>
> > ---
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   17
> > +++++++++++------
> >  1 files changed, 11 insertions(+), 6 deletions(-) 
> 
> Thanks Neil, I have added the patch to my queue of ixgbe patches.
> 
> This patch was made against net-next, was this issue only seen on the
> net-next kernel?
You can see all the details here:
https://bugzilla.redhat.com/buglist.cgi?quicksearch=683611

My understanding was that it was seen in RHEL, net-next, and with the
sourceforge driver.

Regards
Neil



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman Sept. 2, 2011, 4:55 p.m. UTC | #4
On Fri, Sep 02, 2011 at 09:17:40AM -0700, Alexander Duyck wrote:
> This kind of fix just opens up a whole can of security related
> worms.  If you are going to discard a packet you should do it after
> we have reached the EOP in the series.  My advice would be to
> determine what traits identify this packet and add those to the
> check for the IXGBE_RXDADV_ERR_FRAME_ERR_MASK check further down in
> the code.  Likely what you are seeing is skb_headlen(skb) will be
> equal to 0.
> 
Well, the traits of the bogus descriptor are almost exactly as you describe
them, i.e. rx_buffer_info->dma is zero, which the driver takes to mean packet
split is enabled, and this is a buffer in the middle of that operation
(according to the comments in ixgbe_clean_rx_irq), and the upper_len value we
read from the rx_descriptior rx_dex->wb.upper.length is zero.  This implies we
have a frame which is in the middle of a packet split receive, and one of the
page long buffers has a length value of zero, which is non-sensical.  I suppose
we could wait until the next frame with EOP set to discard the whole thing, but
I'm not sure how that amounts to anything different than just skipping to the
next descriptor.

> I'm suspecting this is some sort of read corruption.  It looks like
> in order to trigger it you have to either be reading
> rx_buffer_info->dma as 0, or the header length is being read as 0.
Correct, which drops us into the else clause of the if(rx_buffer_info->dma)
conditional in ixgbe_clean_rx_irq.

> Do you know if you actually have header split enabled when this is
> occuring?  Are you running with jumbo frames enabled to see the
Yes, packet split is enabled. and no, Jumbo frames are not in use.

> issue?  If not then packet split wouldn't be enabled.
> 
> Is this occurring on net-next or on an older kernel?  I just want to
> be sure since we added a read memory barrier in 2.6.34 to address
> the fact that the length and descriptor DD bits were being read in
> the wrong order resulting in the length being corrupted on PowerPC
> systems.  The fact that we are now seeing another length error on
> PowerPC seems very odd.
> 
According to the bz:
https://bugzilla.redhat.com/show_bug.cgi?id=683611
This appears to be happening on RHEL, and on upstream kernels, as well as the
sourceforge driver.  Don't quote me on the SF driver though, because I never got
a clear answer on that.  Although, fwiw, the RHEL version of the driver in which
we were definately seeing this problem has a read memory barrrier at the top of
the loop in ixgbe_clean_rx_irq, pulled in from commit
3c945e5b3719bcc18c6ddd31bbcae8ef94f3d19a, so I think thats handled.


Regards
Neil
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Duyck, Alexander H Sept. 2, 2011, 5:54 p.m. UTC | #5
On 09/02/2011 09:55 AM, Neil Horman wrote:
> On Fri, Sep 02, 2011 at 09:17:40AM -0700, Alexander Duyck wrote:
>> This kind of fix just opens up a whole can of security related
>> worms.  If you are going to discard a packet you should do it after
>> we have reached the EOP in the series.  My advice would be to
>> determine what traits identify this packet and add those to the
>> check for the IXGBE_RXDADV_ERR_FRAME_ERR_MASK check further down in
>> the code.  Likely what you are seeing is skb_headlen(skb) will be
>> equal to 0.
>>
> Well, the traits of the bogus descriptor are almost exactly as you describe
> them, i.e. rx_buffer_info->dma is zero, which the driver takes to mean packet
> split is enabled, and this is a buffer in the middle of that operation
> (according to the comments in ixgbe_clean_rx_irq), and the upper_len value we
> read from the rx_descriptior rx_dex->wb.upper.length is zero.  This implies we
> have a frame which is in the middle of a packet split receive, and one of the
> page long buffers has a length value of zero, which is non-sensical.  I suppose
> we could wait until the next frame with EOP set to discard the whole thing, but
> I'm not sure how that amounts to anything different than just skipping to the
> next descriptor.
>
>> I'm suspecting this is some sort of read corruption.  It looks like
>> in order to trigger it you have to either be reading
>> rx_buffer_info->dma as 0, or the header length is being read as 0.
> Correct, which drops us into the else clause of the if(rx_buffer_info->dma)
> conditional in ixgbe_clean_rx_irq.
>
>> Do you know if you actually have header split enabled when this is
>> occuring?  Are you running with jumbo frames enabled to see the
> Yes, packet split is enabled. and no, Jumbo frames are not in use.
>
>> issue?  If not then packet split wouldn't be enabled.
>>
>> Is this occurring on net-next or on an older kernel?  I just want to
>> be sure since we added a read memory barrier in 2.6.34 to address
>> the fact that the length and descriptor DD bits were being read in
>> the wrong order resulting in the length being corrupted on PowerPC
>> systems.  The fact that we are now seeing another length error on
>> PowerPC seems very odd.
>>
> According to the bz:
> https://bugzilla.redhat.com/show_bug.cgi?id=683611
> This appears to be happening on RHEL, and on upstream kernels, as well as the
> sourceforge driver.  Don't quote me on the SF driver though, because I never got
> a clear answer on that.  Although, fwiw, the RHEL version of the driver in which
> we were definately seeing this problem has a read memory barrrier at the top of
> the loop in ixgbe_clean_rx_irq, pulled in from commit
> 3c945e5b3719bcc18c6ddd31bbcae8ef94f3d19a, so I think thats handled.
>
>
> Regards
> Neil
I'll review the bugzilla and submit my comments there.

Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman Sept. 13, 2011, 10:50 a.m. UTC | #6
On Fri, Sep 02, 2011 at 10:03:17AM -0400, Neil Horman wrote:
> This oops was reported recently no ppc64 hardware:
> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc0000000004dda0c
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA pSeries
> Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
> iptable_fi
> lter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> nf_conntrack ip6table_filter ip6_tables ipv6 jsm ses enclosure sg ixgbe
> mdio e1000 ehea ext4 jbd2 mbcache sd_mod crc_t10dif ipr dm_mod
> NIP: c0000000004dda0c LR: c0000000004e3e50 CTR: c0000000004e3e20
> REGS: c0000001bffeb8d0 TRAP: 0300   Not tainted  (3.1.0-rc2-10121-gab7e2db)
> MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 28002042  XER: 20000000
> CFAR: c000000000004d70
> DAR: 0000000000000000, DSISR: 40000000
> TASK = c000000000d548e0[0] 'swapper' THREAD: c000000000dfc000 CPU: 0
> GPR04: c0000000010f4d80 c0000001bffebd80 0000000000000000 c0000001b18a8200
> GPR08: 0000000000000280 c0000001bcc517a8 c0000001b18a7f80 0000000000000000
> GPR12: d0000000047e5bb0 c000000001f10000 c0000001b19c8700 0000000000000000
> GPR16: c0000001bffebd80 0000000000000083 c00000018f2447a0 0000000000000002
> GPR20: 0000000000000000 c0000001ba860010 c0000001ba860000 d000000003d40000
> GPR24: 0000000000000000 0000000000000083 d000000003d40000 0000000000000001
> GPR28: c00000018f244780 c0000001b2b94310 c000000000da95f0 c0000001bcc51780
> NIP [c0000000004dda0c] .skb_gro_reset_offset+0x5c/0xe0
> LR [c0000000004e3e50] .napi_gro_receive+0x30/0x120
> Call Trace:
> [c0000001bffebb50] [c000000000da95f0] perf_callchain_user+0x0/0x10 (unreliable)
> [c0000001bffebbf0] [d0000000047bd118] .ixgbe_clean_rx_irq+0x7a8/0x8a0 [ixgbe]
> [c0000001bffebd10] [d0000000047bd414] .ixgbe_poll+0x64/0x160 [ixgbe]
> [c0000001bffebdd0] [c0000000004e3358] .net_rx_action+0x108/0x2a0
> [c0000001bffebea0] [c00000000009b220] .__do_softirq+0x110/0x2a0
> [c0000001bffebf90] [c000000000023798] .call_do_softirq+0x14/0x24
> [c000000000dff830] [c000000000011148] .do_softirq+0xf8/0x130
> [c000000000dff8d0] [c00000000009aeb4] .irq_exit+0xb4/0xc0
> [c000000000dff950] [c000000000011254] .do_IRQ+0xd4/0x300
> [c000000000dffa10] [c000000000005024] hardware_interrupt_entry+0x18/0x74
> --- Exception: 501 at .pseries_dedicated_idle_sleep+0xe4/0x210
> LR = .pseries_dedicated_idle_sleep+0x8c/0x210
> [c000000000dffd00] [c00000000005b194] .pseries_dedicated_idle_sleep+0x194/0x210
> (unreliable)
> [c000000000dffdc0] [c000000000018c84] .cpu_idle+0x164/0x210
> [c000000000dffe70] [c00000000000b0d0] .rest_init+0x90/0xb0
> [c000000000dffef0] [c000000000830bc0] .start_kernel+0x54c/0x56c
> [c000000000dfff90] [c00000000000953c] .start_here_common+0x1c/0x60
> 
> Its caused when skb_gro_reset_offset attempts to call PageHighMem on
> skb_shinfo(skb)->frags[0].page, when the frags array was left uninitalized.
> This can happen in the ixgbe driver if the hardware reports a zero length rx
> descriptor ni the middle of a packet split receive transaction.  I've consulted
> with Jesse Brandeburg on this, who is attempting to root cause the issue at
> Intel, but it seems prudent to add this check to the driver to discard frames of
> that encounter this error to avoid the opps
> 
Sorry, I need to rescind this patch.  Looks like this is turning out to be an
issue with an ideosyncracy in the dma hardware on this platform.

Thanks
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kirsher, Jeffrey T Sept. 13, 2011, 10:52 p.m. UTC | #7
On Tue, 2011-09-13 at 03:50 -0700, Neil Horman wrote:
> On Fri, Sep 02, 2011 at 10:03:17AM -0400, Neil Horman wrote:
> > This oops was reported recently no ppc64 hardware:
> > Unable to handle kernel paging request for data at address 0x00000000
> > Faulting instruction address: 0xc0000000004dda0c
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > SMP NR_CPUS=1024 NUMA pSeries
> > Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
> > iptable_fi
> > lter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> > nf_conntrack ip6table_filter ip6_tables ipv6 jsm ses enclosure sg ixgbe
> > mdio e1000 ehea ext4 jbd2 mbcache sd_mod crc_t10dif ipr dm_mod
> > NIP: c0000000004dda0c LR: c0000000004e3e50 CTR: c0000000004e3e20
> > REGS: c0000001bffeb8d0 TRAP: 0300   Not tainted  (3.1.0-rc2-10121-gab7e2db)
> > MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 28002042  XER: 20000000
> > CFAR: c000000000004d70
> > DAR: 0000000000000000, DSISR: 40000000
> > TASK = c000000000d548e0[0] 'swapper' THREAD: c000000000dfc000 CPU: 0
> > GPR04: c0000000010f4d80 c0000001bffebd80 0000000000000000 c0000001b18a8200
> > GPR08: 0000000000000280 c0000001bcc517a8 c0000001b18a7f80 0000000000000000
> > GPR12: d0000000047e5bb0 c000000001f10000 c0000001b19c8700 0000000000000000
> > GPR16: c0000001bffebd80 0000000000000083 c00000018f2447a0 0000000000000002
> > GPR20: 0000000000000000 c0000001ba860010 c0000001ba860000 d000000003d40000
> > GPR24: 0000000000000000 0000000000000083 d000000003d40000 0000000000000001
> > GPR28: c00000018f244780 c0000001b2b94310 c000000000da95f0 c0000001bcc51780
> > NIP [c0000000004dda0c] .skb_gro_reset_offset+0x5c/0xe0
> > LR [c0000000004e3e50] .napi_gro_receive+0x30/0x120
> > Call Trace:
> > [c0000001bffebb50] [c000000000da95f0] perf_callchain_user+0x0/0x10 (unreliable)
> > [c0000001bffebbf0] [d0000000047bd118] .ixgbe_clean_rx_irq+0x7a8/0x8a0 [ixgbe]
> > [c0000001bffebd10] [d0000000047bd414] .ixgbe_poll+0x64/0x160 [ixgbe]
> > [c0000001bffebdd0] [c0000000004e3358] .net_rx_action+0x108/0x2a0
> > [c0000001bffebea0] [c00000000009b220] .__do_softirq+0x110/0x2a0
> > [c0000001bffebf90] [c000000000023798] .call_do_softirq+0x14/0x24
> > [c000000000dff830] [c000000000011148] .do_softirq+0xf8/0x130
> > [c000000000dff8d0] [c00000000009aeb4] .irq_exit+0xb4/0xc0
> > [c000000000dff950] [c000000000011254] .do_IRQ+0xd4/0x300
> > [c000000000dffa10] [c000000000005024] hardware_interrupt_entry+0x18/0x74
> > --- Exception: 501 at .pseries_dedicated_idle_sleep+0xe4/0x210
> > LR = .pseries_dedicated_idle_sleep+0x8c/0x210
> > [c000000000dffd00] [c00000000005b194] .pseries_dedicated_idle_sleep+0x194/0x210
> > (unreliable)
> > [c000000000dffdc0] [c000000000018c84] .cpu_idle+0x164/0x210
> > [c000000000dffe70] [c00000000000b0d0] .rest_init+0x90/0xb0
> > [c000000000dffef0] [c000000000830bc0] .start_kernel+0x54c/0x56c
> > [c000000000dfff90] [c00000000000953c] .start_here_common+0x1c/0x60
> > 
> > Its caused when skb_gro_reset_offset attempts to call PageHighMem on
> > skb_shinfo(skb)->frags[0].page, when the frags array was left uninitalized.
> > This can happen in the ixgbe driver if the hardware reports a zero length rx
> > descriptor ni the middle of a packet split receive transaction.  I've consulted
> > with Jesse Brandeburg on this, who is attempting to root cause the issue at
> > Intel, but it seems prudent to add this check to the driver to discard frames of
> > that encounter this error to avoid the opps
> > 
> Sorry, I need to rescind this patch.  Looks like this is turning out to be an
> issue with an ideosyncracy in the dma hardware on this platform.
> 
> Thanks
> Neil
> 

Thanks for the update Neil.  I have dropped the patch from my queue.
diff mbox

Patch

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d20e804..6d59185 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1326,6 +1326,13 @@  static void ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 		rx_buffer_info = &rx_ring->rx_buffer_info[i];
 
+		i++;
+		if (i == rx_ring->count)
+			i = 0;
+
+		next_rxd = IXGBE_RX_DESC_ADV(rx_ring, i);
+		prefetch(next_rxd);
+
 		skb = rx_buffer_info->skb;
 		rx_buffer_info->skb = NULL;
 		prefetch(skb->data);
@@ -1367,6 +1374,10 @@  static void ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		} else {
 			/* assume packet split since header is unmapped */
 			upper_len = le16_to_cpu(rx_desc->wb.upper.length);
+			if (!upper_len) {
+				rx_buffer_info->skb = skb;
+				goto next_desc;
+			}
 		}
 
 		if (upper_len) {
@@ -1391,12 +1402,6 @@  static void ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			skb->truesize += upper_len;
 		}
 
-		i++;
-		if (i == rx_ring->count)
-			i = 0;
-
-		next_rxd = IXGBE_RX_DESC_ADV(rx_ring, i);
-		prefetch(next_rxd);
 		cleaned_count++;
 
 		if (pkt_is_rsc) {