diff mbox

[LEDE-DEV] spinlock loop in cns3xxx_eth.c

Message ID 5761546C.8010409@ncentric.com
State Accepted
Headers show

Commit Message

Koen Vandeputte June 15, 2016, 1:13 p.m. UTC
Hi All,

I found the bug and was able to fix it.

Tested on 4 identical gateworks gw2388-4 (cns3xxx) boards.

- iperf in dual direction between board 1s & 2    (without patch)
- iperf in dual direction between board 3s & 4    (with patch)

Boards 1 and 2 go in spinlock after a few minutes on each try
Boards 3 and 4 have been running for hours without the error

To avoid any hardware delta, I retested with the patch only present on 
board 1 & 2
Board 1 & 2 now worked perfectly while 3 & 4 showed the loop issue



Patch:


 From 9338467c36b27bf1099dbdabd7591470c2a371df Mon Sep 17 00:00:00 2001
From: Koen Vandeputte <koen.vandeputte@ncentric.com>
Date: Wed, 15 Jun 2016 14:59:49 +0200
Subject: cns3xxx: fix RX softIRQ loop

Already reschedule when 1 or more frames came in.

Checking for a full queue could produce a re-schedule loop as
the checked RX ring location could contain undefined values
depending on activity in previous loops.

Signed-off-by: Koen Vandeputte <koen.vandeputte@ncentric.com>




Kind regards,

Koen


On 2016-06-14 16:34, Koen Vandeputte wrote:
> Hi Tim,
>
> I fully understand the reason for the check.
>
> However,
> I think the bug has nothing to do with the rotting packet issue.
>
>
> The bug is actually:
> - eth_schedule_poll() gets called as it's detected that the ring is 
> full after IRQ_enable. (cown flag is set in the previous slot of the 
> ring)
> --> During this extra loop,  _0_ packets are received/processed! The 
> ring still seems to be full, and another reschedule takes place .. and 
> so on ..
>
>
> Iv'e added 3 debug counters in the code:    (see below after the patch)
> - Loop --> Top of eth_poll()
> - IRQ --> Top of eth_rx_irq()
> - Resched --> Inside the reschedule check
>
>
>
>
> Patch:
>
> --- a/drivers/net/ethernet/cavium/cns3xxx_eth.c
> +++ b/drivers/net/ethernet/cavium/cns3xxx_eth.c
> @@ -308,6 +308,8 @@ static int ports_open;
>  static struct port *switch_port_tab[4];
>  struct net_device *napi_dev;
>
> +static int CntIRQ;
> +
>  static int cns3xxx_mdio_cmd(struct mii_bus *bus, int phy_id, int 
> location,
>                 int write, u16 cmd)
>  {
> @@ -437,7 +439,10 @@ static void cns3xxx_adjust_link(struct n
>  static void eth_schedule_poll(struct sw *sw)
>  {
>      if (unlikely(!napi_schedule_prep(&sw->napi)))
> +    {
> +        //printk(KERN_ERR "sched err\n");
>          return;
> +    }
>
>      disable_irq_nosync(sw->rx_irq);
>      __napi_schedule(&sw->napi);
> @@ -447,6 +452,7 @@ irqreturn_t eth_rx_irq(int irq, void *pd
>  {
>      struct net_device *dev = pdev;
>      struct sw *sw = netdev_priv(dev);
> +    CntIRQ++;
>      eth_schedule_poll(sw);
>      return (IRQ_HANDLED);
>  }
> @@ -617,6 +623,7 @@ static void eth_complete_tx(struct sw *s
>
>  static int eth_poll(struct napi_struct *napi, int budget)
>  {
> +    static int Loop = 0;
>      struct sw *sw = container_of(napi, struct sw, napi);
>      struct _rx_ring *rx_ring = &sw->rx_ring;
>      int received = 0;
> @@ -712,6 +719,9 @@ static int eth_poll(struct napi_struct *
>      }
>
>      rx_ring->cur_index = i;
> +
> +    Loop++;
> +
>      if (!received) {
>          napi_complete(napi);
>          enable_irq(sw->rx_irq);
> @@ -719,7 +729,12 @@ static int eth_poll(struct napi_struct *
>
>          /* if rx descriptors are full schedule another poll */
>          if (rx_ring->desc[(i-1) & (RX_DESCS-1)].cown)
> +        {
> +            static int Sched = 0;
> +
> +            printk(KERN_ERR "Loop: %d irq: %d ReSched: %d\n", Loop, 
> CntIRQ, ++Sched);
>              eth_schedule_poll(sw);
> +        }
>      }
>
>      spin_lock_bh(&tx_lock);
>
>
>
> When the bug triggers, you can see below that consecutive loops in 
> eth_poll(), triggered by a reschedule, doesn't process any RX packets 
> at all ..
> (As the ReSched counter increases linear with the loop counter and 
> ReSched can only be incremented when "!received")
>
>
> Also
> Checking /proc/interrupts shows that gig_switch IRQ counter goes 
> berserk (+10k IRQ/s)
>
>
> [  477.440000] Loop: 835307 irq: 300583 ReSched: 1
> [  477.440000] Loop: 835308 irq: 300583 ReSched: 2
> [  477.440000] Loop: 835310 irq: 300583 ReSched: 3
> [  477.440000] Loop: 835311 irq: 300583 ReSched: 4
> [  477.440000] Loop: 835312 irq: 300583 ReSched: 5
> [  477.440000] Loop: 835313 irq: 300583 ReSched: 6
> [  477.440000] Loop: 835314 irq: 300583 ReSched: 7
> [  477.440000] Loop: 835315 irq: 300583 ReSched: 8
> [  477.440000] Loop: 835316 irq: 300583 ReSched: 9
> [  477.440000] Loop: 835317 irq: 300583 ReSched: 10
> [  477.440000] Loop: 835318 irq: 300583 ReSched: 11
> [  477.440000] Loop: 835319 irq: 300583 ReSched: 12
> [  477.440000] Loop: 835320 irq: 300583 ReSched: 13
> [  477.440000] Loop: 835321 irq: 300583 ReSched: 14
> [  477.440000] Loop: 835322 irq: 300583 ReSched: 15
> [  477.440000] Loop: 835323 irq: 300583 ReSched: 16
> [  477.440000] Loop: 835324 irq: 300583 ReSched: 17
> [  477.440000] Loop: 835325 irq: 300583 ReSched: 18
> [  477.440000] Loop: 835326 irq: 300583 ReSched: 19
> [  477.440000] Loop: 835327 irq: 300583 ReSched: 20
> [  477.440000] Loop: 835328 irq: 300583 ReSched: 21
> [  477.440000] Loop: 835329 irq: 300583 ReSched: 22
> [  477.440000] Loop: 835330 irq: 300583 ReSched: 23
> [  477.440000] Loop: 835331 irq: 300583 ReSched: 24
> [  477.440000] Loop: 835332 irq: 300583 ReSched: 25
> [  477.440000] Loop: 835333 irq: 300583 ReSched: 26
> [  477.440000] Loop: 835334 irq: 300583 ReSched: 27
> [  477.440000] Loop: 835335 irq: 300583 ReSched: 28
> [  477.440000] Loop: 835336 irq: 300583 ReSched: 29
> [  477.440000] Loop: 835337 irq: 300583 ReSched: 30
> [  477.440000] Loop: 835338 irq: 300583 ReSched: 31
> [  477.440000] Loop: 835339 irq: 300583 ReSched: 32
> [  477.440000] Loop: 835340 irq: 300583 ReSched: 33
> [  477.440000] Loop: 835341 irq: 300583 ReSched: 34
> [  477.440000] Loop: 835342 irq: 300583 ReSched: 35
> [  477.440000] Loop: 835343 irq: 300583 ReSched: 36
> [  477.440000] Loop: 835344 irq: 300583 ReSched: 37
> [  477.440000] Loop: 835345 irq: 300583 ReSched: 38
> [  477.440000] Loop: 835346 irq: 300583 ReSched: 39
> [  477.440000] Loop: 835347 irq: 300583 ReSched: 40
> [  477.440000] Loop: 835348 irq: 300583 ReSched: 41
> [  477.440000] Loop: 835349 irq: 300583 ReSched: 42
> [  477.440000] Loop: 835350 irq: 300583 ReSched: 43
> [  477.440000] Loop: 835351 irq: 300583 ReSched: 44
> [  477.440000] Loop: 835352 irq: 300583 ReSched: 45
> [  477.440000] Loop: 835353 irq: 300583 ReSched: 46
> [  477.440000] Loop: 835354 irq: 300583 ReSched: 47
> [  477.440000] Loop: 835355 irq: 300583 ReSched: 48
> [  477.440000] Loop: 835356 irq: 300583 ReSched: 49
> [  477.440000] Loop: 835357 irq: 300583 ReSched: 50
> [  477.440000] Loop: 835358 irq: 300583 ReSched: 51
> [  477.440000] Loop: 835359 irq: 300583 ReSched: 52
> [  477.440000] Loop: 835360 irq: 300583 ReSched: 53
> [  477.440000] Loop: 835361 irq: 300583 ReSched: 54
> [  477.440000] Loop: 835362 irq: 300583 ReSched: 55
> [  477.440000] Loop: 835363 irq: 300583 ReSched: 56
> [  477.440000] Loop: 835364 irq: 300583 ReSched: 57
> [  477.440000] Loop: 835365 irq: 300583 ReSched: 58
> [  477.440000] Loop: 835366 irq: 300583 ReSched: 59
> [  477.440000] Loop: 835367 irq: 300583 ReSched: 60
> [  477.440000] Loop: 835368 irq: 300583 ReSched: 61
> [  477.440000] Loop: 835369 irq: 300583 ReSched: 62
> [  477.440000] Loop: 835370 irq: 300583 ReSched: 63
> [  477.440000] Loop: 835371 irq: 300583 ReSched: 64
> [  477.440000] Loop: 835372 irq: 300583 ReSched: 65
> [  477.440000] Loop: 835373 irq: 300583 ReSched: 66
> [  477.440000] Loop: 835374 irq: 300583 ReSched: 67
> [  477.440000] Loop: 835375 irq: 300583 ReSched: 68
> [  477.440000] Loop: 835376 irq: 300583 ReSched: 69
> [  477.440000] Loop: 835377 irq: 300583 ReSched: 70
> [  477.440000] Loop: 835378 irq: 300583 ReSched: 71
> [  477.440000] Loop: 835379 irq: 300583 ReSched: 72
> [  477.440000] Loop: 835380 irq: 300583 ReSched: 73
> [  477.440000] Loop: 835381 irq: 300583 ReSched: 74
> [  477.440000] Loop: 835382 irq: 300583 ReSched: 75
> [  477.440000] Loop: 835383 irq: 300583 ReSched: 76
> [  477.440000] Loop: 835384 irq: 300583 ReSched: 77
> [  477.440000] Loop: 835385 irq: 300583 ReSched: 78
> [  477.440000] Loop: 835386 irq: 300583 ReSched: 79
> [  477.440000] Loop: 835387 irq: 300583 ReSched: 80
> [  477.440000] Loop: 835388 irq: 300583 ReSched: 81
> [  477.440000] Loop: 835389 irq: 300583 ReSched: 82
> [  477.440000] Loop: 835390 irq: 300583 ReSched: 83
> [  477.440000] Loop: 835391 irq: 300583 ReSched: 84
> [  477.440000] Loop: 835392 irq: 300583 ReSched: 85
> [  477.440000] Loop: 835393 irq: 300583 ReSched: 86
> [  477.440000] Loop: 835394 irq: 300583 ReSched: 87
> [  477.440000] Loop: 835396 irq: 300583 ReSched: 88
> [  477.440000] Loop: 835397 irq: 300583 ReSched: 89
> [  477.440000] Loop: 835398 irq: 300583 ReSched: 90
> [  477.440000] Loop: 835399 irq: 300583 ReSched: 91
> [  477.440000] Loop: 835400 irq: 300583 ReSched: 92
> [  477.440000] Loop: 835403 irq: 300583 ReSched: 93
> [  477.440000] Loop: 835406 irq: 300583 ReSched: 94
> [  477.440000] Loop: 835407 irq: 300583 ReSched: 95
> [  477.440000] Loop: 835408 irq: 300583 ReSched: 96
> [  477.440000] Loop: 835409 irq: 300583 ReSched: 97
> [  477.440000] Loop: 835410 irq: 300583 ReSched: 98
> [  477.440000] Loop: 835411 irq: 300583 ReSched: 99
> [  477.440000] Loop: 835412 irq: 300583 ReSched: 100
> [  477.440000] Loop: 835413 irq: 300583 ReSched: 101
> [  477.440000] Loop: 835414 irq: 300583 ReSched: 102
> [  477.440000] Loop: 835415 irq: 300583 ReSched: 103
> [  477.440000] Loop: 835416 irq: 300583 ReSched: 104
> [  477.440000] Loop: 835417 irq: 300583 ReSched: 105
> [  477.440000] Loop: 835418 irq: 300583 ReSched: 106
> [  477.440000] Loop: 835419 irq: 300583 ReSched: 107
> [  477.440000] Loop: 835420 irq: 300583 ReSched: 108
> [  477.440000] Loop: 835421 irq: 300583 ReSched: 109
> [  477.440000] Loop: 835422 irq: 300583 ReSched: 110
> [  477.440000] Loop: 835423 irq: 300583 ReSched: 111
> [  477.440000] Loop: 835424 irq: 300583 ReSched: 112
> [  477.440000] Loop: 835425 irq: 300583 ReSched: 113
> [  477.440000] Loop: 835426 irq: 300583 ReSched: 114
> [  477.440000] Loop: 835427 irq: 300583 ReSched: 115
> [  477.440000] Loop: 835428 irq: 300583 ReSched: 116
> [  477.440000] Loop: 835429 irq: 300583 ReSched: 117
> [  477.440000] Loop: 835430 irq: 300583 ReSched: 118
> [  477.440000] Loop: 835431 irq: 300583 ReSched: 119
> [  477.440000] Loop: 835432 irq: 300583 ReSched: 120
> [  477.440000] Loop: 835433 irq: 300583 ReSched: 121
> [  477.440000] Loop: 835434 irq: 300583 ReSched: 122
> [  477.440000] Loop: 835435 irq: 300583 ReSched: 123
> [  477.440000] Loop: 835436 irq: 300583 ReSched: 124
> [  477.440000] Loop: 835437 irq: 300583 ReSched: 125
> [  477.440000] Loop: 835438 irq: 300583 ReSched: 126
> [  477.440000] Loop: 835439 irq: 300583 ReSched: 127
> [  477.440000] Loop: 835440 irq: 300583 ReSched: 128
> [  477.440000] Loop: 835441 irq: 300583 ReSched: 129
> [  477.440000] Loop: 835442 irq: 300583 ReSched: 130
> [  477.440000] Loop: 835443 irq: 300583 ReSched: 131
> [  477.440000] Loop: 835444 irq: 300583 ReSched: 132
> [  477.440000] Loop: 835445 irq: 300583 ReSched: 133
> [  477.440000] Loop: 835446 irq: 300583 ReSched: 134
> [  477.440000] Loop: 835447 irq: 300583 ReSched: 135
> [  477.440000] Loop: 835448 irq: 300583 ReSched: 136
> [  477.440000] Loop: 835449 irq: 300583 ReSched: 137
> [  477.440000] Loop: 835450 irq: 300583 ReSched: 138
> [  477.450000] Loop: 835451 irq: 300583 ReSched: 139
> [  477.450000] Loop: 835452 irq: 300583 ReSched: 140
> [  477.450000] Loop: 835453 irq: 300583 ReSched: 141
> [  477.450000] Loop: 835454 irq: 300583 ReSched: 142
> [  477.450000] Loop: 835455 irq: 300583 ReSched: 143
> [  477.450000] Loop: 835456 irq: 300583 ReSched: 144
> [  477.450000] Loop: 835457 irq: 300583 ReSched: 145
> [  477.450000] Loop: 835458 irq: 300583 ReSched: 146
> [  477.450000] Loop: 835459 irq: 300583 ReSched: 147
> [  477.450000] Loop: 835460 irq: 300583 ReSched: 148
> [  477.450000] Loop: 835461 irq: 300583 ReSched: 149
> [  477.450000] Loop: 835462 irq: 300583 ReSched: 150
> [  477.450000] Loop: 835463 irq: 300583 ReSched: 151
> [  477.450000] Loop: 835464 irq: 300583 ReSched: 152
> [  477.450000] Loop: 835465 irq: 300583 ReSched: 153
> [  477.450000] Loop: 835466 irq: 300583 ReSched: 154
> [  477.450000] Loop: 835467 irq: 300583 ReSched: 155
> [  477.450000] Loop: 835468 irq: 300583 ReSched: 156
> [  477.450000] Loop: 835469 irq: 300583 ReSched: 157
> [  477.450000] Loop: 835470 irq: 300583 ReSched: 158
> [  477.450000] Loop: 835472 irq: 300583 ReSched: 159
> [  477.450000] Loop: 835473 irq: 300583 ReSched: 160
> [  477.450000] Loop: 835474 irq: 300583 ReSched: 161
> [  477.450000] Loop: 835475 irq: 300583 ReSched: 162
> [  477.450000] Loop: 835476 irq: 300583 ReSched: 163
> [  477.450000] Loop: 835477 irq: 300583 ReSched: 164
> [  477.450000] Loop: 835478 irq: 300583 ReSched: 165
> [  477.450000] Loop: 835479 irq: 300583 ReSched: 166
> [  477.450000] Loop: 835480 irq: 300583 ReSched: 167
> [  477.450000] Loop: 835481 irq: 300583 ReSched: 168
> [  477.450000] Loop: 835482 irq: 300583 ReSched: 169
> [  477.450000] Loop: 835483 irq: 300583 ReSched: 170
> [  477.450000] Loop: 835484 irq: 300583 ReSched: 171
> [  477.450000] Loop: 835485 irq: 300583 ReSched: 172
> [  477.450000] Loop: 835486 irq: 300583 ReSched: 173
> [  477.450000] Loop: 835487 irq: 300583 ReSched: 174
> [  477.450000] Loop: 835488 irq: 300583 ReSched: 175
> [  477.450000] Loop: 835489 irq: 300583 ReSched: 176
> [  477.450000] Loop: 835490 irq: 300583 ReSched: 177
> [  477.450000] Loop: 835491 irq: 300583 ReSched: 178
> [  477.450000] Loop: 835492 irq: 300583 ReSched: 179
> [  477.450000] Loop: 835493 irq: 300583 ReSched: 180
> [  477.450000] Loop: 835494 irq: 300583 ReSched: 181
> [  477.450000] Loop: 835495 irq: 300583 ReSched: 182
> [  477.450000] Loop: 835496 irq: 300583 ReSched: 183
> [  477.450000] Loop: 835497 irq: 300583 ReSched: 184
> [  477.450000] Loop: 835498 irq: 300583 ReSched: 185
> [  477.450000] Loop: 835499 irq: 300583 ReSched: 186
> [  477.450000] Loop: 835500 irq: 300583 ReSched: 187
> [  477.450000] Loop: 835501 irq: 300583 ReSched: 188
> [  477.450000] Loop: 835502 irq: 300583 ReSched: 189
> [  477.450000] Loop: 835503 irq: 300583 ReSched: 190
> [  477.450000] Loop: 835504 irq: 300583 ReSched: 191
> [  477.450000] Loop: 835505 irq: 300583 ReSched: 192
> [  477.450000] Loop: 835506 irq: 300583 ReSched: 193
> [  477.450000] Loop: 835507 irq: 300583 ReSched: 194
> [  477.450000] Loop: 835508 irq: 300583 ReSched: 195
> [  477.450000] Loop: 835509 irq: 300583 ReSched: 196
> [  477.450000] Loop: 835510 irq: 300583 ReSched: 197
> [  477.450000] Loop: 835511 irq: 300583 ReSched: 198
> [  477.450000] Loop: 835512 irq: 300583 ReSched: 199
> [  477.450000] Loop: 835513 irq: 300583 ReSched: 200
> [  477.450000] Loop: 835514 irq: 300583 ReSched: 201
> [  477.450000] Loop: 835515 irq: 300583 ReSched: 202
> [  477.450000] Loop: 835516 irq: 300583 ReSched: 203
> [  477.450000] Loop: 835517 irq: 300583 ReSched: 204
> [  477.450000] Loop: 835518 irq: 300583 ReSched: 205
> [  477.450000] Loop: 835519 irq: 300583 ReSched: 206
> [  477.450000] Loop: 835520 irq: 300583 ReSched: 207
> [  477.450000] Loop: 835521 irq: 300583 ReSched: 208
> [  477.450000] Loop: 835522 irq: 300583 ReSched: 209
> [  477.450000] Loop: 835523 irq: 300583 ReSched: 210
> [  477.450000] Loop: 835524 irq: 300583 ReSched: 211
> [  477.450000] Loop: 835525 irq: 300583 ReSched: 212
> [  477.450000] Loop: 835526 irq: 300583 ReSched: 213
> [  477.450000] Loop: 835527 irq: 300583 ReSched: 214
> [  477.450000] Loop: 835528 irq: 300583 ReSched: 215
> [  477.450000] Loop: 835529 irq: 300583 ReSched: 216
> [  477.450000] Loop: 835530 irq: 300583 ReSched: 217
> [  478.450000] Loop: 835534 irq: 300584 ReSched: 218
> [  478.460000] Loop: 835536 irq: 300585 ReSched: 219
> [  478.470000] Loop: 835538 irq: 300586 ReSched: 220
> [  478.470000] Loop: 835539 irq: 300586 ReSched: 221
> [  478.500000] Loop: 835983 irq: 300586 ReSched: 622
> ...
> [  478.510000] Loop: 836072 irq: 300586 ReSched: 697
> [  478.510000] Loop: 836073 irq: 300586 ReSched: 698
> [  478.510000] Loop: 836074 irq: 300586 ReSched: 699
> [  478.510000] Loop: 836075 irq: 300586 ReSched: 700
> [  478.510000] Loop: 836076 irq: 300586 ReSched: 701
> [  478.510000] Loop: 836077 irq: 300586 ReSched: 702
> [  478.510000] Loop: 836078 irq: 300586 ReSched: 703
> [  478.510000] Loop: 836079 irq: 300586 ReSched: 704
> [  478.510000] Loop: 836080 irq: 300586 ReSched: 705
> [  478.510000] Loop: 836081 irq: 300586 ReSched: 706
> [  478.510000] Loop: 836082 irq: 300586 ReSched: 707
> [  478.510000] Loop: 836083 irq: 300586 ReSched: 708
> [  478.510000] Loop: 836084 irq: 300586 ReSched: 709
> [  478.510000] Loop: 836085 irq: 300586 ReSched: 710
> [  478.510000] Loop: 836086 irq: 300586 ReSched: 711
> [  478.510000] Loop: 836087 irq: 300586 ReSched: 712
> [  478.510000] Loop: 836088 irq: 300586 ReSched: 713
> [  478.510000] Loop: 836089 irq: 300586 ReSched: 714
> [  478.510000] Loop: 836090 irq: 300586 ReSched: 715
> [  478.510000] Loop: 836091 irq: 300586 ReSched: 716
> [  478.510000] Loop: 836092 irq: 300586 ReSched: 717
> [  478.510000] Loop: 836093 irq: 300586 ReSched: 718
> [  478.510000] Loop: 836094 irq: 300586 ReSched: 719
> [  478.510000] Loop: 836095 irq: 300586 ReSched: 720
> [  478.510000] Loop: 836096 irq: 300586 ReSched: 721
> [  478.510000] Loop: 836097 irq: 300586 ReSched: 722
> [  478.510000] Loop: 836098 irq: 300586 ReSched: 723
> [  478.510000] Loop: 836099 irq: 300586 ReSched: 724
> [  478.510000] Loop: 836100 irq: 300586 ReSched: 725
> [  478.510000] Loop: 836101 irq: 300586 ReSched: 726
> [  478.510000] Loop: 836102 irq: 300586 ReSched: 727
> [  478.510000] Loop: 836103 irq: 300586 ReSched: 728
> [  478.510000] Loop: 836104 irq: 300586 ReSched: 729
> [  478.510000] Loop: 836105 irq: 300586 ReSched: 730
> [  478.510000] Loop: 836106 irq: 300586 ReSched: 731
> [  478.510000] Loop: 836107 irq: 300586 ReSched: 732
> [  478.510000] Loop: 836108 irq: 300586 ReSched: 733
> [  478.510000] Loop: 836109 irq: 300586 ReSched: 734
> [  478.510000] Loop: 836110 irq: 300586 ReSched: 735
> [  478.510000] Loop: 836111 irq: 300586 ReSched: 736
> [  478.510000] Loop: 836112 irq: 300586 ReSched: 737
> [  478.510000] Loop: 836113 irq: 300586 ReSched: 738
> [  478.510000] Loop: 836114 irq: 300586 ReSched: 739
> [  478.510000] Loop: 836115 irq: 300586 ReSched: 740
> [  478.510000] Loop: 836116 irq: 300586 ReSched: 741
> [  478.510000] Loop: 836117 irq: 300586 ReSched: 742
> [  478.510000] Loop: 836118 irq: 300586 ReSched: 743
> [  478.510000] Loop: 836119 irq: 300586 ReSched: 744
> [  478.510000] Loop: 836120 irq: 300586 ReSched: 745
> [  478.510000] Loop: 836121 irq: 300586 ReSched: 746
> [  478.510000] Loop: 836122 irq: 300586 ReSched: 747
> [  478.510000] Loop: 836123 irq: 300586 ReSched: 748
> [  478.510000] Loop: 836124 irq: 300586 ReSched: 749
> [  478.510000] Loop: 836125 irq: 300586 ReSched: 750
> [  478.510000] Loop: 836126 irq: 300586 ReSched: 751
> [  478.510000] Loop: 836127 irq: 300586 ReSched: 752
> [  478.510000] Loop: 836128 irq: 300586 ReSched: 753
> [  478.510000] Loop: 836129 irq: 300586 ReSched: 754
> [  478.510000] Loop: 836130 irq: 300586 ReSched: 755
> [  478.510000] Loop: 836131 irq: 300586 ReSched: 756
> [  478.510000] Loop: 836132 irq: 300586 ReSched: 757
> [  478.510000] Loop: 836133 irq: 300586 ReSched: 758
> [  478.510000] Loop: 836134 irq: 300586 ReSched: 759
>
>
> Koen
>
> On 2016-06-13 20:17, Tim Harvey wrote:
>> On Mon, Jun 13, 2016 at 9:12 AM, Koen Vandeputte
>> <koen.vandeputte@ncentric.com> wrote:
>>> Hi All,
>>>
>>> There seems to be a bug in the function eth_poll() in this driver
>>>
>>> When the RX ring gets full once, the re-schedule is called forever, 
>>> even
>>> when the ring is empty afterwards.
>>>
>>>
>>>      if (!received) {
>>>          napi_complete(napi);
>>>          enable_irq(sw->rx_irq);
>>>          budget = 0;
>>>
>>>          /* if rx descriptors are full schedule another poll */
>>>          if (rx_ring->desc[(i-1) & (RX_DESCS-1)].cown)
>>>          {
>>>              eth_schedule_poll(sw);    <----  Gets called on each 
>>> function
>>> entry
>>>          }
>>>      }
>>>
>>>
>>> This causes SoftIRQ to fully load a core forever.
>>>
>>>
>>> I didn't fix it yet, but should I be the first, i'll supply a patch ..
>>>
>> Koen,
>>
>> We have seen this before, but admittedly don't understand why we enter
>> into the same condition on each subsequent call to eth_poll(). The
>> check is to catch the condition described as irq rot [1] and is to
>> catch the case where after processing the full budget, we are
>> immediately full again (a situation which is easily re-producible with
>> a flood-ping). If this occurs we will no longer get an rx interrupt
>> (because the descriptors are full) and our napi function will never
>> get called again (unless transmitting packets).
>>
>> What is your proposed patch?
>>
>> Tim
>>
>> 1 
>> .http://www.linuxfoundation.org/collaborate/workgroups/networking/napi#IRQ_race_a.k.a_rotting_packet
>

Comments

Pushpal Sidhu June 15, 2016, 9:23 p.m. UTC | #1
Hi,

On Wed, Jun 15, 2016 at 2:16 PM, Koen Vandeputte
<koen.vandeputte@ncentric.com> wrote:
>
> Hi Pushpal,
>
> In our usecase the repro rate was 100% within a few minutes on every single board we have.
>
> Apologies for the patch format.
> it's my first commit ever to an external project so I still have some learning to do..

No worries. In general, you'll want to respond inline as well in
threads like this instead of top-posting (and send emails in
plaintext). A good starting point for patch formatting can be found
here: https://www.kernel.org/doc/Documentation/SubmittingPatches

Every project is different, so you'll have to read up on their
specific method of accepting patches.

- Pushpal

> Thanks for taking the time to verify the patch.

Thanks for providing a fix :)

- Pushpal

> Kind regards,
>
> Koen
>
> Koen,
>
> I've looked into this problem, and it does appear your patch correctly solves this issue. In the past, we've had an incredibly difficult time reproducing this issue and found that either having a perfect RF scenario or absolutely low RF scenario would not make the problem occur. Instead, I had to get the MCS to jump around in an inconsistent fashion before I could reproduce the problem. I'm not sure if this is because the processor load is going up and down or something else.
>
> However, your patch format is kind of funny. I would recommend using 'git send-email' to send your patch instead of in-lining the patch. Though that is up to whoever is going to commit this patch (probably Felix).
>
> Tested-by: Pushpal Sidhu <psidhu@gateworks.com>
>
> - Pushpal
>
> On Wed, Jun 15, 2016 at 6:13 AM, Koen Vandeputte <koen.vandeputte@ncentric.com> wrote:
>>
>> Hi All,
>>
>> I found the bug and was able to fix it.
>>
>> Tested on 4 identical gateworks gw2388-4 (cns3xxx) boards.
>>
>> - iperf in dual direction between board 1s & 2    (without patch)
>> - iperf in dual direction between board 3s & 4    (with patch)
>>
>> Boards 1 and 2 go in spinlock after a few minutes on each try
>> Boards 3 and 4 have been running for hours without the error
>>
>> To avoid any hardware delta, I retested with the patch only present on board 1 & 2
>> Board 1 & 2 now worked perfectly while 3 & 4 showed the loop issue
>>
>>
>>
>> Patch:
>>
>>
>> From 9338467c36b27bf1099dbdabd7591470c2a371df Mon Sep 17 00:00:00 2001
>> From: Koen Vandeputte <koen.vandeputte@ncentric.com>
>> Date: Wed, 15 Jun 2016 14:59:49 +0200
>> Subject: cns3xxx: fix RX softIRQ loop
>>
>> Already reschedule when 1 or more frames came in.
>>
>> Checking for a full queue could produce a re-schedule loop as
>> the checked RX ring location could contain undefined values
>> depending on activity in previous loops.
>>
>> Signed-off-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
>>
>> diff --git a/target/linux/cns3xxx/files/drivers/net/ethernet/cavium/cns3xxx_eth.c b/target/linux/cns3xxx/files/drivers/net/ethernet/cavium/cns3xxx_eth.c
>> index 2f232c1..51b0187 100644
>> --- a/target/linux/cns3xxx/files/drivers/net/ethernet/cavium/cns3xxx_eth.c
>> +++ b/target/linux/cns3xxx/files/drivers/net/ethernet/cavium/cns3xxx_eth.c
>> @@ -717,8 +717,8 @@ static int eth_poll(struct napi_struct *napi, int budget)
>>          enable_irq(sw->rx_irq);
>>          budget = 0;
>>
>> -        /* if rx descriptors are full schedule another poll */
>> -        if (rx_ring->desc[(i-1) & (RX_DESCS-1)].cown)
>> +        /* If 1 or more frames came in during IRQ enable, re-schedule */
>> +        if (rx_ring->desc[i].cown)
>>              eth_schedule_poll(sw);
>>      }
>>
>>
>>
>> Kind regards,
>>
>> Koen
>>
>>
>> On 2016-06-14 16:34, Koen Vandeputte wrote:
>>>
>>> Hi Tim,
>>>
>>> I fully understand the reason for the check.
>>>
>>> However,
>>> I think the bug has nothing to do with the rotting packet issue.
>>>
>>>
>>> The bug is actually:
>>> - eth_schedule_poll() gets called as it's detected that the ring is full after IRQ_enable. (cown flag is set in the previous slot of the ring)
>>> --> During this extra loop,  _0_ packets are received/processed! The ring still seems to be full, and another reschedule takes place .. and so on ..
>>>
>>>
>>> Iv'e added 3 debug counters in the code:    (see below after the patch)
>>> - Loop --> Top of eth_poll()
>>> - IRQ --> Top of eth_rx_irq()
>>> - Resched --> Inside the reschedule check
>>>
>>>
>>>
>>>
>>> Patch:
>>>
>>> --- a/drivers/net/ethernet/cavium/cns3xxx_eth.c
>>> +++ b/drivers/net/ethernet/cavium/cns3xxx_eth.c
>>> @@ -308,6 +308,8 @@ static int ports_open;
>>>  static struct port *switch_port_tab[4];
>>>  struct net_device *napi_dev;
>>>
>>> +static int CntIRQ;
>>> +
>>>  static int cns3xxx_mdio_cmd(struct mii_bus *bus, int phy_id, int location,
>>>                 int write, u16 cmd)
>>>  {
>>> @@ -437,7 +439,10 @@ static void cns3xxx_adjust_link(struct n
>>>  static void eth_schedule_poll(struct sw *sw)
>>>  {
>>>      if (unlikely(!napi_schedule_prep(&sw->napi)))
>>> +    {
>>> +        //printk(KERN_ERR "sched err\n");
>>>          return;
>>> +    }
>>>
>>>      disable_irq_nosync(sw->rx_irq);
>>>      __napi_schedule(&sw->napi);
>>> @@ -447,6 +452,7 @@ irqreturn_t eth_rx_irq(int irq, void *pd
>>>  {
>>>      struct net_device *dev = pdev;
>>>      struct sw *sw = netdev_priv(dev);
>>> +    CntIRQ++;
>>>      eth_schedule_poll(sw);
>>>      return (IRQ_HANDLED);
>>>  }
>>> @@ -617,6 +623,7 @@ static void eth_complete_tx(struct sw *s
>>>
>>>  static int eth_poll(struct napi_struct *napi, int budget)
>>>  {
>>> +    static int Loop = 0;
>>>      struct sw *sw = container_of(napi, struct sw, napi);
>>>      struct _rx_ring *rx_ring = &sw->rx_ring;
>>>      int received = 0;
>>> @@ -712,6 +719,9 @@ static int eth_poll(struct napi_struct *
>>>      }
>>>
>>>      rx_ring->cur_index = i;
>>> +
>>> +    Loop++;
>>> +
>>>      if (!received) {
>>>          napi_complete(napi);
>>>          enable_irq(sw->rx_irq);
>>> @@ -719,7 +729,12 @@ static int eth_poll(struct napi_struct *
>>>
>>>          /* if rx descriptors are full schedule another poll */
>>>          if (rx_ring->desc[(i-1) & (RX_DESCS-1)].cown)
>>> +        {
>>> +            static int Sched = 0;
>>> +
>>> +            printk(KERN_ERR "Loop: %d irq: %d ReSched: %d\n", Loop, CntIRQ, ++Sched);
>>>              eth_schedule_poll(sw);
>>> +        }
>>>      }
>>>
>>>      spin_lock_bh(&tx_lock);
>>>
>>>
>>>
>>> When the bug triggers, you can see below that consecutive loops in eth_poll(), triggered by a reschedule, doesn't process any RX packets at all ..
>>> (As the ReSched counter increases linear with the loop counter and ReSched can only be incremented when "!received")
>>>
>>>
>>> Also
>>> Checking /proc/interrupts shows that gig_switch IRQ counter goes berserk (+10k IRQ/s)
>>>
>>>
>>> [  477.440000] Loop: 835307 irq: 300583 ReSched: 1
>>> [  477.440000] Loop: 835308 irq: 300583 ReSched: 2
>>> [  477.440000] Loop: 835310 irq: 300583 ReSched: 3
>>> [  477.440000] Loop: 835311 irq: 300583 ReSched: 4
>>> [  477.440000] Loop: 835312 irq: 300583 ReSched: 5
>>> [  477.440000] Loop: 835313 irq: 300583 ReSched: 6
>>> [  477.440000] Loop: 835314 irq: 300583 ReSched: 7
>>> [  477.440000] Loop: 835315 irq: 300583 ReSched: 8
>>> [  477.440000] Loop: 835316 irq: 300583 ReSched: 9
>>> [  477.440000] Loop: 835317 irq: 300583 ReSched: 10
>>> [  477.440000] Loop: 835318 irq: 300583 ReSched: 11
>>> [  477.440000] Loop: 835319 irq: 300583 ReSched: 12
>>> [  477.440000] Loop: 835320 irq: 300583 ReSched: 13
>>> [  477.440000] Loop: 835321 irq: 300583 ReSched: 14
>>> [  477.440000] Loop: 835322 irq: 300583 ReSched: 15
>>> [  477.440000] Loop: 835323 irq: 300583 ReSched: 16
>>> [  477.440000] Loop: 835324 irq: 300583 ReSched: 17
>>> [  477.440000] Loop: 835325 irq: 300583 ReSched: 18
>>> [  477.440000] Loop: 835326 irq: 300583 ReSched: 19
>>> [  477.440000] Loop: 835327 irq: 300583 ReSched: 20
>>> [  477.440000] Loop: 835328 irq: 300583 ReSched: 21
>>> [  477.440000] Loop: 835329 irq: 300583 ReSched: 22
>>> [  477.440000] Loop: 835330 irq: 300583 ReSched: 23
>>> [  477.440000] Loop: 835331 irq: 300583 ReSched: 24
>>> [  477.440000] Loop: 835332 irq: 300583 ReSched: 25
>>> [  477.440000] Loop: 835333 irq: 300583 ReSched: 26
>>> [  477.440000] Loop: 835334 irq: 300583 ReSched: 27
>>> [  477.440000] Loop: 835335 irq: 300583 ReSched: 28
>>> [  477.440000] Loop: 835336 irq: 300583 ReSched: 29
>>> [  477.440000] Loop: 835337 irq: 300583 ReSched: 30
>>> [  477.440000] Loop: 835338 irq: 300583 ReSched: 31
>>> [  477.440000] Loop: 835339 irq: 300583 ReSched: 32
>>> [  477.440000] Loop: 835340 irq: 300583 ReSched: 33
>>> [  477.440000] Loop: 835341 irq: 300583 ReSched: 34
>>> [  477.440000] Loop: 835342 irq: 300583 ReSched: 35
>>> [  477.440000] Loop: 835343 irq: 300583 ReSched: 36
>>> [  477.440000] Loop: 835344 irq: 300583 ReSched: 37
>>> [  477.440000] Loop: 835345 irq: 300583 ReSched: 38
>>> [  477.440000] Loop: 835346 irq: 300583 ReSched: 39
>>> [  477.440000] Loop: 835347 irq: 300583 ReSched: 40
>>> [  477.440000] Loop: 835348 irq: 300583 ReSched: 41
>>> [  477.440000] Loop: 835349 irq: 300583 ReSched: 42
>>> [  477.440000] Loop: 835350 irq: 300583 ReSched: 43
>>> [  477.440000] Loop: 835351 irq: 300583 ReSched: 44
>>> [  477.440000] Loop: 835352 irq: 300583 ReSched: 45
>>> [  477.440000] Loop: 835353 irq: 300583 ReSched: 46
>>> [  477.440000] Loop: 835354 irq: 300583 ReSched: 47
>>> [  477.440000] Loop: 835355 irq: 300583 ReSched: 48
>>> [  477.440000] Loop: 835356 irq: 300583 ReSched: 49
>>> [  477.440000] Loop: 835357 irq: 300583 ReSched: 50
>>> [  477.440000] Loop: 835358 irq: 300583 ReSched: 51
>>> [  477.440000] Loop: 835359 irq: 300583 ReSched: 52
>>> [  477.440000] Loop: 835360 irq: 300583 ReSched: 53
>>> [  477.440000] Loop: 835361 irq: 300583 ReSched: 54
>>> [  477.440000] Loop: 835362 irq: 300583 ReSched: 55
>>> [  477.440000] Loop: 835363 irq: 300583 ReSched: 56
>>> [  477.440000] Loop: 835364 irq: 300583 ReSched: 57
>>> [  477.440000] Loop: 835365 irq: 300583 ReSched: 58
>>> [  477.440000] Loop: 835366 irq: 300583 ReSched: 59
>>> [  477.440000] Loop: 835367 irq: 300583 ReSched: 60
>>> [  477.440000] Loop: 835368 irq: 300583 ReSched: 61
>>> [  477.440000] Loop: 835369 irq: 300583 ReSched: 62
>>> [  477.440000] Loop: 835370 irq: 300583 ReSched: 63
>>> [  477.440000] Loop: 835371 irq: 300583 ReSched: 64
>>> [  477.440000] Loop: 835372 irq: 300583 ReSched: 65
>>> [  477.440000] Loop: 835373 irq: 300583 ReSched: 66
>>> [  477.440000] Loop: 835374 irq: 300583 ReSched: 67
>>> [  477.440000] Loop: 835375 irq: 300583 ReSched: 68
>>> [  477.440000] Loop: 835376 irq: 300583 ReSched: 69
>>> [  477.440000] Loop: 835377 irq: 300583 ReSched: 70
>>> [  477.440000] Loop: 835378 irq: 300583 ReSched: 71
>>> [  477.440000] Loop: 835379 irq: 300583 ReSched: 72
>>> [  477.440000] Loop: 835380 irq: 300583 ReSched: 73
>>> [  477.440000] Loop: 835381 irq: 300583 ReSched: 74
>>> [  477.440000] Loop: 835382 irq: 300583 ReSched: 75
>>> [  477.440000] Loop: 835383 irq: 300583 ReSched: 76
>>> [  477.440000] Loop: 835384 irq: 300583 ReSched: 77
>>> [  477.440000] Loop: 835385 irq: 300583 ReSched: 78
>>> [  477.440000] Loop: 835386 irq: 300583 ReSched: 79
>>> [  477.440000] Loop: 835387 irq: 300583 ReSched: 80
>>> [  477.440000] Loop: 835388 irq: 300583 ReSched: 81
>>> [  477.440000] Loop: 835389 irq: 300583 ReSched: 82
>>> [  477.440000] Loop: 835390 irq: 300583 ReSched: 83
>>> [  477.440000] Loop: 835391 irq: 300583 ReSched: 84
>>> [  477.440000] Loop: 835392 irq: 300583 ReSched: 85
>>> [  477.440000] Loop: 835393 irq: 300583 ReSched: 86
>>> [  477.440000] Loop: 835394 irq: 300583 ReSched: 87
>>> [  477.440000] Loop: 835396 irq: 300583 ReSched: 88
>>> [  477.440000] Loop: 835397 irq: 300583 ReSched: 89
>>> [  477.440000] Loop: 835398 irq: 300583 ReSched: 90
>>> [  477.440000] Loop: 835399 irq: 300583 ReSched: 91
>>> [  477.440000] Loop: 835400 irq: 300583 ReSched: 92
>>> [  477.440000] Loop: 835403 irq: 300583 ReSched: 93
>>> [  477.440000] Loop: 835406 irq: 300583 ReSched: 94
>>> [  477.440000] Loop: 835407 irq: 300583 ReSched: 95
>>> [  477.440000] Loop: 835408 irq: 300583 ReSched: 96
>>> [  477.440000] Loop: 835409 irq: 300583 ReSched: 97
>>> [  477.440000] Loop: 835410 irq: 300583 ReSched: 98
>>> [  477.440000] Loop: 835411 irq: 300583 ReSched: 99
>>> [  477.440000] Loop: 835412 irq: 300583 ReSched: 100
>>> [  477.440000] Loop: 835413 irq: 300583 ReSched: 101
>>> [  477.440000] Loop: 835414 irq: 300583 ReSched: 102
>>> [  477.440000] Loop: 835415 irq: 300583 ReSched: 103
>>> [  477.440000] Loop: 835416 irq: 300583 ReSched: 104
>>> [  477.440000] Loop: 835417 irq: 300583 ReSched: 105
>>> [  477.440000] Loop: 835418 irq: 300583 ReSched: 106
>>> [  477.440000] Loop: 835419 irq: 300583 ReSched: 107
>>> [  477.440000] Loop: 835420 irq: 300583 ReSched: 108
>>> [  477.440000] Loop: 835421 irq: 300583 ReSched: 109
>>> [  477.440000] Loop: 835422 irq: 300583 ReSched: 110
>>> [  477.440000] Loop: 835423 irq: 300583 ReSched: 111
>>> [  477.440000] Loop: 835424 irq: 300583 ReSched: 112
>>> [  477.440000] Loop: 835425 irq: 300583 ReSched: 113
>>> [  477.440000] Loop: 835426 irq: 300583 ReSched: 114
>>> [  477.440000] Loop: 835427 irq: 300583 ReSched: 115
>>> [  477.440000] Loop: 835428 irq: 300583 ReSched: 116
>>> [  477.440000] Loop: 835429 irq: 300583 ReSched: 117
>>> [  477.440000] Loop: 835430 irq: 300583 ReSched: 118
>>> [  477.440000] Loop: 835431 irq: 300583 ReSched: 119
>>> [  477.440000] Loop: 835432 irq: 300583 ReSched: 120
>>> [  477.440000] Loop: 835433 irq: 300583 ReSched: 121
>>> [  477.440000] Loop: 835434 irq: 300583 ReSched: 122
>>> [  477.440000] Loop: 835435 irq: 300583 ReSched: 123
>>> [  477.440000] Loop: 835436 irq: 300583 ReSched: 124
>>> [  477.440000] Loop: 835437 irq: 300583 ReSched: 125
>>> [  477.440000] Loop: 835438 irq: 300583 ReSched: 126
>>> [  477.440000] Loop: 835439 irq: 300583 ReSched: 127
>>> [  477.440000] Loop: 835440 irq: 300583 ReSched: 128
>>> [  477.440000] Loop: 835441 irq: 300583 ReSched: 129
>>> [  477.440000] Loop: 835442 irq: 300583 ReSched: 130
>>> [  477.440000] Loop: 835443 irq: 300583 ReSched: 131
>>> [  477.440000] Loop: 835444 irq: 300583 ReSched: 132
>>> [  477.440000] Loop: 835445 irq: 300583 ReSched: 133
>>> [  477.440000] Loop: 835446 irq: 300583 ReSched: 134
>>> [  477.440000] Loop: 835447 irq: 300583 ReSched: 135
>>> [  477.440000] Loop: 835448 irq: 300583 ReSched: 136
>>> [  477.440000] Loop: 835449 irq: 300583 ReSched: 137
>>> [  477.440000] Loop: 835450 irq: 300583 ReSched: 138
>>> [  477.450000] Loop: 835451 irq: 300583 ReSched: 139
>>> [  477.450000] Loop: 835452 irq: 300583 ReSched: 140
>>> [  477.450000] Loop: 835453 irq: 300583 ReSched: 141
>>> [  477.450000] Loop: 835454 irq: 300583 ReSched: 142
>>> [  477.450000] Loop: 835455 irq: 300583 ReSched: 143
>>> [  477.450000] Loop: 835456 irq: 300583 ReSched: 144
>>> [  477.450000] Loop: 835457 irq: 300583 ReSched: 145
>>> [  477.450000] Loop: 835458 irq: 300583 ReSched: 146
>>> [  477.450000] Loop: 835459 irq: 300583 ReSched: 147
>>> [  477.450000] Loop: 835460 irq: 300583 ReSched: 148
>>> [  477.450000] Loop: 835461 irq: 300583 ReSched: 149
>>> [  477.450000] Loop: 835462 irq: 300583 ReSched: 150
>>> [  477.450000] Loop: 835463 irq: 300583 ReSched: 151
>>> [  477.450000] Loop: 835464 irq: 300583 ReSched: 152
>>> [  477.450000] Loop: 835465 irq: 300583 ReSched: 153
>>> [  477.450000] Loop: 835466 irq: 300583 ReSched: 154
>>> [  477.450000] Loop: 835467 irq: 300583 ReSched: 155
>>> [  477.450000] Loop: 835468 irq: 300583 ReSched: 156
>>> [  477.450000] Loop: 835469 irq: 300583 ReSched: 157
>>> [  477.450000] Loop: 835470 irq: 300583 ReSched: 158
>>> [  477.450000] Loop: 835472 irq: 300583 ReSched: 159
>>> [  477.450000] Loop: 835473 irq: 300583 ReSched: 160
>>> [  477.450000] Loop: 835474 irq: 300583 ReSched: 161
>>> [  477.450000] Loop: 835475 irq: 300583 ReSched: 162
>>> [  477.450000] Loop: 835476 irq: 300583 ReSched: 163
>>> [  477.450000] Loop: 835477 irq: 300583 ReSched: 164
>>> [  477.450000] Loop: 835478 irq: 300583 ReSched: 165
>>> [  477.450000] Loop: 835479 irq: 300583 ReSched: 166
>>> [  477.450000] Loop: 835480 irq: 300583 ReSched: 167
>>> [  477.450000] Loop: 835481 irq: 300583 ReSched: 168
>>> [  477.450000] Loop: 835482 irq: 300583 ReSched: 169
>>> [  477.450000] Loop: 835483 irq: 300583 ReSched: 170
>>> [  477.450000] Loop: 835484 irq: 300583 ReSched: 171
>>> [  477.450000] Loop: 835485 irq: 300583 ReSched: 172
>>> [  477.450000] Loop: 835486 irq: 300583 ReSched: 173
>>> [  477.450000] Loop: 835487 irq: 300583 ReSched: 174
>>> [  477.450000] Loop: 835488 irq: 300583 ReSched: 175
>>> [  477.450000] Loop: 835489 irq: 300583 ReSched: 176
>>> [  477.450000] Loop: 835490 irq: 300583 ReSched: 177
>>> [  477.450000] Loop: 835491 irq: 300583 ReSched: 178
>>> [  477.450000] Loop: 835492 irq: 300583 ReSched: 179
>>> [  477.450000] Loop: 835493 irq: 300583 ReSched: 180
>>> [  477.450000] Loop: 835494 irq: 300583 ReSched: 181
>>> [  477.450000] Loop: 835495 irq: 300583 ReSched: 182
>>> [  477.450000] Loop: 835496 irq: 300583 ReSched: 183
>>> [  477.450000] Loop: 835497 irq: 300583 ReSched: 184
>>> [  477.450000] Loop: 835498 irq: 300583 ReSched: 185
>>> [  477.450000] Loop: 835499 irq: 300583 ReSched: 186
>>> [  477.450000] Loop: 835500 irq: 300583 ReSched: 187
>>> [  477.450000] Loop: 835501 irq: 300583 ReSched: 188
>>> [  477.450000] Loop: 835502 irq: 300583 ReSched: 189
>>> [  477.450000] Loop: 835503 irq: 300583 ReSched: 190
>>> [  477.450000] Loop: 835504 irq: 300583 ReSched: 191
>>> [  477.450000] Loop: 835505 irq: 300583 ReSched: 192
>>> [  477.450000] Loop: 835506 irq: 300583 ReSched: 193
>>> [  477.450000] Loop: 835507 irq: 300583 ReSched: 194
>>> [  477.450000] Loop: 835508 irq: 300583 ReSched: 195
>>> [  477.450000] Loop: 835509 irq: 300583 ReSched: 196
>>> [  477.450000] Loop: 835510 irq: 300583 ReSched: 197
>>> [  477.450000] Loop: 835511 irq: 300583 ReSched: 198
>>> [  477.450000] Loop: 835512 irq: 300583 ReSched: 199
>>> [  477.450000] Loop: 835513 irq: 300583 ReSched: 200
>>> [  477.450000] Loop: 835514 irq: 300583 ReSched: 201
>>> [  477.450000] Loop: 835515 irq: 300583 ReSched: 202
>>> [  477.450000] Loop: 835516 irq: 300583 ReSched: 203
>>> [  477.450000] Loop: 835517 irq: 300583 ReSched: 204
>>> [  477.450000] Loop: 835518 irq: 300583 ReSched: 205
>>> [  477.450000] Loop: 835519 irq: 300583 ReSched: 206
>>> [  477.450000] Loop: 835520 irq: 300583 ReSched: 207
>>> [  477.450000] Loop: 835521 irq: 300583 ReSched: 208
>>> [  477.450000] Loop: 835522 irq: 300583 ReSched: 209
>>> [  477.450000] Loop: 835523 irq: 300583 ReSched: 210
>>> [  477.450000] Loop: 835524 irq: 300583 ReSched: 211
>>> [  477.450000] Loop: 835525 irq: 300583 ReSched: 212
>>> [  477.450000] Loop: 835526 irq: 300583 ReSched: 213
>>> [  477.450000] Loop: 835527 irq: 300583 ReSched: 214
>>> [  477.450000] Loop: 835528 irq: 300583 ReSched: 215
>>> [  477.450000] Loop: 835529 irq: 300583 ReSched: 216
>>> [  477.450000] Loop: 835530 irq: 300583 ReSched: 217
>>> [  478.450000] Loop: 835534 irq: 300584 ReSched: 218
>>> [  478.460000] Loop: 835536 irq: 300585 ReSched: 219
>>> [  478.470000] Loop: 835538 irq: 300586 ReSched: 220
>>> [  478.470000] Loop: 835539 irq: 300586 ReSched: 221
>>> [  478.500000] Loop: 835983 irq: 300586 ReSched: 622
>>> ...
>>> [  478.510000] Loop: 836072 irq: 300586 ReSched: 697
>>> [  478.510000] Loop: 836073 irq: 300586 ReSched: 698
>>> [  478.510000] Loop: 836074 irq: 300586 ReSched: 699
>>> [  478.510000] Loop: 836075 irq: 300586 ReSched: 700
>>> [  478.510000] Loop: 836076 irq: 300586 ReSched: 701
>>> [  478.510000] Loop: 836077 irq: 300586 ReSched: 702
>>> [  478.510000] Loop: 836078 irq: 300586 ReSched: 703
>>> [  478.510000] Loop: 836079 irq: 300586 ReSched: 704
>>> [  478.510000] Loop: 836080 irq: 300586 ReSched: 705
>>> [  478.510000] Loop: 836081 irq: 300586 ReSched: 706
>>> [  478.510000] Loop: 836082 irq: 300586 ReSched: 707
>>> [  478.510000] Loop: 836083 irq: 300586 ReSched: 708
>>> [  478.510000] Loop: 836084 irq: 300586 ReSched: 709
>>> [  478.510000] Loop: 836085 irq: 300586 ReSched: 710
>>> [  478.510000] Loop: 836086 irq: 300586 ReSched: 711
>>> [  478.510000] Loop: 836087 irq: 300586 ReSched: 712
>>> [  478.510000] Loop: 836088 irq: 300586 ReSched: 713
>>> [  478.510000] Loop: 836089 irq: 300586 ReSched: 714
>>> [  478.510000] Loop: 836090 irq: 300586 ReSched: 715
>>> [  478.510000] Loop: 836091 irq: 300586 ReSched: 716
>>> [  478.510000] Loop: 836092 irq: 300586 ReSched: 717
>>> [  478.510000] Loop: 836093 irq: 300586 ReSched: 718
>>> [  478.510000] Loop: 836094 irq: 300586 ReSched: 719
>>> [  478.510000] Loop: 836095 irq: 300586 ReSched: 720
>>> [  478.510000] Loop: 836096 irq: 300586 ReSched: 721
>>> [  478.510000] Loop: 836097 irq: 300586 ReSched: 722
>>> [  478.510000] Loop: 836098 irq: 300586 ReSched: 723
>>> [  478.510000] Loop: 836099 irq: 300586 ReSched: 724
>>> [  478.510000] Loop: 836100 irq: 300586 ReSched: 725
>>> [  478.510000] Loop: 836101 irq: 300586 ReSched: 726
>>> [  478.510000] Loop: 836102 irq: 300586 ReSched: 727
>>> [  478.510000] Loop: 836103 irq: 300586 ReSched: 728
>>> [  478.510000] Loop: 836104 irq: 300586 ReSched: 729
>>> [  478.510000] Loop: 836105 irq: 300586 ReSched: 730
>>> [  478.510000] Loop: 836106 irq: 300586 ReSched: 731
>>> [  478.510000] Loop: 836107 irq: 300586 ReSched: 732
>>> [  478.510000] Loop: 836108 irq: 300586 ReSched: 733
>>> [  478.510000] Loop: 836109 irq: 300586 ReSched: 734
>>> [  478.510000] Loop: 836110 irq: 300586 ReSched: 735
>>> [  478.510000] Loop: 836111 irq: 300586 ReSched: 736
>>> [  478.510000] Loop: 836112 irq: 300586 ReSched: 737
>>> [  478.510000] Loop: 836113 irq: 300586 ReSched: 738
>>> [  478.510000] Loop: 836114 irq: 300586 ReSched: 739
>>> [  478.510000] Loop: 836115 irq: 300586 ReSched: 740
>>> [  478.510000] Loop: 836116 irq: 300586 ReSched: 741
>>> [  478.510000] Loop: 836117 irq: 300586 ReSched: 742
>>> [  478.510000] Loop: 836118 irq: 300586 ReSched: 743
>>> [  478.510000] Loop: 836119 irq: 300586 ReSched: 744
>>> [  478.510000] Loop: 836120 irq: 300586 ReSched: 745
>>> [  478.510000] Loop: 836121 irq: 300586 ReSched: 746
>>> [  478.510000] Loop: 836122 irq: 300586 ReSched: 747
>>> [  478.510000] Loop: 836123 irq: 300586 ReSched: 748
>>> [  478.510000] Loop: 836124 irq: 300586 ReSched: 749
>>> [  478.510000] Loop: 836125 irq: 300586 ReSched: 750
>>> [  478.510000] Loop: 836126 irq: 300586 ReSched: 751
>>> [  478.510000] Loop: 836127 irq: 300586 ReSched: 752
>>> [  478.510000] Loop: 836128 irq: 300586 ReSched: 753
>>> [  478.510000] Loop: 836129 irq: 300586 ReSched: 754
>>> [  478.510000] Loop: 836130 irq: 300586 ReSched: 755
>>> [  478.510000] Loop: 836131 irq: 300586 ReSched: 756
>>> [  478.510000] Loop: 836132 irq: 300586 ReSched: 757
>>> [  478.510000] Loop: 836133 irq: 300586 ReSched: 758
>>> [  478.510000] Loop: 836134 irq: 300586 ReSched: 759
>>>
>>>
>>> Koen
>>>
>>> On 2016-06-13 20:17, Tim Harvey wrote:
>>>>
>>>> On Mon, Jun 13, 2016 at 9:12 AM, Koen Vandeputte
>>>> <koen.vandeputte@ncentric.com> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> There seems to be a bug in the function eth_poll() in this driver
>>>>>
>>>>> When the RX ring gets full once, the re-schedule is called forever, even
>>>>> when the ring is empty afterwards.
>>>>>
>>>>>
>>>>>      if (!received) {
>>>>>          napi_complete(napi);
>>>>>          enable_irq(sw->rx_irq);
>>>>>          budget = 0;
>>>>>
>>>>>          /* if rx descriptors are full schedule another poll */
>>>>>          if (rx_ring->desc[(i-1) & (RX_DESCS-1)].cown)
>>>>>          {
>>>>>              eth_schedule_poll(sw);    <----  Gets called on each function
>>>>> entry
>>>>>          }
>>>>>      }
>>>>>
>>>>>
>>>>> This causes SoftIRQ to fully load a core forever.
>>>>>
>>>>>
>>>>> I didn't fix it yet, but should I be the first, i'll supply a patch ..
>>>>>
>>>> Koen,
>>>>
>>>> We have seen this before, but admittedly don't understand why we enter
>>>> into the same condition on each subsequent call to eth_poll(). The
>>>> check is to catch the condition described as irq rot [1] and is to
>>>> catch the case where after processing the full budget, we are
>>>> immediately full again (a situation which is easily re-producible with
>>>> a flood-ping). If this occurs we will no longer get an rx interrupt
>>>> (because the descriptors are full) and our napi function will never
>>>> get called again (unless transmitting packets).
>>>>
>>>> What is your proposed patch?
>>>>
>>>> Tim
>>>>
>>>> 1 .http://www.linuxfoundation.org/collaborate/workgroups/networking/napi#IRQ_race_a.k.a_rotting_packet
>>>
>>>
>>
>>
>> _______________________________________________
>> Lede-dev mailing list
>> Lede-dev@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/lede-dev
>
>
diff mbox

Patch

diff --git 
a/target/linux/cns3xxx/files/drivers/net/ethernet/cavium/cns3xxx_eth.c 
b/target/linux/cns3xxx/files/drivers/net/ethernet/cavium/cns3xxx_eth.c
index 2f232c1..51b0187 100644
--- a/target/linux/cns3xxx/files/drivers/net/ethernet/cavium/cns3xxx_eth.c
+++ b/target/linux/cns3xxx/files/drivers/net/ethernet/cavium/cns3xxx_eth.c
@@ -717,8 +717,8 @@  static int eth_poll(struct napi_struct *napi, int 
budget)
          enable_irq(sw->rx_irq);
          budget = 0;

-        /* if rx descriptors are full schedule another poll */
-        if (rx_ring->desc[(i-1) & (RX_DESCS-1)].cown)
+        /* If 1 or more frames came in during IRQ enable, re-schedule */
+        if (rx_ring->desc[i].cown)
              eth_schedule_poll(sw);
      }