diff mbox series

[net] igb: fix netpoll exit with traffic

Message ID 20210507023001.2621951-1-jesse.brandeburg@intel.com
State Changes Requested
Delegated to: Anthony Nguyen
Headers show
Series [net] igb: fix netpoll exit with traffic | expand

Commit Message

Jesse Brandeburg May 7, 2021, 2:30 a.m. UTC
Oleksandr brought a bug report where netpoll causes trace messages in
the log on igb.

[22038.710800] ------------[ cut here ]------------
[22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
[22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0

After some discussion and debug from the list, it was deemed that the
right thing to do is initialize the clean_complete variable to false
when the "netpoll mode" of passing a zero budget is used.

This logic should be sane and not risky because the only time budget
should be zero on entry is netpoll.  Change includes a small refactor
of local variable assignments to clean up the look.

Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and improve performance")
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
---

Compile tested ONLY, but functionally it should be exactly the same for
all cases except when budget is zero on entry, which will hopefully fix
the bug.

Sending this through intel-wired-lan with Alex's ACK.
---
 drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Oleksandr Natalenko May 7, 2021, 6:32 a.m. UTC | #1
Hello.

On Thu, May 06, 2021 at 07:30:01PM -0700, Jesse Brandeburg wrote:
> Oleksandr brought a bug report where netpoll causes trace messages in
> the log on igb.
> 
> [22038.710800] ------------[ cut here ]------------
> [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0
> 
> After some discussion and debug from the list, it was deemed that the
> right thing to do is initialize the clean_complete variable to false
> when the "netpoll mode" of passing a zero budget is used.
> 
> This logic should be sane and not risky because the only time budget
> should be zero on entry is netpoll.  Change includes a small refactor
> of local variable assignments to clean up the look.
> 
> Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and improve performance")
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>

Thanks for the patch, I'll test it ASAP and ask other persons from the
bugzilla to provide a feedback if possible.

Also, you may want to include a link to the kernel bugzilla into the
commit message:

Link: https://bugzilla.kernel.org/show_bug.cgi?id=212573

> ---
> 
> Compile tested ONLY, but functionally it should be exactly the same for
> all cases except when budget is zero on entry, which will hopefully fix
> the bug.
> 
> Sending this through intel-wired-lan with Alex's ACK.
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 0cd37ad81b4e..b0a9bed14071 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -7991,12 +7991,16 @@ static void igb_ring_irq_enable(struct igb_q_vector *q_vector)
>   **/
>  static int igb_poll(struct napi_struct *napi, int budget)
>  {
> -	struct igb_q_vector *q_vector = container_of(napi,
> -						     struct igb_q_vector,
> -						     napi);
> -	bool clean_complete = true;
> +	struct igb_q_vector *q_vector;
> +	bool clean_complete;
>  	int work_done = 0;
>  
> +	/* if budget is zero, we have a special case for netconsole, so
> +	 * make sure to set clean_complete to false in that case.
> +	 */
> +	clean_complete = !!budget;
> +
> +	q_vector = container_of(napi, struct igb_q_vector, napi);
>  #ifdef CONFIG_IGB_DCA
>  	if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
>  		igb_update_dca(q_vector);
> -- 
> 2.30.2
>
Oleksandr Natalenko May 8, 2021, 10:26 a.m. UTC | #2
Hello.

On Thu, May 06, 2021 at 07:30:01PM -0700, Jesse Brandeburg wrote:
> Oleksandr brought a bug report where netpoll causes trace messages in
> the log on igb.
> 
> [22038.710800] ------------[ cut here ]------------
> [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0
> 
> After some discussion and debug from the list, it was deemed that the
> right thing to do is initialize the clean_complete variable to false
> when the "netpoll mode" of passing a zero budget is used.
> 
> This logic should be sane and not risky because the only time budget
> should be zero on entry is netpoll.  Change includes a small refactor
> of local variable assignments to clean up the look.
> 
> Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and improve performance")
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> ---
> 
> Compile tested ONLY, but functionally it should be exactly the same for
> all cases except when budget is zero on entry, which will hopefully fix
> the bug.
> 
> Sending this through intel-wired-lan with Alex's ACK.
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 0cd37ad81b4e..b0a9bed14071 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -7991,12 +7991,16 @@ static void igb_ring_irq_enable(struct igb_q_vector *q_vector)
>   **/
>  static int igb_poll(struct napi_struct *napi, int budget)
>  {
> -	struct igb_q_vector *q_vector = container_of(napi,
> -						     struct igb_q_vector,
> -						     napi);
> -	bool clean_complete = true;
> +	struct igb_q_vector *q_vector;
> +	bool clean_complete;
>  	int work_done = 0;
>  
> +	/* if budget is zero, we have a special case for netconsole, so
> +	 * make sure to set clean_complete to false in that case.
> +	 */
> +	clean_complete = !!budget;
> +
> +	q_vector = container_of(napi, struct igb_q_vector, napi);
>  #ifdef CONFIG_IGB_DCA
>  	if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
>  		igb_update_dca(q_vector);
> -- 
> 2.30.2
> 

This didn't fix the issue neither for me, nor for another person from
the kernel bugzilla [1].

The same printout still happens:

```
May 07 20:26:55 spock kernel: igb_poll+0x0/0x1440 [igb] exceeded budget in poll
May 07 20:26:55 spock kernel: WARNING: CPU: 13 PID: 12285 at net/core/netpoll.c:154 netpoll_poll_dev+0x18a/0x1a0
…
May 07 20:26:55 spock kernel: Call Trace:
May 07 20:26:55 spock kernel:  netpoll_send_skb+0x1a0/0x260
May 07 20:26:55 spock kernel:  write_msg+0xe5/0x100 [netconsole]
May 07 20:26:55 spock kernel:  console_unlock+0x42f/0x720
May 07 20:26:55 spock kernel:  suspend_devices_and_enter+0x2ac/0x7f0
May 07 20:26:55 spock kernel:  pm_suspend.cold+0x321/0x36c
May 07 20:26:55 spock kernel:  state_store+0xa6/0x140
May 07 20:26:55 spock kernel:  kernfs_fop_write_iter+0x124/0x1b0
May 07 20:26:55 spock kernel:  new_sync_write+0x16a/0x200
May 07 20:26:55 spock kernel:  vfs_write+0x223/0x2e0
May 07 20:26:55 spock kernel:  __x64_sys_write+0x6d/0xf0
```

Probably, your patch is still fine, but another idea is desperately
needed :).

Thanks.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=212573
Oleksandr Natalenko July 9, 2021, 2:05 p.m. UTC | #3
On sobota 8. května 2021 12:26:36 CEST Oleksandr Natalenko wrote:
> Hello.
> 
> On Thu, May 06, 2021 at 07:30:01PM -0700, Jesse Brandeburg wrote:
> > Oleksandr brought a bug report where netpoll causes trace messages in
> > the log on igb.
> > 
> > [22038.710800] ------------[ cut here ]------------
> > [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> > [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155
> > netpoll_poll_dev+0x18a/0x1a0
> > 
> > After some discussion and debug from the list, it was deemed that the
> > right thing to do is initialize the clean_complete variable to false
> > when the "netpoll mode" of passing a zero budget is used.
> > 
> > This logic should be sane and not risky because the only time budget
> > should be zero on entry is netpoll.  Change includes a small refactor
> > of local variable assignments to clean up the look.
> > 
> > Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and
> > improve performance") Reported-by: Oleksandr Natalenko
> > <oleksandr@natalenko.name>
> > Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
> > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> > ---
> > 
> > Compile tested ONLY, but functionally it should be exactly the same for
> > all cases except when budget is zero on entry, which will hopefully fix
> > the bug.
> > 
> > Sending this through intel-wired-lan with Alex's ACK.
> > ---
> > 
> >  drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > b/drivers/net/ethernet/intel/igb/igb_main.c index
> > 0cd37ad81b4e..b0a9bed14071 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -7991,12 +7991,16 @@ static void igb_ring_irq_enable(struct
> > igb_q_vector *q_vector)> 
> >   **/
> >  
> >  static int igb_poll(struct napi_struct *napi, int budget)
> >  {
> > 
> > -	struct igb_q_vector *q_vector = container_of(napi,
> > -						     struct igb_q_vector,
> > -						     napi);
> > -	bool clean_complete = true;
> > +	struct igb_q_vector *q_vector;
> > +	bool clean_complete;
> > 
> >  	int work_done = 0;
> > 
> > +	/* if budget is zero, we have a special case for netconsole, so
> > +	 * make sure to set clean_complete to false in that case.
> > +	 */
> > +	clean_complete = !!budget;
> > +
> > +	q_vector = container_of(napi, struct igb_q_vector, napi);
> > 
> >  #ifdef CONFIG_IGB_DCA
> >  
> >  	if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
> >  	
> >  		igb_update_dca(q_vector);
> 
> This didn't fix the issue neither for me, nor for another person from
> the kernel bugzilla [1].
> 
> The same printout still happens:
> 
> ```
> May 07 20:26:55 spock kernel: igb_poll+0x0/0x1440 [igb] exceeded budget in
> poll May 07 20:26:55 spock kernel: WARNING: CPU: 13 PID: 12285 at
> net/core/netpoll.c:154 netpoll_poll_dev+0x18a/0x1a0 …
> May 07 20:26:55 spock kernel: Call Trace:
> May 07 20:26:55 spock kernel:  netpoll_send_skb+0x1a0/0x260
> May 07 20:26:55 spock kernel:  write_msg+0xe5/0x100 [netconsole]
> May 07 20:26:55 spock kernel:  console_unlock+0x42f/0x720
> May 07 20:26:55 spock kernel:  suspend_devices_and_enter+0x2ac/0x7f0
> May 07 20:26:55 spock kernel:  pm_suspend.cold+0x321/0x36c
> May 07 20:26:55 spock kernel:  state_store+0xa6/0x140
> May 07 20:26:55 spock kernel:  kernfs_fop_write_iter+0x124/0x1b0
> May 07 20:26:55 spock kernel:  new_sync_write+0x16a/0x200
> May 07 20:26:55 spock kernel:  vfs_write+0x223/0x2e0
> May 07 20:26:55 spock kernel:  __x64_sys_write+0x6d/0xf0
> ```
> 
> Probably, your patch is still fine, but another idea is desperately
> needed :).
> 
> Thanks.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573

Gentle ping. IIUC, the patch was not picked up anyway, but maybe there's 
another idea?

Thanks.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 0cd37ad81b4e..b0a9bed14071 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7991,12 +7991,16 @@  static void igb_ring_irq_enable(struct igb_q_vector *q_vector)
  **/
 static int igb_poll(struct napi_struct *napi, int budget)
 {
-	struct igb_q_vector *q_vector = container_of(napi,
-						     struct igb_q_vector,
-						     napi);
-	bool clean_complete = true;
+	struct igb_q_vector *q_vector;
+	bool clean_complete;
 	int work_done = 0;
 
+	/* if budget is zero, we have a special case for netconsole, so
+	 * make sure to set clean_complete to false in that case.
+	 */
+	clean_complete = !!budget;
+
+	q_vector = container_of(napi, struct igb_q_vector, napi);
 #ifdef CONFIG_IGB_DCA
 	if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
 		igb_update_dca(q_vector);